
The Basics

Dummy dataset

Let's start with a dummy dataset based on the UCI credit card dataset.

from skorecard.datasets import load_uci_credit_card

X, y = load_uci_credit_card(return_X_y=True)
X.head(4)
   EDUCATION  MARRIAGE  LIMIT_BAL  BILL_AMT1
0          1         2   400000.0   201800.0
1          2         2    80000.0    80610.0
2          1         2   500000.0   499452.0
3          1         1   140000.0      450.0

A basic bucketer

skorecard offers a set of bucketers that have a scikit-learn compatible interface. By default they will bucket all variables into n_bins buckets.

Some bucketers, such as OptimalBucketer and DecisionTreeBucketer, are supervised and can use information from y to find good buckets. You can control the number of buckets using max_n_bins instead of n_bins.

from skorecard.bucketers import DecisionTreeBucketer

bucketer = DecisionTreeBucketer(max_n_bins=10)
X_transformed = bucketer.fit_transform(X, y)
X_transformed.head(4)
   EDUCATION  MARRIAGE  LIMIT_BAL  BILL_AMT1
0          0         1          9          9
1          1         1          3          7
2          0         1          9          9
3          0         0          5          0
X_transformed['BILL_AMT1'].value_counts().sort_index()
0    1343
1     404
2     574
3     462
4     400
5     359
6     857
7     789
8     500
9     312
Name: BILL_AMT1, dtype: int64

Bucketing specific variables

Instead of applying a bucketer on all features, you'll likely want to apply it only to specific features. You can use the variables parameter for that:

bucketer = DecisionTreeBucketer(max_n_bins=10, variables=["BILL_AMT1"])
bucketer.fit_transform(X, y).head(4)
   EDUCATION  MARRIAGE  LIMIT_BAL  BILL_AMT1
0          1         2   400000.0          9
1          2         2    80000.0          7
2          1         2   500000.0          9
3          1         1   140000.0          0
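
To make the effect of the variables parameter concrete, here is a minimal standard-library sketch of the same idea: only the listed columns are replaced by bucket indices, and every other column passes through unchanged. The helper `bucket_selected` and the boundaries in `BOUNDARIES` are hypothetical and chosen only for illustration. They are not part of skorecard and are not the splits that DecisionTreeBucketer learns, so the resulting indices differ from the output above.

```python
from bisect import bisect_left

# Hypothetical boundaries, for illustration only. These are NOT the
# splits learned by DecisionTreeBucketer.
BOUNDARIES = {"BILL_AMT1": [1_000.0, 100_000.0, 250_000.0]}


def bucket_selected(rows, boundaries):
    """Bucket only the columns named in `boundaries`; copy the rest as-is."""
    bucketed_rows = []
    for row in rows:
        new_row = dict(row)  # untouched columns are carried over unchanged
        for column, edges in boundaries.items():
            # Right-closed intervals (a, b]: a value equal to an edge
            # lands in the lower bucket, hence bisect_left.
            new_row[column] = bisect_left(edges, row[column])
        bucketed_rows.append(new_row)
    return bucketed_rows


# Two columns from the first four rows of the dataset shown earlier.
rows = [
    {"LIMIT_BAL": 400000.0, "BILL_AMT1": 201800.0},
    {"LIMIT_BAL": 80000.0, "BILL_AMT1": 80610.0},
    {"LIMIT_BAL": 500000.0, "BILL_AMT1": 499452.0},
    {"LIMIT_BAL": 140000.0, "BILL_AMT1": 450.0},
]
bucketed = bucket_selected(rows, BOUNDARIES)
for row in bucketed:
    print(row)
```

LIMIT_BAL keeps its raw values because it is not listed in `BOUNDARIES`, which mirrors how only BILL_AMT1 changes in the output above.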

Inspecting bucketing results

skorecard bucketers have some methods to help you inspect the result of the bucketing process:

from skorecard.bucketers import EqualWidthBucketer
bucketer = EqualWidthBucketer(n_bins=5, variables=["BILL_AMT1"])
bucketer.fit(X, y)
bucketer.bucket_table('BILL_AMT1')
   bucket  label                             Count  Count (%)  Non-event   Event  Event Rate     WoE      IV
0      -1  Missing                             0.0       0.00        0.0     0.0         NaN   0.000   0.000
1       0  (-inf, -10319.399999999994]         3.0       0.05        3.0     0.0    0.000000   4.181  -0.003
2       1  (-10319.399999999994, 144941.2]  5408.0      90.13     4188.0  1220.0    0.225592  -0.008  -0.000
3       2  (144941.2, 300201.80000000005]    490.0       8.17      395.0    95.0    0.193878   0.183  -0.003
4       3  (300201.80000000005, 455462.4]     75.0       1.25       55.0    20.0    0.266667  -0.230  -0.001
5       4  (455462.4, inf]                    24.0       0.40       14.0    10.0    0.416667  -0.903  -0.004
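
The Event Rate and WoE columns can be recomputed from the Non-event and Event counts. The sketch below uses only the standard library and assumes that WoE is the log of the ratio between a bucket's share of non-events and its share of events, with a small constant `EPSILON = 1e-5` added to both shares. That formula and constant are assumptions inferred from the printed numbers, which they reproduce to three decimals. They are not taken from the skorecard source.

```python
import math

# Assumed smoothing constant, inferred from the printed WoE values.
EPSILON = 1e-5

# (bucket, non-event count, event count), copied from the bucket table.
counts = [(0, 3, 0), (1, 4188, 1220), (2, 395, 95), (3, 55, 20), (4, 14, 10)]

total_non_event = sum(non_event for _, non_event, _ in counts)
total_event = sum(event for _, _, event in counts)


def event_rate(non_event, event):
    """Fraction of observations in the bucket that are events."""
    return event / (non_event + event)


def woe(non_event, event):
    """Weight of evidence under the assumed formula described above."""
    share_non_event = non_event / total_non_event
    share_event = event / total_event
    return math.log((share_non_event + EPSILON) / (share_event + EPSILON))


for bucket, non_event, event in counts:
    rate = round(event_rate(non_event, event), 6)
    print(bucket, rate, round(woe(non_event, event), 3))
```

Under this convention a positive WoE marks a bucket with relatively more non-events than the dataset as a whole, as in bucket 2, where the event rate (0.193878) is below the rate of the dominant bucket 1 (0.225592).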
# Remove the format argument for an interactive plotly plot.
bucketer.plot_bucket('BILL_AMT1', format="png", scale=2, width=1050, height=525)
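
The boundaries in the EqualWidthBucketer table are evenly spaced, and a short sketch shows where they come from. It assumes BILL_AMT1 ranges from -165580.0 to 610723.0. These two values are not stated in the text; they are inferred by extending the printed boundaries outward by one bin width. The helper `equal_width_edges` is hypothetical and is not a skorecard function.

```python
def equal_width_edges(minimum, maximum, n_bins):
    """Return the n_bins - 1 inner boundaries that split [minimum, maximum] evenly."""
    width = (maximum - minimum) / n_bins
    return [minimum + i * width for i in range(1, n_bins)]


# Assumed feature range, inferred from the printed boundaries.
edges = equal_width_edges(-165580.0, 610723.0, n_bins=5)

# Build right-closed interval labels in the same style as the bucket table.
bounds = [float("-inf")] + edges + [float("inf")]
labels = [f"({low}, {high}]" for low, high in zip(bounds[:-1], bounds[1:])]
for bucket, label in enumerate(labels):
    print(bucket, label)
```

The outermost buckets are left open-ended so that values outside the range seen during fitting still fall into a bucket.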

Last update: 2021-11-24