Using the BucketingProcess

The BucketingProcess enables a two-step bucketing approach: a feature is first pre-bucketed into many fine bins (for example, 100 pre-buckets), and those pre-buckets are then merged into the final buckets.

This is common practice: it reduces the problem of finding exact bucket boundaries to the simpler problem of deciding which of the 100 pre-buckets to merge.
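
The idea can be illustrated with a minimal pure-Python sketch. It is not skorecard's implementation: the helper names (`prebucket_cuts`, `merge_cuts`, `assign_bucket`) are hypothetical, the pre-buckets are simple equal-frequency bins, and the cut points to keep are picked by hand, where a real bucketer would choose them by optimizing against the target.

```python
from bisect import bisect_right


def prebucket_cuts(values, n_prebuckets):
    """Step 1: cut points that split `values` into roughly equal-sized pre-buckets."""
    ordered = sorted(values)
    cuts = []
    for i in range(1, n_prebuckets):
        cut = ordered[i * len(ordered) // n_prebuckets]
        if not cuts or cut > cuts[-1]:  # skip duplicate cut points
            cuts.append(cut)
    return cuts


def merge_cuts(cuts, keep):
    """Step 2: merging adjacent pre-buckets means keeping only some cut points."""
    return [cut for index, cut in enumerate(cuts) if index in keep]


def assign_bucket(value, cuts):
    """Index of the bucket that `value` falls into, given sorted cut points."""
    return bisect_right(cuts, value)


values = list(range(100))
fine_cuts = prebucket_cuts(values, n_prebuckets=10)  # 9 cut points, 10 pre-buckets
final_cuts = merge_cuts(fine_cuts, keep={2, 6})      # 2 cut points, 3 buckets
buckets = [assign_bucket(v, final_cuts) for v in (5, 30, 99)]
```

The final boundaries are always a subset of the pre-bucket boundaries, so the second step only has to search over which cut points to keep rather than over every possible threshold.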

Define the BucketingProcess

A BucketingProcess combines a pre-bucketing pipeline and a bucketing pipeline. You can also pass specials or variables to BucketingProcess, which passes those settings on to the bucketers in both pipelines.

In the example below, we pre-bucket numerical features into at most 100 bins and pre-bucket categorical columns as-is (each unique value becomes its own category, and new categories end up in the "Other" bucket).

from skorecard import datasets
from skorecard.bucketers import DecisionTreeBucketer, OptimalBucketer, AsIsCategoricalBucketer
from skorecard.pipeline import BucketingProcess

from sklearn.pipeline import make_pipeline

df = datasets.load_uci_credit_card(as_frame=True)
y = df["default"]
X = df.drop(columns=["default"])

num_cols = ["LIMIT_BAL", "BILL_AMT1"]
cat_cols = ["EDUCATION", "MARRIAGE"]
# Special values get their own bucket: here EDUCATION == 1 is labelled "Is 1"
specials = {"EDUCATION": {"Is 1": [1]}}

bucketing_process = BucketingProcess(
    prebucketing_pipeline=make_pipeline(
        DecisionTreeBucketer(variables=num_cols, max_n_bins=100, min_bin_size=0.05),
        AsIsCategoricalBucketer(variables=cat_cols),
    ),
    bucketing_pipeline=make_pipeline(
        OptimalBucketer(variables=num_cols, max_n_bins=10, min_bin_size=0.05),
        OptimalBucketer(variables=cat_cols, variables_type="categorical", max_n_bins=10, min_bin_size=0.05),
    ),
    specials=specials,
)

bucketing_process.fit_transform(X, y).head()

	EDUCATION	MARRIAGE	LIMIT_BAL	BILL_AMT1
0	-3	0	8	5
1	1	0	3	4
2	-3	0	8	5
3	-3	1	4	0
4	1	1	8	3
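
The as-is categorical pre-bucketing used above can be sketched in plain Python. This is an illustration under assumptions, not skorecard's code: the function names are hypothetical, categories are numbered by descending frequency (consistent with the MARRIAGE pre-bucket table in the next section), and the codes -1 for missing and -2 for unseen values follow the "Missing" and "Other" rows of that table.

```python
from collections import Counter

MISSING_BUCKET = -1  # assumed code for missing values
OTHER_BUCKET = -2    # assumed code for categories not seen during fitting


def fit_as_is(values):
    """Give every distinct non-missing value its own bucket, most frequent first."""
    counts = Counter(v for v in values if v is not None)
    return {value: bucket for bucket, (value, _) in enumerate(counts.most_common())}


def transform_as_is(values, mapping):
    """Map values to buckets; missing and unseen values get reserved codes."""
    return [
        MISSING_BUCKET if v is None else mapping.get(v, OTHER_BUCKET)
        for v in values
    ]


train = [2, 2, 2, 1, 1, 3]
mapping = fit_as_is(train)

# 7 was never seen during fitting, None stands in for a missing value.
new_data = [1, 2, 3, None, 7]
bucketed = transform_as_is(new_data, mapping)
```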

Methods and Attributes

A BucketingProcess instance has the same methods and attributes as a bucketer:

.summary()
.bucket_table(column)
.plot_bucket(column)
.features_bucket_mapping_
.save_to_yaml()
.fit_interactive()

but also adds a few unique ones:

.prebucket_table(column)
.plot_prebucket(column)

bucketing_process.summary()

	column	num_prebuckets	num_buckets	IV_score	dtype
0	EDUCATION	9	5	0.036308	int64
1	MARRIAGE	6	4	0.013054	int64
2	LIMIT_BAL	14	10	0.168862	float64
3	BILL_AMT1	15	7	0.005823	float64

bucketing_process.prebucket_table("MARRIAGE")

	pre-bucket	label	Count	Count (%)	Non-event	Event	Event Rate	WoE	IV	bucket
0	-2	Other	0.0	0.00	0.0	0.0	NaN	0.000	0.000	-2
1	-1	Missing	0.0	0.00	0.0	0.0	NaN	0.000	0.000	-1
2	0	2	3138.0	52.30	2493.0	645.0	0.205545	0.110	0.006	0
3	1	1	2784.0	46.40	2108.0	676.0	0.242816	-0.104	0.005	1
4	2	3	64.0	1.07	42.0	22.0	0.343750	-0.594	0.004	1
5	3	0	14.0	0.23	12.0	2.0	0.142857	0.547	0.001	0

bucketing_process.bucket_table("MARRIAGE")

	bucket	label	Count	Count (%)	Non-event	Event	Event Rate	WoE	IV
0	-2	Other	0.0	0.00	0.0	0.0	NaN	0.000	0.000
1	-1	Missing	0.0	0.00	0.0	0.0	NaN	0.000	0.000
2	0	0, 3	3152.0	52.53	2505.0	647.0	0.205266	0.112	0.006
3	1	1, 2	2848.0	47.47	2150.0	698.0	0.245084	-0.117	0.007
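
The WoE and IV columns can be approximately reproduced by hand. The sketch below assumes the common definitions WoE = ln(share of non-events / share of events) and, per bucket, IV = (share of non-events - share of events) × WoE; skorecard's exact computation may differ in details such as smoothing, and the helper name `woe_iv` is hypothetical. The counts are the two non-empty rows of the MARRIAGE bucket table; empty buckets are left out because the logarithm is undefined for them.

```python
from math import log


def woe_iv(non_events, events):
    """Return per-bucket (WoE, IV) pairs from per-bucket non-event and event counts."""
    total_non_events = sum(non_events)
    total_events = sum(events)
    results = []
    for n, e in zip(non_events, events):
        share_non_event = n / total_non_events
        share_event = e / total_events
        woe = log(share_non_event / share_event)
        iv = (share_non_event - share_event) * woe
        results.append((woe, iv))
    return results


# Buckets 0 and 1 of the MARRIAGE bucket table.
rows = woe_iv(non_events=[2505, 2150], events=[647, 698])
for bucket, (woe, iv) in enumerate(rows):
    print(f"bucket {bucket}: WoE={woe:.3f}, IV={iv:.3f}")

total_iv = sum(iv for _, iv in rows)
```

Under these assumptions the per-bucket values land close to the WoE and IV shown in the table, and the total is close to the IV_score that .summary() reports for MARRIAGE.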

bucketing_process.plot_prebucket("LIMIT_BAL", format="png", scale=2, width=1050, height=525)


The `.features_bucket_mapping_` attribute

All skorecard bucketing classes have a .features_bucket_mapping_ attribute that stores the information needed to go from an input feature to a bucketed feature. Because a BucketingProcess has both a pre-bucketing and a bucketing step, its bucket mapping reflects the net effect of merging the two steps into one. This is demonstrated below:

bucketing_process.pre_pipeline_.features_bucket_mapping_.get("MARRIAGE").labels

{3: '0', 1: '1', 0: '2', 2: '3', -1: 'Missing', -2: 'Other'}

bucketing_process.pipeline_.features_bucket_mapping_.get("EDUCATION")

BucketMapping(feature_name='EDUCATION', type='categorical', missing_bucket=None, other_bucket=None, map={1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 0: 1}, right=False, specials={'Is 1': [-3]})

bucketing_process.features_bucket_mapping_.get("EDUCATION")

BucketMapping(feature_name='EDUCATION', type='categorical', missing_bucket=None, other_bucket=None, map={0: 0, 3: 0, 4: 0, 5: 0, 6: 0, 2: 1}, right=True, specials={'Is 1': [1]})
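
The "net effect" can be pictured as composing two lookups. The following is a minimal sketch with plain dictionaries, not skorecard's BucketMapping class: the value-to-pre-bucket map is read off the MARRIAGE labels above, the pre-bucket-to-bucket map is read off the bucket column of the MARRIAGE pre-bucket table, and the function name `compose` is hypothetical.

```python
def compose(value_to_prebucket, prebucket_to_bucket):
    """Collapse two mapping steps into one value -> final bucket lookup."""
    return {
        value: prebucket_to_bucket[prebucket]
        for value, prebucket in value_to_prebucket.items()
    }


# MARRIAGE: raw value -> pre-bucket (from the pre-bucket labels).
value_to_prebucket = {2: 0, 1: 1, 3: 2, 0: 3}
# MARRIAGE: pre-bucket -> bucket (from the "bucket" column of prebucket_table).
prebucket_to_bucket = {0: 0, 1: 1, 2: 1, 3: 0}

net_mapping = compose(value_to_prebucket, prebucket_to_bucket)
```

Here the raw values 2 and 0 end up in bucket 0 and the raw values 1 and 3 in bucket 1, which agrees with the counts in the MARRIAGE bucket table.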

The `.fit_interactive()` method

All skorecard bucketing classes have a .fit_interactive() method. For a BucketingProcess, this launches a slightly different app that shows both the pre-buckets and the buckets, and lets you edit the pre-bucketing as well.

# bucketing_process.fit_interactive(X, y) # not run
