Using the BucketingProcess¶
The BucketingProcess enables a two-step bucketing approach, where a feature is first pre-bucketed into, for example, 100 pre-buckets, and then bucketed.
This is common practice: it reduces the problem of finding exact bucket boundaries to the simpler problem of deciding which of the 100 pre-buckets to merge.
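The two-step idea can be sketched without skorecard. The snippet below is a minimal illustration in plain Python, not the library's implementation: it uses quantile cut points for the pre-buckets and a hand-written merge map, whereas the example later on this page uses a decision tree and an optimal bucketer. The helper names `prebucket_boundaries` and `assign`, and the `merge_map` values, are hypothetical.

```python
from bisect import bisect_right


def prebucket_boundaries(values, n_prebuckets):
    """Return ascending quantile cut points (at most n_prebuckets - 1 of them)."""
    ordered = sorted(values)
    cuts = []
    for i in range(1, n_prebuckets):
        cut = ordered[(i * len(ordered)) // n_prebuckets]
        # Skip duplicate cut points so boundaries stay strictly increasing.
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts


def assign(values, cuts):
    """Assign each value the index of the interval it falls in."""
    return [bisect_right(cuts, v) for v in values]


# Step 1: pre-bucket a feature into 10 fine-grained pre-buckets.
values = list(range(100))
cuts = prebucket_boundaries(values, 10)
prebuckets = assign(values, cuts)

# Step 2: decide which adjacent pre-buckets to merge (hypothetical choice).
merge_map = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 1, 7: 2, 8: 2, 9: 2}
buckets = [merge_map[p] for p in prebuckets]
```

The second step only has to choose among a handful of merge points instead of searching over every possible boundary on the raw feature.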
Define the BucketingProcess¶
The bucketing process incorporates a pre-bucketing pipeline and a bucketing pipeline. You can also pass specials or variables, and BucketingProcess will pass those settings on to the bucketers in the pipelines.
In the example below, we prebucket numerical features to at most 100 bins, and prebucket categorical columns as-is (each unique value becomes its own category, and new categories end up in the "other" bucket).
from skorecard import datasets
from skorecard.bucketers import DecisionTreeBucketer, OptimalBucketer, AsIsCategoricalBucketer
from skorecard.pipeline import BucketingProcess
from sklearn.pipeline import make_pipeline
df = datasets.load_uci_credit_card(as_frame=True)
y = df["default"]
X = df.drop(columns=["default"])
num_cols = ["LIMIT_BAL", "BILL_AMT1"]
cat_cols = ["EDUCATION", "MARRIAGE"]
specials = {"EDUCATION": {"Is 1": [1]}}
bucketing_process = BucketingProcess(
    prebucketing_pipeline=make_pipeline(
        DecisionTreeBucketer(variables=num_cols, max_n_bins=100, min_bin_size=0.05),
        AsIsCategoricalBucketer(variables=cat_cols),
    ),
    bucketing_pipeline=make_pipeline(
        OptimalBucketer(variables=num_cols, max_n_bins=10, min_bin_size=0.05),
        OptimalBucketer(variables=cat_cols, variables_type="categorical", max_n_bins=10, min_bin_size=0.05),
    ),
    specials=specials,
)
bucketing_process.fit_transform(X, y).head()
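The as-is categorical prebucketing described above (each unique value is its own category, unseen values go to the other bucket) can be sketched in plain Python. This is an illustration only, not skorecard's code: the helper names and the `OTHER` index are hypothetical, and the index the library actually assigns to the other bucket may differ.

```python
OTHER = -1  # hypothetical index for the "other" bucket (an assumption)


def fit_as_is(train_values):
    """Map each unique training value to its own bucket index."""
    return {value: idx for idx, value in enumerate(sorted(set(train_values)))}


def transform_as_is(values, mapping):
    """Look up each value; values not seen during fit go to OTHER."""
    return [mapping.get(v, OTHER) for v in values]


mapping = fit_as_is([1, 2, 2, 3, 1])
bucketed = transform_as_is([1, 3, 0], mapping)  # 0 was never seen in training
```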
Methods and Attributes¶
A BucketingProcess instance has the same methods and attributes as a bucketer:
.summary()
.bucket_table(column)
.plot_bucket(column)
.features_bucket_mapping_
.save_to_yaml()
.fit_interactive()
but also adds a few unique ones:
.prebucket_table(column)
.plot_prebucket(column)
bucketing_process.summary()
bucketing_process.prebucket_table("MARRIAGE")
bucketing_process.bucket_table("MARRIAGE")
bucketing_process.plot_prebucket("LIMIT_BAL", format="png", scale=2, width=1050, height=525)
The .features_bucket_mapping_ attribute¶
All skorecard bucketing classes have a .features_bucket_mapping_ attribute that stores the bucketing information needed to go from an input feature to a bucketed feature. Because BucketingProcess has both a prebucketing and a bucketing step, its bucket mapping reflects the net effect of merging both steps into one. This is demonstrated below:
bucketing_process.pre_pipeline_.features_bucket_mapping_.get("MARRIAGE").labels
bucketing_process.pipeline_.features_bucket_mapping_.get("EDUCATION")
bucketing_process.features_bucket_mapping_.get("EDUCATION")
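To make "the net effect of merging both steps" concrete, here is a minimal sketch for a numerical feature, written in plain Python rather than with skorecard objects. It assumes left-closed intervals and hypothetical cut points; `net_cuts` is an illustrative helper, not a library function.

```python
from bisect import bisect_right

# Hypothetical boundaries: raw value -> pre-bucket index 0..4
pre_cuts = [10, 20, 30, 40]
# Hypothetical boundaries on the pre-bucket index: pre-bucket -> bucket 0..2
merge_cuts = [2, 4]


def net_cuts(pre_cuts, merge_cuts):
    """Translate cuts on pre-bucket indices into cuts on the raw feature.

    Pre-bucket k starts at pre_cuts[k - 1], so a cut placed at pre-bucket
    index k corresponds to the raw boundary pre_cuts[k - 1].
    """
    return [pre_cuts[k - 1] for k in merge_cuts]


combined = net_cuts(pre_cuts, merge_cuts)

# Applying the two steps in sequence matches applying the combined mapping once.
samples = [5, 10, 19, 20, 39, 40, 100]
two_step = [bisect_right(merge_cuts, bisect_right(pre_cuts, v)) for v in samples]
one_step = [bisect_right(combined, v) for v in samples]
```

Under these assumptions, the combined mapping only needs the boundaries that survive the merge, which is why a single mapping can stand in for both steps.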
The .fit_interactive() method¶
All skorecard bucketing classes have a .fit_interactive() method. For BucketingProcess, this launches a slightly different app that shows both the pre-buckets and the buckets, and lets you edit the prebucketing as well.
# bucketing_process.fit_interactive(X, y) # not run