The Basics
Dummy dataset
Let's start with a dummy dataset based on the UCI credit card dataset.
from skorecard.datasets import load_uci_credit_card
X, y = load_uci_credit_card(return_X_y=True)
X.head(4)
A basic bucketer
skorecard offers a set of bucketers that have a scikit-learn compatible interface. By default they will bucket all variables into n_bins buckets.
Some bucketers, like OptimalBucketer and DecisionTreeBucketer, are supervised and can use information from y to find good buckets. For these bucketers you control the number of buckets using max_n_bins instead of n_bins.
from skorecard.bucketers import DecisionTreeBucketer
bucketer = DecisionTreeBucketer(max_n_bins=10)
X_transformed = bucketer.fit_transform(X, y)
X_transformed.head(4)
X_transformed["BILL_AMT1"].value_counts().sort_index()
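To make the idea of a supervised bucketer more concrete, here is a minimal pure-Python sketch of how information from y can pick a bucket boundary. It searches for the single threshold that minimises the weighted Gini impurity of the two resulting groups. This is only an illustration of the principle, not skorecard's implementation; the helper names `gini` and `best_split` and the toy data are hypothetical.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, weighted_gini) of the best single split.

    Candidate thresholds are midpoints between consecutive distinct values.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("nan"), float("inf"))
    for i in range(1, n):
        left_v, right_v = pairs[i - 1][0], pairs[i][0]
        if left_v == right_v:
            continue  # no boundary can separate identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((left_v + right_v) / 2.0, score)
    return best


# Hypothetical toy data: low values are non-events (0), high values are events (1).
values = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
labels = [0, 0, 0, 1, 1, 1]
threshold, score = best_split(values, labels)
print(threshold, score)
```

A tree-based bucketer repeats this kind of search recursively, so a limit such as max_n_bins acts as a cap on how many buckets the search may produce rather than an exact count.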
Bucketing specific variables
Instead of applying a bucketer to all features, you'll likely want to apply it only to specific features. You can use the variables parameter for that:
bucketer = DecisionTreeBucketer(max_n_bins=10, variables=["BILL_AMT1"])
bucketer.fit_transform(X, y).head(4)
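The effect of selecting variables can be sketched without skorecard: only the listed columns are replaced by bucket indices, and every other column passes through unchanged. The function `bucket_columns`, the boundary values and the column name `OTHER_FEATURE` below are hypothetical and exist only for illustration.

```python
from bisect import bisect_right
from typing import Dict, List, Optional


def bucket_columns(
    data: Dict[str, list],
    boundaries: Dict[str, List[float]],
    variables: Optional[List[str]] = None,
) -> Dict[str, list]:
    """Bucket only the columns listed in `variables`; copy the rest unchanged."""
    selected = list(data) if variables is None else variables
    result = {}
    for name, column in data.items():
        if name in selected:
            # bisect_right counts how many boundaries are <= the value,
            # which serves as the integer bucket index.
            result[name] = [bisect_right(boundaries[name], v) for v in column]
        else:
            result[name] = list(column)
    return result


# Hypothetical toy data and boundaries.
data = {
    "BILL_AMT1": [500.0, 12000.0, 90000.0],
    "OTHER_FEATURE": [24.0, 37.0, 58.0],
}
boundaries = {"BILL_AMT1": [1000.0, 50000.0]}

bucketed = bucket_columns(data, boundaries, variables=["BILL_AMT1"])
print(bucketed)
```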
Inspecting bucketing results
skorecard bucketers have some methods to help you inspect the result of the bucketing process:
from skorecard.bucketers import EqualWidthBucketer
bucketer = EqualWidthBucketer(n_bins=5, variables=["BILL_AMT1"])
bucketer.fit(X, y)
bucketer.bucket_table("BILL_AMT1")
bucketer.plot_bucket(
    "BILL_AMT1", format="png", scale=2, width=1050, height=525
)  # remove the format argument for an interactive plotly plot
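If you want a feel for what an equal-width bucketer and a bucket summary compute, the following pure-Python sketch may help. It splits the observed range into n_bins intervals of equal width and then reports, per bucket, the number of rows, the number of events (y == 1) and the event rate. It is a simplified illustration under those assumptions, not skorecard's code; the function names and the summary keys are hypothetical and need not match the columns skorecard reports.

```python
from collections import defaultdict
from typing import Dict, List


def equal_width_bucket(values: List[float], n_bins: int) -> List[int]:
    """Assign each value to one of n_bins equally wide intervals over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def summarise(buckets: List[int], y: List[int]) -> Dict[int, Dict[str, float]]:
    """Per-bucket row count, number of events (y == 1) and event rate."""
    counts = defaultdict(int)
    events = defaultdict(int)
    for b, target in zip(buckets, y):
        counts[b] += 1
        events[b] += target
    return {
        b: {"count": counts[b], "event": events[b], "event_rate": events[b] / counts[b]}
        for b in sorted(counts)
    }


# Hypothetical toy data.
values = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 90.0, 100.0]
y = [0, 0, 0, 1, 0, 1, 1, 1]

buckets = equal_width_bucket(values, n_bins=5)
table = summarise(buckets, y)
for bucket, row in table.items():
    print(bucket, row)
```

Note that no value falls into bucket 3 in this toy data. Equal-width bucketing can leave buckets sparse or empty when the data is skewed, which is one reason to inspect the bucket table and plot after fitting.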