Interactive bucketing¶

You might want to manually edit the bucketing boundaries, for example to incorporate specific domain knowledge. You can manually define buckets, but you could also use to interactive explore and update the buckets. All skorecard.bucketers have a method called .fit_interactive(), which will call .fit() if the bucketer is not yet fitted, and then launch a dash webapp.

Make sure to have the up to date dash dependencies by running pip install --upgrade skorecard[dashboard].

from skorecard.datasets import load_uci_credit_card
from skorecard.bucketers import DecisionTreeBucketer

X, y = load_uci_credit_card(return_X_y=True)
bucketer = DecisionTreeBucketer(max_n_bins=10)
# bucketer.fit_interactive(X, y) # not run

This should look like:

dash app example

This also works for categorical features:

from skorecard.bucketers import OrdinalCategoricalBucketer
import random

pets = ["no pets"] * 3000 + ["cat lover"] * 1500 + ["dog lover"] * 1000 + ["rabbit"] * 498 + ["gold fish"] * 2
random.Random(42).shuffle(pets)
X["pet_ownership"] = pets

bucketer = OrdinalCategoricalBucketer(variables=["pet_ownership"])
# bucketer.fit_interactive(X, y) # not run

Dash app running on http://127.0.0.1:8050/

Which should look like:

cat bucketer interactive

Pipelines¶

You can also run .fit_interactive() on a pipeline of bucketers. You'll need to convert to a SkorecardPipeline in order to have access to the method:

from skorecard.bucketers import OrdinalCategoricalBucketer
from skorecard.pipeline import to_skorecard_pipeline
from sklearn.pipeline import make_pipeline

pipe = make_pipeline(
    OrdinalCategoricalBucketer(variables=["EDUCATION", "MARRIAGE"]),
    DecisionTreeBucketer(max_n_bins=10, variables=["LIMIT_BAL", "BILL_AMT1"]),
)

# Make this a skorecard pipeline, which adds some convenience methods
pipe = to_skorecard_pipeline(pipe)

# pipe.fit_interactive(X, y) # not run

Dash app running on http://127.0.0.1:8050/

BucketingProcess and Skorecard models¶

Interactively setting pre-bucketing and bucketing per column is also possible on BucketingProcess and Skorecard models

from skorecard import Skorecard
from skorecard.datasets import load_uci_credit_card

model = Skorecard(variables=["EDUCATION", "MARRIAGE", "LIMIT_BAL", "BILL_AMT1"])

# model.fit_interactive(X, y) # not run

Dash app running on http://127.0.0.1:8050/