Saving bucket information to a file

Once you are satisfied with a specific set of bucket boundaries, it is useful to save them to a file, for example as a configuration file kept alongside your code.

All skorecard bucketers, the BucketingProcess, and the Skorecard model support saving to YAML files with save_yml().

The special UserInputBucketer can read in these configuration files and can be used in the final model pipeline.

Example with a bucketer

from skorecard.datasets import load_uci_credit_card
from skorecard.bucketers import DecisionTreeBucketer, UserInputBucketer

X, y = load_uci_credit_card(return_X_y=True)

bucketer = DecisionTreeBucketer(max_n_bins=10)
bucketer = bucketer.fit(X, y)
bucketer.save_yml("bucketer.yml")

uib = UserInputBucketer("bucketer.yml")
uib.transform(X).head(4)
   EDUCATION  MARRIAGE  LIMIT_BAL  BILL_AMT1
0          0         1          9          9
1          1         1          3          7
2          0         1          9          9
3          0         0          5          0
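
To illustrate the idea behind the save-and-reload round trip above, here is a minimal, dependency-free sketch. It is not skorecard's implementation or file format: the boundaries, the `to_bucket` helper, and the use of JSON from the standard library (instead of YAML) are all assumptions made for illustration, and the choice of which side of a boundary is inclusive is arbitrary here.

```python
import bisect
import json
import os
import tempfile

# Hypothetical bucket boundaries for two numeric features (illustrative values only).
boundaries = {
    "LIMIT_BAL": [50000.0, 150000.0, 300000.0],
    "BILL_AMT1": [0.0, 10000.0],
}


def to_bucket(value, bounds):
    """Return the index of the bucket that `value` falls into."""
    # With bisect_right, a value equal to a boundary lands in the upper bucket.
    return bisect.bisect_right(bounds, value)


# Save the boundaries to a configuration file, then load them again.
path = os.path.join(tempfile.mkdtemp(), "buckets.json")
with open(path, "w") as f:
    json.dump(boundaries, f, indent=2)

with open(path) as f:
    loaded = json.load(f)

# Bucketing with the loaded boundaries matches bucketing with the originals.
row = {"LIMIT_BAL": 120000.0, "BILL_AMT1": -250.0}
before = {col: to_bucket(val, boundaries[col]) for col, val in row.items()}
after = {col: to_bucket(val, loaded[col]) for col, val in row.items()}
print(before == after)
```

The point of the sketch is that the file holds everything needed to reproduce the transformation, so no refitting on the training data is required when the boundaries are loaded.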

Example with BucketingProcess

A BucketingProcess works in exactly the same way. Because it contains both a prebucketing pipeline and a bucketing pipeline, skorecard makes sure that the saved buckets describe the full transformation from raw data to final bucket.

from skorecard.pipeline import BucketingProcess
from skorecard.bucketers import AsIsCategoricalBucketer, DecisionTreeBucketer, OptimalBucketer, UserInputBucketer
from sklearn.pipeline import make_pipeline

num_cols = ["LIMIT_BAL", "BILL_AMT1"]
cat_cols = ["EDUCATION", "MARRIAGE"]

bucketing_process = BucketingProcess(
    prebucketing_pipeline=make_pipeline(
        DecisionTreeBucketer(variables=num_cols, max_n_bins=100, min_bin_size=0.05),
        AsIsCategoricalBucketer(variables=cat_cols),
    ),
    bucketing_pipeline=make_pipeline(
        OptimalBucketer(variables=num_cols, max_n_bins=10, min_bin_size=0.05),
        OptimalBucketer(variables=cat_cols, variables_type="categorical", max_n_bins=10, min_bin_size=0.05),
    ),
)

bucketing_process.fit(X, y)
bucketing_process.save_yml("bucket_process.yml")

uib = UserInputBucketer("bucket_process.yml")
uib.transform(X).head(4)
   EDUCATION  MARRIAGE  LIMIT_BAL  BILL_AMT1
0          0         0          8          5
1          2         0          3          4
2          0         0          8          5
3          0         1          4          0
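
The statement that the saved buckets map raw data directly to the final bucket can be sketched for a single numeric feature. This is only an illustration under stated assumptions, not skorecard's internals: the boundaries, the prebucket-to-bucket mapping, and the helper names are hypothetical, and the sketch assumes that merged prebuckets are contiguous and that final buckets are numbered in increasing order.

```python
import bisect

# Hypothetical prebucket boundaries: 4 boundaries define 5 prebuckets (0..4).
prebucket_bounds = [10.0, 20.0, 30.0, 40.0]

# Hypothetical merge of prebuckets into final buckets:
# prebuckets 0 and 1 -> bucket 0, prebucket 2 -> bucket 1, prebuckets 3 and 4 -> bucket 2.
prebucket_to_bucket = [0, 0, 1, 2, 2]


def compose(bounds, mapping):
    """Keep only the boundaries at which the final bucket changes."""
    # Boundary i separates prebucket i from prebucket i + 1.
    return [b for i, b in enumerate(bounds) if mapping[i] != mapping[i + 1]]


final_bounds = compose(prebucket_bounds, prebucket_to_bucket)


def two_step(value):
    """Raw value -> prebucket -> final bucket."""
    pre = bisect.bisect_right(prebucket_bounds, value)
    return prebucket_to_bucket[pre]


def one_step(value):
    """Raw value -> final bucket, using only the composed boundaries."""
    return bisect.bisect_right(final_bounds, value)


for v in [5.0, 10.0, 19.9, 20.0, 25.0, 30.0, 45.0]:
    assert two_step(v) == one_step(v)
print(final_bounds)
```

Storing only the composed boundaries means a single lookup is enough at prediction time, which is why one configuration file can replace both pipeline stages.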

Example with SkorecardPipeline

skorecard can convert a scikit-learn pipeline to a SkorecardPipeline with to_skorecard_pipeline, which adds support for .save_yml():

from sklearn.pipeline import make_pipeline
from skorecard.bucketers import DecisionTreeBucketer, EqualFrequencyBucketer, UserInputBucketer
from skorecard.pipeline.pipeline import to_skorecard_pipeline

pipe = make_pipeline(
    EqualFrequencyBucketer(n_bins=10, variables=["BILL_AMT1"]),
    DecisionTreeBucketer(max_n_bins=5, variables=["LIMIT_BAL"]),
)
pipe.fit(X, y)
sk_pipe = to_skorecard_pipeline(pipe)
sk_pipe.save_yml("pipe.yml")

uib = UserInputBucketer("pipe.yml")
uib.transform(X).head(4)
   EDUCATION  MARRIAGE  LIMIT_BAL  BILL_AMT1
0          1         2          4          9
1          2         2          2          7
2          1         2          4          9
3          1         1          3          1

Last update: 2023-08-08