Saving bucket information to a file
Once you are satisfied with a specific set of bucketing boundaries, it's useful to save them to a file, for example to keep the bucketing information as configuration files along with your code.
All skorecard bucketers, the BucketingProcess and Skorecard model support saving to yaml files with save_yml().
The special UserInputBucketer can read in these configuration files and can be used in the final model pipeline.
Example with a bucketer
from skorecard.datasets import load_uci_credit_card
from skorecard.bucketers import DecisionTreeBucketer, UserInputBucketer
X, y = load_uci_credit_card(return_X_y=True)
bucketer = DecisionTreeBucketer(max_n_bins=10)
bucketer = bucketer.fit(X, y)
bucketer.save_yml("bucketer.yml")
uib = UserInputBucketer("bucketer.yml")
uib.transform(X).head(4)
Example with BucketingProcess
A bucketing process works in exactly the same way. Because it contains both a prebucketing pipeline and a bucketing pipeline, skorecard makes sure that the saved buckets describe the complete transformation from raw data to final bucket.
from skorecard.pipeline import BucketingProcess
from skorecard.bucketers import EqualFrequencyBucketer, OptimalBucketer, AsIsCategoricalBucketer
from sklearn.pipeline import make_pipeline
num_cols = ["LIMIT_BAL", "BILL_AMT1"]
cat_cols = ["EDUCATION", "MARRIAGE"]
bucketing_process = BucketingProcess(
    prebucketing_pipeline=make_pipeline(
        DecisionTreeBucketer(variables=num_cols, max_n_bins=100, min_bin_size=0.05),
        AsIsCategoricalBucketer(variables=cat_cols),
    ),
    bucketing_pipeline=make_pipeline(
        OptimalBucketer(variables=num_cols, max_n_bins=10, min_bin_size=0.05),
        OptimalBucketer(variables=cat_cols, variables_type="categorical", max_n_bins=10, min_bin_size=0.05),
    ),
)
bucketing_process.fit(X, y)
bucketing_process.save_yml("bucket_process.yml")
uib = UserInputBucketer("bucket_process.yml")
uib.transform(X).head(4)
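To make the idea of "the transformation from raw data to final bucket" concrete, here is a conceptual sketch in plain Python. It is not skorecard's implementation: the boundaries, the prebucket-to-bucket mapping and the function names are hypothetical, and it assumes a numerical feature whose final buckets merge neighbouring prebuckets, with intervals closed on the right.

```python
from bisect import bisect_left

# Hypothetical numbers for illustration only; not skorecard's internals.
prebucket_boundaries = [10_000, 50_000, 100_000, 200_000]  # raw value -> prebucket 0..4
prebucket_to_bucket = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}       # prebucket -> final bucket


def two_step(value):
    """Apply prebucketing, then map the prebucket to its final bucket."""
    prebucket = bisect_left(prebucket_boundaries, value)
    return prebucket_to_bucket[prebucket]


# Collapse both steps into one set of boundaries on the raw values:
# keep only the boundaries where the final bucket changes.
direct_boundaries = [
    boundary
    for i, boundary in enumerate(prebucket_boundaries)
    if prebucket_to_bucket[i] != prebucket_to_bucket[i + 1]
]


def one_step(value):
    """Map a raw value straight to its final bucket."""
    return bisect_left(direct_boundaries, value)


samples = [0, 9_999, 10_000, 75_000, 100_000, 150_000, 200_000, 500_000]
for value in samples:
    print(value, two_step(value), one_step(value))
```

In this sketch only direct_boundaries would need to be stored, which is why a single configuration file can stand in for both pipelines.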
Example with a SkorecardPipeline
skorecard supports converting scikit-learn pipelines to a SkorecardPipeline using to_skorecard_pipeline. This will add support for .save_yml():
from sklearn.pipeline import make_pipeline
from skorecard.bucketers import EqualFrequencyBucketer
from skorecard.pipeline.pipeline import to_skorecard_pipeline
pipe = make_pipeline(
    EqualFrequencyBucketer(n_bins=10, variables=["BILL_AMT1"]),
    DecisionTreeBucketer(max_n_bins=5, variables=["LIMIT_BAL"]),
)
pipe.fit(X, y)
sk_pipe = to_skorecard_pipeline(pipe)
sk_pipe.save_yml("pipe.yml")
uib = UserInputBucketer("pipe.yml")
uib.transform(X).head(4)