Mixed usage with other packages

Several excellent packages offer functionality for bucketing (binning, discretizing) numerical variables and for encoding categorical variables. You may want to combine them with skorecard components in your pipelines.

Some packages are compatible with pandas DataFrames, for example category_encoders:

%%capture
!pip install category_encoders
%%capture
from sklearn.pipeline import make_pipeline
from skorecard.datasets import load_uci_credit_card
from skorecard.bucketers import OrdinalCategoricalBucketer

X, y = load_uci_credit_card(return_X_y=True)

from category_encoders import TargetEncoder

pipe = make_pipeline(
    TargetEncoder(cols=["EDUCATION"]),  #  category_encoders.TargetEncoder passes through other columns
    OrdinalCategoricalBucketer(variables=["MARRIAGE"]),
)
pipe.fit(X, y)
pipe.transform(X).head(5)
   EDUCATION  MARRIAGE  LIMIT_BAL  BILL_AMT1
0        0.0       2.0   400000.0   201800.0
1        1.0       2.0    80000.0    80610.0
2        0.0       2.0   500000.0   499452.0
3        0.0       1.0   140000.0      450.0
4        1.0       1.0   420000.0    56107.0
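
For intuition about what a target encoder does to a single column, here is a minimal pandas-only sketch. It assumes the simplest, unsmoothed variant (each category is replaced by the mean of the target within that category), which is not what category_encoders.TargetEncoder computes exactly, since that class also applies smoothing. The toy values and the variable names category_means and encoded are hypothetical.

```python
import pandas as pd

# Toy data (hypothetical values, not the UCI credit card dataset)
df = pd.DataFrame({
    "EDUCATION": [1, 1, 2, 2, 2, 3],
    "LIMIT_BAL": [1000, 2000, 1500, 3000, 2500, 500],
})
y = pd.Series([0, 1, 1, 1, 0, 0], name="default")

# Mean of the target within each EDUCATION category (no smoothing)
category_means = y.groupby(df["EDUCATION"]).mean()

# Replace each category by its target mean; other columns pass through untouched
encoded = df.copy()
encoded["EDUCATION"] = df["EDUCATION"].map(category_means)

print(encoded)
```

The result is still a DataFrame with the same column names and order, which is why such an encoder can sit directly in front of a skorecard bucketer that selects columns by name.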

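Other packages do not return pandas DataFrames. By default, scikit-learn transformers such as KBinsDiscretizer and ColumnTransformer return NumPy arrays, so column names are lost.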

You can wrap such a transformer in skorecard.pipeline.KeepPandas to use it in a pipeline with skorecard components that select columns by name:

from sklearn.preprocessing import KBinsDiscretizer
from skorecard.pipeline import KeepPandas
from sklearn.compose import ColumnTransformer

ct = ColumnTransformer(
    [("binner", KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform"), ["EDUCATION"])],
    remainder="passthrough",
)
pipe = make_pipeline(KeepPandas(ct), OrdinalCategoricalBucketer(variables=["MARRIAGE"]))
pipe.fit_transform(X, y).head(5)
WARNING:root:sklearn.compose.ColumnTransformer can change the order of columns, be very careful when using with KeepPandas()

   EDUCATION  MARRIAGE  LIMIT_BAL  BILL_AMT1
0        0.0       2.0   400000.0   201800.0
1        1.0       2.0    80000.0    80610.0
2        0.0       2.0   500000.0   499452.0
3        0.0       1.0   140000.0      450.0
4        1.0       1.0   420000.0    56107.0