Mixed usage with other packages

There are many excellent packages out there that bucket (bin, discretize) numerical variables or encode categorical variables. Chances are you'd like to combine them in your skorecard pipelines.

Some packages are compatible with pandas DataFrames, such as category_encoders:

%%capture
!pip install category_encoders

%%capture
from sklearn.pipeline import make_pipeline
from skorecard.datasets import load_uci_credit_card
from skorecard.bucketers import OrdinalCategoricalBucketer

X, y = load_uci_credit_card(return_X_y=True)

from category_encoders import TargetEncoder

pipe = make_pipeline(
    TargetEncoder(cols=["EDUCATION"]),  #  category_encoders.TargetEncoder passes through other columns
    OrdinalCategoricalBucketer(variables=["MARRIAGE"]),
)
pipe.fit(X, y)

pipe.transform(X).head(5)

	EDUCATION	MARRIAGE	LIMIT_BAL	BILL_AMT1
0	0.0	2.0	400000.0	201800.0
1	1.0	2.0	80000.0	80610.0
2	0.0	2.0	500000.0	499452.0
3	0.0	1.0	140000.0	450.0
4	1.0	1.0	420000.0	56107.0

Some packages do not return pandas DataFrames, like:

sklearn.preprocessing.KBinsDiscretizer

You can wrap such a transformer in skorecard.pipeline.KeepPandas to use it in a pipeline:

from sklearn.preprocessing import KBinsDiscretizer
from skorecard.pipeline import KeepPandas
from sklearn.compose import ColumnTransformer

ct = ColumnTransformer(
    [("binner", KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform"), ["EDUCATION"])],
    remainder="passthrough",
)
pipe = make_pipeline(KeepPandas(ct), OrdinalCategoricalBucketer(variables=["MARRIAGE"]))
pipe.fit_transform(X, y).head(5)

WARNING:root:sklearn.compose.ColumnTransformer can change the order of columns, be very careful when using with KeepPandas()

	EDUCATION	MARRIAGE	LIMIT_BAL	BILL_AMT1
0	0.0	2.0	400000.0	201800.0
1	1.0	2.0	80000.0	80610.0
2	0.0	2.0	500000.0	499452.0
3	0.0	1.0	140000.0	450.0
4	1.0	1.0	420000.0	56107.0
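
The column-order warning above is worth taking seriously. Here is a minimal sketch of why, using only pandas and scikit-learn on a small made-up DataFrame (not the UCI credit card data). It assumes scikit-learn's default output setting, under which ColumnTransformer returns a NumPy array with the transformer outputs first and the passthrough columns after them. It does not show how KeepPandas works internally; it only shows what can go wrong when column names are put back on such an array.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical toy data; the column to bin is deliberately NOT the first one.
df = pd.DataFrame(
    {
        "LIMIT_BAL": [1000.0, 2000.0, 3000.0, 4000.0],
        "EDUCATION": [0.0, 1.0, 2.0, 3.0],
        "MARRIAGE": [1.0, 2.0, 1.0, 2.0],
    }
)

ct = ColumnTransformer(
    [("binner", KBinsDiscretizer(n_bins=2, encode="ordinal", strategy="uniform"), ["EDUCATION"])],
    remainder="passthrough",
)
arr = ct.fit_transform(df)  # plain NumPy array, the column names are gone

# Reusing the original column order puts the wrong labels on the data:
# the binned EDUCATION values end up under the name LIMIT_BAL.
mislabelled = pd.DataFrame(arr, columns=df.columns, index=df.index)

# The actual output order is: transformed columns first, then the remainder.
output_order = ["EDUCATION"] + [c for c in df.columns if c != "EDUCATION"]
restored = pd.DataFrame(arr, columns=output_order, index=df.index)
```

In the examples on this page EDUCATION happens to be the first column of X, so the order does not change. If you bin a column further to the right, compare the transformed values against the original columns before trusting the labels.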