Bucketing

The core of a good skorecard model is to bucket the features.

This section showcases how to use skorecard to achieve this.

Let's start by loading the demo data

from skorecard.datasets import load_credit_card

data = load_credit_card(as_frame=True)
data.head()
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 ... x15 x16 x17 x18 x19 x20 x21 x22 x23 y
0 20000.0 2.0 2.0 1.0 24.0 2.0 2.0 -1.0 -1.0 -2.0 ... 0.0 0.0 0.0 0.0 689.0 0.0 0.0 0.0 0.0 1
1 120000.0 2.0 2.0 2.0 26.0 -1.0 2.0 0.0 0.0 0.0 ... 3272.0 3455.0 3261.0 0.0 1000.0 1000.0 1000.0 0.0 2000.0 1
2 90000.0 2.0 2.0 2.0 34.0 0.0 0.0 0.0 0.0 0.0 ... 14331.0 14948.0 15549.0 1518.0 1500.0 1000.0 1000.0 1000.0 5000.0 0
3 50000.0 2.0 2.0 1.0 37.0 0.0 0.0 0.0 0.0 0.0 ... 28314.0 28959.0 29547.0 2000.0 2019.0 1200.0 1100.0 1069.0 1000.0 0
4 50000.0 1.0 2.0 1.0 57.0 -1.0 0.0 -1.0 0.0 0.0 ... 20940.0 19146.0 19131.0 2000.0 36681.0 10000.0 9000.0 689.0 679.0 0

5 rows × 24 columns

The dataset used contains 30K rows, 23 features and a binary target.

Let's start by splitting the data into train and test samples, as is common practice when building a model.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    data.drop(["y"], axis=1), data["y"], test_size=0.25, random_state=42
)

Define the numerical and categorical features

The features x2, x3 and x4 are categorical in nature.

cat_cols = ["x2", "x3", "x4"]
num_cols = [col for col in X_train.columns if col not in cat_cols]

print(f"Total categorical columns: {len(cat_cols)}")
print(f"Total numerical columns: {len(num_cols)}")
Total categorical columns: 3
Total numerical columns: 20

Bucketing features

skorecard implements several bucketers, but not all of them are applicable to every type of feature.

  • Categorical features: in most cases there is no numerical relationship between categories,
    so automatic bucketing is very difficult to perform.
    skorecard implements the OrdinalCategoricalBucketer, which orders the buckets either by count or by target (event) rate.
    It includes a tolerance (tol) parameter: the minimum fraction of data a category must contain to keep its own bucket; categories below it are grouped into the 'other' bucket.

  • Numerical features: unlike categorical features, numerical features allow algorithmic bucketing. skorecard implements several bucketers for them (a short sketch of two of them follows this list):

    • DecisionTreeBucketer: fits a univariate decision tree to find the optimal splits (requires the target y)
    • EqualWidthBucketer: generates buckets with equal spacing between the bucket boundaries (i.e. histograms)
    • EqualFrequencyBucketer: generates buckets with equal counts in the buckets (i.e. quantiles)
    • AgglomerativeClusteringBucketer: generates buckets by applying AgglomerativeClustering (a hierarchical clustering bucketer)
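
For illustration, here is a minimal sketch of the two bucketers not demonstrated later in this section, applied to a single feature using the X_train/y_train split from above (n_bins=5 is an arbitrary choice, not a tuned setting):

from IPython.display import display

from skorecard.bucketers import AgglomerativeClusteringBucketer, EqualWidthBucketer

# equal-width buckets: boundaries equally spaced between the min and max of 'x1'
ewb = EqualWidthBucketer(n_bins=5, variables=["x1"])
ewb.fit(X_train, y_train)
display(ewb.bucket_table("x1"))

# agglomerative-clustering buckets: boundaries derived from hierarchical clustering of 'x1'
acb = AgglomerativeClusteringBucketer(n_bins=5, variables=["x1"])
acb.fit(X_train, y_train)
display(acb.bucket_table("x1"))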

Manual correction

Note: to build a high-quality scorecard, it's highly recommended to manually assess every bucket. The algorithms implemented in skorecard are very helpful, but they are blind to business sense.
This is especially true for categorical features, where business sense should prevail (see the Manual bucket refinement section below).

Default bucketers

Categorical features

Because there is generally no numerical relationship between categories, bucketing cannot be automated in the same way as for numerical features.
The only suitable bucketer for categorical features in skorecard is the OrdinalCategoricalBucketer, which groups low-frequency categories together: all categories with a frequency below the tol threshold are put in the 'other' bucket.

Let's set tol to 5% (0.05), as this is the recommended minimum.

# Start with categorical features
from skorecard.bucketers import OrdinalCategoricalBucketer
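# tol=0.05: categories covering less than 5% of the rows are grouped into the 'other' bucket
# encoding_method="ordered" orders the buckets by target (event) rate rather than by count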
cat_bucketer = OrdinalCategoricalBucketer(variables=cat_cols, tol=0.05, encoding_method="ordered")

cat_bucketer.fit(X_train, y_train)
OrdinalCategoricalBucketer(encoding_method='ordered',
                           variables=['x2', 'x3', 'x4'])

Example: Look up the feature 'x3'

The feature 'x3' has a few categories that are sparsely populated:

X_train["x3"].value_counts(normalize=True)
2.0    0.466311
1.0    0.354089
3.0    0.163911
5.0    0.009333
4.0    0.004089
6.0    0.001822
0.0    0.000444
Name: x3, dtype: float64

The OrdinalCategoricalBucketer puts the sparse categories in the 'other' bucket:

cat_bucketer.bucket_table("x3")
bucket label Count Count (%) Non-event Event Event Rate WoE IV
0 -2 Other 353.0 1.57 326.0 27.0 0.076487 1.239 0.016
1 -1 Missing 0.0 0.00 0.0 0.0 NaN 0.000 0.000
2 0 3.0 3688.0 16.39 2755.0 933.0 0.252983 -0.168 0.005
3 1 1.0 7967.0 35.41 6445.0 1522.0 0.191038 0.193 0.012
4 2 2.0 10492.0 46.63 7965.0 2527.0 0.240850 -0.102 0.005

Note: different features may call for different bucketing strategies.
This is addressed by defining a pipeline of bucketers (see the numerical features section for details).

Numerical features

Numerical features allow for different bucketers (as described above).

However, the recommended approach for bucketing is to use either the DecisionTreeBucketer or the BucketingProcess.

X_train["x1"].value_counts()
50000.0      2510
20000.0      1470
30000.0      1218
80000.0      1173
200000.0     1151
             ... 
650000.0        1
1000000.0       1
730000.0        1
690000.0        1
327680.0        1
Name: x1, Length: 79, dtype: int64
from skorecard.bucketers import DecisionTreeBucketer

# values listed under "specials" are assigned their own bucket, kept apart from the algorithmic splits;
# here the dominant value 50000 is chosen purely for demonstration
specials = {"x1": {"special_demo": [50000]}}

num_bucketer = DecisionTreeBucketer(
    variables=num_cols,
    max_n_bins=5,  # max number of bins allowed
    min_bin_size=0.06,  # min fraction of data allowed in the bin
    dt_kwargs={
        "criterion": "entropy",
        "min_impurity_decrease": 0.0005,  # as in sklearn. Helps to decide how to split the buckets
    },
    specials=specials,
)
num_bucketer.fit(X_train, y_train)
DecisionTreeBucketer(dt_kwargs={'criterion': 'entropy',
                                'min_impurity_decrease': 0.0005,
                                'random_state': None},
                     max_n_bins=5, min_bin_size=0.06,
                     specials={'x1': {'special_demo': [50000]}},
                     variables=['x1', 'x5', 'x6', 'x7', 'x8', 'x9', 'x10',
                                'x11', 'x12', 'x13', 'x14', 'x15', 'x16', 'x17',
                                'x18', 'x19', 'x20', 'x21', 'x22', 'x23'])
num_bucketer.bucket_table("x1")
bucket label Count Count (%) Non-event Event Event Rate WoE IV
0 -3 Special: special_demo 2510.0 11.16 1848.0 662.0 0.263745 -0.224 0.006
1 -1 Missing 0.0 0.00 0.0 0.0 NaN 0.000 0.000
2 0 [-inf, 75000.0) 4399.0 19.55 2880.0 1519.0 0.345306 -0.611 0.085
3 1 [75000.0, 145000.0) 4567.0 20.30 3494.0 1073.0 0.234946 -0.070 0.001
4 2 [145000.0, 375000.0) 9230.0 41.02 7685.0 1545.0 0.167389 0.354 0.046
5 3 [375000.0, inf) 1794.0 7.97 1584.0 210.0 0.117057 0.770 0.037

See the bucket outputs (for the first three features)

from IPython.display import display

for x in num_cols[:3]:
    # num_bucketer was already fitted above, so there is no need to refit here
    display(num_bucketer.bucket_table(x))
bucket label Count Count (%) Non-event Event Event Rate WoE IV
0 -3 Special: special_demo 2510.0 11.16 1848.0 662.0 0.263745 -0.224 0.006
1 -1 Missing 0.0 0.00 0.0 0.0 NaN 0.000 0.000
2 0 [-inf, 75000.0) 4399.0 19.55 2880.0 1519.0 0.345306 -0.611 0.085
3 1 [75000.0, 145000.0) 4567.0 20.30 3494.0 1073.0 0.234946 -0.070 0.001
4 2 [145000.0, 375000.0) 9230.0 41.02 7685.0 1545.0 0.167389 0.354 0.046
5 3 [375000.0, inf) 1794.0 7.97 1584.0 210.0 0.117057 0.770 0.037
bucket label Count Count (%) Non-event Event Event Rate WoE IV
0 -1 Missing 0.0 0.00 0.0 0.0 NaN 0.000 0.000
1 0 [-inf, 25.5) 2927.0 13.01 2143.0 784.0 0.267851 -0.245 0.008
2 1 [25.5, 35.5) 9664.0 42.95 7750.0 1914.0 0.198055 0.148 0.009
3 2 [35.5, inf) 9909.0 44.04 7598.0 2311.0 0.233222 -0.060 0.002
bucket label Count Count (%) Non-event Event Event Rate WoE IV
0 -1 Missing 0.0 0.00 0.0 0.0 NaN 0.000 0.000
1 0 [-inf, -0.5) 6370.0 28.31 5361.0 1009.0 0.158399 0.420 0.044
2 1 [-0.5, 0.5) 11032.0 49.03 9620.0 1412.0 0.127991 0.668 0.179
3 2 [0.5, 1.5) 2720.0 12.09 1796.0 924.0 0.339706 -0.586 0.048
4 3 [1.5, inf) 2378.0 10.57 714.0 1664.0 0.699748 -2.096 0.611

With different bucketers for different features in one go

Note that below a warning is raised to alert you that some numerical features have too many unique values. It is good to pay attention to these warnings, as the computed quantiles are then approximate.

from sklearn.pipeline import make_pipeline
from skorecard.bucketers import EqualFrequencyBucketer, DecisionTreeBucketer

pipe = make_pipeline(
    EqualFrequencyBucketer(n_bins=5, variables=num_cols[:5]), DecisionTreeBucketer(max_n_bins=5, variables=num_cols[5:])
)
pipe.fit(X_train, y_train)
/Users/iv58uq/Documents/open_source/skorecard/skorecard/bucketers/bucketers.py:502: ApproximationWarning: Approximated quantiles - too many unique values
  warnings.warn(ApproximationWarning("Approximated quantiles - too many unique values"))
(the same warning is repeated for the other affected features)

Pipeline(steps=[('equalfrequencybucketer',
                 EqualFrequencyBucketer(variables=['x1', 'x5', 'x6', 'x7',
                                                   'x8'])),
                ('decisiontreebucketer',
                 DecisionTreeBucketer(max_n_bins=5,
                                      variables=['x9', 'x10', 'x11', 'x12',
                                                 'x13', 'x14', 'x15', 'x16',
                                                 'x17', 'x18', 'x19', 'x20',
                                                 'x21', 'x22', 'x23']))])
The transform step replaces each raw value with the index of its assigned bucket:

pipe.transform(X_test)
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 ... x14 x15 x16 x17 x18 x19 x20 x21 x22 x23
2308 0 1.0 2.0 2.0 0 1 1 1 2 2 ... 2 1 3 3 1 2 1 1 1 1
22404 2 2.0 1.0 2.0 0 1 1 1 2 2 ... 4 4 4 4 2 2 2 3 2 2
23397 1 2.0 3.0 1.0 2 1 1 1 2 2 ... 3 4 4 4 1 2 2 2 1 2
25058 2 1.0 3.0 2.0 4 1 1 1 2 2 ... 2 2 2 1 1 2 3 1 3 2
2664 0 2.0 2.0 2.0 2 1 1 1 2 2 ... 3 2 3 3 1 1 1 2 0 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3211 0 2.0 3.0 1.0 4 2 0 0 0 0 ... 0 0 0 0 0 0 0 0 1 1
9355 4 2.0 2.0 1.0 3 1 1 1 2 2 ... 2 3 3 4 3 3 3 3 3 2
28201 4 2.0 3.0 2.0 4 2 2 2 1 2 ... 0 0 1 0 1 0 1 1 1 0
19705 1 2.0 2.0 1.0 4 0 0 0 1 1 ... 0 1 0 0 3 0 3 0 0 0
28313 2 2.0 3.0 1.0 2 0 0 0 1 1 ... 0 1 0 0 1 1 1 0 1 0

7500 rows × 23 columns

Parenthesis: compare the buckets from two different algorithms

Comparing the DecisionTreeBucketer from the first example with the EqualFrequencyBucketer from the pipeline example gives a quick preview of how to assess two bucketing results.

The first case yields a higher IV with fewer buckets, hence it's definitely the better result!

print(f"\nDecisionTreeBucketer, with IV: {num_bucketer.bucket_table(num_cols[0])['IV'].sum()}")
display(num_bucketer.bucket_table(num_cols[0]))

print(f"\nEqualFrequencyBucketer, with IV: {pipe.steps[0][1].bucket_table(num_cols[0])['IV'].sum()}")
display(pipe.steps[0][1].bucket_table(num_cols[0]))

DecisionTreeBucketer, with IV: 0.17500000000000002

bucket label Count Count (%) Non-event Event Event Rate WoE IV
0 -3 Special: special_demo 2510.0 11.16 1848.0 662.0 0.263745 -0.224 0.006
1 -1 Missing 0.0 0.00 0.0 0.0 NaN 0.000 0.000
2 0 [-inf, 75000.0) 4399.0 19.55 2880.0 1519.0 0.345306 -0.611 0.085
3 1 [75000.0, 145000.0) 4567.0 20.30 3494.0 1073.0 0.234946 -0.070 0.001
4 2 [145000.0, 375000.0) 9230.0 41.02 7685.0 1545.0 0.167389 0.354 0.046
5 3 [375000.0, inf) 1794.0 7.97 1584.0 210.0 0.117057 0.770 0.037

EqualFrequencyBucketer, with IV: 0.159

bucket label Count Count (%) Non-event Event Event Rate WoE IV
0 -1 Missing 0.0 0.00 0.0 0.0 NaN 0.000 0.000
1 0 (-inf, 50000.0] 5741.0 25.52 3885.0 1856.0 0.323289 -0.512 0.076
2 1 (50000.0, 100000.0] 3613.0 16.06 2691.0 922.0 0.255190 -0.179 0.005
3 2 (100000.0, 180000.0] 4629.0 20.57 3711.0 918.0 0.198315 0.146 0.004
4 3 (180000.0, 270000.0] 4062.0 18.05 3362.0 700.0 0.172329 0.319 0.017
5 4 (270000.0, inf] 4455.0 19.80 3842.0 613.0 0.137598 0.585 0.057

Make a pipeline for all the features

So far we have shown how to apply bucketers to categorical and numerical features separately.

The whole process can be put together in a scikit-learn pipeline.

bucketing_pipe = make_pipeline(num_bucketer, cat_bucketer)
bucketing_pipe.fit(X_train, y_train)
Pipeline(steps=[('decisiontreebucketer',
                 DecisionTreeBucketer(dt_kwargs={'criterion': 'entropy',
                                                 'min_impurity_decrease': 0.0005,
                                                 'random_state': None},
                                      max_n_bins=5, min_bin_size=0.06,
                                      specials={'x1': {'special_demo': [50000]}},
                                      variables=['x1', 'x5', 'x6', 'x7', 'x8',
                                                 'x9', 'x10', 'x11', 'x12',
                                                 'x13', 'x14', 'x15', 'x16',
                                                 'x17', 'x18', 'x19', 'x20',
                                                 'x21', 'x22', 'x23'])),
                ('ordinalcategoricalbucketer',
                 OrdinalCategoricalBucketer(encoding_method='ordered',
                                            variables=['x2', 'x3', 'x4']))])

Save the bucketers to file

Once the buckets are satisfactory, save the outputs to a yaml file:

from skorecard.pipeline import to_skorecard_pipeline

to_skorecard_pipeline(bucketing_pipe).save_yml(open("buckets.yml", "w"))
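
The saved file can later be reloaded to apply the same buckets without refitting. A minimal sketch, assuming the UserInputBucketer accepts a path to a saved yaml (as in recent skorecard versions):

from skorecard.bucketers import UserInputBucketer

# reload the saved bucket definitions and apply them to new data
ui_bucketer = UserInputBucketer("buckets.yml")
X_test_bucketed = ui_bucketer.transform(X_test)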

Using the bucketing process

The most common approach in bucketing is to perform what is known as fine-coarse classing.

In less fancy words:

  • You start with very loose bucketing requirements: many buckets, on which some minimal (hopefully significant) aggregations and statistics can be computed. This is known as fine classing.
  • For numerical features, adjacent buckets with similar default rate/WoE are then merged together.
  • For categorical features, categories with similar default rate/WoE should be merged together, but only when it makes sense.
  • The last two steps (of merging buckets together) are known as coarse classing.

In skorecard, this process is implemented in the BucketingProcess, as shown below:

  1. The bucketing process starts by defining the loose (fine) buckets (the prebucketing pipeline).

  2. It then runs an optimization algorithm that merges the buckets together (the bucketing pipeline).

from skorecard.bucketers import DecisionTreeBucketer, OrdinalCategoricalBucketer, OptimalBucketer
from skorecard.pipeline import BucketingProcess
from sklearn.pipeline import make_pipeline


specials = {"x1": {"special_demo": [50000]}}


prebucketing_pipeline = make_pipeline(
    DecisionTreeBucketer(variables=num_cols, max_n_bins=40, min_bin_size=0.03),  # loose requirements
    OrdinalCategoricalBucketer(variables=cat_cols, tol=0.02),
)

# the OptimalBucketer merges the fine prebuckets into at most max_n_bins coarse buckets
bucketing_pipeline = make_pipeline(
    OptimalBucketer(variables=num_cols, max_n_bins=6, min_bin_size=0.05),
    OptimalBucketer(variables=cat_cols, variables_type="categorical", max_n_bins=10, min_bin_size=0.05),
)


bucketing_process = BucketingProcess(
    prebucketing_pipeline=prebucketing_pipeline,
    bucketing_pipeline=bucketing_pipeline,
    specials=specials,
)

bucketing_process = bucketing_process.fit(X_train, y_train)

Let's see the output of this optimization step

print(f"Prebucketing step (fine classing), IV {bucketing_process.prebucket_table('x1')['IV'].sum()}")
display(bucketing_process.prebucket_table("x1"))


print(f"\nBucketing step (coarse classing), IV {bucketing_process.bucket_table('x1')['IV'].sum()}")
display(bucketing_process.bucket_table("x1"))
Prebucketing step (fine classing), IV 0.192

pre-bucket label Count Count (%) Non-event Event Event Rate WoE IV bucket
0 -3 Special: special_demo 2510.0 11.16 1848.0 662.0 0.263745 -0.224 0.006 -3
1 -1 Missing 0.0 0.00 0.0 0.0 NaN 0.000 0.000 -1
2 0 [-inf, 25000.0) 1830.0 8.13 1153.0 677.0 0.369945 -0.718 0.050 0
3 1 [25000.0, 50000.0) 1401.0 6.23 884.0 517.0 0.369022 -0.714 0.038 0
4 2 [50000.0, 75000.0) 1168.0 5.19 843.0 325.0 0.278253 -0.297 0.005 1
5 3 [75000.0, 85000.0) 1173.0 5.21 907.0 266.0 0.226769 -0.024 0.000 2
6 4 [85000.0, 105000.0) 1272.0 5.65 941.0 331.0 0.260220 -0.206 0.003 2
7 5 [105000.0, 125000.0) 995.0 4.42 770.0 225.0 0.226131 -0.020 0.000 2
8 6 [125000.0, 145000.0) 1127.0 5.01 876.0 251.0 0.222715 -0.001 0.000 2
9 7 [145000.0, 155000.0) 821.0 3.65 683.0 138.0 0.168088 0.349 0.004 3
10 8 [155000.0, 175000.0) 937.0 4.16 768.0 169.0 0.180363 0.263 0.003 3
11 9 [175000.0, 185000.0) 749.0 3.33 614.0 135.0 0.180240 0.264 0.002 3
12 10 [185000.0, 205000.0) 1321.0 5.87 1070.0 251.0 0.190008 0.199 0.002 3
13 11 [205000.0, 225000.0) 877.0 3.90 729.0 148.0 0.168757 0.344 0.004 3
14 12 [225000.0, 245000.0) 1037.0 4.61 859.0 178.0 0.171649 0.323 0.004 3
15 13 [245000.0, 285000.0) 1193.0 5.30 1018.0 175.0 0.146689 0.510 0.012 4
16 14 [285000.0, 305000.0) 680.0 3.02 573.0 107.0 0.157353 0.427 0.005 4
17 15 [305000.0, 355000.0) 908.0 4.04 791.0 117.0 0.128855 0.660 0.014 4
18 16 [355000.0, 375000.0) 707.0 3.14 580.0 127.0 0.179632 0.268 0.002 4
19 17 [375000.0, 495000.0) 1098.0 4.88 965.0 133.0 0.121129 0.731 0.021 5
20 18 [495000.0, inf) 696.0 3.09 619.0 77.0 0.110632 0.833 0.017 5

Bucketing step (coarse classing), IV 0.186

bucket label Count Count (%) Non-event Event Event Rate WoE IV
0 -3 Special: special_demo 2510.0 11.16 1848.0 662.0 0.263745 -0.224 0.006
1 -1 Missing 0.0 0.00 0.0 0.0 NaN 0.000 0.000
2 0 [-inf, 2.0) 3231.0 14.36 2037.0 1194.0 0.369545 -0.716 0.087
3 1 [2.0, 3.0) 1168.0 5.19 843.0 325.0 0.278253 -0.297 0.005
4 2 [3.0, 7.0) 4567.0 20.30 3494.0 1073.0 0.234946 -0.070 0.001
5 3 [7.0, 13.0) 5742.0 25.52 4723.0 1019.0 0.177464 0.283 0.019
6 4 [13.0, 17.0) 3488.0 15.50 2962.0 526.0 0.150803 0.478 0.031
7 5 [17.0, inf) 1794.0 7.97 1584.0 210.0 0.117057 0.770 0.037

Notice that the first table contains an additional bucket column: the optimized bucket each pre-bucket is assigned to (those indices are aggregated in the second table).
Because of the additive nature of IV, reducing the number of buckets will normally decrease the IV.
The goal is to reduce the number of buckets as much as possible while keeping the IV high, and to check for monotonicity wherever needed. As a reference point for the plots below, here is the overall event rate:

y_train.value_counts(normalize=True)
0    0.777378
1    0.222622
Name: y, dtype: float64
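
For reference, the WoE and IV columns in these tables follow the standard definitions. Below is a minimal sketch of how they can be recomputed from the Non-event/Event counts of a bucket table; it assumes no zero-count buckets (skorecard's exact handling of empty buckets, such as the Missing row, may differ).

import numpy as np
import pandas as pd

def recompute_woe_iv(table: pd.DataFrame) -> pd.DataFrame:
    """Recompute per-bucket WoE and IV from the Non-event/Event counts."""
    table = table.copy()
    dist_non_event = table["Non-event"] / table["Non-event"].sum()  # share of non-events per bucket
    dist_event = table["Event"] / table["Event"].sum()  # share of events per bucket
    table["WoE"] = np.log(dist_non_event / dist_event)  # ln(% non-event / % event)
    table["IV"] = (dist_non_event - dist_event) * table["WoE"]  # per-bucket IV; total IV is the column sum
    return table
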
bucketing_process.plot_prebucket("x1", format="png", scale=2, width=1050, height=525)
bucketing_process.plot_bucket("x1", format="png", scale=2, width=1050, height=525)
bucketing_process.summary()
column num_prebuckets num_buckets IV_score dtype
0 x1 21 8 0.002778 float64
1 x2 4 4 0.001921 float64
2 x3 5 5 0.002564 float64
3 x4 4 4 0.001909 float64
4 x5 24 3 0.001914 float64
5 x6 6 5 0.002351 float64
6 x7 5 3 0.002112 float64
7 x8 5 4 0.002129 float64
8 x9 5 4 0.002809 float64
9 x10 5 4 0.001997 float64
10 x11 5 4 0.002106 float64
11 x12 26 7 0.002408 float64
12 x13 27 6 0.002798 float64
13 x14 25 5 0.002571 float64
14 x15 22 6 0.002321 float64
15 x16 24 4 0.002236 float64
16 x17 25 4 0.002109 float64
17 x18 21 7 0.003035 float64
18 x19 21 7 0.003631 float64
19 x20 21 7 0.001999 float64
20 x21 22 7 0.002747 float64
21 x22 19 7 0.002383 float64
22 x23 20 7 0.002647 float64
bucketing_process.save_yml("best_bucketing.yml")

Manual bucket refinement

Besides manual inspection, it's often necessary to manually refine the buckets. skorecard implements a handy Dash web-app that allows the user to redefine the bucket allocation.

bucketing_process.fit(X_train, y_train)
BucketingProcess(bucketing_pipeline=Pipeline(steps=[('optimalbucketer-1',
                                                     OptimalBucketer(max_n_bins=6,
                                                                     variables=['x1',
                                                                                'x5',
                                                                                'x6',
                                                                                'x7',
                                                                                'x8',
                                                                                'x9',
                                                                                'x10',
                                                                                'x11',
                                                                                'x12',
                                                                                'x13',
                                                                                'x14',
                                                                                'x15',
                                                                                'x16',
                                                                                'x17',
                                                                                'x18',
                                                                                'x19',
                                                                                'x20',
                                                                                'x21',
                                                                                'x22',
                                                                                'x23'])),
                                                    ('optimalbucketer-2',
                                                     OptimalBucketer(variables=['x2',
                                                                                'x3',
                                                                                'x4'],
                                                                     variables_type='categorical'))]),
                 prebucketing_pipeline=P...steps=[('decisiontreebucketer',
                                                        DecisionTreeBucketer(max_n_bins=40,
                                                                             min_bin_size=0.03,
                                                                             variables=['x1',
                                                                                        'x5',
                                                                                        'x6',
                                                                                        'x7',
                                                                                        'x8',
                                                                                        'x9',
                                                                                        'x10',
                                                                                        'x11',
                                                                                        'x12',
                                                                                        'x13',
                                                                                        'x14',
                                                                                        'x15',
                                                                                        'x16',
                                                                                        'x17',
                                                                                        'x18',
                                                                                        'x19',
                                                                                        'x20',
                                                                                        'x21',
                                                                                        'x22',
                                                                                        'x23'])),
                                                       ('ordinalcategoricalbucketer',
                                                        OrdinalCategoricalBucketer(tol=0.02,
                                                                                   variables=['x2',
                                                                                              'x3',
                                                                                              'x4']))]),
                 specials={'x1': {'special_demo': [50000]}})
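
A sketch of launching the app, assuming a skorecard version that exposes fit_interactive (it starts a Dash server, so run it from a notebook or interactive session):

# instead of plain .fit(), this launches the interactive Dash app for inspecting and adjusting buckets
bucketing_process.fit_interactive(X_train, y_train)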

Up next

How to perform feature selection in a skorecard model