Working with manually defined buckets

Bucketing is often tweaked manually to incorporate domain expertise. Skorecard supports manually defined buckets from three sources: an existing bucketer, a dictionary, or a file.

From a bucketer

If you've used .fit_interactive() (see interactive bucketing), you can choose to explicitly use the updated bucket mapping in a UserInputBucketer:

from skorecard.datasets import load_uci_credit_card
from skorecard.bucketers import DecisionTreeBucketer, UserInputBucketer
X, y = load_uci_credit_card(return_X_y=True)

bucketer = DecisionTreeBucketer(variables=['EDUCATION'])
bucketer.fit(X, y) # can also be .fit_interactive()
bucketer.features_bucket_mapping_
FeaturesBucketMapping([BucketMapping(feature_name='EDUCATION', type='numerical', missing_bucket=None, other_bucket=None, map=[1.5, 2.5], right=False, specials={})])
uib = UserInputBucketer(bucketer.features_bucket_mapping_)
uib.transform(X).head(1) # note uib does not require a .fit() step
EDUCATION MARRIAGE LIMIT_BAL BILL_AMT1
0 0 2 400000.0 201800.0
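
The mapping printed above stores a numerical feature's buckets as a list of boundaries plus a right flag. As a rough illustration of how boundaries such as [1.5, 2.5] turn into bucket indices, the sketch below uses NumPy's np.digitize. That skorecard follows exactly the same interval convention is an assumption here, not something this page confirms, and the variable names are made up for the example.

```python
import numpy as np

# Boundaries taken from the printed mapping for EDUCATION.
boundaries = [1.5, 2.5]

# Values strictly between boundaries land in the same bucket either way.
values = np.array([1, 2, 3])
left_closed = np.digitize(values, boundaries, right=False)
print(left_closed)  # [0 1 2]

# The flag only matters for values that sit exactly on a boundary.
# Assumption: this mirrors the meaning of `right` in a bucket mapping.
on_boundary = np.array([1.5, 2.5])
edge_right_false = np.digitize(on_boundary, boundaries, right=False)
edge_right_true = np.digitize(on_boundary, boundaries, right=True)
print(edge_right_false)  # [1 2] -> boundary value belongs to the bucket above
print(edge_right_true)   # [0 1] -> boundary value belongs to the bucket below
```

With two boundaries there are three buckets (indices 0, 1 and 2), which is why a short boundary list is enough to describe the whole mapping.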

From a dictionary

You can manually define the buckets in a Python dictionary. Each feature's entry supports the following keys:

  • feature_name (mandatory): must match the column name in the dataframe
  • type (mandatory): type of feature (categorical or numerical)
  • map (mandatory): contains the actual mapping for the bins.
    • categorical features: expect a dictionary {value:bin_index}
    • numerical features: expect a list of boundaries [value, value]
  • right (optional, defaults to True): whether each bucket includes its upper boundary (True) or its lower boundary (False). Applies only to numerical features.
  • specials (optional, defaults to {}): dictionary of special values that will be put in their own bucket.

Here's an example:

bucket_maps = {
    'EDUCATION':{
        "feature_name": 'EDUCATION', 
        "type": 'categorical', 
        "map": {2: 0, 1: 1, 3: 2}, 
        "right": True, 
        "specials": {}
    },
    'LIMIT_BAL':{
        "feature_name": 'LIMIT_BAL', 
        "type": 'numerical', 
        "map": [ 25000.,  55000.,  105000., 225000., 275000., 325000.], 
        "right": True, 
        "specials": {}
    },
    'BILL_AMT1':{
        "feature_name": 'BILL_AMT1', 
        "type": 'numerical', 
        "map": [  800. ,  12500 ,   50000,    77800, 195000. ],
        "right": True, 
        "specials": {}
    }
}
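
Hand-written dictionaries are easy to get subtly wrong, so it can help to check them against the key rules listed above before building a bucketer. The sketch below is a hypothetical helper, validate_bucket_maps, which is not part of skorecard. The ascending-order check on numerical boundaries is an assumption of this sketch rather than a documented requirement.

```python
def validate_bucket_maps(maps):
    """Hypothetical helper (not part of skorecard): sanity-check bucket definitions."""
    mandatory = ("feature_name", "type", "map")
    for column, spec in maps.items():
        # Every entry needs the three mandatory keys.
        missing = [key for key in mandatory if key not in spec]
        if missing:
            raise ValueError(f"{column}: missing mandatory keys {missing}")

        if spec["type"] == "categorical":
            # Categorical features expect {value: bin_index}.
            if not isinstance(spec["map"], dict):
                raise ValueError(f"{column}: categorical map must be a dict")
        elif spec["type"] == "numerical":
            # Numerical features expect a list of boundaries.
            if not isinstance(spec["map"], (list, tuple)):
                raise ValueError(f"{column}: numerical map must be a list")
            # Assumption: boundaries should be listed in ascending order.
            if list(spec["map"]) != sorted(spec["map"]):
                raise ValueError(f"{column}: boundaries must be ascending")
        else:
            raise ValueError(f"{column}: unknown type {spec['type']!r}")
    return True


# Small illustrative input, shaped like the example above.
example_maps = {
    "EDUCATION": {"feature_name": "EDUCATION", "type": "categorical", "map": {2: 0, 1: 1, 3: 2}},
    "LIMIT_BAL": {"feature_name": "LIMIT_BAL", "type": "numerical", "map": [25000.0, 55000.0, 105000.0]},
}
is_valid = validate_bucket_maps(example_maps)
print(is_valid)  # True
```

The optional keys (right and specials) are deliberately not checked, since they may be left out.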

You can then create a bucketer from the bucket_maps dictionary with UserInputBucketer:

from skorecard.bucketers import UserInputBucketer
uib = UserInputBucketer(bucket_maps)

From a file

You can also work with manually defined buckets that have been saved in a .yml file. See the how-to on Read/write buckets to file.


Last update: 2021-11-24