Working with manually defined buckets

Bucketing is often tweaked manually to incorporate domain expertise. Skorecard supports this by letting you pass a manually defined bucket mapping to a UserInputBucketer.

From a bucketer

If you've used .fit_interactive() (see interactive bucketing), you can pass the updated bucket mapping explicitly to a UserInputBucketer:

from skorecard.datasets import load_uci_credit_card
from skorecard.bucketers import DecisionTreeBucketer, UserInputBucketer

X, y = load_uci_credit_card(return_X_y=True)

bucketer = DecisionTreeBucketer(variables=["EDUCATION"])
bucketer.fit(X, y)  # can also be .fit_interactive()
bucketer.features_bucket_mapping_
# FeaturesBucketMapping([BucketMapping(feature_name='EDUCATION', type='numerical', missing_bucket=None, other_bucket=None, map=[1.5, 2.5], right=False, specials={})])

uib = UserInputBucketer(bucketer.features_bucket_mapping_)
uib.transform(X).head(1)  # note: uib does not require a .fit() step
#    EDUCATION  MARRIAGE  LIMIT_BAL  BILL_AMT1
# 0          0         2   400000.0   201800.0

From a dictionary

You can also define the buckets manually in a Python dictionary, with one entry per feature keyed by the column name. Each entry is itself a dictionary with the following keys:

  • feature_name (mandatory): must match the column name in the dataframe
  • type (mandatory): type of feature (categorical or numerical)
  • map (mandatory): contains the actual mapping for the bins.
    • categorical features: a dictionary {value: bin_index}
    • numerical features: a list of bucket boundaries [value, value, ...]
  • right (optional, defaults to True): whether each bucket includes its upper bound (True) or its lower bound (False). Applies only to numerical features.
  • specials (optional, defaults to {}): dictionary of special values that are put in their own bucket.
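To make the `map` and `right` semantics concrete, here is a minimal pure-Python sketch of how a single value could be assigned to a bucket. It assumes buckets are numbered from 0 starting with the lowest interval, and that a category missing from a categorical map gets no bucket (`None`). The helpers `numerical_bucket` and `categorical_bucket` are hypothetical, not part of skorecard, which may implement this differently (for example, routing unseen categories to an "other" bucket).

```python
from bisect import bisect_left, bisect_right
from typing import Hashable, Optional, Sequence


def numerical_bucket(value: float, boundaries: Sequence[float], right: bool = True) -> int:
    """Return the bucket index of `value` for sorted `boundaries` (hypothetical helper).

    With n boundaries there are n + 1 buckets, numbered 0..n.
    right=True:  bucket i covers (boundaries[i-1], boundaries[i]], upper bound included.
    right=False: bucket i covers [boundaries[i-1], boundaries[i]), lower bound included.
    """
    if right:
        # A value equal to a boundary stays in the bucket below that boundary.
        return bisect_left(boundaries, value)
    # A value equal to a boundary moves to the bucket above that boundary.
    return bisect_right(boundaries, value)


def categorical_bucket(value: Hashable, mapping: dict) -> Optional[int]:
    """Look up a category's bucket index; None if it is not in the map (assumption)."""
    return mapping.get(value)


# Boundaries from the DecisionTreeBucketer mapping shown earlier (right=False there).
print([numerical_bucket(v, [1.5, 2.5], right=False) for v in [1, 2, 2.5, 3]])  # [0, 1, 2, 2]
print([numerical_bucket(v, [1.5, 2.5], right=True) for v in [1, 2, 2.5, 3]])   # [0, 1, 1, 2]
print(categorical_bucket(1, {2: 0, 1: 1, 3: 2}))  # 1
```

The only difference between the two settings is where a value sitting exactly on a boundary ends up: 2.5 lands in bucket 2 with right=False and in bucket 1 with right=True.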

Here's an example:

bucket_maps = {
    "EDUCATION": {
        "feature_name": "EDUCATION",
        "type": "categorical",
        "map": {2: 0, 1: 1, 3: 2},
        "right": True,
        "specials": {},
    },
    "LIMIT_BAL": {
        "feature_name": "LIMIT_BAL",
        "type": "numerical",
        "map": [25000.0, 55000.0, 105000.0, 225000.0, 275000.0, 325000.0],
        "right": True,
        "specials": {},
    },
    "BILL_AMT1": {
        "feature_name": "BILL_AMT1",
        "type": "numerical",
        "map": [800.0, 12500.0, 50000.0, 77800.0, 195000.0],
        "right": True,
        "specials": {},
    },
}
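Before handing a dictionary like this to a bucketer, it can help to check it against the key rules listed earlier. The sketch below uses a hypothetical helper, `with_defaults` (not a skorecard function). It checks the mandatory keys and the `map` type for each feature type, and fills in the documented defaults for `right` and `specials`. Two checks are extra assumptions rather than documented rules: that each outer key equals its `feature_name` (as in the example), and that numerical boundaries are strictly increasing.

```python
import copy

MANDATORY_KEYS = ("feature_name", "type", "map")
DEFAULTS = {"right": True, "specials": {}}


def with_defaults(bucket_maps: dict) -> dict:
    """Validate a manual bucket definition and return a copy with optional keys filled in.

    Hypothetical helper, not part of skorecard. The input dictionary is not modified.
    """
    result = {}
    for column, spec in bucket_maps.items():
        missing = [key for key in MANDATORY_KEYS if key not in spec]
        if missing:
            raise ValueError(f"{column}: missing mandatory keys {missing}")
        # Assumption: the outer key must equal feature_name, as in the example.
        if spec["feature_name"] != column:
            raise ValueError(f"{column}: feature_name {spec['feature_name']!r} does not match")
        if spec["type"] == "categorical":
            if not isinstance(spec["map"], dict):
                raise TypeError(f"{column}: categorical map must be a dict {{value: bin_index}}")
        elif spec["type"] == "numerical":
            bounds = spec["map"]
            if not isinstance(bounds, list):
                raise TypeError(f"{column}: numerical map must be a list of boundaries")
            # Assumption: boundaries must be strictly increasing.
            if any(lo >= hi for lo, hi in zip(bounds, bounds[1:])):
                raise ValueError(f"{column}: boundaries must be strictly increasing")
        else:
            raise ValueError(f"{column}: type must be 'categorical' or 'numerical'")
        # Start from fresh copies of the defaults so features never share a specials dict,
        # then let the user's own keys override them.
        filled = copy.deepcopy(DEFAULTS)
        filled.update(copy.deepcopy(spec))
        result[column] = filled
    return result


# A hypothetical minimal definition that leaves out both optional keys.
minimal = {"MARRIAGE": {"feature_name": "MARRIAGE", "type": "categorical", "map": {1: 0, 2: 1}}}
print(with_defaults(minimal)["MARRIAGE"]["right"])     # True
print(with_defaults(minimal)["MARRIAGE"]["specials"])  # {}
```

Returning a new dictionary instead of mutating the input keeps the original definition untouched, so the same `bucket_maps` can be reused or saved as written.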

To create a bucketer from this dictionary, pass it to UserInputBucketer:

from skorecard.bucketers import UserInputBucketer

uib = UserInputBucketer(bucket_maps)

From a file

You can also work with manually defined buckets that have been saved in a .yml file. See the how-to Read/write buckets to file.