Working with manually defined buckets
Bucketing is often tweaked manually to incorporate domain expertise. Skorecard supports manually defined buckets from three sources: an existing bucketer, a Python dictionary, or a file.
From a bucketer
If you've used .fit_interactive() (see interactive bucketing), you can explicitly reuse the updated bucket mapping in a UserInputBucketer:
from skorecard.datasets import load_uci_credit_card
from skorecard.bucketers import DecisionTreeBucketer, UserInputBucketer

X, y = load_uci_credit_card(return_X_y=True)

bucketer = DecisionTreeBucketer(variables=["EDUCATION"])
bucketer.fit(X, y)  # can also be .fit_interactive()
bucketer.features_bucket_mapping_  # the fitted (or interactively updated) mapping

# Reuse the mapping; note that uib does not require a .fit() step
uib = UserInputBucketer(bucketer.features_bucket_mapping_)
uib.transform(X).head(1)
From a dictionary
You can manually define the buckets in a Python dictionary, with one entry per feature. Each entry uses the following keys:
- feature_name (mandatory): must match the column name in the dataframe
- type (mandatory): type of feature (categorical or numerical)
- map (mandatory): contains the actual mapping for the bins.
  - categorical features: expect a dictionary {value: bin_index}
  - numerical features: expect a list of boundaries [value, value]
- right (optional, defaults to True): flag that indicates whether to include the upper bound (True) or the lower bound (False) in the bucket definition. Applicable only to numerical bucketers
- specials (optional, defaults to {}): dictionary of special values that will be put in their own bucket.
Here's an example:
bucket_maps = {
    "EDUCATION": {
        "feature_name": "EDUCATION",
        "type": "categorical",
        "map": {2: 0, 1: 1, 3: 2},
        "right": True,
        "specials": {},
    },
    "LIMIT_BAL": {
        "feature_name": "LIMIT_BAL",
        "type": "numerical",
        "map": [25000.0, 55000.0, 105000.0, 225000.0, 275000.0, 325000.0],
        "right": True,
        "specials": {},
    },
    "BILL_AMT1": {
        "feature_name": "BILL_AMT1",
        "type": "numerical",
        "map": [800.0, 12500, 50000, 77800, 195000.0],
        "right": True,
        "specials": {},
    },
}
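A hand-written mapping is easy to get subtly wrong, for example by giving a numerical feature a dictionary instead of a list of boundaries. The sketch below checks only the rules stated in the key list above (mandatory keys present, a known type, and a map of the matching container type). The helper check_bucket_maps and the constants MANDATORY_KEYS and MAP_TYPES are hypothetical names introduced here for illustration; they are not part of skorecard, and skorecard may perform its own, different validation.

```python
# Hypothetical helper, not part of skorecard: checks a hand-written mapping
# against the key rules described in this section.
MANDATORY_KEYS = {"feature_name", "type", "map"}
# Container type expected for "map", per feature type
MAP_TYPES = {"categorical": dict, "numerical": list}


def check_bucket_maps(bucket_maps):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for column, spec in bucket_maps.items():
        missing = MANDATORY_KEYS - set(spec)
        if missing:
            problems.append(f"{column}: missing keys {sorted(missing)}")
            continue
        expected = MAP_TYPES.get(spec["type"])
        if expected is None:
            problems.append(f"{column}: unknown type {spec['type']!r}")
        elif not isinstance(spec["map"], expected):
            problems.append(f"{column}: map should be a {expected.__name__}")
    return problems


good = {
    "EDUCATION": {
        "feature_name": "EDUCATION",
        "type": "categorical",
        "map": {2: 0, 1: 1, 3: 2},
    }
}
# A numerical feature must use a list of boundaries, not a dictionary
bad = {
    "LIMIT_BAL": {
        "feature_name": "LIMIT_BAL",
        "type": "numerical",
        "map": {25000.0: 0},
    }
}

print(check_bucket_maps(good))
print(check_bucket_maps(bad))
```

Running a check like this before constructing the bucketer keeps mistakes in the dictionary separate from problems in the data.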
You can create a bucketer from the bucket_maps dictionary with UserInputBucketer:
from skorecard.bucketers import UserInputBucketer
uib = UserInputBucketer(bucket_maps)
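The effect of the right key on numerical boundaries is easiest to see at a value that sits exactly on a boundary. The following is a minimal, standard-library sketch of the interval logic only, not skorecard's implementation. It assumes bucket indices count up from 0 at the lowest bucket, in the style of numpy.digitize; skorecard's actual numbering and its handling of missing or special values may differ. The function name bucket_index is hypothetical.

```python
from bisect import bisect_left, bisect_right


def bucket_index(value, boundaries, right=True):
    """Illustrative only: index of the interval that `value` falls into.

    `boundaries` is assumed to be sorted in ascending order.
    """
    if right:
        # Upper bound included: boundaries[i-1] < value <= boundaries[i]
        return bisect_left(boundaries, value)
    # Lower bound included: boundaries[i-1] <= value < boundaries[i]
    return bisect_right(boundaries, value)


boundaries = [25000.0, 55000.0, 105000.0]

for value in (20000.0, 25000.0, 30000.0, 200000.0):
    print(
        value,
        bucket_index(value, boundaries, right=True),
        bucket_index(value, boundaries, right=False),
    )
```

Under these assumptions, only the value 25000.0 changes bucket between the two settings: it stays in the lowest bucket when the upper bound is included and moves up one bucket when the lower bound is included. Three boundaries define four intervals, so values above the last boundary land in a fourth bucket.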
From a file
You can also work with manually defined buckets that have been saved in a .yml file. See the how-to on Read/write buckets to file.