UserInputBucketer
The UserInputBucketer transformer creates buckets from user-defined boundaries.
This is a special bucketer that is not fitted; instead, it relies on pre-defined user input. The most common use case is loading bucket mapping information previously fitted by other bucketers.
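To make "user-defined boundaries" concrete, here is a minimal sketch in plain Python. It is not skorecard's implementation. It assumes sorted boundaries and left-closed, right-open buckets, matching the "(-inf, 25.0)", "[25.0, 45.0)" style labels shown in the bucket_table example further down. The helper name apply_boundaries and the sample values are hypothetical.

```python
from bisect import bisect_right


def apply_boundaries(values, boundaries):
    """Hypothetical helper: map each value to a bucket index.

    With boundaries [25.0, 45.0] the buckets are
    0: (-inf, 25.0), 1: [25.0, 45.0), 2: [45.0, inf).
    """
    # bisect_right sends a value equal to a boundary into the bucket on its
    # right, which yields left-closed, right-open intervals.
    return [bisect_right(boundaries, v) for v in values]


ages = [18, 25, 30, 45, 70]
print(apply_boundaries(ages, [25.0, 45.0]))  # [0, 1, 1, 2, 2]
```

Because nothing is estimated from the data, applying the same boundaries to any dataset always yields the same buckets, which is why this kind of bucketer needs no real fitting step.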
Examples:
```python
from skorecard import datasets
from skorecard.bucketers import AgglomerativeClusteringBucketer, UserInputBucketer

X, y = datasets.load_uci_credit_card(return_X_y=True)

# Fit a bucketer, then reuse its bucket mapping in a UserInputBucketer
ac_bucketer = AgglomerativeClusteringBucketer(n_bins=3, variables=['LIMIT_BAL'])
ac_bucketer.fit(X)
mapping = ac_bucketer.features_bucket_mapping_

ui_bucketer = UserInputBucketer(mapping)
new_X = ui_bucketer.fit_transform(X)
assert len(new_X['LIMIT_BAL'].unique()) == 3

# Map some values to the special buckets
specials = {
    "LIMIT_BAL": {
        "=50000": [50000],
        "in [20000,30000]": [20000, 30000],
    }
}

ac_bucketer = AgglomerativeClusteringBucketer(n_bins=3, variables=['LIMIT_BAL'], specials=specials)
ac_bucketer.fit(X)
mapping = ac_bucketer.features_bucket_mapping_

ui_bucketer = UserInputBucketer(mapping)
new_X = ui_bucketer.fit_transform(X)
# 3 regular buckets + 2 special buckets
assert len(new_X['LIMIT_BAL'].unique()) == 5
```
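The second example expects 5 unique values because the two special buckets come on top of the 3 regular ones. The sketch below illustrates that count under an assumption: values listed in specials are checked first and take precedence over the regular boundaries. It is plain Python, not skorecard's code. The helper bucket_with_specials, the boundaries and the sample values are hypothetical, and special buckets are labelled by name here rather than by whatever index skorecard assigns them.

```python
from bisect import bisect_right


def bucket_with_specials(values, boundaries, specials):
    """Hypothetical sketch: special values first, then regular boundaries."""
    out = []
    for v in values:
        # Name of the first special bucket whose value list contains v, if any.
        label = next((name for name, members in specials.items() if v in members), None)
        out.append(label if label is not None else bisect_right(boundaries, v))
    return out


specials = {"=50000": [50000], "in [20000,30000]": [20000, 30000]}
limits = [10000, 20000, 30000, 50000, 80000, 500000]
buckets = bucket_with_specials(limits, [40000, 100000], specials)
print(len(set(buckets)))  # 5: regular buckets 0, 1, 2 plus the two special buckets
```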
__init__(self, features_bucket_mapping=None, variables=[], remainder='passthrough')
Initialise the user-defined boundaries from a dictionary, a FeaturesBucketMapping object, or a path to a YAML file.
Notes:
- features_bucket_mapping is stored without the trailing underscore (_) because it is not fitted.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
features_bucket_mapping | None, Dict, FeaturesBucketMapping, str or Path | Contains the feature names and the boundaries defined for each feature. If a dict, it is converted to an internal FeaturesBucketMapping object. If a string or path, the file is loaded as YAML and converted to a FeaturesBucketMapping object. | None |
variables | list | The features to bucket. Uses all features in features_bucket_mapping if not defined. | [] |
remainder | str | How the columns not listed in variables are handled. Must be one of ["passthrough", "drop"]. passthrough (default): all columns not specified in variables are passed through unchanged. drop: all columns not specified in variables are dropped. | 'passthrough' |
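The remainder option only decides what happens to the columns outside variables. A minimal sketch on a dict of columns, assuming nothing about skorecard's internals: the helper apply_remainder is hypothetical and the bucketing of the selected variables is omitted.

```python
def apply_remainder(columns, variables, remainder="passthrough"):
    """Hypothetical sketch of the remainder option on a dict of columns."""
    if remainder not in ("passthrough", "drop"):
        raise ValueError("remainder must be 'passthrough' or 'drop'")
    if remainder == "drop":
        # Keep only the columns listed in variables.
        return {name: col for name, col in columns.items() if name in variables}
    # passthrough: every column is kept; only those in variables would be bucketed.
    return dict(columns)


data = {"LIMIT_BAL": [20000, 50000], "BILL_AMT1": [1200.0, 350.0]}
print(sorted(apply_remainder(data, ["LIMIT_BAL"], "drop")))         # ['LIMIT_BAL']
print(sorted(apply_remainder(data, ["LIMIT_BAL"], "passthrough")))  # ['BILL_AMT1', 'LIMIT_BAL']
```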
bucket_table(self, column)
inherited
Generates the statistics for the buckets of a particular column.
The pre-buckets are matched to the post-buckets, so that the user has a much clearer understanding of how the BucketingProcess ends up with the final buckets. An example:
bucket | label | Count | Count (%) | Non-event | Event | % Event | % Non-event | Event Rate | WoE | IV |
---|---|---|---|---|---|---|---|---|---|---|
0 | (-inf, 25.0) | 61.0 | 1.36 | 57.0 | 4.0 | 0.41 | 1.62 | 0.066 | 1.380 | 0.017 |
1 | [25.0, 45.0) | 2024.0 | 44.98 | 1536.0 | 488.0 | 49.64 | 43.67 | 0.241 | -0.128 | 0.008 |
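The WoE and IV columns can be recomputed from the % Event and % Non-event columns. The sketch below assumes the standard definitions WoE = ln(%Non-event / %Event) and IV = (%Non-event - %Event) * WoE; the sign convention is inferred from the example rows (WoE is positive where % Non-event exceeds % Event). The helper woe_iv is hypothetical, not part of skorecard's API.

```python
import math


def woe_iv(pct_event, pct_non_event):
    """Hypothetical helper: return (WoE, IV contribution) for one bucket.

    pct_event and pct_non_event are the bucket's share of all events and of
    all non-events, as fractions (e.g. 0.4964 for 49.64 %).
    """
    woe = math.log(pct_non_event / pct_event)
    iv = (pct_non_event - pct_event) * woe
    return woe, iv


# Row 1 of the example table: % Event = 49.64, % Non-event = 43.67
woe, iv = woe_iv(0.4964, 0.4367)
print(round(woe, 3), round(iv, 3))  # -0.128 0.008
```

The percentages in the table are rounded, so recomputed values can differ in the third decimal: row 0 gives about 1.374 instead of 1.380. Summing the per-bucket IV values gives the total IV of the feature.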
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column | | The column we wish to analyse | required |
Returns:
Type | Description |
---|---|
df (pd.DataFrame) | A pandas dataframe of the format above |
fit(self, X, y=None)
Fit the bucketer. No boundaries are learned from X, because the bucket mapping is supplied by the user.
fit_interactive(self, X, y=None, mode='external', **server_kwargs)
inherited
Fit a bucketer and then interactively edit the fit using a dash app.
Note that this uses a JupyterDash app, which supports 3 modes:
- 'external' (default): Start dash server and print URL
- 'inline': Start dash app inside an Iframe in the jupyter notebook
- 'jupyterlab': Start dash app as a new tab inside jupyterlab
fit_transform(self, X, y=None, **fit_params)
inherited
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters:
X : array-like of shape (n_samples, n_features). Input samples.
y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None. Target values (None for unsupervised transformations).
**fit_params : dict. Additional fit parameters.
Returns:
X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
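For a bucketer whose boundaries are supplied by the user, fit has nothing to learn, so fit_transform gives the same result as transform. The toy class below (FixedBoundaryBucketer is hypothetical and not part of skorecard) sketches the usual fit-then-transform chaining under that assumption.

```python
from bisect import bisect_right


class FixedBoundaryBucketer:
    """Hypothetical toy transformer with user-supplied boundaries."""

    def __init__(self, boundaries):
        self.boundaries = boundaries

    def fit(self, X, y=None, **fit_params):
        # Nothing is learned: the boundaries come from the user.
        return self

    def transform(self, X):
        # Left-closed, right-open buckets: [b_i, b_{i+1}).
        return [bisect_right(self.boundaries, x) for x in X]

    def fit_transform(self, X, y=None, **fit_params):
        # Equivalent to fit(X, y, **fit_params).transform(X).
        return self.fit(X, y, **fit_params).transform(X)


bucketer = FixedBoundaryBucketer([25.0, 45.0])
print(bucketer.fit_transform([18, 30, 60]))  # [0, 1, 2]
```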
get_params(self, deep=True)
inherited
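Get parameters for this estimator.
Parameters:
deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
params : dict. Parameter names mapped to their values.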
plot_bucket(self, column, line='event_rate', format=None, scale=None, width=None, height=None)
inherited
Plot the buckets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column | | The column we want to visualise | required |
line | | The line to plot on the secondary axis. Default is Event Rate. | 'event_rate' |
format | | The format of the image, such as 'png'. The default None returns a plotly figure. | None |
scale | | If format is specified, the scale of the image | None |
width | | If format is specified, the width of the image | None |
height | | If format is specified, the height of the image | None |
Returns:
Type | Description |
---|---|
plot | plotly fig |
predict(self, X)
inherited
Applies the transform method. Intended for use in grid searches.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | pd.DataFrame | The numerical data which will be transformed into the corresponding buckets | required |
Returns:
Type | Description |
---|---|
y (np.array) | Transformed X, such that the values of X are replaced by the corresponding bucket numbers |
predict_proba(self, X)
inherited
Applies the transform method. Intended for use in grid searches.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | pd.DataFrame | The numerical data which will be transformed into the corresponding buckets | required |
Returns:
Type | Description |
---|---|
yhat (np.array) | Transformed X, such that the values of X are replaced by the corresponding bucket numbers |
save_yml(self, fout)
inherited
Save the features bucket mapping to a YAML file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fout | ~PathLike | Path of the output file | required |
set_params(self, **params)
inherited
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as sklearn.pipeline.Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters:
**params : dict. Estimator parameters.
Returns:
self : estimator instance. Estimator instance.
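The <component>__<parameter> naming can be illustrated with a small routing sketch. This is a simplified assumption of how such keys are grouped, not scikit-learn's actual set_params code, and route_params is a hypothetical helper. Splitting only at the first "__" leaves deeper keys such as "a__b__c" intact for the nested component to route further.

```python
def route_params(params):
    """Hypothetical sketch: group '<component>__<parameter>' keys by component."""
    routed = {}
    for key, value in params.items():
        component, sep, param = key.partition("__")
        if not sep:
            # No '__': the parameter belongs to the top-level estimator.
            routed.setdefault("", {})[key] = value
        else:
            routed.setdefault(component, {})[param] = value
    return routed


print(route_params({"bucketer__variables": ["LIMIT_BAL"], "verbose": True}))
# {'bucketer': {'variables': ['LIMIT_BAL']}, '': {'verbose': True}}
```

With a real Pipeline the equivalent call would look like pipe.set_params(bucketer__variables=["LIMIT_BAL"]), where "bucketer" stands for a hypothetical step name in that pipeline.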
summary(self)
inherited
Display a summary table for the columns passed to .fit().
The format is the following:
column | num_prebuckets | num_buckets | dtype |
---|---|---|---|
LIMIT_BAL | 15 | 10 | float64 |
BILL_AMT1 | 15 | 6 | float64 |
transform(self, X, y=None)
inherited
Transforms the columns of X into the corresponding buckets defined by the transformer's bucket mapping.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | pd.DataFrame | Dataframe which will be transformed into the corresponding buckets | required |
y | array | Target | None |
Returns:
Type | Description |
---|---|
df (pd.DataFrame) | Dataset with transformed features |