
UserInputBucketer

The UserInputBucketer transformer creates buckets by implementing user-defined boundaries.


This is a special bucketer that is not fitted but rather relies on pre-defined user input. The most common use-case is loading bucket mapping information previously fitted by other bucketers.

Examples:

from skorecard import datasets
from skorecard.bucketers import AgglomerativeClusteringBucketer, UserInputBucketer

X, y = datasets.load_uci_credit_card(return_X_y=True)

ac_bucketer = AgglomerativeClusteringBucketer(n_bins=3, variables=['LIMIT_BAL'])
ac_bucketer.fit(X)
mapping = ac_bucketer.features_bucket_mapping_

ui_bucketer = UserInputBucketer(mapping)
new_X = ui_bucketer.fit_transform(X)
assert len(new_X['LIMIT_BAL'].unique()) == 3

# Map some values to the special buckets.
# Each key is a free-form label; each list holds the exact values sent to that bucket.
specials = {
    "LIMIT_BAL": {
        "=50000": [50000],
        "in [20000,30000]": [20000, 30000],
    }
}

ac_bucketer = AgglomerativeClusteringBucketer(n_bins=3, variables=['LIMIT_BAL'], specials=specials)
ac_bucketer.fit(X)
mapping = ac_bucketer.features_bucket_mapping_

ui_bucketer = UserInputBucketer(mapping)
new_X = ui_bucketer.fit_transform(X)
assert len(new_X['LIMIT_BAL'].unique()) == 5
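
To make the idea of pre-defined boundaries concrete, here is a minimal sketch in plain Python, independent of skorecard, of how a sorted list of boundaries can assign values to bucket indices. It assumes left-closed, right-open intervals such as [25.0, 45.0), as shown in the bucket_table example on this page. The helper `assign_bucket` and the boundary values are hypothetical, chosen for illustration only; they are not part of the skorecard API, and special values are not handled here.

```python
from bisect import bisect_right


def assign_bucket(value, boundaries):
    """Return the index of the bucket that `value` falls into.

    `boundaries` must be sorted in ascending order. With n boundaries there
    are n + 1 buckets: (-inf, b0), [b0, b1), ..., [b_last, inf).
    """
    # bisect_right counts how many boundaries are <= value, which is exactly
    # the bucket index for left-closed, right-open intervals.
    return bisect_right(boundaries, value)


boundaries = [25.0, 45.0]  # hypothetical user-defined boundaries
ages = [18, 25, 44.9, 45, 70]
buckets = [assign_bucket(a, boundaries) for a in ages]
print(buckets)  # [0, 1, 1, 2, 2]
```

Because nothing is estimated from the data, the same boundaries always give the same buckets, which is why such a mapping can be produced once by another bucketer and reused later.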

__init__(self, features_bucket_mapping=None, variables=[], remainder='passthrough') special

Initialise the user-defined boundaries with a dictionary.

Notes: - features_bucket_mapping is stored without the trailing underscore (_) because it is not fitted.

Parameters:

Name Type Description Default
features_bucket_mapping None, Dict, FeaturesBucketMapping, str or Path

Contains the feature name and the boundaries defined for that feature. If a dict, it is converted to an internal FeaturesBucketMapping object. If a string or path, the file is loaded as YAML and converted to a FeaturesBucketMapping object.

None
variables list

The features to bucket. Uses all features in features_bucket_mapping if not defined.

[]
remainder str

How columns not listed in "variables" are handled. Must be one of ["passthrough", "drop"]. passthrough (default): all columns not specified in "variables" are passed through unchanged. drop: all columns not specified in "variables" are dropped.

'passthrough'
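
The effect of remainder can be sketched in plain Python as a column-selection rule. This is an illustration only: `select_columns` is a hypothetical helper, not a skorecard function, and preserving the original column order is an assumption made for the sketch.

```python
def select_columns(columns, variables, remainder="passthrough"):
    """Return the column names present in the output.

    Hypothetical helper: mirrors the documented meaning of `remainder`,
    not skorecard's actual implementation.
    """
    if remainder not in ("passthrough", "drop"):
        raise ValueError("remainder must be 'passthrough' or 'drop'")
    if remainder == "passthrough":
        # Bucketed and non-bucketed columns are both kept.
        return list(columns)
    # 'drop': keep only the columns that are bucketed.
    return [c for c in columns if c in variables]


columns = ["LIMIT_BAL", "BILL_AMT1", "EDUCATION"]
kept_passthrough = select_columns(columns, ["LIMIT_BAL"], remainder="passthrough")
kept_drop = select_columns(columns, ["LIMIT_BAL"], remainder="drop")
print(kept_passthrough)  # ['LIMIT_BAL', 'BILL_AMT1', 'EDUCATION']
print(kept_drop)         # ['LIMIT_BAL']
```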

bucket_table(self, column) inherited

Generates the statistics for the buckets of a particular column.

The pre-buckets are matched to the post-buckets, so that the user has a much clearer understanding of how the BucketingProcess ends up with the final buckets. An example:

bucket  label         Count   Count (%)  Non-event  Event  % Event  % Non-event  Event Rate  WoE     IV
0       (-inf, 25.0)  61.0    1.36       57.0       4.0    0.41     1.62         0.066       1.380   0.017
1       [25.0, 45.0)  2024.0  44.98      1536.0     488.0  49.64    43.67        0.241       -0.128  0.008

Parameters:

Name Type Description Default
column

The column we wish to analyse

required

Returns:

Type Description
df (pd.DataFrame)

A pandas dataframe of the format above
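
The statistics in the example table can be checked by hand. The sketch below recomputes Event Rate, WoE and IV for the two example rows, assuming the convention WoE = ln(% Non-event / % Event) and IV = (% Non-event - % Event) × WoE, with percentages expressed as fractions. That convention is an assumption: it reproduces the example values to within rounding, but this page does not state the formula skorecard uses. The function names are illustrative only.

```python
from math import log


def weight_of_evidence(pct_non_event, pct_event):
    """WoE under the assumed convention ln(% non-event / % event)."""
    return log(pct_non_event / pct_event)


def information_value(pct_non_event, pct_event):
    """Per-bucket IV contribution: (difference of shares) * WoE."""
    return (pct_non_event - pct_event) * weight_of_evidence(pct_non_event, pct_event)


# Values copied from the example table; percentages converted to fractions.
rows = [
    {"label": "(-inf, 25.0)", "non_event": 57.0, "event": 4.0,
     "pct_event": 0.0041, "pct_non_event": 0.0162},
    {"label": "[25.0, 45.0)", "non_event": 1536.0, "event": 488.0,
     "pct_event": 0.4964, "pct_non_event": 0.4367},
]

results = []
for row in rows:
    results.append({
        "label": row["label"],
        "event_rate": row["event"] / (row["event"] + row["non_event"]),
        "woe": weight_of_evidence(row["pct_non_event"], row["pct_event"]),
        "iv": information_value(row["pct_non_event"], row["pct_event"]),
    })

for r in results:
    print(r["label"], round(r["event_rate"], 3), round(r["woe"], 3), round(r["iv"], 3))
```

For the first bucket the recomputed WoE comes out slightly below the table's 1.380, because the inputs here are the already-rounded percentages; the other values agree with the table to three decimals.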

fit(self, X, y=None)

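Initialise the class. Nothing is learned from X or y here, because the buckets come from the user-supplied features_bucket_mapping.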

fit_interactive(self, X, y=None, mode='external', **server_kwargs) inherited

Fit a bucketer and then interactively edit the fit using a Dash app.

Note that this uses a JupyterDash app, which supports three modes:

  • 'external' (default): Start dash server and print URL
  • 'inline': Start dash app inside an Iframe in the jupyter notebook
  • 'jupyterlab': Start dash app as a new tab inside jupyterlab

fit_transform(self, X, y=None, **fit_params) inherited

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

X : array-like of shape (n_samples, n_features) Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None Target values (None for unsupervised transformations).

**fit_params : dict Additional fit parameters.

Returns

X_new : ndarray array of shape (n_samples, n_features_new) Transformed array.

get_params(self, deep=True) inherited

Get parameters for this estimator.

Parameters

deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params : dict Parameter names mapped to their values.

plot_bucket(self, column, line='event_rate', format=None, scale=None, width=None, height=None) inherited

Plot the buckets.

Parameters:

Name Type Description Default
column

The column we want to visualise

required
line

The line to plot on the secondary axis. Default is Event Rate.

'event_rate'
format

The format of the image, such as 'png'. The default None returns a plotly image.

None
scale

If format is specified, the scale of the image

None
width

If format is specified, the width of the image

None
height

If format is specified, the height of the image

None

Returns:

Type Description
plot

plotly fig

predict(self, X) inherited

Applies the transform method. To be used for the grid searches.

Parameters:

Name Type Description Default
X pd.DataFrame

The numerical data which will be transformed into the corresponding buckets

required

Returns:

Type Description
y (np.array)

Transformed X, such that the values of X are replaced by the corresponding bucket numbers

predict_proba(self, X) inherited

Applies the transform method. To be used for the grid searches.

Parameters:

Name Type Description Default
X pd.DataFrame

The numerical data which will be transformed into the corresponding buckets

required

Returns:

Type Description
yhat (np.array)

Transformed X, such that the values of X are replaced by the corresponding bucket numbers

save_yml(self, fout) inherited

Save the features bucket mapping to a YAML file.

Parameters:

Name Type Description Default
fout PathLike

Path of the output file

required

set_params(self, **params) inherited

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as sklearn.pipeline.Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters

**params : dict Estimator parameters.

Returns

self : estimator instance Estimator instance.

summary(self) inherited

Display a summary table for columns passed to .fit().

The format is the following:

column num_prebuckets num_buckets dtype
LIMIT_BAL 15 10 float64
BILL_AMT1 15 6 float64

transform(self, X, y=None) inherited

Transforms an array into the corresponding buckets fitted by the Transformer.

Parameters:

Name Type Description Default
X pd.DataFrame

dataframe which will be transformed into the corresponding buckets

required
y array

target

None

Returns:

Type Description
df (pd.DataFrame)

dataset with transformed features


Last update: 2021-11-24