
AsIsCategoricalBucketer

The AsIsCategoricalBucketer treats unique values as categories.


Each unique value is assigned a bucket number in order of appearance. If new data contains unknown labels, they are replaced by 'Other'.

This bucketer is useful when your data is already sufficiently bucketed, but you would like to be able to bucket new data in the same way.
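The behaviour described above can be illustrated in plain Python. This is a minimal sketch, not skorecard's implementation: the helper names `fit_as_is` and `transform_as_is` are hypothetical, and the string 'Other' stands in for whatever bucket the library actually assigns to unknown labels.

```python
def fit_as_is(values):
    """Map each unique value to a bucket number, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


def transform_as_is(values, mapping, other="Other"):
    """Replace known values by their bucket number and unknown ones by `other`."""
    return [mapping.get(value, other) for value in values]


# Illustrative data only (not taken from the UCI credit card dataset)
train = ["graduate", "university", "graduate", "high school"]
mapping = fit_as_is(train)

# "phd" was not seen during fitting, so it falls into the 'Other' placeholder
new_data = ["university", "phd", "graduate"]
buckets = transform_as_is(new_data, mapping)
```

Because the mapping is stored at fit time, new data is bucketed exactly as the training data was.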

Examples:

from skorecard import datasets
from skorecard.bucketers import AsIsCategoricalBucketer

X, y = datasets.load_uci_credit_card(return_X_y=True)
bucketer = AsIsCategoricalBucketer(variables=['EDUCATION'])
bucketer.fit_transform(X)

variables_type property readonly

Signals variables type supported by this bucketer.

__init__(self, variables=[], specials={}, missing_treatment='separate', remainder='passthrough') special

Init the class.

Parameters:

Name Type Description Default
variables list

The features to bucket. Uses all features if not defined.

[]
specials

(nested) dictionary of special values that require their own binning. The dictionary has the following format: {"<column name>": {"name of special bucket": <list with 1 or more values>}}. For every feature that needs a special value, a dictionary must be passed as value. This dictionary contains the name of a bucket (key) and an array of unique values that should be put in that bucket. When special values are defined, they are not considered in the fitting procedure.

{}
missing_treatment

Defines how we treat the missing values present in the data. If a string, it must be one of the following options:

  • separate: Missing values get put in a separate 'Other' bucket: -1
  • most_risky: Missing values are put into the bucket containing the largest percentage of Class 1.
  • least_risky: Missing values are put into the bucket containing the largest percentage of Class 0.
  • most_frequent: Missing values are put into the most common bucket.
  • neutral: Missing values are put into the bucket with WoE closest to 0.
  • similar: Missing values are put into the bucket with WoE closest to the bucket with only missing values.
  • passthrough: Leaves missing values untouched.

If a dict, it must be of the following format: {"<column name>": <bucket number>}. This bucket number is where we will put the missing values.

'separate'
remainder

How we want the non-specified columns to be transformed. It must be in ["passthrough", "drop"].

  • passthrough (default): all columns that were not specified in "variables" will be passed through.
  • drop: all remaining columns that were not specified in "variables" will be dropped.

'passthrough'
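To make three of the missing_treatment options more concrete, here is a small sketch in plain Python. It is an illustration under assumptions, not the library's code: the function `choose_missing_bucket` is hypothetical, and "largest percentage of Class 1" is interpreted here as the highest event rate within a bucket.

```python
from collections import Counter


def choose_missing_bucket(buckets, y, strategy):
    """Pick the bucket that missing values would join under a given strategy."""
    counts = Counter(buckets)
    events = Counter(b for b, target in zip(buckets, y) if target == 1)
    # Share of Class 1 observations inside each bucket
    event_rate = {b: events[b] / counts[b] for b in counts}

    if strategy == "most_frequent":
        return max(counts, key=counts.get)
    if strategy == "most_risky":
        return max(event_rate, key=event_rate.get)
    if strategy == "least_risky":
        return min(event_rate, key=event_rate.get)
    raise ValueError(f"Unsupported strategy in this sketch: {strategy}")


# Illustrative bucket assignments and binary targets
buckets = [0, 0, 0, 0, 1, 1, 2, 2, 2]
y = [0, 1, 1, 0, 1, 1, 0, 0, 0]

most_frequent = choose_missing_bucket(buckets, y, "most_frequent")
most_risky = choose_missing_bucket(buckets, y, "most_risky")
least_risky = choose_missing_bucket(buckets, y, "least_risky")
```

In this toy data, bucket 0 is the largest, bucket 1 contains only events, and bucket 2 contains none, so the three strategies each select a different bucket.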

bucket_table(self, column) inherited

Generates the statistics for the buckets of a particular column.

The pre-buckets are matched to the post-buckets, so that the user has a much clearer understanding of how the BucketingProcess ends up with the final buckets. An example:

bucket  label         Count   Count (%)  Non-event  Event  % Event  % Non-event  Event Rate  WoE     IV
0       (-inf, 25.0)  61.0    1.36       57.0       4.0    0.41     1.62         0.066       1.380   0.017
1       [25.0, 45.0)  2024.0  44.98      1536.0     488.0  49.64    43.67        0.241       -0.128  0.008

Parameters:

Name Type Description Default
column

The column we wish to analyse

required

Returns:

Type Description
df (pd.DataFrame)

A pandas dataframe of the format above
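The WoE and IV columns in the example table appear consistent with the common definitions WoE = ln(% Non-event / % Event) and IV = (% Non-event - % Event) * WoE, with percentages expressed as fractions. The sketch below applies those formulas to the two example rows. The formulas are inferred from the figures shown, not confirmed by this page, and `woe_and_iv` is a hypothetical helper.

```python
import math


def woe_and_iv(pct_event, pct_non_event):
    """Compute WoE and IV for one bucket from percentages (0-100 scale)."""
    woe = math.log(pct_non_event / pct_event)
    # Convert the percentage difference to a fraction before weighting by WoE
    iv = (pct_non_event - pct_event) / 100 * woe
    return woe, iv


# Percentages taken from the two example rows of the bucket table
woe_0, iv_0 = woe_and_iv(pct_event=0.41, pct_non_event=1.62)
woe_1, iv_1 = woe_and_iv(pct_event=49.64, pct_non_event=43.67)
```

The results land close to the table's values (1.380 / 0.017 and -0.128 / 0.008). Small differences are likely due to the rounding of the displayed percentages.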

fit(self, X, y=None) inherited

Fit X, y.

fit_interactive(self, X, y=None, mode='external', **server_kwargs) inherited

Fit a bucketer and then interactively edit the fit using a dash app.

Note we are using a jupyterdash app, which supports 3 different modes:

  • 'external' (default): Start dash server and print URL
  • 'inline': Start dash app inside an Iframe in the jupyter notebook
  • 'jupyterlab': Start dash app as a new tab inside jupyterlab

fit_transform(self, X, y=None, **fit_params) inherited

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

X : array-like of shape (n_samples, n_features) Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None Target values (None for unsupervised transformations).

**fit_params : dict Additional fit parameters.

Returns

X_new : ndarray array of shape (n_samples, n_features_new) Transformed array.

get_params(self, deep=True) inherited

Get parameters for this estimator.

Parameters

deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params : dict Parameter names mapped to their values.

plot_bucket(self, column, line='event_rate', format=None, scale=None, width=None, height=None) inherited

Plot the buckets.

Parameters:

Name Type Description Default
column

The column we want to visualise

required
line

The line to plot on the secondary axis. Default is Event Rate.

'event_rate'
format

The format of the image, such as 'png'. The default None returns a plotly image.

None
scale

If format is specified, the scale of the image

None
width

If format is specified, the width of the image

None
height

If format is specified, the height of the image

None

Returns:

Type Description
plot

plotly fig

predict(self, X) inherited

Applies the transform method. To be used for the grid searches.

Parameters:

Name Type Description Default
X pd.DataFrame

The numerical data which will be transformed into the corresponding buckets

required

Returns:

Type Description
y (np.array)

Transformed X, such that the values of X are replaced by the corresponding bucket numbers

predict_proba(self, X) inherited

Applies the transform method. To be used for the grid searches.

Parameters:

Name Type Description Default
X pd.DataFrame

The numerical data which will be transformed into the corresponding buckets

required

Returns:

Type Description
yhat (np.array)

transformed X, such that the values of X are replaced by the corresponding bucket numbers

save_yml(self, fout) inherited

Save the feature buckets to a YAML file.

Parameters:

Name Type Description Default
fout ~PathLike

file output

required

set_params(self, **params) inherited

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as sklearn.pipeline.Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters

**params : dict Estimator parameters.

Returns

self : estimator instance Estimator instance.
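The <component>__<parameter> naming convention can be illustrated with a short sketch. This is a simplified illustration of how such a name is split at the first double underscore, not the actual scikit-learn code. The helper `split_nested_param` and the step name "bucketer" are hypothetical.

```python
def split_nested_param(name):
    """Split 'component__parameter' into (component, parameter).

    Returns (None, name) when the name has no '__' delimiter,
    meaning the parameter belongs to the estimator itself.
    """
    component, delimiter, parameter = name.partition("__")
    if not delimiter:
        return None, name
    return component, parameter


# A parameter of a pipeline step hypothetically named "bucketer"
nested = split_nested_param("bucketer__remainder")

# A parameter set directly on the estimator
direct = split_nested_param("remainder")

# Deeper nesting: only the first '__' is split, the rest is passed down
deep = split_nested_param("pipe__bucketer__remainder")
```

Splitting only at the first delimiter lets each nested object handle the remainder of the name itself.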

summary(self) inherited

Display a summary table for columns passed to .fit().

The format is the following:

column     num_prebuckets  num_buckets  dtype
LIMIT_BAL  15              10           float64
BILL_AMT1  15              6            float64

transform(self, X, y=None) inherited

Transforms an array into the corresponding buckets fitted by the Transformer.

Parameters:

Name Type Description Default
X pd.DataFrame

dataframe which will be transformed into the corresponding buckets

required
y array

target

None

Returns:

Type Description
df (pd.DataFrame)

dataset with transformed features


Last update: 2021-11-24