AsIsCategoricalBucketer
The AsIsCategoricalBucketer treats unique values as categories.
It assigns each unique value a bucket number in order of appearance. If new data contains new, unknown labels, they are replaced by 'Other'.
This bucketer is useful when your data is already sufficiently bucketed but you would like to be able to bucket new data in the same way.
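The behaviour described above (bucket numbers assigned in order of first appearance, unseen labels replaced by 'Other') can be sketched in plain pandas. This is a hypothetical illustration, not skorecard's implementation: the class name AsIsSketch is invented, the 'Other' bucket is assumed to take the next free bucket number, and neither specials nor missing_treatment is modelled.

```python
import pandas as pd


class AsIsSketch:
    """Hypothetical single-column as-is bucketer (illustration only, not skorecard code)."""

    OTHER = "Other"  # label that replaces values not seen during fit

    def fit(self, s: pd.Series) -> "AsIsSketch":
        # pd.unique keeps the order of first appearance, so the first distinct
        # value gets bucket 0, the second bucket 1, and so on.
        self.mapping_ = {value: i for i, value in enumerate(pd.unique(s.dropna()))}
        # Assumption: 'Other' gets the next free bucket number.
        self.mapping_[self.OTHER] = len(self.mapping_)
        return self

    def transform(self, s: pd.Series) -> pd.Series:
        # Replace labels unseen during fit by 'Other', then look up bucket numbers.
        # Note: in this sketch missing values also end up in 'Other'.
        known = s.where(s.isin(list(self.mapping_)), self.OTHER)
        return known.map(self.mapping_)


train = pd.Series(["university", "high school", "university", "graduate"])
bucketer = AsIsSketch().fit(train)
print(bucketer.transform(pd.Series(["graduate", "doctorate", "university"])).tolist())
# [2, 3, 0]
```

Here "doctorate" was not present at fit time, so it lands in the 'Other' bucket.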
Examples:
from skorecard import datasets
from skorecard.bucketers import AsIsCategoricalBucketer
X, y = datasets.load_uci_credit_card(return_X_y=True)
bucketer = AsIsCategoricalBucketer(variables=['EDUCATION'])
bucketer.fit_transform(X)
variables_type
property
readonly
Signals the variable type supported by this bucketer.
__init__(self, variables=[], specials={}, missing_treatment='separate', remainder='passthrough')
special
Init the class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
variables | list | The features to bucket. Uses all features if not defined. | [] |
specials | | (nested) dictionary of special values that require their own binning. The dictionary has the following format: {" | {} |
missing_treatment | | Defines how missing values present in the data are treated. If a string, it must be one of the following options: separate: missing values are put in a separate 'Other' bucket: | 'separate' |
remainder | | How the non-specified columns are transformed. It must be in ["passthrough", "drop"]. passthrough (default): all columns that were not specified in "variables" are passed through. drop: all remaining columns that were not specified in "variables" are dropped. | 'passthrough' |
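The two remainder options can be illustrated with a small pandas helper. apply_remainder is a hypothetical function written for this illustration, not part of skorecard; it assumes the bucket numbers for the columns in variables have already been computed and are passed in as bucketed.

```python
import pandas as pd


def apply_remainder(X: pd.DataFrame, bucketed: pd.DataFrame, variables: list,
                    remainder: str = "passthrough") -> pd.DataFrame:
    """Hypothetical helper mimicking the documented `remainder` options."""
    if remainder not in ("passthrough", "drop"):
        raise ValueError('remainder must be in ["passthrough", "drop"]')
    if remainder == "drop":
        # drop: only the bucketed columns listed in `variables` survive
        return bucketed[variables].copy()
    # passthrough: keep all original columns, overwrite the bucketed ones
    out = X.copy()
    for col in variables:
        out[col] = bucketed[col]
    return out


X = pd.DataFrame({"EDUCATION": [1, 2, 1], "AGE": [25, 40, 33]})
bucketed = pd.DataFrame({"EDUCATION": [0, 1, 0]})
print(list(apply_remainder(X, bucketed, ["EDUCATION"]).columns))          # ['EDUCATION', 'AGE']
print(list(apply_remainder(X, bucketed, ["EDUCATION"], "drop").columns))  # ['EDUCATION']
```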
bucket_table(self, column)
inherited
Generates the statistics for the buckets of a particular column.
The pre-buckets are matched to the post-buckets, so that the user has a much clearer understanding of how the BucketingProcess ends up with the final buckets. An example:
bucket | label | Count | Count (%) | Non-event | Event | % Event | % Non-event | Event Rate | WoE | IV |
---|---|---|---|---|---|---|---|---|---|---|
0 | (-inf, 25.0) | 61.0 | 1.36 | 57.0 | 4.0 | 0.41 | 1.62 | 0.066 | 1.380 | 0.017 |
1 | [25.0, 45.0) | 2024.0 | 44.98 | 1536.0 | 488.0 | 49.64 | 43.67 | 0.241 | -0.128 | 0.008 |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column | | The column we wish to analyse | required |
Returns:
Type | Description |
---|---|
df (pd.DataFrame) | A pandas DataFrame in the format above |
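The statistics in the example table are consistent with WoE = ln(% Non-event / % Event) and IV = (% Non-event − % Event) / 100 × WoE per bucket: for bucket 0, ln(1.62 / 0.41) ≈ 1.37 (the table's 1.380 comes from unrounded percentages) and (1.62 − 0.41) / 100 × 1.380 ≈ 0.017. The hypothetical helper below recomputes these columns from bucket numbers and a binary target (assumed 1 = event, 0 = non-event). It is a sketch, not skorecard's bucket_table code, and it does not guard against buckets with zero events or non-events, where WoE becomes infinite.

```python
import numpy as np
import pandas as pd


def bucket_stats(buckets: pd.Series, y: pd.Series) -> pd.DataFrame:
    """Hypothetical re-computation of the bucket_table statistics columns."""
    df = pd.DataFrame({"bucket": buckets.to_numpy(), "y": y.to_numpy()})
    # Count per bucket and number of events (y == 1) per bucket
    t = df.groupby("bucket")["y"].agg(Count="size", Event="sum")
    t["Non-event"] = t["Count"] - t["Event"]
    t["Count (%)"] = 100 * t["Count"] / t["Count"].sum()
    # Share of all events / non-events that fall in each bucket, in percent
    t["% Event"] = 100 * t["Event"] / t["Event"].sum()
    t["% Non-event"] = 100 * t["Non-event"] / t["Non-event"].sum()
    t["Event Rate"] = t["Event"] / t["Count"]
    # WoE = ln(% Non-event / % Event), matching the signs in the example table
    t["WoE"] = np.log(t["% Non-event"] / t["% Event"])
    # IV contribution per bucket, with percentages converted back to fractions
    t["IV"] = (t["% Non-event"] - t["% Event"]) / 100 * t["WoE"]
    cols = ["Count", "Count (%)", "Non-event", "Event", "% Event",
            "% Non-event", "Event Rate", "WoE", "IV"]
    return t[cols].reset_index()


buckets = pd.Series([0, 0, 0, 0, 1, 1, 1, 1])
y = pd.Series([1, 0, 0, 0, 1, 1, 0, 0])
print(bucket_stats(buckets, y)[["bucket", "Count", "Event", "Event Rate", "WoE", "IV"]])
```

Summing the IV column over all buckets gives the total information value of the feature.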
fit(self, X, y=None)
inherited
Fit X, y.
fit_interactive(self, X, y=None, mode='external', **server_kwargs)
inherited
Fit a bucketer and then interactively edit the fit using a Dash app.
Note that this uses a JupyterDash app, which supports three modes:
- 'external' (default): Start dash server and print URL
- 'inline': Start dash app inside an Iframe in the jupyter notebook
- 'jupyterlab': Start dash app as a new tab inside jupyterlab
fit_transform(self, X, y=None, **fit_params)
inherited
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
X : array-like of shape (n_samples, n_features) Input samples.
y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None Target values (None for unsupervised transformations).
**fit_params : dict Additional fit parameters.
Returns
X_new : ndarray array of shape (n_samples, n_features_new) Transformed array.
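scikit-learn's TransformerMixin implements fit_transform as fit(X, y, **fit_params) followed by transform(X) on the same data. The sketch below shows that equivalence with a hypothetical FirstSeenBucketer built on scikit-learn's BaseEstimator and TransformerMixin; the class and its column parameter are invented for illustration and are not skorecard code.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class FirstSeenBucketer(BaseEstimator, TransformerMixin):
    """Hypothetical one-column bucketer; fit_transform is inherited from TransformerMixin."""

    def __init__(self, column="EDUCATION"):
        self.column = column

    def fit(self, X, y=None):
        # Bucket numbers follow the order of first appearance.
        self.mapping_ = {v: i for i, v in enumerate(pd.unique(X[self.column]))}
        return self

    def transform(self, X, y=None):
        # Replace the chosen column by its bucket numbers; other columns pass through.
        out = X.copy()
        out[self.column] = out[self.column].map(self.mapping_)
        return out


X = pd.DataFrame({"EDUCATION": [2, 1, 2, 3], "AGE": [30, 41, 52, 63]})
one_step = FirstSeenBucketer().fit_transform(X)
two_step = FirstSeenBucketer().fit(X).transform(X)
print(one_step.equals(two_step))  # True
```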
get_params(self, deep=True)
inherited
Get parameters for this estimator.
plot_bucket(self, column, line='event_rate', format=None, scale=None, width=None, height=None)
inherited
Plot the buckets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column | | The column we want to visualise | required |
line | | The line to plot on the secondary axis. Default is Event Rate. | 'event_rate' |
format | | The format of the image, such as 'png'. The default None returns a plotly image. | None |
scale | | If format is specified, the scale of the image | None |
width | | If format is specified, the width of the image | None |
height | | If format is specified, the height of the image | None |
Returns:
Type | Description |
---|---|
plot | plotly fig |
predict(self, X)
inherited
Applies the transform method. To be used for grid searches.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | pd.DataFrame | The numerical data which will be transformed into the corresponding buckets | required |
Returns:
Type | Description |
---|---|
y (np.array) | Transformed X, such that the values of X are replaced by the corresponding bucket numbers |
predict_proba(self, X)
inherited
Applies the transform method. To be used for grid searches.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | pd.DataFrame | The numerical data which will be transformed into the corresponding buckets | required |
Returns:
Type | Description |
---|---|
yhat (np.array) | Transformed X, such that the values of X are replaced by the corresponding bucket numbers |
save_yml(self, fout)
inherited
Save the feature buckets to a YAML file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fout | ~PathLike | file output | required |
set_params(self, **params)
inherited
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as sklearn.pipeline.Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
**params : dict Estimator parameters.
Returns
self : estimator instance Estimator instance.
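A minimal illustration of the <component>__<parameter> convention, using scikit-learn's own Pipeline, StandardScaler and LogisticRegression (the step names "scale" and "clf" are arbitrary). Assuming a bucketer were added to a pipeline as a step named, say, "bucketer", the same convention would address its constructor arguments, for example bucketer__variables; that pipeline is hypothetical and not shown here.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A two-step pipeline; the step names are chosen freely when building it.
pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# <component>__<parameter>: update one parameter of each nested step by name.
pipe.set_params(scale__with_mean=False, clf__C=0.1)

print(pipe.named_steps["clf"].C)              # 0.1
print(pipe.get_params()["scale__with_mean"])  # False
```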
summary(self)
inherited
Display a summary table for the columns passed to .fit().
The format is the following:
column | num_prebuckets | num_buckets | dtype |
---|---|---|---|
LIMIT_BAL | 15 | 10 | float64 |
BILL_AMT1 | 15 | 6 | float64 |
transform(self, X, y=None)
inherited
Transforms an array into the corresponding buckets fitted by the Transformer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | pd.DataFrame | dataframe which will be transformed into the corresponding buckets | required |
y | array | target | None |
Returns:
Type | Description |
---|---|
df (pd.DataFrame) | dataset with transformed features |