EqualFrequencyBucketer
The EqualFrequencyBucketer transformer creates buckets with an equal number of elements.
Examples:
from skorecard import datasets
from skorecard.bucketers import EqualFrequencyBucketer

X, y = datasets.load_uci_credit_card(return_X_y=True)
bucketer = EqualFrequencyBucketer(n_bins=10, variables=['LIMIT_BAL'])
X_bucketed = bucketer.fit_transform(X)
X_bucketed['LIMIT_BAL'].value_counts()
variables_type (property, read-only)

Signals the variable type supported by this bucketer.
__init__(self, n_bins=5, variables=[], specials={}, missing_treatment='separate', remainder='passthrough') (special method)

Initialize the class.
Parameters:
Name | Type | Description | Default
---|---|---|---
n_bins | int | Number of bins to create. | 5
variables | list | The features to bucket. Uses all features if not defined. | []
specials | dict | (Nested) dictionary of special values that require their own binning. | {}
missing_treatment | str | Defines how we treat the missing values present in the data. If a string, it must be one of the following options: 'separate': missing values get put in a separate 'Other' bucket. | 'separate'
remainder | str | How we want the non-specified columns to be transformed. It must be in ["passthrough", "drop"]. 'passthrough' (default): all columns that were not specified in 'variables' will be passed through. 'drop': all remaining columns that were not specified in 'variables' will be dropped. | 'passthrough'
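As a quick illustration of the parameters above, a minimal sketch that sets the non-default options documented in this table; the column name is taken from the example at the top of this page.

```python
from skorecard.bucketers import EqualFrequencyBucketer

# Bucket only LIMIT_BAL into 10 equal-frequency bins, keep missing values in
# their own bucket, and drop any column not listed in `variables`.
bucketer = EqualFrequencyBucketer(
    n_bins=10,
    variables=["LIMIT_BAL"],
    missing_treatment="separate",
    remainder="drop",
)
```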
bucket_table(self, column) (inherited)
Generates the statistics for the buckets of a particular column.
The pre-buckets are matched to the post-buckets, so that the user has a much clearer understanding of how the BucketingProcess ends up with the final buckets. An example:
bucket | label | Count | Count (%) | Non-event | Event | % Event | % Non-event | Event Rate | WoE | IV |
---|---|---|---|---|---|---|---|---|---|---|
0 | (-inf, 25.0) | 61.0 | 1.36 | 57.0 | 4.0 | 0.41 | 1.62 | 0.066 | 1.380 | 0.017 |
1 | [25.0, 45.0) | 2024.0 | 44.98 | 1536.0 | 488.0 | 49.64 | 43.67 | 0.241 | -0.128 | 0.008 |
Parameters:
Name | Type | Description | Default
---|---|---|---
column | | The column we wish to analyse | required

Returns:

Type | Description
---|---
df (pd.DataFrame) | A pandas dataframe of the format above
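A minimal sketch of calling bucket_table on a fitted bucketer; the dataset and column name mirror the example at the top of this page.

```python
from skorecard import datasets
from skorecard.bucketers import EqualFrequencyBucketer

X, y = datasets.load_uci_credit_card(return_X_y=True)

bucketer = EqualFrequencyBucketer(n_bins=10, variables=["LIMIT_BAL"])
bucketer.fit(X, y)

# One row per bucket, with counts, event rates, WoE and IV as in the table above.
stats = bucketer.bucket_table("LIMIT_BAL")
print(stats)
```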
fit(self, X, y=None) (inherited)

Fit the bucketer on X and y.
fit_interactive(self, X, y=None, mode='external', **server_kwargs) (inherited)
Fit a bucketer and then interactively edit the fit using a dash app.
Note we are using a jupyterdash app, which supports 3 different modes:
- 'external' (default): Start dash server and print URL
- 'inline': Start dash app inside an Iframe in the jupyter notebook
- 'jupyterlab': Start dash app as a new tab inside jupyterlab
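A minimal sketch of starting the interactive editor; this assumes a Jupyter environment with the dash dependencies installed.

```python
from skorecard import datasets
from skorecard.bucketers import EqualFrequencyBucketer

X, y = datasets.load_uci_credit_card(return_X_y=True)

bucketer = EqualFrequencyBucketer(n_bins=10, variables=["LIMIT_BAL"])

# 'external' starts a dash server and prints its URL;
# use mode="inline" to embed the app in the notebook instead.
bucketer.fit_interactive(X, y, mode="external")
```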
fit_transform(self, X, y=None, **fit_params) (inherited)
Fit to data, then transform it. Fits the transformer to X and y with optional parameters fit_params, and returns a transformed version of X.
Parameters:
X : array-like of shape (n_samples, n_features) Input samples.
y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None Target values (None for unsupervised transformations).
**fit_params : dict Additional fit parameters.
Returns:
X_new : ndarray array of shape (n_samples, n_features_new) Transformed array.
get_params(self, deep=True) (inherited)

Get parameters for this estimator.
plot_bucket(self, column, line='event_rate', format=None, scale=None, width=None, height=None) (inherited)
Plot the buckets.
Parameters:
Name | Type | Description | Default
---|---|---|---
column | | The column we want to visualise | required
line | | The line to plot on the secondary axis. Default is Event Rate. | 'event_rate'
format | | The format of the image, such as 'png'. The default None returns a plotly image. | None
scale | | If format is specified, the scale of the image | None
width | | If format is specified, the width of the image | None
height | | If format is specified, the height of the image | None
Returns:
Type | Description
---|---
plot | plotly fig
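A minimal sketch of plotting the buckets of a fitted bucketer; calling fig.show() assumes the returned object behaves like a regular plotly figure.

```python
from skorecard import datasets
from skorecard.bucketers import EqualFrequencyBucketer

X, y = datasets.load_uci_credit_card(return_X_y=True)

bucketer = EqualFrequencyBucketer(n_bins=10, variables=["LIMIT_BAL"])
bucketer.fit(X, y)

# With format=None a plotly figure is returned; pass format="png" for a static image.
fig = bucketer.plot_bucket("LIMIT_BAL", line="event_rate")
fig.show()
```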
predict(self, X) (inherited)

Applies the transform method. To be used for grid searches.
Parameters:
Name | Type | Description | Default
---|---|---|---
X | pd.DataFrame | The numerical data which will be transformed into the corresponding buckets | required

Returns:

Type | Description
---|---
y (np.array) | Transformed X, such that the values of X are replaced by the corresponding bucket numbers
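A minimal sketch; since predict simply applies transform, the returned array holds bucket indices for the fitted column.

```python
from skorecard import datasets
from skorecard.bucketers import EqualFrequencyBucketer

X, y = datasets.load_uci_credit_card(return_X_y=True)

bucketer = EqualFrequencyBucketer(n_bins=5, variables=["LIMIT_BAL"])
bucketer.fit(X, y)

# predict() applies transform(), so the output contains bucket numbers, not probabilities.
buckets = bucketer.predict(X)
```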
predict_proba(self, X) (inherited)

Applies the transform method. To be used for grid searches.
Parameters:
Name | Type | Description | Default
---|---|---|---
X | pd.DataFrame | The numerical data which will be transformed into the corresponding buckets | required

Returns:

Type | Description
---|---
yhat (np.array) | Transformed X, such that the values of X are replaced by the corresponding bucket numbers
save_yml(self, fout) (inherited)
Save the features bucket to a yaml file.
Parameters:
Name | Type | Description | Default
---|---|---|---
fout | PathLike | file output | required
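A minimal sketch of persisting the fitted buckets; "buckets.yml" is just an illustrative path, and passing a plain path string is assumed to work given the PathLike type above.

```python
from skorecard import datasets
from skorecard.bucketers import EqualFrequencyBucketer

X, y = datasets.load_uci_credit_card(return_X_y=True)

bucketer = EqualFrequencyBucketer(n_bins=10, variables=["LIMIT_BAL"])
bucketer.fit(X, y)

# Write the fitted bucket definitions to disk ("buckets.yml" is an example path).
bucketer.save_yml("buckets.yml")
```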
set_params(self, **params) (inherited)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as sklearn.pipeline.Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters:
**params : dict Estimator parameters.
Returns:
self : estimator instance Estimator instance.
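A minimal sketch of the standard scikit-learn parameter interface on this bucketer; the parameter names come from the __init__ table above.

```python
from skorecard.bucketers import EqualFrequencyBucketer

bucketer = EqualFrequencyBucketer(n_bins=5)

# get_params() returns the constructor arguments as a dict.
print(bucketer.get_params())

# set_params() updates them in place and returns the estimator;
# inside a Pipeline you would use the "<step name>__n_bins" form instead.
bucketer.set_params(n_bins=10, remainder="drop")
```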
summary(self) (inherited)
Display a summary table for columns passed to .fit().
The format is the following:
column | num_prebuckets | num_buckets | dtype |
---|---|---|---|
LIMIT_BAL | 15 | 10 | float64 |
BILL_AMT1 | 15 | 6 | float64 |
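A minimal sketch of requesting the summary of a fitted bucketer; the column names follow the table above.

```python
from skorecard import datasets
from skorecard.bucketers import EqualFrequencyBucketer

X, y = datasets.load_uci_credit_card(return_X_y=True)

bucketer = EqualFrequencyBucketer(n_bins=10, variables=["LIMIT_BAL", "BILL_AMT1"])
bucketer.fit(X, y)

# One row per fitted column, with bucket counts and dtype as in the table above.
bucketer.summary()
```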
transform(self, X, y=None) (inherited)
Transforms an array into the corresponding buckets fitted by the Transformer.
Parameters:
Name | Type | Description | Default
---|---|---|---
X | pd.DataFrame | dataframe which will be transformed into the corresponding buckets | required
y | array | target | None

Returns:

Type | Description
---|---
df (pd.DataFrame) | dataset with transformed features
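A minimal sketch of transforming data with a fitted bucketer; the column name follows the example at the top of this page.

```python
from skorecard import datasets
from skorecard.bucketers import EqualFrequencyBucketer

X, y = datasets.load_uci_credit_card(return_X_y=True)

bucketer = EqualFrequencyBucketer(n_bins=10, variables=["LIMIT_BAL"])
bucketer.fit(X, y)

# LIMIT_BAL values are replaced by their bucket indices; the other columns
# pass through unchanged because remainder='passthrough' by default.
X_buckets = bucketer.transform(X)
```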