
SkorecardPipeline

A scikit-learn Pipeline with several attributes and methods added.

This Pipeline of bucketers behaves more like a bucketer and adds:

  • .summary(): See which columns are bucketed
  • .plot_bucket(): Plot buckets of a column
  • .bucket_table(): Table with buckets of a column
  • .save_to_yaml(): Save information necessary for bucketing to a YAML file
  • .features_bucket_mapping_: Access bucketing information
  • .fit_interactive(): Edit fitted buckets interactively in a dash app

from skorecard.pipeline.pipeline import SkorecardPipeline
from skorecard.bucketers import DecisionTreeBucketer, OrdinalCategoricalBucketer
from skorecard import datasets

pipe = SkorecardPipeline([
    ('decisiontreebucketer', DecisionTreeBucketer(variables=["LIMIT_BAL", "BILL_AMT1"], max_n_bins=5)),
    ('ordinalcategoricalbucketer', OrdinalCategoricalBucketer(variables=["EDUCATION", "MARRIAGE"], tol=0.05)),
])

df = datasets.load_uci_credit_card(as_frame=True)
features = ["LIMIT_BAL", "BILL_AMT1", "EDUCATION", "MARRIAGE"]
X = df[features]
y = df["default"].values

pipe.fit(X, y)
pipe.bucket_table('LIMIT_BAL')

bucket_tables_ property readonly

Retrieve bucket tables.

Used by .bucket_table()

classes_ inherited property readonly

The class labels. Only exists if the last step is a classifier.

feature_names_in_ inherited property readonly

Names of features seen during the first step's fit method.

features_bucket_mapping_ property readonly

Retrieve features bucket mapping.
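
For example, after fitting the pipe from the example above, the combined bucketing information can be inspected; a minimal sketch (the exact return type is a skorecard implementation detail):

# After pipe.fit(X, y), the bucketing information of all steps is combined.
mapping = pipe.features_bucket_mapping_
print(mapping)  # describes the fitted buckets per column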

n_features_in_ inherited property readonly

Number of features seen during the first step's fit method.

named_steps inherited property readonly

Access the steps by name.

Read-only attribute to access any step by its given name. Keys are step names and values are the step objects.

summary_dict_: Dict property readonly

Retrieve summary_dicts and combine.

Used by .summary()

__class__ (type) inherited

Metaclass for defining Abstract Base Classes (ABCs).

Use this metaclass to create an ABC. An ABC can be subclassed directly, and then acts as a mix-in class. You can also register unrelated concrete classes (even built-in classes) and unrelated ABCs as 'virtual subclasses' -- these and their descendants will be considered subclasses of the registering ABC by the built-in issubclass() function, but the registering ABC won't show up in their MRO (Method Resolution Order) nor will method implementations defined by the registering ABC be callable (not even via super()).

__instancecheck__(cls, instance) special

Override for isinstance(instance, cls).

__new__(mcls, name, bases, namespace, **kwargs) special staticmethod

Create and return a new object. See help(type) for accurate signature.

__subclasscheck__(cls, subclass) special

Override for issubclass(subclass, cls).

register(cls, subclass)

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

__init__(self, steps, *, memory=None, verbose=False) special

Wraps sklearn Pipeline.

bucket_table(self, column) inherited

Generates the statistics for the buckets of a particular column.

The pre-buckets are matched to the post-buckets, so that the user has a much clearer understanding of how the BucketingProcess ends up with the final buckets. An example:

bucket  label          Count   Count (%)  Non-event  Event  % Event  % Non-event  Event Rate     WoE     IV
0       (-inf, 25.0)     61.0       1.36       57.0    4.0     0.41         1.62       0.066   1.380  0.017
1       [25.0, 45.0)   2024.0      44.98     1536.0  488.0    49.64        43.67       0.241  -0.128  0.008

Parameters:

  • column : The column we wish to analyse. (required)

Returns:

df (pd.DataFrame) : A pandas DataFrame in the format above.
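
A short usage sketch, assuming the pipe from the example at the top of this page has been fitted:

table = pipe.bucket_table("LIMIT_BAL")
print(table)  # columns as in the example above: label, Count, Event Rate, WoE, IV, ...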

decision_function(self, X) inherited

Transform the data, and apply decision_function with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls decision_function method. Only valid if the final estimator implements decision_function.

Parameters

X : iterable Data to predict on. Must fulfill input requirements of first step of the pipeline.

Returns

y_score : ndarray of shape (n_samples, n_classes) Result of calling decision_function on the final estimator.

fit(self, X, y=None, **fit_params) inherited

Fit the model.

Fit all the transformers one after the other and transform the data. Finally, fit the transformed data using the final estimator.

Parameters

X : iterable Training data. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None Training targets. Must fulfill label requirements for all steps of the pipeline.

**fit_params : dict of string -> object Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

Returns

self : object Pipeline with fitted steps.

fit_interactive(self, X, y=None, mode='external')

Fit a bucketer and then interactively edit the fit using a dash app.

Note that this uses a JupyterDash app, which supports three different modes (see the sketch after this list):

  • 'external' (default): Start dash server and print URL
  • 'inline': Start dash app inside an Iframe in the jupyter notebook
  • 'jupyterlab': Start dash app as a new tab inside jupyterlab
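
A minimal sketch of interactive editing, using the example pipe (requires a running notebook/dash environment):

pipe.fit_interactive(X, y, mode="external")  # fits the buckets, then starts the dash server and prints its URL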

fit_predict(self, X, y=None, **fit_params) inherited

Transform the data, and apply fit_predict with the final estimator.

Call fit_transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls fit_predict method. Only valid if the final estimator implements fit_predict.

Parameters

X : iterable Training data. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None Training targets. Must fulfill label requirements for all steps of the pipeline.

**fit_params : dict of string -> object Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

Returns

y_pred : ndarray Result of calling fit_predict on the final estimator.

fit_transform(self, X, y=None, **fit_params) inherited

Fit the model and transform with the final estimator.

Fits all the transformers one after the other and transform the data. Then uses fit_transform on transformed data with the final estimator.

Parameters

X : iterable Training data. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None Training targets. Must fulfill label requirements for all steps of the pipeline.

**fit_params : dict of string -> object Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

Returns

Xt : ndarray of shape (n_samples, n_transformed_features) Transformed samples.

get_feature_names_out(self, input_features=None) inherited

Get output feature names for transformation.

Transform input features using the pipeline.

Parameters

input_features : array-like of str or None, default=None Input features.

Returns

feature_names_out : ndarray of str objects Transformed feature names.

get_params(self, deep=True) inherited

Get parameters for this estimator.

Returns the parameters given in the constructor as well as the estimators contained within the steps of the Pipeline.

Parameters

deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params : mapping of string to any Parameter names mapped to their values.
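
For instance, with the example pipeline from the top of this page, step parameters appear under 'stepname__parameter' keys; a short sketch:

params = pipe.get_params(deep=True)
print(params["decisiontreebucketer__max_n_bins"])  # 5 in the example above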

inverse_transform(self, Xt) inherited

Apply inverse_transform for each step in a reverse order.

All estimators in the pipeline must support inverse_transform.

Parameters

Xt : array-like of shape (n_samples, n_transformed_features) Data samples, where n_samples is the number of samples and n_features is the number of features. Must fulfill input requirements of last step of pipeline's inverse_transform method.

Returns

Xt : ndarray of shape (n_samples, n_features) Inverse transformed data, that is, data in the original feature space.

plot_bucket(self, column, line='event_rate', format=None, scale=None, width=None, height=None) inherited

Plot the buckets.

Parameters:

  • column : The column we want to visualise. (required)
  • line : The line to plot on the secondary axis; default is the event rate. (default: 'event_rate')
  • format : The format of the image, such as 'png'. The default None returns a plotly figure. (default: None)
  • scale : If format is specified, the scale of the image. (default: None)
  • width : If format is specified, the width of the image. (default: None)
  • height : If format is specified, the height of the image. (default: None)

Returns:

plot : A plotly fig.
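
A minimal usage sketch with the fitted example pipe:

fig = pipe.plot_bucket("LIMIT_BAL")  # default format=None: a plotly figure with the event rate line
fig.show()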

predict(self, X, **predict_params) inherited

Transform the data, and apply predict with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls predict method. Only valid if the final estimator implements predict.

Parameters

X : iterable Data to predict on. Must fulfill input requirements of first step of the pipeline.

**predict_params : dict of string -> object Parameters to the predict called at the end of all transformations in the pipeline. Note that while this may be used to return uncertainties from some models with return_std or return_cov, uncertainties that are generated by the transformations in the pipeline are not propagated to the final estimator.

(New in scikit-learn 0.20: **predict_params.)

Returns

y_pred : ndarray Result of calling predict on the final estimator.

predict_log_proba(self, X, **predict_log_proba_params) inherited

Transform the data, and apply predict_log_proba with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls predict_log_proba method. Only valid if the final estimator implements predict_log_proba.

Parameters

X : iterable Data to predict on. Must fulfill input requirements of first step of the pipeline.

**predict_log_proba_params : dict of string -> object Parameters to the predict_log_proba called at the end of all transformations in the pipeline.

Returns

y_log_proba : ndarray of shape (n_samples, n_classes) Result of calling predict_log_proba on the final estimator.

predict_proba(self, X, **predict_proba_params) inherited

Transform the data, and apply predict_proba with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls predict_proba method. Only valid if the final estimator implements predict_proba.

Parameters

X : iterable Data to predict on. Must fulfill input requirements of first step of the pipeline.

**predict_proba_params : dict of string -> object Parameters to the predict_proba called at the end of all transformations in the pipeline.

Returns

y_proba : ndarray of shape (n_samples, n_classes) Result of calling predict_proba on the final estimator.

save_yml(self, fout)

Save the features bucket to a yaml file.

Parameters:

  • fout : File output. (required)
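
A minimal sketch, assuming fout accepts an open file handle:

with open("buckets.yml", "w") as fout:
    pipe.save_yml(fout)  # write the fitted bucket information to YAML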

score(self, X, y=None, sample_weight=None) inherited

Transform the data, and apply score with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls score method. Only valid if the final estimator implements score.

Parameters

X : iterable Data to predict on. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None Targets used for scoring. Must fulfill label requirements for all steps of the pipeline.

sample_weight : array-like, default=None If not None, this argument is passed as sample_weight keyword argument to the score method of the final estimator.

Returns

score : float Result of calling score on the final estimator.

score_samples(self, X) inherited

Transform the data, and apply score_samples with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls score_samples method. Only valid if the final estimator implements score_samples.

Parameters

X : iterable Data to predict on. Must fulfill input requirements of first step of the pipeline.

Returns

y_score : ndarray of shape (n_samples,) Result of calling score_samples on the final estimator.

set_params(self, **kwargs) inherited

Set the parameters of this estimator.

Valid parameter keys can be listed with get_params(). Note that you can directly set the parameters of the estimators contained in steps.

Parameters

**kwargs : dict Parameters of this estimator or parameters of estimators contained in steps. Parameters of the steps may be set using its name and the parameter name separated by a '__'.

Returns

self : object Pipeline class instance.
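
For example, step parameters of the example pipeline can be updated with the 'stepname__parameter' convention before refitting; a short sketch:

pipe.set_params(decisiontreebucketer__max_n_bins=8)
pipe.fit(X, y)  # refit with the new setting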

summary(self) inherited

Display a summary table for columns passed to .fit().

The format is the following:

column      num_prebuckets  num_buckets  dtype
LIMIT_BAL               15           10  float64
BILL_AMT1               15            6  float64
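
For instance, with the fitted example pipe:

print(pipe.summary())  # one row per bucketed column, as in the table above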

transform(self, X) inherited

Transform the data, and apply transform with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls transform method. Only valid if the final estimator implements transform.

This also works where the final estimator is None, in which case all prior transformations are applied.

Parameters

X : iterable Data to transform. Must fulfill input requirements of first step of the pipeline.

Returns

Xt : ndarray of shape (n_samples, n_transformed_features) Transformed data.
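
A short sketch with the fitted example pipe; for a pipeline of bucketers, the output holds the assigned bucket per column (a skorecard behaviour, not guaranteed by sklearn itself):

X_buckets = pipe.transform(X)
print(X_buckets[:5])  # first rows of the bucketed data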

