
WoeEncoder

Transformer that encodes unique values in features to their Weight of Evidence estimation.

This class has been deprecated in favor of category_encoders.woe.WOEEncoder.

Only works for binary classification (the target y must contain only the values 0 and 1).

The weight of evidence is given by np.log(p(1) / p(0)); the target probability ratio is given by p(1) / p(0).

For example, in the variable colour, suppose that among the blue rows the proportion with target = 1 is 0.8 and the proportion with target = 0 is 0.2. Using the log ratio, blue is replaced by np.log(0.8 / 0.2) = 1.386; using the plain ratio instead, blue would be replaced by 0.8 / 0.2 = 4.
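The arithmetic of the colour example can be checked directly. This sketch only reproduces the two numbers quoted above; the variable names are illustrative.

```python
import math

# Proportions from the colour example: among rows where colour == "blue",
# 80% have target 1 and 20% have target 0.
p1_blue = 0.8
p0_blue = 0.2

ratio = p1_blue / p0_blue  # target probability ratio p(1) / p(0)
woe = math.log(ratio)      # natural logarithm of that ratio

print(f"ratio = {ratio:.3f}, WOE = {woe:.3f}")  # prints "ratio = 4.000, WOE = 1.386"
```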

More formally:

  • for each unique value x, consider the corresponding rows in the training set
  • compute the fraction of all positives in the training set that fall in these rows
  • compute the fraction of all negatives in the training set that fall in these rows
  • take the ratio of these two fractions
  • take the natural logarithm of that ratio to get the weight of evidence WOE(x), which is positive when x is more representative of positives and negative when it is more representative of negatives
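The steps above can be sketched in plain Python. This is a minimal illustration, not skorecard's implementation: `woe_by_category` is a hypothetical helper, and adding `epsilon` to both relative counts is an assumption modelled on the `epsilon` parameter described below.

```python
import math
from collections import Counter


def woe_by_category(x, y, epsilon=0.0001):
    """Hypothetical helper: weight of evidence for each unique value of x.

    x: sequence of category values; y: sequence of 0/1 targets of equal length.
    epsilon is added to each relative count so an empty class does not cause
    division by zero or log(0); where skorecard applies it is assumed here.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    pos_counts = Counter(v for v, t in zip(x, y) if t == 1)
    neg_counts = Counter(v for v, t in zip(x, y) if t == 0)
    total_pos = sum(pos_counts.values())
    total_neg = sum(neg_counts.values())
    if total_pos == 0 or total_neg == 0:
        raise ValueError("y must contain both 0 and 1")

    woe = {}
    for value in set(x):
        # Fraction of all positives / negatives that fall in rows with this value.
        pct_pos = pos_counts[value] / total_pos + epsilon
        pct_neg = neg_counts[value] / total_neg + epsilon
        # Positive when the value is more representative of positives.
        woe[value] = math.log(pct_pos / pct_neg)
    return woe


colour = ["blue", "blue", "blue", "red", "red", "green"]
target = [1, 1, 0, 0, 0, 1]
print(woe_by_category(colour, target))
```

Because each count is divided by the class total, WOE(x) equals np.log(p(1) / p(0)) computed within the category plus np.log(total negatives / total positives); the two coincide only when the classes are balanced.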

More details:

Examples:

from skorecard import datasets
from skorecard.preprocessing import WoeEncoder

X, y = datasets.load_uci_credit_card(return_X_y=True)
we = WoeEncoder(variables=['EDUCATION'])
X_woe = we.fit_transform(X, y)  # EDUCATION values replaced by their WOE
X_woe['EDUCATION'].value_counts()  # frequency of each resulting WOE value

Credits: Some inspiration taken from feature_engine.categorical_encoders.

__init__(self, epsilon=0.0001, variables=[], handle_unknown='value') special

Constructor for WoeEncoder.

Parameters:

Name | Type | Description | Default
epsilon | float | Amount added to the relative counts to avoid division by zero in the WOE calculation. | 0.0001
variables | list | The features to encode. Uses all features if not defined. | []
handle_unknown | str | How to handle new values encountered in X on transform(). Options are 'value' (assume WOE = 0), 'return_nan' and 'error'. | 'value'
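The three handle_unknown options can be illustrated with a small stand-alone function. This is a sketch of the documented behaviour, not skorecard's code: `apply_woe` is a hypothetical helper, and raising `ValueError` for 'error' is an assumption about the exception type.

```python
import math


def apply_woe(values, mapping, handle_unknown="value"):
    """Hypothetical helper: map values to fitted WOEs, treating values unseen
    during fit according to handle_unknown ('value', 'return_nan' or 'error')."""
    if handle_unknown not in ("value", "return_nan", "error"):
        raise ValueError(f"unsupported handle_unknown: {handle_unknown!r}")
    encoded = []
    for v in values:
        if v in mapping:
            encoded.append(mapping[v])
        elif handle_unknown == "value":
            encoded.append(0.0)  # unknown value is assumed to have WOE = 0
        elif handle_unknown == "return_nan":
            encoded.append(math.nan)
        else:
            raise ValueError(f"unknown value encountered: {v!r}")
    return encoded


fitted = {"blue": 0.693, "red": -1.2}
print(apply_woe(["blue", "purple"], fitted))  # prints [0.693, 0.0]
```

A WOE of 0 corresponds to a ratio of 1, that is, a value treated as equally representative of positives and negatives, which is why it is a neutral default for unseen values.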

fit(self, X, y)

Calculate the WOE for every column.

Parameters:

Name | Type | Description | Default
X | np.array | (binned) features | required
y | np.array | target | required

fit_transform(self, X, y=None, **fit_params) inherited

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

X : array-like of shape (n_samples, n_features)
    Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
    Target values (None for unsupervised transformations).

**fit_params : dict
    Additional fit parameters.

Returns

X_new : ndarray of shape (n_samples, n_features_new)
    Transformed array.

get_params(self, deep=True) inherited

Get parameters for this estimator.

Parameters

deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params : dict
    Parameter names mapped to their values.

set_params(self, **params) inherited

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as sklearn.pipeline.Pipeline). The latter have parameters of the form <component>__<parameter>, so that it is possible to update each component of a nested object.

Parameters

**params : dict
    Estimator parameters.

Returns

self : estimator instance
    Estimator instance.
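The <component>__<parameter> scheme can be seen on a generic scikit-learn pipeline. This sketch uses only scikit-learn classes; the step names "scale" and "clf" are arbitrary choices for illustration. If a WoeEncoder were used as a step named, say, "woe", its epsilon would by the same convention be addressed as woe__epsilon (an assumption about how you name the step, not a skorecard API).

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Step names are arbitrary; they become the <component> prefix.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])

# Update the C parameter of the "clf" step through the pipeline.
pipe.set_params(clf__C=0.1)

# get_params(deep=True) exposes nested parameters under the same names.
print(pipe.get_params()["clf__C"])  # prints 0.1
```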

transform(self, X)

Transform X to weight of evidence encoding.

Parameters:

Name | Type | Description | Default
X | pd.DataFrame | dataset | required

Last update: 2021-11-24