WoeEncoder
Transformer that encodes the unique values of each feature as their Weight of Evidence (WOE) estimate.
This class has been deprecated in favor of category_encoders.woe.WOEEncoder.
Only works for binary classification (target y has 0 and 1 values).
The weight of evidence is given by: np.log(p(1) / p(0))
The target probability ratio is given by: p(1) / p(0)
For example, for the variable colour, if the proportion of blue rows with target = 1 is 0.8 and the proportion with target = 0 is 0.2, blue is replaced by np.log(0.8 / 0.2) = 1.386 if the log ratio is selected, or by 0.8 / 0.2 = 4 if the plain ratio is selected.
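The arithmetic of the colour example can be checked directly. This uses math.log from the standard library, which, like np.log, is the natural logarithm:

```python
import math

# Proportions from the colour example: share of "blue" rows with target 1 and target 0
p1, p0 = 0.8, 0.2

ratio = p1 / p0        # target probability ratio
woe = math.log(ratio)  # weight of evidence: natural log of the ratio

print(ratio)           # 4.0
print(round(woe, 3))   # 1.386
```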
More formally:
- for each unique value x, consider the corresponding rows in the training set
- compute the fraction of all positives in the training set that fall in these rows
- compute the fraction of all negatives in the training set that fall in these rows
- take the ratio of these two fractions
- take the natural logarithm of that ratio to get the weight of evidence WOE(x), which is positive when x is more representative of positives and negative when it is more representative of negatives
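The steps above can be sketched with pandas as follows. This is a minimal illustration, not skorecard's implementation: the function name woe_per_value and the toy colour data are hypothetical, y is assumed to contain only 0 and 1, and epsilon is added to both relative counts (mirroring this class's epsilon parameter) so that a value with no positives or no negatives does not lead to log(0) or a division by zero.

```python
import numpy as np
import pandas as pd


def woe_per_value(x: pd.Series, y, epsilon: float = 0.0001) -> pd.Series:
    """Hypothetical sketch: weight of evidence for each unique value of x.

    Assumes y is binary (0/1); epsilon is added to both relative counts.
    """
    y = pd.Series(np.asarray(y), index=x.index)
    total_pos = (y == 1).sum()
    total_neg = (y == 0).sum()
    # Count positives and negatives within the rows of each unique value
    pos = (y == 1).groupby(x).sum()
    neg = (y == 0).groupby(x).sum()
    # Fraction of all positives / all negatives that fall in those rows
    pct_pos = pos / total_pos
    pct_neg = neg / total_neg
    # Natural log of the ratio of the two fractions
    return np.log((pct_pos + epsilon) / (pct_neg + epsilon))


colour = pd.Series(["blue", "blue", "blue", "red", "red", "red"])
target = pd.Series([1, 1, 0, 0, 0, 1])
result = woe_per_value(colour, target)
print(result)
```

With three positives and three negatives in total, blue holds 2/3 of the positives and 1/3 of the negatives, so its WOE is close to ln(2) ≈ 0.693, while red gets the mirror-image negative value.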
Examples:
```python
from skorecard import datasets
from skorecard.preprocessing import WoeEncoder

X, y = datasets.load_uci_credit_card(return_X_y=True)
we = WoeEncoder(variables=['EDUCATION'])

# Fit once and keep the encoded DataFrame instead of refitting
X_woe = we.fit_transform(X, y)
X_woe['EDUCATION'].value_counts()
```
Credits: Some inspiration taken from feature_engine.categorical_encoders.
__init__(self, epsilon=0.0001, variables=[], handle_unknown='value')
Constructor for WoeEncoder.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
epsilon | float | Amount added to relative counts to avoid division by zero in the WOE calculation. | 0.0001 |
variables | list | The features to encode. Uses all features if not defined. | [] |
handle_unknown | str | How to handle new values encountered in X on transform(). Options are 'return_nan', 'error' and 'value'. Defaults to 'value', which assigns WOE = 0. | 'value' |
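To make the three handle_unknown options concrete, here is a hypothetical helper, encode_value, that looks up one value in a learned {value: WOE} mapping. It illustrates the documented behaviour only; it is not skorecard's code, and the exception type raised for 'error' is an assumption.

```python
import math


def encode_value(value, mapping, handle_unknown="value"):
    """Hypothetical helper: encode one value using a {value: WOE} mapping from fit()."""
    if value in mapping:
        return mapping[value]
    if handle_unknown == "value":
        return 0.0  # unseen value is treated as neutral: WOE = 0
    if handle_unknown == "return_nan":
        return math.nan
    if handle_unknown == "error":
        # Assumed exception type, for illustration only
        raise ValueError(f"Unknown value {value!r} encountered in transform()")
    raise ValueError(f"Invalid handle_unknown option: {handle_unknown!r}")


woe_map = {"blue": 1.386, "red": -0.405}
print(encode_value("blue", woe_map))                 # 1.386
print(encode_value("green", woe_map))                # 0.0
print(encode_value("green", woe_map, "return_nan"))  # nan
```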
fit(self, X, y)
Calculate the WOE for every column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | np.array | (binned) features | required |
y | np.array | target | required |
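A rough sketch of what fitting amounts to: one {value: WOE} mapping per column, restricted to the selected variables when given. The function fit_woe_maps and its arguments are hypothetical names for illustration. It uses pandas.crosstab to count negatives (column 0) and positives (column 1) per value, assumes a 0/1 target, and follows the sign convention of the formula on this page.

```python
import numpy as np
import pandas as pd


def fit_woe_maps(X, y, variables=None, epsilon=0.0001):
    """Hypothetical sketch of fit(): learn one {value: WOE} mapping per column."""
    y = pd.Series(np.asarray(y), index=X.index)
    columns = variables if variables else list(X.columns)  # all columns if none given
    woe_maps = {}
    for col in columns:
        # Rows: unique values of col; columns: counts of target 0 and target 1
        counts = pd.crosstab(X[col], y)
        # Share of all negatives (column 0) and all positives (column 1) per value
        pct = counts / counts.sum(axis=0)
        woe = np.log((pct[1] + epsilon) / (pct[0] + epsilon))
        woe_maps[col] = woe.to_dict()
    return woe_maps


X = pd.DataFrame({"EDUCATION": [1, 1, 2, 2, 2, 3], "MARRIAGE": [1, 2, 1, 2, 1, 2]})
y = np.array([1, 0, 0, 0, 1, 1])

all_maps = fit_woe_maps(X, y)                           # both columns
edu_maps = fit_woe_maps(X, y, variables=["EDUCATION"])  # only EDUCATION
print(edu_maps)
```

Value 3 of EDUCATION contains no negatives, so epsilon is what keeps its WOE finite (about 8.1 here); value 1 holds equal shares of positives and negatives, so its WOE is exactly 0.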
fit_transform(self, X, y=None, **fit_params) (inherited)
Fit to data, then transform it.
Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters:
X : array-like of shape (n_samples, n_features) Input samples.
y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None Target values (None for unsupervised transformations).
**fit_params : dict Additional fit parameters.
Returns:
X_new : ndarray of shape (n_samples, n_features_new) Transformed array.
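The docstring above is the one from scikit-learn's TransformerMixin.fit_transform, whose default implementation fits the estimator and then calls transform on the same X. The toy MeanCenterer below is hypothetical and only illustrates that equivalence; it assumes scikit-learn and NumPy are installed.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class MeanCenterer(BaseEstimator, TransformerMixin):
    """Toy transformer (hypothetical, for illustration only)."""

    def fit(self, X, y=None):
        # Learn the per-column mean
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        # Subtract the learned mean from every row
        return np.asarray(X, dtype=float) - self.mean_


X = np.array([[1.0, 2.0], [3.0, 6.0]])
a = MeanCenterer().fit_transform(X)    # inherited from TransformerMixin
b = MeanCenterer().fit(X).transform(X)  # explicit fit, then transform
print(np.array_equal(a, b))  # True
```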
get_params(self, deep=True) (inherited)
Get parameters for this estimator.
set_params(self, **params) (inherited)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as sklearn.pipeline.Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters:
**params : dict Estimator parameters.
Returns:
self : estimator instance Estimator instance.
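A short illustration of the <component>__<parameter> syntax with a plain scikit-learn Pipeline. The step names "scale" and "clf" are arbitrary choices for this example; in a pipeline containing this encoder under a step named, say, "woe", the same pattern would presumably address its parameters as woe__epsilon.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# <component>__<parameter>: update C of the step named "clf"
pipe.set_params(clf__C=0.5)
print(pipe.get_params()["clf__C"])  # 0.5

# The change is visible on the step object itself
print(pipe.named_steps["clf"].C)    # 0.5
```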
transform(self, X)
Transform X to weight of evidence encoding.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | pd.DataFrame | dataset | required |
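As a sketch of what transform does with the mappings learned in fit(): each encoded column is mapped value by value to its WOE, and values not seen during fit fall back to 0, matching the default handle_unknown='value'. The function transform_woe and the example mapping are hypothetical; in this sketch, columns without a mapping are left unchanged.

```python
import pandas as pd


def transform_woe(X: pd.DataFrame, woe_maps: dict) -> pd.DataFrame:
    """Hypothetical sketch of transform(): replace each value by its learned WOE.

    Values not seen during fit get WOE = 0, mirroring handle_unknown='value'.
    """
    X_new = X.copy()
    for col, mapping in woe_maps.items():
        # Series.map gives NaN for unseen values; fill those with 0
        X_new[col] = X[col].map(mapping).fillna(0.0)
    return X_new


woe_maps = {"colour": {"blue": 0.693, "red": -0.693}}
X = pd.DataFrame({"colour": ["blue", "red", "green"], "age": [25, 40, 31]})
X_woe = transform_woe(X, woe_maps)
print(X_woe)
```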