
Information Value

Calculate the Information Value (IV) of the features in X.

X must be the output of fitted bucketers.

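\[ IV = \sum_{i} \left( \%\,\text{goods}_i - \%\,\text{bads}_i \right) \cdot WOE_i \]
\[ WOE_i = \ln\left( \frac{\%\,\text{goods}_i}{\%\,\text{bads}_i} \right) \]

Here the sum runs over the buckets i of a feature, and %goods_i (%bads_i) is the share of all goods (bads) that fall into bucket i.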
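As a small worked example (numbers chosen purely for illustration, epsilon ignored), take a feature with two buckets and a sample of 3 goods and 3 bads, where bucket 1 holds 2 goods and 1 bad and bucket 2 holds 1 good and 2 bads:

\[
\begin{aligned}
\%\,\text{goods} &= \left(\tfrac{2}{3}, \tfrac{1}{3}\right), \qquad \%\,\text{bads} = \left(\tfrac{1}{3}, \tfrac{2}{3}\right) \\
WOE_1 &= \ln\frac{2/3}{1/3} = \ln 2, \qquad WOE_2 = \ln\frac{1/3}{2/3} = -\ln 2 \\
IV &= \left(\tfrac{2}{3} - \tfrac{1}{3}\right)\ln 2 + \left(\tfrac{1}{3} - \tfrac{2}{3}\right)(-\ln 2) = \tfrac{2}{3}\ln 2 \approx 0.462
\end{aligned}
\]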

Example:

```python
from skorecard import datasets
from skorecard.bucketers import DecisionTreeBucketer
from skorecard.reporting import iv

# Load the UCI credit card data as features X and binary target y
X, y = datasets.load_uci_credit_card(return_X_y=True)

# Bucket the features first: iv() expects bucketed input
dbt = DecisionTreeBucketer()
X_bins = dbt.fit_transform(X, y)

# Dictionary mapping each feature name to its IV
iv_dict = iv(X_bins, y)
```
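For intuition, the same workflow can be sketched with pandas and NumPy alone. This is a sketch under stated assumptions, not skorecard's implementation: it uses synthetic data, swaps `DecisionTreeBucketer` for quantile bucketing with `pd.qcut`, assumes the target is coded 0 = good and 1 = bad, and introduces a hypothetical helper `iv_single_feature`. Swapping goods and bads flips the sign of both factors in every IV term, so the result does not actually depend on that coding.

```python
import numpy as np
import pandas as pd


def iv_single_feature(bucket: pd.Series, y: pd.Series, epsilon: float = 0.0001) -> float:
    """Hypothetical helper: IV of one bucketed feature, following the formulas above."""
    # Rows: buckets; columns: target classes (assumption: 0 = good, 1 = bad).
    counts = pd.crosstab(bucket, y)
    # Relative counts per bucket, with epsilon added to avoid division by zero.
    pct_goods = counts[0] / counts[0].sum() + epsilon
    pct_bads = counts[1] / counts[1].sum() + epsilon
    woe = np.log(pct_goods / pct_bads)
    return float(((pct_goods - pct_bads) * woe).sum())


# Synthetic data: only "informative" drives the probability of a bad.
rng = np.random.default_rng(0)
n = 5_000
X = pd.DataFrame({"informative": rng.normal(size=n), "noise": rng.normal(size=n)})
p_bad = 1 / (1 + np.exp(-2 * X["informative"]))
y = pd.Series((rng.random(n) < p_bad.to_numpy()).astype(int), name="target")

# Quantile bucketing (10 buckets per column) stands in for a fitted skorecard bucketer.
X_bins = X.apply(lambda col: pd.qcut(col, q=10, labels=False))

# One IV per column, mirroring the dict comprehension in the source listing.
iv_dict = {col: iv_single_feature(X_bins[col], y) for col in X_bins.columns}
print(iv_dict)  # "informative" should score far higher than "noise"
```

Passing raw, unbucketed columns to such a helper would treat every distinct value as its own bucket, which illustrates why the input should be bucketed first.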

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| X | pd.DataFrame | Bucketed features (the output of fitted bucketers). | required |
| y | pd.Series | Target values. | required |
| epsilon | float | Small amount added to the relative counts to avoid division by zero in the WOE calculation. | 0.0001 |
| digits | int or None | Number of significant decimal digits in the IV calculation. | None |
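To see why `epsilon` matters, consider a bucket that contains goods but no bads: its %bads is 0, so the WOE ratio divides by zero and that bucket's IV term becomes infinite. The sketch below uses a hypothetical helper, `iv_from_relative_counts`, with plain NumPy rather than skorecard's internals, and compares `epsilon=0` with the default `0.0001`. Note that with epsilon the WOE of such a bucket is ln((%goods + epsilon) / epsilon), so its size is governed by the choice of epsilon.

```python
import numpy as np


def iv_from_relative_counts(pct_goods: np.ndarray, pct_bads: np.ndarray, epsilon: float) -> float:
    """Hypothetical helper: IV from per-bucket relative counts, with epsilon added to each."""
    goods = pct_goods + epsilon
    bads = pct_bads + epsilon
    # Silence NumPy's divide-by-zero warning so the epsilon=0 case simply yields inf
    with np.errstate(divide="ignore"):
        woe = np.log(goods / bads)
    return float(np.sum((goods - bads) * woe))


pct_goods = np.array([0.5, 0.5])
pct_bads = np.array([0.0, 1.0])  # the first bucket contains no bads

print(iv_from_relative_counts(pct_goods, pct_bads, epsilon=0.0))     # inf
print(iv_from_relative_counts(pct_goods, pct_bads, epsilon=0.0001))  # finite
```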

Returns:

| Name | Type | Description |
|---|---|---|
| IVs | dict | Keys are feature names; values are the corresponding IVs. |
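Because the result is a plain dict, it can be post-processed with standard Python. One common use is shortlisting features. The sketch below introduces a hypothetical helper, `rank_by_iv`, that keeps features whose IV is at least `min_iv` and orders them by decreasing IV; `min_iv` is a threshold you choose, not a skorecard setting.

```python
from typing import Dict, List


def rank_by_iv(iv_dict: Dict[str, float], min_iv: float = 0.0) -> List[str]:
    """Hypothetical helper: names of features with IV >= min_iv, highest IV first."""
    kept = [name for name, value in iv_dict.items() if value >= min_iv]
    return sorted(kept, key=lambda name: iv_dict[name], reverse=True)
```

Calling `rank_by_iv(iv_dict)` on the dictionary from the example returns every feature, sorted from most to least informative.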

Source code in skorecard/reporting/report.py
def iv(X: pd.DataFrame, y: pd.Series, epsilon: float = 0.0001, digits: Optional[int] = None) -> Dict:
    r"""
    Calculate the Information Value (IV) of the features in `X`.

    `X` must be the output of fitted bucketers.

    $$
    IV = \sum { (\% goods - \% bads) } * { WOE }
    $$

    $$
    WOE=\ln (\% { goods } /  \% { bads })
    $$

    Example:

    ```python
    from skorecard import datasets
    from sklearn.model_selection import train_test_split
    from skorecard.bucketers import DecisionTreeBucketer
    from skorecard.reporting import iv

    X, y = datasets.load_uci_credit_card(return_X_y=True)

    dbt = DecisionTreeBucketer()
    X_bins = dbt.fit_transform(X,y)

    iv_dict = iv(X_bins, y)
    ```

    Args:
        X: pd.DataFrame (bucketed) features
        y: pd.Series: target values
        epsilon (float): Amount to be added to relative counts in order to avoid division by zero in the WOE
            calculation.
        digits (int): number of significant decimal digits in the IV calculation

    Returns:
        IVs (dict): Keys are feature names, values are the IV values
    """  # noqa
    return {col: _IV_score(y, X[col], epsilon=epsilon, digits=digits) for col in X.columns}

Last update: 2023-08-08