Population Stability Index

Calculate the PSI between the features in two dataframes, X1 and X2.

X1 and X2 should be bucketed (outputs of fitted bucketers).

\[ PSI = \sum((\%{ Good } - \%{ Bad }) \times \ln \frac{\%{ Good }}{\%{ Bad }}) \]

Parameters:

Name	Type	Description	Default
`X1`	`DataFrame`	bucketed features, expected	required
`X2`	`DataFrame`	bucketed features, actual data	required
`epsilon`	`float`	Amount to be added to relative counts in order to avoid division by zero in the WOE calculation.	`0.0001`
`digits`	`int`	number of significant decimal digits in the IV calculation	`None`
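
To make the formula and the role of `epsilon` concrete, here is a minimal sketch that computes the PSI of a single bucketed feature by hand. It is not skorecard's implementation: `psi_sketch` is a hypothetical helper, and it assumes that the expected sample plays the role of one class and the actual sample the other, with `epsilon` added to each bucket share before the ratio is taken.

```python
import numpy as np
import pandas as pd


def psi_sketch(expected: pd.Series, actual: pd.Series, epsilon: float = 0.0001) -> float:
    """Hypothetical helper: PSI of one bucketed feature (not skorecard's code)."""
    # Share of rows that fall into each bucket, per sample
    p_exp = expected.value_counts(normalize=True)
    p_act = actual.value_counts(normalize=True)

    # Align both samples on the union of buckets; a bucket absent from a sample gets share 0
    buckets = p_exp.index.union(p_act.index)
    p_exp = p_exp.reindex(buckets, fill_value=0.0)
    p_act = p_act.reindex(buckets, fill_value=0.0)

    # epsilon keeps the ratio and the logarithm finite when a bucket is empty in one sample
    p_exp = p_exp + epsilon
    p_act = p_act + epsilon

    return float(np.sum((p_act - p_exp) * np.log(p_act / p_exp)))


same = psi_sketch(pd.Series([0, 0, 1, 1]), pd.Series([0, 0, 1, 1]))
shifted = psi_sketch(pd.Series([0, 0, 1, 1]), pd.Series([0, 1, 1, 1]))
```

Identical bucket distributions give a PSI of zero, and the value grows as the actual distribution moves away from the expected one. Each term of the sum is non-negative because the difference and the logarithm always share a sign.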

Examples:

from skorecard import datasets
from sklearn.model_selection import train_test_split
from skorecard.bucketers import DecisionTreeBucketer
from skorecard.reporting import psi

X, y = datasets.load_uci_credit_card(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.25,
    random_state=42
)

dbt = DecisionTreeBucketer()
X_train_bins = dbt.fit_transform(X_train, y_train)
X_test_bins = dbt.transform(X_test)

psi_dict = psi(X_train_bins, X_test_bins)
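
The returned dictionary maps feature names to PSI values, so it can be ranked and labelled directly. The sketch below uses the common rule-of-thumb cut-offs of 0.1 and 0.25; these thresholds are a convention from credit-risk practice, not something skorecard defines, and `label_psi` is a hypothetical helper. The input values are made up for illustration and are not output from the example above.

```python
from typing import Dict


def label_psi(
    psi_values: Dict[str, float], moderate: float = 0.1, major: float = 0.25
) -> Dict[str, str]:
    """Hypothetical helper: rank features by PSI and attach a rule-of-thumb label."""
    labels = {}
    # Largest PSI first, so the most shifted features come first in the result
    for feature, value in sorted(psi_values.items(), key=lambda kv: kv[1], reverse=True):
        if value >= major:
            labels[feature] = "major shift"
        elif value >= moderate:
            labels[feature] = "moderate shift"
        else:
            labels[feature] = "stable"
    return labels


# Made-up PSI values, for illustration only
example_psi = {"feature_a": 0.02, "feature_b": 0.31, "feature_c": 0.12}
labels = label_psi(example_psi)
```

In practice, `psi_dict` from the example would be passed in place of `example_psi`, with cut-offs adjusted to the monitoring policy in use.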

Source code in skorecard/reporting/report.py

def psi(X1: pd.DataFrame, X2: pd.DataFrame, epsilon=0.0001, digits=None) -> Dict:
    """
    Calculate the PSI between the features in two dataframes, `X1` and `X2`.

    `X1` and `X2` should be bucketed (outputs of fitted bucketers).

    $$
    PSI = \\sum((\\%{ Good } - \\%{ Bad }) \\times \\ln \\frac{\\%{ Good }}{\\%{ Bad }})
    $$

    Args:
        X1 (pd.DataFrame): bucketed features, expected
        X2 (pd.DataFrame): bucketed features, actual data
        epsilon (float): Amount to be added to relative counts in order to avoid division by zero in the WOE
            calculation.
        digits (int): number of significant decimal digits in the IV calculation

    Returns:
        dict: PSI values keyed by feature name.

    Examples:

    ```python
    from skorecard import datasets
    from sklearn.model_selection import train_test_split
    from skorecard.bucketers import DecisionTreeBucketer
    from skorecard.reporting import psi

    X, y = datasets.load_uci_credit_card(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y,
        test_size=0.25,
        random_state=42
    )

    dbt = DecisionTreeBucketer()
    X_train_bins = dbt.fit_transform(X_train, y_train)
    X_test_bins = dbt.transform(X_test)

    psi_dict = psi(X_train_bins, X_test_bins)
    ```
    """  # noqa
    assert (X1.columns == X2.columns).all(), "X1 and X2 must have same columns"

    y1 = pd.Series(0, index=X1.index)
    y2 = pd.Series(1, index=X2.index)

    X = pd.concat([X1, X2], axis=0)
    y = pd.concat([y1, y2], axis=0).reset_index(drop=True)

    psis = {col: _IV_score(y, X[col], epsilon=epsilon, digits=digits) for col in X1.columns}

    return psis