
Population Stability Index

Calculate the PSI between the features in two dataframes, X1 and X2.

X1 and X2 should be bucketed (outputs of fitted bucketers).

\[ PSI = \sum((\%{ Good } - \%{ Bad }) \times \ln \frac{\%{ Good }}{\%{ Bad }}) \]
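As a rough illustration of the formula above, the sketch below computes the PSI for a single bucketed feature by hand. It is not skorecard's implementation: `psi_single_feature` is a hypothetical helper, and it assumes that the share of rows per bucket in the expected data plays the role of `%Good` while the share in the actual data plays the role of `%Bad`. Because each term is symmetric in the two shares, swapping the roles gives the same value.

```python
import numpy as np
import pandas as pd


def psi_single_feature(expected: pd.Series, actual: pd.Series, epsilon: float = 0.0001) -> float:
    """Hypothetical helper: PSI for one bucketed feature (not skorecard's code)."""
    # Collect every bucket label seen in either sample.
    buckets = sorted(set(expected.unique()) | set(actual.unique()))
    # Share of rows per bucket in each sample; buckets absent from a sample get 0.
    pct_expected = expected.value_counts(normalize=True).reindex(buckets, fill_value=0.0)
    pct_actual = actual.value_counts(normalize=True).reindex(buckets, fill_value=0.0)
    # Add epsilon so an empty bucket does not cause log(0) or division by zero.
    pct_expected = pct_expected + epsilon
    pct_actual = pct_actual + epsilon
    # Sum (share difference) * ln(share ratio) over all buckets.
    return float(((pct_expected - pct_actual) * np.log(pct_expected / pct_actual)).sum())


# Toy bucket labels, made up for illustration only.
expected = pd.Series([0, 0, 1, 1, 2, 2, 2, 2])
actual = pd.Series([0, 1, 1, 1, 2, 2, 2, 2])
print(psi_single_feature(expected, actual))
```

Identical distributions give a PSI of 0, and the value grows as the bucket shares of the two samples drift apart.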

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| X1 | DataFrame | bucketed features, expected | required |
| X2 | DataFrame | bucketed features, actual data | required |
| epsilon | float | Amount to be added to relative counts in order to avoid division by zero in the WOE calculation. | 0.0001 |
| digits | int | number of significant decimal digits in the IV calculation | None |

Examples:

from skorecard import datasets
from sklearn.model_selection import train_test_split
from skorecard.bucketers import DecisionTreeBucketer
from skorecard.reporting import psi

X, y = datasets.load_uci_credit_card(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X,y,
    test_size=0.25,
    random_state=42
)

dbt = DecisionTreeBucketer()
X_train_bins = dbt.fit_transform(X_train,y_train)
X_test_bins = dbt.transform(X_test)

psi_dict = psi(X_train_bins, X_test_bins)
Source code in skorecard/reporting/report.py
def psi(X1: pd.DataFrame, X2: pd.DataFrame, epsilon=0.0001, digits=None) -> Dict:
    """
    Calculate the PSI between the features in two dataframes, `X1` and `X2`.

    `X1` and `X2` should be bucketed (outputs of fitted bucketers).

    $$
    PSI = \\sum((\\%{ Good } - \\%{ Bad }) \\times \\ln \\frac{\\%{ Good }}{\\%{ Bad }})
    $$

    Args:
        X1 (pd.DataFrame): bucketed features, expected
        X2 (pd.DataFrame): bucketed features, actual data
        epsilon (float): Amount to be added to relative counts in order to avoid division by zero in the WOE
            calculation.
        digits (int): number of significant decimal digits in the IV calculation

    Returns:
        Dict: dictionary of PSI values. Keys are feature names, values are the PSI values.

    Examples:

    ```python
    from skorecard import datasets
    from sklearn.model_selection import train_test_split
    from skorecard.bucketers import DecisionTreeBucketer
    from skorecard.reporting import psi

    X, y = datasets.load_uci_credit_card(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X,y,
        test_size=0.25,
        random_state=42
    )

    dbt = DecisionTreeBucketer()
    X_train_bins = dbt.fit_transform(X_train,y_train)
    X_test_bins = dbt.transform(X_test)

    psi_dict = psi(X_train_bins, X_test_bins)
    ```
    """  # noqa
    assert (X1.columns == X2.columns).all(), "X1 and X2 must have same columns"

    y1 = pd.Series(0, index=X1.index)
    y2 = pd.Series(1, index=X2.index)

    X = pd.concat([X1, X2], axis=0)
    y = pd.concat([y1, y2], axis=0).reset_index(drop=True)

    psis = {col: _IV_score(y, X[col], epsilon=epsilon, digits=digits) for col in X1.columns}

    return psis