# Measuring bucketed distribution shifts

## Population stability index (PSI)

The population stability index (PSI) is a common measure of how similar two univariate distributions are.

It is given by the following formula:

$PSI=\sum_{i=1}^{N_{bins}} \left(\%x_{i}^{actual} - \%x_{i}^{expected}\right) \log\frac{\%x_{i}^{actual}}{\%x_{i}^{expected}}$

where the sum runs over all the buckets of the feature $x$, and $\%x_{i}$ is the fraction of observations that fall in bucket $i$.
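To make the formula concrete, here is a minimal sketch that evaluates it on made-up bucket shares. The helper name `psi_from_proportions` and the numbers are hypothetical, and the natural logarithm is assumed; this is not skorecard's implementation.

```python
import numpy as np

def psi_from_proportions(expected, actual):
    """PSI between two sets of bucket proportions (each summing to 1)."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Each bucket contributes (actual - expected) * ln(actual / expected).
    return float(np.sum((actual - expected) * np.log(actual / expected)))

expected = [0.25, 0.25, 0.25, 0.25]  # hypothetical bucket shares in the reference set
actual = [0.20, 0.25, 0.25, 0.30]    # hypothetical bucket shares in the new set

print(psi_from_proportions(expected, actual))
print(psi_from_proportions(expected, expected))  # identical distributions give 0.0
```

Every term in the sum is non-negative, so the PSI is zero only when the two sets of proportions coincide, and swapping the two datasets leaves the value unchanged.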

skorecard provides a simple function to calculate the PSI between two datasets.
Since two datasets are needed, we split X and y into a train and a test set.

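from skorecard import datasets
from sklearn.model_selection import train_test_split
from skorecard.bucketers import DecisionTreeBucketer

# Load the example data (assumed here to be the UCI credit card dataset shipped with skorecard)
X, y = datasets.load_uci_credit_card(return_X_y=True)

# Hold out 25% of the rows as the second dataset for the PSI comparison
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.25,
    random_state=42
)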


By definition, the PSI acts on bucketed features.

Failing to bucket the features still yields a PSI value, but in that case the PSI is computed over all the unique values. For numerical features, this returns artificially high and meaningless values.
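The effect can be illustrated on synthetic data with a small sketch. Everything here is an assumption made for illustration: the helper `psi_from_counts`, the `eps` floor used to avoid `log(0)`, and the quantile buckets (skorecard's own bucketers and PSI function may handle these details differently).

```python
import numpy as np

def psi_from_counts(expected_counts, actual_counts, eps=1e-6):
    """PSI from per-bucket counts; eps (an assumption) avoids log(0) for empty buckets."""
    expected = np.clip(expected_counts / expected_counts.sum(), eps, None)
    actual = np.clip(actual_counts / actual_counts.sum(), eps, None)
    return float(np.sum((actual - expected) * np.log(actual / expected)))

rng = np.random.default_rng(0)
a = rng.normal(size=1000)  # synthetic "expected" sample
b = rng.normal(size=1000)  # synthetic "actual" sample from the same distribution

# Unbucketed: every distinct value acts as its own bucket.
_, inverse = np.unique(np.concatenate([a, b]), return_inverse=True)
n_values = inverse.max() + 1
psi_raw = psi_from_counts(
    np.bincount(inverse[: a.size], minlength=n_values),
    np.bincount(inverse[a.size :], minlength=n_values),
)

# Bucketed: ten quantile buckets whose edges come from the first sample only.
inner_edges = np.quantile(a, np.linspace(0.1, 0.9, 9))
psi_binned = psi_from_counts(
    np.bincount(np.searchsorted(inner_edges, a), minlength=10),
    np.bincount(np.searchsorted(inner_edges, b), minlength=10),
)

print(f"unbucketed: {psi_raw:.3f}, bucketed: {psi_binned:.3f}")
```

Both samples come from the same distribution, yet the unbucketed value is large because almost no raw value appears in both samples, so nearly every "bucket" is empty on one side.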

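dbt = DecisionTreeBucketer()

# Learn the bucket boundaries on the train set only, then apply the same boundaries to the test set
X_train_bins = dbt.fit_transform(X_train, y_train)
X_test_bins = dbt.transform(X_test)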


With both sets bucketed, the PSI can be calculated for each feature:

from skorecard.reporting import psi

psi_dict = psi(X_train_bins, X_test_bins)
psi_dict

{'EDUCATION': 0.0005202506508081382,
'MARRIAGE': 0.0003497580712116056,
'LIMIT_BAL': 0.013577676978376134,
'BILL_AMT1': 0.017027519474734677}
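One way to read such a result is sketched below. The cut-offs of 0.1 and 0.25 are a widely quoted rule of thumb rather than anything defined by skorecard, and `psi_label` is a hypothetical helper; both should be treated as assumptions to adapt to the use case.

```python
# PSI values copied from the output above.
psi_dict = {
    "EDUCATION": 0.0005202506508081382,
    "MARRIAGE": 0.0003497580712116056,
    "LIMIT_BAL": 0.013577676978376134,
    "BILL_AMT1": 0.017027519474734677,
}

def psi_label(value, minor=0.1, major=0.25):
    """Map a PSI value to a coarse label using rule-of-thumb cut-offs (assumed)."""
    if value < minor:
        return "stable"
    if value < major:
        return "moderate shift"
    return "significant shift"

# List the features from most to least shifted.
for feature, value in sorted(psi_dict.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{feature:<10} {value:.4f}  {psi_label(value)}")
```

Under these assumed cut-offs, all four features fall well inside the "stable" range, which is what one would expect from a random train/test split of the same dataset.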


# Univariate predictive power

## Information value (IV)

The information value (IV) is the same quantity as the PSI, except that it is computed between the two subsets of the feature set defined by the target: the rows with y=0 and the rows with y=1.

In other words, it is given by the formula:

$IV=\sum_{i=1}^{N_{bins}} \left(\%x_{i}^{y=0} - \%x_{i}^{y=1}\right) \log\frac{\%x_{i}^{y=0}}{\%x_{i}^{y=1}}$

dbt = DecisionTreeBucketer()
X_bins = dbt.fit_transform(X, y)
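The IV formula can be sketched for a single bucketed feature as follows. The function `information_value`, the `eps` floor, and the toy data are hypothetical, and the sketch assumes the buckets are encoded as non-negative integers and that the natural logarithm is used; it is not skorecard's implementation.

```python
import numpy as np

def information_value(bins, y, eps=1e-6):
    """IV of one bucketed feature against a binary target (0/1)."""
    bins = np.asarray(bins)
    y = np.asarray(y)
    n_bins = bins.max() + 1
    # Share of the y=0 rows, and of the y=1 rows, that fall in each bucket.
    pct_0 = np.bincount(bins[y == 0], minlength=n_bins) / np.sum(y == 0)
    pct_1 = np.bincount(bins[y == 1], minlength=n_bins) / np.sum(y == 1)
    # eps (an assumption) avoids log(0) when a bucket contains only one class.
    pct_0 = np.clip(pct_0, eps, None)
    pct_1 = np.clip(pct_1, eps, None)
    return float(np.sum((pct_0 - pct_1) * np.log(pct_0 / pct_1)))

# Hypothetical toy data: three buckets, eight observations.
bins = [0, 0, 0, 1, 1, 2, 2, 2]
y = [0, 0, 1, 0, 1, 1, 1, 0]

print(information_value(bins, y))
```

In this toy example bucket 0 is richer in y=0 rows and bucket 2 is richer in y=1 rows, so both contribute to the IV, while bucket 1 holds the same share of each class and contributes nothing.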


To compute the IV, skorecard provides a handy function.
It takes the (binned) feature set X and the target y.

from skorecard.reporting import iv

# Store the result under a different name so the imported iv function is not shadowed
iv_dict = iv(X_bins, y)
iv_dict

{'EDUCATION': 0.036451028950383324,
'MARRIAGE': 0.009494315565036299,
'LIMIT_BAL': 0.17922043483265943,
'BILL_AMT1': 0.05239237644085838}


Last update: 2021-11-24