Building a scorecard model¶
This tutorial shows how to build a skorecard model.
Start by loading the data and performing the train-test split:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from skorecard.datasets import load_credit_card
from sklearn.model_selection import train_test_split
data = load_credit_card(as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
data.drop(["y"], axis=1), data["y"], test_size=0.25, random_state=42
)
Load the buckets and the selected features that were created in the previous tutorials.
import yaml
buckets_dict = yaml.safe_load(open("buckets.yml"))
selected_features = ["x6", "x8", "x10", "x18", "x1", "x19", "x20", "x21", "x23", "x22", "x3", "x17", "x16"]
Define the scorecard model¶
A Skorecard class has two main components:
- the bucketer
- the list of selected features (if None is passed, it uses all the features defined in the bucketer)

It behaves like a scikit-learn model.
from skorecard import Skorecard
from skorecard.bucketers import UserInputBucketer
scorecard = Skorecard(bucketing=UserInputBucketer(buckets_dict), variables=selected_features, calculate_stats=True)
scorecard = scorecard.fit(X_train, y_train)
The get_stats method returns the coefficients with their standard errors and p-values:
scorecard.get_stats()
Retrieve the model performance as with any sklearn classifier:
from sklearn.metrics import roc_auc_score, classification_report
proba_train = scorecard.predict_proba(X_train)[:, 1]
proba_test = scorecard.predict_proba(X_test)[:, 1]
print(f"AUC train:{round(roc_auc_score(y_train, proba_train),4)}")
print(f"AUC test :{round(roc_auc_score(y_test, proba_test),4)}\n")
print(classification_report(y_test, scorecard.predict(X_test)))
Removing features based on their statistical properties¶
The feature set can be pruned further based on statistical properties.
In a scorecard model, the coefficients are expected to lie between -1 and 0.
Coefficients smaller than -1 indicate that the model relies heavily on that feature (likely to overfit), while positive coefficients show an inverted trend.
Additionally, the p-values of the coefficients should be smaller than 0.05 (or 0.01).
Looking at the stats table above, this suggests removing the following features: ['x21', 'x16', 'x17', 'x22'].
Note that feature removal should be done carefully: every time a feature is removed, the remaining coefficients may converge elsewhere, and would hence give a different model with a different interpretation.
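Instead of hard-coding the list, the candidates for removal can also be derived from the stats table. A minimal sketch, assuming the coefficient and p-value columns returned by get_stats() are named "Coef." and "P>|z|" (the p-value column name is an assumption, following the statsmodels summary convention):
stats = scorecard.get_stats()
stats.index = ["Const"] + selected_features  # label the rows with the feature names
candidates = stats.drop("Const")
# Flag coefficients outside [-1, 0] or with a p-value above 0.05 ("P>|z|" is assumed here)
mask = (candidates["Coef."] > 0) | (candidates["Coef."] < -1) | (candidates["P>|z|"] > 0.05)
print(candidates[mask].index.tolist())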
from IPython.display import display
new_feats = [feat for feat in selected_features if feat not in ["x21", "x16", "x17", "x22"]]
scorecard = Skorecard(UserInputBucketer(buckets_dict), variables=new_feats, calculate_stats=True)
scorecard = scorecard.fit(X_train, y_train)
model_stats = scorecard.get_stats()
model_stats.index = ["Const"] + new_feats
display(model_stats)
proba_train = scorecard.predict_proba(X_train)[:, 1]
proba_test = scorecard.predict_proba(X_test)[:, 1]
print(f"AUC train:{round(roc_auc_score(y_train, proba_train),4)}")
print(f"AUC test :{round(roc_auc_score(y_test, proba_test),4)}")
Retrieving the transformed data¶
Buckets and WoE transformations are available directly in a fitted skorecard model:
from IPython.display import display
print("Top 5 rows and the transformed buckets")
display(scorecard.bucket_transform(X_test)[new_feats].head())
print("\nTop 5 rows and the transformed WoEs")
display(scorecard.woe_transform(X_test)[new_feats].head())
Getting the feature importance (to be integrated into the skorecard class)¶
To speak of feature importance, we should consider both the coefficient and the IV of each feature. The importance can be approximated as the product of the two numbers.
from skorecard.reporting import iv
X_train_bins = scorecard.bucket_transform(X_train)
iv_dict = iv(X_train_bins, y_train)
iv_values = pd.Series(iv_dict).sort_values(ascending=False)
iv_values.name = "IV"
feat_importance = model_stats[["Coef."]].join(iv_values)
feat_importance["importance"] = -1.0 * feat_importance["Coef."] * feat_importance["IV"]
feat_importance.sort_values(by="importance", ascending=False)
Scaling the scores¶
A final step in building skorecard models is rescaling the predictions.
This is a very common practice within the Credit Risk domain, where scorecard models are widely used.
Rescaling a scorecard has no impact on model performance; rather, it returns the predictions on an arbitrary scale (normally 0-1000) that is more meaningful for risk managers and underwriters in a bank than raw probabilities.
The rescaling is a linear transformation performed on the log-odds of the predicted probability \(p\):

\[ \text{score} = \text{offset} + \text{factor} \cdot \ln(\text{odds}) \]

where the odds are defined as:

\[ \text{odds} = \frac{1 - p}{p} \]

The reference points for the linear transformation are commonly defined by the following values:
- ref_score: reference score that should match a given reference odds (ref_odds)
- ref_odds: reference odds that should match a given reference score
- pdo: points to double the odds, i.e. the number of points to add where the odds double
An example with the following settings:
ref_score = 400
ref_odds = 20
pdo = 25
A score of 400 corresponds to odds of 20:1 of being a "good client" (y=0). This means that the predicted probability for y=1 is in this case ~4.76%, which you can get by rearranging the equation for the odds above.
When the score increases to 425, the odds double to 40:1 (the predicted probability of y=1 is ~2.44%).
When the score decreases to 375, the odds are reduced by a factor of 2, i.e. 10:1 (the predicted probability of y=1 is ~9.09%).
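To make these numbers concrete, here is a minimal sanity check, assuming the standard scaling convention where factor = pdo / ln(2) and offset = ref_score - factor * ln(ref_odds) (the convention itself is an assumption about the internals; skorecard handles this for you):
pdo, ref_score, ref_odds = 25, 400, 20
factor = pdo / np.log(2)  # points added per doubling of the odds
offset = ref_score - factor * np.log(ref_odds)  # anchors ref_score at ref_odds
for odds in [20, 40, 10]:
    p_bad = 1 / (1 + odds)  # rearranged from odds = (1 - p) / p, with p = p(y=1)
    print(f"odds {odds}:1 -> score {offset + factor * np.log(odds):.0f}, p(y=1) = {p_bad:.2%}")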
In skorecard, one can use the calibrate_to_master_scale function:
from skorecard.rescale import calibrate_to_master_scale
proba_train = pd.Series(proba_train, index=y_train.index).sort_values() # sorting for visualization purposes
scores = calibrate_to_master_scale(proba_train, pdo=25, ref_score=400, ref_odds=20)
Visualize the score dependencies¶
fig, (ax1, ax2, ax3) = plt.subplots(3, sharex=True, figsize=(8, 12), gridspec_kw={"hspace": 0})
ax1.plot(scores.values, proba_train.values)
ax1.set_ylabel("Predicted probability")
ax1.set_title("Rescaled scores and probabilities")
ax1.grid()
ax2.plot(scores.values, proba_train.apply(lambda x: (1 - x) / x).values)
ax2.set_ylabel("Odds")
ax2.grid()
ax3.plot(
scores.values,
proba_train.apply(lambda x: np.log(1 - x) - np.log(x)).values,
)
ax3.set_ylabel("log-odds")
ax3.grid()
ax3.set_xlabel("Rescaled scores")
plt.show()
Assigning points to every feature¶
The last step of scorecard development is to convert all the features into the rescaled model.
A scorecard model is a logistic regression fitted on the WoE values of every bucketed feature. In other words, the following equation holds:

\[ \ln\frac{p}{1 - p} = \beta_0 + \sum_{i} \beta_i \cdot \mathrm{WoE}(X_i) \]

As the rescaling performed earlier is linear in the predicted log-odds, every feature-bucket contribution can be rescaled to an integer number of points by applying the same calculations directly to the \(\beta_i \cdot \mathrm{WoE}(X_i)\) terms.
This returns the final scorecard, which can be easily implemented.
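Concretely, combining the two linear relations above gives (a sketch under the conventions used earlier; the implementation may distribute the intercept and offset over the features differently):

\[ \text{score} = \text{offset} + \text{factor} \cdot \ln\frac{1 - p}{p} = \text{offset} - \text{factor} \cdot \left( \beta_0 + \sum_i \beta_i \cdot \mathrm{WoE}(X_i) \right) \]

so each feature-bucket contributes roughly \(-\text{factor} \cdot \beta_i \cdot \mathrm{WoE}(X_i)\) points.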
The functionality in skorecard to rescale the features is as follows:
from skorecard.rescale import ScoreCardPoints
# ensure that pdo, ref_score and ref_odds are consistent
scp = ScoreCardPoints(skorecard_model=scorecard, pdo=25, ref_score=400, ref_odds=20)
One can extract the final scorecard as follows:
scp.get_scorecard_points()
Or one can apply the transformation directly to the data by calling the transform method, in order to map each feature to its actual points.
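For example (a usage sketch; transform returns a DataFrame with the points per feature for each row, as also used in the validation step below):
scp.transform(X_test).head()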
Validate the rescaling¶
As the last step, to ensure that the rescaling was successful, one can verify that the sum of the points of each row in the dataset matches the rescaled scores.
The rescaling step involves some integer rounding, so small discrepancies of 1-2 points might occur due to rounding error.
proba_train = pd.Series(
    proba_train, index=y_train.index
)  # restore the original row order (it was sorted above) so the diff aligns row by row
scores = calibrate_to_master_scale(proba_train, pdo=25, ref_score=400, ref_odds=20)
# Check the distribution of the differences
(scores - scp.transform(X_train).sum(axis=1)).value_counts()