
Building a scorecard model

This tutorial shows how to build a skorecard model.

Start by loading the data and performing the train-test split:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

from skorecard.datasets import load_credit_card
from sklearn.model_selection import train_test_split

data = load_credit_card(as_frame=True)


X_train, X_test, y_train, y_test = train_test_split(
    data.drop(["y"], axis=1), data["y"], test_size=0.25, random_state=42
)

Load the buckets and the selected features that were created in the previous tutorials.

import yaml

buckets_dict = yaml.safe_load(open("buckets.yml"))
selected_features = ["x6", "x8", "x10", "x18", "x1", "x19", "x20", "x21", "x23", "x22", "x3", "x17", "x16"]

Define the scorecard model

A Skorecard class has two main components:

  • the bucketer
  • the list of selected features (if None is passed, it uses all the features defined in the bucketer)

It behaves like a scikit-learn model.

from skorecard import Skorecard
from skorecard.bucketers import UserInputBucketer

scorecard = Skorecard(bucketing=UserInputBucketer(buckets_dict), variables=selected_features, calculate_stats=True)
scorecard = scorecard.fit(X_train, y_train)
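
Because Skorecard follows the scikit-learn estimator API, it can be plugged into the usual sklearn tooling. A minimal sketch (not part of the tutorial flow), assuming the estimator is clonable like any sklearn-compatible model:

from sklearn.model_selection import cross_val_score

# cross-validated AUC, relying on Skorecard's sklearn-compatible
# fit / predict_proba interface
cv_auc = cross_val_score(
    Skorecard(bucketing=UserInputBucketer(buckets_dict), variables=selected_features),
    X_train,
    y_train,
    scoring="roc_auc",
    cv=5,
)
print(f"CV AUC: {cv_auc.mean():.4f} +/- {cv_auc.std():.4f}")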

The get_stats method returns the coefficients with their standard errors and p-values:

scorecard.get_stats()
Coef. Std.Err z P>|z|
const -1.242955 0.018238 -68.152266 0.000000e+00
x6 0.765416 0.020622 37.116970 1.495993e-301
x8 0.276995 0.033657 8.229831 1.874781e-16
x10 0.323640 0.036329 8.908558 5.170055e-19
x18 0.226277 0.050398 4.489820 7.128339e-06
x1 0.394922 0.048176 8.197563 2.453087e-16
x19 0.165650 0.055950 2.960659 3.069817e-03
x20 0.254235 0.062924 4.040349 5.337170e-05
x21 0.097257 0.072504 1.341396 1.797919e-01
x23 0.176958 0.074045 2.389866 1.685452e-02
x22 0.110976 0.076718 1.446541 1.480256e-01
x3 0.443555 0.096009 4.619955 3.838229e-06
x17 0.203349 0.133504 1.523175 1.277148e-01
x16 -0.166103 0.142547 -1.165254 2.439163e-01

Retrieve the model performance as with any sklearn classifier:

from sklearn.metrics import roc_auc_score, classification_report

proba_train = scorecard.predict_proba(X_train)[:, 1]
proba_test = scorecard.predict_proba(X_test)[:, 1]

print(f"AUC train:{round(roc_auc_score(y_train, proba_train),4)}")
print(f"AUC test :{round(roc_auc_score(y_test, proba_test),4)}\n")

print(classification_report(y_test, scorecard.predict(X_test)))
AUC train:0.7714
AUC test :0.7642

              precision    recall  f1-score   support

           0       0.84      0.95      0.89      5873
           1       0.66      0.34      0.45      1627

    accuracy                           0.82      7500
   macro avg       0.75      0.65      0.67      7500
weighted avg       0.80      0.82      0.80      7500


Removing features based on their statistical properties

The feature set can be pruned further.
In a scorecard model fitted on WoE-transformed features, the coefficients are expected to be between 0 and 1 (note the signs in the stats table above).
Coefficients larger than 1 indicate that the model relies heavily on a single feature (a likely sign of overfitting), while coefficients of the opposite sign indicate an inverted trend.

Additionally, the p-values of the coefficients should be smaller than 0.05 (or 0.01).

Looking at the stats table above, this suggests removing the following features: ['x21', 'x16', 'x17', 'x22'].

Note that feature removal should be done carefully: every time a feature is removed, the remaining coefficients may converge to different values, hence giving a different model with a different interpretation.
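
For instance, the non-significant features can be listed programmatically. A small sketch, assuming get_stats returns a DataFrame with the columns shown above:

stats = scorecard.get_stats()

# features whose coefficients are not significant at the 5% level,
# excluding the intercept row
weak_features = stats.loc[(stats.index != "const") & (stats["P>|z|"] > 0.05)].index.tolist()
print(weak_features)  # ['x21', 'x22', 'x17', 'x16'] for the stats above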

from IPython.display import display

new_feats = [feat for feat in selected_features if feat not in ["x21", "x16", "x17", "x22"]]

scorecard = Skorecard(UserInputBucketer(buckets_dict), variables=new_feats, calculate_stats=True)

scorecard = scorecard.fit(X_train, y_train)

model_stats = scorecard.get_stats()

model_stats.index = ["Const"] + new_feats
display(model_stats)

proba_train = scorecard.predict_proba(X_train)[:, 1]
proba_test = scorecard.predict_proba(X_test)[:, 1]

print(f"AUC train:{round(roc_auc_score(y_train, proba_train),4)}")
print(f"AUC test :{round(roc_auc_score(y_test, proba_test),4)}")
Coef. Std.Err z P>|z|
Const -1.242598 0.018229 -68.166667 0.000000e+00
x6 0.763946 0.020596 37.092613 3.695794e-301
x8 0.269057 0.033234 8.095809 5.688515e-16
x10 0.339016 0.035156 9.643180 5.253488e-22
x18 0.241832 0.049687 4.867076 1.132612e-06
x1 0.409354 0.046618 8.781019 1.619998e-18
x19 0.191910 0.053865 3.562812 3.669030e-04
x20 0.282166 0.060879 4.634866 3.571691e-06
x23 0.227794 0.069624 3.271788 1.068698e-03
x3 0.441695 0.095994 4.601264 4.199341e-06
AUC train:0.7712
AUC test :0.7648

Retrieving the transformed data

Buckets and WoE transformations are available directly in a fitted skorecard model:

from IPython.display import display

print("Top 5 rows and the transformed buckets")
display(scorecard.bucket_transform(X_test)[new_feats].head())

print("\nTop 5 rows and the transformed WoEs")
display(scorecard.woe_transform(X_test)[new_feats].head())
Top 5 rows and the transformed buckets

x6 x8 x10 x18 x1 x19 x20 x23 x3
2308 1 0 0 1 0 1 1 1 2
22404 1 0 0 1 2 1 1 2 1
23397 1 0 0 1 0 1 1 2 0
25058 1 0 0 1 1 1 2 2 0
2664 1 0 0 1 -3 1 1 1 2

Top 5 rows and the transformed WoEs

x6 x8 x10 x18 x1 x19 x20 x23 x3
2308 -0.668068 -0.321161 -0.230059 -0.029168 0.610738 0.000397 -0.015444 0.105318 0.102411
22404 -0.668068 -0.321161 -0.230059 -0.029168 -0.353564 0.000397 -0.015444 -0.256887 -0.192612
23397 -0.668068 -0.321161 -0.230059 -0.029168 0.610738 0.000397 -0.015444 -0.256887 0.168106
25058 -0.668068 -0.321161 -0.230059 -0.029168 0.070222 0.000397 -0.405293 -0.256887 0.168106
2664 -0.668068 -0.321161 -0.230059 -0.029168 0.224540 0.000397 -0.015444 0.105318 0.102411

Getting the feature importance (to be integrated in the skorecard class)

In order to talk about feature importance, we should consider both the coefficient and the IV (Information Value) of each feature. The importance can be approximated as the product of the two numbers.

from skorecard.reporting import iv

X_train_bins = scorecard.bucket_transform(X_train)
iv_dict = iv(X_train_bins, y_train)

iv_values = pd.Series(iv_dict).sort_values(ascending=False)
iv_values.name = "IV"

feat_importance = model_stats[["Coef."]].join(iv_values)
feat_importance["importance"] = -1.0 * feat_importance["Coef."] * feat_importance["IV"]
feat_importance.sort_values(by="importance", ascending=False)
Coef. IV importance
x23 0.227794 0.002257 -0.000514
x8 0.269057 0.001924 -0.000518
x20 0.282166 0.001998 -0.000564
x18 0.241832 0.002503 -0.000605
x19 0.191910 0.003325 -0.000638
x10 0.339016 0.001917 -0.000650
x1 0.409354 0.002457 -0.001006
x3 0.441695 0.002968 -0.001311
x6 0.763946 0.002430 -0.001857
Const -1.242598 NaN NaN

Scaling the scores

The last step in building a skorecard model is rescaling the predictions.
This is a very common practice within the Credit Risk domain, where scorecard models are widely used.

Rescaling has no impact on the model performance; it merely expresses the predictions on an arbitrary scale (normally 0-1000) that is more meaningful for risk managers and underwriters in a bank than raw probabilities.

The rescaling is a linear transformation performed on the log-odds of the predicted probability \(p\),

\[ log(\frac{1-p}{p}) \]

Where the odds are defined as:

\[ \frac{1-p}{p} \]

The linear transformation is commonly defined by the following reference values:

  • ref_score: reference score that should match a given reference odds (ref_odds)
  • ref_odds: reference odds that should match the given reference score (ref_score)
  • pdo: "points to double the odds", i.e., the number of points to add for the odds to double

As an example, take the following settings:

  • ref_score = 400
  • ref_odds = 20
  • pdo = 25

A score of 400 corresponds to odds of 20:1 of being a "good client" (y=0). This means that the predicted probability for y=1 is in this case ~4.76%, which you can get by rearranging the equation for the odds above.
When the score increases to 425, the odds double to 40:1 (the predicted probability of y=1 is ~2.44%).
When the score decreases to 375, the odds are halved, i.e., 10:1 (the predicted probability of y=1 is ~9.09%).
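
To make the arithmetic explicit, here is a minimal sketch of the transformation (a reconstruction from the definitions above, not skorecard's internals):

def to_score(p, pdo=25, ref_score=400, ref_odds=20):
    """Map the predicted probability p of y=1 to a rescaled score."""
    odds = (1 - p) / p  # odds of being a good client
    # every doubling of the odds adds `pdo` points on top of the reference
    return ref_score + pdo * np.log2(odds / ref_odds)

print(to_score(1 / 21))  # odds 20:1 -> 400.0
print(to_score(1 / 41))  # odds 40:1 -> 425.0
print(to_score(1 / 11))  # odds 10:1 -> 375.0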

In skorecard, one can use the calibrate_to_master_scale function.

from skorecard.rescale import calibrate_to_master_scale

proba_train = pd.Series(proba_train, index=y_train.index).sort_values()  # sorting for visualization purposes
scores = calibrate_to_master_scale(proba_train, pdo=25, ref_score=400, ref_odds=20)

Visualize the score dependencies

fig, (ax1, ax2, ax3) = plt.subplots(3, sharex=True, figsize=(8, 12), gridspec_kw={"hspace": 0})
ax1.plot(scores.values, proba_train.values)

ax1.set_ylabel("Predicted probability")
ax1.set_title("Rescaled scores and probabilities")
ax1.grid()


ax2.plot(scores.values, proba_train.apply(lambda x: (1 - x) / x).values)
ax2.set_ylabel("Odds")
ax2.grid()


ax3.plot(
    scores.values,
    proba_train.apply(lambda x: np.log(1 - x) - np.log(x)).values,
)
ax3.set_ylabel("log-odds")
ax3.grid()
ax3.set_xlabel("Rescaled scores")

plt.show()
[Figure: rescaled scores plotted against the predicted probability, the odds, and the log-odds]

Assigning points to every feature

The last step of scorecard development is to convert every feature contribution into points on the rescaled scale.

A scorecard model is a logistic regression fitted on the WoE values of every single bucketed feature.
In other words, the following equation holds:

\[ log(odds) = log(\frac{1-p}{p}) = \beta_{0} + \sum_{i} \beta_{i} \cdot WOE(X_{i}) \]

As the rescaling performed earlier is linear in the predicted log-odds, every feature-bucket contribution can be rescaled to an integer value by applying the same calculation directly to the \(\beta_{i} \cdot WOE(X_{i})\) factors.

This returns the final scorecard, which can be easily implemented.

The functionality in skorecard to rescale the features is as follows:

from skorecard.rescale import ScoreCardPoints

# ensure that pdo, ref_score and ref_odds are consistent
scp = ScoreCardPoints(skorecard_model=scorecard, pdo=25, ref_score=400, ref_odds=20)

One can extract the final scorecard as follows:

scp.get_scorecard_points()
bin_index map woe feature coef contribution Points
0 -2 NaN 0.000000 x6 0.763946 0.000000 37
1 -1 Missing 0.000000 x6 0.763946 0.000000 37
3 1 [-0.5, 0.5) -0.668068 x6 0.763946 -0.510368 56
4 2 [0.5, 1.5) -0.419221 x6 0.763946 -0.320262 49
5 3 [1.5, inf) 0.586085 x6 0.763946 0.447737 21
6 4 NaN 2.095463 x6 0.763946 1.600820 -20
7 -2 NaN 0.000000 x8 0.269057 0.000000 37
8 -1 Missing 0.000000 x8 0.269057 0.000000 37
10 1 [1.5, inf) -0.321161 x8 0.269057 -0.086411 41
11 2 NaN 1.392671 x8 0.269057 0.374708 24
12 -2 NaN 0.000000 x10 0.339016 0.000000 37
13 -1 Missing 0.000000 x10 0.339016 0.000000 37
15 1 [1.0, inf) -0.230059 x10 0.339016 -0.077994 40
16 2 NaN 1.491617 x10 0.339016 0.505681 19
17 -2 NaN 0.000000 x18 0.241832 0.000000 37
18 -1 Missing 0.000000 x18 0.241832 0.000000 37
20 1 [21.0, 4552.5) -0.029168 x18 0.241832 -0.007054 38
21 2 [4552.5, 15001.5) 0.692001 x18 0.241832 0.167348 31
22 3 [15001.5, inf) -0.457263 x18 0.241832 -0.110581 41
23 4 NaN -0.860845 x18 0.241832 -0.208180 45
25 -2 NaN 0.000000 x1 0.409354 0.000000 37
26 -1 Missing 0.000000 x1 0.409354 0.000000 37
28 1 [75000.0, 145000.0) -0.353564 x1 0.409354 -0.144733 43
29 2 [145000.0, 375000.0) 0.610738 x1 0.409354 0.250008 28
30 3 [375000.0, inf) 0.070222 x1 0.409354 0.028746 36
31 4 NaN -0.766316 x1 0.409354 -0.313695 49
32 5 NaN 0.224540 x1 0.409354 0.091916 34
33 -2 NaN 0.000000 x19 0.191910 0.000000 37
34 -1 Missing 0.000000 x19 0.191910 0.000000 37
36 1 [131.5, 4970.5) 0.000397 x19 0.191910 0.000076 37
37 2 [4970.5, 15001.0) 0.576300 x19 0.191910 0.110598 33
38 3 [15001.0, inf) -0.403868 x19 0.191910 -0.077506 40
39 4 NaN -1.162523 x19 0.191910 -0.223100 45
40 -2 NaN 0.000000 x20 0.282166 0.000000 37
41 -1 Missing 0.000000 x20 0.282166 0.000000 37
43 1 [16.5, 4513.5) -0.015444 x20 0.282166 -0.004358 38
44 2 [4513.5, 12490.5) 0.509437 x20 0.282166 0.143746 32
45 3 [12490.5, inf) -0.405293 x20 0.282166 -0.114360 42
46 4 NaN -0.832020 x20 0.282166 -0.234768 46
47 -2 NaN 0.000000 x23 0.227794 0.000000 37
48 -1 Missing 0.000000 x23 0.227794 0.000000 37
50 1 [1.5, 2000.5) -0.256887 x23 0.227794 -0.058517 40
51 2 [2000.5, 9849.5) 0.105318 x23 0.227794 0.023991 37
52 3 [9849.5, inf) 0.350867 x23 0.227794 0.079925 35
53 4 NaN -0.701982 x23 0.227794 -0.159907 43
54 -2 Other 0.000000 x3 0.441695 0.000000 37
55 -1 Missing 0.000000 x3 0.441695 0.000000 37
57 1 1.0 0.168106 x3 0.441695 0.074252 35
58 2 2.0 0.102411 x3 0.441695 0.045235 36
59 3 NaN -0.192612 x3 0.441695 -0.085076 40
60 4 NaN -1.207590 x3 0.441695 -0.533387 57
61 0 0 0.000000 Intercept -1.242598 -0.000000 0
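
As a sanity check, the Points column can be reconstructed from the contributions: the reference score, the intercept, and the reference-odds offset are spread evenly over the features, and every contribution is scaled by pdo/log(2). A sketch based on the numbers above (a reconstruction, not skorecard's documented internals):

factor = 25 / np.log(2)  # pdo / log(2): points per unit of log-odds

points_df = scp.get_scorecard_points()
feats = points_df[points_df["feature"] != "Intercept"]
intercept = points_df.loc[points_df["feature"] == "Intercept", "coef"].iloc[0]

# base points per feature: the reference score, the intercept, and the
# log of the reference odds, spread evenly over the features
base = (400 + factor * (-intercept - np.log(20))) / feats["feature"].nunique()

# matches the Points column above, up to integer rounding
reconstructed = (base - factor * feats["contribution"]).round()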

Alternatively, one can apply the transformation directly to the data by calling the transform method, mapping each feature to its points, as shown below.
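
For example:

# points assigned to each feature for the first rows of the test set
display(scp.transform(X_test).head())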

Validate the rescaling

As the last step, in order to ensure that the rescaling was successful, one can verify that the sum of the points of each row in the dataset matches the rescaled scores.
The rescaling step involves some integer rounding, so small discrepancies of 1-2 points might occur due to rounding errors.

proba_train = pd.Series(
    proba_train, index=y_train.index
)  # reindex back to y_train's order so the row-wise difference can be computed
scores = calibrate_to_master_scale(proba_train, pdo=25, ref_score=400, ref_odds=20)

# Check the distribution of the differences
(scores - scp.transform(X_train).sum(axis=1)).value_counts()
 111.0    878
 105.0    708
 146.0    560
 125.0    516
 130.0    442
         ... 
 284.0      1
 292.0      1
-54.0       1
 261.0      1
-80.0       1
Length: 373, dtype: int64