Building a scorecard model¶

This tutorial shows how to build a skorecard model.

Start by loading the data and performing the train-test split:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

from skorecard.datasets import load_credit_card
from sklearn.model_selection import train_test_split

data = load_credit_card(as_frame=True)


X_train, X_test, y_train, y_test = train_test_split(
    data.drop(["y"], axis=1), data["y"], test_size=0.25, random_state=42
)

Load the buckets and the selected features that were created in the previous tutorials.

import yaml

# Use a context manager so the file handle is closed after loading
with open("buckets.yml") as f:
    buckets_dict = yaml.safe_load(f)
selected_features = ["x6", "x8", "x10", "x18", "x1", "x19", "x20", "x21", "x23", "x22", "x3", "x17", "x16"]

Define the scorecard model¶

A Skorecard class has two main components:

- the bucketer
- the list of selected features (if None is passed, it uses all the features defined in the bucketer)

It behaves like a scikit-learn model.

from skorecard import Skorecard
from skorecard.bucketers import UserInputBucketer

scorecard = Skorecard(bucketing=UserInputBucketer(buckets_dict), variables=selected_features, calculate_stats=True)
scorecard = scorecard.fit(X_train, y_train)

The get_stats method returns the coefficients with their standard errors and p-values.

scorecard.get_stats()

	Coef.	Std.Err	z	P>|z|
const	-1.242955	0.018238	-68.152266	0.000000e+00
x6	0.765416	0.020622	37.116970	1.495993e-301
x8	0.276995	0.033657	8.229831	1.874781e-16
x10	0.323640	0.036329	8.908558	5.170055e-19
x18	0.226277	0.050398	4.489820	7.128339e-06
x1	0.394922	0.048176	8.197563	2.453087e-16
x19	0.165650	0.055950	2.960659	3.069817e-03
x20	0.254235	0.062924	4.040349	5.337170e-05
x21	0.097257	0.072504	1.341396	1.797919e-01
x23	0.176958	0.074045	2.389866	1.685452e-02
x22	0.110976	0.076718	1.446541	1.480256e-01
x3	0.443555	0.096009	4.619955	3.838229e-06
x17	0.203349	0.133504	1.523175	1.277148e-01
x16	-0.166103	0.142547	-1.165254	2.439163e-01

Retrieve the model performance as with any sklearn classifier:

from sklearn.metrics import roc_auc_score, classification_report

proba_train = scorecard.predict_proba(X_train)[:, 1]
proba_test = scorecard.predict_proba(X_test)[:, 1]

print(f"AUC train:{round(roc_auc_score(y_train, proba_train),4)}")
print(f"AUC test :{round(roc_auc_score(y_test, proba_test),4)}\n")

print(classification_report(y_test, scorecard.predict(X_test)))

AUC train:0.7714
AUC test :0.7642

              precision    recall  f1-score   support

           0       0.84      0.95      0.89      5873
           1       0.66      0.34      0.45      1627

    accuracy                           0.82      7500
   macro avg       0.75      0.65      0.67      7500
weighted avg       0.80      0.82      0.80      7500

Removing features based on their statistical properties¶

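Features can be removed further, based on the statistics of the fitted model.
In a scorecard model, the coefficients are expected to be between 0 and -1.
Coefficients smaller than -1 indicate that the model relies too heavily on a feature (and is likely to overfit), while positive coefficients show an inverted trend.
Note that the sign depends on how the WoE is defined: in the stats table above almost all coefficients are positive, so there the expected range appears to be 0 to 1, and the single negative coefficient (x16) is the one showing an inverted trend.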

Additionally, the p-values of the coefficients should be smaller than 0.05 (or 0.01).

Looking at the stats table above, this would suggest removing the following features from the list: ['x21','x16','x17','x22'].

Note that feature removal should be done carefully: every time a feature is removed, the coefficients might converge elsewhere, giving a different model with a different interpretation.
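The p-value rule can also be applied programmatically. The following is a minimal sketch, not part of skorecard: it rebuilds a few rows of the stats table by hand (values copied from the output above), and the helper name features_above_p_value, the column name "P>|z|" and the index label "const" are assumptions about how the table is laid out. With a fitted model, the DataFrame returned by scorecard.get_stats() could be passed in instead.

```python
import pandas as pd

# A few rows copied from the get_stats() output shown above.
stats = pd.DataFrame(
    {
        "Coef.": [-1.242955, 0.765416, 0.097257, 0.176958, 0.110976, 0.203349, -0.166103],
        "P>|z|": [0.0, 1.495993e-301, 1.797919e-01, 1.685452e-02, 1.480256e-01, 1.277148e-01, 2.439163e-01],
    },
    index=["const", "x6", "x21", "x23", "x22", "x17", "x16"],
)


def features_above_p_value(stats, threshold=0.05, p_col="P>|z|", intercept="const"):
    """Return the features whose p-value exceeds the threshold (hypothetical helper)."""
    # The intercept is not a feature, so it is excluded before filtering.
    candidates = stats.drop(index=intercept, errors="ignore")
    return candidates.index[candidates[p_col] > threshold].tolist()


to_remove = features_above_p_value(stats)
print(to_remove)  # ['x21', 'x22', 'x17', 'x16']
```

Lowering the threshold to 0.01 would additionally flag x23 in this subset, which illustrates why the choice of threshold should be made deliberately.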

from IPython.display import display

new_feats = [feat for feat in selected_features if feat not in ["x21", "x16", "x17", "x22"]]

scorecard = Skorecard(UserInputBucketer(buckets_dict), variables=new_feats, calculate_stats=True)

scorecard = scorecard.fit(X_train, y_train)

model_stats = scorecard.get_stats()

model_stats.index = ["Const"] + new_feats
display(model_stats)

proba_train = scorecard.predict_proba(X_train)[:, 1]
proba_test = scorecard.predict_proba(X_test)[:, 1]

print(f"AUC train:{round(roc_auc_score(y_train, proba_train),4)}")
print(f"AUC test :{round(roc_auc_score(y_test, proba_test),4)}")

	Coef.	Std.Err	z	P>|z|
Const	-1.242598	0.018229	-68.166667	0.000000e+00
x6	0.763946	0.020596	37.092613	3.695794e-301
x8	0.269057	0.033234	8.095809	5.688515e-16
x10	0.339016	0.035156	9.643180	5.253488e-22
x18	0.241832	0.049687	4.867076	1.132612e-06
x1	0.409354	0.046618	8.781019	1.619998e-18
x19	0.191910	0.053865	3.562812	3.669030e-04
x20	0.282166	0.060879	4.634866	3.571691e-06
x23	0.227794	0.069624	3.271788	1.068698e-03
x3	0.441695	0.095994	4.601264	4.199341e-06

AUC train:0.7712
AUC test :0.7648

Retrieving the transformed data¶

Buckets and WoE transformations are available directly in a fitted skorecard model.

from IPython.display import display

print("Top 5 rows and the transformed buckets")
display(scorecard.bucket_transform(X_test)[new_feats].head())

print("\nTop 5 rows and the transformed WoEs")
display(scorecard.woe_transform(X_test)[new_feats].head())

Top 5 rows and the transformed buckets

	x6	x18	x1	x19	x20	x23	x3
2308	1	1	0	1	1	1	2
22404	1	1	2	1	1	2	1
23397	1	1	0	1	1	2	0
25058	1	1	1	1	2	2	0
2664	1	1	-3	1	1	1	2


Top 5 rows and the transformed WoEs

	x6	x8	x10	x18	x1	x19	x20	x23	x3
2308	-0.668068	-0.321161	-0.230059	-0.029168	0.610738	0.000397	-0.015444	0.105318	0.102411
22404	-0.668068	-0.321161	-0.230059	-0.029168	-0.353564	0.000397	-0.015444	-0.256887	-0.192612
23397	-0.668068	-0.321161	-0.230059	-0.029168	0.610738	0.000397	-0.015444	-0.256887	0.168106
25058	-0.668068	-0.321161	-0.230059	-0.029168	0.070222	0.000397	-0.405293	-0.256887	0.168106
2664	-0.668068	-0.321161	-0.230059	-0.029168	0.224540	0.000397	-0.015444	0.105318	0.102411

Getting the feature importance (to be integrated in the skorecard class)¶

To talk about feature importance, we should consider both the coefficient and the IV of each feature. The importance can be approximated as the product of the two numbers.

from skorecard.reporting import iv

X_train_bins = scorecard.bucket_transform(X_train)
iv_dict = iv(X_train_bins, y_train)

iv_values = pd.Series(iv_dict).sort_values(ascending=False)
iv_values.name = "IV"

feat_importance = model_stats[["Coef."]].join(iv_values)
feat_importance["importance"] = -1.0 * feat_importance["Coef."] * feat_importance["IV"]
feat_importance.sort_values(by="importance", ascending=False)

	Coef.	IV	importance
x23	0.227794	0.002257	-0.000514
x8	0.269057	0.001924	-0.000518
x20	0.282166	0.001998	-0.000564
x18	0.241832	0.002503	-0.000605
x19	0.191910	0.003325	-0.000638
x10	0.339016	0.001917	-0.000650
x1	0.409354	0.002457	-0.001006
x3	0.441695	0.002968	-0.001311
x6	0.763946	0.002430	-0.001857
Const	-1.242598	NaN	NaN

Scaling the scores¶

The last step of building skorecard models is the rescaling of the predictions.
This is a very common practice within the Credit Risk domain, where scorecard models are widely used.

Rescaling a scorecard has no impact on the model performance; it simply returns the predictions on an arbitrary scale (typically 0 to 1000), which is more meaningful to risk managers and underwriters in a bank than probabilities.
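The claim that rescaling leaves performance unchanged follows from the transformation being strictly monotonic: it preserves the ranking of the observations, and the AUC depends only on that ranking. The sketch below illustrates this on synthetic data with a small pairwise AUC written for the example; the helper auc, the data and the scaling constants (300 and 36) are all illustrative assumptions and are not taken from skorecard.

```python
import math
import random


def auc(y_true, ranking):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [r for y, r in zip(y_true, ranking) if y == 1]
    neg = [r for y, r in zip(y_true, ranking) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


random.seed(0)
# Synthetic probabilities of y=1, and labels drawn from them (illustrative data only).
probas = [random.uniform(0.01, 0.99) for _ in range(300)]
labels = [1 if random.random() < p else 0 for p in probas]

# An arbitrary linear function of the log-odds log((1 - p) / p).
scores = [300 + 36 * math.log((1 - p) / p) for p in probas]

auc_proba = auc(labels, probas)
# High scores mean low risk, so the sign is flipped before ranking for y=1.
auc_score = auc(labels, [-s for s in scores])
print(auc_proba, auc_score)
```

Both values should coincide, since the scores order the observations exactly as the probabilities do, only in reverse.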

The rescaling is a linear transformation performed on the log-odds of the predicted probability $p$,

\[ \log\left(\frac{1-p}{p}\right) \]

where the odds are defined as:

\[ \frac{1-p}{p} \]

The linear transformation is commonly defined by the following reference values:

ref_score: reference score, which should match the given reference odds (ref_odds)
ref_odds: reference odds, which should match the given reference score
pdo: points to double the odds, i.e. the number of points to add for the odds to double

As an example, consider the following settings:

ref_score = 400
ref_odds = 20
pdo = 25

A score of 400 corresponds to odds of 20:1 of being a "good client" (y=0). The predicted probability for y=1 is then ~4.76%, which follows from rearranging the equation for the odds above: p = 1 / (1 + odds) = 1/21.
When the score increases to 425, the odds double to 40:1 (predicted probability of y=1 is ~2.44%).
When the score decreases to 375, the odds are halved to 10:1 (predicted probability of y=1 is ~9.09%).
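The arithmetic of this example can be checked with a few lines of Python. This is a minimal sketch of the conventional scaling implied by the definitions of ref_score, ref_odds and pdo (score = offset + factor * log(odds), with factor = pdo / log(2)); the function names are hypothetical, and the sketch is not claimed to be how calibrate_to_master_scale is implemented.

```python
import math

ref_score, ref_odds, pdo = 400, 20, 25

# Conventional scaling: every doubling of the odds adds `pdo` points.
factor = pdo / math.log(2)
offset = ref_score - factor * math.log(ref_odds)


def score_from_odds(odds):
    """Score for given good:bad odds, where odds = (1 - p) / p."""
    return offset + factor * math.log(odds)


def proba_from_odds(odds):
    """Probability of y=1, obtained by rearranging odds = (1 - p) / p."""
    return 1 / (1 + odds)


for odds in (10, 20, 40):
    print(f"odds {odds}:1 -> score {score_from_odds(odds):.0f}, P(y=1) = {proba_from_odds(odds):.2%}")
```

The loop reproduces the three cases from the text: scores of 375, 400 and 425 for odds of 10:1, 20:1 and 40:1.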

In skorecard, one can use the calibrate_to_master_scale function.

from skorecard.rescale import calibrate_to_master_scale

proba_train = pd.Series(proba_train, index=y_train.index).sort_values()  # sorting for visualization purposes
scores = calibrate_to_master_scale(proba_train, pdo=25, ref_score=400, ref_odds=20)

Visualize the score dependencies¶

fig, (ax1, ax2, ax3) = plt.subplots(3, sharex=True, figsize=(8, 12), gridspec_kw={"hspace": 0})
ax1.plot(scores.values, proba_train.values)

ax1.set_ylabel("Predicted probability")
ax1.set_title("Rescaled scores and probabilities")
ax1.grid()


ax2.plot(scores.values, proba_train.apply(lambda x: (1 - x) / x).values)
ax2.set_ylabel("Odds")
ax2.grid()


ax3.plot(
    scores.values,
    proba_train.apply(lambda x: np.log(1 - x) - np.log(x)).values,
)
ax3.set_ylabel("log-odds")
ax3.grid()
ax3.set_xlabel("Rescaled scores")

plt.show()

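[Figure: "Rescaled scores and probabilities". Three stacked panels sharing the "Rescaled scores" x-axis, showing the predicted probability, the odds and the log-odds.]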

Assigning points to every feature¶

The last step of scorecard development is to express every feature on the rescaled scale.

A scorecard model is a logistic regression fitted on the WoE values of every bucketed feature.
In other words, the following equation holds:

\[ \log(odds) = \log\left(\frac{1-p}{p}\right) = \beta_{0} + \sum_{i} \beta_{i} \cdot WOE(X_{i}) \]

As the rescaling performed earlier is linear in the predicted log-odds, every feature-bucket contribution can be rescaled to an integer value by applying the same calculation directly to the $\beta_{i} \cdot WOE(X_{i})$ factors.

This returns the final scorecard, which can be easily implemented.

The functionality in skorecard to rescale the features is as follows:

from skorecard.rescale import ScoreCardPoints

# ensure that pdo, ref_score and ref_odds are consistent
scp = ScoreCardPoints(skorecard_model=scorecard, pdo=25, ref_score=400, ref_odds=20)

One can extract the final scorecard as follows:

scp.get_scorecard_points()

	bin_index	map	woe	feature	coef	contribution	Points
0	-2	NaN	0.000000	x6	0.763946	0.000000	37
1	-1	Missing	0.000000	x6	0.763946	0.000000	37
3	1	[-0.5, 0.5)	-0.668068	x6	0.763946	-0.510368	56
4	2	[0.5, 1.5)	-0.419221	x6	0.763946	-0.320262	49
5	3	[1.5, inf)	0.586085	x6	0.763946	0.447737	21
6	4	NaN	2.095463	x6	0.763946	1.600820	-20
7	-2	NaN	0.000000	x8	0.269057	0.000000	37
8	-1	Missing	0.000000	x8	0.269057	0.000000	37
10	1	[1.5, inf)	-0.321161	x8	0.269057	-0.086411	41
11	2	NaN	1.392671	x8	0.269057	0.374708	24
12	-2	NaN	0.000000	x10	0.339016	0.000000	37
13	-1	Missing	0.000000	x10	0.339016	0.000000	37
15	1	[1.0, inf)	-0.230059	x10	0.339016	-0.077994	40
16	2	NaN	1.491617	x10	0.339016	0.505681	19
17	-2	NaN	0.000000	x18	0.241832	0.000000	37
18	-1	Missing	0.000000	x18	0.241832	0.000000	37
20	1	[21.0, 4552.5)	-0.029168	x18	0.241832	-0.007054	38
21	2	[4552.5, 15001.5)	0.692001	x18	0.241832	0.167348	31
22	3	[15001.5, inf)	-0.457263	x18	0.241832	-0.110581	41
23	4	NaN	-0.860845	x18	0.241832	-0.208180	45
25	-2	NaN	0.000000	x1	0.409354	0.000000	37
26	-1	Missing	0.000000	x1	0.409354	0.000000	37
28	1	[75000.0, 145000.0)	-0.353564	x1	0.409354	-0.144733	43
29	2	[145000.0, 375000.0)	0.610738	x1	0.409354	0.250008	28
30	3	[375000.0, inf)	0.070222	x1	0.409354	0.028746	36
31	4	NaN	-0.766316	x1	0.409354	-0.313695	49
32	5	NaN	0.224540	x1	0.409354	0.091916	34
33	-2	NaN	0.000000	x19	0.191910	0.000000	37
34	-1	Missing	0.000000	x19	0.191910	0.000000	37
36	1	[131.5, 4970.5)	0.000397	x19	0.191910	0.000076	37
37	2	[4970.5, 15001.0)	0.576300	x19	0.191910	0.110598	33
38	3	[15001.0, inf)	-0.403868	x19	0.191910	-0.077506	40
39	4	NaN	-1.162523	x19	0.191910	-0.223100	45
40	-2	NaN	0.000000	x20	0.282166	0.000000	37
41	-1	Missing	0.000000	x20	0.282166	0.000000	37
43	1	[16.5, 4513.5)	-0.015444	x20	0.282166	-0.004358	38
44	2	[4513.5, 12490.5)	0.509437	x20	0.282166	0.143746	32
45	3	[12490.5, inf)	-0.405293	x20	0.282166	-0.114360	42
46	4	NaN	-0.832020	x20	0.282166	-0.234768	46
47	-2	NaN	0.000000	x23	0.227794	0.000000	37
48	-1	Missing	0.000000	x23	0.227794	0.000000	37
50	1	[1.5, 2000.5)	-0.256887	x23	0.227794	-0.058517	40
51	2	[2000.5, 9849.5)	0.105318	x23	0.227794	0.023991	37
52	3	[9849.5, inf)	0.350867	x23	0.227794	0.079925	35
53	4	NaN	-0.701982	x23	0.227794	-0.159907	43
54	-2	Other	0.000000	x3	0.441695	0.000000	37
55	-1	Missing	0.000000	x3	0.441695	0.000000	37
57	1	1.0	0.168106	x3	0.441695	0.074252	35
58	2	2.0	0.102411	x3	0.441695	0.045235	36
59	3	NaN	-0.192612	x3	0.441695	-0.085076	40
60	4	NaN	-1.207590	x3	0.441695	-0.533387	57
61	0	0	0.000000	Intercept	-1.242598	-0.000000	0

Or one can apply the transformation directly on the data, by calling the transform method, in order to map each feature to its actual points.
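To make the Points column less of a black box, the sketch below recomputes the points of the x6 buckets by hand from the contribution column. It is an illustration under stated assumptions, not skorecard's implementation: it assumes the conventional pdo scaling, that the intercept and offset are shared equally among the nine features, and that a positive contribution lowers the score (which is what the table shows). The helper bucket_points is hypothetical.

```python
import math

pdo, ref_score, ref_odds = 25, 400, 20
factor = pdo / math.log(2)
offset = ref_score - factor * math.log(ref_odds)

intercept = -1.242598  # "Intercept" row of the table above
n_features = 9         # number of features in the fitted model

# Assumption: the intercept and the offset are shared equally among the features.
base_points = (offset - factor * intercept) / n_features


def bucket_points(contribution):
    """Points for one bucket, given its coef * WoE contribution (hypothetical helper)."""
    # Assumption: a positive contribution lowers the score, as in the Points column.
    return round(base_points - factor * contribution)


# Contributions of the x6 buckets, copied from the table above.
x6_contributions = [0.0, -0.510368, -0.320262, 0.447737, 1.600820]
x6_points = [bucket_points(c) for c in x6_contributions]
print(x6_points)  # [37, 56, 49, 21, -20]
```

Under these assumptions the result agrees with the Points listed for x6 in the table, including the 37 points assigned to buckets with zero contribution.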

Validate the rescaling¶

As a last step, to ensure that the rescaling was successful, one can verify that the sum of the points of each row in the dataset matches the rescaled scores.
The rescaling steps involve some integer rounding, so small discrepancies of 1-2 points might occur due to rounding error.

proba_train = pd.Series(
    proba_train, index=y_train.index
)  # convert to pandas and correct index in order to be able to perform the diff
scores = calibrate_to_master_scale(proba_train, pdo=25, ref_score=400, ref_odds=20)

# Check the distribution of the differences
(scores - scp.transform(X_train).sum(axis=1)).value_counts()

 111.0    878
 105.0    708
 146.0    560
 125.0    516
 130.0    442
         ... 
 284.0      1
 292.0      1
-54.0       1
 261.0      1
-80.0       1
Length: 373, dtype: int64
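A long value_counts listing is hard to judge at a glance, so it can help to isolate the rows whose difference exceeds the expected rounding tolerance. The following is a minimal sketch with a hypothetical helper (rescaling_mismatches) and toy data standing in for the outputs of calibrate_to_master_scale and scp.transform; it assumes both inputs share the same row index.

```python
import pandas as pd


def rescaling_mismatches(scores, points, tolerance=2):
    """Return the rows where |score - summed points| exceeds the tolerance (hypothetical helper)."""
    total_points = points.sum(axis=1)
    # Subtraction aligns on the index, so both inputs must share the same row labels.
    diff = (scores - total_points).abs()
    return diff[diff > tolerance]


# Toy data standing in for the rescaled scores and the per-feature points.
toy_scores = pd.Series([400.0, 425.0, 375.0], index=[10, 11, 12])
toy_points = pd.DataFrame({"a": [200, 210, 150], "b": [201, 214, 200]}, index=[10, 11, 12])

mismatches = rescaling_mismatches(toy_scores, toy_points)
print(mismatches)
```

In the toy data only row 12 is returned (a difference of 25 points); the other two rows differ by a single point and fall within the tolerance. Any rows returned on real data are the ones worth inspecting, for example for inconsistent pdo, ref_score and ref_odds settings.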