Skip to content

Reporting

Reporting plays a crucial role in building scorecard models.

Skorecard bucketers include a reporting module, and this tutorial shows how to extract it

%matplotlib inline
from skorecard.datasets import load_uci_credit_card

X, y = load_uci_credit_card(return_X_y=True)
X.head(4)
EDUCATION MARRIAGE LIMIT_BAL BILL_AMT1
0 1 2 400000.0 201800.0
1 2 2 80000.0 80610.0
2 1 2 500000.0 499452.0
3 1 1 140000.0 450.0

Reporting in bucketers

Once a bucketer is fitted, the reporting module is incorporated directly in the bucketer object

from skorecard.bucketers import DecisionTreeBucketer

bucketer = DecisionTreeBucketer(max_n_bins=10)
X_transformed = bucketer.fit_transform(X, y)
X_transformed.head(4)
EDUCATION MARRIAGE LIMIT_BAL BILL_AMT1
0 0 1 9 9
1 1 1 3 7
2 0 1 9 9
3 0 0 5 0

Retrieve the bucket summary table

bucketer.bucket_table(column="LIMIT_BAL")
bucket label Count Count (%) Non-event Event Event Rate WoE IV
0 -1 Missing 0.0 0.00 0.0 0.0 NaN 0.000 0.000
1 0 [-inf, 45000.0) 849.0 14.15 533.0 316.0 0.372203 -0.719 0.087
2 1 [45000.0, 55000.0) 676.0 11.27 518.0 158.0 0.233728 -0.054 0.000
3 2 [55000.0, 75000.0) 336.0 5.60 233.0 103.0 0.306548 -0.425 0.011
4 3 [75000.0, 85000.0) 319.0 5.32 243.0 76.0 0.238245 -0.079 0.000
5 4 [85000.0, 105000.0) 330.0 5.50 241.0 89.0 0.269697 -0.245 0.004
6 5 [105000.0, 145000.0) 566.0 9.43 436.0 130.0 0.229682 -0.031 0.000
7 6 [145000.0, 275000.0) 1719.0 28.65 1429.0 290.0 0.168703 0.353 0.032
8 7 [275000.0, 325000.0) 379.0 6.32 326.0 53.0 0.139842 0.575 0.018
9 8 [325000.0, 385000.0) 350.0 5.83 287.0 63.0 0.180000 0.275 0.004
10 9 [385000.0, inf) 476.0 7.93 409.0 67.0 0.140756 0.567 0.022

Plotting the buckets

bucketer.plot_bucket(column="LIMIT_BAL", format="png", scale=2, width=1050, height=525)
No description has been provided for this image

Statistical Significance

You will often want to report and analyse the statistical significance of the coefficients generated by the Logistic Regression Model. We do this by calculating the p-values of the coefficients. Typically, any coefficient with p-value > 0.05 is regarded as insignificant, and hence should not be reported as a contributing feature.

Below, we show an example of how to get the summary statistics including the p-values using the .get_stats() function. As can be seen from the resulting dataframe, there are 2 features - EDUCATION and BILL_AMT1 - with "unreliable" p-values.

The coefficients can be further analysed using the weight_plot() function. The 2-sigma confidence interval is plotted. Assuming a Gaussian distribution, 95% of data exists within this spread. The plot corroboartes the p-values: we can see that there is a significant chance the coefficients of EDUCATION and BILL_AMT1 are 0.

from skorecard.datasets import load_uci_credit_card
from skorecard.bucketers import EqualFrequencyBucketer
from skorecard.linear_model import LogisticRegression
from skorecard.reporting import weight_plot
from sklearn.pipeline import Pipeline

X, y = load_uci_credit_card(return_X_y=True)
pipeline = Pipeline(
    [("bucketer", EqualFrequencyBucketer(n_bins=10)), ("clf", LogisticRegression(calculate_stats=True))]
)
pipeline.fit(X, y)
stats = pipeline.named_steps["clf"].get_stats()

stats
/Users/iv58uq/Documents/open_source/skorecard/skorecard/bucketers/bucketers.py:502: ApproximationWarning:

Approximated quantiles - too many unique values

/Users/iv58uq/Documents/open_source/skorecard/skorecard/bucketers/bucketers.py:502: ApproximationWarning:

Approximated quantiles - too many unique values


Coef. Std.Err z P>|z|
const -0.537571 0.096108 -5.593394 2.226735e-08
EDUCATION 0.010091 0.044874 0.224876 8.220757e-01
MARRIAGE -0.255608 0.062513 -4.088864 4.334903e-05
LIMIT_BAL -0.136681 0.011587 -11.796145 4.086051e-32
BILL_AMT1 -0.006634 0.011454 -0.579160 5.624809e-01
weight_plot(stats, format="png", scale=2, width=1050, height=525)
No description has been provided for this image

Reporting in Bucketing Process

The Bucketing Process module incorporates two bucketing steps: - the prebucketing step - bucketing step

Let's first fit a bucketing process step

from skorecard import datasets
from skorecard.bucketers import DecisionTreeBucketer, OptimalBucketer, AsIsCategoricalBucketer
from skorecard.pipeline import BucketingProcess
from sklearn.pipeline import make_pipeline

df = datasets.load_uci_credit_card(as_frame=True)
y = df["default"]
X = df.drop(columns=["default"])

num_cols = ["LIMIT_BAL", "BILL_AMT1"]
cat_cols = ["EDUCATION", "MARRIAGE"]

bucketing_process = BucketingProcess(
    prebucketing_pipeline=make_pipeline(
        DecisionTreeBucketer(variables=num_cols, max_n_bins=100, min_bin_size=0.05),
        AsIsCategoricalBucketer(variables=cat_cols),
    ),
    bucketing_pipeline=make_pipeline(
        OptimalBucketer(variables=num_cols, max_n_bins=10, min_bin_size=0.05),
        OptimalBucketer(variables=cat_cols, variables_type="categorical", max_n_bins=10, min_bin_size=0.05),
    ),
)

_ = bucketing_process.fit(X, y)

Prebucketing step

Retrieve the bucketing report of the prebucketing step by calling the prebucket_table.

In addition to the statstics, the prebucket_table returns also the recommended bucket for the merging.

bucketing_process.prebucket_table("LIMIT_BAL")
pre-bucket label Count Count (%) Non-event Event Event Rate WoE IV bucket
0 -1 Missing 0.0 0.00 0.0 0.0 NaN 0.000 0.000 -1
1 0 [-inf, 25000.0) 479.0 7.98 300.0 179.0 0.373695 -0.725 0.050 0
2 1 [25000.0, 45000.0) 370.0 6.17 233.0 137.0 0.370270 -0.710 0.037 1
3 2 [45000.0, 55000.0) 676.0 11.27 518.0 158.0 0.233728 -0.054 0.000 2
4 3 [55000.0, 75000.0) 336.0 5.60 233.0 103.0 0.306548 -0.425 0.011 2
5 4 [75000.0, 85000.0) 319.0 5.32 243.0 76.0 0.238245 -0.079 0.000 3
6 5 [85000.0, 105000.0) 330.0 5.50 241.0 89.0 0.269697 -0.245 0.004 3
7 6 [105000.0, 145000.0) 566.0 9.43 436.0 130.0 0.229682 -0.031 0.000 4
8 7 [145000.0, 175000.0) 449.0 7.48 380.0 69.0 0.153675 0.464 0.014 5
9 8 [175000.0, 225000.0) 769.0 12.82 630.0 139.0 0.180754 0.270 0.009 5
10 9 [225000.0, 275000.0) 501.0 8.35 419.0 82.0 0.163673 0.390 0.011 6
11 10 [275000.0, 325000.0) 379.0 6.32 326.0 53.0 0.139842 0.575 0.018 7
12 11 [325000.0, 385000.0) 350.0 5.83 287.0 63.0 0.180000 0.275 0.004 7
13 12 [385000.0, inf) 476.0 7.93 409.0 67.0 0.140756 0.567 0.022 8

Visualizing the bucketing

bucketing_process.plot_prebucket("LIMIT_BAL", format="png", scale=2, width=1050, height=525)
No description has been provided for this image

Bucketing step

Retreving the bucketing table from the second step is the same like in every bucketer, ie

bucketing_process.bucket_table("LIMIT_BAL")
bucket label Count Count (%) Non-event Event Event Rate WoE IV
0 -1 Missing 0.0 0.00 0.0 0.0 NaN 0.000 0.000
1 0 [-inf, 1.0) 479.0 7.98 300.0 179.0 0.373695 -0.725 0.050
2 1 [1.0, 2.0) 370.0 6.17 233.0 137.0 0.370270 -0.710 0.037
3 2 [2.0, 4.0) 1012.0 16.87 751.0 261.0 0.257905 -0.185 0.006
4 3 [4.0, 6.0) 649.0 10.82 484.0 165.0 0.254237 -0.165 0.003
5 4 [6.0, 7.0) 566.0 9.43 436.0 130.0 0.229682 -0.031 0.000
6 5 [7.0, 9.0) 1218.0 20.30 1010.0 208.0 0.170772 0.339 0.021
7 6 [9.0, 10.0) 501.0 8.35 419.0 82.0 0.163673 0.390 0.011
8 7 [10.0, 12.0) 729.0 12.15 613.0 116.0 0.159122 0.423 0.019
9 8 [12.0, inf) 476.0 7.93 409.0 67.0 0.140756 0.567 0.022

and the same applies to plotting the bucketing step

bucketing_process.plot_bucket("LIMIT_BAL", format="png", scale=2, width=1050, height=525)
No description has been provided for this image

Last update: 2023-08-08