Reporting
Reporting plays a crucial role in building scorecard models.
Skorecard bucketers include a reporting module, and this tutorial shows how to use it to retrieve bucket summary tables, plots and model statistics.
%matplotlib inline
from skorecard.datasets import load_uci_credit_card
X, y = load_uci_credit_card(return_X_y=True)
X.head(4)
Reporting in bucketers
Once a bucketer is fitted, its reporting methods are available directly on the bucketer object.
from skorecard.bucketers import DecisionTreeBucketer
bucketer = DecisionTreeBucketer(max_n_bins=10)
X_transformed = bucketer.fit_transform(X, y)
X_transformed.head(4)
Retrieve the bucket summary table
bucketer.bucket_table(column="LIMIT_BAL")
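To make the columns of a bucket summary table easier to interpret, here is a minimal, self-contained sketch of how per-bucket statistics of this kind are commonly computed. It assumes the table reports, per bucket, the count, the event rate, the weight of evidence (WoE) and the information value (IV), and it uses one common WoE convention, ln(share of non-events / share of events); skorecard's exact column names and sign convention may differ. The helper `bucket_summary` and the input counts are hypothetical.

```python
import math

def bucket_summary(events, non_events):
    """Per-bucket statistics from event / non-event counts (hypothetical helper).

    events[i] and non_events[i] are the numbers of defaults and non-defaults in bucket i.
    Assumed WoE convention: ln(share of non-events / share of events).
    Buckets with zero events or zero non-events would need smoothing; not handled here.
    """
    total_events = sum(events)
    total_non_events = sum(non_events)
    total = total_events + total_non_events
    rows = []
    for i, (ev, nev) in enumerate(zip(events, non_events)):
        pct_event = ev / total_events
        pct_non_event = nev / total_non_events
        woe = math.log(pct_non_event / pct_event)
        iv = (pct_non_event - pct_event) * woe  # always >= 0
        rows.append({
            "bucket": i,
            "count": ev + nev,
            "count_pct": (ev + nev) / total,
            "event_rate": ev / (ev + nev),
            "woe": woe,
            "iv": iv,
        })
    return rows

# Hypothetical counts for three buckets of one feature
table = bucket_summary(events=[50, 30, 20], non_events=[150, 270, 480])
for row in table:
    print(row)
print("total IV:", sum(row["iv"] for row in table))
```

The total IV is the sum of the per-bucket IV values, which is why it is often used as a single number summarising how well a bucketed feature separates events from non-events.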
Plotting the buckets
bucketer.plot_bucket(column="LIMIT_BAL", format="png", scale=2, width=1050, height=525)
Statistical Significance
You will often want to report and analyse the statistical significance of the coefficients generated by the logistic regression model. We do this by calculating the p-values of the coefficients. Typically, any coefficient with a p-value > 0.05 is regarded as not statistically significant, and hence should not be reported as a contributing feature.
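As a standalone sketch of where such p-values typically come from: for logistic regression they are usually obtained from a Wald test, where z = coefficient / standard error and the two-sided p-value is 2(1 - Φ(|z|)), with Φ the standard normal CDF. We assume that skorecard's get_stats() follows this standard approach, but this is not a description of its internals. The coefficients and standard errors below are made-up numbers, not results from the UCI model.

```python
import math

def wald_p_value(coef, std_err):
    """Two-sided p-value of a Wald test for H0: coefficient == 0."""
    z = coef / std_err
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)) for the standard normal CDF Phi
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical (coefficient, standard error) pairs
estimates = {"feature_a": (0.80, 0.10), "feature_b": (0.05, 0.08)}

for name, (coef, se) in estimates.items():
    p = wald_p_value(coef, se)
    verdict = "significant" if p <= 0.05 else "not significant"
    print(f"{name}: z={coef / se:.2f}, p={p:.4f} -> {verdict} at 0.05")
```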
Below, we show an example of how to get the summary statistics, including the p-values, using the get_stats() method. As can be seen from the resulting dataframe, two features, EDUCATION and BILL_AMT1, have "unreliable" p-values (above 0.05), i.e. their coefficients are not statistically significant.
The coefficients can be further analysed using the weight_plot() function, which plots each coefficient together with its 2-sigma confidence interval. Assuming the coefficient estimates are approximately Gaussian, this interval covers about 95% of the probability mass. The plot corroborates the p-values: the intervals for EDUCATION and BILL_AMT1 include 0, so we cannot rule out that their true coefficients are zero.
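The reasoning behind the weight plot can be sketched in a few lines: each interval is coefficient ± 2 × standard error, and a coefficient is treated as indistinguishable from zero when its interval contains 0. Under a normal approximation, ±2σ covers about 95.4% of the probability mass (the exact 95% multiplier is 1.96), which is why this check agrees closely with a p-value threshold of 0.05. The estimates below are hypothetical and the helpers are illustrative, not part of skorecard.

```python
import math

def two_sigma_interval(coef, std_err):
    """Return the (lower, upper) 2-sigma interval around a coefficient estimate."""
    return coef - 2 * std_err, coef + 2 * std_err

def normal_coverage(k):
    """Probability mass of a standard normal within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))

# Hypothetical estimates: the second interval straddles 0
for name, coef, se in [("feature_a", 0.80, 0.10), ("feature_b", 0.05, 0.08)]:
    lower, upper = two_sigma_interval(coef, se)
    contains_zero = lower <= 0 <= upper
    print(f"{name}: [{lower:.2f}, {upper:.2f}] contains 0: {contains_zero}")

print(f"coverage of +/-2 sigma: {normal_coverage(2):.4f}")
```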
from skorecard.datasets import load_uci_credit_card
from skorecard.bucketers import EqualFrequencyBucketer
from skorecard.linear_model import LogisticRegression
from skorecard.reporting import weight_plot
from sklearn.pipeline import Pipeline
X, y = load_uci_credit_card(return_X_y=True)
pipeline = Pipeline(
[("bucketer", EqualFrequencyBucketer(n_bins=10)), ("clf", LogisticRegression(calculate_stats=True))]
)
pipeline.fit(X, y)
stats = pipeline.named_steps["clf"].get_stats()
stats
weight_plot(stats, format="png", scale=2, width=1050, height=525)
Reporting in Bucketing Process
The BucketingProcess incorporates two bucketing steps:
- the prebucketing step, which creates fine-grained buckets
- the bucketing step, which merges those prebuckets into the final buckets
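Conceptually, the two steps compose: each raw value is first assigned to a fine prebucket, and each prebucket is then mapped to a coarser final bucket. The sketch below illustrates that composition in plain Python. The bin edges and the merge mapping are hypothetical; in skorecard both are learned by the bucketers passed to BucketingProcess rather than written by hand.

```python
import bisect

# Hypothetical prebucket edges for LIMIT_BAL (fine-grained step); 6 edges -> 7 prebuckets
PRE_EDGES = [10_000, 30_000, 50_000, 80_000, 120_000, 200_000]

# Hypothetical merge mapping (coarse step): prebucket index -> final bucket index
PRE_TO_BUCKET = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3}

def prebucket(value):
    """Index of the fine prebucket; a value equal to an edge goes to the bin on its right."""
    return bisect.bisect_right(PRE_EDGES, value)

def bucket(value):
    """Final bucket: apply the prebucketing step, then the merge mapping."""
    return PRE_TO_BUCKET[prebucket(value)]

for v in [5_000, 25_000, 60_000, 150_000, 500_000]:
    print(v, "-> prebucket", prebucket(v), "-> bucket", bucket(v))
```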
Let's first fit a BucketingProcess:
from skorecard import datasets
from skorecard.bucketers import DecisionTreeBucketer, OptimalBucketer, AsIsCategoricalBucketer
from skorecard.pipeline import BucketingProcess
from sklearn.pipeline import make_pipeline
df = datasets.load_uci_credit_card(as_frame=True)
y = df["default"]
X = df.drop(columns=["default"])
num_cols = ["LIMIT_BAL", "BILL_AMT1"]
cat_cols = ["EDUCATION", "MARRIAGE"]
bucketing_process = BucketingProcess(
    # Step 1: fine-grained prebucketing (up to 100 bins for numerical features)
    prebucketing_pipeline=make_pipeline(
        DecisionTreeBucketer(variables=num_cols, max_n_bins=100, min_bin_size=0.05),
        AsIsCategoricalBucketer(variables=cat_cols),
    ),
    # Step 2: merge the prebuckets into at most 10 final buckets per feature
    bucketing_pipeline=make_pipeline(
        OptimalBucketer(variables=num_cols, max_n_bins=10, min_bin_size=0.05),
        OptimalBucketer(variables=cat_cols, variables_type="categorical", max_n_bins=10, min_bin_size=0.05),
    ),
)
_ = bucketing_process.fit(X, y)
Prebucketing step
Retrieve the report of the prebucketing step by calling prebucket_table(). In addition to the statistics, prebucket_table() also returns the recommended bucket into which each prebucket is merged.
bucketing_process.prebucket_table("LIMIT_BAL")
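The recommended bucket links the two tables: aggregating the prebucket rows by their recommended bucket should give the bucket-level counts. A minimal sketch, assuming each prebucket row carries a count, an event count and a recommended bucket; the tuple layout, the helper `merge_prebuckets` and the numbers are hypothetical, not the exact columns returned by prebucket_table().

```python
from collections import defaultdict

# Hypothetical prebucket rows: (prebucket id, count, events, recommended bucket)
prebucket_rows = [
    (0, 120, 40, 0),
    (1, 180, 50, 0),
    (2, 300, 60, 1),
    (3, 250, 35, 1),
    (4, 150, 15, 2),
]

def merge_prebuckets(rows):
    """Aggregate prebucket counts into their recommended buckets."""
    totals = defaultdict(lambda: {"count": 0, "events": 0})
    for _pre_id, count, events, bucket_id in rows:
        totals[bucket_id]["count"] += count
        totals[bucket_id]["events"] += events
    # Attach the event rate of each merged bucket, ordered by bucket id
    return {
        b: {**t, "event_rate": t["events"] / t["count"]}
        for b, t in sorted(totals.items())
    }

for bucket_id, bucket_stats in merge_prebuckets(prebucket_rows).items():
    print(bucket_id, bucket_stats)
```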
Visualizing the prebucketing
bucketing_process.plot_prebucket("LIMIT_BAL", format="png", scale=2, width=1050, height=525)
Bucketing step
Retrieving the bucketing table from the second step works the same way as for any bucketer, i.e. with bucket_table():
bucketing_process.bucket_table("LIMIT_BAL")
The same applies to plotting the bucketing step:
bucketing_process.plot_bucket("LIMIT_BAL", format="png", scale=2, width=1050, height=525)