Missing Values
skorecard bucketers offer native support for missing values and, by default, put them in a separate bucket.
In the example below, the single missing value is put into its own bucket, '-1'.
import numpy as np
import pandas as pd
from skorecard.bucketers import EqualFrequencyBucketer
df = pd.DataFrame({"counts": [1, 2, 2, 1, 4, 2, np.nan, 1, 3]})
EqualFrequencyBucketer(n_bins=2).fit_transform(df).value_counts()
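To see roughly what the default does, here is a pandas-only sketch. It assumes `pd.qcut` as a stand-in for skorecard's equal-frequency boundaries (the library's own bin edges may differ on other data). It only mimics the separate '-1' bucket and is not skorecard's implementation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"counts": [1, 2, 2, 1, 4, 2, np.nan, 1, 3]})

# Assumption: pd.qcut approximates skorecard's equal-frequency bucketing.
# qcut returns NaN for missing inputs, so they have no bucket yet.
buckets = pd.qcut(df["counts"], q=2, labels=False)

# Default ("separate") treatment: missing values get their own bucket, -1.
separate = buckets.fillna(-1).astype(int)
print(separate.value_counts())
```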
Specific
Alternatively, the user can assign the missing values to a specific bucket by passing missing_treatment a dictionary that maps a column name to a bucket number.
In the example below, we put the missing value in the "counts" column into bucket 1.
EqualFrequencyBucketer(n_bins=2, missing_treatment={"counts": 1}).fit_transform(df).value_counts()
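The dictionary form maps each column to the bucket that receives its missing values. A pandas-only sketch of the same idea, again assuming `pd.qcut` in place of skorecard's bucketing, fills each column's NaN with the bucket named in a dict of the same shape:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"counts": [1, 2, 2, 1, 4, 2, np.nan, 1, 3]})

# Same {"column": bucket} shape as the missing_treatment dict above.
missing_treatment = {"counts": 1}

# Assumption: pd.qcut approximates skorecard's equal-frequency bucketing.
buckets = pd.DataFrame(
    {col: pd.qcut(df[col], q=2, labels=False) for col in df.columns}
)

# DataFrame.fillna with a dict fills each column with its own value.
specific = buckets.fillna(value=missing_treatment).astype(int)
print(specific["counts"].value_counts())
```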
Passthrough
If the user wishes to leave the missing values untouched, they can specify this with missing_treatment="passthrough".
EqualFrequencyBucketer(n_bins=2, missing_treatment="passthrough").fit_transform(df)
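Passthrough means the bucketed column still contains NaN. In a pandas-only sketch, with `pd.qcut` standing in for skorecard's bucketing (an assumption), this is simply the raw qcut output:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"counts": [1, 2, 2, 1, 4, 2, np.nan, 1, 3]})

# Assumption: pd.qcut approximates skorecard's equal-frequency bucketing.
# Passthrough: keep qcut's NaN for the missing row instead of assigning a bucket.
passthrough = pd.qcut(df["counts"], q=2, labels=False)

# The labels come back as floats because the column still holds NaN,
# so any later step has to deal with the missing row itself.
print(passthrough.dtype, passthrough.isna().sum())
```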
Most frequent
It's also possible to put the missing values into the most common bucket. Below, the missing value is put into bucket '0', the most common one.
EqualFrequencyBucketer(n_bins=2, missing_treatment="most_frequent").fit_transform(df)
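A pandas-only sketch of the most-frequent rule, assuming `pd.qcut` in place of skorecard's bucketing: take the mode of the bucket labels (NaN is ignored) and fill the missing rows with it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"counts": [1, 2, 2, 1, 4, 2, np.nan, 1, 3]})

# Assumption: pd.qcut approximates skorecard's equal-frequency bucketing.
buckets = pd.qcut(df["counts"], q=2, labels=False)

# Series.mode() skips NaN by default, so this is the most common real bucket.
most_common = int(buckets.mode().iloc[0])

# Send the missing rows to that bucket.
most_frequent = buckets.fillna(most_common).astype(int)
print(most_frequent.value_counts())
```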
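Neutral
The remaining strategies also use the target, so they are fitted on both X and y. With missing_treatment="neutral", missing values are put into the bucket whose Weight of Evidence (WoE) is closest to 0.
X = pd.DataFrame({"counts": [1, 2, 2, 1, 4, 2, np.nan, 1, 3]})
y = pd.DataFrame({"target": [0, 0, 1, 0, 1, 0, 1, 0, 1]})
EqualFrequencyBucketer(n_bins=2, missing_treatment="neutral").fit_transform(X, y)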
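To make the neutral rule concrete, the sketch below computes a WoE per bucket with pandas and picks the regular bucket with the smallest |WoE|. These are assumptions of the sketch, not skorecard's code: `pd.qcut` for the buckets, WoE = ln(share of non-events / share of events) with a small smoothing constant `EPS`, and shares computed over all rows including the missing one. Flipping the WoE sign convention would not change which bucket is chosen.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({"counts": [1, 2, 2, 1, 4, 2, np.nan, 1, 3]})
y = pd.Series([0, 0, 1, 0, 1, 0, 1, 0, 1], name="target")

EPS = 1e-4  # assumed smoothing constant so empty cells do not give log(0)

# Assumption: pd.qcut approximates skorecard's equal-frequency bucketing.
# Missing rows get a temporary bucket -1 so their counts are kept.
buckets = pd.qcut(X["counts"], q=2, labels=False).fillna(-1).astype(int)

# WoE per bucket = ln(share of all non-events / share of all events).
events = y.groupby(buckets).sum()
non_events = (1 - y).groupby(buckets).sum()
woe = np.log((non_events / non_events.sum() + EPS) / (events / events.sum() + EPS))

# Neutral: pick the regular bucket whose WoE is closest to 0.
target_bucket = woe.drop(-1).abs().idxmin()
neutral = buckets.replace(-1, target_bucket)
print(woe.round(3).to_dict(), "->", target_bucket)
```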
Similar
We can also put the missing values into the bucket whose Weight of Evidence is closest to that of a bucket containing only the missing values.
EqualFrequencyBucketer(n_bins=2, missing_treatment="similar").fit_transform(X, y)
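The similar rule can be sketched with pandas: give the missing rows a temporary bucket -1, compute its WoE, and move them to the regular bucket with the nearest WoE. The bucketing via `pd.qcut`, the smoothing constant `EPS`, and the WoE formula ln(share of non-events / share of events) are assumptions of this sketch rather than skorecard's internals.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({"counts": [1, 2, 2, 1, 4, 2, np.nan, 1, 3]})
y = pd.Series([0, 0, 1, 0, 1, 0, 1, 0, 1], name="target")

EPS = 1e-4  # assumed smoothing constant so empty cells do not give log(0)

# Assumption: pd.qcut approximates skorecard's equal-frequency bucketing.
# The temporary bucket -1 is the "bucket containing only missing values".
buckets = pd.qcut(X["counts"], q=2, labels=False).fillna(-1).astype(int)

events = y.groupby(buckets).sum()
non_events = (1 - y).groupby(buckets).sum()
woe = np.log((non_events / non_events.sum() + EPS) / (events / events.sum() + EPS))

# Similar: pick the regular bucket whose WoE is nearest the missing-only WoE.
distance = (woe.drop(-1) - woe.loc[-1]).abs()
target_bucket = distance.idxmin()
similar = buckets.replace(-1, target_bucket)
print(woe.round(3).to_dict(), "->", target_bucket)
```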
Least risky
Missing values are put into the bucket containing the largest percentage of Class 0.
EqualFrequencyBucketer(n_bins=2, missing_treatment="least_risky").fit_transform(X, y)
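A pandas-only sketch of the least-risky rule. It reads "largest percentage of Class 0" as the highest share of y == 0 within each bucket, which is our interpretation, and uses `pd.qcut` as a stand-in for skorecard's bucketing.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({"counts": [1, 2, 2, 1, 4, 2, np.nan, 1, 3]})
y = pd.Series([0, 0, 1, 0, 1, 0, 1, 0, 1], name="target")

# Assumption: pd.qcut approximates skorecard's equal-frequency bucketing.
buckets = pd.qcut(X["counts"], q=2, labels=False)  # missing row stays NaN

# Share of class 0 inside each bucket; groupby drops the NaN key by default.
class0_rate = (1 - y).groupby(buckets).mean()

# Least risky: the bucket with the highest share of class 0 gets the missing rows.
target_bucket = int(class0_rate.idxmax())
least_risky = buckets.fillna(target_bucket).astype(int)
print(class0_rate.round(3).to_dict(), "->", target_bucket)
```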
Most risky
Missing values are put into the bucket containing the largest percentage of Class 1.
EqualFrequencyBucketer(n_bins=2, missing_treatment="most_risky").fit_transform(X, y)
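The most-risky rule mirrors the least-risky one. In this pandas-only sketch, "largest percentage of Class 1" is read as the highest event rate (mean of y) within each bucket, which is our interpretation, and `pd.qcut` stands in for skorecard's bucketing.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({"counts": [1, 2, 2, 1, 4, 2, np.nan, 1, 3]})
y = pd.Series([0, 0, 1, 0, 1, 0, 1, 0, 1], name="target")

# Assumption: pd.qcut approximates skorecard's equal-frequency bucketing.
buckets = pd.qcut(X["counts"], q=2, labels=False)  # missing row stays NaN

# Event rate (share of class 1) inside each bucket; the NaN key is dropped.
event_rate = y.groupby(buckets).mean()

# Most risky: the bucket with the highest event rate gets the missing rows.
target_bucket = int(event_rate.idxmax())
most_risky = buckets.fillna(target_bucket).astype(int)
print(event_rate.round(3).to_dict(), "->", target_bucket)
```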