Custom Scoring Metrics
In many probatus features, the user can provide the scoring parameter. It can be one of the following:
- A string naming one of the classification scorers available in sklearn.
- An object of the Scorer class from probatus.utils. This object encapsulates the metric name and the scorer used to compute the model performance (see the sketch below).
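Both forms are sketched below. This is only a minimal illustration: the balanced_accuracy metric and the throwaway model are arbitrary choices, not part of this tutorial.

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score, make_scorer
from probatus.sample_similarity import SHAPImportanceResemblance
from probatus.utils import Scorer

model = RandomForestClassifier(random_state=0)  # stands in for any sklearn-compatible classifier

# Option 1: a metric name recognised by sklearn
rm = SHAPImportanceResemblance(model, scoring="roc_auc")

# Option 2: a probatus Scorer wrapping any sklearn-compatible scorer
custom_scoring = Scorer("balanced_accuracy", custom_scorer=make_scorer(balanced_accuracy_score))
rm = SHAPImportanceResemblance(model, scoring=custom_scoring)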
The following tutorial presents how the scoring parameter can be used, taking a Resemblance Model as an example.
Setup
Let's prepare some data:
In [ ]:
%%capture
!pip install probatus
In [1]:
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer
from probatus.sample_similarity import SHAPImportanceResemblance
from probatus.utils import Scorer
# Prepare two samples
feature_names = ["f1", "f2", "f3", "f4"]
X1 = pd.DataFrame(make_classification(n_samples=1000, n_features=4, random_state=0)[0], columns=feature_names)
X2 = pd.DataFrame(
    make_classification(n_samples=1000, n_features=4, shift=0.5, random_state=0)[0], columns=feature_names
)
# Prepare model
model = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
Standard metrics
Now, we can set the scoring parameter as a string:
In [2]:
rm = SHAPImportanceResemblance(model, scoring="accuracy")
feature_importance, train_score, test_score = rm.fit_compute(X1, X2, column_names=feature_names, return_scores=True)
print(f"Train Accuracy: {np.round(train_score, 3)},\n" f"Test Accuracy: {np.round(test_score, 3)}.")
Train Accuracy: 0.708, Test Accuracy: 0.714.
Custom metric
Let's define a custom metric function (in this case, accuracy again), wrap it in a Scorer, and use it within SHAPImportanceResemblance:
In [3]:
# Plain accuracy: the fraction of correct predictions
def custom_metric(y_true, y_pred):
    return np.sum(y_true == y_pred) / len(y_true)

scorer = Scorer("custom_metric", custom_scorer=make_scorer(custom_metric))
rm2 = SHAPImportanceResemblance(model, scoring=scorer)
feature_importance2, train_score2, test_score2 = rm2.fit_compute(X1, X2, column_names=feature_names, return_scores=True)
print(f"Train custom_metric: {np.round(train_score2, 3)},\n" f"Test custom_metric: {np.round(test_score2, 3)}.")
Train custom_metric: 0.725, Test custom_metric: 0.72.
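The same mechanism works for any sklearn-compatible scorer. As a minimal sketch (the F2 score and the rm3 name below are illustrative choices, reusing the model and the two samples prepared in the Setup section):

from sklearn.metrics import fbeta_score, make_scorer

# F2 score: an F-measure weighting recall higher; beta is forwarded to fbeta_score by make_scorer
f2_scorer = Scorer("f2", custom_scorer=make_scorer(fbeta_score, beta=2))
rm3 = SHAPImportanceResemblance(model, scoring=f2_scorer)
_, train_f2, test_f2 = rm3.fit_compute(X1, X2, column_names=feature_names, return_scores=True)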
In [4]:
figure = rm2.plot()