
LogisticRegression

Extended Logistic Regression.

Extends sklearn.linear_model.LogisticRegression.

This class provides the following extra statistics, calculated on .fit() and accessible via .get_stats():

  • cov_matrix_: covariance matrix of the estimated parameters.
  • std_err_intercept_: estimated standard error of the intercept.
  • std_err_coef_: estimated standard errors of the coefficients.
  • z_intercept_: estimated z-statistic of the intercept.
  • z_coef_: estimated z-statistics of the coefficients.
  • p_val_intercept_: estimated p-value of the intercept.
  • p_val_coef_: estimated p-values of the coefficients.

Examples:

from skorecard.datasets import load_uci_credit_card
from skorecard.bucketers import EqualFrequencyBucketer
from skorecard.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = load_uci_credit_card(return_X_y=True)

pipeline = Pipeline([
    ('bucketer', EqualFrequencyBucketer(n_bins=10)),
    ('clf', LogisticRegression(calculate_stats=True))
])
pipeline.fit(X, y)
assert pipeline.named_steps['clf'].p_val_coef_[0][0] > 0

pipeline.named_steps['clf'].get_stats()

An example output of .get_stats():

Index       Coef.      Std.Err   z          P>|z|
const      -0.537571   0.096108  -5.593394  2.226735e-08
EDUCATION   0.010091   0.044874   0.224876  8.220757e-01

__init__(self, penalty='l2', calculate_stats=False, dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='lbfgs', max_iter=100, multi_class='auto', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None) special

Extends sklearn.linear_model.LogisticRegression.__init__().

Parameters:

calculate_stats : bool, default=False
    If True, calculate statistics like standard error during fit, accessible with .get_stats().

decision_function(self, X) inherited

Predict confidence scores for samples.

The confidence score for a sample is proportional to the signed distance of that sample to the hyperplane.

Parameters

X : array-like or sparse matrix, shape (n_samples, n_features) Samples.

Returns

array, shape=(n_samples,) if n_classes == 2 else (n_samples, n_classes) Confidence scores per (sample, class) combination. In the binary case, confidence score for self.classes_[1] where >0 means this class would be predicted.
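In the binary case, the confidence score and predict_proba are linked by the logistic sigmoid. A small sketch with plain sklearn on synthetic data (the dataset is an assumption for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression().fit(X, y)

scores = clf.decision_function(X)   # shape (n_samples,) for a binary problem
proba = clf.predict_proba(X)[:, 1]  # probability of classes_[1]

# The two are linked by the logistic sigmoid.
assert np.allclose(proba, 1.0 / (1.0 + np.exp(-scores)))
```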

densify(self) inherited

Convert coefficient matrix to dense array format.

Converts the coef_ member (back) to a numpy.ndarray. This is the default format of coef_ and is required for fitting, so calling this method is only required on models that have previously been sparsified; otherwise, it is a no-op.

Returns

self Fitted estimator.

fit(self, X, y, sample_weight=None, calculate_stats=False, **kwargs)

Fit the model.

Overrides sklearn.linear_model.LogisticRegression.fit().

In addition to the standard fit by sklearn, this function will compute the covariance of the coefficients.

Parameters:

X : array-like or sparse matrix of shape (n_samples, n_features)
    Training vector, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,)
    Target vector relative to X.

sample_weight : array-like of shape (n_samples,), default=None
    Array of weights assigned to individual samples. If not provided, each sample is given unit weight.

calculate_stats : bool, default=False
    If True, calculate statistics like standard error during fit, accessible with .get_stats().

Returns:

self : LogisticRegression
    Fitted estimator.

get_params(self, deep=True) inherited

Get parameters for this estimator.

Parameters

deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params : dict Parameter names mapped to their values.

get_stats(self)

Puts the summary statistics of the fit() function into a pandas DataFrame.

Returns:

data : pandas.DataFrame
    The statistics DataFrame, indexed by column name.

plot_weights(self)

Plots the relative importance of coefficients of the model.

Examples:

from skorecard.datasets import load_uci_credit_card
from skorecard.bucketers import EqualFrequencyBucketer
from skorecard.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = load_uci_credit_card(return_X_y=True)

pipeline = Pipeline([
    ('bucketer', EqualFrequencyBucketer(n_bins=10)),
    ('clf', LogisticRegression(calculate_stats=True))
])
pipeline.fit(X, y)
assert pipeline.named_steps['clf'].p_val_coef_[0][0] > 0

stats = pipeline.named_steps['clf'].get_stats()
pipeline.named_steps['clf'].plot_weights()

predict(self, X) inherited

Predict class labels for samples in X.

Parameters

X : array-like or sparse matrix, shape (n_samples, n_features) Samples.

Returns

C : array, shape [n_samples] Predicted class label per sample.

predict_log_proba(self, X) inherited

Predict logarithm of probability estimates.

The returned estimates for all classes are ordered by the label of classes.

Parameters

X : array-like of shape (n_samples, n_features) Vector to be scored, where n_samples is the number of samples and n_features is the number of features.

Returns

T : array-like of shape (n_samples, n_classes) Returns the log-probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.

predict_proba(self, X) inherited

Probability estimates.

The returned estimates for all classes are ordered by the label of classes.

For a multiclass problem, if multi_class is set to "multinomial" the softmax function is used to find the predicted probability of each class. Otherwise a one-vs-rest approach is used, i.e. the probability of each class is calculated with the logistic function, assuming that class to be positive, and these values are then normalized across all classes.

Parameters

X : array-like of shape (n_samples, n_features) Vector to be scored, where n_samples is the number of samples and n_features is the number of features.

Returns

T : array-like of shape (n_samples, n_classes) Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.

score(self, X, y, sample_weight=None) inherited

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters

X : array-like of shape (n_samples, n_features) Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs) True labels for X.

sample_weight : array-like of shape (n_samples,), default=None Sample weights.

Returns

score : float Mean accuracy of self.predict(X) wrt. y.

set_params(self, **params) inherited

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as sklearn.pipeline.Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters

**params : dict Estimator parameters.

Returns

self : estimator instance Estimator instance.
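A short sketch of the nested <component>__<parameter> syntax, using a plain sklearn pipeline (StandardScaler is a stand-in step for illustration, not part of skorecard):

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(C=1.0)),
])

# <component>__<parameter> reaches into the nested estimator.
pipe.set_params(clf__C=0.1, clf__max_iter=500)
assert pipe.named_steps['clf'].C == 0.1
```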

sparsify(self) inherited

Convert coefficient matrix to sparse format.

Converts the coef_ member to a scipy.sparse matrix, which for L1-regularized models can be much more memory- and storage-efficient than the usual numpy.ndarray representation.

The intercept_ member is not converted.

Returns

self Fitted estimator.

Notes

For non-sparse models, i.e. when there are not many zeros in coef_, this may actually increase memory usage, so use this method with care. A rule of thumb is that the number of zero elements, which can be computed with (coef_ == 0).sum(), must be more than 50% for this to provide significant benefits.

After calling this method, further fitting with the partial_fit method (if any) will not work until you call densify.
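The rule of thumb above can be sketched with plain sklearn: fit an L1-penalized model, check the fraction of zero coefficients, and sparsify only when it exceeds 50% (the synthetic data and the C value are illustrative assumptions):

```python
import numpy as np
from scipy import sparse
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=50, n_informative=5,
                           random_state=0)

# A strong L1 penalty drives most coefficients to exactly zero.
clf = LogisticRegression(penalty='l1', solver='liblinear', C=0.05).fit(X, y)
zero_frac = (clf.coef_ == 0).mean()  # fraction of zero weights

if zero_frac > 0.5:  # the >50% rule of thumb
    clf.sparsify()
    assert sparse.issparse(clf.coef_)
```

Prediction still works after sparsifying, since decision_function handles a sparse coef_ transparently.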


Last update: 2021-11-24