AsIsNumericalBucketer

Bases: BaseBucketer

The AsIsNumericalBucketer transformer creates buckets by treating the existing unique values as boundaries.

Support: numerical: true, categorical: false, supervised: false

This bucketer is useful when your data is already sufficiently bucketed, but you would like to bucket new data in the same way.

Example:

from skorecard import datasets
from skorecard.bucketers import AsIsNumericalBucketer

X, y = datasets.load_uci_credit_card(return_X_y=True)
bucketer = AsIsNumericalBucketer(variables=['LIMIT_BAL'])
bucketer.fit_transform(X)
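The idea behind the example can be approximated without skorecard. The sketch below is a minimal numpy approximation, not skorecard's implementation: it treats the sorted unique values of already-bucketed training data as boundaries, then maps new values onto them with `np.digitize`. The helper names `fit_as_is` and `transform_as_is` are hypothetical.

```python
import numpy as np


def fit_as_is(values):
    """Hypothetical helper: use the sorted unique, non-missing values as bucket boundaries."""
    arr = np.asarray(values, dtype=float)
    # np.unique already returns its result in sorted order
    return np.unique(arr[~np.isnan(arr)])


def transform_as_is(values, boundaries, right=True):
    """Hypothetical helper: map values to bucket indices (missing values not handled here)."""
    arr = np.asarray(values, dtype=float)
    # right=True:  boundaries[i-1] <  x <= boundaries[i] -> bucket i
    # right=False: boundaries[i-1] <= x <  boundaries[i] -> bucket i
    return np.digitize(arr, boundaries, right=right)


# Training data that is already "bucketed" into a handful of values
train = [1000, 2000, 2000, 5000, 1000]
boundaries = fit_as_is(train)
print(boundaries.tolist())  # [1000.0, 2000.0, 5000.0]

# New data is bucketed the same way; unseen values land between existing boundaries
new = [1000, 2000, 5000, 1500, 6000]
print(transform_as_is(new, boundaries).tolist())  # [0, 1, 2, 1, 3]
```

Each original value keeps its own bucket, while an unseen value such as 1500 joins the bucket of the next boundary above it when `right=True`.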

Source code in skorecard/bucketers/bucketers.py

class AsIsNumericalBucketer(BaseBucketer):
    """
    The `AsIsNumericalBucketer` transformer creates buckets by treating the existing unique values as boundaries.

    Support: ![badge](https://img.shields.io/badge/numerical-true-green) ![badge](https://img.shields.io/badge/categorical-false-red) ![badge](https://img.shields.io/badge/supervised-false-blue)

    This bucketer is useful when your data is already sufficiently bucketed,
    but you would like to bucket new data in the same way.

    Example:

    ```python
    from skorecard import datasets
    from skorecard.bucketers import AsIsNumericalBucketer

    X, y = datasets.load_uci_credit_card(return_X_y=True)
    bucketer = AsIsNumericalBucketer(variables=['LIMIT_BAL'])
    bucketer.fit_transform(X)
    ```
    """  # noqa

    def __init__(
        self,
        right=True,
        variables=[],
        specials={},
        missing_treatment="separate",
        remainder="passthrough",
        get_statistics=True,
    ):
        """
        Init the class.

        Args:
            right (boolean): Whether the right boundary is included in a range (default) or the range goes
                'up to but not including' it. For example, if you have [5, 10], the ranges for right=True would be
                (-Inf, 5], (5, 10], (10, Inf] or (-Inf, 5), [5, 10), [10, Inf) for right=False
            variables (list): The features to bucket. Uses all features if not defined.
            specials (dict): (nested) dictionary of special values that require their own binning.
                The dictionary has the following format:
                 {"<column name>" : {"name of special bucket" : <list with 1 or more values>}}
                For every feature that needs a special value, a dictionary must be passed as value.
                This dictionary contains a name of a bucket (key) and an array of unique values that should be put
                in that bucket.
                When special values are defined, they are not considered in the fitting procedure.
            missing_treatment (str or dict): Defines how we treat the missing values present in the data.
                If a string, it must be one of the following options:
                    separate: Missing values get put in a separate 'Other' bucket: `-1`
                    most_risky: Missing values are put into the bucket containing the largest percentage of Class 1.
                    least_risky: Missing values are put into the bucket containing the largest percentage of Class 0.
                    most_frequent: Missing values are put into the most common bucket.
                    neutral: Missing values are put into the bucket with WoE closest to 0.
                    similar: Missing values are put into the bucket with WoE closest to the bucket with only missing values.
                    passthrough: Leaves missing values untouched.
                If a dict, it must be of the following format:
                    {"<column name>": <bucket_number>}
                    This is the bucket number that the missing values are put into.
            remainder (str): How we want the non-specified columns to be transformed. It must be in ["passthrough", "drop"].
                passthrough (Default): all columns that were not specified in "variables" will be passed through.
                drop: all remaining columns that were not specified in "variables" will be dropped.
        """  # noqa
        self.right = right
        self.variables = variables
        self.specials = specials
        self.missing_treatment = missing_treatment
        self.remainder = remainder
        self.get_statistics = get_statistics

    @property
    def variables_type(self):
        """
        Signals variables type supported by this bucketer.
        """
        return "numerical"

    def _get_feature_splits(self, feature, X, y, X_unfiltered=None):
        """
        Finds the splits for a single feature.

        X and y have already been preprocessed, and have specials removed.

        Args:
            feature (str): Name of the feature.
            X (pd.Series): df with single column of feature to bucket
            y (np.ndarray): array with target
            X_unfiltered (pd.Series): df with single column of feature to bucket before any filtering was applied

        Returns:
            splits, right (tuple): The splits (dict or array), and whether right=True or False.
        """
        boundaries = X.unique().tolist()
        boundaries.sort()

        if len(boundaries) > 100:
            msg = f"The column '{feature}' has more than 100 unique values "
            msg += "and cannot be used with the AsIsNumericalBucketer. "
            msg += "Apply a different bucketer first."
            raise NotPreBucketedError(msg)

        return (boundaries, self.right)
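`_get_feature_splits` refuses columns with more than 100 unique values. The sketch below mirrors that check in a standalone function and shows one way to "apply a different bucketer first": quantile-binning a continuous column with `pandas.qcut` and then using the resulting codes as-is. `get_as_is_splits` and the local `NotPreBucketedError` class are hypothetical stand-ins, the `dropna()` call is an added assumption, and `pd.qcut` merely substitutes for a skorecard bucketer.

```python
import numpy as np
import pandas as pd


class NotPreBucketedError(ValueError):
    """Local stand-in for skorecard's exception of the same name."""


def get_as_is_splits(feature, X, right=True, max_unique=100):
    """Hypothetical mirror of _get_feature_splits: sorted unique values become the splits."""
    # Assumption: missing values are dropped defensively before collecting boundaries
    boundaries = sorted(X.dropna().unique().tolist())
    if len(boundaries) > max_unique:
        raise NotPreBucketedError(
            f"The column '{feature}' has more than {max_unique} unique values "
            "and cannot be used with the AsIsNumericalBucketer. "
            "Apply a different bucketer first."
        )
    return boundaries, right


rng = np.random.default_rng(0)
income = pd.Series(rng.normal(50_000, 10_000, size=1_000))  # continuous: ~1000 unique values

try:
    get_as_is_splits("income", income)
except NotPreBucketedError as err:
    print(err)

# Pre-bucket into 10 quantile bins first; the integer codes 0..9 can then be used as-is
income_codes = pd.Series(pd.qcut(income, q=10, labels=False))
print(get_as_is_splits("income", income_codes))
```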

`variables_type` `property`

Signals variables type supported by this bucketer.

`__init__(right=True, variables=[], specials={}, missing_treatment='separate', remainder='passthrough', get_statistics=True)`

Init the class.

Parameters:

Name	Type	Description	Default
`right`	`boolean`	Whether the right boundary is included in a range (default) or the range goes 'up to but not including' it. For example, if you have [5, 10], the ranges for right=True would be (-Inf, 5], (5, 10], (10, Inf] or (-Inf, 5), [5, 10), [10, Inf) for right=False.	`True`
`variables`	`list`	The features to bucket. Uses all features if not defined.	`[]`
`specials`	`dict`	(nested) dictionary of special values that require their own binning. The dictionary has the following format: {"<column name>" : {"name of special bucket" : <list with 1 or more values>}}. For every feature that needs a special value, a dictionary must be passed as value. This dictionary contains a name of a bucket (key) and an array of unique values that should be put in that bucket. When special values are defined, they are not considered in the fitting procedure.	`{}`
`missing_treatment`	`str or dict`	Defines how we treat the missing values present in the data. If a string, it must be one of the following options: separate: Missing values get put in a separate 'Other' bucket: `-1`. most_risky: Missing values are put into the bucket containing the largest percentage of Class 1. least_risky: Missing values are put into the bucket containing the largest percentage of Class 0. most_frequent: Missing values are put into the most common bucket. neutral: Missing values are put into the bucket with WoE closest to 0. similar: Missing values are put into the bucket with WoE closest to the bucket with only missing values. passthrough: Leaves missing values untouched. If a dict, it must be of the following format: {"<column name>": <bucket_number>}, where the bucket number is the bucket that the missing values are put into.	`'separate'`
`remainder`	`str`	How we want the non-specified columns to be transformed. It must be in ["passthrough", "drop"]. passthrough (Default): all columns that were not specified in "variables" will be passed through. drop: all remaining columns that were not specified in "variables" will be dropped.	`'passthrough'`
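The interval conventions for `right`, together with the `-1` bucket used by `missing_treatment="separate"`, can be checked with a small standalone sketch. `assign_buckets` is a hypothetical helper built on `np.digitize`; it reproduces the ranges listed for `right` with boundaries [5, 10], but it is not how skorecard assigns buckets internally and it only models the `"separate"` missing treatment.

```python
import numpy as np


def assign_buckets(values, boundaries, right=True):
    """Hypothetical helper: bucket indices per the `right` convention; NaN -> -1 ('separate')."""
    arr = np.asarray(values, dtype=float)
    buckets = np.digitize(arr, boundaries, right=right)
    # np.digitize sorts NaN past the last boundary, so override it with the 'separate' bucket -1
    buckets[np.isnan(arr)] = -1
    return buckets


values = [3, 5, 7, 10, 12, np.nan]

# right=True:  (-Inf, 5] -> 0, (5, 10] -> 1, (10, Inf) -> 2
print(assign_buckets(values, [5, 10], right=True).tolist())   # [0, 0, 1, 1, 2, -1]

# right=False: (-Inf, 5) -> 0, [5, 10) -> 1, [10, Inf) -> 2
print(assign_buckets(values, [5, 10], right=False).tolist())  # [0, 1, 1, 2, 2, -1]
```

Only the boundary values themselves (5 and 10) change bucket between the two settings.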
