Datasets

Loads the UCI Credit Card Dataset.

This dataset contains a sample of the Default of Credit Card Clients Dataset.

Example:

from skorecard import datasets
df = datasets.load_uci_credit_card(as_frame=True)

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| return_X_y | bool | If True, returns (data, target) instead of a dict object. | False |
| as_frame | bool | If True, returns a pandas DataFrame (features plus target) instead of X, y matrices. | False |

Returns:

(pd.DataFrame, dict or tuple) The features and target, as follows:

  • if as_frame is True: returns a pd.DataFrame that includes the target in the default column
  • if return_X_y is True (and as_frame is False): returns a tuple (X, y)
  • if both are False (default setting): returns a dictionary where the key data contains the features and the key target contains the target
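The precedence between the two flags is easiest to see on a small example. The sketch below uses a hypothetical stand-in, `load_toy_credit_data`, that mirrors the documented return contract on a few made-up rows, so it runs without the packaged dataset. The function name and the row values are assumptions for illustration, not part of skorecard.

```python
import pandas as pd

FEATURES = ["EDUCATION", "MARRIAGE", "LIMIT_BAL", "BILL_AMT1"]


def load_toy_credit_data(return_X_y=False, as_frame=False):
    """Hypothetical stand-in that mirrors the documented return contract."""
    # Made-up rows for illustration only; not taken from the real dataset.
    df = pd.DataFrame(
        {
            "EDUCATION": [1, 2, 2],
            "MARRIAGE": [1, 2, 1],
            "LIMIT_BAL": [20000.0, 120000.0, 90000.0],
            "BILL_AMT1": [3913.0, 2682.0, 29239.0],
            "default": [1, 1, 0],
        }
    )
    if as_frame:  # checked first, so it wins when both flags are True
        return df[FEATURES + ["default"]]
    X, y = df[FEATURES], df["default"].values
    if return_X_y:
        return X, y
    return {"data": X, "target": y}


bunch = load_toy_credit_data()                                 # dict
X, y = load_toy_credit_data(return_X_y=True)                   # tuple
frame = load_toy_credit_data(return_X_y=True, as_frame=True)   # DataFrame

print(sorted(bunch))          # keys of the default dict
print(X.shape, y.shape)       # feature frame and target array
print(type(frame).__name__)   # as_frame takes precedence
```

With the real loader, the same three calls on `datasets.load_uci_credit_card` should behave analogously, with `X` holding the four feature columns listed above.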

Source code in skorecard/datasets.py
import io
import pkgutil

import pandas as pd


def load_uci_credit_card(return_X_y=False, as_frame=False):
    """Loads the UCI Credit Card Dataset.

    This dataset contains a sample of the [Default of Credit Card Clients Dataset](https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset).

    Example:

    ```python
    from skorecard import datasets
    df = datasets.load_uci_credit_card(as_frame=True)
    ```

    Args:
        return_X_y (bool): If True, returns `(data, target)` instead of a dict object.
        as_frame (bool): If True, returns a pandas DataFrame (features plus target) instead of X, y matrices (default=False).

    Returns: (pd.DataFrame, dict or tuple) features and target, as follows:
        - if as_frame is True: returns a pd.DataFrame that includes the target in the `default` column
        - if return_X_y is True (and as_frame is False): returns a tuple `(X, y)`
        - if both are False (default setting): returns a dictionary where the key `data` contains the features
        and the key `target` contains the target

    """  # noqa
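    # Read the zipped CSV that ships with the package as raw bytes.
    file = pkgutil.get_data("skorecard", "data/UCI_Credit_Card.zip")
    df = pd.read_csv(io.BytesIO(file), compression="zip")
    # Shorten the target column name.
    df = df.rename(columns={"default.payment.next.month": "default"})
    # as_frame takes precedence over return_X_y: features and target in one frame.
    if as_frame:
        return df[["EDUCATION", "MARRIAGE", "LIMIT_BAL", "BILL_AMT1", "default"]]
    X, y = (
        df[["EDUCATION", "MARRIAGE", "LIMIT_BAL", "BILL_AMT1"]],
        df["default"].values,
    )
    if return_X_y:
        return X, y

    # Default: dict with the features under "data" and the target under "target".
    return {"data": X, "target": y}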
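The loading step in the function reads a zipped CSV from raw bytes rather than from a path on disk. A minimal sketch of that pattern follows. It assumes a tiny in-memory archive with made-up rows in place of the packaged `data/UCI_Credit_Card.zip`; the archive member name `toy.csv` and the values are hypothetical.

```python
import io
import zipfile

import pandas as pd

# Build a tiny zip archive in memory holding a single CSV (made-up rows).
csv_text = "LIMIT_BAL,default.payment.next.month\n20000,1\n120000,0\n"
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("toy.csv", csv_text)
raw_bytes = buffer.getvalue()  # plays the role of pkgutil.get_data(...)

# Same pattern as in the function: wrap the bytes in BytesIO and let
# pandas unpack the single-file archive via compression="zip".
df = pd.read_csv(io.BytesIO(raw_bytes), compression="zip")
df = df.rename(columns={"default.payment.next.month": "default"})

print(list(df.columns))
```

pandas expects the zip archive to contain exactly one data file when reading this way, which is why the sketch writes a single member.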