Loads the UCI Credit Card Dataset.
This dataset contains a sample of the Default of Credit Card Clients Dataset.
Example:

```python
from skorecard import datasets
df = datasets.load_uci_credit_card(as_frame=True)
```
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `return_X_y` | `bool` | If True, returns `(data, target)` instead of a dict object. | `False` |
| `as_frame` | `bool` | If True, returns a pandas DataFrame instead of separate X, y matrices. | `False` |
Returns:

(pd.DataFrame, dict or tuple) Features and target, as follows:

- If `as_frame` is True: returns a pd.DataFrame with the feature columns `EDUCATION`, `MARRIAGE`, `LIMIT_BAL`, `BILL_AMT1` and the target column `default`. This takes precedence over `return_X_y`.
- If `return_X_y` is True: returns a tuple `(X, y)`.
- If both are False (the default): returns a dictionary where the key `data` contains the features and the key `target` contains the target.
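The three return modes can be illustrated without the packaged data. The sketch below uses a hypothetical helper, `split_output`, that copies the branching from the function body onto a tiny synthetic DataFrame; the helper name and the row values are assumptions for illustration, not part of skorecard, and pandas is assumed to be installed.

```python
import pandas as pd

# Feature columns selected by load_uci_credit_card
FEATURES = ["EDUCATION", "MARRIAGE", "LIMIT_BAL", "BILL_AMT1"]


def split_output(df, return_X_y=False, as_frame=False):
    """Hypothetical helper mirroring the return logic of load_uci_credit_card."""
    if as_frame:
        # as_frame is checked first, so it wins even if return_X_y is also True
        return df[FEATURES + ["default"]]
    X, y = df[FEATURES], df["default"].values
    if return_X_y:
        return X, y
    return {"data": X, "target": y}


# Synthetic stand-in for the real dataset (values are made up)
toy = pd.DataFrame({
    "EDUCATION": [1, 2, 3],
    "MARRIAGE": [1, 2, 1],
    "LIMIT_BAL": [20000.0, 120000.0, 90000.0],
    "BILL_AMT1": [3913.0, 2682.0, 29239.0],
    "default": [1, 1, 0],
})

frame = split_output(toy, as_frame=True)
X, y = split_output(toy, return_X_y=True)
bunch = split_output(toy)
print(type(frame).__name__, X.shape, type(y).__name__, sorted(bunch))
```

Note that the target comes back as a NumPy array (via `.values`) in the tuple and dictionary modes, but as a regular `default` column in the DataFrame mode.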
Source code in skorecard/datasets.py
````python
# Module-level imports this function relies on
import io
import pkgutil

import pandas as pd


def load_uci_credit_card(return_X_y=False, as_frame=False):
    """Loads the UCI Credit Card Dataset.

    This dataset contains a sample of the [Default of Credit Card Clients Dataset](https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset).

    Example:

    ```python
    from skorecard import datasets
    df = datasets.load_uci_credit_card(as_frame=True)
    ```

    Args:
        return_X_y (bool): If True, returns `(data, target)` instead of a dict object.
        as_frame (bool): If True, returns a pandas DataFrame instead of X, y matrices (default=False).

    Returns: (pd.DataFrame, dict or tuple) features and target, as follows:
        - if as_frame is True: returns pd.DataFrame with y as a target
        - if return_X_y is True: returns a tuple: (X,y)
        - if both are false (default setting): returns a dictionary where the key `data` contains the features,
        and the key `target` is the target
    """  # noqa
    # Read the zipped CSV bundled with the skorecard package
    file = pkgutil.get_data("skorecard", "data/UCI_Credit_Card.zip")
    df = pd.read_csv(io.BytesIO(file), compression="zip")
    # Shorten the target column name
    df = df.rename(columns={"default.payment.next.month": "default"})
    # as_frame is checked first, so it takes precedence over return_X_y
    if as_frame:
        return df[["EDUCATION", "MARRIAGE", "LIMIT_BAL", "BILL_AMT1", "default"]]
    X, y = (
        df[["EDUCATION", "MARRIAGE", "LIMIT_BAL", "BILL_AMT1"]],
        df["default"].values,
    )
    if return_X_y:
        return X, y
    return {"data": X, "target": y}
````
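The loading step reads a zipped CSV straight from bytes rather than from a path on disk. The sketch below reproduces that step in isolation, assuming pandas is installed: it builds a small zip archive in memory with the standard-library `zipfile` module as a stand-in for the bytes `pkgutil.get_data` returns. The inner file name `UCI_Credit_Card.csv`, the reduced column set, and the row values are made up for illustration; only the `default.payment.next.month` column name comes from the source above.

```python
import io
import zipfile

import pandas as pd

# Hypothetical stand-in for the packaged archive: a tiny CSV zipped in memory.
# Columns are a reduced, made-up subset; the real file has more.
csv_text = (
    "EDUCATION,MARRIAGE,LIMIT_BAL,BILL_AMT1,default.payment.next.month\n"
    "2,1,20000.0,3913.0,1\n"
    "2,2,120000.0,2682.0,1\n"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("UCI_Credit_Card.csv", csv_text)
file = buffer.getvalue()  # raw bytes, like pkgutil.get_data would return

# Same two steps as the function: decompress-and-parse, then rename the target
df = pd.read_csv(io.BytesIO(file), compression="zip")
df = df.rename(columns={"default.payment.next.month": "default"})
print(df.columns.tolist())
```

Wrapping the bytes in `io.BytesIO` gives `pd.read_csv` a file-like object, and `compression="zip"` expects the archive to contain a single data file.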