ordeq_huggingface

HuggingfaceDataset dataclass

Bases: Input[Dataset | DatasetDict | IterableDatasetDict | IterableDataset]

Load a dataset with the Hugging Face datasets library.

Example usage:

>>> from ordeq_huggingface import HuggingfaceDataset
>>> ds = HuggingfaceDataset(path="imdb")
>>> data = ds.load(split="train[:10%]")  # doctest: +SKIP
>>> len(data)  # doctest: +SKIP
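
The split string `"train[:10%]"` in the example above asks for the first 10% of the training split. The sketch below shows one way such a percent slice could map to a row range. `resolve_percent_slice` is a hypothetical helper written for illustration; it is not part of ordeq_huggingface or datasets, and its nearest-row rounding is an assumption that may differ from what the library does on splits whose size is not a multiple of 100.

```python
import re


# Hypothetical helper for illustration only; not part of ordeq_huggingface
# or the datasets library.
def resolve_percent_slice(split: str, num_rows: int) -> tuple[str, int, int]:
    """Map 'name[a%:b%]' to (name, first_row, end_row), end exclusive."""
    match = re.fullmatch(r"(\w+)\[(?:(\d+)%)?:(?:(\d+)%)?\]", split)
    if match is None:
        raise ValueError(f"unsupported split expression: {split!r}")
    name, start_pct, stop_pct = match.groups()
    # A missing bound means "from the first row" or "to the last row".
    # Assumption: percentages are rounded to the nearest row.
    start = 0 if start_pct is None else round(num_rows * int(start_pct) / 100)
    stop = num_rows if stop_pct is None else round(num_rows * int(stop_pct) / 100)
    return name, start, stop


# The IMDB training split has 25,000 rows.
name, start, stop = resolve_percent_slice("train[:10%]", num_rows=25_000)
print(name, start, stop)
```

Under these assumptions, `len(data)` in the example above would be `stop - start`.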

HuggingfaceDiskDataset dataclass

Bases: IO[Dataset | DatasetDict]

Load a dataset from disk, or save one to disk, using the Hugging Face datasets library.

Example usage:

>>> from ordeq_huggingface import HuggingfaceDiskDataset
>>> ds = HuggingfaceDiskDataset(path="path/to/dataset")  # doctest: +SKIP
>>> data = ds.load()  # doctest: +SKIP
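
The example above shows only `load()`. To illustrate the save-then-load round trip that a disk-backed IO dataclass implies, here is a minimal standard-library sketch. `JsonDiskRecords` is a hypothetical stand-in that stores a list of records as JSON; it is not an ordeq or ordeq_huggingface class, and the `save(records)` signature is an assumption rather than the documented API of `HuggingfaceDiskDataset`.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class JsonDiskRecords:
    """Hypothetical stand-in for a disk-backed IO; not a real ordeq class."""

    path: Path

    def save(self, records: list[dict]) -> None:
        # Create the parent directory so a fresh path can be written to.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(records), encoding="utf-8")

    def load(self) -> list[dict]:
        return json.loads(self.path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    io = JsonDiskRecords(path=Path(tmp) / "dataset" / "records.json")
    rows = [{"text": "great film", "label": 1}, {"text": "dull", "label": 0}]
    io.save(rows)
    reloaded = io.load()

print(reloaded == rows)
```

The point of the pattern is that the object holds only the location, while the data itself passes through `save` and `load`, so the same instance can serve as both an output of one step and an input to the next.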