ordeq_files

Bytes dataclass

Bases: IO[bytes]

IO representing bytes.

Example:

>>> from ordeq_files import Bytes
>>> from pathlib import Path
>>> my_png = Bytes(
...     path=Path("path/to.png")
... )

Bz2 dataclass

Bases: IO[bytes]

IO representing a bzip2-compressed file.

Example usage:

>>> from ordeq_files import Bz2
>>> from pathlib import Path
>>> my_bz2 = Bz2(
...     path=Path("path/to.bz2")
... )

CSV dataclass

Bases: IO[Iterable[Iterable[Any]]]

IO representing a CSV file.

Example usage:

>>> from ordeq_files import CSV
>>> from pathlib import Path
>>> computer_sales = CSV(
...     path=Path("path/to/computer_sales.csv")
... )

Example in a node:

>>> from ordeq import node
>>> computer_sales_in_nl = CSV(path=Path("computer_sales_nl.csv"))
>>> @node(
...     inputs=computer_sales,
...     outputs=computer_sales_in_nl
... )
... def filter_computer_sales(computer_sales: list) -> list:
...     return [row for row in computer_sales if row[1] == "NL"]
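The filtering logic in the node above is ordinary Python, so it can be tried on in-memory rows without any file. The sketch below omits the `@node` decorator and uses hypothetical sample rows; the column layout (country code in the second column) is an assumption made for illustration.

```python
def filter_computer_sales(computer_sales: list) -> list:
    # Keep only the rows whose second column equals "NL".
    return [row for row in computer_sales if row[1] == "NL"]


# Hypothetical sample rows; the real file layout is not given in the docs.
rows = [
    ["laptop", "NL", "1000"],
    ["desktop", "BE", "1500"],
    ["tablet", "NL", "300"],
]
filtered = filter_computer_sales(rows)
```

Here `filtered` holds only the two rows with "NL" in the second column.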

Example with a node generator:

>>> from pathlib import Path
>>> from ordeq import node
>>> from ordeq_files import CSV
>>> @node(outputs=CSV(path=Path("output.csv")))
... def generator():
...     yield ["constant", "idx"]
...     for idx in range(100):
...         yield [1, idx]

>>> if __name__ == "__main__":
...     from ordeq import run
...     run(generator)

Loading and saving can be configured with additional parameters, e.g.:

>>> import csv
>>> computer_sales.load(quotechar='"', delimiter=',')  # doctest: +SKIP
>>> computer_sales.with_load_options(dialect='excel').load()  # doctest: +SKIP
>>> data = [["NL", "2023-10-01", 1000], ["BE", "2023-10-02", 1500]]
>>> computer_sales.save(data, quoting=csv.QUOTE_MINIMAL)  # doctest: +SKIP

Refer to the documentation of Python's csv module for more details on the available options.
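The option names used above (`quotechar`, `delimiter`, `dialect`, `quoting`) match those of Python's standard `csv` module. The sketch below uses only the standard library to show their effect on an in-memory buffer; whether `CSV.load` and `CSV.save` forward these options unchanged to `csv.reader` and `csv.writer` is an assumption.

```python
import csv
import io

rows = [["NL", "2023-10-01", 1000], ["BE", "2023-10-02", 1500]]

# Write with a custom delimiter; QUOTE_MINIMAL quotes only fields that need it.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";", quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)

# Read back with the same delimiter. Every field comes back as a string.
buffer.seek(0)
loaded = list(csv.reader(buffer, delimiter=";", quotechar='"'))
```

Note that the integers written in the third column are read back as strings, since CSV carries no type information.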

Glob dataclass

Bases: Input[Generator[PathLike, None, None]]

IO class that loads all paths matching a given pattern. Although this class can be used as a dataset in your nodes, in most cases it is more suitable to inherit from this class and extend the load method, for example:

>>> class LoadPartitions(Glob):
...     def load(self):
...         paths = super().load()
...         for path in paths:
...             yield my_load_func(path)
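The same pattern can be sketched with the standard library alone. The example below does not use `Glob` (its constructor parameters are not shown here); it relies on `pathlib.Path.glob` instead, and the helper name `load_partitions` and the file names are hypothetical.

```python
import tempfile
from collections.abc import Iterator
from pathlib import Path


def load_partitions(root: Path, pattern: str) -> Iterator[str]:
    # Lazily yield the contents of every file matching the pattern.
    # Sorting makes the order deterministic across platforms.
    for path in sorted(root.glob(pattern)):
        yield path.read_text()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "part-0.txt").write_text("a")
    (root / "part-1.txt").write_text("b")
    (root / "notes.md").write_text("ignored")
    partitions = list(load_partitions(root, "part-*.txt"))
```

Only the two files matching `part-*.txt` are loaded; `notes.md` is skipped.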

Gzip dataclass

Bases: IO[bytes]

IO representing a gzip-compressed file.

Example usage:

>>> from ordeq_files import Gzip
>>> from pathlib import Path
>>> my_gzip = Gzip(
...     path=Path("path/to.gz")
... )

JSON dataclass

Bases: IO[dict[str, Any]]

IO representing a JSON file.

Example usage:

>>> from ordeq_files import JSON
>>> from pathlib import Path
>>> my_json = JSON(
...     path=Path("path/to.json")
... )

Pickle dataclass

Bases: IO[T]

IO that loads and saves Pickle files.

Example usage:

>>> from ordeq_files import Pickle
>>> from pathlib import Path
>>> my_pickle = Pickle(
...     path=Path("path/to.pkl")
... )

Text dataclass

Bases: IO[str]

IO representing a plain-text file.

Examples:

>>> from ordeq_files import Text
>>> from pathlib import Path
>>> my_text = Text(
...     path=Path("path/to.txt")
... )

TextLinesStream dataclass

Bases: IO[Generator[str]]

IO representing a file stream as a generator of lines.

Useful for processing large files line-by-line.

By default, lines are separated by newline characters during load.

When saving, a newline character is appended to each line by default. This can be changed by providing a different end argument to the save method using with_save_options.

Examples:

>>> from ordeq_files import TextLinesStream
>>> from pathlib import Path
>>> my_file = TextLinesStream(
...     path=Path("path/to.txt")
... )

>>> my_file_no_endline = TextLinesStream(
...     path=Path("path/to.txt")
... ).with_save_options(end="")
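The load and save behavior described above can be approximated with plain file handling. This is a minimal sketch and not the library's implementation: the helpers `load_lines` and `save_lines` are hypothetical, and stripping the trailing newline on load is an assumption.

```python
import tempfile
from collections.abc import Iterable, Iterator
from pathlib import Path


def load_lines(path: Path) -> Iterator[str]:
    # Yield one line at a time so the whole file never sits in memory.
    with path.open() as fh:
        for line in fh:
            yield line.rstrip("\n")  # assumption: newline is stripped


def save_lines(path: Path, lines: Iterable[str], end: str = "\n") -> None:
    # Append `end` to every line; pass end="" to write lines back to back.
    with path.open("w") as fh:
        for line in lines:
            fh.write(line + end)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "lines.txt"
    save_lines(path, ["first", "second"])
    with_newlines = path.read_text()
    round_trip = list(load_lines(path))
    save_lines(path, ["first", "second"], end="")
    without_newlines = path.read_text()
```

With the default `end`, each line is terminated by a newline; with `end=""`, the lines are concatenated without a separator.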

persist(data)

Don't persist, since this is a stream-based IO.
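One plausible reason for skipping persistence, stated here as an assumption rather than taken from the library's source, is that a generator can be consumed only once, so keeping a reference to it would not allow a second read. The plain-Python sketch below shows that behavior.

```python
def stream():
    # Stand-in for a stream of lines read from a file.
    yield from ["a", "b", "c"]


lines = stream()
first_pass = list(lines)   # consumes the generator
second_pass = list(lines)  # the generator is already exhausted
```

The second pass yields nothing, so a stored generator would be of no use to a later consumer.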