ordeq_files

`Bytes` `dataclass`

Bases: IO[bytes]

IO representing a binary file, loaded and saved as raw bytes.

Example:

>>> from ordeq_files import Bytes
>>> from pathlib import Path
>>> my_png = Bytes(
...     path=Path("path/to.png")
... )
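To make the semantics concrete, here is a minimal stand-in for a bytes IO built on `pathlib` alone. The class `BytesFile` and its `load`/`save` methods are hypothetical. They assume that `Bytes` reads and writes the whole file as raw bytes, and they are not the `ordeq_files` implementation.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class BytesFile:
    """Hypothetical stand-in for a bytes IO: whole-file binary read/write."""

    path: Path

    def load(self) -> bytes:
        # Read the entire file as raw bytes (no decoding).
        return self.path.read_bytes()

    def save(self, data: bytes) -> None:
        # Overwrite the file with the given bytes.
        self.path.write_bytes(data)


# Usage: round-trip the 8-byte PNG signature through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    png = BytesFile(path=Path(tmp) / "to.png")
    png.save(b"\x89PNG\r\n\x1a\n")
    signature = png.load()
```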

`Bz2` `dataclass`

Bases: IO[bytes | str]

IO representing a bzip2-compressed file.

Example usage:

>>> from ordeq_files import Bz2
>>> from pathlib import Path
>>> my_bz2 = Bz2(
...     path=Path("path/to.bz2")
... )
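The `bytes | str` base suggests the file can be read either as decompressed bytes or as decoded text. The sketch below models that with the standard `bz2` module. The `Bz2File` class and its `text` flag are hypothetical: how `Bz2` actually selects between the two modes is an assumption here.

```python
from __future__ import annotations

import bz2
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Bz2File:
    """Hypothetical stand-in for a bzip2 IO (not the ordeq_files code)."""

    path: Path
    text: bool = False  # assumption: selects text ("t") vs binary ("b") mode

    def _encoding(self) -> str | None:
        # bz2.open only accepts an encoding in text mode.
        return "utf-8" if self.text else None

    def load(self) -> bytes | str:
        mode = "rt" if self.text else "rb"
        with bz2.open(self.path, mode, encoding=self._encoding()) as f:
            return f.read()

    def save(self, data: bytes | str) -> None:
        mode = "wt" if self.text else "wb"
        with bz2.open(self.path, mode, encoding=self._encoding()) as f:
            f.write(data)


with tempfile.TemporaryDirectory() as tmp:
    raw = Bz2File(path=Path(tmp) / "to.bz2")
    raw.save(b"hello " * 3)
    raw_back = raw.load()

    txt = Bz2File(path=Path(tmp) / "to.txt.bz2", text=True)
    txt.save("héllo wörld")
    txt_back = txt.load()
```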

`CSV` `dataclass`

Bases: IO[Iterable[Iterable[Any]]]

IO representing a CSV file.

Example usage:

>>> from ordeq_files import CSV
>>> from pathlib import Path
>>> computer_sales = CSV(
...     path=Path("path/to/computer_sales.csv")
... )

Example in a node:

>>> from ordeq import node
>>> computer_sales_in_nl = CSV(path=Path("computer_sales_nl.csv"))
>>> @node(
...     inputs=computer_sales,
...     outputs=computer_sales_in_nl
... )
... def filter_computer_sales(computer_sales: list) -> list:
...     return [row for row in computer_sales if row[0] == "NL"]

Example with a node generator:

>>> from pathlib import Path
>>> from ordeq import node
>>> from ordeq_files import CSV
>>> @node(outputs=CSV(path=Path("output.csv")))
... def generator():
...     yield ["constant", "idx"]
...     for idx in range(100):
...         yield [1, idx]

>>> if __name__ == "__main__":
...     from ordeq import run
...     run(generator)
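The generator pattern works because a CSV writer can consume rows lazily. The sketch below assumes, without confirming it from the library source, that saving behaves like the standard library's `csv.writer(...).writerows(...)`. That call accepts any iterable, so the 101 rows are pulled from the generator one at a time.

```python
import csv
import io
from collections.abc import Iterator


def generator() -> Iterator[list]:
    # Same rows as the node above: a header, then 100 data rows.
    yield ["constant", "idx"]
    for idx in range(100):
        yield [1, idx]


buffer = io.StringIO()
writer = csv.writer(buffer)
# writerows iterates over the generator, writing each row as it arrives.
writer.writerows(generator())
lines = buffer.getvalue().splitlines()
```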

Loading and saving can be configured with additional parameters, e.g.:

>>> import csv
>>> computer_sales.load(quotechar='"', delimiter=',')  # doctest: +SKIP
>>> computer_sales.with_load_options(dialect='excel').load()  # doctest: +SKIP
>>> data = [["NL", "2023-10-01", 1000], ["BE", "2023-10-02", 1500]]
>>> computer_sales.save(data, quoting=csv.QUOTE_MINIMAL)  # doctest: +SKIP

Refer to the Python `csv` module documentation [1] for more details on the available options.
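The option names above (`quotechar`, `delimiter`, `dialect`, `quoting`) match the formatting parameters of the standard `csv` module. Assuming they are forwarded unchanged to `csv.reader` and `csv.writer`, their effect can be reproduced with the stdlib alone. Note that the stdlib reader returns every field as a string; whether `CSV` converts types on load is not stated here.

```python
import csv
import io

data = [["NL", "2023-10-01", 1000], ["BE", "2023-10-02", 1500]]

# Save: QUOTE_MINIMAL only quotes fields that contain special characters.
out = io.StringIO()
csv.writer(out, quoting=csv.QUOTE_MINIMAL).writerows(data)
text = out.getvalue()

# Load with the 'excel' dialect (comma-delimited, '"' as quotechar).
rows = list(csv.reader(io.StringIO(text), dialect="excel"))

# Load a semicolon-separated variant by overriding the delimiter.
semi = "NL;2023-10-01;1000\r\n"
semi_rows = list(csv.reader(io.StringIO(semi), delimiter=";", quotechar='"'))
```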

`Glob` `dataclass`

Bases: Input[Generator[PathLike, None, None]]

IO class that loads all paths matching a given glob pattern. Although this class can be used directly as an input to your nodes, in most cases it is more suitable to subclass it and extend the `load` method, for example:

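>>> from ordeq_files import Glob
>>> class LoadPartitions(Glob):
...     def load(self):
...         paths = super().load()
...         for path in paths:
...             # `my_load_func` is a placeholder for your own per-file loader
...             yield my_load_func(path)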
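A rough stdlib analogue of this subclassing pattern, assuming `Glob.load` lazily yields the paths that match a pattern. `GlobPaths`, its `directory`/`pattern` fields and the text-reading `load` override are hypothetical; the real `Glob` constructor arguments are not documented on this page.

```python
import tempfile
from collections.abc import Iterator
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class GlobPaths:
    """Hypothetical stand-in for Glob: lazily yields matching paths."""

    directory: Path
    pattern: str

    def load(self) -> Iterator[Path]:
        # Path.glob yields matches lazily; their order is filesystem-dependent.
        yield from self.directory.glob(self.pattern)


class LoadPartitions(GlobPaths):
    def load(self) -> Iterator[str]:
        # Extend the parent: turn each matching path into its contents.
        for path in super().load():
            yield path.read_text()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("part-0.csv", "part-1.csv", "notes.txt"):
        (root / name).write_text(name)
    # Consume the generator before the temporary directory is removed.
    partitions = sorted(LoadPartitions(directory=root, pattern="part-*.csv").load())
```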

`Gzip` `dataclass`

Bases: IO[bytes | str]

IO representing a gzip-compressed file.

Example usage:

>>> from ordeq_files import Gzip
>>> from pathlib import Path
>>> my_gzip = Gzip(
...     path=Path("path/to.gz")
... )
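A minimal sketch of binary gzip round-tripping with the standard `gzip` module. The helpers `save_gzip` and `load_gzip` are hypothetical and assume `Gzip` writes ordinary gzip streams; if so, the file on disk starts with the gzip magic bytes and any gzip tool can decode it.

```python
import gzip
import tempfile
from pathlib import Path


def save_gzip(path: Path, data: bytes) -> None:
    # Hypothetical helper: compress `data` into a standard .gz file.
    with gzip.open(path, "wb") as f:
        f.write(data)


def load_gzip(path: Path) -> bytes:
    # Hypothetical helper: decompress the whole file into memory.
    with gzip.open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "to.gz"
    save_gzip(target, b"a,b\n1,2\n")
    restored = load_gzip(target)
    # Every gzip stream begins with the magic bytes 0x1f 0x8b.
    magic = target.read_bytes()[:2]
    # The same bytes decode with gzip.decompress, independent of the IO.
    via_decompress = gzip.decompress(target.read_bytes())
```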

`JSON` `dataclass`

Bases: IO[dict[str, Any]]

IO representing a JSON file.

Example usage:

>>> from ordeq_files import JSON
>>> from pathlib import Path
>>> my_json = JSON(
...     path=Path("path/to.json")
... )
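Because the base type is `dict[str, Any]`, the file is expected to hold a top-level JSON object. The hypothetical `JSONFile` below sketches that with the `json` module, assuming save options such as `indent` are passed through to `json.dump`. It also shows a JSON property worth knowing: tuples come back as lists.

```python
from __future__ import annotations

import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass(frozen=True)
class JSONFile:
    """Hypothetical stand-in for a JSON IO holding a top-level object."""

    path: Path

    def load(self) -> dict[str, Any]:
        with self.path.open(encoding="utf-8") as f:
            return json.load(f)

    def save(self, data: dict[str, Any], **options: Any) -> None:
        # Extra options (e.g. indent=2) go straight to json.dump.
        with self.path.open("w", encoding="utf-8") as f:
            json.dump(data, f, **options)


with tempfile.TemporaryDirectory() as tmp:
    config = JSONFile(path=Path(tmp) / "to.json")
    config.save({"name": "sales", "years": (2023, 2024)}, indent=2)
    raw_text = config.path.read_text(encoding="utf-8")
    loaded = config.load()
```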

`Pickle` `dataclass`

Bases: IO[T]

IO that loads and saves Pickle files.

Example usage:

>>> from ordeq_files import Pickle
>>> from pathlib import Path
>>> my_pickle = Pickle(
...     path=Path("path/to.pkl")
... )
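The `IO[T]` base means the payload can be any picklable Python object. The generic `PickleFile` below is a hypothetical sketch using the `pickle` module, not the library's code. Unlike JSON, pickle preserves types such as `set`; the standard caveat applies that unpickling untrusted files can execute arbitrary code.

```python
import pickle
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class PickleFile(Generic[T]):
    """Hypothetical stand-in for a Pickle IO parameterised by its payload type."""

    path: Path

    def load(self) -> T:
        # Only unpickle files you trust: loading can execute arbitrary code.
        with self.path.open("rb") as f:
            return pickle.load(f)

    def save(self, data: T) -> None:
        with self.path.open("wb") as f:
            pickle.dump(data, f)


with tempfile.TemporaryDirectory() as tmp:
    store: PickleFile[dict[str, set[int]]] = PickleFile(path=Path(tmp) / "to.pkl")
    store.save({"NL": {2023, 2024}})
    restored = store.load()
```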

`Text` `dataclass`

Bases: IO[str]

IO representing a plain-text file.

Examples:

>>> from ordeq_files import Text
>>> from pathlib import Path
>>> my_text = Text(
...     path=Path("path/to.txt")
... )

`TextLinesStream` `dataclass`

Bases: IO[Generator[str]]

IO representing a file stream as a generator of lines.

Useful for processing large files line-by-line.

By default, lines are separated by newline characters during load.

When saving, a newline character is appended to each line by default. To change this, pass a different `end` argument to the save method using `with_save_options`.

Examples:

>>> from ordeq_files import TextLinesStream
>>> from pathlib import Path
>>> my_file = TextLinesStream(
...     path=Path("path/to.txt")
... )

>>> my_file_no_endline = TextLinesStream(
...     path=Path("path/to.txt")
... ).with_save_options(end="")
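The load/save behaviour described above can be sketched with plain file iteration. `LinesStream` is hypothetical, not the library's class. It assumes loaded lines keep their trailing newline, as Python's own file iteration does; whether `TextLinesStream` strips it is not documented here. The `end` parameter mirrors the documented save option.

```python
import tempfile
from collections.abc import Iterable, Iterator
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class LinesStream:
    """Hypothetical stand-in for TextLinesStream (not the library code)."""

    path: Path

    def load(self) -> Iterator[str]:
        # Iterating a text file yields one line at a time, so memory use
        # does not grow with file size. Lines keep their trailing "\n".
        with self.path.open(encoding="utf-8") as f:
            yield from f

    def save(self, lines: Iterable[str], end: str = "\n") -> None:
        # Append `end` to every line, mirroring the documented default.
        with self.path.open("w", encoding="utf-8") as f:
            for line in lines:
                f.write(line + end)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "to.txt"
    stream = LinesStream(path=path)
    stream.save(f"row {i}" for i in range(3))
    first_pass = list(stream.load())
    # With end="" nothing is appended, so the caller controls separators.
    stream.save(["a", "b"], end="")
    no_endline = path.read_text(encoding="utf-8")
```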

`persist(data)`

Does not persist the data, since this is a stream-based IO.
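One way to see why a stream is not persisted, assuming `persist` would otherwise keep the loaded value around for later reuse: storing a generator stores only the iterator object, and the first consumer exhausts it. A plain generator illustrates the effect.

```python
from collections.abc import Iterator


def lines() -> Iterator[str]:
    # Stand-in for a stream such as the one a line-streaming load returns.
    yield "first"
    yield "second"


cached = lines()           # keeping the stream keeps only the iterator
first_use = list(cached)   # the first consumer drains it...
second_use = list(cached)  # ...so any later consumer receives nothing
```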
