ordeq_files
Bytes
dataclass
Bases: IO[bytes]
IO representing bytes.
Example:
>>> from ordeq_files import Bytes
>>> from pathlib import Path
>>> my_png = Bytes(
... path=Path("path/to.png")
... )
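For illustration only, the stdlib sketch below shows the kind of round trip an IO[bytes] performs: raw bytes are written to a file and read back without any decoding. It does not use ordeq_files. Treating Bytes as equivalent to Path.write_bytes/Path.read_bytes is an assumption, and the PNG signature is just sample data.

```python
import tempfile
from pathlib import Path

# The 8-byte PNG file signature, used here only as sample data.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "to.png"
    # Save: the bytes are written unchanged.
    path.write_bytes(PNG_SIGNATURE)
    # Load: the bytes are read back as-is; no text decoding happens.
    loaded = path.read_bytes()

is_png = loaded.startswith(PNG_SIGNATURE)
```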
Bz2
dataclass
Bases: IO[bytes]
IO representing a bzip2-compressed file.
Example usage:
>>> from ordeq_files import Bz2
>>> from pathlib import Path
>>> my_bz2 = Bz2(
... path=Path("path/to.bz2")
... )
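Bz2 is typed as IO[bytes], which suggests that loading returns the decompressed bytes and saving compresses them. That reading is an assumption. The stdlib sketch below (not the ordeq_files implementation) shows this round trip with bz2.open and confirms that the file on disk holds the compressed stream.

```python
import bz2
import tempfile
from pathlib import Path

payload = b"ordeq " * 1000  # repetitive data, so it compresses well

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "to.bz2"
    # Save: bz2.open in "wb" mode compresses while writing.
    with bz2.open(path, "wb") as f:
        f.write(payload)
    # The raw file holds the compressed stream, which starts with b"BZh".
    raw_on_disk = path.read_bytes()
    # Load: "rb" mode decompresses transparently while reading.
    with bz2.open(path, "rb") as f:
        loaded = f.read()
```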
CSV
dataclass
Bases: IO[Iterable[Iterable[Any]]]
IO representing a CSV file.
Example usage:
>>> from ordeq_files import CSV
>>> from pathlib import Path
>>> computer_sales = CSV(
... path=Path("path/to/computer_sales.csv")
... )
Example in a node:
>>> from ordeq import node
>>> computer_sales_in_nl = CSV(path=Path("computer_sales_nl.csv"))
>>> @node(
...     inputs=computer_sales,
...     outputs=computer_sales_in_nl
... )
... def filter_computer_sales(computer_sales: list) -> list:
...     return [row for row in computer_sales if row[1] == "NL"]
Example with a node generator:
>>> from pathlib import Path
>>> from ordeq import node
>>> from ordeq_files import CSV
>>> @node(outputs=CSV(path=Path("output.csv")))
... def generator():
...     yield ["constant", "idx"]
...     for idx in range(100):
...         yield [1, idx]
>>> if __name__ == "__main__":
...     from ordeq import run
...     run(generator)
Loading and saving can be configured with additional parameters, for example:
>>> computer_sales.load(quotechar='"', delimiter=',') # doctest: +SKIP
>>> computer_sales.with_load_options(dialect='excel').load() # doctest: +SKIP
>>> import csv
>>> data = [["NL", "2023-10-01", 1000], ["BE", "2023-10-02", 1500]]
>>> computer_sales.save(data, quoting=csv.QUOTE_MINIMAL) # doctest: +SKIP
Refer to 1 for more details on the available options.
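The option names used above (quotechar, delimiter, dialect, quoting) match the formatting parameters of Python's standard csv module. This suggests they are passed through to csv.reader and csv.writer, though that is an assumption. The sketch below uses the csv module directly on an in-memory buffer to show what these options do. It does not call ordeq_files.

```python
import csv
import io

data = [
    ["NL", "2023-10-01", 1000],
    ["BE", "2023-10-02", 1500],
    ["NL", "note, with comma", 5],
]

# Save: QUOTE_MINIMAL quotes only fields containing special characters,
# such as the delimiter or the quote character.
buffer = io.StringIO()
writer = csv.writer(buffer, dialect="excel", quoting=csv.QUOTE_MINIMAL)
writer.writerows(data)
text = buffer.getvalue()

# Load: the reader splits rows using the same delimiter and quote character.
buffer.seek(0)
rows = list(csv.reader(buffer, delimiter=",", quotechar='"'))

# Every loaded field is a string; numbers are not converted back.
nl_rows = [row for row in rows if row[0] == "NL"]
```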
Glob
dataclass
Bases: Input[Generator[PathLike, None, None]]
IO class that loads all paths matching a given pattern.
Although this class can be used directly as a dataset in your nodes,
in most cases it is more useful to inherit from
this class and extend the load method, for example:
>>> from ordeq_files import Glob
>>> class LoadPartitions(Glob):
...     def load(self):
...         paths = super().load()
...         for path in paths:
...             # my_load_func is a placeholder for your own loading logic
...             yield my_load_func(path)
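The subclassing pattern above can be tried without ordeq using the self-contained sketch below. SimpleGlob is a hypothetical stand-in for Glob. Its fields (path, pattern) are assumptions, since the real attributes are not listed here. LoadTextPartitions plays the role of LoadPartitions and reads each matched file as text in place of the placeholder my_load_func.

```python
import tempfile
from collections.abc import Generator
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SimpleGlob:
    """Hypothetical stand-in for Glob; the field names are assumptions."""

    path: Path
    pattern: str

    def load(self) -> Generator[Path, None, None]:
        # Lazily yield every path under `path` that matches `pattern`.
        yield from self.path.glob(self.pattern)


@dataclass(frozen=True)
class LoadTextPartitions(SimpleGlob):
    """Extends load() so it yields file contents instead of paths."""

    def load(self) -> Generator[str, None, None]:
        # sorted() makes the partition order deterministic.
        for path in sorted(super().load()):
            yield path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, content in [("part-0.txt", "a"), ("part-1.txt", "b"), ("other.csv", "c")]:
        (root / name).write_text(content, encoding="utf-8")
    # Consume both generators while the files still exist.
    matched = sorted(p.name for p in SimpleGlob(root, "part-*.txt").load())
    contents = list(LoadTextPartitions(root, "part-*.txt").load())
```

Because load is a generator, files are only read as the consumer iterates. This is what makes the pattern suitable for many partitions.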
Gzip
dataclass
Bases: IO[bytes]
IO representing a gzip-compressed file.
Example usage:
>>> from ordeq_files import Gzip
>>> from pathlib import Path
>>> my_gzip = Gzip(
... path=Path("path/to.gz")
... )
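Like Bz2, Gzip is typed as IO[bytes], so loading presumably returns decompressed bytes and saving compresses them. That is an assumption. The stdlib sketch below shows the gzip round trip in memory with gzip.compress and gzip.decompress. A file-backed version would use gzip.open(path, "rb") or gzip.open(path, "wb") in the same way.

```python
import gzip

payload = b"line of text\n" * 500

# Compress into the gzip format and decompress it again.
compressed = gzip.compress(payload)
restored = gzip.decompress(compressed)

# Every gzip stream starts with the magic bytes 0x1f 0x8b.
has_gzip_magic = compressed[:2] == b"\x1f\x8b"
```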
JSON
dataclass
Bases: IO[dict[str, Any]]
IO representing a JSON file.
Example usage:
>>> from ordeq_files import JSON
>>> from pathlib import Path
>>> my_json = JSON(
... path=Path("path/to.json")
... )
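The type parameter dict[str, Any] suggests that loading yields a dictionary parsed from the file. The stdlib sketch below is not the ordeq_files implementation. It shows one practical consequence of a JSON round trip: only JSON-compatible types come back unchanged, so a tuple is returned as a list.

```python
import json
import tempfile
from pathlib import Path

config = {"country": "NL", "threshold": 1000, "dates": ("2023-10-01", "2023-10-02")}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "to.json"
    # Save: serialise the dict to JSON text.
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    # Load: parse the text back into a dict.
    loaded = json.loads(path.read_text(encoding="utf-8"))

# JSON has no tuple type, so the tuple comes back as a list.
dates_type = type(loaded["dates"])
```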
Pickle
dataclass
Bases: IO[T]
IO that loads and saves Pickle files.
Example usage:
>>> from ordeq_files import Pickle
>>> from pathlib import Path
>>> my_pickle = Pickle(
... path=Path("path/to.pkl")
... )
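Pickle is generic over T, which suggests that any picklable object can be saved and loaded back. The stdlib sketch below (not the ordeq_files implementation) shows that, unlike JSON, a pickle round trip preserves Python types such as sets, tuples and dates.

```python
import pickle
import tempfile
from datetime import date
from pathlib import Path

data = {
    "countries": {"NL", "BE"},
    "period": (date(2023, 10, 1), date(2023, 10, 2)),
    "total": 2500,
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "to.pkl"
    # Save: serialise the object to bytes.
    path.write_bytes(pickle.dumps(data))
    # Load: rebuild an equal object. Only unpickle files you trust.
    loaded = pickle.loads(path.read_bytes())
```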
Text
dataclass
Bases: IO[str]
IO representing a plain-text file.
Examples:
>>> from ordeq_files import Text
>>> from pathlib import Path
>>> my_text = Text(
... path=Path("path/to.txt")
... )
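A plain-text IO maps the contents of a file to a single str. This section does not say whether Text accepts an encoding option. The stdlib sketch below simply shows a str round trip with pathlib, with the encoding made explicit so that non-ASCII characters survive.

```python
import tempfile
from pathlib import Path

text = "Sales per country: NL, BE\nTotal: 2500 €\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "to.txt"
    # An explicit encoding makes "€" round-trip regardless of the
    # platform's default encoding.
    path.write_text(text, encoding="utf-8")
    loaded = path.read_text(encoding="utf-8")
```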
TextLinesStream
dataclass
Bases: IO[Generator[str]]
IO representing a file stream as a generator of lines.
Useful for processing large files line by line.
By default, lines are separated by newline characters during load.
When saving, a newline character is appended to each line by default;
this can be changed by passing a different end argument
to the save method using with_save_options.
Examples:
>>> from ordeq_files import TextLinesStream
>>> from pathlib import Path
>>> my_file = TextLinesStream(
... path=Path("path/to.txt")
... )
>>> my_file_no_endline = TextLinesStream(
... path=Path("path/to.txt")
... ).with_save_options(end="")
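The load and save behaviour described above can be sketched with two hypothetical stdlib functions, load_lines and save_lines. They are not part of ordeq_files. The sketch assumes that lines are yielded without their trailing newline. The text above does not say whether TextLinesStream strips it, so that detail is an assumption made so that saving with the default end reproduces the file.

```python
import tempfile
from collections.abc import Iterable, Iterator
from pathlib import Path


def load_lines(path: Path) -> Iterator[str]:
    """Hypothetical loader: yield lines one at a time, never the whole file."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            # Drop the newline separator so each item is the line content.
            yield line.rstrip("\n")


def save_lines(path: Path, lines: Iterable[str], end: str = "\n") -> None:
    """Hypothetical saver: write each line followed by `end`."""
    with path.open("w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + end)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "to.txt"
    # A generator expression is consumed lazily, one line at a time.
    save_lines(path, (f"row {i}" for i in range(3)))
    round_trip = list(load_lines(path))
    # With end="", no separator is written between lines.
    save_lines(path, ["a", "b", "c"], end="")
    joined = path.read_text(encoding="utf-8")
```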
persist(data)
Does not persist the data, since this is a stream-based IO.