ordeq_common
BytesBuffer
dataclass
Bases: IO[bytes]
IO that uses an in-memory bytes buffer to load and save data. Useful for buffering data across nodes without writing to disk.
Example:
>>> from ordeq_common import BytesBuffer
>>> buffer = BytesBuffer()
>>> buffer.load()
b''
The buffer is initially empty, unless provided with initial data:
>>> buffer = BytesBuffer(b"Initial data")
>>> buffer.load()
b'Initial data'
Saving to the buffer appends data to the existing content:
>>> buffer.save(b"New data")
>>> buffer.load()
b'Initial dataNew data'
Example in a node:
>>> from ordeq_common import BytesBuffer
>>> from ordeq import node, run, Input
>>> result = BytesBuffer()
>>> @node(
...     inputs=[BytesBuffer(b"Hello"), Input[bytes](b"you")],
...     outputs=result
... )
... def greet(greeting: bytes, name: bytes) -> bytes:
...     return greeting + b" to " + name + b"!"
>>> run(greet)
>>> result.load()
b'Hello to you!'
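The append-on-save behaviour described above can be pictured with the standard library's `io.BytesIO`. This is only a sketch under stated assumptions, not ordeq's implementation: `SketchBytesBuffer` is a hypothetical stand-in that mirrors the documented `load` and `save` semantics.

```python
import io


class SketchBytesBuffer:
    """Hypothetical stand-in mirroring the documented BytesBuffer semantics."""

    def __init__(self, initial: bytes = b"") -> None:
        # Start empty and write the initial data, so the stream position
        # ends up after it and later writes append instead of overwriting.
        self._buffer = io.BytesIO()
        self._buffer.write(initial)

    def load(self) -> bytes:
        # getvalue() returns the full content regardless of stream position.
        return self._buffer.getvalue()

    def save(self, data: bytes) -> None:
        # Writing continues at the end of the stream, so data is appended.
        self._buffer.write(data)


buffer = SketchBytesBuffer(b"Initial data")
buffer.save(b"New data")
print(buffer.load())
```

The same idea carries over to text by swapping `io.BytesIO` for `io.StringIO`.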
Dataclass
dataclass
Bases: Input['DataclassInstance']
IO that parses data into a Python dataclass on load.
Example:
>>> from ordeq_common import Dataclass
>>> from ordeq_files import JSON
>>> from pathlib import Path
>>> ValidJSON = JSON(path=Path("to/valid.json"))
>>> ValidJSON.load() # doctest: +SKIP
{"name": "banana", "colour": "yellow"}
>>> from dataclasses import dataclass
>>> @dataclass
... class Fruit:
...     name: str
...     colour: str
>>> Dataclass(ValidJSON, Fruit).load() # doctest: +SKIP
Fruit(name="banana", colour="yellow")
>>> InvalidJSON = JSON(path=Path("to/invalid.json"))
>>> InvalidJSON.load() # doctest: +SKIP
{"name": "banana", "weight_gr": "100"}
>>> Dataclass(InvalidJSON, Fruit).load() # doctest: +SKIP
TypeError: Fruit.__init__() got an unexpected keyword argument 'weight_gr'
For nested models or more sophisticated parsing requirements,
consider using ordeq-pydantic instead.
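The success and failure cases above can be reproduced without any files. The sketch below assumes that parsing amounts to passing the loaded mapping to the dataclass as keyword arguments, which is consistent with the `TypeError` shown but is not confirmed as ordeq's implementation. The `parse` helper is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fruit:
    name: str
    colour: str


def parse(data: dict, cls):
    # Assumption: the loaded mapping is unpacked into keyword arguments.
    return cls(**data)


fruit = parse({"name": "banana", "colour": "yellow"}, Fruit)
print(fruit)

error = None
try:
    # 'weight_gr' is not a field of Fruit, so construction fails.
    parse({"name": "banana", "weight_gr": "100"}, Fruit)
except TypeError as exc:
    error = str(exc)
    print(error)
```

Because the mapping is passed through as-is, a nested mapping would stay a plain `dict` rather than become a nested dataclass, which is why a dedicated parsing library is the better fit for nested models.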
Literal
dataclass
Bases: Input[T]
IO that returns a pre-defined value on load. Mostly useful for testing purposes.
Example:
>>> from ordeq_common import Literal
>>> value = Literal("someValue")
>>> value.load()
'someValue'
>>> print(value)
Literal('someValue')
LoggerHook
Bases: InputHook, OutputHook, NodeHook
Hook that prints each call made to its methods. Typically used only for testing.
Print
dataclass
Bases: Output[Any]
Output that prints data on save. Mostly useful for debugging purposes.
Unlike other utilities such as StringBuffer and Pass,
Print shows the output of the node directly on the console.
Example:
>>> from ordeq_common import Print
>>> from ordeq import node, run, Input
>>> @node(
...     inputs=Input[str]("hello, world!"),
...     outputs=Print()
... )
... def print_message(message: str) -> str:
...     return message.capitalize()
>>> run(print_message)
Hello, world!
>>> import sys
>>> @node(
...     inputs=Input[str]("error message"),
...     outputs=Print().with_save_options(file=sys.stderr)
... )
... def log_error(message: str) -> str:
...     return f"Error: {message}"
>>> run(log_error) # prints to stderr
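One way to picture how `with_save_options(file=sys.stderr)` redirects the output is a small wrapper around the built-in `print`. This sketch assumes the save options are forwarded to `print()` as keyword arguments, which the source does not confirm. `SketchPrint` is a hypothetical stand-in, and an `io.StringIO` replaces `sys.stderr` so the result can be inspected.

```python
import io


class SketchPrint:
    """Hypothetical stand-in; assumes save options are forwarded to print()."""

    def __init__(self, **save_options) -> None:
        self.save_options = save_options

    def with_save_options(self, **save_options) -> "SketchPrint":
        # Return a new object so the original keeps its own options.
        return SketchPrint(**{**self.save_options, **save_options})

    def save(self, data) -> None:
        print(data, **self.save_options)


captured = io.StringIO()  # stands in for sys.stderr
SketchPrint().with_save_options(file=captured).save("Error: error message")
print(repr(captured.getvalue()))
```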
SpyHook
Bases: InputHook, OutputHook, NodeHook
Hook that stores the arguments it is called with in a list. Typically only used for test purposes.
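The spy pattern itself is simple to sketch: every hook method appends its arguments to a list that a test can inspect afterwards. The class, the method names `before_node_run` and `after_node_run`, and the `called_with` attribute below are hypothetical and chosen for illustration; they are not taken from ordeq's hook interface.

```python
class SketchSpyHook:
    """Hypothetical spy: records every call as a tuple of (method name, *args)."""

    def __init__(self) -> None:
        self.called_with: list[tuple] = []

    def before_node_run(self, *args) -> None:
        self.called_with.append(("before_node_run", *args))

    def after_node_run(self, *args) -> None:
        self.called_with.append(("after_node_run", *args))


spy = SketchSpyHook()
spy.before_node_run("greet")
spy.after_node_run("greet", "Hello to you!")
print(spy.called_with)
```

A test can then assert on the recorded list to check both which hooks fired and the order in which they fired.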
StringBuffer
dataclass
Bases: IO[str]
IO that uses an in-memory string buffer to load and save data. Useful for buffering data across nodes without writing to disk.
Example:
>>> from ordeq_common import StringBuffer
>>> buffer = StringBuffer()
>>> buffer.load()
''
The buffer is initially empty, unless provided with initial data:
>>> buffer = StringBuffer("Initial data")
>>> buffer.load()
'Initial data'
Saving to the buffer appends data to the existing content:
>>> buffer.save("New data")
>>> buffer.load()
'Initial dataNew data'
Example in a node:
>>> from ordeq_common import StringBuffer
>>> from ordeq import node, run, Input
>>> result = StringBuffer()
>>> @node(
...     inputs=[StringBuffer("Hello"), Input[str]("you")],
...     outputs=result
... )
... def greet(greeting: str, name: str) -> str:
...     return f"{greeting} to {name}!"
>>> run(greet)
>>> result.load()
'Hello to you!'
Iterate(*ios)
IO for loading and saving iteratively. This is useful when a single node processes multiple IOs, because only one of them needs to be held in memory at a time.
Examples:
The load function returns a generator:
>>> from pathlib import Path
>>> from ordeq_files import Text, JSON
>>> from ordeq_common import Iterate
>>> paths = [Path("hello.txt"), Path("world.txt")]
>>> text_ios = Iterate(*[Text(path=path) for path in paths])
>>> text_ios.load() # doctest: +SKIP
<generator object Iterate._load at 0x104946f60>
Consuming the generator yields the contents of the files, in this case:
>>> list(text_ios.load()) # doctest: +SKIP
['hello', 'world']
By iterating over the contents, each file will be loaded and saved without the need to keep multiple files in memory at the same time:
>>> for idx, content in enumerate(text_ios.load()): # doctest: +SKIP
...     JSON(
...         path=paths[idx].with_suffix(".json")
...     ).save({"content": content})
We can achieve the same by passing a generator to the Iterate.save
method:
>>> json_dataset = Iterate(
...     *[
...         JSON(path=path.with_suffix(".json"))
...         for path in paths
...     ]
... )
>>> json_dataset.save(
...     ({"content": content} for content in text_ios.load())
... ) # doctest: +SKIP
>>> from collections.abc import Iterable
>>> def generate_json_contents(
...     contents: Iterable[str]
... ) -> Iterable[dict[str, str]]:
...     for content in contents:
...         yield {"content": content}
>>> json_dataset.save(generate_json_contents(text_ios.load())) # doctest: +SKIP
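The memory benefit comes from laziness: nothing is loaded until the consumer asks for the next item. The sketch below illustrates this with in-memory stand-ins and is not ordeq's implementation. `MemoryIO` and `SketchIterate` are hypothetical, and `load_count` exists only to make the lazy loading observable.

```python
from collections.abc import Iterable, Iterator


class MemoryIO:
    """Hypothetical in-memory IO standing in for Text or JSON."""

    def __init__(self, value=None) -> None:
        self.value = value
        self.load_count = 0

    def load(self):
        self.load_count += 1
        return self.value

    def save(self, data) -> None:
        self.value = data


class SketchIterate:
    def __init__(self, *ios: MemoryIO) -> None:
        self.ios = ios

    def load(self) -> Iterator:
        # Generator: each IO is loaded only when the next item is requested.
        for item_io in self.ios:
            yield item_io.load()

    def save(self, data: Iterable) -> None:
        # Pair each incoming item with its IO; items are pulled one at a time.
        for item_io, item in zip(self.ios, data):
            item_io.save(item)


sources = [MemoryIO("hello"), MemoryIO("world")]
targets = [MemoryIO(), MemoryIO()]
texts = SketchIterate(*sources)

stream = texts.load()
first = next(stream)
# Only the first source has been loaded at this point.
loaded_after_first = [source.load_count for source in sources]

# Chain load and save through a generator expression, one item at a time.
SketchIterate(*targets).save({"content": c} for c in texts.load())
print([target.value for target in targets])
```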
Returns:
| Type | Description |
|---|---|
| _Iterate[T] | _Iterate |
Match(io=None)
Match(io: Input[Tkey]) -> MatchOnLoad[Tval, Tkey]
Match() -> MatchOnSave[Tval, Tkey]
Utility IO that allows dynamic switching between IOs, like the match-case statement in Python.
Example:
>>> from ordeq import Input
>>> from ordeq_common import Match
>>> from ordeq_args import EnvironmentVariable
>>> import os
>>> Country = (
...     Match(EnvironmentVariable("COUNTRY"))
...     .Case("NL", Input[str]("Netherlands"))
...     .Case("BE", Input[str]("Belgium"))
...     .Default(Input[str]("Unknown"))
... )
>>> os.environ["COUNTRY"] = "NL"
>>> Country.load()
'Netherlands'
If a default is provided, it will be used when no cases match:
>>> os.environ["COUNTRY"] = "DE"
>>> Country.load()
'Unknown'
Without a default, loading raises an error when none of the provided cases match:
>>> Match(EnvironmentVariable("COUNTRY")).load() # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ordeq.IOException: Failed to load
Unsupported case 'DE'
Match on save works as follows:
>>> SmallOrLarge = (
...     Match()
...     .Case("S", EnvironmentVariable("SMALL"))
...     .Case("L", EnvironmentVariable("LARGE"))
...     .Default(EnvironmentVariable("UNKNOWN"))
... )
>>> SmallOrLarge.save(("S", "Andorra"))
>>> SmallOrLarge.save(("L", "Russia"))
>>> SmallOrLarge.save(("XXL", "Mars"))
>>> os.environ["SMALL"]
'Andorra'
>>> os.environ.get("LARGE")
'Russia'
>>> os.environ.get("UNKNOWN")
'Mars'
Example in a node:
>>> from ordeq import node
>>> from ordeq_files import JSON
>>> from ordeq_args import CommandLineArg
>>> from pathlib import Path
>>> TestOrTrain = (
...     Match(CommandLineArg("--split"))
...     .Case("test", JSON(path=Path("to/test.json")))
...     .Case("train", JSON(path=Path("to/train.json")))
... )
>>> @node(
...     inputs=TestOrTrain,
... )
... def evaluate(data: dict) -> dict:
...     ...
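Conceptually, the Case/Default chain behaves like a dictionary lookup with a fallback. The sketch below shows that dispatch for both load and save using in-memory stand-ins. `Box` and `SketchMatch` are hypothetical and not ordeq's implementation. In particular, the sketch raises `LookupError` where the documented behaviour is an `ordeq.IOException`.

```python
class Box:
    """Hypothetical in-memory IO exposing load and save."""

    def __init__(self, value=None) -> None:
        self.value = value

    def load(self):
        return self.value

    def save(self, value) -> None:
        self.value = value


class SketchMatch:
    """Hypothetical dispatcher mirroring the documented Case/Default chain."""

    def __init__(self, key_io=None) -> None:
        self.key_io = key_io  # supplies the key on load; None when matching on save
        self.cases = {}
        self.default = None

    def Case(self, key, io):
        self.cases[key] = io
        return self  # returning self allows chaining

    def Default(self, io):
        self.default = io
        return self

    def _resolve(self, key):
        io = self.cases.get(key, self.default)
        if io is None:
            raise LookupError(f"Unsupported case {key!r}")
        return io

    def load(self):
        # The key is read at load time, then the matching IO is loaded.
        return self._resolve(self.key_io.load()).load()

    def save(self, data) -> None:
        # On save, data is a (key, value) tuple; the key selects the target.
        key, value = data
        self._resolve(key).save(value)


country_code = Box("NL")
country = (
    SketchMatch(country_code)
    .Case("NL", Box("Netherlands"))
    .Case("BE", Box("Belgium"))
    .Default(Box("Unknown"))
)
print(country.load())
country_code.save("DE")
print(country.load())

small, large, unknown = Box(), Box(), Box()
size = SketchMatch().Case("S", small).Case("L", large).Default(unknown)
size.save(("S", "Andorra"))
size.save(("XXL", "Mars"))
print(small.value, large.value, unknown.value)
```

Because the key is read inside `load`, changing the key source between calls changes which case is selected, which matches the environment-variable example above.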
Returns:
| Type | Description |
|---|---|
| MatchOnLoad \| MatchOnSave | MatchOnLoad or MatchOnSave |