Hooks
Hooks can be used to run code at certain phases in the run. With hooks, you can execute custom functionality such as data quality checks, logging and timing, without affecting data transformations in the nodes.
Why hooks are useful
To see how hooks can be useful, let's consider the following example:
import catalog
@node(
inputs=catalog.names,
outputs=catalog.greetings
)
def greet(names: tuple[str, ...]) -> list[str]:
"""Returns a greeting for each person."""
greetings = []
for name in names:
greetings.append(f"Hello, {name}!")
return greetings
from ordeq_files import CSV, Text
from pathlib import Path
names = CSV(path=Path("names.csv"))
greetings = Text(path=Path("greetings.txt"))
As we learned earlier, running the node greet
is roughly equivalent to:
>>> import catalog
>>> names = catalog.names.load()
>>> greetings = greet(names)
>>> catalog.greetings.save(greetings)
With hooks, you can inject custom logic around each step.
For example, you might want to log the time taken to load the input names
:
>>> import catalog
>>> import time
>>> start_time = time.time()
>>> names = catalog.names.load()
>>> end_time = time.time()
>>> print(f"Loading names took {end_time - start_time} seconds")
>>> greetings = greet(names)
>>> catalog.greetings.save(greetings)
As you can see, this custom logic adds a lot of boilerplate code around loading the input. If you want to do this for every input, it quickly becomes tedious and error-prone.
Instead, you can use a hook to run this logic automatically around every input loading:
import time
from ordeq import InputHook, Node, Input
class TimeHook(InputHook):
def before_input_load(self, io: Input) -> None:
self.start_time = time.time()
def after_input_load(self, io: Input) -> None:
end_time = time.time()
print(f"Loading {io} took {end_time - self.start_time} seconds")
To ensure the hook is executed, it needs to be attached to the input:
from ordeq_files import CSV, Text
from pathlib import Path
names = CSV(
path=Path("names.csv")
).with_input_hooks(TimeHook())
greetings = Text(path=Path("greetings.txt"))
Types of hooks
Ordeq provides three types of hooks:
RunHook
: called around a set of nodesNodeHook
: called around running a nodeInputHook
: called around the loading of inputsOutputHook
: called around the saving of outputs
The following diagram depicts how the hooks are applied by Ordeq:
graph LR
subgraph Run
Z[Start run] --> Z1[RunHook.before_run]
Z1 --> Z2{{For each node}}
Z2 --> A
subgraph Node
A[Start node run] --> B[NodeHook.before_node_run]
B --> C1{{For each input}}
subgraph Input
C1 --> C2[InputHook.before_input_load]
C2 --> C3[Load input]
C3 --> C4[InputHook.after_input_load]
C4 --> D1{{Next input}}
D1 -- More? --> C2
end
D1 -- All done --> E[Call node function]
E -->|Success| F1{{For each output}}
subgraph Output
F1 --> F2[OutputHook.before_output_save]
F2 --> F3[Save output]
F3 --> F4[OutputHook.after_output_save]
F4 --> G1{{Next output}}
G1 -- More? --> F2
end
G1 -- All done --> H[NodeHook.after_node_run]
H --> I[End node run]
E -->|Error| X[NodeHook.on_node_call_error]
X --> I
end
I --> Z3[RunHook.after_run]
Z3 --> End[End run]
end
This page demonstrated the concept of hooks and discussed an elementary examples. For a more elaborate guide, including details on how to implement your own hooks, see Creating custom hooks.
Where to go from here?
- See how to create custom hooks in the guide
- Check out the guide on testing nodes