Skip to content

Hooks

Hooks can be used to run code at certain phases in the run. With hooks, you can execute custom functionality such as data quality checks, logging and timing, without affecting data transformations in the nodes.

Why hooks are useful

To see how hooks can be useful, let's consider the following example:

import catalog

@node(
    inputs=catalog.names,
    outputs=catalog.greetings
)
def greet(names: tuple[str, ...]) -> list[str]:
    """Returns a greeting for each person."""
    greetings = []
    for name in names:
        greetings.append(f"Hello, {name}!")
    return greetings
from ordeq_files import CSV, Text
from pathlib import Path

names = CSV(path=Path("names.csv"))
greetings = Text(path=Path("greetings.txt"))

As we learned earlier, running the node greet is roughly equivalent to:

>>> import catalog
>>> names = catalog.names.load()
>>> greetings = greet(names)
>>> catalog.greetings.save(greetings)

With hooks, you can inject custom logic around each step. For example, you might want to log the time taken to load the input names:

>>> import catalog
>>> import time
>>> start_time = time.time()
>>> names = catalog.names.load()
>>> end_time = time.time()
>>> print(f"Loading names took {end_time - start_time} seconds")
>>> greetings = greet(names)
>>> catalog.greetings.save(greetings)

As you can see, this custom logic adds a lot of boilerplate code around loading the input. If you want to do this for every input, it quickly becomes tedious and error-prone.

Instead, you can use a hook to run this logic automatically around every input loading:

import time

from ordeq import InputHook, Node, Input

class TimeHook(InputHook):
    def before_input_load(self, io: Input) -> None:
        self.start_time = time.time()

    def after_input_load(self, io: Input) -> None:
        end_time = time.time()
        print(f"Loading {io} took {end_time - self.start_time} seconds")

To ensure the hook is executed, it needs to be attached to the input:

from ordeq_files import CSV, Text
from pathlib import Path

names = CSV(
    path=Path("names.csv")
).with_input_hooks(TimeHook())
greetings = Text(path=Path("greetings.txt"))

Types of hooks

Ordeq provides three types of hooks:

  • RunHook: called around a set of nodes
  • NodeHook: called around running a node
  • InputHook: called around the loading of inputs
  • OutputHook: called around the saving of outputs

The following diagram depicts how the hooks are applied by Ordeq:

graph LR
    subgraph Run
        Z[Start run] --> Z1[RunHook.before_run]
        Z1 --> Z2{{For each node}}
        Z2 --> A
        subgraph Node
            A[Start node run] --> B[NodeHook.before_node_run]
            B --> C1{{For each input}}
            subgraph Input
                C1 --> C2[InputHook.before_input_load]
                C2 --> C3[Load input]
                C3 --> C4[InputHook.after_input_load]
                C4 --> D1{{Next input}}
                D1 -- More? --> C2

            end
            D1 -- All done --> E[Call node function]
            E -->|Success| F1{{For each output}}
            subgraph Output
                F1 --> F2[OutputHook.before_output_save]
                F2 --> F3[Save output]
                F3 --> F4[OutputHook.after_output_save]
                F4 --> G1{{Next output}}
                G1 -- More? --> F2
            end
            G1 -- All done --> H[NodeHook.after_node_run]
            H --> I[End node run]
            E -->|Error| X[NodeHook.on_node_call_error]
            X --> I

        end
        I --> Z3[RunHook.after_run]
        Z3 --> End[End run]
    end

This page demonstrated the concept of hooks and discussed an elementary examples. For a more elaborate guide, including details on how to implement your own hooks, see Creating custom hooks.

Where to go from here?