Testing nodes

Because nodes behave like plain Python functions, they can be tested using any Python testing framework. Let's reconsider the greet node from the node concepts section:

# nodes.py
import catalog
from ordeq import node

@node(inputs=catalog.names, outputs=catalog.greetings)
def greet(names: tuple[str, ...]) -> list[str]:
    """Returns a greeting for each person."""
    greetings = []
    for name in names:
        greetings.append(f"Hello, {name}!")
    return greetings

# catalog.py
from pathlib import Path

from ordeq_files import CSV, Text

names = CSV(path=Path("names.csv"))
greetings = Text(path=Path("greetings.txt"))

This node can be unit-tested as follows:

def test_greet_empty():
    # greet requires its names argument; pass an empty tuple for the empty case
    assert greet(()) == []

def test_greet_one_name():
    assert greet(("Alice",)) == ["Hello, Alice!"]

def test_greet_two_names():
    assert greet(("Alice", "Bob")) == ["Hello, Alice!", "Hello, Bob!"]

def test_greet_special_chars():
    assert greet(("A$i%*c",)) == ["Hello, A$i%*c!"]

These tests exercise only the transformation. They do not load or save any data, and they do not trigger any hooks. This is good practice for unit tests, because it keeps them fast and isolated.
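
As the number of cases grows, the same checks can be written as a table of inputs and expected outputs. The following is a minimal, self-contained sketch: it redefines greet as a plain function without the @node decorator (an assumption made so the snippet runs without Ordeq installed), and CASES and test_greet_cases are hypothetical names.

```python
def greet(names: tuple[str, ...]) -> list[str]:
    """Stand-in for the greet node: same logic, no @node decorator."""
    return [f"Hello, {name}!" for name in names]


# Each case pairs an input tuple with the greetings expected for it.
CASES = [
    ((), []),
    (("Alice",), ["Hello, Alice!"]),
    (("Alice", "Bob"), ["Hello, Alice!", "Hello, Bob!"]),
    (("A$i%*c",), ["Hello, A$i%*c!"]),
]


def test_greet_cases():
    # Loop over the table; the message identifies the failing input.
    for names, expected in CASES:
        assert greet(names) == expected, f"unexpected result for {names!r}"
```

With pytest, pytest.mark.parametrize can express the same table while reporting each case as a separate test.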

Running nodes in tests

Alternatively, you can test nodes by running them. This will load the data from the node inputs, and save the returned data to the node outputs. The result of the run will be a dictionary containing the data for each input and output used in the run:

from ordeq import run

from catalog import greetings
from nodes import greet

def test_run_greet():
    result = run(greet)
    assert result[greetings] == ["Hello, Abraham!", "Hello, Adam!", "Hello, Azul!", ...]

In contrast to the unit tests, this test depends on the content of the CSV file used as input to greet. As shown above, the result of greet can be retrieved from the result dictionary by using the Output of greet as the key.

Running nodes with alternative IO

Often we do not want to connect to a real file system or database when testing, either because connecting to the real data is slow or because the tests must not change the actual data. Instead, we want to test the logic with some seed data, often stored locally.

Suppose reading from names is very expensive because it is a large file. We can use a local file with the same structure to test the node:

from ordeq_files import CSV, Text
from pathlib import Path
from ordeq import run

from catalog import names, greetings
from nodes import greet

def test_run_greet():
    local_names = CSV(path=Path("to/local/names.csv"))
    local_greetings = Text(path=Path("to/local/greetings.txt"))
    result = run(greet, io={names: local_names, greetings: local_greetings})
    assert result[greetings] == ["Hello, Abraham!", "Hello, Adam!", "Hello, Azul!", ...]

When greet is run, Ordeq will use the local_names and local_greetings IOs as replacements for the names and greetings defined in the catalog.
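
If the seed data does not exist yet, a test can create it in a temporary directory before running the node. The sketch below uses only the standard library. The helpers write_seed_names and read_names are hypothetical, and the one-name-per-row layout is an assumption about names.csv, not something Ordeq prescribes.

```python
import csv
import tempfile
from pathlib import Path


def write_seed_names(directory: Path, names: list[str]) -> Path:
    """Write one name per row to names.csv in `directory` and return its path."""
    path = directory / "names.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        for name in names:
            writer.writerow([name])
    return path


def read_names(path: Path) -> tuple[str, ...]:
    """Read the first column of every non-empty row back as a tuple."""
    with path.open(newline="") as f:
        return tuple(row[0] for row in csv.reader(f) if row)


# The directory and its contents are removed when the block exits.
with tempfile.TemporaryDirectory() as tmp:
    seed_path = write_seed_names(Path(tmp), ["Abraham", "Adam", "Azul"])
    seed_names = read_names(seed_path)
```

Inside the with block, a path such as seed_path could be passed to CSV in place of the hard-coded local path used in the test above.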

IO fixtures

You can also supply the io argument of run from a pytest fixture. This allows you to define the replacement IOs once and reuse them across multiple tests.

import pytest
from ordeq import IO, Input, Output
from catalog import names, greetings
from ordeq_files import CSV, Text
from pathlib import Path

@pytest.fixture(scope="session")
def io() -> dict[IO | Input | Output, IO | Input | Output]:
    """Mapping of node inputs and outputs to the inputs and outputs used throughout tests."""
    return {
        names: CSV(path=Path("to/local/names.csv")),
        greetings: Text(path=Path("to/local/greetings.txt")),
    }

Now we can use the io fixture in our tests:

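from ordeq import run

from nodes import greet

def test_run_greet(io):
    # pytest injects the mapping returned by the io fixture
    result = run(greet, io=io)
    # do your asserts ...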

For more information on the fixture scope, refer to the pytest documentation.