# ordeq_args

## CommandLineArg

Bases: `Input[T]`
Dataset that represents a command-line argument as node input. Useful for parameterizing node logic based on arguments passed to the run command.
Parses the argument from `sys.argv` on load. See [argparse](https://docs.python.org/3/library/argparse.html) for more information.
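Because `CommandLineArg` is an `Input`, it can also be loaded outside a node. A minimal sketch, assuming `Input` exposes a `load()` method that performs the parse (the method name is an assumption based on the description above):

```python
import sys

from ordeq_args import CommandLineArg

# Simulate a CLI invocation for illustration:
sys.argv = ["main.py", "--value", "MyValue"]

value = CommandLineArg("--value")
print(value.load())  # assumed API; expected to print "MyValue"
```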
Example:

```python
from ordeq import node, run
from ordeq_args import CommandLineArg
from ordeq_spark import SparkHiveTable
import pyspark.sql.functions as F
from pyspark.sql import DataFrame


@node(
    inputs=[SparkHiveTable(table="my.table"), CommandLineArg("--value")],
    outputs=SparkHiveTable(table="my.output"),
)
def transform(df: DataFrame, value: str) -> DataFrame:
    return df.where(F.col("col") == value)


if __name__ == "__main__":
    run(transform)
```
When you run `transform` through the CLI as follows:

```bash
python main.py --value MyValue
```

`MyValue` will be used as `value` in `transform`.
By default, command-line arguments are parsed as strings. You can parse them as a different type using built-in type converters, for instance:
```python
import pathlib
import datetime

k = CommandLineArg("--k", type=int)
threshold = CommandLineArg("--threshold", type=float)
address = CommandLineArg("--address", type=ascii)
path = CommandLineArg("--path", type=pathlib.Path)
date_time = CommandLineArg("--date", type=datetime.date.fromisoformat)
```
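With these converters in place, an invocation could look like this (flag values illustrative):

```bash
python main.py --k 10 --threshold 0.5 --address "Main St" --path ./data --date 2024-01-31
```

On load, these IOs then yield an `int`, a `float`, an `ascii`-escaped `str`, a `pathlib.Path`, and a `datetime.date`, respectively.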
Alternatively, you can parse using a user-defined function, e.g.:
```python
def hyphenated(string: str) -> str:
    return "-".join([w[:4] for w in string.casefold().split()])


title = CommandLineArg("--title", type=hyphenated)
```
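For example, `python main.py --title "The Tale of Two Cities"` loads `title` as `the-tale-of-two-citi`.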
When using multiple `CommandLineArg` IOs in a node, you can link them to the same argument parser:
```python
import argparse

parser = argparse.ArgumentParser()
arg1 = CommandLineArg("--arg1", parser=parser)
arg2 = CommandLineArg("--arg2", parser=parser)
```
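A node can then consume both arguments; a minimal sketch along the lines of the example above (table and column names illustrative):

```python
import argparse

from ordeq import node, run
from ordeq_args import CommandLineArg
from ordeq_spark import SparkHiveTable
import pyspark.sql.functions as F
from pyspark.sql import DataFrame

# Both IOs register their flags on the same parser:
parser = argparse.ArgumentParser()
arg1 = CommandLineArg("--arg1", parser=parser)
arg2 = CommandLineArg("--arg2", parser=parser)


@node(
    inputs=[SparkHiveTable(table="my.table"), arg1, arg2],
    outputs=SparkHiveTable(table="my.output"),
)
def transform(df: DataFrame, a: str, b: str) -> DataFrame:
    return df.where(F.col("col1") == a).where(F.col("col2") == b)


if __name__ == "__main__":
    run(transform)
```

which would be invoked as, e.g., `python main.py --arg1 A --arg2 B`.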
Parsing command-line arguments as `argparse.FileType` is discouraged, as it has been [deprecated](https://docs.python.org/3/library/argparse.html#argparse.FileType) since Python 3.14.

More info on parsing types [here](https://docs.python.org/3/library/argparse.html#type).
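A common alternative is to parse the argument as a `pathlib.Path` and open the file inside the node logic. A minimal sketch, using `EnvironmentVariable` (documented below) as an illustrative output:

```python
import pathlib

from ordeq import node, run
from ordeq_args import CommandLineArg, EnvironmentVariable

config_path = CommandLineArg("--config", type=pathlib.Path)


@node(inputs=[config_path], outputs=EnvironmentVariable("CONFIG"))
def read_config(path: pathlib.Path) -> str:
    # Open the file in the node, rather than via argparse.FileType:
    return path.read_text()


if __name__ == "__main__":
    run(read_config)
```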
## EnvironmentVariable

`dataclass`

Bases: `IO[str]`
IO used to load and save environment variables. Use it:

- as input, to parameterize the node logic
- as output, to set an environment variable based on node logic
Gets and sets `os.environ` on load and save. See the [Python docs](https://docs.python.org/3/library/os.html#os.environ) for more information.
Example in a node:

```python
from ordeq import run, node
from ordeq_args import EnvironmentVariable
from ordeq_spark import SparkHiveTable
import pyspark.sql.functions as F
from pyspark.sql import DataFrame


@node(
    inputs=[
        SparkHiveTable(table="my.table"),
        EnvironmentVariable("KEY", default="DEFAULT"),
    ],
    outputs=SparkHiveTable(table="my.output"),
)
def transform(df: DataFrame, value: str) -> DataFrame:
    return df.where(F.col("col") == value)


if __name__ == "__main__":
    run(transform)
```
When you run `transform` through the CLI as follows:

```bash
export KEY=MyValue
python main.py transform
```

`MyValue` will be used as `value` in `transform`.
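`EnvironmentVariable` can also be used as an output, in which case the value returned by the node is written to the environment on save. A minimal sketch (variable names and node logic illustrative):

```python
from ordeq import node, run
from ordeq_args import EnvironmentVariable


@node(
    inputs=[EnvironmentVariable("STAGE", default="dev")],
    outputs=EnvironmentVariable("RUN_MODE"),
)
def set_run_mode(stage: str) -> str:
    # The returned string is saved to os.environ["RUN_MODE"]:
    return "debug" if stage == "dev" else "release"


if __name__ == "__main__":
    run(set_run_mode)
```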