# ordeq_args

## CommandLineArg

Bases: `Input[T]`
Dataset that represents a command-line argument as node input. Useful for parameterizing node logic based on arguments passed to the run command.
Parses the argument from `sys.argv` on load. See [argparse](https://docs.python.org/3/library/argparse.html) for more information.
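Because `CommandLineArg` is an `Input`, it can also be loaded outside a node. A minimal sketch, assuming `Input` exposes a `load()` method that performs the parse (the method name is an assumption based on the description above):

```python
import sys

from ordeq_args import CommandLineArg

# Simulate a CLI invocation for illustration:
sys.argv = ["main.py", "--value", "MyValue"]

value = CommandLineArg("--value")
print(value.load())  # assumed API; expected to print "MyValue"
```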
Example:

```python
from ordeq import node, run
from ordeq_args import CommandLineArg
from ordeq_spark import SparkHiveTable
import pyspark.sql.functions as F
from pyspark.sql import DataFrame


@node(
    inputs=[SparkHiveTable(table="my.table"), CommandLineArg("--value")],
    outputs=SparkHiveTable(table="my.output"),
)
def transform(df: DataFrame, value: str) -> DataFrame:
    return df.where(F.col("col") == value)


if __name__ == "__main__":
    run(transform)
```
When you run `transform` through the CLI as follows:

```bash
python main.py --value MyValue
```

`MyValue` will be used as `value` in `transform`.
By default, command-line arguments are parsed as strings. You can parse them as a different type using built-in type converters, for instance:
```python
import pathlib
import datetime

k = CommandLineArg("--k", type=int)
threshold = CommandLineArg("--threshold", type=float)
address = CommandLineArg("--address", type=ascii)
path = CommandLineArg("--path", type=pathlib.Path)
date_time = CommandLineArg("--date", type=datetime.date.fromisoformat)
```
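With these converters in place, an invocation could look like this (flag values illustrative):

```bash
python main.py --k 10 --threshold 0.5 --address "Main St" --path ./data --date 2024-01-31
```

On load, these IOs then yield an `int`, a `float`, an `ascii`-escaped `str`, a `pathlib.Path`, and a `datetime.date`, respectively.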
Alternatively, you can parse using a user-defined function, e.g.:
```python
def hyphenated(string: str) -> str:
    return "-".join([w[:4] for w in string.casefold().split()])


title = CommandLineArg("--title", type=hyphenated)
```
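For example, `python main.py --title "The Tale of Two Cities"` loads `title` as `the-tale-of-two-citi`.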
When using multiple `CommandLineArg` IOs in a node, you can link them to the same argument parser:
```python
import argparse

parser = argparse.ArgumentParser()
arg1 = CommandLineArg("--arg1", parser=parser)
arg2 = CommandLineArg("--arg2", parser=parser)
```
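A node can then consume both arguments; a minimal sketch along the lines of the example above (table and column names illustrative):

```python
import argparse

from ordeq import node, run
from ordeq_args import CommandLineArg
from ordeq_spark import SparkHiveTable
import pyspark.sql.functions as F
from pyspark.sql import DataFrame

# Both IOs register their flags on the same parser:
parser = argparse.ArgumentParser()
arg1 = CommandLineArg("--arg1", parser=parser)
arg2 = CommandLineArg("--arg2", parser=parser)


@node(
    inputs=[SparkHiveTable(table="my.table"), arg1, arg2],
    outputs=SparkHiveTable(table="my.output"),
)
def transform(df: DataFrame, a: str, b: str) -> DataFrame:
    return df.where(F.col("col1") == a).where(F.col("col2") == b)


if __name__ == "__main__":
    run(transform)
```

which would be invoked as, e.g., `python main.py --arg1 A --arg2 B`.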
Parsing command-line arguments as `argparse.FileType` is discouraged, as it has been [deprecated](https://docs.python.org/3/library/argparse.html#argparse.FileType) since Python 3.14.

More info on parsing types [here](https://docs.python.org/3/library/argparse.html#type).
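A common alternative is to parse the argument as a `pathlib.Path` and open the file inside the node logic. A minimal sketch, using `EnvironmentVariable` (documented below) as an illustrative output:

```python
import pathlib

from ordeq import node, run
from ordeq_args import CommandLineArg, EnvironmentVariable

config_path = CommandLineArg("--config", type=pathlib.Path)


@node(inputs=[config_path], outputs=EnvironmentVariable("CONFIG"))
def read_config(path: pathlib.Path) -> str:
    # Open the file in the node, rather than via argparse.FileType:
    return path.read_text()


if __name__ == "__main__":
    run(read_config)
```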
## EnvironmentVariable

`dataclass`

Bases: `IO[str]`
IO used to load and save environment variables. Use it:

- as input, to parameterize the node logic
- as output, to set an environment variable based on node logic
Gets and sets `os.environ` on load and save. See the [Python docs](https://docs.python.org/3/library/os.html#os.environ) for more information.
Example in a node:

```python
from ordeq import run, node
from ordeq_args import EnvironmentVariable
from ordeq_spark import SparkHiveTable
import pyspark.sql.functions as F
from pyspark.sql import DataFrame


@node(
    inputs=[
        SparkHiveTable(table="my.table"),
        EnvironmentVariable("KEY", default="DEFAULT"),
    ],
    outputs=SparkHiveTable(table="my.output"),
)
def transform(df: DataFrame, value: str) -> DataFrame:
    return df.where(F.col("col") == value)


if __name__ == "__main__":
    run(transform)
```
When you run `transform` through the CLI as follows:

```bash
export KEY=MyValue
python main.py transform
```

`MyValue` will be used as `value` in `transform`.
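`EnvironmentVariable` can also be used as an output, in which case the value returned by the node is written to the environment on save. A minimal sketch (variable names and node logic illustrative):

```python
from ordeq import node, run
from ordeq_args import EnvironmentVariable


@node(
    inputs=[EnvironmentVariable("STAGE", default="dev")],
    outputs=EnvironmentVariable("RUN_MODE"),
)
def set_run_mode(stage: str) -> str:
    # The returned string is saved to os.environ["RUN_MODE"]:
    return "debug" if stage == "dev" else "release"


if __name__ == "__main__":
    run(set_run_mode)
```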