command_line_arg.py
CommandLineArg
Bases: Input[T]
Dataset that represents a command-line argument as a node input. Useful for parameterizing node logic with arguments passed to the run command.
Parses the argument from sys.argv on load. See argparse for more information.
Example:
>>> from ordeq import node
>>> from ordeq_spark import SparkHiveTable
>>> import pyspark.sql.functions as F
>>> from pyspark.sql import DataFrame
>>> @node(
...     inputs=[
...         SparkHiveTable(table="my.table"),
...         CommandLineArg("--value"),
...     ],
...     outputs=SparkHiveTable(table="my.output"),
... )
... def transform(df: DataFrame, value: str) -> DataFrame:
...     return df.where(F.col("col") == value)
When you run transform through the CLI as follows:
python {your-entrypoint} run --node transform --value MyValue
MyValue will be passed as the value argument of transform.
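To make the flow concrete without Spark, here is a plain-Python stand-in. The `argv` list is what `sys.argv[1:]` would hold for the command above, a parser that knows only `--value` picks out `MyValue`, and the result is passed to `transform_rows`, a hypothetical list-of-dicts analogue of `transform` that applies the same `col == value` filter. How ordeq itself wires the value into the node is not shown here.

```python
import argparse

# What sys.argv[1:] would contain for:
#   python {your-entrypoint} run --node transform --value MyValue
argv = ["run", "--node", "transform", "--value", "MyValue"]

parser = argparse.ArgumentParser(add_help=False)
parser.add_argument("--value")
# "run", "--node" and "transform" are returned as leftovers instead of raising.
args, _leftovers = parser.parse_known_args(argv)

rows = [{"col": "MyValue", "n": 1}, {"col": "Other", "n": 2}]


def transform_rows(rows: list[dict], value: str) -> list[dict]:
    """Hypothetical stand-in for transform: keep rows where col == value."""
    return [row for row in rows if row["col"] == value]


print(transform_rows(rows, args.value))  # [{'col': 'MyValue', 'n': 1}]
```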
By default, command-line arguments are parsed as strings. To parse an argument as a different type, pass a converter callable as type, for instance a built-in or standard-library function:
>>> K = CommandLineArg("--k", type=int)
>>> Threshold = CommandLineArg("--threshold", type=float)
>>> Address = CommandLineArg("--address", type=ascii)
>>> import pathlib
>>> Path = CommandLineArg("--path", type=pathlib.Path)
>>> import datetime
>>> DateTime = CommandLineArg("--date", type=datetime.date.fromisoformat)
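Since the type callable is applied by argparse, the converters above can be tried with argparse alone. This sketch uses a standalone parser and made-up values; it illustrates argparse's behaviour rather than ordeq's internals.

```python
import argparse
import datetime
import pathlib

parser = argparse.ArgumentParser(add_help=False)
# Each type= callable receives the raw string and returns the converted value.
parser.add_argument("--k", type=int)
parser.add_argument("--threshold", type=float)
parser.add_argument("--path", type=pathlib.Path)
parser.add_argument("--date", type=datetime.date.fromisoformat)

args = parser.parse_args(
    ["--k", "3", "--threshold", "0.5", "--path", "data/in.csv", "--date", "2024-01-31"]
)
print(type(args.k).__name__, type(args.threshold).__name__, type(args.date).__name__)  # int float date
```

If a converter raises ValueError or TypeError (for example `--k three`), argparse reports an invalid-value error and, by default, exits with status 2.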
Alternatively, you can pass a user-defined function that takes the raw string and returns the converted value, e.g.:
>>> def hyphenated(string: str) -> str:
...     return '-'.join([w[:4] for w in string.casefold().split()])
>>> Title = CommandLineArg("--title", type=hyphenated)
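To see what such a converter produces, the sketch below wires the same `hyphenated` function into a standalone argparse parser; the title string is an illustrative value.

```python
import argparse


def hyphenated(string: str) -> str:
    # Casefold, keep the first four characters of each word, join with hyphens.
    return "-".join(w[:4] for w in string.casefold().split())


parser = argparse.ArgumentParser(add_help=False)
parser.add_argument("--title", type=hyphenated)
# The quoted shell argument arrives as a single string containing spaces.
args = parser.parse_args(["--title", "The Tale of Two Cities"])
print(args.title)  # the-tale-of-two-citi
```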
Parsing command-line arguments as argparse.FileType is discouraged, as it has been deprecated since Python 3.14.
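One common alternative, sketched here with plain argparse, is to parse the argument as a `pathlib.Path` (as with `--path` above) and open the file where it is used, so the consumer controls when the file is opened and closed; `argparse.FileType`, by contrast, opens the file while arguments are being parsed. The `--input` flag and the temporary file are illustrative only.

```python
import argparse
import pathlib
import tempfile

parser = argparse.ArgumentParser(add_help=False)
# Parse a path instead of using argparse.FileType: nothing is opened at parse time.
parser.add_argument("--input", type=pathlib.Path)

with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp) / "input.txt"
    source.write_text("hello\n")
    args = parser.parse_args(["--input", str(source)])
    # The consumer opens the file explicitly; the context manager closes it.
    with args.input.open() as handle:
        content = handle.read()

print(content.strip())  # hello
```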
More information on parsing types can be found in the argparse documentation on the type keyword argument.