dataframe.py
SparkDataFrame
dataclass
Bases: Input[DataFrame]
Allows a Spark DataFrame to be hard-coded in Python. This suits small tables, such as simple dimension tables that are unlikely to change, and can also be useful in unit tests.
Example usage:
>>> from pyspark.sql.types import (
...     IntegerType,
...     StringType,
...     StructField,
...     StructType,
... )
>>> from ordeq_spark import SparkDataFrame
>>> df = SparkDataFrame(
...     schema=StructType([
...         StructField("year", IntegerType()),
...         StructField("datafile", StringType()),
...     ]),
...     data=(
...         (2022, "file_2022.xlsx"),
...         (2023, "file_2023.xlsx"),
...         (2024, "file_2024.xlsx"),
...     )
... )
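
Because the rows are written by hand, a row that does not fit the schema is easy to introduce and is typically only reported once Spark builds the DataFrame. The following is a minimal, dependency-free sketch of a pre-check that could run in a unit test first. The names `COLUMNS`, `ROWS`, and `find_row_problems` are hypothetical helpers introduced for illustration; they are not part of `ordeq_spark` or PySpark, and the mapping from Spark types to Python types (`IntegerType` to `int`, `StringType` to `str`) is an assumption made for this example.

```python
# Hypothetical pre-check for hard-coded rows; not part of ordeq_spark or PySpark.
# Each column is described as (name, expected Python type), mirroring the
# StructField names and types used in the example above.
COLUMNS = (("year", int), ("datafile", str))

# Rows in the same shape as the `data` argument in the example.
ROWS = (
    (2022, "file_2022.xlsx"),
    (2023, "file_2023.xlsx"),
)


def find_row_problems(columns, rows):
    """Return a list of readable problems; an empty list means every row fits."""
    problems = []
    for index, row in enumerate(rows):
        # A row must supply exactly one value per column.
        if len(row) != len(columns):
            problems.append(
                f"row {index}: expected {len(columns)} values, got {len(row)}"
            )
            continue
        for (name, expected_type), value in zip(columns, row):
            # None is accepted because StructField columns are nullable by default.
            if value is not None and not isinstance(value, expected_type):
                problems.append(
                    f"row {index}: column {name!r} expected "
                    f"{expected_type.__name__}, got {type(value).__name__}"
                )
    return problems


print(find_row_problems(COLUMNS, ROWS))
```

Returning a list of problems rather than raising on the first one lets a test report every bad row in a single run, which is convenient when a table is edited by hand.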