Skip to content

session.py

SparkSession dataclass

Bases: Input[SparkSession]

Input representing the active Spark session. Useful for accessing the active Spark session in nodes.

Example:

>>> from ordeq_spark.io.session import SparkSession
>>> spark_session = SparkSession()
>>> spark = spark_session.load()  # doctest: +SKIP
>>> print(spark.version)  # doctest: +SKIP
3.3.1

Example in a node:

>>> from ordeq import node
>>> from ordeq_common import Static
>>> items = Static({'id': [1, 2, 3], 'value': ['a', 'b', 'c']})
>>> @node(
...     inputs=[items, spark_session],
...     outputs=[],
... )
... def convert_to_df(
...     data: dict, spark: pyspark.sql.SparkSession
... ) -> pyspark.sql.DataFrame:
...     return spark.createDataFrame(data)

load()

Gets the active SparkSession

Returns:

Type Description
SparkSession

pyspark.sql.SparkSession: The Spark session.

Raises:

Type Description
ValueError

If there is no active Spark session.