utils.py
apply_schema(df, schema)
Applies a schema to the DataFrame, meaning: - select only those columns in the schema from the DataFrame - cast these to the types specified in the schema
Note that this method does not check for nullability in the schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df
|
DataFrame
|
DataFrame |
required |
schema
|
StructType
|
StructType (schema) |
required |
Returns:
Type | Description |
---|---|
DataFrame
|
the DataFrame with schema applied |
create_schema(schema)
Utility method that creates a Spark StructType (or, schema) from tuple inputs, without need to import and write StructType & StructField
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema
|
tuple[tuple[str, DataType, bool], ...]
|
tuple of schema inputs |
required |
Returns:
Type | Description |
---|---|
StructType
|
StructType |
get_spark_session()
Helper to get the SparkSession
Returns:
Type | Description |
---|---|
SparkSession
|
the spark session object |
Raises:
Type | Description |
---|---|
RuntimeError
|
when the spark session is not active |