jdbc.py

SparkJDBC dataclass

IO for loading from and saving to a database with Spark and JDBC.

The JDBC driver needs to be added to your Spark application. For instance, to connect to Microsoft SQL Server, pass com.microsoft.sqlserver:mssql-jdbc:12.8.1.jre11 as an extra JAR.
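
A minimal sketch of one way to make the driver available, assuming PySpark is installed and the session is created from Python. Spark's `spark.jars.packages` setting accepts Maven coordinates and resolves the JAR when the session starts. The helper `build_session` and the application name are illustrative assumptions, not part of ordeq_spark.

```python
# Maven coordinates of the SQL Server JDBC driver mentioned above.
MSSQL_JDBC_PACKAGE = "com.microsoft.sqlserver:mssql-jdbc:12.8.1.jre11"


def build_session(app_name: str = "jdbc-example"):
    """Create a SparkSession that resolves the JDBC driver at startup.

    `build_session` is a hypothetical helper for illustration only.
    """
    # Imported lazily so this module can be imported without PySpark.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder.appName(app_name)
        # spark.jars.packages takes a comma-separated list of Maven coordinates.
        .config("spark.jars.packages", MSSQL_JDBC_PACKAGE)
        .getOrCreate()
    )
```

The same coordinates can also be supplied on the command line with `spark-submit --packages`, which may be preferable when the session is created outside your own code.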

When saving, the column types of the DataFrame should be compatible with those of the target table. For more information, see [1].

[1] https://spark.apache.org/docs/4.0.0-preview1/sql-data-sources-jdbc.html

Example:

>>> from ordeq_spark import SparkJDBCTable
>>> SparkJDBCTable(
...     table="schema.table",
...     driver="com.microsoft.sqlserver.jdbc.SQLServerDriver",
...     url=(
...         "jdbc:sqlserver://host:52840;"
...         "databaseName=tempdb;"
...         "user=SA;"
...         "password=1Secure*Password1;"
...     )
... ).load()  # doctest: +SKIP

Use SparkJDBCQuery to execute a custom query on load:

>>> from ordeq_spark import SparkJDBCQuery
>>> SparkJDBCQuery(
...     query="SELECT 1;",
...     driver="com.microsoft.sqlserver.jdbc.SQLServerDriver",
...     url=(
...         "jdbc:sqlserver://host:52840;"
...         "databaseName=tempdb;"
...         "user=SA;"
...         "password=1Secure*Password1;"
...     )
... ).load()  # doctest: +SKIP
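
SQL Server JDBC URLs consist of a `jdbc:sqlserver://host:port` prefix followed by semicolon-separated `key=value` properties, so a missing semicolon between string fragments is easy to overlook. The sketch below assembles the URL from its parts instead of concatenating literals. The helper `sqlserver_url` is hypothetical and not part of ordeq_spark, and it does no escaping of special characters in property values.

```python
def sqlserver_url(host: str, port: int, **properties: str) -> str:
    """Assemble a SQL Server JDBC URL (hypothetical helper)."""
    # The first segment is the server address; the rest are key=value pairs.
    segments = [f"jdbc:sqlserver://{host}:{port}"]
    segments += [f"{key}={value}" for key, value in properties.items()]
    # Segments are joined with semicolons, and the URL ends with one as well.
    return ";".join(segments) + ";"


url = sqlserver_url(
    "host",
    52840,
    databaseName="tempdb",
    user="SA",
    password="1Secure*Password1",
)
```

The resulting `url` can be passed as the `url` argument of `SparkJDBCTable` or `SparkJDBCQuery`. In practice the credentials would typically come from a secret store or environment variables rather than from literals.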