jdbc.py
SparkJDBC
dataclass
IO for loading from and saving to a database with Spark and JDBC.
The JDBC driver needs to be added to your Spark application. For
instance, to connect to Microsoft SQL Server, pass
com.microsoft.sqlserver:mssql-jdbc:12.8.1.jre11
as an extra JAR.
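One common way to make the driver available is Spark's `spark.jars.packages` setting, which takes a comma-separated list of Maven coordinates. The sketch below is an illustration rather than part of this library: the helper names `jdbc_packages_conf` and `build_session` are hypothetical, and it assumes the machine running Spark can reach a Maven repository to download the package.

```python
# Maven coordinate of the SQL Server JDBC driver mentioned above.
MSSQL_JDBC_PACKAGE = "com.microsoft.sqlserver:mssql-jdbc:12.8.1.jre11"


def jdbc_packages_conf(packages):
    """Build the Spark setting that requests extra Maven packages.

    Hypothetical helper: `spark.jars.packages` is a Spark configuration
    key, but this function is not part of ordeq_spark.
    """
    return {"spark.jars.packages": ",".join(packages)}


def build_session(conf):
    """Create (or reuse) a SparkSession with the given settings applied."""
    # Imported lazily so the configuration helper can be used without pyspark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


conf = jdbc_packages_conf([MSSQL_JDBC_PACKAGE])
```

Calling `build_session(conf)` would then start a session with the driver on the classpath. The setting only takes effect when the session is first created, so it should be applied before any other code calls `getOrCreate()`.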
When saving, the column types of the DataFrame should be compatible with those of the target table. See [1] for more information.
[1] https://spark.apache.org/docs/4.0.0-preview1/sql-data-sources-jdbc.html
Example:
>>> from ordeq_spark import SparkJDBCTable
>>> SparkJDBCTable(
...     table="schema.table",
...     driver="com.microsoft.sqlserver.jdbc.SQLServerDriver",
...     url=(
...         "jdbc:sqlserver://host:52840;"
...         "databaseName=tempdb;"
...         "user=SA;"
...         "password=1Secure*Password1;"
...     )
... ).load()  # doctest: +SKIP
Use SparkJDBCQuery
to execute a custom query on load:
>>> from ordeq_spark import SparkJDBCQuery
>>> SparkJDBCQuery(
...     query="SELECT 1;",
...     driver="com.microsoft.sqlserver.jdbc.SQLServerDriver",
...     url=(
...         "jdbc:sqlserver://host:52840;"
...         "databaseName=tempdb;"
...         "user=SA;"
...         "password=1Secure*Password1;"
...     )
... ).load()  # doctest: +SKIP
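SQL Server JDBC URLs take the form `jdbc:sqlserver://host:port` followed by `;property=value` pairs, so a missing semicolon silently produces an invalid URL. As a minimal sketch, the URL could be assembled from its parts instead of being concatenated by hand. The function `mssql_jdbc_url` is a hypothetical helper, not part of ordeq_spark, and it assumes that no property value contains a semicolon (such values would need escaping, which is not handled here).

```python
def mssql_jdbc_url(host, port, **properties):
    """Assemble a SQL Server JDBC URL from a host, port, and properties.

    Hypothetical helper for illustration. Each property is appended as
    ';name=value' in the order the keyword arguments are given.
    """
    props = "".join(f";{name}={value}" for name, value in properties.items())
    return f"jdbc:sqlserver://{host}:{port}{props}"


url = mssql_jdbc_url(
    "host",
    52840,
    databaseName="tempdb",
    user="SA",
    password="1Secure*Password1",
)
```

The resulting string can be passed as the `url` argument in the examples above. In practice, credentials are better read from a secret store or environment variable than written into source code.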