ordeq_duckdb
DuckDBCSV
dataclass
Bases: IO[DuckDBPyRelation]
IO to load and save CSV files using DuckDB.
Example:
>>> import duckdb
>>> from ordeq_duckdb import DuckDBCSV
>>> csv = DuckDBCSV(path="data.csv")
>>> csv.save(duckdb.values([1, "a"]))
>>> data = csv.load()
>>> data.describe()
┌─────────┬────────┬─────────┐
│ aggr │ col0 │ col1 │
│ varchar │ double │ varchar │
├─────────┼────────┼─────────┤
│ count │ 1.0 │ 1 │
│ mean │ 1.0 │ NULL │
│ stddev │ NULL │ NULL │
│ min │ 1.0 │ a │
│ max │ 1.0 │ a │
│ median │ 1.0 │ NULL │
└─────────┴────────┴─────────┘
<BLANKLINE>
load(**kwargs)
Load a CSV file into a DuckDB relation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `**kwargs` | `Any` | Additional options to pass to `duckdb.read_csv`. | `{}` |
Returns:
| Type | Description |
|---|---|
| `DuckDBPyRelation` | The DuckDB relation representing the loaded CSV data. |
save(relation, **kwargs)
Save a DuckDB relation to a CSV file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `relation` | `DuckDBPyRelation` | The relation to save. | required |
| `**kwargs` | `Any` | Additional options to pass to | `{}` |
DuckDBConnection
dataclass
DuckDBTable
dataclass
Bases: IO[DuckDBPyRelation]
IO to load from and save to a DuckDB table.
Example:
>>> import duckdb
>>> from ordeq_duckdb import DuckDBTable
>>> connection = duckdb.connect(":memory:")
>>> table = DuckDBTable(
... table="my_table",
... connection=connection
... )
>>> table.save(
... connection.values([123, "abc"])
... )
>>> connection.sql("SELECT * FROM my_table").show()
┌───────┬─────────┐
│ col0 │ col1 │
│ int32 │ varchar │
├───────┼─────────┤
│ 123 │ abc │
└───────┴─────────┘
<BLANKLINE>
Example in a node:
>>> from ordeq import node, run
>>> connection = duckdb.connect(":memory:")
>>> table = DuckDBTable(
... table="my_data",
... connection=connection,
... )
>>> @node(outputs=table)
... def convert_to_duckdb_relation() -> duckdb.DuckDBPyRelation:
... return connection.values([2, "b"])
>>> result = run(convert_to_duckdb_relation)
>>> connection.table("my_data").show()
┌───────┬─────────┐
│ col0 │ col1 │
│ int32 │ varchar │
├───────┼─────────┤
│ 2 │ b │
└───────┴─────────┘
<BLANKLINE>
load()
Load the DuckDB table into a DuckDB relation.
Returns:
| Type | Description |
|---|---|
| `DuckDBPyRelation` | A relation representing the loaded table. |
save(relation, mode='create')
Save a relation to the DuckDB table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `relation` | `DuckDBPyRelation` | The relation to save. | required |
| `mode` | `Literal['create', 'insert']` | The save mode. "create" creates the table and fails if it already exists; "insert" inserts into the table if it exists, or creates it otherwise. | `'create'` |
Raises:
| Type | Description |
|---|---|
| `CatalogException` | If the table already exists and mode is "create". |
DuckDBView
dataclass
Bases: IO[DuckDBPyRelation]
IO to load and save a DuckDB view.
Example:
>>> import duckdb
>>> from ordeq_duckdb import DuckDBView
>>> connection = duckdb.connect(":memory:")
>>> view = DuckDBView(
... view="fruits",
... connection=connection
... )
>>> data = connection.values([1, "apples", "red"])
>>> view.save(data)
>>> view.load()
┌───────┬─────────┬─────────┐
│ col0 │ col1 │ col2 │
│ int32 │ varchar │ varchar │
├───────┼─────────┼─────────┤
│ 1 │ apples │ red │
└───────┴─────────┴─────────┘
<BLANKLINE>
By default, save replaces the view if it already exists.
To make save fail instead, pass replace=False as a save option:
>>> view = view.with_save_options(replace=False)
>>> view.save(data) # doctest: +SKIP
IOException('Failed to save DuckDBView(view='fruits', ...
Example in a node:
>>> from ordeq import node
>>> from ordeq_duckdb import DuckDBTable
>>> import duckdb
>>> connection = duckdb.connect(":memory:")
>>> fruits = DuckDBTable(
... table="fruits",
... connection=connection,
... )
>>> fruits_filtered = DuckDBView(
... view="fruits_filtered",
... connection=connection,
... )
>>> @node(inputs=fruits, outputs=fruits_filtered)
... def filter_fruits(
... fruits: duckdb.DuckDBPyRelation
... ) -> duckdb.DuckDBPyRelation:
... return fruits.filter("color = 'red'")
load()
Loads a DuckDB view.
Returns:
| Type | Description |
|---|---|
| `DuckDBPyRelation` | The DuckDB view. |
save(relation, replace=True)
Saves a DuckDB relation to a DuckDB view.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `relation` | `DuckDBPyRelation` | The DuckDB relation to save. | required |
| `replace` | `bool` | Whether to replace the view if it already exists. | `True` |