ordeq_duckdb

DuckDBCSV dataclass

Bases: IO[DuckDBPyRelation]

IO to load and save CSV files using DuckDB.

Example:

>>> import duckdb
>>> from ordeq_duckdb import DuckDBCSV
>>> csv = DuckDBCSV(path="data.csv")
>>> csv.save(duckdb.values([1, "a"]))
>>> data = csv.load()
>>> data.describe()
┌─────────┬────────┬─────────┐
│  aggr   │  col0  │  col1   │
│ varchar │ double │ varchar │
├─────────┼────────┼─────────┤
│ count   │    1.0 │ 1       │
│ mean    │    1.0 │ NULL    │
│ stddev  │   NULL │ NULL    │
│ min     │    1.0 │ a       │
│ max     │    1.0 │ a       │
│ median  │    1.0 │ NULL    │
└─────────┴────────┴─────────┘
<BLANKLINE>

load(**kwargs)

Load a CSV file into a DuckDB relation.

Parameters:

Name Type Description Default
**kwargs Any

Additional options to pass to duckdb.read_csv.

{}

Returns:

Type Description
DuckDBPyRelation

The DuckDB relation representing the loaded CSV data.

save(relation, **kwargs)

Save a DuckDB relation to a CSV file.

Parameters:

Name Type Description Default
relation DuckDBPyRelation

The relation to save.

required
**kwargs Any

Additional options to pass to relation.to_csv.

{}

DuckDBConnection dataclass

Bases: Input[DuckDBPyConnection]

Input that loads a DuckDB connection.

load(**kwargs)

Loads a DuckDB connection.

Parameters:

Name Type Description Default
**kwargs Any

Additional kwargs to pass to duckdb.connect.

{}

Returns:

Type Description
DuckDBPyConnection

The DuckDB connection.

DuckDBParquet dataclass

Bases: IO[DuckDBPyRelation]

IO to load and save Parquet files using DuckDB.

Example:

>>> import duckdb
>>> from ordeq_duckdb import DuckDBParquet
>>> parquet = DuckDBParquet(path="data.parquet")
>>> parquet.save(duckdb.values([1, "a"]))
>>> data = parquet.load()
>>> data.describe()
┌─────────┬────────┬─────────┐
│  aggr   │  col0  │  col1   │
│ varchar │ double │ varchar │
├─────────┼────────┼─────────┤
│ count   │    1.0 │ 1       │
│ mean    │    1.0 │ NULL    │
│ stddev  │   NULL │ NULL    │
│ min     │    1.0 │ a       │
│ max     │    1.0 │ a       │
│ median  │    1.0 │ NULL    │
└─────────┴────────┴─────────┘
<BLANKLINE>

load(**kwargs)

Load a Parquet file into a DuckDB relation.

Parameters:

Name Type Description Default
**kwargs Any

Additional options to pass to duckdb.read_parquet.

{}

Returns:

Type Description
DuckDBPyRelation

The DuckDB relation representing the loaded Parquet data.

save(relation, **kwargs)

Save a DuckDB relation to a Parquet file.

Parameters:

Name Type Description Default
relation DuckDBPyRelation

The relation to save.

required
**kwargs Any

Additional options to pass to relation.to_parquet.

{}

DuckDBTable dataclass

Bases: IO[DuckDBPyRelation]

IO to load from and save to a DuckDB table.

Example:

>>> import duckdb
>>> from ordeq_duckdb import DuckDBTable
>>> connection = duckdb.connect(":memory:")
>>> table = DuckDBTable(
...     table="my_table",
...     connection=connection
... )
>>> table.save(
...     connection.values([123, "abc"])
... )
>>> connection.sql("SELECT * FROM my_table").show()
┌───────┬─────────┐
│ col0  │  col1   │
│ int32 │ varchar │
├───────┼─────────┤
│   123 │ abc     │
└───────┴─────────┘
<BLANKLINE>

Example in a node:

>>> from ordeq import node, run
>>> import duckdb
>>> from ordeq_duckdb import DuckDBTable
>>> connection = duckdb.connect(":memory:")
>>> table = DuckDBTable(
...     table="my_data",
...     connection=connection,
... )
>>> @node(outputs=table)
... def convert_to_duckdb_relation() -> duckdb.DuckDBPyRelation:
...     return connection.values([2, "b"])
>>> result = run(convert_to_duckdb_relation)
>>> connection.table("my_data").show()
┌───────┬─────────┐
│ col0  │  col1   │
│ int32 │ varchar │
├───────┼─────────┤
│     2 │ b       │
└───────┴─────────┘
<BLANKLINE>

load()

Load the DuckDB table into a DuckDB relation.

Returns:

Type Description
DuckDBPyRelation

A relation representing the loaded table.

save(relation, mode='create')

Save a relation to the DuckDB table.

Parameters:

Name Type Description Default
relation DuckDBPyRelation

The relation to save.

required
mode Literal['create', 'insert']

The save mode. "create" creates the table and fails if it already exists; "insert" inserts into the table if it exists and creates it otherwise.

'create'

Raises:

Type Description
CatalogException

If the table already exists and mode is "create".

DuckDBView dataclass

Bases: IO[DuckDBPyRelation]

IO to load and save a DuckDB view.

Example:

>>> import duckdb
>>> from ordeq_duckdb import DuckDBView
>>> connection = duckdb.connect(":memory:")
>>> view = DuckDBView(
...     view="fruits",
...     connection=connection
... )
>>> data = connection.values([1, "apples", "red"])
>>> view.save(data)
>>> view.load()
┌───────┬─────────┬─────────┐
│ col0  │  col1   │  col2   │
│ int32 │ varchar │ varchar │
├───────┼─────────┼─────────┤
│     1 │ apples  │ red     │
└───────┴─────────┴─────────┘
<BLANKLINE>

By default, the view is replaced if it already exists. To change this, set replace=False for the save method, for example with with_save_options:

>>> view = view.with_save_options(replace=False)
>>> view.save(data) # doctest: +SKIP
IOException('Failed to save DuckDBView(view='fruits', ...

Example in a node:

>>> from ordeq import node
>>> from ordeq_duckdb import DuckDBTable, DuckDBView
>>> import duckdb
>>> connection = duckdb.connect(":memory:")
>>> fruits = DuckDBTable(
...     table="fruits",
...     connection=connection,
... )
>>> fruits_filtered = DuckDBView(
...     view="fruits_filtered",
...     connection=connection,
... )
>>> @node(inputs=fruits, outputs=fruits_filtered)
... def filter_fruits(
...     fruits: duckdb.DuckDBPyRelation
... ) -> duckdb.DuckDBPyRelation:
...     return fruits.filter("color = 'red'")

load()

Loads a DuckDB view.

Returns:

Type Description
DuckDBPyRelation

The DuckDB view.

save(relation, replace=True)

Saves a DuckDB relation to a DuckDB view.

Parameters:

Name Type Description Default
relation DuckDBPyRelation

The DuckDB relation to save.

required
replace bool

Whether to replace the view if it already exists.

True