Commit befe9ce

sh-rp, rudolfix, and anuunchin authored
transformations - updates (#2718)
* rename flag for executing raw queries to "execute_raw_query"
* return sge queries from the internal _query method which removes a lot of unneeded transpiling
  clean up make_transformation function
  tests still pending
* adds some tests to readable dataset and a test for column hint merging
* allows any dialect when writing queries and fixes tests
* update docs and set correct quoting to queries in normalization and load stage
* fixes normalizer tests
* fix limit on mssql
  normalize aliases in normalization step
* add missing quote to alias
* revert identifier normalization step in normalizer_query and use bigquery compiler for bigquery destinations
* post rebase fix
* smallish pr fixes
* add materializable sqlmodel and handle hints in extractor
* add and test always_materialize setting
* add test for sql transformation type
* convert transformation functions to need yield instead of return
* migrate tests and docs snippets to yield in transformations
* add simple test for materializable model
* use correct compiler for converting ibis into sqlglot for each dialect
  fixes on transformation test
* add first simple version of using unbound ibis tables in transformations
* skip ibis test on python 3.9
* fix query building in new relation
* return a "real" relation from a transformation
* add ibis option when getting table from dataset
  natively support unbound ibis tables in transformations and when getting relations from dataset
* update model item format tests to use relation
* remove one unneeded test (same thing is already tested in transformations)
* fix wei conversion in lineage
* adds support for adding resource hints to pyarrow items
* switch most read access tests to default dataset
* update datasets and transformations docs pages
* separate ibis and default dbapi datasets and fix typing
* update transformation tests and small typing fixes for updated datasets
* fix default dataset type
* fix wei sqlglot conversion
* add sqlglot dialect type and some cleanup
* fix dataset snippets
* fix sqlglot schema test
* removes ibis relation and dataset
  consolidates relation and dataset baseclasses with implementations
  updates interfaces/protocols for relation and dataset and makes those the publicly available interface with "Relation" and "Dataset"
  remove query method from relation interface
* fix one doc snippet
* rename dataset and relation interfaces
* fix relationship between cursor and relation, remove function wiring hack in favor of explicit forwarding for better typing
* clean up readablerelation (no actual code changes)
* fix str test to assume pretty sql (which it is now)
  fix one transformation snippet
* small changes from review comments:
* query method on dataset
* typing update of table method
* rename query method to "to_sql" on relation
* clean up transform function a bit (could maybe be even better)
  reject non-sql strings in transformation to not shadow errors
* add support for "non-generator" transformations
* move hints computation into resource class
* smallish PR fixes
* add support for dynamic hints in transformations -> this allows to have multiple relations with different schemas in the relation, so this is allowed now too
* fixes dynamic table caching
* Enhances ReadableDBAPIRelation: min/max, filter with expression (#2833)
* Min max, filter with expr_or_string
* Fix in min max test
* Overload fix and docs
* Test read interfaces partially uses default relation max
* prevent sqlglot schema from adding default hints info, only allow parametrized types and don't supply hints if none are present in dlt schema
* make multi schema transformations work again
* move model item format tests to transformations folder
* re-order interface tests and fix playground dataset access
* PR review test updated
* update dataset and transformation pages
* update transformations tests to new fruitshop
* Last PR fixes
* update columns_schema property

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Co-authored-by: anuunchin <88698977+anuunchin@users.noreply.github.com>
1 parent ffa88a3 commit befe9ce

60 files changed (+2028, -1658 lines)

.github/workflows/test_common.yml

Lines changed: 1 addition & 1 deletion
@@ -110,7 +110,7 @@ jobs:
           pytest tests/pipeline/test_pipeline_extra.py -k arrow ${{ matrix.pytest_args }}
 
       - name: Install pipeline and sources dependencies
-        run: uv sync ${{ matrix.uv_sync_args }} --extra duckdb --extra cli --extra parquet --extra deltalake --extra sql_database --group sentry-sdk --group pipeline --group sources
+        run: uv sync ${{ matrix.uv_sync_args }} --extra duckdb --extra cli --extra parquet --extra deltalake --extra sql_database --group sentry-sdk --group pipeline --group sources --group ibis
 
       - name: Run extract and pipeline tests
         run: |

dlt/__init__.py

Lines changed: 3 additions & 1 deletion
@@ -30,7 +30,8 @@
 from dlt.extract.decorators import source, resource, transformer, defer
 from dlt.destinations.decorators import destination
 from dlt.transformations.decorators import transformation
-from dlt.destinations.dataset import dataset, ReadableDBAPIDataset as Dataset
+from dlt.common.destination.dataset import Dataset, Relation
+from dlt.destinations.dataset import dataset
 
 from dlt.pipeline import (
     pipeline as _pipeline,
@@ -81,6 +82,7 @@
     "sources",
     "destinations",
     "Dataset",
+    "Relation",
     "dataset",
     "transformation",
 ]
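
The commit message says the `Relation` and `Dataset` interfaces/protocols become the public surface in place of the concrete `ReadableDBAPIDataset`. A minimal sketch of that idea, using only the standard library: `RelationLike`, `StaticRelation`, and `describe` are hypothetical names invented for illustration and are not dlt classes; only the `to_sql` method name comes from the commit message.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class RelationLike(Protocol):
    """Hypothetical interface: anything that can render itself as SQL."""

    def to_sql(self) -> str: ...


class StaticRelation:
    """Hypothetical implementation; it does not inherit from RelationLike."""

    def __init__(self, sql: str) -> None:
        self._sql = sql

    def to_sql(self) -> str:
        return self._sql


def describe(relation: RelationLike) -> str:
    # Callers depend only on the interface, not on a concrete class.
    return "relation: " + relation.to_sql()


description = describe(StaticRelation("SELECT 1"))
```

Typing against an interface like this lets different implementations be swapped without changing caller signatures, which is one plausible reason to export protocols rather than a concrete class.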

dlt/common/data_writers/writers.py

Lines changed: 3 additions & 10 deletions
@@ -190,16 +190,9 @@ def write_header(self, columns_schema: TTableSchemaColumns) -> None:
     def write_data(self, items: Sequence[TDataItem]) -> None:
         super().write_data(items)
         for item in items:
-            dialect = item.dialect or (self._caps.sqlglot_dialect if self._caps else None)
-            query = item.query
-            parsed_query = sqlglot.parse_one(query, read=dialect)
-
-            # Ensure the parsed query is a SELECT statement
-            if not isinstance(parsed_query, sqlglot.exp.Select):
-                raise ValueError("Only SELECT statements are allowed to write model files.")
-
-            normalized_query = parsed_query.sql(dialect=dialect)
-            self._f.write("dialect: " + (dialect or "") + "\n" + normalized_query + "\n")
+            dialect = item.query_dialect()
+            query = item.to_sql()
+            self._f.write("dialect: " + (dialect or "") + "\n" + query + "\n")
 
     @classmethod
     def writer_spec(cls) -> FileWriterSpec:
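
After this hunk the writer no longer parses or validates the query itself; it asks each item for its dialect and SQL text and writes a two-line record. The sketch below mirrors only that write step with a stand-in item. `FakeModelItem` and `write_model_items` are hypothetical names, and how the real item computes `query_dialect()` and `to_sql()` is not shown in this diff.

```python
import io
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FakeModelItem:
    """Hypothetical stand-in for a model item; not a dlt class."""

    sql: str
    dialect: Optional[str] = None

    def query_dialect(self) -> Optional[str]:
        return self.dialect

    def to_sql(self) -> str:
        return self.sql


def write_model_items(f: io.TextIOBase, items: Iterable[FakeModelItem]) -> None:
    # Same record layout as the new write_data body: a dialect line, then the query.
    for item in items:
        dialect = item.query_dialect()
        query = item.to_sql()
        f.write("dialect: " + (dialect or "") + "\n" + query + "\n")


buffer = io.StringIO()
write_model_items(buffer, [FakeModelItem("SELECT 1", "duckdb"), FakeModelItem("SELECT 2")])
model_file_text = buffer.getvalue()
```

An item without a dialect produces an empty value after `dialect: `, matching the `(dialect or "")` fallback in the hunk.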

dlt/common/destination/capabilities.py

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,6 @@
     Any,
     Callable,
     ClassVar,
-    Iterable,
     Literal,
     Optional,
     Sequence,
@@ -12,6 +11,7 @@
     Protocol,
     Type,
 )
+from dlt.common.libs.sqlglot import TSqlGlotDialect
 from dlt.common.data_types import TDataType
 from dlt.common.destination.configuration import ParquetFormatConfiguration
 from dlt.common.exceptions import TerminalValueError
@@ -203,7 +203,7 @@ class DestinationCapabilitiesContext(ContainerInjectableContext):
     enforces_nulls_on_alter: bool = True
     """Tells if destination enforces null constraints when adding NOT NULL columns to existing tables"""
 
-    sqlglot_dialect: Optional[str] = None
+    sqlglot_dialect: Optional[TSqlGlotDialect] = None
     """The SQL dialect used by sqlglot to transpile a query to match the destination syntax."""
 
     parquet_format: Optional[ParquetFormatConfiguration] = None
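
The hunk above narrows `sqlglot_dialect` from a free-form `str` to a dedicated dialect type. The definition of `TSqlGlotDialect` is not part of the visible diff, so the sketch below only assumes it behaves like a `Literal` of dialect names. `TDialectSketch` and `check_dialect` are hypothetical, and the listed names are an illustrative subset rather than the real list.

```python
from typing import Literal, Optional, cast, get_args

# Hypothetical subset of dialect names; the real TSqlGlotDialect is not shown in this diff.
TDialectSketch = Literal["duckdb", "postgres", "bigquery", "tsql"]


def check_dialect(name: Optional[str]) -> Optional[TDialectSketch]:
    """Return the name typed as a known dialect, or raise if it is not in the Literal."""
    if name is None:
        return None
    if name not in get_args(TDialectSketch):
        raise ValueError(f"unknown dialect: {name!r}")
    return cast(TDialectSketch, name)


checked = check_dialect("duckdb")
```

With a `Literal` type, a static type checker can flag a misspelled dialect at the assignment site, while a runtime helper like this covers values that arrive as plain strings.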
