Commit a29733a

chore(query): reapply iceberg bump (#19304)
* chore: bump iceberg-rust to v0.8.0 and re-enable CI tests
  - Upgrade iceberg-rust from v0.4.0 to v0.8.0 with breaking API changes
  - Add IcebergFileIO wrapper to adapt new FileIO API to OperatorRegistry trait
  - Update catalog builders to use new CatalogBuilder trait with load() method
  - Re-enable standalone_iceberg_tpch CI tests that were disabled for arm64
  - Upgrade hive_metastore from v0.1.0 to v0.2.0
  - Simplify generate_catalog_meta to generate_default_catalog_meta
* add agents.md
* fix: address review feedback for IcebergFileIO
  - Handle file:// and memory:// URIs that don't have a host/bucket
  - Fix gcs.project-id mapping to project_id (was incorrectly default_storage_class)
  - Use Error::other() to fix clippy io_other_error lint
* test: add iceberg table write tests
  - Basic insert with multiple rows
  - Multiple insert statements
  - Various data types (int, bigint, float, double, string, date, boolean)
  - NULL value handling
  - Partitioned table writes
  - INSERT SELECT from another iceberg table
  - Aggregation queries on inserted data
* fix: return UNKNOWN_TABLE error for non-existent iceberg tables
  - Map iceberg 'table not found' errors to ErrorCode::UnknownTable
  - This allows DROP TABLE IF EXISTS to work correctly for iceberg tables
  - Fix column type annotation in base.test (ITI -> TAT)
  - Remove write tests since iceberg tables don't support INSERT yet
* test: add iceberg table write tests (expecting errors)
  - Test CREATE TABLE with various types (int, bigint, double, string, date, boolean)
  - Test CREATE TABLE with partition by clause
  - Test INSERT statements (expected to fail with error 1002 since writes not yet supported)
  - Test DROP TABLE cleanup
* feat(iceberg): add write support for iceberg tables
  - Implement IcebergDataFileWriter for writing data blocks to parquet files
  - Add IcebergCommitSink for committing data via Transaction API
  - Support both partitioned and non-partitioned table writes
  - Handle multi-field partitioning with FanoutWriter
  - Add type conversion from Databend scalars to Iceberg literals
  - Include cache invalidation after successful commits
  - Update tests to verify write functionality works correctly
* fix: regenerate Cargo.lock to fix duplicate package error
* fix: address lint warnings after Cargo.lock regeneration
  - Suppress deprecated as_slice/from_slice warnings in hash.rs and jwk.rs
  - Add missing rand::Rng import in sized_spsc.rs tests
  - Add allow for diverging_sub_expression in raft_state_machine_impl.rs
* fix(ci): use multi-arch eclipse-temurin image for iceberg driver
  The alpine variant doesn't support ARM64 architecture.
* fix(test): update ST_GEOMFROMGEOHASH expected output
  The polygon ring starting point changed after dependency upgrade. The polygon is geometrically equivalent - same shape, different vertex order.
* fix(ci): add uv setup and update test expected output
  - Add astral-sh/setup-uv@v7 to iceberg tpch test action
  - Update float scientific notation format (e308 -> e+308) in tests
* fix(ci): sync scripts package for pyspark dependency
* update
* update
* update
* update
* fix: stabilize geohash polygon order
* update
* update ast version
* update ast version
* fix
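
The review-feedback item about `file://` and `memory://` URIs that have no host/bucket can be pictured with a small standalone sketch. This is not the `IcebergFileIO` code from the commit, which is not shown on this page. `ParsedLocation` and `parse_location` are hypothetical names, and treating everything after the scheme as the path for `file` and `memory` is an assumption made for illustration. The only detail borrowed from the commit message is the use of `std::io::Error::other()` to build errors.

```rust
/// Hypothetical result type for illustration only; not part of the commit.
#[derive(Debug, PartialEq)]
struct ParsedLocation {
    scheme: String,
    /// `None` for schemes that carry no host/bucket component.
    bucket: Option<String>,
    path: String,
}

/// Split a storage URI into scheme, optional bucket, and absolute path.
/// Assumption: `file` and `memory` locations have no bucket, so everything
/// after `://` is treated as the path. Other schemes use the first segment
/// as the bucket.
fn parse_location(uri: &str) -> Result<ParsedLocation, std::io::Error> {
    let (scheme, rest) = uri
        .split_once("://")
        .ok_or_else(|| std::io::Error::other(format!("missing scheme in {uri}")))?;

    match scheme {
        "file" | "memory" => Ok(ParsedLocation {
            scheme: scheme.to_string(),
            bucket: None,
            // Normalize to exactly one leading slash.
            path: format!("/{}", rest.trim_start_matches('/')),
        }),
        _ => {
            // Object-store style: `<bucket>/<key...>`.
            let (bucket, key) = rest.split_once('/').unwrap_or((rest, ""));
            if bucket.is_empty() {
                return Err(std::io::Error::other(format!("missing bucket in {uri}")));
            }
            Ok(ParsedLocation {
                scheme: scheme.to_string(),
                bucket: Some(bucket.to_string()),
                path: format!("/{key}"),
            })
        }
    }
}
```

Under these assumptions, `parse_location("file:///tmp/warehouse/t")` yields no bucket and the path `/tmp/warehouse/t`, while `parse_location("s3://bucket/a/b")` yields the bucket `bucket` and the path `/a/b`. A URI such as `s3:///a` is rejected because the bucket is empty.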
1 parent b751cdb commit a29733a

40 files changed, 1477 additions and 707 deletions

.github/actions/test_sqllogic_iceberg_tpch/action.yml

Lines changed: 7 additions & 3 deletions
@@ -16,6 +16,9 @@ runs:
       with:
         artifacts: sqllogictests,meta,query

+    - name: Setup uv
+      uses: astral-sh/setup-uv@v7
+
     - name: Iceberg Setup for (ubuntu-latest only)
       shell: bash
       run: |
@@ -29,11 +32,12 @@ runs:
         fi
         tar -zxf ${data_dir}/tpch.tar.gz -C $data_dir

-        uv sync
+        script_dir="tests/sqllogictests/scripts"
+        (cd "$script_dir" && uv sync)
         echo "Running prepare_iceberg_tpch_data.py..."
-        uv run python tests/sqllogictests/scripts/prepare_iceberg_tpch_data.py
+        uv run tests/sqllogictests/scripts/prepare_iceberg_tpch_data.py
         echo "Running prepare_iceberg_test_data.py..."
-        uv run python tests/sqllogictests/scripts/prepare_iceberg_test_data.py
+        uv run tests/sqllogictests/scripts/prepare_iceberg_test_data.py


     - name: Run sqllogic Tests with Standalone lib

.github/workflows/reuse.sqllogic.yml

Lines changed: 23 additions & 24 deletions
@@ -228,30 +228,29 @@ jobs:
         with:
           name: test-sqllogic-cluster-minio-${{ matrix.dirs }}-${{ matrix.handler }}

-  # TODO: tmp disable since iceberg image not running on arm64
-  # standalone_iceberg_tpch:
-  #   runs-on:
-  #     - self-hosted
-  #     - ${{ inputs.runner_arch }}
-  #     - Linux
-  #     - 4c
-  #     - "${{ inputs.runner_provider }}"
-  #   steps:
-  #     - uses: actions/checkout@v4
-  #     - uses: actions/setup-java@v4
-  #       with:
-  #         distribution: "temurin"
-  #         java-version: "17"
-  #     - uses: ./.github/actions/test_sqllogic_iceberg_tpch
-  #       timeout-minutes: 15
-  #       with:
-  #         dirs: tpch_iceberg
-  #         handlers: http,hybrid
-  #     - name: Upload failure
-  #       if: failure()
-  #       uses: ./.github/actions/artifact_failure
-  #       with:
-  #         name: test-sqllogic-standalone-iceberg-tpch
+  standalone_iceberg_tpch:
+    runs-on:
+      - self-hosted
+      - ${{ inputs.runner_arch }}
+      - Linux
+      - 4c
+      - "${{ inputs.runner_provider }}"
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-java@v4
+        with:
+          distribution: "temurin"
+          java-version: "17"
+      - uses: ./.github/actions/test_sqllogic_iceberg_tpch
+        timeout-minutes: 15
+        with:
+          dirs: tpch_iceberg
+          handlers: http,hybrid
+      - name: Upload failure
+        if: failure()
+        uses: ./.github/actions/artifact_failure
+        with:
+          name: test-sqllogic-standalone-iceberg-tpch

   cluster:
     runs-on:

.gitignore

Lines changed: 2 additions & 2 deletions
@@ -87,5 +87,5 @@ benchmark/clickbench/results

 # tmp
 tmp
-
-docs/
+# superpowers agent plan docs
+./docs/

AGENTS.md

Lines changed: 65 additions & 0 deletions
@@ -19,5 +19,70 @@ Always use `cargo clippy` to make sure there are no compilation errors. Fully ve
 ## Testing Guidelines
 Unit tests stay close to the affected crate (`#[cfg(test)]` modules), and integration behavior belongs in the relevant SQL suites or meta harness (`tests/metactl`, `tests/meta-kvapi`). Every planner, executor, or storage change should add at least one regression SQL file plus expected output when deterministic. Use cluster variants (`make stateless-cluster-test` and TLS mode) whenever coordination, transactions, or auth are involved. Document new fixtures or configs in `tests/README.md` (or inline comments) so CI remains reproducible.

+
 ## Commit & Pull Request Guidelines
+
 History follows a Conventional-style subject such as `fix(storage): avoid stale snapshot (#19174)` or `feat: support self join elimination (#19169)`; keep the first line imperative and under 72 characters. Commits should stay scoped to a logical change set and include formatting/linting updates in the same patch. PRs must outline motivation, implementation notes, and validation commands, plus link issues or RFCs, and the final description should follow `PULL_REQUEST_TEMPLATE.md` (checkboxes, verification, screenshots when needed). Attach screenshots or sample queries when UI, SQL plans, or system tables change, and call out rollout risks (migrations, config toggles, backfills) so reviewers can plan accordingly.
+
+There is the example of pull requests:
+
+````
+I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
+
+## Summary
+
+- Enable table functions like `generate_series` and `range` to accept scalar subqueries as arguments
+- Return NULL for empty scalar subqueries to align with existing scalar subquery semantics
+
+## Changes
+
+This PR enables SQL like:
+
+```sql
+SELECT generate_series AS install_date
+FROM generate_series(
+  (SELECT count() FROM numbers(10))::int,
+  (SELECT count() FROM numbers(39))::int
+);
+````
+
+Previously, table function arguments could only be constants. Now they can be scalar subqueries that return a single value.
+
+## Implementation
+
+1. Added `contains_subquery()` function to detect subqueries in AST expressions
+2. Added `execute_subquery_for_scalar()` to execute and extract scalar values from subqueries
+3. Modified `bind_table_args` to try constant folding first, then fall back to subquery execution
+4. The subquery executor is passed from the binder to enable runtime evaluation
+5. Returns `Scalar::Null` for empty subquery results (aligns with LeftSingleJoin behavior)
+
+## Tests
+
+- [x] Unit Test
+- [x] Logic Test
+- [ ] Benchmark Test
+- [ ] No Test - Pair with the reviewer to explain why
+
+Added tests in `02_0063_function_generate_series.test` for:
+
+- `generate_series` with subquery arguments
+- `range` with subquery arguments
+
+## Type of change
+
+- [x] New feature (non-breaking change which adds functionality)
+
+<!-- Reviewable:start -->
+
+---
+
+This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/databendlabs/databend/19213)
+
+<!-- Reviewable:end -->
+
+```
+
+
+Pull request mus be pushed into fork and create pr into origin.
+You can use gh tools to do it.
+```
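
The subject-line convention quoted in the AGENTS.md hunk (a Conventional-style `type(scope): summary` or `type: summary`, under 72 characters) can be checked mechanically. The sketch below is only an illustration and is not part of the repository. `check_subject` is a hypothetical helper, and the `KNOWN_TYPES` list is an assumption, since the guideline does not enumerate allowed types. Whether a subject is imperative is not checked.

```rust
/// Assumed set of commit types; the guideline itself does not list them.
const KNOWN_TYPES: [&str; 6] = ["feat", "fix", "chore", "refactor", "test", "docs"];

/// "Under 72 characters" is read here as strictly fewer than 72.
const MAX_SUBJECT_LEN: usize = 72;

/// Hypothetical checker: validates length and the `type(scope): summary` shape.
fn check_subject(subject: &str) -> Result<(), String> {
    let len = subject.chars().count();
    if len >= MAX_SUBJECT_LEN {
        return Err(format!("subject has {len} characters, expected fewer than {MAX_SUBJECT_LEN}"));
    }

    // Split `type(scope)` from the summary at the first ": ".
    let (head, summary) = subject
        .split_once(": ")
        .ok_or_else(|| "expected `type: summary` or `type(scope): summary`".to_string())?;

    // Peel off an optional `(scope)` suffix from the type.
    let kind = match head.split_once('(') {
        Some((kind, scope)) => {
            let scope = scope
                .strip_suffix(')')
                .ok_or_else(|| "scope is missing its closing parenthesis".to_string())?;
            if scope.is_empty() {
                return Err("scope must not be empty".to_string());
            }
            kind
        }
        None => head,
    };

    if !KNOWN_TYPES.contains(&kind) {
        return Err(format!("unknown commit type `{kind}`"));
    }
    if summary.trim().is_empty() {
        return Err("summary must not be empty".to_string());
    }
    Ok(())
}
```

With this sketch, both examples from the guideline (`fix(storage): avoid stale snapshot (#19174)` and `feat: support self join elimination (#19169)`) pass, as does this commit's own subject, while a bare `update` is rejected for lacking a type prefix.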

CLAUDE.md

Lines changed: 0 additions & 164 deletions
This file was deleted.
