Commit d0ee460

trigger ci test
1 parent 54f6792 commit d0ee460

1 file changed: `CI_INVESTIGATION.md` (103 additions, 0 deletions)

# CI Investigation: `Run sqllogictests with the sqlite test suite`

## Question
Why did the CI task
`Datafusion extended tests / Run sqllogictests with the sqlite test suite (pull_request)`
increase from about 1 hour to about 2 hours after merge commit `76be0b64c`?

## Scope
Compared:
- pre-merge parent: `8f959bba6`
- merge result: `76be0b64c`

## Findings

### 1) CI workflow/job definition did not change in the merge
I diffed `.github/workflows/extended.yml` between `8f959bba6` and `76be0b64c` and found no changes.

Implication: the slowdown is not explained by a direct change to the job steps, image, or command in this merge.

### 2) The sqllogictest workload increased materially
Changes under `datafusion/sqllogictest/test_files` in this merge range:
- `34 files changed`
- `+2359 / -296` lines (net `+2063`)
- new files include:
  - `join_limit_pushdown.slt`
  - `spark/bitmap/bitmap_bit_position.slt`
  - `spark/bitmap/bitmap_bucket_number.slt`
  - `spark/json/json_tuple.slt`

Files with the largest growth:
- `sort_pushdown.slt`: `+748`
- `projection_pushdown.slt`: `+348/-191`
- `dynamic_filter_pushdown_config.slt`: `+301`
- `join_limit_pushdown.slt`: `+269`

Aggregate test corpus size in `datafusion/sqllogictest/test_files`:
- files: `459 -> 463`
- lines: `134122 -> 136189`
- runnable records (query/statement/skipif/onlyif markers): `14845 -> 15110` (`+265`)

Implication: the same CI command now executes more sqllogictest content than before.
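
The corpus figures above can be re-derived with a short script. This is a minimal sketch, not the method actually used for this report. It assumes that only `*.slt` files count toward the corpus and that a "runnable record" is any line whose first token, starting at column 0, is `query`, `statement`, `skipif`, or `onlyif`. The names `corpus_stats` and `RECORD_KEYWORDS` are hypothetical.

```python
from pathlib import Path

# Assumption: these four directives define a "runnable record"; the report
# does not spell out its exact counting rule.
RECORD_KEYWORDS = {"query", "statement", "skipif", "onlyif"}


def corpus_stats(root):
    """Return (file_count, line_count, record_count) for *.slt files under root."""
    files = lines = records = 0
    for path in sorted(Path(root).rglob("*.slt")):
        files += 1
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                lines += 1
                # Only lines that begin with a directive at column 0 are counted;
                # indented lines yield an empty first token and are skipped.
                first_token = line.split(" ", 1)[0].strip()
                if first_token in RECORD_KEYWORDS:
                    records += 1
    return files, lines, records
```

Running `corpus_stats("datafusion/sqllogictest/test_files")` on checkouts of `8f959bba6` and `76be0b64c` would give two tuples to compare against the table above. Small differences are possible if the original counts used a different rule.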

### 2.1) PRs in this merge range that expanded the sqllogictest corpus
The following PRs (from `8f959bba6..76be0b64c`) had positive net line growth under
`datafusion/sqllogictest/test_files`:

- #20329 `fix: validate inter-file ordering in eq_properties()` (`+538`)
- #20192 `Support parent dynamic filters for more join types` (`+282`)
- #20228 `feat: Push limit into hash join` (`+265`)
- #20247 `Fix incorrect SortExec removal before AggregateExec` (`+210`)
- #20117 `feat: add ExtractLeafExpressions optimizer rule for get_field pushdown` (`+166`)
- #20412 `feat: support Spark-compatible json_tuple function` (`+154`)
- #20288 `feat: Implement Spark bitmap_bucket_number function` (`+122`)
- #20275 `feat: Implement Spark bitmap_bit_position function` (`+112`)
- #20420 `test: Extend Spark Array functions: array_repeat, shuffle and slice test coverage` (`+55`)
- #20189 `Adds support for ANSI mode in negative function` (`+52`)
- #20224 `fix: Fix scalar broadcast for to_timestamp()` (`+26`)
- #20279 `fix: disable dynamic filter pushdown for non min/max aggregates` (`+19`)
- #20361 `fix: Handle Utf8View and LargeUtf8 separators in concat_ws` (`+19`)
- #20191 `Support pushing down empty projections into joins` (`+19`)
- #20328 `perf: Optimize trim UDFs for single-character trims` (`+9`)
- #20241 `fix: Add integer check for bitwise coercion` (`+8`)
- #20305 `perf: Optimize translate() UDF for scalar inputs` (`+5`)
- #20341 `Reduce ExtractLeafExpressions optimizer overhead with fast pre-scan` (`+2`)

Notes:
- Net growth values above are line-based deltas in `datafusion/sqllogictest/test_files`.
- Some PRs touched sqllogictests with a net delta of `0` (balanced additions and removals) and are excluded here.
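
As a consistency check, the per-PR deltas listed above can be summed and compared with the aggregate `+2359 / -296` figure from Finding 2. The sketch below uses only numbers that appear in this report; the variable names are illustrative.

```python
# Net line growth per PR under datafusion/sqllogictest/test_files,
# copied from the list above (PR number -> net added lines).
PR_NET_GROWTH = {
    20329: 538, 20192: 282, 20228: 265, 20247: 210, 20117: 166,
    20412: 154, 20288: 122, 20275: 112, 20420: 55, 20189: 52,
    20224: 26, 20279: 19, 20361: 19, 20191: 19, 20328: 9,
    20241: 8, 20305: 5, 20341: 2,
}

total = sum(PR_NET_GROWTH.values())
aggregate_net = 2359 - 296  # "+2359 / -296" from Finding 2

print(total, aggregate_net)  # both are 2063
```

The two values agree, which suggests the list accounts for the whole net growth in the range, assuming both figures were measured the same way.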

### 3) Sqllogictest crate/dependency changes also landed from `main`
In `datafusion/sqllogictest/Cargo.toml`:
- `sqllogictest 0.29.0 -> 0.29.1`
- `clap 4.5.57 -> 4.5.60`

`Cargo.lock` in the merge range changed significantly (`+350/-185`), including new packages.

Implication: compile/setup time for the job can increase even if workflow YAML is unchanged.

### 4) DataFusion engine/query-planning code changed heavily in the merge range
This merge pulled in many optimizer/execution changes from `main` (plus extensive sqllogictest updates). Even though the range includes "perf" commits, the net runtime of this specific test corpus can still shift in either direction.

Implication: execution time of thousands of sqllogictest queries can change due to planner/executor behavior changes, not only due to test-count growth.

## Most likely explanation
The duration increase most likely comes from **workload growth + dependency/build churn introduced from `main`**, not from a workflow definition change in commit `76be0b64c` itself.

In other words, `76be0b64c` is the integration point where many upstream changes became active on this branch.

## Confidence
- High confidence: there is no job YAML change in this merge, and the sqllogictest corpus and dependencies grew.
- Medium confidence on the exact split between "build-time increase" and "test-runtime increase", because I could not fetch GitHub step timing logs in this environment.
94+
## Limitation encountered
95+
`gh auth status` shows the local GitHub token is invalid, so I could not inspect historical GitHub Actions step durations for direct Build-vs-Run timing attribution.
96+
97+
## Recommended next check (to confirm exact driver)
98+
Compare step durations for two runs (before/after `76be0b64c`) for:
99+
1. `Build sqllogictest binary`
100+
2. `Run sqllogictest`
101+
102+
If Build step grew most: dependency/compile churn is primary.
103+
If Run step grew most: test corpus / query execution behavior is primary.
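
Once step timings are available, the comparison itself is mechanical. The sketch below assumes each job is a dictionary with a `steps` list whose entries carry `name`, `startedAt`, and `completedAt` ISO 8601 timestamps, which is the shape I would expect from `gh run view <run-id> --json jobs`; treat that shape as an assumption to verify. The function names are hypothetical, and the timestamps are placeholders, not measurements from real runs.

```python
from datetime import datetime


def step_seconds(job):
    """Map each step name to its duration in seconds."""
    durations = {}
    for step in job["steps"]:
        # Replace a trailing "Z" so fromisoformat() also works before Python 3.11.
        started = datetime.fromisoformat(step["startedAt"].replace("Z", "+00:00"))
        completed = datetime.fromisoformat(step["completedAt"].replace("Z", "+00:00"))
        durations[step["name"]] = (completed - started).total_seconds()
    return durations


def largest_growth(before_job, after_job):
    """Return (step_name, deltas) for the step whose duration grew the most."""
    before, after = step_seconds(before_job), step_seconds(after_job)
    deltas = {name: after[name] - before[name] for name in after if name in before}
    return max(deltas, key=deltas.get), deltas


# Placeholder data for illustration only; these are NOT real CI timings.
before_job = {"steps": [
    {"name": "Build sqllogictest binary",
     "startedAt": "2000-01-01T10:00:00Z", "completedAt": "2000-01-01T10:20:00Z"},
    {"name": "Run sqllogictest",
     "startedAt": "2000-01-01T10:20:00Z", "completedAt": "2000-01-01T11:00:00Z"},
]}
after_job = {"steps": [
    {"name": "Build sqllogictest binary",
     "startedAt": "2000-01-02T10:00:00Z", "completedAt": "2000-01-02T10:25:00Z"},
    {"name": "Run sqllogictest",
     "startedAt": "2000-01-02T10:25:00Z", "completedAt": "2000-01-02T12:00:00Z"},
]}

slowest_growing_step, deltas = largest_growth(before_job, after_job)
```

With real data, `slowest_growing_step` maps directly onto the two cases above: Build points to dependency/compile churn, Run points to test corpus or query execution behavior.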
