
Commit d77d2d0

caseydm and claude committed
Speed up smoke tests: cache works_df, combine aggregations, multi-node cluster
- Cache works snapshot DataFrame once and reuse across all 9 test suites instead of re-reading the 300M+ row table in every cell
- Combine Tests 5 and 5b into a single aggregation pass (12 conditional counts in one scan instead of ~10 separate scans)
- Combine Test 7 null checks into a single aggregation
- Switch cluster from single-node i3.4xlarge to 4-worker i3.2xlarge for parallel reads

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 513581f commit d77d2d0
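
The "single aggregation pass" idea from the commit message can be illustrated with a minimal plain-Python sketch. It is not the commit's Spark code: the rows, column names (`title`, `doi`), and the `count_conditions` helper are all hypothetical, chosen only to show how several conditional counts can share one scan of the data instead of each triggering its own.

```python
from typing import Callable, Dict, Iterable, Mapping

Row = Mapping[str, object]
Check = Callable[[Row], bool]


def count_conditions(rows: Iterable[Row], checks: Dict[str, Check]) -> Dict[str, int]:
    """Count how many rows satisfy each named check, scanning the rows once."""
    counts = {name: 0 for name in checks}
    for row in rows:  # single pass over the data
        for name, check in checks.items():
            if check(row):
                counts[name] += 1
    return counts


# Hypothetical stand-in for the works snapshot (the real table has 300M+ rows).
works = [
    {"id": "W1", "title": "A", "doi": None},
    {"id": "W2", "title": None, "doi": "10.1/x"},
    {"id": "W3", "title": "C", "doi": "10.1/y"},
]

# Hypothetical checks; each would otherwise be a separate scan.
checks = {
    "total": lambda r: True,
    "null_title": lambda r: r["title"] is None,
    "null_doi": lambda r: r["doi"] is None,
}

result = count_conditions(works, checks)
print(result)
```

In PySpark, the analogous pattern is usually a single `agg` call containing one `F.count(F.when(condition, 1))` expression per check, run against a DataFrame that has been cached once; whether the commit uses exactly that form is not visible in this excerpt.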

2 files changed, +14 -459 lines changed

jobs/snapshot_full.yaml

Lines changed: 2 additions & 3 deletions
@@ -326,12 +326,11 @@ resources:
 availability: SPOT_WITH_FALLBACK
 zone_id: auto
 spot_bid_price_percent: 100
-node_type_id: i3.4xlarge
+node_type_id: i3.2xlarge
+num_workers: 4
 enable_elastic_disk: false
 data_security_mode: SINGLE_USER
 runtime_engine: STANDARD
-kind: CLASSIC_PREVIEW
-is_single_node: true
 git_source:
 git_url: https://github.com/ourresearch/openalex-walden.git
 git_provider: gitHub