Targeted CI test execution for PRs #16438

@zeitlinger

Problem

Every PR launches the full CI matrix regardless of what changed: 56 test jobs (4 partitions × 14 JVM configs), 16 smoke test jobs (8 suites × 2 OS), and 4 muzzle jobs. The remote Gradle build cache mitigates this: unchanged test tasks resolve as FROM-CACHE without re-executing. But every job still launches, and each carries ~6–7 min of overhead (JDK setup, Gradle startup, ~685 compile/processResources tasks) even when no tests actually run.

Measured wall times for recent PRs (all with warm cache):

| PR | Change type | Wall time | Tests actually executed |
| --- | --- | --- | --- |
| #16444 | docs (add link) | 8 min | 0 (all FROM-CACHE) |
| #16429 | code (jmx-metrics) | 6.5 min | 1 module (rest FROM-CACHE) |
| #16440 | code (ktor) | 30 min | 0 (bottleneck: smoke tests) |
| #16446 | dep update (examples/) | 16 min | 0 (all FROM-CACHE) |

The cache handles test execution well, but can't eliminate the per-job overhead or prevent unnecessary smoke test/muzzle runs.
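As a rough check on the per-job overhead, the job counts above can be multiplied out. This is a back-of-the-envelope sketch that assumes every job (test, smoke test, and muzzle alike) pays the same ~6–7 min of overhead; real jobs likely differ.

```python
# Job counts taken from the problem statement above.
TEST_JOBS = 4 * 14    # 4 partitions x 14 JVM configs
SMOKE_JOBS = 8 * 2    # 8 smoke test suites x 2 OS
MUZZLE_JOBS = 4

# Assumption: uniform per-job overhead of 6 to 7 minutes.
OVERHEAD_MIN_LOW = 6
OVERHEAD_MIN_HIGH = 7

total_jobs = TEST_JOBS + SMOKE_JOBS + MUZZLE_JOBS
overhead_low = total_jobs * OVERHEAD_MIN_LOW
overhead_high = total_jobs * OVERHEAD_MIN_HIGH

print(f"{total_jobs} jobs, {overhead_low}-{overhead_high} min of pure overhead")
```

Under that assumption, 76 jobs spend roughly 456–532 compute minutes on overhead alone, even when every test resolves as FROM-CACHE.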

Proposal

Classify PR changes into three scopes and adjust test execution accordingly:

  • FULL_BUILD — core module changed (instrumentation-api/, javaagent-tooling/, conventions/, settings.gradle.kts, etc.) → run everything (same as today)
  • SKIP_TESTS — docs/CI-only changes (.md, .github/, docs/, smoke-tests/, examples/, etc.) → skip instrumentation test and muzzle matrices entirely
  • TARGETED — changes only under instrumentation/ → build reverse dependency graph from project() references, run only affected test tasks and smoke test suites
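The three scopes above can be sketched as a small classifier. This is an illustration in Python, not the prototype's shell script: the path lists contain only the examples named above (the "etc." entries are not filled in), and the order in which rules are checked, as well as the handling of an empty change list, are assumptions.

```python
from enum import IntEnum


class Scope(IntEnum):
    # Ordered by severity so that max() picks the widest scope.
    SKIP_TESTS = 0
    TARGETED = 1
    FULL_BUILD = 2


# Only the examples named in the proposal; the real lists are longer.
CORE_PREFIXES = ("instrumentation-api/", "javaagent-tooling/", "conventions/")
CORE_FILES = ("settings.gradle.kts",)
SKIP_PREFIXES = (".github/", "docs/", "smoke-tests/", "examples/")
SKIP_SUFFIXES = (".md",)


def classify_file(path: str) -> Scope:
    """Classify one changed file. Rule order is an assumption."""
    if path in CORE_FILES or path.startswith(CORE_PREFIXES):
        return Scope.FULL_BUILD
    if path.endswith(SKIP_SUFFIXES) or path.startswith(SKIP_PREFIXES):
        return Scope.SKIP_TESTS
    if path.startswith("instrumentation/"):
        return Scope.TARGETED
    # Conservative default: anything unclassified forces a full build.
    return Scope.FULL_BUILD


def classify_change(paths: list[str]) -> Scope:
    """The PR scope is the widest scope of any changed file."""
    if not paths:
        return Scope.FULL_BUILD  # assumption: no information means full build
    return max(classify_file(p) for p in paths)
```

For example, a PR touching only `docs/README.md` classifies as SKIP_TESTS, while adding a change to `settings.gradle.kts` raises it to FULL_BUILD.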

Expected impact

The remote Gradle build cache already avoids re-running tests for unchanged modules. This proposal complements the cache by not launching unnecessary jobs at all, which removes their ~6–7 min of per-job overhead. In the tables below, percentages are reductions relative to the "Without cache" column.

Docs-only PR

| CI task | Without cache | With cache | With targeted CI |
| --- | --- | --- | --- |
| Instrumentation tests | 528 tasks | 0 (all FROM-CACHE) | 0 (jobs not launched) |
| Smoke tests | 16 jobs | 16 jobs | 0 |
| Muzzle | 4 jobs | 4 jobs | 0 |
| Build, spotless, lints | full | full | full |
| Wall time | ~27 min | ~8 min (70%) | ~3 min (89%) |
| Compute | ~900 min | ~530 min (41%) | ~20 min (98%) |

Single instrumentation, e.g. alibaba-druid

| CI task | Without cache | With cache | With targeted CI |
| --- | --- | --- | --- |
| Instrumentation tests | 528 tasks | 1 task (rest FROM-CACHE) | 1 task (jobs not launched) |
| Smoke tests | 16 jobs | 16 jobs | 0 |
| Muzzle | 4 jobs | 4 jobs | 4 jobs |
| Build, spotless, lints | full | full | full |
| Wall time | ~27 min | ~8–16 min (41–70%) | ~5–13 min (52–81%) |
| Compute | ~900 min | ~530 min (41%) | ~56 min (94%) |

Instrumentation with reverse deps, e.g. jetty

| CI task | Without cache | With cache | With targeted CI |
| --- | --- | --- | --- |
| Instrumentation tests | 528 tasks | ~48 tasks (rest FROM-CACHE) | ~48 tasks (jobs not launched) |
| Smoke tests | 16 jobs | 16 jobs | 2 jobs |
| Muzzle | 4 jobs | 4 jobs | 4 jobs |
| Build, spotless, lints | full | full | full |
| Wall time | ~27 min | ~15–30 min (0–44%) | ~13 min (52%) |
| Compute | ~900 min | ~530 min (41%) | ~154 min (83%) |

Widely-depended-on, e.g. servlet-common

| CI task | Without cache | With cache | With targeted CI |
| --- | --- | --- | --- |
| Instrumentation tests | 528 tasks | ~68 tasks (rest FROM-CACHE) | ~68 tasks, fewer partitions |
| Smoke tests | 16 jobs | 16 jobs | 14 jobs |
| Muzzle | 4 jobs | 4 jobs | 4 jobs |
| Build, spotless, lints | full | full | full |
| Wall time | ~27 min | ~15–30 min (0–44%) | ~15 min (44%) |
| Compute | ~900 min | ~530 min (41%) | ~500 min (44%) |

Core module change or "test all" label

Everything runs as today.

Design decisions

  1. Conservative by default — any unclassified file triggers FULL_BUILD
  2. test all PR label — escape hatch to force full build
  3. Smoke test suite selection uses the reverse dependency closure: the same BFS graph used for test filtering maps affected modules to smoke test suites via a static prefix mapping
  4. Muzzle is skipped only for SKIP_TESTS in the PoC; filtering muzzle to affected modules is planned
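The reverse dependency closure and the prefix mapping from decision 3 could look roughly like the sketch below. It assumes the `project()` references have already been parsed into a module-to-dependencies dictionary; all module paths and suite names in the usage example are hypothetical, and this is Python for illustration rather than the prototype's shell.

```python
from collections import deque


def reverse_graph(deps: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert 'module -> modules it depends on' into 'module -> dependents'."""
    rev: dict[str, set[str]] = {}
    for module, targets in deps.items():
        rev.setdefault(module, set())
        for target in targets:
            rev.setdefault(target, set()).add(module)
    return rev


def affected_modules(deps: dict[str, list[str]], changed: list[str]) -> set[str]:
    """BFS over reverse edges: changed modules plus everything depending on them."""
    rev = reverse_graph(deps)
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in rev.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


def smoke_suites(modules: set[str], prefix_to_suite: dict[str, str]) -> set[str]:
    """Select every suite whose module prefix matches an affected module."""
    return {
        suite
        for module in modules
        for prefix, suite in prefix_to_suite.items()
        if module.startswith(prefix)
    }
```

A usage example with hypothetical module paths and suite names:

```python
deps = {
    ":instrumentation:servlet:servlet-common": [],
    ":instrumentation:jetty:jetty-8.0": [":instrumentation:servlet:servlet-common"],
    ":instrumentation:alibaba-druid-1.0": [],
}
mapping = {":instrumentation:jetty": "jetty", ":instrumentation:servlet": "servlet"}

affected = affected_modules(deps, [":instrumentation:servlet:servlet-common"])
print(sorted(affected))
print(sorted(smoke_suites(affected, mapping)))
```

Changing the widely used module pulls in its dependents and their suites, while changing a leaf module such as the druid one affects only itself.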

Planned improvements beyond PoC

  • Targeted muzzle — filter muzzle to affected modules instead of running all 4 partitions
  • Skip non-essential build jobs — compile-only for TARGETED mode (skip spotless, javadoc, SBOM, checkstyle)
  • Dynamic partition reduction — collapse 4→1 partitions when few tasks
  • Full reverse dep closure for smoke test mapping (the current PoC uses only directly changed modules)
  • Generate smoke test mapping from build files (currently a static file)
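Dynamic partition reduction could be as simple as deriving the partition count from the number of affected test tasks. The sketch below is one possible approach, not the planned implementation: the tasks-per-partition threshold of 25 and the round-robin assignment are hypothetical choices.

```python
import math

MAX_PARTITIONS = 4         # the current matrix uses 4 partitions
TASKS_PER_PARTITION = 25   # hypothetical threshold, not from the proposal


def partition_count(num_tasks: int) -> int:
    """Use as few partitions as the task count needs, capped at MAX_PARTITIONS."""
    if num_tasks <= 0:
        return 0
    return min(MAX_PARTITIONS, math.ceil(num_tasks / TASKS_PER_PARTITION))


def assign_partitions(tasks: list[str], count: int) -> list[list[str]]:
    """Spread tasks round-robin over partitions, sorted so the split is stable."""
    if count <= 0:
        return []
    buckets: list[list[str]] = [[] for _ in range(count)]
    for index, task in enumerate(sorted(tasks)):
        buckets[index % count].append(task)
    return buckets
```

With this threshold, a single affected task runs in 1 partition, ~48 tasks in 2, ~68 tasks in 3, and the full 528 tasks keep all 4.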

Implementation

  • Two new shell scripts (~240 + ~90 lines) for change classification and test filtering
  • A static mapping file for smoke test suite selection
  • Modifications to 4 workflow files

Working prototype: #16436
