Commit aa8ae7f

feat(internal/civisibility): subtest-level test management & flaky retry support. (#4063)
Co-authored-by: tony.redondo <[email protected]>
1 parent 49ddb0c commit aa8ae7f

13 files changed

+1586
-116
lines changed

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
1+
# `internal/civisibility` Walkthrough
2+
3+
## High-Level Purpose
4+
- Implements Datadog Test Optimization for Go: bootstraps tracing/log streaming, exposes manual test lifecycle APIs, and auto-instruments `testing`.
5+
- Coordinates feature toggles such as Intelligent Test Runner (ITR), early flake detection, flaky retries, impacted tests, test management, subtest-level directives, code coverage, and CI logs.
6+
- Normalizes CI/git metadata, fetches repository deltas, and communicates with backend settings/coverage/logs endpoints through a configurable client layer.
7+
- Houses telemetry hooks to measure git command usage, HTTP behavior, and test instrumentation statistics.
8+
9+
## Top-Level Layout
10+
- `civisibility.go` – atomic state/test-mode switches shared across integrations.
11+
- `constants/` – tag keys, span types, environment variable names, capability flags, span metadata helpers.
12+
- `integrations/` – tracer bootstrapping, feature negotiation, manual test lifecycle API, Go `testing` instrumentation (including subtest orchestration), log streaming.
13+
- `integrations/gotesting/subtests/` – mock backend + scenario harness exercising parent/subtest directive matrices and retry ownership rules.
14+
- `utils/` – CI provider discovery, git utilities, code owners lookup, network clients, telemetry plumbing, name canonicalization, impacted test logic, fixtures.
15+
16+
## Core Components
17+
18+
### Root State Management
19+
`civisibility.go` stores a process-wide `status` (`StateUninitialized` through `StateExited`) and an `isTestMode` flag using `atomic` types. Integrations set these to coordinate tracer startup/teardown, and tests can toggle “test mode” for mock tracer usage.
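
A minimal sketch of this pattern, assuming Go 1.19+ typed atomics. Only `StateUninitialized` and `StateExited` are named in this walkthrough; the intermediate states and every function name below are hypothetical placeholders, not the package's actual API.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// State models the process-wide lifecycle status.
type State int32

const (
	StateUninitialized State = iota
	StateInitializing        // hypothetical intermediate state
	StateInitialized         // hypothetical intermediate state
	StateExiting             // hypothetical intermediate state
	StateExited
)

var (
	status     atomic.Int32 // current State, stored as its int32 value
	isTestMode atomic.Bool  // true when a mock tracer should be used
)

// GetState returns the current lifecycle state.
func GetState() State { return State(status.Load()) }

// SetState unconditionally stores a new lifecycle state.
func SetState(s State) { status.Store(int32(s)) }

// SetTestMode and IsTestMode toggle and read the test-mode flag.
func SetTestMode(enabled bool) { isTestMode.Store(enabled) }
func IsTestMode() bool         { return isTestMode.Load() }

// tryBeginInit moves Uninitialized -> Initializing exactly once,
// so only one concurrent caller wins the right to start the tracer.
func tryBeginInit() bool {
	return status.CompareAndSwap(int32(StateUninitialized), int32(StateInitializing))
}

func main() {
	fmt.Println("first caller wins:", tryBeginInit())
	fmt.Println("second caller wins:", tryBeginInit())
	SetState(StateExited)
	SetTestMode(true)
	fmt.Println("exited:", GetState() == StateExited, "test mode:", IsTestMode())
}
```

The compare-and-swap is one way to make initialization idempotent without a mutex; the real package may combine atomics with `sync.Once` instead.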
20+
21+
### Constants Package
22+
- `ci.go`, `env.go`, `git.go`, `os.go`, `runtime.go` declare string keys for CI metadata, git fields, OS/runtime descriptors, and env var names driving feature toggles (`DD_CIVISIBILITY_*`, `CIVisibility*`).
23+
- `tags.go`, `test_tags.go`, `span_types.go` centralize span tag names, capability markers, and status/type enumerations used by integrations and utils.
24+
- `test_tags.go` captures nuanced flags (`test.itr.forced_run`, quarantine/disable toggles, retry reasons, coverage toggles) ensuring consistent tagging between retries, EFD, ITR, and test management flows.
25+
26+
### Integrations Package
27+
- `civisibility.go` handles one-time tracer initialization: sets `DD_CIVISIBILITY_ENABLED`, forces tracing sample rate to 1, preloads CI metadata/code owners, optionally sets service name from repo URL, and registers signal handlers that call `ExitCiVisibility`. Close actions queue via `PushCiVisibilityCloseAction`, running in LIFO order during shutdown.
28+
- `civisibility_features.go` orchestrates backend settings negotiation. It spawns asynchronous git pack uploads, retries settings fetch if backend needs git data, applies env-var kill switches for features (flaky retries, impacted tests, test management, subtest directives), and lazily loads supplementary data (known tests, skippables, impacted tests analyzer). Settings and HTTP client live in package-level vars protected by `sync.Once`.
29+
- `manual_api*.go` files expose strongly-typed interfaces for user-driven test lifecycle: `TestSession`, `TestModule`, `TestSuite`, `Test`, along with option structs for command, framework, working directory, start/finish timestamps, and error reporting via `ErrorOption`. Variants (`manual_api_ddtest*.go`) adapt to `ddtest` helper semantics, and `manual_api_mocktracer_test.go` validates API behavior under mock tracer mode.
30+
- `gotesting/` auto-instruments `testing.M`, `testing.T`, and `testing.B`.
31+
- `testing.go`, `testingT.go`, `testingB.go`, and related files manage session/module/suite creation, attach tags (including module/suite counters), handle chatty output, skip logic, coverage capture, log streaming, and telemetry emission. They integrate with `integrations.GetSettings()`, `net` clients, and `logs`.
32+
- Hierarchical identity plumbing: `testIdentity` (module, suite, base name, full name, path segments) plus `matchTestManagementData` allow subtests to resolve directives like `TestParent/SubChild`. `getTestManagementData` reports whether a directive was an exact match or inherited from an ancestor.
33+
- `instrumentation.go` wires wrappers around test functions, stores execution metadata (retries, new/modified flags, quarantined/disabled states, attempt-to-fix ownership), and coordinates with backend settings for retries/EFD/ITR. It leverages `unsafe` pointers and reflective lookup to map `testing` internals, guarded by `sync` primitives. Subtests consult the parent execution metadata to decide whether they should wrap themselves or defer to the parent-run retry loop.
34+
- Attempt-to-fix ownership rules:
35+
1. Parent-only directives orchestrate retries and tag success/failure; children inherit attempt-to-fix tagging but emit no retry spans.
36+
2. Child-only directives wrap the subtest locally (when feature flag enabled and exact match present) while leaving the parent neutral.
37+
3. Parent and child requesting attempt-to-fix results in the parent winning; subtests receive tags but do not run retries.
38+
4. Parent quarantine + attempt-to-fix keeps the parent as the retry owner while subtests inherit quarantine tags; a quarantined parent without attempt-to-fix leaves children free to execute their own retries if explicitly configured.
39+
- Feature gating: `DD_CIVISIBILITY_SUBTEST_FEATURES_ENABLED` enables subtest directives. Standard debug logs capture identity/ownership traces when enabled.
40+
- `instrumentation_orchestrion.go` and `orchestrion.yml` support compile-time source rewriting via Orchestrion for transparent instrumentation in user code. The Orchestrion path computes subtest identities, inspects parent metadata, and applies the same ownership logic as the manual wrappers.
41+
- `coverage/` builds code coverage payloads, writes them via `coverage_writer`, and includes an auto-generated `test_coverage_msgp.go` for MsgPack encoding.
42+
- `reflections.go` / `_test.go` ensure compatibility with `go test` internal structures across versions; helper routines detect struct offsets, function pointers, and maintain compatibility with new Go releases.
43+
- `logs/` encapsulates CI log forwarding: gating via `DD_CIVISIBILITY_LOGS_ENABLED` stable config, packaging log entries with consistent tags, buffering/flush policies, payload formatting (`logs_payload.go`) and writer lifecycle (`logs_writer.go`).
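
The hierarchical directive lookup described in the `gotesting/` notes above (exact match versus inheritance from an ancestor such as `TestParent`) could be sketched as follows. The `directive` type, the map-based store, and `lookupDirective` are illustrative assumptions; the real `matchTestManagementData` and `getTestManagementData` signatures are not shown in this document.

```go
package main

import (
	"fmt"
	"strings"
)

// directive is a hypothetical, simplified test-management record.
type directive struct {
	Quarantined  bool
	Disabled     bool
	AttemptToFix bool
}

// lookupDirective resolves a directive for a full test name such as
// "TestParent/SubChild". It tries the full name first, then walks up
// the path one segment at a time. exact reports whether the match was
// on the full name rather than inherited from an ancestor.
func lookupDirective(directives map[string]directive, fullName string) (d directive, exact, found bool) {
	name := fullName
	for {
		if v, ok := directives[name]; ok {
			return v, name == fullName, true
		}
		i := strings.LastIndex(name, "/")
		if i < 0 {
			return directive{}, false, false
		}
		name = name[:i] // drop the last path segment and retry
	}
}

func main() {
	directives := map[string]directive{
		"TestParent":          {Quarantined: true},
		"TestParent/SubChild": {AttemptToFix: true},
	}
	for _, n := range []string{"TestParent/SubChild", "TestParent/Other/Deep", "TestUnknown/Sub"} {
		d, exact, found := lookupDirective(directives, n)
		fmt.Printf("%-24s found=%t exact=%t directive=%+v\n", n, found, exact, d)
	}
}
```

Returning the `exact` flag separately lets a caller apply inherited tags while still refusing to wrap a subtest that has no directive of its own.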
44+
45+
### Subtest Scenario Harness (`integrations/gotesting/subtests/`)
46+
- Provides an executable matrix covering parent/subtest directive permutations. `main_test.go` enables the subtest feature flag and spawns subprocesses per scenario using `SUBTEST_MATRIX_SCENARIO`.
47+
- `subtestcontroller_test.go` spins up a mock backend (`startSubtestServer`) that surfaces settings, test-management payloads, and stubbed endpoints for logs/git to keep tests hermetic.
48+
- Scenarios assert span counts, `test.status`, `test.is_quarantined`, `test.is_disabled`, retry metadata (`test.is_retry`, `test.retry_reason`), and ownership of attempt-to-fix success tags. Parallel subtests and custom retry budgets are part of the matrix.
49+
- Utilities (`scenarioContext`, `setParentDirective`, `setSubDirective`) help craft hierarchical directives; helper assertions document expected tagging for collaborators extending the matrix.
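
The attempt-to-fix ownership rules that this matrix exercises can be condensed into a small decision function. This is a sketch under stated assumptions: `owner`, `retryOwner`, and its boolean parameters are hypothetical names, and parent quarantine is modeled as affecting only tags, not retry ownership.

```go
package main

import "fmt"

// owner says which test level, if any, runs the attempt-to-fix retry loop.
type owner int

const (
	ownerNone owner = iota
	ownerParent
	ownerChild
)

func (o owner) String() string {
	return [...]string{"none", "parent", "child"}[o]
}

// retryOwner encodes the ownership rules:
//   - a parent attempt-to-fix directive always wins (rules 1, 3, 4);
//   - otherwise a child owns retries only when the subtest feature flag
//     is enabled and the child directive is an exact match (rule 2).
func retryOwner(parentATF, childATF, childExact, featureEnabled bool) owner {
	switch {
	case parentATF:
		return ownerParent
	case childATF && childExact && featureEnabled:
		return ownerChild
	default:
		return ownerNone
	}
}

func main() {
	cases := []struct {
		name                                 string
		parentATF, childATF, exact, featured bool
	}{
		{"parent only", true, false, false, true},
		{"child only, exact, flag on", false, true, true, true},
		{"parent and child", true, true, true, true},
		{"child only, flag off", false, true, true, false},
		{"child only, inherited match", false, true, false, true},
	}
	for _, c := range cases {
		fmt.Printf("%-30s -> %s\n", c.name, retryOwner(c.parentATF, c.childATF, c.exact, c.featured))
	}
}
```

A table like the one in `main` maps naturally onto the scenario matrix: each row is one parent/subtest permutation with an expected retry owner.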
50+
51+
### Utils Package
52+
- `ci_providers.go` detects CI metadata across numerous providers (AppVeyor, Azure Pipelines, GitHub Actions, Jenkins, etc.), normalizes refs/URLs, removes secrets, supports user overrides through `DD_GIT_*` env vars, and logs the detected provider. Fixtures under `testdata/fixtures/{providers}` supply provider-specific JSON.
53+
- `environmentTags.go` maintains cached CI tags/metrics with thread-safe mutation (`AddCITags*`, `ResetCITags*`), expands `~`, computes relative paths, and augments CPU metrics (logical cores).
54+
- `git.go` performs git command execution with telemetry instrumentation, synchronized access (`gitCommandMutex`), shallow clone detection/unshallow, pack-file generation (`MaxPackFileSizeInMb`, `CreatePackfiles`), base branch discovery, and sensitive info filtering. Interacts with `utils/telemetry` enums to classify commands/errors. Backed by tests covering command paths and error handling.
55+
- `file_environmental_data.go` and `_test.go` collect file-level metadata (size, permissions, hash) referenced by impacted tests. `filebitmap/` stores an efficient bitmap representation of file coverage.
56+
- `impactedtests/` implements incremental test selection. `algorithm.md` documents the base branch detection heuristic (with 2025 updates) and ties closely to git utilities; `impacted_tests.go` consumes backend responses to track new/modified tests.
57+
- `codeowners.go` parses CODEOWNERS files with caching and fallback to repo root; fixtures for GitHub/GitLab are located under `testdata/fixtures/codeowners`.
58+
- `names.go` normalizes module/suite names via runtime function lookup and heuristics, ensuring consistent tagging even with nested/subtests; tests validate complex name resolution.
59+
- `home.go` and `file_environmental_data.go` handle home directory discovery with consideration for CI sandboxes and Windows drive letters.
60+
- `net/` houses HTTP client logic:
61+
- `client.go` builds agent or agentless clients, selects base URL/subdomain, attaches tags/headers, and exposes methods for settings, known tests, pack files, coverage, logs, skippables, and test management APIs. Incorporates retry/backoff (`math/rand/v2` jitter), compression awareness, telemetry hooks, and optional EVP proxy over Unix sockets.
62+
- `http.go`, `coverage.go`, `logs_api.go`, `settings_api.go`, etc., serialize network payloads, set proper endpoints, compress payloads, and capture request/response telemetry (status codes, compression flags, payload sizes).
63+
- `skippable.go`, `known_tests_api.go`, `test_management_tests_api.go` parse backend responses into typed structs for downstream integrations.
64+
- `telemetry/` defines dimensional labels for events (framework identifiers, CI provider tags, error types, git command categories) used throughout the package to emit consistent metrics.
65+
- `names_test.go`, `git_test.go`, `codeowners_test.go`, `ci_providers_test.go`, `net/*_test.go`, etc., provide extensive coverage, often using fixtures to simulate CI environments and network responses.
66+
67+
## Testing, Fixtures, and Tooling
68+
- Extensive `_test.go` coverage in integrations (`manual_api`, `gotesting`, `logs`) and utils ensures feature toggles, retries, coverage serialization, and network clients behave as expected.
69+
- Subtest matrix harness (`integrations/gotesting/subtests`) runs under `go test` and exercises the parent/subtest permutations needed to guard subtest-specific instrumentation changes. Enable debug logging to surface per-scenario diagnostics.
70+
- `integrations/gotesting/testcontroller_test.go` retains historical scenarios for flaky retries, EFD, ITR, and impacted tests; it coexists with the new subtest harness to avoid regression gaps.
71+
- `utils/testdata/fixtures/providers/*.json` mimics CI payloads; `github-event.json` supports webhook parsing tests.
72+
- Generated assets: `coverage/test_coverage_msgp.go` (MsgPack via `go:generate`), with tests to ensure deterministic encoding.
73+
- `integrations/gotesting/reflections_test.go` safeguards reflection-based hooks against Go runtime changes.
74+
- Mock tracer support via `mocktracer` allows unit tests to assert spans without real agent connectivity.
75+
76+
## Notable Nuances & Design Choices
77+
- Heavy use of `sync.Once`, `atomic`, and mutexes to guard global state, ensuring idempotent initialization even under concurrent instrumentation hooks.
78+
- Feature toggles honor both backend settings and local env overrides, often logging when overrides disable capabilities to aid troubleshooting. Subtest features default off unless an env var enables them, allowing gradual rollout.
79+
- Subtest wrappers strictly require an exact directive match before wrapping to avoid unnecessary allocations when hierarchy lookups fall back to parent configuration.
80+
- Git operations are serialized to avoid repository lock contention, and telemetry logs command timings plus categorized exit codes to monitor flaky git environments.
81+
- Instrumentation leans on `unsafe.Pointer` and reflection to interpose on testing internals, a delicate strategy mitigated by fallback logic and version checks. Helper utilities (`reflections.go`) centralize offsets so new Go releases require updates in a single place.
82+
- Coverage and impacted test features rely on asynchronous git uploads; close actions ensure goroutines finish before process exit.
83+
- Network layer supports agentless uploads with API key validation and on-the-fly compression, while also accommodating Datadog agent EVP proxy over HTTP or Unix sockets.
84+
- `orchestrion.yml` indicates support for compile-time rewriting, hinting at hybrid instrumentation strategies (manual wrappers plus compile-time code injection).
85+
- Logging pipeline mirrors test span IDs and includes service/host tags, but is guarded behind a stable-config flag to avoid unexpected log emission.
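
Two of the choices above, idempotent shutdown and close actions that run in LIFO order, can be illustrated together. This is a sketch only: the `shutdown` type and its methods are hypothetical stand-ins for `PushCiVisibilityCloseAction` and `ExitCiVisibility`, whose real signatures are not shown here.

```go
package main

import (
	"fmt"
	"sync"
)

// shutdown collects close actions and runs them once, newest first.
type shutdown struct {
	mu      sync.Mutex
	actions []func()
	once    sync.Once
}

// push registers a close action; safe for concurrent use.
func (s *shutdown) push(fn func()) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.actions = append(s.actions, fn)
}

// exit runs all registered actions in LIFO order. Repeated calls
// (for example from a signal handler and from normal teardown) are no-ops.
func (s *shutdown) exit() {
	s.once.Do(func() {
		s.mu.Lock()
		actions := s.actions
		s.actions = nil
		s.mu.Unlock()
		// Run outside the lock so an action may call push without deadlocking.
		for i := len(actions) - 1; i >= 0; i-- {
			actions[i]()
		}
	})
}

func main() {
	var s shutdown
	s.push(func() { fmt.Println("stop tracer") })
	s.push(func() { fmt.Println("flush logs") })
	s.push(func() { fmt.Println("wait for git upload") })
	s.exit()
	s.exit() // second call does nothing
}
```

LIFO order means resources registered last, which typically depend on earlier ones, are released first.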
86+
87+
## Getting Involved
88+
- When touching `integrations/gotesting`, run both the legacy controller suite and the subtest matrix (`go test ./internal/civisibility/integrations/gotesting/...`). Many scenarios spawn subprocesses; enable debug logging for verbose traces.
89+
- Any change to retry ownership or metadata propagation should be mirrored in the harness scenarios and in `docs/SUBTEST_FEATURE_IMPLEMENTATION.md` to keep documentation synchronized.
90+
- Utility changes often require updating fixtures or provider expectations; leverage the existing test suites instead of ad-hoc scripts.

internal/civisibility/constants/env.go

Lines changed: 3 additions & 0 deletions
@@ -55,4 +55,7 @@ const (
 
 	// CIVisibilityInternalParallelEarlyFlakeDetectionEnabled indicates if the internal parallel early flake detection feature is enabled.
 	CIVisibilityInternalParallelEarlyFlakeDetectionEnabled = "DD_CIVISIBILITY_INTERNAL_PARALLEL_EARLY_FLAKE_DETECTION_ENABLED"
+
+	// CIVisibilitySubtestFeaturesEnabled indicates if subtest-specific management and retry features are enabled.
+	CIVisibilitySubtestFeaturesEnabled = "DD_CIVISIBILITY_SUBTEST_FEATURES_ENABLED"
 )
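
For readers wondering how a default-off flag like this is typically consumed, here is a hedged sketch. `parseFlag` and `subtestFeaturesEnabled` are hypothetical helpers written for illustration; the commit's actual parsing code is not part of this diff.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// CIVisibilitySubtestFeaturesEnabled mirrors the constant added in this diff.
const CIVisibilitySubtestFeaturesEnabled = "DD_CIVISIBILITY_SUBTEST_FEATURES_ENABLED"

// parseFlag interprets a raw env value. It returns def when the variable
// is unset or cannot be parsed as a boolean.
func parseFlag(raw string, set bool, def bool) bool {
	if !set {
		return def
	}
	v, err := strconv.ParseBool(strings.TrimSpace(raw))
	if err != nil {
		return def
	}
	return v
}

// subtestFeaturesEnabled reports whether the subtest features are on.
// The default is false, so the feature stays off unless explicitly enabled.
func subtestFeaturesEnabled() bool {
	raw, set := os.LookupEnv(CIVisibilitySubtestFeaturesEnabled)
	return parseFlag(raw, set, false)
}

func main() {
	fmt.Println("subtest features enabled:", subtestFeaturesEnabled())
}
```

Falling back to the default on a malformed value keeps a typo in CI configuration from silently enabling the feature.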
