fix: Addressing test flakes for TestReadTerragruntConfigDependencyInStack#5781
fix: Addressing test flakes for TestReadTerragruntConfigDependencyInStack#5781
TestReadTerragruntConfigDependencyInStack#5781Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughWalkthroughModify Terraform source download flow to return a boolean indicating whether a download occurred, add per-cache-directory mutexes to serialize concurrent downloads/copying, and conditionally skip module copying when no download and source equals the working directory. Tests and docs updated; a new concurrent integration test added. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
7c0e5ea to
1dff515
Compare
1dff515 to
7c5461a
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/runner/run/download_source_test.go (1)
952-975:⚠️ Potential issue | 🟠 MajorDon't write to the parent
errfrom parallel subtests.These subtests call
t.Parallel(), but Line 965 assigns into the outererrvariable. That creates a race in the test itself under-race, so this coverage can flake independently of the production code.🛠️ Minimal fix
- _, err = run.DownloadTerraformSourceIfNecessary(t.Context(), l, src, configbridge.NewRunOptions(opts), cfg, r) + _, downloadErr := run.DownloadTerraformSourceIfNecessary(t.Context(), l, src, configbridge.NewRunOptions(opts), cfg, r) if tc.name == "Local file source" { - require.NoError(t, err) + require.NoError(t, downloadErr) expectedFilePath := filepath.Join(tmpDir, "main.tf") assert.FileExists(t, expectedFilePath) } else { - t.Logf("Source %s result: %v", tc.sourceURL, err) + t.Logf("Source %s result: %v", tc.sourceURL, downloadErr) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/runner/run/download_source_test.go` around lines 952 - 975, The parallel subtests are writing to the outer err variable and also risk using a loop variable unsafely; inside the for loop capture tc (e.g., tc := tc) before calling t.Run and inside the subtest use a local error shadow (err := run.DownloadTerraformSourceIfNecessary(...)) instead of assigning to the outer err so the parallel goroutines do not race on the shared err variable; update the test body around run.DownloadTerraformSourceIfNecessary and the tc usage accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/runner/run/download_source.go`:
- Around line 61-67: The current mutex taken from sourceChangeLocks for
terraformSource.DownloadDir is unlocked before Run returns, allowing concurrent
goroutines to mutate the cache during later steps (GenerateConfig,
CheckFolderContainsTerraformCode, Terraform execution); change the locking so
the mutex covers the entire cache-dir lifetime by acquiring the mutex before
entering Run (or at the start of the code path that ultimately calls
Run/CopyFolderContents) and only unlocking after Run fully completes, ensuring
manifest.Clean(), CopyFolderContents, GenerateConfig,
CheckFolderContainsTerraformCode and any Terraform execution run while holding
the same mutex for that terraformSource.DownloadDir.
- Around line 74-105: The current logic uses needsModuleCopy := downloaded ||
!tf.IsLocalSource(...) which skips copying when downloaded==false (cache hit)
and that can serve stale working-dir-specific files; update the logic so
working-dir inputs are never skipped: either (A) always perform the module copy
when any of the includeInCopy items (the computed includeInCopy / tfLintConfig /
cfg.Terraform.IncludeInCopy set) or ModuleManifestName could differ from the
cached copy by setting needsModuleCopy = true when those working-dir inputs
exist/changed, or (B) fold those working-dir inputs into the cache/version key
instead of skipping the copy (i.e., incorporate their hash into the version
calculation used to set downloaded). Locate the decision around needsModuleCopy
(the downloaded variable and tf.IsLocalSource call) and the
util.CopyFolderContents call and implement one of these two fixes so unit-local
files (includeInCopy/.tflint.hcl/tfvars) are either always copied on cache hits
or are included in the cache version.
---
Outside diff comments:
In `@internal/runner/run/download_source_test.go`:
- Around line 952-975: The parallel subtests are writing to the outer err
variable and also risk using a loop variable unsafely; inside the for loop
capture tc (e.g., tc := tc) before calling t.Run and inside the subtest use a
local error shadow (err := run.DownloadTerraformSourceIfNecessary(...)) instead
of assigning to the outer err so the parallel goroutines do not race on the
shared err variable; update the test body around
run.DownloadTerraformSourceIfNecessary and the tc usage accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4a7d2b9f-3d95-4780-aba0-240f89d3fa0a
📒 Files selected for processing (5)
internal/runner/run/download_source.gointernal/runner/run/download_source_test.gointernal/runner/run/run.gotest/integration_regressions_test.gotest/integration_windows_test.go
| // Serialize concurrent downloads to the same cache directory. Without this, | ||
| // manifest.Clean() in one goroutine can delete files while another goroutine | ||
| // is checking for them (e.g. during CheckFolderContainsTerraformCode). | ||
| rawLock, _ := sourceChangeLocks.LoadOrStore(terraformSource.DownloadDir, &sync.Mutex{}) | ||
| dirLock := rawLock.(*sync.Mutex) | ||
| dirLock.Lock() | ||
| defer dirLock.Unlock() |
There was a problem hiding this comment.
The new mutex doesn't cover the whole cache-dir lifetime.
Line 67 unlocks before control returns to Run, so another goroutine can grab the same lock, run CopyFolderContents/manifest.Clean(), and mutate the cache while the first goroutine is already in GenerateConfig, CheckFolderContainsTerraformCode, or Terraform execution. This narrows the original flake, but the race is still reachable whenever the second caller takes the copy path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/runner/run/download_source.go` around lines 61 - 67, The current
mutex taken from sourceChangeLocks for terraformSource.DownloadDir is unlocked
before Run returns, allowing concurrent goroutines to mutate the cache during
later steps (GenerateConfig, CheckFolderContainsTerraformCode, Terraform
execution); change the locking so the mutex covers the entire cache-dir lifetime
by acquiring the mutex before entering Run (or at the start of the code path
that ultimately calls Run/CopyFolderContents) and only unlocking after Run fully
completes, ensuring manifest.Clean(), CopyFolderContents, GenerateConfig,
CheckFolderContainsTerraformCode and any Terraform execution run while holding
the same mutex for that terraformSource.DownloadDir.
| // For local sources, when no download was needed (AlreadyHaveLatestCode=true), | ||
| // skip the module copy: the version hash incorporates all file mod times, so | ||
| // no files have changed and the cache already has the correct content from a | ||
| // previous run. Skipping avoids manifest.Clean() deleting files that a | ||
| // concurrent goroutine expects to exist. | ||
| // | ||
| // For remote sources, always do the module copy because local working-dir | ||
| // files may have changed independently of the remote source version. | ||
| needsModuleCopy := downloaded || !tf.IsLocalSource(terraformSource.CanonicalSourceURL) | ||
|
|
||
| if needsModuleCopy { | ||
| l.Debugf( | ||
| "Copying files from %s into %s", | ||
| util.RelPathForLog(opts.WorkingDir, opts.WorkingDir, opts.Writers.LogShowAbsPaths), | ||
| util.RelPathForLog(opts.RootWorkingDir, terraformSource.WorkingDir, opts.Writers.LogShowAbsPaths), | ||
| ) | ||
|
|
||
| // Always include the .tflint.hcl file, if it exists | ||
| includeInCopy := slices.Concat(cfg.Terraform.IncludeInCopy, []string{tfLintConfig}) | ||
|
|
||
| err = util.CopyFolderContents( | ||
| l, | ||
| opts.WorkingDir, | ||
| terraformSource.WorkingDir, | ||
| ModuleManifestName, | ||
| includeInCopy, | ||
| cfg.Terraform.ExcludeFromCopy, | ||
| ) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| } |
There was a problem hiding this comment.
Skipping the copy on local cache hits can serve stale unit files.
AlreadyHaveLatestCode is documented below as hashing the local source dir, but this block is also what copies unit-local files from opts.WorkingDir into the cache (IncludeInCopy, .tflint.hcl, tfvars, etc.). If those files change while the local module source does not, downloaded stays false and Terragrunt keeps running with the previous cached copies. Please either keep copying on cache hits or fold the copied working-dir inputs into the cache/version key first.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/runner/run/download_source.go` around lines 74 - 105, The current
logic uses needsModuleCopy := downloaded || !tf.IsLocalSource(...) which skips
copying when downloaded==false (cache hit) and that can serve stale
working-dir-specific files; update the logic so working-dir inputs are never
skipped: either (A) always perform the module copy when any of the includeInCopy
items (the computed includeInCopy / tfLintConfig / cfg.Terraform.IncludeInCopy
set) or ModuleManifestName could differ from the cached copy by setting
needsModuleCopy = true when those working-dir inputs exist/changed, or (B) fold
those working-dir inputs into the cache/version key instead of skipping the copy
(i.e., incorporate their hash into the version calculation used to set
downloaded). Locate the decision around needsModuleCopy (the downloaded variable
and tf.IsLocalSource call) and the util.CopyFolderContents call and implement
one of these two fixes so unit-local files (includeInCopy/.tflint.hcl/tfvars)
are either always copied on cache hits or are included in the cache version.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
internal/runner/run/download_source.go (1)
61-67: Lock scope addresses the download/copy race; consider defensive type assertion.The mutex serializes concurrent
DownloadTerraformSourceIfNecessaryandCopyFolderContentscalls to the same cache directory, which addresses the stated race condition in the PR objectives.Minor defensive improvement: the type assertion on line 65 could use the comma-ok idiom to avoid a potential panic if the map ever contains an unexpected type.
🛡️ Optional defensive fix
rawLock, _ := sourceChangeLocks.LoadOrStore(terraformSource.DownloadDir, &sync.Mutex{}) -dirLock := rawLock.(*sync.Mutex) +dirLock, ok := rawLock.(*sync.Mutex) +if !ok { + return nil, errors.Errorf("unexpected lock type for %s", terraformSource.DownloadDir) +} dirLock.Lock() defer dirLock.Unlock()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/runner/run/download_source.go` around lines 61 - 67, The code serializes concurrent downloads using sourceChangeLocks but does a blind type assertion rawLock.(*sync.Mutex) which can panic if the map contains an unexpected type; update the assertion in the DownloadTerraformSourceIfNecessary/CopyFolderContents lock section to use the comma-ok form (m, ok := rawLock.(*sync.Mutex)) and handle the false case defensively (e.g., allocate a new sync.Mutex, store it back into sourceChangeLocks for terraformSource.DownloadDir and use that mutex) so the code never panics on a bad type and still preserves the serialized lock behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@internal/runner/run/download_source.go`:
- Around line 61-67: The code serializes concurrent downloads using
sourceChangeLocks but does a blind type assertion rawLock.(*sync.Mutex) which
can panic if the map contains an unexpected type; update the assertion in the
DownloadTerraformSourceIfNecessary/CopyFolderContents lock section to use the
comma-ok form (m, ok := rawLock.(*sync.Mutex)) and handle the false case
defensively (e.g., allocate a new sync.Mutex, store it back into
sourceChangeLocks for terraformSource.DownloadDir and use that mutex) so the
code never panics on a bad type and still preserves the serialized lock
behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bac3f02a-2bb5-40da-b161-6854f9d3e26d
📒 Files selected for processing (1)
internal/runner/run/download_source.go
Description
A race between cache dir creation and reset causes flakes in
TestReadTerragruntConfigDependencyInStack. This fixes that.TODOs
Read the Gruntwork contribution guidelines.
Release Notes (draft)
Added / Removed / Updated [X].
Migration Guide
Summary by CodeRabbit
Refactor
Tests