Skip to content

Conversation

@AliSoftware
Copy link

@AliSoftware AliSoftware commented Nov 22, 2025

Fixes buildkite/agent#3588

Why / Context

When testing if secrets should be redacted based on their env var names, the current implementation was:

  • Testing the env var names against the known suffixes using Contains instead of HasSuffix.
  • Was not filtering secret values that were too short

This made env vars like DISABLE_PASSWORD_PROMPT for example be redacted despite _PASSWORD being in the middle of that env var name, not as a suffix. Likewise, DISABLE_PASSWORD=1 made any "1" appearing in the logs be redacted as we didn't filter secrets < 6 bytes.

How / The Fix

Red-Green-Refactoring approach:

  • Add unit tests to test the secretsToRedact to highlight the 2 problems (and ensure we have failing tests in case of a regression in the future on this)
  • Fix suffix issue by replacing strings.Contains(key, suffix) with strings.HasSuffix(key, suffix)
  • Fix min length issue by adding a length condition in redactSecret(…)
  • Refactored the tests to introduce a BuildkiteAgent interface+type that wraps all the exec.Command calls to buildkite-agent into a dedicated type (similar to Client and Agent), so we can stub them in tests more cleanly

Notes

🤖 I used Cursor and its AI agent to help me work on this. While I proofread the code it generated, corrected it, and iterated on it until I was happy and confident the logic looked correct and made sense, I still wanted to make note of my use of AI for transparency.

🌻 Also, I have never written code in Go before (hence enrolling a bit of AI to guide me), so it's very possible that I might not have used the best patterns for some implementations (for example I'm not sure about the use of config_test.go file to expose the private field from the package secrets only to test code… is that good Go practice? [EDIT: Solved with a different approach since])

Future directions

💡 Some potential ideas for future PRs:

  • Have this plugin read the list of patterns from os.Getenv('BUILDKITE_REDACTED_VARS') and instead of using hardcoded defaultSecretSuffixes? Which would mean update the condition logic in isSecretVar again, this time to be able to do a fnmatch-style matching instead of using strings.HasSuffix.

So we can mock the `buildkite-agent` interactions more easily in tests
 - `Agent` types inside their own modules are ok as they're already disambiguated by `sshagent.Agent` vs `buildkiteagent.Agent`. So this renames `type BuildkiteAgent struct` to `type Agent struct` in the `buildkiteagent` package.
 - But the `interface` types in `secrets.go` both live in the `secrets` module and benefit from having a differentiating name in that file, i.e. `type SSHAgent interface` vs `type BuildkiteAgent interface`. So this also renames the `Agent interface` interface to `SSHAgent interface`
 - Similarly in `secrets_test.go`, `fakeAgent` variables and `FakeAgent struct` stub have been renamed `fakeSSHAgent` and `FakeSSHAgent struct` respectively
@AliSoftware AliSoftware changed the title Fix the condition for testing which secrets should be redacted Fix condition for secrets eligibility to be redacted + Add related unit tests Nov 23, 2025
@AliSoftware
Copy link
Author

cc @DrJosh9000 not sure you saw that PR (and I don't have permissions to add reviewers to the PR) 🙂

@DrJosh9000
Copy link
Contributor

Thanks @AliSoftware , I did see this and I do mean to get back to this soon!

@DrJosh9000
Copy link
Contributor

Hey @AliSoftware, thanks again for working on this PR.

On reflection, I think the right approach is along the lines of your comment here, where the agent command can optionally apply the name and length-based filtering using keys in the JSON object. s3-secrets-hooks will still need updating both to pass the new flag and to perform the filtering itself when the flag is not available.

There's a lot happening in this PR, especially considering the size and scope of the bugs that are being fixed. I can see why you might want to introduce an interface and test double for Buildkite Agent (for tests), but I think this offers no meaningful improvement in fidelity. These new tests are testing that the methods of an interface are called. For better fidelity, you want to test that buildkite-agent is actually executed. I don't think it's worth it, and we'd be better off with a unit test that checks the contents of secretsToRedact after processing.

@AliSoftware
Copy link
Author

Hi @DrJosh9000 thanks for the review.

I can see why you might want to introduce an interface and test double for Buildkite Agent (for tests), but I think this offers no meaningful improvement in fidelity. These new tests are testing that the methods of an interface are called. For better fidelity, you want to test that buildkite-agent is actually executed. I don't think it's worth it, and we'd be better off with a unit test that checks the contents of secretsToRedact after processing.

That was actually my first implementation (up to this commit) but:

  • I didn't like that to be able to test secretsToRedact (a private field) I had to go through something that felt unnatural to expose it to the test package (but maybe that was OK, I'm new to Go after all)
  • That made me realize that the current tests were a mix of unit and integration tests, as they were relying on the presence or absence of the buildkite-agent binary to run
    • If you run those tests on a machine that doesn't have the buildkite-agent binary installed (or in $PATH)—like on my development machine for example—then the tests won't actually cover part of the codebase being tested, because the implementation will just print a warning saying buildkite-agent not found and not call the redactor add at all, not covering that implementation at all in tests in practice
    • While if you run those tests on a machine which has buildkite-agent binary installed and in $PATH, the tests will cover a different part of the implementation, the one that actually calls redactor add
    • This made the tests non-deterministic / dependent on the machine they are run on, which didn't feel like good practice nor reliable/consistant (as tests could pass on one machine but fail on another)? Which is what made me decide to go the route of implementing the interface instead.

There's a lot happening in this PR, especially considering the size and scope of the bugs that are being fixed

Yeah, I can agree to that. If you want, we could split this in multiple PRs:

  1. One that fixes the bug without introducing any tests ➡️ your PR already does that
  2. One that adds the missing tests on top (while still calling the real buildkite-agent if installed on the current machine) ➡️ basically what I implemented up to this
  3. One that introduces the type BuildkiteAgent interface to allow using a test double and make the Unit Tests deterministic (and agnostic of the machine on which they run)
  4. One that introduce separate integration tests, which assert the buildkite-agent binary is installed on the machine then run the test with the real agent.

That way with the test double in (3) we'd be able to have test coverage on all use cases (binary present or not), and be able to run that test suite even on machines that don't have buildkite-agent installed in the $PATH (like on my development machine). While CI and users that have buildkite-agent binary installed can also run the integration test from (4) on top of it all, to ensure that the real call to the binary would be performed as expected in those cases.

@DrJosh9000
Copy link
Contributor

It's becoming apparent to me that at least a few problems I might have here are actually problems in the existing code. To take the unexported field example, there's no reason unit tests must be in a separate package to the code under test - I typically leave them in the same package. Using a separate package forces all the tests to be agnostic to internal details, but also gets in the way of adding tests that test the internal details. To take nondeterminism based on whether the agent is present or not, I can see why we (at Buildkite) might just assume the agent is present, but maybe a better solution is to structure things so the input to the agent is ready whether or not the agent is present. I may be rambling.

I admit that this repo isn't my usual wheelhouse, and at the moment we've got some large reliability-shaped projects underway (such as end-to-end tests over in the agent), that have much more of my attention. So perhaps we're better off accepting this PR pretty much as-is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Buildkite logs redactor patterns are evaluated greedily, and short values still redacted

2 participants