[Testing] Added Detector-level cache to store verification results #4314

Draft · wants to merge 9 commits into base: main
Conversation

kashifkhan0771
Contributor

@kashifkhan0771 kashifkhan0771 commented Jul 14, 2025

Description:

This PR introduces a detector-level cache to store verification results, allowing results to be reused for duplicate findings. To avoid holding raw secrets in memory, each secret is hashed with XXHash before being added to the cache.

More details: #2262

Note: This is a first example implementation; if approved, we will roll it out to the most-used detectors.

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint, which requires golangci-lint)?

@kashifkhan0771 kashifkhan0771 self-assigned this Jul 14, 2025
@kashifkhan0771 kashifkhan0771 requested review from a team as code owners July 14, 2025 10:54
@bugbaba

bugbaba commented Jul 14, 2025

Hi @kashifkhan0771,

Just gave it a quick test: it's calling the verify function twice when the same key is found in two separate files.

[screenshot omitted]

@kashifkhan0771
Contributor Author

Just gave it a quick test: it's calling the verify function twice when the same key is found in two separate files.

I believe this is related to the issue you mentioned here

@trufflesteeeve
Collaborator

This is a cool caching implementation. I did a quick read of the related discussions (thank you very much for the links to additional context), and I think that before trying this idea, we should investigate improvements to the existing verification cache (e.g. making it worker-aware/thread-safe) rather than adding a second cache implementation.

Something like this very well might be where we end up, but first trying to improve what we already have could help reduce potential future complexity.

@kashifkhan0771
Contributor Author

Just gave it a quick test: it's calling the verify function twice when the same key is found in two separate files.

@bugbaba For me, it was using the cached result for duplicate findings from different files with the previous changes as well. However, as you noted, I've now implemented per-secret locking. This ensures that each secret is locked during verification until it is either verified or cached.

Please give it a try and let me know how it goes.

Also, I'm converting this PR back to a draft based on @trufflesteeeve's comment. I agree that we should first focus on fixing the existing caching implementation. This PR will remain here for reference, or in case we need it in the future.

@kashifkhan0771 kashifkhan0771 marked this pull request as draft July 15, 2025 06:44
@ahrav
Collaborator

ahrav commented Jul 15, 2025

@kashifkhan0771 I’m not sure if you’ve considered using Go’s singleflight package to consolidate concurrent requests to the same endpoint into one. We could build a shared abstraction that all detectors use: whenever multiple goroutines ask for the same data at the same time, singleflight coalesces them into a single API call and fans out the response.

This won’t eliminate every duplicate request; requests separated by cache-miss delays can still slip through. But it’s essentially “free” to layer on and can cut down a lot of in-flight calls. For more background, see VictoriaMetrics’ blog on singleflight.

@kashifkhan0771
Contributor Author

@kashifkhan0771 I’m not sure if you’ve considered using Go’s singleflight package to consolidate concurrent requests to the same endpoint into one. We could build a shared abstraction that all detectors use: whenever multiple goroutines ask for the same data at the same time, singleflight coalesces them into a single API call and fans out the response.

This won’t eliminate every duplicate request; requests separated by cache-miss delays can still slip through. But it’s essentially “free” to layer on and can cut down a lot of in-flight calls. For more background, see VictoriaMetrics’ blog on singleflight.

Hands down, you always bring great insights 🙌🏻 I really appreciate it. I hadn’t come across singleflight before, but I did have a similar idea in mind about collapsing multiple requests into one. I didn’t realize Go had a package for this, so thanks a lot for pointing it out.

I’ll read through the documentation and blog post to get a better understanding of how to use it effectively. For now, I’ll keep this PR in draft and start incorporating some changes. Let’s see how things go.

@kashifkhan0771
Contributor Author

I have implemented singleflight for the GitHub detector for now. Please review and give it a try when you get a chance. I'm also thinking through how we can generalize this for all detectors, as @ahrav suggested.

@kashifkhan0771 kashifkhan0771 changed the title Added Detector-level cache to store verification results [Testing] Added Detector-level cache to store verification results Jul 15, 2025
@amanfcp
Contributor

amanfcp commented Jul 17, 2025

@kashifkhan0771 I’m not sure if you’ve considered using Go’s singleflight package to consolidate concurrent requests to the same endpoint into one. We could build a shared abstraction that all detectors use: whenever multiple goroutines ask for the same data at the same time, singleflight coalesces them into a single API call and fans out the response.

This won’t eliminate every duplicate request; requests separated by cache-miss delays can still slip through. But it’s essentially “free” to layer on and can cut down a lot of in-flight calls. For more background, see VictoriaMetrics’ blog on singleflight.

Great read @ahrav. Thanks for sharing
