-
Notifications
You must be signed in to change notification settings - Fork 1.9k
[Testing] Added Detector-level cache to store verification results #4314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[Testing] Added Detector-level cache to store verification results #4314
Conversation
Hi @kashifkhan0771, Just gave it a quick test, its calling verify function twice if same key is found in two separate files. ![]() |
I believe this is related to the issue you mentioned here |
This is a cool caching implementation. I did a quick read of the related discussions (thank you very much for the links to additional context) and think that prior to us trying this idea, we should probably investigate improvements to the existing verification cache (e.g. making it worker-aware/thread-safe) before adding a second cache implementation. Something like this very well might be where we end up, but first trying to improve what we already have could help reduce potential future complexity. |
@bugbaba For me it was using cached result for duplicate findings from different files with previous changes as well. However, as you mentioned, I implemented per-secret locking. This ensures that each secret is locked during verification until it's either verified or cached. Please give it a try and let me know how it goes. Also, I'm converting this PR back to a draft based on @trufflesteeeve comment. I agree that we should first focus on fixing the existing caching implementation. This PR will remain here for reference or in case we need it in the future. |
@kashifkhan0771 I’m not sure if you’ve considered using Go’s singleflight package to consolidate concurrent requests to the same endpoint into one. We could build a shared abstraction that all detectors use: whenever multiple goroutines ask for the same data at the same time, singleflight coalesces them into a single API call and fans out the response. This won’t eliminate every duplicate request, requests separated by cache-miss delays can still slip through, but it’s essentially “free” to layer on and can cut down a lot of in-flight calls. For more background, see VictoriaMetrics’ blog on singleflight. |
Hands down, you always bring great insights 🙌🏻 I really appreciate it. I hadn’t come across singleflight before, but I did have a similar idea in mind about collapsing multiple requests into one. I didn’t realize Go had a package for this, so thanks a lot for pointing it out. I’ll read through the documentation and blog post to get a better understanding of how to use it effectively. For now, I’ll keep this PR in draft and start incorporating some changes. Let’s see how things go. |
I have implemented singleflight for the GitHub detector for now. Please review and give it a try when you get a chance. I'm also thinking through how we can generalize this for all detectors, as @ahrav suggested. |
09ab2c9
to
0617d80
Compare
Great read @ahrav. Thanks for sharing |
Description:
This PR introduces a detector-level cache to store verification results, allowing reuse for duplicate findings. To enhance security, secrets will be hashed using XXHash before being added to the cache.
More details: #2262
Note: This is the first example implementation, if approved we will implement this in the most used detectors.
Checklist:
make test-community
)?make lint
this requires golangci-lint)?