✨ Filter out rulesets during parsing that do not match rule label selector#1091
✨ Filter out rulesets during parsing that do not match rule label selector#1091shawn-hurley wants to merge 2 commits intokonveyor:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a Selector field to RuleParser and threads a CLI-provided label selector into the parser. RuleSet and per-rule loading are pruned by selector matches, error aggregation uses errors.Join with cancellation, tests and testdata updated, and a verbose engine debug log was added. Changes
Sequence Diagram(s)sequenceDiagram
participant CLI as "CLI (main)"
participant Parser as "RuleParser"
participant LabelSel as "LabelSelector"
participant RuleSet as "RuleSet"
participant Rule as "Rule"
CLI->>Parser: construct RuleParser (Selector optional)
Parser->>RuleSet: loadRuleSet(path)
Parser->>Rule: loadRule(file, ruleSet)
Parser->>LabelSel: build RuleMeta and call Matches(Meta)
alt Selector configured & Matches == true
Parser->>RuleSet: retain rules / include Rule
else Selector configured & Matches == false
Parser->>RuleSet: prune/skip Rule or RuleSet
end
Parser->>Parser: aggregate errors (errors.Join) / cancel context on failures
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@parser/rule_parser.go`:
- Around line 235-246: The current code returns nils when
r.Selector.Matches(meta) determines the ruleset doesn't match, which discards
previously accumulated results and hides errors; instead, when the selector
fails (ok == false && err == nil) skip only the current ruleset by continuing
the loop (do not return), preserving collected ruleSets, rules, errs, and file
paths; also apply the same selector check/skip logic to the single-file branch
that handles rules when ruleSet == nil so both branches consistently skip
non-matching rules rather than returning nils (look for
r.Selector.Matches(meta), ruleSet, rules and the single-file branch handling to
modify).
🧹 Nitpick comments (2)
engine/engine.go (1)
579-591: Downgrade path-resolution logs to debug verbosity.
These run per incident and can flood logs (and expose local paths) at Info level.Proposed change (reduce verbosity)
- r.logger.Info("filepath-loc-prefix", "file", file, "prefix", locationPrefix) + r.logger.V(5).Info("filepath-loc-prefix", "file", file, "prefix", locationPrefix) @@ - r.logger.Info("sourceLocation", "sourceLocation", sourceLocation) + r.logger.V(5).Info("sourceLocation", "sourceLocation", sourceLocation) @@ - r.logger.Info("sourceLocation", "sourceLocation", sourceLocation, "absPath", absPath) + r.logger.V(5).Info("sourceLocation", "sourceLocation", sourceLocation, "absPath", absPath)parser/rule_parser_test.go (1)
1042-1052: Strengthen rule identity checks in selector tests.
Because the new selector test data uses identical messages/perform blocks, adding RuleID to the match criteria would make regressions easier to catch.Proposed tweak (include RuleID in matching)
- if reflect.DeepEqual(expectedRule.Perform, rule.Perform) && expectedRule.Description == rule.Description { + if expectedRule.RuleID == rule.RuleID && + reflect.DeepEqual(expectedRule.Perform, rule.Perform) && + expectedRule.Description == rule.Description {
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@parser/rule_parser_test.go`:
- Around line 1100-1107: The test setup swallows the error from
labels.NewLabelSelector and uses t.Error instead of failing fast; change the
block where ruleParser.Selector is assigned via
labels.NewLabelSelector[*engine.RuleMeta](tc.Selector, nil) to check err and
call t.Fatalf with a message that includes the actual err (e.g., "unable to get
selector: %v") so failures show the underlying cause and abort the test
immediately.
In `@parser/rule_parser.go`:
- Around line 199-202: The error accumulation is joining the wrong variable —
replace the use of err with the goroutine's error load.err so parseErr includes
the goroutine parse failure: update the errors.Join call where load.err != nil
(near the load variable and ruleParserWG.Done() block) to join parseErr with
load.err instead of err so the goroutine error is propagated.
🧹 Nitpick comments (4)
parser/rule_parser.go (1)
167-180: Pre-existing goroutine leak risk:WaitGroup.Done()is called in the consumer, not the producer.If the consumer exits early (e.g., via
cancelFunc()at line 231 when the selector filters out a ruleset), any goroutines that haven't been consumed yet will never haveDone()called. The closer goroutine (lines 184–187) will block onruleParserWG.Wait()forever.Consider moving
ruleParserWG.Done()into the producer goroutine (viadefer), so the WaitGroup always completes regardless of consumer behavior.Sketch
ruleParserWG.Add(1) go func() { + defer ruleParserWG.Done() rules, m, provConditions, err := r.LoadRule(path.Join(filepath, f.Name())) select { case ruleLoadChan <- ruleParseReturn{ rules: rules, conditionsByCap: provConditions, providerMap: m, err: err, }: case <-loadCtx.Done(): return } }()Then remove
ruleParserWG.Done()from the consumer (lines 201 and 214).Also applies to: 184-187
parser/rule_parser_test.go (3)
947-999: Test name "rule-should-be-filtered" is misleading — both rules are retained.The expected output includes both
file-001andfile-002, so nothing is actually filtered out from this ruleset. Per the PR description, the entire ruleset is kept when at least one rule matches. Consider renaming to something like"ruleset-retained-when-rule-matches-selector"to clarify intent.
1122-1129: Comparison logic doesn't verify ruleset count — filtering tests may pass with extra or missing rulesets.The existing checks ensure we got some rulesets when we expected some, but never assert that the actual count matches the expected count. For the new filtering tests this matters: if the selector fails to filter a ruleset, the test could still pass as long as the expected ones are present. Consider adding a count check:
Proposed addition after line 1129
+ if len(tc.ExpectedRuleSet) != 0 && len(ruleSets) != len(tc.ExpectedRuleSet) { + t.Errorf("expected %d rulesets, got %d", len(tc.ExpectedRuleSet), len(ruleSets)) + return + }Also applies to: 1143-1147
376-390: "rule no provider" test asserts no error but doesn't validate the output.With the behavioral change (missing providers are now skipped), this test only checks that no error occurs. It doesn't verify the shape of the output — e.g. that zero rulesets/clients are returned. Adding explicit expectations (empty
ExpectedRuleSetandExpectedProviderwith a count-matching assertion) would prevent regressions where the parser incorrectly includes partially-resolved rulesets.
c2644a8 to
f67bbe6
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
parser/rule_parser.go (1)
99-113:⚠️ Potential issue | 🟠 MajorSelector filtering is not applied in the single-file path.
When
LoadRulesis called with a single regular file, the selector is bypassed and the ruleset is always returned. This is inconsistent with the directory-based branch (lines 221–234) and means a single-file invocation will not benefit from the filtering this PR introduces.♻️ Proposed fix to apply selector filtering in the single-file branch
if info.Mode().IsRegular() { rules, m, provConditions, err := r.loadRule(filepath) if err != nil { r.Log.V(8).Error(err, "unable to load rule set") return nil, nil, nil, err } ruleSet := r.loadRuleSet(path.Dir(filepath)) // if nil, use the default rule set if ruleSet == nil { ruleSet = defaultRuleSet } + + if r.Selector != nil { + meta := &engine.RuleMeta{ + Labels: ruleSet.Labels, + } + for _, rule := range rules { + meta.Labels = append(meta.Labels, rule.Labels...) + } + if ok, err := r.Selector.Matches(meta); err == nil && !ok { + r.Log.V(6).Info("ruleset does not have any rules that match selector, filtering out", "ruleSet", ruleSet.Name) + return nil, m, provConditions, nil + } + } + ruleSet.Rules = rules return []engine.RuleSet{*ruleSet}, m, provConditions, err }
🤖 Fix all issues with AI agents
In `@parser/rule_parser_test.go`:
- Around line 947-999: The test "rule-should-be-filtered" is currently using
testFileName: "multiple-rules.yaml" which hits the single-file branch in
LoadRules and never applies Selector filtering; change the test to exercise the
directory-based branch so Selector is evaluated: update the test input to point
to a directory fixture (e.g., replace testFileName with a directory containing
multiple rule files or add a new field/fixture name for a directory) and ensure
the fixture contains rules with and without the "test=filter" label, then assert
that only the matching rules remain; reference LoadRules, Selector, and the test
case Name "rule-should-be-filtered" when making the change.
In `@parser/rule_parser.go`:
- Around line 221-234: When r.Selector.Matches(meta) returns an error, log it
via r.Log (e.g., include ruleSet.Name and the error) and treat the error as a
non-match (i.e., proceed to filter the ruleset) instead of silently keeping it;
additionally, when you decide to filter out the ruleset (the !ok path or
error-as-non-match), remove any entries added for that ruleset from clientMap
and providerConditions (the entries that came from the rules slice) before
calling cancelFunc/return, or alternatively move population of
clientMap/providerConditions until after the selector check; reference
r.Selector.Matches, meta, rules, ruleSet.Name, clientMap and providerConditions
when making the change.
🧹 Nitpick comments (2)
parser/rule_parser.go (1)
129-129: Usedefer cancelFunc()to prevent context leaks.While the current code does call
cancelFunc()on both return paths (lines 231 and 239), usingdeferis idiomatic Go and safeguards against future edits that add new return paths or against panics.♻️ Proposed fix
loadCtx, cancelFunc := context.WithTimeout(context.Background(), 5*time.Minute) + defer cancelFunc()Then remove the explicit
cancelFunc()calls at lines 231 and 239.parser/rule_parser_test.go (1)
656-762: Good test coverage for selector-based filtering.These two new test cases effectively validate:
- Baseline: loading rules from a folder-of-rules-ruleset without a selector returns all 3 rulesets.
- Filtered: with selector
"test=filter", only the matching ruleset (file-ruleset-a) is returned.One thing to note: the test validation at lines 1143–1166 doesn't directly assert the count of returned rulesets against the expected count. If filtering fails silently and returns extra rulesets, the mismatch is caught indirectly through rule-count comparison against the zero-value
RuleSet. Consider adding a direct length check for stronger assertions.♻️ Optional: add a direct ruleset count assertion
+ if len(tc.ExpectedRuleSet) != 0 && len(ruleSets) != len(tc.ExpectedRuleSet) { + t.Errorf("expected %d rulesets, got %d", len(tc.ExpectedRuleSet), len(ruleSets)) + return + }
f67bbe6 to
11c55ad
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
parser/rule_parser.go (1)
56-87:⚠️ Potential issue | 🟠 MajorReturning
defaultRuleSet(shared pointer) allows callers to mutate package-level state.
loadRuleSetreturns the samedefaultRuleSetpointer in three error paths (lines 61, 64, 69). Callers subsequently assign to.Ruleson the returned pointer (e.g., line 106, line 239), mutating the shared package-level variable. This also leaves two nil-return paths (lines 79, 83) that cause nil-pointer panics in callers.Both issues are resolved by returning a fresh instance and never returning nil:
♻️ Suggested fix
+func newDefaultRuleSet() *engine.RuleSet { + return &engine.RuleSet{Name: "konveyor-analysis"} +} + func (r *RuleParser) loadRuleSet(dir string) *engine.RuleSet { goldenFile := path.Join(dir, RULE_SET_GOLDEN_FILE_NAME) info, err := os.Stat(goldenFile) if err != nil { r.Log.V(8).Error(err, "unable to load rule set") - return defaultRuleSet + return newDefaultRuleSet() } if !info.Mode().IsRegular() { - return defaultRuleSet + return newDefaultRuleSet() } content, err := os.ReadFile(goldenFile) if err != nil { r.Log.V(8).Error(err, "unable to load rule set") - return defaultRuleSet + return newDefaultRuleSet() } set := engine.RuleSet{} err = yaml.Unmarshal(content, &set) if err != nil { r.Log.V(8).Error(err, "unable to load rule set") - return nil + return newDefaultRuleSet() } if len(set.Rules) != 0 { r.Log.V(8).Error(fmt.Errorf("rules should not be added in the ruleset"), "unable to load rule set") - return nil + return newDefaultRuleSet() } return &set }
🤖 Fix all issues with AI agents
In `@parser/rule_parser.go`:
- Around line 358-365: The code dereferences ruleSet.Labels without checking
ruleSet (causing panics when loadRuleSet returns nil) and also treats selector
errors as matches by using `if ok, err := r.Selector.Matches(meta); !ok && err
== nil { continue }`; update the block around r.Selector.Matches in
rule_parser.go to first guard that ruleSet != nil (use an empty label slice when
nil), construct engine.RuleMeta safely, then call r.Selector.Matches(meta) and
treat any returned error as a non-match (i.e., log or propagate the error and
continue instead of keeping the rule); ensure this change applies both in the
directory/goroutine path and anywhere loadRuleSet could return nil so the
goroutine cannot panic.
🧹 Nitpick comments (3)
parser/rule_parser.go (2)
206-208: Debug log emitted at Info level will be noisy in production.
r.Log.Info("providers", ...)on line 207 fires for every rule-file load. This should use a verbosity gate (e.g.,V(5)or higher) to avoid flooding production logs.♻️ Suggested fix
if load.providerMap != nil { - r.Log.Info("providers", "current", clientMap, "toAdd", load.providerMap) + r.Log.V(5).Info("providers", "current", clientMap, "toAdd", load.providerMap) maps.Copy(clientMap, load.providerMap) }
124-124: Consider deferringcancelFuncfor safety.If a panic occurs between context creation and the explicit
cancelFunc()calls (lines 234, 242), the context leaks. Usingdeferis idiomatic and avoids the need forcancelFunc()at every return path.♻️ Suggested fix
loadCtx, cancelFunc := context.WithTimeout(context.Background(), 5*time.Minute) + defer cancelFunc()Then remove the explicit
cancelFunc()calls on lines 234 and 242.parser/rule_parser_test.go (1)
1145-1169: Test validates returned rulesets but does not assert that all expected rulesets are present.The loop iterates over actual
ruleSetsand checks each againstExpectedRuleSet, but never verifies the inverse — that every entry inExpectedRuleSetappears in the actual results. If filtering fails to include an expected ruleset, the test would silently pass.♻️ Suggested assertion
+ if len(ruleSets) != len(tc.ExpectedRuleSet) { + t.Errorf("expected %d rulesets, got %d", len(tc.ExpectedRuleSet), len(ruleSets)) + } for _, ruleSet := range ruleSets {
11c55ad to
6a13edd
Compare
| } | ||
|
|
||
| r.addRuleFields(&rule, ruleMap) | ||
| if r.Selector != nil { |
There was a problem hiding this comment.
This fixes the issue I was talking about.
* As we move to having a rather large set of rules that we have to parse each time, the output is getting long, and we assume we need all the providers * This isn't true because only the rules that will be run are needed, and this setup breaks an assumption that only rulesets you cared about would be passed into the analyzer. * We are still parsing all the rules, but the output and what the engine gets is only rulesets that themselves have matched the selector or at least one rule has a label that would match the selector. Signed-off-by: Shawn Hurley <shawn@hurley.page>
Signed-off-by: Shawn Hurley <shawn@hurley.page>
6a13edd to
b3b1007
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@parser/rule_parser.go`:
- Around line 155-161: The early return after failing to load ruleSet can leak
the closer goroutine because other worker goroutines may have been started (via
ruleParserWG.Add(1) + go func()) and the closer goroutine blocks on
ruleParserWG.Wait(); fix by ensuring ruleSet is loaded before any goroutines are
launched (call r.loadRuleSet(...) and handle nil result/return prior to spawning
workers), or if you prefer minimal change, call cancelFunc() via defer
immediately after creating the context and on the error path ensure you drain
the consumer channel and call ruleParserWG.Wait()/Done as appropriate so the
closer goroutine can complete; reference symbols: ruleSet, r.loadRuleSet,
cancelFunc, ruleParserWG, and the closer goroutine that waits on ruleParserWG.
🧹 Nitpick comments (3)
parser/rule_parser.go (3)
99-112: Single-file path skips ruleset-level selector filtering.In the directory path (lines 237–250), a coarse ruleset-level selector check is performed against the aggregate of all rule labels. The single-file path here relies solely on per-rule filtering inside
loadRule. This is a minor behavioral inconsistency — functionally the per-rule filter inloadRuleshould cover most cases, but the returnedruleSethere could end up with zero rules if all were filtered, producing an emptyRuleSetin the output slice rather than being omitted entirely.
219-222: Debug-level logging left atInfolevel.Logging full provider maps on every rule load iteration at
Infolevel will be noisy in production. This looks like a debug artifact — consider usingr.Log.V(5).Info(...)or higher to match the verbosity of surrounding log calls.♻️ Proposed fix
- r.Log.Info("providers", "current", clientMap, "toAdd", load.providerMap) + r.Log.V(5).Info("providers", "current", clientMap, "toAdd", load.providerMap)
238-244: Slice aliasing:meta.Labelsshares backing array withruleSet.Labels.Assigning
ruleSet.Labelsdirectly tometa.Labelsand then appending in a loop can silently mutate the backing array ofruleSet.Labelsif it has spare capacity. While not an active bug in the current call paths (sinceruleSet.Labels' length is unchanged), it's a fragile pattern that can bite during future refactoring. Copy the slice to be safe.♻️ Proposed fix
if r.Selector != nil { meta := &engine.RuleMeta{ - Labels: ruleSet.Labels, + Labels: slices.Clone(ruleSet.Labels), } for _, rule := range rules { meta.Labels = append(meta.Labels, rule.Labels...)
Summary by CodeRabbit
New Features
Improvements
Tests
Test Data