Skip to content

Comments

✨ Filter out rulesets during parsing that do not match rule label selector#1091

Open
shawn-hurley wants to merge 2 commits intokonveyor:mainfrom
shawn-hurley:feature/filter-rulesets-based-on-selector
Open

✨ Filter out rulesets during parsing that do not match rule label selector#1091
shawn-hurley wants to merge 2 commits intokonveyor:mainfrom
shawn-hurley:feature/filter-rulesets-based-on-selector

Conversation

@shawn-hurley
Copy link
Contributor

@shawn-hurley shawn-hurley commented Feb 5, 2026

  • As we move to having a rather large set of rules that we have to parse each time, the output is getting long, and we assume we need all the providers
  • This isn't true because only the rules that will be run are needed, and this setup breaks an assumption that only the rulesets you cared about would be passed into the analyzer.
  • We are still parsing all the rules, but the output and what the engine gets is only rulesets that themselves have matched the selector, or at least one rule has a label that would match the selector.

Summary by CodeRabbit

  • New Features

    • Rule filtering by labels — select and apply rulesets and individual rules using label selectors; non-matching items are skipped early.
  • Improvements

    • More informative debug logging for path/source resolution.
    • Parser robustness: safer, deterministic rule loading with improved error aggregation and timeout/cancellation handling.
  • Tests

    • Expanded coverage validating label-based filtering and provider-aware behavior.
  • Test Data

    • Added example rules and rulesets demonstrating filtering and combinations.

@coderabbitai
Copy link

coderabbitai bot commented Feb 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a Selector field to RuleParser and threads a CLI-provided label selector into the parser. RuleSet and per-rule loading are pruned by selector matches, error aggregation uses errors.Join with cancellation, tests and testdata updated, and a verbose engine debug log was added.

Changes

Cohort / File(s) Summary
Parser implementation
parser/rule_parser.go
Adds Selector *labels.LabelSelector[*engine.RuleMeta] to RuleParser; centralizes rule loading via loadRule; introduces context cancellation with timeout, nil-safe map copies, consolidated error aggregation using errors.Join, and selector-based filtering/pruning at ruleset and rule levels.
CLI wiring
cmd/analyzer/main.go
Initializes ruleLabelSelector when a selector string is provided and assigns it to RuleParser.Selector, replacing previous inline selector creation.
Engine logging
engine/engine.go
Adds a verbose V(7) debug log in getRelativePathForViolation showing original and computed relative file URIs (no behavior change).
Tests
parser/rule_parser_test.go
Replaces logrus with logr/testr, adds Selector to test cases and compiles it into RuleParser.Selector, expands scenarios and assertions to validate selector filtering, and updates diagnostics.
Testdata — new and updated rules
parser/testdata/multiple-rules.yaml, parser/testdata/folder-of-rules-ruleset/..., parser/testdata/folder-of-rulesets/..., parser/testdata/.../rule-simple-default.yaml
Adds and updates YAML rulesets and rules with labels to exercise selector behavior (new multiple-rules.yaml, several ruleset directories, and label additions).

Sequence Diagram(s)

sequenceDiagram
    participant CLI as "CLI (main)"
    participant Parser as "RuleParser"
    participant LabelSel as "LabelSelector"
    participant RuleSet as "RuleSet"
    participant Rule as "Rule"

    CLI->>Parser: construct RuleParser (Selector optional)
    Parser->>RuleSet: loadRuleSet(path)
    Parser->>Rule: loadRule(file, ruleSet)
    Parser->>LabelSel: build RuleMeta and call Matches(Meta)
    alt Selector configured & Matches == true
        Parser->>RuleSet: retain rules / include Rule
    else Selector configured & Matches == false
        Parser->>RuleSet: prune/skip Rule or RuleSet
    end
    Parser->>Parser: aggregate errors (errors.Join) / cancel context on failures
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hop through YAML with labels in tow,
I sniff each rule where soft breezes blow.
A selector’s tug tells some to depart,
The kept ones sing, a tidy heart.
Tests clap — carrots fall: a coder's show.

🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: filtering rulesets during parsing based on rule label selectors, which aligns with the core PR objective.
Merge Conflict Detection ✅ Passed ✅ No merge conflicts detected when merging into main

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@parser/rule_parser.go`:
- Around line 235-246: The current code returns nils when
r.Selector.Matches(meta) determines the ruleset doesn't match, which discards
previously accumulated results and hides errors; instead, when the selector
fails (ok == false && err == nil) skip only the current ruleset by continuing
the loop (do not return), preserving collected ruleSets, rules, errs, and file
paths; also apply the same selector check/skip logic to the single-file branch
that handles rules when ruleSet == nil so both branches consistently skip
non-matching rules rather than returning nils (look for
r.Selector.Matches(meta), ruleSet, rules and the single-file branch handling to
modify).
🧹 Nitpick comments (2)
engine/engine.go (1)

579-591: Downgrade path-resolution logs to debug verbosity.
These run per incident and can flood logs (and expose local paths) at Info level.

Proposed change (reduce verbosity)
-			r.logger.Info("filepath-loc-prefix", "file", file, "prefix", locationPrefix)
+			r.logger.V(5).Info("filepath-loc-prefix", "file", file, "prefix", locationPrefix)
@@
-		r.logger.Info("sourceLocation", "sourceLocation", sourceLocation)
+		r.logger.V(5).Info("sourceLocation", "sourceLocation", sourceLocation)
@@
-		r.logger.Info("sourceLocation", "sourceLocation", sourceLocation, "absPath", absPath)
+		r.logger.V(5).Info("sourceLocation", "sourceLocation", sourceLocation, "absPath", absPath)
parser/rule_parser_test.go (1)

1042-1052: Strengthen rule identity checks in selector tests.
Because the new selector test data uses identical messages/perform blocks, adding RuleID to the match criteria would make regressions easier to catch.

Proposed tweak (include RuleID in matching)
-						if reflect.DeepEqual(expectedRule.Perform, rule.Perform) && expectedRule.Description == rule.Description {
+						if expectedRule.RuleID == rule.RuleID &&
+							reflect.DeepEqual(expectedRule.Perform, rule.Perform) &&
+							expectedRule.Description == rule.Description {

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@parser/rule_parser_test.go`:
- Around line 1100-1107: The test setup swallows the error from
labels.NewLabelSelector and uses t.Error instead of failing fast; change the
block where ruleParser.Selector is assigned via
labels.NewLabelSelector[*engine.RuleMeta](tc.Selector, nil) to check err and
call t.Fatalf with a message that includes the actual err (e.g., "unable to get
selector: %v") so failures show the underlying cause and abort the test
immediately.

In `@parser/rule_parser.go`:
- Around line 199-202: The error accumulation is joining the wrong variable —
replace the use of err with the goroutine's error load.err so parseErr includes
the goroutine parse failure: update the errors.Join call where load.err != nil
(near the load variable and ruleParserWG.Done() block) to join parseErr with
load.err instead of err so the goroutine error is propagated.
🧹 Nitpick comments (4)
parser/rule_parser.go (1)

167-180: Pre-existing goroutine leak risk: WaitGroup.Done() is called in the consumer, not the producer.

If the consumer exits early (e.g., via cancelFunc() at line 231 when the selector filters out a ruleset), any goroutines that haven't been consumed yet will never have Done() called. The closer goroutine (lines 184–187) will block on ruleParserWG.Wait() forever.

Consider moving ruleParserWG.Done() into the producer goroutine (via defer), so the WaitGroup always completes regardless of consumer behavior.

Sketch
 			ruleParserWG.Add(1)
 			go func() {
+				defer ruleParserWG.Done()
 				rules, m, provConditions, err := r.LoadRule(path.Join(filepath, f.Name()))
 				select {
 				case ruleLoadChan <- ruleParseReturn{
 					rules:           rules,
 					conditionsByCap: provConditions,
 					providerMap:     m,
 					err:             err,
 				}:
 				case <-loadCtx.Done():
 					return
 				}
 			}()

Then remove ruleParserWG.Done() from the consumer (lines 201 and 214).

Also applies to: 184-187

parser/rule_parser_test.go (3)

947-999: Test name "rule-should-be-filtered" is misleading — both rules are retained.

The expected output includes both file-001 and file-002, so nothing is actually filtered out from this ruleset. Per the PR description, the entire ruleset is kept when at least one rule matches. Consider renaming to something like "ruleset-retained-when-rule-matches-selector" to clarify intent.


1122-1129: Comparison logic doesn't verify ruleset count — filtering tests may pass with extra or missing rulesets.

The existing checks ensure we got some rulesets when we expected some, but never assert that the actual count matches the expected count. For the new filtering tests this matters: if the selector fails to filter a ruleset, the test could still pass as long as the expected ones are present. Consider adding a count check:

Proposed addition after line 1129
+			if len(tc.ExpectedRuleSet) != 0 && len(ruleSets) != len(tc.ExpectedRuleSet) {
+				t.Errorf("expected %d rulesets, got %d", len(tc.ExpectedRuleSet), len(ruleSets))
+				return
+			}

Also applies to: 1143-1147


376-390: "rule no provider" test asserts no error but doesn't validate the output.

With the behavioral change (missing providers are now skipped), this test only checks that no error occurs. It doesn't verify the shape of the output — e.g. that zero rulesets/clients are returned. Adding explicit expectations (empty ExpectedRuleSet and ExpectedProvider with a count-matching assertion) would prevent regressions where the parser incorrectly includes partially-resolved rulesets.

@shawn-hurley shawn-hurley force-pushed the feature/filter-rulesets-based-on-selector branch from c2644a8 to f67bbe6 Compare February 6, 2026 16:29
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
parser/rule_parser.go (1)

99-113: ⚠️ Potential issue | 🟠 Major

Selector filtering is not applied in the single-file path.

When LoadRules is called with a single regular file, the selector is bypassed and the ruleset is always returned. This is inconsistent with the directory-based branch (lines 221–234) and means a single-file invocation will not benefit from the filtering this PR introduces.

♻️ Proposed fix to apply selector filtering in the single-file branch
 	if info.Mode().IsRegular() {
 		rules, m, provConditions, err := r.loadRule(filepath)
 		if err != nil {
 			r.Log.V(8).Error(err, "unable to load rule set")
 			return nil, nil, nil, err
 		}
 
 		ruleSet := r.loadRuleSet(path.Dir(filepath))
 		// if nil, use the default rule set
 		if ruleSet == nil {
 			ruleSet = defaultRuleSet
 		}
+
+		if r.Selector != nil {
+			meta := &engine.RuleMeta{
+				Labels: ruleSet.Labels,
+			}
+			for _, rule := range rules {
+				meta.Labels = append(meta.Labels, rule.Labels...)
+			}
+			if ok, err := r.Selector.Matches(meta); err == nil && !ok {
+				r.Log.V(6).Info("ruleset does not have any rules that match selector, filtering out", "ruleSet", ruleSet.Name)
+				return nil, m, provConditions, nil
+			}
+		}
+
 		ruleSet.Rules = rules
 
 		return []engine.RuleSet{*ruleSet}, m, provConditions, err
 	}
🤖 Fix all issues with AI agents
In `@parser/rule_parser_test.go`:
- Around line 947-999: The test "rule-should-be-filtered" is currently using
testFileName: "multiple-rules.yaml" which hits the single-file branch in
LoadRules and never applies Selector filtering; change the test to exercise the
directory-based branch so Selector is evaluated: update the test input to point
to a directory fixture (e.g., replace testFileName with a directory containing
multiple rule files or add a new field/fixture name for a directory) and ensure
the fixture contains rules with and without the "test=filter" label, then assert
that only the matching rules remain; reference LoadRules, Selector, and the test
case Name "rule-should-be-filtered" when making the change.

In `@parser/rule_parser.go`:
- Around line 221-234: When r.Selector.Matches(meta) returns an error, log it
via r.Log (e.g., include ruleSet.Name and the error) and treat the error as a
non-match (i.e., proceed to filter the ruleset) instead of silently keeping it;
additionally, when you decide to filter out the ruleset (the !ok path or
error-as-non-match), remove any entries added for that ruleset from clientMap
and providerConditions (the entries that came from the rules slice) before
calling cancelFunc/return, or alternatively move population of
clientMap/providerConditions until after the selector check; reference
r.Selector.Matches, meta, rules, ruleSet.Name, clientMap and providerConditions
when making the change.
🧹 Nitpick comments (2)
parser/rule_parser.go (1)

129-129: Use defer cancelFunc() to prevent context leaks.

While the current code does call cancelFunc() on both return paths (lines 231 and 239), using defer is idiomatic Go and safeguards against future edits that add new return paths or against panics.

♻️ Proposed fix
 	loadCtx, cancelFunc := context.WithTimeout(context.Background(), 5*time.Minute)
+	defer cancelFunc()

Then remove the explicit cancelFunc() calls at lines 231 and 239.

parser/rule_parser_test.go (1)

656-762: Good test coverage for selector-based filtering.

These two new test cases effectively validate:

  1. Baseline: loading rules from a folder-of-rules-ruleset without a selector returns all 3 rulesets.
  2. Filtered: with selector "test=filter", only the matching ruleset (file-ruleset-a) is returned.

One thing to note: the test validation at lines 1143–1166 doesn't directly assert the count of returned rulesets against the expected count. If filtering fails silently and returns extra rulesets, the mismatch is caught indirectly through rule-count comparison against the zero-value RuleSet. Consider adding a direct length check for stronger assertions.

♻️ Optional: add a direct ruleset count assertion
+		if len(tc.ExpectedRuleSet) != 0 && len(ruleSets) != len(tc.ExpectedRuleSet) {
+			t.Errorf("expected %d rulesets, got %d", len(tc.ExpectedRuleSet), len(ruleSets))
+			return
+		}

@shawn-hurley shawn-hurley force-pushed the feature/filter-rulesets-based-on-selector branch from f67bbe6 to 11c55ad Compare February 6, 2026 18:19
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
parser/rule_parser.go (1)

56-87: ⚠️ Potential issue | 🟠 Major

Returning defaultRuleSet (shared pointer) allows callers to mutate package-level state.

loadRuleSet returns the same defaultRuleSet pointer in three error paths (lines 61, 64, 69). Callers subsequently assign to .Rules on the returned pointer (e.g., line 106, line 239), mutating the shared package-level variable. This also leaves two nil-return paths (lines 79, 83) that cause nil-pointer panics in callers.

Both issues are resolved by returning a fresh instance and never returning nil:

♻️ Suggested fix
+func newDefaultRuleSet() *engine.RuleSet {
+	return &engine.RuleSet{Name: "konveyor-analysis"}
+}
+
 func (r *RuleParser) loadRuleSet(dir string) *engine.RuleSet {
 	goldenFile := path.Join(dir, RULE_SET_GOLDEN_FILE_NAME)
 	info, err := os.Stat(goldenFile)
 	if err != nil {
 		r.Log.V(8).Error(err, "unable to load rule set")
-		return defaultRuleSet
+		return newDefaultRuleSet()
 	}
 	if !info.Mode().IsRegular() {
-		return defaultRuleSet
+		return newDefaultRuleSet()
 	}
 	content, err := os.ReadFile(goldenFile)
 	if err != nil {
 		r.Log.V(8).Error(err, "unable to load rule set")
-		return defaultRuleSet
+		return newDefaultRuleSet()
 	}
 
 	set := engine.RuleSet{}
 	err = yaml.Unmarshal(content, &set)
 	if err != nil {
 		r.Log.V(8).Error(err, "unable to load rule set")
-		return nil
+		return newDefaultRuleSet()
 	}
 	if len(set.Rules) != 0 {
 		r.Log.V(8).Error(fmt.Errorf("rules should not be added in the ruleset"), "unable to load rule set")
-		return nil
+		return newDefaultRuleSet()
 	}
 
 	return &set
 }
🤖 Fix all issues with AI agents
In `@parser/rule_parser.go`:
- Around line 358-365: The code dereferences ruleSet.Labels without checking
ruleSet (causing panics when loadRuleSet returns nil) and also treats selector
errors as matches by using `if ok, err := r.Selector.Matches(meta); !ok && err
== nil { continue }`; update the block around r.Selector.Matches in
rule_parser.go to first guard that ruleSet != nil (use an empty label slice when
nil), construct engine.RuleMeta safely, then call r.Selector.Matches(meta) and
treat any returned error as a non-match (i.e., log or propagate the error and
continue instead of keeping the rule); ensure this change applies both in the
directory/goroutine path and anywhere loadRuleSet could return nil so the
goroutine cannot panic.
🧹 Nitpick comments (3)
parser/rule_parser.go (2)

206-208: Debug log emitted at Info level will be noisy in production.

r.Log.Info("providers", ...) on line 207 fires for every rule-file load. This should use a verbosity gate (e.g., V(5) or higher) to avoid flooding production logs.

♻️ Suggested fix
 		if load.providerMap != nil {
-			r.Log.Info("providers", "current", clientMap, "toAdd", load.providerMap)
+			r.Log.V(5).Info("providers", "current", clientMap, "toAdd", load.providerMap)
 			maps.Copy(clientMap, load.providerMap)
 		}

124-124: Consider deferring cancelFunc for safety.

If a panic occurs between context creation and the explicit cancelFunc() calls (lines 234, 242), the context leaks. Using defer is idiomatic and avoids the need for cancelFunc() at every return path.

♻️ Suggested fix
 	loadCtx, cancelFunc := context.WithTimeout(context.Background(), 5*time.Minute)
+	defer cancelFunc()

Then remove the explicit cancelFunc() calls on lines 234 and 242.

parser/rule_parser_test.go (1)

1145-1169: Test validates returned rulesets but does not assert that all expected rulesets are present.

The loop iterates over actual ruleSets and checks each against ExpectedRuleSet, but never verifies the inverse — that every entry in ExpectedRuleSet appears in the actual results. If filtering fails to include an expected ruleset, the test would silently pass.

♻️ Suggested assertion
+			if len(ruleSets) != len(tc.ExpectedRuleSet) {
+				t.Errorf("expected %d rulesets, got %d", len(tc.ExpectedRuleSet), len(ruleSets))
+			}
 			for _, ruleSet := range ruleSets {

@shawn-hurley shawn-hurley force-pushed the feature/filter-rulesets-based-on-selector branch from 11c55ad to 6a13edd Compare February 6, 2026 18:52
}

r.addRuleFields(&rule, ruleMap)
if r.Selector != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes the issue I was talking about.

* As we move to having a rather large set of rules that we have to parse
  each time, the output is getting long, and we assume we need all the
providers
* This isn't true because only the rules that will be run are needed,
  and this setup breaks an assumption that only rulesets you cared about
would be passed into the analyzer.
* We are still parsing all the rules, but the output and what the engine
  gets is only rulesets that themselves have matched the selector or at
least one rule has a label that would match the selector.

Signed-off-by: Shawn Hurley <shawn@hurley.page>
Signed-off-by: Shawn Hurley <shawn@hurley.page>
@shawn-hurley shawn-hurley force-pushed the feature/filter-rulesets-based-on-selector branch from 6a13edd to b3b1007 Compare February 13, 2026 17:37
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@parser/rule_parser.go`:
- Around line 155-161: The early return after failing to load ruleSet can leak
the closer goroutine because other worker goroutines may have been started (via
ruleParserWG.Add(1) + go func()) and the closer goroutine blocks on
ruleParserWG.Wait(); fix by ensuring ruleSet is loaded before any goroutines are
launched (call r.loadRuleSet(...) and handle nil result/return prior to spawning
workers), or if you prefer minimal change, call cancelFunc() via defer
immediately after creating the context and on the error path ensure you drain
the consumer channel and call ruleParserWG.Wait()/Done as appropriate so the
closer goroutine can complete; reference symbols: ruleSet, r.loadRuleSet,
cancelFunc, ruleParserWG, and the closer goroutine that waits on ruleParserWG.
🧹 Nitpick comments (3)
parser/rule_parser.go (3)

99-112: Single-file path skips ruleset-level selector filtering.

In the directory path (lines 237–250), a coarse ruleset-level selector check is performed against the aggregate of all rule labels. The single-file path here relies solely on per-rule filtering inside loadRule. This is a minor behavioral inconsistency — functionally the per-rule filter in loadRule should cover most cases, but the returned ruleSet here could end up with zero rules if all were filtered, producing an empty RuleSet in the output slice rather than being omitted entirely.


219-222: Debug-level logging left at Info level.

Logging full provider maps on every rule load iteration at Info level will be noisy in production. This looks like a debug artifact — consider using r.Log.V(5).Info(...) or higher to match the verbosity of surrounding log calls.

♻️ Proposed fix
-			r.Log.Info("providers", "current", clientMap, "toAdd", load.providerMap)
+			r.Log.V(5).Info("providers", "current", clientMap, "toAdd", load.providerMap)

238-244: Slice aliasing: meta.Labels shares backing array with ruleSet.Labels.

Assigning ruleSet.Labels directly to meta.Labels and then appending in a loop can silently mutate the backing array of ruleSet.Labels if it has spare capacity. While not an active bug in the current call paths (since ruleSet.Labels' length is unchanged), it's a fragile pattern that can bite during future refactoring. Copy the slice to be safe.

♻️ Proposed fix
 		if r.Selector != nil {
 			meta := &engine.RuleMeta{
-				Labels: ruleSet.Labels,
+				Labels: slices.Clone(ruleSet.Labels),
 			}
 			for _, rule := range rules {
 				meta.Labels = append(meta.Labels, rule.Labels...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants