Conversation

@ofek1weiss
Contributor

@ofek1weiss ofek1weiss commented Aug 24, 2025

Summary by CodeRabbit

  • New Features

    • Improved filtering: more accurate IS/IS NOT and CONTAINS/NOT CONTAINS matching with normalized, multi-value comparisons for better results.
    • Selector now exposes singular fields (tag, owner, model) for quicker filtering.
  • Refactor

    • Streamlined, set-based filter flow for better performance and consistency on large datasets.
    • Negative filters now pass when no values are provided, yielding more intuitive behavior.

linear bot commented Aug 24, 2025

coderabbitai bot commented Aug 24, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Replaces procedural per-value filtering with set-based, normalized matching in elementary/monitor/data_monitoring/schema.py: removes module-level apply_filter, adds NEGATIVE_OPERATORS, cached normalized values, new FilterSchema APIs (get_matching_values, apply_filter_on_values, apply_filter_on_value), updates FiltersSchema.test_ids typing and selector field mappings.

Changes

Cohort / File(s) Summary of changes
Filtering schema refactor
elementary/monitor/data_monitoring/schema.py
Removed module-level apply_filter(...). Added NEGATIVE_OPERATORS. Reworked FilterSchema by adding normalize_value(s), cached _normalized_values, get_matching_normalized_values, get_matching_values, apply_filter_on_values, apply_filter_on_value; removed _apply_filter_type. Updated FiltersSchema.test_ids to List[FilterSchema[str]] and changed to_selector_filter_schema to populate singular tag, owner, model from plural fields.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Caller
  participant FS as FilterSchema
  participant Cache as cached props

  rect rgba(231,243,255,0.6)
    Caller->>FS: apply_filter_on_values(values)
    FS->>FS: normalize_values(values)
    FS->>Cache: access _normalized_values (cached)
    FS->>FS: get_matching_normalized_values(normalized_input)
    alt operator IS / CONTAINS
      FS-->>Caller: return True if any match
    else operator IS_NOT / NOT_CONTAINS
      opt input empty
        Note over FS: negative operators pass empty input
      end
      FS-->>Caller: return True if no matches
    end
  end
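
The flow in the diagram can be sketched as a standalone function. This is only an illustration under stated assumptions, not the code in schema.py: the free function `apply_filter_on_values`, the `normalize` parameter (defaulting to `str`), the enum string values, and the reduced two-member `FilterType` are all hypothetical stand-ins.

```python
from enum import Enum
from typing import Callable, Hashable, Iterable, Set


class FilterType(str, Enum):
    # Hypothetical subset of the operators; string values are assumed.
    IS = "is"
    IS_NOT = "is_not"


# Assumption: plays the role of the NEGATIVE_OPERATORS constant from the walkthrough.
NEGATIVE_OPERATORS = {FilterType.IS_NOT}


def apply_filter_on_values(
    filter_type: FilterType,
    filter_values: Iterable[Hashable],
    input_values: Iterable[Hashable],
    normalize: Callable[[Hashable], Hashable] = str,
) -> bool:
    """Return True when the input passes the filter (set-based sketch)."""
    wanted: Set[Hashable] = {normalize(v) for v in filter_values}
    given: Set[Hashable] = {normalize(v) for v in input_values}
    matches = wanted & given
    if filter_type in NEGATIVE_OPERATORS:
        # Empty input produces no matches, so negative filters pass.
        return not matches
    return bool(matches)
```

With this sketch, an IS_NOT filter on "alice" rejects the input ["bob", "alice"] and accepts an empty input, which is the behavior the diagram describes for negative operators.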
Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • MikaKerman

Poem

I hop through sets, not loops today,
Cached crumbs keep mismatches at bay.
IS, CONTAINS, or NOT — I peek and see,
Empty fields let negations be.
Tags, owners, models neat in a row — a tidy burrow, ready to go. 🐇✨

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4f8ea18 and 80a38e6.

📒 Files selected for processing (1)
  • elementary/monitor/data_monitoring/schema.py (3 hunks)
@github-actions
Contributor

👋 @ofek1weiss
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in this pull request.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
elementary/monitor/data_monitoring/schema.py (4)

57-60: Bug: normalized_status is always empty due to incorrect membership check

status is a str and list(Status) yields enum members, so unless Status mixes in str the membership test fails and drops all statuses. This causes FiltersSchema.apply to reject everything when a status filter exists.

Apply this diff to fix:

-    def normalized_status(self) -> List[Status]:
-        return [Status(status) for status in self.statuses if status in list(Status)]
+    def normalized_status(self) -> List[Status]:
+        normalized: List[Status] = []
+        for s in self.statuses:
+            try:
+                normalized.append(Status(s))
+            except ValueError:
+                # ignore invalid values
+                continue
+        return normalized
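
A small standalone check of the diagnosis and the fix. The `Status` enum below is a hypothetical stand-in declared as a plain `Enum`; if the real `Status` were declared as `class Status(str, Enum)`, the original membership test would already succeed, so the bug depends on how `Status` is defined.

```python
from enum import Enum
from typing import Iterable, List


class Status(Enum):  # hypothetical stand-in: plain Enum, no str mixin
    FAIL = "fail"
    WARN = "warn"


def normalized_status(statuses: Iterable[str]) -> List[Status]:
    """Convert raw strings to Status members, skipping unknown values."""
    normalized: List[Status] = []
    for s in statuses:
        try:
            normalized.append(Status(s))
        except ValueError:
            continue  # ignore values that are not valid Status members
    return normalized


# For a plain Enum, a raw string never compares equal to a member.
string_is_member = "fail" in list(Status)
```

Here `string_is_member` is False, while `normalized_status(["fail", "bogus", "warn"])` keeps the two valid members and drops the invalid one.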

48-56: Mutable default lists in Pydantic model fields

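Using [] as a default looks like a shared mutable default. Pydantic copies field defaults per instance, so this is a consistency fix rather than a live bug: we already use Field(default_factory=list) elsewhere; please align these.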

Apply this diff:

-class FilterFields(BaseModel):
-    tags: List[str] = []
-    models: List[str] = []
-    owners: List[str] = []
-    statuses: List[str] = []
-    resource_types: List[ResourceType] = []
-    node_names: List[str] = []
-    test_ids: List[str] = []
+class FilterFields(BaseModel):
+    tags: List[str] = Field(default_factory=list)
+    models: List[str] = Field(default_factory=list)
+    owners: List[str] = Field(default_factory=list)
+    statuses: List[str] = Field(default_factory=list)
+    resource_types: List[ResourceType] = Field(default_factory=list)
+    node_names: List[str] = Field(default_factory=list)
+    test_ids: List[str] = Field(default_factory=list)

155-157: Avoid shared default list for statuses

Field(default=_get_default_statuses_filter()) evaluates the helper once at import time. Pydantic copies the resulting default per instance, but default_factory makes the per-instance construction explicit.

Apply this diff:

-    statuses: List[StatusFilterSchema] = Field(default=_get_default_statuses_filter())
+    statuses: List[StatusFilterSchema] = Field(default_factory=_get_default_statuses_filter)

344-359: Mutable default for SelectorFilterSchema.statuses

Same mutable-default pattern here; use Field(default_factory=...) so each instance explicitly gets its own list.

Apply this diff:

 class SelectorFilterSchema(BaseModel):
@@
-    statuses: Optional[List[Status]] = [
-        Status.FAIL,
-        Status.ERROR,
-        Status.RUNTIME_ERROR,
-        Status.WARN,
-    ]
+    statuses: Optional[List[Status]] = Field(
+        default_factory=lambda: [
+            Status.FAIL,
+            Status.ERROR,
+            Status.RUNTIME_ERROR,
+            Status.WARN,
+        ]
+    )
🧹 Nitpick comments (8)
elementary/monitor/data_monitoring/schema.py (8)

67-67: Avoid duplicating operator groups

NEGATIVE_OPERATORS duplicates ALL_OPERATORS. Reuse the existing constant to keep semantics in one place.

Apply this diff:

-NEGATIVE_OPERATORS = [FilterType.IS_NOT, FilterType.NOT_CONTAINS]
+NEGATIVE_OPERATORS = ALL_OPERATORS

79-86: cached_property can get stale if FilterSchema.values mutates at runtime

Both caches depend on self.values. If the model is mutated post-init, caches won’t refresh. If these models are meant to be immutable, make that explicit to avoid subtle bugs.

Option A (preferred): freeze the model to protect the caches (v1-style config via our shim):

 class FilterSchema(BaseModel, Generic[ValueT]):
@@
     class Config:
         # Make sure that serializing Enum return values
         use_enum_values = True
+        allow_mutation = False

Option B: if mutation is required, override __setattr__ to invalidate the cached attributes whenever values changes.
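
Option B could look roughly like the following. This is a sketch on a plain Python class, not on the Pydantic model; `CachedFilter`, `normalized_values`, and the strip-based normalization are hypothetical names chosen for illustration.

```python
from functools import cached_property
from typing import FrozenSet, List


class CachedFilter:
    """Hypothetical plain-Python stand-in for FilterSchema (not a Pydantic model)."""

    def __init__(self, values: List[str]) -> None:
        self.values = values

    @cached_property
    def normalized_values(self) -> FrozenSet[str]:
        # Assumed normalization: strip surrounding whitespace.
        return frozenset(v.strip() for v in self.values)

    def __setattr__(self, name: str, value: object) -> None:
        if name == "values":
            # cached_property stores its result in the instance __dict__;
            # dropping the entry forces recomputation on the next access.
            self.__dict__.pop("normalized_values", None)
        super().__setattr__(name, value)
```

Note that this only catches reassignment of `values`. In-place mutation such as `f.values.append(...)` does not go through `__setattr__`, which is one reason freezing the model (Option A) is the simpler guarantee.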


88-117: Double-check NOT semantics and case-sensitivity changes

  • For IS_NOT/NOT_CONTAINS you return an empty set if any value violates the filter, effectively requiring all input values to satisfy the negative condition. That’s stricter than “any value is acceptable” semantics. Please confirm this is intentional across all call sites.
  • IS/IS_NOT perform exact equality with the raw objects. For strings, that is now case-sensitive. If previous behavior was case-insensitive, this is a breaking change.

If case-insensitive equality is desired for strings, one approach is to pre-normalize string-only subsets:

# sketch (not a diff): build a mapping of original->lower and compare on the lowered views
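
One way to flesh out that sketch, assuming string-only values. The function name `get_matching_values_case_insensitive` is hypothetical and is not part of the PR; it compares on lowered views and returns the original spellings.

```python
from typing import Dict, Iterable, List, Set


def get_matching_values_case_insensitive(
    filter_values: Iterable[str], input_values: Iterable[str]
) -> Set[str]:
    """Return the original input values that equal a filter value, ignoring case."""
    lowered_filter = {v.lower() for v in filter_values}
    # Map each lowered form back to the original spellings it came from.
    originals: Dict[str, List[str]] = {}
    for value in input_values:
        originals.setdefault(value.lower(), []).append(value)
    return {
        original
        for lowered, group in originals.items()
        if lowered in lowered_filter
        for original in group
    }
```

Keeping the lowered-to-original mapping means callers still receive the values exactly as they were supplied, so downstream display or logging is unaffected by the comparison rule.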

119-126: Empty input handling looks correct; broaden type for flexibility

The “negative operator + empty input => True” rule matches the spec in the PR summary. Minor nit: accept Iterable[ValueT] for symmetry with get_matching_values.

Apply this diff:

-    def apply_filter_on_values(self, values: List[ValueT]) -> bool:
+    def apply_filter_on_values(self, values: Iterable[ValueT]) -> bool:

152-158: Specify generics for string-based filters for clarity

test_ids is typed as List[FilterSchema[str]]. Consider aligning tags/owners/models similarly for consistent type hints.

Apply this diff:

-    tags: List[FilterSchema] = Field(default_factory=list)
-    owners: List[FilterSchema] = Field(default_factory=list)
-    models: List[FilterSchema] = Field(default_factory=list)
+    tags: List[FilterSchema[str]] = Field(default_factory=list)
+    owners: List[FilterSchema[str]] = Field(default_factory=list)
+    models: List[FilterSchema[str]] = Field(default_factory=list)

274-279: Trim CLI filter tokens and drop empties

A filter like "tags:a, b" will include " b" with a leading space. Strip tokens and ignore empty entries.

Apply this diff:

     def _match_filter_regex(filter_string: str, regex: Pattern) -> List[str]:
         match = regex.search(filter_string)
         if match:
-            return match.group(1).split(",")
+            return [t.strip() for t in match.group(1).split(",") if t.strip()]
         return []
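
The effect of the suggested change can be checked in isolation. `TAGS_REGEX` below is a hypothetical pattern, since the real regex in schema.py is not shown in this thread, and the helper is written as a free function for the sake of the example.

```python
import re
from typing import List, Pattern

# Hypothetical pattern; the actual regex used by the CLI parser is an assumption.
TAGS_REGEX = re.compile(r"tags:(.*)")


def match_filter_regex(filter_string: str, regex: Pattern) -> List[str]:
    """Split the captured group on commas, trimming tokens and dropping empties."""
    match = regex.search(filter_string)
    if match:
        return [t.strip() for t in match.group(1).split(",") if t.strip()]
    return []
```

With this version, "tags:a, b,," yields the tokens "a" and "b", with no leading space on the second token and no empty strings.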

286-301: Selector singulars should prefer positive filters

When both positive and negative filters exist, picking the first value regardless of type can misrepresent the selection (e.g., tag="foo" when the active filter is is_not foo). Prefer the first IS filter; fall back only if none exist.

Apply this diff:

-        tags = self.tags[0].values[0] if self.tags else None
-        owners = self.owners[0].values[0] if self.owners else None
-        models = self.models[0].values[0] if self.models else None
+        tags = next((f.values[0] for f in self.tags if f.type == FilterType.IS and f.values), None)
+        owners = next((f.values[0] for f in self.owners if f.type == FilterType.IS and f.values), None)
+        models = next((f.values[0] for f in self.models if f.type == FilterType.IS and f.values), None)

305-341: End-to-end behavior verification recommended

Given the set-based rewrite, please add/adjust tests to pin the following:

  • Negative filters with mixed values: e.g., owners is_not ["alice"] against ["bob","alice"] must fail.
  • Empty input with negative filters returns True.
  • Case-sensitivity expectations for IS/IS_NOT on strings.
  • Status normalization once fixed.

I can draft unit tests targeting FiltersSchema.apply and FilterSchema.get_matching_values for these scenarios if helpful.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between d7a4fdc and 4f8ea18.

📒 Files selected for processing (1)
  • elementary/monitor/data_monitoring/schema.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: code-quality
🔇 Additional comments (2)
elementary/monitor/data_monitoring/schema.py (2)

4-4: Confirm Python version for functools.cached_property

cached_property is available in Python 3.8+. Please confirm our minimum supported Python version; if we support <3.8 anywhere, we’d need a fallback (e.g., backports.cached_property).


5-14: LGTM on typing imports

The expanded typing imports are appropriate for the new generic/set-based approach.

@ofek1weiss ofek1weiss temporarily deployed to elementary_test_env August 24, 2025 13:08 — with GitHub Actions Inactive
@ofek1weiss ofek1weiss merged commit 792cd1b into master Aug 24, 2025
4 of 5 checks passed
@ofek1weiss ofek1weiss deleted the ele-4990-use-sets-for-alert-filters branch August 24, 2025 13:29

3 participants