Skip to content

Alerts updates #1398

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Alerts updates #1398

wants to merge 8 commits into from

Conversation

parmesant
Copy link
Contributor

@parmesant parmesant commented Aug 4, 2025

  • separated out enums, traits, and structs
  • added ability to pause alert evals (new endpoints to pause and resume)
  • added snooze for notifications (time dependent)
  • removed masking from targets for now

Description


This PR has:

  • been tested to ensure log ingestion and log query works.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

Summary by CodeRabbit

  • New Features

    • Per-alert notification controls (Notify, Mute — time-bound or indefinite, Disabled) with endpoints to update notification state, disable/enable, modify, and force-evaluate alerts.
    • Human-readable alert messages including grouped breach summaries; query execution now returns structured/grouped results.
  • Refactor

    • Alerts API reorganized; alert configs include notification settings and migration initializes them.
    • Alerts listing now filters by tags only.
    • Target responses now return full target data.
    • State names renamed: Resolved → NotTriggered, Silenced → Disabled.

- separated out enums, traits, and structs
- added ability to pause alert evals (new endpoints to pause and resume)
- added snooze for notifications (time dependent)
- removed masking from targets fow now
Copy link
Contributor

coderabbitai bot commented Aug 4, 2025

Walkthrough

This PR splits the alerts subsystem into submodules and adds comprehensive enums, structs, and traits for alerts; changes evaluation to return optional messages and grouped query results; centralizes notification config; updates Target serialization; adds HTTP endpoints for modify/enable/disable/evaluate/update_notification_state; and adjusts storage/manifest behavior and import paths.

Changes

Cohort / File(s) Summary
Alerts enums
src/alerts/alert_enums.rs
New enums and impls: AlertTask, AlertVersion, Severity, LogicalOperator, AlertType, AlertOperator, WhereConfigOperator, AggregateFunction, EvalConfig, AlertState, NotificationState (serde, Display, conversions, mute timestamp logic).
Alerts structs & registry
src/alerts/alert_structs.rs
New data model and creation flow: Alerts, AlertRequest (async into AlertConfig), AlertConfig/AlertConfigResponse, Context, AlertInfo, DeploymentInfo, Conditions/ConditionConfig/FilterConfig, GroupBy, ThresholdConfig, RollingWindow, ForecastConfig, AnomalyConfig, AlertQueryResult/GroupResult, summaries and helpers.
Traits & callable targets
src/alerts/alert_traits.rs
Added MessageCreation; AlertTrait signatures updated (eval_alert -> Result<Option<String>>, added update_notification_state, getter signature changes); AlertManagerTrait gains update_notification_state; new CallableTarget trait.
Threshold alert logic
src/alerts/alert_types.rs
ThresholdAlert adds notification_state/notification_config, eval_alert returns Option<String> with GROUP BY support, state/notification update logic moved in, MessageCreation implemented, conversions updated.
Query/result handling
src/alerts/alerts_utils.rs
execute_alert_query now returns AlertQueryResult; added JSON/Arrow → GroupResult parsers, extract_group_results, and convert_result_to_group_results; evaluate/update_alert_state accept optional message and propagate it.
Notification config & targets
src/alerts/target.rs
TimeoutNotificationConfig (new type), Target no longer stores notification_config, Context carries it; spawn_timeout_task signature updated; masking/serialization and callable payloads adjusted; verifiers updated.
Module reorganization & public API
src/alerts/mod.rs
Alerts split into alert_enums, alert_structs, alert_traits; re-exports added; AlertConfig extended with notification_state/notification_config; AlertError variants updated; new aggregation helper APIs; AlertManagerTrait adds update_notification_state.
HTTP handlers & routes
src/handlers/http/alerts.rs, src/handlers/http/modal/server.rs
List simplified (tags-only); POST validates notification timing and sets retry times; new endpoints: disable, enable, modify, evaluate, update_notification_state; routes updated to require PutAlert; responses return AlertConfigResponse.
Targets HTTP responses
src/handlers/http/targets.rs
Masking removed: handlers now return full Target objects (masked serialization calls commented out).
Storage & manifest path
src/storage/object_storage.rs
Import paths updated; merged stream metadata handling simplified; manifest_path no longer uses hostname, uses node id for ingestor mode; minor control-flow refactors.
Import/path cleanups
src/prism/home/mod.rs, src/query/mod.rs, src/sync.rs
Adjusted imports to new alert submodules; small conditional refactors; no behavior changes beyond paths.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Scheduler
  participant AlertsMgr as Alerts Manager
  participant Alert as AlertTrait impl
  participant Utils as alerts_utils
  participant Storage

  Scheduler->>AlertsMgr: tick(alert_id)
  AlertsMgr->>Alert: eval_alert()
  Alert-->>AlertsMgr: Result<Option<String>> (message?)
  AlertsMgr->>Utils: update_alert_state(alert_id, message?)
  Utils->>Storage: persist state & notification_state
  Storage-->>Utils: ok
  Utils-->>AlertsMgr: updated state
  AlertsMgr-->>Scheduler: done
Loading
sequenceDiagram
  autonumber
  participant Client
  participant HTTP as Server
  participant Handler as alerts.rs
  participant AlertsMgr as Alerts Manager
  participant Storage

  Client->>HTTP: PATCH /alerts/{id}/update_notification_state
  HTTP->>Handler: update_notification_state(id, body)
  Handler->>AlertsMgr: update_notification_state(id, NotificationState)
  AlertsMgr->>Storage: persist notification_state
  Storage-->>AlertsMgr: ok
  AlertsMgr-->>Handler: AlertConfig
  Handler-->>Client: AlertConfigResponse
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

for next release

Suggested reviewers

  • nikhilsinhaparseable

Poem

A rabbit taps the alerting tree,
Enums and contexts hop with glee.
Mute the chatter, group the counts,
Notifications threaded through the mounts.
Hop on — the burrow ships at three 🥕🐇

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
src/alerts/target.rs (2)

224-264: Consider adding defensive logging for the Paused state.

The refactoring to use context-based notification config and the new state handling looks good. However, since the comment indicates that evaluations shouldn't run for paused alerts, consider adding a warning log to help diagnose if this case is unexpectedly hit.

-            // do not send out any notifs
-            // (an eval should not have run!)
-            AlertState::Paused => {}
+            // do not send out any notifs
+            // (an eval should not have run!)
+            AlertState::Paused => {
+                warn!("Alert evaluation called for paused alert: {}", context.alert_info.alert_id);
+            }

574-582: Remove commented out Silenced state handling code.

Since the Silenced state has been replaced with Paused, the commented code is no longer needed and should be removed for clarity.

-            AlertState::Paused => alert["labels"]["status"] = "paused".into(), // AlertState::Silenced => {
-                                                                               //     alert["labels"]["status"] = "silenced".into();
-                                                                               //     alert["annotations"]["reason"] =
-                                                                               //         serde_json::Value::String(payload.default_silenced_string());
-                                                                               //     // alert["endsAt"] = Utc::now()
-                                                                               //     //     .to_rfc3339_opts(chrono::SecondsFormat::Millis, true)
-                                                                               //     //     .into();
-                                                                               // }
+            AlertState::Paused => alert["labels"]["status"] = "paused".into(),
src/alerts/alert_structs.rs (2)

85-94: Inconsistent status formatting in message strings.

The default_resolved_string uses backticks around not-triggered while default_paused_string doesn't use them around paused. Consider making the formatting consistent.

     pub(crate) fn default_resolved_string(&self) -> String {
-        format!("{} is now `not-triggered` ", self.alert_info.alert_name)
+        format!("{} is now `not-triggered`", self.alert_info.alert_name)
     }
 
     pub(crate) fn default_paused_string(&self) -> String {
         format!(
-            "{} is now `paused`. No more evals will be run till it is `paused`.",
+            "{} is now `paused`. No more evals will be run while it is paused.",
             self.alert_info.alert_name
         )
     }

Also, there's a typo in line 91: "till it is paused" should probably be "while it is paused" or "until it is resumed".


96-101: Remove commented out code.

The default_silenced_string method is no longer needed since Silenced state has been replaced with Paused.

-    // fn default_silenced_string(&self) -> String {
-    //     format!(
-    //         "Notifications for {} have been silenced ",
-    //         self.alert_info.alert_name
-    //     )
-    // }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 727ab36 and a8e2053.

📒 Files selected for processing (5)
  • src/alerts/alert_enums.rs (1 hunks)
  • src/alerts/alert_structs.rs (1 hunks)
  • src/alerts/alert_types.rs (7 hunks)
  • src/alerts/mod.rs (14 hunks)
  • src/alerts/target.rs (9 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/alerts/alert_enums.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/alerts/alert_types.rs
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: de-sh
PR: parseablehq/parseable#0
File: :0-0
Timestamp: 2025-03-20T15:50:45.435Z
Learning: Pay close attention to code comments for typos and semantic clarity during reviews for the Parseable project.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1388
File: src/alerts/mod.rs:88-104
Timestamp: 2025-07-24T11:09:21.781Z
Learning: In the Parseable alert system (src/alerts/mod.rs), alert versions are server-generated and controlled via CURRENT_ALERTS_VERSION constant, not user input. The AlertVerison enum's From<&str> implementation correctly defaults unknown versions to V2 since the server only generates known versions (v1, v2). Unknown versions would only occur in exceptional cases like file corruption, making the current fallback approach appropriate.
📚 Learning: in the parseable alert system (src/alerts/mod.rs), alert versions are server-generated and controlle...
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1388
File: src/alerts/mod.rs:88-104
Timestamp: 2025-07-24T11:09:21.781Z
Learning: In the Parseable alert system (src/alerts/mod.rs), alert versions are server-generated and controlled via CURRENT_ALERTS_VERSION constant, not user input. The AlertVerison enum's From<&str> implementation correctly defaults unknown versions to V2 since the server only generates known versions (v1, v2). Unknown versions would only occur in exceptional cases like file corruption, making the current fallback approach appropriate.

Applied to files:

  • src/alerts/target.rs
  • src/alerts/mod.rs
  • src/alerts/alert_structs.rs
📚 Learning: datafusion's parquet reader defaults to using view types (utf8view, binaryview) when reading parquet...
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1340
File: src/storage/object_storage.rs:31-41
Timestamp: 2025-06-18T11:15:10.836Z
Learning: DataFusion's parquet reader defaults to using view types (Utf8View, BinaryView) when reading parquet files via the schema_force_view_types configuration (default: true). This means StringViewArray and BinaryViewArray downcasting is required when processing Arrow arrays from DataFusion parquet operations, even though these types are behind nightly feature flags.

Applied to files:

  • src/alerts/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: coverage
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
🔇 Additional comments (17)
src/alerts/target.rs (5)

39-43: LGTM! Import reorganization looks good.

The imports are properly structured and the addition of CallableTarget trait aligns with the modularization effort.


47-47: Import change aligns with module restructuring.

The import of AlertState and Context from the parent module is consistent with the new modular structure.


266-266: Method signature update is consistent with struct rename.

The parameter rename from target_timeout to notification_config properly reflects the Timeout to NotificationConfig struct rename.


595-612: Good naming improvement from Timeout to NotificationConfig.

The rename better reflects the struct's purpose in managing notification configurations. The default implementation is appropriate.


441-463: Alert state handling in SlackWebHook looks good.

The implementation correctly handles all three alert states (Triggered, NotTriggered, Paused) with appropriate message generation.

src/alerts/mod.rs (8)

39-56: Excellent modularization of alert-related code.

The separation into alert_enums, alert_structs, and alert_traits modules provides better code organization and maintainability. The public re-exports maintain a clean API surface.


132-136: Migration properly handles new notification fields.

The v1 to v2 migration correctly initializes notification_state to Notify and notification_config to default values, ensuring backward compatibility.


559-567: State mapping correctly updated for new alert states.

The migration properly maps "resolved" and unknown states to NotTriggered, which is a safe default for the new state model.


606-618: Good validation of notification timing configuration.

The validation ensures that the total notification duration doesn't exceed the evaluation frequency, preventing notification spam and timing conflicts.


1056-1075: Well-designed alert pause/resume logic.

The state transition handling correctly manages scheduled tasks - deleting them when pausing and recreating them when resuming. The order of operations ensures consistency.


1077-1111: New notification state update method properly implemented.

The update_notification_state method follows established patterns and enables the snooze feature for notifications without affecting alert evaluation.


1187-1247: Summary generation correctly updated for new alert states.

The changes properly handle the Paused state in alert summaries, including sorting by severity and limiting to top 5 alerts per state.


825-832: Error enum updates are appropriate.

The simplification of InvalidQueryParameter and addition of Unimplemented variant provide better error handling flexibility.

src/alerts/alert_structs.rs (4)

177-219: Well-implemented filter message generation.

The generate_filter_message method provides clear, human-readable descriptions of filter conditions with proper handling of logical operators.


260-286: Robust AlertRequest to AlertConfig conversion.

The async into() method properly validates target IDs and extracts datasets from the query before creating the config. Good error handling and appropriate defaults.


288-309: Complete and well-structured AlertConfig definition.

The struct includes all necessary fields for the enhanced alert system, with proper serialization attributes and the new notification-related fields.


311-330: Well-designed summary structures for alert statistics.

The summary structs provide comprehensive alert statistics with proper support for the new Paused state and severity-based sorting.

- separated `NotificationConfig` from target config
coderabbitai[bot]
coderabbitai bot previously approved these changes Aug 4, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/alerts/mod.rs (1)

949-953: Critical issue resolved: Early return bug fixed.

This code segment correctly uses continue instead of return Ok(()) when processing paused alerts, which fixes the critical bug identified in previous reviews where loading would stop after the first paused alert.

🧹 Nitpick comments (1)
src/alerts/alert_structs.rs (1)

89-94: Fix typo in paused string message.

Line 91 has a logical error in the message - it says "till it is paused" but should say "till it is resumed" or "unpaused".

-            "{} is now `paused`. No more evals will be run till it is `paused`.",
+            "{} is now `paused`. No more evals will be run till it is `resumed`.",
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a8e2053 and a331597.

📒 Files selected for processing (5)
  • src/alerts/alert_enums.rs (1 hunks)
  • src/alerts/alert_structs.rs (1 hunks)
  • src/alerts/alert_types.rs (7 hunks)
  • src/alerts/mod.rs (14 hunks)
  • src/alerts/target.rs (9 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/alerts/alert_enums.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/alerts/target.rs
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: de-sh
PR: parseablehq/parseable#0
File: :0-0
Timestamp: 2025-03-20T15:50:45.435Z
Learning: Pay close attention to code comments for typos and semantic clarity during reviews for the Parseable project.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1388
File: src/alerts/mod.rs:88-104
Timestamp: 2025-07-24T11:09:21.781Z
Learning: In the Parseable alert system (src/alerts/mod.rs), alert versions are server-generated and controlled via CURRENT_ALERTS_VERSION constant, not user input. The AlertVerison enum's From<&str> implementation correctly defaults unknown versions to V2 since the server only generates known versions (v1, v2). Unknown versions would only occur in exceptional cases like file corruption, making the current fallback approach appropriate.
📚 Learning: in the parseable alert system (src/alerts/mod.rs), alert versions are server-generated and controlle...
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1388
File: src/alerts/mod.rs:88-104
Timestamp: 2025-07-24T11:09:21.781Z
Learning: In the Parseable alert system (src/alerts/mod.rs), alert versions are server-generated and controlled via CURRENT_ALERTS_VERSION constant, not user input. The AlertVerison enum's From<&str> implementation correctly defaults unknown versions to V2 since the server only generates known versions (v1, v2). Unknown versions would only occur in exceptional cases like file corruption, making the current fallback approach appropriate.

Applied to files:

  • src/alerts/mod.rs
  • src/alerts/alert_types.rs
  • src/alerts/alert_structs.rs
📚 Learning: datafusion's parquet reader defaults to using view types (utf8view, binaryview) when reading parquet...
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1340
File: src/storage/object_storage.rs:31-41
Timestamp: 2025-06-18T11:15:10.836Z
Learning: DataFusion's parquet reader defaults to using view types (Utf8View, BinaryView) when reading parquet files via the schema_force_view_types configuration (default: true). This means StringViewArray and BinaryViewArray downcasting is required when processing Arrow arrays from DataFusion parquet operations, even though these types are behind nightly feature flags.

Applied to files:

  • src/alerts/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: coverage
🔇 Additional comments (18)
src/alerts/alert_types.rs (6)

60-61: LGTM: Notification state fields added appropriately.

The addition of notification_state and notification_config fields integrates well with the broader notification management system introduced in this PR.


69-85: LGTM: Improved eval_alert method with message generation.

The refactored method now returns an optional message string instead of a boolean/value tuple, which is more semantically clear and integrates well with the MessageCreation trait. The conditional message generation based on threshold evaluation is correct.


109-120: Validation logic correctly updated for centralized notification config.

The validation now properly uses the alert's own notification_config instead of iterating over target configs, which aligns with the architectural change to centralize notification configuration at the alert level.


279-313: LGTM: MessageCreation trait implementation is well-structured.

The trait implementation correctly provides threshold-specific message creation while appropriately rejecting anomaly and forecast message creation for threshold alerts.


328-329: LGTM: From implementations updated for new notification fields.

The bidirectional conversion between AlertConfig and ThresholdAlert correctly handles the new notification_state and notification_config fields.

Also applies to: 350-351


158-207: Audit ISO8601 format for Snoozed timestamps

The update_state method uses

let till = DateTime::<Utc>::from_str(till_time)
    .map_err(|e| AlertError::CustomError(e.to_string()))?;

which relies on RFC 3339 parsing. Please verify that wherever you construct a NotificationState::Snoozed(till_time) value, you serialize the timestamp using an RFC 3339 formatter (e.g. to_rfc3339()) so that from_str will always succeed. Consider adding a unit test exercising the snooze path in update_state to catch any mismatches in format.

src/alerts/mod.rs (7)

39-56: LGTM: Excellent modularization of alert system components.

The separation into dedicated submodules (alert_enums, alert_structs, alert_traits) greatly improves code organization and maintainability while preserving the public API through re-exports.


132-133: LGTM: Migration correctly adds new notification fields.

The migration logic properly initializes new alerts with NotificationState::Notify and default notification config, ensuring backward compatibility.


563-566: LGTM: State mapping updated for new alert state semantics.

The state extraction correctly maps legacy states to the new enum values, with both "resolved" and unknown states mapping to NotTriggered.


607-618: LGTM: Validation updated for centralized notification config.

The validation logic correctly uses the alert's own notification_config instead of per-target configs, consistent with the architectural changes.


1056-1075: LGTM: Task management properly handles alert state transitions.

The logic correctly manages scheduled tasks when alerts are paused (removes task) or resumed (creates task), ensuring proper resource management.


1077-1111: LGTM: New notification state update method is well-implemented.

The method follows the same pattern as the main state update method, properly handling different alert types and updating both memory and persistent storage.


1187-1242: LGTM: Summary generation updated for new alert states.

The summary correctly uses the new state names (Paused instead of Silenced) and maintains the same severity-based sorting and limiting logic.

src/alerts/alert_structs.rs (5)

40-44: LGTM: Helper struct is appropriately scoped.

BasicAlertFields provides a clean way to group common alert fields during migration, improving code clarity.


46-50: LGTM: Alerts struct properly designed for concurrent access.

The use of RwLock for the alerts HashMap and mpsc::Sender for task management provides appropriate concurrency control.


177-219: LGTM: Filter message generation is comprehensive.

The method properly handles both logical operators (AND/OR) and single conditions, with appropriate null value handling.


260-286: LGTM: AlertRequest conversion is well-structured.

The async conversion method properly validates target IDs, resolves dataset names, and initializes all required fields with appropriate defaults.


305-306: LGTM: New notification fields properly integrated.

The AlertConfig struct correctly includes the new notification_state and notification_config fields, completing the notification management integration.

Signed-off-by: parmesant <[email protected]>
coderabbitai[bot]
coderabbitai bot previously approved these changes Aug 5, 2025
add serde default

Signed-off-by: parmesant <[email protected]>
coderabbitai[bot]
coderabbitai bot previously approved these changes Aug 5, 2025
Added endpoints-
- PATCH /{alertid}/enable -> enable a disabled alert
- PATCH /{alertid}/disable -> disable an alert
- PATCH /{alertid}/update_notification_state -> update the notification state (notify, mute)
- PUT /{alertid}/evaluate_alert -> run manual eval for an alert
- PUT /{alertid} -> modify an alert
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🔭 Outside diff range comments (2)
src/handlers/livetail.rs (1)

283-283: Do not panic on malformed cookies.

.expect("cookie is parseable") can crash the server if a client sends an invalid Cookie header. Handle parse errors gracefully.

Apply this safer change:

-        .map(|value| value.expect("cookie is parseable"))
+        .filter_map(|res| res.ok())
src/handlers/http/audit.rs (1)

49-51: Avoid panics: replace header/JSON unwraps with fallible parsing

to_str().unwrap() and serde_json::from_str(...).unwrap() on untrusted headers can panic and DoS the service. Handle errors gracefully and log at warn level.

Suggested change (outside the changed hunk, shown for clarity):

if let Some(kinesis_common_attributes) =
    req.request().headers().get(KINESIS_COMMON_ATTRIBUTES_KEY)
{
    if let Ok(attribute_value) = kinesis_common_attributes.to_str() {
        match serde_json::from_str::<Message>(attribute_value) {
            Ok(message) => {
                log_builder = log_builder.with_stream(message.common_attributes.x_p_stream);
            }
            Err(e) => {
                tracing::warn!(
                    "invalid {} header JSON: {e}",
                    KINESIS_COMMON_ATTRIBUTES_KEY
                );
            }
        }
    } else {
        tracing::warn!("non-UTF8 {} header", KINESIS_COMMON_ATTRIBUTES_KEY);
    }
}
🧹 Nitpick comments (7)
src/handlers/livetail.rs (1)

278-289: Avoid intermediate Vec allocation; stream the search.

You can remove the collect and find directly on the iterator to reduce allocations and improve performance.

Apply this refactor:

-    let cookies = header.get_all("cookie");
-    let cookies: Vec<_> = cookies
-        .iter()
-        .filter_map(|value| value.to_str().ok())
-        .flat_map(Cookie::split_parse)
-        .filter_map(|res| res.ok())
-        .collect();
-
-    cookies
-        .into_iter()
-        .find(|cookie| cookie.name() == SESSION_COOKIE_NAME)
+    header
+        .get_all("cookie")
+        .iter()
+        .filter_map(|value| value.to_str().ok())
+        .flat_map(Cookie::split_parse)
+        .filter_map(|res| res.ok())
+        .find(|cookie| cookie.name() == SESSION_COOKIE_NAME)
src/about.rs (1)

155-155: Typo in comment: “Cargol.toml” → “Cargo.toml”

Fix the filename spelling in the comment.

-    // CARGO_PKG_VERSION is set from Cargol.toml file at build time
+    // CARGO_PKG_VERSION is set from Cargo.toml file at build time
src/migration/metadata_migration.rs (1)

79-81: Nit: typo in panic message

"arrray" → "array".

-            panic!("privileges is an arrray")
+            panic!("privileges is an array")
src/catalog/mod.rs (1)

295-296: Nit: variable name typo

"mainfest_file_name" → "manifest_file_name".

-    let mainfest_file_name = manifest_path("").to_string();
-    let path = partition_path(stream_name, lower_bound, upper_bound).join(&mainfest_file_name);
+    let manifest_file_name = manifest_path("").to_string();
+    let path = partition_path(stream_name, lower_bound, upper_bound).join(&manifest_file_name);
src/handlers/http/modal/query/querier_logstream.rs (1)

39-41: Nit: function name typo in import

merge_quried_stats looks misspelled. Consider renaming to merge_queried_stats across the codebase.

I can open a follow-up PR to rename and update call sites.

src/query/stream_schema_provider.rs (1)

175-185: Nit: function name typo

get_hottier_exectuion_plan contains typos (“hottier”, “exectuion”). Consider renaming to get_hot_tier_execution_plan for clarity.

I can generate a patch and a script to update all references safely.

src/alerts/alerts_utils.rs (1)

277-293: Avoid sending empty notification messages

update_alert_state uses Some("".into()) to mark a resolution path.
Because trigger_notif.is_some() is later checked, this still fires a notification – but with an empty body.

Consider using None for “no message” or provide a meaningful “resolved” message instead, e.g.:

-            .update_state(*alert.get_id(), AlertState::NotTriggered, Some("".into()))
+            .update_state(*alert.get_id(), AlertState::NotTriggered, None)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f6bfdf8 and 15d8fa8.

📒 Files selected for processing (31)
  • src/about.rs (1 hunks)
  • src/alerts/alert_enums.rs (1 hunks)
  • src/alerts/alert_structs.rs (1 hunks)
  • src/alerts/alert_traits.rs (3 hunks)
  • src/alerts/alert_types.rs (8 hunks)
  • src/alerts/alerts_utils.rs (9 hunks)
  • src/alerts/mod.rs (16 hunks)
  • src/alerts/target.rs (9 hunks)
  • src/catalog/manifest.rs (1 hunks)
  • src/catalog/mod.rs (2 hunks)
  • src/event/format/json.rs (1 hunks)
  • src/handlers/http/alerts.rs (5 hunks)
  • src/handlers/http/audit.rs (1 hunks)
  • src/handlers/http/cluster/mod.rs (2 hunks)
  • src/handlers/http/logstream.rs (1 hunks)
  • src/handlers/http/modal/query/querier_logstream.rs (1 hunks)
  • src/handlers/http/modal/server.rs (2 hunks)
  • src/handlers/http/modal/ssl_acceptor.rs (1 hunks)
  • src/handlers/http/modal/utils/ingest_utils.rs (1 hunks)
  • src/handlers/http/query.rs (1 hunks)
  • src/handlers/livetail.rs (1 hunks)
  • src/hottier.rs (1 hunks)
  • src/livetail.rs (1 hunks)
  • src/metadata.rs (1 hunks)
  • src/migration/metadata_migration.rs (3 hunks)
  • src/otel/otel_utils.rs (1 hunks)
  • src/parseable/streams.rs (2 hunks)
  • src/query/mod.rs (2 hunks)
  • src/query/stream_schema_provider.rs (1 hunks)
  • src/stats.rs (2 hunks)
  • src/storage/object_storage.rs (4 hunks)
✅ Files skipped from review due to trivial changes (8)
  • src/handlers/http/modal/ssl_acceptor.rs
  • src/query/mod.rs
  • src/livetail.rs
  • src/handlers/http/cluster/mod.rs
  • src/metadata.rs
  • src/event/format/json.rs
  • src/hottier.rs
  • src/parseable/streams.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/handlers/http/modal/server.rs
  • src/alerts/target.rs
  • src/storage/object_storage.rs
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: de-sh
PR: parseablehq/parseable#0
File: :0-0
Timestamp: 2025-03-20T15:50:45.435Z
Learning: Pay close attention to code comments for typos and semantic clarity during reviews for the Parseable project.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1388
File: src/alerts/mod.rs:88-104
Timestamp: 2025-07-24T11:09:21.781Z
Learning: In the Parseable alert system (src/alerts/mod.rs), alert versions are server-generated and controlled via CURRENT_ALERTS_VERSION constant, not user input. The AlertVerison enum's From<&str> implementation correctly defaults unknown versions to V2 since the server only generates known versions (v1, v2). Unknown versions would only occur in exceptional cases like file corruption, making the current fallback approach appropriate.
📚 Learning: 2025-02-14T09:49:25.818Z
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.

Applied to files:

  • src/handlers/http/modal/query/querier_logstream.rs
  • src/handlers/http/logstream.rs
  • src/query/stream_schema_provider.rs
  • src/catalog/mod.rs
  • src/stats.rs
  • src/handlers/http/audit.rs
📚 Learning: 2025-06-16T02:04:58.990Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/handlers/http/health_check.rs:81-90
Timestamp: 2025-06-16T02:04:58.990Z
Learning: In the shutdown function in src/handlers/http/health_check.rs, the design approach is to log errors from sync operations rather than propagate them. This is intentional because the shutdown function is called on SIGTERM/SIGINT signals, and the goal is to perform best-effort cleanup (syncing pending files to object storage) while allowing the shutdown to proceed regardless of sync failures. Logging provides debugging information without blocking the shutdown process.

Applied to files:

  • src/handlers/http/modal/query/querier_logstream.rs
  • src/handlers/http/logstream.rs
📚 Learning: 2025-06-16T09:50:38.636Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:319-331
Timestamp: 2025-06-16T09:50:38.636Z
Learning: In Parseable's Ingest or Query mode, the node_id is always available because it's generated during server initialization itself, before the get_node_id_string() function in streams.rs would be called. This makes the .expect() calls on QUERIER_META.get() and INGESTOR_META.get() safe in this context.

Applied to files:

  • src/handlers/http/modal/query/querier_logstream.rs
  • src/catalog/mod.rs
📚 Learning: 2025-07-28T17:10:39.448Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1392
File: src/migration/stream_metadata_migration.rs:303-322
Timestamp: 2025-07-28T17:10:39.448Z
Learning: In Parseable's migration system (src/migration/stream_metadata_migration.rs), each migration function updates the metadata to the current latest format using CURRENT_OBJECT_STORE_VERSION and CURRENT_SCHEMA_VERSION constants, rather than producing incremental versions. For example, v5_v6 function produces v7 format output when these constants are set to "v7", not v6 format.

Applied to files:

  • src/catalog/mod.rs
  • src/migration/metadata_migration.rs
📚 Learning: 2025-06-16T05:20:18.593Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:351-355
Timestamp: 2025-06-16T05:20:18.593Z
Learning: In the Parseable codebase, arrow files are expected to always have valid creation or modified timestamps as a basic system assumption. The conversion flow uses expect() on file metadata operations to enforce this invariant with fail-fast behavior rather than graceful error handling, as violations represent fundamental system issues that should cause immediate failures.

Applied to files:

  • src/catalog/mod.rs
📚 Learning: 2025-06-18T06:45:37.070Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1340
File: src/storage/object_storage.rs:832-843
Timestamp: 2025-06-18T06:45:37.070Z
Learning: Stats calculation for parquet files in Parseable is done synchronously during the upload process because files are deleted from staging after upload. This prevents race conditions and ensures stats are calculated while files are still available locally.

Applied to files:

  • src/stats.rs
📚 Learning: 2025-05-01T10:27:56.858Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:0-0
Timestamp: 2025-05-01T10:27:56.858Z
Learning: The `add_tile()` function in `src/handlers/http/users/dashboards.rs` should use `get_dashboard_by_user(dashboard_id, &user_id)` instead of `get_dashboard(dashboard_id)` to ensure proper authorization checks when modifying a dashboard.

Applied to files:

  • src/handlers/http/query.rs
📚 Learning: 2025-06-18T12:44:31.983Z
Learnt from: parmesant
PR: parseablehq/parseable#1347
File: src/handlers/http/query.rs:0-0
Timestamp: 2025-06-18T12:44:31.983Z
Learning: The counts API in src/handlers/http/query.rs does not currently support group_by functionality in COUNT queries, so the hard-coded fields array ["start_time", "end_time", "count"] is appropriate for the current scope.

Applied to files:

  • src/handlers/http/query.rs
  • src/alerts/alert_types.rs
  • src/alerts/alerts_utils.rs
  • src/alerts/alert_structs.rs
📚 Learning: 2025-07-24T11:09:21.781Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1388
File: src/alerts/mod.rs:88-104
Timestamp: 2025-07-24T11:09:21.781Z
Learning: In the Parseable alert system (src/alerts/mod.rs), alert versions are server-generated and controlled via CURRENT_ALERTS_VERSION constant, not user input. The AlertVerison enum's From<&str> implementation correctly defaults unknown versions to V2 since the server only generates known versions (v1, v2). Unknown versions would only occur in exceptional cases like file corruption, making the current fallback approach appropriate.

Applied to files:

  • src/alerts/alert_traits.rs
  • src/alerts/alert_enums.rs
  • src/alerts/alert_types.rs
  • src/alerts/alerts_utils.rs
  • src/handlers/http/alerts.rs
  • src/alerts/mod.rs
  • src/alerts/alert_structs.rs
📚 Learning: 2025-06-18T06:39:04.775Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1340
File: src/query/mod.rs:64-66
Timestamp: 2025-06-18T06:39:04.775Z
Learning: In src/query/mod.rs, QUERY_SESSION_STATE and QUERY_SESSION serve different architectural purposes: QUERY_SESSION_STATE is used for stats calculation and allows dynamic registration of individual parquet files from the staging path (files created every minute), while QUERY_SESSION is used for object store queries with the global schema provider. Session contexts with schema providers don't support registering individual tables/parquets, so both session objects are necessary for their respective use cases.

Applied to files:

  • src/alerts/alerts_utils.rs
  • src/alerts/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: coverage
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
🔇 Additional comments (20)
src/handlers/livetail.rs (1)

276-276: Explicit lifetime on Cookie return looks correct.

Returning Option<Cookie<'_>> ties the cookie to the request metadata and avoids unnecessary owning conversions. This aligns with the borrowing model used elsewhere.

src/otel/otel_utils.rs (1)

172-176: Let-chain refactor is correct and keeps NaN/inf filtered out

Logic remains unchanged and is concise. Please ensure the MSRV supports let-chains.

You can reuse the MSRV check script from the previous comment.

src/catalog/manifest.rs (1)

115-121: Refactor is equivalent; confirm desired “vacuous truth” behavior

The chain correctly sets sort_order_id when all row-group sort orders match the last popped one. Note this also holds for a single row group (remainder is empty, .all(...) is true). If that is intended, this is good to go. Otherwise, guard for length > 1.

Would you like a focused unit test covering:

  • 1 row group (sets sort order)
  • N>1 identical sort orders (sets)
  • mixed sort orders (does not set)?
src/handlers/http/audit.rs (1)

54-58: Header parsing let-chain looks good

Concise and equivalent to the previous nested form.

src/stats.rs (2)

163-167: Let-chain for conditional stats persistence is correct

Semantics preserved: only logs when stats exist and put_stats fails.

Ensure MSRV supports let-chains (reuse the script from earlier).


212-214: Let-chain for metric removal keeps prior semantics

Short-circuits on the prefix check and only warns on actual removal errors.

src/migration/metadata_migration.rs (2)

188-192: LGTM: concise nested object match

The chained pattern cleanly ensures both "permission" and "resource" are JSON objects before removing "tag". Behavior unchanged.


88-92: MSRV Already Covers Let-Chains
The repository’s Cargo.toml sets rust-version = "1.88.0" (edition 2024), and let-chains have been stable since Rust 1.65. No further action needed—the current toolchain already supports these constructs.

Likely an incorrect or invalid review comment.

src/catalog/mod.rs (3)

271-287: Logic preserved; consider MSRV for let-chains

Combining the checks reads better and avoids an extra unwrap. Ensure repo MSRV supports let-chains.

(See MSRV verification script posted in metadata_migration.rs comment.)


288-291: LGTM: error handling preserved when setting first_event_at

On failure to update streaminfo, you log and proceed. Matches existing behavior.


369-371: LGTM: simpler Option handling

Replacing is_some/unwrap with if let Some(event) is clearer and safer.

src/handlers/http/modal/utils/ingest_utils.rs (2)

177-181: MSRV note: let-chains in header extraction

The chained condition (starts_with && !IGNORE_HEADERS && let Ok(value) = ...) relies on let-chains. Ensure toolchain compatibility.

(See MSRV verification script posted in metadata_migration.rs comment.)


202-206: LGTM: LOG_SOURCE_KEY propagated to FORMAT_KEY

Cleanly maps the header when present and valid UTF-8.

src/handlers/http/modal/query/querier_logstream.rs (1)

79-83: LGTM: concise hot tier guard

Combining the global manager fetch and existence check is clear and preserves behavior.

Note: uses let-chains; ensure MSRV compatibility (see verification script in metadata_migration.rs).

src/query/stream_schema_provider.rs (1)

566-579: LGTM: hot tier execution plan gated via let-chain

The refactor reads well and behavior is unchanged. Ensure MSRV for let-chains.

(See MSRV verification script posted in metadata_migration.rs comment.)

src/handlers/http/query.rs (1)

349-350: Nice micro-optimisation

Switching to std::slice::from_ref(&body.stream) removes an unnecessary clone() and is clearer. 👍

src/alerts/alert_traits.rs (1)

32-46: Interfaces look solid

The new MessageCreation abstraction cleanly separates message formatting from evaluation logic and keeps the trait object-safe. No concerns here.

src/handlers/http/alerts.rs (1)

95-102: Clarify notification-retry calculation

When eval_freq < notif_freq, integer division yields 0 and you force times = 1. Consider rejecting this configuration instead of silently coercing it – it likely indicates a mis-configured alert.

[ suggest_optional_refactor ]

src/alerts/mod.rs (1)

1018-1023: Early-return bug fixed – looks good

The previous return Ok(()) that aborted loading on the first disabled alert has been replaced with continue, ensuring the rest of the alerts are processed.

src/alerts/alert_structs.rs (1)

178-207: Potential panic in generate_filter_message

When operator is Some, the code indexes condition_config[1] but never checks that at least two conditions exist. An empty or single-element array will cause a panic.

Add a length check:

-let expr1 = &self.condition_config[0];
-let expr2 = &self.condition_config[1];
+if self.condition_config.len() < 2 {
+    return "<invalid filter config>".to_string();
+}
+let expr1 = &self.condition_config[0];
+let expr2 = &self.condition_config[1];

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (1)
src/alerts/target.rs (1)

381-404: Variable name timeout is misleading

The variable name timeout is confusing since it represents a NotificationConfig. Consider renaming it to better reflect its purpose.

-        let mut timeout = NotificationConfig::default();
+        let mut notification_config = NotificationConfig::default();

         // Default is Infinite in case of alertmanager
         if matches!(value.target, TargetType::AlertManager(_)) {
-            timeout.times = Retry::Infinite
+            notification_config.times = Retry::Infinite
         }

         if let Some(notification_config) = value.notification_config {
             let interval = notification_config.interval.map(|ref interval| *interval);

             if let Some(interval) = interval {
-                timeout.interval = interval
+                notification_config.interval = interval
             }

             if let Some(times) = notification_config.times {
-                timeout.times = Retry::Finite(times)
+                notification_config.times = Retry::Finite(times)
             }
         }
♻️ Duplicate comments (3)
src/alerts/alerts_utils.rs (2)

227-233: Fix compile error: Invalid if let pattern with &&

The pattern if let Some(int_array) = ... && !int_array.is_null(...) is not valid stable Rust syntax. You need to separate the pattern matching from the guard condition.

-} else if let Some(int_array) = column.as_any().downcast_ref::<Int64Array>()
-    && !int_array.is_null(row_index)
-{
-    return int_array.value(row_index) as f64;
+} else if let Some(int_array) = column.as_any().downcast_ref::<Int64Array>() {
+    if !int_array.is_null(row_index) {
+        return int_array.value(row_index) as f64;
+    }
 }

247-252: Fix compile error: Invalid if let pattern with &&

Same issue as above - the if let ... && guard pattern needs to be split.

-} else if let Some(float_array) = column.as_any().downcast_ref::<Float64Array>()
-    && !float_array.is_null(row_index)
-{
-    return float_array.value(row_index).to_string();
+} else if let Some(float_array) = column.as_any().downcast_ref::<Float64Array>() {
+    if !float_array.is_null(row_index) {
+        return float_array.value(row_index).to_string();
+    }
 }
src/alerts/mod.rs (1)

830-836: Fix compile error: Cannot sum iterator of Result values

The code attempts to sum an iterator of Result<usize, AlertError> values, but Result doesn't implement the Sum trait. This will cause a compilation error.

-        _ => {
-            // Use inputs() method to get all input plans
-            plan.inputs()
-                .iter()
-                .map(|input| _get_number_of_agg_exprs(input))
-                .sum()
-        }
+        _ => {
+            // Use inputs() method to get all input plans
+            plan.inputs()
+                .iter()
+                .try_fold(0usize, |acc, input| {
+                    _get_number_of_agg_exprs(input).map(|n| acc + n)
+                })
+        }
🧹 Nitpick comments (3)
src/alerts/target.rs (2)

39-43: Consider consolidating import groups for better organization

The imports could be better organized by grouping related items together. Consider restructuring them for improved readability.

 use crate::{
-    alerts::{AlertError, alert_traits::CallableTarget},
+    alerts::{
+        AlertError, AlertState, Context,
+        alert_traits::CallableTarget,
+    },
     parseable::PARSEABLE,
     storage::object_storage::target_json_path,
 };

574-581: Remove commented-out code

There's commented-out code that should be removed if it's no longer needed. If this functionality might be needed later, consider tracking it in an issue instead.

-            AlertState::Disabled => alert["labels"]["status"] = "disabled".into(), // AlertState::Silenced => {
-                                                                                   //     alert["labels"]["status"] = "silenced".into();
-                                                                                   //     alert["annotations"]["reason"] =
-                                                                                   //         serde_json::Value::String(payload.default_silenced_string());
-                                                                                   //     // alert["endsAt"] = Utc::now()
-                                                                                   //     //     .to_rfc3339_opts(chrono::SecondsFormat::Millis, true)
-                                                                                   //     //     .into();
-                                                                                   // }
+            AlertState::Disabled => alert["labels"]["status"] = "disabled".into(),
src/alerts/alert_structs.rs (1)

466-474: Consider handling edge cases in get_single_value

The method returns 0.0 when groups is empty, which might mask actual query failures. Consider whether this should return an Option or Result<f64, Error> instead.

     pub fn get_single_value(&self) -> f64 {
         if self.is_simple_query && !self.groups.is_empty() {
             self.groups[0].aggregate_value
         } else {
             0.0
         }
     }

This could be:

pub fn get_single_value(&self) -> Option<f64> {
    if self.is_simple_query && !self.groups.is_empty() {
        Some(self.groups[0].aggregate_value)
    } else {
        None
    }
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 15d8fa8 and 2c523de.

📒 Files selected for processing (6)
  • src/alerts/alert_enums.rs (1 hunks)
  • src/alerts/alert_structs.rs (1 hunks)
  • src/alerts/alerts_utils.rs (8 hunks)
  • src/alerts/mod.rs (16 hunks)
  • src/alerts/target.rs (9 hunks)
  • src/handlers/http/alerts.rs (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/alerts/alert_enums.rs
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-07-24T11:09:21.781Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1388
File: src/alerts/mod.rs:88-104
Timestamp: 2025-07-24T11:09:21.781Z
Learning: In the Parseable alert system (src/alerts/mod.rs), alert versions are server-generated and controlled via CURRENT_ALERTS_VERSION constant, not user input. The AlertVerison enum's From<&str> implementation correctly defaults unknown versions to V2 since the server only generates known versions (v1, v2). Unknown versions would only occur in exceptional cases like file corruption, making the current fallback approach appropriate.

Applied to files:

  • src/alerts/alert_structs.rs
  • src/alerts/mod.rs
📚 Learning: 2025-06-18T06:39:04.775Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1340
File: src/query/mod.rs:64-66
Timestamp: 2025-06-18T06:39:04.775Z
Learning: In src/query/mod.rs, QUERY_SESSION_STATE and QUERY_SESSION serve different architectural purposes: QUERY_SESSION_STATE is used for stats calculation and allows dynamic registration of individual parquet files from the staging path (files created every minute), while QUERY_SESSION is used for object store queries with the global schema provider. Session contexts with schema providers don't support registering individual tables/parquets, so both session objects are necessary for their respective use cases.

Applied to files:

  • src/alerts/alerts_utils.rs
🧬 Code Graph Analysis (5)
src/alerts/alert_structs.rs (2)
src/query/mod.rs (1)
  • resolve_stream_names (505-517)
src/alerts/alert_enums.rs (1)
  • from (44-50)
src/alerts/target.rs (2)
src/storage/object_storage.rs (1)
  • target_json_path (1003-1009)
src/alerts/alert_structs.rs (3)
  • default (246-252)
  • default_resolved_string (85-87)
  • default_disabled_string (89-94)
src/alerts/alerts_utils.rs (2)
src/alerts/mod.rs (8)
  • alerts (1254-1258)
  • extract_aggregate_aliases (742-760)
  • serde_json (967-967)
  • value (414-414)
  • value (422-422)
  • value (430-430)
  • value (443-443)
  • value (454-454)
src/alerts/alert_structs.rs (3)
  • new (61-73)
  • new (116-130)
  • new (141-147)
src/handlers/http/alerts.rs (8)
src/alerts/mod.rs (2)
  • alerts (1254-1258)
  • update_notification_state (1174-1207)
src/utils/actix.rs (1)
  • extract_session_key_from_req (51-71)
src/alerts/alert_structs.rs (3)
  • new (61-73)
  • new (116-130)
  • new (141-147)
src/alerts/alert_enums.rs (1)
  • from (44-50)
src/alerts/alert_types.rs (3)
  • from (342-360)
  • from (364-382)
  • update_notification_state (168-179)
src/alerts/alert_traits.rs (2)
  • update_notification_state (52-55)
  • update_notification_state (95-99)
src/utils/mod.rs (1)
  • user_auth_for_query (79-88)
src/storage/object_storage.rs (1)
  • alert_json_path (997-999)
src/alerts/mod.rs (4)
src/alerts/alert_enums.rs (9)
  • fmt (64-71)
  • fmt (82-87)
  • fmt (99-105)
  • fmt (126-135)
  • fmt (197-200)
  • fmt (215-224)
  • fmt (243-249)
  • fmt (266-277)
  • from (44-50)
src/alerts/alerts_utils.rs (1)
  • fmt (484-490)
src/alerts/alert_structs.rs (4)
  • default (246-252)
  • new (61-73)
  • new (116-130)
  • new (141-147)
src/alerts/alert_types.rs (2)
  • from (342-360)
  • from (364-382)
🔇 Additional comments (12)
src/alerts/target.rs (1)

224-264: LGTM! Well-structured notification config handling

The refactoring to pass notification_config via context instead of storing it in Target is a good architectural decision. This promotes better separation of concerns and makes the configuration more flexible.

src/alerts/alerts_utils.rs (2)

269-279: Good concurrency improvement!

Excellent work minimizing lock contention by cloning the Arc reference while briefly holding the read lock, then releasing it before performing the state update. This pattern prevents potential deadlocks and improves concurrent access performance.


151-219: LGTM! Well-structured group result conversion

The implementation properly handles both simple and grouped queries, with good error handling and clear logic flow. The use of extract_aggregate_aliases to determine the aggregate column is well thought out.

src/handlers/http/alerts.rs (4)

46-74: LGTM! Simplified list endpoint with proper tag filtering

The simplified implementation without pagination is cleaner and the tag filtering logic is well-structured. The early return for empty tags after splitting prevents potential issues.


84-94: Good validation for notification intervals

The validation checks for non-zero eval frequency and notification interval are essential to prevent division by zero and ensure proper alert functioning.


216-235: Robust notification state parsing

The implementation handles multiple input formats (notify, indefinite, human time, UTC datetime) gracefully with proper error messages. Good defensive programming!


307-323: Good state validation in enable_alert

The check to ensure an alert is disabled before enabling it prevents unnecessary state transitions and maintains state consistency.

src/alerts/mod.rs (3)

1003-1008: LGTM! Proper handling of disabled alerts during load

The code correctly handles disabled alerts by inserting them into the map without creating tasks, properly using continue instead of return.


1036-1095: Excellent concurrency optimization!

Great work on the parallelized authorization checks using futures::future::join_all. This significantly improves performance when listing many alerts. The pattern of collecting alerts first, releasing the lock, then performing expensive auth checks is well-designed.


1146-1168: Well-structured state transition logic

The implementation properly handles the lifecycle of disabled alerts:

  • Deletes the task when disabling
  • Creates the task when re-enabling
  • Validates state transitions to prevent invalid operations
src/alerts/alert_structs.rs (2)

274-321: LGTM! Well-structured alert request validation

The AlertRequest implementation properly validates target IDs and handles different alert types with appropriate error messages. The automatic extraction of datasets from the query is a nice touch.


373-410: Clean response transformation logic

The to_response method provides a clean separation between internal and external representations, properly handling the polymorphic alert_type field.

@parmesant parmesant force-pushed the alerts-updates branch 2 times, most recently from a15baca to 309b973 Compare August 12, 2025 04:12
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🔭 Outside diff range comments (3)
src/handlers/http/alerts.rs (1)

169-201: Align deletion order with comment and reduce race potential

Comment says “disk, scheduled tasks, then memory” but code does disk → memory → scheduled. Deleting the task before removing from memory can also prevent a transient window where a scheduled task runs while memory entry is already gone.

     // delete from Object Store
     store
         .delete_object(&alert_path)
         .await
         .map_err(AlertError::ObjectStorage)?;
 
-    // delete from memory
-    alerts.delete(alert_id).await?;
-
-    // delete the scheduled task
-    alerts.delete_task(alert_id).await?;
+    // delete the scheduled task
+    alerts.delete_task(alert_id).await?;
+
+    // delete from memory
+    alerts.delete(alert_id).await?;
src/alerts/mod.rs (2)

593-603: Honor the alert’s current notification_state in context

Using NotificationState::Notify here ignores the configured mute/snooze state and can lead to notifications being sent when they shouldn’t. Pass through self.notification_state.

-            AlertInfo::new(
+            AlertInfo::new(
                 self.id,
                 self.title.clone(),
                 self.state,
-                alert_enums::NotificationState::Notify,
+                self.notification_state.clone(),
                 self.severity.clone().to_string(),
             ),

886-911: Return the right HTTP status for authorization and enterprise-only features

  • Unauthorized should map to 403 Forbidden (or 401 if you expect auth challenges), not 400.
  • NotPresentInOSS could be 403 (entitlement) or 501; 403 is usually clearer.
-            Self::Unauthorized => StatusCode::BAD_REQUEST,
+            Self::Unauthorized => StatusCode::FORBIDDEN,
@@
-            Self::NotPresentInOSS(_) => StatusCode::BAD_REQUEST,
+            Self::NotPresentInOSS(_) => StatusCode::FORBIDDEN,
♻️ Duplicate comments (2)
src/handlers/http/alerts.rs (1)

417-445: Authorization added to evaluate_alert — looks good

The earlier auth gap is now addressed by extracting the session and calling user_auth_for_query. Also consider switching to a read lock for ALERTS here.

-    let guard = ALERTS.write().await;
+    let guard = ALERTS.read().await;
src/alerts/mod.rs (1)

829-833: Compile error: sum() over Result<usize, _> doesn’t implement Sum

This repeats a previously raised issue. Use try_fold to accumulate and propagate errors.

-            plan.inputs()
-                .iter()
-                .map(|input| _get_number_of_agg_exprs(input))
-                .sum()
+            plan.inputs()
+                .iter()
+                .try_fold(0usize, |acc, input| {
+                    _get_number_of_agg_exprs(input).map(|n| acc + n)
+                })
🧹 Nitpick comments (7)
src/handlers/http/alerts.rs (4)

120-126: Prefer read lock when only accessing the manager instance

You’re only reading ALERTS to fetch the manager; using a write lock increases contention unnecessarily.

-    let guard = ALERTS.write().await;
+    let guard = ALERTS.read().await;

279-285: Don’t pass empty string as trigger message for state updates

Passing Some("".into()) is ambiguous. If no notification needs to be triggered, pass None.

-    alerts
-        .update_state(alert_id, AlertState::Disabled, Some("".into()))
+    alerts
+        .update_state(alert_id, AlertState::Disabled, None)
         .await?;

317-323: Same here: use None for optional trigger message

-    alerts
-        .update_state(alert_id, AlertState::NotTriggered, Some("".into()))
+    alerts
+        .update_state(alert_id, AlertState::NotTriggered, None)
         .await?;

336-342: Use a read lock for fetching the manager

This handler only needs to read the ALERTS option and then call methods on the manager; a read lock is sufficient.

-    let guard = ALERTS.write().await;
+    let guard = ALERTS.read().await;
src/alerts/mod.rs (3)

742-760: Avoid string matching for aggregate detection where possible

extract_aggregate_aliases falls back to string contains for “count/sum/avg/min/max”. Prefer exhaustive pattern matching over Expr variants to avoid false positives across DataFusion versions.

If supporting all variants is complex right now, consider adding comments and a unit test to guard this behavior across DataFusion upgrades.

Also applies to: 763-808


1000-1005: Comment drift: “paused” vs “disabled”

The comment still mentions “paused” while the state is Disabled. Update the comment to avoid confusion.

-    // Create alert task iff alert's state is not paused
+    // Create alert task iff alert is not disabled

1033-1092: Nice concurrency: release lock before auth checks; minor clone optimization

Releasing the map lock before auth checks is correct. In the tags-empty branch you clone session for each future. You can move one clone outside the iterator to avoid repeated clones.

-            let futures: Vec<_> = all_alerts
+            let session = session.clone();
+            let futures: Vec<_> = all_alerts
                 .into_iter()
                 .map(|alert| async {
-                    if user_auth_for_query(&session.clone(), &alert.query)
+                    if user_auth_for_query(&session, &alert.query)
                         .await
                         .is_ok()
                     {
                         Some(alert)
                     } else {
                         None
                     }
                 })
                 .collect();
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a15baca and 309b973.

📒 Files selected for processing (3)
  • src/alerts/alert_structs.rs (1 hunks)
  • src/alerts/mod.rs (16 hunks)
  • src/handlers/http/alerts.rs (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/alerts/alert_structs.rs
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-24T11:09:21.781Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1388
File: src/alerts/mod.rs:88-104
Timestamp: 2025-07-24T11:09:21.781Z
Learning: In the Parseable alert system (src/alerts/mod.rs), alert versions are server-generated and controlled via CURRENT_ALERTS_VERSION constant, not user input. The AlertVerison enum's From<&str> implementation correctly defaults unknown versions to V2 since the server only generates known versions (v1, v2). Unknown versions would only occur in exceptional cases like file corruption, making the current fallback approach appropriate.

Applied to files:

  • src/alerts/mod.rs
🧬 Code Graph Analysis (2)
src/handlers/http/alerts.rs (6)
src/alerts/mod.rs (2)
  • alerts (1251-1255)
  • update_notification_state (1171-1204)
src/alerts/alert_structs.rs (3)
  • new (61-73)
  • new (116-130)
  • new (141-147)
src/alerts/alert_enums.rs (1)
  • from (44-50)
src/alerts/alert_types.rs (3)
  • from (342-360)
  • from (364-382)
  • update_notification_state (168-179)
src/alerts/alert_traits.rs (2)
  • update_notification_state (52-55)
  • update_notification_state (95-99)
src/utils/mod.rs (1)
  • user_auth_for_query (79-88)
src/alerts/mod.rs (3)
src/alerts/alert_enums.rs (9)
  • fmt (64-71)
  • fmt (82-87)
  • fmt (99-105)
  • fmt (126-135)
  • fmt (197-200)
  • fmt (215-224)
  • fmt (243-249)
  • fmt (266-277)
  • from (44-50)
src/alerts/alert_types.rs (3)
  • from (342-360)
  • from (364-382)
  • update_notification_state (168-179)
src/alerts/alert_traits.rs (2)
  • update_notification_state (52-55)
  • update_notification_state (95-99)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: coverage
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
🔇 Additional comments (4)
src/handlers/http/alerts.rs (1)

47-60: Tag filtering flow is solid

Parsing tags from query params with trimming and rejecting empty lists is correct and avoids accidental broad queries. LGTM.

src/alerts/mod.rs (3)

133-137: Good: initialize v2 with sane notification defaults

Setting notification_state: Notify and notification_config: default() during migration is sensible.


638-642: Adding notificationState to summary is a nice touch

This will help clients reflect mute/snooze status in lists.


1279-1339: Updated summary logic looks consistent with new states

Counts and top-5 lists by severity per state look correct. LGTM.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (2)
src/migration/metadata_migration.rs (2)

114-118: Eliminate double-unwrap on server_mode and accept legacy capitalization.

Current code can panic if server_mode exists but is not a string. Also, handle both "All" and "all" to be robust to historical casing.

Apply this diff:

-    let sm = metadata.get("server_mode");
- 
-    if sm.is_none() || sm.unwrap().as_str().unwrap() == "All" {
+    // Guard against non-string values and handle legacy capitalization
+    if matches!(
+        metadata.get("server_mode").and_then(|v| v.as_str()),
+        None | Some("All") | Some("all")
+    ) {
         metadata.insert("server_mode".to_string(), json!(PARSEABLE.options.mode));
     }

62-108: Add unit tests for metadata migration functions

We currently lack tests for src/migration/metadata_migration.rs—all coverage today is in stream migrations. Please add targeted, table-driven unit tests for:

  • v2_v3
    • Version bumps to “v3”
    • “ingester” → “ingestor” normalization in each user’s role array
    • Resulting roles field populated correctly

  • v3_v4
    • Version bumps to “v4”
    server_mode inserted from PARSEABLE.options.mode
    • Handle cases where the field is missing or set to "All"/"all"

  • v4_v5
    • Removal of version field
    • Role/value behavior for "Query"/"Querier"/"all"/"All" once variants are finalized

  • v5_v6
    • Removal of resource.tag
    • Backfill of user_groups

Suggested test locations:

  • src/migration/metadata_migration.rs, around functions v2_v3…v5_v6

If it’s helpful, I can scaffold table-driven Rust tests for each path.

♻️ Duplicate comments (5)
src/handlers/http/alerts.rs (3)

361-365: Apply consistent ceiling division logic as in POST handler

Maintain consistency with the POST handler by using the same ceiling division logic for calculating notification retries.

Apply this diff to match the POST handler logic:

-    let times = if (eval_freq / notif_freq) == 0 {
-        1
-    } else {
-        (eval_freq / notif_freq) as usize
-    };
+    let times_u64 = std::cmp::max(1, (eval_freq + notif_freq - 1) / notif_freq);
+    let times = usize::try_from(times_u64).unwrap_or(usize::MAX);

Consider extracting this into a helper function to avoid duplication:

fn calculate_notification_retries(eval_freq: u64, notif_freq: u64) -> usize {
    let times_u64 = std::cmp::max(1, (eval_freq + notif_freq - 1) / notif_freq);
    usize::try_from(times_u64).unwrap_or(usize::MAX)
}

220-223: Fix: Cannot add std::time::Duration directly to DateTime<Utc>

The code attempts to add a std::time::Duration (from humantime::parse_duration) directly to DateTime<Utc>, which doesn't compile. You need to convert it to chrono::Duration first.

Apply this diff to fix the compilation error:

-            let till_time = if let Ok(duration) =
-                humantime::parse_duration(&new_notification_state.state)
-            {
-                (Utc::now() + duration).to_rfc3339()
+            let till_time = if let Ok(duration) =
+                humantime::parse_duration(&new_notification_state.state)
+            {
+                let chrono_dur = chrono::Duration::from_std(duration)
+                    .map_err(|_| AlertError::InvalidStateChange("Invalid duration overflow".into()))?;
+                (Utc::now() + chrono_dur).to_rfc3339()

98-102: Use ceiling division for consistent notification retry calculations

The current floor division logic can result in times = 0 when notif_freq > eval_freq, which would then be set to 1 but creates semantic inconsistency. Using ceiling division better captures the "number of notifications within one eval window" intent.

Apply this diff to use ceiling division:

-    let times = if (eval_freq / notif_freq) == 0 {
-        1
-    } else {
-        (eval_freq / notif_freq) as usize
-    };
+    let times_u64 = std::cmp::max(1, (eval_freq + notif_freq - 1) / notif_freq);
+    let times = usize::try_from(times_u64).unwrap_or(usize::MAX);
src/alerts/mod.rs (2)

561-566: Handle v1 "silenced" state migration properly

The v1 state "silenced" should map to v2 AlertState::Disabled for backward compatibility, but this case is missing. Without it, silenced alerts will incorrectly be migrated as NotTriggered.

Apply this diff to handle the "silenced" state:

     fn extract_state(alert_json: &JsonValue) -> AlertState {
         let state_str = alert_json["state"].as_str().unwrap_or("resolved");
         match state_str.to_lowercase().as_str() {
             "triggered" => AlertState::Triggered,
             "resolved" => AlertState::NotTriggered,
+            "silenced" => AlertState::Disabled,
             _ => AlertState::NotTriggered,
         }
     }

829-832: Fix: Iterator::sum() over Result values does not compile

The code attempts to call sum() on an iterator of Result<usize, _> values, but Result doesn't implement the Sum trait. This will fail to compile.

Apply this diff using try_fold to properly handle the Result types:

-            plan.inputs()
-                .iter()
-                .map(|input| _get_number_of_agg_exprs(input))
-                .sum()
+            plan.inputs()
+                .iter()
+                .try_fold(0usize, |acc, input| {
+                    _get_number_of_agg_exprs(input).map(|n| acc + n)
+                })
🧹 Nitpick comments (4)
src/migration/metadata_migration.rs (2)

80-80: Typo in panic message.

"arrray" → "array".

-            panic!("privileges is an arrray")
+            panic!("privileges is an array")

88-92: DRY the privilege normalization into a helper.

The "ingester" → "ingestor" normalization appears in two places. Extracting it avoids duplication and future drift.

Example helper and usage (apply outside the selected lines as appropriate):

fn normalize_privilege_value(value: &mut JsonValue) {
    if matches!(value.as_str(), Some("ingester")) {
        *value = JsonValue::String("ingestor".to_string());
    }
}

Then replace the inlined checks with:

if let Some(value) = privilege.get_mut("privilege") {
    normalize_privilege_value(value);
}

Also applies to: 127-131

src/handlers/http/alerts.rs (1)

395-413: Persist to storage before updating in-memory state

The current order updates in-memory state before persisting to storage. If persistence fails, the in-memory state would be inconsistent with stored state. Consider persisting first, then updating memory.

Apply this diff to ensure consistency:

-    // save the new alert in ObjectStore
-    alerts.update(new_alert).await;
     let path = alert_json_path(*new_alert.get_id());
     let store = PARSEABLE.storage.get_object_store();
     let alert_bytes = serde_json::to_vec(&new_alert.to_alert_config())?;
     store.put_object(&path, Bytes::from(alert_bytes)).await?;

+    // update memory only after successful persistence
+    alerts.update(new_alert).await;
+
     // remove the task
     alerts.delete_task(alert_id).await?;

     // remove alert from memory
     alerts.delete(alert_id).await?;

     let config = new_alert.to_alert_config().to_response();

     // start the task
     alerts.start_task(new_alert.clone_box()).await?;
src/alerts/target.rs (1)

452-454: Fix typo in disabled state message

There's a typo in the disabled state message returned by default_disabled_string(). The message says "till the sate is disabled" but should say "state".

The typo is in the alert_structs.rs file as shown in the code snippets. Consider fixing it:

-            "{} is now `disabled`. No more evals will be run till the sate is `disabled`.",
+            "{} is now `disabled`. No more evals will be run till the state is changed.",
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 309b973 and 838191a.

📒 Files selected for processing (5)
  • src/alerts/alert_structs.rs (1 hunks)
  • src/alerts/mod.rs (16 hunks)
  • src/alerts/target.rs (9 hunks)
  • src/handlers/http/alerts.rs (7 hunks)
  • src/migration/metadata_migration.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/alerts/alert_structs.rs
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-24T11:09:21.781Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1388
File: src/alerts/mod.rs:88-104
Timestamp: 2025-07-24T11:09:21.781Z
Learning: In the Parseable alert system (src/alerts/mod.rs), alert versions are server-generated and controlled via CURRENT_ALERTS_VERSION constant, not user input. The AlertVerison enum's From<&str> implementation correctly defaults unknown versions to V2 since the server only generates known versions (v1, v2). Unknown versions would only occur in exceptional cases like file corruption, making the current fallback approach appropriate.

Applied to files:

  • src/alerts/mod.rs
🧬 Code Graph Analysis (4)
src/migration/metadata_migration.rs (1)
src/handlers/http/modal/mod.rs (1)
  • as_str (237-245)
src/handlers/http/alerts.rs (6)
src/utils/actix.rs (1)
  • extract_session_key_from_req (51-71)
src/alerts/alert_structs.rs (3)
  • new (61-73)
  • new (109-123)
  • new (134-140)
src/alerts/alert_types.rs (3)
  • from (342-360)
  • from (364-382)
  • update_notification_state (168-179)
src/alerts/alert_traits.rs (2)
  • update_notification_state (52-55)
  • update_notification_state (95-99)
src/storage/object_storage.rs (1)
  • alert_json_path (997-999)
src/alerts/alerts_utils.rs (1)
  • evaluate_alert (58-64)
src/alerts/target.rs (1)
src/alerts/alert_structs.rs (3)
  • default (239-245)
  • default_resolved_string (85-87)
  • default_disabled_string (89-94)
src/alerts/mod.rs (3)
src/alerts/alert_enums.rs (9)
  • fmt (64-71)
  • fmt (82-87)
  • fmt (99-105)
  • fmt (126-135)
  • fmt (197-200)
  • fmt (215-224)
  • fmt (243-249)
  • fmt (266-277)
  • from (44-50)
src/alerts/alert_types.rs (3)
  • from (342-360)
  • from (364-382)
  • update_notification_state (168-179)
src/alerts/alert_traits.rs (2)
  • update_notification_state (52-55)
  • update_notification_state (95-99)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: coverage
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
🔇 Additional comments (6)
src/migration/metadata_migration.rs (3)

88-92: Good fix: removed unwrap risk and normalized "ingester" → "ingestor".

Using matches!(value.as_str(), Some("ingester")) avoids a panic if the JSON value isn’t a string and correctly rewrites to the canonical "ingestor" (consistent with NodeType::Ingestor -> "ingestor" in src/handlers/http/modal/mod.rs).


127-131: Good fix duplicated for v3_v4.

Same safe guard applied here; avoids panic and normalizes the privilege value consistently.


146-157: Ignore server_mode normalization suggestion

The server_mode field in your metadata migrations is always populated via json!(PARSEABLE.options.mode), which serializes the Mode enum as its variant name (“Query”, “Ingest”, “Index”, “Prism”, “All”). There are no legacy or lowercase “all”/“query” values in storage—those lowercase strings come from the separate NodeType enum used for cluster‐node metadata, not the Mode enum used here. All existing migrations and code/tests expect the uppercase Mode names, so no normalization for lowercase or “querier” variants is needed.

Likely an incorrect or invalid review comment.

src/handlers/http/alerts.rs (1)

416-443: Add authorization check to evaluate_alert endpoint

The evaluate_alert endpoint performs authorization on line 432, which is good. The implementation correctly validates user access before forcing alert evaluation.

src/alerts/mod.rs (2)

1000-1003: LGTM: Correctly handling disabled alerts during load

The code properly checks if an alert is disabled and continues to the next alert instead of creating a task for it. This fixes the early return bug that was present in previous versions.


133-134: LGTM: Proper initialization of notification fields during migration

The v1 to v2 migration correctly initializes the new notification fields with sensible defaults (NotificationState::Notify and NotificationConfig::default()).

coderabbitai[bot]
coderabbitai bot previously approved these changes Aug 12, 2025
deadlock issue with alert updates
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (7)
src/handlers/http/alerts.rs (5)

406-433: Consider adding authorization checks to evaluate_alert

The evaluate_alert endpoint allows forcing immediate alert evaluation. While it does check user access to the alert's query, consider whether additional authorization is needed to prevent abuse of this functionality.

The current implementation validates query access, but you may want to add rate limiting or additional permission checks to prevent users from triggering excessive evaluations.


95-104: Use ceiling division for notification retry calculation

The current floor division can result in times = 0 when notif_freq > eval_freq, which gets coerced to 1. This doesn't accurately represent the retry behavior. Use ceiling division for more accurate retry count calculation.

Apply this fix to use ceiling division:

 // calculate the `times` for notification config
 let eval_freq = alert.get_eval_frequency();
 let notif_freq = alert.notification_config.interval;
-let times = if (eval_freq / notif_freq) == 0 {
-    1
-} else {
-    (eval_freq / notif_freq) as usize
-};
+// Use ceiling division: (a + b - 1) / b for b > 0
+let times_u64 = std::cmp::max(1, (eval_freq + notif_freq - 1) / notif_freq);
+let times = usize::try_from(times_u64).unwrap_or(usize::MAX);

220-224: Fix compilation error: Cannot add std::time::Duration to DateTime

humantime::parse_duration returns std::time::Duration which cannot be directly added to DateTime<Utc>. You need to convert it to chrono::Duration first.

Apply this fix to convert the duration properly:

 let till_time = if let Ok(duration) =
     humantime::parse_duration(&new_notification_state.state)
 {
-    (Utc::now() + duration).to_rfc3339()
+    let chrono_dur = chrono::Duration::from_std(duration)
+        .map_err(|_| AlertError::InvalidStateChange("Invalid duration overflow".into()))?;
+    (Utc::now() + chrono_dur).to_rfc3339()

358-365: Apply consistent ceiling division for notification retries in update path

The notification retry calculation here uses floor division, which differs from the recommended ceiling division approach in the POST handler. This inconsistency could lead to different behavior between creation and update.

Apply the same ceiling division logic:

 // Calculate notification config
 let eval_freq = new_config.get_eval_frequency();
 let notif_freq = new_config.notification_config.interval;
-let times = if (eval_freq / notif_freq) == 0 {
-    1
-} else {
-    (eval_freq / notif_freq) as usize
-};
+// Use ceiling division for consistency
+let times_u64 = std::cmp::max(1, (eval_freq + notif_freq - 1) / notif_freq);
+let times = usize::try_from(times_u64).unwrap_or(usize::MAX);

Consider extracting this calculation into a helper function to avoid duplication between POST and PUT handlers:

fn calculate_notification_retries(eval_freq: u64, notif_freq: u64) -> usize {
    let times_u64 = std::cmp::max(1, (eval_freq + notif_freq - 1) / notif_freq);
    usize::try_from(times_u64).unwrap_or(usize::MAX)
}

389-402: Persist to storage before updating in-memory state

The code updates the in-memory state before persisting to storage. If persistence fails, the system will have inconsistent state between memory and disk. This could lead to data loss on restart.

Reorder operations to persist first:

-// Perform I/O operations
 let path = alert_json_path(*new_alert.get_id());
 let store = PARSEABLE.storage.get_object_store();
 let alert_bytes = serde_json::to_vec(&new_alert.to_alert_config())?;
 store.put_object(&path, Bytes::from(alert_bytes)).await?;

-// Now perform the atomic operations
+// Only update in-memory state after successful persistence
 alerts.delete_task(alert_id).await?;
 alerts.delete(alert_id).await?;
 alerts.update(&*new_alert).await;
 alerts.start_task(new_alert.clone_box()).await?;
src/alerts/mod.rs (2)

564-566: Add support for v1 "silenced" state during migration

The v1 "silenced" state is commented out in the migration logic. This will cause silenced v1 alerts to be incorrectly migrated as NotTriggered instead of Disabled.

Uncomment the "silenced" case to properly handle v1 alerts:

 match state_str.to_lowercase().as_str() {
     "triggered" => AlertState::Triggered,
     "resolved" => AlertState::NotTriggered,
+    "silenced" => AlertState::Disabled,
     _ => AlertState::NotTriggered,
 }

826-832: Iterator::sum() over Result values does not compile

The code attempts to use sum() on an iterator of Result<usize, _> values, but Result doesn't implement the Sum trait. This will cause a compilation error.

Replace with try_fold to properly handle Results:

 // Use inputs() method to get all input plans
-plan.inputs()
-    .iter()
-    .map(|input| _get_number_of_agg_exprs(input))
-    .sum()
+plan.inputs()
+    .iter()
+    .try_fold(0usize, |acc, input| {
+        _get_number_of_agg_exprs(input).map(|n| acc + n)
+    })
🧹 Nitpick comments (1)
src/alerts/target.rs (1)

271-299: Potential deadlock risk when accessing ALERTS within spawned task

The spawned timeout task acquires a read lock on ALERTS to fetch alert state. If this runs while other operations hold write locks on ALERTS, it could contribute to lock contention. Consider passing necessary state data to avoid repeated global lock access.

Consider refactoring to minimize lock access within the spawned task by passing the necessary data upfront or using a more granular locking strategy.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 838191a and 4c5784b.

📒 Files selected for processing (3)
  • src/alerts/mod.rs (18 hunks)
  • src/alerts/target.rs (10 hunks)
  • src/handlers/http/alerts.rs (7 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-24T11:09:21.781Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1388
File: src/alerts/mod.rs:88-104
Timestamp: 2025-07-24T11:09:21.781Z
Learning: In the Parseable alert system (src/alerts/mod.rs), alert versions are server-generated and controlled via CURRENT_ALERTS_VERSION constant, not user input. The AlertVerison enum's From<&str> implementation correctly defaults unknown versions to V2 since the server only generates known versions (v1, v2). Unknown versions would only occur in exceptional cases like file corruption, making the current fallback approach appropriate.

Applied to files:

  • src/alerts/mod.rs
🧬 Code Graph Analysis (2)
src/alerts/mod.rs (3)
src/alerts/alert_enums.rs (9)
  • fmt (64-71)
  • fmt (82-87)
  • fmt (99-105)
  • fmt (126-135)
  • fmt (197-200)
  • fmt (215-224)
  • fmt (243-249)
  • fmt (266-277)
  • from (44-50)
src/alerts/alert_types.rs (3)
  • from (342-360)
  • from (364-382)
  • update_notification_state (168-179)
src/alerts/alert_traits.rs (2)
  • update_notification_state (52-55)
  • update_notification_state (95-99)
src/handlers/http/alerts.rs (5)
src/alerts/alert_types.rs (3)
  • from (342-360)
  • from (364-382)
  • update_notification_state (168-179)
src/alerts/alert_enums.rs (1)
  • from (44-50)
src/alerts/alert_traits.rs (2)
  • update_notification_state (52-55)
  • update_notification_state (95-99)
src/utils/mod.rs (1)
  • user_auth_for_query (79-88)
src/storage/object_storage.rs (1)
  • alert_json_path (997-999)
🔇 Additional comments (3)
src/alerts/mod.rs (3)

1000-1003: Incorrect early return when loading disabled alerts

The code uses continue correctly for disabled alerts. The past review comment about using return instead of continue appears to be outdated.


1032-1091: Optimize alert listing to avoid holding locks during expensive operations

Good optimization here - the code correctly releases the read lock before performing expensive authorization checks and uses parallel futures for better performance.


1120-1177: Well-structured state management with proper lock handling

The implementation correctly manages lock scopes to avoid deadlocks and properly handles task creation/deletion based on state transitions. The pattern of collecting necessary data with a read lock, releasing it, performing operations, and then briefly acquiring a write lock for updates is a good practice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant