
feat: add api for preparing demo data #1370


Merged
merged 13 commits on Jul 15, 2025

Conversation

nikhilsinhaparseable
Contributor

@nikhilsinhaparseable nikhilsinhaparseable commented Jul 10, 2025

API: GET /api/v1/demodata?action=ingest
ingests 10k records using the bundled script

GET /api/v1/demodata?action=filters
creates saved SQLs and saved filters from the script

GET /api/v1/demodata?action=alerts
creates a webhook target and 3 alerts

GET /api/v1/demodata?action=dashboards
creates a dashboard with 4 tiles: timeseries, line, area, and bar charts
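
For a quick smoke test, the endpoints can be exercised from a shell. A minimal sketch, assuming a local Parseable instance on port 8000 with the default admin/admin credentials (adjust the URL and credentials for your deployment):

# Base URL and credentials below are assumptions for a local test instance.
P_URL="http://localhost:8000"
AUTH="admin:admin"

# Request each demo-data action; the server responds with 202 Accepted and
# performs the work asynchronously.
for action in ingest filters alerts dashboards; do
    curl -s -u "$AUTH" "$P_URL/api/v1/demodata?action=$action"
    echo "requested action: $action"
done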

Summary by CodeRabbit

  • New Features
    • Introduced a new HTTP endpoint for demo data operations, enabling users to trigger demo data ingestion, filter creation, alert setup, and dashboard management via API.
    • Added asynchronous execution and forwarding of demo data actions with detailed status responses.
    • Provided a comprehensive, configurable utility script for demo data ingestion, filter creation, alert setup, and dashboard management.
    • Enhanced server routing to support demo data API endpoints in multiple server modes.
    • Improved error handling with specific responses for invalid or missing query parameters.

Contributor

coderabbitai bot commented Jul 10, 2025

Walkthrough

This change introduces a new demo data utility: a Bash script that performs data ingestion, filter creation, alert setup, and dashboard creation via the HTTP API. The backend adds new HTTP endpoints and handlers to support these operations, with robust error handling and support for multiple server modes. Supporting modules and routes are updated accordingly.

Changes

File(s) / Change Summary

  • resources/ingest_demo_data.sh: Added a comprehensive Bash script for demo data ingestion, filter creation, alert setup, and dashboard management with robust retry logic.
  • src/handlers/http/cluster/mod.rs: Added async function to forward demo data requests to a live ingestor node; improved result handling in metrics fetch.
  • src/handlers/http/demo_data.rs: Implemented HTTP handler for demo data actions, executing or forwarding to the script based on server mode and action.
  • src/handlers/http/ingest.rs: Added InvalidQueryParameter and MissingQueryParameter variants to PostError with error messages and status codes.
  • src/handlers/http/mod.rs: Declared new public module demo_data.
  • src/handlers/http/modal/ingest_server.rs, src/handlers/http/modal/query_server.rs: Registered new demo data service route in ingest and query server API scopes.
  • src/handlers/http/modal/server.rs: Added demo data web scope and route; exposed as a public method in Server.
  • Cargo.toml: Added tempfile crate dependency.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant HTTP_Server
    participant BashScript
    participant IngestorNode

    Client->>HTTP_Server: GET /demodata?action=ingest|filters|alerts|dashboards
    HTTP_Server->>HTTP_Server: Parse query params, determine action
    alt action == "ingest" and mode == Ingest/All
        HTTP_Server->>BashScript: Execute ingest_demo_data.sh (ingest)
        BashScript->>HTTP_Server: Ingest demo data via API
    else action == "ingest" and mode == Query/Prism
        HTTP_Server->>IngestorNode: Forward request to ingestor
        IngestorNode->>BashScript: Execute ingest_demo_data.sh (ingest)
    else action == "filters" or "alerts" or "dashboards"
        HTTP_Server->>BashScript: Execute ingest_demo_data.sh (filters/alerts/dashboards)
    end
    HTTP_Server-->>Client: 202 Accepted / Error

Poem

In the warren, scripts now hop,
Demo data, filters—nonstop!
Alerts ring out with bashful cheer,
Dashboards bloom, the path is clear.
With endpoints new, the code’s delight,
The rabbits code through day and night!
🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b0b5b2c and 898ce0d.

📒 Files selected for processing (1)
  • resources/ingest_demo_data.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • resources/ingest_demo_data.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: coverage

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
resources/filters_demo_data.sh (1)

38-144: Address static analysis warnings and improve curl function.

The curl_with_retry function is well-implemented but has a couple of issues flagged by static analysis:

curl_with_retry() {
    local url="$1"
    local method="$2"
    local data="$3"
    local content_type="${4:-application/json}"
    local max_retries="${5:-3}"
-   local base_timeout="${6:-15}"
    local retry_count=0
    
    # ... rest of function ...
    
    # Extract status code and response body separately to avoid masking return values
-   local status_code
    if [[ -n "$response" ]]; then
-       status_code=$(echo "$response" | tail -n1)
+       local status_code
+       status_code=$(echo "$response" | tail -n1)
        local response_body=$(echo "$response" | sed '$d')

The base_timeout parameter is unused and can be removed since you're using a fixed timeout calculation pattern.

resources/ingest_demo_data.sh (1)

43-149: Same static analysis issues as filters script - fix unused variable and declaration pattern.

The curl_with_retry function has identical issues to the filters script:

curl_with_retry() {
    local url="$1"
    local method="$2"
    local data="$3"
    local content_type="${4:-application/json}"
    local max_retries="${5:-3}"
-   local base_timeout="${6:-15}"
    local retry_count=0
    
    # ... rest of function ...
    
-   local status_code
    if [[ -n "$response" ]]; then
-       status_code=$(echo "$response" | tail -n1)
+       local status_code
+       status_code=$(echo "$response" | tail -n1)
        local response_body=$(echo "$response" | sed '$d')

Consider extracting this function to a shared utility script to avoid duplication.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c4533be and ecbf80d.

📒 Files selected for processing (9)
  • resources/filters_demo_data.sh (1 hunks)
  • resources/ingest_demo_data.sh (1 hunks)
  • src/handlers/http/cluster/mod.rs (2 hunks)
  • src/handlers/http/demo_data.rs (1 hunks)
  • src/handlers/http/ingest.rs (2 hunks)
  • src/handlers/http/mod.rs (1 hunks)
  • src/handlers/http/modal/ingest_server.rs (1 hunks)
  • src/handlers/http/modal/query_server.rs (1 hunks)
  • src/handlers/http/modal/server.rs (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
src/handlers/http/demo_data.rs (1)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:319-331
Timestamp: 2025-06-16T09:50:38.636Z
Learning: In Parseable's Ingest or Query mode, the node_id is always available because it's generated during server initialization itself, before the get_node_id_string() function in streams.rs would be called. This makes the .expect() calls on QUERIER_META.get() and INGESTOR_META.get() safe in this context.
src/handlers/http/cluster/mod.rs (2)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:319-331
Timestamp: 2025-06-16T09:50:38.636Z
Learning: In Parseable's Ingest or Query mode, the node_id is always available because it's generated during server initialization itself, before the get_node_id_string() function in streams.rs would be called. This makes the .expect() calls on QUERIER_META.get() and INGESTOR_META.get() safe in this context.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/handlers/http/health_check.rs:81-90
Timestamp: 2025-06-16T02:04:58.990Z
Learning: In the shutdown function in src/handlers/http/health_check.rs, the design approach is to log errors from sync operations rather than propagate them. This is intentional because the shutdown function is called on SIGTERM/SIGINT signals, and the goal is to perform best-effort cleanup (syncing pending files to object storage) while allowing the shutdown to proceed regardless of sync failures. Logging provides debugging information without blocking the shutdown process.
🧬 Code Graph Analysis (2)
src/handlers/http/modal/ingest_server.rs (1)
src/handlers/http/modal/server.rs (2)
  • get_readiness_factory (692-696)
  • get_demo_data_webscope (206-208)
src/handlers/http/modal/query_server.rs (1)
src/handlers/http/modal/server.rs (1)
  • get_demo_data_webscope (206-208)
🪛 Shellcheck (0.10.0)
resources/filters_demo_data.sh

[warning] 44-44: base_timeout appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 100-100: Declare and assign separately to avoid masking return values.

(SC2155)

resources/ingest_demo_data.sh

[warning] 49-49: base_timeout appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 105-105: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: coverage
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
🔇 Additional comments (15)
src/handlers/http/mod.rs (1)

37-37: LGTM! Module declaration follows existing patterns.

The addition of the demo_data module declaration is consistent with the existing module organization and properly integrates the new functionality.

src/handlers/http/modal/query_server.rs (1)

78-79: Consistent service registration for query server.

The demo data web scope service is properly added to the query server configuration, maintaining the existing service chain pattern.

src/handlers/http/modal/ingest_server.rs (1)

81-82: Excellent consistency across server configurations.

The demo data service registration in the ingest server matches the pattern used in the query server, ensuring consistent API availability across different server modes.

src/handlers/http/modal/server.rs (3)

26-26: Import statement is correctly placed.

The import of get_demo_data from the demo_data module is properly placed with other handler imports.


101-102: Service registration maintains consistency.

The demo data web scope service is appropriately added to the main server configuration, consistent with the pattern used in query and ingest servers.


206-208: Web scope definition aligns with API requirements.

The get_demo_data_webscope() function correctly creates a scope at /demodata with a GET route, matching the API specification in the PR objectives.

src/handlers/http/ingest.rs (2)

496-499: Well-designed error variants for parameter validation.

The new InvalidQueryParameter and MissingQueryParameter error variants provide clear, descriptive error handling for query parameter validation scenarios. The error messages are user-friendly and appropriate.


529-530: Correct HTTP status code mapping.

Both new error variants appropriately map to BAD_REQUEST (400) status code, which is the correct HTTP response for client-side parameter validation errors.

src/handlers/http/cluster/mod.rs (2)

159-213: LGTM! Well-implemented ingestor forwarding function.

The function correctly:

  • Validates ingestor availability through liveness checks
  • Uses the first available ingestor for load distribution
  • Handles authentication and proper URL construction
  • Provides comprehensive error handling

The design choice to break after finding the first live ingestor is appropriate for this use case.


978-978: Good generalization of the pattern match.

Changing from Ok(None) to Ok(_) appropriately handles all successful cases that don't contain metrics without being overly specific.

resources/filters_demo_data.sh (2)

167-216: Well-structured filter creation with good error handling.

The SQL filter creation logic is solid:

  • Good variety of realistic filter queries
  • Proper JSON escaping for quotes
  • Retry logic with appropriate delays
  • Clear progress reporting

The filter definitions cover useful monitoring scenarios and will provide good demo data for users.


218-271: Comprehensive saved filter implementation.

The saved filter creation includes proper:

  • Visible column configuration
  • Group-by field specification
  • JSON array construction for columns
  • Consistent error handling pattern

The filters complement the SQL filters well and provide a complete demo experience.
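
To illustrate the escape-then-POST pattern these comments describe, here is a minimal sketch; the endpoint path and payload field names are placeholders for illustration, not the exact schema the script uses:

P_URL="${P_URL:-http://localhost:8000}"
AUTH="${P_USERNAME:-admin}:${P_PASSWORD:-admin}"
query='select status_code, count(*) from demodata where status_code >= 500 group by status_code'

# Let jq handle JSON escaping of the SQL text rather than hand-rolled sed.
payload=$(jq -n --arg name "5xx by status" --arg q "$query" \
    '{filter_name: $name, query: {filter_query: $q}}')

# Placeholder path: the real script targets Parseable's saved-filter API.
curl -s -u "$AUTH" -H "Content-Type: application/json" \
    -X POST "$P_URL/api/v1/filters" -d "$payload"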

resources/ingest_demo_data.sh (3)

161-189: Excellent performance optimization with static data pre-computation.

The approach of pre-computing all random data once and cycling through arrays is a smart optimization that will significantly improve ingestion performance:

  • Eliminates runtime random number generation overhead
  • Provides consistent, predictable data patterns
  • Reduces CPU usage during batch generation
  • Maintains data variety through array cycling

This is a well-engineered solution for high-throughput demo data generation.


192-263: Robust batch generation with good validation.

The generate_batch function includes:

  • Input validation for batch size and array initialization
  • Efficient array cycling with modulo operations
  • Proper JSON escaping for user agent strings
  • Severity mapping based on HTTP status codes
  • Size validation of generated output

The single-line JSON generation approach is efficient while maintaining readability.
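
As a compressed illustration of the pre-computed-array and modulo-cycling approach described above (field names and value pools are simplified stand-ins for what the script actually generates):

# Pre-compute small value pools once, then cycle through them while building a
# batch, avoiding per-record random generation.
STATUS_CODES=(200 200 200 404 500 503)
METHODS=(GET POST PUT DELETE)

generate_batch() {
    local batch_size="$1"
    local records=()
    for ((i = 0; i < batch_size; i++)); do
        local code="${STATUS_CODES[$((i % ${#STATUS_CODES[@]}))]}"
        local method="${METHODS[$((i % ${#METHODS[@]}))]}"
        # Map the HTTP status to a severity label, mirroring the script's approach.
        local severity="INFO"
        (( code >= 500 )) && severity="ERROR"
        records+=("{\"method\":\"$method\",\"status_code\":$code,\"severity_text\":\"$severity\"}")
    done
    # Emit a single-line JSON array, ready to POST to the ingest endpoint.
    local IFS=,
    printf '[%s]\n' "${records[*]}"
}

generate_batch 5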


312-377: Well-implemented main execution loop with comprehensive monitoring.

The execution flow provides:

  • Progress tracking with rate calculations
  • Proper error handling and early exit on failures
  • Batch size adjustment for remaining records
  • Performance statistics and optimization details
  • Appropriate delays to prevent server overload

The final performance tips output is a nice touch for users understanding the optimizations.
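
A minimal sketch of that loop shape, with the actual POST stubbed out (names and timings are illustrative):

TARGET_RECORDS=10000
BATCH_SIZE=500
RECORDS_SENT=0
START_TIME=$(date +%s)

# Stand-in for the script's real POST-to-ingest helper.
send_batch() { sleep 0.1; }

while (( RECORDS_SENT < TARGET_RECORDS )); do
    # Shrink the final batch so the loop never overshoots the target.
    remaining=$((TARGET_RECORDS - RECORDS_SENT))
    batch=$(( remaining < BATCH_SIZE ? remaining : BATCH_SIZE ))

    send_batch "$batch" || { echo "batch failed, aborting" >&2; exit 1; }
    RECORDS_SENT=$((RECORDS_SENT + batch))

    elapsed=$(( $(date +%s) - START_TIME ))
    rate=$(( RECORDS_SENT / (elapsed > 0 ? elapsed : 1) ))
    echo "progress: ${RECORDS_SENT}/${TARGET_RECORDS} (${rate} records/sec)"
    sleep 1   # brief pause between batches to avoid overloading the server
done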

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
src/handlers/http/demo_data.rs (2)

33-90: LGTM with remaining security considerations.

The handler implementation correctly validates query parameters, uses appropriate async execution patterns, and returns HTTP 202 for fire-and-forget operations. However, there are still security concerns that need to be addressed.

The implementation addresses most requirements but still has the credential exposure issue mentioned in previous reviews. Consider implementing credential passing via secure temporary files instead of environment variables.


92-151: Improve security by avoiding credential exposure via environment variables.

The script execution function has good error handling and temporary file management, but still passes sensitive credentials through environment variables (lines 125-127), which can be visible in process lists.

Consider this more secure approach for passing credentials:

async fn execute_demo_script(
    action: &str,
    url: &str,
    username: &str,
    password: &str,
) -> Result<(), anyhow::Error> {
+   // Create secure temporary file for credentials
+   let cred_file = tempfile::NamedTempFile::new()
+       .map_err(|e| anyhow::anyhow!("Failed to create credentials file: {}", e))?;
+   
+   let cred_content = format!("P_URL={}\nP_USERNAME={}\nP_PASSWORD={}", url, username, password);
+   fs::write(cred_file.path(), cred_content)
+       .map_err(|e| anyhow::anyhow!("Failed to write credentials: {}", e))?;
+   
+   #[cfg(unix)]
+   {
+       let mut permissions = fs::metadata(cred_file.path())?.permissions();
+       permissions.set_mode(0o600); // Read/write for owner only
+       fs::set_permissions(cred_file.path(), permissions)?;
+   }

    // ... existing temp file creation code ...

    let output = Command::new("bash")
        .arg(temp_path)
-       .env("P_URL", url)
-       .env("P_USERNAME", username)
-       .env("P_PASSWORD", password)
+       .env("CRED_FILE", cred_file.path())
        .env("DEMO_ACTION", action)
        .output()
        // ... rest unchanged

Then update the scripts to source credentials from the secure file instead of environment variables.

resources/ingest_demo_data.sh (1)

15-95: Duplicate code detected - consider shared utilities.

This curl_with_retry function is identical to the one in resources/filters_demo_data.sh. Consider extracting it to a shared utility script to follow DRY principles.

Refer to the refactoring suggestion in the filters script review for creating a shared common_utils.sh file.

🧹 Nitpick comments (2)
resources/filters_demo_data.sh (1)

13-93: Consider extracting shared utility functions.

The curl_with_retry function is duplicated between this script and resources/ingest_demo_data.sh. Consider creating a shared utility script to reduce code duplication.

Create a shared utility file resources/common_utils.sh:

#!/usr/bin/env bash
# Common utility functions for demo scripts

curl_with_retry() {
    # ... move the current implementation here
}

# Other shared functions

Then source it in both scripts:

#!/usr/bin/env bash
+source "$(dirname "$0")/common_utils.sh"

# Configuration
P_URL=${P_URL:-"http://localhost:8000"}
# ... rest of script
resources/ingest_demo_data.sh (1)

208-210: Consider utilizing computed metrics for monitoring.

The static analysis correctly identifies that progress, rate, and FINAL_RATE variables are computed but not used. While they're not currently utilized, they provide valuable monitoring data.

Consider adding optional verbose output to utilize these metrics:

# Progress tracking
progress=$((RECORDS_SENT * 100 / TARGET_RECORDS))
elapsed=$(($(date +%s) - START_TIME))
rate=$((RECORDS_SENT / (elapsed == 0 ? 1 : elapsed)))

+# Optional progress output (uncomment for verbose mode)
+# echo "Progress: ${progress}% (${RECORDS_SENT}/${TARGET_RECORDS}) - Rate: ${rate} records/sec"

And at the end:

# Final statistics
TOTAL_TIME=$(($(date +%s) - START_TIME))
FINAL_RATE=$((TARGET_RECORDS / (TOTAL_TIME == 0 ? 1 : TOTAL_TIME)))

+# echo "Ingestion completed: ${TARGET_RECORDS} records in ${TOTAL_TIME}s (${FINAL_RATE} records/sec)"

Also applies to: 230-230

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ecbf80d and a7387c1.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • Cargo.toml (1 hunks)
  • resources/filters_demo_data.sh (1 hunks)
  • resources/ingest_demo_data.sh (1 hunks)
  • src/handlers/http/cluster/mod.rs (2 hunks)
  • src/handlers/http/demo_data.rs (1 hunks)
  • src/handlers/http/ingest.rs (2 hunks)
  • src/handlers/http/mod.rs (1 hunks)
  • src/handlers/http/modal/ingest_server.rs (1 hunks)
  • src/handlers/http/modal/query_server.rs (1 hunks)
  • src/handlers/http/modal/server.rs (3 hunks)
✅ Files skipped from review due to trivial changes (2)
  • Cargo.toml
  • src/handlers/http/modal/query_server.rs
🚧 Files skipped from review as they are similar to previous changes (5)
  • src/handlers/http/modal/ingest_server.rs
  • src/handlers/http/mod.rs
  • src/handlers/http/ingest.rs
  • src/handlers/http/modal/server.rs
  • src/handlers/http/cluster/mod.rs
🧰 Additional context used
🧠 Learnings (1)
src/handlers/http/demo_data.rs (1)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:319-331
Timestamp: 2025-06-16T09:50:38.636Z
Learning: In Parseable's Ingest or Query mode, the node_id is always available because it's generated during server initialization itself, before the get_node_id_string() function in streams.rs would be called. This makes the .expect() calls on QUERIER_META.get() and INGESTOR_META.get() safe in this context.
🧬 Code Graph Analysis (1)
resources/ingest_demo_data.sh (1)
resources/filters_demo_data.sh (1)
  • curl_with_retry (13-93)
🪛 Shellcheck (0.10.0)
resources/ingest_demo_data.sh

[warning] 208-208: progress appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 210-210: rate appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 230-230: FINAL_RATE appears unused. Verify use (or export if used externally).

(SC2034)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: coverage
🔇 Additional comments (4)
resources/filters_demo_data.sh (2)

13-93: Excellent retry logic implementation.

The curl_with_retry function has robust error handling, timeout management, and proper cleanup of temporary files. The implementation correctly handles different HTTP status codes and provides incremental timeout increases on retries.
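
For readers who have not opened the script, the retry idea boils down to something like the following sketch (the real function additionally handles payload files, custom content types, and body/status separation):

# Retry a curl call until it returns a 2xx status or the attempt budget runs out.
curl_with_retry() {
    local url="$1" method="$2" data="$3" max_retries="${4:-3}"
    local attempt=1 status_code

    while (( attempt <= max_retries )); do
        status_code=$(curl -s -o /dev/null -w '%{http_code}' \
            -X "$method" -H "Content-Type: application/json" \
            --max-time $((15 * attempt)) -d "$data" "$url")
        if [[ "$status_code" =~ ^2 ]]; then
            return 0
        fi
        echo "attempt $attempt failed (HTTP $status_code), retrying..." >&2
        sleep $((attempt * 2))   # simple incremental back-off
        (( attempt++ ))
    done
    return 1
}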


96-176: Well-structured filter creation with proper JSON handling.

Both create_sql_filters and create_saved_filters functions demonstrate good practices:

  • Proper JSON escaping for user input
  • Efficient array-based configuration
  • Rate limiting between requests
  • Error counting for monitoring
resources/ingest_demo_data.sh (2)

104-111: Efficient synthetic data generation approach.

The pre-computation of static arrays for cycling is an excellent optimization that avoids repeated random generation during batch creation. The use of realistic data patterns (IP addresses, timestamps, trace IDs) makes the demo data valuable for testing.


122-186: Robust batch generation with proper JSON construction.

The generate_batch function demonstrates good practices:

  • Proper modulo cycling through pre-computed arrays
  • JSON escaping for user agents
  • Realistic log entry structure with all required fields
  • Severity mapping based on HTTP status codes

coderabbitai[bot]
coderabbitai bot previously approved these changes Jul 11, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
resources/ingest_demo_data.sh (1)

72-74: Address static analysis warning about variable assignment.

The static analysis tool correctly identified a potential issue where the assignment and command substitution could mask return values.

Apply this diff to address the warning:

-            if [[ -n "$response" ]]; then
-                status_code=$(echo "$response" | tail -n1)
-                local response_body=$(echo "$response" | sed '$d')
+            if [[ -n "$response" ]]; then
+                local status_code
+                status_code=$(echo "$response" | tail -n1)
+                local response_body
+                response_body=$(echo "$response" | sed '$d')
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a7387c1 and 1ccad5b.

📒 Files selected for processing (2)
  • resources/ingest_demo_data.sh (1 hunks)
  • src/handlers/http/demo_data.rs (1 hunks)
🧰 Additional context used
🪛 Shellcheck (0.10.0)
resources/ingest_demo_data.sh

[warning] 74-74: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: coverage
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
🔇 Additional comments (10)
src/handlers/http/demo_data.rs (3)

33-45: Query parameter validation looks good.

The implementation correctly validates query parameters, handles empty query maps, and extracts the required action parameter with proper error handling.


52-82: Action routing and mode handling implementation is well-structured.

The action-based routing correctly handles different modes:

  • Ingest/All modes execute scripts locally
  • Query/Prism modes forward to ingestor services
  • Proper error handling for unsupported modes

The asynchronous execution pattern with fire-and-forget for script execution is appropriate for this use case.


85-139: Script execution implementation addresses previous security concerns.

The implementation now uses temporary files instead of constructing script paths, which addresses the directory traversal concerns from previous reviews. The error handling is comprehensive, covering file creation, permission setting, script execution, and exit status validation.

The approach of writing the embedded script to a temporary file and making it executable is secure and appropriate.

resources/ingest_demo_data.sh (7)

1-14: Configuration setup is well-structured.

The script provides sensible defaults for all configuration variables and uses proper environment variable handling.


15-105: Robust HTTP retry logic with comprehensive error handling.

The curl_with_retry function implements:

  • Proper timeout escalation
  • Temporary file management for large payloads
  • HTTP status code validation
  • Clean resource cleanup

The implementation handles both string data and file-based data appropriately.


109-200: Efficient data generation with pre-computed arrays.

The ingestion functionality uses smart optimizations:

  • Pre-computed arrays for cycling through data
  • Batch generation with realistic log patterns
  • Proper JSON escaping and formatting

The approach of generating 100 pre-computed values and cycling through them is efficient for demo data generation.


214-255: Well-implemented ingestion loop with progress tracking.

The main ingestion function provides:

  • Progress reporting with rate calculations
  • Proper batch size handling for remaining records
  • Error handling with early exit on failures
  • Final statistics reporting

The implementation correctly handles edge cases like remaining records less than batch size.


259-356: Comprehensive filter creation with proper JSON handling.

The filters functionality creates both SQL and saved filters with:

  • Proper JSON escaping for queries and descriptions
  • Comprehensive set of useful demo filters
  • Success tracking and error reporting
  • Appropriate delays between requests

The filter definitions are well-thought-out and provide good examples for demo purposes.


360-456: Alert creation with proper target management.

The alerts functionality:

  • Creates webhook targets and extracts IDs correctly
  • Defines realistic alert conditions
  • Implements proper error handling and response validation
  • Uses the extracted target ID for alert association

The alert definitions are practical and demonstrate the alerting capabilities well.
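
The create-target-then-reference-its-id flow can be sketched as follows; the endpoint paths and JSON fields are assumptions for illustration, not the exact payloads the script sends:

P_URL="${P_URL:-http://localhost:8000}"
AUTH="${P_USERNAME:-admin}:${P_PASSWORD:-admin}"

# Create a webhook target and capture the server's response.
target_response=$(curl -s -u "$AUTH" -H "Content-Type: application/json" \
    -X POST "$P_URL/api/v1/targets" \
    -d '{"type":"webhook","name":"demo-webhook","endpoint":"http://localhost:9000/hook"}')

# Extract the generated target id (jq keeps this robust against field reordering).
target_id=$(echo "$target_response" | jq -r '.id')
[[ -n "$target_id" && "$target_id" != "null" ]] || { echo "no target id in response" >&2; exit 1; }

# Reference the target id when creating an alert (field names are illustrative).
curl -s -u "$AUTH" -H "Content-Type: application/json" \
    -X POST "$P_URL/api/v1/alerts" \
    -d "{\"title\":\"High error rate\",\"targets\":[\"$target_id\"]}"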


484-520: Clean main execution logic with proper action handling.

The main function provides:

  • Clear action-based routing
  • Sequential execution for "all" action with appropriate delays
  • Help functionality
  • Proper error handling for unknown actions

The script structure is well-organized and user-friendly.
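
The dispatch the comment describes follows the familiar case-statement pattern; a trimmed-down sketch with the action functions stubbed out:

ingest_demo_data() { echo "ingesting demo records..."; }
create_filters()   { echo "creating saved filters..."; }
create_alerts()    { echo "creating alerts..."; }
show_usage()       { echo "usage: ACTION=<ingest|filters|alerts|all> $0"; }

main() {
    case "${ACTION:-ingest}" in
        ingest)  ingest_demo_data ;;
        filters) create_filters ;;
        alerts)  create_alerts ;;
        all)
            ingest_demo_data
            sleep 5            # give ingestion a head start before dependent setup
            create_filters
            create_alerts
            ;;
        *) show_usage; exit 1 ;;
    esac
}

main "$@"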

coderabbitai[bot]
coderabbitai bot previously approved these changes Jul 12, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
resources/ingest_demo_data.sh (1)

79-82: SC2155: avoid masking return values

Declaring and assigning in the same statement can mask errors. Split the declaration:

-                status_code=$(echo "$response" | tail -n1)
-                local response_body=$(echo "$response" | sed '$d')
+                local status_code
+                status_code=$(echo "$response" | tail -n1)
+                local response_body
+                response_body=$(echo "$response" | sed '$d')

Minor, but eliminates the ShellCheck warning and improves readability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ccad5b and 1ed1c97.

📒 Files selected for processing (1)
  • resources/ingest_demo_data.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
resources/ingest_demo_data.sh (1)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:125-148
Timestamp: 2025-05-01T10:33:51.767Z
Learning: When adding a tile to a dashboard in `add_tile()` function, the tile ID must be provided by the client and should not be generated by the server. If the tile ID is missing (nil), the API should fail the operation with an appropriate error message.
🪛 Shellcheck (0.10.0)
resources/ingest_demo_data.sh

[warning] 81-81: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: coverage
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (1)
resources/ingest_demo_data.sh (1)

210-218: send_batch passes a superfluous argument

curl_with_retry (current definition) ignores the 6th positional parameter, so the hard-coded 15 here is dead code and confuses future maintainers:

- curl_with_retry "$P_URL/api/v1/ingest" "POST" "$data" "application/json" 3 15
+ curl_with_retry "$P_URL/api/v1/ingest" "POST" "$data" "application/json" 3

After harmonising the function signature (see earlier comment), adjust all similar calls.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
resources/ingest_demo_data.sh (3)

16-24: curl_with_retry interface is still under-specified

The 6th positional parameter (data_file) is now consumed (local data_file="$6"), but neither the header comments nor the call-sites give any hint that such a parameter exists. Every call passes five arguments at most, which is fine, yet the silent extra slot makes misuse easy.

-# Common curl function with retry logic
-curl_with_retry() {
+# Common curl wrapper
+#   $1 url (required)
+#   $2 method (POST|PUT) (required)
+#   $3 data string (optional)
+#   $4 content-type (default: application/json)
+#   $5 max retries (default: 3)
+#   $6 data file path (optional)
+curl_with_retry() {

Consider:

  1. Documenting the full signature as above.
  2. Guarding against callers that supply both $3 and $6.
  3. Using named parameters (e.g. curl_with_retry --url … --data-file …) to make misuse impossible.

77-96: HTTP-code extraction drops the body when it matters

You rely on appending a line break and %{http_code} then stripping it back with tail -n1 / sed '$d'.
Problems:

  • Bodies that naturally end with a newline will shift the index (status code becomes empty).
  • Binary responses (even small PNGs from dashboards) may embed newlines, breaking the split.

Prefer the built-in separation:

tmp_body=$(mktemp)
status_code=$(curl "${curl_args[@]}" -o "$tmp_body" -w '%{http_code}')
curl_exit_code=$?

# … use "$status_code" and "$(cat "$tmp_body")"
rm -f "$tmp_body"

This is more robust and removes the need for sed and tail.


831-840: show_usage implies positional argument, but the script uses $ACTION env var

The help text suggests script.sh ingest, yet main() never looks at $1; it only inspects the ACTION environment variable. Update the usage string (or parse $1) to avoid user confusion.
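
One way to reconcile the two, letting an explicit positional argument take precedence over the environment variable, could be:

# Prefer a positional argument, fall back to the ACTION environment variable,
# and default to "ingest" when neither is provided.
ACTION="${1:-${ACTION:-ingest}}"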

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ed1c97 and a575504.

📒 Files selected for processing (1)
  • resources/ingest_demo_data.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
resources/ingest_demo_data.sh (4)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:125-131
Timestamp: 2025-07-14T02:11:24.559Z
Learning: In the Parseable demo data script, the timestamp generation using `date -u +%Y-%m-%dT%H:%M:%S.%03dZ` works correctly and produces valid ISO timestamps with milliseconds, despite initial analysis suggesting %03d might not be a valid placeholder. The user verified this produces timestamps like "2025-07-13T16:53:15.013Z" in practice.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:351-355
Timestamp: 2025-06-16T05:20:18.593Z
Learning: In the Parseable codebase, arrow files are expected to always have valid creation or modified timestamps as a basic system assumption. The conversion flow uses expect() on file metadata operations to enforce this invariant with fail-fast behavior rather than graceful error handling, as violations represent fundamental system issues that should cause immediate failures.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/storage/object_storage.rs:838-840
Timestamp: 2025-06-16T02:09:49.200Z
Learning: In Parseable server, filenames for arrow files are generated server-side during ingestion using a strict format pattern: `{stream_hash}.date={date}.hour={hour}.minute={minute}.{custom_partitions}{hostname}.data.{extension}`. This guarantees that when splitting by dots, the second element will always be `date={value}`, making direct indexing safe when extracting the date part by splitting on '=' and accessing index 1.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:125-148
Timestamp: 2025-05-01T10:33:51.767Z
Learning: When adding a tile to a dashboard in `add_tile()` function, the tile ID must be provided by the client and should not be generated by the server. If the tile ID is missing (nil), the API should fail the operation with an appropriate error message.
🪛 Shellcheck (0.10.0)
resources/ingest_demo_data.sh

[warning] 81-81: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: coverage

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
resources/ingest_demo_data.sh (1)

129-134: Use a stronger entropy source for IDs.

$((RANDOM * RANDOM)) yields at most a 30-bit value, so the 32-hex-digit
trace-/span-IDs are heavily zero-padded and far from unique at 10 000
records. Prefer /dev/urandom or uuidgen for realistic IDs.

-        TRACE_IDS+=("$(printf '%032x' $((RANDOM * RANDOM)))")
-        SPAN_IDS+=("$(printf '%016x' $((RANDOM * RANDOM)))")
+        TRACE_IDS+=("$(uuidgen | tr -d '-')")                 # 32-hex chars
+        SPAN_IDS+=("$(xxd -l 8 -p /dev/urandom)")             # 16-hex chars
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fdf3f8d and 1ad8d0e.

📒 Files selected for processing (1)
  • resources/ingest_demo_data.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
resources/ingest_demo_data.sh (4)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:125-131
Timestamp: 2025-07-14T02:11:24.559Z
Learning: In the Parseable demo data script, the timestamp generation using `date -u +%Y-%m-%dT%H:%M:%S.%03dZ` works correctly and produces valid ISO timestamps with milliseconds, despite initial analysis suggesting %03d might not be a valid placeholder. The user verified this produces timestamps like "2025-07-13T16:53:15.013Z" in practice.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:351-355
Timestamp: 2025-06-16T05:20:18.593Z
Learning: In the Parseable codebase, arrow files are expected to always have valid creation or modified timestamps as a basic system assumption. The conversion flow uses expect() on file metadata operations to enforce this invariant with fail-fast behavior rather than graceful error handling, as violations represent fundamental system issues that should cause immediate failures.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/storage/object_storage.rs:838-840
Timestamp: 2025-06-16T02:09:49.200Z
Learning: In Parseable server, filenames for arrow files are generated server-side during ingestion using a strict format pattern: `{stream_hash}.date={date}.hour={hour}.minute={minute}.{custom_partitions}{hostname}.data.{extension}`. This guarantees that when splitting by dots, the second element will always be `date={value}`, making direct indexing safe when extracting the date part by splitting on '=' and accessing index 1.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:125-148
Timestamp: 2025-05-01T10:33:51.767Z
Learning: When adding a tile to a dashboard in `add_tile()` function, the tile ID must be provided by the client and should not be generated by the server. If the tile ID is missing (nil), the API should fail the operation with an appropriate error message.
🪛 Shellcheck (0.10.0)
resources/ingest_demo_data.sh

[warning] 84-84: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: coverage
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default x86_64-apple-darwin

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
resources/ingest_demo_data.sh (3)

128-135: UNIX_NANOS lacks entropy – many records share identical seconds component

date +%s is evaluated once per loop so every 100 generated records have the same epoch-seconds; only the nanosecond suffix varies. For large ingests this causes artificial “spiky” time-series clusters.

-        UNIX_NANOS+=("$(date +%s)$(printf '%09d' $((RANDOM % 1000000000)))")
+        now_s=$(date +%s)
+        UNIX_NANOS+=("$now_s$(printf '%09d' $((RANDOM % 1000000000)))")

(Or offset now_s by the same random seconds you already apply to the RFC-3339 timestamp to keep the two fields consistent.)


195-199: Manual JSON escaping is brittle – use jq -R @json instead

The current two-stage sed chain misses back-slashes, control chars, etc. A safer, one-liner escape:

-        escaped_user_agent=$(echo "$user_agent" | sed 's/"/\\"/g' | sed "s/'/\\'/g")
+        escaped_user_agent=$(printf '%s' "$user_agent" | jq -Rsa @json | sed -e 's/^"//' -e 's/"$//')

This guarantees valid UTF-8 / JSON for any header value.


41-48: Nit: $AUTH_HEADER is constant – switch to -H@"<(printf …)" to avoid arg-length limits

With long Basic-Auth tokens the fixed -H "$AUTH_HEADER" may hit argv size limits on macOS/BSD. Using a process-substitution file avoids quoting issues and is more portable:

curl_args+=( -H@"<(printf 'Authorization: Basic %s' "$BASIC")" )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ad8d0e and 82d23cf.

📒 Files selected for processing (1)
  • resources/ingest_demo_data.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
resources/ingest_demo_data.sh (4)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:125-131
Timestamp: 2025-07-14T02:11:24.559Z
Learning: In the Parseable demo data script, the timestamp generation using `date -u +%Y-%m-%dT%H:%M:%S.%03dZ` works correctly and produces valid ISO timestamps with milliseconds, despite initial analysis suggesting %03d might not be a valid placeholder. The user verified this produces timestamps like "2025-07-13T16:53:15.013Z" in practice.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:351-355
Timestamp: 2025-06-16T05:20:18.593Z
Learning: In the Parseable codebase, arrow files are expected to always have valid creation or modified timestamps as a basic system assumption. The conversion flow uses expect() on file metadata operations to enforce this invariant with fail-fast behavior rather than graceful error handling, as violations represent fundamental system issues that should cause immediate failures.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/storage/object_storage.rs:838-840
Timestamp: 2025-06-16T02:09:49.200Z
Learning: In Parseable server, filenames for arrow files are generated server-side during ingestion using a strict format pattern: `{stream_hash}.date={date}.hour={hour}.minute={minute}.{custom_partitions}{hostname}.data.{extension}`. This guarantees that when splitting by dots, the second element will always be `date={value}`, making direct indexing safe when extracting the date part by splitting on '=' and accessing index 1.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:125-148
Timestamp: 2025-05-01T10:33:51.767Z
Learning: When adding a tile to a dashboard in `add_tile()` function, the tile ID must be provided by the client and should not be generated by the server. If the tile ID is missing (nil), the API should fail the operation with an appropriate error message.
🪛 Shellcheck (0.10.0)
resources/ingest_demo_data.sh

[warning] 85-85: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: coverage
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
resources/ingest_demo_data.sh (1)

41-44: Duplicate -w option – first instance is silently ignored

curl_args adds -w '\n%{http_code}' (line 43) and later adds another
-w "%{http_code}" (line 82). Only the last -w is honored, so the earlier
one is dead code and misleading.

-            -w '\n%{http_code}'

Also applies to: 82-83

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 82d23cf and 3953f3c.

📒 Files selected for processing (1)
  • resources/ingest_demo_data.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
resources/ingest_demo_data.sh (4)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:125-131
Timestamp: 2025-07-14T02:11:24.559Z
Learning: In the Parseable demo data script, the timestamp generation using `date -u +%Y-%m-%dT%H:%M:%S.%03dZ` works correctly and produces valid ISO timestamps with milliseconds, despite initial analysis suggesting %03d might not be a valid placeholder. The user verified this produces timestamps like "2025-07-13T16:53:15.013Z" in practice.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:351-355
Timestamp: 2025-06-16T05:20:18.593Z
Learning: In the Parseable codebase, arrow files are expected to always have valid creation or modified timestamps as a basic system assumption. The conversion flow uses expect() on file metadata operations to enforce this invariant with fail-fast behavior rather than graceful error handling, as violations represent fundamental system issues that should cause immediate failures.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/storage/object_storage.rs:838-840
Timestamp: 2025-06-16T02:09:49.200Z
Learning: In Parseable server, filenames for arrow files are generated server-side during ingestion using a strict format pattern: `{stream_hash}.date={date}.hour={hour}.minute={minute}.{custom_partitions}{hostname}.data.{extension}`. This guarantees that when splitting by dots, the second element will always be `date={value}`, making direct indexing safe when extracting the date part by splitting on '=' and accessing index 1.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:125-148
Timestamp: 2025-05-01T10:33:51.767Z
Learning: When adding a tile to a dashboard in `add_tile()` function, the tile ID must be provided by the client and should not be generated by the server. If the tile ID is missing (nil), the API should fail the operation with an appropriate error message.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: coverage
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments

coderabbitai[bot]
coderabbitai bot previously approved these changes Jul 14, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
resources/ingest_demo_data.sh (3)

391-404: Inconsistent use of retry / timeout logic for target creation

create_target falls back to a bare curl -s … call.
Unlike the rest of the script it:

  • lacks retry/back-off on transient failures,
  • ignores connect/read time-outs,
  • omits 2xx status-code validation.

Replacing it with curl_with_retry keeps error-handling behaviour uniform.

-    response=$(curl -s -H "Content-Type: application/json" -H "$AUTH_HEADER" -X POST "$P_URL/api/v1/targets" -d @- << EOF
+    response=$(curl_with_retry "$P_URL/api/v1/targets" "POST" @- "application/json" 3 << EOF

(You may need to extend curl_with_retry to support stdin payloads, or write the JSON to a temp file and pass its path.)


211-215: Manual JSON string escaping is fragile

Using sed to escape quotes in user_agent (and later for SQL/filter JSON) risks subtle breakage for other special characters (backslashes, control bytes, UTF-8).
A safer pattern is to delegate JSON encoding to a tool such as jq:

json_record=$(jq -n --arg body "$body" \
                    --arg time "$unix_nano" \
                    --arg ua "$user_agent" \
'{
  body: $body,
  time_unix_nano: $time,
  user_agent_original: $ua
  # … other fields …
}')

This prevents malformed payloads if input ever contains unexpected characters.


145-151: date invoked 200× inside the outer loop – precompute once

TIMESTAMPS and UNIX_NANOS are generated with separate date invocations for each of the 100 iterations. While acceptable for 10 k records, the cost grows quadratically if you raise TARGET_RECORDS later.

Consider computing base_now=$(date +%s) once and deriving offsets with arithmetic (printf -v ts … $((base_now-rand))) to remove the per-iteration process spawn.
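
A sketch of that optimisation: date is called once, and every timestamp is then derived with shell arithmetic and bash's built-in printf %(...)T formatter (bash >= 4.2), so no further processes are spawned per iteration:

# Cache the epoch once, then derive all 100 timestamps arithmetically.
base_now=$(date +%s)

TIMESTAMPS=()
UNIX_NANOS=()
for ((i = 0; i < 100; i++)); do
    ts_s=$((base_now - RANDOM % 3600))     # spread records over the past hour
    printf -v ts '%(%Y-%m-%dT%H:%M:%S)T' "$ts_s"
    TIMESTAMPS+=("${ts}Z")                 # run with TZ=UTC if the trailing Z must be exact
    printf -v nanos '%09d' $((RANDOM % 1000000000))
    UNIX_NANOS+=("${ts_s}${nanos}")
done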

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3953f3c and 39077f1.

📒 Files selected for processing (1)
  • resources/ingest_demo_data.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
resources/ingest_demo_data.sh (4)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:125-131
Timestamp: 2025-07-14T02:11:24.559Z
Learning: In the Parseable demo data script, the timestamp generation using `date -u +%Y-%m-%dT%H:%M:%S.%03dZ` works correctly and produces valid ISO timestamps with milliseconds, despite initial analysis suggesting %03d might not be a valid placeholder. The user verified this produces timestamps like "2025-07-13T16:53:15.013Z" in practice.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:351-355
Timestamp: 2025-06-16T05:20:18.593Z
Learning: In the Parseable codebase, arrow files are expected to always have valid creation or modified timestamps as a basic system assumption. The conversion flow uses expect() on file metadata operations to enforce this invariant with fail-fast behavior rather than graceful error handling, as violations represent fundamental system issues that should cause immediate failures.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/storage/object_storage.rs:838-840
Timestamp: 2025-06-16T02:09:49.200Z
Learning: In Parseable server, filenames for arrow files are generated server-side during ingestion using a strict format pattern: `{stream_hash}.date={date}.hour={hour}.minute={minute}.{custom_partitions}{hostname}.data.{extension}`. This guarantees that when splitting by dots, the second element will always be `date={value}`, making direct indexing safe when extracting the date part by splitting on '=' and accessing index 1.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:12-14
Timestamp: 2025-07-14T11:29:22.509Z
Learning: In the Parseable codebase, the base64 encoding for Basic Auth headers in resources/ingest_demo_data.sh works correctly as implemented without needing to strip trailing newlines. The user has validated this works with their server environment.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: coverage

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
resources/ingest_demo_data.sh (3)

1-2: Enable strict Bash modes for more predictable failure handling

Adding set -euo pipefail (or at least set -e and set -o pipefail) right after the shebang ensures the script aborts on the first failed command, on undefined variables, and on pipeline errors – preventing silent mis-ingestion when, for example, curl fails inside a subshell.

 #!/usr/bin/env bash
+set -euo pipefail

20-33: curl_with_retry – document/validate the 6-argument contract

The function now expects an optional sixth data_file argument, but call-sites (e.g. send_batch, create_alerts) still pass five. That’s fine, yet without a comment or an if (( $# < 5 || $# > 6 )) guard, future maintainers may pass the wrong number of arguments and have $data_file silently bound to the wrong value.

Consider adding a short doc-string or an argument-count check to make the interface self-evident.
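
For example, a short guard at the top of the function makes the contract explicit; the parameter names below are inferred from the earlier call-site suggestion and reuse the script's log_error helper, so treat this as a sketch rather than the actual signature:

curl_with_retry() {
    # usage: curl_with_retry <url> <method> <data> <content_type> <max_retries> [data_file]
    if (( $# < 5 || $# > 6 )); then
        log_error "curl_with_retry: expected 5 or 6 arguments, got $#"
        return 1
    fi
    local url=$1 method=$2 data=$3 content_type=$4 max_retries=$5 data_file=${6:-}
    # ... existing retry/back-off logic unchanged ...
}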


397-405: Gracefully handle missing jq dependency

The script now relies on jq to extract IDs, but there’s no pre-flight check. On machines without jq, create_target (and dashboard creation) will fail with hard-to-read errors.

# pre-flight check near the top of the script; exit (not return), since this runs outside any function
command -v jq >/dev/null 2>&1 || {
  log_error "jq is required but not installed – aborting"; exit 1;
}

Insert this check once near the top (e.g. after the config section) to fail fast and inform users what they’re missing.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 39077f1 and b0b5b2c.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • Cargo.toml (1 hunks)
  • resources/ingest_demo_data.sh (1 hunks)
  • src/handlers/http/cluster/mod.rs (2 hunks)
  • src/handlers/http/demo_data.rs (1 hunks)
  • src/handlers/http/ingest.rs (2 hunks)
  • src/handlers/http/mod.rs (1 hunks)
  • src/handlers/http/modal/ingest_server.rs (1 hunks)
  • src/handlers/http/modal/query_server.rs (1 hunks)
  • src/handlers/http/modal/server.rs (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • Cargo.toml
  • src/handlers/http/modal/ingest_server.rs
  • src/handlers/http/modal/query_server.rs
  • src/handlers/http/mod.rs
  • src/handlers/http/ingest.rs
  • src/handlers/http/modal/server.rs
  • src/handlers/http/demo_data.rs
  • src/handlers/http/cluster/mod.rs
🧰 Additional context used
🧠 Learnings (1)
resources/ingest_demo_data.sh (5)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:125-131
Timestamp: 2025-07-14T02:11:24.559Z
Learning: In the Parseable demo data script, the timestamp generation using `date -u +%Y-%m-%dT%H:%M:%S.%03dZ` works correctly and produces valid ISO timestamps with milliseconds, despite initial analysis suggesting %03d might not be a valid placeholder. The user verified this produces timestamps like "2025-07-13T16:53:15.013Z" in practice.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:351-355
Timestamp: 2025-06-16T05:20:18.593Z
Learning: In the Parseable codebase, arrow files are expected to always have valid creation or modified timestamps as a basic system assumption. The conversion flow uses expect() on file metadata operations to enforce this invariant with fail-fast behavior rather than graceful error handling, as violations represent fundamental system issues that should cause immediate failures.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/storage/object_storage.rs:838-840
Timestamp: 2025-06-16T02:09:49.200Z
Learning: In Parseable server, filenames for arrow files are generated server-side during ingestion using a strict format pattern: `{stream_hash}.date={date}.hour={hour}.minute={minute}.{custom_partitions}{hostname}.data.{extension}`. This guarantees that when splitting by dots, the second element will always be `date={value}`, making direct indexing safe when extracting the date part by splitting on '=' and accessing index 1.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:12-14
Timestamp: 2025-07-14T11:29:22.509Z
Learning: In the Parseable codebase, the base64 encoding for Basic Auth headers in resources/ingest_demo_data.sh works correctly as implemented without needing to strip trailing newlines. The user has validated this works with their server environment.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:125-148
Timestamp: 2025-05-01T10:33:51.767Z
Learning: When adding a tile to a dashboard in `add_tile()` function, the tile ID must be provided by the client and should not be generated by the server. If the tile ID is missing (nil), the API should fail the operation with an appropriate error message.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: coverage
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Kafka aarch64-apple-darwin

@nitisht nitisht merged commit 359995e into parseablehq:main Jul 15, 2025
13 of 14 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Jul 29, 2025