
feat: add api for preparing demo data #1370


Merged
merged 13 commits on Jul 15, 2025

Conversation

nikhilsinhaparseable
Contributor

@nikhilsinhaparseable nikhilsinhaparseable commented Jul 10, 2025

API: GET /api/v1/demodata?action=ingest
ingests 10k records using the bundled script

GET /api/v1/demodata?action=filters
creates saved SQLs and saved filters from the script

GET /api/v1/demodata?action=alerts
creates a webhook target and 3 alerts

GET /api/v1/demodata?action=dashboards
creates a dashboard with 4 tiles: timeseries, line, area, and bar charts
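
For a quick smoke test, the endpoints can be exercised from a shell. A minimal sketch, assuming a local Parseable instance on port 8000 with the default admin/admin credentials (adjust the URL and credentials for your deployment):

# Base URL and credentials below are assumptions for a local test instance.
P_URL="http://localhost:8000"
AUTH="admin:admin"

# Request each demo-data action; the server responds with 202 Accepted and
# performs the work asynchronously.
for action in ingest filters alerts dashboards; do
    curl -s -u "$AUTH" "$P_URL/api/v1/demodata?action=$action"
    echo "requested action: $action"
done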

Summary by CodeRabbit

  • New Features
    • Introduced a new HTTP endpoint for demo data operations, enabling users to trigger demo data ingestion, filter creation, alert setup, and dashboard management via API.
    • Added asynchronous execution and forwarding of demo data actions with detailed status responses.
    • Provided a comprehensive, configurable utility script for demo data ingestion, filter creation, alert setup, and dashboard management.
    • Enhanced server routing to support demo data API endpoints in multiple server modes.
    • Improved error handling with specific responses for invalid or missing query parameters.

Contributor

coderabbitai bot commented Jul 10, 2025

Walkthrough

This change introduces a new demo data utility: a Bash script that performs data ingestion, filter creation, alert setup, and dashboard creation via the HTTP API. The backend adds new HTTP endpoints and handlers to support these operations, with robust error handling and support for multiple server modes. Supporting modules and routes are updated accordingly.

Changes

File(s) / Change Summary

  • resources/ingest_demo_data.sh: Added a comprehensive Bash script for demo data ingestion, filter creation, alert setup, and dashboard management with robust retry logic.
  • src/handlers/http/cluster/mod.rs: Added async function to forward demo data requests to a live ingestor node; improved result handling in metrics fetch.
  • src/handlers/http/demo_data.rs: Implemented HTTP handler for demo data actions, executing or forwarding to the script based on server mode and action.
  • src/handlers/http/ingest.rs: Added InvalidQueryParameter and MissingQueryParameter variants to PostError with error messages and status codes.
  • src/handlers/http/mod.rs: Declared new public module demo_data.
  • src/handlers/http/modal/ingest_server.rs, src/handlers/http/modal/query_server.rs: Registered new demo data service route in ingest and query server API scopes.
  • src/handlers/http/modal/server.rs: Added demo data web scope and route; exposed as a public method in Server.
  • Cargo.toml: Added tempfile crate dependency.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant HTTP_Server
    participant BashScript
    participant IngestorNode

    Client->>HTTP_Server: GET /demodata?action=ingest|filters|alerts|dashboards
    HTTP_Server->>HTTP_Server: Parse query params, determine action
    alt action == "ingest" and mode == Ingest/All
        HTTP_Server->>BashScript: Execute ingest_demo_data.sh (ingest)
        BashScript->>HTTP_Server: Ingest demo data via API
    else action == "ingest" and mode == Query/Prism
        HTTP_Server->>IngestorNode: Forward request to ingestor
        IngestorNode->>BashScript: Execute ingest_demo_data.sh (ingest)
    else action == "filters" or "alerts" or "dashboards"
        HTTP_Server->>BashScript: Execute ingest_demo_data.sh (filters/alerts/dashboards)
    end
    HTTP_Server-->>Client: 202 Accepted / Error

Poem

In the warren, scripts now hop,
Demo data, filters—nonstop!
Alerts ring out with bashful cheer,
Dashboards bloom, the path is clear.
With endpoints new, the code’s delight,
The rabbits code through day and night!
🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b0b5b2c and 898ce0d.

📒 Files selected for processing (1)
  • resources/ingest_demo_data.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • resources/ingest_demo_data.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: coverage

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
resources/filters_demo_data.sh (1)

38-144: Address static analysis warnings and improve curl function.

The curl_with_retry function is well-implemented but has a couple of issues flagged by static analysis:

curl_with_retry() {
    local url="$1"
    local method="$2"
    local data="$3"
    local content_type="${4:-application/json}"
    local max_retries="${5:-3}"
-   local base_timeout="${6:-15}"
    local retry_count=0
    
    # ... rest of function ...
    
    # Extract status code and response body separately to avoid masking return values
-   local status_code
    if [[ -n "$response" ]]; then
-       status_code=$(echo "$response" | tail -n1)
+       local status_code
+       status_code=$(echo "$response" | tail -n1)
        local response_body=$(echo "$response" | sed '$d')

The base_timeout parameter is unused and can be removed since you're using a fixed timeout calculation pattern.

resources/ingest_demo_data.sh (1)

43-149: Same static analysis issues as filters script - fix unused variable and declaration pattern.

The curl_with_retry function has identical issues to the filters script:

curl_with_retry() {
    local url="$1"
    local method="$2"
    local data="$3"
    local content_type="${4:-application/json}"
    local max_retries="${5:-3}"
-   local base_timeout="${6:-15}"
    local retry_count=0
    
    # ... rest of function ...
    
-   local status_code
    if [[ -n "$response" ]]; then
-       status_code=$(echo "$response" | tail -n1)
+       local status_code
+       status_code=$(echo "$response" | tail -n1)
        local response_body=$(echo "$response" | sed '$d')

Consider extracting this function to a shared utility script to avoid duplication.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c4533be and ecbf80d.

📒 Files selected for processing (9)
  • resources/filters_demo_data.sh (1 hunks)
  • resources/ingest_demo_data.sh (1 hunks)
  • src/handlers/http/cluster/mod.rs (2 hunks)
  • src/handlers/http/demo_data.rs (1 hunks)
  • src/handlers/http/ingest.rs (2 hunks)
  • src/handlers/http/mod.rs (1 hunks)
  • src/handlers/http/modal/ingest_server.rs (1 hunks)
  • src/handlers/http/modal/query_server.rs (1 hunks)
  • src/handlers/http/modal/server.rs (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
src/handlers/http/demo_data.rs (1)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:319-331
Timestamp: 2025-06-16T09:50:38.636Z
Learning: In Parseable's Ingest or Query mode, the node_id is always available because it's generated during server initialization itself, before the get_node_id_string() function in streams.rs would be called. This makes the .expect() calls on QUERIER_META.get() and INGESTOR_META.get() safe in this context.
src/handlers/http/cluster/mod.rs (2)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:319-331
Timestamp: 2025-06-16T09:50:38.636Z
Learning: In Parseable's Ingest or Query mode, the node_id is always available because it's generated during server initialization itself, before the get_node_id_string() function in streams.rs would be called. This makes the .expect() calls on QUERIER_META.get() and INGESTOR_META.get() safe in this context.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/handlers/http/health_check.rs:81-90
Timestamp: 2025-06-16T02:04:58.990Z
Learning: In the shutdown function in src/handlers/http/health_check.rs, the design approach is to log errors from sync operations rather than propagate them. This is intentional because the shutdown function is called on SIGTERM/SIGINT signals, and the goal is to perform best-effort cleanup (syncing pending files to object storage) while allowing the shutdown to proceed regardless of sync failures. Logging provides debugging information without blocking the shutdown process.
🧬 Code Graph Analysis (2)
src/handlers/http/modal/ingest_server.rs (1)
src/handlers/http/modal/server.rs (2)
  • get_readiness_factory (692-696)
  • get_demo_data_webscope (206-208)
src/handlers/http/modal/query_server.rs (1)
src/handlers/http/modal/server.rs (1)
  • get_demo_data_webscope (206-208)
🪛 Shellcheck (0.10.0)
resources/filters_demo_data.sh

[warning] 44-44: base_timeout appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 100-100: Declare and assign separately to avoid masking return values.

(SC2155)

resources/ingest_demo_data.sh

[warning] 49-49: base_timeout appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 105-105: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: coverage
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
🔇 Additional comments (15)
src/handlers/http/mod.rs (1)

37-37: LGTM! Module declaration follows existing patterns.

The addition of the demo_data module declaration is consistent with the existing module organization and properly integrates the new functionality.

src/handlers/http/modal/query_server.rs (1)

78-79: Consistent service registration for query server.

The demo data web scope service is properly added to the query server configuration, maintaining the existing service chain pattern.

src/handlers/http/modal/ingest_server.rs (1)

81-82: Excellent consistency across server configurations.

The demo data service registration in the ingest server matches the pattern used in the query server, ensuring consistent API availability across different server modes.

src/handlers/http/modal/server.rs (3)

26-26: Import statement is correctly placed.

The import of get_demo_data from the demo_data module is properly placed with other handler imports.


101-102: Service registration maintains consistency.

The demo data web scope service is appropriately added to the main server configuration, consistent with the pattern used in query and ingest servers.


206-208: Web scope definition aligns with API requirements.

The get_demo_data_webscope() function correctly creates a scope at /demodata with a GET route, matching the API specification in the PR objectives.

src/handlers/http/ingest.rs (2)

496-499: Well-designed error variants for parameter validation.

The new InvalidQueryParameter and MissingQueryParameter error variants provide clear, descriptive error handling for query parameter validation scenarios. The error messages are user-friendly and appropriate.


529-530: Correct HTTP status code mapping.

Both new error variants appropriately map to BAD_REQUEST (400) status code, which is the correct HTTP response for client-side parameter validation errors.

src/handlers/http/cluster/mod.rs (2)

159-213: LGTM! Well-implemented ingestor forwarding function.

The function correctly:

  • Validates ingestor availability through liveness checks
  • Uses the first available ingestor for load distribution
  • Handles authentication and proper URL construction
  • Provides comprehensive error handling

The design choice to break after finding the first live ingestor is appropriate for this use case.


978-978: Good generalization of the pattern match.

Changing from Ok(None) to Ok(_) appropriately handles all successful cases that don't contain metrics without being overly specific.

resources/filters_demo_data.sh (2)

167-216: Well-structured filter creation with good error handling.

The SQL filter creation logic is solid:

  • Good variety of realistic filter queries
  • Proper JSON escaping for quotes
  • Retry logic with appropriate delays
  • Clear progress reporting

The filter definitions cover useful monitoring scenarios and will provide good demo data for users.


218-271: Comprehensive saved filter implementation.

The saved filter creation includes proper:

  • Visible column configuration
  • Group-by field specification
  • JSON array construction for columns
  • Consistent error handling pattern

The filters complement the SQL filters well and provide a complete demo experience.
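
To illustrate the escape-then-POST pattern these comments describe, here is a minimal sketch; the endpoint path and payload field names are placeholders for illustration, not the exact schema the script uses:

P_URL="${P_URL:-http://localhost:8000}"
AUTH="${P_USERNAME:-admin}:${P_PASSWORD:-admin}"
query='select status_code, count(*) from demodata where status_code >= 500 group by status_code'

# Let jq handle JSON escaping of the SQL text rather than hand-rolled sed.
payload=$(jq -n --arg name "5xx by status" --arg q "$query" \
    '{filter_name: $name, query: {filter_query: $q}}')

# Placeholder path: the real script targets Parseable's saved-filter API.
curl -s -u "$AUTH" -H "Content-Type: application/json" \
    -X POST "$P_URL/api/v1/filters" -d "$payload"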

resources/ingest_demo_data.sh (3)

161-189: Excellent performance optimization with static data pre-computation.

The approach of pre-computing all random data once and cycling through arrays is a smart optimization that will significantly improve ingestion performance:

  • Eliminates runtime random number generation overhead
  • Provides consistent, predictable data patterns
  • Reduces CPU usage during batch generation
  • Maintains data variety through array cycling

This is a well-engineered solution for high-throughput demo data generation.


192-263: Robust batch generation with good validation.

The generate_batch function includes:

  • Input validation for batch size and array initialization
  • Efficient array cycling with modulo operations
  • Proper JSON escaping for user agent strings
  • Severity mapping based on HTTP status codes
  • Size validation of generated output

The single-line JSON generation approach is efficient while maintaining readability.
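
As a compressed illustration of the pre-computed-array and modulo-cycling approach described above (field names and value pools are simplified stand-ins for what the script actually generates):

# Pre-compute small value pools once, then cycle through them while building a
# batch, avoiding per-record random generation.
STATUS_CODES=(200 200 200 404 500 503)
METHODS=(GET POST PUT DELETE)

generate_batch() {
    local batch_size="$1"
    local records=()
    for ((i = 0; i < batch_size; i++)); do
        local code="${STATUS_CODES[$((i % ${#STATUS_CODES[@]}))]}"
        local method="${METHODS[$((i % ${#METHODS[@]}))]}"
        # Map the HTTP status to a severity label, mirroring the script's approach.
        local severity="INFO"
        (( code >= 500 )) && severity="ERROR"
        records+=("{\"method\":\"$method\",\"status_code\":$code,\"severity_text\":\"$severity\"}")
    done
    # Emit a single-line JSON array, ready to POST to the ingest endpoint.
    local IFS=,
    printf '[%s]\n' "${records[*]}"
}

generate_batch 5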


312-377: Well-implemented main execution loop with comprehensive monitoring.

The execution flow provides:

  • Progress tracking with rate calculations
  • Proper error handling and early exit on failures
  • Batch size adjustment for remaining records
  • Performance statistics and optimization details
  • Appropriate delays to prevent server overload

The final performance tips output is a nice touch for users understanding the optimizations.
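
A minimal sketch of that loop shape, with the actual POST stubbed out (names and timings are illustrative):

TARGET_RECORDS=10000
BATCH_SIZE=500
RECORDS_SENT=0
START_TIME=$(date +%s)

# Stand-in for the script's real POST-to-ingest helper.
send_batch() { sleep 0.1; }

while (( RECORDS_SENT < TARGET_RECORDS )); do
    # Shrink the final batch so the loop never overshoots the target.
    remaining=$((TARGET_RECORDS - RECORDS_SENT))
    batch=$(( remaining < BATCH_SIZE ? remaining : BATCH_SIZE ))

    send_batch "$batch" || { echo "batch failed, aborting" >&2; exit 1; }
    RECORDS_SENT=$((RECORDS_SENT + batch))

    elapsed=$(( $(date +%s) - START_TIME ))
    rate=$(( RECORDS_SENT / (elapsed > 0 ? elapsed : 1) ))
    echo "progress: ${RECORDS_SENT}/${TARGET_RECORDS} (${rate} records/sec)"
    sleep 1   # brief pause between batches to avoid overloading the server
done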

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
src/handlers/http/demo_data.rs (2)

33-90: LGTM with remaining security considerations.

The handler implementation correctly validates query parameters, uses appropriate async execution patterns, and returns HTTP 202 for fire-and-forget operations. However, there are still security concerns that need to be addressed.

The implementation addresses most requirements but still has the credential exposure issue mentioned in previous reviews. Consider implementing credential passing via secure temporary files instead of environment variables.


92-151: Improve security by avoiding credential exposure via environment variables.

The script execution function has good error handling and temporary file management, but still passes sensitive credentials through environment variables (lines 125-127), which can be visible in process lists.

Consider this more secure approach for passing credentials:

async fn execute_demo_script(
    action: &str,
    url: &str,
    username: &str,
    password: &str,
) -> Result<(), anyhow::Error> {
+   // Create secure temporary file for credentials
+   let cred_file = tempfile::NamedTempFile::new()
+       .map_err(|e| anyhow::anyhow!("Failed to create credentials file: {}", e))?;
+   
+   let cred_content = format!("P_URL={}\nP_USERNAME={}\nP_PASSWORD={}", url, username, password);
+   fs::write(cred_file.path(), cred_content)
+       .map_err(|e| anyhow::anyhow!("Failed to write credentials: {}", e))?;
+   
+   #[cfg(unix)]
+   {
+       let mut permissions = fs::metadata(cred_file.path())?.permissions();
+       permissions.set_mode(0o600); // Read/write for owner only
+       fs::set_permissions(cred_file.path(), permissions)?;
+   }

    // ... existing temp file creation code ...

    let output = Command::new("bash")
        .arg(temp_path)
-       .env("P_URL", url)
-       .env("P_USERNAME", username)
-       .env("P_PASSWORD", password)
+       .env("CRED_FILE", cred_file.path())
        .env("DEMO_ACTION", action)
        .output()
        // ... rest unchanged

Then update the scripts to source credentials from the secure file instead of environment variables.

resources/ingest_demo_data.sh (1)

15-95: Duplicate code detected - consider shared utilities.

This curl_with_retry function is identical to the one in resources/filters_demo_data.sh. Consider extracting it to a shared utility script to follow DRY principles.

Refer to the refactoring suggestion in the filters script review for creating a shared common_utils.sh file.

🧹 Nitpick comments (2)
resources/filters_demo_data.sh (1)

13-93: Consider extracting shared utility functions.

The curl_with_retry function is duplicated between this script and resources/ingest_demo_data.sh. Consider creating a shared utility script to reduce code duplication.

Create a shared utility file resources/common_utils.sh:

#!/usr/bin/env bash
# Common utility functions for demo scripts

curl_with_retry() {
    # ... move the current implementation here
}

# Other shared functions

Then source it in both scripts:

#!/usr/bin/env bash
+source "$(dirname "$0")/common_utils.sh"

# Configuration
P_URL=${P_URL:-"http://localhost:8000"}
# ... rest of script
resources/ingest_demo_data.sh (1)

208-210: Consider utilizing computed metrics for monitoring.

The static analysis correctly identifies that progress, rate, and FINAL_RATE variables are computed but not used. While they're not currently utilized, they provide valuable monitoring data.

Consider adding optional verbose output to utilize these metrics:

# Progress tracking
progress=$((RECORDS_SENT * 100 / TARGET_RECORDS))
elapsed=$(($(date +%s) - START_TIME))
rate=$((RECORDS_SENT / (elapsed == 0 ? 1 : elapsed)))

+# Optional progress output (uncomment for verbose mode)
+# echo "Progress: ${progress}% (${RECORDS_SENT}/${TARGET_RECORDS}) - Rate: ${rate} records/sec"

And at the end:

# Final statistics
TOTAL_TIME=$(($(date +%s) - START_TIME))
FINAL_RATE=$((TARGET_RECORDS / (TOTAL_TIME == 0 ? 1 : TOTAL_TIME)))

+# echo "Ingestion completed: ${TARGET_RECORDS} records in ${TOTAL_TIME}s (${FINAL_RATE} records/sec)"

Also applies to: 230-230

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ecbf80d and a7387c1.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • Cargo.toml (1 hunks)
  • resources/filters_demo_data.sh (1 hunks)
  • resources/ingest_demo_data.sh (1 hunks)
  • src/handlers/http/cluster/mod.rs (2 hunks)
  • src/handlers/http/demo_data.rs (1 hunks)
  • src/handlers/http/ingest.rs (2 hunks)
  • src/handlers/http/mod.rs (1 hunks)
  • src/handlers/http/modal/ingest_server.rs (1 hunks)
  • src/handlers/http/modal/query_server.rs (1 hunks)
  • src/handlers/http/modal/server.rs (3 hunks)
✅ Files skipped from review due to trivial changes (2)
  • Cargo.toml
  • src/handlers/http/modal/query_server.rs
🚧 Files skipped from review as they are similar to previous changes (5)
  • src/handlers/http/modal/ingest_server.rs
  • src/handlers/http/mod.rs
  • src/handlers/http/ingest.rs
  • src/handlers/http/modal/server.rs
  • src/handlers/http/cluster/mod.rs
🧰 Additional context used
🧠 Learnings (1)
src/handlers/http/demo_data.rs (1)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:319-331
Timestamp: 2025-06-16T09:50:38.636Z
Learning: In Parseable's Ingest or Query mode, the node_id is always available because it's generated during server initialization itself, before the get_node_id_string() function in streams.rs would be called. This makes the .expect() calls on QUERIER_META.get() and INGESTOR_META.get() safe in this context.
🧬 Code Graph Analysis (1)
resources/ingest_demo_data.sh (1)
resources/filters_demo_data.sh (1)
  • curl_with_retry (13-93)
🪛 Shellcheck (0.10.0)
resources/ingest_demo_data.sh

[warning] 208-208: progress appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 210-210: rate appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 230-230: FINAL_RATE appears unused. Verify use (or export if used externally).

(SC2034)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: coverage
🔇 Additional comments (4)
resources/filters_demo_data.sh (2)

13-93: Excellent retry logic implementation.

The curl_with_retry function has robust error handling, timeout management, and proper cleanup of temporary files. The implementation correctly handles different HTTP status codes and provides incremental timeout increases on retries.
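
For readers who have not opened the script, the retry idea boils down to something like the following sketch (the real function additionally handles payload files, custom content types, and body/status separation):

# Retry a curl call until it returns a 2xx status or the attempt budget runs out.
curl_with_retry() {
    local url="$1" method="$2" data="$3" max_retries="${4:-3}"
    local attempt=1 status_code

    while (( attempt <= max_retries )); do
        status_code=$(curl -s -o /dev/null -w '%{http_code}' \
            -X "$method" -H "Content-Type: application/json" \
            --max-time $((15 * attempt)) -d "$data" "$url")
        if [[ "$status_code" =~ ^2 ]]; then
            return 0
        fi
        echo "attempt $attempt failed (HTTP $status_code), retrying..." >&2
        sleep $((attempt * 2))   # simple incremental back-off
        (( attempt++ ))
    done
    return 1
}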


96-176: Well-structured filter creation with proper JSON handling.

Both create_sql_filters and create_saved_filters functions demonstrate good practices:

  • Proper JSON escaping for user input
  • Efficient array-based configuration
  • Rate limiting between requests
  • Error counting for monitoring
resources/ingest_demo_data.sh (2)

104-111: Efficient synthetic data generation approach.

The pre-computation of static arrays for cycling is an excellent optimization that avoids repeated random generation during batch creation. The use of realistic data patterns (IP addresses, timestamps, trace IDs) makes the demo data valuable for testing.


122-186: Robust batch generation with proper JSON construction.

The generate_batch function demonstrates good practices:

  • Proper modulo cycling through pre-computed arrays
  • JSON escaping for user agents
  • Realistic log entry structure with all required fields
  • Severity mapping based on HTTP status codes

coderabbitai[bot]
coderabbitai bot previously approved these changes Jul 11, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
resources/ingest_demo_data.sh (1)

72-74: Address static analysis warning about variable assignment.

The static analysis tool correctly identified a potential issue where the assignment and command substitution could mask return values.

Apply this diff to address the warning:

-            if [[ -n "$response" ]]; then
-                status_code=$(echo "$response" | tail -n1)
-                local response_body=$(echo "$response" | sed '$d')
+            if [[ -n "$response" ]]; then
+                local status_code
+                status_code=$(echo "$response" | tail -n1)
+                local response_body
+                response_body=$(echo "$response" | sed '$d')
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a7387c1 and 1ccad5b.

📒 Files selected for processing (2)
  • resources/ingest_demo_data.sh (1 hunks)
  • src/handlers/http/demo_data.rs (1 hunks)
🧰 Additional context used
🪛 Shellcheck (0.10.0)
resources/ingest_demo_data.sh

[warning] 74-74: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: coverage
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
🔇 Additional comments (10)
src/handlers/http/demo_data.rs (3)

33-45: Query parameter validation looks good.

The implementation correctly validates query parameters, handles empty query maps, and extracts the required action parameter with proper error handling.


52-82: Action routing and mode handling implementation is well-structured.

The action-based routing correctly handles different modes:

  • Ingest/All modes execute scripts locally
  • Query/Prism modes forward to ingestor services
  • Proper error handling for unsupported modes

The asynchronous execution pattern with fire-and-forget for script execution is appropriate for this use case.


85-139: Script execution implementation addresses previous security concerns.

The implementation now uses temporary files instead of constructing script paths, which addresses the directory traversal concerns from previous reviews. The error handling is comprehensive, covering file creation, permission setting, script execution, and exit status validation.

The approach of writing the embedded script to a temporary file and making it executable is secure and appropriate.

resources/ingest_demo_data.sh (7)

1-14: Configuration setup is well-structured.

The script provides sensible defaults for all configuration variables and uses proper environment variable handling.


15-105: Robust HTTP retry logic with comprehensive error handling.

The curl_with_retry function implements:

  • Proper timeout escalation
  • Temporary file management for large payloads
  • HTTP status code validation
  • Clean resource cleanup

The implementation handles both string data and file-based data appropriately.


109-200: Efficient data generation with pre-computed arrays.

The ingestion functionality uses smart optimizations:

  • Pre-computed arrays for cycling through data
  • Batch generation with realistic log patterns
  • Proper JSON escaping and formatting

The approach of generating 100 pre-computed values and cycling through them is efficient for demo data generation.


214-255: Well-implemented ingestion loop with progress tracking.

The main ingestion function provides:

  • Progress reporting with rate calculations
  • Proper batch size handling for remaining records
  • Error handling with early exit on failures
  • Final statistics reporting

The implementation correctly handles edge cases like remaining records less than batch size.


259-356: Comprehensive filter creation with proper JSON handling.

The filters functionality creates both SQL and saved filters with:

  • Proper JSON escaping for queries and descriptions
  • Comprehensive set of useful demo filters
  • Success tracking and error reporting
  • Appropriate delays between requests

The filter definitions are well-thought-out and provide good examples for demo purposes.


360-456: Alert creation with proper target management.

The alerts functionality:

  • Creates webhook targets and extracts IDs correctly
  • Defines realistic alert conditions
  • Implements proper error handling and response validation
  • Uses the extracted target ID for alert association

The alert definitions are practical and demonstrate the alerting capabilities well.
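
The create-target-then-reference-its-id flow can be sketched as follows; the endpoint paths and JSON fields are assumptions for illustration, not the exact payloads the script sends:

P_URL="${P_URL:-http://localhost:8000}"
AUTH="${P_USERNAME:-admin}:${P_PASSWORD:-admin}"

# Create a webhook target and capture the server's response.
target_response=$(curl -s -u "$AUTH" -H "Content-Type: application/json" \
    -X POST "$P_URL/api/v1/targets" \
    -d '{"type":"webhook","name":"demo-webhook","endpoint":"http://localhost:9000/hook"}')

# Extract the generated target id (jq keeps this robust against field reordering).
target_id=$(echo "$target_response" | jq -r '.id')
[[ -n "$target_id" && "$target_id" != "null" ]] || { echo "no target id in response" >&2; exit 1; }

# Reference the target id when creating an alert (field names are illustrative).
curl -s -u "$AUTH" -H "Content-Type: application/json" \
    -X POST "$P_URL/api/v1/alerts" \
    -d "{\"title\":\"High error rate\",\"targets\":[\"$target_id\"]}"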


484-520: Clean main execution logic with proper action handling.

The main function provides:

  • Clear action-based routing
  • Sequential execution for "all" action with appropriate delays
  • Help functionality
  • Proper error handling for unknown actions

The script structure is well-organized and user-friendly.
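
The dispatch the comment describes follows the familiar case-statement pattern; a trimmed-down sketch with the action functions stubbed out:

ingest_demo_data() { echo "ingesting demo records..."; }
create_filters()   { echo "creating saved filters..."; }
create_alerts()    { echo "creating alerts..."; }
show_usage()       { echo "usage: ACTION=<ingest|filters|alerts|all> $0"; }

main() {
    case "${ACTION:-ingest}" in
        ingest)  ingest_demo_data ;;
        filters) create_filters ;;
        alerts)  create_alerts ;;
        all)
            ingest_demo_data
            sleep 5            # give ingestion a head start before dependent setup
            create_filters
            create_alerts
            ;;
        *) show_usage; exit 1 ;;
    esac
}

main "$@"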

coderabbitai[bot]
coderabbitai bot previously approved these changes Jul 12, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
resources/ingest_demo_data.sh (1)

79-82: SC2155: avoid masking return values

Declaring and assigning in the same statement can mask errors. Split the declaration:

-                status_code=$(echo "$response" | tail -n1)
-                local response_body=$(echo "$response" | sed '$d')
+                local status_code
+                status_code=$(echo "$response" | tail -n1)
+                local response_body
+                response_body=$(echo "$response" | sed '$d')

Minor, but eliminates the ShellCheck warning and improves readability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ccad5b and 1ed1c97.

📒 Files selected for processing (1)
  • resources/ingest_demo_data.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
resources/ingest_demo_data.sh (1)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:125-148
Timestamp: 2025-05-01T10:33:51.767Z
Learning: When adding a tile to a dashboard in `add_tile()` function, the tile ID must be provided by the client and should not be generated by the server. If the tile ID is missing (nil), the API should fail the operation with an appropriate error message.
🪛 Shellcheck (0.10.0)
resources/ingest_demo_data.sh

[warning] 81-81: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: coverage
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (1)
resources/ingest_demo_data.sh (1)

210-218: send_batch passes a superfluous argument

curl_with_retry (current definition) ignores the 6th positional parameter, so the hard-coded 15 here is dead code and confuses future maintainers:

- curl_with_retry "$P_URL/api/v1/ingest" "POST" "$data" "application/json" 3 15
+ curl_with_retry "$P_URL/api/v1/ingest" "POST" "$data" "application/json" 3

After harmonising the function signature (see earlier comment), adjust all similar calls.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
resources/ingest_demo_data.sh (3)

16-24: curl_with_retry interface is still under-specified

The 6th positional parameter (data_file) is now consumed (local data_file="$6"), but neither the header comments nor the call-sites give any hint that such a parameter exists. Every call passes five arguments at most, which is fine, yet the silent extra slot makes misuse easy.

-# Common curl function with retry logic
-curl_with_retry() {
+# Common curl wrapper
+#   $1 url (required)
+#   $2 method (POST|PUT) (required)
+#   $3 data string (optional)
+#   $4 content-type (default: application/json)
+#   $5 max retries (default: 3)
+#   $6 data file path (optional)
+curl_with_retry() {

Consider:

  1. Documenting the full signature as above.
  2. Guarding against callers that supply both $3 and $6.
  3. Using named parameters (e.g. curl_with_retry --url … --data-file …) to make misuse impossible.

77-96: HTTP-code extraction drops the body when it matters

You rely on appending a line break and %{http_code} then stripping it back with tail -n1 / sed '$d'.
Problems:

  • Bodies that naturally end with a newline will shift the index (status code becomes empty).
  • Binary responses (even small PNGs from dashboards) may embed newlines, breaking the split.

Prefer the built-in separation:

tmp_body=$(mktemp)
status_code=$(curl "${curl_args[@]}" -o "$tmp_body" -w '%{http_code}')
curl_exit_code=$?

# … use "$status_code" and "$(cat "$tmp_body")"
rm -f "$tmp_body"

This is more robust and removes the need for sed and tail.


831-840: show_usage implies positional argument, but the script uses $ACTION env var

The help text suggests script.sh ingest, yet main() never looks at $1; it only inspects the ACTION environment variable. Update the usage string (or parse $1) to avoid user confusion.
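
One way to reconcile the two, letting an explicit positional argument take precedence over the environment variable, could be:

# Prefer a positional argument, fall back to the ACTION environment variable,
# and default to "ingest" when neither is provided.
ACTION="${1:-${ACTION:-ingest}}"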

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ed1c97 and a575504.

📒 Files selected for processing (1)
  • resources/ingest_demo_data.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
resources/ingest_demo_data.sh (4)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:125-131
Timestamp: 2025-07-14T02:11:24.559Z
Learning: In the Parseable demo data script, the timestamp generation using `date -u +%Y-%m-%dT%H:%M:%S.%03dZ` works correctly and produces valid ISO timestamps with milliseconds, despite initial analysis suggesting %03d might not be a valid placeholder. The user verified this produces timestamps like "2025-07-13T16:53:15.013Z" in practice.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:351-355
Timestamp: 2025-06-16T05:20:18.593Z
Learning: In the Parseable codebase, arrow files are expected to always have valid creation or modified timestamps as a basic system assumption. The conversion flow uses expect() on file metadata operations to enforce this invariant with fail-fast behavior rather than graceful error handling, as violations represent fundamental system issues that should cause immediate failures.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/storage/object_storage.rs:838-840
Timestamp: 2025-06-16T02:09:49.200Z
Learning: In Parseable server, filenames for arrow files are generated server-side during ingestion using a strict format pattern: `{stream_hash}.date={date}.hour={hour}.minute={minute}.{custom_partitions}{hostname}.data.{extension}`. This guarantees that when splitting by dots, the second element will always be `date={value}`, making direct indexing safe when extracting the date part by splitting on '=' and accessing index 1.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:125-148
Timestamp: 2025-05-01T10:33:51.767Z
Learning: When adding a tile to a dashboard in `add_tile()` function, the tile ID must be provided by the client and should not be generated by the server. If the tile ID is missing (nil), the API should fail the operation with an appropriate error message.
🪛 Shellcheck (0.10.0)
resources/ingest_demo_data.sh

[warning] 81-81: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: coverage

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
resources/ingest_demo_data.sh (1)

129-134: Use a stronger entropy source for IDs.

$((RANDOM * RANDOM)) yields at most a 30-bit value, so the 32-hex-digit
trace-/span-IDs are heavily zero-padded and far from unique at 10 000
records. Prefer /dev/urandom or uuidgen for realistic IDs.

-        TRACE_IDS+=("$(printf '%032x' $((RANDOM * RANDOM)))")
-        SPAN_IDS+=("$(printf '%016x' $((RANDOM * RANDOM)))")
+        TRACE_IDS+=("$(uuidgen | tr -d '-')")                 # 32-hex chars
+        SPAN_IDS+=("$(xxd -l 8 -p /dev/urandom)")             # 16-hex chars
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fdf3f8d and 1ad8d0e.

📒 Files selected for processing (1)
  • resources/ingest_demo_data.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
resources/ingest_demo_data.sh (4)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:125-131
Timestamp: 2025-07-14T02:11:24.559Z
Learning: In the Parseable demo data script, the timestamp generation using `date -u +%Y-%m-%dT%H:%M:%S.%03dZ` works correctly and produces valid ISO timestamps with milliseconds, despite initial analysis suggesting %03d might not be a valid placeholder. The user verified this produces timestamps like "2025-07-13T16:53:15.013Z" in practice.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:351-355
Timestamp: 2025-06-16T05:20:18.593Z
Learning: In the Parseable codebase, arrow files are expected to always have valid creation or modified timestamps as a basic system assumption. The conversion flow uses expect() on file metadata operations to enforce this invariant with fail-fast behavior rather than graceful error handling, as violations represent fundamental system issues that should cause immediate failures.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/storage/object_storage.rs:838-840
Timestamp: 2025-06-16T02:09:49.200Z
Learning: In Parseable server, filenames for arrow files are generated server-side during ingestion using a strict format pattern: `{stream_hash}.date={date}.hour={hour}.minute={minute}.{custom_partitions}{hostname}.data.{extension}`. This guarantees that when splitting by dots, the second element will always be `date={value}`, making direct indexing safe when extracting the date part by splitting on '=' and accessing index 1.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:125-148
Timestamp: 2025-05-01T10:33:51.767Z
Learning: When adding a tile to a dashboard in `add_tile()` function, the tile ID must be provided by the client and should not be generated by the server. If the tile ID is missing (nil), the API should fail the operation with an appropriate error message.
🪛 Shellcheck (0.10.0)
resources/ingest_demo_data.sh

[warning] 84-84: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: coverage
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default x86_64-apple-darwin

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
resources/ingest_demo_data.sh (3)

128-135: UNIX_NANOS lacks entropy – many records share identical seconds component

date +%s is evaluated once per loop so every 100 generated records have the same epoch-seconds; only the nanosecond suffix varies. For large ingests this causes artificial “spiky” time-series clusters.

-        UNIX_NANOS+=("$(date +%s)$(printf '%09d' $((RANDOM % 1000000000)))")
+        now_s=$(date +%s)
+        UNIX_NANOS+=("$now_s$(printf '%09d' $((RANDOM % 1000000000)))")

(Or offset now_s by the same random seconds you already apply to the RFC-3339 timestamp to keep the two fields consistent.)


195-199: Manual JSON escaping is brittle – use jq -R @json instead

The current two-stage sed chain misses back-slashes, control chars, etc. A safer, one-liner escape:

-        escaped_user_agent=$(echo "$user_agent" | sed 's/"/\\"/g' | sed "s/'/\\'/g")
+        escaped_user_agent=$(printf '%s' "$user_agent" | jq -Rsa @json | sed -e 's/^"//' -e 's/"$//')

This guarantees valid UTF-8 / JSON for any header value.


41-48: Nit: $AUTH_HEADER is constant – switch to -H@"<(printf …)" to avoid arg-length limits

With long Basic-Auth tokens the fixed -H "$AUTH_HEADER" may hit argv size limits on macOS/BSD. Using a process-substitution file avoids quoting issues and is more portable:

curl_args+=( -H@"<(printf 'Authorization: Basic %s' "$BASIC")" )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ad8d0e and 82d23cf.

📒 Files selected for processing (1)
  • resources/ingest_demo_data.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
resources/ingest_demo_data.sh (4)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:125-131
Timestamp: 2025-07-14T02:11:24.559Z
Learning: In the Parseable demo data script, the timestamp generation using `date -u +%Y-%m-%dT%H:%M:%S.%03dZ` works correctly and produces valid ISO timestamps with milliseconds, despite initial analysis suggesting %03d might not be a valid placeholder. The user verified this produces timestamps like "2025-07-13T16:53:15.013Z" in practice.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:351-355
Timestamp: 2025-06-16T05:20:18.593Z
Learning: In the Parseable codebase, arrow files are expected to always have valid creation or modified timestamps as a basic system assumption. The conversion flow uses expect() on file metadata operations to enforce this invariant with fail-fast behavior rather than graceful error handling, as violations represent fundamental system issues that should cause immediate failures.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/storage/object_storage.rs:838-840
Timestamp: 2025-06-16T02:09:49.200Z
Learning: In Parseable server, filenames for arrow files are generated server-side during ingestion using a strict format pattern: `{stream_hash}.date={date}.hour={hour}.minute={minute}.{custom_partitions}{hostname}.data.{extension}`. This guarantees that when splitting by dots, the second element will always be `date={value}`, making direct indexing safe when extracting the date part by splitting on '=' and accessing index 1.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:125-148
Timestamp: 2025-05-01T10:33:51.767Z
Learning: When adding a tile to a dashboard in `add_tile()` function, the tile ID must be provided by the client and should not be generated by the server. If the tile ID is missing (nil), the API should fail the operation with an appropriate error message.
🪛 Shellcheck (0.10.0)
resources/ingest_demo_data.sh

[warning] 85-85: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: coverage
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
resources/ingest_demo_data.sh (1)

41-44: Duplicate -w option – first instance is silently ignored

curl_args adds -w '\n%{http_code}' (line 43) and later adds another
-w "%{http_code}" (line 82). Only the last -w is honored, so the earlier
one is dead code and misleading.

-            -w '\n%{http_code}'

Also applies to: 82-83

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 82d23cf and 3953f3c.

📒 Files selected for processing (1)
  • resources/ingest_demo_data.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
resources/ingest_demo_data.sh (4)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:125-131
Timestamp: 2025-07-14T02:11:24.559Z
Learning: In the Parseable demo data script, the timestamp generation using `date -u +%Y-%m-%dT%H:%M:%S.%03dZ` works correctly and produces valid ISO timestamps with milliseconds, despite initial analysis suggesting %03d might not be a valid placeholder. The user verified this produces timestamps like "2025-07-13T16:53:15.013Z" in practice.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:351-355
Timestamp: 2025-06-16T05:20:18.593Z
Learning: In the Parseable codebase, arrow files are expected to always have valid creation or modified timestamps as a basic system assumption. The conversion flow uses expect() on file metadata operations to enforce this invariant with fail-fast behavior rather than graceful error handling, as violations represent fundamental system issues that should cause immediate failures.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/storage/object_storage.rs:838-840
Timestamp: 2025-06-16T02:09:49.200Z
Learning: In Parseable server, filenames for arrow files are generated server-side during ingestion using a strict format pattern: `{stream_hash}.date={date}.hour={hour}.minute={minute}.{custom_partitions}{hostname}.data.{extension}`. This guarantees that when splitting by dots, the second element will always be `date={value}`, making direct indexing safe when extracting the date part by splitting on '=' and accessing index 1.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:125-148
Timestamp: 2025-05-01T10:33:51.767Z
Learning: When adding a tile to a dashboard in `add_tile()` function, the tile ID must be provided by the client and should not be generated by the server. If the tile ID is missing (nil), the API should fail the operation with an appropriate error message.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: coverage
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments

coderabbitai[bot]
coderabbitai bot previously approved these changes Jul 14, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
resources/ingest_demo_data.sh (3)

391-404: Inconsistent use of retry / timeout logic for target creation

create_target falls back to a bare curl -s … call.
Unlike the rest of the script it:

  • lacks retry/back-off on transient failures,
  • ignores connect/read time-outs,
  • omits 2xx status-code validation.

Replacing it with curl_with_retry keeps error-handling behaviour uniform.

-    response=$(curl -s -H "Content-Type: application/json" -H "$AUTH_HEADER" -X POST "$P_URL/api/v1/targets" -d @- << EOF
+    response=$(curl_with_retry "$P_URL/api/v1/targets" "POST" @- "application/json" 3 << EOF

(You may need to extend curl_with_retry to support stdin payloads, or write the JSON to a temp file and pass its path.)


211-215: Manual JSON string escaping is fragile

Using sed to escape quotes in user_agent (and later for SQL/filter JSON) risks subtle breakage for other special characters (backslashes, control bytes, UTF-8).
A safer pattern is to delegate JSON encoding to a tool such as jq:

json_record=$(jq -n --arg body "$body" \
                    --arg time "$unix_nano" \
                    --arg ua "$user_agent" \
'{
  body: $body,
  time_unix_nano: $time,
  user_agent_original: $ua
  # … other fields …
}')

This prevents malformed payloads if input ever contains unexpected characters.


145-151: date invoked 200× inside the outer loop – precompute once

TIMESTAMPS and UNIX_NANOS are generated with separate date invocations for each of the 100 iterations. While acceptable for 10 k records, the cost grows quadratically if you raise TARGET_RECORDS later.

Consider computing base_now=$(date +%s) once and deriving offsets with arithmetic (printf -v ts … $((base_now-rand))) to remove the per-iteration process spawn.
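
A sketch of that optimisation: date is called once, and every timestamp is then derived with shell arithmetic and bash's built-in printf %(...)T formatter (bash >= 4.2), so no further processes are spawned per iteration:

# Cache the epoch once, then derive all 100 timestamps arithmetically.
base_now=$(date +%s)

TIMESTAMPS=()
UNIX_NANOS=()
for ((i = 0; i < 100; i++)); do
    ts_s=$((base_now - RANDOM % 3600))     # spread records over the past hour
    printf -v ts '%(%Y-%m-%dT%H:%M:%S)T' "$ts_s"
    TIMESTAMPS+=("${ts}Z")                 # run with TZ=UTC if the trailing Z must be exact
    printf -v nanos '%09d' $((RANDOM % 1000000000))
    UNIX_NANOS+=("${ts_s}${nanos}")
done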

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3953f3c and 39077f1.

📒 Files selected for processing (1)
  • resources/ingest_demo_data.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
resources/ingest_demo_data.sh (4)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:125-131
Timestamp: 2025-07-14T02:11:24.559Z
Learning: In the Parseable demo data script, the timestamp generation using `date -u +%Y-%m-%dT%H:%M:%S.%03dZ` works correctly and produces valid ISO timestamps with milliseconds, despite initial analysis suggesting %03d might not be a valid placeholder. The user verified this produces timestamps like "2025-07-13T16:53:15.013Z" in practice.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:351-355
Timestamp: 2025-06-16T05:20:18.593Z
Learning: In the Parseable codebase, arrow files are expected to always have valid creation or modified timestamps as a basic system assumption. The conversion flow uses expect() on file metadata operations to enforce this invariant with fail-fast behavior rather than graceful error handling, as violations represent fundamental system issues that should cause immediate failures.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/storage/object_storage.rs:838-840
Timestamp: 2025-06-16T02:09:49.200Z
Learning: In Parseable server, filenames for arrow files are generated server-side during ingestion using a strict format pattern: `{stream_hash}.date={date}.hour={hour}.minute={minute}.{custom_partitions}{hostname}.data.{extension}`. This guarantees that when splitting by dots, the second element will always be `date={value}`, making direct indexing safe when extracting the date part by splitting on '=' and accessing index 1.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:12-14
Timestamp: 2025-07-14T11:29:22.509Z
Learning: In the Parseable codebase, the base64 encoding for Basic Auth headers in resources/ingest_demo_data.sh works correctly as implemented without needing to strip trailing newlines. The user has validated this works with their server environment.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: coverage

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
resources/ingest_demo_data.sh (3)

1-2: Enable strict Bash modes for more predictable failure handling

Adding set -euo pipefail (or at least set -e and set -o pipefail) right after the shebang ensures the script aborts on the first failed command, on undefined variables, and on pipeline errors – preventing silent mis-ingestion when, for example, curl fails inside a subshell.

 #!/usr/bin/env bash
+set -euo pipefail

20-33: curl_with_retry – document/validate the 6-argument contract

The function now expects an optional sixth data_file argument, but call-sites (e.g. send_batch, create_alerts) still pass five. That’s fine, yet without a comment or an if (( $# < 5 || $# > 6 )) guard, future maintainers may pass the wrong number of arguments and have $data_file silently bound to the wrong value.

Consider adding a short doc-string or an argument-count check to make the interface self-evident.
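
For example, a short guard at the top of the function makes the contract explicit; the parameter names below are inferred from the earlier call-site suggestion and reuse the script's log_error helper, so treat this as a sketch rather than the actual signature:

curl_with_retry() {
    # usage: curl_with_retry <url> <method> <data> <content_type> <max_retries> [data_file]
    if (( $# < 5 || $# > 6 )); then
        log_error "curl_with_retry: expected 5 or 6 arguments, got $#"
        return 1
    fi
    local url=$1 method=$2 data=$3 content_type=$4 max_retries=$5 data_file=${6:-}
    # ... existing retry/back-off logic unchanged ...
}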


397-405: Gracefully handle missing jq dependency

The script now relies on jq to extract IDs, but there’s no pre-flight check. On machines without jq, create_target (and dashboard creation) will fail with hard-to-read errors.

# pre-flight check near the top of the script; exit (not return), since this runs outside any function
command -v jq >/dev/null 2>&1 || {
  log_error "jq is required but not installed – aborting"; exit 1;
}

Insert this check once near the top (e.g. after the config section) to fail fast and inform users what they’re missing.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 39077f1 and b0b5b2c.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • Cargo.toml (1 hunks)
  • resources/ingest_demo_data.sh (1 hunks)
  • src/handlers/http/cluster/mod.rs (2 hunks)
  • src/handlers/http/demo_data.rs (1 hunks)
  • src/handlers/http/ingest.rs (2 hunks)
  • src/handlers/http/mod.rs (1 hunks)
  • src/handlers/http/modal/ingest_server.rs (1 hunks)
  • src/handlers/http/modal/query_server.rs (1 hunks)
  • src/handlers/http/modal/server.rs (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • Cargo.toml
  • src/handlers/http/modal/ingest_server.rs
  • src/handlers/http/modal/query_server.rs
  • src/handlers/http/mod.rs
  • src/handlers/http/ingest.rs
  • src/handlers/http/modal/server.rs
  • src/handlers/http/demo_data.rs
  • src/handlers/http/cluster/mod.rs
🧰 Additional context used
🧠 Learnings (1)
resources/ingest_demo_data.sh (5)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:125-131
Timestamp: 2025-07-14T02:11:24.559Z
Learning: In the Parseable demo data script, the timestamp generation using `date -u +%Y-%m-%dT%H:%M:%S.%03dZ` works correctly and produces valid ISO timestamps with milliseconds, despite initial analysis suggesting %03d might not be a valid placeholder. The user verified this produces timestamps like "2025-07-13T16:53:15.013Z" in practice.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:351-355
Timestamp: 2025-06-16T05:20:18.593Z
Learning: In the Parseable codebase, arrow files are expected to always have valid creation or modified timestamps as a basic system assumption. The conversion flow uses expect() on file metadata operations to enforce this invariant with fail-fast behavior rather than graceful error handling, as violations represent fundamental system issues that should cause immediate failures.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/storage/object_storage.rs:838-840
Timestamp: 2025-06-16T02:09:49.200Z
Learning: In Parseable server, filenames for arrow files are generated server-side during ingestion using a strict format pattern: `{stream_hash}.date={date}.hour={hour}.minute={minute}.{custom_partitions}{hostname}.data.{extension}`. This guarantees that when splitting by dots, the second element will always be `date={value}`, making direct indexing safe when extracting the date part by splitting on '=' and accessing index 1.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:12-14
Timestamp: 2025-07-14T11:29:22.509Z
Learning: In the Parseable codebase, the base64 encoding for Basic Auth headers in resources/ingest_demo_data.sh works correctly as implemented without needing to strip trailing newlines. The user has validated this works with their server environment.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:125-148
Timestamp: 2025-05-01T10:33:51.767Z
Learning: When adding a tile to a dashboard in `add_tile()` function, the tile ID must be provided by the client and should not be generated by the server. If the tile ID is missing (nil), the API should fail the operation with an appropriate error message.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: coverage
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Kafka aarch64-apple-darwin

@nitisht nitisht merged commit 359995e into parseablehq:main Jul 15, 2025
13 of 14 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Jul 29, 2025