
Conversation


@lindesvard lindesvard commented Oct 23, 2025

Summary by CodeRabbit

Release Notes

  • New Features

    • Added "Sessions" column to real-time analytics dashboards displaying unique session counts.
    • Implemented persistent column visibility settings for event tables.
  • Performance

    • Added multi-layer caching (in-memory and Redis) for faster data retrieval.
    • Optimized event queue processing with sharding for improved throughput.
  • Bug Fixes

    • Enhanced bot detection accuracy with improved pattern matching.
    • Fixed timestamp handling consistency across event tracking.
  • Dependencies

    • Updated core dependencies including Fastify and queue management libraries for stability and security.


vercel bot commented Oct 23, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Preview | Comments | Updated (UTC)
openpanel-public | Error | Error | - | Nov 3, 2025 2:29pm


coderabbitai bot commented Oct 23, 2025

Walkthrough

This PR introduces a shard-based event queue architecture, replacing the single global queue with multiple sharded queues. It refactors event buffering to simplified Redis-list logic with Lua scripts, adds two-tier LRU caching to reduce Redis pressure, and removes cached profile accessor exports in favor of direct database access with implicit Redis caching. It also enhances the ClickHouse client configuration with extended timeouts and insert settings, adds UI column-visibility persistence, optimizes bot detection via precompiled patterns, upgrades dependencies, and provides migration tooling for legacy queue structures.
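
A minimal sketch of what the SHA-1 shard routing can look like (the shard count, env var name, and byte extraction are illustrative assumptions, not the PR's exact code):

import { createHash } from 'node:crypto';

// Assumed shard count; the real value would come from configuration.
const NUM_SHARDS = Number(process.env.EVENT_QUEUE_SHARDS ?? 4);

// Hash the projectId with SHA-1 and reduce the first 4 bytes modulo the
// shard count. The mapping is deterministic, so all events for a project
// land on the same shard and per-project ordering is preserved.
export function pickShard(projectId: string): number {
  const digest = createHash('sha1').update(projectId).digest();
  return digest.readUInt32BE(0) % NUM_SHARDS;
}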

Changes

Cohort / File(s) Summary
Queue Architecture & Sharding
packages/queue/src/queues.ts, packages/queue/package.json
Introduces shard-based eventsGroupQueues (replacing single eventsGroupQueue) with pickShard(projectId) routing, SHA-1 hashing, per-shard namespaces, orderingDelayMs, and autoBatch configuration; exposes getEventsGroupQueueShard(groupId) helper; updates payload types to include uaInfo discriminated union, numeric/string timestamp, headers, and device IDs; upgrades bullmq and groupmq dependencies.
Worker Boot & Lifecycle
apps/worker/src/boot-workers.ts, apps/worker/src/boot-cron.ts, apps/worker/src/index.ts, apps/worker/src/utils/graceful-shutdown.ts, apps/worker/package.json
Migrates from central eventsGroupWorker to per-shard workers driven by ENABLED_QUEUES environment variable with per-queue concurrency; adds getEnabledQueues and getConcurrencyFor utilities; replaces fixedDelay cronQueue.add with lock-guarded upsertJobScheduler flow; updates Bull Board to use multiple BullBoardGroupMQAdapter instances; closes all eventsGroupQueues in graceful shutdown; upgrades bullmq and groupmq versions.
Event Buffering Refactor
packages/db/src/buffers/event-buffer.ts, packages/db/src/buffers/event-buffer.test.ts
Replaces Lua-script-heavy multi-key buffering with simplified single-queue (event_buffer:queue) model; introduces cached Lua script loading (scriptShas) via evalScript helper; adds addScreenView/addSessionEnd scripted handlers; implements processBuffer() with single queue fetch, ClickHouse chunk writes, and cleanup; exposes a public getLastScreenView(sessionId) accessor.
Session & Bot Buffering
packages/db/src/buffers/session-buffer.ts, packages/db/src/buffers/bot-buffer.ts
Introduces SESSION_BUFFER_CHUNK_SIZE environment variable for configurable chunking; downgrades processBuffer logging from info to debug; no functional changes to buffering logic.
Base Buffer & Processing
packages/db/src/buffers/base-buffer.ts
Adds enableParallelProcessing boolean option to toggle lock-free parallel flush mode; when enabled, skips lock acquisition and logs at debug level; when disabled, retains lock-based processing with debug-level logging and explicit counter reset on error.
LRU Caching Infrastructure
packages/redis/cachable.ts, packages/redis/package.json, packages/redis/cachable.test.ts, packages/common/package.json, packages/geo/package.json
Adds global LRU cache (globalLruCache) with getCache useLruCache flag for L1 hits before Redis (L2); extends cacheable overloads with per-function lruCache?: boolean option; introduces clearGlobalLruCache(key?) and getGlobalLruCacheStats() helpers; adds lru-cache ^11.2.2 dependency to redis and common packages.
User-Agent & Geo Caching
packages/common/server/parser-user-agent.ts, packages/geo/src/geo.ts
Adds LRU-based caching for UAParser results; pre-compiles device/OS detection regex constants (IPAD_OS_VERSION_REGEX, SAMSUNG_MOBILE_REGEX, etc.); exports new UserAgentResult type; geo service adds in-memory LRU cache (max 1000, 5-min TTL) with early returns on hits.
Profile & Client Service Caching
packages/db/src/services/profile.service.ts, packages/db/src/services/clients.service.ts
Removes exported getProfileByIdCached entirely; getProfileById now reads fetchFromCache() before querying DB; removes in-function cache write during upsert; updates getClientByIdCached wrapper to pass true flag to cacheable.
Database Service Changes
packages/db/src/services/salt.service.ts, packages/db/src/services/event.service.ts, packages/db/src/services/notification.service.ts
salt.service: wraps getSalts with cacheable('op:salt', ..., 60*10, true); event.service: replaces organization-driven date interval with configurable dateIntervalInDays (default 0.5), switches to getProfileById (non-cached), removes console.log, removes getOrganizationByProjectIdCached import; notification.service: replaces getProfileByIdCached with getProfileById.
ClickHouse Client Configuration
packages/db/src/clickhouse/client.ts, packages/db/package.json
Adds events_imports to TABLE_NAMES; increases request_timeout (60s → 300s) and idle_socket_ttl (8s → 60s) in CLICKHOUSE_OPTIONS; wraps insert method with retry logic and injects comprehensive clickhouse_settings (async insert, larger blocks, progress headers, parallel parsing, etc.); upgrades @clickhouse/client from ^1.2.0 to ^1.12.1.
API Track & Event Controllers
apps/api/src/controllers/track.controller.ts, apps/api/src/controllers/event.controller.ts, apps/api/src/hooks/duplicate.hook.ts
track: extracts identity/__timestamp/__ip from payload.properties; uses numeric timestamps; generates jobId from request fields; enqueues via getEventsGroupQueueShard(groupId); removes duplicate checks; adds uaInfo to payload; event: replaces eventsGroupQueue with getEventsGroupQueueShard(groupId); removes checkDuplicatedEvent; adds jobId and uaInfo to payload. duplicate.hook: new Fastify hook detecting duplicates via isDuplicatedEvent, returns 200 "Duplicate event" on match; registered as preValidation hook.
API Profile & Live Controllers
apps/api/src/controllers/profile.controller.ts, apps/api/src/controllers/live.controller.ts
profile: removes deduplication checks and related imports; uses payload variable consistently; spreads payload into upsertProfile with id from payload.profileId; removes conditional geo merging; live: replaces getProfileByIdCached import with getProfileById call.
API Authentication & Hooks
apps/api/src/utils/auth.ts, apps/api/src/hooks/ip.hook.ts, apps/api/src/utils/deduplicate.ts, apps/api/src/index.ts, apps/api/src/routes/event.router.ts
auth.ts: caches validateSdkRequest client secret verification with getCache (5-min TTL); ip.hook: removes multi-line type imports, retains FastifyRequest; deduplicate.ts: adds ip/origin parameters to isDuplicatedEvent hash; removes checkDuplicatedEvent export; index.ts: removes fixHook import/registration; event.router: registers duplicateHook as preValidation hook; removes fixHook.
TRPC Routers
packages/trpc/src/routers/event.ts, packages/trpc/src/routers/profile.ts, packages/trpc/src/routers/chart.ts, packages/trpc/src/routers/realtime.ts
event.ts: adds columnVisibility input field (z.record); extends select with dynamic fields based on columnVisibility; excludes projectId by default; profile.ts: replaces getProfileByIdCached with getProfileById; chart.ts: removes duration from properties list; realtime.ts: adds unique_sessions COUNT(DISTINCT session_id) to paths/referrals/geo queries.
UI Column Visibility & Data Table
apps/start/src/components/ui/data-table/data-table-hooks.tsx, apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.conversions.tsx, apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.events.tsx, apps/start/src/routes/_app.$organizationId.$projectId_.profiles.$profileId._tabs.events.tsx, apps/start/src/routes/_app.$organizationId.$projectId_.sessions_.$sessionId.tsx
Introduces useReadColumnVisibility('events') hook returning persisted column visibility from localStorage; integrates columnVisibility into events/conversions queries; gates queries with enabled: columnVisibility !== null.
Realtime UI Components
apps/start/src/components/realtime/realtime-geo.tsx, apps/start/src/components/realtime/realtime-paths.tsx, apps/start/src/components/realtime/realtime-referrals.tsx
Reduces Events column width (84px → 60px); adds Sessions column displaying item.unique_sessions in short form (right-aligned, ~82px width).
Bot Detection & Scripts
apps/api/src/bots/index.ts, apps/api/scripts/get-bots.ts
Bot detection: pre-compiles regex patterns into compiledRegex; derives regexBots and includesBots; adds fast-path Mozilla/5.0 check for early exit; implements two-phase matching (includes-based fast path, regex-based fallback); get-bots script: adds transformBots() to preserve regex on special characters or convert to includes; outputs JSON array. A hedged sketch of the two-phase matching follows this table.
Incoming Event & Session Job Processing
apps/worker/src/jobs/events.incoming-event.ts, apps/worker/src/jobs/events.create-session-end.ts
incoming-event: increments queue:counter via getRedisCache(); destructures uaInfo from payload, falls back to parseUserAgent; create-session-end: replaces getTime with convertClickhouseDateToJs; awaits handleSessionEndNotifications; downgrades logging to debug.
Worker Utils & Metrics
apps/worker/src/utils/session-handler.ts, apps/worker/src/metrics.ts
session-handler: downgrades job-state log from info to debug; metrics.ts: spreads eventsGroupQueues into queues array; adds events_group_job_duration_ms histogram; updates per-queue gauges to use brace-stripped queue names.
Cron & Job Processing
apps/worker/src/jobs/cron.delete-projects.ts
Changes ClickHouse lightweight_deletes_sync from numeric 0 to string '0' in payload.
Migration & Cleanup Scripts
apps/worker/scripts/cleanup-old-event-buffer-keys.ts, apps/worker/scripts/migrate-delayed-jobs.ts
cleanup-script: new module migrating legacy event_buffer Redis keys (sessions_sorted, ready_sessions, session lists, regular_queue) to new queue architecture; migrate-script: new module migrating delayed jobs from old to new BullMQ queue names with dry-run support.
Logging & Configuration
packages/logger/index.ts, apps/api/src/controllers/misc.controller.ts
logger: derives silent from LOG_SILENT env; combines LOG_PREFIX + name + NODE_ENV for service string; misc.controller: downgrades ICO/SVG/OG image path logging from info to debug.
Dependency Management
apps/start/package.json, apps/api/package.json
Removes @clickhouse/client from apps/start; upgrades @fastify/* packages (^8.0.1 → ^8.1.0, etc.) and fastify (^5.2.1 → ^5.6.1) in apps/api.
Docker & Environment
docker-compose.yml
Changes version from '3' to "3"; adds new op-df (Dragonfly) service with cluster_mode=emulated; updates op-kv command quoting style.
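
As noted in the bot-detection row above, here is a hedged sketch of two-phase matching with precompiled patterns (the pattern lists are placeholders; the real ones are generated by get-bots.ts, and the summary does not spell out the exact Mozilla/5.0 fast-path semantics):

// Placeholder pattern lists; the real ones come from get-bots.ts output.
const includesBots = ['Googlebot', 'bingbot', 'AhrefsBot'];
const regexBots = [/crawl[a-z]*bot/i, /spider\/[\d.]+/i];

// Compile once at module load into a single alternation, instead of
// constructing regexes per request.
const compiledRegex = new RegExp(regexBots.map((r) => r.source).join('|'), 'i');

export function isBot(ua: string): boolean {
  // Phase 1: cheap substring checks catch most known bots.
  for (const needle of includesBots) {
    if (ua.includes(needle)) return true;
  }
  // Phase 2: fall back to the precompiled regex alternation.
  return compiledRegex.test(ua);
}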

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as API Controller<br/>(track/event)
    participant QueueRouter as Queue Router<br/>(getEventsGroupQueueShard)
    participant ShardQueue as Shard Queue<br/>(GroupQueue)
    participant Worker as Worker<br/>(per-shard)
    participant Redis as Redis<br/>(event_buffer:queue)
    participant ClickHouse as ClickHouse

    rect rgb(200, 220, 255)
        note right of Client: Event Ingestion & Sharding
        Client->>API: POST /track (payload + projectId)
        API->>API: Extract identity, timestamp, IP<br/>from payload
        API->>API: Generate jobId
        API->>QueueRouter: getEventsGroupQueueShard(groupId)
        QueueRouter->>QueueRouter: pickShard(projectId)<br/>SHA-1 hash mod
        QueueRouter-->>API: return ShardQueue instance
        API->>ShardQueue: add(jobId, payload)
        ShardQueue->>Redis: enqueue to<br/>shard-specific queue
        ShardQueue-->>API: return jobId
    end

    rect rgb(220, 255, 220)
        note right of Worker: Event Processing & Buffering
        Worker->>ShardQueue: fetch job from shard queue
        ShardQueue->>Redis: dequeue job
        Worker->>Redis: incomingEventPure handler
        Worker->>Redis: add event to event_buffer:queue<br/>(via Lua or direct)
        Worker->>Redis: increment queue:counter
        Worker-->>ShardQueue: job completed
    end

    rect rgb(255, 240, 200)
        note right of Worker: Buffer Flush & Persistence
        Worker->>Redis: processBuffer()
        Redis->>Redis: fetch from event_buffer:queue
        Worker->>Worker: parse & sort events
        Worker->>ClickHouse: insert events in chunks<br/>(with enhanced settings)
        ClickHouse-->>Worker: insert success
        Worker->>Redis: publish "saved" event
        Worker->>Redis: cleanup queue portion
    end
sequenceDiagram
    participant Component as React Component
    participant Hook as useReadColumnVisibility
    participant LocalStorage as localStorage
    participant TRPC as TRPC Query

    rect rgb(200, 220, 255)
        note right of Component: Column Visibility Flow
        Component->>+Hook: useReadColumnVisibility('events')
        Hook->>+LocalStorage: read @op:events-column-visibility
        LocalStorage-->>-Hook: visibility object or null
        Hook-->>-Component: return visibility state
    end

    rect rgb(220, 255, 220)
        note right of Component: Query with Visibility
        Component->>+TRPC: trpc.event.events.infiniteQuery<br/>({ columnVisibility, ... })
        TRPC->>TRPC: dynamic select fields<br/>based on columnVisibility
        TRPC->>TRPC: exclude projectId (false)
        TRPC-->>-Component: return filtered results
    end
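
A minimal sketch of the hook in the diagram above, assuming the localStorage key format shown (@op:events-column-visibility) and a null-until-read return value so callers can gate queries with enabled: columnVisibility !== null:

import { useEffect, useState } from 'react';

type ColumnVisibility = Record<string, boolean>;

export function useReadColumnVisibility(table: string) {
  // null until localStorage has been read; consumers use this to delay queries.
  const [visibility, setVisibility] = useState<ColumnVisibility | null>(null);

  useEffect(() => {
    const raw = localStorage.getItem(`@op:${table}-column-visibility`);
    try {
      setVisibility(raw ? JSON.parse(raw) : {});
    } catch {
      setVisibility({}); // corrupted entry: fall back to defaults
    }
  }, [table]);

  return visibility;
}
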
sequenceDiagram
    participant Code as App Code
    participant GetCache as getCache()
    participant L1 as LRU L1 Cache<br/>(globalLruCache)
    participant L2 as Redis L2<br/>(setex)
    participant Fn as Async Function

    rect rgb(200, 220, 255)
        note right of Code: Cache Hit (L1)
        Code->>+GetCache: getCache(key, ttl, fn, useLruCache=true)
        GetCache->>+L1: check key
        L1-->>-GetCache: found cached value
        GetCache-->>-Code: return cached value
    end

    rect rgb(220, 255, 220)
        note right of Code: Cache Hit (L2)
        Code->>+GetCache: getCache(key, ttl, fn, useLruCache=true)
        GetCache->>+L1: check key
        L1-->>-GetCache: miss
        GetCache->>+L2: get key from Redis
        L2-->>-GetCache: found value
        GetCache->>+L1: store in L1
        L1-->>-GetCache: stored
        GetCache-->>-Code: return value
    end

    rect rgb(255, 240, 200)
        note right of Code: Cache Miss (Both)
        Code->>+GetCache: getCache(key, ttl, fn, useLruCache=true)
        GetCache->>L1: check key (miss)
        GetCache->>L2: get key (miss)
        GetCache->>+Fn: execute async function
        Fn-->>-GetCache: return result
        GetCache->>+L1: store result
        L1-->>-GetCache: stored
        GetCache->>+L2: async setex (fire & forget)
        L2-->>-GetCache: queued
        GetCache-->>-Code: return result
    end
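
A condensed sketch of the three paths above (the real getCache also revives ISO date strings and obtains its client from getRedisCache(); treat the signature details as assumptions):

import { LRUCache } from 'lru-cache';

// Minimal Redis surface this sketch needs; the real code uses ioredis.
interface RedisLike {
  get(key: string): Promise<string | null>;
  setex(key: string, seconds: number, value: string): Promise<unknown>;
}

const globalLruCache = new LRUCache<string, any>({ max: 10_000 });

export async function getCache<T>(
  redis: RedisLike,
  key: string,
  expireInSec: number,
  fn: () => Promise<T>,
  useLruCache = false,
): Promise<T> {
  // L1: in-process LRU, no network round-trip.
  if (useLruCache) {
    const hit = globalLruCache.get(key);
    if (hit !== undefined) return hit as T;
  }
  // L2: Redis.
  const raw = await redis.get(key);
  if (raw !== null) {
    const parsed = JSON.parse(raw) as T;
    if (useLruCache) globalLruCache.set(key, parsed, { ttl: expireInSec * 1000 });
    return parsed;
  }
  // Miss on both tiers: compute, fill L1 synchronously, write L2
  // fire-and-forget so the caller is not blocked on Redis.
  const result = await fn();
  if (useLruCache) globalLruCache.set(key, result, { ttl: expireInSec * 1000 });
  redis.setex(key, expireInSec, JSON.stringify(result)).catch(() => {});
  return result;
}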

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Areas requiring extra attention:

  • Shard-based queue architecture (packages/queue/src/queues.ts, apps/worker/src/boot-workers.ts): New sharding logic with SHA-1 hashing, per-shard configuration, and concurrent worker startup across multiple queues requires verification of routing consistency and worker lifecycle management.
  • Event buffer Lua script refactoring (packages/db/src/buffers/event-buffer.ts): Transition from complex multi-key logic to simplified single-queue model with cached script loading; verify correctness of script execution paths and session-end semantics.
  • LRU caching tier addition (packages/redis/cachable.ts, packages/common/server/parser-user-agent.ts): New two-tier cache (L1 in-memory + L2 Redis) introduces memory overhead and requires testing of cache coherency, TTL behavior, and eviction logic.
  • Profile caching strategy inversion (packages/db/src/services/profile.service.ts, packages/db/src/services/event.service.ts, apps/api/src/controllers/live.controller.ts): Removal of explicitly cached accessor in favor of implicit cache-on-read within getProfileById; verify that all call sites receive correct cached/uncached behavior and that deduplication logic removal is safe.
  • Duplicate detection refactoring (apps/api/src/hooks/duplicate.hook.ts, apps/api/src/utils/deduplicate.ts): Movement from controller-level checks to preValidation hook with IP/origin-based hashing; ensure hook execution order and early-return semantics don't bypass security. A hedged sketch of such a salted hash key follows this list.
  • TRPC column visibility integration (packages/trpc/src/routers/event.ts, apps/start/src/routes/...): Dynamic SQL field selection based on UI column visibility state; verify that columnVisibility fallbacks (→ {}) work correctly and that field projection logic handles edge cases.
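
On the duplicate-detection point, a hedged sketch of an IP/origin-salted key with set-if-absent semantics (field list, TTL, and key prefix are illustrative assumptions):

import { createHash } from 'node:crypto';

// Minimal Redis surface for SET ... EX ... NX, as exposed by ioredis.
interface RedisLike {
  set(key: string, value: string, ex: 'EX', ttl: number, nx: 'NX'): Promise<'OK' | null>;
}

export async function isDuplicatedEvent(
  redis: RedisLike,
  payload: unknown,
  ip: string,
  origin: string,
): Promise<boolean> {
  // Fold ip and origin into the hash so identical payloads from different
  // clients are not misclassified as duplicates.
  const hash = createHash('sha256')
    .update(JSON.stringify(payload))
    .update(ip)
    .update(origin)
    .digest('hex');
  // SET NX: the first request claims the key; any repeat within the TTL
  // gets null back from Redis and is treated as a duplicate.
  const claimed = await redis.set(`op:dedup:${hash}`, '1', 'EX', 5, 'NX');
  return claimed === null;
}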

Possibly related PRs

  • feat: use groupmq instead of bullmq for incoming events #206: Implements shard-based group queues with groupmq integration, replacing single eventsGroupQueue with per-shard queues — directly extends the queue architecture refactor in this PR.
  • fix: read-after-write issues #215: Modifies profile/session access patterns and caching strategies, adding session-context exports and replacing cached accessor exports — aligns with the removal of getProfileByIdCached and shift to implicit Redis caching in this PR.

Poem

🐰 Queues now shard and split in two,
Redis caches bloom like morning dew,
Buffers simplified, scripts cached with care,
Column visibility floats through the air,
Bot patterns swift, profiles lean and true! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

Title check: ❓ Inconclusive
Explanation: The pull request title 'Feature/performance' is vague and generic, using a branch naming convention rather than describing the actual changes made. It does not convey meaningful information about what the changeset accomplishes. The PR contains substantial changes across multiple systems including event buffering refactoring, caching mechanisms, queue sharding, bot detection optimization, and more—none of which are captured by this generic label.
Resolution: Revise the title to be more specific and descriptive of the main changes. Consider a title like 'Refactor event buffering with simplified Redis queue and add LRU caching layer' or 'Optimize performance with queue sharding, LRU caching, and bot detection improvements' that clearly summarizes the primary improvements made in this changeset.

✅ Passed checks (1 passed)

Description check: ✅ Passed
Explanation: Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/performance

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
packages/db/src/clickhouse/client.ts (1)

171-186: Fix int parsing: zero-value bug and 64‑bit precision loss.

  • item[key] && … skips 0; zeroes remain strings.
  • Parsing Int64/UInt64 into Number loses precision.

Parse only safe ints/floats and keep 64‑bit ints as strings.

Apply:

-    data: json.data.map((item) => {
-      return keys.reduce((acc, key) => {
-        const meta = json.meta?.find((m) => m.name === key);
-        return {
-          ...acc,
-          [key]:
-            item[key] && meta?.type.includes('Int')
-              ? Number.parseFloat(item[key] as string)
-              : item[key],
-        };
-      }, {} as T);
-    }),
+    data: json.data.map((item) => {
+      return keys.reduce((acc, key) => {
+        const meta = json.meta?.find((m) => m.name === key);
+        const v = (item as any)[key];
+        const t = meta?.type ?? '';
+        const isDefined = v !== null && v !== undefined;
+        const isFloat = t.includes('Float');
+        const isSafeInt = t.includes('Int') && !t.includes('64');
+        return {
+          ...acc,
+          [key]: isDefined && (isFloat || isSafeInt)
+            ? (isFloat ? Number.parseFloat(String(v)) : Number.parseInt(String(v), 10))
+            : v,
+        };
+      }, {} as T);
+    }),
packages/db/src/buffers/profile-buffer.ts (2)

184-195: Filter JSON parse failures before ClickHouse insert.

getSafeJson can return null; passing nulls to JSONEachRow will fail.

-      const parsedProfiles = profiles.map((p) =>
-        getSafeJson<IClickhouseProfile>(p),
-      );
+      const parsedProfiles = profiles
+        .map((p) => getSafeJson<IClickhouseProfile>(p))
+        .filter((p): p is IClickhouseProfile => p !== null);

153-167: Escape identifiers in SELECT to ClickHouse to avoid injection.

Profile IDs are interpolated unescaped. Use sqlstring.escape (consistent with toDate usage elsewhere).

+    // Avoid string interpolation; escape values
+    const pid = sqlstring.escape(profile.project_id);
+    const id = sqlstring.escape(profile.id);
     const result = await chQuery<IClickhouseProfile>(
-      `SELECT *
-       FROM ${TABLE_NAMES.profiles}
-       WHERE project_id = '${profile.project_id}'
-         AND id = '${profile.id}'
-       ORDER BY created_at DESC
-       LIMIT 1`,
+      `SELECT *
+       FROM ${TABLE_NAMES.profiles}
+       WHERE project_id = ${pid}
+         AND id = ${id}
+       ORDER BY created_at DESC
+       LIMIT 1`,
     );

And import:

+import sqlstring from 'sqlstring';
🧹 Nitpick comments (12)
.github/workflows/docker-build.yml (3)

148-164: Rolling tags strategy: consider implications for deployments.

The workflow implements a "rolling tag" strategy where each component tag (api, worker, dashboard) is force-pushed to always point to the latest commit on main. While this is valid for CI/CD, ensure downstream deployment systems are aware that these tags are mutable and will be updated on every main branch build.

If any deployment or release process depends on tag immutability, consider using version-based tags (e.g., api-v1.2.3) in addition to or instead of these rolling tags.

Also applies to: 209-225, 270-285


148-164: Consider extracting duplicated tag logic into a reusable workflow.

The tag creation logic is identical across all three jobs, differing only in the tag name. To reduce maintenance burden and improve clarity, consider creating a reusable GitHub Actions workflow or composite action for tag creation.

Example:

# .github/workflows/create-tag.yml (reusable workflow)
on:
  workflow_call:
    inputs:
      tag_name:
        required: true
        type: string
      
jobs:
  create-tag:
    # ... tag creation logic here

Then call it from each job:

- uses: ./.github/workflows/create-tag.yml
  with:
    tag_name: api

Also applies to: 209-225, 270-285


152-152: Simplify tag existence check.

The grep pattern is redundant since git tag -l already filters by tag name. The condition will always succeed if the tag exists.

Consider simplifying to:

if git rev-parse "api" >/dev/null 2>&1; then
  git tag -d "api"
fi

This more directly checks if the tag exists without redundant piping.

Also applies to: 213-213, 274-274

packages/db/src/services/profile.service.ts (1)

16-16: Remove unused import

getDurationSql is imported but never used in this file.

packages/db/src/clickhouse/client.ts (2)

91-121: Broaden retry conditions and add jittered backoff.

Handle common network error codes and add small jitter to avoid thundering herd.

-      if (
-        error.message.includes('Connect') ||
-        error.message.includes('socket hang up') ||
-        error.message.includes('Timeout error')
-      ) {
-        const delay = baseDelay * 2 ** attempt;
+      const code = (error && (error.code || error.errno)) || '';
+      if (
+        error.message.includes('Connect') ||
+        error.message.includes('socket hang up') ||
+        error.message.includes('Timeout') ||
+        ['ECONNRESET','ETIMEDOUT','EPIPE','ENOTFOUND','ECONNREFUSED'].includes(code)
+      ) {
+        const jitter = Math.floor(Math.random() * 100);
+        const delay = baseDelay * 2 ** attempt + jitter;

5-5: Move type import to root export.

The import path @clickhouse/client/dist/config contradicts the pattern established elsewhere in this file, where other types (ClickHouseSettings, ResponseJSON, ClickHouseLogLevel, Logger) are imported from the root package export. Importing from internal paths introduces risk of breakage during upgrades, as /dist/ paths are not part of the public API contract.

-import type { NodeClickHouseClientConfigOptions } from '@clickhouse/client/dist/config';
+import type { NodeClickHouseClientConfigOptions } from '@clickhouse/client';
packages/db/src/buffers/profile-buffer.ts (2)

18-21: Guard TTL env parsing.

Ensure sane default when env is set but not a number.

-  private ttlInSeconds = process.env.PROFILE_BUFFER_TTL_IN_SECONDS
-    ? Number.parseInt(process.env.PROFILE_BUFFER_TTL_IN_SECONDS, 10)
-    : 60 * 60;
+  private ttlInSeconds = (() => {
+    const v = Number.parseInt(process.env.PROFILE_BUFFER_TTL_IN_SECONDS ?? '', 10);
+    return Number.isFinite(v) ? v : 60 * 60;
+  })();

47-53: Use the class Redis instance for consistency.

Prefer this.redis.exists(cacheKey) over a fresh getRedisCache() call.

-    return (await getRedisCache().exists(cacheKey)) === 1;
+    return (await this.redis.exists(cacheKey)) === 1;
packages/db/src/buffers/event-buffer.test.ts (2)

1-4: Close publisher Redis connection in tests.

processBuffer uses getRedisPub(). Close it to avoid leaked handles.

-import { getRedisCache } from '@openpanel/redis';
+import { getRedisCache, getRedisPub } from '@openpanel/redis';
@@
 afterAll(async () => {
   try {
     await redis.quit();
   } catch {}
+  try {
+    await getRedisPub().quit();
+  } catch {}
 });

Also applies to: 50-54


216-245: Optional: add a CSV mode test.

Set USE_CSV=1, assert format 'CSV' and input_format_csv_skip_first_lines passed. Improves coverage for the alternative path.

packages/db/src/buffers/event-buffer.ts (2)

9-15: Use TABLE_NAMES constant instead of hard‑coding 'events'.

Avoid drift if table names change.

-import { ch } from '../clickhouse/client';
+import { ch, TABLE_NAMES } from '../clickhouse/client';
@@
-          await ch.insert({
-            table: 'events',
+          await ch.insert({
+            table: TABLE_NAMES.events,
             values: chunk,
             format: 'JSONEachRow',
           });

(Apply similarly in the CSV branch.)

Also applies to: 270-297


300-305: Publish with multi: drop awaits inside the loop.

publishEvent(..., pubMulti) queues commands; awaiting each call is unnecessary.

-      for (const event of eventsToClickhouse) {
-        await publishEvent('events', 'saved', transformEvent(event), pubMulti);
-      }
+      for (const event of eventsToClickhouse) {
+        publishEvent('events', 'saved', transformEvent(event), pubMulti);
+      }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bcfb4f2 and 78dc299.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (13)
  • .github/workflows/docker-build.yml (4 hunks)
  • apps/start/package.json (0 hunks)
  • apps/worker/src/jobs/cron.delete-projects.ts (1 hunks)
  • packages/db/package.json (1 hunks)
  • packages/db/src/buffers/event-buffer.test.ts (3 hunks)
  • packages/db/src/buffers/event-buffer.ts (5 hunks)
  • packages/db/src/buffers/profile-buffer.ts (2 hunks)
  • packages/db/src/clickhouse/client.ts (2 hunks)
  • packages/db/src/services/event.service.ts (3 hunks)
  • packages/db/src/services/overview.service.ts (7 hunks)
  • packages/db/src/services/profile.service.ts (3 hunks)
  • packages/trpc/src/routers/chart.ts (0 hunks)
  • packages/trpc/src/routers/realtime.ts (4 hunks)
💤 Files with no reviewable changes (2)
  • packages/trpc/src/routers/chart.ts
  • apps/start/package.json
🧰 Additional context used
🧬 Code graph analysis (5)
packages/trpc/src/routers/realtime.ts (1)
packages/db/src/services/event.service.ts (1)
  • getDurationSql (987-988)
packages/db/src/services/profile.service.ts (1)
packages/db/src/clickhouse/client.ts (1)
  • TABLE_NAMES (48-60)
packages/db/src/buffers/event-buffer.test.ts (3)
packages/db/src/buffers/index.ts (1)
  • eventBuffer (6-6)
packages/db/src/clickhouse/client.ts (1)
  • ch (131-161)
packages/db/src/buffers/event-buffer.ts (1)
  • EventBuffer (17-356)
packages/db/src/buffers/event-buffer.ts (5)
packages/redis/redis.ts (3)
  • Redis (9-9)
  • getRedisCache (66-72)
  • getRedisPub (84-90)
packages/db/src/services/event.service.ts (3)
  • IClickhouseEvent (68-102)
  • IServiceEvent (145-182)
  • transformEvent (104-138)
packages/db/src/clickhouse/client.ts (1)
  • ch (131-161)
packages/json/index.ts (1)
  • getSafeJson (3-9)
packages/redis/publisher.ts (1)
  • publishEvent (28-36)
packages/db/src/services/overview.service.ts (2)
packages/db/src/services/chart.service.ts (1)
  • getEventFiltersWhereClause (219-470)
packages/db/src/clickhouse/client.ts (3)
  • TABLE_NAMES (48-60)
  • formatClickhouseDate (208-219)
  • chQuery (201-206)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-and-push-dashboard
  • GitHub Check: build-and-push-api
  • GitHub Check: build-and-push-worker
🔇 Additional comments (9)
.github/workflows/docker-build.yml (2)

107-107: Correct permissions for tag operations.

The contents: write permission is appropriately granted to enable git tag creation and pushing in the post-build steps.

Also applies to: 168-168, 229-229


148-164: Missing git user configuration may cause tag operations to fail.

The workflow uses git tag and git push commands without explicitly setting git user configuration (user.name and user.email). While GitHub Actions typically provides defaults, some environments require these to be explicitly configured before git operations succeed.

Add git configuration before tag operations by adding these lines to each tag step:

 - name: Create/Update API tag
   if: github.ref == 'refs/heads/main'
   run: |
+    git config --local user.name "github-actions[bot]"
+    git config --local user.email "github-actions[bot]@users.noreply.github.com"
     # Delete existing tag if it exists

Please verify that tag operations succeed without this configuration in your GitHub Actions environment. If they fail, apply the above change to all three tag creation steps.

Also applies to: 209-225, 270-285

apps/worker/src/jobs/cron.delete-projects.ts (1)

54-59: The string type for lightweight_deletes_sync is correct and required by ClickHouse client v1.12.1. The client's ClickHouseSettings interface defines lightweight_deletes_sync as a string type, expecting values like "0", "1", "2", or "3". The change from numeric 0 to string '0' is necessary for the upgrade and is properly implemented.

packages/trpc/src/routers/realtime.ts (1)

71-71: Consistent duration calculation via getDurationSql — nice

Uniform duration logic across paths, referrals, and geo. Looks good.

Also applies to: 98-98, 128-128

packages/db/src/services/event.service.ts (2)

480-481: Use computed duration expression — aligned with realtime

Selecting getDurationSql() as duration keeps duration semantics consistent. LGTM.


987-988: Helper export LGTM

Exporting getDurationSql for reuse is clean and keeps SQL consistent.

packages/db/package.json (1)

16-16: Based on verification, the original review comment contains an inaccuracy: workspace analysis confirms a single source of the dependency, and the version-jump claim lacks a basis.

The latest @clickhouse/client version is 1.12.1. Your codebase shows only one version pinned (^1.12.1 in packages/db/package.json), with all usage confined to packages/db/src/ — no version conflicts exist across the workspace. The official ClickHouse JS documentation confirms retry/timeout and compression settings continue to work with patterns like request_timeout: 400_000.

However, the review comment claims a "1.2 → 1.12" jump, but the semver caret (^1.12.1) does not indicate the prior version — it only specifies the lower bound for future installs. Without access to the actual previous version in your PR, this jump cannot be confirmed. If the concern is valid, verify the actual prior version before this update and check the official ClickHouse JS GitHub releases for any breaking changes in that specific range.

packages/db/src/clickhouse/client.ts (1)

64-79: Timeout defaults bump LGTM.

Higher request_timeout and idle_socket_ttl align with larger batch inserts. No issues spotted.

packages/db/src/buffers/event-buffer.ts (1)

240-319: Concurrency: consider a Redis lock to ensure single worker flush.

Multiple workers could read the same head of the list and double-insert. Recommend a short‑TTL SETNX lock around processBuffer.

I can propose a minimal lock using SETNX event-buffer:lock ${pid} EX 30 guarding lrange/insert/ltrim.
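
A minimal sketch of that lock, using the key and TTL from the suggestion above (the owner-checked release is a standard pattern, not code from this PR):

import type { Redis } from 'ioredis';

async function withFlushLock(redis: Redis, fn: () => Promise<void>) {
  const token = `${process.pid}:${Date.now()}`;
  // NX + EX: only one worker holds the lock at a time, and a crashed
  // holder is released automatically after 30s.
  const acquired = await redis.set('event-buffer:lock', token, 'EX', 30, 'NX');
  if (acquired !== 'OK') return; // another worker is flushing
  try {
    await fn(); // lrange / insert / ltrim happen here
  } finally {
    // Delete only if we still own the lock, so we never release a lock
    // that expired and was re-acquired by another worker.
    await redis.eval(
      'if redis.call("get", KEYS[1]) == ARGV[1] then return redis.call("del", KEYS[1]) else return 0 end',
      1,
      'event-buffer:lock',
      token,
    );
  }
}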

Comment on lines 45 to 95
private eventToCsvRow(event: IClickhouseEvent): string {
  const escapeCsvValue = (value: any): string => {
    if (value === null || value === undefined) return '';
    const str = String(value);
    // Replace double quotes with single quotes, then escape single quotes by doubling them
    const withSingleQuotes = str.replace(/"/g, "'");
    return `'${withSingleQuotes.replace(/'/g, "''")}'`;
  };

  // Order matches the ClickHouse table schema exactly
  const columns = [
    event.id, // id UUID
    event.name, // name
    event.sdk_name, // sdk_name
    event.sdk_version, // sdk_version
    event.device_id, // device_id
    event.profile_id, // profile_id
    event.project_id, // project_id
    event.session_id, // session_id
    event.path, // path
    event.origin, // origin
    event.referrer, // referrer
    event.referrer_name, // referrer_name
    event.referrer_type, // referrer_type
    event.duration, // duration
    escapeCsvValue(JSON.stringify(event.properties)), // properties
    event.created_at, // created_at
    event.country, // country
    event.city, // city
    event.region, // region
    event.longitude, // longitude
    event.latitude, // latitude
    event.os, // os
    event.os_version, // os_version
    event.browser, // browser
    event.browser_version, // browser_version
    event.device, // device
    event.brand, // brand
    event.model, // model
    event.imported_at, // imported_at
  ];

  return columns.join(',');
}

⚠️ Potential issue | 🔴 Critical

CSV builder breaks JSON and NULL semantics; fix quoting/escaping.

  • Replacing all " with ' corrupts JSON in properties.
  • NULLs encoded as '' instead of \N (ClickHouse’s default CSV null), causing parse/default issues.

Apply:

-  private eventToCsvRow(event: IClickhouseEvent): string {
-    const escapeCsvValue = (value: any): string => {
-      if (value === null || value === undefined) return '';
-      const str = String(value);
-      // Replace double quotes with single quotes, then escape single quotes by doubling them
-      const withSingleQuotes = str.replace(/"/g, "'");
-      return `'${withSingleQuotes.replace(/'/g, "''")}'`;
-    };
-
-    // Order matches the ClickHouse table schema exactly
-    const columns = [
+  private eventToCsvRow(event: IClickhouseEvent): string {
+    // RFC4180-style escaping with ClickHouse defaults:
+    // - Nulls as \N (unquoted)
+    // - Double quotes escaped by doubling them
+    const escapeCsvField = (value: any): string => {
+      if (value === null || value === undefined) return '\\N';
+      const s = String(value);
+      const needsQuote = /[",\n\r]/.test(s);
+      const escaped = s.replace(/"/g, '""');
+      return needsQuote ? `"${escaped}"` : escaped;
+    };
+
+    // Order must match the ClickHouse table schema
+    const columns = [
       event.id, // id UUID
       event.name, // name
       event.sdk_name, // sdk_name
       event.sdk_version, // sdk_version
       event.device_id, // device_id
       event.profile_id, // profile_id
       event.project_id, // project_id
       event.session_id, // session_id
       event.path, // path
       event.origin, // origin
       event.referrer, // referrer
       event.referrer_name, // referrer_name
       event.referrer_type, // referrer_type
       event.duration, // duration
-      escapeCsvValue(JSON.stringify(event.properties)), // properties
+      JSON.stringify(event.properties ?? {}), // properties (keep valid JSON)
       event.created_at, // created_at
       event.country, // country
       event.city, // city
       event.region, // region
       event.longitude, // longitude
       event.latitude, // latitude
       event.os, // os
       event.os_version, // os_version
       event.browser, // browser
       event.browser_version, // browser_version
       event.device, // device
       event.brand, // brand
       event.model, // model
       event.imported_at, // imported_at
     ];
-
-    return columns.join(',');
+    return columns.map(escapeCsvField).join(',');
   }

If you rely on non-default null handling, set input_format_csv_null_representation explicitly to \N.
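
Concretely, that could look like the following on the CSV branch (ch, csvStream, and TABLE_NAMES as in the diffs above):

await ch.insert({
  table: TABLE_NAMES.events,
  values: csvStream,
  format: 'CSV',
  clickhouse_settings: {
    input_format_csv_skip_first_lines: '1',
    format_csv_allow_double_quotes: 1,
    // Pin the null marker so the \N emitted by escapeCsvField is parsed
    // as NULL regardless of server defaults.
    input_format_csv_null_representation: '\\N',
  },
});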

Comment on lines 280 to 297
  await ch.insert({
    table: 'events',
    values: csvStream,
    format: 'CSV',
    clickhouse_settings: {
      input_format_csv_skip_first_lines: '1',
      format_csv_allow_single_quotes: 1,
      format_csv_allow_double_quotes: 1,
    },
  });
} else {
  await ch.insert({
    table: 'events',
    values: chunk,
    format: 'JSONEachRow',
  });
}
}

⚠️ Potential issue | 🟠 Major

Add idempotency via insert_deduplication_token to prevent duplicates on retries.

A failure mid-chunk reprocesses earlier rows, causing duplicates. Use a stable token per chunk.

+import { createHash } from 'node:crypto';
@@
-      for (const chunk of this.chunks(eventsToClickhouse, this.chunkSize)) {
+      for (const chunk of this.chunks(eventsToClickhouse, this.chunkSize)) {
+        const dedupToken = createHash('sha1')
+          .update(chunk.map((e) => e.id).join(','))
+          .digest('hex');
         if (process.env.USE_CSV === 'true' || process.env.USE_CSV === '1') {
@@
           await ch.insert({
-            table: 'events',
+            table: TABLE_NAMES.events,
             values: csvStream,
             format: 'CSV',
             clickhouse_settings: {
               input_format_csv_skip_first_lines: '1',
-              format_csv_allow_single_quotes: 1,
-              format_csv_allow_double_quotes: 1,
+              format_csv_allow_double_quotes: 1,
+              insert_deduplication_token: dedupToken,
             },
           });
         } else {
           await ch.insert({
-            table: 'events',
+            table: TABLE_NAMES.events,
             values: chunk,
             format: 'JSONEachRow',
+            clickhouse_settings: {
+              insert_deduplication_token: dedupToken,
+            },
           });
         }
       }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In packages/db/src/buffers/event-buffer.ts around lines 280 to 297, the inserts
currently can create duplicate rows when a retry reprocesses earlier rows;
compute a stable per-chunk deduplication token (for example a SHA256 or other
stable hash of the chunk payload plus a batch identifier or chunk index) and
include it as insert_deduplication_token in clickhouse_settings on both the CSV
and JSONEachRow insert calls so retries with the same chunk token become
idempotent; ensure the token is derived deterministically from the chunk
contents (and batch id if available) and add it to the clickhouse_settings
object for both branches.

Comment on lines 99 to 125
private toStartOfInterval(
  field: string,
  interval: string,
  timezone: string,
): string {
  const tzPart = timezone ? `, '${timezone}'` : '';
  switch (interval) {
    case 'hour':
      return `toStartOfHour(${field}${tzPart})`;
    case 'day':
      return `toStartOfDay(${field}${tzPart})`;
    case 'week':
      // toStartOfWeek(date, mode) - mode is UInt8, NOT timezone
      // mode 0 = Sunday, mode 1 = Monday
      // For timezone support, we need to convert to timezone first
      if (timezone) {
        return `toStartOfWeek(toTimeZone(${field}, '${timezone}'), 1)`;
      }
      return `toStartOfWeek(${field}, 1)`;
    case 'month':
      return `toStartOfMonth(${field}${tzPart})`;
    case 'year':
      return `toStartOfYear(${field}${tzPart})`;
    default:
      return `toStartOfDay(${field}${tzPart})`;
  }
}

⚠️ Potential issue | 🔴 Critical

Escape timezone to avoid SQL injection

timezone is directly interpolated into SQL. Escape or whitelist it.

Apply:

-    const tzPart = timezone ? `, '${timezone}'` : '';
+    const tzPart = timezone ? `, ${sqlstring.escape(timezone)}` : '';
...
-        if (timezone) {
-          return `toStartOfWeek(toTimeZone(${field}, '${timezone}'), 1)`;
-        }
+        if (timezone) {
+          return `toStartOfWeek(toTimeZone(${field}, ${sqlstring.escape(timezone)}), 1)`;
+        }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines 131 to 155
private buildWhereClause(
  type: 'events' | 'sessions',
  filters: IChartEventFilter[],
): string {
  const mappedFilters = filters.map((item) => {
    if (type === 'sessions') {
      if (item.name === 'path') {
        return { ...item, name: 'entry_path' };
      }
      if (item.name === 'origin') {
        return { ...item, name: 'entry_origin' };
      }
      if (item.name.startsWith('properties.__query.utm_')) {
        return {
          ...item,
          name: item.name.replace('properties.__query.utm_', 'utm_'),
        };
      }
    }
    return item;
  });

  const where = getEventFiltersWhereClause(mappedFilters);
  return Object.values(where).filter(Boolean).join(' AND ');
}

⚠️ Potential issue | 🟠 Major

Sessions where-clause may reference non-existent columns

buildWhereClause('sessions', ...) passes filters through getEventFiltersWhereClause. Filters like properties.* or profile.properties.* produce expressions that don't exist on sessions, causing runtime SQL errors.

  • Either map all supported fields to sessions schema (country, city, os, device, utm_, entry_), or drop/guard unsupported property-based filters when type='sessions'.
  • Add unit tests for mixed filters to prevent regressions.
🤖 Prompt for AI Agents
In packages/db/src/services/overview.service.ts around lines 131 to 155,
buildWhereClause forwards event-style filters to getEventFiltersWhereClause even
for type='sessions', causing SQL to reference non-existent session columns;
update the function to either (1) map all incoming filter names to session
column names (e.g., properties.country -> country, properties.city -> city,
properties.os -> os, properties.device -> device, properties.__query.utm_* ->
utm_*, path -> entry_path, origin -> entry_origin, profile.properties.* ->
profile_* or other session equivalents) OR (2) filter out/guard any
property/profile-based filters that have no corresponding session column before
calling getEventFiltersWhereClause; implement the chosen approach consistently
in the mappedFilters step and add unit tests that pass mixed event+session
filters to ensure unsupported filters are dropped or properly translated and
that no invalid SQL is produced.

Comment on lines 246 to 261
SELECT
  ${startOfInterval} AS date,
  round(
    countIf(is_bounce = 1 AND sign = 1) * 100.0
      / nullIf(countIf(sign = 1), 0),
    2
  ) AS bounce_rate
FROM ${TABLE_NAMES.sessions} s
WHERE s.project_id = ${sqlstring.escape(projectId)}
  AND s.created_at BETWEEN toDateTime(${sqlstring.escape(formatClickhouseDate(startDate))})
  AND toDateTime(${sqlstring.escape(formatClickhouseDate(endDate))})
  AND sign = 1
  ${sessionsWhere ? `AND ${sessionsWhere}` : ''}
GROUP BY date
  WITH ROLLUP
)

⚠️ Potential issue | 🔴 Critical

Wrong alias used for bounce_rate; missing JOIN to bounce_stats

In the page-filter path, SELECT uses ss.bounce_rate but ss is session_stats (no such column). bounce_rate is in bounce_stats; there's no JOIN to it.

Apply:

-        SELECT
-          ${startOfInterval} AS date,
-          any(ss.bounce_rate) AS bounce_rate,
+        SELECT
+          ${startOfInterval} AS date,
+          any(bs.bounce_rate) AS bounce_rate,
...
-        FROM ${TABLE_NAMES.events} e
-        LEFT JOIN session_stats ss ON ${startOfInterval} = ss.date
+        FROM ${TABLE_NAMES.events} e
+        LEFT JOIN session_stats ss ON ${startOfInterval} = ss.date
+        LEFT JOIN bounce_stats bs ON ${startOfInterval} = bs.date

Also applies to: 270-276

🤖 Prompt for AI Agents
In packages/db/src/services/overview.service.ts around lines 246-261 (and
similarly 270-276), the query selects ss.bounce_rate but ss refers to
session_stats and that column is actually in bounce_stats and the query never
joins bounce_stats; update the SQL to JOIN the bounce_stats table (e.g., LEFT
JOIN ${TABLE_NAMES.bounce_stats} bs ON bs.project_id =
${sqlstring.escape(projectId)} AND bs.date = ${startOfInterval} OR otherwise
matching the interval key used for grouping) and replace ss.bounce_rate with
bs.bounce_rate in the SELECT; ensure the JOIN uses the same date/project
matching logic as the rest of the query and that the new alias is used
consistently in both locations mentioned.

Comment on lines 303 to 309
// WITH ROLLUP: First row is the aggregated totals, rest are time series
const rollupRow = res[0];
const series = res.slice(1).map((r) => ({
  ...r,
  date: new Date(r.date).toISOString(),
}));


⚠️ Potential issue | 🟠 Major

WITH ROLLUP ordering assumption is unsafe

Assuming the first row is totals is brittle; rollup rows are not guaranteed first, especially with ORDER BY ... WITH FILL. Identify totals via date IS NULL (or a sentinel) and filter series explicitly.

Example:

-const rollupRow = res[0];
-const series = res.slice(1).map((r) => ({ ...r, date: new Date(r.date).toISOString() }));
+const totals = res.find((r: any) => r.date == null);
+const series = res.filter((r: any) => r.date != null).map((r: any) => ({
+  ...r,
+  date: new Date(r.date).toISOString(),
+}));

Consider changing the result type to date: string | null for accuracy.

Also applies to: 384-390

🤖 Prompt for AI Agents
In packages/db/src/services/overview.service.ts around lines 303-309 (and
similarly at 384-390), the code assumes the rollup/totals row is the first
element of the result which is brittle; instead, detect the totals row by
checking for date === null (or the sentinel used by the DB), remove that row
from the series, and map only the real time-series rows. Change the result
typing to allow date: string | null, and when converting dates do so
conditionally (e.g. if r.date !== null convert to ISO string, otherwise leave
null), ensuring totals are identified via the null-date marker rather than array
position.

Comment on lines 480 to 481
console.log('sql', sql);


⚠️ Potential issue | 🟠 Major

Remove SQL debug logging

console.log('sql', ...) can leak PII and increases log volume. Remove or guard behind a debug flag.

🤖 Prompt for AI Agents
In packages/db/src/services/overview.service.ts around lines 480-481, remove the
raw console.log('sql', sql) debug statement (it can leak PII and inflates logs);
either delete the line or replace it with a guarded debug log using the project
logger at debug level (e.g. logger.debug(...)) or wrap it behind a controlled
debug flag / environment check (NODE_ENV or a DEBUG config) so SQL is only
emitted when debugging is explicitly enabled.

Comment on lines 517 to 542
const sql = `
  WITH distinct_sessions AS (
    SELECT DISTINCT session_id
    FROM ${TABLE_NAMES.events}
    WHERE project_id = ${sqlstring.escape(projectId)}
      AND created_at BETWEEN toDateTime(${sqlstring.escape(formatClickhouseDate(startDate))})
      AND toDateTime(${sqlstring.escape(formatClickhouseDate(endDate))})
      ${eventsWhere ? `AND ${eventsWhere}` : ''}
  )
  SELECT
    ${mode}_origin AS origin,
    ${mode}_path AS path,
    round(avg(duration * sign) / 1000, 2) AS avg_duration,
    round(sum(sign * is_bounce) * 100.0 / sum(sign), 2) AS bounce_rate,
    sum(sign) AS sessions
  FROM ${TABLE_NAMES.sessions}
  WHERE project_id = ${sqlstring.escape(projectId)}
    AND created_at BETWEEN toDateTime(${sqlstring.escape(formatClickhouseDate(startDate))})
    AND toDateTime(${sqlstring.escape(formatClickhouseDate(endDate))})
    AND id IN (SELECT session_id FROM distinct_sessions)
    ${sessionsWhere ? `AND ${sessionsWhere}` : ''}
  GROUP BY ${mode}_origin, ${mode}_path
  HAVING sum(sign) > 0
  ORDER BY sessions DESC
  LIMIT ${limit} OFFSET ${offset}
`;

⚠️ Potential issue | 🟠 Major

Average duration uses signed values; use avgIf with sign and > 0

round(avg(duration * sign)/1000, 2) skews results on Collapsing/Versioned rows. Use avgIf(duration, duration > 0 AND sign > 0).

Apply in both branches:

-  round(avg(duration * sign) / 1000, 2) AS avg_duration,
+  round(avgIf(duration, duration > 0 AND sign > 0) / 1000, 2) AS avg_duration,

Also applies to: 553-570

🤖 Prompt for AI Agents
In packages/db/src/services/overview.service.ts around lines 517 to 542, the AVG
calculation uses signed values (round(avg(duration * sign)/1000, 2)) which skews
results for Collapsing/Versioned rows; replace that expression with
round(avgIf(duration, duration > 0 AND sign > 0) / 1000, 2) so only positive
durations from positively signed rows are averaged, and make the same change in
the other branch at lines ~553-570 where avg(duration * sign) is used.

Comment on lines 43 to 45
sessionsCount AS (
  SELECT count(*) as sessions FROM ${TABLE_NAMES.events} WHERE name = 'session_start' AND profile_id = ${sqlstring.escape(profileId)} AND project_id = ${sqlstring.escape(projectId)}
),

⚠️ Potential issue | 🟠 Major

Align session metrics with sessions table and handle Collapsing sign

  • sessionsCount currently counts session_start events; prefer sessions table with sign to avoid drift.
  • sessionDuration should exclude negative/collapsed rows and zero/NULL durations.

Apply:

-    sessionsCount AS (
-      SELECT count(*) as sessions FROM ${TABLE_NAMES.events} WHERE name = 'session_start' AND profile_id = ${sqlstring.escape(profileId)} AND project_id = ${sqlstring.escape(projectId)}
-    ),
+    sessionsCount AS (
+      SELECT sumIf(sign, sign = 1) as sessions
+      FROM ${TABLE_NAMES.sessions}
+      WHERE profile_id = ${sqlstring.escape(profileId)} AND project_id = ${sqlstring.escape(projectId)}
+    ),
     sessionDuration AS (
       SELECT 
-        avg(duration) / 1000 as durationAvg, 
-        quantilesExactInclusive(0.9)(duration)[1] / 1000 as durationP90 
-      FROM ${TABLE_NAMES.sessions} 
-      WHERE profile_id = ${sqlstring.escape(profileId)} AND project_id = ${sqlstring.escape(projectId)}
+        round(avgIf(duration, duration > 0 AND sign > 0) / 1000, 2) as durationAvg,
+        round(quantilesExactInclusive(0.9)(if(sign > 0 AND duration > 0, duration, NULL))[1] / 1000, 2) as durationP90
+      FROM ${TABLE_NAMES.sessions}
+      WHERE profile_id = ${sqlstring.escape(profileId)} AND project_id = ${sqlstring.escape(projectId)} AND sign = 1
     ),

Also applies to: 46-52, 79-81

🤖 Prompt for AI Agents
In packages/db/src/services/profile.service.ts around lines 43-45 (and also
apply same changes to 46-52 and 79-81): replace the events-based sessionsCount
and the sessionDuration aggregations so they read from the sessions table
instead of counting 'session_start' events, and add filters to exclude
collapsed/negative sessions and zero/NULL durations — e.g. use the sessions
table with a WHERE clause that enforces sign = 1 (or the column that marks valid
sessions), and ensure sessionDuration aggregation ignores rows where duration IS
NULL OR duration <= 0 (and any collapsed flag is true); keep using
sqlstring.escape for profileId/projectId and mirror the same filtering logic in
the other noted line ranges.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/redis/cachable.ts (1)

27-45: Harden JSON parsing in getCache against corrupted entries.

A malformed/poisoned Redis value will throw and fail requests. Wrap JSON.parse in try/catch and fall back to recomputing.

Apply this diff:

-  const hit = await getRedisCache().get(key);
-  if (hit) {
-    const parsed = JSON.parse(hit, (_, value) => {
+  const hit = await getRedisCache().get(key);
+  if (hit) {
+    let parsed: T | undefined;
+    try {
+      parsed = JSON.parse(hit, (_, value) => {
         if (
           typeof value === 'string' &&
           /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.*Z$/.test(value)
         ) {
           return new Date(value);
         }
         return value;
-    });
+      });
+    } catch (e) {
+      // Corrupted cache entry; ignore and recompute
+      // Optionally: getRedisCache().del(key).catch(() => {});
+    }
 
-    // Store in LRU cache for next time
-    if (useLruCache) {
-      globalLruCache.set(key, parsed, {
-        ttl: expireInSec * 1000, // Use the same TTL as Redis
-      });
-    }
-
-    return parsed;
+    if (parsed !== undefined) {
+      // Store in LRU cache for next time
+      if (useLruCache) {
+        globalLruCache.set(key, parsed as unknown as T, {
+          ttl: expireInSec * 1000, // Use the same TTL as Redis
+        });
+      }
+      return parsed as T;
+    }
   }
🧹 Nitpick comments (2)
apps/api/src/utils/auth.ts (1)

173-179: Optional: align secret verification caching across export/import paths.

You could apply the same secure getCache pattern here to reduce scrypt cost on bursts, using the full-secret fingerprinted key.

Also applies to: 212-214

packages/redis/cachable.ts (1)

197-200: Key generation may collide with custom stringify — prefer stable JSON.

stringify() omits quotes/escaping and can collide (e.g., values containing ':' or nested structures). Use a stable JSON stringifier.

Apply this diff:

-  const getKey = (...args: Parameters<T>) =>
-    `${cachePrefix}:${stringify(args)}`;
+  const getKey = (...args: Parameters<T>) =>
+    `${cachePrefix}:${stableStringify(args)}`;

Add a stable stringifier (either dependency or local):

+// At top
+import stableStringify from 'safe-stable-stringify';

If avoiding deps, implement a stable JSON with key sorting and proper quoting.
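
A minimal sketch of such a local stringifier (illustrative only; the published safe-stable-stringify additionally handles edge cases like circular references):

function stableStringify(value: unknown): string {
  if (value === null || typeof value !== 'object') {
    // JSON.stringify(undefined) yields undefined, so normalize to 'null'
    return JSON.stringify(value) ?? 'null';
  }
  if (Array.isArray(value)) {
    return `[${value.map((v) => stableStringify(v)).join(',')}]`;
  }
  const obj = value as Record<string, unknown>;
  const entries = Object.keys(obj)
    .sort()
    .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
  return `{${entries.join(',')}}`;
}

Sorted keys make the output deterministic regardless of insertion order, and JSON quoting prevents the delimiter collisions possible with the current stringify().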

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 78dc299 and da55934.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (7)
  • apps/api/src/hooks/ip.hook.ts (1 hunks)
  • apps/api/src/utils/auth.ts (2 hunks)
  • packages/db/src/services/clients.service.ts (1 hunks)
  • packages/db/src/services/salt.service.ts (2 hunks)
  • packages/redis/cachable.test.ts (0 hunks)
  • packages/redis/cachable.ts (7 hunks)
  • packages/redis/package.json (1 hunks)
💤 Files with no reviewable changes (1)
  • packages/redis/cachable.test.ts
🧰 Additional context used
🧬 Code graph analysis (4)
apps/api/src/utils/auth.ts (2)
packages/redis/cachable.ts (1)
  • getCache (10-60)
packages/common/server/crypto.ts (1)
  • verifyPassword (38-60)
packages/db/src/services/clients.service.ts (1)
packages/redis/cachable.ts (1)
  • cacheable (155-283)
packages/db/src/services/salt.service.ts (2)
packages/redis/cachable.ts (1)
  • cacheable (155-283)
packages/db/src/prisma-client.ts (1)
  • db (270-270)
packages/redis/cachable.ts (1)
packages/redis/redis.ts (1)
  • getRedisCache (66-72)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-and-push-worker
  • GitHub Check: build-and-push-api
  • GitHub Check: build-and-push-dashboard
🔇 Additional comments (2)
apps/api/src/hooks/ip.hook.ts (1)

2-2: LGTM! Good cleanup.

Removing unused imports and using the import type syntax aligns with TypeScript best practices.

packages/redis/package.json (1)

11-12: No action required—Node.js 20 is compatible with lru-cache@11.2.2.

lru-cache 11.2.2 requires Node.js 20 || >=22, and your CI is configured to use Node 20, which falls within the supported range. The dependency addition is safe.

Comment on lines +135 to +145
const isVerified = await getCache(
`client:auth:${clientId}:${clientSecret.slice(0, 5)}`,
60 * 5,
async () => await verifyPassword(clientSecret, client.secret!),
true,
);
if (isVerified) {

⚠️ Potential issue | 🔴 Critical

Security: auth bypass risk — cache key uses only secret prefix.

Keying by clientId + clientSecret.slice(0, 5) can return a cached true for a different secret with the same 5-char prefix. Use a non-reversible fingerprint of the full secret instead (e.g., HMAC or SHA‑256).

Apply this diff:

@@
-import { getCache } from '@openpanel/redis';
+import { getCache } from '@openpanel/redis';
+import { createHash } from 'node:crypto';
@@
-  const isVerified = await getCache(
-    `client:auth:${clientId}:${clientSecret.slice(0, 5)}`,
-    60 * 5,
-    async () => await verifyPassword(clientSecret, client.secret!),
-    true,
-  );
+  const secretFp = createHash('sha256')
+    .update(clientSecret)
+    .digest('hex')
+    .slice(0, 32); // short, non-reversible fingerprint
+  const isVerified = await getCache(
+    `client:auth:${clientId}:${secretFp}`,
+    60 * 5,
+    async () => verifyPassword(clientSecret, client.secret!),
+    true,
+  );

Optionally prefer HMAC with a server-side key for stronger preimage resistance:

  • secretFp = createHmac('sha256', process.env.CACHE_KEY_SALT!).update(clientSecret).digest('hex').slice(0, 32)
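
A sketch of that HMAC variant with the key material validated up front (CACHE_KEY_SALT is an assumed env var, not one the codebase defines today):

import { createHmac } from 'node:crypto';

// Fail closed if the server-side key is missing rather than silently
// degrading to a weaker fingerprint.
function secretFingerprint(clientSecret: string): string {
  const key = process.env.CACHE_KEY_SALT;
  if (!key) {
    throw new Error('CACHE_KEY_SALT must be set for auth-cache keying');
  }
  return createHmac('sha256', key)
    .update(clientSecret)
    .digest('hex')
    .slice(0, 32); // truncated digest is plenty for cache-key uniqueness
}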
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-  const isVerified = await getCache(
-    `client:auth:${clientId}:${clientSecret.slice(0, 5)}`,
-    60 * 5,
-    async () => await verifyPassword(clientSecret, client.secret!),
-    true,
-  );
-  if (isVerified) {
+  const secretFp = createHash('sha256')
+    .update(clientSecret)
+    .digest('hex')
+    .slice(0, 32); // short, non-reversible fingerprint
+  const isVerified = await getCache(
+    `client:auth:${clientId}:${secretFp}`,
+    60 * 5,
+    async () => verifyPassword(clientSecret, client.secret!),
+    true,
+  );
+  if (isVerified) {
🤖 Prompt for AI Agents
In apps/api/src/utils/auth.ts around lines 135-141, the cache key currently uses
clientSecret.slice(0, 5) which allows collisions and an auth bypass risk;
replace that prefix with a non-reversible fingerprint of the full secret
(preferably an HMAC using a server-side key or at minimum a SHA-256 hash), e.g.
compute secretFp = HMAC-SHA256(process.env.CACHE_KEY_SALT, clientSecret) and use
a truncated hex digest (e.g., first 32 chars) in the cache key instead of the
5-char prefix, ensure process.env.CACHE_KEY_SALT is validated and throw or
fallback securely if missing, and keep the rest of the getCache call unchanged
so cached verification is keyed to the full-secret fingerprint.

}

-export const getClientByIdCached = cacheable(getClientById, 60 * 60 * 24);
+export const getClientByIdCached = cacheable(getClientById, 60 * 60 * 24, true);

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Cache invalidation strategy for client updates.

24h TTL + L1 LRU is aggressive for entities whose policies (CORS, filters, type, secret) can change. Recommend either:

  • Lower TTL (e.g., 5–60 min), or
  • Call getClientByIdCached.clear(id) on client/project mutations to avoid stale auth policy reads.

🏁 Script executed:

#!/bin/bash
# Find any existing invalidations to avoid duplicating work
rg -nP "getClientByIdCached\.clear\(" -C2

Length of output: 355



🏁 Script executed:

#!/bin/bash
# Find client mutation operations (create, update, delete)
rg -nP "procedures?\.(create|update|delete)" packages/trpc/src/routers/ -C3 | head -100

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Search for any client-related mutation in trpc routers
fd -e ts packages/trpc/src/routers/ | xargs rg -l "client" | head -20

Length of output: 1033


🏁 Script executed:

#!/bin/bash
# Check if getClientByIdCached.clear is called in client.ts or similar client-related files
rg -n "getClientByIdCached" packages/trpc/src/routers/ -C3

Length of output: 870



🏁 Script executed:

#!/bin/bash
# Read the client router file to see what mutations exist
cat packages/trpc/src/routers/client.ts

Length of output: 2396


🏁 Script executed:

#!/bin/bash
# Search more broadly for client creation/update/deletion in db services
rg -n "createClient|updateClient|deleteClient" packages/db/src/ -C2

Length of output: 1703


Add cache invalidation to direct client mutations.

Verification found a critical gap: the update and remove procedures in packages/trpc/src/routers/client.ts do not call getClientByIdCached.clear(), meaning direct client mutations bypass cache invalidation entirely. Only project updates invalidate via project.ts:99. With 24h TTL, stale auth policy reads (CORS, filters, type, secret changes) are possible for up to a day.

Required fixes (a minimal sketch follows the list):

  • client.ts line ~48 (end of update mutation): Add getClientByIdCached.clear(input.id);
  • client.ts line ~97 (end of remove mutation): Add getClientByIdCached.clear(input.id);
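
A minimal sketch of the update-mutation fix (the mutation body and import paths are illustrative; the clear() call is the point):

import { z } from 'zod';
import { db, getClientByIdCached } from '@openpanel/db'; // assumed import path
import { protectedProcedure } from '../trpc'; // assumed helper location

export const update = protectedProcedure
  .input(z.object({ id: z.string(), name: z.string() }))
  .mutation(async ({ input }) => {
    const client = await db.client.update({
      where: { id: input.id },
      data: { name: input.name },
    });
    // Drop the L1 LRU entry and the Redis key so auth reads the new policy immediately
    await getClientByIdCached.clear(input.id);
    return client;
  });

The same clear(input.id) call goes at the end of the remove mutation.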
🤖 Prompt for AI Agents
In packages/db/src/services/clients.service.ts around line 37 the cached getter
is created but cache invalidation is missing in the client mutation handlers; to
fix, update packages/trpc/src/routers/client.ts at the end of the update
mutation (around line ~48) to call getClientByIdCached.clear(input.id) and
likewise add getClientByIdCached.clear(input.id) at the end of the remove
mutation (around line ~97) so direct client updates/deletes immediately
invalidate the cached entry.

Comment on lines +20 to +47
export const getSalts = cacheable(
  'op:salt',
  async () => {
    const [curr, prev] = await db.salt.findMany({
      orderBy: {
        createdAt: 'desc',
      },
      take: 2,
    });

    if (!curr) {
      throw new Error('No salt found');
    }

    if (!prev) {
      throw new Error('No salt found');
    }

    const salts = {
      current: curr.salt,
      previous: prev.salt,
    };

-   await getRedisCache().set('op:salt', JSON.stringify(salts), 'EX', 60 * 10);
-
    return salts;
  },
  60 * 10,
  true,
);

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Solid use of cacheable for salts.

Logic and TTL look good. If/when salts rotate, remember to invalidate via getSalts.clear() to avoid 10‑min drift.


🏁 Script executed:

#!/bin/bash
# Check if any rotation code calls getSalts.clear()
rg -nP "getSalts\.clear\(" -C2

Length of output: 0




🏁 Script executed:

#!/bin/bash
# Search for salt mutations: create, update, delete, rotate operations
rg -nP "(createSalt|updateSalt|deleteSalt|rotateSalt|salt\.(create|update|delete|upsert))" -C3 --type=ts

echo "---"

# Search for cache invalidation patterns (not just getSalts.clear)
rg -nP "\.clear\(|invalidate|cache" packages/db/src/services/salt.service.ts -C2

Length of output: 3497



🏁 Script executed:

#!/bin/bash
# Check if cron.salt.ts calls getSalts.clear() or any cache invalidation
rg -nP "getSalts|\.clear\(|cache" apps/worker/src/jobs/cron.salt.ts -C2

echo "---"

# Also check if there's any import of getSalts in cron.salt.ts
rg -nP "import.*getSalts" apps/worker/src/jobs/cron.salt.ts

Length of output: 3


Cache invalidation missing in salt rotation flow.

Salt rotation occurs in apps/worker/src/jobs/cron.salt.ts (lines 7, 14) but never invalidates the cache. After db.salt.create() and db.salt.deleteMany(), call getSalts.clear() to prevent serving stale salts for up to 10 minutes.
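
A sketch of where the invalidation sits in the rotation job (the real cron.salt.ts body differs; the retention logic and import paths shown are illustrative):

import { randomBytes } from 'node:crypto';
import { db, getSalts } from '@openpanel/db'; // assumed import path

export async function rotateSaltJob() {
  await db.salt.create({
    data: { salt: randomBytes(32).toString('hex') },
  });

  // Keep only the two newest salts (current + previous)
  const stale = await db.salt.findMany({
    orderBy: { createdAt: 'desc' },
    skip: 2,
  });
  await db.salt.deleteMany({
    where: { id: { in: stale.map((s) => s.id) } },
  });

  // Clear both the LRU and the Redis entry so the next getSalts() reads fresh
  await getSalts.clear();
}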

🤖 Prompt for AI Agents
In packages/db/src/services/salt.service.ts around lines 20 to 47 the cached
getSalts result is not being invalidated by the salt rotation job; update
apps/worker/src/jobs/cron.salt.ts (around lines 7 and 14 where db.salt.create()
and db.salt.deleteMany() are called) to invoke getSalts.clear() immediately
after both the create and the deleteMany operations so the in-memory cache is
cleared and subsequent calls fetch the new salts instead of returning stale
values.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (9)
apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.events.tsx (1)

36-41: Avoid queryKey churn while disabled.

When columnVisibility is null, consider omitting the param (undefined) instead of {} to prevent an extra key change once storage loads.

-        columnVisibility: columnVisibility ?? {},
+        columnVisibility: columnVisibility ?? undefined,
apps/start/src/routes/_app.$organizationId.$projectId_.profiles.$profileId._tabs.events.tsx (1)

35-41: Reduce unnecessary refetch due to default {}.

Prefer not sending {} until visibility is available to keep the initial queryKey minimal.

-        columnVisibility: columnVisibility ?? {},
+        columnVisibility: columnVisibility ?? undefined,
apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.conversions.tsx (1)

22-35: Gate the query until visibility loads (align with other tabs).

Add enabled: columnVisibility !== null so SSR/client don’t double-run and the final key includes the loaded visibility.

   const query = useInfiniteQuery(
     trpc.event.conversions.infiniteQueryOptions(
       {
         projectId,
         startDate: startDate || undefined,
         endDate: endDate || undefined,
-        columnVisibility: columnVisibility ?? {},
+        columnVisibility: columnVisibility ?? undefined,
       },
       {
+        enabled: columnVisibility !== null,
         getNextPageParam: (lastPage) => lastPage.meta.next,
       },
     ),
   );
apps/start/src/routes/_app.$organizationId.$projectId_.sessions_.$sessionId.tsx (1)

47-48: Remove unused constant.

LIMIT is declared but not used.

-  const LIMIT = 50;
apps/start/src/components/ui/data-table/data-table-hooks.tsx (2)

59-62: Fallback to accessorKey for columnOrder too.

You already fall back to accessorKey for visibility; do the same here to avoid undefined ids.

-  const [columnOrder, setColumnOrder] = useLocalStorage<string[]>(
-    `@op:${persistentKey}-column-order`,
-    columns.map((column) => column.id!),
-  );
+  const [columnOrder, setColumnOrder] = useLocalStorage<string[]>(
+    `@op:${persistentKey}-column-order`,
+    columns.map((column) => column.id ?? (column as any).accessorKey).filter(Boolean),
+  );

52-57: Replace the redundant write-back with conditional initialization.

Verification confirmed that usehooks-ts useLocalStorage dispatches a custom "local-storage" event after setting items, and useReadLocalStorage listens for both the native "storage" event and that custom event, so same-tab propagation already works. The write-back of the just-read value on mount is therefore not needed to trigger updates; it only causes redundant state churn.

Initialize setColumnVisibility only when localStorage has no prior value: this avoids unnecessary write-backs entirely and keeps the hook's intent clear.
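
A sketch of that conditional initialization (hooks from usehooks-ts; the storage key mirrors the ones in data-table-hooks):

import { useEffect } from 'react';
import { useLocalStorage } from 'usehooks-ts';
import type { VisibilityState } from '@tanstack/react-table';

function usePersistedColumnVisibility(
  persistentKey: string,
  defaults: VisibilityState,
) {
  const [stored, setStored] = useLocalStorage<VisibilityState | null>(
    `@op:${persistentKey}-column-visibility`,
    null,
  );

  useEffect(() => {
    // Write defaults only when nothing was persisted yet; never write back
    // an already-stored value on mount.
    if (stored === null) {
      setStored(defaults);
    }
  }, [stored, defaults, setStored]);

  return [stored ?? defaults, setStored] as const;
}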

packages/trpc/src/routers/event.ts (1)

186-211: Apply the same fix for conversions select.

Mirror the corrected per-field mapping (and optional whitelist) here for consistency.

-        select: {
-          ...columnVisibility,
-          city: columnVisibility?.country ?? true,
-          path: columnVisibility?.name ?? true,
-          projectId: false,
-          duration: false,
-        },
+        select: {
+          ...(columnVisibility ?? {}),
+          city: columnVisibility?.city ?? true,
+          path: columnVisibility?.path ?? true,
+          projectId: false,
+          duration: false,
+        },
packages/db/src/services/event.service.ts (2)

413-415: Mandatory selections: clarify and dedupe.

Forcing createdAt/projectId is good (transformers depend on them), but you also conditionally add them below, which is redundant. Mark them as required in the select shape (or remove the conditional block) to avoid confusion.


975-977: Window helper is OK; consider lead() for simpler semantics.

leadInFrame + custom frame works, but lead() defaults to an unbounded frame and is simpler to read: lead(toNullable(${field}), 1, NULL) OVER (PARTITION BY session_id ORDER BY ${field} ASC). Either way is fine. (clickhouse.com)
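
Side by side, a sketch of both forms in the service's SQL-builder style (field is whatever timestamp column the helper receives):

// lead(): unbounded frame by default, no frame clause needed
const nextWithLead = (field: string) =>
  `lead(toNullable(${field}), 1, NULL) OVER (PARTITION BY session_id ORDER BY ${field} ASC)`;

// leadInFrame(): needs the frame spelled out to see the following row
const nextWithLeadInFrame = (field: string) =>
  `leadInFrame(toNullable(${field}), 1, NULL) OVER (PARTITION BY session_id ORDER BY ${field} ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)`;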

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da55934 and 459bbf6.

📒 Files selected for processing (8)
  • apps/start/src/components/events/table/columns.tsx (0 hunks)
  • apps/start/src/components/ui/data-table/data-table-hooks.tsx (3 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.conversions.tsx (2 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.events.tsx (3 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.profiles.$profileId._tabs.events.tsx (3 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.sessions_.$sessionId.tsx (3 hunks)
  • packages/db/src/services/event.service.ts (6 hunks)
  • packages/trpc/src/routers/event.ts (3 hunks)
💤 Files with no reviewable changes (1)
  • apps/start/src/components/events/table/columns.tsx
🧰 Additional context used
📓 Path-based instructions (1)
apps/start/**/*.{ts,tsx}

📄 CodeRabbit inference engine (apps/start/.cursorrules)

apps/start/**/*.{ts,tsx}: Instrument server functions created with createServerFn by wrapping their implementation in Sentry.startSpan and importing Sentry from '@sentry/tanstackstart-react'
When using Sentry in the TanStack Start React app, import it as: import * as Sentry from '@sentry/tanstackstart-react'

Files:

  • apps/start/src/components/ui/data-table/data-table-hooks.tsx
  • apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.events.tsx
  • apps/start/src/routes/_app.$organizationId.$projectId_.profiles.$profileId._tabs.events.tsx
  • apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.conversions.tsx
  • apps/start/src/routes/_app.$organizationId.$projectId_.sessions_.$sessionId.tsx
🧬 Code graph analysis (5)
apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.events.tsx (1)
apps/start/src/components/ui/data-table/data-table-hooks.tsx (1)
  • useReadColumnVisibility (25-29)
apps/start/src/routes/_app.$organizationId.$projectId_.profiles.$profileId._tabs.events.tsx (1)
apps/start/src/components/ui/data-table/data-table-hooks.tsx (1)
  • useReadColumnVisibility (25-29)
packages/trpc/src/routers/event.ts (1)
packages/db/src/services/event.service.ts (1)
  • getEventList (364-548)
apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.conversions.tsx (1)
apps/start/src/components/ui/data-table/data-table-hooks.tsx (1)
  • useReadColumnVisibility (25-29)
apps/start/src/routes/_app.$organizationId.$projectId_.sessions_.$sessionId.tsx (1)
apps/start/src/components/ui/data-table/data-table-hooks.tsx (1)
  • useReadColumnVisibility (25-29)
🔇 Additional comments (4)
apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.events.tsx (1)

25-41: LGTM on conditional enable with persisted visibility.

Using useReadColumnVisibility and enabled: columnVisibility !== null avoids SSR/localStorage races and keeps the query stable until ready.

apps/start/src/routes/_app.$organizationId.$projectId_.profiles.$profileId._tabs.events.tsx (1)

25-41: Consistent gating looks good.

Same pattern as events tab; defers fetch until visibility is known. Nice.

apps/start/src/routes/_app.$organizationId.$projectId_.sessions_.$sessionId.tsx (1)

60-75: Good: defer events fetch until visibility is ready.

The enabled guard mirrors the events tab and avoids SSR/localStorage timing issues.

apps/start/src/components/ui/data-table/data-table-hooks.tsx (1)

25-29: New read hook is clear and focused.

Nice small utility to consume visibility without coupling to table state.

Comment on lines +361 to 392
dateIntervalInDays?: number;
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

New option needs bounds/validation (and docs).

Expose allowed range and units for dateIntervalInDays (default 0.5). Add runtime validation (e.g., clamp to [0, 14]) at input boundaries (TRPC/Zod) to avoid unsafe values propagating into SQL strings.

Run this to locate validators and callsites:


🏁 Script executed:

#!/bin/bash
fd --hidden --no-ignore-vcs -t f | rg -n "dateIntervalInDays|getEventList\("

Length of output: 0



---

🏁 Script executed:

#!/bin/bash
# Check if file exists and read target lines
echo "=== Checking file existence ==="
test -f "packages/db/src/services/event.service.ts" && echo "File exists" || echo "File NOT found"

echo ""
echo "=== File size and line count ==="
wc -l "packages/db/src/services/event.service.ts" 2>/dev/null || echo "Cannot read file"

echo ""
echo "=== Lines 355-370 (around 361-362) ==="
sed -n '355,370p' "packages/db/src/services/event.service.ts" 2>/dev/null || echo "Cannot read lines"

echo ""
echo "=== Search for dateIntervalInDays in file ==="
grep -n "dateIntervalInDays" "packages/db/src/services/event.service.ts" 2>/dev/null || echo "No matches found"

echo ""
echo "=== Search for GetEventListOptions interface ==="
grep -n "interface GetEventListOptions" "packages/db/src/services/event.service.ts" 2>/dev/null || echo "Not found"

Length of output: 1096



---

🏁 Script executed:

#!/bin/bash
echo "=== Lines 370-395 (function start and defaults) ==="
sed -n '370,395p' "packages/db/src/services/event.service.ts"

echo ""
echo "=== Lines 535-555 (recursion context) ==="
sed -n '535,555p' "packages/db/src/services/event.service.ts"

echo ""
echo "=== Search for getEventList validator/schema ==="
fd -t f \( -name "*.ts" -o -name "*.tsx" \) -type f -exec grep -l "getEventList\|dateIntervalInDays" {} \; | head -20

echo ""
echo "=== Search for Zod schema definitions in packages/db ==="
find packages/db -name "*.ts" -type f -exec grep -l "z\.object.*dateInterval\|z\.object.*getEventList" {} \; 2>/dev/null

Length of output: 1782



---

🏁 Script executed:

#!/bin/bash
echo "=== Search for getEventList calls (API boundaries) ==="
rg -n "getEventList\(" --type ts | head -30

echo ""
echo "=== Check for recursion depth limits or guards near line 540 ==="
sed -n '520,560p' "packages/db/src/services/event.service.ts"

echo ""
echo "=== Search for TRPC routers that might expose getEventList ==="
find . -name "*.ts" -type f -exec rg -l "\.query.*getEventList|\.mutation.*getEventList" {} \; 2>/dev/null | head -10

echo ""
echo "=== Check if dateIntervalInDays is defined in any Zod schemas ==="
rg -n "dateInterval" --type ts

Length of output: 2711



---

🌐 Web query:

ClickHouse INTERVAL fractional days syntax 0.5 DAY valid

💡 Result:

Short answer: No — the INTERVAL 0.5 DAY token (numeric literal + unit) is not the documented form. Use a string literal or the toInterval* functions instead.

How to write it:

  • INTERVAL '0.5 days' (string literal form). [1]
  • toIntervalDay(0.5) or toInterval(0.5, 'day'). [2]

Sources:

  • ClickHouse INTERVAL / operators documentation (string-literal examples). [1]
  • ClickHouse type-conversion functions (toIntervalDay / toInterval accept Float). [2]

References:
[1] ClickHouse Operators — INTERVAL examples.
[2] ClickHouse Type conversion functions — toIntervalDay / toInterval.


Fix invalid ClickHouse INTERVAL syntax; add recursion depth safeguard.

Lines 384 and 389 use INTERVAL ${dateIntervalInDays} DAY, which is invalid for fractional values like 0.5. ClickHouse requires either a string literal (INTERVAL '0.5 days') or the toIntervalDay() function. Use toIntervalDay(${dateIntervalInDays}) to accept any numeric value safely.

Additionally, the recursion at line 543 doubles dateIntervalInDays indefinitely with no upper bound. Add a safeguard to prevent excessive window expansion (e.g., cap at 14 days):

// Line 543 — add max recursion check
if (data.length === 0 && sb.where.cursorWindow && dateIntervalInDays < 14) {
  return getEventList({
    ...options,
    dateIntervalInDays: dateIntervalInDays * 2,
  });
}

Apply the same INTERVAL fix to packages/db/src/services/session.service.ts lines 205 and 209.
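
A sketch of the corrected window clause (formatClickhouseDate stands in for the service's existing helper):

import sqlstring from 'sqlstring';

// Stand-in for the helper already used in event.service.ts
const formatClickhouseDate = (d: Date) =>
  d.toISOString().replace('T', ' ').slice(0, 23);

// toIntervalDay() accepts a Float, so the 0.5-day default stays valid SQL
function cursorWindowSql(cursor: Date, dateIntervalInDays: number): string {
  return `created_at >= toDateTime64(${sqlstring.escape(
    formatClickhouseDate(cursor),
  )}, 3) - toIntervalDay(${dateIntervalInDays})`;
}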

🤖 Prompt for AI Agents
In packages/db/src/services/event.service.ts (inspect around lines 384, 389 and
the recursive call near 543) and packages/db/src/services/session.service.ts
(around lines 205 and 209), replace any ClickHouse usage of INTERVAL
${dateIntervalInDays} DAY with toIntervalDay(${dateIntervalInDays}) so
fractional day values are valid, and add a recursion depth safeguard where the
function retries with a larger window: at the recursive branch near line 543
check that dateIntervalInDays < 14 before doubling (e.g., if (data.length === 0
&& sb.where.cursorWindow && dateIntervalInDays < 14) { return
getEventList({...options, dateIntervalInDays: dateIntervalInDays * 2}); }),
ensuring you default or validate dateIntervalInDays prior to use; apply the same
to the analogous interval usages in packages/db/src/services/session.service.ts
lines ~205 and ~209.

Comment on lines +376 to 408
select: incomingSelect,
dateIntervalInDays = 0.5,
} = options;

⚠️ Potential issue | 🔴 Critical

Bug: fractional INTERVAL is invalid in ClickHouse. Use integer seconds.

INTERVAL ${dateIntervalInDays} DAY with default 0.5 produces invalid SQL. ClickHouse INTERVAL requires an unsigned integer (no fractions). Convert to seconds (or minutes) and use an integer unit. (clickhouse.com)

Apply:

@@
-    select: incomingSelect,
-    dateIntervalInDays = 0.5,
+    select: incomingSelect,
+    dateIntervalInDays = 0.5,
@@
-  if (typeof cursor === 'number') {
+  const intervalSeconds = Math.max(1, Math.ceil((dateIntervalInDays || 0) * 24 * 60 * 60));
+  if (typeof cursor === 'number') {
@@
-    sb.where.cursorWindow = `created_at >= toDateTime64(${sqlstring.escape(formatClickhouseDate(cursor))}, 3) - INTERVAL ${dateIntervalInDays} DAY`;
+    sb.where.cursorWindow = `created_at >= toDateTime64(${sqlstring.escape(formatClickhouseDate(cursor))}, 3) - INTERVAL ${intervalSeconds} SECOND`;
@@
-  if (!cursor) {
-    sb.where.cursorWindow = `created_at >= toDateTime64(${sqlstring.escape(formatClickhouseDate(new Date()))}, 3) - INTERVAL ${dateIntervalInDays} DAY`;
+  if (!cursor) {
+    sb.where.cursorWindow = `created_at >= toDateTime64(${sqlstring.escape(formatClickhouseDate(new Date()))}, 3) - INTERVAL ${intervalSeconds} SECOND`;

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In packages/db/src/services/event.service.ts around lines 376-378, the SQL uses
INTERVAL ${dateIntervalInDays} DAY with a default of 0.5 which produces invalid
ClickHouse SQL (INTERVAL requires an unsigned integer); convert the fractional
days to an integer number of seconds and use INTERVAL <seconds> SECOND instead —
e.g. compute an integerSeconds = Math.floor(dateIntervalInDays * 86400) (or
Math.round if you prefer) and substitute INTERVAL ${integerSeconds} SECOND
(ensure integer/unsigned) so the generated SQL always uses an integer interval
unit.

}
if (select.duration) {
-    sb.select.duration = 'duration';
+    sb.select.duration = `${getDurationSql()} as duration`;

⚠️ Potential issue | 🟠 Major

Duration can be NULL; coalesce for type safety.

leadInFrame(..., NULL) yields NULL on the last row; dateDiff(..., NULL) returns NULL, conflicting with IServiceEvent.duration: number. Wrap with ifNull(..., 0) here (keep getDurationSql raw for aggregations). (clickhouse.com)

-    sb.select.duration = `${getDurationSql()} as duration`;
+    sb.select.duration = `ifNull(${getDurationSql()}, 0) as duration`;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    sb.select.duration = `${getDurationSql()} as duration`;
+    sb.select.duration = `ifNull(${getDurationSql()}, 0) as duration`;
🤖 Prompt for AI Agents
In packages/db/src/services/event.service.ts around line 474, sb.select.duration
is set to `${getDurationSql()} as duration` but getDurationSql can return NULL
(e.g., leadInFrame(..., NULL)), which conflicts with IServiceEvent.duration:
number; wrap the expression with ifNull(..., 0) when assigning
sb.select.duration so the SELECT yields 0 instead of NULL while keeping
getDurationSql unchanged for aggregation use; update the assignment to coalesce
the duration to 0 to ensure type safety.

Comment on lines +543 to 574
dateIntervalInDays: dateIntervalInDays * 2,
});

⚠️ Potential issue | 🔴 Critical

Unbounded recursion risk when no events exist. Add a cap.

If the project has no events, doubling dateIntervalInDays recursively never terminates. Cap the interval and stop recursing after the cap is reached.

-  if (data.length === 0 && sb.where.cursorWindow) {
-    return getEventList({
-      ...options,
-      dateIntervalInDays: dateIntervalInDays * 2,
-    });
-  }
+  const MAX_INTERVAL_DAYS = 14;
+  if (data.length === 0 && sb.where.cursorWindow && dateIntervalInDays < MAX_INTERVAL_DAYS) {
+    return getEventList({
+      ...options,
+      dateIntervalInDays: Math.min(dateIntervalInDays * 2, MAX_INTERVAL_DAYS),
+    });
+  }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In packages/db/src/services/event.service.ts around lines 543-544, the recursive
call that doubles dateIntervalInDays can loop forever when no events exist; add
a hard cap (e.g. const MAX_DATE_INTERVAL_DAYS = 365 or configurable) and check
it before recursing: if dateIntervalInDays >= MAX_DATE_INTERVAL_DAYS then stop
recursing and return an appropriate fallback (empty result or the current
result) instead of calling the function again; update any callers or types if
needed so the function returns deterministically when the cap is reached.

Comment on lines 126 to 145
columnVisibility: z.record(z.string(), z.boolean()).optional(),
}),
)
-    .query(async ({ input }) => {
+    .query(async ({ input: { columnVisibility, ...input } }) => {
const items = await getEventList({
...input,
take: 50,
cursor: input.cursor ? new Date(input.cursor) : undefined,
select: {
profile: true,
properties: true,
sessionId: true,
deviceId: true,
profileId: true,
referrerName: true,
referrerType: true,
referrer: true,
origin: true,
...columnVisibility,
city: columnVisibility?.country ?? true,
path: columnVisibility?.name ?? true,
projectId: false,
duration: false,
},
});

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Incorrect mapping: city depends on country, path depends on name.

This overrides user toggles for city/path and likely hides them unintentionally. Map each field to its own flag and (optionally) whitelist allowed keys.

-        select: {
-          ...columnVisibility,
-          city: columnVisibility?.country ?? true,
-          path: columnVisibility?.name ?? true,
-          projectId: false,
-          duration: false,
-        },
+        // Optional: restrict client-provided keys to known fields
+        // const allowed = new Set(['id','name','deviceId','profileId','sessionId','createdAt','path','city','country','os','browser','region','latitude','longitude','osVersion','browserVersion','device','brand','model','properties','origin','referrer','referrerName','referrerType','importedAt','sdkName','sdkVersion','profile','meta','duration','projectId']);
+        // const filtered = Object.fromEntries(Object.entries(columnVisibility ?? {}).filter(([k]) => allowed.has(k)));
+        select: {
+          ...(columnVisibility ?? {}),
+          city: columnVisibility?.city ?? true,
+          path: columnVisibility?.path ?? true,
+          projectId: false,
+          duration: false,
+        },

Does EventsTable rely on event.id for list keys/links? If yes, consider forcing id: true regardless of visibility to prevent UI breakage.


🏁 Script executed:

#!/bin/bash
# Verify the actual code in event.ts
cat packages/trpc/src/routers/event.ts | sed -n '115,145p'

Length of output: 1112


🏁 Script executed:

#!/bin/bash
# Search for EventsTable component definition and usage
rg -n "EventsTable" --type=ts --type=tsx -C 5

Length of output: 27


🏁 Script executed:

#!/bin/bash
# Find event model/schema definition to understand available fields
rg -n "select.*columnVisibility" --type=ts --type=tsx -C 10

Length of output: 27


🏁 Script executed:

#!/bin/bash
# Check if event.id is used in EventsTable or for list rendering
rg -n "event\.id|eventId" --type=ts --type=tsx -C 3 -g "*EventsTable*"

Length of output: 27


🏁 Script executed:

#!/bin/bash
# Search for any documentation or comments about columnVisibility intent
rg -n "columnVisibility" --type=ts --type=tsx -B 3 -A 3

Length of output: 27


🏁 Script executed:

#!/bin/bash
# Search for EventsTable without tsx type issue
rg -n "EventsTable" -C 5 -g "*.ts" -g "*.tsx" -g "*.jsx"

Length of output: 11098


🏁 Script executed:

#!/bin/bash
# Find event.id usage in table/UI components
rg -n "event\.id" -C 3 -g "*.ts" -g "*.tsx" -g "*.jsx" | head -50

Length of output: 3640


🏁 Script executed:

#!/bin/bash
# Find getEventList function definition to understand schema
rg -n "getEventList\s*=" -A 20 -g "*.ts" -g "*.tsx"

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Search for tests or documentation about columnVisibility
rg -n "columnVisibility" -B 2 -A 2 -g "*.ts" -g "*.tsx"

Length of output: 15094


🏁 Script executed:

#!/bin/bash
# Check if event model has these fields (id, city, path, country, name)
find . -type f \( -name "*.ts" -o -name "*.tsx" \) -exec grep -l "interface.*Event\|type.*Event" {} \; | head -10

Length of output: 2168


Fix incorrect field mappings in select object—appears in two query handlers.

The city and path fields are mapped to wrong visibility toggles (country and name respectively) at lines 136–137 and 207–208. This overrides user preferences and hides these fields unintentionally. Map each field to its own flag:

        select: {
          ...columnVisibility,
-         city: columnVisibility?.country ?? true,
-         path: columnVisibility?.name ?? true,
+         city: columnVisibility?.city ?? true,
+         path: columnVisibility?.path ?? true,
          projectId: false,
          duration: false,
        },

Additionally, since ...columnVisibility spreads client-provided keys into the select object, ensure id is always returned (prevents table rendering breakage if user hides the id column):

        select: {
          ...columnVisibility,
+         id: true,
          city: columnVisibility?.city ?? true,

Apply both fixes at packages/trpc/src/routers/event.ts lines 136–137 (events query) and lines 207–208 (conversions query).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
          columnVisibility: z.record(z.string(), z.boolean()).optional(),
        }),
      )
      .query(async ({ input: { columnVisibility, ...input } }) => {
        const items = await getEventList({
          ...input,
          take: 50,
          cursor: input.cursor ? new Date(input.cursor) : undefined,
          select: {
            profile: true,
            properties: true,
            sessionId: true,
            deviceId: true,
            profileId: true,
            referrerName: true,
            referrerType: true,
            referrer: true,
            origin: true,
            ...columnVisibility,
-           city: columnVisibility?.country ?? true,
-           path: columnVisibility?.name ?? true,
+           city: columnVisibility?.city ?? true,
+           path: columnVisibility?.path ?? true,
            projectId: false,
            duration: false,
          },
        });
🤖 Prompt for AI Agents
In packages/trpc/src/routers/event.ts around lines 126–141 and 204–210, the
select object incorrectly maps city to columnVisibility.country and path to
columnVisibility.name; change them so city uses columnVisibility.city and path
uses columnVisibility.path in both the events query (lines ~136–137) and
conversions query (lines ~207–208). Also ensure id is always included by setting
id: true in the select object after spreading ...columnVisibility so a client
hiding id cannot break table rendering.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/start/src/modals/event-details.tsx (1)

70-80: Add error state handling for the query.

The switch from useSuspenseQuery to useQuery changes the loading behavior, but the current implementation doesn't handle error states. When query.isError is true, users will see the loading skeleton indefinitely with no error message or retry option.

Apply this diff to add error handling:

  const query = useQuery(
    trpc.event.details.queryOptions({
      id,
      projectId,
      createdAt,
    }),
  );

- if (!query.data) {
+ if (query.isLoading) {
    return <EventDetailsSkeleton />;
  }
+
+ if (query.isError) {
+   return (
+     <ModalContent className="!p-0">
+       <Widget className="bg-transparent border-0 min-w-0">
+         <WidgetHead>
+           <div className="row items-center justify-between">
+             <div className="title">Error loading event</div>
+             <Button size="icon" variant={'ghost'} onClick={() => popModal()}>
+               <XIcon className="size-4" />
+             </Button>
+           </div>
+         </WidgetHead>
+         <WidgetBody className="col gap-4 bg-def-100">
+           <div className="card p-4 text-center text-muted-foreground">
+             Failed to load event details. Please try again.
+           </div>
+         </WidgetBody>
+       </Widget>
+     </ModalContent>
+   );
+ }
♻️ Duplicate comments (2)
packages/trpc/src/routers/event.ts (2)

139-144: Fix incorrect field mappings.

Lines 140-141 map city to columnVisibility?.country and path to columnVisibility?.name, which overrides user visibility preferences for these fields.

This issue was already flagged in a previous review. Apply the suggested fix:

        select: {
          ...columnVisibility,
-         city: columnVisibility?.country ?? true,
-         path: columnVisibility?.name ?? true,
+         city: columnVisibility?.city ?? true,
+         path: columnVisibility?.path ?? true,
          projectId: false,
          duration: false,
        },

Additionally, consider forcing id: true after the spread to prevent UI breakage if the client hides the ID column (as suggested in the previous review).


210-215: Fix incorrect field mappings (same issue as events endpoint).

Lines 211-212 have the same incorrect mappings as the events endpoint: city → country and path → name.

Apply the same fix here:

        select: {
          ...columnVisibility,
-         city: columnVisibility?.country ?? true,
-         path: columnVisibility?.name ?? true,
+         city: columnVisibility?.city ?? true,
+         path: columnVisibility?.path ?? true,
          projectId: false,
          duration: false,
        },

Also consider forcing id: true to prevent UI issues.

🧹 Nitpick comments (1)
apps/start/src/modals/event-details.tsx (1)

181-204: Remove commented-out navigation code.

Since the arrow icon imports have been removed and this code is commented out, it should be deleted entirely. If navigation is needed in the future, it can be restored from git history.

Apply this diff to remove the dead code:

            <div className="row items-center gap-2 pr-2">
-             {/* <Button
-               size="icon"
-               variant={'ghost'}
-               onClick={() => {
-                 const event = new KeyboardEvent('keydown', {
-                   key: 'ArrowLeft',
-                 });
-                 dispatchEvent(event);
-               }}
-             >
-               <ArrowLeftIcon className="size-4" />
-             </Button>
-             <Button
-               size="icon"
-               variant={'ghost'}
-               onClick={() => {
-                 const event = new KeyboardEvent('keydown', {
-                   key: 'ArrowRight',
-                 });
-                 dispatchEvent(event);
-               }}
-             >
-               <ArrowRightIcon className="size-4" />
-             </Button> */}
              <Button size="icon" variant={'ghost'} onClick={() => popModal()}>
                <XIcon className="size-4" />
              </Button>
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 459bbf6 and ce363c7.

📒 Files selected for processing (2)
  • apps/start/src/modals/event-details.tsx (1 hunks)
  • packages/trpc/src/routers/event.ts (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
apps/start/**/*.{ts,tsx}

📄 CodeRabbit inference engine (apps/start/.cursorrules)

apps/start/**/*.{ts,tsx}: Instrument server functions created with createServerFn by wrapping their implementation in Sentry.startSpan and importing Sentry from '@sentry/tanstackstart-react'
When using Sentry in the TanStack Start React app, import it as: import * as Sentry from '@sentry/tanstackstart-react'

Files:

  • apps/start/src/modals/event-details.tsx
🧬 Code graph analysis (1)
packages/trpc/src/routers/event.ts (2)
packages/db/src/services/session.service.ts (1)
  • IServiceSession (59-97)
packages/db/src/services/event.service.ts (1)
  • getEventList (364-548)
🔇 Additional comments (3)
packages/trpc/src/routers/event.ts (2)

7-7: LGTM!

The import of IServiceSession is necessary for typing the session variable in the details endpoint and aligns with the service layer interface.


108-111: LGTM!

The conditional session retrieval correctly avoids unnecessary database calls when sessionId is absent, improving performance.

apps/start/src/modals/event-details.tsx (1)

21-22: Import changes look correct.

The switch from useSuspenseQuery to useQuery and the reduced icon imports are consistent with the actual usage in the file. However, ensure error handling is addressed (see comment on lines 70-80).

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.conversions.tsx (1)

23-35: Missing enabled guard for columnVisibility.

File 2 (apps/start/src/routes/_app.$organizationId.$projectId_.profiles.$profileId._tabs.events.tsx) includes an enabled: columnVisibility !== null guard to prevent the query from executing before column visibility is loaded from localStorage. This file is missing the same guard, which could cause the query to execute with incomplete configuration.

Apply this diff to add the enabled guard:

   const query = useInfiniteQuery(
     trpc.event.conversions.infiniteQueryOptions(
       {
         projectId,
         startDate: startDate || undefined,
         endDate: endDate || undefined,
         columnVisibility: columnVisibility ?? {},
       },
       {
+        enabled: columnVisibility !== null,
         getNextPageParam: (lastPage) => lastPage.meta.next,
       },
     ),
   );
♻️ Duplicate comments (8)
packages/db/src/services/event.service.ts (3)

414-416: Fix invalid ClickHouse INTERVAL with fractional days; precompute integer seconds.

dateIntervalInDays defaults to 0.5, but INTERVAL ${dateIntervalInDays} DAY rejects fractions. Use integer seconds.

   const {
     ...
-    select: incomingSelect,
-    dateIntervalInDays = 0.5,
+    select: incomingSelect,
+    dateIntervalInDays = 0.5,
   } = options;
   const { sb, getSql, join } = createSqlBuilder();
+  const intervalSeconds = Math.max(
+    1,
+    Math.ceil((dateIntervalInDays || 0) * 24 * 60 * 60),
+  );
@@
-    sb.where.cursorWindow = `created_at >= toDateTime64(${sqlstring.escape(formatClickhouseDate(cursor))}, 3) - INTERVAL ${dateIntervalInDays} DAY`;
+    sb.where.cursorWindow = `created_at >= toDateTime64(${sqlstring.escape(formatClickhouseDate(cursor))}, 3) - INTERVAL ${intervalSeconds} SECOND`;
@@
-  if (!cursor) {
-    sb.where.cursorWindow = `created_at >= toDateTime64(${sqlstring.escape(formatClickhouseDate(new Date()))}, 3) - INTERVAL ${dateIntervalInDays} DAY`;
-  }
+  if (!cursor) {
+    sb.where.cursorWindow = `created_at >= toDateTime64(${sqlstring.escape(formatClickhouseDate(new Date()))}, 3) - INTERVAL ${intervalSeconds} SECOND`;
+  }
#!/bin/bash
# Find other fractional INTERVAL DAY usages
rg -nP "INTERVAL\s+\$\{?[\w\.]+\}?\s+DAY" --type ts

Also applies to: 418-420, 406-408


569-575: Add a recursion cap when doubling dateIntervalInDays.

Unbounded doubling can recurse forever on empty datasets. Cap and clamp.

-  if (data.length === 0 && sb.where.cursorWindow) {
-    return getEventList({
-      ...options,
-      dateIntervalInDays: dateIntervalInDays * 2,
-    });
-  }
+  const MAX_INTERVAL_DAYS = 14;
+  if (data.length === 0 && sb.where.cursorWindow && dateIntervalInDays < MAX_INTERVAL_DAYS) {
+    return getEventList({
+      ...options,
+      dateIntervalInDays: Math.min(dateIntervalInDays * 2, MAX_INTERVAL_DAYS),
+    });
+  }

504-505: Coalesce nullable duration to 0.

leadInFrame returns NULL on last row; wrap with ifNull to meet number typing and avoid undefined UI.

-    sb.select.duration = `${getDurationSql()} as duration`;
+    sb.select.duration = `ifNull(${getDurationSql()}, 0) as duration`;
apps/api/src/utils/auth.ts (1)

135-141: Security: cache key must not use a 5‑char secret prefix. Use a hash/HMAC of the full secret.

Prefix collisions can wrongly reuse a cached “true” for different secrets.

-import { getCache } from '@openpanel/redis';
+import { getCache } from '@openpanel/redis';
+import { createHash } from 'node:crypto';
@@
-  const isVerified = await getCache(
-    `client:auth:${clientId}:${clientSecret.slice(0, 5)}`,
+  const secretFp = createHash('sha256').update(clientSecret).digest('hex').slice(0, 32);
+  const isVerified = await getCache(
+    `client:auth:${clientId}:${secretFp}`,
     60 * 5,
-    async () => await verifyPassword(clientSecret, client.secret!),
+    async () => verifyPassword(clientSecret, client.secret!),
     true,
   );
packages/db/src/services/profile.service.ts (1)

43-45: Session metrics still need alignment with sessions table and Collapsing sign.

The past review comment requesting sessions table usage with sign filtering and duration validation has not been addressed:

  • Line 44: sessionsCount still counts session_start events instead of reading from the sessions table with sign = 1 filtering
  • Lines 48-51: sessionDuration reads from sessions table but doesn't filter out collapsed/negative rows (sign != 1) or zero/NULL durations

Also applies to: 46-52, 63-63, 71-72, 79-81
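
For concreteness, a sketch of the aligned metrics (the sign column and millisecond duration are assumptions carried over from the diff at the top of this thread):

import sqlstring from 'sqlstring';

// The real query references TABLE_NAMES.sessions; 'sessions' is inlined here
function sessionStatsSql(profileId: string, projectId: string): string {
  return `
    SELECT
      count() AS sessionsCount,
      round(avgIf(duration, duration > 0) / 1000, 2) AS durationAvg,
      round(quantilesExactInclusive(0.9)(if(duration > 0, duration, NULL))[1] / 1000, 2) AS durationP90
    FROM sessions
    WHERE profile_id = ${sqlstring.escape(profileId)}
      AND project_id = ${sqlstring.escape(projectId)}
      AND sign = 1
  `;
}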

packages/db/src/buffers/event-buffer.ts (3)

52-95: CSV builder breaks JSON and NULL semantics; fix quoting/escaping.

This is the same critical issue flagged in the previous review. The current implementation:

  1. Replaces " with ' (line 57), which corrupts JSON in the properties field
  2. Returns empty string '' for NULL/undefined (line 54) instead of \N (ClickHouse's CSV NULL representation)
  3. Applies CSV escaping to already-stringified JSON (line 77), causing double-escaping

These issues will corrupt event data when ClickHouse parses the CSV.

Apply the fix from the previous review:

-  private eventToCsvRow(event: IClickhouseEvent): string {
-    const escapeCsvValue = (value: any): string => {
-      if (value === null || value === undefined) return '';
-      const str = String(value);
-      // Replace double quotes with single quotes, then escape single quotes by doubling them
-      const withSingleQuotes = str.replace(/"/g, "'");
-      return `'${withSingleQuotes.replace(/'/g, "''")}'`;
-    };
-
-    // Order matches the ClickHouse table schema exactly
-    const columns = [
+  private eventToCsvRow(event: IClickhouseEvent): string {
+    // RFC4180-style escaping with ClickHouse defaults:
+    // - Nulls as \N (unquoted)
+    // - Double quotes escaped by doubling them
+    const escapeCsvField = (value: any): string => {
+      if (value === null || value === undefined) return '\\N';
+      const s = String(value);
+      const needsQuote = /[",\n\r]/.test(s);
+      const escaped = s.replace(/"/g, '""');
+      return needsQuote ? `"${escaped}"` : escaped;
+    };
+
+    // Order must match the ClickHouse table schema
+    const columns = [
       event.id, // id UUID
       event.name, // name
       event.sdk_name, // sdk_name
       event.sdk_version, // sdk_version
       event.device_id, // device_id
       event.profile_id, // profile_id
       event.project_id, // project_id
       event.session_id, // session_id
       event.path, // path
       event.origin, // origin
       event.referrer, // referrer
       event.referrer_name, // referrer_name
       event.referrer_type, // referrer_type
       event.duration, // duration
-      escapeCsvValue(JSON.stringify(event.properties)), // properties
+      JSON.stringify(event.properties ?? {}), // properties (keep valid JSON)
       event.created_at, // created_at
       event.country, // country
       event.city, // city
       event.region, // region
       event.longitude, // longitude
       event.latitude, // latitude
       event.os, // os
       event.os_version, // os_version
       event.browser, // browser
       event.browser_version, // browser_version
       event.device, // device
       event.brand, // brand
       event.model, // model
       event.imported_at, // imported_at
     ];
-
-    return columns.join(',');
+    return columns.map(escapeCsvField).join(',');
   }

428-472: Add idempotency via insert_deduplication_token to prevent duplicates on retries.

This is the same major issue flagged in the previous review. When processBuffer fails mid-chunk and retries via pushToRetry, the same events can be inserted again, causing duplicates.

Apply the fix from the previous review to add deduplication tokens:

+import { createHash } from 'node:crypto';
@@
     // Insert events into ClickHouse in chunks using CSV format with headers
     for (const chunk of this.chunks(eventsToClickhouse, this.chunkSize)) {
+      const dedupToken = createHash('sha1')
+        .update(chunk.map((e) => e.id).join(','))
+        .digest('hex');
       if (process.env.USE_CSV === 'true' || process.env.USE_CSV === '1') {
         // Convert events to CSV format
         const csvRows = chunk.map((event) => this.eventToCsvRow(event));
         const csv = [this.getCsvHeaders(), ...csvRows].join('\n');

         // Create a readable stream in binary mode for CSV
         const csvStream = Readable.from(csv, { objectMode: false });

         await ch.insert({
           table: 'events',
           values: csvStream,
           format: 'CSV',
           clickhouse_settings: {
             input_format_csv_skip_first_lines: '1',
             format_csv_allow_single_quotes: 1,
             format_csv_allow_double_quotes: 1,
+            insert_deduplication_token: dedupToken,
           },
         });
       } else {
         await ch.insert({
           table: 'events',
           values: chunk,
           format: 'JSONEachRow',
+          clickhouse_settings: {
+            insert_deduplication_token: dedupToken,
+          },
         });
       }
     }

459-463: Update CSV settings after fixing CSV escaping.

Once the CSV escaping is fixed to use RFC4180-style double-quote escaping (per the previous review comment), update the ClickHouse settings accordingly:

           clickhouse_settings: {
             input_format_csv_skip_first_lines: '1',
-            format_csv_allow_single_quotes: 1,
             format_csv_allow_double_quotes: 1,
             insert_deduplication_token: dedupToken,
           },

The format_csv_allow_single_quotes setting is inconsistent with proper RFC4180 CSV format and should be removed.

🧹 Nitpick comments (7)
packages/db/src/buffers/base-buffer.ts (1)

106-112: Consider error recovery strategy for parallel mode.

While the comment correctly notes that the counter can't be safely reset with concurrent workers, persistent errors in parallel mode could leave the system degraded until the hourly resync runs. Consider adding (a sketch of the backoff idea follows the list):

  • A circuit breaker or backoff mechanism for repeated failures
  • Metrics/alerts for error rates in parallel mode
  • A manual trigger for counter resync in emergency situations
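
A minimal sketch of the backoff idea (thresholds and the wiring into the buffer loop are assumptions):

// Exponential backoff state the processor could consult between runs
class ProcessBackoff {
  private failures = 0;

  onError() {
    this.failures = Math.min(this.failures + 1, 6);
  }

  onSuccess() {
    this.failures = 0;
  }

  // 0ms when healthy, then 1s, 2s, 4s ... capped at 32s
  nextDelayMs(baseMs = 1000): number {
    return this.failures === 0 ? 0 : baseMs * 2 ** (this.failures - 1);
  }
}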
apps/start/src/modals/event-details.tsx (1)

21-22: Import cleanup looks good; confirm loading/error UX after switching to useQuery.

useQuery returns data/error states; you render a skeleton on !data but don’t render an error. Consider a small error state.

packages/db/src/services/event.service.ts (1)

443-445: Remove redundant unconditional select assignments.

createdAt/projectId are also conditionally assigned below; keep one path to avoid confusion.

-  sb.select.createdAt = 'created_at';
-  sb.select.projectId = 'project_id';
packages/redis/cachable.ts (1)

127-153: cacheable overloads look good; minor hardening optional.

Consider mirroring the getCache parse/error guard and adding .catch to setex here too for consistency.

-      getRedisCache().setex(key, expireInSec, JSON.stringify(result));
+      getRedisCache().setex(key, expireInSec, JSON.stringify(result)).catch(() => {});
@@
-    return getRedisCache().del(key);
+    return getRedisCache().del(key);
@@
-      return getRedisCache().setex(key, expireInSec, JSON.stringify(payload));
+      return getRedisCache()
+        .setex(key, expireInSec, JSON.stringify(payload))
+        .catch(() => 'OK' as any);

Also applies to: 154-188, 201-208, 214-258, 263-283

packages/db/src/services/profile.service.ts (1)

16-16: Unused import.

The getDurationSql import doesn't appear to be used in this file. Consider removing it to keep imports clean.

packages/db/src/buffers/event-buffer.test.ts (1)

56-322: Consider adding test coverage for retry and DLQ behavior.

The test suite covers core buffer operations well (add, process, sort, chunk, bulk add), but given that retry and DLQ metrics are now exposed in apps/worker/src/metrics.ts, consider adding tests for:

  • Retry buffer processing (processRetryBuffer, getRetryBufferSize)
  • DLQ behavior when max retries exceeded (getDLQSize, inspectDLQ, clearDLQ)
  • Error handling during processEventsChunk (should push events to retry buffer)
  • CSV format insertion path (when USE_CSV env var is set)

Would you like me to generate test cases for these scenarios?
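
As a starting point, a sketch of the retry-path case (method names come from the list above; their signatures, the Vitest setup, and the failure simulation are assumptions):

import { beforeEach, describe, expect, it, vi } from 'vitest';
import { EventBuffer } from './event-buffer';

describe('event buffer retry path', () => {
  let buffer: EventBuffer;

  beforeEach(() => {
    buffer = new EventBuffer();
  });

  it('moves a failed chunk to the retry buffer instead of dropping it', async () => {
    // Simulate a ClickHouse outage on the first insert attempt
    vi.spyOn(buffer as any, 'processEventsChunk').mockRejectedValueOnce(
      new Error('clickhouse unavailable'),
    );

    await buffer.add({
      project_id: 'p1',
      profile_id: 'u1',
      name: 'screen_view',
      created_at: new Date().toISOString(),
    } as any);
    await buffer.processBuffer();

    expect(await buffer.getRetryBufferSize()).toBeGreaterThan(0);
    expect(await buffer.getDLQSize()).toBe(0);
  });
});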

packages/db/src/buffers/event-buffer.ts (1)

474-480: Remove unnecessary await inside multi transaction loop.

Lines 476-478 use await inside the loop even though events are being added to a multi transaction. The await is unnecessary since the actual execution happens at line 479 with multi.exec().

     // Publish "saved" events
     const pubMulti = getRedisPub().multi();
     for (const event of eventsToClickhouse) {
-      await publishEvent('events', 'saved', transformEvent(event), pubMulti);
+      publishEvent('events', 'saved', transformEvent(event), pubMulti);
     }
     await pubMulti.exec();
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce363c7 and 01837a9.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (32)
  • .github/workflows/docker-build.yml (3 hunks)
  • apps/api/src/controllers/live.controller.ts (2 hunks)
  • apps/api/src/controllers/profile.controller.ts (2 hunks)
  • apps/api/src/controllers/track.controller.ts (1 hunks)
  • apps/api/src/hooks/ip.hook.ts (1 hunks)
  • apps/api/src/utils/auth.ts (2 hunks)
  • apps/start/package.json (0 hunks)
  • apps/start/src/components/events/table/columns.tsx (0 hunks)
  • apps/start/src/components/ui/data-table/data-table-hooks.tsx (3 hunks)
  • apps/start/src/modals/event-details.tsx (1 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.conversions.tsx (2 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.events.tsx (3 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.profiles.$profileId._tabs.events.tsx (3 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.sessions_.$sessionId.tsx (3 hunks)
  • apps/worker/src/jobs/cron.delete-projects.ts (1 hunks)
  • apps/worker/src/metrics.ts (1 hunks)
  • packages/db/package.json (1 hunks)
  • packages/db/src/buffers/base-buffer.ts (2 hunks)
  • packages/db/src/buffers/event-buffer.test.ts (3 hunks)
  • packages/db/src/buffers/event-buffer.ts (3 hunks)
  • packages/db/src/buffers/profile-buffer.ts (3 hunks)
  • packages/db/src/services/clients.service.ts (1 hunks)
  • packages/db/src/services/event.service.ts (7 hunks)
  • packages/db/src/services/notification.service.ts (2 hunks)
  • packages/db/src/services/profile.service.ts (4 hunks)
  • packages/db/src/services/salt.service.ts (2 hunks)
  • packages/logger/index.ts (2 hunks)
  • packages/redis/cachable.test.ts (0 hunks)
  • packages/redis/cachable.ts (7 hunks)
  • packages/redis/package.json (1 hunks)
  • packages/trpc/src/routers/event.ts (5 hunks)
  • packages/trpc/src/routers/profile.ts (2 hunks)
💤 Files with no reviewable changes (3)
  • packages/redis/cachable.test.ts
  • apps/start/src/components/events/table/columns.tsx
  • apps/start/package.json
🚧 Files skipped from review as they are similar to previous changes (7)
  • .github/workflows/docker-build.yml
  • packages/redis/package.json
  • apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.events.tsx
  • apps/start/src/components/ui/data-table/data-table-hooks.tsx
  • packages/trpc/src/routers/event.ts
  • apps/worker/src/jobs/cron.delete-projects.ts
  • packages/db/src/buffers/profile-buffer.ts
🧰 Additional context used
📓 Path-based instructions (1)
apps/start/**/*.{ts,tsx}

📄 CodeRabbit inference engine (apps/start/.cursorrules)

apps/start/**/*.{ts,tsx}: Instrument server functions created with createServerFn by wrapping their implementation in Sentry.startSpan and importing Sentry from '@sentry/tanstackstart-react'
When using Sentry in the TanStack Start React app, import it as: import * as Sentry from '@sentry/tanstackstart-react'

Files:

  • apps/start/src/routes/_app.$organizationId.$projectId_.profiles.$profileId._tabs.events.tsx
  • apps/start/src/routes/_app.$organizationId.$projectId_.sessions_.$sessionId.tsx
  • apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.conversions.tsx
  • apps/start/src/modals/event-details.tsx
🧬 Code graph analysis (17)
apps/start/src/routes/_app.$organizationId.$projectId_.profiles.$profileId._tabs.events.tsx (1)
apps/start/src/components/ui/data-table/data-table-hooks.tsx (1)
  • useReadColumnVisibility (25-29)
packages/trpc/src/routers/profile.ts (1)
packages/db/src/services/profile.service.ts (1)
  • getProfileById (91-120)
apps/start/src/routes/_app.$organizationId.$projectId_.sessions_.$sessionId.tsx (1)
apps/start/src/components/ui/data-table/data-table-hooks.tsx (1)
  • useReadColumnVisibility (25-29)
apps/api/src/utils/auth.ts (2)
packages/redis/cachable.ts (1)
  • getCache (10-60)
packages/common/server/crypto.ts (1)
  • verifyPassword (38-60)
packages/db/src/services/salt.service.ts (1)
packages/redis/cachable.ts (1)
  • cacheable (155-283)
apps/api/src/controllers/profile.controller.ts (2)
apps/api/src/utils/get-client-ip.ts (1)
  • getClientIp (6-8)
packages/common/server/parser-user-agent.ts (1)
  • parseUserAgent (71-92)
packages/db/src/buffers/base-buffer.ts (1)
packages/logger/index.ts (1)
  • createLogger (11-98)
packages/db/src/services/profile.service.ts (2)
packages/db/src/clickhouse/client.ts (1)
  • TABLE_NAMES (48-60)
packages/db/src/buffers/index.ts (1)
  • profileBuffer (7-7)
apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.conversions.tsx (1)
apps/start/src/components/ui/data-table/data-table-hooks.tsx (1)
  • useReadColumnVisibility (25-29)
packages/db/src/services/notification.service.ts (1)
packages/db/src/services/profile.service.ts (1)
  • getProfileById (91-120)
apps/api/src/controllers/live.controller.ts (1)
packages/db/src/services/profile.service.ts (1)
  • getProfileById (91-120)
apps/worker/src/metrics.ts (1)
packages/db/src/buffers/index.ts (1)
  • eventBuffer (6-6)
packages/db/src/services/clients.service.ts (1)
packages/redis/cachable.ts (1)
  • cacheable (155-283)
packages/db/src/services/event.service.ts (1)
packages/db/src/services/profile.service.ts (1)
  • getProfileById (91-120)
packages/db/src/buffers/event-buffer.test.ts (3)
packages/db/src/buffers/index.ts (1)
  • eventBuffer (6-6)
packages/db/src/clickhouse/client.ts (1)
  • ch (131-161)
packages/db/src/buffers/event-buffer.ts (1)
  • EventBuffer (17-625)
packages/db/src/buffers/event-buffer.ts (5)
packages/redis/redis.ts (3)
  • Redis (9-9)
  • getRedisCache (66-72)
  • getRedisPub (84-90)
packages/db/src/services/event.service.ts (3)
  • IClickhouseEvent (65-99)
  • IServiceEvent (142-179)
  • transformEvent (101-135)
packages/db/src/clickhouse/client.ts (1)
  • ch (131-161)
packages/json/index.ts (1)
  • getSafeJson (3-9)
packages/redis/publisher.ts (1)
  • publishEvent (28-36)
packages/redis/cachable.ts (1)
packages/redis/redis.ts (1)
  • getRedisCache (66-72)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-and-push-dashboard
  • GitHub Check: build-and-push-worker
  • GitHub Check: build-and-push-api
🔇 Additional comments (34)
apps/api/src/hooks/ip.hook.ts (1)

2-2: LGTM! Clean removal of unused type imports.

Removing FastifyReply and HookHandlerDoneFunction is appropriate since only FastifyRequest is used in the hook signature. This cleanup improves code maintainability.

apps/start/src/routes/_app.$organizationId.$projectId_.profiles.$profileId._tabs.events.tsx (1)

2-2: LGTM!

The columnVisibility feature is properly integrated with the appropriate enabled guard to ensure the query executes only after column visibility preferences are loaded from localStorage.

Also applies to: 25-41

packages/logger/index.ts (1)

9-9: AI summary inconsistency: existing behavior is not preserved.

The AI summary claims the change preserves "existing behavior when LOG_SILENT is not set," but this is incorrect. When LOG_SILENT is not set (or is any value other than 'true'), silent will be false. Previously, logs were silenced when NODE_ENV === 'test' regardless of LOG_SILENT, so the existing behavior is not preserved.

packages/db/package.json (1)

16-16: No breaking changes detected between @clickhouse/client v1.2.0 and v1.12.1.

The official changelog confirms no breaking changes across these versions—only improvements, bug fixes, and type updates. Code inspection confirms the codebase uses stable, standard API patterns (query/insert methods, conventional ClickHouse settings) compatible with v1.12.1. The version bump is safe and aligns properly with related code changes.

apps/start/src/routes/_app.$organizationId.$projectId_.sessions_.$sessionId.tsx (3)

6-6: LGTM!

Import is correctly added and used within the component.


79-122: LGTM!

The component structure and render logic are clean and correct. The integration of column visibility into the existing component flow is well done.


60-77: The original review comment misunderstands the columnVisibility architecture.

The columnVisibility parameter passed to the query (line 70) is separate from the table's display visibility. The EventsTable component uses its own useDataTableColumnVisibility hook with independent localStorage management (@op:events-column-visibility), which initializes all columns with a default value of true (visible). The empty object {} fallback is harmless—it only affects server-side filtering, not column display. Column visibility is managed independently by the table component, which always defaults to showing all columns unless explicitly hidden via metadata.

The query gating with enabled: columnVisibility !== null is intentional design to ensure data consistency during localStorage initialization, not a UX blocker.

Likely an incorrect or invalid review comment.
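
For readers following along, the gating described above reduces to a pattern like this (hook and fetcher signatures are approximated from the review context, not copied from the codebase):

import { useQuery } from '@tanstack/react-query';

// Approximations of the app's hook and fetcher, for illustration only.
declare function useReadColumnVisibility(key: string): Record<string, boolean> | null;
declare function fetchEvents(args: {
  sessionId: string;
  columnVisibility: Record<string, boolean>;
}): Promise<unknown[]>;

export function useSessionEvents(sessionId: string) {
  // null until localStorage has been read; gating on it means the first
  // fetch already reflects the user's saved columns.
  const columnVisibility = useReadColumnVisibility('sessions');
  return useQuery({
    queryKey: ['events', sessionId, columnVisibility],
    queryFn: () =>
      fetchEvents({ sessionId, columnVisibility: columnVisibility ?? {} }),
    enabled: columnVisibility !== null,
  });
}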

packages/db/src/buffers/base-buffer.ts (3)

11-11: LGTM!

The new enableParallelProcessing property, constructor parameter, and initialization logic are well-implemented with sensible defaults for backward compatibility.

Also applies to: 18-18, 25-25


116-116: LGTM!

The comment clearly explains the sequential mode behavior and provides good symmetry with the parallel mode comment.


98-114: Thread-safety concerns not substantiated.

The review comment mischaracterizes the thread-safety implementation. Verification shows the concerns are unfounded:

  1. Counter management: Each buffer subclass (EventBuffer, ProfileBuffer, SessionBuffer, BotBuffer) immediately decrements the counter in processBuffer() after successful processing, not after an hourly resync. The hourly resync (implemented in getBufferSizeWithCounter()) handles drift correction, not primary counter updates.

  2. Parallel processing safety: Only EventBuffer sets enableParallelProcessing: true (line 43, event-buffer.ts). The other three buffers default to sequential mode. EventBuffer uses atomic Redis operations (LPOP with count, DECRBY in transactions), making concurrent access safe. The lrange + ltrim pattern in other buffers is only used in sequential mode where a lock prevents concurrent execution.

  3. Resync mechanism: The "hourly resync" mentioned in the code comment (lines 110-111) is real and correctly implemented via runEvery() in getBufferSizeWithCounter(). It guards against counter drift in edge cases without preventing normal counter decrements.

The code design is sound: parallel mode is opt-in (only EventBuffer), uses atomic operations where enabled, and includes drift correction. The original concerns don't apply to the actual implementation.

Likely an incorrect or invalid review comment.
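
For reference, the LPOP-with-count pattern from point 2 looks roughly like this (key names and payload shape are assumed):

import Redis from 'ioredis';

const redis = new Redis();

// Claim up to `batchSize` items atomically: LPOP with a count (Redis 6.2+)
// removes the whole batch in one command, so concurrent workers never
// receive overlapping items.
async function claimBatch<T>(
  queueKey: string,
  counterKey: string,
  batchSize: number,
): Promise<T[]> {
  const raw = (await redis.lpop(queueKey, batchSize)) ?? [];
  if (raw.length > 0) {
    // Keep the O(1) size counter roughly in sync; the hourly resync
    // corrects any drift.
    await redis.decrby(counterKey, raw.length);
  }
  return raw.map((item) => JSON.parse(item) as T);
}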

apps/api/src/controllers/profile.controller.ts (2)

20-21: LGTM on payload normalization and UA parsing.

Using payload consistently and parseUserAgent with overrides is fine.

Also applies to: 27-27


64-65: 202 with id response OK.

Returning 202 and profileId is acceptable; ensure callers expect string body.

packages/db/src/services/event.service.ts (2)

898-901: Switch to getProfileById (non-cached) is fine.

Good for fresh reads after event fetch; buffer/cache handles hot paths elsewhere.


1006-1007: getDurationSql helper looks correct.

Window function with partition by session_id is appropriate.

apps/worker/src/metrics.ts (1)

17-37: LGTM!

The new retry and DLQ buffer metrics follow the same pattern as the existing queue metrics and provide valuable observability into the buffer's retry/DLQ behavior.

packages/trpc/src/routers/profile.ts (1)

9-9: LGTM!

The migration from getProfileByIdCached to getProfileById aligns with the broader refactoring effort. The new implementation checks the profile buffer cache internally, maintaining caching behavior while simplifying the API surface.

Also applies to: 22-22

packages/db/src/services/notification.service.ts (1)

16-16: LGTM!

Consistent migration from getProfileByIdCached to getProfileById, matching the pattern applied across other files in this PR.

Also applies to: 267-267

apps/api/src/controllers/live.controller.ts (1)

7-7: LGTM!

Consistent with the profile retrieval refactor applied throughout the PR.

Also applies to: 95-95

packages/db/src/services/profile.service.ts (1)

96-99: LGTM!

The cache-checking logic correctly attempts to fetch from the profile buffer cache before querying ClickHouse, improving performance while maintaining correctness.

packages/db/src/buffers/event-buffer.ts (15)

1-46: LGTM: Clean initialization with parallel processing support.

The constructor properly initializes the Redis client and enables parallel processing via environment flag. Configuration defaults are reasonable.


97-132: LGTM: CSV headers match row generation.

The header order correctly matches the column order in eventToCsvRow.


134-140: LGTM: Efficient bulk operation using Redis multi.

The method correctly batches multiple events into a single Redis transaction.


192-235: LGTM: Clean overload pattern with safe JSON parsing.

The method provides flexible lookups with proper fallback handling. The comment about sessionId lookup not being supported in the simplified version is clear.


237-245: LGTM: Simple key generation helper.


247-274: LGTM: Atomic batch claiming with proper error handling.

The use of lpop with count (Redis 6.2+) ensures atomic batch claiming across multiple workers. The counter decrement at line 266 is separate from the pop, but this is acceptable since the counter is an optimization with fallback to llen.


276-328: LGTM: Robust retry/DLQ routing with atomic counter updates.

The method correctly routes events to retry buffer or DLQ based on retry count, and updates all counters atomically in a single transaction.


330-357: LGTM: Clean buffer processing with proper error recovery.

The method atomically claims batches, processes them, and handles failures by pushing to the retry buffer. This design supports parallel workers safely.


359-423: LGTM: Robust retry processing with proper counter management.

The method correctly handles retry items with atomic operations and safe parsing. Failed retries are appropriately re-queued or moved to DLQ.


482-535: LGTM: Efficient counter-based size tracking with proper fallbacks.

The methods use Redis counters for O(1) size lookups with proper fallback to llen when counters are missing or invalid.
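
Schematically, the counter-with-fallback read amounts to (key names assumed):

import type { Redis } from 'ioredis';

// O(1) size read via a counter, with LLEN as the fallback when the counter
// is missing, negative, or otherwise corrupt.
export async function bufferSize(
  redis: Redis,
  counterKey: string,
  listKey: string,
): Promise<number> {
  const raw = await redis.get(counterKey);
  const n = raw === null ? Number.NaN : Number(raw);
  if (Number.isFinite(n) && n >= 0) return n;
  return redis.llen(listKey);
}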


537-553: LGTM: Comprehensive stats with parallel fetching.

The method efficiently fetches all buffer sizes in parallel and returns a clean stats object.


555-575: LGTM: Safe DLQ inspection for debugging.

The method safely parses and returns DLQ items with a reasonable default limit.


577-592: LGTM: Safe DLQ clearing with atomic counter reset.

The method atomically clears both the DLQ list and counter with appropriate warning logging.


594-606: LGTM: Standard active visitor tracking pattern.

The method correctly uses a sorted set with timestamps and heartbeat keys for visitor activity tracking.


608-624: LGTM: Correct active visitor counting with cleanup.

The method properly removes stale visitors before counting, ensuring accurate results.
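
The pattern behind these two methods is the usual sorted-set sliding window; schematically (key names and window size assumed):

import type { Redis } from 'ioredis';

const WINDOW_MS = 5 * 60_000; // assumed activity window

// Heartbeat: score each visitor by last-seen timestamp.
export async function touchVisitor(redis: Redis, projectId: string, visitorId: string) {
  await redis.zadd(`live:${projectId}`, Date.now(), visitorId);
}

// Count: drop entries older than the window, then ZCARD what remains.
export async function countActiveVisitors(redis: Redis, projectId: string) {
  const key = `live:${projectId}`;
  await redis.zremrangebyscore(key, 0, Date.now() - WINDOW_MS);
  return redis.zcard(key);
}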

Comment on lines +43 to +61
...payload,
id: payload.profileId,
isExternal: true,
projectId,
properties: {
...(properties ?? {}),
...(ip ? geo : {}),
...uaInfo,
...(payload.properties ?? {}),
country: geo.country,
city: geo.city,
region: geo.region,
longitude: geo.longitude,
latitude: geo.latitude,
os: uaInfo.os,
os_version: uaInfo.osVersion,
browser: uaInfo.browser,
browser_version: uaInfo.browserVersion,
device: uaInfo.device,
brand: uaInfo.brand,
model: uaInfo.model,
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Sanitize profile.properties: drop “__*” keys before upsert.

Same concern as track.controller: don’t persist internal/transient keys from payload.properties.

   properties: {
-    ...(payload.properties ?? {}),
+    ...Object.fromEntries(
+      Object.entries(payload.properties ?? {}).filter(([k]) => !k.startsWith('__')),
+    ),
     country: geo.country,
     city: geo.city,
     region: geo.region,
     longitude: geo.longitude,
     latitude: geo.latitude,
     os: uaInfo.os,
     os_version: uaInfo.osVersion,
     browser: uaInfo.browser,
     browser_version: uaInfo.browserVersion,
     device: uaInfo.device,
     brand: uaInfo.brand,
     model: uaInfo.model,
   },
🤖 Prompt for AI Agents
In apps/api/src/controllers/profile.controller.ts around lines 43 to 61, the
controller currently spreads payload.properties directly into the upsert payload
which can persist internal/transient keys; sanitize payload.properties by
removing any keys that start with "__" before building the final properties
object. Concretely: create a sanitizedProperties object from payload.properties
(or empty object) that copies only entries whose keys do not startWith("__"),
then use that sanitizedProperties when merging in geo and ua fields (country,
city, region, longitude, latitude, os, os_version, browser, browser_version,
device, brand, model) for the upsert; ensure the rest of the payload (id,
isExternal, projectId, etc.) remains unchanged and keep types/null-coalescing
behavior.

Comment on lines +326 to +338
country: geo.country,
city: geo.city,
region: geo.region,
longitude: geo.longitude,
latitude: geo.latitude,
os: uaInfo.os,
os_version: uaInfo.osVersion,
browser: uaInfo.browser,
browser_version: uaInfo.browserVersion,
device: uaInfo.device,
brand: uaInfo.brand,
model: uaInfo.model,
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid persisting internal (“__”) properties into profile.properties.

You spread payload.properties before UA/geo. This may store transient/private keys like __ip, __timestamp, __os, etc. Strip keys starting with "__" before persisting.

Apply:

   properties: {
-      ...(payload.properties ?? {}),
+      ...Object.fromEntries(
+        Object.entries(payload.properties ?? {}).filter(([k]) => !k.startsWith('__')),
+      ),
       country: geo.country,
       city: geo.city,
       region: geo.region,
       longitude: geo.longitude,
       latitude: geo.latitude,
       os: uaInfo.os,
       os_version: uaInfo.osVersion,
       browser: uaInfo.browser,
       browser_version: uaInfo.browserVersion,
       device: uaInfo.device,
       brand: uaInfo.brand,
       model: uaInfo.model,
   },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
country: geo.country,
city: geo.city,
region: geo.region,
longitude: geo.longitude,
latitude: geo.latitude,
os: uaInfo.os,
os_version: uaInfo.osVersion,
browser: uaInfo.browser,
browser_version: uaInfo.browserVersion,
device: uaInfo.device,
brand: uaInfo.brand,
model: uaInfo.model,
},
properties: {
...Object.fromEntries(
Object.entries(payload.properties ?? {}).filter(([k]) => !k.startsWith('__')),
),
country: geo.country,
city: geo.city,
region: geo.region,
longitude: geo.longitude,
latitude: geo.latitude,
os: uaInfo.os,
os_version: uaInfo.osVersion,
browser: uaInfo.browser,
browser_version: uaInfo.browserVersion,
device: uaInfo.device,
brand: uaInfo.brand,
model: uaInfo.model,
},
🤖 Prompt for AI Agents
In apps/api/src/controllers/track.controller.ts around lines 326 to 338, the
code spreads payload.properties before adding UA/geo and may persist internal
keys prefixed with "__"; remove any keys starting with "__" from
payload.properties first (e.g., build a cleanedProperties object by filtering
Object.entries(payload.properties) to exclude keys that begin with "__"), then
merge cleanedProperties with the UA/geo/device fields so only sanitized keys get
persisted; ensure this filtering runs before the spread/merge and handles
undefined/null payload.properties safely.

@@ -8,18 +8,21 @@ export class BaseBuffer {
lockKey: string;
lockTimeout = 60;
onFlush: () => void;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix type mismatch for onFlush property.

The property is declared as () => void but the constructor expects () => Promise<void> (line 17), and it's awaited in lines 102 and 130.

Apply this diff to fix the type:

-  onFlush: () => void;
+  onFlush: () => Promise<void>;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
onFlush: () => void;
onFlush: () => Promise<void>;
🤖 Prompt for AI Agents
In packages/db/src/buffers/base-buffer.ts around line 10, the onFlush property
is declared as () => void but the constructor and usages expect a () =>
Promise<void>; change the type declaration to () => Promise<void> so it matches
the constructor parameter and awaited calls (also update any related
interface/type annotations or overloads in this file to use Promise<void> to
keep types consistent).

Comment on lines 15 to 19
constructor(options: {
name: string;
onFlush: () => Promise<void>;
enableParallelProcessing?: boolean;
}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add documentation for parallel processing mode.

The new enableParallelProcessing option significantly changes the behavior and guarantees of the buffer. Consider adding JSDoc to the constructor explaining:

  • When to use parallel vs sequential mode
  • Requirements for parallel mode (thread-safe buffer implementation)
  • Trade-offs (counter drift, no coordination, different error handling)

Example:

/**
 * @param options.enableParallelProcessing - Enable parallel processing mode.
 *   When true, multiple workers can process the buffer concurrently without locks.
 *   Requirements:
 *     - The buffer implementation must be thread-safe
 *     - Duplicate processing of items must be acceptable or handled by the implementation
 *   Trade-offs:
 *     - Counter will drift until the hourly resync job runs
 *     - Errors won't trigger counter resets
 *     - Higher throughput but less consistency guarantees
 */
🤖 Prompt for AI Agents
In packages/db/src/buffers/base-buffer.ts around lines 15 to 19, add JSDoc for
the constructor's new enableParallelProcessing option explaining when to use
parallel vs sequential mode, requirements for parallel mode (e.g., buffer
implementation must be thread-safe, callers must accept possible duplicate
processing), and trade-offs (counter drift until hourly resync, errors won't
trigger counter resets, higher throughput but weaker consistency/coordination).
Keep the doc concise, include the @param tag for
options.enableParallelProcessing and list the bullets shown in the review (usage
guidance, requirements, and trade-offs).

export type ILogger = winston.Logger;

const logLevel = process.env.LOG_LEVEL ?? 'info';
const silent = process.env.LOG_SILENT === 'true';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Breaking change: test environments no longer auto-silence logs.

The previous logic silenced logs when NODE_ENV === 'test', but the new implementation only silences when LOG_SILENT === 'true'. This means existing test suites that relied on automatic log silencing will now output logs unless LOG_SILENT=true is explicitly set in the test environment.

Consider maintaining backwards compatibility:

-const silent = process.env.LOG_SILENT === 'true';
+const silent = process.env.LOG_SILENT === 'true' || process.env.NODE_ENV === 'test';

Alternatively, if the breaking change is intentional, ensure all test configurations are updated to set LOG_SILENT=true.

Also applies to: 89-89

🤖 Prompt for AI Agents
In packages/logger/index.ts around line 9, the new single check const silent =
process.env.LOG_SILENT === 'true' removed the previous auto-silencing for test
runs; restore backwards compatibility by making silent true when either
process.env.LOG_SILENT === 'true' OR process.env.NODE_ENV === 'test', or if the
breaking change is intended, update docs/configs and tests to set
LOG_SILENT=true in test environments—apply the chosen approach consistently
(also update the similar occurrence at line 89).

Comment on lines 10 to 19
export async function getCache<T>(
key: string,
expireInSec: number,
fn: () => Promise<T>,
useLruCache?: boolean,
): Promise<T> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Don’t cache undefined/null; guard JSON.parse; handle fire‑and‑forget errors.

  • getCache currently caches undefined/null; JSON.stringify(undefined) → undefined, which can break setex.
  • Wrap parse in try/catch.
  • Add .catch to fire‑and‑forget writes to avoid unhandled rejections.
 export async function getCache<T>(
@@
 ): Promise<T> {
@@
-  const hit = await getRedisCache().get(key);
-  if (hit) {
-    const parsed = JSON.parse(hit, (_, value) => {
+  const hit = await getRedisCache().get(key);
+  if (hit) {
+    let parsed: T;
+    try {
+      parsed = JSON.parse(hit, (_, value) => {
         if (
           typeof value === 'string' &&
           /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.*Z$/.test(value)
         ) {
           return new Date(value);
         }
         return value;
-    });
+      });
+    } catch {
+      // Corrupt cache entry; ignore and recompute
+      parsed = undefined as unknown as T;
+    }
 
-    // Store in LRU cache for next time
-    if (useLruCache) {
+    // Store in LRU cache for next time
+    if (useLruCache && parsed !== undefined && parsed !== null) {
       globalLruCache.set(key, parsed, {
         ttl: expireInSec * 1000, // Use the same TTL as Redis
       });
     }
 
-    return parsed;
+    if (parsed !== undefined && parsed !== null) {
+      return parsed;
+    }
   }
 
-  // Cache miss: Execute function
+  // Cache miss: Execute function
   const data = await fn();
 
-  // Store in both caches
-  if (useLruCache) {
+  // Store in both caches (only if data is cacheable)
+  if (useLruCache && data !== undefined && data !== null) {
     globalLruCache.set(key, data, {
       ttl: expireInSec * 1000,
     });
   }
-  // Fire and forget Redis write for better performance
-  getRedisCache().setex(key, expireInSec, JSON.stringify(data));
+  // Fire and forget Redis write for better performance
+  if (data !== undefined && data !== null) {
+    getRedisCache()
+      .setex(key, expireInSec, JSON.stringify(data))
+      .catch(() => {});
+  }
 
   return data;
 }

Also applies to: 24-45, 47-59

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/geo/src/geo.ts (1)

74-85: Critical: Cache is never populated.

The code checks the cache (lines 65-68) but never adds entries to it. After successfully retrieving geolocation data from the database, the result must be stored in the cache. Without this, the cache remains empty and provides no performance benefit.

Apply this diff to populate the cache after a successful database lookup:

   try {
     const response = await reader?.city(ip);
-    return {
+    const result = {
       city: response?.city?.names.en,
       country: response?.country?.isoCode,
       region: response?.subdivisions?.[0]?.names.en,
       longitude: response?.location?.longitude,
       latitude: response?.location?.latitude,
     };
+    cache.set(ip, result);
+    return result;
   } catch (error) {
     return DEFAULT_GEO;
   }
♻️ Duplicate comments (1)
apps/api/src/controllers/track.controller.ts (1)

328-342: Avoid persisting internal ("__") properties into profile.properties.

You spread payload.properties which may include transient/internal keys like __ip, __timestamp, __os, etc. These internal properties should be stripped before persisting to the profile.

Apply this diff:

   properties: {
-    ...(payload.properties ?? {}),
+    ...Object.fromEntries(
+      Object.entries(payload.properties ?? {}).filter(([k]) => !k.startsWith('__'))
+    ),
     country: geo.country,
     city: geo.city,
🧹 Nitpick comments (9)
packages/geo/src/geo.ts (1)

54-58: LGTM! Consider making cache parameters configurable.

The cache configuration is reasonable with 1000 max entries and a 5-minute TTL. However, you might want to make these values configurable via environment variables for production tuning based on actual traffic patterns.

packages/common/server/parser-user-agent.ts (2)

34-37: Consider removing ttlAutopurge for better performance.

The ttlAutopurge option triggers automatic interval-based cleanup which can add overhead in high-throughput scenarios. Since you have a max: 1000 cap, LRU eviction will naturally manage the cache size. The TTL will still be respected on access without the active purging overhead.

Apply this diff:

 const parseCache = new LRUCache<string, UAParser.IResult>({
   ttl: 1000 * 60 * 5,
-  ttlAutopurge: true,
   max: 1000,
 });

87-93: Validate regex match before non-null assertion.

Line 93 uses a non-null assertion (osVersion[1]!) after checking if (osVersion). However, regex matches can return an array where capture groups are undefined. While unlikely here, it's safer to validate.

Apply this diff:

   const osVersion = ua.match(IPAD_OS_VERSION_REGEX);
-  if (osVersion) {
+  if (osVersion?.[1]) {
     const result = {
       ...res,
       os: {
         ...res.os,
-        version: osVersion[1]!.replace('_', '.'),
+        version: osVersion[1].replace('_', '.'),
       },
     };
apps/api/src/bots/index.ts (2)

31-49: Early-exit heuristic can suppress real bot matches; tighten or reorder

Early exit runs before includes/regex checks. With common spoofed tokens (browser + OS), bots will return null without evaluating bot lists.

Safer options (pick one):

  • Preferred: run bot includes/regex checks first; apply early exit only if no bot matched.
  • Or: keep early exit but add a cheap bot‑keyword guard.

Also, precompute lowercased UA once for case‑insensitive includes matching.

 export function isBot(ua: string) {
-  // Ultra-fast early exit: check if this looks like a legitimate browser
+  const uaLC = ua.toLowerCase();
+  // Ultra-fast early exit: check if this looks like a legitimate browser
   // Real browsers typically have Mozilla/5.0 + browser name + OS
   if (ua.includes('Mozilla/5.0')) {
     // Check for browser signature
     const hasBrowser = legitimateBrowserPatterns.some((pattern) =>
       ua.includes(pattern),
     );

     // Check for OS signature (mobile or desktop)
     const hasOS =
       mobilePatterns.some((pattern) => ua.includes(pattern)) ||
       desktopOSPatterns.some((pattern) => ua.includes(pattern));

-    // If it has Mozilla/5.0, a known browser, and an OS, it's very likely legitimate
-    if (hasBrowser && hasOS) {
+    // Optional: skip early exit if UA contains common bot keywords
+    const likelyBot = /bot|crawl|spider|httpclient|curl|wget|axios|okhttp|python|java|go-http|libwww|apache-http/i.test(ua);
+    // If it looks like a browser and not likely a bot, treat as legitimate
+    if (hasBrowser && hasOS && !likelyBot) {
       return null;
     }
   }

If you choose to reorder instead, move this whole block below the regex pass.


51-59: Make substring checks case‑insensitive

Device‑detector patterns are effectively case‑insensitive. Using strict includes will miss many bots.

-  for (const bot of includesBots) {
-    if (ua.includes(bot.includes)) {
+  for (const bot of includesBots) {
+    if (uaLC.includes(bot.includes.toLowerCase())) {
       return {
         name: bot.name,
         type: 'category' in bot ? bot.category : 'Unknown',
       };
     }
   }
apps/api/scripts/get-bots.ts (4)

13-25: Case handling for “includes” conversions

Converting to includes without preserving case‑insensitive semantics will cause misses downstream unless the consumer lowercases. Either emit lowercase patterns here or add metadata.

Two options:

  • Lowercase here (simple).
  • Or add ci: true and let the consumer honor it.
-    // Convert to includes
-    return { includes: regex, ...rest };
+    // Convert to includes; preserve case-insensitive semantics
+    return { includes: String(regex).toLowerCase(), ...rest };

If you adopt this, ensure index.ts uses ua.toLowerCase() (as suggested).


30-33: Pin the upstream source for reproducible builds

Fetching from master HEAD is non‑deterministic. Pin to a commit SHA or tagged release to avoid surprise diffs and CI flakiness.

-    const data = await fetch(
-      'https://raw.githubusercontent.com/matomo-org/device-detector/master/regexes/bots.yml',
-    ).then((res) => res.text());
+    const COMMIT = process.env.DD_COMMIT ?? '<<pin-commit-sha-here>>';
+    const url = `https://raw.githubusercontent.com/matomo-org/device-detector/${COMMIT}/regexes/bots.yml`;
+    const data = await fetch(url).then((res) => res.text());

34-36: Validate input shape and fail fast

yaml.load(data) as any[] assumes a flat array. Device‑Detector YAML can change shape. Add a guard and helpful error.

-    const parsedData = yaml.load(data) as any[];
-    const transformedBots = transformBots(parsedData);
+    const parsedData = yaml.load(data);
+    if (!Array.isArray(parsedData)) {
+      throw new Error('Unexpected bots.yml shape; expected an array of entries');
+    }
+    const transformedBots = transformBots(parsedData as any[]);

10-12: Regex classification heuristic is coarse; consider a safety pass

The special‑char heuristic is fine for splitting includes vs regex, but it doesn’t assess regex safety. Add a build‑time safety check to flag catastrophic patterns before they reach runtime.

Option (build-time):

// import safe from 'safe-regex2'
// if (hasRegexChars && !safe(regex)) { /* log/skip/mark */ }

I can wire this into transformBots if you want.
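
Concretely, the safety pass could look like this (assumes the safe-regex2 package; the log/skip policy is a placeholder):

import safe from 'safe-regex2';

// Flag patterns with potential catastrophic backtracking at build time.
// Patterns that fail to compile are treated as unsafe rather than crashing.
function isSafePattern(pattern: string): boolean {
  try {
    return safe(new RegExp(pattern, 'i'));
  } catch {
    return false;
  }
}

// Inside transformBots (sketch): warn on unsafe entries instead of shipping them.
// if (hasRegexChars && !isSafePattern(regex)) console.warn(`unsafe bot pattern: ${regex}`);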

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01837a9 and 8c39440.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (10)
  • apps/api/scripts/get-bots.ts (1 hunks)
  • apps/api/src/bots/index.ts (1 hunks)
  • apps/api/src/controllers/event.controller.ts (1 hunks)
  • apps/api/src/controllers/track.controller.ts (7 hunks)
  • apps/worker/src/jobs/events.incoming-event.ts (2 hunks)
  • packages/common/package.json (1 hunks)
  • packages/common/server/parser-user-agent.ts (7 hunks)
  • packages/geo/package.json (1 hunks)
  • packages/geo/src/geo.ts (2 hunks)
  • packages/queue/src/queues.ts (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • packages/geo/package.json
🧰 Additional context used
🧬 Code graph analysis (3)
apps/worker/src/jobs/events.incoming-event.ts (1)
packages/common/server/parser-user-agent.ts (1)
  • parseUserAgent (107-129)
apps/api/src/controllers/track.controller.ts (2)
packages/sdks/sdk/src/index.ts (1)
  • IdentifyPayload (36-43)
apps/api/src/utils/get-client-ip.ts (1)
  • getClientIp (6-8)
packages/common/server/parser-user-agent.ts (1)
packages/sdks/web/src/index.ts (1)
  • isServer (59-61)
🪛 ast-grep (0.39.6)
apps/api/src/bots/index.ts

[warning] 7-7: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(bot.regex)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🔇 Additional comments (15)
packages/geo/src/geo.ts (2)

10-10: LGTM!

The import statement is correct and the library version (11.2.2) is current.


65-68: LGTM!

The cache lookup logic correctly implements the cache-aside pattern with an early return on cache hit.

packages/common/package.json (2)

13-13: ✅ Dependency addition is sound.

lru-cache v11.2.2 is the latest version and has no dependencies. The caret constraint is appropriate for semantic versioning, and alphabetical placement is correct. This dependency aligns with the PR's performance improvements goal of adding LRU caching capabilities.


13-13: No issues found. lru-cache v11.2.2 is the latest published version, and no direct security advisories are reported for this version in major vulnerability databases. The dependency is valid and secure to use.

packages/common/server/parser-user-agent.ts (2)

131-145: LGTM!

The server detection logic correctly identifies both simple bot user agents (via regex) and cases where all identifying fields are missing.


147-187: LGTM!

The device detection logic is well-structured with early returns for specific manufacturers and comprehensive fallback patterns. The precompiled regexes improve performance over inline regex construction.

apps/api/src/controllers/event.controller.ts (1)

63-79: LGTM!

Computing uaInfo upfront and including it in the queue payload is a good optimization. This moves the parsing work to the API layer and reduces redundant parsing in workers.

apps/worker/src/jobs/events.incoming-event.ts (2)

64-64: LGTM!

Extracting uaInfo from the payload as _uaInfo is appropriate given the fallback logic below.


95-96: API controllers always compute and provide uaInfo to the queue.

Both event.controller.ts:63 and track.controller.ts:286 parse user-agent data via parseUserAgent() and pass the result directly to eventsGroupQueue.add(). Since the queue payload always includes pre-computed uaInfo, the fallback in the worker (line 96) would only trigger if an event somehow bypassed the API layer—which is not possible in the current architecture.

The defensive fallback can be safely removed once the transition is complete and no legacy code paths remain.

packages/queue/src/queues.ts (2)

20-20: LGTM!

Widening the timestamp type to string | number maintains backwards compatibility while supporting the new numeric timestamp format used in track.controller.ts.


23-43: LGTM!

The discriminated union for uaInfo is well-designed with readonly properties and correct type narrowing via the isServer flag. This aligns perfectly with the parsedServerUa constant and the return type of parseUserAgent.

apps/api/src/controllers/track.controller.ts (4)

40-54: LGTM!

The safe property access pattern using 'properties' in body.payload prevents runtime errors when properties is not present.


56-84: LGTM!

The refactored timestamp logic correctly handles numeric timestamps and validates against future timestamps with proper NaN checking.
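
Paraphrased, the validation amounts to something like this (not the controller's literal code):

// Normalize client timestamps: reject NaN and clamp future values to "now".
function resolveTimestamp(
  raw: string | number | undefined,
  now = Date.now(),
): number {
  const n = typeof raw === 'string' ? Number(raw) : (raw ?? Number.NaN);
  if (!Number.isFinite(n)) return now; // missing or NaN -> server time
  return Math.min(n, now); // future timestamps fall back to "now"
}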


93-97: LGTM!

Allowing __ip override in properties is useful for server-side event tracking while maintaining fallback to standard IP extraction.


267-309: LGTM!

The numeric timestamp handling and inclusion of uaInfo in the queue payload are consistent with the type changes in packages/queue/src/queues.ts.

Comment on lines +5 to +11
if ('regex' in bot) {
return {
...bot,
compiledRegex: new RegExp(bot.regex),
};
}
return bot;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Case sensitivity + ReDoS risk in dynamic regex compilation

  • Compiling third‑party patterns without flags will miss many matches; Matomo patterns are typically treated case‑insensitively.
  • Variable-driven regex can enable catastrophic backtracking.

Minimal fix: compile with the i flag.

Optional hardening: validate patterns at build time (safe-regex2/recheck) or use RE2.

Apply:

-      compiledRegex: new RegExp(bot.regex),
+      compiledRegex: new RegExp(bot.regex, 'i'),

Optionally (separate change):

// import RE2 from 're2'
// compiledRegex: new RE2(bot.regex, 'i')

And/or validate during generation.

🧰 Tools
🪛 ast-grep (0.39.6)

[warning] 7-7: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(bot.regex)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🤖 Prompt for AI Agents
In apps/api/src/bots/index.ts around lines 5 to 11, the code compiles
third‑party regex patterns without flags which misses case‑insensitive matches
and leaves you vulnerable to ReDoS; update the compilation to use the
case‑insensitive flag (e.g. add the 'i' flag when constructing the RegExp) and,
for stronger hardening, validate or sanitize incoming patterns at
build/generation time (use a safe‑regex checker like safe-regex2/recheck) or
switch to a linear‑time engine such as RE2 for untrusted patterns.

Comment on lines +20 to +25
'Mozilla/5.0', // Nearly all modern browsers
'Chrome/', // Chrome/Chromium browsers
'Safari/', // Safari and Chrome-based browsers
'Firefox/', // Firefox
'Edg/', // Edge
];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid false negatives: treat real browser “signature” correctly

Including 'Mozilla/5.0' in the browser signatures makes hasBrowser trivially true whenever the outer ua.includes('Mozilla/5.0') matches, letting many bot UAs that mimic browsers slip past the early exit.

Remove 'Mozilla/5.0' from this list so the check requires a real product token like Chrome/Safari/Firefox/Edge.

-const legitimateBrowserPatterns = [
-  'Mozilla/5.0', // Nearly all modern browsers
-  'Chrome/', // Chrome/Chromium browsers
-  'Safari/', // Safari and Chrome-based browsers
-  'Firefox/', // Firefox
-  'Edg/', // Edge
-];
+const legitimateBrowserPatterns = [
+  'Chrome/',   // Chrome/Chromium
+  'Safari/',   // Safari and Chrome-based
+  'Firefox/',  // Firefox
+  'Edg/',      // Edge
+];
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
'Mozilla/5.0', // Nearly all modern browsers
'Chrome/', // Chrome/Chromium browsers
'Safari/', // Safari and Chrome-based browsers
'Firefox/', // Firefox
'Edg/', // Edge
];
const legitimateBrowserPatterns = [
'Chrome/', // Chrome/Chromium
'Safari/', // Safari and Chrome-based
'Firefox/', // Firefox
'Edg/', // Edge
];
🤖 Prompt for AI Agents
In apps/api/src/bots/index.ts around lines 20 to 25, the browser signatures
array includes 'Mozilla/5.0' which causes hasBrowser to be true for any UA
containing that token and yields false negatives for bots; remove 'Mozilla/5.0'
from the list so the detection relies only on real product tokens (e.g.,
'Chrome/', 'Safari/', 'Firefox/', 'Edg/'), and ensure the hasBrowser check
continues to use ua.includes(...) against the updated array.

Comment on lines +61 to 69
// Check regex patterns (slower)
for (const bot of regexBots) {
if (bot.compiledRegex.test(ua)) {
return {
name: bot.name,
type: 'category' in bot ? bot.category : 'Unknown',
};
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Ensure regex checks are case‑insensitive (align with compile-time fix)

Assuming the earlier change to add the i flag, this block is fine. If not adopting that change, many patterns will fail to match.

Run to confirm no case‑sensitive fallthroughs remain:


🏁 Script executed:

#!/bin/bash
rg -n "new RegExp\\([^,]+\\)$" apps/api/src/bots/index.ts -C1

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Examine the target file to understand structure and regex compilation
cat apps/api/src/bots/index.ts

Length of output: 2064


Add case‑insensitive flag to regex compilation

The i flag for case-insensitive matching is missing. At line 5, change new RegExp(bot.regex) to new RegExp(bot.regex, 'i') to ensure all regex patterns match regardless of case. Without this, many bot detection patterns will fail to match variations in casing.

🤖 Prompt for AI Agents
In apps/api/src/bots/index.ts around lines 61 to 69, the regex compilation for
bot detection is missing the case-insensitive flag; update the RegExp
construction so patterns match regardless of case by creating the regex with the
'i' flag (i.e., use new RegExp(bot.regex, 'i') when compiling bot.compiledRegex)
so all bot regexes will match UA strings case-insensitively.

Comment on lines +119 to +126
os: (overrides?.__os || res.os.name) as string,
osVersion: (overrides?.__osVersion || res.os.version) as string,
browser: (overrides?.__browser || res.browser.name) as string,
browserVersion: (overrides?.__browserVersion ||
res.browser.version) as string,
device: (overrides?.__device || res.device.type || getDevice(ua)) as string,
brand: (overrides?.__brand || res.device.vendor) as string,
model: (overrides?.__model || res.device.model) as string,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Type assertions hide potential undefined values.

Lines 119-126 cast potentially undefined values (e.g., res.os.name, res.os.version) to string using as string. UAParser can return undefined for these fields, and the type assertions mask this reality. Downstream code may not handle empty strings or undefined correctly.

Consider either:

  1. Using ?? '' to explicitly convert undefined to empty string
  2. Updating the return type to reflect that these fields can be undefined
  3. Documenting that these fields default to empty string when undefined

Example:

   return {
-    os: (overrides?.__os || res.os.name) as string,
-    osVersion: (overrides?.__osVersion || res.os.version) as string,
+    os: (overrides?.__os || res.os.name || '') as string,
+    osVersion: (overrides?.__osVersion || res.os.version || '') as string,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
os: (overrides?.__os || res.os.name) as string,
osVersion: (overrides?.__osVersion || res.os.version) as string,
browser: (overrides?.__browser || res.browser.name) as string,
browserVersion: (overrides?.__browserVersion ||
res.browser.version) as string,
device: (overrides?.__device || res.device.type || getDevice(ua)) as string,
brand: (overrides?.__brand || res.device.vendor) as string,
model: (overrides?.__model || res.device.model) as string,
os: (overrides?.__os || res.os.name || '') as string,
osVersion: (overrides?.__osVersion || res.os.version || '') as string,
browser: (overrides?.__browser || res.browser.name) as string,
browserVersion: (overrides?.__browserVersion ||
res.browser.version) as string,
device: (overrides?.__device || res.device.type || getDevice(ua)) as string,
brand: (overrides?.__brand || res.device.vendor) as string,
model: (overrides?.__model || res.device.model) as string,
🤖 Prompt for AI Agents
In packages/common/server/parser-user-agent.ts around lines 119 to 126, the code
uses `as string` to force potentially undefined UAParser fields to string;
replace these unsafe type assertions by normalizing undefined to an explicit
empty string (e.g., use `overrides?.__os ?? res.os.name ?? ''` pattern for each
field: os, osVersion, browser, browserVersion, device, brand, model) so values
are never undefined, or alternatively update the function's return type to allow
undefined — prefer the nullish-coalescing to '' approach for minimal breaking
changes.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
apps/api/src/controllers/track.controller.ts (2)

41-54: Harden identity extraction: validate shape, ensure profileId, avoid blind cast.

Current code trusts properties.__identify and casts to IdentifyPayload even if malformed. It can trigger identify() without a profileId. Guard the shape, fallback to payload profileId, and only return an identity when an id exists.

Apply:

-function getIdentity(body: TrackHandlerPayload): IdentifyPayload | undefined {
-  const identity =
-    'properties' in body.payload
-      ? (body.payload?.properties?.__identify as IdentifyPayload | undefined)
-      : undefined;
-
-  return (
-    identity ||
-    (body?.payload?.profileId
-      ? {
-          profileId: body.payload.profileId,
-        }
-      : undefined)
-  );
-}
+function getIdentity(body: TrackHandlerPayload): IdentifyPayload | undefined {
+  // Pull optional __identify object from properties
+  const raw =
+    'properties' in body.payload
+      ? (body.payload?.properties?.__identify as unknown)
+      : undefined;
+
+  // Safe coerce: require object-like, then ensure we have a profileId
+  const obj = raw && typeof raw === 'object' ? (raw as Record<string, unknown>) : undefined;
+  const profileId =
+    (obj?.profileId as string | undefined) ?? (body.payload as any)?.profileId;
+  if (!profileId) return undefined;
+
+  // Build sanitized identity with guaranteed profileId string
+  return {
+    ...(obj ?? {}),
+    profileId: String(profileId),
+  } as IdentifyPayload;
+}

167-178: Gate identify calls on a guaranteed profileId.

Currently identity && Object.keys(identity).length > 1 can pass even when profileId is missing (since getIdentity casts). Ensure we never call identify without an id.

Apply:

-      if (identity && Object.keys(identity).length > 1) {
+      const hasMoreThanId = identity
+        ? Object.keys(identity).some((k) => k !== 'profileId')
+        : false;
+      if (identity?.profileId && hasMoreThanId) {
         promises.push(
           identify({
             payload: identity,
             projectId,
             geo,
             ua,
           }),
         );
       }
♻️ Duplicate comments (4)
apps/api/src/bots/index.ts (2)

3-12: Add case-insensitive flag to regex compilation (previously flagged)

The regex compilation at line 8 is still missing the i flag for case-insensitive matching. This was flagged in a previous review but remains unaddressed. Without this flag, bot detection patterns will fail to match User-Agent strings with different casing variations.

Apply this diff to fix the issue:

-      compiledRegex: new RegExp(bot.regex),
+      compiledRegex: new RegExp(bot.regex, 'i'),

17-30: Remove 'Mozilla/5.0' from legitimateBrowserPatterns (previously flagged)

Including 'Mozilla/5.0' in the browser signatures array was flagged in a previous review but remains unaddressed. This causes hasBrowser to be trivially true whenever the outer condition at line 34 matches, allowing bots that mimic 'Mozilla/5.0' to bypass detection.

The early-exit optimization is currently ineffective because when ua.includes('Mozilla/5.0') is true (line 34), the check at lines 36-38 will always find 'Mozilla/5.0' in the array, making hasBrowser always true.

Apply this diff to fix the false negative issue:

 const legitimateBrowserPatterns = [
-  'Mozilla/5.0', // Nearly all modern browsers
   'Chrome/', // Chrome/Chromium browsers
   'Safari/', // Safari and Chrome-based browsers
   'Firefox/', // Firefox
   'Edg/', // Edge
 ];
packages/common/server/parser-user-agent.ts (1)

119-126: Type assertions hide potential undefined values.

UAParser fields (res.os.name, res.os.version, etc.) can return undefined, but the type assertions mask this. Consider using nullish coalescing (?? '') to explicitly handle undefined values as suggested in the previous review.

apps/api/src/controllers/track.controller.ts (1)

329-342: Avoid persisting internal (“__”) properties into profile.properties.

You still spread payload.properties directly; this may store private keys (__ip, __timestamp, __identify, etc.). Strip keys starting with "__" first.

Apply:

-      ...(payload.properties ?? {}),
+      ...Object.fromEntries(
+        Object.entries(payload.properties ?? {}).filter(([k]) => !k.startsWith('__')),
+      ),
       country: geo.country,
       city: geo.city,
       region: geo.region,
       longitude: geo.longitude,
       latitude: geo.latitude,
       os: uaInfo.os,
       os_version: uaInfo.osVersion,
       browser: uaInfo.browser,
       browser_version: uaInfo.browserVersion,
       device: uaInfo.device,
       brand: uaInfo.brand,
       model: uaInfo.model,
🧹 Nitpick comments (3)
packages/common/server/parser-user-agent.ts (1)

52-104: Cache implementation is correct.

The caching logic properly checks cache on entry and stores results on all code paths. However, note that cached objects are returned by reference. If the parse function were ever exported or if callers mutated the returned result, the cache could be corrupted. Currently safe since parse is internal and parseUserAgent only reads from results.

Consider defensive deep-cloning if parse might be exported in the future:

const cached = parseCache.get(ua);
if (cached) {
  return structuredClone(cached);
}
apps/api/src/controllers/track.controller.ts (2)

283-303: Numeric timestamp path looks good; consider stripping internal keys before enqueue.

The change to timestamp: number and orderMs: timestamp is correct. As a follow-up, consider removing internal control keys from payload.properties before enqueueing (e.g., __timestamp, __ip, __identify) to reduce downstream ambiguity; you already lift timestamp out.

I can draft a small helper like sanitizeInternalProps(props) and apply it here if useful.


287-292: Server events without profileId use a random groupId, breaking ordering/coalescing.

uaInfo.isServer && !payload.profileId yields a new generateId() per event, defeating grouping semantics. Prefer a stable fallback (e.g., hash of projectId+headers.request-id, or projectId+ip when available).

If you want, I can wire groupId = `${projectId}:server:${ip ?? 'unknown'}` or a deterministic hash helper (see the sketch below).
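
A deterministic helper along those lines might look like this (sketch only; the key shape is a suggestion, not an existing convention):

import { createHash } from 'node:crypto';

// Stable fallback groupId for server events without a profileId: the same
// project + source IP always hash to the same group, preserving ordering.
function serverGroupId(projectId: string, ip?: string): string {
  const basis = `${projectId}:server:${ip ?? 'unknown'}`;
  return createHash('sha1').update(basis).digest('hex');
}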

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c39440 and d67356c.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (10)
  • apps/api/scripts/get-bots.ts (1 hunks)
  • apps/api/src/bots/index.ts (1 hunks)
  • apps/api/src/controllers/event.controller.ts (1 hunks)
  • apps/api/src/controllers/track.controller.ts (7 hunks)
  • apps/worker/src/jobs/events.incoming-event.ts (2 hunks)
  • packages/common/package.json (1 hunks)
  • packages/common/server/parser-user-agent.ts (7 hunks)
  • packages/geo/package.json (1 hunks)
  • packages/geo/src/geo.ts (2 hunks)
  • packages/queue/src/queues.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • packages/common/package.json
  • packages/geo/package.json
  • apps/api/scripts/get-bots.ts
  • packages/geo/src/geo.ts
  • apps/api/src/controllers/event.controller.ts
  • apps/worker/src/jobs/events.incoming-event.ts
  • packages/queue/src/queues.ts
🧰 Additional context used
🧬 Code graph analysis (2)
packages/common/server/parser-user-agent.ts (1)
packages/sdks/web/src/index.ts (1)
  • isServer (59-61)
apps/api/src/controllers/track.controller.ts (2)
packages/sdks/sdk/src/index.ts (1)
  • IdentifyPayload (36-43)
apps/api/src/utils/get-client-ip.ts (1)
  • getClientIp (6-8)
🪛 ast-grep (0.39.6)
apps/api/src/bots/index.ts

[warning] 7-7: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(bot.regex)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: lint-and-test
🔇 Additional comments (4)
packages/common/server/parser-user-agent.ts (4)

33-38: Good caching strategy.

The LRU cache configuration with 1000 max entries, 5-minute TTL, and autopurge is well-suited for UA parsing. These settings should handle typical traffic patterns effectively while preventing unbounded memory growth.
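For reference, those settings map onto lru-cache roughly as follows (a sketch; the cached value type is assumed):

import { LRUCache } from 'lru-cache';

// 1000 entries max, 5-minute TTL, expired entries purged automatically.
const parseCache = new LRUCache<string, object>({
  max: 1000,
  ttl: 5 * 60 * 1000,
  ttlAutopurge: true,
});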


15-31: Excellent performance improvement with pre-compiled regexes.

Moving regex patterns to constants eliminates repeated compilation overhead and improves readability. The patterns follow standard mobile/tablet detection conventions.


131-145: LGTM!

The refactor to use SINGLE_NAME_VERSION_REGEX improves performance while maintaining the same server detection logic.


147-187: Excellent device detection improvements.

The enhanced logic now correctly handles Samsung mobile/tablet and LG mobile patterns, and the tablet detection at lines 178-180 properly excludes Android phones by checking for the "mobile" keyword and Samsung/LG mobile patterns. This reduces misclassifications.
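Illustrative only (the pattern name is an assumption, not the PR's code): Android tablets typically omit the "Mobile" token that Android phones include, so a negative lookahead separates the two:

const ANDROID_TABLET_REGEX = /android(?!.*mobile)/i;

ANDROID_TABLET_REGEX.test('Linux; Android 13; SM-X200'); // true: tablet
ANDROID_TABLET_REGEX.test('Linux; Android 13; Pixel 7) Mobile'); // false: phone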

Comment on lines +94 to 101
'properties' in request.body.payload &&
request.body.payload.properties?.__ip
? (request.body.payload.properties.__ip as string)
: getClientIp(request)!;
const ua = request.headers['user-agent']!;
const projectId = request.client?.projectId;

⚠️ Potential issue | 🟠 Major

Don’t trust client-supplied __ip; validate and add safe fallbacks. Also remove non-null assertions.

Allowing arbitrary properties.__ip enables IP spoofing (affects geo, deviceId, dedup). Validate format, prefer server-derived IPs, and avoid ! on getClientIp.

Apply:

+import { isIP } from 'node:net';
@@
-  const ip =
-    'properties' in request.body.payload &&
-    request.body.payload.properties?.__ip
-      ? (request.body.payload.properties.__ip as string)
-      : getClientIp(request)!;
-  const ua = request.headers['user-agent']!;
+  const ipCandidate =
+    'properties' in request.body.payload
+      ? (request.body.payload.properties as any)?.__ip
+      : undefined;
+  // Only accept a syntactically valid IP from client; otherwise use server-derived IPs
+  const ip =
+    (typeof ipCandidate === 'string' && isIP(ipCandidate) ? ipCandidate : undefined) ??
+    getClientIp(request) ??
+    request.ip;
+  const ua =
+    typeof request.headers['user-agent'] === 'string'
+      ? request.headers['user-agent']
+      : undefined;

Optional hardening (if you can detect trusted server tokens): only honor __ip for trusted server-side credentials. I can propose a guarded variant if you expose a flag on request.client.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
'properties' in request.body.payload &&
request.body.payload.properties?.__ip
? (request.body.payload.properties.__ip as string)
: getClientIp(request)!;
const ua = request.headers['user-agent']!;
const projectId = request.client?.projectId;
import { isIP } from 'node:net';
// ... (existing code)
const ipCandidate =
'properties' in request.body.payload
? (request.body.payload.properties as any)?.__ip
: undefined;
// Only accept a syntactically valid IP from client; otherwise use server-derived IPs
const ip =
(typeof ipCandidate === 'string' && isIP(ipCandidate) ? ipCandidate : undefined) ??
getClientIp(request) ??
request.ip;
const ua =
typeof request.headers['user-agent'] === 'string'
? request.headers['user-agent']
: undefined;
const projectId = request.client?.projectId;
🤖 Prompt for AI Agents
In apps/api/src/controllers/track.controller.ts around lines 94–99, do not trust
client-supplied properties.__ip or use non-null assertions; instead obtain the
server-derived IP via getClientIp(request) as the default, validate any
properties.__ip (use a reliable IP validator such as net.isIP) and only accept
it when it is well-formed and the caller is a trusted server-side client (e.g.,
check a flag on request.client like request.client?.isServer or a trusted
token), otherwise ignore it; remove all trailing "!" non-null assertions on
getClientIp and user-agent (handle undefined safely), and ensure projectId is
read defensively from request.client?.projectId.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (3)
apps/api/src/controllers/track.controller.ts (3)

55-83: Timestamp coercion issue remains unaddressed.

The previous review comment about making timestamps strictly numeric and robustly coerced is still applicable. The timestamp parameter may be a string, but line 59 doesn't coerce it to a number before use, which can cause type inconsistencies in comparisons and return values.
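A sketch of the coercion being asked for (helper name hypothetical):

// Coerce a string | number timestamp to epoch milliseconds, with a fallback
// for values that cannot be parsed.
function toEpochMs(input: string | number, fallback = Date.now()): number {
  if (typeof input === 'number') {
    return Number.isFinite(input) ? input : fallback;
  }
  const asNumber = Number(input);
  if (input.trim() !== '' && Number.isFinite(asNumber)) {
    return asNumber; // numeric-string epoch
  }
  const parsed = Date.parse(input); // ISO or RFC date string
  return Number.isNaN(parsed) ? fallback : parsed;
}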


92-97: IP validation and non-null assertion issues remain unaddressed.

The previous review comments about validating client-supplied __ip and removing non-null assertions are still applicable:

  • Client-supplied __ip is accepted without validation, enabling IP spoofing
  • Non-null assertions on getClientIp(request)! and request.headers['user-agent']! can cause runtime errors

These should be addressed to improve security and runtime safety.


276-290: Internal property filtering issue remains unaddressed.

The previous review comment about filtering internal properties prefixed with "__" from payload.properties before persisting is still applicable. Properties like __ip, __timestamp, and __identify should not be stored in the profile.

🧹 Nitpick comments (1)
apps/api/src/hooks/duplicate.hook.ts (1)

18-20: Consider using a more semantically appropriate status code.

Returning HTTP 200 for duplicate events is misleading, as it indicates success. Consider using:

  • 409 Conflict - standard for duplicate resource conflicts
  • 208 Already Reported - WebDAV standard for duplicate submissions
  • 202 Accepted - if you want to acknowledge receipt without reprocessing

Apply this diff to use 208 status code:

   if (isDuplicate) {
-    return reply.status(200).send('Duplicate event');
+    return reply.status(208).send('Duplicate event');
   }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d67356c and 59ba06d.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (11)
  • apps/api/package.json (2 hunks)
  • apps/api/src/controllers/event.controller.ts (2 hunks)
  • apps/api/src/controllers/profile.controller.ts (1 hunks)
  • apps/api/src/controllers/track.controller.ts (11 hunks)
  • apps/api/src/hooks/duplicate.hook.ts (1 hunks)
  • apps/api/src/hooks/fix.hook.ts (0 hunks)
  • apps/api/src/index.ts (0 hunks)
  • apps/api/src/routes/event.router.ts (1 hunks)
  • apps/api/src/routes/track.router.ts (1 hunks)
  • apps/api/src/utils/deduplicate.ts (1 hunks)
  • apps/worker/src/boot-workers.ts (2 hunks)
💤 Files with no reviewable changes (2)
  • apps/api/src/hooks/fix.hook.ts
  • apps/api/src/index.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • apps/api/src/controllers/profile.controller.ts
  • apps/api/src/controllers/event.controller.ts
🧰 Additional context used
🧬 Code graph analysis (6)
apps/api/src/routes/event.router.ts (1)
apps/api/src/hooks/duplicate.hook.ts (1)
  • duplicateHook (5-21)
apps/api/src/utils/deduplicate.ts (1)
packages/redis/redis.ts (1)
  • getLock (122-125)
apps/api/src/hooks/duplicate.hook.ts (3)
packages/sdks/sdk/index.ts (1)
  • PostEventPayload (10-15)
packages/sdks/sdk/src/index.ts (1)
  • TrackHandlerPayload (3-23)
apps/api/src/utils/deduplicate.ts (1)
  • isDuplicatedEvent (4-34)
apps/api/src/routes/track.router.ts (1)
apps/api/src/hooks/duplicate.hook.ts (1)
  • duplicateHook (5-21)
apps/api/src/controllers/track.controller.ts (3)
packages/sdks/sdk/src/index.ts (4)
  • IdentifyPayload (36-43)
  • track (131-143)
  • increment (172-177)
  • decrement (179-184)
apps/api/src/utils/get-client-ip.ts (1)
  • getClientIp (6-8)
packages/queue/src/queues.ts (1)
  • eventsGroupQueue (129-142)
apps/worker/src/boot-workers.ts (1)
packages/redis/redis.ts (1)
  • getLock (122-125)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-and-push-dashboard
  • GitHub Check: build-and-push-api
  • GitHub Check: build-and-push-worker
🔇 Additional comments (13)
apps/worker/src/boot-workers.ts (1)

38-38: Verify that the 10-second lock timeout is appropriate for event processing duration.

The lock timeout is hardcoded to 10,000ms. If event processing can exceed this duration, the lock will expire and another worker may acquire it, potentially leading to duplicate processing even after the critical issue above is fixed.

Consider making this configurable or verify that typical event processing completes well within 10 seconds. You can add observability to track event processing duration:

const processingStart = performance.now();
await incomingEventPure(job.data);
const duration = performance.now() - processingStart;
logger.info('event processed', {
  jobId: job.id,
  duration,
});
apps/api/src/utils/deduplicate.ts (1)

4-34: LGTM! Enhanced deduplication with IP and origin.

The addition of ip and origin to the deduplication hash correctly scopes duplicate detection per client source, preventing false positives when the same event is legitimately sent from different locations or origins.
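A minimal sketch of such a source-scoped key (names assumed; the real logic lives in apps/api/src/utils/deduplicate.ts):

import { createHash } from 'node:crypto';

// Hash projectId + ip + origin + payload so identical payloads from different
// sources no longer collide on one deduplication key.
function dedupKey(args: {
  projectId: string;
  ip: string;
  origin: string;
  payload: unknown;
}): string {
  return createHash('sha256')
    .update(
      [args.projectId, args.ip, args.origin, JSON.stringify(args.payload)].join('|'),
    )
    .digest('hex');
}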

apps/api/src/routes/track.router.ts (1)

5-11: LGTM! Proper hook registration ordering.

The duplicateHook is correctly registered in the preValidation stage, ensuring duplicate detection occurs before request processing, while maintaining the existing preHandler hooks for client and bot checks.

apps/api/src/routes/event.router.ts (1)

5-11: LGTM! Consistent hook integration.

The duplicate detection hook is properly integrated with the same pattern as the track router, ensuring consistent behavior across event ingestion endpoints.

apps/api/package.json (1)

16-41: No compatibility issues found; upgrade is safe to proceed.

The Fastify 5.2.1 → 5.6.1 upgrade spans four minor versions with primarily stabilization and improvements. Release notes review identified two notable changes: (1) "fix: do not freeze request.routeOptions" (5.4.0) and (2) "feat: move router options to own key" (5.5.0). Codebase verification confirms neither impacts this application—routeOptions is unused and Fastify router configuration is not present (the codebase uses tRPC routing integration). HTTP/2 session handling fixes in 5.6.1 are also not applicable. The fastify.listen() call safely accepts the new abort signal support introduced in 5.5.0. The remaining changes across these versions are backward-compatible bug fixes, TypeScript type enhancements, and documentation updates.

apps/api/src/controllers/track.controller.ts (8)

3-3: LGTM: Import cleanup.

The removal of unused path import is appropriate.


39-43: LGTM: Safer property access.

The defensive check for 'properties' in body.payload before accessing __identify prevents potential undefined access errors.


100-106: LGTM: Defensive check for projectId.

The early return when projectId is missing prevents downstream errors and provides a clear error response.


227-227: LGTM: Timestamp type changed to number.

The signature change from string to number improves type safety and aligns with the performance improvements in this PR. However, this depends on getTimestamp being fixed to reliably return numeric values.


236-238: LGTM: JobId generation for queue deduplication.

The jobId generation provides a reasonable identifier for queue jobs by combining event name, timestamp, projectId, deviceId, and groupId.

Also applies to: 255-255


249-249: LGTM: Explicit uaInfo field.

Passing uaInfo as a separate field in the queue payload improves clarity and aligns with the event restructuring mentioned in the AI summary.


179-183: LGTM: Explicit returns for error cases.

The explicit return statements for unsupported operations improve control flow clarity and prevent unintended fall-through.

Also applies to: 200-204


137-165: Review comment is incorrect - parseUserAgent already safely handles undefined.

The concern is invalid. The parseUserAgent function accepts ua?: string | null, and line 111 guards against falsy values with if (!ua) return parsedServerUa;. When called at line 230 with headers['user-agent'] (which can be undefined per the headers type), the function safely returns a default value. No runtime failure occurs.

Likely an incorrect or invalid review comment.

Comment on lines +11 to +16
const isDuplicate = await isDuplicatedEvent({
ip: req.clientIp ?? '',
origin: req.headers.origin ?? '',
payload: req.body,
projectId: (req.headers['openpanel-client-id'] as string) || '',
});

⚠️ Potential issue | 🟠 Major

Empty string fallbacks may cause false-positive duplicates.

When clientIp or origin are missing, falling back to empty strings means all requests without these values will share the same deduplication key, incorrectly flagging legitimate requests as duplicates.

Consider either:

  1. Skipping deduplication when required fields are missing
  2. Using unique fallback values (e.g., a UUID or sentinel value per request)
  3. Treating missing values as a validation error

Apply this diff to skip deduplication when critical fields are missing:

 export async function duplicateHook(
   req: FastifyRequest<{
     Body: PostEventPayload | TrackHandlerPayload;
   }>,
   reply: FastifyReply,
 ) {
+  // Skip deduplication if required fields are missing
+  if (!req.clientIp || !req.headers.origin || !req.headers['openpanel-client-id']) {
+    return;
+  }
+
   const isDuplicate = await isDuplicatedEvent({
-    ip: req.clientIp ?? '',
-    origin: req.headers.origin ?? '',
+    ip: req.clientIp,
+    origin: req.headers.origin,
     payload: req.body,
-    projectId: (req.headers['openpanel-client-id'] as string) || '',
+    projectId: req.headers['openpanel-client-id'] as string,
   });
 
   if (isDuplicate) {
     return reply.status(200).send('Duplicate event');
   }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const isDuplicate = await isDuplicatedEvent({
ip: req.clientIp ?? '',
origin: req.headers.origin ?? '',
payload: req.body,
projectId: (req.headers['openpanel-client-id'] as string) || '',
});
export async function duplicateHook(
req: FastifyRequest<{
Body: PostEventPayload | TrackHandlerPayload;
}>,
reply: FastifyReply,
) {
// Skip deduplication if required fields are missing
if (!req.clientIp || !req.headers.origin || !req.headers['openpanel-client-id']) {
return;
}
const isDuplicate = await isDuplicatedEvent({
ip: req.clientIp,
origin: req.headers.origin,
payload: req.body,
projectId: req.headers['openpanel-client-id'] as string,
});
if (isDuplicate) {
return reply.status(200).send('Duplicate event');
}
}


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (6)
packages/db/src/buffers/event-buffer.ts (3)

52-95: CSV builder breaks JSON and NULL semantics; fix quoting/escaping.

This issue was previously flagged but remains unresolved:

  • Replacing all " with ' on line 57 corrupts JSON objects in the properties field
  • NULLs encoded as '' (line 54) instead of \N (ClickHouse's default CSV null marker) cause parse/default issues

Apply this diff to fix the escaping logic:

-  private eventToCsvRow(event: IClickhouseEvent): string {
-    const escapeCsvValue = (value: any): string => {
-      if (value === null || value === undefined) return '';
-      const str = String(value);
-      // Replace double quotes with single quotes, then escape single quotes by doubling them
-      const withSingleQuotes = str.replace(/"/g, "'");
-      return `'${withSingleQuotes.replace(/'/g, "''")}'`;
-    };
-
-    // Order matches the ClickHouse table schema exactly
-    const columns = [
+  private eventToCsvRow(event: IClickhouseEvent): string {
+    // RFC4180-style escaping with ClickHouse defaults:
+    // - Nulls as \N (unquoted)
+    // - Double quotes escaped by doubling them
+    const escapeCsvField = (value: any): string => {
+      if (value === null || value === undefined) return '\\N';
+      const s = String(value);
+      const needsQuote = /[",\n\r]/.test(s);
+      const escaped = s.replace(/"/g, '""');
+      return needsQuote ? `"${escaped}"` : escaped;
+    };
+
+    // Order must match the ClickHouse table schema
+    const columns = [
       event.id,
       event.name,
       event.sdk_name,
       event.sdk_version,
       event.device_id,
       event.profile_id,
       event.project_id,
       event.session_id,
       event.path,
       event.origin,
       event.referrer,
       event.referrer_name,
       event.referrer_type,
       event.duration,
-      escapeCsvValue(JSON.stringify(event.properties)),
+      JSON.stringify(event.properties ?? {}), // properties (keep valid JSON)
       event.created_at,
       event.country,
       event.city,
       event.region,
       event.longitude,
       event.latitude,
       event.os,
       event.os_version,
       event.browser,
       event.browser_version,
       event.device,
       event.brand,
       event.model,
       event.imported_at,
     ];
-
-    return columns.join(',');
+    return columns.map(escapeCsvField).join(',');
   }

If you rely on non-default null handling, set input_format_csv_null_representation explicitly to \N in clickhouse_settings.
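For example (extending the insert call above; the setting pass-through follows @clickhouse/client's clickhouse_settings option):

await ch.insert({
  table: 'events',
  values: csvStream,
  format: 'CSV',
  clickhouse_settings: {
    // Make the CSV null marker explicit instead of relying on server defaults.
    input_format_csv_null_representation: '\\N',
  },
});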


162-168: Race condition: session_end may delete screen_view prematurely.

This issue was previously flagged but remains unresolved: If events arrive out of order (common in distributed systems), a session_end event could delete the last screen_view before later events in the same session are processed, preventing proper enrichment.

Consider one of these approaches:

  1. Remove the deletion logic and rely on TTL expiration (you already have a 1-hour TTL on line 26)
  2. Add sequence numbers to events and only delete if session_end is truly the last event
  3. Use a longer TTL and let Redis naturally expire old entries

Since you already have a 1-hour TTL on screen_view entries (line 158), relying solely on TTL expiration may be sufficient.
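Option 1 would reduce to a sketch like this (key name and variables assumed), letting Redis expire the entry instead of deleting it on session_end:

// Write the last screen_view with a 1-hour TTL and never delete it explicitly;
// out-of-order session_end events can then no longer drop it prematurely.
await redis.set(
  `event_buffer:last_screen_view:${sessionId}`,
  JSON.stringify(event),
  'EX',
  60 * 60,
);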


449-477: Add idempotency via insert_deduplication_token to prevent duplicates on retries.

This issue was previously flagged but remains unresolved: A failure mid-chunk can reprocess earlier rows, causing duplicates. Use a stable token per chunk for idempotent inserts.

Apply this diff:

+import { createHash } from 'node:crypto';
+
 // ...
 
   private async processEventsChunk(events: string[]) {
     // ...
     
     for (const chunk of this.chunks(eventsToClickhouse, this.chunkSize)) {
+      const dedupToken = createHash('sha1')
+        .update(chunk.map((e) => e.id).join(','))
+        .digest('hex');
+      
       if (process.env.USE_CSV === 'true' || process.env.USE_CSV === '1') {
         const csvRows = chunk.map((event) => this.eventToCsvRow(event));
         const csv = [this.getCsvHeaders(), ...csvRows].join('\n');
         const csvStream = Readable.from(csv, { objectMode: false });
         
         await ch.insert({
           table: 'events',
           values: csvStream,
           format: 'CSV',
           clickhouse_settings: {
             input_format_csv_skip_first_lines: '1',
             format_csv_allow_single_quotes: 1,
             format_csv_allow_double_quotes: 1,
+            insert_deduplication_token: dedupToken,
           },
         });
         
         await getRedisCache().incrby('event:buffer:csv:counter', chunk.length);
       } else {
         await getRedisCache().incrby('event:buffer:json:counter', chunk.length);
         await ch.insert({
           table: 'events',
           values: chunk,
           format: 'JSONEachRow',
+          clickhouse_settings: {
+            insert_deduplication_token: dedupToken,
+          },
         });
       }
     }
   }
apps/api/src/controllers/track.controller.ts (3)

58-86: CRITICAL: Timestamp type safety issue remains unresolved.

The previous review comment on lines 62-86 correctly identified that safeTimestamp may be a string (depending on Fastify's typing), yet it's compared numerically at line 77 and returned to callers expecting a number. This can cause incorrect timestamp comparisons and type mismatches.

Apply the previously suggested fix to ensure both server and client timestamps are robustly coerced to numeric milliseconds.


296-316: MAJOR: Internal properties still being persisted.

The previous review comment correctly identified that spreading payload.properties before UA/geo fields may persist transient/private keys like __ip, __timestamp, __os, __identify, etc. These internal properties should be stripped before persisting to the profile.

Apply the previously suggested fix to filter out keys starting with "__".


95-101: MAJOR: IP spoofing vulnerability and unsafe assertions remain.

The previous review comment correctly identified that allowing arbitrary properties.__ip enables IP spoofing (affects geo, deviceId, and deduplication logic). Additionally, the non-null assertions on getClientIp(request)! and request.headers['user-agent']! can cause runtime crashes.

Apply the previously suggested fix to validate IP format, prefer server-derived IPs, and remove non-null assertions.

🧹 Nitpick comments (4)
packages/db/src/buffers/event-buffer.ts (1)

146-148: Clarify purpose of unused counter.

The counter event:buffer:counter is incremented here but never read within this file. Is this for external monitoring/metrics, or should it be cleaned up?

apps/api/src/routes/track.router.ts (1)

29-34: Consider parsing counter values to numbers for consistency.

Redis GET operations return string values (or null). The metrics are returned as strings, which might be unexpected for API consumers who expect numeric counters. Consider parsing them to numbers or documenting this behavior.

If numeric values are preferred:

       return reply.send({
-        track: trackCounter,
-        queue: queueCounter,
-        eventBufferCsv: eventBufferCsvCounter,
-        eventBufferJson: eventBufferJsonCounter,
+        track: trackCounter ? parseInt(trackCounter, 10) : 0,
+        queue: queueCounter ? parseInt(queueCounter, 10) : 0,
+        eventBufferCsv: eventBufferCsvCounter ? parseInt(eventBufferCsvCounter, 10) : 0,
+        eventBufferJson: eventBufferJsonCounter ? parseInt(eventBufferJsonCounter, 10) : 0,
       });
apps/api/src/controllers/track.controller.ts (2)

140-140: Consider restoring explicit type annotation.

The explicit Promise<void>[] type annotation was removed. While TypeScript inference works, explicit typing improves readability and catches type errors earlier.

Apply this diff to restore the type annotation:

-      const promises = [];
+      const promises: Promise<void>[] = [];

245-245: Consider non-blocking counter increment or error handling.

The synchronous await on Redis counter increment adds latency to the critical tracking path. If Redis is slow or unavailable, this will degrade track request performance.

Consider one of these approaches:

  1. Fire-and-forget (no await):
-  await getRedisCache().incr('track:counter');
+  getRedisCache().incr('track:counter').catch(() => {/* ignore */});
  2. Add timeout and error handling:
-  await getRedisCache().incr('track:counter');
+  await getRedisCache().incr('track:counter').catch((err) => {
+    log('Failed to increment counter', { error: err.message });
+  });
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c20b1f9 and c3e6ed8.

📒 Files selected for processing (5)
  • apps/api/src/controllers/track.controller.ts (10 hunks)
  • apps/api/src/routes/track.router.ts (1 hunks)
  • apps/worker/src/boot-workers.ts (2 hunks)
  • apps/worker/src/jobs/events.incoming-event.ts (3 hunks)
  • packages/db/src/buffers/event-buffer.ts (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • apps/worker/src/boot-workers.ts
  • apps/worker/src/jobs/events.incoming-event.ts
🧰 Additional context used
🧬 Code graph analysis (3)
packages/db/src/buffers/event-buffer.ts (5)
packages/redis/redis.ts (3)
  • Redis (9-9)
  • getRedisCache (66-72)
  • getRedisPub (84-90)
packages/db/src/services/event.service.ts (3)
  • IClickhouseEvent (65-99)
  • IServiceEvent (142-179)
  • transformEvent (101-135)
packages/db/src/clickhouse/client.ts (1)
  • ch (131-161)
packages/json/index.ts (1)
  • getSafeJson (3-9)
packages/redis/publisher.ts (1)
  • publishEvent (28-36)
apps/api/src/routes/track.router.ts (4)
apps/api/src/hooks/duplicate.hook.ts (1)
  • duplicateHook (5-21)
apps/api/src/hooks/client.hook.ts (1)
  • clientHook (5-23)
apps/api/src/hooks/is-bot.hook.ts (1)
  • isBotHook (12-51)
packages/redis/redis.ts (1)
  • getRedisCache (66-72)
apps/api/src/controllers/track.controller.ts (6)
packages/sdks/sdk/src/index.ts (2)
  • IdentifyPayload (36-43)
  • TrackPayload (25-29)
apps/api/src/utils/get-client-ip.ts (1)
  • getClientIp (6-8)
packages/geo/src/geo.ts (1)
  • GeoLocation (36-42)
packages/common/server/parser-user-agent.ts (1)
  • parseUserAgent (107-129)
packages/redis/redis.ts (1)
  • getRedisCache (66-72)
packages/queue/src/queues.ts (1)
  • eventsGroupQueue (129-142)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-and-push-dashboard
  • GitHub Check: build-and-push-api
  • GitHub Check: build-and-push-worker
🔇 Additional comments (6)
packages/db/src/buffers/event-buffer.ts (2)

36-46: LGTM: Clean parallel processing setup.

The constructor properly initializes the Redis client and enables parallel processing via environment flag, allowing multiple workers to safely claim batches concurrently.


488-598: LGTM: Robust buffer size tracking with counter optimization.

The buffer size methods use counter optimization with LLEN fallbacks, ensuring accuracy while maintaining performance. The atomic counter updates in pushToRetry and clearDLQ are particularly important for parallel worker safety.

apps/api/src/controllers/track.controller.ts (4)

42-56: LGTM: Defensive property access.

The conditional check for 'properties' in body.payload before accessing __identify prevents runtime errors when the payload doesn't have a properties field.


155-167: LGTM: Improved concurrent execution.

Pushing the track() call to the promises array enables concurrent execution with identify() via Promise.all, improving performance.


264-282: LGTM: Correct numeric orderMs for queue ordering.

The change from string to numeric timestamp for orderMs correctly aligns with the GroupQueue's ordering mechanism.


242-245: The original review comment is based on an incorrect assumption about jobId usage.

The jobId is passed as a data field in the queue payload, not as the queue's job identifier. The queue library auto-generates its own job IDs independently. Since jobId serves only as tracking/logging metadata and is not used for queue processing logic, an empty or missing jobId would not cause queue processing issues. Even if payload.name is empty and filtered out, the remaining components (timestamp, projectId, currentDeviceId, groupId) remain in the jobId string, providing sufficient uniqueness for tracking purposes.

Likely an incorrect or invalid review comment.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (5)
apps/api/src/routes/track.router.ts (4)

10-10: Scope duplicateHook to POST route only (avoid GET body assumptions).

Running this hook for GET routes can crash due to missing req.body.

-  fastify.addHook('preValidation', duplicateHook);
@@
   fastify.route({
     method: 'POST',
     url: '/',
+    preValidation: duplicateHook,
     handler: handler,

11-12: Limit clientHook and isBotHook to POST ingestion.

These expect SDK payload/auth and are not applicable to metrics endpoints.

-  fastify.addHook('preHandler', clientHook);
-  fastify.addHook('preHandler', isBotHook);
@@
   fastify.route({
     method: 'POST',
     url: '/',
+    preHandler: [clientHook, isBotHook],
     handler: handler,

14-39: Protect /metrics, add error handling, and return numeric counters.

Add admin auth, wrap Redis calls in try/catch, coerce to numbers, and reuse a single Redis instance.

   fastify.route({
     method: 'GET',
     url: '/metrics',
     handler: async (req, reply) => {
-      const [
+      // Simple admin/auth check (adjust to your auth model)
+      const token = req.headers['x-internal-token'] as string | undefined;
+      if (token !== process.env.INTERNAL_METRICS_TOKEN) {
+        return reply.status(401).send('Unauthorized');
+      }
+
+      try {
+        const redis = getRedisCache();
+        const [
         trackCounter,
         queueCounter,
         eventBufferCounter,
         eventBufferCsvCounter,
         eventBufferJsonCounter,
-      ] = await Promise.all([
-        getRedisCache().get('track:counter'),
-        getRedisCache().get('queue:counter'),
-        getRedisCache().get('event:buffer:counter'),
-        getRedisCache().get('event:buffer:csv:counter'),
-        getRedisCache().get('event:buffer:json:counter'),
-      ]);
-      return reply.send({
-        track: trackCounter,
-        queue: queueCounter,
-        eventBuffer: eventBufferCounter,
-        eventBufferCsv: eventBufferCsvCounter,
-        eventBufferJson: eventBufferJsonCounter,
-      });
+      ] = await Promise.all([
+        redis.get('track:counter'),
+        redis.get('queue:counter'),
+        redis.get('event:buffer:counter'),
+        redis.get('event:buffer:csv:counter'),
+        redis.get('event:buffer:json:counter'),
+      ]);
+      const toNum = (v: string | null) => (v ? Number(v) : 0);
+      return reply.send({
+        track: toNum(trackCounter),
+        queue: toNum(queueCounter),
+        eventBuffer: toNum(eventBufferCounter),
+        eventBufferCsv: toNum(eventBufferCsvCounter),
+        eventBufferJson: toNum(eventBufferJsonCounter),
+      });
+      } catch (error) {
+        req.log.error({ err: error }, 'Failed to fetch metrics');
+        return reply.status(500).send('Internal server error');
+      }
     },
   });

41-53: Critical: destructive GET without auth/response/handling. Switch to POST, auth, and reply.

Protect, handle errors, and send a response body.

   fastify.route({
-    method: 'GET',
+    method: 'POST',
     url: '/metrics/reset',
     handler: async (req, reply) => {
-      await Promise.all([
-        getRedisCache().del('track:counter'),
-        getRedisCache().del('queue:counter'),
-        getRedisCache().del('event:buffer:counter'),
-        getRedisCache().del('event:buffer:csv:counter'),
-        getRedisCache().del('event:buffer:json:counter'),
-      ]);
+      const token = req.headers['x-internal-token'] as string | undefined;
+      if (token !== process.env.INTERNAL_METRICS_TOKEN) {
+        return reply.status(401).send('Unauthorized');
+      }
+      try {
+        const redis = getRedisCache();
+        await Promise.all([
+          redis.del('track:counter'),
+          redis.del('queue:counter'),
+          redis.del('event:buffer:counter'),
+          redis.del('event:buffer:csv:counter'),
+          redis.del('event:buffer:json:counter'),
+        ]);
+        return reply.send({ success: true, message: 'Metrics reset successfully' });
+      } catch (error) {
+        req.log.error({ err: error }, 'Failed to reset metrics');
+        return reply.status(500).send('Internal server error');
+      }
     },
   });
apps/worker/src/boot-workers.ts (1)

41-56: Critical: Lock is acquired but not enforced, allowing duplicate event processing.

The lock is checked but the event is processed regardless of whether the lock was successfully acquired. When getLock returns false, it means another worker already holds the lock and is processing this event. The current worker should skip processing in that case.

This issue was previously identified and remains unresolved.

Apply this diff to enforce the lock and prevent duplicate processing:

-      if (await getLock(job.id, '1', 10000)) {
-        logger.info('worker handler', {
-          jobId: job.id,
-          groupId: job.groupId,
-          timestamp: job.data.event.timestamp,
-          data: job.data,
-        });
-      } else {
-        logger.info('event already processed', {
-          jobId: job.id,
-          groupId: job.groupId,
-          timestamp: job.data.event.timestamp,
-          data: job.data,
-        });
-      }
-      await incomingEventPure(job.data);
+      if (await getLock(job.id, '1', 10000)) {
+        logger.info('worker handler', {
+          jobId: job.id,
+          groupId: job.groupId,
+          timestamp: job.data.event.timestamp,
+          data: job.data,
+        });
+        await incomingEventPure(job.data);
+      } else {
+        logger.info('event already processed', {
+          jobId: job.id,
+          groupId: job.groupId,
+          timestamp: job.data.event.timestamp,
+          data: job.data,
+        });
+      }
🧹 Nitpick comments (1)
apps/worker/src/boot-workers.ts (1)

41-41: Consider making lock parameters configurable.

The lock timeout (10000 ms) and value ('1') are hardcoded. Consider:

  • Making the timeout configurable via an environment variable for operational flexibility
  • Using a more meaningful lock value (e.g., worker ID or hostname) for better observability

Example:

-      if (await getLock(job.id, '1', 10000)) {
+      const lockTimeout = Number.parseInt(process.env.EVENT_LOCK_TIMEOUT_MS || '10000', 10);
+      const lockValue = process.env.WORKER_ID || process.pid.toString();
+      if (await getLock(job.id, lockValue, lockTimeout)) {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f0817e and c192d45.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (5)
  • apps/api/package.json (2 hunks)
  • apps/api/src/routes/track.router.ts (1 hunks)
  • apps/worker/package.json (1 hunks)
  • apps/worker/src/boot-workers.ts (2 hunks)
  • packages/queue/package.json (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/queue/package.json
🧰 Additional context used
🧬 Code graph analysis (2)
apps/worker/src/boot-workers.ts (1)
packages/redis/redis.ts (1)
  • getLock (122-125)
apps/api/src/routes/track.router.ts (4)
apps/api/src/hooks/duplicate.hook.ts (1)
  • duplicateHook (5-21)
apps/api/src/hooks/client.hook.ts (1)
  • clientHook (5-23)
apps/api/src/hooks/is-bot.hook.ts (1)
  • isBotHook (12-51)
packages/redis/redis.ts (1)
  • getRedisCache (66-72)
🔇 Additional comments (5)
apps/worker/package.json (1)

27-27: Verify groupmq 1.0.1-next.3 compatibility with queue integration.

The update from 1.0.0-next.19 to 1.0.1-next.3 is a pre-release bump to an internal package. Confirm that the updated groupmq version works with the queue-related changes described in the PR objectives (Redis list ingestion).

apps/api/package.json (2)

16-16: Confirm Fastify v5.6.1 and plugin compatibility with current codebase.

These Fastify ecosystem updates are minor/patch bumps within the v5.x series. While semver dictates backward compatibility, verify that:

  • All code aligns with v5 APIs (e.g., logger vs. loggerInstance, route versioning constraints, HEAD route changes per v5 migration guide)
  • Fastify plugin versions are tested together
  • Performance improvements from v5.6.1 are realized in the intended code paths

Also applies to: 18-20, 38-38


41-41: Verify groupmq 1.0.1-next.3 integration with performance feature.

The groupmq pre-release bump aligns with the PR objective of Redis list ingestion. Ensure the updated queue package is compatible with the new event buffering strategy described in the PR.

apps/worker/src/boot-workers.ts (2)

13-13: LGTM!

The import of getLock is correctly added and properly used in the event handler.


37-39: LGTM!

The blockingTimeoutSec configuration is properly implemented with a sensible default value.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c192d45 and 4e901cc.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (4)
  • apps/api/package.json (2 hunks)
  • apps/worker/package.json (1 hunks)
  • packages/queue/package.json (1 hunks)
  • packages/queue/src/queues.ts (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/queue/package.json
  • apps/worker/package.json
🔇 Additional comments (5)
apps/api/package.json (2)

41-41: Clarify groupmq pre-release version pin.

The update from 1.0.0-next.19 to 1.0.1-next.6 appears to downgrade the pre-release identifier (next.19 → next.6) while bumping the patch version. Confirm this is intentional, as it may indicate a regression or a deliberate pin to an older pre-release snapshot.


38-38: No action required—fastify 5.2.1 → 5.6.1 is a safe update.

The jump across multiple patch versions contains no breaking changes. The update includes a security patch (v5.3.2 content-type validation fix), TypeScript improvements, HTTP/2 fixes, and general stability enhancements—all compatible with the codebase and aligned with the PR's performance objectives.

packages/queue/src/queues.ts (3)

23-43: Well-structured discriminated union.

The uaInfo property uses a clean discriminated union pattern with isServer as the discriminant. The readonly modifiers and literal types for the server variant provide strong type safety.
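Roughly the shape being described (field names beyond the discriminant are assumptions):

type UaInfo =
  | { readonly isServer: true; readonly device: 'server' }
  | {
      isServer: false;
      os?: string;
      osVersion?: string;
      browser?: string;
      browserVersion?: string;
      device?: string;
      brand?: string;
      model?: string;
    };

// The isServer discriminant narrows the union at use sites:
function deviceLabel(ua: UaInfo): string {
  return ua.isServer ? 'server' : (ua.device ?? 'unknown');
}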


129-129: Verify Redis capacity for 100x increase in job retention.

The keepCompleted value increased from 10 to 1,000, which means Redis will retain 100x more completed jobs. Confirm that:

  1. This increase aligns with your Redis capacity and memory planning
  2. The retention is necessary for debugging/monitoring purposes
  3. There's no risk of memory pressure under high load

20-20: No issues found; timestamp type change is safe.

The type change from string to string | number is safe because new Date() accepts both types and immediately converts the timestamp to a Date object (line 85 in events.incoming-event.ts). All downstream code receives and expects Date values, not raw timestamps. The conversion is consistent throughout the codebase and poses no compatibility issues.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
packages/queue/package.json (1)

14-14: Pre-release pinning is intentional but document the rationale.

The groupmq library is a Redis-backed per-group FIFO queue, and pinning the exact pre-release version (1.0.1-next.9) across the monorepo is consistent. However, using a pre-release in a performance feature PR should be documented—either in the PR description or in a comment—explaining whether this pre-release is temporary pending a stable 1.0.1 release or if it's required for the performance improvements.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 36f57d2 and f617a13.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (3)
  • apps/api/package.json (2 hunks)
  • apps/worker/package.json (1 hunks)
  • packages/queue/package.json (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/worker/package.json
🔇 Additional comments (2)
apps/api/package.json (2)

16-20: Fastify ecosystem upgrades are aligned and compatible.

Fastify 5.6.1 is the latest version, and your bumps from 5.2.1 to 5.6.1 (alongside ecosystem plugin updates) are semver-compatible within the caret range. Fastify may release breaking changes as minor version releases, but such changes are always noted in release notes. Since your codebase was already migrated to v5, these patch-level updates should be safe.

If integration tests or end-to-end tests are part of your CI pipeline, ensure they pass before merging to confirm compatibility with the Fastify 5.2→5.6 jump.

Also applies to: 38-38


41-41: Monorepo consistency: groupmq pinning matches across packages.

Both packages/queue/package.json and apps/api/package.json pin groupmq to 1.0.1-next.9 exactly, which ensures deterministic builds. As noted in the prior review, document why the pre-release version is necessary for this performance feature.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
packages/queue/src/queues.ts (3)

117-120: Add validation for environment variable parsing.

The past review comment about orderingDelayMs being unused is now resolved—it's correctly used on line 137. However, there's no validation that Number.parseInt doesn't return NaN when the environment variable contains an invalid value. If ORDERING_DELAY_MS=invalid, orderingDelayMs would be NaN, which could cause unexpected behavior in the GroupQueue configuration.

Apply this diff to add validation:

 const orderingDelayMs = Number.parseInt(
   process.env.ORDERING_DELAY_MS || '100',
   10,
 );
+if (Number.isNaN(orderingDelayMs)) {
+  throw new Error('ORDERING_DELAY_MS must be a valid number');
+}

122-126: Add validation for auto-batch environment variables.

Similar to orderingDelayMs, there's no validation that Number.parseInt returns a valid number for autoBatchMaxWaitMs and autoBatchSize. Invalid environment variable values would result in NaN, which could cause silent failures or unexpected behavior.

Apply this diff to add validation:

 const autoBatchMaxWaitMs = Number.parseInt(
   process.env.AUTO_BATCH_MAX_WAIT_MS || '0',
   10,
 );
 const autoBatchSize = Number.parseInt(process.env.AUTO_BATCH_SIZE || '0', 10);
+if (Number.isNaN(autoBatchMaxWaitMs) || Number.isNaN(autoBatchSize)) {
+  throw new Error('AUTO_BATCH_MAX_WAIT_MS and AUTO_BATCH_SIZE must be valid numbers');
+}

138-144: Consider logging when auto-batch configuration is incomplete.

The autoBatch configuration requires both AUTO_BATCH_MAX_WAIT_MS and AUTO_BATCH_SIZE to be set and non-zero. If only one environment variable is configured, autoBatch silently becomes undefined without any warning. While this behavior is likely intentional (both parameters are required for batching), consider adding a log message when one is set but the other is missing or zero, to help with configuration troubleshooting.

Apply this diff to add a warning:

+if ((autoBatchMaxWaitMs > 0) !== (autoBatchSize > 0)) {
+  queueLogger.warn(
+    'Auto-batch partially configured: both AUTO_BATCH_MAX_WAIT_MS and AUTO_BATCH_SIZE must be set for batching to be enabled'
+  );
+}
+
 export const eventsGroupQueue = new GroupQueue<
   EventsQueuePayloadIncomingEvent['payload']
 >({
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 83c6c2f and b51692d.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (4)
  • apps/api/package.json (2 hunks)
  • apps/worker/package.json (1 hunks)
  • packages/queue/package.json (1 hunks)
  • packages/queue/src/queues.ts (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • apps/worker/package.json
  • packages/queue/package.json
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-and-push-dashboard
  • GitHub Check: build-and-push-worker
  • GitHub Check: build-and-push-api
🔇 Additional comments (5)
packages/queue/src/queues.ts (3)

23-43: Well-structured discriminated union for user agent information.

The uaInfo property uses a clean discriminated union pattern with isServer as the discriminant. The type safety ensures server-side events have fixed metadata values while client-side events carry optional device/browser details.


135-135: Verify Redis memory impact of 100x increase in completed job retention.

The keepCompleted value has been increased from 10 to 1,000—a 100x increase. While this may improve observability and debugging capabilities, it will significantly increase Redis memory usage, especially under high event throughput. Ensure that your Redis instance has sufficient memory capacity and that this retention policy aligns with your operational requirements.


20-20: Verify downstream handling of numeric timestamps—queue consumer not found in codebase.

The timestamp type has been broadened from string to string | number at line 20 of packages/queue/src/queues.ts. While the PostEventPayload SDK interface keeps timestamps as strings only, the queue now accepts both types, creating a type union that downstream processors must handle correctly.

The actual queue processor for eventsGroupQueue (defined at lines 128-134) could not be located in the searchable codebase—it likely resides in the ./apps directory or is abstracted by the GroupQueue library. Before merging, ensure that:

  • Queue message processors handle both string and numeric timestamp formats without type errors or unexpected coercions
  • Any timestamp-to-Date conversions (e.g., new Date(timestamp)) work correctly with both string and number inputs (see the sketch after this list)
  • ClickHouse insertions or other downstream systems accept the resulting timestamp format
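One subtlety worth checking there: a numeric epoch works with new Date(), but the same digits passed as a string do not, because strings go through date-string parsing:

new Date(1700000000000).toISOString(); // '2023-11-14T22:13:20.000Z'
new Date('1700000000000').getTime(); // NaN: digit strings are not parsed as epochs
new Date('2023-11-14T22:13:20.000Z'); // ISO strings parse fine

// Sketch of a normalizer for string | number timestamps:
function toDate(ts: string | number): Date {
  return new Date(typeof ts === 'string' && /^\d+$/.test(ts) ? Number(ts) : ts);
}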
apps/api/package.json (2)

16-16: No security vulnerabilities detected; updates are safe.

The npm audit confirms no security advisories for any of the updated Fastify packages (@fastify/[email protected], @fastify/[email protected], @fastify/[email protected], @fastify/[email protected], and [email protected]). The ^ semver constraint further guarantees compatibility without breaking changes.


41-41: All dependency updates are verified as safe—no breaking changes detected.

Verification confirms:

  • groupmq@1.1.0 exists and is actively maintained (published 2025-09-23, last modified today)
  • No documented breaking changes in groupmq v1.1.0
  • Fastify updates (v5.2.1 → v5.6.1) contain only minor/patch fixes and security updates; no new breaking changes in this range

The pre-release status is acceptable given the active maintenance and lack of reported stability issues.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (7)
packages/queue/src/queues.ts (1)

127-130: The orderingDelayMs variable is actually used in the queue configuration.

Despite the past review comment claiming this variable is unused, it is passed to the GroupQueue constructor on line 149 as orderingDelayMs: orderingDelayMs. The past review comment appears to be incorrect.

apps/api/src/controllers/track.controller.ts (4)

224-224: Type the log parameter properly.

The past review correctly identified that the log parameter is typed as any, losing type safety.

Based on past review feedback, use ILogger or a more specific logger type.

Also applies to: 234-234


62-86: Address timestamp type safety issues from previous review.

The past review correctly identified that safeTimestamp and clientTimestampNumber should be strictly numeric, but the code still has type safety gaps. The timestamp parameter could be a string, and clientTimestamp.getTime() could return NaN if the date parsing fails.

Based on past review feedback, apply robust timestamp coercion as suggested.


96-100: Client-supplied __ip enables IP spoofing—a security risk.

The past review correctly identified that trusting properties.__ip from clients allows IP spoofing, which affects geolocation, device fingerprinting, and deduplication. The code should validate the IP format and preferably only accept it from trusted server-side clients.

Based on past review feedback, validate IP addresses and restrict client-supplied IPs to trusted sources.


303-315: Filter out internal __ properties before persisting to profile.

The past review correctly identified that spreading payload.properties before adding geo/UA fields may persist internal properties like __ip, __timestamp, etc. These should be filtered out.

Based on past review feedback, filter properties starting with __ before merging.

apps/worker/src/boot-workers.ts (2)

43-58: CRITICAL: Lock is not enforced—events are processed even when another worker holds the lock.

The code checks whether getLock succeeds but always calls incomingEventPure(job.data) on line 58, regardless of the lock result. This defeats the purpose of the lock and allows multiple workers to process the same event concurrently, leading to duplicate processing, data races, and inconsistent state.

Apply the fix from the previous review:

      handler: async (job) => {
        if (await getLock(job.id, '1', 10000)) {
          logger.info('worker handler', {
            jobId: job.id,
            groupId: job.groupId,
            timestamp: job.data.event.timestamp,
            data: job.data,
          });
+         await incomingEventPure(job.data);
        } else {
          logger.info('event already processed', {
            jobId: job.id,
            groupId: job.groupId,
            timestamp: job.data.event.timestamp,
            data: job.data,
          });
        }
-       await incomingEventPure(job.data);
      },

48-48: Logging job.data exposes PII including profileId and user properties.

The log statements on lines 48 and 55 include the full job.data object, which contains:

  • profileId (user identifier)
  • event.properties (arbitrary user-provided data)
  • geo coordinates
  • device IDs

These qualify as PII under GDPR/CCPA and should not be logged unless necessary and properly anonymized.

Remove data: job.data from the log statements:

        if (await getLock(job.id, '1', 10000)) {
          logger.info('worker handler', {
            jobId: job.id,
            groupId: job.groupId,
            timestamp: job.data.event.timestamp,
-           data: job.data,
          });
          await incomingEventPure(job.data);
        } else {
          logger.info('event already processed', {
            jobId: job.id,
            groupId: job.groupId,
            timestamp: job.data.event.timestamp,
-           data: job.data,
          });
        }

Also applies to: 55-55

🧹 Nitpick comments (4)
docker-compose.yml (1)

15-26: Verify the purpose and scope of the new DragonflyDB service.

A new high-performance in-memory data store (DragonflyDB) has been added alongside the existing Redis service. This aligns with the performance refactor theme, but several questions need clarification:

  1. What is the role split between op-kv (Redis) and op-df (DragonflyDB)? Are they both required, or is DragonflyDB replacing Redis?
  2. How will the application decide which service to use? Is this environment-variable driven, or hardcoded?
  3. Is DragonflyDB intended for local development only, or will it be used in production? If production, ensure appropriate observability and scaling strategies are in place.
  4. The cluster_mode=emulated and lock_on_hashtags settings are appropriate for local testing, but confirm these are not intended for production use.

I can help verify the application configuration to confirm how both services are integrated. Would you like me to search the codebase for environment variable usage or service connection logic?

packages/queue/package.json (1)

14-14: Verify groupmq version compatibility.

The groupmq version appears to go from 1.0.0-next.19 to 1.1.0-next.5, which suggests a minor version bump (1.0→1.1) but with a lower pre-release number (next.19→next.5). Ensure this version is correct and compatible with the new sharding features introduced in this PR.

packages/queue/src/queues.ts (1)

132-157: Validate auto-batch configuration to prevent misconfiguration.

The auto-batch feature is only enabled when both autoBatchMaxWaitMs and autoBatchSize are non-zero. However, there's no validation to ensure these values are sensible (e.g., maxWaitMs should be positive, size should be > 0). Invalid configurations would silently disable batching.

Add validation for the auto-batch configuration:

  const autoBatchMaxWaitMs = Number.parseInt(
    process.env.AUTO_BATCH_MAX_WAIT_MS || '0',
    10,
  );
  const autoBatchSize = Number.parseInt(process.env.AUTO_BATCH_SIZE || '0', 10);
+
+ if ((autoBatchMaxWaitMs > 0) !== (autoBatchSize > 0)) {
+   queueLogger.warn('Auto-batch partially configured; both AUTO_BATCH_MAX_WAIT_MS and AUTO_BATCH_SIZE must be set');
+ }

  export const eventsGroupQueues = Array.from({
    length: EVENTS_GROUP_QUEUES_SHARDS,
  }).map(
    (_, index) =>
      new GroupQueue<EventsQueuePayloadIncomingEvent['payload']>({
        logger: queueLogger,
        namespace: `{group_events_${index}}`,
        // @ts-expect-error - TODO: Fix this in groupmq
        redis: getRedisGroupQueue(),
        keepCompleted: 1_000,
        keepFailed: 10_000,
        orderingDelayMs: orderingDelayMs,
        autoBatch:
-         autoBatchMaxWaitMs && autoBatchSize
+         autoBatchMaxWaitMs > 0 && autoBatchSize > 0
            ? {
                maxWaitMs: autoBatchMaxWaitMs,
                size: autoBatchSize,
              }
            : undefined,
      }),
  );
apps/worker/src/boot-workers.ts (1)

31-65: Verify worker concurrency and blocking timeout configuration.

The workers are configured with EVENT_JOB_CONCURRENCY and EVENT_BLOCKING_TIMEOUT_SEC from environment variables. Ensure these are properly documented and that the blocking timeout is appropriate for your event processing time. A too-short timeout could cause jobs to be redelivered while still processing; a too-long timeout delays failure detection.

Consider adding validation and logging for these configuration values:

const concurrency = Number.parseInt(
  process.env.EVENT_JOB_CONCURRENCY || '1',
  10,
);
const blockingTimeoutSec = Number.parseFloat(
  process.env.EVENT_BLOCKING_TIMEOUT_SEC || '1',
);

if (concurrency < 1) {
  logger.warn('EVENT_JOB_CONCURRENCY must be >= 1, using default 1');
}
if (blockingTimeoutSec < 0.1) {
  logger.warn('EVENT_BLOCKING_TIMEOUT_SEC too low, may cause redelivery issues');
}

logger.info('Starting event workers', {
  shards: eventsGroupQueues.length,
  concurrency,
  blockingTimeoutSec,
});
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 46d7978 and dbb4657.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (11)
  • apps/api/package.json (2 hunks)
  • apps/api/src/controllers/event.controller.ts (3 hunks)
  • apps/api/src/controllers/track.controller.ts (10 hunks)
  • apps/api/src/utils/graceful-shutdown.ts (2 hunks)
  • apps/worker/package.json (1 hunks)
  • apps/worker/src/boot-workers.ts (3 hunks)
  • apps/worker/src/index.ts (2 hunks)
  • apps/worker/src/metrics.ts (5 hunks)
  • docker-compose.yml (2 hunks)
  • packages/queue/package.json (1 hunks)
  • packages/queue/src/queues.ts (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • apps/worker/src/metrics.ts
  • apps/worker/package.json
🧰 Additional context used
🧬 Code graph analysis (6)
apps/worker/src/boot-workers.ts (3)
packages/queue/src/queues.ts (3)
  • eventsGroupQueues (138-158)
  • EventsQueuePayloadIncomingEvent (25-65)
  • queueLogger (23-23)
packages/redis/redis.ts (1)
  • getLock (122-125)
apps/worker/src/jobs/events.incoming-event.ts (1)
  • incomingEventPure (52-212)
apps/api/src/utils/graceful-shutdown.ts (1)
packages/queue/src/queues.ts (1)
  • eventsGroupQueues (138-158)
packages/queue/src/queues.ts (4)
packages/common/server/crypto.ts (1)
  • createHash (62-66)
packages/logger/index.ts (1)
  • createLogger (11-100)
packages/sdks/sdk/src/index.ts (1)
  • TrackPayload (25-29)
packages/redis/redis.ts (2)
  • getRedisGroupQueue (108-120)
  • getRedisQueue (93-105)
apps/worker/src/index.ts (1)
packages/queue/src/queues.ts (1)
  • eventsGroupQueues (138-158)
apps/api/src/controllers/event.controller.ts (1)
packages/queue/src/queues.ts (1)
  • getEventsGroupQueueShard (160-167)
apps/api/src/controllers/track.controller.ts (7)
packages/sdks/sdk/src/index.ts (5)
  • IdentifyPayload (36-43)
  • track (131-143)
  • increment (172-177)
  • decrement (179-184)
  • TrackPayload (25-29)
apps/api/src/utils/get-client-ip.ts (1)
  • getClientIp (6-8)
packages/geo/src/geo.ts (1)
  • GeoLocation (36-42)
packages/common/server/parser-user-agent.ts (1)
  • parseUserAgent (107-129)
packages/common/src/id.ts (1)
  • generateId (7-9)
packages/redis/redis.ts (1)
  • getRedisCache (66-72)
packages/queue/src/queues.ts (1)
  • getEventsGroupQueueShard (160-167)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-and-push-dashboard
  • GitHub Check: build-and-push-api
  • GitHub Check: build-and-push-worker
🔇 Additional comments (8)
docker-compose.yml (2)

1-1: Clarify version specification.

The version field is now the string "3", which is explicit and YAML-compliant. If the stack depends on features gated behind a specific Compose schema version, pin a minor version such as "3.8" or "3.9"; bare "3" works but is less precise about feature support.

Verify if this is intentional for local development only or if the version should be more specific.


33-33: Redis command formatting consistency improved.

The command has been updated from shell-style single quotes to JSON-style double quotes, aligning with the DragonflyDB service definition above. This is a minor formatting improvement and both formats are functionally equivalent.

apps/api/package.json (1)

16-20: LGTM! Dependency updates align with ecosystem.

The Fastify ecosystem updates (@fastify/compress, @fastify/cors, @fastify/rate-limit, @fastify/websocket, fastify) and groupmq version bump align with the queue sharding infrastructure changes introduced in this PR.

Also applies to: 38-38, 41-41

apps/api/src/controllers/event.controller.ts (1)

62-79: LGTM! Shard-based queue distribution properly implemented.

The change from a single queue to shard-based queue access via getEventsGroupQueueShard(projectId) properly distributes load across multiple queue shards. The enriched payload with uaInfo and jobId provides better traceability and processing context.

packages/queue/src/queues.ts (2)

14-21: LGTM! Crypto-based sharding properly distributes projects across queues.

The sharding implementation uses SHA1 hashing to deterministically distribute projects across EVENTS_GROUP_QUEUES_SHARDS queues. This provides balanced load distribution while ensuring events from the same project consistently go to the same shard for ordered processing.
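For reference, the routing can be pictured with a minimal sketch; the shard-count env var and the byte-slicing here are assumptions, not the verbatim queues.ts code:

import { createHash } from 'node:crypto';

const SHARDS = Number.parseInt(process.env.EVENTS_GROUP_QUEUES_SHARDS || '4', 10);

export function pickShard(projectId: string): number {
  // The first 4 bytes of the SHA-1 digest, read as an unsigned int, give a
  // deterministic and roughly uniform projectId -> shard mapping.
  const digest = createHash('sha1').update(projectId).digest();
  return digest.readUInt32BE(0) % SHARDS;
}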


169-213: LGTM! Redis cluster hash tags ensure correct key distribution.

The queue names are properly wrapped in braces (e.g., {sessions}, {cron}) to use Redis cluster hash tags, ensuring related keys are stored on the same Redis node for efficient operations.
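To illustrate the hash-tag behavior (the key names below are examples, not the queues' actual keys):

// In Redis Cluster, only the substring inside the first {braces} is hashed,
// so all of these keys map to the same slot and therefore the same node:
const keys = ['{sessions}:wait', '{sessions}:active', '{sessions}:meta'];
// Without the shared tag, each full key would hash to its own slot and
// multi-key commands (MULTI/EXEC, Lua scripts) would fail with CROSSSLOT.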

apps/api/src/utils/graceful-shutdown.ts (1)

74-74: LGTM! Graceful shutdown properly handles all queue shards.

The change from closing a single queue to mapping over eventsGroupQueues ensures all queue shards are properly closed during graceful shutdown, preventing data loss or connection leaks.

apps/worker/src/index.ts (1)

36-38: LGTM! BullBoard integration properly displays all queue shards.

The change maps over eventsGroupQueues to create a separate BullBoard adapter for each shard, providing visibility into all queue shards in the monitoring UI.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
apps/api/src/controllers/track.controller.ts (2)

332-346: Fix increment default; value || 1 breaks when value is 0.

Use nullish coalescing so 0 is respected.

   profile.properties = assocPath(
     property.split('.'),
-    parsed + (value || 1),
+    parsed + (value ?? 1),
     profile.properties,
   );

368-381: Fix decrement default; value || 1 breaks when value is 0.

   profile.properties = assocPath(
     property.split('.'),
-    parsed - (value || 1),
+    parsed - (value ?? 1),
     profile.properties,
   );
apps/api/src/controllers/event.controller.ts (1)

23-25: Remove non-null assertions for IP/UA; use safe fallbacks.

-  const ip = getClientIp(request)!;
-  const ua = request.headers['user-agent']!;
+  const ip = getClientIp(request) ?? request.ip;
+  const ua =
+    typeof request.headers['user-agent'] === 'string'
+      ? request.headers['user-agent']
+      : undefined;
apps/worker/src/boot-workers.ts (1)

227-251: Remove PII from failure/completion logs.

Avoid logging job.data which may contain properties, geo, device IDs, identifiers.

     (worker as Worker).on('failed', (job) => {
       if (job) {
         logger.error('job failed', {
           jobId: job.id,
           worker: worker.name,
-          data: job.data,
           error: job.failedReason,
           options: job.opts,
         });
       }
     });
@@
     (worker as Worker).on('completed', (job) => {
       if (job) {
         logger.info('job completed', {
           jobId: job.id,
           worker: worker.name,
-          data: job.data,
           elapsed:
             job.processedOn && job.finishedOn
               ? job.finishedOn - job.processedOn
               : undefined,
         });
       }
     });
♻️ Duplicate comments (9)
apps/api/src/controllers/track.controller.ts (7)

224-235: Type the log parameter explicitly.

 async function track({
@@
   isTimestampFromThePast,
   log,
 }: {
@@
   timestamp: number;
   isTimestampFromThePast: boolean;
-  log: any;
+  log: ILogger['info'];
 }) {

242-244: Make jobId components explicit strings; don’t drop 0.

Stringify timestamp and avoid .filter(Boolean) which drops 0/false.

-  const jobId = [payload.name, timestamp, projectId, currentDeviceId, groupId]
-    .filter(Boolean)
+  const jobId = [payload.name, String(timestamp), projectId, currentDeviceId, groupId]
+    .filter((x) => x !== undefined && x !== null && x !== '')
     .join('-');

62-86: Normalize timestamps to numeric milliseconds; avoid string comparisons.

Current code may compare numbers to strings and return non-numeric timestamps. Coerce both server and client timestamps robustly.

Apply:

 export function getTimestamp(
   timestamp: FastifyRequest['timestamp'],
   payload: TrackHandlerPayload['payload'],
 ) {
-  const safeTimestamp = timestamp || Date.now();
-  const userDefinedTimestamp =
-    'properties' in payload
-      ? (payload?.properties?.__timestamp as string | undefined)
-      : undefined;
+  const toMs = (v: unknown): number | undefined => {
+    if (typeof v === 'number' && Number.isFinite(v)) return v;
+    if (typeof v === 'string') {
+      const n = Number(v);
+      if (Number.isFinite(n)) return n;
+      const d = Date.parse(v);
+      if (Number.isFinite(d)) return d;
+    }
+    return undefined;
+  };
+
+  const safeTimestamp = toMs(timestamp) ?? Date.now();
+  const userDefinedTimestamp =
+    'properties' in payload ? (payload?.properties as any)?.__timestamp : undefined;
@@
-  const clientTimestamp = new Date(userDefinedTimestamp);
-  const clientTimestampNumber = clientTimestamp.getTime();
+  const clientTimestampNumber = toMs(userDefinedTimestamp);
@@
-    Number.isNaN(clientTimestampNumber) ||
-    clientTimestampNumber > safeTimestamp
+    !Number.isFinite(clientTimestampNumber!) ||
+    (clientTimestampNumber as number) > safeTimestamp
   ) {
     return { timestamp: safeTimestamp, isTimestampFromThePast: false };
   }
 
   return {
-    timestamp: clientTimestampNumber,
+    timestamp: clientTimestampNumber as number,
     isTimestampFromThePast: true,
   };
 }

95-101: Do not trust client-supplied __ip; remove non-null assertions for IP/UA.

Validate __ip, prefer server-derived IPs, and avoid ! on getClientIp and headers.

-  const ip =
-    'properties' in request.body.payload &&
-    request.body.payload.properties?.__ip
-      ? (request.body.payload.properties.__ip as string)
-      : getClientIp(request)!;
-  const ua = request.headers['user-agent']!;
+  const ipCandidate =
+    'properties' in request.body.payload
+      ? (request.body.payload.properties as any)?.__ip
+      : undefined;
+  const ip =
+    (typeof ipCandidate === 'string' && isIP(ipCandidate) ? ipCandidate : undefined) ??
+    getClientIp(request) ??
+    request.ip;
+  const ua =
+    typeof request.headers['user-agent'] === 'string'
+      ? request.headers['user-agent']
+      : undefined;

246-263: Avoid logging PII; remove full data object from logs.

data contains properties, geo, device IDs, profileId. Log only safe metadata.

-  log('track handler', {
-    jobId: jobId,
-    groupId: groupId,
-    timestamp: timestamp,
-    data: {
-      projectId,
-      headers,
-      event: {
-        ...payload,
-        timestamp,
-        isTimestampFromThePast,
-      },
-      uaInfo,
-      geo,
-      currentDeviceId,
-      previousDeviceId,
-    },
-  });
+  log('track handler', {
+    jobId,
+    groupId,
+    projectId,
+    eventName: payload.name,
+    timestamp,
+    isTimestampFromThePast,
+  });

302-315: Strip internal (“__*”) properties before persisting profile.properties.

Prevents saving transient private keys like __ip, __timestamp, overrides, etc.

   properties: {
-      ...(payload.properties ?? {}),
+      ...Object.fromEntries(
+        Object.entries(payload.properties ?? {}).filter(([k]) => !k.startsWith('__')),
+      ),
       country: geo.country,
       city: geo.city,
       region: geo.region,
       longitude: geo.longitude,
       latitude: geo.latitude,
       os: uaInfo.os,
       os_version: uaInfo.osVersion,
       browser: uaInfo.browser,
       browser_version: uaInfo.browserVersion,
       device: uaInfo.device,
       brand: uaInfo.brand,
       model: uaInfo.model,
   },

1-13: Add IP validation helper import for safe client IP handling.

 import { getClientIp } from '@/utils/get-client-ip';
 import type { FastifyReply, FastifyRequest } from 'fastify';
 import { assocPath, pathOr, pick } from 'ramda';
 
 import { logger } from '@/utils/logger';
+import { isIP } from 'node:net';
apps/api/src/controllers/event.controller.ts (1)

53-61: Make jobId components explicit strings; don’t drop 0.

-  const jobId = [
-    request.body.name,
-    timestamp,
-    projectId,
-    currentDeviceId,
-    groupId,
-  ]
-    .filter(Boolean)
-    .join('-');
+  const jobId = [request.body.name, String(timestamp), projectId, currentDeviceId, groupId]
+    .filter((x) => x !== undefined && x !== null && x !== '')
+    .join('-');
apps/worker/src/boot-workers.ts (1)

132-148: Critical: process event only when lock is acquired.

incomingEventPure(job.data) runs regardless of lock result, allowing duplicates. Also avoid logging full job.data.

       handler: async (job) => {
-        if (await getLock(job.id, '1', 10000)) {
-          logger.info('worker handler', {
-            jobId: job.id,
-            groupId: job.groupId,
-            timestamp: job.data.event.timestamp,
-            data: job.data,
-          });
-        } else {
-          logger.info('event already processed', {
-            jobId: job.id,
-            groupId: job.groupId,
-            timestamp: job.data.event.timestamp,
-            data: job.data,
-          });
-        }
-        await incomingEventPure(job.data);
+        if (await getLock(job.id, '1', 10000)) {
+          logger.info('worker handler', {
+            jobId: job.id,
+            groupId: job.groupId,
+            timestamp: job.data.event.timestamp,
+          });
+          await incomingEventPure(job.data);
+        } else {
+          logger.info('event already processed', {
+            jobId: job.id,
+            groupId: job.groupId,
+            timestamp: job.data.event.timestamp,
+          });
+        }
       },
🧹 Nitpick comments (1)
apps/api/src/controllers/event.controller.ts (1)

63-63: Use numeric timestamp directly for orderMs (after normalization).

-    orderMs: new Date(timestamp).getTime(),
+    orderMs: timestamp,
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dbb4657 and 560df69.

📒 Files selected for processing (5)
  • apps/api/src/controllers/event.controller.ts (3 hunks)
  • apps/api/src/controllers/track.controller.ts (10 hunks)
  • apps/worker/src/boot-workers.ts (2 hunks)
  • apps/worker/src/utils/singleton-lock.ts (1 hunks)
  • packages/queue/src/queues.ts (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
apps/worker/src/boot-workers.ts (3)
packages/redis/redis.ts (2)
  • getRedisQueue (93-105)
  • getLock (122-125)
packages/queue/src/queues.ts (8)
  • EVENTS_GROUP_QUEUES_SHARDS (14-17)
  • eventsGroupQueues (141-161)
  • EventsQueuePayloadIncomingEvent (28-68)
  • queueLogger (26-26)
  • sessionsQueue (172-178)
  • cronQueue (184-190)
  • notificationQueue (207-216)
  • miscQueue (192-198)
apps/worker/src/utils/singleton-lock.ts (1)
  • requireSingleton (12-69)
apps/api/src/controllers/event.controller.ts (1)
packages/queue/src/queues.ts (1)
  • getEventsGroupQueueShard (163-170)
apps/worker/src/utils/singleton-lock.ts (1)
packages/redis/redis.ts (2)
  • getLock (122-125)
  • getRedisCache (66-72)
apps/api/src/controllers/track.controller.ts (6)
packages/sdks/sdk/src/index.ts (2)
  • IdentifyPayload (36-43)
  • TrackPayload (25-29)
apps/api/src/utils/get-client-ip.ts (1)
  • getClientIp (6-8)
packages/geo/src/geo.ts (1)
  • GeoLocation (36-42)
packages/common/server/parser-user-agent.ts (1)
  • parseUserAgent (107-129)
packages/redis/redis.ts (1)
  • getRedisCache (66-72)
packages/queue/src/queues.ts (1)
  • getEventsGroupQueueShard (163-170)
packages/queue/src/queues.ts (3)
packages/logger/index.ts (1)
  • createLogger (11-100)
packages/sdks/sdk/src/index.ts (1)
  • TrackPayload (25-29)
packages/redis/redis.ts (2)
  • getRedisGroupQueue (108-120)
  • getRedisQueue (93-105)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-and-push-dashboard
  • GitHub Check: build-and-push-worker
  • GitHub Check: build-and-push-api
🔇 Additional comments (1)
packages/queue/src/queues.ts (1)

141-160: No action needed—orderingDelayMs is valid in groupmq 1.1.0-next.5.

The option orderingDelayMs is confirmed as a Queue option that controls how long GroupMQ waits for earlier-ordered jobs before releasing later jobs. The codebase uses groupmq version 1.1.0-next.5, which supports this option. The code at line 152 is correct as-is.

Comment on lines 57 to 69
// Return cleanup function
return () => {
clearInterval(extensionInterval);
getRedisCache()
.del(lockKey)
.then(() => {
logger.info('Released singleton lock', { key });
})
.catch((error: unknown) => {
logger.error('Failed to release singleton lock', { key, error });
});
};
}
Contributor

⚠️ Potential issue | 🔴 Critical

Release lock only if owned; avoid deleting another process’s lock.

   return () => {
     clearInterval(extensionInterval);
-    getRedisCache()
-      .del(lockKey)
-      .then(() => {
-        logger.info('Released singleton lock', { key });
-      })
-      .catch((error: unknown) => {
-        logger.error('Failed to release singleton lock', { key, error });
-      });
+    (async () => {
+      try {
+        const res: number = (await getRedisCache().eval(
+          "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end",
+          1,
+          lockKey,
+          lockValue,
+        )) as unknown as number;
+        if (res === 1) {
+          logger.info('Released singleton lock', { key });
+        } else {
+          logger.warn('Did not release singleton lock (not owner)', { key });
+        }
+      } catch (error: unknown) {
+        logger.error('Failed to release singleton lock', { key, error });
+      }
+    })();
   };

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
apps/worker/src/jobs/events.create-session-end.ts (1)

36-42: Critical: SQL injection vulnerability in query construction.

The SQL query uses unsanitized string interpolation with user-controlled inputs (sessionId, projectId), creating a severe SQL injection vulnerability. An attacker could execute arbitrary SQL commands by crafting malicious session or project IDs.

Replace string interpolation with parameterized queries. ClickHouse supports parameterized queries through query parameters:

-  const sql = `
-    SELECT * FROM ${TABLE_NAMES.events} 
-    WHERE 
-      session_id = '${sessionId}' 
-      AND project_id = '${projectId}'
-      AND created_at BETWEEN '${formatClickhouseDate(startAt)}' AND '${formatClickhouseDate(endAt)}'
-    ORDER BY created_at DESC LIMIT ${MAX_SESSION_EVENTS};
-  `;
+  const sql = `
+    SELECT * FROM ${TABLE_NAMES.events} 
+    WHERE 
+      session_id = {sessionId:String} 
+      AND project_id = {projectId:String}
+      AND created_at BETWEEN {startAt:DateTime} AND {endAt:DateTime}
+    ORDER BY created_at DESC LIMIT {maxEvents:UInt32};
+  `;
+  
+  const params = {
+    sessionId,
+    projectId,
+    startAt,
+    endAt,
+    maxEvents: MAX_SESSION_EVENTS,
+  };

Then pass params to your ClickHouse query execution method. Verify that getEvents supports parameterized queries and update accordingly.

Note: While this code wasn't modified in this PR, the vulnerability is critical enough to warrant immediate attention.
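For example, with @clickhouse/client the parameters ride along as query_params; the table name, limit, and client setup below are placeholders, not the project's actual wiring:

import { createClient } from '@clickhouse/client';

const ch = createClient({ url: process.env.CLICKHOUSE_URL });

async function getSessionEvents(sessionId: string, projectId: string) {
  const result = await ch.query({
    query: `
      SELECT *
      FROM events
      WHERE session_id = {sessionId:String}
        AND project_id = {projectId:String}
      ORDER BY created_at DESC
      LIMIT {maxEvents:UInt32}
    `,
    // Values are sent separately from the SQL text, so they cannot
    // break out of their parameter slots.
    query_params: { sessionId, projectId, maxEvents: 500 },
    format: 'JSONEachRow',
  });
  return result.json();
}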

packages/db/src/buffers/profile-buffer.ts (1)

131-134: Remove unused variable.

The cacheKey variable is defined but never used in this method since the refactor moved cache key construction to fetchFromCache.

Apply this diff to remove the dead code:

-    const cacheKey = this.getProfileCacheKey({
-      profileId: profile.id,
-      projectId: profile.project_id,
-    });
-
     const existingProfile = await this.fetchFromCache(
       profile.id,
       profile.project_id,
packages/db/src/buffers/session-buffer.ts (1)

191-192: Fix copy/paste in error log message.

The message logs "Failed to add bot event" even though this is SessionBuffer; it should reference session events.

-      this.logger.error('Failed to add bot event', { error });
+      this.logger.error('Failed to add session event', { error });
apps/worker/src/boot-workers.ts (1)

208-264: Fix worker.name access and event handler compatibility for GroupWorker instances.

The code casts all workers to (worker as Worker) and accesses worker.name at 7 locations (lines 211, 218, 224, 233, 239, 252, 261). However, BullMQ Worker receives the name as its first constructor parameter, while GroupWorker receives only a queue object with no name property set. This causes worker.name to be undefined for GroupWorker instances, resulting in incorrect metric labels and logging output.

Additionally, GroupWorker supports only the events: ready, completed, failed, error, closed, and graceful-timeout, but the code registers a handler for 'ioredis:close' (line 259) which is not emitted by GroupWorker and will never fire.

Recommended fix:

  • Extract the queue name and assign it to the worker before registering event handlers, or
  • Access queue.name directly in metrics and logging instead of worker.name
  • Remove or conditionally handle the 'ioredis:close' event handler for GroupWorker instances
♻️ Duplicate comments (10)
packages/db/src/buffers/base-buffer.ts (2)

15-26: Add documentation for the enableParallelProcessing option.

The constructor still lacks JSDoc explaining when to use parallel vs sequential mode, requirements, and trade-offs.


10-10: Fix type mismatch for onFlush property.

The property is declared as () => void but should be () => Promise<void> to match the constructor parameter (line 17) and the awaited calls (lines 102, 130).

packages/db/src/buffers/event-buffer.ts (3)

52-95: CSV builder corrupts JSON and NULLs; switch to RFC4180 quoting and \N for null.

Single-quote substitution breaks JSON and ClickHouse CSV semantics.

   /**
    * Convert event to CSV row format with headers
    * Order must match the ClickHouse table schema
    */
   private eventToCsvRow(event: IClickhouseEvent): string {
-    const escapeCsvValue = (value: any): string => {
-      if (value === null || value === undefined) return '';
-      const str = String(value);
-      // Replace double quotes with single quotes, then escape single quotes by doubling them
-      const withSingleQuotes = str.replace(/"/g, "'");
-      return `'${withSingleQuotes.replace(/'/g, "''")}'`;
-    };
+    // RFC4180-style; ClickHouse default null is \N (unquoted)
+    const escapeCsvField = (value: any): string => {
+      if (value === null || value === undefined) return '\\N';
+      const s = String(value);
+      const needsQuote = /[",\n\r]/.test(s);
+      const escaped = s.replace(/"/g, '""');
+      return needsQuote ? `"${escaped}"` : escaped;
+    };
@@
-    // Order matches the ClickHouse table schema exactly
+    // Order matches the ClickHouse table schema exactly
     const columns = [
       event.id, // id UUID
       event.name, // name
       event.sdk_name, // sdk_name
       event.sdk_version, // sdk_version
       event.device_id, // device_id
       event.profile_id, // profile_id
       event.project_id, // project_id
       event.session_id, // session_id
       event.path, // path
       event.origin, // origin
       event.referrer, // referrer
       event.referrer_name, // referrer_name
       event.referrer_type, // referrer_type
       event.duration, // duration
-      escapeCsvValue(JSON.stringify(event.properties)), // properties
+      JSON.stringify(event.properties ?? {}), // properties as valid JSON
       event.created_at, // created_at
       event.country, // country
       event.city, // city
       event.region, // region
       event.longitude, // longitude
       event.latitude, // latitude
       event.os, // os
       event.os_version, // os_version
       event.browser, // browser
       event.browser_version, // browser_version
       event.device, // device
       event.brand, // brand
       event.model, // model
       event.imported_at, // imported_at
     ];
-
-    return columns.join(',');
+    return columns.map(escapeCsvField).join(',');
   }

Also update CSV insert settings below to remove single-quote mode and explicitly set null representation (see next comment).
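Assuming the escapeCsvField helper above, a few illustrative outputs:

escapeCsvField(null);                     // '\\N'          -> ClickHouse NULL
escapeCsvField('plain');                  // 'plain'        -> unquoted
escapeCsvField('a,b');                    // '"a,b"'        -> quoted (contains comma)
escapeCsvField('say "hi"');               // '"say ""hi"""' -> inner quotes doubled
escapeCsvField(JSON.stringify({ a: 1 })); // '"{""a"":1}"'  -> valid JSON once unquoted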


1-1: Add idempotency token and use TABLE_NAMES for inserts (CSV and JSON).

Prevents duplicates on retries; aligns with table constants.

-import { Readable } from 'node:stream';
+import { Readable } from 'node:stream';
+import { createHash } from 'node:crypto';
@@
-import { ch } from '../clickhouse/client';
+import { TABLE_NAMES, ch } from '../clickhouse/client';
@@
-    // Insert events into ClickHouse in chunks using CSV format with headers
+    // Insert events into ClickHouse in chunks using CSV format with headers
     for (const chunk of this.chunks(eventsToClickhouse, this.chunkSize)) {
+      const dedupToken = createHash('sha1')
+        .update(chunk.map((e) => e.id).join(','))
+        .digest('hex');
       if (process.env.USE_CSV === 'true' || process.env.USE_CSV === '1') {
@@
-        await ch.insert({
-          table: 'events',
+        await ch.insert({
+          table: TABLE_NAMES.events,
           values: csvStream,
           format: 'CSV',
           clickhouse_settings: {
             input_format_csv_skip_first_lines: '1',
-            format_csv_allow_single_quotes: 1,
-            format_csv_allow_double_quotes: 1,
+            // RFC4180 double-quoted fields only
+            format_csv_allow_double_quotes: 1,
+            input_format_csv_null_representation: '\\N',
+            insert_deduplication_token: dedupToken,
           },
         });
@@
-      } else {
+      } else {
         await getRedisCache().incrby('event:buffer:json:counter', chunk.length);
-        await ch.insert({
-          table: 'events',
+        await ch.insert({
+          table: TABLE_NAMES.events,
           values: chunk,
           format: 'JSONEachRow',
+          clickhouse_settings: {
+            insert_deduplication_token: dedupToken,
+          },
         });
       }
     }

Also applies to: 9-15, 448-477


161-168: Avoid deleting last screen_view on session_end; races with out-of-order events.

Rely on TTL to expire; immediate DEL can drop needed enrichment.

-      // Clear last screen_view on session_end
-      if (event.name === 'session_end' && event.profile_id) {
-        const lastEventKey = this.getLastEventKey({
-          projectId: event.project_id,
-          profileId: event.profile_id,
-        });
-        multi.del(lastEventKey);
-      }
+      // Do not delete immediately; rely on TTL to avoid out-of-order races
apps/worker/src/utils/singleton-lock.ts (2)

32-55: Extend lock only if owned; avoid overwriting another process’s lock.

SET ... XX can refresh a lock we don’t own.

-  const extensionInterval = setInterval(async () => {
+  const extensionInterval = setInterval(async () => {
     try {
-      // Extend the lock by setting it again with the same value
-      const redis = getRedisCache();
-      const result = await redis.set(lockKey, lockValue, 'PX', ttlMs, 'XX');
-
-      if (result === 'OK') {
+      // Atomically extend only if we still own the lock
+      const redis = getRedisCache();
+      const extended: number = (await redis.eval(
+        "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('pexpire', KEYS[1], ARGV[2]) else return 0 end",
+        1,
+        lockKey,
+        lockValue,
+        String(ttlMs),
+      )) as unknown as number;
+
+      if (extended === 1) {
         logger.debug('Extended singleton lock', { key });
       } else {
         // Lock was lost (someone else acquired it or it expired)
         logger.error('Lost singleton lock - exiting', { key });
         clearInterval(extensionInterval);
         process.exit(1);
       }
     } catch (error: unknown) {
       logger.error('Failed to extend singleton lock - exiting', {
         key,
         error,
       });
       clearInterval(extensionInterval);
       process.exit(1);
     }
   }, ttlMs / 2);

57-69: Release only if owned to avoid deleting someone else’s lock.

Use atomic check-and-del.

   return () => {
     clearInterval(extensionInterval);
-    getRedisCache()
-      .del(lockKey)
-      .then(() => {
-        logger.debug('Released singleton lock', { key });
-      })
-      .catch((error: unknown) => {
-        logger.error('Failed to release singleton lock', { key, error });
-      });
+    (async () => {
+      try {
+        const res: number = (await getRedisCache().eval(
+          "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end",
+          1,
+          lockKey,
+          lockValue,
+        )) as unknown as number;
+        if (res === 1) {
+          logger.debug('Released singleton lock', { key });
+        } else {
+          logger.warn('Did not release singleton lock (not owner)', { key });
+        }
+      } catch (error: unknown) {
+        logger.error('Failed to release singleton lock', { key, error });
+      }
+    })();
   };
apps/worker/src/boot-workers.ts (3)

240-240: Major: Logging job.data on failure exposes PII.

As noted in previous reviews, job.data contains PII including profileId, event.properties, geo coordinates, and device IDs. This logging violates GDPR/CCPA compliance requirements.

Apply this diff:

         logger.error('job failed', {
           jobId: job.id,
           worker: worker.name,
-          data: job.data,
           error: job.failedReason,
           options: job.opts,
         });

133-148: Critical: Lock is still not enforced, allowing duplicate event processing.

This issue was previously reported but remains unresolved. The lock is checked at line 133, but incomingEventPure(job.data) is called at line 148 regardless of whether the lock was acquired. When getLock returns false, another worker is already processing this event, and the current worker should skip processing.

Apply this diff to enforce the lock:

-      handler: async (job) => {
-        if (await getLock(job.id, '1', 10000)) {
-          logger.info('worker handler', {
-            jobId: job.id,
-            groupId: job.groupId,
-            timestamp: job.data.event.timestamp,
-            data: job.data,
-          });
-        } else {
-          logger.info('event already processed', {
-            jobId: job.id,
-            groupId: job.groupId,
-            timestamp: job.data.event.timestamp,
-            data: job.data,
-          });
-        }
-        await incomingEventPure(job.data);
-      },
+      handler: async (job) => {
+        if (await getLock(job.id, '1', 10000)) {
+          logger.info('worker handler', {
+            jobId: job.id,
+            groupId: job.groupId,
+            timestamp: job.data.event.timestamp,
+          });
+          await incomingEventPure(job.data);
+        } else {
+          logger.info('event already processed', {
+            jobId: job.id,
+            groupId: job.groupId,
+            timestamp: job.data.event.timestamp,
+          });
+        }
+      },

138-138: Major: Logging job.data exposes PII.

This issue was previously reported but remains unresolved. According to the past review analysis, job.data for EventsQueuePayloadIncomingEvent contains:

  • profileId (user identifier)
  • event.properties (arbitrary user-provided data)
  • geo location coordinates
  • device IDs

These qualify as PII under GDPR/CCPA. Remove the data: job.data from the log statements.

Apply this diff:

         if (await getLock(job.id, '1', 10000)) {
           logger.info('worker handler', {
             jobId: job.id,
             groupId: job.groupId,
             timestamp: job.data.event.timestamp,
-            data: job.data,
           });
         } else {
           logger.info('event already processed', {
             jobId: job.id,
             groupId: job.groupId,
             timestamp: job.data.event.timestamp,
-            data: job.data,
           });
         }

Also applies to: 145-145

🧹 Nitpick comments (3)
packages/db/src/buffers/base-buffer.ts (1)

11-11: Consider making this property readonly.

Since enableParallelProcessing is set once in the constructor and never modified, marking it readonly prevents accidental mutation.

Apply this diff:

-  enableParallelProcessing: boolean;
+  readonly enableParallelProcessing: boolean;
packages/db/src/buffers/bot-buffer.ts (1)

56-59: Filter JSON parse failures before inserting into ClickHouse.

getSafeJson can return null; guard to avoid bad rows.

Apply:

-      const parsedEvents = events.map((e) =>
-        getSafeJson<IClickhouseBotEvent>(e),
-      );
+      const parsedEvents = events
+        .map((e) => getSafeJson<IClickhouseBotEvent>(e))
+        .filter(
+          (e): e is IClickhouseBotEvent => e !== null,
+        );
apps/worker/src/metrics.ts (1)

50-60: Minor: precompute sanitized queue name to avoid repeated replace().

Small readability/alloc win in hot metrics path.

-queues.forEach((queue) => {
+queues.forEach((queue) => {
+  const qname = queue.name.replace(/[\{\}]/g, '');
   register.registerMetric(
     new client.Gauge({
-      name: `${queue.name.replace(/[\{\}]/g, '')}_active_count`,
+      name: `${qname}_active_count`,
@@
-      name: `${queue.name.replace(/[\{\}]/g, '')}_delayed_count`,
+      name: `${qname}_delayed_count`,
@@
-      name: `${queue.name.replace(/[\{\}]/g, '')}_failed_count`,
+      name: `${qname}_failed_count`,
@@
-      name: `${queue.name.replace(/[\{\}]/g, '')}_completed_count`,
+      name: `${qname}_completed_count`,
@@
-      name: `${queue.name.replace(/[\{\}]/g, '')}_waiting_count`,
+      name: `${qname}_waiting_count`,

Also applies to: 62-71, 73-82, 84-93, 95-104

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 560df69 and 3363147.

📒 Files selected for processing (11)
  • apps/api/src/controllers/misc.controller.ts (3 hunks)
  • apps/worker/src/boot-workers.ts (4 hunks)
  • apps/worker/src/jobs/events.create-session-end.ts (1 hunks)
  • apps/worker/src/metrics.ts (5 hunks)
  • apps/worker/src/utils/session-handler.ts (1 hunks)
  • apps/worker/src/utils/singleton-lock.ts (1 hunks)
  • packages/db/src/buffers/base-buffer.ts (4 hunks)
  • packages/db/src/buffers/bot-buffer.ts (1 hunks)
  • packages/db/src/buffers/event-buffer.ts (3 hunks)
  • packages/db/src/buffers/profile-buffer.ts (6 hunks)
  • packages/db/src/buffers/session-buffer.ts (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
apps/api/src/controllers/misc.controller.ts (2)
apps/api/src/utils/logger.ts (1)
  • logger (3-3)
apps/worker/src/utils/logger.ts (1)
  • logger (3-3)
apps/worker/src/metrics.ts (2)
packages/queue/src/queues.ts (3)
  • sessionsQueue (172-178)
  • cronQueue (184-190)
  • eventsGroupQueues (141-161)
packages/db/src/buffers/index.ts (1)
  • eventBuffer (6-6)
apps/worker/src/utils/singleton-lock.ts (1)
packages/redis/redis.ts (2)
  • getLock (122-125)
  • getRedisCache (66-72)
apps/worker/src/boot-workers.ts (4)
packages/redis/redis.ts (2)
  • getRedisQueue (93-105)
  • getLock (122-125)
packages/queue/src/queues.ts (4)
  • EVENTS_GROUP_QUEUES_SHARDS (14-17)
  • eventsGroupQueues (141-161)
  • EventsQueuePayloadIncomingEvent (28-68)
  • queueLogger (26-26)
apps/worker/src/utils/singleton-lock.ts (1)
  • requireSingleton (12-69)
apps/worker/src/metrics.ts (1)
  • eventsGroupJobDuration (18-24)
packages/db/src/buffers/profile-buffer.ts (3)
packages/db/src/services/profile.service.ts (1)
  • IClickhouseProfile (232-242)
packages/redis/redis.ts (1)
  • getRedisCache (66-72)
packages/json/index.ts (1)
  • getSafeJson (3-9)
packages/db/src/buffers/event-buffer.ts (5)
packages/redis/redis.ts (3)
  • Redis (9-9)
  • getRedisCache (66-72)
  • getRedisPub (84-90)
packages/db/src/services/event.service.ts (3)
  • IClickhouseEvent (65-99)
  • IServiceEvent (142-179)
  • transformEvent (101-135)
packages/db/src/clickhouse/client.ts (1)
  • ch (131-161)
packages/json/index.ts (1)
  • getSafeJson (3-9)
packages/redis/publisher.ts (1)
  • publishEvent (28-36)
packages/db/src/buffers/base-buffer.ts (1)
packages/logger/index.ts (1)
  • createLogger (11-100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-and-push-dashboard
  • GitHub Check: build-and-push-worker
  • GitHub Check: build-and-push-api
🔇 Additional comments (14)
apps/worker/src/utils/session-handler.ts (1)

116-116: Logging level change looks appropriate for reducing verbosity.

Downgrading this log from info to debug aligns with the PR's performance objectives and reduces noise from transient states (especially 'active') that can occur during normal retry processing.

Optionally verify that critical state transitions (e.g., unexpected states, excessive retries) are still observable through metrics or other monitoring mechanisms.

apps/worker/src/jobs/events.create-session-end.ts (1)

71-71: LGTM! Log level change is appropriate.

Changing this routine processing message from info to debug reduces log noise while maintaining diagnostic visibility.

packages/db/src/buffers/base-buffer.ts (1)

97-114: LGTM! Parallel mode implementation is sound.

The parallel processing path correctly:

  • Skips lock acquisition for concurrent processing
  • Logs appropriately at debug and error levels
  • Documents why the counter isn't reset on error (relies on periodic resync)
  • Returns early to avoid the sequential path

The design trade-offs are clearly documented inline.

packages/db/src/buffers/profile-buffer.ts (4)

18-20: LGTM!

The TTL configuration is clear and the default of 1 hour is reasonable for profile caching.


100-100: LGTM!

The unified TTL approach simplifies the caching logic.


190-190: LGTM!

The logging level changes from info to debug appropriately reduce verbosity for routine buffer operations.

Also applies to: 202-202, 222-222


148-148: No changes needed—public visibility is intentional.

The fetchFromCache method is correctly exposed as public since it's actively called from packages/db/src/services/profile.service.ts (line 96), confirming this API expansion is intentional and necessary.

apps/api/src/controllers/misc.controller.ts (1)

132-136: No concerns — observability infrastructure already handles visibility.

The codebase uses HyperDX OpenTelemetry with automatic distributed tracing (enabled via HYPERDX_API_KEY). Image processing routes are automatically instrumented and traced, capturing timing, errors, and call context without requiring verbose log statements.

Downgrading these logs to debug aligns with best practices for hot paths: use tracing and metrics for performance signals, reserve logs for errors and contextual diagnosis. The visibility into image processing paths is maintained through automatic tracing—no manual metrics or log-level adjustments are needed.

packages/db/src/buffers/bot-buffer.ts (1)

74-76: Log level reduction to debug is fine.

Noise reduction during steady-state. No functional change.

apps/worker/src/boot-workers.ts (5)

35-64: LGTM! Clean queue selection logic.

The function correctly parses the ENABLED_QUEUES environment variable and provides sensible defaults when not specified. The logging provides good visibility into which queues are being started.


66-82: LGTM! Robust concurrency configuration.

The function properly extracts and validates concurrency settings from environment variables with appropriate fallbacks and input validation.
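Along the same lines, a per-queue concurrency lookup might look like this sketch; the env naming scheme is an assumption:

export function getConcurrencyFor(queueName: string): number {
  const specific = process.env[`${queueName.toUpperCase()}_JOB_CONCURRENCY`];
  const parsed = Number.parseInt(
    specific ?? process.env.JOB_CONCURRENCY ?? '1',
    10,
  );
  // Guard against NaN, zero, and negatives from malformed env values.
  return Number.isFinite(parsed) && parsed >= 1 ? parsed : 1;
}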


85-94: LGTM! Proper singleton enforcement setup.

The singleton lock is correctly acquired when requested, and the cleanup function is stored for proper release during shutdown.


157-200: LGTM! Consistent worker initialization pattern.

The worker setup for sessions, cron, notification, and misc queues follows a clean, consistent pattern with proper concurrency configuration and logging.


266-312: LGTM! Proper graceful shutdown implementation.

The exit handler correctly waits for the cron queue to empty when enabled, closes all workers, and releases the singleton lock. The error handling and logging are appropriate.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.conversions.tsx (1)

23-35: Add enabled condition to gate query execution until columnVisibility is ready.

The conversions route is missing the enabled: columnVisibility !== null condition that exists in the events route (same folder) and all other similar routes using useReadColumnVisibility. Add this condition to the query options object to ensure consistent behavior:

{
  enabled: columnVisibility !== null,
  getNextPageParam: (lastPage) => lastPage.meta.next,
}
packages/geo/src/geo.ts (1)

65-86: Critical: Cache is checked but never populated.

The cache lookup is implemented correctly, but the result is never stored in the cache after a successful geo location retrieval. This means the cache will always miss and serve no purpose.

Apply this diff to populate the cache after successful retrieval:

   try {
     const response = await reader?.city(ip);
-    return {
+    const result = {
       city: response?.city?.names.en,
       country: response?.country?.isoCode,
       region: response?.subdivisions?.[0]?.names.en,
       longitude: response?.location?.longitude,
       latitude: response?.location?.latitude,
     };
+    cache.set(ip, result);
+    return result;
   } catch (error) {
     return DEFAULT_GEO;
   }
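One plausible shape for the cache referenced above is a bounded LRU keyed by IP; the size, TTL, and locally declared type are assumptions for illustration:

import { LRUCache } from 'lru-cache';

// Shape assumed from packages/geo's GeoLocation.
type GeoLocation = {
  city?: string;
  country?: string;
  region?: string;
  longitude?: number;
  latitude?: number;
};

const cache = new LRUCache<string, GeoLocation>({
  max: 50_000,         // bound memory
  ttl: 60 * 60 * 1000, // 1 hour; IP-to-geo mappings change rarely
});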
♻️ Duplicate comments (27)
packages/trpc/src/routers/event.ts (2)

139-144: Fix incorrect field mappings in select object.

The field mappings are incorrect and override user column visibility preferences:

  • city is mapped to columnVisibility?.country instead of columnVisibility?.city
  • path is mapped to columnVisibility?.name instead of columnVisibility?.path
  • duration is mapped to columnVisibility?.name instead of columnVisibility?.duration

Additionally, since ...columnVisibility spreads before other fields, a user could hide the id column, which may break table rendering if id is used for list keys.

Apply this diff to fix the mappings and ensure id is always included:

        select: {
          ...columnVisibility,
+         id: true,
-         city: columnVisibility?.country ?? true,
-         path: columnVisibility?.name ?? true,
-         duration: columnVisibility?.name ?? true,
+         city: columnVisibility?.city ?? true,
+         path: columnVisibility?.path ?? true,
+         duration: columnVisibility?.duration ?? true,
          projectId: false,
        },

210-214: Fix incorrect field mappings in select object.

The same incorrect field mappings from the events query are repeated here:

  • city mapped to columnVisibility?.country instead of columnVisibility?.city
  • path mapped to columnVisibility?.name instead of columnVisibility?.path
  • duration mapped to columnVisibility?.name instead of columnVisibility?.duration

Additionally, ensure id is always included to prevent table rendering issues.

Apply this diff:

        select: {
          ...columnVisibility,
+         id: true,
-         city: columnVisibility?.country ?? true,
-         path: columnVisibility?.name ?? true,
-         duration: columnVisibility?.name ?? true,
+         city: columnVisibility?.city ?? true,
+         path: columnVisibility?.path ?? true,
+         duration: columnVisibility?.duration ?? true,
          projectId: false,
        },
apps/api/src/bots/index.ts (2)

8-9: Restore case-insensitive bot regexes

The pre-compilation dropped the i flag, so patterns like /Googlebot/i now miss mixed-case UAs and regress detection accuracy. Please reinstate the case-insensitive flag.

-      compiledRegex: new RegExp(bot.regex),
+      compiledRegex: new RegExp(bot.regex, 'i'),

20-25: Do not treat Mozilla/5.0 as a browser signature

Keeping 'Mozilla/5.0' in legitimateBrowserPatterns makes hasBrowser true for virtually every modern UA (including bots that mimic real browsers), so the early exit returns null for bots like Googlebot. Remove this token so we require an actual product identifier.

 const legitimateBrowserPatterns = [
-  'Mozilla/5.0', // Nearly all modern browsers
   'Chrome/', // Chrome/Chromium browsers
   'Safari/', // Safari and Chrome-based browsers
   'Firefox/', // Firefox
   'Edg/', // Edge
 ];
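Putting both fixes together, a minimal sketch of the matcher; the sample entries are hypothetical and the real list ships with the package:

type BotEntry = { name: string; regex: string };

// Hypothetical sample entries for illustration only.
const botList: BotEntry[] = [
  { name: 'Googlebot', regex: 'Googlebot' },
  { name: 'Bingbot', regex: 'bingbot' },
];

const compiled = botList.map((bot) => ({
  name: bot.name,
  // Keep the 'i' flag so GOOGLEBOT/googlebot variants still match.
  pattern: new RegExp(bot.regex, 'i'),
}));

export function detectBot(ua: string): string | null {
  const hit = compiled.find((b) => b.pattern.test(ua));
  return hit ? hit.name : null;
}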
packages/db/src/services/event.service.ts (3)

406-420: Critical SQL syntax error: fractional INTERVAL is invalid.

Lines 414 and 419 use INTERVAL ${dateIntervalInDays} DAY with the default value of 0.5, which produces invalid ClickHouse SQL. ClickHouse does not accept fractional values with the INTERVAL n DAY syntax.

Use one of these solutions:

  • Option 1 (recommended): Convert to integer seconds: INTERVAL ${Math.ceil(dateIntervalInDays * 86400)} SECOND
  • Option 2: Use the toIntervalDay() function: toIntervalDay(${dateIntervalInDays})

Apply this diff to fix with integer seconds:

-    select: incomingSelect,
-    dateIntervalInDays = 0.5,
+    select: incomingSelect,
+    dateIntervalInDays = 0.5,
   } = options;
+  const intervalSeconds = Math.ceil(dateIntervalInDays * 86400);
   const { sb, getSql, join } = createSqlBuilder();
 
   if (typeof cursor === 'number') {
     sb.offset = Math.max(0, (cursor ?? 0) * take);
   } else if (cursor instanceof Date) {
-    sb.where.cursorWindow = `created_at >= toDateTime64(${sqlstring.escape(formatClickhouseDate(cursor))}, 3) - INTERVAL ${dateIntervalInDays} DAY`;
+    sb.where.cursorWindow = `created_at >= toDateTime64(${sqlstring.escape(formatClickhouseDate(cursor))}, 3) - INTERVAL ${intervalSeconds} SECOND`;
     sb.where.cursor = `created_at <= ${sqlstring.escape(formatClickhouseDate(cursor))}`;
   }
 
   if (!cursor) {
-    sb.where.cursorWindow = `created_at >= toDateTime64(${sqlstring.escape(formatClickhouseDate(new Date()))}, 3) - INTERVAL ${dateIntervalInDays} DAY`;
+    sb.where.cursorWindow = `created_at >= toDateTime64(${sqlstring.escape(formatClickhouseDate(new Date()))}, 3) - INTERVAL ${intervalSeconds} SECOND`;
   }

391-391: Critical issue remains unresolved: dateIntervalInDays needs validation and recursion cap.

This option was flagged in previous reviews but the critical issues remain:

  1. No input validation or bounds (should be clamped, e.g., [0, 14])
  2. No recursion depth limit when this value is doubled (line 573)
  3. Default 0.5 causes invalid ClickHouse SQL (see lines 414, 419)

Add validation at the API boundary (TRPC/Zod schema) and document the allowed range.
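For instance, a Zod clamp at the TRPC boundary could look like this; the schema name and bounds are assumptions:

import { z } from 'zod';

const eventListInput = z.object({
  // Clamp to a sane window so the recursive widening has a hard ceiling.
  dateIntervalInDays: z.number().min(0.5).max(14).default(0.5),
});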


570-575: Critical: Unbounded recursion when no events exist.

If the project has no events or the cursor window contains no events, this function recursively doubles dateIntervalInDays indefinitely without any upper bound, eventually causing a stack overflow or timeout.

Add a maximum interval cap before recursing:

+  const MAX_INTERVAL_DAYS = 14;
   // If we dont get any events, try without the cursor window
-  if (data.length === 0 && sb.where.cursorWindow) {
+  if (data.length === 0 && sb.where.cursorWindow && dateIntervalInDays < MAX_INTERVAL_DAYS) {
     return getEventList({
       ...options,
-      dateIntervalInDays: dateIntervalInDays * 2,
+      dateIntervalInDays: Math.min(dateIntervalInDays * 2, MAX_INTERVAL_DAYS),
     });
   }
packages/db/src/services/profile.service.ts (1)

48-56: Session metrics still need alignment with sessions table.

The existing past review comment regarding session counting and duration calculations remains unaddressed. The code still counts session_start events instead of reading from the sessions table with proper sign handling, which can lead to data drift.

packages/db/src/buffers/profile-buffer.ts (1)

148-161: Use instance redis field for consistency.

The new fetchFromCache method calls getRedisCache() directly (line 156) instead of using this.redis initialized in the constructor. This inconsistency also exists in alreadyExists at line 52.

Apply this diff for consistency:

 public async fetchFromCache(
   profileId: string,
   projectId: string,
 ): Promise<IClickhouseProfile | null> {
   const cacheKey = this.getProfileCacheKey({
     profileId,
     projectId,
   });
-  const existingProfile = await getRedisCache().get(cacheKey);
+  const existingProfile = await this.redis.get(cacheKey);
   if (!existingProfile) {
     return null;
   }
   return getSafeJson<IClickhouseProfile>(existingProfile);
 }
packages/redis/cachable.ts (1)

14-64: Add guards for undefined/null caching and error handling.

Three issues remain from the past review comment:

  1. No try/catch around JSON.parse (line 31): Corrupt cache entries will throw
  2. No undefined/null guards (lines 42-46, 54-59): Can cache and serve invalid values
  3. No .catch on fire-and-forget (line 61): Unhandled promise rejections possible

Apply safeguards:

   const hit = await getRedisCache().get(key);
   if (hit) {
-    const parsed = JSON.parse(hit, (_, value) => {
+    let parsed: T;
+    try {
+      parsed = JSON.parse(hit, (_, value) => {
         if (
           typeof value === 'string' &&
           /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.*Z$/.test(value)
         ) {
           return new Date(value);
         }
         return value;
-    });
+      });
+    } catch {
+      parsed = undefined as unknown as T;
+    }

     // Store in LRU cache for next time
-    if (useLruCache) {
+    if (useLruCache && parsed !== undefined && parsed !== null) {
       globalLruCache.set(key, parsed, {
         ttl: expireInSec * 1000,
       });
     }

-    return parsed;
+    if (parsed !== undefined && parsed !== null) {
+      return parsed;
+    }
   }

   const data = await fn();

-  if (useLruCache) {
+  if (useLruCache && data !== undefined && data !== null) {
     globalLruCache.set(key, data, {
       ttl: expireInSec * 1000,
     });
   }
-  getRedisCache().setex(key, expireInSec, JSON.stringify(data));
+  if (data !== undefined && data !== null) {
+    getRedisCache()
+      .setex(key, expireInSec, JSON.stringify(data))
+      .catch(() => {});
+  }
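Put together, the guarded read path could look like this sketch; the LRU size and the injected ioredis client are assumptions, not the module's actual wiring:

import { LRUCache } from 'lru-cache';
import type { Redis } from 'ioredis';

const lru = new LRUCache<string, unknown>({ max: 10_000 });

async function cached<T>(
  redis: Redis,
  key: string,
  expireInSec: number,
  fn: () => Promise<T>,
): Promise<T> {
  const local = lru.get(key) as T | undefined;
  if (local !== undefined) return local;

  const hit = await redis.get(key);
  if (hit) {
    try {
      const parsed = JSON.parse(hit) as T;
      lru.set(key, parsed, { ttl: expireInSec * 1000 });
      return parsed;
    } catch {
      // Corrupt cache entry: fall through and recompute.
    }
  }

  const data = await fn();
  if (data !== undefined && data !== null) {
    lru.set(key, data, { ttl: expireInSec * 1000 });
    // Fire-and-forget with a swallow to avoid unhandled rejections.
    redis.setex(key, expireInSec, JSON.stringify(data)).catch(() => {});
  }
  return data;
}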
apps/api/src/controllers/profile.controller.ts (1)

19-51: Sanitize internal properties before persisting.

The payload.properties are spread directly into the upsert (line 35) without filtering internal/transient keys prefixed with __. This matches the unresolved concern from the past review comment and could persist temporary client-side markers.

Consider filtering before spreading:

   properties: {
-    ...(payload.properties ?? {}),
+    ...Object.fromEntries(
+      Object.entries(payload.properties ?? {}).filter(([k]) => !k.startsWith('__')),
+    ),
     country: geo.country,
     city: geo.city,
packages/db/src/services/clients.service.ts (1)

37-37: Cache invalidation still missing in client mutations.

The past review comment remains valid: client update and remove operations in packages/trpc/src/routers/client.ts do not call getClientByIdCached.clear(). With a 24-hour TTL, stale auth policies (CORS, filters, secrets) can be served for up to a day after mutations.

packages/db/src/services/salt.service.ts (1)

20-47: Cache invalidation missing in salt rotation.

The past review comment is still valid: apps/worker/src/jobs/cron.salt.ts performs salt rotation via db.salt.create() and db.salt.deleteMany() but never calls getSalts.clear(). This can serve stale salts for up to 10 minutes post-rotation.

Add getSalts.clear() after both the create and deleteMany operations in the rotation job.

packages/db/src/buffers/event-buffer.ts (2)

366-372: Add an insert_deduplication_token per chunk

If we crash after writing a chunk but before trimming the queue, the retry replays the same payload and ClickHouse happily double-inserts. Please derive a stable token per chunk and pass it in the insert settings. For example:

+import { createHash } from 'node:crypto';
@@
-      for (const chunk of this.chunks(eventsToClickhouse, this.chunkSize)) {
-        await ch.insert({
-          table: 'events',
-          values: chunk,
-          format: 'JSONEachRow',
-        });
+      for (const chunk of this.chunks(eventsToClickhouse, this.chunkSize)) {
+        const dedupToken = createHash('sha1')
+          .update(chunk.map((event) => event.id).join(','))
+          .digest('hex');
+
+        await ch.insert({
+          table: 'events',
+          values: chunk,
+          format: 'JSONEachRow',
+          clickhouse_settings: {
+            insert_deduplication_token: dedupToken,
+          },
+        });
       }
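For context: ClickHouse skips an insert whose insert_deduplication_token matches a recently seen block, within the table's deduplication window (on by default for Replicated*MergeTree; plain MergeTree needs non_replicated_deduplication_window > 0). Hashing the ordered event ids keeps the token stable across retries of the same chunk, which is what makes a replay a no-op.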

132-152: session_end must not drop late-arriving screen_views

This script GETDELs the session key and deletes the profile key. When a session_end races ahead of the final screen_view (common with network reordering), we enqueue the older view plus the session_end and wipe the key; the real last screen_view then arrives to an empty slot. No subsequent event flushes it, so it sits until TTL and is effectively lost. Please keep enough state for out-of-order arrivals (e.g. avoid deleting the key outright, or store a sentinel that addScreenView can reconcile) so we don’t lose data.

packages/common/server/parser-user-agent.ts (1)

118-126: Stop masking undefined UA fields

UAParser routinely leaves these properties undefined. Casting with as string doesn’t coerce the runtime value, so we end up returning undefined while the type system thinks it’s a string; downstream string ops (.toLowerCase(), concatenation, etc.) will then throw. Please normalise/default explicitly instead of asserting. For example:

+  const readOverride = (value: unknown) =>
+    typeof value === 'string' ? value : undefined;
   return {
-    os: (overrides?.__os || res.os.name) as string,
-    osVersion: (overrides?.__osVersion || res.os.version) as string,
-    browser: (overrides?.__browser || res.browser.name) as string,
-    browserVersion: (overrides?.__browserVersion ||
-      res.browser.version) as string,
-    device: (overrides?.__device || res.device.type || getDevice(ua)) as string,
-    brand: (overrides?.__brand || res.device.vendor) as string,
-    model: (overrides?.__model || res.device.model) as string,
+    os: readOverride(overrides?.__os) ?? res.os.name ?? '',
+    osVersion: readOverride(overrides?.__osVersion) ?? res.os.version ?? '',
+    browser: readOverride(overrides?.__browser) ?? res.browser.name ?? '',
+    browserVersion:
+      readOverride(overrides?.__browserVersion) ?? res.browser.version ?? '',
+    device:
+      readOverride(overrides?.__device) ?? res.device.type ?? getDevice(ua),
+    brand: readOverride(overrides?.__brand) ?? res.device.vendor ?? '',
+    model: readOverride(overrides?.__model) ?? res.device.model ?? '',
   };
apps/api/src/controllers/track.controller.ts (6)

58-86: Make timestamps strictly numeric and robustly coerced.

The safeTimestamp could be a string (depending on Fastify typing), and clientTimestamp is created via new Date() and then converted to a number, which can still lead to type inconsistencies.

Apply a robust type coercion as suggested in the past review:

 export function getTimestamp(
   timestamp: FastifyRequest['timestamp'],
   payload: TrackHandlerPayload['payload'],
 ) {
+  const toMs = (v: unknown): number | undefined => {
+    if (typeof v === 'number' && Number.isFinite(v)) return v;
+    if (typeof v === 'string') {
+      const n = Number(v);
+      if (Number.isFinite(n)) return n;
+      const d = Date.parse(v);
+      if (Number.isFinite(d)) return d;
+    }
+    return undefined;
+  };
+
-  const safeTimestamp = timestamp || Date.now();
+  const safeTimestamp = toMs(timestamp) ?? Date.now();
   const userDefinedTimestamp =
     'properties' in payload
       ? (payload?.properties?.__timestamp as string | undefined)
       : undefined;
 
   if (!userDefinedTimestamp) {
     return { timestamp: safeTimestamp, isTimestampFromThePast: false };
   }
 
-  const clientTimestamp = new Date(userDefinedTimestamp);
-  const clientTimestampNumber = clientTimestamp.getTime();
+  const clientTimestampNumber = toMs(userDefinedTimestamp);
 
   if (
-    Number.isNaN(clientTimestampNumber) ||
-    clientTimestampNumber > safeTimestamp
+    !Number.isFinite(clientTimestampNumber!) || (clientTimestampNumber as number) > safeTimestamp
   ) {
     return { timestamp: safeTimestamp, isTimestampFromThePast: false };
   }
 
   return {
-    timestamp: clientTimestampNumber,
+    timestamp: clientTimestampNumber as number,
     isTimestampFromThePast: true,
   };
 }

224-235: Type the log parameter properly.

The log parameter is typed as any, losing type safety. Use the proper logger type.

Apply this diff:

   isTimestampFromThePast: boolean;
-  log: any;
+  log: ILogger['info'];
 }) {

95-100: Don't trust client-supplied __ip; validate and use server-derived IP.

Allowing arbitrary properties.__ip enables IP spoofing, which affects geolocation, device ID generation, and deduplication. Validate IP format and prefer server-derived values.

Apply IP validation:

+import { isIP } from 'node:net';
 @@
-  const ip =
-    'properties' in request.body.payload &&
-    request.body.payload.properties?.__ip
-      ? (request.body.payload.properties.__ip as string)
-      : getClientIp(request)!;
-  const ua = request.headers['user-agent']!;
+  const ipCandidate =
+    'properties' in request.body.payload
+      ? (request.body.payload.properties as any)?.__ip
+      : undefined;
+  const ip =
+    (typeof ipCandidate === 'string' && isIP(ipCandidate) ? ipCandidate : undefined) ??
+    getClientIp(request) ??
+    request.ip;
+  const ua =
+    typeof request.headers['user-agent'] === 'string'
+      ? request.headers['user-agent']
+      : undefined;

242-244: Convert timestamp to string explicitly in jobId construction.

The timestamp may be numeric and should be explicitly converted to a string for consistency in the job ID.

Apply this fix:

   const jobId = [
     payload.name,
-    timestamp,
+    String(timestamp),
     projectId,
     currentDeviceId,
     groupId,
   ]
     .filter(Boolean)
     .join('-');

246-263: Logging full data object may expose PII.

The log includes the entire data object with event.properties, profileId, geo coordinates, and device identifiers, which could violate GDPR/CCPA.

Remove or sanitize sensitive fields:

   log('track handler', {
     jobId: jobId,
     groupId: groupId,
     timestamp: timestamp,
-    data: {
-      projectId,
-      headers,
-      event: {
-        ...payload,
-        timestamp,
-        isTimestampFromThePast,
-      },
-      uaInfo,
-      geo,
-      currentDeviceId,
-      previousDeviceId,
-    },
   });

302-315: Avoid persisting internal ("__") properties into profile.properties.

Spreading payload.properties before UA/geo may store transient/private keys like __ip, __timestamp, __os, etc. Strip keys starting with "__" before persisting.

Apply:

   properties: {
-      ...(payload.properties ?? {}),
+      ...Object.fromEntries(
+        Object.entries(payload.properties ?? {}).filter(([k]) => !k.startsWith('__')),
+      ),
       country: geo.country,
       city: geo.city,
apps/worker/src/utils/singleton-lock.ts (2)

33-55: Lock extension can overwrite another owner; use atomic "extend-if-owned".

Using SET ... XX will replace another process's lock if the key exists. The lock should only be extended if we still own it (i.e., the value matches our lockValue).

Apply this diff to use an atomic Lua script that extends only if owned:

   const extensionInterval = setInterval(async () => {
     try {
-      // Extend the lock by setting it again with the same value
-      const redis = getRedisCache();
-      const result = await redis.set(lockKey, lockValue, 'PX', ttlMs, 'XX');
-
-      if (result === 'OK') {
+      // Atomically extend only if we still own the lock
+      const redis = getRedisCache();
+      const extended: number = (await redis.eval(
+        "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('pexpire', KEYS[1], ARGV[2]) else return 0 end",
+        1,
+        lockKey,
+        lockValue,
+        String(ttlMs),
+      )) as unknown as number;
+
+      if (extended === 1) {
         logger.debug('Extended singleton lock', { key });
       } else {
         // Lock was lost (someone else acquired it or it expired)

57-69: Release lock only if owned; avoid deleting another process's lock.

The current implementation unconditionally deletes the lock, which could remove a lock acquired by another process after ours expired.

Apply this diff to atomically check ownership before deleting:

   return () => {
     clearInterval(extensionInterval);
-    getRedisCache()
-      .del(lockKey)
-      .then(() => {
-        logger.debug('Released singleton lock', { key });
-      })
-      .catch((error: unknown) => {
-        logger.error('Failed to release singleton lock', { key, error });
-      });
+    (async () => {
+      try {
+        const res: number = (await getRedisCache().eval(
+          "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end",
+          1,
+          lockKey,
+          lockValue,
+        )) as unknown as number;
+        if (res === 1) {
+          logger.debug('Released singleton lock', { key });
+        } else {
+          logger.warn('Did not release singleton lock (not owner)', { key });
+        }
+      } catch (error: unknown) {
+        logger.error('Failed to release singleton lock', { key, error });
+      }
+    })();
   };
apps/worker/src/metrics.ts (1)

18-26: Fix double registration of eventsGroupJobDuration.

The metric is registered both via registers: [register] (line 23) and register.registerMetric() (line 26), which causes a "metric already registered" error.

Remove the duplicate registration:

 export const eventsGroupJobDuration = new client.Histogram({
   name: 'events_group_job_duration_ms',
   help: 'Duration of job processing in eventsGroupQueues (in ms)',
   labelNames: ['queue_shard', 'status'],
   buckets: [10, 25, 50, 100, 250, 500, 750, 1000, 2000, 5000, 10000, 30000],
-  registers: [register],
 });
-
-register.registerMetric(eventsGroupJobDuration);
apps/api/src/hooks/duplicate.hook.ts (1)

11-16: Empty string fallbacks cause false-positive duplicates.

When clientIp, origin, or projectId are missing, falling back to empty strings means all requests without these values will share the same deduplication key, incorrectly flagging legitimate requests as duplicates.
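
A hedged sketch of one fix: refuse to build a dedup key when the identifying fields are absent, rather than hashing empty strings (the key format below is illustrative, not the hook's actual one):

function buildDedupKey(
  clientIp: string | undefined,
  origin: string | undefined,
  projectId: string | undefined,
): string | null {
  // Without an IP and project there is no meaningful identity to dedup on;
  // returning null tells the caller to treat the request as unique.
  if (!clientIp || !projectId) return null;
  return `dedup:${projectId}:${clientIp}:${origin ?? 'no-origin'}`;
}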

apps/worker/src/boot-workers.ts (1)

133-148: Critical: Lock is acquired but not enforced, allowing duplicate event processing.

The lock is checked but incomingEventPure(job.data) is called regardless of whether the lock was successfully acquired (line 148 is outside the if/else block). When getLock returns false, it means another worker already holds the lock and is processing this event. The current worker should skip processing in that case.
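
The fix is an early return when the lock is not won; a sketch, assuming getLock(key, value, ttlMs) resolves truthy only for the winner (verify the real signature in packages/redis/redis.ts):

const acquired = await getLock(`incoming:${job.id}`, workerId, 30_000); // key/TTL illustrative
if (!acquired) {
  // Another worker holds the lock and is already processing this event.
  logger.debug('lock held elsewhere, skipping', { jobId: job.id });
  return;
}
await incomingEventPure(job.data);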

🧹 Nitpick comments (2)
apps/start/src/components/realtime/realtime-geo.tsx (1)

78-101: LGTM! Consistent addition of Sessions metric.

The Events column width adjustment and new Sessions column complete the consistent implementation across all three realtime components (geo, paths, referrals).

Optional: Consider extracting shared column definitions.

All three realtime components (geo, paths, referrals) now have nearly identical rendering logic for both the Events and Sessions columns. Consider extracting these into shared column definition utilities to reduce duplication and ensure consistency:

// Example: apps/start/src/components/realtime/realtime-columns.tsx
export const createEventsColumn = (number: ReturnType<typeof useNumber>) => ({
  name: 'Events',
  width: '60px',
  render(item: { count: number }) {
    return (
      <div className="row gap-2 justify-end">
        <span className="font-semibold">
          {number.short(item.count)}
        </span>
      </div>
    );
  },
});

export const createSessionsColumn = (number: ReturnType<typeof useNumber>) => ({
  name: 'Sessions',
  width: '82px',
  render(item: { unique_sessions: number }) {
    return (
      <div className="row gap-2 justify-end">
        <span className="font-semibold">
          {number.short(item.unique_sessions)}
        </span>
      </div>
    );
  },
});

Then use them in each component:

columns={[
  // ... other columns
  createEventsColumn(number),
  createSessionsColumn(number),
]}

This would make future changes to these columns easier to maintain across all three components.

apps/worker/src/boot-cron.ts (1)

56-83: Release the lock after completing cron job setup.

The lock is never released after the cron jobs are configured. Consider releasing it immediately after the setup completes to allow other instances to update if needed.

Add lock release after setup:

       );
     }
+    
+    // Release lock after successful setup
+    await getRedisCache().del('cron:lock');
+    logger.info('Cron lock released after setup');
   }
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3363147 and c133a90.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (63)
  • apps/api/package.json (2 hunks)
  • apps/api/scripts/get-bots.ts (1 hunks)
  • apps/api/src/bots/index.ts (1 hunks)
  • apps/api/src/controllers/event.controller.ts (3 hunks)
  • apps/api/src/controllers/live.controller.ts (2 hunks)
  • apps/api/src/controllers/misc.controller.ts (3 hunks)
  • apps/api/src/controllers/profile.controller.ts (1 hunks)
  • apps/api/src/controllers/track.controller.ts (10 hunks)
  • apps/api/src/hooks/duplicate.hook.ts (1 hunks)
  • apps/api/src/hooks/fix.hook.ts (0 hunks)
  • apps/api/src/hooks/ip.hook.ts (1 hunks)
  • apps/api/src/index.ts (0 hunks)
  • apps/api/src/routes/event.router.ts (1 hunks)
  • apps/api/src/utils/auth.ts (2 hunks)
  • apps/api/src/utils/deduplicate.ts (1 hunks)
  • apps/api/src/utils/graceful-shutdown.ts (2 hunks)
  • apps/start/package.json (0 hunks)
  • apps/start/src/components/realtime/realtime-geo.tsx (2 hunks)
  • apps/start/src/components/realtime/realtime-paths.tsx (2 hunks)
  • apps/start/src/components/realtime/realtime-referrals.tsx (2 hunks)
  • apps/start/src/components/ui/data-table/data-table-hooks.tsx (3 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.conversions.tsx (2 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.events.tsx (3 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.profiles.$profileId._tabs.events.tsx (3 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.sessions_.$sessionId.tsx (3 hunks)
  • apps/worker/package.json (2 hunks)
  • apps/worker/src/boot-cron.ts (2 hunks)
  • apps/worker/src/boot-workers.ts (4 hunks)
  • apps/worker/src/index.ts (2 hunks)
  • apps/worker/src/jobs/cron.delete-projects.ts (1 hunks)
  • apps/worker/src/jobs/events.create-session-end.ts (4 hunks)
  • apps/worker/src/jobs/events.incoming-event.ts (3 hunks)
  • apps/worker/src/metrics.ts (5 hunks)
  • apps/worker/src/utils/session-handler.ts (1 hunks)
  • apps/worker/src/utils/singleton-lock.ts (1 hunks)
  • docker-compose.yml (2 hunks)
  • packages/common/package.json (1 hunks)
  • packages/common/server/parser-user-agent.ts (7 hunks)
  • packages/db/package.json (1 hunks)
  • packages/db/src/buffers/base-buffer.ts (4 hunks)
  • packages/db/src/buffers/bot-buffer.ts (1 hunks)
  • packages/db/src/buffers/event-buffer.test.ts (2 hunks)
  • packages/db/src/buffers/event-buffer.ts (4 hunks)
  • packages/db/src/buffers/profile-buffer.ts (6 hunks)
  • packages/db/src/buffers/session-buffer.ts (2 hunks)
  • packages/db/src/clickhouse/client.ts (2 hunks)
  • packages/db/src/services/clients.service.ts (1 hunks)
  • packages/db/src/services/event.service.ts (5 hunks)
  • packages/db/src/services/notification.service.ts (2 hunks)
  • packages/db/src/services/profile.service.ts (1 hunks)
  • packages/db/src/services/salt.service.ts (2 hunks)
  • packages/geo/package.json (1 hunks)
  • packages/geo/src/geo.ts (2 hunks)
  • packages/logger/index.ts (2 hunks)
  • packages/queue/package.json (1 hunks)
  • packages/queue/src/queues.ts (4 hunks)
  • packages/redis/cachable.test.ts (0 hunks)
  • packages/redis/cachable.ts (7 hunks)
  • packages/redis/package.json (1 hunks)
  • packages/trpc/src/routers/chart.ts (0 hunks)
  • packages/trpc/src/routers/event.ts (3 hunks)
  • packages/trpc/src/routers/profile.ts (2 hunks)
  • packages/trpc/src/routers/realtime.ts (3 hunks)
💤 Files with no reviewable changes (5)
  • packages/trpc/src/routers/chart.ts
  • apps/start/package.json
  • apps/api/src/index.ts
  • packages/redis/cachable.test.ts
  • apps/api/src/hooks/fix.hook.ts
🚧 Files skipped from review as they are similar to previous changes (24)
  • apps/api/scripts/get-bots.ts
  • apps/start/src/routes/_app.$organizationId.$projectId_.profiles.$profileId._tabs.events.tsx
  • apps/api/src/routes/event.router.ts
  • packages/trpc/src/routers/realtime.ts
  • apps/start/src/routes/_app.$organizationId.$projectId_.sessions_.$sessionId.tsx
  • apps/api/src/hooks/ip.hook.ts
  • apps/api/src/controllers/misc.controller.ts
  • packages/redis/package.json
  • packages/db/src/buffers/base-buffer.ts
  • packages/db/src/services/notification.service.ts
  • apps/worker/src/jobs/events.incoming-event.ts
  • apps/api/src/utils/auth.ts
  • apps/api/src/utils/graceful-shutdown.ts
  • apps/worker/package.json
  • packages/db/src/buffers/session-buffer.ts
  • apps/api/src/controllers/live.controller.ts
  • apps/start/src/components/ui/data-table/data-table-hooks.tsx
  • apps/api/src/controllers/event.controller.ts
  • apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.events.tsx
  • packages/logger/index.ts
  • apps/worker/src/utils/session-handler.ts
  • packages/common/package.json
  • packages/trpc/src/routers/profile.ts
  • packages/db/src/buffers/bot-buffer.ts
🧰 Additional context used
📓 Path-based instructions (1)
apps/start/**/*.{ts,tsx}

📄 CodeRabbit inference engine (apps/start/.cursorrules)

apps/start/**/*.{ts,tsx}: Instrument server functions created with createServerFn by wrapping their implementation in Sentry.startSpan and importing Sentry from '@sentry/tanstackstart-react'
When using Sentry in the TanStack Start React app, import it as: import * as Sentry from '@sentry/tanstackstart-react'

Files:

  • apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.conversions.tsx
  • apps/start/src/components/realtime/realtime-paths.tsx
  • apps/start/src/components/realtime/realtime-referrals.tsx
  • apps/start/src/components/realtime/realtime-geo.tsx
🧠 Learnings (2)
📚 Learning: 2025-09-26T18:30:17.476Z
Learnt from: CR
Repo: Openpanel-dev/openpanel PR: 0
File: .cursorrules:0-0
Timestamp: 2025-09-26T18:30:17.476Z
Learning: When writing ClickHouse queries, use the helper functions in ./packages/db/src/clickhouse/query-functions.ts rather than crafting raw queries

Applied to files:

  • packages/db/src/clickhouse/client.ts
📚 Learning: 2025-09-26T18:30:17.476Z
Learnt from: CR
Repo: Openpanel-dev/openpanel PR: 0
File: .cursorrules:0-0
Timestamp: 2025-09-26T18:30:17.476Z
Learning: When writing ClickHouse queries, always construct them using the custom query builder in ./packages/db/src/clickhouse/query-builder.ts

Applied to files:

  • packages/db/src/clickhouse/client.ts
🧬 Code graph analysis (21)
apps/api/src/hooks/duplicate.hook.ts (3)
packages/sdks/sdk/index.ts (1)
  • PostEventPayload (10-15)
packages/sdks/sdk/src/index.ts (1)
  • TrackHandlerPayload (3-23)
apps/api/src/utils/deduplicate.ts (1)
  • isDuplicatedEvent (4-34)
apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.conversions.tsx (1)
apps/start/src/components/ui/data-table/data-table-hooks.tsx (1)
  • useReadColumnVisibility (25-29)
apps/api/src/utils/deduplicate.ts (1)
packages/redis/redis.ts (1)
  • getLock (122-125)
packages/db/src/services/profile.service.ts (1)
packages/db/src/buffers/index.ts (1)
  • profileBuffer (7-7)
packages/db/src/services/clients.service.ts (1)
packages/redis/cachable.ts (1)
  • cacheable (159-287)
apps/api/src/controllers/profile.controller.ts (4)
apps/api/src/utils/get-client-ip.ts (1)
  • getClientIp (6-8)
packages/common/server/parser-user-agent.ts (1)
  • parseUserAgent (107-129)
packages/geo/src/geo.ts (1)
  • getGeoLocation (60-86)
packages/db/src/services/profile.service.ts (1)
  • upsertProfile (287-313)
apps/worker/src/boot-workers.ts (9)
packages/redis/redis.ts (2)
  • getRedisQueue (93-105)
  • getLock (122-125)
packages/queue/src/queues.ts (8)
  • EVENTS_GROUP_QUEUES_SHARDS (14-17)
  • eventsGroupQueues (141-160)
  • EventsQueuePayloadIncomingEvent (28-68)
  • queueLogger (26-26)
  • sessionsQueue (171-176)
  • cronQueue (181-186)
  • notificationQueue (202-210)
  • miscQueue (188-193)
apps/worker/src/utils/singleton-lock.ts (1)
  • requireSingleton (12-69)
apps/worker/src/jobs/events.incoming-event.ts (1)
  • incomingEventPure (52-212)
apps/worker/src/jobs/sessions.ts (1)
  • sessionsJob (14-22)
apps/worker/src/jobs/cron.ts (1)
  • cronJob (10-31)
apps/worker/src/jobs/notification.ts (1)
  • notificationJob (10-71)
apps/worker/src/jobs/misc.ts (1)
  • miscJob (7-13)
apps/worker/src/metrics.ts (1)
  • eventsGroupJobDuration (18-24)
apps/worker/src/boot-cron.ts (2)
packages/redis/redis.ts (1)
  • getLock (122-125)
packages/queue/src/queues.ts (1)
  • cronQueue (181-186)
packages/db/src/services/event.service.ts (1)
packages/db/src/services/profile.service.ts (1)
  • getProfileById (104-133)
packages/db/src/buffers/event-buffer.test.ts (2)
packages/db/src/buffers/event-buffer.ts (1)
  • EventBuffer (30-474)
packages/db/src/clickhouse/client.ts (1)
  • ch (131-161)
packages/db/src/services/salt.service.ts (1)
packages/redis/cachable.ts (1)
  • cacheable (159-287)
apps/worker/src/metrics.ts (1)
packages/queue/src/queues.ts (3)
  • sessionsQueue (171-176)
  • cronQueue (181-186)
  • eventsGroupQueues (141-160)
apps/worker/src/jobs/events.create-session-end.ts (1)
packages/db/src/clickhouse/client.ts (1)
  • convertClickhouseDateToJs (238-240)
packages/redis/cachable.ts (1)
packages/redis/redis.ts (1)
  • getRedisCache (66-72)
packages/db/src/buffers/event-buffer.ts (5)
packages/db/src/buffers/base-buffer.ts (1)
  • BaseBuffer (5-150)
packages/redis/redis.ts (3)
  • getRedisCache (66-72)
  • Redis (9-9)
  • getRedisPub (84-90)
packages/db/src/services/event.service.ts (3)
  • IClickhouseEvent (65-99)
  • transformEvent (101-135)
  • IServiceEvent (142-179)
packages/json/index.ts (1)
  • getSafeJson (3-9)
packages/redis/publisher.ts (1)
  • publishEvent (28-36)
apps/worker/src/utils/singleton-lock.ts (1)
packages/redis/redis.ts (2)
  • getLock (122-125)
  • getRedisCache (66-72)
apps/api/src/controllers/track.controller.ts (5)
packages/sdks/sdk/src/index.ts (2)
  • IdentifyPayload (36-43)
  • TrackPayload (25-29)
packages/geo/src/geo.ts (1)
  • GeoLocation (36-42)
packages/common/server/parser-user-agent.ts (1)
  • parseUserAgent (107-129)
packages/redis/redis.ts (1)
  • getRedisCache (66-72)
packages/queue/src/queues.ts (1)
  • getEventsGroupQueueShard (162-169)
apps/worker/src/index.ts (1)
packages/queue/src/queues.ts (1)
  • eventsGroupQueues (141-160)
packages/db/src/buffers/profile-buffer.ts (3)
packages/db/src/services/profile.service.ts (1)
  • IClickhouseProfile (245-255)
packages/redis/redis.ts (1)
  • getRedisCache (66-72)
packages/json/index.ts (1)
  • getSafeJson (3-9)
packages/trpc/src/routers/event.ts (1)
packages/db/src/services/event.service.ts (1)
  • getEventList (394-578)
packages/queue/src/queues.ts (3)
packages/logger/index.ts (1)
  • createLogger (11-100)
packages/sdks/sdk/src/index.ts (1)
  • TrackPayload (25-29)
packages/redis/redis.ts (2)
  • getRedisGroupQueue (108-120)
  • getRedisQueue (93-105)
🪛 ast-grep (0.39.6)
apps/api/src/bots/index.ts

[warning] 7-7: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(bot.regex)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-and-push-worker
  • GitHub Check: build-and-push-dashboard
  • GitHub Check: build-and-push-api
🔇 Additional comments (36)
apps/start/src/components/realtime/realtime-referrals.tsx (1)

68-91: LGTM! Consistent addition of Sessions metric.

The Events column width adjustment and new Sessions column are implemented correctly with consistent formatting and layout.

apps/start/src/components/realtime/realtime-paths.tsx (1)

85-108: LGTM! Consistent addition of Sessions metric.

The Events column width adjustment and new Sessions column match the pattern established in the other realtime components.

apps/worker/src/jobs/events.create-session-end.ts (3)

80-80: Correctness improvement: Awaiting async notification handler.

Adding await ensures that session-end notifications are fully processed before the job completes, and that any errors thrown are properly caught by the surrounding try-catch block. Previously, the promise could have been left dangling.


106-108: Verify the offset reduction from 1000ms to 100ms.

The date conversion change to use convertClickhouseDateToJs is appropriate for handling Clickhouse date strings. However, the offset was reduced from 1000ms to 100ms. Please verify that 100ms is sufficient to ensure the session_end event timestamp consistently comes after all other events in the session, especially considering buffered events or timing edge cases.


71-71: Reduced logging verbosity for routine processing.

Changing to logger.debug reduces log noise for routine session-end processing, which aligns well with the performance objectives of this PR.

packages/db/src/services/event.service.ts (2)

443-444: LGTM: Ensuring required fields for pagination.

Explicitly selecting createdAt and projectId ensures these fields are always available for cursor-based pagination and filtering, regardless of the dynamic select options. This is a good defensive practice.


898-898: LGTM: Aligns with caching refactor.

Switching from getProfileByIdCached to getProfileById is correct. Based on the PR context, getProfileById now handles caching internally via profileBuffer.fetchFromCache, eliminating the need for cached wrapper functions at call sites.

apps/start/src/routes/_app.$organizationId.$projectId_.events._tabs.conversions.tsx (2)

2-2: LGTM: Import statement is correct.

The import correctly references the hook from the data-table-hooks module.


22-22: LGTM: Hook usage is correct.

The hook is properly called within the component with the appropriate key.

docker-compose.yml (2)

28-36: Verify whether the Dragonfly service is actually used.

Verification against the docker-compose.yml service definitions and the application code substantiates the concern. The codebase shows:

  1. Redis is actively used across the application:

    • packages/trpc/src/trpc.ts imports getRedisCache from @openpanel/redis
    • self-hosting/quiz.ts includes REDIS_URL configuration and logic to remove the Redis service if an external URL is provided
    • Default Redis service is op-kv on port 6379
  2. Dragonfly appears to have no integration in the application code:

    • Zero references to Dragonfly in TypeScript/JavaScript/Python/Go files
    • No environment variables or configuration for Dragonfly
    • No changes to caching layer or client code to support Dragonfly

Dragonfly is API-compatible with Redis and Memcached, so adopting it would require no application code changes. The issue, however, is that both services are defined in the docker-compose file with no documentation of their intended relationship.

Clarify Redis and Dragonfly service architecture in docker-compose

Both Redis (op-kv on port 6379) and Dragonfly (op-df on port 6380) are configured, but no code references Dragonfly and the application actively uses Redis. Document whether:

  • Dragonfly is a migration target or performance evaluation tool (remove or document as experimental)
  • Both services are intended to coexist for different purposes
  • This is work-in-progress pending application code changes

Update the PR description to clarify the service architecture decision.


1-1: No issues found; the stylistic changes are valid and follow best practices.

The version field should be quoted (e.g., version: "3") to ensure compatibility with docker-compose and the Compose spec, as unquoted versions can be parsed as numbers and cause errors. The change from unquoted 3 to "3" is the correct and recommended approach. The YAML syntax is valid, and both command array formatting styles used in the file (list format and bracket notation with double quotes) are compatible YAML syntax. All deployment and linting tools will continue to work correctly.

packages/geo/package.json (1)

10-11: LGTM! LRU cache dependency added correctly.

The lru-cache dependency is properly added to support the in-memory caching implementation in geo.ts.

packages/geo/src/geo.ts (2)

10-10: LGTM! Import added correctly.

The LRUCache import is properly added to support the caching implementation.


54-58: Cache configuration is reasonable, but cache population is missing.

The LRU cache configuration looks good with 1000 max entries and a 5-minute TTL. However, the cache is never populated (see comment on lines 65-68).

packages/db/src/services/profile.service.ts (1)

109-112: LGTM: Cache-first retrieval pattern.

The cache lookup via profileBuffer.fetchFromCache before falling back to the database is a clean implementation of the cache-first approach. The null-check and transformation logic properly handle both cache hits and misses.
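
Reduced to a sketch, the pattern looks like this (helper names below are placeholders, not the service's internals):

async function getProfileById(profileId: string, projectId: string) {
  // Redis-backed buffer cache first
  const cached = await profileBuffer.fetchFromCache(profileId, projectId);
  if (cached) return transformProfile(cached); // transformProfile: placeholder
  // Fall back to the database on a miss
  return queryProfileFromDb(profileId, projectId); // placeholder helper
}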

packages/db/src/clickhouse/client.ts (2)

64-67: LGTM: Timeout increases support large batch operations.

The timeout increases (request_timeout: 60s → 300s, idle_socket_ttl: 8s → 60s) align with the PR's goals of handling larger event batches and reduce the risk of premature connection drops during heavy insert operations.


136-156: LGTM: Comprehensive insert resilience.

The retry wrapper with exponential backoff and the extensive clickhouse_settings provide solid protection against transient connection failures and proxy timeouts. The combination of async_insert, large block sizes, and progress headers ensures large batch inserts complete reliably.
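
The retry pattern boils down to a generic wrapper like this (a sketch, not the repo's exact implementation):

async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Exponential backoff: 250ms, 500ms, 1000ms, ...
      await new Promise((resolve) => setTimeout(resolve, 250 * 2 ** attempt));
    }
  }
  throw lastError;
}

// e.g. await withRetry(() => ch.insert({ table: 'events', values, format: 'JSONEachRow' }));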

packages/db/src/buffers/profile-buffer.ts (1)

18-20: LGTM: Simplified TTL strategy.

Replacing the conditional daysToKeep logic with a uniform ttlInSeconds (default 1 hour) simplifies the caching strategy and reduces complexity without significant impact on cache effectiveness.

packages/redis/cachable.ts (1)

67-81: LGTM: Useful cache introspection.

The new helper functions provide valuable observability into the global LRU cache state.

packages/db/src/services/clients.service.ts (1)

37-37: LGTM: LRU caching enabled.

The addition of the true flag correctly enables L1 LRU caching for client lookups, improving performance for frequently accessed clients.

packages/db/src/services/salt.service.ts (1)

20-47: LGTM: Clean cacheable wrapper.

The conversion to use the cacheable wrapper with a 10-minute TTL and LRU caching is well-structured and maintains the original validation logic.

packages/db/package.json (1)

16-16: Upgrade is safe—no breaking changes found across v1.2.0 to v1.12.1.

None of the releases between v1.2.0 and v1.5.0 list breaking changes, and v1.12.1 shows only performance improvements and CI updates with no breaking changes. The upgrade follows semver—all intermediate versions add only new features and improvements without removing or altering existing APIs. Your codebase's use of standard ClickHouseClient APIs is fully compatible.

packages/queue/src/queues.ts (5)

14-24: LGTM! Shard infrastructure is well-designed.

The shard selection using SHA-1 hash and modulo operation provides good distribution across shards. The implementation is clean and deterministic.
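
For reference, the technique reduces to something like this generic sketch (not the repo's exact code):

import { createHash } from 'node:crypto';

export function pickShard(projectId: string, shardCount: number): number {
  // SHA-1 yields a uniformly distributed digest; the first 4 bytes give
  // plenty of entropy for a modulo over a small shard count, and the
  // projectId -> shard mapping stays stable across processes.
  const digest = createHash('sha1').update(projectId).digest();
  return digest.readUInt32BE(0) % shardCount;
}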


162-169: LGTM! Helper function properly retrieves shard queues.

The getEventsGroupQueueShard function correctly uses pickShard and includes proper error handling for missing queues.


171-210: Queue namespaces now use Redis cluster hash tags.

The braces {...} in queue names ensure all keys for a given queue go to the same Redis cluster slot, which is essential for Redis Cluster compatibility.
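
To illustrate (the key prefix is made up):

// Redis Cluster hashes only the substring inside {...} when assigning a
// slot, so every key derived from the queue name colocates and multi-key
// operations (Lua scripts, MULTI/EXEC) remain legal.
const name = '{sessions}';
const waitKey = `queue:${name}:wait`;     // slot = CRC16("sessions") % 16384
const activeKey = `queue:${name}:active`; // same slot as waitKey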


141-160: Undocumented autoBatch environment variables and missing test coverage.

The autoBatch configuration depends on environment variables (AUTO_BATCH_MAX_WAIT_MS and AUTO_BATCH_SIZE) that are not documented in .env.example, and there are no tests covering this configuration. The conditional uses AND logic, so autoBatch remains disabled by default unless both variables are explicitly set.

  • Add AUTO_BATCH_MAX_WAIT_MS and AUTO_BATCH_SIZE to .env.example with example values and descriptions (see the sketch after this list)
  • Add test coverage for eventsGroupQueues initialization, including cases where autoBatch is enabled and disabled
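
A possible .env.example entry (values are illustrative and the semantics are inferred from the names; verify against groupmq's autoBatch options):

# Max time (ms) a shard worker waits to fill a batch before flushing.
AUTO_BATCH_MAX_WAIT_MS=50
# Max number of jobs drained per batch.
AUTO_BATCH_SIZE=100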

130-133: No changes needed — groupmq 1.1.0-next.5 supports orderingDelayMs.

The Queue constructor accepts orderingDelayMs as a configuration option, confirming the code change is valid.

apps/worker/src/metrics.ts (1)

28-83: Approve metric name sanitization for Redis cluster compatibility.

Stripping braces from queue names in metric identifiers is correct, as Prometheus metric names should not contain braces (which are used for Redis cluster hash tags).
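
For example, a one-liner sketch:

// '{sessions}' -> 'sessions'; Prometheus metric names must avoid braces.
const metricQueueLabel = queueName.replace(/[{}]/g, '');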

apps/api/package.json (1)

16-41: Fastify ecosystem security verification complete—all versions are safe.

The security advisory check shows that fastify 5.6.1 is outside the vulnerable range (>= 5.0.0, <= 5.3.1) for the HIGH severity content-type parsing CVE, and @fastify/websocket 11.2.0 is outside both documented vulnerable ranges. @fastify/compress, @fastify/cors, and @fastify/rate-limit report no known vulnerabilities.

apps/api/src/utils/deduplicate.ts (1)

4-27: Remove contradictory tags and substantiate the 100ms window claim with evidence or context.

The code change itself is valid: adding ip and origin to the deduplication hash stops legitimate requests from different sources being flagged as duplicates. However, the review comment contains two contradictory tags and makes an unsubstantiated claim that "100ms provides a reasonable deduplication window" without evidence, testing, or explanation.

To properly approve or verify this change:

  • Either remove the verification request if 100ms is empirically sufficient, or
  • Provide measured data (e.g., typical retry timing from SDK, production duplicate scenarios) to justify the window, or
  • Add configuration/documentation explaining the choice
packages/queue/package.json (1)

13-14: No compatibility issues detected; pre-release groupmq version warrants evaluation.

The bullmq update (5.8.7 → 5.63.0) introduces breaking changes—explicit Redis connections, queue markers restructuring, job ID type constraints, and stalled job handling—but your codebase already accommodates them: all Worker and Queue instances use explicit connection parameters, job IDs are strings (not integers), and no stalled job repair or marker manipulation occurs. No security vulnerabilities were found for either package.

The primary concern is groupmq 1.1.0-next.5, a prerelease version that should be treated as unstable for production unless explicitly evaluated. Verify whether this pre-release version is intentional and aligns with your stability requirements.

apps/worker/src/index.ts (1)

36-38: LGTM!

The Bull Board configuration correctly adapts to the new sharded queue architecture by mapping over eventsGroupQueues to create individual adapters for each shard.

apps/worker/src/boot-workers.ts (4)

35-64: LGTM!

The getEnabledQueues function provides flexible queue configuration with sensible defaults and clear logging of the selected queues.


66-82: LGTM!

The getConcurrencyFor function provides per-queue concurrency configuration with proper validation and fallback to default values.


230-255: LGTM!

The metrics collection for job duration with per-queue-shard and status labels provides valuable observability into event processing performance.


277-287: LGTM!

The shutdown sequence properly waits for the cron queue to drain before closing workers and releases the singleton lock, ensuring graceful termination.

Comment on lines 90 to 95
  ...res,
  os: {
    ...res.os,
    version: osVersion[1]!.replace('_', '.'),
  },
};

⚠️ Potential issue | 🟠 Major

Convert every underscore in iPadOS versions

String.prototype.replace('_', '.') touches only the first underscore. iPadOS patch builds like 17_1_2 therefore surface as 17.1_2, breaking version comparisons and analytics bucketing. Swap to a global replacement:

-          version: osVersion[1]!.replace('_', '.'),
+          version: osVersion[1]!.replace(/_/g, '.'),
🤖 Prompt for AI Agents
In packages/common/server/parser-user-agent.ts around lines 90 to 95, the code
uses osVersion[1]!.replace('_', '.') which only replaces the first underscore
and leaves later underscores (e.g. "17_1_2") intact; change this to perform a
global replacement (e.g. use replace(/_/g, '.') or replaceAll('_', '.')
depending on target runtime/TS lib) so every underscore in the iPadOS version
string is converted to a dot, preserving the non-null assertion and original
object spread structure.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/worker/src/boot-workers.ts (1)

183-232: Fix incorrect metric usage: eventsGroupJobDuration used for all worker types.

Lines 207-210 and 226-229 call eventsGroupJobDuration.observe() for all workers in the forEach loop (line 183). However, this metric is specifically named events_group_job_duration_ms and should only track event group queue jobs, not sessions, cron, notification, or misc jobs. This pollutes the metric with incorrect data.

Apply this diff to only observe the metric for event workers:

     (worker as Worker).on('failed', (job) => {
       if (job) {
-        if (job.processedOn && job.finishedOn) {
+        // Only track duration for event group workers
+        if (worker.name.startsWith('events_') && job.processedOn && job.finishedOn) {
           const duration = job.finishedOn - job.processedOn;
           eventsGroupJobDuration.observe(
             { queue_shard: worker.name, status: 'failed' },
             duration,
           );
         }
         logger.error('job failed', {
           jobId: job.id,
           worker: worker.name,
-          data: job.data,
           error: job.failedReason,
           options: job.opts,
         });
       }
     });

     (worker as Worker).on('completed', (job) => {
       if (job) {
-        if (job.processedOn && job.finishedOn) {
+        // Only track duration for event group workers
+        if (worker.name.startsWith('events_') && job.processedOn && job.finishedOn) {
           const duration = job.finishedOn - job.processedOn;
           eventsGroupJobDuration.observe(
             { queue_shard: worker.name, status: 'success' },
             duration,
           );
         }
       }
     });
♻️ Duplicate comments (2)
apps/worker/src/metrics.ts (1)

18-25: Past issue appears resolved, but verify metric registration.

The past review flagged double registration of eventsGroupJobDuration. The current code shows only an explicit register.registerMetric() call on line 25 with no registers: [register] option in the Histogram constructor. This should work correctly, as other metrics in this file follow the same pattern (lines 28-82).

apps/worker/src/boot-workers.ts (1)

203-220: Logging job.data exposes PII including profileId and user properties.

Line 215 logs the entire job.data object for failed jobs. For event workers, this includes PII such as profileId, event.properties, geo coordinates, and device IDs, which qualify as personally identifiable information under GDPR/CCPA.

Remove data: job.data from line 215 and log only non-PII identifiers:

         logger.error('job failed', {
           jobId: job.id,
           worker: worker.name,
-          data: job.data,
           error: job.failedReason,
           options: job.opts,
         });
🧹 Nitpick comments (2)
packages/db/src/buffers/session-buffer.ts (1)

15-17: LGTM! Configurable chunk size follows existing pattern.

The new chunkSize property allows configuration of ClickHouse insert batch size independently from Redis fetch size, which is appropriate for different performance characteristics.

Optional: Consider input validation.

Both chunkSize and batchSize could benefit from validation to handle invalid environment variable values (NaN, zero, negative). However, this is consistent with the existing pattern for batchSize, so it can be addressed separately if needed.

packages/db/src/buffers/profile-buffer.ts (1)

148-161: Correctly addresses past review feedback.

The method now uses this.redis.get(cacheKey) instead of getRedisCache().get(cacheKey), addressing the consistency concern from previous reviews. The use of getSafeJson ensures safe parsing.

Consider adding input validation for edge cases:

 public async fetchFromCache(
   profileId: string,
   projectId: string,
 ): Promise<IClickhouseProfile | null> {
+  if (!profileId || !projectId) {
+    return null;
+  }
   const cacheKey = this.getProfileCacheKey({
     profileId,
     projectId,
   });
   const existingProfile = await this.redis.get(cacheKey);
   if (!existingProfile) {
     return null;
   }
   return getSafeJson<IClickhouseProfile>(existingProfile);
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c133a90 and c1b92b4.

📒 Files selected for processing (7)
  • apps/worker/scripts/cleanup-old-event-buffer-keys.ts (1 hunks)
  • apps/worker/scripts/migrate-delayed-jobs.ts (1 hunks)
  • apps/worker/src/boot-cron.ts (2 hunks)
  • apps/worker/src/boot-workers.ts (4 hunks)
  • apps/worker/src/metrics.ts (5 hunks)
  • packages/db/src/buffers/profile-buffer.ts (7 hunks)
  • packages/db/src/buffers/session-buffer.ts (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/worker/src/boot-cron.ts
🧰 Additional context used
🧬 Code graph analysis (5)
apps/worker/src/metrics.ts (1)
packages/queue/src/queues.ts (3)
  • sessionsQueue (171-176)
  • cronQueue (181-186)
  • eventsGroupQueues (141-160)
apps/worker/src/boot-workers.ts (4)
packages/redis/redis.ts (1)
  • getRedisQueue (93-105)
packages/queue/src/queues.ts (8)
  • EVENTS_GROUP_QUEUES_SHARDS (14-17)
  • eventsGroupQueues (141-160)
  • EventsQueuePayloadIncomingEvent (28-68)
  • queueLogger (26-26)
  • sessionsQueue (171-176)
  • cronQueue (181-186)
  • notificationQueue (202-210)
  • miscQueue (188-193)
apps/worker/src/jobs/events.incoming-event.ts (1)
  • incomingEventPure (52-212)
apps/worker/src/metrics.ts (1)
  • eventsGroupJobDuration (18-23)
packages/db/src/buffers/profile-buffer.ts (2)
packages/db/src/services/profile.service.ts (1)
  • IClickhouseProfile (245-255)
packages/json/index.ts (1)
  • getSafeJson (3-9)
apps/worker/scripts/migrate-delayed-jobs.ts (1)
packages/redis/redis.ts (1)
  • getRedisQueue (93-105)
apps/worker/scripts/cleanup-old-event-buffer-keys.ts (2)
packages/redis/redis.ts (1)
  • getRedisCache (66-72)
packages/logger/index.ts (1)
  • createLogger (11-100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-and-push-dashboard
  • GitHub Check: build-and-push-api
  • GitHub Check: build-and-push-worker
🔇 Additional comments (8)
packages/db/src/buffers/session-buffer.ts (2)

215-215: Past review concern addressed - now uses configurable chunk size.

The hard-coded 1000 value has been replaced with this.chunkSize, aligning the ClickHouse insert batch size with the configurable property.


231-231: Appropriate log level adjustment.

Changing from info to debug reduces log verbosity for this high-frequency operation while maintaining error visibility at line 235.

packages/db/src/buffers/profile-buffer.ts (6)

18-20: LGTM! Clear TTL configuration.

The uniform TTL configuration with environment variable override is well-structured and the 1-hour default is appropriate for profile caching.


52-52: Correctly addresses past review feedback.

The method now consistently uses this.redis instead of calling getRedisCache() directly, which aligns with the instance field initialized in the constructor.


136-142: Good refactoring for maintainability.

The refactored code using fetchFromCache is cleaner and provides better separation of concerns compared to inline Redis operations.


190-190: Appropriate log level adjustments.

Downgrading routine processing logs to debug level reduces log noise in production while maintaining error visibility. This is appropriate for background buffer processing operations.

Also applies to: 202-202, 222-222


163-186: Verify the caching strategy for ClickHouse fetches.

When fetchFromClickhouse retrieves a profile from the database (lines 163-186), it returns the result without caching it in Redis. This means subsequent reads for the same profile will hit ClickHouse again until the profile is updated through the add() method.

This could lead to repeated database queries for profiles that are frequently read but infrequently updated. If this is unintentional, consider caching the ClickHouse result:

 private async fetchFromClickhouse(
   profile: IClickhouseProfile,
   logger: ILogger,
 ): Promise<IClickhouseProfile | null> {
   logger.debug('Fetching profile from Clickhouse');
   const result = await chQuery<IClickhouseProfile>(
     `SELECT *
      FROM ${TABLE_NAMES.profiles}
      WHERE project_id = '${profile.project_id}'
        AND id = '${profile.id}'
        ${
          profile.is_external === false
            ? 'AND created_at > now() - INTERVAL 2 DAY'
            : ''
        }
      ORDER BY created_at DESC
      LIMIT 1`,
   );

   logger.debug('Clickhouse fetch result', {
     found: !!result[0],
   });
-  return result[0] || null;
+  const fetchedProfile = result[0] || null;
+  
+  // Cache the fetched profile to avoid repeated ClickHouse queries
+  if (fetchedProfile) {
+    const cacheKey = this.getProfileCacheKey({
+      profileId: fetchedProfile.id,
+      projectId: fetchedProfile.project_id,
+    });
+    await this.redis.set(
+      cacheKey,
+      JSON.stringify(fetchedProfile),
+      'EX',
+      this.ttlInSeconds,
+    );
+  }
+  
+  return fetchedProfile;
 }

100-100: The review comment is based on an incorrect assumption and should be dismissed.

The is_external flag at line 174 is used in a Clickhouse query for time-based filtering, not for Redis cache TTL management. There is no evidence of previous "per-profile external/internal TTL logic." The uniform ttlInSeconds appears to be the existing implementation, and the code uses a single TTL value derived from the PROFILE_BUFFER_TTL_IN_SECONDS environment variable (default: 1 hour). No behavior change regarding TTL differentiation has occurred.

Likely an incorrect or invalid review comment.

Comment on lines +182 to +198
    do {
      const [newCursor, keys] = await redis.scan(
        cursor,
        'MATCH',
        'event_buffer:session:*',
        'COUNT',
        100,
      );
      cursor = newCursor;

      if (keys.length > 0) {
        const deleted = await redis.del(...keys);
        remainingSessionKeys += deleted;
        stats.totalKeysDeleted += deleted;
        logger.info(`Found and deleted ${deleted} remaining session keys`);
      }
    } while (cursor !== '0');

⚠️ Potential issue | 🔴 Critical

Avoid deleting leftover session lists without migrating their events

The final scan removes every event_buffer:session:* key outright. If any session list slipped out of the sorted sets (e.g., partial writes, manual cleanup), this code drops the events instead of migrating them into event_buffer:queue, causing silent data loss. Please migrate any remaining list contents (or at least log and halt) before deletion.

Apply this diff to migrate-and-delete safely:

-    do {
+    const newQueueKey = 'event_buffer:queue';
+    do {
       const [newCursor, keys] = await redis.scan(
         cursor,
         'MATCH',
         'event_buffer:session:*',
         'COUNT',
         100,
       );
       cursor = newCursor;

       if (keys.length > 0) {
-        const deleted = await redis.del(...keys);
-        remainingSessionKeys += deleted;
-        stats.totalKeysDeleted += deleted;
-        logger.info(`Found and deleted ${deleted} remaining session keys`);
+        for (const key of keys) {
+          const events = await redis.lrange(key, 0, -1);
+          if (events.length > 0) {
+            await redis.rpush(newQueueKey, ...events);
+            await redis.incrby('event_buffer:total_count', events.length);
+            stats.totalEventsMigrated += events.length;
+          }
+
+          const deleted = await redis.del(key);
+          if (deleted) {
+            remainingSessionKeys += 1;
+            stats.totalKeysDeleted += 1;
+          }
+        }
+        logger.info(`Processed ${keys.length} remaining session keys`);
       }
-    } while (cursor !== '0');
+    } while (cursor !== '0');
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    do {
      const [newCursor, keys] = await redis.scan(
        cursor,
        'MATCH',
        'event_buffer:session:*',
        'COUNT',
        100,
      );
      cursor = newCursor;
      if (keys.length > 0) {
        const deleted = await redis.del(...keys);
        remainingSessionKeys += deleted;
        stats.totalKeysDeleted += deleted;
        logger.info(`Found and deleted ${deleted} remaining session keys`);
      }
    } while (cursor !== '0');

    const newQueueKey = 'event_buffer:queue';
    do {
      const [newCursor, keys] = await redis.scan(
        cursor,
        'MATCH',
        'event_buffer:session:*',
        'COUNT',
        100,
      );
      cursor = newCursor;
      if (keys.length > 0) {
        for (const key of keys) {
          const events = await redis.lrange(key, 0, -1);
          if (events.length > 0) {
            await redis.rpush(newQueueKey, ...events);
            await redis.incrby('event_buffer:total_count', events.length);
            stats.totalEventsMigrated += events.length;
          }
          const deleted = await redis.del(key);
          if (deleted) {
            remainingSessionKeys += 1;
            stats.totalKeysDeleted += 1;
          }
        }
        logger.info(`Processed ${keys.length} remaining session keys`);
      }
    } while (cursor !== '0');
🤖 Prompt for AI Agents
In apps/worker/scripts/cleanup-old-event-buffer-keys.ts around lines 182 to 198,
the code currently deletes every event_buffer:session:* key outright which can
drop events; instead, for each scanned key inspect its type and contents and
migrate any remaining list items into the canonical event_buffer:queue before
deleting. Concretely: for each key returned by SCAN, call TYPE (or LRANGE 0 -1)
to detect a list and fetch all elements; if elements exist, push them into
event_buffer:queue in the correct order (use RPUSH or LPUSH to preserve original
ordering), update remainingSessionKeys/stats with number migrated (not just
deleted), log the migration, and only then DEL the empty session key; if
migration fails, log a clear error and stop/throw rather than silently deleting;
handle non-list keys by logging and skipping or deleting based on policy; ensure
operations are batched or use a MULTI/EXEC pipeline to make migration + delete
atomic where possible and add error handling and metrics.

Comment on lines +140 to +155
const queuesToMigrate: Array<{ old: string; new: string }> = [
  { old: 'sessions', new: '{sessions}' },
  { old: 'misc', new: '{misc}' },
];

// Filter to specific queue if requested
const filtered = specificQueue
  ? queuesToMigrate.filter((q) => q.old === specificQueue)
  : queuesToMigrate;

if (filtered.length === 0) {
  console.error(
    `❌ Queue "${specificQueue}" not found. Valid queues: sessions, cron, notification, misc`,
  );
  process.exit(1);
}

⚠️ Potential issue | 🟡 Minor

Fix inconsistency between available queues and error message.

The queuesToMigrate array (lines 140-143) only includes sessions and misc, but the error message on line 152 claims that valid queues are "sessions, cron, notification, misc". This is misleading.

Apply this diff to align the error message with the actual available queues:

   if (filtered.length === 0) {
     console.error(
-      `❌ Queue "${specificQueue}" not found. Valid queues: sessions, cron, notification, misc`,
+      `❌ Queue "${specificQueue}" not found. Valid queues: sessions, misc`,
     );
     process.exit(1);
   }

Alternatively, if cron and notification queues should be supported, add them to the queuesToMigrate array.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const queuesToMigrate: Array<{ old: string; new: string }> = [
  { old: 'sessions', new: '{sessions}' },
  { old: 'misc', new: '{misc}' },
];
// Filter to specific queue if requested
const filtered = specificQueue
  ? queuesToMigrate.filter((q) => q.old === specificQueue)
  : queuesToMigrate;
if (filtered.length === 0) {
  console.error(
    `❌ Queue "${specificQueue}" not found. Valid queues: sessions, cron, notification, misc`,
  );
  process.exit(1);
}

const queuesToMigrate: Array<{ old: string; new: string }> = [
  { old: 'sessions', new: '{sessions}' },
  { old: 'misc', new: '{misc}' },
];
// Filter to specific queue if requested
const filtered = specificQueue
  ? queuesToMigrate.filter((q) => q.old === specificQueue)
  : queuesToMigrate;
if (filtered.length === 0) {
  console.error(
    `❌ Queue "${specificQueue}" not found. Valid queues: sessions, misc`,
  );
  process.exit(1);
}

@lindesvard lindesvard closed this Nov 7, 2025
@lindesvard lindesvard deleted the feature/performance branch November 7, 2025 15:17