
Conversation


@lindesvard lindesvard commented Oct 19, 2025

Summary by CodeRabbit

  • New Features

    • Event import flow for Umami and Mixpanel with provider-specific import modal and project mappings
    • New "Imports" settings tab showing import history, live progress, status badges, retry/delete actions and detailed config view
  • UI

    • Badge component gains success/warning/info variants
    • Project mapper UI for mapping source→destination projects; date picker caption offers dropdown layout
  • Chores

    • Database and build updates to support import tracking and background import processing


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 28

🧹 Nitpick comments (35)
apps/start/src/modals/index.tsx (1)

8-8: Consider lazy‑loading AddImport to keep initial bundle slim

AddImport brings TRPC/React Query; load it on demand.

Apply:

+import { lazy } from 'react';
-import AddImport from './add-import';
+const AddImport = lazy(() => import('./add-import'));

Ensure ModalProvider or the modal surface renders within a `<Suspense>` boundary so the lazy component has a fallback while it loads.

Also applies to: 41-41

packages/importer/src/base-provider.ts (1)

7-8: Make core hooks async‑friendly and identifiers immutable

Allow providers to perform async transforms/validation; mark identifiers readonly.

+// Utility type to allow sync or async implementations
+export type MaybePromise<T> = T | Promise<T>;
+
 export abstract class BaseImportProvider<
   TRawEvent extends BaseRawEvent = BaseRawEvent,
 > {
-  abstract provider: string;
-  abstract version: string;
+  abstract readonly provider: string;
+  abstract readonly version: string;

   /**
    * Stream-read and parse source (file/API) → yields raw events
    * This should be implemented as an async generator to handle large files efficiently
    */
   abstract parseSource(): AsyncGenerator<TRawEvent, void, unknown>;

   /**
    * Convert provider format → IClickhouseEvent
    */
-  abstract transformEvent(rawEvent: TRawEvent): IClickhouseEvent;
+  abstract transformEvent(rawEvent: TRawEvent): MaybePromise<IClickhouseEvent>;

   /**
    * Validate raw event structure
    */
-  abstract validate(rawEvent: TRawEvent): boolean;
+  abstract validate(rawEvent: TRawEvent): MaybePromise<boolean>;

Also applies to: 19-24

packages/importer/src/utils/referrer-parser.ts (3)

5-35: Type the map and normalize hosts; reduce duplicates

Strong types + normalization avoid listing www/m. variants twice and improve coverage.

 import { stripTrailingSlash } from '@openpanel/common';

+type ReferrerType = 'search' | 'social' | 'referral';
+type ReferrerEntry = { type: ReferrerType; name: string };
+const normalizeHost = (h: string) => h.replace(/^(www\.|m\.)/i, '').toLowerCase();

-const referrers: Record<string, { type: string; name: string }> = {
+const referrers: Record<string, ReferrerEntry> = {
   // Search engines
-  'google.com': { type: 'search', name: 'Google' },
-  'www.google.com': { type: 'search', name: 'Google' },
+  'google.com': { type: 'search', name: 'Google' },
   'bing.com': { type: 'search', name: 'Bing' },
-  'www.bing.com': { type: 'search', name: 'Bing' },
   'yahoo.com': { type: 'search', name: 'Yahoo!' },
-  'www.yahoo.com': { type: 'search', name: 'Yahoo!' },
   'duckduckgo.com': { type: 'search', name: 'DuckDuckGo' },
-  'www.duckduckgo.com': { type: 'search', name: 'DuckDuckGo' },
   'baidu.com': { type: 'search', name: 'Baidu' },
-  'www.baidu.com': { type: 'search', name: 'Baidu' },

   // Social media
-  'facebook.com': { type: 'social', name: 'Facebook' },
-  'www.facebook.com': { type: 'social', name: 'Facebook' },
+  'facebook.com': { type: 'social', name: 'Facebook' },
   'twitter.com': { type: 'social', name: 'Twitter' },
-  'www.twitter.com': { type: 'social', name: 'Twitter' },
   'x.com': { type: 'social', name: 'X' },
-  'www.x.com': { type: 'social', name: 'X' },
   'linkedin.com': { type: 'social', name: 'LinkedIn' },
-  'www.linkedin.com': { type: 'social', name: 'LinkedIn' },
   'instagram.com': { type: 'social', name: 'Instagram' },
-  'www.instagram.com': { type: 'social', name: 'Instagram' },
   'tiktok.com': { type: 'social', name: 'TikTok' },
-  'www.tiktok.com': { type: 'social', name: 'TikTok' },
   'youtube.com': { type: 'social', name: 'YouTube' },
-  'www.youtube.com': { type: 'social', name: 'YouTube' },
   'reddit.com': { type: 'social', name: 'Reddit' },
-  'www.reddit.com': { type: 'social', name: 'Reddit' },
+  't.co': { type: 'social', name: 'Twitter' },
+  'lnkd.in': { type: 'social', name: 'LinkedIn' },
 };
 
 export function parseReferrer(url: string | undefined) {
-  const hostname = getHostname(url);
-  const match = referrers[hostname] ?? referrers[hostname.replace('www.', '')];
+  const hostname = normalizeHost(getHostname(url));
+  const match = referrers[hostname];

Also applies to: 49-52
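The `normalizeHost` helper suggested above is small enough to sketch standalone (assumed shape — it strips a single leading `www.` or `m.` label and lower-cases the rest):

```typescript
// Sketch of the suggested helper: strip one leading "www." / "m." prefix,
// then lower-case so map lookups are case-insensitive.
const normalizeHost = (h: string): string =>
  h.replace(/^(www\.|m\.)/i, '').toLowerCase();
```

Note it removes only one prefix, so `www.m.example.com` normalizes to `m.example.com`; that is fine for the hosts in this map but worth keeping in mind if deeper prefixes appear.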


49-57: Guard output URL to http(s) only

Avoid echoing non-web schemes (e.g., javascript:) back to callers.

 export function parseReferrer(url: string | undefined) {
   const hostname = getHostname(url);
   const match = referrers[hostname] ?? referrers[hostname.replace('www.', '')];

-  return {
+  let safeUrl = '';
+  try {
+    if (url) {
+      const u = new URL(url);
+      if (u.protocol === 'http:' || u.protocol === 'https:') {
+        safeUrl = stripTrailingSlash(url);
+      }
+    }
+  } catch {
+    // ignore
+  }
+
+  return {
     name: match?.name ?? '',
     type: match?.type ?? 'referral',
-    url: stripTrailingSlash(url ?? ''),
+    url: safeUrl,
   };
 }
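The scheme guard above can be exercised in isolation; a minimal standalone sketch (with a local regex standing in for `stripTrailingSlash`, which lives in `@openpanel/common`):

```typescript
// Hypothetical standalone guard mirroring the http(s)-only check above.
function safeReferrerUrl(url: string | undefined): string {
  if (!url) return '';
  try {
    const u = new URL(url);
    if (u.protocol === 'http:' || u.protocol === 'https:') {
      return url.replace(/\/+$/, ''); // stand-in for stripTrailingSlash
    }
  } catch {
    // unparsable input falls through to ''
  }
  return '';
}
```

Anything with a non-web scheme (`javascript:`, `data:`, …) or that fails to parse comes back as an empty string rather than being echoed to callers.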

60-91: Minor: normalize source before lookup and keep types consistent

Lower-case once; reuse normalizeHost for direct key match; return typed result.

-export function getReferrerWithQuery(
-  query: Record<string, string> | undefined,
-) {
+export function getReferrerWithQuery(
+  query: Record<string, string> | undefined,
+): { name: string; type: ReferrerType; url: string } | null {
   if (!query) {
     return null;
   }

-  const source = query.utm_source ?? query.ref ?? query.utm_referrer ?? '';
+  const source = (query.utm_source ?? query.ref ?? query.utm_referrer ?? '').trim();
   if (source === '') {
     return null;
   }

-  const match =
-    Object.values(referrers).find(
-      (referrer) => referrer.name.toLowerCase() === source.toLowerCase(),
-    ) || referrers[source];
+  const lower = source.toLowerCase();
+  const match =
+    Object.values(referrers).find((r) => r.name.toLowerCase() === lower) ||
+    referrers[normalizeHost(lower)];

   if (match) {
     return {
       name: match.name,
       type: match.type,
       url: '',
     };
   }
packages/db/prisma/migrations/20251018214030_fix/migration.sql (1)

1-13: Consider more descriptive migration naming.

The migration name "fix" is not descriptive. Consider renaming to something like "remove_redundant_import_columns" or "migrate_import_config_to_json" to better indicate the purpose of this schema change.

packages/db/code-migrations/5-add-imports-table.sql (1)

32-32: Consider using an Enum for import_status.

The import_status field uses LowCardinality(String) with a default of 'pending'. For better type safety and query performance, consider using an Enum type instead:

`import_status` LowCardinality(Enum8('pending' = 1, 'processing' = 2, 'completed' = 3, 'failed' = 4)) DEFAULT 'pending'

This ensures only valid status values can be stored and provides better compression.

packages/db/prisma/schema.prisma (1)

470-486: Consider using an enum for the status field.

The status field is defined as a String with a comment indicating it should be one of 'pending' | 'processing' | 'completed' | 'failed'. For better type safety and database constraints, define a Prisma enum:

enum ImportStatus {
  pending
  processing
  completed
  failed
}

model Import {
  // ...
  status ImportStatus // instead of String
  // ...
}

This provides compile-time type checking and database-level validation.

packages/trpc/src/root.ts (1)

7-7: Router key named "import" (reserved keyword) — consider renaming for DX.

Valid in JS/TS, but can confuse tooling/usages (e.g., appRouter.import vs dynamic import()). Consider renaming to imports/importer for clarity.

Also applies to: 44-44
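To make the DX concern concrete — property access on an `import` key works, but plain destructuring does not, which is the kind of friction the rename avoids (illustrative object, not the real router):

```typescript
// `import` is legal as a property name…
const appRouter = { import: { list: (): string[] => [] } };
appRouter.import.list(); // fine

// …but not as a bare binding:
// const { import } = appRouter;        // SyntaxError: reserved word
const { import: importRouter } = appRouter; // a rename is mandatory
importRouter.list();
```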

packages/queue/src/queues.ts (1)

163-176: Harden import jobs: add attempts/backoff; optionally expose QueueEvents.

Imports call external providers; transient failures are expected. Provide sensible defaults and a metrics hook. (Note: BullMQ has no per-job `timeout` option — if a deadline is needed, enforce it inside the processor.)

Apply:

 export const importQueue = new Queue<ImportQueuePayload>('import', {
   connection: getRedisQueue(),
   defaultJobOptions: {
     removeOnComplete: 10,
     removeOnFail: 50,
+    attempts: 3,
+    backoff: { type: 'exponential', delay: 30_000 },
   },
 });
+
+// Optional: enable events for monitoring dashboards/Bull Board progress
+export const importQueueEvents = new QueueEvents('import', {
+  connection: getRedisQueue(),
+});
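For context on the suggested options: BullMQ's built-in `exponential` backoff doubles the base delay per attempt (`delay * 2^(attemptsMade - 1)`), so the retry schedule the settings above produce can be sketched as:

```typescript
// Retry schedule for backoff { type: 'exponential', delay: 30_000 }.
const exponentialDelay = (baseMs: number, attemptsMade: number): number =>
  baseMs * 2 ** (attemptsMade - 1);

// attempts: 3 → retries wait ~30s, then ~60s (third attempt is the last).
const schedule = [1, 2, 3].map((n) => exponentialDelay(30_000, n));
```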
apps/worker/src/boot-workers.ts (3)

138-142: Also drain import queue before shutdown to avoid cutting imports mid-run.

-      await waitForQueueToEmpty(cronQueue);
+      await waitForQueueToEmpty(cronQueue);
+      await waitForQueueToEmpty(importQueue, 10 * 60_000); // 10 min cap; tune as needed

155-167: Dev path skips graceful shutdown — can leave jobs in inconsistent state.

Unify behavior and call the same exit handler in dev.

-      process.on(evt, (code) => {
-        if (process.env.NODE_ENV === 'production') {
-          exitHandler(evt, code);
-        } else {
-          logger.info('Shutting down for development', {
-            event: evt,
-            code,
-          });
-          process.exit(0);
-        }
-      });
+      process.on(evt, (code) => {
+        exitHandler(evt, code ?? 0);
+      });

4-13: Remove unused imports and dead/commented code for clarity.

eventsGroupQueue, GroupWorker, eventsJob, incomingEventPure are no longer used; commented worker block can be dropped.

Also applies to: 18-23, 33-48

packages/trpc/src/routers/import.ts (3)

121-131: Guard job removal on delete with error handling (active/locked jobs can throw).

        const job = await importQueue.getJob(importRecord.jobId);
-        if (job) {
-          await job.remove();
-        }
+        if (job) {
+          try {
+            await job.remove();
+          } catch (e) {
+            // best-effort removal; proceed with deletion
+          }
+        }

165-171: Retry should set stable jobId and attempts/backoff like create.

-      const job = await importQueue.add('import', {
-        type: 'import',
-        payload: {
-          importId: importRecord.id,
-        },
-      });
+      const job = await importQueue.add(
+        'import',
+        { type: 'import', payload: { importId: importRecord.id } },
+        { jobId: importRecord.id, attempts: 3, backoff: { type: 'exponential', delay: 30_000 } },
+      );

3-6: Unused type import (IImportConfig).

IImportConfig is not referenced. Remove to satisfy lint.

packages/importer/src/index.ts (1)

1-14: Public API surface looks good; minor polish

Consider also re-exporting provider config/types (e.g., IUmamiImportConfig) from providers to keep consumers on a single import path. Optional.

IMPORT_REVIEW.md (1)

320-337: Move this review doc to /docs or PR notes to avoid shipping in package tarballs

Keeps repo clean and avoids confusing consumers.

apps/worker/src/jobs/import.ts (4)

1-11: Avoid deep imports; prefer package barrel exports

Importing services via '@openpanel/db/src/services/import.service' couples to internal paths. Re-export these from '@openpanel/db' and import from there to reduce breakage.


75-81: Progress totalBatches fluctuates; compute once or report pct

As totalEvents grows, totalBatches changes mid-run. Either omit totalBatches here or compute a percentage for UI.

-          totalBatches: Math.ceil(totalEvents / BATCH_SIZE),
+          progressPct: totalEvents ? Math.round((processedEvents / totalEvents) * 100) : 0,

17-17: Make BATCH_SIZE configurable

Read from config/env to tune per deployment.

-const BATCH_SIZE = 5000;
+const BATCH_SIZE = Number(process.env.IMPORT_BATCH_SIZE ?? 5000);
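One caveat with the bare `Number(process.env.IMPORT_BATCH_SIZE ?? 5000)`: `Number('')` is `0` and `Number('abc')` is `NaN`, either of which would break batching. A guarded resolver (hypothetical helper name) is a safer sketch:

```typescript
// Sketch: env-driven batch size with a guard against NaN / non-positive values.
function resolveBatchSize(raw: string | undefined, fallback = 5000): number {
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

const BATCH_SIZE = resolveBatchSize(process.env.IMPORT_BATCH_SIZE);
```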

200-209: Validate provider-specific config with a schema before instantiation

Avoid passing untyped Prisma JSON into providers. Parse with provider schema (zod) and fail early.

 function createProvider(
   record: Prisma.ImportGetPayload<{ include: { project: true } }>,
 ) {
   switch (record.config.provider) {
     case 'umami':
-      return new UmamiProvider(record.projectId, record.config);
+      // Example: zUmamiImportConfig.safeParse(record.config)
+      return new UmamiProvider(
+        record.projectId,
+        (record.config as any) /* TODO: parse/validate */,
+      );
     default:
       throw new Error(`Unknown provider: ${record.config.provider}`);
   }
 }

I can wire in zod parsing if you expose the provider config schemas from @openpanel/importer.

packages/db/prisma/migrations/20251018205153_add_import_table/migration.sql (2)

9-16: Constrain status values

Add a CHECK constraint (or enum) to prevent invalid states.

   "status" TEXT NOT NULL,
@@
 ) ;
+
+-- Restrict status values
+ALTER TABLE "public"."imports"
+  ADD CONSTRAINT "imports_status_check"
+  CHECK ("status" IN ('pending','processing','completed','failed'));

18-22: Add indexes for common queries

List/filters will hit projectId/status/createdAt frequently.

 -- AddForeignKey
 ALTER TABLE "public"."imports" ADD CONSTRAINT "imports_projectId_fkey" FOREIGN KEY ("projectId") REFERENCES "public"."projects"("id") ON DELETE CASCADE ON UPDATE CASCADE;
+
+-- Indexes for performance
+CREATE INDEX IF NOT EXISTS "idx_imports_project_createdAt"
+  ON "public"."imports" ("projectId", "createdAt" DESC);
+CREATE INDEX IF NOT EXISTS "idx_imports_status_createdAt"
+  ON "public"."imports" ("status", "createdAt" DESC);
apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx (5)

5-10: Remove unused icon import

CheckCircle is unused.

-import {
-  CheckCircle,
-  CheckCircleIcon,
-  Download,
-  Loader2Icon,
-  XCircleIcon,
-} from 'lucide-react';
+import { CheckCircleIcon, Download, Loader2Icon, XCircleIcon } from 'lucide-react';

83-104: Add onError to mutations for user feedback

Surface errors with toasts; also consider logging.

 const deleteImport = useMutation(
   trpc.import.delete.mutationOptions({
     onSuccess: () => {
       toast.success('Import deleted', {
         description: 'The import has been successfully deleted.',
       });
       queryClient.invalidateQueries(trpc.import.list.pathFilter());
     },
+    onError: (err) => {
+      toast.error('Delete failed', { description: String(err) });
+    },
   }),
 );
@@
 const retryImport = useMutation(
   trpc.import.retry.mutationOptions({
     onSuccess: () => {
       toast.success('Import retried', {
         description: 'The import has been queued for processing again.',
       });
       queryClient.invalidateQueries(trpc.import.list.pathFilter());
     },
+    onError: (err) => {
+      toast.error('Retry failed', { description: String(err) });
+    },
   }),
 );

244-261: Disable action buttons while pending to avoid double submits

Tie disabled state to mutation pending flags.

-                    {imp.status === 'failed' && (
+                    {imp.status === 'failed' && (
                       <Button
                         variant="outline"
                         size="sm"
-                        onClick={() => retryImport.mutate({ id: imp.id })}
+                        disabled={retryImport.isPending}
+                        onClick={() => retryImport.mutate({ id: imp.id })}
                       >
                         Retry
                       </Button>
                     )}
                     <Button
                       variant="ghost"
                       size="sm"
-                      onClick={() => deleteImport.mutate({ id: imp.id })}
+                      disabled={deleteImport.isPending}
+                      onClick={() => deleteImport.mutate({ id: imp.id })}
                     >
                       Delete
                     </Button>

41-66: Consider serving logos locally or via asset proxy

External Brandfetch URLs can break and may violate CSP. If feasible, mirror assets or use a proxy.


141-161: Consider extending IntegrationCardLogoImage to accept a customizable alt prop for improved accessibility

The alt attribute is currently hardcoded to the generic "Integration Logo" at line 66 of apps/start/src/components/integrations/integration-card.tsx. Extending the component to accept an optional alt prop and passing provider.name would provide more descriptive alternative text for each integration, improving accessibility and SEO.

packages/validation/src/index.ts (1)

444-445: Local file support vs schema

fileUrl is z.string().url(), but the Umami provider supports local files. Either (a) allow file:/// or absolute paths, or (b) drop local-path support in the provider. Today this schema blocks local imports.

- fileUrl: z.string().url(),
+ fileUrl: z.string().refine(
+   (v) => /^https?:\/\//i.test(v) || /^file:\/\//i.test(v) || v.startsWith('/'),
+   'Must be an http(s) URL, file:// URL, or absolute path',
+ ),
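The refine predicate above is easy to sanity-check on its own; a standalone sketch of the same rule (hypothetical function name):

```typescript
// Mirrors the suggested .refine() predicate: accept http(s) URLs,
// file:// URLs, or absolute filesystem paths.
const isAllowedFileUrl = (v: string): boolean =>
  /^https?:\/\//i.test(v) || /^file:\/\//i.test(v) || v.startsWith('/');
```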
packages/importer/src/providers/umami.ts (2)

178-195: Byte counting via body.on('data') competes with pipeline

Attach a lightweight Transform to count bytes between body and decompress to preserve backpressure and avoid dual consumers.

-    if (maxBytes) {
-      body.on('data', (chunk: Buffer) => {
-        seenBytes += chunk.length;
-        if (seenBytes > maxBytes) {
-          controller.abort();
-          body.destroy(new Error(`Stream exceeded size limit (${seenBytes} > ${maxBytes})`));
-        }
-      });
-    }
+    const byteCounter = new (await import('node:stream')).Transform({
+      transform(chunk, _enc, cb) {
+        seenBytes += chunk.length;
+        if (maxBytes && seenBytes > maxBytes) {
+          cb(new Error(`Stream exceeded size limit (${seenBytes} > ${maxBytes})`));
+          return;
+        }
+        cb(null, chunk);
+      },
+    });
@@
-        if (decompress) {
-          await pipeline(body, decompress, parser, { signal: controller.signal });
-        } else {
-          await pipeline(body, parser, { signal: controller.signal });
-        }
+        if (decompress) {
+          await pipeline(body, byteCounter, decompress, parser, { signal: controller.signal });
+        } else {
+          await pipeline(body, byteCounter, parser, { signal: controller.signal });
+        }

Also applies to: 211-227
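A self-contained sketch of the counting Transform (plain `node:stream`, no undici specifics assumed): the stream has a single consumer, backpressure is preserved, and crossing the limit fails the pipeline instead of racing a second `'data'` listener.

```typescript
import { Readable, Transform, Writable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Count bytes in-line between source and sink; error once the cap is crossed.
function createByteCounter(maxBytes?: number) {
  let seen = 0;
  const counter = new Transform({
    transform(chunk: Buffer, _enc, cb) {
      seen += chunk.length;
      if (maxBytes !== undefined && seen > maxBytes) {
        cb(new Error(`Stream exceeded size limit (${seen} > ${maxBytes})`));
        return;
      }
      cb(null, chunk); // pass data through unchanged
    },
  });
  return { counter, bytesSeen: () => seen };
}

// Discarding sink, used here only for demonstration.
const devNull = () => new Writable({ write(_c, _e, cb) { cb(); } });
```

With something like this in place, the counter slots between `body` and `decompress` in the existing `pipeline(...)` call.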


91-99: Optional: support local Brotli (.br) like remote path

Mirror remote handling and support .br locally if needed.

     const isGzipped = filePath.endsWith('.gz');
+    const isBrotli = filePath.endsWith('.br');
@@
         if (isGzipped) {
           await pipeline(fileStream, createGunzip(), parser);
-        } else {
+        } else if (isBrotli) {
+          await pipeline(fileStream, createBrotliDecompress(), parser);
+        } else {
           await pipeline(fileStream, parser);
         }
packages/db/src/services/import.service.ts (3)

78-92: Consider using CTEs or subqueries for better maintainability.

The argMin/argMax tuple approach works but is fragile—the 17-element tuple order must match the destructuring on lines 147-173. If fields are added, removed, or reordered, this becomes error-prone and difficult to debug.

A clearer alternative would be to use CTEs with ROW_NUMBER() or separate queries for first/last events, making field selection explicit:

WITH ranked_events AS (
  SELECT *,
    ROW_NUMBER() OVER (PARTITION BY device_id, session_id, project_id, profile_id ORDER BY created_at ASC) as first_rank,
    ROW_NUMBER() OVER (PARTITION BY device_id, session_id, project_id, profile_id ORDER BY created_at DESC) as last_rank
  FROM ${TABLE_NAMES.events_imports}
  WHERE ${rangeWhere} AND name NOT IN ('session_start', 'session_end')
)
SELECT ...
FROM ranked_events
WHERE first_rank = 1 OR last_rank = 1

26-525: Consider adding explicit error handling and logging.

None of the functions include try-catch blocks or error enrichment. While throwing errors to the caller is acceptable, adding contextual error handling would improve debuggability and allow for better error classification (transient vs. permanent failures).

Consider wrapping critical operations:

export async function insertImportBatch(
  events: IClickhouseEvent[],
  importId: string,
): Promise<ImportStageResult> {
  try {
    if (events.length === 0) {
      return { importId, totalEvents: 0, insertedEvents: 0 };
    }
    
    const eventsWithMetadata = events.map((event) => ({
      ...event,
      import_id: importId,
      import_status: 'pending',
      imported_at_meta: new Date(),
    }));

    await ch.insert({
      table: TABLE_NAMES.events_imports,
      values: eventsWithMetadata,
      format: 'JSONEachRow',
    });

    return {
      importId,
      totalEvents: events.length,
      insertedEvents: events.length,
    };
  } catch (error) {
    // Add context and rethrow or log
    throw new Error(`Failed to insert import batch for ${importId}: ${error instanceof Error ? error.message : String(error)}`);
  }
}

This pattern would help with debugging import failures in production.


278-356: Consider batch migration for very large imports.

The INSERT INTO ... SELECT approach is efficient for most cases, but for imports with millions of events, it might:

  • Block for extended periods
  • Consume significant memory
  • Provide no progress feedback during the migration

For very large imports, consider implementing batch migration:

export async function migrateImportToProduction(
  importId: string,
  from?: string,
  to?: string,
  batchSize = 100000, // configurable batch size
): Promise<void> {
  let offset = 0;
  let migrated = 0;
  
  while (true) {
    const migrationQuery = `
      INSERT INTO ${TABLE_NAMES.events} (...)
      SELECT ...
      FROM ${TABLE_NAMES.events_imports}
      WHERE import_id = {importId:String}
      ${from ? 'AND created_at >= {from:String}' : ''}
      ${to ? 'AND created_at < {to:String}' : ''}
      ORDER BY created_at ASC
      LIMIT {batchSize:UInt32} OFFSET {offset:UInt32}
    `;
    
    const result = await ch.command({
      query: migrationQuery,
      query_params: { importId, from, to, batchSize, offset },
    });
    
    // ClickHouse `command()` does not report affected rows, and a
    // `migrated === 0` check would break out after the first batch.
    // Instead, probe how many rows fall in the current window
    // (hypothetical helper) and stop once a window comes back empty.
    const batchCount = await countImportRows(importId, offset, batchSize); // sketch
    if (batchCount === 0) break;
    migrated += batchCount;

    offset += batchSize;
    // Could call updateImportProgress(importId, migrated) here
  }
}

This is optional and depends on expected import sizes.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d8a297e and d71712e.

⛔ Files ignored due to path filters (2)
  • packages/importer/a70ff57d-f632-4292-a98b-658d7734fbec.csv is excluded by !**/*.csv
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (35)
  • IMPORT_REVIEW.md (1 hunks)
  • apps/start/package.json (1 hunks)
  • apps/start/src/components/integrations/integration-card.tsx (1 hunks)
  • apps/start/src/components/ui/badge.tsx (1 hunks)
  • apps/start/src/modals/add-import.tsx (1 hunks)
  • apps/start/src/modals/index.tsx (2 hunks)
  • apps/start/src/routeTree.gen.ts (11 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx (1 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.tsx (1 hunks)
  • apps/worker/package.json (1 hunks)
  • apps/worker/src/boot-workers.ts (5 hunks)
  • apps/worker/src/index.ts (2 hunks)
  • apps/worker/src/jobs/import.ts (1 hunks)
  • packages/db/code-migrations/5-add-imports-table.sql (1 hunks)
  • packages/db/code-migrations/5-add-imports-table.ts (1 hunks)
  • packages/db/prisma/migrations/20251018205153_add_import_table/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251018214030_fix/migration.sql (1 hunks)
  • packages/db/prisma/schema.prisma (2 hunks)
  • packages/db/src/clickhouse/client.ts (1 hunks)
  • packages/db/src/services/event.service.ts (1 hunks)
  • packages/db/src/services/import.service.ts (1 hunks)
  • packages/db/src/types.ts (2 hunks)
  • packages/importer/package.json (1 hunks)
  • packages/importer/src/base-provider.ts (1 hunks)
  • packages/importer/src/index.ts (1 hunks)
  • packages/importer/src/providers/umami.ts (1 hunks)
  • packages/importer/src/types.ts (1 hunks)
  • packages/importer/src/utils/referrer-parser.ts (1 hunks)
  • packages/importer/tsconfig.json (1 hunks)
  • packages/importer/vitest.config.ts (1 hunks)
  • packages/queue/src/queues.ts (1 hunks)
  • packages/trpc/package.json (1 hunks)
  • packages/trpc/src/root.ts (2 hunks)
  • packages/trpc/src/routers/import.ts (1 hunks)
  • packages/validation/src/index.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
apps/start/**/*.{ts,tsx}

📄 CodeRabbit inference engine (apps/start/.cursorrules)

apps/start/**/*.{ts,tsx}: Instrument server functions created with createServerFn by wrapping their implementation in Sentry.startSpan and importing Sentry from '@sentry/tanstackstart-react'
When using Sentry in the TanStack Start React app, import it as: import * as Sentry from '@sentry/tanstackstart-react'

Files:

  • apps/start/src/components/integrations/integration-card.tsx
  • apps/start/src/modals/index.tsx
  • apps/start/src/modals/add-import.tsx
  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.tsx
  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx
  • apps/start/src/components/ui/badge.tsx
  • apps/start/src/routeTree.gen.ts
🧬 Code graph analysis (17)
packages/db/src/types.ts (1)
packages/validation/src/index.ts (1)
  • IImportConfig (464-467)
packages/importer/src/utils/referrer-parser.ts (1)
packages/common/src/string.ts (1)
  • stripTrailingSlash (1-3)
packages/queue/src/queues.ts (1)
packages/redis/redis.ts (1)
  • getRedisQueue (93-105)
apps/worker/src/jobs/import.ts (3)
packages/queue/src/queues.ts (1)
  • ImportQueuePayload (163-168)
packages/importer/src/providers/umami.ts (2)
  • UmamiRawEvent (55-55)
  • UmamiProvider (57-426)
packages/db/src/services/import.service.ts (8)
  • insertImportBatch (26-53)
  • updateImportProgress (387-400)
  • getImportDateBounds (508-525)
  • reconstructSessions (61-272)
  • migrateImportToProduction (278-356)
  • markImportComplete (361-382)
  • getImportProgress (434-488)
  • markImportFailed (405-429)
apps/start/src/modals/index.tsx (1)
apps/start/src/modals/add-import.tsx (1)
  • AddImport (29-164)
packages/importer/vitest.config.ts (1)
vitest.shared.ts (1)
  • getSharedVitestConfig (4-27)
apps/worker/src/boot-workers.ts (2)
packages/queue/src/queues.ts (1)
  • importQueue (170-176)
apps/worker/src/jobs/import.ts (1)
  • importJob (19-198)
packages/trpc/src/root.ts (1)
packages/trpc/src/routers/import.ts (1)
  • importRouter (11-183)
packages/db/code-migrations/5-add-imports-table.ts (2)
packages/db/code-migrations/helpers.ts (1)
  • getIsCluster (17-24)
packages/db/src/clickhouse/migration.ts (2)
  • createTable (68-116)
  • runClickhouseMigrationCommands (322-454)
apps/start/src/modals/add-import.tsx (2)
apps/start/src/integrations/tanstack-query/root-provider.tsx (1)
  • Provider (68-86)
packages/importer/src/types.ts (1)
  • ImportConfig (7-12)
packages/trpc/src/routers/import.ts (6)
packages/trpc/src/trpc.ts (2)
  • createTRPCRouter (138-138)
  • protectedProcedure (161-164)
packages/trpc/src/access.ts (1)
  • getProjectAccessCached (4-4)
packages/trpc/src/errors.ts (1)
  • TRPCAccessError (3-7)
packages/db/src/prisma-client.ts (1)
  • db (270-270)
packages/validation/src/index.ts (1)
  • zCreateImport (469-477)
packages/queue/src/queues.ts (1)
  • importQueue (170-176)
packages/db/src/services/import.service.ts (3)
packages/db/src/services/event.service.ts (1)
  • IClickhouseEvent (68-102)
packages/db/src/clickhouse/client.ts (4)
  • ch (131-141)
  • TABLE_NAMES (48-60)
  • convertClickhouseDateToJs (218-220)
  • formatClickhouseDate (188-199)
packages/db/src/prisma-client.ts (1)
  • db (270-270)
apps/worker/src/index.ts (1)
packages/queue/src/queues.ts (1)
  • importQueue (170-176)
apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx (2)
apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.tsx (1)
  • Route (12-34)
apps/start/src/components/integrations/integration-card.tsx (3)
  • IntegrationCard (91-112)
  • IntegrationCardLogoImage (50-69)
  • IntegrationCardFooter (3-15)
packages/importer/src/base-provider.ts (2)
packages/importer/src/types.ts (3)
  • BaseRawEvent (40-42)
  • ImportJobMetadata (69-73)
  • ErrorContext (45-51)
packages/db/src/services/event.service.ts (1)
  • IClickhouseEvent (68-102)
packages/importer/src/providers/umami.ts (4)
packages/validation/src/index.ts (1)
  • IUmamiImportConfig (450-450)
packages/db/src/services/event.service.ts (2)
  • IClickhouseEvent (68-102)
  • query (701-847)
packages/common/src/url.ts (2)
  • parsePath (11-49)
  • isSameDomain (51-63)
packages/importer/src/utils/referrer-parser.ts (2)
  • parseReferrer (49-58)
  • getReferrerWithQuery (60-91)
packages/importer/src/types.ts (2)
packages/db/src/services/event.service.ts (1)
  • IServiceImportedEventPayload (144-147)
packages/db/src/services/import.service.ts (2)
  • ImportProgress (16-21)
  • ImportStageResult (10-14)
🪛 Gitleaks (8.28.0)
apps/worker/src/index.ts

[high] 60-60: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

🪛 LanguageTool
IMPORT_REVIEW.md

[uncategorized] ~35-~35: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... batchNumber: number ) ``` --- ## 🟡 Medium Priority Issues ### 2. **Missing UUID Dependenc...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[grammar] ~156-~156: Ensure spelling is correct
Context: ...d for EVERY event ``` Improvement: Cleanup periodically (e.g., every 1000 events) ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-and-push-api
  • GitHub Check: build-and-push-worker
🔇 Additional comments (24)
apps/start/package.json (1)

6-6: Clarify the motivation for renaming "testing" → "testing2".

The script was renamed from "testing" to "testing2", but the name "testing2" is not descriptive and appears to be a placeholder. Additionally, this change seems orthogonal to the PR objective ("new importer").

Please clarify:

  1. What is the intended purpose of renaming this script?
  2. Is "testing2" a temporary name, or was a different name intended?
  3. Was the original "testing" script referenced in CI/CD pipelines, documentation, or developer workflows? If so, those references need to be updated to avoid breaking them.

If this change is unintended, consider reverting it.

apps/start/src/components/integrations/integration-card.tsx (1)

50-69: LGTM! Consistent API enhancement.

The addition of the optional className prop aligns IntegrationCardLogoImage with the other components in this file (IntegrationCardFooter, IntegrationCardHeader, IntegrationCardHeaderButtons) that already accept className. The prop is properly forwarded to IntegrationCardLogo, which correctly merges it using the cn() utility.

apps/start/src/components/ui/badge.tsx (2)

12-13: LGTM! Success variant follows good practices.

The emerald-700 background with white text provides good contrast and is semantically appropriate for success states.


16-16: LGTM! Info variant is well-implemented.

The blue-500 background with white text provides adequate contrast and follows the established pattern.

packages/importer/tsconfig.json (1)

1-9: LGTM! Clean TypeScript configuration.

The configuration is properly structured for the new importer package, extending the shared base config and setting appropriate compiler options for a source-in-src, output-to-dist layout.

packages/importer/package.json (1)

27-27: bullmq dependency placement is correct.

Verification shows bullmq is exclusively in devDependencies with no runtime imports or usage in the importer source code. The package does not directly enqueue or process jobs—job operations are handled through the @openpanel/queue workspace dependency. The current placement is appropriate.

packages/db/src/types.ts (1)

2-2: Alignment with validation types looks good

The Prisma JSON alias cleanly mirrors the validation union. No issues spotted.

Also applies to: 16-16

packages/trpc/package.json (1)

19-19: No circular dependency detected—safe to proceed.

Verification confirms @openpanel/queue does not depend on @openpanel/trpc. The dependency is unidirectional: @openpanel/trpc (line 19) → @openpanel/queue. No cycle risk.

apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.tsx (1)

45-45: LGTM! Clean addition of the Imports tab.

The new imports tab follows the existing pattern and integrates well with the settings navigation structure.

apps/worker/package.json (1)

23-23: LGTM! Necessary dependency for import functionality.

The addition of the @openpanel/importer workspace dependency enables the worker to process import jobs.

packages/db/src/clickhouse/client.ts (1)

59-59: LGTM! Table name constant added correctly.

The events_imports table name addition follows the existing pattern and aligns with the new ClickHouse migration.

apps/worker/src/index.ts (1)

8-8: LGTM! Import queue integration added correctly.

The importQueue is properly imported and registered with Bull Board for monitoring.

Also applies to: 42-42

packages/db/src/services/event.service.ts (1)

144-147: LGTM! Type definition is clear and purposeful.

The IServiceImportedEventPayload type correctly omits profile and meta fields for imported events, which makes sense as these would need to be resolved separately during import processing.

packages/db/code-migrations/5-add-imports-table.sql (2)

42-43: Verify the 7-day TTL is intentional for staging data.

The TTL of 7 days on imported_at_meta means all staging import data will be automatically deleted after one week. Confirm this aligns with your data retention requirements and import workflow expectations. If imports might take longer than 7 days to validate or if you need to keep historical import records, consider increasing this value or making it configurable.
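If the retention does need to be tunable, the `modifyTTL` helper added in this PR could be driven by a small statement builder. A minimal sketch — the function name and validation are illustrative, not part of the codebase; only the table/column names mirror the migration:

```typescript
// Hypothetical helper: builds the ClickHouse statement to retune the staging
// TTL after the fact. Rejects non-positive or fractional day counts.
function buildModifyTtlQuery(table: string, days: number): string {
  if (!Number.isInteger(days) || days <= 0) {
    throw new Error('TTL must be a positive integer number of days');
  }
  return `ALTER TABLE ${table} MODIFY TTL imported_at_meta + INTERVAL ${days} DAY`;
}

// e.g. extend the staging window from 7 to 30 days
console.log(buildModifyTtlQuery('events_imports', 30));
```

Driving this from config (env var or per-project setting) would avoid another migration the next time the retention requirement changes.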


1-38: LGTM! Well-designed ClickHouse table schema.

The table structure uses appropriate data types, codecs for compression, and partitioning strategy. The use of LowCardinality for categorical fields and specialized codecs (ZSTD, Delta, Gorilla) shows good understanding of ClickHouse optimization patterns.

packages/db/prisma/schema.prisma (1)

470-486: LGTM! Import model is well-structured.

The Import model has appropriate fields for tracking import progress, includes proper relations to Project with cascade delete, and uses sensible defaults. The use of Json for config provides flexibility while the comment /// [IPrismaImportConfig] indicates type documentation exists elsewhere.

Also applies to: 197-197

IMPORT_REVIEW.md (2)

37-59: Remove the action item; importer uses node:crypto, not the uuid package

The script results show that packages/importer uses randomUUID() imported from node:crypto (in packages/importer/src/providers/umami.ts and packages/cli/src/importer/importer.ts), not from the external uuid package. The uuid package is used elsewhere in the codebase (specifically in packages/db/src/services/event.service.ts), but not in the importer package. The review comment's recommendation to add uuid to the importer's dependencies is unnecessary.

Likely an incorrect or invalid review comment.


10-31: Review comment is based on outdated or incorrect code state; actual implementation already uses proper types

The review comment cites code that does not match the current implementation:

  • processBatch function: Does not exist in the codebase (rg search found zero matches)
  • any type usage: No matches found in apps/worker/src/jobs/ (contradicts claim of "extensive use")
  • Actual typing: Line 46 of apps/worker/src/jobs/import.ts already uses const eventBatch: UmamiRawEvent[] = []; (properly typed, not any[])
  • BaseImportProvider: Confirmed generic with <TRawEvent extends BaseRawEvent = BaseRawEvent>, and UmamiProvider correctly extends it as BaseImportProvider<UmamiRawEvent>

The current implementation already has the proper type safety that the review comment suggests. The code is not coupled via weak typing but properly typed with UmamiRawEvent. No refactoring is needed.

Likely an incorrect or invalid review comment.

apps/start/src/routeTree.gen.ts (1)

57-57: Generated file — no manual edits

Looks consistent with the new Imports tab. No action; keep excluded from lint/format.

Also applies to: 373-378, 511-511, 564-564, 800-801, 1170-1176, 1504-1517

packages/importer/src/types.ts (1)

20-25: Naming duplication exists but no actual import collision risk – review comment is incorrect.

The analysis shows that while ImportProgress and ImportStageResult are defined in both packages, the importer package does not import these collision types from @openpanel/db. The importer only imports unrelated types like IClickhouseEvent and IServiceImportedEventPayload.

Additionally, the ImportProgress types serve different purposes: the importer's version tracks batch processing state (totalEvents, processedEvents, currentBatch, totalBatches), while the db version tracks job metadata (importId, totalEvents, insertedEvents, status). This is intentional, not a collision.

Since there are no cross-imports and the types aren't used interchangeably, no "confusion in imports" or "subtle type bugs" will occur. Both packages maintain independent type contracts appropriate to their concerns.

Likely an incorrect or invalid review comment.

packages/db/src/services/import.service.ts (4)

1-21: LGTM!

The imports and type definitions are well-structured. The interfaces clearly define the data contracts for staging results and progress tracking.


26-53: LGTM!

The batch insert function properly handles empty arrays, adds appropriate metadata, and leverages the retry-wrapped ClickHouse insert. The implementation is clean and defensive.


493-503: Verify cleanup scope: should Postgres import record also be deleted?

The cleanupFailedImport function only deletes from ClickHouse's events_imports table. The corresponding Postgres import record remains with status 'failed'.

Is this intentional (to preserve audit history), or should the Postgres record also be deleted for consistency?

If Postgres records should also be cleaned up, add:

 export async function cleanupFailedImport(importId: string): Promise<void> {
   const deleteQuery = `
     DELETE FROM ${TABLE_NAMES.events_imports}
     WHERE import_id = {importId:String} AND import_status = 'failed'
   `;

   await ch.command({
     query: deleteQuery,
     query_params: { importId },
   });
+  
+  // Also remove from Postgres if desired
+  await db.import.delete({
+    where: { id: importId, status: 'failed' },
+  });
 }

508-525: LGTM!

The getImportDateBounds utility correctly handles the nullable case and provides a clean interface for date range queries. Defensive null check on line 524 prevents runtime errors.

Comment on lines +14 to +15
warning:
'border-transparent bg-yellow-500 text-white [a&]:hover:bg-yellow-500/90',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fix color contrast for accessibility.

White text on a yellow-500 background likely fails WCAG AA contrast requirements (4.5:1). Yellow backgrounds conventionally require dark text for proper readability.

Apply this diff to fix the contrast issue:

-        warning:
-          'border-transparent bg-yellow-500 text-white [a&]:hover:bg-yellow-500/90',
+        warning:
+          'border-transparent bg-yellow-500 text-yellow-950 [a&]:hover:bg-yellow-500/90',

Alternatively, use a darker yellow background with white text:

-        warning:
-          'border-transparent bg-yellow-500 text-white [a&]:hover:bg-yellow-500/90',
+        warning:
+          'border-transparent bg-yellow-600 text-white [a&]:hover:bg-yellow-600/90',
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
-        warning:
-          'border-transparent bg-yellow-500 text-white [a&]:hover:bg-yellow-500/90',
+        warning:
+          'border-transparent bg-yellow-500 text-yellow-950 [a&]:hover:bg-yellow-500/90',
```

Suggested change

```diff
-        warning:
-          'border-transparent bg-yellow-500 text-white [a&]:hover:bg-yellow-500/90',
+        warning:
+          'border-transparent bg-yellow-600 text-white [a&]:hover:bg-yellow-600/90',
```
🤖 Prompt for AI Agents
In apps/start/src/components/ui/badge.tsx around lines 14-15, the warning
variant uses 'bg-yellow-500 text-white' which likely fails WCAG contrast; change
either the text color to a dark color (e.g., replace text-white with
text-gray-900 or text-black) so dark text sits on yellow-500, or keep white text
and use a darker yellow background (e.g., replace bg-yellow-500 with
bg-yellow-600 or bg-yellow-700) and adjust the hover class accordingly to
maintain accessible contrast.
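For reviewers who want to verify the contrast claim numerically, the WCAG 2.x math is short enough to run directly. A self-contained sketch, assuming the stock Tailwind v3 palette values (yellow-500 = `#eab308`, yellow-950 = `#422006`) — adjust if your Tailwind config overrides the palette:

```typescript
// WCAG 2.x relative luminance + contrast ratio.
type Rgb = [number, number, number];

// Linearize one sRGB channel (0-255).
function channel(c: number): number {
  const s = c / 255;
  return s <= 0.03928 ? s / 12.92 : ((s + 0.055) / 1.055) ** 2.4;
}

function luminance([r, g, b]: Rgb): number {
  return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b);
}

function contrastRatio(a: Rgb, b: Rgb): number {
  const [hi, lo] = [luminance(a), luminance(b)].sort((x, y) => y - x);
  return (hi + 0.05) / (lo + 0.05);
}

const WHITE: Rgb = [255, 255, 255];    // text-white
const YELLOW_500: Rgb = [234, 179, 8]; // bg-yellow-500 (#eab308, assumed)
const YELLOW_950: Rgb = [66, 32, 6];   // text-yellow-950 (#422006, assumed)

console.log(contrastRatio(WHITE, YELLOW_500).toFixed(2));      // well below the 4.5:1 AA threshold
console.log(contrastRatio(YELLOW_950, YELLOW_500).toFixed(2)); // comfortably above it
```

White on yellow-500 lands near 1.9:1, while yellow-950 on yellow-500 clears 7:1, which is why the first suggested change passes AA and the original fails.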

Comment on lines +55 to +251
toast.error('Import failed', {
description: error.message,
});
},
}),
);

⚠️ Potential issue | 🟡 Minor

Avoid leaking secrets in error paths/logs

error.message may include upstream details. Ensure server never echoes API secrets. Consider a generic toast and log details server-side only.

🤖 Prompt for AI Agents
In apps/start/src/modals/add-import.tsx around lines 55 to 60, the current toast
displays error.message which can leak upstream secrets; replace the client-side
toast with a generic user-facing message (e.g., "Import failed, try again
later") and remove inclusion of error.message, and instead ensure the server
logs the full error details securely (or returns a sanitized error code/message)
so secrets are never echoed to the client; update the toast usage accordingly
and confirm server-side logging/response sanitization is in place.
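One way to implement the sanitization: an allow-list of known error codes mapped to safe client strings, with everything else collapsing to a generic message. A sketch under stated assumptions — the codes and messages are hypothetical, not OpenPanel's actual error contract:

```typescript
// Allow-listed, user-safe messages keyed by error code.
const SAFE_MESSAGES: Record<string, string> = {
  INVALID_CONFIG: 'The import configuration is invalid.',
  PROVIDER_UNREACHABLE: 'Could not reach the source provider.',
};

// Returns a client-safe string; unrecognized errors fall through to a
// generic message so upstream details (which may embed API secrets)
// never reach the browser. Full errors should be logged server-side only.
function toClientMessage(err: unknown): string {
  const code =
    err instanceof Error ? (err as Error & { code?: string }).code : undefined;
  return (code && SAFE_MESSAGES[code]) || 'Import failed. Please try again later.';
}

const leaky = new Error('401 from upstream, secret=abc123');
console.log(toClientMessage(leaky)); // prints the generic message, secret is not echoed
```

The toast in `add-import.tsx` would then render `toClientMessage(error)` instead of `error.message`.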

Comment on lines 12 to 14
import { UmamiProvider, type UmamiRawEvent } from '@openpanel/importer';
import type { ImportQueuePayload } from '@openpanel/queue';
import type { Job } from 'bullmq';

🛠️ Refactor suggestion | 🟠 Major

Remove UmamiRawEvent coupling; use BaseImportProvider generics

This job should be provider-agnostic. Swap UmamiRawEvent for BaseImportProvider.

-import { UmamiProvider, type UmamiRawEvent } from '@openpanel/importer';
+import { UmamiProvider, type BaseImportProvider } from '@openpanel/importer';

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In apps/worker/src/jobs/import.ts around lines 12 to 14, remove the direct
coupling to UmamiRawEvent and instead use the generic BaseImportProvider<T> from
the importer package: drop the UmamiRawEvent import, import BaseImportProvider
(and its type) from '@openpanel/importer', and replace any references to
UmamiRawEvent in the job/payload types with the generic BaseImportProvider<T>
(e.g., ImportQueuePayload<BaseImportProvider<any>> or a concrete T where
appropriate). Ensure all imports and Job generics are updated to use the
BaseImportProvider generic so the job becomes provider-agnostic.
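A compact sketch of the decoupled shape being asked for: the job loop is written once against the abstract class, so adding a provider never touches the job file. The class and method names mirror the snippets above, but the shapes are simplified assumptions, not the real `@openpanel/importer` API:

```typescript
// Simplified stand-in for @openpanel/importer's BaseImportProvider.
abstract class BaseImportProvider<TRaw> {
  abstract provider: string;
  abstract parseSource(): AsyncGenerator<TRaw>;
  abstract validate(raw: TRaw): boolean;
  abstract transformEvent(raw: TRaw): { name: string };
}

// A toy provider used only to exercise the generic loop.
class FakeProvider extends BaseImportProvider<{ event_name?: string }> {
  provider = 'fake';
  async *parseSource(): AsyncGenerator<{ event_name?: string }> {
    yield { event_name: 'pageview' };
    yield {}; // invalid row, should be skipped
  }
  validate(raw: { event_name?: string }): boolean {
    return typeof raw.event_name === 'string';
  }
  transformEvent(raw: { event_name?: string }) {
    return { name: raw.event_name! };
  }
}

// The job only depends on the base class — provider-agnostic by construction.
async function runImport<TRaw>(p: BaseImportProvider<TRaw>): Promise<number> {
  let imported = 0;
  for await (const raw of p.parseSource()) {
    if (!p.validate(raw)) continue;
    p.transformEvent(raw);
    imported++;
  }
  return imported;
}

runImport(new FakeProvider()).then((n) => console.log(n)); // 1
```

With this shape, `import.ts` would select a concrete provider from the payload's `provider` field and hand it to the generic loop.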

Comment on lines 45 to 69
// Phase 1: Load & Transform - Process events in batches
const eventBatch: UmamiRawEvent[] = [];

for await (const rawEvent of providerInstance.parseSource()) {
// Validate event
if (!providerInstance.validate(rawEvent)) {
jobLogger.warn('Skipping invalid event', { rawEvent });
continue;
}

eventBatch.push(rawEvent);
totalEvents++;

// Process batch when it reaches the batch size
if (eventBatch.length >= BATCH_SIZE) {
jobLogger.info('Processing batch', { batchSize: eventBatch.length });

// Transform events
const transformedEvents = eventBatch.map((event) =>
providerInstance.transformEvent(event),
);

// Phase 2: Insert into staging table
await insertImportBatch(transformedEvents, importId);


🛠️ Refactor suggestion | 🟠 Major

Make batch buffer provider-agnostic and harden transform stage

  • Use unknown[] for the raw batch.
  • Guard transformEvent with try/catch to skip bad rows without failing the whole job.
-    const eventBatch: UmamiRawEvent[] = [];
+    const eventBatch: unknown[] = [];
@@
-        const transformedEvents = eventBatch.map((event) =>
-          providerInstance.transformEvent(event),
-        );
+        const transformedEvents: any[] = [];
+        for (const event of eventBatch) {
+          try {
+            transformedEvents.push(providerInstance.transformEvent(event as any));
+          } catch (e) {
+            jobLogger.warn('Skipping event on transform failure', {
+              provider: providerInstance.provider,
+              reason: e instanceof Error ? e.message : String(e),
+            });
+          }
+        }
📝 Committable suggestion


Suggested change
// Phase 1: Load & Transform - Process events in batches
const eventBatch: UmamiRawEvent[] = [];
for await (const rawEvent of providerInstance.parseSource()) {
// Validate event
if (!providerInstance.validate(rawEvent)) {
jobLogger.warn('Skipping invalid event', { rawEvent });
continue;
}
eventBatch.push(rawEvent);
totalEvents++;
// Process batch when it reaches the batch size
if (eventBatch.length >= BATCH_SIZE) {
jobLogger.info('Processing batch', { batchSize: eventBatch.length });
// Transform events
const transformedEvents = eventBatch.map((event) =>
providerInstance.transformEvent(event),
);
// Phase 2: Insert into staging table
await insertImportBatch(transformedEvents, importId);
// Phase 1: Load & Transform - Process events in batches
const eventBatch: unknown[] = [];
for await (const rawEvent of providerInstance.parseSource()) {
// Validate event
if (!providerInstance.validate(rawEvent)) {
jobLogger.warn('Skipping invalid event', { rawEvent });
continue;
}
eventBatch.push(rawEvent);
totalEvents++;
// Process batch when it reaches the batch size
if (eventBatch.length >= BATCH_SIZE) {
jobLogger.info('Processing batch', { batchSize: eventBatch.length });
// Transform events
const transformedEvents: any[] = [];
for (const event of eventBatch) {
try {
transformedEvents.push(providerInstance.transformEvent(event as any));
} catch (e) {
jobLogger.warn('Skipping event on transform failure', {
provider: providerInstance.provider,
reason: e instanceof Error ? e.message : String(e),
});
}
}
// Phase 2: Insert into staging table
await insertImportBatch(transformedEvents, importId);
🤖 Prompt for AI Agents
In apps/worker/src/jobs/import.ts around lines 45 to 69, the batch buffer and
transform stage need hardening: change eventBatch from UmamiRawEvent[] to
unknown[] so the buffer is provider-agnostic, and wrap each
providerInstance.transformEvent call in a try/catch so a thrown error on one row
is caught, logged (jobLogger.warn with the rawEvent and error), and that row is
skipped without failing the whole job; ensure you collect only successful
transformed events (filter out failed/undefined results) before calling
insertImportBatch(importId).

Comment on lines +301 to +341
let properties: Record<string, any> = {};

if (query) {
properties.__query = query;
}

// Add useful properties from Umami data
if (rawEvent.page_title) properties.__title = rawEvent.page_title;
if (rawEvent.screen) properties.__screen = rawEvent.screen;
if (rawEvent.language) properties.__language = rawEvent.language;
if (rawEvent.utm_source)
properties = assocPath(
['__query', 'utm_source'],
rawEvent.utm_source,
properties,
);
if (rawEvent.utm_medium)
properties = assocPath(
['__query', 'utm_medium'],
rawEvent.utm_medium,
properties,
);
if (rawEvent.utm_campaign)
properties = assocPath(
['__query', 'utm_campaign'],
rawEvent.utm_campaign,
properties,
);
if (rawEvent.utm_content)
properties = assocPath(
['__query', 'utm_content'],
rawEvent.utm_content,
properties,
);
if (rawEvent.utm_term)
properties = assocPath(
['__query', 'utm_term'],
rawEvent.utm_term,
properties,
);


⚠️ Potential issue | 🔴 Critical

properties contains nested objects; ClickHouse expects primitives

properties.__query = query and nested __query.* entries violate IClickhouseEvent.properties type. Flatten or stringify.

-    let properties: Record<string, any> = {};
-
-    if (query) {
-      properties.__query = query;
-    }
+    const properties: Record<string, string | number | boolean | null | undefined> = {};
+    if (query) {
+      for (const [k, v] of Object.entries(query)) {
+        properties[`query.${k}`] = v as string;
+      }
+    }
@@
-    if (rawEvent.utm_source)
-      properties = assocPath(['__query', 'utm_source'], rawEvent.utm_source, properties);
-    if (rawEvent.utm_medium)
-      properties = assocPath(['__query', 'utm_medium'], rawEvent.utm_medium, properties);
-    if (rawEvent.utm_campaign)
-      properties = assocPath(['__query', 'utm_campaign'], rawEvent.utm_campaign, properties);
-    if (rawEvent.utm_content)
-      properties = assocPath(['__query', 'utm_content'], rawEvent.utm_content, properties);
-    if (rawEvent.utm_term)
-      properties = assocPath(['__query', 'utm_term'], rawEvent.utm_term, properties);
+    if (rawEvent.utm_source) properties.utm_source = rawEvent.utm_source;
+    if (rawEvent.utm_medium) properties.utm_medium = rawEvent.utm_medium;
+    if (rawEvent.utm_campaign) properties.utm_campaign = rawEvent.utm_campaign;
+    if (rawEvent.utm_content) properties.utm_content = rawEvent.utm_content;
+    if (rawEvent.utm_term) properties.utm_term = rawEvent.utm_term;
📝 Committable suggestion


Suggested change
let properties: Record<string, any> = {};
if (query) {
properties.__query = query;
}
// Add useful properties from Umami data
if (rawEvent.page_title) properties.__title = rawEvent.page_title;
if (rawEvent.screen) properties.__screen = rawEvent.screen;
if (rawEvent.language) properties.__language = rawEvent.language;
if (rawEvent.utm_source)
properties = assocPath(
['__query', 'utm_source'],
rawEvent.utm_source,
properties,
);
if (rawEvent.utm_medium)
properties = assocPath(
['__query', 'utm_medium'],
rawEvent.utm_medium,
properties,
);
if (rawEvent.utm_campaign)
properties = assocPath(
['__query', 'utm_campaign'],
rawEvent.utm_campaign,
properties,
);
if (rawEvent.utm_content)
properties = assocPath(
['__query', 'utm_content'],
rawEvent.utm_content,
properties,
);
if (rawEvent.utm_term)
properties = assocPath(
['__query', 'utm_term'],
rawEvent.utm_term,
properties,
);
const properties: Record<string, string | number | boolean | null | undefined> = {};
if (query) {
for (const [k, v] of Object.entries(query)) {
properties[`query.${k}`] = v as string;
}
}
// Add useful properties from Umami data
if (rawEvent.page_title) properties.__title = rawEvent.page_title;
if (rawEvent.screen) properties.__screen = rawEvent.screen;
if (rawEvent.language) properties.__language = rawEvent.language;
if (rawEvent.utm_source) properties.utm_source = rawEvent.utm_source;
if (rawEvent.utm_medium) properties.utm_medium = rawEvent.utm_medium;
if (rawEvent.utm_campaign) properties.utm_campaign = rawEvent.utm_campaign;
if (rawEvent.utm_content) properties.utm_content = rawEvent.utm_content;
if (rawEvent.utm_term) properties.utm_term = rawEvent.utm_term;
🤖 Prompt for AI Agents
In packages/importer/src/providers/umami.ts around lines 301 to 341, properties
is being built with nested objects (properties.__query = query and
assocPath(['__query', ...])) which violates IClickhouseEvent.properties that
expects primitives; replace the nested structure by either flattening the query
into primitive top-level keys (e.g. __query_<key>=value) or by storing the whole
query as a JSON string (e.g. __query = JSON.stringify(query)) and update the
assocPath usage to set primitive keys accordingly so every properties entry is a
primitive value compatible with ClickHouse.
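The flattening option can be implemented generically rather than hand-written per key. A sketch, assuming dot-separated key paths are an acceptable naming scheme for the flattened record:

```typescript
type Primitive = string | number | boolean | null | undefined;

// Recursively flattens nested objects into dot-separated primitive keys so the
// result satisfies a primitives-only properties column. Arrays are JSON
// stringified rather than expanded.
function flattenProperties(
  input: Record<string, unknown>,
  prefix = '',
  out: Record<string, Primitive> = {},
): Record<string, Primitive> {
  for (const [key, value] of Object.entries(input)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      flattenProperties(value as Record<string, unknown>, path, out);
    } else if (Array.isArray(value)) {
      out[path] = JSON.stringify(value);
    } else {
      out[path] = value as Primitive;
    }
  }
  return out;
}

console.log(
  flattenProperties({
    __query: { utm_source: 'newsletter', utm_medium: 'email' },
    __title: 'Home',
  }),
);
// { '__query.utm_source': 'newsletter', '__query.utm_medium': 'email', __title: 'Home' }
```

Running the transformed event's `properties` through a pass like this (or `JSON.stringify`-ing whole nested values) keeps the ingestion contract intact regardless of which UTM fields are present.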

Comment on lines +53 to +66
// Properties type for events - more specific than Record<string, any>
export interface EventProperties {
[key: string]:
| string
| number
| boolean
| null
| undefined
| Record<string, unknown>;
__query?: Record<string, unknown>;
__title?: string;
__screen?: string;
__language?: string;
}

⚠️ Potential issue | 🟠 Major

EventProperties allows nested objects, but ClickHouse properties is primitives-only

IClickhouseEvent.properties is Record<string, string|number|boolean|undefined|null>. Storing objects (e.g., __query) will break ingestion unless stringified.

  • Constrain EventProperties values to primitives.
  • Or require JSON.stringify for nested shapes (e.g., __queryJson: string).
🤖 Prompt for AI Agents
In packages/importer/src/types.ts around lines 53 to 66, EventProperties
currently allows nested objects which will break ClickHouse ingestion because
IClickhouseEvent.properties must be primitives-only; update the EventProperties
interface so values are limited to string | number | boolean | null | undefined
(remove Record<string, unknown>) or replace nested object fields with explicit
JSON string fields (e.g., __queryJson: string) and rename/remove existing
__query, ensuring all nested shapes are JSON.stringified before being assigned
to EventProperties.

@@ -0,0 +1,3 @@
import { getSharedVitestConfig } from '../../vitest.shared';

export default getSharedVitestConfig({ __dirname });

⚠️ Potential issue | 🔴 Critical

__dirname is unavailable in ESM modules.

Since package.json declares "type": "module", this file runs as ESM. The __dirname global doesn't exist in ESM and will throw a ReferenceError.

Apply this diff to compute the directory path in ESM:

+import { dirname } from 'node:path';
+import { fileURLToPath } from 'node:url';
+
 import { getSharedVitestConfig } from '../../vitest.shared';

-export default getSharedVitestConfig({ __dirname });
+export default getSharedVitestConfig({ 
+  __dirname: dirname(fileURLToPath(import.meta.url)) 
+});
📝 Committable suggestion


Suggested change

```diff
-export default getSharedVitestConfig({ __dirname });
+import { dirname } from 'node:path';
+import { fileURLToPath } from 'node:url';
+
+import { getSharedVitestConfig } from '../../vitest.shared';
+
+export default getSharedVitestConfig({
+  __dirname: dirname(fileURLToPath(import.meta.url))
+});
```
🤖 Prompt for AI Agents
In packages/importer/vitest.config.ts around line 3, the code passes the
non-existent __dirname to getSharedVitestConfig (ESM modules don't have
__dirname); compute the directory from import.meta.url instead and pass that
value: import fileURLToPath from 'url' and path from 'path' (or use their named
functions), derive const __dirname =
path.dirname(fileURLToPath(import.meta.url)), then call getSharedVitestConfig({
__dirname }) so the config receives the correct directory path in ESM.

Comment on lines 12 to 32
list: protectedProcedure
.input(z.object({ projectId: z.string() }))
.query(async ({ input, ctx }) => {
const access = await getProjectAccessCached({
projectId: input.projectId,
userId: ctx.session.userId,
});

if (!access) {
throw TRPCAccessError('You do not have access to this project');
}

return db.import.findMany({
where: {
projectId: input.projectId,
},
orderBy: {
createdAt: 'desc',
},
});
}),

⚠️ Potential issue | 🟠 Major

Sensitive config exposure in list/get — restrict or redact.

config likely contains provider credentials. Current list/get return it to any project member with read access. Redact or gate by role.

Option A (redact for read-level):

   list: protectedProcedure
     .input(z.object({ projectId: z.string() }))
     .query(async ({ input, ctx }) => {
       const access = await getProjectAccessCached({
         projectId: input.projectId,
         userId: ctx.session.userId,
       });
       if (!access) {
         throw TRPCAccessError('You do not have access to this project');
       }
-      return db.import.findMany({
+      const rows = await db.import.findMany({
         where: { projectId: input.projectId },
         orderBy: { createdAt: 'desc' },
       });
+      const canSeeSecrets =
+        typeof access !== 'boolean' && access.level !== 'read';
+      return canSeeSecrets
+        ? rows
+        : rows.map((r) => ({ ...r, config: null }));
     }),
   get: protectedProcedure
     .input(z.object({ id: z.string() }))
     .query(async ({ input, ctx }) => {
       const importRecord = await db.import.findUniqueOrThrow({
         where: { id: input.id },
         include: { project: true },
       });
       const access = await getProjectAccessCached({
         projectId: importRecord.projectId,
         userId: ctx.session.userId,
       });
       if (!access) {
         throw TRPCAccessError('You do not have access to this import');
       }
-      return importRecord;
+      const canSeeSecrets =
+        typeof access !== 'boolean' && access.level !== 'read';
+      return canSeeSecrets ? importRecord : { ...importRecord, config: null };
     }),

Option B: Use Prisma select to exclude config in list and only include it for admin/write roles.

Also applies to: 34-57

🤖 Prompt for AI Agents
In packages/trpc/src/routers/import.ts around lines 12 to 32 (and likewise apply
the change to lines 34 to 57), the current list/get resolvers return the full
import records including the sensitive config field to any project member;
change the DB queries to explicitly exclude the config field by using a Prisma
select (e.g., select: { config: false, ...otherFields }) for read-level access,
and only fetch/include config when the caller has a higher role (admin/write)
after re-checking the caller's role from ctx.session or the access object;
implement a conditional: if role allows, run a second query or include config in
select, otherwise omit/redact config from the returned objects and ensure no
accidental passthrough in list results.
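The redaction predicate from Option A can be factored into a pure function so the gating logic is unit-testable without Prisma or TRPC context. A sketch — the access shape (`boolean | { level }`) mirrors the snippet above but is still an assumption about the real `getProjectAccessCached` return type:

```typescript
type Access = boolean | { level: 'read' | 'write' | 'admin' };

interface ImportRow {
  id: string;
  provider: string;
  config: unknown; // may contain provider credentials
}

// Nulls out config unless the caller holds a role above read-level.
// A bare boolean access grant is treated as read-only.
function redactImports(rows: ImportRow[], access: Access): ImportRow[] {
  const canSeeSecrets = typeof access !== 'boolean' && access.level !== 'read';
  return canSeeSecrets ? rows : rows.map((r) => ({ ...r, config: null }));
}

const rows: ImportRow[] = [
  { id: '1', provider: 'umami', config: { apiKey: 's3cret' } },
];
console.log(redactImports(rows, { level: 'read' })[0].config);  // null
console.log(redactImports(rows, { level: 'admin' })[0].config); // { apiKey: 's3cret' }
```

Both `list` and `get` can then share the same helper, keeping the redaction rule in one place.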

Comment on lines +469 to +486
export const zCreateImport = z.object({
projectId: z.string().min(1),
provider: z.enum(['umami', 'plausible', 'mixpanel']),
config: z.union([
zUmamiImportConfig,
zPlausibleImportConfig,
zMixpanelImportConfig,
]),
});

⚠️ Potential issue | 🟠 Major

Discriminate by provider to prevent mismatched payloads

provider and config can disagree (e.g., provider='umami' with a Mixpanel config). Make zCreateImport a discriminated union on provider so config shape is tied to the selected provider.

-export const zCreateImport = z.object({
-  projectId: z.string().min(1),
-  provider: z.enum(['umami', 'plausible', 'mixpanel']),
-  config: z.union([
-    zUmamiImportConfig,
-    zPlausibleImportConfig,
-    zMixpanelImportConfig,
-  ]),
-});
+export const zCreateImport = z.discriminatedUnion('provider', [
+  z.object({
+    projectId: z.string().min(1),
+    provider: z.literal('umami'),
+    config: zUmamiImportConfig,
+  }),
+  z.object({
+    projectId: z.string().min(1),
+    provider: z.literal('plausible'),
+    config: zPlausibleImportConfig,
+  }),
+  z.object({
+    projectId: z.string().min(1),
+    provider: z.literal('mixpanel'),
+    config: zMixpanelImportConfig,
+  }),
+]);
🤖 Prompt for AI Agents
In packages/validation/src/index.ts around lines 469 to 477, zCreateImport
currently allows provider and config to disagree; change it to a discriminated
union so each provider literal is paired with the correct config shape. Replace
the single z.object with either z.discriminatedUnion('provider', [...]) or a
z.union of three z.object schemas where provider is a literal ('umami' |
'plausible' | 'mixpanel') and config is the matching zUmamiImportConfig,
zPlausibleImportConfig, or zMixpanelImportConfig respectively; keep the same
exported name and types so callers get compile-time validation that provider and
config match.

@vercel

vercel bot commented Oct 28, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| openpanel-public | Error | Error | | Nov 5, 2025 8:38am |

@coderabbitai

coderabbitai bot commented Oct 28, 2025

Walkthrough

Adds an importer package, provider implementations (Umami, Mixpanel), DB migrations and ClickHouse support, a TRPC import router, UI for imports, a job queue and worker import job, and an import service with CSV utilities; refactors common exports; and removes the CLI package.

Changes

- **Importer package & types** (`packages/importer/package.json`, `packages/importer/tsconfig.json`, `packages/importer/vitest.config.ts`, `packages/importer/src/index.ts`, `packages/importer/src/types.ts`, `packages/importer/src/base-provider.ts`, `packages/importer/src/providers/metadata.ts`): New @openpanel/importer package with public types, base provider abstraction, provider metadata and package config/ts/vitest setup.
- **Providers & tests** (`packages/importer/src/providers/umami.ts`, `packages/importer/src/providers/mixpanel.ts`, `packages/importer/src/providers/mixpanel.test.ts`): New Umami and Mixpanel providers: streaming/parsing, validation, transform to ClickHouse shape, and unit tests.
- **DB schema & migrations** (`packages/db/prisma/schema.prisma`, `packages/db/prisma/migrations/*`, `packages/db/code-migrations/5-add-imports-table.ts`, `packages/db/code-migrations/5-add-imports-table.sql`): Adds Import model/enum, Prisma migrations for imports table, and ClickHouse migration for events_imports with TTL.
- **ClickHouse CSV & client changes** (`packages/db/src/clickhouse/csv.ts`, `packages/db/src/clickhouse/migration.ts`, `packages/db/src/clickhouse/client.ts`): New CSV escaping utilities for ClickHouse Map(String,String), modifyTTL helper, extended client settings, withRetry export, and chInsertCSV helper.
- **Import service & types** (`packages/db/src/services/import.service.ts`, `packages/db/src/services/event.service.ts`, `packages/db/src/types.ts`): New staged import service APIs (insert batch, generate session IDs, move to production, backfill, progress/status updates) and related types.
- **Queue & worker job** (`packages/queue/src/queues.ts`, `apps/worker/src/boot-workers.ts`, `apps/worker/src/index.ts`, `apps/worker/src/jobs/import.ts`): New importQueue, import worker registration, Bull Board exposure, and importJob implementing a multi-phase pipeline with resume and progress.
- **TRPC router** (`packages/trpc/src/routers/import.ts`, `packages/trpc/src/root.ts`, `packages/trpc/package.json`): New importRouter with list/get/create/delete/retry procedures; added to appRouter; package dependency updated.
- **Start app UI & modals** (`apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx`, `apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.tsx`, `apps/start/src/modals/add-import.tsx`, `apps/start/src/modals/index.tsx`, `apps/start/src/modals/date-ranger-picker.tsx`, `apps/start/src/components/integrations/integration-card.tsx`, `apps/start/src/components/project-mapper.tsx`, `apps/start/src/components/ui/badge.tsx`, `apps/start/src/routeTree.gen.ts`, `apps/start/package.json`, `apps/start/Dockerfile`): New Imports settings page and route, AddImport modal (Umami/Mixpanel forms), ProjectMapper component, badge variants, integration-card className prop, route tree updates, package.json build script change and dependency, and Dockerfile updated to include the importer package.json.
- **Worker event & tests updates**
apps/worker/src/jobs/events.incoming-event.ts, apps/worker/src/jobs/events.incoming-events.test.ts, apps/worker/src/jobs/cron.delete-projects.ts
Swap to common/server parse-referrer and parseUserAgent exports, adjust referrerType expectations in tests, and minor ClickHouse setting type tweak.
Common package refactor & scripts
packages/common/package.json, packages/common/server/index.ts, packages/common/server/parse-referrer.ts, packages/common/server/parse-referrer.test.ts, packages/common/server/parser-user-agent.ts, packages/common/scripts/get-referrers.ts, packages/common/src/object.ts, packages/common/src/object.test.ts
Add exports surface, refactor parse-referrer behavior (type '' instead of 'unknown'), add parser UserAgent types/overrides, extend toDots JSON handling, and add referrer-generation script + tests.
DB exports & tests
packages/db/index.ts, packages/db/src/clickhouse/csv.ts, packages/db/src/buffers/base-buffer.ts, packages/db/src/buffers/event-buffer.test.ts, packages/db/src/services/session.service.ts, packages/db/package.json
Export csv and import service, updated generateSecureId import path, test payload updates with clickhouse_settings, session lookback default change, and pinned clickhouse client version.
Validation additions
packages/validation/src/index.ts, packages/db/src/types.ts
New zod schemas/types for import configs (Umami, Plausible, Mixpanel) and union IImportConfig; Prisma JSON import config type alias added.
CLI removal
packages/cli/* (package.json, src/, tsconfig.json, tsup.config.ts)
CLI package and importer CLI code removed.
Workspace package updates & deps
apps/worker/package.json, apps/api/package.json, packages/redis/package.json, packages/trpc/package.json, package.json, packages/importer/*
Dependency additions/removals and version pins: add @openpanel/importer and @openpanel/queue where needed; pin tsdown/ioredis/clickhouse client versions; update scripts.
Docker/build and utils
apps/worker/Dockerfile, apps/start/Dockerfile, apps/start/src/utils/math.ts, apps/start/src/utils/slug.ts, apps/start/src/utils/op.ts
Dockerfiles include importer package.json; re-export paths updated to '@openpanel/common'; OpenPanel client factory added with no-op proxy fallback.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant UI as Start UI
    participant TRPC
    participant Queue as Job Queue
    participant Worker
    participant Provider as Import Provider
    participant CH as ClickHouse
    participant PG as Postgres

    User->>UI: Configure & submit import
    UI->>TRPC: create import
    TRPC->>PG: INSERT import (status: pending)
    TRPC->>Queue: enqueue import job
    TRPC-->>UI: return created import

    Queue->>Worker: deliver job
    Worker->>PG: load import record
    Worker->>Provider: parseSource() (stream events)
    Provider-->>Worker: yield raw events

    Worker->>CH: insertImportBatch (staging CSV)
    CH-->>Worker: ack + progress

    alt provider.shouldGenerateSessionIds
        Worker->>CH: generateSessionIds()
    end
    Worker->>CH: createSessionsStartEndEvents()
    Worker->>CH: moveImportsToProduction()
    Worker->>CH: backfillSessionsToProduction()

    Worker->>PG: mark import complete
    Worker-->>TRPC: (status updated)
    TRPC-->>UI: UI observes progress/completion

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • packages/importer/src/providers/{umami,mixpanel}.ts — streaming, parsing, decompression, transform correctness, schema validation.
  • apps/worker/src/jobs/import.ts — multi-phase job orchestration, resume logic, concurrency and yielding behavior.
  • packages/db/src/services/import.service.ts & ClickHouse client changes — CSV escaping, insert settings, TTL/migration SQL consistency.
  • Prisma and ClickHouse migration diffs — schema/type alignment and migration ordering.
  • TRPC import router — access checks, job enqueue and job-id synchronization with DB.

Possibly related PRs

Poem

🐰 Hop! I found a stream of rows,

From Mixpanel fields to Umami flows.
I stitched the batches, queues in tow,
ClickHouse hummed, the progress showed.
Bye, CLI — hello import show!

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'new importer' is vague and generic, using non-descriptive terms that don't clearly convey the substantial scope of changes, which include a comprehensive importer system with providers, database migrations, worker jobs, UI components, and TRPC routing. Refine the title to be more specific about the main functionality, such as 'Add data importer system with Umami and Mixpanel providers' or 'Implement multi-provider import workflow for event ingestion'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b86c7a4 and 18d9905.

📒 Files selected for processing (1)
  • packages/db/code-migrations/5-add-imports-table.sql (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-and-push-worker
  • GitHub Check: build-and-push-api
  • GitHub Check: build-and-push-dashboard
🔇 Additional comments (5)
packages/db/code-migrations/5-add-imports-table.sql (5)

30-33: Clarify the semantics of dual timestamp columns.

The table has two timestamp columns with overlapping purposes:

  • Line 30: imported_at (Nullable(DateTime)) — presumably when the event was originally captured/imported from the source system
  • Line 33: imported_at_meta (DateTime DEFAULT now()) — when the record was inserted into this table

If this distinction is intentional, it should be documented. If both track insertion time, consider consolidating to avoid confusion and reduce storage.

Can you clarify the intended semantics? If both are necessary, consider renaming for clarity (e.g., original_imported_at and inserted_at_meta).


36-37: Verify partition key aligns with query patterns.

The table partitions by toYYYYMM(imported_at_meta) and orders by (import_id, created_at). This is effective for:

  • Time-series queries grouped by month
  • Efficient queries filtering by import_id

However, if the majority of queries filter by created_at date ranges (not imported_at_meta), consider partitioning by toYYYYMM(created_at) instead to avoid reading unnecessary partitions.

Confirm the expected query patterns to ensure partition alignment.


43-43: Verify shard distribution uniformity.

The distributed table uses cityHash64(import_id) to distribute rows across shards. This assumes import_id has sufficient cardinality and even distribution. If import batches are skewed (e.g., one import_id dominates), shards will be unbalanced.

Can you confirm that import_id values are expected to be uniformly distributed across imports?
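A quick way to answer this is to bucket rows the same way the Distributed engine routes them and compare per-bucket counts. The sketch below is an assumption-laden probe (the modulus 4 stands in for the actual shard count, which is cluster-specific):

```sql
-- Hypothetical skew probe: mirror the Distributed engine's cityHash64 routing
-- and compare per-bucket row counts; a large imbalance means one import_id
-- (or a few) dominates and that shard will run hot.
SELECT
    cityHash64(import_id) % 4 AS bucket,
    count() AS rows,
    uniqExact(import_id) AS imports
FROM events_imports
GROUP BY bucket
ORDER BY bucket;
```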


47-47: Verify 7-day TTL aligns with retention policy.

The table is configured to drop rows after 7 days from imported_at_meta. Ensure this retention window meets compliance, audit, and business requirements.

Confirm the 7-day TTL is intentional and aligns with data retention policy.


1-43: Incorrect assessment based on file type and ClickHouse macro misunderstanding.

The file shown in the review (packages/db/code-migrations/5-add-imports-table.sql) is an auto-generated artifact, not the migration source. The actual TypeScript migration file (packages/db/code-migrations/5-add-imports-table.ts) uses the createTable() and modifyTTL() helpers to generate SQL, which is then written to the .sql file for documentation purposes (line 79–87).

The template placeholders ({cluster}, {installation}, {shard}, {replica}) are native ClickHouse macros defined in each node's config (macros.xml). ClickHouse automatically resolves them at query execution time—no custom wrapper substitution is required. The migration execution flow is straightforward: the .ts file calls runClickhouseMigrationCommands(), which executes the SQL strings directly with the macros intact. The isClustered flag determines whether the SQL includes clustering directives (e.g., ON CLUSTER '{cluster}' and replicated/distributed table definitions).

The dual timestamp columns serve distinct purposes: imported_at (nullable) tracks when external data was imported, while imported_at_meta (always set to now()) drives partitioning and TTL enforcement.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/common/src/object.ts (1)

22-29: Inconsistent handling of whitespace-only strings.

The guard on line 22 now filters out explicit empty strings, but whitespace-only strings (e.g., ' ') will pass this check, get trimmed to an empty string on line 28, and still be added to the accumulator. This creates inconsistent behavior.

Consider moving the empty string check after the cleaning step to ensure all empty values (including trimmed whitespace) are filtered consistently:

-    if (value === undefined || value === null || value === '') {
+    if (value === undefined || value === null) {
       return acc;
     }
 
     const cleanedValue =
       typeof value === 'string'
         ? removeInvalidSurrogates(value).trim()
         : String(value);
+
+    if (cleanedValue === '') {
+      return acc;
+    }
 
     return {
       ...acc,
       [`${path}${key}`]: cleanedValue,
     };
♻️ Duplicate comments (18)
apps/start/src/components/ui/badge.tsx (1)

12-16: New badge variants added for status indication.

The three new variants (success, warning, info) extend the badge component to support import status displays and other UI feedback needs.

Note: The warning variant's color contrast issue (white text on yellow-500) has already been flagged in a previous review.

packages/db/code-migrations/5-add-imports-table.ts (1)

65-69: TTL must be defined at CREATE time (and on local replicated table in cluster).

ALTERing TTL on events_imports will hit the Distributed table in cluster mode and be a no-op. Bake TTL into createTable so it applies to the replicated local table and the single-node path. Remove the ALTER.

Apply these diffs:

In this file:

   const sqls: string[] = [
     ...createTable({
       name: 'events_imports',
       columns: [
         // ...
       ],
       orderBy: ['import_id', 'created_at'],
       partitionBy: 'toYYYYMM(imported_at_meta)',
       settings: {
         index_granularity: 8192,
       },
       distributionHash: 'cityHash64(import_id)',
       replicatedVersion: '1',
       isClustered,
+      ttl: "imported_at_meta + INTERVAL 7 DAY",
     }),
   ];
 
-  // Add TTL policy for auto-cleanup after 7 days
-  sqls.push(`
-    ALTER TABLE events_imports 
-    MODIFY TTL imported_at_meta + INTERVAL 7 DAY
-  `);

And extend createTable (packages/db/src/clickhouse/migration.ts):

 export function createTable({
   name: tableName,
   columns,
   indices = [],
   engine = 'MergeTree()',
   orderBy = ['tuple()'],
   partitionBy,
   settings = {},
   distributionHash,
   replicatedVersion,
   isClustered,
+  ttl,
 }: CreateTableOptions): string[] {
   const columnDefinitions = [...columns, ...indices].join(',\n  ');
   const settingsClause = Object.entries(settings).length
     ? `SETTINGS ${Object.entries(settings).map(([k, v]) => `${k} = ${v}`).join(', ')}`
     : '';
   const partitionByClause = partitionBy ? `PARTITION BY ${partitionBy}` : '';
+  const ttlClause = ttl ? `\nTTL ${ttl}` : '';
 
   if (!isClustered) {
     return [
       `CREATE TABLE IF NOT EXISTS ${tableName} (
   ${columnDefinitions}
 )
 ENGINE = ${engine}
 ${partitionByClause}
 ORDER BY (${orderBy.join(', ')})
-${settingsClause}`.trim(),
+${ttlClause}
+${settingsClause}`.trim(),
     ];
   }
 
   return [
     `CREATE TABLE IF NOT EXISTS ${replicated(tableName)} ON CLUSTER '{cluster}' (
   ${columnDefinitions}
 )
 ENGINE = Replicated${engine.replace(/^(.+?)\((.+?)?\)/, `$1('${CLUSTER_REPLICA_PATH.replace('{replicatedVersion}', replicatedVersion)}', '{replica}', $2)`).replace(/, \)$/, ')')}
 ${partitionByClause}
 ORDER BY (${orderBy.join(', ')})
-${settingsClause}`.trim(),
+${ttlClause}
+${settingsClause}`.trim(),
     `CREATE TABLE IF NOT EXISTS ${tableName} ON CLUSTER '{cluster}' AS ${replicated(tableName)}
 ENGINE = Distributed('{cluster}', currentDatabase(), ${replicated(tableName)}, ${distributionHash})`,
   ];
 }

Also applies to: 15-63

packages/validation/src/index.ts (1)

478-486: Make zCreateImport a discriminated union to bind provider ⇔ config.

As-is, provider='umami' can carry a Mixpanel config. Use a discriminated union on provider.

-export const zCreateImport = z.object({
-  projectId: z.string().min(1),
-  provider: z.enum(['umami', 'plausible', 'mixpanel']),
-  config: z.union([
-    zUmamiImportConfig,
-    zPlausibleImportConfig,
-    zMixpanelImportConfig,
-  ]),
-});
+export const zCreateImport = z.discriminatedUnion('provider', [
+  z.object({
+    projectId: z.string().min(1),
+    provider: z.literal('umami'),
+    config: zUmamiImportConfig,
+  }),
+  z.object({
+    projectId: z.string().min(1),
+    provider: z.literal('plausible'),
+    config: zPlausibleImportConfig,
+  }),
+  z.object({
+    projectId: z.string().min(1),
+    provider: z.literal('mixpanel'),
+    config: zMixpanelImportConfig,
+  }),
+]);
packages/trpc/src/routers/import.ts (3)

12-32: Sensitive config exposure in list — restrict or redact.

config contains provider credentials (API keys, secrets). Current implementation returns it to any project member with read access. Redact or gate by role.

Option A (redact for read-level):

  list: protectedProcedure
    .input(z.object({ projectId: z.string() }))
    .query(async ({ input, ctx }) => {
      const access = await getProjectAccessCached({
        projectId: input.projectId,
        userId: ctx.session.userId,
      });
      if (!access) {
        throw TRPCAccessError('You do not have access to this project');
      }
-      return db.import.findMany({
+      const rows = await db.import.findMany({
        where: { projectId: input.projectId },
        orderBy: { createdAt: 'desc' },
      });
+      const canSeeSecrets =
+        typeof access !== 'boolean' && access.level !== 'read';
+      return canSeeSecrets
+        ? rows
+        : rows.map((r) => ({ ...r, config: null }));
    }),

34-56: Sensitive config exposure in get — restrict or redact.

config contains provider credentials. Apply the same access-level check as in list to redact secrets for read-only users.

  get: protectedProcedure
    .input(z.object({ id: z.string() }))
    .query(async ({ input, ctx }) => {
      const importRecord = await db.import.findUniqueOrThrow({
        where: { id: input.id },
        include: { project: true },
      });
      const access = await getProjectAccessCached({
        projectId: importRecord.projectId,
        userId: ctx.session.userId,
      });
      if (!access) {
        throw TRPCAccessError('You do not have access to this import');
      }
-      return importRecord;
+      const canSeeSecrets =
+        typeof access !== 'boolean' && access.level !== 'read';
+      return canSeeSecrets ? importRecord : { ...importRecord, config: null };
    }),

72-99: Queue enqueue not wrapped — record can be left pending with no job.

If importQueue.add fails, DB keeps a dangling 'pending' import. Wrap in try/catch, set failed status, and pass stable jobId/attempts/backoff.

      // Create import record
      const importRecord = await db.import.create({
        data: {
          projectId: input.projectId,
          config: input.config,
          status: 'pending',
        },
      });

-      // Add job to queue
-      const job = await importQueue.add('import', {
-        type: 'import',
-        payload: {
-          importId: importRecord.id,
-        },
-      });
-
-      // Update import record with job ID
-      await db.import.update({
-        where: { id: importRecord.id },
-        data: { jobId: job.id },
-      });
+      // Add job to queue
+      try {
+        const job = await importQueue.add(
+          'import',
+          { type: 'import', payload: { importId: importRecord.id } },
+          { jobId: importRecord.id, attempts: 3, backoff: { type: 'exponential', delay: 30_000 } },
+        );
+        await db.import.update({
+          where: { id: importRecord.id },
+          data: { jobId: job.id },
+        });
+      } catch (err) {
+        await db.import.update({
+          where: { id: importRecord.id },
+          data: { status: 'failed', errorMessage: err instanceof Error ? err.message : 'Queue enqueue failed' },
+        });
+        throw err;
+      }
apps/worker/src/jobs/import.ts (3)

130-139: Redact raw event logging to avoid PII leakage.

Line 137 logs the full rawEvent which may contain PII (emails, user identifiers, etc.). Log only minimal, non-PII identifiers.

      if (
        !providerInstance.validate(
          // @ts-expect-error
          rawEvent,
        )
      ) {
-        jobLogger.warn('Skipping invalid event', { rawEvent });
+        jobLogger.warn('Skipping invalid event', {
+          provider: providerInstance.provider,
+          reason: 'validation_failed',
+        });
        continue;
      }

126-169: Harden batch buffer and transform stage.

Batch uses any type (Line 126) and transform stage (Lines 147-152) lacks error handling. A single transform failure fails the entire batch. Use unknown[] and wrap transforms in try/catch to skip invalid events.

-    const eventBatch: any = [];
+    const eventBatch: unknown[] = [];
    for await (const rawEvent of providerInstance.parseSource(
      resumeLoadingFrom,
    )) {
      // ... validation ...
      
      // Process batch when it reaches the batch size
      if (eventBatch.length >= BATCH_SIZE) {
        jobLogger.info('Processing batch', { batchSize: eventBatch.length });

-        const transformedEvents: IClickhouseEvent[] = eventBatch.map(
-          (
-            // @ts-expect-error
-            event,
-          ) => providerInstance!.transformEvent(event),
-        );
+        const transformedEvents: IClickhouseEvent[] = [];
+        for (const event of eventBatch) {
+          try {
+            transformedEvents.push(providerInstance.transformEvent(event as any));
+          } catch (e) {
+            jobLogger.warn('Skipping event on transform failure', {
+              provider: providerInstance.provider,
+              reason: e instanceof Error ? e.message : String(e),
+            });
+          }
+        }

246-265: Sync final Postgres counters with ClickHouse progress.

UI reads processedEvents/totalEvents from Postgres. After completion, Postgres isn't updated with final ClickHouse counts, so progress may show stale values. Update both to finalProgress values.

    await markImportComplete(importId);
    await updateImportStatus(jobLogger, job, importId, {
      step: 'completed',
    });
    jobLogger.info('Import marked as complete');

    // Get final progress
    const finalProgress = await getImportProgress(importId);

    jobLogger.info('Import job completed successfully', {
      totalEvents: finalProgress.totalEvents,
      insertedEvents: finalProgress.insertedEvents,
      status: finalProgress.status,
    });
+
+    // Keep Postgres in sync for UI progress
+    await db.import.update({
+      where: { id: importId },
+      data: {
+        totalEvents: finalProgress.totalEvents,
+        processedEvents: finalProgress.insertedEvents,
+      },
+    });
apps/start/src/modals/add-import.tsx (1)

245-249: Avoid leaking secrets in error messages.

Line 247 displays error.message which may include upstream API details or secrets. Use a generic client-facing message and log details server-side only.

      onError: (error) => {
        toast.error('Import failed', {
-          description: error.message,
+          description: 'Failed to start import. Please check your configuration and try again.',
        });
      },
packages/importer/src/providers/umami.ts (5)

79-198: SSRF exposure persists when fetching arbitrary URLs.

The fetch operation on line 97 accepts user-provided URLs without validating that they don't resolve to internal networks or restricted IP ranges. This remains a critical security risk.
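A minimal pre-flight guard could reject obviously unsafe targets before any fetch is issued. This is a sketch, not a complete defense (DNS re-resolution and redirect handling still need checks at the HTTP layer), and the helper name is illustrative:

```typescript
import { isIP } from 'node:net';

// Hypothetical pre-flight SSRF check: reject non-http(s) schemes, localhost
// aliases, and literal private/link-local IPs before fetching a user URL.
function isSafeImportUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable URL at all
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
  // Strip IPv6 brackets so isIP() recognizes the literal.
  const host = url.hostname.toLowerCase().replace(/^\[|\]$/g, '');
  if (host === 'localhost' || host.endsWith('.internal')) return false;
  if (isIP(host)) {
    if (host === '::1' || host.startsWith('fe80:') || host.startsWith('fd'))
      return false; // IPv6 loopback, link-local, ULA
    if (/^(0\.|10\.|127\.|169\.254\.|192\.168\.)/.test(host)) return false;
    if (/^172\.(1[6-9]|2\d|3[01])\./.test(host)) return false;
  }
  return true;
}
```

Hostnames that resolve to private IPs still pass this check, so the socket-level address should be re-validated after DNS resolution.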


213-220: Device ID fallback remains incorrect and unsafe.

Lines 216-217 still use rawEvent.visit_id! for both ip and ua parameters with non-null assertions. The visit_id is semantically unrelated to IP address or user agent, and the non-null assertion will throw if visit_id is undefined.


224-228: URL construction issues remain unresolved.

Line 225 concatenates hostname, path, and query without ensuring proper formatting. The query parameter needs a leading '?' and the hostname should be required for absolute URLs.
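One way to satisfy both constraints is to let the WHATWG URL constructor do the joining and validation. A sketch, where the field names mirror the Umami raw-event shape but are assumptions here:

```typescript
// Defensive URL assembly: require a hostname for an absolute URL, and
// normalize the query's leading '?' instead of concatenating blindly.
function buildEventUrl(
  hostname: string | undefined,
  path: string,
  query?: string,
): string {
  if (!hostname) return ''; // an absolute URL needs a host; don't invent one
  const search = query ? (query.startsWith('?') ? query : `?${query}`) : '';
  // new URL() validates the result and normalizes the path.
  return new URL(`${path}${search}`, `https://${hostname}`).toString();
}
```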


231-233: Referrer query parameter is still ignored.

Line 232 constructs the referrerUrl but omits rawEvent.referrer_query, losing query string information from the referrer.


257-296: Properties structure violates ClickHouse type expectations.

Lines 260 and 267-296 build nested objects (properties.__query and properties.__query.utm_*) which conflict with IClickhouseEvent.properties: Record<string, string | number | boolean | undefined | null>. ClickHouse expects flat primitive values.

packages/db/src/services/import.service.ts (3)

256-256: Timestamp collision risk persists.

Lines 256 and 293 adjust timestamps by exactly ±1000ms (1 second). This can collide with real events at those timestamps, causing non-deterministic ordering in time-based queries.

Also applies to: 293-293


559-559: Non-deterministic status aggregation remains.

Line 559 uses any(import_status) which returns an arbitrary status when events have mixed statuses. This makes progress reporting unpredictable.


603-603: Type assertion bypasses runtime validation.

Line 603 uses a type assertion for row.status without runtime validation. If ClickHouse returns an unexpected value (due to the any() issue on line 559 or data corruption), this will silently pass invalid data.

🧹 Nitpick comments (10)
apps/start/src/components/project-mapper.tsx (1)

50-50: Consider validating for duplicate source project IDs.

The component allows users to add multiple mappings with the same "from" (source project ID), which could lead to ambiguous or conflicting import behavior. Consider adding validation to prevent duplicate source IDs, either here or in the parent form's validation schema.
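A small helper (hypothetical — the type and function names are illustrative) could back either the component-level or the form-schema validation:

```typescript
type ProjectMapping = { from: string; to: string };

// Returns every source project ID that is mapped more than once,
// so the form can surface a validation error before submit.
function findDuplicateSources(mappings: ProjectMapping[]): string[] {
  const seen = new Set<string>();
  const dupes = new Set<string>();
  for (const { from } of mappings) {
    if (seen.has(from)) dupes.add(from);
    seen.add(from);
  }
  return [...dupes];
}
```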

packages/db/src/clickhouse/client.ts (1)

136-156: Consider immutability in insert proxy.

The proxy mutates args[0] directly when injecting clickhouse_settings. While functional, this modifies the caller's object.

Consider creating a new object to avoid mutation:

       return (...args: any[]) =>
         withRetry(() => {
-          args[0].clickhouse_settings = {
+          const settings = {
             // Allow bigger HTTP payloads/time to stream rows
             async_insert: 1,
             wait_for_async_insert: 1,
             // Increase insert timeouts and buffer sizes for large batches
             max_execution_time: 300,
             max_insert_block_size: '500000',
             max_http_get_redirects: '0',
             // Ensure JSONEachRow stays efficient
             input_format_parallel_parsing: 1,
             // Keep long-running inserts/queries from idling out at proxies by sending progress headers
             send_progress_in_http_headers: 1,
             http_headers_progress_interval_ms: '50000',
             // Ensure server holds the connection until the query is finished
             wait_end_of_query: 1,
             ...args[0].clickhouse_settings,
           };
+          return value.apply(target, [{ ...args[0], clickhouse_settings: settings }]);
-          return value.apply(target, args);
         });
packages/common/scripts/get-referrers.ts (2)

91-92: Improve error handling in main function.

The current error handling only logs to console without providing actionable information or proper error propagation.

Apply this diff to improve error handling:

   } catch (e) {
-    console.log(e);
+    console.error('Failed to generate referrers:', e);
+    process.exit(1);
   }

This ensures the script fails with a non-zero exit code when referrer generation fails, allowing CI/build systems to detect the failure.


69-71: Implement robust error handling and fallback strategy for the external data source.

The Snowplow referrer data source is currently available and returns valid JSON (HTTP 200), but the script lacks fallback mechanisms. If the fetch fails during the build, the error is only logged to the console, so the build silently continues with stale or missing referrer data. Consider adding retry logic, a timeout, or a fallback to a cached/bundled version to keep builds reliable when the external source is temporarily unavailable.
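The suggested strategy can be sketched as a generic wrapper — bounded retries with exponential backoff, then a cached/bundled default instead of failing the build (the helper name and defaults are assumptions, not the script's current API):

```typescript
// Retry a flaky async operation a few times with exponential backoff,
// then fall back to a supplied default (e.g. a referrers.json in the repo).
async function withFallback<T>(
  attempt: () => Promise<T>,
  opts: { attempts?: number; baseDelayMs?: number; fallback: T },
): Promise<T> {
  const { attempts = 3, baseDelayMs = 200, fallback } = opts;
  for (let i = 0; i < attempts; i++) {
    try {
      return await attempt();
    } catch {
      if (i < attempts - 1) {
        // back off: 200ms, 400ms, 800ms, ...
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
      }
    }
  }
  return fallback;
}
```

A per-attempt timeout could be layered in with `AbortSignal.timeout()` when the operation is a fetch.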

packages/importer/src/providers/metadata.ts (2)

13-30: Keep provider metadata in sync with validation (plausible present there).

Validation accepts 'plausible', but it’s missing here. Either add Plausible to IMPORT_PROVIDERS or drop it from validation until ready.

Would you like a patch to add Plausible with a neutral placeholder logo?


18-19: Avoid hot‑linking external logos; serve from our assets/CDN.

Brandfetch links can break or change. Prefer checked‑in assets or our CDN.

Also applies to: 26-28

packages/db/prisma/migrations/20251028103556_import_batch_and_status/migration.sql (1)

1-3: Consider indexes if filtering/sorting by progress fields.

If currentBatch/currentStep appear in WHERE/ORDER BY, add indexes (even partial) to avoid seq scans. Otherwise this change looks fine.
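If those filters do materialize, a partial index keeps the write overhead and index size small. A Postgres sketch — the table and column names are assumptions and should be aligned with the Prisma schema:

```sql
-- Hypothetical partial index: only in-flight imports are indexed, so
-- WHERE status IN (...) ORDER BY "createdAt" scans stay cheap while
-- completed rows add no index bloat.
CREATE INDEX IF NOT EXISTS imports_in_flight_idx
  ON imports ("projectId", "createdAt")
  WHERE status IN ('pending', 'processing');
```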

apps/worker/src/index.ts (1)

35-45: Minor improvement: Cleaner Bull Board initialization.

Refactoring from a wrapped object literal to a direct createBullBoard() call improves code clarity. This is a good incidental cleanup.

packages/common/server/parser-user-agent.ts (1)

84-112: Consider simplifying the redundant type checks.

Each override field uses a pattern like:

typeof overrides?.__os === 'string' && overrides?.__os
  ? overrides?.__os
  : res.os.name

The condition typeof overrides?.__os === 'string' && overrides?.__os is redundant. If the typeof check passes and the value is truthy, the second check adds no value unless you're explicitly filtering out empty strings. If that's the intent, a clearer pattern would be:

-    os:
-      typeof overrides?.__os === 'string' && overrides?.__os
-        ? overrides?.__os
-        : res.os.name,
+    os:
+      (typeof overrides?.__os === 'string' && overrides?.__os !== '')
+        ? overrides?.__os
+        : res.os.name,

Or if empty strings should be accepted:

-    os:
-      typeof overrides?.__os === 'string' && overrides?.__os
-        ? overrides?.__os
-        : res.os.name,
+    os: typeof overrides?.__os === 'string' ? overrides?.__os : res.os.name,
apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx (1)

46-76: Polling interval appropriate for progress tracking.

5-second refetch interval on Line 50 provides real-time import progress updates without excessive load. Consider adding a condition to stop polling when all imports are completed/failed.

Optional optimization:

  const importsQuery = useQuery(
    trpc.import.list.queryOptions(
      { projectId },
      {
-        refetchInterval: 5000,
+        refetchInterval: (query) => {
+          const hasActiveImports = query.state.data?.some(
+            (imp) => imp.status === 'pending' || imp.status === 'processing'
+          );
+          return hasActiveImports ? 5000 : false;
+        },
      },
    ),
  );
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d71712e and b60dd0e.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (63)
  • apps/api/package.json (1 hunks)
  • apps/start/package.json (2 hunks)
  • apps/start/src/components/integrations/integration-card.tsx (1 hunks)
  • apps/start/src/components/project-mapper.tsx (1 hunks)
  • apps/start/src/components/ui/badge.tsx (1 hunks)
  • apps/start/src/modals/add-import.tsx (1 hunks)
  • apps/start/src/modals/date-ranger-picker.tsx (1 hunks)
  • apps/start/src/modals/index.tsx (2 hunks)
  • apps/start/src/routeTree.gen.ts (11 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx (1 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.tsx (1 hunks)
  • apps/worker/package.json (3 hunks)
  • apps/worker/src/boot-workers.ts (4 hunks)
  • apps/worker/src/index.ts (2 hunks)
  • apps/worker/src/jobs/events.incoming-event.ts (3 hunks)
  • apps/worker/src/jobs/events.incoming-events.test.ts (2 hunks)
  • apps/worker/src/jobs/import.ts (1 hunks)
  • package.json (1 hunks)
  • packages/cli/package.json (0 hunks)
  • packages/cli/src/cli.ts (0 hunks)
  • packages/cli/src/importer/importer.ts (0 hunks)
  • packages/cli/src/importer/index.ts (0 hunks)
  • packages/cli/tsconfig.json (0 hunks)
  • packages/cli/tsup.config.ts (0 hunks)
  • packages/common/package.json (1 hunks)
  • packages/common/scripts/get-referrers.ts (1 hunks)
  • packages/common/server/index.ts (1 hunks)
  • packages/common/server/parse-referrer.test.ts (5 hunks)
  • packages/common/server/parse-referrer.ts (4 hunks)
  • packages/common/server/parser-user-agent.ts (2 hunks)
  • packages/common/src/object.ts (1 hunks)
  • packages/db/code-migrations/5-add-imports-table.sql (1 hunks)
  • packages/db/code-migrations/5-add-imports-table.ts (1 hunks)
  • packages/db/prisma/migrations/20251018205153_add_import_table/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251018214030_fix/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251022191315_add_status_message/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251028103556_import_batch_and_status/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251028150123_fix_imports_table/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251028150655_fix_imports_again/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251028152531_imports_again/migration.sql (1 hunks)
  • packages/db/prisma/schema.prisma (2 hunks)
  • packages/db/src/buffers/event-buffer.test.ts (8 hunks)
  • packages/db/src/clickhouse/client.ts (2 hunks)
  • packages/db/src/services/event.service.ts (1 hunks)
  • packages/db/src/services/import.service.ts (1 hunks)
  • packages/db/src/services/session.service.ts (1 hunks)
  • packages/db/src/types.ts (2 hunks)
  • packages/importer/package.json (1 hunks)
  • packages/importer/src/base-provider.ts (1 hunks)
  • packages/importer/src/index.ts (1 hunks)
  • packages/importer/src/providers/metadata.ts (1 hunks)
  • packages/importer/src/providers/mixpanel.test.ts (1 hunks)
  • packages/importer/src/providers/mixpanel.ts (1 hunks)
  • packages/importer/src/providers/umami.ts (1 hunks)
  • packages/importer/src/types.ts (1 hunks)
  • packages/importer/tsconfig.json (1 hunks)
  • packages/importer/vitest.config.ts (1 hunks)
  • packages/queue/src/queues.ts (1 hunks)
  • packages/redis/package.json (1 hunks)
  • packages/trpc/package.json (1 hunks)
  • packages/trpc/src/root.ts (2 hunks)
  • packages/trpc/src/routers/import.ts (1 hunks)
  • packages/validation/src/index.ts (1 hunks)
💤 Files with no reviewable changes (6)
  • packages/cli/tsup.config.ts
  • packages/cli/src/importer/index.ts
  • packages/cli/package.json
  • packages/cli/src/cli.ts
  • packages/cli/src/importer/importer.ts
  • packages/cli/tsconfig.json
✅ Files skipped from review due to trivial changes (2)
  • packages/importer/src/providers/mixpanel.test.ts
  • packages/db/prisma/migrations/20251022191315_add_status_message/migration.sql
🚧 Files skipped from review as they are similar to previous changes (12)
  • packages/db/src/services/event.service.ts
  • packages/db/code-migrations/5-add-imports-table.sql
  • packages/importer/vitest.config.ts
  • apps/worker/package.json
  • apps/start/src/components/integrations/integration-card.tsx
  • packages/db/prisma/migrations/20251018205153_add_import_table/migration.sql
  • packages/db/prisma/migrations/20251018214030_fix/migration.sql
  • packages/db/src/types.ts
  • packages/importer/tsconfig.json
  • packages/importer/package.json
  • packages/importer/src/types.ts
  • apps/start/src/routeTree.gen.ts
🧰 Additional context used
📓 Path-based instructions (1)
apps/start/**/*.{ts,tsx}

📄 CodeRabbit inference engine (apps/start/.cursorrules)

apps/start/**/*.{ts,tsx}: Instrument server functions created with createServerFn by wrapping their implementation in Sentry.startSpan and importing Sentry from '@sentry/tanstackstart-react'
When using Sentry in the TanStack Start React app, import it as: import * as Sentry from '@sentry/tanstackstart-react'

Files:

  • apps/start/src/modals/index.tsx
  • apps/start/src/modals/date-ranger-picker.tsx
  • apps/start/src/components/ui/badge.tsx
  • apps/start/src/components/project-mapper.tsx
  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx
  • apps/start/src/modals/add-import.tsx
  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.tsx
🧬 Code graph analysis (17)
apps/start/src/modals/index.tsx (1)
apps/start/src/modals/add-import.tsx (1)
  • AddImport (231-282)
apps/worker/src/index.ts (1)
packages/queue/src/queues.ts (1)
  • importQueue (175-181)
packages/trpc/src/routers/import.ts (6)
packages/trpc/src/trpc.ts (2)
  • createTRPCRouter (138-138)
  • protectedProcedure (161-164)
packages/trpc/src/access.ts (1)
  • getProjectAccessCached (4-4)
packages/trpc/src/errors.ts (1)
  • TRPCAccessError (3-7)
packages/db/src/prisma-client.ts (1)
  • db (270-270)
packages/validation/src/index.ts (1)
  • zCreateImport (478-486)
packages/queue/src/queues.ts (1)
  • importQueue (175-181)
packages/trpc/src/root.ts (1)
packages/trpc/src/routers/import.ts (1)
  • importRouter (11-178)
packages/db/code-migrations/5-add-imports-table.ts (2)
packages/db/code-migrations/helpers.ts (1)
  • getIsCluster (17-24)
packages/db/src/clickhouse/migration.ts (2)
  • createTable (68-116)
  • runClickhouseMigrationCommands (322-454)
packages/queue/src/queues.ts (1)
packages/redis/redis.ts (1)
  • getRedisQueue (93-105)
apps/worker/src/boot-workers.ts (2)
packages/queue/src/queues.ts (1)
  • importQueue (175-181)
apps/worker/src/jobs/import.ts (1)
  • importJob (27-283)
apps/start/src/components/project-mapper.tsx (2)
packages/db/src/services/project.service.ts (1)
  • IServiceProject (7-7)
apps/start/src/components/ui/select.tsx (5)
  • Select (173-173)
  • SelectTrigger (181-181)
  • SelectValue (182-182)
  • SelectContent (174-174)
  • SelectItem (176-176)
packages/importer/src/base-provider.ts (2)
packages/importer/src/types.ts (3)
  • BaseRawEvent (40-42)
  • ImportJobMetadata (69-73)
  • ErrorContext (45-51)
packages/db/src/services/event.service.ts (1)
  • IClickhouseEvent (68-102)
apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx (4)
apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.tsx (1)
  • Route (12-34)
packages/importer/src/providers/metadata.ts (1)
  • IMPORT_PROVIDERS (13-30)
apps/public/components/ui/tooltip.tsx (1)
  • Tooltiper (36-51)
apps/start/src/components/integrations/integration-card.tsx (3)
  • IntegrationCard (91-112)
  • IntegrationCardLogoImage (50-69)
  • IntegrationCardFooter (3-15)
apps/start/src/modals/add-import.tsx (3)
packages/validation/src/index.ts (5)
  • zUmamiImportConfig (452-454)
  • zMixpanelImportConfig (461-470)
  • IUmamiImportConfig (456-456)
  • IMixpanelImportConfig (471-471)
  • IImportConfig (473-476)
apps/start/src/components/project-mapper.tsx (1)
  • ProjectMapper (33-116)
apps/public/lib/utils.ts (1)
  • cn (4-6)
packages/common/server/parse-referrer.test.ts (1)
packages/common/server/parse-referrer.ts (2)
  • parseReferrer (17-26)
  • getReferrerWithQuery (28-66)
packages/importer/src/providers/mixpanel.ts (7)
packages/validation/src/index.ts (1)
  • IMixpanelImportConfig (471-471)
packages/logger/index.ts (1)
  • ILogger (6-6)
packages/db/src/services/event.service.ts (2)
  • IClickhouseEvent (68-102)
  • query (701-847)
packages/common/src/url.ts (2)
  • parsePath (11-49)
  • isSameDomain (51-63)
packages/common/server/parse-referrer.ts (2)
  • parseReferrer (17-26)
  • getReferrerWithQuery (28-66)
packages/common/server/parser-user-agent.ts (2)
  • parseUserAgent (72-115)
  • UserAgentInfo (71-71)
packages/common/src/object.ts (2)
  • toDots (3-36)
  • strip (48-48)
packages/importer/src/providers/umami.ts (5)
packages/validation/src/index.ts (1)
  • IUmamiImportConfig (456-456)
packages/logger/index.ts (1)
  • ILogger (6-6)
packages/db/src/services/event.service.ts (2)
  • IClickhouseEvent (68-102)
  • query (701-847)
packages/common/src/url.ts (2)
  • parsePath (11-49)
  • isSameDomain (51-63)
packages/common/server/parse-referrer.ts (2)
  • parseReferrer (17-26)
  • getReferrerWithQuery (28-66)
packages/db/src/services/import.service.ts (4)
packages/db/src/services/event.service.ts (1)
  • IClickhouseEvent (68-102)
packages/db/src/clickhouse/client.ts (4)
  • ch (131-161)
  • TABLE_NAMES (48-60)
  • convertClickhouseDateToJs (238-240)
  • formatClickhouseDate (208-219)
packages/logger/index.ts (1)
  • ILogger (6-6)
packages/db/src/prisma-client.ts (1)
  • db (270-270)
apps/worker/src/jobs/import.ts (9)
packages/queue/src/queues.ts (1)
  • ImportQueuePayload (168-173)
packages/db/src/prisma-client.ts (1)
  • db (270-270)
packages/db/src/services/import.service.ts (10)
  • ImportSteps (671-671)
  • updateImportStatus (673-736)
  • getImportDateBounds (610-634)
  • insertImportBatch (28-55)
  • generateSessionIds (62-98)
  • createSessionsStartEndEvents (106-316)
  • moveImportsToProduction (322-426)
  • backfillSessionsToProduction (428-521)
  • markImportComplete (526-545)
  • getImportProgress (550-605)
packages/db/src/clickhouse/client.ts (2)
  • formatClickhouseDate (208-219)
  • error (40-45)
packages/db/src/services/event.service.ts (1)
  • IClickhouseEvent (68-102)
packages/logger/index.ts (1)
  • ILogger (6-6)
packages/importer/src/index.ts (2)
  • UmamiProvider (1-1)
  • MixpanelProvider (2-2)
packages/importer/src/providers/umami.ts (1)
  • UmamiProvider (59-382)
packages/importer/src/providers/mixpanel.ts (1)
  • MixpanelProvider (18-441)
packages/common/scripts/get-referrers.ts (1)
apps/worker/scripts/get-referrers.ts (1)
  • main (66-94)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-and-push-worker
  • GitHub Check: build-and-push-dashboard
  • GitHub Check: build-and-push-api
🔇 Additional comments (43)
packages/db/src/services/session.service.ts (1)

195-199: Incomplete scope: Performance concern applies to both session and event query services.

The dateIntervalInDays change (7 → 360 days) was applied to both packages/db/src/services/session.service.ts (lines 195-199) and packages/db/src/services/event.service.ts (lines 418-432) with identical logic. The original review only identifies the session service impact but misses the event service, effectively doubling the performance concern.

Both services use the same binary threshold (1 day for >1M events, 360 days otherwise) to filter ClickHouse queries, meaning most organizations will experience a 51x increase in the lookback window across both query surfaces.

Recommendations remain the same for both services:

  • Introduce intermediate tiers based on organization size
  • Make the interval configurable per organization
  • Add UI controls for explicit date range selection

Likely an incorrect or invalid review comment.
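The tiered-interval recommendation above can be sketched as follows; the thresholds and return values are illustrative assumptions, not values from the codebase:

```typescript
// Hypothetical tiered lookback window: higher-volume organizations get
// shorter windows instead of the binary 1-day / 360-day threshold.
// All thresholds below are illustrative, not taken from the codebase.
function dateIntervalInDays(monthlyEventCount: number): number {
  if (monthlyEventCount > 10_000_000) return 1;
  if (monthlyEventCount > 1_000_000) return 30;
  if (monthlyEventCount > 100_000) return 90;
  return 360;
}
```

A per-organization override column could then replace the computed default where operators need explicit control.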

apps/start/src/modals/date-ranger-picker.tsx (1)

30-30: LGTM! Good UX improvement.

Adding captionLayout="dropdown" provides a more compact date navigation interface, especially useful in modal contexts where space is limited.

apps/api/package.json (1)

67-67: LGTM! Reasonable pinning for build tool.

Pinning the build tool (tsdown) to an exact version ensures consistent builds across environments. Since this is a devDependency, the trade-off of missing automatic updates is acceptable.

packages/db/src/buffers/event-buffer.test.ts (2)

13-23: LGTM! Test data aligned with production settings.

The clickhouseSettings constant mirrors the settings added to the production ClickHouse client, ensuring tests validate the actual insert behavior including async operations, timeouts, and performance optimizations.


142-142: LGTM! Consistent test expectations.

All insert expectations now include the clickhouse_settings field, ensuring tests verify that the proxy-wrapped insert method properly injects these settings.

Also applies to: 187-187, 519-519, 570-570, 615-615, 640-640, 688-688

packages/common/package.json (1)

8-9: LGTM! Script for generating referrer mappings.

The new gen:referrers script generates up-to-date referrer mappings from an external source, with automatic formatting applied to the output.

packages/db/src/clickhouse/client.ts (2)

59-59: LGTM! Table name for import tracking.

Adding events_imports to the table names registry supports the new import system introduced in this PR.


64-67: LGTM! Extended timeouts for large import operations.

The increased timeouts accommodate large data imports:

  • request_timeout: 60s → 300s (5 minutes)
  • idle_socket_ttl: 8s → 60s

These align with the max_execution_time: 300 setting in the insert configuration.

packages/redis/package.json (1)

11-11: Accept ioredis 5.8.2 upgrade with exact pinning.

ioredis 5.8.2 is the latest version, and the upgrade is safe. v5.8.2 (released 2025-10-21) contains only bug fixes: default IP family selection and CLIENT SETINFO command improvements—no breaking changes. No security vulnerabilities were detected. The exact version pin (removing the caret) trades automatic patch-level security updates for build reproducibility, which is a valid tradeoff for production stability.

apps/worker/src/jobs/events.incoming-events.test.ts (1)

100-103: Referrer type normalization aligns with existing code patterns — no breaking query filters found.

The codebase already normalizes undefined/null referrer fields to empty strings via nullish coalescing (?? '') in event.service.ts. No queries or dashboards filtering on referrerType='unknown' were detected in the search. The change to normalize referrerType to '' in the test is consistent with the production code's existing behavior.

apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.tsx (1)

45-45: LGTM! Clean addition of the Imports tab.

The new tab entry follows the existing pattern and integrates well with the settings navigation structure.

packages/trpc/package.json (1)

19-19: LGTM! Appropriate dependency addition.

Adding @openpanel/queue enables TRPC routes to interact with the import queue system, aligning with the new import workflow introduced in this PR.

packages/common/server/index.ts (1)

4-4: LGTM! Consistent re-export pattern.

The new re-export follows the existing pattern and appropriately expands the public API surface of the common/server module.

apps/worker/src/index.ts (1)

8-8: LGTM! ImportQueue properly integrated into Bull Board monitoring.

The addition of importQueue to the Bull Board UI enables visibility and monitoring of import jobs, which is essential for debugging and tracking the new import workflow.

Also applies to: 42-42

apps/start/src/modals/index.tsx (1)

8-8: LGTM! Modal registration follows established pattern.

The AddImport modal is properly registered in the modals registry, enabling the import configuration UI flow described in the PR objectives.

Also applies to: 41-41

package.json (1)

12-12: Verified: the common package correctly defines the gen:referrers script.

The script exists in packages/common/package.json and runs the referrer generation logic. The filter change from worker to common is valid and properly configured.

packages/db/prisma/migrations/20251028150655_fix_imports_again/migration.sql (1)

1-2: Migration correctly implements nullable currentStep for import workflow resumption.

The codebase explicitly depends on currentStep being nullable: apps/worker/src/jobs/import.ts:56 checks if (record.currentStep === null) to identify new imports versus resumed imports. This null state allows the import job to distinguish between first-time executions (null) and resumptions from a previous step. The migration is correct as-is.

apps/start/package.json (1)

34-34: LGTM! Dependency addition aligns with the import feature.

The new @openpanel/importer workspace dependency is correctly added to support the import functionality introduced in this PR.

packages/trpc/src/root.ts (1)

7-7: LGTM! Import router correctly integrated.

The importRouter is properly imported and wired into the appRouter, following the established pattern for other routers in the application.

Also applies to: 44-44

packages/common/server/parser-user-agent.ts (1)

71-71: LGTM! Useful type export.

The new UserAgentInfo type export provides a clean type definition for consumers of parseUserAgent.

packages/common/server/parse-referrer.test.ts (1)

8-8: LGTM! Test expectations correctly updated.

The test expectations have been updated from 'unknown' to '' (empty string) to match the implementation changes in parse-referrer.ts. This is consistent with the new default behavior for unknown referrers.

Also applies to: 14-14, 44-44, 52-52, 67-67, 91-91, 99-99, 113-113

packages/importer/src/index.ts (1)

1-13: LGTM! Clean barrel export for importer package.

The index file properly exports the provider implementations and type definitions, establishing a clean public API for the importer package.

packages/queue/src/queues.ts (1)

168-181: LGTM! Import queue properly configured.

The new importQueue follows the established pattern for other queues in the codebase, with appropriate job retention settings (removeOnComplete: 10, removeOnFail: 50).

apps/worker/src/boot-workers.ts (1)

8-8: LGTM! Import worker correctly configured.

The importWorker is properly set up with:

  • Correct queue and job handler imports
  • Configurable concurrency via IMPORT_JOB_CONCURRENCY environment variable
  • Standard worker options

Also applies to: 23-23, 61-64

packages/common/server/parse-referrer.ts (3)

35-51: Improved referrer matching logic.

The updated matching logic is cleaner and more maintainable:

  • Source is now normalized to lowercase upfront
  • Matching checks direct key, .com suffix, and case-insensitive name lookup
  • Removes redundant fallback

The logic now properly handles case-insensitive matching for query parameters like utm_source=GoOgLe.
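The described matching order (direct key, then a `.com` suffix, then a case-insensitive name lookup) can be sketched like this; the map shape and entries are assumptions for illustration, not the actual referrer data:

```typescript
// Illustrative referrer lookup mirroring the matching order described above.
// The map contents and entry shape are hypothetical.
type ReferrerEntry = { type: string; name: string };

const referrers: Record<string, ReferrerEntry> = {
  'google.com': { type: 'search', name: 'Google' },
};

function matchReferrer(source: string): ReferrerEntry | undefined {
  const normalized = source.toLowerCase();
  // 1. direct key lookup
  if (referrers[normalized]) return referrers[normalized];
  // 2. retry with a ".com" suffix (handles utm_source=google)
  if (referrers[`${normalized}.com`]) return referrers[`${normalized}.com`];
  // 3. fall back to a case-insensitive match on the display name
  return Object.values(referrers).find(
    (r) => r.name.toLowerCase() === normalized,
  );
}
```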


23-23: No breaking change detected.

The code already defaults to empty strings for unknown referrers. Tests confirm this is the expected behavior, and no consumers in the codebase check for type === 'unknown'. The alleged change from 'unknown' to '' either does not exist or was already in place prior to this PR.

Likely an incorrect or invalid review comment.


1-1: Import paths are correct and resolve properly.

Both imports in packages/common/server/parse-referrer.ts successfully resolve:

  • Line 1: ../src/stringpackages/common/src/string.ts
  • Line 3: ./referrerspackages/common/server/referrers/index.ts
packages/trpc/src/routers/import.ts (1)

101-133: LGTM!

Delete procedure correctly checks write permissions and safely removes queue job before deleting the record.

apps/worker/src/jobs/events.incoming-event.ts (2)

8-12: LGTM!

Import reorganization consolidates referrer utilities from @openpanel/common/server and correctly uses type-only import for Job.

Also applies to: 21-21, 23-23


120-122: Referrer priority adjusted — verify behavior.

baseEvent.referrer now exclusively uses referrer?.url (removing fallback to utmReferrer?.url), and referrerType prioritizes referrer?.type before utmReferrer?.type. This aligns with centralized referrer parsing, but ensure existing analytics flows handle the priority change correctly.

packages/db/prisma/schema.prisma (2)

470-475: LGTM!

ImportStatus enum defines a clear import lifecycle: pending → processing → completed/failed.


197-197: LGTM!

Project relation to imports is correctly defined and matches the Import model's foreign key.

apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx (2)

88-122: LGTM!

getStatusBadge provides clear visual feedback with appropriate icons and tooltips for failed imports showing error details.


229-263: LGTM!

Progress display handles multiple scenarios (estimated counts, N/A, processing) with clear user communication via tooltips and progress bars.

apps/worker/src/jobs/import.ts (1)

285-298: LGTM!

createProvider factory cleanly instantiates provider-specific implementations and throws descriptive errors for unknown providers.

packages/importer/src/base-provider.ts (3)

4-31: LGTM!

Abstract base class defines a clear provider contract with streaming support (AsyncGenerator), transformation, validation, and event counting. Well-documented and type-safe.


33-58: LGTM!

Optional hooks provide extension points for providers without forcing implementation. Default behaviors (re-throw errors, return 0 for estimated total) are sensible.


78-120: LGTM!

getDateChunks utility correctly splits date ranges for batched processing, handles edge cases (same start/end date, chunks exceeding end date), and is well-documented.
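A minimal sketch of such date chunking, assuming day-granularity inclusive bounds; the signature and semantics are inferred from the description, and the actual utility may differ:

```typescript
// Split [start, end] into non-overlapping chunks of at most `chunkDays` days.
// Handles a same-day range (one chunk) and clamps the final chunk to `end`.
function getDateChunks(
  start: Date,
  end: Date,
  chunkDays: number,
): Array<{ start: Date; end: Date }> {
  const dayMs = 24 * 60 * 60 * 1000;
  const chunks: Array<{ start: Date; end: Date }> = [];
  let cursor = new Date(start);
  while (cursor <= end) {
    const chunkEnd = new Date(
      Math.min(cursor.getTime() + chunkDays * dayMs, end.getTime()),
    );
    chunks.push({ start: new Date(cursor), end: chunkEnd });
    cursor = new Date(chunkEnd.getTime() + dayMs); // next day after this chunk
  }
  return chunks;
}
```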

apps/start/src/modals/add-import.tsx (3)

44-107: LGTM!

UmamiImport form correctly uses zUmamiImportConfig schema, includes required fields (provider, type, fileUrl), and integrates ProjectMapper for optional mappings.


115-229: LGTM!

MixpanelImport form correctly implements required fields with appropriate input types (password for secret) and integrates date range picker for selecting import window.


253-259: LGTM!

Payload construction correctly passes config from forms which already include required provider and type fields set in form defaults (Lines 59-60 for Umami, Lines 124-125 for Mixpanel).

packages/db/prisma/migrations/20251028150123_fix_imports_table/migration.sql (1)

1-14: The migration concerns are valid and already identified in the codebase.

The migration under review makes currentStep NOT NULL without backfilling data, and the very next migration (20251028150655_fix_imports_again) immediately reverts this change, indicating the problem was encountered. Additionally, another follow-up migration (20251028152531_imports_again) modifies currentBatch again.

Since these are part of an active development cycle on a newly created table (10 days old as of the migration date), consider consolidating these migrations:

  • Combine 20251028150123, 20251028150655, and 20251028152531 into a single migration that correctly handles the schema without intermediate reversals, or
  • If these are across separate PRs, ensure currentStep remains nullable (or add a backfill to populate existing NULLs before setting NOT NULL).

The current state requires the next migration to fix the previous one, which suggests the schema design should be finalized before deployment.

packages/db/src/services/import.service.ts (1)

682-691: Verify terminal state handling.

Lines 684-691 unconditionally set status: 'processing' when step is 'loading'. If an import has already reached a terminal state ('completed' or 'failed'), this would revert it. Confirm whether this is intentional for retry scenarios, or if a state check is needed.

If terminal states should be protected, add a guard:

const current = await db.import.findUnique({ 
  where: { id: importId },
  select: { status: true }
});
if (current?.status === 'completed' || current?.status === 'failed') {
  // Skip or throw depending on requirements
  return;
}

"type": "module",
"scripts": {
"testing": "pnpm dev",
"testing2": "pnpm dev",

⚠️ Potential issue | 🟡 Minor

Clarify the script rename from "testing" to "testing2".

The script key was renamed from "testing" to "testing2" without clear justification. This appears to be a development artifact rather than an intentional change for this PR.

If this rename is intentional, please clarify the purpose. Otherwise, consider reverting it or using a more descriptive name.

🤖 Prompt for AI Agents
In apps/start/package.json around line 6, the script key was renamed from
"testing" to "testing2" which looks like an accidental dev artifact; either
revert the key back to "testing" to preserve original functionality or replace
it with a descriptive name (e.g., "dev" or "start:testing") if the rename is
intentional, and update any references in documentation or CI that call the
script accordingly; if intentional, add a short comment in the PR description
explaining the reason for the new name.

Comment on lines +23 to +31
interface ProjectMapperProps {
fields: FieldArrayWithId<any, 'projectMapper', 'id'>[];
append: UseFieldArrayAppend<any, 'projectMapper'>;
remove: UseFieldArrayRemove;
projects: IServiceProject[];
register: UseFormRegister<any>;
watch: UseFormWatch<any>;
setValue: UseFormSetValue<any>;
}

🛠️ Refactor suggestion | 🟠 Major

Define a proper form type instead of using any.

The interface uses any for all generic type parameters, which defeats TypeScript's type safety. Define an explicit form schema type to enable proper type checking and autocomplete.

Apply this diff to add proper typing:

+interface ProjectMapperFormData {
+  projectMapper: Array<{ from: string; to: string }>;
+}
+
 interface ProjectMapperProps {
-  fields: FieldArrayWithId<any, 'projectMapper', 'id'>[];
-  append: UseFieldArrayAppend<any, 'projectMapper'>;
+  fields: FieldArrayWithId<ProjectMapperFormData, 'projectMapper', 'id'>[];
+  append: UseFieldArrayAppend<ProjectMapperFormData, 'projectMapper'>;
   remove: UseFieldArrayRemove;
   projects: IServiceProject[];
-  register: UseFormRegister<any>;
-  watch: UseFormWatch<any>;
-  setValue: UseFormSetValue<any>;
+  register: UseFormRegister<ProjectMapperFormData>;
+  watch: UseFormWatch<ProjectMapperFormData>;
+  setValue: UseFormSetValue<ProjectMapperFormData>;
 }
🤖 Prompt for AI Agents
In apps/start/src/components/project-mapper.tsx around lines 23 to 31, the
ProjectMapperProps interface uses `any` for all react-hook-form generics;
replace these with a concrete form data type (e.g., define an exported
FormValues or ProjectForm type that describes fields used by the component,
including projectMapper: { id: string; <other fields> }[] and any nested
fields); update the interface to use that type in FieldArrayWithId<FormValues,
'projectMapper', 'id'>, UseFieldArrayAppend<FormValues, 'projectMapper'>,
UseFieldArrayRemove<FormValues>, UseFormRegister<FormValues>,
UseFormWatch<FormValues>, and UseFormSetValue<FormValues>; import or declare the
FormValues type near the top of the file (or from a shared types file) and
adjust any call sites to match the new type.

Comment on lines +82 to +98
<Select
value={watch(`projectMapper.${index}.to`)}
onValueChange={(value) =>
setValue(`projectMapper.${index}.to`, value)
}
>
<SelectTrigger className="mt-1 w-full" size="sm">
<SelectValue placeholder="Select project" />
</SelectTrigger>
<SelectContent>
{projects.slice(0, 10).map((project) => (
<SelectItem key={project.id} value={project.id}>
{project.name}
</SelectItem>
))}
</SelectContent>
</Select>

⚠️ Potential issue | 🟠 Major

Remove the arbitrary 10-project limit.

Line 92 limits the dropdown to only the first 10 projects, preventing users with more projects from mapping to those beyond the limit. The Select component should display all available projects with scrolling, or implement search/pagination if performance is a concern.

Apply this diff to show all projects:

                     <SelectContent>
-                      {projects.slice(0, 10).map((project) => (
+                      {projects.map((project) => (
                         <SelectItem key={project.id} value={project.id}>
                           {project.name}
                         </SelectItem>

If performance is a concern with many projects, consider adding a search input or ComboBox component instead of a plain Select.

🤖 Prompt for AI Agents
In apps/start/src/components/project-mapper.tsx around lines 82 to 98 the Select
is artificially limited by projects.slice(0, 10) which hides projects beyond the
first 10; remove the slice so the SelectContent maps over the entire projects
array (or replace with a virtualized list / searchable ComboBox if performance
is a concern) and ensure the SelectContent is allowed to scroll by keeping the
UI container styles intact.

Comment on lines +158 to +166
if (process.env.NODE_ENV === 'production') {
exitHandler(evt, code);
} else {
logger.info('Shutting down for development', {
event: evt,
code,
});
process.exit(0);
}

⚠️ Potential issue | 🟠 Major

Development exit handler bypasses graceful shutdown.

The modified exit handler now exits immediately with code 0 in development mode, bypassing the graceful shutdown logic (closing workers, waiting for queues). This could lead to:

  • Jobs left in inconsistent states
  • Incomplete database transactions
  • Resource leaks (unclosed Redis connections)

Consider one of the following approaches:

  1. Keep graceful shutdown for development but with a shorter timeout
  2. Document why immediate exit is acceptable in development
  3. At minimum, still close workers even if you skip waiting for queue completion
  ['uncaughtException', 'unhandledRejection', 'SIGTERM', 'SIGINT'].forEach(
    (evt) => {
      process.on(evt, (code) => {
-        if (process.env.NODE_ENV === 'production') {
-          exitHandler(evt, code);
-        } else {
-          logger.info('Shutting down for development', {
-            event: evt,
-            code,
-          });
-          process.exit(0);
-        }
+        exitHandler(evt, code);
      });
    },
  );
📝 Committable suggestion


Suggested change
if (process.env.NODE_ENV === 'production') {
exitHandler(evt, code);
} else {
logger.info('Shutting down for development', {
event: evt,
code,
});
process.exit(0);
}
exitHandler(evt, code);
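Option 1 from the list above — still closing workers in development, just with a shorter grace period — could be sketched like this. `closeAll`, `shutdownTimeoutMs`, and `closeWithTimeout` are illustrative names, not helpers that exist in boot-workers.ts:

```typescript
// Sketch: still close workers in development, but cap the wait with a
// shorter grace period than production gets.
function shutdownTimeoutMs(env: string | undefined): number {
  return env === 'production' ? 30_000 : 5_000;
}

async function closeWithTimeout(
  closeAll: () => Promise<void>,
  env: string | undefined,
): Promise<'closed' | 'timeout'> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<'timeout'>((resolve) => {
    timer = setTimeout(() => resolve('timeout'), shutdownTimeoutMs(env));
  });
  const result = await Promise.race([
    closeAll().then(() => 'closed' as const),
    timeout,
  ]);
  if (timer) clearTimeout(timer); // don't keep the process alive on success
  return result;
}
```

The caller would then `process.exit` after the race resolves, so dev shutdowns stay fast without skipping worker close entirely.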

Comment on lines +1 to +4
-- AlterTable
ALTER TABLE "public"."imports" ALTER COLUMN "currentBatch" DROP NOT NULL,
ALTER COLUMN "currentBatch" DROP DEFAULT,
ALTER COLUMN "currentBatch" SET DATA TYPE TEXT;

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify the relaxed constraints on currentBatch align with batch tracking requirements.

This migration makes three significant changes to currentBatch:

  1. Removes NOT NULL constraint (allows imports without batches)
  2. Removes DEFAULT value (no automatic batch assignment)
  3. Changes type to TEXT (allows flexible batch identifiers)

Please confirm that:

  • Imports are expected to exist without a currentBatch (nullable is correct)
  • The workflow handles missing batch values appropriately
  • TEXT is the appropriate type for your batch identifier scheme (e.g., UUIDs, timestamps, custom strings)

🏁 Script executed:

#!/bin/bash
# Description: Check how currentBatch is used in import processing logic

# Search for currentBatch usage and null checks
rg -nP --type=ts -C3 'currentBatch' -g '!*.gen.ts' -g '!node_modules'

Length of output: 4902



🏁 Script executed:

#!/bin/bash
# Find the Prisma schema and check currentBatch field definition
fd -name 'schema.prisma' -o -name '*.prisma'

Length of output: 230


🏁 Script executed:

#!/bin/bash
# List migrations in order to find the one before this one
ls -1 packages/db/prisma/migrations/ | sort | tail -10

Length of output: 295


🏁 Script executed:

#!/bin/bash
# Check the imports table definition in the latest schema or migrations
fd -name '*.sql' -path '*/migrations/*' -exec grep -l 'currentBatch' {} \;

Length of output: 230



🏁 Script executed:

#!/bin/bash
# View the previous migration to understand what currentBatch type was
cat packages/db/prisma/migrations/20251028150655_fix_imports_again/migration.sql

Length of output: 86


🏁 Script executed:

#!/bin/bash
# View the migration that added the import table initially
cat packages/db/prisma/migrations/20251018205153_add_import_table/migration.sql

Length of output: 840


🏁 Script executed:

#!/bin/bash
# Search for schema.prisma in common locations
find packages/db -name 'schema.prisma' 2>/dev/null

Length of output: 32


🏁 Script executed:

#!/bin/bash
# Check what type currentBatch has in recent migrations
rg 'currentBatch' packages/db/prisma/migrations/ -A1 -B1

Length of output: 859



🏁 Script executed:

#!/bin/bash
# View the Prisma schema definition
cat packages/db/prisma/schema.prisma | grep -A 50 'model imports'

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check if there's a type generation process or codegen in Prisma
rg -A5 'generator|prisma' packages/db/prisma/schema.prisma | head -30

Length of output: 432


🏁 Script executed:

#!/bin/bash
# Check how batch parameter is used - what values are passed
rg -nP 'options\.batch|batch:' packages/db/src/ apps/worker/src/ -B2 -A2

Length of output: 4984



🏁 Script executed:

#!/bin/bash
# View full imports model from schema
rg -A 30 'model imports' packages/db/prisma/schema.prisma

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check generated Prisma types for imports
cat packages/db/src/generated/prisma/index.d.ts 2>/dev/null | grep -A 30 'interface imports' | head -40

Length of output: 0


🏁 Script executed:

#!/bin/bash
# View the imports.service.ts to see the UpdateImportStatusOptions type
rg -B5 -A10 'interface.*UpdateImportStatusOptions|type UpdateImportStatusOptions' packages/db/src/services/import.service.ts

Length of output: 357



🏁 Script executed:

#!/bin/bash
# Find and view the ImportProgress interface definition
rg -B3 -A8 'interface ImportProgress' packages/importer/src/types.ts

Length of output: 232


🏁 Script executed:

#!/bin/bash
# Check if there are any type errors or if the types are reconciled elsewhere
rg -C5 'currentBatch.*number|number.*currentBatch' packages/

Length of output: 530



🏁 Script executed:

#!/bin/bash
# Check where ImportProgress is used and how currentBatch is accessed
rg -C3 'ImportProgress|currentBatch' packages/importer/src/ apps/

Length of output: 3534


Fix TypeScript type definition for currentBatch in ImportProgress interface.

The migration correctly changes currentBatch from INTEGER NOT NULL DEFAULT 0 to TEXT nullable with no default. However, the TypeScript type definition needs to be updated:

  • File: packages/importer/src/types.ts, line 23
  • Current: currentBatch: number;
  • Should be: currentBatch: string | null;

Reason: The migration stores TEXT values (timestamps, batch identifiers), and the code already handles null batches. The type should reflect this reality.

🤖 Prompt for AI Agents
In packages/importer/src/types.ts around line 23, the ImportProgress interface
currently declares currentBatch: number; but the DB migration changes
currentBatch to nullable TEXT; update the TypeScript definition to currentBatch:
string | null; so the type matches stored values (text/batch IDs or null) and
adjust any places expecting a number accordingly.
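A sketch of the corrected interface plus the null guard callers would need (fields other than `currentBatch`, and the `describeBatch` helper, are illustrative):

```typescript
// Sketch of the corrected shape after the migration: currentBatch is a
// nullable string, matching the TEXT column with no default.
interface ImportProgress {
  totalEvents: number;
  insertedEvents: number;
  currentBatch: string | null; // e.g. '2020-01-01', or null before the first batch
}

// Callers that assumed a number now need an explicit null guard:
function describeBatch(p: ImportProgress): string {
  return p.currentBatch === null
    ? 'no batch started'
    : `processing batch ${p.currentBatch}`;
}
```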

Comment on lines +477 to +498
model Import {
id String @id @default(dbgenerated("gen_random_uuid()")) @db.Uuid
projectId String
project Project @relation(fields: [projectId], references: [id], onDelete: Cascade)
jobId String? // BullMQ job ID
status ImportStatus
statusMessage String? // Human-readable current step like "Importing events (Feb 2025)", "Generating session IDs"
errorMessage String?
/// [IPrismaImportConfig]
config Json
totalEvents Int @default(0)
processedEvents Int @default(0)
currentStep String?
currentBatch String? // String date 2020-01-01
createdAt DateTime @default(now())
completedAt DateTime?
updatedAt DateTime @default(now()) @updatedAt

@@map("imports")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Inconsistency: currentStep nullable in schema but NOT NULL in migration.

Line 489 defines currentStep as String? (optional), but migration 20251028150123_fix_imports_table/migration.sql sets it to NOT NULL. Align schema and migration:

  • If imports always have a step, make it required: currentStep String
  • If it's optional during creation, remove NOT NULL constraint from migration
- currentStep     String?
+ currentStep     String

Or revert the migration's SET NOT NULL constraint.

📝 Committable suggestion


Suggested change
model Import {
id String @id @default(dbgenerated("gen_random_uuid()")) @db.Uuid
projectId String
project Project @relation(fields: [projectId], references: [id], onDelete: Cascade)
jobId String? // BullMQ job ID
status ImportStatus
statusMessage String? // Human-readable current step like "Importing events (Feb 2025)", "Generating session IDs"
errorMessage String?
/// [IPrismaImportConfig]
config Json
totalEvents Int @default(0)
processedEvents Int @default(0)
currentStep String?
currentBatch String? // String date 2020-01-01
createdAt DateTime @default(now())
completedAt DateTime?
updatedAt DateTime @default(now()) @updatedAt
@@map("imports")
}
model Import {
id String @id @default(dbgenerated("gen_random_uuid()")) @db.Uuid
projectId String
project Project @relation(fields: [projectId], references: [id], onDelete: Cascade)
jobId String? // BullMQ job ID
status ImportStatus
statusMessage String? // Human-readable current step like "Importing events (Feb 2025)", "Generating session IDs"
errorMessage String?
/// [IPrismaImportConfig]
config Json
totalEvents Int @default(0)
processedEvents Int @default(0)
currentStep String
currentBatch String? // String date 2020-01-01
createdAt DateTime @default(now())
completedAt DateTime?
updatedAt DateTime @default(now()) @updatedAt
@@map("imports")
}
🤖 Prompt for AI Agents
In packages/db/prisma/schema.prisma around lines 477 to 496, the model Import
declares currentStep as nullable (String?) while the migration
20251028150123_fix_imports_table/migration.sql sets current_step to NOT NULL;
pick one and make them consistent: if imports must always have a step, change
schema currentStep to non-nullable (currentStep String) and run prisma migrate
to regenerate the SQL; otherwise, edit the migration SQL to remove the SET NOT
NULL on current_step (or create a new migration that drops the NOT NULL
constraint) so the column remains nullable, then reapply migrations.

Comment on lines +85 to +91
this.logger?.info('Fetching events from Mixpanel', {
url: `${url}?${params}`,
from,
to,
projectId,
serviceAccount,
});

⚠️ Potential issue | 🟠 Major

Sensitive credentials logged.

Line 90 logs the serviceAccount credential. Even though the secret is not logged, exposing account identifiers in logs can aid attackers in credential stuffing or enumeration attacks.

Consider removing sensitive fields from logs:

 this.logger?.info('Fetching events from Mixpanel', {
   url: `${url}?${params}`,
   from,
   to,
   projectId,
-  serviceAccount,
 });
📝 Committable suggestion


Suggested change
this.logger?.info('Fetching events from Mixpanel', {
url: `${url}?${params}`,
from,
to,
projectId,
serviceAccount,
});
this.logger?.info('Fetching events from Mixpanel', {
url: `${url}?${params}`,
from,
to,
projectId,
});
🤖 Prompt for AI Agents
In packages/importer/src/providers/mixpanel.ts around lines 85 to 91, the logger
call includes the serviceAccount identifier which is considered sensitive;
remove serviceAccount from the logged metadata and only log non-sensitive
context (url, from, to, projectId) or replace serviceAccount with a
redacted/boolean flag (e.g., serviceAccountPresent: true) if presence needs to
be recorded; update the logger invocation accordingly and run a quick search to
ensure no other logs emit service account identifiers.
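The redacted/boolean-flag variant mentioned in the prompt could look like this; `importLogContext` and `serviceAccountPresent` are made-up names for illustration:

```typescript
// Sketch: strip the credential from log metadata and record only a
// presence flag, so logs never carry the account identifier.
function importLogContext(opts: {
  url: string;
  from: string;
  to: string;
  projectId: string;
  serviceAccount?: string;
}) {
  const { serviceAccount, ...safe } = opts;
  return { ...safe, serviceAccountPresent: Boolean(serviceAccount) };
}
```

The logger call would then pass `importLogContext({...})` instead of the raw options object.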

Comment on lines +160 to +176
// Add new job to queue
const job = await importQueue.add('import', {
type: 'import',
payload: {
importId: importRecord.id,
},
});

// Update import record
return db.import.update({
where: { id: importRecord.id },
data: {
jobId: job.id,
status: 'pending',
errorMessage: null,
},
});

⚠️ Potential issue | 🟠 Major

Wrap retry enqueue in try/catch.

Same issue as create: if importQueue.add fails, the record is updated to 'pending' but no job exists. Wrap in try/catch and revert status on failure.

      // Add new job to queue
-      const job = await importQueue.add('import', {
-        type: 'import',
-        payload: {
-          importId: importRecord.id,
-        },
-      });
-
-      // Update import record
-      return db.import.update({
-        where: { id: importRecord.id },
-        data: {
-          jobId: job.id,
-          status: 'pending',
-          errorMessage: null,
-        },
-      });
+      try {
+        const job = await importQueue.add(
+          'import',
+          { type: 'import', payload: { importId: importRecord.id } },
+          { jobId: importRecord.id, attempts: 3, backoff: { type: 'exponential', delay: 30_000 } },
+        );
+        return db.import.update({
+          where: { id: importRecord.id },
+          data: {
+            jobId: job.id,
+            status: 'pending',
+            errorMessage: null,
+          },
+        });
+      } catch (err) {
+        throw new Error('Failed to enqueue retry job: ' + (err instanceof Error ? err.message : String(err)));
+      }
    }),
📝 Committable suggestion


Suggested change
// Add new job to queue
const job = await importQueue.add('import', {
type: 'import',
payload: {
importId: importRecord.id,
},
});
// Update import record
return db.import.update({
where: { id: importRecord.id },
data: {
jobId: job.id,
status: 'pending',
errorMessage: null,
},
});
// Add new job to queue
try {
const job = await importQueue.add(
'import',
{ type: 'import', payload: { importId: importRecord.id } },
{ jobId: importRecord.id, attempts: 3, backoff: { type: 'exponential', delay: 30_000 } },
);
return db.import.update({
where: { id: importRecord.id },
data: {
jobId: job.id,
status: 'pending',
errorMessage: null,
},
});
} catch (err) {
throw new Error('Failed to enqueue retry job: ' + (err instanceof Error ? err.message : String(err)));
}
🤖 Prompt for AI Agents
In packages/trpc/src/routers/import.ts around lines 160 to 176, the code calls
importQueue.add and then unconditionally updates the DB record to status
'pending', which can leave the DB pointing to a non‑existent job if add fails;
wrap the call to importQueue.add in a try/catch, only update the import record
to 'pending' and set jobId after a successful add, and in the catch update the
import record to a terminal state (e.g., 'failed') with errorMessage set to the
caught error (or revert to the previous status if tracked); ensure the DB
updates are awaited and consider doing the job add and record update in a
transaction or in the correct sequence so no pending state is recorded when
enqueueing fails.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
packages/trpc/src/routers/auth.ts (2)

227-229: Remove sensitive console logs from auth flow.

Logging session objects and cookie setters risks leaking tokens/PII.

Apply this diff:

-      console.log('session', session);
       setSessionTokenCookie(ctx.setCookie, token, session.expiresAt);
-      console.log('ctx.setCookie', ctx.setCookie);

201-203: Avoid credential enumeration; return a generic auth error.

“User does not exists” reveals account presence. Use a single generic message.

Apply this diff:

-import { TRPCAccessError, TRPCNotFoundError } from '../errors';
+import { TRPCAccessError, TRPCNotFoundError } from '../errors';
@@
-      if (!user) {
-        throw TRPCNotFoundError('User does not exists');
-      }
+      if (!user) {
+        throw TRPCAccessError('Incorrect email or password');
+      }
Note: If you prefer stricter semantics, you can also change the other branch’s incorrect-password error to the same message to keep responses uniform.

Also applies to: 31-31

apps/worker/src/jobs/events.incoming-event.ts (1)

39-44: Do not log full event payload; redact to avoid PII leakage.

Logging entire payload and headers can include emails, IDs, and custom properties. Log only identifiers and sizes.

-  logger.info('Creating event', { event: payload, jobData });
+  logger.info('Creating event', {
+    name: payload.name,
+    projectId: payload.projectId,
+    profileId: payload.profileId,
+    sessionId: (payload as any).sessionId,
+    deviceId: (payload as any).deviceId,
+    propertiesSize: Object.keys(payload.properties ?? {}).length,
+  });
♻️ Duplicate comments (31)
apps/start/src/components/project-mapper.tsx (2)

23-31: Define a proper form type instead of using any.

The interface uses any for all generic type parameters, which defeats TypeScript's type safety. Define an explicit form schema type to enable proper type checking and autocomplete.


92-92: Remove the arbitrary 10-project limit.

The slice(0, 10) limits the dropdown to only the first 10 projects, preventing users with more projects from mapping to those beyond the limit. The Select component should display all available projects with scrolling, or implement search/pagination if performance is a concern.

packages/common/scripts/get-referrers.ts (1)

74-74: Fix incorrect output path.

The script writes to ../../worker/src/referrers/index.ts, but from packages/common/scripts/, this resolves to an unexpected location outside the package. The corresponding format command in package.json expects ./server/referrers/index.ts relative to the package root.

Apply this diff to fix the output path:

-      path.resolve(__dirname, '../../worker/src/referrers/index.ts'),
+      path.resolve(__dirname, '../server/referrers/index.ts'),
packages/db/code-migrations/5-add-imports-table.ts (2)

1-2: Fix __filename usage in ESM module.

The package is configured as ESM ("type": "module"), and using __filename directly will fail at runtime. Import fileURLToPath from node:url and derive __filename properly.

Apply this diff:

 import fs from 'node:fs';
 import path from 'node:path';
+import { fileURLToPath } from 'node:url';
 import { TABLE_NAMES } from '../src/clickhouse/client';
 import {
   chMigrationClient,
   createTable,
   runClickhouseMigrationCommands,
 } from '../src/clickhouse/migration';
 import { getIsCluster } from './helpers';
 
+const __filename = fileURLToPath(import.meta.url);
+
 export async function up() {

Also applies to: 72-72


65-69: TTL on Distributed table will fail in clustered mode.

ALTER TABLE events_imports MODIFY TTL will fail or be a no-op on the Distributed table in clustered deployments. TTL must be defined on the local Replicated MergeTree table, preferably in the CREATE TABLE statement.

Consider extending createTable to support TTL parameter and include it in the CREATE TABLE statement, or target the replicated table explicitly with ON CLUSTER '{cluster}' if the ALTER approach is retained.
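A sketch of what a TTL-aware table builder might emit, putting TTL in the CREATE TABLE for the local replicated table with an optional ON CLUSTER clause. `createTableSql` and its options are hypothetical, not the actual `createTable` signature in packages/db:

```typescript
// Sketch: emit TTL inside CREATE TABLE rather than ALTERing the
// Distributed table afterwards.
function createTableSql(opts: {
  name: string;
  columns: string[];
  engine: string;
  ttl?: string;
  cluster?: string;
}): string {
  const onCluster = opts.cluster ? ` ON CLUSTER '${opts.cluster}'` : '';
  const ttl = opts.ttl ? `\nTTL ${opts.ttl}` : '';
  return [
    `CREATE TABLE ${opts.name}${onCluster} (`,
    `  ${opts.columns.join(',\n  ')}`,
    `) ENGINE = ${opts.engine}${ttl}`,
  ].join('\n');
}
```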

packages/validation/src/index.ts (1)

478-486: Discriminate by provider to prevent mismatched payloads.

The current schema allows provider and config to disagree (e.g., provider='umami' with a Mixpanel config). This was flagged in a previous review and remains unresolved.

Apply a discriminated union on provider so the config shape is tied to the selected provider:

-export const zCreateImport = z.object({
-  projectId: z.string().min(1),
-  provider: z.enum(['umami', 'plausible', 'mixpanel']),
-  config: z.union([
-    zUmamiImportConfig,
-    zPlausibleImportConfig,
-    zMixpanelImportConfig,
-  ]),
-});
+export const zCreateImport = z.discriminatedUnion('provider', [
+  z.object({
+    projectId: z.string().min(1),
+    provider: z.literal('umami'),
+    config: zUmamiImportConfig,
+  }),
+  z.object({
+    projectId: z.string().min(1),
+    provider: z.literal('plausible'),
+    config: zPlausibleImportConfig,
+  }),
+  z.object({
+    projectId: z.string().min(1),
+    provider: z.literal('mixpanel'),
+    config: zMixpanelImportConfig,
+  }),
+]);
packages/importer/src/types.ts (1)

53-66: EventProperties allows nested objects, but ClickHouse requires primitives only.

The EventProperties interface (line 61) allows Record<string, unknown> and defines __query as Record<string, unknown> (line 62). This conflicts with ClickHouse's IClickhouseEvent.properties, which must be Record<string, string|number|boolean|undefined|null>. Storing nested objects will break ingestion unless stringified. This was flagged in a previous review and remains unresolved.

Either constrain EventProperties to primitives only:

 export interface EventProperties {
   [key: string]:
     | string
     | number
     | boolean
     | null
-    | undefined
-    | Record<string, unknown>;
-  __query?: Record<string, unknown>;
+    | undefined;
+  __query?: string; // JSON.stringify before storing
   __title?: string;
   __screen?: string;
   __language?: string;
 }

Or explicitly handle JSON stringification for nested fields before they reach ClickHouse.

packages/db/prisma/migrations/20251018214030_fix/migration.sql (1)

10-13: Data loss risk: backfill config before dropping columns.

Migrate provider/source fields into config first, then DROP. Example:

+-- Backfill config from legacy columns before dropping them
+UPDATE "public"."imports" i
+SET "config" = COALESCE(
+  jsonb_build_object(
+    'provider', i."provider",
+    'sourceType', i."sourceType",
+    'sourceLocation', i."sourceLocation"
+  ) || COALESCE(i."config", '{}'::jsonb),
+  '{}'::jsonb
+)
+WHERE i."provider" IS NOT NULL
+   OR i."sourceType" IS NOT NULL
+   OR i."sourceLocation" IS NOT NULL;
+
 -- AlterTable
 ALTER TABLE "public"."imports" DROP COLUMN "provider",
 DROP COLUMN "sourceLocation",
 DROP COLUMN "sourceType",
 ALTER COLUMN "config" DROP DEFAULT;
packages/importer/vitest.config.ts (1)

3-3: ESM: __dirname is undefined. Compute from import.meta.url.

Apply:

+import { dirname } from 'node:path';
+import { fileURLToPath } from 'node:url';
 import { getSharedVitestConfig } from '../../vitest.shared';
 
-export default getSharedVitestConfig({ __dirname });
+export default getSharedVitestConfig({
+  __dirname: dirname(fileURLToPath(import.meta.url)),
+});
apps/worker/src/boot-workers.ts (2)

66-73: eventsGroupWorker is unmanaged; either add it back to lifecycle or remove it entirely.

Currently it runs but is excluded from handlers/close, risking leaks and unobserved failures. Choose one path and apply consistently.

Option A — manage it with the others (recommended):

   const workers = [
     sessionsWorker,
     cronWorker,
     notificationWorker,
     miscWorker,
     importWorker,
-    // eventsGroupWorker,
+    // GroupMQ worker participates in lifecycle
+    // (we still keep eventsGroupWorker.run(); above)
+    eventsGroupWorker as unknown as Worker,
   ];

Option B — if intentionally disabled, remove it completely:

-  const eventsGroupWorker = new GroupWorker<...>({ ... });
-  eventsGroupWorker.run();
+  // eventsGroupWorker intentionally disabled; remove instantiation to avoid orphaned worker.

155-166: Always use graceful shutdown; remove the dev fast-exit path.

Avoid skipping worker close/queue drain in dev.

-        if (process.env.NODE_ENV === 'production') {
-          exitHandler(evt, code);
-        } else {
-          logger.info('Shutting down for development', {
-            event: evt,
-            code,
-          });
-          process.exit(0);
-        }
+        exitHandler(evt, code);
packages/db/prisma/schema.prisma (1)

489-489: Schema/migration mismatch on currentStep.

Schema has currentStep String? but the migration sets it NOT NULL. Keep it nullable or update both consistently. This also aligns with import.ts where currentStep === null denotes a new import.

packages/trpc/src/routers/import.ts (4)

24-31: Sensitive config exposure in list — redact for read-level access.

Return imports without leaking config to read-only members. Gate by role and null-out config when needed.

Apply:

-      return db.import.findMany({
-        where: {
-          projectId: input.projectId,
-        },
-        orderBy: {
-          createdAt: 'desc',
-        },
-      });
+      const rows = await db.import.findMany({
+        where: { projectId: input.projectId },
+        orderBy: { createdAt: 'desc' },
+      });
+      const canSeeSecrets =
+        typeof access !== 'boolean' && access.level !== 'read';
+      return canSeeSecrets ? rows : rows.map((r) => ({ ...r, config: null }));

55-56: Sensitive config exposure in get — redact for read-level access.

Avoid returning secrets to read-only members.

Apply:

-      return importRecord;
+      const canSeeSecrets =
+        typeof access !== 'boolean' && access.level !== 'read';
+      return canSeeSecrets ? importRecord : { ...importRecord, config: null };

81-99: Enqueue not guarded — pending record can orphan without a job.

Wrap queue add in try/catch, use stable jobId, attempts, and mark DB failed on error.

Apply:

-      // Add job to queue
-      const job = await importQueue.add('import', {
-        type: 'import',
-        payload: {
-          importId: importRecord.id,
-        },
-      });
-
-      // Update import record with job ID
-      await db.import.update({
-        where: { id: importRecord.id },
-        data: { jobId: job.id },
-      });
+      // Add job to queue
+      try {
+        const job = await importQueue.add(
+          'import',
+          { type: 'import', payload: { importId: importRecord.id } },
+          { jobId: importRecord.id, attempts: 3, backoff: { type: 'exponential', delay: 30_000 } },
+        );
+        await db.import.update({
+          where: { id: importRecord.id },
+          data: { jobId: job.id },
+        });
+      } catch (err) {
+        await db.import.update({
+          where: { id: importRecord.id },
+          data: {
+            status: 'failed',
+            errorMessage: err instanceof Error ? err.message : 'Queue enqueue failed',
+          },
+        });
+        throw err;
+      }

160-176: Retry enqueue not guarded — can set 'pending' without job.

Same reliability issue: wrap in try/catch and only update status on success.

Apply:

-      // Add new job to queue
-      const job = await importQueue.add('import', {
-        type: 'import',
-        payload: {
-          importId: importRecord.id,
-        },
-      });
-
-      // Update import record
-      return db.import.update({
-        where: { id: importRecord.id },
-        data: {
-          jobId: job.id,
-          status: 'pending',
-          errorMessage: null,
-        },
-      });
+      // Add new job to queue
+      try {
+        const job = await importQueue.add(
+          'import',
+          { type: 'import', payload: { importId: importRecord.id } },
+          { jobId: importRecord.id, attempts: 3, backoff: { type: 'exponential', delay: 30_000 } },
+        );
+        return db.import.update({
+          where: { id: importRecord.id },
+          data: { jobId: job.id, status: 'pending', errorMessage: null },
+        });
+      } catch (err) {
+        throw new Error(
+          'Failed to enqueue retry job: ' +
+            (err instanceof Error ? err.message : String(err)),
+        );
+      }
apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx (1)

265-276: Config tooltip exposes credentials — sanitize/redact before display.

Redact known secret keys and handle null config.

Apply:

+const redactConfig = (cfg: any) => {
+  if (!cfg) return null;
+  const secretKeys = new Set([
+    'apiKey','apiSecret','serviceAccount','serviceSecret','token','password',
+    'secret','privateKey','accessKey','secretAccessKey',
+  ]);
+  const walk = (obj: any): any =>
+    Object.fromEntries(
+      Object.entries(obj).map(([k, v]) => [
+        k,
+        v && typeof v === 'object' ? walk(v) : secretKeys.has(k) ? '***' : v,
+      ]),
+    );
+  return walk(cfg);
+};
...
                   <Tooltiper
                     content={
                       <pre className="font-mono text-sm leading-normal whitespace-pre-wrap break-words">
-                          {JSON.stringify(imp.config, null, 2)}
+                          {JSON.stringify(redactConfig(imp.config), null, 2)}
                       </pre>
                     }
                     tooltipClassName="max-w-xs"
                   >
                     <Badge>Config</Badge>
                   </Tooltiper>
apps/worker/src/jobs/import.ts (2)

129-137: Avoid logging full raw events (PII leak).

Log minimal reason/context only.

Apply:

-        ) {
-          jobLogger.warn('Skipping invalid event', { rawEvent });
+        ) {
+          jobLogger.warn('Skipping invalid event', {
+            provider: providerInstance.provider,
+            reason: 'validation_failed',
+          });
           continue;
         }

251-258: Sync final Postgres counters with ClickHouse progress for UI.

Update totalEvents/processedEvents on completion.

Apply:

     const finalProgress = await getImportProgress(importId);
@@
     jobLogger.info('Import job completed successfully', {
       totalEvents: finalProgress.totalEvents,
       insertedEvents: finalProgress.insertedEvents,
       status: finalProgress.status,
     });
+
+    await db.import.update({
+      where: { id: importId },
+      data: {
+        totalEvents: finalProgress.totalEvents,
+        processedEvents: finalProgress.insertedEvents,
+      },
+    });
packages/importer/src/providers/umami.ts (6)

1-15: Add SSRF guard before fetching remote files.

Validate scheme and resolve host to block private/loopback ranges.

Apply:

 import { randomUUID } from 'node:crypto';
 import { Readable } from 'node:stream';
 import { pipeline } from 'node:stream/promises';
 import { createBrotliDecompress, createGunzip } from 'node:zlib';
+import { lookup } from 'node:dns/promises';
+import { isIP } from 'node:net';
 import { isSameDomain, parsePath } from '@openpanel/common';
 import { generateDeviceId } from '@openpanel/common/server';
 import { getReferrerWithQuery, parseReferrer } from '@openpanel/common/server';
 import type { IClickhouseEvent } from '@openpanel/db';
 import type { ILogger } from '@openpanel/logger';
 import type { IUmamiImportConfig } from '@openpanel/validation';
 import { parse } from 'csv-parse';
-import { assocPath } from 'ramda';
 import { z } from 'zod';
 import { BaseImportProvider } from '../base-provider';
+
+async function assertRemoteUrlAllowed(target: string) {
+  const u = new URL(target);
+  if (!/^https?:$/i.test(u.protocol)) {
+    throw new Error('Only http/https URLs are allowed');
+  }
+  const addrs = await lookup(u.hostname, { all: true, verbatim: true });
+  const isPrivate = (addr: string) =>
+    addr === '127.0.0.1' ||
+    addr === '::1' ||
+    addr.startsWith('10.') ||
+    /^172\.(1[6-9]|2\d|3[0-1])\./.test(addr) ||
+    addr.startsWith('192.168.') ||
+    addr.startsWith('fc') ||
+    addr.startsWith('fd') ||
+    addr.startsWith('fe80:');
+  for (const { address } of addrs) {
+    if (isIP(address) && isPrivate(address)) {
+      throw new Error('Blocked private address');
+    }
+  }
+}

79-87: Call SSRF guard before fetch.

Block unsafe targets early.

Apply:

   private async *parseRemoteFile(
     url: string,
@@
   ): AsyncGenerator<UmamiRawEvent, void, unknown> {
     const { signal, maxBytes, maxRows } = opts;
     const controller = new AbortController();
@@
-    const res = await fetch(url, { signal: controller.signal });
+    await assertRemoteUrlAllowed(url);
+    const res = await fetch(url, { signal: controller.signal });

Also applies to: 97-103


213-221: Device ID fallback is invalid; use stable available identifiers.

Stop passing visit_id as ip/ua.

Apply:

-    const deviceId =
-      rawEvent.visit_id ||
-      generateDeviceId({
-        ip: rawEvent.visit_id!,
-        ua: rawEvent.visit_id!,
-        origin: projectId,
-        salt: 'xxx',
-      });
+    const deviceId =
+      rawEvent.visit_id ||
+      rawEvent.distinct_id ||
+      rawEvent.session_id ||
+      rawEvent.event_id ||
+      randomUUID();

224-229: Fix URL construction (require hostname, prefix query with '?').

Avoid invalid URLs.

Apply:

-    const url = rawEvent.url_path
-      ? `https://${[rawEvent.hostname, rawEvent.url_path, rawEvent.url_query]
-          .filter(Boolean)
-          .join('')}`
-      : '';
+    let url = '';
+    if (rawEvent.hostname && rawEvent.url_path) {
+      const q = rawEvent.url_query
+        ? rawEvent.url_query.startsWith('?')
+          ? rawEvent.url_query
+          : `?${rawEvent.url_query}`
+        : '';
+      url = `https://${rawEvent.hostname}${rawEvent.url_path}${q}`;
+    }

231-234: Include referrer_query in referrer URL.

Preserve query parameters.

Apply:

-    const referrerUrl = rawEvent.referrer_domain
-      ? `https://${rawEvent.referrer_domain}${rawEvent.referrer_path || ''}`
-      : '';
+    const refQuery = rawEvent.referrer_query
+      ? rawEvent.referrer_query.startsWith('?')
+        ? rawEvent.referrer_query
+        : `?${rawEvent.referrer_query}`
+      : '';
+    const referrerUrl = rawEvent.referrer_domain
+      ? `https://${rawEvent.referrer_domain}${rawEvent.referrer_path || ''}${refQuery}`
+      : '';

257-297: Flatten properties to primitives; ClickHouse rejects nested objects.

Replace nested __query and assocPath usage with flat primitive keys.

Apply:

-    let properties: Record<string, any> = {};
-
-    if (query) {
-      properties.__query = query;
-    }
+    const properties: Record<
+      string,
+      string | number | boolean | null | undefined
+    > = {};
+    if (query) {
+      for (const [k, v] of Object.entries(query)) {
+        properties[`query.${k}`] = v as string;
+      }
+    }
@@
-    if (rawEvent.utm_source)
-      properties = assocPath(
-        ['__query', 'utm_source'],
-        rawEvent.utm_source,
-        properties,
-      );
-    if (rawEvent.utm_medium)
-      properties = assocPath(
-        ['__query', 'utm_medium'],
-        rawEvent.utm_medium,
-        properties,
-      );
-    if (rawEvent.utm_campaign)
-      properties = assocPath(
-        ['__query', 'utm_campaign'],
-        rawEvent.utm_campaign,
-        properties,
-      );
-    if (rawEvent.utm_content)
-      properties = assocPath(
-        ['__query', 'utm_content'],
-        rawEvent.utm_content,
-        properties,
-      );
-    if (rawEvent.utm_term)
-      properties = assocPath(
-        ['__query', 'utm_term'],
-        rawEvent.utm_term,
-        properties,
-      );
+    if (rawEvent.utm_source) properties.utm_source = rawEvent.utm_source;
+    if (rawEvent.utm_medium) properties.utm_medium = rawEvent.utm_medium;
+    if (rawEvent.utm_campaign) properties.utm_campaign = rawEvent.utm_campaign;
+    if (rawEvent.utm_content) properties.utm_content = rawEvent.utm_content;
+    if (rawEvent.utm_term) properties.utm_term = rawEvent.utm_term;
apps/start/src/modals/add-import.tsx (1)

245-249: Avoid leaking upstream error details to end users.

Use a generic error toast; log details server-side.

Apply:

-      onError: (error) => {
-        toast.error('Import failed', {
-          description: error.message,
-        });
-      },
+      onError: () => {
+        toast.error('Import failed', {
+          description: 'We could not start the import. Please try again later.',
+        });
+      },
packages/importer/src/providers/mixpanel.ts (1)

85-91: Sensitive credentials logged.

Line 90 logs the serviceAccount credential, which can aid attackers in credential stuffing or enumeration attacks.
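A minimal redaction helper along these lines could be applied before the config reaches the logger. This is a sketch; `SECRET_KEYS` and the field names are assumptions, not the provider's actual config shape:

```typescript
// Sketch: strip secret fields from an object before logging it.
// SECRET_KEYS is an assumed list; extend it to match the real config shape.
const SECRET_KEYS = ['serviceAccount', 'serviceSecret', 'apiKey', 'password'];

function redactSecrets(obj: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(obj)) {
    out[key] = SECRET_KEYS.includes(key) ? '[redacted]' : value;
  }
  return out;
}

// Usage: logger.info('Starting Mixpanel import', redactSecrets(config));
```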

packages/db/src/services/import.service.ts (4)

256-256: Risk of timestamp collisions with existing events.

Lines 256 and 293 adjust timestamps by exactly ±1000ms (1 second), which could create timestamp conflicts with real events at those exact timestamps, leading to non-deterministic event ordering.

Also applies to: 293-293
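One alternative to fixed ±1s offsets is a deterministic secondary sort key, so equal timestamps always order the same way. A sketch, with an assumed event shape:

```typescript
// Sketch: order by timestamp first, then by a unique id as a stable
// tie-breaker, so events sharing a timestamp sort deterministically.
interface TimedEvent {
  createdAt: number; // epoch ms
  id: string;        // unique event id, assumed available
}

function sortDeterministically(events: TimedEvent[]): TimedEvent[] {
  return [...events].sort(
    (a, b) => a.createdAt - b.createdAt || a.id.localeCompare(b.id),
  );
}
```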


505-506: Inconsistent parameter handling.

Lines 505-506 use sqlstring.escape() for inline parameter substitution, while the rest of the codebase uses ClickHouse's query_params mechanism. This inconsistency reduces maintainability.
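For reference, @clickhouse/client supports named `{name:Type}` placeholders bound via `query_params`, which avoids manual escaping entirely. A sketch; the table and column names here are assumptions:

```typescript
// Sketch of a parameterized ClickHouse query. Placeholders are typed and
// bound server-side instead of being escaped into the SQL string.
const countQuery = `
  SELECT count() AS c
  FROM events_imports
  WHERE import_id = {importId:String}
    AND created_at >= {from:DateTime64(3)}
`;

// await ch.query({ query: countQuery, query_params: { importId, from } });
```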


559-559: Use deterministic status aggregation instead of any(import_status).

Line 559 uses any(import_status) which returns an arbitrary status from the group, making the result non-deterministic when events have mixed statuses.
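A deterministic alternative is `argMax`, which picks the status from the most recent row per group. A sketch; the column names are assumptions:

```typescript
// Sketch: argMax(import_status, created_at) returns the status of the row
// with the greatest created_at in each group, which is stable across runs.
const statusQuery = `
  SELECT
    import_id,
    argMax(import_status, created_at) AS import_status
  FROM events_imports
  GROUP BY import_id
`;
```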


603-603: Validate status value instead of using type assertion.

Line 603 uses a type assertion which bypasses type safety. If ClickHouse returns an unexpected status value, the type system won't catch it.
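A narrowing function at the boundary catches unexpected values at runtime instead of silently asserting them. A sketch; the status names are assumptions standing in for the real `ImportStatus` union:

```typescript
// Sketch: validate the raw string from ClickHouse before treating it as
// an ImportStatus, failing loudly on anything unexpected.
const IMPORT_STATUSES = ['pending', 'processing', 'completed', 'failed'] as const;
type ImportStatus = (typeof IMPORT_STATUSES)[number];

function parseImportStatus(value: string): ImportStatus {
  if ((IMPORT_STATUSES as readonly string[]).includes(value)) {
    return value as ImportStatus;
  }
  throw new Error(`Unexpected import status from ClickHouse: ${value}`);
}
```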

🧹 Nitpick comments (15)
packages/common/server/parser-user-agent.ts (1)

84-112: Consider reducing code duplication.

The pattern typeof overrides?.__x === 'string' && overrides?.__x ? overrides?.__x : res.x is repeated seven times for different fields. This increases maintenance overhead—if the override logic needs adjustment, you'd need to update it in seven places.

Consider extracting this logic into a helper function:

+function getOverride<T>(
+  override: unknown,
+  fallback: T | undefined
+): T | undefined | string {
+  return typeof override === 'string' && override !== '' ? override : fallback;
+}
+
 export function parseUserAgent(
   ua?: string | null,
   overrides?: Record<string, unknown>,
 ) {
   if (!ua) return parsedServerUa;
   const res = parse(ua);
 
   if (isServer(res)) {
     return parsedServerUa;
   }
 
   return {
-    os:
-      typeof overrides?.__os === 'string' && overrides?.__os
-        ? overrides?.__os
-        : res.os.name,
+    os: getOverride(overrides?.__os, res.os.name),
-    osVersion:
-      typeof overrides?.__osVersion === 'string' && overrides?.__osVersion
-        ? overrides?.__osVersion
-        : res.os.version,
+    osVersion: getOverride(overrides?.__osVersion, res.os.version),
-    browser:
-      typeof overrides?.__browser === 'string' && overrides?.__browser
-        ? overrides?.__browser
-        : res.browser.name,
+    browser: getOverride(overrides?.__browser, res.browser.name),
-    browserVersion:
-      typeof overrides?.__browserVersion === 'string' &&
-      overrides?.__browserVersion
-        ? overrides?.__browserVersion
-        : res.browser.version,
+    browserVersion: getOverride(overrides?.__browserVersion, res.browser.version),
-    device:
-      typeof overrides?.__device === 'string' && overrides?.__device
-        ? overrides?.__device
-        : res.device.type || getDevice(ua),
+    device: getOverride(overrides?.__device, res.device.type || getDevice(ua)),
-    brand:
-      typeof overrides?.__brand === 'string' && overrides?.__brand
-        ? overrides?.__brand
-        : res.device.vendor,
+    brand: getOverride(overrides?.__brand, res.device.vendor),
-    model:
-      typeof overrides?.__model === 'string' && overrides?.__model
-        ? overrides?.__model
-        : res.device.model,
+    model: getOverride(overrides?.__model, res.device.model),
     isServer: false,
   } as const;
 }
packages/db/src/buffers/base-buffer.ts (1)

92-126: Avoid lock expiry during long flushes (possible double processing).

If onFlush exceeds lockTimeout (60s), the Redis lock can expire mid-run and another worker may acquire it. Add a heartbeat to extend TTL only when the lockId still matches.

Apply this diff:

@@
   private async releaseLock(lockId: string): Promise<void> {
@@
   }
 
+  private async refreshLock(lockId: string): Promise<void> {
+    const script = `
+      if redis.call("get", KEYS[1]) == ARGV[1] then
+        return redis.call("expire", KEYS[1], ARGV[2])
+      else
+        return 0
+      end
+    `;
+    await getRedisCache().eval(
+      script,
+      1,
+      this.lockKey,
+      lockId,
+      this.lockTimeout.toString(),
+    );
+  }
+
   async tryFlush() {
@@
-    if (acquired === 'OK') {
+    if (acquired === 'OK') {
+      const intervalMs =
+        Math.max(10, Math.floor(this.lockTimeout / 2)) * 1000;
+      const heartbeat = setInterval(() => {
+        void this.refreshLock(lockId).catch(() => {});
+      }, intervalMs);
       try {
@@
       } catch (error) {
@@
       } finally {
+        clearInterval(heartbeat);
         await this.releaseLock(lockId);
@@
       }
     }
   }
packages/trpc/src/routers/auth.ts (1)

152-154: Use the correct error type for “User already exists”.

This is a conflict, not a not-found case. Prefer a 400/409-style error.

Apply this diff:

-import { TRPCAccessError, TRPCNotFoundError } from '../errors';
+import {
+  TRPCAccessError,
+  TRPCNotFoundError,
+  TRPCBadRequestError,
+} from '../errors';
@@
-      if (user) {
-        throw TRPCNotFoundError('User already exists');
-      }
+      if (user) {
+        throw TRPCBadRequestError('User already exists');
+      }

Also applies to: 31-31

packages/importer/src/providers/mixpanel.test.ts (1)

5-23: Granularity mismatch: daily vs monthly chunks.

Test expects daily chunks, but Mixpanel provider comments mention monthly chunking for reliability. Align expectation and implementation (or update the comment) to avoid future drift.
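If daily chunking is the intended behavior, the range splitter would look roughly like this (a sketch, not the provider's actual implementation; function name and chunk shape are assumptions):

```typescript
// Sketch: split an inclusive start / exclusive end date range into
// consecutive one-day chunks at UTC day boundaries.
function dailyChunks(from: string, to: string): Array<{ from: string; to: string }> {
  const chunks: Array<{ from: string; to: string }> = [];
  const cursor = new Date(`${from}T00:00:00Z`);
  const end = new Date(`${to}T00:00:00Z`);
  while (cursor < end) {
    const next = new Date(cursor.getTime() + 24 * 60 * 60 * 1000);
    chunks.push({
      from: cursor.toISOString().slice(0, 10),
      to: next.toISOString().slice(0, 10),
    });
    cursor.setTime(next.getTime());
  }
  return chunks;
}
```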

packages/importer/src/index.ts (1)

1-13: LGTM: coherent public surface.

Exports for Umami/Mixpanel and core types look consistent. Consider exporting BaseImportProvider if external providers are intended later.

apps/worker/src/boot-workers.ts (2)

61-64: Clamp concurrency to ≥1 to avoid invalid worker configs.

ENV may be "0" or non-numeric; guard with fallback.

-  concurrency: Number.parseInt(process.env.IMPORT_JOB_CONCURRENCY || '1', 10),
+  concurrency:
+    Math.max(1, Number.parseInt(process.env.IMPORT_JOB_CONCURRENCY ?? '', 10) || 1),
-  concurrency: Number.parseInt(process.env.EVENT_JOB_CONCURRENCY || '1', 10),
+  concurrency:
+    Math.max(1, Number.parseInt(process.env.EVENT_JOB_CONCURRENCY ?? '', 10) || 1),

Also applies to: 37-38
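The clamp can be factored into a small helper so both workers share it (function name is an assumption):

```typescript
// Sketch: parse a concurrency env var, falling back on non-numeric input
// and clamping to at least 1 so BullMQ never gets an invalid value.
function parseConcurrency(raw: string | undefined, fallback = 1): number {
  const parsed = Number.parseInt(raw ?? '', 10);
  return Number.isNaN(parsed) ? fallback : Math.max(1, parsed);
}

// Usage: concurrency: parseConcurrency(process.env.IMPORT_JOB_CONCURRENCY)
```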


137-143: Graceful shutdown should consider all active queues (include import).

Waiting only on the cron queue can let shutdown proceed while imports are still active. Wait for all key queues, or rely solely on worker.close.

-      await waitForQueueToEmpty(cronQueue);
+      // Wait for critical queues to drain before closing workers
+      await Promise.all([
+        waitForQueueToEmpty(cronQueue),
+        waitForQueueToEmpty(sessionsQueue),
+        waitForQueueToEmpty(notificationQueue),
+        waitForQueueToEmpty(miscQueue),
+        waitForQueueToEmpty(importQueue),
+      ]);
packages/queue/src/queues.ts (1)

168-181: Add retries/backoff for import jobs.

Imports are long-running and transiently fail (rate limits, net hiccups). Configure sane defaults.

 export const importQueue = new Queue<ImportQueuePayload>('import', {
   connection: getRedisQueue(),
   defaultJobOptions: {
     removeOnComplete: 10,
     removeOnFail: 50,
+    attempts: 3,
+    backoff: { type: 'exponential', delay: 5000 },
+    // prevent duplicates if enqueued idempotently from API
+    // jobId can be importId if desired at call-site
   },
 });
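With the suggested options, BullMQ's built-in exponential backoff grows the retry delay as roughly `delay * 2^(attemptsMade - 1)`; with `delay: 5000` the waits are 5s, 10s, 20s. A sketch of that formula:

```typescript
// Sketch of BullMQ's exponential backoff curve (delay in ms doubles per
// attempt). This mirrors the documented formula; it is not queue code.
function exponentialBackoff(baseDelayMs: number, attemptsMade: number): number {
  return baseDelayMs * 2 ** (attemptsMade - 1);
}
```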
packages/db/prisma/schema.prisma (1)

477-496: Import model polish: set defaults and add indexes; keep currentStep nullable.

  • Set status default to 'pending' to match lifecycle.
  • Add indexes for common lookups (by projectId, status, recency).
  • Keep currentStep optional; worker logic uses NULL for "new import".

Apply:

 model Import {
   id              String       @id @default(dbgenerated("gen_random_uuid()")) @db.Uuid
   projectId       String
   project         Project      @relation(fields: [projectId], references: [id], onDelete: Cascade)
   jobId           String? // BullMQ job ID
-  status          ImportStatus
+  status          ImportStatus @default(pending)
   statusMessage   String? // Human-readable current step like "Importing events (Feb 2025)", "Generating session IDs"
   errorMessage    String?
   /// [IPrismaImportConfig]
   config          Json
   totalEvents     Int          @default(0)
   processedEvents Int          @default(0)
-  currentStep     String?
+  currentStep     String?
   currentBatch    String? // String date 2020-01-01
   createdAt       DateTime     @default(now())
   completedAt     DateTime?
   updatedAt       DateTime     @default(now()) @updatedAt

   @@map("imports")
+  @@index([projectId, createdAt(sort: Desc)])
+  @@index([status])
+  @@index([projectId, status])
 }
packages/trpc/src/routers/import.ts (2)

121-126: Make job removal resilient.

Wrap job.remove() in try/catch so delete isn’t blocked by queue errors; optionally skip removal if status is processing.

Apply:

-      if (importRecord.jobId) {
-        const job = await importQueue.getJob(importRecord.jobId);
-        if (job) {
-          await job.remove();
-        }
-      }
+      if (importRecord.jobId) {
+        try {
+          const job = await importQueue.getJob(importRecord.jobId);
+          if (job) await job.remove();
+        } catch (e) {
+          // Best-effort cleanup; continue with delete
+        }
+      }

24-31: Paginate list to avoid unbounded result sets.

Add take/skip or cursor-based pagination; default e.g. last 50, with client-driven paging.

Apply:

-      const rows = await db.import.findMany({
-        where: { projectId: input.projectId },
-        orderBy: { createdAt: 'desc' },
-      });
+      const rows = await db.import.findMany({
+        where: { projectId: input.projectId },
+        orderBy: { createdAt: 'desc' },
+        take: 50,
+      });
apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx (2)

206-213: Null-safe provider/type display to work with server-side redaction.

Avoid crashes when config is null.

Apply:

-                      <div>{imp.config.provider}</div>
+                      <div>{imp.config?.provider ?? 'unknown'}</div>
...
-                        {imp.config.type}
+                        {imp.config?.type ?? 'n/a'}

102-111: Don’t surface full internal error details in UI.

Truncate error text to reduce leak risk.

Apply:

-        <Tooltiper
-          content={errorMessage}
+        <Tooltiper
+          content={errorMessage ? String(errorMessage).slice(0, 300) : 'Error'}
           tooltipClassName="max-w-xs break-words"
         >
apps/worker/src/jobs/import.ts (1)

201-207: Use job logger instead of console.log.

Align logging and ensure capture.

Apply:

-        console.log('Generating session IDs', { from });
+        jobLogger.info('Generating session IDs', { from });
packages/importer/src/providers/mixpanel.ts (1)

309-314: Address TODO or document why it's deferred.

The TODO comment indicates incomplete screen view detection logic. This heuristic-based approach may cause incorrect event name mapping for production data.

If this is a temporary workaround, please document when and how it should be removed. Otherwise, do you want me to help implement a more robust solution or open an issue to track this task?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b60dd0e and 1a11373.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (72)
  • .github/workflows/docker-build.yml (3 hunks)
  • apps/api/package.json (1 hunks)
  • apps/start/Dockerfile (2 hunks)
  • apps/start/package.json (2 hunks)
  • apps/start/src/components/integrations/integration-card.tsx (1 hunks)
  • apps/start/src/components/project-mapper.tsx (1 hunks)
  • apps/start/src/components/ui/badge.tsx (1 hunks)
  • apps/start/src/modals/add-import.tsx (1 hunks)
  • apps/start/src/modals/date-ranger-picker.tsx (1 hunks)
  • apps/start/src/modals/index.tsx (2 hunks)
  • apps/start/src/routeTree.gen.ts (11 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx (1 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.tsx (1 hunks)
  • apps/start/src/utils/math.ts (1 hunks)
  • apps/start/src/utils/slug.ts (1 hunks)
  • apps/worker/Dockerfile (2 hunks)
  • apps/worker/package.json (3 hunks)
  • apps/worker/src/boot-workers.ts (4 hunks)
  • apps/worker/src/index.ts (2 hunks)
  • apps/worker/src/jobs/events.incoming-event.ts (3 hunks)
  • apps/worker/src/jobs/events.incoming-events.test.ts (2 hunks)
  • apps/worker/src/jobs/import.ts (1 hunks)
  • package.json (1 hunks)
  • packages/cli/package.json (0 hunks)
  • packages/cli/src/cli.ts (0 hunks)
  • packages/cli/src/importer/importer.ts (0 hunks)
  • packages/cli/src/importer/index.ts (0 hunks)
  • packages/cli/tsconfig.json (0 hunks)
  • packages/cli/tsup.config.ts (0 hunks)
  • packages/common/package.json (1 hunks)
  • packages/common/scripts/get-referrers.ts (1 hunks)
  • packages/common/server/index.ts (1 hunks)
  • packages/common/server/parse-referrer.test.ts (5 hunks)
  • packages/common/server/parse-referrer.ts (4 hunks)
  • packages/common/server/parser-user-agent.ts (2 hunks)
  • packages/common/src/object.ts (1 hunks)
  • packages/db/code-migrations/5-add-imports-table.sql (1 hunks)
  • packages/db/code-migrations/5-add-imports-table.ts (1 hunks)
  • packages/db/index.ts (1 hunks)
  • packages/db/prisma/migrations/20251018205153_add_import_table/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251018214030_fix/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251022191315_add_status_message/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251028103556_import_batch_and_status/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251028150123_fix_imports_table/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251028150655_fix_imports_again/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251028152531_imports_again/migration.sql (1 hunks)
  • packages/db/prisma/schema.prisma (2 hunks)
  • packages/db/src/buffers/base-buffer.ts (1 hunks)
  • packages/db/src/buffers/event-buffer.test.ts (8 hunks)
  • packages/db/src/clickhouse/client.ts (2 hunks)
  • packages/db/src/services/event.service.ts (1 hunks)
  • packages/db/src/services/import.service.ts (1 hunks)
  • packages/db/src/services/session.service.ts (1 hunks)
  • packages/db/src/types.ts (2 hunks)
  • packages/importer/package.json (1 hunks)
  • packages/importer/src/base-provider.ts (1 hunks)
  • packages/importer/src/index.ts (1 hunks)
  • packages/importer/src/providers/metadata.ts (1 hunks)
  • packages/importer/src/providers/mixpanel.test.ts (1 hunks)
  • packages/importer/src/providers/mixpanel.ts (1 hunks)
  • packages/importer/src/providers/umami.ts (1 hunks)
  • packages/importer/src/types.ts (1 hunks)
  • packages/importer/tsconfig.json (1 hunks)
  • packages/importer/vitest.config.ts (1 hunks)
  • packages/queue/src/queues.ts (1 hunks)
  • packages/redis/package.json (1 hunks)
  • packages/trpc/package.json (1 hunks)
  • packages/trpc/src/root.ts (2 hunks)
  • packages/trpc/src/routers/auth.ts (1 hunks)
  • packages/trpc/src/routers/import.ts (1 hunks)
  • packages/trpc/src/routers/organization.ts (1 hunks)
  • packages/validation/src/index.ts (1 hunks)
💤 Files with no reviewable changes (6)
  • packages/cli/src/cli.ts
  • packages/cli/src/importer/importer.ts
  • packages/cli/tsconfig.json
  • packages/cli/package.json
  • packages/cli/tsup.config.ts
  • packages/cli/src/importer/index.ts
🚧 Files skipped from review as they are similar to previous changes (26)
  • apps/start/src/modals/date-ranger-picker.tsx
  • packages/common/server/index.ts
  • apps/api/package.json
  • apps/start/src/modals/index.tsx
  • packages/db/src/services/event.service.ts
  • packages/trpc/package.json
  • packages/db/src/types.ts
  • packages/common/server/parse-referrer.test.ts
  • apps/start/src/components/ui/badge.tsx
  • apps/start/src/components/integrations/integration-card.tsx
  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.tsx
  • packages/db/src/buffers/event-buffer.test.ts
  • packages/common/package.json
  • packages/importer/tsconfig.json
  • packages/db/src/services/session.service.ts
  • packages/db/prisma/migrations/20251018205153_add_import_table/migration.sql
  • packages/db/prisma/migrations/20251028103556_import_batch_and_status/migration.sql
  • packages/common/src/object.ts
  • apps/start/package.json
  • apps/start/src/routeTree.gen.ts
  • packages/db/src/clickhouse/client.ts
  • packages/db/prisma/migrations/20251028152531_imports_again/migration.sql
  • packages/importer/src/base-provider.ts
  • packages/importer/package.json
  • package.json
  • packages/trpc/src/root.ts
🧰 Additional context used
📓 Path-based instructions (1)
apps/start/**/*.{ts,tsx}

📄 CodeRabbit inference engine (apps/start/.cursorrules)

apps/start/**/*.{ts,tsx}: Instrument server functions created with createServerFn by wrapping their implementation in Sentry.startSpan and importing Sentry from '@sentry/tanstackstart-react'
When using Sentry in the TanStack Start React app, import it as: import * as Sentry from '@sentry/tanstackstart-react'

Files:

  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx
  • apps/start/src/utils/slug.ts
  • apps/start/src/components/project-mapper.tsx
  • apps/start/src/modals/add-import.tsx
  • apps/start/src/utils/math.ts
🧬 Code graph analysis (15)
apps/worker/src/boot-workers.ts (2)
packages/queue/src/queues.ts (1)
  • importQueue (175-181)
apps/worker/src/jobs/import.ts (1)
  • importJob (25-281)
packages/db/code-migrations/5-add-imports-table.ts (2)
packages/db/code-migrations/helpers.ts (1)
  • getIsCluster (17-24)
packages/db/src/clickhouse/migration.ts (2)
  • createTable (68-116)
  • runClickhouseMigrationCommands (322-454)
packages/queue/src/queues.ts (1)
packages/redis/redis.ts (1)
  • getRedisQueue (93-105)
apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx (4)
apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.tsx (1)
  • Route (12-34)
packages/importer/src/providers/metadata.ts (1)
  • IMPORT_PROVIDERS (13-30)
apps/public/components/ui/tooltip.tsx (1)
  • Tooltiper (36-51)
apps/start/src/components/integrations/integration-card.tsx (3)
  • IntegrationCard (91-112)
  • IntegrationCardLogoImage (50-69)
  • IntegrationCardFooter (3-15)
packages/trpc/src/routers/import.ts (6)
packages/trpc/src/trpc.ts (2)
  • createTRPCRouter (138-138)
  • protectedProcedure (161-164)
packages/trpc/src/access.ts (1)
  • getProjectAccessCached (4-4)
packages/trpc/src/errors.ts (1)
  • TRPCAccessError (3-7)
packages/db/src/prisma-client.ts (1)
  • db (270-270)
packages/validation/src/index.ts (1)
  • zCreateImport (478-486)
packages/queue/src/queues.ts (1)
  • importQueue (175-181)
apps/start/src/components/project-mapper.tsx (2)
packages/db/src/services/project.service.ts (1)
  • IServiceProject (7-7)
apps/start/src/components/ui/select.tsx (5)
  • Select (173-173)
  • SelectTrigger (181-181)
  • SelectValue (182-182)
  • SelectContent (174-174)
  • SelectItem (176-176)
packages/importer/vitest.config.ts (1)
vitest.shared.ts (1)
  • getSharedVitestConfig (4-27)
apps/worker/src/jobs/import.ts (6)
packages/queue/src/queues.ts (1)
  • ImportQueuePayload (168-173)
packages/db/src/prisma-client.ts (1)
  • db (270-270)
packages/db/src/services/import.service.ts (10)
  • ImportSteps (671-671)
  • updateImportStatus (673-736)
  • getImportDateBounds (610-634)
  • insertImportBatch (28-55)
  • generateSessionIds (62-98)
  • createSessionsStartEndEvents (106-316)
  • moveImportsToProduction (322-426)
  • backfillSessionsToProduction (428-521)
  • markImportComplete (526-545)
  • getImportProgress (550-605)
packages/db/src/clickhouse/client.ts (2)
  • formatClickhouseDate (208-219)
  • error (40-45)
packages/db/src/services/event.service.ts (1)
  • IClickhouseEvent (68-102)
packages/logger/index.ts (1)
  • ILogger (6-6)
packages/importer/src/types.ts (2)
packages/db/src/services/event.service.ts (1)
  • IServiceImportedEventPayload (144-147)
packages/db/src/services/import.service.ts (2)
  • ImportProgress (18-23)
  • ImportStageResult (12-16)
apps/worker/src/index.ts (1)
packages/queue/src/queues.ts (1)
  • importQueue (175-181)
packages/importer/src/providers/umami.ts (5)
packages/validation/src/index.ts (1)
  • IUmamiImportConfig (456-456)
packages/logger/index.ts (1)
  • ILogger (6-6)
packages/db/src/services/event.service.ts (2)
  • IClickhouseEvent (68-102)
  • query (701-847)
packages/common/src/url.ts (2)
  • parsePath (11-49)
  • isSameDomain (51-63)
packages/common/server/parse-referrer.ts (2)
  • parseReferrer (17-26)
  • getReferrerWithQuery (28-66)
apps/start/src/modals/add-import.tsx (3)
packages/validation/src/index.ts (5)
  • zUmamiImportConfig (452-454)
  • zMixpanelImportConfig (461-470)
  • IUmamiImportConfig (456-456)
  • IMixpanelImportConfig (471-471)
  • IImportConfig (473-476)
apps/start/src/components/project-mapper.tsx (1)
  • ProjectMapper (33-116)
apps/public/lib/utils.ts (1)
  • cn (4-6)
packages/importer/src/providers/mixpanel.test.ts (1)
packages/importer/src/providers/mixpanel.ts (1)
  • MixpanelProvider (18-441)
packages/db/src/services/import.service.ts (4)
packages/db/src/services/event.service.ts (1)
  • IClickhouseEvent (68-102)
packages/db/src/clickhouse/client.ts (4)
  • ch (131-161)
  • TABLE_NAMES (48-60)
  • convertClickhouseDateToJs (238-240)
  • formatClickhouseDate (208-219)
packages/logger/index.ts (1)
  • ILogger (6-6)
packages/db/src/prisma-client.ts (1)
  • db (270-270)
packages/importer/src/providers/mixpanel.ts (7)
packages/validation/src/index.ts (1)
  • IMixpanelImportConfig (471-471)
packages/logger/index.ts (1)
  • ILogger (6-6)
packages/db/src/services/event.service.ts (2)
  • IClickhouseEvent (68-102)
  • query (701-847)
packages/common/src/url.ts (2)
  • parsePath (11-49)
  • isSameDomain (51-63)
packages/common/server/parse-referrer.ts (2)
  • parseReferrer (17-26)
  • getReferrerWithQuery (28-66)
packages/common/server/parser-user-agent.ts (2)
  • parseUserAgent (72-115)
  • UserAgentInfo (71-71)
packages/common/src/object.ts (2)
  • toDots (3-36)
  • strip (48-48)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-and-push-dashboard
  • GitHub Check: build-and-push-worker
  • GitHub Check: build-and-push-api
🔇 Additional comments (29)
.github/workflows/docker-build.yml (1)

149-149: Guard tag creation to main branch only—LGTM.

The conditional guards on all three tag-creation steps (if: github.ref == 'refs/heads/main') correctly restrict tag creation to the main branch. This is good practice for release/deployment management, preventing accidental tags from feature branches while allowing Docker images to be built for all branches. The syntax and consistency across all three jobs are correct.

Also applies to: 210-210, 271-271

packages/common/server/parser-user-agent.ts (1)

71-71: Good addition of type export.

Exporting the return type as UserAgentInfo is a best practice that improves type safety and developer experience for consumers of this function.

packages/redis/package.json (1)

11-11: Verify the intent behind pinning ioredis to an exact version.

Version 5.8.2 is confirmed as valid, latest, and published by the official Redis Labs maintainer. However, this is the only production dependency in the package.json using exact pinning—all others use caret constraints (^) or monorepo references (workspace:*, catalog:). Ensure this pinning is intentional for your queue/job processing stability, rather than an accidental divergence from project conventions. If stability is not a critical concern here, consider aligning with the project's versioning strategy.

packages/db/src/buffers/base-buffer.ts (1)

1-1: Import-path update is consistent with the monorepo re-exports.

For safety, run the same re-export check script shared in organization.ts to ensure no accidental duplicate exports.

packages/trpc/src/routers/auth.ts (1)

15-15: Import-path change LGTM.

packages/trpc/src/routers/organization.ts (1)

14-14: No issues found; import change is correct.

Re-export verified: generateSecureId is defined once in packages/common/server/id.ts and properly re-exported. The import path consolidation to @openpanel/common/server is consistent with existing patterns across the codebase (already used in auth.ts, db/buffers, and other files). Tree-shaking will work correctly.

apps/worker/src/jobs/events.incoming-events.test.ts (1)

102-102: LGTM! Test expectations correctly updated.

The referrerType expectations now correctly use an empty string instead of 'unknown', aligning with the updated behavior in packages/common/server/parse-referrer.ts where the default type for non-matches changed from 'unknown' to ''.

Also applies to: 210-210

apps/start/src/utils/math.ts (1)

1-1: LGTM! Re-export now uses package root.

The change from '@openpanel/common/src/math' to '@openpanel/common' aligns with the pattern of consuming the public API surface from package roots rather than internal paths.

packages/db/code-migrations/5-add-imports-table.sql (1)

1-43: LGTM! Table schema is well-structured.

The events_imports table schema provides comprehensive event tracking with import-specific metadata. The 7-day TTL on imported_at_meta provides automatic cleanup for staging data. Note that in clustered deployments, the TTL should be applied to the replicated table rather than the distributed table (see related comment on the .ts migration file).

apps/start/src/utils/slug.ts (1)

1-1: LGTM! Re-export now uses package root.

Consistent with the broader pattern of consuming the public API surface from '@openpanel/common' rather than internal paths.

packages/common/server/parse-referrer.ts (1)

1-3: LGTM! Referrer parsing logic improved.

The changes improve consistency and efficiency:

  • Default type changed from 'unknown' to empty string for unmatched referrers
  • Source is now lowercased once during computation rather than repeatedly during comparisons
  • Updated matching logic is cleaner and more maintainable

Test files correctly reflect these behavioral changes.

Also applies to: 23-23, 35-51, 63-63
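The lowercase-once pattern described above can be sketched as follows; the lookup table contents and function name are illustrative assumptions, not the actual referrer list:

```typescript
// Sketch: normalize the source a single time, then match against a lookup
// table; unmatched referrers now default to '' instead of 'unknown'.
const REFERRER_TYPES: Record<string, string> = {
  'google.com': 'search',
  'twitter.com': 'social',
};

function referrerType(source: string | undefined): string {
  if (!source) return '';
  return REFERRER_TYPES[source.toLowerCase()] ?? '';
}
```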

packages/db/prisma/migrations/20251022191315_add_status_message/migration.sql (1)

1-2: LGTM!

The migration correctly adds a nullable statusMessage column to track detailed import status information.

packages/db/prisma/migrations/20251028150655_fix_imports_again/migration.sql (1)

1-2: LGTM!

The migration correctly makes currentStep nullable, which is appropriate for representing imports that haven't started or tracking optional step progression.

apps/worker/package.json (2)

41-41: Verify intentional removal of version range flexibility.

The tsdown dependency was changed from ^0.14.2 to 0.14.2 (caret removed), which prevents automatic minor/patch updates. This appears unrelated to the importer feature.

Is this pinning intentional for reproducibility, or was it accidental? If unintentional, consider restoring the caret:

-    "tsdown": "0.14.2",
+    "tsdown": "^0.14.2",

22-22: LGTM!

Adding the @openpanel/importer dependency correctly wires the worker to process import jobs.

packages/db/index.ts (1)

25-25: LGTM!

The export correctly surfaces the new import service, consistent with other service exports in this file.

apps/worker/src/index.ts (1)

8-8: LGTM!

The importQueue is correctly imported and added to the Bull Board UI, enabling monitoring of import jobs alongside existing queues.

Also applies to: 42-42

apps/worker/Dockerfile (2)

38-38: LGTM: workspace caching for importer.

Including packages/importer/package.json improves pnpm cache hits.


85-87: LGTM: importer copied into runner image.

Runtime availability is ensured; ordering with logger/common is fine.

packages/db/prisma/migrations/20251018214030_fix/migration.sql (1)

13-13: Migration is safe—config validation requires the field.

The config field in zCreateImport is mandatory (a union of three required config types with no .optional()), and the single import creation path always provides it from validated input. Since the column remains NOT NULL, dropping the DEFAULT poses no risk of introducing NULLs.

apps/worker/src/jobs/events.incoming-event.ts (2)

8-12: Consolidation to @openpanel/common/server looks good.

Shared parser usage reduces drift.


120-123: Referrer precedence change: confirm downstream expectations.

Now referrer/url is only from parsed referrer (not UTM), and type prioritizes referrer over UTM. Validate dashboards/segments relying on previous precedence (tests adjusted, but product semantics matter).

packages/importer/src/providers/mixpanel.ts (1)

401-440: LGTM!

The property stripping and string coercion logic correctly handles edge cases including null/undefined values and nested objects. The regex pattern effectively filters Mixpanel-specific properties.
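In spirit, that pass looks like the sketch below; the regex and helper name are assumptions for illustration, not the provider's actual identifiers.

```typescript
// Drop Mixpanel-internal keys ($-prefixed or mp_*), skip null/undefined,
// and coerce everything else to strings (nested objects via JSON).
const MIXPANEL_INTERNAL = /^(\$|mp_)/;

function stripAndCoerce(
  props: Record<string, unknown>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(props)) {
    if (MIXPANEL_INTERNAL.test(key)) continue; // Mixpanel-specific property
    if (value === null || value === undefined) continue; // nothing to keep
    out[key] =
      typeof value === 'object' ? JSON.stringify(value) : String(value);
  }
  return out;
}
```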

packages/db/src/services/import.service.ts (6)

28-55: LGTM!

The batch insertion logic correctly handles empty arrays, adds required import metadata, and leverages ClickHouse's JSONEachRow format for efficient bulk inserts.


62-98: LGTM!

The deterministic session ID generation using MD5 hashing of device_id and 30-minute time windows ensures consistent results across imports. The use of mutations_sync: '2' properly ensures replication before proceeding.
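The scheme reduces to hashing the device id together with the 30-minute bucket the timestamp falls into. A minimal sketch, with assumed field names rather than the service's exact query:

```typescript
import { createHash } from 'node:crypto';

const WINDOW_MS = 30 * 60 * 1000; // 30-minute session window

// Same device + same window → same session id, so re-running an import
// reproduces identical sessions instead of minting duplicates.
function deterministicSessionId(deviceId: string, createdAt: Date): string {
  const windowStart = Math.floor(createdAt.getTime() / WINDOW_MS);
  return createHash('md5').update(`${deviceId}:${windowStart}`).digest('hex');
}
```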


322-426: LGTM!

The migration logic correctly handles session events with adjusted timestamps by including them via session_id lookup rather than strict date filtering. The progress headers configuration prevents connection timeouts for large imports.


526-545: LGTM!

The completion marking correctly uses mutations_sync: '2' to ensure all replicas are updated before the function returns, preventing race conditions.


610-634: LGTM!

The date bounds query correctly handles the optional fromCreatedAt parameter and provides sensible defaults for empty result sets.
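The bounds logic amounts to something like this in-memory sketch; the real implementation runs as a ClickHouse aggregate, and the names here are illustrative.

```typescript
interface DateBounds {
  from: Date;
  to: Date;
}

// Optional fromCreatedAt narrows the candidate rows; an empty result
// set falls back to "now" for both ends instead of returning nulls.
function getDateBounds(createdAts: Date[], fromCreatedAt?: Date): DateBounds {
  const filtered = fromCreatedAt
    ? createdAts.filter((d) => d >= fromCreatedAt)
    : createdAts;
  if (filtered.length === 0) {
    const now = new Date();
    return { from: now, to: now };
  }
  const times = filtered.map((d) => d.getTime());
  return {
    from: new Date(Math.min(...times)),
    to: new Date(Math.max(...times)),
  };
}
```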


673-736: Well-structured status update flow.

The step-based union type provides excellent type safety, ensuring each step can only include valid fields. The status message generation is clear and user-friendly.

Note: The Postgres update at the end (lines 732-735) occurs after the job progress update. If the database update fails, the job progress will be out of sync with the database state. Consider whether this ordering is intentional or if database-first updates would be more appropriate.
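If database-first ordering is preferred, the flow could look like this sketch. Both stores are stand-ins and the function names are hypothetical; the point is only the ordering guarantee.

```typescript
// Persist the authoritative status first, then mirror it to the job's
// progress, so the queue can never be "ahead" of the database.
type Step = 'loading' | 'sessions' | 'completed';

interface StatusUpdate {
  step: Step;
  processedEvents?: number;
  totalEvents?: number;
}

async function updateStatusDbFirst(
  db: { update: (u: StatusUpdate) => Promise<void> },
  job: { updateProgress: (u: StatusUpdate) => Promise<void> },
  update: StatusUpdate,
): Promise<void> {
  // If this throws, neither store has changed — no divergence.
  await db.update(update);
  // Job progress is best-effort; a failure here leaves the DB correct.
  await job.updateProgress(update).catch(() => undefined);
}
```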

COPY packages/trpc/package.json packages/trpc/
COPY packages/json/package.json packages/json/
COPY packages/common/package.json packages/common/
COPY packages/importer/package.json packages/importer/

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Ensure @openpanel/importer is either available at runtime or not linked.

You add packages/importer/package.json for install caching, but the runner image does not copy packages/importer. If apps/start does any value import (not type-only) of @openpanel/importer, PNPM’s workspace symlink will point to a missing path at runtime.

  • Option A: If only type imports are used, keep as-is.
  • Option B: If runtime imports exist, also copy the importer sources into runner:
+# In runner stage, alongside other packages
+COPY --from=build /app/packages/importer ./packages/importer

Run to confirm no value imports from apps/start:

Also applies to: 89-89


🏁 Script executed:

```shell
#!/bin/bash
# Find imports of @openpanel/importer in Start; flag non-type imports
rg -nP --type=ts --type=tsx -C2 "from ['\"]@openpanel/importer['\"]" apps/start \
  | grep -v "import type" || echo "No value imports found."
```

Length of output: 51


I need to re-run the verification with a corrected command, as the file type flag issue makes the result inconclusive.


---

🏁 Script executed:

```shell
#!/bin/bash
# Broader search: find any imports of @openpanel/importer in apps/start
# without file type restrictions to avoid flag parsing issues
echo "=== Searching for @openpanel/importer imports ==="
rg -n "@openpanel/importer" apps/start || echo "No matches found."

echo ""
echo "=== Directory structure of apps/start ==="
find apps/start -type f \( -name "*.ts" -o -name "*.tsx" -o -name "*.js" -o -name "*.jsx" \) | head -20
```

Length of output: 1250


Need to verify if this is the only import and search more thoroughly:


---

🏁 Script executed:

```shell
#!/bin/bash
# Find all imports/requires of @openpanel/importer in apps/start
echo "=== All references to @openpanel/importer ==="
rg -n "@openpanel/importer" apps/start -A2 -B2

echo ""
echo "=== Count of files with @openpanel/importer imports ==="
rg -l "@openpanel/importer" apps/start | wc -l
```

Length of output: 1172


Copy @openpanel/importer sources to runner stage to fix broken workspace symlink at runtime.

Apps/start has a value import (import { IMPORT_PROVIDERS } from '@openpanel/importer/providers') in apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx:22. The Dockerfile copies packages/importer/package.json for caching but omits the source files in the runner stage, causing the workspace symlink to point to a missing path at runtime.

Copy the importer sources in the runner stage:

+# In runner stage, alongside other packages
+COPY --from=build /app/packages/importer ./packages/importer

Also applies to: line 89

🤖 Prompt for AI Agents
In apps/start/Dockerfile around lines 32 and 89, the Dockerfile only copies
packages/importer/package.json which leaves the @openpanel/importer source files
absent in the runner stage and breaks the workspace symlink at runtime; fix by
copying the importer sources into the runner stage (either COPY
packages/importer/ packages/importer/ in the runner stage or COPY --from=builder
/app/packages/importer/ /app/packages/importer/ if the builder stage contains
the sources/artifacts) so the import path @openpanel/importer resolves at
runtime.

Comment on lines +35 to +47
const jobLogger = logger.child({
importId,
config: record.config,
});

⚠️ Potential issue | 🟠 Major

Logger context leaks config secrets.

Avoid attaching full config to logs; include non-sensitive fields only.

Apply:

-  const jobLogger = logger.child({
-    importId,
-    config: record.config,
-  });
+  const jobLogger = logger.child({
+    importId,
+    provider: record.config?.provider,
+  });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const jobLogger = logger.child({
importId,
config: record.config,
});
const jobLogger = logger.child({
importId,
provider: record.config?.provider,
});
🤖 Prompt for AI Agents
In apps/worker/src/jobs/import.ts around lines 35 to 38, the logger child is
attaching the entire record.config which may contain secrets; replace that with
a whitelist of non-sensitive fields (e.g., config.name, config.type,
config.source, config.region, config.bucket or other known safe metadata) or
explicitly redact sensitive keys before passing to logger.child — do not log the
full config object; update the jobLogger creation to only include importId plus
the selected safe config fields (or a redact/mask function) so secrets are never
emitted to logs.

Comment on lines 124 to 188
const eventBatch: any = [];
for await (const rawEvent of providerInstance.parseSource(
resumeLoadingFrom,
)) {
// Validate event
if (
!providerInstance.validate(
// @ts-expect-error
rawEvent,
)
) {
jobLogger.warn('Skipping invalid event', { rawEvent });
continue;
}

eventBatch.push(rawEvent);

// Process batch when it reaches the batch size
if (eventBatch.length >= BATCH_SIZE) {
jobLogger.info('Processing batch', { batchSize: eventBatch.length });

const transformedEvents: IClickhouseEvent[] = eventBatch.map(
(
// @ts-expect-error
event,
) => providerInstance!.transformEvent(event),
);

await insertImportBatch(transformedEvents, importId);

processedEvents += eventBatch.length;
eventBatch.length = 0;

const createdAt = new Date(transformedEvents[0]?.created_at || '')
.toISOString()
.split('T')[0];

await updateImportStatus(jobLogger, job, importId, {
step: 'loading',
batch: createdAt,
totalEvents,
processedEvents,
});
}

⚠️ Potential issue | 🔴 Critical

Harden batch transform; avoid crashes on empty/failed transforms.

  • Use unknown[] buffer.
  • Guard per-event transform with try/catch.
  • Safe createdAt derivation.

Apply:

-      const eventBatch: any = [];
+      const eventBatch: unknown[] = [];
@@
-        if (eventBatch.length >= BATCH_SIZE) {
+        if (eventBatch.length >= BATCH_SIZE) {
           jobLogger.info('Processing batch', { batchSize: eventBatch.length });
-
-          const transformedEvents: IClickhouseEvent[] = eventBatch.map(
-            (
-              // @ts-expect-error
-              event,
-            ) => providerInstance!.transformEvent(event),
-          );
+          const transformedEvents: IClickhouseEvent[] = [];
+          for (const ev of eventBatch) {
+            try {
+              transformedEvents.push(
+                providerInstance!.transformEvent(ev as any),
+              );
+            } catch (e) {
+              jobLogger.warn('Skipping event on transform failure', {
+                provider: providerInstance.provider,
+                reason: e instanceof Error ? e.message : String(e),
+              });
+            }
+          }
 
           await insertImportBatch(transformedEvents, importId);
 
           processedEvents += eventBatch.length;
           eventBatch.length = 0;
 
-          const createdAt = new Date(transformedEvents[0]?.created_at || '')
-            .toISOString()
-            .split('T')[0];
+          const firstCreatedAt = transformedEvents[0]?.created_at;
+          const createdAt = firstCreatedAt
+            ? firstCreatedAt.split('T')[0]!
+            : formatClickhouseDate(new Date(), true);
 
           await updateImportStatus(jobLogger, job, importId, {
             step: 'loading',
             batch: createdAt,
             totalEvents,
             processedEvents,
           });

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines 170 to 216
// Process remaining events in the last batch
if (eventBatch.length > 0) {
const transformedEvents = eventBatch.map(
(
// @ts-expect-error
event,
) => providerInstance!.transformEvent(event),
);

await insertImportBatch(transformedEvents, importId);

processedEvents += eventBatch.length;
eventBatch.length = 0;

const createdAt = new Date(transformedEvents[0]?.created_at || '')
.toISOString()
.split('T')[0];

await updateImportStatus(jobLogger, job, importId, {
step: 'loading',
batch: createdAt,
});
}

⚠️ Potential issue | 🟠 Major

Repeat: harden final partial batch and persist counters.

Mirror guarded transform and include totals in status update.

Apply:

-      if (eventBatch.length > 0) {
-        const transformedEvents = eventBatch.map(
-          (
-            // @ts-expect-error
-            event,
-          ) => providerInstance!.transformEvent(event),
-        );
+      if (eventBatch.length > 0) {
+        const transformedEvents: IClickhouseEvent[] = [];
+        for (const ev of eventBatch) {
+          try {
+            transformedEvents.push(providerInstance!.transformEvent(ev as any));
+          } catch (e) {
+            jobLogger.warn('Skipping event on transform failure', {
+              provider: providerInstance.provider,
+              reason: e instanceof Error ? e.message : String(e),
+            });
+          }
+        }
 
         await insertImportBatch(transformedEvents, importId);
 
         processedEvents += eventBatch.length;
         eventBatch.length = 0;
 
-        const createdAt = new Date(transformedEvents[0]?.created_at || '')
-          .toISOString()
-          .split('T')[0];
+        const firstCreatedAt = transformedEvents[0]?.created_at;
+        const createdAt = firstCreatedAt
+          ? firstCreatedAt.split('T')[0]!
+          : formatClickhouseDate(new Date(), true);
 
         await updateImportStatus(jobLogger, job, importId, {
           step: 'loading',
           batch: createdAt,
+          totalEvents,
+          processedEvents,
         });
       }
🤖 Prompt for AI Agents
In apps/worker/src/jobs/import.ts around lines 170 to 192, the final partial
batch is processed without the same guards and error handling used earlier and
the status update omits processed/total counters; update the block to perform
the guarded transform exactly like the earlier batch processing (check
providerInstance exists, handle per-event transform errors or wrap transform in
try/catch and skip/collect failures), ensure transformedEvents is computed only
if providerInstance is present and has valid created_at before deriving
createdAt, and include the processedEvents and total/expected total (e.g.,
totalEvents) in the call to updateImportStatus so counters are persisted
alongside step and batch.

Comment on lines +8 to +14
-- CreateEnum
CREATE TYPE "public"."ImportStatus" AS ENUM ('pending', 'processing', 'completed', 'failed');

-- AlterTable
ALTER TABLE "public"."imports" DROP COLUMN "status",
ADD COLUMN "status" "public"."ImportStatus" NOT NULL,
ALTER COLUMN "currentStep" SET NOT NULL;

⚠️ Potential issue | 🔴 Critical

Destructive migration: status drop/recreate will lose data; currentStep NOT NULL conflicts with code.

  • Map TEXT → enum safely; do not drop without backfill.
  • Keep currentStep nullable; worker import.ts uses null to detect new imports.

Safe migration example:

--- a/packages/db/prisma/migrations/20251028150123_fix_imports_table/migration.sql
+++ b/packages/db/prisma/migrations/20251028150123_fix_imports_table/migration.sql
 CREATE TYPE "public"."ImportStatus" AS ENUM ('pending', 'processing', 'completed', 'failed');

-ALTER TABLE "public"."imports" DROP COLUMN "status",
-ADD COLUMN     "status" "public"."ImportStatus" NOT NULL,
-ALTER COLUMN "currentStep" SET NOT NULL;
+-- 1) Add new enum column with default to preserve existing rows
+ALTER TABLE "public"."imports"
+  ADD COLUMN "status_v2" "public"."ImportStatus" NOT NULL DEFAULT 'pending';
+
+-- 2) Map old text values to enum; fall back to 'pending' for unknowns
+UPDATE "public"."imports"
+SET "status_v2" = CASE lower(coalesce("status", 'pending'))
+  WHEN 'pending' THEN 'pending'::"public"."ImportStatus"
+  WHEN 'processing' THEN 'processing'::"public"."ImportStatus"
+  WHEN 'completed' THEN 'completed'::"public"."ImportStatus"
+  WHEN 'failed' THEN 'failed'::"public"."ImportStatus"
+  ELSE 'pending'::"public"."ImportStatus"
+END;
+
+-- 3) Drop old column and rename
+ALTER TABLE "public"."imports" DROP COLUMN "status";
+ALTER TABLE "public"."imports" RENAME COLUMN "status_v2" TO "status";
+ALTER TABLE "public"."imports" ALTER COLUMN "status" DROP DEFAULT;
+
+-- 4) Keep currentStep nullable (code relies on NULL to denote new import)
+-- (Remove the SET NOT NULL)

Also, consider wrapping in a transaction if your migration runner allows it.

📝 Committable suggestion


Suggested change
-- CreateEnum
CREATE TYPE "public"."ImportStatus" AS ENUM ('pending', 'processing', 'completed', 'failed');
-- AlterTable
ALTER TABLE "public"."imports" DROP COLUMN "status",
ADD COLUMN "status" "public"."ImportStatus" NOT NULL,
ALTER COLUMN "currentStep" SET NOT NULL;
-- CreateEnum
CREATE TYPE "public"."ImportStatus" AS ENUM ('pending', 'processing', 'completed', 'failed');
-- AlterTable
-- 1) Add new enum column with default to preserve existing rows
ALTER TABLE "public"."imports"
ADD COLUMN "status_v2" "public"."ImportStatus" NOT NULL DEFAULT 'pending';
-- 2) Map old text values to enum; fall back to 'pending' for unknowns
UPDATE "public"."imports"
SET "status_v2" = CASE lower(coalesce("status", 'pending'))
WHEN 'pending' THEN 'pending'::"public"."ImportStatus"
WHEN 'processing' THEN 'processing'::"public"."ImportStatus"
WHEN 'completed' THEN 'completed'::"public"."ImportStatus"
WHEN 'failed' THEN 'failed'::"public"."ImportStatus"
ELSE 'pending'::"public"."ImportStatus"
END;
-- 3) Drop old column and rename
ALTER TABLE "public"."imports" DROP COLUMN "status";
ALTER TABLE "public"."imports" RENAME COLUMN "status_v2" TO "status";
ALTER TABLE "public"."imports" ALTER COLUMN "status" DROP DEFAULT;
-- 4) Keep currentStep nullable (code relies on NULL to denote new import)
-- (Remove the SET NOT NULL)
🤖 Prompt for AI Agents
In packages/db/prisma/migrations/20251028150123_fix_imports_table/migration.sql
around lines 8 to 14, the migration currently drops and recreates the "status"
column (which will lose data) and forces "currentStep" to NOT NULL (which
conflicts with import worker logic); instead, create the new enum type, add a
new nullable enum column (e.g. status_new), backfill status_new from the
existing text status using a safe mapping, then drop the old text column and
rename status_new to status (or use ALTER COLUMN TYPE after backfill), and leave
currentStep nullable (do not set NOT NULL). Wrap the whole sequence in a
transaction if your runner supports it and ensure any NOT NULL constraints are
only applied after backfilling.

const event = JSON.parse(line);
yield event;
} catch (error) {
console.warn('Failed to parse Mixpanel event:', line);

🛠️ Refactor suggestion | 🟠 Major

Use configured logger instead of console.warn.

Lines 134 and 146 use console.warn, bypassing the structured logger passed to the constructor. This creates inconsistent logging and prevents proper log aggregation and filtering.

Apply this diff to use the configured logger:

         for (const line of lines) {
           if (line.trim()) {
             try {
               const event = JSON.parse(line);
               yield event;
             } catch (error) {
-              console.warn('Failed to parse Mixpanel event:', line);
+              this.logger?.warn('Failed to parse Mixpanel event', { line, error });
             }
           }
         }
       if (buffer.trim()) {
         try {
           const event = JSON.parse(buffer);
           yield event;
         } catch (error) {
-          console.warn('Failed to parse final Mixpanel event:', buffer);
+          this.logger?.warn('Failed to parse final Mixpanel event', { buffer, error });
         }
       }

Also applies to: 146-146

🤖 Prompt for AI Agents
In packages/importer/src/providers/mixpanel.ts around lines 134 and 146, replace
the direct console.warn calls with the instance's configured logger (e.g.,
this.logger.warn or the logger variable passed into the constructor) so logging
uses the structured logger; update both occurrences to call the logger with the
same message and include the event/line and any parse error details, preserving
message clarity and log level.

Comment on lines +202 to +205
const userAgent = props.osVersion || '';
const uaInfo = this.isWebEvent(props.mp_lib)
? parseUserAgent(userAgent, props)
: this.parseServerDeviceInfo(props);

⚠️ Potential issue | 🔴 Critical

Incorrect user agent assignment.

Line 202 assigns props.osVersion to the userAgent variable, but OS version is not the same as the user agent string. The user agent should be extracted from a property like props.$user_agent or similar.

Verify the correct Mixpanel property name for the user agent string and apply the fix:

-    const userAgent = props.osVersion || '';
+    const userAgent = props.$user_agent || '';
     const uaInfo = this.isWebEvent(props.mp_lib)
       ? parseUserAgent(userAgent, props)
       : this.parseServerDeviceInfo(props);

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In packages/importer/src/providers/mixpanel.ts around lines 202 to 205, the code
incorrectly assigns props.osVersion to userAgent; replace this with the Mixpanel
user agent property (e.g., props.$user_agent or the correct property name from
Mixpanel events), fall back to an empty string or props.osVersion if user agent
is not present, and pass that userAgent into parseUserAgent; ensure the
isWebEvent branch uses the actual UA string and not the OS version.

Comment on lines +288 to +294
os: uaInfo.os || props.$os,
os_version: uaInfo.osVersion || props.$osVersion,
browser: uaInfo.browser || props.$browser,
browser_version:
uaInfo.browserVersion || props.$browserVersion
? String(props.$browser_version)
: '',

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Inconsistent property naming may cause data loss.

Lines 289 and 292-293 use props.$osVersion and props.$browser_version respectively, but Mixpanel typically uses underscores (e.g., $os_version). This inconsistency may result in missing data if the wrong property names are used.

Verify the correct Mixpanel property naming convention:


🌐 Web query:

What are the correct Mixpanel event property names for OS version and browser version?

💡 Result:

Use Mixpanel's default property names:

  • OS name: $os — OS version: $os_version. [1]
  • Browser name: $browser — Browser version: $browser_version. [1]

Source: Mixpanel Default/Property reference. [1]


Fix inconsistent Mixpanel property names to prevent data loss.

The code uses camelCase property names (props.$osVersion, props.$browserVersion) but Mixpanel's standard property names are $os_version and $browser_version (snake_case). Additionally, lines 292-294 check props.$browserVersion in the condition but access props.$browser_version in the ternary, creating a logic inconsistency. Update both lines 289 and 292 to use the correct snake_case property names to ensure data is properly captured.

🤖 Prompt for AI Agents
In packages/importer/src/providers/mixpanel.ts around lines 288 to 294, the
Mixpanel property names are inconsistent (using camelCase props.$osVersion and
props.$browserVersion) and the ternary condition checks a different prop than it
accesses, which can drop data; change references to use Mixpanel's snake_case
names (props.$os_version and props.$browser_version) and make the ternary check
and the value access use the same snake_case prop, converting to string as
needed (e.g., String(props.$browser_version)) so os_version and browser_version
are populated correctly.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (29)
packages/validation/src/index.ts (1)

478-486: This issue has already been flagged in a previous review.

The concern about provider and config mismatches remains unaddressed. Please refer to the previous review comment which provides a discriminated union solution to enforce that each provider literal is paired with its correct config shape.

packages/db/prisma/migrations/20251028152531_imports_again/migration.sql (1)

1-4: SQL migration is correct.

The migration correctly alters the currentBatch column to TEXT nullable without a default. The TypeScript type mismatch identified in the past review comment (lines 1-4) should be addressed separately in packages/importer/src/types.ts.

packages/db/prisma/migrations/20251018214030_fix/migration.sql (1)

10-13: Critical: Data migration missing before dropping columns.

As identified in the past review comment (lines 10-12), this migration drops provider, sourceLocation, and sourceType columns without first migrating their data to the config JSON field. This will cause data loss for existing import records and break the worker's createProvider function which expects config.provider.

packages/db/prisma/migrations/20251028150123_fix_imports_table/migration.sql (1)

8-14: Destructive migration: status drop/recreate will lose data; currentStep NOT NULL conflicts with code.

This migration drops and recreates the status column, which will lose existing data. Additionally, forcing currentStep to NOT NULL conflicts with the schema definition (line 489 in schema.prisma shows currentStep String?) and worker code that relies on null values to detect new imports.

The safe migration approach is to:

  1. Add a new enum column with a default value
  2. Backfill from the existing text column with proper mapping
  3. Drop the old column and rename
  4. Keep currentStep nullable

Refer to the detailed migration example in the existing past review comment.

apps/start/src/components/project-mapper.tsx (2)

23-31: Define a proper form type instead of using any.

The interface uses any for all generic type parameters, defeating TypeScript's type safety. Define an explicit form schema type to enable proper type checking and autocomplete.

Apply the diff suggested in the past review comment to add proper typing with a ProjectMapperFormData interface.


92-92: Remove the arbitrary 10-project limit.

Line 92 limits the dropdown to only the first 10 projects, preventing users with more projects from mapping to those beyond the limit. The Select component should display all available projects with scrolling.

Remove .slice(0, 10) to show all projects, or implement search/pagination if performance is a concern.
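One alternative to truncation is a search filter over the full list. A sketch, assuming the component can thread a `query` string from a search input; the actual UI uses the shared Select component.

```typescript
interface Project {
  id: string;
  name: string;
}

// Show all projects (scrollable) and narrow by name when a query is typed,
// instead of hard-truncating to the first 10 entries.
function filterProjects(projects: Project[], query: string): Project[] {
  const q = query.trim().toLowerCase();
  if (!q) return projects; // no query → full, scrollable list
  return projects.filter((p) => p.name.toLowerCase().includes(q));
}
```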

packages/importer/package.json (2)

5-5: Main entry point references source instead of build output.

The main field points to src/index.ts, but consumers will need the compiled JavaScript output. This should point to ./dist/index.js.

Update to:

-  "main": "src/index.ts",
+  "main": "./dist/index.js",

Also ensure "types" field points to the compiled declaration file.


21-22: Review CSV library usage and align Ramda versions.

Two concerns:

  1. Only csv-parse is listed (csv-parser is not present), so this may be fine
  2. ramda is at ^0.29.1 but @types/ramda is at ^0.31.1 - version mismatch may cause type errors

Update ramda to match its types:

-    "ramda": "^0.29.1",
+    "ramda": "^0.31.1",
packages/db/prisma/schema.prisma (1)

479-498: Inconsistency: currentStep nullable in schema but NOT NULL in migration.

Line 489 defines currentStep as String? (optional), but migration 20251028150123_fix_imports_table/migration.sql sets it to NOT NULL. This mismatch will cause runtime issues.

Align schema and migration: either make it required in schema (currentStep String) or remove the NOT NULL constraint from the migration. Based on the worker code needing null to detect new imports, keeping it nullable is recommended.

packages/db/code-migrations/5-add-imports-table.ts (1)

74-85: Fix __filename usage in ESM context.

The package is configured as ESM ("type": "module"), and __filename is undefined in ESM. Other migration files (2-accounts.ts, 3-init-ch.ts) already use the correct pattern.

Import and use fileURLToPath:

+import { fileURLToPath } from 'node:url';
+
+const __filename = fileURLToPath(import.meta.url);
+
 export async function up() {
packages/importer/src/providers/mixpanel.test.ts (1)

39-127: Test masks mapping bugs in provider (browser/os version).

The test expects browser_version: '' and os_version: undefined even though the rawEvent contains $browser_version: 135 and $os: 'Android'. This masks a bug in the provider's field mapping logic.

Tighten the test assertions to expect the actual values:

-      os_version: undefined,
-      browser_version: '',
+      os_version: 'Android',  // or appropriate parsed version
+      browser_version: '135',

Then fix the provider (packages/importer/src/providers/mixpanel.ts) to correctly map $browser_version and $os_version fields, handling both snake_case and camelCase variants.

packages/db/prisma/migrations/20251018205153_add_import_table/migration.sql (1)

1-3: Ensure gen_random_uuid() is available.

The migration uses gen_random_uuid() without ensuring the pgcrypto extension is enabled, which will cause the migration to fail on databases where pgcrypto isn't already available.

Add the extension check before the table creation:

+-- Enable required extensions
+CREATE EXTENSION IF NOT EXISTS pgcrypto;
+
 -- CreateTable
 CREATE TABLE "public"."imports" (
packages/importer/src/providers/mixpanel.ts (4)

128-147: Route parse errors through the provider logger

console.warn bypasses our structured logger and can disappear in production. Use this.logger?.warn (with the error and raw line) instead.

Apply this diff:

-              console.warn('Failed to parse Mixpanel event:', line);
+              this.logger?.warn('Failed to parse Mixpanel event', {
+                line,
+                error,
+              });
@@
-          console.warn('Failed to parse final Mixpanel event:', buffer);
+          this.logger?.warn('Failed to parse final Mixpanel event', {
+            buffer,
+            error,
+          });

85-91: Remove service account from logs

Logging serviceAccount discloses a credential identifier. Drop it (or mask it) from the metadata.

Apply this diff:

     this.logger?.info('Fetching events from Mixpanel', {
       url: `${url}?${params}`,
       from,
       to,
       projectId,
-      serviceAccount,
     });

201-205: Parse UA from $user_agent, not osVersion

props.osVersion is not a user-agent string, so every web event ends up with blank UA/device data. Prefer $user_agent (fallback to empty string) before feeding parseUserAgent.

Apply this diff:

-    const userAgent = props.osVersion || '';
+    const userAgent =
+      typeof props.$user_agent === 'string' && props.$user_agent
+        ? props.$user_agent
+        : '';

273-279: Fix Mixpanel property names and browser version fallback

Mixpanel uses snake_case ($os_version, $browser_version), and due to operator precedence the current ternary evaluates (uaInfo.browserVersion || props.$browserVersion) as its condition, so whenever either is truthy it stringifies props.$browser_version — yielding the string 'undefined' when that key is absent, and discarding the UA-derived value. Use the correct keys and direct fallbacks.

Apply this diff:

-      os: uaInfo.os || props.$os,
-      os_version: uaInfo.osVersion || props.$osVersion,
-      browser: uaInfo.browser || props.$browser,
-      browser_version:
-        uaInfo.browserVersion || props.$browserVersion
-          ? String(props.$browser_version)
-          : '',
+      os: uaInfo.os || props.$os || '',
+      os_version: uaInfo.osVersion || props.$os_version || '',
+      browser: uaInfo.browser || props.$browser || '',
+      browser_version:
+        uaInfo.browserVersion || props.$browser_version || '',
apps/worker/src/jobs/import.ts (5)

35-38: Redact import configuration in logs

record.config can carry API keys and secrets; logging it verbatim leaks credentials. Please log only non-sensitive identifiers such as provider (and explicitly redacted fields) before fanning out to child logs.

Apply this diff:

-  const jobLogger = logger.child({
-    importId,
-    config: record.config,
-  });
+  const jobLogger = logger.child({
+    importId,
+    provider: record.config?.provider,
+  });

134-142: Do not log raw provider events

Invalid events are currently dumped to the logger, exposing full payloads (PII, tokens, etc.). Emit only minimal context (provider + reason) so we stay compliant while still highlighting the failure.

Apply this diff:

-      if (
-        !providerInstance.validate(
-          // @ts-expect-error
-          rawEvent,
-        )
-      ) {
-        jobLogger.warn('Skipping invalid event', { rawEvent });
+      if (!providerInstance.validate(rawEvent as never)) {
+        jobLogger.warn('Skipping invalid event', {
+          provider: providerInstance.provider,
+          reason: 'validation_failed',
+        });
         continue;
       }

128-198: Harden batch transforms to avoid import aborts

A single transformEvent exception currently tears down the job, and we over-count processed rows even when transforms fail. Buffer the batch as unknown[], wrap each transform in try/catch, log failures, and increment processedEvents based on the successfully transformed rows only (both for full and tail batches).

Apply this diff:

-      const eventBatch: any = [];
+      const eventBatch: unknown[] = [];
@@
-          const transformedEvents: IClickhouseEvent[] = eventBatch.map(
-            (
-              // @ts-expect-error
-              event,
-            ) => providerInstance!.transformEvent(event),
-          );
+          const transformedEvents: IClickhouseEvent[] = [];
+          for (const raw of eventBatch) {
+            try {
+              transformedEvents.push(
+                providerInstance.transformEvent(
+                  raw as Parameters<typeof providerInstance.transformEvent>[0],
+                ),
+              );
+            } catch (error) {
+              jobLogger.warn('Skipping event on transform failure', {
+                provider: providerInstance.provider,
+                reason: error instanceof Error ? error.message : String(error),
+              });
+            }
+          }
+          if (transformedEvents.length === 0) {
+            eventBatch.length = 0;
+            continue;
+          }
 
           await insertImportBatch(transformedEvents, importId);
 
-          processedEvents += eventBatch.length;
+          processedEvents += transformedEvents.length;
@@
-      if (eventBatch.length > 0) {
-        const transformedEvents = eventBatch.map(
-          (
-            // @ts-expect-error
-            event,
-          ) => providerInstance!.transformEvent(event),
-        );
+      if (eventBatch.length > 0) {
+        const transformedEvents: IClickhouseEvent[] = [];
+        for (const raw of eventBatch) {
+          try {
+            transformedEvents.push(
+              providerInstance.transformEvent(
+                raw as Parameters<typeof providerInstance.transformEvent>[0],
+              ),
+            );
+          } catch (error) {
+            jobLogger.warn('Skipping event on transform failure', {
+              provider: providerInstance.provider,
+              reason: error instanceof Error ? error.message : String(error),
+            });
+          }
+        }
-        await insertImportBatch(transformedEvents, importId);
-
-        processedEvents += eventBatch.length;
+        if (transformedEvents.length > 0) {
+          await insertImportBatch(transformedEvents, importId);
+          processedEvents += transformedEvents.length;
+        }

194-197: Include counters in the final loading update

The tail-batch status update omits totalEvents and processedEvents, leaving Postgres stale for the UI. Pass the counters just like the earlier batches.

Apply this diff:

         await updateImportStatus(jobLogger, job, importId, {
           step: 'loading',
           batch: createdAt,
+          totalEvents,
+          processedEvents,
         });

250-269: Persist final totals after completion

After markImportComplete, Postgres still holds outdated processedEvents/totalEvents. Fetch the final ClickHouse progress and write those numbers back (either via updateImportStatus or a direct db.import.update) so the UI reflects 100 % completion.

Apply this diff:

-    await updateImportStatus(jobLogger, job, importId, {
-      step: 'completed',
-    });
+    await updateImportStatus(jobLogger, job, importId, {
+      step: 'completed',
+      totalEvents,
+      processedEvents,
+    });
@@
-    const finalProgress = await getImportProgress(importId);
+    const finalProgress = await getImportProgress(importId);
+    await db.import.update({
+      where: { id: importId },
+      data: {
+        totalEvents: finalProgress.totalEvents,
+        processedEvents: finalProgress.insertedEvents,
+      },
+    });
apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx (1)

265-275: Redact sensitive config before displaying

imp.config frequently contains API keys and secrets; dumping it straight into the tooltip leaks credentials. Sanitize (or redact) known secret fields before stringifying, and/or restrict the tooltip to privileged roles.

Apply this diff (or similar):

-                      content={
-                        <pre className="font-mono text-sm leading-normal whitespace-pre-wrap break-words">
-                          {JSON.stringify(imp.config, null, 2)}
-                        </pre>
-                      }
+                      content={
+                        <pre className="font-mono text-sm leading-normal whitespace-pre-wrap break-words">
+                          {JSON.stringify(redactImportConfig(imp.config), null, 2)}
+                        </pre>
+                      }

Add a helper such as:

function redactImportConfig(config: Record<string, unknown>) {
  const masked = { ...config };
  for (const key of ['apiKey', 'apiSecret', 'serviceAccount', 'serviceSecret', 'token', 'secret']) {
    if (masked[key]) masked[key] = '***';
  }
  return masked;
}
apps/start/src/modals/add-import.tsx (1)

246-248: Use a generic failure message

Surfacing error.message can leak upstream credentials or debug details. Replace it with a neutral user-facing string and keep the sensitive message server-side.

Apply this diff:

-      onError: (error) => {
-        toast.error('Import failed', {
-          description: error.message,
-        });
+      onError: () => {
+        toast.error('Import failed', {
+          description: 'We could not queue the import. Please try again or contact support.',
+        });
packages/importer/src/types.ts (1)

53-66: EventProperties allows nested objects, but ClickHouse properties is primitives-only.

EventProperties permits Record<string, unknown> (line 61) and defines __query as a nested object (line 62). However, IClickhouseEvent.properties expects Record<string, string | number | boolean | undefined | null>. Storing objects will break ingestion unless stringified.

Consider:

  • Constrain values to primitives: string | number | boolean | null | undefined
  • Or require JSON stringification for nested data (e.g., __queryJson: string)

Apply this diff to constrain to primitives:

 export interface EventProperties {
   [key: string]:
     | string
     | number
     | boolean
     | null
-    | undefined
-    | Record<string, unknown>;
-  __query?: Record<string, unknown>;
+    | undefined;
   __title?: string;
   __screen?: string;
   __language?: string;
 }
packages/db/src/services/import.service.ts (5)

1-12: Missing import for crypto.randomUUID().

Lines 272 and 309 call crypto.randomUUID(), but randomUUID is not imported from node:crypto.

Add the import:

 import type { ILogger } from '@openpanel/logger';
+import { randomUUID } from 'node:crypto';
 import sqlstring from 'sqlstring';

Then replace usage:

-      id: crypto.randomUUID(),
+      id: randomUUID(),

262-288: Risk of timestamp collisions with existing events.

Adjusting session boundaries by ±1 second (line 288: -1000ms, line 325: +500ms) could create timestamp conflicts if real events exist at those exact timestamps, leading to non-deterministic ordering.

Consider:

  1. Use sub-millisecond offsets (±1ms) to reduce collision probability
  2. Add an is_synthetic flag and use (created_at, is_synthetic) for ordering
  3. Use ClickHouse's native session reconstruction capabilities
-     created_at: adjustTimestamp(session.first_timestamp, -1000),
+     created_at: adjustTimestamp(session.first_timestamp, -1),
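Option 1 can be sketched as below; `adjustTimestamp` here is an assumed helper matching the signature used in the diff, not the project's implementation:

```typescript
// Sketch of the ±1 ms offset idea. `adjustTimestamp` is an assumed helper
// that shifts an ISO timestamp by deltaMs milliseconds.
function adjustTimestamp(timestamp: string, deltaMs: number): string {
  return new Date(new Date(timestamp).getTime() + deltaMs).toISOString();
}

// Synthetic session boundary events sit 1 ms outside the real event range,
// which keeps ordering stable while minimizing the chance of colliding
// with a genuine event timestamp.
const sessionStart = adjustTimestamp('2025-10-18T12:00:00.000Z', -1);
const sessionEnd = adjustTimestamp('2025-10-18T12:05:00.000Z', 1);
```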

553-554: Inconsistent parameter handling.

Lines 553-554 use sqlstring.escape() for inline substitution, while the rest of the file uses ClickHouse's query_params mechanism (e.g., lines 122, 449). This inconsistency reduces maintainability.

Refactor to use query_params:

     WHERE 
-      e.import_id = ${sqlstring.escape(importId)}
-      AND toDate(e.created_at) = ${sqlstring.escape(from)}
+      e.import_id = {importId:String}
+      AND toDate(e.created_at) = {from:String}
       AND e.session_id != ''
     GROUP BY e.session_id
   `;

   await ch.command({
     query: sessionsInsertQuery,
+    query_params: { importId, from },
     clickhouse_settings: {

601-613: Use deterministic status aggregation instead of any(import_status).

any(import_status) (line 607) returns an arbitrary status from the group, making results non-deterministic when events have mixed statuses (e.g., some 'pending', some 'processed').

Use deterministic aggregation:

     SELECT 
       import_id,
       COUNT(*) as total_events,
       COUNTIf(import_status = 'pending') as pending_events,
       COUNTIf(import_status = 'processed') as processed_events,
-      any(import_status) as status
+      CASE
+        WHEN COUNTIf(import_status = 'failed') > 0 THEN 'failed'
+        WHEN COUNTIf(import_status = 'processed') = COUNT(*) THEN 'processed'
+        WHEN COUNTIf(import_status = 'pending') > 0 THEN 'processing'
+        ELSE 'pending'
+      END as status
     FROM ${TABLE_NAMES.events_imports}

647-652: Validate status value instead of using type assertion.

The type assertion on line 651 bypasses type safety. If ClickHouse returns an unexpected status value, the type system won't catch it.

Add runtime validation:

+ const validStatuses = ['pending', 'processing', 'processed', 'failed'] as const;
+ const status = validStatuses.includes(row.status as any) 
+   ? (row.status as typeof validStatuses[number])
+   : 'pending';
+
  return {
    importId,
    totalEvents: row.total_events,
    insertedEvents: row.processed_events,
-   status: row.status as 'pending' | 'processing' | 'processed' | 'failed',
+   status,
  };
🧹 Nitpick comments (5)
packages/common/src/object.test.ts (1)

5-32: Strong test coverage for main scenarios.

The test thoroughly validates the core functionality including primitives, arrays, nested objects, and JSON string parsing. The expectations correctly reflect the implementation's behavior of stringifying values, flattening structures, and filtering out null/undefined/empty strings.

Consider adding test cases for additional edge cases:

  • Boolean values (true/false)
  • Falsy number (0)
  • Empty arrays and objects
  • Arrays containing null/undefined elements

These would further strengthen the test suite but are not critical for the current functionality.
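The extra edge cases could be exercised as sketched below; `toDotsSketch` is a toy stand-in that mirrors the documented behavior (flatten, stringify, drop null/undefined/empty strings), not the real `toDots`:

```typescript
// Toy stand-in for toDots (NOT the real implementation), used only to
// illustrate the edge cases worth covering: booleans, falsy 0, empty
// containers, and null elements inside arrays.
function toDotsSketch(
  obj: Record<string, unknown>,
  prefix = '',
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    // Drop null/undefined/empty strings, matching the documented filtering.
    if (value === null || value === undefined || value === '') continue;
    if (typeof value === 'object') {
      // Arrays and objects both flatten to dot paths (e.g. 'tags.0').
      Object.assign(out, toDotsSketch(value as Record<string, unknown>, path));
    } else {
      out[path] = String(value);
    }
  }
  return out;
}
```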

packages/validation/src/index.ts (1)

461-471: Consider date validation for temporal fields.

The from and to fields likely represent date ranges but are only validated as non-empty strings. Invalid date formats could cause runtime errors in the import processor.

Consider adding date format validation:

 export const zMixpanelImportConfig = z.object({
   provider: z.literal('mixpanel'),
   type: z.literal('api'),
   serviceAccount: z.string().min(1),
   serviceSecret: z.string().min(1),
   projectId: z.string().min(1),
-  from: z.string().min(1),
-  to: z.string().min(1),
+  from: z.string().datetime(),
+  to: z.string().datetime(),
   mapScreenViewProperty: z.string().optional(),
 });

Note: Use .datetime() for ISO 8601 timestamps or .date() for YYYY-MM-DD format, depending on the expected input format.

packages/redis/package.json (1)

11-11: Consider using caret range for consistency.

The change from "^5.7.0" to "5.8.2" removes semver flexibility and pins to an exact version. This is inconsistent with dependency management patterns elsewhere in the monorepo and prevents automatic patch updates.

Consider reverting to a caret range for consistency:

-    "ioredis": "5.8.2"
+    "ioredis": "^5.8.2"
packages/db/prisma/migrations/20251028103556_import_batch_and_status/migration.sql (1)

1-3: Migration creates type churn.

The migration adds currentBatch as INTEGER NOT NULL DEFAULT 0, but a subsequent migration (20251028152531_imports_again) changes it to TEXT nullable. While both migrations are technically valid, this indicates design iteration. For future migrations, consider finalizing type decisions before creating schema changes to reduce churn.

packages/db/src/services/import.service.ts (1)

721-784: Consider coordinating Postgres and ClickHouse state updates.

updateImportStatus updates Postgres only, while markImportComplete (lines 574-593) updates ClickHouse only. If one system's update fails, import state becomes inconsistent across the two databases.

Consider one of these approaches:

  1. Update Postgres first (as source of truth), then ClickHouse, with retry logic
  2. Use an idempotent reconciliation job that periodically syncs state
  3. At minimum, add try-catch with logging for manual reconciliation
 export async function updateImportStatus(
   jobLogger: ILogger,
   job: { updateProgress: (progress: Record<string, any>) => void },
   importId: string,
   options: UpdateImportStatusOptions,
 ): Promise<void> {
+  try {
     const data: Prisma.ImportUpdateInput = {};
     // ... build data object ...
     
     await job.updateProgress(data);
     await db.import.update({
       where: { id: importId },
       data,
     });
+  } catch (error) {
+    jobLogger.error('Failed to update import status', { importId, error });
+    throw error;
+  }
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a11373 and c9ebec6.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (76)
  • apps/api/package.json (1 hunks)
  • apps/start/Dockerfile (2 hunks)
  • apps/start/package.json (2 hunks)
  • apps/start/src/components/integrations/integration-card.tsx (1 hunks)
  • apps/start/src/components/project-mapper.tsx (1 hunks)
  • apps/start/src/components/ui/badge.tsx (1 hunks)
  • apps/start/src/modals/add-import.tsx (1 hunks)
  • apps/start/src/modals/date-ranger-picker.tsx (1 hunks)
  • apps/start/src/modals/index.tsx (2 hunks)
  • apps/start/src/routeTree.gen.ts (11 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx (1 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.tsx (1 hunks)
  • apps/start/src/utils/math.ts (1 hunks)
  • apps/start/src/utils/slug.ts (1 hunks)
  • apps/worker/Dockerfile (2 hunks)
  • apps/worker/package.json (3 hunks)
  • apps/worker/src/boot-workers.ts (4 hunks)
  • apps/worker/src/index.ts (2 hunks)
  • apps/worker/src/jobs/cron.delete-projects.ts (1 hunks)
  • apps/worker/src/jobs/events.incoming-event.ts (3 hunks)
  • apps/worker/src/jobs/events.incoming-events.test.ts (2 hunks)
  • apps/worker/src/jobs/import.ts (1 hunks)
  • package.json (1 hunks)
  • packages/cli/package.json (0 hunks)
  • packages/cli/src/cli.ts (0 hunks)
  • packages/cli/src/importer/importer.ts (0 hunks)
  • packages/cli/src/importer/index.ts (0 hunks)
  • packages/cli/tsconfig.json (0 hunks)
  • packages/cli/tsup.config.ts (0 hunks)
  • packages/common/package.json (1 hunks)
  • packages/common/scripts/get-referrers.ts (1 hunks)
  • packages/common/server/index.ts (1 hunks)
  • packages/common/server/parse-referrer.test.ts (5 hunks)
  • packages/common/server/parse-referrer.ts (4 hunks)
  • packages/common/server/parser-user-agent.ts (2 hunks)
  • packages/common/src/object.test.ts (1 hunks)
  • packages/common/src/object.ts (2 hunks)
  • packages/db/code-migrations/5-add-imports-table.sql (1 hunks)
  • packages/db/code-migrations/5-add-imports-table.ts (1 hunks)
  • packages/db/index.ts (2 hunks)
  • packages/db/package.json (1 hunks)
  • packages/db/prisma/migrations/20251018205153_add_import_table/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251018214030_fix/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251022191315_add_status_message/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251028103556_import_batch_and_status/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251028150123_fix_imports_table/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251028150655_fix_imports_again/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251028152531_imports_again/migration.sql (1 hunks)
  • packages/db/prisma/schema.prisma (2 hunks)
  • packages/db/src/buffers/base-buffer.ts (1 hunks)
  • packages/db/src/buffers/event-buffer.test.ts (8 hunks)
  • packages/db/src/clickhouse/client.ts (4 hunks)
  • packages/db/src/clickhouse/csv.ts (1 hunks)
  • packages/db/src/clickhouse/migration.ts (1 hunks)
  • packages/db/src/services/event.service.ts (1 hunks)
  • packages/db/src/services/import.service.ts (1 hunks)
  • packages/db/src/services/session.service.ts (1 hunks)
  • packages/db/src/types.ts (2 hunks)
  • packages/importer/package.json (1 hunks)
  • packages/importer/src/base-provider.ts (1 hunks)
  • packages/importer/src/index.ts (1 hunks)
  • packages/importer/src/providers/metadata.ts (1 hunks)
  • packages/importer/src/providers/mixpanel.test.ts (1 hunks)
  • packages/importer/src/providers/mixpanel.ts (1 hunks)
  • packages/importer/src/providers/umami.ts (1 hunks)
  • packages/importer/src/types.ts (1 hunks)
  • packages/importer/tsconfig.json (1 hunks)
  • packages/importer/vitest.config.ts (1 hunks)
  • packages/queue/src/queues.ts (1 hunks)
  • packages/redis/package.json (1 hunks)
  • packages/trpc/package.json (1 hunks)
  • packages/trpc/src/root.ts (2 hunks)
  • packages/trpc/src/routers/auth.ts (1 hunks)
  • packages/trpc/src/routers/import.ts (1 hunks)
  • packages/trpc/src/routers/organization.ts (1 hunks)
  • packages/validation/src/index.ts (1 hunks)
💤 Files with no reviewable changes (6)
  • packages/cli/package.json
  • packages/cli/tsup.config.ts
  • packages/cli/src/cli.ts
  • packages/cli/src/importer/importer.ts
  • packages/cli/src/importer/index.ts
  • packages/cli/tsconfig.json
🚧 Files skipped from review as they are similar to previous changes (33)
  • packages/common/server/index.ts
  • apps/start/src/modals/date-ranger-picker.tsx
  • apps/worker/src/index.ts
  • packages/common/server/parse-referrer.ts
  • packages/trpc/src/root.ts
  • packages/db/src/types.ts
  • apps/start/src/modals/index.tsx
  • packages/common/src/object.ts
  • apps/worker/Dockerfile
  • packages/db/src/buffers/event-buffer.test.ts
  • packages/db/index.ts
  • packages/trpc/src/routers/import.ts
  • apps/start/src/components/integrations/integration-card.tsx
  • packages/trpc/src/routers/organization.ts
  • apps/start/src/utils/math.ts
  • packages/importer/src/providers/metadata.ts
  • packages/common/scripts/get-referrers.ts
  • apps/start/package.json
  • apps/start/src/components/ui/badge.tsx
  • package.json
  • packages/importer/src/providers/umami.ts
  • apps/worker/package.json
  • packages/importer/vitest.config.ts
  • packages/db/prisma/migrations/20251028150655_fix_imports_again/migration.sql
  • packages/db/src/services/event.service.ts
  • apps/start/Dockerfile
  • packages/importer/src/index.ts
  • apps/worker/src/boot-workers.ts
  • packages/queue/src/queues.ts
  • packages/importer/tsconfig.json
  • packages/trpc/src/routers/auth.ts
  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.tsx
  • packages/db/src/services/session.service.ts
🧰 Additional context used
📓 Path-based instructions (1)
apps/start/**/*.{ts,tsx}

📄 CodeRabbit inference engine (apps/start/.cursorrules)

apps/start/**/*.{ts,tsx}: Instrument server functions created with createServerFn by wrapping their implementation in Sentry.startSpan and importing Sentry from '@sentry/tanstackstart-react'
When using Sentry in the TanStack Start React app, import it as: import * as Sentry from '@sentry/tanstackstart-react'

Files:

  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx
  • apps/start/src/modals/add-import.tsx
  • apps/start/src/utils/slug.ts
  • apps/start/src/routeTree.gen.ts
  • apps/start/src/components/project-mapper.tsx
🧠 Learnings (9)
📚 Learning: 2025-09-26T18:30:17.476Z
Learnt from: CR
PR: Openpanel-dev/openpanel#0
File: .cursorrules:0-0
Timestamp: 2025-09-26T18:30:17.476Z
Learning: When writing ClickHouse queries, use the helper functions in ./packages/db/src/clickhouse/query-functions.ts rather than crafting raw queries

Applied to files:

  • packages/db/src/clickhouse/csv.ts
  • packages/db/code-migrations/5-add-imports-table.ts
  • packages/db/src/clickhouse/client.ts
  • packages/db/src/services/import.service.ts
📚 Learning: 2025-10-16T10:29:28.924Z
Learnt from: CR
PR: Openpanel-dev/openpanel#0
File: apps/start/.cursorrules:0-0
Timestamp: 2025-10-16T10:29:28.924Z
Learning: Applies to apps/start/routes/**/*.tsx : Define React route files with createFileRoute and let the bundler/CLI manage the path string

Applied to files:

  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx
  • apps/start/src/routeTree.gen.ts
📚 Learning: 2025-10-16T10:29:28.924Z
Learnt from: CR
PR: Openpanel-dev/openpanel#0
File: apps/start/.cursorrules:0-0
Timestamp: 2025-10-16T10:29:28.924Z
Learning: Applies to apps/start/routes/__root.tsx : Declare the root route in routes/__root.tsx using createRootRoute or createRootRouteWithContext

Applied to files:

  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx
  • apps/start/src/routeTree.gen.ts
📚 Learning: 2025-10-16T10:29:28.924Z
Learnt from: CR
PR: Openpanel-dev/openpanel#0
File: apps/start/.cursorrules:0-0
Timestamp: 2025-10-16T10:29:28.924Z
Learning: Applies to apps/start/routes/**/{index,*.index}.tsx : For index routes, use index.tsx or *.index.tsx files and ensure the createFileRoute path uses a trailing slash (e.g., '/posts/')

Applied to files:

  • apps/start/src/routeTree.gen.ts
📚 Learning: 2025-10-16T10:29:28.924Z
Learnt from: CR
PR: Openpanel-dev/openpanel#0
File: apps/start/.cursorrules:0-0
Timestamp: 2025-10-16T10:29:28.924Z
Learning: Applies to apps/start/routes/**/*$*.tsx : Name dynamic route files with $param (e.g., posts.$postId.tsx) to capture path parameters

Applied to files:

  • apps/start/src/routeTree.gen.ts
📚 Learning: 2025-10-16T10:29:28.924Z
Learnt from: CR
PR: Openpanel-dev/openpanel#0
File: apps/start/.cursorrules:0-0
Timestamp: 2025-10-16T10:29:28.924Z
Learning: Applies to apps/start/routes/**/$.tsx : Use $.tsx for splat (catch-all) routes to capture the remaining path into params._splat

Applied to files:

  • apps/start/src/routeTree.gen.ts
📚 Learning: 2025-10-16T10:29:28.924Z
Learnt from: CR
PR: Openpanel-dev/openpanel#0
File: apps/start/.cursorrules:0-0
Timestamp: 2025-10-16T10:29:28.924Z
Learning: Applies to apps/start/routes/**/*_.tsx : Use a trailing underscore in a segment filename to create a non-nested (un-nested) route (e.g., posts_.tsx)

Applied to files:

  • apps/start/src/routeTree.gen.ts
📚 Learning: 2025-10-16T10:29:28.924Z
Learnt from: CR
PR: Openpanel-dev/openpanel#0
File: apps/start/.cursorrules:0-0
Timestamp: 2025-10-16T10:29:28.924Z
Learning: Applies to apps/start/routes/**/_*.tsx : Use leading underscore filenames for pathless layout routes that wrap children without matching a URL segment

Applied to files:

  • apps/start/src/routeTree.gen.ts
📚 Learning: 2025-09-26T18:30:17.476Z
Learnt from: CR
PR: Openpanel-dev/openpanel#0
File: .cursorrules:0-0
Timestamp: 2025-09-26T18:30:17.476Z
Learning: When writing ClickHouse queries, always construct them using the custom query builder in ./packages/db/src/clickhouse/query-builder.ts

Applied to files:

  • packages/db/src/clickhouse/client.ts
  • packages/db/src/services/import.service.ts
🧬 Code graph analysis (12)
packages/db/code-migrations/5-add-imports-table.ts (2)
packages/db/code-migrations/helpers.ts (1)
  • getIsCluster (17-24)
packages/db/src/clickhouse/migration.ts (3)
  • createTable (68-116)
  • modifyTTL (118-132)
  • runClickhouseMigrationCommands (338-470)
packages/importer/src/base-provider.ts (2)
packages/importer/src/types.ts (3)
  • BaseRawEvent (40-42)
  • ImportJobMetadata (69-73)
  • ErrorContext (45-51)
packages/db/src/services/event.service.ts (1)
  • IClickhouseEvent (68-102)
packages/importer/src/providers/mixpanel.test.ts (1)
packages/importer/src/providers/mixpanel.ts (1)
  • MixpanelProvider (18-452)
packages/common/server/parse-referrer.test.ts (1)
packages/common/server/parse-referrer.ts (2)
  • parseReferrer (17-26)
  • getReferrerWithQuery (28-66)
apps/worker/src/jobs/import.ts (8)
packages/queue/src/queues.ts (1)
  • ImportQueuePayload (168-173)
packages/db/src/prisma-client.ts (1)
  • db (270-270)
packages/db/src/services/import.service.ts (10)
  • ImportSteps (719-719)
  • updateImportStatus (721-784)
  • getImportDateBounds (658-682)
  • insertImportBatch (30-87)
  • generateSessionIds (94-130)
  • createSessionsStartEndEvents (138-348)
  • moveImportsToProduction (354-458)
  • backfillSessionsToProduction (460-569)
  • markImportComplete (574-593)
  • getImportProgress (598-653)
packages/db/src/clickhouse/client.ts (2)
  • formatClickhouseDate (237-248)
  • error (41-46)
packages/db/src/services/event.service.ts (1)
  • IClickhouseEvent (68-102)
packages/logger/index.ts (1)
  • ILogger (6-6)
packages/importer/src/providers/umami.ts (1)
  • UmamiProvider (59-382)
packages/importer/src/providers/mixpanel.ts (1)
  • MixpanelProvider (18-452)
packages/common/src/object.test.ts (1)
packages/common/src/object.ts (1)
  • toDots (16-67)
packages/importer/src/types.ts (2)
packages/db/src/services/event.service.ts (1)
  • IServiceImportedEventPayload (144-147)
packages/db/src/services/import.service.ts (2)
  • ImportProgress (20-25)
  • ImportStageResult (14-18)
apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx (4)
apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.tsx (1)
  • Route (12-34)
packages/importer/src/providers/metadata.ts (1)
  • IMPORT_PROVIDERS (13-30)
apps/public/components/ui/tooltip.tsx (1)
  • Tooltiper (36-51)
apps/start/src/components/integrations/integration-card.tsx (3)
  • IntegrationCard (91-112)
  • IntegrationCardLogoImage (50-69)
  • IntegrationCardFooter (3-15)
apps/start/src/modals/add-import.tsx (3)
packages/validation/src/index.ts (5)
  • zUmamiImportConfig (452-454)
  • zMixpanelImportConfig (461-470)
  • IUmamiImportConfig (456-456)
  • IMixpanelImportConfig (471-471)
  • IImportConfig (473-476)
apps/start/src/components/project-mapper.tsx (1)
  • ProjectMapper (33-116)
apps/public/lib/utils.ts (1)
  • cn (4-6)
apps/start/src/components/project-mapper.tsx (2)
packages/db/src/services/project.service.ts (1)
  • IServiceProject (7-7)
apps/start/src/components/ui/select.tsx (5)
  • Select (173-173)
  • SelectTrigger (181-181)
  • SelectValue (182-182)
  • SelectContent (174-174)
  • SelectItem (176-176)
packages/importer/src/providers/mixpanel.ts (7)
packages/validation/src/index.ts (1)
  • IMixpanelImportConfig (471-471)
packages/logger/index.ts (1)
  • ILogger (6-6)
packages/db/src/services/event.service.ts (2)
  • IClickhouseEvent (68-102)
  • query (701-847)
packages/common/src/url.ts (2)
  • parsePath (11-49)
  • isSameDomain (51-63)
packages/common/server/parse-referrer.ts (2)
  • parseReferrer (17-26)
  • getReferrerWithQuery (28-66)
packages/common/server/parser-user-agent.ts (1)
  • parseUserAgent (72-115)
packages/common/src/object.ts (2)
  • toDots (16-67)
  • strip (79-79)
packages/db/src/services/import.service.ts (5)
packages/db/src/services/event.service.ts (1)
  • IClickhouseEvent (68-102)
packages/db/src/clickhouse/csv.ts (2)
  • csvEscapeField (42-53)
  • csvEscapeJson (15-38)
packages/db/src/clickhouse/client.ts (5)
  • formatClickhouseDate (237-248)
  • chInsertCSV (202-228)
  • TABLE_NAMES (49-61)
  • ch (132-162)
  • convertClickhouseDateToJs (267-269)
packages/logger/index.ts (1)
  • ILogger (6-6)
packages/db/src/prisma-client.ts (1)
  • db (270-270)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-and-push-worker
  • GitHub Check: build-and-push-api
  • GitHub Check: build-and-push-dashboard
🔇 Additional comments (30)
apps/api/package.json (1)

67-67: Verify the rationale for pinning tsdown to an exact version.

The caret operator has been removed from tsdown, changing it from "^0.14.2" to "0.14.2" (exact version pin). The AI summary indicates this change is coordinated across the monorepo. Ensure this is intentional for reproducible builds or a known stability requirement with this specific version.

Is this pinning part of a broader monorepo strategy, or is there a specific reason (e.g., bug fix, stability) for locking to 0.14.2? You may want to verify consistency across other workspace packages to confirm alignment.

packages/db/src/buffers/base-buffer.ts (1)

1-1: LGTM! Import path consolidation.

The import path change from @openpanel/common/server/id to @openpanel/common/server is part of a coordinated refactor to centralize server utilities, with similar updates across auth.ts and organization.ts.

packages/common/src/object.test.ts (2)

1-2: LGTM!

The imports are correct and appropriate for a vitest test file.


34-47: Excellent validation of malformed JSON handling.

This test effectively validates that the function correctly:

  • Parses valid JSON strings and expands them into dot notation
  • Skips strings that appear to be JSON but are malformed
  • Preserves regular strings as-is

The test expectations align well with the implementation's defensive approach to JSON parsing.

packages/common/package.json (1)

6-9: Public exports field enables dual entry points.

The new exports field correctly exposes the main entry point and a server-specific export, aligning with the referrer utilities being made part of the public API surface (parse-referrer, id).

packages/validation/src/index.ts (2)

439-459: Well-structured file-based import configs.

The createFileImportConfig helper elegantly reduces duplication while maintaining type safety with provider literals. The zProjectMapper schema and its integration into zUmamiImportConfig provides clear validation for project mapping requirements.


473-476: Clean union type export.

The IImportConfig union type appropriately combines the three provider-specific config types for type safety in consuming code.

packages/trpc/package.json (1)

19-19: LGTM!

The addition of the @openpanel/queue workspace dependency aligns with the new import queue functionality introduced in this PR.

packages/db/src/clickhouse/csv.ts (3)

5-13: LGTM!

The escape order is correct (backslashes first), and the function properly handles all necessary control characters for ClickHouse Map format.


15-38: LGTM!

The function correctly handles null values, normalizes to strings for Map(String,String), and properly delegates to csvEscapeField for final CSV escaping.


42-53: LGTM!

The function correctly implements CSV RFC 4180 standard by escaping double quotes through doubling and wrapping fields containing special characters.
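As a reference point, the RFC 4180 rule described above fits in a few lines (the real `csvEscapeField` may handle additional cases):

```typescript
// RFC 4180: a field containing a comma, double quote, CR, or LF is wrapped
// in double quotes, and every embedded double quote is doubled.
function csvEscapeField(field: string): string {
  if (/[",\r\n]/.test(field)) {
    return `"${field.replace(/"/g, '""')}"`;
  }
  return field;
}

console.log(csvEscapeField('say "hi", ok')); // → "say ""hi"", ok"
```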

packages/db/src/clickhouse/client.ts (4)

1-1: LGTM!

The Readable import is correctly used in the new chInsertCSV function, and the events_imports table addition aligns with the import system introduced in this PR.

Also applies to: 60-60


65-65: Verify timeout increases are appropriate for the import workload.

The request timeout increased 5x (60s → 300s) and idle socket TTL increased 7.5x (8s → 60s). While these increases support long-running import operations, consider monitoring these values in production to ensure they don't mask underlying performance issues or cause resource exhaustion.

Also applies to: 68-68


137-157: LGTM!

The retry wrapper and comprehensive ClickHouse settings properly support large batch inserts. The settings spread order (line 154) correctly allows user settings to override defaults, and the progress headers prevent proxy timeouts during long-running operations.
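The spread-order point is easy to demonstrate in isolation; the setting names below are illustrative, not the client's actual defaults:

```typescript
// Object spread is last-wins, so putting the caller's settings after the
// defaults lets any user-provided value override the default.
// These setting names are illustrative, not the client's real defaults.
const defaults = { max_insert_threads: '4', async_insert: '1' };

function withSettings(userSettings: Record<string, string>) {
  return { ...defaults, ...userSettings };
}

console.log(withSettings({ async_insert: '0' }));
// → { max_insert_threads: '4', async_insert: '0' }
```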


202-228: LGTM!

The CSV streaming implementation is correct, and the format settings properly align with the CSV escaping utilities (double quotes for CSV fields, single quotes for Map format internally). Error handling and logging are appropriate.

apps/worker/src/jobs/cron.delete-projects.ts (1)

57-57: LGTM!

The type change from numeric 0 to string '0' aligns with ClickHouse setting conventions and likely reflects updated type definitions in the newer @clickhouse/client version.

packages/importer/src/base-provider.ts (3)

4-31: LGTM!

The abstract class design is clean with appropriate type constraints. Using AsyncGenerator for parseSource enables memory-efficient streaming for large imports, which is essential for this use case.


36-68: LGTM!

The optional hooks are well-designed with sensible defaults. The shouldGenerateSessionIds() default of false appropriately assumes providers handle session ID generation during streaming.


78-120: LGTM!

The date chunking implementation correctly splits date ranges while handling edge cases (same start/end date) and ensuring chunks don't exceed the end date. The break condition prevents infinite loops.
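A minimal sketch of the chunking behavior described above, assuming inclusive day-based chunks clamped to the end date (names and signature are illustrative, not the provider's actual code):

```typescript
// Illustrative sketch: split [start, end] into chunks of `days` days,
// clamping the last chunk to `end` so no chunk overshoots the range.
// The break on reaching `end` is what prevents an infinite loop when
// start and end are the same day.
function* chunkDateRange(
  start: Date,
  end: Date,
  days: number,
): Generator<{ from: Date; to: Date }> {
  let cursor = new Date(start);
  while (cursor <= end) {
    const next = new Date(cursor);
    next.setUTCDate(next.getUTCDate() + days - 1);
    const to = next > end ? end : next;
    yield { from: new Date(cursor), to: new Date(to) };
    if (to.getTime() === end.getTime()) break;
    cursor = new Date(to);
    cursor.setUTCDate(cursor.getUTCDate() + 1);
  }
}
```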

packages/db/prisma/migrations/20251022191315_add_status_message/migration.sql (1)

1-2: LGTM!

The migration correctly adds an optional statusMessage column to support detailed import status tracking, aligning with the PR's import system objectives.

packages/db/package.json (1)

16-16: Verify compatibility with @clickhouse/client 1.12.1.

The upgrade from ^1.2.0 to ^1.12.1 spans 10 minor versions. While maintaining semver compatibility, ensure there are no behavioral changes affecting existing ClickHouse operations.

apps/worker/src/jobs/events.incoming-events.test.ts (1)

102-102: LGTM! Test expectations align with updated referrer parsing behavior.

The expectations correctly reflect that referrerType now returns an empty string ('') instead of 'unknown' for non-derived referrers, consistent with the changes in packages/common/server/parse-referrer.ts.

Also applies to: 210-210

packages/db/src/clickhouse/migration.ts (1)

118-132: LGTM! Clean implementation following established patterns.

The modifyTTL function correctly handles both clustered and non-clustered scenarios, using the replicated() helper consistently with other migration utilities in this file. The implementation is type-safe and follows the existing code structure.

packages/common/server/parse-referrer.test.ts (1)

8-8: LGTM! Comprehensive test coverage for the referrer type change.

All test expectations have been consistently updated to reflect that unknown or empty referrer types now return '' instead of 'unknown'. The changes cover both parseReferrer and getReferrerWithQuery functions across various scenarios (undefined, empty, invalid, and unknown inputs).

Also applies to: 14-14, 44-44, 52-52, 67-67, 91-91, 99-99, 113-113

packages/db/code-migrations/5-add-imports-table.sql (1)

1-43: LGTM! Well-designed import staging table.

The events_imports table schema is well-structured:

  • Mirrors the main events table structure with additional import tracking fields (import_id, import_status, imported_at_meta)
  • Uses appropriate compression codecs for each data type
  • Partitioning by month on imported_at_meta aligns with the 7-day TTL for efficient partition drops
  • ORDER BY (import_id, created_at) supports efficient queries by import batch

The separate TTL ALTER statement follows correct ClickHouse patterns.

packages/common/server/parser-user-agent.ts (2)

71-71: Good API expansion with exported type.

The UserAgentInfo type export provides a clean public API for consumers to type the return value of parseUserAgent, using TypeScript's ReturnType utility type.


84-112: LGTM! More defensive override handling.

The updated override logic adds explicit type guards and truthiness checks, preventing empty strings or non-string values from being used as overrides. This is more defensive than the previous ?? operator approach and ensures only valid, non-empty string overrides are applied. The pattern is consistently applied across all fields.
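The difference between a bare `??` and the new guard can be shown with a small sketch (the helper name is hypothetical):

```typescript
// Hypothetical helper: an override is applied only when it is a non-empty
// string. A bare `??` would have accepted '' (and other non-string values
// cast to string), since '' is neither null nor undefined.
function applyOverride(parsed: string, override: unknown): string {
  return typeof override === 'string' && override.length > 0 ? override : parsed;
}

console.log(applyOverride('Chrome', ''));      // → Chrome
console.log(applyOverride('Chrome', 'Brave')); // → Brave
console.log(applyOverride('Chrome', 42));      // → Chrome
```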

packages/db/code-migrations/5-add-imports-table.ts (1)

65-72: TTL implementation via modifyTTL helper is correct.

The code uses the modifyTTL helper function which correctly targets the replicated table in cluster mode (see packages/db/src/clickhouse/migration.ts lines 117-131). This addresses the concern about TTL on distributed tables.

The helper properly constructs:

  • ALTER TABLE ${replicated(tableName)} ON CLUSTER '{cluster}' MODIFY TTL ${ttl} for clustered
  • ALTER TABLE ${tableName} MODIFY TTL ${ttl} for non-clustered
apps/worker/src/jobs/events.incoming-event.ts (2)

8-23: Import reorganization aligns with common package refactoring.

The imports have been updated to use public exports from @openpanel/common/server, and Job is now correctly imported as a type-only import. The separate import of omit from ramda alongside the namespace import is acceptable.


120-122: Referrer logic improvements are correct.

The changes improve referrer handling:

  • Line 120: baseEvent.referrer now uses only referrer?.url (removed utmReferrer?.url), which is correct since UTM parameters shouldn't override the actual referrer URL
  • Line 122: referrerType now prioritizes referrer?.type before utmReferrer?.type, which properly favors detected referrer types (like 'search') over UTM-based types
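The resulting priority rules can be sketched as a standalone helper (a simplification of the actual event-building code, with illustrative names):

```typescript
type Ref = { url?: string; type?: string } | undefined;

// Simplified sketch of the priority described above: the referrer URL never
// falls back to the UTM URL, while the detected referrer type wins over the
// UTM-derived type, with '' as the "unknown" fallback.
function resolveReferrer(referrer: Ref, utmReferrer: Ref) {
  return {
    referrer: referrer?.url ?? '',
    referrerType: referrer?.type || utmReferrer?.type || '',
  };
}

console.log(
  resolveReferrer(
    { url: 'https://google.com', type: 'search' },
    { type: 'campaign' },
  ),
);
// → { referrer: 'https://google.com', referrerType: 'search' }
```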

@@ -1 +1 @@
export * from '@openpanel/common/src/slug';
export * from '@openpanel/common';

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify export scope: file exports entire common package instead of slug utilities.

This file is named slug.ts but now exports everything from @openpanel/common, not just slug-related functions. This introduces namespace pollution and breaks the single-responsibility principle for utility modules.

Consider using named exports to maintain the focused scope:

-export * from '@openpanel/common';
+export { slugify, /* other slug-specific exports */ } from '@openpanel/common';

Run the following script to verify what slug-related symbols should be exported:


🏁 Script executed:

#!/bin/bash
# Find usages of this slug utility module to determine required exports
rg -n --type=ts --type=tsx "from ['\"]@/utils/slug['\"]" -A 1 -B 1

# Find the original slug exports from common package
fd -e ts -e tsx --full-path "packages/common" | xargs rg -n "export.*slug" -i

Length of output: 157


🏁 Script executed:

#!/bin/bash
# Check all exports in the common package's index
echo "=== Contents of packages/common/index.ts ==="
cat packages/common/index.ts

echo ""
echo "=== Search for any imports of apps/start/src/utils/slug.ts ==="
rg -n --type=ts "utils/slug" -C 2

echo ""
echo "=== Check if apps/start/src/utils/slug.ts file exists and its content ==="
cat apps/start/src/utils/slug.ts 2>/dev/null || echo "File not found"

Length of output: 921


🏁 Script executed:

#!/bin/bash
# Check full content of packages/common/src/slug.ts to see all exports
echo "=== Full content of packages/common/src/slug.ts ==="
cat packages/common/src/slug.ts

echo ""
echo "=== Check if there are any other references to slug utilities ==="
rg -n "slug" apps/start/src/utils/ --type=ts

Length of output: 616


Use named export to maintain module scope: export { slug } from '@openpanel/common';

The file is named slug.ts but re-exports the entire @openpanel/common package (11 modules), when it should only export the slug function. While named imports in consuming files (import { slug }) will tree-shake correctly, this creates a confusing public interface that doesn't match the module's name and breaks the expectation of focused utility exports.

-export * from '@openpanel/common';
+export { slug } from '@openpanel/common';
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export * from '@openpanel/common';
export { slug } from '@openpanel/common';
🤖 Prompt for AI Agents
In apps/start/src/utils/slug.ts around line 1, the file currently re-exports the
entire @openpanel/common package which exposes many unrelated modules; change it
to a focused named re-export so this module only exposes the slug utility.
Replace the bulk export with a named export that re-exports just slug from
@openpanel/common (maintain TypeScript types and default/namespace
considerations), run a quick type-check/build to ensure no other modules rely on
the previous broad export surface.

Comment on lines +14 to +25
export interface SessionInfo {
id: string;
lastTimestamp: number;
lastEvent: IServiceImportedEventPayload;
}

export interface ImportProgress {
totalEvents: number;
processedEvents: number;
currentBatch: number;
totalBatches: number;
}

🛠️ Refactor suggestion | 🟠 Major

Remove duplicate type definitions.

ImportProgress (lines 20-25) and ImportStageResult (lines 75-80) are redefined in packages/db/src/services/import.service.ts. This duplication risks type drift if one definition is updated without the other.

The service file should import these types from this module:

In packages/db/src/services/import.service.ts, remove the duplicate definitions and add:

import type { ImportProgress, ImportStageResult } from '@openpanel/importer';

Also applies to: 75-80

🤖 Prompt for AI Agents
In packages/importer/src/types.ts around lines 14-25 and 75-80, ImportProgress
and ImportStageResult are defined here but duplicated in
packages/db/src/services/import.service.ts; remove the duplicate type
definitions from packages/db/src/services/import.service.ts and replace them
with a type-only import from this module: add "import type { ImportProgress,
ImportStageResult } from '@openpanel/importer';" at the top of the service file
and update any local references to use the imported types.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (4)
apps/worker/src/jobs/import.ts (4)

44-47: Logger context leaks config secrets.

The full record.config object may contain sensitive credentials (API keys, database passwords, access tokens). Logging it exposes secrets in logs.

Apply this diff to log only safe metadata:

 const jobLogger = logger.child({
   importId,
-  config: record.config,
+  provider: record.config?.provider,
 });

142-188: Harden batch processing: type safety, PII redaction, transform guards, safe createdAt.

This segment has multiple issues flagged in prior reviews:

  • Line 142: any type loses safety; use unknown[]
  • Line 153: Logs full rawEvent which may contain PII
  • Lines 163-168: transformEvent can throw and crash the job
  • Lines 175-177: Unsafe createdAt derivation if transformedEvents[0]?.created_at is missing or invalid

Apply this diff to address all issues:

-    const eventBatch: any = [];
+    const eventBatch: unknown[] = [];
     for await (const rawEvent of providerInstance.parseSource(
       resumeLoadingFrom,
     )) {
       // Validate event
       if (
         !providerInstance.validate(
-          // @ts-expect-error
-          rawEvent,
+          rawEvent as any,
         )
       ) {
-        jobLogger.warn('Skipping invalid event', { rawEvent });
+        jobLogger.warn('Skipping invalid event', {
+          provider: providerInstance.provider,
+          reason: 'validation_failed',
+        });
         continue;
       }
 
       eventBatch.push(rawEvent);
 
       // Process batch when it reaches the batch size
       if (eventBatch.length >= BATCH_SIZE) {
         jobLogger.info('Processing batch', { batchSize: eventBatch.length });
 
-        const transformedEvents: IClickhouseEvent[] = eventBatch.map(
-          (
-            // @ts-expect-error
-            event,
-          ) => providerInstance!.transformEvent(event),
-        );
+        const transformedEvents: IClickhouseEvent[] = [];
+        for (const ev of eventBatch) {
+          try {
+            transformedEvents.push(providerInstance!.transformEvent(ev as any));
+          } catch (e) {
+            jobLogger.warn('Skipping event on transform failure', {
+              provider: providerInstance.provider,
+              reason: e instanceof Error ? e.message : String(e),
+            });
+          }
+        }
 
         await insertImportBatch(transformedEvents, importId);
 
         processedEvents += eventBatch.length;
         eventBatch.length = 0;
 
-        const createdAt = new Date(transformedEvents[0]?.created_at || '')
-          .toISOString()
-          .split('T')[0];
+        const firstCreatedAt = transformedEvents[0]?.created_at;
+        const createdAt = firstCreatedAt
+          ? firstCreatedAt.split('T')[0]!
+          : formatClickhouseDate(new Date(), true);
 
         await updateImportStatus(jobLogger, job, importId, {
           step: 'loading',
           batch: createdAt,
           totalEvents,
           processedEvents,
         });

192-216: Harden final batch: guard transforms, safe createdAt, include counters.

This final batch suffers the same issues as the main batch plus missing progress counters:

  • Lines 193-198: No try-catch around transformEvent
  • Lines 205-207: Unsafe createdAt derivation
  • Lines 209-212: Missing totalEvents and processedEvents in status update

Apply this diff:

     // Process remaining events in the last batch
     if (eventBatch.length > 0) {
-      const transformedEvents = eventBatch.map(
-        (
-          // @ts-expect-error
-          event,
-        ) => providerInstance!.transformEvent(event),
-      );
+      const transformedEvents: IClickhouseEvent[] = [];
+      for (const ev of eventBatch) {
+        try {
+          transformedEvents.push(providerInstance!.transformEvent(ev as any));
+        } catch (e) {
+          jobLogger.warn('Skipping event on transform failure', {
+            provider: providerInstance.provider,
+            reason: e instanceof Error ? e.message : String(e),
+          });
+        }
+      }
 
       await insertImportBatch(transformedEvents, importId);
 
       processedEvents += eventBatch.length;
       eventBatch.length = 0;
 
-      const createdAt = new Date(transformedEvents[0]?.created_at || '')
-        .toISOString()
-        .split('T')[0];
+      const firstCreatedAt = transformedEvents[0]?.created_at;
+      const createdAt = firstCreatedAt
+        ? firstCreatedAt.split('T')[0]!
+        : formatClickhouseDate(new Date(), true);
 
       await updateImportStatus(jobLogger, job, importId, {
         step: 'loading',
         batch: createdAt,
+        totalEvents,
+        processedEvents,
       });

280-299: Sync final ClickHouse counters back to Postgres.

The UI reads processedEvents and totalEvents from the Postgres import table. After completion, finalProgress from ClickHouse contains the actual counts, but these are never written back to Postgres.

Apply this diff to persist final counts:

     await markImportComplete(importId);
     await updateImportStatus(jobLogger, job, importId, {
       step: 'completed',
     });
     jobLogger.info('Import marked as complete');
 
     // Get final progress
     const finalProgress = await getImportProgress(importId);
 
+    // Sync final counts to Postgres for UI
+    await db.import.update({
+      where: { id: importId },
+      data: {
+        totalEvents: finalProgress.totalEvents,
+        processedEvents: finalProgress.insertedEvents,
+      },
+    });
+
     jobLogger.info('Import job completed successfully', {
       totalEvents: finalProgress.totalEvents,
       insertedEvents: finalProgress.insertedEvents,
       status: finalProgress.status,
     });
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9ebec6 and 0fdd842.

📒 Files selected for processing (1)
  • apps/worker/src/jobs/import.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
apps/worker/src/jobs/import.ts (6)
packages/queue/src/queues.ts (1)
  • ImportQueuePayload (168-173)
packages/db/src/prisma-client.ts (1)
  • db (270-270)
packages/db/src/services/import.service.ts (10)
  • ImportSteps (719-719)
  • updateImportStatus (721-784)
  • getImportDateBounds (658-682)
  • insertImportBatch (30-87)
  • generateSessionIds (94-130)
  • createSessionsStartEndEvents (138-348)
  • moveImportsToProduction (354-458)
  • backfillSessionsToProduction (460-569)
  • markImportComplete (574-593)
  • getImportProgress (598-653)
packages/db/src/clickhouse/client.ts (2)
  • formatClickhouseDate (237-248)
  • error (41-46)
packages/db/src/services/event.service.ts (1)
  • IClickhouseEvent (68-102)
packages/logger/index.ts (1)
  • ILogger (6-6)
🔇 Additional comments (1)
apps/worker/src/jobs/import.ts (1)

300-332: LGTM! Error handling and provider factory are well-structured.

The error handling properly marks the import as failed with a nested try-catch to ensure marking failures are logged. The createProvider factory cleanly delegates to the appropriate provider implementation.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 14

♻️ Duplicate comments (14)
packages/db/prisma/migrations/20251018214030_fix/migration.sql (1)

1-13: 🔴 CRITICAL: Data migration missing before dropping columns (unresolved from prior review).

This migration drops provider, sourceLocation, and sourceType columns without migrating their data to the config JSON field. The import job handler (apps/worker/src/jobs/import.ts) expects config.provider to exist when processing imports. Existing import records will lose this metadata and the handler will fail.

Add a data migration step before dropping the columns to preserve the values:

+-- AlterTable
+UPDATE "public"."imports" 
+SET "config" = jsonb_build_object(
+  'provider', provider,
+  'sourceType', "sourceType",
+  'sourceLocation', "sourceLocation"
) || "config";
+
 -- AlterTable
 ALTER TABLE "public"."imports" DROP COLUMN "provider",
 DROP COLUMN "sourceLocation",
 DROP COLUMN "sourceType",
 ALTER COLUMN "config" DROP DEFAULT;
packages/importer/vitest.config.ts (1)

3-3: __dirname is unavailable in ESM; this issue was previously flagged.

This critical issue was already identified in a prior review: __dirname does not exist in ESM modules and will throw a ReferenceError at runtime.

Apply this diff to fix the issue:

+import { dirname } from 'node:path';
+import { fileURLToPath } from 'node:url';
+
 import { getSharedVitestConfig } from '../../vitest.shared';

-export default getSharedVitestConfig({ __dirname });
+export default getSharedVitestConfig({ 
+  __dirname: dirname(fileURLToPath(import.meta.url)) 
+});
packages/validation/src/index.ts (1)

478-486: Address the unresolved provider/config type safety issue.

This was flagged in a previous review and remains unresolved. The current schema allows provider and config to contain mismatched provider values (e.g., provider: 'umami' with config.provider: 'mixpanel'), which breaks type safety.

The suggested fix from the previous review uses a discriminated union to enforce that the provider literal matches the config shape. This is a critical type safety improvement that should be implemented.

packages/common/scripts/get-referrers.ts (1)

74-74: Fix incorrect output path.

The script writes to ../../worker/src/referrers/index.ts, but this is incorrect based on the past review comment. The path should be ../server/referrers/index.ts to match the package-local structure.

apps/worker/src/jobs/import.ts (2)

44-47: Stop logging full provider config. Including the entire record.config in the logger context risks leaking credentials and secrets to log storage. Strip it down to non-sensitive identifiers (e.g., provider name) or explicitly mask sensitive keys before calling logger.child.


286-299: Persist final counters back to Postgres. After getImportProgress, only the ClickHouse view reflects the final totals; the Postgres import.totalEvents/processedEvents columns stay at the previous batch values. The UI reads Postgres, so progress appears stale. Update the import row with finalProgress.totalEvents and finalProgress.insertedEvents before returning success. For example:

     jobLogger.info('Import job completed successfully', {
       totalEvents: finalProgress.totalEvents,
       insertedEvents: finalProgress.insertedEvents,
       status: finalProgress.status,
     });
+
+    await db.import.update({
+      where: { id: importId },
+      data: {
+        totalEvents: finalProgress.totalEvents,
+        processedEvents: finalProgress.insertedEvents,
+      },
+    });
packages/importer/src/providers/mixpanel.ts (1)

85-91: Remove service account from logs. Line 90 logs serviceAccount, exposing credential identifiers in centralized logs. Drop the field or replace it with a boolean flag (e.g., serviceAccountPresent: true).

packages/importer/src/providers/mixpanel.test.ts (1)

83-126: Tighten expectations for OS/browser version. The provider should emit $os_version and $browser_version, but this test expects them empty, masking the bug. Update the assertion to expect real values (e.g., 'Android', '135') after fixing the transformer.

packages/trpc/src/routers/import.ts (4)

12-32: Sensitive config exposure in list — restrict or redact.

The list procedure returns import records including the config field to any project member with read access. This config likely contains provider credentials (API keys, secrets, service accounts). Redact sensitive fields or gate full config access by role.

Consider one of these approaches:

Option A: Redact for read-level users

  list: protectedProcedure
    .input(z.object({ projectId: z.string() }))
    .query(async ({ input, ctx }) => {
      const access = await getProjectAccessCached({
        projectId: input.projectId,
        userId: ctx.session.userId,
      });
      if (!access) {
        throw TRPCAccessError('You do not have access to this project');
      }
-      return db.import.findMany({
+      const rows = await db.import.findMany({
        where: { projectId: input.projectId },
        orderBy: { createdAt: 'desc' },
      });
+      const canSeeSecrets = typeof access !== 'boolean' && access.level !== 'read';
+      return canSeeSecrets ? rows : rows.map((r) => ({ ...r, config: null }));
    }),

Option B: Exclude config from query for read users
Use Prisma select to omit the config field for read-level access.


34-56: Sensitive config exposure in get — restrict or redact.

The get procedure returns the full import record including sensitive config to any project member. Apply the same redaction/role-based gating as in list.

  get: protectedProcedure
    .input(z.object({ id: z.string() }))
    .query(async ({ input, ctx }) => {
      const importRecord = await db.import.findUniqueOrThrow({
        where: { id: input.id },
        include: { project: true },
      });
      const access = await getProjectAccessCached({
        projectId: importRecord.projectId,
        userId: ctx.session.userId,
      });
      if (!access) {
        throw TRPCAccessError('You do not have access to this import');
      }
-      return importRecord;
+      const canSeeSecrets = typeof access !== 'boolean' && access.level !== 'read';
+      return canSeeSecrets ? importRecord : { ...importRecord, config: null };
    }),

72-99: Queue enqueue not wrapped — record can be left pending with no job.

If importQueue.add fails, the database keeps a dangling 'pending' import with no corresponding job. Wrap the enqueue in try/catch, mark the record as failed on error, and configure stable jobId with retry options.

      // Create import record
      const importRecord = await db.import.create({
        data: {
          projectId: input.projectId,
          config: input.config,
          status: 'pending',
        },
      });

-      // Add job to queue
-      const job = await importQueue.add('import', {
-        type: 'import',
-        payload: {
-          importId: importRecord.id,
-        },
-      });
-
-      // Update import record with job ID
-      await db.import.update({
-        where: { id: importRecord.id },
-        data: { jobId: job.id },
-      });
+      // Add job to queue
+      try {
+        const job = await importQueue.add(
+          'import',
+          { type: 'import', payload: { importId: importRecord.id } },
+          { jobId: importRecord.id, attempts: 3, backoff: { type: 'exponential', delay: 30_000 } },
+        );
+        await db.import.update({
+          where: { id: importRecord.id },
+          data: { jobId: job.id },
+        });
+      } catch (err) {
+        await db.import.update({
+          where: { id: importRecord.id },
+          data: { 
+            status: 'failed', 
+            errorMessage: err instanceof Error ? err.message : 'Queue enqueue failed' 
+          },
+        });
+        throw err;
+      }

160-176: Wrap retry enqueue in try/catch.

Same issue as create: if importQueue.add fails, the record is updated to 'pending' but no job exists. Wrap in try/catch and handle failures.

      // Add new job to queue
-      const job = await importQueue.add('import', {
-        type: 'import',
-        payload: {
-          importId: importRecord.id,
-        },
-      });
-
-      // Update import record
-      return db.import.update({
-        where: { id: importRecord.id },
-        data: {
-          jobId: job.id,
-          status: 'pending',
-          errorMessage: null,
-        },
-      });
+      try {
+        const job = await importQueue.add(
+          'import',
+          { type: 'import', payload: { importId: importRecord.id } },
+          { jobId: importRecord.id, attempts: 3, backoff: { type: 'exponential', delay: 30_000 } },
+        );
+        return db.import.update({
+          where: { id: importRecord.id },
+          data: {
+            jobId: job.id,
+            status: 'pending',
+            errorMessage: null,
+          },
+        });
+      } catch (err) {
+        throw new Error('Failed to enqueue retry job: ' + (err instanceof Error ? err.message : String(err)));
+      }
    }),
apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx (1)

265-276: Config tooltip exposes sensitive credentials.

The tooltip at Line 269 renders the raw imp.config JSON which can expose API keys, secrets, and service credentials to anyone viewing the imports page. Redact sensitive fields before display.

  <Tooltiper
    content={
      <pre className="font-mono text-sm leading-normal whitespace-pre-wrap break-words">
-        {JSON.stringify(imp.config, null, 2)}
+        {JSON.stringify(
+          {
+            ...imp.config,
+            apiKey: imp.config.apiKey ? '***' : undefined,
+            apiSecret: imp.config.apiSecret ? '***' : undefined,
+            serviceAccount: imp.config.serviceAccount ? '***' : undefined,
+            serviceSecret: imp.config.serviceSecret ? '***' : undefined,
+          },
+          null,
+          2
+        )}
      </pre>
    }
    tooltipClassName="max-w-xs"
  >

Or implement a utility function to sanitize known sensitive keys (apiKey, api_secret, secret, password, token, credentials, etc.) before rendering.

apps/start/src/modals/add-import.tsx (1)

245-249: Avoid leaking secrets in error paths.

The error toast at Line 247 displays error.message which may include upstream details or API secrets. Use a generic user-facing message and log full details server-side only.

      onError: (error) => {
        toast.error('Import failed', {
-          description: error.message,
+          description: 'Unable to start import. Please try again or contact support.',
        });
      },

Ensure the server logs the full error details securely without echoing them to the client.

🧹 Nitpick comments (9)
packages/db/src/clickhouse/migration.ts (1)

118-132: Add validation or documentation for TTL format.

The function correctly handles clustered and non-clustered scenarios. However, the ttl parameter is directly interpolated into the SQL statement without validation. ClickHouse TTL expressions must follow a specific syntax (e.g., created + INTERVAL 30 DAY), and invalid formats will only fail at runtime.

Consider adding:

  1. Input validation to check TTL syntax before generating SQL
  2. JSDoc documentation specifying the expected TTL format
+/**
+ * Generates ALTER TABLE statement to modify TTL
+ * @param tableName - Name of the table
+ * @param isClustered - Whether the deployment is clustered
+ * @param ttl - TTL expression in ClickHouse format (e.g., 'created + INTERVAL 30 DAY')
+ * @returns SQL statement to modify TTL
+ */
 export const modifyTTL = ({
   tableName,
   isClustered,
   ttl,
 }: {
   tableName: string;
   isClustered: boolean;
   ttl: string;
 }) => {
+  // Basic validation - TTL should not be empty
+  if (!ttl || ttl.trim().length === 0) {
+    throw new Error('TTL expression cannot be empty');
+  }
+  
   if (isClustered) {
     return `ALTER TABLE ${replicated(tableName)} ON CLUSTER '{cluster}' MODIFY TTL ${ttl}`;
   }

   return `ALTER TABLE ${tableName} MODIFY TTL ${ttl}`;
 };
packages/validation/src/index.ts (2)

439-442: Consider exporting zProjectMapper if used externally.

The schema is well-defined, but verify whether external packages (e.g., @openpanel/importer) need access to this schema for validation or type inference. If so, add the export keyword.


444-449: Good abstraction for file-based import configs.

The helper reduces duplication effectively. Similar to zProjectMapper, consider exporting this if external packages need to create compatible config schemas.

packages/common/src/object.ts (1)

44-55: Good use of try-catch for safe JSON parsing.

The try-catch properly handles parse errors and prevents crashes from malformed JSON. However, consider simplifying by removing the pre-check in isValidJsonString (see previous comment) since the try-catch already provides the necessary safety.
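As a sketch, the try-catch alone is sufficient without any pre-check (`safeJsonParse` is an illustrative name, not the codebase's):

```typescript
// Returns the parsed value, or the provided fallback when input is not valid JSON.
function safeJsonParse<T>(input: string, fallback: T): T {
  try {
    return JSON.parse(input) as T;
  } catch {
    return fallback;
  }
}
```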

apps/worker/package.json (1)

41-41: Version pinning removes automatic patch updates.

Removing the caret (^) from tsdown pins it to exactly 0.14.2, preventing automatic patch updates that might include bug fixes. Unless there's a specific compatibility reason, consider keeping the caret for patch-level updates.

packages/db/code-migrations/5-add-imports-table.sql (1)

42-43: Consider extending TTL for debugging.

A 7-day TTL may be too aggressive for debugging failed imports or auditing historical import issues. Consider extending to 30-90 days depending on your operational needs and storage constraints.

packages/common/scripts/get-referrers.ts (1)

81-86: Consider formatting the generated output.

The JSON.stringify output is not formatted, which will produce a single-line file that's difficult to read and review. Consider adding indentation or using a code formatter.

-        `const referrers: Record<string, { type: string, name: string }> = ${JSON.stringify(
+        `const referrers: Record<string, { type: string, name: string }> = ${JSON.stringify(
           {
             ...transform(data),
             ...extraReferrers,
           },
+          null,
+          2,
         )} as const;`,
packages/db/src/clickhouse/client.ts (1)

206-232: Validate CSV row format before insertion.

The chInsertCSV function joins rows with newlines without validating that each row is properly formatted CSV. Consider adding validation or documenting the expected row format to prevent silent data corruption.

 export async function chInsertCSV(tableName: string, rows: string[]) {
+  if (rows.length === 0) {
+    logger.warn('chInsertCSV called with empty rows array');
+    return;
+  }
+  
   try {
     const now = performance.now();
packages/db/src/services/import.service.ts (1)

468-568: Use ClickHouse query_params instead of sqlstring.escape. In backfillSessionsToProduction, inline sqlstring.escape(importId)/sqlstring.escape(from) breaks consistency with the rest of the file and forfeits ClickHouse parameter handling. Replace them with {importId:String}/{from:String} placeholders and pass query_params: { importId, from }. Based on learnings.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9211e62 and 2808bdb.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (76)
  • apps/api/package.json (1 hunks)
  • apps/start/Dockerfile (2 hunks)
  • apps/start/package.json (2 hunks)
  • apps/start/src/components/integrations/integration-card.tsx (1 hunks)
  • apps/start/src/components/project-mapper.tsx (1 hunks)
  • apps/start/src/components/ui/badge.tsx (1 hunks)
  • apps/start/src/modals/add-import.tsx (1 hunks)
  • apps/start/src/modals/date-ranger-picker.tsx (1 hunks)
  • apps/start/src/modals/index.tsx (2 hunks)
  • apps/start/src/routeTree.gen.ts (11 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx (1 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.tsx (1 hunks)
  • apps/start/src/utils/math.ts (1 hunks)
  • apps/start/src/utils/slug.ts (1 hunks)
  • apps/worker/Dockerfile (2 hunks)
  • apps/worker/package.json (3 hunks)
  • apps/worker/src/boot-workers.ts (4 hunks)
  • apps/worker/src/index.ts (2 hunks)
  • apps/worker/src/jobs/cron.delete-projects.ts (1 hunks)
  • apps/worker/src/jobs/events.incoming-event.ts (3 hunks)
  • apps/worker/src/jobs/events.incoming-events.test.ts (2 hunks)
  • apps/worker/src/jobs/import.ts (1 hunks)
  • package.json (1 hunks)
  • packages/cli/package.json (0 hunks)
  • packages/cli/src/cli.ts (0 hunks)
  • packages/cli/src/importer/importer.ts (0 hunks)
  • packages/cli/src/importer/index.ts (0 hunks)
  • packages/cli/tsconfig.json (0 hunks)
  • packages/cli/tsup.config.ts (0 hunks)
  • packages/common/package.json (1 hunks)
  • packages/common/scripts/get-referrers.ts (1 hunks)
  • packages/common/server/index.ts (1 hunks)
  • packages/common/server/parse-referrer.test.ts (5 hunks)
  • packages/common/server/parse-referrer.ts (4 hunks)
  • packages/common/server/parser-user-agent.ts (2 hunks)
  • packages/common/src/object.test.ts (1 hunks)
  • packages/common/src/object.ts (2 hunks)
  • packages/db/code-migrations/5-add-imports-table.sql (1 hunks)
  • packages/db/code-migrations/5-add-imports-table.ts (1 hunks)
  • packages/db/index.ts (2 hunks)
  • packages/db/package.json (1 hunks)
  • packages/db/prisma/migrations/20251018205153_add_import_table/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251018214030_fix/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251022191315_add_status_message/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251028103556_import_batch_and_status/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251028150123_fix_imports_table/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251028150655_fix_imports_again/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251028152531_imports_again/migration.sql (1 hunks)
  • packages/db/prisma/schema.prisma (2 hunks)
  • packages/db/src/buffers/base-buffer.ts (1 hunks)
  • packages/db/src/buffers/event-buffer.test.ts (8 hunks)
  • packages/db/src/clickhouse/client.ts (6 hunks)
  • packages/db/src/clickhouse/csv.ts (1 hunks)
  • packages/db/src/clickhouse/migration.ts (1 hunks)
  • packages/db/src/services/event.service.ts (1 hunks)
  • packages/db/src/services/import.service.ts (1 hunks)
  • packages/db/src/services/session.service.ts (1 hunks)
  • packages/db/src/types.ts (2 hunks)
  • packages/importer/package.json (1 hunks)
  • packages/importer/src/base-provider.ts (1 hunks)
  • packages/importer/src/index.ts (1 hunks)
  • packages/importer/src/providers/metadata.ts (1 hunks)
  • packages/importer/src/providers/mixpanel.test.ts (1 hunks)
  • packages/importer/src/providers/mixpanel.ts (1 hunks)
  • packages/importer/src/providers/umami.ts (1 hunks)
  • packages/importer/src/types.ts (1 hunks)
  • packages/importer/tsconfig.json (1 hunks)
  • packages/importer/vitest.config.ts (1 hunks)
  • packages/queue/src/queues.ts (1 hunks)
  • packages/redis/package.json (1 hunks)
  • packages/trpc/package.json (1 hunks)
  • packages/trpc/src/root.ts (2 hunks)
  • packages/trpc/src/routers/auth.ts (1 hunks)
  • packages/trpc/src/routers/import.ts (1 hunks)
  • packages/trpc/src/routers/organization.ts (1 hunks)
  • packages/validation/src/index.ts (1 hunks)
💤 Files with no reviewable changes (6)
  • packages/cli/tsup.config.ts
  • packages/cli/tsconfig.json
  • packages/cli/src/importer/index.ts
  • packages/cli/package.json
  • packages/cli/src/importer/importer.ts
  • packages/cli/src/cli.ts
🚧 Files skipped from review as they are similar to previous changes (43)
  • packages/db/src/buffers/event-buffer.test.ts
  • apps/start/src/components/integrations/integration-card.tsx
  • packages/common/src/object.test.ts
  • apps/start/src/modals/date-ranger-picker.tsx
  • packages/db/src/services/event.service.ts
  • packages/importer/tsconfig.json
  • packages/db/src/services/session.service.ts
  • packages/trpc/package.json
  • packages/db/index.ts
  • apps/worker/src/jobs/events.incoming-events.test.ts
  • packages/db/prisma/migrations/20251022191315_add_status_message/migration.sql
  • packages/queue/src/queues.ts
  • apps/worker/Dockerfile
  • apps/worker/src/index.ts
  • packages/db/src/types.ts
  • packages/common/server/index.ts
  • packages/importer/src/index.ts
  • apps/worker/src/jobs/cron.delete-projects.ts
  • apps/start/package.json
  • apps/start/src/modals/index.tsx
  • apps/start/src/components/project-mapper.tsx
  • packages/db/prisma/schema.prisma
  • apps/api/package.json
  • packages/db/package.json
  • apps/start/src/components/ui/badge.tsx
  • packages/db/prisma/migrations/20251028103556_import_batch_and_status/migration.sql
  • packages/common/server/parse-referrer.test.ts
  • apps/start/Dockerfile
  • packages/common/server/parser-user-agent.ts
  • packages/importer/src/base-provider.ts
  • packages/redis/package.json
  • packages/db/prisma/migrations/20251018205153_add_import_table/migration.sql
  • apps/worker/src/boot-workers.ts
  • apps/worker/src/jobs/events.incoming-event.ts
  • packages/importer/src/types.ts
  • packages/db/code-migrations/5-add-imports-table.ts
  • packages/importer/src/providers/metadata.ts
  • packages/db/prisma/migrations/20251028150123_fix_imports_table/migration.sql
  • packages/db/prisma/migrations/20251028152531_imports_again/migration.sql
  • packages/common/package.json
  • packages/importer/package.json
  • packages/db/src/buffers/base-buffer.ts
  • apps/start/src/utils/slug.ts
🧰 Additional context used
📓 Path-based instructions (1)
apps/start/**/*.{ts,tsx}

📄 CodeRabbit inference engine (apps/start/.cursorrules)

apps/start/**/*.{ts,tsx}: Instrument server functions created with createServerFn by wrapping their implementation in Sentry.startSpan and importing Sentry from '@sentry/tanstackstart-react'
When using Sentry in the TanStack Start React app, import it as: import * as Sentry from '@sentry/tanstackstart-react'

Files:

  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.tsx
  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx
  • apps/start/src/utils/math.ts
  • apps/start/src/routeTree.gen.ts
  • apps/start/src/modals/add-import.tsx
🧠 Learnings (9)
📚 Learning: 2025-10-16T10:29:28.924Z
Learnt from: CR
Repo: Openpanel-dev/openpanel PR: 0
File: apps/start/.cursorrules:0-0
Timestamp: 2025-10-16T10:29:28.924Z
Learning: Applies to apps/start/routes/__root.tsx : Declare the root route in routes/__root.tsx using createRootRoute or createRootRouteWithContext

Applied to files:

  • packages/trpc/src/root.ts
  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx
  • apps/start/src/routeTree.gen.ts
📚 Learning: 2025-10-16T10:29:28.924Z
Learnt from: CR
Repo: Openpanel-dev/openpanel PR: 0
File: apps/start/.cursorrules:0-0
Timestamp: 2025-10-16T10:29:28.924Z
Learning: Applies to apps/start/routes/**/*.tsx : Define React route files with createFileRoute and let the bundler/CLI manage the path string

Applied to files:

  • packages/trpc/src/root.ts
  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx
  • apps/start/src/routeTree.gen.ts
📚 Learning: 2025-10-16T10:29:28.924Z
Learnt from: CR
Repo: Openpanel-dev/openpanel PR: 0
File: apps/start/.cursorrules:0-0
Timestamp: 2025-10-16T10:29:28.924Z
Learning: Applies to apps/start/routes/**/_*.tsx : Use leading underscore filenames for pathless layout routes that wrap children without matching a URL segment

Applied to files:

  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx
  • apps/start/src/routeTree.gen.ts
📚 Learning: 2025-09-26T18:30:17.476Z
Learnt from: CR
Repo: Openpanel-dev/openpanel PR: 0
File: .cursorrules:0-0
Timestamp: 2025-09-26T18:30:17.476Z
Learning: When writing ClickHouse queries, use the helper functions in ./packages/db/src/clickhouse/query-functions.ts rather than crafting raw queries

Applied to files:

  • packages/db/src/clickhouse/csv.ts
  • packages/db/src/clickhouse/migration.ts
  • packages/db/src/clickhouse/client.ts
  • packages/importer/src/providers/umami.ts
  • packages/db/src/services/import.service.ts
📚 Learning: 2025-09-26T18:30:17.476Z
Learnt from: CR
Repo: Openpanel-dev/openpanel PR: 0
File: .cursorrules:0-0
Timestamp: 2025-09-26T18:30:17.476Z
Learning: When writing ClickHouse queries, always construct them using the custom query builder in ./packages/db/src/clickhouse/query-builder.ts

Applied to files:

  • packages/db/src/clickhouse/client.ts
  • packages/importer/src/providers/umami.ts
  • packages/db/src/services/import.service.ts
📚 Learning: 2025-10-16T10:29:28.924Z
Learnt from: CR
Repo: Openpanel-dev/openpanel PR: 0
File: apps/start/.cursorrules:0-0
Timestamp: 2025-10-16T10:29:28.924Z
Learning: Applies to apps/start/routes/**/{index,*.index}.tsx : For index routes, use index.tsx or *.index.tsx files and ensure the createFileRoute path uses a trailing slash (e.g., '/posts/')

Applied to files:

  • apps/start/src/routeTree.gen.ts
📚 Learning: 2025-10-16T10:29:28.924Z
Learnt from: CR
Repo: Openpanel-dev/openpanel PR: 0
File: apps/start/.cursorrules:0-0
Timestamp: 2025-10-16T10:29:28.924Z
Learning: Applies to apps/start/routes/**/*$*.tsx : Name dynamic route files with $param (e.g., posts.$postId.tsx) to capture path parameters

Applied to files:

  • apps/start/src/routeTree.gen.ts
📚 Learning: 2025-10-16T10:29:28.924Z
Learnt from: CR
Repo: Openpanel-dev/openpanel PR: 0
File: apps/start/.cursorrules:0-0
Timestamp: 2025-10-16T10:29:28.924Z
Learning: Applies to apps/start/routes/**/$.tsx : Use $.tsx for splat (catch-all) routes to capture the remaining path into params._splat

Applied to files:

  • apps/start/src/routeTree.gen.ts
📚 Learning: 2025-10-16T10:29:28.924Z
Learnt from: CR
Repo: Openpanel-dev/openpanel PR: 0
File: apps/start/.cursorrules:0-0
Timestamp: 2025-10-16T10:29:28.924Z
Learning: Applies to apps/start/routes/**/*_.tsx : Use a trailing underscore in a segment filename to create a non-nested (un-nested) route (e.g., posts_.tsx)

Applied to files:

  • apps/start/src/routeTree.gen.ts
🧬 Code graph analysis (10)
apps/worker/src/jobs/import.ts (5)
packages/queue/src/queues.ts (1)
  • ImportQueuePayload (168-173)
packages/db/src/services/import.service.ts (10)
  • ImportSteps (719-719)
  • updateImportStatus (721-784)
  • getImportDateBounds (658-682)
  • insertImportBatch (30-87)
  • generateSessionIds (94-130)
  • createSessionsStartEndEvents (138-348)
  • moveImportsToProduction (354-458)
  • backfillSessionsToProduction (460-569)
  • markImportComplete (574-593)
  • getImportProgress (598-653)
packages/db/src/clickhouse/client.ts (2)
  • formatClickhouseDate (241-252)
  • error (38-43)
packages/db/src/services/event.service.ts (1)
  • IClickhouseEvent (68-102)
packages/logger/index.ts (1)
  • ILogger (6-6)
packages/trpc/src/root.ts (1)
packages/trpc/src/routers/import.ts (1)
  • importRouter (11-178)
apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx (3)
packages/importer/src/providers/metadata.ts (1)
  • IMPORT_PROVIDERS (13-30)
apps/public/components/ui/tooltip.tsx (1)
  • Tooltiper (36-51)
apps/start/src/components/integrations/integration-card.tsx (3)
  • IntegrationCard (91-112)
  • IntegrationCardLogoImage (50-69)
  • IntegrationCardFooter (3-15)
packages/trpc/src/routers/import.ts (5)
packages/trpc/src/trpc.ts (2)
  • createTRPCRouter (134-134)
  • protectedProcedure (166-170)
packages/trpc/src/errors.ts (1)
  • TRPCAccessError (3-7)
packages/db/src/prisma-client.ts (1)
  • db (276-276)
packages/validation/src/index.ts (1)
  • zCreateImport (478-486)
packages/queue/src/queues.ts (1)
  • importQueue (175-181)
packages/importer/src/providers/mixpanel.test.ts (1)
packages/importer/src/providers/mixpanel.ts (1)
  • MixpanelProvider (18-452)
packages/importer/src/providers/umami.ts (5)
packages/validation/src/index.ts (1)
  • IUmamiImportConfig (456-456)
packages/logger/index.ts (1)
  • ILogger (6-6)
packages/db/src/services/event.service.ts (2)
  • IClickhouseEvent (68-102)
  • query (701-847)
packages/common/src/url.ts (2)
  • parsePath (11-49)
  • isSameDomain (51-63)
packages/common/server/parse-referrer.ts (2)
  • parseReferrer (17-26)
  • getReferrerWithQuery (28-66)
packages/importer/src/providers/mixpanel.ts (7)
packages/validation/src/index.ts (1)
  • IMixpanelImportConfig (471-471)
packages/logger/index.ts (1)
  • ILogger (6-6)
packages/db/src/services/event.service.ts (2)
  • IClickhouseEvent (68-102)
  • query (701-847)
packages/common/src/url.ts (2)
  • parsePath (11-49)
  • isSameDomain (51-63)
packages/common/server/parse-referrer.ts (2)
  • parseReferrer (17-26)
  • getReferrerWithQuery (28-66)
packages/common/server/parser-user-agent.ts (2)
  • parseUserAgent (72-115)
  • UserAgentInfo (71-71)
packages/common/src/object.ts (2)
  • toDots (16-67)
  • strip (79-79)
apps/start/src/modals/add-import.tsx (3)
packages/validation/src/index.ts (5)
  • zUmamiImportConfig (452-454)
  • zMixpanelImportConfig (461-470)
  • IUmamiImportConfig (456-456)
  • IMixpanelImportConfig (471-471)
  • IImportConfig (473-476)
apps/start/src/components/project-mapper.tsx (1)
  • ProjectMapper (33-116)
apps/public/lib/utils.ts (1)
  • cn (4-6)
packages/db/src/services/import.service.ts (5)
packages/db/src/services/event.service.ts (1)
  • IClickhouseEvent (68-102)
packages/db/src/clickhouse/csv.ts (2)
  • csvEscapeField (42-53)
  • csvEscapeJson (15-38)
packages/db/src/clickhouse/client.ts (5)
  • formatClickhouseDate (241-252)
  • chInsertCSV (206-232)
  • TABLE_NAMES (46-58)
  • ch (129-166)
  • convertClickhouseDateToJs (271-273)
packages/logger/index.ts (1)
  • ILogger (6-6)
packages/db/src/prisma-client.ts (1)
  • db (276-276)
packages/importer/vitest.config.ts (1)
vitest.shared.ts (1)
  • getSharedVitestConfig (4-27)
🔇 Additional comments (22)
packages/db/prisma/migrations/20251028150655_fix_imports_again/migration.sql (1)

1-2: Schema modification is safe and straightforward.

Dropping the NOT NULL constraint on currentStep aligns with flexible import lifecycle handling as part of the broader imports table normalization.

packages/validation/src/index.ts (2)

452-471: Provider config schemas are well-structured.

Each provider config correctly captures its specific requirements:

  • Umami: file-based with project mapping
  • Plausible: simple file-based
  • Mixpanel: API-based with credentials and date range

The schemas properly use literals for provider/type discrimination.


473-476: LGTM!

The union type provides proper TypeScript inference for all import config variants.

packages/common/src/object.ts (1)

35-37: Empty string filtering is intentional and tested behavior—no issues found.

The test file explicitly validates that empty strings are filtered out. In packages/common/src/object.test.ts, the test input includes empty: '', but the expected output does not contain an empty key, confirming this filtering is the designed behavior. This matches the code at lines 35-37, which filters undefined, null, and empty strings. No breaking changes are present—the behavior is tested and consistent across all usages in the codebase.

apps/worker/package.json (1)

22-22: LGTM!

The addition of the @openpanel/importer workspace dependency aligns with the new import subsystem introduced in this PR.

packages/trpc/src/routers/organization.ts (1)

14-14: LGTM!

The import path consolidation to @openpanel/common/server improves module organization without changing functionality.

package.json (1)

12-12: LGTM!

The script update correctly reflects the migration of referrers generation from the worker scope to the common scope, aligning with the new packages/common/scripts/get-referrers.ts script.

packages/db/code-migrations/5-add-imports-table.sql (2)

30-30: Verify nullable imported_at field.

The imported_at field is nullable, which makes sense for pending imports. Ensure that queries filtering by this field handle NULL values appropriately, or consider adding a non-null default for completed imports.


37-37: Verify ordering key matches query patterns.

The table is ordered by (import_id, created_at). Ensure this ordering supports your primary query patterns. If you frequently query by project_id or import_status, consider including those in the ordering key.

Based on learnings.

packages/db/src/clickhouse/csv.ts (3)

5-13: LGTM!

The escape order is correct (backslashes first), and all necessary control characters are properly escaped for ClickHouse Map format.


40-53: LGTM!

The CSV field escaping correctly follows RFC 4180 CSV standard by doubling internal quotes and wrapping fields that contain special characters.
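The RFC 4180 behavior described above can be sketched as follows (assumed equivalent behavior; the actual `csvEscapeField` lives in packages/db/src/clickhouse/csv.ts and may differ in detail):

```typescript
// Quote a field when it contains a comma, quote, or newline; double internal quotes.
function escapeCsvField(field: string): string {
  if (/[",\n\r]/.test(field)) {
    return `"${field.replace(/"/g, '""')}"`;
  }
  return field;
}
```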


29-29: Code is correct — ClickHouse CSV supports {} as empty Map representation.

The function correctly returns '{}' as ClickHouse CSV accepts {} (optionally quoted as "{}") as the valid empty Map(String, String) representation.

packages/db/src/clickhouse/client.ts (4)

57-57: LGTM!

Adding events_imports table name aligns with the new import subsystem introduced in this PR.


62-62: Timeout increases support long-running imports.

The timeout changes (request_timeout: 300s, idle_socket_ttl: 60s) are appropriate for supporting large CSV import operations.

Also applies to: 65-65


89-127: LGTM!

Exporting withRetry enables reuse across the codebase. The retry logic correctly handles connection errors with exponential backoff while immediately throwing non-retriable errors.
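The pattern reads roughly like this (a sketch with illustrative names and defaults; the real `withRetry` in client.ts may differ):

```typescript
// Retry an async operation with exponential backoff; non-retriable errors rethrow immediately.
async function withRetry<T>(
  fn: () => Promise<T>,
  isRetriable: (err: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!isRetriable(err)) throw err;
      lastError = err;
      // Delay doubles each attempt: 100ms, 200ms, 400ms, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}
```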


142-143: String values for numeric settings are correct and explicitly supported.

The ClickHouse client documentation confirms that numeric settings like max_insert_block_size and max_http_get_redirects can be specified as strings. The docs show an explicit example passing a UInt64 setting as a string (http_headers_progress_interval_ms: '110000'), and the client handles this appropriately. No changes are needed.

Likely an incorrect or invalid review comment.

packages/common/server/parse-referrer.ts (2)

35-51: Case-insensitive matching is explicitly tested and working as intended.

The test suite includes a dedicated case-insensitive test that validates mixed-case inputs (e.g., 'GoOgLe') correctly resolve to the proper referrer (e.g., 'Google'). The implementation consistently lowercases the source value and falls back gracefully for unknown sources. All three current usages in the codebase (event processing, Umami importer, Mixpanel importer) are compatible with case-insensitive matching for query parameters.


23-23: Review comment is incorrect and should be disregarded.

The parse-referrer.ts code already defaults to empty string ('') when no referrer match is found. This is not a new change—line 23 shows type: match?.type ?? '', which means if there's no match, the type defaults to an empty string.

The 'unknown' value that exists in the referrers data is NOT a default; it's a specific type value assigned to certain known referrers (e.g., Google support pages, Yahoo subdomains). No downstream code checks for 'unknown' as a sentinel value to trigger special handling. The codebase contains zero matches for conditional checks like type === 'unknown'.

Likely an incorrect or invalid review comment.

packages/trpc/src/routers/auth.ts (1)

15-15: LGTM! Import path consolidation.

The import path change from @openpanel/common/server/id to @openpanel/common/server consolidates module imports without affecting behavior.

apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.tsx (1)

45-45: LGTM! New Imports tab added.

The new tab entry correctly wires to the imports route under project settings.

packages/trpc/src/root.ts (1)

7-7: LGTM! Import router wired correctly.

The new import router is properly integrated into the root TRPC router, following established patterns.

Also applies to: 44-44

apps/start/src/routeTree.gen.ts (1)

59-59: Auto-generated route tree — no review required.

This file is automatically generated by TanStack Router and correctly includes the new imports route declarations.

Also applies to: 385-390, 525-525, 580-580, 645-645, 705-705, 760-760, 824-824, 1210-1216, 1544-1544, 1556-1557

@@ -1 +1 @@
export * from '@openpanel/common/src/math';
export * from '@openpanel/common';

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Export only math utilities, not the entire common package.

This file is named math.ts, suggesting it should provide math-related utilities. Exporting everything from @openpanel/common introduces namespace pollution and makes it unclear what this module provides. This could lead to unintended exports and maintenance issues.

Consider one of these approaches:

Option 1: If the package structure supports it, use a specific export path:

-export * from '@openpanel/common';
+export * from '@openpanel/common/math';

Option 2: Explicitly re-export only math-related utilities:

-export * from '@openpanel/common';
+export { round, clamp, percentage, /* other math utilities */ } from '@openpanel/common';

Verify that consumers of this module still have access to all expected math utilities:


🏁 Script executed:

#!/bin/bash
# Find all files importing from this math utility module
rg -n --type=ts --type=tsx "from ['\"].*utils/math['\"]" -A 2 -B 1

# Check what @openpanel/common exports at the root
fd -e ts -e tsx "index.ts" packages/common/ --exec cat {}

Length of output: 162843


This change removes the math utilities and will break consumers relying on this module.

The change from '@openpanel/common/src/math' to '@openpanel/common' is a breaking change. The @openpanel/common package index does not export math utilities—it exports only: crypto, profileId, parser-user-agent, parse-referrer, and id. This will break any code importing math functions from apps/start/src/utils/math.ts.

Either restore the specific path or ensure math utilities are added to @openpanel/common's root exports.

🤖 Prompt for AI Agents
In apps/start/src/utils/math.ts around line 1, the export was changed from
'@openpanel/common/src/math' to '@openpanel/common', removing math utilities and
breaking consumers; restore compatibility by reverting to export the specific
math module path (e.g., re-export from '@openpanel/common/src/math') or,
alternatively, add the math utilities to the root exports of @openpanel/common
and update its index so this file can safely export from '@openpanel/common';
ensure the chosen fix is applied consistently and run tests/usages to confirm
imports resolve.

Comment on lines +147 to +154
if (
!providerInstance.validate(
// @ts-expect-error
rawEvent,
)
) {
jobLogger.warn('Skipping invalid event', { rawEvent });
continue;

⚠️ Potential issue | 🟠 Major

Redact invalid events in logs. jobLogger.warn('Skipping invalid event', { rawEvent }) dumps the full raw payload, which can contain PII or customer data. Log only safe metadata (provider, event ID/type, reason) and drop the payload.
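A hedged sketch of such a redacted log call (the field names `id` and `event` are assumptions about the raw event shape, which varies per provider):

```typescript
// Illustrative shape; real raw events differ between Umami and Mixpanel.
type RawEventMeta = { id?: string; event?: string };

// Build a log payload containing only safe metadata, never the raw event body.
function toSafeLogMeta(provider: string, rawEvent: RawEventMeta) {
  return {
    provider,
    eventId: rawEvent.id ?? 'unknown',
    eventName: rawEvent.event ?? 'unknown',
    reason: 'validation failed',
  };
}

// Usage: jobLogger.warn('Skipping invalid event', toSafeLogMeta(provider, rawEvent));
```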

🤖 Prompt for AI Agents
In apps/worker/src/jobs/import.ts around lines 147 to 154, the current warning
logs include the full rawEvent payload which may contain PII; change the log to
redact the payload by extracting and logging only safe metadata (provider name,
event id/type if available, and a short reason like "validation failed"). Remove
or omit rawEvent from the logged object, and ensure any fields you include are
explicitly accessed (e.g., providerInstance.name, rawEvent.id, rawEvent.type)
with safe guards so undefined values are handled; keep the log message and
continue behavior unchanged.

Comment on lines +163 to +170
const transformedEvents: IClickhouseEvent[] = eventBatch.map(
(
// @ts-expect-error
event,
) => providerInstance!.transformEvent(event),
);

await insertImportBatch(transformedEvents, importId);

⚠️ Potential issue | 🔴 Critical

Guard batch transform errors. A single transformEvent throw terminates the whole job because the map runs outside any try/catch. Wrap each transform in a guarded loop, skip failures, and only insert successful events; also use a provider-agnostic unknown[] buffer. For example:

-      const eventBatch: any = [];
+      const eventBatch: unknown[] = [];

-          const transformedEvents: IClickhouseEvent[] = eventBatch.map(
-            (event) => providerInstance!.transformEvent(event),
-          );
+          const transformedEvents: IClickhouseEvent[] = [];
+          for (const event of eventBatch) {
+            try {
+              transformedEvents.push(
+                providerInstance!.transformEvent(event as never),
+              );
+            } catch (error) {
+              jobLogger.warn('Skipping event on transform failure', {
+                provider: providerInstance.provider,
+                reason: error instanceof Error ? error.message : String(error),
+              });
+            }
+          }

Apply the same guard for the final partial batch.

Also applies to: 191-198

🤖 Prompt for AI Agents
In apps/worker/src/jobs/import.ts around lines 163-170 (and similarly 191-198),
the current code maps eventBatch through providerInstance.transformEvent outside
any try/catch so a single throw aborts the whole job; change this to a guarded
per-event loop using an intermediary provider-agnostic buffer (unknown[]),
iterate events with a try/catch around each transform, push only successful
transformed events into the buffer, optionally log or count failures, and call
insertImportBatch(buffer, importId) only when the buffer is non-empty; repeat
the same guarded logic for the final partial batch at the later lines.

providerInstance.shouldGenerateSessionIds()
) {
await whileBounds(resumeGeneratingSessionIdsFrom, async (from) => {
console.log('Generating session IDs', { from });

⚠️ Potential issue | 🟡 Minor

Use jobLogger instead of console.log. Switch this message to jobLogger.info('Generating session IDs', { from }) to retain context (importId, provider) and stay consistent with structured logging.

🤖 Prompt for AI Agents
In apps/worker/src/jobs/import.ts around line 225, replace the console.log call
with a structured jobLogger.info call: change console.log('Generating session
IDs', { from }) to jobLogger.info('Generating session IDs', { from, importId,
provider }) so the message uses the jobLogger and includes contextual fields
(importId, provider) for consistent structured logging.

Comment on lines +91 to +93
} catch (e) {
console.log(e);
}

⚠️ Potential issue | 🟠 Major

Error handling silently swallows failures.

The catch block only logs errors without throwing or exiting with a non-zero code. This may cause silent failures in CI/CD pipelines.

   } catch (e) {
     console.log(e);
+    process.exit(1);
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
} catch (e) {
console.log(e);
}
} catch (e) {
console.log(e);
process.exit(1);
}
🤖 Prompt for AI Agents
In packages/common/scripts/get-referrers.ts around lines 91 to 93, the catch
block only logs the error with console.log which silently swallows failures;
change it to log to console.error and either rethrow the error or call
process.exit(1) so the process fails in CI; ensure any async contexts properly
propagate the error (throw or reject) and include the caught error message/stack
in the log for debugging.

Comment on lines +202 to +279
const userAgent = props.osVersion || '';
const uaInfo = this.isWebEvent(props.mp_lib)
? parseUserAgent(userAgent, props)
: this.parseServerDeviceInfo(props);

// Map event name - $mp_web_page_view should be screen_view
let eventName = rawEvent.event;
if (eventName === '$mp_web_page_view') {
eventName = 'screen_view';
}

// Build properties object - strip Mixpanel-specific properties
const properties = this.stripMixpanelProperties(props, query);

if (props.$insert_id) {
properties.__source_insert_id = String(props.$insert_id);
}
// Add useful properties
if (props.$screen_width && props.$screen_height) {
properties.__screen = `${props.$screen_width}x${props.$screen_height}`;
}
if (props.$screen_dpi) {
properties.__dpi = props.$screen_dpi;
}
if (props.$language) {
properties.__language = props.$language;
}
if (props.$timezone) {
properties.__timezone = props.$timezone;
}
if (props.$app_version) {
properties.__version = props.$app_version;
}
if (props.$app_build_number) {
properties.__buildNumber = props.$app_build_number;
}
if (props.$lib_version) {
properties.__lib_version = props.$lib_version;
}

if (hash) {
properties.__hash = hash;
}

if (Object.keys(query).length > 0) {
properties.__query = query;
}

if (props.current_page_title) {
properties.__title = props.current_page_title;
}

if (userAgent) {
properties.__userAgent = userAgent;
}

// Always use UUID for id to match ClickHouse UUID column
const event = {
id: randomUUID(),
name: eventName,
device_id: deviceId,
profile_id: profileId,
project_id: projectId,
session_id: '', // Will be generated in SQL after import
properties: toDots(properties), // Flatten nested objects/arrays to Map(String, String)
created_at: new Date(props.time * 1000).toISOString(),
country,
city,
region,
longitude: null,
latitude: null,
os: uaInfo.os || props.$os,
os_version: uaInfo.osVersion || props.$osVersion,
browser: uaInfo.browser || props.$browser,
browser_version:
uaInfo.browserVersion || props.$browserVersion
? String(props.$browser_version)
: '',

⚠️ Potential issue | 🟠 Major

Pull user agent and version fields from the correct properties. Using props.osVersion for userAgent, props.$osVersion (camelCase) for OS version, and the ternary uaInfo.browserVersion || props.$browserVersion ? String(props.$browser_version) : '' means:

  • UA parsing runs on the OS version string.
  • $os_version (Mixpanel default) is ignored, so OS version stays empty.
  • Browser version resolves to 'undefined' or '' even when $browser_version exists.

Read from $user_agent, $os_version, $browser_version, and order the fallbacks explicitly:

const userAgent = props.$user_agent || '';
const osVersion =
  uaInfo.osVersion || props.$os_version || props.$osVersion || '';
const browserVersion =
  uaInfo.browserVersion ||
  props.$browser_version ||
  props.$browserVersion ||
  '';

Coerce the final strings with String(...). This keeps device metadata accurate.

🤖 Prompt for AI Agents
In packages/importer/src/providers/mixpanel.ts around lines 202 to 279, the code
uses the wrong properties for user agent and version fields (props.osVersion,
props.$osVersion, and a faulty ternary for browser_version), causing UA parsing
and version values to be incorrect; update to read userAgent from
props.$user_agent, compute osVersion and browserVersion with explicit fallbacks
like: uaInfo.osVersion || props.$os_version || props.$osVersion || '' and
uaInfo.browserVersion || props.$browser_version || props.$browserVersion || '',
coerce final values with String(...), pass the correct userAgent into the UA
parsing call (so parsing runs on $user_agent), and assign the computed
os_version and browser_version values into the event object instead of the
current faulty expressions.

Comment on lines +79 to +197
private async *parseRemoteFile(
url: string,
opts: {
signal?: AbortSignal;
maxBytes?: number;
maxRows?: number;
} = {},
): AsyncGenerator<UmamiRawEvent, void, unknown> {
const { signal, maxBytes, maxRows } = opts;
const controller = new AbortController();

// Link to caller's signal for cancellation
if (signal) {
signal.addEventListener('abort', () => controller.abort(), {
once: true,
});
}

const res = await fetch(url, { signal: controller.signal });
if (!res.ok || !res.body) {
throw new Error(
`Failed to fetch remote file: ${res.status} ${res.statusText}`,
);
}

const contentType = res.headers.get('content-type') || '';
const contentEnc = res.headers.get('content-encoding') || '';
const contentLen = Number(res.headers.get('content-length') ?? 0);

if (
contentType &&
!/text\/csv|text\/plain|application\/gzip|application\/octet-stream/i.test(
contentType,
)
) {
console.warn(`Warning: Content-Type is ${contentType}, expected CSV-ish`);
}

if (maxBytes && contentLen && contentLen > maxBytes) {
throw new Error(
`Remote file exceeds size limit (${contentLen} > ${maxBytes})`,
);
}

const looksGzip =
/\.gz($|\?)/i.test(url) ||
/gzip/i.test(contentEnc) ||
/application\/gzip/i.test(contentType);
const looksBr = /br/i.test(contentEnc) || /\.br($|\?)/i.test(url);

// WHATWG -> Node stream
const body = Readable.fromWeb(res.body as any);

// Optional size guard during stream
let seenBytes = 0;
if (maxBytes) {
body.on('data', (chunk: Buffer) => {
seenBytes += chunk.length;
if (seenBytes > maxBytes) {
controller.abort();
body.destroy(
new Error(
`Stream exceeded size limit (${seenBytes} > ${maxBytes})`,
),
);
}
});
}

// Build decode chain (gzip/brotli -> CSV parser)
const decompress = looksGzip
? createGunzip()
: looksBr
? createBrotliDecompress()
: null;

const parser = parse({
columns: true, // objects per row
bom: true, // handle UTF-8 BOM
relax_column_count: true,
skip_empty_lines: true,
});

// Wire the pipeline for proper backpressure & error propagation
(async () => {
try {
if (decompress) {
await pipeline(body, decompress, parser, {
signal: controller.signal,
});
} else {
await pipeline(body, parser, { signal: controller.signal });
}
} catch (e) {
parser.destroy(e as Error);
}
})().catch(() => {
/* handled by iterator */
});

let rows = 0;
try {
for await (const record of parser) {
rows++;
if (maxRows && rows > maxRows) {
controller.abort();
throw new Error(`Row limit exceeded (${rows} > ${maxRows})`);
}
yield record as UmamiRawEvent;
}
} catch (err) {
throw new Error(
`Failed to parse remote file from ${url}: ${
err instanceof Error ? err.message : String(err)
}`,
);
} finally {
controller.abort(); // ensure fetch stream is torn down
}

⚠️ Potential issue | 🔴 Critical

Add SSRF defenses before fetching user URLs. parseRemoteFile accepts arbitrary fileUrl and streams it without DNS/IP validation. Attackers can pivot to internal networks or metadata services. Resolve the hostname, block private/link-local/loopback ranges (10/8, 172.16/12, 192.168/16, 127/8, ::1, fc00::/7, fe80::/10), and recheck after redirects—alternatively require signed/allowlisted URLs. Abort the fetch if any resolved address is disallowed.

Comment on lines +213 to +220
const deviceId =
rawEvent.visit_id ||
generateDeviceId({
ip: rawEvent.visit_id!,
ua: rawEvent.visit_id!,
origin: projectId,
salt: 'xxx',
});

⚠️ Potential issue | 🔴 Critical

Fix device ID fallback. Falling back to generateDeviceId with rawEvent.visit_id! for both ip and ua breaks when visit_id is absent (runtime undefined) and produces unstable identifiers even when present. Prefer existing identifiers: visit_id, distinct_id, session_id, or event_id, and only hash real IP/UA values if you have them.

Comment on lines +223 to +233
// Parse URL if available - use same logic as real-time events
const url = rawEvent.url_path
? `https://${[rawEvent.hostname, rawEvent.url_path, rawEvent.url_query]
.filter(Boolean)
.join('')}`
: '';
const { path, hash, query, origin } = parsePath(url);
// Extract referrer information - use same logic as real-time events
const referrerUrl = rawEvent.referrer_domain
? `https://${rawEvent.referrer_domain}${rawEvent.referrer_path || ''}`
: '';

⚠️ Potential issue | 🟠 Major

Harden URL assembly. Building url via https://${[hostname, path, query].join('')} omits the leading ? and produces invalid strings when hostname is missing. Require hostname before creating an absolute URL, preserve the leading slash on url_path, and normalize url_query so it starts with ?. Otherwise hand parsePath the relative path.

🤖 Prompt for AI Agents
In packages/importer/src/providers/umami.ts around lines 223–233, the URL
assembly is fragile: only build an absolute URL when rawEvent.hostname exists;
preserve a leading slash on rawEvent.url_path (prepend '/' if non-empty and
missing), normalize rawEvent.url_query so it either is empty or starts with '?'
(strip any leading '?' then prefix one), then assemble origin as
`https://${hostname}` and join origin + path + query to pass to parsePath; if
hostname is missing, pass a relative path to parsePath by combining the
normalized path and query (no origin). Apply the same hostname-check and path
normalization for the referrer (require rawEvent.referrer_domain, ensure
referrer_path starts with '/', normalize any referrer query similarly) so
parsePath always receives a well-formed absolute URL or a correctly formed
relative path.

Comment on lines +257 to +296
let properties: Record<string, any> = {};

if (query) {
properties.__query = query;
}

// Add useful properties from Umami data
if (rawEvent.page_title) properties.__title = rawEvent.page_title;
if (rawEvent.screen) properties.__screen = rawEvent.screen;
if (rawEvent.language) properties.__language = rawEvent.language;
if (rawEvent.utm_source)
properties = assocPath(
['__query', 'utm_source'],
rawEvent.utm_source,
properties,
);
if (rawEvent.utm_medium)
properties = assocPath(
['__query', 'utm_medium'],
rawEvent.utm_medium,
properties,
);
if (rawEvent.utm_campaign)
properties = assocPath(
['__query', 'utm_campaign'],
rawEvent.utm_campaign,
properties,
);
if (rawEvent.utm_content)
properties = assocPath(
['__query', 'utm_content'],
rawEvent.utm_content,
properties,
);
if (rawEvent.utm_term)
properties = assocPath(
['__query', 'utm_term'],
rawEvent.utm_term,
properties,
);

⚠️ Potential issue | 🔴 Critical

Flatten properties instead of nesting objects. Storing properties.__query = query plus assocPath(['__query', …]) leaves nested objects. They serialize to [object Object] in ClickHouse, losing UTM data. Flatten the fields (e.g., properties[__query.${key}] = value) and keep all values primitive so insertImportBatch emits valid Map(String,String) rows.

🤖 Prompt for AI Agents
In packages/importer/src/providers/umami.ts around lines 257 to 296, the code
currently sets properties.__query = query and uses assocPath(['__query', ...])
which creates nested objects that serialize poorly for ClickHouse; replace this
by flattening query and UTM fields into primitive map entries (e.g. for each key
in query set properties[`__query.${key}`] = String(value)), remove assocPath and
any assignment that creates an object at properties.__query, and ensure all
assigned values are primitives (strings or numbers) so insertImportBatch emits
valid Map(String,String) rows.
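The flattening scheme suggested above could look like this (`flattenQuery` is a hypothetical helper illustrating the `__query.${key}` convention):

```typescript
// Flatten a query object into primitive map entries so every value
// serializes cleanly into a ClickHouse Map(String, String) column.
function flattenQuery(
  query: Record<string, unknown>,
  target: Record<string, string> = {},
): Record<string, string> {
  for (const [key, value] of Object.entries(query)) {
    if (value !== undefined && value !== null) {
      target[`__query.${key}`] = String(value);
    }
  }
  return target;
}
```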

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (4)
packages/trpc/src/routers/import.ts (4)

12-32: Config field contains sensitive credentials — restrict access.

The list procedure returns the full import records, including the config field, which likely contains provider credentials. This exposes secrets to any project member with read access.

The past review comment at lines 12-32 provides detailed remediation options (redact for read-level users or use Prisma select to exclude the field conditionally).


34-56: Config field exposure in get — apply same restrictions as list.

The get procedure returns the full import record, including sensitive config data, to any user with project access.

Refer to the past review comment at lines 34-57 for detailed remediation (conditional select/redaction based on role).


72-99: Wrap queue enqueue in try/catch to avoid orphaned pending records.

If importQueue.add fails, the import record remains in 'pending' status but no job exists in the queue. This leaves a dangling record that will never complete.

The past review comment at lines 72-99 provides a detailed fix with error handling, stable jobId, and retry configuration.


160-176: Wrap retry enqueue in try/catch.

Same issue as create: if importQueue.add fails, the import record will be updated to 'pending' but no job will exist, leaving it in a stuck state.

The past review comment at lines 160-176 provides a detailed fix with error handling.

🧹 Nitpick comments (2)
packages/trpc/src/routers/import.ts (2)

121-126: Consider wrapping job removal in try/catch.

If job.remove() fails (e.g., job not found or already completed), the entire delete operation will fail and the import record will remain in the database. Depending on requirements, you may want to proceed with record deletion even if job cleanup fails.

       if (importRecord.jobId) {
-        const job = await importQueue.getJob(importRecord.jobId);
-        if (job) {
-          await job.remove();
-        }
+        try {
+          const job = await importQueue.getJob(importRecord.jobId);
+          if (job) {
+            await job.remove();
+          }
+        } catch (err) {
+          // Log but don't block deletion if job cleanup fails
+          console.error('Failed to remove job:', err);
+        }
       }

66-66: Consider extracting repeated write-access check.

The pattern !access || (typeof access !== 'boolean' && access.level === 'read') is repeated in create, delete, and retry procedures (lines 66, 115, 149). Consider extracting to a helper function.

function canWrite(access: Awaited<ReturnType<typeof getProjectAccess>>): boolean {
  return !!access && (typeof access === 'boolean' || access.level !== 'read');
}

Then use:

-      if (!access || (typeof access !== 'boolean' && access.level === 'read')) {
+      if (!canWrite(access)) {
         throw TRPCAccessError(
           'You do not have permission to create imports for this project',
         );
       }

Also applies to: 115-115, 149-149

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2808bdb and 2f71c57.

📒 Files selected for processing (1)
  • packages/trpc/src/routers/import.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
packages/trpc/src/routers/import.ts (5)
packages/trpc/src/trpc.ts (2)
  • createTRPCRouter (134-134)
  • protectedProcedure (166-170)
packages/trpc/src/errors.ts (1)
  • TRPCAccessError (3-7)
packages/db/src/prisma-client.ts (1)
  • db (276-276)
packages/validation/src/index.ts (1)
  • zCreateImport (478-486)
packages/queue/src/queues.ts (1)
  • importQueue (175-181)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-and-push-dashboard
  • GitHub Check: build-and-push-api
  • GitHub Check: build-and-push-worker
🔇 Additional comments (1)
packages/trpc/src/routers/import.ts (1)

15-15: Review comment is based on incorrect assumptions.

The getProjectAccess function is already cached. It's defined as export const getProjectAccess = cacheable('getProjectAccess', ...) in packages/db/src/services/access.service.ts. No separate getProjectAccessCached variant exists. The code is already using the cached version.

Likely an incorrect or invalid review comment.

Comment on lines +155 to +158
// Only allow retry for failed imports
if (importRecord.status !== 'failed') {
throw new Error('Only failed imports can be retried');
}

⚠️ Potential issue | 🟡 Minor

Use TRPCError for consistency.

The error thrown here uses a generic Error, while other access-related errors in this file use TRPCError. Use TRPCError with an appropriate code for consistency.

       // Only allow retry for failed imports
       if (importRecord.status !== 'failed') {
-        throw new Error('Only failed imports can be retried');
+        throw new TRPCError({
+          code: 'BAD_REQUEST',
+          message: 'Only failed imports can be retried',
+        });
       }

You'll need to add TRPCError to the imports:

-import { TRPCAccessError } from '../errors';
+import { TRPCError } from '@trpc/server';
+import { TRPCAccessError } from '../errors';
📝 Committable suggestion


Suggested change
-      // Only allow retry for failed imports
-      if (importRecord.status !== 'failed') {
-        throw new Error('Only failed imports can be retried');
-      }
+      // Only allow retry for failed imports
+      if (importRecord.status !== 'failed') {
+        throw new TRPCError({
+          code: 'BAD_REQUEST',
+          message: 'Only failed imports can be retried',
+        });
+      }
Suggested change (imports)
+import { TRPCError } from '@trpc/server';
+import { TRPCAccessError } from '../errors';
🤖 Prompt for AI Agents
In packages/trpc/src/routers/import.ts around lines 155 to 158, replace the
generic throw new Error('Only failed imports can be retried') with a TRPCError
so it matches other access-related errors; add an import for TRPCError from
'@trpc/server' at the top and throw new TRPCError({ code: 'BAD_REQUEST',
message: 'Only failed imports can be retried' }) instead.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

♻️ Duplicate comments (27)
packages/common/scripts/get-referrers.ts (2)

74-74: Fix incorrect output path.

This issue was already identified in a previous review. The path ../../worker/src/referrers/index.ts from packages/common/scripts/ doesn't match the expected location referenced in package.json scripts.


91-93: Error handling silently swallows failures.

This issue was already identified in a previous review. The catch block only logs errors without throwing or exiting with a non-zero code, which may cause silent failures in CI/CD pipelines.

packages/common/src/object.ts (1)

3-14: Previous review concern: misleading function names.

As flagged in the earlier review, these function names suggest full JSON validation but only check bracket/brace matching. A string like "{not valid json" would incorrectly pass isValidJsonString despite being invalid JSON.
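A full validity check needs a real parse rather than bracket counting; a minimal version (assuming the function keeps the name `isValidJsonString`) would be:

```typescript
// Bracket matching alone accepts strings like "{not valid json";
// JSON.parse is the authoritative check.
function isValidJsonString(input: string): boolean {
  try {
    JSON.parse(input);
    return true;
  } catch {
    return false;
  }
}
```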

packages/validation/src/index.ts (1)

478-486: Discriminate by provider to prevent mismatched payloads.

This issue was previously flagged but remains unresolved. The current schema allows provider and config to disagree (e.g., provider: 'umami' with a Mixpanel config), which can lead to runtime errors. Use a discriminated union on provider to ensure the config shape matches the selected provider.

Apply the suggested fix from the previous review:

-export const zCreateImport = z.object({
-  projectId: z.string().min(1),
-  provider: z.enum(['umami', 'plausible', 'mixpanel']),
-  config: z.union([
-    zUmamiImportConfig,
-    zPlausibleImportConfig,
-    zMixpanelImportConfig,
-  ]),
-});
+export const zCreateImport = z.discriminatedUnion('provider', [
+  z.object({
+    projectId: z.string().min(1),
+    provider: z.literal('umami'),
+    config: zUmamiImportConfig,
+  }),
+  z.object({
+    projectId: z.string().min(1),
+    provider: z.literal('plausible'),
+    config: zPlausibleImportConfig,
+  }),
+  z.object({
+    projectId: z.string().min(1),
+    provider: z.literal('mixpanel'),
+    config: zMixpanelImportConfig,
+  }),
+]);
apps/start/src/utils/math.ts (1)

1-1: Duplicate: Breaking change already flagged.

This issue was already identified in a previous review. The export path change from '@openpanel/common/src/math' to '@openpanel/common' removes math utilities and breaks consumers.

apps/start/src/utils/slug.ts (1)

1-1: Duplicate: Namespace pollution already flagged.

This issue was already identified in a previous review. The file exports the entire @openpanel/common package when it should only export the slug utility.

packages/db/prisma/schema.prisma (1)

472-498: Duplicate: Schema-migration inconsistency already flagged.

The inconsistency between currentStep being nullable in the schema (String?) but NOT NULL in the migration was already identified in a previous review.

packages/importer/package.json (2)

5-5: Duplicate: Main entry point issue already flagged.

The issue of main pointing to source TypeScript instead of compiled output was already identified in a previous review.


21-22: Duplicate: Ramda version mismatch already flagged.

The version mismatch between ramda runtime (^0.29.1) and @types/ramda (^0.31.1) was already identified in a previous review.

packages/db/src/services/import.service.ts (3)

272-272: Missing explicit import for randomUUID.

Lines 272 and 309 use crypto.randomUUID() without importing from node:crypto. While Node provides a global crypto object, explicit imports are clearer and match the Umami provider's pattern.

Add at the top:

 import type { ILogger } from '@openpanel/logger';
+import { randomUUID } from 'node:crypto';
 import sqlstring from 'sqlstring';

Then update usage:

-  id: crypto.randomUUID(),
+  id: randomUUID(),

Also applies to: Line 309


269-341: Avoid whole-second shifts for synthetic events.

Shifting created_at by 1000 ms (session_start) or 500 ms (session_end) risks colliding with real events, scrambling chronological order. Also note: Line 307 comment says "1 second after" but code uses 500ms—inconsistent.

Use millisecond offsets (±1 ms) to reduce collision probability:

- created_at: adjustTimestamp(session.first_timestamp, -1000), // 1 second before first event
+ created_at: adjustTimestamp(session.first_timestamp, -1), // 1ms before first event
- created_at: adjustTimestamp(session.last_timestamp, 500), // 1 second after last event
+ created_at: adjustTimestamp(session.last_timestamp, 1), // 1ms after last event

602-652: Derive status deterministically.

Using any(import_status) returns an arbitrary value when rows contain mixed statuses (e.g., some pending, some processed), making progress reporting unreliable.

Compute status explicitly using COUNTIf:

  SELECT 
    import_id,
    COUNT(*) as total_events,
    COUNTIf(import_status = 'pending') as pending_events,
    COUNTIf(import_status = 'processed') as processed_events,
-   any(import_status) as status
+   CASE
+     WHEN COUNTIf(import_status = 'failed') > 0 THEN 'failed'
+     WHEN COUNTIf(import_status = 'processed') = COUNT(*) THEN 'processed'
+     WHEN COUNTIf(import_status = 'pending') > 0 THEN 'processing'
+     ELSE 'pending'
+   END as status
  FROM ${TABLE_NAMES.events_imports}
  WHERE import_id = {importId:String} 
  AND name NOT IN ('session_start', 'session_end')
  GROUP BY import_id
packages/importer/src/providers/umami.ts (4)

79-198: SSRF vulnerability: arbitrary URLs fetched without validation.

The parseRemoteFile method fetches user-provided URLs without DNS/IP validation, allowing attackers to probe internal networks, metadata services (e.g., 169.254.169.254), or localhost endpoints.

Before calling fetch, resolve the hostname to its IP addresses and reject private/loopback/link-local ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8, ::1, fc00::/7, fe80::/10). Revalidate after redirects. Alternatively, require signed/allowlisted URLs or proxy downloads through a controlled service.


213-221: Fix device ID fallback logic.

Using rawEvent.visit_id! for both ip and ua parameters is incorrect and will throw when visit_id is undefined. This produces unstable/invalid device IDs.

Prefer a stable fallback hierarchy:

 const deviceId =
   rawEvent.visit_id ||
-  generateDeviceId({
-    ip: rawEvent.visit_id!,
-    ua: rawEvent.visit_id!,
-    origin: projectId,
-    salt: 'xxx',
-  });
+  rawEvent.distinct_id ||
+  rawEvent.session_id ||
+  rawEvent.event_id ||
+  randomUUID();

224-233: Harden URL and referrer construction.

URL assembly omits the leading ? for queries and can produce invalid strings when hostname is missing. Referrer construction ignores referrer_query entirely.

Apply proper URL normalization:

-const url = rawEvent.url_path
-  ? `https://${[rawEvent.hostname, rawEvent.url_path, rawEvent.url_query]
-      .filter(Boolean)
-      .join('')}`
-  : '';
+let url = '';
+if (rawEvent.hostname && rawEvent.url_path) {
+  const q = rawEvent.url_query
+    ? rawEvent.url_query.startsWith('?')
+      ? rawEvent.url_query
+      : `?${rawEvent.url_query}`
+    : '';
+  url = `https://${rawEvent.hostname}${rawEvent.url_path}${q}`;
+}
+
-const referrerUrl = rawEvent.referrer_domain
-  ? `https://${rawEvent.referrer_domain}${rawEvent.referrer_path || ''}`
-  : '';
+const refQuery = rawEvent.referrer_query
+  ? rawEvent.referrer_query.startsWith('?')
+    ? rawEvent.referrer_query
+    : `?${rawEvent.referrer_query}`
+  : '';
+const referrerUrl = rawEvent.referrer_domain
+  ? `https://${rawEvent.referrer_domain}${rawEvent.referrer_path || ''}${refQuery}`
+  : '';

257-297: Flatten properties instead of nesting objects.

Assigning properties.__query = query and using assocPath(['__query', ...]) creates nested objects that serialize to [object Object] in ClickHouse, losing all UTM and query data.

Flatten all fields to primitives:

-let properties: Record<string, any> = {};
-
-if (query) {
-  properties.__query = query;
-}
+const properties: Record<string, string | number | boolean | null | undefined> = {};
+if (query) {
+  for (const [k, v] of Object.entries(query)) {
+    properties[`__query.${k}`] = v as string;
+  }
+}
 
 if (rawEvent.page_title) properties.__title = rawEvent.page_title;
 if (rawEvent.screen) properties.__screen = rawEvent.screen;
 if (rawEvent.language) properties.__language = rawEvent.language;
-if (rawEvent.utm_source)
-  properties = assocPath(['__query', 'utm_source'], rawEvent.utm_source, properties);
-if (rawEvent.utm_medium)
-  properties = assocPath(['__query', 'utm_medium'], rawEvent.utm_medium, properties);
-if (rawEvent.utm_campaign)
-  properties = assocPath(['__query', 'utm_campaign'], rawEvent.utm_campaign, properties);
-if (rawEvent.utm_content)
-  properties = assocPath(['__query', 'utm_content'], rawEvent.utm_content, properties);
-if (rawEvent.utm_term)
-  properties = assocPath(['__query', 'utm_term'], rawEvent.utm_term, properties);
+if (rawEvent.utm_source) properties['__query.utm_source'] = rawEvent.utm_source;
+if (rawEvent.utm_medium) properties['__query.utm_medium'] = rawEvent.utm_medium;
+if (rawEvent.utm_campaign) properties['__query.utm_campaign'] = rawEvent.utm_campaign;
+if (rawEvent.utm_content) properties['__query.utm_content'] = rawEvent.utm_content;
+if (rawEvent.utm_term) properties['__query.utm_term'] = rawEvent.utm_term;
packages/db/prisma/migrations/20251018214030_fix/migration.sql (1)

10-12: Migration drops columns without data migration.

The migration drops provider, sourceLocation, and sourceType without migrating their data to the config JSON field. The import job handler (apps/worker/src/jobs/import.ts) expects config.provider to exist when processing imports, so existing records will fail.

Add a data migration step before dropping:

-- Migrate data to config field
UPDATE "public"."imports" 
SET "config" = jsonb_build_object(
  'provider', provider,
  'sourceType', "sourceType",
  'sourceLocation', "sourceLocation"
) || COALESCE("config", '{}'::jsonb)
WHERE provider IS NOT NULL OR "sourceType" IS NOT NULL OR "sourceLocation" IS NOT NULL;

-- Then drop the columns
ALTER TABLE "public"."imports" DROP COLUMN "provider",
DROP COLUMN "sourceLocation",
DROP COLUMN "sourceType",
ALTER COLUMN "config" DROP DEFAULT;
packages/importer/src/providers/mixpanel.ts (4)

134-134: Duplicate: Use configured logger instead of console.warn.

Lines 134 and 146 use console.warn, bypassing the structured logger. Replace with this.logger?.warn('Failed to parse Mixpanel event', { line, error }) for consistent logging.

Also applies to: 146-146


85-91: Duplicate: Sensitive credentials logged.

Line 90 logs the serviceAccount identifier. Remove it from the log metadata or replace with a boolean flag (e.g., serviceAccountPresent: true).


202-205: Duplicate: Incorrect user agent assignment.

Line 202 assigns props.osVersion to userAgent, but OS version is not the user agent string. Use props.$user_agent or a similar property that contains the actual UA string.


273-279: Duplicate: Inconsistent Mixpanel property names cause data loss.

Lines 274 and 276-279 use camelCase (props.$osVersion, props.$browserVersion) but Mixpanel's standard properties are snake_case ($os_version, $browser_version). Update to use the correct snake_case names and ensure the ternary condition matches the accessed property.

apps/start/src/modals/add-import.tsx (1)

245-249: Duplicate: Avoid leaking secrets in error paths/logs.

The error toast displays error.message which may expose upstream API details or secrets. Use a generic user-facing message and log the full error server-side only.

apps/worker/src/jobs/import.ts (5)

44-47: Duplicate: Logger context leaks config secrets.

The jobLogger child attaches the entire record.config which may contain sensitive credentials like serviceAccount and serviceSecret. Include only non-sensitive fields (e.g., provider, type).


142-188: Duplicate: Harden batch transform; avoid crashes on empty/failed transforms.

The batch buffer uses any and transformEvent is called without per-event error handling. A single transform failure will crash the entire job. Use a provider-agnostic unknown[] buffer and wrap each transform in try/catch.


147-154: Duplicate: Redact invalid events in logs.

Logging the full rawEvent object may expose PII or customer data. Log only safe metadata (provider, event ID/type, validation reason) instead of the complete payload.


225-225: Duplicate: Use jobLogger instead of console.log.

Switch to jobLogger.info('Generating session IDs', { from, importId, provider }) to maintain structured logging with context.


191-216: Duplicate: Harden the final partial batch as well.

The final partial batch also needs guarded per-event transforms and safe createdAt derivation to mirror the main batch processing logic.

🧹 Nitpick comments (5)
packages/common/scripts/get-referrers.ts (2)

10-47: Consider adding an explicit type annotation.

Adding a type annotation to extraReferrers would improve type safety and maintainability.

Apply this diff to add a type annotation:

+type ReferrerEntry = { type: string; name: string };
+
 // extras
-const extraReferrers = {
+const extraReferrers: Record<string, ReferrerEntry> = {
   'zoom.us': { type: 'social', name: 'Zoom' },

49-64: Replace any type with unknown for better type safety.

The data parameter uses any, which bypasses type checking. Using unknown would maintain safety while acknowledging the dynamic nature of the fetched data.

Apply this diff:

-function transform(data: any) {
+function transform(data: unknown) {
   const obj: Record<string, unknown> = {};
+  if (typeof data !== 'object' || data === null) {
+    return obj;
+  }
   for (const type in data) {
apps/start/src/utils/op.ts (1)

5-18: Consider adding debug logging for missing clientId.

The factory pattern with a Proxy fallback is a solid approach. However, when the clientId is missing or invalid, there's no indication in logs, which could make debugging difficult in production environments.

Additionally, the check for clientId === 'undefined' (string comparison) suggests that environment variables might be serialized as the string "undefined" rather than being truly undefined. This is a minor code smell that might warrant investigation at the environment configuration level.

Consider adding logging:

 const createOpInstance = () => {
   if (!clientId || clientId === 'undefined') {
+    console.warn('OpenPanel clientId is missing or invalid. Using no-op implementation.');
     return new Proxy({} as OpenPanel, {
       get: () => () => {},
     });
   }
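A self-contained sketch of the Proxy no-op fallback (the `AnalyticsClient` surface here is hypothetical; the real OpenPanel client exposes more methods):

```typescript
// Hypothetical minimal client surface; the real OpenPanel client has more methods.
type AnalyticsClient = {
  track: (name: string, props?: Record<string, unknown>) => void;
  identify: (profileId: string) => void;
};

function createOpInstance(clientId: string | undefined): AnalyticsClient {
  if (!clientId || clientId === 'undefined') {
    // Every property access yields a fresh no-op function, so call sites need no guards.
    return new Proxy({} as AnalyticsClient, {
      get: () => () => {},
    });
  }
  // Real construction elided; in the app this would be `new OpenPanel({ clientId, ... })`.
  return {
    track: (name) => console.log(`track ${name} for ${clientId}`),
    identify: (profileId) => console.log(`identify ${profileId}`),
  };
}

const op = createOpInstance(undefined);
op.track('page_view'); // silently does nothing, no crash
```

The upside of the Proxy over a hand-written stub is that new methods added to the interface are automatically covered.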
packages/db/src/services/import.service.ts (1)

553-554: Inconsistent parameter handling with sqlstring.escape.

Lines 553-554 use sqlstring.escape() for inline substitution, while the rest of the file uses ClickHouse's query_params mechanism (e.g., lines 122, 449, 583). This inconsistency reduces maintainability.

Refactor to use query_params for consistency:

  WHERE 
-   e.import_id = ${sqlstring.escape(importId)}
-   AND toDate(e.created_at) = ${sqlstring.escape(from)}
+   e.import_id = {importId:String}
+   AND toDate(e.created_at) = {from:String}
    AND e.session_id != ''
  GROUP BY e.session_id
`;

await ch.command({
  query: sessionsInsertQuery,
+ query_params: { importId, from },
  clickhouse_settings: {

Based on learnings (ClickHouse queries should use query builder/helper functions).
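For illustration, a sketch of the `query_params` shape with a stubbed client (the real `ch` comes from `@clickhouse/client`, and the query text here is abbreviated):

```typescript
// Stubbed ClickHouse client to show the query_params shape; the real `ch`
// comes from @clickhouse/client.
type CommandArgs = { query: string; query_params?: Record<string, unknown> };
const issued: CommandArgs[] = [];
const ch = {
  command: (args: CommandArgs) => {
    issued.push(args);
    return Promise.resolve();
  },
};

async function backfillSessions(importId: string, from: string) {
  // {name:Type} placeholders are bound server-side, so no manual escaping is needed.
  await ch.command({
    query: `
      SELECT e.session_id
      FROM events_imports e
      WHERE e.import_id = {importId:String}
        AND toDate(e.created_at) = {from:String}
        AND e.session_id != ''
      GROUP BY e.session_id`,
    query_params: { importId, from },
  });
}

await backfillSessions('imp_1', '2025-10-01');
```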

packages/trpc/src/routers/import.ts (1)

155-177: Retry flow should use TRPCError and catch enqueue failures.

retry should return a structured TRPC error (e.g., TRPCError({ code: 'BAD_REQUEST', ... })) instead of new Error, and it needs the same enqueue error handling as create to avoid leaving the UI thinking the retry succeeded when the queue add actually failed.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f71c57 and b86c7a4.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (76)
  • apps/api/package.json (1 hunks)
  • apps/start/Dockerfile (2 hunks)
  • apps/start/package.json (2 hunks)
  • apps/start/src/components/integrations/integration-card.tsx (1 hunks)
  • apps/start/src/components/project-mapper.tsx (1 hunks)
  • apps/start/src/components/ui/badge.tsx (1 hunks)
  • apps/start/src/modals/add-import.tsx (1 hunks)
  • apps/start/src/modals/date-ranger-picker.tsx (1 hunks)
  • apps/start/src/modals/index.tsx (2 hunks)
  • apps/start/src/routeTree.gen.ts (11 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx (1 hunks)
  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.tsx (1 hunks)
  • apps/start/src/utils/math.ts (1 hunks)
  • apps/start/src/utils/op.ts (1 hunks)
  • apps/start/src/utils/slug.ts (1 hunks)
  • apps/worker/Dockerfile (2 hunks)
  • apps/worker/package.json (3 hunks)
  • apps/worker/src/boot-workers.ts (4 hunks)
  • apps/worker/src/index.ts (2 hunks)
  • apps/worker/src/jobs/cron.delete-projects.ts (1 hunks)
  • apps/worker/src/jobs/events.incoming-event.ts (3 hunks)
  • apps/worker/src/jobs/events.incoming-events.test.ts (2 hunks)
  • apps/worker/src/jobs/import.ts (1 hunks)
  • package.json (1 hunks)
  • packages/cli/package.json (0 hunks)
  • packages/cli/src/cli.ts (0 hunks)
  • packages/cli/src/importer/importer.ts (0 hunks)
  • packages/cli/src/importer/index.ts (0 hunks)
  • packages/cli/tsconfig.json (0 hunks)
  • packages/cli/tsup.config.ts (0 hunks)
  • packages/common/package.json (1 hunks)
  • packages/common/scripts/get-referrers.ts (1 hunks)
  • packages/common/server/index.ts (1 hunks)
  • packages/common/server/parse-referrer.test.ts (5 hunks)
  • packages/common/server/parse-referrer.ts (4 hunks)
  • packages/common/server/parser-user-agent.ts (2 hunks)
  • packages/common/src/object.test.ts (1 hunks)
  • packages/common/src/object.ts (2 hunks)
  • packages/db/code-migrations/5-add-imports-table.ts (1 hunks)
  • packages/db/index.ts (2 hunks)
  • packages/db/package.json (1 hunks)
  • packages/db/prisma/migrations/20251018205153_add_import_table/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251018214030_fix/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251022191315_add_status_message/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251028103556_import_batch_and_status/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251028150123_fix_imports_table/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251028150655_fix_imports_again/migration.sql (1 hunks)
  • packages/db/prisma/migrations/20251028152531_imports_again/migration.sql (1 hunks)
  • packages/db/prisma/schema.prisma (2 hunks)
  • packages/db/src/buffers/base-buffer.ts (1 hunks)
  • packages/db/src/buffers/event-buffer.test.ts (8 hunks)
  • packages/db/src/clickhouse/client.ts (6 hunks)
  • packages/db/src/clickhouse/csv.ts (1 hunks)
  • packages/db/src/clickhouse/migration.ts (1 hunks)
  • packages/db/src/services/event.service.ts (1 hunks)
  • packages/db/src/services/import.service.ts (1 hunks)
  • packages/db/src/services/session.service.ts (1 hunks)
  • packages/db/src/types.ts (2 hunks)
  • packages/importer/package.json (1 hunks)
  • packages/importer/src/base-provider.ts (1 hunks)
  • packages/importer/src/index.ts (1 hunks)
  • packages/importer/src/providers/metadata.ts (1 hunks)
  • packages/importer/src/providers/mixpanel.test.ts (1 hunks)
  • packages/importer/src/providers/mixpanel.ts (1 hunks)
  • packages/importer/src/providers/umami.ts (1 hunks)
  • packages/importer/src/types.ts (1 hunks)
  • packages/importer/tsconfig.json (1 hunks)
  • packages/importer/vitest.config.ts (1 hunks)
  • packages/queue/src/queues.ts (1 hunks)
  • packages/redis/package.json (1 hunks)
  • packages/trpc/package.json (1 hunks)
  • packages/trpc/src/root.ts (2 hunks)
  • packages/trpc/src/routers/auth.ts (1 hunks)
  • packages/trpc/src/routers/import.ts (1 hunks)
  • packages/trpc/src/routers/organization.ts (1 hunks)
  • packages/validation/src/index.ts (1 hunks)
💤 Files with no reviewable changes (6)
  • packages/cli/tsup.config.ts
  • packages/cli/src/cli.ts
  • packages/cli/src/importer/index.ts
  • packages/cli/src/importer/importer.ts
  • packages/cli/tsconfig.json
  • packages/cli/package.json
🚧 Files skipped from review as they are similar to previous changes (37)
  • packages/db/src/services/event.service.ts
  • packages/common/package.json
  • packages/importer/tsconfig.json
  • packages/db/prisma/migrations/20251028152531_imports_again/migration.sql
  • packages/trpc/src/routers/auth.ts
  • packages/db/src/services/session.service.ts
  • apps/worker/src/jobs/events.incoming-events.test.ts
  • apps/start/src/components/integrations/integration-card.tsx
  • packages/common/server/index.ts
  • package.json
  • packages/db/src/buffers/event-buffer.test.ts
  • apps/start/src/components/project-mapper.tsx
  • packages/common/server/parser-user-agent.ts
  • packages/db/src/clickhouse/csv.ts
  • packages/db/prisma/migrations/20251022191315_add_status_message/migration.sql
  • packages/db/index.ts
  • apps/start/src/modals/date-ranger-picker.tsx
  • packages/db/code-migrations/5-add-imports-table.ts
  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.tsx
  • apps/start/src/components/ui/badge.tsx
  • packages/db/prisma/migrations/20251028150123_fix_imports_table/migration.sql
  • apps/worker/package.json
  • packages/trpc/src/routers/organization.ts
  • apps/start/src/modals/index.tsx
  • packages/redis/package.json
  • apps/worker/src/jobs/cron.delete-projects.ts
  • packages/common/src/object.test.ts
  • packages/db/prisma/migrations/20251018205153_add_import_table/migration.sql
  • apps/start/Dockerfile
  • packages/importer/src/providers/metadata.ts
  • packages/trpc/src/root.ts
  • packages/importer/vitest.config.ts
  • apps/api/package.json
  • packages/importer/src/index.ts
  • packages/db/src/types.ts
  • packages/importer/src/types.ts
  • packages/common/server/parse-referrer.ts
🧰 Additional context used
📓 Path-based instructions (1)
apps/start/**/*.{ts,tsx}

📄 CodeRabbit inference engine (apps/start/.cursorrules)

apps/start/**/*.{ts,tsx}: Instrument server functions created with createServerFn by wrapping their implementation in Sentry.startSpan and importing Sentry from '@sentry/tanstackstart-react'
When using Sentry in the TanStack Start React app, import it as: import * as Sentry from '@sentry/tanstackstart-react'

Files:

  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx
  • apps/start/src/utils/slug.ts
  • apps/start/src/utils/op.ts
  • apps/start/src/routeTree.gen.ts
  • apps/start/src/utils/math.ts
  • apps/start/src/modals/add-import.tsx
🧠 Learnings (9)
📚 Learning: 2025-10-16T10:29:28.924Z
Learnt from: CR
Repo: Openpanel-dev/openpanel PR: 0
File: apps/start/.cursorrules:0-0
Timestamp: 2025-10-16T10:29:28.924Z
Learning: Applies to apps/start/routes/**/*.tsx : Define React route files with createFileRoute and let the bundler/CLI manage the path string

Applied to files:

  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx
  • apps/start/src/routeTree.gen.ts
📚 Learning: 2025-10-16T10:29:28.924Z
Learnt from: CR
Repo: Openpanel-dev/openpanel PR: 0
File: apps/start/.cursorrules:0-0
Timestamp: 2025-10-16T10:29:28.924Z
Learning: Applies to apps/start/routes/__root.tsx : Declare the root route in routes/__root.tsx using createRootRoute or createRootRouteWithContext

Applied to files:

  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx
  • apps/start/src/routeTree.gen.ts
📚 Learning: 2025-10-16T10:29:28.924Z
Learnt from: CR
Repo: Openpanel-dev/openpanel PR: 0
File: apps/start/.cursorrules:0-0
Timestamp: 2025-10-16T10:29:28.924Z
Learning: Applies to apps/start/routes/**/_*.tsx : Use leading underscore filenames for pathless layout routes that wrap children without matching a URL segment

Applied to files:

  • apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx
  • apps/start/src/routeTree.gen.ts
📚 Learning: 2025-09-26T18:30:17.476Z
Learnt from: CR
Repo: Openpanel-dev/openpanel PR: 0
File: .cursorrules:0-0
Timestamp: 2025-09-26T18:30:17.476Z
Learning: When writing ClickHouse queries, use the helper functions in ./packages/db/src/clickhouse/query-functions.ts rather than crafting raw queries

Applied to files:

  • packages/db/src/clickhouse/migration.ts
  • packages/db/src/clickhouse/client.ts
  • packages/db/src/services/import.service.ts
  • packages/importer/src/providers/umami.ts
📚 Learning: 2025-09-26T18:30:17.476Z
Learnt from: CR
Repo: Openpanel-dev/openpanel PR: 0
File: .cursorrules:0-0
Timestamp: 2025-09-26T18:30:17.476Z
Learning: When writing ClickHouse queries, always construct them using the custom query builder in ./packages/db/src/clickhouse/query-builder.ts

Applied to files:

  • packages/db/src/clickhouse/client.ts
  • packages/db/src/services/import.service.ts
  • packages/importer/src/providers/umami.ts
📚 Learning: 2025-10-16T10:29:28.924Z
Learnt from: CR
Repo: Openpanel-dev/openpanel PR: 0
File: apps/start/.cursorrules:0-0
Timestamp: 2025-10-16T10:29:28.924Z
Learning: Applies to apps/start/routes/**/{index,*.index}.tsx : For index routes, use index.tsx or *.index.tsx files and ensure the createFileRoute path uses a trailing slash (e.g., '/posts/')

Applied to files:

  • apps/start/src/routeTree.gen.ts
📚 Learning: 2025-10-16T10:29:28.924Z
Learnt from: CR
Repo: Openpanel-dev/openpanel PR: 0
File: apps/start/.cursorrules:0-0
Timestamp: 2025-10-16T10:29:28.924Z
Learning: Applies to apps/start/routes/**/*$*.tsx : Name dynamic route files with $param (e.g., posts.$postId.tsx) to capture path parameters

Applied to files:

  • apps/start/src/routeTree.gen.ts
📚 Learning: 2025-10-16T10:29:28.924Z
Learnt from: CR
Repo: Openpanel-dev/openpanel PR: 0
File: apps/start/.cursorrules:0-0
Timestamp: 2025-10-16T10:29:28.924Z
Learning: Applies to apps/start/routes/**/$.tsx : Use $.tsx for splat (catch-all) routes to capture the remaining path into params._splat

Applied to files:

  • apps/start/src/routeTree.gen.ts
📚 Learning: 2025-10-16T10:29:28.924Z
Learnt from: CR
Repo: Openpanel-dev/openpanel PR: 0
File: apps/start/.cursorrules:0-0
Timestamp: 2025-10-16T10:29:28.924Z
Learning: Applies to apps/start/routes/**/*_.tsx : Use a trailing underscore in a segment filename to create a non-nested (un-nested) route (e.g., posts_.tsx)

Applied to files:

  • apps/start/src/routeTree.gen.ts
🧬 Code graph analysis (13)
apps/worker/src/boot-workers.ts (2)
packages/queue/src/queues.ts (1)
  • importQueue (175-181)
apps/worker/src/jobs/import.ts (1)
  • importJob (34-317)
apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx (3)
packages/importer/src/providers/metadata.ts (1)
  • IMPORT_PROVIDERS (13-30)
apps/public/components/ui/tooltip.tsx (1)
  • Tooltiper (36-51)
apps/start/src/components/integrations/integration-card.tsx (3)
  • IntegrationCard (91-112)
  • IntegrationCardLogoImage (50-69)
  • IntegrationCardFooter (3-15)
packages/queue/src/queues.ts (1)
packages/redis/redis.ts (1)
  • getRedisQueue (93-105)
packages/trpc/src/routers/import.ts (5)
packages/trpc/src/trpc.ts (2)
  • createTRPCRouter (134-134)
  • protectedProcedure (166-170)
packages/trpc/src/errors.ts (1)
  • TRPCAccessError (3-7)
packages/db/src/prisma-client.ts (1)
  • db (276-276)
packages/validation/src/index.ts (1)
  • zCreateImport (478-486)
packages/queue/src/queues.ts (1)
  • importQueue (175-181)
apps/worker/src/jobs/import.ts (7)
packages/queue/src/queues.ts (1)
  • ImportQueuePayload (168-173)
packages/db/src/services/import.service.ts (10)
  • ImportSteps (719-719)
  • updateImportStatus (721-784)
  • getImportDateBounds (658-682)
  • insertImportBatch (30-87)
  • generateSessionIds (94-130)
  • createSessionsStartEndEvents (138-348)
  • moveImportsToProduction (354-458)
  • backfillSessionsToProduction (460-569)
  • markImportComplete (574-593)
  • getImportProgress (598-653)
packages/db/src/clickhouse/client.ts (2)
  • formatClickhouseDate (241-252)
  • error (38-43)
packages/db/src/services/event.service.ts (1)
  • IClickhouseEvent (68-102)
packages/logger/index.ts (1)
  • ILogger (6-6)
packages/importer/src/providers/umami.ts (1)
  • UmamiProvider (59-382)
packages/importer/src/providers/mixpanel.ts (1)
  • MixpanelProvider (18-452)
apps/worker/src/index.ts (1)
packages/queue/src/queues.ts (1)
  • importQueue (175-181)
packages/common/server/parse-referrer.test.ts (1)
packages/common/server/parse-referrer.ts (2)
  • parseReferrer (17-26)
  • getReferrerWithQuery (28-66)
packages/importer/src/providers/mixpanel.test.ts (1)
packages/importer/src/providers/mixpanel.ts (1)
  • MixpanelProvider (18-452)
packages/importer/src/providers/mixpanel.ts (7)
packages/validation/src/index.ts (1)
  • IMixpanelImportConfig (471-471)
packages/logger/index.ts (1)
  • ILogger (6-6)
packages/db/src/services/event.service.ts (2)
  • IClickhouseEvent (68-102)
  • query (719-879)
packages/common/src/url.ts (2)
  • parsePath (11-49)
  • isSameDomain (51-63)
packages/common/server/parse-referrer.ts (2)
  • parseReferrer (17-26)
  • getReferrerWithQuery (28-66)
packages/common/server/parser-user-agent.ts (2)
  • parseUserAgent (72-115)
  • UserAgentInfo (71-71)
packages/common/src/object.ts (2)
  • toDots (16-67)
  • strip (79-79)
packages/importer/src/base-provider.ts (2)
packages/importer/src/types.ts (3)
  • BaseRawEvent (40-42)
  • ImportJobMetadata (69-73)
  • ErrorContext (45-51)
packages/db/src/services/event.service.ts (1)
  • IClickhouseEvent (68-102)
packages/db/src/services/import.service.ts (5)
packages/db/src/services/event.service.ts (1)
  • IClickhouseEvent (68-102)
packages/db/src/clickhouse/csv.ts (2)
  • csvEscapeField (42-53)
  • csvEscapeJson (15-38)
packages/db/src/clickhouse/client.ts (5)
  • formatClickhouseDate (241-252)
  • chInsertCSV (206-232)
  • TABLE_NAMES (46-58)
  • ch (129-166)
  • convertClickhouseDateToJs (271-273)
packages/logger/index.ts (1)
  • ILogger (6-6)
packages/db/src/prisma-client.ts (1)
  • db (276-276)
apps/start/src/modals/add-import.tsx (4)
apps/start/src/integrations/tanstack-query/root-provider.tsx (1)
  • Provider (66-84)
packages/validation/src/index.ts (5)
  • zUmamiImportConfig (452-454)
  • zMixpanelImportConfig (461-470)
  • IUmamiImportConfig (456-456)
  • IMixpanelImportConfig (471-471)
  • IImportConfig (473-476)
apps/start/src/components/project-mapper.tsx (1)
  • ProjectMapper (33-116)
apps/public/lib/utils.ts (1)
  • cn (4-6)
packages/importer/src/providers/umami.ts (5)
packages/validation/src/index.ts (1)
  • IUmamiImportConfig (456-456)
packages/logger/index.ts (1)
  • ILogger (6-6)
packages/db/src/services/event.service.ts (2)
  • IClickhouseEvent (68-102)
  • query (719-879)
packages/common/src/url.ts (2)
  • parsePath (11-49)
  • isSameDomain (51-63)
packages/common/server/parse-referrer.ts (2)
  • parseReferrer (17-26)
  • getReferrerWithQuery (28-66)
🔇 Additional comments (20)
packages/db/src/buffers/base-buffer.ts (1)

1-1: LGTM! Import path refactor looks good.

The consolidation of generateSecureId import from a specific subpath to the main @openpanel/common/server module is a clean refactor that aligns with the broader import path restructuring mentioned in the PR.

packages/common/scripts/get-referrers.ts (1)

1-7: LGTM!

The ESM pattern for deriving __dirname is correct and the imports are appropriate for the script's needs.

apps/start/src/utils/op.ts (2)

3-4: LGTM!

Clean extraction of the environment variable into a local constant.


20-20: LGTM!

Module-level initialization is appropriate for this singleton analytics instance.

packages/trpc/package.json (1)

19-19: LGTM!

The addition of @openpanel/queue dependency is correct and follows the monorepo workspace pattern. This enables the TRPC import router to interact with the import queue.

packages/db/src/clickhouse/migration.ts (1)

118-132: LGTM!

The modifyTTL function correctly follows the established pattern for ClickHouse migration helpers. It properly handles both clustered and non-clustered scenarios, uses the replicated() helper appropriately, and maintains consistency with other migration functions in this file.

apps/worker/Dockerfile (1)

38-38: LGTM!

The importer package is correctly integrated into the Docker build process. The changes follow the established pattern: copying package.json for dependency resolution (line 38) and the full package in the final image (line 87).

Also applies to: 87-87

packages/importer/src/base-provider.ts (1)

1-121: LGTM! Well-designed base class for import providers.

The abstract class provides a clean, extensible foundation for import providers:

  • Clear separation between required abstract methods and optional hooks
  • Async generator pattern for parseSource enables efficient streaming of large datasets
  • Comprehensive lifecycle hooks for customization
  • getDateChunks utility correctly handles date range splitting with proper boundary logic
  • Excellent documentation throughout
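To illustrate the boundary logic, a hypothetical re-implementation of the date-chunking utility (the real `getDateChunks` signature in `base-provider.ts` may differ):

```typescript
// Illustrative re-implementation of a date-chunking helper: splits [from, to)
// into fixed-size windows, clamping the final chunk to the overall end date.
function getDateChunks(
  from: Date,
  to: Date,
  chunkDays: number,
): Array<{ start: Date; end: Date }> {
  const chunks: Array<{ start: Date; end: Date }> = [];
  const msPerDay = 24 * 60 * 60 * 1000;
  let cursor = from.getTime();
  while (cursor < to.getTime()) {
    const end = Math.min(cursor + chunkDays * msPerDay, to.getTime());
    chunks.push({ start: new Date(cursor), end: new Date(end) });
    cursor = end;
  }
  return chunks;
}

const chunks = getDateChunks(new Date('2025-01-01'), new Date('2025-01-10'), 4);
// 9 days split into 4 + 4 + 1 → three chunks, last one clamped to Jan 10
```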
packages/db/package.json (1)

16-16: LGTM: ClickHouse client version updated to support new features.

The update from ^1.2.0 to ^1.12.1 aligns with enhanced ClickHouse integration (events_imports table, improved insert settings, etc.) introduced in this PR.

packages/db/prisma/migrations/20251028150655_fix_imports_again/migration.sql (1)

1-2: LGTM: Makes currentStep nullable for flexible import state tracking.

Dropping the NOT NULL constraint on currentStep allows imports to exist without an explicit step, supporting initialization states in the import lifecycle.

apps/worker/src/index.ts (1)

8-8: LGTM: Import queue integrated into Bull Board.

The importQueue is properly imported and wired into the Bull Board UI for monitoring import jobs.

Also applies to: 42

apps/start/package.json (2)

11-11: LGTM: Build script now loads environment variables.

Updating the build script to use pnpm with-env ensures environment variables are available during the build process, consistent with other scripts in the file.


33-33: LGTM: Importer package added to support new UI.

The @openpanel/importer workspace dependency enables the new Imports UI and modal implementations in the start app.

packages/db/prisma/migrations/20251028103556_import_batch_and_status/migration.sql (1)

1-3: LGTM: Adds batch and step tracking for import lifecycle.

The currentBatch and currentStep columns support multi-stage import processing, enabling progress tracking and debugging across import batches.

packages/importer/src/providers/mixpanel.test.ts (1)

5-318: LGTM! Comprehensive test coverage.

The test suite thoroughly covers Mixpanel provider functionality including date chunking, event transformation, property flattening, and React Native referrer handling. The test expectations are well-structured and validate the provider's behavior across different scenarios.

packages/db/src/clickhouse/client.ts (4)

1-1: LGTM! Appropriate updates for import operations.

The increased timeouts (300s request timeout, 60s idle socket TTL) and addition of the events_imports table support long-running import jobs without connection issues.

Also applies to: 57-57, 62-62, 65-65


89-127: LGTM! Solid retry logic.

Exporting withRetry and wrapping operations with exponential backoff provides resilience against transient connection failures. The retry conditions (Connect errors, socket hang up, Timeout) are well-chosen.
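A minimal sketch of this pattern (the delays and the retryable-error predicate below are illustrative, not the exact values in `client.ts`):

```typescript
// Retry with exponential backoff: base delay, then 2x, 4x, ... until maxAttempts.
async function withRetry<T>(
  fn: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts || !isRetryable(err)) throw err;
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}

// Simulate two transient failures followed by success.
let calls = 0;
const result = await withRetry(
  async () => {
    calls += 1;
    if (calls < 3) throw new Error('socket hang up');
    return 'ok';
  },
  (err) => err instanceof Error && /Connect|socket hang up|Timeout/.test(err.message),
  3,
  1, // tiny base delay so the demo finishes quickly
);
```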


134-162: LGTM! Proper settings for large inserts.

The retry wrapper and comprehensive clickhouse_settings (async_insert, increased timeouts, progress headers) ensure large CSV/row streaming completes successfully even under load.


206-232: LGTM! Clean CSV insert implementation.

The chInsertCSV function correctly uses a Readable stream in binary mode with appropriate CSV format settings, mirrors the existing EventBuffer pattern, and includes proper logging.

apps/start/src/routeTree.gen.ts (1)

1-1700: Auto-generated route tree file.

This file is automatically generated by TanStack Router and should not be manually edited. The additions correctly wire the new imports route into the settings tabs structure.

Comment on lines +265 to +276
<TableCell>
<Tooltiper
content={
<pre className="font-mono text-sm leading-normal whitespace-pre-wrap break-words">
{JSON.stringify(imp.config, null, 2)}
</pre>
}
tooltipClassName="max-w-xs"
>
<Badge>Config</Badge>
</Tooltiper>
</TableCell>

⚠️ Potential issue | 🟠 Major

Redact import configs before rendering.

The tooltip currently stringifies imp.config verbatim, exposing stored provider credentials to anyone with page access. Sanitize/redact known secret keys (API keys, tokens, passwords) or limit visibility based on role before displaying.

🤖 Prompt for AI Agents
In
apps/start/src/routes/_app.$organizationId.$projectId_.settings._tabs.imports.tsx
around lines 265 to 276, the tooltip directly stringifies imp.config and may
expose secrets; fix by sanitizing the config before rendering: implement or call
a small sanitizer that scans known secret keys (e.g., apiKey, api_key, token,
access_token, secret, clientSecret, password, passwd, auth, private_key) and
replaces their values with a placeholder like "REDACTED" (or strips them), then
JSON.stringify that sanitized object for the tooltip; additionally gate full
config visibility by user role/permission — if the current user lacks an
elevated role, always show the sanitized version. Ensure the sanitizer is
deterministic and runs synchronously before passing content to Tooltiper.

Comment on lines +35 to +37
if (value === undefined || value === null || value === '') {
return acc;
}

⚠️ Potential issue | 🟡 Minor

Filtering empty strings may cause unintended data loss.

The new check at line 35 filters out empty strings (value === ''), which could silently drop properties where an empty string is a legitimate value. This might be a breaking change if previous behavior preserved empty strings as valid data.

For example:

  • Input: {name: "", age: 25}
  • Result: Only age is preserved; name is lost entirely

If this filtering is intentional for the import use case, consider documenting this behavior or making it configurable.

🤖 Prompt for AI Agents
In packages/common/src/object.ts around lines 35 to 37, the added condition
checking value === '' will drop empty-string properties which may be valid data;
either remove the value === '' check to preserve empty strings, or make this
behavior explicit/configurable (e.g., add an options flag keepEmptyStrings or a
separate filter function) and document the change in code comments and relevant
docs so callers opt into stripping empty strings; update unit tests to cover
both behaviors.
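A hedged sketch of the opt-in flag (heavily simplified; the real `toDots` also handles nested JSON strings, among other cases):

```typescript
// Minimal flattening sketch with an opt-in keepEmptyStrings flag.
function toDots(
  obj: Record<string, unknown>,
  path = '',
  opts: { keepEmptyStrings?: boolean } = {},
): Record<string, unknown> {
  return Object.entries(obj).reduce<Record<string, unknown>>((acc, [key, value]) => {
    if (value === undefined || value === null) return acc;
    // Empty strings are dropped unless the caller opts in to keeping them.
    if (value === '' && !opts.keepEmptyStrings) return acc;
    if (typeof value === 'object' && !Array.isArray(value)) {
      return { ...acc, ...toDots(value as Record<string, unknown>, `${path}${key}.`, opts) };
    }
    acc[`${path}${key}`] = value;
    return acc;
  }, {});
}

const stripped = toDots({ name: '', age: 25 });
const kept = toDots({ name: '', age: 25 }, '', { keepEmptyStrings: true });
```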

Comment on lines +39 to +55
if (typeof value === 'string' && isMalformedJsonString(value)) {
// Skip it
return acc;
}

// Fix nested json strings - but catch parse errors for malformed JSON
if (typeof value === 'string' && isValidJsonString(value)) {
try {
return {
...acc,
...toDots(JSON.parse(value), `${path}${key}.`),
};
} catch {
// Skip it
return acc;
}
}

⚠️ Potential issue | 🟡 Minor

Silent error handling makes debugging difficult.

While the earlier review flagged the data loss risk from skipping malformed JSON (lines 39-42), there's an additional concern: the catch block at lines 51-54 silently drops values that fail JSON parsing without any logging or warning.

When legitimate data is skipped due to parse failures, users have no visibility into what was lost or why. This makes troubleshooting import issues very difficult.

Consider adding logging to provide visibility:

       } catch {
-        // Skip it
+        // Log parse failure for visibility
+        console.warn(`Failed to parse JSON string for key "${path}${key}": ${value.substring(0, 50)}...`);
         return acc;
       }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (typeof value === 'string' && isMalformedJsonString(value)) {
// Skip it
return acc;
}
// Fix nested json strings - but catch parse errors for malformed JSON
if (typeof value === 'string' && isValidJsonString(value)) {
try {
return {
...acc,
...toDots(JSON.parse(value), `${path}${key}.`),
};
} catch {
// Skip it
return acc;
}
}
if (typeof value === 'string' && isMalformedJsonString(value)) {
// Skip it
return acc;
}
// Fix nested json strings - but catch parse errors for malformed JSON
if (typeof value === 'string' && isValidJsonString(value)) {
try {
return {
...acc,
...toDots(JSON.parse(value), `${path}${key}.`),
};
} catch {
// Log parse failure for visibility
console.warn(`Failed to parse JSON string for key "${path}${key}": ${value.substring(0, 50)}...`);
return acc;
}
}
🤖 Prompt for AI Agents
In packages/common/src/object.ts around lines 39 to 55, the catch block silently
swallows JSON.parse errors when attempting to fix nested JSON strings; update
the catch to log the error and the offending value/path using the module's
logger (or accept a logger param) before returning acc so failures are visible,
include contextual info (error message, value, and path/key) and avoid changing
existing control flow.

Comment on lines +24 to +56
return db.import.findMany({
where: {
projectId: input.projectId,
},
orderBy: {
createdAt: 'desc',
},
});
}),

get: protectedProcedure
.input(z.object({ id: z.string() }))
.query(async ({ input, ctx }) => {
const importRecord = await db.import.findUniqueOrThrow({
where: {
id: input.id,
},
include: {
project: true,
},
});

const access = await getProjectAccess({
projectId: importRecord.projectId,
userId: ctx.session.userId,
});

if (!access) {
throw TRPCAccessError('You do not have access to this import');
}

return importRecord;
}),

⚠️ Potential issue | 🟠 Major

Restrict config exposure in list/get.

Both list and get return the full config payload to any member with read access, which leaks provider credentials (API keys, secrets, etc.). Gate secrets by role or redact sensitive fields before returning the response.

🤖 Prompt for AI Agents
In packages/trpc/src/routers/import.ts around lines 24 to 56, both the list and
get handlers currently return the full import.config payload which may contain
provider credentials; restrict exposure by either checking the caller's project
role and only returning full config for privileged roles (owner/admin), or by
redacting sensitive fields (apiKey, secret, token, password, privateKey, etc.)
from the config before returning for non‑privileged readers; apply the same
redaction/role check in both the findMany and findUnique responses (or transform
the returned objects) and update any return types to reflect the reduced config
shape.
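A sketch of such a sanitizer (the secret-key list is an assumption, not the project's actual set of sensitive fields):

```typescript
// Recursively replace values of known secret-looking keys with a placeholder.
// The key list is illustrative; extend it to match the providers' config shapes.
const SECRET_KEYS =
  /^(api[_-]?key|token|access[_-]?token|secret|client[_-]?secret|password|passwd|auth|private[_-]?key|service(Account|Secret))$/i;

function redactConfig(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redactConfig);
  if (value && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) =>
        SECRET_KEYS.test(k) ? [k, 'REDACTED'] : [k, redactConfig(v)],
      ),
    );
  }
  return value;
}

const safe = redactConfig({
  provider: 'mixpanel',
  serviceSecret: 'abc',
  nested: { apiKey: 'x' },
});
```

Running the redaction in both `list` and `get` before the response leaves the router keeps the shape of the config intact while hiding the values.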

Comment on lines +72 to +99
// Create import record
const importRecord = await db.import.create({
data: {
projectId: input.projectId,
config: input.config,
status: 'pending',
},
});

// Add job to queue
const job = await importQueue.add('import', {
type: 'import',
payload: {
importId: importRecord.id,
},
});

// Update import record with job ID
await db.import.update({
where: { id: importRecord.id },
data: { jobId: job.id },
});

return {
...importRecord,
jobId: job.id,
};
}),

⚠️ Potential issue | 🟠 Major

Handle queue enqueue failures.

In create, if importQueue.add throws (Redis outage, connection drop), the record stays pending with no job attached. Wrap the enqueue in try/catch, update the record to failed (with errorMessage), and only persist pending/jobId after a successful add.


// Import configs
export const zUmamiImportConfig = createFileImportConfig('umami').extend({
projectMapper: z.array(zProjectMapper),

⚠️ Potential issue | 🟡 Minor

projectMapper should likely be optional.

The AI summary indicates that projectMapper should be optional, but the code doesn't include .optional(). Project mapping is typically not required for all imports.

Apply this diff to make it optional:

-  projectMapper: z.array(zProjectMapper),
+  projectMapper: z.array(zProjectMapper).optional(),
📝 Committable suggestion


Suggested change
projectMapper: z.array(zProjectMapper),
projectMapper: z.array(zProjectMapper).optional(),
🤖 Prompt for AI Agents
In packages/validation/src/index.ts around line 453, the schema currently
declares projectMapper as a required array (projectMapper:
z.array(zProjectMapper)), but it should be optional; change the schema to make
projectMapper optional by appending .optional() (i.e.,
z.array(zProjectMapper).optional()) so the field can be omitted during
validation.

@lindesvard lindesvard merged commit 212254d into main Nov 5, 2025
8 of 9 checks passed
@lindesvard lindesvard deleted the feature/import branch November 7, 2025 15:17
This was referenced Nov 7, 2025
@coderabbitai coderabbitai bot mentioned this pull request Nov 19, 2025