Skip to content

Analytics: Backend Tracking and Data Collection #318

@InfinityBowman

Description

@InfinityBowman

Problem

There's no analytics on the site. We have no visibility into how many people visit, which pages are popular, where visitors come from, or what devices they use. We can't tell if shared note links actually drive traffic.

Why not Umami/Plausible?

The current infrastructure is a single t3.small (2 vCPU, 2GB RAM) running PostgreSQL and the Hono API. Adding another service like Umami (~500MB RAM) would be tight. Plausible Cloud is paid. Since we already have PostgreSQL and a Hono API, we can build a lightweight analytics system directly into the existing stack with zero additional infrastructure cost.

Goal

Build the backend half of a privacy-friendly analytics system:

  1. A page_view table in PostgreSQL
  2. A lightweight tracking endpoint (POST /api/analytics/pageview)
  3. Admin-only query endpoints (GET /api/admin/analytics/summary, GET /api/admin/analytics/timeseries)
  4. A client-side usePageView hook that fires on route changes
  5. A data retention cleanup mechanism

This issue covers everything needed to collect and query analytics data. The companion issue (#319) covers the admin dashboard UI that displays it.

Implementation

1. Add the database table

In packages/api/src/db/schema.ts (alongside the existing user, session, account, verification, note, media, audio, and comment tables):

export const pageView = pgTable('page_view', {
  id: text('id').primaryKey().$defaultFn(() => crypto.randomUUID()),
  // What was visited
  path: text('path').notNull(),               // e.g., '/notes/abc123', '/stories'
  pageTitle: text('page_title'),              // e.g., 'Field Notes from Chicago' (from document.title)
  referrer: text('referrer'),                  // e.g., 'https://google.com', null for direct
  // UTM campaign tracking (parsed from URL query string)
  utmSource: text('utm_source'),              // e.g., 'twitter', 'newsletter'
  utmMedium: text('utm_medium'),              // e.g., 'social', 'email'
  utmCampaign: text('utm_campaign'),          // e.g., 'spring-2026-launch'
  // Visitor context (no personal data)
  browser: text('browser'),                    // parsed from User-Agent: 'Chrome', 'Firefox', etc.
  os: text('os'),                              // parsed from User-Agent: 'Windows', 'macOS', etc.
  device: text('device'),                      // 'desktop', 'mobile', 'tablet'
  screenWidth: text('screen_width'),           // bucketed: 'sm' (<640), 'md' (640-1024), 'lg' (1024-1440), 'xl' (1440+)
  language: text('language'),                  // e.g., 'en-US' (from navigator.language)
  // Deduplication
  sessionHash: text('session_hash'),           // hash of IP + User-Agent + date (not stored raw)
  // Timestamp
  createdAt: timestamp('created_at', { withTimezone: true }).notNull().defaultNow(),
}, (table) => [
  index('page_view_path_idx').on(table.path),
  index('page_view_created_at_idx').on(table.createdAt),
  index('page_view_session_hash_idx').on(table.sessionHash),
  index('page_view_utm_source_idx').on(table.utmSource),
]);

Then generate and apply the migration:

pnpm api:db:generate
pnpm api:db:migrate

What each column is for:

  • pageTitle -- document.title at the time of the page view. Makes the "top pages" table human-readable (e.g., "Field Notes from Chicago" instead of /notes/a3f8b2c1).
  • utmSource, utmMedium, utmCampaign -- parsed from URL query parameters. Tracks which shared links drive traffic (e.g., a note link shared on Twitter with ?utm_source=twitter). The share button can automatically append UTM params to copied URLs.
  • screenWidth -- bucketed into sm / md / lg / xl from window.innerWidth. More useful than just mobile / desktop.
  • language -- navigator.language (e.g., en-US). Useful if i18n ever becomes relevant.
  • sessionHash -- one-way hash of IP + User-Agent + today's date. Approximates daily unique visitors without storing identifying info. Resets every day.

2. Add the tracking endpoint

Create packages/api/src/routes/analytics.ts. This is a public endpoint (no auth required -- it tracks anonymous page views):

import { OpenAPIHono, createRoute, z } from '@hono/zod-openapi';
import { pageView } from '../db/schema';
import { createHash } from 'crypto';
import type { AppEnv } from '../types';
import { getDb } from './helpers';

const trackPageViewRoute = createRoute({
  method: 'post',
  path: '/pageview',
  tags: ['Analytics'],
  request: {
    body: {
      content: {
        'application/json': {
          schema: z.object({
            path: z.string().max(500),
            pageTitle: z.string().max(200).optional(),
            referrer: z.string().max(500).optional(),
            utmSource: z.string().max(100).optional(),
            utmMedium: z.string().max(100).optional(),
            utmCampaign: z.string().max(200).optional(),
            screenWidth: z.enum(['sm', 'md', 'lg', 'xl']).optional(),
            language: z.string().max(10).optional(),
          }),
        },
      },
    },
  },
  responses: {
    202: {
      content: { 'application/json': { schema: z.object({ ok: z.boolean() }) } },
      description: 'Page view recorded',
    },
    400: {
      content: { 'application/json': { schema: z.object({ error: z.string() }) } },
      description: 'Invalid request',
    },
  },
});

export const analyticsRoutes = new OpenAPIHono<AppEnv>()
  .openapi(trackPageViewRoute, async (c) => {
    const db = getDb(c);
    const body = c.req.valid('json');

    const ip = c.req.header('x-forwarded-for') || 'unknown';
    const ua = c.req.header('user-agent') || '';
    const today = new Date().toISOString().split('T')[0];

    // One-way hash for daily unique visitor approximation
    const sessionHash = createHash('sha256')
      .update(`${ip}:${ua}:${today}`)
      .digest('hex')
      .slice(0, 16);

    // Parse User-Agent for browser/OS/device (simple regex, no library needed)
    const browser = parseBrowser(ua);
    const os = parseOS(ua);
    const device = parseDevice(ua);

    await db.insert(pageView).values({
      path: body.path,
      pageTitle: body.pageTitle || null,
      referrer: body.referrer || null,
      utmSource: body.utmSource || null,
      utmMedium: body.utmMedium || null,
      utmCampaign: body.utmCampaign || null,
      browser,
      os,
      device,
      screenWidth: body.screenWidth || null,
      language: body.language || null,
      sessionHash,
    });

    return c.json({ ok: true }, 202);
  });

Register it in packages/api/src/routes/index.ts:

import { analyticsRoutes } from './analytics';

export const routes = new OpenAPIHono<AppEnv>()
  .route('/', healthRoutes)
  .route('/users', userRoutes)
  .route('/admin', adminRoutes)
  .route('/notes', noteRoutes)
  .route('/comments', commentRoutes)
  .route('/analytics', analyticsRoutes);   // <-- add this

The User-Agent parsing functions (parseBrowser, parseOS, parseDevice) can be simple regex-based -- no need for a library. A ~30 line utility covers the common cases (Chrome, Firefox, Safari, Edge, mobile vs desktop).

3. Add admin query endpoints

Add analytics query routes to packages/api/src/routes/admin.ts. These use the existing requireAuth and requireAdmin middleware already in that file. Add Zod schemas to packages/shared/src/schemas/ (a new analytics.ts file alongside the existing admin.ts, note.ts, etc.).

// GET /api/admin/analytics/summary?days=30
// Returns: { totalViews, uniqueVisitors, topPages, topReferrers, topCampaigns, browsers, devices, screenWidths, languages }

// GET /api/admin/analytics/timeseries?days=30
// Returns: [{ date, views, visitors }, ...]

SQL aggregations:

-- Page views per day
SELECT DATE(created_at) as date, COUNT(*) as views, COUNT(DISTINCT session_hash) as visitors
FROM page_view
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY DATE(created_at)
ORDER BY date;

-- Top pages (with human-readable titles)
SELECT path, MAX(page_title) as title, COUNT(*) as views, COUNT(DISTINCT session_hash) as visitors
FROM page_view
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY path
ORDER BY views DESC
LIMIT 20;

-- Top referrers
SELECT referrer, COUNT(*) as views
FROM page_view
WHERE referrer IS NOT NULL AND created_at > NOW() - INTERVAL '30 days'
GROUP BY referrer
ORDER BY views DESC
LIMIT 20;

-- UTM campaign breakdown
SELECT utm_source, utm_medium, utm_campaign, COUNT(*) as views, COUNT(DISTINCT session_hash) as visitors
FROM page_view
WHERE utm_source IS NOT NULL AND created_at > NOW() - INTERVAL '30 days'
GROUP BY utm_source, utm_medium, utm_campaign
ORDER BY views DESC
LIMIT 20;

-- Screen size breakdown
SELECT screen_width, COUNT(*) as views
FROM page_view
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY screen_width
ORDER BY views DESC;

-- Language breakdown
SELECT language, COUNT(*) as views
FROM page_view
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY language
ORDER BY views DESC
LIMIT 10;

4. Add the client-side tracking hook

Create packages/web/app/lib/hooks/usePageView.ts. This project uses TanStack Router (not Next.js), so use useLocation from @tanstack/react-router:

import { useLocation } from '@tanstack/react-router';
import { useEffect } from 'react';
import { API_URL } from '@/app/lib/services/api';

function getScreenWidthBucket(): string {
  const w = window.innerWidth;
  if (w < 640) return 'sm';
  if (w < 1024) return 'md';
  if (w < 1440) return 'lg';
  return 'xl';
}

function getUtmParams(): Record<string, string | null> {
  const params = new URLSearchParams(window.location.search);
  return {
    utmSource: params.get('utm_source'),
    utmMedium: params.get('utm_medium'),
    utmCampaign: params.get('utm_campaign'),
  };
}

export function usePageView() {
  const location = useLocation();

  useEffect(() => {
    // Don't track in development
    if (import.meta.env.DEV) return;

    const utm = getUtmParams();

    fetch(`${API_URL}/api/analytics/pageview`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        path: location.pathname,
        pageTitle: document.title,
        referrer: document.referrer || null,
        screenWidth: getScreenWidthBucket(),
        language: navigator.language,
        ...utm,
      }),
      keepalive: true,
    }).catch(() => {});
  }, [location.pathname]);
}

Wire it up in a client-side component that wraps the app (e.g., the root route's component or a provider component):

usePageView();

Note: API_URL is already defined in packages/web/app/lib/services/api.ts as import.meta.env.VITE_API_URL || 'http://localhost:3002' -- reuse that rather than defining a new env var.

5. Data retention

Add a cleanup script in packages/api/src/scripts/ (alongside the existing sync scripts) that deletes page views older than 90 days:

DELETE FROM page_view WHERE created_at < NOW() - INTERVAL '90 days';

This can be a simple script run via cron on the server, or a Hono route triggered by a scheduler.

Privacy Design

Concern How it's handled
IP address Never stored. Only used transiently to compute session hash.
Cookies None set.
User-Agent Parsed into coarse categories (browser, OS, device type). Raw string not stored.
Session tracking Daily hash of IP + UA + date. Changes every day, can't track across days.
Personal data None collected. No user IDs, no emails, no accounts linked.
UTM params Only stored if present in URL. No personal data -- just campaign metadata.
Language Coarse locale code only (e.g., en-US).
Screen size Bucketed into sm/md/lg/xl. Not the exact pixel width.
GDPR/CCPA Compliant by design -- no personal data processing.

How to Test

  1. Start the full stack: pnpm dev
  2. Send page views via curl:
    curl -X POST http://localhost:3002/api/analytics/pageview \
      -H 'Content-Type: application/json' \
      -d '{"path": "/stories", "pageTitle": "Stories", "referrer": "https://google.com", "screenWidth": "lg", "language": "en-US"}'
  3. Send one with UTM params:
    curl -X POST http://localhost:3002/api/analytics/pageview \
      -H 'Content-Type: application/json' \
      -d '{"path": "/notes/abc123", "pageTitle": "Field Notes", "utmSource": "twitter", "utmMedium": "social", "utmCampaign": "test"}'
  4. Query the admin endpoints (must be authenticated as admin):
    curl http://localhost:3002/api/admin/analytics/summary?days=30
    curl http://localhost:3002/api/admin/analytics/timeseries?days=30
  5. Verify responses contain correct aggregated data
  6. Verify no cookies are set by the tracking endpoint
  7. Verify the usePageView hook fires on route changes (check Network tab in browser)

Acceptance Criteria

  • page_view table added to packages/api/src/db/schema.ts with indexes
  • Drizzle migration generated and applied
  • POST /api/analytics/pageview endpoint accepts and stores page views (OpenAPIHono route in packages/api/src/routes/analytics.ts)
  • Analytics route registered in packages/api/src/routes/index.ts
  • Endpoint parses User-Agent into browser/OS/device (no raw UA stored)
  • Endpoint computes daily session hash (no IP stored)
  • All input fields validated via Zod schemas
  • GET /api/admin/analytics/summary returns aggregate stats (admin-only, uses existing requireAdmin middleware)
  • GET /api/admin/analytics/timeseries returns daily views/visitors (admin-only)
  • Zod response schemas added to packages/shared/src/schemas/analytics.ts
  • Client-side usePageView hook in packages/web/app/lib/hooks/usePageView.ts sends: path, page title, referrer, UTM params, screen width bucket, language
  • Hook uses TanStack Router's useLocation and the existing API_URL from app/lib/services/api.ts
  • Hook only fires in production (import.meta.env.DEV check)
  • Data retention cleanup script in packages/api/src/scripts/
  • No cookies set, no personal data stored
  • Existing tests pass

Files to Create/Modify

Create:

  • packages/api/src/routes/analytics.ts -- tracking endpoint + UA parser utilities
  • packages/shared/src/schemas/analytics.ts -- Zod schemas for analytics API responses
  • packages/web/app/lib/hooks/usePageView.ts -- client-side tracking hook
  • packages/api/src/scripts/cleanup-pageviews.ts -- data retention script

Modify:

  • packages/api/src/db/schema.ts -- add pageView table
  • packages/api/src/routes/index.ts -- register /analytics routes
  • packages/api/src/routes/admin.ts -- add analytics summary/timeseries query endpoints
  • packages/shared/src/schemas/index.ts -- export analytics schemas
  • Root layout or provider component -- wire up usePageView hook

References

Metadata

Metadata

Assignees

Labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions