-
Notifications
You must be signed in to change notification settings - Fork 9
Analytics: Backend Tracking and Data Collection #318
Description
Problem
There's no analytics on the site. We have no visibility into how many people visit, which pages are popular, where visitors come from, or what devices they use. We can't tell if shared note links actually drive traffic.
Why not Umami/Plausible?
The current infrastructure is a single t3.small (2 vCPU, 2GB RAM) running PostgreSQL and the Hono API. Adding another service like Umami (~500MB RAM) would be tight. Plausible Cloud is paid. Since we already have PostgreSQL and a Hono API, we can build a lightweight analytics system directly into the existing stack with zero additional infrastructure cost.
Goal
Build the backend half of a privacy-friendly analytics system:
- A
page_viewtable in PostgreSQL - A lightweight tracking endpoint (
POST /api/analytics/pageview) - Admin-only query endpoints (
GET /api/admin/analytics/summary,GET /api/admin/analytics/timeseries) - A client-side
usePageViewhook that fires on route changes - A data retention cleanup mechanism
This issue covers everything needed to collect and query analytics data. The companion issue (#319) covers the admin dashboard UI that displays it.
Implementation
1. Add the database table
In packages/api/src/db/schema.ts (alongside the existing user, session, account, verification, note, media, audio, and comment tables):
export const pageView = pgTable('page_view', {
id: text('id').primaryKey().$defaultFn(() => crypto.randomUUID()),
// What was visited
path: text('path').notNull(), // e.g., '/notes/abc123', '/stories'
pageTitle: text('page_title'), // e.g., 'Field Notes from Chicago' (from document.title)
referrer: text('referrer'), // e.g., 'https://google.com', null for direct
// UTM campaign tracking (parsed from URL query string)
utmSource: text('utm_source'), // e.g., 'twitter', 'newsletter'
utmMedium: text('utm_medium'), // e.g., 'social', 'email'
utmCampaign: text('utm_campaign'), // e.g., 'spring-2026-launch'
// Visitor context (no personal data)
browser: text('browser'), // parsed from User-Agent: 'Chrome', 'Firefox', etc.
os: text('os'), // parsed from User-Agent: 'Windows', 'macOS', etc.
device: text('device'), // 'desktop', 'mobile', 'tablet'
screenWidth: text('screen_width'), // bucketed: 'sm' (<640), 'md' (640-1024), 'lg' (1024-1440), 'xl' (1440+)
language: text('language'), // e.g., 'en-US' (from navigator.language)
// Deduplication
sessionHash: text('session_hash'), // hash of IP + User-Agent + date (not stored raw)
// Timestamp
createdAt: timestamp('created_at', { withTimezone: true }).notNull().defaultNow(),
}, (table) => [
index('page_view_path_idx').on(table.path),
index('page_view_created_at_idx').on(table.createdAt),
index('page_view_session_hash_idx').on(table.sessionHash),
index('page_view_utm_source_idx').on(table.utmSource),
]);Then generate and apply the migration:
pnpm api:db:generate
pnpm api:db:migrateWhat each column is for:
pageTitle--document.titleat the time of the page view. Makes the "top pages" table human-readable (e.g., "Field Notes from Chicago" instead of/notes/a3f8b2c1).utmSource,utmMedium,utmCampaign-- parsed from URL query parameters. Tracks which shared links drive traffic (e.g., a note link shared on Twitter with?utm_source=twitter). The share button can automatically append UTM params to copied URLs.screenWidth-- bucketed intosm/md/lg/xlfromwindow.innerWidth. More useful than justmobile/desktop.language--navigator.language(e.g.,en-US). Useful if i18n ever becomes relevant.sessionHash-- one-way hash ofIP + User-Agent + today's date. Approximates daily unique visitors without storing identifying info. Resets every day.
2. Add the tracking endpoint
Create packages/api/src/routes/analytics.ts. This is a public endpoint (no auth required -- it tracks anonymous page views):
import { OpenAPIHono, createRoute, z } from '@hono/zod-openapi';
import { pageView } from '../db/schema';
import { createHash } from 'crypto';
import type { AppEnv } from '../types';
import { getDb } from './helpers';
const trackPageViewRoute = createRoute({
method: 'post',
path: '/pageview',
tags: ['Analytics'],
request: {
body: {
content: {
'application/json': {
schema: z.object({
path: z.string().max(500),
pageTitle: z.string().max(200).optional(),
referrer: z.string().max(500).optional(),
utmSource: z.string().max(100).optional(),
utmMedium: z.string().max(100).optional(),
utmCampaign: z.string().max(200).optional(),
screenWidth: z.enum(['sm', 'md', 'lg', 'xl']).optional(),
language: z.string().max(10).optional(),
}),
},
},
},
},
responses: {
202: {
content: { 'application/json': { schema: z.object({ ok: z.boolean() }) } },
description: 'Page view recorded',
},
400: {
content: { 'application/json': { schema: z.object({ error: z.string() }) } },
description: 'Invalid request',
},
},
});
export const analyticsRoutes = new OpenAPIHono<AppEnv>()
.openapi(trackPageViewRoute, async (c) => {
const db = getDb(c);
const body = c.req.valid('json');
const ip = c.req.header('x-forwarded-for') || 'unknown';
const ua = c.req.header('user-agent') || '';
const today = new Date().toISOString().split('T')[0];
// One-way hash for daily unique visitor approximation
const sessionHash = createHash('sha256')
.update(`${ip}:${ua}:${today}`)
.digest('hex')
.slice(0, 16);
// Parse User-Agent for browser/OS/device (simple regex, no library needed)
const browser = parseBrowser(ua);
const os = parseOS(ua);
const device = parseDevice(ua);
await db.insert(pageView).values({
path: body.path,
pageTitle: body.pageTitle || null,
referrer: body.referrer || null,
utmSource: body.utmSource || null,
utmMedium: body.utmMedium || null,
utmCampaign: body.utmCampaign || null,
browser,
os,
device,
screenWidth: body.screenWidth || null,
language: body.language || null,
sessionHash,
});
return c.json({ ok: true }, 202);
});Register it in packages/api/src/routes/index.ts:
import { analyticsRoutes } from './analytics';
export const routes = new OpenAPIHono<AppEnv>()
.route('/', healthRoutes)
.route('/users', userRoutes)
.route('/admin', adminRoutes)
.route('/notes', noteRoutes)
.route('/comments', commentRoutes)
.route('/analytics', analyticsRoutes); // <-- add thisThe User-Agent parsing functions (parseBrowser, parseOS, parseDevice) can be simple regex-based -- no need for a library. A ~30 line utility covers the common cases (Chrome, Firefox, Safari, Edge, mobile vs desktop).
3. Add admin query endpoints
Add analytics query routes to packages/api/src/routes/admin.ts. These use the existing requireAuth and requireAdmin middleware already in that file. Add Zod schemas to packages/shared/src/schemas/ (a new analytics.ts file alongside the existing admin.ts, note.ts, etc.).
// GET /api/admin/analytics/summary?days=30
// Returns: { totalViews, uniqueVisitors, topPages, topReferrers, topCampaigns, browsers, devices, screenWidths, languages }
// GET /api/admin/analytics/timeseries?days=30
// Returns: [{ date, views, visitors }, ...]SQL aggregations:
-- Page views per day
SELECT DATE(created_at) as date, COUNT(*) as views, COUNT(DISTINCT session_hash) as visitors
FROM page_view
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY DATE(created_at)
ORDER BY date;
-- Top pages (with human-readable titles)
SELECT path, MAX(page_title) as title, COUNT(*) as views, COUNT(DISTINCT session_hash) as visitors
FROM page_view
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY path
ORDER BY views DESC
LIMIT 20;
-- Top referrers
SELECT referrer, COUNT(*) as views
FROM page_view
WHERE referrer IS NOT NULL AND created_at > NOW() - INTERVAL '30 days'
GROUP BY referrer
ORDER BY views DESC
LIMIT 20;
-- UTM campaign breakdown
SELECT utm_source, utm_medium, utm_campaign, COUNT(*) as views, COUNT(DISTINCT session_hash) as visitors
FROM page_view
WHERE utm_source IS NOT NULL AND created_at > NOW() - INTERVAL '30 days'
GROUP BY utm_source, utm_medium, utm_campaign
ORDER BY views DESC
LIMIT 20;
-- Screen size breakdown
SELECT screen_width, COUNT(*) as views
FROM page_view
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY screen_width
ORDER BY views DESC;
-- Language breakdown
SELECT language, COUNT(*) as views
FROM page_view
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY language
ORDER BY views DESC
LIMIT 10;4. Add the client-side tracking hook
Create packages/web/app/lib/hooks/usePageView.ts. This project uses TanStack Router (not Next.js), so use useLocation from @tanstack/react-router:
import { useLocation } from '@tanstack/react-router';
import { useEffect } from 'react';
import { API_URL } from '@/app/lib/services/api';
function getScreenWidthBucket(): string {
const w = window.innerWidth;
if (w < 640) return 'sm';
if (w < 1024) return 'md';
if (w < 1440) return 'lg';
return 'xl';
}
function getUtmParams(): Record<string, string | null> {
const params = new URLSearchParams(window.location.search);
return {
utmSource: params.get('utm_source'),
utmMedium: params.get('utm_medium'),
utmCampaign: params.get('utm_campaign'),
};
}
export function usePageView() {
const location = useLocation();
useEffect(() => {
// Don't track in development
if (import.meta.env.DEV) return;
const utm = getUtmParams();
fetch(`${API_URL}/api/analytics/pageview`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
path: location.pathname,
pageTitle: document.title,
referrer: document.referrer || null,
screenWidth: getScreenWidthBucket(),
language: navigator.language,
...utm,
}),
keepalive: true,
}).catch(() => {});
}, [location.pathname]);
}Wire it up in a client-side component that wraps the app (e.g., the root route's component or a provider component):
usePageView();Note: API_URL is already defined in packages/web/app/lib/services/api.ts as import.meta.env.VITE_API_URL || 'http://localhost:3002' -- reuse that rather than defining a new env var.
5. Data retention
Add a cleanup script in packages/api/src/scripts/ (alongside the existing sync scripts) that deletes page views older than 90 days:
DELETE FROM page_view WHERE created_at < NOW() - INTERVAL '90 days';This can be a simple script run via cron on the server, or a Hono route triggered by a scheduler.
Privacy Design
| Concern | How it's handled |
|---|---|
| IP address | Never stored. Only used transiently to compute session hash. |
| Cookies | None set. |
| User-Agent | Parsed into coarse categories (browser, OS, device type). Raw string not stored. |
| Session tracking | Daily hash of IP + UA + date. Changes every day, can't track across days. |
| Personal data | None collected. No user IDs, no emails, no accounts linked. |
| UTM params | Only stored if present in URL. No personal data -- just campaign metadata. |
| Language | Coarse locale code only (e.g., en-US). |
| Screen size | Bucketed into sm/md/lg/xl. Not the exact pixel width. |
| GDPR/CCPA | Compliant by design -- no personal data processing. |
How to Test
- Start the full stack:
pnpm dev - Send page views via curl:
curl -X POST http://localhost:3002/api/analytics/pageview \ -H 'Content-Type: application/json' \ -d '{"path": "/stories", "pageTitle": "Stories", "referrer": "https://google.com", "screenWidth": "lg", "language": "en-US"}'
- Send one with UTM params:
curl -X POST http://localhost:3002/api/analytics/pageview \ -H 'Content-Type: application/json' \ -d '{"path": "/notes/abc123", "pageTitle": "Field Notes", "utmSource": "twitter", "utmMedium": "social", "utmCampaign": "test"}'
- Query the admin endpoints (must be authenticated as admin):
curl http://localhost:3002/api/admin/analytics/summary?days=30 curl http://localhost:3002/api/admin/analytics/timeseries?days=30
- Verify responses contain correct aggregated data
- Verify no cookies are set by the tracking endpoint
- Verify the
usePageViewhook fires on route changes (check Network tab in browser)
Acceptance Criteria
-
page_viewtable added topackages/api/src/db/schema.tswith indexes - Drizzle migration generated and applied
-
POST /api/analytics/pageviewendpoint accepts and stores page views (OpenAPIHono route inpackages/api/src/routes/analytics.ts) - Analytics route registered in
packages/api/src/routes/index.ts - Endpoint parses User-Agent into browser/OS/device (no raw UA stored)
- Endpoint computes daily session hash (no IP stored)
- All input fields validated via Zod schemas
-
GET /api/admin/analytics/summaryreturns aggregate stats (admin-only, uses existingrequireAdminmiddleware) -
GET /api/admin/analytics/timeseriesreturns daily views/visitors (admin-only) - Zod response schemas added to
packages/shared/src/schemas/analytics.ts - Client-side
usePageViewhook inpackages/web/app/lib/hooks/usePageView.tssends: path, page title, referrer, UTM params, screen width bucket, language - Hook uses TanStack Router's
useLocationand the existingAPI_URLfromapp/lib/services/api.ts - Hook only fires in production (
import.meta.env.DEVcheck) - Data retention cleanup script in
packages/api/src/scripts/ - No cookies set, no personal data stored
- Existing tests pass
Files to Create/Modify
Create:
packages/api/src/routes/analytics.ts-- tracking endpoint + UA parser utilitiespackages/shared/src/schemas/analytics.ts-- Zod schemas for analytics API responsespackages/web/app/lib/hooks/usePageView.ts-- client-side tracking hookpackages/api/src/scripts/cleanup-pageviews.ts-- data retention script
Modify:
packages/api/src/db/schema.ts-- addpageViewtablepackages/api/src/routes/index.ts-- register/analyticsroutespackages/api/src/routes/admin.ts-- add analytics summary/timeseries query endpointspackages/shared/src/schemas/index.ts-- export analytics schemas- Root layout or provider component -- wire up
usePageViewhook
References
- Plausible's approach to privacy-friendly analytics (design inspiration)
- Umami's session hashing approach (similar to what we're doing)
- Existing OpenAPIHono route patterns:
packages/api/src/routes/admin.ts - Existing shared schemas:
packages/shared/src/schemas/admin.ts