fix(cli): switch identity stitch to hybrid stitch+stamp attribution #5559

Draft
seanoliver wants to merge 6 commits into develop from sean/growth-891-switch-cli-identity-stitch-to-hybrid-stitch-stamp

@seanoliver

Contributor

The gate from #5366 stopped the ephemeral-env $identify spike (730K/day → ~15K/day) but at the cost of attribution: in CI, Docker, and npx supabase, cli_* events stay orphaned on throwaway device IDs and never link to the authenticated user. Those populations are 31–85% of CLI volume and feed the Agent-Led Growth dashboards.

Problem

$create_alias was doing two jobs: labeling future events with the user ID, and retroactively merging pre-login device history into the person. In ephemeral environments the persistence assumption behind stitching is false, so every run re-stitched — that was the spike. But dropping stitching entirely (the original GROWTH-891 "Option C") would also drop the history merge on developer laptops, where it has real value and was never the volume problem.

There was also a quieter leak: the #5366 gate only covered the OnGotrueID hook. The login command calls StitchLogin directly, so supabase login --token in CI still fired an alias and wrote state to a doomed home directory on every run.

Fix

Hybrid of stitching and stamping, per ADR 0013 (included in this PR):

  • Stamp everywhere. After the first authenticated API call, the user UUID from X-Gotrue-Id is stashed in process memory and used as distinct_id on all subsequent captures. Zero extra PostHog events; restores attribution in CI/Docker/npx.
  • Stitch only in persistent runtimes. One $create_alias + telemetry.json write on a laptop's first login; nothing in ephemeral environments. The branch lives inside the stitch owner on every surface — Go StitchLogin, legacy TS stitchLogin and the platform-api layer, next/ login — so call sites can't forget the gate. The login-in-CI leak closes as a side effect.
  • Memory wins over disk when they disagree (re-login as a different user), and logout clears both on all surfaces, so post-logout events immediately fall back to the device ID.
  • Removes the redundant $identify in next/ login that #5396 (fix(cli): drop redundant identify from identity stitch) missed. cli_* capture volume from CI is unchanged; only identity-linking behavior moves.
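The stamp/stitch split in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `IdentityState`, `resolveDistinctId`, `capture`, the `isPersistentRuntime` flag, and the `sent` array standing in for the PostHog queue are all hypothetical names, and the sketch assumes the alias and the disk write happen only for the first identity a device sees.

```typescript
// Hypothetical sketch of hybrid stitch+stamp; names are illustrative only.
type CaptureEvent = { event: string; distinctId: string };

interface IdentityState {
  deviceId: string; // device ID (throwaway in ephemeral environments)
  persistedUserId: string | undefined; // stands in for telemetry.json
  memoryUserId: string | undefined; // stamped from X-Gotrue-Id in this process
}

// Memory wins over disk; the device ID is the fallback.
function resolveDistinctId(s: IdentityState): string {
  return s.memoryUserId ?? s.persistedUserId ?? s.deviceId;
}

// The gate lives inside the stitch owner, so call sites cannot forget it.
function stitchLogin(
  s: IdentityState,
  userId: string,
  isPersistentRuntime: boolean,
  sent: CaptureEvent[],
): void {
  s.memoryUserId = userId; // stamp everywhere: no extra event is sent
  if (!isPersistentRuntime) return; // ephemeral: no alias, no disk write
  if (s.persistedUserId === undefined) {
    // Assumed first-login behavior: one alias plus one persisted write.
    sent.push({ event: "$create_alias", distinctId: userId });
    s.persistedUserId = userId;
  }
}

// Every capture is labeled with whatever identity currently resolves.
function capture(s: IdentityState, event: string, sent: CaptureEvent[]): void {
  sent.push({ event, distinctId: resolveDistinctId(s) });
}
```

In this sketch a CI run that calls `stitchLogin(state, uuid, false, sent)` produces no `$create_alias`, yet later `capture` calls carry the user UUID, which is the attribution behavior the list describes.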

The first commit fixes a separate concurrency bug this work surfaced: telemetry config writes used millisecond-timestamp temp filenames, so two same-millisecond writers raced the rename into ENOENT.
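A small sketch of the race and one way to avoid it. The PR text does not say which naming scheme replaced the timestamp, so the random-UUID suffix here is an assumption, and `timestampTmpPath`, `uniqueTmpPath`, and `writeConfigAtomic` are hypothetical names.

```typescript
import { randomUUID } from "node:crypto";
import { mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Racy shape: two writers in the same millisecond compute the same temp
// path, so one rename wins and the other finds its file gone (ENOENT).
function timestampTmpPath(target: string, nowMs: number): string {
  return `${target}.${nowMs}.tmp`;
}

// Assumed fix: a random suffix gives every writer its own temp path.
function uniqueTmpPath(target: string): string {
  return `${target}.${randomUUID()}.tmp`;
}

// Write to a private temp file, then rename it over the target.
function writeConfigAtomic(target: string, contents: string): void {
  const tmp = uniqueTmpPath(target);
  writeFileSync(tmp, contents);
  renameSync(tmp, target);
}

// Usage: write a config file into a scratch directory and read it back.
const scratchDir = mkdtempSync(join(tmpdir(), "telemetry-sketch-"));
const configPath = join(scratchDir, "telemetry.json");
writeConfigAtomic(configPath, JSON.stringify({ distinct_id: "user-1" }));
console.log(readFileSync(configPath, "utf8"));
```

Including the process ID in the suffix would also separate two CLI processes, but it would not separate two writers inside one process, which is why a random component is used in this sketch.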

Beyond the test suites, we verified the live event streams: both binaries were rebuilt against a local PostHog capture server and a stub API serving X-Gotrue-Id, then run across first-run, persistent, steady-state, and CI homes. Ephemeral runs fire zero alias/identify events with captures stamped by the real UUID; a persistent first run fires exactly one alias and persists the UUID.

GROWTH-891

Two concurrent writers in the same millisecond shared a tmp path and raced
the rename into ENOENT. Surfaced as a cross-file flake in the login/logout
integration tests, but two real CLI processes could hit it the same way.

After the first authenticated API call, the user UUID is stashed in process
memory and used as distinct_id on all subsequent capture events (stamping,
zero extra PostHog events). The $create_alias merge and the telemetry.json
write now only happen in persistent runtimes, and the gate lives inside the
stitch owner on every surface (Go StitchLogin, legacy TS stitchLogin +
platform-api layer, next TS login) so no call site can forget it. Logout
clears both the in-memory and persisted identity on all surfaces, and the
redundant $identify that GROWTH-890 missed in next/login is gone.

This restores per-user attribution for cli_* events in CI/Docker/npx (lost
to the #5366 gate) without re-introducing the identify/alias volume spike.
See docs/adr/0013-hybrid-stitch-stamp-identity-attribution.md.

GROWTH-891
@seanoliver seanoliver requested a review from a team as a code owner June 12, 2026 00:22
@seanoliver seanoliver marked this pull request as draft June 12, 2026 00:27
@github-actions

Supabase CLI preview

npx --yes https://pkg.pr.new/supabase@5559

Preview package for commit 57e05b2.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 57e05b28ef

Comment thread apps/cli-go/internal/telemetry/service.go
Comment thread apps/cli-go/internal/telemetry/service.go
seanoliver and others added 4 commits June 11, 2026 17:46
Three fixes from an independent review of the hybrid stitch+stamp change:

- Stamp the in-memory identity from the first authenticated response even
  when a persisted distinct_id already exists (stale disk identity + a
  different live token previously kept attributing events to the old user).
  Alias only the first identity a device ever sees — re-aliasing on
  re-login, or on the login command's direct StitchLogin call after the
  response hook already stitched, would merge a second user into the
  device's existing person graph.
- Mark the legacy TS stitch attempt before its first yield point so
  concurrent authenticated responses cannot double-stitch.
- Clear the telemetry identity on logout even when no token exists; a stale
  distinct_id can outlive the token.
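The "mark before the first yield point" fix can be illustrated with a short sketch. It assumes a single-threaded event loop, where the only window for a double stitch is between the check and the flag when an `await` sits in between. `makeStitcher`, `makeRacyStitcher`, and `sendAlias` are hypothetical names, not the legacy TS code.

```typescript
// Hypothetical sketch; sendAlias stands in for whatever enqueues the alias.
type SendAlias = (userId: string) => Promise<void>;

// Racy ordering: the flag is set after the await, so a second response
// arriving while the first is suspended still passes the check.
function makeRacyStitcher(sendAlias: SendAlias) {
  let attempted = false;
  return async (userId: string): Promise<void> => {
    if (attempted) return;
    await sendAlias(userId); // yields here with attempted still false
    attempted = true;
  };
}

// Fixed ordering: the attempt is marked synchronously, before any await,
// so concurrent authenticated responses cannot both reach sendAlias.
function makeStitcher(sendAlias: SendAlias) {
  let attempted = false;
  return async (userId: string): Promise<void> => {
    if (attempted) return;
    attempted = true; // marked before the first yield point
    await sendAlias(userId);
  };
}
```

Calling the stitcher returned by `makeStitcher` twice in a row without awaiting invokes `sendAlias` once; the racy variant invokes it twice.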
Second-round review findings:

- Login as A, logout, login as B re-aliased the same device — a device
  already merged into A's person graph. Logout now resets the identity
  entirely (new ResetIdentity / resetIdentity): forget the user AND rotate
  the device id, so the next login aliases a fresh device. Transient
  failure paths keep ClearDistinctID and preserve the device id.
- A failed alias enqueue left the in-memory identity set, so the login
  command's follow-up StitchLogin treated the identity as already aliased
  and permanently skipped the history merge. The identity is now un-stashed
  on enqueue failure, keeping the first-identity gate retryable.
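The two second-round fixes can be sketched together. The function names mirror the ones the commit message mentions (`resetIdentity`, `ClearDistinctID`), but the bodies, the `Identity` shape, and the `enqueueAlias` callback are assumptions made for illustration; the sketch also simplifies the first-identity gate to "no user is stashed in memory yet".

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical in-memory identity; not the CLI's real data structure.
interface Identity {
  deviceId: string;
  userId: string | undefined; // undefined means no identity is stashed
}

// Transient failure path: forget the user but keep the device ID.
function clearDistinctId(id: Identity): void {
  id.userId = undefined;
}

// Logout: forget the user AND rotate the device ID, so the next login
// aliases a fresh device instead of one already merged into a person.
function resetIdentity(id: Identity): void {
  id.userId = undefined;
  id.deviceId = randomUUID();
}

// First-identity gate that stays retryable: if the alias cannot be
// enqueued, the identity is un-stashed so a follow-up call can try again.
function stitchLogin(
  id: Identity,
  userId: string,
  enqueueAlias: (userId: string, deviceId: string) => boolean,
): boolean {
  if (id.userId !== undefined) return false; // treated as already aliased
  id.userId = userId;
  if (!enqueueAlias(userId, id.deviceId)) {
    id.userId = undefined; // un-stash on enqueue failure
    return false;
  }
  return true;
}
```

Without the un-stash step, a failed enqueue followed by a second `stitchLogin` call would return early at the gate and the history merge would never be retried.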