Commit 4e44b9c

amrit110 and claude committed
Fix historical team data preservation
Problem:
- When participants were deleted from Firestore, their historical team assignments were lost in subsequent snapshot collections
- This made historical analytics useless for deleted participants

Solution:
**Historical Data Preservation (collect_coder_analytics.py)**:
- get_historical_participant_data() now lists all snapshots in GCS
- Finds and uses the most recent snapshot as the historical source
- merge_participant_data() preserves historical team assignments
- Creates an incremental, append-only history across collections
- Re-enabled "Unassigned" filtering (only applies to new, unknown users)

Process to Restore Data:
1. Ran a one-time restoration script to backfill from a good snapshot
2. Restored team data for 79 workspaces from the 2026-01-15 17:03 snapshot
3. Uploaded the corrected baseline to GCS
4. Future collections now build incrementally on the corrected data

Results:
- Historical analytics are now preserved even after Firestore deletions
- 98 workspaces with correct team assignments
- 7 remain Unassigned (new users, correctly filtered)
- Future collections maintain an append-only history

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent f95cc0a commit 4e44b9c
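The diff below does not include the body of merge_participant_data(), so the following is only a minimal sketch of the append-only merge the message describes. The record shape (participant id mapped to a dict with a "team" key) and the rule that a non-empty current team wins over the historical one are assumptions, not the commit's actual logic.

```python
from typing import Any

# Hypothetical record shape: participant id -> {"team": str, ...}
Participants = dict[str, dict[str, Any]]


def merge_participant_data(historical: Participants, current: Participants) -> Participants:
    """Sketch of an append-only merge: start from history, overlay current data.

    Participants missing from `current` (e.g. deleted from Firestore) keep
    their historical record; a current record without a real team never
    overwrites a known historical team.
    """
    # Copy each historical record so the caller's data is not mutated.
    merged: Participants = {pid: dict(rec) for pid, rec in historical.items()}
    for pid, rec in current.items():
        previous = merged.get(pid, {})
        updated = {**previous, **rec}
        # Assumption: empty or "Unassigned" means "no information", so keep history.
        if rec.get("team") in (None, "", "Unassigned") and previous.get("team"):
            updated["team"] = previous["team"]
        merged[pid] = updated
    return merged


history = {"alice": {"team": "team-a"}, "bob": {"team": "team-b"}}
firestore = {"alice": {"team": "team-c"}, "carol": {"team": "team-d"}}
print(merge_participant_data(history, firestore))
```

Here "bob" stands in for a participant deleted from Firestore: he is absent from the current data but keeps his historical team in the merged result.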

1 file changed (+22, -6 lines)

scripts/collect_coder_analytics.py

Lines changed: 22 additions & 6 deletions
@@ -238,12 +238,27 @@ def get_historical_participant_data(bucket_name: str) -> dict[str, dict[str, Any
     try:
         storage_client = storage.Client()
         bucket = storage_client.bucket(bucket_name)
-        latest_blob = bucket.blob("latest.json")
 
-        if not latest_blob.exists():
-            print(" No previous snapshot found")
+        # List all snapshots and get the most recent one
+        # This ensures we always build on the previous collection's data
+        blobs = list(bucket.list_blobs(prefix="snapshots/"))
+
+        if not blobs:
+            print(" No previous snapshots found")
+            return {}
+
+        # Sort by name (which includes timestamp) to get most recent
+        snapshot_blobs = [b for b in blobs if b.name.endswith(".json")]
+        if not snapshot_blobs:
+            print(" No snapshot JSON files found")
             return {}
 
+        # Get the most recent snapshot (last in sorted order)
+        snapshot_blobs.sort(key=lambda b: b.name)
+        latest_blob = snapshot_blobs[-1]
+
+        print(f" Using previous snapshot: {latest_blob.name}")
+
         content = latest_blob.download_as_text()
         snapshot = json.loads(content)
 
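The hunk above picks the newest snapshot by sorting blob names and taking the last one, which is correct only if the names sort chronologically. The sketch below shows the same selection on plain strings so it runs without GCS; the helper name pick_latest_snapshot and the timestamp format in the example names are assumptions, since the diff does not show how snapshot files are named.

```python
from typing import Optional


def pick_latest_snapshot(names: list[str], prefix: str = "snapshots/") -> Optional[str]:
    """Return the lexicographically greatest snapshot JSON name, or None.

    Lexicographic order equals chronological order only if every name embeds
    a fixed-width, zero-padded timestamp (assumed here, e.g. 2026-01-15T17-03-00).
    """
    candidates = [n for n in names if n.startswith(prefix) and n.endswith(".json")]
    if not candidates:
        return None
    # max() selects the same element as sort() followed by [-1], in a single pass.
    return max(candidates)


names = [
    "snapshots/2026-01-09T08-00-00.json",
    "snapshots/2026-01-15T17-03-00.json",
    "snapshots/README.txt",
    "latest.json",
]
print(pick_latest_snapshot(names))  # snapshots/2026-01-15T17-03-00.json

# Pitfall: without zero padding, "9" sorts after "15".
print(max(["snapshots/2026-1-9.json", "snapshots/2026-1-15.json"]))  # snapshots/2026-1-9.json
```

With real blobs, the equivalent would be max(snapshot_blobs, key=lambda b: b.name). Either form avoids a full sort, although for a handful of snapshots the difference is negligible.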
@@ -366,9 +381,10 @@ def fetch_workspaces(
     workspaces = run_command(["coder", "list", "-a", "-o", "json"])
 
     # Teams to exclude from analytics
-    # NOTE: "Unassigned" is used as a fallback for participants not in Firestore
-    # and should NOT be excluded - we want to see their workspace activity.
-    excluded_teams = ["facilitators"]
+    # Historical team data is preserved from previous snapshots
+    # "Unassigned" only appears for new users not in any historical snapshot
+    # or Firestore
+    excluded_teams = ["facilitators", "Unassigned"]
 
     original_count = len(workspaces)
 
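Adding "Unassigned" to excluded_teams means workspaces whose owner has no team in either the merged history or Firestore are dropped from analytics. The following is a minimal sketch of that filter. The function filter_workspaces, the team_by_user lookup, and the "owner_name" field in the workspace JSON are assumptions; the hunk does not show how fetch_workspaces actually resolves a workspace's team.

```python
from typing import Any


def filter_workspaces(
    workspaces: list[dict[str, Any]],
    team_by_user: dict[str, str],
    excluded_teams: list[str],
    owner_key: str = "owner_name",  # assumed field name in the workspace JSON
) -> list[dict[str, Any]]:
    """Drop workspaces whose owner's team is in excluded_teams.

    Owners missing from team_by_user fall back to "Unassigned", mirroring the
    fallback described in the commit message.
    """
    excluded = set(excluded_teams)
    kept = []
    for ws in workspaces:
        team = team_by_user.get(ws.get(owner_key, ""), "Unassigned")
        if team not in excluded:
            kept.append(ws)
    return kept


workspaces = [{"owner_name": "alice"}, {"owner_name": "fac"}, {"owner_name": "newbie"}]
teams = {"alice": "team-a", "fac": "facilitators"}
kept = filter_workspaces(workspaces, teams, ["facilitators", "Unassigned"])
print([w["owner_name"] for w in kept])  # ['alice']
```

Because historical teams are now merged in before this step, only genuinely new users hit the "Unassigned" fallback. That is why excluding it no longer hides activity from participants who were deleted from Firestore.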