Commit b3dbac5

snomiao and claude authored
fix(perf): optimize MongoDB storage and add ForkedRepo index (#178)
* perf: add ForkedRepo compound index for query optimization

  Add idx_forkedRepo_repo compound index {forkedRepo: 1, repo: 1} to improve lookup performance for the ForkedRepo collection.

  Identified by MongoDB Performance Advisor:
  - Problem: Queries scanning 3830 docs to return 1
  - Expected: 151ms → <10ms (93% improvement)

  Also manually dropped the redundant candidate.data_1 index from CNRepos (was a prefix of compound index candidate.data_1_createdPulls...).

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: address Copilot review comments

  - Add `unique: true` to ForkedRepo index to prevent duplicate upserts
  - Add `if (import.meta.main)` guard to migration script
  - Move db.close() to finally block for proper cleanup on errors
  - Add validation for --limit flag to fail fast on invalid values
  - Fix docs to match actual --limit=N format

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 9198708 commit b3dbac5
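
The commit message drops candidate.data_1 because it is a prefix of a compound index. A minimal sketch of that prefix check follows, assuming index keys are modelled as ordered [field, direction] pairs. The helper name `isPrefixIndex` and the second field `other.field` are hypothetical placeholders, not names from the repository. The check covers key order and direction only; a shorter index with its own options (unique, sparse, partial, TTL) may still be needed even when its keys are a prefix.

```typescript
// Ordered key spec: field order matters in a compound index,
// so an array of pairs is used rather than a plain object.
type IndexKeys = Array<[string, 1 | -1]>;

// Hypothetical helper: true when `candidate` is a strict leading
// prefix of `other` (same fields, same directions, same order).
function isPrefixIndex(candidate: IndexKeys, other: IndexKeys): boolean {
  if (candidate.length === 0 || candidate.length >= other.length) return false;
  return candidate.every(
    ([field, direction], i) => other[i][0] === field && other[i][1] === direction,
  );
}

// "other.field" stands in for the truncated remainder of the compound index.
const single: IndexKeys = [["candidate.data", 1]];
const compound: IndexKeys = [["candidate.data", 1], ["other.field", 1]];
console.log(isPrefixIndex(single, compound)); // true
console.log(isPrefixIndex(compound, single)); // false
```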

2 files changed (+64 -9 lines changed)

scripts/migrate-cnrepos-trim-data.ts

Lines changed: 20 additions & 8 deletions
@@ -8,18 +8,27 @@
  * Expected reduction: ~539 MB → ~10 MB (98%)
  *
  * Usage:
- * bun scripts/migrate-cnrepos-trim-data.ts [--dry-run] [--limit N]
+ * bun scripts/migrate-cnrepos-trim-data.ts [--dry-run] [--limit=N]
  *
  * Options:
- * --dry-run Preview changes without modifying data
- * --limit N Process only N documents (for testing)
+ * --dry-run Preview changes without modifying data
+ * --limit=N Process only N documents (for testing)
  */
 
 import { db } from "@/src/db";
 
 const DRY_RUN = process.argv.includes("--dry-run");
 const LIMIT_ARG = process.argv.find((a) => a.startsWith("--limit="));
-const LIMIT = LIMIT_ARG ? parseInt(LIMIT_ARG.split("=")[1]) : 0;
+const LIMIT = (() => {
+  if (!LIMIT_ARG) return 0;
+  const value = LIMIT_ARG.split("=")[1];
+  const parsed = Number.parseInt(value, 10);
+  if (Number.isNaN(parsed)) {
+    console.error(`Invalid --limit value "${value}". Please provide an integer.`);
+    process.exit(1);
+  }
+  return parsed;
+})();
 
 // Fields to keep in info.data
 const INFO_FIELDS = [
@@ -256,9 +265,12 @@ async function migrate() {
     console.log(` Data size: ${(statsAfter.size / 1024 / 1024).toFixed(2)} MB`);
     console.log(` Avg doc size: ${statsAfter.avgObjSize.toLocaleString()} bytes`);
   }
-
-  await db.close();
 }
 
-// Run
-await migrate();
+if (import.meta.main) {
+  try {
+    await migrate();
+  } finally {
+    await db.close();
+  }
+}
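
The --limit validation in this diff rejects values that `Number.parseInt` cannot read at all, but `parseInt` still accepts inputs such as `5abc` (read as 5) and negative numbers. A stricter variant might look like the sketch below. `parseLimitFlag` is a hypothetical helper, not part of the commit; it throws instead of calling `process.exit` so it can be tested, and it keeps the diff's convention of returning 0 when the flag is absent.

```typescript
// Hypothetical helper (not in the commit): extracts N from "--limit=N".
// Returns 0 when the flag is absent, matching the default in the diff.
function parseLimitFlag(argv: string[]): number {
  const arg = argv.find((a) => a.startsWith("--limit="));
  if (!arg) return 0;

  const value = arg.slice("--limit=".length);
  // Digits only: rejects "", "abc", "-3", "1.5", and "5abc",
  // the last of which Number.parseInt would silently read as 5.
  if (!/^\d+$/.test(value)) {
    throw new Error(
      `Invalid --limit value "${value}". Please provide a non-negative integer.`,
    );
  }
  return Number.parseInt(value, 10);
}

console.log(parseLimitFlag(["--dry-run", "--limit=25"])); // 25
console.log(parseLimitFlag(["--dry-run"])); // 0
```

In a script, the caller would catch the error, print the message, and exit with a non-zero status to keep the fail-fast behavior.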

scripts/setup-performance-indexes.ts

Lines changed: 44 additions & 1 deletion
@@ -3,11 +3,12 @@
  * Performance Index Migration Script
  *
  * This script creates critical indexes identified by MongoDB Performance Advisor
- * to dramatically improve query performance for SlackMsgs and CNRepos collections.
+ * to dramatically improve query performance for SlackMsgs, CNRepos, and ForkedRepo collections.
  *
  * Expected Impact:
  * - SlackMsgs: 637ms → <50ms query time (92% improvement)
  * - CNRepos: 420ms → <100ms query time (76% improvement)
+ * - ForkedRepo: 151ms → <10ms query time (93% improvement)
  * - Disk I/O reduction: ~823.8 MB per query cycle
  * - Query targeting: 79,841:1 → ~1:1 ratio
  *
@@ -22,6 +23,11 @@ import { db } from "@/src/db";
 import { SlackMsgs } from "@/lib/slack/SlackMsgs";
 import { CNRepos } from "@/src/CNRepos";
 
+// ForkedRepo collection (defined inline in src/createGithubForkForRepo.ts)
+const ForkedRepo = db.collection<{ repo: string; forkedRepo: string; updatedAt: Date }>(
+  "ForkedRepo",
+);
+
 async function setupPerformanceIndexes() {
   console.log("🚀 Setting up performance-critical indexes...\n");
 
@@ -85,6 +91,34 @@ async function setupPerformanceIndexes() {
     }
   }
 
+  // ===================================================================
+  // ISSUE #3: ForkedRepo - Missing compound index for lookups
+  // ===================================================================
+  console.log("📊 ForkedRepo Collection");
+  console.log(" Problem: Queries scanning 3830 docs to return 1");
+  console.log(" Query Pattern: { repo: ..., forkedRepo: ... }");
+  console.log(" Creating: idx_forkedRepo_repo");
+
+  try {
+    await ForkedRepo.createIndex(
+      { forkedRepo: 1, repo: 1 },
+      {
+        name: "idx_forkedRepo_repo",
+        background: true,
+        unique: true,
+      },
+    );
+    console.log(" ✅ Created compound index: idx_forkedRepo_repo");
+    console.log(" Expected improvement: 151ms → <10ms (93% faster)\n");
+  } catch (error) {
+    if ((error as Error).message.includes("already exists")) {
+      console.log(" ℹ️ Index idx_forkedRepo_repo already exists\n");
+    } else {
+      console.error(" ❌ Error creating idx_forkedRepo_repo:", error);
+      throw error;
+    }
+  }
+
   // ===================================================================
   // Verification
   // ===================================================================
@@ -108,6 +142,15 @@ async function setupPerformanceIndexes() {
     console.log(` - ${idx.name}: ${keys}${highlight}`);
   });
 
+  // List ForkedRepo indexes
+  console.log("\n📋 ForkedRepo indexes:");
+  const forkedRepoIndexes = await ForkedRepo.listIndexes().toArray();
+  forkedRepoIndexes.forEach((idx) => {
+    const keys = JSON.stringify(idx.key);
+    const highlight = idx.name === "idx_forkedRepo_repo" ? " ⭐" : "";
+    console.log(` - ${idx.name}: ${keys}${highlight}`);
+  });
+
   console.log("\n✨ All performance indexes created successfully!");
   console.log("\n📈 Next Steps:");
   console.log(" 1. Monitor Performance Advisor in MongoDB Atlas");
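
Because idx_forkedRepo_repo is created with `unique: true`, the build fails with a duplicate key error if the collection already holds two documents with the same (forkedRepo, repo) pair. The sketch below shows one way to spot such pairs before running the script. It assumes the documents have already been loaded into an array; `findDuplicatePairs` and the sample values are hypothetical and not part of the commit.

```typescript
// Only the two indexed fields from the ForkedRepo document shape in the diff.
interface ForkedRepoKey {
  repo: string;
  forkedRepo: string;
}

// Hypothetical pre-check: returns each (forkedRepo, repo) pair that occurs
// more than once, encoded as a JSON array string in index key order.
function findDuplicatePairs(docs: ForkedRepoKey[]): string[] {
  const counts = new Map<string, number>();
  for (const { forkedRepo, repo } of docs) {
    const key = JSON.stringify([forkedRepo, repo]);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .filter(([, count]) => count > 1)
    .map(([key]) => key);
}

// Sample data for illustration only.
const sample: ForkedRepoKey[] = [
  { repo: "owner/a", forkedRepo: "bot/a" },
  { repo: "owner/a", forkedRepo: "bot/a" },
  { repo: "owner/b", forkedRepo: "bot/b" },
];
console.log(findDuplicatePairs(sample)); // [ '["bot/a","owner/a"]' ]
```

On a large collection the same grouping would more likely be done server-side with an aggregation; the in-memory version is used here only to keep the example self-contained.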
