Skip to content

Added job to backfill ghost_uuid in sites table#1432

Merged
mike182uk merged 3 commits intomainfrom
mike-ber-2964-job-for-ghost-uuid-backfill
Nov 17, 2025
Merged

Added job to backfill ghost_uuid in sites table#1432
mike182uk merged 3 commits intomainfrom
mike-ber-2964-job-for-ghost-uuid-backfill

Conversation

@mike182uk
Copy link
Member

ref https://linear.app/ghost/issue/BER-2964

Added job that backfills the ghost_uuid field recently added to the sites table

ref https://linear.app/ghost/issue/BER-2964

Added job that backfills the `ghost_uuid` field recently added to the `sites`
table
@coderabbitai
Copy link

coderabbitai bot commented Nov 13, 2025

Walkthrough

Adds a new job backfill-ghost-uuid with TypeScript implementation to fetch Ghost site UUIDs and update a MySQL database, plus a comprehensive test suite. Includes Dockerfile, docker-compose, test-run and gcloud push scripts, and README documentation. Also reorders a test import and adjusts an import grouping rule in biome.json to group NODE with BUN.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Review core job logic and DB interaction: jobs/backfill-ghost-uuid/index.ts
  • Verify tests and test setup/teardown: jobs/backfill-ghost-uuid/index.test.ts, run-tests.sh, docker-compose.yml
  • Confirm container and deployment scripts: Dockerfile, gcloud-push.sh
  • Check documentation and CI impact: README.md
  • Validate lint/config change and import reorder effects: biome.json, jobs/__template__/index.test.ts

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding a job to backfill the ghost_uuid field in the sites table, which is the primary focus of all the added files.
Description check ✅ Passed The description is directly related to the changeset, referencing the Linear issue and explaining that a job was added to backfill the ghost_uuid field in the sites table.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch mike-ber-2964-job-for-ghost-uuid-backfill

Tip

📝 Customizable high-level summaries are now available!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

  - Provide custom instructions to shape the summary (bullet lists, tables, contributor stats, etc.).
  - Use `high_level_summary_in_walkthrough` to move the summary from the description to the walkthrough section.

  Example:

  > "Create a concise high-level summary as a bullet-point list. Then include a Markdown table showing lines added and removed by each contributing author."

  Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
jobs/backfill-ghost-uuid/index.test.ts (2)

147-172: Consider adding a test for timeout abort behavior.

The test verifies that an AbortSignal is passed but doesn't validate the 30-second timeout or test the abort scenario. Given that the implementation uses a 30s timeout (per index.ts:32), adding a test that mocks a delayed response exceeding the timeout would ensure the abort mechanism works correctly.

Example test:

test('should return null on timeout', async () => {
    global.fetch = mock(() => 
        new Promise(resolve => setTimeout(resolve, 35000))
    );
    
    const uuid = await fetchSiteGhostUUID('example.com');
    
    expect(uuid).toBeNull();
});

275-302: Consider whether this test case is necessary.

The second test updates an existing (non-null) ghost_uuid value, but the job's purpose is to backfill NULL values. While the underlying updateSiteGhostUUID function technically supports overwriting, this scenario shouldn't occur during normal backfill operations since getSitesWithoutGhostUUID only returns sites with NULL ghost_uuid.

jobs/backfill-ghost-uuid/run-tests.sh (2)

4-4: Consider using docker compose instead of docker-compose.

The docker-compose standalone command is deprecated. Docker Compose V2 is now bundled with Docker CLI as docker compose.

Apply this diff:

-docker-compose up -d
+docker compose up -d

Note: You'll also need to update line 8 and line 21.


20-21: Ensure cleanup runs even on test failure.

With the suggested set -e, the script will exit on test failure before reaching cleanup. Consider adding a trap handler to ensure docker compose down always runs.

Add after set -e:

set -e
trap 'docker compose down' EXIT

Then remove the explicit docker compose down at line 21 since the trap handles it.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b884710 and 1a78f56.

📒 Files selected for processing (10)
  • biome.json (1 hunks)
  • jobs/__template__/index.test.ts (1 hunks)
  • jobs/backfill-ghost-uuid/Dockerfile (1 hunks)
  • jobs/backfill-ghost-uuid/README.md (1 hunks)
  • jobs/backfill-ghost-uuid/docker-compose.yml (1 hunks)
  • jobs/backfill-ghost-uuid/gcloud-push.sh (1 hunks)
  • jobs/backfill-ghost-uuid/index.test.ts (1 hunks)
  • jobs/backfill-ghost-uuid/index.ts (1 hunks)
  • jobs/backfill-ghost-uuid/run-tests.sh (1 hunks)
  • jobs/migrate-bluesky-handles/index.test.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: allouis
Repo: TryGhost/ActivityPub PR: 737
File: migrate/migrations/000054_populate-outboxes-table.up.sql:56-63
Timestamp: 2025-05-27T09:09:22.110Z
Learning: In the outboxes table migration for Ghost ActivityPub, the author_id field always refers to the author of the original post, not the person performing the outbox action. For reposts (outbox_type = 1), author_id should be posts.author_id (original author), while user_id captures who made the repost via the joins through accounts and users tables.
📚 Learning: 2025-09-04T07:49:39.075Z
Learnt from: mike182uk
Repo: TryGhost/ActivityPub PR: 1240
File: jobs/migrate-bluesky-handles/Dockerfile:1-8
Timestamp: 2025-09-04T07:49:39.075Z
Learning: The jobs/migrate-bluesky-handles project uses only Bun's built-in APIs (SQL helper, fetch) and requires no external dependencies, making a simple COPY index.ts approach sufficient in the Dockerfile.

Applied to files:

  • jobs/migrate-bluesky-handles/index.test.ts
  • jobs/backfill-ghost-uuid/index.ts
  • jobs/__template__/index.test.ts
  • jobs/backfill-ghost-uuid/Dockerfile
📚 Learning: 2025-09-25T12:39:45.755Z
Learnt from: mike182uk
Repo: TryGhost/ActivityPub PR: 1329
File: .github/renovate.json5:31-34
Timestamp: 2025-09-25T12:39:45.755Z
Learning: The jobs/update-external-accounts directory was removed from the TryGhost/ActivityPub repository and no longer exists, so Renovate configuration exclusions referencing this path are no longer needed.

Applied to files:

  • jobs/backfill-ghost-uuid/index.ts
  • jobs/backfill-ghost-uuid/README.md
🧬 Code graph analysis (1)
jobs/backfill-ghost-uuid/index.test.ts (2)
jobs/backfill-ghost-uuid/index.ts (3)
  • fetchSiteGhostUUID (28-66)
  • getSitesWithoutGhostUUID (68-76)
  • updateSiteGhostUUID (78-87)
src/core/result.ts (1)
  • Error (9-9)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build / Build Docker Images (linux/amd64, amd64)
🔇 Additional comments (13)
jobs/migrate-bluesky-handles/index.test.ts (1)

1-2: LGTM! Import formatting aligns with updated rules.

The blank line addition follows the new import grouping configuration in biome.json, which now pairs NODE and BUN imports together.

biome.json (1)

26-26: LGTM! Grouping NODE and BUN imports together.

This change appropriately pairs NODE and BUN imports in the same group, which aligns with the project's use of the Bun runtime.

jobs/__template__/index.test.ts (1)

1-11: LGTM! Template updated to match new import grouping.

The import reorganization follows the updated biome.json rules, providing a consistent template for future jobs.

jobs/backfill-ghost-uuid/docker-compose.yml (1)

1-10: LGTM! Well-configured test environment.

The MySQL test service uses image digest pinning for reproducibility and tmpfs mounting for faster ephemeral test storage. Port mapping to 3308 avoids conflicts with local MySQL instances.

jobs/backfill-ghost-uuid/gcloud-push.sh (1)

1-39: LGTM! Well-structured deployment script.

The script follows best practices with strict error handling (set -euo pipefail), validates prerequisites, and provides clear progress messages throughout the build and push process.

jobs/backfill-ghost-uuid/README.md (1)

1-67: LGTM! Comprehensive job documentation.

The README clearly describes the job's purpose, operational characteristics (sequential processing, rate limiting, error handling), and provides complete setup and deployment instructions.

jobs/backfill-ghost-uuid/index.ts (5)

28-66: LGTM! Robust HTTP request handling with timeout.

The function properly implements:

  • Abort controller with 30s timeout to prevent hanging requests
  • HTTP status validation
  • Response structure validation
  • Error handling with logging

68-76: LGTM! Clean query implementation.

The function retrieves sites without ghost_uuid using a straightforward parameterized query with ORDER BY for deterministic processing.


78-87: LGTM! Properly parameterized update query.

The function uses parameterized queries to prevent SQL injection vulnerabilities.


89-156: LGTM! Well-structured main orchestration.

The main function properly:

  • Configures DB connection with socket or host/port fallback
  • Processes sites sequentially with rate limiting (500ms delay)
  • Tracks and reports success/failure counts
  • Ensures connection cleanup in finally block

158-164: LGTM! Proper entry point with error handling.

The entry point correctly checks import.meta.main and handles unhandled errors with appropriate exit codes.

jobs/backfill-ghost-uuid/Dockerfile (1)

1-9: Remove the unnecessary lib directory from the Dockerfile—it does not exist and will cause build failure.

The lib directory referenced in line 6 (COPY lib ./lib) does not exist in the repository. The job only imports mysql2/promise, which is a Bun built-in API. Following the same pattern as similar jobs (e.g., migrate-bluesky-handles), remove this line to allow the build to succeed.

FROM oven/bun:1.3.0-alpine@sha256:37e6b1cbe053939bccf6ae4507977ed957eaa6e7f275670b72ad6348e0d2c11f

WORKDIR /opt/job

COPY index.ts .

ENTRYPOINT ["bun", "run", "index.ts"]
CMD []
⛔ Skipped due to learnings
Learnt from: mike182uk
Repo: TryGhost/ActivityPub PR: 1240
File: jobs/migrate-bluesky-handles/Dockerfile:1-8
Timestamp: 2025-09-04T07:49:39.075Z
Learning: The jobs/migrate-bluesky-handles project uses only Bun's built-in APIs (SQL helper, fetch) and requires no external dependencies, making a simple COPY index.ts approach sufficient in the Dockerfile.
Learnt from: mike182uk
Repo: TryGhost/ActivityPub PR: 1240
File: jobs/migrate-bluesky-handles/Dockerfile:1-8
Timestamp: 2025-09-04T07:49:39.075Z
Learning: The jobs/migrate-bluesky-handles project uses only Bun's built-in APIs (SQL helper, fetch) and requires no external dependencies, making a simple COPY index.ts approach sufficient in the Dockerfile.
Learnt from: mike182uk
Repo: TryGhost/ActivityPub PR: 1329
File: .github/renovate.json5:31-34
Timestamp: 2025-09-25T12:39:45.755Z
Learning: The jobs/update-external-accounts directory was removed from the TryGhost/ActivityPub repository and no longer exists, so Renovate configuration exclusions referencing this path are no longer needed.
jobs/backfill-ghost-uuid/index.test.ts (1)

175-243: LGTM!

Comprehensive test coverage for getSitesWithoutGhostUUID, including edge cases (empty table, no NULL values) and ordering verification.

Comment on lines +1 to +2
#!/usr/bin/env bash

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add error handling with set -e.

The script lacks set -e, so failures in intermediate commands won't halt execution. This could lead to confusing errors if, for example, docker-compose up fails but the script continues.

Apply this diff:

 #!/usr/bin/env bash
+set -e
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#!/usr/bin/env bash
#!/usr/bin/env bash
set -e
🤖 Prompt for AI Agents
In jobs/backfill-ghost-uuid/run-tests.sh around lines 1 to 2, the script is
missing a safety flag so failures in intermediate commands won't stop execution;
add a shell option line immediately after the shebang to enable exiting on
errors (e.g., add set -e), optionally including set -o pipefail for safer
pipeline behavior, so the script will abort when any command fails.

Comment on lines +7 to +15
for i in {1..30}; do
if docker exec "$(docker-compose ps -q mysql-test)" mysqladmin ping -h"localhost" -P"3306" -u"root" -p"root" --silent 2>/dev/null; then
echo "MySQL is ready!"
sleep 2
break
fi
echo "Waiting for MySQL... ($i/30)"
sleep 1
done
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add error handling if MySQL never becomes ready.

The script continues executing tests even if the 30-attempt loop exhausts without MySQL becoming ready. This leads to test failures that don't clearly indicate the root cause.

Apply this diff:

 echo "Waiting for MySQL to be ready..."
+ready=false
 for i in {1..30}; do
     if docker exec "$(docker-compose ps -q mysql-test)" mysqladmin ping -h"localhost" -P"3306" -u"root" -p"root" --silent 2>/dev/null; then
         echo "MySQL is ready!"
         sleep 2
+        ready=true
         break
     fi
     echo "Waiting for MySQL... ($i/30)"
     sleep 1
 done
+
+if [ "$ready" = false ]; then
+    echo "ERROR: MySQL failed to become ready after 30 attempts"
+    docker-compose down
+    exit 1
+fi

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In jobs/backfill-ghost-uuid/run-tests.sh around lines 7 to 15, the script waits
up to 30 times for MySQL but doesn't handle the case where MySQL never becomes
ready; update the script to detect when the loop exhausts (no successful ping),
print a clear error message, dump relevant container logs (e.g., docker-compose
logs mysql-test) to aid debugging, and exit with a non-zero status so tests do
not continue silently.

Copy link
Contributor

@sagzy sagzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@mike182uk mike182uk enabled auto-merge (squash) November 17, 2025 15:04
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a78f56 and 36eb24e.

📒 Files selected for processing (3)
  • jobs/backfill-ghost-uuid/Dockerfile (1 hunks)
  • jobs/backfill-ghost-uuid/index.test.ts (1 hunks)
  • jobs/backfill-ghost-uuid/index.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • jobs/backfill-ghost-uuid/Dockerfile
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: allouis
Repo: TryGhost/ActivityPub PR: 737
File: migrate/migrations/000054_populate-outboxes-table.up.sql:56-63
Timestamp: 2025-05-27T09:09:22.110Z
Learning: In the outboxes table migration for Ghost ActivityPub, the author_id field always refers to the author of the original post, not the person performing the outbox action. For reposts (outbox_type = 1), author_id should be posts.author_id (original author), while user_id captures who made the repost via the joins through accounts and users tables.
Learnt from: mike182uk
Repo: TryGhost/ActivityPub PR: 1240
File: jobs/migrate-bluesky-handles/Dockerfile:1-8
Timestamp: 2025-09-04T07:49:39.075Z
Learning: The jobs/migrate-bluesky-handles project uses only Bun's built-in APIs (SQL helper, fetch) and requires no external dependencies, making a simple COPY index.ts approach sufficient in the Dockerfile.
Learnt from: mike182uk
Repo: TryGhost/ActivityPub PR: 1240
File: jobs/migrate-bluesky-handles/Dockerfile:1-8
Timestamp: 2025-09-04T07:49:39.075Z
Learning: The jobs/migrate-bluesky-handles project uses only Bun's built-in APIs (SQL helper, fetch) and requires no external dependencies, making a simple COPY index.ts approach sufficient in the Dockerfile.
📚 Learning: 2025-09-04T07:49:39.075Z
Learnt from: mike182uk
Repo: TryGhost/ActivityPub PR: 1240
File: jobs/migrate-bluesky-handles/Dockerfile:1-8
Timestamp: 2025-09-04T07:49:39.075Z
Learning: The jobs/migrate-bluesky-handles project uses only Bun's built-in APIs (SQL helper, fetch) and requires no external dependencies, making a simple COPY index.ts approach sufficient in the Dockerfile.

Applied to files:

  • jobs/backfill-ghost-uuid/index.ts
🧬 Code graph analysis (1)
jobs/backfill-ghost-uuid/index.test.ts (1)
jobs/backfill-ghost-uuid/index.ts (3)
  • fetchSiteGhostUUID (28-66)
  • getSitesWithoutGhostUUID (68-76)
  • updateSiteGhostUUID (78-102)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test / Test
🔇 Additional comments (8)
jobs/backfill-ghost-uuid/index.ts (4)

68-76: LGTM!

The query correctly selects sites without a ghost_uuid and orders them by ID. The parameterized query prevents SQL injection.


78-102: LGTM!

The function properly handles duplicate UUID constraints by catching ER_DUP_ENTRY errors and returning a specific result. The type guard for error checking is thorough and safe.


104-186: LGTM!

The orchestration logic is well-structured with proper connection handling, sequential processing with rate limiting (500ms delay), comprehensive logging, and appropriate cleanup in the finally block.


188-194: LGTM!

The entry point correctly guards execution and handles unhandled errors with appropriate exit codes.

jobs/backfill-ghost-uuid/index.test.ts (4)

24-58: LGTM!

The test setup demonstrates excellent practices: proper connection management, table creation/cleanup, clean state between tests via DELETE, and fetch mock restoration. The schema correctly uses CHAR(36) for UUID storage and includes appropriate constraints.


60-173: LGTM!

Comprehensive test coverage for fetchSiteGhostUUID including success path, HTTP errors, malformed responses, missing fields, network errors, and URL format verification. Well-structured with appropriate mocking.


175-243: LGTM!

Thorough test coverage for getSitesWithoutGhostUUID including filtering logic, empty results, and ordering verification. The tests properly validate the WHERE clause behavior.


245-361: LGTM!

Excellent test coverage for updateSiteGhostUUID including null-to-UUID updates, existing UUID updates, duplicate constraint handling, and return value verification. The duplicate test correctly verifies the row remains unchanged when a constraint violation occurs.

Comment on lines +61 to +65
} catch (error) {
console.warn(`Failed to fetch site info for ${host}: ${error.message}`);

return null;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Type-check error before accessing .message property.

The caught error has type unknown. Accessing error.message without verifying it's an Error instance could fail if the error is a string or other type.

Apply this diff to add a type guard:

     } catch (error) {
-        console.warn(`Failed to fetch site info for ${host}: ${error.message}`);
+        const message = error instanceof Error ? error.message : String(error);
+        console.warn(`Failed to fetch site info for ${host}: ${message}`);
 
         return null;
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
} catch (error) {
console.warn(`Failed to fetch site info for ${host}: ${error.message}`);
return null;
}
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
console.warn(`Failed to fetch site info for ${host}: ${message}`);
return null;
}
🤖 Prompt for AI Agents
In jobs/backfill-ghost-uuid/index.ts around lines 61 to 65, the catch block
accesses error.message but the caught value is typed as unknown; add a type
guard to safely extract a message — for example check if (error instanceof
Error) and use error.message, otherwise fall back to String(error) (or a default
like 'Unknown error'), and use that safeMessage in the console.warn before
returning null.

@mike182uk mike182uk merged commit 83a2fb2 into main Nov 17, 2025
29 of 31 checks passed
@mike182uk mike182uk deleted the mike-ber-2964-job-for-ghost-uuid-backfill branch November 17, 2025 16:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants