feat: add image compression and upload pipeline for org logos #102

Open
aayank13 wants to merge 1 commit into ketankauntia:master from aayank13:feat/image-compression-upload-pipeline

Conversation


@aayank13 aayank13 commented Feb 20, 2026

Summary

Adds an image processing pipeline for GSoC organization logos that:

  • Downloads org logos from the GSoC API
  • Compresses them to optimized WebP format via sharp
  • Renames them to {slug}.webp
  • Uploads them to Cloudflare R2

Also creates the images/ folder structure with tech-stack/ and 2026/ subfolders.

Closes #96

Changes

New Files

  • scripts/lib/r2-client.ts — Reusable Cloudflare R2 upload client (S3-compatible)
  • scripts/lib/image-processor.ts — Image download (with retry) + WebP compression
  • scripts/process-org-images.ts — Main pipeline orchestrator with --dry-run and --local-only modes
  • images/tech-stack/.gitkeep — Placeholder for future tech-stack icons
  • images/2026/.gitkeep — Output directory for processed 2026 org logos

Modified Files

  • package.json — Added sharp, @aws-sdk/client-s3; new gsoc:images script; updated gsoc:sync
  • .gitignore — Ignore generated image files in images/
  • transform-year-organizations.ts — Fixed img_r2_url for new orgs (was incorrectly set to raw API URL)

Usage

pnpm gsoc:images -- --year 2026 --dry-run     # Preview what would be processed
pnpm gsoc:images -- --year 2026 --local-only   # Download + compress only
pnpm gsoc:images -- --year 2026                # Full pipeline with R2 upload
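
For context on the compression step: below is a minimal sketch of what the sharp-based WebP conversion could look like. The resize bounds, quality setting, and exact signature are illustrative assumptions; the actual implementation lives in scripts/lib/image-processor.ts and may differ.

import sharp from "sharp";

// Assumed limits (not taken from the PR); the real values may differ.
const MAX_DIMENSION = 512;
const WEBP_QUALITY = 80;

// Shrink oversized logos and re-encode them as WebP.
export async function compressToWebP(input: Buffer): Promise<Buffer> {
    return sharp(input)
        .resize(MAX_DIMENSION, MAX_DIMENSION, {
            fit: "inside",            // preserve aspect ratio
            withoutEnlargement: true, // never upscale small logos
        })
        .webp({ quality: WEBP_QUALITY })
        .toBuffer();
}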


<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **New Features**
  * Automated image processing workflow for organization logos now includes WebP compression for optimized file sizes and cloud storage integration for reliable image management and delivery.

* **Chores**
  * Added project dependencies for image compression and cloud storage functionality.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->


vercel bot commented Feb 20, 2026

@aayank13 is attempting to deploy a commit to the Ketan's Personal Team on Vercel.

A member of the Team first needs to authorize it.


coderabbitai bot commented Feb 20, 2026

Warning

.coderabbit.yaml has a parsing error

The CodeRabbit configuration file in this repository has a parsing error and default settings were used instead. Please fix the error(s) in the configuration file. You can initialize chat with CodeRabbit to get help with the configuration file.

💥 Parsing errors (1)
Validation error: Expected 'de' | 'de-DE' | 'de-AT' | 'de-CH' | 'en' | 'en-US' | 'en-AU' | 'en-GB' | 'en-CA' | 'en-NZ' | 'en-ZA' | 'es' | 'es-AR' | 'fr' | 'fr-CA' | 'fr-CH' | 'fr-BE' | 'nl' | 'nl-BE' | 'pt-AO' | 'pt' | 'pt-BR' | 'pt-MZ' | 'pt-PT' | 'ar' | 'ast-ES' | 'ast' | 'be-BY' | 'be' | 'br-FR' | 'br' | 'ca-ES' | 'ca' | 'ca-ES-valencia' | 'ca-ES-balear' | 'da-DK' | 'da' | 'de-DE-x-simple-language' | 'el-GR' | 'el' | 'eo' | 'fa' | 'ga-IE' | 'ga' | 'gl-ES' | 'gl' | 'it' | 'ja-JP' | 'ja' | 'km-KH' | 'km' | 'ko-KR' | 'ko' | 'pl-PL' | 'pl' | 'ro-RO' | 'ro' | 'ru-RU' | 'ru' | 'sk-SK' | 'sk' | 'sl-SI' | 'sl' | 'sv' | 'ta-IN' | 'ta' | 'tl-PH' | 'tl' | 'tr' | 'uk-UA' | 'uk' | 'zh-CN' | 'zh' | 'crh-UA' | 'crh' | 'cs-CZ' | 'cs' | 'nb' | 'no' | 'nl-NL' | 'de-DE-x-simple-language-DE' | 'es-ES' | 'it-IT' | 'fa-IR' | 'sv-SE' | 'de-LU' | 'fr-FR' | 'bg-BG' | 'bg' | 'he-IL' | 'he' | 'hi-IN' | 'hi' | 'vi-VN' | 'vi' | 'th-TH' | 'th' | 'bn-BD' | 'bn', received object at "language"
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Walkthrough

A new image processing pipeline for GSoC organization logos is introduced. The system downloads logos from the GSoC API, compresses them to WebP format using the sharp library, saves them locally, and uploads them to Cloudflare R2 for distribution. A new script orchestrates this workflow with support for dry-run and local-only modes.

Changes

  • Configuration & Dependencies (.gitignore, package.json): Added ignore patterns for generated webp/png/jpg images in the images directory. Introduced the new npm script gsoc:images and added @aws-sdk/client-s3 and sharp to dependencies and devDependencies. Updated the gsoc:sync workflow to include the image processing step.

  • Image Processing Utilities (scripts/lib/image-processor.ts, scripts/lib/r2-client.ts): New utility modules providing image download/compression (downloadImage, compressToWebP, processAndSaveLocally) and R2 upload capabilities (uploadToR2, getR2PublicUrl). Includes retry logic for downloads and configuration via environment variables.

  • Image Processing Orchestration (scripts/process-org-images.ts): New script that orchestrates the complete workflow: reads organization data, downloads logos, compresses them to WebP, optionally uploads to R2, and updates org JSON files with new image URLs. Supports --year, --dry-run, and --local-only flags with progress logging and error handling.

  • Organization Transform Minor Updates (scripts/transform-year-organizations.ts): Formatting adjustment plus one functional change: img_r2_url now defaults to an empty string instead of falling back to the logo URL when creating new organizations.

Sequence Diagram(s)

sequenceDiagram
    participant Script as process-org-images.ts
    participant FS as File System
    participant Processor as image-processor.ts
    participant R2 as r2-client.ts
    participant CloudflareR2 as Cloudflare R2

    Script->>FS: Read raw org JSON
    Script->>FS: Read per-org JSON files
    
    loop For each org with logo_url
        Script->>Processor: processAndSaveLocally(logo_url, ...)
        activate Processor
        Processor->>Processor: downloadImage (with retries)
        Processor->>Processor: compressToWebP (sharp)
        Processor->>FS: Create images/<YEAR>/ dir
        Processor->>FS: Save compressed image
        Processor-->>Script: Return local image path
        deactivate Processor
        
        alt Not --local-only
            Script->>R2: uploadToR2(image_path, ...)
            activate R2
            R2->>CloudflareR2: PutObjectCommand
            CloudflareR2-->>R2: Return success
            R2->>R2: Generate public URL
            R2-->>Script: Return R2 URL
            deactivate R2
            Script->>FS: Update per-org JSON with img_r2_url
        end
    end
    
    Script->>Script: Log summary (processed, skipped, failed)
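
To make the diagram concrete, here is a rough TypeScript skeleton of the main loop, pieced together from the review excerpts further down. Variable names such as rawOrgs and the progress counters are assumptions; consult scripts/process-org-images.ts for the actual control flow.

// Illustrative skeleton only; dry-run handling and progress logging are omitted.
for (const raw of rawOrgs) {
    if (!raw.logo_url) {
        skipped.push(`${raw.slug} (no logo_url)`);
        continue;
    }
    try {
        const original = await downloadImage(raw.logo_url);   // retries on transient failures
        const webpBuffer = await compressToWebP(original);    // sharp re-encode to WebP
        fs.writeFileSync(path.join(IMAGES_DIR, `${raw.slug}.webp`), webpBuffer);

        if (!LOCAL_ONLY) {
            const r2Url = await uploadToR2(`${raw.slug}.webp`, webpBuffer, "image/webp");
            updateOrgJson(raw.slug, r2Url);                    // persist img_r2_url
        }
        processed++;
    } catch (err) {
        failed++;
        failures.push({ slug: raw.slug, error: err instanceof Error ? err.message : String(err) });
    }
}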

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 With whiskers twitching, I bounce with glee,
Compressing logos to webp spree!
Down they download, to R2 they soar,
Organization images we store and adore! 📸✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 9.09%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check (✅ Passed): The title accurately summarizes the main feature: adding an image compression and upload pipeline for organization logos.
  • Description check (✅ Passed): The description provides a comprehensive summary, detailed changes, and clear usage examples covering all required information.
  • Linked Issues check (✅ Passed): The PR fully implements all requirements from issue #96: downloads org images, compresses them to WebP, renames the files, uploads them to R2, and creates the proper folder structure.
  • Out of Scope Changes check (✅ Passed): All changes are directly aligned with the linked issue requirements; no unrelated modifications were detected in the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (3)
package.json (1)

57-57: sharp ^0.33.0 will not resolve to the current 0.34.x series.

For packages with a 0.x.y version, the ^ range only allows patch increments within the same minor (0.33.*). The latest published version is 0.34.5, which includes upstream libvips bug fixes and TypeScript improvements. Consider bumping to ^0.34.0 to pick up those fixes.

💡 Proposed change
-"sharp": "^0.33.0",
+"sharp": "^0.34.0",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@package.json` at line 57, The package.json currently pins the sharp
dependency to "sharp": "^0.33.0", which will not pick up the 0.34.x series;
update the sharp entry to use "^0.34.0" so the project can receive the 0.34.*
bugfix and TypeScript improvements, then regenerate your lockfile by running
your package manager install (npm/yarn/pnpm) to update package-lock.json or
yarn.lock accordingly; ensure any CI/cache is refreshed so the new version is
used.
scripts/lib/r2-client.ts (1)

26-33: Consider setting a request timeout on the S3 client.

The AWS SDK v3 S3Client has no default socket/request timeout; a stalled upload to R2 will hang the script indefinitely. Add requestHandler or maxAttempts config, or at minimum a socketTimeout.

💡 Suggested timeout config
+import { NodeHttpHandler } from "@smithy/node-http-handler";
+
 _client = new S3Client({
     region: "auto",
     endpoint: `https://${accountId}.r2.cloudflarestorage.com`,
     credentials: {
         accessKeyId: getEnvOrThrow("R2_ACCESS_KEY_ID"),
         secretAccessKey: getEnvOrThrow("R2_SECRET_ACCESS_KEY"),
     },
+    requestHandler: new NodeHttpHandler({
+        requestTimeout: 30_000,
+        socketTimeout: 30_000,
+    }),
 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lib/r2-client.ts` around lines 26 - 33, The S3Client instantiation
assigned to _client lacks a request timeout and can hang; update the S3Client
config in the S3Client(...) call to include a requestHandler with timeouts
(e.g., import and use NodeHttpHandler and pass requestHandler: new
NodeHttpHandler({ socketTimeout: <ms>, connectionTimeout: <ms> })) and/or set
maxAttempts to a sensible retry limit so R2 uploads won't stall indefinitely;
update the S3Client(...) call where _client is created to include these options.
scripts/lib/image-processor.ts (1)

58-74: processAndSaveLocally is exported but its functionality is duplicated inline in process-org-images.ts.

process-org-images.ts manually calls downloadImage → compressToWebP → fs.writeFileSync (lines 117–121) instead of calling processAndSaveLocally. The only difference is the inline size comparison log. Consider extending processAndSaveLocally to return both buffers (or sizes) so callers can retain the size logging while avoiding the duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lib/image-processor.ts` around lines 58 - 74, processAndSaveLocally
duplicates logic in process-org-images.ts; change processAndSaveLocally to
return both the saved file path and size info (e.g., { outputPath: string,
originalSize: number, compressedSize: number } or include the
original/compressed Buffers) so callers can log size differences without
reimplementing downloadImage/compressToWebP/write logic. Update
processAndSaveLocally (the function shown) to capture original buffer size
before compression and compressed buffer size after compressToWebP, write the
file as now, and return the sizes alongside outputPath; then replace the manual
download/compress/write sequence in process-org-images.ts with a call to
processAndSaveLocally and use the returned sizes for the existing size
comparison log.
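
Building on the r2-client.ts timeout nitpick above, a self-contained sketch of how the client could be configured defensively. getEnvOrThrow mirrors the helper visible in the diff; the R2_ACCOUNT_ID env var name and the timeout values are assumptions, not taken from the PR.

import { S3Client } from "@aws-sdk/client-s3";
import { NodeHttpHandler } from "@smithy/node-http-handler";

function getEnvOrThrow(name: string): string {
    const value = process.env[name];
    if (!value) throw new Error(`Missing required env var: ${name}`);
    return value;
}

const accountId = getEnvOrThrow("R2_ACCOUNT_ID"); // assumed env var name

const client = new S3Client({
    region: "auto",
    endpoint: `https://${accountId}.r2.cloudflarestorage.com`,
    credentials: {
        accessKeyId: getEnvOrThrow("R2_ACCESS_KEY_ID"),
        secretAccessKey: getEnvOrThrow("R2_SECRET_ACCESS_KEY"),
    },
    requestHandler: new NodeHttpHandler({
        connectionTimeout: 10_000, // fail fast if R2 is unreachable
        socketTimeout: 30_000,     // abort stalled uploads
    }),
    maxAttempts: 3,                // SDK-level retries on top of the timeout
});
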
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@package.json`:
- Line 22: The npm script "gsoc:sync" currently invokes
scripts/process-org-images.ts without the --local-only flag, which forces R2
uploads and causes silent per-org failures when R2_* env vars are not set;
update the "gsoc:sync" entry to call scripts/process-org-images.ts --local-only
(so uploads are decoupled and handled by the separate "gsoc:images" script), or
alternatively add a README note that "gsoc:sync" requires R2 credentials (R2_*
env vars) if you want to keep the current behavior; reference the "gsoc:sync"
npm script and scripts/process-org-images.ts and "gsoc:images" to make the
change or documentation clear.

In `@scripts/lib/image-processor.ts`:
- Around line 15-39: The downloadImage function currently uses fetch without a
timeout, so a stalled response will never throw and retries won't trigger;
modify downloadImage to create an AbortController for each fetch attempt, pass
controller.signal into fetch(url, { signal }), start a per-attempt timer (e.g.,
via setTimeout) that calls controller.abort() after a configured per-attempt
timeout, and clear the timer when the response is received or on error; ensure
the abort error is handled like other errors so the loop retries (using existing
lastError, RETRY_DELAY_MS and MAX_RETRIES) and that the controller/timer are
properly cleaned up each attempt to avoid leaks.

In `@scripts/process-org-images.ts`:
- Around line 76-88: The skip-and-update logic uses raw.slug directly to build
orgFile so aliased slugs (SLUG_ALIASES) never resolve and R2 URLs aren't
persisted; import or duplicate the SLUG_ALIASES mapping and resolve the
canonical file slug before any filesystem lookup (i.e., compute a resolvedSlug
from SLUG_ALIASES[raw.slug] || raw.slug) and use that when constructing orgFile
(used in the pre-skip check and in updateOrgJson), ensuring both the existence
check and the write/update target the actual JSON filename under ORGS_DIR.
- Line 45: R2_URL_PREFIX is hardcoded which breaks the skip check that uses
currentR2.startsWith(R2_URL_PREFIX); instead derive the prefix from the same
source as r2-client (use the R2_PUBLIC_URL env var or call getR2PublicUrl from
r2-client) so the skip logic matches the actual public URL; update the
declaration of R2_URL_PREFIX to compute its value from process.env.R2_PUBLIC_URL
(or import and call getR2PublicUrl) with the existing literal as a fallback, and
ensure the currentR2.startsWith(...) check uses this computed value.
- Around line 127-132: The R2 upload uses r2Key = `${raw.slug}.webp` which omits
the year and causes cross-year overwrites; update the r2Key construction in the
block that checks LOCAL_ONLY (where uploadToR2 is called) to include the same
year segment used for local saves (e.g., `${year}/${raw.slug}.webp` or whatever
variable holds the YEAR), so the remote key mirrors the local path; ensure any
logging (console.log) and references to r2Url remain unchanged after this
change.
- Around line 150-163: The script currently logs failures but never sets a
non-zero exit code; update the end of the script where failures is inspected
(the block that prints "[FAILURES]" and the LOCAL_ONLY messages) to call
process.exit(1) when failures.length > 0 so CI fails on any upload errors;
ensure you only skip the exit when LOCAL_ONLY is true and uploads were
intentionally not attempted (or always exit non-zero regardless of LOCAL_ONLY if
you prefer the simpler behavior), referencing the failures array and the
existing LOCAL_ONLY/IMAGES_DIR logic to decide when to call process.exit(1).

---

Nitpick comments:
In `@package.json`:
- Line 57: The package.json currently pins the sharp dependency to "sharp":
"^0.33.0", which will not pick up the 0.34.x series; update the sharp entry to
use "^0.34.0" so the project can receive the 0.34.* bugfix and TypeScript
improvements, then regenerate your lockfile by running your package manager
install (npm/yarn/pnpm) to update package-lock.json or yarn.lock accordingly;
ensure any CI/cache is refreshed so the new version is used.

In `@scripts/lib/image-processor.ts`:
- Around line 58-74: processAndSaveLocally duplicates logic in
process-org-images.ts; change processAndSaveLocally to return both the saved
file path and size info (e.g., { outputPath: string, originalSize: number,
compressedSize: number } or include the original/compressed Buffers) so callers
can log size differences without reimplementing
downloadImage/compressToWebP/write logic. Update processAndSaveLocally (the
function shown) to capture original buffer size before compression and
compressed buffer size after compressToWebP, write the file as now, and return
the sizes alongside outputPath; then replace the manual download/compress/write
sequence in process-org-images.ts with a call to processAndSaveLocally and use
the returned sizes for the existing size comparison log.

In `@scripts/lib/r2-client.ts`:
- Around line 26-33: The S3Client instantiation assigned to _client lacks a
request timeout and can hang; update the S3Client config in the S3Client(...)
call to include a requestHandler with timeouts (e.g., import and use
NodeHttpHandler and pass requestHandler: new NodeHttpHandler({ socketTimeout:
<ms>, connectionTimeout: <ms> })) and/or set maxAttempts to a sensible retry
limit so R2 uploads won't stall indefinitely; update the S3Client(...) call
where _client is created to include these options.

"gsoc:regen": "npx tsx scripts/regenerate-tech-topics-from-json.ts",
"gsoc:sync": "npx tsx scripts/fetch-year-data.ts && npx tsx scripts/transform-year-organizations.ts && npx tsx scripts/generate-yearly-page-from-json.ts && npx tsx scripts/regenerate-tech-topics-from-json.ts"
"gsoc:images": "npx tsx scripts/process-org-images.ts",
"gsoc:sync": "npx tsx scripts/fetch-year-data.ts && npx tsx scripts/transform-year-organizations.ts && npx tsx scripts/process-org-images.ts && npx tsx scripts/generate-yearly-page-from-json.ts && npx tsx scripts/regenerate-tech-topics-from-json.ts"

⚠️ Potential issue | 🟡 Minor

gsoc:sync calls process-org-images.ts without --local-only, requiring R2 credentials for a general sync.

Anyone running gsoc:sync without R2_* env vars set will have every upload fail silently (per-org errors are caught and accumulated; script still exits 0), so the pipeline continues but org JSONs never get img_r2_url updated. Consider either:

  • Appending --local-only to the sync invocation (upload separately via gsoc:images), or
  • Documenting in README that gsoc:sync requires R2 credentials configured.
💡 Proposed change — decouple upload from sync
-"gsoc:sync": "npx tsx scripts/fetch-year-data.ts && npx tsx scripts/transform-year-organizations.ts && npx tsx scripts/process-org-images.ts && npx tsx scripts/generate-yearly-page-from-json.ts && npx tsx scripts/regenerate-tech-topics-from-json.ts"
+"gsoc:sync": "npx tsx scripts/fetch-year-data.ts && npx tsx scripts/transform-year-organizations.ts && npx tsx scripts/process-org-images.ts --local-only && npx tsx scripts/generate-yearly-page-from-json.ts && npx tsx scripts/regenerate-tech-topics-from-json.ts"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"gsoc:sync": "npx tsx scripts/fetch-year-data.ts && npx tsx scripts/transform-year-organizations.ts && npx tsx scripts/process-org-images.ts && npx tsx scripts/generate-yearly-page-from-json.ts && npx tsx scripts/regenerate-tech-topics-from-json.ts"
"gsoc:sync": "npx tsx scripts/fetch-year-data.ts && npx tsx scripts/transform-year-organizations.ts && npx tsx scripts/process-org-images.ts --local-only && npx tsx scripts/generate-yearly-page-from-json.ts && npx tsx scripts/regenerate-tech-topics-from-json.ts"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@package.json` at line 22, The npm script "gsoc:sync" currently invokes
scripts/process-org-images.ts without the --local-only flag, which forces R2
uploads and causes silent per-org failures when R2_* env vars are not set;
update the "gsoc:sync" entry to call scripts/process-org-images.ts --local-only
(so uploads are decoupled and handled by the separate "gsoc:images" script), or
alternatively add a README note that "gsoc:sync" requires R2 credentials (R2_*
env vars) if you want to keep the current behavior; reference the "gsoc:sync"
npm script and scripts/process-org-images.ts and "gsoc:images" to make the
change or documentation clear.

Comment on lines +15 to +39
export async function downloadImage(url: string): Promise<Buffer> {
    let lastError: Error | null = null;

    for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
        try {
            const response = await fetch(url);

            if (!response.ok) {
                throw new Error(`HTTP ${response.status} ${response.statusText}`);
            }

            const arrayBuffer = await response.arrayBuffer();
            return Buffer.from(arrayBuffer);
        } catch (err) {
            lastError = err instanceof Error ? err : new Error(String(err));
            if (attempt < MAX_RETRIES) {
                await sleep(RETRY_DELAY_MS * attempt);
            }
        }
    }

    throw new Error(
        `Failed to download ${url} after ${MAX_RETRIES} attempts: ${lastError?.message}`,
    );
}

⚠️ Potential issue | 🟠 Major

fetch has no timeout — a stalled remote server will hang the pipeline indefinitely.

Without an AbortController timeout, a logo server that accepts the connection but never delivers data will block downloadImage forever, bypassing the retry logic (retries only fire on thrown errors, not on a hanging response).

💡 Proposed fix — add per-attempt timeout
 export async function downloadImage(url: string): Promise<Buffer> {
     let lastError: Error | null = null;
+    const TIMEOUT_MS = 15_000;

     for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
         try {
-            const response = await fetch(url);
+            const controller = new AbortController();
+            const timer = setTimeout(() => controller.abort(), TIMEOUT_MS);
+            let response: Response;
+            try {
+                response = await fetch(url, { signal: controller.signal });
+            } finally {
+                clearTimeout(timer);
+            }

             if (!response.ok) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lib/image-processor.ts` around lines 15 - 39, The downloadImage
function currently uses fetch without a timeout, so a stalled response will
never throw and retries won't trigger; modify downloadImage to create an
AbortController for each fetch attempt, pass controller.signal into fetch(url, {
signal }), start a per-attempt timer (e.g., via setTimeout) that calls
controller.abort() after a configured per-attempt timeout, and clear the timer
when the response is received or on error; ensure the abort error is handled
like other errors so the loop retries (using existing lastError, RETRY_DELAY_MS
and MAX_RETRIES) and that the controller/timer are properly cleaned up each
attempt to avoid leaks.
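
For reference, the whole function with the per-attempt timeout folded in might read as follows. This is a sketch that assumes MAX_RETRIES, RETRY_DELAY_MS, and sleep() stay as defined in the existing module, and that 15 seconds is an acceptable per-attempt budget.

export async function downloadImage(url: string): Promise<Buffer> {
    const TIMEOUT_MS = 15_000; // assumed per-attempt budget
    let lastError: Error | null = null;

    for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
        const controller = new AbortController();
        const timer = setTimeout(() => controller.abort(), TIMEOUT_MS);
        try {
            const response = await fetch(url, { signal: controller.signal });
            if (!response.ok) {
                throw new Error(`HTTP ${response.status} ${response.statusText}`);
            }
            const arrayBuffer = await response.arrayBuffer();
            return Buffer.from(arrayBuffer);
        } catch (err) {
            // AbortError lands here too, so a stalled attempt is retried like any other failure.
            lastError = err instanceof Error ? err : new Error(String(err));
            if (attempt < MAX_RETRIES) {
                await sleep(RETRY_DELAY_MS * attempt);
            }
        } finally {
            clearTimeout(timer); // clean up the per-attempt timer on success or failure
        }
    }

    throw new Error(
        `Failed to download ${url} after ${MAX_RETRIES} attempts: ${lastError?.message}`,
    );
}

If the target Node version supports it, AbortSignal.timeout(TIMEOUT_MS) passed directly to fetch would be a shorter alternative to the manual controller and timer.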

);
const IMAGES_DIR = path.join(ROOT, "images", String(YEAR));

const R2_URL_PREFIX = "https://pub-268c3a1efc8b4f8a99115507a760ca14.r2.dev/";

⚠️ Potential issue | 🟠 Major

R2_URL_PREFIX is hardcoded and diverges from the R2_PUBLIC_URL env var used in r2-client.ts.

The skip logic on lines 81–84 compares currentR2.startsWith(R2_URL_PREFIX), but r2-client.getR2PublicUrl derives the actual public URL from the R2_PUBLIC_URL env var. If a different R2 bucket or CDN is configured, the hardcoded prefix will never match existing records: all orgs will be re-downloaded and re-uploaded on every run, and already-processed orgs won't be skipped.

💡 Proposed fix — derive prefix from env var
-const R2_URL_PREFIX = "https://pub-268c3a1efc8b4f8a99115507a760ca14.r2.dev/";
+const R2_URL_PREFIX = process.env.R2_PUBLIC_URL
+    ? process.env.R2_PUBLIC_URL.replace(/\/+$/, "") + "/"
+    : "https://pub-268c3a1efc8b4f8a99115507a760ca14.r2.dev/";
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const R2_URL_PREFIX = "https://pub-268c3a1efc8b4f8a99115507a760ca14.r2.dev/";
const R2_URL_PREFIX = process.env.R2_PUBLIC_URL
? process.env.R2_PUBLIC_URL.replace(/\/+$/, "") + "/"
: "https://pub-268c3a1efc8b4f8a99115507a760ca14.r2.dev/";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/process-org-images.ts` at line 45, R2_URL_PREFIX is hardcoded which
breaks the skip check that uses currentR2.startsWith(R2_URL_PREFIX); instead
derive the prefix from the same source as r2-client (use the R2_PUBLIC_URL env
var or call getR2PublicUrl from r2-client) so the skip logic matches the actual
public URL; update the declaration of R2_URL_PREFIX to compute its value from
process.env.R2_PUBLIC_URL (or import and call getR2PublicUrl) with the existing
literal as a fallback, and ensure the currentR2.startsWith(...) check uses this
computed value.

Comment on lines +76 to +88
const orgFile = path.join(ORGS_DIR, `${raw.slug}.json`);
if (fs.existsSync(orgFile)) {
    try {
        const orgData = JSON.parse(fs.readFileSync(orgFile, "utf-8"));
        const currentR2 = orgData.img_r2_url || orgData.logo_r2_url || "";
        if (currentR2.startsWith(R2_URL_PREFIX)) {
            skipped.push(`${raw.slug} (already has R2 URL)`);
            continue;
        }
    } catch {
        // If we can't read/parse the file, process anyway
    }
}

⚠️ Potential issue | 🟠 Major

Slug alias resolution is missing — aliased orgs will never have their R2 URL written.

scripts/transform-year-organizations.ts maps some raw API slugs to different file names via SLUG_ALIASES (e.g., "ceph" → "ceph-foundation", "openms-inc" → "openms"). Both the skip-check (line 76) and updateOrgJson (line 167) derive the org file path directly from raw.slug:

const orgFile = path.join(ORGS_DIR, `${raw.slug}.json`);  // e.g., ceph.json — does not exist

For any aliased org:

  1. The skip-check won't find the org file → the alias is always queued for download even if already uploaded.
  2. After upload, updateOrgJson silently skips (line 168: if (!fs.existsSync(orgFile)) return;) → img_r2_url / logo_r2_url are never persisted.

Fix: duplicate or extract the SLUG_ALIASES map (or move it to a shared config) and resolve the file slug before both checks.

💡 Minimal fix — add alias resolution
+// Keep in sync with SLUG_ALIASES in transform-year-organizations.ts
+const SLUG_ALIASES: Record<string, string> = {
+    "ceph": "ceph-foundation",
+    "openms-inc": "openms",
+};
+
+function resolveFileSlug(rawSlug: string): string {
+    return SLUG_ALIASES[rawSlug] ?? rawSlug;
+}

Then replace every raw.slug used for file lookup:

-const orgFile = path.join(ORGS_DIR, `${raw.slug}.json`);
+const orgFile = path.join(ORGS_DIR, `${resolveFileSlug(raw.slug)}.json`);
 function updateOrgJson(slug: string, r2Url: string) {
-    const orgFile = path.join(ORGS_DIR, `${slug}.json`);
+    const orgFile = path.join(ORGS_DIR, `${resolveFileSlug(slug)}.json`);

Ideally the alias map should be extracted to a shared module (e.g., scripts/lib/slug-aliases.ts) consumed by both scripts to avoid the duplication.

Also applies to: 166-178

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/process-org-images.ts` around lines 76 - 88, The skip-and-update
logic uses raw.slug directly to build orgFile so aliased slugs (SLUG_ALIASES)
never resolve and R2 URLs aren't persisted; import or duplicate the SLUG_ALIASES
mapping and resolve the canonical file slug before any filesystem lookup (i.e.,
compute a resolvedSlug from SLUG_ALIASES[raw.slug] || raw.slug) and use that
when constructing orgFile (used in the pre-skip check and in updateOrgJson),
ensuring both the existence check and the write/update target the actual JSON
filename under ORGS_DIR.
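
A sketch of the shared module suggested above, assuming scripts/lib/slug-aliases.ts as the location; the two entries are only the examples cited in this review, and the real SLUG_ALIASES map in transform-year-organizations.ts may contain more.

// scripts/lib/slug-aliases.ts (proposed location, not part of the PR)
// Maps raw GSoC API slugs to the on-disk org JSON filename slugs.
export const SLUG_ALIASES: Record<string, string> = {
    "ceph": "ceph-foundation",
    "openms-inc": "openms",
};

export function resolveFileSlug(rawSlug: string): string {
    return SLUG_ALIASES[rawSlug] ?? rawSlug;
}

Both transform-year-organizations.ts and process-org-images.ts could then import resolveFileSlug instead of carrying their own copies of the map.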

Comment on lines +127 to +132
let r2Url = "";
if (!LOCAL_ONLY) {
    const r2Key = `${raw.slug}.webp`;
    r2Url = await uploadToR2(r2Key, webpBuffer, "image/webp");
    console.log(`${progress} Uploaded to R2: ${r2Url}`);
}

⚠️ Potential issue | 🟡 Minor

R2 key has no year prefix, inconsistent with the local path structure and causes year-over-year overwrites.

The local file is saved as images/{YEAR}/{slug}.webp, but the R2 key is just {slug}.webp. A re-run for a different year (e.g., 2027) silently overwrites the 2026 logo in R2, and any org link pointing to the 2026 URL would serve the 2027 image.

If the intent is one canonical logo per org (year-agnostic), this should be documented. If per-year logos are needed, the key should include the year:

💡 Proposed fix — include year in R2 key
-const r2Key = `${raw.slug}.webp`;
+const r2Key = `${YEAR}/${raw.slug}.webp`;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/process-org-images.ts` around lines 127 - 132, The R2 upload uses
r2Key = `${raw.slug}.webp` which omits the year and causes cross-year
overwrites; update the r2Key construction in the block that checks LOCAL_ONLY
(where uploadToR2 is called) to include the same year segment used for local
saves (e.g., `${year}/${raw.slug}.webp` or whatever variable holds the YEAR), so
the remote key mirrors the local path; ensure any logging (console.log) and
references to r2Url remain unchanged after this change.

Comment on lines +150 to +163
console.log("\n[DONE] Image processing complete!");
console.log(` Processed: ${processed}`);
console.log(` Skipped: ${skipped.length}`);
console.log(` Failed: ${failed}`);

if (failures.length > 0) {
console.log("\n[FAILURES]");
failures.forEach((f) => console.log(` - ${f.slug}: ${f.error}`));
}

if (LOCAL_ONLY) {
console.log(`\n Local files saved to: ${IMAGES_DIR}`);
console.log(" Re-run without --local-only to upload to R2.");
}

⚠️ Potential issue | 🟡 Minor

Script always exits 0 even when uploads fail — CI won't detect a broken run.

failures is logged but never causes a non-zero exit. If all 200 orgs fail (e.g., due to a wrong R2 credential), the script exits cleanly and gsoc:sync proceeds to the next step.

💡 Proposed fix — propagate failure exit code
+    if (failures.length > 0 && failures.length === processed + failures.length) {
+        // Total failure — exit non-zero so CI/scripts can detect it
+        process.exit(1);
+    }

Or simpler — always exit non-zero when any upload failed:

+    if (failures.length > 0) {
+        process.exit(1);
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
console.log("\n[DONE] Image processing complete!");
console.log(` Processed: ${processed}`);
console.log(` Skipped: ${skipped.length}`);
console.log(` Failed: ${failed}`);
if (failures.length > 0) {
console.log("\n[FAILURES]");
failures.forEach((f) => console.log(` - ${f.slug}: ${f.error}`));
}
if (LOCAL_ONLY) {
console.log(`\n Local files saved to: ${IMAGES_DIR}`);
console.log(" Re-run without --local-only to upload to R2.");
}
console.log("\n[DONE] Image processing complete!");
console.log(` Processed: ${processed}`);
console.log(` Skipped: ${skipped.length}`);
console.log(` Failed: ${failed}`);
if (failures.length > 0) {
console.log("\n[FAILURES]");
failures.forEach((f) => console.log(` - ${f.slug}: ${f.error}`));
}
if (LOCAL_ONLY) {
console.log(`\n Local files saved to: ${IMAGES_DIR}`);
console.log(" Re-run without --local-only to upload to R2.");
}
if (failures.length > 0) {
process.exit(1);
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/process-org-images.ts` around lines 150 - 163, The script currently
logs failures but never sets a non-zero exit code; update the end of the script
where failures is inspected (the block that prints "[FAILURES]" and the
LOCAL_ONLY messages) to call process.exit(1) when failures.length > 0 so CI
fails on any upload errors; ensure you only skip the exit when LOCAL_ONLY is
true and uploads were intentionally not attempted (or always exit non-zero
regardless of LOCAL_ONLY if you prefer the simpler behavior), referencing the
failures array and the existing LOCAL_ONLY/IMAGES_DIR logic to decide when to
call process.exit(1).
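
A small sketch of the LOCAL_ONLY-aware variant described in the prompt; the condition and names come from the excerpts above, and using process.exitCode instead of process.exit lets any remaining console output flush before the process ends.

// Fail the run when any org failed, unless this was an intentional --local-only pass.
if (failures.length > 0 && !LOCAL_ONLY) {
    process.exitCode = 1;
}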
