feat: add image compression and upload pipeline for org logos #102

Open
aayank13 wants to merge 1 commit into ketankauntia:master from aayank13:feat/image-compression-upload-pipeline

Conversation


@aayank13 aayank13 commented Feb 20, 2026

Summary

Adds an image processing pipeline for GSoC organization logos that:

  • Downloads org logos from the GSoC API
  • Compresses them to optimized WebP format via sharp
  • Renames them to {slug}.webp
  • Uploads them to Cloudflare R2

Also creates the images/ folder structure with tech-stack/ and 2026/ subfolders.

Closes #96

Changes

New Files

  • scripts/lib/r2-client.ts — Reusable Cloudflare R2 upload client (S3-compatible)
  • scripts/lib/image-processor.ts — Image download (with retry) + WebP compression
  • scripts/process-org-images.ts — Main pipeline orchestrator with --dry-run and --local-only modes
  • images/tech-stack/.gitkeep — Placeholder for future tech-stack icons
  • images/2026/.gitkeep — Output directory for processed 2026 org logos

Modified Files

  • package.json — Added sharp, @aws-sdk/client-s3; new gsoc:images script; updated gsoc:sync
  • .gitignore — Ignore generated image files in images/
  • transform-year-organizations.ts — Fixed img_r2_url for new orgs (was incorrectly set to raw API URL)

Usage

pnpm gsoc:images -- --year 2026 --dry-run     # Preview what would be processed
pnpm gsoc:images -- --year 2026 --local-only   # Download + compress only
pnpm gsoc:images -- --year 2026                # Full pipeline with R2 upload
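
For context on the compression step: below is a minimal sketch of what the sharp-based WebP conversion could look like. The resize bounds, quality setting, and exact signature are illustrative assumptions; the actual implementation lives in scripts/lib/image-processor.ts and may differ.

import sharp from "sharp";

// Assumed limits (not taken from the PR); the real values may differ.
const MAX_DIMENSION = 512;
const WEBP_QUALITY = 80;

// Shrink oversized logos and re-encode them as WebP.
export async function compressToWebP(input: Buffer): Promise<Buffer> {
    return sharp(input)
        .resize(MAX_DIMENSION, MAX_DIMENSION, {
            fit: "inside",            // preserve aspect ratio
            withoutEnlargement: true, // never upscale small logos
        })
        .webp({ quality: WEBP_QUALITY })
        .toBuffer();
}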


<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **New Features**
  * Automated image processing workflow for organization logos now includes WebP compression for optimized file sizes and cloud storage integration for reliable image management and delivery.

* **Chores**
  * Added project dependencies for image compression and cloud storage functionality.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->


vercel bot commented Feb 20, 2026

@aayank13 is attempting to deploy a commit to the Ketan's Personal Team on Vercel.

A member of the Team first needs to authorize it.


coderabbitai bot commented Feb 20, 2026

Warning

.coderabbit.yaml has a parsing error

The CodeRabbit configuration file in this repository has a parsing error and default settings were used instead. Please fix the error(s) in the configuration file. You can initialize chat with CodeRabbit to get help with the configuration file.

💥 Parsing errors (1)
Validation error: Expected 'de' | 'de-DE' | 'de-AT' | 'de-CH' | 'en' | 'en-US' | 'en-AU' | 'en-GB' | 'en-CA' | 'en-NZ' | 'en-ZA' | 'es' | 'es-AR' | 'fr' | 'fr-CA' | 'fr-CH' | 'fr-BE' | 'nl' | 'nl-BE' | 'pt-AO' | 'pt' | 'pt-BR' | 'pt-MZ' | 'pt-PT' | 'ar' | 'ast-ES' | 'ast' | 'be-BY' | 'be' | 'br-FR' | 'br' | 'ca-ES' | 'ca' | 'ca-ES-valencia' | 'ca-ES-balear' | 'da-DK' | 'da' | 'de-DE-x-simple-language' | 'el-GR' | 'el' | 'eo' | 'fa' | 'ga-IE' | 'ga' | 'gl-ES' | 'gl' | 'it' | 'ja-JP' | 'ja' | 'km-KH' | 'km' | 'ko-KR' | 'ko' | 'pl-PL' | 'pl' | 'ro-RO' | 'ro' | 'ru-RU' | 'ru' | 'sk-SK' | 'sk' | 'sl-SI' | 'sl' | 'sv' | 'ta-IN' | 'ta' | 'tl-PH' | 'tl' | 'tr' | 'uk-UA' | 'uk' | 'zh-CN' | 'zh' | 'crh-UA' | 'crh' | 'cs-CZ' | 'cs' | 'nb' | 'no' | 'nl-NL' | 'de-DE-x-simple-language-DE' | 'es-ES' | 'it-IT' | 'fa-IR' | 'sv-SE' | 'de-LU' | 'fr-FR' | 'bg-BG' | 'bg' | 'he-IL' | 'he' | 'hi-IN' | 'hi' | 'vi-VN' | 'vi' | 'th-TH' | 'th' | 'bn-BD' | 'bn', received object at "language"
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Walkthrough

A new image processing pipeline for GSoC organization logos is introduced. The system downloads logos from the GSoC API, compresses them to WebP format using the sharp library, saves them locally, and uploads them to Cloudflare R2 for distribution. A new script orchestrates this workflow with support for dry-run and local-only modes.

Changes

  • Configuration & Dependencies (.gitignore, package.json): Added ignore patterns for generated webp/png/jpg images in the images directory. Introduced the new npm script gsoc:images and added @aws-sdk/client-s3 and sharp to dependencies and devDependencies. Updated the gsoc:sync workflow to include the image processing step.

  • Image Processing Utilities (scripts/lib/image-processor.ts, scripts/lib/r2-client.ts): New utility modules providing image download/compression (downloadImage, compressToWebP, processAndSaveLocally) and R2 upload capabilities (uploadToR2, getR2PublicUrl). Includes retry logic for downloads and configuration via environment variables.

  • Image Processing Orchestration (scripts/process-org-images.ts): New script that orchestrates the complete workflow: reads organization data, downloads logos, compresses them to WebP, optionally uploads to R2, and updates org JSON files with new image URLs. Supports --year, --dry-run, and --local-only flags with progress logging and error handling.

  • Organization Transform Minor Updates (scripts/transform-year-organizations.ts): Formatting adjustment plus one functional change: img_r2_url now defaults to an empty string instead of falling back to the logo URL when creating new organizations.

Sequence Diagram(s)

sequenceDiagram
    participant Script as process-org-images.ts
    participant FS as File System
    participant Processor as image-processor.ts
    participant R2 as r2-client.ts
    participant CloudflareR2 as Cloudflare R2

    Script->>FS: Read raw org JSON
    Script->>FS: Read per-org JSON files
    
    loop For each org with logo_url
        Script->>Processor: processAndSaveLocally(logo_url, ...)
        activate Processor
        Processor->>Processor: downloadImage (with retries)
        Processor->>Processor: compressToWebP (sharp)
        Processor->>FS: Create images/<YEAR>/ dir
        Processor->>FS: Save compressed image
        Processor-->>Script: Return local image path
        deactivate Processor
        
        alt Not --local-only
            Script->>R2: uploadToR2(image_path, ...)
            activate R2
            R2->>CloudflareR2: PutObjectCommand
            CloudflareR2-->>R2: Return success
            R2->>R2: Generate public URL
            R2-->>Script: Return R2 URL
            deactivate R2
            Script->>FS: Update per-org JSON with img_r2_url
        end
    end
    
    Script->>Script: Log summary (processed, skipped, failed)
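
To make the diagram concrete, here is a rough TypeScript skeleton of the main loop, pieced together from the review excerpts further down. Variable names such as rawOrgs and the progress counters are assumptions; consult scripts/process-org-images.ts for the actual control flow.

// Illustrative skeleton only; dry-run handling and progress logging are omitted.
for (const raw of rawOrgs) {
    if (!raw.logo_url) {
        skipped.push(`${raw.slug} (no logo_url)`);
        continue;
    }
    try {
        const original = await downloadImage(raw.logo_url);   // retries on transient failures
        const webpBuffer = await compressToWebP(original);    // sharp re-encode to WebP
        fs.writeFileSync(path.join(IMAGES_DIR, `${raw.slug}.webp`), webpBuffer);

        if (!LOCAL_ONLY) {
            const r2Url = await uploadToR2(`${raw.slug}.webp`, webpBuffer, "image/webp");
            updateOrgJson(raw.slug, r2Url);                    // persist img_r2_url
        }
        processed++;
    } catch (err) {
        failed++;
        failures.push({ slug: raw.slug, error: err instanceof Error ? err.message : String(err) });
    }
}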

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 With whiskers twitching, I bounce with glee,
Compressing logos to webp spree!
Down they download, to R2 they soar,
Organization images we store and adore! 📸✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 9.09%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check (✅ Passed): The title accurately summarizes the main feature: adding an image compression and upload pipeline for organization logos.
  • Description check (✅ Passed): The description provides a comprehensive summary, detailed changes, and clear usage examples covering all required information.
  • Linked Issues check (✅ Passed): The PR fully implements all requirements from issue #96: downloads org images, compresses them to WebP, renames the files, uploads them to R2, and creates the proper folder structure.
  • Out of Scope Changes check (✅ Passed): All changes are directly aligned with the linked issue requirements; no unrelated modifications were detected in the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (3)
package.json (1)

57-57: sharp ^0.33.0 will not resolve to the current 0.34.x series.

For packages with a 0.x.y version, the ^ range only allows patch increments within the same minor (0.33.*). The latest published version is 0.34.5, which includes upstream libvips bug fixes and TypeScript improvements. Consider bumping to ^0.34.0 to pick up those fixes.

💡 Proposed change
-"sharp": "^0.33.0",
+"sharp": "^0.34.0",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@package.json` at line 57, The package.json currently pins the sharp
dependency to "sharp": "^0.33.0", which will not pick up the 0.34.x series;
update the sharp entry to use "^0.34.0" so the project can receive the 0.34.*
bugfix and TypeScript improvements, then regenerate your lockfile by running
your package manager install (npm/yarn/pnpm) to update package-lock.json or
yarn.lock accordingly; ensure any CI/cache is refreshed so the new version is
used.
scripts/lib/r2-client.ts (1)

26-33: Consider setting a request timeout on the S3 client.

The AWS SDK v3 S3Client has no default socket/request timeout; a stalled upload to R2 will hang the script indefinitely. Add requestHandler or maxAttempts config, or at minimum a socketTimeout.

💡 Suggested timeout config
+import { NodeHttpHandler } from "@smithy/node-http-handler";
+
 _client = new S3Client({
     region: "auto",
     endpoint: `https://${accountId}.r2.cloudflarestorage.com`,
     credentials: {
         accessKeyId: getEnvOrThrow("R2_ACCESS_KEY_ID"),
         secretAccessKey: getEnvOrThrow("R2_SECRET_ACCESS_KEY"),
     },
+    requestHandler: new NodeHttpHandler({
+        requestTimeout: 30_000,
+        socketTimeout: 30_000,
+    }),
 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lib/r2-client.ts` around lines 26 - 33, The S3Client instantiation
assigned to _client lacks a request timeout and can hang; update the S3Client
config in the S3Client(...) call to include a requestHandler with timeouts
(e.g., import and use NodeHttpHandler and pass requestHandler: new
NodeHttpHandler({ socketTimeout: <ms>, connectionTimeout: <ms> })) and/or set
maxAttempts to a sensible retry limit so R2 uploads won't stall indefinitely;
update the S3Client(...) call where _client is created to include these options.
scripts/lib/image-processor.ts (1)

58-74: processAndSaveLocally is exported but its functionality is duplicated inline in process-org-images.ts.

process-org-images.ts manually calls downloadImage → compressToWebP → fs.writeFileSync (lines 117–121) instead of calling processAndSaveLocally. The only difference is the inline size comparison log. Consider extending processAndSaveLocally to return both buffers (or sizes) so callers can retain the size logging while avoiding the duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lib/image-processor.ts` around lines 58 - 74, processAndSaveLocally
duplicates logic in process-org-images.ts; change processAndSaveLocally to
return both the saved file path and size info (e.g., { outputPath: string,
originalSize: number, compressedSize: number } or include the
original/compressed Buffers) so callers can log size differences without
reimplementing downloadImage/compressToWebP/write logic. Update
processAndSaveLocally (the function shown) to capture original buffer size
before compression and compressed buffer size after compressToWebP, write the
file as now, and return the sizes alongside outputPath; then replace the manual
download/compress/write sequence in process-org-images.ts with a call to
processAndSaveLocally and use the returned sizes for the existing size
comparison log.
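
Building on the r2-client.ts timeout nitpick above, a self-contained sketch of how the client could be configured defensively. getEnvOrThrow mirrors the helper visible in the diff; the R2_ACCOUNT_ID env var name and the timeout values are assumptions, not taken from the PR.

import { S3Client } from "@aws-sdk/client-s3";
import { NodeHttpHandler } from "@smithy/node-http-handler";

function getEnvOrThrow(name: string): string {
    const value = process.env[name];
    if (!value) throw new Error(`Missing required env var: ${name}`);
    return value;
}

const accountId = getEnvOrThrow("R2_ACCOUNT_ID"); // assumed env var name

const client = new S3Client({
    region: "auto",
    endpoint: `https://${accountId}.r2.cloudflarestorage.com`,
    credentials: {
        accessKeyId: getEnvOrThrow("R2_ACCESS_KEY_ID"),
        secretAccessKey: getEnvOrThrow("R2_SECRET_ACCESS_KEY"),
    },
    requestHandler: new NodeHttpHandler({
        connectionTimeout: 10_000, // fail fast if R2 is unreachable
        socketTimeout: 30_000,     // abort stalled uploads
    }),
    maxAttempts: 3,                // SDK-level retries on top of the timeout
});
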
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@package.json`:
- Line 22: The npm script "gsoc:sync" currently invokes
scripts/process-org-images.ts without the --local-only flag, which forces R2
uploads and causes silent per-org failures when R2_* env vars are not set;
update the "gsoc:sync" entry to call scripts/process-org-images.ts --local-only
(so uploads are decoupled and handled by the separate "gsoc:images" script), or
alternatively add a README note that "gsoc:sync" requires R2 credentials (R2_*
env vars) if you want to keep the current behavior; reference the "gsoc:sync"
npm script and scripts/process-org-images.ts and "gsoc:images" to make the
change or documentation clear.

In `@scripts/lib/image-processor.ts`:
- Around line 15-39: The downloadImage function currently uses fetch without a
timeout, so a stalled response will never throw and retries won't trigger;
modify downloadImage to create an AbortController for each fetch attempt, pass
controller.signal into fetch(url, { signal }), start a per-attempt timer (e.g.,
via setTimeout) that calls controller.abort() after a configured per-attempt
timeout, and clear the timer when the response is received or on error; ensure
the abort error is handled like other errors so the loop retries (using existing
lastError, RETRY_DELAY_MS and MAX_RETRIES) and that the controller/timer are
properly cleaned up each attempt to avoid leaks.

In `@scripts/process-org-images.ts`:
- Around line 76-88: The skip-and-update logic uses raw.slug directly to build
orgFile so aliased slugs (SLUG_ALIASES) never resolve and R2 URLs aren't
persisted; import or duplicate the SLUG_ALIASES mapping and resolve the
canonical file slug before any filesystem lookup (i.e., compute a resolvedSlug
from SLUG_ALIASES[raw.slug] || raw.slug) and use that when constructing orgFile
(used in the pre-skip check and in updateOrgJson), ensuring both the existence
check and the write/update target the actual JSON filename under ORGS_DIR.
- Line 45: R2_URL_PREFIX is hardcoded which breaks the skip check that uses
currentR2.startsWith(R2_URL_PREFIX); instead derive the prefix from the same
source as r2-client (use the R2_PUBLIC_URL env var or call getR2PublicUrl from
r2-client) so the skip logic matches the actual public URL; update the
declaration of R2_URL_PREFIX to compute its value from process.env.R2_PUBLIC_URL
(or import and call getR2PublicUrl) with the existing literal as a fallback, and
ensure the currentR2.startsWith(...) check uses this computed value.
- Around line 127-132: The R2 upload uses r2Key = `${raw.slug}.webp` which omits
the year and causes cross-year overwrites; update the r2Key construction in the
block that checks LOCAL_ONLY (where uploadToR2 is called) to include the same
year segment used for local saves (e.g., `${year}/${raw.slug}.webp` or whatever
variable holds the YEAR), so the remote key mirrors the local path; ensure any
logging (console.log) and references to r2Url remain unchanged after this
change.
- Around line 150-163: The script currently logs failures but never sets a
non-zero exit code; update the end of the script where failures is inspected
(the block that prints "[FAILURES]" and the LOCAL_ONLY messages) to call
process.exit(1) when failures.length > 0 so CI fails on any upload errors;
ensure you only skip the exit when LOCAL_ONLY is true and uploads were
intentionally not attempted (or always exit non-zero regardless of LOCAL_ONLY if
you prefer the simpler behavior), referencing the failures array and the
existing LOCAL_ONLY/IMAGES_DIR logic to decide when to call process.exit(1).

---

Nitpick comments:
In `@package.json`:
- Line 57: The package.json currently pins the sharp dependency to "sharp":
"^0.33.0", which will not pick up the 0.34.x series; update the sharp entry to
use "^0.34.0" so the project can receive the 0.34.* bugfix and TypeScript
improvements, then regenerate your lockfile by running your package manager
install (npm/yarn/pnpm) to update package-lock.json or yarn.lock accordingly;
ensure any CI/cache is refreshed so the new version is used.

In `@scripts/lib/image-processor.ts`:
- Around line 58-74: processAndSaveLocally duplicates logic in
process-org-images.ts; change processAndSaveLocally to return both the saved
file path and size info (e.g., { outputPath: string, originalSize: number,
compressedSize: number } or include the original/compressed Buffers) so callers
can log size differences without reimplementing
downloadImage/compressToWebP/write logic. Update processAndSaveLocally (the
function shown) to capture original buffer size before compression and
compressed buffer size after compressToWebP, write the file as now, and return
the sizes alongside outputPath; then replace the manual download/compress/write
sequence in process-org-images.ts with a call to processAndSaveLocally and use
the returned sizes for the existing size comparison log.

In `@scripts/lib/r2-client.ts`:
- Around line 26-33: The S3Client instantiation assigned to _client lacks a
request timeout and can hang; update the S3Client config in the S3Client(...)
call to include a requestHandler with timeouts (e.g., import and use
NodeHttpHandler and pass requestHandler: new NodeHttpHandler({ socketTimeout:
<ms>, connectionTimeout: <ms> })) and/or set maxAttempts to a sensible retry
limit so R2 uploads won't stall indefinitely; update the S3Client(...) call
where _client is created to include these options.

"gsoc:regen": "npx tsx scripts/regenerate-tech-topics-from-json.ts",
"gsoc:sync": "npx tsx scripts/fetch-year-data.ts && npx tsx scripts/transform-year-organizations.ts && npx tsx scripts/generate-yearly-page-from-json.ts && npx tsx scripts/regenerate-tech-topics-from-json.ts"
"gsoc:images": "npx tsx scripts/process-org-images.ts",
"gsoc:sync": "npx tsx scripts/fetch-year-data.ts && npx tsx scripts/transform-year-organizations.ts && npx tsx scripts/process-org-images.ts && npx tsx scripts/generate-yearly-page-from-json.ts && npx tsx scripts/regenerate-tech-topics-from-json.ts"

⚠️ Potential issue | 🟡 Minor

gsoc:sync calls process-org-images.ts without --local-only, requiring R2 credentials for a general sync.

Anyone running gsoc:sync without R2_* env vars set will have every upload fail silently (per-org errors are caught and accumulated; script still exits 0), so the pipeline continues but org JSONs never get img_r2_url updated. Consider either:

  • Appending --local-only to the sync invocation (upload separately via gsoc:images), or
  • Documenting in README that gsoc:sync requires R2 credentials configured.
💡 Proposed change — decouple upload from sync
-"gsoc:sync": "npx tsx scripts/fetch-year-data.ts && npx tsx scripts/transform-year-organizations.ts && npx tsx scripts/process-org-images.ts && npx tsx scripts/generate-yearly-page-from-json.ts && npx tsx scripts/regenerate-tech-topics-from-json.ts"
+"gsoc:sync": "npx tsx scripts/fetch-year-data.ts && npx tsx scripts/transform-year-organizations.ts && npx tsx scripts/process-org-images.ts --local-only && npx tsx scripts/generate-yearly-page-from-json.ts && npx tsx scripts/regenerate-tech-topics-from-json.ts"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"gsoc:sync": "npx tsx scripts/fetch-year-data.ts && npx tsx scripts/transform-year-organizations.ts && npx tsx scripts/process-org-images.ts && npx tsx scripts/generate-yearly-page-from-json.ts && npx tsx scripts/regenerate-tech-topics-from-json.ts"
"gsoc:sync": "npx tsx scripts/fetch-year-data.ts && npx tsx scripts/transform-year-organizations.ts && npx tsx scripts/process-org-images.ts --local-only && npx tsx scripts/generate-yearly-page-from-json.ts && npx tsx scripts/regenerate-tech-topics-from-json.ts"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@package.json` at line 22, The npm script "gsoc:sync" currently invokes
scripts/process-org-images.ts without the --local-only flag, which forces R2
uploads and causes silent per-org failures when R2_* env vars are not set;
update the "gsoc:sync" entry to call scripts/process-org-images.ts --local-only
(so uploads are decoupled and handled by the separate "gsoc:images" script), or
alternatively add a README note that "gsoc:sync" requires R2 credentials (R2_*
env vars) if you want to keep the current behavior; reference the "gsoc:sync"
npm script and scripts/process-org-images.ts and "gsoc:images" to make the
change or documentation clear.

Comment on lines +15 to +39
export async function downloadImage(url: string): Promise<Buffer> {
    let lastError: Error | null = null;

    for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
        try {
            const response = await fetch(url);

            if (!response.ok) {
                throw new Error(`HTTP ${response.status} ${response.statusText}`);
            }

            const arrayBuffer = await response.arrayBuffer();
            return Buffer.from(arrayBuffer);
        } catch (err) {
            lastError = err instanceof Error ? err : new Error(String(err));
            if (attempt < MAX_RETRIES) {
                await sleep(RETRY_DELAY_MS * attempt);
            }
        }
    }

    throw new Error(
        `Failed to download ${url} after ${MAX_RETRIES} attempts: ${lastError?.message}`,
    );
}

⚠️ Potential issue | 🟠 Major

fetch has no timeout — a stalled remote server will hang the pipeline indefinitely.

Without an AbortController timeout, a logo server that accepts the connection but never delivers data will block downloadImage forever, bypassing the retry logic (retries only fire on thrown errors, not on a hanging response).

💡 Proposed fix — add per-attempt timeout
 export async function downloadImage(url: string): Promise<Buffer> {
     let lastError: Error | null = null;
+    const TIMEOUT_MS = 15_000;

     for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
         try {
-            const response = await fetch(url);
+            const controller = new AbortController();
+            const timer = setTimeout(() => controller.abort(), TIMEOUT_MS);
+            let response: Response;
+            try {
+                response = await fetch(url, { signal: controller.signal });
+            } finally {
+                clearTimeout(timer);
+            }

             if (!response.ok) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lib/image-processor.ts` around lines 15 - 39, The downloadImage
function currently uses fetch without a timeout, so a stalled response will
never throw and retries won't trigger; modify downloadImage to create an
AbortController for each fetch attempt, pass controller.signal into fetch(url, {
signal }), start a per-attempt timer (e.g., via setTimeout) that calls
controller.abort() after a configured per-attempt timeout, and clear the timer
when the response is received or on error; ensure the abort error is handled
like other errors so the loop retries (using existing lastError, RETRY_DELAY_MS
and MAX_RETRIES) and that the controller/timer are properly cleaned up each
attempt to avoid leaks.
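
For reference, the whole function with the per-attempt timeout folded in might read as follows. This is a sketch that assumes MAX_RETRIES, RETRY_DELAY_MS, and sleep() stay as defined in the existing module, and that 15 seconds is an acceptable per-attempt budget.

export async function downloadImage(url: string): Promise<Buffer> {
    const TIMEOUT_MS = 15_000; // assumed per-attempt budget
    let lastError: Error | null = null;

    for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
        const controller = new AbortController();
        const timer = setTimeout(() => controller.abort(), TIMEOUT_MS);
        try {
            const response = await fetch(url, { signal: controller.signal });
            if (!response.ok) {
                throw new Error(`HTTP ${response.status} ${response.statusText}`);
            }
            const arrayBuffer = await response.arrayBuffer();
            return Buffer.from(arrayBuffer);
        } catch (err) {
            // AbortError lands here too, so a stalled attempt is retried like any other failure.
            lastError = err instanceof Error ? err : new Error(String(err));
            if (attempt < MAX_RETRIES) {
                await sleep(RETRY_DELAY_MS * attempt);
            }
        } finally {
            clearTimeout(timer); // clean up the per-attempt timer on success or failure
        }
    }

    throw new Error(
        `Failed to download ${url} after ${MAX_RETRIES} attempts: ${lastError?.message}`,
    );
}

If the target Node version supports it, AbortSignal.timeout(TIMEOUT_MS) passed directly to fetch would be a shorter alternative to the manual controller and timer.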

);
const IMAGES_DIR = path.join(ROOT, "images", String(YEAR));

const R2_URL_PREFIX = "https://pub-268c3a1efc8b4f8a99115507a760ca14.r2.dev/";

⚠️ Potential issue | 🟠 Major

R2_URL_PREFIX is hardcoded and diverges from the R2_PUBLIC_URL env var used in r2-client.ts.

The skip logic on lines 81–84 compares currentR2.startsWith(R2_URL_PREFIX), but r2-client.getR2PublicUrl derives the actual public URL from the R2_PUBLIC_URL env var. If a different R2 bucket or CDN is configured, the hardcoded prefix will never match existing records: all orgs will be re-downloaded and re-uploaded on every run, and already-processed orgs won't be skipped.

💡 Proposed fix — derive prefix from env var
-const R2_URL_PREFIX = "https://pub-268c3a1efc8b4f8a99115507a760ca14.r2.dev/";
+const R2_URL_PREFIX = process.env.R2_PUBLIC_URL
+    ? process.env.R2_PUBLIC_URL.replace(/\/+$/, "") + "/"
+    : "https://pub-268c3a1efc8b4f8a99115507a760ca14.r2.dev/";
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const R2_URL_PREFIX = "https://pub-268c3a1efc8b4f8a99115507a760ca14.r2.dev/";
const R2_URL_PREFIX = process.env.R2_PUBLIC_URL
? process.env.R2_PUBLIC_URL.replace(/\/+$/, "") + "/"
: "https://pub-268c3a1efc8b4f8a99115507a760ca14.r2.dev/";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/process-org-images.ts` at line 45, R2_URL_PREFIX is hardcoded which
breaks the skip check that uses currentR2.startsWith(R2_URL_PREFIX); instead
derive the prefix from the same source as r2-client (use the R2_PUBLIC_URL env
var or call getR2PublicUrl from r2-client) so the skip logic matches the actual
public URL; update the declaration of R2_URL_PREFIX to compute its value from
process.env.R2_PUBLIC_URL (or import and call getR2PublicUrl) with the existing
literal as a fallback, and ensure the currentR2.startsWith(...) check uses this
computed value.

Comment on lines +76 to +88
const orgFile = path.join(ORGS_DIR, `${raw.slug}.json`);
if (fs.existsSync(orgFile)) {
    try {
        const orgData = JSON.parse(fs.readFileSync(orgFile, "utf-8"));
        const currentR2 = orgData.img_r2_url || orgData.logo_r2_url || "";
        if (currentR2.startsWith(R2_URL_PREFIX)) {
            skipped.push(`${raw.slug} (already has R2 URL)`);
            continue;
        }
    } catch {
        // If we can't read/parse the file, process anyway
    }
}

⚠️ Potential issue | 🟠 Major

Slug alias resolution is missing — aliased orgs will never have their R2 URL written.

scripts/transform-year-organizations.ts maps some raw API slugs to different file names via SLUG_ALIASES (e.g., "ceph" → "ceph-foundation", "openms-inc" → "openms"). Both the skip-check (line 76) and updateOrgJson (line 167) derive the org file path directly from raw.slug:

const orgFile = path.join(ORGS_DIR, `${raw.slug}.json`);  // e.g., ceph.json — does not exist

For any aliased org:

  1. The skip-check won't find the org file → the alias is always queued for download even if already uploaded.
  2. After upload, updateOrgJson silently skips (line 168: if (!fs.existsSync(orgFile)) return;) → img_r2_url / logo_r2_url are never persisted.

Fix: duplicate or extract the SLUG_ALIASES map (or move it to a shared config) and resolve the file slug before both checks.

💡 Minimal fix — add alias resolution
+// Keep in sync with SLUG_ALIASES in transform-year-organizations.ts
+const SLUG_ALIASES: Record<string, string> = {
+    "ceph": "ceph-foundation",
+    "openms-inc": "openms",
+};
+
+function resolveFileSlug(rawSlug: string): string {
+    return SLUG_ALIASES[rawSlug] ?? rawSlug;
+}

Then replace every raw.slug used for file lookup:

-const orgFile = path.join(ORGS_DIR, `${raw.slug}.json`);
+const orgFile = path.join(ORGS_DIR, `${resolveFileSlug(raw.slug)}.json`);
 function updateOrgJson(slug: string, r2Url: string) {
-    const orgFile = path.join(ORGS_DIR, `${slug}.json`);
+    const orgFile = path.join(ORGS_DIR, `${resolveFileSlug(slug)}.json`);

Ideally the alias map should be extracted to a shared module (e.g., scripts/lib/slug-aliases.ts) consumed by both scripts to avoid the duplication.

Also applies to: 166-178

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/process-org-images.ts` around lines 76 - 88, The skip-and-update
logic uses raw.slug directly to build orgFile so aliased slugs (SLUG_ALIASES)
never resolve and R2 URLs aren't persisted; import or duplicate the SLUG_ALIASES
mapping and resolve the canonical file slug before any filesystem lookup (i.e.,
compute a resolvedSlug from SLUG_ALIASES[raw.slug] || raw.slug) and use that
when constructing orgFile (used in the pre-skip check and in updateOrgJson),
ensuring both the existence check and the write/update target the actual JSON
filename under ORGS_DIR.
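
A sketch of the shared module suggested above, assuming scripts/lib/slug-aliases.ts as the location; the two entries are only the examples cited in this review, and the real SLUG_ALIASES map in transform-year-organizations.ts may contain more.

// scripts/lib/slug-aliases.ts (proposed location, not part of the PR)
// Maps raw GSoC API slugs to the on-disk org JSON filename slugs.
export const SLUG_ALIASES: Record<string, string> = {
    "ceph": "ceph-foundation",
    "openms-inc": "openms",
};

export function resolveFileSlug(rawSlug: string): string {
    return SLUG_ALIASES[rawSlug] ?? rawSlug;
}

Both transform-year-organizations.ts and process-org-images.ts could then import resolveFileSlug instead of carrying their own copies of the map.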

Comment on lines +127 to +132
let r2Url = "";
if (!LOCAL_ONLY) {
    const r2Key = `${raw.slug}.webp`;
    r2Url = await uploadToR2(r2Key, webpBuffer, "image/webp");
    console.log(`${progress} Uploaded to R2: ${r2Url}`);
}

⚠️ Potential issue | 🟡 Minor

R2 key has no year prefix, inconsistent with the local path structure and causes year-over-year overwrites.

The local file is saved as images/{YEAR}/{slug}.webp, but the R2 key is just {slug}.webp. A re-run for a different year (e.g., 2027) silently overwrites the 2026 logo in R2, and any org link pointing to the 2026 URL would serve the 2027 image.

If the intent is one canonical logo per org (year-agnostic), this should be documented. If per-year logos are needed, the key should include the year:

💡 Proposed fix — include year in R2 key
-const r2Key = `${raw.slug}.webp`;
+const r2Key = `${YEAR}/${raw.slug}.webp`;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/process-org-images.ts` around lines 127 - 132, The R2 upload uses
r2Key = `${raw.slug}.webp` which omits the year and causes cross-year
overwrites; update the r2Key construction in the block that checks LOCAL_ONLY
(where uploadToR2 is called) to include the same year segment used for local
saves (e.g., `${year}/${raw.slug}.webp` or whatever variable holds the YEAR), so
the remote key mirrors the local path; ensure any logging (console.log) and
references to r2Url remain unchanged after this change.

Comment on lines +150 to +163
console.log("\n[DONE] Image processing complete!");
console.log(` Processed: ${processed}`);
console.log(` Skipped: ${skipped.length}`);
console.log(` Failed: ${failed}`);

if (failures.length > 0) {
console.log("\n[FAILURES]");
failures.forEach((f) => console.log(` - ${f.slug}: ${f.error}`));
}

if (LOCAL_ONLY) {
console.log(`\n Local files saved to: ${IMAGES_DIR}`);
console.log(" Re-run without --local-only to upload to R2.");
}

⚠️ Potential issue | 🟡 Minor

Script always exits 0 even when uploads fail — CI won't detect a broken run.

failures is logged but never causes a non-zero exit. If all 200 orgs fail (e.g., due to a wrong R2 credential), the script exits cleanly and gsoc:sync proceeds to the next step.

💡 Proposed fix — propagate failure exit code
+    if (failures.length > 0 && failures.length === processed + failures.length) {
+        // Total failure — exit non-zero so CI/scripts can detect it
+        process.exit(1);
+    }

Or simpler — always exit non-zero when any upload failed:

+    if (failures.length > 0) {
+        process.exit(1);
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
console.log("\n[DONE] Image processing complete!");
console.log(` Processed: ${processed}`);
console.log(` Skipped: ${skipped.length}`);
console.log(` Failed: ${failed}`);
if (failures.length > 0) {
console.log("\n[FAILURES]");
failures.forEach((f) => console.log(` - ${f.slug}: ${f.error}`));
}
if (LOCAL_ONLY) {
console.log(`\n Local files saved to: ${IMAGES_DIR}`);
console.log(" Re-run without --local-only to upload to R2.");
}
console.log("\n[DONE] Image processing complete!");
console.log(` Processed: ${processed}`);
console.log(` Skipped: ${skipped.length}`);
console.log(` Failed: ${failed}`);
if (failures.length > 0) {
console.log("\n[FAILURES]");
failures.forEach((f) => console.log(` - ${f.slug}: ${f.error}`));
}
if (LOCAL_ONLY) {
console.log(`\n Local files saved to: ${IMAGES_DIR}`);
console.log(" Re-run without --local-only to upload to R2.");
}
if (failures.length > 0) {
process.exit(1);
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/process-org-images.ts` around lines 150 - 163, The script currently
logs failures but never sets a non-zero exit code; update the end of the script
where failures is inspected (the block that prints "[FAILURES]" and the
LOCAL_ONLY messages) to call process.exit(1) when failures.length > 0 so CI
fails on any upload errors; ensure you only skip the exit when LOCAL_ONLY is
true and uploads were intentionally not attempted (or always exit non-zero
regardless of LOCAL_ONLY if you prefer the simpler behavior), referencing the
failures array and the existing LOCAL_ONLY/IMAGES_DIR logic to decide when to
call process.exit(1).
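
A small sketch of the LOCAL_ONLY-aware variant described in the prompt; the condition and names come from the excerpts above, and using process.exitCode instead of process.exit lets any remaining console output flush before the process ends.

// Fail the run when any org failed, unless this was an intentional --local-only pass.
if (failures.length > 0 && !LOCAL_ONLY) {
    process.exitCode = 1;
}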
