Add on-demand AI category summaries with smart caching #15

Draft
Copilot wants to merge 2 commits into main from copilot/add-on-demand-ai-summaries

Copilot AI commented Mar 2, 2026

Summaries for news categories were never generated. This PR adds a fully on-demand generation pipeline: summaries are created only when explicitly requested, cached for 24 hours, and built with minimal crawling of external sources.

generate_summaries.js (new)

Standalone script + optional HTTP server. Key behaviors:

  • Crawls only what's missing: skips articles that already have content; crawls the rest with 500 ms throttle and 10 s timeout
  • Top 10 recent articles per category to cap external requests
  • AI or extractive: uses OpenAI GPT when OPENAI_API_KEY is set; falls back to a lightweight extractive summary otherwise
  • 24-hour cache in category_summaries.json; --force bypasses it

# Single category
node generate_summaries.js "Digital policy + rights"

# All categories, ignore cache
node generate_summaries.js --force

# Optional HTTP API
node generate_summaries.js --server
# POST /api/summaries/refresh  { "category": "Culture", "force": true }
# GET  /api/summaries
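
The 24-hour cache and the --force bypass shown above could be decided by a small freshness check. This is only a sketch: the cache shape (`{ [category]: { summary, generatedAt } }`) and the names `isFresh` and `shouldRegenerate` are assumptions for illustration, not taken from generate_summaries.js.

```javascript
// Sketch only: cache shape and function names are hypothetical,
// not copied from the PR's generate_summaries.js.
const CACHE_MAX_AGE_MS = 24 * 60 * 60 * 1000; // 24 hours

// True when the entry has a parseable timestamp newer than maxAgeMs.
// Entries stamped in the future are treated as stale.
function isFresh(entry, now = Date.now(), maxAgeMs = CACHE_MAX_AGE_MS) {
  if (!entry || !entry.generatedAt) return false;
  const age = now - new Date(entry.generatedAt).getTime();
  return Number.isFinite(age) && age >= 0 && age < maxAgeMs;
}

// force mirrors the --force flag: it skips the freshness check entirely.
function shouldRegenerate(cache, category, { force = false, now = Date.now() } = {}) {
  if (force) return true;
  return !isFresh(cache[category], now);
}
```

Keeping the check free of file I/O makes it easy to reuse from both the CLI path and the HTTP endpoint.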

Environment variables:

| Variable | Purpose |
| --- | --- |
| OPENAI_API_KEY | Enables AI summaries; omit to use extractive fallback |
| OPENAI_MODEL | Model override (default: gpt-3.5-turbo) |
| SUMMARIES_PORT | HTTP server port (default: 3001) |
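
To illustrate how these variables might select between the AI and extractive paths, here is a minimal sketch. Only the variable names and defaults come from the table; `readConfig`, `extractiveSummary`, and the article fields `content` and `title` are hypothetical, and the first-sentence heuristic is a stand-in rather than the extractive method the script actually uses.

```javascript
// Sketch only: helper names and article fields are assumptions.
function readConfig(env = process.env) {
  return {
    apiKey: env.OPENAI_API_KEY || null,
    model: env.OPENAI_MODEL || "gpt-3.5-turbo",
    port: Number.parseInt(env.SUMMARIES_PORT || "3001", 10),
    // Without an API key, fall back to the extractive path.
    mode: env.OPENAI_API_KEY ? "ai" : "extractive",
  };
}

// Naive extractive fallback: take the first sentence of each article
// (or its title when there is no content) until maxSentences is reached.
function extractiveSummary(articles, maxSentences = 3) {
  const sentences = [];
  for (const article of articles) {
    const text = (article.content || article.title || "").trim();
    if (!text) continue;
    // First run of text ending in . ! or ? followed by whitespace or end of string.
    const match = text.match(/^.*?[.!?](?=\s|$)/s);
    sentences.push(match ? match[0] : text);
    if (sentences.length >= maxSentences) break;
  }
  return sentences.join(" ");
}
```

Passing `env` as a parameter keeps the configuration logic testable without touching the real process environment.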

app.js (frontend, minimal)

Lazily loads category_summaries.json once and replaces the default category description with the cached AI summary when one exists — no change to rendering logic or data loading.
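
A load-once pattern like the one described could look roughly as follows. The names `loadSummaries` and `describeCategory`, and the `summary` field on each cache entry, are assumptions for illustration; the actual app.js may differ.

```javascript
// Sketch only: function names and cache entry shape are hypothetical.
let summariesPromise = null;

// Fetch category_summaries.json at most once; later calls reuse the promise.
function loadSummaries(fetchImpl = fetch) {
  if (!summariesPromise) {
    summariesPromise = fetchImpl("category_summaries.json")
      .then((res) => (res.ok ? res.json() : {}))
      .catch(() => ({})); // missing or invalid file: keep default descriptions
  }
  return summariesPromise;
}

// Return the cached summary when one exists, otherwise the default text.
async function describeCategory(category, defaultDescription, fetchImpl = fetch) {
  const summaries = await loadSummaries(fetchImpl);
  const entry = summaries[category];
  return entry && entry.summary ? entry.summary : defaultDescription;
}
```

Because a failed load resolves to an empty object, rendering falls through to the existing default descriptions without extra error handling.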

.gitignore

Force-includes category_summaries.json alongside data.json and stats.json.

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • commission.europa.eu
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node generate_summaries.js --help (dns block)
  • foreignpolicy.com
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node generate_summaries.js --help (dns block)
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node generate_summaries.js Global policy --force (dns block)
  • kansallisarkisto.fi
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node generate_summaries.js --help (dns block)
  • lvm.fi
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node generate_summaries.js --help (dns block)
  • okm.fi
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node generate_summaries.js --help (dns block)
  • suomenkuvalehti.fi
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node generate_summaries.js --help (dns block)
  • techpolicy.press
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node generate_summaries.js --help (dns block)
  • valtioneuvosto.fi
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node generate_summaries.js --help (dns block)
  • www.consilium.europa.eu
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node generate_summaries.js --help (dns block)
  • www.global-solutions-initiative.org
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node generate_summaries.js --help (dns block)
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node generate_summaries.js Global policy --force (dns block)
  • yle.fi
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node generate_summaries.js --help (dns block)
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node generate_summaries.js Global policy --force (dns block)
  • ym.fi
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node generate_summaries.js --help (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

Implement on-demand AI category summaries that are only generated when explicitly requested, cached until newer versions are created.

Requirements:

  1. No automatic generation: Summaries should only be created when someone explicitly requests them (via CLI command or API endpoint)

  2. Intelligent crawling with minimal queries:

    • Only fetch article content for articles without existing content data
    • Use 500ms throttling between requests to avoid rate limiting
    • 10-second timeout per request for slow sources
    • Limit to top 10 recent articles per category to minimize requests
  3. Smart caching:

    • Store generated summaries in category_summaries.json with timestamp
    • Check cache age (24 hours default) before regenerating
    • Allow forced refresh if needed
    • Only regenerate summaries that are explicitly requested
  4. Content-based summaries:

    • Extract titles and crawl article links to get full context
    • Generate meaningful summaries from actual article content
    • Identify key topics and article diversity
    • Include references to top 3 articles
  5. Implementation:

    • Create generate_summaries.js script for on-demand execution
    • Add optional API endpoint in app.js to trigger generation on-demand (POST /api/summaries/refresh)
    • Support CLI usage: node generate_summaries.js "Category Name"
    • Provide status output showing what was generated vs cached

This ensures summaries are only created when needed, minimizing unnecessary API calls and requests to external sources.
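
The crawling requirements above (skip articles that already have content, 500 ms throttle, 10-second timeout, top 10 recent articles) could be sketched as follows. The article fields `date`, `link`, and `content`, the helper names, and the choice to apply the top-10 cut before filtering are all assumptions, not confirmed by the PR. The default fetcher also assumes a recent Node.js with global `fetch` and `AbortSignal.timeout`.

```javascript
// Sketch only: field names, helper names, and ordering of steps are assumptions.
const THROTTLE_MS = 500;
const TIMEOUT_MS = 10_000;
const MAX_ARTICLES = 10;

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Take the most recent `limit` articles, then keep only those lacking content.
// Assumes every article has a parseable `date`.
function selectArticlesToCrawl(articles, limit = MAX_ARTICLES) {
  return [...articles]
    .sort((a, b) => new Date(b.date) - new Date(a.date))
    .slice(0, limit)
    .filter((article) => !article.content);
}

// Default fetcher: aborts the request after timeoutMs.
async function fetchWithTimeout(url, timeoutMs = TIMEOUT_MS) {
  const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.text();
}

// Crawl sequentially, pausing between requests; one failure does not stop the rest.
// Returns the number of articles that were attempted.
async function crawlMissing(articles, fetchText = fetchWithTimeout, throttleMs = THROTTLE_MS) {
  const targets = selectArticlesToCrawl(articles);
  for (let i = 0; i < targets.length; i++) {
    if (i > 0) await sleep(throttleMs);
    try {
      targets[i].content = await fetchText(targets[i].link);
    } catch (err) {
      targets[i].crawlError = err.message;
    }
  }
  return targets.length;
}
```

Injecting `fetchText` lets the crawl loop be exercised without network access, which matters in sandboxed environments like the one that produced the firewall warning.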

This pull request was created from Copilot chat.


Co-authored-by: susannaanas <4725416+susannaanas@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add on-demand AI category summaries with caching" to "Add on-demand AI category summaries with smart caching" on Mar 2, 2026