
lucassz/newssift


NewsSift

NewsSift is an intelligent news curation service that uses AI to deliver personalized news digests based on your interests. It automatically fetches articles from your chosen news sources, evaluates them using large language models, and delivers curated reports straight to your inbox.

[Screenshots: the NewsSift dashboard and the NewsSift report page]

Features

Core Features

  • 🎯 Customizable News Sources: Add any news website with CSS selectors to specify where to find articles
  • 🤖 AI-Powered Curation: Uses any LLM served through an OpenAI-compatible API to evaluate articles against your preferences
  • 📅 Flexible Scheduling: Receive daily reports generated at your preferred time
  • Instant Testing: Generate on-demand reports to test and refine your preferences
  • 📧 Email Delivery: Receive curated articles directly in your inbox

Architecture

NewsSift is split into two services: a processing server, which handles all LLM requests, and a webapp. The processing server starts report generation in one of two ways: on schedule, based on the report settings stored in the database, or on demand, via an API endpoint that the webapp backend calls.

In addition to promoting separation of concerns, splitting the code into two services lets each one be deployed on the platform best suited to it. The webapp, which is used to manage settings and view reports, can run on a serverless platform such as Vercel or Netlify, since its requests are quick to handle. LLM calls, by contrast, are slow and spend most of their time waiting on the model, so they are more cost-effective to run in a long-lived, shared Node.js process than under a serverless billing model. I suggest deploying the webapp to Vercel and the processing server to Railway.
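The README does not say how the scheduler decides which users are due. One plausible approach is a pure check that the processing server runs on a timer against the scheduled report settings loaded from the database. In the sketch below, the `ScheduleRow` shape, the use of UTC times and the `isReportDue`/`dueUserIds` helpers are all hypothetical.

```typescript
// Hypothetical schedule record; NewsSift's real schema may differ.
// Times are assumed to be stored in UTC.
interface ScheduleRow {
  userId: string;
  reportTime: string; // "HH:MM" in UTC
  lastReportDate: string | null; // "YYYY-MM-DD" (UTC) of the last scheduled report
}

// A report is due once today's scheduled time has passed and no scheduled
// report has been generated yet today.
function isReportDue(row: ScheduleRow, now: Date): boolean {
  const [hours, minutes] = row.reportTime.split(":").map(Number);
  const today = now.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
  if (row.lastReportDate === today) return false;
  const nowMinutes = now.getUTCHours() * 60 + now.getUTCMinutes();
  return nowMinutes >= hours * 60 + minutes;
}

// Picks the users whose reports should be generated on this tick.
function dueUserIds(rows: ScheduleRow[], now: Date): string[] {
  return rows.filter((row) => isReportDue(row, now)).map((row) => row.userId);
}
```

Comparing with "at or after" instead of requiring an exact minute means a tick missed during a restart still produces that day's report, while `lastReportDate` (which the caller would update after each run) prevents duplicates. The processing server could call `dueUserIds` from a timer such as `setInterval(..., 60_000)`.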

Webapp

The web interface handles:

  • Authentication
  • News source configuration
  • Preferences (evaluation prompt and scheduled report time)
  • Report history and triggering of on-demand generation

It uses:

  • Framework: Next.js 15 with App Router, using Server Components and Server Actions
  • UI Components: shadcn/ui + Tailwind CSS
  • Authentication: Better Auth
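On the webapp side, triggering on-demand generation amounts to the backend (for example, a Server Action) making an HTTP request to the processing server. The endpoint path `/reports/generate`, the bearer-token header and the `PROCESSING_SERVER_URL` / `PROCESSING_SERVER_SECRET` environment variables below are hypothetical placeholders, not NewsSift's actual API. The sketch only illustrates the shape of the call, using the global `fetch` and `Request` available in Node.js 18+.

```typescript
// Hypothetical route and auth scheme; the real processing server's API may differ.
function buildOnDemandRequest(baseUrl: string, userId: string, secret: string): Request {
  return new Request(new URL("/reports/generate", baseUrl).toString(), {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${secret}`, // shared secret so only the webapp can trigger runs
    },
    body: JSON.stringify({ userId }),
  });
}

// In the webapp this would be called from a Server Action after checking the session.
async function triggerOnDemandReport(userId: string): Promise<void> {
  const baseUrl = process.env.PROCESSING_SERVER_URL; // hypothetical variable name
  const secret = process.env.PROCESSING_SERVER_SECRET; // hypothetical variable name
  if (!baseUrl || !secret) {
    throw new Error("PROCESSING_SERVER_URL and PROCESSING_SERVER_SECRET must be set");
  }
  const res = await fetch(buildOnDemandRequest(baseUrl, userId, secret));
  if (!res.ok) {
    throw new Error(`On-demand report request failed: HTTP ${res.status}`);
  }
}
```

Keeping request construction separate from `fetch` makes it testable without a running processing server. One reasonable design is for the endpoint to respond immediately and generate the report in the background, since evaluating a full set of articles with an LLM can take a while.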

Processing server

A Node.js server that handles:

  • Scheduled report generation
  • On-demand report creation
  • Article fetching and parsing
  • LLM-based content evaluation
  • Email sending

It uses:

  • Email Service: Resend and React Email
  • LLM Integration: OpenAI-compatible API (I suggest using OpenRouter for easy access to many models)
    • Note that the service places a rate limit on the
  • Web Scraping: Cheerio for HTML parsing and Readability for extracting article contents
  • HTTP Server: Fastify
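Because the LLM integration relies only on the OpenAI-compatible API, switching providers is a matter of changing the base URL, API key and model name. The sketch below builds a Chat Completions request with the global `fetch` API rather than an SDK. The helper names are hypothetical, the OpenRouter base URL (`https://openrouter.ai/api/v1`) and model ID are examples only, and `temperature: 0` is an assumed setting meant to make yes/no verdicts more repeatable.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Builds a POST to {baseUrl}/chat/completions. Plain string joining is used on
// purpose: new URL("chat/completions", "https://openrouter.ai/api/v1") would drop
// the "/v1" segment because the base has no trailing slash.
function buildChatCompletionRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): Request {
  const url = `${baseUrl.replace(/\/+$/, "")}/chat/completions`;
  return new Request(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, messages, temperature: 0 }),
  });
}

// Sends the request and returns the text of the first choice.
async function complete(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): Promise<string> {
  const res = await fetch(buildChatCompletionRequest(baseUrl, apiKey, model, messages));
  if (!res.ok) throw new Error(`LLM request failed: HTTP ${res.status}`);
  const data = (await res.json()) as {
    choices: { message: { content: string | null } }[];
  };
  return data.choices[0]?.message.content ?? "";
}
```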

The following steps are used to generate a report:

  1. For each news source, visit the configured page and apply its CSS selector to retrieve a list of article links
  2. For each gathered link, visit the page and use Readability to extract the main article text
  3. Split the gathered articles into batches and, for each batch, send the LLM a request containing a system prompt, the user's evaluation prompt and the text of each article in the batch
     3.1. The LLM is asked to answer yes or no for each article, indicating whether it should be selected, along with a brief explanation that helps with tuning the prompt
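Steps 3 and 3.1 can be sketched as three small pieces: splitting articles into batches, building the per-batch prompt, and parsing the model's per-article verdicts. The `Article`/`Verdict` shapes, the numbered-article prompt layout and the JSON answer format (`[{"index": 0, "answer": "yes", "explanation": "..."}]`) are all assumptions for illustration; the system prompt NewsSift actually sends, and the response format it requests, may differ.

```typescript
// Hypothetical shapes; NewsSift's actual types may differ.
interface Article { url: string; title: string; text: string; }
interface Verdict { url: string; selected: boolean; explanation: string; }

// Splits items into consecutive batches of at most `batchSize`.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize < 1) throw new RangeError("batchSize must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// User message for one batch: the evaluation prompt, then each article
// numbered so the model can refer to it by index.
function buildBatchPrompt(evaluationPrompt: string, batch: Article[]): string {
  const articles = batch
    .map((a, i) => `### Article ${i}\nTitle: ${a.title}\n\n${a.text}`)
    .join("\n\n");
  return `${evaluationPrompt}\n\n${articles}`;
}

// Parses the assumed JSON array answer, tolerating a ```json fence around it.
function parseVerdicts(raw: string, batch: Article[]): Verdict[] {
  const start = raw.indexOf("[");
  const end = raw.lastIndexOf("]");
  if (start === -1 || end < start) throw new Error("No JSON array in model response");
  const items: unknown = JSON.parse(raw.slice(start, end + 1));
  if (!Array.isArray(items)) throw new Error("Model response is not an array");
  const verdicts: Verdict[] = [];
  for (const item of items as { index: number; answer?: string; explanation?: string }[]) {
    const article = batch[item.index];
    if (!article) continue; // ignore indices outside this batch
    verdicts.push({
      url: article.url,
      selected: String(item.answer ?? "").trim().toLowerCase() === "yes",
      explanation: item.explanation ?? "",
    });
  }
  return verdicts;
}
```

Referring to articles by their index within the batch keeps the model's output short, and skipping out-of-range indices guards against the model answering for an article it was never shown.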

Auxiliary

Across both the webapp and the processing server, the codebase uses:

Planned enhancements

  • RSS feed support (both for inputting news items and for outputting the filtered feed)
  • Multiple evaluation prompts per user (e.g., per source)
  • Generating article summaries

Installation

To run locally:

  1. Create ./.env and ./web/.env based on the corresponding .env.example files
     1.1. A Neon database URL is expected, but other Postgres drivers can easily be used by tweaking src/lib/db/index.ts
  2. Run pnpm install to install top-level dependencies
  3. Run pnpm run drizzle-kit push to push the schema to your Postgres database
  4. Run pnpm run build && pnpm run start to start the processing server (pnpm run dev is also available to watch for changes during development)
  5. Run cd web && pnpm install && pnpm run build && pnpm run start to start the webapp (pnpm run dev is also available)
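Since a missing or mistyped database URL is an easy setup mistake, a startup check along these lines could fail fast with a clear message. This is a hypothetical helper, not part of NewsSift: it only checks `POSTGRES_URL` (the variable used in the cloud instructions below) and does not cover the other variables listed in the .env.example files.

```typescript
// Hypothetical startup check for the database connection string.
function readPostgresUrl(env: Record<string, string | undefined>): URL {
  const raw = env.POSTGRES_URL;
  if (!raw) {
    throw new Error("POSTGRES_URL is not set; create .env from .env.example first");
  }
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    throw new Error("POSTGRES_URL is not a valid URL");
  }
  // Postgres connection strings may use either scheme spelling.
  if (url.protocol !== "postgres:" && url.protocol !== "postgresql:") {
    throw new Error(`POSTGRES_URL must use postgres:// or postgresql://, got ${url.protocol}//`);
  }
  if (!url.hostname) throw new Error("POSTGRES_URL has no host");
  return url;
}

// Example: const dbUrl = readPostgresUrl(process.env);
```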

To run on the cloud:

  1. Run pnpm install and POSTGRES_URL=YOUR_URL_HERE pnpm run drizzle-kit push locally to push the schema to your Postgres database
  2. Deploy the repo as a Next.js project to a host such as Vercel, using ./web as the base directory and using ./web/.env.example as a template for the environment variables to configure
  3. Deploy the processing server as a Docker image to a host such as Railway, Render or GCP, using the repository's Dockerfile to build and run the server and ./.env.example as a template for the environment variables to configure

About

Personalized news curation powered by LLMs
