Gatherly

A personal knowledge capture tool that turns any webpage into structured, searchable content, saved to your own library.



What is Gatherly?

Gatherly is a personal knowledge library. Paste any URL and Gatherly automatically fetches the page, strips away noise (ads, nav, clutter), and stores the content as clean Markdown, along with metadata such as title, author, publish date, and OG image. It then uses AI to summarize the content and tag it automatically, so your saved pages become a searchable, structured knowledge archive scoped privately to your account.


Features

🔗 Intelligent Web Capture

Save any webpage by providing a URL. The system fetches and processes the page content automatically.

🧹 Smart Content Extraction

Web pages are cleaned and converted into structured Markdown, removing ads, navigation clutter, and noise.

🏷️ Metadata Extraction

Automatically extracts useful metadata from each page:

  • Title
  • Author
  • Published date
  • Open Graph image
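As an illustration of the shape such metadata might take, here is a minimal sketch that normalizes a flat map of meta tags into the four fields listed above. In Gatherly the extraction itself is handled by Firecrawl; the `PageMetadata` type, the `toMetadata` helper, and the choice of fallback keys are assumptions made for this example, not the project's actual code.

```typescript
// Hypothetical record shape for the four fields listed above.
interface PageMetadata {
  title: string | null;
  author: string | null;
  publishedAt: string | null; // ISO 8601 string, or null if missing/unparseable
  ogImage: string | null;
}

type MetaTags = Record<string, string | undefined>;

// Return the first non-empty value among the given keys, trimmed.
function firstNonEmpty(meta: MetaTags, keys: string[]): string | null {
  for (const key of keys) {
    const value = meta[key];
    if (value !== undefined && value.trim() !== "") {
      return value.trim();
    }
  }
  return null;
}

// Map common Open Graph / article meta properties onto PageMetadata.
function toMetadata(meta: MetaTags): PageMetadata {
  const rawDate = firstNonEmpty(meta, ["article:published_time"]);
  const parsed = rawDate === null ? NaN : Date.parse(rawDate);
  return {
    title: firstNonEmpty(meta, ["og:title", "title"]),
    author: firstNonEmpty(meta, ["article:author", "author"]),
    publishedAt: isNaN(parsed) ? null : new Date(parsed).toISOString(),
    ogImage: firstNonEmpty(meta, ["og:image"]),
  };
}
```

Missing or unparseable values become `null` rather than empty strings, which keeps "not provided" distinguishable from "provided but blank" when the record is stored.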

🤖 AI Summarization

Each saved page is automatically summarized using AI, giving you a concise overview of the content without having to re-read the full article.

🏷️ Auto-Generated Tags

AI analyzes the content of each saved page and automatically assigns relevant tags, making your library self-organizing and easy to browse by topic.

🔭 Discover by Topic

Use the Discover route to fetch a curated set of links for any topic of interest. Explore new content and add it directly to your knowledge library without manually hunting for URLs.

📦 Bulk URL Import

Import multiple URLs at once and process them in parallel for faster knowledge capture.
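One way parallel processing like this could work is a small worker pool that caps how many captures run at once and records a per-URL result instead of failing the whole batch. The sketch below is an assumption about the approach, not Gatherly's implementation; `importInParallel`, `ImportResult`, and the `capture` callback are hypothetical names.

```typescript
// Per-URL outcome, so one failing page does not abort the whole batch.
type ImportResult<T> =
  | { url: string; ok: true; value: T }
  | { url: string; ok: false; error: string };

// Process URLs with at most `concurrency` captures in flight at a time.
// `capture` stands in for whatever fetches and stores a single page.
async function importInParallel<T>(
  urls: string[],
  capture: (url: string) => Promise<T>,
  concurrency = 3,
): Promise<ImportResult<T>[]> {
  const results: ImportResult<T>[] = new Array(urls.length);
  let next = 0; // index of the next URL to claim

  async function worker(): Promise<void> {
    while (next < urls.length) {
      const index = next++; // claiming is synchronous, so no two workers share an index
      const url = urls[index];
      try {
        results[index] = { url, ok: true, value: await capture(url) };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        results[index] = { url, ok: false, error: message };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, urls.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // same order as the input URLs
}
```

Capping concurrency is a design choice worth considering here because scraping and AI APIs are typically rate-limited; results come back in input order regardless of which capture finishes first.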

🗺️ Site Mapping

Discover multiple pages from a website automatically, which is useful for exploring documentation, blogs, and research sources.

📊 Content Status Tracking

Each saved page moves through a transparent processing lifecycle:

| Status | Description |
| --- | --- |
| processing | Page is being fetched and parsed |
| completed | Content successfully extracted and stored |
| failed | An error occurred during capture |
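The lifecycle above can be modeled as a small state machine. This is a sketch under the assumption that `completed` and `failed` are terminal states (the README does not say whether a failed capture can be retried); the type and function names are hypothetical.

```typescript
// The three statuses come from the table above.
type CaptureStatus = "processing" | "completed" | "failed";

// Assumption: a page starts in "processing" and ends in a terminal state.
const allowedTransitions: Record<CaptureStatus, CaptureStatus[]> = {
  processing: ["completed", "failed"],
  completed: [],
  failed: [],
};

// True if a page in `from` may move to `to`.
function canTransition(from: CaptureStatus, to: CaptureStatus): boolean {
  return allowedTransitions[from].indexOf(to) !== -1;
}

// True if no further transitions are allowed from `status`.
function isTerminal(status: CaptureStatus): boolean {
  return allowedTransitions[status].length === 0;
}
```

Supporting retries would only require adding `"processing"` to the `failed` entry of the transition table.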

📚 Structured Knowledge Storage

Captured pages are stored as structured records in the database: searchable, revisitable, and reusable.

📝 Clean Markdown Output

All captured content is stored as Markdown, making it easy to render in apps, export, or reuse in docs and notes.

🔐 Authentication & User Isolation

Each user's saved content is securely scoped to their account via Better Auth.
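Conceptually, scoping means every read is filtered by the owner's ID, so an item ID alone is never enough to fetch a record. The sketch below shows the idea against an in-memory array; the `SavedItem` shape and both helpers are hypothetical, and a real implementation would presumably express the same filter in a database query using the user ID from the authenticated session.

```typescript
// Hypothetical minimal record; the real schema lives in prisma/schema.prisma.
interface SavedItem {
  id: string;
  userId: string;
  title: string;
}

// List only the items owned by the given user.
function itemsForUser(items: SavedItem[], userId: string): SavedItem[] {
  return items.filter((item) => item.userId === userId);
}

// Look up one item, returning undefined when it belongs to someone else.
function getItemForUser(
  items: SavedItem[],
  userId: string,
  id: string,
): SavedItem | undefined {
  return items.find((item) => item.id === id && item.userId === userId);
}
```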

⚡ Fast Server Functions

Built with TanStack Start server functions for type-safe, fast server-side operations.

🔍 SEO-Friendly Landing Page

Landing page content is rendered server-side, so search engines receive fully rendered HTML.


Tech Stack

| Layer | Technology |
| --- | --- |
| Framework | TanStack Start |
| Routing | TanStack Router (file-based) |
| Forms | TanStack Form |
| Auth | Better Auth |
| Database ORM | Prisma |
| Database | Neon PostgreSQL |
| Web Scraping | Firecrawl API |
| AI (Summarization & Tagging) | Claude / OpenAI |
| Validation | Zod |
| UI Components | shadcn/ui |
| Styling | Tailwind CSS v4 |
| Notifications | Sonner |
| Analytics | Vercel Analytics |
| Language | TypeScript |

Getting Started

Prerequisites

  • Node.js 18+
  • pnpm

Installation

git clone https://github.com/Srinivaskoruprolu007/Gatherly.git
cd Gatherly
pnpm install

Environment Variables

Create a .env file in the root:

DATABASE_URL=your_neon_postgresql_connection_string

BETTER_AUTH_SECRET=your_auth_secret
BETTER_AUTH_URL=http://localhost:3000

FIRECRAWL_API_KEY=your_firecrawl_api_key

# AI provider (whichever you use)
ANTHROPIC_API_KEY=your_anthropic_api_key
# or
OPENAI_API_KEY=your_openai_api_key
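
A missing variable is easier to diagnose at startup than deep inside a request. The following is a minimal sketch of a startup check using the variable names above. It is written in plain TypeScript rather than Zod to stay dependency-free, and it assumes that at least one of the two AI keys must be present; `findEnvProblems` is a hypothetical helper, not part of the repository.

```typescript
type Env = Record<string, string | undefined>;

// Variables listed in the .env example above.
const requiredVars = [
  "DATABASE_URL",
  "BETTER_AUTH_SECRET",
  "BETTER_AUTH_URL",
  "FIRECRAWL_API_KEY",
];

// Assumption: at least one AI provider key must be configured.
const aiProviderVars = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY"];

// A variable counts as set only if it is a non-blank string.
function isSet(env: Env, name: string): boolean {
  const value = env[name];
  return typeof value === "string" && value.trim() !== "";
}

// Return a human-readable list of configuration problems (empty if none).
function findEnvProblems(env: Env): string[] {
  const problems = requiredVars
    .filter((name) => !isSet(env, name))
    .map((name) => `Missing ${name}`);
  if (!aiProviderVars.some((name) => isSet(env, name))) {
    problems.push(`Set one of: ${aiProviderVars.join(", ")}`);
  }
  return problems;
}
```

In a Node.js entry point this could be called as `findEnvProblems(process.env)`, logging the returned messages and exiting if the list is non-empty.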

Development

pnpm dev

Production Build

pnpm build
pnpm start

Scripts

| Command | Description |
| --- | --- |
| pnpm dev | Start development server |
| pnpm build | Build for production |
| pnpm start | Start production server |
| pnpm test | Run tests with Vitest |
| pnpm lint | Lint with ESLint |
| pnpm format | Format with Prettier |
| pnpm check | Run lint + format check together |

Project Structure

src/
├── routes/             # File-based routes (TanStack Router)
│   ├── __root.tsx      # Root layout: head, theme, toaster, analytics
│   ├── _authed/        # Authenticated route group
│   │   ├── discover/   # Topic-based link discovery
│   │   └── ...
│   └── ...
├── components/         # Shared UI components (shadcn/ui)
├── context/            # React context providers (ThemeContext)
├── lib/                # Utilities, constants, server functions
└── styles.css          # Global styles (Tailwind)

prisma/
└── schema.prisma       # Database schema (Neon PostgreSQL)

Architecture

Gatherly separates concerns into distinct layers:

URL Input
    │
    ▼
Firecrawl API          ← fetch, clean, extract metadata
    │
    ▼
AI Pipeline            ← summarize content, auto-generate tags
    │
    ▼
Prisma / Neon          ← store structured records
    │
    ▼
UI (TanStack Router)   ← render, search, discover

This separation allows the platform to grow into a full knowledge management system, with search, filtering by tag, export, and more.
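
To make the layering concrete, here is a rough sketch of the capture flow with each layer injected as a function, so the scraper, AI provider, or database can be swapped or stubbed independently. All names (`capturePage`, `PipelineDeps`, `PageRecord`) are hypothetical and the real server functions may be organized differently; the sketch also writes only the final record, whereas a real flow would likely persist a `processing` record first.

```typescript
interface ScrapedPage {
  markdown: string;
  title?: string;
}

interface Enrichment {
  summary: string;
  tags: string[];
}

interface PageRecord {
  url: string;
  userId: string;
  status: "processing" | "completed" | "failed";
  markdown?: string;
  title?: string;
  summary?: string;
  tags?: string[];
  error?: string;
}

// Each layer of the diagram is a dependency the pipeline receives.
interface PipelineDeps {
  scrape: (url: string) => Promise<ScrapedPage>; // Firecrawl layer
  enrich: (markdown: string) => Promise<Enrichment>; // AI layer
  save: (record: PageRecord) => Promise<void>; // Prisma / Neon layer
}

// Run scrape -> enrich -> save, recording "failed" if a step throws.
async function capturePage(
  url: string,
  userId: string,
  deps: PipelineDeps,
): Promise<PageRecord> {
  let record: PageRecord;
  try {
    const page = await deps.scrape(url);
    const ai = await deps.enrich(page.markdown);
    record = { url, userId, status: "completed", ...page, ...ai };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    record = { url, userId, status: "failed", error: message };
  }
  await deps.save(record);
  return record;
}
```

Because the dependencies are plain functions, the pipeline can be exercised in tests with in-memory stubs and no network or database access.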


Deployment

Deployed on Vercel with automatic deployments on every push to main.

🔗 https://gatherly-bice.vercel.app


License

MIT
