🗞️ Taza Khabar

Fresh News, Powered by AI

Taza Khabar is an intelligent news aggregation and analysis system that combines web scraping, AI-powered journalism, and automated reporting to deliver comprehensive, multi-perspective news reports on any topic.

🌟 Features

🔍 Intelligent News Discovery: Automatically searches and discovers relevant news articles from across the web
🤖 AI-Powered Journalism: Uses GPT-4 to analyze raw news data and generate professional, multi-angle news reports
📊 Multi-Perspective Analysis: Extracts diverse viewpoints (economic, political, social, technological, and human-interest angles)
🗄️ Database Integration: Stores generated reports in PostgreSQL for future reference and analysis
🔗 Web Scraping: Extracts full article content using Playwright for comprehensive data gathering
⚡ Real-time Processing: FastAPI backend that serves search and scraping requests on demand, feeding the report generator

🏗️ Architecture

The project consists of two main components:

🤖 AI Engine (`/ai`)

News Agent: Core AI journalist that generates comprehensive reports
Pipeline: Orchestrates the entire news processing workflow
Tools: Web search capabilities integrated with AI models
Database: PostgreSQL integration with Drizzle ORM

🖥️ Backend Server (`/server`)

FastAPI Server: RESTful API for news search and scraping
Search Tool: Google search integration for news discovery
Scrape Tool: Playwright-based web scraping for article content
Custom Google Search: Modified Google search library for news-specific queries
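
The server chains these tools: search yields URLs, and scrape turns each URL into article text. The sketch below illustrates that chaining in plain Python, with the search and scrape steps passed in as functions so it runs without network access. `run_tool`, the function signatures, and the `url`/`content` result keys are hypothetical names chosen for illustration, not the code in `server/Tools`.

```python
from typing import Callable, Optional

# Hypothetical tool signatures, injected so the sketch runs without network access:
#   search(query, num_results) -> list of result URLs
#   scrape(url) -> extracted article text, or None/"" when extraction fails
SearchFn = Callable[[str, int], list[str]]
ScrapeFn = Callable[[str], Optional[str]]


def run_tool(query: str, num_results: int, search: SearchFn, scrape: ScrapeFn) -> list[dict]:
    """Search for `query`, scrape each unique result, and skip pages that fail."""
    articles = []
    seen = set()
    for url in search(query, num_results):
        if url in seen:  # the same link can appear more than once in results
            continue
        seen.add(url)
        try:
            text = scrape(url)
        except Exception:  # one broken page should not abort the whole batch
            continue
        if text:  # drop empty extractions
            articles.append({"url": url, "content": text})
    return articles


if __name__ == "__main__":
    results = ["https://a.example/1", "https://b.example/2", "https://a.example/1"]
    pages = {"https://a.example/1": "Story one", "https://b.example/2": ""}
    print(run_tool("India News", 20, lambda q, n: results[:n], pages.get))
```

Injecting the tools keeps the orchestration testable; in a real server they would wrap the Google search library and Playwright.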

🛠️ Technology Stack

AI & Language Models

OpenAI GPT-4: For intelligent news analysis and report generation
AI SDK: Toolkit for calling language models and wiring up tool calls
Zod: Schema validation for AI tool parameters

Backend & API

FastAPI: High-performance Python web framework
Playwright: Browser automation, used here to scrape article pages
Axios: HTTP client for API communication

Database & ORM

PostgreSQL: Robust relational database for news storage
Drizzle ORM: Type-safe database operations
pg: PostgreSQL client for Node.js

Development & Utilities

Node.js: JavaScript runtime for AI components
Python 3.13: Backend server runtime
dotenv: Environment variable management

📋 Prerequisites

Node.js (v16 or higher)
Python 3.13+
PostgreSQL database
OpenAI API key

⚙️ Installation

1. Clone the Repository

git clone https://github.com/aneeshpatne/Taza-Khabar.git
cd Taza-Khabar

2. Set Up the AI Engine

cd ai
npm install

3. Set Up the Backend Server

cd ../server
pip install fastapi uvicorn playwright pydantic
# Install Playwright browsers
playwright install

4. Install Custom Google Search

cd googlesearch
pip install -e .
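
After steps 3 and 4 you can confirm that the server's dependencies import cleanly. This is a minimal stdlib sketch: the module list mirrors the `pip install` line above, and `googlesearch` is assumed to be the import name of the custom package (check its packaging metadata if it differs). Run it with the same Python interpreter that will run the server.

```python
import importlib.util
import sys

# Mirrors the pip install line; "googlesearch" is an assumed import name
# for the custom package installed from server/googlesearch.
REQUIRED_MODULES = ["fastapi", "uvicorn", "playwright", "pydantic", "googlesearch"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 13):
        print(f"Python 3.13+ expected, found {sys.version.split()[0]}")
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```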

5. Environment Configuration

Create a .env file in the /ai directory:

OPENAI_API_KEY=your_openai_api_key_here
DATABASE_URL=postgresql://username:password@localhost:5432/taza_khabar
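
Before running the pipeline, it can help to confirm that the `.env` file has both keys and that `DATABASE_URL` is well formed. A minimal stdlib sketch, assuming simple `KEY=value` lines; the real file is read by `dotenv` in Node, whose parsing rules are richer than this one. `parse_env` and `check_env` are hypothetical helpers, and neither connects to PostgreSQL nor calls OpenAI.

```python
from urllib.parse import urlparse

REQUIRED_KEYS = ("OPENAI_API_KEY", "DATABASE_URL")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def check_env(env: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    problems = [f"{key} is missing" for key in REQUIRED_KEYS if not env.get(key)]
    url = env.get("DATABASE_URL")
    if url:
        parts = urlparse(url)
        if parts.scheme not in ("postgres", "postgresql"):
            problems.append("DATABASE_URL should start with postgresql://")
        if not parts.hostname or not parts.path.strip("/"):
            problems.append("DATABASE_URL needs a host and a database name")
    return problems
```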

6. Database Setup

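Create the PostgreSQL database named in `DATABASE_URL` (for example with `createdb taza_khabar`), then apply the schema with Drizzle Kit, e.g. `npx drizzle-kit push` from `/ai`, assuming a Drizzle config file is present there.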

🚀 Usage

1. Start the Backend Server

From the repository root:

cd server/Tools
python Tool.py

The API server will start on http://localhost:8000

2. Run the News Pipeline

In a second terminal, from the repository root:

cd ai
node pipline.js

This will:

Search for news articles on "India News" (configurable)
Scrape the full content of discovered articles
Generate a comprehensive AI news report
Save the report to the database
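
These four steps form a linear pipeline. The Python sketch below mirrors that control flow with each stage passed in as a function, so it can be exercised without the server, OpenAI, or PostgreSQL. `run_pipeline` and the stage signatures are hypothetical; the project's actual pipeline lives in `/ai/pipline.js`.

```python
from typing import Callable

# Hypothetical stage signatures mirroring the steps above.
Fetch = Callable[[str, int], list[dict]]  # POST /tool: search + scrape
Generate = Callable[[list[dict]], str]    # AI report from the raw articles
Save = Callable[[str], None]              # insert the report into the news table


def run_pipeline(topic: str, fetch: Fetch, generate: Generate, save: Save,
                 num_results: int = 20) -> str:
    """Run fetch -> generate -> save once and return the report text."""
    articles = fetch(topic, num_results)
    if not articles:  # nothing to report on; fail before spending model calls
        raise RuntimeError(f"no articles found for {topic!r}")
    report = generate(articles)
    save(report)
    return report
```

Stopping early on an empty fetch is a choice of this sketch; it avoids generating and storing a report with no source material.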

3. API Endpoints

POST `/tool`

Searches for news articles on `query` and scrapes their full content. Request body:

{
  "query": "Your news topic",
  "num_results": 20
}
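
The pipeline sends this request with Axios; the same call can be made from Python with only the standard library. A minimal sketch: `build_tool_request` and `call_tool` are hypothetical helpers, the URL assumes the default address from "Start the Backend Server", and because the response format is not documented here, `call_tool` simply returns the parsed JSON. The long timeout is an assumption, on the basis that the server scrapes every result before it responds.

```python
import json
import urllib.request

TOOL_URL = "http://localhost:8000/tool"  # default server address


def build_tool_request(query: str, num_results: int = 20, url: str = TOOL_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /tool endpoint."""
    body = json.dumps({"query": query, "num_results": num_results}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_tool(query: str, num_results: int = 20, timeout: float = 120.0):
    """Send the request and decode the JSON response (requires the server to be running)."""
    with urllib.request.urlopen(build_tool_request(query, num_results), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `call_tool("India News", 5)` returns whatever JSON the endpoint produces.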

GET `/`

Health check endpoint
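
A script can poll this endpoint to wait for the server before starting the pipeline. A minimal stdlib sketch; `is_healthy` is a hypothetical helper, and it treats any 2xx answer from `GET /` as healthy without inspecting the body, since the response format is not documented here.

```python
import urllib.request


def is_healthy(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if GET / answers with a 2xx status, False on any connection or HTTP error."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
        return False
```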

📝 How It Works

News Discovery: The system searches Google News for articles related to your specified topic
Content Extraction: Playwright scrapes the full text content from discovered news URLs
AI Analysis: The OpenAI model analyzes the raw content and identifies 5-8 diverse story angles
Research Enhancement: For each angle, the AI conducts additional web searches for comprehensive coverage
Report Generation: A professional news report is generated with proper journalistic structure
Database Storage: The final report is stored in PostgreSQL for future reference
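
The research step fans out one search per angle. A minimal sketch, assuming a plain `search(query) -> list[str]` function and a `"<topic> <angle>"` query format (both hypothetical; in the project the model issues these searches through its web-search tool). The sketch enforces the 5-8 angle range stated above and keeps each URL only under the first angle that found it, so the report does not lean on one source twice; that de-duplication policy is a design choice of this sketch.

```python
from typing import Callable

MIN_ANGLES, MAX_ANGLES = 5, 8  # range stated in "How It Works"


def research_angles(angles: list[str], topic: str,
                    search: Callable[[str], list[str]]) -> dict[str, list[str]]:
    """Run one follow-up search per angle; each URL is kept under the first angle that found it."""
    if not MIN_ANGLES <= len(angles) <= MAX_ANGLES:
        raise ValueError(f"expected {MIN_ANGLES}-{MAX_ANGLES} angles, got {len(angles)}")
    seen: set[str] = set()
    sources: dict[str, list[str]] = {}
    for angle in angles:
        fresh = []
        for url in search(f"{topic} {angle}"):
            if url not in seen:  # skip URLs already claimed by this or an earlier angle
                seen.add(url)
                fresh.append(url)
        sources[angle] = fresh
    return sources
```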

🔧 Configuration

Customizing News Topics

Edit the topic in /ai/pipline.js:

let dump = await axios.post("http://localhost:8000/tool", {
  query: "Your Custom Topic", // Change this
  num_results: 20,
});
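
Editing the file works for one-off changes. If you call `/tool` from your own script instead, the topic can come from the command line or an environment variable. A minimal Python sketch of that precedence; `TAZA_TOPIC` and `resolve_topic` are hypothetical names that the project does not read. The same pattern carries over to `pipline.js` through `process.argv` and `process.env`.

```python
import os
import sys

DEFAULT_TOPIC = "India News"  # default topic named in "Run the News Pipeline"


def resolve_topic(argv: list[str], environ: dict[str, str]) -> str:
    """Topic precedence: command-line words, then TAZA_TOPIC (hypothetical), then the default."""
    from_cli = " ".join(argv[1:]).strip()
    if from_cli:
        return from_cli
    return environ.get("TAZA_TOPIC", "").strip() or DEFAULT_TOPIC


if __name__ == "__main__":
    print(resolve_topic(sys.argv, dict(os.environ)))
```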

Database Schema

The news reports are stored using this schema:

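import { pgTable, integer, timestamp, text } from "drizzle-orm/pg-core";

export const news = pgTable("news", {
  // No default or identity is declared, so every insert must supply an id
  id: integer("id").primaryKey(),
  modifiedAt: timestamp("modified_at").defaultNow(),
  content: text("content").notNull(),
});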

Taza Khabar - Bringing you fresh perspectives on the news that matters. 📰✨
