Fresh News, Powered by AI
Taza Khabar is an intelligent news aggregation and analysis system that combines web scraping, AI-powered journalism, and automated reporting to deliver comprehensive, multi-perspective news reports on any topic.
- 🔍 Intelligent News Discovery: Automatically searches and discovers relevant news articles from across the web
- 🤖 AI-Powered Journalism: Uses GPT-4 to analyze raw news data and generate professional, multi-angle news reports
- 📊 Multi-Perspective Analysis: Extracts diverse viewpoints - economic, political, social, technological, and human interest angles
- 🗄️ Database Integration: Stores generated reports in PostgreSQL for future reference and analysis
- 🔗 Web Scraping: Extracts full article content using Playwright for comprehensive data gathering
- ⚡ Real-time Processing: Fast API-based backend for quick news processing and report generation
The project consists of two main components, the AI agent in /ai (Node.js) and the API server in /server (Python):
AI agent (/ai):
- News Agent: Core AI journalist that generates comprehensive reports
- Pipeline: Orchestrates the entire news processing workflow
- Tools: Web search capabilities integrated with AI models
- Database: PostgreSQL integration with Drizzle ORM
Server (/server):
- FastAPI Server: RESTful API for news search and scraping
- Search Tool: Google search integration for news discovery
- Scrape Tool: Playwright-based web scraping for article content
- Custom Google Search: Modified Google search library for news-specific queries
- OpenAI GPT-4: For intelligent news analysis and report generation
- AI SDK: Modern AI integration framework
- Zod: Schema validation for AI tool parameters
- FastAPI: High-performance Python web framework
- Playwright: Modern web scraping and browser automation
- Axios: HTTP client for API communication
- PostgreSQL: Robust relational database for news storage
- Drizzle ORM: Type-safe database operations
- pg: PostgreSQL client for Node.js
- Node.js: JavaScript runtime for AI components
- Python 3.13: Backend server runtime
- dotenv: Environment variable management
- Node.js (v16 or higher)
- Python 3.13+
- PostgreSQL database
- OpenAI API key
git clone https://github.com/aneeshpatne/Taza-Khabar.git
cd Taza-Khabar
cd ai
npm install
cd ../server
pip install fastapi uvicorn playwright pydantic
# Install Playwright browsers
playwright install
cd server/googlesearch
pip install -e .
Create a .env file in the /ai directory:
OPENAI_API_KEY=your_openai_api_key_here
DATABASE_URL=postgresql://username:password@localhost:5432/taza_khabar
Set up your PostgreSQL database and run the schema migrations using Drizzle.
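Before starting the pipeline, it can help to confirm that both variables are present and that the database URL has the expected shape. The following is a minimal sketch, not part of the project: `parse_env_text` and `check_env` are hypothetical helper names, and the parser only handles plain `KEY=VALUE` lines rather than the full dotenv syntax.

```python
import os
from urllib.parse import urlparse

# The two variables the README asks for in the /ai .env file.
REQUIRED_VARS = ("OPENAI_API_KEY", "DATABASE_URL")


def parse_env_text(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip()
    return values


def check_env(env=None):
    """Return a list of problems; an empty list means the config looks usable."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    db_url = env.get("DATABASE_URL")
    if db_url:
        parsed = urlparse(db_url)
        if parsed.scheme not in ("postgres", "postgresql"):
            problems.append("DATABASE_URL should start with postgresql://")
        if not parsed.path.strip("/"):
            problems.append("DATABASE_URL is missing a database name")
    return problems
```

To check a file on disk, read it and pass the parsed result in, for example `check_env(parse_env_text(open("ai/.env").read()))`. This only validates the format of the values; it does not test the API key or connect to the database.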
cd server/Tools
python Tool.py
The API server will start on http://localhost:8000
cd ai
node pipline.js
This will:
- Search for news articles on "India News" (configurable)
- Scrape the full content of discovered articles
- Generate a comprehensive AI news report
- Save the report to the database
Search and scrape news articles (the pipeline calls this endpoint as POST /tool):
{
  "query": "Your news topic",
  "num_results": 20
}
Health check endpoint
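The search-and-scrape endpoint can also be called from a short Python script. The sketch below uses only the standard library and assumes the `/tool` path and port 8000 that the pipeline uses; `build_tool_request` and `search_and_scrape` are hypothetical names, the 120-second timeout is a guess, and the response is assumed to be JSON, since its exact shape is not documented here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default address from this README


def build_tool_request(query, num_results=20, base_url=BASE_URL):
    """Build (but do not send) the POST request for the search-and-scrape endpoint."""
    payload = json.dumps({"query": query, "num_results": num_results}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/tool",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_and_scrape(query, num_results=20, timeout=120):
    """Send the request and decode the body, assuming the server replies with JSON."""
    request = build_tool_request(query, num_results)
    # Scraping many pages can be slow, so the timeout is deliberately generous.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `search_and_scrape("India News", 5)` would send the same body shown above with a smaller result count.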
- News Discovery: The system searches Google News for articles related to your specified topic
- Content Extraction: Playwright scrapes the full text content from discovered news URLs
- AI Analysis: GPT-4 analyzes the raw content and identifies 5-8 diverse story angles
- Research Enhancement: For each angle, the AI conducts additional web searches for comprehensive coverage
- Report Generation: A professional news report is generated with proper journalistic structure
- Database Storage: The final report is stored in PostgreSQL for future reference
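The six steps above can be read as one orchestration function. The following is an illustrative Python sketch of that flow, not the project's actual pipeline (which lives in /ai and runs on Node.js): every step is passed in as a callable, and all function and parameter names here are hypothetical.

```python
def run_pipeline(topic, search, scrape, find_angles, research,
                 write_report, store, num_results=20):
    """Run the six stages in order and return the finished report."""
    # 1. News discovery: get candidate article URLs for the topic.
    urls = search(topic, num_results)

    # 2. Content extraction: scrape each URL, dropping pages that yield no text.
    articles = [text for text in (scrape(url) for url in urls) if text]

    # 3. AI analysis: identify distinct story angles in the raw content.
    angles = find_angles(topic, articles)

    # 4. Research enhancement: gather extra material for each angle.
    research_notes = {angle: research(angle) for angle in angles}

    # 5. Report generation: combine articles and research into one report.
    report = write_report(topic, articles, research_notes)

    # 6. Database storage: persist the final report.
    store(report)
    return report
```

Passing the steps in as arguments keeps the flow testable without a browser, a model API key, or a database: each stage can be replaced with a stub while the ordering is checked.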
Edit the topic in /ai/pipline.js:
let dump = await axios.post("http://localhost:8000/tool", {
  query: "Your Custom Topic", // Change this
  num_results: 20,
});
The news reports are stored using this schema:
import { integer, pgTable, text, timestamp } from "drizzle-orm/pg-core";

export const news = pgTable("news", {
  id: integer("id").primaryKey(),
  modifiedAt: timestamp("modified_at").defaultNow(),
  content: text("content").notNull(),
});
Taza Khabar - Bringing you fresh perspectives on the news that matters. 📰✨