diff --git a/examples/olostep-craigslist-analysis/.env.example b/examples/olostep-craigslist-analysis/.env.example
new file mode 100644
index 00000000..1b0abd46
--- /dev/null
+++ b/examples/olostep-craigslist-analysis/.env.example
@@ -0,0 +1,10 @@
+# .env.example
+
+# TODO: Get your E2B API key from https://e2b.dev/docs
+E2B_API_KEY=""
+
+# TODO: Get your Olostep API key from https://olostep.com
+OLOSTEP_API_KEY=""
+
+# TODO: Get your Google AI Studio API key from https://aistudio.google.com
+GEMINI_API_KEY=""
\ No newline at end of file
diff --git a/examples/olostep-craigslist-analysis/README.md b/examples/olostep-craigslist-analysis/README.md
new file mode 100644
index 00000000..7ddb9238
--- /dev/null
+++ b/examples/olostep-craigslist-analysis/README.md
@@ -0,0 +1,237 @@
+# 🏠 Craigslist Housing Analysis with Olostep + Gemini + E2B
+
+This example demonstrates how to scrape and analyze Craigslist housing data using the **Olostep API** for web scraping, **Google Gemini 2.0 Flash** for AI-powered data extraction, and **E2B Code Interpreter** for data analysis and visualization.
+
+## 🎯 What it does
+
+1. **🕷️ Web Scraping**: Uses Olostep API to fetch raw HTML content from Craigslist housing pages
+2. **🤖 AI Extraction**: Feeds HTML to Gemini AI to extract structured JSON data from JSON-LD schemas
+3. **🧹 Data Cleaning**: Processes and filters extracted data, removing null fields and duplicates
+4. **📊 Analysis**: Uses E2B's Python sandbox to analyze housing data with pandas, matplotlib, and seaborn
+5. **📈 Visualization**: Generates publication-ready charts and insights about the SF Bay Area housing market
+
+## 🚀 Features
+
+- **Real-time scraping** of Craigslist housing data across multiple Bay Area regions
+- **AI-powered extraction** of property details from JSON-LD structured data
+- **Smart data processing** that only shows fields with actual values (no null clutter)
+- **Rich visualizations** including property type distribution, location analysis, and market insights
+- **Robust error handling** for API failures and data validation
+- **Clean output** with structured JSON data and multiple visualization charts
+
+## 📋 Prerequisites
+
+- Node.js 18+ installed
+- TypeScript support
+- API Keys for:
+  - **Olostep API** (get from [Olostep](https://olostep.com))
+  - **Google Gemini API** (get from [Google AI Studio](https://makersuite.google.com/app/apikey))
+  - **E2B API** (get from [E2B](https://e2b.dev))
+
+## 🔧 Setup
+
+1. **Install dependencies:**
+   ```bash
+   npm install
+   ```
+
+2. **Set up environment variables:**
+   Create a `.env` file in the project root:
+   ```bash
+   OLOSTEP_API_KEY=your_olostep_api_key_here
+   GEMINI_API_KEY=your_gemini_api_key_here
+   E2B_API_KEY=your_e2b_api_key_here
+   ```
+
+3. **Run the analysis:**
+   ```bash
+   npm run start
+   ```
+
+## 📊 Generated Outputs
+
+The analysis creates several files:
+
+### 📄 Data Files
+- `craigslist_listings.json` - Structured property data extracted from Craigslist (generated after running)
+- `sample_craigslist_listings.json` - Sample output data to show expected format
+
+### 📈 Visualization Charts
+When you run the analysis, it generates multiple PNG files with visualizations:
+- Property type distribution (pie chart)
+- Location frequency analysis (bar chart)
+- Price distribution histogram (when price data is available)
+- Bedroom/bathroom distribution analysis
+- Property features breakdown
+- Market insights and trends
+
+## 🔍 How It Works
+
+### 1. Web Scraping with Olostep
+```typescript
+// Scrapes multiple Craigslist regions
+const searchUrls = [
+  'https://sfbay.craigslist.org/search/rea#search=2~gallery~40', // Real estate
+  'https://sfbay.craigslist.org/search/eby/apa#search=2~gallery~56', // Apartments
+]
+
+const response = await fetch('https://api.olostep.com/v1/scrapes', {
+  method: 'POST',
+  headers: {
+    'Authorization': `Bearer ${OLOSTEP_API_KEY}`,
+    'Content-Type': 'application/json'
+  },
+  body: JSON.stringify({
+    formats: ["markdown", "html"],
+    wait_before_scraping: 3000,
+    url_to_scrape: searchUrl
+  })
+})
+```
+
+### 2. AI-Powered Data Extraction with Gemini
+```typescript
+// Extracts structured data from HTML using Gemini AI
+const extractionPrompt = `
+Parse this Craigslist page and extract real estate listings from JSON-LD structured data.
+LOOK FOR: