jefffreyli/casava-open

CASAVA sublet aggregator

A web application that aggregates sublet listings from Facebook Groups and Craigslist across multiple cities.

Installation

# Clone the repository
git clone <repository-url>
cd casava-v2

# Install dependencies
yarn install

Running the Application

Development mode

Run frontend

cd frontend
yarn dev

Run backend

cd backend
yarn dev

Production build

yarn build
yarn start

Backend Structure

  1. Data Collection Layer: A scraper service that fetches listings from Facebook Groups and Craigslist via a cron job that runs every 12 hours
  2. Processing Layer: LLM-powered NLP to parse and structure raw listing data
  3. Storage Layer: PostgreSQL database with Prisma ORM to store structured listing data
  4. API Layer: Express-based REST API to serve data to the frontend

Tech Stack

Backend

  • Node.js & TypeScript: For strongly-typed server development
  • Express: Lightweight web framework for RESTful API endpoints
  • Prisma: Modern ORM for database access
  • PostgreSQL: Relational database for persistent storage
  • Winston: Structured logging
  • dotenv: Environment variable management

Frontend (Existing)

  • Next.js: React framework for production
  • Tailwind CSS 4.0: Utility-first CSS framework
  • shadcn/ui: Component library built on Radix UI
  • TypeScript: Type safety throughout the application

Details

Cron Job Server:

  • Node.js server runs scheduled scraping tasks every 12 hours to collect fresh listings without overloading source websites
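
The 12-hour schedule could be sketched roughly as follows. This is a minimal illustration that uses the built-in `setInterval` timer rather than a cron library; the names `isScrapeDue` and `startScrapeSchedule` are hypothetical and not taken from the project, which may schedule its jobs differently.

```typescript
// 12 hours expressed in milliseconds.
const TWELVE_HOURS_MS = 12 * 60 * 60 * 1000;

// Returns true when no run has happened yet, or when at least
// `intervalMs` has elapsed since the last run.
function isScrapeDue(
  lastRunAt: number | null,
  now: number,
  intervalMs: number = TWELVE_HOURS_MS
): boolean {
  return lastRunAt === null || now - lastRunAt >= intervalMs;
}

// Starts a timer that calls `job` every `intervalMs` milliseconds.
// A failing run is logged and does not stop later runs.
// Returns a function that cancels the schedule.
function startScrapeSchedule(
  job: () => void,
  intervalMs: number = TWELVE_HOURS_MS
): () => void {
  const timer = setInterval(() => {
    try {
      job();
    } catch (err) {
      console.error("Scrape run failed:", err);
    }
  }, intervalMs);
  return () => clearInterval(timer);
}
```

A caller would pass the scraping entry point as `job` and keep the returned function to stop the schedule on shutdown. `isScrapeDue` is useful if the last run time is persisted, so a restarted server can decide whether to scrape immediately.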

Data Processing Pipeline:

  • Scrapes 15-20 most recent posts per source (3-4 Facebook groups and Craigslist pages per city)
  • Extracts listing details using an LLM without storing original content
  • Normalizes data across different sources
  • Performs safety evaluation based on listing completeness and quality
  • Maintains attribution to original sources
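
As a rough illustration of the normalization and completeness steps above, the sketch below maps fields extracted from either source into one shared shape. Every type, field name, and function here is an assumption made for the example, and the completeness ratio is only a simple stand-in for the safety evaluation, whose actual criteria are not described here.

```typescript
type Source = "facebook" | "craigslist";

// Hypothetical shape of what the LLM extraction step might return.
interface ExtractedFields {
  price?: string; // free-form, e.g. "$1,800/mo"
  city?: string;
  startDate?: string; // ISO date string, if one was found
  endDate?: string;
  bedrooms?: number;
}

// Hypothetical normalized record. It holds structured facts and the
// source URL for attribution, but no copy of the original post text.
interface NormalizedListing {
  source: Source;
  sourceUrl: string;
  city: string | null;
  monthlyRentUsd: number | null;
  startDate: string | null;
  endDate: string | null;
  bedrooms: number | null;
  completeness: number; // fraction of fields that were filled, 0 to 1
}

// Pulls the first number out of a free-form price string.
function parsePrice(raw: string | undefined): number | null {
  if (!raw) return null;
  const match = raw.replace(/,/g, "").match(/\d+(\.\d+)?/);
  return match ? Number(match[0]) : null;
}

function normalizeListing(
  source: Source,
  sourceUrl: string,
  fields: ExtractedFields
): NormalizedListing {
  // Blank or missing values become null so every source looks the same.
  const city = fields.city?.trim().toLowerCase() || null;
  const monthlyRentUsd = parsePrice(fields.price);
  const startDate = fields.startDate ?? null;
  const endDate = fields.endDate ?? null;
  const bedrooms = fields.bedrooms ?? null;

  const values = [city, monthlyRentUsd, startDate, endDate, bedrooms];
  const filled = values.filter((v) => v !== null).length;

  return {
    source,
    sourceUrl,
    city,
    monthlyRentUsd,
    startDate,
    endDate,
    bedrooms,
    completeness: filled / values.length,
  };
}
```

For example, a Craigslist post with a price of "$1,800/mo", a city, and a bedroom count but no dates would come out with `monthlyRentUsd` of 1800 and a `completeness` of 0.6.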

API Layer:

  • RESTful endpoints for listing retrieval and filtering
  • Cached responses to minimize database queries
  • Proper error handling and validation
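
One way to picture the response caching mentioned above is a small in-memory cache with a time-to-live. This is a sketch under assumptions: the `TtlCache` class and `getOrLoad` helper are hypothetical names, and the project may cache responses by another mechanism entirely.

```typescript
interface CacheEntry<T> {
  value: T;
  expiresAt: number;
}

// Minimal in-memory cache whose entries expire after `ttlMs` milliseconds.
// The clock is injectable so expiry can be tested without waiting.
class TtlCache<T> {
  private entries = new Map<string, CacheEntry<T>>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      // Expired entries are dropped on read.
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Returns the cached value for `key`, calling `load` only on a miss.
function getOrLoad<T>(cache: TtlCache<T>, key: string, load: () => T): T {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const fresh = load();
  cache.set(key, fresh);
  return fresh;
}
```

A route handler could build the cache key from the request's filter parameters, so repeated requests with the same filters skip the database until the entry expires.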

Code Organization

Node backend:
src/                  # Source code
├── app.ts            # Express application setup
├── config/           # Configuration files
├── controllers/      # API controllers
├── middleware/       # Express middleware
├── routes/           # API routes
├── services/         # Business logic
├── types/            # TypeScript type definitions
└── utils/            # Utility functions

Legal Considerations

CASAVA follows established legal precedents (hiQ v. LinkedIn, Meta v. Bright Data) by only scraping public data without authentication, extracting non-copyrightable facts rather than creative content, maintaining attribution to original sources, and implementing an immediate compliance protocol for any cease-and-desist requests.

Legal Precedents

  • hiQ v. LinkedIn (2022): Public data scraping without authentication may be legal under the CFAA, but using fake accounts to bypass restrictions remains illegal.

  • Meta v. Bright Data (2024): Platform terms of service don't apply to users scraping public data while logged out.

  • X Corp v. Bright Data (2024): Terms prohibiting scraping may be unenforceable if they attempt to create "private copyright" over non-owned content.

  • Craigslist v. 3taps (2013): Ignoring cease-and-desist notices and circumventing IP blocks violates the CFAA.

  • Facebook v. Power Ventures: Copying entire pages containing copyrighted elements creates liability, even when targeting user content.

  • Feist Publications v. Rural Telephone (1991): Facts are not copyrightable; only creative selection and arrangement can be protected.

About

The search engine for sublets.
