Semantic Recipe Finder

Semantic Recipe Finder is a full-stack application that finds recipes from natural-language queries using semantic search. Built with a FastAPI backend and a Streamlit frontend, it uses sentence-transformers and ChromaDB to match recipes by meaning rather than by exact keywords.

✨ Features

  • Semantic Search: Natural language queries powered by sentence-transformers (all-MiniLM-L6-v2)
  • Fast Vector Search: ChromaDB with 384-dimensional embeddings for efficient similarity search
  • RESTful API: FastAPI backend with comprehensive endpoints and OpenAPI documentation
  • Modern UI: Streamlit frontend with responsive recipe cards and detailed views
  • Comprehensive Tests: 61 unit and integration tests with pytest
  • Docker Ready: Multi-container setup with docker-compose for easy deployment

🚀 Quickstart (local development)

Prerequisites

  • Python 3.10+

Setup

  1. Clone the repository
git clone https://github.com/hanifekaptan/semantic-recipe-finder.git
cd semantic-recipe-finder
  2. Create and activate a virtual environment
python -m venv .venv
.venv\Scripts\activate  # Windows
source .venv/bin/activate  # Linux/Mac
  3. Install dependencies
pip install -r requirements.txt
  4. Run the backend (FastAPI)
uvicorn app.main:app --reload --host 127.0.0.1 --port 8000

The API will be available at http://localhost:8000 with interactive docs at /docs.

  5. Run the frontend (Streamlit) in a new terminal
streamlit run frontend/app.py --server.port 8501

The UI will be available at http://localhost:8501.

Environment Variables

  • API_BASE_URL: Backend URL for Streamlit frontend (default: http://localhost:8000)
  • LOG_LEVEL: Logging level (default: INFO)
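
As a minimal sketch of how these two variables might be read with their documented defaults: the `Settings` class and `load_settings` helper below are hypothetical names for illustration, not taken from the project code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two documented variables."""
    api_base_url: str
    log_level: str


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Settings(
        # Strip a trailing slash so paths like "/search" can be appended safely.
        api_base_url=env.get("API_BASE_URL", "http://localhost:8000").rstrip("/"),
        # Normalise case so "debug" and "DEBUG" are treated the same.
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Passing a plain dict instead of relying on `os.environ` keeps the helper easy to exercise in tests.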

📂 Project Structure

semantic-recipe-finder/
├── app/                          # FastAPI backend application
│   ├── api/                      # API routes and endpoints
│   │   ├── health.py            # Health check endpoint
│   │   ├── routes.py            # Recipe detail endpoint
│   │   └── search.py            # Search endpoint
│   ├── core/                     # Core configuration and utilities
│   │   ├── config.py            # Global configuration and state
│   │   └── logging.py           # Logging setup
│   ├── models/                   # Pydantic data models
│   │   ├── recipe_card.py       # Recipe card model (search results)
│   │   ├── recipe_detail.py     # Full recipe detail model
│   │   ├── search_query.py      # Search request model
│   │   └── search_response.py   # Search response model
│   ├── services/                 # Business logic services
│   │   ├── detail_service.py    # Recipe detail retrieval
│   │   ├── loading_service.py   # Data and model loading
│   │   ├── search_service.py    # Semantic search logic
│   │   └── vectorstore.py       # ChromaDB operations
│   ├── utils/                    # Utility functions
│   │   ├── data_preprocessor.py # Text cleaning
│   │   └── vectorizer.py        # Text vectorization
│   └── main.py                   # FastAPI app initialization
├── frontend/                     # Streamlit frontend application
│   ├── api/
│   │   └── client.py            # Backend API client
│   ├── components/               # Reusable UI components
│   │   ├── header.py            # App header
│   │   ├── recipe_card.py       # Recipe card display
│   │   ├── recipe_detail.py     # Detailed recipe view
│   │   └── search_bar.py        # Search input
│   ├── pages/                    # Streamlit pages
│   │   ├── detail.py            # Recipe detail page
│   │   └── search.py            # Search results page
│   ├── utils/
│   │   └── utility.py           # Helper functions
│   └── app.py                    # Streamlit app entrypoint
├── data/                         # Data storage
│   ├── raw/
│   │   └── recipes.csv          # Original recipe dataset
│   └── processed/
│       ├── ids_embs.npy         # Recipe IDs
│       ├── metadata_embs.npy    # Recipe metadata embeddings
│       └── persist/             # ChromaDB persistent storage
├── docker/                       # Docker configurations
│   ├── backend.Dockerfile
│   ├── frontend.Dockerfile
│   └── entrypoint.sh
├── tests/                        # Test suite
│   ├── integration/             # API integration tests
│   │   └── test_smoke_api.py
│   └── unit/                     # Unit tests
│       ├── services/            # Service layer tests
│       └── utils/               # Utility tests
├── docker-compose.yml            # Multi-container orchestration
├── Dockerfile                    # HuggingFace Space Dockerfile
├── requirements.txt              # Python dependencies
└── pytest.ini                    # Pytest configuration

πŸ—οΈ Architecture Overview

Backend (FastAPI)

  • Semantic Search: Uses all-MiniLM-L6-v2 sentence-transformer model for query encoding
  • Vector Database: ChromaDB persistent store (SQLite-backed in the 0.5.x line listed under Technology Stack) holding 100 recipe embeddings (384 dimensions)
  • Service Layer: Clean separation between API routes, business logic, and data access
  • Error Handling: Comprehensive exception handling with proper HTTP status codes
  • API Documentation: Auto-generated OpenAPI (Swagger) docs at /docs

Frontend (Streamlit)

  • Component-Based: Modular UI components for search, cards, and detail views
  • API Client: HTTP client with error handling for backend communication
  • Session State: Manages search results and navigation state
  • Responsive Design: Clean, user-friendly interface optimized for recipe browsing

Data Flow

  1. User enters natural language query in Streamlit UI
  2. Frontend sends request to /search endpoint
  3. Backend cleans and vectorizes query text
  4. ChromaDB performs similarity search on recipe embeddings
  5. Metadata for the top 100 matches is retrieved from the pandas DataFrame
  6. Results paginated and returned to frontend
  7. Frontend displays recipe cards with key information
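
The ranking and pagination steps above can be sketched in plain Python. This is an illustrative stand-in only: it uses brute-force cosine similarity over toy 3-dimensional vectors in place of ChromaDB and the 384-dimensional model embeddings. The function names, the choice of cosine similarity as the metric, and the meaning of `total_results` are assumptions, not the project's actual implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # A zero vector has no direction; treat it as dissimilar.
    return dot / (norm_a * norm_b)


def search(query_vec, recipe_vecs, top_k=100, offset=0, limit=20):
    """Rank recipes by similarity, keep the top_k, then return one page.

    recipe_vecs maps recipe_id -> embedding (toy data in this sketch).
    """
    scored = [(rid, cosine_similarity(query_vec, vec)) for rid, vec in recipe_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    top = scored[:top_k]
    page = top[offset:offset + limit]
    return {
        "search_results": [{"recipe_id": rid, "similarity_score": score} for rid, score in page],
        "total_results": len(top),  # Assumption: total counts the retained matches.
        "offset": offset,
        "limit": limit,
    }


# Toy 3-dimensional vectors standing in for real embeddings.
recipes = {1: [1.0, 0.0, 0.0], 2: [0.0, 1.0, 0.0], 3: [0.7, 0.7, 0.0]}
result = search([1.0, 0.1, 0.0], recipes, limit=2)
```

In the real service the scoring step would be delegated to the vector store; only the slice by `offset` and `limit` is likely to look similar.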

🧪 Testing

The project includes comprehensive test coverage with pytest:

# Run all tests
pytest

# Run with coverage
pytest --cov=app --cov=frontend

# Run specific test suite
pytest tests/unit/services/
pytest tests/integration/

Test Statistics:

  • 61 total tests (28 service tests, 18 utility tests, 15 integration tests)
  • Unit tests: Mock-based testing for services and utilities
  • Integration tests: FastAPI TestClient for full API testing
  • All tests passing with proper fixtures and parametrization
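
To illustrate what mock-based unit testing looks like in this setting, here is a small sketch using `unittest.mock`. The `run_search` function and its `vectorizer` and `store` collaborators are hypothetical and exist only for this example; the project's real service signatures may differ.

```python
from unittest.mock import Mock


def run_search(query, vectorizer, store, limit=20):
    """Hypothetical service function: clean the query, encode it, query the store."""
    cleaned = query.strip().lower()
    if not cleaned:
        raise ValueError("query must not be empty")
    vector = vectorizer.encode(cleaned)
    return store.query(vector, limit)


def test_run_search_uses_cleaned_query():
    # Both collaborators are mocks, so no model or database is loaded.
    vectorizer = Mock()
    vectorizer.encode.return_value = [0.1, 0.2]
    store = Mock()
    store.query.return_value = [{"recipe_id": 123}]

    results = run_search("  Quick Pasta ", vectorizer, store, limit=5)

    # The service should pass the cleaned text on, then forward the vector.
    vectorizer.encode.assert_called_once_with("quick pasta")
    store.query.assert_called_once_with([0.1, 0.2], 5)
    assert results == [{"recipe_id": 123}]
```

Injecting the collaborators as arguments is what makes the mocks easy to substitute; patching module-level globals would work as well but couples the test to import paths.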

🐳 Docker Deployment

Using Docker Compose (Recommended)

docker-compose up --build

This starts both backend (port 8000) and frontend (port 8501) containers.

Individual Containers

# Backend only
docker build -f docker/backend.Dockerfile -t recipe-finder-backend .
docker run -p 8000:8000 recipe-finder-backend

# Frontend only
docker build -f docker/frontend.Dockerfile -t recipe-finder-frontend .
docker run -p 8501:8501 recipe-finder-frontend

HuggingFace Spaces

The root Dockerfile is configured for HuggingFace Spaces deployment with both services.

📊 Dataset

The application uses a subset of recipe data with:

  • 100 recipes from Food.com dataset
  • Metadata: Name, description, category, ingredients, nutrition facts, ratings
  • Embeddings: Pre-computed 384-dimensional vectors from recipe metadata
  • Storage: ChromaDB persistent storage at data/processed/persist/

πŸ› οΈ Technology Stack

Backend:

  • FastAPI 0.115.6
  • sentence-transformers (all-MiniLM-L6-v2)
  • ChromaDB 0.5.23
  • Pandas, NumPy
  • Pydantic for data validation

Frontend:

  • Streamlit 1.41.1
  • httpx for API calls
  • Python 3.10+

Testing:

  • pytest 9.0.2
  • pytest-asyncio
  • unittest.mock

DevOps:

  • Docker & docker-compose
  • GitHub Actions (coming soon)
  • HuggingFace Spaces deployment

πŸ“ API Endpoints

GET /health

Health check endpoint for monitoring.

Response: { "status": "ok", "ready": true }
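
A client that starts alongside the backend may want to wait until `ready` is true before sending searches. The helper below is a sketch under that assumption; `wait_until_ready` and its parameters are hypothetical names, and the HTTP call is injected as a callable so the loop does not depend on any particular client library.

```python
import time


def wait_until_ready(fetch_health, attempts=10, delay_seconds=1.0,
                     retry_on=(OSError,), sleep=time.sleep):
    """Poll until the health payload reports status "ok" and ready true.

    fetch_health is any callable returning the decoded JSON dict from
    GET /health. Exceptions listed in retry_on count as "not ready yet".
    """
    for attempt in range(attempts):
        try:
            payload = fetch_health()
        except retry_on:
            payload = {}
        if payload.get("status") == "ok" and payload.get("ready") is True:
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)  # Do not sleep after the final attempt.
    return False
```

With httpx, the callable could be `lambda: httpx.get("http://localhost:8000/health").json()`, with `retry_on=(httpx.HTTPError,)` so that connection failures are retried.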

POST /search

Semantic search for recipes.

Request Body:

{
  "query": "quick pasta dinner",
  "offset": 0,
  "limit": 20
}

Response:

{
  "search_results": [
    {
      "recipe_id": 123,
      "similarity_score": 0.87,
      "card": {
        "recipe_id": 123,
        "name": "Quick Pasta Carbonara",
        "description": "Creamy pasta dish...",
        "recipe_category": "Main Course",
        "keywords": ["pasta", "quick", "italian"],
        "n_ingredients": 5,
        "total_time_minutes": 20,
        "calories": 450.0,
        "aggregated_rating": 4.5
      }
    }
  ],
  "total_results": 42,
  "offset": 0,
  "limit": 20
}
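
Because the response echoes `offset` and `limit` and reports `total_results`, a client can walk through all pages. The generator below is a sketch of that loop; `iter_all_results` and the injected `fetch_page` callable are hypothetical, and it assumes `total_results` is the number of matches available across all pages.

```python
def iter_all_results(fetch_page, query, limit=20):
    """Yield every search result by walking offset/limit pages.

    fetch_page(payload) is a caller-supplied function that POSTs the payload
    to /search and returns the decoded response dict.
    """
    offset = 0
    while True:
        response = fetch_page({"query": query, "offset": offset, "limit": limit})
        results = response["search_results"]
        yield from results
        offset += limit
        # Stop on an empty page or once the reported total is covered.
        if not results or offset >= response["total_results"]:
            break
```

The empty-page check guards against looping forever if the reported total is larger than what the server actually returns.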

GET /recipe/{recipe_id}

Get full recipe details.

Response:

{
  "recipe_id": 123,
  "name": "Quick Pasta Carbonara",
  "description": "Creamy pasta dish...",
  "recipe_category": "Main Course",
  "keywords": ["pasta", "quick", "italian"],
  "ingredients": ["spaghetti", "eggs", "bacon", "parmesan", "pepper"],
  "instructions": ["Step 1...", "Step 2..."],
  "n_ingredients": 5,
  "total_time_minutes": 20,
  "calories": 450.0,
  "fat_content": 15.0,
  "protein_content": 25.0,
  "aggregated_rating": 4.5
}

📄 License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

📧 Contact

Hanife Kaptan - hanifekaptan.dev@gmail.com

Project Link: https://github.com/hanifekaptan/semantic-recipe-finder


⭐ Star this repo if you find it helpful!
