feixieliz/hybrid_query_system

Hybrid Query System (HQS)

A natural language query system that combines SQL and vector search to answer questions over structured and unstructured data.

Overview

HQS uses LLM-powered query analysis to route user questions to the most appropriate data source:

  • SQL queries for structured data (e.g., "What are the top-selling products?")
  • Vector search for unstructured documents (e.g., "How do I set up my TV?")
  • Hybrid mode when both sources are needed
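
The three routes above can be pictured as a small dispatch step. The following is only a sketch, not the code in `core.py`: the `Route` enum, the `execute` function, and the two engine callables are hypothetical names introduced for illustration, and the LLM call that picks the route is left out.

```python
from enum import Enum
from typing import Callable


class Route(Enum):
    """Hypothetical labels for the three modes described above."""
    SQL = "sql"
    VECTOR = "vector"
    HYBRID = "hybrid"


def execute(
    route: Route,
    question: str,
    run_sql: Callable[[str], list],
    run_vector: Callable[[str], list],
) -> dict:
    """Call only the engines the chosen route needs and collect their results."""
    results = {}
    if route in (Route.SQL, Route.HYBRID):
        results["sql"] = run_sql(question)
    if route in (Route.VECTOR, Route.HYBRID):
        results["vector"] = run_vector(question)
    return results
```

In hybrid mode both engines run and the synthesizer would receive both result sets; in the other two modes only one key is present.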

Architecture

User Query → Query Analyzer → Execution → Answer Synthesizer → Response

Components:

  • Query Analyzer: Routes queries using LLM-based intent detection
  • SQL Engine: Generates and executes SQL from natural language
  • Vector Engine: Semantic similarity search with embeddings cache
  • Answer Synthesizer: Combines results into natural language responses
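
To make the Vector Engine bullet concrete, here is a minimal sketch of semantic similarity search over cached embeddings. It assumes the cache is a mapping from document id to embedding vector and that ranking uses cosine similarity; the README confirms neither, and `cosine_similarity` and `top_k` are illustrative names rather than functions from `vector_engine.py`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, cached, k=3):
    """Rank cached document vectors against the query vector, best match first.

    `cached` is assumed to be a dict of doc_id -> embedding vector.
    """
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in cached.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Caching the document vectors means only the query needs a fresh embedding call on each search.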

Quick Start

Prerequisites

  • Python 3.11-3.13
  • OpenAI API key

Installation

# Install dependencies
make dev-install

# Set up environment
echo "OPENAI_API_KEY=your-key-here" > .env

# Initialize demo data
make setup

Run API Server

# Production mode
make run

# Development mode with auto-reload
make run-dev

API available at http://localhost:8000

CLI Usage

Normal Mode (single query):

# Basic query
hqs "What are the top-selling products?"

# With real-time streaming progress
hqs --verbose "How do I set up my TV?"

# With detailed summary (shows SQL queries, sources, etc.)
# Single quotes keep the shell from expanding $50 as a variable
hqs --details 'What products cost less than $50?'

Interactive Mode (persistent session):

# Start interactive session
hqs --interactive

# With initial settings
hqs -i --verbose --details

Interactive mode commands:

  • Type your question to get an answer
  • verbose - Toggle real-time streaming mode
  • details - Toggle detailed summary mode
  • stats - View system statistics
  • quit or exit - Exit session
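
One way to picture how these commands could be told apart from ordinary questions is sketched below. This is not taken from `cli.py`: `SessionState` and `handle_line` are hypothetical, and matching commands case-insensitively is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SessionState:
    """Hypothetical per-session toggles mirroring the --verbose and --details flags."""
    verbose: bool = False
    details: bool = False


def handle_line(state: SessionState, line: str) -> str:
    """Classify one line of input and apply toggles.

    Returns 'empty', 'quit', 'toggled', 'stats', or 'query'.
    """
    command = line.strip().lower()
    if not command:
        return "empty"
    if command in ("quit", "exit"):
        return "quit"
    if command == "verbose":
        state.verbose = not state.verbose
        return "toggled"
    if command == "details":
        state.details = not state.details
        return "toggled"
    if command == "stats":
        return "stats"
    # Anything else is treated as a natural language question.
    return "query"
```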

API Endpoints

  • POST /api/v1/query - Process natural language query
  • POST /api/v1/query/stream - Stream query with SSE progress updates
  • POST /api/v1/analyze - Analyze query without execution
  • POST /api/v1/sql - Direct SQL generation and execution
  • POST /api/v1/vector - Direct vector similarity search
  • GET /api/v1/health - System health check
  • GET /api/v1/stats - Database and corpus statistics

Documentation: http://localhost:8000/docs
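
A client of `/api/v1/query/stream` has to parse Server-Sent Events. The sketch below follows the general SSE wire format (`data:` fields, with a blank line ending each event) and makes no assumption about what HQS puts inside each payload; `iter_sse_data` is an illustrative helper, not part of the project.

```python
def iter_sse_data(lines):
    """Yield the data payload of each complete event from decoded text lines.

    Only `data:` fields are collected; other fields (event, id, retry) and
    comment lines are ignored. An event with no terminating blank line is dropped.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
```

Any line iterator works as input, for example the decoded lines of a streaming HTTP response body.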

Example Request

curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the top-selling products?"}'
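
The same request can be built from Python with only the standard library. This is a sketch that mirrors the curl call above; `build_query_request` is an illustrative helper, and the shape of the response body is not documented here, so the commented usage only assumes that the server returns JSON.

```python
import json
import urllib.request


def build_query_request(question, base_url="http://localhost:8000"):
    """Build a POST request for /api/v1/query with a JSON body."""
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (needs a running server; assumes a JSON response):
# with urllib.request.urlopen(build_query_request("What are the top-selling products?")) as resp:
#     print(json.load(resp))
```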

Testing

# Run all tests
make test

# Run with coverage
make test-coverage

# API tests only
make test-api

Test markers: unit, integration, api, slow, requires_openai

Project Structure

src/hqs/
├── api/              # FastAPI REST layer
├── core.py           # Main orchestrator
├── query_analyzer.py # Query routing logic
├── sql_engine.py     # SQL generation & execution
├── vector_engine.py  # Vector similarity search
├── answer_synthesizer.py  # Answer synthesis
├── llmclient.py      # LLM abstraction
├── cli.py            # Command-line interface
└── prompts/          # LLM prompt templates

data/
├── products.db       # SQLite database (e-commerce)
├── corpus/           # Text documents
└── embeddings.json   # Cached embeddings

tests/                # Comprehensive test suite

Configuration

Settings via environment variables (.env):

  • OPENAI_API_KEY - Required for LLM operations
  • API_HOST - Server host (default: 0.0.0.0)
  • API_PORT - Server port (default: 8000)
  • DATABASE_PATH - SQLite database path
  • VECTOR_CORPUS_DIR - Document corpus directory
  • LLM_MODEL - OpenAI model (default: gpt-4o-mini)
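
As an illustration of how these variables and their defaults fit together, here is a sketch of a settings loader. It is not the project's actual configuration code: `Settings` and `load_settings` are hypothetical, the two path settings default to `None` because the README states no default for them, and reading the `.env` file itself is out of scope.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed above."""
    openai_api_key: str
    api_host: str = "0.0.0.0"
    api_port: int = 8000
    database_path: Optional[str] = None      # no default documented
    vector_corpus_dir: Optional[str] = None  # no default documented
    llm_model: str = "gpt-4o-mini"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, applying the documented defaults."""
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise ValueError("OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=key,
        api_host=env.get("API_HOST", "0.0.0.0"),
        api_port=int(env.get("API_PORT", "8000")),
        database_path=env.get("DATABASE_PATH"),
        vector_corpus_dir=env.get("VECTOR_CORPUS_DIR"),
        llm_model=env.get("LLM_MODEL", "gpt-4o-mini"),
    )
```

Passing the environment as a parameter keeps the loader easy to test without touching the real process environment.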

Development

# Clean build artifacts
make clean

# Reset demo data
make setup-db
make setup-demo

Tech Stack

  • Framework: FastAPI, Uvicorn
  • LLM: OpenAI (gpt-4o-mini)
  • Database: SQLite
  • Vector: OpenAI Embeddings (text-embedding-3-small)
  • Testing: pytest, pytest-asyncio, pytest-cov
  • Package Manager: uv

License

MIT
