A modern, AI-powered real estate recommendation engine built with vector search and geospatial capabilities.
HomeRecoEngine is an intelligent real estate recommendation system that combines semantic search, geospatial analysis, and machine learning to provide personalized property recommendations. Built with FastAPI and Milvus vector database, it offers powerful search capabilities including natural language queries, location-based searches, and advanced filtering.
- Semantic Search: Natural language queries like "spacious apartment near subway with good schools"
- Geospatial Search: Find properties within specific radius (1-50km) from any location
- Hybrid Search: Combine semantic understanding with precise filtering
- Real-time Results: Sub-second response times with vector indexing
- Precise Distance Calculation: Haversine formula for accurate geographic distances
- Circular & Rectangular Area Search: Flexible geographic boundary options
- Coordinate Support: WGS84 coordinate system compatibility
- Multi-format Input: Support for various address and coordinate formats
- Bulk Import: Excel/CSV file upload with validation and deduplication
- Real-time CRUD: Create, read, update, delete operations via REST API
- Data Validation: Comprehensive input validation and error handling
- Scalable Storage: Milvus vector database for high-performance operations
- Embedding Models: Support for multiple embedding models (BGE, FastEmbed, etc.)
- Intelligent Matching: Vector similarity matching for personalized recommendations
- Multi-language Support: Chinese and English text processing
- Contextual Understanding: Deep semantic analysis of property descriptions
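The distance calculation and the circular and rectangular area searches listed above can be sketched in a few lines of Python. This is a minimal illustration, not the engine's actual implementation: the function names, the Earth-radius constant, and the `(lng, lat)` tuple layout are assumptions, while the 1-50 km radius limit comes from the feature list.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant for this sketch)


def haversine_km(lng1, lat1, lng2, lat2):
    """Great-circle distance in km between two WGS84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(center, points, radius_km):
    """Circular search: keep the (lng, lat) points inside the circle around center."""
    if not 1 <= radius_km <= 50:
        raise ValueError("radius_km must be between 1 and 50")
    clng, clat = center
    return [p for p in points if haversine_km(clng, clat, p[0], p[1]) <= radius_km]


def within_rectangle(points, min_lng, min_lat, max_lng, max_lat):
    """Rectangular search: keep the (lng, lat) points inside the bounding box."""
    return [p for p in points
            if min_lng <= p[0] <= max_lng and min_lat <= p[1] <= max_lat]
```

A rectangular check is cheap, so it is often used as a pre-filter before the more expensive Haversine comparison.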
- Python 3.12+
- Milvus 2.5.11+ (Vector Database)
- Docker & Docker Compose (For Milvus)
- 8GB+ RAM (Recommended)
uv is a fast Python package manager, recommended for managing project dependencies.
macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
Windows:
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# Clone the repository
git clone https://github.com/yourusername/HomeRecoEngine.git
cd HomeRecoEngine
# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
uv sync
# Download official Milvus Docker Compose file
wget https://github.com/milvus-io/milvus/releases/download/v2.5.14/milvus-standalone-docker-compose.yml -O docker-compose.yml
# Start Milvus
docker-compose up -d
# Verify Milvus is running
docker-compose ps
Edit the configuration file conf/service_conf.yaml:
# Milvus Configuration
milvus:
  hosts: 'http://127.0.0.1:19530'  # Milvus server address
  username: 'root'                 # Default username
  password: 'Milvus'               # Default password
  db_name: ''                      # Database name (optional)
# API Server Configuration
home_recommendation:
  host: 0.0.0.0    # Listen on all interfaces
  http_port: 7001  # API port
# Embedding Model Configuration
user_default_llm:
  embedding_model: 'BAAI/bge-large-zh-v1.5@BAAI'  # Default embedding model
The system requires NLTK data files. If the project includes a nltk folder, copy it to your user directory:
# Copy NLTK data to user directory (if nltk folder exists in project root)
cp -r nltk /home/$(whoami)/
# Alternative: Set NLTK_DATA environment variable
export NLTK_DATA=/path/to/project/nltk
The system automatically downloads embedding models from Hugging Face to:
- Default path: ~/.cache/huggingface/transformers/
- Custom path: set the TRANSFORMERS_CACHE environment variable
For users in China, use a mirror to speed up downloads:
export HF_ENDPOINT=https://hf-mirror.com
# Start the API server
uv run python -m api.app
# Or with debug mode
uv run python -m api.app --debug
# The API will be available at http://localhost:7001
# Swagger UI: http://localhost:7001/docs
- Swagger UI: http://localhost:7001/docs
- ReDoc: http://localhost:7001/redoc
POST /api/houses/search
Find properties within 5km radius:
{
  "location": {
    "center_longitude": 116.3974,
    "center_latitude": 39.9093,
    "radius_km": 5.0
  },
  "price_range": {
    "min_price": 300,
    "max_price": 800
  },
  "limit": 20
}
Semantic search:
{
  "user_query_text": "luxury apartment near subway with good schools",
  "price_range": {
    "min_price": 500,
    "max_price": 1200
  },
  "limit": 15
}
POST /api/houses/insert # Add single property
POST /api/houses/batch-insert # Add multiple properties
GET /api/houses/detail/{id} # Get property details
DELETE /api/houses/{id} # Delete property
POST /api/houses/upload-excel # Upload Excel file
POST /api/houses/preview-excel # Preview data before import
For complete API documentation, see API_DOCUMENTATION.md.
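The search request bodies share a small set of optional fields, so a client can assemble and sanity-check them before sending. The sketch below is one way to do that. `build_search_payload` is a hypothetical helper, and the client-side checks (coordinate bounds, the 1-50 km radius from the feature list, min/max price ordering) are assumptions about what the server accepts, not documented validation rules.

```python
def build_search_payload(query=None, lng=None, lat=None, radius_km=None,
                         min_price=None, max_price=None, limit=20):
    """Assemble a JSON body for POST /api/houses/search from optional parts."""
    payload = {"limit": limit}

    if query:
        payload["user_query_text"] = query

    # A location filter needs all three values, or none of them.
    if lng is not None or lat is not None or radius_km is not None:
        if None in (lng, lat, radius_km):
            raise ValueError("lng, lat and radius_km must be given together")
        if not (-180 <= lng <= 180 and -90 <= lat <= 90):
            raise ValueError("coordinates are outside the WGS84 range")
        if not 1 <= radius_km <= 50:
            raise ValueError("radius_km must be between 1 and 50")
        payload["location"] = {
            "center_longitude": lng,
            "center_latitude": lat,
            "radius_km": radius_km,
        }

    # Only include the price bounds that were actually supplied.
    price = {}
    if min_price is not None:
        price["min_price"] = min_price
    if max_price is not None:
        price["max_price"] = max_price
    if len(price) == 2 and min_price > max_price:
        raise ValueError("min_price must not exceed max_price")
    if price:
        payload["price_range"] = price

    return payload
```

Supplying both a query and a location yields a hybrid request that combines semantic matching with a geographic filter.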
import requests

# Base URL of the houses API
BASE_URL = "http://localhost:7001/api/houses"


# Search properties near a location
def search_nearby_properties(lng, lat, radius_km=5.0):
    response = requests.post(f"{BASE_URL}/search", json={
        "location": {
            "center_longitude": lng,
            "center_latitude": lat,
            "radius_km": radius_km
        },
        "price_range": {"min_price": 300, "max_price": 800},
        "limit": 20
    })
    response.raise_for_status()  # Surface HTTP errors instead of parsing an error body
    return response.json()


# Semantic search
def semantic_search(query, max_price=1000):
    response = requests.post(f"{BASE_URL}/search", json={
        "user_query_text": query,
        "price_range": {"max_price": max_price},
        "limit": 15
    })
    response.raise_for_status()
    return response.json()


# Example usage
properties = search_nearby_properties(116.3974, 39.9093, 3.0)
school_properties = semantic_search("school district apartment")
interface SearchParams {
  location?: {
    center_longitude: number;
    center_latitude: number;
    radius_km: number;
  };
  user_query_text?: string;
  price_range?: {
    min_price?: number;
    max_price?: number;
  };
  limit?: number;
}

async function searchProperties(params: SearchParams) {
  const response = await fetch('/api/houses/search', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(params)
  });
  return await response.json();
}

// Find properties near current location
navigator.geolocation.getCurrentPosition(async (position) => {
  const results = await searchProperties({
    location: {
      center_longitude: position.coords.longitude,
      center_latitude: position.coords.latitude,
      radius_km: 5.0
    },
    price_range: { min_price: 300, max_price: 800 },
    limit: 20
  });
  console.log(`Found ${results.data.total} properties nearby`);
});
HomeRecoEngine/
├── api/                # API Layer
│   ├── apps/           # FastAPI route handlers
│   ├── db/             # Database services
│   │   └── services/   # Business logic
│   └── utils/          # API utilities
├── core/               # Core Components
│   ├── llm/            # LLM and embedding models
│   ├── nlp/            # NLP processing
│   ├── prompts/        # AI prompts
│   └── utils/          # Core utilities
├── conf/               # Configuration files
└── reference/          # Documentation and examples
| Component | Technology | Purpose |
|---|---|---|
| Web Framework | FastAPI | High-performance async API |
| Vector Database | Milvus | Similarity search and storage |
| Embedding Models | BGE, FastEmbed | Text vectorization |
| Geospatial | Haversine Formula | Distance calculations |
| Data Processing | Pandas, OpenPyXL | Data import and manipulation |
| AI/ML | Transformers, PyTorch | Natural language processing |
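To make the roles of the embedding and vector-search rows concrete, here is a minimal sketch of hybrid search: a scalar filter followed by cosine-similarity ranking. It is a brute-force stand-in written for illustration, not how Milvus executes a query. The 3-dimensional toy vectors, the `vector` field name, and `hybrid_search` itself are hypothetical; `zj` is the total-price field from the data format.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_search(query_vec, records, max_price=None, limit=3):
    """Filter records by price, then rank the survivors by similarity to the query."""
    candidates = [r for r in records if max_price is None or r["zj"] <= max_price]
    ranked = sorted(candidates,
                    key=lambda r: cosine_similarity(query_vec, r["vector"]),
                    reverse=True)
    return ranked[:limit]


# Toy records; real embeddings have hundreds of dimensions.
RECORDS = [
    {"id": 1, "zj": 500, "vector": [1.0, 0.0, 0.0]},
    {"id": 2, "zj": 900, "vector": [0.9, 0.1, 0.0]},
    {"id": 3, "zj": 400, "vector": [0.0, 1.0, 0.0]},
]
```

Calling `hybrid_search([1.0, 0.0, 0.0], RECORDS, max_price=800)` drops record 2 on price and ranks record 1 ahead of record 3. An approximate index trades this exhaustive scan for speed at large scale.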
- Search Latency: < 100ms for typical queries
- Vector Indexing: HNSW algorithm for optimal performance
- Concurrent Users: Supports 1000+ concurrent requests
- Data Scale: Tested with 1M+ property records
- Lazy Loading: On-demand model initialization
- Connection Pooling: Efficient database connections
- Caching: Intelligent caching for frequent queries
- Batch Processing: Optimized bulk operations
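The lazy-loading and caching items can be illustrated with the standard library alone. This is a sketch under stated assumptions: `LazyEmbedder`, `fake_loader`, and `cached_embedding` are hypothetical names, and the loader returns a placeholder function rather than a real embedding model.

```python
from functools import lru_cache


class LazyEmbedder:
    """Defers loading the model until the first embedding is requested."""

    def __init__(self, model_name, loader):
        self._model_name = model_name
        self._loader = loader
        self._model = None  # nothing is loaded at construction time

    @property
    def model(self):
        if self._model is None:
            self._model = self._loader(self._model_name)
        return self._model

    def embed(self, text):
        return self.model(text)


load_calls = []  # records every model load so the laziness is observable


def fake_loader(name):
    """Placeholder loader: returns a function producing a 1-dimensional 'embedding'."""
    load_calls.append(name)
    return lambda text: (float(len(text)),)


embedder = LazyEmbedder("BAAI/bge-large-zh-v1.5", fake_loader)


@lru_cache(maxsize=1024)
def cached_embedding(text):
    """Memoize embeddings so repeated queries skip the model call."""
    return embedder.embed(text)
```

The model is loaded on the first `cached_embedding` call only, and an identical second query is served from the cache. Tuples are returned because `lru_cache` results should be immutable.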
# Milvus Configuration
MILVUS_HOST=127.0.0.1
MILVUS_PORT=19530
# API Configuration
API_HOST=0.0.0.0
API_PORT=7001
# Model Configuration
EMBEDDING_MODEL=BAAI/bge-large-zh-v1.5
HF_ENDPOINT=https://hf-mirror.com # For users in China
TRANSFORMERS_CACHE=~/.cache/huggingface/transformers/
# NLTK Configuration
NLTK_DATA=/home/$(whoami)/nltk # NLTK data directory
Edit conf/service_conf.yaml:
# Vector Database Settings
milvus:
  hosts: "http://127.0.0.1:19530"
  username: "root"
  password: "Milvus"
# Embedding Model Settings
user_default_llm:
  embedding_model: "BAAI/bge-large-zh-v1.5@BAAI"
# API Server Settings
home_recommendation:
  host: "0.0.0.0"
  http_port: 7001
# Run all tests
python -m pytest tests/
# Run with coverage
python -m pytest tests/ --cov=api --cov-report=html
# Run specific test categories
python -m pytest tests/test_search.py # Search functionality
python -m pytest tests/test_geospatial.py # Location features
python -m pytest tests/test_api.py # API endpoints
# Test the API server
python simple_test.py
# Test search functionality
python api/db/services/example_usage.py
{
  "id": 1001,
  "xqmc": ["Community Name"],
  "qy": "District",
  "dz": "Full Address",
  "jd": 116.3974,   // Longitude
  "wd": 39.9093,    // Latitude
  "mj": 95.6,       // Area (sqm)
  "fyhx": "3BR2BA", // Layout
  "zj": 650.5,      // Total Price (10k CNY)
  "dj": 6800,       // Unit Price (CNY/sqm)
  "lc": "15/30F",   // Floor
  "cx": "South-North",  // Orientation
  "zxqk": "Renovated",  // Renovation
  "ywdt": "Yes",    // Elevator
  "ywcw": "Yes",    // Parking
  "xqtd": "School district, near subway",  // Features
  "zb": "Mature commercial area nearby"    // Surroundings
}
| Column | Required | Description | Example |
|---|---|---|---|
| id | ✅ | Unique identifier | 1001 |
| xqmc | ✅ | Community name | "Sunshine Garden" |
| qy | ✅ | District | "Chaoyang" |
| jd | ✅ | Longitude | 116.3974 |
| wd | ✅ | Latitude | 39.9093 |
| mj | ✅ | Area (sqm) | 95.6 |
| zj | ✅ | Total price (10k CNY) | 650.5 |
| ... |
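Validation and deduplication of imported rows could look roughly like the sketch below. It uses the standard `csv` module and treats the seven columns listed in the table as required, which is an assumption, as are the `validate_rows` helper, the specific checks, and the made-up sample rows. The project's real importer may apply different rules.

```python
import csv
import io

# Assumption: the columns shown in the table above are the required ones.
REQUIRED_COLUMNS = ("id", "xqmc", "qy", "jd", "wd", "mj", "zj")


def validate_rows(rows):
    """Split rows into (valid, errors); errors are (line_number, message) pairs."""
    valid, errors, seen_ids = [], [], set()
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        missing = [c for c in REQUIRED_COLUMNS if not (row.get(c) or "").strip()]
        if missing:
            errors.append((line_no, "missing: " + ", ".join(missing)))
            continue
        try:
            lng, lat = float(row["jd"]), float(row["wd"])
        except ValueError:
            errors.append((line_no, "jd and wd must be numeric"))
            continue
        if not (-180 <= lng <= 180 and -90 <= lat <= 90):
            errors.append((line_no, "coordinates outside the WGS84 range"))
            continue
        if row["id"] in seen_ids:  # deduplicate on the unique identifier
            errors.append((line_no, "duplicate id " + row["id"]))
            continue
        seen_ids.add(row["id"])
        valid.append(row)
    return valid, errors


# Made-up sample data: one good row, a duplicate, a missing district, a bad longitude.
SAMPLE_CSV = """\
id,xqmc,qy,jd,wd,mj,zj
1001,Sunshine Garden,Chaoyang,116.3974,39.9093,95.6,650.5
1001,Sunshine Garden,Chaoyang,116.3974,39.9093,95.6,650.5
1002,Lake View,,116.40,39.91,80,500
1003,Hill Side,Haidian,216.0,39.9,70,400
"""

valid_rows, row_errors = validate_rows(csv.DictReader(io.StringIO(SAMPLE_CSV)))
```

Reporting line numbers alongside each problem lets a preview step show users exactly which spreadsheet rows need fixing before import.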
We welcome contributions! Please see our Contributing Guide for details.
# Fork and clone the repository
git clone https://github.com/yourusername/HomeRecoEngine.git
cd HomeRecoEngine
# Install development dependencies
uv sync --dev
# Install pre-commit hooks
pre-commit install
# Run tests before committing
python -m pytest tests/
- Python: Follow PEP 8, use Black formatter
- Type Hints: Use type annotations for all functions
- Documentation: Docstrings for all public methods
- Testing: Maintain >90% test coverage
This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Wiki: Project Wiki
- Discord: Join our Discord server
- WeChat: Add WeChat group for Chinese users
- Milvus - Vector database infrastructure
- FastAPI - Modern web framework
- BGE Embeddings - High-quality text embeddings
- BAAI - Pre-trained embedding models
- Multi-city Support: Expand beyond single city deployments
- Real-time Updates: WebSocket for live property updates
- Advanced Analytics: Property market trend analysis
- Mobile App: React Native mobile application
- Machine Learning: Predictive pricing models
- Integration APIs: Connect with major real estate platforms
- GPU Acceleration: CUDA support for embedding models
- Distributed Search: Multi-node Milvus cluster support
- Edge Caching: Redis for frequently accessed data
- Auto-scaling: Kubernetes deployment configurations
1. Milvus Connection Failed
# Check if Milvus is running
docker-compose ps
# Check Milvus logs
docker-compose logs milvus-standalone
2. Model Download Issues
# Use China mirror
export HF_ENDPOINT=https://hf-mirror.com
# Check model cache directory
ls -la ~/.cache/huggingface/transformers/
3. Port Already in Use
# Change port in conf/service_conf.yaml
home_recommendation:
  http_port: 8080 # Use different port
4. NLTK Data Not Found
# Copy NLTK data to user directory
cp -r nltk /home/$(whoami)/
# Or set environment variable
export NLTK_DATA=/path/to/project/nltk
# Verify NLTK data location
python -c "import nltk; print(nltk.data.path)"
Follow Conventional Commits:
- feat: New features
- fix: Bug fixes
- docs: Documentation changes
- style: Code style changes
- refactor: Code refactoring
- test: Test additions/modifications
- chore: Build process or auxiliary tool changes
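A commit subject can be checked against these prefixes with a short regular expression. The sketch below is illustrative only: `is_conventional` is a hypothetical helper, and the optional `(scope)` and `!` parts follow the general Conventional Commits format rather than anything this project specifies.

```python
import re

COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# type, optional (scope), optional "!" for breaking changes, then ": description"
COMMIT_PATTERN = re.compile(
    r"^(" + "|".join(COMMIT_TYPES) + r")(\([\w.-]+\))?!?: \S.*"
)


def is_conventional(subject):
    """Return True if the commit subject line starts with an allowed type prefix."""
    return bool(COMMIT_PATTERN.match(subject))
```

For example, `is_conventional("feat(search): add rectangular area search")` passes, while `is_conventional("added stuff")` does not.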
This project is proprietary software. All rights reserved.
⭐ Star this repository if you find it helpful!
Made with ❤️ by the HomeRecoEngine team