A semantic search system for Airbnb listings that uses Superlinked for multi-attribute vector indexing and Qdrant for vector search. Natural language queries retrieve relevant listings by embedding each attribute (e.g., price, location, and description) with a model suited to its data type, so that a single search can account for several attributes at once.
- Semantic Search for Structured Data: Enables natural language queries like "cozy apartments with a view under $150 with rating above 4.5" to retrieve relevant Airbnb listings based on multiple attributes.
- Multi-Attribute Vector Indexing: Each column in the Airbnb dataset is embedded using a specialized model (e.g., float-specific embeddings for price, text embeddings for descriptions).
- Vector Database Integration: Powered by Qdrant for efficient similarity searches.
- RESTful API Endpoints: Built with FastAPI for efficient and scalable backend operations.
- Interactive UI: Uses Streamlit to provide an intuitive front-end interface for searching Airbnb listings.
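As a rough illustration of the multi-attribute indexing idea in the feature list above, the sketch below embeds a description and a price separately and concatenates the two vectors. It does not use Superlinked and is not how the project is implemented: the function names, the hashed bag-of-words text embedding, the quarter-circle number embedding, and the 0 to 500 price bounds are all assumptions chosen to keep the example self-contained.

```python
import hashlib
import math


def embed_number(value: float, low: float, high: float) -> list[float]:
    """Map a number in [low, high] onto a quarter circle so close values get close vectors."""
    clipped = min(max(value, low), high)
    angle = (clipped - low) / (high - low) * (math.pi / 2)
    return [math.cos(angle), math.sin(angle)]


def embed_text(text: str, dims: int = 16) -> list[float]:
    """Toy hashed bag-of-words embedding (a stand-in for a real text model), L2-normalised."""
    vec = [0.0] * dims
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def embed_listing(description: str, price: float, weights=(1.0, 1.0)) -> list[float]:
    """Concatenate the weighted per-attribute vectors into one listing vector."""
    text_vec = [weights[0] * x for x in embed_text(description)]
    price_vec = [weights[1] * x for x in embed_number(price, low=0.0, high=500.0)]
    return text_vec + price_vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = embed_listing("cozy apartment with a view", price=100)
near = embed_listing("cozy apartment with a view", price=120)
far = embed_listing("cozy apartment with a view", price=450)
print(cosine(query, near) > cosine(query, far))
```

With identical descriptions, the listing whose price is closer to the query's price scores higher, which is the behaviour a per-attribute embedding is meant to provide. Changing `weights` shifts how much each attribute contributes to the final similarity.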
├── superlinked_app/
│ ├── __init__.py # Package initialization
│ ├── config.py # Configuration settings
│ ├── constants.py # Constants definition
│ ├── index.py # Data schema definition
│ ├── query.py # Query configurations
│ └── vdb.py # Vector database setup
├── tools/
│ ├── create_qdrant_database.py # Database initialization tool
│ ├── st_app.py # Enhanced Streamlit UI
│ └── streamlit_app.py # Basic Streamlit UI
├── .env.example # Environment variables
├── app.py # Entry point for in-memory development when the Qdrant vector database is turned off
├── Makefile # Automation for common development tasks
├── pyproject.toml # Project metadata and dependencies
└── README.md # Project documentation
- Superlinked Framework: For semantic search capabilities
- Qdrant: Vector database for efficient similarity search
- OpenAI API: Natural language understanding
- Streamlit: Interactive web interface
- uv: Fast Python package installer and resolver
We use a publicly available dataset containing Airbnb listings in Stockholm, sourced from Inside Airbnb, which provides detailed metadata about rental properties.
- The dataset includes key attributes such as listing descriptions, prices, locations, property types, availability, and ratings.
- This structured data enables multi-attribute semantic search, allowing users to query Airbnb listings based on various factors like price range, neighborhood, and amenities.
- Python 3.11+
- Qdrant Cloud account (or local Qdrant instance)
- OpenAI API key
- Clone the repository:
git clone https://github.com/AmirLayegh/airbnb-semantic-search.git
cd airbnb-semantic-search
- Install dependencies using uv:
# Create a virtual environment (if not created)
uv venv
# Install dependencies from pyproject.toml in editable mode
uv pip install -e .
# OR (Preferred for installing all dependencies with version locking)
uv sync
- Create a .env file in the root directory with your credentials:
QDRANT_API_KEY=your_qdrant_api_key
QDRANT_CLUSTER_URL=your_qdrant_cluster_url
OPENAI_API_KEY=your_openai_api_key
- Update DATA_PATH in superlinked_app/config.py to point to your Airbnb listings dataset.
- Create a Qdrant collection (first-time setup):
python tools/create_qdrant_database.py
Or using the Makefile:
make create-qdrant-database
- Start the Superlinked server:
python -m superlinked.server
Or using the Makefile:
make start-superlinked-server
- Load the data into Qdrant collection:
make load-data
- Run the Streamlit UI app:
make streamlit-run
- Open your browser at http://localhost:8501 to access the application.
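Before running the steps above, it can help to confirm that the three credentials from the .env file are actually visible to Python. The check below is a minimal sketch and not part of the project: `read_credentials` is a hypothetical helper, and it assumes the variables have already been loaded into the process environment (for example by your shell or a dotenv loader). The project's own config.py may handle this differently.

```python
import os

# The variable names come from the .env example; the helper itself is hypothetical.
REQUIRED_KEYS = ("QDRANT_API_KEY", "QDRANT_CLUSTER_URL", "OPENAI_API_KEY")


def read_credentials(env=None) -> dict[str, str]:
    """Return the required credentials, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}


if __name__ == "__main__":
    try:
        read_credentials()
        print("All credentials found.")
    except RuntimeError as exc:
        print(exc)
```

Failing early with the names of the missing variables is usually easier to debug than an authentication error raised later by Qdrant or OpenAI.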
Enter natural language queries in the search bar like:
- "Apartments in old town with a view"
- "Affordable homes under $100 per night"
- "Top-rated places with pool and wifi"
Use the sidebar to set additional filters:
- Price range
- Minimum rating
- Room type preferences
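The sidebar filters listed above can be thought of as hard constraints applied alongside the semantic ranking. The sketch below applies them to a list of result dictionaries after retrieval. This is an assumption about one possible design, not a description of the app: the real UI may pass the filters into the query itself, and the `Filters` class, `apply_filters` function, and result keys (`price`, `rating`, `room_type`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Filters:
    """Hypothetical container for the three sidebar filters."""
    min_price: float = 0.0
    max_price: float = float("inf")
    min_rating: float = 0.0
    room_types: tuple[str, ...] = ()  # empty tuple means "any room type"


def apply_filters(results: list[dict], filters: Filters) -> list[dict]:
    """Keep only results that satisfy every filter, preserving the ranking order."""
    kept = []
    for result in results:
        if not (filters.min_price <= result["price"] <= filters.max_price):
            continue
        if result["rating"] < filters.min_rating:
            continue
        if filters.room_types and result["room_type"] not in filters.room_types:
            continue
        kept.append(result)
    return kept


# Made-up results, ordered as a search might return them.
results = [
    {"name": "A", "price": 90.0, "rating": 4.7, "room_type": "Private room"},
    {"name": "B", "price": 140.0, "rating": 4.2, "room_type": "Entire home/apt"},
    {"name": "C", "price": 310.0, "rating": 4.9, "room_type": "Entire home/apt"},
]
print([r["name"] for r in apply_filters(results, Filters(max_price=150, min_rating=4.5))])
```

Filtering after retrieval is simple but can leave few results when the filters are strict; pushing the constraints into the vector database query avoids that at the cost of a more involved query definition.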
Explore search result analytics including:
- Price distribution
- Rating distribution
- Room type breakdown
- Summary statistics
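The analytics listed above boil down to a few aggregations over the returned listings. As a minimal sketch, assuming each result is a dictionary with `price`, `rating`, and `room_type` keys (hypothetical names, as is the `summarize` helper), the summary could be computed like this:

```python
from collections import Counter
from statistics import mean, median


def summarize(results: list[dict]) -> dict:
    """Compute simple summary statistics for a list of search results."""
    if not results:
        return {"count": 0}
    prices = [r["price"] for r in results]
    ratings = [r["rating"] for r in results]
    return {
        "count": len(results),
        "price_min": min(prices),
        "price_median": median(prices),
        "price_max": max(prices),
        "rating_mean": round(mean(ratings), 2),
        # Number of results per room type, e.g. for a breakdown chart.
        "room_types": dict(Counter(r["room_type"] for r in results)),
    }


# Made-up results for illustration only.
example = [
    {"price": 80.0, "rating": 4.0, "room_type": "Private room"},
    {"price": 120.0, "rating": 4.5, "room_type": "Entire home/apt"},
    {"price": 200.0, "rating": 5.0, "room_type": "Entire home/apt"},
]
print(summarize(example))
```

The price and rating lists can be handed directly to a charting call for the distribution plots, and the `room_types` mapping to a bar or pie chart.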
To add a new field to the search schema:
- Add the field to DataSchema in index.py
- Create a new similarity space if needed
- Add the field to the index spaces list
- Update the query parameters in query.py
The Streamlit interface can be customized by editing:
- tools/st_app.py (enhanced UI)
- tools/streamlit_app.py (basic UI)
This project is licensed under the MIT License - see the LICENSE file for details.
This project was inspired by the amazing work from DECODING ML on tabular semantic search. Their tutorial provided valuable insights into leveraging vector databases and multi-attribute indexing for structured data retrieval.

