High-Performance Search API with Flask and Elasticsearch
- Overview
- Key Features
- Technology Stack
- System Architecture
- Installation
- Configuration
- API Documentation
- Usage Examples
- Project Structure
- Elasticsearch Setup
- Data Indexing
- Performance
- Troubleshooting
- Contributing
- License
- Contact
FlaskSearch API is a robust, production-ready RESTful API that integrates Flask with Elasticsearch to provide powerful full-text search capabilities. Built specifically for searching through Shakespeare plays, this API demonstrates best practices for implementing search functionality in modern web applications.
The application returns results quickly (the benchmarks later in this README report under 50 ms for simple queries), supports flexible query options, and uses an architecture intended to scale to large datasets and high-traffic scenarios.
- Fast Search: Elasticsearch-powered sub-second query responses
- Full-Text Search: Advanced text search with relevance scoring
- Flexible Queries: Support for various search patterns and filters
- Scalable: Handle millions of documents efficiently
- Production Ready: Error handling and logging included
- Easy Integration: RESTful API for seamless integration
- Well Documented: Comprehensive documentation and examples
- High Performance: Optimized for speed and reliability
- Full-Text Search: Search through Shakespeare plays and dialogues
- Fuzzy Matching: Handle typos and misspellings
- Relevance Scoring: Results ranked by relevance
- Field-Specific Search: Search in specific fields (title, author, text)
- Boolean Queries: Combine multiple search terms with AND/OR
- Phrase Search: Exact phrase matching
- Wildcard Search: Pattern-based searching
- Aggregations: Statistical analysis of search results
- RESTful Endpoints: Standard HTTP methods (GET, POST, PUT, DELETE)
- JSON Responses: Structured JSON output
- Pagination: Handle large result sets efficiently
- Filtering: Filter results by various criteria
- Sorting: Sort results by relevance, date, or other fields
- CORS Support: Cross-origin resource sharing enabled
- Error Handling: Comprehensive error messages
- Request Validation: Input validation and sanitization
- Bulk Indexing: Index large datasets efficiently
- Real-Time Updates: Instant data updates
- Document CRUD: Create, read, update, delete operations
- Data Import: JSON data import from files
- Index Management: Create, update, delete indices
- Mapping Configuration: Customize field types and analyzers
- Query Caching: Cache frequent queries
- Connection Pooling: Reuse database connections
- Async Operations: Non-blocking I/O for better performance
- Batch Processing: Handle multiple operations efficiently
- Index Optimization: Optimized Elasticsearch settings
- Response Compression: Reduce bandwidth usage
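Several of the search features listed above (fuzzy matching, phrase search, boolean queries) correspond to standard Elasticsearch query DSL clauses. The following is a minimal sketch of how such request bodies could be built. The function names are hypothetical and are not taken from this project's code, and the `text` field is assumed from the index mapping shown later in this README.

```python
def fuzzy_query(term, field="text"):
    """Match `term` while tolerating small typos (edit distance picked by Elasticsearch)."""
    return {"query": {"match": {field: {"query": term, "fuzziness": "AUTO"}}}}


def phrase_query(phrase, field="text"):
    """Match the words of `phrase` in the same order, next to each other."""
    return {"query": {"match_phrase": {field: phrase}}}


def boolean_query(must_terms, should_terms=(), field="text"):
    """Require every term in `must_terms` (AND); `should_terms` only boost the score."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {field: t}} for t in must_terms],
                "should": [{"match": {field: t}} for t in should_terms],
            }
        }
    }
```

Each function returns a plain dict that can be sent as the JSON body of an Elasticsearch `_search` request.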
- Flask: 2.x - Lightweight Python web framework
- Python: 3.8+ - Programming language
- Werkzeug: WSGI utility library
- Jinja2: Template engine
- Elasticsearch: 8.x - Distributed search and analytics engine
- Elasticsearch-py: Official Python client for Elasticsearch
- elasticsearch-dsl: High-level library for Elasticsearch
- Flask-CORS: Cross-Origin Resource Sharing
- Flask-RESTful: REST API building tools
- python-dotenv: Environment variable management
- requests: HTTP library for API calls
- gunicorn: Production WSGI server
- pytest: Testing framework
- JSON: Data interchange format
- CSV: Data import/export (optional)
┌─────────────────────────────────────────────────────────┐
│                   Client Application                    │
│             (Web Browser, Mobile App, etc.)             │
└────────────────────────────┬────────────────────────────┘
                             │ HTTP Requests
                             ▼
┌─────────────────────────────────────────────────────────┐
│                     Flask REST API                      │
│  ┌───────────────────────────────────────────────────┐  │
│  │                   API Endpoints                   │  │
│  │       /search   /index   /document   /bulk        │  │
│  └───────────────────────────────────────────────────┘  │
│                            │                            │
│  ┌───────────────────────────────────────────────────┐  │
│  │               Business Logic Layer                │  │
│  │      Query Builder | Validator | Serializer       │  │
│  └───────────────────────────────────────────────────┘  │
└────────────────────────────┬────────────────────────────┘
                             │ Elasticsearch Client
                             ▼
┌─────────────────────────────────────────────────────────┐
│                  Elasticsearch Cluster                  │
│   ┌──────────┐      ┌──────────┐      ┌──────────┐      │
│   │ Index 1  │      │ Index 2  │      │ Index N  │      │
│   │  Shards  │      │  Shards  │      │  Shards  │      │
│   └──────────┘      └──────────┘      └──────────┘      │
└─────────────────────────────────────────────────────────┘
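The business logic layer in the diagram (validator, query builder, serializer) can be pictured as three small functions sitting between the route handler and the Elasticsearch client. This is only a sketch under assumptions: the function names, the `MAX_SIZE` limit, and the choice of a `multi_match` query over `title` and `text` are illustrative and not taken from the project's code. The output shape follows the search response documented in the API section of this README.

```python
MAX_SIZE = 100  # hypothetical upper bound on page size, not a project setting


def validate_search_params(args):
    """Validator: check the q, size, and from query-string parameters."""
    q = (args.get("q") or "").strip()
    if not q:
        raise ValueError("q is required")
    try:
        size = int(args.get("size", 10))
        offset = int(args.get("from", 0))
    except (TypeError, ValueError):
        raise ValueError("size and from must be integers")
    if not 1 <= size <= MAX_SIZE or offset < 0:
        raise ValueError("size or from is out of range")
    return q, size, offset


def build_search_body(q, size, offset):
    """Query builder: turn validated parameters into an Elasticsearch request body."""
    return {
        "query": {"multi_match": {"query": q, "fields": ["title", "text"]}},
        "from": offset,
        "size": size,
    }


def serialize_response(es_response):
    """Serializer: reduce the raw Elasticsearch response to the documented API shape."""
    hits = es_response["hits"]
    return {
        "hits": [{"score": h["_score"], "source": h["_source"]} for h in hits["hits"]],
        "total": hits["total"]["value"],
        "took": es_response["took"],
    }
```

Keeping these steps as pure functions means they can be unit tested without a running Elasticsearch cluster.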
- Python: 3.8 or higher
- pip: Python package manager
- Elasticsearch: 8.x or 7.x
- Virtual Environment: venv or virtualenv
- Git: Version control
git clone https://github.com/hamzakhan0712/Elasticsearch_Flask.git
cd Elasticsearch_Flask
# Windows
python -m venv API/venv
API\venv\Scripts\activate
# Linux/Mac
python3 -m venv API/venv
source API/venv/bin/activate
cd API
pip install -r requirements.txt
If requirements.txt is not available, install these packages:
pip install flask
pip install elasticsearch
pip install flask-cors
pip install python-dotenv
pip install gunicorn
Using Docker (Recommended):
docker pull docker.elastic.co/elasticsearch/elasticsearch:8.11.0
docker run -d --name elasticsearch \
  -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.11.0
Manual Installation:
- Download from elastic.co
- Extract and run bin/elasticsearch (Linux/Mac) or bin\elasticsearch.bat (Windows)
curl http://localhost:9200
You should see JSON output with cluster information.
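The same check can be done from Python. The sketch below uses only the standard library and assumes Elasticsearch is reachable over plain HTTP with security disabled, as in the Docker command above. `summarize_cluster_info` and `fetch_cluster_info` are hypothetical helper names; `cluster_name`, `version.number`, and `tagline` are fields of the JSON that the Elasticsearch root endpoint returns.

```python
import json
from urllib.request import urlopen


def summarize_cluster_info(raw_json):
    """Pick the most useful fields out of the root endpoint's JSON response."""
    info = json.loads(raw_json)
    return {
        "cluster_name": info.get("cluster_name"),
        "version": info.get("version", {}).get("number"),
        "tagline": info.get("tagline"),
    }


def fetch_cluster_info(url="http://localhost:9200", timeout=5):
    """Request the root endpoint and summarize it; raises URLError if unreachable."""
    with urlopen(url, timeout=timeout) as resp:
        return summarize_cluster_info(resp.read().decode("utf-8"))
```

A version that does not start with 7 or 8 would suggest a server outside the range listed in the prerequisites.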
Create a .env file in the API directory:
FLASK_APP=app.py
FLASK_ENV=development
FLASK_DEBUG=True
ELASTICSEARCH_HOST=http://localhost:9200
INDEX_NAME=shakespeareplay
PORT=5000
Or edit API/config.py:
INDEX_NAME = 'shakespeareplay'
ESKNN_HOST = 'http://localhost:9200'
Index the sample data:
python index_data.py
Start the application:
python app.py
The API will be available at: http://localhost:5000
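One way to consume the .env keys listed above is to read them from the process environment with defaults that match this README. This is a minimal sketch, not the project's actual config.py: `load_config` is a hypothetical helper, and it assumes the variables are already exported (for example by calling `load_dotenv()` from python-dotenv at startup). Note that config.py names the host setting `ESKNN_HOST`, while the .env file uses `ELASTICSEARCH_HOST`.

```python
import os


def load_config(env=None):
    """Build a settings dict from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    return {
        "ELASTICSEARCH_HOST": env.get("ELASTICSEARCH_HOST", "http://localhost:9200"),
        "INDEX_NAME": env.get("INDEX_NAME", "shakespeareplay"),
        # Environment values are always strings, so convert explicitly.
        "PORT": int(env.get("PORT", "5000")),
        "DEBUG": env.get("FLASK_DEBUG", "False").strip().lower() in ("1", "true", "yes"),
    }
```

Passing a plain dict as `env` makes the function easy to test without touching the real environment.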
# API/config.py
INDEX_NAME = 'shakespeareplay'
ESKNN_HOST = 'http://localhost:9200'
# Additional configurations
DEBUG = True
HOST = '0.0.0.0'
PORT = 5000
# Connection settings
from elasticsearch import Elasticsearch
es = Elasticsearch(
    hosts=[ESKNN_HOST],
    timeout=30,  # elasticsearch-py 8.x prefers the name request_timeout
    max_retries=3,
    retry_on_timeout=True
)
# Index settings
INDEX_SETTINGS = {
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0,
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "stop"]
}
}
}
},
"mappings": {
"properties": {
"title": {"type": "text"},
"author": {"type": "keyword"},
"text": {"type": "text", "analyzer": "custom_analyzer"},
"line_number": {"type": "integer"}
}
}
}
Base URL: http://localhost:5000/api
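The custom_analyzer in the index settings above combines the standard tokenizer with the lowercase and stop filters. To get a feel for what that does to text, here is a rough pure-Python approximation. It is an assumption-laden sketch: the regular expression only imitates the standard tokenizer for simple English text, `approximate_custom_analyzer` is a hypothetical name, and the word list is the default English stop word set used by Elasticsearch's stop filter.

```python
import re

# Default English stop words of the Elasticsearch `stop` token filter.
ENGLISH_STOP_WORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or such "
    "that the their then there these they this to was will with".split()
)

# Rough stand-in for the standard tokenizer: runs of letters/digits,
# keeping apostrophes that sit inside a word (e.g. "serpent's").
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")


def approximate_custom_analyzer(text):
    """Tokenize, lowercase, and drop stop words, in that order."""
    tokens = (t.lower() for t in TOKEN_PATTERN.findall(text))
    return [t for t in tokens if t not in ENGLISH_STOP_WORDS]


print(approximate_custom_analyzer("Out, damned spot! Out, I say!"))
# ['out', 'damned', 'spot', 'out', 'i', 'say']
print(approximate_custom_analyzer("To be, or not to be"))
# []
```

The second result matters in practice: every word of "to be or not to be" is a stop word, so a match query against the `text` field with that phrase is left with no terms and, by default, matches nothing. The real behavior can be checked with the Elasticsearch `_analyze` API on the index.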
GET /api/search?q={query}&size={number}&from={offset}
Parameters:
- q: Search query (required)
- size: Number of results (default: 10)
- from: Offset for pagination (default: 0)
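Because `from` is an offset rather than a page number, a client that thinks in pages has to convert. A small sketch of that conversion, using the base URL from this README; `search_url` and its `page`/`per_page` parameters are hypothetical names introduced for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:5000/api"


def search_url(query, page=1, per_page=10):
    """Build a /search URL for a 1-based page number."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    params = {"q": query, "size": per_page, "from": (page - 1) * per_page}
    return f"{BASE_URL}/search?{urlencode(params)}"


print(search_url("hamlet", page=3))
# http://localhost:5000/api/search?q=hamlet&size=10&from=20
```

Elasticsearch itself rejects requests where from + size exceeds 10,000 unless `index.max_result_window` is raised, so very deep pages need a different approach.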
Response: 200 OK
{
"hits": [
{
"score": 5.23,
"source": {
"title": "Hamlet",
"author": "William Shakespeare",
"text": "To be, or not to be...",
"line_number": 56
}
}
],
"total": 150,
"took": 12
}
POST /api/document
Body:
{
"title": "Romeo and Juliet",
"author": "William Shakespeare",
"text": "O Romeo, Romeo, wherefore art thou Romeo?",
"line_number": 33
}
Response: 201 Created
{
"message": "Document indexed successfully",
"id": "abc123",
"index": "shakespeareplay"
}
GET /api/document/{id}
Response: 200 OK
{
"id": "abc123",
"source": {
"title": "Romeo and Juliet",
"author": "William Shakespeare",
"text": "O Romeo, Romeo, wherefore art thou Romeo?",
"line_number": 33
}
}
PUT /api/document/{id}
Body:
{
"text": "Updated text content"
}
Response: 200 OK
{
"message": "Document updated successfully"
}
DELETE /api/document/{id}
Response: 200 OK
{
"message": "Document deleted successfully"
}
POST /api/bulk
Body:
[
{
"title": "Play 1",
"author": "Shakespeare",
"text": "Content 1"
},
{
"title": "Play 2",
"author": "Shakespeare",
"text": "Content 2"
}
]
Response: 201 Created
{
"message": "Bulk indexing completed",
"indexed": 2,
"failed": 0
}
import requests
BASE_URL = "http://localhost:5000/api"
# Search for documents
response = requests.get(f"{BASE_URL}/search", params={
"q": "to be or not to be",
"size": 5
})
results = response.json()
# Index a new document
document = {
"title": "Macbeth",
"author": "William Shakespeare",
"text": "Out, damned spot! Out, I say!",
"line_number": 1
}
response = requests.post(f"{BASE_URL}/document", json=document)
# Get document by ID
doc_id = "abc123"
response = requests.get(f"{BASE_URL}/document/{doc_id}")
document = response.json()
# Search
curl "http://localhost:5000/api/search?q=hamlet&size=10"
# Index document
curl -X POST http://localhost:5000/api/document \
-H "Content-Type: application/json" \
-d '{
"title": "Othello",
"author": "William Shakespeare",
"text": "O, beware, my lord, of jealousy",
"line_number": 165
}'
# Get document
curl http://localhost:5000/api/document/abc123
# Delete document
curl -X DELETE http://localhost:5000/api/document/abc123
// Search
const searchResults = await fetch(
'http://localhost:5000/api/search?q=love&size=10'
).then(res => res.json());
// Index document
const response = await fetch('http://localhost:5000/api/document', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
title: 'King Lear',
author: 'William Shakespeare',
text: 'How sharper than a serpent\'s tooth',
line_number: 288
})
});
Elasticsearch_Flask/
│
├── API/                                  # Main application directory
│   ├── venv/                             # Virtual environment
│   ├── config.py                         # Configuration settings
│   ├── app.py                            # Flask application entry point
│   ├── routes.py                         # API route definitions
│   ├── elasticsearch_client.py           # Elasticsearch connection
│   ├── models.py                         # Data models
│   ├── utils.py                          # Utility functions
│   └── requirements.txt                  # Python dependencies
│
├── datset.json                           # Shakespeare plays dataset
├── index_data.py                         # Data indexing script
├── # Flask API Setup Instructions.txt    # Setup documentation
├── .gitignore                            # Git ignore rules
├── LICENSE                               # MIT License
└── README.md                             # This file
from elasticsearch import Elasticsearch
es = Elasticsearch(['http://localhost:9200'])
# Create index with mappings
es.indices.create(
index='shakespeareplay',
body={
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"properties": {
"title": {"type": "text"},
"author": {"type": "keyword"},
"text": {"type": "text"},
"line_number": {"type": "integer"},
"act": {"type": "integer"},
"scene": {"type": "integer"},
"speaker": {"type": "keyword"}
}
}
}
)
# Refresh index
es.indices.refresh(index='shakespeareplay')
# Force merge segments
es.indices.forcemerge(index='shakespeareplay')
# Update settings
es.indices.put_settings(
index='shakespeareplay',
body={
"index": {
"refresh_interval": "1s"
}
}
)
import json
from elasticsearch import Elasticsearch
es = Elasticsearch(['http://localhost:9200'])
# Load dataset
with open('datset.json', 'r') as f:
    data = json.load(f)
# Index documents one at a time (simple, but slow for large datasets)
for item in data:
    es.index(
        index='shakespeareplay',
        body=item
    )
print("Data indexed successfully!")
# Faster alternative using the bulk helper; reuses `json` and `es` from above
from elasticsearch.helpers import bulk
def generate_docs():
    """Yield one bulk action per record in the dataset."""
    with open('datset.json', 'r') as f:
        data = json.load(f)
    for item in data:
        yield {
            "_index": "shakespeareplay",
            "_source": item
        }
# Bulk index; stats_only=True returns counts, raise_on_error=False reports failures instead of raising
success, failed = bulk(es, generate_docs(), stats_only=True, raise_on_error=False)
print(f"Indexed: {success}, Failed: {failed}")
- Use Bulk API: Index multiple documents at once
- Disable Refresh: Set refresh_interval=-1 during bulk indexing
- Increase Bulk Size: Use larger batch sizes (500-1000)
- Connection Pooling: Reuse connections
- Query Caching: Enable query result caching
- Use Filters: Prefer filters over queries for exact matches
- Limit Fields: Only retrieve necessary fields
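Several of the tips above can be expressed as small helpers and request bodies. The sketch below shows batching for bulk indexing, the settings bodies for pausing and restoring refresh, and a search body that uses a filter clause with a restricted `_source`. All names here are hypothetical; the `author` keyword field and the `title` and `text` fields come from the mapping in this README.

```python
from itertools import islice

# Settings bodies for pausing refresh during a bulk load and restoring it afterwards.
DISABLE_REFRESH = {"index": {"refresh_interval": "-1"}}
RESTORE_REFRESH = {"index": {"refresh_interval": "1s"}}


def batched(items, size=500):
    """Yield lists of at most `size` items so each bulk request stays bounded."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def author_filter_body(author, fields=("title", "text")):
    """Exact match in filter context (no scoring), returning only `fields`."""
    return {
        "query": {"bool": {"filter": [{"term": {"author": author}}]}},
        "_source": list(fields),
    }
```

Filter clauses skip relevance scoring and can be cached by Elasticsearch, which is why they suit exact matches on keyword fields such as `author`.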
- Search Latency: < 50ms for simple queries
- Indexing Speed: 1000+ documents/second
- Concurrent Users: 100+ simultaneous connections
- Storage: ~1KB per Shakespeare line
- Memory: 512MB minimum for Elasticsearch
Issue: Cannot connect to Elasticsearch
Solution:
1. Check if Elasticsearch is running: curl http://localhost:9200
2. Verify ESKNN_HOST in config.py
3. Check firewall settings
4. Ensure Elasticsearch is not using HTTPS (or update the URL to use https://)
Issue: Index not found
Solution:
1. Create the index: python index_data.py
2. Verify index exists: curl http://localhost:9200/_cat/indices
3. Check INDEX_NAME in config.py
Issue: Search returns no results
Solution:
1. Verify data is indexed: curl http://localhost:9200/shakespeareplay/_count
2. Check query syntax
3. Try a simpler search term
4. Refresh index: curl -X POST http://localhost:9200/shakespeareplay/_refresh
Issue: Slow search performance
Solution:
1. Reduce result size
2. Add pagination
3. Optimize Elasticsearch settings
4. Increase Elasticsearch memory
5. Use query profiling: set "profile": true in the Elasticsearch search request body
Contributions are welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch
git checkout -b feature/YourFeature
- Commit your changes
git commit -m "Add: Your feature description"
- Push to your fork
git push origin feature/YourFeature
- Open a Pull Request
- Follow PEP 8 for Python code
- Write docstrings for all functions
- Add unit tests for new features
- Update documentation
This project is licensed under the MIT License - see the LICENSE file for details.
Developer: Hamza Khan
- GitHub: @hamzakhan0712
- Repository: Elasticsearch_Flask
- Elasticsearch for the powerful search engine
- Flask for the lightweight web framework
- William Shakespeare for the timeless content
- Open Source Community for inspiration and support
- Authentication and authorization (JWT)
- Rate limiting and throttling
- Advanced analytics and aggregations
- Multi-language support
- Autocomplete/suggestions
- Highlighting of search terms
- Export search results (CSV, JSON)
- Docker containerization
- GraphQL API support
- WebSocket for real-time updates
- Admin dashboard
- API documentation with Swagger/OpenAPI
Built with ❤️ for Full-Text Search Excellence
⭐ Star this repository if you find it helpful!