This application implements an advanced company search and ranking system using a hybrid approach that combines vector similarity search with traditional full-text search. The system uses PostgreSQL with the pgvector extension for integrated storage, so search results reflect both semantic relevance and keyword accuracy, which makes it particularly effective for company discovery and ranking.
The system leverages a sophisticated hybrid search architecture that:
- Uses the Pinecone Inference API to convert company descriptions into vector representations
- Uses PostgreSQL with pgvector for vector storage and search
- Implements PostgreSQL's full-text search capabilities for keyword matching
- Combines both approaches with a weighted scoring system for optimal ranking
- Utilizes Groq's LLaMA models for intelligent search result processing and summarization
- Utilizes Redis for caching frequently accessed data to improve performance
This hybrid approach provides more accurate and contextually relevant results compared to traditional keyword-only search systems.
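The weighted scoring mentioned above can be illustrated with a minimal sketch. The function names, field names, and default weights here are illustrative, not the system's actual configuration; the real scores are computed inside PostgreSQL:

```python
# Minimal sketch of weighted hybrid scoring (illustrative names and weights).
def hybrid_score(vector_similarity: float, text_rank: float,
                 vector_weight: float = 0.7, text_weight: float = 0.3) -> float:
    """Combine a semantic (vector) score and a keyword (full-text) score."""
    return vector_weight * vector_similarity + text_weight * text_rank

def rank_companies(candidates):
    """Sort candidates by their combined score, best first."""
    return sorted(candidates,
                  key=lambda c: hybrid_score(c["vec"], c["txt"]),
                  reverse=True)

companies = [
    {"id": 1, "vec": 0.92, "txt": 0.10},  # strong semantic match only
    {"id": 2, "vec": 0.60, "txt": 0.95},  # strong keyword match
]
print([c["id"] for c in rank_companies(companies)])  # → [2, 1]
```

With these weights, company 2's keyword strength (0.42 + 0.285 = 0.705) narrowly beats company 1's semantic strength (0.644 + 0.03 = 0.674), showing how the weights trade off the two signals.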
- Backend: FastAPI
- Database: PostgreSQL with pgvector extension, Redis for caching
- Vector Embeddings: Pinecone Inference API
- LLM Processing: Groq (tool-calling)
- Web Search: SerpApi (Google search)
- MCDA Re‑ranking: Haskell microservice (TOPSIS)
- Frontend: React + Tailwind CSS
- Containerization: Docker
- ORM: SQLAlchemy
The project includes a lightweight Haskell microservice that performs Multi‑Criteria Decision Analysis (MCDA) re‑ranking using the TOPSIS method. The backend can call this service to re‑order search results when the user selects the MCDA sort option.
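TOPSIS ranks each candidate by its relative closeness to an ideal solution built from the best value of every criterion. The service itself is written in Haskell; the following is an illustrative Python reimplementation that treats all features as benefit criteria (its exact scores may differ from the service's, depending on normalization details):

```python
import math

def topsis(candidates, weights):
    """Rank candidates by TOPSIS closeness to the ideal solution.

    candidates: list of {"id": ..., "features": {name: value}}
    weights:    {name: weight}; all features treated as benefit criteria.
    """
    names = list(weights)
    # Vector-normalize each feature column, then apply the weights.
    cols = {n: [c["features"].get(n, 0.0) for c in candidates] for n in names}
    norms = {n: math.sqrt(sum(v * v for v in cols[n])) or 1.0 for n in names}
    weighted = [
        {n: weights[n] * c["features"].get(n, 0.0) / norms[n] for n in names}
        for c in candidates
    ]
    # Ideal (best) and anti-ideal (worst) points per criterion.
    ideal = {n: max(r[n] for r in weighted) for n in names}
    anti = {n: min(r[n] for r in weighted) for n in names}
    ranked = []
    for c, r in zip(candidates, weighted):
        d_best = math.sqrt(sum((r[n] - ideal[n]) ** 2 for n in names))
        d_worst = math.sqrt(sum((r[n] - anti[n]) ** 2 for n in names))
        score = d_worst / (d_best + d_worst) if d_best + d_worst else 0.0
        ranked.append({"id": c["id"], "score": round(score, 4)})
    return sorted(ranked, key=lambda x: x["score"], reverse=True)
```

On the request example below, this sketch also ranks candidate 12 ahead of candidate 7, since 12 wins on the heavily weighted `relevance` criterion.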
- Accepts a list of candidate companies with simple numeric features (e.g., `relevance`, `text`, `location`, `industry`) and optional weights.
- Returns the same candidates ranked by their MCDA score, plus a short explanation.
- Code: `HaskellMCDA/`
- Default port: `8081`
- Backend env var for service URL: `MCDA_URL` (default: `http://mcda:8081` in Docker)
- Endpoint: `POST /rank`
- Request (JSON):
```json
{
  "candidates": [
    {
      "id": 12,
      "features": {
        "relevance": 0.82,
        "text": 0.6,
        "location": 1,
        "industry": 1
      }
    },
    {
      "id": 7,
      "features": {
        "relevance": 0.76,
        "text": 0.75,
        "location": 0,
        "industry": 1
      }
    }
  ],
  "weights": {
    "relevance": 0.6,
    "text": 0.25,
    "location": 0.1,
    "industry": 0.05
  },
  "method": "topsis"
}
```

- Response (JSON):
```json
{
  "rankedCandidates": [
    { "id": 12, "score": 0.83 },
    { "id": 7, "score": 0.78 }
  ],
  "explanation": "Ranked using TOPSIS with weights: relevance:0.6, text:0.25, location:0.1, industry:0.05"
}
```

- From the repository root (where `docker-compose.yml` lives):

```bash
docker-compose up --build -d
```

- This starts `mcda` (Haskell), `backend` (FastAPI), `frontend` (React), `db` (PostgreSQL/pgvector), and `redis`.
- The backend is configured with `MCDA_URL=http://mcda:8081` and will call MCDA when the frontend requests `sort_by=mcda`.
Prerequisites: GHC + Cabal (install with ghcup), then:

```bash
cd HaskellMCDA
cabal update
cabal build
cabal run
```

- The service will start on `http://localhost:8081`.
With the service running locally:
```bash
curl -s http://localhost:8081/rank \
  -H 'Content-Type: application/json' \
  -d '{
    "candidates": [
      {"id": 1, "features": {"relevance": 0.9, "text": 0.6, "location": 1}},
      {"id": 2, "features": {"relevance": 0.7, "text": 0.8, "location": 0}}
    ],
    "weights": {"relevance": 0.6, "text": 0.3, "location": 0.1},
    "method": "topsis"
  }'
```

- Frontend: the search page has a "Sort" dropdown. Choose `MCDA` to request MCDA re-ranking.
- Backend: `POST /search-company` accepts `sort_by`. When `sort_by=mcda`, it forwards candidates to the Haskell MCDA service and reorders the results. If the MCDA service is unavailable, the backend logs an error and falls back to the original order.
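The forward-and-fall-back behavior can be sketched as follows. This is a hypothetical helper, not the backend's actual code; the HTTP call is injected as a callable (e.g. a thin wrapper around `requests.post(...).json()`) so the sketch stays self-contained:

```python
import logging

logger = logging.getLogger("search")

def rerank_with_mcda(candidates, weights, post):
    """Ask the MCDA service to re-rank candidates; keep the original
    order if the call fails.

    post: callable that performs the HTTP POST to /rank and returns
    the parsed JSON response (injected to keep the network layer
    swappable and testable).
    """
    payload = {"candidates": candidates, "weights": weights, "method": "topsis"}
    try:
        body = post(payload)
        # Map each id to its position in the service's ranking.
        order = {r["id"]: i for i, r in enumerate(body["rankedCandidates"])}
        return sorted(candidates, key=lambda c: order.get(c["id"], len(order)))
    except Exception as exc:
        logger.error("MCDA service unavailable, keeping original order: %s", exc)
        return candidates
```

Unknown ids sort last rather than raising, so a partial MCDA response still yields a usable ordering.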
- Backend (FastAPI): basic pytest unit tests live in `Backend/tests/`. Run: `cd Backend && python -m pytest -q`.
- Frontend (React): Jest + React Testing Library tests live in `Frontend/src/__tests__/` and component folders. Run: `cd Frontend && npm test -- --watchAll=false` (or `npm run test:coverage`).
- Hybrid search combining vector similarity and full-text search
- Real-time company ranking based on search relevance
- Company information management (add/search) and retrieval
- LLM powered tool calling
- Docker-based application deployment
- Pinecone embeddings
- Groq as LLM provider
- Docker and Docker Compose
- Pinecone API key (for embeddings)
- Groq API key (for Groq LLM models)
- Serp API key (web search)
- Create a `.env` file in the Backend directory with the following variables:

```env
PINECONE_API_KEY=your_pinecone_api_key
GROQ_API_KEY=your_groq_api_key
SERP_API_KEY=your_serp_api_key
```
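A startup check that fails fast when a key is missing can save debugging time. This is a hypothetical helper using only the standard library; the real backend may read the `.env` file via python-dotenv or pydantic settings instead:

```python
import os

REQUIRED_KEYS = ("PINECONE_API_KEY", "GROQ_API_KEY", "SERP_API_KEY")

def load_settings():
    """Return the required API keys from the environment, or raise a
    clear error listing everything that is missing."""
    missing = [k for k in REQUIRED_KEYS if not os.getenv(k)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {missing}")
    return {k: os.environ[k] for k in REQUIRED_KEYS}
```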
- Clone the repository:

```bash
git clone <repo_link>
cd Investment-Search
```
- Build and start the containers:

```bash
docker-compose up --build
```
- Access the application:
  - Frontend: http://localhost:3000
  - Backend API: http://localhost:8000
The application provides several options for database setup:
- Reset Database and Load Sample Data:

```yaml
command: >
  bash -c "
  python scripts/flush_redis.py &&
  python scripts/reset_db.py &&
  python scripts/load_data.py &&
  uvicorn main:app --host 0.0.0.0 --port 8000 --reload
  "
```
- Load Sample Data Only:

```yaml
command: >
  bash -c "
  python scripts/load_data.py &&
  uvicorn main:app --host 0.0.0.0 --port 8000 --reload
  "
```
- Start Application Only:

```yaml
command: uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```
The system follows a service‑oriented architecture. Users search via the React frontend; the FastAPI backend orchestrates tool calls (DB search or Web search) through the Chat Service and optionally re‑ranks with the Haskell MCDA service. Results are cached in Redis.
```mermaid
flowchart TD
    U[User] --> FE[React Frontend]
    FE -->|"POST /search-company"| API[FastAPI Backend]
    FE -->|"GET /companies"| API
    API --> Chat[Chat Service]
    API <--> RC[(Redis Cache)]
    Chat -->|"web_search=true"| Serp[SerpApi]
    Chat -->|"db search"| PGS[PostgreSQL Searcher]
    PGS --> PG[(PostgreSQL + pgvector)]
    API -->|"Add Company"| Emb[Embedding Service]
    Emb --> Pine[Pinecone Inference]
    Emb --> PG
    API -->|"sort_by=mcda"| MCDA[Haskell MCDA - TOPSIS]
    MCDA --> API
    Serp -->|"optional name match"| PG
```
Flow highlights:
- Frontend can request database search or web search (`web_search=true`).
- When `sort_by=mcda`, the backend sends candidates and weights to the Haskell MCDA service for re-ranking.
- Responses (summary + ranked companies) are cached in Redis.
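The Redis caching step can be sketched as a small helper. The key scheme and TTL here are illustrative assumptions, not the backend's actual implementation; the cache client is any object with a redis-py-style `get`/`setex` interface:

```python
import hashlib
import json

def cache_key(query: str, sort_by: str = "relevance") -> str:
    """Deterministic cache key for a search request (illustrative scheme)."""
    raw = json.dumps({"q": query.strip().lower(), "sort": sort_by}, sort_keys=True)
    return "search:" + hashlib.sha256(raw.encode()).hexdigest()

def cached_search(query, sort_by, cache, run_search, ttl=300):
    """Return a cached response when available, otherwise run the
    search and store its JSON-serialized result with a TTL.

    cache:      object with get(key) and setex(key, ttl, value),
                e.g. a redis.Redis instance.
    run_search: callable producing the (summary + companies) response.
    """
    key = cache_key(query, sort_by)
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    result = run_search(query, sort_by)
    cache.setex(key, ttl, json.dumps(result))
    return result
```

Normalizing the query (strip + lowercase) before hashing lets trivially different spellings share one cache entry.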
- Frontend Layer: React-based user interface
- API Layer: FastAPI backend with REST endpoints
- Service Layer: Modular services for different functionalities
- Data Layer: PostgreSQL with pgvector for storage and search
- External APIs: Third-party services for embeddings and LLM processing
Adding Companies:
- Company data → PostgreSQL (with embeddings)
Search Process:
- User query → Embedding generation
- Vector search → PostgreSQL
- Results → LLM processing
- Final response → User
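The search flow above can be expressed as a small pipeline. Every callable here is a hypothetical stand-in for the real service client (Pinecone for embeddings, PostgreSQL/pgvector for retrieval, Groq for summarization), injected so the shape of the flow is clear without any external dependencies:

```python
def search_pipeline(query, embed, vector_search, summarize, top_k=10):
    """End-to-end search flow: query -> embedding -> vector search ->
    LLM summary -> response.

    embed(query)                 -> embedding vector   (Pinecone stand-in)
    vector_search(vector, top_k) -> list of companies  (pgvector stand-in)
    summarize(query, companies)  -> summary string     (Groq stand-in)
    """
    vector = embed(query)
    companies = vector_search(vector, top_k)
    summary = summarize(query, companies)
    return {"summary": summary, "companies": companies}
```

Keeping each stage behind a plain callable mirrors the service-layer split described earlier: each external dependency can be mocked independently in tests.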