This repository contains a FastAPI application that provides endpoints for data clustering and similarity search operations.
- Clustering: Performs DBSCAN clustering on CSV datasets.
- Similarity Search: Embeds the text in a CSV dataset as vectors and returns the rows most similar to a given query.
- Clone the repository and enter it: git clone https://github.com/yourusername/curotec_task.git, then cd curotec_task
- Install the required dependencies: pip install -r requirements.txt
- Run the FastAPI application: python src/main.py
The server will start at http://127.0.0.1:8000/.
- GET / : Welcome message and basic API information.
- POST /clustering : Perform DBSCAN clustering on a CSV file.
  - Parameters:
    - file: CSV file (required)
    - params: JSON string of clustering parameters (optional)
- POST /similarity : Perform similarity search on text data in a CSV file.
  - Parameters:
    - file: CSV file (required)
    - params: JSON string of similarity search parameters (optional)
- /docs : Interactive FastAPI documentation, always available at /docs (e.g., http://127.0.0.1:8000/docs). It provides a friendly UI for interacting with the API.
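As a rough illustration of calling the POST endpoints from Python, the sketch below prepares a request and uploads a CSV. It rests on assumptions not confirmed by this README: that params is sent as a form field next to the uploaded file, and that the response body is JSON. The helper names build_request and send are hypothetical, and send needs the third-party requests package.

```python
import json

BASE_URL = "http://127.0.0.1:8000"  # default address given in this README


def build_request(endpoint, csv_path, params=None):
    """Return (url, form_fields, csv_path) for a POST to the given endpoint.

    Assumption: `params` travels as a form field holding a JSON string,
    alongside the uploaded `file`.
    """
    url = f"{BASE_URL}/{endpoint.lstrip('/')}"
    form_fields = {}
    if params is not None:
        form_fields["params"] = json.dumps(params)
    return url, form_fields, csv_path


def send(url, form_fields, csv_path):
    """Upload the CSV and return the decoded response (assumed to be JSON)."""
    import requests  # imported lazily so build_request works without it

    with open(csv_path, "rb") as f:
        response = requests.post(url, files={"file": f}, data=form_fields)
    response.raise_for_status()
    return response.json()
```

With the server running, a call might look like send(*build_request("/clustering", "data.csv", {"do_mfa": False})), where data.csv is a placeholder file name.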
Clustering parameters (keys of the params JSON for POST /clustering):
- eps_range: List of float values for the DBSCAN epsilon parameter
- min_samples_range: List of integer values for the DBSCAN min_samples parameter
- label_column_index: Index of the label column (if one exists)
- max_grid_search_combinations: Maximum number of parameter combinations for the grid search
- n_components_global: Number of global components for MFA
- do_mfa: Boolean flag to enable MFA
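A clustering params payload could be assembled as in the sketch below. All values are illustrative only, and label_column_index set to 0 assumes a hypothetical dataset whose first column holds labels. The grid count simply shows how many (eps, min_samples) pairs the two ranges imply; how the service applies max_grid_search_combinations when the grid is larger is not specified here.

```python
import json
from itertools import product

# Illustrative values only; none of these are defaults documented by the API.
clustering_params = {
    "eps_range": [0.3, 0.5, 0.7],
    "min_samples_range": [3, 5, 10],
    "label_column_index": 0,  # hypothetical: labels in the first column
    "max_grid_search_combinations": 6,
    "n_components_global": 2,
    "do_mfa": False,
}

# Every (eps, min_samples) pair implied by the two ranges.
grid = list(product(clustering_params["eps_range"],
                    clustering_params["min_samples_range"]))
limit = clustering_params["max_grid_search_combinations"]
print(f"{len(grid)} combinations requested, limit is {limit}")

# JSON string to pass as the `params` value of POST /clustering.
params_json = json.dumps(clustering_params)
```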
Similarity search parameters (keys of the params JSON for POST /similarity):
- text_column: Name of the column containing the text to embed
- query_text: Optional text to compare against
- top_k: Number of most similar items to return
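To make the role of top_k concrete, here is a toy sketch of a similarity ranking. It assumes cosine similarity as the measure, which this README does not confirm, and uses hand-written two-dimensional vectors in place of real embeddings (the service presumably produces those with sentence-transformers). The column name, query text, and helper names are hypothetical.

```python
import json
import math

# Illustrative payload; "description" is a hypothetical column name.
similarity_params = {
    "text_column": "description",
    "query_text": "wireless headphones",
    "top_k": 2,
}
params_json = json.dumps(similarity_params)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(query_vec, row_vecs, k):
    """Return the k (row_index, score) pairs with the highest similarity."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(row_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for the embeddings of three CSV rows and a query.
rows = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
query = [1.0, 0.0]
best = top_k_similar(query, rows, similarity_params["top_k"])
```

Here best holds the two rows closest to the query, most similar first.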
Main dependencies:
- FastAPI
- pandas
- numpy
- scikit-learn
- sentence-transformers
- uvicorn
- pydantic
For a complete list of dependencies, see requirements.txt.