[Detailed description of the project yet to come]
This project implements a CRUD (Create, Read, Update, Delete) application using ColBERT (Contextualized Late Interaction over BERT) for efficient document retrieval and search. The application includes a data portal interface for easy interaction with the ColBERT system.
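"Late interaction" means that the query and each document are encoded independently into one embedding per token. Relevance is computed only at search time: each query token is matched to its most similar document token (MaxSim), and these maxima are summed. The sketch below shows that scoring rule on small hand-made vectors. It is an illustration only, not this project's code. In ColBERT the embeddings come from a BERT encoder and are normalized, so the dot product acts as a cosine similarity.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_embs: List[Vector], doc_embs: List[Vector]) -> float:
    """ColBERT-style late-interaction score.

    For every query token embedding, find its best-matching document token
    (maximum dot product), then sum those maxima over all query tokens.
    """
    return sum(max(dot(q, d) for d in doc_embs) for q in query_embs)


# Toy 2-dimensional "token embeddings" (assumed values, for illustration only).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.2, 0.8]]  # a close match for each query token
doc_b = [[0.7, 0.7]]              # one token that partially matches both

# doc_a should score higher than doc_b.
print(maxsim_score(query, doc_a))
print(maxsim_score(query, doc_b))
```

Document token embeddings can be computed and indexed ahead of time. As a result, only the query has to be encoded at search time, and this is where much of ColBERT's efficiency comes from.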
- Document indexing and retrieval using ColBERT
- Web interface for searching documents
- CRUD operations for document management
- Training and evaluation of ColBERT models
- Unix/Linux environment (recommended)
- Windows users: consider using WSL or a Conda environment
- Python 3.8+
- CUDA-capable GPU (recommended for optimal performance)
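You can check the prerequisites above with a small standard-library script like the following. It is a hypothetical helper and is not part of the repository. It treats the presence of `nvidia-smi` on the `PATH` as a rough proxy for an NVIDIA CUDA GPU, which is an assumption. Once the dependencies are installed, PyTorch's `torch.cuda.is_available()` is the more reliable test.

```python
import platform
import shutil
import sys
from typing import Dict, Tuple


def check_environment(min_python: Tuple[int, int] = (3, 8)) -> Dict[str, bool]:
    """Report whether the README's prerequisites appear to be met."""
    return {
        # Python 3.8+ is required.
        "python_ok": sys.version_info[:2] >= min_python,
        # A Unix-like OS is recommended; WSL reports itself as Linux.
        "unix_like": platform.system() in ("Linux", "Darwin"),
        # Heuristic only: NVIDIA drivers ship nvidia-smi, but its absence
        # does not prove that no GPU is present.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    results = check_environment()
    for name, ok in results.items():
        print(f"{name:>18}: {'yes' if ok else 'no'}")
    if not results["python_ok"]:
        sys.exit("Python 3.8 or newer is required.")
```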
- Clone the repository:
git clone git@github.com:ibohaji/Colbert-Crud-App-ess.git
cd Colbert-Crud-App-ess
- Create and activate a virtual environment:
# Linux/Unix
python -m venv myenv
source myenv/bin/activate
# Windows (if not using WSL)
# Consider using Conda: https://docs.conda.io/projects/conda/en/latest/user-guide/install/
- Install dependencies:
pip install -r requirements.txt
- Start the data portal:
python -m dataportal.app_colbert
The portal will be available at http://localhost:5000
- Prepare your documents in JSON format:
{
"doc_id": {
"title": "Document Title",
"text": "Document content..."
}
}
- Use the API endpoint:
curl -X POST http://localhost:5000/index \
-H "Content-Type: application/json" \
-d @your_documents.json
This project is built using ColBERTv2, an efficient and effective neural search engine:
- Original Paper: "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction"
- Authors: Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, Christopher Potts, and Matei Zaharia
- Official Repository: stanford-futuredata/ColBERT
- ColBERT-AI Library: colbert-ai
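You can also make the `curl` call to the `/index` endpoint from Python, using only the standard library. The sketch below first checks that each entry follows the `{doc_id: {title, text}}` layout described above. It then POSTs the file's contents as JSON. The helper names are hypothetical. The sketch assumes the endpoint accepts exactly this JSON body, as the `curl` example suggests. The response format of `/index` is not documented here, so the response body is returned as raw text.

```python
import json
import urllib.request
from typing import Dict

# Hypothetical type alias for the README's documents format.
Documents = Dict[str, Dict[str, str]]


def validate_documents(docs: Documents) -> None:
    """Raise ValueError unless docs maps doc_id -> {"title": str, "text": str}."""
    if not isinstance(docs, dict):
        raise ValueError("top-level JSON value must be an object keyed by doc_id")
    for doc_id, doc in docs.items():
        if not isinstance(doc, dict):
            raise ValueError(f"document {doc_id!r} must be an object")
        for field in ("title", "text"):
            if not isinstance(doc.get(field), str):
                raise ValueError(f"document {doc_id!r} needs a string {field!r} field")


def build_index_request(
    docs: Documents, base_url: str = "http://localhost:5000"
) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    validate_documents(docs)
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/index",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_documents(path: str, base_url: str = "http://localhost:5000") -> str:
    """Load a JSON documents file and send it to the portal's /index endpoint."""
    with open(path, encoding="utf-8") as f:
        docs = json.load(f)
    request = build_index_request(docs, base_url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the portal running, `index_documents("your_documents.json")` sends the same request as the `curl` command. Validating the file before sending it reports malformed entries locally rather than as a server error.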