In this example, we will build an index flow that embeds text from local markdown files, and then query the index.
We appreciate a star ⭐ at CocoIndex GitHub if this is helpful.
🌱 A detailed step-by-step tutorial can be found here: Get Started Documentation
- We will ingest a list of local files.
- For each file, perform chunking (recursively split) and then embedding.
- We will save the embeddings and the metadata in Postgres with PGVector.
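To make the chunking step concrete, here is a minimal pure-Python sketch of recursive splitting. It is an illustration of the general idea only, not CocoIndex's own implementation: the function name `split_recursively`, the `SEPARATORS` list, and the character-based `chunk_size` are all assumptions made for this example.

```python
from typing import List, Optional

# Hypothetical separators, tried from coarsest (paragraphs) to finest (words).
SEPARATORS = ["\n\n", "\n", " "]


def split_recursively(
    text: str, chunk_size: int, separators: Optional[List[str]] = None
) -> List[str]:
    """Split text into chunks of at most chunk_size characters."""
    if separators is None:
        separators = SEPARATORS
    if len(text) <= chunk_size:
        # Small enough already; drop whitespace-only fragments.
        return [text] if text.strip() else []
    if not separators:
        # No separator left to try: fall back to fixed-width slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # Keep merging neighbouring pieces while they still fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too large: retry with a finer separator.
            chunks.extend(split_recursively(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


sample = (
    "# Title\n\nFirst paragraph here.\n\n"
    "Second paragraph is a bit longer than the first."
)
sample_chunks = split_recursively(sample, chunk_size=40)
```

Each resulting chunk would then be passed to the embedding step, and the chunk text, its source filename, and its embedding stored together as one row.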
At query time, we will match user-provided text against the index with a SQL query, reusing the embedding operation from the indexing flow so that queries and documents are embedded the same way.
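As a rough sketch of what such a query could look like, the helpers below build a pgvector similarity search. The table name `doc_embeddings` and the column names `filename`, `text`, and `embedding` are hypothetical and may not match what the flow actually creates. The `<=>` operator is pgvector's cosine distance, and the `%s` placeholders follow the psycopg parameter style.

```python
from typing import List


def to_pgvector_literal(embedding: List[float]) -> str:
    """Format a list of floats as a pgvector text literal such as '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_search_query(table_name: str, top_k: int = 5) -> str:
    """Build a parameterized top-k cosine similarity query.

    table_name is interpolated directly, so it must come from trusted code,
    never from user input. Column names here are assumptions.
    """
    return (
        "SELECT filename, text, 1.0 - (embedding <=> %s::vector) AS score "
        f"FROM {table_name} "
        "ORDER BY embedding <=> %s::vector "
        f"LIMIT {int(top_k)}"
    )


# The query embedding should come from the same embedding operation that
# the indexing flow uses; a tiny made-up vector stands in for it here.
query_vector = to_pgvector_literal([0.1, 0.2, 0.3])
query_sql = build_search_query("doc_embeddings", top_k=5)
```

With a database driver, the vector literal would be passed twice as parameters, once for the score expression and once for the ordering, for example `cursor.execute(query_sql, (query_vector, query_vector))`.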
Install Postgres if you don't have one.
Install dependencies:
pip install -e .
Setup:
cocoindex setup main
Update index:
cocoindex update main
Run:
python main.py
I used CocoInsight (free, currently in beta) to troubleshoot the index generation and understand the data lineage of the pipeline. It connects to your local CocoIndex server with zero pipeline data retention. Run the following command to start CocoInsight:
cocoindex server -ci main
Then open the CocoInsight UI at https://cocoindex.io/cocoinsight.