
Build text embedding and semantic search 🔍


In this example, we will build an indexing flow that computes text embeddings from local markdown files, and then query the resulting index.

We appreciate a star ⭐ at CocoIndex GitHub if this is helpful.

Steps

🌱 A detailed step-by-step tutorial can be found here: Get Started Documentation

Indexing Flow

  1. We will ingest a list of local files.
  2. For each file, split the content into chunks (recursive splitting), then compute an embedding for each chunk.
  3. We will save the embeddings and the metadata in Postgres with pgvector.
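
To make the chunking step more concrete, here is a minimal, library-free sketch of recursive splitting. It is not CocoIndex's implementation: the function name `split_recursively`, the separator order, and the `chunk_size` parameter are assumptions chosen for illustration only.

```python
def split_recursively(text, chunk_size, separators=("\n\n", "\n", " ")):
    """Split text into chunks of at most chunk_size characters.

    Coarse separators (paragraph breaks) are tried first; a piece that is
    still too long is split again with the next, finer separator.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    for i, sep in enumerate(separators):
        if sep not in text:
            continue
        finer = separators[i + 1:]
        chunks, current = [], ""
        for part in text.split(sep):
            candidate = current + sep + part if current else part
            if len(candidate) <= chunk_size:
                current = candidate  # keep packing parts into this chunk
            else:
                if current:
                    chunks.extend(split_recursively(current, chunk_size, finer))
                current = part
        if current:
            chunks.extend(split_recursively(current, chunk_size, finer))
        return chunks
    # No separator left: fall back to fixed-width slices.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


# Hypothetical sample document, standing in for one markdown file.
sample = (
    "# Title\n\nFirst paragraph here.\n\n"
    "Second paragraph is a bit longer than the first."
)
chunks = split_recursively(sample, chunk_size=40)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

Trying paragraph breaks before line breaks and spaces keeps related text together where possible, which is the usual motivation for splitting recursively rather than at fixed offsets.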

Query

At query time, we embed the user-provided text with the same embedding operation used in the indexing flow, then match it against the stored embeddings with a SQL query.
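
The idea of reusing one embedding function for both indexing and querying can be sketched in plain Python. Everything here is an assumption for illustration: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, the in-memory `index` list stands in for the Postgres table, and `search` stands in for the SQL query. With pgvector, a ranking like this would typically be expressed with a distance operator such as `<=>` (cosine distance) in the `ORDER BY` clause.

```python
import hashlib
import math


def toy_embed(text, dim=64):
    """Hypothetical stand-in for an embedding model: hashed bag of words."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, index, top_k=3):
    """Embed the query with the same function used for indexing, then rank."""
    query_vec = toy_embed(query)
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Indexing side: embed each chunk once and keep (text, vector) pairs.
docs = [
    "postgres stores vectors",
    "markdown files are chunked",
    "embeddings capture meaning",
]
index = [(text, toy_embed(text)) for text in docs]

# Query side: the same toy_embed is applied to the user's text.
results = search("Postgres stores vectors", index, top_k=1)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Using the same embedding function on both sides matters because vectors produced by different models or settings are not comparable, so similarity scores between them would be meaningless.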

Prerequisite

Install Postgres if you don't already have it.

Run

Install dependencies:

pip install -e .

Setup:

cocoindex setup main

Update index:

cocoindex update main

Run:

python main.py

CocoInsight

I used CocoInsight (free beta now) to troubleshoot the index generation and understand the data lineage of the pipeline. It just connects to your local CocoIndex server, with zero pipeline data retention. Run the following command to start CocoInsight:

cocoindex server -ci main

Then open the CocoInsight UI at https://cocoindex.io/cocoinsight.