Build text embedding and semantic search 🔍

In this example, we will build an indexing flow that computes text embeddings from local markdown files, and then query the index.

We'd appreciate a star ⭐ for CocoIndex on GitHub if this is helpful.

Steps

🌱 A detailed step-by-step tutorial can be found here: Get Started Documentation

Indexing Flow

We will ingest a list of local files.
For each file, we perform chunking (recursive splitting) and then embedding.
We will save the embeddings and the metadata in Postgres with PGVector.
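To make the indexing steps concrete, here is a plain-Python sketch of what the flow computes, written without CocoIndex. It is a minimal illustration under loud assumptions: `split_recursively`, `embed`, and `build_index` are hypothetical helpers (not CocoIndex APIs), the separator list and `chunk_size` are illustrative, `embed` is a toy hashed bag-of-words stand-in for a real embedding model, and the returned rows only approximate what would be written to Postgres.

```python
import hashlib
import math
import re
from pathlib import Path

DIM = 256  # Toy embedding size (assumption, not the real model's dimension).

# Separators tried in order, from coarse (paragraphs) to fine (single characters).
SEPARATORS = ["\n\n", "\n", " ", ""]


def split_recursively(text: str, chunk_size: int, separators: list[str] = SEPARATORS) -> list[str]:
    """Split text into chunks of at most chunk_size characters, preferring coarse separators."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    sep, rest = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # Keep growing the current chunk.
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: recurse with the next, finer separator.
            chunks.extend(split_recursively(piece, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]


def embed(text: str) -> list[float]:
    """Toy stand-in for a real embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"\w+", text.lower()):
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_index(directory: str, chunk_size: int = 200) -> list[dict]:
    """Chunk and embed every *.md file in directory; return the rows one would store."""
    rows = []
    for path in sorted(Path(directory).glob("*.md")):
        text = path.read_text(encoding="utf-8")
        for location, chunk in enumerate(split_recursively(text, chunk_size)):
            rows.append({
                "filename": path.name,   # Metadata kept alongside the vector.
                "location": location,    # Chunk index within the file (assumption).
                "text": chunk,
                "embedding": embed(chunk),
            })
    return rows
```

The recursive splitter tries paragraph breaks first and only falls back to finer separators when a piece is still too long, which tends to keep related sentences in the same chunk.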

Query

We will match user-provided text against the index with a SQL query, reusing the embedding operation from the indexing flow.
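The query step can be sketched the same way. Assuming the embeddings live in a pgvector column, pgvector's `<=>` operator returns cosine distance, so a query can order rows by `embedding <=> query_vector`. The `SEARCH_SQL` string shows that shape with a hypothetical table name `doc_embeddings` and psycopg-style placeholders (both assumptions); the `search` function mirrors the same ranking in memory so the sketch runs without a database. `embed` is the same toy stand-in used for illustration, not the real model; the important point is that the query must be embedded with the same operation used at indexing time.

```python
import hashlib
import math
import re

DIM = 256  # Toy embedding size (assumption).

# Hypothetical query shape for the Postgres side; "doc_embeddings" is an assumed table name.
# pgvector's <=> operator returns cosine distance, so smaller means more similar.
SEARCH_SQL = """
SELECT filename, text, embedding <=> %s::vector AS distance
FROM doc_embeddings
ORDER BY distance
LIMIT %s
"""


def embed(text: str) -> list[float]:
    """Toy stand-in for the indexing flow's embedding step (hashed bag of words)."""
    vec = [0.0] * DIM
    for token in re.findall(r"\w+", text.lower()):
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, matching the meaning of pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def search(query: str, rows: list[dict], top_k: int = 5) -> list[tuple[float, dict]]:
    """Rank stored rows by cosine distance to the query, like ORDER BY distance LIMIT top_k."""
    query_vec = embed(query)  # Reuse the same embedding used at indexing time.
    scored = [(cosine_distance(query_vec, row["embedding"]), row) for row in rows]
    scored.sort(key=lambda pair: pair[0])
    return scored[:top_k]
```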

Prerequisite

Install Postgres (with the PGVector extension) if you don't have it already.
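Before running the setup step, it can help to confirm that a Postgres server is actually listening. This is a minimal sketch, assuming the server runs locally on Postgres's default port 5432; `postgres_reachable` is a hypothetical helper, and it only checks for a TCP listener, not credentials or whether PGVector is installed.

```python
import socket


def postgres_reachable(host: str = "localhost", port: int = 5432, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections at host:port.

    This proves only that a listener exists; it does not log in or check extensions.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False
```

For example, if `postgres_reachable()` returns `False`, the server is likely not running or is listening on a different host or port.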

Run

Install dependencies:

pip install -e .

Setup:

cocoindex setup main

Update index:

cocoindex update main

Run:

python main.py
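Since the query step matches user-provided text, a script like this typically runs an interactive loop. The sketch below shows one such loop; `run_query_loop`, its prompt wording, and the `search` callback are hypothetical and not taken from `main.py`. Input and output are injected as functions so the loop can be exercised without a terminal.

```python
from typing import Callable, Iterable


def run_query_loop(
    search: Callable[[str], Iterable[tuple[float, str]]],
    read_line: Callable[[str], str] = input,
    write: Callable[[str], None] = print,
) -> int:
    """Prompt for queries until an empty line or EOF; print (score, text) results for each.

    Returns the number of queries handled.
    """
    handled = 0
    while True:
        try:
            query = read_line("Enter search query (or Enter to quit): ").strip()
        except EOFError:
            break
        if not query:
            break
        handled += 1
        for score, text in search(query):
            write(f"[{score:.3f}] {text}")
        write("---")  # Separator between result sets.
    return handled
```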

CocoInsight

I used CocoInsight (free beta for now) to troubleshoot the index generation and understand the data lineage of the pipeline. It connects to your local CocoIndex server, with zero pipeline data retention. Run the following command to start CocoInsight:

cocoindex server -ci main

Then open the CocoInsight UI at https://cocoindex.io/cocoinsight.
