In this example, we build a visual document indexing flow that uses ColPali to embed PDFs and images, and query the index with natural language.
We appreciate a star ⭐ on the CocoIndex GitHub repo if this is helpful.
- We ingest a list of PDF files and image files from the `source_files` directory.
- For each file:
  - PDF files: convert each page to a high-resolution image (300 DPI)
  - Image files: use the image directly
  - Generate visual embeddings for each page/image using the ColPali model
- We save the embeddings and metadata in the Qdrant vector database.
- We match user-provided natural language queries against the index using ColPali's text-to-visual embedding capability, enabling semantic search across visual document content.
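The "for each file" dispatch above hinges on telling PDFs apart from images. A minimal sketch of that routing decision using the standard library (the `classify` helper is illustrative only, not part of the CocoIndex API):

```python
import mimetypes

def classify(filename: str) -> str:
    """Route a source file the way the flow above does (illustrative helper)."""
    mime, _ = mimetypes.guess_type(filename)
    if mime == "application/pdf":
        return "pdf"      # convert each page to a 300 DPI image first
    if mime and mime.startswith("image/"):
        return "image"    # embed the image directly
    return "skip"         # not a supported document type

print(classify("paper.pdf"))   # pdf
print(classify("scan.png"))    # image
print(classify("notes.txt"))   # skip
```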
Install Qdrant if you don't have one running locally.
You can start Qdrant with Docker:
```sh
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
```

Install dependencies:
```sh
pip install -e .
```

NOTE: pdf2image requires poppler to be installed manually. Please refer to the pdf2image documentation for the specific installation instructions for your platform.
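As an aside: ColPali produces many 128-dimensional vectors per page, which maps onto Qdrant's multivector collections with MaxSim comparison. If you want to create or inspect the collection yourself, a sketch with `qdrant-client` might look like the following (the collection name is illustrative; the flow's setup step can manage the collection for you):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")
client.create_collection(
    collection_name="colpali_pages",  # hypothetical name
    vectors_config=models.VectorParams(
        size=128,  # ColPali emits 128-dim vectors per token/patch
        distance=models.Distance.COSINE,
        multivector_config=models.MultiVectorConfig(
            comparator=models.MultiVectorComparator.MAX_SIM
        ),
    ),
)
```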
Setup:

```sh
cocoindex setup main
```

Update index:

```sh
cocoindex update main
```

Run:

```sh
python main.py
```

The example data files used in this demonstration come from the following sources:
- ArXiv Papers: Research papers sourced from ArXiv, an open-access repository of electronic preprints covering various scientific disciplines.
- Healthcare Industry Dataset: Images from the vidore/syntheticDocQA_healthcare_industry_test dataset on Hugging Face, which contains synthetic document question-answering data for healthcare industry documents.
- ESG Reports Dataset: Images from the vidore/esg_reports_eng_v2 dataset on Hugging Face, containing Environmental, Social, and Governance (ESG) reports.
We thank the creators and maintainers of these datasets for making their data available for research and development purposes.
This example uses ColPali, a state-of-the-art vision-language model that enables:
- Direct visual understanding of document layouts, tables, and figures
- Natural language queries against visual document content
- No need for OCR or text extraction - works directly with document images
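Under the hood, these capabilities come from late-interaction (MaxSim) scoring: every query token is matched against its best page patch, and the per-token maxima are summed. A toy numpy sketch of the scoring rule (2-dim vectors for readability; real ColPali embeddings are 128-dim per token/patch):

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
    """Late-interaction score: best-matching page patch per query token, summed.
    query_emb: (num_query_tokens, dim); page_emb: (num_patches, dim)."""
    sims = query_emb @ page_emb.T           # (tokens, patches) similarity matrix
    return float(sims.max(axis=1).sum())    # max over patches, sum over tokens

query  = np.array([[1.0, 0.0], [0.0, 1.0]])               # two query tokens
page_a = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])   # page matching both tokens
page_b = np.array([[-1.0, 0.0], [0.0, -1.0]])             # unrelated page

print(maxsim_score(query, page_a))  # 2.0
print(maxsim_score(query, page_b))  # 0.0
```

Because each query token is scored independently, a page can rank highly even when the matching evidence is scattered across different regions (a table cell here, a figure caption there).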
I used CocoInsight (free beta) to troubleshoot the index generation and understand the data lineage of the pipeline. It simply connects to your local CocoIndex server, with zero pipeline data retention. Run the following command to start CocoInsight:
```sh
cocoindex server -ci main
```
Then open the CocoInsight UI at https://cocoindex.io/cocoinsight.