Commit 267d796

Added ColPali / ColQwen2 example [skip ci]
1 parent 368b363 commit 267d796

3 files changed: 57 additions and 0 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -33,6 +33,7 @@ Or check out some examples:
 - [Hybrid search](https://github.com/pgvector/pgvector-python/blob/master/examples/hybrid_search/cross_encoder.py) with SentenceTransformers (cross-encoder)
 - [Sparse search](https://github.com/pgvector/pgvector-python/blob/master/examples/sparse_search/example.py) with Transformers
 - [Late interaction search](https://github.com/pgvector/pgvector-python/blob/master/examples/colbert/exact.py) with ColBERT
+- [Document retrieval](https://github.com/pgvector/pgvector-python/blob/master/examples/colpali/exact.py) with ColPali
 - [Image search](https://github.com/pgvector/pgvector-python/blob/master/examples/image_search/example.py) with PyTorch
 - [Image search](https://github.com/pgvector/pgvector-python/blob/master/examples/imagehash/example.py) with perceptual hashing
 - [Morgan fingerprints](https://github.com/pgvector/pgvector-python/blob/master/examples/rdkit/example.py) with RDKit

examples/colpali/exact.py

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+from colpali_engine.models import ColQwen2, ColQwen2Processor
+from datasets import load_dataset
+from pgvector.psycopg import register_vector
+import psycopg
+import torch
+
+conn = psycopg.connect(dbname='pgvector_example', autocommit=True)
+
+conn.execute('CREATE EXTENSION IF NOT EXISTS vector')
+register_vector(conn)
+
+conn.execute('DROP TABLE IF EXISTS documents')
+conn.execute('CREATE TABLE documents (id bigserial PRIMARY KEY, embeddings vector(128)[])')
+conn.execute("""
+CREATE OR REPLACE FUNCTION max_sim(document vector[], query vector[]) RETURNS double precision AS $$
+    WITH queries AS (
+        SELECT row_number() OVER () AS query_number, * FROM (SELECT unnest(query) AS query)
+    ),
+    documents AS (
+        SELECT unnest(document) AS document
+    ),
+    similarities AS (
+        SELECT query_number, 1 - (document <=> query) AS similarity FROM queries CROSS JOIN documents
+    ),
+    max_similarities AS (
+        SELECT MAX(similarity) AS max_similarity FROM similarities GROUP BY query_number
+    )
+    SELECT SUM(max_similarity) FROM max_similarities
+$$ LANGUAGE SQL
+""")
+
+
+device = 'mps' if torch.backends.mps.is_available() else 'cpu'
+model = ColQwen2.from_pretrained('vidore/colqwen2-v1.0', torch_dtype=torch.bfloat16, device_map=device).eval()
+processor = ColQwen2Processor.from_pretrained('vidore/colqwen2-v1.0')
+
+
+def generate_embeddings(processed):
+    with torch.no_grad():
+        return model(**processed.to(model.device)).to(device='cpu', dtype=torch.float32)
+
+
+input = load_dataset('vidore/docvqa_test_subsampled', split='test[:3]')['image']
+for content in input:
+    embeddings = [e.numpy() for e in generate_embeddings(processor.process_images([content]))[0]]
+    conn.execute('INSERT INTO documents (embeddings) VALUES (%s)', (embeddings,))
+
+query = 'dividend'
+query_embeddings = [e.numpy() for e in generate_embeddings(processor.process_queries([query]))[0]]
+result = conn.execute('SELECT id, max_sim(embeddings, %s) AS max_sim FROM documents ORDER BY max_sim DESC LIMIT 5', (query_embeddings,)).fetchall()
+for row in result:
+    print(row)
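
The `max_sim` SQL function in this file scores a document by pairing every query vector with every document vector, keeping the highest cosine similarity per query vector, and summing those maxima. The final query then sorts documents by that score. A minimal pure-Python sketch of the same logic is given below for illustration. It assumes toy 2-dimensional vectors rather than the 128-dimensional ones stored in the table, and the helper names `cosine_similarity` and `rank` are hypothetical, not part of pgvector or colpali-engine.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity, i.e. 1 - cosine distance (pgvector's `<=>` operator)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_sim(document, query):
    """For each query vector, keep the best-matching document vector, then sum."""
    return sum(
        max(cosine_similarity(d, q) for d in document)
        for q in query
    )


def rank(documents, query, limit=5):
    """Sort (id, embeddings) pairs by score, like ORDER BY max_sim DESC LIMIT 5."""
    scored = [(doc_id, max_sim(embeddings, query)) for doc_id, embeddings in documents]
    return sorted(scored, key=lambda row: row[1], reverse=True)[:limit]


# Toy data (assumption): two documents with two vectors each
documents = [
    (1, [[1.0, 0.0], [0.0, 1.0]]),
    (2, [[1.0, 0.0], [1.0, 0.0]]),
]
query = [[1.0, 0.0], [0.0, 1.0]]

for row in rank(documents, query):
    print(row)
```

Document 1 has a close match for both query vectors, so it ranks above document 2, which only matches the first. In the committed example this computation runs inside PostgreSQL, so the per-token vectors never leave the database.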

examples/colpali/requirements.txt

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+colpali-engine
+datasets
+pgvector
+psycopg[binary]
