
[Bug]: Vectorizer runs when non-loading column changes #878

@zhiweit

Description


What happened?

The vectorizer calls the embedding model and re-embeds the loading_column even though the loading_column did not change. I am running the vectorizer worker via Python.

Expected behaviour:
The re-embedding (i.e. the API call to the embedding model) should only happen when the loading_column was updated, not when other columns changed. Re-embedding the loading_column when it has not changed makes redundant calls and wastes API usage.
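For illustration only, this is the kind of change guard I would expect, assuming rows are enqueued for embedding by an UPDATE trigger. This is a sketch of the expected behaviour, not pgai's actual implementation; the trigger and function names are hypothetical.

-- Hypothetical sketch: only enqueue a row for re-embedding when the
-- loading column actually changed (names are illustrative, not pgai's).
CREATE OR REPLACE FUNCTION enqueue_for_embedding() RETURNS trigger AS $$
BEGIN
    -- insert NEW.id into the vectorizer work queue here
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER blog_enqueue_on_contents_change
AFTER UPDATE ON blog
FOR EACH ROW
WHEN (OLD.contents IS DISTINCT FROM NEW.contents)  -- skip updates that do not touch the loading column
EXECUTE FUNCTION enqueue_for_embedding();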

pgai extension affected

0.11.1

pgai library affected

0.12.0

PostgreSQL version used

17.0

What operating system did you use?

Ubuntu 24.04 x64

What installation method did you use?

Docker

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

How can we reproduce the bug?

To replicate:
- Follow the quick start, e.g.:

CREATE TABLE blog(
    id        SERIAL PRIMARY KEY,
    title     TEXT,
    authors   TEXT,
    contents  TEXT,
    metadata  JSONB 
);



SELECT ai.create_vectorizer( 
   'blog'::regclass,
   name => 'blog_embeddings',  -- Optional custom name for easier reference
   loading => ai.loading_column('contents'),
   embedding => ai.embedding_ollama('nomic-embed-text', 768),
   destination => ai.destination_table('blog_contents_embeddings')
);


- Update any column other than the `contents` column (for example, as shown below). The vectorizer should not re-embed the `contents` column, but the worker runs and `contents` is re-embedded even though it did not change.
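For example (the row id and new title here are arbitrary; any update that leaves `contents` untouched reproduces the behaviour):

-- Example update that does not touch the loading column
UPDATE blog
SET title = 'updated title'
WHERE id = 1;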

Logs from the vectorizer worker:

2025-10-07 15:52:22 [debug    ] obtained secret 'OPENAI_API_KEY' from environment
2025-10-07 15:52:22 [info     ] running vectorizer             vectorizer_id=1
2025-10-07 15:52:22 [debug    ] Items pulled from queue: 1    
2025-10-07 15:52:22 [debug    ] Chunks produced: 1            
2025-10-07 15:52:22 [debug    ] Batch 1 has 17.25 tokens in 1 chunks
2025-10-07 15:52:22 [debug    ] Batch 1 of 1                  
2025-10-07 15:52:22 [debug    ] Chunks for this batch: 1      
2025-10-07 15:52:22 [debug    ] Request 1 of 1 initiated      
2025-10-07 15:52:22 [debug    ] Request 1 of 1 ended after: 0.9623326590008219 seconds. Tokens usage: Usage(prompt_tokens=13, total_tokens=13)
2025-10-07 15:52:23 [debug    ] Embedding stats                chunks_per_second=0.8606774421141247 total_chunks=5 total_request_time=5.80937730599544 wall_time=4696.198784884
2025-10-07 15:52:23 [debug    ] Processing stats               chunks_per_second=0.0010542820954175822 chunks_per_second_per_thread=0.7508895275378047 task=138297813986048 total_chunks=5 total_processing_time=6.658769121997466 wall_time=4742.563704469998
2025-10-07 15:52:23 [debug    ] Items pulled from queue: 0    
2025-10-07 15:52:23 [info     ] finished processing vectorizer items=1 vectorizer_id=1

Are you going to work on the bugfix?

None
