When using image embeddings, some image embeddings may be skipped

This is reproducible with the sample data. After running prepdocs, you'll see that several pages aren't represented in the uploaded sections: no section has a sourcepage equal to that page number, and thus no section has an imageEmbedding corresponding to that page. That means some answers may be lower quality, because retrieval doesn't find the relevant matching image.
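To make the gap easy to spot, here is a minimal sketch of a check for unrepresented pages. It assumes each section is a dict with an integer `sourcepage` key and that pages are numbered from 0; both the data shape and the `find_unrepresented_pages` helper are hypothetical and are not taken from prepdocs.

```python
def find_unrepresented_pages(page_count: int, sections: list[dict]) -> list[int]:
    """Return page numbers that no section lists as its sourcepage.

    Assumption: each section dict carries an integer "sourcepage" key
    (hypothetical shape, not the actual prepdocs data model).
    """
    # Collect every page that at least one section points to.
    covered = {section["sourcepage"] for section in sections}
    # Any page outside that set has no section, and so no imageEmbedding.
    return [page for page in range(page_count) if page not in covered]


if __name__ == "__main__":
    sample_sections = [{"sourcepage": 0}, {"sourcepage": 2}, {"sourcepage": 2}]
    print(find_unrepresented_pages(4, sample_sections))
```

Running a check like this against the uploaded sections would show which page images are never retrievable.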

Possible approaches:

* For certain document types, like slides, never chunk sections across pages. This was my original idea, but then I realized our sample document is a slide deck exported as a PDF, so I can't rely on a PPT-dependent condition. Thus, this isn't a full solution.
* Never let sections go across pages. This may not work well with many PDFs like research papers that legitimately have sections go across pages.
* Associate multiple sourcepage values with a single section. @mattgotteiner says that's possible by picking a delimiter. Not sure if multiple imageEmbedding values would also be possible? Otherwise we'd have to pick the imageEmbedding we thought was best.
* ...? Your idea here!
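As a rough illustration of the third approach, the sketch below finds every page that a section's character span overlaps and joins the resulting page references with a delimiter. Everything here is an assumption for illustration: the `Page` dataclass, the character-offset bookkeeping, the `;` delimiter, and the `#page=N` reference format are hypothetical, not the existing prepdocs implementation. It also doesn't address whether multiple imageEmbedding values can be stored per section.

```python
from dataclasses import dataclass

# Hypothetical delimiter; it must be a character that never appears in a filename.
DELIMITER = ";"


@dataclass
class Page:
    page_num: int  # zero-based page index
    offset: int    # character offset of the page start within the full document text
    text: str


def pages_for_span(pages: list[Page], start: int, end: int) -> list[int]:
    """Return the page numbers whose text overlaps the half-open span [start, end)."""
    return [
        page.page_num
        for page in pages
        if page.offset < end and page.offset + len(page.text) > start
    ]


def join_sourcepages(filename: str, page_nums: list[int]) -> str:
    """Join one reference per page into a single delimited sourcepage string."""
    # The "#page=N" format (one-based) is an assumption made for this sketch.
    return DELIMITER.join(f"{filename}#page={num + 1}" for num in page_nums)


if __name__ == "__main__":
    pages = [Page(0, 0, "a" * 10), Page(1, 10, "b" * 10), Page(2, 20, "c" * 10)]
    spanned = pages_for_span(pages, 5, 25)
    print(join_sourcepages("slides.pdf", spanned))
```

A consumer of the index would then split the sourcepage value on the same delimiter to recover every page the section touches.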


When using image embeddings, some image embeddings may be skipped #1675
