Expand to add support for Vespa? #473

@jbaiter

Vespa's versatile support for hybrid search (combining "classic" Lucene-like term search with vector search) is making it popular in many contexts. It would be great to be able to use OCR highlighting with it, at least for term search.

Vespa supports using CharFilter implementations from Lucene, so at least the indexing side should be a simple matter of writing the appropriate wrappers to expose the functionality to it.

For rendering the responses, Vespa supports custom "Result Renderers". With these, it should be simple to add an ocrHighlighting field to the response, assuming the Result object has offset information associated with it. I haven't yet found out how to access this information, but it is definitely available at least internally, since the highlighting feature for fields and summaries (called "bolding" in Vespa) depends on it. Hopefully there's a way to access it from the Renderer implementation.

It looks like this is going to be more complicated: the Java side of Vespa does not have access to offset information. Highlighting is handled entirely in the C++ backend, which passes the result to the Java side as a text sequence with highlight markers, i.e. the offset information is lost. From what I could gather from the documentation, the Java side of Vespa is the only place where we can add extra functionality, so a straight 1:1 port of the approach used for Solr won't work in Vespa.
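
To illustrate what "the offset information is lost" means in practice, here is a minimal plain-Java sketch of what can still be recovered from marker-annotated text. It assumes the markers are literal `<hi>` and `</hi>` tags and that they are not nested; the class and method names are hypothetical and not part of Vespa or of this plugin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: recovers highlight spans from marker-annotated text. */
final class HighlightMarkerOffsets {

    /** A highlighted span, as offsets into the marker-free snippet text. */
    static final class Span {
        final int start;
        final int end;
        final String text;

        Span(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    // Assumption: highlights arrive as non-nested <hi>...</hi> pairs.
    private static final String OPEN = "<hi>";
    private static final String CLOSE = "</hi>";
    private static final Pattern MARKER =
            Pattern.compile(Pattern.quote(OPEN) + "(.*?)" + Pattern.quote(CLOSE), Pattern.DOTALL);

    /** Returns the highlighted spans with offsets relative to the stripped snippet. */
    static List<Span> spansInSnippet(String marked) {
        List<Span> spans = new ArrayList<>();
        Matcher m = MARKER.matcher(marked);
        int removed = 0; // number of marker characters seen before the current match
        while (m.find()) {
            int start = m.start() - removed;
            String term = m.group(1);
            spans.add(new Span(start, start + term.length(), term));
            removed += OPEN.length() + CLOSE.length();
        }
        return spans;
    }

    /** Returns the snippet text with all highlight markers removed. */
    static String stripMarkers(String marked) {
        return marked.replace(OPEN, "").replace(CLOSE, "");
    }
}
```

The spans returned by `spansInSnippet` index into the snippet that was handed over, not into the OCR document that was indexed. If the snippet is an excerpt, or if the OCR markup was stripped before indexing, nothing in it maps a span back to a position in the original file, which is the piece the Solr approach relies on.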
