Closed
Labels: :ml, :ml/Chunking, >enhancement, Feature:GenAI, Feature:NLP, Team:ML
Description
The purpose of this issue is to implement a basic chunking process for the Elastic reranker and evaluate it against the existing truncation process.
The proposed chunking process:
- Build a chunking strategy (see the Python sketch after this list):
  - `"strategy": "sentence"`
  - `"max_chunk_size": max(elastic_reranker_max_token_limit - query_token_count, elastic_reranker_max_token_limit / 2) * words_per_token`
  - `"sentence_overlap": 0`
- Note: The Elastic reranker is optimized for English text, and each document must be concatenated with the query before scoring.
- Note: Chunking in ES currently uses word count as the unit for `max_chunk_size`, and there is no way to calculate the exact token count for a given query. We therefore use a conversion rate of 1 token = ¾ word to convert between the two units.
- Chunk each document and send all chunks from all documents to the Elastic reranker.
- Parse the Elastic reranker response to return a single relevance score per document, corresponding to the highest relevance score among any of its chunks (sketched below).
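To make the chunk-size arithmetic and the max-score aggregation concrete, here is a minimal Python sketch. `MAX_TOKEN_LIMIT`, `WORDS_PER_TOKEN`, and the helper names are illustrative assumptions rather than actual Elasticsearch code; only the three configuration keys mirror the strategy above.

```python
# A minimal sketch of the proposed chunk-size calculation and per-document
# score aggregation. MAX_TOKEN_LIMIT, WORDS_PER_TOKEN, and the helpers are
# illustrative assumptions, not actual Elasticsearch code.

MAX_TOKEN_LIMIT = 512     # assumed token limit for the Elastic reranker
WORDS_PER_TOKEN = 0.75    # 1 token = 3/4 word, per the note above


def estimate_token_count(text: str) -> int:
    """Rough token estimate: invert the 1 token = 3/4 word conversion."""
    return round(len(text.split()) / WORDS_PER_TOKEN)


def build_chunking_settings(query: str) -> dict:
    """Build the proposed sentence-chunking configuration for one query."""
    query_tokens = estimate_token_count(query)
    # Leave room for the query, but never shrink below half the limit.
    chunk_tokens = max(MAX_TOKEN_LIMIT - query_tokens, MAX_TOKEN_LIMIT // 2)
    return {
        "strategy": "sentence",
        # max_chunk_size is measured in words, so convert tokens -> words.
        "max_chunk_size": int(chunk_tokens * WORDS_PER_TOKEN),
        "sentence_overlap": 0,
    }


def doc_scores(chunk_scores: dict[str, list[float]]) -> dict[str, float]:
    """Collapse per-chunk scores to one score per document: max over chunks."""
    return {doc_id: max(scores) for doc_id, scores in chunk_scores.items()}


print(build_chunking_settings("what is the capital of france"))
# e.g. {'strategy': 'sentence', 'max_chunk_size': 378, 'sentence_overlap': 0}
```

Taking the max over chunk scores ranks a document by its best-matching passage, which is the usual choice when a reranker only sees fixed-size windows of each document.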
The proposed evaluation process (to run with both truncation and chunking):
- Create an ECH cluster.
- Ingest a test data set.
- Run BM25 retrieval and identify some metrics to calculate its rerank accuracy.
- Run retrieval using the text similarity reranker and identify some metrics to calculate its rerank accuracy (see the sketch after this list).
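A hedged sketch of the two retrieval runs to compare. The index name, field name, and inference endpoint id are assumptions; the reranked request follows the `text_similarity_reranker` retriever shape from the `_search` API, which should be verified against the ES version under test.

```python
# Sketch of the two evaluation runs: BM25 baseline vs. reranked retrieval.
# ES URL, index, field names, and the inference endpoint id are assumed.
import requests

ES = "http://localhost:9200"
INDEX = "test-dataset"          # assumed index name
QUERY = "example user query"

# Baseline: plain BM25 retrieval.
bm25_body = {
    "query": {"match": {"text": QUERY}},
    "size": 10,
}

# Reranked: BM25 first stage + text_similarity_reranker second stage.
reranked_body = {
    "retriever": {
        "text_similarity_reranker": {
            "retriever": {"standard": {"query": {"match": {"text": QUERY}}}},
            "field": "text",
            "inference_id": ".rerank-v1-elasticsearch",  # assumed endpoint id
            "inference_text": QUERY,
            "rank_window_size": 100,
        }
    },
    "size": 10,
}

for name, body in [("bm25", bm25_body), ("reranked", reranked_body)]:
    hits = requests.post(f"{ES}/{INDEX}/_search", json=body).json()["hits"]["hits"]
    print(name, [h["_id"] for h in hits])
    # Feed each returned ranking plus relevance judgments into a metric
    # such as nDCG@10 or MRR to quantify rerank accuracy.
```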