
Commit a4d0557

Set default similarity for Cohere model to cosine (elastic#125370)
Cohere embeddings are expected to be normalized to unit vectors, but due to floating point precision issues, our check ({@link DenseVectorFieldMapper#isNotUnitVector(float)}) often fails. This change fixes the bug by setting the default similarity for newly created Cohere inference endpoints to cosine. Closes elastic#122878
1 parent 4201dcc commit a4d0557
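To illustrate the precision problem described above, the following minimal Java sketch normalizes a random float vector and then measures how far its squared magnitude lands from 1. The class `UnitVectorDemo`, the tolerance `EPS` and the helper `isNotUnitVector` are hypothetical stand-ins. They are not the actual `DenseVectorFieldMapper#isNotUnitVector(float)` implementation, whose tolerance and exact input may differ. Whether a particular vector trips the check depends on the data, the dimension and the tolerance. The point is only that a vector which is unit length in exact arithmetic need not have a squared magnitude of exactly `1.0f` after float rounding.

```java
// Hypothetical sketch for illustration only; this is NOT Elasticsearch's implementation.
public class UnitVectorDemo {

    // Assumed tolerance, chosen for the demo; the real check may use a different value.
    static final float EPS = 1e-6f;

    /** Returns true if the squared magnitude deviates from 1 by more than EPS. */
    static boolean isNotUnitVector(float squaredMagnitude) {
        return Math.abs(squaredMagnitude - 1.0f) > EPS;
    }

    /** Scales v to unit length; each component is rounded back to float. */
    static float[] normalize(float[] v) {
        double sum = 0;
        for (float x : v) {
            sum += (double) x * x;
        }
        float norm = (float) Math.sqrt(sum);
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }

    /** Squared magnitude accumulated in float, so rounding error can build up. */
    static float squaredMagnitude(float[] v) {
        float sum = 0f;
        for (float x : v) {
            sum += x * x;
        }
        return sum;
    }

    public static void main(String[] args) {
        java.util.Random rnd = new java.util.Random(42);
        float[] v = new float[1024];
        for (int i = 0; i < v.length; i++) {
            v[i] = (float) rnd.nextGaussian();
        }
        float mag = squaredMagnitude(normalize(v));
        System.out.println("squared magnitude = " + mag);
        System.out.println("rejected as non-unit: " + isNotUnitVector(mag));
    }
}
```

A tighter tolerance rejects more normalized vectors; a looser one accepts vectors that are genuinely not unit length, which is why switching the default similarity avoids the check altogether.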


3 files changed (+14, -7 lines)


docs/changelog/125370.yaml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+pr: 125370
+summary: Set default similarity for Cohere model to cosine
+area: Machine Learning
+type: bug
+issues:
+- 122878

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/cohere/CohereService.java

Lines changed: 7 additions & 6 deletions
@@ -14,6 +14,7 @@
 import org.elasticsearch.common.util.LazyInitializable;
 import org.elasticsearch.core.Nullable;
 import org.elasticsearch.core.TimeValue;
+import org.elasticsearch.index.mapper.vectors.DenseVectorFieldMapper;
 import org.elasticsearch.inference.ChunkedInference;
 import org.elasticsearch.inference.ChunkingSettings;
 import org.elasticsearch.inference.InferenceServiceConfiguration;
@@ -335,15 +336,15 @@ public Model updateModelWithEmbeddingDetails(Model model, int embeddingSize) {
     }

     /**
-     * Return the default similarity measure for the embedding type.
-     * Cohere embeddings are normalized to unit vectors therefor Dot
-     * Product similarity can be used and is the default for all Cohere
-     * models.
+     * Returns the default similarity measure for the embedding type.
+     * Cohere embeddings are expected to be normalized to unit vectors, but due to floating point precision issues,
+     * our check ({@link DenseVectorFieldMapper#isNotUnitVector(float)}) often fails.
+     * Therefore, we use cosine similarity to ensure compatibility.
      *
-     * @return The default similarity.
+     * @return The default similarity measure.
      */
     static SimilarityMeasure defaultSimilarity() {
-        return SimilarityMeasure.DOT_PRODUCT;
+        return SimilarityMeasure.COSINE;
     }

     @Override
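As a rough illustration of why cosine is the more forgiving default, the sketch below compares dot product and cosine similarity on two-dimensional vectors. The class `SimilarityDemo` and its methods are hypothetical and are not Elasticsearch code. For exact unit vectors the two measures agree. Cosine divides by both magnitudes, so a vector whose length drifts slightly from 1 still scores as if it were normalized, whereas the dot product score scales with that drift.

```java
// Hypothetical sketch comparing two similarity measures; not Elasticsearch code.
public class SimilarityDemo {

    /** Plain dot product, accumulated in double. */
    static double dot(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    /** Cosine similarity: dividing by both norms makes the score independent of vector length. */
    static double cosine(float[] a, float[] b) {
        return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    public static void main(String[] args) {
        float[] q = {0.6f, 0.8f};              // approximately unit length
        float[] d = {0.8f, 0.6f};              // approximately unit length
        float[] dScaled = {0.8008f, 0.6006f};  // d stretched by about 0.1%

        System.out.println("dot(q, d)          = " + dot(q, d));
        System.out.println("cosine(q, d)       = " + cosine(q, d));
        System.out.println("dot(q, dScaled)    = " + dot(q, dScaled));
        System.out.println("cosine(q, dScaled) = " + cosine(q, dScaled));
    }
}
```

In this toy comparison, cosine costs two extra norm computations per pair, which is the price of ignoring small length errors.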

x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/cohere/CohereServiceTests.java

Lines changed: 1 addition & 1 deletion
@@ -1577,7 +1577,7 @@ public void testChunkedInfer_BatchesCalls_Bytes() throws IOException {
     }

     public void testDefaultSimilarity() {
-        assertEquals(SimilarityMeasure.DOT_PRODUCT, CohereService.defaultSimilarity());
+        assertEquals(SimilarityMeasure.COSINE, CohereService.defaultSimilarity());
     }

     public void testInfer_StreamRequest() throws Exception {
