This repository was archived by the owner on Aug 16, 2022. It is now read-only.

Commit 5ae56e1

update painless
1 parent 9bb0093 commit 5ae56e1

1 file changed: +69 -1 lines changed

docs/knn/painless-functions.md

Lines changed: 69 additions & 1 deletion
@@ -7,4 +7,72 @@ has_children: false
77
has_math: true
88
---
99

10-
<TODO>
10+
# Painless Scripting Functions
11+
12+
With the k-NN plugin's Painless scripting extensions, you can use k-NN distance functions directly in your Painless scripts to perform operations on `knn_vector` fields. Painless restricts the functions and classes available in each context to a strict allowlist so that scripts stay secure. The k-NN plugin adds Painless extensions for a few of the distance functions used in the [k-NN score script](../knn-score-script), so you can use them when your k-NN workload needs more customization.
13+
14+
## Get started with k-NN's Painless Scripting Functions
15+
16+
To use k-NN's Painless scripting functions, first create an index with `knn_vector` fields, as described in [k-NN score script](../knn-score-script#Getting_started_with_the_score_script). After you create the index and ingest some data, you can use the Painless extensions as follows:
17+
18+
```
GET my-knn-index-2/_search
{
  "size": 2,
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "filter": {
            "term": {
              "color": "BLUE"
            }
          }
        }
      },
      "script": {
        "source": "1.0 + cosineSimilarity(params.query_value, doc[params.field])",
        "params": {
          "field": "my_vector",
          "query_value": [9.9, 9.9]
        }
      }
    }
  }
}
```
44+
45+
`field` must map to a `knn_vector` field, and `query_value` must be a floating-point array with the same dimension as that field.
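As a rough illustration of these two requirements, the following Python sketch assembles the same request body and rejects a query vector whose length differs from the field's dimension. The helper name `build_script_score_query` and its `dimension` argument are hypothetical and exist only for this example; the client-side check is a convenience, not something the plugin requires.

```python
import json


def build_script_score_query(field, query_value, dimension, color="BLUE", size=2):
    """Build a script_score request body like the one shown above.

    Hypothetical helper for illustration; `dimension` is the dimension
    the caller believes the knn_vector field was mapped with.
    """
    if len(query_value) != dimension:
        raise ValueError(
            f"query_value has {len(query_value)} dimensions, expected {dimension}"
        )
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"bool": {"filter": {"term": {"color": color}}}},
                "script": {
                    "source": "1.0 + cosineSimilarity(params.query_value, doc[params.field])",
                    "params": {
                        "field": field,
                        # The script expects a floating-point array.
                        "query_value": [float(x) for x in query_value],
                    },
                },
            }
        },
    }


body = build_script_score_query("my_vector", [9.9, 9.9], dimension=2)
print(json.dumps(body, indent=2))
```

Serializing with `json.dumps` also avoids hand-written JSON mistakes such as trailing commas.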
46+
47+
## Function Types
48+
The following table lists the Painless functions that the k-NN plugin provides:
49+
50+
<table>
  <thead style="text-align: left">
  <tr>
    <th>Function Name</th>
    <th>Function Signature</th>
    <th>Description</th>
  </tr>
  </thead>
  <tr>
    <td>l2Squared</td>
    <td><code>float l2Squared (float[] queryVector, doc['vector field'])</code></td>
    <td>Calculates the square of the L2 distance (Euclidean distance) between a given query vector and the document vector. The shorter the distance, the more relevant the document, so a script typically inverts the return value of <code>l2Squared</code>. If the document vector matches the query vector, the result is 0, so the script should also add 1 to the distance to avoid divide-by-zero errors, for example <code>1 / (1 + l2Squared(params.query_value, doc[params.field]))</code>.</td>
  </tr>
  <tr>
    <td>cosineSimilarity</td>
    <td><code>float cosineSimilarity (float[] queryVector, doc['vector field'])</code></td>
    <td>Calculates cosine similarity: the inner product of the query vector and the document vector after both are normalized to length 1. If the magnitude of the query vector does not change during the query, you can optionally pass it as a third argument so that it is not recalculated for every filtered document: <code>float cosineSimilarity (float[] queryVector, doc['vector field'], float normQueryVector)</code>. In general, cosine similarity ranges over [-1, 1]. In information retrieval, the cosine similarity of two documents ranges from 0 to 1 because tf-idf weights cannot be negative. Adding 1.0 to the cosine similarity keeps the score non-negative in either case.</td>
  </tr>
</table>
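To make the two scoring formulas concrete, here is a plain-Python sketch of the arithmetic described in the table. It only illustrates the math and is not the plugin's implementation; the names `l2_squared`, `cosine_similarity`, and `norm_query` are stand-ins chosen to mirror the Painless signatures.

```python
import math


def l2_squared(query, doc):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((q - d) ** 2 for q, d in zip(query, doc))


def cosine_similarity(query, doc, norm_query=None):
    """Inner product divided by the product of the two magnitudes.

    If the caller already knows the query magnitude, passing it as
    `norm_query` skips recomputing it for every document.
    Zero-magnitude vectors are not handled in this sketch.
    """
    if norm_query is None:
        norm_query = math.sqrt(sum(q * q for q in query))
    norm_doc = math.sqrt(sum(d * d for d in doc))
    dot = sum(q * d for q, d in zip(query, doc))
    return dot / (norm_query * norm_doc)


query = [3.0, 4.0]
doc = [3.0, 4.0]

# Invert the distance and add 1 so identical vectors do not divide by zero.
l2_score = 1 / (1 + l2_squared(query, doc))

# Shift the [-1, 1] range to [0, 2] so the score is never negative.
cos_score = 1.0 + cosine_similarity(query, doc, norm_query=5.0)

print(l2_score, cos_score)
```

With identical vectors the distance is 0 and the similarity is 1, so both expressions reach their maximum score.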
69+
70+
71+
## Constraints
72+
1. If a document's `knn_vector` field has different dimensions than the query, the function throws an `IllegalArgumentException`.
73+
2. If a vector field doesn't have a value, the function throws an `IllegalStateException`.
74+
You can avoid this situation by first checking if a document has a value for the field:
75+
```
"source": "doc[params.field].size() == 0 ? 0 : 1 / (1 + l2Squared(params.query_value, doc[params.field]))",
```
78+
Because scores cannot be negative, this script ranks documents that have a value for the vector field higher than those that don't.
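The effect of the guard can be sketched outside the cluster. The following Python snippet mimics the script's logic on a few in-memory documents; the `score` helper and the sample documents are made up for illustration, and `None` stands in for a document whose vector field has no value.

```python
def l2_squared(query, doc):
    """Squared Euclidean distance; rejects mismatched dimensions."""
    if len(query) != len(doc):
        # Mirrors constraint 1: mismatched dimensions are an error.
        raise ValueError("query and document vectors differ in dimension")
    return sum((q - d) ** 2 for q, d in zip(query, doc))


def score(query, doc_vector):
    """Return 0 for a missing vector, otherwise the inverted distance."""
    if doc_vector is None:
        return 0.0
    return 1 / (1 + l2_squared(query, doc_vector))


# Hypothetical documents: "b" has no value for the vector field.
docs = {"a": [9.9, 9.9], "b": None, "c": [1.0, 1.0]}
query = [9.9, 9.9]

ranked = sorted(docs, key=lambda name: score(query, docs[name]), reverse=True)
print(ranked)
```

Every document with a vector gets a score strictly greater than 0, so the document without one always sorts last.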
