@@ -50,9 +49,9 @@ You need an [IBM Cloud® Databases for Elasticsearch deployment](https://cloud.i
 `api_key`
 : (Required, string) A valid API key of your Watsonx account. You can find your Watsonx API keys or you can create a new one [on the API keys page](https://cloud.ibm.com/iam/apikeys).
 
-::::{important}
-You need to provide the API key only once, during the {{infer}} model creation. The [Get {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-get) does not retrieve your API key. After creating the {{infer}} model, you cannot change the associated API key. If you want to use a different API key, delete the {{infer}} model and recreate it with the same name and the updated API key.
-::::
+::::{important}
+You need to provide the API key only once, during the {{infer}} model creation. The [Get {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-get) does not retrieve your API key. After creating the {{infer}} model, you cannot change the associated API key. If you want to use a different API key, delete the {{infer}} model and recreate it with the same name and the updated API key.
+::::
 
 
 `api_version`
@@ -70,13 +69,28 @@ You need an [IBM Cloud® Databases for Elasticsearch deployment](https://cloud.i
 `rate_limit`
 : (Optional, object) By default, the `watsonxai` service sets the number of requests allowed per minute to `120`. This helps to minimize the number of rate limit errors returned from Watsonx. To modify this, set the `requests_per_minute` setting of this object in your service settings:
 
-```text
-"rate_limit": {
-"requests_per_minute": <<number_of_requests>>
-}
-```
+```json
+"rate_limit": {
+"requests_per_minute": <<number_of_requests>>
+}
+```
+
+`task_settings`
+: (Optional, object) Settings to configure the inference task.
+
+These settings are specific to the `<task_type>` you specified.
+
+::::{dropdown} `task_settings` for the `rerank` task type
+`truncate_input_tokens`
+: (Optional, integer) Specifies the maximum number of tokens per input document before truncation.
+
+`return_documents`
+: (Optional, boolean) Specify whether to return doc text within the results.
 
+`top_n`
+: (Optional, integer) The number of most relevant documents to return. Defaults to the number of input documents.
 
+::::
 
 ## Watsonx AI service example [inference-example-watsonx-ai]
 
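As a rough illustration of the settings this hunk documents, the sketch below builds the `rate_limit` object and the `rerank` `task_settings` object as plain Python dictionaries. The helper names `build_rate_limit` and `build_rerank_task_settings` are hypothetical and not part of any Elastic client, and the positive-value check is an assumption added for the example. The setting names and the default of `120` requests per minute come from the documented text. Unset options are left out so that the service can apply its own defaults.

```python
import json
from typing import Any, Dict, Optional


def build_rate_limit(requests_per_minute: int = 120) -> Dict[str, Any]:
    """Return a rate_limit object; 120 is the documented watsonxai default."""
    # Assumption: a non-positive limit is meaningless, so reject it locally.
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be a positive integer")
    return {"rate_limit": {"requests_per_minute": requests_per_minute}}


def build_rerank_task_settings(
    truncate_input_tokens: Optional[int] = None,
    return_documents: Optional[bool] = None,
    top_n: Optional[int] = None,
) -> Dict[str, Any]:
    """Return task_settings for the rerank task type, omitting unset options."""
    candidates = {
        "truncate_input_tokens": truncate_input_tokens,
        "return_documents": return_documents,
        "top_n": top_n,
    }
    # Keep only the options the caller set explicitly (False is a valid value).
    return {name: value for name, value in candidates.items() if value is not None}


if __name__ == "__main__":
    print(json.dumps(build_rate_limit(60), indent=2))
    print(json.dumps(build_rerank_task_settings(return_documents=True, top_n=3), indent=2))
```

Leaving `top_n` unset in this sketch mirrors the documented behavior, where it defaults to the number of input documents.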
@@ -101,4 +115,35 @@ PUT _inference/text_embedding/watsonx-embeddings
 3. The ID of your IBM Cloud project.
 4. A valid API version parameter. You can find the active version data parameters [here](https://cloud.ibm.com/apidocs/watsonx-ai#active-version-dates).
 
+The following example shows how to create an inference endpoint called `watsonx-rerank` to perform a `rerank` task type.
+1. A valid Watsonx API key. You can find it on the [API keys page of your account](https://cloud.ibm.com/iam/apikeys).
+2. The {{infer}} endpoint URL you created on Watsonx.
+3. The ID of your IBM Cloud project.
+4. A valid API version parameter. You can find the active version data parameters [here](https://cloud.ibm.com/apidocs/watsonx-ai#active-version-dates).
+5. The maximum number of tokens per document before truncation.
+6. Whether to return the document text in the results.
+7. The number of top relevant documents to return.
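The request that these seven callouts annotate is not shown in this excerpt. As a hedged sketch only, the Python below assembles a request body that would line up with the callouts. The endpoint path is assumed by analogy with `PUT _inference/text_embedding/watsonx-embeddings` in the hunk header. The `url`, `project_id`, and `model_id` field names and every value are placeholders or assumptions, not text taken from the pull request.

```python
import json

# Assumed path, modeled on the text_embedding example named in the hunk header.
ENDPOINT_PATH = "_inference/rerank/watsonx-rerank"

request_body = {
    "service": "watsonxai",
    "service_settings": {
        "api_key": "<api_key>",          # 1. Watsonx API key
        "url": "<url>",                  # 2. endpoint URL created on Watsonx (field name assumed)
        "model_id": "<model_id>",        # hypothetical placeholder; not covered by the callouts
        "project_id": "<project_id>",    # 3. IBM Cloud project ID (field name assumed)
        "api_version": "<api_version>",  # 4. API version parameter
    },
    "task_settings": {
        "truncate_input_tokens": 50,     # 5. illustrative value
        "return_documents": True,        # 6. illustrative value
        "top_n": 3,                      # 7. illustrative value
    },
}

if __name__ == "__main__":
    # Print the request in console style; nothing is sent over the network.
    print(f"PUT {ENDPOINT_PATH}")
    print(json.dumps(request_body, indent=2))
```

Because the API key cannot be changed after creation, a body like this would be sent once; switching keys means deleting the endpoint and recreating it under the same name.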