Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/reference/inference/inference-apis.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -143,6 +143,7 @@ include::service-elser.asciidoc[]
include::service-google-ai-studio.asciidoc[]
include::service-google-vertex-ai.asciidoc[]
include::service-hugging-face.asciidoc[]
include::service-jinaai.asciidoc[]
include::service-mistral.asciidoc[]
include::service-openai.asciidoc[]
include::service-watsonx-ai.asciidoc[]
3 changes: 2 additions & 1 deletion docs/reference/inference/put-inference.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,7 @@ Click the links to review the configuration details of the services:
* <<infer-service-mistral,Mistral>> (`text_embedding`)
* <<infer-service-openai,OpenAI>> (`completion`, `text_embedding`)
* <<infer-service-watsonx-ai>> (`text_embedding`)
* <<infer-service-jinaai,JinaAI>> (`text_embedding`, `rerank`)

The {es} and ELSER services run on a {ml} node in your {es} cluster. The rest of
the services connect to external providers.
Expand All @@ -87,4 +88,4 @@ When adaptive allocations are enabled:
- The number of allocations scales up automatically when the load increases.
- Allocations scale down to a minimum of 0 when the load decreases, saving resources.

For more information about adaptive allocations and resources, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] documentation.
For more information about adaptive allocations and resources, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] documentation.
255 changes: 255 additions & 0 deletions docs/reference/inference/service-jinaai.asciidoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,255 @@
[[infer-service-jinaai]]
=== JinaAI {infer} service

Creates an {infer} endpoint to perform an {infer} task with the `jinaai` service.


[discrete]
[[infer-service-jinaai-api-request]]
==== {api-request-title}

`PUT /_inference/<task_type>/<inference_id>`

[discrete]
[[infer-service-jinaai-api-path-params]]
==== {api-path-parms-title}

`<inference_id>`::
(Required, string)
include::inference-shared.asciidoc[tag=inference-id]

`<task_type>`::
(Required, string)
include::inference-shared.asciidoc[tag=task-type]
+
--
Available task types:

* `text_embedding`,
* `rerank`.
--

[discrete]
[[infer-service-jinaai-api-request-body]]
==== {api-request-body-title}

`chunking_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=chunking-settings]

`max_chunking_size`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size]

`overlap`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-overlap]

`sentence_overlap`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap]

`strategy`:::
(Optional, string)
include::inference-shared.asciidoc[tag=chunking-settings-strategy]

`service`::
(Required, string)
The type of service supported for the specified task type. In this case,
`jinaai`.

`service_settings`::
(Required, object)
include::inference-shared.asciidoc[tag=service-settings]
+
--
These settings are specific to the `jinaai` service.
--

`api_key`:::
(Required, string)
A valid API key for your JinaAI account.
You can find it at https://jina.ai/embeddings/.
+
--
include::inference-shared.asciidoc[tag=api-key-admonition]
--

`rate_limit`:::
(Optional, object)
The default rate limit for the `jinaai` service is 2000 requests per minute for all task types.
You can modify this using the `requests_per_minute` setting in your service settings:
+
--
include::inference-shared.asciidoc[tag=request-per-minute-example]

More information about JinaAI's rate limits can be found in https://jina.ai/contact-sales/#rate-limit.
--
+
.`service_settings` for the `rerank` task type
[%collapsible%closed]
=====
`model_id`::
(Required, string)
The name of the model to use for the {infer} task.
To review the available `rerank` compatible models, refer to https://jina.ai/reranker.
=====
+
.`service_settings` for the `text_embedding` task type
[%collapsible%closed]
=====
`model_id`:::
(Optional, string)
The name of the model to use for the {infer} task.
To review the available `text_embedding` models, refer to the
https://jina.ai/embeddings/.

`similarity`:::
(Optional, string)
Similarity measure. One of `cosine`, `dot_product`, `l2_norm`.
Defaults based on the `embedding_type` (`float` -> `dot_product`, `int8/byte` -> `cosine`).
=====



`task_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=task-settings]
+
.`task_settings` for the `rerank` task type
[%collapsible%closed]
=====
`return_documents`::
(Optional, boolean)
Specify whether to return doc text within the results.

`top_n`::
(Optional, integer)
The number of most relevant documents to return, defaults to the number of the documents.
If this {infer} endpoint is used in a `text_similarity_reranker` retriever query and `top_n` is set, it must be greater than or equal to `rank_window_size` in the query.
=====
+
.`task_settings` for the `text_embedding` task type
[%collapsible%closed]
=====
`task`:::
(Optional, string)
Specifies the task passed to the model.
Valid values are:
* `classification`: use it for embeddings passed through a text classifier.
* `clustering`: use it for the embeddings run through a clustering algorithm.
* `ingest`: use it for storing document embeddings in a vector database.
* `search`: use it for storing embeddings of search queries run against a vector database to find relevant documents.
=====


[discrete]
[[inference-example-jinaai]]
==== JinaAI service examples

The following examples demonstrate how to create {infer} endpoints for `text_embeddings` and `rerank` tasks using the JinaAI service and use them in search requests.

First, we create the `embeddings` service:

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/jinaai-embeddings
{
"service": "jinaai",
"service_settings": {
"model_id": "jina-embeddings-v3",
"api_key": "<api_key>"
}
}
------------------------------------------------------------
// TEST[skip:uses ML]

Then, we create the `rerank` service:
[source,console]
------------------------------------------------------------
PUT _inference/rerank/jinaai-rerank
{
"service": "jinaai",
"service_settings": {
"api_key": "<api_key>",
"model_id": "jina-reranker-v2-base-multilingual"
},
"task_settings": {
"top_n": 10,
"return_documents": true
}
}
------------------------------------------------------------
// TEST[skip:uses ML]

Now we can create an index that will use `jinaai-embeddings` service to index the documents.

[source,console]
------------------------------------------------------------
PUT jinaai-index
{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": "jinaai-embeddings"
}
}
}
}
------------------------------------------------------------
// TEST[skip:uses ML]

[source,console]
------------------------------------------------------------
PUT jinaai-index/_bulk
{ "index" : { "_index" : "jinaai-index", "_id" : "1" } }
{"content": "Sarah Johnson is a talented marine biologist working at the Oceanographic Institute. Her groundbreaking research on coral reef ecosystems has garnered international attention and numerous accolades."}
{ "index" : { "_index" : "jinaai-index", "_id" : "2" } }
{"content": "She spends months at a time diving in remote locations, meticulously documenting the intricate relationships between various marine species. "}
{ "index" : { "_index" : "jinaai-index", "_id" : "3" } }
{"content": "Her dedication to preserving these delicate underwater environments has inspired a new generation of conservationists."}
------------------------------------------------------------
// TEST[skip:uses ML]

Now, with the index created, we can search with and without the reranker service.

[source,console]
------------------------------------------------------------
GET jinaai-index/_search
{
"query": {
"semantic": {
"field": "content",
"query": "who inspired taking care of the sea?"
}
}
}
------------------------------------------------------------
// TEST[skip:uses ML]

[source,console]
------------------------------------------------------------
POST jinaai-index/_search
{
"retriever": {
"text_similarity_reranker": {
"retriever": {
"standard": {
"query": {
"semantic": {
"field": "content",
"query": "who inspired taking care of the sea?"
}
}
}
},
"field": "content",
"rank_window_size": 100,
"inference_id": "jinaai-rerank",
"inference_text": "who inspired taking care of the sea?"
}
}
}
------------------------------------------------------------
// TEST[skip:uses ML]