
Commit 39d5d5d

darnautov and qn895 authored
[9.1] [AI Infra] Fix Observability AI assistant product docs missing multilingual support (#224274) (#226368)
# Backport

This will backport the following commits from `main` to `9.1`:
- [[AI Infra] Fix Observability AI assistant product docs missing multilingual support (#224274)](#224274)

<!--- Backport version: 10.0.1 -->

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport)

Source commit `4ff731dcccaf9f60c5ef49ae014ed96800bacbe7` by Quynh Nguyen (Quinn), committed 2025-07-03T09:53:13Z and merged to `main` (v9.2.0) as https://github.com/elastic/kibana/pull/224274. Suggested target branches: `9.1`, `8.19`. Labels: bug, :ml, release_note:skip, Team:Obs AI Assistant, ci:project-deploy-observability, backport:version, v9.1.0, v8.19.0, v9.2.0.

Source commit message:

## Summary

This PR fixes #222176 and rewires the productDoc installation process to accept an `inferenceId` argument to the productDocBase installation API. It:

- Allows concurrent installation of the product docs with two different models: the default ELSER and the multilingual E5. Kibana installs only the one the user needs, but has capacity for both if the user needs them (e.g. ELSER for Security AI Assistant and multilingual E5 for Observability AI Assistant).
- Modifies the script that generates the artifacts so that an `inferenceId` can also be passed in.

```
node scripts/build_product_doc_artifacts.js --product-name=security --stack-version=8.18 --inference-id=.multilingual-e5-small-elasticsearch
```

- In parallel with this PR, deploys the public multilingual product doc artifacts for 8.18.
- Modifies the installation logic to append the inferenceId to the target index name (to distinguish it from the ELSER default) and defines the mapping of the target index to use E5's model_settings.
- Surfaces an error if the installation fails, for example if no corresponding artifact is available or if the artifact fails to fetch.

Before:

![image](https://github.com/user-attachments/assets/ac9fe8db-a34e-4e67-8471-56e8f4520fdd)

After, it will prompt the user:

![image](https://github.com/user-attachments/assets/ff18c6bd-0c20-4227-aac7-913a3032a31f)

## Note for Reviewers:

- **kibana-core**: Saved object 'product-doc-install-status' was updated to add a new field `inference_id`.
- **Security Gen AI**: With the newly required inferenceId parameter to the installation endpoints, by default it will use ELSER '.elser-2-elasticsearch'.
- **Observability AI Assistant**: There are at least 2 todos remaining:
1) Make sure to pass the inferenceId to retrieveDocumentation so that it reroutes to the right index: https://github.com//pull/224274/files#diff-e393e350cf2449f8b756cad947fc8a902fddf6e6b30f1363750d469fc7d81b61R74
2) Handle the change in inference model selection for Product Doc. Here this triggers an update to the product doc installation when the user clicks Update model: https://github.com//pull/224274/files#diff-7d84fc1bf3106fe3b0cb357c800faefc1b96b853beeb74711f1c3c623ae901b9R151

### Checklist

Check the PR satisfies following conditions.

Reviewers should verify this PR satisfies this list as well.

- [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [ ] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the [docker list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The `release_note:breaking` label should be applied in these situations.
- [ ] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed
- [ ] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

### Identify risks

Does this PR introduce any risks? For example, consider risks like hard to test bugs, performance regression, potential of data loss.

Describe the risk, its severity, and mitigation for each identified risk. Invite stakeholders and evaluate how to proceed before merging.

- [ ] [See some risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)
- [ ] ...

---------

Co-authored-by: kibanamachine
Co-authored-by: Dima Arnautov
Co-authored-by: Elastic Machine

Co-authored-by: Quynh Nguyen (Quinn)
1 parent 2119344 commit 39d5d5d

85 files changed: +1121 -310 lines changed

packages/kbn-check-mappings-update-cli/current_fields.json

Lines changed: 1 addition & 0 deletions
@@ -915,6 +915,7 @@
   ],
   "product-doc-install-status": [
     "index_name",
+    "inference_id",
     "installation_status",
     "last_installation_date",
     "product_name",

packages/kbn-check-mappings-update-cli/current_mappings.json

Lines changed: 3 additions & 0 deletions
@@ -3042,6 +3042,9 @@
       "index_name": {
         "type": "keyword"
       },
+      "inference_id": {
+        "type": "keyword"
+      },
       "installation_status": {
         "type": "keyword"
       },

src/core/server/integration_tests/ci_checks/saved_objects/check_registered_types.test.ts

Lines changed: 1 addition & 1 deletion
@@ -154,7 +154,7 @@ describe('checking migration metadata changes on all registered SO types', () =>
         "osquery-saved-query": "a8ef11610473e3d1b51a8fdacb2799d8a610818e",
         "policy-settings-protection-updates-note": "c05c4c33a5e5bd1fa153991f300d040ac5d6f38d",
         "privilege-monitoring-status": "4daec76df427409bcd64250f5c23f5ab86c8bac3",
-        "product-doc-install-status": "ee7817c45bf1c41830290c8ef535e726c86f7c19",
+        "product-doc-install-status": "f94e3e5ad2cc933df918f2cd159044c626e01011",
         "query": "1966ccce8e9853018111fb8a1dee500228731d9e",
         "risk-engine-configuration": "533a0a3f2dbef1c95129146ec4d5714de305be1a",
         "rules-settings": "53f94e5ce61f5e75d55ab8adbc1fb3d0937d2e0b",

x-pack/packages/ai-infra/product-doc-artifact-builder/README.md

Lines changed: 18 additions & 4 deletions
@@ -8,17 +8,23 @@ Script to build the knowledge base artifacts.
 node scripts/build_product_doc_artifacts.js --stack-version {version} --product-name {product}
 ```

+Example:
+
+```
+node scripts/build_product_doc_artifacts.js --product-name=security --stack-version=8.18 --inference-id=.multilingual-e5-small-elasticsearch
+```
+
 ### parameters

-#### `stack-version`:
+#### `stack-version`:

 the stack version to generate the artifacts for.

-#### `product-name`:
+#### `product-name`:

 (multi-value) the list of products to generate artifacts for.

-possible values:
+possible values:
 - "kibana"
 - "elasticsearch"
 - "observability"
@@ -34,6 +40,11 @@ Defaults to `{REPO_ROOT}/build-kb-artifacts`.

 The folder to use for temporary files.

+#### inference-id:
+
+The inference endpoint to use to generate the embeddings. If the inference ID provided and is not the ELSER default, the artifacts will be generated with `{artifactName}--{inference-id}.zip`. Note the double dash before inference-id.
+
+
 Defaults to `{REPO_ROOT}/build/temp-kb-artifacts`

 #### Cluster infos
@@ -46,4 +57,7 @@ Defaults to `{REPO_ROOT}/build/temp-kb-artifacts`
 - params for the embedding cluster:
 `embeddingClusterUrl` / env.KIBANA_EMBEDDING_CLUSTER_URL
 `embeddingClusterUsername` / env.KIBANA_EMBEDDING_CLUSTER_USERNAME
-`embeddingClusterPassword` / env.KIBANA_EMBEDDING_CLUSTER_PASSWORD
+`embeddingClusterPassword` / env.KIBANA_EMBEDDING_CLUSTER_PASSWORD
+
+- params for the inference endpoint:
+`inferenceId`
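
The naming rule that the README states for `inference-id` can be sketched as follows. This is only an illustration under stated assumptions: `artifactFileName` is a hypothetical helper, not the `getArtifactName` function the builder actually calls (its implementation is not shown in this diff), and the base artifact name is passed in rather than derived.

```typescript
// Hypothetical helper illustrating the README's naming rule.
// It is NOT the real getArtifactName implementation.
const DEFAULT_ELSER = '.elser-2-elasticsearch';

function artifactFileName(artifactName: string, inferenceId?: string): string {
  // No inference ID, or the ELSER default: keep the plain artifact name.
  if (!inferenceId || inferenceId === DEFAULT_ELSER) {
    return `${artifactName}.zip`;
  }
  // Any other inference ID is appended after a double dash.
  return `${artifactName}--${inferenceId}.zip`;
}

// 'example-artifact' is a placeholder base name chosen for this sketch.
console.log(artifactFileName('example-artifact'));
console.log(artifactFileName('example-artifact', '.multilingual-e5-small-elasticsearch'));
```

The double dash presumably keeps the suffix distinguishable from the single dashes inside the artifact name itself, though the README does not state the reason.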

x-pack/packages/ai-infra/product-doc-artifact-builder/src/artifact/mappings.ts

Lines changed: 14 additions & 13 deletions
@@ -6,30 +6,31 @@
  */

 import type { MappingTypeMapping } from '@elastic/elasticsearch/lib/api/types';
+import {
+  DEFAULT_ELSER,
+  getSemanticTextMapping,
+  type SemanticTextMapping,
+} from '../tasks/create_index';

-export const getArtifactMappings = (inferenceEndpoint: string): MappingTypeMapping => {
+export const getArtifactMappings = (
+  customSemanticTextMapping?: SemanticTextMapping
+): MappingTypeMapping => {
+  const semanticTextMapping = customSemanticTextMapping
+    ? customSemanticTextMapping
+    : getSemanticTextMapping(DEFAULT_ELSER);
   return {
     dynamic: 'strict',
     properties: {
       content_title: { type: 'text' },
-      content_body: {
-        type: 'semantic_text',
-        inference_id: inferenceEndpoint,
-      },
+      content_body: semanticTextMapping,
       product_name: { type: 'keyword' },
       root_type: { type: 'keyword' },
       slug: { type: 'keyword' },
       url: { type: 'keyword' },
       version: { type: 'version' },
       ai_subtitle: { type: 'text' },
-      ai_summary: {
-        type: 'semantic_text',
-        inference_id: inferenceEndpoint,
-      },
-      ai_questions_answered: {
-        type: 'semantic_text',
-        inference_id: inferenceEndpoint,
-      },
+      ai_summary: semanticTextMapping,
+      ai_questions_answered: semanticTextMapping,
       ai_tags: { type: 'keyword' },
     },
   };

x-pack/packages/ai-infra/product-doc-artifact-builder/src/build_artifacts.ts

Lines changed: 43 additions & 12 deletions
@@ -7,8 +7,14 @@

 import Path from 'path';
 import { Client, HttpConnection } from '@elastic/elasticsearch';
+import {
+  Client as ElasticsearchClient8,
+  HttpConnection as Elasticsearch8HttpConnection,
+} from 'elasticsearch-8.x';
+
 import { ToolingLog } from '@kbn/tooling-log';
 import type { ProductName } from '@kbn/product-doc-common';
+import { defaultInferenceEndpoints } from '@kbn/inference-common';
 import {
   // checkConnectivity,
   createTargetIndex,
@@ -21,17 +27,18 @@ import {
   processDocuments,
 } from './tasks';
 import type { TaskConfig } from './types';
+import { getSemanticTextMapping } from './tasks/create_index';

 const getSourceClient = (config: TaskConfig) => {
-  return new Client({
+  return new ElasticsearchClient8({
     compression: true,
     nodes: [config.sourceClusterUrl],
     sniffOnStart: false,
     auth: {
       username: config.sourceClusterUsername,
       password: config.sourceClusterPassword,
     },
-    Connection: HttpConnection,
+    Connection: Elasticsearch8HttpConnection,
     requestTimeout: 30_000,
   });
 };
@@ -79,6 +86,7 @@ export const buildArtifacts = async (config: TaskConfig) => {
       sourceClient,
       embeddingClient,
       log,
+      inferenceId: config.inferenceId ?? defaultInferenceEndpoints.ELSER,
     });
   }

@@ -93,18 +101,41 @@ const buildArtifact = async ({
   embeddingClient,
   sourceClient,
   log,
+  inferenceId,
 }: {
   productName: ProductName;
   stackVersion: string;
   buildFolder: string;
   targetFolder: string;
-  sourceClient: Client;
+  sourceClient: ElasticsearchClient8;
   embeddingClient: Client;
   log: ToolingLog;
+  inferenceId: string;
 }) => {
-  log.info(`Starting building artifact for product [${productName}] and version [${stackVersion}]`);
+  log.info(
+    `Starting building artifact for product [${productName}] and version [${stackVersion}] with inference id [${inferenceId}]`
+  );

-  const targetIndex = getTargetIndexName({ productName, stackVersion });
+  const semanticTextMapping = getSemanticTextMapping(inferenceId);
+
+  log.info(
+    `Detected semantic text mapping for Inference ID ${inferenceId}:\n ${JSON.stringify(
+      semanticTextMapping,
+      null,
+      2
+    )}`
+  );
+
+  const targetIndex = getTargetIndexName({
+    productName,
+    stackVersion,
+    inferenceId: semanticTextMapping?.inference_id,
+  });
+  await deleteIndex({
+    indexName: targetIndex,
+    client: embeddingClient,
+    log,
+  });

   let documents = await extractDocumentation({
     client: sourceClient,
@@ -119,6 +150,7 @@ const buildArtifact = async ({
   await createTargetIndex({
     client: embeddingClient,
     indexName: targetIndex,
+    semanticTextMapping,
   });

   await indexDocuments({
@@ -142,12 +174,7 @@ const buildArtifact = async ({
     productName,
     stackVersion,
     log,
-  });
-
-  await deleteIndex({
-    indexName: targetIndex,
-    client: embeddingClient,
-    log,
+    semanticTextMapping,
   });

   log.info(`Finished building artifact for product [${productName}] and version [${stackVersion}]`);
@@ -156,9 +183,13 @@ const buildArtifact = async ({
 const getTargetIndexName = ({
   productName,
   stackVersion,
+  inferenceId,
 }: {
   productName: string;
   stackVersion: string;
+  inferenceId?: string;
 }) => {
-  return `kb-artifact-builder-${productName}-${stackVersion}`.toLowerCase();
+  return `kb-artifact-builder-${productName}-${stackVersion}${
+    inferenceId ? `-${inferenceId}` : ''
+  }`.toLowerCase();
 };
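
To make the effect of the new `getTargetIndexName` concrete, here is a small standalone sketch that copies the function from the diff and prints the index names it produces. The product name and stack version are example inputs taken from the README command; nothing else is assumed.

```typescript
// Copied from the diff so the example runs on its own.
const getTargetIndexName = ({
  productName,
  stackVersion,
  inferenceId,
}: {
  productName: string;
  stackVersion: string;
  inferenceId?: string;
}): string => {
  return `kb-artifact-builder-${productName}-${stackVersion}${
    inferenceId ? `-${inferenceId}` : ''
  }`.toLowerCase();
};

// Without an inference ID the name matches the pre-change format.
const plainIndex = getTargetIndexName({ productName: 'Security', stackVersion: '8.18' });

// With an inference ID the ID is appended after a single dash.
const e5Index = getTargetIndexName({
  productName: 'security',
  stackVersion: '8.18',
  inferenceId: '.multilingual-e5-small-elasticsearch',
});

console.log(plainIndex); // kb-artifact-builder-security-8.18
console.log(e5Index); // kb-artifact-builder-security-8.18-.multilingual-e5-small-elasticsearch
```

Note that `buildArtifact` passes `semanticTextMapping?.inference_id`, and `getSemanticTextMapping` either returns a mapping or throws, so as far as this diff shows the builder's temporary index carries a suffix for ELSER builds as well.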

x-pack/packages/ai-infra/product-doc-artifact-builder/src/command.ts

Lines changed: 5 additions & 0 deletions
@@ -71,6 +71,10 @@ function options(y: yargs.Argv) {
       demandOption: true,
       default: process.env.KIBANA_EMBEDDING_CLUSTER_PASSWORD,
     })
+    .option('inferenceId', {
+      describe: 'The inference id to use for the artifacts',
+      string: true,
+    })
     .locale('en');
 }

@@ -89,6 +93,7 @@ export function runScript() {
         embeddingClusterUrl: argv.embeddingClusterUrl!,
         embeddingClusterUsername: argv.embeddingClusterUsername!,
         embeddingClusterPassword: argv.embeddingClusterPassword!,
+        inferenceId: argv.inferenceId,
       };

       return buildArtifacts(taskConfig);

x-pack/packages/ai-infra/product-doc-artifact-builder/src/tasks/create_artifact.ts

Lines changed: 7 additions & 2 deletions
@@ -15,28 +15,32 @@ import {
 } from '@kbn/product-doc-common';
 import { getArtifactMappings } from '../artifact/mappings';
 import { getArtifactManifest } from '../artifact/manifest';
-import { DEFAULT_ELSER } from './create_index';
+import { DEFAULT_ELSER, SemanticTextMapping } from './create_index';

 export const createArtifact = async ({
   productName,
   stackVersion,
   buildFolder,
   targetFolder,
   log,
+  semanticTextMapping,
 }: {
   buildFolder: string;
   targetFolder: string;
   productName: ProductName;
   stackVersion: string;
   log: ToolingLog;
+  semanticTextMapping?: SemanticTextMapping;
 }) => {
   log.info(
     `Starting to create artifact from build folder [${buildFolder}] into target [${targetFolder}]`
   );

   const zip = new AdmZip();

-  const mappings = getArtifactMappings(DEFAULT_ELSER);
+  const inferenceId = semanticTextMapping?.inference_id || DEFAULT_ELSER;
+
+  const mappings = getArtifactMappings(semanticTextMapping);
   const mappingFileContent = JSON.stringify(mappings, undefined, 2);
   zip.addFile('mappings.json', Buffer.from(mappingFileContent, 'utf-8'));

@@ -53,6 +57,7 @@ export const createArtifact = async ({
   const artifactName = getArtifactName({
     productName,
     productVersion: stackVersion,
+    inferenceId,
   });
   zip.writeZip(Path.join(targetFolder, artifactName));

x-pack/packages/ai-infra/product-doc-artifact-builder/src/tasks/create_index.ts

Lines changed: 46 additions & 23 deletions
@@ -6,43 +6,66 @@
  */

 import type { Client } from '@elastic/elasticsearch';
-import type { MappingTypeMapping } from '@elastic/elasticsearch/lib/api/types';
+import { getArtifactMappings } from '../artifact/mappings';

 export const DEFAULT_ELSER = '.elser-2-elasticsearch';
+export const DEFAULT_E5_SMALL = '.multilingual-e5-small-elasticsearch';

-const mappings: MappingTypeMapping = {
-  dynamic: 'strict',
-  properties: {
-    content_title: { type: 'text' },
-    content_body: {
-      type: 'semantic_text',
-      inference_id: DEFAULT_ELSER,
-    },
-    product_name: { type: 'keyword' },
-    root_type: { type: 'keyword' },
-    slug: { type: 'keyword' },
-    url: { type: 'keyword' },
-    version: { type: 'version' },
-    ai_subtitle: { type: 'text' },
-    ai_summary: {
-      type: 'semantic_text',
-      inference_id: DEFAULT_ELSER,
-    },
-    ai_questions_answered: {
-      type: 'semantic_text',
-      inference_id: DEFAULT_ELSER,
+interface BaseSemanticTextMapping {
+  type: 'semantic_text';
+  inference_id: string;
+}
+export interface SemanticTextMapping extends BaseSemanticTextMapping {
+  model_settings?: {
+    service?: string;
+    task_type?: string;
+    dimensions?: number;
+    similarity?: string;
+    element_type?: string;
+  };
+}
+
+type SupportedInferenceId = typeof DEFAULT_E5_SMALL | typeof DEFAULT_ELSER;
+const isSupportedInferenceId = (inferenceId: string): inferenceId is SupportedInferenceId => {
+  return inferenceId === DEFAULT_E5_SMALL || inferenceId === DEFAULT_ELSER;
+};
+
+const INFERENCE_ID_TO_SEMANTIC_TEXT_MAPPING: Record<SupportedInferenceId, SemanticTextMapping> = {
+  [DEFAULT_E5_SMALL]: {
+    type: 'semantic_text',
+    inference_id: DEFAULT_E5_SMALL,
+    model_settings: {
+      service: 'elasticsearch',
+      task_type: 'text_embedding',
+      dimensions: 384,
+      similarity: 'cosine',
+      element_type: 'float',
     },
-    ai_tags: { type: 'keyword' },
   },
+  [DEFAULT_ELSER]: {
+    type: 'semantic_text',
+    inference_id: DEFAULT_ELSER,
+  },
+};
+export const getSemanticTextMapping = (inferenceId: string): SemanticTextMapping => {
+  if (isSupportedInferenceId(inferenceId)) {
+    return INFERENCE_ID_TO_SEMANTIC_TEXT_MAPPING[inferenceId];
+  }
+  throw new Error(`Semantic text mapping for Inference ID ${inferenceId} not found`);
 };

 export const createTargetIndex = async ({
   indexName,
   client,
+  semanticTextMapping,
 }: {
   indexName: string;
   client: Client;
+  semanticTextMapping?: SemanticTextMapping;
 }) => {
+  const mappings = semanticTextMapping
+    ? getArtifactMappings(semanticTextMapping)
+    : getArtifactMappings(getSemanticTextMapping(DEFAULT_ELSER));
   await client.indices.create({
     index: indexName,
     mappings,
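
The lookup added in `create_index.ts` only knows two inference IDs and throws for anything else. The sketch below reproduces that lookup in condensed form so it can run standalone, and adds a hypothetical `describeMapping` wrapper (not part of the commit) to show how a caller might handle the unsupported case.

```typescript
// Condensed copy of the lookup from the diff, so the example is self-contained.
const DEFAULT_ELSER = '.elser-2-elasticsearch';
const DEFAULT_E5_SMALL = '.multilingual-e5-small-elasticsearch';

interface SemanticTextMapping {
  type: 'semantic_text';
  inference_id: string;
  model_settings?: {
    service?: string;
    task_type?: string;
    dimensions?: number;
    similarity?: string;
    element_type?: string;
  };
}

type SupportedInferenceId = typeof DEFAULT_E5_SMALL | typeof DEFAULT_ELSER;

const isSupportedInferenceId = (id: string): id is SupportedInferenceId =>
  id === DEFAULT_E5_SMALL || id === DEFAULT_ELSER;

const INFERENCE_ID_TO_SEMANTIC_TEXT_MAPPING: Record<SupportedInferenceId, SemanticTextMapping> = {
  [DEFAULT_E5_SMALL]: {
    type: 'semantic_text',
    inference_id: DEFAULT_E5_SMALL,
    model_settings: {
      service: 'elasticsearch',
      task_type: 'text_embedding',
      dimensions: 384,
      similarity: 'cosine',
      element_type: 'float',
    },
  },
  [DEFAULT_ELSER]: { type: 'semantic_text', inference_id: DEFAULT_ELSER },
};

const getSemanticTextMapping = (inferenceId: string): SemanticTextMapping => {
  if (isSupportedInferenceId(inferenceId)) {
    return INFERENCE_ID_TO_SEMANTIC_TEXT_MAPPING[inferenceId];
  }
  throw new Error(`Semantic text mapping for Inference ID ${inferenceId} not found`);
};

// Hypothetical caller, for illustration only: summarize a mapping or report the failure.
function describeMapping(inferenceId: string): string {
  try {
    const dims = getSemanticTextMapping(inferenceId).model_settings?.dimensions;
    return dims === undefined
      ? `${inferenceId}: no explicit model_settings`
      : `${inferenceId}: ${dims} dimensions`;
  } catch (e) {
    return `unsupported: ${(e as Error).message}`;
  }
}

console.log(describeMapping(DEFAULT_ELSER));
console.log(describeMapping(DEFAULT_E5_SMALL));
console.log(describeMapping('.my-custom-endpoint')); // placeholder ID, not a real endpoint
```

One consequence of this design is that passing any other `--inference-id` to the build script fails early in `buildArtifact`, before any index is created, rather than producing an artifact with an unknown mapping.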
