# OpenSearch

LangStream supports using OpenSearch as a vector database.

Learn more about performing vector search with OpenSearch in the [official documentation](https://opensearch.org/docs/latest/search-plugins/knn/index/).

> Only OpenSearch 2.x is officially supported.
### Connecting to OpenSearch

Create a `vector-database` resource in your `configuration.yaml` file.
A single resource is bound to a single index.

```yaml
resources:
  - type: "vector-database"
    name: "OpenSearch"
    configuration:
      service: "opensearch"
      username: "${secrets.opensearch.username}"
      password: "${secrets.opensearch.password}"
      host: "${secrets.opensearch.host}"
      port: "${secrets.opensearch.port}"
      index-name: "my-index-000"
```
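
As a rough mental model, the resource above identifies one index, reached over the OpenSearch REST API with HTTP basic authentication. The sketch below assembles the index URL and `Authorization` header from the same values. The `https` scheme and the helper name `index_endpoint` are assumptions for illustration, not LangStream's implementation.

```python
import base64
from urllib.parse import quote


def index_endpoint(host: str, port: int, index_name: str,
                   username: str, password: str) -> tuple[str, dict[str, str]]:
    """Hypothetical helper: build the index URL and a basic-auth header.

    Assumes the cluster is reached over HTTPS; this is not LangStream's code.
    """
    url = f"https://{host}:{port}/{quote(index_name)}"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return url, {"Authorization": f"Basic {token}"}


url, headers = index_endpoint("localhost", 9200, "my-index-000", "admin", "admin")
print(url)      # https://localhost:9200/my-index-000
print(headers)  # {'Authorization': 'Basic YWRtaW46YWRtaW4='}
```
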
### Connecting to AWS OpenSearch Service

For AWS OpenSearch Service, the resource specifies a `region` and omits the `port`:

```yaml
resources:
  - type: "vector-database"
    name: "OpenSearch"
    configuration:
      service: "opensearch"
      username: "${secrets.opensearch.username}"
      password: "${secrets.opensearch.password}"
      host: "${secrets.opensearch.host}"
      region: "${secrets.opensearch.region}"
      index-name: "my-index-000"
```

- `username` is the AWS access key ID.
- `password` is the AWS secret access key.
- `host` is the endpoint provided by AWS. For example, an AWS OpenSearch Serverless endpoint looks like `xxxx.<region>.aoss.amazonaws.com`.
- `region` is the AWS region. It must match the region in the endpoint.
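
Because `region` must agree with the endpoint, a quick pre-flight check can catch typos. The sketch below is a hypothetical helper, not part of LangStream. It assumes the Serverless endpoint shape shown above and the `<domain>.<region>.es.amazonaws.com` shape used by managed domains.

```python
from typing import Optional


def region_from_endpoint(host: str) -> Optional[str]:
    """Extract the region from an AWS OpenSearch endpoint, or None if the
    host does not look like one.

    Assumed endpoint shapes:
      <id>.<region>.aoss.amazonaws.com    (OpenSearch Serverless)
      <domain>.<region>.es.amazonaws.com  (managed domains)
    """
    host = host.lower().removeprefix("https://").rstrip("/")
    parts = host.split(".")
    if len(parts) >= 5 and parts[-2:] == ["amazonaws", "com"] and parts[-3] in ("aoss", "es"):
        return parts[-4]
    return None


def check_region(host: str, region: str) -> None:
    """Raise ValueError if `region` disagrees with the region embedded in `host`."""
    found = region_from_endpoint(host)
    if found is not None and found != region:
        raise ValueError(f"region {region!r} does not match endpoint region {found!r}")


check_region("abc123xyz.us-east-1.aoss.amazonaws.com", "us-east-1")  # passes silently
```
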
### Declare an index as an asset

To have the application create the OpenSearch index at startup, declare an asset of type `opensearch-index`.

You can configure `settings` and `mappings` freely. No other configuration fields are supported.

The following example mixes a regular text field with a vector field. The `knn` plugin must be installed on the target OpenSearch instance.
```yaml
- name: "os-index"
  asset-type: "opensearch-index"
  creation-mode: create-if-not-exists
  config:
    datasource: "OpenSearch"
    settings: |
      {
        "index": {
          "knn": true,
          "knn.algo_param.ef_search": 100
        }
      }
    mappings: |
      {
        "properties": {
          "content": {
            "type": "text"
          },
          "embeddings": {
            "type": "knn_vector",
            "dimension": 1536
          }
        }
      }
```

Refer to the [settings](https://opensearch.org/docs/latest/im-plugin/index-settings/) documentation for the `settings` field.
Refer to the [mappings](https://opensearch.org/docs/latest/field-types/index/) documentation for the `mappings` field.
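
The `settings` and `mappings` strings correspond to the two top-level keys of an OpenSearch create-index request body (`PUT /<index-name>`). The sketch below combines them and mimics `creation-mode: create-if-not-exists` against an in-memory stand-in for the cluster. `create_index_body` and `create_if_not_exists` are hypothetical helpers that do not call OpenSearch. The check that each `knn_vector` field declares a `dimension` (or a `model_id`) is included only as an illustration.

```python
import json
from typing import Any, Callable


def create_index_body(settings_json: str, mappings_json: str) -> dict[str, Any]:
    """Combine the asset's `settings` and `mappings` strings into one
    create-index request body."""
    body = {"settings": json.loads(settings_json), "mappings": json.loads(mappings_json)}
    # A knn_vector field needs a dimension (or a trained model_id).
    for name, field in body["mappings"].get("properties", {}).items():
        if field.get("type") == "knn_vector" and "dimension" not in field and "model_id" not in field:
            raise ValueError(f"knn_vector field {name!r} has no dimension")
    return body


def create_if_not_exists(index_exists: Callable[[str], bool],
                         create: Callable[[str, dict[str, Any]], None],
                         index_name: str, body: dict[str, Any]) -> bool:
    """Create the index only when it is absent. Returns True if it was created."""
    if index_exists(index_name):
        return False
    create(index_name, body)
    return True


SETTINGS = '{"index": {"knn": true, "knn.algo_param.ef_search": 100}}'
MAPPINGS = ('{"properties": {"content": {"type": "text"}, '
            '"embeddings": {"type": "knn_vector", "dimension": 1536}}}')

indexes: dict[str, dict[str, Any]] = {}  # in-memory stand-in for the cluster
body = create_index_body(SETTINGS, MAPPINGS)
print(create_if_not_exists(indexes.__contains__, indexes.__setitem__, "my-index-000", body))  # True
print(create_if_not_exists(indexes.__contains__, indexes.__setitem__, "my-index-000", body))  # False
```
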
### Search

Use the `query-vector-db` agent with the following parameters to search the index created above:
```yaml
  - name: "lookup-related-documents"
    type: "query-vector-db"
    configuration:
      datasource: "OpenSearch"
      query: |
        {
          "size": 1,
          "query": {
            "knn": {
              "embeddings": {
                "vector": ?,
                "k": 1
              }
            }
          }
        }
      fields:
        - "value.question_embeddings"
      output-field: "value.related_documents"
```
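
Use `?` as a placeholder in the query. Each `?` is replaced, in order, by the value of the corresponding entry in `fields`. In this example, `?` receives the `value.question_embeddings` vector.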
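
To make the placeholder mechanism concrete, here is a minimal sketch of ordered substitution. Each `?` outside a JSON string literal is replaced by the JSON encoding of the next value. `fill_placeholders` is a hypothetical name, and LangStream's actual implementation may differ, for example in how it treats a `?` inside a quoted string.

```python
import json
from typing import Any

_MISSING = object()  # sentinel so that None is still a valid value


def fill_placeholders(template: str, values: list[Any]) -> str:
    """Replace each `?` outside JSON string literals with the next value,
    JSON-encoded. Hypothetical helper, not LangStream's implementation."""
    out: list[str] = []
    remaining = iter(values)
    in_string = False
    escaped = False
    for ch in template:
        if in_string:
            # Copy string contents verbatim, tracking escapes and the closing quote.
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == "?":
            value = next(remaining, _MISSING)
            if value is _MISSING:
                raise ValueError("more placeholders than values")
            out.append(json.dumps(value))
        else:
            out.append(ch)
    if next(remaining, _MISSING) is not _MISSING:
        raise ValueError("more values than placeholders")
    return "".join(out)


query = '{"size": 1, "query": {"knn": {"embeddings": {"vector": ?, "k": 1}}}}'
filled = fill_placeholders(query, [[0.1, 0.2, 0.3]])
print(json.loads(filled)["query"]["knn"]["embeddings"]["vector"])  # [0.1, 0.2, 0.3]
```
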
The `query` is the request body sent to OpenSearch. Refer to the [Query DSL documentation](https://opensearch.org/docs/latest/query-dsl/index/) for the supported parameters.
The query always runs against the configured index. Multi-index queries are not supported, but you can declare multiple datasources to query different indexes in the same application.

The `output-field` contains the query result: an array of objects, each with the following fields:
- `id`: the document ID
- `document`: the document source
- `score`: the document score
- `index`: the index name
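
Each result element appears to correspond to one hit in a raw OpenSearch search response (`hits.hits[]`, with `_id`, `_source`, `_score`, and `_index`). The sketch below shows that mapping under this assumption. `to_results` is a hypothetical helper and may not match how LangStream builds the array internally.

```python
from typing import Any


def to_results(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Map a raw search response to the result shape described above.
    Hypothetical helper; assumes one result per hit."""
    return [
        {
            "id": hit["_id"],
            "document": hit.get("_source", {}),
            "score": hit.get("_score"),  # may be null, e.g. when sorting by a field
            "index": hit["_index"],
        }
        for hit in response.get("hits", {}).get("hits", [])
    ]


response = {
    "hits": {
        "hits": [
            {"_index": "my-index-000", "_id": "report.pdf0", "_score": 0.92,
             "_source": {"content": "OpenSearch supports k-NN search."}}
        ]
    }
}
results = to_results(response)
print(results[0]["id"], results[0]["score"])  # report.pdf0 0.92
```
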
For example, to keep only one field from the first result, set `only-first: true` so that the output field holds the first result instead of an array, then add a `compute` agent after the search:
```yaml
  - name: "lookup-related-documents"
    type: "query-vector-db"
    configuration:
      datasource: "OpenSearch"
      query: |
        {
          "size": 1,
          "query": {
            "match_all": {}
          }
        }
      output-field: "value.related_documents"
      only-first: true
  - name: "Format response"
    type: compute
    configuration:
      fields:
        - name: "value"
          type: STRING
          expression: "value.related_documents.document.content"
```

### Indexing

Use the `vector-db-sink` agent to index data, with the following parameters:
```yaml
  - name: "Write to OpenSearch"
    type: "vector-db-sink"
    input: chunks-topic
    configuration:
      datasource: "OpenSearch"
      bulk-parameters:
        timeout: 2m
      fields:
        - name: "id"
          expression: "fn:concat(value.filename, value.chunk_id)"
        - name: "embeddings"
          expression: "fn:toListOfFloat(value.embeddings_vector)"
        - name: "content"
          expression: "value.text"
```
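
For intuition about what reaches the cluster, the sketch below builds an OpenSearch `_bulk` request: `bulk-parameters` become URL query parameters, and each record becomes an `index` action line followed by a source line in NDJSON. `bulk_url` and `bulk_body` are hypothetical helpers. Treating the `id` field as the document `_id` and leaving it out of the stored source is an assumption about the sink's behavior.

```python
import json
from typing import Any
from urllib.parse import urlencode


def bulk_url(base_url: str, bulk_parameters: dict[str, str]) -> str:
    """Append the bulk parameters as URL query parameters of the _bulk endpoint."""
    query = urlencode(bulk_parameters)
    return f"{base_url}/_bulk" + (f"?{query}" if query else "")


def bulk_body(index_name: str, records: list[dict[str, Any]]) -> str:
    """Build an NDJSON _bulk body with one `index` action per record.

    Assumption: the `id` field becomes the document `_id` and is left out of
    the stored source; the real sink may handle it differently.
    """
    lines: list[str] = []
    for record in records:
        source = {key: value for key, value in record.items() if key != "id"}
        lines.append(json.dumps({"index": {"_index": index_name, "_id": record["id"]}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the Bulk API requires a final newline


print(bulk_url("https://localhost:9200", {"timeout": "2m"}))
# https://localhost:9200/_bulk?timeout=2m
print(bulk_body("my-index-000", [{"id": "report.pdf0", "embeddings": [0.1, 0.2]}]), end="")
# {"index": {"_index": "my-index-000", "_id": "report.pdf0"}}
# {"embeddings": [0.1, 0.2]}
```
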
All indexing uses the OpenSearch Bulk API.
You can customize the [bulk parameters](https://opensearch.org/docs/latest/api-reference/document-apis/bulk/#url-parameters) with the `bulk-parameters` property.

Pending requests are flushed according to the `flush-interval` and `batch-size` parameters.
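
One common way to combine the two thresholds is a buffer that flushes when `batch-size` documents are pending or when `flush-interval` has elapsed since the first pending document, whichever happens first. The sketch below assumes that behavior and measures the interval in seconds. `BulkBuffer` is a hypothetical class, not LangStream's, and the sink's actual units and timing may differ.

```python
import time
from typing import Any, Callable, Optional


class BulkBuffer:
    """Hypothetical buffer that flushes when `batch_size` documents are pending
    or `flush_interval` seconds have passed since the first pending document."""

    def __init__(self, flush: Callable[[list[Any]], None], batch_size: int,
                 flush_interval: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._clock = clock
        self._pending: list[Any] = []
        self._first_at: Optional[float] = None

    def add(self, doc: Any) -> None:
        if not self._pending:
            self._first_at = self._clock()
        self._pending.append(doc)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def tick(self) -> None:
        """Call periodically; flushes if the oldest pending document is too old."""
        if self._pending and self._clock() - self._first_at >= self._flush_interval:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            batch, self._pending = self._pending, []
            self._first_at = None
            self._flush(batch)


now = [0.0]  # fake clock for a deterministic demo
sent: list[list[Any]] = []
buf = BulkBuffer(sent.append, batch_size=2, flush_interval=1.0, clock=lambda: now[0])
buf.add("a")
buf.add("b")  # batch size reached: flushes ["a", "b"]
buf.add("c")
now[0] = 1.5
buf.tick()    # interval elapsed: flushes ["c"]
print(sent)   # [['a', 'b'], ['c']]
```
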
### Configuration

Check out the full configuration properties in the [API Reference page](../../building-applications/api-reference/resources.md#datasource_opensearch).