articles/search/search-howto-complex-data-types.md
Lines changed: 72 additions & 60 deletions
Original file line number
Diff line number
Diff line change
@@ -11,12 +11,12 @@ ms.custom:
11
11
- ignite-2023
12
12
ms.service: azure-ai-search
13
13
ms.topic: how-to
14
-
ms.date: 01/18/2024
14
+
ms.date: 10/14/2024
15
15
---
16
16
17
17
# Model complex data types in Azure AI Search
18
18
19
-
External datasets used to populate an Azure AI Search index can come in many shapes. Sometimes they include hierarchical or nested substructures. Examples might include multiple addresses for a single customer, multiple colors and sizes for a single SKU, multiple authors of a single book, and so on. In modeling terms, you might see these structures referred to as *complex*, *compound*, *composite*, or *aggregate* data types. The term Azure AI Search uses for this concept is **complex type**. In Azure AI Search, complex types are modeled using **complex fields**. A complex field is a field that contains children (subfields) which can be of any data type, including other complex types. This works in a similar way as structured data types in a programming language.
19
+
External datasets used to populate an Azure AI Search index can come in many shapes. Sometimes they include hierarchical or nested substructures. Examples might include multiple addresses for a single customer, multiple colors and sizes for a single product, multiple authors of a single book, and so on. In modeling terms, you might see these structures referred to as *complex*, *compound*, *composite*, or *aggregate* data types. The term Azure AI Search uses for this concept is **complex type**. In Azure AI Search, complex types are modeled using **complex fields**. A complex field is a field that contains children (subfields) which can be of any data type, including other complex types. This works in a similar way as structured data types in a programming language.
20
20
21
21
Complex fields represent either a single object in the document, or an array of objects, depending on the data type. Fields of type `Edm.ComplexType` represent single objects, while fields of type `Collection(Edm.ComplexType)` represent arrays of objects.
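As a rough sketch of the two shapes described above, the following Python fragment builds a field list in the style of an index definition. The field and subfield names (`HotelId`, `Address`, `Rooms`, `City`, `StateProvince`, `Type`, `BaseRate`) are illustrative assumptions modeled on the Hotels example, and `complex_field_kinds` is a hypothetical helper, not part of any SDK.

```python
import json

# Illustrative index fragment (names are assumptions, not a full schema):
# "Address" is a single object, "Rooms" is an array of objects.
fields = [
    {"name": "HotelId", "type": "Edm.String", "key": True},
    {
        "name": "Address",
        "type": "Edm.ComplexType",
        "fields": [
            {"name": "City", "type": "Edm.String", "filterable": True},
            {"name": "StateProvince", "type": "Edm.String", "filterable": True},
        ],
    },
    {
        "name": "Rooms",
        "type": "Collection(Edm.ComplexType)",
        "fields": [
            {"name": "Type", "type": "Edm.String", "searchable": True},
            {"name": "BaseRate", "type": "Edm.Double", "filterable": True},
        ],
    },
]


def complex_field_kinds(field_list):
    """Map each complex field name to the shape its data type implies."""
    kinds = {}
    for field in field_list:
        if field["type"] == "Edm.ComplexType":
            kinds[field["name"]] = "object"
        elif field["type"] == "Collection(Edm.ComplexType)":
            kinds[field["name"]] = "array of objects"
    return kinds


print(json.dumps(complex_field_kinds(fields), indent=2))
```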
22
22
@@ -61,12 +61,6 @@ The following JSON document is composed of simple fields and complex fields. Com
61
61
}
62
62
```
63
63
64
-
## Indexing complex types
65
-
66
-
During indexing, you can have a maximum of 3000 elements across all complex collections within a single document. An element of a complex collection is a member of that collection, so in the case of Rooms (the only complex collection in the Hotel example), each room is an element. In the example above, if the "Secret Point Motel" had 500 rooms, the hotel document would have 500 room elements. For nested complex collections, each nested element is also counted, in addition to the outer (parent) element.
67
-
68
-
This limit applies only to complex collections, and not complex types (like Address) or string collections (like Tags).
69
-
70
64
## Create complex fields
71
65
72
66
As with any index definition, you can use the portal, [REST API](/rest/api/searchservice/indexes/create), or [.NET SDK](/dotnet/api/azure.search.documents.indexes.models.searchindex) to create a schema that includes complex types.
During indexing, you can have a maximum of 3,000 elements across all complex collections within a single document. An element of a complex collection is a member of that collection. For Rooms (the only complex collection in the Hotel example), each room is an element. In the example above, if the "Secret Point Motel" had 500 rooms, the hotel document would have 500 room elements. For nested complex collections, each nested element is also counted, in addition to the outer (parent) element.
184
+
185
+
This limit applies only to complex collections, and not complex types (like Address) or string collections (like Tags).
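One way to check a document against this limit before indexing is to count the objects that sit inside JSON arrays. The sketch below is a client-side approximation of the rule as described here, not the service's own validation. The helper name `count_complex_elements` and the nested `Amenities` collection are assumptions made for illustration.

```python
MAX_COMPLEX_ELEMENTS = 3000  # limit stated in this article


def count_complex_elements(value):
    """Count objects that are members of JSON arrays, at any nesting depth."""
    if isinstance(value, dict):
        # A single complex object (like Address) isn't an element itself,
        # but it might contain complex collections.
        return sum(count_complex_elements(v) for v in value.values())
    if isinstance(value, list):
        total = 0
        for item in value:
            if isinstance(item, dict):
                # Count the element, plus any elements nested inside it.
                total += 1 + count_complex_elements(item)
            # Primitive items (for example, strings in Tags) aren't counted.
        return total
    return 0


hotel = {
    "HotelId": "1",
    "Tags": ["pool", "view"],             # string collection: not counted
    "Address": {"City": "Sarasota"},      # single complex type: not counted
    "Rooms": [
        {"Type": "Suite", "Amenities": [{"Name": "tv"}, {"Name": "coffee maker"}]},
        {"Type": "Standard"},
    ],
}

element_count = count_complex_elements(hotel)
within_limit = element_count <= MAX_COMPLEX_ELEMENTS
print(element_count, within_limit)
```

In this sample, the two rooms and the two nested amenities are all counted, while `Tags` and `Address` contribute nothing.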
186
+
187
187
## Update complex fields
188
188
189
-
All of the [reindexing rules](search-howto-reindex.md) that apply to fields in general still apply to complex fields. Restating a few of the main rules here, adding a field to a complex type doesn't require an index rebuild, but most modifications do.
189
+
All of the [reindexing rules](search-howto-reindex.md) that apply to fields in general still apply to complex fields. Adding a new field to a complex type doesn't require an index rebuild, but most other modifications do require a rebuild.
190
190
191
191
### Structural updates to the definition
192
192
@@ -198,7 +198,7 @@ Notice that within a complex type, each subfield has a type and can have attribu
198
198
199
199
Updating existing documents in an index with the `upload` action works the same way for complex and simple fields: all fields are replaced. However, `merge` (or `mergeOrUpload` when applied to an existing document) doesn't work the same across all fields. Specifically, `merge` doesn't support merging elements within a collection. This limitation exists for collections of primitive types and complex collections. To update a collection, you need to retrieve the full collection value, make changes, and then include the new collection in the Index API request.
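The retrieve, modify, and resend pattern can be sketched as follows. This example only builds the request body. It assumes the document was already retrieved (here a hard-coded stand-in), and the `HotelId` key and `Rooms` subfields are illustrative. `build_collection_update` is a hypothetical helper.

```python
import copy
import json


def build_collection_update(existing_doc, key_field, collection_field, change):
    """Build a mergeOrUpload body that carries the full, modified collection."""
    # Copy the whole collection so the original document stays untouched.
    new_collection = copy.deepcopy(existing_doc[collection_field])
    change(new_collection)
    return {
        "value": [
            {
                "@search.action": "mergeOrUpload",
                key_field: existing_doc[key_field],
                # The complete collection is sent, not only the changed element.
                collection_field: new_collection,
            }
        ]
    }


# Stand-in for a document retrieved from the index.
existing = {"HotelId": "1", "Rooms": [{"Type": "Standard", "BaseRate": 99.0}]}

payload = build_collection_update(
    existing,
    key_field="HotelId",
    collection_field="Rooms",
    change=lambda rooms: rooms.append({"Type": "Suite", "BaseRate": 199.0}),
)
print(json.dumps(payload, indent=2))
```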
200
200
201
-
## Search complex fields
201
+
## Search complex fields in text queries
202
202
203
203
Free-form search expressions work as expected with complex types. If any searchable field or subfield anywhere in a document matches, then the document itself is a match.
204
204
@@ -208,6 +208,51 @@ Queries get more nuanced when you have multiple terms and operators, and some te
208
208
209
209
Queries like this are *uncorrelated* for full-text search, unlike filters. In filters, queries over subfields of a complex collection are correlated using range variables in [`any` or `all`](search-query-odata-collection-operators.md). The Lucene query above returns documents containing both "Portland, Maine" and "Portland, Oregon", along with other cities in Oregon. This happens because each clause applies to all values of its field in the entire document, so there's no concept of a "current subdocument". For more information on this, see [Understanding OData collection filters in Azure AI Search](search-query-understand-collection-filters.md).
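The difference between uncorrelated and correlated matching can be simulated in plain Python. This isn't how the search engine is implemented. It's a simplified model that assumes a hypothetical `Addresses` complex collection with `City` and `StateProvince` subfields.

```python
def uncorrelated_match(doc, city, state):
    """Each clause is tested against all values of its field in the document."""
    addresses = doc["Addresses"]
    return (
        any(a["City"] == city for a in addresses)
        and any(a["StateProvince"] == state for a in addresses)
    )


def correlated_match(doc, city, state):
    """Both conditions must hold for the same element, as in a filter such as
    Addresses/any(a: a/City eq 'Portland' and a/StateProvince eq 'OR')."""
    return any(
        a["City"] == city and a["StateProvince"] == state
        for a in doc["Addresses"]
    )


docs = [
    {"Name": "A", "Addresses": [
        {"City": "Portland", "StateProvince": "ME"},
        {"City": "Salem", "StateProvince": "OR"},
    ]},
    {"Name": "B", "Addresses": [
        {"City": "Portland", "StateProvince": "OR"},
    ]},
]

uncorrelated = [d["Name"] for d in docs if uncorrelated_match(d, "Portland", "OR")]
correlated = [d["Name"] for d in docs if correlated_match(d, "Portland", "OR")]
print(uncorrelated, correlated)
```

Document A matches the uncorrelated check because "Portland" and "OR" each appear somewhere in the document, even though no single address is Portland, Oregon.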
210
210
211
+
## Search complex fields in RAG queries
212
+
213
+
A RAG pattern passes search results to a chat model for generative AI and conversational search. By default, search results passed to an LLM are a flattened rowset. However, if your index has complex types, your query can provide those fields if you first convert the search results output to JSON, and then pass the JSON to the LLM.
214
+
215
+
A partial example illustrates the technique:
216
+
217
+
+ Indicate the fields you want in the prompt or in the query
218
+
+ Make sure the fields are searchable and retrievable in the index
219
+
+ Select the fields for the search results
220
+
+ Format the results as JSON
221
+
+ Send the request for chat completion to the model provider
222
+
223
+
```python
224
+
import json
225
+
226
+
# Query is the question being asked. It's sent to the search engine and the LLM.
227
+
query="Can you recommend a few hotels that offer complimentary breakfast? Tell me their description, address, tags, and the rate for one room they have which sleep 4 people."
228
+
229
+
# Set up the search results and the chat thread.
230
+
# Retrieve the selected fields from the search index related to the question.
For the end-to-end example, see [Quickstart: Generative search (RAG) with grounding data from Azure AI Search](search-get-started-rag.md).
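The JSON formatting step from the list above can be sketched in isolation. The following example assumes the search results were already retrieved. `search_results` is a hard-coded stand-in, and the field names and `format_sources` helper are assumptions made for illustration. No search or chat completion call is shown.

```python
import json

# Stand-in for results returned by a search query (values are illustrative).
search_results = [
    {
        "@search.score": 1.0,
        "HotelName": "Secret Point Motel",
        "Description": "A hotel close to local attractions.",
        "Tags": ["pool", "free breakfast"],
        "Address": {"City": "New York", "StateProvince": "NY"},
        "Rooms": [{"Type": "Suite", "BaseRate": 199.0, "SleepsCount": 4}],
    }
]

selected_fields = ["HotelName", "Description", "Address", "Tags", "Rooms"]


def format_sources(results, fields):
    """Keep only the selected fields and serialize them, nesting intact."""
    trimmed = [
        {name: doc[name] for name in fields if name in doc}
        for doc in results
    ]
    return json.dumps(trimmed, indent=2)


query = "Which hotels offer complimentary breakfast?"
sources_json = format_sources(search_results, selected_fields)

# The JSON string becomes the grounding data in the prompt sent to the model.
prompt = f"Answer using only these sources.\nQuery: {query}\nSources:\n{sources_json}"
print(prompt)
```

Serializing with `json.dumps` keeps `Address` and `Rooms` as nested structures rather than flattening them into a single row.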
255
+
211
256
## Select complex fields
212
257
213
258
The `$select` parameter is used to choose which fields are returned in search results. To use this parameter to select specific subfields of a complex field, include the parent field and subfield separated by a slash (`/`).
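As a small illustration of the slash syntax, the following sketch joins field paths into a `$select` value. `build_select` is a hypothetical helper, and the field names are assumed from the Hotels example.

```python
def build_select(*paths):
    """Join field paths into a $select value; each path is a tuple of names."""
    # Parent and subfield are separated by a slash, fields by a comma.
    return ",".join("/".join(path) for path in paths)


select = build_select(("HotelName",), ("Address", "City"), ("Rooms", "BaseRate"))
print(select)  # HotelName,Address/City,Rooms/BaseRate
```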
@@ -244,15 +289,13 @@ To filter on a complex collection field, you can use a **lambda expression** wit
244
289
245
290
As with top-level simple fields, simple subfields of complex fields can only be included in filters if they have the **filterable** attribute set to `true` in the index definition. For more information, see the [Create Index API reference](/rest/api/searchservice/indexes/create).
246
291
247
-
Azure Search has the limitation that the complex objects in the collections across a single document cannot exceed 3000.
248
-
249
-
Users will encounter the below error during indexing when complex collections exceed the 3000 limit.
292
+
Azure AI Search limits the complex objects across all collections in a single document to 3,000. Exceeding this limit results in the following message:
250
293
251
-
“A collection in your document exceeds the maximum elements across all complex collections limit. The document with key '1052' has '4303' objects in collections (JSON arrays). At most '3000' objects are allowed to be in collections across the entire document. Remove objects from collections and try indexing the document again."
294
+
`"A collection in your document exceeds the maximum elements across all complex collections limit. The document with key '1052' has '4303' objects in collections (JSON arrays). At most '3000' objects are allowed to be in collections across the entire document. Remove objects from collections and try indexing the document again."`
252
295
253
-
In some use cases, we might need to add more than 3000 items to a collection. In those use cases, we can pipe (|) or use any form of delimiter to delimit the values, concatenate them, and store them as a delimited string. There is no limitation on the number of strings stored in an array in Azure Search. Storing these complex values as strings avoids the limitation. The customer needs to validate whether this workaround meets their scenario requirements.
296
+
If you need more than 3,000 items, you can pipe (`|`) or use any form of delimiter to delimit the values, concatenate them, and store them as a delimited string. There's no limitation on the number of strings stored in an array. Storing complex values as strings bypasses the complex collection limitation.
254
297
255
-
For example, it wouldn't be possible to use complex types if the "searchScope" array below had more than 3000 elements.
298
+
To illustrate, assume you have a `"searchScope"` array with more than 3,000 elements:
256
299
257
300
```json
258
301
@@ -267,10 +310,11 @@ For example, it wouldn't be possible to use complex types if the "searchScope" a
267
310
"productCode": 1235,
268
311
"categoryCode": "C200"
269
312
}
313
+
. . .
270
314
]
271
315
```
272
316
273
-
Storing these complex values as strings with a delimiter avoids the limitation
317
+
The workaround for storing the values as a delimited string might look like this:
274
318
275
319
```json
276
320
"searchScope": [
@@ -283,26 +327,10 @@ Storing these complex values as strings with a delimiter avoids the limitation
283
327
]
284
328
285
329
```
286
-
Rather than storing these with wildcards, we can also use a [custom analyzer](index-add-custom-analyzers.md) that splits the word into | to cut down on storage size.
287
-
288
-
The reason we have stored the values with wildcards instead of just storing them as below
289
-
290
-
>`|FRA|1234|C100|`
291
-
292
-
is to cater to search scenarios where the customer might want to search for items that have country France, irrespective of products and categories. Similarly, the customer might need to search to see if the item has product 1234, irrespective of the country or the category.
293
-
294
-
If we had stored only one entry
295
-
296
-
>`|FRA|1234|C100|`
297
-
298
-
without wildcards, if the user wants to filter only on France, we cannot convert the user input to match the "searchScope" array because we don't know what combination of France is present in our "searchScope" array
299
-
300
330
301
-
If the user wants to filter only by country, let's say France. We will take the user input and construct it as a string as below:
331
+
Storing all of the search variants in the delimited string is helpful in search scenarios where you want to search for items that have just "FRA" or "1234" or another combination within the array.
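At indexing time, the variants for one complex object could be generated along these lines. This is a sketch under assumptions: the three-part `|country|product|category|` layout follows the examples in this article, `scope_variants` is a hypothetical helper, and skipping the all-wildcard string is a design choice made here rather than a requirement.

```python
from itertools import product

WILDCARD = "*"


def scope_variants(country, product_code, category):
    """Return the delimited strings for every non-empty combination of values."""
    values = (str(country), str(product_code), str(category))
    variants = []
    # Each mask says which positions keep their value (True) or become "*".
    for mask in product((True, False), repeat=len(values)):
        if not any(mask):
            continue  # skip "|*|*|*|", which would match everything (assumption)
        parts = [v if keep else WILDCARD for v, keep in zip(values, mask)]
        variants.append("|" + "|".join(parts) + "|")
    return variants


search_scope = scope_variants("FRA", 1234, "C100")
for entry in search_scope:
    print(entry)
```

Each three-value object expands to seven strings, so the storage cost of the workaround grows with the number of parts in the delimited value.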
302
332
303
-
>`|FRA|*|*|`
304
-
305
-
which we can then use to filter in azure search as we search in an array of item values
333
+
Here's a filter formatting snippet in C# that converts inputs into searchable strings:
306
334
307
335
```csharp
308
336
foreach (var filterItem in filterCombinations)
@@ -312,39 +340,23 @@ foreach (var filterItem in filterCombinations)
312
340
}
313
341
314
342
```
315
-
Similarly, if the user searches for France and the 1234 product code, we will take the user input, construct it as a delimited string as below, and match it against our search array.
316
-
317
-
>`|FRA|1234|*|`
318
-
319
-
If the user searches for 1234 product code, we will take the user input, construct it as a delimited string as below, and match it against our search array.
320
-
321
-
>`|*|1234|*|`
322
-
323
-
If the user searches for the C100 category code, we will take the user input, construct it as a delimited string as below, and match it against our search array.
324
-
325
-
>`|*|*|C100|`
326
-
327
-
If the user searches for France and the 1234 product code and C100 category code, we will take the user input, construct it as a delimited string as below, and match it against our search array.
328
-
329
-
>`|FRA|1234|C100|`
330
-
331
-
If a user tries to search for countries not present in our list, it will not match the delimited array "searchScope" stored in the search index, and no results will be returned.
332
-
For example, a user searches for Canada and product code 1234. The user search would be converted to
333
343
334
-
>`|CAN|1234|*|`
344
+
The following list provides inputs and search strings (outputs) side by side:
335
345
336
-
This will not match any of the entries in the delimited array in our search index.
346
+
For "FRA" country code and the "1234" product code, the formatted output is ```|FRA|1234|*|```.
347
+
For "1234" product code, the formatted output is ```|*|1234|*|```.
348
+
For "C100" category code, the formatted output is ```|*|*|C100|```.
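The same input-to-output mapping can be sketched in Python. `format_scope` and `scope_filter` are hypothetical helpers. The filter expression assumes `searchScope` is a filterable `Collection(Edm.String)` field, which this article doesn't state explicitly.

```python
WILDCARD = "*"


def format_scope(country=None, product_code=None, category=None):
    """Build the delimited search string, using "*" for any missing input."""
    parts = [
        WILDCARD if value is None else str(value)
        for value in (country, product_code, category)
    ]
    return "|" + "|".join(parts) + "|"


def scope_filter(scope):
    """Wrap a search string in an OData collection filter (assumed field name)."""
    escaped = scope.replace("'", "''")  # OData escapes a single quote by doubling it
    return f"searchScope/any(s: s eq '{escaped}')"


print(format_scope("FRA", 1234))                # |FRA|1234|*|
print(format_scope(product_code=1234))          # |*|1234|*|
print(format_scope(category="C100"))            # |*|*|C100|
print(scope_filter(format_scope("FRA", 1234)))  # searchScope/any(s: s eq '|FRA|1234|*|')
```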
337
349
338
-
Only the above design choice requires this wild card entry; if it had been saved as a complex object, we could have simply performed an explicit search as shown below.
350
+
Only provide the wildcard placeholder entries if you're implementing the string array workaround. Otherwise, if you're using a complex type, your filter might look like this example:
var combinedCountryCategoryFilter = "(" + countryFilter + " and " + catgFilter + ")";
344
356
345
357
```
346
-
We can thus satisfy requirements where we need to search for a combination of values by storing it as a delimited string instead of a complex collection if our complex collections exceed the Azure Search limit. This is one of the workarounds, and the customer needs to validate if this would meet their scenario requirements.
347
358
359
+
If you implement the workaround, be sure to test extensively.
articles/search/tutorial-rag-build-solution-index-schema.md
Lines changed: 1 addition & 1 deletion
Original file line number
Diff line number
Diff line change
@@ -63,7 +63,7 @@ In Azure AI Search, an index that works best for RAG workloads has these qualiti
63
63
64
64
- Accommodates the queries you want to create. You should have fields for vector and hybrid content, and those fields should be attributed to support specific query behaviors, such as searchable or filterable. You can only query one index at a time (no joins) so your fields collection should define all of your searchable content.
65
65
66
-
- Your schema should be flat (no complex types or structures). This requirement is specific to the RAG pattern in Azure AI Search.
66
+
- Your schema should either be flat (no complex types or structures), or you should [format the complex type output as JSON](search-get-started-rag.md#send-a-complex-rag-query) before sending it to the LLM. This requirement is specific to the RAG pattern in Azure AI Search.
67
67
68
68
<!-- Although Azure AI Search can't join indexes, you can create indexes that preserve parent-child relationship, and then use sequential queries in your search logic to pull from both (a query on the chunked data index, a lookup on the parent index). This exercise includes templates for parent-child elements in the same index and in separate indexes, where information from the parent index is retrieved using a lookup query. -->