Commit 64fc9dd

committed
other complex type updates
1 parent b021232 commit 64fc9dd

File tree

2 files changed

+73
-61
lines changed

articles/search/search-howto-complex-data-types.md

Lines changed: 72 additions & 60 deletions
Original file line numberDiff line numberDiff line change
@@ -11,12 +11,12 @@ ms.custom:
1111
- ignite-2023
1212
ms.service: azure-ai-search
1313
ms.topic: how-to
14-
ms.date: 01/18/2024
14+
ms.date: 10/14/2024
1515
---
1616

1717
# Model complex data types in Azure AI Search
1818

19-
External datasets used to populate an Azure AI Search index can come in many shapes. Sometimes they include hierarchical or nested substructures. Examples might include multiple addresses for a single customer, multiple colors and sizes for a single SKU, multiple authors of a single book, and so on. In modeling terms, you might see these structures referred to as *complex*, *compound*, *composite*, or *aggregate* data types. The term Azure AI Search uses for this concept is **complex type**. In Azure AI Search, complex types are modeled using **complex fields**. A complex field is a field that contains children (subfields) which can be of any data type, including other complex types. This works in a similar way as structured data types in a programming language.
19+
External datasets used to populate an Azure AI Search index can come in many shapes. Sometimes they include hierarchical or nested substructures. Examples might include multiple addresses for a single customer, multiple colors and sizes for a single product, multiple authors of a single book, and so on. In modeling terms, you might see these structures referred to as *complex*, *compound*, *composite*, or *aggregate* data types. The term Azure AI Search uses for this concept is **complex type**. In Azure AI Search, complex types are modeled using **complex fields**. A complex field is a field that contains children (subfields), which can be of any data type, including other complex types. This works in a similar way to structured data types in a programming language.
2020

2121
Complex fields represent either a single object in the document, or an array of objects, depending on the data type. Fields of type `Edm.ComplexType` represent single objects, while fields of type `Collection(Edm.ComplexType)` represent arrays of objects.
2222

@@ -61,12 +61,6 @@ The following JSON document is composed of simple fields and complex fields. Com
6161
}
6262
```
6363

64-
## Indexing complex types
65-
66-
During indexing, you can have a maximum of 3000 elements across all complex collections within a single document. An element of a complex collection is a member of that collection, so in the case of Rooms (the only complex collection in the Hotel example), each room is an element. In the example above, if the "Secret Point Motel" had 500 rooms, the hotel document would have 500 room elements. For nested complex collections, each nested element is also counted, in addition to the outer (parent) element.
67-
68-
This limit applies only to complex collections, and not complex types (like Address) or string collections (like Tags).
69-
7064
## Create complex fields
7165

7266
As with any index definition, you can use the portal, [REST API](/rest/api/searchservice/indexes/create), or [.NET SDK](/dotnet/api/azure.search.documents.indexes.models.searchindex) to create a schema that includes complex types.
@@ -184,9 +178,15 @@ namespace AzureSearch.SDKHowTo
184178

185179
---
186180

181+
### Complex collection limits
182+
183+
During indexing, you can have a maximum of 3,000 elements across all complex collections within a single document. An element of a complex collection is a member of that collection. For Rooms (the only complex collection in the Hotel example), each room is an element. In the example above, if the "Secret Point Motel" had 500 rooms, the hotel document would have 500 room elements. For nested complex collections, each nested element is also counted, in addition to the outer (parent) element.
184+
185+
This limit applies only to complex collections, and not complex types (like Address) or string collections (like Tags).
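
To check a document against this limit before indexing, you could count its elements on the client side. The following Python sketch is only an approximation of the counting rule described above: it counts every object that is a member of an array, including objects in nested arrays, and ignores single complex objects and primitive collections. The function name `count_complex_elements` and the abbreviated hotel document are hypothetical and used for illustration only.

```python
from typing import Any

COMPLEX_COLLECTION_LIMIT = 3000  # limit stated in the text above


def count_complex_elements(value: Any) -> int:
    """Count objects that are members of arrays anywhere inside a document.

    Each object in an array counts as one element. Objects in nested
    arrays are counted in addition to their parent element.
    """
    if isinstance(value, dict):
        # A single complex object (like Address) isn't an element itself,
        # but it might contain complex collections.
        return sum(count_complex_elements(v) for v in value.values())
    if isinstance(value, list):
        total = 0
        for item in value:
            if isinstance(item, dict):
                total += 1 + count_complex_elements(item)
            # Primitive members (such as the strings in Tags) aren't counted.
        return total
    return 0


# Abbreviated, hypothetical version of the hotel document.
hotel = {
    "HotelId": "1",
    "HotelName": "Secret Point Motel",
    "Tags": ["pool", "view"],
    "Address": {"City": "New York"},
    "Rooms": [
        {"Type": "Deluxe Room", "BaseRate": 289.99},
        {"Type": "Budget Room", "BaseRate": 96.99},
    ],
}

element_count = count_complex_elements(hotel)
print(element_count, element_count <= COMPLEX_COLLECTION_LIMIT)
```

In this sketch, only the two entries in `Rooms` are counted. `Address` and `Tags` don't contribute to the total.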
186+
187187
## Update complex fields
188188

189-
All of the [reindexing rules](search-howto-reindex.md) that apply to fields in general still apply to complex fields. Restating a few of the main rules here, adding a field to a complex type doesn't require an index rebuild, but most modifications do.
189+
All of the [reindexing rules](search-howto-reindex.md) that apply to fields in general still apply to complex fields. Adding a new field to a complex type doesn't require an index rebuild, but most other modifications do require a rebuild.
190190

191191
### Structural updates to the definition
192192

@@ -198,7 +198,7 @@ Notice that within a complex type, each subfield has a type and can have attribu
198198

199199
Updating existing documents in an index with the `upload` action works the same way for complex and simple fields: all fields are replaced. However, `merge` (or `mergeOrUpload` when applied to an existing document) doesn't work the same across all fields. Specifically, `merge` doesn't support merging elements within a collection. This limitation exists for collections of primitive types and complex collections. To update a collection, you need to retrieve the full collection value, make changes, and then include the new collection in the Index API request.
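
The retrieve, change, and resend sequence described above might be sketched in Python as follows. This is a minimal illustration, not a complete client: the helper `replace_room`, the field values, and the `HotelId` key are assumptions, and the retrieval and the HTTP request themselves are omitted. The payload follows the `value` array and `@search.action` shape used by the index documents request.

```python
import json


def replace_room(rooms, room_type, changes):
    """Return a new list with `changes` applied to the room of the given type."""
    return [
        {**room, **changes} if room.get("Type") == room_type else dict(room)
        for room in rooms
    ]


# Full collection as previously retrieved from the index (hypothetical values).
current_rooms = [
    {"Type": "Deluxe Room", "BaseRate": 289.99},
    {"Type": "Budget Room", "BaseRate": 96.99},
]

# Change one element locally.
updated_rooms = replace_room(current_rooms, "Budget Room", {"BaseRate": 89.99})

# The merge action carries the whole collection, not only the changed element.
payload = {
    "value": [
        {
            "@search.action": "merge",
            "HotelId": "1",
            "Rooms": updated_rooms,
        }
    ]
}

print(json.dumps(payload, indent=2))
```

Because `merge` replaces the collection as a whole, any room left out of `updated_rooms` would be dropped from the document.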
200200

201-
## Search complex fields
201+
## Search complex fields in text queries
202202

203203
Free-form search expressions work as expected with complex types. If any searchable field or subfield anywhere in a document matches, then the document itself is a match.
204204

@@ -208,6 +208,51 @@ Queries get more nuanced when you have multiple terms and operators, and some te
208208
209209
Queries like this are *uncorrelated* for full-text search, unlike filters. In filters, queries over subfields of a complex collection are correlated using range variables in [`any` or `all`](search-query-odata-collection-operators.md). The Lucene query above returns documents containing both "Portland, Maine" and "Portland, Oregon", along with other cities in Oregon. This happens because each clause applies to all values of its field in the entire document, so there's no concept of a "current subdocument". For more information on this, see [Understanding OData collection filters in Azure AI Search](search-query-understand-collection-filters.md).
210210

211+
## Search complex fields in RAG queries
212+
213+
A RAG pattern passes search results to a chat model for generative AI and conversational search. By default, search results passed to an LLM are a flattened rowset. However, if your index has complex types, your query can provide those fields if you first convert the search results output to JSON, and then pass the JSON to the LLM.
214+
215+
A partial example illustrates the technique:
216+
217+
+ Indicate the fields you want in the prompt or in the query
218+
+ Make sure the fields are searchable and retrievable in the index
219+
+ Select the fields for the search results
220+
+ Format the results as JSON
221+
+ Send the request for chat completion to the model provider
222+
223+
```python
224+
import json
225+
226+
# Query is the question being asked. It's sent to the search engine and the LLM.
227+
query="Can you recommend a few hotels that offer complimentary breakfast? Tell me their description, address, tags, and the rate for one room they have which sleep 4 people."
228+
229+
# Set up the search results and the chat thread.
230+
# Retrieve the selected fields from the search index related to the question.
231+
selected_fields = ["HotelName","Description","Address","Rooms","Tags"]
232+
search_results = search_client.search(
233+
search_text=query,
234+
top=5,
235+
select=selected_fields,
236+
query_type="semantic"
237+
)
238+
sources_filtered = [{field: result[field] for field in selected_fields} for result in search_results]
239+
sources_formatted = "\n".join([json.dumps(source) for source in sources_filtered])
240+
241+
response = openai_client.chat.completions.create(
242+
messages=[
243+
{
244+
"role": "user",
245+
"content": GROUNDED_PROMPT.format(query=query, sources=sources_formatted)
246+
}
247+
],
248+
model=AZURE_DEPLOYMENT_MODEL
249+
)
250+
251+
print(response.choices[0].message.content)
252+
```
253+
254+
For the end-to-end example, see [Quickstart: Generative search (RAG) with grounding data from Azure AI Search](search-get-started-rag.md).
255+
211256
## Select complex fields
212257

213258
The `$select` parameter is used to choose which fields are returned in search results. To use this parameter to select specific subfields of a complex field, include the parent field and subfield separated by a slash (`/`).
@@ -244,15 +289,13 @@ To filter on a complex collection field, you can use a **lambda expression** wit
244289
245290
As with top-level simple fields, simple subfields of complex fields can only be included in filters if they have the **filterable** attribute set to `true` in the index definition. For more information, see the [Create Index API reference](/rest/api/searchservice/indexes/create).
246291

247-
Azure Search has the limitation that the complex objects in the collections across a single document cannot exceed 3000.
248-
249-
Users will encounter the below error during indexing when complex collections exceed the 3000 limit.
292+
Azure AI Search limits the total number of complex objects across all collections in a single document to 3,000. Exceeding this limit results in the following message:
250293

251-
A collection in your document exceeds the maximum elements across all complex collections limit. The document with key '1052' has '4303' objects in collections (JSON arrays). At most '3000' objects are allowed to be in collections across the entire document. Remove objects from collections and try indexing the document again."
294+
`"A collection in your document exceeds the maximum elements across all complex collections limit. The document with key '1052' has '4303' objects in collections (JSON arrays). At most '3000' objects are allowed to be in collections across the entire document. Remove objects from collections and try indexing the document again."`
252295

253-
In some use cases, we might need to add more than 3000 items to a collection. In those use cases, we can pipe (|) or use any form of delimiter to delimit the values, concatenate them, and store them as a delimited string. There is no limitation on the number of strings stored in an array in Azure Search. Storing these complex values as strings avoids the limitation. The customer needs to validate whether this workaround meets their scenario requirements.
296+
If you need more than 3,000 items, you can use a pipe (`|`) or any other delimiter to separate the values, concatenate them, and store them as a delimited string. There's no limitation on the number of strings stored in an array. Storing complex values as strings bypasses the complex collection limitation.
254297

255-
For example, it wouldn't be possible to use complex types if the "searchScope" array below had more than 3000 elements.
298+
To illustrate, assume you have a `"searchScope"` array with more than 3,000 elements:
256299

257300
```json
258301

@@ -267,10 +310,11 @@ For example, it wouldn't be possible to use complex types if the "searchScope" a
267310
"productCode": 1235,
268311
"categoryCode": "C200"
269312
}
313+
. . .
270314
]
271315
```
272316

273-
Storing these complex values as strings with a delimiter avoids the limitation
317+
The workaround for storing the values as a delimited string might look like this:
274318

275319
```json
276320
"searchScope": [
@@ -283,26 +327,10 @@ Storing these complex values as strings with a delimiter avoids the limitation
283327
]
284328

285329
```
286-
Rather than storing these with wildcards, we can also use a [custom analyzer](index-add-custom-analyzers.md) that splits the word into | to cut down on storage size.
287-
288-
The reason we have stored the values with wildcards instead of just storing them as below
289-
290-
>`|FRA|1234|C100|`
291-
292-
is to cater to search scenarios where the customer might want to search for items that have country France, irrespective of products and categories. Similarly, the customer might need to search to see if the item has product 1234, irrespective of the country or the category.
293-
294-
If we had stored only one entry
295-
296-
>`|FRA|1234|C100|`
297-
298-
without wildcards, if the user wants to filter only on France, we cannot convert the user input to match the "searchScope" array because we don't know what combination of France is present in our "searchScope" array
299-
300330

301-
If the user wants to filter only by country, let's say France. We will take the user input and construct it as a string as below:
331+
Storing all of the search variants in the delimited string is helpful in search scenarios where you want to search for items that have just "FRA" or "1234" or another combination within the array.
302332

303-
>`|FRA|*|*|`
304-
305-
which we can then use to filter in azure search as we search in an array of item values
333+
Here's a filter formatting snippet in C# that converts inputs into searchable strings:
306334

307335
```csharp
308336
foreach (var filterItem in filterCombinations)
@@ -312,39 +340,23 @@ foreach (var filterItem in filterCombinations)
312340
}
313341

314342
```
315-
Similarly, if the user searches for France and the 1234 product code, we will take the user input, construct it as a delimited string as below, and match it against our search array.
316-
317-
>`|FRA|1234|*|`
318-
319-
If the user searches for 1234 product code, we will take the user input, construct it as a delimited string as below, and match it against our search array.
320-
321-
>`|*|1234|*|`
322-
323-
If the user searches for the C100 category code, we will take the user input, construct it as a delimited string as below, and match it against our search array.
324-
325-
>`|*|*|C100|`
326-
327-
If the user searches for France and the 1234 product code and C100 category code, we will take the user input, construct it as a delimited string as below, and match it against our search array.
328-
329-
>`|FRA|1234|C100|`
330-
331-
If a user tries to search for countries not present in our list, it will not match the delimited array "searchScope" stored in the search index, and no results will be returned.
332-
For example, a user searches for Canada and product code 1234. The user search would be converted to
333343

334-
>`|CAN|1234|*|`
344+
The following list provides inputs and search strings (outputs) side by side:
335345

336-
This will not match any of the entries in the delimited array in our search index.
346+
For "FRA" country code and the "1234" product code, the formatted output is ```|FRA|1234|*|```.
347+
For "1234" product code, the formatted output is ```|*|1234|*|```.
348+
For "C100" category code, the formatted output is ```|*|*|C100|```.
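
The same input-to-string mapping can be sketched in Python. This is an illustrative equivalent under stated assumptions, not the C# code from the article: the function names `format_scope` and `build_filter` are hypothetical, and the filter assumes `searchScope` is a filterable string collection in the index.

```python
def format_scope(country_code=None, product_code=None, category_code=None, wildcard="*"):
    """Build a delimited search string, using a wildcard for missing inputs."""
    parts = [country_code, product_code, category_code]
    return "|" + "|".join(wildcard if p is None else str(p) for p in parts) + "|"


def build_filter(scope_string):
    """Build an OData filter that matches one entry in the searchScope array."""
    escaped = scope_string.replace("'", "''")  # OData escapes a single quote by doubling it
    return f"searchScope/any(s: s eq '{escaped}')"


print(format_scope("FRA", 1234))             # |FRA|1234|*|
print(format_scope(product_code=1234))       # |*|1234|*|
print(format_scope(category_code="C100"))    # |*|*|C100|
print(build_filter(format_scope("FRA", 1234)))
```

An input combination that isn't stored in the array, such as a country code that never appears, produces a string that matches no entry, so the filter returns no results.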
337349

338-
Only the above design choice requires this wild card entry; if it had been saved as a complex object, we could have simply performed an explicit search as shown below.
350+
Only provide the wildcard placeholder if you're implementing the string array workaround. Otherwise, if you're using a complex type, your filter might look like this example:
339351

340352
```csharp
341-
var countryFilter = $"searchScope/any(ss: search.in(countryCode ,'FRA'))";
342-
var catgFilter = $"searchScope/any(ss: search.in(categoryCode ,'C100'))";
343-
var combinedCountryCategoryFilter = "(" + countryFilter + " and " + catgFilter + ")";
353+
var countryFilter = $"searchScope/any(ss: search.in(ss/countryCode, 'FRA'))";
354+
var catgFilter = $"searchScope/any(ss: search.in(ss/categoryCode, 'C100'))";
355+
var combinedCountryCategoryFilter = "(" + countryFilter + " and " + catgFilter + ")";
344356

345357
```
346-
We can thus satisfy requirements where we need to search for a combination of values by storing it as a delimited string instead of a complex collection if our complex collections exceed the Azure Search limit. This is one of the workarounds, and the customer needs to validate if this would meet their scenario requirements.
347358

359+
If you implement the workaround, be sure to test extensively.
348360

349361
## Next steps
350362

articles/search/tutorial-rag-build-solution-index-schema.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -63,7 +63,7 @@ In Azure AI Search, an index that works best for RAG workloads has these qualiti
6363

6464
- Accommodates the queries you want to create. You should have fields for vector and hybrid content, and those fields should be attributed to support specific query behaviors, such as searchable or filterable. You can only query one index at a time (no joins) so your fields collection should define all of your searchable content.
6565

66-
- Your schema should be flat (no complex types or structures). This requirement is specific to the RAG pattern in Azure AI Search.
66+
- Your schema should either be flat (no complex types or structures), or you should [format the complex type output as JSON](search-get-started-rag.md#send-a-complex-rag-query) before sending it to the LLM. This requirement is specific to the RAG pattern in Azure AI Search.
6767

6868
<!-- Although Azure AI Search can't join indexes, you can create indexes that preserve parent-child relationship, and then use sequential queries in your search logic to pull from both (a query on the chunked data index, a lookup on the parent index). This exercise includes templates for parent-child elements in the same index and in separate indexes, where information from the parent index is retrieved using a lookup query. -->
6969
