
Commit 4c67ca7

[Cosmos] Adds full text policy and full text indexes (Azure#37891)
* FTS control plane changes
* add tests, refine README and changelog
* replace tests
* update changelog, version
* update README, tests
* additional assertions
1 parent 15d3f80 commit 4c67ca7

9 files changed, +796 -11 lines changed

sdk/cosmos/azure-cosmos/CHANGELOG.md

Lines changed: 11 additions & 10 deletions
@@ -1,8 +1,9 @@
 ## Release History

-### 4.8.1 (Unreleased)
+### 4.9.0 (Unreleased)

 #### Features Added
+* Added full text policy and full text indexing policy. See [PR 37891](https://github.com/Azure/azure-sdk-for-python/pull/37891).

 #### Breaking Changes

@@ -16,12 +17,12 @@ This version and all future versions will support Python 3.13.
 #### Features Added
 * Added response headers directly to SDK item point operation responses. See [PR 35791](https://github.com/Azure/azure-sdk-for-python/pull/35791).
 * SDK will now retry all ServiceRequestErrors (failing outgoing requests) before failing. Default number of retries is 3. See [PR 36514](https://github.com/Azure/azure-sdk-for-python/pull/36514).
-* Added Retry Policy for Container Recreate in the Python SDK. See [PR 36043](https://github.com/Azure/azure-sdk-for-python/pull/36043)
-* Added option to disable write payload on writes. See [PR 37365](https://github.com/Azure/azure-sdk-for-python/pull/37365)
-* Added get feed ranges API. See [PR 37687](https://github.com/Azure/azure-sdk-for-python/pull/37687)
-* Added feed range support in `query_items_change_feed`. See [PR 37687](https://github.com/Azure/azure-sdk-for-python/pull/37687)
-* Added **provisional** helper APIs for managing session tokens. See [PR 36971](https://github.com/Azure/azure-sdk-for-python/pull/36971)
-* Added ability to get feed range for a partition key. See [PR 36971](https://github.com/Azure/azure-sdk-for-python/pull/36971)
+* Added Retry Policy for Container Recreate in the Python SDK. See [PR 36043](https://github.com/Azure/azure-sdk-for-python/pull/36043).
+* Added option to disable write payload on writes. See [PR 37365](https://github.com/Azure/azure-sdk-for-python/pull/37365).
+* Added get feed ranges API. See [PR 37687](https://github.com/Azure/azure-sdk-for-python/pull/37687).
+* Added feed range support in `query_items_change_feed`. See [PR 37687](https://github.com/Azure/azure-sdk-for-python/pull/37687).
+* Added **provisional** helper APIs for managing session tokens. See [PR 36971](https://github.com/Azure/azure-sdk-for-python/pull/36971).
+* Added ability to get feed range for a partition key. See [PR 36971](https://github.com/Azure/azure-sdk-for-python/pull/36971).

 #### Breaking Changes
 * Item-level point operations will now return `CosmosDict` and `CosmosList` response types.
@@ -34,12 +35,12 @@ For more information on this, see our README section [here](https://github.com/A
 * Added retry handling logic for DatabaseAccountNotFound exceptions. See [PR 36514](https://github.com/Azure/azure-sdk-for-python/pull/36514).
 * Fixed SDK regex validation that would not allow for item ids to be longer than 255 characters. See [PR 36569](https://github.com/Azure/azure-sdk-for-python/pull/36569).
 * Fixed issue where 'NoneType' object has no attribute error was raised when a session retry happened during a query. See [PR 37578](https://github.com/Azure/azure-sdk-for-python/pull/37578).
-* Fixed issue where passing subpartition partition key values as a tuple in a query would raise an error. See [PR 38136](https://github.com/Azure/azure-sdk-for-python/pull/38136)
+* Fixed issue where passing subpartition partition key values as a tuple in a query would raise an error. See [PR 38136](https://github.com/Azure/azure-sdk-for-python/pull/38136).
 * Batch requests will now be properly considered as Write operation. See [PR 38365](https://github.com/Azure/azure-sdk-for-python/pull/38365).

 #### Other Changes
-* Getting offer thoughput when it has not been defined in a container will now give a 404/10004 instead of just a 404. See [PR 36043](https://github.com/Azure/azure-sdk-for-python/pull/36043)
-* Incomplete Partition Key Extractions in documents for Subpartitioning now gives 400/1001 instead of just a 400. See [PR 36043](https://github.com/Azure/azure-sdk-for-python/pull/36043)
+* Getting offer thoughput when it has not been defined in a container will now give a 404/10004 instead of just a 404. See [PR 36043](https://github.com/Azure/azure-sdk-for-python/pull/36043).
+* Incomplete Partition Key Extractions in documents for Subpartitioning now gives 400/1001 instead of just a 400. See [PR 36043](https://github.com/Azure/azure-sdk-for-python/pull/36043).
 * SDK will now make database account calls every 5 minutes to refresh location cache. See [PR 36514](https://github.com/Azure/azure-sdk-for-python/pull/36514).

 ### 4.7.0 (2024-05-15)

sdk/cosmos/azure-cosmos/README.md

Lines changed: 52 additions & 0 deletions
@@ -768,6 +768,56 @@ not being able to recognize the new NonStreamingOrderBy capability that makes ve
 If this happens, you can set the `AZURE_COSMOS_DISABLE_NON_STREAMING_ORDER_BY` environment variable to `"True"` to opt out of this
 functionality and continue operating as usual.*

+### Public Preview - Full Text Policy and Full Text Indexes
+We have added new capabilities to utilize full text policies and full text indexing for users to leverage full text search
+utilizing our Cosmos SDK. These two container-level configurations have to be turned on at the account-level
+before you can use them.
+
+A full text policy allows the user to define the default language to be used for all full text paths, or to set
+a language for each path individually in case the user would like to use full text search on data containing different
+languages in different fields.
+
+A sample full text policy would look like this:
+```python
+full_text_policy = {
+    "defaultLanguage": "en-US",
+    "fullTextPaths": [
+        {
+            "path": "/text1",
+            "language": "en-US"
+        },
+        {
+            "path": "/text2",
+            "language": "en-US"
+        }
+    ]
+}
+```
+Currently, the only supported language is `en-US` - using the relevant ISO-639 language code to ISO-3166 country code.
+Any non-supported language or code will return an exception when trying to use it - which will also include the list of supported languages.
+This list will include more options in the future; for more information on supported languages, please see [here][cosmos_fts].
+
+Full text search indexes have been added to the already existing indexing_policy and only require the path to the
+relevant field to be used.
+A sample indexing policy with full text search indexes would look like this:
+```python
+indexing_policy = {
+    "automatic": True,
+    "indexingMode": "consistent",
+    "compositeIndexes": [
+        [
+            {"path": "/numberField", "order": "ascending"},
+            {"path": "/stringField", "order": "descending"}
+        ]
+    ],
+    "fullTextIndexes": [
+        {"path": "/abstract"}
+    ]
+}
+```
+Modifying the index in a container is an asynchronous operation that can take a long time to finish. See [here][cosmos_index_policy_change] for more information.
+For more information on using full text policies and full text indexes, see [here][cosmos_fts].
+
 ## Troubleshooting

 ### General
@@ -904,6 +954,8 @@ For more extensive documentation on the Cosmos DB service, see the [Azure Cosmos
 [cosmos_concurrency_sample]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/cosmos/azure-cosmos/samples/concurrency_sample.py
 [cosmos_index_sample]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/cosmos/azure-cosmos/samples/index_management.py
 [cosmos_index_sample_async]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/cosmos/azure-cosmos/samples/index_management_async.py
+[cosmos_fts]: https://aka.ms/cosmosfulltextsearch
+[cosmos_index_policy_change]: https://learn.microsoft.com/azure/cosmos-db/index-policy#modifying-the-indexing-policy

 ## Contributing

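The README additions describe both dictionaries in prose. Below is a minimal sketch of assembling them in code, assuming the key names shown in the README samples (`defaultLanguage`, `fullTextPaths`, `fullTextIndexes`). The helpers `build_full_text_policy` and `build_indexing_policy` are hypothetical and not part of azure-cosmos, and the `en-US` guard only mirrors the README's statement that it is currently the only supported language; the service's own validation remains authoritative.

```python
from typing import Any, Dict, List, Optional

# Per the README, en-US is currently the only supported language.
# Hypothetical client-side guard; the service remains the source of truth.
SUPPORTED_LANGUAGES = {"en-US"}


def build_full_text_policy(
    paths: List[str],
    default_language: str = "en-US",
    per_path_language: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a full text policy shaped like the README sample."""
    per_path_language = per_path_language or {}
    languages = {default_language, *per_path_language.values()}
    unsupported = languages - SUPPORTED_LANGUAGES
    if unsupported:
        raise ValueError(f"Unsupported language(s): {sorted(unsupported)}")
    return {
        "defaultLanguage": default_language,
        # Each path falls back to the default language unless overridden.
        "fullTextPaths": [
            {"path": p, "language": per_path_language.get(p, default_language)}
            for p in paths
        ],
    }


def build_indexing_policy(full_text_index_paths: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: indexing policy whose full text indexes only need a path."""
    return {
        "automatic": True,
        "indexingMode": "consistent",
        "fullTextIndexes": [{"path": p} for p in full_text_index_paths],
    }


full_text_policy = build_full_text_policy(["/text1", "/text2"])
indexing_policy = build_indexing_policy(["/text1"])
```

Rejecting unsupported languages locally is optional: per the README, the service already raises an exception that lists the supported languages, so the guard only shortens the feedback loop.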
sdk/cosmos/azure-cosmos/azure/cosmos/_version.py

Lines changed: 1 addition & 1 deletion
@@ -19,4 +19,4 @@
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 # SOFTWARE.

-VERSION = "4.8.1"
+VERSION = "4.9.0"
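This hunk bumps `VERSION` from 4.8.1 to 4.9.0, the release under which the CHANGELOG lists full text policy support. For calling code that may run against older installs, a minimal sketch of a version gate, assuming the distribution name `azure-cosmos` and a `major.minor.patch` prefix on the version string; `parse_version` and `supports_full_text_policy` are hypothetical helpers.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Release whose CHANGELOG entry (in this commit) adds full text policy support.
FULL_TEXT_POLICY_MIN_VERSION = (4, 9, 0)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse the leading major.minor.patch of a version string.

    Pre-release suffixes are ignored, so "4.9.0b1" is treated as (4, 9, 0).
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor, patch = (int(group) for group in match.groups())
    return major, minor, patch


def supports_full_text_policy(version: Optional[str] = None) -> bool:
    """Return True if `version` (or the installed azure-cosmos version) is at least 4.9.0."""
    if version is None:
        # Raises importlib.metadata.PackageNotFoundError if azure-cosmos is not installed.
        version = metadata.version("azure-cosmos")
    return parse_version(version) >= FULL_TEXT_POLICY_MIN_VERSION
```

Comparing integer tuples avoids the classic string-comparison bug where "4.10.0" would sort before "4.9.0".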

sdk/cosmos/azure-cosmos/azure/cosmos/aio/_database.py

Lines changed: 16 additions & 0 deletions
@@ -173,6 +173,7 @@ async def create_container(
         match_condition: Optional[MatchConditions] = None,
         analytical_storage_ttl: Optional[int] = None,
         vector_embedding_policy: Optional[Dict[str, Any]] = None,
+        full_text_policy: Optional[Dict[str, Any]] = None,
         **kwargs: Any
     ) -> ContainerProxy:
         """Create a new container with the given ID (name).
@@ -206,6 +207,9 @@ async def create_container(
         :keyword Dict[str, Any] vector_embedding_policy: The vector embedding policy for the container. Each vector
             embedding possesses a predetermined number of dimensions, is associated with an underlying data type, and
             is generated for a particular distance function.
+        :keyword Dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
+            Used to denote the default language to be used for all full text indexes, or to individually
+            assign a language to each full text index path.
         :raises ~azure.cosmos.exceptions.CosmosHttpResponseError: The container creation failed.
         :returns: A `ContainerProxy` instance representing the new container.
         :rtype: ~azure.cosmos.aio.ContainerProxy
@@ -251,6 +255,8 @@ async def create_container(
             definition["computedProperties"] = computed_properties
         if vector_embedding_policy is not None:
             definition["vectorEmbeddingPolicy"] = vector_embedding_policy
+        if full_text_policy is not None:
+            definition["fullTextPolicy"] = full_text_policy

         if session_token is not None:
             kwargs['session_token'] = session_token
@@ -285,6 +291,7 @@ async def create_container_if_not_exists(
         match_condition: Optional[MatchConditions] = None,
         analytical_storage_ttl: Optional[int] = None,
         vector_embedding_policy: Optional[Dict[str, Any]] = None,
+        full_text_policy: Optional[Dict[str, Any]] = None,
         **kwargs: Any
     ) -> ContainerProxy:
         """Create a container if it does not exist already.
@@ -320,6 +327,9 @@ async def create_container_if_not_exists(
         :keyword Dict[str, Any] vector_embedding_policy: **provisional** The vector embedding policy for the container.
             Each vector embedding possesses a predetermined number of dimensions, is associated with an underlying
             data type, and is generated for a particular distance function.
+        :keyword Dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
+            Used to denote the default language to be used for all full text indexes, or to individually
+            assign a language to each full text index path.
         :raises ~azure.cosmos.exceptions.CosmosHttpResponseError: The container creation failed.
         :returns: A `ContainerProxy` instance representing the new container.
         :rtype: ~azure.cosmos.aio.ContainerProxy
@@ -349,6 +359,7 @@ async def create_container_if_not_exists(
                 session_token=session_token,
                 initial_headers=initial_headers,
                 vector_embedding_policy=vector_embedding_policy,
+                full_text_policy=full_text_policy,
                 **kwargs
             )

@@ -482,6 +493,7 @@ async def replace_container(
         etag: Optional[str] = None,
         match_condition: Optional[MatchConditions] = None,
         analytical_storage_ttl: Optional[int] = None,
+        full_text_policy: Optional[Dict[str, Any]] = None,
         **kwargs: Any
     ) -> ContainerProxy:
         """Reset the properties of the container.
@@ -509,6 +521,9 @@ async def replace_container(
             note that analytical storage can only be enabled on Synapse Link enabled accounts.
         :keyword response_hook: A callable invoked with the response metadata.
         :paramtype response_hook: Callable[[Dict[str, str], Dict[str, Any]], None]
+        :keyword Dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
+            Used to denote the default language to be used for all full text indexes, or to individually
+            assign a language to each full text index path.
         :returns: A `ContainerProxy` instance representing the container after replace completed.
         :raises ~azure.cosmos.exceptions.CosmosHttpResponseError: Raised if the container couldn't be replaced.
             This includes if the container with given id does not exist.
@@ -545,6 +560,7 @@ async def replace_container(
                 "defaultTtl": default_ttl,
                 "conflictResolutionPolicy": conflict_resolution_policy,
                 "analyticalStorageTtl": analytical_storage_ttl,
+                "fullTextPolicy": full_text_policy
             }.items()
             if value is not None
         }
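The async `DatabaseProxy` methods now accept `full_text_policy` as a keyword, and `create_container_if_not_exists` forwards it to `create_container`, which stores it in the container definition as `fullTextPolicy`. A minimal sketch of calling the async API, assuming `database` is an `azure.cosmos.aio.DatabaseProxy`; `ensure_full_text_container` and the `RecordingDatabase` stand-in (used so the call shape can be exercised without a Cosmos account) are hypothetical.

```python
import asyncio
from typing import Any, Dict, List


async def ensure_full_text_container(
    database: Any,
    container_id: str,
    partition_key: Any,
    indexing_policy: Dict[str, Any],
    full_text_policy: Dict[str, Any],
) -> Any:
    """Create the container with a full text policy unless it already exists.

    `database` is assumed to be an azure.cosmos.aio.DatabaseProxy; with the real SDK,
    `partition_key` would typically be azure.cosmos.PartitionKey(path="/id").
    """
    return await database.create_container_if_not_exists(
        id=container_id,
        partition_key=partition_key,
        indexing_policy=indexing_policy,
        full_text_policy=full_text_policy,
    )


class RecordingDatabase:
    """Hypothetical stand-in for DatabaseProxy that records keyword arguments."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    async def create_container_if_not_exists(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(kwargs)
        return {"id": kwargs["id"]}


stub = RecordingDatabase()
result = asyncio.run(
    ensure_full_text_container(
        stub,
        "fts-container",
        partition_key="/id",  # a real call would pass PartitionKey(path="/id")
        indexing_policy={"fullTextIndexes": [{"path": "/text1"}]},
        full_text_policy={
            "defaultLanguage": "en-US",
            "fullTextPaths": [{"path": "/text1", "language": "en-US"}],
        },
    )
)
```

With the real client, the same coroutine would be awaited inside an `async with CosmosClient(...)` block from `azure.cosmos.aio`.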

sdk/cosmos/azure-cosmos/azure/cosmos/database.py

Lines changed: 16 additions & 0 deletions
@@ -174,6 +174,7 @@ def create_container( # pylint:disable=docstring-missing-param
         match_condition: Optional[MatchConditions] = None,
         analytical_storage_ttl: Optional[int] = None,
         vector_embedding_policy: Optional[Dict[str, Any]] = None,
+        full_text_policy: Optional[Dict[str, Any]] = None,
         **kwargs: Any
     ) -> ContainerProxy:
         """Create a new container with the given ID (name).
@@ -203,6 +204,9 @@ def create_container( # pylint:disable=docstring-missing-param
         :keyword Dict[str, Any] vector_embedding_policy: **provisional** The vector embedding policy for the container.
             Each vector embedding possesses a predetermined number of dimensions, is associated with an underlying
             data type, and is generated for a particular distance function.
+        :keyword Dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
+            Used to denote the default language to be used for all full text indexes, or to individually
+            assign a language to each full text index path.
         :returns: A `ContainerProxy` instance representing the new container.
         :raises ~azure.cosmos.exceptions.CosmosHttpResponseError: The container creation failed.
         :rtype: ~azure.cosmos.ContainerProxy
@@ -246,6 +250,8 @@ def create_container( # pylint:disable=docstring-missing-param
             definition["computedProperties"] = computed_properties
         if vector_embedding_policy is not None:
             definition["vectorEmbeddingPolicy"] = vector_embedding_policy
+        if full_text_policy is not None:
+            definition["fullTextPolicy"] = full_text_policy

         if session_token is not None:
             kwargs['session_token'] = session_token
@@ -287,6 +293,7 @@ def create_container_if_not_exists( # pylint:disable=docstring-missing-param
         match_condition: Optional[MatchConditions] = None,
         analytical_storage_ttl: Optional[int] = None,
         vector_embedding_policy: Optional[Dict[str, Any]] = None,
+        full_text_policy: Optional[Dict[str, Any]] = None,
         **kwargs: Any
     ) -> ContainerProxy:
         """Create a container if it does not exist already.
@@ -318,6 +325,9 @@ def create_container_if_not_exists( # pylint:disable=docstring-missing-param
         :keyword Dict[str, Any] vector_embedding_policy: The vector embedding policy for the container. Each vector
             embedding possesses a predetermined number of dimensions, is associated with an underlying data type, and
             is generated for a particular distance function.
+        :keyword Dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
+            Used to denote the default language to be used for all full text indexes, or to individually
+            assign a language to each full text index path.
         :returns: A `ContainerProxy` instance representing the container.
         :raises ~azure.cosmos.exceptions.CosmosHttpResponseError: The container read or creation failed.
         :rtype: ~azure.cosmos.ContainerProxy
@@ -349,6 +359,7 @@ def create_container_if_not_exists( # pylint:disable=docstring-missing-param
                 session_token=session_token,
                 initial_headers=initial_headers,
                 vector_embedding_policy=vector_embedding_policy,
+                full_text_policy=full_text_policy,
                 **kwargs
             )

@@ -538,6 +549,7 @@ def replace_container( # pylint:disable=docstring-missing-param
         etag: Optional[str] = None,
         match_condition: Optional[MatchConditions] = None,
         analytical_storage_ttl: Optional[int] = None,
+        full_text_policy: Optional[Dict[str, Any]] = None,
         **kwargs: Any
     ) -> ContainerProxy:
         """Reset the properties of the container.
@@ -562,6 +574,9 @@ def replace_container( # pylint:disable=docstring-missing-param
             None leaves analytical storage off and a value of -1 turns analytical storage on with no TTL. Please
             note that analytical storage can only be enabled on Synapse Link enabled accounts.
         :keyword Callable response_hook: A callable invoked with the response metadata.
+        :keyword Dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
+            Used to denote the default language to be used for all full text indexes, or to individually
+            assign a language to each full text index path.
         :returns: A `ContainerProxy` instance representing the container after replace completed.
         :raises ~azure.cosmos.exceptions.CosmosHttpResponseError: Raised if the container couldn't be replaced.
             This includes if the container with given id does not exist.
@@ -603,6 +618,7 @@ def replace_container( # pylint:disable=docstring-missing-param
                 "defaultTtl": default_ttl,
                 "conflictResolutionPolicy": conflict_resolution_policy,
                 "analyticalStorageTtl": analytical_storage_ttl,
+                "fullTextPolicy": full_text_policy,
             }.items()
             if value is not None
         }
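In both clients, `replace_container` builds its request body with a dictionary comprehension that drops every argument left as `None`, so `fullTextPolicy` is only sent when a policy is actually passed. A minimal sketch mirroring that step; `build_replace_parameters` is hypothetical, and its `id`, `partitionKey` and `indexingPolicy` entries are assumptions about lines outside the hunks shown.

```python
from typing import Any, Dict, Optional


def build_replace_parameters(
    container_id: str,
    partition_key: Dict[str, Any],
    indexing_policy: Optional[Dict[str, Any]] = None,
    default_ttl: Optional[int] = None,
    conflict_resolution_policy: Optional[Dict[str, Any]] = None,
    analytical_storage_ttl: Optional[int] = None,
    full_text_policy: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical mirror of the body-building step in replace_container."""
    return {
        key: value
        for key, value in {
            "id": container_id,                      # assumed, not shown in the hunk
            "partitionKey": partition_key,           # assumed, not shown in the hunk
            "indexingPolicy": indexing_policy,       # assumed, not shown in the hunk
            "defaultTtl": default_ttl,
            "conflictResolutionPolicy": conflict_resolution_policy,
            "analyticalStorageTtl": analytical_storage_ttl,
            "fullTextPolicy": full_text_policy,      # added by this commit
        }.items()
        if value is not None  # unset arguments never reach the request body
    }


partition_key = {"paths": ["/id"], "kind": "Hash"}
with_fts = build_replace_parameters(
    "fts-container",
    partition_key,
    full_text_policy={"defaultLanguage": "en-US", "fullTextPaths": [{"path": "/text1", "language": "en-US"}]},
)
without_fts = build_replace_parameters("fts-container", partition_key)
```

Because omitted arguments are dropped entirely, a replace that changes only the full text policy sends no `indexingPolicy`; passing the current indexing policy alongside it avoids depending on how the service treats the missing field.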

sdk/cosmos/azure-cosmos/samples/index_management.py

Lines changed: 58 additions & 0 deletions
@@ -753,6 +753,61 @@ def get_embeddings(num):
         print("Entity doesn't exist")


+def use_full_text_policy(db):
+    try:
+        delete_container_if_exists(db, CONTAINER_ID)
+
+        # Create a container with full text policy and full text indexes
+        indexing_policy = {
+            "automatic": True,
+            "fullTextIndexes": [
+                {"path": "/text1"}
+            ]
+        }
+        full_text_policy = {
+            "defaultLanguage": "en-US",
+            "fullTextPaths": [
+                {
+                    "path": "/text1",
+                    "language": "en-US"
+                },
+                {
+                    "path": "/text2",
+                    "language": "en-US"
+                }
+            ]
+        }
+
+        created_container = db.create_container(
+            id=CONTAINER_ID,
+            partition_key=PARTITION_KEY,
+            indexing_policy=indexing_policy,
+            full_text_policy=full_text_policy
+        )
+        properties = created_container.read()
+        print(created_container)
+
+        print("\n" + "-" * 25 + "\n11. Container created with full text policy and full text indexes")
+        print_dictionary_items(properties["indexingPolicy"])
+        print_dictionary_items(properties["fullTextPolicy"])
+
+        # Create some items to use with full text search
+        for i in range(10):
+            created_container.create_item({"id": "full_text_item" + str(i), "text1": "some-text"})
+
+        # Run full text search queries using ranking
+        query = "select * from c"
+        query_documents_with_custom_query(created_container, query)
+
+        # Cleanup
+        db.delete_container(created_container)
+        print("\n")
+    except exceptions.CosmosResourceExistsError:
+        print("Entity already exists")
+    except exceptions.CosmosResourceNotFoundError:
+        print("Entity doesn't exist")
+
+
 def run_sample():
     try:
         client = obtain_client()
@@ -789,6 +844,9 @@ def run_sample():
         # 10. Create and use a vector embedding policy
         use_vector_embedding_policy(created_db)

+        # 11. Create and use a full text policy
+        use_full_text_policy(created_db)
+
     except exceptions.AzureError as e:
         raise e

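The new sample seeds items with a `text1` field but, despite its comment about ranking, runs a plain `select * from c`. A minimal sketch of a query that actually exercises the full text index, assuming the account has full text search enabled and that `FullTextContains` accepts a query parameter as its search term; `full_text_contains_query` and the `RecordingContainer` stand-in are hypothetical, and `container` is assumed to be an `azure.cosmos.ContainerProxy`.

```python
from typing import Any, Dict, Iterator, List


def full_text_contains_query(container: Any, field: str, term: str, top: int = 10) -> List[Dict[str, Any]]:
    """Return up to `top` items whose `field` matches `term` via FullTextContains."""
    # The field name is interpolated into the query text, so only allow plain identifiers.
    if not field.isidentifier():
        raise ValueError(f"Unsupported field name: {field!r}")
    query = f"SELECT TOP {int(top)} * FROM c WHERE FullTextContains(c.{field}, @term)"
    return list(
        container.query_items(
            query=query,
            parameters=[{"name": "@term", "value": term}],  # search term passed as a parameter
            enable_cross_partition_query=True,
        )
    )


class RecordingContainer:
    """Hypothetical stand-in for ContainerProxy that records the query and returns no items."""

    def __init__(self) -> None:
        self.last_call: Dict[str, Any] = {}

    def query_items(self, **kwargs: Any) -> Iterator[Dict[str, Any]]:
        self.last_call = kwargs
        return iter([])


stub = RecordingContainer()
items = full_text_contains_query(stub, "text1", "some")
```

Relevance ranking uses `ORDER BY RANK` with `FullTextScore`; see the full text search documentation linked from the README for its current syntax before relying on it.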