Commit 8448657

astrapy version 2 (#344)
* escape utilities; wip unit tests thereof
* unit tests for the escape/unescape utilities
* added more tests of doc-path manip
* distinct methods accept list of literals and escape str input. WIP int.tests todo
* distinct: enriched int. tests with full coverage of list key inputs
* path segments in utilities can be integers
* unescape of empty string returns empty list
* wip for maps-as-tuples: got the basics working
* basic support for tuples in table payloads (no actual test in filters and some updates)
* align un/escape utilities to latest conventions
* finalize distinct docstrings
* disabled auto-tuples for any 'filter' portion of any payload
* cleanup of map2tuple_paths for updateOne
* changesfile
* update docstring for collection find/one about include_similarity
* wip for serders+generalized
* serdes option to control auto-tuple behaviour, added specific tests to tuplification
* refactoring of map2tuple path definition + unit test thereof
* full onboarding of the map2tuple serdes option
* refactor map2tuple logic to use discriminator functions
* basic support for column indexes (with 'entries', no testing)
* adjust against 1906 as temp measure
* completed 'inert' work on listindexes, parking as is for now
* fix some unit tests
* full empty-options-hiding in table index definition as_dict + adjust tests
* createCollection, FARR ready with unit tests
* rename enum to CursorState
* cursors, clone method(s) retain the mapping
* preparation for farr prototype
* FindAndRerankCursor classes
* remove the 'alive' (property) method of all cursors
* collection, find_and_rerank method impl
* changesfile
* restore FARR signature to resemble find; rename FARR's sort type alias to HybridSortType
* RerankedResult for FARR cursors
* changesfile
* fix return types of find_and_rerank's
* removed CumulativeOperationException
* insertmany overhaul WIP: collectionInsertMany done
* insertmany overhaul WIP: collectionInsertMany+TableInsertMany done
* all batch-op exceptions behave as per 2.0 spec now
* str repr for bulk op exception classes
* a type alias for DataAPIWarningDescriptor
* wip on revise docstrings and tests
* done last remaining todos in docstrings + note on escaping utils in readme
* adjust tests to new bulk-exception logic
* adjust integration testing to new bulk-exception structure
* exception diagram picture and changesfile done
* fix bulk-exceptions integration tests for dml tests
* all integration tests adapted to new bulk-error structure
* adjust all tests to the exception rework
* map-to-tuple serdes option has three states (never, DataAPIMaps, always)
* farr method docstring (colls); adjust tests for the default lexical coming back
* added missing docstrings
* update import sample code: readme+test
* refactor cursor modules
* renamed (table) index type 'text-analysed' => 'text'
* classes for findRerProvs response + unit tests
* async_/find_reranking_providers method in database admins
* reranking header provider classes
* api options is reranking-api-key aware
* adapt unit tests to reranking_api_keys param
* adapt to API renaming reranking->rerank in createCollection
* collection lifecycle int.tests for FARR
* added RerankingAPIKeyHeaderProvider code example to docstring
* add unit test of rerank-api-key in commander headers
* rename CollectionRerankingOptions & RerankingServiceOptions ==> CollectionRerankOptions & RerankServiceOptions
* improve farr mock response for testing set-up
* protect cursors' dict inputs with deepcopy
* mock-based basic 'integration tests' for findAndRerank
* thorough testing of the bulk exceptions
* include_sort_vector in findAndRerank support
* integration tests for get_sort_vector in collections' FARR
* include_scores in collection's FARR
* First farr IT with actual API; protection against 'vector' score; un-mock farr responses
* remove outdated ref to cursor.distinct removed method from docstrings
* farr cursors, completed docstrings for methods and classes
* standard cursor test for sync collection FARR cursor
* async farr cursor test
* adjusted a test to a newly-effective error deduplication in insertMany
* some missing parts from README on new features
* changesfile
* Eric P's spotted a typo in authentication.py docstring example
* change header name for reranking API key; restore empty-farr-results test
* remove hybridLimits from FARR tests when not needed as it became optional for the API
* add collection support for DataAPIDate and DataAPIMap + tests
* changesfile
* test of zero-match FARR cursors and their get_sort_vector properties
* extend vectorize/FARR sort and hybrid_limits testing to objects
* introduced rerankQuery param for FARR (for byov querying)
* adapt farr novectorize tests to rerank_query parameter; temporary fix against issue #1949
* find_and_rerank, detailed code examples in docstring
* docstrings
* removed mocker of farr responses
* comments
* adapt testing to dev/prod/local provisional differences re: hybrid and farr
* Ustilaginales
* more protection against issue 1949
* readme
* add DataAPIVector to HybridSortType; adjust astra admin tests to get_database requiring region
* mark find_and_rerank as beta method
* testing of typing with FARR
* final changesfile
1 parent 0175e76 commit 8448657

91 files changed, +12019 -1888 lines changed

CHANGES

Lines changed: 43 additions & 5 deletions
@@ -1,8 +1,43 @@
1-
main
2-
====
3-
Exceptions
4-
- `DataAPIResponseException.error_descriptors` is a property (computed from detailed_error_descriptors)
1+
v 2.0.0
2+
==========
3+
Collections admit `DataAPIMap` in writing and support `DataAPIDate` as well (the latter coming with the same timezone caveats as `datetime.date`)
4+
(plus all of the v2 pre-releases below)
5+
6+
7+
v 2.0.0rc2
8+
==========
9+
Cursors (class hierarchy revised to accommodate `find_and_rerank`, plus other changes):
10+
- renamed the 'FindCursorState' enum to `CursorState`
11+
- renamed the abstract ur-class 'FindCursor' => `AbstractCursor`
12+
- `clone` method does not strip the mapping anymore, rather retains it
13+
- removed the `alive` sugar property (use `cursor.state != CursorState.CLOSED`)
14+
Support for reranker header-based authentication:
15+
- new authentication classes `RerankingHeadersProvider`, `RerankingAPIKeyHeaderProvider`
16+
- introduced `reranking_api_key` parameter for APIOptions, `{get|create}_{table|collection}` database methods, collection/table `with_options` and `to_[a]sync` methods
17+
Support for "findRerankingProviders" API command in Database Admin classes:
18+
- class hierarchy: `RerankingProviderParameter`, `RerankingProviderModel`, `RerankingProviderToken`, `RerankingProviderAuthentication`, `RerankingProvider`, `FindRerankingProvidersResult` to express the response
19+
- database admin `find_reranking_providers`/`async_find_reranking_providers` methods implemented.
20+
Support for findAndRerank in collections:
21+
- new classes `CollectionLexicalOptions`, `CollectionRerankOptions`, `RerankServiceOptions` for create_collection
22+
- new `lexical` and `rerank` entries in CollectionDefinition (+builder interface management)
23+
- `find_and_rerank` method for collections
24+
- cursor classes `[Async]CollectionFindAndRerankCursor` added
25+
- findAndRerank cursors return the new `RerankedResult` construct by default (modulo custom mappings)
26+
Maps for tables expressed as list of pairs (association lists):
27+
- support for automatic handling of DataAPIMaps (+possibly dicts) in the proper table payload portions
28+
- introduced serdes option `encode_maps_as_lists_in_tables` (defaults to "NEVER") to control this
29+
Exceptions, major rework of `[Table|Collection]InsertManyException`, `CollectionUpdateManyException` and `CollectionDeleteManyException` ("bulk operations")
30+
- All astrapy exceptions derive directly from `Exception` (and not `ValueError` anymore)
531
- better string representation of `DataAPIDetailedErrorDescriptor`
32+
- `DataAPIDetailedErrorDescriptor` removed.
33+
- `CumulativeOperationException` removed.
34+
- The four `CollectionInsertManyException`, `CollectionUpdateManyException`, `CollectionDeleteManyException`, `TableInsertManyException` classes now inherit directly from `DataAPIResponseException`.
35+
- New semantics and structure for `[Collection|Table]InsertManyException`: they have members `inserted_ids`(/`inserted_id_tuples`) and an `exceptions` list for the root cause(s)
36+
- New semantics and structure for `Collection[Update|Delete]ManyException`: they have members `partial_result` and a single-exception `cause`. They are now raised consistently for API exceptions occurring during the respective methods.
37+
Arbitrary field names and dot-escaping:
38+
- offering utilities `astrapy.utils.document_paths.escape_field_names/unescape_field_path`
39+
- `distinct` methods can accept a list of (literal) str|int as well as a(n escaped) identifier string
40+
Improved StrEnum matching for e.g. better coercion of TableIndexType (and future enum with e.g. dashes in values)
641
Spawner methods for databases/admins standardized; they don't issue DevOps API calls.
742
- removed `normalize_region_for_id` utility method, not used anymore.
843
- `AstraDBAdmin.get_[async]_database()`:
@@ -29,6 +64,7 @@ Replaced the ValueErrors not directly coming from function calls/constructors wi
2964
Collection and Table `insert_many` methods employ returnDocumentResponses under the hood.
3065
maintenance: switch to DSE6.9 for local non-Astra testing.
3166

67+
3268
v 2.0.0rc1
3369
==========
3470
Support for TIMEUUID and COUNTER columns:
@@ -41,6 +77,7 @@ restore support for Python 3.8, 3.9
4177
maintenance: full restructuring of tests and CI (tables+collections on same footing+other)
4278
maintenance: adopt `blockbuster` in async tests to detect (and bust) any blocking call
4379

80+
4481
v 2.0.0-preview
4582
===============
4683
Introduction of full Tables support.
@@ -77,7 +114,7 @@ Reworked and enriched `FindCursor` interface:
77114
- Cursor classes renamed to `[Async]CollectionCursor`
78115
- Base class for all (find) cursors renamed to `FindCursor`
79116
- introduced `map` and `to_list` methods
80-
- `cursor.state` now has values in `FindCursorState` enum (take `cursor.state.value` for a string)
117+
- `cursor.state` now has values in `CursorState` enum (take `cursor.state.value` for a string)
81118
- 'cursor.address' is removed from the API
82119
- `cursor.rewind()` returns None, mutates cursor in-place
83120
- removed 'cursor.distinct()': use the corresponding collection(/table) method.
@@ -199,6 +236,7 @@ Internal restructuring/maintenance things:
199236
- removal of unused imports from toplevel `__init__.py` (ids, constants, cursors)
200237
- simplified timeout management classes and representations
201238

239+
202240
v. 1.5.2
203241
========
204242
Bugfix: `Database.get_collection` uses callers inheritance (same for async)

README.md

Lines changed: 135 additions & 1 deletion
@@ -93,6 +93,56 @@ Next steps:
9393
- [AstraPy reference](https://docs.datastax.com/en/astra-api-docs/_attachments/python-client/astrapy/index.html)
9494
- Package on [PyPI](https://pypi.org/project/astrapy/)
9595

96+
### Server-side embeddings
97+
98+
AstraPy works with the "vectorize" feature of the Data API. This means that one can define server-side computation for vector embeddings and use text strings in place of a document vector, both in writing and in reading.
99+
The transformation of said text into an embedding is handled by the Data API, using a provider and model you specify.
100+
101+
```python
102+
my_collection = database.create_collection(
103+
"my_vectorize_collection",
104+
definition=(
105+
CollectionDefinition.builder()
106+
.set_vector_service(
107+
provider="example_vendor",
108+
model_name="embedding_model_name",
109+
authentication={"providerKey": "<STORED_API_KEY_NAME>"} # if needed
110+
)
111+
.build()
112+
)
113+
)
114+
115+
my_collection.insert_one({"$vectorize": "text to make into embedding"})
116+
117+
documents = my_collection.find(sort={"$vectorize": "vector search query text"})
118+
```
119+
120+
See the [Data API reference](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html)
121+
for more on this topic.
122+
123+
### Hybrid search
124+
125+
AstraPy supports the "find and rerank" Data API command,
126+
which performs a hybrid search by combining results from a lexical search
127+
and a vector-based search in a single operation.
128+
129+
```python
130+
r_results = my_collection.find_and_rerank(
131+
sort={"$hybrid": "query text"},
132+
limit=10,
133+
include_scores=True,
134+
)
135+
136+
for r_result in r_results:
137+
print(r_result.document, r_result.scores)
138+
```
139+
140+
The Data API must support the primitive (and one must not have
141+
disabled the feature at collection-creation time).
142+
143+
See the Data API reference, and the docstring for the `find_and_rerank` method,
144+
for more on this topic.
145+
96146
### Using Tables
97147

98148
The example above uses a _collection_, where schemaless "documents" can be stored and retrieved.
@@ -184,7 +234,17 @@ for result in cursor:
184234
my_table.drop()
185235
```
186236

187-
For more on Tables, consult the [Data API documentation about Tables](https://docs.datastax.com/en/astra-db-serverless/api-reference/tables.html).
237+
For more on Tables, consult the [Data API documentation about Tables](https://docs.datastax.com/en/astra-db-serverless/api-reference/tables.html). Note that most features of Collections, with due modifications, hold for Tables as well (e.g. "vectorize", i.e. server-side embeddings).
238+
239+
#### Maps as association lists
240+
241+
In the Data API, table `map` columns with key of a type other than text
242+
have to be expressed as association lists,
243+
i.e. nested lists of lists: `[[key1, value1], [key2, value2], ...]`.
244+
245+
AstraPy objects can be configured to always do so automatically, for a seamless
246+
experience.
247+
See the API Option `serdes_options.encode_maps_as_lists_in_tables` for details.
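
As an illustration of the association-list shape described above, here is a minimal, self-contained sketch in plain Python. It does not use AstraPy and is not how the library performs the conversion; the helper name `map_to_association_list` and the column name `map_column` are hypothetical.

```python
from typing import Any, Dict, List


def map_to_association_list(mapping: Dict[Any, Any]) -> List[List[Any]]:
    """Turn a mapping into a list of [key, value] pairs, preserving insertion order."""
    return [[key, value] for key, value in mapping.items()]


# Hypothetical row for a table whose `map_column` has integer keys:
# a JSON object could not carry non-text keys, while a list of pairs can.
row = {"map_column": map_to_association_list({1: "one", 2: "two"})}
print(row)
```

With the serdes option enabled, this kind of rewriting would not need to be done by hand.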
188248

189249
### Usage with HCD and other non-Astra installations
190250

@@ -393,6 +453,25 @@ my_collection.update_one(
393453
my_collection.insert_one({"_id": uuid8()})
394454
```
395455

456+
### Escaping field names
457+
458+
Field names containing special characters (`.` and `&`) must be correctly escaped
459+
in certain Data API commands. It is a responsibility of the user to ensure escaping
460+
is done when needed; however, AstraPy offers utilities to escape sequences of "path
461+
segments" and -- should it ever be needed -- unescape path-strings back into
462+
literal segments:
463+
464+
```python
465+
from astrapy.utils.document_paths import escape_field_names, unescape_field_path
466+
467+
print(escape_field_names("f1", "f2", 12, "g.&3"))
468+
# prints: f1.f2.12.g&.&&3
469+
print(escape_field_names(["f1", "f2", 12, "g.&3"]))
470+
# prints: f1.f2.12.g&.&&3
471+
print(unescape_field_path("a&&&.b.c.d.12"))
472+
# prints: ['a&.b', 'c', 'd', '12']
473+
```
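
The escaping convention in the example above can be illustrated with a minimal, self-contained sketch. This is not the AstraPy implementation: the helper names `escape_segments` and `unescape_path` are hypothetical, and the rules (`&` as the escape character, `.` as the segment separator, an error on any other use of `&`) are assumptions inferred from the sample outputs shown in this section.

```python
from typing import List, Union


def escape_segments(*segments: Union[str, int]) -> str:
    """Join literal path segments into a single escaped, dot-separated path."""
    escaped = []
    for segment in segments:
        text = str(segment)
        # Escape the escape character first, then the separator.
        escaped.append(text.replace("&", "&&").replace(".", "&."))
    return ".".join(escaped)


def unescape_path(path: str) -> List[str]:
    """Split an escaped path back into literal segments (always as strings)."""
    if path == "":
        return []
    segments: List[str] = []
    current = ""
    i = 0
    while i < len(path):
        char = path[i]
        if char == "&":
            # Assumption: '&' must be followed by '&' or '.'; anything else is an error.
            if i + 1 >= len(path) or path[i + 1] not in "&.":
                raise ValueError(f"Invalid escape sequence at position {i} in {path!r}")
            current += path[i + 1]
            i += 2
        elif char == ".":
            # An unescaped dot closes the current segment.
            segments.append(current)
            current = ""
            i += 1
        else:
            current += char
            i += 1
    segments.append(current)
    return segments


print(escape_segments("f1", "f2", 12, "g.&3"))
print(unescape_path("a&&&.b.c.d.12"))
```

Under these assumptions, the two calls reproduce the outputs listed for the real utilities above, and unescaping an escaped path gives back the original segments as strings.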
474+
396475
## For contributors
397476

398477
First install poetry with `pip install poetry` and then the project dependencies with `poetry install --with dev`.
@@ -468,6 +547,23 @@ poetry run pytest [...] -o log_cli=0
468547
poetry run pytest [...] -o log_cli=1 --log-cli-level=10
469548
```
470549

550+
### Special tests (2025-03-25, Temporary provisions)
551+
552+
Running special tests taking `find_and_rerank` into account, until dev/prod/local discrepancies are resolved.
553+
554+
**Prod** (usual CI) just runs as is and skips f.a.r.r.
555+
556+
**Dev** (manual CI on a hybrid-capable cloud Data API). One must:
557+
558+
1. launch integration tests with `ASTRAPY_TEST_FINDANDRERANK=y`
559+
2. ... but also setting "ASTRAPY_TEST_FINDANDRERANK_SUPPRESS_LEXICAL=y" to suppress actual non-null `"$lexical"` sorts, if not rolled out yet.
560+
561+
**Local** (manual CI on a hybrid-capable locally-running Data API). One must:
562+
563+
1. launch integration tests with `ASTRAPY_TEST_FINDANDRERANK=y`
564+
2. ... but also with `ASTRAPY_FINDANDRERANK_USE_RERANKER_HEADER=y` to pass a reranker API key where needed
565+
3. ... which requires an environment variable `HEADER_RERANKING_API_KEY_NVIDIA` to be set with the `AstraCS:...` dev token.
566+
471567
## Appendices
472568

473569
### Appendix A: quick reference for key imports
@@ -497,12 +593,29 @@ Constants for data-related use:
497593
from astrapy.constants import (
498594
DefaultIdType,
499595
Environment,
596+
MapEncodingMode,
500597
ReturnDocument,
501598
SortMode,
502599
VectorMetric,
503600
)
504601
```
505602

603+
Cursor for find-like operations:
604+
605+
```python
606+
from astrapy.cursors import (
607+
AbstractCursor,
608+
AsyncCollectionFindAndRerankCursor,
609+
AsyncCollectionFindCursor,
610+
AsyncTableFindCursor,
611+
CollectionFindAndRerankCursor,
612+
CollectionFindCursor,
613+
CursorState,
614+
RerankedResult,
615+
TableFindCursor,
616+
)
617+
```
618+
506619
ObjectIds and UUIDs:
507620

508621
```python
@@ -553,8 +666,14 @@ from astrapy.info import (
553666
AlterTableAddVectorize,
554667
AlterTableDropColumns,
555668
AlterTableDropVectorize,
669+
AstraDBAdminDatabaseInfo,
670+
AstraDBDatabaseInfo,
556671
CollectionDefaultIDOptions,
557672
CollectionDefinition,
673+
CollectionDescriptor,
674+
CollectionInfo,
675+
CollectionLexicalOptions,
676+
CollectionRerankOptions,
558677
CollectionVectorOptions,
559678
ColumnType,
560679
CreateTableDefinition,
@@ -563,14 +682,29 @@ from astrapy.info import (
563682
EmbeddingProviderModel,
564683
EmbeddingProviderParameter,
565684
EmbeddingProviderToken,
685+
FindEmbeddingProvidersResult,
686+
FindRerankingProvidersResult,
687+
ListTableDefinition,
688+
ListTableDescriptor,
689+
RerankingProvider,
690+
RerankingProviderAuthentication,
691+
RerankingProviderModel,
692+
RerankingProviderParameter,
693+
RerankingProviderToken,
694+
RerankServiceOptions,
695+
TableAPIIndexSupportDescriptor,
696+
TableAPISupportDescriptor,
566697
TableBaseIndexDefinition,
567698
TableIndexDefinition,
699+
TableIndexDescriptor,
568700
TableIndexOptions,
701+
TableInfo,
569702
TableKeyValuedColumnType,
570703
TableKeyValuedColumnTypeDescriptor,
571704
TablePrimaryKeyDescriptor,
572705
TableScalarColumnTypeDescriptor,
573706
TableUnsupportedColumnTypeDescriptor,
707+
TableUnsupportedIndexDefinition,
574708
TableValuedColumnType,
575709
TableValuedColumnTypeDescriptor,
576710
TableVectorColumnTypeDescriptor,
