Commit 6522b9e

Merge pull request #90 from datastax/KG-v2.0
v2.0.0
2 parents c97fb57 + c77cf6f commit 6522b9e

478 files changed

+52963
-18676
lines changed

.env.example

Lines changed: 0 additions & 5 deletions
@@ -10,8 +10,3 @@ CLIENT_DB_TOKEN=<token>
1010

1111
# Backend for the Data API (astra | dse | hcd | cassandra | other). Defaults to 'astra'.
1212
# CLIENT_DB_ENVIRONMENT=<env>
13-
14-
# Uncomment to enable running all (or specific) types of test by default
15-
# CLIENT_RUN_VECTORIZE_TESTS=1
16-
# CLIENT_RUN_LONG_TESTS=1
17-
# CLIENT_RUN_ADMIN_TESTS=1

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -140,5 +140,8 @@ tsdoc-metadata.json
140140
vectorize_test_spec.json
141141
etc/test-reports/
142142
etc/playgrounds/
143+
tmp-lib-check
144+
!examples/astra-db-ts.tgz
143145

144146
.direnv
147+
.mocha

DEVGUIDE.md

Lines changed: 9 additions & 298 deletions
@@ -1,306 +1,17 @@
1-
# DEVGUIDE.md
1+
# DEVGUIDE.md (WIP)
2+
3+
> **Note: this DEVGUIDE is under construction and is not complete yet; see `scripts/docs` for documentation on each of the available scripts.**
24
35
## Contents
4-
1. [Running the tests](#running-the-tests)
5-
1. [Prerequisites](#prerequisites)
6-
2. [I can't be bothered to read all of this](#i-cant-be-bothered-to-read-all-of-this)
7-
3. [The custom test script](#the-custom-test-script)
8-
4. [Test tags](#test-tags)
9-
5. [Running vectorize tests](#running-vectorize-tests)
10-
6. [Running the tests on local Stargate](#running-the-tests-on-local-stargate)
11-
7. [The custom Mocha wrapper](#the-custom-mocha-wrapper)
12-
2. [Typechecking & Linting](#typechecking--linting)
13-
3. [Building the library](#building-the-library)
14-
4. [Publishing](#publishing)
15-
5. [Miscellaneous](#miscellaneous)
6+
1. [I can't be bothered to read all of this](#i-cant-be-bothered-to-read-all-of-this)
7+
2. [Building the library](#building-the-library)
8+
3. [Publishing](#publishing)
9+
4. [Miscellaneous](#miscellaneous)
1610
1. [nix-shell + direnv support](#nix-shell--direnv-support)
1711

18-
## Running the tests
19-
20-
### Prerequisites
21-
22-
- `npm`/`npx`
23-
- A running Data API instance
24-
- A `.env` with the credentials filled out
25-
26-
<sub>*DISCLAIMER: The test suite will create any necessary namespaces/collections, and any existing collections in
27-
the database will be deleted.*</sub>
28-
29-
<sub>*Also, if you for some reason already have an existing namespace called 'slania', it too will be deleted. Not
30-
sure why you'd have a namespace named that, but if you do, you have a good taste in music.*</sub>
31-
32-
### I can't be bothered to read all of this
33-
34-
1. Just make sure `CLIENT_DB_URL` and `CLIENT_DB_TOKEN` are set in your `.env` file
35-
2. If you're running the full test suite, copy `vectorize_test_spec.example.json`, fill out the providers you want
36-
to test, and delete the rest
37-
3. Run one of the following commands:
38-
39-
```sh
40-
# Add '-e dse' or '-e hcd' to the command if running on either of those
41-
42-
# Runs the full test suite (~10m)
43-
sh scripts/test.sh -all # -e dse|hcd
44-
45-
# Runs a version of the test suite that omits all longer-running tests (~2m)
46-
sh scripts/test.sh -light # -e dse|hcd
47-
```
48-
49-
### The custom test script
50-
51-
The `astra-db-ts` test suite uses a custom wrapper around [ts-mocha](https://www.npmjs.com/package/ts-mocha), including
52-
its own custom test script.
53-
54-
This undeniably adds extra complexity and getting-started overhead; the complete rationale is laid out
55-
[here](https://github.com/datastax/astra-db-ts/pull/66#issue-2430902926), but TL;DR:
56-
- We sped up the complete test suite by 500%
57-
- We improved the test filtering capabilities
58-
- We made it easier to write and work with `astra-db-ts`-esque tests
59-
60-
The API for the test script is as follows:
61-
62-
```sh
63-
1. scripts/test.sh
64-
2. [-all | -light | -coverage]
65-
3. [-fand | -for] [-f/F <filter>]+ [-g/G <regex>]+
66-
4. [-w/W <vectorize_whitelist>]
67-
5. [-b | -bail]
68-
6. [-R | -no-report]
69-
7. [-c <http_client>]
70-
8. [-e <environment>]
71-
```
72-
73-
#### 1. The test file (`scripts/test.sh`)
74-
75-
While you can use `npm run test` or `bun run test` if you so desire, attempting to use the test script's flags with it
76-
may be a bit iffy, as the inputs are first "de-quoted" (evaluated) when you use the shell command, but they're
77-
"de-quoted" again when the package manager runs the actual shell command.
78-
79-
Just use `scripts/test.sh` (or `sh scripts/test.sh`) directly if you're using command-line flags and want to
80-
avoid a headache.
81-
82-
#### 2. The test types (`[-all | -light | -coverage]`)
83-
84-
There are three main test types:
85-
- `-all`: This is a shorthand for enabling the `(LONG)`, `(ADMIN)`, and `(VECTORIZE)` tests (alongside all the normal tests that always run)
86-
- `-light`: This is a shorthand for disabling the aforementioned tests. This runs only the normal tests, which are much quicker to run in comparison
87-
- `-coverage`: This runs all tests, but uses `nyc` to collect coverage statistics. It also enables the `-b` (bail) flag, as there is no point continuing if a test fails
88-
89-
By default, just running `scripts/test.sh` will be like using `-light`, but you can set the default config for which tests
90-
to enable in your `.env` file, through the `CLIENT_RUN_*_TESTS` env vars.
91-
92-
#### 3. The test filters (`[-fand | -for] [-f/F <filter>]+ [-g/G <regex>]+`)
93-
94-
The `astra-db-ts` test suite implements fully custom test filtering, inspired by Mocha's, but improved upon.
95-
96-
You can add a basic filter using `-f <filter>` which acts like Mocha's own `-f` flag. Like Mocha, we also support `-g`,
97-
which is like `-f`, but for regex. Each only needs to match a part of the test name (or its parent describes' names) to
98-
succeed, so use `^$` as necessary.
99-
100-
Unlike Mocha, there is no `-i` flag—instead, you can invert a filter by using `-F <filter>` or `-G <regex>`, so that the
101-
test needs to NOT match that string/regex to run.
102-
103-
You can also use multiple filters by simply using multiple of `-f`, `-g`, `-F`, and `-G` as you please. By default,
104-
it'll only run a test if it satisfies all the filters (`-fand`), but you can use the `-for` flag to run a test if
105-
it satisfies any one of the filters.
106-
107-
In case filters overlap, an inverted filter always wins over a regular filter, and the conflicted test won't run.
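
The filter rules above can be sketched roughly as follows. This is only an illustration of the described semantics, not the test script's actual implementation: the `TestFilter` type and the `patternMatches`/`shouldRun` names are hypothetical, and the handling of inverted filters under `-for` is an assumption based on the "inverted filter always wins" rule.

```ts
// Hypothetical representation of one -f/-F/-g/-G flag (not the real script's types).
interface TestFilter {
  pattern: string;   // text given to -f/-F, or regex source given to -g/-G
  isRegex: boolean;  // true for -g/-G
  inverted: boolean; // true for -F/-G
}

// A pattern only needs to match part of the full test name.
function patternMatches(filter: TestFilter, testName: string): boolean {
  return filter.isRegex
    ? new RegExp(filter.pattern).test(testName)
    : testName.indexOf(filter.pattern) !== -1;
}

// mode 'and' corresponds to -fand (the default), 'or' to -for.
function shouldRun(testName: string, filters: TestFilter[], mode: 'and' | 'or' = 'and'): boolean {
  if (filters.length === 0) return true;

  // Assumption: an inverted filter whose pattern matches vetoes the test outright.
  if (filters.some((f) => f.inverted && patternMatches(f, testName))) return false;

  // A regular filter is satisfied by a match; an inverted one by a non-match.
  const satisfied = (f: TestFilter) => patternMatches(f, testName) !== f.inverted;
  return mode === 'and' ? filters.every(satisfied) : filters.some(satisfied);
}
```

Under this sketch, `-f VECTORIZE -G '^\(LONG\)'` runs a test named `(VECTORIZE) embeds` but skips `(LONG) (VECTORIZE) embeds`, in either mode.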
108-
109-
#### 4. The vectorize whitelist (`[-w/W <vectorize_whitelist>]`)
110-
111-
There's a special filtering system just for vectorize tests, called the "vectorize whitelist", of which there are two
112-
different types: either a piece of regex, or a special filter operator.
113-
114-
##### Regex filtering
115-
116-
Every vectorize test is given a test name representing every branch it took to become that specific test. It is
117-
of the following format:
118-
119-
```sh
120-
# providerName@modelName@authType@dimension
121-
# where dimension := 'specified' | 'default' | <some_number>
122-
# where authType := 'header' | 'providerKey' | 'none'
123-
```
124-
125-
Again, the regex only needs to match part of each test's name to succeed, so use `^$` as necessary.
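
To make the name format concrete, here is a small sketch that splits such a name back into its four parts. The `parseVectorizeTestName` helper and its types are hypothetical and exist only for illustration; the provider and model names in the usage note are example values.

```ts
type AuthType = 'header' | 'providerKey' | 'none';

interface VectorizeTestName {
  providerName: string;
  modelName: string;
  authType: AuthType;
  dimension: 'specified' | 'default' | number;
}

function isAuthType(s: string): s is AuthType {
  return s === 'header' || s === 'providerKey' || s === 'none';
}

// Hypothetical helper: parses 'providerName@modelName@authType@dimension'.
function parseVectorizeTestName(name: string): VectorizeTestName {
  // Split on the first '@' and the last two '@'s, so a model name containing '@' would still parse.
  const first = name.indexOf('@');
  const last = name.lastIndexOf('@');
  const secondLast = name.lastIndexOf('@', last - 1);
  if (first === -1 || secondLast <= first) {
    throw new Error('Malformed vectorize test name: ' + name);
  }

  const authType = name.slice(secondLast + 1, last);
  if (!isAuthType(authType)) {
    throw new Error('Unknown auth type: ' + authType);
  }

  const rawDimension = name.slice(last + 1);
  let dimension: 'specified' | 'default' | number;
  if (rawDimension === 'specified' || rawDimension === 'default') {
    dimension = rawDimension;
  } else if (/^\d+$/.test(rawDimension)) {
    dimension = parseInt(rawDimension, 10);
  } else {
    throw new Error('Unknown dimension: ' + rawDimension);
  }

  return {
    providerName: name.slice(0, first),
    modelName: name.slice(first + 1, secondLast),
    authType,
    dimension,
  };
}
```

For example, a whitelist regex such as `^openai@.*@header@` would target names that parse to provider `openai` with header auth.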
126-
127-
##### Filter operators
128-
129-
The vectorize test suite also defines some custom "filter operators" to provide filtering that can't be done through
130-
basic regex. They take the format `-w $<operator>:<colon_separated_args>`
131-
132-
1. `$limit:<number>` - This is a limit over the total number of vectorize tests, only running up to the specified amount
133-
2. `$provider-limit:<number>` - This limits the amount of vectorize tests that can be run per provider
134-
3. `$model-limit:<number>` - Akin to the above, but limits per model.
135-
136-
The default whitelist is `$limit-per-model:1`.
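
As a rough sketch of how the two whitelist kinds could be told apart, the following treats any value starting with `$` as a filter operator and everything else as a regex. The `parseWhitelist` function and `VectorizeWhitelist` type are hypothetical, and the real script may distinguish the two differently.

```ts
// Hypothetical parsed form of the -w argument.
type VectorizeWhitelist =
  | { kind: 'regex'; regex: RegExp }
  | { kind: 'operator'; operator: string; args: string[] };

function parseWhitelist(raw: string): VectorizeWhitelist {
  // Assumption: a leading '$' always marks an operator, never a regex.
  if (raw.charAt(0) !== '$') {
    return { kind: 'regex', regex: new RegExp(raw) };
  }
  // '$<operator>:<colon_separated_args>' -> operator name plus its arguments.
  const parts = raw.slice(1).split(':');
  const operator = parts.shift() || '';
  return { kind: 'operator', operator, args: parts };
}
```

With this sketch, `$limit:5` parses to the operator `limit` with the single argument `5`, while `^openai@` stays a plain regex.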
137-
138-
#### 5. Bailing (`[-b | -bail]`)
139-
140-
Simply sets the bail flag, as it does in Mocha. Forces the test script to exit after a single test failure.
141-
142-
#### 6. Disabling error reporting (`[-R | -no-report]`)
143-
144-
By default, the test suite logs the complete error object of every error thrown during your tests to the
145-
`./etc/test-reports` directory for greatest debuggability. However, this can be disabled for a test run using the
146-
`-R`/`-no-report` flag.
147-
148-
#### 7. The HTTP client (`[-c <http_client>]`)
149-
150-
By default, `astra-db-ts` will run its tests on `fetch-h2` using `HTTP/2`, but you can specify a specific client, which
151-
is one of `default:http1`, `default:http2`, or `fetch`.
152-
153-
#### 8. The Data API environment (`[-e <environment>]`)
154-
155-
By default, `astra-db-ts` assumes you're running on Astra, but you can specify the Data API environment through this
156-
flag. It should be one of `dse`, `hcd`, `cassandra`, or `other`. You can also provide `astra`, but it wouldn't really
157-
do anything. But I'm not the boss of you; you can make your own big-boy/girl/other decisions.
158-
159-
### Test tags
160-
161-
The `astra-db-ts` test suite uses the concept of "test tags" to further advance test filtering. These are tags in
162-
the names of test blocks, such as `(LONG) createCollection tests` or `(ADMIN) (ASTRA) AstraAdmin tests`.
163-
164-
These tags are automatically parsed and filtered through the custom wrapper our test suite uses, though
165-
you can still interact with them through test filters as well. For example, I commonly use `-f VECTORIZE` to
166-
only run the vectorize tests.
167-
168-
Current tags include:
169-
- `VECTORIZE` - Enabled if `CLIENT_RUN_VECTORIZE_TESTS` is set (or `-all` is set)
170-
- `LONG` - Enabled if `CLIENT_RUN_LONG_TESTS` is set (or `-all` is set)
171-
- `ADMIN` - Enabled if `CLIENT_RUN_ADMIN_TESTS` is set (or `-all` is set)
172-
- `DEV` - Automatically enabled if running on Astra-dev
173-
- `NOT-DEV` - Automatically enabled if not running on Astra-dev
174-
- `ASTRA` - Automatically enabled if running on Astra
175-
176-
Attempting to set any other test tag will throw an error. (All test tags must contain only uppercase letters &
177-
hyphens; any tag not matching `\([A-Z-]+?\)` will not be counted.)
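
A minimal sketch of tag extraction, assuming tags are parenthesised runs of uppercase letters and hyphens as described above. The `extractTags` helper and `KNOWN_TAGS` constant are hypothetical names, not part of the test library.

```ts
// The tags listed above; anything else is rejected.
const KNOWN_TAGS = ['VECTORIZE', 'LONG', 'ADMIN', 'DEV', 'NOT-DEV', 'ASTRA'];

// Hypothetical helper: pulls the test tags out of a suite or test name.
function extractTags(name: string): string[] {
  // Parenthesised text that is not all uppercase letters/hyphens is simply not counted as a tag.
  const matches: string[] = name.match(/\([A-Z-]+?\)/g) || [];
  return matches.map((m) => {
    const tag = m.slice(1, -1); // strip the surrounding parentheses
    if (KNOWN_TAGS.indexOf(tag) === -1) {
      throw new Error('Unknown test tag: ' + tag);
    }
    return tag;
  });
}
```

For instance, `(ADMIN) (ASTRA) AstraAdmin tests` yields the tags `ADMIN` and `ASTRA`, while `(lowercase) test` yields none.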
178-
179-
### Running vectorize tests
180-
181-
To run vectorize tests, you need to have a vectorize-enabled kube running, with the correct tags enabled.
182-
183-
Ensure `CLIENT_RUN_VECTORIZE_TESTS` and `CLIENT_RUN_LONG_TESTS` are enabled as well (or just pass the `-all` flag to
184-
the test script).
185-
186-
Lastly, you must create a file, `vectorize_test_spec.json`, in the root folder, with the following format:
187-
188-
```ts
189-
type VectorizeTestSpec = {
190-
[providerName: string]: {
191-
headers?: {
192-
[header: `x-${string}`]: string,
193-
},
194-
sharedSecret?: {
195-
providerKey?: string,
196-
},
197-
dimension?: {
198-
[modelNameRegex: string]: number,
199-
},
200-
parameters?: {
201-
[modelNameRegex: string]: Record<string, string>,
202-
},
203-
warmupErr?: string,
204-
},
205-
}
206-
```
207-
208-
where:
209-
- `providerName` is the name of the provider (e.g. `nvidia`, `openai`, etc.) as found in `findEmbeddingProviders`.
210-
- `headers` sets the embedding headers to be used for header auth.
211-
- resolves to an `EmbeddingHeadersProvider` under the hood—throws error if no corresponding one found.
212-
- optional if no header auth test wanted.
213-
- `sharedSecret` is the block for KMS auth (isomorphic to `providerKey`, but it's an object for future-compatibility).
214-
- `providerKey` is the provider key for the provider (which will be passed in @ collection creation).
215-
- optional if no KMS auth test wanted.
216-
- `parameters` is a mapping of model names to their corresponding parameters. The model name can be some regex that partially matches the full model name.
217-
- `"text-embedding-3-small"`, `"3-small"`, and `".*"` will all match `"text-embedding-3-small"`.
218-
- optional if not required. `azureOpenAI`, for example, will need this.
219-
- `dimension` is also a mapping of model name regex to their corresponding dimensions, like the `parameters` field.
220-
- optional if not required. `huggingfaceDedicated`, for example, will need this.
221-
- `warmupErr` may be set if the provider errors on a cold start
222-
- if set, the provider will be called in a `while (true)` loop until it stops throwing an error matching this message
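
The partial-regex matching used by the `parameters` and `dimension` mappings can be sketched like this. The `lookupByModel` helper is hypothetical, and picking the first matching key in insertion order is an assumption; the numbers in the usage note are arbitrary.

```ts
// Hypothetical helper: finds the entry whose key, read as a regex, matches part of the model name.
function lookupByModel<T>(mapping: Record<string, T>, modelName: string): T | undefined {
  for (const pattern of Object.keys(mapping)) {
    // Partial match: '3-small' and '.*' both match 'text-embedding-3-small'.
    if (new RegExp(pattern).test(modelName)) {
      return mapping[pattern];
    }
  }
  return undefined; // no entry applies to this model
}
```

So `lookupByModel({ '3-small': 512 }, 'text-embedding-3-small')` returns the entry for `3-small`, and a model with no matching key gets `undefined`.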
223-
224-
This file is .gitignore-d by default and will not be checked into VCS.
225-
226-
See `vectorize_test_spec.example.json` for, guess what, an example.
227-
228-
This spec is cross-referenced with `findEmbeddingProviders` to create a suite of tests branching off each possible
229-
parameter, with test names of the format `providerName@modelName@authType@dimension`, where each section is another
230-
potential branch.
231-
232-
To run *only* the vectorize tests, a common pattern I use is `scripts/test.sh -all -f VECTORIZE [-w <vectorize_whitelist>]`.
233-
234-
### Running the tests on local Stargate
235-
In another terminal tab, you can do `sh scripts/start-stargate-4-tests.sh` to spin up an ephemeral Data API on DSE
236-
instance which will destroy itself on script exit. The test suite will set up any keyspaces/collections as necessary.
237-
238-
Then, be sure to set the following vars in `.env` exactly.
239-
```dotenv
240-
CLIENT_DB_URL=http://localhost:8181
241-
CLIENT_DB_TOKEN=Cassandra:Y2Fzc2FuZHJh:Y2Fzc2FuZHJh
242-
CLIENT_DB_ENVIRONMENT=dse
243-
```
244-
245-
Once the local Data API instance is fully started and ready for requests, you can run the tests.
246-
247-
### The custom Mocha wrapper
248-
249-
The `astra-db-ts` test suite is massively IO-bound, and desires a more advanced test filtering system than
250-
Mocha provides by default. As such, we have written a (relatively) light custom wrapper around Mocha, extending
251-
it to allow us to squeeze all possible performance out of our tests, and make it easier to write, scale, and work
252-
with tests in both the present, and the future.
253-
254-
#### The custom test functions
255-
256-
The most prominent changes are the introduction of 5 new Mocha-API-esque functions (two of which are overhauls)
257-
- [`describe`](https://github.com/datastax/astra-db-ts/blob/60fa445192b6a648b7a139a45986af8525a37ffb/tests/testlib/describe.ts) - An overhaul to the existing `describe` block
258-
- Provides fresh instances of the "common fixtures" in its callback
259-
- Performs "tag filtering" on the suite names
260-
- Some suite options to reduce boilerplate
261-
- `truncateColls: 'default'` - Does `deleteMany({})` on the default collection in the default namespace after each test case
262-
- `truncateColls: 'both'` - Does `deleteMany({})` on the default collection in both test namespaces after each test case
263-
- `dropEphemeral: 'after'` - Drops all non-default collections in both test namespaces after all the test cases in the suite
264-
- `dropEphemeral: 'afterEach'` - Drops all non-default collections in both test namespaces after each test case
265-
- [`it`](https://github.com/datastax/astra-db-ts/blob/60fa445192b6a648b7a139a45986af8525a37ffb/tests/testlib/it.ts) - An overhaul to the existing `it` block
266-
- Performs "tag filtering" on the test names
267-
- Provides unique string keys for every test case
268-
- [`parallel`](https://github.com/datastax/astra-db-ts/blob/60fa445192b6a648b7a139a45986af8525a37ffb/tests/testlib/parallel.ts) - A wrapper around `describe` which runs all of its test cases in parallel
269-
- Only allows `it`, `before`, `after`, and a single layer of `describe` functions
270-
- Will run all tests simultaneously in a `before` hook, capture any exceptions, and rethrow them in reconstructed `it`/`describe` blocks for the most native-like behavior
271-
- Performs tag and test filtering as normal
272-
- Nearly all integration tests have been made parallel
273-
- [`background`](https://github.com/datastax/astra-db-ts/blob/60fa445192b6a648b7a139a45986af8525a37ffb/tests/testlib/background.ts) - A version of `describe` which runs in the background while all the other test cases run
274-
- Only allows `it` blocks
275-
- Will run the test at the very start of the test script, capture any exceptions, and rethrow them in reconstructed `it`/`describe` blocks for the most native-like behavior at the end of the test script
276-
- Performs tag and test filtering as normal
277-
- Meant for independent tests that take a very long time to execute (such as the `integration.devops.db-admin` lifecycle test)
278-
279-
These are not globals like Mocha's—rather, they are imported, like so:
280-
```ts
281-
import { background, describe, it, parallel } from '@/tests/testlib';
282-
```
283-
284-
#### Examples
285-
286-
You can find examples of usages of each in most, if not all, test files, such as:
287-
- [`/tests/integration/misc/timeouts.test.ts`](https://github.com/datastax/astra-db-ts/blob/60fa445192b6a648b7a139a45986af8525a37ffb/tests/integration/misc/timeouts.test.ts) (`describe`, `parallel`, `it`)
288-
- [`/tests/integration/devops/lifecycle.test.ts`](https://github.com/datastax/astra-db-ts/blob/60fa445192b6a648b7a139a45986af8525a37ffb/tests/integration/devops/lifecycle.test.ts) (`background`)
289-
290-
## Typechecking & Linting
291-
292-
The test script also provides typechecking and linting through the following commands:
293-
294-
```sh
295-
# Full typechecking
296-
scripts/test.sh -tc
297-
298-
# Linting
299-
scripts/test.sh -lint
12+
## I can't be bothered to read all of this
30013

301-
# Or even both
302-
scripts/test.sh -lint -tc
303-
```
14+
yeah, fair enough.
30415

30516
## Building the library
30617
