|
7 | 7 | ## Running the tests
|
8 | 8 | Prerequisites:
|
9 | 9 | - A JS package manager (npm, bun, etc.)
|
10 |
| -- A clean AstraDB instance with two keyspaces—`default_keyspace` and `other_keyspace` |
| 10 | +- A clean Data API instance with two keyspaces—`default_keyspace` and `other_keyspace` |
11 | 11 | - Copy the `.env.example` file and create a new `.env` file following the example template
|
12 | 12 |
|
| 13 | +The library comes with a small custom test script, whose usage is shown below: |
| 14 | + |
13 | 15 | ```shell
|
14 |
| -npm run test -- [--all | --light | --coverage | --prerelease] [-f <filter>] [-b] [--args <raw_args>] |
| 16 | +npm run test -- [--all | --light | --coverage | --prerelease] [-f <filter>] [-w <vectorize_whitelist>] [-b] [--args <raw_args>] |
15 | 17 | # or
|
16 | 18 | npm run test -- <--types>
|
17 | 19 | ```
|
@@ -39,19 +41,22 @@ npm run test -- --light -f 'integration.'
|
39 | 41 | npm run test -- --types
|
40 | 42 | ```
|
41 | 43 |
|
42 |
| -(bun does not need the extra initial `--` like npm does) |
| 44 | +(bun does not need the extra initial `--` like npm does). |
43 | 45 |
|
44 | 46 | ### Running the tests on local Stargate
|
45 | 47 | You can do `sh scripts/start-stargate-4-tests.sh` to spin up an ephemeral Data API on DSE instance which automatically
|
46 | 48 | creates the required keyspaces and destroys itself on exit.
|
47 | 49 |
|
48 |
| -Then, be sure to set the following vars in `.env` exactly, then run the tests as usual. |
| 50 | +Then, be sure to set the following vars in `.env` exactly. |
49 | 51 | ```dotenv
|
50 | 52 | APPLICATION_URI=http://localhost:8181
|
51 | 53 | APPLICATION_TOKEN=Cassandra:Y2Fzc2FuZHJh:Y2Fzc2FuZHJh
|
52 | 54 | APPLICATION_ENVIRONMENT=dse
|
53 | 55 | ```
|
54 | 56 |
|
| 57 | +Once the local Data API instance is ready (that is, once you see the output for the created namespaces), you can |
| 58 | +run the tests. |
| 59 | + |
55 | 60 | ### Running tagged tests
|
56 | 61 | Tests can be given certain tags to allow for more granular control over which tests are run. These tags currently include:
|
57 | 62 | - `[long]`/`'LONG'`: Longer running tests that take more than a few seconds to run
|
@@ -97,38 +102,65 @@ test suite harder to manage.
|
97 | 102 |
|
98 | 103 | ### Running vectorize tests
|
99 | 104 | To run vectorize tests, you need to have a vectorize-enabled kube running, with the correct tags enabled.
|
100 |
| -You must create a file, `vectorize_tests.json`, in the root folder, with the following format: |
| 105 | + |
| 106 | +Ensure `ASTRA_RUN_VECTORIZE_TESTS` and `ASTRA_RUN_LONG_TESTS` are enabled as well (or just pass the `--all` flag to |
| 107 | +the test script). |
| 108 | + |
| 109 | +Lastly, you must create a file, `vectorize_tests.json`, in the root folder, with the following format: |
101 | 110 |
|
102 | 111 | ```ts
|
103 |
| -interface VectorizeTestSpec { |
| 112 | +type VectorizeTestSpec = { |
104 | 113 | [providerName: string]: {
|
105 |
| - apiKey?: string, |
106 |
| - providerKey?: string, |
| 114 | + headers?: { |
| 115 | + [header: `x-${string}`]: string, |
| 116 | + }, |
| 117 | + sharedSecret?: { |
| 118 | + providerKey?: string, |
| 119 | + }, |
107 | 120 | dimension?: {
|
108 | 121 | [modelNameRegex: string]: number,
|
109 | 122 | },
|
110 | 123 | parameters?: {
|
111 |
| - [modelNameRegex: string]: Record<string, string> |
| 124 | + [modelNameRegex: string]: Record<string, string>, |
112 | 125 | },
|
113 |
| - } |
| 126 | + }, |
114 | 127 | }
|
115 | 128 | ```
|
116 | 129 |
|
117 | 130 | where:
|
118 | 131 | - `providerName` is the name of the provider (e.g. `nvidia`, `openai`, etc.) as found in `findEmbeddingProviders`.
|
119 |
| -- `apiKey` is the API key for the provider (which will be passed in through the header) . |
| 132 | +- `headers` sets the embedding headers to be used for header auth. |
| 133 | + - resolves to an `EmbeddingHeadersProvider` under the hood; throws an error if no corresponding one is found. |
120 | 134 | - optional if no header auth test wanted.
|
121 |
| -- `providerKey` is the provider key for the provider (which will be passed in @ collection creation) . |
| 135 | +- `sharedSecret` is the block for KMS auth (isomorphic to `providerKey`, but it's an object for future-compatibility). |
| 136 | + - `providerKey` is the provider key for the provider (which will be passed in at collection creation). |
122 | 137 | - optional if no KMS auth test wanted.
|
123 | 138 | - `parameters` is a mapping of model names to their corresponding parameters. The model name can be some regex that partially matches the full model name.
|
124 | 139 | - `"text-embedding-3-small"`, `"3-small"`, and `".*"` will all match `"text-embedding-3-small"`.
|
125 | 140 | - optional if not required. `azureOpenAI`, for example, will need this.
|
126 |
| -- `dimension` is a also a mapping of model name regex to their corresponding dimensions, like the `parameters` field. |
| 141 | +- `dimension` is also a mapping of model name regex to their corresponding dimensions, like the `parameters` field. |
127 | 142 | - optional if not required. `huggingfaceDedicated`, for example, will need this.
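
As a rough illustration of the partial regex matching described for `parameters` and `dimension`, here is a minimal sketch. The helper name `resolveByModelRegex`, the "first matching pattern wins" rule, and the sample dimension values are assumptions made for this example only; the test suite's actual lookup logic may differ.

```ts
// Hypothetical helper (not part of the library): returns the value of the
// first pattern in the mapping that partially matches the full model name.
function resolveByModelRegex<T>(
  mapping: Record<string, T> | undefined,
  modelName: string,
): T | undefined {
  for (const [pattern, value] of Object.entries(mapping ?? {})) {
    // RegExp#test matches anywhere in the string, so "3-small" is enough
    // to match "text-embedding-3-small".
    if (new RegExp(pattern).test(modelName)) {
      return value;
    }
  }
  return undefined;
}

// Illustrative `dimension` block; the numbers are placeholders.
const exampleDimension: Record<string, number> = {
  '3-small': 1536,
  '.*': 1024,
};

const smallDim = resolveByModelRegex(exampleDimension, 'text-embedding-3-small');
const otherDim = resolveByModelRegex(exampleDimension, 'text-embedding-3-large');
const missingDim = resolveByModelRegex(undefined, 'text-embedding-3-small');
```

Under these assumptions, the more specific pattern is listed first so that the catch-all `.*` only applies to models no earlier pattern matched.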
|
128 | 143 |
|
129 | 144 | This file is gitignored by default and will not be checked into VCS.
|
130 | 145 |
|
131 |
| -See `vectorize_credentials.example.json` for—guess what—an example. |
| 146 | +See `vectorize_test_spec.example.json` for, guess what, an example. |
| 147 | +
|
| 148 | +This spec is cross-referenced with `findEmbeddingProviders` to create a suite of tests branching off each possible |
| 149 | +parameter, with test names of the format `providerName@modelName@authType@dimension`, where each section is another |
| 150 | +potential branch. |
| 151 | +
|
| 152 | +These branches can be narrowed down with the `VECTORIZE_WHITELIST` env var (or pass `-w <vectorize_whitelist>` to |
| 153 | +the test script). It's a regex parameter which only needs to match part of the test name to whitelist it (so use the |
| 154 | +`^` and `$` anchors as necessary). |
| 155 | +
|
| 156 | +An example would be `VECTORIZE_WHITELIST=^.*@(header|none)@(default|specified)` to only run the vectorize tests using |
| 157 | +the header auth (or no-auth for nvidia), and only using the default/specified version of the dimension, essentially |
| 158 | +stopping creating additional branches off of authentication and vector dimension to reduce the number of near-duplicate |
| 159 | +tests run. |
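
To make the effect of such a whitelist concrete, here is a minimal sketch of partial regex matching against test names in the `providerName@modelName@authType@dimension` format. The function `filterByWhitelist`, the sample test names, and the `providerKey` and `256` branch labels are hypothetical and only illustrate the filtering idea; they are not taken from the test script itself.

```ts
// Hypothetical helper: keeps only the test names the whitelist regex matches.
// RegExp#test matches anywhere in the name unless the pattern is anchored.
function filterByWhitelist(testNames: string[], whitelist: string): string[] {
  const regex = new RegExp(whitelist);
  return testNames.filter((name) => regex.test(name));
}

// Illustrative test names (assumed branch labels).
const sampleNames = [
  'openai@text-embedding-3-small@header@default',
  'openai@text-embedding-3-small@providerKey@default',
  'openai@text-embedding-3-small@header@256',
  'nvidia@NV-Embed-QA@none@default',
];

const kept = filterByWhitelist(
  sampleNames,
  '^.*@(header|none)@(default|specified)',
);
console.log(kept);
```

With these sample names, only the header-auth and no-auth branches that use the default dimension remain; the `providerKey` branch and the extra dimension branch are filtered out.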
| 160 | +
|
| 161 | +Defaults to just `*`. |
| 162 | +
|
| 163 | +To run *only* the vectorize tests, a common pattern I use is `bun run test --all -f vectorize [-w <vectorize_whitelist>]`. |
132 | 164 |
|
133 | 165 | ### Coverage testing
|
134 | 166 |
|
|