Commit 416c84d

Minor internal tests fixes/documentation (#64)
* patched case of auth method not present in provider info
* updated vectorize_credentials.example.json
* updated test spec structure
* vectorize_credentials => vectorize_test_spec
* fixed couple bugs in test filtering
* updated test script helpstring a bit
* updated vectorize_test_spec structure in the DEVGUIDE
* fixed minor typo in DEVGUIDE
* fixed another minor typo in DEVGUIDE
* fixed another minor typo in DEVGUIDE
* holy crap I can't stop making typos
* updated test script helpstring a bit in DEVGUIDE
* updated devguide to talk about vectorize whitelist
* minor addition to running vectorize tests int he DEVGUIDE
* sdafkljdsal;kfjsdaklf
* so many commits...
* running out of things to say... may be time to use whatthecommit
* not like anyone's gonna read these anyways...
* pointless limitation
* [FIX] asdf
* I __ a word
1 parent f0b060c commit 416c84d

8 files changed, +201 -92 lines changed

.env.example

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ ASTRA_RUN_VECTORIZE_TESTS=1
 # - where dimension := 'specified' | 'default' | a specific number
 # - where authType := 'header' | 'providerKey' | 'none'
 # Only needs to match part of the test name to whitelist (use ^$ as necessary)
-# VECTORIZE_WHITELIST=^.*@(header|none)@default
+# VECTORIZE_WHITELIST=^.*@(header|none)@(default|specified)
 VECTORIZE_WHITELIST=.*
 
 # Set this to some value to enable running long-running tests

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -137,4 +137,4 @@ build.zip
 temp
 tsdoc-metadata.json
 
-vectorize_credentials.json
+vectorize_test_spec.json

DEVGUIDE.md

Lines changed: 46 additions & 14 deletions
@@ -7,11 +7,13 @@
 ## Running the tests
 Prerequisites:
 - A JS package manager (npm, bun, etc.)
-- A clean AstraDB instance with two keyspaces—`default_keyspace` and `other_keyspace`
+- A clean Data API instance with two keyspaces—`default_keyspace` and `other_keyspace`
 - Copy the `.env.example` file and create a new `.env` file following the example template
 
+The library comes with a small custom test script, whose usage is shown below:
+
 ```shell
-npm run test -- [--all | --light | --coverage | --prerelease] [-f <filter>] [-b] [--args <raw_args>]
+npm run test -- [--all | --light | --coverage | --prerelease] [-f <filter>] [-w <vectorize_whitelist>] [-b] [--args <raw_args>]
 # or
 npm run test -- <--types>
 ```
@@ -39,19 +41,22 @@ npm run test -- --light -f 'integration.'
 npm run test -- --types
 ```
 
-(bun does not need the extra initial `--` like npm does)
+(bun does not need the extra initial `--` like npm does).
 
 ### Running the tests on local Stargate
 You can do `sh scripts/start-stargate-4-tests.sh` to spin up an ephemeral Data API on DSE instance which automatically
 creates the required keyspaces and destroys itself on exit.
 
-Then, be sure to set the following vars in `.env` exactly, then run the tests as usual.
+Then, be sure to set the following vars in `.env` exactly.
 ```dotenv
 APPLICATION_URI=http://localhost:8181
 APPLICATION_TOKEN=Cassandra:Y2Fzc2FuZHJh:Y2Fzc2FuZHJh
 APPLICATION_ENVIRONMENT=dse
 ```
 
+Once the local Data API instance is ready (you see the output for the created namespaces and everything), you can
+run the tests.
+
 ### Running tagged tests
 Tests can be given certain tags to allow for more granular control over which tests are run. These tags currently include:
 - `[long]`/`'LONG'`: Longer running tests that take more than a few seconds to run
@@ -97,38 +102,65 @@ test suite harder to manage.
 
 ### Running vectorize tests
 To run vectorize tests, you need to have a vectorize-enabled kube running, with the correct tags enabled.
-You must create a file, `vectorize_tests.json`, in the root folder, with the following format:
+
+Ensure `ASTRA_RUN_VECTORIZE_TESTS` and `ASTRA_RUN_LONG_TESTS` are enabled as well (or just pass the `--all` flag to
+the test script).
+
+Lastly, you must create a file, `vectorize_tests.json`, in the root folder, with the following format:
 
 ```ts
-interface VectorizeTestSpec {
+type VectorizeTestSpec = {
   [providerName: string]: {
-    apiKey?: string,
-    providerKey?: string,
+    headers?: {
+      [header: `x-${string}`]: string,
+    },
+    sharedSecret?: {
+      providerKey?: string,
+    },
     dimension?: {
       [modelNameRegex: string]: number,
     },
     parameters?: {
-      [modelNameRegex: string]: Record<string, string>
+      [modelNameRegex: string]: Record<string, string>,
     },
-  }
+  },
 }
 ```
 
 where:
 - `providerName` is the name of the provider (e.g. `nvidia`, `openai`, etc.) as found in `findEmbeddingProviders`.
-- `apiKey` is the API key for the provider (which will be passed in through the header) .
+- `headers` sets the embedding headers to be used for header auth.
+  - resolves to an `EmbeddingHeadersProvider` under the hood—throws error if no corresponding one found.
   - optional if no header auth test wanted.
-- `providerKey` is the provider key for the provider (which will be passed in @ collection creation) .
+- `sharedSecret` is the block for KMS auth (isomorphic to `providerKey`, but it's an object for future-compatability).
+  - `providerKey` is the provider key for the provider (which will be passed in @ collection creation).
   - optional if no KMS auth test wanted.
 - `parameters` is a mapping of model names to their corresponding parameters. The model name can be some regex that partially matches the full model name.
   - `"text-embedding-3-small"`, `"3-small"`, and `".*"` will all match `"text-embedding-3-small"`.
   - optional if not required. `azureOpenAI`, for example, will need this.
-- `dimension` is a also a mapping of model name regex to their corresponding dimensions, like the `parameters` field.
+- `dimension` is also a mapping of model name regex to their corresponding dimensions, like the `parameters` field.
   - optional if not required. `huggingfaceDedicated`, for example, will need this.
 
 This file is gitignored by default and will not be checked into VCS.
 
-See `vectorize_credentials.example.json` for—guess what—an example.
+See `vectorize_test_spec.example.json` for, guess what, an example.
+
+This spec is cross-referenced with `findEmbeddingProviders` to create a suite of tests branching off each possible
+parameter, with tests names of the format `providerName@modelName@authType@dimension`, where each section is another
+potential branch.
+
+These branches can be narrowed down with the `VECTORIZE_WHITELIST` env var (or pass `-w <vectorize_whitelist>` to
+the test script). It's a regex parameter which only needs to match part of the test name to whitelist (so use `^$` as
+necessary).
+
+An example would be `VECTORIZE_WHITELIST=^.*@(header|none)@(default|specified)` to only run the vectorize tests using
+the header auth (or no-auth for nvidia), and only using the default/specified version of the dimension, essentially
+stopping creating additional branches off of authentication and vector dimension to reduce the number of near-duplicate
+tests run.
+
+Defaults to just `*`.
+
+To run *only* the vectorize tests, a common pattern I use is `bun run test --all -f vectorize [-w <vectorize_whitelist>]`.
 
 ### Coverage testing

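The whitelist behaviour described in the DEVGUIDE changes above can be sketched roughly as follows. This is a minimal illustration, not the suite's actual filtering code (which this commit does not show): `toTestName` and `filterByWhitelist` are hypothetical helpers, and the model names are only illustrative. It assumes the whitelist is applied with an unanchored `RegExp` test, which is what "only needs to match part of the test name" suggests.

```typescript
// Hypothetical helpers; the real filtering code is not part of this diff.
type AuthType = 'header' | 'providerKey' | 'none';

interface VectorizeTestName {
  providerName: string;
  modelName: string;
  authType: AuthType;
  dimension: string; // 'specified' | 'default' | a specific number
}

// Builds a name of the form providerName@modelName@authType@dimension
const toTestName = (t: VectorizeTestName): string =>
  `${t.providerName}@${t.modelName}@${t.authType}@${t.dimension}`;

// RegExp.test matches anywhere in the string unless the pattern is anchored with ^ and $
const filterByWhitelist = (names: string[], whitelist: string): string[] => {
  const regex = new RegExp(whitelist);
  return names.filter((name) => regex.test(name));
};

const names = [
  toTestName({ providerName: 'openai', modelName: 'text-embedding-3-small', authType: 'header', dimension: 'default' }),
  toTestName({ providerName: 'openai', modelName: 'text-embedding-3-small', authType: 'providerKey', dimension: 'default' }),
  toTestName({ providerName: 'openai', modelName: 'text-embedding-3-small', authType: 'header', dimension: '512' }),
  toTestName({ providerName: 'nvidia', modelName: 'NV-Embed-QA', authType: 'none', dimension: 'default' }),
];

// Keeps only header/none auth combined with the default/specified dimension
const selected = filterByWhitelist(names, '^.*@(header|none)@(default|specified)');
console.log(selected);
```

With these sample names, the `providerKey` variant and the explicit `512` dimension variant are dropped, while the pattern `.*` (the value set in `.env.example`) keeps all four.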
scripts/test.sh

Lines changed: 14 additions & 5 deletions
@@ -3,9 +3,9 @@
 # Define necessary commands
 test_cmd="npx ts-mocha --paths -p tsconfig.json --recursive tests/prelude.test.ts tests/unit tests/integration --extension .test.ts -t 60000"
 
-all_tests_cmd="env ASTRA_RUN_LONG_TESTS=1 ASTRA_RUN_ADMIN_TESTS=1 ASTRA_RUN_VECTORIZE_TESTS=1 $test_cmd"
+all_tests_cmd="ASTRA_RUN_LONG_TESTS=1 ASTRA_RUN_ADMIN_TESTS=1 ASTRA_RUN_VECTORIZE_TESTS=1 $test_cmd"
 
-light_tests_cmd="env ASTRA_RUN_LONG_TESTS=0 ASTRA_RUN_ADMIN_TESTS=0 ASTRA_RUN_VECTORIZE_TESTS=0 $test_cmd"
+light_tests_cmd="ASTRA_RUN_LONG_TESTS= ASTRA_RUN_ADMIN_TESTS= ASTRA_RUN_VECTORIZE_TESTS= $test_cmd"
 
 run_lint_cmd="npm run lint"
 
@@ -48,6 +48,10 @@ while [ $# -gt 0 ]; do
     "-b")
       bail_early=1
       ;;
+    "-w")
+      shift
+      whitelist="$1"
+      ;;
     "--args")
       shift
       raw_args="$1"
@@ -56,7 +60,7 @@ while [ $# -gt 0 ]; do
       echo "Invalid flag $1"
       echo ""
       echo "Usage:"
-      echo "npm run test -- [--all | --light | --coverage | --prerelease] [-f <filter>] [-b] [--args <raw_args>]"
+      echo "npm run test -- [--all | --light | --coverage | --prerelease] [-f <filter>] [-w <vectorize_whitelist>] [-b] [--args <raw_args>]"
       echo "or"
       echo "npm run test -- <--types>"
       exit
@@ -66,8 +70,8 @@ while [ $# -gt 0 ]; do
 done
 
 # Ensure the flags are compatible with each other
-if [ "$test_type" = '--types' ] && { [ -n "$bail_early" ] || [ -n "$filter" ] || [ -n "$raw_args" ]; }; then
-  echo "Can't use a filter, bail, or args flag when typechecking"
+if [ "$test_type" = '--types' ] && { [ -n "$bail_early" ] || [ -n "$filter" ] || [ -n "$raw_args" ] || [ -n "$whitelist" ]; }; then
+  echo "Can't use a filter, bail, whitelist, or args flag when typechecking"
   exit
 fi
 
@@ -105,5 +109,10 @@ if [ -n "$raw_args" ]; then
   cmd_to_run="$cmd_to_run $raw_args"
 fi
 
+if [ -n "$whitelist" ]; then
+  cmd_to_run="VECTORIZE_WHITELIST='$whitelist' $cmd_to_run"
+fi
+
 # Run it
+echo "$cmd_to_run"
 eval "$cmd_to_run"

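One plausible reading of the `light_tests_cmd` change from `=0` to an empty assignment: if the test helpers treat any non-empty value as "enabled" (as the `.env.example` comment "Set this to some value to enable" hints), then `'0'` would still switch the tests on. The sketch below only illustrates that assumption; `isEnabled` is a hypothetical stand-in, since the suite's real `assertTestsEnabled` is not shown in this commit.

```typescript
// Hypothetical check, assuming the suite enables a test group whenever the
// corresponding variable holds any non-empty string.
const isEnabled = (env: Record<string, string | undefined>, name: string): boolean =>
  Boolean(env[name]);

// Old light command: the variable is set to the string '0', which is truthy in JS
const oldLightEnv = { ASTRA_RUN_LONG_TESTS: '0' };

// New light command: the variable is set to the empty string, which is falsy
const newLightEnv = { ASTRA_RUN_LONG_TESTS: '' };

const enabledBefore = isEnabled(oldLightEnv, 'ASTRA_RUN_LONG_TESTS');
const enabledAfter = isEnabled(newLightEnv, 'ASTRA_RUN_LONG_TESTS');

console.log({ enabledBefore, enabledAfter });
```

Under that assumption the old command would have left the long tests enabled, while the empty assignment disables them.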
tests/integration/data-api/vectorize.test.ts

Lines changed: 18 additions & 10 deletions
@@ -29,8 +29,12 @@ import {
 
 type VectorizeTestSpec = {
   [providerName: string]: {
-    [header: `x-${string}`]: string,
-    providerKey?: string,
+    headers?: {
+      [header: `x-${string}`]: string,
+    }
+    sharedSecret?: {
+      providerKey?: string,
+    }
     dimension?: {
       [modelNameRegex: string]: number,
     },
@@ -84,7 +88,7 @@ describe('integration.data-api.vectorize', () => {
 });
 
 const initVectorTests = async (db: Db) => {
-  const spec = JSON.parse(fs.readFileSync('vectorize_credentials.json', 'utf8')) as VectorizeTestSpec;
+  const spec = JSON.parse(fs.readFileSync('vectorize_test_spec.json', 'utf8')) as VectorizeTestSpec;
 
   const { embeddingProviders } = await (
     (ENVIRONMENT === 'astra')
@@ -156,15 +160,15 @@ const branchOnAuth = (spec: VectorizeTestSpec[string], providerInfo: EmbeddingPr
 
   const ehp = resolveHeaderProvider(spec);
 
-  if (auth['HEADER'].enabled && ehp) {
+  if (auth['HEADER']?.enabled && ehp) {
     tests.push({ ...test, authType: 'header', header: ehp, testName: `${test.testName}@header` });
   }
 
-  if (auth['SHARED_SECRET'].enabled && spec.providerKey && ENVIRONMENT === 'astra') {
-    tests.push({ ...test, authType: 'providerKey', providerKey: spec.providerKey, testName: `${test.testName}@providerKey` });
+  if (auth['SHARED_SECRET']?.enabled && spec.sharedSecret?.providerKey && ENVIRONMENT === 'astra') {
+    tests.push({ ...test, authType: 'providerKey', providerKey: spec.sharedSecret?.providerKey, testName: `${test.testName}@providerKey` });
   }
 
-  if (auth['NONE'].enabled && ENVIRONMENT === 'astra') {
+  if (auth['NONE']?.enabled && ENVIRONMENT === 'astra') {
     tests.push({ ...test, authType: 'none', testName: `${test.testName}@none` });
   }
 
@@ -174,7 +178,7 @@ const branchOnAuth = (spec: VectorizeTestSpec[string], providerInfo: EmbeddingPr
 }
 
 const resolveHeaderProvider = (spec: VectorizeTestSpec[string]) => {
-  const headers = Object.entries(spec).filter(([k]) => k.startsWith('x-')).sort() as [string, string][];
+  const headers = Object.entries(spec?.headers ?? []).sort();
 
   if (headers.length === 0) {
     return null;
@@ -219,7 +223,9 @@ const branchOnDimension = (spec: VectorizeTestSpec[string], modelInfo: Embedding
 type VectorizeTest = DimensionBranch;
 
 const createVectorizeProvidersTest = (db: Db, test: VectorizeTest, name: string) => {
-  it(`[vectorize] [dev] has a working lifecycle (${test.testName})`, async () => {
+  it(`[vectorize] [long] has a working lifecycle (${test.testName})`, async function () {
+    assertTestsEnabled(this, 'VECTORIZE', 'LONG');
+
     const collection = await db.createCollection(name, {
       vector: {
         dimension: test.dimension,
@@ -289,12 +295,14 @@ const createVectorizeProvidersTest = (db: Db, test: VectorizeTest, name: string)
 };
 
 const createVectorizeParamTests = function (db: Db, test: VectorizeTest, name: string) {
-  describe('[vectorize] [dev] $vectorize/vectorize params', () => {
+  describe('[vectorize] $vectorize/vectorize params', () => {
     const collection = db.collection(name, {
       embeddingApiKey: test.header,
     });
 
     before(async function () {
+      assertTestsEnabled(this, 'VECTORIZE');
+
       if (!await db.listCollections({ nameOnly: true }).then(cs => cs.some((c) => c === name))) {
         this.skip();
       }

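The `?.` additions in `branchOnAuth` guard against a provider whose info omits an auth method entirely (the "auth method not present in provider info" case from the commit message). Below is a simplified, self-contained sketch of that branching. The types are reduced stand-ins for the real `EmbeddingProviderInfo` and spec types, `authTypesFor` is a hypothetical name, and the header value is a placeholder.

```typescript
// Reduced stand-ins for the real types; only the fields needed for the branching.
type AuthMethod = 'HEADER' | 'SHARED_SECRET' | 'NONE';

// A provider may omit any auth method entirely, hence Partial
type SupportedAuth = Partial<Record<AuthMethod, { enabled: boolean }>>;

interface ProviderSpec {
  headers?: { [header: `x-${string}`]: string };
  sharedSecret?: { providerKey?: string };
}

// Hypothetical helper returning which auth branches would be generated
const authTypesFor = (spec: ProviderSpec, auth: SupportedAuth, environment: string): string[] => {
  const types: string[] = [];
  const hasHeaders = Object.keys(spec.headers ?? {}).length > 0;

  // Without `?.`, a missing key such as auth['HEADER'] would throw a TypeError here
  if (auth['HEADER']?.enabled && hasHeaders) {
    types.push('header');
  }
  if (auth['SHARED_SECRET']?.enabled && spec.sharedSecret?.providerKey && environment === 'astra') {
    types.push('providerKey');
  }
  if (auth['NONE']?.enabled && environment === 'astra') {
    types.push('none');
  }
  return types;
};

// Provider info that only lists HEADER auth; SHARED_SECRET and NONE are absent
const headerOnly = authTypesFor(
  { headers: { 'x-embedding-api-key': 'placeholder-key' } },
  { HEADER: { enabled: true } },
  'astra',
);
console.log(headerOnly);
```

In this sketch a provider exposing only `HEADER` yields a single `header` branch instead of failing on the absent `SHARED_SECRET` and `NONE` entries.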
tests/prelude.test.ts

Lines changed: 12 additions & 1 deletion
@@ -12,11 +12,22 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
-import { DEFAULT_COLLECTION_NAME, initTestObjects, OTHER_NAMESPACE } from '@/tests/fixtures';
+import { DEFAULT_COLLECTION_NAME, ENVIRONMENT, initTestObjects, OTHER_NAMESPACE } from '@/tests/fixtures';
+import { DEFAULT_NAMESPACE } from '@/src/api';
 
 before(async () => {
   const [, db] = await initTestObjects();
 
+  const admin = (ENVIRONMENT === 'astra')
+    ? db.admin({ environment: ENVIRONMENT })
+    : db.admin({ environment: ENVIRONMENT });
+
+  const namespaces = await admin.listNamespaces();
+
+  if (!namespaces.includes(DEFAULT_NAMESPACE) || !namespaces.includes(OTHER_NAMESPACE)) {
+    throw new Error(`Missing required namespace(s)... make sure you have both ${DEFAULT_NAMESPACE} and ${OTHER_NAMESPACE}`);
+  }
+
   await db.createCollection(DEFAULT_COLLECTION_NAME, { vector: { dimension: 5, metric: 'cosine' }, checkExists: false, namespace: OTHER_NAMESPACE })
     .then(c => c.deleteMany({}));

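The namespace guard added to the prelude can be expressed as a small pure function, which is easy to check without a live database. This is only a sketch: `findMissingNamespaces` is a hypothetical helper, not something in the library, and the keyspace names are taken from the DEVGUIDE prerequisites.

```typescript
// Hypothetical helper: returns the required namespaces absent from the existing ones.
const findMissingNamespaces = (existing: string[], required: string[]): string[] =>
  required.filter((ns) => !existing.includes(ns));

// Keyspace names as listed in the DEVGUIDE prerequisites
const required = ['default_keyspace', 'other_keyspace'];

// Simulated result of listing namespaces on an instance missing one keyspace
const missing = findMissingNamespaces(['default_keyspace', 'system'], required);

if (missing.length > 0) {
  console.log(`Missing required namespace(s): ${missing.join(', ')}`);
}
```

Compared with the boolean check in the diff, this variant can name exactly which keyspace is absent, at the cost of one extra helper.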
vectorize_credentials.example.json

Lines changed: 0 additions & 60 deletions
This file was deleted.
