SCHOL-273: Run /chat API tests in local builds#1025
Open
Conversation
ONLY for backend/etl-pipeline!
- allow refresh of tokens in docker container - remove dummy aws env vars - add google api key fo docker env - update create vra user/password fixture
…o NOREF/aws-auth-docker-compose
- document - improve error clarity frbrized_record_data - read post ETL pipeline values by record.source_id to prevent confusion is cases where the same record data inserted multiple times into the DB resulting in multiple Records with the same source_id
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
05398e3 to
34b9053
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Changes summary
Enables running tests for the /chat API endpoint against a fresh local dockerized build by introducing two scripts to generate and seed the local Postgres database with FRBR graph data associated with two known editions.
Only two editions were chosen as of right now since they're always returned for the prompt currently used in the corresponding /chat smoke tests. Even though /chat responses are non-deterministic, follow-up work that wires hybrid search to a dedicated testing namespace in the vector DB for builds in CI (and local, if desired) will minimize the potential for test flakiness.
Tests supported
The /chat API tests—while they exist in this feature branch—were not directly added to it and they do not yet exist in main (see #947) due to setup blockers resolved by NOREF/aws-auth-docker-compose, which pulled in the tests along with the crucial setup. This branch was merged with that one, therefore this PR includes the tests themselves and the ability to run them in CI.
Fetching data for the local Postgres DB
The script that handles generating the seed data has been committed to the dev-scripts folder along with these changes so that it can be used continually to add more as the test suite scales. If a different location for this script is desired (e.g. within the test suite), please indicate so.
Question:
Is there anything sensitive in the current seed data file? If so, it can instead be generated fresh each time in CI then cleaned up after, however that introduces an external dependency during test setup (i.e. reading from the production Postgres DB) along with increased run times.
Tests workflow
A new step has been added to the respective workflow to handle local seeding before the VRA integration tests are run.
Question:
Is there a different ordering desired for this step?
Re: ordering
The functional test run in CI was relocated so that it's run after integration tests due to a failure that was occurring which ended the job before the impact of the changes within immediate scope could be witnessed in CI, however that failure is no longer occurring in the latest commit (unsure why, but it isn't).
Is there any objection to keeping the test run ordering this way?
Functional tests are usually run after integration tests in a typical layered testing approach, but there is value in running them first to fail faster in CI if there is something awry.
Documentation
The README has been updated to include instructions on using this new alternate method of seeding a local DB.
How to test
Start by running
make down-cleanfrom etl-pipeline/ to wipe any existing Docker containers, networks, volumes, etc., since the goal of these changes is to ensure VRA non-unit tests can run in CI against freshly-built local dockerized services.Then run the commands given in the docstring comment at the top of seed_frbr_data.py followed by
make intregration-vra(defaults to local without specifyingENVIRONMENT) while local dockerized services are up.Ensure all tests pass locally and in CI.
Perhaps more importantly, inspect the /chat response yielded from the test query to verify it is well-formed. Reviewers added to this PR are more familiar with that response than I am currently. Note that while logs may indicate hits on editions other than the two with associated FRBR data, only those two are returned as results.