SCHOL 399: merge etl pipeline QA env tests #1033
Conversation
Switched authentication to IAM roles, as recommended for use with the self-hosted runner, so I can confirm the tests work on the self-hosted runner before merging.
It turns out the manual trigger only works once the workflow is in the "main" version, and by then we will have already triggered etl-pipeline-ci via the merge to main.
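For context, a workflow's manual trigger comes from a `workflow_dispatch` entry, and GitHub only surfaces it in the Actions UI once the workflow file exists on the default branch. A minimal sketch of the trigger block (the push-to-main trigger is an assumption about etl-pipeline-ci, not its actual contents):

```yaml
# Hypothetical trigger block; workflow_dispatch only becomes available
# in the Actions UI after this file lands on the default branch (main),
# at which point the push trigger below will already have fired.
on:
  push:
    branches: [main]
  workflow_dispatch:
```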
pip install --prefer-binary -r dev-requirements.txt
pip install --prefer-binary -r requirements.txt
- name: Run API integration tests against QA env   # renamed from "Run API tests"
  working-directory: ./etl-pipeline
  run: |
    pytest tests/integration/api --env=qa
- name: Run (some) functional tests against QA env
  working-directory: ./etl-pipeline
  run: |
    python -m pytest tests/functional/processes/ingest \
      --reruns=3 \
      --reruns-delay=5 \
      --env=qa
We have been running only a subset of the functional/integration tests on QA, after the PR is merged into main and the new main HEAD is deployed to QA.
I guess this makes sense: all tests (unit, functional, QA) passed against the local env in the PR, so maybe we only need to test the code that is now running on QA infra rather than local infra, and we do not expect differences between QA infra and local infra for tests that rely on DB data fixtures, etc.
However, I do wonder if we should be running all tests against QA?
This raises a good point. When running tests on local builds, the Postgres DB has to be seeded accordingly for some desired set of books in the vector DB; I'm currently building a test dataset for this in a dedicated testing namespace, and #1025 achieves this for the VRA integration tests. But considering we're using QA as our live environment for VRA, I still think it's best, for defect prevention, to run the tests in etl-pipeline-tests.yaml against local builds prior to merge.
We can run the entire test suite after deploy, but I really think targeted checks on critical paths are what's necessary once the tests have passed against local builds.
As I think further about it, restructuring the test suite soon would be beneficial, so that tests scoped strictly to legacy DRB functionality are separated from the VRA tests. Since code changes for VRA (i.e. every code change being made now) carry little risk of impacting legacy functionality (correct me if I'm wrong), we could aim to omit the legacy tests from checks on PRs targeting main and run them only before deploying to prod.
Top-level separation within the test suite would enable straightforward targeting of the two sets of tests, e.g.:
etl-pipeline/
└── tests/
├── drb/
│ ├── unit/ # Isolated logic for legacy DRB components
│ ├── integration/ # Database/API interactions for legacy flows
│ └── functional/ # End-to-end legacy pipeline validation
└── vra/
├── unit/ # Core logic for new VRA features
├── integration/ # Interaction between agent and search, DBs, API tests
└── functional/ # End-to-end tests for VRA backend
There is likely some overlap in functional areas like ingest, which we'd have to account for when separating them in this manner.
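A split like that would let CI target each set from its own job; a rough sketch of the workflow shape (job names, triggers, and steps are illustrative, not the actual workflow contents):

```yaml
jobs:
  drb-tests:   # could run on PRs targeting main
    runs-on: ubuntu-latest
    steps:
      - name: Run legacy DRB tests
        working-directory: ./etl-pipeline
        run: python -m pytest tests/drb
  vra-tests:   # could run only before deploying to prod
    runs-on: ubuntu-latest
    steps:
      - name: Run VRA tests
        working-directory: ./etl-pipeline
        run: python -m pytest tests/vra
```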
    --reruns=3 \
    --reruns-delay=5 \
Are these really necessary? These flags are only used here.
Unless we have known flaky tests, reruns are just a waste of time. Maybe 1 rerun, if anything.
Do note, however, that I'm still not fully acquainted with the entire backend test suite, given its breadth; in particular, the nuances of targeting different environments.
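If we keep reruns at all, dropping to a single retry as suggested would just change the flag in the functional-test step; a sketch, assuming the step is otherwise unchanged:

```yaml
      - name: Run (some) functional tests against QA env
        working-directory: ./etl-pipeline
        run: |
          python -m pytest tests/functional/processes/ingest \
            --reruns=1 \
            --reruns-delay=5 \
            --env=qa
```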
Describe your changes
merged the gh-hosted and self-hosted workflow variants
removed unneeded setup used in the ci-self-hosted variant
retained pytest-rerunfailures usage
removed the junitxml output because it was not being consumed
updated the hashFiles call based on the docs here: https://docs.github.com/en/actions/reference/workflows-and-actions/expressions#hashfiles
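Per those docs, `hashFiles` matches path patterns relative to the workspace and accepts multiple patterns; a typical use is in a dependency cache key (the cache step here is illustrative, not taken from this PR):

```yaml
      - uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt', '**/dev-requirements.txt') }}
```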
How to test
All tests should pass.
Previous core functionality worked on the self-hosted runners; see: https://github.com/NYPL/digital-research-books/actions/runs/23767475519/job/69250855487