@Jorres Jorres commented Dec 4, 2024

Compatibility tests

Here is the overall pipeline:

  • each time a new tag is created, a set of compatibility tests runs automatically:
    • determines the pair [A; B], where A is the latest patch of the previous minor version and B is the newly created tag
    • installs operator from dockerhub + charts.ydb.tech, image/chart version A
    • deploys storage and database
    • upgrades CRDs to version B, upgrades operator to version B (with operator deletion, just to be sure)
    • checks that Storage/Database objects were not deleted
    • restarts Storage, then Database (for simplicity, not via a rolling restart, just with a sufficient timeout between killing pods)
    • finally, two important checks:
      • Storage manifest can still be applied (i.e. CRD is not broken)
      • cluster can serve load: table is created, written into, read from and dropped successfully
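
The first step of the pipeline, choosing A for a newly created tag B, can be sketched as below. This is a minimal illustration, not the code from this PR: the function names (`parseVersion`, `previousMinorLatestPatch`) are hypothetical, tags are assumed to look like `MAJOR.MINOR.PATCH` with an optional `v` prefix, and "previous minor" is assumed to mean the nearest earlier minor line that has at least one tag.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a parsed MAJOR.MINOR.PATCH tag.
type version struct{ major, minor, patch int }

// parseVersion accepts "1.2.3" or "v1.2.3" (the tag format is an assumption).
func parseVersion(tag string) (version, error) {
	parts := strings.Split(strings.TrimPrefix(tag, "v"), ".")
	if len(parts) != 3 {
		return version{}, fmt.Errorf("unexpected tag format: %q", tag)
	}
	var nums [3]int
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return version{}, fmt.Errorf("bad component in %q: %w", tag, err)
		}
		nums[i] = n
	}
	return version{nums[0], nums[1], nums[2]}, nil
}

// less orders versions numerically, so 0.5.9 sorts before 0.5.30.
func less(a, b version) bool {
	if a.major != b.major {
		return a.major < b.major
	}
	if a.minor != b.minor {
		return a.minor < b.minor
	}
	return a.patch < b.patch
}

// previousMinorLatestPatch returns A for the new tag B: the highest existing
// tag whose (major, minor) is strictly lower than B's. That tag is the latest
// patch of the nearest earlier minor line.
func previousMinorLatestPatch(tags []string, newTag string) (string, error) {
	b, err := parseVersion(newTag)
	if err != nil {
		return "", err
	}
	bestTag, found := "", false
	var best version
	for _, t := range tags {
		v, err := parseVersion(t)
		if err != nil {
			continue // skip tags that are not plain versions, e.g. "latest"
		}
		older := v.major < b.major || (v.major == b.major && v.minor < b.minor)
		if older && (!found || less(best, v)) {
			best, bestTag, found = v, t, true
		}
	}
	if !found {
		return "", fmt.Errorf("no earlier minor version found for %q", newTag)
	}
	return bestTag, nil
}
```

Calling `previousMinorLatestPatch` with the repository's tag list and the new tag yields the A side of the pair. Patch components are compared as numbers rather than strings, which matters once a minor line has ten or more patches.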

How to review this PR

  • basically, just review two files: compatibility_suite_test.go and .github/workflows/compatibility-tests.yaml. All other diffs are minor refactoring: various artifacts (YDB configuration templates) from the e2e folder were renamed or moved around
  • a new top-level tests folder replaces e2e. It now holds the compat tests, the e2e tests and shared test utilities
  • a simple SELECT 1; check has been replaced with creating a table and writing to it, even in e2e tests. This required communicating with database nodes over TLS, so I refactored our certificates a little: Storage and Database now have separate certificates, and I also added a script + readme on how the test certificates are created (for future reference, in case we need to use more self-signed certs in tests).

What is planned, but NOT implemented yet

  • if a field became deprecated in version A, it is not yet checked that the field is still servable on version B
  • we really should restart pods using some sort of rolling restart, not just with a timeout. Right now the probability is low, but this test MAY flap (I haven't observed it yet) if a restarted pod does not manage to replicate its state within 2 minutes, before the next pod is restarted.

@Jorres Jorres requested review from kobzonega and nikitka December 4, 2024 15:22
@Jorres Jorres merged commit 51aa80a into master Dec 27, 2024
3 checks passed
@Jorres Jorres deleted the add-compatibility-tests branch December 27, 2024 10:14
3 participants