@beautifulentropy beautifulentropy commented Oct 29, 2025

The original plan for getting the Vitess infrastructure running was to use vttestserver as a starting point to reach a minimum viable setup. However, vttestserver didn’t work out because some of its defaults conflicted with how we clean up rows and the level of resources (threads) we need.

Fortunately, vttestserver is just a wrapper around vtcombo: it generates a JSON-encoded vttest protobuf describing the configuration of the in-memory topology server that vtcombo starts. By modifying vttestserver’s run.sh, we can invoke vtcombo directly, passing the JSON configuration along with the other vttestserver defaults reverse-engineered from run.sh and vtprocess.go.

Vitess doesn’t provide a vtcombo image, so we must build our own. This PR builds and uploads a boulder-vtcomboserver image on top of Oracle’s MySQL 8.0 image, which provides native arm64 support. The accompanying tag-and-upload shell script defaults to amd64 for CI.

As an aside, Vitess’s official Dockerfiles are only published for amd64, and modifying them to build for arm64 proved difficult because Oracle doesn’t publish MySQL arm64 binaries in its Debian apt repository.

With boulder-vtcomboserver up and running, I was able to find and validate the following issues and provide workarounds:

  • Problem: db-migrate, the tool we use to apply database migrations, must be configured to talk to MariaDB through ProxySQL and to MySQL through Vitess (vtgate + vttablet).
    Solution: Use test.sh to symlink the appropriate dbconfig.yml file depending on whether MariaDB or MySQL is in use.

  • Problem: Vitess does not allow CREATE DATABASE statements, and any DDL containing them will be rejected by vtgate.
    Solution: These databases are already created by vtcombo since they’re defined as KEYSPACES. Skip database creation in test/create_db.sh.

  • Problem: Vitess does not allow user creation or grants (CREATE USER, GRANT), and any DDL containing these commands will be blocked by vtgate.
    Solution: Skip user creation and grant steps in test/create_db.sh. Set % for --vschema_ddl_authorized_users as vttestserver does, and revisit this later for a more complete approach.

  • Problem: The vttablet default for the maximum number of rows returned from a non-streaming query (10,000) is too low for Boulder’s needs, so vttablet rejects some of our queries.
    Solution: Increase --queryserver-config-max-result-size to 1,000,000 and --queryserver-config-warn-result-size to 1,000,000.

  • Problem: The vttablet defaults for connection pool size (16) and maximum number of concurrent transactions (20) are too low for Boulder’s needs, causing queries to fail when vttablet is overloaded.
    Solution: Increase --queryserver-config-pool-size to 64 and --queryserver-config-transaction-cap to 80.

  • Problem: Vitess does not allow TRIGGER statements, and any DDL containing them will be rejected by vtgate. Without these TRIGGER statements, the integration test TestIssuanceCertStorageFailed will fail.
    Solution: Run these TRIGGER statements in an entrypoint script, test/vtcomboserver/install_trigger.sh, bypassing vtgate entirely.

Depends on #8479
Depends on #8489
Depends on #8490
Depends on #8494
Fixes #7736

@beautifulentropy beautifulentropy force-pushed the add-vitess branch 11 times, most recently from 1df6b37 to f6b45ac Compare October 31, 2025 16:15
@beautifulentropy beautifulentropy changed the title WIP database: Add vitess + mysql 8.0 to our development environment Oct 31, 2025
@beautifulentropy beautifulentropy force-pushed the add-vitess branch 8 times, most recently from e3b5db5 to 55dcd24 Compare November 4, 2025 17:06
@beautifulentropy beautifulentropy marked this pull request as ready for review November 12, 2025 17:33
@beautifulentropy beautifulentropy requested a review from a team as a code owner November 12, 2025 17:33
@github-actions

@beautifulentropy, this PR appears to contain configuration and/or SQL schema changes. Please ensure that a corresponding deployment ticket has been filed with the new values.

@aarongable left a comment

Overarching comment: I think a bunch of small pieces of this (the sa.go change, the schema file changes, the "put max_statement_time in every dburl explicitly" change, etc) could and maybe should be broken out as smaller changes that precede actually adding Vitess+MySQL8. I don't feel super strongly on this point -- this PR is actually fairly readable despite its size as-is -- but since some pieces have already been broken out, it would make sense to continue down that path.

Contributor

Doesn't the sa/db-next/dbconfig.yml symlink need to be updated as well?

Oh wait, now that I've gotten further in I see that you're actually doing symlink-chaining. db-next's dbconfig is a symlink pointing at db's dbconfig, which itself is a symlink that dynamically points at either the mariadb or mysql8 file. That's somewhat magic and spooky action at a distance and I don't love it. I think I'd prefer that the components which need to know which dbconfig file to load inspect the USE_VITESS env var to make that decision.

# https://github.com/rubenv/sql-migrate#readme
boulder_sa_test:
  dialect: mysql
  datasource: root@tcp(boulder-vitess:33577)/boulder_sa_test?parseTime=true&max_execution_time=13000&long_query_time=11

Contributor

Any particular reason this sets long_query_time to 11 while the mariadb config has it set to 12?

Comment on lines +81 to +83
// This block is only necessary for ProxySQL + MariaDB and can be
// deleted once we're fully migrated to Vitess + MySQL 8, where the
// trigger is installed via test/vtcomboserver/install_trigger.sh.

Contributor

This comment is very likely to be blindly deleted when this whole conditional section is dropped. We should make sure that there's a very prominent comment linking from this test case to install_trigger.sh that will be preserved.

Member Author

Good call.

@@ -0,0 +1,69 @@
#!/usr/bin/env bash

Contributor

I'd put a big blinking comment right at the top of this file explaining that it exists for the sole purpose of //test/integration/cert_storage_failed_test.go. The comment down on line 47 is burying the lede somewhat.

beautifulentropy added a commit that referenced this pull request Nov 13, 2025
Ahead of the move from ProxySQL + MariaDB to Vitess + MySQL 8 in #8468.
Vitess blocks partition related DDL, so partitions need to be removed
from all schemas under `sa/db*`. The team has agreed that this drift
from Production is acceptable because it lets us begin testing on Vitess
and MySQL sooner.

Separately, `thisUpdate` and `nextUpdate` were relying on an implicit
`DEFAULT NULL`. We now make that explicit, matching how we define other
DATETIME columns. Also, add a missing `DROP TABLE `incidents`;` to our
combined schema migration.

Part of #7736

@beautifulentropy beautifulentropy marked this pull request as draft November 18, 2025 19:58
beautifulentropy added a commit that referenced this pull request Nov 19, 2025
…erter (#8494)

Today, timestamp truncation happens for queries using `*borp.DbMap` but
not `*borp.Transaction`. That means comparisons still see sub-seconds,
but inserts into MariaDB `DATETIME` columns silently truncate them to
whole seconds.

On MySQL 8, the same queries will still include sub-seconds, but inserts
into `DATETIME` columns will round to the nearest second instead of
truncating. This leads to issues for queries like the one in
`*StorageAuthority.UpdateCRLShard()`. When two CRL updaters write within
the same second, one timestamp may be rounded up to the next second. When
the other updater attempts its own `UPDATE .. WHERE thisUpdate <= ?`, the
condition fails because the stored timestamp now appears to be in the
future.

Ahead of the transition from ProxySQL + MariaDB to Vitess + MySQL 8 in
#8468, update borp (letsencrypt/borp#12) to
expose Transaction arguments to the BoulderTypeConverter, allowing it to
truncate all timestamps passed through Transactions and keep behavior
consistent across `*borp.DbMap` and `*borp.Transaction`, as well as
MariaDB and MySQL 8.

Part of #7736