Skip to content

Conversation

@alexdewar
Copy link
Collaborator

Description

I've just discovered the assert_cmd and figured we can use it to simplify some of our current regression test setup.

assert_cmd provides helpers for invoking your main binary as a separate process and checking that it completes etc. It even works with cargo llvm-cov, so these tests contribute to coverage. The advantage of this approach is that you don't share global state (e.g. the logger) between tests, meaning you don't have to spread your integration tests between files. This brings a few benefits:

  • Tidier file structure
  • Tidier console log when running cargo test (as each file in tests/ gets its own section in the output)
  • Better test coverage as we now run the program the whole way through rather than just particular helper functions

To make this easier, I changed things so that you can disable loading of the settings.toml file with an environment variable. I don't imagine this will be commonly used by non-devs, but it provides a couple of benefits to us:

  • We no longer need to allow for users to explicitly opt out of the debug output files with --debug-model=true
  • We won't need similar chicanery for any future options we add

Now we can just set this env var whenever we invoke muse2 for any integration test.

I was able to amalgamate the following tests:

  • Various CLI tests have been moved to tests/cli.rs
  • All the regression tests are now in tests/regression.rs, with just a single line of code needed for each

As we don't need to be able to access any functions inside the muse2 crate from our integration tests anymore, we could make many things private and could even go back to having it just be a straight binary library (no library), but that seemed out of scope for this PR (and probably a lot of work). (One advantage of having things be private is that the compiler can tell you if they're unused.)

I also added a few missing tests for example subcommands while I was at it.

Type of change

  • Bug fix (non-breaking change to fix an issue)
  • New feature (non-breaking change to add functionality)
  • Refactoring (non-breaking, non-functional change to improve maintainability)
  • Optimization (non-breaking change to speed up the code)
  • Breaking change (whatever its nature)
  • Documentation (improve or add documentation)

Key checklist

  • All tests pass: $ cargo test
  • The documentation builds and looks OK: $ cargo doc
  • Update release notes for the latest release if this PR adds a new feature or fixes a bug
    present in the previous release

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

Copilot AI review requested due to automatic review settings January 16, 2026 18:31
@codecov
Copy link

codecov bot commented Jan 16, 2026

Codecov Report

❌ Patch coverage is 78.26087% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.11%. Comparing base (0faa09b) to head (6ebd2ca).

Files with missing lines Patch % Lines
src/cli.rs 72.72% 0 Missing and 3 partials ⚠️
src/cli/example.rs 66.66% 0 Missing and 1 partial ⚠️
src/settings.rs 88.88% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1086      +/-   ##
==========================================
+ Coverage   82.27%   83.11%   +0.84%     
==========================================
  Files          55       55              
  Lines        7487     7464      -23     
  Branches     7487     7464      -23     
==========================================
+ Hits         6160     6204      +44     
+ Misses       1029      952      -77     
- Partials      298      308      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR simplifies regression tests by using the assert_cmd crate to invoke the binary as a subprocess instead of calling internal functions directly. This approach eliminates shared global state (like the logger) between tests and improves test coverage by exercising the full application path. The PR consolidates multiple test files into tests/cli.rs and tests/regression.rs, and adds a MUSE2_USE_DEFAULT_SETTINGS environment variable to bypass settings file loading during tests.

Changes:

  • Introduced assert_cmd crate for subprocess-based testing
  • Added MUSE2_USE_DEFAULT_SETTINGS environment variable to enable default settings in tests
  • Consolidated test files: CLI tests moved to tests/cli.rs, regression tests to tests/regression.rs

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/validate.rs Removed - functionality moved to tests/cli.rs
tests/run.rs Removed - functionality moved to tests/cli.rs
tests/regression_*.rs Removed - consolidated into tests/regression.rs using macros
tests/regression.rs Refactored to use subprocess invocation via assert_cmd instead of direct function calls
tests/common.rs Added helper functions and macros for regression tests
tests/cli.rs New file consolidating CLI command integration tests
tests/regenerate_test_data.sh Updated to use MUSE2_USE_DEFAULT_SETTINGS env var
src/settings.rs Added load_or_default() method supporting MUSE2_USE_DEFAULT_SETTINGS env var
src/cli/example.rs Removed Settings parameter from handle_example_run_command
src/cli.rs Changed debug_model from Option to bool; removed Settings parameters from command handlers
docs/developer_guide/architecture_quickstart.md Updated documentation to reference new test location
Cargo.toml Added assert_cmd dependency
Cargo.lock Updated with assert_cmd and its dependencies

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Co-authored-by: Copilot <[email protected]>
Copilot AI review requested due to automatic review settings January 16, 2026 18:37
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 19 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@alexdewar alexdewar requested review from Aurashk and tsmbland January 16, 2026 19:21
Copilot AI review requested due to automatic review settings January 19, 2026 08:39
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 19 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +78 to 84
pub fn load_or_default() -> Result<Settings> {
if env::var("MUSE2_USE_DEFAULT_SETTINGS").is_ok_and(|v| v == "1") {
Ok(Settings::default())
} else {
Self::from_path_or_default(&get_settings_file_path())
}
}
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new MUSE2_USE_DEFAULT_SETTINGS environment variable functionality is not covered by unit tests. Consider adding a test case to verify that when this environment variable is set to "1", the function returns default settings without attempting to read from the file system, even if a settings file exists.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle, I'm not against this, but I don't want the test to depend on the state of the user's file system and I'm not sure of a good way round that.

Copilot AI review requested due to automatic review settings January 19, 2026 09:31
@codecov
Copy link

codecov bot commented Jan 19, 2026

Codecov Report

❌ Patch coverage is 78.26087% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.09%. Comparing base (794dbc5) to head (2976e9a).
⚠️ Report is 17 commits behind head on main.

Files with missing lines Patch % Lines
src/cli.rs 72.72% 0 Missing and 3 partials ⚠️
src/cli/example.rs 66.66% 0 Missing and 1 partial ⚠️
src/settings.rs 88.88% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1086      +/-   ##
==========================================
+ Coverage   82.25%   83.09%   +0.83%     
==========================================
  Files          55       55              
  Lines        7576     7553      -23     
  Branches     7576     7553      -23     
==========================================
+ Hits         6232     6276      +44     
+ Misses       1050      973      -77     
- Partials      294      304      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 19 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@alexdewar alexdewar added this to MUSE Jan 19, 2026
@alexdewar alexdewar moved this to 👀 In review in MUSE Jan 19, 2026
Copy link
Collaborator

@tsmbland tsmbland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me. Definitely cleaner if we end up with loads of regression tests, and it seems a lot quicker to run as well


/// Define a regression test with extra command-line arguments
#[allow(unused_macros)]
macro_rules! define_regression_test_with_extra_args {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this (and subsequent macros) not be better placed in regression.rs? I wouldn't really call this "common".

Also, without fully qualifying run_regression_test you're assuming that the macro will be called in the same file as a function called run_regression_test, which happens to be true but it would be less scary if these were all defined in the same file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's not really common code... I stuck the macros in a separate file because you have to define them above where you use them in a file and I wanted the place where the regression tests themselves were defined to be near the top of the file.

I guess we can just stick a comment in telling people to scroll down though!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think of that but I guess that's as good a reason as any. Up to you

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah maybe let's just leave it as is for now... We have to choose between the ugliness of having the macros defined outside regression.rs or the ugliness of the tests being defined halfway through the file. We can always fiddle with it later.

@alexdewar alexdewar merged commit 7a780c4 into main Jan 19, 2026
8 checks passed
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in MUSE Jan 19, 2026
@alexdewar alexdewar deleted the better-integration-tests branch January 19, 2026 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants