Skip to content

Add option to redirect guest serial console output to file #5350

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

roypat
Copy link
Contributor

@roypat roypat commented Aug 6, 2025

Add a new logger sub-configuration to specify a file into which guest serial console output should be written (instead of stdout).

Reason

Getting serial output from our integration tests would make debugging intermittent failures where SSH didn't work / the guest became unresponsive a lot easier

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes
    build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

Copy link

codecov bot commented Aug 6, 2025

Codecov Report

❌ Patch coverage is 73.61111% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.32%. Comparing base (80aa1de) to head (049d1b3).

Files with missing lines Patch % Lines
src/vmm/src/device_manager/mod.rs 66.66% 7 Missing ⚠️
src/vmm/src/devices/legacy/serial.rs 0.00% 4 Missing ⚠️
src/vmm/src/lib.rs 70.00% 3 Missing ⚠️
src/vmm/src/rpc_interface.rs 0.00% 3 Missing ⚠️
src/firecracker/src/api_server/request/serial.rs 87.50% 1 Missing ⚠️
src/vmm/src/resources.rs 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5350      +/-   ##
==========================================
- Coverage   82.35%   82.32%   -0.03%     
==========================================
  Files         265      266       +1     
  Lines       30640    30674      +34     
==========================================
+ Hits        25233    25253      +20     
- Misses       5407     5421      +14     
Flag Coverage Δ
5.10-c5n.metal 82.28% <69.23%> (-0.05%) ⬇️
5.10-m5n.metal 82.28% <69.23%> (-0.05%) ⬇️
5.10-m6a.metal 81.55% <69.23%> (-0.05%) ⬇️
5.10-m6g.metal 78.90% <66.17%> (-0.02%) ⬇️
5.10-m6i.metal 82.27% <69.23%> (-0.05%) ⬇️
5.10-m7a.metal-48xl 81.55% <69.23%> (-0.05%) ⬇️
5.10-m7g.metal 78.90% <66.17%> (-0.02%) ⬇️
5.10-m7i.metal-24xl 82.25% <69.23%> (-0.05%) ⬇️
5.10-m7i.metal-48xl 82.25% <69.23%> (-0.05%) ⬇️
5.10-m8g.metal-24xl 78.90% <66.17%> (-0.03%) ⬇️
5.10-m8g.metal-48xl 78.90% <66.17%> (-0.03%) ⬇️
6.1-c5n.metal 82.32% <69.23%> (-0.05%) ⬇️
6.1-m5n.metal 82.32% <69.23%> (-0.05%) ⬇️
6.1-m6a.metal 81.60% <69.23%> (-0.06%) ⬇️
6.1-m6g.metal 78.90% <66.17%> (-0.02%) ⬇️
6.1-m6i.metal 82.31% <69.23%> (-0.06%) ⬇️
6.1-m7a.metal-48xl 81.59% <69.23%> (-0.05%) ⬇️
6.1-m7g.metal 78.90% <66.17%> (-0.02%) ⬇️
6.1-m7i.metal-24xl 82.33% <69.23%> (-0.05%) ⬇️
6.1-m7i.metal-48xl 82.33% <69.23%> (-0.05%) ⬇️
6.1-m8g.metal-24xl 78.90% <66.17%> (-0.02%) ⬇️
6.1-m8g.metal-48xl 78.90% <66.17%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@roypat roypat force-pushed the more-debug branch 2 times, most recently from 15dd953 to 78977d3 Compare August 7, 2025 07:48
@roypat roypat added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Aug 7, 2025
@roypat roypat changed the base branch from main to feature/pcie August 7, 2025 09:40
@roypat roypat force-pushed the more-debug branch 4 times, most recently from 0eedf63 to c4665a6 Compare August 7, 2025 11:54
Comment on lines 211 to 215
.arg(
Argument::new("serial-out-path")
.takes_value(true)
.help("Path to a fifo or a file used for serial output."),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option only relevant once the VM is started, so no need to add it as a cmd line arg. Configuring it through config file/API should be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, @bchalios only wants the CLI argument, and you only want the config file/API field. Sooo, I'm inclined to keep both now, haha

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is important to settle on this question. We should be careful with expanding API surface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree. To me this is part of the logging API though (or at least, logging adjacent), so configuration should be the same as the logger. At the very least, I suspect people will configure both logging and serial output at the same time. For the logger, we allow both HTTP and CLI configuration, so having both available for the serial output as well seems reasonable to me, otherwise people might need to unnaturally split their configuration of "output" stuff (for example, our test framework configures the logger via CLI, because some of our tests don't have an API server, so that's simply easier, so I'd like to do the same for serial output)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it has to be in the API and config json for consistency with the other logging options. I'm neutral on whether we want it as a CLI as well. I don't see anything wrong with it and it'd be consistent with the other logging options.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that having this under the logger API is very counter intuitive. The serial console has nothing to do with the logger. We could have the one without the other, so I think they definitely shouldn't be configured under the same API call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @bchalios that serial configuration should not be mixed with logger config. But I still think adding CLI arg is not necessary. Maybe we can add a new config file option/API endpoint just for serial device (so finally treat it as a separate device)? Initially it can only contain the path to dump the serial output, but in the future it can also contain an option to enable/disable the serial in general (currently we just check the kernel args to see if serial needs to be enabled).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So a PUT /serial? I think that's also fine (and I do like the idea of having the new endpoint configure whether the serial device is available at all, although maybe we can do that another time indeed like Egor is saying).

I'll see how annoying this will be to enable by-default in our tests without the CLI arg, if it doesn't get too bad, I'm open to dropping that too

@roypat roypat force-pushed the more-debug branch 6 times, most recently from aa06cf0 to c91280a Compare August 8, 2025 08:33
Comment on lines 211 to 215
.arg(
Argument::new("serial-out-path")
.takes_value(true)
.help("Path to a fifo or a file used for serial output."),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it has to be in the API and config json for consistency with the other logging options. I'm neutral on whether we want it as a CLI as well. I don't see anything wrong with it and it'd be consistent with the other logging options.

@roypat roypat force-pushed the more-debug branch 2 times, most recently from 1382aca to 1255df3 Compare August 12, 2025 13:29
@roypat roypat force-pushed the more-debug branch 2 times, most recently from 7f7e4df to 5dd94ee Compare August 12, 2025 14:58
@roypat roypat force-pushed the more-debug branch 2 times, most recently from f2326ce to 0beef3f Compare August 12, 2025 15:36
Manciukic and others added 14 commits August 12, 2025 16:49
I've rebuilt the CI artifacts for the new development version.

Signed-off-by: Riccardo Mancini <[email protected]>
Signed-off-by: Babis Chalios <[email protected]>
This type is only useful if we are dynamically dispatching io via the
Read/Write traits, but we are not doing that (we converted everything to
static dispatch a while back, for better or for worse, and to avoid
conflicts with the PCI branch, I'm not reverting that here).

Signed-off-by: Patrick Roy <[email protected]>
The `event_observer` field in the Vmm struct is of type Option<Stdin>.
There are two problems
1. With the way the code is written, it will never be `None`
2. `Stdin` is a singleton, there is no need to store it _anywhere_.

With that in mind, we can just remove this field, and update its two
uses to just directly operate on std::io::stdin(). Since it never
`None', we can also remove the logic that matches and handles the `None`
case. Furthermore, the `Drop` impl used to print the same error message
twice in case resetting stdin to canonical mode failed, so fix that to
only print it once.

Signed-off-by: Patrick Roy <[email protected]>
We always called it right before setup_serial_device(), so just move the
call inside. This will also be helpful once setup_serial_device() can
setup the serial device to print to a file, where we dont need to mess
with stdout in the first place.

Signed-off-by: Patrick Roy <[email protected]>
Have teh aarch64 serial device restoration path use the common serial
device initialization code in setup_serial_device(). The only functional
change here is that now we do set_stdou_nonblocking() on the restore
path, although I would wager that not doing so in the past was actually
an oversight, because we do it for all booted VMs, and for restored VMs
on x86.

Signed-off-by: Patrick Roy <[email protected]>
we only write to it, so no need to pass .read(true).

Signed-off-by: Patrick Roy <[email protected]>
Add a new field, `serial_out_path`, to the logger configuration
(available both via API and CLI) which can be set to the path of a file
into which the output of the guest's serial console should be dumped.

Have the file behave identically to our log file (e.g. it must already
exist, Firecracker does not create it).

If we redirect serial output to a log file, disable serial input via
stdin.

Signed-off-by: Patrick Roy <[email protected]>
Copy the screen log file (which contains firecracker's stdout/guest
dmesg if non-daemonized / no log file specified) to the test_results
directory, to be included in artifacts, instead of explicitly reading
one file and subsequently writing another file in Python code.

Signed-off-by: Patrick Roy <[email protected]>
If SSH to guest is failing, we don't really get much useful in terms of
logs in our ci artifacts. Enable serial console so that we get guest
dmesg always, at least if we have the API server around (which is the
case for the vast majority of tests).

Since serial console is enabled by default now, this means we can stop
explicitly passing modified bootargs in a bunch of tests now.

Disable it for the tests that interact with the serial console via
stdin/stdout (here we are not daemonizing, so we are capturing serial
output anyway).

Disable pylint's "too-many-statements" lint, because it started firing
in microvm.spawn()

Signed-off-by: Patrick Roy <[email protected]>
Add a test that verifies having serial output go to a file works.

Signed-off-by: Patrick Roy <[email protected]>
In test_serial_console_login, we were implementing something that takes
three lines (send and receive data via serial) via an overcomplicated
state machine system. Eliminate the 100 lines of state machine code and
instead just... do the 3 lines of serial.tx/rx.

Signed-off-by: Patrick Roy <[email protected]>
If a test fails, take a snapshot fo the microvms involved, and upload it
as a CI artifact, for post-mortem analysis.

Signed-off-by: Patrick Roy <[email protected]>
These have only caused confusion, as they're output often makes no
sense, and we generally just end up ignoring them (but have discussions
about it every single time). Let's remove them.

Signed-off-by: Patrick Roy <[email protected]>
When A/B-testing, the "A" revision firecracker binary does not know
about the new --serial-out-path CLI argument. Thus, disable the serial
console for all A/B-tests. It gets quite ugly to do for the
vulnerability tests, because the .spawn() call is burried deep in the
fixtures.

This commit should be reverted after merging the PR

Signed-off-by: Patrick Roy <[email protected]>
@roypat roypat changed the base branch from feature/pcie to main August 12, 2025 15:49
use crate::api_server::parsed_request::{ParsedRequest, RequestError};

pub(crate) fn parse_put_serial(body: &Body) -> Result<ParsedRequest, RequestError> {
METRICS.put_api_requests.logger_count.inc();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should have a separate metric for serial

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do, I just messed up the copy/paste 😭

constructor_args
.event_manager
.add_subscriber(serial.clone());
let serial = crate::DeviceManager::setup_serial_device(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on restore we're expecting a file with the same name to exist for us to put the serial to, right? should we make it overrideable from the load API or is it too much (unnecessary) work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not, the file is not stored in the snapshot, instead configuration is reset, just like for the logger (so before restore, you'd need to do a new PUT /serial request)

@@ -35,6 +35,10 @@ and this project adheres to
requests to `/mmds/config` to enforce MMDS to always respond plain text
contents in the IMDS format regardless of the `Accept` header in requests.
Users need to regenerate snapshots.
- [#5350](https://github.com/firecracker-microvm/firecracker/pull/5350): Added a
`--serial-out-path` CLI option and a `serial_out_path` field to `PUT /logger`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changelog still refers to the previous revision

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, thanks, will fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Awaiting review Indicates that a pull request is ready to be reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants