After agent.Close(), FIN-ACK packets linger in the vsock TX virtqueue. If PauseVM() runs immediately, the snapshot captures corrupted vhost-vsock device state, and all connections hang on restore. Fix: sleep 500ms between close and pause in all 4 snapshot paths (doHibernate, doSaveAsTemplate, CreateCheckpoint, PrepareGoldenSnapshot).
Also added diagnostic logging of vsock.sock state in doWake and waitForAgent (manager.go).
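To make the ordering concrete, here is a minimal sketch of the close, drain, pause sequence. The helper names (closeThenPause, drainFor) and the function-valued parameters are hypothetical and only stand in for the real agent and VM calls; the 500ms value is the one stated above.

```go
package main

import (
	"fmt"
	"time"
)

// vsockDrainDelay is the wait between closing the agent connection and
// pausing the VM. 500ms is the value quoted in the description.
const vsockDrainDelay = 500 * time.Millisecond

// closeThenPause is a hypothetical helper: it closes the agent connection,
// runs the drain step, and only then pauses the VM. If the close fails,
// the VM is not paused.
func closeThenPause(closeAgent func() error, drain func(), pauseVM func() error) error {
	if err := closeAgent(); err != nil {
		return fmt.Errorf("close agent: %w", err)
	}
	drain() // give in-flight FIN-ACK packets time to leave the TX virtqueue
	if err := pauseVM(); err != nil {
		return fmt.Errorf("pause VM: %w", err)
	}
	return nil
}

// drainFor returns a drain step that simply sleeps for d.
func drainFor(d time.Duration) func() {
	return func() { time.Sleep(d) }
}
```

A snapshot path would then call something like closeThenPause(agent.Close, drainFor(vsockDrainDelay), vm.PauseVM), assuming those methods have the func() error shape.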
The guest clock freezes at snapshot time, and with no NTP inside the VM nothing corrected it after restore. The old clock_delta_us approach never worked (Firecracker doesn't support that field), so every sandbox had a stale clock that drifted further with each hibernate/wake cycle. Fix: removed the dead clock_delta_us system entirely (dropped the clockDeltaUs parameter from LoadSnapshot, deleted the snapshotClockDeltaUs helper) and added a new syncGuestClock() that sets the guest clock via date -s through the agent after every snapshot restore. It is called in 7 places covering all paths (wake, cold boot, checkpoint resume, fork, golden create). Verified 0-1s drift across all paths.
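As an illustration of the date -s approach, the sketch below builds the command from the host's current time and hands it to the agent. It is not the actual syncGuestClock() from this PR: the guestExec type, the function names, and the @epoch argument format (which assumes the guest's date accepts seconds since the Unix epoch) are all assumptions.

```go
package main

import (
	"context"
	"fmt"
	"strconv"
	"time"
)

// guestExec is a hypothetical stand-in for the agent call that runs a
// command inside the guest.
type guestExec func(ctx context.Context, argv []string) error

// dateSetArgs builds a `date -s @<epoch>` command line. The @epoch form
// is an assumption about the guest's date implementation.
func dateSetArgs(epochSeconds int64) []string {
	return []string{"date", "-s", "@" + strconv.FormatInt(epochSeconds, 10)}
}

// syncGuestClockSketch reads the host clock and asks the guest to adopt it.
// Whole-second resolution is consistent with the 0-1s drift reported above.
func syncGuestClockSketch(ctx context.Context, exec guestExec, now func() time.Time) error {
	if err := exec(ctx, dateSetArgs(now().Unix())); err != nil {
		return fmt.Errorf("sync guest clock: %w", err)
	}
	return nil
}
```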
POST /api/sandboxes/:id/exec/run on the control plane panicked: s.manager is nil in server mode and there was no fallback. Fix: new execRunRemote() that looks up the sandbox's worker in the DB and forwards the exec over gRPC.
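The dispatch can be sketched roughly as follows. Only the execRunRemote name comes from the description; the server struct, its fields, and the signatures are hypothetical, with plain functions standing in for the DB lookup and the gRPC call.

```go
package main

import (
	"errors"
	"fmt"
)

// execFunc runs a command in a sandbox and returns its output.
type execFunc func(sandboxID, command string) (string, error)

// server is a hypothetical stand-in for the real handler's receiver.
type server struct {
	localExec    execFunc                                                // nil in server (control plane) mode
	lookupWorker func(sandboxID string) (string, error)                  // stands in for the DB lookup
	forwardExec  func(workerAddr, sandboxID, command string) (string, error) // stands in for the gRPC call
}

// execRun uses the local manager when there is one and otherwise falls
// back to the remote path instead of dereferencing a nil manager.
func (s *server) execRun(sandboxID, command string) (string, error) {
	if s.localExec != nil {
		return s.localExec(sandboxID, command)
	}
	return s.execRunRemote(sandboxID, command)
}

// execRunRemote finds the worker that owns the sandbox and forwards the exec.
func (s *server) execRunRemote(sandboxID, command string) (string, error) {
	if s.lookupWorker == nil || s.forwardExec == nil {
		return "", errors.New("no remote exec path configured")
	}
	addr, err := s.lookupWorker(sandboxID)
	if err != nil {
		return "", fmt.Errorf("look up worker for sandbox %s: %w", sandboxID, err)
	}
	return s.forwardExec(addr, sandboxID, command)
}
```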
Deleting a checkpoint that forked sandboxes still reference returned 500 (FK violation). Fix: a transaction that sets the referencing based_on_checkpoint_id values to NULL before deleting the checkpoint.
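A minimal sketch of such a transaction with database/sql is shown below. The table names (sandboxes, checkpoints), the id column, and the PostgreSQL-style $1 placeholders are assumptions; only based_on_checkpoint_id is taken from the description.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// execFn runs one SQL statement; it lets the statement order be checked
// without a live database.
type execFn func(query string, args ...any) error

// detachAndDelete clears references to the checkpoint first, then deletes
// it. Table and column names other than based_on_checkpoint_id are assumed.
func detachAndDelete(exec execFn, checkpointID string) error {
	if err := exec(`UPDATE sandboxes SET based_on_checkpoint_id = NULL WHERE based_on_checkpoint_id = $1`, checkpointID); err != nil {
		return fmt.Errorf("detach forked sandboxes: %w", err)
	}
	if err := exec(`DELETE FROM checkpoints WHERE id = $1`, checkpointID); err != nil {
		return fmt.Errorf("delete checkpoint: %w", err)
	}
	return nil
}

// deleteCheckpoint runs both statements in one transaction so a failure
// in either leaves the rows untouched.
func deleteCheckpoint(ctx context.Context, db *sql.DB, checkpointID string) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return fmt.Errorf("begin tx: %w", err)
	}
	defer tx.Rollback() // has no effect once Commit has succeeded

	exec := func(query string, args ...any) error {
		_, err := tx.ExecContext(ctx, query, args...)
		return err
	}
	if err := detachAndDelete(exec, checkpointID); err != nil {
		return err
	}
	return tx.Commit()
}
```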
In createFromGoldenSnapshot, the network reconfig and clock sync used the HTTP request ctx. If the SDK client disconnects (timeout, user cancel, etc.) before those steps finish, the context is cancelled and the sandbox is left with broken networking, even though the VM is running fine.
Fix: those post-restore steps now use context.Background(), so they always complete regardless of what the client does.
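The idea can be sketched as below: the post-restore steps get a context derived from context.Background() rather than from the request. The names runPostRestore and clientDisconnectDemo are hypothetical, and the 30-second timeout is an addition for this sketch (the description only mentions context.Background()), included so a stuck step cannot block forever.

```go
package main

import (
	"context"
	"time"
)

// postRestoreTimeout is a hypothetical upper bound, not a value from the PR.
const postRestoreTimeout = 30 * time.Second

// step is one post-restore action, such as network reconfig or clock sync.
type step func(ctx context.Context) error

// runPostRestore runs the steps under a context that is detached from the
// HTTP request, so a client disconnect cannot cancel them midway.
func runPostRestore(steps ...step) error {
	ctx, cancel := context.WithTimeout(context.Background(), postRestoreTimeout)
	defer cancel()
	for _, s := range steps {
		if err := s(ctx); err != nil {
			return err
		}
	}
	return nil
}

// clientDisconnectDemo simulates a client that has already gone away and
// reports whether the step still ran with a live context.
func clientDisconnectDemo() bool {
	reqCtx, cancel := context.WithCancel(context.Background())
	cancel() // the HTTP client disconnected

	sawLive := false
	err := runPostRestore(func(ctx context.Context) error {
		sawLive = ctx.Err() == nil
		return nil
	})
	return err == nil && sawLive && reqCtx.Err() != nil
}
```

On Go 1.21 and later, context.WithoutCancel(reqCtx) is an alternative that also drops cancellation while keeping request-scoped values.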