Skip to content

Feature: External driver plugin system and update Makefile #3694

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 14, 2025

Conversation

unsuman
Copy link
Contributor

@unsuman unsuman commented Jul 5, 2025

Note

This PR may appear large. However, a significant portion of the diff consists of auto-generated gRPC code (*.pb.go and *_grpc.pb.go).

This pull request completes the initial GSoC driver plugin project by introducing full support for external, out-of-process drivers. It ties together the foundational work from the previous three PRs by implementing a gRPC transport layer and providing the necessary build tooling through the Makefile.

Key Changes

  • gRPC Transport for External Drivers: A new gRPC transport layer has been implemented to facilitate communication between the main limactl binary and external driver plugins. This communication occurs over Unix sockets, ensuring efficient and secure local inter-process communication.
  • External Driver Binaries: Drivers can now be compiled as standalone executables (e.g., lima-driver-qemu, lima-driver-vz). Each external driver runs its own gRPC server, which listens for requests from the main Lima process.
  • Makefile Integration for Build Flexibility: The Makefile has been enhanced to manage the new build system. It now includes targets to:
    • Build external driver binaries individually.
    • Control the inclusion or exclusion of internal drivers in the main limactl binary using go build tags.
      make limactl: To include all existing drivers as internal.
      make ADDITIONAL_DRIVERS="vz" limactl additional-drivers: To exclude vz from the main binary and build it as a standalone executable.

GSoC Project Context

This is the 4th and final PR of the initial GSoC project submission. It depends on the successful merge of the preceding three PRs #3691, #3692 & #3693 and delivers the complete, functional driver plugin system. I'll have to do some rebasing after the first three PR to make it ready to merge.

Related Issues

@AkihiroSuda AkihiroSuda added the gsoc/2025 Google Summer of Code 2025 label Jul 7, 2025
@AkihiroSuda AkihiroSuda added this to the v2.0 (?) milestone Jul 7, 2025
@unsuman unsuman force-pushed the feature/external-driver-system branch 2 times, most recently from daea9a1 to 0e78773 Compare July 10, 2025 13:26
Makefile Outdated
Comment on lines 278 to 279
codesign -f -v --entitlements vz.entitlements -s - $(DRIVER_INSTALL_DIR)/lima-driver-vz; \
fi; \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing TABs should be used:

Suggested change
codesign -f -v --entitlements vz.entitlements -s - $(DRIVER_INSTALL_DIR)/lima-driver-vz; \
fi; \
codesign -f -v --entitlements vz.entitlements -s - $(DRIVER_INSTALL_DIR)/lima-driver-vz; \
fi; \

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can rename the file to main_windows.go and remove //go:build windows. See #3295

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can rename the file to main_darwin.go and remove //go:build darwin. See #3295

@unsuman unsuman force-pushed the feature/external-driver-system branch 2 times, most recently from 43df500 to f3ea5e7 Compare July 11, 2025 19:08
@alexandear alexandear requested a review from Copilot July 12, 2025 16:16
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements a complete external driver plugin system for Lima, introducing a gRPC-based architecture that allows VM drivers to run as standalone processes. The changes enable drivers like QEMU, VZ, and WSL2 to operate as external binaries that communicate with the main Lima process over Unix sockets, providing better isolation and flexibility.

Key changes include:

  • Implementation of gRPC transport layer for inter-process driver communication
  • Refactoring of driver interface to support both internal and external drivers
  • New ConfiguredDriver pattern with driver registry and discovery mechanisms
  • Build system updates to support external driver compilation

Reviewed Changes

Copilot reviewed 36 out of 36 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
pkg/driver/driver.go Refactored driver interface into composable interfaces with new Info struct
pkg/registry/registry.go Added driver discovery and registration system for external drivers
pkg/driverutil/instance.go Replaced direct driver instantiation with registry-based driver creation
pkg/driver/external/ New gRPC protocol definitions and client/server implementations
cmd/lima-driver-*/ New external driver binaries for QEMU, VZ, and WSL2
Makefile Added build targets for external driver binaries with conditional compilation

if err == nil && connect.SourcePort() != 0 {
return connect, nil
}
return connect, "vsock", err
Copy link
Preview

Copilot AI Jul 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GuestAgentConn method returns the connection and error immediately without checking if the connection is valid. If err is not nil, returning a non-nil connection could lead to undefined behavior.

Suggested change
return connect, "vsock", err
if err != nil {
return nil, "vsock", err
}
return connect, "vsock", nil

Copilot uses AI. Check for mistakes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think this is necessary because the connection is checked for nil in the hostagent.go file itself.

errChan, err := s.driver.Start(stream.Context())
if err != nil {
s.logger.Errorf("Start failed: %v", err)
return nil
Copy link
Preview

Copilot AI Jul 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Start method returns nil when s.driver.Start() fails, but it should return the actual error to properly propagate failure information to the client.

Suggested change
return nil
return status.Errorf(codes.Internal, "Start failed: %v", err)

Copilot uses AI. Check for mistakes.

Comment on lines 60 to 61
//nolint:forbidigo // fmt.Println used intentionally for capturing the socket path on the client side(Lima)
fmt.Println(socketPath)
Copy link
Preview

Copilot AI Jul 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Using fmt.Println for inter-process communication is fragile. Consider using a more structured approach like JSON output or a dedicated protocol for communicating the socket path.

Suggested change
//nolint:forbidigo // fmt.Println used intentionally for capturing the socket path on the client side(Lima)
fmt.Println(socketPath)
// Output the socket path in JSON format for structured inter-process communication
output := map[string]string{"socketPath": socketPath}
if err := json.NewEncoder(os.Stdout).Encode(output); err != nil {
logger.Fatalf("Failed to encode socket path as JSON: %v", err)
}

Copilot uses AI. Check for mistakes.

Comment on lines 137 to 155
time.Sleep(time.Millisecond * 100)

scanner := bufio.NewScanner(stdout)
var socketPath string
if scanner.Scan() {
socketPath = strings.TrimSpace(scanner.Text())
} else {
cancel()
if err := cmd.Process.Kill(); err != nil {
driverLogger.Errorf("Failed to kill external driver process: %v", err)
}
return errors.New("failed to read socket path from driver")
Copy link
Preview

Copilot AI Jul 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard-coded sleep duration creates a race condition. Consider implementing proper synchronization or retry logic with exponential backoff instead of a fixed delay.

Suggested change
time.Sleep(time.Millisecond * 100)
scanner := bufio.NewScanner(stdout)
var socketPath string
if scanner.Scan() {
socketPath = strings.TrimSpace(scanner.Text())
} else {
cancel()
if err := cmd.Process.Kill(); err != nil {
driverLogger.Errorf("Failed to kill external driver process: %v", err)
}
return errors.New("failed to read socket path from driver")
var socketPath string
scanner := bufio.NewScanner(stdout)
const maxRetries = 5
const initialDelay = time.Millisecond * 50
delay := initialDelay
for i := 0; i < maxRetries; i++ {
if scanner.Scan() {
socketPath = strings.TrimSpace(scanner.Text())
break
}
driverLogger.Warnf("Retrying to read socket path from driver (attempt %d/%d)", i+1, maxRetries)
time.Sleep(delay)
delay *= 2 // Exponential backoff
}
if socketPath == "" {
cancel()
if err := cmd.Process.Kill(); err != nil {
driverLogger.Errorf("Failed to kill external driver process: %v", err)
}
return errors.New("failed to read socket path from driver after retries")

Copilot uses AI. Check for mistakes.

@AkihiroSuda
Copy link
Member

Needs rebase

@unsuman unsuman force-pushed the feature/external-driver-system branch from f3ea5e7 to 7bd7066 Compare July 14, 2025 10:12
@@ -438,3 +439,38 @@ func (inst *Instance) Unprotect() error {
inst.Protected = false
return nil
}

func (inst *Instance) MarshalJSON() ([]byte, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a custom marshaller for errors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To marshal the data and send it over gRPC to configure drivers. 1

JSON Unmarshal error: Failed to unmarshal InstanceConfigJson: json: cannot unmarshal object into Go struct field Instance.errors of type error

Earlier I was getting this error while marshalling.

Is there any problem with this approach? I remember Jan said that this approach isn't the best way in terms of retaining the error type. 2

Footnotes

  1. https://github.com/lima-vm/lima/pull/3694/files#diff-568929538104e5b23efe9abea3c06031910bccb92182fcb14c7171096079578cR284-R309

  2. https://github.com/lima-vm/lima/pull/3573#discussion_r2157863356

Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@AkihiroSuda AkihiroSuda merged commit 50bb714 into lima-vm:master Jul 14, 2025
62 of 63 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gsoc/2025 Google Summer of Code 2025 impact/changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate Current Drivers to new Plugin System Define gRPC protocol for Plugins Create a Plugin Manager
3 participants