@amd989 amd989 commented Oct 12, 2025

Add Windows Container Support for Elastic Agent

What does this PR do?

This PR adds full support for building and deploying Elastic Agent as a Windows container, enabling deployment on Windows Server hosts with container support.

Note: Prior to this PR, Elastic Agent only supported Linux containers. All existing Docker documentation and Kubernetes manifests assume Linux containers. This PR adds Windows container support as a new deployment option for Windows Server hosts.

The implementation includes:

Core Build System Changes:

Windows Container Templates:

CI/CD Integration (Buildkite):

Documentation:

Why is it important?

Business Value:

  • Enables Elastic Agent deployment on Windows Server environments that use containerization
  • Expands platform coverage to support Windows-based Kubernetes clusters and Windows container orchestration
  • Provides feature parity with Linux container deployments

Technical Benefits:

  • Minimal Footprint: PowerShell Nano Server base image (~400 MB)
  • Multi-Agent CI/CD: Solves the Docker daemon limitation (Linux vs Windows mode) by splitting cross-compilation and packaging across separate Buildkite agents
  • Production Ready: Fully automated build pipeline integrated with existing release process
  • Standards Compliant: Uses Microsoft's official PowerShell Nanoserver images with LTS support

Architecture Decision:
The implementation uses a multi-agent Buildkite pipeline to work around the limitation that a single Docker daemon cannot run Linux and Windows containers at the same time:

  • Linux agent: Cross-compiles Windows binary using golang-crossbuild (requires Linux containers)
  • Windows agent: Downloads binary and builds Windows Docker image (requires Windows containers)
  • Publishing agent: Collects all artifacts for release
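
The split above can be sketched as a small routing table that groups pipeline steps by the Docker engine mode their agent must run in. This is an illustrative sketch only: the step names and the `Step` type are hypothetical and do not come from the actual Buildkite configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    engine: str  # Docker engine mode the step needs: "linux", "windows", or "any"


# Hypothetical step names, chosen only to mirror the three agents listed above.
PIPELINE = [
    Step("cross-compile-windows-binary", "linux"),    # golang-crossbuild needs Linux containers
    Step("package-windows-docker-image", "windows"),  # image build needs Windows containers
    Step("publish-artifacts", "any"),                 # only collects files
]


def assign_agents(steps):
    """Group step names by the engine mode their agent must provide."""
    plan = {}
    for step in steps:
        plan.setdefault(step.engine, []).append(step.name)
    return plan


def can_share_daemon(a, b):
    """Two steps can share one Docker daemon unless they need different engine modes."""
    return a.engine == b.engine or "any" in (a.engine, b.engine)
```

Because the cross-compile and packaging steps need different engine modes, `can_share_daemon` returns `False` for that pair, which is the reason they run on separate agents.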

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files (Not applicable - uses existing agent configuration)
  • [ ] I have added tests that prove my fix is effective or that my feature works (Testing via CI/CD pipeline)
  • I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test (Windows container integration tests would require Windows container infrastructure)

Disruptive User Impact

No disruptive impact. This is a purely additive feature:

  • Existing Linux container builds are unchanged
  • Existing Windows binary packages (zip, MSI) are unchanged
  • No changes to agent configuration or runtime behavior
  • No changes to Fleet Server integration

Users who want Windows containers can now use them; users who don't are unaffected.

How to test this PR locally

Prerequisites

  • Windows 10/11 or Windows Server 2022
  • Docker Desktop with Windows containers enabled
  • Go 1.24.7+
  • Mage build tool

Option 1: Full Build (requires Docker mode switching)

Phase 1 - Cross-compile (Linux containers):

# Switch Docker to Linux mode
& 'C:\Program Files\Docker\Docker\DockerCli.exe' -SwitchLinuxEngine

# Cross-compile Windows binary
$env:DEV="true"
$env:SNAPSHOT="true"
$env:PLATFORMS="windows/amd64"
mage package

Phase 2 - Package Docker (Windows containers):

# Switch Docker to Windows mode
& 'C:\Program Files\Docker\Docker\DockerCli.exe' -SwitchWindowsEngine

# Package Windows Docker image
$env:DEV="true"
$env:SNAPSHOT="true"
$env:PLATFORMS="windows/amd64"
$env:PACKAGES="docker"
mage package
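
For anyone scripting the two phases instead of typing them, the settings could be captured roughly as follows. This is a sketch under assumptions: the helper names and the `PHASES` table are hypothetical, while the DockerCli path, the switch flags, and the environment variables are taken from the commands above. One detail it makes explicit is that `PACKAGES` is set only in Phase 2; in a PowerShell session `$env:PACKAGES` stays set afterwards, which may matter if Phase 1 is rerun in the same session.

```python
import os

DOCKER_CLI = r"C:\Program Files\Docker\Docker\DockerCli.exe"

# Variables shared by both phases, as in the PowerShell commands above.
COMMON_ENV = {"DEV": "true", "SNAPSHOT": "true", "PLATFORMS": "windows/amd64"}

# Hypothetical phase table: engine switch flag and PACKAGES value per phase.
PHASES = {
    "cross-compile": {"switch": "-SwitchLinuxEngine", "packages": None},
    "package-docker": {"switch": "-SwitchWindowsEngine", "packages": "docker"},
}


def switch_command(phase):
    """Return the argv that switches Docker Desktop to the engine the phase needs."""
    return [DOCKER_CLI, PHASES[phase]["switch"]]


def phase_env(phase, base=None):
    """Return the environment for `mage package` in the given phase."""
    env = dict(os.environ if base is None else base)
    env.update(COMMON_ENV)
    packages = PHASES[phase]["packages"]
    if packages is None:
        env.pop("PACKAGES", None)  # do not inherit PACKAGES from an earlier phase
    else:
        env["PACKAGES"] = packages
    return env
```

Each phase could then be run with `subprocess.run(switch_command(phase), check=True)` followed by `subprocess.run(["mage", "package"], env=phase_env(phase), check=True)`.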

Option 3: CI/CD Verification

The Buildkite pipeline will automatically build Windows containers when:

  • Running the package pipeline
  • Creating a release
  • Triggered by manifest URL

Verify the package_elastic-agent-windows-docker step succeeds and uploads artifacts to build/distributions/.
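
A local check of the output directory might look like the sketch below. It assumes that the relevant artifact file names contain both "windows" and "docker"; that naming convention is a guess for illustration and is not confirmed by this PR, so the filter should be adjusted to the real artifact names.

```python
from pathlib import Path


def windows_docker_artifacts(dist_dir="build/distributions"):
    """Return sorted file names in dist_dir that look like Windows Docker artifacts.

    The "windows" + "docker" substring match is an assumed naming convention.
    """
    root = Path(dist_dir)
    if not root.is_dir():
        return []
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_file() and "windows" in p.name.lower() and "docker" in p.name.lower()
    )
```

An empty list after a build would suggest that the packaging step did not produce or upload the expected files.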

Related issues

Questions to ask yourself

How are we going to support this in production?

  • Windows containers will be published to docker.elastic.co/elastic-agent alongside Linux variants
  • Automated CI/CD builds ensure consistency with other platforms
  • Uses official Microsoft base images with LTS support
  • Follows existing agent container patterns for configuration and lifecycle

How are we going to measure its adoption?

  • Docker Hub/Elastic registry pull metrics for Windows image tags
  • Telemetry from agents identifying as Windows containers (platform detection)
  • Customer feedback through support channels and GitHub issues

How are we going to debug this?

  • PowerShell entrypoint script provides clear error messages
  • Standard Docker logging via docker logs <container-id>
  • Agent diagnostics work identically to Linux containers
  • docker exec into container for interactive PowerShell debugging
  • Test directory allows rapid iteration without full build

What are the metrics I should take care of?

  • Build Metrics: CI/CD pipeline success rate for Windows Docker step
  • Image Metrics: Image size, layer count, pull times
  • Runtime Metrics: Container startup time, memory footprint, CPU usage
  • Adoption Metrics: Download/pull counts, deployment patterns (standalone vs Fleet)
  • Quality Metrics: Issue reports specific to Windows containers, crash rates

Additional Considerations:

  • Base Image Updates: Monitor Microsoft's Nanoserver releases for security patches
  • Windows Version Compatibility: Test against Windows Server 2022 and newer
  • Kubernetes Integration: Validate on AKS Windows node pools
  • Resource Requirements: Document minimum Windows container host requirements
  • License Compliance: Ensure Windows container licensing is understood by users

PR labels:
backport-active-all

cla-checker-service bot commented Oct 12, 2025

💚 CLA has been signed

mergify bot commented Oct 12, 2025

This pull request does not have a backport label. Could you fix it @amd989? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label that automatically backports to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@mergify mergify bot assigned amd989 Oct 12, 2025
@amd989 amd989 force-pushed the feature/windows branch 2 times, most recently from 8e8bf9f to e1dc791 Compare October 14, 2025 16:27
amd989 commented Oct 14, 2025

please add backport-active-8 and backport-active-9 (I don't have permissions)

@amd989 amd989 marked this pull request as ready for review October 16, 2025 20:21
@amd989 amd989 requested review from a team as code owners October 16, 2025 20:21
Adding support for Windows containers for elastic-agent. This will allow deploying the image on a Windows host to take advantage of the agent's capabilities.
Development

Successfully merging this pull request may close these issues.

Add support for windows containers