[BUG] Windows agent service doesn't bind to port 6556 until restarted after installation #965

@zenwalk2013

Describe the bug
The CheckMK Windows agent service starts successfully after installation/registration but does not bind to port 6556 until the service is restarted. This causes the win_wait_for port verification task in the agent role to time out, even though the service is running.
After installation completes:
CheckmkService status shows as "Running"
Port 6556 is NOT listening (verified with netstat -an | findstr 6556)
After manually restarting the service with restart-service CheckmkService, port 6556 becomes active and listening

This appears to be a Windows-specific issue where the agent controller (cmk-agent-ctl.exe) doesn't immediately initialize the network listener after installation/registration, requiring a service restart to fully activate.

Component Name
Component Name: ansible_collections/checkmk/general/roles

Ansible Version
ansible [core 2.20.0]
jinja version = 3.1.6
pyyaml version = 6.0.3 (with libyaml v0.2.5)


Checkmk Version and Edition
CheckMK version: 2.4.0p3

Collection Version
checkmk.general                          6.5.0

To Reproduce
Steps to reproduce the behavior:

Install CheckMK agent on Windows 11 host using the checkmk.general.agent role
Use configuration with checkmk_agent_mode: 'pull' and checkmk_agent_tls: false
Wait for the role to complete agent installation and registration
Observe the win_wait_for task time out with the error:

TASK [checkmk.general.agent : Win32NT: Verify Checkmk Agent Port is open.] ****
fatal: [monitoring-host]: FAILED! => {
"changed": false,
"elapsed": 60.900359599999994,
"msg": "timeout while waiting for 127.0.0.1:6556 to start listening",
"wait_attempts": 20
}

Expected behavior
After the CheckMK agent installation and registration completes:

The CheckmkService should be running AND listening on port 6556
The win_wait_for task should successfully verify the port is open
No manual service restart should be required

Actual behavior
After installation completes:

The CheckmkService shows as "Running" but is NOT listening on port 6556
The win_wait_for task times out after 60 seconds
Manual service restart is required to bind port 6556

PS C:\Users\user> get-service CheckmkService

Status   Name               DisplayName
------   ----               -----------
Running  CheckmkService     Checkmk Service

PS C:\Users\user> netstat -an | findstr 6556
# No output - port not listening despite service running
PS C:\Users\user> restart-service CheckmkService
PS C:\Users\user> netstat -an | findstr 6556
TCP    0.0.0.0:6556    0.0.0.0:0    LISTENING
TCP    [::]:6556       [::]:0       LISTENING
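The manual checks above could also be captured from Ansible, which may help when collecting evidence across several hosts. This is only a sketch: the task names and the registered variable names (cmk_service, cmk_port) are made up for illustration, and the 10-second timeout is an arbitrary choice.

```yaml
# Diagnostic sketch: compare the service state with the listener state.
# Variable names and the timeout are assumptions, not part of the role.
- name: Read CheckmkService state
  ansible.windows.win_service_info:
    name: CheckmkService
  register: cmk_service

- name: Probe port 6556 without failing the play
  ansible.windows.win_wait_for:
    port: 6556
    timeout: 10
  register: cmk_port
  ignore_errors: true

- name: Report service state versus listener state
  ansible.builtin.debug:
    msg: >-
      service={{ cmk_service.services[0].state }},
      port_listening={{ cmk_port is not failed }}
  when: cmk_service.exists
```

On an affected host this would be expected to report a started service together with port_listening=False.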

Minimum reproduction example

- name: Install CheckMk agent from monitoring server
  ansible.builtin.include_role:
    name: checkmk.general.agent
  vars:
    checkmk_agent_version: "2.4.0p3"
    checkmk_agent_server_protocol: https
    checkmk_agent_server_validate_certs: false
    checkmk_agent_server_port: 443
    checkmk_agent_configure_firewall: false
    checkmk_agent_site: 'sci_monitoring'
    checkmk_agent_user: "{{ checkmk_automation_user }}"
    checkmk_agent_secret: "{{ checkmk_automation_secret }}"
    checkmk_agent_registration_server_protocol: "https"
    checkmk_agent_add_host: false
    checkmk_agent_host_name: "{{ inventory_hostname }}"
    checkmk_agent_tls: false
    checkmk_agent_mode: 'pull'

Additional context
Workaround: add a service restart task and a port check task after the role include in the playbook.

- name: Install CheckMk agent from monitoring server
  ansible.builtin.include_role:
    name: checkmk.general.agent
  vars:
    checkmk_agent_mode: 'ssh' # Skip port check to avoid timeout

- name: Restart CheckMK service to ensure port binding (Windows)
  ansible.windows.win_service:
    name: CheckmkService
    state: restarted
  when: ansible_os_family == "Windows"

- name: Wait for CheckMK agent port to be listening (Windows)
  ansible.windows.win_wait_for:
    port: 6556
    timeout: 30
  when: ansible_os_family == "Windows"

Suggested fix
The agent role's Win32NT.yml tasks should include an automatic service restart after agent installation/registration and before the port verification check. This would ensure the agent is fully initialized and listening on the correct port.
Proposed change location: roles/agent/tasks/Win32NT.yml - add a service restart task before line 123 (the win_wait_for port verification task).
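One possible shape for this change is sketched below. It is not the role's actual code: the task names, timeouts, and the block/rescue layout are assumptions for illustration. Instead of restarting unconditionally, it restarts the service only when the first port check fails, so hosts where the listener comes up on its own are left alone.

```yaml
# Sketch only: task names and timeouts are hypothetical, not taken from Win32NT.yml.
- name: "Win32NT: Verify Checkmk Agent Port is open, restarting the service if needed."
  block:
    - name: "Win32NT: Wait for port 6556 (first attempt)."
      ansible.windows.win_wait_for:
        port: 6556
        timeout: 20
  rescue:
    # Reached only if the first wait timed out.
    - name: "Win32NT: Restart CheckmkService so the listener binds."
      ansible.windows.win_service:
        name: CheckmkService
        state: restarted

    - name: "Win32NT: Wait for port 6556 after the restart."
      ansible.windows.win_wait_for:
        port: 6556
        timeout: 60
```

If the port is still closed after the restart, the second wait fails and the play stops, which keeps a genuinely broken installation visible.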

OS: Windows 11 (also reported on Windows Server 2019/2022)
Issue occurs: On both fresh installations and updates
Affected versions:

Previously observed with collection version 5.10.1
Still present in version 6.5.0

Agent mode: pull mode without TLS
This is a Windows-specific issue - Linux agent installations work correctly

Labels

bug (Something isn't working)
role:agent (This affects the agent role)
upstream (There is something upstream blocking this)
wontfix (This will not be worked on)
