Run as NETWORK SERVICE on Windows #46

Merged: gabriel-samfira merged 2 commits into cloudbase:main from KBjorndal-VizRT:patch-1 on Jul 16, 2025

@KBjorndal-VizRT commented Jul 14, 2025

Having the runner run as the SYSTEM account causes a number of odd issues because its profile directory is under C:\Windows\System32, which is subject to WoW64 folder redirection.
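To illustrate the redirection, here is a minimal sketch (not part of this PR) that prints which on-disk directory a process reaches when it asks for the SYSTEM profile path. It assumes a default %windir% layout; the variable names are made up for the example.

```powershell
# Sketch only: shows where a request for the SYSTEM profile path ends up,
# depending on the bitness of the current process.
$requested = Join-Path $env:windir "System32\config\systemprofile"

# On 64-bit Windows, a 32-bit process is transparently redirected from
# System32 to SysWOW64, so it lands in a different directory.
$redirected = [Environment]::Is64BitOperatingSystem -and -not [Environment]::Is64BitProcess
if ($redirected) {
    $actual = Join-Path $env:windir "SysWOW64\config\systemprofile"
} else {
    $actual = $requested
}

$bitness = if ([Environment]::Is64BitProcess) { "64-bit" } else { "32-bit" }
Write-Output "Process bitness : $bitness"
Write-Output "Requested path  : $requested"
Write-Output "Resolved on disk: $actual"
```

Running it once from the regular 64-bit PowerShell and once from the 32-bit one (C:\Windows\SysWOW64\WindowsPowerShell\v1.0\powershell.exe) should show the two different directories.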

When registered using config.cmd, actions-runner registers the service to run as NT AUTHORITY\NETWORK SERVICE, which doesn't have this issue. This PR changes GARM to use that same service user. Doing so also requires running the service executable in a special "init" mode to register an Event Log trace source; without that step, the service silently does nothing.

Issue #45

@KBjorndal-VizRT (Author) commented Jul 14, 2025

I think this needs some testing outside our environment. We have a custom provider for vSphere and a not entirely straightforward VM template, so although this seems to work for us, I haven't tested it in any default context.

The actions-runner service setup also sets up file permissions for the service user (and I might have missed other things it does; there's quite a lot of code involved). I'm not sure whether that's required when running as NETWORK SERVICE or only relevant when using a different service user with --windowslogonaccount.

Unfortunately I don't really have access to any environments to test any of the normal providers.

@gabriel-samfira (Member)

This is great! I can test it out on another provider, but it might take a few days. From looking at the code, I see no reason things would not work. Give me a few days to validate and I'll ping back here.

> We have a custom provider for vSphere and a not entirely straightforward VM template, so although this seems to work for us, I haven't tested it in any default context.

That's awesome! Is your provider open source by any chance? If it is, and you'd like, I can add a link to it in the GARM README.md.

@KBjorndal-VizRT (Author)

> That's awesome! Is your provider open source by any chance? If it is, and you'd like, I can add a link to it in the GARM README.md.

No, though I could look at going through the process to publish it if there's a demand for it. vSphere isn't what most people want to base new infrastructure on at this point though :)

@gabriel-samfira (Member)

> No, though I could look at going through the process to publish it if there's a demand for it. vSphere isn't what most people want to base new infrastructure on at this point though :)

That's true, but there is a lot of existing infra that people still use. Believe it or not, ancient versions of XenServer are still around, as well as various other discontinued virtualization solutions.

There is absolutely no pressure. I mentioned it as a purely optional thing, in case it was something you would like to do. No worries if the answer is "no" 😄. I do not want to put any extra work on your shoulders.

@gabriel-samfira (Member)

This issue reminded me of an old Windows problem with running services that also need to behave like regular users (run installers, have a profile, run commands, etc.).

As you've noticed, the SYSTEM user doesn't really have a profile, so some operations will fail in creative ways. The NETWORK SERVICE user does, but it does not have elevated privileges. This means it cannot run an installer, modify files in C:\Program Files\, etc. (well, at least not by default).

In the past, what we've done for other projects was to create a new user, add it to the Administrators group, and grant it SeBatchLogonRight and/or SeServiceLogonRight (complete list of privileges here). Then we would use that user to run the service. This would allow the service to run applications with elevated privileges while getting the behavior of a normal user.

Running the GitHub runner as NETWORK SERVICE will be fine as long as jobs don't need to modify system files, run installers, etc. If all that jobs require is downloading files, network access, and running executables that already exist, it should work great. In fact, disabling JIT config runs the service as NETWORK SERVICE, as you've stated.

So I will test this with another provider hopefully today. If all is well, we can merge it as is.

Given that you use Windows with GARM, I would love your feedback on the following: In a future PR (as part of a later update), should we try and set up a dedicated administrative user with service logon rights and run the runner using those credentials, or should we just stick with NETWORK SERVICE? Would you benefit from the actions-runner being started under a regular user with administrative privileges?

@KBjorndal-VizRT (Author)

> This issue reminded me of an old Windows problem with running services that also need to behave like regular users (run installers, have a profile, run commands, etc.).
>
> As you've noticed, the SYSTEM user doesn't really have a profile, so some operations will fail in creative ways. The NETWORK SERVICE user does, but it does not have elevated privileges. This means it cannot run an installer, modify files in C:\Program Files\, etc. (well, at least not by default).

It does have a profile, and as long as all executables used are either all 64-bit or all 32-bit, everything works fine. But the moment, for example, msbuild is 32-bit and everything else is 64-bit, msbuild doesn't see the files the previous step fetched or unpacked, and confusion ensues.

> In the past, what we've done for other projects was to create a new user, add it to the Administrators group, and grant it SeBatchLogonRight and/or SeServiceLogonRight (complete list of privileges here). Then we would use that user to run the service. This would allow the service to run applications with elevated privileges while getting the behavior of a normal user.
>
> Running the GitHub runner as NETWORK SERVICE will be fine as long as jobs don't need to modify system files, run installers, etc. If all that jobs require is downloading files, network access, and running executables that already exist, it should work great. In fact, disabling JIT config runs the service as NETWORK SERVICE, as you've stated.
>
> So I will test this with another provider hopefully today. If all is well, we can merge it as is.
>
> Given that you use Windows with GARM, I would love your feedback on the following: In a future PR (as part of a later update), should we try and set up a dedicated administrative user with service logon rights and run the runner using those credentials, or should we just stick with NETWORK SERVICE? Would you benefit from the actions-runner being started under a regular user with administrative privileges?

From some quick searching around, it looks like the GitHub-hosted runners do something like this. For our use case, we don't really care whether our builds run as NETWORK SERVICE or a more "normal" user, but admin rights could be necessary if some build wants to test that an .msi installation works, for example.

And it's not like restricting the build to a less privileged account really secures anything, since it's a throwaway virtual machine. So yes, I think a privileged runner account is a good future improvement. It also means the behaviour on Windows is closer to how the Linux runners are set up, and consistency is always nice.

PS: I just ran into an issue with job cancellations on a cached runner. I think it's because of the GitHub runner self-update mechanism. Is that maybe something that should also be disabled on these runners, since they are ephemeral?

@gabriel-samfira (Member)

> PS: I just ran into an issue with job cancellations on a cached runner. I think it's because of the GitHub runner self-update mechanism. Is that maybe something that should also be disabled on these runners, since they are ephemeral?

Disabling updates might result in the runner never being able to join GitHub, as GitHub will refuse connections from runners more than 2 versions behind (if I remember correctly).

We might need to grant full access for NT AUTHORITY\NETWORK SERVICE to C:\actions-runner. That should allow the auto update to work.
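A minimal sketch of what granting that access could look like. It is untested here, and it assumes the runner lives in C:\actions-runner and that the script runs elevated. It refers to NETWORK SERVICE by its well-known SID (S-1-5-20) so it does not depend on the localized account name.

```powershell
# Sketch only: grant NETWORK SERVICE full control over the runner directory.
# $runnerDir is an assumed install location.
$runnerDir = "C:\actions-runner"

# S-1-5-20 is the well-known SID for NT AUTHORITY\NETWORK SERVICE.
$networkService = New-Object System.Security.Principal.SecurityIdentifier("S-1-5-20")

# ContainerInherit + ObjectInherit makes the rule apply to subfolders and files
# that inherit permissions from $runnerDir.
$rule = New-Object System.Security.AccessControl.FileSystemAccessRule(
    $networkService, "FullControl", "ContainerInherit,ObjectInherit", "None", "Allow")

$acl = Get-Acl -Path $runnerDir
$acl.AddAccessRule($rule)
Set-Acl -Path $runnerDir -AclObject $acl
```

Whether this is enough for the self-update to succeed would still need to be verified on a real runner.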

@KBjorndal-VizRT (Author)

> > PS: I just ran into an issue with job cancellations on a cached runner. I think it's because of the GitHub runner self-update mechanism. Is that maybe something that should also be disabled on these runners, since they are ephemeral?
>
> Disabling updates might result in the runner never being able to join GitHub, as GitHub will refuse connections from runners more than 2 versions behind (if I remember correctly).
>
> We might need to grant full access for NT AUTHORITY\NETWORK SERVICE to C:\actions-runner. That should allow the auto update to work.

So the auto update works without aborting running jobs if it has the right permissions?

Also, I found another reason you'd want an admin account: it seems WiX builds fail when run under the normal built-in service accounts: wixtoolset/issues#6254

So figuring out a reasonable fix for that for our environment is my headache for tomorrow :)

@gabriel-samfira (Member) commented Jul 15, 2025

> So the auto update works without aborting running jobs if it has the right permissions?

If we use ephemeral runners, in theory, it should update before registering on GitHub. Once it picks up a job, it should no longer try to update.

> Also, I found another reason you'd want an admin account: it seems WiX builds fail when run under the normal built-in service accounts: wixtoolset/issues#6254
>
> So figuring out a reasonable fix for that for our environment is my headache for tomorrow :)

Gah. Yeah. I will also look into this. We've fixed it in the past, but it requires either P/Invoke and C# or relying on some binary. An alternative is secedit, which will make us question our life choices if we try to use it. There's also the PowerShell Carbon module, but not all environments will have access to the PowerShell Gallery.

Edit: The issue is not creating a user. The issue is granting privileges like SeServiceLogonRight.

@gabriel-samfira (Member)

Having a look at https://github.com/microsoft/CsWin32 to generate the C# code to call into ntsecapi. Once we have a function we can use to grant privileges, we can easily create a user and allow it to be used to run a service.

@gabriel-samfira (Member)

So, theoretically, this should work:

https://github.com/cloudbase/garm-provider-common/compare/main...gabriel-samfira:garm-provider-common:windows-fixes?expand=1

But I have not yet tested it as part of an actual deployment. I will give it a shot later today and report back. The changes above create a new runner user and add it to the local Administrators group. The group name may be localized (I've had this happen a bunch of times), so we get the name by looking up the well-known SID for the local Administrators group.

After that, we use the Local Security Authority functions to grant SeServiceLogonRight and SeBatchLogonRight to the new user. The GH runner service should then be able to start using this user's credentials.

The change also sets ACLs to full control for the runner user on C:\actions-runner.

I just need to test it to see if it works. But if it does, it should fix the issue you mentioned.
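For readers following along, the user and group steps can be sketched roughly as below. This is not the code from the branch linked above; the user name and password handling are placeholders, and it assumes Windows PowerShell 5.1 with the LocalAccounts module and an elevated session. Granting SeServiceLogonRight and SeBatchLogonRight is deliberately left out, since there is no built-in cmdlet for it and it needs the LSA calls described above.

```powershell
# Sketch only: create a local user and add it to the local Administrators group
# without hardcoding the (possibly localized) group name.
$userName = "runner"   # hypothetical account name

# Placeholder password; a real script should use a proper random generator.
$plain = [guid]::NewGuid().ToString() + "aA1!"
$password = ConvertTo-SecureString -String $plain -AsPlainText -Force

# S-1-5-32-544 is the well-known SID for BUILTIN\Administrators.
$adminSid = New-Object System.Security.Principal.SecurityIdentifier("S-1-5-32-544")

# Translate the SID to see the localized name (informational only).
$adminName = $adminSid.Translate([System.Security.Principal.NTAccount]).Value
Write-Output "Local administrators group is named: $adminName"

New-LocalUser -Name $userName -Password $password -PasswordNeverExpires -AccountNeverExpires | Out-Null

# Address the group by SID so the localized name never matters.
Add-LocalGroupMember -SID $adminSid -Member $userName
```

Addressing the group by SID rather than by name is the part that matters here: on a localized Windows install the group may not be called "Administrators" at all.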

@gabriel-samfira (Member)

Sigh. The userdata is now larger than 16 KB, and EC2 has a size limit. Looking for alternatives.

@gabriel-samfira (Member)

Ohh! I can zip the userdata and it's fine. So the branch I posted above results in this:

[image not preserved]
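As a rough illustration of why compression helps with the 16 KB limit, the sketch below gzip-compresses a repetitive script and compares sizes. It is an assumption that gzip is the format meant by "zip" here and that the guest-side agent accepts compressed userdata; in GARM the compression would happen on the provider side, not in PowerShell. The sample script content is made up.

```powershell
# Sketch only: compare raw vs. gzip-compressed size of a userdata script.
# Hypothetical stand-in for a large userdata script (about 40 KB of text).
$userData = "Write-Output 'configuring runner'`n" * 1200
$rawBytes = [System.Text.Encoding]::UTF8.GetBytes($userData)

$buffer = New-Object System.IO.MemoryStream
$gzip = New-Object System.IO.Compression.GZipStream($buffer, [System.IO.Compression.CompressionMode]::Compress)
$gzip.Write($rawBytes, 0, $rawBytes.Length)
$gzip.Dispose()   # flushes the remaining compressed data and trailer into $buffer
$compressed = $buffer.ToArray()

$limit = 16KB     # 16384 bytes
Write-Output ("Raw size       : {0} bytes" -f $rawBytes.Length)
Write-Output ("Compressed size: {0} bytes" -f $compressed.Length)
Write-Output ("Fits in limit  : {0}" -f ($compressed.Length -le $limit))
```

Script text tends to be repetitive, so it usually compresses well, but the actual ratio depends on the real userdata.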

@gabriel-samfira (Member)

And here is the output of a job using a runner spawned with the modified userdata:

https://github.com/gsamfira/garm-testing/actions/runs/16304803944/job/46048252926

It prints the hostname and the output of whoami.

@KBjorndal-VizRT (Author)

Thanks! I'll try to run some tests with our provider as well.

I see a couple of places where native PowerShell could be used instead of running external commands and doing error handling manually. The easiest is probably if you open a PR with this so I can comment on it. Or should I just comment on the commit in your fork?

@gabriel-samfira (Member)

With your permission, I can push my changes to the branch you proposed here (GitHub allows maintainers to change proposed branches), and you can then change it whichever way you see fit. Would that be OK?

@KBjorndal-VizRT (Author)

> With your permission, I can push my changes to the branch you proposed here (GitHub allows maintainers to change proposed branches), and you can then change it whichever way you see fit. Would that be OK?

Absolutely, it's a much better fix than using NETWORK SERVICE.

Using a normal user to run the runner service under allows workflows to
run applications just like any other user. The SYSTEM user has some
limitations.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>

@gabriel-samfira (Member) commented Jul 16, 2025

Done. You can pull your branch, make changes, and push an update. I force-pushed, so you may need to run git reset --hard HEAD~1 and then git pull before making changes.

Use built-in powershell functionality for a few more things in order
to not need manual error handling everywhere.

$runnerACL.SetAccessRule((New-Object System.Security.AccessControl.FileSystemAccessRule(
    "runner", "FullControl", "ContainerInherit,ObjectInherit", "None", "Allow"
)))
Set-Acl -Path $runnerDir -AclObject $runnerAcl

@gabriel-samfira (Member)

Is this recursive?

@gabriel-samfira (Member)

We might also want to change access rights after the service is configured but before it is started. Some additional files with credentials and the agent ID are created when config.cmd is run or when the JIT config files are downloaded.

@KBjorndal-VizRT (Author)

Yes, as long as none of the subfolders have permission inheritance turned off. I don't believe that can happen unless it's a cached runner and the template creator explicitly did that, at which point they probably have a reason:

[image not preserved]

@KBjorndal-VizRT (Author)

The order also shouldn't matter, since Windows permissions are inherited by default.
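If someone wants to confirm that nothing under the runner directory has inheritance turned off (the cached-runner case mentioned above), a quick check could look like the sketch below. The directory path is assumed, and the check only covers subfolders, not individual files.

```powershell
# Sketch only: list subfolders of the runner directory whose ACL does not
# inherit from the parent. $runnerDir is an assumed install location.
$runnerDir = "C:\actions-runner"

# AreAccessRulesProtected is $true when inheritance has been disabled.
$blocked = Get-ChildItem -Path $runnerDir -Recurse -Directory |
    Where-Object { (Get-Acl -LiteralPath $_.FullName).AreAccessRulesProtected }

if ($blocked) {
    $blocked | ForEach-Object { Write-Output "Inheritance disabled: $($_.FullName)" }
} else {
    Write-Output "All subfolders inherit permissions from their parent."
}
```

Any folder it reports would not pick up the rule set on C:\actions-runner and would need its own access rule.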

@gabriel-samfira merged commit f9f079b into cloudbase:main on Jul 16, 2025 (2 checks passed)

@gabriel-samfira (Member)

Thanks!

@gabriel-samfira (Member)

I didn't even ask. Did this fix things in your env? All good?

@KBjorndal-VizRT (Author)

> I didn't even ask. Did this fix things in your env? All good?

Only got it deployed this morning, but at least so far everything is good and the builds broken by the switch to NETWORK SERVICE are working again. Thanks a lot for the speedy fix, it was above and beyond!
