Gzip compress userdata on Linux #28

Merged
gabriel-samfira merged 3 commits into cloudbase:main from sapslaj:gzip-for-linux
Sep 16, 2025

Contributor

@sapslaj sapslaj commented Sep 15, 2025

cloud-init does not support zip compression but instead supports gzip compression. Trying to use zip compression confuses cloud-init.

I manually tested this as working:

root@ip-172-31-13-23:~# systemctl status actions.runner.sapslaj-garm-test.garm-test.service
● actions.runner.sapslaj-garm-test.garm-test.service - GitHub Actions Runner (actions.runner.sapslaj-garm-test.garm-test)
     Loaded: loaded (/etc/systemd/system/actions.runner.sapslaj-garm-test.garm-test.service; enabled; preset: enabled)
     Active: active (running) since Mon 2025-09-15 12:15:50 UTC; 1min 3s ago
   Main PID: 1791 (runsvc.sh)
      Tasks: 21 (limit: 2057)
     Memory: 47.5M (peak: 48.0M)
        CPU: 2.633s
     CGroup: /system.slice/actions.runner.sapslaj-garm-test.garm-test.service
             ├─1791 /bin/bash /home/runner/actions-runner/runsvc.sh
             ├─1794 ./externals/node20/bin/node ./bin/RunnerService.js
             └─1802 /home/runner/actions-runner/bin/Runner.Listener run --startuptype service

Sep 15 12:15:50 ip-172-31-13-23 systemd[1]: Started actions.runner.sapslaj-garm-test.garm-test.service - GitHub Actions Runner (actions.run>
Sep 15 12:15:50 ip-172-31-13-23 runsvc.sh[1791]: .path=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/g>
Sep 15 12:15:50 ip-172-31-13-23 runsvc.sh[1794]: Starting Runner listener with startup type: service
Sep 15 12:15:50 ip-172-31-13-23 runsvc.sh[1794]: Started listener process, pid: 1802
Sep 15 12:15:50 ip-172-31-13-23 runsvc.sh[1794]: Started running service
Sep 15 12:15:52 ip-172-31-13-23 runsvc.sh[1794]: √ Connected to GitHub
Sep 15 12:15:52 ip-172-31-13-23 runsvc.sh[1794]: Current runner version: '2.328.0'
Sep 15 12:15:52 ip-172-31-13-23 runsvc.sh[1794]: 2025-09-15 12:15:52Z: Listening for Jobs
root@ip-172-31-13-23:~# file /var/lib/cloud/instance/user-data.txt
/var/lib/cloud/instance/user-data.txt: gzip compressed data, original size modulo 2^32 7353

This doesn't close the feature request in #27 but it at least fixes the underlying issue of not being able to spin up Linux instances.

return "", fmt.Errorf("failed to generate userdata: %w", err)
}
udata = []byte(cloudCfg)
gzipped := gzip.NewWriter(&b)
Member

@gabriel-samfira gabriel-samfira Sep 15, 2025

What do you think about creating a new function like:

func maybeCompressUserdata(udata []byte, targetOS bootstrapParams.OSType) ([]byte, error) {
	// feel free to define the result of 1<<24 as a constant.
	if len(udata) < 1<<24 {
		return udata, nil
	}
	var b bytes.Buffer
	switch targetOS {
	case params.Windows:
		// Windows userdata is wrapped in a zip archive.
		zipped := zip.NewWriter(&b)
		fd, err := zipped.Create("udata")
		if err != nil {
			return nil, err
		}
		if _, err := fd.Write(udata); err != nil {
			return nil, fmt.Errorf("failed to compress cloud config: %w", err)
		}
		if err := zipped.Close(); err != nil {
			return nil, err
		}
	default:
		// cloud-init understands gzip, so everything else is gzipped.
		gzipped := gzip.NewWriter(&b)
		if _, err := gzipped.Write(udata); err != nil {
			return nil, fmt.Errorf("failed to compress cloud config: %w", err)
		}
		if err := gzipped.Close(); err != nil {
			return nil, err
		}
	}
	return b.Bytes(), nil
}

For Linux, this will most likely not compress the userdata, as it's under 16 KB. It will only compress it if you override the userdata in the pool via extra specs with something larger; it will automatically compress once it goes above 16 KB. AWS will complain if it's larger than 16 KB anyway because in 2025 AD, it still has the same userdata limit it did in 2007.

We can use a function like this (I have not tested it) immediately after the switch statement in the ComposeUserData() function.

What do you think? This way you should see it in plain text in the instance userdata for Linux machines at least.

Contributor Author

Makes sense. I have to step away for a while and won't be able to test until later today, but went ahead and pushed the change if you happen to get around to testing it before I do.

And don't get me started on AWS's ridiculousness 😂 I have to deal with it way too much already! haha

Contributor Author

Just got around to testing this and it seems to work as intended. I can see the userdata in plaintext by default, so this seems to be good to go unless you want to move that 1<<24 out to a constant.

Member

nah. We can do that later. Thanks for the PR!

Member

Did you by any chance test with a Windows runner as well? No worries if not, I can test it out before merging.

Contributor Author

@sapslaj sapslaj Sep 16, 2025

Ooh thanks for the suggestion. I tried launching a Windows runner and it's complaining about the user data limit 😞

{"time":"2025-09-16T10:50:42.466779291Z","level":"ERROR","source":{"function":"github.com/cloudbase/garm/runner/pool.(*basePoolManager).addPendingInstances.func1","file":"/opt/garm/garm/runner/pool/pool.go","line":1564},"msg":"failed to add instance to provider","error":"error creating instance: provider binary /usr/local/bin/garm-provider-aws returned error: provider binary failed with stdout: ; stderr: failed to run command: failed to create instance in provider: failed to create instance: failed to create instance: operation error EC2: RunInstances, https response error StatusCode: 400, RequestID: b818e72f-edc1-4b9e-95cc-833d41c19932, api error InvalidParameterValue: User data is limited to 16384 bytes\n: exit status 1","runner_name":"garm-uiBdLbGzFKAf","pool_mgr":"sapslaj/garm-test","endpoint":"github.com","pool_type":"repository"}
{"time":"2025-09-16T10:50:42.469819054Z","level":"ERROR","source":{"function":"github.com/cloudbase/garm/runner/pool.(*basePoolManager).addPendingInstances.func1","file":"/opt/garm/garm/runner/pool/pool.go","line":1573},"msg":"failed to create instance in provider","error":"error creating instance: provider binary /usr/local/bin/garm-provider-aws returned error: provider binary failed with stdout: ; stderr: failed to run command: failed to create instance in provider: failed to create instance: failed to create instance: operation error EC2: RunInstances, https response error StatusCode: 400, RequestID: b818e72f-edc1-4b9e-95cc-833d41c19932, api error InvalidParameterValue: User data is limited to 16384 bytes\n: exit status 1","runner_name":"garm-uiBdLbGzFKAf","pool_mgr":"sapslaj/garm-test","endpoint":"github.com","pool_type":"repository"}

Let me see if I can figure out what's going wrong, if anything...

Contributor Author

Found the issue. 1<<24 is 16 MB not 16 KB. Seems to be fixed now. Windows runners are working on my install at least!

Member

it's fine. I messed up only by a couple of orders of magnitude. Near miss. 😅

Waiting for the CI and merging. Thanks again for everything!

Contributor Author

Haha, that happens to the best of us. No problem, thanks for making a great project!

@gabriel-samfira gabriel-samfira merged commit 296f095 into cloudbase:main Sep 16, 2025
1 check passed
@gabriel-samfira
Member

FYI, I just merged: cloudbase/garm#525

It's available in the main branch. It allows you to manage runner install templates. When used, the actual userdata script is minimal and is meant to download the install script, which is generated from the template you can view/edit in the web UI or via the CLI (see demo in the PR).

This might help your use case.

@sapslaj
Contributor Author

sapslaj commented Sep 25, 2025

Thanks @gabriel-samfira! Since this PR was merged I haven't had any issues but I'll definitely give that a try ASAP.

It looks like the route is a subpath of the metadata URL? Just want to confirm, since I have particular subpaths with IP allowlists on my GARM ingress and want to make sure that's all going to work as expected.

@sapslaj sapslaj deleted the gzip-for-linux branch September 25, 2025 12:57
@gabriel-samfira
Member

yep. It's called /install-script inside the same subpath as the rest of the metadata URLs.

@sapslaj
Contributor Author

sapslaj commented Sep 25, 2025

Awesome, thank you so much @gabriel-samfira!
