Gzip compress userdata on Linux #28
cloud-init does not support zip compression but instead supports gzip compression. Trying to use zip compression confuses cloud-init.
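To illustrate why the two formats are not interchangeable, here is a minimal Go sketch that compresses the same payload both ways and prints the leading bytes of each result. The helper names `gzipBytes` and `zipBytes` and the sample payload are hypothetical and not taken from this PR; the sketch only shows that a gzip stream and a zip archive are different containers, which is consistent with cloud-init accepting one and being confused by the other.

```go
package main

import (
	"archive/zip"
	"bytes"
	"compress/gzip"
	"fmt"
)

// gzipBytes compresses data as a single gzip stream (hypothetical helper).
func gzipBytes(data []byte) ([]byte, error) {
	var b bytes.Buffer
	w := gzip.NewWriter(&b)
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return b.Bytes(), nil
}

// zipBytes wraps data in a zip archive with one entry named "udata"
// (hypothetical helper).
func zipBytes(data []byte) ([]byte, error) {
	var b bytes.Buffer
	w := zip.NewWriter(&b)
	fd, err := w.Create("udata")
	if err != nil {
		return nil, err
	}
	if _, err := fd.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return b.Bytes(), nil
}

func main() {
	// Sample payload, made up for this illustration.
	payload := []byte("#cloud-config\nruncmd:\n  - echo hello\n")

	gz, err := gzipBytes(payload)
	if err != nil {
		panic(err)
	}
	z, err := zipBytes(payload)
	if err != nil {
		panic(err)
	}

	// A gzip stream begins with the magic bytes 1f 8b; a zip archive
	// begins with the local file header signature "PK".
	fmt.Printf("gzip starts with: % x\n", gz[:2])
	fmt.Printf("zip starts with:  %q\n", z[:2])
}
```

A consumer that expects a gzip stream will not recognise the zip archive's header, so the choice of container has to match what the guest-side tooling expects.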
internal/spec/spec.go
		return "", fmt.Errorf("failed to generate userdata: %w", err)
	}
	udata = []byte(cloudCfg)
	gzipped := gzip.NewWriter(&b)
What do you think about creating a new function like:
func maybeCompressUserdata(udata []byte, targetOS bootstrapParams.OSType) ([]byte, error) {
	// feel free to define the result of 1<<24 as a constant.
	if len(udata) < 1<<24 {
		return udata, nil
	}
	var b bytes.Buffer
	switch targetOS {
	case params.Windows:
		zipped := zip.NewWriter(&b)
		fd, err := zipped.Create("udata")
		if err != nil {
			return nil, err
		}
		if _, err := fd.Write(udata); err != nil {
			return nil, fmt.Errorf("failed to compress cloud config: %w", err)
		}
		if err := zipped.Close(); err != nil {
			return nil, err
		}
	default:
		gzipped := gzip.NewWriter(&b)
		if _, err := gzipped.Write(udata); err != nil {
			return nil, fmt.Errorf("failed to compress cloud config: %w", err)
		}
		if err := gzipped.Close(); err != nil {
			return nil, err
		}
	}
	return b.Bytes(), nil
}
For Linux, this will most likely not compress the userdata, as it's under 16 KB. It will only compress it if you override the userdata in the pool via extra specs with something larger. It will automatically compress if it goes above 16 KB. AWS will complain if it's larger than 16 KB anyway because in 2025 AD, it still has the same userdata limit it did in 2007.
We can use a function like this (I have not tested it) immediately after the switch statement in the ComposeUserData() function.
What do you think? This way you should see it in plain text in the instance userdata for linux machines at least.
Makes sense. I have to step away for a while and won't be able to test until later today, but went ahead and pushed the change if you happen to get around to testing it before I do.
And don't get me started on AWS's ridiculousness 😂 I have to deal with it way too much already! haha
Just got around to testing this and it seems to work as intended. I can see the userdata in plaintext by default, so this seems to be good to go unless you want to move that 1<<24 out to a constant.
nah. We can do that later. Thanks for the PR!
Did you by any chance test with a Windows runner as well? No worries if not, I can test it out before merging.
Ooh thanks for the suggestion. I tried launching a Windows runner and it's complaining about the user data limit 😞
{"time":"2025-09-16T10:50:42.466779291Z","level":"ERROR","source":{"function":"github.com/cloudbase/garm/runner/pool.(*basePoolManager).addPendingInstances.func1","file":"/opt/garm/garm/runner/pool/pool.go","line":1564},"msg":"failed to add instance to provider","error":"error creating instance: provider binary /usr/local/bin/garm-provider-aws returned error: provider binary failed with stdout: ; stderr: failed to run command: failed to create instance in provider: failed to create instance: failed to create instance: operation error EC2: RunInstances, https response error StatusCode: 400, RequestID: b818e72f-edc1-4b9e-95cc-833d41c19932, api error InvalidParameterValue: User data is limited to 16384 bytes\n: exit status 1","runner_name":"garm-uiBdLbGzFKAf","pool_mgr":"sapslaj/garm-test","endpoint":"github.com","pool_type":"repository"}
{"time":"2025-09-16T10:50:42.469819054Z","level":"ERROR","source":{"function":"github.com/cloudbase/garm/runner/pool.(*basePoolManager).addPendingInstances.func1","file":"/opt/garm/garm/runner/pool/pool.go","line":1573},"msg":"failed to create instance in provider","error":"error creating instance: provider binary /usr/local/bin/garm-provider-aws returned error: provider binary failed with stdout: ; stderr: failed to run command: failed to create instance in provider: failed to create instance: failed to create instance: operation error EC2: RunInstances, https response error StatusCode: 400, RequestID: b818e72f-edc1-4b9e-95cc-833d41c19932, api error InvalidParameterValue: User data is limited to 16384 bytes\n: exit status 1","runner_name":"garm-uiBdLbGzFKAf","pool_mgr":"sapslaj/garm-test","endpoint":"github.com","pool_type":"repository"}
let me see if I can figure out what's going wrong if anything...
Found the issue. 1<<24 is 16 MB not 16 KB. Seems to be fixed now. Windows runners are working on my install at least!
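The arithmetic behind the mix-up can be sketched in Go as follows. The constant names and the `needsCompression` helper are hypothetical, chosen only for this illustration; the exact form of the fix that was pushed is not shown in this thread. The 16384-byte figure is the limit quoted in the EC2 error above.

```go
package main

import "fmt"

const (
	// userdataLimit is the limit EC2 reports in the error above:
	// 1<<14 = 16384 bytes = 16 KB. (Hypothetical name.)
	userdataLimit = 1 << 14

	// originalThreshold is the value from the first suggestion:
	// 1<<24 = 16777216 bytes = 16 MB. (Hypothetical name.)
	originalThreshold = 1 << 24
)

// needsCompression mirrors the shape of the check in the suggested
// function: anything below the threshold is passed through untouched.
// (Hypothetical helper, not the code from the PR.)
func needsCompression(udata []byte) bool {
	return len(udata) >= userdataLimit
}

func main() {
	fmt.Printf("1<<14 = %d bytes\n", userdataLimit)
	fmt.Printf("1<<24 = %d bytes\n", originalThreshold)
	fmt.Printf("ratio = %d\n", originalThreshold/userdataLimit)

	// A 20000-byte payload is over the EC2 limit but far below 16 MB,
	// so the 1<<24 threshold would have left it uncompressed.
	payload := make([]byte, 20000)
	fmt.Println("over 16 KB threshold:", needsCompression(payload))
	fmt.Println("over 16 MB threshold:", len(payload) >= originalThreshold)
}
```

With the 16 MB threshold the compression branch was effectively never reached, which matches the Windows runner failing on the 16384-byte limit. Crossing the threshold only triggers compression; it does not by itself guarantee that the compressed result fits under the limit.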
it's fine. I messed up only by a couple of orders of magnitude. Near miss. 😅
Waiting for the CI and merging. Thanks again for everything!
Haha, that happens to the best of us. No problem, thanks for making a great project!
Per PR feedback
16777216 bytes is 16 MB not 16 KB
FYI, I just merged: cloudbase/garm#525 It's available in the This might help your use case.
Thanks @gabriel-samfira! Since this PR was merged I haven't had any issues but I'll definitely give that a try ASAP. This looks like the route is a subpath of the metadata URL? Just want to confirm since I have particular subpaths with IP allowlists on my GARM ingress so want to make sure that's all going to work as expected.
yep. It's called
Awesome, thank you so much @gabriel-samfira!
cloud-init does not support zip compression but instead supports gzip compression. Trying to use zip compression confuses cloud-init.
I manually tested this as working:
This doesn't close the feature request in #27 but it at least fixes the underlying issue of not being able to spin up Linux instances.