refactor(nodeadm): rework containerd template mixin interface#2135
refactor(nodeadm): rework containerd template mixin interface#2135ndbaker1 wants to merge 1 commit intoawslabs:mainfrom
Conversation
cartermckinnon
left a comment
There was a problem hiding this comment.
Bit too much refactoring mixed in with the logical changes, IMO; I think this PR can be split
| if err := writeBaseRuntimeSpec(cfg); err != nil { | ||
| return err | ||
| } | ||
| type configurator interface { |
There was a problem hiding this comment.
Name is a little clunky, I liked mixin in the prev impl 😉 or maybe modifier?
There was a problem hiding this comment.
ill try going with modifier
| if err := writeBaseRuntimeSpec(c); err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
What's this about? Doesn't seem related to the other changes
There was a problem hiding this comment.
The top level daemon config calls should contain separate components and i didn't like how the runtime spec was nested inside the config.toml call.
i think the kubelet daemon sets the right precedent
amazon-eks-ami/nodeadm/internal/kubelet/daemon.go
Lines 28 to 44 in 28f65e2
| case 1: | ||
| configurators[0].Transform(&configVars) | ||
| default: | ||
| zap.L().Error("Skipping configuration because more than one configurator was matched", zap.Any("configurators", configurators)) |
There was a problem hiding this comment.
I don't think this should be an error; configurator-s should be composable or there's not much benefit to the abstraction.
There was a problem hiding this comment.
I agree this looks funky, mainly was a byproduct of trying to continue using the abstraction without a great reason. It doesn't work like a real mixin, it just populates a set of parameters and wouldn't make sense if they got written more than once
I think ill just call them in a loop if it matches rather than doing this additional aggregation + check
cf5d952 to
a537482
Compare
|
#2146 replaces this PR for PCIe detection functionality, while this will be turned into a refactor for the |
|
superceded by #2227 |
Issue #, if available:
Description of changes:
In order to recognize when the running instance has GPU devices, we currently use a hard coded list of instance type families. to keep this list updated or remove it we can do one of the following:
ec2.DescribeInstanceTypesto get whether the current instance has a GPU devicethis PR implements (3) from above by looking for the
nvidiavendor name in/proc/bus/pcie/device. the original logic is left in and PCIe devices are used as a fallback.By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Testing Done
See this guide for recommended testing for PRs. Some tests may not apply. Completing tests and providing additional validation steps are not required, but it is recommended and may reduce review time and time to merge.