Conversation
10c9a98 to
914a4f0
Compare
fbouliane
left a comment
There was a problem hiding this comment.
Small comments here and there, no big logic issues
With the size of the PR, I couldn't get as low-level as I enjoy giving feedback.
Everything is negotiable, just in the spirit of giving feedback.
| } | ||
|
|
||
| // getFailureDomainByName gets the CloudStack Failure Domain by name. | ||
| func getFailureDomainByName(ctx context.Context, k8sClient client.Client, name, namespace, clusterName string) (*infrav1.CloudStackFailureDomain, error) { |
There was a problem hiding this comment.
Improvement:
It feels like this "scope" objects have dual responsibilities.
1- Manage the cache of clients,
2- "wrap" the client methods with accessors.
Maybe a smell of another class coming out.
There was a problem hiding this comment.
This is a bit of a WIP. My intention is to split the cloud pkg into several packages, one for the Client interface and client creation functions, and one (or more) packages that implement the client interface.
| endpointSecretStrings := map[string]string{} | ||
| for k, v := range endpointSecret.Data { | ||
| endpointSecretStrings[k] = string(v) | ||
| } | ||
| bytes, err := yaml.Marshal(endpointSecretStrings) | ||
| if err != nil { | ||
| return cloudConfig, err | ||
| } | ||
|
|
||
| if err := yaml.Unmarshal(bytes, &cloudConfig); err != nil { | ||
| return cloudConfig, err | ||
| } | ||
|
|
||
| if err := cloudConfig.Validate(); err != nil { | ||
| return cloudConfig, errors.Wrapf(err, "invalid cloud config") |
There was a problem hiding this comment.
Improvement:
Not sure why this classes knows
- Where are the secrets located
- how to marshal and unmarshal yaml
- how to error if the config isn't valid.
It feels that belongs to a configManager of some sort.
There was a problem hiding this comment.
The cloudconfig part is currently part of the client bits in the cloud pkg. My intention is to separate those in a client pkg.
| isonet := &infrav1.CloudStackIsolatedNetwork{} | ||
| if s.CloudStackIsolatedNetwork == nil || s.CloudStackIsolatedNetwork.Name == "" { | ||
| err := s.client.Get(ctx, client.ObjectKey{Name: s.IsolatedNetworkName(), Namespace: s.Namespace()}, isonet) | ||
| if err != nil { | ||
| return nil, errors.Wrapf(err, "failed to get isolated network with name %s", s.IsolatedNetworkName()) | ||
| } | ||
| } | ||
|
|
||
| return isonet, nil |
There was a problem hiding this comment.
Improvement; I feel this kind of validation should be pushed a domain lower, or within a controller / client.
| g.Eventually(func() error { | ||
| ph, err := patch.NewHelper(dummies.CSCluster, testEnv.Client) | ||
| g.Expect(err).ToNot(HaveOccurred()) | ||
| dummies.CSCluster.OwnerReferences = append(dummies.CSCluster.OwnerReferences, metav1.OwnerReference{ | ||
| Kind: "Cluster", | ||
| APIVersion: clusterv1.GroupVersion.String(), | ||
| Name: dummies.CAPICluster.Name, | ||
| UID: types.UID("cluster-uid"), | ||
| }) | ||
|
|
||
| return ph.Patch(ctx, dummies.CSCluster, patch.WithStatusObservedGeneration{}) | ||
| }, timeout).Should(Succeed()) |
There was a problem hiding this comment.
Honest question: Not sure why these tests are made from a completely "isolated" / blackbox way.
Looks like triggering a single reconciliation would allow to remove the looping and timeouts, which is not what we're trying to test here.
And it looks like sometimes, we do trigger the reconcillitation manually.
Not sure how fast are these either, it might not be an issue at all, but these kind of test are usually flaky.
There was a problem hiding this comment.
Actually currently the reconcile is called manually every step. This is mostly so we can test the logic during the various phases of reconciliation (this is most visible in the machine controller). We could consider adding tests that are more integration tests, where all controllers are running and manual calls to reconcile are not needed.
In this particular case a patch for ownerreferences is not really needed, so i pushed a change for that.
There was a problem hiding this comment.
Improvement: Looks like the tested controller is missing a bit of coverage.
There was a problem hiding this comment.
Yes, the tests are not complete yet. I converted what was there in the original controllers to the new ones, but the old situation wasn't 100% either :)
9e0737f to
461abbb
Compare
kasperhendriks
left a comment
There was a problem hiding this comment.
I haven't looked at every line, but it looks good from what I can see. I agree with some of Felix's comments, but I also think this issue can be addressed in future PRs.
…amespace for dummies
…mprove testability
removed the factory from the cloud pkg, moved to scope pkg as client scope adjust cloud pkg for factory change, refactory instance operations add machine controller and basic tests
…tions and remove old GetOrCreateVMInstance implementation
…n findinstance returns err
…ne controller In the case of control plane nodes (or when specified in the Machine template), the machine controller will set the failure domain in the Machine. If this is not the case, we assign one of the failure domains set in CloudStackCluster.
worker nodes should now no longer get stuck in provisioning state, by working around the fact that CloudStack's stopped state has two meanings - new instance: stopped -> starting -> running - running -> stopped
461abbb to
0029b89
Compare
|
Addressed some issues and rebased on develop after K8s 1.30.10 / CAPI 1.8.10 changes. |
1407e91 to
80d3eb5
Compare
Issue #, if available:
Description of changes:
This PR refactors the controllers entirely to not use the base reconciler pattern but instead use a more common approach used in CAPI infrastructure providers. It also adds a scope package and client factory for easier testing and updates CAPI to 1.8.9 and K8s deps to 1.30.9. The machinestatechecker controller is completely removed, as it was no longer needed. Other than that, the goal for now was to refactor without changing any of the features.
The only API change is the deprecation of the non-CAPI-standard 'status' and 'reason' fields, and the addition of their replacements FailureReason and FailureMessage in CloudStackMachineStatus.
Some basic tests have been added to the controllers using envtest. The e2e suite is not updated and probably doesnt work at this point.
This PR targets the develop branch.
Testing performed:
not tested yet: replace pre-refactored CAPC (0.7.0) with this in an existing mgmt cluster.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.