- 
                Notifications
    You must be signed in to change notification settings 
- Fork 634
📖 Proposal for ROSA networking #5381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
📖 Proposal for ROSA networking #5381
Conversation
| [APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. 
Needs approval from an approver in each of these files:
 Approvers can indicate their approval by writing  | 
| Hi @mzazrivec. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with  Once the patch is verified, the new status will be reflected by the  I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add more details for the proposed CRD RosaNetwork. The proposal mention with four attributes: name, AZ count, region and CIDR block for VPC , what is the spec and status fields ?
- Is it namespace scope ? cluster scope ?
- What is the status conditions ? ex; validate/phases/applied ?
- As you are using cloud formation, how do you propagate the errors from cloud formation template to the CR status ?
- ROSAControlPlane will refer to the RosaNetwork, What fields the RosaControlplane will read to figure out the subnet-id and zones
- How do you will handle deleting the RosaNetwork while there is a RosaControlPlane refer to it ?
|  | ||
| 1. As a CAPA user, I want to be able to provision the network infrastructure for ROSA HCP. | ||
| 2. As a CAPA user, I want to be able to use the provisioned network infrastructure in ROSA HCP control plane. | ||
| 3. As a CAPA user, I want to be able to modify the network inrastructure previously provisioned by CAPA. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if that possible, you may change tags or labels but I don't think we can change VPC attributes after creating it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I removed the stack modification part as it can get a bit tricky.
38dadd9    to
    624dd12      
    Compare
  
    | 
 Yes, I added more details regarding the spec / status part of the new custom resource. 
 Namespaced. Added to the doc. 
 This is the part I'm not 100% sure about, but I'd probably prefer being consistent with what CAPA already uses, e.g. something like: 
 Through the  
 Added to the doc. 
 On one hand it would make sense to prevent RosaNetwork deletion should it be referenced from an existing control plane, but not quite sure how to implement that without using some type of labeling or something else that would allow CAPA to see whether or not is the particular RosaNetwork being referenced or not. In short, I don't want to be implementing any checks in the initial implementation. | 
| 1. A new reconciler for the new custom resource, implementing creation and deletion. The reconciler will be using an existing [CloudFormation template from ROSA CLI](https://github.com/openshift/rosa/blob/master/cmd/create/network/templates/rosa-quickstart-default-vpc/cloudformation.yaml) and will use AWS CloudFormation API to create and delete the AWS CloudFormation stack. | ||
|  | ||
| Outputs and resources created in the cloudformation stack will be tracked under `status` of the `RosaNetwork` type. In particular, the `status` will have the following fields: | ||
| * publicSubnets: array of public subnet ids | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make it one list of ;
- availabilityZone
 privateSubnet
 publicSubnet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the above. Do you want all the AZs, private and public subnets to be rolled into one array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the status to be an array of the subnets grouped by AZs.
| Adopting the CloudFormation template used by rosa-cli would mean that CAPA and the `RosaNetwork` custom resource would be relying on a mechanism that is know to work well and any changes or fixes implemented in ROSA CLI would be picked up automatically in CAPA. | ||
|  | ||
| In practical terms, implementation of the proposal would mean: | ||
| 1. A new namespaced custom resource definition `RosaNetwork` in CAPA with four attributes: name, AZ count, region and CIRD block for VPC. These four attributes will become the `spec` part of the new `RosaNetwork` type. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add the go struct OR CR example for the rosaNetwork.spec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
| ``` | ||
| status: | ||
| resources: | ||
| - resource: NATGateway1 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this will duplication for subnet-ids ? may we don't need the list above for (zone, pub-subnet, priv-subnet) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, duplicated, as a convenience function.
| status: CREATE_COMPLETE | ||
| reason: | ||
| ``` | ||
| and will be reflecting the resource definitions from the CloudFormation template (`resource`) and the values coming from AWS CloudFormation API (`id`, `status` and `reason). | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add the go struct OR CR example for the rosaNetwork.status and include the suggested conditions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
624dd12    to
    8a14cc6      
    Compare
  
    | name: rosa-network-01 | ||
| namespace: default | ||
| spec: | ||
| availability_zone_count: 3 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for not specifying the regions? which regions will be used if the count is 2 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking the template here can you clarify why is the max count is 3 ? some region like us-west-2 has 4 AZ.
Lets have the availabilityZone as list of strings similar to rosaControlPlane
It should be okay if we will need to change the template a bit to specify the zones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for not specifying the regions? which regions will be used if the count is 2 ?
It is possible to specify the region. It is one of the input parameters of the CF template.
I'm guessing you meant AZs here, not the regions. It's not possible to specify the AZs, because that's how the CF template is written. You just specify the number of AZs you want to be used and they are being taken in the alphabetical order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In rosa cli you can also set only the number of AZs. So I think it should be fine to implement it similarly here. If user wants more control about network,  can create network and pass it to RosaControlPlane.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking the template here can you clarify why is the max count is 3 ? some region like us-west-2 has 4 AZ.
Don't know why 3 is the limit. But that's how the current template is.
Lets have the availabilityZone as list of strings similar to rosaControlPlane It should be okay if we will need to change the template a bit to specify the zones
That would be beyond the scope of this proposal document and the initial implementation. It's also not just a matter of changing the CF template. This change would mean that the AZs are the spec of the CR, not the status of the CR. Or maybe even both the spec and status depending on whether you want to specify the AZ count or the AZ list.
It could potentially be a future enhancement, though I'm not entirely convinced that it would all that useful. In the multi-zone cluster installation you're probably more interested in the number of zones you're going to be using rather than their names, which is how the template is written now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To let end user has the ability to set the AZs its not beyond the proposal scope.
- As endUser using us-west-2 region it has 4 AZs, Why I cannot create subnets in all AZs (forced to 3 AZs) ? Is there a technical reason for that ?
- As endUser us-east-2 region, I want to specify which AZ that I can create subnet to it ? Is there technical reason that I cannot specify AZ ? (The RosaControlPlane has the option to specify the AZs and subnets)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After testing, the CLI allows attempts to create (passing validation) HCP clusters with any number of AZs (one public subnet, 4 private on 4 different AZs worked. As well as 2 different AZs/private subnets, and 3)
There seems to be no official API validation for the number of subnets/AZs, meaning we can have 1 for private link, 2 for public, but any number more for either one. I was unable to find any restrictions besides at least one for private, at least 2 for public. HCP must be MultiAZ as well so this is always the case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To let end user has the ability to set the AZs its not beyond the proposal scope.
- As endUser using us-west-2 region it has 4 AZs, Why I cannot create subnets in all AZs (forced to 3 AZs) ? Is there a technical reason for that ?
That particular problem / question is not really related to this proposal nor can I really address it here.
- As endUser us-east-2 region, I want to specify which AZ that I can create subnet to it ? Is there technical reason that I cannot specify AZ ? (The RosaControlPlane has the option to specify the AZs and subnets)
Well, technical reason .... There's a technical limitation right now: the CF template in rosa-cli does not allow it / support it. Things may change on that front for sure and it may be an enhancement (meaning both in rosa-cli & CAPA) in the future, but right now it's not possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the ability to specify explicit AZs: I created openshift/rosa#2834 and adjusted my proposal doc accordingly.
| spec: | ||
| availability_zone_count: 3 | ||
| region: us-west-2 | ||
| cidr_block: 10.0.0.0/16 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cidrBlock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
| name: rosa-network-01 | ||
| namespace: default | ||
| status: | ||
| publicSubnets: | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These lists cannot tell which public, private, zone belong to each other, better to have it as below
| publicSubnets: | |
| subnets: | |
| - availabilityZone: us-west-2a | |
| privateSubnet: subnet-1d9f28ba992a83514 | |
| publicSubnet: subnet-0d9f28ba991b93514 | |
| - availabilityZone: us-west-2b | |
| privateSubnet: subnet-1d9f28ba992a83513 | |
| publicSubnet: subnet-0d9f28ba991b93513 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
| name: hcp01-control-plane | ||
| namespace: default | ||
| spec: | ||
| RosaNetworkRef: | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rosaNetworkRef
| RosaNetworkRef: | |
| rosaNetworkRef: | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
| RosaNetworkRef: | ||
| name: hcp01-rosa-network | ||
| ``` | ||
| Should the ROSA control plane CR contain reference to ROSA network, the reconciler will read the AWS region, AZ and subnet ids parameters from the ROSA network CR. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets also mention that end user cannot set both RosaControlPlane->spec->subnets and rosaNetworkRef at same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, the user can specify both, but CAPA would prefer one to the other, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we will need to validate that in the webook raising error if user set both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to the doc.
a36c4cc    to
    e9502ee      
    Compare
  
    | /ok-to-test | 
| Thanks @mzazrivec, looks good. You may mention as well during ROSANetwork deletion a validation must be applied to check if there is a RosaControlplane refer to it. | 
|  | ||
| #### Functional Requirements | ||
|  | ||
| 1. Ability to create a new namespaced custom resource `RosaNetwork` with four attributes: name, AZ count, region and CIDR block for VPC. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably note the AZ list, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nrb yep, added to the proposal doc.
| privateSubnet: subnet-7d7e19c0af1f4d57f | ||
| ``` | ||
|  | ||
| All resources created in the cloudformation stack will be tracked under `status.resources` array: | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for setting subnets as a top-level field instead of listing them under status.resources.subnets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nrb in the end the subnets would be listed under both status.subnets (grouped by AZ) and status.resources. The status.resources will be a list filled with the data propagated from AWS as the stack creation progresses. My thought here was to duplicate the results we're interested in (subnets & AZs) so that we don't have to parse it out of the status.resources (i.e. a convenience function).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I could rephrase to confirm I understand - the subnets are promoted to a top level resource primarily for ease of readability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nrb Yes, I guess you could say that. Although I meant it mostly for the benefit of readibility by other controllers: AZs and subnet ids is what the ROSA controller needs to provision the actual cluster.
Do you think this is not a good idea?
| conditions: | ||
| - lastTransitionTime: "2025-03-20T14:45:26Z" | ||
| status: "True" | ||
| type: RosaNetworkReady | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure to add a Reason field[1], even for success.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can add the reason in here, I think that's the last of my comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nrb yep, ack 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not 100% sure if I understand the linked document correctly. The document says: Use of the Reason field is required (currently in Cluster API reasons is added only when condition are false)
Does that mean we can have an empty string and still show the Reason field? Or do we have to have a non-empty reason string even for success?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nrb Added the reason to the example for a successful creation.
| I agree with the general direction this proposal takes. I made some comments around clarity and explicitness, paying special attention to  The examples given in YAML are very helpful! It may also be useful to include Go definitions in order to better understand optional or mutually exclusive fields, since these can often be excluded from YAML output due to the  | 
e9502ee    to
    655cac8      
    Compare
  
    655cac8    to
    02fee3e      
    Compare
  
    | I updated the proposal document slightly (mainly filed names and such) to reflect the implementation in #5464 | 
02fee3e    to
    65d9b98      
    Compare
  
    | /label tide/merge-method-squash | 
| /release-note-none | 
| @nrb are you happy with this? | 
What type of PR is this?
/kind documentation
What this PR does / why we need it:
This pull request contains a proposal for implementing provisioning of networking infrastructure for ROSA HCP clusters.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)format, will close the issue(s) when PR gets merged):Fixes #
Special notes for your reviewer:
Checklist:
Release note: