Multi-cell adoption #517
|
The recent revision gives an overview of the approach taken, PTAL. |
docs_user/modules/proc_migrating-databases-to-mariadb-instances.adoc
docs_user/modules/proc_retrieving-topology-specific-service-configuration.adoc
|
This change depends on a change that failed to merge. Change openstack-k8s-operators/install_yamls#826 is needed. |
|
Merge Failed. This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset. |
|
Based on feedback from @SeanMooney, we should not shift cell names as I proposed here. We want it instead like this:
Implementing either of these is quite challenging given the local requirement to keep the code in tests in the same form as it is documented (meaning shell commands). This sophisticated logic will bring even more loop and array handling into the already overcomplicated code proposed in this PR draft. |
As nova-operator allows a cell to be named "default", the simplest solution would be your second proposal: just import the cells as is. This also has the benefit that it will work even if a given customer wrongly attached computes to the default cell. |
|
I now tend toward implementing the last choice: for a multi-cell source (default, cell1, etc. exist), rename the default cell to the highest cell number + 1. This keeps it consistent for single-cell and multi-cell... /update: See the combined option, which allows either renaming or importing as is |
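The "highest cell number + 1" rule can be sketched in shell as follows. This is only an illustration, not the code proposed in this PR: the function name `rename_default_cell` and the input list are hypothetical, and it assumes every source cell is named either `default` or `cell<N>`, with N a plain decimal number without leading zeros.

```shell
# Hypothetical helper: print "source_name target_name" pairs for the given cells.
# Assumption: cells are named "default" or "cell<N>"; anything else is kept as is.
rename_default_cell() {
    max=0
    # First pass: find the highest N among the cell<N> names.
    for cell in "$@"; do
        case "$cell" in
            cell*) n=${cell#cell} ;;
            *) continue ;;
        esac
        case "$n" in
            ''|*[!0-9]*) continue ;;  # not of the form cell<N>, ignore
        esac
        if [ "$n" -gt "$max" ]; then
            max=$n
        fi
    done
    # Second pass: only "default" gets a new name.
    for cell in "$@"; do
        if [ "$cell" = "default" ]; then
            echo "default cell$((max + 1))"
        else
            echo "$cell $cell"
        fi
    done
}

rename_default_cell default cell1 cell2
# prints:
# default cell3
# cell1 cell1
# cell2 cell2
```

With a single `default` cell as the only input, the same function prints `default cell1`, which is the consistency between single-cell and multi-cell cases mentioned above.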
|
Merge Failed. This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset. |
|
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/64b2ab79e55847e3b2622e10febae11e ✔️ noop SUCCESS in 0s |
|
recheck adoption-standalone-to-crc-ceph |
|
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/10d929291834486995d25e11bddf7bf7 ✔️ noop SUCCESS in 0s |
|
recheck adoption-standalone-to-crc-ceph |
jistr
left a comment
/lgtm
Some follow-up suggestions inline. I'm mainly concerned about 2 things right now:
- Hardcoding multi-DB and multi-MQ, which increases HW requirements for all jobs.
- This
$CONTROLLER1_SSH if sudo systemctl is-active tripleo_ovn_cluster_northd.service ';' then sudo systemctl stop tripleo_ovn_cluster_northd.service ';' fi
which is likely just a lack of knowledge on my part; I need to experiment with how SSH behaves in such cases.
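On the SSH question: as far as I know, ssh joins the command arguments with single spaces and hands the resulting string to the remote user's shell, so the locally quoted `';'` tokens arrive as plain `;` separators and the whole `if ... fi` is parsed on the remote side. The sketch below imitates that locally, without any SSH connection. `fake_ssh` is a hypothetical stand-in for `$CONTROLLER1_SSH`, and `true`/`false` stand in for `systemctl is-active`, which exits non-zero when a unit is not active.

```shell
# Hypothetical stand-in for "$CONTROLLER1_SSH": joins its arguments with
# spaces and lets a shell parse the result, as the remote shell would.
fake_ssh() {
    sh -c "$*"
}

# "true" plays an active unit, so the stop branch runs.
active=$(fake_ssh if true ';' then echo stop requested ';' fi)

# "false" plays an inactive or missing unit, so nothing is stopped.
inactive=$(fake_ssh if false ';' then echo stop requested ';' fi)

echo "active: [$active] inactive: [$inactive]"
# prints: active: [stop requested] inactive: []
```

The local shell never sees `if`, `then`, or `fi` in command position here, so it treats them as ordinary arguments. Only the shell that receives the joined string interprets them as a conditional.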
Just like we use the $ prefix for lines that begin a command, when the commands are multi-line (for loops etc.) we use the > prefix. I was a bit sceptical about that, but today I learned the copy button in the downstream docs actually works well with those -- the $ and > prefixes are not copied into the clipboard.
Just nitpicking though; a thing like this should be done in a follow-up given the size and priority of this PR.
Yes, I proposed the $ and > prefixes earlier for the extracted DB adoption of multi-cell setups, and we agreed to follow that convention.
However, we have also agreed to address the docs review in a follow-up.
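To illustrate the prefix convention discussed in this thread: a documented multi-line command starts with `$` and continues with `>`, and removing those prefixes yields text that can be pasted into a shell. The sketch below is only an illustration. `strip_prompts` is a hypothetical helper, the cell names are made up, and this is not a claim about how the downstream copy button is implemented.

```shell
# Hypothetical helper: drop a leading "$ " or "> " prompt marker from each line.
strip_prompts() {
    sed -e 's/^[$>] \{0,1\}//'
}

# A multi-line command written the way the docs would show it.
documented='$ for CELL in cell1 cell2; do
>   echo "adopting $CELL"
> done'

# What remains after the prompt markers are removed.
runnable=$(printf '%s\n' "$documented" | strip_prompts)
printf '%s\n' "$runnable"
```

Piping the stripped text into `sh` runs the loop as written.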
docs_user/modules/proc_adopting-compute-services-to-the-data-plane.adoc
tests/roles/backend_services/templates/openstack_control_plane.j2
If you have more than one cell, nova requires either that rabbit is configured to use vhosts, which our rabbit can't do, or that you have a separate conductor per cell. That is why we use separate message queues: it is a limitation of our rabbit operator. Each cell can use different schemas on the same DB server. For testing that is valid, but cells exist purely to scale nova horizontally and overcome message queue and DB bottlenecks, so it normally does not make sense to share the DB between cells, since DB performance used to be one of the bottlenecks that cells were designed to overcome. SSDs removed most of the DB bottleneck, so it is really the rabbitmq throughput that is the limiting factor now. |
|
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/7f04d9e4fa00467cad5f990d38908b7a ✔️ noop SUCCESS in 0s |
|
recheck |
Split edpm nodes into compute cells by 1:1 mapping them as dataplane nodesets.
Use the edpm_nodes var to describe computes for each cell, instead of the static host and ip vars that only worked for single-cell standalone or multi-node single-cell cases.
Also explain EDPM net config requirements in vars.sample, when it is used outside of ci-framework (local deployments).
Remove edpm_computes vars no longer used after moving the stopping of control-plane tripleo services into edpm-ansible.
Simplify ENV headers management by collecting them in a single place.
Provide a variable to define the source cloud Ironic topology, for any cells with Ironic services.
Align the ordering of nova/libvirt and related services in the lists of services defined in multiple places with those specified in VA.
Align the names in the tests to follow the documented steps, to make the corresponding code easily discoverable.
Adjust storage/storageRequests values to better fit multi-cell test scenarios. Also provide values in docs and add a comment to adjust them as needed.
Stop ovn services only if active, or not missing (like on the cell controllers).
Signed-off-by: Bohdan Dobrelia <[email protected]>
Signed-off-by: Bohdan Dobrelia <[email protected]>
Without that, edpm-ansible's os-net-config changes IPs on internalapi, which also breaks connectivity to EDPM hosts for ansible (restores after a node reboot though). Signed-off-by: Bohdan Dobrelia <[email protected]>
|
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/a221b52ae8854ec08b129ff743855344 ✔️ noop SUCCESS in 0s |
|
recheck |
|
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/36bb1aafc4f5477995134332a02d8253 ✔️ noop SUCCESS in 0s |
|
I think the octavia bug should now be fixed by #874 |
|
recheck |
|
This PR has been on review for more than half a year and received peer review earlier, and we recently agreed to postpone merging this to let #855 land, which has happened. Let's take the opportunity to merge this PR and address any tweaks in follow-ups, we already have other work waiting for this one to land. Re-adding my LGTM and approving. /lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: jistr |
Merged commit 80c021f into openstack-k8s-operators:main
|
This broke multinode adoption jobs:- |
|
Proposed revert #878 while this is being checked; the uni adoption jobs are also broken.
Split edpm nodes into compute cells by 1:1 mapping them as
dataplane nodesets.
Use the edpm_nodes var to describe computes for each cell,
instead of the static host and ip vars that only worked for
single-cell standalone or multi-node single-cell cases.
Also explain EDPM net config requirements in vars.sample, when
it is used outside of ci-framework (local deployments).
Remove edpm_computes vars no longer used after moving the stopping
of control-plane tripleo services into edpm-ansible.
Simplify ENV headers management by collecting them in a single place.
Provide a variable to define the source cloud Ironic topology,
for any cells with Ironic services.
Align the ordering of nova/libvirt and related services in the
lists of services defined in multiple places with those
specified in VA.
Align the names in the tests to follow the documented steps,
to make the corresponding code easily discoverable.
Adjust storage/storageRequests values to better fit
multi-cell test scenarios. Also provide values in docs and
add a comment to adjust them as needed.
Stop ovn services only if active, or not missing (like on
the cell controllers).
Retain EDPM host IPs on internalapi network. Without that, edpm-ansible's os-net-config
changes IPs on internalapi, and also breaks connectivity to EDPM hosts for ansible
(which restores after a node reboot).
Add edpmRoleServiceName value for tlsCerts.
Jira: #OSPRH-6548