Skip to content

Conversation

@GaOrtiga
Copy link
Contributor

Description

During the creation of firewall rules, if one of the limits for the ports is not informed, it is saved in the database as Null, indicating that there is no starting/ending limit.

while creating a Kubernetes cluster, if the selected network has a rule that contains ports saved as null, an error is thrown, stopping the execution of the process.

This behaviour has been fixed, making it so that any port saved as null is regarded as being being on its respective limit (1 for start ports and 65535 for end ports) during the creation of a Kubernetes cluster.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI

Feature/Enhancement Scale or Bug Severity

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

I created a rule in an existing network, with null starting port and ending on port 10, making it so it should not conflict with any of the ports required by the Kubernetes cluster. Before applying the changes, an error would be thrown.

I repeated the process after applying the changes and the cluster was successfully created.

Copy link
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clgtm

@codecov
Copy link

codecov bot commented Jun 12, 2024

Codecov Report

Attention: Patch coverage is 13.63636% with 19 lines in your changes missing coverage. Please review.

Project coverage is 16.02%. Comparing base (b19c069) to head (f893bd4).
Report is 317 commits behind head on main.

Files with missing lines Patch % Lines
...va/com/cloud/network/dao/FirewallRulesDaoImpl.java 0.00% 11 Missing ⚠️
...KubernetesClusterResourceModifierActionWorker.java 0.00% 8 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #9223      +/-   ##
============================================
+ Coverage     15.81%   16.02%   +0.21%     
- Complexity    12554    13147     +593     
============================================
  Files          5629     5658      +29     
  Lines        492028   496325    +4297     
  Branches      62750    60111    -2639     
============================================
+ Hits          77813    79538    +1725     
- Misses       405892   407938    +2046     
- Partials       8323     8849     +526     
Flag Coverage Δ
uitests 4.01% <ø> (-0.47%) ⬇️
unittests 16.86% <13.63%> (+0.25%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@GaOrtiga GaOrtiga force-pushed the fix_NPE_Kubernetes_cluster branch 2 times, most recently from 1be11ff to a8ce0f2 Compare June 12, 2024 18:35
Copy link
Member

@weizhouapache weizhouapache left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code lgtm

not tested

@borisstoyanov
Copy link
Contributor

@blueorangutan package

@borisstoyanov borisstoyanov self-assigned this Jun 13, 2024
@blueorangutan
Copy link

@borisstoyanov a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✖️ el7 ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 9927

borisstoyanov

This comment was marked as outdated.

Copy link
Contributor

@borisstoyanov borisstoyanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a failing test @GaOrtiga

13:56:56 [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.128 s - in com.cloud.kubernetes.version.KubernetesVersionManagerImplTest
13:56:56 [INFO] Running com.cloud.kubernetes.version.KubernetesVersionServiceTest
13:56:56 [INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.234 s - in com.cloud.kubernetes.version.KubernetesVersionServiceTest
13:56:57 [INFO] 
13:56:57 [INFO] Results:
13:56:57 [INFO] 
13:56:57 [ERROR] Failures: 
13:56:57 [ERROR]   KubernetesClusterManagerImplTest.validateIsolatedNetworkIpRulesApiConflictingRules Expected exception: com.cloud.exception.InvalidParameterValueException
13:56:57 [ERROR]   KubernetesClusterManagerImplTest.validateIsolatedNetworkIpRulesSshConflictingRules Expected exception: com.cloud.exception.InvalidParameterValueException
13:56:57 [INFO] 
13:56:57 [ERROR] Tests run: 39, Failures: 2, Errors: 0, Skipped: 0
13:56:57 [INFO] 
13:56:57 [INFO] ------------------------------------------------------------------------
13:56:57 [INFO] Reactor Summary for Apache CloudStack 4.20.0.0-SNAPSHOT:

@borisstoyanov borisstoyanov self-requested a review June 13, 2024 11:43
@borisstoyanov
Copy link
Contributor

thanks @GaOrtiga
@blueorangutan package

@blueorangutan
Copy link

@borisstoyanov a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9929

Copy link
Contributor

@FelipeM525 FelipeM525 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this PR in a local environment by trying to create a k8s cluster on a network with a firewall rule containing an empty start port.

Before

image

After

image

image

@DaanHoogland
Copy link
Contributor

@blueorangutan package

@blueorangutan
Copy link

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10382

@DaanHoogland
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@DaanHoogland a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

[SF] Trillian test result (tid-10884)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 51258 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9223-t10884-kvm-centos7.zip
Smoke tests completed. 129 look OK, 8 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_role_account_acls_multiple_mgmt_servers Error 2.15 test_dynamicroles.py
test_query_async_job_result Error 102.01 test_async_job.py
test_revoke_certificate Error 0.02 test_certauthority_root.py
test_configure_ha_provider_invalid Error 0.01 test_hostha_simulator.py
test_configure_ha_provider_valid Error 0.04 test_hostha_simulator.py
test_ha_configure_enabledisable_across_clusterzones Error 0.01 test_hostha_simulator.py
test_ha_disable_feature_invalid Error 0.01 test_hostha_simulator.py
test_ha_enable_feature_invalid Error 0.01 test_hostha_simulator.py
test_ha_list_providers Error 0.01 test_hostha_simulator.py
test_ha_multiple_mgmt_server_ownership Error 0.01 test_hostha_simulator.py
test_ha_verify_fsm_available Error 0.01 test_hostha_simulator.py
test_ha_verify_fsm_degraded Error 0.01 test_hostha_simulator.py
test_ha_verify_fsm_fenced Error 0.01 test_hostha_simulator.py
test_ha_verify_fsm_recovering Error 0.01 test_hostha_simulator.py
test_hostha_configure_default_driver Error 0.01 test_hostha_simulator.py
test_hostha_configure_invalid_provider Error 0.01 test_hostha_simulator.py
test_hostha_disable_feature_valid Error 0.01 test_hostha_simulator.py
test_hostha_enable_feature_valid Error 0.01 test_hostha_simulator.py
test_hostha_enable_feature_without_setting_provider Error 0.01 test_hostha_simulator.py
test_list_ha_for_host Error 0.01 test_hostha_simulator.py
test_list_ha_for_host_invalid Error 0.01 test_hostha_simulator.py
test_list_ha_for_host_valid Error 0.01 test_hostha_simulator.py
test_01_host_ping_on_alert Error 0.07 test_host_ping.py
test_01_host_ping_on_alert Error 0.07 test_host_ping.py
test_01_browser_migrate_template Error 15.30 test_image_store_object_migration.py
test_06_purge_expunged_vm_background_task Failure 340.83 test_purge_expunged_vms.py
test_hostha_enable_ha_when_host_disabled Error 5.79 test_hostha_kvm.py
test_hostha_enable_ha_when_host_in_maintenance Error 305.91 test_hostha_kvm.py

@blueorangutan
Copy link

[SF] Trillian test result (tid-10912)
Environment: kvm-alma8 (x2), Advanced Networking with Mgmt server a8
Total time taken: 52725 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9223-t10912-kvm-alma8.zip
Smoke tests completed. 135 look OK, 2 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_06_purge_expunged_vm_background_task Failure 339.14 test_purge_expunged_vms.py
test_03_secured_to_nonsecured_vm_migration Error 396.95 test_vm_life_cycle.py

@DaanHoogland
Copy link
Contributor

@borisstoyanov , @FelipeM525 tested (#9223 (review)), do you still want to test this as well?

@JoaoJandre
Copy link
Contributor

@blueorangutan package

@GaOrtiga GaOrtiga force-pushed the fix_NPE_Kubernetes_cluster branch from 97e3988 to b3be45e Compare September 11, 2024 17:41
@GaOrtiga
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@GaOrtiga a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11098

@weizhouapache
Copy link
Member

@GaOrtiga
Can you update the title with short summay of the issue ?

@GaOrtiga GaOrtiga changed the title fix error during kubernetes cluster creation Fix NPE during kubernetes cluster creation when ports are saved as null on DB Sep 12, 2024
@GaOrtiga GaOrtiga changed the title Fix NPE during kubernetes cluster creation when ports are saved as null on DB Fix NPE during kubernetes cluster creation when network has rules with ports saved as null on DB Sep 12, 2024
@GaOrtiga
Copy link
Contributor Author

@GaOrtiga can you review and address build failures? Thanks.

I have rebased it, everything seems fine now.

@GaOrtiga Can you update the title with short summay of the issue ?

Title updated.

@weizhouapache
Copy link
Member

@blueorangutan package

@blueorangutan
Copy link

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11103

@DaanHoogland
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan
Copy link

[SF] Trillian test result (tid-11481)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 53922 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9223-t11481-kvm-ol8.zip
Smoke tests completed. 141 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@DaanHoogland
Copy link
Contributor

@borisstoyanov is this lgty now?

Copy link
Contributor

@kiranchavala kiranchavala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GaOrtiga

If the network state is implemented and has an existing firewall rule.

The network already has a running state router

CKS cluster fails with the following exception

Screenshot 2024-09-23 at 4 17 13 PM

Logs

2024-09-23 10:44:27,659 DEBUG [c.c.k.c.KubernetesClusterManagerImpl] (qtp253011924-23:[ctx-ca5a29af, ctx-1c410861]) (logid:fd9f9d79) Validating rule with purpose: Firewall for network: 79a2d575-5596-4637-9675-8d39614a8515 with ports: 1-65535
2024-09-23 10:44:27,666 INFO  [c.c.a.ApiServer] (qtp253011924-23:[ctx-ca5a29af, ctx-1c410861]) (logid:fd9f9d79) Network ID: 79a2d575-5596-4637-9675-8d39614a8515 has conflicting firewall rules to provision Kubernetes cluster for API access

There is no issue with CKS cluster creation if the network state is allocated and has an existing firewall rule.

The network does not have a running router

Screenshot 2024-09-23 at 4 28 38 PM

@kiranchavala kiranchavala self-assigned this Sep 23, 2024
@kiranchavala
Copy link
Contributor

@GaOrtiga

Could you please fix the scenario where the

network state is implemented and has an existing firewall rule

@DaanHoogland
Copy link
Contributor

@GaOrtiga

Could you please fix the scenario where the

network state is implemented and has an existing firewall rule

@kiranchavala , I think @GaOrtiga left the community (for now) can you assess the state of this PR for us? ie,

  • can it be merged
  • if yes is there work remaining for new issues / PRs
  • if no, what needs to be done?
    most specifically; when you talk about " the scenario where the network state is implemented and has an existing firewall rule", is this a regression or a secondary finding? It seems to me not to have anything to do with the NPE fix the subject is about.

@borisstoyanov borisstoyanov removed their request for review February 4, 2025 11:45
@borisstoyanov borisstoyanov removed their assignment Feb 4, 2025
@winterhazel
Copy link
Member

If the network state is implemented and has an existing firewall rule.

The network already has a running state router

CKS cluster fails with the following exception

That's the expected behavior, isn't it? ACS will block cluster creation because it needs to add firewall rules that will conflict with the existing one.

What we could do is remove this limitation and, before adding the necessary rules, check if there is already a rule covering them; if there is, we do not add the rules. This would be a separate enhancement though.

There is no issue with CKS cluster creation if the network state is allocated and has an existing firewall rule.

The network does not have a running router

Also the current expected behavior. No validation is performed if the network's state is Allocated.

private void validateIsolatedNetwork(Network network, int clusterTotalNodeCount) {
if (!Network.GuestType.Isolated.equals(network.getGuestType())) {
return;
}
if (Network.State.Allocated.equals(network.getState())) { // Allocated networks won't have IP and rules
return;
}

I haven't looked deeper into whether there is a better reason for this, but it seems to me that we could remove this so that the rules get validated for Allocated networks. Again, also a separate enhancement.

@kiranchavala , I think @GaOrtiga left the community (for now) can you assess the state of this PR for us? ie,

* can it be merged

* if yes is there work remaining for new issues / PRs

* if no, what needs to be done?
  most specifically; when you talk about " the scenario where the network state is implemented and has an existing firewall rule", is this a regression or a secondary finding? It seems to me not to have anything to do with the NPE fix the subject is about.

@DaanHoogland I think that it can be merged, what @kiranchavala's pointed out seems like issues that can be addressed separately for me.

@kiranchavala
Copy link
Contributor

Thanks @winterhazel

@DaanHoogland this can be merged, I will create separate enhancement requests for the issues reported

Copy link
Contributor

@kiranchavala kiranchavala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, tested manually

@DaanHoogland DaanHoogland merged commit f8563b8 into apache:main Feb 13, 2025
24 of 26 checks passed
dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Jun 19, 2025
…h ports saved as null on DB (apache#9223)

Co-authored-by: Gabriel <[email protected]>
Co-authored-by: Fabricio Duarte <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants