@hsato03
Collaborator

@hsato03 hsato03 commented Mar 25, 2024

Description

The method that generates the alert e-mail repeats the same code in many places and is poorly formatted.

This PR intends to refactor this method.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

I set several notification threshold global settings to 0% (for example, zone.vlan.capacity.notificationthreshold and cluster.memory.allocated.capacity.notificationthreshold). Then I checked the logs to confirm that the messages were created correctly.

2024-03-25T18:22:03,626 WARN  [c.c.a.AlertManagerImpl] (CapacityChecker:[ctx-73e84063]) (logid:0ebef2c9) alertType=[4] dataCenterId=[1] podId=[null] clusterId=[null] message=[System Alert: Number of unallocated virtual network public IPs is low in availability zone [zn-test].].
2024-03-25T18:22:04,487 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:[ctx-73e84063]) (logid:0ebef2c9) Sending alert with subject [System Alert: Low Available Secondary Storage in availability zone [zn-test].] and content [Available secondary storage space is low, total: 60212 MB, used: 5476 MB (9.09%).].
2024-03-25T18:22:04,488 WARN  [c.c.a.AlertManagerImpl] (CapacityChecker:[ctx-73e84063]) (logid:0ebef2c9) alertType=[6] dataCenterId=[1] podId=[null] clusterId=[null] message=[System Alert: Low Available Secondary Storage in availability zone [zn-test].].
2024-03-25T18:22:04,503 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:[ctx-73e84063]) (logid:0ebef2c9) Sending alert with subject [System Alert: Number of unallocated VLANs is low in availability zone [zn-test].] and content [Number of unallocated VLANs is low, total: 101.0, allocated: 6.0 (5.94%).].
2024-03-25T18:22:04,504 WARN  [c.c.a.AlertManagerImpl] (CapacityChecker:[ctx-73e84063]) (logid:0ebef2c9) alertType=[18] dataCenterId=[1] podId=[null] clusterId=[null] message=[System Alert: Number of unallocated VLANs is low in availability zone [zn-test].].
2024-03-25T18:22:04,534 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:[ctx-73e84063]) (logid:0ebef2c9) Sending alert with subject [System Alert: Number of unallocated private IPs is low in pod Pod-zn-test of availability zone [zn-test].] and content [Number of unallocated private IPs is low, total: 10.0, allocated: 2.0 (20%)].
2024-03-25T18:22:04,534 WARN  [c.c.a.AlertManagerImpl] (CapacityChecker:[ctx-73e84063]) (logid:0ebef2c9) alertType=[5] dataCenterId=[1] podId=[1] clusterId=[null] message=[System Alert: Number of unallocated private IPs is low in pod Pod-zn-test of availability zone [zn-test].].
2024-03-25T18:22:06,243 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:[ctx-73e84063]) (logid:0ebef2c9) Sending alert with subject [System Alert: Low Unallocated CPU in cluster [Cluster-zn-test] pod [Pod-zn-test] of availability zone [zn-test].] and content [Unallocated CPU is low, total: 30000 Mhz, used: 2500 Mhz (8.33%).].
2024-03-25T18:22:06,244 WARN  [c.c.a.AlertManagerImpl] (CapacityChecker:[ctx-73e84063]) (logid:0ebef2c9) alertType=[1] dataCenterId=[1] podId=[1] clusterId=[1] message=[System Alert: Low Unallocated CPU in cluster [Cluster-zn-test] pod [Pod-zn-test] of availability zone [zn-test].].
2024-03-25T18:22:06,263 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:[ctx-73e84063]) (logid:0ebef2c9) Sending alert with subject [System Alert: Low Available Memory in cluster [Cluster-zn-test] pod [Pod-zn-test] of availability zone [zn-test].] and content [System memory is low, total: 21914 MB, used: 2560 MB (11.68%).].
2024-03-25T18:22:06,263 WARN  [c.c.a.AlertManagerImpl] (CapacityChecker:[ctx-73e84063]) (logid:0ebef2c9) alertType=[0] dataCenterId=[1] podId=[1] clusterId=[1] message=[System Alert: Low Available Memory in cluster [Cluster-zn-test] pod [Pod-zn-test] of availability zone [zn-test].].
2024-03-25T18:22:06,292 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:[ctx-73e84063]) (logid:0ebef2c9) Sending alert with subject [System Alert: Low Available Storage in cluster [Cluster-zn-test] pod [Pod-zn-test] of availability zone [zn-test].] and content [Available storage space is low, total: 19019 MB, used: 4317 MB (22.7%).].
2024-03-25T18:22:06,293 WARN  [c.c.a.AlertManagerImpl] (CapacityChecker:[ctx-73e84063]) (logid:0ebef2c9) alertType=[2] dataCenterId=[1] podId=[1] clusterId=[1] message=[System Alert: Low Available Storage in cluster [Cluster-zn-test] pod [Pod-zn-test] of availability zone [zn-test].].
2024-03-25T18:22:06,323 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:[ctx-73e84063]) (logid:0ebef2c9) Sending alert with subject [System Alert: Remaining unallocated Storage is low in cluster [Cluster-zn-test] pod [Pod-zn-test] of availability zone [zn-test].] and content [Unallocated storage space is low, total: 38037 MB, allocated: 20000 MB (52.58%)].
2024-03-25T18:22:06,324 WARN  [c.c.a.AlertManagerImpl] (CapacityChecker:[ctx-73e84063]) (logid:0ebef2c9) alertType=[3] dataCenterId=[1] podId=[1] clusterId=[1] message=[System Alert: Remaining unallocated Storage is low in cluster [Cluster-zn-test] pod [Pod-zn-test] of availability zone [zn-test].].
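
For reference, the content strings in the log lines above share one pattern: a resource name, the total, the used or allocated amount, and a percentage. The following is a minimal Java sketch of that pattern, not the code from AlertManagerImpl. The class name, the method names, and the `#.##` percentage pattern are assumptions, chosen only because they reproduce values seen in the logs (such as 8.33%, 22.7% and 20%).

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class AlertContentSketch {

    /**
     * Formats used/total as a percentage with at most two decimals.
     * Assumption: trailing zeros are dropped, as in "20%" and "22.7%" in the logs.
     */
    static String percent(long used, long total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        // DecimalFormat is not thread-safe, so a fresh instance is created per call.
        DecimalFormat pct = new DecimalFormat("#.##", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return pct.format(used * 100.0 / total);
    }

    /** Builds one content string; usedLabel is "used" or "allocated" (hypothetical parameter). */
    static String buildContent(String what, long total, long used, String unit, String usedLabel) {
        return String.format(Locale.ROOT, "%s is low, total: %d %s, %s: %d %s (%s%%).",
                what, total, unit, usedLabel, used, unit, percent(used, total));
    }

    public static void main(String[] args) {
        System.out.println(buildContent("Unallocated CPU", 30000, 2500, "Mhz", "used"));
        System.out.println(buildContent("System memory", 21914, 2560, "MB", "used"));
    }
}
```

Keeping the formatting in one helper is one way to avoid the repetition described above; the structure the PR actually adopts may differ.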

@DaanHoogland
Contributor

@blueorangutan package

@blueorangutan

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@codecov

codecov bot commented Mar 29, 2024

Codecov Report

Attention: Patch coverage is 0% with 65 lines in your changes missing coverage. Please review.

Project coverage is 16.02%. Comparing base (085bd3b) to head (60d35a9).
Report is 75 commits behind head on 4.20.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rc/main/java/com/cloud/alert/AlertManagerImpl.java | 0.00% | 65 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##              4.20    #8831       +/-   ##
============================================
+ Coverage     4.02%   16.02%   +12.00%     
- Complexity       0    13149    +13149     
============================================
  Files          394     5658     +5264     
  Lines        32357   496291   +463934     
  Branches      5728    60110    +54382     
============================================
+ Hits          1301    79538    +78237     
- Misses       30907   407904   +376997     
- Partials       149     8849     +8700     

| Flag | Coverage Δ |
|---|---|
| uitests | 4.01% <ø> (-0.01%) ⬇️ |
| unittests | 16.86% <0.00%> (?) |


@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9092

@DaanHoogland
Contributor

@blueorangutan test alma9 kvm-alma9

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (alma9 mgmt + kvm-alma9) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-9652)
Environment: kvm-alma9 (x2), Advanced Networking with Mgmt server a9
Total time taken: 51042 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8831-t9652-kvm-alma9.zip
Smoke tests completed. 129 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

Contributor

@DaanHoogland DaanHoogland left a comment

clgtm, good cleanup

Contributor

@BryanMLima BryanMLima left a comment

CLGTM

@hsato03
Collaborator Author

hsato03 commented Jun 3, 2024

Any other concerns about this one?

@DaanHoogland
Contributor

> Any other concerns about this one?

any third party testing done (i.e. not the developer)?

@hsato03
Collaborator Author

hsato03 commented Aug 26, 2024

@blueorangutan package

@blueorangutan

@hsato03 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10815

@github-actions

github-actions bot commented Jan 8, 2025

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@hsato03
Collaborator Author

hsato03 commented Jan 27, 2025

@blueorangutan package

@blueorangutan

@hsato03 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12222

}
sendAlert(alertType, dc, pod, cluster, msgSubject, msgContent);
logger.debug("Sending alert with subject [{}] and content [{}].", msgSubject, msgContent);
sendAlert(alertType, dc.getId(), podId, clusterId, msgSubject, msgContent);
Member

@winterhazel winterhazel Feb 2, 2025

I tried to test the PR, but I'm getting an NPE on line 731 (newAlert.setDataCenterId(dataCenter.getId());) because dataCenter is null when the alertType is ALERT_TYPE_MANAGEMENT_NODE.

2025-02-02 00:31:03,441 WARN  [c.c.a.AlertManagerImpl] (Cluster-Notification-1:[ctx-97296f5c]) (logid:c38f0c5f) alertType=[14] dataCenter=[null] pod=[null] cluster=[null] message=[Management server node 192.168.122.10 is up].
2025-02-02 00:31:03,447 ERROR [c.c.a.AlertManagerImpl] (Cluster-Notification-1:[ctx-97296f5c]) (logid:c38f0c5f) Problem sending email alert java.lang.NullPointerException: Cannot invoke "com.cloud.dc.DataCenter.getId()" because "dataCenter" is null
	at com.cloud.alert.AlertManagerImpl.sendAlert(AlertManagerImpl.java:771)
	at com.cloud.alert.AlertManagerImpl.sendAlert(AlertManagerImpl.java:746)
	at com.cloud.alert.AlertManagerImpl.sendAlert(AlertManagerImpl.java:255)
...

I checked, and it was introduced not by this PR but by #9873. Still, could you fix it alongside the refactoring and target 4.20? @hsato03
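
To illustrate the failure mode in the stack trace above: calling getId() on a null dataCenter throws, so any alert type that carries no zone needs a guard before the id is read. The sketch below is one possible shape of such a guard under stated assumptions, not the fix from any particular PR. The DataCenter interface and Alert class here are hypothetical stand-ins that model only the calls named in the comment, and whether the real alert entity accepts a missing zone id is not shown in this thread.

```java
public class NullSafeAlertSketch {

    /** Hypothetical stand-in for com.cloud.dc.DataCenter; only getId() is modeled. */
    interface DataCenter {
        long getId();
    }

    /** Hypothetical stand-in for the alert entity; a null id means "not tied to a zone". */
    static class Alert {
        private Long dataCenterId;

        void setDataCenterId(Long dataCenterId) {
            this.dataCenterId = dataCenterId;
        }

        Long getDataCenterId() {
            return dataCenterId;
        }
    }

    /** Builds an alert without dereferencing a null dataCenter. */
    static Alert newAlert(DataCenter dataCenter) {
        Alert alert = new Alert();
        // Management-node alerts carry no zone, so dataCenter may legitimately be null.
        if (dataCenter != null) {
            alert.setDataCenterId(dataCenter.getId());
        }
        return alert;
    }

    public static void main(String[] args) {
        System.out.println(newAlert(null).getDataCenterId());
        System.out.println(newAlert(() -> 1L).getDataCenterId());
    }
}
```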

Collaborator Author

@winterhazel Thanks for testing (or at least trying). I think this NPE is already being addressed in PR #10252. Furthermore, I changed the target branch to 4.20.

Member

Aight, thanks @hsato03

Will try testing soon with the changes in #10252

@hsato03 hsato03 force-pushed the fix-npe-on-alert-mail-sending branch from 57925ec to 60d35a9 Compare February 3, 2025 13:30
@hsato03 hsato03 changed the base branch from main to 4.20 February 3, 2025 13:30
Contributor

@DaanHoogland DaanHoogland left a comment

clgtm (not a fan of this bigswitch style, but that is not up to the author of this PR.)

Member

@winterhazel winterhazel left a comment

Looks good, I tested by doing the following steps:

  • Configured alert email reception
  • Set the notification thresholds to a low value
  • Verified that I received alert emails after the threshold was crossed, and that their contents were ok.

@DaanHoogland
Contributor

@blueorangutan package

@blueorangutan

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12653

@DaanHoogland
Contributor

@blueorangutan test

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-12571)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 53654 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8831-t12571-kvm-ol8.zip
Smoke tests completed. 140 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_06_purge_expunged_vm_background_task | Failure | 402.26 | test_purge_expunged_vms.py |

@DaanHoogland DaanHoogland merged commit a841ed9 into apache:4.20 Mar 5, 2025
24 of 26 checks passed
dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Jun 19, 2025
5 participants