@himani2411 himani2411 commented Mar 21, 2025

Description of changes

  • The IP route table on RHEL and Rocky has clashing routes: both NICs end up with routes at the same priority (metric), which is why some tests pass and others fail. This happens on the head node (HN), where NI 0 is the default interface and holds the elastic public IP.

This change sets a higher priority for the ParallelCluster-added routes so that NI 0 is selected.

  • Rename folders from redhat-8.network_interfaces to redhat-8/network_interfaces so that the files can be found.
ERROR - Failed when getting instance info from EC2 with exception Connect timeout on endpoint URL: "https://ec2.us-east-1.amazonaws.com/"

IP route table of ParallelCluster 3.12.0, where NI 0 has the highest priority and NI 1 has the second highest priority

 ip route show table main
default via <IP-IG> dev eth0 proto dhcp src 192.168.23.21 metric 100 # NI 0
default via <IP-IG> dev eth1 proto dhcp src 192.168.17.156 metric 101 # NI 1
default via <IP-IG> dev eth0 metric 1000 # PC added 
default via <IP-IG> dev eth1 metric 1001  # PC added 
 <IP-IG>/20 dev eth0 proto kernel scope link src 192.168.23.21 metric 100
 <IP-IG>/20 dev eth1 proto kernel scope link src 192.168.17.156 metric 101

IP route table of ParallelCluster 3.13.0

default via  <IP-IG> dev eth1 proto dhcp src 192.168.17.156 metric 100 # clashes with the NI 0 line below
default via  <IP-IG> dev eth0 proto dhcp src 192.168.23.21 metric 100 
default via <IP-IG> dev eth1 proto dhcp src 192.168.17.156 metric 101
default via <IP-IG> dev eth0 metric 1000 # PC added 
default via <IP-IG> dev eth1 metric 1101  # PC added 
<IP-IG>/20 dev eth0 proto kernel scope link src 192.168.23.21 metric 100
<IP-IG>/20 dev eth1 proto kernel scope link src 192.168.17.156 metric 100 # clashes with the NI 0 line above
<IP-IG>/20 dev eth1 proto kernel scope link src 192.168.17.156 metric 101
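
The clash in the two tables above can be checked mechanically. The following is a minimal sketch, not part of this change: it parses the default routes from `ip route show table main` output and reports any metric shared by more than one device. The function names are hypothetical, and the gateway address 192.0.2.1 is a placeholder for the elided <IP-IG> value.

```python
import re
from collections import defaultdict

# Default routes copied from the tables above. 192.0.2.1 is a placeholder
# (documentation address) standing in for the elided <IP-IG> gateway.
ROUTES_3_12 = """\
default via 192.0.2.1 dev eth0 proto dhcp src 192.168.23.21 metric 100
default via 192.0.2.1 dev eth1 proto dhcp src 192.168.17.156 metric 101
default via 192.0.2.1 dev eth0 metric 1000
default via 192.0.2.1 dev eth1 metric 1001
"""

ROUTES_3_13 = """\
default via 192.0.2.1 dev eth1 proto dhcp src 192.168.17.156 metric 100
default via 192.0.2.1 dev eth0 proto dhcp src 192.168.23.21 metric 100
default via 192.0.2.1 dev eth1 proto dhcp src 192.168.17.156 metric 101
default via 192.0.2.1 dev eth0 metric 1000
default via 192.0.2.1 dev eth1 metric 1101
"""

# Captures the device name and the metric of a default route line.
ROUTE_RE = re.compile(r"^default\b.*?\bdev\s+(\S+).*?\bmetric\s+(\d+)")


def parse_default_routes(text):
    """Return a list of (device, metric) pairs for each default route."""
    routes = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing annotations
        match = ROUTE_RE.match(line)
        if match:
            routes.append((match.group(1), int(match.group(2))))
    return routes


def find_metric_clashes(routes):
    """Return {metric: [devices]} for metrics used by more than one device."""
    devices_by_metric = defaultdict(set)
    for dev, metric in routes:
        devices_by_metric[metric].add(dev)
    return {
        metric: sorted(devs)
        for metric, devs in devices_by_metric.items()
        if len(devs) > 1
    }


if __name__ == "__main__":
    print("3.12.0 clashes:", find_metric_clashes(parse_default_routes(ROUTES_3_12)))
    print("3.13.0 clashes:", find_metric_clashes(parse_default_routes(ROUTES_3_13)))
```

With the sample data, the 3.12.0 table yields no clashes, while the 3.13.0 table reports metric 100 as shared by eth0 and eth1, matching the annotated lines.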

Tests

ONGOING

test-suites:
  multiple_nics:
    test_multiple_nics.py::test_multiple_nics:
      dimensions:
        - regions: ["use1-az1"]
          instances: ["c6in.32xlarge"]
          oss: ["rhel8", "rhel9", "rocky8", "rocky9"]
          schedulers: ["slurm"]

References

  • Link to impacted open issues.
  • Link to related PRs in other packages (i.e. cookbook, node).
  • Link to documentation useful to understand the changes.

Checklist

  • Make sure you are pointing to the right branch.
  • If you're creating a patch for a branch other than develop add the branch name as prefix in the PR title (e.g. [release-3.6]).
  • Check all commits' messages are clear, describing what and why vs how.
  • Make sure to have added unit tests or integration tests to cover the new/modified code.
  • Check if documentation is impacted by this change.

Please review the guidelines for contributing and Pull Request Instructions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

gmarciani previously approved these changes Mar 21, 2025