
Add tag pool for links + fix potential consistency and deadlock issues #281

Open
Ktmi wants to merge 25 commits into master from rebase/tag_capable

Conversation

@Ktmi
Copy link

@Ktmi Ktmi commented Aug 14, 2025

Part of closing kytos-ng/kytos#554

Summary

Originally this was meant to be just a forward port of the tag pool separation changes from the feat/tag_capable branch. It adds support for a separate tag pool for links, implements automatic transfer of tags from interfaces to links upon link creation, and returns tags from links to interfaces upon link deletion.

However, I discovered potential consistency issues in topology related to how locks were (or were not) being used in some places, in addition to some scenarios where deleted objects could be revived. I handled quite a few of these scenarios by adding locks to kytos core, but I think a few scenarios remain unhandled.

Here is the hierarchy of locks, showing which locks protect which other locks:

  • switches_lock
    • switch.lock
    • switch.interfaces_lock
      • interface.lock
  • links_lock
    • link.lock
  • multi_tag_lock
    • interface.tag_lock
    • link.tag_lock

And here is the order in which locks should be acquired:

  1. switches_lock
  2. links_lock
  3. multi_tag_lock
  4. switch.lock
  5. switch.interfaces_lock
  6. interface.lock
  7. link.lock
  8. interface.tag_lock
  9. link.tag_lock
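To make the ordering above concrete, here is a minimal sketch of a helper that acquires a set of locks in rank order and releases them in reverse. The rank table mirrors the list above, but the helper and its names are illustrative, not the actual kytos core API:

```python
import threading
from contextlib import ExitStack, contextmanager

# Hypothetical rank table mirroring the acquisition order above;
# the real kytos core attributes may differ.
LOCK_RANK = {
    "switches_lock": 1, "links_lock": 2, "multi_tag_lock": 3,
    "switch.lock": 4, "switch.interfaces_lock": 5, "interface.lock": 6,
    "link.lock": 7, "interface.tag_lock": 8, "link.tag_lock": 9,
}

@contextmanager
def acquire_in_order(named_locks):
    """Acquire (name, lock) pairs sorted by rank; release in reverse."""
    ordered = sorted(named_locks, key=lambda pair: LOCK_RANK[pair[0]])
    with ExitStack() as stack:
        for _, lock in ordered:
            stack.enter_context(lock)
        yield

# Usage: a link deletion touching the links_lock and both tag pools,
# passed in arbitrary order but acquired in rank order.
locks = [("link.tag_lock", threading.Lock()),
         ("links_lock", threading.Lock()),
         ("interface.tag_lock", threading.Lock())]
with acquire_in_order(locks):
    pass  # return tags from the link to its interfaces here
```

As long as every code path goes through a helper like this, no two threads can acquire the same pair of locks in opposite orders, which is the deadlock condition the ordering is meant to prevent.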

To run this, the following branches need to be installed:

Local Tests

Link creation and deletion both appear to be working as expected. Tags are successfully taken and returned to the constituent endpoints.

End-to-End Tests

Redid tests with the latest E2E version. Tests are now passing with the modified tests in kytos-ng/kytos-end-to-end-tests#401.

kytos-1  | Starting enhanced syslogd: rsyslogd.
kytos-1  | /etc/openvswitch/conf.db does not exist ... (warning).
kytos-1  | Creating empty database /etc/openvswitch/conf.db.
kytos-1  | Starting ovsdb-server.
kytos-1  | rsyslogd: error during config processing: omfile: chown for file '/var/log/syslog' failed: Operation not permitted [v8.2302.0 try https://www.rsyslog.com/e/2207 ]
kytos-1  | Configuring Open vSwitch system IDs.
kytos-1  | Starting ovs-vswitchd.
kytos-1  | Enabling remote OVSDB managers.
kytos-1  | + '[' -z '' ']'
kytos-1  | + '[' -z '' ']'
kytos-1  | + echo 'There is no NAPPS_PATH specified. Default will be used.'
kytos-1  | + NAPPS_PATH=
kytos-1  | + sed -i 's/STATS_INTERVAL = 60/STATS_INTERVAL = 7/g' /var/lib/kytos/napps/kytos/of_core/settings.py
kytos-1  | There is no NAPPS_PATH specified. Default will be used.
kytos-1  | + sed -i 's/CONSISTENCY_MIN_VERDICT_INTERVAL =.*/CONSISTENCY_MIN_VERDICT_INTERVAL = 60/g' /var/lib/kytos/napps/kytos/flow_manager/settings.py
kytos-1  | + sed -i 's/LINK_UP_TIMER = 10/LINK_UP_TIMER = 1/g' /var/lib/kytos/napps/kytos/topology/settings.py
kytos-1  | + sed -i 's/DEPLOY_EVCS_INTERVAL = 60/DEPLOY_EVCS_INTERVAL = 5/g' /var/lib/kytos/napps/kytos/mef_eline/settings.py
kytos-1  | + sed -i 's/LLDP_LOOP_ACTIONS = \["log"\]/LLDP_LOOP_ACTIONS = \["disable","log"\]/' /var/lib/kytos/napps/kytos/of_lldp/settings.py
kytos-1  | + sed -i 's/LLDP_IGNORED_LOOPS = {}/LLDP_IGNORED_LOOPS = {"00:00:00:00:00:00:00:01": \[\[4, 5\]\]}/' /var/lib/kytos/napps/kytos/of_lldp/settings.py
kytos-1  | + sed -i 's/CONSISTENCY_COOKIE_IGNORED_RANGE =.*/CONSISTENCY_COOKIE_IGNORED_RANGE = [(0xdd00000000000000, 0xdd00000000000009)]/g' /var/lib/kytos/napps/kytos/flow_manager/settings.py
kytos-1  | + sed -i 's/LIVENESS_DEAD_MULTIPLIER =.*/LIVENESS_DEAD_MULTIPLIER = 3/g' /var/lib/kytos/napps/kytos/of_lldp/settings.py
kytos-1  | + kytosd --help
kytos-1  | + sed -i s/WARNING/DEBUG/g /etc/kytos/logging.ini
kytos-1  | + sed -i s/INFO/DEBUG/g /etc/kytos/logging.ini
kytos-1  | + sed -i 's/keys: root,kytos,api_server,socket/keys: root,kytos,api_server,socket,aiokafka/' /etc/kytos/logging.ini
kytos-1  | + sed -i 's/handlers: syslog,console/handlers: syslog,console,file/g' /etc/kytos/logging.ini
kytos-1  | + echo -e '\n\n[logger_aiokafka]\nlevel: INFO\nhandlers:\nqualname: aiokafka'
kytos-1  | + echo
kytos-1  | + test -z ''
kytos-1  | + TESTS=tests/
kytos-1  | + test -z ''
kytos-1  | + RERUNS=2
kytos-1  | + python3 scripts/wait_for_mongo.py
kytos-1  | Trying to run hello command on MongoDB...
kytos-1  | Trying to run 'hello' command on MongoDB...
kytos-1  | Trying to run 'hello' command on MongoDB...
kytos-1  | Ran 'hello' command on MongoDB successfully. It's ready!
kytos-1  | + python3 scripts/setup_kafka.py
kytos-1  | Starting setup_kafka.py...
kytos-1  | Attempting to create an admin client at ['broker1:19092', 'broker2:19092', 'broker3:19092']...
kytos-1  | Admin client was successful! Attempting to validate cluster...
kytos-1  | Cluster info: {'throttle_time_ms': 0, 'brokers': [{'node_id': 1, 'host': 'broker1', 'port': 19092, 'rack': None}, {'node_id': 2, 'host': 'broker2', 'port': 19092, 'rack': None}, {'node_id': 3, 'host': 'broker3', 'port': 19092, 'rack': None}], 'cluster_id': '5L6g3nShT-eMCtK--X86sw', 'controller_id': 2}
kytos-1  | Cluster was successfully validated! Attempting to create topic 'event_logs'...
kytos-1  | Topic 'event_logs' was created! Attempting to close the admin client...
kytos-1  | Kafka admin client closed.
kytos-1  | + python3 -m pytest tests/ --reruns 2 -r fEr
kytos-1  | ============================= test session starts ==============================
kytos-1  | platform linux -- Python 3.11.2, pytest-8.4.2, pluggy-1.6.0
kytos-1  | rootdir: /
kytos-1  | configfile: pytest.ini
kytos-1  | plugins: rerunfailures-13.0, timeout-2.2.0, asyncio-1.1.0, anyio-4.3.0
kytos-1  | asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
kytos-1  | collected 301 items
kytos-1  | 
kytos-1  | tests/test_e2e_01_kytos_startup.py ..                                    [  0%]
kytos-1  | tests/test_e2e_05_topology.py ...................                        [  6%]
kytos-1  | tests/test_e2e_06_topology.py ....                                       [  8%]
kytos-1  | tests/test_e2e_10_mef_eline.py ..........ss.....x....................... [ 21%]
kytos-1  | .                                                                        [ 22%]
kytos-1  | tests/test_e2e_11_mef_eline.py ........                                  [ 24%]
kytos-1  | tests/test_e2e_12_mef_eline.py .....Xx.                                  [ 27%]
kytos-1  | tests/test_e2e_13_mef_eline.py ....Xs.s.....Xs.s.XXxX.xxxx..X........... [ 41%]
kytos-1  | .                                                                        [ 41%]
kytos-1  | tests/test_e2e_14_mef_eline.py ......                                    [ 43%]
kytos-1  | tests/test_e2e_15_mef_eline.py ......                                    [ 45%]
kytos-1  | tests/test_e2e_16_mef_eline.py ..                                        [ 46%]
kytos-1  | tests/test_e2e_17_mef_eline.py .....                                     [ 47%]
kytos-1  | tests/test_e2e_18_mef_eline.py .....                                     [ 49%]
kytos-1  | tests/test_e2e_20_flow_manager.py ............................           [ 58%]
kytos-1  | tests/test_e2e_21_flow_manager.py ...                                    [ 59%]
kytos-1  | tests/test_e2e_22_flow_manager.py ...............                        [ 64%]
kytos-1  | tests/test_e2e_23_flow_manager.py ..............                         [ 69%]
kytos-1  | tests/test_e2e_30_of_lldp.py .R...                                       [ 70%]
kytos-1  | tests/test_e2e_31_of_lldp.py ...RRF                                      [ 72%]
kytos-1  | tests/test_e2e_32_of_lldp.py ...                                         [ 73%]
kytos-1  | tests/test_e2e_40_sdntrace.py ................                           [ 78%]
kytos-1  | tests/test_e2e_41_kytos_auth.py ........                                 [ 81%]
kytos-1  | tests/test_e2e_42_sdntrace.py ..                                         [ 81%]
kytos-1  | tests/test_e2e_50_maintenance.py ...............................         [ 92%]
kytos-1  | tests/test_e2e_60_of_multi_table.py .....                                [ 93%]
kytos-1  | tests/test_e2e_70_kytos_stats.py .........                               [ 96%]
kytos-1  | tests/test_e2e_80_pathfinder.py ss......                                 [ 99%]
kytos-1  | tests/test_e2e_90_kafka_events.py .                                      [ 99%]
kytos-1  | tests/test_e2e_95_telemtry_int.py s                                      [100%]
kytos-1  | 
kytos-1  | =================================== FAILURES ===================================
kytos-1  | ________________ TestE2EOfLLDP.test_010_liveness_intf_deletion _________________
kytos-1  | 
kytos-1  | self = <tests.test_e2e_31_of_lldp.TestE2EOfLLDP object at 0x7882e3f1ba90>
kytos-1  | 
kytos-1  |     def test_010_liveness_intf_deletion(self) -> None:
kytos-1  |         """Test liveness not loaded after intf deletion."""
kytos-1  |         polling_interval = 1
kytos-1  |         self.set_polling_time(polling_interval)
kytos-1  |         intf_id = "00:00:00:00:00:00:00:01:1"
kytos-1  |         interface_ids = [intf_id]
kytos-1  |         self.enable_link_liveness(interface_ids)
kytos-1  |     
kytos-1  |         time.sleep(polling_interval * 5)
kytos-1  |     
kytos-1  |         # Assert GET liveness/ entries are in init state
kytos-1  |         api_url = f"{KYTOS_API}/of_lldp/v1/liveness/"
kytos-1  |         response = requests.get(api_url)
kytos-1  |         data = response.json()
kytos-1  |         for entry in data["interfaces"]:
kytos-1  |             assert entry["id"] in interface_ids, entry
kytos-1  |             assert entry["status"] == "init", entry
kytos-1  |     
kytos-1  |         # Disable the intf for deletion
kytos-1  |         api_url = f'{KYTOS_API}/topology/v3/interfaces/{intf_id}/disable/'
kytos-1  |         response = requests.post(api_url)
kytos-1  |         assert response.status_code == 200, response.text
kytos-1  |     
kytos-1  |         # Deactivate the interface for deletion
kytos-1  |         self.net.net.configLinkStatus('s1', 'h11', 'down')
kytos-1  |     
kytos-1  |         api_url = f'{KYTOS_API}/topology/v3/interfaces/{intf_id}'
kytos-1  |         response = requests.delete(api_url)
kytos-1  |         assert response.status_code == 200, response.text
kytos-1  |     
kytos-1  |         # Restart the controller maintaining config
kytos-1  | >       self.restart(wait_for=15)
kytos-1  | 
kytos-1  | tests/test_e2e_31_of_lldp.py:298: 
kytos-1  | _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
kytos-1  | tests/test_e2e_31_of_lldp.py:38: in restart
kytos-1  |     self.net.start_controller(clean_config=clean_config, enable_all=enable_all)
kytos-1  | tests/helpers.py:402: in start_controller
kytos-1  |     self.wait_controller_start()
kytos-1  | _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
kytos-1  | 
kytos-1  | self = <tests.helpers.NetworkTest object at 0x7882c02da290>
kytos-1  | 
kytos-1  |     def wait_controller_start(self):
kytos-1  |         """Wait until controller starts according to core/status API."""
kytos-1  |         wait_count = 0
kytos-1  |         last_error = ""
kytos-1  |         while wait_count < 60:
kytos-1  |             try:
kytos-1  |                 response = requests.get('http://127.0.0.1:8181/api/kytos/core/status/', timeout=3)
kytos-1  |                 assert response.json()['response'] == 'running', response.text
kytos-1  |                 break
kytos-1  |             except Exception as exc:
kytos-1  |                 last_error = str(exc)
kytos-1  |                 time.sleep(0.5)
kytos-1  |                 wait_count += 0.5
kytos-1  |         else:
kytos-1  |             msg = f"Timeout while starting Kytos controller. Last error: {last_error}"
kytos-1  | >           raise Exception(msg)
kytos-1  | E           Exception: Timeout while starting Kytos controller. Last error: HTTPConnectionPool(host='127.0.0.1', port=8181): Read timed out. (read timeout=3)
kytos-1  | 
kytos-1  | tests/helpers.py:422: Exception
kytos-1  | ---------------------------- Captured stdout setup -----------------------------
kytos-1  | FAIL to stop kytos after 5 seconds -- Kytos pid still exists.. Force stop!
kytos-1  | ---------------------------- Captured stdout setup -----------------------------
kytos-1  | FAIL to stop kytos after 5 seconds -- Kytos pid still exists.. Force stop!
kytos-1  | =============================== warnings summary ===============================
kytos-1  | usr/local/lib/python3.11/dist-packages/kytos/core/config.py:254
kytos-1  |   /usr/local/lib/python3.11/dist-packages/kytos/core/config.py:254: UserWarning: Unknown arguments: ['tests/', '--reruns', '2', '-r', 'fEr']
kytos-1  |     warnings.warn(f"Unknown arguments: {unknown}")
kytos-1  | 
kytos-1  | tests/test_e2e_01_kytos_startup.py: 17 warnings
kytos-1  | tests/test_e2e_05_topology.py: 17 warnings
kytos-1  | tests/test_e2e_06_topology.py: 37 warnings
kytos-1  | tests/test_e2e_10_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_11_mef_eline.py: 25 warnings
kytos-1  | tests/test_e2e_12_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_13_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_14_mef_eline.py: 76 warnings
kytos-1  | tests/test_e2e_15_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_16_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_17_mef_eline.py: 37 warnings
kytos-1  | tests/test_e2e_18_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_20_flow_manager.py: 17 warnings
kytos-1  | tests/test_e2e_21_flow_manager.py: 17 warnings
kytos-1  | tests/test_e2e_22_flow_manager.py: 17 warnings
kytos-1  | tests/test_e2e_23_flow_manager.py: 17 warnings
kytos-1  | tests/test_e2e_30_of_lldp.py: 16 warnings
kytos-1  | tests/test_e2e_31_of_lldp.py: 16 warnings
kytos-1  | tests/test_e2e_32_of_lldp.py: 11 warnings
kytos-1  | tests/test_e2e_40_sdntrace.py: 49 warnings
kytos-1  | tests/test_e2e_41_kytos_auth.py: 17 warnings
kytos-1  | tests/test_e2e_42_sdntrace.py: 84 warnings
kytos-1  | tests/test_e2e_50_maintenance.py: 17 warnings
kytos-1  | tests/test_e2e_60_of_multi_table.py: 17 warnings
kytos-1  | tests/test_e2e_70_kytos_stats.py: 17 warnings
kytos-1  | tests/test_e2e_80_pathfinder.py: 37 warnings
kytos-1  | tests/test_e2e_90_kafka_events.py: 17 warnings
kytos-1  |   /usr/lib/python3/dist-packages/mininet/node.py:1121: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
kytos-1  |     return ( StrictVersion( cls.OVSVersion ) <
kytos-1  | 
kytos-1  | tests/test_e2e_01_kytos_startup.py: 17 warnings
kytos-1  | tests/test_e2e_05_topology.py: 17 warnings
kytos-1  | tests/test_e2e_06_topology.py: 37 warnings
kytos-1  | tests/test_e2e_10_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_11_mef_eline.py: 25 warnings
kytos-1  | tests/test_e2e_12_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_13_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_14_mef_eline.py: 76 warnings
kytos-1  | tests/test_e2e_15_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_16_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_17_mef_eline.py: 37 warnings
kytos-1  | tests/test_e2e_18_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_20_flow_manager.py: 17 warnings
kytos-1  | tests/test_e2e_21_flow_manager.py: 17 warnings
kytos-1  | tests/test_e2e_22_flow_manager.py: 17 warnings
kytos-1  | tests/test_e2e_23_flow_manager.py: 17 warnings
kytos-1  | tests/test_e2e_30_of_lldp.py: 16 warnings
kytos-1  | tests/test_e2e_31_of_lldp.py: 16 warnings
kytos-1  | tests/test_e2e_32_of_lldp.py: 11 warnings
kytos-1  | tests/test_e2e_40_sdntrace.py: 49 warnings
kytos-1  | tests/test_e2e_41_kytos_auth.py: 17 warnings
kytos-1  | tests/test_e2e_42_sdntrace.py: 84 warnings
kytos-1  | tests/test_e2e_50_maintenance.py: 17 warnings
kytos-1  | tests/test_e2e_60_of_multi_table.py: 17 warnings
kytos-1  | tests/test_e2e_70_kytos_stats.py: 17 warnings
kytos-1  | tests/test_e2e_80_pathfinder.py: 37 warnings
kytos-1  | tests/test_e2e_90_kafka_events.py: 17 warnings
kytos-1  |   /usr/lib/python3/dist-packages/mininet/node.py:1122: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
kytos-1  |     StrictVersion( '1.10' ) )
kytos-1  | 
kytos-1  | -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
kytos-1  | ------------------------------- start/stop times -------------------------------
kytos-1  | rerun: 0
kytos-1  | tests/test_e2e_30_of_lldp.py::TestE2EOfLLDP::test_010_disable_of_lldp: 2026-02-13,02:13:41.285613 - 2026-02-13,02:14:06.485583
kytos-1  | self = <tests.test_e2e_30_of_lldp.TestE2EOfLLDP object at 0x7882e3da1350>
kytos-1  | 
kytos-1  |     def test_010_disable_of_lldp(self):
kytos-1  |         """ Test if the disabling OF LLDP in an interface worked properly. """
kytos-1  |         self.net.start_controller(clean_config=True, enable_all=False)
kytos-1  |         self.net.wait_switches_connect()
kytos-1  |         time.sleep(5)
kytos-1  |         self.enable_all_interfaces()
kytos-1  |     
kytos-1  |         # disabling all the UNI interfaces
kytos-1  |         payload = {
kytos-1  |             "interfaces": [
kytos-1  |                 "00:00:00:00:00:00:00:01:1", "00:00:00:00:00:00:00:01:2", "00:00:00:00:00:00:00:01:4294967294",
kytos-1  |                 "00:00:00:00:00:00:00:02:1", "00:00:00:00:00:00:00:02:4294967294",
kytos-1  |                 "00:00:00:00:00:00:00:03:1", "00:00:00:00:00:00:00:03:4294967294"
kytos-1  |             ]
kytos-1  |         }
kytos-1  |         expected_interfaces = [
kytos-1  |                 "00:00:00:00:00:00:00:01:3", "00:00:00:00:00:00:00:01:4",
kytos-1  |                 "00:00:00:00:00:00:00:02:2", "00:00:00:00:00:00:00:02:3",
kytos-1  |                 "00:00:00:00:00:00:00:03:2", "00:00:00:00:00:00:00:03:3"
kytos-1  |         ]
kytos-1  |     
kytos-1  |         api_url = KYTOS_API + '/of_lldp/v1/interfaces/disable/'
kytos-1  |         response = requests.post(api_url, json=payload)
kytos-1  |         assert response.status_code == 200, response.text
kytos-1  |     
kytos-1  |         api_url = KYTOS_API + '/of_lldp/v1/interfaces/'
kytos-1  |         response = requests.get(api_url)
kytos-1  |         data = response.json()
kytos-1  |         assert set(data["interfaces"]) == set(expected_interfaces)
kytos-1  |     
kytos-1  |         h11, h12, h2, h3 = self.net.net.get('h11', 'h12', 'h2', 'h3')
kytos-1  |         rx_stats_h11 = self.get_iface_stats_rx_pkt(h11)
kytos-1  |         rx_stats_h12 = self.get_iface_stats_rx_pkt(h12)
kytos-1  |         rx_stats_h2 = self.get_iface_stats_rx_pkt(h2)
kytos-1  |         rx_stats_h3 = self.get_iface_stats_rx_pkt(h3)
kytos-1  |         time.sleep(10)
kytos-1  |         rx_stats_h11_2 = self.get_iface_stats_rx_pkt(h11)
kytos-1  |         rx_stats_h12_2 = self.get_iface_stats_rx_pkt(h12)
kytos-1  |         rx_stats_h2_2 = self.get_iface_stats_rx_pkt(h2)
kytos-1  |         rx_stats_h3_2 = self.get_iface_stats_rx_pkt(h3)
kytos-1  |     
kytos-1  | >       assert rx_stats_h11_2 == rx_stats_h11 \
kytos-1  |             and rx_stats_h12_2 == rx_stats_h12 \
kytos-1  |             and rx_stats_h2_2 == rx_stats_h2 \
kytos-1  |             and rx_stats_h3_2 == rx_stats_h3
kytos-1  | E       assert (19 == 18)
kytos-1  | 
kytos-1  | tests/test_e2e_30_of_lldp.py:127: AssertionError
kytos-1  | rerun: 0
kytos-1  | tests/test_e2e_31_of_lldp.py::TestE2EOfLLDP::test_010_liveness_intf_deletion: 2026-02-13,02:20:08.025225 - 2026-02-13,02:27:15.923321
kytos-1  | self = <tests.test_e2e_31_of_lldp.TestE2EOfLLDP object at 0x7882e3f1ba90>
kytos-1  | 
kytos-1  |     def test_010_liveness_intf_deletion(self) -> None:
kytos-1  |         """Test liveness not loaded after intf deletion."""
kytos-1  |         polling_interval = 1
kytos-1  |         self.set_polling_time(polling_interval)
kytos-1  |         intf_id = "00:00:00:00:00:00:00:01:1"
kytos-1  |         interface_ids = [intf_id]
kytos-1  |         self.enable_link_liveness(interface_ids)
kytos-1  |     
kytos-1  |         time.sleep(polling_interval * 5)
kytos-1  |     
kytos-1  |         # Assert GET liveness/ entries are in init state
kytos-1  |         api_url = f"{KYTOS_API}/of_lldp/v1/liveness/"
kytos-1  |         response = requests.get(api_url)
kytos-1  |         data = response.json()
kytos-1  |         for entry in data["interfaces"]:
kytos-1  |             assert entry["id"] in interface_ids, entry
kytos-1  |             assert entry["status"] == "init", entry
kytos-1  |     
kytos-1  |         # Disable the intf for deletion
kytos-1  |         api_url = f'{KYTOS_API}/topology/v3/interfaces/{intf_id}/disable/'
kytos-1  |         response = requests.post(api_url)
kytos-1  |         assert response.status_code == 200, response.text
kytos-1  |     
kytos-1  |         # Deactivate the interface for deletion
kytos-1  |         self.net.net.configLinkStatus('s1', 'h11', 'down')
kytos-1  |     
kytos-1  |         api_url = f'{KYTOS_API}/topology/v3/interfaces/{intf_id}'
kytos-1  |         response = requests.delete(api_url)
kytos-1  |         assert response.status_code == 200, response.text
kytos-1  |     
kytos-1  |         # Restart the controller maintaining config
kytos-1  | >       self.restart(wait_for=15)
kytos-1  | 
kytos-1  | tests/test_e2e_31_of_lldp.py:298: 
kytos-1  | _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
kytos-1  | tests/test_e2e_31_of_lldp.py:38: in restart
kytos-1  |     self.net.start_controller(clean_config=clean_config, enable_all=enable_all)
kytos-1  | tests/helpers.py:402: in start_controller
kytos-1  |     self.wait_controller_start()
kytos-1  | _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
kytos-1  | 
kytos-1  | self = <tests.helpers.NetworkTest object at 0x7882c02da290>
kytos-1  | 
kytos-1  |     def wait_controller_start(self):
kytos-1  |         """Wait until controller starts according to core/status API."""
kytos-1  |         wait_count = 0
kytos-1  |         last_error = ""
kytos-1  |         while wait_count < 60:
kytos-1  |             try:
kytos-1  |                 response = requests.get('http://127.0.0.1:8181/api/kytos/core/status/', timeout=3)
kytos-1  |                 assert response.json()['response'] == 'running', response.text
kytos-1  |                 break
kytos-1  |             except Exception as exc:
kytos-1  |                 last_error = str(exc)
kytos-1  |                 time.sleep(0.5)
kytos-1  |                 wait_count += 0.5
kytos-1  |         else:
kytos-1  |             msg = f"Timeout while starting Kytos controller. Last error: {last_error}"
kytos-1  | >           raise Exception(msg)
kytos-1  | E           Exception: Timeout while starting Kytos controller. Last error: HTTPConnectionPool(host='127.0.0.1', port=8181): Read timed out. (read timeout=3)
kytos-1  | 
kytos-1  | tests/helpers.py:422: Exception
kytos-1  | rerun: 1
kytos-1  | tests/test_e2e_31_of_lldp.py::TestE2EOfLLDP::test_010_liveness_intf_deletion: 2026-02-13,02:27:37.065407 - 2026-02-13,02:34:44.996798
kytos-1  | self = <tests.test_e2e_31_of_lldp.TestE2EOfLLDP object at 0x7882e3f1ba90>
kytos-1  | 
kytos-1  |     def test_010_liveness_intf_deletion(self) -> None:
kytos-1  |         """Test liveness not loaded after intf deletion."""
kytos-1  |         polling_interval = 1
kytos-1  |         self.set_polling_time(polling_interval)
kytos-1  |         intf_id = "00:00:00:00:00:00:00:01:1"
kytos-1  |         interface_ids = [intf_id]
kytos-1  |         self.enable_link_liveness(interface_ids)
kytos-1  |     
kytos-1  |         time.sleep(polling_interval * 5)
kytos-1  |     
kytos-1  |         # Assert GET liveness/ entries are in init state
kytos-1  |         api_url = f"{KYTOS_API}/of_lldp/v1/liveness/"
kytos-1  |         response = requests.get(api_url)
kytos-1  |         data = response.json()
kytos-1  |         for entry in data["interfaces"]:
kytos-1  |             assert entry["id"] in interface_ids, entry
kytos-1  |             assert entry["status"] == "init", entry
kytos-1  |     
kytos-1  |         # Disable the intf for deletion
kytos-1  |         api_url = f'{KYTOS_API}/topology/v3/interfaces/{intf_id}/disable/'
kytos-1  |         response = requests.post(api_url)
kytos-1  |         assert response.status_code == 200, response.text
kytos-1  |     
kytos-1  |         # Deactivate the interface for deletion
kytos-1  |         self.net.net.configLinkStatus('s1', 'h11', 'down')
kytos-1  |     
kytos-1  |         api_url = f'{KYTOS_API}/topology/v3/interfaces/{intf_id}'
kytos-1  |         response = requests.delete(api_url)
kytos-1  |         assert response.status_code == 200, response.text
kytos-1  |     
kytos-1  |         # Restart the controller maintaining config
kytos-1  | >       self.restart(wait_for=15)
kytos-1  | 
kytos-1  | tests/test_e2e_31_of_lldp.py:298: 
kytos-1  | _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
kytos-1  | tests/test_e2e_31_of_lldp.py:38: in restart
kytos-1  |     self.net.start_controller(clean_config=clean_config, enable_all=enable_all)
kytos-1  | tests/helpers.py:402: in start_controller
kytos-1  |     self.wait_controller_start()
kytos-1  | _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
kytos-1  | 
kytos-1  | self = <tests.helpers.NetworkTest object at 0x7882c02da290>
kytos-1  | 
kytos-1  |     def wait_controller_start(self):
kytos-1  |         """Wait until controller starts according to core/status API."""
kytos-1  |         wait_count = 0
kytos-1  |         last_error = ""
kytos-1  |         while wait_count < 60:
kytos-1  |             try:
kytos-1  |                 response = requests.get('http://127.0.0.1:8181/api/kytos/core/status/', timeout=3)
kytos-1  |                 assert response.json()['response'] == 'running', response.text
kytos-1  |                 break
kytos-1  |             except Exception as exc:
kytos-1  |                 last_error = str(exc)
kytos-1  |                 time.sleep(0.5)
kytos-1  |                 wait_count += 0.5
kytos-1  |         else:
kytos-1  |             msg = f"Timeout while starting Kytos controller. Last error: {last_error}"
kytos-1  | >           raise Exception(msg)
kytos-1  | E           Exception: Timeout while starting Kytos controller. Last error: HTTPConnectionPool(host='127.0.0.1', port=8181): Read timed out. (read timeout=3)
kytos-1  | 
kytos-1  | tests/helpers.py:422: Exception
kytos-1  | tests/test_e2e_31_of_lldp.py::TestE2EOfLLDP::test_010_liveness_intf_deletion: 2026-02-13,02:35:06.127242 - 2026-02-13,02:42:14.019770
kytos-1  | =========================== rerun test summary info ============================
kytos-1  | RERUN tests/test_e2e_30_of_lldp.py::TestE2EOfLLDP::test_010_disable_of_lldp
kytos-1  | RERUN tests/test_e2e_31_of_lldp.py::TestE2EOfLLDP::test_010_liveness_intf_deletion
kytos-1  | RERUN tests/test_e2e_31_of_lldp.py::TestE2EOfLLDP::test_010_liveness_intf_deletion
kytos-1  | =========================== short test summary info ============================
kytos-1  | FAILED tests/test_e2e_31_of_lldp.py::TestE2EOfLLDP::test_010_liveness_intf_deletion
kytos-1  | = 1 failed, 277 passed, 9 skipped, 7 xfailed, 7 xpassed, 1355 warnings, 3 rerun in 13730.45s (3:48:50) =

kytos-1 exited with code 1

@Ktmi Ktmi requested a review from a team as a code owner August 14, 2025 19:06
@Ktmi Ktmi marked this pull request as draft August 22, 2025 17:09
@viniarck
Copy link
Member

viniarck commented Aug 22, 2025

@Ktmi, here's the feedback you asked for about lock usage and the patterns we'd like to maintain:

Certainly we need to improve thread safety in certain resource deletions, and I'm glad you identified these cases. But let's try to make it as dead simple as possible by following the guidelines below; see if you have other suggestions to simplify further while still solving the thread safety problem:

  • G1) When acquiring individual locks for multiple operations, let's always acquire them in a sorted order, removing the need for a global-ish lock in the first place.
  • G2) Let's try to keep a 1-to-1 relationship between each lock and its DB document model, so we don't create an extra bottleneck in the expected underlying IO throughput.
  • G3) A global lock could also be used, when it simplifies things, in cases that don't touch the DB and are expected to be a fast path, where locking would be almost negligible in terms of IO; but let's prioritize G1 and G2 when applicable.

Currently you have:

  1. switches_lock
  2. links_lock
  3. multi_tag_lock
  4. switch.lock
  5. switch.interfaces_lock
  6. interface.lock
  7. link.lock
  8. interface.tag_lock
  9. link.tag_lock

By following guidelines G1 and G2, I think in theory we could simplify to (not too far away from what we currently have):

  1. switch.lock (related to runtime & switches collection)
  2. link.lock (related to runtime & links collection)
  3. interface.tag_lock (related to runtime & interface_details)
  4. link.tag_lock (related to runtime & interface_details)

That'd be much simpler to reason about and maintain, while in theory still not significantly decreasing IO throughput. Places that currently use an interface lock would use the respective switch.lock instead.

Also, a resource lower in the list wouldn't need to acquire a higher lock. For instance, when making an interface tag available, neither a switch lock nor a link lock would be needed, only the respective tag_lock. A switch deletion, on the other hand, would need to acquire 1) through 4), so a deletion would only proceed when no creation/deletion is happening at that time, waiting on the locks accordingly until it can delete.
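A minimal sketch of a deletion under this simplified scheme, assuming illustrative parameter names (the real kytos objects carry more state):

```python
import threading
from contextlib import ExitStack

def delete_switch(switch_lock, link_locks, intf_tag_locks, link_tag_locks):
    """Sketch: a deletion takes 1) through 4) in order, so it only
    proceeds once no concurrent creation/deletion holds any of them.
    All locks are released automatically when the ExitStack closes."""
    with ExitStack() as stack:
        stack.enter_context(switch_lock)        # 1) switch.lock
        for lock in link_locks:                 # 2) link.lock, per affected link
            stack.enter_context(lock)
        for lock in intf_tag_locks:             # 3) interface.tag_lock
            stack.enter_context(lock)
        for lock in link_tag_locks:             # 4) link.tag_lock
            stack.enter_context(lock)
        # ... remove the switch from runtime objects and the DB here ...
        return True
```

The per-link and per-tag-lock loops should themselves iterate in a deterministic (sorted) order per G1 so that two concurrent deletions never interleave their acquisitions.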

Future note: another thing that could partly simplify this would be moving towards asyncio, since concurrency preemption wouldn't happen everywhere, so the surface where locks are needed is typically smaller and explicit. That refactoring would be large, though, and isn't one of our top priorities at the moment, but it's something to keep in mind.

Can you check if this solves the problem?

See if you can come up with a simpler solution or a counter example, and let's keep refining.

@Ktmi
Copy link
Author

Ktmi commented Aug 25, 2025

@viniarck

I would have liked to do G1, but the problem is that iterating a dict doesn't guarantee that the elements will always be iterated in the same order. Yes, elements in a dict are iterated in the order they were inserted, but that order can change when elements are deleted and re-inserted. Additionally, elements may be iterated from another source, such as the endpoints of a link, in which case it would be nearly impossible to guarantee the same order. Maybe we could introduce some tools specifically for iterating over switches, interfaces, and links to acquire the locks?
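The dict-ordering concern above can be demonstrated in a couple of lines; sorting by a stable key restores a deterministic order regardless of insertion history:

```python
# CPython dicts iterate in insertion order, but deleting and
# re-inserting a key moves it to the end, so two threads that built
# "the same" dict through different histories can iterate differently.
switches = {"s1": 1, "s2": 2, "s3": 3}
del switches["s1"]
switches["s1"] = 1                       # "s1" now iterates last
assert list(switches) == ["s2", "s3", "s1"]

# Sorting by key gives a deterministic lock-acquisition order
# independent of insertion/deletion history.
assert sorted(switches) == ["s1", "s2", "s3"]
```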

As for G2, I like that idea. A lot of the time when I'm acquiring multiple interface locks, I'm also acquiring switch.lock as well. Though it would be possible to specifically target the interface subdocument, in which case you could have multiple simultaneous transactions targeting the same switch document.

For G3, we could cut down on the usage of global locks around DB operations if we cut down on the usage of upsert_switch; it's one of the biggest reasons I have to hold the global switch lock in some areas, because upsert_switch can revive deleted switches. We could introduce more specialized functions for updating switches instead, which should alleviate the issue.
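One shape such a specialized function could take is an update that refuses to insert, so a concurrent delete can't be undone by a late writer. This is a pymongo-style sketch under assumed names (`update_switch_no_revive`, the `_id`/field layout); the real upsert_switch lives in kytos core and may differ:

```python
def update_switch_no_revive(collection, dpid, changes):
    """Sketch: update an existing switch document without reviving a
    deleted one. Unlike an upsert, the dpid filter plus upsert=False
    means a missing (already deleted) document is simply left missing.
    `collection` is any pymongo-style collection object."""
    result = collection.update_one(
        {"_id": dpid},        # match only an existing document
        {"$set": changes},
        upsert=False,         # never insert, so no revival
    )
    return result.modified_count > 0  # False => switch was already gone
```

Callers can branch on the return value instead of holding a global lock just to guard against revival.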

@viniarck
Copy link
Member

viniarck commented Aug 27, 2025

Maybe we could introduce some tools specifically for iteration over switches, interfaces, and links to acquire the locks?

Yes @Ktmi, +1 to that idea. Let's start leveraging 1) a sorted dict (ordered tree implementation), and 2) also encapsulate some access for iteration, so we can take the opportunity to mitigate that issue you and I were discussing in some quick fixes some time ago kytos-ng/kytos#534. If we still end up with a mutable sorted dict, then let's also encapsulate the shallow copy in a method for iteration, so that doesn't get repeated at every call site. I'll leave it up to you to refine and propose the libs. +1 on the idea; I think we should finally pursue and prioritize it, it's a small-ish effort that can really make things more ergonomic for us code-maintenance-wise too.
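
A rough sketch of what that encapsulation could look like (the class and method names here are hypothetical, just to fix the idea):

```python
import threading

class SortedResourceDict:
    """Hypothetical wrapper: deterministic (key-sorted) iteration over a
    snapshot, so call sites don't repeat the shallow-copy idiom."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = {}

    def __setitem__(self, key, value):
        with self._lock:
            self._items[key] = value

    def __delitem__(self, key):
        with self._lock:
            del self._items[key]

    def iter_sorted(self):
        # shallow copy under the lock, then iterate in key order outside it
        with self._lock:
            snapshot = dict(self._items)
        for key in sorted(snapshot):
            yield key, snapshot[key]

d = SortedResourceDict()
d["s3"] = 3
d["s1"] = 1
del d["s1"]
d["s1"] = 1  # re-insert: sorted iteration order is unaffected
assert [key for key, _ in d.iter_sorted()] == ["s1", "s3"]
```

Key-sorted iteration gives every caller the same traversal order regardless of insert/delete history, which is the property G1 needs.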

in which case you could have multiple simultaneous transactions targeting the same switch document

On the Mongo side, it provides atomicity at the document level; let's let it handle that, and in our runtime objects we follow suit.

Though it could be possible to specifically target the interface subdocument, in which case you could have multiple simultaneous transactions targeting the same switch document.

Yes, but DB-IO-wise we don't gain much from it, so let's steer away from it for now, since ultimately MongoDB will be r/w bounded to that document. If one day we need more throughput or that starts to become a bottleneck (which it isn't at the moment), then we reassess our data models too.

For G3, we could cut down on the usage of global locks around db operations if we cut down on the usage of upsert_switch, its one of the biggest reasons I have to hold the global switch lock in some areas, because upsert_switch can reanimate deleted switches. We can introduce some more specialized functions for updating the switches instead, and that should alleviate the issue.

Yes, let's only go for G3 in cases where it's not doing DB IO, or where it's expected to be very minimal, say for instance keeping track of liveness stuff in the runtime objects (and even in that case it could be G1, per sorted object, except with no DB r/w). For operations involving typically moderate+ DB r/w ops, let's keep following G2, and in this case acquire the switch.lock when operating on any of the switch's interfaces, write to the switches collection, and call it a day.

Appreciated your thoughts and feedback. Let's stick with G1, G2 and G3 in that order then.

@Ktmi
Copy link
Author

Ktmi commented Aug 28, 2025

I'm thinking some of the E2E tests failed because I forgot to pull the latest version; I'll be re-running them.

@Ktmi
Copy link
Author

Ktmi commented Sep 3, 2025

So, key changes that need to be made for the E2E tests:

  • Some tests use NNIs as UNIs for EVCs. These tests should be changed to either use UNIs exclusively for these purposes, or reallocate the tags for usage as UNIs.
  • Some tests modify the interface tags, expecting that to affect the NNI tags. These should be modified to target the Link/NNI.

@Ktmi
Copy link
Author

Ktmi commented Oct 22, 2025

Tested against race condition E2E test, and it is passing:

kytos-1  | Starting enhanced syslogd: rsyslogd.
kytos-1  | /etc/openvswitch/conf.db does not exist ... (warning).
kytos-1  | Creating empty database /etc/openvswitch/conf.db.
kytos-1  | Starting ovsdb-server.
kytos-1  | rsyslogd: error during config processing: omfile: chown for file '/var/log/syslog' failed: Operation not permitted [v8.2302.0 try https://www.rsyslog.com/e/2207 ]
kytos-1  | Configuring Open vSwitch system IDs.
kytos-1  | Starting ovs-vswitchd.
kytos-1  | Enabling remote OVSDB managers.
kytos-1  | + '[' -z '' ']'
kytos-1  | + '[' -z '' ']'
kytos-1  | There is no NAPPS_PATH specified. Default will be used.
kytos-1  | + echo 'There is no NAPPS_PATH specified. Default will be used.'
kytos-1  | + NAPPS_PATH=
kytos-1  | + sed -i 's/STATS_INTERVAL = 60/STATS_INTERVAL = 7/g' /var/lib/kytos/napps/kytos/of_core/settings.py
kytos-1  | + sed -i 's/CONSISTENCY_MIN_VERDICT_INTERVAL =.*/CONSISTENCY_MIN_VERDICT_INTERVAL = 60/g' /var/lib/kytos/napps/kytos/flow_manager/settings.py
kytos-1  | + sed -i 's/LINK_UP_TIMER = 10/LINK_UP_TIMER = 1/g' /var/lib/kytos/napps/kytos/topology/settings.py
kytos-1  | + sed -i 's/DEPLOY_EVCS_INTERVAL = 60/DEPLOY_EVCS_INTERVAL = 5/g' /var/lib/kytos/napps/kytos/mef_eline/settings.py
kytos-1  | + sed -i 's/LLDP_LOOP_ACTIONS = \["log"\]/LLDP_LOOP_ACTIONS = \["disable","log"\]/' /var/lib/kytos/napps/kytos/of_lldp/settings.py
kytos-1  | + sed -i 's/LLDP_IGNORED_LOOPS = {}/LLDP_IGNORED_LOOPS = {"00:00:00:00:00:00:00:01": \[\[4, 5\]\]}/' /var/lib/kytos/napps/kytos/of_lldp/settings.py
kytos-1  | + sed -i 's/CONSISTENCY_COOKIE_IGNORED_RANGE =.*/CONSISTENCY_COOKIE_IGNORED_RANGE = [(0xdd00000000000000, 0xdd00000000000009)]/g' /var/lib/kytos/napps/kytos/flow_manager/settings.py
kytos-1  | + sed -i 's/LIVENESS_DEAD_MULTIPLIER =.*/LIVENESS_DEAD_MULTIPLIER = 3/g' /var/lib/kytos/napps/kytos/of_lldp/settings.py
kytos-1  | + kytosd --help
kytos-1  | + sed -i s/WARNING/INFO/g /etc/kytos/logging.ini
kytos-1  | + sed -i 's/keys: root,kytos,api_server,socket/keys: root,kytos,api_server,socket,aiokafka/' /etc/kytos/logging.ini
kytos-1  | + echo -e '\n\n[logger_aiokafka]\nlevel: INFO\nhandlers:\nqualname: aiokafka'
kytos-1  | + test -z ''
kytos-1  | + TESTS=tests/
kytos-1  | + test -z ''
kytos-1  | + RERUNS=2
kytos-1  | + python3 scripts/wait_for_mongo.py
kytos-1  | Trying to run hello command on MongoDB...
kytos-1  | Trying to run 'hello' command on MongoDB...
kytos-1  | Trying to run 'hello' command on MongoDB...
kytos-1  | Ran 'hello' command on MongoDB successfully. It's ready!
kytos-1  | + python3 scripts/setup_kafka.py
kytos-1  | Starting setup_kafka.py...
kytos-1  | Attempting to create an admin client at ['broker1:19092', 'broker2:19092', 'broker3:19092']...
kytos-1  | Admin client was successful! Attempting to validate cluster...
kytos-1  | Cluster info: {'throttle_time_ms': 0, 'brokers': [{'node_id': 1, 'host': 'broker1', 'port': 19092, 'rack': None}, {'node_id': 2, 'host': 'broker2', 'port': 19092, 'rack': None}, {'node_id': 3, 'host': 'broker3', 'port': 19092, 'rack': None}], 'cluster_id': '5L6g3nShT-eMCtK--X86sw', 'controller_id': 3}
kytos-1  | Cluster was successfully validated! Attempting to create topic 'event_logs'...
kytos-1  | Topic 'event_logs' was created! Attempting to close the admin client...
kytos-1  | Kafka admin client closed.
kytos-1  | + python3 -m pytest tests/test_e2e_07_topology.py
kytos-1  | ============================= test session starts ==============================
kytos-1  | platform linux -- Python 3.11.2, pytest-8.4.2, pluggy-1.6.0
kytos-1  | rootdir: /
kytos-1  | configfile: pytest.ini
kytos-1  | plugins: rerunfailures-13.0, timeout-2.2.0, asyncio-1.1.0, anyio-4.3.0
kytos-1  | asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
kytos-1  | collected 2 items
kytos-1  | 
kytos-1  | tests/test_e2e_07_topology.py ..                                         [100%]
kytos-1  | 
kytos-1  | =============================== warnings summary ===============================
kytos-1  | tests/test_e2e_07_topology.py::TestE2ETopology::test_020_switch_delete_interfaces_disable_race_condition
kytos-1  |   /usr/lib/python3/dist-packages/mininet/node.py:1121: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
kytos-1  |     return ( StrictVersion( cls.OVSVersion ) <
kytos-1  | 
kytos-1  | tests/test_e2e_07_topology.py::TestE2ETopology::test_020_switch_delete_interfaces_disable_race_condition
kytos-1  |   /usr/lib/python3/dist-packages/mininet/node.py:1122: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
kytos-1  |     StrictVersion( '1.10' ) )
kytos-1  | 
kytos-1  | -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
kytos-1  | ------------------------------- start/stop times -------------------------------
kytos-1  | ================== 2 passed, 2 warnings in 106.25s (0:01:46) ===================

kytos-1 exited with code 0

@Ktmi
Copy link
Author

Ktmi commented Oct 30, 2025

@viniarck

On the matter of removing global locks, I don't think it's possible. With only ordered locks, some relations become literally impossible to traverse safely. To traverse a relation safely, one end of the relation needs to be locked. To hold multiple locks safely, without another lock preventing you from holding more than one, you must acquire them in a predetermined order. That means that with only ordered locks, the only relations we can traverse safely are from higher in the lock hierarchy to lower in the lock hierarchy, for example Switch to Interface.
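
The predetermined-order rule can be sketched in a few lines (lock names here are illustrative): both workers want the same two switch locks but ask for them in opposite orders, and sorting the keys before acquiring makes deadlock impossible.

```python
import threading
from contextlib import ExitStack

locks = {"sw1": threading.Lock(), "sw2": threading.Lock()}
results = []

def worker(name, wanted):
    # G1: acquire in a fixed global (sorted) order, not the caller's order,
    # so two workers can never hold each other's next lock.
    with ExitStack() as stack:
        for key in sorted(wanted):
            stack.enter_context(locks[key])
        results.append(name)

t1 = threading.Thread(target=worker, args=("t1", ["sw1", "sw2"]))
t2 = threading.Thread(target=worker, args=("t2", ["sw2", "sw1"]))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(results))  # ['t1', 't2']
```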

With global locks, we add an additional set of locks higher in the lock hierarchy that have no other prerequisite to being acquired other than being in order, and that can lock certain sets of relationships. This then makes it possible to traverse through sets of objects in any direction, before acquiring any of their locks.

As for maybe simplifying the code for acquiring object locks, it might be possible, but it would be a lot of work to get done. I compiled a list of most of the multi-lock scenarios in topology, and got the following:

  • Acquire switch + interfaces + connected links
    • disable switch
  • Acquire switch + interfaces + prevent adding links + tag ranges
    • delete switch
  • Acquire interfaces from switch
    • enable interface(s)
    • disable interface(s)
  • Acquire interface + connected switch
    • delete interface
    • handle interface deleted
  • Acquire interface + connected link + connected interface tag ranges
    • set interface tag ranges
  • Acquire link + connected interfaces
    • enable link
  • Acquire link + connected interfaces tag ranges
    • set link tag ranges
  • Acquire link + connected interfaces + connected switches + tag ranges
    • delete link
  • Acquire interfaces + connected links
    • handle liveness disabled
  • Create link + acquire link lock + acquire connected endpoints + tag ranges
    • create link
  • Acquire interfaces from switch + connected links
    • Notify switch links status

I was originally thinking, that we could add some specific functions to create context managers for these scenarios. For example:

with interface_manager.get_interface_and_connected_link_and_interface(interface_id) as (endpoint_a, link, endpoint_b):
    # do work here

However, this doesn't cover the scenarios where objects are being created or deleted, as those depend on having not just the object but also the collection it's in, as well as modifying the relations between objects. I just don't see a good way to encapsulate that in a single context manager, outside of maybe a builder pattern, but that still has its own issues. There just isn't a good way to incorporate updating the database for the related objects that were modified in addition to the target object. It would require more integration into topology, when I would want to make such an API more general than that.

@viniarck
Copy link
Member

viniarck commented Nov 1, 2025

@Ktmi, here's a short answer feedback (I'll reply a longer version in the next reply too)

then you must acquire them in a predetermined order.

Yes, G1.

the only relations we can traverse safely are from higher in the lock hierarchy to lower in the lock hierarchy, for example Switch to Interface

Let's follow G2 as much as possible but let's find a balance on kytos core by leveraging a common resource denominator that unifies the lock: a switch.

Switch and its interfaces should use the same switch lock. It's kind of a balance of not using a global lock while still staying mostly aligned with G2 (only links and interface_details would deviate). Links should use endpoint_a's and endpoint_b's switch locks, acquired in order; any link method that needs concurrency protection should also use them, and the same goes for an interface, but then using its switch lock. Resource-specific locks that help ensure threaded KytosEvents are processed in order, such as _intfs_lock from topology, will stay as they are until we make more progress towards asyncio.

See if that suffices (and then let's also measure in a development prototype phase, trying to scale the operations) and/or if you have any other ideas that we should consider which aren't too far away from what we currently have.

@viniarck
Copy link
Member

viniarck commented Nov 1, 2025

@Ktmi, long answer feedback:

Hierarchical locks with more than two different types/kinds of resources (switch/link/interface) will only potentially be accepted in kytos core if such an approach is strongly justified to a) not decrease related DB IO ops while ensuring concurrency safety, and b) still be easy to reason about in terms of code maintainability. b) will usually be the deal breaker, depending on the resource relationships too; certain hierarchical locks with different types of resources are very hard to maintain, as you pointed out, even to the point where the list you compiled above almost needs a wiki to remember, so let's steer away from that.

@Ktmi, suggestion on kytos core:

1a) Let's stick with a lock with a common denominator: a switch

  • A lock per switch (kind of a balance of not using a global lock, but still mostly aligned with G2; only links would deviate). So, if you need atomic operations on a switch, interface or tag ranges, you acquire that specific switch lock (but we also wouldn't allow this lock to be held for too long; the longest operations are expected to be in the hundreds of ms on average).
  • For a link, we use ExitStack and acquire both switch locks in the same order (following G1), and links_lock can also be removed in many places. It's OK for a link get not to be atomic in many threaded places (and in places where it's actually needed, we can still use links_lock or similar kinds of locks that can protect before iterating on the collection or when getting a link). Let's ensure the core resource objects have their major create/update/delete operations concurrently safe. Also, none of these core (switch) locks should be exposed to higher-level NApps; ideally only core and topology should use them directly, and the rest will use encapsulated methods when getting a switch/link/interface. We should also gradually steer away from using direct dict access in NApps when getting a switch/link/interface; this would allow NApps to read atomically when getting a switch/interface/link at least, and from that point forward, even if it gets disabled/deleted, the NApp will eventually get an event (so we don't need to try to have full atomicity everywhere, only in certain smaller critical contexts).
  • Resource-specific locks that help ensure threaded events are processed in order, such as _intfs_lock from topology, will stay as they are until we make more progress towards asyncio.
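
A minimal sketch of the link locking described above (Switch and Link here are hypothetical stand-ins, not the kytos core classes): acquire both endpoint switch locks in dpid order via ExitStack, deduplicating the endpoints so a loop link doesn't try to acquire the same non-reentrant lock twice.

```python
import threading
from contextlib import ExitStack, contextmanager

class Switch:
    def __init__(self, dpid):
        self.dpid = dpid
        self.lock = threading.Lock()

class Link:
    def __init__(self, endpoint_a, endpoint_b):
        self.endpoint_a = endpoint_a
        self.endpoint_b = endpoint_b

@contextmanager
def locked_link(link):
    # No dedicated link lock: dedupe the endpoint switches, then acquire
    # their locks in dpid order (G1) so concurrent link ops can't deadlock.
    endpoints = {
        link.endpoint_a.dpid: link.endpoint_a,
        link.endpoint_b.dpid: link.endpoint_b,
    }
    with ExitStack() as stack:
        for switch in sorted(endpoints.values(), key=lambda sw: sw.dpid):
            stack.enter_context(switch.lock)
        yield link

sw_a, sw_b = Switch("00:01"), Switch("00:02")
link = Link(sw_b, sw_a)  # endpoint order doesn't matter
with locked_link(link):
    assert sw_a.lock.locked() and sw_b.lock.locked()
```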

1b) Most of current object runtime operations aren't a bottleneck and tend to be fast compared to DB IO operations.

  • We'd be compromising and lowering a bit of throughput on locks collection at the expense of simplifying our runtime locks complexity while increasing thread safety and making it easier to reason about the code. On the interface_details collection we can still keep the writes as they are (dispatched via threads; thread pools are shut down gracefully, so in most cases it'll be fine, and if users force-terminate kytosd that's their choice and risk), but still holding the switch lock when managing the runtime object.

1c) This approach is still expected to work with your PR; kytos-ng/kytos-end-to-end-tests#399 should still be passing. Switch/interface/link deletion/disabling shouldn't result in race conditions or deadlocks.

See if point 1 above suffices (and then let's also measure in a development prototype phase trying to scale the operations) and/or if you have any other ideas which aren't too far away from what we currently have that we should consider.

We can also expand the discussion to non-core resources on NApps (flows, evcs, windows, and so on), but I don't think we'd have immediate critical problems. Ultimately, even if a switch/link/interface were to be deleted mid-procedure, an event (disabled/deleted) will eventually arrive, and the core resource's recreation should be an opportunity for the NApp to try to clean up any potential inconsistency (leftover) of its owned resources (we don't clean up on switch/link/interface recreation implemented in NApps yet, but this could be enhanced in future versions too).

@Ktmi Ktmi force-pushed the rebase/tag_capable branch from 6d5d04c to ce7e867 Compare November 3, 2025 19:13
@Ktmi
Copy link
Author

Ktmi commented Nov 6, 2025

Alright, I've got a branch which removes interfaces_lock, interface.lock, and multi_tag_lock. multi_tag_lock was always being used in scenarios where we were using links_lock, as we were interested in the relation between interface and link. interfaces_lock and interface.lock are replaced with switch.lock. This still results in a somewhat complex pattern of locks. The only remaining improvement I can think of is to have one global lock instead of two, so that any of the desired object locks can be acquired in any order.

@Ktmi
Copy link
Author

Ktmi commented Nov 6, 2025

Also here's a potential utility for acquiring locks:

from contextlib import ExitStack
from threading import Lock

from kytos.core.controller import Controller

class SwitchGetter:

    def __init__(self, controller: Controller, switch_id: str):
        self.controller = controller
        self.target = switch_id
        self.switch = None

    def acquire(self):
        switch = self.controller.switches[self.target]
        switch.lock.acquire()
        self.switch = switch

    def release(self):
        self.switch.lock.release()
        self.switch = None

    def __enter__(self):
        self.acquire()
        return self.switch

    def __exit__(self, exc_type, exc_val, tb):
        self.release()

class StrongSwitchGetter(SwitchGetter):
    def acquire(self):
        with self.controller.switches_lock:
            return super().acquire()


class InterfaceGetter:

    def __init__(self, controller: Controller, interface_id: str):
        self.controller = controller
        switch_id, _, port = interface_id.rpartition(":")
        port = int(port)
        self.target = switch_id
        self.port = port
        self.switch = None
        self.interface = None

    def acquire(self):
        switch = self.controller.switches[self.target]
        switch.lock.acquire()
        interface = switch.interfaces[self.port]
        self.switch = switch
        self.interface = interface

    def release(self):
        self.switch.lock.release()
        self.switch = None
        self.interface = None

    def __enter__(self):
        self.acquire()
        return self.interface

    def __exit__(self, exc_type, exc_val, tb):
        self.release()

class StrongInterfaceGetter(InterfaceGetter):
    def acquire(self):
        with self.controller.switches_lock:
            return super().acquire()

class LinkGetter:

    def __init__(self, controller: Controller, link_id: str):
        self.controller = controller
        self.target = link_id
        self.link = None

    def acquire(self):
        link = self.controller.links[self.target]
        link.lock.acquire()
        self.link = link

    def release(self):
        self.link.lock.release()
        self.link = None

    def __enter__(self):
        self.acquire()
        return self.link

    def __exit__(self, exc_type, exc_val, tb):
        self.release()

class StrongLinkGetter(LinkGetter):
    def acquire(self):
        with self.controller.switches_lock:
            return super().acquire()

class TopologyEditorGetter:

    def __init__(self, controller: Controller):
        self.controller = controller
        self.editor = None

    def acquire(self):
        self.controller.switches_lock.acquire()
        self.editor = TopologyEditor(self.controller)

    def release(self):
        self.editor.release()
        self.controller.switches_lock.release()

    def __enter__(self):
        self.acquire()
        return self.editor

    def __exit__(self, exc_type, exc_val, tb):
        self.release()


class TopologyEditor:

    def __init__(self, controller: Controller):
        self.controller = controller
        self.lock_stack = ExitStack()
        self.switches = {}
        self.interfaces = {}
        self.links = {}

    def acquire_lock(self, lock: Lock):
        self.lock_stack.enter_context(lock)

    def release(self):
        self.lock_stack.close()

    def get_switch(self, switch_id: str):
        if switch_id in self.switches:
            return self.switches[switch_id]
        switch = self.controller.switches[switch_id]
        self.acquire_lock(switch.lock)
        self.switches[switch_id] = switch
        return switch
    
    def get_interface(self, interface_id: str):
        if interface_id in self.interfaces:
            return self.interfaces[interface_id]
        switch_id, _, port = interface_id.rpartition(":")
        port = int(port)
        switch = self.get_switch(switch_id)
        interface = switch.interfaces[port]
        self.interfaces[interface_id] = interface
        return interface

    def get_link(self, link_id: str):
        if link_id in self.links:
            return self.links[link_id]
        link = self.controller.links[link_id]
        self.acquire_lock(link.lock)
        self.links[link_id] = link
        return link

I think this is starting to move in the right direction. Most of the cases where we need to work on a single object are covered, with the ability to guarantee the lifetime of an object if necessary. The editor, I think, needs more tools such as creation and deletion, but for now it serves as a good proof of concept.

@Ktmi Ktmi marked this pull request as ready for review January 5, 2026 14:27
Copy link
Member

@viniarck viniarck left a comment

Partial review:

  • Scrutinizer is failing
  • Changelog needs to be updated
  • I'd like to see APM DB charts for major DB stress test OPs
  • These scripts https://github.com/kytos-ng/topology/tree/master/scripts/console/2023.2.0 need to be updated too, if tags becomes unexpectedly inconsistent in prod this is what fixes them
  • We also need a DB migration, otherwise in an upgrade it'll crash out of the gate when loading links:
nThread) Loading link: 20070c87457fc5197d74eb6354365560dd7b2b19ac414ad48c8dc2597236a6c4
2026-01-05 19:32:39,456 - INFO [kytos.napps.kytos/topology] [main.py:192:_load_links] (MainThread) Loading link: 3bdc34e8e0ca38d7c24724d07c8282cc2c5f123cfed602f5b2eb3594c9606476
2026-01-05 19:32:39,456 - INFO [kytos.napps.kytos/topology] [main.py:192:_load_links] (MainThread) Loading link: 4d42dc0852278accac7d9df15418f6d921db160b13d674029a87cef1b5f67f30
2026-01-05 19:32:39,457 - INFO [kytos.napps.kytos/topology] [main.py:192:_load_links] (MainThread) Loading link: 78282c4d5b579265f04ebadc4405ca1b49628eb1d684bb45e5d0607fa8b713d0
2026-01-05 19:32:39,457 - INFO [kytos.napps.kytos/topology] [main.py:192:_load_links] (MainThread) Loading link: c0fe8cb39c5d91a28891d8c81403391fda144e77ac812c22043ce3bb373f6cd2
2026-01-05 19:32:39,458 - INFO [kytos.napps.kytos/topology] [main.py:192:_load_links] (MainThread) Loading link: c8b55359990f89a5849813dc348d30e9e1f991bad1dcb7f82112bd35429d9b07
Kytos couldn't start because of KytosNAppSetupException: NApp kytos/topology exception 'default_tag_ranges'  Traceback (most recent call last):
  File "/home/viniarck/repos/kytos/kytos/core/controller.py", line 893, in load_napp
    napp = napp_module.Main(controller=self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/viniarck/repos/kytos/kytos/core/napps/base.py", line 196, in __init__
    self.setup()
  File "/home/viniarck/repos/napps/napps/kytos/topology/main.py", line 80, in setup
    self.load_topology()
  File "/home/viniarck/repos/napps/napps/kytos/topology/main.py", line 351, in load_topology
    self._load_interface_details(
  File "/home/viniarck/repos/napps/napps/kytos/topology/main.py", line 313, in _load_interface_details
    self.load_details(
  File "/home/viniarck/repos/napps/napps/kytos/topology/main.py", line 1829, in load_details
    tag_details["default_tag_ranges"],
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'default_tag_ranges'


@Ktmi
Copy link
Author

Ktmi commented Jan 6, 2026

To test the DB performance, I disabled every single interface across several concurrent requests.

Here is the latency distribution for disabling interfaces:

image

The latency distribution doesn't tell us much about the actual DB transactions, so I targeted just the DB update operation napps.switches.find_one_and_update. There, we see that it takes on average 13,745.259 ms.

image

By comparison, before the update the latency distribution for disabling interfaces looked like this:

image

And the DB update operations were taking on average 25,588.426 ms:

image

It seems like a performance improvement, but I'm not certain if this is outside the margin of error. This is only trying to disable 54 interfaces at a time.

@viniarck
Copy link
Member

To test the DB performance, I decided to test it by disabling every single interface, in several concurrent requests.

Here is the latency distribution for disabling interfaces:

image

The latency distribution doesn't tell us much about the actual DB transactions, so I targeted just the DB update operation `napps.switches.find_one_and_update`. There, we see that it takes on average 13,745.259 ms.

image

By comparison, before the update the latency distribution for disabling interfaces looked like this:

image

And the DB update operations were taking on average 25,588.426 ms:

image

It seems like a performance improvement, but I'm not certain if this is outside the margin of error. This is only trying to disable 54 interfaces at a time.

Nice @Ktmi, good to have this information and know about the collected measurements; the comparison also looks good as far as the percentiles go. Sometimes we know that there are certain data point outliers, but that's fine as long as they only show up in IO stress cases and temporarily.

@Ktmi
Copy link
Author

Ktmi commented Feb 13, 2026

Latest E2E results:

kytos-1  | Starting enhanced syslogd: rsyslogd.
kytos-1  | /etc/openvswitch/conf.db does not exist ... (warning).
kytos-1  | Creating empty database /etc/openvswitch/conf.db.
kytos-1  | Starting ovsdb-server.
kytos-1  | rsyslogd: error during config processing: omfile: chown for file '/var/log/syslog' failed: Operation not permitted [v8.2302.0 try https://www.rsyslog.com/e/2207 ]
kytos-1  | Configuring Open vSwitch system IDs.
kytos-1  | Starting ovs-vswitchd.
kytos-1  | Enabling remote OVSDB managers.
kytos-1  | + '[' -z '' ']'
kytos-1  | + '[' -z '' ']'
kytos-1  | + echo 'There is no NAPPS_PATH specified. Default will be used.'
kytos-1  | + NAPPS_PATH=
kytos-1  | + sed -i 's/STATS_INTERVAL = 60/STATS_INTERVAL = 7/g' /var/lib/kytos/napps/kytos/of_core/settings.py
kytos-1  | There is no NAPPS_PATH specified. Default will be used.
kytos-1  | + sed -i 's/CONSISTENCY_MIN_VERDICT_INTERVAL =.*/CONSISTENCY_MIN_VERDICT_INTERVAL = 60/g' /var/lib/kytos/napps/kytos/flow_manager/settings.py
kytos-1  | + sed -i 's/LINK_UP_TIMER = 10/LINK_UP_TIMER = 1/g' /var/lib/kytos/napps/kytos/topology/settings.py
kytos-1  | + sed -i 's/DEPLOY_EVCS_INTERVAL = 60/DEPLOY_EVCS_INTERVAL = 5/g' /var/lib/kytos/napps/kytos/mef_eline/settings.py
kytos-1  | + sed -i 's/LLDP_LOOP_ACTIONS = \["log"\]/LLDP_LOOP_ACTIONS = \["disable","log"\]/' /var/lib/kytos/napps/kytos/of_lldp/settings.py
kytos-1  | + sed -i 's/LLDP_IGNORED_LOOPS = {}/LLDP_IGNORED_LOOPS = {"00:00:00:00:00:00:00:01": \[\[4, 5\]\]}/' /var/lib/kytos/napps/kytos/of_lldp/settings.py
kytos-1  | + sed -i 's/CONSISTENCY_COOKIE_IGNORED_RANGE =.*/CONSISTENCY_COOKIE_IGNORED_RANGE = [(0xdd00000000000000, 0xdd00000000000009)]/g' /var/lib/kytos/napps/kytos/flow_manager/settings.py
kytos-1  | + sed -i 's/LIVENESS_DEAD_MULTIPLIER =.*/LIVENESS_DEAD_MULTIPLIER = 3/g' /var/lib/kytos/napps/kytos/of_lldp/settings.py
kytos-1  | + kytosd --help
kytos-1  | + sed -i s/WARNING/DEBUG/g /etc/kytos/logging.ini
kytos-1  | + sed -i s/INFO/DEBUG/g /etc/kytos/logging.ini
kytos-1  | + sed -i 's/keys: root,kytos,api_server,socket/keys: root,kytos,api_server,socket,aiokafka/' /etc/kytos/logging.ini
kytos-1  | + sed -i 's/handlers: syslog,console/handlers: syslog,console,file/g' /etc/kytos/logging.ini
kytos-1  | + echo -e '\n\n[logger_aiokafka]\nlevel: INFO\nhandlers:\nqualname: aiokafka'
kytos-1  | + echo
kytos-1  | + test -z ''
kytos-1  | + TESTS=tests/
kytos-1  | + test -z ''
kytos-1  | + RERUNS=2
kytos-1  | + python3 scripts/wait_for_mongo.py
kytos-1  | Trying to run hello command on MongoDB...
kytos-1  | Trying to run 'hello' command on MongoDB...
kytos-1  | Trying to run 'hello' command on MongoDB...
kytos-1  | Ran 'hello' command on MongoDB successfully. It's ready!
kytos-1  | + python3 scripts/setup_kafka.py
kytos-1  | Starting setup_kafka.py...
kytos-1  | Attempting to create an admin client at ['broker1:19092', 'broker2:19092', 'broker3:19092']...
kytos-1  | Admin client was successful! Attempting to validate cluster...
kytos-1  | Cluster info: {'throttle_time_ms': 0, 'brokers': [{'node_id': 1, 'host': 'broker1', 'port': 19092, 'rack': None}, {'node_id': 2, 'host': 'broker2', 'port': 19092, 'rack': None}, {'node_id': 3, 'host': 'broker3', 'port': 19092, 'rack': None}], 'cluster_id': '5L6g3nShT-eMCtK--X86sw', 'controller_id': 2}
kytos-1  | Cluster was successfully validated! Attempting to create topic 'event_logs'...
kytos-1  | Topic 'event_logs' was created! Attempting to close the admin client...
kytos-1  | Kafka admin client closed.
kytos-1  | + python3 -m pytest tests/ --reruns 2 -r fEr
kytos-1  | ============================= test session starts ==============================
kytos-1  | platform linux -- Python 3.11.2, pytest-8.4.2, pluggy-1.6.0
kytos-1  | rootdir: /
kytos-1  | configfile: pytest.ini
kytos-1  | plugins: rerunfailures-13.0, timeout-2.2.0, asyncio-1.1.0, anyio-4.3.0
kytos-1  | asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
kytos-1  | collected 301 items
kytos-1  | 
kytos-1  | tests/test_e2e_01_kytos_startup.py ..                                    [  0%]
kytos-1  | tests/test_e2e_05_topology.py ...................                        [  6%]
kytos-1  | tests/test_e2e_06_topology.py ....                                       [  8%]
kytos-1  | tests/test_e2e_10_mef_eline.py ..........ss.....x....................... [ 21%]
kytos-1  | .                                                                        [ 22%]
kytos-1  | tests/test_e2e_11_mef_eline.py ........                                  [ 24%]
kytos-1  | tests/test_e2e_12_mef_eline.py .....Xx.                                  [ 27%]
kytos-1  | tests/test_e2e_13_mef_eline.py ....Xs.s.....Xs.s.XXxX.xxxx..X........... [ 41%]
kytos-1  | .                                                                        [ 41%]
kytos-1  | tests/test_e2e_14_mef_eline.py ......                                    [ 43%]
kytos-1  | tests/test_e2e_15_mef_eline.py ......                                    [ 45%]
kytos-1  | tests/test_e2e_16_mef_eline.py ..                                        [ 46%]
kytos-1  | tests/test_e2e_17_mef_eline.py .....                                     [ 47%]
kytos-1  | tests/test_e2e_18_mef_eline.py .....                                     [ 49%]
kytos-1  | tests/test_e2e_20_flow_manager.py ............................           [ 58%]
kytos-1  | tests/test_e2e_21_flow_manager.py ...                                    [ 59%]
kytos-1  | tests/test_e2e_22_flow_manager.py ...............                        [ 64%]
kytos-1  | tests/test_e2e_23_flow_manager.py ..............                         [ 69%]
kytos-1  | tests/test_e2e_30_of_lldp.py .R...                                       [ 70%]
kytos-1  | tests/test_e2e_31_of_lldp.py ...RRF                                      [ 72%]
kytos-1  | tests/test_e2e_32_of_lldp.py ...                                         [ 73%]
kytos-1  | tests/test_e2e_40_sdntrace.py ................                           [ 78%]
kytos-1  | tests/test_e2e_41_kytos_auth.py ........                                 [ 81%]
kytos-1  | tests/test_e2e_42_sdntrace.py ..                                         [ 81%]
kytos-1  | tests/test_e2e_50_maintenance.py ...............................         [ 92%]
kytos-1  | tests/test_e2e_60_of_multi_table.py .....                                [ 93%]
kytos-1  | tests/test_e2e_70_kytos_stats.py .........                               [ 96%]
kytos-1  | tests/test_e2e_80_pathfinder.py ss......                                 [ 99%]
kytos-1  | tests/test_e2e_90_kafka_events.py .                                      [ 99%]
kytos-1  | tests/test_e2e_95_telemtry_int.py s                                      [100%]
kytos-1  | 
kytos-1  | =================================== FAILURES ===================================
kytos-1  | ________________ TestE2EOfLLDP.test_010_liveness_intf_deletion _________________
kytos-1  | 
kytos-1  | self = <tests.test_e2e_31_of_lldp.TestE2EOfLLDP object at 0x7882e3f1ba90>
kytos-1  | 
kytos-1  |     def test_010_liveness_intf_deletion(self) -> None:
kytos-1  |         """Test liveness not loaded after intf deletion."""
kytos-1  |         polling_interval = 1
kytos-1  |         self.set_polling_time(polling_interval)
kytos-1  |         intf_id = "00:00:00:00:00:00:00:01:1"
kytos-1  |         interface_ids = [intf_id]
kytos-1  |         self.enable_link_liveness(interface_ids)
kytos-1  |     
kytos-1  |         time.sleep(polling_interval * 5)
kytos-1  |     
kytos-1  |         # Assert GET liveness/ entries are in init state
kytos-1  |         api_url = f"{KYTOS_API}/of_lldp/v1/liveness/"
kytos-1  |         response = requests.get(api_url)
kytos-1  |         data = response.json()
kytos-1  |         for entry in data["interfaces"]:
kytos-1  |             assert entry["id"] in interface_ids, entry
kytos-1  |             assert entry["status"] == "init", entry
kytos-1  |     
kytos-1  |         # Disable the intf for deletion
kytos-1  |         api_url = f'{KYTOS_API}/topology/v3/interfaces/{intf_id}/disable/'
kytos-1  |         response = requests.post(api_url)
kytos-1  |         assert response.status_code == 200, response.text
kytos-1  |     
kytos-1  |         # Deactivate the interface for deletion
kytos-1  |         self.net.net.configLinkStatus('s1', 'h11', 'down')
kytos-1  |     
kytos-1  |         api_url = f'{KYTOS_API}/topology/v3/interfaces/{intf_id}'
kytos-1  |         response = requests.delete(api_url)
kytos-1  |         assert response.status_code == 200, response.text
kytos-1  |     
kytos-1  |         # Restart the controller maintaining config
kytos-1  | >       self.restart(wait_for=15)
kytos-1  | 
kytos-1  | tests/test_e2e_31_of_lldp.py:298: 
kytos-1  | _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
kytos-1  | tests/test_e2e_31_of_lldp.py:38: in restart
kytos-1  |     self.net.start_controller(clean_config=clean_config, enable_all=enable_all)
kytos-1  | tests/helpers.py:402: in start_controller
kytos-1  |     self.wait_controller_start()
kytos-1  | _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
kytos-1  | 
kytos-1  | self = <tests.helpers.NetworkTest object at 0x7882c02da290>
kytos-1  | 
kytos-1  |     def wait_controller_start(self):
kytos-1  |         """Wait until controller starts according to core/status API."""
kytos-1  |         wait_count = 0
kytos-1  |         last_error = ""
kytos-1  |         while wait_count < 60:
kytos-1  |             try:
kytos-1  |                 response = requests.get('http://127.0.0.1:8181/api/kytos/core/status/', timeout=3)
kytos-1  |                 assert response.json()['response'] == 'running', response.text
kytos-1  |                 break
kytos-1  |             except Exception as exc:
kytos-1  |                 last_error = str(exc)
kytos-1  |                 time.sleep(0.5)
kytos-1  |                 wait_count += 0.5
kytos-1  |         else:
kytos-1  |             msg = f"Timeout while starting Kytos controller. Last error: {last_error}"
kytos-1  | >           raise Exception(msg)
kytos-1  | E           Exception: Timeout while starting Kytos controller. Last error: HTTPConnectionPool(host='127.0.0.1', port=8181): Read timed out. (read timeout=3)
kytos-1  | 
kytos-1  | tests/helpers.py:422: Exception
kytos-1  | ---------------------------- Captured stdout setup -----------------------------
kytos-1  | FAIL to stop kytos after 5 seconds -- Kytos pid still exists.. Force stop!
kytos-1  | ---------------------------- Captured stdout setup -----------------------------
kytos-1  | FAIL to stop kytos after 5 seconds -- Kytos pid still exists.. Force stop!
kytos-1  | =============================== warnings summary ===============================
kytos-1  | usr/local/lib/python3.11/dist-packages/kytos/core/config.py:254
kytos-1  |   /usr/local/lib/python3.11/dist-packages/kytos/core/config.py:254: UserWarning: Unknown arguments: ['tests/', '--reruns', '2', '-r', 'fEr']
kytos-1  |     warnings.warn(f"Unknown arguments: {unknown}")
kytos-1  | 
kytos-1  | tests/test_e2e_01_kytos_startup.py: 17 warnings
kytos-1  | tests/test_e2e_05_topology.py: 17 warnings
kytos-1  | tests/test_e2e_06_topology.py: 37 warnings
kytos-1  | tests/test_e2e_10_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_11_mef_eline.py: 25 warnings
kytos-1  | tests/test_e2e_12_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_13_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_14_mef_eline.py: 76 warnings
kytos-1  | tests/test_e2e_15_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_16_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_17_mef_eline.py: 37 warnings
kytos-1  | tests/test_e2e_18_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_20_flow_manager.py: 17 warnings
kytos-1  | tests/test_e2e_21_flow_manager.py: 17 warnings
kytos-1  | tests/test_e2e_22_flow_manager.py: 17 warnings
kytos-1  | tests/test_e2e_23_flow_manager.py: 17 warnings
kytos-1  | tests/test_e2e_30_of_lldp.py: 16 warnings
kytos-1  | tests/test_e2e_31_of_lldp.py: 16 warnings
kytos-1  | tests/test_e2e_32_of_lldp.py: 11 warnings
kytos-1  | tests/test_e2e_40_sdntrace.py: 49 warnings
kytos-1  | tests/test_e2e_41_kytos_auth.py: 17 warnings
kytos-1  | tests/test_e2e_42_sdntrace.py: 84 warnings
kytos-1  | tests/test_e2e_50_maintenance.py: 17 warnings
kytos-1  | tests/test_e2e_60_of_multi_table.py: 17 warnings
kytos-1  | tests/test_e2e_70_kytos_stats.py: 17 warnings
kytos-1  | tests/test_e2e_80_pathfinder.py: 37 warnings
kytos-1  | tests/test_e2e_90_kafka_events.py: 17 warnings
kytos-1  |   /usr/lib/python3/dist-packages/mininet/node.py:1121: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
kytos-1  |     return ( StrictVersion( cls.OVSVersion ) <
kytos-1  | 
kytos-1  | tests/test_e2e_01_kytos_startup.py: 17 warnings
kytos-1  | tests/test_e2e_05_topology.py: 17 warnings
kytos-1  | tests/test_e2e_06_topology.py: 37 warnings
kytos-1  | tests/test_e2e_10_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_11_mef_eline.py: 25 warnings
kytos-1  | tests/test_e2e_12_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_13_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_14_mef_eline.py: 76 warnings
kytos-1  | tests/test_e2e_15_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_16_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_17_mef_eline.py: 37 warnings
kytos-1  | tests/test_e2e_18_mef_eline.py: 17 warnings
kytos-1  | tests/test_e2e_20_flow_manager.py: 17 warnings
kytos-1  | tests/test_e2e_21_flow_manager.py: 17 warnings
kytos-1  | tests/test_e2e_22_flow_manager.py: 17 warnings
kytos-1  | tests/test_e2e_23_flow_manager.py: 17 warnings
kytos-1  | tests/test_e2e_30_of_lldp.py: 16 warnings
kytos-1  | tests/test_e2e_31_of_lldp.py: 16 warnings
kytos-1  | tests/test_e2e_32_of_lldp.py: 11 warnings
kytos-1  | tests/test_e2e_40_sdntrace.py: 49 warnings
kytos-1  | tests/test_e2e_41_kytos_auth.py: 17 warnings
kytos-1  | tests/test_e2e_42_sdntrace.py: 84 warnings
kytos-1  | tests/test_e2e_50_maintenance.py: 17 warnings
kytos-1  | tests/test_e2e_60_of_multi_table.py: 17 warnings
kytos-1  | tests/test_e2e_70_kytos_stats.py: 17 warnings
kytos-1  | tests/test_e2e_80_pathfinder.py: 37 warnings
kytos-1  | tests/test_e2e_90_kafka_events.py: 17 warnings
kytos-1  |   /usr/lib/python3/dist-packages/mininet/node.py:1122: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
kytos-1  |     StrictVersion( '1.10' ) )
kytos-1  | 
kytos-1  | -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
kytos-1  | ------------------------------- start/stop times -------------------------------
kytos-1  | rerun: 0
kytos-1  | tests/test_e2e_30_of_lldp.py::TestE2EOfLLDP::test_010_disable_of_lldp: 2026-02-13,02:13:41.285613 - 2026-02-13,02:14:06.485583
kytos-1  | self = <tests.test_e2e_30_of_lldp.TestE2EOfLLDP object at 0x7882e3da1350>
kytos-1  | 
kytos-1  |     def test_010_disable_of_lldp(self):
kytos-1  |         """ Test if the disabling OF LLDP in an interface worked properly. """
kytos-1  |         self.net.start_controller(clean_config=True, enable_all=False)
kytos-1  |         self.net.wait_switches_connect()
kytos-1  |         time.sleep(5)
kytos-1  |         self.enable_all_interfaces()
kytos-1  |     
kytos-1  |         # disabling all the UNI interfaces
kytos-1  |         payload = {
kytos-1  |             "interfaces": [
kytos-1  |                 "00:00:00:00:00:00:00:01:1", "00:00:00:00:00:00:00:01:2", "00:00:00:00:00:00:00:01:4294967294",
kytos-1  |                 "00:00:00:00:00:00:00:02:1", "00:00:00:00:00:00:00:02:4294967294",
kytos-1  |                 "00:00:00:00:00:00:00:03:1", "00:00:00:00:00:00:00:03:4294967294"
kytos-1  |             ]
kytos-1  |         }
kytos-1  |         expected_interfaces = [
kytos-1  |                 "00:00:00:00:00:00:00:01:3", "00:00:00:00:00:00:00:01:4",
kytos-1  |                 "00:00:00:00:00:00:00:02:2", "00:00:00:00:00:00:00:02:3",
kytos-1  |                 "00:00:00:00:00:00:00:03:2", "00:00:00:00:00:00:00:03:3"
kytos-1  |         ]
kytos-1  |     
kytos-1  |         api_url = KYTOS_API + '/of_lldp/v1/interfaces/disable/'
kytos-1  |         response = requests.post(api_url, json=payload)
kytos-1  |         assert response.status_code == 200, response.text
kytos-1  |     
kytos-1  |         api_url = KYTOS_API + '/of_lldp/v1/interfaces/'
kytos-1  |         response = requests.get(api_url)
kytos-1  |         data = response.json()
kytos-1  |         assert set(data["interfaces"]) == set(expected_interfaces)
kytos-1  |     
kytos-1  |         h11, h12, h2, h3 = self.net.net.get('h11', 'h12', 'h2', 'h3')
kytos-1  |         rx_stats_h11 = self.get_iface_stats_rx_pkt(h11)
kytos-1  |         rx_stats_h12 = self.get_iface_stats_rx_pkt(h12)
kytos-1  |         rx_stats_h2 = self.get_iface_stats_rx_pkt(h2)
kytos-1  |         rx_stats_h3 = self.get_iface_stats_rx_pkt(h3)
kytos-1  |         time.sleep(10)
kytos-1  |         rx_stats_h11_2 = self.get_iface_stats_rx_pkt(h11)
kytos-1  |         rx_stats_h12_2 = self.get_iface_stats_rx_pkt(h12)
kytos-1  |         rx_stats_h2_2 = self.get_iface_stats_rx_pkt(h2)
kytos-1  |         rx_stats_h3_2 = self.get_iface_stats_rx_pkt(h3)
kytos-1  |     
kytos-1  | >       assert rx_stats_h11_2 == rx_stats_h11 \
kytos-1  |             and rx_stats_h12_2 == rx_stats_h12 \
kytos-1  |             and rx_stats_h2_2 == rx_stats_h2 \
kytos-1  |             and rx_stats_h3_2 == rx_stats_h3
kytos-1  | E       assert (19 == 18)
kytos-1  | 
kytos-1  | tests/test_e2e_30_of_lldp.py:127: AssertionError
kytos-1  | rerun: 0
kytos-1  | tests/test_e2e_31_of_lldp.py::TestE2EOfLLDP::test_010_liveness_intf_deletion: 2026-02-13,02:20:08.025225 - 2026-02-13,02:27:15.923321
kytos-1  | self = <tests.test_e2e_31_of_lldp.TestE2EOfLLDP object at 0x7882e3f1ba90>
kytos-1  | 
kytos-1  |     def test_010_liveness_intf_deletion(self) -> None:
kytos-1  |         """Test liveness not loaded after intf deletion."""
kytos-1  |         polling_interval = 1
kytos-1  |         self.set_polling_time(polling_interval)
kytos-1  |         intf_id = "00:00:00:00:00:00:00:01:1"
kytos-1  |         interface_ids = [intf_id]
kytos-1  |         self.enable_link_liveness(interface_ids)
kytos-1  |     
kytos-1  |         time.sleep(polling_interval * 5)
kytos-1  |     
kytos-1  |         # Assert GET liveness/ entries are in init state
kytos-1  |         api_url = f"{KYTOS_API}/of_lldp/v1/liveness/"
kytos-1  |         response = requests.get(api_url)
kytos-1  |         data = response.json()
kytos-1  |         for entry in data["interfaces"]:
kytos-1  |             assert entry["id"] in interface_ids, entry
kytos-1  |             assert entry["status"] == "init", entry
kytos-1  |     
kytos-1  |         # Disable the intf for deletion
kytos-1  |         api_url = f'{KYTOS_API}/topology/v3/interfaces/{intf_id}/disable/'
kytos-1  |         response = requests.post(api_url)
kytos-1  |         assert response.status_code == 200, response.text
kytos-1  |     
kytos-1  |         # Deactivate the interface for deletion
kytos-1  |         self.net.net.configLinkStatus('s1', 'h11', 'down')
kytos-1  |     
kytos-1  |         api_url = f'{KYTOS_API}/topology/v3/interfaces/{intf_id}'
kytos-1  |         response = requests.delete(api_url)
kytos-1  |         assert response.status_code == 200, response.text
kytos-1  |     
kytos-1  |         # Restart the controller maintaining config
kytos-1  | >       self.restart(wait_for=15)
kytos-1  | 
kytos-1  | tests/test_e2e_31_of_lldp.py:298: 
kytos-1  | _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
kytos-1  | tests/test_e2e_31_of_lldp.py:38: in restart
kytos-1  |     self.net.start_controller(clean_config=clean_config, enable_all=enable_all)
kytos-1  | tests/helpers.py:402: in start_controller
kytos-1  |     self.wait_controller_start()
kytos-1  | _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
kytos-1  | 
kytos-1  | self = <tests.helpers.NetworkTest object at 0x7882c02da290>
kytos-1  | 
kytos-1  |     def wait_controller_start(self):
kytos-1  |         """Wait until controller starts according to core/status API."""
kytos-1  |         wait_count = 0
kytos-1  |         last_error = ""
kytos-1  |         while wait_count < 60:
kytos-1  |             try:
kytos-1  |                 response = requests.get('http://127.0.0.1:8181/api/kytos/core/status/', timeout=3)
kytos-1  |                 assert response.json()['response'] == 'running', response.text
kytos-1  |                 break
kytos-1  |             except Exception as exc:
kytos-1  |                 last_error = str(exc)
kytos-1  |                 time.sleep(0.5)
kytos-1  |                 wait_count += 0.5
kytos-1  |         else:
kytos-1  |             msg = f"Timeout while starting Kytos controller. Last error: {last_error}"
kytos-1  | >           raise Exception(msg)
kytos-1  | E           Exception: Timeout while starting Kytos controller. Last error: HTTPConnectionPool(host='127.0.0.1', port=8181): Read timed out. (read timeout=3)
kytos-1  | 
kytos-1  | tests/helpers.py:422: Exception
kytos-1  | rerun: 1
kytos-1  | tests/test_e2e_31_of_lldp.py::TestE2EOfLLDP::test_010_liveness_intf_deletion: 2026-02-13,02:27:37.065407 - 2026-02-13,02:34:44.996798
kytos-1  | self = <tests.test_e2e_31_of_lldp.TestE2EOfLLDP object at 0x7882e3f1ba90>
kytos-1  | 
kytos-1  |     def test_010_liveness_intf_deletion(self) -> None:
kytos-1  |         """Test liveness not loaded after intf deletion."""
kytos-1  |         polling_interval = 1
kytos-1  |         self.set_polling_time(polling_interval)
kytos-1  |         intf_id = "00:00:00:00:00:00:00:01:1"
kytos-1  |         interface_ids = [intf_id]
kytos-1  |         self.enable_link_liveness(interface_ids)
kytos-1  |     
kytos-1  |         time.sleep(polling_interval * 5)
kytos-1  |     
kytos-1  |         # Assert GET liveness/ entries are in init state
kytos-1  |         api_url = f"{KYTOS_API}/of_lldp/v1/liveness/"
kytos-1  |         response = requests.get(api_url)
kytos-1  |         data = response.json()
kytos-1  |         for entry in data["interfaces"]:
kytos-1  |             assert entry["id"] in interface_ids, entry
kytos-1  |             assert entry["status"] == "init", entry
kytos-1  |     
kytos-1  |         # Disable the intf for deletion
kytos-1  |         api_url = f'{KYTOS_API}/topology/v3/interfaces/{intf_id}/disable/'
kytos-1  |         response = requests.post(api_url)
kytos-1  |         assert response.status_code == 200, response.text
kytos-1  |     
kytos-1  |         # Deactivate the interface for deletion
kytos-1  |         self.net.net.configLinkStatus('s1', 'h11', 'down')
kytos-1  |     
kytos-1  |         api_url = f'{KYTOS_API}/topology/v3/interfaces/{intf_id}'
kytos-1  |         response = requests.delete(api_url)
kytos-1  |         assert response.status_code == 200, response.text
kytos-1  |     
kytos-1  |         # Restart the controller maintaining config
kytos-1  | >       self.restart(wait_for=15)
kytos-1  | 
kytos-1  | tests/test_e2e_31_of_lldp.py:298: 
kytos-1  | _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
kytos-1  | tests/test_e2e_31_of_lldp.py:38: in restart
kytos-1  |     self.net.start_controller(clean_config=clean_config, enable_all=enable_all)
kytos-1  | tests/helpers.py:402: in start_controller
kytos-1  |     self.wait_controller_start()
kytos-1  | _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
kytos-1  | 
kytos-1  | self = <tests.helpers.NetworkTest object at 0x7882c02da290>
kytos-1  | 
kytos-1  |     def wait_controller_start(self):
kytos-1  |         """Wait until controller starts according to core/status API."""
kytos-1  |         wait_count = 0
kytos-1  |         last_error = ""
kytos-1  |         while wait_count < 60:
kytos-1  |             try:
kytos-1  |                 response = requests.get('http://127.0.0.1:8181/api/kytos/core/status/', timeout=3)
kytos-1  |                 assert response.json()['response'] == 'running', response.text
kytos-1  |                 break
kytos-1  |             except Exception as exc:
kytos-1  |                 last_error = str(exc)
kytos-1  |                 time.sleep(0.5)
kytos-1  |                 wait_count += 0.5
kytos-1  |         else:
kytos-1  |             msg = f"Timeout while starting Kytos controller. Last error: {last_error}"
kytos-1  | >           raise Exception(msg)
kytos-1  | E           Exception: Timeout while starting Kytos controller. Last error: HTTPConnectionPool(host='127.0.0.1', port=8181): Read timed out. (read timeout=3)
kytos-1  | 
kytos-1  | tests/helpers.py:422: Exception
kytos-1  | tests/test_e2e_31_of_lldp.py::TestE2EOfLLDP::test_010_liveness_intf_deletion: 2026-02-13,02:35:06.127242 - 2026-02-13,02:42:14.019770
kytos-1  | =========================== rerun test summary info ============================
kytos-1  | RERUN tests/test_e2e_30_of_lldp.py::TestE2EOfLLDP::test_010_disable_of_lldp
kytos-1  | RERUN tests/test_e2e_31_of_lldp.py::TestE2EOfLLDP::test_010_liveness_intf_deletion
kytos-1  | RERUN tests/test_e2e_31_of_lldp.py::TestE2EOfLLDP::test_010_liveness_intf_deletion
kytos-1  | =========================== short test summary info ============================
kytos-1  | FAILED tests/test_e2e_31_of_lldp.py::TestE2EOfLLDP::test_010_liveness_intf_deletion
kytos-1  | = 1 failed, 277 passed, 9 skipped, 7 xfailed, 7 xpassed, 1355 warnings, 3 rerun in 13730.45s (3:48:50) =

kytos-1 exited with code 1

This run has been consistently failing that one of_lldp test. I'll have to investigate what's meant to happen here to understand why the error is occurring, and how my changes caused it.

@Ktmi
Author

Ktmi commented Feb 20, 2026

@viniarck I can't figure out the E2E test issue. I did some further experimenting with it, and the packets don't appear to be coming from of_lldp: shutting down the controller before checking whether any packets are being sent across those links still results in packets being detected. Other than that, this is feature complete, with the possible exception of some web-ui changes for updating tag ranges on both interfaces and links.


@viniarck viniarck left a comment


Just did another partial review.

I'll do another soon.

Did you also have the chance to try out all the scripts?

[UNRELEASED] - Under development
********************************

Changed

The changelog should include the new endpoints.

if (
link.id in self._link_tags_updated_at
and self._link_tags_updated_at[link.id] > event.timestamp
):

An early return is missing here; as written, it would still process an old preempted event.

Can you review this again?
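To illustrate the review point, here is a minimal, self-contained sketch of a timestamp guard with the early return in place. The class and attribute names (`LinkTagGuard`, `_link_tags_updated_at`) mirror the diff but are assumptions, not the NApp's actual implementation:

```python
from datetime import datetime, timedelta


class LinkTagGuard:
    """Sketch: skip tag events older than the last recorded update."""

    def __init__(self):
        # link_id -> datetime of the most recent tag update
        self._link_tags_updated_at = {}

    def handle(self, link_id: str, event_timestamp: datetime) -> bool:
        updated_at = self._link_tags_updated_at.get(link_id)
        if updated_at is not None and updated_at > event_timestamp:
            # Early return: a newer tag update already landed, so this
            # stale, preempted event must not be processed any further.
            return False
        self._link_tags_updated_at[link_id] = event_timestamp
        # ... process the event and persist link details here ...
        return True
```

Without the early `return False`, execution would fall through and process the stale event anyway, which is exactly the bug flagged above.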


This will also lead to other issues, since it won't write correctly to db.link_details. After a restart it would leak vlans; for example, with a link that has one EVC:

kytos $> link_id = '78282c4d5b579265f04ebadc4405ca1b49628eb1d684bb45e5d0607fa8b713d0'

kytos $> 

kytos $> controller.links[link_id].available_tags
Out[2]: {'vlan': [[2, 3798], [3800, 4094]]}

After restart:


kytos $> link_id = '78282c4d5b579265f04ebadc4405ca1b49628eb1d684bb45e5d0607fa8b713d0'

kytos $> controller.links[link_id].available_tags
Out[2]: {'vlan': [[1, 3798], [3800, 4094]]}
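The leak in the console output above can be checked mechanically: any tag that is available after the restart but was allocated before it has leaked. A small sketch (the helper names are illustrative, not part of the NApp):

```python
def expand(ranges):
    """Expand inclusive [start, end] pairs into a set of tag values."""
    tags = set()
    for start, end in ranges:
        tags.update(range(start, end + 1))
    return tags


def leaked_tags(before_restart, after_restart):
    """Tags that reappeared as available after a restart have leaked."""
    return sorted(expand(after_restart) - expand(before_restart))


before = [[2, 3798], [3800, 4094]]  # vlans 1 and 3799 in use by EVCs
after = [[1, 3798], [3800, 4094]]   # after restart
leaked = leaked_tags(before, after)  # [1] -> vlan 1 leaked back
```

Here vlan 1, which the EVC had allocated, wrongly returns to the available pool after restart because the updated available_tags were never written to db.link_details.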

port_num = int(port_num)
return self.db.switches.find_one_and_update(
{"_id": switch_id},
{"$unset": {"interfaces.$[iface]": 1}},

Won't this leave a null interface? Did you mean to use $pull instead of $unset?

Can you review this again?
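The distinction matters because MongoDB's `$unset` on an array element replaces it with `null` rather than shrinking the array, while `$pull` removes matching elements outright. A pure-Python sketch of the two behaviors (the `port_number` field name is illustrative, not the NApp's actual schema):

```python
def apply_unset(interfaces, port_num):
    """Mimic {"$unset": {"interfaces.$[iface]": 1}}: nulls out the slot."""
    return [None if i["port_number"] == port_num else i for i in interfaces]


def apply_pull(interfaces, port_num):
    """Mimic {"$pull": {"interfaces": {"port_number": port_num}}}."""
    return [i for i in interfaces if i["port_number"] != port_num]


interfaces = [{"port_number": 1}, {"port_number": 2}]
apply_unset(interfaces, 1)  # [None, {"port_number": 2}] -- stale null entry
apply_pull(interfaces, 1)   # [{"port_number": 2}] -- element fully removed
```

The `null` left by `$unset` would then surface as a phantom interface when the document is loaded back.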

class TopoController:
"""TopoController."""

def __init__(self, get_mongo=lambda: Mongo()) -> None:

Trivial comment, but we've been using it this way in other NApps too; do we really need this equivalent change and to start deviating from the rest?


@viniarck viniarck left a comment


There's a major issue after restart that's leaking vlans.

Make sure to fully exercise your change.
