Summary
Multiple Neutron API workers concurrently updating the same metadata port can cause data inconsistency between Neutron and OVN databases, where OVN ends up with fewer IP addresses than Neutron despite having matching revision numbers.
Environment
- OpenStack with Neutron using OVN backend
- Multiple Neutron API workers (distributed across multiple nodes)
- Galera cluster for Neutron database
- OVN with clustered OVSDB (Raft consensus)
Problem Description
Observed Behavior
When creating multiple subnets on a network in rapid succession, the metadata port gets updated by multiple workers simultaneously. This results in:
- OVN database having only 2 IP addresses for the metadata port
- Neutron database having 4 IP addresses for the same port
- Both databases showing the same revision number (5)
- No RevisionConflict exceptions being raised
All updates happened within 29ms across different controller nodes.
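The symptom above (fewer IPs in OVN, identical revision numbers) can be checked for mechanically. The following is a minimal sketch, not Neutron code: the function names are hypothetical, the Neutron port is assumed to be a dict with `fixed_ips` and `revision_number`, and the OVN side is assumed to expose its addresses as strings of the form "MAC IP IP ..." together with a revision number.

```python
def ovn_ips(addresses):
    """Extract IPs from OVN-style address strings ("MAC IP1 IP2 ...").

    Assumption: the first token of each entry is the MAC address.
    """
    ips = set()
    for entry in addresses:
        parts = entry.split()
        ips.update(parts[1:])
    return ips


def find_silent_divergence(neutron_port, ovn_addresses, ovn_revision):
    """Return IPs present in Neutron but missing in OVN when revisions match.

    An empty list means either the IP sets agree or the revisions differ,
    in which case a revision-based sync could still detect the problem.
    """
    neutron_ips = {ip["ip_address"] for ip in neutron_port["fixed_ips"]}
    missing = neutron_ips - ovn_ips(ovn_addresses)
    same_revision = neutron_port["revision_number"] == ovn_revision
    return sorted(missing) if same_revision and missing else []


# Illustrative data only: 4 IPs in Neutron, 2 in OVN, both at revision 5.
port = {
    "revision_number": 5,
    "fixed_ips": [
        {"subnet_id": "s1", "ip_address": "10.0.0.2"},
        {"subnet_id": "s2", "ip_address": "10.0.1.2"},
        {"subnet_id": "s3", "ip_address": "10.0.2.2"},
        {"subnet_id": "s4", "ip_address": "10.0.3.2"},
    ],
}
missing_ips = find_silent_divergence(
    port, ["fa:16:3e:00:00:01 10.0.0.2 10.0.1.2"], 5)
```

Because the revisions match, a check that compares only revision numbers would report this port as consistent; comparing the IP sets is what exposes the gap.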
Root Cause Analysis
Transaction Isolation Issue
- Neutron uses the REPEATABLE READ isolation level
- Each worker starts a transaction and reads the metadata port state
- Due to REPEATABLE READ, workers cannot see concurrent updates from other workers
- All workers read the same initial state and proceed with stale data
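The effect of these stale snapshots can be illustrated with a toy model. This is a sketch under stated assumptions, not Neutron code: `run_race` and the subnet names are hypothetical, the Neutron side is assumed to accumulate one allocation per worker, and the OVN side is assumed to be overwritten with each worker's full (stale) address list.

```python
def run_race(initial, new_subnets):
    """Simulate workers that all snapshot the port before anyone commits."""
    neutron_ips = set(initial)  # assumption: per-row inserts accumulate
    ovn_ips = set(initial)      # assumption: whole list replaced on each write

    # Every worker takes its snapshot before any other worker commits,
    # which is what REPEATABLE READ gives each transaction.
    snapshots = {subnet: frozenset(neutron_ips) for subnet in new_subnets}

    for subnet in new_subnets:
        stale_view = snapshots[subnet] | {subnet}
        neutron_ips.add(subnet)      # additive write, nothing is lost
        ovn_ips = set(stale_view)    # last writer wins with a stale list
    return neutron_ips, ovn_ips


neutron, ovn = run_race({"subnet-a"}, ["subnet-b", "subnet-c", "subnet-d"])
```

In this model Neutron ends with four entries while OVN keeps only the initial one plus the last writer's, which mirrors the 4 versus 2 split reported above.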
Code Flow
The race occurs in update_metadata_port (neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py:2784-2849):
def update_metadata_port(self, context, network, subnet=None):
    # Worker reads metadata port once at the beginning
    metadata_port = self.create_metadata_port(context, network)  # Line 2814
    # Uses this stale metadata_port throughout the function
    port_subnet_ids = {ip['subnet_id'] for ip in metadata_port['fixed_ips']}  # Line 2820
    # Updates based on stale data
    if subnet_ids != port_subnet_ids:
        update_metadata_port_fixed_ips(metadata_port,  # Passes stale port
                                       subnet_ids - port_subnet_ids,
                                       port_subnet_ids - subnet_ids)
Why StaleDataError Isn't Raised
Despite SQLAlchemy having version_id_col support via revision_number:
- Port object not directly modified: Updates to fixed_ips modify related IPAllocation objects, not the Port object itself
- Indirect revision bumping: IPAllocation has revises_on_change = ('port',), which bumps the Port revision indirectly
- REPEATABLE READ prevents version detection: Even when the revision is bumped, other workers can't see it due to transaction isolation
- Insufficient revision checking: CheckRevisionNumberCommand only prevents the revision from going backwards; it doesn't check whether the revision changed since the initial read
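The difference between a "not going backwards" check and a "changed since read" check can be sketched as follows. Both functions are hypothetical illustrations, not the actual CheckRevisionNumberCommand logic; in particular, treating an equal revision as acceptable is an assumption of this sketch.

```python
def monotonic_check(stored_revision, incoming_revision):
    """Accept the write unless it would move the revision backwards.

    Assumption: an equal revision is accepted.
    """
    return incoming_revision >= stored_revision


def compare_and_swap_check(stored_revision, revision_at_read):
    """Accept the write only if nothing changed since the writer's read."""
    return stored_revision == revision_at_read


# Two workers read the port at revision 4 and each prepares revision 5.
# The first write has already landed, so the stored revision is now 5.
stored = 5
second_writer_read = 4
second_writer_incoming = 5

accepted_monotonic = monotonic_check(stored, second_writer_incoming)
accepted_cas = compare_and_swap_check(stored, second_writer_read)
```

Under these assumptions the monotonic check lets the second, stale write through without any conflict, while a compare-and-swap style check would reject it because the stored revision no longer matches what the writer originally read.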
Call Stack
create_subnet (API call from different workers)
↓
_create_subnet_postcommit (ML2 plugin)
↓
create_subnet (OVN client)
↓
update_metadata_port (reads port once, uses stale data)
↓
update_port (ML2 plugin with REPEATABLE READ transaction)
↓
OVN database update (with stale fixed_ips)