-
Notifications
You must be signed in to change notification settings - Fork 938
Description
Describe the bug
Introduction of the async confbridge in #3183 also introduced race conditions.
The update changed for how long conf->mutex is held when the confbridge tick is being processed.
Before, the lock would be held for the entirety of the tick; now it is "only" held
while getting a task from the op queue, but the actual task (add, remove, (dis-)connect port) runs unsynchronized.
Some of the other (public) confbridge functions read/modify members of the confbridge or confports
after acquiring the conf mutex. Previously, this wasn't an issue because while these functions were reading/modifying data,
the confbridge had to wait for those functions to finish.
However, this is no longer the case. These functions still (except for two, see below) acquire the conf mutex (for modifying port data)
but the confbridge itself doesn't hold the conf mutex while processing the ports, thus we've got a race condition.
In the current master branch, after manually reviewing the code I found issues in the following functions:
pjmedia_conf_get_port_count()
Accesses conf->port_cnt unsynchronized (was the case even before introduction of the async confbridge).
Maybe using an atomic would help? The member is modified in create_sound_port() (during creation of the confbridge, so there shouldn't be an issue),
in op_add_port() and op_remove_port(); both of which are called with no lock.
pjmedia_conf_get_connect_count()
Accesses conf->connect_cnt unsynchronized (was the case even before introduction of the async confbridge).
Same as above basically.
The member is modified in op_connect_ports(), op_disconnect_ports() both if which run unsynchronized.
The member is read in pjmedia_conf_connect_port() with the conf mutex being held.
pjmedia_conf_configure_port()
Potentially sets tx_setting and/or rx_setting of the port while the confbridge is servicing ports.
This can be problematic because previous assumptions don't hold any longer:
If for example a direction is disabled, then with the synchronous version we would know for a fact that after the function returns
the port will no longer get get_frame() or put_frame() calls (depending on direction that has been disabled).
With the introduction of the async confbridge, this is no longer true because the confbridge itself locks conf->mutex only while getting a task from the op queue
but no longer while it is processing the ports, so even if the function returns, there may be calls into the confport (at least until the next tick I suppose)
there is not really a way of knowing when the port is "really" disabled.
pjmedia_conf_adjust_tx_level()
Sets tx_adj_level of the port while it is being serviced.
pjmedia_conf_adjust_rx_level()
Sets rx_adj_level of the port while it is being serviced.
pjmedia_conf_get_signal_level()
Reads tx_level and rx_level of the port while it is being serviced.
pjmedia_conf_adjust_conn_level()
Sets listener_adj_level while the ports are being serviced in conference.c get_frame()
Has this ever come up before?
Is anyone else having issues with this?
I did the review because I was reviewing a race condition between a WAV player EOF callback and a "start"
function both of which make use of pjmedia_conf_configure_port().
Steps to reproduce
Hard to do with the test app; needs custom code to provoke race conditions I guess...
PJSIP version
master
Context
The (app level) race condition I was investigating runs no Linux/Windows.
Log, call stack, etc
No logs