update run marker to be on at start and off at stop-trigger-sources #10
wesketchum wants to merge 1 commit into patch/fddaq-v5.3.x from
To test:
Thank you for your efforts to solve the "stop-trigger-sources" transition issue. I must say I cannot see how the current implementation may be causing the issue and how this PR will fix it. I used

In

    if (!m_run_marker.load()) {
      set_running(true);
      TLOG() << "Starting iface wrappers.";
      for (auto& [iface_id, iface] : m_ifaces) {
        iface->start();
      }
    } else {
      TLOG_DEBUG(5) << "iface wrappers are already running!";
    }

We don’t have

    m_producer_thread.set_work(&CRTGrenobleReaderModule::run_produce, this);

RX threads then start reading data over DPDK. But when they arrive at the point where the

In parallel, this is

    // Configure HW interface?
    if (!m_run_marker.load()) {
      set_running(true);
    } else {
      TLOG_DEBUG(5) << "Already running!";
    }

We could put the call to

Moving on to

    for (auto& [iface_id, iface] : m_ifaces) {
      iface->enable_flow();
    }

This is the trigger for data handling to start.

    enable_flow();
    m_producer_thread.set_work(&CRTGrenobleReaderModule::run_produce, this);

As I mentioned, the logic is currently not being used. We enable the flow before scheduling the work anyway, so currently, for CRT, we start data taking and immediately start handling. We can change this by moving

Until this point I covered "conf" and "start". I don’t see the point of duplicating

Moving on to

    for (auto& [iface_id, iface] : m_ifaces) {
      iface->disable_flow();
    }

This stops data handling, but data taking will continue.

CRTGrenobleReaderModule::do_stop:

    disable_flow();

Same for CRT. (Again, maybe we don’t even want this.)

Finally, DPDKReaderModule::do_scrap:

    if (m_run_marker.load()) {
      TLOG() << "Raising stop through variables!";
      set_running(false);
      TLOG() << "Stopping iface wrappers.";
      for (auto& [iface_id, iface] : m_ifaces) {
        iface->stop();
      }
      ealutils::wait_for_lcores();
      TLOG() << "Stoppped DPDK lcore processors and internal threads...";
    } else {
      TLOG_DEBUG(5) << "DPDK lcore processor is already stopped!";
    }

What

    if (m_run_marker.load()) {
      TLOG() << "Raising stop through variables!";
      set_running(false);
      while (!m_producer_thread.get_readiness()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
      }
    } else {
      TLOG_DEBUG(5) << "Already stopped!";
    }

Now we covered "stop" and "scrap". Similarly, I don’t see the point of duplicating
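For reference, the last snippet quoted above polls `get_readiness()` with no upper bound, so it keeps waiting for as long as the worker fails to report ready. The following is a minimal sketch, not the module's actual code, of how such a wait could be given a deadline. `wait_with_deadline` and `SketchReader` are hypothetical names introduced only for illustration, and the worker body is a stand-in for real packet handling.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper (not part of the DAQ code base): poll `ready` every
// `poll` interval until it returns true or `timeout` elapses.
// Returns true if readiness was observed, false if the deadline passed first.
template <typename Pred>
bool wait_with_deadline(Pred ready,
                        std::chrono::milliseconds timeout,
                        std::chrono::milliseconds poll = std::chrono::milliseconds(10)) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (!ready()) {
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;
    }
    std::this_thread::sleep_for(poll);
  }
  return true;
}

// Hypothetical stand-in for a reader module with a run marker and one worker.
class SketchReader {
public:
  ~SketchReader() { stop(std::chrono::milliseconds(1000)); }

  // Raise the run marker and launch the worker; returns false if already running.
  bool start() {
    if (m_run_marker.exchange(true)) {
      return false;
    }
    m_finished.store(false);
    m_worker = std::thread([this] {
      while (m_run_marker.load()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1)); // stand-in for work
      }
      m_finished.store(true);
    });
    return true;
  }

  // Lower the run marker and wait, with a deadline, for the worker to finish.
  // Returns false if the worker did not report completion within `timeout`.
  bool stop(std::chrono::milliseconds timeout) {
    if (!m_run_marker.exchange(false)) {
      return true; // already stopped
    }
    const bool finished =
      wait_with_deadline([this] { return m_finished.load(); }, timeout);
    m_worker.join(); // safe in this sketch: the worker always exits once the marker is off
    return finished;
  }

private:
  std::atomic<bool> m_run_marker{false};
  std::atomic<bool> m_finished{false};
  std::thread m_worker;
};
```

With a bounded wait, a caller can log a timeout and decide how to proceed instead of blocking the transition indefinitely. Whether an unbounded wait is actually what causes the reported timeout is not established in this thread.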
As for the problem of a timeout occurring on the stop-triggers transition, I don't believe this PR actually solves it. Please note that, in my experience, the problem only occurs roughly 20% of the time, which means we need to do multiple runs to figure out whether things are working. However, I created a test build which uses this PR ( 10 times in a loop, 3 of those times we get the timeout problem. (*) https://github.com/DUNE-DAQ/daq-release/actions/runs/16180927746

Closing, as this was resolved in a different PR.
(… trying to avoid an infinite while loop)
This is in response to the issue raised here: DUNE-DAQ/daqsystemtest#218
@bieryAtFnal reports that this resolved the problem when he ran it (though @denizergonul and I were not able to reproduce the problem in our own tests).
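To make the change named in the PR title concrete, here is a small sketch contrasting two run-marker lifetimes. It is an illustration under assumptions, not the module code: `RunMarkerSketch` and `MarkerPolicy` are hypothetical names, the command strings are the transition names used in this thread, and the "conf to scrap" lifetime is my reading of the code quoted in the review rather than something the thread states outright.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical policies for when the run marker is raised and lowered.
enum class MarkerPolicy {
  ConfToScrap,               // assumed reading of the code quoted in the review
  StartToStopTriggerSources  // what the PR title describes
};

// Hypothetical stand-in that only tracks the run marker across transitions.
class RunMarkerSketch {
public:
  explicit RunMarkerSketch(MarkerPolicy policy) : m_policy(policy) {}

  // Apply one transition command; commands that do not affect the marker are ignored.
  void handle(const std::string& command) {
    const bool from_title = (m_policy == MarkerPolicy::StartToStopTriggerSources);
    const std::string on_command = from_title ? "start" : "conf";
    const std::string off_command = from_title ? "stop-trigger-sources" : "scrap";
    if (command == on_command) {
      m_run_marker.store(true);
    } else if (command == off_command) {
      m_run_marker.store(false);
    }
  }

  bool running() const { return m_run_marker.load(); }

private:
  MarkerPolicy m_policy;
  std::atomic<bool> m_run_marker{false};
};
```

Under the second policy, any loop gated on the marker can exit as soon as "stop-trigger-sources" arrives, rather than only at "scrap".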