update run marker to be on at start and off at stop-trigger-sources by wesketchum · Pull Request #10 · DUNE-DAQ/crtmodules

wesketchum · 2025-07-02T12:21:56Z

(… trying to avoid an infinite while loop)

This is in response to the issue raised here: DUNE-DAQ/daqsystemtest#218

@bieryAtFnal reports that this resolved the problem when he ran it (though @denizergonul and I were not able to reproduce the problem in our own tests).

…trying to avoid an infinite while loop)

wesketchum · 2025-07-02T12:25:58Z

To test:

source /cvmfs/dunedaq.opensciencegrid.org/setup_dunedaq.sh
setup_dbt latest_v5

dbtx-create-workarea-from-recipe.py --use-ref https://gist.githubusercontent.com/wesketchum/c64eb0dedccdf43219b86c41d68f7fcc/raw

cd testarea_wketchum_crt_test_fix*
source env.sh
dbt-build
dbt-workarea-env

drunc-unified-shell ssh-standalone config/daqsystemtest/example-configs.data.xml local-crt-grenoble-1x1-config ${USER}-local-test boot wait 5 conf wait 3 start --run-number 134 enable-triggers wait 30 disable-triggers drain-dataflow stop-trigger-sources stop scrap terminate

denizergonul · 2025-07-03T10:25:03Z

Thank you for your efforts to solve the "stop-trigger-sources" transition issue. I must say I cannot see how the current implementation may be causing the issue and how this PR will fix it.

I used dpdklibs::DPDKReaderModule as a reference to write this part.

In DPDKReaderModule::do_configure:

if (!m_run_marker.load()) {
  set_running(true);
  TLOG() << "Starting iface wrappers.";
  for (auto& [iface_id, iface] : m_ifaces) {
    iface->start();
  }
} else {
  TLOG_DEBUG(5) << "iface wrappers are already running!";
}

We don’t have m_ifaces for CRT, but what the start function does is to schedule work (IfaceWrapper::rx_runner) for the RX threads. I believe this is similar to:

m_producer_thread.set_work(&CRTGrenobleReaderModule::run_produce, this);

RX threads then start reading data over DPDK. But when they arrive to the point where the handle_eth_payload call is, they will not make the call because IfaceWrapper::m_lcore_enable_flow is still false at this point. In other words, data taking has started, but data is not being handled yet.

Parallelly, this is CRTGrenobleReaderModule::do_conf:

// Configure HW interface?
if (!m_run_marker.load()) {
  set_running(true);
} else {
  TLOG_DEBUG(5) << "Already running!";
}

We could put the call to set_work here, but we were not sure if this “start taking data but don’t handle it yet” logic is wanted for CRT.

Moving on to DPDKReaderModule::do_start:

for (auto& [iface_id, iface] : m_ifaces) {
  iface->enable_flow();
}

This is the trigger for data handling to start.

CRTGrenobleReaderModule::do_start:

enable_flow(); 
m_producer_thread.set_work(&CRTGrenobleReaderModule::run_produce, this);

As I mentioned, the logic is currently not being used. We enable the flow before scheduling the work anyway, so, currently for CRT, we start data taking and immediately handling. We can change this by moving set_work inside do_conf, if desired.

Until this point I covered "conf" and "start". I don’t see the point of duplicating set_running(true); at start as it is already being done at conf.

Moving on to DPDKReaderModule::do_stop:

for (auto& [iface_id, iface] : m_ifaces) {
  iface->disable_flow();
}

This stops data handling, but data taking will continue.

CRTGrenobleReaderModule::do_stop:

disable_flow();

Same for CRT. (Again, maybe we don’t even want this.)

Finally, DPDKReaderModule::do_scrap:

if (m_run_marker.load()) {
  TLOG() << "Raising stop through variables!";
  set_running(false);
  TLOG() << "Stopping iface wrappers.";
  for (auto& [iface_id, iface] : m_ifaces) {
    iface->stop();
  }
  ealutils::wait_for_lcores();
  TLOG() << "Stoppped DPDK lcore processors and internal threads...";
} else {
  TLOG_DEBUG(5) << "DPDK lcore processor is already stopped!";
}

What IfaceWrapper::stop does is to break the data taking while loop. For CRT, this corresponds to checking m_run_marker anyway (while (m_run_marker.load())), so, we don’t need to do anything else. Then ealutils::wait_for_lcores corresponds to CRT waiting for the thread to end.

CRTGrenobleReaderModule::do_scrap:

if (m_run_marker.load()) {
  TLOG() << "Raising stop through variables!";
  set_running(false);
  while (!m_producer_thread.get_readiness()) {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
} else {
  TLOG_DEBUG(5) << "Already stopped!";
}

Now we covered "stop" and "scrap". Similarly, I don’t see the point of duplicating set_running(false); at stop as it is already being done at scrap.

jcfreeman2 · 2025-07-10T14:28:11Z

As far as the problem of a timeout occurring on the stop triggers transition, I don't believe this PR actually solves it. Please note that in my experience, the problem only occurs roughly ~20% of the time which means we need to do multiple runs to figure out if things are working. However, I created a test build which uses this PR (CRTFD_DEV_250709_A9) (*) and I just created a work area on daq.fnal.gov based on it; after running

drunc-unified-shell ssh-standalone config/daqsystemtest/example-configs.data.xml local-crt-grenoble-1x1-config ${USER}-local-test boot --no-override-logs wait 5 conf wait 3 start --run-number 111 enable-triggers wait 10 disable-triggers drain-dataflow stop-trigger-sources stop scrap terminate

10 times in a loop, 3 of those times we get the timeout problem.

(*) https://github.com/DUNE-DAQ/daq-release/actions/runs/16180927746

wesketchum · 2025-08-15T14:23:57Z

Closing as it's resolved otherwise in a different PR.

update run marker to be on at start and off at stop-trigger-sources (…

c2f1599

…trying to avoid an infinite while loop)

wesketchum requested review from bieryAtFnal and denizergonul July 2, 2025 12:22

wesketchum mentioned this pull request Jul 2, 2025

cleanup in action plan, as recommended by Kurt DUNE-DAQ/daqsystemtest#219

Merged

jcfreeman2 mentioned this pull request Jul 10, 2025

m_raw_receiver_sleep_us not initialized in SocketWriterModule DUNE-DAQ/asiolibs#8

Closed

wesketchum closed this Aug 15, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

update run marker to be on at start and off at stop-trigger-sources #10

update run marker to be on at start and off at stop-trigger-sources #10
wesketchum wants to merge 1 commit intopatch/fddaq-v5.3.xfrom
wketchum/modify_run_marker_location

wesketchum commented Jul 2, 2025

Uh oh!

wesketchum commented Jul 2, 2025

Uh oh!

denizergonul commented Jul 3, 2025

Uh oh!

jcfreeman2 commented Jul 10, 2025

Uh oh!

wesketchum commented Aug 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

wesketchum commented Jul 2, 2025

Uh oh!

wesketchum commented Jul 2, 2025

Uh oh!

denizergonul commented Jul 3, 2025

Uh oh!

jcfreeman2 commented Jul 10, 2025

Uh oh!

wesketchum commented Aug 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants