@Sikabo Sikabo commented Oct 7, 2025

Update to build and run on modern Linux distributions with current toolchains, for example Debian Trixie.

Tested with: NS-3.35 (and NS-3.45 but probably simpler to start with 3.35).

glance- and others added 7 commits October 7, 2025 14:06
Modern glibc has dropped its inline function mapping, so the functions now
use their "real" names as symbols.

This adapts the dce code to this.

The old int ver is still passed as a dummy parameter via the c++ class
layer for Fxstat/Fxstat64 but that can be dropped later.

Signed-off-by: Anders Martinsson <[email protected]>
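A minimal sketch of what the commit message above describes, assuming the affected functions are the stat family: older glibc routed `fstat()` through a versioned helper such as `__fxstat(ver, fd, buf)`, while modern glibc exports `fstat` under its real name. The names `sketch_fstat`, `sketch_fxstat` and `path_is_directory` are hypothetical and are not taken from the DCE sources; the sketch only illustrates keeping the old `ver` argument as an ignored dummy parameter.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical wrapper that forwards to the "real" symbol name.
int sketch_fstat(int fd, struct stat* buf)
{
    assert(buf != nullptr);
    return ::fstat(fd, buf);
}

// Hypothetical legacy entry point: the version argument is still accepted
// so that old call sites compile, but it is ignored.
int sketch_fxstat(int /*ver*/, int fd, struct stat* buf)
{
    return sketch_fstat(fd, buf);
}

// Demonstration helper: returns true if `path` can be opened and is a directory.
bool path_is_directory(const char* path)
{
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) {
        return false;
    }
    struct stat st {};
    int rc = sketch_fxstat(1, fd, &st);
    ::close(fd);
    return rc == 0 && S_ISDIR(st.st_mode);
}
```

Calling `path_is_directory("/")` goes through the legacy-style signature and ends up in the plain `fstat` symbol.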
This tool expects all libraries to be linked via --no-as-needed, but that
doesn't work as we want it to for some reason, so work around it with
a dlopen call to ensure the lib is loaded.

Signed-off-by: Anders Martinsson <[email protected]>
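The dlopen workaround mentioned above could look roughly like the following sketch. The function name `ensure_library_loaded` and the choice of flags are assumptions for illustration; the commit itself does not say which library is loaded or how.

```cpp
#include <cassert>
#include <cstdio>
#include <dlfcn.h>

// Returns true if the shared object could be loaded into the process.
// The handle is intentionally never closed, so the library stays resident
// even if the linker dropped it as "not needed".
bool ensure_library_loaded(const char* soname)
{
    assert(soname != nullptr);
    void* handle = ::dlopen(soname, RTLD_NOW | RTLD_GLOBAL);
    if (handle == nullptr) {
        std::fprintf(stderr, "dlopen(%s) failed: %s\n", soname, ::dlerror());
        return false;
    }
    return true;
}
```

On older glibc versions this may need to be linked with `-ldl`; on recent glibc `dlopen` is part of libc itself.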
libio has been merged into glibc, and the _IO_* functions are now gone.

Signed-off-by: Anders Martinsson <[email protected]>
Signed-off-by: Anders Martinsson <[email protected]>
@Sikabo
Author

Sikabo commented Oct 13, 2025

This branch should build on Debian Trixie with latest NS-3 release. One patch to NS-3 is needed. Sikabo/ns-3-dev-git@e7d2a51

I have not looked into needed DCE changes to get rid of the NS-3 patch.

@Gabrielcarvfer

Regarding the ns-3 patch: it is not needed. This was changed recently upstream because there were issues with it.

Basically, you need to remove all instances of

virtual TypeId GetInstanceTypeId (void) const;

and

TypeId
*::GetInstanceTypeId (void) const
{
  return *::GetTypeId ();
}

Then they should be resolved properly.

See https://gitlab.com/nsnam/ns-3-dev/-/merge_requests/2287

Signed-off-by: Anders Martinsson <[email protected]>
Signed-off-by: Anders Martinsson <[email protected]>
From NS-3 change log:
* (core) Deprecated "EventId::IsRunning()". It has been replaced with
"EventId::IsPending()".

Signed-off-by: Anders Martinsson <[email protected]>
@Sikabo
Author

Sikabo commented Nov 25, 2025

Thanks @Gabrielcarvfer

This pull request should build with the latest NS3 release (NS-3.46) on Debian Trixie.

@tomhenderson
Collaborator

> Thanks @Gabrielcarvfer
>
> This pull request should build with the latest NS3 release (NS-3.46) on Debian Trixie.

Hello, thanks for this PR. We will try to find time to review and test.

How much have you tested DCE (test suites, a real workload), vs. making the build succeed? The reason that I ask is that the previous effort to modernize DCE succeeded to make the build pass, and it seemed to work with some existing tests, but later when I tried to use it on a project, it turned out to be unstable.

If you did successfully test, can you sketch out what you did (system tested on, how you ran tests)?

@Sikabo
Author

Sikabo commented Dec 10, 2025

> > Thanks @Gabrielcarvfer
> > This pull request should build with the latest NS3 release (NS-3.46) on Debian Trixie.
>
> Hello, thanks for this PR. We will try to find time to review and test.

I still use the patched NS-3.46 that allows me to override GetInstanceTypeId. I have not had time to look at the runtime error I get without it. I think it's a usage error on my side, but I'm not ruling anything out.

> How much have you tested DCE (test suites, a real workload), vs. making the build succeed? The reason that I ask is that the previous effort to modernize DCE succeeded to make the build pass, and it seemed to work with some existing tests, but later when I tried to use it on a project, it turned out to be unstable.

I use DCE a lot, mainly for a large multithreaded C program but also for a Rust project, mostly for testing UDP communication but also TCP. I have been running DCE for a long time, and this is as stable as when I started using DCE (when DCE ran on the latest version of Debian).

> If you did successfully test, can you sketch out what you did (system tested on, how you ran tests)?

I will try to produce examples that are suitable to share.

@Sikabo
Author

Sikabo commented Dec 11, 2025

I have prepared a branch with a working example: https://github.com/Sikabo/ns-3-dce/tree/dce_example

This example requires that object.h in NS-3 is patched. NS-3-DCE needs a rebuild after the patch is applied (clean, then build).

- TypeId GetInstanceTypeId() const final;
+ TypeId GetInstanceTypeId() const;

Run with command below and look at output in files-0 and files-1:
./waf --run dce_example

This new branch is probably a better starting-point for trying to get things working without a patched version of NS-3 than the branch in this pull-request.

Is this enough or do you want more details about how I get this running?

@Sikabo
Author

Sikabo commented Dec 11, 2025

I have updated https://github.com/Sikabo/ns-3-dce/tree/dce_example with two new commits, so it should now work without modifying object.h in NS-3. Many thanks to @Gabrielcarvfer. I don't think I have understood everything in enough detail, so there is probably room for improvement.

glance- and others added 2 commits December 12, 2025 15:50
This is to fix direct-code-execution#57

There are a couple of new TODOs that would be good to address.

Signed-off-by: Anders Martinsson <[email protected]>
@Sikabo
Author

Sikabo commented Dec 12, 2025

I have updated the branch https://github.com/Sikabo/ns-3-dce/commits/sikabo_master to include the last parts that were needed to get things working without modifying NS-3.

@Sikabo
Author

Sikabo commented Dec 15, 2025

@tomhenderson: This works for me on Debian Trixie. Feel free to ask if there are any questions.

#!/usr/bin/env bash

mkdir -p "dce-build/source"
pushd "dce-build" || exit 1

INSTALLDIR="$PWD/build/"
NS3_VER=3.46

# Build NS-3
pushd "source" || exit 1
git clone --branch "ns-$NS3_VER" https://gitlab.com/nsnam/ns-3-dev.git "ns-$NS3_VER"
pushd "ns-$NS3_VER" || exit 1

./ns3 configure --disable-werror "--prefix=$INSTALLDIR" --build-profile=release
./ns3 build
./ns3 install

popd || exit 1 # "ns-$NS3_VER"

# Build DCE
git clone --branch sikabo_master https://github.com/Sikabo/ns-3-dce.git "ns-3-dce"
pushd "ns-3-dce" || exit 1

./waf configure "--with-ns3=$INSTALLDIR" "--prefix=$INSTALLDIR" --enable-opt
./waf build

export DCE_PATH="$PWD/myscripts/mc2"
./waf --run "dce-mt2"
cat files-1/var/log/11279/stdout

@Gabrielcarvfer

Thanks for the code. I will try to write a GitHub Actions workflow to replace CircleCI. I just need a few more days (very busy end of year).

@Gabrielcarvfer

Gabrielcarvfer commented Dec 18, 2025

OK, I gave it a go and found multiple examples/tests failing: some encoding issues in thread management, and path issues. Not sure whether this is the MR's fault or a misconfiguration somewhere.

dce-tcp-simple
dce-ping
dce-iperf
test: dce-process-manager


Don't know how to configure dynamic library path for the platform 'linux'; assuming it's LD_LIBRARY_PATH.
NS_ASSERT failed, cond="exeFullPath.length () > 0", msg="Executable 'iperf' not found !  Please check your DCE_PATH and DCE_ROOT environment variables.", +0.600000000s 1 file=../model/dce-manager.cc, line=262
NS_FATAL, terminating
terminate called without an active exception
Command ['/dce-build/source/ns-3-dce/build/bin/dce-iperf'] terminated with signal SIGIOT. Run it under a debugger to get more information (./waf --run <program> --command-template="gdb --args %s <args>").

#0  0x000072c8cfe6f4c4 in std::codecvt<char16_t, char, __mbstate_t>::do_unshift(__mbstate_t&, char*, char*, char*&) const () from elf-cache/0/libstdc++.so.6
(gdb) bt
#0  0x000072c8cfe6f4c4 in std::codecvt<char16_t, char, __mbstate_t>::do_unshift(__mbstate_t&, char*, char*, char*&) const () from elf-cache/0/libstdc++.so.6
#1  0x000072c8cfedb11d in std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<long>(long) () from elf-cache/0/libstdc++.so.6
#2  0x000072c8d40aa436 in client_run (ctx=0x0) at ../example/tcp-loopback.cc:37
#3  0x000072c8d566f880 in pthread_do_start (context=0x639d9a3b2ef0) at ../model/dce-pthread.cc:108
#4  0x000072c8d56a26b2 in ns3::TaskManager::Trampoline (context=0x72c8d001b400) at ../model/task-manager.cc:275
#5  0x000072c8d569e127 in ns3::PthreadFiberManager::Run (arg=0x72c8d0019900) at ../model/pthread-fiber-manager.cc:402
#6  0x000072c8d449fb7b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x000072c8d451d5f0 in clone () from /lib/x86_64-linux-gnu/libc.so.6

@Sikabo
Author

Sikabo commented Dec 18, 2025

I think it might be related to C++ iostream. I got dce-tcp-simple working by making it more C-like (replacing std::cout with printf and some minor changes in thread-handling that might not be needed).

We only use DCE for C and Rust code, so C++ is a blind spot for our usage. It should be possible to check this by writing a program that does as little as possible but still crashes.
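A minimal sketch of such a reduced test case, assuming the trigger is simply inserting a `long` into `std::cout` from a pthread (which is what frames #1 to #3 of the backtrace above show). The names `cout_worker`, `printf_worker` and `run_in_thread` are made up for this illustration. Outside DCE both variants are expected to run cleanly; the point is to compare them when run under DCE.

```cpp
#include <cassert>
#include <cstdio>
#include <iostream>
#include <pthread.h>

// Inserts a long into std::cout from a secondary thread.
void* cout_worker(void* arg)
{
    long value = *static_cast<long*>(arg);
    std::cout << "value=" << value << std::endl;
    return nullptr;
}

// Same output, but through C stdio instead of iostream.
void* printf_worker(void* arg)
{
    long value = *static_cast<long*>(arg);
    std::printf("value=%ld\n", value);
    return nullptr;
}

// Runs `worker` in a new thread and waits for it.
// Returns 0 on success, -1 if creating or joining the thread failed.
int run_in_thread(void* (*worker)(void*), long value)
{
    assert(worker != nullptr);
    pthread_t thread;
    if (pthread_create(&thread, nullptr, worker, &value) != 0) {
        return -1;
    }
    return pthread_join(thread, nullptr) == 0 ? 0 : -1;
}
```

A `main` that calls `run_in_thread(cout_worker, 42)` and then `run_in_thread(printf_worker, 42)` would show whether only the iostream path fails.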

@tomhenderson
Collaborator

I finally had some time to try this branch, and was able to reproduce the reported behaviors.

Regarding these failures:

  • dce-tcp-simple
  • dce-ping
  • dce-iperf
  • test: dce-process-manager

dce-ping and dce-iperf likely fail because the binaries that they rely on (not the ns-3 ping or iperf, but the actual programs) are missing. Bake would normally build them (and we need to see if they could still successfully link to newer glibc) but the above build script just focuses on building dce itself.

dce-tcp-simple exhibits failures that I also observed when using Parth's 2021 GSoC code (which, as I mentioned in other issues, introduced some instability). Here is a backtrace of dce-tcp-simple:

Starting program: /docker/dce-build/source/ns-3-dce/build/bin/dce-tcp-simple 
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGUSR1, User defined signal 1.
0x00007d79648e695c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) up
#1  0x00007d7964891cc2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) 
#2  0x00007d7965a620bd in ns3::StackTrampoline::StackTrampoline (this=0x5df4cd1b39d0)
    at ../model/pthread-fiber-manager.cc:143
143	    status = raise (SIGUSR1);
(gdb) 
#3  0x00007d7965a5eb8a in ns3::PthreadFiberManager::PthreadFiberManager (this=0x5df4cd219e30)
    at ../model/pthread-fiber-manager.cc:244
244	  m_trampoline = new StackTrampoline ();
(gdb) 
#4  0x00007d7965a6934b in ns3::TaskManager::SetFiberManagerType (this=0x5df4cd1b80d0, 
    type=ns3::TaskManager::PTHREAD_FIBER_MANAGER) at ../model/task-manager.cc:510
510	      m_fiberManager = new PthreadFiberManager ();
(gdb) 
#5  0x00007d7965a6be56 in ns3::DoMakeAccessorHelperOne<ns3::EnumValue<ns3::TaskManager::FiberManagerType>, ns3::TaskManager, ns3::TaskManager::FiberManagerType>(void (ns3::TaskManager::*)(ns3::TaskManager::FiberManagerType))::MemberMethod::DoSet(ns3::TaskManager*, ns3::EnumValue<ns3::TaskManager::FiberManagerType> const*) const (this=0x5df4cd1cfe90, object=0x5df4cd1b80d0, 
    v=0x5df4cd219c70) at /docker/dce-build/build/include/ns3/attribute-accessor-helper.h:387
387	            (object->*m_setter)(tmp);

I was experiencing similar failures with DCE-1.12 and the modified glibc a couple of years ago; something is unstable with thread handling.

I will push a docker image allowing this to be reproduced, and I'll also work on a bake branch to replace the build script so we can look into iperf and ping binary support. I also will check whether the above problems with threading can be isolated to any of Parth's commits.

@tomhenderson
Collaborator

For reference, Debian trixie glibc version is 2.41.

@Gabrielcarvfer

Wait, but I had iperf and ping installed via apt. Do I need a different glibc regardless?

Regarding the C++ encoding issue, I will take a look. Maybe something is missing in the setup.

@tomhenderson
Collaborator

DCE apps need to be built as position independent code. See how iperf binary has been built by bake (for DCE) in the past: https://gitlab.com/nsnam/bake/-/blob/master/bakeconf.xml?ref_type=heads#L168

Then iperf needs to be loaded multiple times by DCE (for each simulation instance). glibc added some security features that blocked how DCE was doing this (for stack-smashing protection, I believe), which led us to look at building a custom glibc for DCE; that was the focus of Parth's project (which he got to work on glibc-2.31 at the time). The custom glibc undid those security features.

@Gabrielcarvfer

Hmm, maybe we should use musl libc instead then? It has fewer hardening features. And since you need to recompile from scratch anyway, it sounds like a decent idea.

@Sikabo
Author

Sikabo commented Dec 30, 2025

I appreciate that you are looking into this, it helps a lot.

In test: dce-process-manager, it's "only" test-local-socket that fails when I run the tests. I have not looked into why.

I have a patch that will solve the problem with "dce-tcp-simple" by overriding std::cout and std::cerr. I still don't know the entire problem, but from my understanding it's related to thread-local storage (TLS). There are quite a lot of problems and potential problems related to TLS. I will do a commit with the override solution soon; I just need to check some of my assumptions first. From what I can see, a proper TLS implementation in DCE is a large project, so workarounds are more pragmatic, at least in the short run.

Sikabo added 3 commits January 7, 2026 21:10
Introduce dce-iostream-simple.h to provide iostream functionality
without relying on the library version of iostream.
This is a workaround for a segfault that occurs when pthreads and
iostream are used together.
The root cause of the issue is unknown. It could be related to TLS
(thread local storage).

Signed-off-by: Anders Martinsson <[email protected]>
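To illustrate the idea behind the commit message above, here is a minimal sketch of a stream-like class that formats through C stdio instead of the library iostream. This is not the content of dce-iostream-simple.h; the class name `SimpleStream`, the manipulator `simple_endl` and the set of supported types are assumptions chosen for the example.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Tiny ostream look-alike that writes through a FILE*.
class SimpleStream
{
public:
    explicit SimpleStream(std::FILE* out) : m_out(out) { assert(out != nullptr); }

    SimpleStream& operator<<(const char* s) { std::fputs(s, m_out); return *this; }
    SimpleStream& operator<<(const std::string& s) { std::fputs(s.c_str(), m_out); return *this; }
    SimpleStream& operator<<(char c) { std::fputc(c, m_out); return *this; }
    SimpleStream& operator<<(int v) { std::fprintf(m_out, "%d", v); return *this; }
    SimpleStream& operator<<(long v) { std::fprintf(m_out, "%ld", v); return *this; }
    SimpleStream& operator<<(unsigned long v) { std::fprintf(m_out, "%lu", v); return *this; }
    SimpleStream& operator<<(double v) { std::fprintf(m_out, "%g", v); return *this; }

    // Accepts manipulators such as simple_endl below.
    SimpleStream& operator<<(SimpleStream& (*manip)(SimpleStream&)) { return manip(*this); }

    void Flush() { std::fflush(m_out); }

private:
    std::FILE* m_out;
};

// Writes a newline and flushes, similar in spirit to std::endl.
inline SimpleStream& simple_endl(SimpleStream& s)
{
    s << '\n';
    s.Flush();
    return s;
}
```

With `SimpleStream out(stdout);`, a call site such as `out << "value=" << 42L << simple_endl;` keeps the familiar shape while avoiding the iostream code path.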
Add option "--gdb" to re-run tests that crash with GDB. This improves
ergonomics when troubleshooting tests that crash.

Signed-off-by: Anders Martinsson <[email protected]>
Object::GetInstanceTypeId() cannot be overridden in modern NS-3, making
TypeId comparison always fail. Type safety ensured by dynamic_cast.

Signed-off-by: Anders Martinsson <[email protected]>
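A rough sketch of a `dynamic_cast`-based type check of the kind the commit message above mentions. The class names are stand-ins and not the real ns-3 or DCE types; `BaseObject` only plays the role of a polymorphic base such as `ns3::Object`.

```cpp
#include <cassert>

// Stand-in for a polymorphic base class (hypothetical).
struct BaseObject
{
    virtual ~BaseObject() = default;
};

// Hypothetical class hierarchy used only for this illustration.
struct FdBase : BaseObject {};
struct SocketFd : FdBase {};
struct FileFd : FdBase {};

// True if `obj` is a T or derives from T.
template <typename T>
bool IsInstanceOf(const BaseObject* obj)
{
    return dynamic_cast<const T*>(obj) != nullptr;
}

// Checked downcast: asserts when the dynamic type does not match.
template <typename T>
T* RequireType(BaseObject* obj)
{
    T* typed = dynamic_cast<T*>(obj);
    assert(typed != nullptr && "unexpected dynamic type");
    return typed;
}
```

One difference worth keeping in mind: unlike an exact TypeId equality test, `dynamic_cast` also succeeds for subclasses of the requested type.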
@Sikabo
Author

Sikabo commented Jan 9, 2026

I failed to find the root cause of the iostream/pthread problems for "dce-tcp-simple". My guess is that it's TLS-related, but sometimes that is the go-to culprit for anything that is hard to understand.

I managed to fix the failed test in "dce-process-manager".

It would be nice if you could review, improve and merge these changes. From my point of view, the pragmatic way is to get DCE running well enough on a modern base and then start solving issues, one after another.
