Update ns-3-dce for compatibility with modern glibc #147
base: master
Signed-off-by: Anders Martinsson <[email protected]>
Modern glibc has dropped its inline function mapping, so the "real" function names are now the exported symbols. This adapts the DCE code accordingly. The old `int ver` argument is still passed as a dummy parameter through the C++ class layer for Fxstat/Fxstat64, but that can be dropped later. Signed-off-by: Anders Martinsson <[email protected]>
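For context: in older glibc (before 2.33), `stat()`/`fstat()` were inline wrappers around `__xstat()`/`__fxstat()`, which take an extra `int ver` argument. Modern glibc exports `stat`/`fstat` directly. A minimal sketch of shims that keep the dummy `ver` parameter but forward to the real symbols could look like this (`dce_xstat` and `dce_fxstat` are hypothetical names, not the functions in this PR):

```cpp
#include <sys/stat.h>

// Hypothetical shims mirroring the legacy glibc entry points
// __xstat(int ver, const char *path, struct stat *buf) and
// __fxstat(int ver, int fd, struct stat *buf).
// The 'ver' argument (historically _STAT_VER) is accepted only so existing
// call sites keep compiling; it is otherwise ignored.
int dce_xstat(int /*ver*/, const char *path, struct stat *buf) {
  // Modern glibc exports stat() as a real symbol, so call it directly.
  return ::stat(path, buf);
}

int dce_fxstat(int /*ver*/, int fd, struct stat *buf) {
  // Likewise for fstat(); returns -1 and sets errno (e.g. EBADF) on failure.
  return ::fstat(fd, buf);
}
```

Dropping `ver` later would just mean removing the first parameter from both signatures and their callers.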
This tool expects all libraries to be linked with --no-as-needed, but for some reason that does not work as intended, so work around it by calling dlopen() to ensure the library is loaded. Signed-off-by: Anders Martinsson <[email protected]>
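The dlopen() workaround could look roughly like the sketch below. `ensure_loaded` is a hypothetical helper, and `libm.so.6` is only a placeholder for whichever library the linker dropped. With glibc older than 2.34, the program must also be linked with `-ldl`.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Hypothetical helper: force a shared library into the process even when
// --no-as-needed did not keep it in the link. RTLD_NOW resolves its symbols
// immediately; RTLD_GLOBAL makes them visible to objects loaded afterwards.
void *ensure_loaded(const char *soname) {
  void *handle = dlopen(soname, RTLD_NOW | RTLD_GLOBAL);
  if (handle == nullptr) {
    std::fprintf(stderr, "dlopen(%s) failed: %s\n", soname, dlerror());
  }
  // The handle is intentionally never passed to dlclose(), so the
  // library stays resident for the lifetime of the process.
  return handle;
}
```

Usage: call `ensure_loaded("libm.so.6");` once at startup, before anything needs the library's symbols.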
libio has been merged into glibc, and the _IO_* functions are now gone. Signed-off-by: Anders Martinsson <[email protected]>
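In older glibc headers, `getc()`/`putc()` were macros that expanded to `_IO_getc()`/`_IO_putc()`. Code that referred to those names directly can switch to the standard functions. A small sketch (the `copy_stream` helper is hypothetical):

```cpp
#include <cstdio>

// Copies `in` to `out` one character at a time using the standard
// getc/putc rather than the removed _IO_getc/_IO_putc names.
// Returns the number of bytes copied, or -1 if a write fails.
long copy_stream(std::FILE *in, std::FILE *out) {
  long n = 0;
  int c;
  while ((c = std::getc(in)) != EOF) {
    if (std::putc(c, out) == EOF) {
      return -1;
    }
    ++n;
  }
  return n;
}
```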
This branch should build on Debian Trixie with the latest NS-3 release. One patch to NS-3 is needed: Sikabo/ns-3-dev-git@e7d2a51. I have not looked into which DCE changes would be needed to get rid of the NS-3 patch.
Regarding the ns-3 patch: it is not needed. This was changed recently upstream because there were issues with it. Basically, all instances of … and … need to be removed. Then they should be resolved properly.
From the NS-3 change log: (core) Deprecated "EventId::IsRunning()". It has been replaced with "EventId::IsPending()". Signed-off-by: Anders Martinsson <[email protected]>
For more information see: https://gitlab.com/nsnam/ns-3-dev/-/merge_requests/2287 Signed-off-by: Anders Martinsson <[email protected]>
Branch updated from 66b1cf7 to d1c0cdc.
Thanks, @Gabrielcarvfer. This pull request should build with the latest NS-3 release (NS-3.46) on Debian Trixie.
Hello, thanks for this PR. We will try to find time to review and test it. How much have you tested DCE (test suites, a real workload), as opposed to making the build succeed? I ask because the previous effort to modernize DCE succeeded in making the build pass, and it seemed to work with some existing tests, but later, when I tried to use it on a project, it turned out to be unstable. If you did test successfully, can you sketch out what you did (system tested on, how you ran the tests)?
I still use the patched NS-3.46 that allows me to override GetInstanceTypeId. I have not had time to look at the runtime error I get without it. I think it's a usage error on my side, but I haven't had time to look yet, so I'm not ruling anything out.
I use DCE a lot, mainly for a large multithreaded C program but also for a Rust project, mostly for testing UDP communication but also TCP. I have been running DCE for a long time, and this is as stable as when I started using DCE (back when DCE ran on the latest version of Debian).
I will try to produce examples that are suitable to share.
I have prepared a branch with a working example: https://github.com/Sikabo/ns-3-dce/tree/dce_example This example requires that object.h in NS-3 is patched. NS-3-DCE needs a rebuild after the patch is applied (clean, then build). Run it with the command below and look at the output in … This new branch is probably a better starting point for getting things working without a patched version of NS-3 than the branch in this pull request. Is this enough, or do you want more details about how I get this running?
I have updated https://github.com/Sikabo/ns-3-dce/tree/dce_example with two new commits, so now it should work without modifying …
This fixes direct-code-execution#57. There are a couple of new TODOs that would be good to address. Signed-off-by: Anders Martinsson <[email protected]>
For more information see: https://gitlab.com/nsnam/ns-3-dev/-/issues/1249 Signed-off-by: Anders Martinsson <[email protected]>
I have updated the branch https://github.com/Sikabo/ns-3-dce/commits/sikabo_master to include the last parts that were needed to get things working without modifying NS-3.
@tomhenderson: This works for me on Debian Trixie. Feel free to ask if there are any questions.
Thanks for the code. I will try to write a GitHub Actions workflow to replace CircleCI. I just need a few more days (very busy end of year).
OK, I gave it a go and found multiple examples/tests failing: some encoding issues in thread management, and path issues. I'm not sure whether this is the MR's fault or a misconfiguration somewhere. dce-tcp-simple: …
I think it might be related to C++ iostream. I got dce-tcp-simple working by making it more C-like (replacing std::cout with printf, plus some minor changes in thread handling that might not be needed). We only use DCE for C and Rust code, so C++ is a blind spot for our usage. It should be possible to check this by writing a program that does as little as possible but still crashes.
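A minimal reproducer along those lines might look like the sketch below: it only writes to `std::cout` from the main thread and from one pthread. `run_minimal_repro` is a hypothetical name; under DCE it would be called from the test application's `main()`. Run natively, it should simply print two lines and return 0.

```cpp
#include <pthread.h>
#include <iostream>

// Thread body: use std::cout from a non-main thread, i.e. the
// pthreads + iostream combination suspected of crashing dce-tcp-simple.
static void *worker(void *) {
  std::cout << "hello from worker thread" << std::endl;
  return nullptr;
}

// Returns 0 on success, otherwise the error code from pthread_create/join.
int run_minimal_repro() {
  std::cout << "hello from main thread" << std::endl;
  pthread_t tid;
  int rc = pthread_create(&tid, nullptr, worker, nullptr);
  if (rc != 0) {
    return rc;
  }
  return pthread_join(tid, nullptr);
}
```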
I finally had some time to try this branch and was able to reproduce the reported behaviors. Regarding these failures:
dce-ping and dce-iperf likely fail because the binaries they rely on (not the ns-3 ping or iperf, but the actual programs) are missing. Bake would normally build them (and we need to see whether they can still link successfully against newer glibc), but the above build script focuses only on building DCE itself. dce-tcp-simple exhibits failures that I also observed when using Parth's 2021 GSoC code (which, as I mentioned in other issues, introduced some instability). Here is a backtrace of dce-tcp-simple: … I was experiencing similar failures with DCE-1.12 and the modified glibc a couple of years ago; something is unstable in the thread handling. I will push a Docker image that allows this to be reproduced, and I'll also work on a bake branch to replace the build script so we can look into iperf and ping binary support. I will also check whether the above threading problems can be isolated to any of Parth's commits.
For reference, the Debian Trixie glibc version is 2.41.
Wait, but I had iperf and ping installed via apt. Do I need a different glibc regardless? Regarding the C++ encoding issue, I will take a look. Maybe something is missing in the setup.
DCE apps need to be built as position-independent code. See how the iperf binary has been built by bake (for DCE) in the past: https://gitlab.com/nsnam/bake/-/blob/master/bakeconf.xml?ref_type=heads#L168 Then iperf needs to be loaded multiple times by DCE (once for each simulation instance). glibc added some security features that blocked the way DCE was doing this (for stack-smashing protection, I believe), which led us to look at building a custom glibc for DCE; that was the focus of Parth's project (which he got working on glibc-2.31 at the time). The custom glibc undid those security features.
Hmm, maybe we should use musl libc instead, then? It has fewer hardening features. And since you need to recompile from scratch anyway, it sounds like a decent idea.
I appreciate that you are looking into this; it helps a lot. In the dce-process-manager test, it's "only" test-local-socket that fails when I run the tests. I have not looked into why. I have a patch that solves the problem with "dce-tcp-simple" by overriding std::cout and std::cerr. I still don't understand the entire problem, but from my understanding it's related to thread-local storage (TLS). There are quite a lot of problems and potential problems related to TLS. I will commit the override solution soon; I just need to check some of my assumptions first. From what I can see, a proper TLS implementation in DCE is a large project, so workarounds are more pragmatic, at least in the short run.
Introduce dce-iostream-simple.h to provide iostream functionality without relying on the library version of iostream. This is a workaround for a segfault that occurs when pthreads and iostream are used together. The root cause of the issue is unknown. It could be related to TLS (thread local storage). Signed-off-by: Anders Martinsson <[email protected]>
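The idea behind such a header can be sketched as a tiny ostream-like writer that formats with `snprintf` and emits bytes with `write(2)`, so it never touches libstdc++'s iostream state. This is a hypothetical illustration (namespace `dce_simple`, class `SimpleOStream`), not the contents of dce-iostream-simple.h, and it only handles strings and integers:

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>
#include <unistd.h>

namespace dce_simple {  // hypothetical namespace

// Minimal ostream-like writer: formats with snprintf and writes with write(2).
class SimpleOStream {
 public:
  explicit SimpleOStream(int fd) : fd_(fd) {}

  SimpleOStream &operator<<(const char *s) { return put(s, std::strlen(s)); }
  SimpleOStream &operator<<(const std::string &s) { return put(s.data(), s.size()); }
  SimpleOStream &operator<<(long long v) {
    char buf[32];
    int n = std::snprintf(buf, sizeof buf, "%lld", v);
    return put(buf, static_cast<std::size_t>(n));
  }
  // Support manipulators such as dce_simple::endl below.
  SimpleOStream &operator<<(SimpleOStream &(*manip)(SimpleOStream &)) {
    return manip(*this);
  }

 private:
  // Write all bytes, retrying on partial writes and EINTR.
  SimpleOStream &put(const char *p, std::size_t len) {
    while (len > 0) {
      ssize_t w = ::write(fd_, p, len);
      if (w < 0) {
        if (errno == EINTR) continue;
        break;  // give up silently on other errors
      }
      p += w;
      len -= static_cast<std::size_t>(w);
    }
    return *this;
  }
  int fd_;
};

inline SimpleOStream &endl(SimpleOStream &os) { return os << "\n"; }

// Unbuffered stand-ins for std::cout / std::cerr (C++17 inline variables).
inline SimpleOStream cout(STDOUT_FILENO);
inline SimpleOStream cerr(STDERR_FILENO);

}  // namespace dce_simple
```

Application code would then write `dce_simple::cout << "x=" << 42 << dce_simple::endl;` instead of using `std::cout`. Because every `<<` issues its own `write()`, output from concurrent threads can interleave at token granularity.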
Add option "--gdb" to re-run tests that crash under GDB. This improves ergonomics when troubleshooting crashing tests. Signed-off-by: Anders Martinsson <[email protected]>
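One way to drive such a re-run is to prefix the failing command with GDB in batch mode, so that GDB runs the program and prints a backtrace when it stops. The helper below only builds the argument vector. Its name is hypothetical, and the option in this PR may construct the command differently.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: wrap a test command so it runs under GDB in batch
// mode (-batch), starts the program (-ex run) and prints a backtrace once it
// stops (-ex bt). --args must come last: everything after it is the program
// and its own arguments.
std::vector<std::string> gdb_rerun_argv(const std::vector<std::string> &cmd) {
  std::vector<std::string> argv = {"gdb", "-batch", "-ex", "run",
                                   "-ex", "bt", "--args"};
  argv.insert(argv.end(), cmd.begin(), cmd.end());
  return argv;
}
```

For example, `gdb_rerun_argv({"./dce-tcp-simple"})` yields `gdb -batch -ex run -ex bt --args ./dce-tcp-simple`.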
Object::GetInstanceTypeId() cannot be overridden in modern NS-3, so the TypeId comparison always fails. Type safety is ensured by dynamic_cast instead. Signed-off-by: Anders Martinsson <[email protected]>
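The pattern can be illustrated with a stand-alone class hierarchy: instead of comparing type identifiers, `dynamic_cast` checks the object's actual dynamic type and returns `nullptr` on a mismatch. The classes here are hypothetical stand-ins, not ns-3's `Object`/`TypeId`.

```cpp
// Hypothetical stand-ins for an ns-3-style hierarchy. The base class must be
// polymorphic (have a virtual function) for dynamic_cast to work.
struct ObjectBase {
  virtual ~ObjectBase() = default;
};
struct ProcessManagerLike : ObjectBase {};
struct SocketLike : ObjectBase {};

// Returns obj as a T* if its dynamic type is T (or derived from T),
// otherwise nullptr. No per-class type ID is needed.
template <typename T>
T *checked_cast(ObjectBase *obj) {
  return dynamic_cast<T *>(obj);
}
```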
I failed to find the root cause of the iostream/pthread problems in "dce-tcp-simple". My guess is that it's TLS-related, but that is sometimes the go-to blame for anything that is hard to understand. I managed to fix the failing test in "dce-process-manager". It would be nice if you could review, improve, and merge these changes. From my point of view, the pragmatic way forward is to get DCE running well enough on a modern base and then solve the remaining issues one after another.
Update to build and run on modern Linux distributions with current toolchains, for example Debian Trixie.
Tested with: NS-3.35 (and NS-3.45, but it is probably simpler to start with 3.35).