@wkliao
Collaborator

@wkliao wkliao commented Aug 14, 2025

Feature request:
Add the pthread ID to DXT records to enable exploring the correlation between
low-level Darshan I/O records and tasks in task-based workflows.

@wkliao wkliao added the "Work in Progress" label and removed the "pydarshan" label Aug 14, 2025
@wkliao
Collaborator Author

wkliao commented Aug 14, 2025

Hi, @orcunyildiz

One of the pydarshan CI tasks failed. The error messages mentioned
UnicodeDecodeError: 'utf-8' codec

https://github.com/darshan-hpc/darshan/actions/runs/16977498309/job/48130008025

Could you please take a look?

@orcunyildiz
Collaborator

Some CI tests are failing because they use old DXT records that do not have the extra_info field (hence the bogus values). Besides deciding on the format of the extra_info field, we would need to bump DXT_POSIX_VER and DXT_MPIIO_VER, although these tests will still fail for the same reason: the Darshan DXT logs used in the tests are old.

@carns
Contributor

carns commented Sep 26, 2025

I'll suggest a slightly different data layout for discussion. I think we can modify the DXT records to have two extra integer fields:

  • the thread id (always populated, assuming we do a sanity check of pthread_self() performance to make sure it isn't onerous to collect at all times)
  • a trailing data size for optional annotations (defaults to zero)

If the trailing data size is non-zero, it indicates that the record also has a variable-length string appended to it (containing data that the user provided via environment variable). We've used variable-length records in some other contexts in Darshan. The rationale is that a) it will keep the record format compact for the common case (no user-provided annotations), and b) when annotations are provided, they can be relatively large rather than bounded by a fixed-size string.
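To make the proposal concrete, here is a minimal sketch of a fixed-size record followed by an optional trailing annotation. Everything in it is an assumption for illustration: the struct name, the field names and widths, and the pack/unpack helpers are hypothetical and are not Darshan's actual DXT format or API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size part of a DXT segment record (not Darshan's real layout). */
struct dxt_segment_sketch {
    int64_t  offset;
    int64_t  length;
    double   start_time;
    double   end_time;
    uint64_t thread_id;       /* always populated */
    uint64_t annotation_len;  /* bytes of trailing annotation, 0 if none */
};

/* Write the header plus optional annotation into buf.
 * Returns the number of bytes written, or 0 if buf is too small. */
static size_t dxt_sketch_pack(const struct dxt_segment_sketch *seg,
                              const char *annotation,
                              unsigned char *buf, size_t buf_size)
{
    assert(seg != NULL && buf != NULL);
    struct dxt_segment_sketch hdr = *seg;
    hdr.annotation_len = annotation ? strlen(annotation) : 0;

    size_t total = sizeof(hdr) + (size_t)hdr.annotation_len;
    if (total > buf_size)
        return 0;

    memcpy(buf, &hdr, sizeof(hdr));
    if (hdr.annotation_len > 0)
        memcpy(buf + sizeof(hdr), annotation, (size_t)hdr.annotation_len);
    return total;
}

/* Read one record from buf and copy its annotation, NUL-terminated, into out.
 * Returns the number of bytes consumed, or 0 on a malformed or oversized record. */
static size_t dxt_sketch_unpack(const unsigned char *buf, size_t buf_size,
                                struct dxt_segment_sketch *seg,
                                char *out, size_t out_size)
{
    if (buf_size < sizeof(*seg))
        return 0;
    memcpy(seg, buf, sizeof(*seg));

    if (seg->annotation_len > buf_size - sizeof(*seg))
        return 0;  /* trailing data would run past the buffer */
    if (seg->annotation_len >= out_size)
        return 0;  /* no room for the string plus its terminator */

    memcpy(out, buf + sizeof(*seg), (size_t)seg->annotation_len);
    out[seg->annotation_len] = '\0';
    return sizeof(*seg) + (size_t)seg->annotation_len;
}
```

In the common case (no annotation) the record costs only the two extra integers; a reader advances by the fixed size plus annotation_len to reach the next record.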

@carns
Contributor

carns commented Sep 30, 2025

Here is some data from running the node-microbench benchmark from https://github.com/mochi-hpc-experiments/mochi-tests/blob/main/perf-regression/node-microbench.c on Aurora to assess how much overhead there might be from adding pthread_self() and getenv() calls to the Darshan DXT wrappers. There are two graphs; one with gcc and one with Intel's OneAPI icc compiler:

[Two graphs: aurora-node-microbench-gcc-m500 dat (gcc) and aurora-node-microbench-icc-m500 dat (icc)]

There are a couple of interesting things here:

  • pthread_self() is indeed very cheap; it would be totally fine to automatically collect the thread id for every DXT operation. Interestingly, the gettid() call is far more expensive even though notionally it does the same thing. We've made the right choice here.
  • getenv() has some confusing results. It is very cheap with icc; I suspect some optimization here, because I would expect that function to be provided by the system libc, in which case it shouldn't be faster or slower with either compiler. With gcc (the recommended compiler for Darshan) it is quite expensive. It is more expensive when attempting to retrieve the value of a variable that has not been set, which suggests that libc is performing a linear search of the environment. I'm a little worried about calling this in every DXT wrapper, since it is considerably more expensive than the locks and timers we are using.

We need to figure out what to do about this. This PR is using environment variables to pass additional annotations to the tracer, which is something we may have multiple use cases for. We don't necessarily have to use environment variables to do this, though; that was just a convenient idea for passing information into the wrappers that is runtime/language agnostic without API modifications.

Other alternatives we could/should consider? Maybe pthread thread local storage? Anything else that might work? Ideally it would be something with low overhead when annotations are not used.
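As one possible shape for the thread-local storage idea, here is a minimal sketch using C11 `_Thread_local`. The variable and function names are hypothetical, and it assumes the caller keeps the annotation string alive while it is set. When no annotation is set, the wrapper-side cost is a single thread-local load and a NULL check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-thread annotation slot; NULL means "no annotation". */
static _Thread_local const char *dxt_sketch_annotation = NULL;

/* Hypothetical setter that an application or language binding would call.
 * The caller owns the string and must keep it valid until it is replaced. */
static void dxt_sketch_set_annotation(const char *text)
{
    dxt_sketch_annotation = text;
}

/* What a DXT wrapper might do: copy the calling thread's annotation into dst,
 * truncating if needed. Returns the number of bytes copied (0 if none set). */
static size_t dxt_sketch_read_annotation(char *dst, size_t dst_size)
{
    assert(dst != NULL && dst_size > 0);
    const char *src = dxt_sketch_annotation;
    if (src == NULL) {          /* common fast path: nothing to record */
        dst[0] = '\0';
        return 0;
    }
    size_t n = strlen(src);
    if (n >= dst_size)
        n = dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

Each thread sees only its own slot, so a task runtime could set the annotation when it schedules a task on a worker thread. The trade-off is that this needs a callable entry point rather than being language agnostic like an environment variable.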

Tagging @GueroudjiAmal and @infispiel (for some context, this is some testing to check the performance of function calls we would use in the merged version of this to capture thread information and user-provided annotations).

@carns
Contributor

carns commented Sep 30, 2025

Just noting for completeness: in a developer meeting this week we talked about the potential to save space used by the annotation strings in the log by using a "symbol lookup" approach, where the log record just stores an integer id that refers to a string stored elsewhere. This would deduplicate strings automatically for cases where the same annotation is used repeatedly. Darshan does this already for file names, but we could generalize the approach. TBD whether we do this up front or as a later optimization after we have the basic functionality in place.
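A rough sketch of the symbol lookup idea, for discussion: each distinct annotation string is stored once and records carry only its integer id. The table type, its fixed capacity, and the function names are all hypothetical, and the linear search is only there to keep the example short; this is not how Darshan's file name records are implemented.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_SYMBOLS 1024

/* Hypothetical table of distinct annotation strings; the index is the id. */
struct sketch_symtab {
    char *strings[SKETCH_MAX_SYMBOLS];
    int   count;
};

/* Return the id for text, adding it the first time it is seen.
 * Returns -1 if the table is full or memory allocation fails. */
static int sketch_intern(struct sketch_symtab *tab, const char *text)
{
    assert(tab != NULL && text != NULL);

    /* Linear search for brevity; a hash table would be the obvious upgrade. */
    for (int i = 0; i < tab->count; i++)
        if (strcmp(tab->strings[i], text) == 0)
            return i;

    if (tab->count == SKETCH_MAX_SYMBOLS)
        return -1;

    size_t n = strlen(text) + 1;
    char *copy = malloc(n);
    if (copy == NULL)
        return -1;
    memcpy(copy, text, n);

    tab->strings[tab->count] = copy;
    return tab->count++;
}

/* Map an id back to its string, or NULL if the id is unknown. */
static const char *sketch_lookup(const struct sketch_symtab *tab, int id)
{
    return (id >= 0 && id < tab->count) ? tab->strings[id] : NULL;
}
```

With this shape, a record that repeats the same annotation thousands of times stores one small integer per record and the string once in the table.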

@roblatham00
Contributor

'fn_call_cross_object' is right there suggesting we should instead implement something like darshan_dxt_annotate(char * data). We so far have avoided exposing darshan that way to callers, and it means darshan has to always be available... or we provide stubs (inline routines in the header?) that no-op until libdarshan overrides them... definitely a few steps more complicated than getenv
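One way the stub idea might look, sketched with a weak symbol reference: the application header refers to darshan_dxt_annotate() weakly, so the call becomes a no-op when no library defines it. This assumes GCC or Clang on an ELF platform such as Linux; the app_annotate() helper name is hypothetical, and darshan_dxt_annotate() is only the function proposed above, not an existing Darshan API.

```c
#include <assert.h>
#include <stddef.h>

/* Weak reference: if nothing linked or preloaded defines this symbol,
 * its address resolves to NULL instead of causing a link error. */
extern void darshan_dxt_annotate(char *data) __attribute__((weak));

/* Hypothetical helper an application header could provide.
 * Returns 1 if the annotation was forwarded to Darshan, 0 if it was a no-op. */
static inline int app_annotate(char *data)
{
    assert(data != NULL);
    if (darshan_dxt_annotate == NULL)
        return 0;               /* Darshan not present: do nothing */
    darshan_dxt_annotate(data);
    return 1;
}
```

An application built this way would still link and run without libdarshan, and would start forwarding annotations once a library providing the symbol is linked or preloaded.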

@carns
Contributor

carns commented Sep 30, 2025

> 'fn_call_cross_object' is right there suggesting we should instead implement something like darshan_dxt_annotate(char * data). We so far have avoided exposing darshan that way to callers, and it means darshan has to always be available... or we provide stubs (inline routines in the header?) that no-op until libdarshan overrides them... definitely a few steps more complicated than getenv

Part of the complication would also be providing bindings for Python etc.

@carns
Contributor

carns commented Oct 2, 2025

Another idea might be to have a control environment variable that is checked at startup (e.g. DARSHAN_ENABLE_DXT_ANNOTATIONS=1, sort of like how we have DARSHAN_ENABLE_NONMPI=1) to control whether Darshan's DXT module checks the annotation environment variable at runtime. People who want to use annotations would need to remember to set it to get the feature, but if you don't, there is no risk of getenv() performance overhead on tracing.
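A minimal sketch of that gating, under a few assumptions: the function names are hypothetical, the annotation variable name DXT_SKETCH_ANNOTATION is a placeholder (no name has been chosen in this thread), and the control variable is treated as enabled only when set to exactly "1".

```c
#define _POSIX_C_SOURCE 200809L  /* declares setenv()/unsetenv() for callers that toggle the variables */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int dxt_sketch_initialized = 0;
static int dxt_sketch_annotations_enabled = 0;

/* Run once at startup: pay for one getenv() call and cache the result. */
static void dxt_sketch_init(void)
{
    const char *flag = getenv("DARSHAN_ENABLE_DXT_ANNOTATIONS");
    dxt_sketch_annotations_enabled = (flag != NULL && strcmp(flag, "1") == 0);
    dxt_sketch_initialized = 1;
}

/* Run in every traced operation: returns the current annotation, or NULL.
 * When the feature is off this is a single integer test, with no getenv(). */
static const char *dxt_sketch_current_annotation(void)
{
    assert(dxt_sketch_initialized);
    if (!dxt_sketch_annotations_enabled)
        return NULL;
    /* Placeholder variable name, not an agreed-upon Darshan setting. */
    return getenv("DXT_SKETCH_ANNOTATION");
}
```

Users who opt in still pay the getenv() cost on every traced operation, so this only protects the default case; the annotation lookup itself would still need one of the cheaper mechanisms discussed above.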
