

log: add log_location parameter to allow log redirection to file #11858

Open
philippfriese wants to merge 1 commit into ofiwg:main from philippfriese:core_log_location


@philippfriese
Contributor

This PR adds the parameter log_location / the environment variable FI_LOG_LOCATION to the standard logging provider to allow control over the output location of the libfabric log. Possible options are stderr (default), stdout, or a path to a file or to a directory. If a directory is provided, a log file named ofi_<pid>.log is created in it. Log files are created with the file mode specified via log_location_mode / FI_LOG_LOCATION_MODE, defaulting to 0600, and must not exist prior to launching libfabric[1].
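The resolution rules described above can be sketched as follows. This is a standalone illustration, not the PR's code: `resolve_log_target`, `parse_log_mode`, and `enum log_target` are hypothetical names, and treating the mode string as octal (like `chmod`) is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names for illustration; not libfabric's internal API. */
enum log_target { LOG_TARGET_STDERR, LOG_TARGET_STDOUT, LOG_TARGET_FILE };

/*
 * Classify a FI_LOG_LOCATION-style string. For LOG_TARGET_FILE the
 * resolved path is written to buf: the string itself if it does not name
 * a directory, or "<dir>/ofi_<pid>.log" if it names an existing directory.
 * Returns -1 if the resolved path does not fit into buf.
 */
static int resolve_log_target(const char *loc, char *buf, size_t len,
			      enum log_target *target)
{
	struct stat st;
	const char *sep;
	int n;

	/* Empty string or exact "stderr" selects the default target. */
	if (*loc == '\0' || strcmp(loc, "stderr") == 0) {
		*target = LOG_TARGET_STDERR;
		return 0;
	}
	if (strcmp(loc, "stdout") == 0) {
		*target = LOG_TARGET_STDOUT;
		return 0;
	}

	*target = LOG_TARGET_FILE;
	if (stat(loc, &st) == 0 && S_ISDIR(st.st_mode)) {
		/* Avoid a doubled slash for inputs such as "./" */
		sep = (loc[strlen(loc) - 1] == '/') ? "" : "/";
		n = snprintf(buf, len, "%s%sofi_%d.log", loc, sep, (int) getpid());
	} else {
		n = snprintf(buf, len, "%s", loc);
	}
	return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/*
 * Parse a mode string such as "0600" as octal (assumed format). NULL or
 * empty falls back to the documented default 0600; malformed input or
 * bits outside 0777 yield -1.
 */
static long parse_log_mode(const char *s)
{
	char *end;
	long mode;

	if (s == NULL || *s == '\0')
		return 0600;
	errno = 0;
	mode = strtol(s, &end, 8);
	if (errno != 0 || *end != '\0' || mode < 0 || mode > 0777)
		return -1;
	return mode;
}
```

With `FI_LOG_LOCATION="./"`, as in the usage example, the sketch resolves to `./ofi_<pid>.log`.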

The use case for this parameter is multi-process applications, such as MPI applications. By default, libfabric writes to stderr, which is usually unbuffered. Even when stderr of an MPI application is redirected to per-process files, for example using Open MPI's --output-filename option, the libfabric log still contains interleaved lines, so one MPI process' log can contain a log line from another MPI process.
When stderr is made line-buffered, libfabric log entries are no longer interleaved, but they may be out of order, which can make certain provider outputs impossible to interpret correctly. One example is the output of the ofi_hook_profile provider, whose output semantics rely on strict line ordering.

While libfabric allows the logging provider to be extended and overwritten programmatically, this approach is not feasible for applications that use another communication library on top of libfabric, such as MPI applications.
The parameter introduced in this PR addresses this.

[1] This is to prevent libfabric from overwriting/modifying files via an accidentally or intentionally mis-crafted log location path.

Usage Example

$ FI_LOG_LEVEL=debug FI_LOG_LOCATION="./" fi_info -l
ofi_rxm:
    version: 201.0
ofi_rxd:
[...]
$ ls ./
ofi_545157.log
$ cat ofi_545157.log
libfabric:545157:1770045780::core:core:fi_param_get_():372<info> variable perf_cntr=<not set>
libfabric:545157:1770045780::core:core:fi_param_get_():372<info> variable hook=<not set>
[...]

goto error_log_location;
}

log_location = fopen(path_buffer, "w");

Check failure

Code scanning / CodeQL

Time-of-check time-of-use filesystem race condition High

The filename being operated upon was previously checked, but the underlying file may have been changed since then.
log_location = fopen(path_buffer, "w");
if (log_location == NULL)
	goto error_log_location;
if (chmod(path_buffer, mode) == -1)

Check failure

Code scanning / CodeQL

Time-of-check time-of-use filesystem race condition High

The filename being operated upon was previously checked, but the underlying file may have been changed since then.
@shefty
Member

shefty commented Feb 4, 2026

The line indentation is off -- using spaces instead of tabs. A library should never print without explicitly being enabled to do so. For all the library knows, stderr could have been closed and redirected to an application file. Use only logging functions to print errors, not direct calls to fprintf.

@philippfriese
Contributor Author

Thank you for the feedback @shefty. I have pushed an updated commit which moves the fprintf calls to the OFI logging macros.
I also moved the handling of log_location{_mode} to after the handling of the other log_* settings in fi_log_init and added a check for log_location != NULL to ofi_log_enabled. Without this check, a small number of logging calls, emitted by the fi_param_get/define functions in fi_log_init after log_mask is set and before fi_log_location_init is called, would be logged to stderr even if FI_LOG_LOCATION had been set to a valid, non-stderr target, breaking the semantics of that setting.
As a consequence, these logging calls are not emitted at all. This matches the behaviour of the fi_param* calls made before log_mask is set, which also did not emit logs before this PR.
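The gating described above can be illustrated with a simplified, standalone sketch. The names `log_location`, `log_mask`, `log_enabled`, and `log_msg` are hypothetical stand-ins for libfabric's internal state and `ofi_log_enabled`, not their actual definitions.

```c
#include <stdarg.h>
#include <stdio.h>

/* Simplified, hypothetical stand-ins for libfabric's internal state. */
static FILE *log_location;      /* NULL until the location is initialized */
static unsigned long log_mask;  /* set from FI_LOG_LEVEL earlier in init */

static int log_enabled(unsigned long level_bit)
{
	/* Suppress output until the destination is known, so early messages
	 * cannot leak to stderr when another target was requested. */
	return log_location != NULL && (log_mask & level_bit) != 0;
}

static void log_msg(unsigned long level_bit, const char *fmt, ...)
{
	va_list ap;

	if (!log_enabled(level_bit))
		return;
	va_start(ap, fmt);
	vfprintf(log_location, fmt, ap);
	va_end(ap);
}
```

Messages issued between setting the mask and initializing the location are dropped rather than sent to a default stream.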

As for the line indentation: I adjusted some indentations but overall could not find places where my changes use spaces instead of tabs. Is this an issue on my end or did I miss spots where I did in fact use spaces for indentation?

src/log.c Outdated
char path_buffer[PATH_MAX];
int len;

if (strncmp(locationstr, "stderr", 6) == 0 || *locationstr == '\0') {
Contributor


Better to use strcmp instead. Reasons: (1) it requires an exact match; (2) the second string is a constant of known length, so the comparison stops after 6 characters anyway. Same for L126.
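Point (1) can be made concrete: with a length-limited prefix comparison, a relative file name such as `stderr.log` would be treated as the `stderr` keyword. The helper names below are hypothetical and exist only for the comparison.

```c
#include <string.h>

/* Prefix comparison: also accepts "stderr.log", "stderrfoo", ... */
static int is_stderr_prefix(const char *s)
{
	return strncmp(s, "stderr", 6) == 0;
}

/* Exact comparison, as suggested in the review. */
static int is_stderr_exact(const char *s)
{
	return strcmp(s, "stderr") == 0;
}
```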

Contributor Author


Thanks! Changed in the updated commit.

strncpy(path_buffer, locationstr, PATH_MAX-1);
if (stat(dirname(path_buffer), &st) == -1)
goto error_log_location;
strncpy(path_buffer, locationstr, PATH_MAX-1);
Contributor


L148 is a duplicate of L145.

Contributor Author


This is in fact intentional: dirname may modify the input string, so the second strncpy fetches a fresh version. I have added a comment in the updated commit to document this intent.
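The pattern behind the second copy can be sketched as a helper that always hands `dirname()` a private copy, leaving the caller's string intact. `parent_dir` is a hypothetical name. The sketch checks the length explicitly instead of using `strncpy(..., PATH_MAX-1)`, which writes no terminator when the source is that long and therefore relies on the last byte of the buffer already being `'\0'`.

```c
#define _POSIX_C_SOURCE 200809L
#include <libgen.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/*
 * POSIX dirname() may modify its argument (and may return a pointer into
 * it or to static storage), so it only ever sees a private copy here.
 * Returns -1 if path is too long or the result does not fit into out.
 */
static int parent_dir(const char *path, char *out, size_t outlen)
{
	char tmp[PATH_MAX];
	int n;

	if (strlen(path) >= sizeof tmp)
		return -1;
	strcpy(tmp, path);              /* dirname() may write into tmp */
	n = snprintf(out, outlen, "%s", dirname(tmp));
	return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}
```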

Signed-off-by: Philipp Friese <philipp.friese@cit.tum.de>

3 participants