Skip to content

flame: fixed perf->flamegraph to handle terminated processes #5992

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

npatel-jump
Copy link
Contributor

No description provided.

@@ -58,7 +71,7 @@ flame_cmd_fn( args_t * args,
install_parent_signals();

fd_topo_join_workspaces( &config->topo, FD_SHMEM_JOIN_MODE_READ_ONLY );
fd_topo_fill( &config->topo );
fd_topo_fill_resilient( &config->topo );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this? The workspaces should still be valid?

if( FD_UNLIKELY( -1==kill( (int)pid, 0 ) ) ) {
if( FD_UNLIKELY( errno==ESRCH ) ) FD_LOG_ERR(( "tile %s:%lu is not running", config->topo.tiles[ i ].name, config->topo.tiles[ i ].kind_id ));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like changing this to WARNING should be pretty much the only change needed? Why all the other pid / tid bounds checks + resilient thing etc?

Copy link
Contributor Author

@npatel-jump npatel-jump Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fd_topo_workspace_fill does FD_TEST's itself that fail (fseq checks).

For PID checks, I was getting many PID/TID's = 0, and a tid=438430872235253 for writer:1, and filtered around them so perf wouln't err due to a failed attachment.
I also think you previously had a check for it, so I included a similar filter to just continue on without them.
FD_TEST( pid<=INT_MAX );

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What FD_TEST would fail? I don't think that seems right

Copy link
Contributor Author

@npatel-jump npatel-jump Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's failing on:
main src/disco/topo/fd_topo.c(178): tile net in_link_fseq[0] = (nil) main src/disco/topo/fd_topo.c(179): FAIL: tile->in_link_fseq[ j ]

I'll take a deeper look into why it might be failing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you providing the config file to the flamegraph tool?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I just realized I was using the wrong one. That makes a lot more sense. Thank you, ill make the changes so it just fixes the warnings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants