@daniel-levin
Contributor

@daniel-levin daniel-levin commented Aug 21, 2025

This change fixes a few things in one go. I wrote up a blog post on my adventures with this crate: https://roci.co.za/posts/hunting-bugs-in-perf-event2

  1. The parameters set by implementations of Event were not being respected. This is now fixed. More importantly, the events in this crate now set sensible defaults. For instance, static kernel tracepoints are intrinsically kernel-level events, so they are now automatically marked as such. This fixes the bug where poll hung indefinitely when asking for a tracepoint. Tracepoints now work out of the box. If anyone needs to explicitly set these parameters back to the non-sensible defaults, they still can.
  2. I added a tracepoint example and removed the comment saying that they do not work.
  3. I removed some code that thiserror could generate, and took a dependency on it.
  4. I introduced a new, painfully explicit, lifetime-free enum into the public API that makes it easier to ask for what you want. It has the knock-on effect of obviating the need for Builder to carry a lifetime. This works because on Linux you can liberally and infallibly clone file descriptors pointing at cgroups.
  5. I tightened the lifetime bounds on what can go into EventData, because non-static lifetimes cannot reasonably be justified.
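For readers unfamiliar with item 3, here is a rough sketch of the kind of hand-written error boilerplate that thiserror can generate. The `SketchError` type and its variants are invented for illustration and are not the crate's real error types; the attribute syntax in the comments is how the equivalent thiserror derive would typically be written.

```rust
use std::error::Error;
use std::fmt;
use std::io;

// Hypothetical error type, not taken from perf-event2.
//
// With thiserror, the three impls below would collapse to roughly:
//
//   #[derive(Debug, thiserror::Error)]
//   pub enum SketchError {
//       #[error("unknown field: {0}")]
//       UnknownField(String),
//       #[error("I/O error")]
//       Io(#[from] std::io::Error),
//   }
#[derive(Debug)]
pub enum SketchError {
    UnknownField(String),
    Io(io::Error),
}

// Hand-written Display: one message per variant.
impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::UnknownField(name) => write!(f, "unknown field: {}", name),
            SketchError::Io(_) => write!(f, "I/O error"),
        }
    }
}

// Hand-written Error: expose the wrapped I/O error as the source.
impl Error for SketchError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            SketchError::UnknownField(_) => None,
            SketchError::Io(inner) => Some(inner),
        }
    }
}

// Hand-written From, so `?` can convert io::Error automatically.
impl From<io::Error> for SketchError {
    fn from(inner: io::Error) -> Self {
        SketchError::Io(inner)
    }
}
```

The trade-off under discussion is exactly this: a few dozen lines of mechanical code versus an extra proc-macro dependency.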

This change simplifies Builder's internals and makes it harder to
misuse.
It also makes tracepoint support explicit, and an example of working
tracepoints is supplied.
Finally, I took a dependency on thiserror to generate some previously hand-written
code.
),
};

debug_assert!(!(pid == -1 && cpu == -1));
Contributor Author

Previously, it was too easy to accidentally set this combination, and there was nothing of diagnostic usefulness in the kernel's error message.

event_data: data,
};

builder.enabled(false);
Contributor Author

Events that actually need these values now set them. So, existing code continues to work!

}

/// Which CPU and PID to target.
pub fn targeting(&mut self, cpu_pid: CpuPid) -> &mut Self {
Contributor Author

It's impossible to pass invalid options to this method. So, you no longer need to fiddle with combinations of other cpu and pid methods. Just ask for what you want.
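As an illustration of why an enum argument rules out the bad combination: perf_event_open(2) rejects pid == -1 together with cpu == -1, and an enum can simply have no variant that lowers to it. The sketch below is a hypothetical stand-in, not the PR's actual CpuPid; the name `Target`, its variants, and `to_raw` are all invented here.

```rust
/// Hypothetical targeting enum. Every variant maps to a (pid, cpu) pair
/// that perf_event_open(2) accepts; there is no "any pid, any cpu" variant.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Target {
    /// The calling process/thread, on whichever CPU it runs.
    ThisProcessAnyCpu,
    /// The calling process/thread, only while it runs on `cpu`.
    ThisProcessOneCpu { cpu: u32 },
    /// A specific process/thread, on whichever CPU it runs.
    PidAnyCpu { pid: u32 },
    /// A specific process/thread, only while it runs on `cpu`.
    PidOneCpu { pid: u32, cpu: u32 },
    /// Every process, but only on `cpu`.
    AnyPidOneCpu { cpu: u32 },
}

/// Saturating u32 -> i32 conversion, so a huge input can never wrap to -1.
fn clamp(value: u32) -> i32 {
    value.min(i32::MAX as u32) as i32
}

impl Target {
    /// Lower to the raw `(pid, cpu)` arguments of the syscall.
    pub fn to_raw(self) -> (i32, i32) {
        match self {
            Target::ThisProcessAnyCpu => (0, -1),
            Target::ThisProcessOneCpu { cpu } => (0, clamp(cpu)),
            Target::PidAnyCpu { pid } => (clamp(pid), -1),
            Target::PidOneCpu { pid, cpu } => (clamp(pid), clamp(cpu)),
            Target::AnyPidOneCpu { cpu } => (-1, clamp(cpu)),
        }
    }
}
```

With a type shaped like this, a check such as `debug_assert!(!(pid == -1 && cpu == -1))` becomes a backstop rather than the main line of defence, since no caller can construct the invalid pair through the enum.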

Owner

@Phantomical Phantomical left a comment

Thanks for the PR! I'm not going to have time to give it a proper review until next week (as I'll be out camping) but I've given the PR (and your blog post!) a skim and I'm gonna drop some initial comments here.

So for some initial remarks:

  • You've definitely run into a papercut/footgun with the tracepoint. That's definitely something that should be fixed. I'm somewhat inclined to update the docs here as opposed to implicitly setting options that no other event type sets. I will admit that your experience is a strong point in the other direction. I'll think about this more.
  • I really do like the new builder method. It is a good addition.
  • It appears that I've missed some methods on the various error types. I'll get that fixed.

Now onto issues I see:

  • Adding 'static to the EventData trait is a breaking change. I'm not entirely sure I want to do that just yet.
  • I would rather the builder remain cloneable as opposed to removing the lifetime.
  • The same applies to changing the defaults of the builder. I don't particularly like the defaults but changing them like this is a dangerous breaking change because there's no outward sign that users of this library would have to change their code to work with it.
  • Tracepoint should at a minimum document what other things it is changing on the builder.
  • I don't think it's worth it to pull in thiserror for this. I have tried quite hard to keep the dependency count down for perf-event2 and syn is a rather heavy dependency for a crate like this.

@daniel-levin
Contributor Author

daniel-levin commented Aug 22, 2025

> The same applies to changing the defaults of the builder. I don't particularly like the defaults but changing them like this is a dangerous breaking change because there's no outward sign that users of this library would have to change their code to work with it.

In that case let me suggest a major version bump. Hard to use APIs can be deprecated. Breakages can be clearly advertised. That makes none of these breaking changes dangerous. I have deliberately introduced breaking changes because I consider the existing behavior to be buggy. Yes, you can figure out the right combination of flags by reading perf_event.h and strace-ing perf but that's quite a suboptimal experience. To wit, this code doesn't work:

let mut sampler = Builder::new(tp)
    .sample_period(1)
    .build()?
    .sampled(8192)?;

Instead, you have to do this:

let mut sampler = Builder::new(tp)
    .any_pid()
    .one_cpu(1)
    .include_kernel()
    .sample(SampleFlag::RAW)
    .sample_period(1)
    .build()?
    .sampled(8192)?;

Do you want to make these APIs more user friendly in a way that is clearly advertised as a breaking change? There are only three dependents on crates.io.

> I don't think it's worth it to pull in thiserror for this. I have tried quite hard to keep the dependency count down for perf-event2 and syn is a rather heavy dependency for a crate like this.

I would ask that you reconsider. syn is only heavy when all its features are turned on. thiserror uses a fairly small subset.

@daniel-levin
Contributor Author

daniel-levin commented Aug 22, 2025

> I would rather the builder remain cloneable as opposed to removing the lifetime.

I've updated the code so that Builder is still cloneable. In fact, it's now Clone + Send + Sync + 'static, which is very useful!
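For context, a minimal sketch of how a builder can hold a cgroup file descriptor and still be Clone + Send + Sync + 'static. This is an assumption-laden illustration: `SketchBuilder` and its methods are invented, and it shares the descriptor through an `Arc<OwnedFd>` rather than duplicating it, which may not be how the PR does it.

```rust
use std::os::fd::OwnedFd;
use std::sync::Arc;

/// Hypothetical builder that owns its cgroup descriptor instead of borrowing
/// it, so the type needs no lifetime parameter.
#[derive(Clone, Debug, Default)]
pub struct SketchBuilder {
    sample_period: u64,
    // Shared ownership: cloning the builder bumps a refcount, and the
    // descriptor is closed when the last clone is dropped.
    cgroup: Option<Arc<OwnedFd>>,
}

impl SketchBuilder {
    pub fn sample_period(mut self, period: u64) -> Self {
        self.sample_period = period;
        self
    }

    /// Take ownership of an already-open cgroup directory descriptor.
    pub fn cgroup(mut self, fd: OwnedFd) -> Self {
        self.cgroup = Some(Arc::new(fd));
        self
    }

    pub fn period(&self) -> u64 {
        self.sample_period
    }

    pub fn has_cgroup(&self) -> bool {
        self.cgroup.is_some()
    }
}

/// Compile-time check: this only type-checks if `T` meets all four bounds.
pub fn assert_clone_send_sync_static<T: Clone + Send + Sync + 'static>() {}
```

Calling `assert_clone_send_sync_static::<SketchBuilder>()` anywhere in the crate turns the claim into something the compiler enforces, and a builder like this can be moved into `std::thread::spawn` without any lifetime gymnastics.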

@daniel-levin
Contributor Author

I've bumped the version number because I think these breaking changes are useful enough to justify. Nobody depending on 0.7* will break.

@daniel-levin
Contributor Author

> Tracepoint should at a minimum document what other things it is changing on the builder.

Added better docs and some examples.

@daniel-levin
Contributor Author

daniel-levin commented Aug 22, 2025

> The same applies to changing the defaults of the builder. I don't particularly like the defaults but changing them like this is a dangerous breaking change because there's no outward sign that users of this library would have to change their code to work with it.

Ah, just to clarify, I also changed the implementations of the other Events so that existing code will not break. In fact, if you call Builder::new with the other event types the resultant builder is the same as it was before this change. The only things other than Tracepoints that may break are implementations of Event outside of this crate.

/// This is automatically implemented for any type which is both `Send` and
/// `Sync`.
-pub trait EventData: Send + Sync {}
+pub trait EventData: Send + Sync + 'static {}
Contributor Author

FWIW I consider this a sensible breaking change.
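To make the effect of the extra bound concrete, here is a small sketch. It assumes a blanket impl, as the doc comment above implies; the `erase` helper is invented for illustration and shows just one thing a 'static bound makes possible, not necessarily the motivation behind this change.

```rust
use std::any::Any;

/// The trait as it reads after this change.
pub trait EventData: Send + Sync + 'static {}

/// Assumed blanket impl: any owned, thread-safe type qualifies.
impl<T: Send + Sync + 'static> EventData for T {}

/// Hypothetical helper: because `T: 'static`, the value can be type-erased
/// and recovered later with `downcast_ref`.
pub fn erase<T: EventData>(data: T) -> Box<dyn Any + Send + Sync> {
    Box::new(data)
}

// What still works: owned data such as `String`, `Vec<u8>`, `Arc<T>`,
// and `&'static str`.
//
// What stops compiling: data that borrows from a local, for example
//
//     let local = String::from("x");
//     erase(local.as_str()); // error: `local` does not live long enough
```

Code that only ever stored owned values in its event data is unaffected; only implementations that smuggled a short-lived borrow through the trait would need to change.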
