@hokein
Collaborator

@hokein hokein commented Feb 3, 2025

This WIP PR explores the idea of introducing [[clang::lifetimebound_if(<bool expression>)]], a built-in attribute that conditionally applies [[clang::lifetimebound]] based on the given boolean expression.

One of the key challenges in adopting [[clang::lifetimebound]] and [[clang::lifetime_capture_by]] is that these attributes should only apply to pointer-like types. Currently, we handle this by introducing multiple overloads, which increases code complexity and compilation overhead.

This new attribute aims to simplify the implementation by enabling conditional application, reducing the need for overloads.
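To make the overload cost concrete, here is a minimal sketch of the pattern described above. It is an illustration under stated assumptions, not code from this PR or from Abseil: `IsViewType`, `ViewSetSketch`, and the `SKETCH_CAPTURE_BY_THIS` macro are hypothetical names, and the macro expands to nothing on compilers that do not know `[[clang::lifetime_capture_by]]`.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>
#include <vector>

// Hypothetical helper macro: apply the attribute only where Clang knows it.
#if defined(__clang__) && defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::lifetime_capture_by)
#    define SKETCH_CAPTURE_BY_THIS [[clang::lifetime_capture_by(this)]]
#  endif
#endif
#ifndef SKETCH_CAPTURE_BY_THIS
#  define SKETCH_CAPTURE_BY_THIS
#endif

// Hypothetical trait: which element types count as pointer-like views.
template <class T>
inline constexpr bool IsViewType = std::is_same_v<T, std::string_view>;

template <class T>
class ViewSetSketch {
 public:
  // Overload 1: view-like elements keep pointing at the argument,
  // so the parameter carries the annotation.
  template <class U = T, std::enable_if_t<IsViewType<U>, int> = 0>
  void insert(const T& value SKETCH_CAPTURE_BY_THIS) {
    items_.push_back(value);
  }

  // Overload 2: owning elements copy the argument, so no annotation is wanted.
  template <class U = T, std::enable_if_t<!IsViewType<U>, int> = 0>
  void insert(const T& value) {
    items_.push_back(value);
  }

  std::size_t size() const { return items_.size(); }

 private:
  std::vector<T> items_;
};
```

With `ViewSetSketch<std::string_view>` only the first overload participates, and with `ViewSetSketch<std::string>` only the second. The two bodies are identical; the duplication exists solely to toggle the attribute, which is what a conditional attribute would aim to remove.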

cc @usx95 @Xazax-hun @ilya-biryukov @higher-performance @kinu

@higher-performance
Contributor

I think we should heavily consider the wider solution space before going with this proposal. The problem we will quickly hit here is that we need to be sure that whatever we want can actually be expressed as a condition in the cases we want. That's not currently the case, which is going to pose problems.

For example, we'll want to be able to handle this case (let's call this the type-erased case):

std::map<std::string_view, std::string_view> m;
auto& value = m[std::string()];  // Gets caught
auto&& view = std::string_view(std::string());  // Gets caught
auto& value = m[std::string_view(std::string())];  // Doesn't get caught

As well as this case (call this the variadic case):

template<class... Args>
reference emplace_back(Args&&... args);

(I feel there are more interesting cases here -- these are just the two off the top of my head.)

To handle such cases we'd need to provide a way to get lifetimes as well, not merely set the attributes. Which means for lifetimebound we'll need something like [[lifetimebound_if(__builtin_is_lifetimebound(arg))]] as well. For lifetime_capture_by, I'm not clear right now if we'd ever need to detect it, but if we do then that would be a case we'd need to handle as well.

Given the above, the solution I would suggest considering here is the following:

What we ultimately want is to be able to specify things in terms of expressions, I think.

That suggests we should consider syntax like this:

template<class... Args>
reference emplace_back(Args&&... args [[clang::lifetimebound(__builtin_is_lifetimebound(*new T(std::forward<Args>(args)...)))]]);

I think we should consider something along the lines of the above.

An interesting observation here, though, is that we can avoid code duplication even more if we just make the inference based on existing noexcept conditions:

template<class... Args>
reference emplace_back(Args&&... args) noexcept(noexcept(*new T(std::forward<Args>(args)...)));

The main problem I see here is that the noexcept condition might not actually match the lifetime condition, because, for example, the specification might have failed to specify it that way.

There's a hack we can use to handle such cases:

template<class... Args>
reference emplace_back(Args&&... args) noexcept(false && noexcept(*new T(std::forward<Args>(args)...)));

The compiler would then detect the ANDing with false, and view it as a signal that the other noexcept expressions should be utilized as an "alternate body" for the function for code diagnostic/analysis purposes. A secondary benefit of this is that we also avoid code duplication with functions that have noexcept.
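As a rough illustration of what today's compilers already do with such signatures, the sketch below checks both forms with `static_assert`. The names `make_sketch` and `make_masked_sketch` are hypothetical, and the "alternate body" interpretation is only the proposal above; current compilers simply treat the second form as `noexcept(false)`.

```cpp
#include <string>
#include <utility>

// Hypothetical factory whose exception specification mirrors the
// construction expression. Declaration only: noexcept() never evaluates
// its operand, so no definition is needed for these checks.
template <class T, class... Args>
T make_sketch(Args&&... args)
    noexcept(noexcept(T(std::forward<Args>(args)...)));

// The `false &&` variant: the function is never noexcept, yet the inner
// expression is still spelled out in the signature for a tool to inspect.
template <class T, class... Args>
T make_masked_sketch(Args&&... args)
    noexcept(false && noexcept(T(std::forward<Args>(args)...)));

static_assert(noexcept(make_sketch<int>(1)),
              "constructing an int cannot throw");
static_assert(!noexcept(make_sketch<std::string>("x")),
              "std::string(const char*) is not declared noexcept");
static_assert(!noexcept(make_masked_sketch<int>(1)),
              "masked by `false &&`");
```

The first declaration shows why the exception specification and the lifetime condition can share one expression; the second shows that the masking trick leaves the function's observable `noexcept` status unchanged at `false`.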

The proposal here would need some baking (e.g., perhaps it should be noexcept(false && [[clang::lifetimebound]] noexcept(...))? I don't know), but I think we should consider alternate directions like this, because they're the only ways I see to (a) handle the other cases we care about, while (b) avoiding a ton of duplicated code.

Collaborator

@Xazax-hun Xazax-hun left a comment

It would be nice if you could share some code snippets how exactly this would be used in practice.

Alternatively, I wonder if we should have a more general proposal here, something like:
[[conditional(lifetimebound, cond)]]. A general way to conditionally apply attributes. This way one could do this with any attribute not just lifetimebound.

return true;
if (const auto *AI = D->getAttr<LifetimeBoundIfAttr>()) {
bool Result;
// FIXME: we pay the cost of evaluating the binary condition everytime we
Collaborator

Do we need to cache the evaluation, or would it make sense to replace LifetimeBoundIfAttr with LifetimeBoundAttr once it has evaluated to true, so that clients that already check for LifetimeBoundAttr do not need to be updated? Admittedly, this would be a departure from what the Clang AST is today, which is a really close representation of what the user wrote.
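For reference, the caching option could look roughly like the sketch below. It uses hypothetical stand-in types only; `CondAttrSketch` and its members are not Clang APIs, and the real attribute would hold an expression rather than a function pointer.

```cpp
// Hypothetical stand-in for an attribute node carrying a boolean condition.
// None of these names are Clang APIs.
struct CondAttrSketch {
  bool (*condition)();    // stand-in for the attribute's <bool expression>
  bool evaluated = false; // has the condition been evaluated yet?
  bool cached = false;    // remembered result of the first evaluation
  int evaluations = 0;    // counts how often the condition really ran

  // Evaluate the condition at most once and remember the answer.
  constexpr bool isLifetimeBound() {
    if (!evaluated) {
      cached = condition();
      evaluated = true;
      ++evaluations;
    }
    return cached;
  }
};

constexpr bool alwaysTrueSketch() { return true; }

// Query the same attribute three times and report how often the
// condition was actually evaluated.
constexpr int evaluationsAfterThreeQueries() {
  CondAttrSketch attr{&alwaysTrueSketch};
  attr.isLifetimeBound();
  attr.isLifetimeBound();
  attr.isLifetimeBound();
  return attr.evaluations;
}

static_assert(evaluationsAfterThreeQueries() == 1,
              "the condition is evaluated only on the first query");
```

Caching keeps the node that the user wrote and only memoizes the answer, whereas swapping in a plain lifetimebound attribute would change what the AST records.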

@ilya-biryukov
Contributor

In addition to the concerns raised above, I also wanted to highlight that as soon as we have the ability to inspect attributes (and I agree with @higher-performance that we want it to make this useful for cases like emplace_back) at compile time, we get into a slippery slope of allowing template instantiations based on that, i.e. leaking attributes into ABI, which seems to go against the design principles of those attributes.

And since lifetimebound analysis is best-effort and does not provide actual safety guarantees (has both false positives and false negatives), it'd be nice to ensure the design does not allow it to slip past the lifetimebound attributes themselves.

Summing up, I would actually propose to have a slightly weird version of the "getters" that would only be usable in those lifetimebound attrs themselves (combining suggestion from @higher-performance and this PR):

// Terrible name for the attribute, could be better.
template<class... Args>
reference emplace_back(Args&&... args [[clang::lifetimebound_if_lifetimebound(*new T(std::forward<Args>(args)...))]]);

if we want more conditions, we could additionally allow some conditional expressions:

// Slightly confusing syntax, could definitely be better.
template<class... Args>
reference emplace_back(Args&&... args [[clang::lifetimebound_if_lifetimebound_and(*new T(std::forward<Args>(args)...), std::is_reference_v<Args>)]]);

More generally, my call is to figure out how to not leak the results of this analysis to the outside world to avoid the Hyrum's law effect where we cannot change it.

I am also unsure about the granularity here. For emplace_back, what we probably want is to have lifetimebound only on parameters that are marked as lifetimebound in the constructor. However, in the proposed approach we end up marking additional parameters too. Would that result in more false positives?

See https://gcc.godbolt.org/z/boPeheMsK

#include <memory>

struct X {
  X(const int &a, [[clang::lifetimebound]] const int&b);
};

void foo() {
  int a = 1;
  X x(a, 2); // warning.
  X y(1, a); // no warning.

  // make_unique would have both warnings, right?
  // similarly, for emplace_back.
  auto px = std::make_unique<X>(a, 2);
  auto py = std::make_unique<X>(1, a);
}

Is there a way to avoid it?

@kinu
Contributor

kinu commented Feb 4, 2025

Would the alternative that is discussed here mean we want to forward some attributes only when they are applicable, something like clang::forward_lifetimebound?

Regardless, I think it'd also be good to agree on the use cases we want to support, in concrete code snippets (that should also answer @Xazax-hun's question). There could also be a question of whether we really should try to support variadic templates with the current lifetimebound at all, because we're aware that the current semantics themselves have a limitation.

@ilya-biryukov
Contributor

mean we want to forward some attributes only when they are applicable

yes, something like that. And it gets a little tricky, because it's not that we want to forward attributes only from one decl to another; it's many decls (the forwarding function itself and its parameters) to many decls (the function we're forwarding to and its parameters). If we start talking about lifetime_capture_by(), things get even more tricky because the attribute itself is parameterized by the "target", and we should somehow be able to connect that "target" to something else in the context we are forwarding to.

@higher-performance
Contributor

w.r.t. ABI, I think the end state here would conceptually most likely be similar to __attribute__((diagnose_if(...))). Does that affect ABI?

w.r.t. False positives due to annotating the entire pack -- that's a great point. I think it's avoided by the noexcept approach. Any thoughts on that?

@ilya-biryukov
Contributor

ilya-biryukov commented Feb 4, 2025

w.r.t. ABI, I think the end state here would conceptually most likely be similar to __attribute__((diagnose_if(...))). Does that affect ABI?

I believe it does not, as it only produces warnings. However, having a way to query the presence of attributes in the code would allow for things like overloading or template specializations based on lifetimes. Passing information to the compiler from code seems fine; it's passing information from the compiler back to the code that gives me pause:

template <bool IsLifetimebound>
struct Foo {
  static void call() {
    if (IsLifetimebound)
      std::cerr << "I am being called from a function that passes temporaries\n";
  } 

  using type = std::conditional_t<IsLifetimebound, int, double>;
};

template <class T>
struct SomeClass {

  void some_method(const T& a [[clang::lifetimebound]]) {
    // Results of the program below in Clang and GCC are different.
    // Not sure if C++ standard is okay with this.
    using MyFoo = Foo<__is_lifetimebound(a)>;
    MyFoo::call();
    std::vector<MyFoo::type> vec;
    // ...
  } 
};

There are other ways to achieve that and there are attributes that may change behavior of the program and its meaning, so maybe some folks feel it is acceptable. I feel this is probably a bad idea and we should stick to the principle that attributes like lifetimebound can be safely removed without changing runtime semantics of the program.
I would be interested in what others have to say about this.

w.r.t. False positives due to annotating the entire pack -- that's a great point. I think it's avoided by the noexcept approach. Any thoughts on that?

Noexcept and lifetimebound are quite orthogonal, so I am a little unsure about the particular proposal as is. However, I am also for avoiding duplication if possible. And given that the template function bodies have to be available anyway, we could maybe even utilize them (for functions that don't have noexcept). I am thinking about some combination of two attributes (this gets complex really quickly, as you can see; I'm looking for better ideas):

// Can be taken from signature or body, e.g. from noexcept.
template<class ...Args>
void emplace_back(Args &&...args) [[clang::infered_fwd_lifetimebound_from_call]] noexcept(noexcept([[clang::infer_lifetimebound_from_here]] new T(std::forward<Args>(args)...))) {}

// Or from the body itself when there isn't noexcept
template<class ...Args>
void emplace_back(Args &&...args) [[clang::infered_lifetimebound_from_call]] {
  [[clang::infer_lifetimebound_from_here]] my_vector.push_back(T(std::forward<Args>(args)...));
}

// Presumably we could also have some default inference rules, e.g.
// - we infer lifetimebound from a single call either in noexcept or inside the body mentioning that argument.
// - if there's no such call or more than one mention of an argument, one has to mark the call with a special attribute.
// - std::forward/static_cast<T&&> is ignored somehow.

@Xazax-hun
Collaborator

to forward some attributes only when they are applicable

One question is whether the reflection proposal would address something like this. If it does, it might make more sense to invest in that than a custom solution.

@higher-performance
Contributor

higher-performance commented Feb 4, 2025

There are other ways to achieve that and there are attributes that may change behavior of the program and its meaning, so maybe some folks feel it is acceptable. I feel this is probably a bad idea and we should stick to the principle that attributes like lifetimebound can be safely removed without changing runtime semantics of the program.

Ahh I see what you mean. Yeah, as you mentioned I think that cat is already out of the bag with attributes like [[no_unique_address]]/[[clang::using_if_exists]]/[[clang::trivial_abi]]/[[clang::enable_if(...)]]/... so I'm not as worried about it. In fact I think even [[clang::diagnose_if(...)]] can cause template instantiations that change program behavior.
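As one small, hedged illustration of an attribute whose effect is visible to the program, here is a sketch using `[[no_unique_address]]`. The type names are hypothetical, and the exact sizes are ABI-dependent: on common Itanium-ABI targets `WithAttr` is typically the size of an `int`, while some ABIs ignore the attribute entirely.

```cpp
#include <type_traits>

struct EmptySketch {};

struct WithoutAttr {
  EmptySketch tag;
  int value;
};

struct WithAttr {
  [[no_unique_address]] EmptySketch tag;
  int value;
};

// The attribute permits the empty member to overlap other storage, so the
// annotated type is not expected to be larger than the plain one.
static_assert(sizeof(WithAttr) <= sizeof(WithoutAttr), "");

// Once the size is observable, it can steer template selection: which
// branch is picked here depends on whether the ABI honors the attribute.
using PickedSketch =
    std::conditional_t<sizeof(WithAttr) == sizeof(int), int, double>;
```

Removing the attribute can therefore change `PickedSketch`, which is the kind of feedback from attribute to program semantics that a lifetime query built-in would also enable.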

That said, we can avoid most of this problem quite easily (...in principle; perhaps not simple in implementation) I think: by preventing the query built-in from being usable anywhere outside of lifetime analysis.

noexcept and lifetimebound are quite orthogonal, so I am a little unsure about the particular proposal as is.

Their meanings are, but their specifications seem very similar: in both cases, they are trying to say, "when diagnosing {this characteristic} of this function, do it assuming the function is equivalent to {this code}". That said -- I do agree I'm not 100% on board with using noexcept for this purpose either. My reason is slightly different though: the moment we have another analysis to add to the picture (something other than lifetimebound), we might run into a case where we want one part of the spec for one analysis, and another for the other... and I can't think of a decent way to do that. At the same time, I'm not sure how likely that scenario is?

However, I am also for avoiding duplication if possible. And given that the template function bodies have to be available anyway, we could maybe even utilize them (for functions that don't have noexcept). I am thinking about some combination of two attributes (this gets complex really quickly, as you can see; I'm looking for better ideas):

If we could utilize the actual body, that would be perfect. However... would it be feasible to make this work for emplace_back? Because emplace_back returns reference, not void. And even the simplest of wrappers

template<class... Args>
reference emplace_back(Args&&... args) {
  my_vector.push_back(T(std::forward<Args>(args)...));
  return my_vector.back();
}

would seem to make it pretty difficult for the compiler to analyze what the lifetime relationship to the return value is.


Here's another idea though, combining all of the suggestions above (including @kinu's suggestion): if we forego trying to avoid code duplication, this seems like it could work:

template<class... Args>
reference emplace_back(Args&&... args)
  [[clang::lifetimebound_like(T(std::forward<Args>(args)...))]];

The semantics here would be that the call to the function has the same lifetime characteristics as

decltype(auto) result = T(...);

or, if T(...) evaluates to void,

T(...);

Thoughts?

@ilya-biryukov
Contributor

One question is whether the reflection proposal would address something like this. If it does, it might make more sense to invest in that than a custom solution.

That seems like a great long-term direction, but the amount of investment between the two is also vastly different. I think we can either have something like an attribute now, or wait quite a long time for the reflection proposal to flesh out and be implemented.

I think the reality is that this particular PR is only exploring a solution we can get relatively quickly and with relatively little effort. (Which also puts a burden on it to not cause too much maintenance cost in the long term, obviously, we don't want a quick hack that doesn't really scale or is too expensive in the long run).

...cat is already out of the bag...

Definitely agree, it's quite a nuanced topic.

I think even [[clang::diagnose_if(...)]] can cause template instantiations that change program behavior.

Unrelated, but I'd be curious to see those if you have any examples. I thought that maybe SFINAE could cause this, but at least in simple examples diagnose_if does not affect overloading.

My reason is slightly different though: the moment we have another analysis to add to the picture (something other than lifetimebound), we might run into a case where we want one part of the spec for one analysis, and another for the other... and I can't think of a decent way to do that. At the same time, I'm not sure how likely that scenario is?

I definitely have the same concerns, every single analysis makes things more and more chatty and I already feel we're close to a tipping point where annotating things becomes too hard. FWIW, it would be great to get something that "magically" works with a single attribute and does not need complicated compile-time computations.
If we cannot get that, the approach with clang::lifetimebound_like seems like the second-best alternative. It's simple and should allow modelling functions forwarding to constructors in STL: make_unique, emplace_back, etc.
Seems like a huge win, even if it's not modelling all the nuances that more sophisticated lifetime annotations could achieve.

I believe @hokein was about to prepare a list of interesting code examples that we want to support. It would probably be great to do that and see how many we can cover with the various options; it should help us make a more informed decision.

@higher-performance
Contributor

it would be great to get something that "magically" works with a single attribute and does not need complicated compile-time computations. If we cannot get that, the approach with clang::lifetimebound_like seems like the second-best alternative.

The closest universal solution I can think of is something like the following, assuming we do proper bikeshedding for the name:

template<class... Args>
reference emplace_back(Args&&... args)
  [[clang::diagnosis_body(EXPRESSION_REPRESENTING_A_FAKE_BODY)]];

The faux body could then be used for diagnosis purposes -- and the contract would be that it would never affect codegen.

This would also leave the door open to further extensions in the future. e.g., [[clang::diagnosis_body(..., "lifetime_body_key_1")]] [[clang::lifetime_diagnosis_like("lifetime_body_key_1")]], to allow different analyses to share the same bodies.

I think even [[clang::diagnose_if(...)]] can cause template instantiations that change program behavior.

Unrelated, but I'd be curious to see those if you have any examples. I thought that maybe SFINAE could cause this, but at least in simple examples diagnose_if does not affect overloading.

https://godbolt.org/z/xozfs18Ta (but looks like the [[clang::diagnose_if(...)]] syntax doesn't actually work for that attribute, only __attribute__ syntax works)

Note that any attribute that can result in template instantiations could cause this. (Not sure if that's a necessary condition, but it's sufficient.)

@higher-performance
Contributor

Just a friendly bump to figure out how to proceed here.

Does the clang::lifetimebound_like idea sound like a decent middle ground to move forward with? And if so, what do we anticipate the timeline for implementing it to be like? This would affect whether we move forward with workarounds in the meantime.

@ilya-biryukov

@hokein
Collaborator Author

hokein commented Feb 20, 2025

It feels like the current direction & discussion is expanding into a broader problem space beyond the specific issue this PR aims to address. We have two major problems which seem to be orthogonal:

  1. avoiding code duplication – specifically, reducing the number of function overloads required due to IsViewType<T>.
  2. supporting variadic templates – handling cases like emplace_back(Args...).

The prototype here primarily targets problem (1). While lifetimebound_like seems like a reasonable solution, if I understand correctly, it mainly addresses problem (2).

As the name indicates, lifetimebound_like is a variant of lifetimebound, but there are some inconsistencies:

  • lifetimebound applies to the *this object when placed after the function type. However, lifetimebound_like does not follow the same rule;
  • lifetimebound cannot be placed after a standalone function (since there is no *this object), whereas lifetimebound_like can;

These differences might cause confusion for users. While they may not be a major issue, perhaps a more precise name could help clarify the intended behavior.

I'm starting to feel that we’re introducing more and more builtins to address a specific issue, which doesn’t seem ideal or scalable. That said, I don’t have a better alternative at the moment. Problem (2) is a known limitation of the current lifetimebound annotation -- supporting it would be great, but if we don’t have a solid and simple solution, we can always choose to do nothing and accept the limitation.

@hokein
Collaborator Author

hokein commented Feb 20, 2025

auto& value = m[std::string_view(std::string())]; // Doesn't get caught

This case is supported as well, https://godbolt.org/z/KKsvd8Kx1.

@hokein
Collaborator Author

hokein commented Feb 20, 2025

It would be nice if you could share some code snippets how exactly this would be used in practice.

Some simple examples from Abseil. When using lifetime_capture_by in the insert method, we currently have two overloads:

@hokein
Collaborator Author

hokein commented Feb 21, 2025

Another thought for supporting the emplace_back(Args...) case (for STL only).

The underlying implementation of the emplace_back relies on std::allocator_traits::construct(Alloc& a, T* p, Args&&... args), so we could use the lifetime_capture_by annotation in the instantiated function (this annotation can be relatively easy to infer in clang).

For example, consider an instantiated template (note that the function parameter should be a non-const rvalue reference):

construct(std::string_view* p, std::string&& arg [[clang::lifetime_capture_by(p)]]);

Here, we annotate arg with lifetime_capture_by(p), which should allow us to detect cases like:

construct(&view, std::string()); // Detects dangling pointer

However, this approach doesn’t work for emplace_back, because in that case, we only see perfect forwarding in the function arguments (construct(std::__to_address(__tx.__pos_), std::forward<_Args>(__args)...);)

Potential Extension:

We could consider extending the analysis scope for non-const rvalue references. Specifically, if a non-const rvalue reference parameter is annotated, we could always emit a warning when this function is called.

Example:

void add(std::vector<std::string_view>& container, std::string&& s [[clang::lifetime_capture_by(container)]]);

void test() {
    std::vector<std::string_view> abc;
    add(abc, std::string());  // Case 1: warning, abc points to a dead object

    std::string b;
    add(abc, std::move(b));   // Case 2: abc points to a moved-from object

    add(abc, b); // invalid C++: cannot bind an lvalue to an rvalue reference
}

For non-const rvalue reference function parameters, there are only two legal options to pass the argument:

  1. A temporary object.
  2. A non-const object explicitly marked with std::move().
  • Case 1) is a clear use-after-free issue, which is already detected by the current implementation.
  • Case 2) is a bit subtle. The moved-from object is still alive but in a valid-but-unspecified state. While it’s technically possible to use the object after a std::move(), the general programming guideline is to avoid doing so, as it could lead to potential use-after-move issues (we have a use-after-move clang-tidy check).

If we extend the analysis to warn on Case (2), we should be able to detect the emplace_back case. However, I’m not sure whether making the compiler stricter on this is a feasible idea.
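The binding rules that this argument relies on can be checked at compile time. The sketch below is only an illustration: `add_sketch` is a hypothetical, attribute-free stand-in for `add`, declared but never defined, and the checks use `std::is_invocable_v` from the standard library.

```cpp
#include <string>
#include <string_view>
#include <type_traits>
#include <vector>

// Same parameter shape as `add`, without the attribute; declaration only.
void add_sketch(std::vector<std::string_view>& container, std::string&& s);

using AddFnSketch = decltype(&add_sketch);
using ContainerRef = std::vector<std::string_view>&;

// An rvalue argument (a temporary, or the result of std::move) binds.
static_assert(std::is_invocable_v<AddFnSketch, ContainerRef, std::string>, "");
static_assert(std::is_invocable_v<AddFnSketch, ContainerRef, std::string&&>, "");

// A plain lvalue does not bind to std::string&&.
static_assert(!std::is_invocable_v<AddFnSketch, ContainerRef, std::string&>, "");

// A const rvalue does not bind either, hence "non-const" in the text.
static_assert(!std::is_invocable_v<AddFnSketch, ContainerRef, const std::string>, "");
```

Note that the trait cannot tell a temporary from a `std::move`d object, since both are rvalues of the same type. That mirrors the difficulty above: the signature alone does not separate Case 1 from Case 2, so the distinction has to come from analysing the call site.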

@Xazax-hun
Collaborator

A moved-from object could be reinitialized:

void test() {  
    std::vector<std::string_view> abc;  
    std::string b;  
    add(abc, std::move(b));  
    b = std::string(); // now b can be used again. 
}

That being said, maybe this is rare enough that we could have an opt-in warning. But we definitely cannot have something that is on by default.

@hokein
Collaborator Author

hokein commented Feb 24, 2025

A moved-from object could be reinitialized:

void test() {  
    std::vector<std::string_view> abc;  
    std::string b;  
    add(abc, std::move(b));  
    b = std::string(); // now b can be used again. 
}

That being said, maybe this is rare enough that we could have an opt-in warning. But we definitely cannot have something that is on by default.

I think we are primarily concerned with pointers or references to a moved-from object. When the moved-from object is reinitialized, using a pointer to it can be very tricky and is likely undefined behavior, https://godbolt.org/z/Wh5vKe8z6
