Do not pass ProjectMetadata to lazy index permissions builder (#135337)
During a serverless incident (INC-4832) caused by frequent OOM
exceptions, it was discovered that ~30% of the heap was occupied by
`ProjectMetadata` instances.
The `ProjectMetadata` instances were retained by a lambda in
`IndicesPermission`, see this example of a path to gc root: <img
width="2760" height="915" alt="9a9f6dfd-bd11-41ac-a0e2-345a86ba0509"
src="https://github.com/user-attachments/assets/7de8b33e-6002-4330-87e0-28a6ab7aeeac"
/>
The reason the lambda exists is to make the [index access control
lazy](#88708). Because the
lambda is lazy, it will hold on to the reference to `ProjectMetadata`
for the full request life cycle (as opposed to building the index
permissions and dropping the reference). This becomes a problem when
there are many concurrent searches (index actions requiring us to check
index permissions) coupled with frequent `ProjectMetadata` updates.
Since the lambda holds a reference to `ProjectMetadata` it can't be
garbage collected.
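
The retention pattern described above can be illustrated with a minimal sketch. It is not the actual `IndicesPermission` code: `MetadataSnapshot`, `lazyFailureStoreCheck`, and the index name are hypothetical stand-ins, with `MetadataSnapshot` playing the role of `ProjectMetadata`. The point is only that a lambda which reads from an object must hold a strong reference to it for as long as the lambda itself is reachable.

```java
import java.util.Set;
import java.util.function.Supplier;

public class LazyCaptureSketch {

    /** Hypothetical stand-in for ProjectMetadata: a large snapshot that is replaced on every update. */
    static final class MetadataSnapshot {
        private final Set<String> failureStoreIndices;

        MetadataSnapshot(Set<String> failureStoreIndices) {
            this.failureStoreIndices = failureStoreIndices;
        }

        boolean isFailureStoreIndex(String index) {
            return failureStoreIndices.contains(index);
        }
    }

    /** Lazy variant: the returned lambda captures the whole snapshot, not just the answer. */
    static Supplier<Boolean> lazyFailureStoreCheck(MetadataSnapshot snapshot, String index) {
        return () -> snapshot.isFailureStoreIndex(index);
    }

    public static void main(String[] args) {
        // The snapshot is never stored in a local variable here, so the lambda
        // is the only object that references it.
        Supplier<Boolean> check =
            lazyFailureStoreCheck(new MetadataSnapshot(Set.of("logs-failures")), "logs-failures");

        // The lookup still works, which is only possible because the lambda keeps
        // the snapshot strongly reachable until `check` itself is dropped.
        System.out.println(check.get());
    }
}
```

If such a supplier lives for the whole request, every in-flight request pins whichever snapshot was current when it was authorized, even after newer snapshots have replaced it.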
I've proven this by:
1. Adding a sleep to `TransportSearchAction` to simulate slow searches
2. Hooking up VisualVM to Elasticsearch
3. Launching "slow" searches with `ProjectMetadata` updates in between (triggered by creating new indices)
4. Triggering GC manually through VisualVM
5. Observing memory usage by `ProjectMetadata` while the searches are hanging (to simulate requests in flight)
### Before any requests
<img width="814" height="619" alt="Screenshot 2025-09-24 at 13 52 34"
src="https://github.com/user-attachments/assets/5732a317-e298-4c90-bf8e-b5c211481c5d"
/>
### While requests are in flight
<img width="798" height="668"
alt="Screenshot 2025-09-24 at 14 31 41"
src="https://github.com/user-attachments/assets/c5e5e9ab-e95c-4a78-b0fd-f4bcc5d3149e"
/>
### Fix
To fix this issue I've moved the part that needed
`ProjectMetadata` outside of the lambda. `ProjectMetadata` was needed
to resolve failure store indices. With this PR we will eagerly do some
of the work that #88708 tried to
remove, but I think that's acceptable for the memory gain.
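
As a rough sketch of this kind of change (again with hypothetical names, not the real `IndicesPermission` code), the part that needs the metadata can be resolved eagerly into a small value, so that the lazy part captures only that value. The `lookups` counter exists purely to make the eager lookup visible in the example.

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class EagerResolveSketch {

    /** Hypothetical stand-in for ProjectMetadata. */
    static final class MetadataSnapshot {
        private final Set<String> failureStoreIndices;
        // Counts lookups so the example can show when the snapshot is consulted.
        final AtomicInteger lookups = new AtomicInteger();

        MetadataSnapshot(Set<String> failureStoreIndices) {
            this.failureStoreIndices = failureStoreIndices;
        }

        boolean isFailureStoreIndex(String index) {
            lookups.incrementAndGet();
            return failureStoreIndices.contains(index);
        }
    }

    /** The snapshot is consulted here, up front; the lambda never sees it. */
    static Supplier<String> buildAccessDecision(MetadataSnapshot snapshot, String index) {
        // Eager step: extra work on every call, even if the supplier is never invoked.
        final boolean failureStore = snapshot.isFailureStoreIndex(index);
        // Lazy step: captures only `index` and a boolean, so the snapshot can be
        // collected as soon as the caller drops its own reference.
        return () -> index + (failureStore ? " (failure store)" : " (regular index)");
    }

    public static void main(String[] args) {
        MetadataSnapshot snapshot = new MetadataSnapshot(Set.of("logs-failures"));
        Supplier<String> decision = buildAccessDecision(snapshot, "logs-failures");
        System.out.println(snapshot.lookups.get()); // lookup already happened at build time
        System.out.println(decision.get());
        System.out.println(snapshot.lookups.get()); // invoking the supplier did not touch the snapshot
    }
}
```

The trade-off is the one noted above: the lookup is paid for even when the lazy result is never used, in exchange for not pinning the snapshot for the lifetime of the request.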
To validate that this fixed the issue I ran the same test as above and
could see that `ProjectMetadata` could be garbage collected as soon as
authorization was finished.
<img width="945" height="722" alt="Screenshot 2025-09-24 at 14 04 37"
src="https://github.com/user-attachments/assets/8f70b388-2150-4768-a5bf-ac10dd36b41c"
/>
`x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/security/authz/permission/IndicesPermission.java`