TypeCache returns unexpected data after multiple insertions and deletions#121
TypeCache returns unexpected data after multiple insertions and deletions#121SebastianStehle merged 2 commits intoy-crdt:mainfrom
Conversation
|
I see, I wonder if the extra code is needed and we just remove the exception. |
|
In the first commit I have simply removed the exception which solved the first problem from the issue: I ran the test I wrote and it worked fine. But when I tried to run the whole test suite, I suddenly got the second error: I guess the problem is that after the first commit I started getting proper type (XmlElement or XmlText) but that object might have already been already used and disposed in other tests. So it seems like we need to check Handle as well to make sure it does not point to already disposed object. |
|
But this should be general check then, right? Because I do not understand MatchesHandle in the Branch class. |
|
Okay, I dove deeper and here is what I found out. When a Branch instance is created, it's BranchId is stored in memory. BranchId is used to look Branch instance up in some map in Rust's(?) runtime. Rust's runtime controls that map's state. E.g., when a Branch is removed, it is removed from that map as well. As far as I understand, BranchId cannot be reused for a newly created Branch. So whenever Rust's runtime removes it from the map, we cannot access that Branch using ydotnet/YDotNet/Document/Types/Branches/Branch.cs Lines 97 to 104 in 93ff21b So now I am thinking about these 3 options:
|
|
What if we used BranchId (maybe HashCode of it) as key in TypeCache instead of Handle? 🤔 UPD: never mind, it would still point to deleted Branch instance |
|
Another approach I have in mind is to invalidate TypeCache's entry as soon as we remove a Branch but it does not seem trivial |
|
I think the TypeCache was used to prevent accessing handles that do not exist anymore. This was crashing the apps with "Memory Access violation" errors. I am not sure if it is needed anymore. First of all I think the map was not there in rust initially and therefore we implemented it by ourself, now we have is_alive and you could use that instead. I would wrap the IntPtr in a custom struct and then provide a native reference and whenever it is returned we should check if the ptr is still alive. It was a while ago when we discussed it: y-crdt/y-crdt#347 |
I tried this approach. Every time there is a cache hit and I get and item which is of desired runtime type, I perform if (cache.TryGetValue(handle, out var weakRef)
&& weakRef.TryGetTarget(out var item)
&& item is T typed
&& BranchChannel.Alive(typed.Handle) != 0)
{
return typed;
}
Sorry but I don't get it. Could you please elaborate? I additionally tested how
After that I decided to check
So it feels like the proper way is to use |
|
For my understandign the is_alive check needs to be done whenever we do something with the handle. So we could make a struct like And then we store this everywhere and get rid of the cache. |
|
But Rust allocator can reuse this memory address for something else, can't it? UPD: I might be wrong so I still will try what you suggest |
|
I am currently looking at
ydotnet/YDotNet/Document/Types/Branches/Branch.cs Lines 95 to 104 in 93ff21b I am thinking about simply removing cache |
|
I did not remember that we have method. If this is the only way we actually use a branch handle, it should work fine.Then I think the cache can be removed. |
70422a1 to
d081f06
Compare
|
I have removed the cache. The only concern I have is testing. The test I have written before is not a proper Unit test in my opinion. It should have tested how |
|
Can you somehow reproduce the original issue? Or do we have tests to ensure that deleted instances cannot be used anymore? |
|
I have added tests for var array = doc.Array("root");
doc.Remove("root");
var xmlFragment = doc.XmlFragment("root");So seems like root shared types live as long as the |
|
Would it be possible to merge these changes today and release a new version? |
|
Thank you! |
The original context is here: #120
Note: this code is fully implemented by Opus 4.6 and I cannot validate it is I am not familiar with such type of code. I will add summary for a little bit more of context:
Summary
Root cause: The TypeCache uses native pointers (nint) as dictionary keys, mapping them to managed wrapper objects via WeakReference. When items are deleted from the CRDT document, the native yrs library garbage-collects the underlying Branch structs, freeing their memory. The Rust allocator can then reuse those addresses for newly created items. However, the old .NET managed wrappers may still be alive (not yet collected by the .NET GC), so the TypeCache still holds live entries for those addresses. When a new item of a different type (e.g., XmlElement) lands at the same address as a previously cached item of another type (e.g., XmlText), the old code threw an exception instead of recognizing this as a stale entry.
The fix: Instead of throwing when the cached type doesn't match, treat it as a stale entry and fall through to create a new wrapper of the correct type. This is safe because the old managed object's BranchId would no longer resolve to a live branch (since the native item was GC'd), making the old wrapper effectively dead.
What this second bug is
Same root cause — native handle reuse after CRDT GC — but a different symptom. The previous fix only handled the cross-type case (handle reused for XmlElement but cache had XmlText). This bug is the same-type case:
Iteration N creates an XmlText at handle H with BranchId = {client: 1, clock: 5}. Cached.
RemoveRange + transaction commit → yrs GC frees the native Branch at H.
Iteration N+1: InsertText allocates a new XmlText at the same address H, but with BranchId = {client: 1, clock: 105}.
TypeCache.GetOrAdd(H) finds the old cached XmlText — same type, passes the type check — and returns the stale object.
The stale object's BranchId resolves to a dead/null branch → ObjectDisposedException.
The fix (3 files)
UnmanagedResource.cs: Added virtual MatchesHandle(nint handle) that returns true by default.
Branch.cs: Overrides MatchesHandle to compare the stored BranchId (captured at construction from the original native branch) with the BranchId currently at the handle address. If they differ, the handle was reused for a different branch — the cache entry is stale.
TypeCache.cs: The cache hit condition now requires both item is T and item.MatchesHandle(handle). A stale entry (whether same-type or cross-type) falls through and gets replaced.