
TypeCache returns unexpected data after multiple insertions and deletions #121

Merged
SebastianStehle merged 2 commits into y-crdt:main from Dewarrum:typecache-bug on Feb 14, 2026

@Dewarrum
Contributor

Dewarrum commented Feb 6, 2026

The original context is here: #120

Note: this code was written entirely by Opus 4.6, and I cannot validate it because I am not familiar with this kind of code. Here is a summary for a bit more context:

Summary
Root cause: The TypeCache uses native pointers (nint) as dictionary keys, mapping them to managed wrapper objects via WeakReference. When items are deleted from the CRDT document, the native yrs library garbage-collects the underlying Branch structs, freeing their memory. The Rust allocator can then reuse those addresses for newly created items. However, the old .NET managed wrappers may still be alive (not yet collected by the .NET GC), so the TypeCache still holds live entries for those addresses. When a new item of a different type (e.g., XmlElement) lands at the same address as a previously cached item of another type (e.g., XmlText), the old code threw an exception instead of recognizing this as a stale entry.
The fix: Instead of throwing when the cached type doesn't match, treat it as a stale entry and fall through to create a new wrapper of the correct type. This is safe because the old managed object's BranchId would no longer resolve to a live branch (since the native item was GC'd), making the old wrapper effectively dead.
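To make the cross-type case concrete, here is a minimal C# sketch of a handle-keyed cache of weak references in the spirit of the TypeCache described above. It is not the YDotNet source: `HandleCache`, `FakeXmlText` and `FakeXmlElement` are hypothetical stand-ins, and the "native" address is just a number. As in the first fix, a wrapper of the wrong type at a known address is treated as stale and replaced instead of causing an exception.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for managed wrappers; not the YDotNet types.
public sealed class FakeXmlText { }
public sealed class FakeXmlElement { }

// Minimal handle-keyed cache modeled on the TypeCache described above (an assumption,
// not the actual implementation). Keys are native addresses, values are weak references.
public sealed class HandleCache
{
    private readonly Dictionary<nint, WeakReference<object>> cache = new();

    public T GetOrAdd<T>(nint handle, Func<nint, T> factory) where T : class
    {
        if (cache.TryGetValue(handle, out var weakRef)
            && weakRef.TryGetTarget(out var item)
            && item is T typed)
        {
            // Cache hit: same address, expected type.
            return typed;
        }

        // Miss, collected wrapper, or a wrapper of another type left over from an
        // earlier allocation at this address: treat it as stale and replace it.
        var created = factory(handle);
        cache[handle] = new WeakReference<object>(created);
        return created;
    }
}
```

If the native side frees an address and reuses it for an element while the old text wrapper is still reachable, `GetOrAdd<FakeXmlElement>` now creates a fresh wrapper rather than throwing. Note that this check alone cannot detect reuse by an item of the same type.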

What this second bug is
Same root cause (native handle reuse after CRDT GC), but a different symptom. The previous fix only handled the cross-type case (handle reused for an XmlElement while the cache held an XmlText). This bug is the same-type case:
  1. Iteration N creates an XmlText at handle H with BranchId = {client: 1, clock: 5}. It is cached.
  2. RemoveRange + transaction commit → yrs GC frees the native Branch at H.
  3. Iteration N+1: InsertText allocates a new XmlText at the same address H, but with BranchId = {client: 1, clock: 105}.
  4. TypeCache.GetOrAdd(H) finds the old cached XmlText, which has the same type and therefore passes the type check, and returns the stale object.
  5. The stale object's BranchId resolves to a dead/null branch → ObjectDisposedException.
The fix (3 files)
  • UnmanagedResource.cs: added a virtual MatchesHandle(nint handle) that returns true by default.
  • Branch.cs: overrides MatchesHandle to compare the stored BranchId (captured at construction from the original native branch) with the BranchId currently at the handle address. If they differ, the handle was reused for a different branch, so the cache entry is stale.
  • TypeCache.cs: a cache hit now requires both item is T and item.MatchesHandle(handle). A stale entry (whether same-type or cross-type) falls through and gets replaced.
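The three-file fix can be modeled in a few dozen lines. The sketch below uses hypothetical types (`FakeNativeHeap`, `FakeUnmanagedResource`, `FakeBranch`, `CheckedHandleCache`) rather than the real YDotNet classes, and the `ulong`/`uint` field types of `BranchId` are an assumption. It shows a virtual `MatchesHandle` that defaults to true, a branch override that compares the BranchId captured at construction with the one currently at the address, and a cache hit that requires both the type check and `MatchesHandle`.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of a branch id; field types are illustrative.
public readonly record struct BranchId(ulong Client, uint Clock);

// Hypothetical model of the native side: which BranchId currently lives at each address.
public sealed class FakeNativeHeap
{
    private readonly Dictionary<nint, BranchId> live = new();
    public void Allocate(nint address, BranchId id) => live[address] = id;
    public void Free(nint address) => live.Remove(address);
    public BranchId? IdAt(nint address) => live.TryGetValue(address, out var id) ? id : (BranchId?)null;
}

public abstract class FakeUnmanagedResource
{
    // Default: a cached wrapper is assumed to match its handle.
    public virtual bool MatchesHandle(nint handle) => true;
}

public sealed class FakeBranch : FakeUnmanagedResource
{
    private readonly FakeNativeHeap heap;
    public BranchId Id { get; }

    public FakeBranch(FakeNativeHeap heap, nint handle)
    {
        this.heap = heap;
        // Capture the id of the branch that lives at the handle right now.
        Id = heap.IdAt(handle) ?? throw new InvalidOperationException("No branch at handle.");
    }

    // Stale if the address now hosts a different branch, or nothing at all.
    public override bool MatchesHandle(nint handle) => heap.IdAt(handle) == Id;
}

public sealed class CheckedHandleCache
{
    private readonly Dictionary<nint, WeakReference<FakeUnmanagedResource>> cache = new();

    public T GetOrAdd<T>(nint handle, Func<nint, T> factory) where T : FakeUnmanagedResource
    {
        if (cache.TryGetValue(handle, out var weakRef)
            && weakRef.TryGetTarget(out var item)
            && item is T typed
            && typed.MatchesHandle(handle))
        {
            return typed;
        }

        // Same-type or cross-type stale entries both end up here and are replaced.
        var created = factory(handle);
        cache[handle] = new WeakReference<FakeUnmanagedResource>(created);
        return created;
    }
}
```

Replaying the scenario from the list (allocate {1, 5} at H, free it, allocate {1, 105} at H) makes the second lookup return a new wrapper whose Id is {1, 105} instead of the stale one.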

@SebastianStehle
Collaborator

I see. I wonder whether the extra code is needed at all, or whether we can just remove the exception.

@Dewarrum
Contributor Author

Dewarrum commented Feb 6, 2026

In the first commit I simply removed the exception, which solved the first problem from the issue:

Unhandled exception. YDotNet.YDotNetException: Expected YDotNet.Document.Types.XmlElements.XmlElement, got YDotNet.Document.Types.XmlTexts.XmlText
   at YDotNet.Infrastructure.TypeCache.GetOrAdd[T](IntPtr handle, Func`2 factory)
   at YDotNet.Document.Doc.GetOrAdd[T](IntPtr handle, Func`3 factory)
   at YDotNet.Document.Doc.GetXmlElement(IntPtr handle, Boolean isDeleted)
   at YDotNet.Document.Types.XmlFragments.XmlFragment.InsertElement(Transaction transaction, UInt32 index, String name)
   at Program.<Main>$(String[] args) in /Users/work/dev/experiments/ydotnet-typecache-bug/YDotNetSample/YDotNetSample/Program.cs:line 10

I ran the test I wrote and it worked fine. But when I tried to run the whole test suite, I suddenly got the second error:

Unhandled exception. System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'Object is disposed.'.
   at YDotNet.Document.Types.Branches.Branch.GetHandle(Transaction transaction)
   at YDotNet.Document.Types.XmlTexts.XmlText.Insert(Transaction transaction, UInt32 index, String value, Input attributes)
   at Program.<Main>$(String[] args) in /Users/work/dev/experiments/ydotnet-typecache-bug/YDotNetSample/YDotNetSample/Program.cs:line 10

I guess the problem is that after the first commit I started getting the proper type (XmlElement or XmlText), but that object might already have been used and disposed in other tests. So it seems we also need to check the handle to make sure it does not point to an already disposed object.

@SebastianStehle
Collaborator

But then this should be a general check, right? I do not understand why MatchesHandle is in the Branch class.

@Dewarrum
Contributor Author

Dewarrum commented Feb 9, 2026

Okay, I dove deeper and here is what I found out.

When a Branch instance is created, its BranchId is stored in memory. The BranchId is used to look up the Branch in a map in the Rust(?) runtime, and the Rust runtime controls that map's state. E.g., when a Branch is removed, it is removed from that map as well. As far as I understand, a BranchId cannot be reused for a newly created Branch. So once the Rust runtime removes it from the map, we can no longer access that Branch using ybranch_get (we get nint.Zero in return). This is why I get the ObjectDisposedException:

var branchHandle = BranchChannel.Get(handle.Handle, transaction.Handle);
if (branchHandle == nint.Zero || BranchChannel.Alive(branchHandle) == 0)
{
    throw new ObjectDisposedException("Object is disposed.");
}
return branchHandle;

So now I am thinking about these 3 options:

  • Use ybranch_alive to check if Branch is alive
  • Use ybranch_get to check if BranchId is still pointing to something
  • Get rid of TypeCache entirely

ybranch_alive approach

No success: I still got an ObjectDisposedException.

ybranch_get approach

To use ybranch_get I need an open transaction, but looking at the places that use TypeCache, it is hard to say whether one can be passed in. For example, here:

case OutputTag.Array:
    return doc.GetArray(OutputChannel.Array(handle), isDeleted);
case OutputTag.Map:
    return doc.GetMap(OutputChannel.Map(handle), isDeleted);
case OutputTag.Text:
    return doc.GetText(OutputChannel.Text(handle), isDeleted);
case OutputTag.XmlElement:
    return doc.GetXmlElement(OutputChannel.XmlElement(handle), isDeleted);
case OutputTag.XmlText:
    return doc.GetXmlText(OutputChannel.XmlText(handle), isDeleted);
case OutputTag.Doc:
    return doc.GetDoc(OutputChannel.Doc(handle), isDeleted);

Get rid of TypeCache approach

I don't like it because of performance considerations, and it might also be a breaking change. Right now we return the same Branch reference every time, but if we remove the cache, we would need to create a new instance on each call.

Summary

I am a little bit lost and don't see a good way to fix this bug. If you could give me a small hint after reading my research, I would really appreciate it.

@Dewarrum
Contributor Author

Dewarrum commented Feb 9, 2026

What if we used the BranchId (maybe its hash code) as the key in TypeCache instead of the handle? 🤔

UPD: never mind, it would still point to the deleted Branch instance.

@Dewarrum
Contributor Author

Dewarrum commented Feb 9, 2026

Another approach I have in mind is to invalidate the TypeCache entry as soon as we remove a Branch, but that does not seem trivial.

@SebastianStehle
Collaborator

SebastianStehle commented Feb 9, 2026

I think the TypeCache was used to prevent accessing handles that do not exist anymore. This was crashing the apps with "Memory Access violation" errors. I am not sure if it is needed anymore.

First of all, I think the map was not there in Rust initially and therefore we implemented it ourselves; now we have is_alive and you could use that instead.

I would wrap the IntPtr in a custom struct, then provide a native reference, and whenever it is returned we should check if the ptr is still alive.

We discussed it a while ago: y-crdt/y-crdt#347

@Dewarrum
Contributor Author

First of all, I think the map was not there in Rust initially and therefore we implemented it ourselves; now we have is_alive and you could use that instead.

I tried this approach. Every time there is a cache hit and I get an item of the desired runtime type, I perform a BranchChannel.Alive check. Even so, I still get an ObjectDisposedException. This is how I check whether it is safe to return an item found in the cache:

if (cache.TryGetValue(handle, out var weakRef)
    && weakRef.TryGetTarget(out var item)
    && item is T typed
    && BranchChannel.Alive(typed.Handle) != 0)
{
    return typed;
}

I would wrap the IntPtr in a custom struct, then provide a native reference, and whenever it is returned we should check if the ptr is still alive.

Sorry, but I don't get it. Could you please elaborate?

I also tested how BranchChannel.Alive works, and it feels like I don't fully understand how it is supposed to work. What I did:

  • Open transaction
  • On XmlFragment call InsertElement
  • Capture Handle of returned XmlElement
  • Commit transaction
  • Open new transaction
  • On XmlFragment call RemoveRange
  • Commit transaction
  • Call BranchChannel.Alive and get 1, which means the Branch is reported as alive even though it was removed

After that I decided to check BranchChannel.Get:

  • Open transaction
  • On XmlFragment call InsertElement
  • Capture Handle of returned XmlElement
  • Commit transaction
  • Open new transaction
  • On XmlFragment call RemoveRange
  • Commit transaction
  • Call BranchChannel.Get and get 0x0 (null pointer)

So it feels like the proper way is to use BranchChannel.Get, but it requires a transaction, which is hard to pass in and would definitely introduce a breaking change.

@SebastianStehle
Collaborator

As I understand it, the is_alive check needs to be done whenever we do something with the handle.

So we could make a struct like this:

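using System;

// Sketch: validate the pointer on every access instead of caching wrappers.
// isAlive is a placeholder for a native liveness check such as BranchChannel.Alive.
struct NativeHandle(IntPtr handle, Func<IntPtr, bool> isAlive)
{
    public IntPtr Handle
    {
        get
        {
            if (!isAlive(handle))
            {
                throw new ObjectDisposedException("Object is disposed.");
            }
            return handle;
        }
    }
}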

And then we store this everywhere and get rid of the cache.

@Dewarrum
Contributor Author

Dewarrum commented Feb 11, 2026

But the Rust allocator can reuse this memory address for something else, can't it?

UPD: I might be wrong, so I will still try what you suggest.

@Dewarrum
Contributor Author

I am currently looking at Branch.GetHandle, and it looks very similar to what you are showing in your struct example. Doesn't it ensure the same thing? Whenever we want to do something (e.g., XmlFragment.InsertElement, Map.Insert, etc.), we:

  • Get a reference to BranchId
  • Get a reference to Branch by BranchId
  • Check if we got a null reference
  • Check if the Branch is alive

using var handle = MemoryWriter.WriteStruct(BranchId);
var branchHandle = BranchChannel.Get(handle.Handle, transaction.Handle);
if (branchHandle == nint.Zero || BranchChannel.Alive(branchHandle) == 0)
{
    throw new ObjectDisposedException("Object is disposed.");
}
return branchHandle;

I am thinking about simply removing the cache.
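Dropping the cache relies on the lookup pattern in Branch.GetHandle: the wrapper keeps only its BranchId and resolves it on every use, so a reused address cannot be mistaken for the old branch. Below is a minimal sketch of that pattern; `FakeBranchRegistry` is a hypothetical in-memory stand-in for ybranch_get and `UncachedBranch` is not a YDotNet type, so treat this as an illustration rather than the actual implementation.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of a branch id; field types are illustrative.
public readonly record struct BranchId(ulong Client, uint Clock);

// Hypothetical registry playing the role of ybranch_get:
// returns the current address for an id, or zero once the branch is removed.
public sealed class FakeBranchRegistry
{
    private readonly Dictionary<BranchId, nint> byId = new();
    public void Add(BranchId id, nint address) => byId[id] = address;
    public void Remove(BranchId id) => byId.Remove(id);
    public nint Get(BranchId id) => byId.TryGetValue(id, out var address) ? address : nint.Zero;
}

// Wrapper without any cache: it stores only the id and re-resolves it on every use.
public sealed class UncachedBranch
{
    private readonly FakeBranchRegistry registry;
    public BranchId Id { get; }

    public UncachedBranch(FakeBranchRegistry registry, BranchId id)
    {
        this.registry = registry;
        Id = id;
    }

    public nint GetHandle()
    {
        var address = registry.Get(Id);
        if (address == nint.Zero)
        {
            // The id no longer resolves, even if its old address now hosts another branch.
            throw new ObjectDisposedException("Object is disposed.");
        }
        return address;
    }
}
```

The trade-off raised earlier still applies: two lookups of the same branch now produce two wrapper instances, so reference equality between them no longer holds.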

@SebastianStehle
Collaborator

I did not remember that we have this method. If this is the only way we actually use a branch handle, it should work fine. Then I think the cache can be removed.

@Dewarrum
Contributor Author

I have removed the cache. My only concern is testing. In my opinion, the test I wrote before is not a proper unit test: it should have tested how Doc uses TypeCache. Since we decided to remove TypeCache, I don't know if any testing is needed; if I write a test, it would feel like I am testing code that has been removed. Please let me know what you think.

@SebastianStehle
Collaborator

Can you somehow reproduce the original issue? Or do we have tests to ensure that deleted instances cannot be used anymore?

@Dewarrum
Contributor Author

Dewarrum commented Feb 12, 2026

I have added tests for XmlFragment and XmlElement. I tried to figure out whether other shared types are affected, but it seems they are not, because there is no way to replace an existing root shared type with something else. E.g., you cannot do:

var array = doc.Array("root");
doc.Remove("root");
var xmlFragment = doc.XmlFragment("root");

So it seems that root shared types live as long as the Doc lives and thus can always be accessed through the same pointer (unless the Doc is destroyed).

@Dewarrum
Contributor Author

Would it be possible to merge these changes today and release a new version?

@SebastianStehle SebastianStehle merged commit fea4f09 into y-crdt:main Feb 14, 2026
15 checks passed
@Dewarrum
Contributor Author

Thank you!
