TypeCache returns unexpected data after multiple insertions and deletions by Dewarrum · Pull Request #121 · y-crdt/ydotnet

Dewarrum · 2026-02-06T10:51:37Z

The original context is here: #120

Note: this code was written entirely by Opus 4.6, and I cannot validate it because I am not familiar with this kind of code. I will add a summary for a bit more context:

Summary
Root cause: The TypeCache uses native pointers (nint) as dictionary keys, mapping them to managed wrapper objects via WeakReference. When items are deleted from the CRDT document, the native yrs library garbage-collects the underlying Branch structs, freeing their memory. The Rust allocator can then reuse those addresses for newly created items. However, the old .NET managed wrappers may still be alive (not yet collected by the .NET GC), so the TypeCache still holds live entries for those addresses. When a new item of a different type (e.g., XmlElement) lands at the same address as a previously cached item of another type (e.g., XmlText), the old code threw an exception instead of recognizing this as a stale entry.
The fix: Instead of throwing when the cached type doesn't match, treat it as a stale entry and fall through to create a new wrapper of the correct type. This is safe because the old managed object's BranchId would no longer resolve to a live branch (since the native item was GC'd), making the old wrapper effectively dead.
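
The following is a minimal, self-contained C# sketch of the cross-type case. `Wrapper`, `FakeXmlText`, `FakeXmlElement` and `HandleCache` are hypothetical stand-ins, not YDotNet's real classes. The cache shape (a `Dictionary<nint, WeakReference<...>>`) is assumed from the description above. When the cached type does not match, the cache replaces the entry instead of throwing, as the first commit does.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for managed wrappers; not YDotNet's actual types.
public abstract class Wrapper
{
    protected Wrapper(nint handle) => Handle = handle;

    public nint Handle { get; }
}

public sealed class FakeXmlText : Wrapper
{
    public FakeXmlText(nint handle) : base(handle) { }
}

public sealed class FakeXmlElement : Wrapper
{
    public FakeXmlElement(nint handle) : base(handle) { }
}

// Assumed shape of a handle-keyed cache in the spirit of TypeCache.
public sealed class HandleCache
{
    private readonly Dictionary<nint, WeakReference<Wrapper>> cache = new();

    public T GetOrAdd<T>(nint handle, Func<nint, T> factory) where T : Wrapper
    {
        if (cache.TryGetValue(handle, out var weakRef)
            && weakRef.TryGetTarget(out var item)
            && item is T typed)
        {
            // Live entry of the requested type: reuse it.
            return typed;
        }

        // Missing, already collected, or cached under a different type (the address was
        // reused for another kind of branch): treat it as stale and replace it.
        var created = factory(handle);
        cache[handle] = new WeakReference<Wrapper>(created);
        return created;
    }
}
```

This only addresses the cross-type symptom. If the reused address gets a wrapper of the *same* type, the stale object is still returned, which is the second bug described next.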

What this second bug is
Same root cause — native handle reuse after CRDT GC — but a different symptom. The previous fix only handled the cross-type case (handle reused for XmlElement but cache had XmlText). This bug is the same-type case:
1. Iteration N creates an XmlText at handle H with BranchId = {client: 1, clock: 5}. It is cached.
2. RemoveRange + transaction commit → yrs GC frees the native Branch at H.
3. Iteration N+1: InsertText allocates a new XmlText at the same address H, but with BranchId = {client: 1, clock: 105}.
4. TypeCache.GetOrAdd(H) finds the old cached XmlText. It has the same type and passes the type check, so the stale object is returned.
5. The stale object's BranchId resolves to a dead/null branch → ObjectDisposedException.
The fix (3 files)
- UnmanagedResource.cs: Added a virtual MatchesHandle(nint handle) that returns true by default.
- Branch.cs: Overrides MatchesHandle to compare the stored BranchId (captured at construction from the original native branch) with the BranchId currently at the handle address. If they differ, the handle was reused for a different branch, so the cache entry is stale.
- TypeCache.cs: The cache hit condition now requires both `item is T` and `item.MatchesHandle(handle)`. A stale entry (whether same-type or cross-type) falls through and gets replaced.
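
The `MatchesHandle` idea can be sketched without the native library. In this hypothetical model:

- `FakeNativeHeap` records which branch id currently occupies each address.
- `FakeBranch` captures that id at construction.
- The cache accepts a hit only when the type matches *and* the id at the address is unchanged.

None of these names are YDotNet APIs. `BranchKey` simply mirrors the `{client, clock}` pair from the example.

```csharp
using System;
using System.Collections.Generic;

// Mirrors the {client, clock} pair used as BranchId in the example.
public readonly record struct BranchKey(ulong Client, uint Clock);

// Hypothetical fake of the native side: which branch currently occupies each address.
public sealed class FakeNativeHeap
{
    private readonly Dictionary<nint, BranchKey> idAtAddress = new();

    public void Place(nint address, BranchKey id) => idAtAddress[address] = id;

    public bool TryGetId(nint address, out BranchKey id) => idAtAddress.TryGetValue(address, out id);
}

public class Resource
{
    public Resource(nint handle) => Handle = handle;

    public nint Handle { get; }

    // Default: any cached object at this handle is accepted.
    public virtual bool MatchesHandle(nint handle) => true;
}

public sealed class FakeBranch : Resource
{
    private readonly FakeNativeHeap heap;

    public FakeBranch(FakeNativeHeap heap, nint handle) : base(handle)
    {
        this.heap = heap;

        // Capture the identity of the branch that occupies the address right now.
        if (!heap.TryGetId(handle, out var id))
        {
            throw new InvalidOperationException("No branch at this address.");
        }

        BranchId = id;
    }

    public BranchKey BranchId { get; }

    // Stale if the address now holds a branch with a different id.
    public override bool MatchesHandle(nint handle) =>
        heap.TryGetId(handle, out var current) && current == BranchId;
}

public sealed class CheckedCache
{
    private readonly Dictionary<nint, WeakReference<Resource>> cache = new();

    public T GetOrAdd<T>(nint handle, Func<nint, T> factory) where T : Resource
    {
        if (cache.TryGetValue(handle, out var weakRef)
            && weakRef.TryGetTarget(out var item)
            && item is T typed
            && typed.MatchesHandle(handle))
        {
            return typed;
        }

        // Missing, collected, wrong type, or same type but a different branch: replace.
        var created = factory(handle);
        cache[handle] = new WeakReference<Resource>(created);
        return created;
    }
}
```

Against the real library, reading the BranchId at an address that may already have been freed is itself an assumption worth checking. The fake heap sidesteps that question.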

SebastianStehle · 2026-02-06T13:00:29Z

I see. I wonder whether the extra code is needed, or whether we can just remove the exception.

Dewarrum · 2026-02-06T13:12:09Z

In the first commit I simply removed the exception, which solved the first problem from the issue:

Unhandled exception. YDotNet.YDotNetException: Expected YDotNet.Document.Types.XmlElements.XmlElement, got YDotNet.Document.Types.XmlTexts.XmlText
   at YDotNet.Infrastructure.TypeCache.GetOrAdd[T](IntPtr handle, Func`2 factory)
   at YDotNet.Document.Doc.GetOrAdd[T](IntPtr handle, Func`3 factory)
   at YDotNet.Document.Doc.GetXmlElement(IntPtr handle, Boolean isDeleted)
   at YDotNet.Document.Types.XmlFragments.XmlFragment.InsertElement(Transaction transaction, UInt32 index, String name)
   at Program.<Main>$(String[] args) in /Users/work/dev/experiments/ydotnet-typecache-bug/YDotNetSample/YDotNetSample/Program.cs:line 10

I ran the test I wrote and it worked fine. But when I tried to run the whole test suite, I suddenly got the second error:

Unhandled exception. System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'Object is disposed.'.
   at YDotNet.Document.Types.Branches.Branch.GetHandle(Transaction transaction)
   at YDotNet.Document.Types.XmlTexts.XmlText.Insert(Transaction transaction, UInt32 index, String value, Input attributes)
   at Program.<Main>$(String[] args) in /Users/work/dev/experiments/ydotnet-typecache-bug/YDotNetSample/YDotNetSample/Program.cs:line 10

I guess the problem is that after the first commit I started getting the correct type (XmlElement or XmlText), but that object might already have been used and disposed in other tests. So it seems we also need to check the Handle to make sure it does not point to an already disposed object.

SebastianStehle · 2026-02-06T14:18:22Z

But then this should be a general check, right? I do not understand the MatchesHandle override in the Branch class.

Dewarrum · 2026-02-09T13:48:22Z

Okay, I dove deeper and here is what I found out.

When a Branch instance is created, its BranchId is stored in memory. The BranchId is used to look up the Branch in some map in the Rust runtime (I think). The Rust side controls that map's state; e.g., when a Branch is removed, it is removed from that map as well. As far as I understand, a BranchId cannot be reused for a newly created Branch. So once the Rust side removes it from the map, we can no longer access that Branch using ybranch_get (we get nint.Zero in return). This is why I get the ObjectDisposedException:

ydotnet/YDotNet/Document/Types/Branches/Branch.cs, lines 97 to 104 in 93ff21b:

    var branchHandle = BranchChannel.Get(handle.Handle, transaction.Handle);

    if (branchHandle == nint.Zero || BranchChannel.Alive(branchHandle) == 0)
    {
        throw new ObjectDisposedException("Object is disposed.");
    }

    return branchHandle;
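
Here is a toy model of the behaviour described above. It is an assumption based on this comment, not a description of yrs internals:

- Ids only ever move forward.
- Freed addresses go back to a pool and are handed out again.
- Looking up a removed id yields `nint.Zero`, like the `ybranch_get` result that triggers the `ObjectDisposedException`.

`FakeBranchRegistry` and `BranchKey` are hypothetical.

```csharp
using System.Collections.Generic;

public readonly record struct BranchKey(ulong Client, uint Clock);

// Hypothetical registry: BranchId -> native address of a live branch.
public sealed class FakeBranchRegistry
{
    private readonly Dictionary<BranchKey, nint> live = new();
    private readonly Stack<nint> freedAddresses = new();
    private nint nextAddress = 0x1000;
    private uint nextClock;

    public BranchKey Create(ulong client, out nint address)
    {
        // Addresses are recycled, like an allocator reusing freed memory...
        address = freedAddresses.Count > 0 ? freedAddresses.Pop() : nextAddress++;

        // ...but the clock only grows, so an id is never handed out twice.
        var id = new BranchKey(client, nextClock++);
        live[id] = address;
        return id;
    }

    public void Remove(BranchKey id)
    {
        if (live.Remove(id, out var address))
        {
            freedAddresses.Push(address);
        }
    }

    // Analogue of the lookup result: nint.Zero once the id no longer resolves.
    public nint Get(BranchKey id) => live.TryGetValue(id, out var address) ? address : nint.Zero;
}
```

The address is reused while the id is not. That is why a cache keyed by address can hand back a wrapper whose id no longer resolves.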

So now I am thinking about these 3 options:

1. Use ybranch_alive to check whether the Branch is alive
2. Use ybranch_get to check whether the BranchId still points to something
3. Get rid of TypeCache entirely

`ybranch_alive` approach

No success: I still got an ObjectDisposedException.

`ybranch_get` approach

To use ybranch_get I need an open transaction, but looking at the places that use TypeCache, it is hard to say whether one can be passed in. E.g., here:

ydotnet/YDotNet/Document/Cells/Output.cs, lines 163 to 179 in a7de397:

    case OutputTag.Array:
        return doc.GetArray(OutputChannel.Array(handle), isDeleted);

    case OutputTag.Map:
        return doc.GetMap(OutputChannel.Map(handle), isDeleted);

    case OutputTag.Text:
        return doc.GetText(OutputChannel.Text(handle), isDeleted);

    case OutputTag.XmlElement:
        return doc.GetXmlElement(OutputChannel.XmlElement(handle), isDeleted);

    case OutputTag.XmlText:
        return doc.GetXmlText(OutputChannel.XmlText(handle), isDeleted);

    case OutputTag.Doc:
        return doc.GetDoc(OutputChannel.Doc(handle), isDeleted);

Get rid of TypeCache approach

I don't like this option because of performance, and it might also be a breaking change. Right now we return the same Branch reference every time, but if we remove the cache, we would need to create a new instance on each lookup.
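
The breaking-change concern is about reference identity. In this hypothetical sketch (`UncachedBranch`, `UncachedLookup` and `BranchKey` are not YDotNet types), every lookup builds a fresh wrapper. Two lookups of the same branch are then no longer `ReferenceEquals`, although they still agree on the branch id.

```csharp
public readonly record struct BranchKey(ulong Client, uint Clock);

// Hypothetical wrapper that is created anew on every lookup (no cache).
public sealed class UncachedBranch
{
    public UncachedBranch(BranchKey id) => BranchId = id;

    public BranchKey BranchId { get; }
}

public static class UncachedLookup
{
    // Stand-in for a doc.GetXxx(...) call once the cache is removed: always a new object.
    public static UncachedBranch Get(BranchKey id) => new(id);
}
```

Some callers would observe the difference: those comparing wrappers with `==` (reference equality for a class that does not override it) and those using wrappers as dictionary keys. Callers comparing ids would not.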

Summary

I am a little bit lost and don't see a good way to fix this bug. If you could give me a small hint after reading my research, I would really appreciate it.

Dewarrum · 2026-02-09T13:55:45Z

What if we used BranchId (maybe HashCode of it) as key in TypeCache instead of Handle? 🤔

UPD: never mind, it would still point to a deleted Branch instance.

Dewarrum · 2026-02-09T13:57:52Z

Another approach I have in mind is to invalidate the TypeCache entry as soon as we remove a Branch, but that does not seem trivial.
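
One way to picture the invalidation idea is a cache with an explicit `Invalidate` hook that every removal path would have to call. `InvalidatingCache` is a hypothetical sketch, not TypeCache's API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical handle-keyed cache with an explicit invalidation hook.
public sealed class InvalidatingCache<TValue> where TValue : class
{
    private readonly Dictionary<nint, WeakReference<TValue>> entries = new();

    public TValue GetOrAdd(nint handle, Func<nint, TValue> factory)
    {
        if (entries.TryGetValue(handle, out var weakRef) && weakRef.TryGetTarget(out var cached))
        {
            return cached;
        }

        var created = factory(handle);
        entries[handle] = new WeakReference<TValue>(created);
        return created;
    }

    // Would have to run before the native side can hand the same address to a new branch.
    public void Invalidate(nint handle) => entries.Remove(handle);
}
```

The hard part is the hook itself. Branches are freed by the native library (e.g. during the commit-time GC described at the top of this thread). The managed side would therefore need to learn about every freed handle, possibly including nested branches it never removed directly.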

SebastianStehle · 2026-02-09T17:11:52Z

I think the TypeCache was used to prevent accessing handles that do not exist anymore. This was crashing the apps with "Memory Access violation" errors. I am not sure if it is needed anymore.

First of all, I think the map did not exist in Rust initially, so we implemented it ourselves. Now we have is_alive, and you could use that instead.

I would wrap the IntPtr in a custom struct that exposes the native reference, and whenever it is returned we should check whether the pointer is still alive.

It was a while ago when we discussed it: y-crdt/y-crdt#347

Dewarrum · 2026-02-11T09:40:45Z

> First of all, I think the map did not exist in Rust initially, so we implemented it ourselves. Now we have is_alive, and you could use that instead.

I tried this approach. Every time there is a cache hit and I get an item of the desired runtime type, I perform a BranchChannel.Alive check. Even so, I still get an ObjectDisposedException. This is how I check whether it is safe to return an item found in the cache:

    if (cache.TryGetValue(handle, out var weakRef)
        && weakRef.TryGetTarget(out var item)
        && item is T typed
        && BranchChannel.Alive(typed.Handle) != 0)
    {
        return typed;
    }

> I would wrap the IntPtr in a custom struct that exposes the native reference, and whenever it is returned we should check whether the pointer is still alive.

Sorry, but I don't get it. Could you please elaborate?

I also tested how BranchChannel.Alive works, and it feels like I don't fully understand how it should work. What I did:

1. Open a transaction
2. On XmlFragment, call InsertElement
3. Capture the Handle of the returned XmlElement
4. Commit the transaction
5. Open a new transaction
6. On XmlFragment, call RemoveRange
7. Commit the transaction
8. Call BranchChannel.Alive and get 1, meaning the Branch is reported as alive even though it was removed

After that I decided to check BranchChannel.Get:

1. Open a transaction
2. On XmlFragment, call InsertElement
3. Capture the Handle of the returned XmlElement
4. Commit the transaction
5. Open a new transaction
6. On XmlFragment, call RemoveRange
7. Commit the transaction
8. Call BranchChannel.Get and get 0x0 (a null pointer)

So it feels like the proper way is to use BranchChannel.Get, but it requires a transaction, which is hard to pass and would definitely introduce a breaking change.

SebastianStehle · 2026-02-11T10:31:41Z

As I understand it, the is_alive check needs to be done whenever we do something with the handle.

So we could make a struct like:

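    // Every access to the raw pointer re-checks that the branch is still alive.
    readonly struct NativeHandle(IntPtr handle)
    {
        public IntPtr Handle
        {
            get
            {
                if (BranchChannel.Alive(handle) == 0)
                {
                    throw new ObjectDisposedException("Object is disposed.");
                }

                return handle;
            }
        }
    }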

And then we store this everywhere and get rid of the cache.

Dewarrum · 2026-02-11T10:34:33Z

But the Rust allocator can reuse this memory address for something else, can't it?

UPD: I might be wrong, so I will still try what you suggest.
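
The concern can be illustrated with a toy allocator. In this hypothetical sketch, liveness is modelled as "the address is currently allocated". That is an assumption: the real `ybranch_alive` may check something different. The `NativeHandle` struct follows the shape proposed above but takes the allocator explicitly, so the example runs on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical allocator that reuses freed addresses (most recently freed first).
public sealed class FakeAllocator
{
    private readonly Dictionary<nint, string> occupants = new();
    private readonly Stack<nint> freeList = new();
    private nint next = 0x1000;

    public nint Allocate(string occupant)
    {
        var address = freeList.Count > 0 ? freeList.Pop() : next++;
        occupants[address] = occupant;
        return address;
    }

    public void Free(nint address)
    {
        if (occupants.Remove(address))
        {
            freeList.Push(address);
        }
    }

    public bool IsAlive(nint address) => occupants.ContainsKey(address);

    public string OccupantAt(nint address) => occupants[address];
}

// Wrapper that re-checks liveness of the raw address on every access.
public readonly struct NativeHandle
{
    private readonly FakeAllocator allocator;
    private readonly nint handle;

    public NativeHandle(FakeAllocator allocator, nint handle)
    {
        this.allocator = allocator;
        this.handle = handle;
    }

    public nint Handle => allocator.IsAlive(handle)
        ? handle
        : throw new ObjectDisposedException(nameof(NativeHandle));
}
```

Between the free and the reuse, the check works. After the reuse, the stale handle passes the check and points at an unrelated object. A check keyed by a never-reused id, as `Branch.GetHandle` does with `BranchId`, does not have this gap.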

Dewarrum · 2026-02-11T13:49:20Z

I am currently looking at Branch.GetHandle and it looks very similar to what you are showing with your struct example. Doesn't it ensure the same thing? Whenever we want to do something (e.g. XmlFragment.InsertElement, Map.Insert, etc.) we:

1. Get a reference to the BranchId
2. Get a reference to the Branch by its BranchId
3. Check whether we got a null reference
4. Check whether the Branch is alive

ydotnet/YDotNet/Document/Types/Branches/Branch.cs, lines 95 to 104 in 93ff21b:

    using var handle = MemoryWriter.WriteStruct(BranchId);

    var branchHandle = BranchChannel.Get(handle.Handle, transaction.Handle);

    if (branchHandle == nint.Zero || BranchChannel.Alive(branchHandle) == 0)
    {
        throw new ObjectDisposedException("Object is disposed.");
    }

    return branchHandle;

I am thinking about simply removing the cache.
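
This toy model shows why the cache becomes unnecessary once every operation goes through an id lookup. The wrapper keeps only the id and resolves it on each call. It fails with `ObjectDisposedException` once the id is gone, even if the old address now belongs to another branch. `FakeDocument`, `IdBackedBranch` and `BranchKey` are hypothetical; this mirrors the structure of `Branch.GetHandle` quoted above, not its actual code.

```csharp
using System;
using System.Collections.Generic;

public readonly record struct BranchKey(ulong Client, uint Clock);

// Hypothetical stand-in for the native id -> branch map.
public sealed class FakeDocument
{
    private readonly Dictionary<BranchKey, nint> branches = new();

    public void Add(BranchKey id, nint address) => branches[id] = address;

    public void Remove(BranchKey id) => branches.Remove(id);

    public nint Get(BranchKey id) => branches.TryGetValue(id, out var address) ? address : nint.Zero;
}

// Uncached wrapper: stores only the id and re-resolves it on every use.
public sealed class IdBackedBranch
{
    private readonly FakeDocument document;

    public IdBackedBranch(FakeDocument document, BranchKey id)
    {
        this.document = document;
        BranchId = id;
    }

    public BranchKey BranchId { get; }

    public nint GetHandle()
    {
        var address = document.Get(BranchId);
        if (address == nint.Zero)
        {
            throw new ObjectDisposedException("Object is disposed.");
        }

        return address;
    }
}
```

Staleness is decided by the id rather than by the address, so creating a fresh wrapper per lookup stays correct. The cost is the extra allocation and the loss of reference identity mentioned earlier in the thread.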

SebastianStehle · 2026-02-11T13:58:11Z

I did not remember that we have this method. If this is the only way we actually use a branch handle, it should work fine. Then I think the cache can be removed.

Dewarrum · 2026-02-11T14:19:52Z

I have removed the cache. My only concern is testing. In my opinion, the test I wrote before is not a proper unit test: it should have tested how Doc uses TypeCache. Since we decided to remove TypeCache, I don't know whether any testing is needed; if I write a test, it would feel like testing code that has been removed. Please let me know what you think.

SebastianStehle · 2026-02-11T15:50:54Z

Can you somehow reproduce the original issue? Or do we have tests to ensure that deleted instances cannot be used anymore?

Dewarrum · 2026-02-12T07:09:55Z

I have added tests for XmlFragment and XmlElement. I tried to figure out whether other shared types are affected, but it seems they are not, because there is no way to replace an existing root shared type with something else. E.g., you cannot do:

    var array = doc.Array("root");
    doc.Remove("root");
    var xmlFragment = doc.XmlFragment("root");

So it seems root shared types live as long as the Doc does and can therefore always be accessed through the same pointer (unless the Doc is destroyed).

Dewarrum · 2026-02-13T08:26:58Z

Would it be possible to merge these changes today and release a new version?

Dewarrum · 2026-02-16T14:58:35Z

Thank you!

feat: get rid of type cache entirely

d081f06

Dewarrum force-pushed the typecache-bug branch from 70422a1 to d081f06 on February 11, 2026 14:12

tests: ensure multiple inserts and removals do not throw

dc34daf

SebastianStehle merged commit fea4f09 into y-crdt:main Feb 14, 2026
15 checks passed
