
benjeffery
Member

@benjeffery commented Sep 28, 2025

Stacked on #3287 (See 3b69da9 for this PR's diff)
Fixes #760

It was bothering me that we were releasing 1.0 without taking the chance to fix TreeSequence.tables. It's a breaking change (although one that has been warned about), so it should go in before 1.0.

I then realised there was a "quick" way to do this: an ImmutableTableCollection backed solely by the low-level _tskit.TreeSequence class. Access to all table data is zero-copy, and using that interface guarantees that we are not mutating the tree sequence. Subsets are easy too, as we just keep an index array into the underlying data rather than copying it.
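A minimal sketch of the idea, assuming plain NumPy arrays stand in for the columns exposed by the low-level `_tskit.TreeSequence` (the class `ImmutableTableView` and its methods are hypothetical, not tskit's API): a full column is handed out as a read-only view of the shared buffer, and a subset is stored as an index array rather than as copied columns.

```python
import numpy as np


class ImmutableTableView:
    """Hypothetical read-only table backed by shared column arrays.

    `columns` maps column names to 1-D arrays owned by another object
    (standing in for the low-level tree sequence). `index` selects the
    rows visible through this view; None means "all rows".
    """

    def __init__(self, columns, index=None):
        self._columns = columns
        self._index = index

    def __len__(self):
        if self._index is None:
            return len(next(iter(self._columns.values())))
        return len(self._index)

    def column(self, name):
        data = self._columns[name]
        if self._index is None:
            # Zero-copy: a new array object over the same buffer.
            view = data.view()
        else:
            # Fancy indexing gathers only the selected rows.
            view = data[self._index]
        # Either way, callers cannot write through the returned array.
        view.flags.writeable = False
        return view

    def subset(self, rows):
        rows = np.asarray(rows, dtype=np.int64)
        if self._index is not None:
            # Compose with the existing selection; columns are untouched.
            rows = self._index[rows]
        return ImmutableTableView(self._columns, rows)
```

Because a subset only stores indices, taking a subset of a subset stays cheap, and the shared columns are never modified.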

One complicating factor:

  • For compatibility, we have to repack ragged string arrays, as the low-level module returns them as StringDType. We could make (e.g.) TableCollection.sites.ancestral_state return a StringDType instead, but that would be quite a nasty breaking change. The other option is to add another low-level array accessor that returns the data in the NumPy 1-compatible form. This would be better, as currently numpy<2 has to return a normal TableCollection. (Writing this makes it seem the obvious way now.)
    Fixed: added accessors for ragged arrays.
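The repacking step can be sketched in pure NumPy as converting a sequence of Python strings into a flat byte column plus an offsets column, where row `j` is `data[offset[j]:offset[j + 1]]`. The function names are hypothetical, and the `int8`/`uint64` dtypes are assumptions meant to mirror tskit's ragged-column layout.

```python
import numpy as np


def pack_ragged_strings(strings, encoding="utf8"):
    """Hypothetical helper: pack str values into (data, offset) columns."""
    encoded = [s.encode(encoding) for s in strings]
    lengths = np.fromiter(
        (len(b) for b in encoded), dtype=np.uint64, count=len(encoded)
    )
    # offset[0] == 0 and offset[j + 1] - offset[j] == length of row j.
    offset = np.zeros(len(encoded) + 1, dtype=np.uint64)
    np.cumsum(lengths, out=offset[1:])
    # Copy so the result owns its (writable) buffer.
    data = np.frombuffer(b"".join(encoded), dtype=np.int8).copy()
    return data, offset


def unpack_ragged_strings(data, offset, encoding="utf8"):
    """Inverse of pack_ragged_strings."""
    raw = data.tobytes()
    return [
        raw[int(offset[j]) : int(offset[j + 1])].decode(encoding)
        for j in range(len(offset) - 1)
    ]
```

Doing this in Python on every access costs a pass over the strings, which is one reason a dedicated low-level accessor returning the packed form directly is attractive.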

Still needs tests and docs, if we decide this is the way to go.

Perf:
On my machine, this saves ~15% of the runtime of the Python test suite.
On the Quebecois dataset, tables.nodes.time takes around 0.1s on a TableCollection (this doesn't include dump_tables). It takes around 0.000001s on an ImmutableTableCollection, and that hardly changes for ts.tables.nodes.time.
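The size of that gap is consistent with copying a column versus returning a view of it. A NumPy-only illustration (it does not use tskit or reproduce the benchmark above; the array size is arbitrary and timings will vary by machine):

```python
import timeit

import numpy as np

# Arbitrary column size standing in for a large node table's time column.
node_time = np.random.default_rng(1).random(1_000_000)


def copied_column():
    # Handing out a fresh copy: cost grows with the number of rows.
    return node_time.copy()


def viewed_column():
    # Zero-copy: a new read-only array object over the same buffer,
    # so the cost does not depend on the number of rows.
    view = node_time.view()
    view.flags.writeable = False
    return view


for fn in (copied_column, viewed_column):
    per_call = timeit.timeit(fn, number=20) / 20
    print(f"{fn.__name__}: {per_call:.2e} s per call")
```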

@benjeffery marked this pull request as draft September 28, 2025 08:00

codecov bot commented Sep 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.86%. Comparing base (99f03e2) to head (12c954f).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3288      +/-   ##
==========================================
- Coverage   89.84%   86.86%   -2.98%     
==========================================
  Files          29        8      -21     
  Lines       32719    15852   -16867     
  Branches     5988     3020    -2968     
==========================================
- Hits        29396    13770   -15626     
+ Misses       1882     1168     -714     
+ Partials     1441      914     -527     
Flag Coverage Δ
c-tests 86.86% <ø> (ø)
lwt-tests ?
python-c-tests ?
python-tests ?
python-tests-no-jit ?
python-tests-numpy1 ?

Flags with carried forward coverage won't be shown.
see 21 files with indirect coverage changes


@jeromekelleher
Member

Very nice, clever idea to use the low-level tree sequence to back the tables and guarantee immutability.

I guess the high-level question here is: why do we need this, now that we have immutable array-level access on the TreeSequence? I had forgotten about this because, in a way, we don't really need it any more. It feels like a lot of code and complexity (and code breakage) to access functionality that we already have in a slightly different form.

@benjeffery
Member Author

While it's true you can get the arrays, they don't have table semantics: iteration over row objects, subsetting, equality, etc. As currently implemented, ts.tables is a bit of a footgun unless you've read the docs closely, and this fixes that for existing code. I agree it's a lot of code; I think I can get that down a bit.
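To make "table semantics" concrete, here is a small sketch layered over plain arrays. `NodeRow` and `NodeTableView` are hypothetical names with only two columns, not tskit's real row or table classes; the point is that integer indexing yields a row object, iteration yields rows, slices and masks yield a new table, and equality compares whole columns.

```python
import dataclasses

import numpy as np


@dataclasses.dataclass(frozen=True)
class NodeRow:
    # Hypothetical row type; a real node row has more fields.
    flags: int
    time: float


class NodeTableView:
    """Hypothetical sketch of table semantics over plain column arrays."""

    def __init__(self, flags, time):
        self.flags = np.asarray(flags)
        self.time = np.asarray(time)

    def __len__(self):
        return len(self.time)

    def __getitem__(self, key):
        if isinstance(key, (int, np.integer)):
            # A single row becomes a row object.
            return NodeRow(int(self.flags[key]), float(self.time[key]))
        # Slices, index arrays and boolean masks give a new table.
        return NodeTableView(self.flags[key], self.time[key])

    def __iter__(self):
        for j in range(len(self)):
            yield self[j]

    def __eq__(self, other):
        if not isinstance(other, NodeTableView):
            return NotImplemented
        return np.array_equal(self.flags, other.flags) and np.array_equal(
            self.time, other.time
        )
```

With bare arrays, each of these operations has to be reassembled by hand at every call site.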

@jeromekelleher
Member

Shall we get some wider feedback from Slack? I feel like we should have community buy-in if we're doing breaking changes.

@benjeffery
Member Author

I think I can make this a non-breaking change. If a mutating method is called, we emit a deprecation warning, replace the ImmutableTC with a normal TableCollection then call the mutator on that.
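A minimal sketch of the proposed fallback, using hypothetical stand-in classes (`MutableTables`, `CopyOnWriteTables`) and a single mutator, `add_row`, rather than tskit's real table classes: reads come from data shared with the tree sequence, and a mutator call warns, then runs on a private mutable copy.

```python
import warnings


class MutableTables:
    """Stand-in for an ordinary, mutable TableCollection (hypothetical)."""

    def __init__(self, rows):
        self.rows = list(rows)  # private copy, safe to mutate

    def add_row(self, row):
        self.rows.append(row)
        return len(self.rows) - 1


class CopyOnWriteTables:
    """Hypothetical wrapper: reads use shared data; mutators warn and then
    run on a lazily created private copy."""

    _MUTATORS = frozenset({"add_row"})

    def __init__(self, shared_rows):
        self._shared = shared_rows  # never written through this wrapper
        self._mutable = None  # created on first mutation

    def __getattr__(self, name):
        # Only reached for attributes not found by normal lookup.
        if name not in self._MUTATORS:
            raise AttributeError(name)
        warnings.warn(
            f"Mutating ts.tables via {name}() is deprecated; "
            "use TreeSequence.dump_tables() for a mutable copy.",
            DeprecationWarning,
            stacklevel=2,
        )
        if self._mutable is None:
            self._mutable = MutableTables(self._shared)
        return getattr(self._mutable, name)

    @property
    def rows(self):
        source = self._shared if self._mutable is None else self._mutable.rows
        return tuple(source)
```

The catch is that the object silently stops reflecting the tree sequence after the first mutation, which is the kind of surprise that makes a hard error attractive.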

@hyanwong
Copy link
Member

> Shall we get some wider feedback from Slack?

FWIW this is a +100 from me.

@jeromekelleher
Copy link
Member

> I think I can make this a non-breaking change. If a mutating method is called, we emit a deprecation warning, replace the ImmutableTC with a normal TableCollection then call the mutator on that.

Eeesh. I can see that having unintended consequences... Let's get some feedback.

@molpopgen
Copy link
Member

I think this is a good way to go. Instead of a deprecation warning, I'd do a hard break and raise errors when applying mutable operations. (Or, remove the mutable ops from the API of the immutable type.)

@benjeffery
Copy link
Member Author

Thanks @molpopgen, it turns out we can't do the deprecation warning anyway. Attempts at mutation will raise:
tskit.exceptions.ImmutableTableError: Cannot call add_row() on immutable nodes table. Use TreeSequence.dump_tables() for mutable copy.
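A sketch of how mutators can fail loudly with that message. `ImmutableTableError` here is a local stand-in for `tskit.exceptions.ImmutableTableError` (its base class is an assumption), and `ImmutableNodeTable` and `_immutable` are hypothetical. Keeping the mutator names on the class but having them raise, rather than deleting them, gives a targeted message instead of a bare `AttributeError`.

```python
class ImmutableTableError(Exception):
    """Local stand-in for tskit.exceptions.ImmutableTableError
    (base class assumed)."""


def _immutable(method_name):
    """Hypothetical factory for a method that always refuses to mutate."""

    def method(self, *args, **kwargs):
        raise ImmutableTableError(
            f"Cannot call {method_name}() on immutable {self.table_name} "
            "table. Use TreeSequence.dump_tables() for mutable copy."
        )

    method.__name__ = method_name
    return method


class ImmutableNodeTable:
    """Hypothetical read-only table: mutator names exist but always raise."""

    table_name = "nodes"

    def __init__(self, time):
        self.time = time

    add_row = _immutable("add_row")
    clear = _immutable("clear")
    truncate = _immutable("truncate")
```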


Successfully merging this pull request may close these issues.

Table collection returned by ts.tables not read-only.
4 participants