-
Notifications
You must be signed in to change notification settings - Fork 77
Make TreeSequence.tables return zero-copy ImmutableTableCollection #3288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
3b69da9
to
0517732
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3288 +/- ##
==========================================
- Coverage 89.84% 86.86% -2.98%
==========================================
Files 29 8 -21
Lines 32719 15852 -16867
Branches 5988 3020 -2968
==========================================
- Hits 29396 13770 -15626
+ Misses 1882 1168 -714
+ Partials 1441 914 -527
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
Very nice, clever idea to use the low-level tree sequence to back and guantee immutability. I guess the high-level question is here, why do we need this now that we have immutable array-level access on the TreeSequence? I had forgotten about this because in a way we don't really need it any more. It feels like a lot of code and complexity (and code breakage) to access functionality that we already have in a slightly different way. |
While it's true you can get the arrays, they don't have the table semantics of iteration over row objects, subsetting, equality etc, |
Shall we get some wider feedback from Slack? I feel like we should have community buy-in if we're doing breaking changes. |
I think I can make this a non-breaking change. If a mutating method is called, we emit a deprecation warning, replace the |
FWIW this is a +100 from me. |
Eeesh - I can see that having intended consequences... Let's get some feedback |
2257e4e
to
12c954f
Compare
I think this is a good way to go. Instead of a deprecation warning, I'd do a hard break and raise errors when applying mutable operations. (Or, remove the mutable ops from the API of the immutable type.) |
Thanks @molpopgen - turns out we can't do the deprecation warning anyway. Attempts at mutation will raise: |
Stacked on #3287 (See 3b69da9 for this PR's diff)
Fixes #760
It was bothering me that we were releasing 1.0 without taking the chance to fix
TreeSequence.tables
. It's a breaking change (although one that has been warned about) so should go before 1.0.I then realised there was a "quick" way to do this by having an
ImmutableTableCollection
that is solely backed by the low-level_tskit.TreeSequence
class. Access to all table data is zero copy, and using that interface guarantees that we are not mutating the tree sequence. Subsets are easy too as we just keep a view as an index array to the underlying.One complicating factor:For compatibility, we have to repack ragged string arrays, as the low-level returnsStringDType
. We could make (e.g.)TableCollection.sites.ancestral_state
return aStringDType
instead, but that would be quite a nasty breaking change. The other option is to add another low-level array access method for the data in the numpy 1 way. This would be better as currentlynumpy<2
has to return a normalTableCollection
. (Writing this makes it seem the obvious way now)Fixed - added accessors for ragged arrays.
Still needs some tests added, and docs, if we decide this is the way to go.
Perf:
On my machine, saves about ~15% time on the Python test suite
On the Quebecois
tables.nodes.time
is around 0.1s on aTableCollection
(this doesn't incudedump_tables
). It's 0.000001 on anImmutableTableCollection
, and that hardly changes forts.tables.nodes.time
.