Skip to content

Conversation

petrelharp
Copy link
Contributor

@petrelharp petrelharp commented Sep 24, 2025

Here's the start at what we discussed in #3183. Might you have a go at putting in the python tests, @hyanwong?

Something I haven't done that we said we might in the other PR is require that if all_edges or all_mutations are True then check_shared_overlap is False. I don't think we actually need to require this (since in some cases all three might be true and it's fine!) but we should probably say in the docstring that the user probably wants this to be the case. (I just don't remember why that is right now.)

Still TODO:

  • python tests:
    • check if other has no edges that we can use this to add mutations to a subset of nodes in an existing ts
    • check if we trim ts down to two disjoint spans and then union these together we get back ts
  • CHANGELOG
  • say something about checking shared overlap in the docs
  • explain about all_mutations and all_edges in the docstring

Copy link

codecov bot commented Sep 24, 2025

Codecov Report

❌ Patch coverage is 75.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.27%. Comparing base (647ad96) to head (3351689).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
c/tskit/tables.c 75.00% 2 Missing and 1 partial ⚠️

❗ There is a different number of reports uploaded between BASE (647ad96) and HEAD (3351689). Click for more details.

HEAD has 7 uploads less than BASE
Flag BASE (647ad96) HEAD (3351689)
python-c-tests 1 0
python-tests 6 0
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #3283       +/-   ##
===========================================
- Coverage   89.81%   76.27%   -13.55%     
===========================================
  Files          29       28        -1     
  Lines       32665    25112     -7553     
  Branches     5982     4641     -1341     
===========================================
- Hits        29338    19153    -10185     
- Misses       1886     4759     +2873     
+ Partials     1441     1200      -241     
Flag Coverage Δ
c-tests 86.85% <75.00%> (-0.02%) ⬇️
lwt-tests 80.38% <ø> (ø)
python-c-tests ?
python-tests ?
python-tests-no-jit 33.18% <ø> (+0.13%) ⬆️
python-tests-numpy1 50.84% <ø> (+0.27%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
python/tskit/tables.py 63.26% <ø> (-36.00%) ⬇️
python/tskit/trees.py 68.21% <ø> (-30.63%) ⬇️
c/tskit/tables.c 83.08% <75.00%> (-0.02%) ⬇️

... and 12 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@petrelharp
Copy link
Contributor Author

Whoops - now this should be ready to write tests for.

And fix concatenate()
@hyanwong
Copy link
Member

hyanwong commented Oct 4, 2025

I made a PR to your PR, @petrelharp - the only thing I can see that might be a problem is that we might want to include empty sites when unioning one TS with another. If so, we could either:

  1. Include empty sites when all_mutations=True
  2. Have an additional parameter: e.g. all_sites
  3. Have a single parameter all which does all_mutations, all_edges and all_sites (is there a call for separating edges and sites?)

I guess (2) gives the most flexibility for the user?

Disjoint union: add some python tests
@petrelharp
Copy link
Contributor Author

Let's keep the convo here, and lmk if you don't have permissions to push here.

That all looks good; I think we just need some additional checks for correctness of the union (a lot of the checks are just that the number of things is right).

I guess I don't think we should have a separate all_sites argument, we should just have all_mutations also include all sites. Otherwise the behavior is complicated.

It's tempting to have a disjoint argument that does all_mutations=True, all_edges=True, and check_shared_overlap=False. I'm not sure, since it's adding another argument, but I think I like it since it points users towards the use case. (And if they use this with two that have shared overlap they'll get duplicated edges and hence an error, which is good.)

@petrelharp
Copy link
Contributor Author

There; I switched the behavior to add all sites for reals if all_mutations is True.

@hyanwong
Copy link
Member

hyanwong commented Oct 4, 2025

Great! I'll add more checks on actual correctness. Thanks @petrelharp !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants