
Notify user if two datasets with different hashes are compared#219

Merged
paulmueller merged 10 commits into DC-analysis:master from RaghavaAlajangi:dcnum_hash_notification
Jul 30, 2025

Conversation

@RaghavaAlajangi
Member

@RaghavaAlajangi RaghavaAlajangi commented Jul 18, 2025

This PR aims to implement the feature mentioned in issue #217

  • implement logic
  • run tests locally
  • CICD passed
  • update CHANGELOG after review

@RaghavaAlajangi
Member Author

RaghavaAlajangi commented Jul 18, 2025

Case-1:

  • If datasets with different pipeline hashes are compared, labels such as Pipeline type A, Pipeline type B, and so on are shown.
[screenshot]

Case-2:

  • If datasets with the same pipeline hash are compared, no labels are shown.
[screenshot]

@codecov

codecov bot commented Jul 18, 2025

Codecov Report

❌ Patch coverage is 89.65517% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.48%. Comparing base (22ae7ff) to head (7c0fbf3).
⚠️ Report is 1 commit behind head on master.

Files with missing lines Patch % Lines
shapeout2/gui/pipeline_plot.py 89.65% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #219      +/-   ##
==========================================
+ Coverage   78.45%   78.48%   +0.03%     
==========================================
  Files          67       67              
  Lines        7629     7654      +25     
==========================================
+ Hits         5985     6007      +22     
- Misses       1644     1647       +3     

☔ View full report in Codecov by Sentry.

@paulmueller
Member

paulmueller commented Jul 21, 2025

Thanks! It might not be clear to the user what the "Type" label means. How about "Pipeline HASH", where HASH is the first four characters of the analysis pipeline hash? The label should be displayed in each plot when the hash differs in at least one of them.

@RaghavaAlajangi
Member Author

RaghavaAlajangi commented Jul 21, 2025

As you can see from the above plots, I changed this to Pipeline type A, Pipeline type B, and so on. Does it work, or should I replace letters with a hash?

@paulmueller
Member

I would say the word "type" is incorrect (there is no pipeline type, only different pipeline parameters), and "A" and "B" are too generic. Think about the case where you have two plots, each of them containing data from unique pipelines, but Shape-Out displays them as "A" and "B". Then a user might assume that "A" is always "A" and "B" is always "B", and we are again at the apples-vs-oranges comparison. Use "Pipeline HASH". In cases where the first four characters of two different pipelines match, more characters should be appended to the displayed hash.

@RaghavaAlajangi
Member Author

Hi @paulmueller
Can you review these changes?

def get_hash_flag(hash_set, rtdc_ds):
    """Helper function to determine the hash flag based on the dataset
    and hash set."""
    short_hash_set = set(h[:4] if h is not None else None for h in hash_set)
Member


The hash length should be dynamic in all cases. I.e. if the first 4 characters of two hashes are identical, then the hash length should be 5, but if the first 5 characters are identical, then the hash length should be 6 etc. It is very unlikely to happen, but it can happen at some point.

There might be a smart way of achieving this with list comprehensions, but a simple for-loop over the length of the longest hash (incrementing req_hash_len and generating short_hash_set) with the list/set comprehension you proposed would be good enough.

BTW this is a good design, putting the logic of whether to show the text and what text to show in one single method 👍
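
The suggested approach could be sketched as follows. This is a minimal, hypothetical illustration of the dynamic-length loop (function and variable names follow the review comment, not necessarily the merged code):

```python
def get_hash_flag(hash_set):
    """Return a label per hash, using the shortest prefix length (>= 4)
    that keeps all non-None hashes distinct; None if no hash is known."""
    hashes = [h for h in hash_set if h is not None]
    if not hashes:
        return None
    max_len = max(len(h) for h in hashes)
    req_hash_len = 4
    # Simple for-loop style search: grow the prefix until all short
    # hashes are unique (or the full hash length is reached).
    while req_hash_len < max_len:
        short_hash_set = {h[:req_hash_len] for h in hashes}
        if len(short_hash_set) == len(hashes):
            break
        req_hash_len += 1
    return {h: f"Pipeline {h[:req_hash_len]}" for h in hashes}
```

For example, two hashes sharing the prefix "abcd" would be labeled with five characters ("Pipeline abcd1" vs. "Pipeline abcd5") instead of colliding at four.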

@RaghavaAlajangi
Member Author

Hi Paul,
please have a look at these changes.



def test_get_hash_flag():
    rtdc_paths = datapath.glob("*.rtdc")
Member


Please add an assert rtdc_paths to make sure this test does not get skipped in case the data directory changes.
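
One detail worth keeping in mind when adding that assert: `Path.glob` returns a generator, which is truthy even when it matches nothing, so the guard only works once the paths are materialized into a list. A minimal illustration (not the project's test code):

```python
import pathlib
import tempfile

# Empty directory stands in for a data directory with no .rtdc files.
with tempfile.TemporaryDirectory() as tmp:
    datapath = pathlib.Path(tmp)
    gen = datapath.glob("*.rtdc")                 # generator: always truthy
    rtdc_paths = sorted(datapath.glob("*.rtdc"))  # list: falsy when empty

assert bool(gen) is True   # asserting on the raw generator never fails
assert rtdc_paths == []    # asserting on the list catches missing files
```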

Member

@paulmueller paulmueller left a comment


This looks good 👍 . To make things air tight, please add this to the testing code:

The test for get_hash_flag is very generic and does not explicitly check some of the cases.

Please add two more tests with the corresponding .rtdc files:

  1. hash_set only contains None -> get_hash_flag returns None
  2. hash_set contains at least one hash -> get_hash_flag returns "Pipeline HASH".

These explicit tests will help avoid regressions in the code.

@RaghavaAlajangi
Member Author

Hi @paulmueller
Please look at these changes now.

Member

@paulmueller paulmueller left a comment


❤️

@paulmueller paulmueller merged commit ab8887d into DC-analysis:master Jul 30, 2025
5 checks passed
RaghavaAlajangi added a commit to RaghavaAlajangi/DCscope that referenced this pull request Jul 30, 2025