Notify user if two datasets with different hashes are compared#219
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #219 +/- ##
==========================================
+ Coverage 78.45% 78.48% +0.03%
==========================================
Files 67 67
Lines 7629 7654 +25
==========================================
+ Hits 5985 6007 +22
- Misses 1644 1647 +3
Thanks! It might not be clear to the user what the "Type" label means. How about "Pipeline HASH", where HASH is the first four characters of the analysis pipeline hash? The label should be displayed in every plot as soon as the hash differs in at least one of them.
As you can see from the above plots, I changed this to
I would say the word "type" is incorrect (there is no pipeline type, only different pipeline parameters), and "A" and "B" are too generic. Think about the case where you have two plots, each of them containing data from unique pipelines, but Shape-Out displays them as "A" and "B". A user might then assume that "A" is always "A" and "B" is always "B", and we are back at the apples-vs-oranges comparison. Use "Pipeline HASH". If the first four characters of two different pipeline hashes match, more characters should be appended to the displayed hash.
Hi @paulmueller
shapeout2/gui/pipeline_plot.py
    def get_hash_flag(hash_set, rtdc_ds):
        """Helper function to determine the hash flag based on the dataset and
        hash set."""
        short_hash_set = set(h[:4] if h is not None else None for h in hash_set)
The hash length should be dynamic in all cases. I.e. if the first 4 characters of two hashes are identical, then the hash length should be 5, but if the first 5 characters are identical, then the hash length should be 6 etc. It is very unlikely to happen, but it can happen at some point.
There might be a smart way of achieving this with list comprehensions, but a simple for-loop over the length of the longest hash (incrementing req_hash_len and generating short_hash_set) with the list/set comprehension you proposed would be good enough.
BTW this is a good design, putting the logic of whether to show the text and what text to show in one single method 👍
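The for-loop suggested above could look roughly like the following sketch. The function name `shortest_unique_length` and the `min_len` parameter are hypothetical and not part of Shape-Out; the sketch only assumes that hashes are strings and that missing hashes are represented by `None`.

```python
def shortest_unique_length(hashes, min_len=4):
    """Return the smallest prefix length (at least `min_len`) at which
    all distinct hashes can still be told apart.

    Hypothetical helper for illustration; not taken from Shape-Out.
    """
    # Ignore datasets without a pipeline hash and drop duplicates
    unique = {h for h in hashes if h is not None}
    if not unique:
        return min_len
    max_len = max(len(h) for h in unique)
    # Increment the required length until no two prefixes collide
    for req_hash_len in range(min_len, max_len + 1):
        short_hash_set = {h[:req_hash_len] for h in unique}
        if len(short_hash_set) == len(unique):
            return req_hash_len
    # Fall back to the full length of the longest hash
    return max_len
```

With this approach the common case stays at four characters, and the label only grows when two different hashes actually share a prefix.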
Hi Paul,
    def test_get_hash_flag():
        rtdc_paths = datapath.glob("*.rtdc")
Please add an assert rtdc_paths to make sure this test does not get skipped in case the data directory changes.
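One caveat with this assertion, assuming `datapath` is a `pathlib.Path` (the snippet does not show its definition): `Path.glob` returns a lazy iterator, which is truthy even when nothing matches, so a bare `assert rtdc_paths` would never fail. Materializing the result in a list first makes the check meaningful. The sketch below demonstrates this with an empty temporary directory standing in for the data directory.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the test data directory; it contains no .rtdc files
    datapath = pathlib.Path(tmp)

    # The lazy result of glob() is truthy regardless of matches
    lazy_paths = datapath.glob("*.rtdc")
    generator_is_truthy = bool(lazy_paths)

    # A list is falsy when empty, so `assert rtdc_paths` would fail here
    rtdc_paths = sorted(datapath.glob("*.rtdc"))
    list_is_truthy = bool(rtdc_paths)
```

In the test itself this would amount to writing `rtdc_paths = sorted(datapath.glob("*.rtdc"))` before the `assert rtdc_paths` line.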
This looks good 👍. To make things airtight, please add this to the testing code:
The test for get_hash_flag is very generic and does not explicitly check some of the cases.
Please add two more tests with the corresponding .rtdc files:
- hash_set only contains None -> get_hash_flag returns None
- hash_set contains at least one hash -> get_hash_flag returns "Pipeline HASH"
These explicit tests will help avoid regressions in the code.
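The explicit cases requested above might be written along these lines. Since the body of `get_hash_flag` is not shown in this thread, the sketch tests a stand-in, `get_hash_flag_sketch`, which takes the dataset's hash as a plain string (`ds_hash`) instead of an `rtdc_ds` object; both the stand-in and its signature are assumptions made for illustration, and the real tests would call `get_hash_flag` with datasets loaded from .rtdc files.

```python
def get_hash_flag_sketch(hash_set, ds_hash, min_len=4):
    """Stand-in for get_hash_flag (assumed behavior, not the real code)."""
    known = {h for h in hash_set if h is not None}
    # No pipeline hashes at all: nothing to display
    if not known or ds_hash is None:
        return None
    # Grow the prefix until all known hashes are distinguishable
    req_hash_len = min_len
    max_len = max(len(h) for h in known)
    while (req_hash_len < max_len
           and len({h[:req_hash_len] for h in known}) < len(known)):
        req_hash_len += 1
    return f"Pipeline {ds_hash[:req_hash_len]}"


def test_hash_set_only_none():
    # hash_set only contains None -> no label
    assert get_hash_flag_sketch({None}, None) is None


def test_hash_set_with_hash():
    # hash_set contains at least one hash -> "Pipeline HASH"
    flag = get_hash_flag_sketch({None, "abcd1234"}, "abcd1234")
    assert flag == "Pipeline abcd"


def test_hash_prefix_collision():
    # Two hashes share the first four characters -> longer label
    flag = get_hash_flag_sketch({"abcd1234", "abcd5678"}, "abcd5678")
    assert flag == "Pipeline abcd5"
```

Keeping one small test per case means a regression in any single branch shows up under its own test name.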
Hi @paulmueller
DC-analysis#219)" This reverts commit ab8887d.


This PR aims to implement the feature mentioned in issue #217