performance improvements for spot location refinement code#96
performance improvements for spot location refinement code#96ellenemerson merged 2 commits intomasterfrom
Conversation
…ome packages in requirements.txt
rossbar
left a comment
There was a problem hiding this comment.
These seem like very nice improvements @ellenemerson ! These operations look equivalent to me, but of course it'd be a good idea to test on a pipeline with real data to ensure you get the same result!
I left one comment re: defaultdict, which is really just a "hey, check out this cool thing" comment and not at all important to the changes at hand.
Re: dependencies - that seems fine to me; the dependency situation is only going to get worse and worse as time goes on so finding the set of pins that keeps things working for you locally is 👍 for me. Fixing the problem outright is a bigger project!
| # Pre-create a dictionary for fast lookups | ||
| lookup_dict = {} | ||
| for idx, row in df_results.iterrows(): | ||
| key = (row['batch_id'], row['x'], row['y']) | ||
| if key not in lookup_dict: | ||
| lookup_dict[key] = [] | ||
| lookup_dict[key].append(idx) |
There was a problem hiding this comment.
This shouldn't affect performance at all, but one cool utility that I'm constantly using for situations like this is defaultdict. It essentially allows you to skip the if not in lookup as the default value for missing keys can be made an empty list (or set, or other container, etc.). So this could be something like (moving the import to a more appropriate location):
| # Pre-create a dictionary for fast lookups | |
| lookup_dict = {} | |
| for idx, row in df_results.iterrows(): | |
| key = (row['batch_id'], row['x'], row['y']) | |
| if key not in lookup_dict: | |
| lookup_dict[key] = [] | |
| lookup_dict[key].append(idx) | |
| # Pre-create a dictionary for fast lookups | |
| from collections import defaultdict | |
| lookup_dict = defaultdict(list) | |
| for idx, row in df_results.iterrows(): | |
| lookup_dict[(row['batch_id'], row['x'], row['y'])].append(idx) |
also pinning some packages in requirements.txt to be able to get it to install locally - let me know if you think any of these are too restrictive!