-
Notifications
You must be signed in to change notification settings - Fork 38
Add unrestricted_use_only
and surveillance_use_only
constructor params
#724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
unrestricted_use_only
and surveillance_use_only
constructor params
…_releases property.
Opening this for a WIP review, to potentially avoid going too far down the wrong path. @alimanfoo @cclarkson @ahernank @jonbrenas I'm trying to identify other functions that we need to filter results for, according to:
Those seem like the main ones, which some other functions use, but I want to make sure we plug all potential leaks. |
|
|
It looks like Ag3 currently has about 137 public methods... 🤔 |
It also looks like about 119 of Ag3's public methods cannot be called without specifying params (cannot rely on defaults), which makes the testing of these constructor params somewhat difficult to automate. This also smells a lot like a "god object" anti-pattern https://en.wikipedia.org/wiki/God_object We should probably consider re-organising all of those methods, despite the inconvenience, but that will probably have to wait. In the meantime, I hope to be able to figure out which functions are vulnerable to leaking unfiltered data, relating to either the surveillance-only or unrestricted-use-only flags. |
@ahernank @jonbrenas Checkmarks indicate whether the function gets its data from an upstream public function, or file, or param, or otherwise looks covered. Unchecked functions indicate some doubt and require further investigation, discussion or coding.
|
unrestricted_use_only
and surveillance_use_only
constructor paramsunrestricted_use_only
and surveillance_use_only
constructor params
Now investigating unexpected test failures after merging with master branch. Local pytest:
CI pytest:
|
Thanks @leehart. For IGV to work, it needed to have access to a 'public' version of the reference genomes, e.g., at |
At this point, I reckon we only need to talk about and resolve:
|
Submitting this for review while we figure out |
# Apply the sample_query or sample_indices, if specified.
if prepared_sample_query is not None:
# Assume a pandas query string.
sample_query_options = sample_query_options or {}
df_samples = df_samples.query(prepared_sample_query, **sample_query_options)
df_samples = df_samples.reset_index(drop=True)
# Determine which samples match the sample query.
if sample_query != "":
loc_samples = df_samples.eval(sample_query, **sample_query_options)
if prepared_sample_query is not None:
# Resolve query to a list of integers for more cache hits - we
# do this because there are different ways to write the same pandas
# query, and so it's better to evaluate the query and use a list of
# integer indices instead.
df_samples = self.sample_metadata(sample_sets=prepared_sample_sets)
sample_query_options = sample_query_options or {}
loc_samples = df_samples.eval(
prepared_sample_query, **sample_query_options
).values
sample_indices = np.nonzero(loc_samples)[0].tolist() Basically, it seems to be around using Pandas |
Since the
I'm tempted to force the use of the less efficient
|
Hi @jonbrenas @ahernank - I've tried to elucidate the problem with What is the
|
…le_indices. Add sample_query_options to biallelic_snps_to_plink.
Plan to amend the behaviour for the following functions, with regards to using the
|
Thanks @leehart. It is kind of wild (and also wrong) that we had three different potential behaviours for the interaction between As far as I am concerned, |
Thanks @jonbrenas. I think I understand some of your concerns, but I reckon I still need some clarification. I guess it might boil down to how the users are (or should be) determining the samples_df = ag3_default.sample_metadata(sample_sets=sample_set) Then it would seem natural to the user to expect the same behaviour for an object with samples_df = ag3_surveillance.sample_metadata(sample_sets=sample_set) In this context, the user is not explicitly passing any However, if the user is getting their As far as I can tell, these
Banning the use of I'm not sure I understand the solution regarding In any case, it would be good to have a clear understanding and position on this. Ideally, I think we want to preserve existing functionality and expectations, and minimise potential confusion and complexity, while keeping the user experience as unsurprising and intuitive as possible. I feel like I'm missing something or overlooking something important here! |
Thanks @leehart. Sorry, I had misinterpreted the way these parameters were going to be used and, as a result, my comment didn't make much sense. I thought that |
Ah, I see, thanks @jonbrenas . Yeah, these params are only going to be used (AFAIK) during object construction, e.g. |
OK, I think I've sorted out the
It seems good to at least have an internal method of getting an unfiltered list of samples though, e.g. via |
Investigating CI test failures, which didn't fail locally:
[ It looks like that was one of those transient random failures, due to simulated test data. Resolved by re-running. ] |
@jonbrenas Just to note that this code hasn't been reviewed or merged in yet, so work on the advanced course might need to be provisional or require changes if something here needs to change. But I expect the general behaviour will be the same. |
Thanks @leehart. Yes, the work on the advanced course is going to be provisional, it is also going to be mostly textual (i.e., explaining why some sample sets are restricted, why some samples might be biased in a way that means they are not to be used for surveillance, ...). I don't remember if we had agreed that we would meet for you to show what has changed how and why or if you thought that I could figure it out my own. If the latter, I'll get to reviewing this PR asap. |
Hi @jonbrenas . Maybe take a look and take some notes, and then we can meet to discuss, if you like. There's quite a bit, so we might need to do a few rounds. Or you might think it all looks OK! 🚀 |
@jonbrenas Let's try to meet to go through this PR in September. |
Re: issue #716