-
Notifications
You must be signed in to change notification settings - Fork 489
Add deacon and deacon DM #7473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add deacon and deacon DM #7473
Conversation
|
Test will error since there is still missing the format for deacon which has an open PR: galaxyproject/galaxy#21289 |
data_managers/data_manager_deacon/data_manager/deacon_datamanager.xml
Outdated
Show resolved
Hide resolved
data_managers/data_manager_deacon/data_manager/deacon_datamanager.xml
Outdated
Show resolved
Hide resolved
| "path":"$link.strip('/')[-1]", | ||
| #end if | ||
| #if $input.is_select == "prebuild" | ||
| "dbkey":"@TOOL_VERSION@", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why abuse dbkey to store tool version?
Do you know which genome versions have been used to generate these human / mouse genomes? Then we could set the dbkey correctly.
Seems to include GRCh38.p14 resp. GRCm39 but if I get it right there is other data included. So maybe the dbkey is not useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not also implement deacon index build to build indices for data from all_fasta
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why abuse dbkey to store tool version?
Do you know which genome versions have been used to generate these human / mouse genomes? Then we could set the dbkey correctly.
Seems to include GRCh38.p14 resp. GRCm39 but if I get it right there is other data included. So maybe the dbkey is not useful.
There are multiple source of data used for both prebuild index files so using them are not good in my eyes.
Why not also implement
deacon index buildto build indices for data fromall_fasta
I can implement this aswell yes. I didnt thought about it since it only thought about using url to dowload any index file yet good idea!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still: Why abuse dbkey to store tool version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i did seen some tools did this so i thought i just use it aswell but i can change it for the prebuild files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Storing the tool version is a really good idea. It just should be stored in a version column.
Either we use the dbkey column to store the information linking to the dbkeys provided by Galaxy, i.e. genome builds, or we remove the column (or leave it empty).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will leave them empty then i think this is easier but if you prefer i could change them to something with informations
tools/deacon/macros.xml
Outdated
| </xml> | ||
| <xml name="deacon_build"> | ||
| <param name="input" type="data" format="fastq,fastq.gz,fasta,fasta.gz" label="Input fasta/fastq file to build index file"/> | ||
| <param argument="-k" type="integer" value="31" label="Set kmer length" help="IMPORTANT: (kmer length + window size) - 1 has to be odd and smaller or equal to 96!"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
min / max
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again there so no min/max stated also they used a combination with k and w which has to be <= 96 and it has to be odd so therefore finding a good min/max is hard for it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean min=0 seems a good start :) ?
has to be odd
Can you add a validator for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can try but the problem is that (k+w)-1 has to be odd not k itself. This means both w and k always has to be odd or even both. Im not sure if you can validate this in the validator tag but i could check it via commandline and return the run with no output but the msg that this doesnt function with the set parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then it's fine. A validator is not an option in this case, they only have access to the value itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So i can mark this as resolved? I also did add a min=1 to them
tools/deacon/macros.xml
Outdated
| </xml> | ||
| <xml name="deacon_union"> | ||
| <repeat name="deacon_repeat" min="2" title="Input" default="2"> | ||
| <conditional name="file_input"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just two input parameters with multiple="true", i.e. no repeat and no conditional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i can change it it yes but the idea behind this was that the user can input from his own data and can add the DM data to it/remove it or whatever function he used where i did use the repeat block. So to have this option a repeat block is needed atleast for the DM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Problem with repeats is that they are difficult to use in workflows since the number (and type?) of repeat units is fixed.
a repeat block is needed atleast for the DM
why? a select with multiple=true should work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm yes this will work did not thought about it like this sorry!
tools/deacon/macros.xml
Outdated
| </conditional> | ||
| </repeat> | ||
| </xml> | ||
| <xml name="deacon_diff"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if we need data from the data tables for these tools at all (diff, intersect, union). I see more use cases for user data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the DM since in the future depending on how updated the index files from dev/other sourcec are that more information can be added to it since it is about host removal and when there are dataset which are implemented via DM are there they should be used so the user can build their own dataset based on the infomration they need. This is better then have like 1000 of deacon index build to transform all human genome data into a index file and then add other genome data to it which make over 1000 runs of combined deacon index runs if you get what i mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the workflow? Did not get it yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What i mean is this since when you are working with deacon you have to build the index files with all the information you want to filter out/not to filter out. This means you have to collect all the data in the begining, for example you want to build an index file for the human guts. This means you need to get all the data to build this index files which might take time. With a DM you can add this then and this can be used in a WF then without using the build index file always as an extra input. Also haveing the DM give the option to the user that in the futher new data can be add to the older index files which is less work for the user since he can use work from someone else to build his index file faster.
I hope this is a better explanation now! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Lets keep diff as it is. Here, even the conditional is fine, since the order of the two files matters (I guess).
For the other (index) tools my suggestion would still be to remove conditionals if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
es the order does matter for this. I will remove then and make them simpler! for diff should i also change this so that there are no repeat blocks or are they fine?
|
I will add the change later here since i have some testing to be done which i could done yet but some of reviews input was used to change the current state of the wrappers! |
|
@bernt-matthias i did change all and test all wrapper manually. When the format was added they all should be positiv! |
|
Deacon developer here. I'm unfamiliar with Galaxy development and how to test a PR like this in a running environment, but I'm happy to answer questions, review screenshots etc. |
| <conditional name="input"> | ||
| <param name="is_select" type="select" label="Choose how to add data to the DM"> | ||
| <option value="url">Copy an index file from an URL</option> | ||
| <option value="prebuild" selected="true">Download a pre-build file</option> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest referring to "prebuilt" indexes throughout rather than "prebuild" or "pre-build"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will change this then no worries!
I'm sure @SantaMcCloud can provide some screenshots of the tool if you like. |
|
Btw. wonderful the get feedback from the developer of the underlying software. This is super helpful and much appreciated. |
The way a PR is tested is that the
Yes this is really helpful thank you for this! |
| <param name="k" value="31"/> | ||
| <param name="w" value="15"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k+w is not odd here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This calculation has to be odd: (k+w) -1. This means both k and w has to be odd or both has to be even otherwise the result of the calcualtion is not odd
tools/deacon/macros.xml
Outdated
| </conditional> | ||
| </repeat> | ||
| </xml> | ||
| <xml name="deacon_diff"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Lets keep diff as it is. Here, even the conditional is fine, since the order of the two files matters (I guess).
For the other (index) tools my suggestion would still be to remove conditionals if possible.
|
In case helpful, I've updated the usage section of the Deacon readme to cover set operations on indexes: https://github.com/bede/deacon/?tab=readme-ov-file#indexing. |
FOR CONTRIBUTOR: