Skip to content

Conversation

@SantaMcCloud
Copy link
Contributor

FOR CONTRIBUTOR:

  • I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • License permits unrestricted use (educational + commercial)
  • This PR adds a new tool or tool collection
  • This PR updates an existing tool or tool collection
  • This PR does something else (explain below)

@SantaMcCloud
Copy link
Contributor Author

Test will error since there is still missing the format for deacon which has an open PR: galaxyproject/galaxy#21289

@SantaMcCloud SantaMcCloud changed the title Add deacon Add deacon and deacon DM Nov 17, 2025
"path":"$link.strip('/')[-1]",
#end if
#if $input.is_select == "prebuild"
"dbkey":"@TOOL_VERSION@",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why abuse dbkey to store tool version?

Do you know which genome versions have been used to generate these human / mouse genomes? Then we could set the dbkey correctly.

Seems to include GRCh38.p14 resp. GRCm39 but if I get it right there is other data included. So maybe the dbkey is not useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not also implement deacon index build to build indices for data from all_fasta

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why abuse dbkey to store tool version?

Do you know which genome versions have been used to generate these human / mouse genomes? Then we could set the dbkey correctly.

Seems to include GRCh38.p14 resp. GRCm39 but if I get it right there is other data included. So maybe the dbkey is not useful.

There are multiple source of data used for both prebuild index files so using them are not good in my eyes.

Why not also implement deacon index build to build indices for data from all_fasta

I can implement this aswell yes. I didnt thought about it since it only thought about using url to dowload any index file yet good idea!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still: Why abuse dbkey to store tool version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i did seen some tools did this so i thought i just use it aswell but i can change it for the prebuild files?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storing the tool version is a really good idea. It just should be stored in a version column.

Either we use the dbkey column to store the information linking to the dbkeys provided by Galaxy, i.e. genome builds, or we remove the column (or leave it empty).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will leave them empty then i think this is easier but if you prefer i could change them to something with informations

</xml>
<xml name="deacon_build">
<param name="input" type="data" format="fastq,fastq.gz,fasta,fasta.gz" label="Input fasta/fastq file to build index file"/>
<param argument="-k" type="integer" value="31" label="Set kmer length" help="IMPORTANT: (kmer length + window size) - 1 has to be odd and smaller or equal to 96!"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

min / max

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again there so no min/max stated also they used a combination with k and w which has to be <= 96 and it has to be odd so therefore finding a good min/max is hard for it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean min=0 seems a good start :) ?

has to be odd

Can you add a validator for that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try but the problem is that (k+w)-1 has to be odd not k itself. This means both w and k always has to be odd or even both. Im not sure if you can validate this in the validator tag but i could check it via commandline and return the run with no output but the msg that this doesnt function with the set parameter?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it's fine. A validator is not an option in this case, they only have access to the value itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So i can mark this as resolved? I also did add a min=1 to them

</xml>
<xml name="deacon_union">
<repeat name="deacon_repeat" min="2" title="Input" default="2">
<conditional name="file_input">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just two input parameters with multiple="true", i.e. no repeat and no conditional

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can change it it yes but the idea behind this was that the user can input from his own data and can add the DM data to it/remove it or whatever function he used where i did use the repeat block. So to have this option a repeat block is needed atleast for the DM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem with repeats is that they are difficult to use in workflows since the number (and type?) of repeat units is fixed.

a repeat block is needed atleast for the DM

why? a select with multiple=true should work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm yes this will work did not thought about it like this sorry!

</conditional>
</repeat>
</xml>
<xml name="deacon_diff">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if we need data from the data tables for these tools at all (diff, intersect, union). I see more use cases for user data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the DM since in the future depending on how updated the index files from dev/other sourcec are that more information can be added to it since it is about host removal and when there are dataset which are implemented via DM are there they should be used so the user can build their own dataset based on the infomration they need. This is better then have like 1000 of deacon index build to transform all human genome data into a index file and then add other genome data to it which make over 1000 runs of combined deacon index runs if you get what i mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the workflow? Did not get it yet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What i mean is this since when you are working with deacon you have to build the index files with all the information you want to filter out/not to filter out. This means you have to collect all the data in the begining, for example you want to build an index file for the human guts. This means you need to get all the data to build this index files which might take time. With a DM you can add this then and this can be used in a WF then without using the build index file always as an extra input. Also haveing the DM give the option to the user that in the futher new data can be add to the older index files which is less work for the user since he can use work from someone else to build his index file faster.

I hope this is a better explanation now! :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Lets keep diff as it is. Here, even the conditional is fine, since the order of the two files matters (I guess).

For the other (index) tools my suggestion would still be to remove conditionals if possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

es the order does matter for this. I will remove then and make them simpler! for diff should i also change this so that there are no repeat blocks or are they fine?

@SantaMcCloud
Copy link
Contributor Author

I will add the change later here since i have some testing to be done which i could done yet but some of reviews input was used to change the current state of the wrappers!

@SantaMcCloud
Copy link
Contributor Author

@bernt-matthias i did change all and test all wrapper manually. When the format was added they all should be positiv!

@bede
Copy link

bede commented Nov 21, 2025

Deacon developer here. I'm unfamiliar with Galaxy development and how to test a PR like this in a running environment, but I'm happy to answer questions, review screenshots etc.

<conditional name="input">
<param name="is_select" type="select" label="Choose how to add data to the DM">
<option value="url">Copy an index file from an URL</option>
<option value="prebuild" selected="true">Download a pre-build file</option>
Copy link

@bede bede Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest referring to "prebuilt" indexes throughout rather than "prebuild" or "pre-build"

Copy link
Contributor Author

@SantaMcCloud SantaMcCloud Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change this then no worries!

@bernt-matthias
Copy link
Contributor

Deacon developer here. I'm unfamiliar with Galaxy development and how to test a PR like this in a running environment, but I'm happy to answer questions, review screenshots etc.

I'm sure @SantaMcCloud can provide some screenshots of the tool if you like.

@bernt-matthias
Copy link
Contributor

Btw. wonderful the get feedback from the developer of the underlying software. This is super helpful and much appreciated.

@SantaMcCloud
Copy link
Contributor Author

SantaMcCloud commented Nov 21, 2025

Deacon developer here. I'm unfamiliar with Galaxy development and how to test a PR like this in a running environment, but I'm happy to answer questions, review screenshots etc.

The way a PR is tested is that the xml file has a test section where a test is written. This means the paramter which can be used are set therer together with how many outputs are generated and then this output can be check with different ways like size compare or if certain text is in the file for example. The Problem now is that in GitHub a Galaxy instance will be simulated and the each test will be run via commandline. Now the problem is that my test doesnt work is that Galaxy has datatype which are allowed to be used and i did use a datatype which is not yet implemented in Galaxy so Galaxy blocking the files which are set to this datatype to be used which will end up in a errored Test. I did test all test localy where you can add datatype without an problme and run the same simulation as GitHub does there all test where fine but when there is something where i have some troubles i will come back to this offer thank you! Also if you want too know more feel free to ask i can try to answer them or can provide any data/screenshots from implemente you tool to galaxy! :)

Btw. wonderful the get feedback from the developer of the underlying software. This is super helpful and much appreciated.

Yes this is really helpful thank you for this!

Comment on lines +124 to +125
<param name="k" value="31"/>
<param name="w" value="15"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k+w is not odd here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This calculation has to be odd: (k+w) -1. This means both k and w has to be odd or both has to be even otherwise the result of the calcualtion is not odd

</conditional>
</repeat>
</xml>
<xml name="deacon_diff">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Lets keep diff as it is. Here, even the conditional is fine, since the order of the two files matters (I guess).

For the other (index) tools my suggestion would still be to remove conditionals if possible.

@bede
Copy link

bede commented Nov 24, 2025

In case helpful, I've updated the usage section of the Deacon readme to cover set operations on indexes: https://github.com/bede/deacon/?tab=readme-ov-file#indexing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants