Fixed Tiara tool error in certain cases #1595
daduijker wants to merge 5 commits into bgruening:master from
Replace 'mv' with 'find' to rename output files in the ./results/ folder. This prevents the tool from going into an error state when it tries to move an output that Tiara did not generate (because Tiara found no sequences for that domain).
Drop the 'ls' command
@Minamehr Can you please help here?
Use two <discover_datasets> tags with separate output formats. Using more than one extension in a single <discover_datasets> tag is not allowed.
Sure! The suggested update looks like a good fix. Replacing the mv command with find -exec mv resolves the issue where the job fails when Tiara produces no output for a selected class. If no sequences are classified into a specific group (e.g., eukaryotes), the command now continues without errors, which should improve the tool's stability.
<outputs>
<collection name="output" type="list" label="${tool.name} on ${on_string}: classified sequences in txt and Fasta Output">
<discover_datasets pattern="__name_and_ext__" ext="fasta,txt" directory="results" />
<discover_datasets pattern="__name_and_ext__" ext="fasta" directory="results" />
Do we really want mixed collections? Or should those be two collections, each for its own filetype?
The txt file just provides general information about the results, such as the sequence ID, the first-stage classification, and the second-stage classification. I think it should be fine to have them in the same collection.
But then you cannot reuse the collection. You cannot just feed it into a FASTA tool; you would need to filter the collection first.
Is that what you want?
I think it would be more practical to have two separate collections with their own type.
<param name="taxonomy_filter" value="pla"/>
<output_collection name="output" type="list">
<element name="main_result" file="main_result01.txt" ftype="txt"/>
<output_collection name="output_fasta" type="list">
You can specify the number of expected elements here.
I think the test will fail then, because all the elements are still collected.
<discover_datasets pattern="__name_and_ext__" ext="fasta" directory="results" />
</collection>
<collection name="output_txt" type="list" label="${tool.name} on ${on_string}: Classified sequences in txt">
<discover_datasets pattern="__name_and_ext__" ext="txt" directory="results" />
__name_and_ext__ will collect all files, using the file name as the element name and the .txt or .fasta suffix as the extension. I don't think it filters the folder. You can check this in your tests.
I think what you need to do is move all FASTA files into a separate directory and collect only that directory. IMHO this is the easiest way, since you already move the FASTA files in your find step.
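To check the reviewer's point outside Galaxy, here is a rough shell sketch that splits file names the way a "name and extension" pattern would. It assumes the name is everything before the last dot and the extension is the final suffix; this is an approximation, not Galaxy's discovery code, and the file names are hypothetical stand-ins for Tiara's outputs. If Galaxy behaves like this sketch, every file in results/ is picked up regardless of its suffix.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Approximation (assumption): name = everything before the last dot,
# ext = the final suffix. Not Galaxy's actual discovery implementation.
split_name_ext() {
    local base
    base=$(basename "$1")
    printf 'name=%s ext=%s\n' "${base%.*}" "${base##*.}"
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/results"

# Hypothetical stand-ins for the files the wrapper leaves in ./results/
touch "$workdir/results/main_result.txt" \
      "$workdir/results/bac.fasta" \
      "$workdir/results/euk.fasta"

# Every file matches the pattern, so .txt and .fasta land in the same list.
for f in "$workdir"/results/*; do
    split_name_ext "$f"
done
```

Moving the FASTA files into their own directory, as suggested, means each pattern only ever sees one file type.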
#for $tf in $taxonomy_filter
&& ls -l ./results/
&& mv ./results/${tf}*.dat ./results/${tf}.fasta
&& find ./results/ -name ${tf}*.dat -exec mv {} ./results/${tf}.fasta ';'
Can you maybe move your FASTA files into a separate folder here?
Hi! Just a thought based on the earlier comment about organizing outputs: it might be useful to replace:
mkdir -p ./results &&
with:
mkdir -p ./txt &&
mkdir -p ./fasta &&
Then update the rest of the paths accordingly. For example:
use -o ./txt/... instead of -o ./results/...
For moving fasta files:
find ./txt/ -name ${tf}*.dat -exec mv {} ./fasta/${tf}.fasta ';'
And in the outputs section:
set directory="fasta" for the output_fasta collection,
and directory="txt" for the output_txt collection.
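The suggested layout can be sketched in plain shell as follows. The Tiara call itself is replaced by placeholder files, and bac and euk stand in for the user-selected taxonomy_filter values; the file names are hypothetical, not Tiara's real output names. One deliberate difference from the find line above: the -name pattern is quoted, so the shell cannot expand it against the working directory before find sees it.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Separate directories, one per file type, as suggested in the review.
mkdir -p ./txt ./fasta

# Placeholder for Tiara's output (hypothetical names): a summary file plus a
# .dat file for "euk" only, i.e. no sequences were classified as bacteria.
touch ./txt/main_result.txt ./txt/euk_sequences.dat

for tf in bac euk; do
    # Quoted pattern: find does the matching, not the shell.
    # When nothing matches (bac here), find runs mv zero times and exits 0.
    find ./txt/ -name "${tf}*.dat" -exec mv {} "./fasta/${tf}.fasta" ';'
done
```

In this sketch, txt/ ends up holding only the summary file and fasta/ only the renamed FASTA files, so directory="txt" and directory="fasta" would each feed a single-type collection.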
Minor change to the Tiara tool to fix the situation explained below.
Tiara does not create an output file for a class to which it assigned no sequences. However, the command section renames files with the 'mv' command based on the classes the user selected, not on the files Tiara actually produced. Hence, when no sequences are classified as, for example, bacteria, 'mv' still tries to rename bac*.dat to bac.fasta and exits with error code 1.
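A minimal shell reproduction of this failure, assuming a ./results/-style directory in which Tiara wrote a file only for eukaryotes (the file names are placeholders). Because the wrapper chains its commands with &&, a non-zero status from mv is enough to stop the job, whereas find with no matches simply runs mv zero times and exits 0.

```shell
#!/usr/bin/env bash
# No "set -e": we want to record both exit statuses rather than abort.
set -u

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/results"

# Placeholder output: only eukaryotic sequences were found, so no bac*.dat exists.
touch "$workdir/results/euk_sequences.dat"

# Old approach: the unmatched glob is passed to mv literally, so mv fails.
mv "$workdir"/results/bac*.dat "$workdir/results/bac.fasta" 2>/dev/null
mv_status=$?

# New approach: find finds nothing and never invokes mv.
find "$workdir/results/" -name 'bac*.dat' -exec mv {} "$workdir/results/bac.fasta" ';'
find_status=$?

echo "mv exit status: $mv_status"
echo "find exit status: $find_status"
```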
I also removed the 'ls -l ./results/' line, which seems to be a leftover from debugging.