Fixed Tiara tool error in certain cases #1595

Open
daduijker wants to merge 5 commits into bgruening:master from daduijker:tiara_fix_failed_missing_output

@daduijker

Minor change to the Tiara tool to fix the situation explained below.

Tiara does not create an output file for a class it could not classify sequences for. However, in the command section we rename files using the 'mv' command based on the user-selected classes rather than the tool output. Hence when no sequences are classified as, for example, bacteria, the 'mv' command will still attempt to rename bac*.dat to bac.fasta and exit with error code 1.

I also took out the 'ls -l ./results/' line which seems to be a leftover from debugging.
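
The failure mode described above can be reproduced outside Galaxy. The following is a minimal sketch, not the tool's actual command section: the scratch directory and the file name `euk_example.dat` are hypothetical, and it assumes a POSIX shell where an unmatched glob is passed to `mv` literally.

```shell
#!/bin/sh
# Hypothetical scratch layout: Tiara produced an output for one class only.
workdir="$(mktemp -d)"
mkdir -p "$workdir/results"
touch "$workdir/results/euk_example.dat"   # hypothetical file name

tf="bac"   # a user-selected class for which no output file exists

# mv: the glob matches nothing, stays literal, and mv exits non-zero.
if mv "$workdir"/results/${tf}*.dat "$workdir/results/${tf}.fasta" 2>/dev/null; then
    mv_status=0
else
    mv_status=$?
fi

# find: no match means -exec never runs, and find itself exits 0.
find "$workdir/results/" -name "${tf}*.dat" -exec mv {} "$workdir/results/${tf}.fasta" ';'
find_status=$?

echo "mv exit status: $mv_status, find exit status: $find_status"
```

The `-name` pattern is quoted in this sketch so that the shell hands it to `find` unexpanded; the command in the PR leaves it unquoted.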

Replace 'mv' with 'find' to rename output files in the ./results/ folder. This prevents the tool from going into an error state when it tries to move an output that Tiara did not generate (because it found no sequences for that domain).
Drop the 'ls' command
@SaimMomin12
Collaborator

@Minamehr Can you please help here?

Use two <discover_datasets> tags with separate output formats. Using more than one extension in a single <discover_datasets> tag is not allowed.
@Minamehr
Contributor

Sure! The suggested update looks like a good fix. By replacing the mv command with find -exec mv, the issue where Tiara exits with an error when a classification output is missing is resolved. This ensures that if no sequences are classified into a specific group (e.g., eukaryotes), the process will continue without errors. The update makes sense and should help improve stability.

<outputs>
<collection name="output" type="list" label="${tool.name} on ${on_string}: classified sequences in txt and Fasta Output">
- <discover_datasets pattern="__name_and_ext__" ext="fasta,txt" directory="results" />
+ <discover_datasets pattern="__name_and_ext__" ext="fasta" directory="results" />
Owner

Do we really want mixed collections? Or should those be two collections, each for its own filetype?

Contributor

The txt file just provides general information about the results, like sequence ID, first-stage classification, and second-stage classification. I think it should be fine to have them in the same collection.

Owner

But then you cannot reuse the collection. You cannot just feed it into a fasta tool; you would need to filter the collection first.

Is that what you want?

Author

I think it would be more practical to have two separate collections with their own type.

<param name="taxonomy_filter" value="pla"/>
<output_collection name="output" type="list">
<element name="main_result" file="main_result01.txt" ftype="txt"/>
<output_collection name="output_fasta" type="list">
Owner

You can count the number of expected elements here.

I think it will fail then, because you still collect all the elements.

<discover_datasets pattern="__name_and_ext__" ext="fasta" directory="results" />
</collection>
<collection name="output_txt" type="list" label="${tool.name} on ${on_string}: Classified sequences in txt">
<discover_datasets pattern="__name_and_ext__" ext="txt" directory="results" />
Owner

__name_and_ext__ will collect all files, taking the element name from the filename and the extension from the .txt or .fasta suffix. I don't think it will filter the folder. You can check this in your tests.

I think what you need to do is move all fasta files into a separate directory and collect just those. This is IMHO the easiest way, as you already move the fasta files in your find step.
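
To illustrate the concern about the shared folder, here is a rough sketch in plain shell. It is not Galaxy's discovery code: it only assumes, as the comment above suggests, that a name-and-extension pattern accepts any file of the form `name.ext`. The file names are hypothetical.

```shell
#!/bin/sh
# ASSUMPTION: the pattern accepts every "name.ext" file, whatever the extension.
# The case glob below is a stand-in for that behaviour, not Galaxy's own regex.
scratch_dir="$(mktemp -d)"
mkdir -p "$scratch_dir/results"
touch "$scratch_dir/results/main_result.txt" "$scratch_dir/results/pla.fasta"

matched_count=0
for path in "$scratch_dir"/results/*; do
    file="${path##*/}"                       # strip the directory part
    case "$file" in
        ?*.?*)                               # something, a dot, something
            matched_count=$((matched_count + 1))
            echo "collected: name=${file%.*} ext=${file##*.}"
            ;;
    esac
done

echo "files collected from one directory: $matched_count"
```

Under that assumption both files in the shared directory are picked up, which is why splitting the outputs by directory is the simpler way to keep each collection to one file type.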

  #for $tf in $taxonomy_filter
- && ls -l ./results/
- && mv ./results/${tf}*.dat ./results/${tf}.fasta
+ && find ./results/ -name ${tf}*.dat -exec mv {} ./results/${tf}.fasta ';'
Owner

Can you maybe move your fasta files into a separate folder here?

  #for $tf in $taxonomy_filter
- && ls -l ./results/
- && mv ./results/${tf}*.dat ./results/${tf}.fasta
+ && find ./results/ -name ${tf}*.dat -exec mv {} ./results/${tf}.fasta ';'
Contributor

Hi! Just a thought, based on the earlier comment about organizing outputs. It could be useful to replace:
mkdir -p ./results &&
with:
mkdir -p ./txt &&
mkdir -p ./fasta &&

Then update the rest of the paths accordingly. For example:
use -o ./txt/... instead of -o ./results/...

For moving fasta files:
find ./txt/ -name ${tf}*.dat -exec mv {} ./fasta/${tf}.fasta ';'

And in the outputs section:
set directory="fasta" for the output_fasta collection,
and directory="txt" for the output_txt collection.
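
The layout suggested above could look roughly like this when run by hand. This is only a sketch under assumptions: the scratch directory, the placeholder file contents, and the file names `main_result.txt` and `pla_example.dat` are hypothetical stand-ins for Tiara's real output.

```shell
#!/bin/sh
# Hypothetical stand-in for the job working directory.
layout_dir="$(mktemp -d)"
mkdir -p "$layout_dir/txt" "$layout_dir/fasta"

# Placeholder files standing in for Tiara's output written via -o ./txt/...
printf 'placeholder classification table\n' > "$layout_dir/txt/main_result.txt"
printf '>contig_1\nACGT\n' > "$layout_dir/txt/pla_example.dat"

# Move each per-class file that exists into ./fasta; classes without output are skipped.
for tf in pla bac; do
    find "$layout_dir/txt/" -name "${tf}*.dat" \
        -exec mv {} "$layout_dir/fasta/${tf}.fasta" ';'
done

ls "$layout_dir/txt" "$layout_dir/fasta"
```

After the loop, `txt/` holds only the text report and `fasta/` holds only the classes that produced sequences, so each collection can point at its own directory.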

4 participants