New tool addition: amas tool #7443
See updated changes from George at jchchiu#1
tools/amas/amas.xml
Outdated
<param name="in_format" type="select" label="Format of the input file">
    <option value="fasta">fasta</option>
    <option value="phylip">phylip</option>
    <option value="phylip-int">phylip-int</option>
    <option value="nexus">nexus(sequential)</option>
    <option value="nexus-int">nexus(interleaved)</option>
</param>
FASTA, PHYLIP and NEXUS can be distinguished automatically, e.g. $input_file.ext gives the Galaxy datatype. Is the interleaved/sequential information needed? Can it be determined automatically?
AMAS doesn't seem to have a function that detects interleaved input automatically; it relies on the format you pass in. Can Galaxy distinguish interleaved files automatically?
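For the NEXUS half of this question, a rough sketch of one possible check: NEXUS files can declare `INTERLEAVE` inside the `FORMAT` command, so a small script could look for that keyword. The helper name `nexus_is_interleaved` is hypothetical and is not part of AMAS or Galaxy. PHYLIP carries no such keyword, so this sketch does not cover it.

```python
import re

# Hypothetical helper for illustration; not an AMAS or Galaxy function.
# Assumption: the first "format ...;" statement in the file is the FORMAT
# command of the DATA/CHARACTERS block.
_FORMAT_CMD = re.compile(r"\bformat\b([^;]*);", re.IGNORECASE)
_INTERLEAVE = re.compile(r"\binterleave\b(\s*=\s*(\w+))?", re.IGNORECASE)


def nexus_is_interleaved(text):
    """Return True if the FORMAT command declares INTERLEAVE (and not INTERLEAVE=NO)."""
    command = _FORMAT_CMD.search(text)
    if command is None:
        return False  # no FORMAT command found: treat as sequential
    option = _INTERLEAVE.search(command.group(1))
    if option is None:
        return False  # keyword absent: sequential
    value = option.group(2)  # None for a bare "INTERLEAVE"
    return value is None or value.lower() != "no"


example = "#NEXUS\nBEGIN DATA;\nDIMENSIONS NTAX=2 NCHAR=4;\nFORMAT DATATYPE=DNA MISSING=? GAP=- INTERLEAVE;\nMATRIX\n"
print(nexus_is_interleaved(example))
```

A file whose matrix is interleaved but whose header omits the keyword would be misclassified, so this is a starting point rather than a complete detector.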
tools/amas/amas.xml
Outdated
</collection>

<collection name="converted_alignments" type="list" label="Converted alignments">
    <discover_datasets directory="run_dir/convert" pattern="(?P<name>.+)-out\..+" format="data" />
We should set the actual format instead of format="data".
Could you have a look at the amas_split.xml at L55; is this what you were thinking?
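To make the discovery pattern in this thread easier to reason about, here is a small sketch of what `(?P<name>.+)-out\..+` captures. It assumes the pattern is applied as a Python regular expression anchored at the start of the bare filename; the filenames and the `discovered_name` helper are made up for illustration.

```python
import re

# Pattern copied from the discover_datasets element under review.
PATTERN = re.compile(r"(?P<name>.+)-out\..+")


def discovered_name(filename):
    """Return the captured element name, or None if the file would be skipped."""
    match = PATTERN.match(filename)
    return match.group("name") if match else None


# Hypothetical filenames, for illustration only.
for candidate in ("gene1.fasta-out.phy", "concatenated-out.nex", "notes.txt"):
    print(candidate, "->", discovered_name(candidate))
```

Because the extension after `-out.` is matched by `.+`, the pattern itself says nothing about the datatype, which is why the format has to be set separately.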
Hey @bernt-matthias, could you have a look at amas_concat.xml and see if this is on the right track? If so, I'll update the rest of the subcommands with your suggestions.

I've been testing the split subcommand again and it seems AMAS doesn't work when a RAxML- or NEXUS-formatted partitions file is used as input. The regex operator only works for the unspecified partitions format, so I've updated the subcommand accordingly with some more info.
…titions; removed with note and more info
tools/amas/check_interleaved.py
Outdated
# NOTE: Do we need to check all files?
if all(interleaved_status):
    return 0  # Exit code 0 = interleaved
else:
    return 1  # Exit code 1 = sequential
Maybe check that they are all the same. Also, I would just print the result and use a non-zero exit code for the error case. Something like this:
- # NOTE: Do we need to check all files?
- if all(interleaved_status):
-     return 0  # Exit code 0 = interleaved
- else:
-     return 1  # Exit code 1 = sequential
+ interleaved_status = list(set(interleaved_status))
+ if len(interleaved_status) > 1:
+     raise Exception("mixed interleaved")
+ print(interleaved_status[0])
Or make the script output args.format + "-int" or args.format. Then you can set a bash variable in the command block: IN_FORMAT=\$(python '$__tool_directory__/check_interleaved.py' ...)
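A minimal sketch of the second option, under stated assumptions: the function name `resolve_in_format` is hypothetical, `interleaved_status` is taken to be a list of booleans (one per input file) as in the snippet under review, and the caller is assumed to pass `False` for FASTA inputs because the select list in this PR has no `fasta-int` value.

```python
def resolve_in_format(base_format, interleaved_status):
    """Return the --in-format value to print for the command block.

    base_format: "fasta", "phylip" or "nexus" (assumed to come from args.format).
    interleaved_status: one boolean per input file, True if interleaved.
    """
    distinct = set(interleaved_status)
    if not distinct:
        raise ValueError("no input files given")
    if len(distinct) > 1:
        # An uncaught exception makes the script exit non-zero (the error case).
        raise ValueError("mixed interleaved and sequential inputs")
    # Assumption: FASTA callers always pass False, so "fasta-int" is never built.
    return base_format + "-int" if distinct.pop() else base_format


# The script would print this value so the wrapper can capture it in a shell variable.
print(resolve_in_format("phylip", [True, True]))
```

Printing a single token keeps the shell side simple: the captured value can be passed straight to `--in-format`.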
bernt-matthias
left a comment
Looks good. Nearly there.
<inputs>
    <param name="input_files" type="data" format="fasta,phylip,nex" label="Sequences to concatenate" multiple="true"
        help="Provide pre-aligned FASTA/PHYLIP/NEXUS files (DNA or protein); mixes of unaligned reads or contigs will produce meaningless results." />
    <expand macro="input_format" />
- <expand macro="input_format" />
Analogous in all commands.
<param name="input_files" value="inputs/concat_1.fasta,inputs/concat_2.fasta" />
<param name="out_format" value="phylip" />
<param name="part_format" value="nexus" />
<param name="in_format" value="fasta" />
Also remove in_format from tests.
--out-format $out_format
--in-files
@INPUT_FILENAMES@
--in-format $in_format
Use \$IN_FORMAT
FOR CONTRIBUTOR:
Regarding issue #7442
To do:
For now, is it possible to review the code to see if it's on the right track, and if there are any better ways to structure it?