Changes from 50 commits
Commits
73 commits
917c7aa
feat: add amas and macros
jchchiu Nov 6, 2025
bb629f6
test: add simple working test case
jchchiu Nov 6, 2025
be4db1f
fix: change category to Multiple Alignments
jchchiu Nov 6, 2025
932783d
fix: change category to Sequence Analysis
jchchiu Nov 6, 2025
793a6b1
update from george
jchchiu Nov 7, 2025
af75ec9
update from george; add tests
jchchiu Nov 7, 2025
a45c8b5
update from george; add info.xml
jchchiu Nov 7, 2025
e816d9c
fix lint
jchchiu Nov 7, 2025
9967a62
add split test; update .shed; add comment to xml command
jchchiu Nov 7, 2025
8e937d7
update .shed owners
jchchiu Nov 7, 2025
a6ff62e
remove translate
jchchiu Nov 7, 2025
c354605
docs: update .shed
jchchiu Nov 11, 2025
a4fc62f
refactor: split concat into separate tool
jchchiu Nov 11, 2025
6a56045
refactor: add input and output format as shared macro
jchchiu Nov 11, 2025
426a577
refactor: add macro for changing output format
jchchiu Nov 11, 2025
c757008
refactor: move info to macros
jchchiu Nov 11, 2025
1509d85
refactor: change tool id/name; remove info macro
jchchiu Nov 11, 2025
6872743
docs: update categories; reduce actions
jchchiu Nov 11, 2025
c77e246
refactor: rename output format
jchchiu Nov 11, 2025
582d254
refactor: move 'split' subcommand into separate tool
jchchiu Nov 11, 2025
bc9bebd
refactor: change output pattern
jchchiu Nov 11, 2025
dc15ac1
refactor: move 'replicate' subcommand into separate tool
jchchiu Nov 11, 2025
a279552
docs: add more help to explain what partitions are
jchchiu Nov 11, 2025
1d901f5
refactor: move 'summary' subcommand into separate tool
jchchiu Nov 12, 2025
77241c3
temp: move 'remove' subcommand into separate tool
jchchiu Nov 12, 2025
17c02f2
fix: change version to correct token
jchchiu Nov 12, 2025
91c5fe3
refactor: remove redundant xmls
jchchiu Nov 12, 2025
62a9bce
refactor: remove/add reused/redundant macros
jchchiu Nov 12, 2025
a12ab96
docs: update help/documentation
jchchiu Nov 12, 2025
f071907
docs: update help
jchchiu Nov 12, 2025
0bb5c40
test: remove tests no longer needed
jchchiu Nov 12, 2025
653c992
refactor: change 'remove' repeat to text + regex validator
jchchiu Nov 12, 2025
178d5cc
fix: fix misplaced end param tag
jchchiu Nov 12, 2025
81cbb66
docs: updated help for 'remove'
jchchiu Nov 12, 2025
8f32f1d
docs: update help info
jchchiu Nov 12, 2025
46502d3
refactor: add profile token to macro; replace in subcommands
jchchiu Nov 12, 2025
587bb5d
refactor: change param 'name' to 'argument' for 'boolean'
jchchiu Nov 12, 2025
6fc566f
docs: rename output label so that it is more user friendly
jchchiu Nov 12, 2025
5809b0a
Revert "docs: rename output label so that it is more user friendly"
jchchiu Nov 12, 2025
e2de21b
docs: rename output label so that it is more user friendly
jchchiu Nov 12, 2025
b3c6135
docs: add auto_tool_repositories and suite to shed.yml
jchchiu Nov 17, 2025
4b43895
refactor: run everything in ./; added ftype to tests
jchchiu Nov 17, 2025
f6a85d5
refactor: changed check_align and data_type to macro
jchchiu Nov 17, 2025
08bd74d
refactor: moved shared commands to macro tokens
jchchiu Nov 17, 2025
b19d9b7
refactor/docs: moved shared help to macro token
jchchiu Nov 17, 2025
b61f0ff
refactor: added ${tool.name} on ${on_string} to output labels
jchchiu Nov 17, 2025
846b254
docs: updated file format formatting to be more consistent
jchchiu Nov 17, 2025
eec0620
style: removed single quotes
jchchiu Nov 17, 2025
8364cfe
docs: updated docs to include info on sequential vs interleaved; fixe…
jchchiu Nov 17, 2025
834f114
docs: moved partitions help to macro token
jchchiu Nov 17, 2025
2d2349b
refactor: set format depending on part_format
jchchiu Nov 19, 2025
0e62561
style: changed formatting of output files
jchchiu Nov 19, 2025
4af9562
fix: updated version command
jchchiu Nov 19, 2025
cfcfca9
tests: changed concat test from sim size to exact
jchchiu Nov 19, 2025
d4b84ac
refactor: simplified change_format
jchchiu Nov 19, 2025
51bb36e
fix: updated/fixed concat test
jchchiu Nov 19, 2025
ff762fb
fix: added nex format to allowed inputs for partitions
jchchiu Nov 19, 2025
3d9424b
docs: updated help
jchchiu Nov 19, 2025
18a8396
style: fix lint
jchchiu Nov 19, 2025
bd9a818
fix: split subcommand does not work with RAxML or NEXUS formatted par…
jchchiu Nov 19, 2025
0aae4cb
docs: added some comments for future
jchchiu Nov 20, 2025
96395ca
style: cleaned up indenting
jchchiu Nov 20, 2025
0318c99
draft: added small helper script to check interleave
jchchiu Nov 27, 2025
6d18f6b
draft: cleaner but less informative
jchchiu Nov 27, 2025
7b84c28
draft: removed interleave from input formats
jchchiu Nov 27, 2025
71a5be4
draft: changed python script to iterate line by line instead of loadi…
jchchiu Nov 28, 2025
e40a22e
draft: added test data for nexus interleave check
jchchiu Nov 28, 2025
ae8d825
draft: refactor to make more clean/efficient
jchchiu Nov 28, 2025
2ea8f01
draft: fix python flake8 lint
jchchiu Nov 28, 2025
9db5d83
draft: fix python flake8 w504 lint
jchchiu Nov 28, 2025
ce8812f
draft: removed io usage and added utf-8 encoding
jchchiu Nov 28, 2025
db929eb
feat: added check for interleaved files
jchchiu Dec 1, 2025
3a108f1
feat: added interleaved check to all subcommands
jchchiu Dec 1, 2025
19 changes: 19 additions & 0 deletions tools/amas/.shed.yml
@@ -0,0 +1,19 @@
categories:
- Phylogenetics
- Sequence Analysis
- Statistics
description: AMAS high-throughput alignment manipulation and summaries for phylogenomics
homepage_url: https://github.com/marekborowiec/AMAS
long_description: Handle expansive phylogenomic data sets by concatenating, removing,
  replicating, splitting, and summarising large nucleotide or amino acid alignments.
name: amas
owner: iuc
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/main/tools/amas
auto_tool_repositories:
  name_template: "{{ tool_id }}"
  description_template: "Wrapper for amas functions: {{ tool_name }}."
suite:
  name: "suite_amas"
  description: "A suite of tools that brings the amas project into Galaxy."
  long_description: Handle expansive phylogenomic data sets by concatenating, removing,
    replicating, splitting, and summarising large nucleotide or amino acid alignments.
117 changes: 117 additions & 0 deletions tools/amas/amas_concat.xml
@@ -0,0 +1,117 @@
<tool id="amas_concat" name="AMAS concat" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">
<description>concatenate multiple alignments</description>

<macros>
<import>macros.xml</import>
</macros>

<xrefs>
<xref type="bio.tools">amas</xref>
</xrefs>

<expand macro="requirements" />
<expand macro="version_command" />

<command detect_errors="exit_code"><![CDATA[
#import re
set -eu;

@SYMLINK_INPUTS@

python -m amas.AMAS
concat
--concat-part partitions.txt
--concat-out concatenated.out
--part-format $part_format
Contributor

You can determine the input format from $input_files.ext.

Author

Comment also relevant to #7443 (comment)

The problem I have with this is that for nexus or phylip files, the extension doesn't always tell whether the file is in interleaved or sequential format. Even if you sniff it as interleaved, does $input_files.ext return phylip-int or something like that which differentiates it from normal phylip? Otherwise I'm pretty sure amas needs the user to set the file format explicitly as an input.

What are your thoughts on only taking non-interleaved formats, and warning the user in the help (or somewhere similar) that interleaved files are not accepted? Following this, we would also remove the option to output an interleaved file. The problem I see with this is that users can still upload an interleaved file, since it has the same extension.

Contributor

I would suggest leaving it for now and working on an extension of the datatypes. Do you think the interleaved / sequential datatypes are well defined? Are you interested in working on this? I could try to give you some pointers.

Contributor

Alternatively we could also implement a small helper script that checks if the data is interleaved. Seems rather trivial.
For the output my suggestion would be a boolean interleaved: yes/no. Plus a select: as input / phylip / nexus / fasta?

Author

Yeah, I could give the extension of the datatypes a shot.

For phylip

Since the header is a required field and gives the number of taxa, how about getting the name of the first taxon on the line following the header, then checking line 1 + number of taxa + 1 (skipping the empty line that separates blocks if the sequences are interleaved)?

e.g.

4 50
Taxon1    ACGTACGTACGTACGTACGT
Taxon2    ACGTACGTACGTACGTACGT
Taxon3    ACGTACGTACGTACGTACGT
Taxon4    ACGTACGTACGTACGTACGT

Taxon1    ACGTACGTACGTACGTACGT
Taxon2    ACGTACGTACGTACGTACGT
...

Then we would get the number of taxa from line 0 (4) and 'Taxon1' from line 1, then check whether line (1 + 4 + 1) = 6 also contains Taxon1.

Actually, we could just get the length of the sequence on line 1, and if it is less than the sequence length given in the header (50), classify the file as interleaved. Or maybe use a mix of both?
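The sequence-length heuristic for phylip can be sketched in Python as follows. This is a hypothetical helper, not the script from this PR: it assumes relaxed PHYLIP (taxon name and sequence separated by whitespace) and does not handle strict PHYLIP, where a 10-character name may run straight into the sequence.

```python
def phylip_looks_interleaved(lines):
    """Guess whether a relaxed PHYLIP alignment is interleaved (hypothetical helper).

    Compares the residues on the first taxon row with NCHAR from the header.
    """
    non_blank = (line.strip() for line in lines if line.strip())
    # First non-blank line is the header: "<ntax> <nchar>".
    header = next(non_blank)
    _ntax, nchar = (int(field) for field in header.split()[:2])
    first_row = next(non_blank)
    # Drop the taxon name (first token); residues may be split into blocks.
    residues = "".join(first_row.split()[1:])
    # Fewer residues than NCHAR means the rest of each sequence must
    # appear in a later block, i.e. the file is interleaved.
    return len(residues) < nchar
```

Because `lines` can be an open file handle, only the header and first row are read.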

For nexus

Like you said, it'll be more complicated; I'm not sure whether the header requires the number of taxa. I'll need to ask someone else to confirm, or maybe you know someone, but for a start, given these examples:

#NEXUS
BEGIN DATA;
        Dimensions NTax=10 NChar=705;
        Format DataType=DNA Interleave=yes Gap=- Missing=?;
        Matrix
Cow     ATGGC ATATC CCATA CAACT AGGAT TCCAA GATGC AACAT CACCA ATCAT AGAAG AACTA
Carp    ATGGCACACCCAACGCAACTAGGTTTCAAGGACGCGGCCATACCCGTTATAGAGGAACTT
Chicken ATGGCCAACCACTCCCAACTAGGCTTTCAAGACGCCTCATCCCCCATCATAGAAGAGCTC
Human   ATGGCACATGCAGCGCAAGTAGGTCTACAAGACGCTACTTCCCCTATCATAGAAGAGCTT
Loach   ATGGCACATCCCACACAATTAGGATTCCAAGACGCGGCCTCACCCGTAATAGAAGAACTT
Mouse   ATGGCCTACCCATTCCAACTTGGTCTACAAGACGCCACATCCCCTATTATAGAAGAGCTA
Rat     ATGGCTTACCCATTTCAACTTGGCTTACAAGACGCTACATCACCTATCATAGAAGAACTT
Seal    ATGGCATACCCCCTACAAATAGGCCTACAAGATGCAACCTCTCCCATTATAGAGGAGTTA
Whale   ATGGCATATCCATTCCAACTAGGTTTCCAAGATGCAGCATCACCCATCATAGAAGAGCTC
Frog    ATGGCACACCCATCACAATTAGGTTTTCAAGACGCAGCCTCTCCAATTATAGAAGAATTA

Cow     CTTCACTTTCATGACCACACGCTAATAATTGTCTTCTTAATTAGCTCATTAGTACTTTAC
...

or

#NEXUS 

Begin data;
Dimensions ntax=10 nchar=234;
Format datatype=protein gap=- interleave;
Matrix
Cow     MAYPMQLGFQDATSPIMEELLHFHDHTLMIVFLISSLVLYIISLMLTTKLTHTSTMDAQE
Carp    MAHPTQLGFKDAAMPVMEELLHFHDHALMIVLLISTLVLYIITAMVSTKLTNKYILDSQE
Chicken MANHSQLGFQDASSPIMEELVEFHDHALMVALAICSLVLYLLTLMLMEKLS-SNTVDAQE
Human   MAHAAQVGLQDATSPIMEELITFHDHALMIIFLICFLVLYALFLTLTTKLTNTNISDAQE
Loach   MAHPTQLGFQDAASPVMEELLHFHDHALMIVFLISALVLYVIITTVSTKLTNMYILDSQE
Mouse   MAYPFQLGLQDATSPIMEELMNFHDHTLMIVFLISSLVLYIISLMLTTKLTHTSTMDAQE
Rat     MAYPFQLGLQDATSPIMEELTNFHDHTLMIVFLISSLVLYIISLMLTTKLTHTSTMDAQE
Seal    MAYPLQMGLQDATSPIMEELLHFHDHTLMIVFLISSLVLYIISLMLTTKLTHTSTMDAQE
Whale   MAYPFQLGFQDAASPIMEELLHFHDHTLMIVFLISSLVLYIITLMLTTKLTHTSTMDAQE
Frog    MAHPSQLGFQDAASPIMEELLHFHDHTLMAVFLISTLVLYIITIMMTTKLTNTNLMDAQE

Cow     VETIWTILPAIILILIALPSLRILYMMDEINNPSLTVKTMGHQWYWSYEYTDYEDLSFDS
...

If ntax and nchar are required in Dimensions, then we could do something similar to the phylip implementation. Otherwise, we could maybe use the Interleave=yes or bare interleave flag, but this flag seems more optional.

Let me know your thoughts.

Contributor

The phylip examples on Wikipedia (https://en.wikipedia.org/wiki/PHYLIP) also seem to allow an interleaved format where the second block does not list the species names again.

So I guess the best criterion is the number of non-header lines: if there are more lines than taxa, it should be interleaved.

For nexus I would trust Interleave if it's there, and otherwise check as for phylip.
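A minimal Python sketch of this nexus rule, using a hypothetical `nexus_looks_interleaved` helper: an `Interleave` token in the FORMAT command decides the answer, and otherwise MATRIX rows are counted against NTAX. It assumes one command per line, no bracketed comments inside MATRIX, and NTAX given in DIMENSIONS; with neither the flag nor NTAX it falls back to reporting sequential.

```python
import re


def nexus_looks_interleaved(lines):
    """Guess whether a NEXUS DATA/CHARACTERS matrix is interleaved (hypothetical helper)."""
    ntax = None
    in_matrix = False
    rows = 0
    for raw in lines:
        line = raw.strip()
        lower = line.lower()
        if not in_matrix:
            if lower.startswith("dimensions"):
                match = re.search(r"\bntax\s*=\s*(\d+)", lower)
                if match:
                    ntax = int(match.group(1))
            elif lower.startswith("format"):
                match = re.search(r"\binterleave\b(?:\s*=\s*(\w+))?", lower)
                if match:
                    # Trust the flag: bare 'interleave' or 'interleave=yes' means interleaved.
                    return match.group(1) in (None, "yes", "true")
            elif lower.startswith("matrix"):
                in_matrix = True
            continue
        if not line:
            continue
        if line != ";":
            rows += 1
            # More MATRIX rows than taxa: a second block has started.
            if ntax is not None and rows > ntax:
                return True
        if line.endswith(";"):
            break  # end of the MATRIX command
    return False
```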

Author

jchchiu Nov 27, 2025

I've drafted a small helper function in this commit. There is also the next commit, which makes it cleaner but less informative. Could you have a look and give feedback on any better ways to structure it?

Regarding the extension of the datatypes, could you give some pointers on how it would best be structured?

  • For phylip, would making two subclasses, phylipint and phylipseq, which inherit from the phylip class be a good idea? Or should the phylip class default to sequential, with another class for phylipint?
  • Nexus might be a bit too complicated, like you said, since it can also contain other info; could we maybe make subclasses called nexusseq and nexusint with code similar to the helper script? If it is a DATA or CHARACTERS block it is most likely a sequence file, but what if it also contains other blocks such as TREES? Default to the Nexus class?
  • I've had a bit of a read through the docs, but I'm still a bit confused. If we implement sniffers for phylipint and phylipseq that sit below the phylip sniffer, will the file be sniffed as phylip only? For the subclasses, can an interleaved phylip file be both a phylip and a phylipint?
  • Would it be better to implement it by adding an interleaved flag to the metadata; could we then register a phylipint based on the metadata?

Author

jchchiu Nov 28, 2025

I've changed the helper script so that the whole file isn't loaded in and instead is iterated through line by line and exited when the condition is met.
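Combining the line-by-line iteration with the "more lines than taxa" criterion, a Python sketch might look like this. `phylip_is_interleaved` is a hypothetical name, not the PR's script; it takes any iterable of lines, so an open file handle is read lazily and reading stops as soon as the answer is known.

```python
def phylip_is_interleaved(lines):
    """Return True if a PHYLIP input has more non-blank data lines than taxa.

    Hypothetical helper; `lines` may be an open file handle.
    """
    it = iter(lines)
    # The first non-blank line is the header: "<ntax> <nchar>".
    header = next((line for line in it if line.strip()), None)
    if header is None:
        raise ValueError("empty PHYLIP input")
    ntax = int(header.split()[0])
    rows = 0
    for line in it:
        if line.strip():
            rows += 1
            if rows > ntax:
                return True  # early exit: a second block has started
    return False
```

Usage would be along the lines of `with open(path, encoding="utf-8") as handle: phylip_is_interleaved(handle)`.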

One question, is there any need to check all files are all interleaved/sequential? Should this check be moved somewhere else, and for this script we just assume format based on the first file to make it faster?

EDIT: you can see the latest script here and the macro here

Contributor

One question, is there any need to check all files are all interleaved/sequential? Should this check be moved somewhere else, and for this script we just assume format based on the first file to make it faster?

I guess we can check all files.
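Checking all files could be done by running a detector on every input and failing when the results disagree. A Python sketch, where `common_layout` is hypothetical and `detector` is any function mapping an iterable of lines to True (interleaved) or False (sequential):

```python
from typing import Callable, Dict, Iterable


def common_layout(inputs: Dict[str, Iterable[str]],
                  detector: Callable[[Iterable[str]], bool]) -> bool:
    """Return True if every input is interleaved, False if every input is sequential.

    Raises ValueError naming the interleaved inputs when the layouts are mixed.
    """
    results = {name: detector(lines) for name, lines in inputs.items()}
    layouts = set(results.values())
    if len(layouts) > 1:
        interleaved = sorted(name for name, flag in results.items() if flag)
        raise ValueError(f"mixed layouts; interleaved inputs: {interleaved}")
    # No inputs at all is treated as sequential.
    return layouts.pop() if layouts else False
```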

--out-format $out_format
--in-files
@INPUT_FILENAMES@
--in-format $in_format
--data-type $data_type
--cores "\${GALAXY_SLOTS:-1}"
$check_align
]]></command>

<inputs>
<param name="input_files" type="data" format="fasta,phylip,nex" label="Sequences to concatenate" multiple="true"
help="Provide pre-aligned FASTA/PHYLIP/NEXUS files (DNA or protein); mixes of unaligned reads or contigs will produce meaningless results." />
<expand macro="input_format" />
Contributor

Suggested change
<expand macro="input_format" />

Analogous in all commands.

<expand macro="output_format" label="Select output format for concatenated alignment" />
<param name="part_format" type="select" label="Format of the partitions file"
help="A file defining how the concatenated alignment is split into separate gene/locus regions. Each line specifies a partition name and its position range (e.g., 'gene1 = 1-500' or 'DNA, gene1 = 1-500' for RAxML format).">
<option value="nexus">nexus</option>
<option value="raxml">raxml</option>
<option value="unspecified" selected="true">unspecified</option>
Contributor

What happens in the unspecified case?

Author

It just has the genes and their start and end; should I add more context or just direct them to the help section?

  • Unspecified: gene1 = 1-500
  • RAxML: DNA, gene1 = 1-500
  • NEXUS:
#NEXUS

Begin sets;
    charset gene1 = 1-500;
End;
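To make the three layouts concrete, here is a hypothetical Python helper (not part of AMAS) that renders the same partitions in each style. The `DNA` prefix for RAxML is taken from the example above; protein data would need a different prefix, so it is left as a parameter.

```python
from typing import List, Tuple

Partition = Tuple[str, int, int]  # (name, first column, last column), 1-based inclusive


def render_partitions(parts: List[Partition], style: str, raxml_prefix: str = "DNA") -> str:
    """Render partitions as 'unspecified', 'raxml', or 'nexus' text."""
    if style == "unspecified":
        lines = [f"{name} = {start}-{end}" for name, start, end in parts]
    elif style == "raxml":
        lines = [f"{raxml_prefix}, {name} = {start}-{end}" for name, start, end in parts]
    elif style == "nexus":
        body = [f"    charset {name} = {start}-{end};" for name, start, end in parts]
        lines = ["#NEXUS", "", "Begin sets;", *body, "End;"]
    else:
        raise ValueError(f"unknown partition style: {style!r}")
    return "\n".join(lines) + "\n"
```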

</param>
<expand macro="data_type" />
<expand macro="check_align" />
</inputs>

<outputs>
<data name="output" from_work_dir="concatenated.out" format="txt" label="${tool.name} on ${on_string} (Concatenated alignment)">
<change_format>
<when input="out_format" value="fasta" format="fasta" />
<when input="out_format" value="phylip" format="phylip" />
<when input="out_format" value="phylip-int" format="phylip" />
<when input="out_format" value="nexus" format="nex" />
<when input="out_format" value="nexus-int" format="nex" />
</change_format>
</data>
<data name="partitions_out" from_work_dir="partitions.txt" format="txt" label="${tool.name} on ${on_string} (Partition file)" />
</outputs>

<tests>
<test expect_num_outputs="2">
<param name="input_files" value="inputs/concat_1.fasta,inputs/concat_2.fasta" />
<param name="out_format" value="phylip" />
<param name="part_format" value="nexus" />
<param name="in_format" value="fasta" />
Contributor

Also remove in_format from tests.

<param name="data_type" value="dna" />
<param name="check_align" value="false" />
<output name="output" file="outputs/expected_concat.phylip" ftype="phylip" compare="sim_size" />
<output name="partitions_out" file="outputs/expected_partitions.txt" ftype="txt" />
</test>
<test expect_num_outputs="2">
<param name="input_files" value="inputs/concat_1.fasta,inputs/concat_2.fasta" />
<param name="out_format" value="fasta" />
<param name="part_format" value="raxml" />
<param name="in_format" value="fasta" />
<param name="data_type" value="dna" />
<param name="check_align" value="false" />
<output name="output" file="outputs/expected_concat_fasta.fas" ftype="fasta" compare="sim_size" />
<output name="partitions_out" file="outputs/expected_partitions_raxml.txt" ftype="txt" />
</test>
</tests>

<help><![CDATA[
**What it does**

AMAS Concat combines multiple sequence alignments into a single concatenated alignment, commonly used in phylogenomic analyses.

**Inputs**

- **Multiple alignment files**: Select 2 or more pre-aligned sequence files (FASTA, PHYLIP, or NEXUS format)
- **Input format**: Specify the format of your input files
- **Data type**: Choose DNA for nucleotide sequences or Protein for amino acid sequences
- **Output format**: Select the desired format for the concatenated alignment

**Outputs**

1. **Concatenated alignment**: A single file containing all input alignments joined end-to-end
2. **Partitions file**: Defines the boundaries of each original alignment within the concatenated file

@PARTITIONS_HELP@

**Use cases**

- **Multi-locus phylogenomics**: Combine hundreds of genes for species tree inference
- **Partitioned phylogenetic analysis**: Apply different evolutionary models to different genes using tools like RAxML or IQ-TREE
- **Supermatrix construction**: Create datasets for concatenation-based phylogenetic methods
- **Increased phylogenetic signal**: Leverage information from multiple loci to resolve difficult nodes
- **Comparative analyses**: Prepare datasets for testing hypotheses across multiple genomic regions

@AMAS_SHARED_HELP@
]]></help>

<expand macro="citations" />
</tool>
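The concatenation described in the help above can be sketched in Python. This is an illustration under assumptions, not AMAS's implementation: each alignment is a dict of taxon name to aligned sequence, taxa missing from a locus are padded with `?` (check how AMAS actually fills them), and partition lines use the unspecified `gene1 = 1-500` style.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # taxon name -> aligned sequence


def concatenate(alignments: List[Tuple[str, Alignment]], missing: str = "?"):
    """Join alignments end-to-end; return (supermatrix, partition lines)."""
    taxa = sorted({taxon for _, aln in alignments for taxon in aln})
    supermatrix = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for name, aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: sequences differ in length (not aligned?)")
        width = lengths.pop()
        for taxon in taxa:
            # Taxa absent from this locus get a run of missing characters.
            supermatrix[taxon].append(aln.get(taxon, missing * width))
        partitions.append(f"{name} = {start}-{start + width - 1}")
        start += width
    return {taxon: "".join(chunks) for taxon, chunks in supermatrix.items()}, partitions
```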
100 changes: 100 additions & 0 deletions tools/amas/amas_remove.xml
@@ -0,0 +1,100 @@
<tool id="amas_remove" name="AMAS remove" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">
<description>remove taxa from multiple alignments</description>

<macros>
<import>macros.xml</import>
</macros>

<xrefs>
<xref type="bio.tools">amas</xref>
</xrefs>

<expand macro="requirements" />
<expand macro="version_command" />

<command detect_errors="exit_code"><![CDATA[
#import re
set -eu;

@SYMLINK_INPUTS@

python -m amas.AMAS
remove
--taxa-to-remove
#for $taxon in $taxa_to_remove.split()
'$taxon'
#end for
--out-format $out_format
--in-files
@INPUT_FILENAMES@
--in-format $in_format
Contributor

Use \$IN_FORMAT

--data-type $data_type
--cores "\${GALAXY_SLOTS:-1}"
$check_align
]]></command>

<inputs>
<param name="input_files" type="data" format="fasta,phylip,nex" label="Sequence(s) to remove taxa" multiple="true"
help="Provide pre-aligned FASTA/PHYLIP/NEXUS files (DNA or protein); mixes of unaligned reads or contigs will produce meaningless results." />
<expand macro="input_format" />
<expand macro="output_format" label="Select output format for alignment(s) with taxa removed"/>
<param name="taxa_to_remove" type="text" label="Taxa to remove"
help="Space-separated list of taxon names to remove (e.g., 'OTU9 OTU10 Sample_A'). Note: AMAS converts spaces to underscores and strips quotes from sequence names, so use 'Species_1' to remove a taxon named 'Species 1'.">
<validator type="regex" message="Please provide at least one taxon name (alphanumeric, underscores, hyphens, and dots allowed)">[A-Za-z0-9_.\-]+(\s+[A-Za-z0-9_.\-]+)*</validator>
</param>
<expand macro="data_type" />
<expand macro="check_align" />
</inputs>

<outputs>
<expand macro="collection_outputs" name="reduced_alignments" />
</outputs>

<tests>
<test expect_num_outputs="1">
<param name="input_files" value="inputs/remove_input.nex" />
<param name="taxa_to_remove" value="OTU9 OTU10" />
<param name="out_format" value="nexus-int" />
<param name="in_format" value="nexus" />
<param name="data_type" value="dna" />
<param name="check_align" value="false" />
<output_collection name="reduced_alignments_nexus" type="list">
<element name="reduced_remove_input.nex-out.int-nex" file="outputs/expected_remove_filtered.int-nex" ftype="nex" />
</output_collection>
</test>
</tests>

<help><![CDATA[
**What it does**

AMAS Remove excludes specified taxa (sequences) from one or more alignments. This is useful for removing problematic sequences, outgroups, or creating taxon subsets for comparative analyses.

**Inputs**

- **Alignment files**: One or more pre-aligned sequence files (FASTA, PHYLIP, or NEXUS format)
- **Taxa to remove**: Space-separated list of sequence names to exclude (e.g., 'OTU9 OTU10 Sample_A')

**Important**: AMAS converts spaces to underscores and strips quotes from sequence names during processing. If your input file contains a taxon named 'Species 1' or '"Species 1"', you must specify it as 'Species_1' in the taxa to remove list.

- **Input format**: Specify the format of your input files
- **Data type**: Choose DNA for nucleotide sequences or Protein for amino acid sequences
- **Output format**: Select the desired format for the reduced alignments

**Outputs**

A collection of alignment files with specified taxa removed. Each output file contains the same alignment as the input, minus the excluded sequences.

**Tip:** You may want to realign your files after taxon removal.

**Use cases**

- Remove sequences with excessive missing data
- Exclude contaminated or mis-identified samples
- Create taxon subsets for sensitivity analyses
- Remove outgroups after tree rooting

@AMAS_SHARED_HELP@
]]></help>

<expand macro="citations" />
</tool>
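The `taxa_to_remove` handling above can be sketched in Python: the command splits the text on whitespace, the validator restricts names to letters, digits, `_`, `.` and `-`, and the help says names are compared after quotes are stripped and spaces become underscores. These helpers are hypothetical; the sketch uses `re.fullmatch`, which may be stricter than the way Galaxy applies a regex validator, and `normalise_name` only approximates the renaming described in the help.

```python
import re

# Same pattern as the <validator type="regex"> on taxa_to_remove.
TAXA_PATTERN = re.compile(r"[A-Za-z0-9_.\-]+(\s+[A-Za-z0-9_.\-]+)*")


def parse_taxa(value):
    """Split the text field into taxon names, rejecting values the pattern does not cover."""
    # fullmatch requires the whole string to match (an assumption; see above).
    if TAXA_PATTERN.fullmatch(value) is None:
        raise ValueError("Please provide at least one taxon name")
    return value.split()


def normalise_name(name):
    """Approximate the help's description: strip surrounding quotes, spaces -> underscores."""
    return name.strip("\"'").replace(" ", "_")
```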
96 changes: 96 additions & 0 deletions tools/amas/amas_replicate.xml
@@ -0,0 +1,96 @@
<tool id="amas_replicate" name="AMAS replicate" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">
<description>replicate multiple alignments</description>

<macros>
<import>macros.xml</import>
</macros>

<xrefs>
<xref type="bio.tools">amas</xref>
</xrefs>

<expand macro="requirements" />
<expand macro="version_command" />

<command detect_errors="exit_code"><![CDATA[
#import re
set -eu;

@SYMLINK_INPUTS@

python -m amas.AMAS
replicate
--rep-aln $replicate_replicates $replicate_loci
--out-format $out_format
--in-files
@INPUT_FILENAMES@
--in-format $in_format
--data-type $data_type
--cores "\${GALAXY_SLOTS:-1}"
$check_align
]]></command>

<inputs>
<param name="input_files" type="data" format="fasta,phylip,nex" label="Sequence(s) to replicate" multiple="true"
help="Provide pre-aligned FASTA/PHYLIP/NEXUS files (DNA or protein); mixes of unaligned reads or contigs will produce meaningless results." />
<expand macro="input_format" />
<expand macro="output_format" label="Select output format for replicated alignment(s)" />
<param name="replicate_replicates" type="integer" value="10" min="1" label="Number of replicate datasets to build" />
<param name="replicate_loci" type="integer" value="2" min="1" label="Number of loci per replicate" />
<expand macro="data_type" />
<expand macro="check_align" />
</inputs>

<outputs>
<expand macro="collection_outputs" name="replicate_alignments" />
</outputs>

<tests>
<test expect_num_outputs="1">
<param name="input_files" value="inputs/fasta1.fas" />
<param name="replicate_replicates" value="2" />
<param name="replicate_loci" value="1" />
<param name="out_format" value="nexus" />
<param name="in_format" value="fasta" />
<param name="data_type" value="dna" />
<param name="check_align" value="false" />
<output_collection name="replicate_alignments_nexus" type="list">
<element name="replicate1_1-loci-out.nex" file="outputs/expected_replicate1.nex" ftype="nex" />
<element name="replicate2_1-loci-out.nex" file="outputs/expected_replicate2.nex" ftype="nex" />
</output_collection>
</test>
</tests>

<help><![CDATA[
**What it does**

AMAS Replicate generates jackknife replicates by randomly sampling loci (genes) from your dataset. This is used to assess phylogenetic signal distribution and node support across different genomic regions.

**Inputs**

- **Alignment files**: Multiple pre-aligned sequence files, one per locus/gene (FASTA, PHYLIP, or NEXUS format)
- **Number of replicates**: How many replicate datasets to generate
- **Loci per replicate**: How many loci to include in each replicate
- **Input format**: Specify the format of your input files
- **Data type**: Choose DNA for nucleotide sequences or Protein for amino acid sequences
- **Output format**: Select the desired format for the replicate alignments

**Outputs**

A collection of replicate alignment files. Each replicate contains a random subset of the input loci concatenated together.

**Use cases**

- **Phylogenetic jackknifing**: Assess whether phylogenetic signal is driven by specific loci
- **Node support evaluation**: Test robustness of tree topology across different gene combinations
- **Signal heterogeneity**: Identify whether conflicting signals come from particular genomic regions

**Example**

From 100 input genes, create 10 replicates each containing 50 randomly sampled genes. Each replicate can then be used to build a phylogenetic tree, and consistency across replicates indicates robust phylogenetic signal.

@AMAS_SHARED_HELP@
]]></help>

<expand macro="citations" />
</tool>
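A sketch of the replicate sampling in Python, assuming (not confirmed here) that loci are drawn without replacement within each replicate, as in a jackknife. `make_replicates` is a hypothetical helper that only chooses the loci; the chosen loci would then be concatenated into one alignment per replicate.

```python
import random
from typing import List, Optional, Sequence


def make_replicates(loci: Sequence[str], n_replicates: int, loci_per_replicate: int,
                    seed: Optional[int] = None) -> List[List[str]]:
    """Choose loci for each replicate, without replacement within a replicate (assumed)."""
    if not 1 <= loci_per_replicate <= len(loci):
        raise ValueError("loci_per_replicate must be between 1 and the number of loci")
    rng = random.Random(seed)
    # Replicates are independent draws: a locus may appear in several
    # replicates but at most once within any single replicate.
    return [rng.sample(list(loci), loci_per_replicate) for _ in range(n_replicates)]
```

Mirroring the help example, `make_replicates(genes, 10, 50)` on 100 genes yields 10 lists of 50 distinct genes each.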