cd-hit-2d excluding good alignments from clusters #146

@laugior

Running this command:
cd-hit-2d -i main_set -i2 transcriptos_alternativos -o comparacion_70 -c 0.70 -M 8000 -T 15 -g 1 -G 0 -aS 0.3 -aL 0.3 -A 0.3

Both -i and -i2 contain nucleotide sequences.

The following cluster is created:

Cluster 6508
0 1305aa, >NonamEVm004010t1... *
1 1303aa, >NonamEVm004010t2... at 1:1177:129:1305/100.00%

The -i2 file contains >NonamEVm004010t4, which has 100% identity with >NonamEVm004010t1 when aligned with an external tool (identity 100%, coverage 92%).
However, NonamEVm004010t4 is not added to this cluster; it appears among the non-redundant sequences in the db2 output.

Alignment:

NonamEVm004010t1 CGCAAGCAAGGCACGCCGCTTTCACGCCCCAAGCCCGCCGTCGAGCCGGAATGCGGTGCC
NonamEVm004010t4 ------------------------------------------------------------

NonamEVm004010t1 CAATTGGTTCCGCATTTCGGCTGCGCAAATGCAACGCCAGTCGCCAGACCACACAGCCTA
NonamEVm004010t4 ------------------------------------------------------------

NonamEVm004010t1 TTCCCCGCCGCTCGATCTGCAACCACTCTGCTCCGACGTGACCCTCGAGCCACGACCCGT
NonamEVm004010t4 --------CGCTCGATCTGCAACCACTCTGCTCCGACGTGACCCTCGAGCCACGACCCGT

NonamEVm004010t1 AACACCTGCTTTACTTCCATATTCCGTTTGCATCGCCTTTCGCTCTCTTTTTCCCATCCT
NonamEVm004010t4 AACACCTGCTTTACTTCCATATTCCGTTTGCATCGCCTTTCGCTCTCTTTTTCCCATCCT

NonamEVm004010t1 CACGAACTGGCGATCCACCACCTCCCCAATATGATTCCCTCTTCTTCGTCCAAGCTCTTC
NonamEVm004010t4 CACGAACTGGCGATCCACCACCTCCCCAATATGATTCCCTCTTCTTCGTCCAAGCTCTTC

NonamEVm004010t1 TTGCGATCCAGCGTCGCCGCCTCTCGTGCCACCATGACCGCGCGGCCGGCCATTCGCGCC
NonamEVm004010t4 TTGCGATCCAGCGTCGCCGCCTCTCGTGCCACCATGACCGCGCGGCCGGCCATTCGCGCC

NonamEVm004010t1 TTTTCAACGTCTCTCGCTAGCCAAAAGGAGGTCAAGAACGTCACAGTCTTCGGTGCCGGC
NonamEVm004010t4 TTTTCAACGTCTCTCGCTAGCCAAAAGGAGGTCAAGAACGTCACAGTCTTCGGTGCCGGC

NonamEVm004010t1 CTCATGGGCGCCGGCATCGCTCAGGTCCTGGCGCACAAGGGAAAGTACAACGTCACTCTC
NonamEVm004010t4 CTCATGGGCGCCGGCATCGCTCAGGTCCTGGCGCACAAGGGAAAGTACAACGTCACTCTC

NonamEVm004010t1 TCGGACGTCACCGACAAGGCGCTCGCCAACGGACAGTCGATTATCTCCAAGTCACTCACA
NonamEVm004010t4 TCGGACGTCACCGACAAGGCGCTCGCCAACGGACAGTCGATTATCTCCAAGTCACTCACA

NonamEVm004010t1 AGGATCGCAAAGAAGGCCTTGGCCGAATCGTCGGCCGACGAGCAGTCCCACTTTGTCAAA
NonamEVm004010t4 AGGATCGCAAAGAAGGCCTTGGCCGAATCGTCGGCCGACGAGCAGTCCCACTTTGTCAAA

NonamEVm004010t1 GGCATCGTTGATTCGATCAAGGTCACCACCGATCCCGAGGCCGCCGTGGAAGACACCGAC
NonamEVm004010t4 GGCATCGTTGATTCGATCAAGGTCACCACCGATCCCGAGGCCGCCGTGGAAGACACCGAC

NonamEVm004010t1 CTCGTTATCGAGGCCATCATCGAGAACGTCGGCATCAAGAAGGACCTCTTTGGCTTCCTC
NonamEVm004010t4 CTCGTTATCGAGGCCATCATCGAGAACGTCGGCATCAAGAAGGACCTCTTTGGCTTCCTC

NonamEVm004010t1 GACGGCAAGGCGCCCAAGGACGCTATCTTTGCGACCAACACGAGCTCGCTCAGCATCACC
NonamEVm004010t4 GACGGCAAGGCGCCCAAGGACGCTATCTTTGCGACCAACACGAGCTCGCTCAGCATCACC

NonamEVm004010t1 GACGTCGCCGAGGCTGTCGAGAGAAAGGAGCGGTTCGCCGGCTTCCACGCCTTCAACCCG
NonamEVm004010t4 GACGTCGCCGAGGCTGTCGAGAGAAAGGAGCGGTTCGCCGGCTTCCACGCCTTCAACCCG

NonamEVm004010t1 GTGCCCCAGATGAAGCTGGTCGAGATCGTGCGCACCAGCCAGACCAGCGACGAGACCTAC
NonamEVm004010t4 GTGCCCCAGATGAAGCTGGTCGAGATCGTGCGCACCAGCCAGACCAGCGACGAGACCTAC

NonamEVm004010t1 GACAGCCTGATGGAGGTGGCCAAGAGGATGGGCAAGGTCCCCGTCACTTGCGTCGACTCG
NonamEVm004010t4 GACAGCCTGATGGAGGTGGCCAAGAGGATGGGCAAGGTCCCCGTCACTTGCGTCGACTCG

NonamEVm004010t1 CCGG--------------------------------------------------------
NonamEVm004010t4 CCGGGGTGAGTCGACAGCTCCGACTAGGAGGCATCCCGATGTGTCCGGAAACGCTTTCCC

NonamEVm004010t1 --------------------------------------------------GATTCATCGT
NonamEVm004010t4 CTCTGAAGAAACTCCTGGCGACTCTGCCTTTCGTCTCCTGCTGCGGCACAGATTCATCGT

NonamEVm004010t1 CAACCGACTGCTGGTACCCTACATGTTTGAGGCCATCCGACTCGTTGAGCGAGGCGAGGC
NonamEVm004010t4 CAACCGACTGCTGGTACCCTACATGTTTGAGGCCATCCGACTCGTTGAGCGAGGCGAGGC

NonamEVm004010t1 GTCGATCAAGGATGTGGACACCGCTATGAAGCTCGGCGCTGGATACCCCATGGGTCCGTT
NonamEVm004010t4 GTCGATCAAGGATGTGGACACCGCTATGAAGCTCGGCGCTGGATACCCCATGGGTCCGTT

NonamEVm004010t1 TGAGCTCGCCGATCTGGTCGGCCTCGATACGCTGTCCCACATTGCCAAGGGCTGGAGGGA
NonamEVm004010t4 TGAGCTCGCCGATCTGGTCGGCCTCGATACGCTGTCCCACATTGCCAAGGGCTGGAGGGA

NonamEVm004010t1 GACTCGGGTCAAGACGGGAGAGATCAGCGCCGAGGCGGTCAAGGAATCGAAGCTGCTCGA
NonamEVm004010t4 GACTCGGGTCAAGACGGGAGAGATCAGCGCCGAGGCGGTCAAGGAATCGAAGCTGCTCGA

NonamEVm004010t1 GGATCTGGTGGCCCAGGGCAAGCTTGGCAAAAAGAGCGGAGAGAAGGGCGGTTTCTACAA
NonamEVm004010t4 GGATCTGGTGGCCCAGGGCAAGCTTGGCAAAAAGAGCGGAGAGAAGGGCGGTTTCTACAA

NonamEVm004010t1 ATACCCGGCGTCCAATAAGTGAGGCGAGGCA-----------------------------
NonamEVm004010t4 ATACCCGGCGTCCAATAAGTGAGGCGAGGCAGCTTTCCGGATGGTCTCTGCTAGCCGATC

NonamEVm004010t1 ------------------------------------------------------------
NonamEVm004010t4 GAGGACAGATCGGTTGAGGCACCAGTGTAGGGCTGCATCATTGGATCAGCAATGAGGAAG

NonamEVm004010t1 -------------------------------------
NonamEVm004010t4 GCGGTTGCGTACAAATACAAAGAGTGATAGAAGGTGC
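
For anyone who wants to check the identity and coverage figures against a gapped alignment like the one above, here is a minimal Python sketch. It assumes one common convention: identity is the fraction of matching bases among columns where neither row has a gap, and coverage of a sequence is the number of such columns divided by that sequence's ungapped length. The function name `alignment_stats` and the toy rows are hypothetical, and this is not necessarily how cd-hit-2d itself counts identity or applies -aS, -aL and -A.

```python
def alignment_stats(row_a: str, row_b: str, gap: str = "-") -> dict:
    """Identity and per-sequence coverage for two rows of a pairwise alignment.

    Assumed convention (not taken from cd-hit): only columns where both
    rows carry a base count as aligned.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")

    aligned = 0   # columns with a base in both rows
    matches = 0   # aligned columns where the bases are equal
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue
        aligned += 1
        if a.upper() == b.upper():
            matches += 1

    # Ungapped sequence lengths
    len_a = len(row_a) - row_a.count(gap)
    len_b = len(row_b) - row_b.count(gap)

    return {
        "identity": matches / aligned if aligned else 0.0,
        "coverage_a": aligned / len_a if len_a else 0.0,
        "coverage_b": aligned / len_b if len_b else 0.0,
    }


# Toy rows for illustration only, not the sequences from this issue.
stats = alignment_stats("ACGTACGT--", "--GTACGTAA")
print(stats)
```

To apply it to the alignment in this issue, concatenate all NonamEVm004010t1 rows into one string and all NonamEVm004010t4 rows into another, then pass both to the function. Under this convention, large internal gaps lower coverage but leave identity at 100%, so a pair can look identical in an external aligner and still fall below a coverage threshold.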

Can someone explain this behavior? Is something wrong with my command?
