Python-para-Bioinformatica/chapter23.txt at main · ToyokoLabs/Python-para-Bioinformatica · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
{:: encoding="UTF-8" /}

# Mutaciones de ADN con restricción

## DESCRIPCIÓN DEL PROBLEMA

Un investigador necesita diseñar una secuencia de ADN basada en una secuencia codificante. Esta secuencia codifica un polipéptido en el que está interesado el investigador. La nueva secuencia debe codificar para el mismo polipéptido, por lo que todos los cambios de nucleótidos en la nueva secuencia deben ser "mutaciones silenciosas". Una mutación silenciosa es aquella que no produce un cambio en la secuencia de aminoácidos de la proteína resultante. Esto se hace aprovechando la redundancia del código de ADN. Por ejemplo, los códigos **AAG** y **AAA** codifican para "Lisina" (K), por lo que cambiar **G** por **A** no cambia el polipéptido resultante.

El objetivo de generar diferentes secuencias de ADN es poder clasificarlas utilizando enzimas de restricción.

Dada una secuencia de ADN, el programa primero debe convertirla en un polipéptido y luego generar todas las posibles secuencias de ADN que codifican para dicho polipéptido. El siguiente paso es calcular qué enzimas cortan la secuencia original, cada nueva secuencia de ADN generada y compararlas. El programa debe imprimir los nombres de las enzimas que son exclusivas para cada secuencia.

### Introducir mutaciones puntuales y dar un perfil de restricción

La solución planteada más abajo es un programa (`pointmutations.py`) con un archivo de plantilla (`mutation.tpl`). Dicho archivo se utiliza para que **Jinja2** genere la salida.

**Listado 23.1:** `pointmutations.py`: Introducir mutaciones puntuales.

```
from argparse import ArgumentParser

from Bio import Seq
from Bio.Alphabet import IUPAC
from Bio import Restriction
from Bio.Data import CodonTable
from jinja2 import Template

helpstr = '''Given a DNA sequence of a polypeptide, this program
generates alternative DNA sequences that code for the same
polypeptide but that can be sorted out by DNA restriction.
Requires Biopython >= 1.69
Author: Sebastian Bassi (sbassi@genesdigitales.com)'''
usage = helpstr + '\n\nusage: %(prog)s input_sequence [options]'
parser = ArgumentParser(usage=usage)
parser.add_argument('input', help='Input sequence')
parser.add_argument('-o', '--output', help=
                    'name of the output file',
                    default='output.txt')
parser.add_argument('-m', '--mutations', type=int,
                  help='number of allowed mutations',
                  dest='n_mut', default=1)
parser.add_argument('-t', '--table', type=int,
                  help='translation table',
                  dest='table_id', default=1)

def backtrans(ori_pep, table_id=1):
    """
    Function to make backtranslation (from peptide to DNA)
    This function needs the peptide sequence and the code of
    translation table. Code number is the same as posted in:
    http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
    """
    def recurs(order, pos):
        for letter in bt[order[pos]]:
            if pos == len(order) - 1:
                yield letter
                continue
            for prox in recurs(order, pos+1):
                yield (letter + prox)

    def combine(order):
        ordened = set()
        for frase in recurs(order, 0):
            ordened.add(frase)
        return ordened

    t = CodonTable.generic_by_id[table_id]
    bt = dict()
    for a1 in "ATCG" :
        for a2 in "ATCG" :
            for a3 in "ATCG" :
                codon = a1 + a2 + a3
                try:
                    amino = t.forward_table[codon]
                except KeyError:
                    assert codon in t.stop_codons
                    continue
                try:
                    bt[amino].append(codon)
                except KeyError:
                    bt[amino] = [codon]
    return list(combine(ori_pep))

def seqcomp(s1, s2):
    """
    Compares 2 sequences and returns a value with
    how many differents elements they have.
    """
    p = len(s1)
    for x,y in zip(s1, s2): # Walk through 2 sequences.
        if x==y:
            p -= 1
    return p

args = parser.parse_args()
dna = Seq.Seq(args.input, IUPAC.unambiguous_dna)
# Translate DNA sequence.
ori_pep = dna.translate()
# Get all backtranslations.
bakpeps = backtrans(ori_pep, args.table_id)
# Make a restriction analysis for the orignal sequence.
analysis = Restriction.Analysis(Restriction.CommOnly, dna)
analysis.print_as('map')
ori_map = analysis.format_output()
# Store the enzymes that cut in the original sequence.
enz = list(analysis.with_sites().keys())
# Get a string out of the enzyme list, for printing.
oname = str(enz)[1:-1]
enz = set(enz)
bakpeps_out = []
for bakpep in bakpeps:
    tmp_d = {}
    if bakpep not in args.input:
        # Make a restriction analysis for each sequence.
        analysis = Restriction.Analysis(Restriction.CommOnly,
                Seq.Seq(bakpep, IUPAC.unambiguous_dna))
        # Store the enzymes that cut in this sequence.
        enz_tmp = list(analysis.with_sites().keys())
        enz_tmp = set(enz_tmp)
        # Get the number of mutations in backpep sequence.
        y = seqcomp(args.input, bakpep)
        if enz_tmp != enz and enz and y <= args.n_mut:
            analysis.print_as('map')
            tmp_d['pames'] = str(enz_tmp)[1:-1]
            tmp_d['graph'] = analysis.format_output()
            tmp_d['ori_seq'] = str(list(enz.difference(
                                        enz_tmp)))[1:-1]
            tmp_d['proposed_seq'] = str(list(
                        enz_tmp.difference(enz)))[1:-1]
            bakpeps_out.append(tmp_d)
with open('mutation.tpl') as fh:
    template = Template(fh.read())
    render = template.render(dna_input=args.input,
                             ori_pep=ori_pep,
                             ori_map=ori_map,
                             oname=oname,
                             bakpeps_out=bakpeps_out)
with open(args.output, 'w') as file_out:
    file_out.write(render)
```

### Correr el programa para introducir las mutaciones puntuales

Cuando ejecutamos desde la línea de comando:

{line-numbers=off}
```
$ python pointmutation.py ATGGGTAATTGCAACGGGGCATCCAAG
```

Obtenemos una salida como esta:

{line-numbers=off}
```
Original Sequence: ATGGGTAATTGCAACGGGGCATCCAAG
Peptide: MGNCNGASK

Restriction map for original sequence:
ORIGINAL SEQUENCE:

      7 FokI Tsp509I TspEI Sse9I
      |
      |    12 HpyCH4V CviRI
      |    |
      |    |       20 BseGI BstF5I
      |    |        |
      ATGGGTAATTGCAACGGGGCATCCAAG
      |||||||||||||||||||||||||||
      TACCCATTAACGTTGCCCCGTAGGTTC
      1                         27

      Enzymes which do not cut the sequence.

      AccII AciI  AfaI  AluI  AspLEI  BfaI ...
      BshFI BsiSI Bsp143I BspANI BstFNI BstHHI ...
      (...cut...)
      =========================
      Original sequence enzymes: BstF5I, Tsp509I, TspEI, FokI, <=
      HpyCH4V, BseGI, Sse9I, CviRI

Proposed sequence enzymes: FokI, BseGI, HpyCH4V, BstF5I, <=
CviRI, MaeIII

      5 MaeIII
      |
      |     7 FokI
      |     |
      |     |    12 HpyCH4V CviRI
      |     |    |
      |     |    |       20 BseGI BstF5I
      |     |    |       |
      ATGGGTAACTGCAACGGGGCATCCAAG
      |||||||||||||||||||||||||||
      TACCCATTGACGTTGCCCCGTAGGTTC
      1                           27

        Enzymes which do not cut the sequence.

      AccII AciI AfaI AluI AspLEI BshFI BfaI ...
      BsiSI Bsp143I BspANI BstFNI BstHHI ...
      (...cut...)

      Enzimes only in original sequence: TspEI, Sse9I, Tsp509I

      Enzimes only in proposed sequence: MaeIII
      =========================
      (...cut...)
```

La salida real es un archivo llamado `restriccion_output.txt`. Como todos los archivos mencionados en este libro está disponible en el repositorio de GitHub <https://github.com/ToyokoLabs/Py4Bio>.

## RECURSOS ADICIONALES

* [Roberts, R.J., Vincze, T., Posfai, J., Macelis, D. (2007). “REBASE–enzymes and genes for DNA restriction and modification.” Nucleic Acids Res. 35:D269-D270](http://rebase.neb.com/rebase/rebase.html)

* [Bickle TA, Kruger DH (June 1993). “Biology of DNA restriction.” Microbiol. Rev. 57 (2):434-50.](http://mmbr.asm.org/cgi/reprint/57/2/434?view=long&pmid=8336674)