Commit d04e2af

Merge pull request #158 from tiagofilipe12/backend_1.6.0
Backend 1.6.0
2 parents fc1d59d + 878da31 commit d04e2af

24 files changed, +287 -122 lines changed

changelog.md

Lines changed: 12 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -1,8 +1,14 @@
11
# Changelog
22

3-
## Upcoming version (1.6.0)
3+
## Version 1.6.0
44

5-
### Database
5+
### Back end
6+
- Added code that saves a list of all the accession numbers to a file,
7+
so that future changes to the database can be easily documented.
8+
- Added black list for accession numbers that are reported to be misplaced as
9+
plasmids in refseq database.
10+
- Updated database to NCBI refseq 091418.
11+
- Added first implementation to parse result from plasmidfinder new database.
612

713
### Front end
814

@@ -30,6 +36,10 @@ multiple nodes.
3036
modals.
3137
- Implemented highlight and filter for all node selections (taxa, resistances,
3238
plasmid families, virulence and combined selections).
39+
- Added faq on how to report sequences that aren't plasmids.
40+
- Removed histogram from length plot.
41+
- Added new button that allows users to more easily report a sequence, by using
42+
github api for pre-filled issues.
3343

3444
#### Bug fixes
3545
- Fixed minor issues after filtering datasets for link selections and for shift

docs/Sidebar.md

Lines changed: 27 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -41,7 +41,33 @@ that display pATLAS key features.
4141

4242
#### FAQs
4343

44-
A set of questions that may occur to users when using pATLAS
44+
A set of questions that may occur to users when using pATLAS.
45+
46+
#### Report sequence.
47+
48+
Here users can report any problem that they find with a plasmid available in
49+
pATLAS, for instance when an entry is a gene sequence rather than a plasmid. This
50+
is part of the _crowd curation_ of pATLAS, which is an initiative that aims to
51+
ease the curation of the database by users that find an issue with a RefSeq
52+
sequence that shouldn't be a plasmid. pATLAS already has some filters that
53+
prevent non-plasmid sequences from getting into the database, however
54+
in each database update it is expected that new issues may arise, thus we
55+
acknowledge every contribution to help us curate the database.
56+
57+
When clicked, this button will open a pre-formatted GitHub issue like the one below:
58+
59+
![](gitbook/images/gh_issue.png)
60+
61+
The user reporting will have to replace `<brief title of the issue>` with the
62+
desired title and then the body of the issue is already pre-filled with
63+
two headers:
64+
65+
* `Sequences accessions` - In which the users should state the accession numbers
66+
of the sequences that have issues.
67+
68+
* `Description of the issue` - Here the users should state the reason why
69+
the sequence should be removed/curated from the pATLAS plasmid database.
70+
4571

4672
## About
4773
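
The pre-filled issue described in the Sidebar docs above can be approximated with the `title` and `body` query parameters that GitHub accepts on a repository's new-issue URL. What follows is a minimal Python sketch of that idea, not the pATLAS front-end code (which is JavaScript and is not part of this diff). The names `prefilled_issue_url` and `ISSUE_BODY`, the Markdown heading level in the body, and the `example-owner`/`example-repo` placeholders are all assumptions made for illustration.

```python
from urllib.parse import quote, urlencode

# Body template with the two headers named in the docs.
# The "###" heading level is an assumption, not taken from the source.
ISSUE_BODY = (
    "### Sequences accessions\n\n\n"
    "### Description of the issue\n\n"
)


def prefilled_issue_url(owner, repo, title, body=ISSUE_BODY):
    """Build a new-issue URL whose title and body are already filled in."""
    # quote_via=quote encodes spaces as %20 rather than "+".
    query = urlencode({"title": title, "body": body}, quote_via=quote)
    return "https://github.com/{}/{}/issues/new?{}".format(owner, repo, query)


if __name__ == "__main__":
    # Hypothetical repository names, used only for the example.
    print(prefilled_issue_url("example-owner", "example-repo",
                              "<brief title of the issue>"))
```

Opening the resulting URL in a browser shows the issue form with the placeholder title and both headers in place; the reporter still has to submit it manually.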

patlas/MASHix.py

Lines changed: 42 additions & 27 deletions
Original file line number | Diff line number | Diff line change
@@ -1,10 +1,10 @@
11
#!/usr/bin/env python3
22

3-
## Last update: 11/6/2018
4-
## Author: T.F. Jesus
5-
## This script runs MASH in plasmid databases making a pairwise diagonal matrix
6-
## for each pairwise comparison between libraries
7-
## Note: each header in fasta is considered a reference
3+
# Last update: 14/8/2018
4+
# Author: T.F. Jesus
5+
# This script runs MASH in plasmid databases making a pairwise diagonal matrix
6+
# for each pairwise comparison between libraries
7+
# Note: each header in fasta is considered a reference
88

99
import argparse
1010
import sys
@@ -21,16 +21,17 @@
2121
try:
2222
from utils.hist_util import plot_histogram
2323
from utils.taxa_fetch import executor
24+
from utils.crowd_curation import black_list
2425
from db_manager.db_app import db, models
2526
except ImportError:
2627
from patlas.utils.hist_util import plot_histogram
2728
from patlas.utils.taxa_fetch import executor
29+
from patlas.utils.crowd_curation import black_list
2830
from patlas.db_manager.db_app import db, models
2931

30-
# This is a rather sketchy solution TODO remove this with a refactor of node_crawler
32+
# TODO This is a rather sketchy solution, remove this with a refactor of node_crawler
3133
sys.setrecursionlimit(10000)
3234

33-
3435
class Record:
3536

3637
def __init__(self, accession, size, distance, percentage_hashes):
@@ -103,7 +104,6 @@ def output_tree(infile, tag):
103104
104105
"""
105106

106-
107107
mother_directory = os.path.join(os.path.dirname(os.path.abspath(infile)),
108108
tag)
109109
dirs = ["", "tmp", "results", "reference_sketch", "genome_sketchs",
@@ -233,6 +233,10 @@ def master_fasta(fastas, output_tag, mother_directory):
233233

234234
species_output = open(species_out, "w")
235235

236+
# creates a list of accession numbers, listing all accessions in
237+
# input sequences
238+
all_accessions = []
239+
236240
# sets first length instance
237241
length = 0
238242
accession = False
@@ -306,18 +310,21 @@ def master_fasta(fastas, output_tag, mother_directory):
306310
plasmid_name = search_substing(line)
307311
# species related functions
308312
all_species.append(" ".join(species.split("_")))
313+
# append accession that will be written to file
314+
all_accessions.append(accession)
309315

310316
# added this if statement to check whether CDS is present in
311317
# fasta header, since database contain them with CDS in string
312318
if "cds" in line.lower() and line.lower().count("cds") <= 1 \
313319
and "plasmid" not in line.lower():
314320
truePlasmid = False
315321
reason = "cds"
316-
#continue
317322
elif "origin" in line.lower():
318323
truePlasmid = False
319324
reason = "origin"
320-
#continue
325+
elif accession in black_list:
326+
truePlasmid = False
327+
reason = black_list[accession]
321328
else:
322329
truePlasmid = True
323330

@@ -362,6 +369,14 @@ def master_fasta(fastas, output_tag, mother_directory):
362369
# writes a species list to output file
363370
species_output.write("\n".join(str(i) for i in list(set(all_species))))
364371
species_output.close()
372+
373+
# write accessions to a file
374+
accession_out = os.path.join(mother_directory,
375+
"accessions_list_{}.lst".format(output_tag))
376+
with open(accession_out, "w") as fh:
377+
fh.write("version: 1.5.2\n")
378+
fh.write("\n".join(all_accessions))
379+
365380
return out_file, sequence_info, all_species
366381

367382

@@ -873,7 +888,8 @@ def main():
873888

874889
mash_options = parser.add_argument_group("MASH related options")
875890
mash_options.add_argument("-k", "--kmers", dest="kmer_size", default="21",
876-
help="Provide the number of k-mers to be provided to mash "
891+
help="Provide the number of k-mers to be provided"
892+
" to mash "
877893
"sketch. Default: 21.")
878894
mash_options.add_argument("-p", "--pvalue", dest="pvalue",
879895
default="0.05", help="Provide the p-value to "
@@ -888,8 +904,8 @@ def main():
888904
other_options = parser.add_argument_group("Other options")
889905
other_options.add_argument("-rm", "--remove", dest="remove",
890906
action="store_true", help="Remove any temporary "
891-
"files and folders not "
892-
"needed (not present "
907+
"files and folders not"
908+
" needed (not present "
893909
"in results "
894910
"subdirectory).")
895911
other_options.add_argument("-hist", "--histograms", dest="histograms",
@@ -911,8 +927,8 @@ def main():
911927
help="this option allows to only run the part "
912928
"of the script that is required to "
913929
"generate the filtered fasta. Allowing for "
914-
"instance to debug sequences that shoudn't "
915-
"be removed using 'cds' and 'origin' "
930+
"instance to debug sequences that shouldn't"
931+
" be removed using 'cds' and 'origin' "
916932
"keywords")
917933

918934
args = parser.parse_args()
@@ -929,23 +945,22 @@ def main():
929945
names_file = args.names_file
930946
nodes_file = args.nodes_file
931947

932-
## lists all fastas given to argparser
948+
# lists all fastas given to argparser
933949
fastas = [f for f in args.inputfile if f.endswith((".fas", ".fasta",
934950
".fna", ".fsa", ".fa"))]
935951

936-
## creates output directory tree
937-
output_tag = args.output_tag.replace("/", "") ## if the user gives and
952+
# creates output directory tree
953+
output_tag = args.output_tag.replace("/", "") # if the user gives and
938954
# input tag that is already a folder
939955
mother_directory = output_tree(fastas[0], output_tag)
940956

941-
## checks if multiple fastas are provided or not avoiding master_fasta
957+
# checks if multiple fastas are provided or not avoiding master_fasta
942958
# function
943959
print("***********************************")
944960
print("Creating main database...\n")
945961
main_fasta, sequence_info, all_species = master_fasta(fastas, output_tag,
946962
mother_directory)
947963

948-
949964
# if the parameter sequences_to_remove is provided the script will only
950965
# generate the fasta files and a list of the sequences that were removed
951966
# from ncbi refseq original fasta.
@@ -954,11 +969,11 @@ def main():
954969
"Leaving script...")
955970
sys.exit(0)
956971

957-
#########################
958-
### genera block here ###
959-
#########################
972+
#####################
973+
# genera block here #
974+
#####################
960975

961-
## runs mash related functions
976+
# runs mash related functions
962977
print("***********************************")
963978
print("Sketching reference...\n")
964979
ref_sketch = sketch_references(main_fasta, output_tag, threads, kmer_size,
@@ -969,7 +984,7 @@ def main():
969984
print("Making temporary files for each genome in fasta...\n")
970985
genomes = genomes_parser(main_fasta, mother_directory)
971986

972-
## This must be multiprocessed since it is extremely fast to do mash
987+
# This must be multiprocessed since it is extremely fast to do mash
973988
# against one plasmid sequence
974989
print("***********************************")
975990
print("Sketching genomes and running mash distances...\n")
@@ -979,7 +994,7 @@ def main():
979994
output_tag, kmer_size, mother_directory),
980995
genomes) # process genomes iterable with pool
981996

982-
## loop to print a nice progress bar
997+
# loop to print a nice progress bar
983998
try:
984999
for _ in tqdm.tqdm(mp, total=len(genomes)):
9851000
pass
@@ -991,7 +1006,7 @@ def main():
9911006
# remaining options are triggered
9921007
print("\nFinished MASH... uf uf uf!")
9931008

994-
## Makes distances matrix csv file
1009+
# Makes distances matrix csv file
9951010
print("\n***********************************")
9961011
print("Creating distance matrix...\n")
9971012
lists_traces = mash_distance_matrix(mother_directory, sequence_info,
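
The two back-end changes in `master_fasta` (the `black_list` check and the accession list file) can be read in isolation with the sketch below. It assumes that `black_list` is a dictionary mapping an accession to the reason it was reported, which is what `reason = black_list[accession]` in the diff suggests; the contents of `utils/crowd_curation.py` are not shown here. The function names `classify_header` and `write_accessions`, the `version` parameter, and the accession used in the example are hypothetical.

```python
import os


def classify_header(header, accession, black_list):
    """Return (is_plasmid, reason) following the order of checks in the diff."""
    lowered = header.lower()
    # A single "cds" mention without "plasmid" marks a gene entry.
    if "cds" in lowered and lowered.count("cds") <= 1 \
            and "plasmid" not in lowered:
        return False, "cds"
    if "origin" in lowered:
        return False, "origin"
    # Crowd-curated accessions carry their own removal reason.
    if accession in black_list:
        return False, black_list[accession]
    return True, None


def write_accessions(accessions, directory, output_tag, version):
    """Write a version line followed by one accession per line."""
    path = os.path.join(directory,
                        "accessions_list_{}.lst".format(output_tag))
    with open(path, "w") as fh:
        fh.write("version: {}\n".format(version))
        fh.write("\n".join(accessions))
    return path


if __name__ == "__main__":
    # Hypothetical accession and reason, for illustration only.
    example_black_list = {"XX_000001.1": "reported as a gene sequence"}
    print(classify_header(">XX_000001.1 plasmid pExample",
                          "XX_000001.1", example_black_list))
```

Passing the version as an argument, rather than hard-coding it in the `write` call as the diff does, is a design choice of this sketch that keeps the header line in step with the release being built.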

patlas/db_manager/db_app/resources.py

Lines changed: 3 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -85,7 +85,6 @@ class GetResistances(Resource):
8585
def post(self):
8686
var_response = request.form["accession"].replace("[", "")\
8787
.replace("]", "").replace('"', "").split(",")
88-
print(var_response)
8988
single_query = db.session.query(Card).filter(
9089
Card.plasmid_id.in_(var_response)).all()
9190
return single_query
@@ -149,13 +148,15 @@ def get(self):
149148
args = req_parser.parse_args()
150149
# This queries name object in json_entry and retrieves an array with
151150
# all objects that matched the args (json_entry, plasmid_id)
151+
parsed_gene = args.gene.replace('"', '') # TODO parser for new plasmidfinder db
152152
records = db.session.query(Database).filter(
153-
Database.json_entry["gene"].astext.contains(args.gene)
153+
Database.json_entry["gene"].astext.contains(parsed_gene)
154154
).all()
155155
# contains method allows us to query in array that is converted to a
156156
# string
157157
return records
158158

159+
159160
class GetAccessionVir(Resource):
160161
@marshal_with(card_field)
161162
def get(self):

patlas/db_manager/db_app/static/js/download/abrPlusFamilies.js

Lines changed: 29 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -390,12 +390,37 @@ const plasmidFamilyGetter = (nodeId) => {
390390

391391
try{
392392
// totalLength array corresponds to gene names
393-
const totalLength = data[0].json_entry.gene.replace(/['u\[\] ]/g, "").split(",")
394-
const accessionList = data[0].json_entry.accession.replace(/['u\[\] ]/g, "").split(",")
393+
let totalLength = data[0].json_entry.gene.replace(/['u\[\] ]/g, "").split(",")
394+
let accessionList = data[0].json_entry.accession.replace(/['u\[\] ]/g, "").split(",")
395395
const coverageList = data[0].json_entry.coverage.replace(/['u\[\] ]/g, "").split(",")
396396
const identityList = data[0].json_entry.identity.replace(/['u\[\] ]/g, "").split(",")
397397
const rangeList = data[0].json_entry.seq_range.replace("[[", "[").replace("]]", "]").split("],")
398398

399+
// TODO parser required for new plasmidfinder db
400+
if (accessionList[0] === "") {
401+
402+
accessionList = totalLength.map( (el) => {
403+
const length_split = el.split("_")
404+
if (length_split.indexOf("NC") > 0) {
405+
return length_split.slice(length_split.length - 2).join("_")
406+
} else {
407+
return length_split.slice(length_split.length - 1).join("_")
408+
}
409+
})
410+
411+
totalLength = totalLength.map( (el) => {
412+
const length_split = el.split("_")
413+
// check if there is a NC
414+
if (length_split.indexOf("NC") > 0) {
415+
return length_split.slice(0, length_split.length - 2).join("_").replace(/\_$/, "")
416+
} else {
417+
return length_split.slice(0, length_split.length - 1).join("_").replace(/\_$/, "")
418+
}
419+
})
420+
421+
422+
}
423+
399424
for (const i in totalLength) {
400425
if ({}.hasOwnProperty.call(totalLength, i)) {
401426

@@ -405,7 +430,7 @@ const plasmidFamilyGetter = (nodeId) => {
405430

406431
queryArrayPFRange.push( {
407432
"range": rangeEntry,
408-
"genes": customTrim(totalLenght[i], "'"),
433+
"genes": customTrim(totalLength[i], "'"),
409434
"accessions": makeItClickable(accessionList[i].split(":")[0]),
410435
"coverage": coverageList[i],
411436
"identity": identityList[i]
@@ -510,7 +535,7 @@ const virulenceGetter = (nodeId) => {
510535
queryArrayVirRange.push(
511536
{
512537
"range": rangeEntry,
513-
"genes": customTrim(totalLenght[i], "'"),
538+
"genes": customTrim(totalLength[i], "'"),
514539
"accessions": makeItClickable(accessionList[i].split(":")[0]),
515540
"coverage": coverageList[i],
516541
"identity": identityList[i]
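
The fallback parser added to `plasmidFamilyGetter` splits each gene string on `_` and treats the last token as the accession, or the last two tokens when an `NC` token is present after the first position. The Python sketch below re-expresses that logic so the two `map` calls can be checked side by side. It is a port for illustration, not code from the repository; `split_gene_entry` is a hypothetical name and the gene strings in the example are made-up inputs in the shape the JavaScript appears to expect.

```python
def split_gene_entry(entry):
    """Split 'gene_..._accession' into (gene, accession)."""
    parts = entry.split("_")
    # Mirrors `indexOf("NC") > 0`: an "NC" token anywhere but first
    # means the accession spans the last two tokens (e.g. NC_000001).
    n_acc = 2 if "NC" in parts[1:] else 1
    accession = "_".join(parts[-n_acc:])
    gene = "_".join(parts[:-n_acc])
    # Mirrors `.replace(/\_$/, "")`: drop one trailing underscore.
    if gene.endswith("_"):
        gene = gene[:-1]
    return gene, accession


if __name__ == "__main__":
    # Hypothetical entries, for illustration only.
    for raw in ("FamA_1_pExample_AB000001", "FamB_1__NC_000001"):
        print(raw, "->", split_gene_entry(raw))
```

Returning the gene and accession from one function avoids computing the same split twice, which the JavaScript does because it fills `accessionList` and `totalLength` in separate passes.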

patlas/db_manager/db_app/static/js/dropdowns/dropdownPopulation.js

Lines changed: 16 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -95,11 +95,22 @@ getArrayPf().done((json) => {
9595
// iterate over the file
9696
$.each(json, (accession, entry) => {
9797
const geneEntries = entry.gene
98-
for (let i in geneEntries) {
99-
if (geneEntries.hasOwnProperty(i)) {
100-
if (listPF.indexOf(geneEntries[i]) < 0) {
101-
listPF.push(geneEntries[i])
102-
}
98+
for (let i of geneEntries) {
99+
100+
//TODO this should be removed once plasmidfinder abricate is used - listPF.push(i), everything else should be ignored
101+
const length_split = i.split("_")
102+
const parsed_i = length_split
103+
.slice(0, length_split.length - 1)
104+
.join("_")
105+
// replace every _NC in the end
106+
.replace(/\_NC$/, "")
107+
// then remove _ in the end of the plasmidfinder gene name
108+
.replace(/\_$/, "")
109+
110+
// checks if entry is already in listPF and if so doesn't populate the
111+
// dropdown.
112+
if (listPF.indexOf(parsed_i) < 0) {
113+
listPF.push(parsed_i)
103114
}
104115
}
105116
})

patlas/db_manager/db_app/static/js/input_file_handling/dropdownPopulation.js

Lines changed: 5 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -1,4 +1,5 @@
1-
/*globals colorList, listGiFilter, colorNodes, legendInst, typeOfProject */
1+
/*globals colorList, listGiFilter, colorNodes, legendInst, typeOfProject,
2+
blockFilterModal */
23

34
// function to remove first char from every string in array
45
const removeFirstCharFromArray = (arr) => {
@@ -202,7 +203,9 @@ const pfSubmitFunction = async (g, graphics, renderer, tempPageReRun) => {
202203
// now processes the current selection
203204
const pfQuery = document.getElementById("p_PlasmidFinder").innerHTML
204205

205-
let selectedPf = pfQuery.replace("PlasmidFinder:", "").split(",").filter(Boolean)
206+
let selectedPf = pfQuery.replace("PlasmidFinder:", "")
207+
.split(",")
208+
.filter(Boolean)
206209

207210
selectedPf = removeFirstCharFromArray(selectedPf)
208211

patlas/db_manager/db_app/static/js/node_handling/advancedFilters.js

Lines changed: 2 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,4 +1,5 @@
1-
/*globals speciesRequest, taxaRequest, resRequest, pfRequest, virRequest, listGiFilter, colorNodes, selectedFilter */
1+
/*globals speciesRequest, taxaRequest, resRequest, pfRequest, virRequest,
2+
listGiFilter, colorNodes, selectedFilter, blockFilterModal*/
23

34
/**
45
* Function to calculate intersection between arrays. Note that this function
