caption: [Comparison of core features of packages offering access to the NCBI
taxonomic backbone. "Library": ability to be called from code. "CLI": ability to
work as a command-line tool. "Local DB": ability to store a copy of the database
locally. "Fuzzy": ability to perform fuzzy matching on inputs. "Case": ability
to perform case-insensitive search. "Subsets": ability to limit the search to a
subset of the raw database. "Ranks": ability to limit the search to specific
taxonomic ranks. The features of the various packages were determined from
reading their documentation.]
) <comparison>
An up-to-date version of the documentation for `NCBITaxonomy.jl` can be found in
the package's _GitHub_ repository (#link("https://github.com/PoisotLab/NCBITaxonomy.jl")[`PoisotLab/NCBITaxonomy.jl`]), including
examples and in-line documentation of every method. The package is released

Note that the use of a restricted list of names can have a large effect on search
performance, as illustrated in @benchmark[Tab.]. When possible, the optimal search strategy is to (i) rely on name filters to ensure that searches are conducted within the appropriate NCBI division, and (ii) only rely on fuzzy matching when the strict or lowercase match fails to return a name, as fuzzy matching can require orders of magnitude more run time and memory.
caption: [Run time and memory use of different search strategies for the string `"chimpanzees"`. These numbers were obtained on a single Intel i7-8665U CPU (1.90GHz). Using `"Pan"` as the search string (for which `"chimpanzees"` is a recognized vernacular name) gave qualitatively similar results, suggesting
that there is no performance cost associated with working with synonyms or
vernacular input data.]
) <benchmark>
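The search strategy recommended above works as follows. Names are first restricted to one division. Matching then tries a strict match, then a case-insensitive match, and uses fuzzy matching only as a last resort. The short, language-neutral sketch below illustrates this order. It is not the `NCBITaxonomy.jl` API. The toy name table, the `name_filter` and `resolve_name` helpers, and the use of Python's `difflib` similarity ratio as the fuzzy matcher are all hypothetical stand-ins chosen for illustration.

```python
import difflib

# Hypothetical toy name table mapping names to an NCBI division.
# It stands in for the real taxonomy and is NOT the NCBITaxonomy.jl API.
NAMES = {
    "Pan troglodytes": "Primates",
    "Pan paniscus": "Primates",
    "Lisarda": "Invertebrates",
    "Lepidosauria": "Vertebrates",
}


def name_filter(table, division):
    """Hypothetical helper: keep only the names belonging to one division."""
    return [name for name, div in table.items() if div == division]


def resolve_name(query, names, cutoff=0.8):
    """Return (name, strategy), trying the cheapest kind of match first."""
    # 1. Strict match: exact string equality.
    if query in names:
        return query, "strict"
    # 2. Case-insensitive match against lowercased names.
    lowered = {name.lower(): name for name in names}
    if query.lower() in lowered:
        return lowered[query.lower()], "lowercase"
    # 3. Fuzzy match (here difflib's similarity ratio), used only as a last
    #    resort because it scores the query against every candidate name.
    close = difflib.get_close_matches(query, names, n=1, cutoff=cutoff)
    if close:
        return close[0], "fuzzy"
    return None, "none"


primates = name_filter(NAMES, "Primates")
print(resolve_name("pan troglodytes", primates))  # ('Pan troglodytes', 'lowercase')
```

In this sketch, applying the division filter first shrinks the candidate list that the fuzzy step has to scan. It also prevents a query from matching a name in the wrong division. The `cutoff` value of 0.8 is an arbitrary choice.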
== Quality of life functions
To facilitate working with names, we provide the `authority` function

the initial code, TP and CJC contributed to API design, and all authors
contributed to functionality and usability testing.
This paper describes a Julia package for identifying and standardizing species
names in text, with the purpose of ensuring that species are not over-counted,
mis-identified, or misunderstood. While I have not been able to check the
software, I note that the code and its repository looks nicely engineered and
that it is in use in more than one system setup by the authors. While not a new
concept, the novelty with the software is its combination of features.

#response[We appreciate the feedback from the reviewer, and have addressed all of their comments in the revision.]

My general impression of the paper is that it describes useful software but that
the text needs work. It seems the abstract and background has been given far
less attention than the rest of the paper. I am confident that more readers will
be found if those parts are reworked.

#response[We have updated the abstract and clarified the background section of the
manuscript, notably in the last paragraph. We hope that this will help readers
understand the purpose of the package.
]

== Specific comments

=== Abstract
I don't think "the NCBI taxonomic backbone" is an established term and, as such, should not be used in the abstract. When googling, at least the top three hits are to the authors' own papers and a preprint of the present manuscript.

#response[Corrected as part of the changes to the abstract.]

I have been programming my whole life and I struggle with the following sentence: "The basic search functions are coupled with quality-of-life functions including case-insensitive search and custom fuzzy string matching to facilitate the amount of information that can be extracted automatically while allowing efficient manual curation and inspection of results." In particular, "quality-of-life functions" and "custom fuzzy string matching" is not helpful for anyone curious about your work.

#response[
Corrected as part of the changes to the abstract. We now provide a longer list of
functionalities.
]

Is relying on the Apache Arrow format a dependency or a feature? If it is an implementation detail, I would say it does not belong in the abstract.

#response[
Both. We have kept it in the abstract because it allows high-performance access to
the data.
]

The abstract does not speak to a broader public. What are the applications of
the software? The phrase "to facilitate the reconciliation and cleaning of
taxonomic names" probably only makes sense to a quite narrow audience.

#response[
We have clarified the list of common issues in taxonomic names that the
software is intended to correct.
]

=== Background
The first paragraph reads as a copy of the abstract.

#response[The first paragraph has been reworked.]

Please define "the NCBI taxonomic backbone" before use.

#response[We define taxonomic backbones in the first paragraph.]

"Unambiguously identifying species" should be "Unambiguously identifying species names in text".

#response[Thank you for the suggestion; this has been fixed.]

Avoid "presented below". Write "presented in Table 1" instead. You cannot assume the table in print ends up where you expect it.

#response[Fixed.]

I note that Table 1 has a column "Reference", which is good, but it is empty.

#response[We apologize for the omission; the column has been filled in.]

=== Language
Opening your submitted file in Word, I get spell and grammar warnings on quite trivial mistakes, for example "occuring", "litterature", "to the point were", and more.

#response[These have been fixed.]

I also note simple mistakes that are hard for Word to notice: "a string of character"

#response[These have been fixed.]

=== Code and code access
I am not a Julia user, but from a general (programming language agnostic) standpoint it looks like well-structured code.

#response[Thank you.]

The Zenodo page is either not existing or it is not accessible to the public.

#response[There was an issue with the link; it has been fixed in the revision.]

The GitHub repository is acessible. It is also setup for and invites for collaboration. There are no instructions for how to install and get started, from what I can find. How much of a Julia user does one need to try this package out? It would be nice with some basic install and get-started instructions. That is extra work I do not want to demand, but it would certainly help with "pickup" of users. For example, I have text would be curious to test your package on, so what would I do?

#response[Installation instructions have been added to the README, and more detailed "getting started" instructions are in the documentation.]

It does not strike me as important to have details about error handling in the article. It is good programming and it should be boasted as a feature, but such programming details belongs in the package documentation (or README if you want to make it more public), in my humble opinion.

#response[We feel strongly that keeping this code snippet in the text is important, as it will help users adopt it as a basis for building pipelines that use the error-catching system.]

= Reviewer 2
Poisot and colleagues present their software package aimed at making taxonomic classifications/searches more efficient on a local copy of the NCBI database. This appears to be a useful tool that the bioinformatics community will appreciate.

== Minor editorial comments:
P3: improve what? Maybe ‘improve classifications such as conservation outcomes’.

#response[Clarified as part of the changes to the background section.]

P4: should be ‘string of characters’.

#response[Fixed.]

P4: literature is misspelled.

#response[Fixed.]

P4: there are ‘(ii)’s, one of them should be ‘(iii)’.

#response[Fixed.]

P5: what is a ‘raxonomi’?

#response[Fixed.]

P6: ‘the the’ is incorrect.

#response[Fixed.]

P7: ‘possible’ should be ‘possibly’.

#response[Fixed.]

P7: ‘table has’ should be ‘table currently has’.

#response[Not changed: "at the time of writing" is specified immediately afterwards in the same sentence.]

P7: omit ‘at the time of writing’.

#response[Not changed.]

P7: replace ‘search faster’ with ‘searches faster’.