Commit ddfdedd

committed
tables
1 parent 224178b commit ddfdedd

1 file changed

+25
-20
lines changed

manuscript.typ

Lines changed: 25 additions & 20 deletions
@@ -322,28 +322,33 @@ not an issue with _Pan_, but is an issue with _e.g._ _Io_ as mentioned in the
 introduction, or with the common name _Lizard_, which fuzzy-matches on the
 hemipteran genus _Lisarda_ rather than the class _Lepidosauria_).

+
 Note that the use of a restricted list of names can have significant performance
-consequences: compare, for example, the time taken to return the taxon _Pan_ in
-the entire database, in all mammals, and in all primates:
-
-| Names list | Fuzzy matching | Time (ms) | Allocations | Memory allocated |
-| -------------------- | :------------: | --------- | ----------- | ---------------- |
-| all | no | 23 | 34 | 2 KiB |
-| | yes | 105 | 2580 | 25 MiB |
-| `mammalfilter(true)` | no | 0.55 | 32 | 2 KiB |
-| | yes | 1.9 | 551 | 286 KiB |
-| `primatefilter()` | no | 0.15 | 33 | 2 KiB |
-| | yes | 0.3 | 92 | 27 KiB |
-
-Clearly, the optimal search strategy is to (i) rely on name filters to ensure
-that searches are conducted within the appropriate NCBI division, and (ii) only
-rely on fuzzy matching when the strict or lowercase match fails to return a
-name, as fuzzy matching can result in order of magnitude more run time and
-memory footprint. These numbers were obtained on a single Intel i7-8665U CPU (@
-(1.90GHz). Using `"chimpanzees"` as the search string (one of the NCBI
-recognized vernaculars for _Pan_) gave qualitatively similar results, suggesting
+consequences. This is illustrated in @benchmark[Tab.]. When possible, the optimal search strategy is to (i) rely on name filters to ensure that searches are conducted within the appropriate NCBI division, and (ii) only rely on fuzzy matching when the strict or lowercase match fails to return a name, as fuzzy matching can result in order of magnitude more run time and memory footprint.
+
+
+#figure(
+placement: bottom,
+table(
+columns: 5,
+table.header(
+[Names list],
+[Fuzzy matching],
+[Time (ms)],
+[Allocations],
+[Memory footprint],
+),
+[all], [no], [23], [34], [2 KiB],
+[], [yes], [105], [2580], [25 MiB],
+[`mammalfilter(true)`], [no], [0.55], [32], [2 KiB],
+[], [yes], [1.9], [551], [286 KiB],
+[`primatefilter(true)`], [no], [0.15], [33], [2 KiB],
+[], [yes], [0.3], [92], [27 KiB],
+),
+caption: [Time and performance of different search strategies for the string `"chimpanzees"`. These numbers were obtained on a single Intel i7-8665U CPU (1.90GHz). Using `"Pan"` as the search string (for which `"chimpanzees"`is a recognized vernacular) gave qualitatively similar results, suggesting
 that there is no performance cost associated with working with synonyms or
-verncular input data.
+verncular input data.]
+) <benchmark>

 == Quality of life functions

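The search strategy described in the paragraph added by this commit (restrict the names list first, then fall back to fuzzy matching only when the strict and lowercase matches fail) can be sketched as follows. This is a minimal illustration in Python using the standard library's `difflib`, not the package's implementation: the `NAMES` table and the `restrict` and `find_name` helpers are hypothetical names invented for the example, and the group labels are toy stand-ins for NCBI divisions.

```python
from difflib import get_close_matches

# Hypothetical toy names list (name -> group); the real package searches
# the NCBI names table, which is far larger.
NAMES = {
    "Pan": "primates",
    "Homo": "primates",
    "Canis": "mammals",
    "Lisarda": "insects",
    "Lepidosauria": "reptiles",
}


def restrict(names, group):
    """Return only the names in one group (a stand-in for a name filter)."""
    return [name for name, g in names.items() if g == group]


def find_name(query, names, cutoff=0.6):
    """Return (matched_name, strategy), trying the cheapest strategy first."""
    # 1. Strict match: a plain membership test.
    if query in names:
        return query, "strict"
    # 2. Lowercase match: still cheap, one pass over the list.
    lowered = {name.lower(): name for name in names}
    if query.lower() in lowered:
        return lowered[query.lower()], "lowercase"
    # 3. Fuzzy match: only reached when both cheaper strategies fail,
    #    because it scores the query against every candidate.
    close = get_close_matches(query, names, n=1, cutoff=cutoff)
    if close:
        return close[0], "fuzzy"
    return None, "none"


everything = list(NAMES)
print(find_name("Pan", everything))
print(find_name("pan", everything))
print(find_name("Lizard", everything))
print(find_name("Lizard", restrict(NAMES, "reptiles")))
```

With the full toy list, `"Lizard"` falls through to the fuzzy step and lands on `Lisarda`, mirroring the mismatch discussed in the manuscript. Restricting the list to the reptile group removes that false match (here no candidate clears the similarity cutoff), and the fuzzy step is only ever reached after the cheaper matches fail.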