Commit 7588668

abstract
1 parent d5ec7c5 commit 7588668

1 file changed

+20
-1
lines changed

thesis/index.Rmd

Lines changed: 20 additions & 1 deletion
@@ -31,7 +31,26 @@ acknowledgments: |
   Lab colleagues, present and past (Michael, Likit, Jiarong, ...)

 abstract: |
-  "TODO TODO TODO"
+  Biology as a field is being transformed by the increasing availability of data, especially genomic sequencing data.
+  Computational methods that can adapt to and take advantage of this data deluge are essential
+  for exploring and providing insights for new hypotheses,
+  helping to unveil biological processes that were expensive or even impossible to study before.
+
+  This dissertation introduces data structures and approaches for scaling data
+  analysis to hundreds of thousands of datasets using _Scaled MinHash_ sketches,
+  a reduced-space representation of the original datasets that can lower computational
+  requirements for similarity and containment estimation;
+  _MHBT_ and _LCA_ indices,
+  structures for indexing and searching large collections of _Scaled MinHash_ sketches;
+  `gather`, a new top-down approach for decomposing datasets into a collection of
+  reference components that can be implemented efficiently with _Scaled MinHash_
+  sketches and _MHBT_ and _LCA_ indices;
+  `wort`,
+  a distributed system for large-scale sketch computation across heterogeneous systems,
+  from laptops to academic clusters and cloud instances,
+  including prototypes for containment searches across millions of datasets;
+  as well as explorations of how to facilitate sharing and increase the
+  resilience of sketch collections built from public genomic data.

 # End of UCD thesis fields

 knit: "bookdown::render_book"
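
For readers unfamiliar with the terms in the new abstract, the following is a minimal illustrative sketch, not the dissertation's implementation. It assumes a Scaled MinHash sketch keeps every k-mer hash that falls below `2**64 / scaled`, that containment is estimated as the fraction of query hashes also present in a reference, and that a `gather`-style decomposition greedily removes the best-matching reference from the query until nothing meaningful remains. All function names and parameters here are hypothetical, and `hashlib.blake2b` is only a stand-in 64-bit hash.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer in seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (blake2b is a stand-in hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def scaled_minhash(seq, k=21, scaled=1000):
    """Keep only hashes below 2**64 / scaled (roughly 1/scaled of all k-mers)."""
    max_hash = 2**64 // scaled
    return {h for h in map(hash_kmer, kmers(seq, k)) if h < max_hash}


def containment(query, reference):
    """Estimate the fraction of the query that is present in the reference."""
    if not query:
        return 0.0
    return len(query & reference) / len(query)


def gather(query, references, threshold=1):
    """Greedy top-down decomposition of a query sketch.

    references maps a name to a sketch (a set of hashes). Each round picks
    the reference sharing the most hashes with what is left of the query,
    records it, and subtracts its hashes before the next round.
    """
    remaining = set(query)
    results = []
    while remaining:
        name, overlap = max(
            ((n, len(remaining & s)) for n, s in references.items()),
            key=lambda item: item[1],
            default=(None, 0),
        )
        if overlap < threshold:
            break
        results.append((name, overlap))
        remaining -= references[name]
    return results
```

Because every sketch uses the same hash function and the same cutoff, the sketch of a subsequence is always a subset of the sketch of the full sequence, which is what lets containment be estimated directly on the sketches.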