@@ -31,7 +31,26 @@ acknowledgments: |
3131 Lab colleagues, present and past (Michael, Likit, Jiarong, ...)
3232
3333abstract : |
34- "TODO TODO TODO"
34+ Biology as a field is being transformed by the increasing availability of data, especially genomic sequencing data.
35+ Computational methods that can adapt to and take advantage of this data deluge are essential
36+ for exploring data and providing insights for new hypotheses,
37+ helping to unveil biological processes that were expensive or even impossible to study before.
38+
39+ This dissertation introduces data structures and approaches for scaling data
40+ analysis to hundreds of thousands of datasets using _Scaled MinHash_ sketches,
41+ a reduced space representation of the original datasets that can lower computational
42+ requirements for similarity and containment estimation;
43+ _MHBT_ and _LCA_ indices,
44+ structures for indexing and searching large collections of _Scaled MinHash_ sketches;
45+ `gather`, a new top-down approach for decomposing datasets into a collection of
46+ reference components that can be implemented efficiently with _Scaled MinHash_
47+ sketches and _MHBT_ and _LCA_ indices;
48+ `wort`,
49+ a distributed system for large-scale sketch computation across heterogeneous systems,
50+ from laptops to academic clusters and cloud instances,
51+ including prototypes for containment searches across millions of datasets;
52+ as well as explorations of how to facilitate sharing and increase the
53+ resilience of sketch collections built from public genomic data.
3554
3655# End of UCD thesis fields
3756knit : " bookdown::render_book"