Commit 76d7d52

add missing citation and finish loose end in NB02 markdown [ci skip]
1 parent c218db4 commit 76d7d52

2 files changed: +12 -1 lines changed

manuscript/literature.bib

Lines changed: 11 additions & 0 deletions
@@ -642,3 +642,14 @@ @article{plattner_complete_2017
  year = {2017},
  pages = {1005}
 }
+@inproceedings{aggarwal_surprising_2001,
+ series = {Lecture {Notes} in {Computer} {Science}},
+ title = {On the {Surprising} {Behavior} of {Distance} {Metrics} in {High} {Dimensional} {Space}},
+ isbn = {978-3-540-44503-6},
+ booktitle = {Database {Theory} — {ICDT} 2001},
+ publisher = {Springer Berlin Heidelberg},
+ author = {Aggarwal, Charu C. and Hinneburg, Alexander and Keim, Daniel A.},
+ editor = {Van den Bussche, Jan and Vianu, Victor},
+ year = {2001},
+ pages = {420--434},
+}

notebooks/02-dimension-reduction-and-discretization.ipynb

Lines changed: 1 addition & 1 deletion
@@ -760,7 +760,7 @@
 "\n",
 "## Case 3: another molecular dynamics data set (pentapeptide)\n",
 "\n",
-"Before we start to load and discretize the pentapeptide data set, let us discuss what the difficulties with larger protein systems are. The goal of this notebook is to find a state space discretization for MSM estimation. This means that an algorithm such as $k$-means has to be able to find a meaningful state space partitioning. In general, this works better in lower dimensional spaces. The modeler should be aware that a discretization of hundreds of dimensions will most likely yield unsatisfactory results due to the . \n",
+"Before we start to load and discretize the pentapeptide data set, let us discuss what the difficulties with larger protein systems are. The goal of this notebook is to find a state space discretization for MSM estimation. This means that an algorithm such as $k$-means has to be able to find a meaningful state space partitioning. In general, this works better in lower dimensional spaces because Euclidean distances become less meaningful with increasing dimensionality <a id=\"ref-4\" href=\"#cite-aggarwal_surprising_2001\">aggarwal-01</a>. The modeler should be aware that a discretization of hundreds of dimensions will be computationally expensive and most likely yield unsatisfactory results. \n",
 "\n",
 "The first goal is thus to map the data to a reasonable number of dimensions, e.g. with a smart choice of features and/or by using TICA. Large systems often require significant parts of the kinetic variance to be discarded in order to obtain a balance between capturing as much of the kinetic variance as possible and achieving a reasonable discretization.\n",
 "\n",

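Note on the added sentence: the claim that Euclidean distances become less meaningful with increasing dimensionality can be illustrated with a small sketch. This is not part of the commit or the notebook. The function name `relative_contrast`, the uniform random data, and the chosen dimensions are assumptions made purely for illustration. The sketch measures how much the farthest point differs from the nearest one, relative to the nearest distance; when this ratio shrinks, a distance-based method such as $k$-means has less information to separate points with.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances between
    one random query point and n_points uniform random points in [0, 1]^dim.

    Hypothetical helper for illustration only; not taken from the notebook.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # requires Python 3.8+
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


# Compare a low-dimensional space with increasingly high-dimensional ones.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

For uniform random data the printed ratio should drop markedly as `dim` grows, which is the effect the cited reference describes. Real molecular dynamics features are correlated, so the numbers for an actual data set may differ.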