@@ -150,7 +150,6 @@ \section*{Introduction}
should both update our characterizations of ``what is known'' and also
unlock any now-satisfied dependencies of those newly learned concepts so that
they are ``tagged'' as available for future learning.
-% TODO: is the second half of this paragraph relevant to this paper??

Here we develop a framework for modeling how conceptual knowledge is acquired
during learning. The central idea behind our framework is to use text embedding
@@ -516,15 +515,10 @@ \section*{Results}
content tested by a given question, our estimates of their knowledge should carry some
predictive information about whether the participant is likely to answer that
question correctly or incorrectly. We developed a statistical approach to test this claim.
-For each question in turn, we used Equation~\ref{eqn:prop} to estimate each
+For each question, in turn, we used Equation~\ref{eqn:prop} to estimate each
participant's knowledge at the given question's embedding space coordinate,
using all \textit{other} questions that participant answered on the same quiz.
-% For each question in turn, for each
-% participant, we used Equation~\ref{eqn:prop} to estimate (using all
-% \textit{other} questions from the same quiz, from the same participant) the
-% participant's knowledge at the held-out question's embedding coordinate.
-For
-each quiz, we grouped these estimates into two distributions: one for the
+For each quiz, we grouped these estimates into two distributions: one for the
estimated knowledge at the coordinates of \textit{correctly} answered
questions, and another for the estimated knowledge at the coordinates of
\textit{incorrectly} answered questions (Fig.~\ref{fig:predictions}). We then
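A minimal Python sketch of the leave-one-question-out procedure described above follows. Equation~\ref{eqn:prop} is not reproduced in this excerpt, so estimate_knowledge stands in for it with an assumed Gaussian proximity weighting over embedding-space distances; the function names, the width parameter, and that weighting choice are illustrative assumptions rather than the paper's actual formulation.

import numpy as np

def estimate_knowledge(target_coord, other_coords, other_correct, width=1.0):
    # Hypothetical stand-in for Equation (eqn:prop), which is not shown in this
    # excerpt: a Gaussian proximity-weighted average of the participant's
    # accuracy on all *other* questions from the same quiz.
    sq_dists = np.sum((np.asarray(other_coords) - np.asarray(target_coord)) ** 2, axis=1)
    weights = np.exp(-sq_dists / (2.0 * width ** 2))
    return float(np.sum(weights * np.asarray(other_correct)) / np.sum(weights))

def held_out_estimates(coords, correct, width=1.0):
    # Leave-one-question-out knowledge estimates for one participant on one quiz.
    #   coords:  (n_questions, n_dims) question coordinates in embedding space
    #   correct: (n_questions,) 1 if the participant answered correctly, else 0
    coords, correct = np.asarray(coords), np.asarray(correct)
    est_correct, est_incorrect = [], []
    for i in range(len(correct)):
        others = np.arange(len(correct)) != i  # use all *other* questions
        est = estimate_knowledge(coords[i], coords[others], correct[others], width)
        (est_correct if correct[i] else est_incorrect).append(est)
    return est_correct, est_incorrect

# Example: four questions in a two-dimensional embedding space; the first
# three were answered correctly and the last incorrectly.
est_c, est_i = held_out_estimates(
    [[0.1, 0.2], [0.15, 0.25], [0.8, 0.9], [0.82, 0.88]], [1, 1, 1, 0])

Pooling these held-out estimates within each quiz then yields the two distributions, at the coordinates of correctly versus incorrectly answered questions, that the text compares (Fig.~\ref{fig:predictions}).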
@@ -929,19 +923,10 @@ \subsubsection*{Constructing text embeddings of multiple lectures and questions}
sliding window covered only its first line, the second
sliding window covered the first two lines, and so on. This ensured that each
line from the transcripts appeared in the same number ($w$) of sliding windows.
-% and was equally represented in the topic model's training corpus.
After performing various standard text preprocessing steps (e.g., normalizing case,
lemmatizing, removing punctuation and stop-words), we treated the text from
each sliding window as a single ``document,'' and combined these documents
across the two videos' windows to create a single training corpus for the topic model.
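As a rough sketch of the corpus construction described above (and the model fitting described in the following paragraph), the Python snippet below builds the ramp-up sliding windows and trains a topic model with scikit-learn. The window length w = 10, the number of topics, the simplified preprocessing, the use of LatentDirichletAllocation as the topic model, and the placeholder transcript lines are all illustrative assumptions, not details confirmed by this excerpt.

import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def window_documents(lines, w):
    # Ramp-up windows at the start ([line 1], [lines 1-2], ...), full w-line
    # windows in the middle, and symmetric ramp-down windows at the end, so
    # that every transcript line appears in exactly w windows. (The excerpt
    # describes the ramp-up explicitly; the matching ramp-down is assumed here
    # so that the stated "same number ($w$) of windows" property holds.)
    n = len(lines)
    return [" ".join(lines[max(0, end - w):min(end, n)]) for end in range(1, n + w)]

def preprocess(doc):
    # Simplified stand-in for the preprocessing described in the text: case is
    # normalized and punctuation stripped here; lemmatization is omitted and
    # stop-word removal is delegated to CountVectorizer below.
    return re.sub(r"[^a-z\s]", " ", doc.lower())

# Placeholder transcript lines standing in for the two lecture transcripts.
lecture1_lines = ["first line of the first lecture", "second line of the first lecture"]
lecture2_lines = ["first line of the second lecture", "second line of the second lecture"]

corpus = [preprocess(doc)
          for lecture in (lecture1_lines, lecture2_lines)
          for doc in window_documents(lecture, w=10)]  # w = 10 is illustrative

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(corpus)

topic_model = LatentDirichletAllocation(n_components=15, random_state=0)  # number of topics is illustrative
topic_model.fit(counts)

# The trained model can then map arbitrary (potentially new) documents, such as
# quiz questions, onto vectors of topic weights.
question_vector = topic_model.transform(vectorizer.transform(["text of a quiz question"]))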
-% To select an appropriate number of topics for the model, we identified the
-% minimum $k$ that yielded at least one ``unused'' topic (i.e., in which all words
-% in the vocabulary were assigned zero weight) after training. This indicated that the
-% number of topics was sufficient to capture the set of latent themes present in the two
-% lectures. We found this value to be $k = 15$ topics.
-% Supplementary Figure~\topicWordWeights~displays the distribution of weights over words
-% in the vocabulary for each discovered topic, and each topic's top-weighted words may be found
-% in Supplementary Table~\topics.

After fitting a topic model to the two videos' transcripts, we could use the
trained model to transform arbitrary (potentially new) documents into