@@ -950,11 +950,11 @@ \section*{Discussion}
 line). The BERT embeddings of the lectures and questions do not show this
 property (Supp.~Fig.~\ldaVsBERT). We also examined per-question ``content
 matches'' between individual questions and individual moments of each lecture
-(Figs.~\ref{fig:question-correlations},~\ldaVsBERT). The time series plot of
-individual questions' correlations are different from each other when computed
-using LDA (e.g., the traces can be clearly visually separated), whereas the
-correlations computed from BERT embeddings of different questions all look very
-similar. This tells us that LDA is capturing some differences in content
+(Fig.~\ref{fig:question-correlations}, Supp.~Fig.~\ldaVsBERT). The time series
+plot of individual questions' correlations are different from each other when
+computed using LDA (e.g., the traces can be clearly visually separated), whereas
+the correlations computed from BERT embeddings of different questions all look
+very similar. This tells us that LDA is capturing some differences in content
 between the questions, whereas BERT is not. The time series plots of individual
 questions' correlations have clear ``peaks'' when computed using LDA, but not
 when computed using BERT. This tells us that LDA is capturing a ``match''
@@ -1013,17 +1013,17 @@ \section*{Discussion}
 computing simple word overlap metrics. For example, the Jaccard similarity
 between text $A$ and $B$ is computed as the number of unique words in the
 intersection of words from $A$ and $B$ divided by the number of unique words in
-the union of words from $A$ and $B$. In a supplementary analysis (Supp.
-Fig.~\jaccard), we compared the LDA-based question-lecture matches we reported
-in Figure~\ref{fig:question-correlations} with the Jaccard similarities between
-each question and each sliding window of text from the corresponding lecture.
-As shown in Supplementary Figure~\jaccard, this simple word-matching approach
-does not appear to capture the same level of specificity as the LDA-based
-approach. Whereas the LDA-based approach often yields a clear peak in the
-time series of correlations between each question and the corresponding lecture,
-the Jaccard similarity-based approach does not. Furthermore, these LDA-based
-matches appear to capture conceptual overlaps between the questions and
-lectures (Supp.~Tab.~\matchTab), whereas simple word matching does not. For
+the union of words from $A$ and $B$. In a supplementary analysis
+(Supp.~Fig.~\jaccard), we compared the LDA-based question-lecture matches we
+reported in Figure~\ref{fig:question-correlations} with the Jaccard similarities
+between each question and each sliding window of text from the corresponding
+lecture. As shown in Supplementary Figure~\jaccard, this simple word-matching
+approach does not appear to capture the same level of specificity as the
+LDA-based approach. Whereas the LDA-based approach often yields a clear peak in
+the time series of correlations between each question and the corresponding
+lecture, the Jaccard similarity-based approach does not. Furthermore, these
+LDA-based matches appear to capture conceptual overlaps between the questions
+and lectures (Supp.~Tab.~\matchTab), whereas simple word matching does not. For
 example, one of the example questions examined in Supplementary
 Figure~\jaccard~asks ``Which of the following occurs as a cloud of atoms gets
 more dense?'' The LDA-based matches identify lecture timepoints where the
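The Jaccard similarity defined in the second hunk (unique words in the intersection divided by unique words in the union) can be illustrated with a minimal Python sketch. This is not the manuscript's analysis code: the tokenizer, the function names (`unique_words`, `jaccard_similarity`, `sliding_window_jaccard`), and the window length are assumptions chosen only for illustration.

```python
import re


def unique_words(text):
    """Return the set of lowercase word tokens in `text`.

    The regex tokenizer is an assumption; the manuscript's preprocessing may differ.
    """
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard_similarity(a, b):
    """Unique words in the intersection divided by unique words in the union."""
    words_a, words_b = unique_words(a), unique_words(b)
    union = words_a | words_b
    if not union:
        # Both texts are empty: define the similarity as 0 to avoid dividing by zero.
        return 0.0
    return len(words_a & words_b) / len(union)


def sliding_window_jaccard(question, transcript_words, window=5):
    """Jaccard similarity between a question and each sliding window of a transcript.

    `window` is a hypothetical window length in words, not a value from the manuscript.
    """
    n_windows = max(len(transcript_words) - window + 1, 1)
    return [
        jaccard_similarity(question, " ".join(transcript_words[i:i + window]))
        for i in range(n_windows)
    ]
```

Called with a question and a list of lecture transcript words, `sliding_window_jaccard` returns one similarity per window, which is the kind of time series the text compares against the LDA-based correlations.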