@@ -676,13 +676,14 @@ \section*{Results}
 Our primary assumption in building our knowledge estimates is that knowledge
 about a given concept is similar to knowledge about other concepts that are
 nearby in the embedding space. However, our analyses in Figure~\ref{fig:topics}
-and Supplementary Figure~\topicWeights~show that the embeddings of content from
-the two lectures are largely distinct. Therefore any predictive power of the
-knowledge estimates must overcome large distances in the embedding space. To
-put this in concrete terms, this test requires predicting participants'
-performance on individual highly specific questions about the formation of
-stars, using each participants' responses to just five multiple choice
-questions about the fundamental forces of the universe (and vice versa).
+and Supplementary Figure~\topicWeights\ show that the embeddings of content from
+the two lectures (and of their associated quiz questions) are largely distinct
+from each other. Therefore, any predictive power of these across-lecture
+knowledge estimates must overcome large distances in the embedding space. To put
+this in concrete terms, this test requires predicting participants' performance
+on individual, highly specific questions about the formation of stars from their
+responses to just five multiple-choice questions about the fundamental forces of
+the universe (and vice versa).
 
 We found that, before viewing either lecture (i.e., on Quiz~1), participants'
 abilities to answer \textit{Four Fundamental Forces}-related questions could
@@ -706,36 +707,35 @@ \section*{Results}
 Stars} questions: $OR = 11.294,\ 95\%\ \textnormal{CI} = [1.375,\ 47.744],\
 \lambda_{LR} = 10.396,\ p < 0.001$; \textit{Birth of Stars} questions given
 \textit{Four Fundamental Forces} questions: $OR = 7.302,\ 95\%\ \textnormal{CI}
-= [1.077,\ 44.879],\ \lambda_{LR} = 4.708,\ p = 0.038$). Taken together, our
+= [1.077,\ 44.879],\ \lambda_{LR} = 4.708,\ p = 0.038$). Taken together, these
 results suggest that our ability to form estimates solely across different
 content areas is more limited than our ability to form estimates that
-incorporate responses to questions across both content areas (as in
-Fig.~\ref{fig:predictions}, ``All questions'') or within a single content area (as
-in Fig.~\ref{fig:predictions}, ``Within-lecture''). However, if participants have recently
-received some training on both content areas, the knowledge estimates appear to be informative
-even across content areas.
+incorporate responses to questions from both content areas (as in
+Fig.~\ref{fig:predictions}, ``All questions'') or within a single content area
+(as in Fig.~\ref{fig:predictions}, ``Within-lecture''). However, if participants
+have recently received some training on both content areas, the knowledge
+estimates appear to be informative even across content areas.
 
 We speculate that these ``Across-lecture'' results might relate to some of our
-earlier work on the nature of semantic representations~\citep{MannKaha12}. In
-that work, we asked whether semantic similarities could be captured through
-behavioral measures, even if participants' ``true'' internal representations
-differed from the embeddings used to \textit{characterize} participants'
-behaviors. We found that mismatches between someone's internal representation
-of a set of concepts and the representation used to characterize their
-behaviors can lead to underestimates of how semantically driven their behaviors
-are. Along similar lines, we suspect that in our current study, participants'
-conceptual representations may initially differ from the representations
-learned by our topic model. (Although the topic models are still
-\textit{related} to participants' initial internal representations; otherwise
-we would have found that knowledge estimates derived from Quiz 1 and 2
-responses would have no predictive power in the other tests we conducted.)
-After watching both lectures, however, participants' internal representations
-may become more aligned with the embeddings used to estimate their knowledge
-(since those embeddings were trained on the lecture transcripts). This could
-help explain why the knowledge estimates derived from Quizzes 1 and 2 (before
-both lectures had been watched) do not reliably predict performance across
-content areas, whereas estiamtes derived from Quiz 3 \textit{do} reliably
-predict performance across content areas.
+earlier work on the nature of semantic representations~\citep{MannKaha12}. In
+that work, we asked whether semantic similarities could be captured through
+behavioral measures, even if participants' ``true'' internal representations
+differed from the embeddings used to \textit{characterize} their behaviors. We
+found that mismatches between an individual's internal representation of a set
+of concepts and the representation used to characterize their behaviors can lead
+to underestimates of how semantically driven those behaviors are. Along similar
+lines, we suspect that in our current study, participants' conceptual
+representations may initially differ from the representations learned by our
+topic model. (Although the topic model's representations are still
+\textit{related} to participants' initial internal representations; otherwise we
+would have found that knowledge estimates derived from Quizzes~1 and 2 had no
+predictive power in the other tests we conducted.) After watching both lectures,
+however, participants' internal representations may become more aligned with the
+embeddings used to estimate their knowledge (since those embeddings were trained
+on the lectures' transcripts). This could help explain why the knowledge
+estimates derived from Quizzes~1 and 2 (before both lectures had been watched)
+do not reliably predict performance across content areas, whereas estimates
+derived from Quiz~3 do.
 
 That the knowledge predictions derived from the text embedding space reliably
 distinguish between held-out correctly versus incorrectly answered questions
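
A minimal sketch of how the statistics reported above fit together, assuming
the $\lambda_{LR}$ values are standard likelihood-ratio statistics comparing
nested logistic regression models (an assumption; the hunks themselves do not
define them):

% Hedged sketch, not the paper's confirmed analysis: \ell_1 denotes the
% maximized log-likelihood of a logistic regression predicting question
% accuracy from the knowledge estimate, and \ell_0 that of the nested model
% without that predictor.
\begin{equation*}
  \lambda_{LR} = 2\,(\ell_1 - \ell_0) \sim \chi^2_k\ \textnormal{under the null},
  \qquad OR = e^{\hat{\beta}},
\end{equation*}
% where k is the number of added predictors (presumably k = 1 here), the
% p-value is the upper-tail \chi^2_k probability of \lambda_{LR}, and
% \hat{\beta} is the fitted log-odds coefficient on the knowledge estimate.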