responses to extract meaningful information about both what they know and what
they do not know.

Finally, we estimated participants' knowledge for each question about one
lecture using their performance on questions (from the same quiz) about the
\textit{other} lecture. This is an especially stringent test of our approach.
Our primary assumption in building our knowledge estimates is that knowledge
about a given concept is similar to knowledge about other concepts that are
nearby in the embedding space. However, our analyses in Figure~\ref{fig:topics}
and Supplementary Figure~\topicWeights~show that the embeddings of content from
the two lectures are largely distinct. Therefore, any predictive power of the
knowledge estimates must overcome large distances in the embedding space. To
put this in concrete terms, this test requires predicting participants'
performance on individual, highly specific questions about the formation of
stars, using each participant's responses to just five multiple-choice
questions about the fundamental forces of the universe (and vice versa).

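To make the structure of this test concrete, consider the following schematic
of an across-lecture knowledge estimate. The notation is purely illustrative
(a sketch of a similarity-weighted average, rather than necessarily the exact
estimator used in our analyses): for a held-out question $q$,
\[
\hat{k}(q) = \frac{\sum_{q' \in \mathcal{Q}_{\mathrm{other}}}
  s\!\left(\mathbf{x}_{q}, \mathbf{x}_{q'}\right) c(q')}
  {\sum_{q' \in \mathcal{Q}_{\mathrm{other}}}
  s\!\left(\mathbf{x}_{q}, \mathbf{x}_{q'}\right)},
\]
where $\mathcal{Q}_{\mathrm{other}}$ is the set of same-quiz questions about
the other lecture, $\mathbf{x}_{q}$ is question $q$'s position in the embedding
space, $s(\cdot, \cdot)$ is a similarity function over that space, and
$c(q') \in \{0, 1\}$ indicates whether the participant answered question $q'$
correctly. Because the two lectures occupy largely distinct regions of the
embedding space, the similarity weights in this test are relatively small,
which is what makes it so stringent.
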
We found that, before viewing either lecture (i.e., on Quiz~1), participants'
abilities to answer \textit{Four Fundamental Forces}-related questions could
not be predicted from their responses to \textit{Birth of Stars}-related
questions ($OR = 1.896,\ 95\%\ \textnormal{CI} = [0.419,\ 9.088],\ \lambda_{LR}
= 0.712,\ p = 0.404$), nor could their abilities to answer \textit{Birth of
Stars}-related questions be predicted from their responses to \textit{Four
Fundamental Forces}-related questions ($OR = 1.522,\ 95\%\ \textnormal{CI} =
[0.332,\ 6.835],\ \lambda_{LR} = 0.286,\ p = 0.611$). Similarly, we found that
participants' performance on questions about either lecture could not be
predicted from their responses to questions about the other lecture after
viewing \textit{Four Fundamental Forces} but before viewing \textit{Birth of
Stars} (i.e., on Quiz~2; \textit{Four Fundamental Forces} questions given
\textit{Birth of Stars} questions: $OR = 3.49,\ 95\%\ \textnormal{CI} =
[0.739,\ 12.849],\ \lambda_{LR} = 3.266,\ p = 0.083$; \textit{Birth of Stars}
questions given \textit{Four Fundamental Forces} questions: $OR = 2.199,\ 95\%\
\textnormal{CI} = [0.711,\ 5.623],\ \lambda_{LR} = 2.304,\ p = 0.141$). Only
after viewing \textit{both} lectures (i.e., on Quiz~3) did these across-lecture
knowledge estimates reliably predict participants' success on individual quiz
questions (\textit{Four Fundamental Forces} questions given \textit{Birth of
Stars} questions: $OR = 11.294,\ 95\%\ \textnormal{CI} = [1.375,\ 47.744],\
\lambda_{LR} = 10.396,\ p < 0.001$; \textit{Birth of Stars} questions given
\textit{Four Fundamental Forces} questions: $OR = 7.302,\ 95\%\ \textnormal{CI}
= [1.077,\ 44.879],\ \lambda_{LR} = 4.708,\ p = 0.038$). Taken together, these
results suggest that estimates formed solely from a different content area are
less informative than estimates that incorporate responses to questions from
both content areas (as in Fig.~\ref{fig:predictions}, ``All questions'') or
from within a single content area (as in Fig.~\ref{fig:predictions},
``Within-lecture''). However, once participants have recently received some
training on both content areas, the knowledge estimates do appear to be
informative even across content areas.

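For reference, each statistic reported above can be read in terms of a simple
logistic model relating single-question accuracy to the corresponding
across-lecture knowledge estimate (denoted $\hat{k}$ below). In schematic form
(suppressing any participant-level structure or additional terms in the fitted
models),
\[
\Pr\!\left(\mathrm{correct} = 1 \mid \hat{k}\right)
  = \frac{1}{1 + e^{-\left(\beta_{0} + \beta_{1}\hat{k}\right)}},
\qquad OR = e^{\beta_{1}},
\qquad \lambda_{LR} = 2\left(\ell_{\mathrm{full}} - \ell_{\mathrm{null}}\right),
\]
where $\ell_{\mathrm{full}}$ and $\ell_{\mathrm{null}}$ are the maximized
log-likelihoods of the models with and without the knowledge-estimate term, and
$p$-values are obtained by comparing $\lambda_{LR}$ to an appropriate reference
distribution (e.g., $\chi^{2}$). Read this way, odds ratios whose confidence
intervals include 1 (as on Quizzes~1 and~2) indicate that the across-lecture
estimates do not reliably predict accuracy, whereas the Quiz~3 odds ratios
indicate that they do.
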
We speculate that these ``Across-lecture'' results might relate to some of our
earlier work on the nature of semantic representations~\citep{MannKaha12}. In
that work, we asked whether semantic similarities could be captured through
behavioral measures, even if participants' ``true'' internal representations
differed from the embeddings used to \textit{characterize} participants'
behaviors. We found that mismatches between someone's internal representation
of a set of concepts and the representation used to characterize their
behaviors can lead to underestimates of how semantically driven their behaviors
are. Along similar lines, we suspect that in our current study, participants'
conceptual representations may initially differ from the representations
learned by our topic model. (Although the topic model's embeddings are still
\textit{related} to participants' initial internal representations; otherwise,
knowledge estimates derived from Quiz~1 and Quiz~2 responses would have had no
predictive power in the other tests we conducted.) After watching both
lectures, however, participants' internal representations may become more
aligned with the embeddings used to estimate their knowledge (since those
embeddings were trained on the lecture transcripts). This could help explain
why the knowledge estimates derived from Quizzes~1 and~2 (before both lectures
had been watched) do not reliably predict performance across content areas,
whereas estimates derived from Quiz~3 \textit{do} reliably predict performance
across content areas.

That the knowledge predictions derived from the text embedding space reliably
distinguish between held-out correctly versus incorrectly answered questions