@@ -670,37 +670,72 @@ \section*{Results}
 participants' quiz responses to extract meaningful information about both what
 they know and what they do not know.
 
-Finally, when we estimated participants' knowledge for each question about one
+Finally, we estimated participants' knowledge for each question about one
 lecture using their performance on questions (from the same quiz) about the
-\textit{other} lecture, we observed a somewhat different pattern of results.
-Here we found that before viewing either lecture (i.e., on Quiz~1),
-participants' abilities to answer \textit{Four Fundamental Forces}-related
-questions could not be predicted from their responses to \textit{Birth of
-Stars}-related questions ($OR = 1.896,\ 95\%\ \textnormal{CI} = [0.419,\
-9.088],\ \lambda_{LR} = 0.712,\ p = 0.404$), nor could their abilities to
-answer \textit{Birth of Stars}-related questions be predicted from their
-responses to \textit{Four Fundamental Forces}-related questions ($OR = 1.522,\
-95\%\ \textnormal{CI} = [0.332,\ 6.835],\ \lambda_{LR} = 0.286,\ p = 0.611$).
-Similarly, we found that participants' performance on questions about either
-lecture could not be predicted given their responses to questions about the
-other lecture after viewing \textit{Four Fundamental Forces} but before viewing
-\textit{Birth of Stars} (i.e., on Quiz~2; \textit{Four Fundamental Forces}
-questions given \textit{Birth of Stars} questions: $OR = 3.49,\ 95\%\
-\textnormal{CI} = [0.739,\ 12.849],\ \lambda_{LR} = 3.266,\ p = 0.083$;
-\textit{Birth of Stars} questions given \textit{Four Fundamental Forces}
-questions: $OR = 2.199,\ 95\%\ \textnormal{CI} = [0.711,\ 5.623],\
-\lambda_{LR} = 2.304,\ p = 0.141$). Only after viewing \textit{both} lectures
-(i.e., on Quiz~3) did these across-lecture knowledge estimates reliably predict
-participants' success on individual quiz questions (\textit{Four Fundamental
-Forces} questions given \textit{Birth of Stars} questions: $OR = 11.294,\
-95\%\ \textnormal{CI} = [1.375,\ 47.744],\ \lambda_{LR} = 10.396,\ p < 0.001$;
-\textit{Birth of Stars} questions given \textit{Four Fundamental Forces}
-questions: $OR = 7.302,\ 95\%\ \textnormal{CI} = [1.077,\ 44.879],\
-\lambda_{LR} = 4.708,\ p = 0.038$). Taken together, these results suggest that
-our knowledge estimates can be used to predict participants' success across
-content areas once they have received some training on both the content about
-which their knowledge is estimated and the content used to construct these
-estimates.
+\textit{other} lecture. This is an especially stringent test of our approach.
+Our primary assumption in building our knowledge estimates is that knowledge
+about a given concept is similar to knowledge about other concepts that are
+nearby in the embedding space. However, our analyses in Figure~\ref{fig:topics}
+and Supplementary Figure~\topicWeights~show that the embeddings of content from
+the two lectures are largely distinct. Therefore, any predictive power of the
+knowledge estimates must overcome large distances in the embedding space. In
+concrete terms, this test requires predicting participants' performance on
+individual, highly specific questions about the formation of stars, using each
+participant's responses to just five multiple-choice questions about the
+fundamental forces of the universe (and vice versa).
+
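+To make this intuition concrete, one schematic formulation (the Gaussian
+kernel, Euclidean distance, and bandwidth $\sigma$ here are illustrative
+choices rather than a description of our exact estimator) treats the knowledge
+estimate for a held-out question $q$ as a distance-weighted average of a
+participant's scored responses $r_i \in \{0, 1\}$ to the remaining questions,
+where $e_q$ and $e_i$ denote the corresponding question embeddings:
+\[
+\hat{k}(q) = \frac{\sum_i w_i\, r_i}{\sum_i w_i}, \qquad
+w_i = \exp\!\left(-\frac{\lVert e_q - e_i \rVert^2}{2\sigma^2}\right).
+\]
+Under any formulation of this general form, a question that lies far from
+every answered question receives an only weakly constrained estimate, which is
+what makes across-lecture prediction especially demanding.
+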
+We found that, before viewing either lecture (i.e., on Quiz~1), participants'
+abilities to answer \textit{Four Fundamental Forces}-related questions could
+not be predicted from their responses to \textit{Birth of Stars}-related
+questions ($OR = 1.896,\ 95\%\ \textnormal{CI} = [0.419,\ 9.088],\ \lambda_{LR}
+= 0.712,\ p = 0.404$), nor could their abilities to answer \textit{Birth of
+Stars}-related questions be predicted from their responses to \textit{Four
+Fundamental Forces}-related questions ($OR = 1.522,\ 95\%\ \textnormal{CI} =
+[0.332,\ 6.835],\ \lambda_{LR} = 0.286,\ p = 0.611$). Similarly, participants'
+performance on questions about either lecture could not be predicted from
+their responses to questions about the other lecture after viewing
+\textit{Four Fundamental Forces} but before viewing \textit{Birth of Stars}
+(i.e., on Quiz~2; \textit{Four Fundamental Forces} questions given
+\textit{Birth of Stars} questions: $OR = 3.49,\ 95\%\ \textnormal{CI} =
+[0.739,\ 12.849],\ \lambda_{LR} = 3.266,\ p = 0.083$; \textit{Birth of Stars}
+questions given \textit{Four Fundamental Forces} questions: $OR = 2.199,\ 95\%\
+\textnormal{CI} = [0.711,\ 5.623],\ \lambda_{LR} = 2.304,\ p = 0.141$). Only
+after viewing \textit{both} lectures (i.e., on Quiz~3) did these across-lecture
+knowledge estimates reliably predict participants' success on individual quiz
+questions (\textit{Four Fundamental Forces} questions given \textit{Birth of
+Stars} questions: $OR = 11.294,\ 95\%\ \textnormal{CI} = [1.375,\ 47.744],\
+\lambda_{LR} = 10.396,\ p < 0.001$; \textit{Birth of Stars} questions given
+\textit{Four Fundamental Forces} questions: $OR = 7.302,\ 95\%\ \textnormal{CI}
+= [1.077,\ 44.879],\ \lambda_{LR} = 4.708,\ p = 0.038$). Taken together, these
+results suggest that estimates formed solely from responses in a different
+content area are less informative than estimates that incorporate responses to
+questions from both content areas (as in Fig.~\ref{fig:predictions}, ``All
+questions'') or from within a single content area (as in
+Fig.~\ref{fig:predictions}, ``Within-lecture''). However, if participants have
+recently received some training on both content areas, the knowledge estimates
+appear to be informative even across content areas.
+
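+For concreteness, the following sketch shows how statistics of this form (an
+odds ratio, its 95\% confidence interval, and a likelihood-ratio test) can be
+computed with \texttt{statsmodels} in Python. It is an illustration rather
+than our analysis code: the input file, column names, and the single-predictor
+logistic model are assumptions made for the example.
+\begin{verbatim}
+import numpy as np
+import pandas as pd
+import statsmodels.api as sm
+from scipy import stats
+
+# Hypothetical input: one row per participant x question, with the
+# model-based knowledge estimate and whether the answer was correct.
+df = pd.read_csv('quiz_responses.csv')
+
+# Full model: accuracy ~ intercept + knowledge estimate
+X = sm.add_constant(df[['knowledge_estimate']])
+full = sm.Logit(df['correct'], X).fit(disp=0)
+
+# Odds ratio and 95% CI: exponentiate the coefficient and its interval
+odds_ratio = np.exp(full.params['knowledge_estimate'])
+ci_low, ci_high = np.exp(full.conf_int().loc['knowledge_estimate'])
+
+# Likelihood-ratio test against an intercept-only null model
+null = sm.Logit(df['correct'], np.ones((len(df), 1))).fit(disp=0)
+lambda_lr = 2 * (full.llf - null.llf)
+p_value = stats.chi2.sf(lambda_lr, df=1)  # one added predictor
+
+print(f'OR = {odds_ratio:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}], '
+      f'lambda_LR = {lambda_lr:.3f}, p = {p_value:.3f}')
+\end{verbatim}
+Because the null model is nested in the full model and differs by a single
+parameter, $\lambda_{LR}$ in this sketch is asymptotically
+$\chi^2$-distributed with one degree of freedom.
+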
+We speculate that these ``Across-lecture'' results might relate to some of our
+earlier work on the nature of semantic representations~\citep{MannKaha12}. In
+that work, we asked whether semantic similarities could be captured through
+behavioral measures even if participants' ``true'' internal representations
+differed from the embeddings used to \textit{characterize} their behaviors. We
+found that mismatches between someone's internal representation of a set of
+concepts and the representation used to characterize their behaviors can lead
+to underestimates of how semantically driven their behaviors are. Along
+similar lines, we suspect that in our current study, participants' conceptual
+representations may initially differ from the representations learned by our
+topic model. (The topic models must still be \textit{related} to participants'
+initial internal representations; otherwise, knowledge estimates derived from
+Quiz~1 and Quiz~2 responses would have had no predictive power in the other
+tests we conducted.) After watching both lectures, however, participants'
+internal representations may become more aligned with the embeddings used to
+estimate their knowledge, since those embeddings were trained on the lecture
+transcripts. This could help explain why the knowledge estimates derived from
+Quizzes~1 and~2 (before both lectures had been watched) do not reliably
+predict performance across content areas, whereas estimates derived from
+Quiz~3 \textit{do}.
 
 That the knowledge predictions derived from the text embedding space reliably
 distinguish between held-out correctly versus incorrectly answered questions