responses to extract meaningful information about both what they know and what
they do not know.

Finally, we estimated participants' knowledge for each question about one
lecture using their performance on questions (from the same quiz) about the
\textit{other} lecture. This is an especially stringent test of our approach.
Our primary assumption in building our knowledge estimates is that knowledge
about a given concept is similar to knowledge about other concepts that are
nearby in the embedding space. However, our analyses in Figure~\ref{fig:topics}
and Supplementary Figure~\topicWeights~show that the embeddings of content from
the two lectures are largely distinct. Therefore, any predictive power of the
knowledge estimates must overcome large distances in the embedding space. To
put this in concrete terms, this test requires predicting participants'
performance on individual, highly specific questions about the formation of
stars, using each participant's responses to just five multiple-choice
questions about the fundamental forces of the universe (and vice versa).

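To make the structure of this test concrete, consider the following schematic
of an across-lecture knowledge estimate. The notation is purely illustrative
(a sketch of a similarity-weighted average, rather than necessarily the exact
estimator used in our analyses): for a held-out question $q$,
\[
\hat{k}(q) = \frac{\sum_{q' \in \mathcal{Q}_{\mathrm{other}}}
  s\!\left(\mathbf{x}_{q}, \mathbf{x}_{q'}\right) c(q')}
  {\sum_{q' \in \mathcal{Q}_{\mathrm{other}}}
  s\!\left(\mathbf{x}_{q}, \mathbf{x}_{q'}\right)},
\]
where $\mathcal{Q}_{\mathrm{other}}$ is the set of same-quiz questions about
the other lecture, $\mathbf{x}_{q}$ is question $q$'s position in the embedding
space, $s(\cdot, \cdot)$ is a similarity function over that space, and
$c(q') \in \{0, 1\}$ indicates whether the participant answered question $q'$
correctly. Because the two lectures occupy largely distinct regions of the
embedding space, the similarity weights in this test are relatively small,
which is what makes it so stringent.
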
We found that, before viewing either lecture (i.e., on Quiz~1), participants'
abilities to answer \textit{Four Fundamental Forces}-related questions could
not be predicted from their responses to \textit{Birth of Stars}-related
questions ($OR = 1.896,\ 95\%\ \textnormal{CI} = [0.419,\ 9.088],\ \lambda_{LR}
= 0.712,\ p = 0.404$), nor could their abilities to answer \textit{Birth of
Stars}-related questions be predicted from their responses to \textit{Four
Fundamental Forces}-related questions ($OR = 1.522,\ 95\%\ \textnormal{CI} =
[0.332,\ 6.835],\ \lambda_{LR} = 0.286,\ p = 0.611$). Similarly, we found that
participants' performance on questions about either lecture could not be
predicted from their responses to questions about the other lecture after
viewing \textit{Four Fundamental Forces} but before viewing \textit{Birth of
Stars} (i.e., on Quiz~2; \textit{Four Fundamental Forces} questions given
\textit{Birth of Stars} questions: $OR = 3.49,\ 95\%\ \textnormal{CI} =
[0.739,\ 12.849],\ \lambda_{LR} = 3.266,\ p = 0.083$; \textit{Birth of Stars}
questions given \textit{Four Fundamental Forces} questions: $OR = 2.199,\ 95\%\
\textnormal{CI} = [0.711,\ 5.623],\ \lambda_{LR} = 2.304,\ p = 0.141$). Only
after viewing \textit{both} lectures (i.e., on Quiz~3) did these across-lecture
knowledge estimates reliably predict participants' success on individual quiz
questions (\textit{Four Fundamental Forces} questions given \textit{Birth of
Stars} questions: $OR = 11.294,\ 95\%\ \textnormal{CI} = [1.375,\ 47.744],\
\lambda_{LR} = 10.396,\ p < 0.001$; \textit{Birth of Stars} questions given
\textit{Four Fundamental Forces} questions: $OR = 7.302,\ 95\%\ \textnormal{CI}
= [1.077,\ 44.879],\ \lambda_{LR} = 4.708,\ p = 0.038$). Taken together, these
results suggest that estimates formed solely from a different content area are
less informative than estimates that incorporate responses to questions from
both content areas (as in Fig.~\ref{fig:predictions}, ``All questions'') or
from within a single content area (as in Fig.~\ref{fig:predictions},
``Within-lecture''). However, once participants have recently received some
training on both content areas, the knowledge estimates do appear to be
informative even across content areas.

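For reference, each statistic reported above can be read in terms of a simple
logistic model relating single-question accuracy to the corresponding
across-lecture knowledge estimate (denoted $\hat{k}$ below). In schematic form
(suppressing any participant-level structure or additional terms in the fitted
models),
\[
\Pr\!\left(\mathrm{correct} = 1 \mid \hat{k}\right)
  = \frac{1}{1 + e^{-\left(\beta_{0} + \beta_{1}\hat{k}\right)}},
\qquad OR = e^{\beta_{1}},
\qquad \lambda_{LR} = 2\left(\ell_{\mathrm{full}} - \ell_{\mathrm{null}}\right),
\]
where $\ell_{\mathrm{full}}$ and $\ell_{\mathrm{null}}$ are the maximized
log-likelihoods of the models with and without the knowledge-estimate term, and
$p$-values are obtained by comparing $\lambda_{LR}$ to an appropriate reference
distribution (e.g., $\chi^{2}$). Read this way, odds ratios whose confidence
intervals include 1 (as on Quizzes~1 and~2) indicate that the across-lecture
estimates do not reliably predict accuracy, whereas the Quiz~3 odds ratios
indicate that they do.
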
We speculate that these ``Across-lecture'' results might relate to some of our
earlier work on the nature of semantic representations~\citep{MannKaha12}. In
that work, we asked whether semantic similarities could be captured through
behavioral measures, even if participants' ``true'' internal representations
differed from the embeddings used to \textit{characterize} participants'
behaviors. We found that mismatches between someone's internal representation
of a set of concepts and the representation used to characterize their
behaviors can lead to underestimates of how semantically driven their behaviors
are. Along similar lines, we suspect that in our current study, participants'
conceptual representations may initially differ from the representations
learned by our topic model. (Although the topic model's embeddings are still
\textit{related} to participants' initial internal representations; otherwise,
knowledge estimates derived from Quiz~1 and Quiz~2 responses would have had no
predictive power in the other tests we conducted.) After watching both
lectures, however, participants' internal representations may become more
aligned with the embeddings used to estimate their knowledge (since those
embeddings were trained on the lecture transcripts). This could help explain
why the knowledge estimates derived from Quizzes~1 and~2 (before both lectures
had been watched) do not reliably predict performance across content areas,
whereas estimates derived from Quiz~3 \textit{do} reliably predict performance
across content areas.

That the knowledge predictions derived from the text embedding space reliably
distinguish between held-out correctly versus incorrectly answered questions