
Commit c7bf630

Merge pull request #107 from paxtonfitzpatrick/revision-3
minor updates to fig 6 within- and across-lecture results section
2 parents 0b8ff62 + 6ed73bf commit c7bf630


2 files changed: 47 additions, 49 deletions

paper/main.pdf

-135 Bytes
Binary file not shown.

paper/main.tex

Lines changed: 47 additions & 49 deletions
@@ -655,36 +655,35 @@ \section*{Results}
 participants' quiz performance. On Quiz~3, after viewing both lectures, no
 participant answered more than three \textit{Four Fundamental Forces}-related
 questions incorrectly, and all but five participants (out of 50) answered two or
-fewer questions incorrectly. (This was the only subset of questions about either
-lecture, across all three quizzes, for which this was true.) Because of this,
-when we held out one incorrectly answered
-\textit{Four Fundamental Forces}-related question from a given participant's
-Quiz~3 responses and estimated their knowledge at its embedding coordinate using
-the remaining \textit{Four Fundamental Forces}-related questions they answered,
-for 90\% of participants, that estimate leveraged information about at most a
-single other question they were \textit{not} able to correctly answer. This
-homogeneity in participants' success on questions used to estimate their
-knowledge may have hurt our ability to accurately characterize the specific (and
-by Quiz~3, relatively few) aspects of the lecture content they did \textit{not}
-know about. Taken together, these results suggest that our knowledge estimates
-can reliably distinguish between questions about different content covered by a
-single lecture, provided there is sufficient diversity in participants' quiz
-responses to extract meaningful information about both what they know and what
-they do not know.
+fewer incorrectly. (This was the only subset of questions about either lecture,
+across all three quizzes, for which this was true.) Because of this, for 90\% of
+participants, our within-lecture estimates of their knowledge for \textit{Four
+Fundamental Forces}-related questions that they answered incorrectly leveraged
+information from at most a single other question they were \textit{not} able to
+correctly answer. This likely hampered our ability to accurately characterize
+the specific (and by the time they took Quiz~3, relatively few) aspects of the
+lecture content these participants did \textit{not} know about, and successfully
+distinguish them from the far more numerous aspects of the lecture content they
+now \textit{did} know about. Taken together, these results suggest that our
+knowledge estimates can reliably distinguish between questions about different
+content covered by a single lecture, provided there is sufficient diversity in
+participants' quiz responses to extract meaningful information about both what
+they know and what they do not know.
 
 Finally, we estimated participants' knowledge for each question about one
 lecture using their performance on questions (from the same quiz) about the
 \textit{other} lecture. This is an especially stringent test of our approach.
 Our primary assumption in building our knowledge estimates is that knowledge
 about a given concept is similar to knowledge about other concepts that are
 nearby in the embedding space. However, our analyses in Figure~\ref{fig:topics}
-and Supplementary Figure~\topicWeights~show that the embeddings of content from
-the two lectures are largely distinct. Therefore any predictive power of the
-knowledge estimates must overcome large distances in the embedding space. To
-put this in concrete terms, this test requires predicting participants'
-performance on individual highly specific questions about the formation of
-stars, using each participants' responses to just five multiple choice
-questions about the fundamental forces of the universe (and vice versa).
+and Supplementary Figure~\topicWeights\ show that the embeddings of content from
+the two lectures (and of their associated quiz questions) are largely distinct
+from each other. Therefore, any predictive power of these across-lecture
+knowledge estimates must overcome large distances in the embedding space. To put
+this in concrete terms, this test requires predicting participants' performance
+on individual, highly specific questions about the formation of stars from their
+responses to just five multiple-choice questions about the fundamental forces of
+the universe (and vice versa).
 
 We found that, before viewing either lecture (i.e., on Quiz~1), participants'
 abilities to answer \textit{Four Fundamental Forces}-related questions could
@@ -708,36 +707,35 @@ \section*{Results}
 Stars} questions: $OR = 11.294,\ 95\%\ \textnormal{CI} = [1.375,\ 47.744],\
 \lambda_{LR} = 10.396,\ p < 0.001$; \textit{Birth of Stars} questions given
 \textit{Four Fundamental Forces} questions: $OR = 7.302,\ 95\%\ \textnormal{CI}
-= [1.077,\ 44.879],\ \lambda_{LR} = 4.708,\ p = 0.038$). Taken together, our
+= [1.077,\ 44.879],\ \lambda_{LR} = 4.708,\ p = 0.038$). Taken together, these
 results suggest that our ability to form estimates solely across different
 content areas is more limited than our ability to form estimates that
-incorporate responses to questions across both content areas (as in
-Fig.~\ref{fig:predictions}, ``All questions'') or within a single content area (as
-in Fig.~\ref{fig:predictions}, ``Within-lecture''). However, if participants have recently
-received some training on both content areas, the knowledge estimates appear to be informative
-even across content areas.
+incorporate responses to questions from both content areas (as in
+Fig.~\ref{fig:predictions}, ``All questions'') or within a single content area
+(as in Fig.~\ref{fig:predictions}, ``Within-lecture''). However, if participants
+have recently received some training on both content areas, the knowledge
+estimates appear to be informative even across content areas.
 
 We speculate that these ``Across-lecture'' results might relate to some of our
-earlier work on the nature of semantic representations~\citep{MannKaha12}. In
-that work, we asked whether semantic similarities could be captured through
-behavioral measures, even if participants' ``true'' internal representations
-differed from the embeddings used to \textit{characterize} participants'
-behaviors. We found that mismatches between someone's internal representation
-of a set of concepts and the representation used to characterize their
-behaviors can lead to underestimates of how semantically driven their behaviors
-are. Along similar lines, we suspect that in our current study, participants'
-conceptual representations may initially differ from the representations
-learned by our topic model. (Although the topic models are still
-\textit{related} to participants' initial internal representations; otherwise
-we would have found that knowledge estimates derived from Quiz 1 and 2
-responses would have no predictive power in the other tests we conducted.)
-After watching both lectures, however, participants' internal representations
-may become more aligned with the embeddings used to estimate their knowledge
-(since those embeddings were trained on the lecture transcripts). This could
-help explain why the knowledge estimates derived from Quizzes 1 and 2 (before
-both lectures had been watched) do not reliably predict performance across
-content areas, whereas estiamtes derived from Quiz 3 \textit{do} reliably
-predict performance across content areas.
+earlier work on the nature of semantic representations~\citep{MannKaha12}. In
+that work, we asked whether semantic similarities could be captured through
+behavioral measures, even if participants' ``true'' internal representations
+differed from the embeddings used to \textit{characterize} their behaviors. We
+found that mismatches between an individual's internal representation of a set
+of concepts and the representation used to characterize their behaviors can lead
+to underestimates of how semantically driven those behaviors are. Along similar
+lines, we suspect that in our current study, participants' conceptual
+representations may initially differ from the representations learned by our
+topic model. (Although the topic model's representations are still
+\textit{related} to participants' initial internal representations; otherwise we
+would have found that knowledge estimates derived from Quizzes~1 and 2 had no
+predictive power in the other tests we conducted.) After watching both lectures,
+however, participants' internal representations may become more aligned with the
+embeddings used to estimate their knowledge (since those embeddings were trained
+on the lectures' transcripts). This could help explain why the knowledge
+estimates derived from Quizzes~1 and 2 (before both lectures had been watched)
+do not reliably predict performance across content areas, whereas estimates
+derived from Quiz~3 do.
 
 That the knowledge predictions derived from the text embedding space reliably
 distinguish between held-out correctly versus incorrectly answered questions
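
A brief note on the statistics reported in the second hunk above: the paper gives odds ratios with likelihood-ratio tests ($OR$, $\lambda_{LR}$, $p$), but the underlying model specification and null distribution are not shown in this excerpt. As a point of reference only, and assuming $\lambda_{LR}$ denotes a standard likelihood-ratio statistic comparing a logistic model with and without the knowledge-estimate predictor, the usual textbook definitions are:

% Reference sketch only -- standard definitions, not necessarily the exact
% model specification or null distribution used in the paper.
\[
  \log\frac{P(\textrm{correct})}{1 - P(\textrm{correct})}
    = \beta_0 + \beta_1 x_{\textrm{knowledge}},
  \qquad OR = e^{\beta_1},
\]
\[
  \lambda_{LR} = 2\left(\ell_{\textrm{full}} - \ell_{\textrm{reduced}}\right),
\]

where $\ell$ denotes each fitted model's maximized log-likelihood. The predictor $x_{\textrm{knowledge}}$ and the two-model comparison are illustrative assumptions introduced here, and the null distribution used to obtain the reported $p$-values is not specified in this excerpt.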
