reflects its embedding-space distance from the }\DIFaddend held-out \DIFdelbegin\DIFdel{question about the
+}\textit{\DIFdel{other}} %DIFAUXCMD
+\DIFdel{lecture (``Across-lecture''; predicting knowledge for }\DIFdelend\DIFaddbegin\DIFadd{question; see
 Eqn.~\ref{eqn:prop}), if these weights are uninformative (e.g., randomly
 distributed), then we should expect to see this same inverse relationship
 between estimated knowledge and performance, on average. On the other hand, if
@@ -797,23 +797,23 @@ \section*{Results}
 \DIFadd{Before presenting our results, it is worth considering three possible
 explanations of why a participant might answer a given question correctly or
 incorrectly. One possibility is that the participant simply }\textit{\DIFadd{guessed}}
-\DIFadd{the answer. A second is that they selected an answer by mistake, despite
-``knowing'' the correct answer. In both of these scenarios, the participant's
-knowledge about the question's content should be uninformative about their
-observed response. A third possibility is that the participant's response
-reflects their }\textit{\DIFadd{actual}} \DIFadd{knowledge about the question's content. In this
-case, we }\textit{\DIFadd{might}} \DIFadd{expect to see a positive relationship between the
-participant's knowledge and their likelihood of answering the question
-correctly. However, in order to see this positive relationship, the
-participant's knowledge must be structured in a way that is reflected (at least
-partially) by the embedding space. In other words, if the participant's
-performance reflects their true knowledge, but our text embedding space does
-not sufficiently capture the structure of that knowledge, then the }\DIFaddend knowledge
-\DIFaddbegin\DIFadd{estimates we generate will not be predictive of the participant's performance.
-In the extreme, if the embedding space is completely unstructured with respect
-to the content of the quiz questions, then we would expect to see the negative
-relationship between estimated knowledge and performance that we described
-above.
+\DIFadd{the answer. A second is that they selected the incorrect answer by mistake,
+despite ``knowing'' the correct answer (or vice versa). In both of these
+scenarios, the participant's knowledge about the question's content should be
+uninformative about their observed response. A third possibility is that the
+participant's response reflects their }\textit{\DIFadd{actual}} \DIFadd{knowledge about the
+question's content. In this case, we }\textit{\DIFadd{might}} \DIFadd{expect to see a positive
+relationship between the participant's knowledge and their likelihood of
+answering the question correctly. However, in order to see this positive
+relationship, the participant's knowledge must be structured in a way that is
+reflected (at least partially) by the embedding space. In other words, if the
+participant's performance reflects their true knowledge, but our text embedding
+space does not sufficiently capture the structure of that knowledge, then the
+}\DIFaddend knowledge \DIFaddbegin\DIFadd{estimates we generate will not be predictive of the participant's
+performance. In the extreme, if the embedding space is completely unstructured
+with respect to the content of the quiz questions, then we would expect to see
+the negative relationship between estimated knowledge and performance that we
+described above.
 }

 \DIFadd{When we fit a GLMM to estimates of participants' knowledge for each Quiz~1
@@ -973,7 +973,7 @@ \section*{Results}
 95\%\ \textnormal{CI} = [3.033, 3.866],\ p = 0.094$). These ``prediction
 failures'' appear to come from the fact that any signal derived from
 participants' knowledge about the content of the }\textit{\DIFadd{Birth of Stars}}
-\DIFadd{lecture (prior to watching it) is swamped by the much more dramatic increase in
+\DIFadd{lecture (prior to watching it) is overwhelmed by the much more dramatic increase in
 their knowledge about the content of the }\textit{\DIFadd{Four Fundamental Forces}}
 \DIFadd{(which they watched just prior to taking Quiz~2). This is reflected in their
 Quiz~2 performance for questions about each lecture (mean proportion correct
@@ -988,15 +988,11 @@ \section*{Results}
 p = 0.017$) using responses to questions about the other lecture's content.
 Across all three versions of these analyses, our results suggest that (by and
 large) our knowledge estimates can reliably predict participants' abilities to
-answer individual quiz questions, }\DIFaddend distinguish between questions about \DIFdelbegin\DIFdel{more subtly different contentwithin the same lecture}\DIFdelend\DIFaddbegin\DIFadd{similar
+answer individual quiz questions, }\DIFaddend distinguish between questions about \DIFdelbegin\DIFdel{more subtly different contentwithin the
+same lecture}\DIFdelend\DIFaddbegin\DIFadd{similar
 content, and generalize across content areas, provided that participants' quiz
 responses reflect a minimum level of ``real'' knowledge about both content on
-which these predictions are based and that for which they are made. Our results
-also indicate some important limitations of our approach: if participants' quiz
-performance does not reflect what they know (e.g., when they ``guess''), or if
-their knowledge is not structured in a way that is reflected by the embedding
-space, then our knowledge estimates will not be predictive of their
-performance}\DIFaddend .
+which these predictions are based and that for which they are made}\DIFaddend .

 %DIF > our approach works when participants have a minimal baseline level of knowledge about content predicted and used to predict
 %DIF > our approach generalizes when knowledge of content used to predict can be assumed to be a reasonable indicator of knowledge of content predicted