
Commit 3086bd0

committed: reworking results section text related to fig 6
1 parent 45c7274, commit 3086bd0

File tree

6 files changed (+235, -86 lines)


paper/changes.pdf

1.03 KB
Binary file not shown.

paper/changes.tex

Lines changed: 20 additions & 27 deletions
@@ -1,7 +1,7 @@
 \documentclass[10pt]{article}
 %DIF LATEXDIFF DIFFERENCE FILE
 %DIF DEL old.tex Mon Feb 19 07:49:49 2024
-%DIF ADD main.tex Mon Feb 19 07:48:14 2024
+%DIF ADD main.tex Mon Feb 19 12:49:27 2024
 \usepackage[utf8]{inputenc}
 \usepackage[english]{babel}
 \usepackage[font=small,labelfont=bf]{caption}
@@ -698,30 +698,23 @@ \section*{Results}
 Second, we \DIFdelbegin \DIFdel{used questions
 about one lecture to predict knowledge at the embedding coordinate of a held-out
 question about the }\textit{\DIFdel{other}} %DIFAUXCMD
-\DIFdel{lecture , }\DIFdelend \DIFaddbegin \DIFadd{estimated knowledge for each question about one lecture using only questions (}\DIFaddend from the same \DIFdelbegin \DIFdel{quiz and participant }\DIFdelend \DIFaddbegin \DIFadd{participant and quiz) about the }\textit{\DIFadd{other}} \DIFadd{lecture }\DIFaddend (``Across-lecture''\DIFdelbegin \DIFdel{in }\DIFdelend \DIFaddbegin \DIFadd{; }\DIFaddend Fig.~\ref{fig:predictions}\DIFaddbegin \DIFadd{, middle rows}\DIFaddend ).
-This test was intended to \DIFdelbegin \DIFdel{test }\DIFdelend \DIFaddbegin \DIFadd{assess }\DIFaddend the \textit{generalizability} of our approach by asking whether our \DIFdelbegin \DIFdel{knowledge }\DIFdelend predictions held across the content areas of the two lectures.
+\DIFdel{lecture , }\DIFdelend \DIFaddbegin \DIFadd{estimated knowledge for each question about a given lecture using only the other questions (}\DIFaddend from the same \DIFdelbegin \DIFdel{quiz and participant (``Across-lecture''in }\DIFdelend \DIFaddbegin \DIFadd{participant and quiz) about that }\textit{\DIFadd{same}} \DIFadd{lecture (``Within-lecture''; }\DIFaddend Fig.~\ref{fig:predictions}\DIFaddbegin \DIFadd{, middle rows}\DIFaddend ).
+This test was intended to \DIFdelbegin \DIFdel{test the }\DIFdelend \DIFaddbegin \DIFadd{assess the }\DIFaddend \textit{\DIFdelbegin \DIFdel{generalizability}\DIFdelend \DIFaddbegin \DIFadd{specificity}\DIFaddend } of our approach by asking whether our \DIFdelbegin \DIFdel{knowledge predictions held across the content areas of the two lectures}\DIFdelend \DIFaddbegin \DIFadd{predictions could distinguish between questions about different content covered by the same lecture}\DIFaddend .
 Third, we \DIFdelbegin \DIFdel{used questions about one lecture to predict knowledge at the embedding
 coordinate of a held-out question about the }\textit{\DIFdel{same}} %DIFAUXCMD
-\DIFdel{lecture , }\DIFdelend \DIFaddbegin \DIFadd{estimated knowledge for each question about a given lecture using only the other questions (}\DIFaddend from the same \DIFdelbegin \DIFdel{quiz and participant }\DIFdelend \DIFaddbegin \DIFadd{participant and quiz) about that }\textit{\DIFadd{same}} \DIFadd{lecture }\DIFaddend (``Within-lecture''\DIFdelbegin \DIFdel{in }\DIFdelend \DIFaddbegin \DIFadd{; }\DIFaddend Fig.~\ref{fig:predictions}\DIFaddbegin \DIFadd{, bottom rows}\DIFaddend ).
-This test was intended to \DIFdelbegin \DIFdel{test }\DIFdelend \DIFaddbegin \DIFadd{assess }\DIFaddend the \textit{specificity} of our approach by asking whether our \DIFdelbegin \DIFdel{knowledge }\DIFdelend predictions could distinguish between questions about different content covered by the same lecture.
-\DIFdelbegin \DIFdel{We repeated each of these
+\DIFdel{lecture , }\DIFdelend \DIFaddbegin \DIFadd{estimated knowledge for each question about one lecture using only questions (}\DIFaddend from the same \DIFdelbegin \DIFdel{quiz and participant (``Within-lecture''in }\DIFdelend \DIFaddbegin \DIFadd{participant and quiz) about the }\textit{\DIFadd{other}} \DIFadd{lecture (``Across-lecture''; }\DIFaddend Fig.~\ref{fig:predictions}\DIFaddbegin \DIFadd{, bottom rows}\DIFaddend ).
+This test was intended to \DIFdelbegin \DIFdel{test the }\DIFdelend \DIFaddbegin \DIFadd{assess the }\DIFaddend \textit{\DIFdelbegin \DIFdel{specificity}\DIFdelend \DIFaddbegin \DIFadd{generalizability}\DIFaddend } of our approach by asking whether our \DIFdelbegin \DIFdel{knowledge predictions could distinguish between questions about
+different content covered by the same lecture.
+We repeated each of these
 analyses using all possible held-out questions for each quiz and participant.
-}\DIFdelend
+}\DIFdelend \DIFaddbegin \DIFadd{predictions held across the content areas of the two lectures.
+}\DIFaddend
 
 \DIFdelbegin \DIFdel{For the initial quizzes participants took (prior to watching either lecture),
 predicted knowledge tended to be low overall, and relatively
 unstructured (Fig.
 ~\ref{fig:predictions}, left column).
-When }\DIFdelend %DIF > When we estimated participants' knowledge for each Quiz~1 question based on all other Quiz~1 questions, we found an inverse relationship.
-%DIF > Specifically, higher estimated knowledge at the embedding coordinate at a held-out question was associated with a lower likelihood of answering the question correctly ($\textrm{odds ratio}\ (OR) = 0.136,\ \textrm{likelihood-ratio test statistic}\ (\lambda_{LR}) = 19.749,\ \textrm{95\% CI} = [14.352,\ 26.545],\ p = 0.001$).
-%DIF > However, this inverse relationship in fact represents the expected result under our null hypothesis (that estimated knowledge is \textit{not} predictive of success on a question).
-%DIF > An intuition for this can be taken from the expected outcome of same analysis based on the simple proportion correct, rather than estimated knowledge.
-%DIF > Suppose a participant answered $n$ out of 13 quiz questions correctly.
-%DIF > If we held out a single correctly answered question and computed the proportion of remaining questions answered correctly, that proportion would be $(n - 1) / 12$.
-%DIF > Whereas if we held out a single incorrectly answered question, the proportion of remaining questions answered correctly would be $n / 12$.
-\DIFaddbegin
-
-\DIFadd{In performing this set of analyses, our null hypothesis is that the knowledge estimates we compute based on the quiz questions' embedding coordinates do }\textit{\DIFadd{not}} \DIFadd{provide useful information about participants' abilities to answer those questions.
+When }\DIFdelend \DIFaddbegin \DIFadd{In performing this set of analyses, our null hypothesis is that the knowledge estimates we compute based on the quiz questions' embedding coordinates do }\textit{\DIFadd{not}} \DIFadd{provide useful information about participants' abilities to answer those questions.
 What result might we expect to see if this is the case?
 To provide an intuition for this, consider the expected outcome if we carried out these same analyses using a simple proportion-correct measure in lieu of our knowledge estimates.
 Suppose a participant correctly answered $n$ out of 13 questions on a given quiz.
@@ -736,7 +729,7 @@ \section*{Results}
 Given that our knowledge estimates are computed as a weighted version of this same proportion-correct score (where each held-in question's weight reflects its embedding-space distance from the }\DIFaddend held-out question\DIFdelbegin \DIFdel{(``All questions''; $\U
 = 50587,~p = 0.723$), when we used questions from one lecture to predict
 knowledge }\DIFdelend \DIFaddbegin \DIFadd{; see Eqn.~\ref{eqn:prop}), if these weights are uninformative (e.g., simply randomly distributed), then we should expect to see this same inverse relationship emerge, on average.
-It is only if the spatial relationships among the quiz questions' embedding coordinates map onto participants' knowledge in a meaningful way that we would we expect this relationship to be non-negative }[\textbf{\DIFadd{PHRASING}}]\DIFadd{.
+It is only if the spatial relationships among the quiz questions' embedding coordinates map onto participants' knowledge in a meaningful way that we would expect this relationship to be non-negative.
 }
 
 \DIFadd{When we fit a GLMM to estimates of participants' knowledge for each Quiz~1 question based on all other Quiz~1 questions, we observed this null-hypothesized inverse relationship.
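
To unpack the held-out intuition in this hunk, here is a short sketch in LaTeX. The symbols are ours (standing in for the paper's Eqn.~\ref{eqn:prop}, which this diff does not show), with $c_i \in \{0, 1\}$ marking whether held-in question $i$ was answered correctly:

% Sketch of the null-hypothesis intuition (notation ours; the paper's exact
% estimator is Eqn.~\ref{eqn:prop}, not shown in this diff).
% Holding one of 13 questions out and averaging correctness over the other 12:
\[
\underbrace{\tfrac{n-1}{12}}_{\text{held-out answered correctly}}
\;<\;
\underbrace{\tfrac{n}{12}}_{\text{held-out answered incorrectly}},
\]
% so, under the null, higher held-in accuracy systematically accompanies
% incorrectly answered held-out questions: an inverse relationship.
% The knowledge estimate replaces the uniform average with weights w_i that
% shrink as embedding distance from the held-out question q grows:
\[
\hat{k}(q) = \frac{\sum_{i \neq q} w_i\, c_i}{\sum_{i \neq q} w_i},
\qquad c_i \in \{0, 1\}.
\]
% If the w_i carry no information, the same inverse relationship emerges on average.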
@@ -751,8 +744,8 @@ \section*{Results}
 questions from one lecture to predict knowledge at the embedding coordinate of a held-out question }\DIFdelend \DIFaddbegin \DIFadd{likelihood-ratio test statistic $(\lambda_{LR}) = 19.749$, 95\%\ $\textnormal{CI} = [14.352,\ 26.545],\ p = 0.001$).
 However, when we repeated this analysis for quizzes 2 and 3, the direction of this relationship reversed: higher estimated knowledge for a given question predicted a greater likelihood of answering it correctly (Quiz~2: $OR = 2.905,\ \lambda_{LR} = 17.333,\ 95\%\ \textnormal{CI} = [14.966,\ 29.309],\ p = 0.002$; Quiz~3: $OR = 3.238,\ \lambda_{LR} = 6.882,\ 95\%\ \textnormal{CI} = [6.228,\ 8.184],\ p = 0.017$).
 Taken together, these results suggest that our knowledge estimations can reliably predict participants' likelihood of success on individual quiz questions, provided they have at least some amount of structured knowledge about the underlying concepts being tested.
-In other words, when participants' correct responses primarily arise from knowledge about the content probed by each question (e.g., after watching one or both lectures), these successes can be predicted from their ability to answer other questions about conceptually similar content (as captured by embedding-space distance).
-However, when a sufficiently large portion of participants' correct responses (presumably) reflect successful random guessing (such as on a multiple-choice quiz taken before viewing either lecture), our approach fails to accurately predict these successes since they do not map onto embedding space distances in a meaningful way }[\textbf{\DIFadd{PHRASING}}]\DIFadd{.
+In other words, when participants' correct responses arise primarily from knowledge about the content probed by each question (e.g., after watching one or both lectures), these successes can be predicted from their ability to answer other questions about conceptually similar content (as captured by embedding-space distance).
+However, when a sufficiently large portion of participants' correct responses (presumably) reflect successful random guessing (such as on a multiple-choice quiz taken before viewing either lecture), our approach fails to accurately predict these successes because they are not structured (with respect to spatial distance within the embedding space) in a meaningful way.
 }
 
 \DIFadd{We observed a similar pattern when we fit GLMMs to estimates of participants' knowledge for each question about one lecture derived from other questions }\DIFaddend about the \textit{same} \DIFdelbegin \DIFdel{lecture (``Within-lecture'';
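
As a reading aid for the $OR$ and $\lambda_{LR}$ statistics reported in this hunk, the model form below is a generic logistic GLMM sketch; the paper's exact fixed- and random-effects specification is not shown in this diff, so treat the structure as an assumption:

% Generic logistic GLMM sketch (assumed form; the exact specification is not
% shown in this diff). For participant j answering held-out question i:
\[
\log\frac{\Pr(\mathrm{correct}_{ij})}{1 - \Pr(\mathrm{correct}_{ij})}
= \beta_0 + \beta_1 \hat{k}_{ij} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma^2),
\]
% where \hat{k}_{ij} is the estimated knowledge at question i's embedding
% coordinate. The reported odds ratio is OR = e^{\beta_1}: OR < 1 (e.g., 0.136
% on Quiz 1) means higher estimated knowledge predicts lower odds of success,
% while OR > 1 (e.g., 2.905 and 3.238 on Quizzes 2 and 3) means the reverse.
% The likelihood-ratio statistic compares the fitted model against the null
% model with \beta_1 = 0:
\[
\lambda_{LR} = 2\,(\ell_{\mathrm{full}} - \ell_{\mathrm{null}}).
\]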
@@ -823,18 +816,18 @@ \section*{Results}
 %DIF > ALTERNATE EXPLANATION -- embedding space is essentially ``saturated'' with correctly answered questions, so just like how on quiz 1 when relatively few questions are correct, most questions ``around'' them will be incorrect, on quiz 3 when relatively few questions are incorrect, most questions nearby will be correct. And because of this, on average, when the ``held-out'' question is one of the few incorrect ones, there will tend to be more correct ones ``held in'' than there will be when the held-out question is correct.
 %DIF > Also, maybe worth noting: while negative relationship is significant, it's super weak -- per the model, a "1-unit" increase in estimated knowledge corresponds to only a 1.28% decrease in probability of correct answer (p = OR / (1 + OR)). For comparison, for quiz3/within-lecture/birth of stars, 1-unit increase in estimated knowledge corresponds to a 84.5% increase in probability. So decrease is sig. but basically negligible.
 Taken together, \DIFdelbegin \DIFdel{the results in Figure~\ref{fig:predictions} indicate }\DIFdelend \DIFaddbegin \DIFadd{these results suggest }\DIFaddend that our approach can \DIFdelbegin \DIFdel{reliably predict acquired knowledge (especially about recently
-learned content), and that the knowledge predictions are generalizable across the content areas spanned by the two lectures, while also specific enough to }\DIFdelend \DIFaddbegin \DIFadd{distinguish between questions about different content covered by a single lecture when participants have sufficiently structured knowledge about that lecture's content, though this specificity may decrease further in time from when the lecture in question was viewed.
+learned content), and that the knowledge predictions are generalizable across the content areas spanned by the two lectures, while also specific enough to }\DIFdelend \DIFaddbegin \DIFadd{distinguish between questions about different content covered by a single lecture when participants have sufficiently structured knowledge about its contents, though this specificity may decrease with increasing time since the lecture was viewed.
 }
 
-\DIFadd{Finally, when we fit GLMMs to estimates of participants' knowledge for questions about one lecture based on questions (from the same quiz) about the other lecture, we observed a similar but slightly more nuanced pattern.
-Essentially, while the previous set of analyses suggest that our approach's ability to make }\textit{\DIFadd{specific}} \DIFadd{predictions within content areas depends on participants having a minimum level of knowledge about the given content, the across-lecture analyses we performed suggest that our ability to }\textit{\DIFadd{generalize}} \DIFadd{these predictions across different content areas requires that participants' level of knowledge about the content used to make predictions be reasonably similar to their level of knowledge about the content for which these predictions are made }[\textbf{\DIFadd{PHRASING}}]\DIFadd{.
-We found that using questions answered on Quiz~1, participants abilities to correctly answer questions about }\textit{\DIFadd{Four Fundamental Forces}} \DIFadd{could be predicted from their responses to questions about }\textit{\DIFadd{Birth of Stars}} \DIFadd{($OR = 1.896,\ \lambda_{LR} = 7.205,\ 95\%\ \textnormal{CI} = [6.224, 7.524],\ p = 0.039$) and their ability to correctly answer }\textit{\DIFadd{Birth of Stars}}\DIFadd{-related questions could be predicted from their responses to }\textit{\DIFadd{Four Fundamental Forces}}\DIFadd{-related questions ($OR = 1.522,\ \lambda_{LR} = 6.448,\ 95\%\ \textnormal{CI} = [5.656, 6.843],\ p = 0.043$).
-We note, however, that these Quiz~1 knowledge estimates suffer from the same ``noise'' due to the (presumably) higher rate of participants successfully guessing correct answers on Quiz~1 as noted above, and as a result provide the weakest signal of any of the knowledge estimates that we found to reliably predict success.
+\DIFadd{Finally, when we fit GLMMs to estimates of participants' knowledge for questions about one lecture using questions they answered (on the same quiz) about the }\textit{\DIFadd{other}} \DIFadd{lecture, we observed a similar but slightly more nuanced pattern.
+Essentially, while the previous set of within-lecture analyses suggests that the }\textit{\DIFadd{specificity}} \DIFadd{of our predictions within a single content area depends on participants having a minimum level of knowledge about that content, these across-lecture analyses suggest that our ability to }\textit{\DIFadd{generalize}} \DIFadd{our predictions across different content areas requires that participants' level of knowledge about the content used to make predictions be reasonably similar to their level of knowledge about the content for which these predictions are made.
+Using questions answered on Quiz~1, we found that participants' abilities to correctly answer questions about }\textit{\DIFadd{Four Fundamental Forces}} \DIFadd{could be predicted from their responses to questions about }\textit{\DIFadd{Birth of Stars}} \DIFadd{($OR = 1.896,\ \lambda_{LR} = 7.205,\ 95\%\ \textnormal{CI} = [6.224, 7.524],\ p = 0.039$) and similarly, that their ability to correctly answer }\textit{\DIFadd{Birth of Stars}}\DIFadd{-related questions could be predicted from their responses to }\textit{\DIFadd{Four Fundamental Forces}}\DIFadd{-related questions ($OR = 1.522,\ \lambda_{LR} = 6.448,\ 95\%\ \textnormal{CI} = [5.656, 6.843],\ p = 0.043$).
+We note, however, that these Quiz~1 knowledge estimates are subject to the same increased ``noise'' due to the (presumably) higher incidence of observed correct answers arising from successful random guessing (compared to the other two quizzes) as noted above, and as a result, provide the weakest signal of any of the knowledge estimates that we found reliably predicted success.
 When we repeated this analysis using questions from Quiz~2, we found participants' responses to }\textit{\DIFadd{Four Fundamental Forces}}\DIFadd{-related questions did not reliably predict their success on }\textit{\DIFadd{Birth of Stars}}\DIFadd{-related questions ($OR = 1.865,\ \lambda_{LR} = 3.205,\ 95\%\ \textnormal{CI} = [3.027, 3.600],\ p = 0.125$), nor did their responses to }\textit{\DIFadd{Birth of Stars}}\DIFadd{-related questions reliably predict their success on }\textit{\DIFadd{Four Fundamental Forces}}\DIFadd{-related questions ($OR = 3.490,\ \lambda_{LR} = 3.266,\ 95\%\ \textnormal{CI} = [3.033, 3.866],\ p = 0.094$).
 }\textbf{\DIFadd{Sentence about why this makes sense given that participants hadn't viewed BoS yet. i.e., when predicting held-out FFF questions, correct vs. incorrect labels for held-in q's aren't meaningfully structured w.r.t. embedding space; when predicting held-out BoS q's, whether or not held-out q was correctly answered isn't meaningfully related to spatial structure of correctly answered q's in embedding space.}}
-\DIFadd{However, when we again computed these across-lecture knowledge predictions using questions from Quiz~3 (when participants had now viewed }\textit{\DIFadd{both}} \DIFadd{lectures, we found that we could again reliably predict success on questions about }\textit{\DIFadd{Four Fundamental Forces}} \DIFadd{($OR = 11.294),\ \lambda_{LR} = 11.055,\ 95\%\ \textnormal{CI} = [9.126, 18.476],\ p = 0.004$) and }\textit{\DIFadd{Birth of Stars}} \DIFadd{($OR = 7.302),\ \lambda_{LR} = 7.068,\ 95\%\ \textnormal{CI} = [6.490, 8.584],\ p = 0.017$).
+\DIFadd{However, when we again computed these across-lecture knowledge predictions using questions from Quiz~3 (when participants had now viewed }\textit{\DIFadd{both}} \DIFadd{lectures), we found that we could again reliably predict success on questions about both }\textit{\DIFadd{Four Fundamental Forces}} \DIFadd{($OR = 11.294,\ \lambda_{LR} = 11.055,\ 95\%\ \textnormal{CI} = [9.126, 18.476],\ p = 0.004$) and }\textit{\DIFadd{Birth of Stars}} \DIFadd{($OR = 7.302,\ \lambda_{LR} = 7.068,\ 95\%\ \textnormal{CI} = [6.490, 8.584],\ p = 0.017$) using responses to questions about the other lecture's content.
 Across all three versions of these analyses, our results suggest that our knowledge estimations can reliably predict participants' abilities to answer individual quiz questions, }\DIFaddend distinguish between questions about \DIFdelbegin \DIFdel{more subtly different content within the
-same lecture}\DIFdelend \DIFaddbegin \DIFadd{similar content, and generalize across content areas, provided that participants' quiz responses reflect a minimum level of ``real'' knowledge about both content on which these predictions are based and that for which they are made }[\textbf{\DIFadd{PHRASING}}]\DIFaddend .
+same lecture}\DIFdelend \DIFaddbegin \DIFadd{similar content, and generalize across content areas, provided that participants' quiz responses reflect a minimum level of ``real'' knowledge about both the content on which these predictions are based and the content for which they are made}\DIFaddend .
 
 %DIF > our approach works when participants have a minimal baseline level of knowledge about content predicted and used to predict
 %DIF > our approach generalizes when knowledge of content used to predict can be assumed to be a reasonable indicator of knowledge of content predicted
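
A note on the ``$p = OR / (1 + OR)$'' conversion used in the %DIF comment above: it follows directly from the definition of odds, sketched here in LaTeX (the worked number is ours, plugging in the Quiz~1 odds ratio from the text):

% Odds-to-probability identity behind the comment's back-of-the-envelope numbers:
\[
o = \frac{p}{1 - p} \quad\Longleftrightarrow\quad p = \frac{o}{1 + o}.
\]
% A 1-unit increase in estimated knowledge multiplies the odds of success by
% OR, so p = OR / (1 + OR) is the implied success probability when baseline
% odds are 1 (a 50/50 question); e.g., OR = 0.136 gives 0.136/1.136, or about 0.12.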

paper/compile.sh

Lines changed: 8 additions & 0 deletions
@@ -14,5 +14,13 @@ latex -interaction=nonstopmode supplement
 latex -interaction=nonstopmode supplement
 pdflatex -interaction=nonstopmode supplement
 
+latexdiff old.tex main.tex > changes.tex
+latex -interaction=nonstopmode changes
+bibtex changes
+latex -interaction=nonstopmode changes
+latex -interaction=nonstopmode changes
+pdflatex -interaction=nonstopmode changes
+
+
 rm *.cb* *.dvi *.log *.blg *.aux *.fff *.out

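For context on the added steps: latexdiff writes a marked-up merge of old.tex and main.tex, and the repeated latex passes are the usual cycle for converging citations and cross-references. An annotated sketch of the same pipeline (comments ours, not part of the commit):

latexdiff old.tex main.tex > changes.tex   # mark old-vs-new text as \DIFdel/\DIFadd
latex -interaction=nonstopmode changes     # first pass: record \cite/\ref keys in changes.aux
bibtex changes                             # build the bibliography (changes.bbl) from the .aux
latex -interaction=nonstopmode changes     # pull the .bbl in; references may still be stale
latex -interaction=nonstopmode changes     # extra pass so cross-references settle
pdflatex -interaction=nonstopmode changes  # final pass emits changes.pdf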
paper/main.pdf

1.05 KB
Binary file not shown.
