Commit 5bcef25

recompiled docs
2 parents a441ac3 + 44add3b commit 5bcef25

File tree

10 files changed: +169 −138 lines


code/notebooks/main/5_predictive-analyses.ipynb

Lines changed: 120 additions & 85 deletions
Large diffs are not rendered by default.

paper/CDL-bibliography/cdl.bib

Lines changed: 13 additions & 0 deletions
@@ -1,5 +1,18 @@
 
 
+@article{Ewen03,
+	author = {Ewens, W J},
+	date-added = {2024-02-21 19:20:19 -0500},
+	date-modified = {2024-02-21 19:26:45 -0500},
+	doi = {10.1086/346174},
+	journal = {American Journal of Human Genetics},
+	month = {February},
+	number = {2},
+	pages = {496--498},
+	title = {{On Estimating \textit{P} Values by Monte Carlo Methods}},
+	volume = {72},
+	year = {2003}}
+
 @article{NortEtal02,
 	author = {North, B V and Curtis, D and Sham, P C},
 	date-added = {2024-02-21 18:40:44 -0500},

paper/changes.pdf

-1.94 KB
Binary file not shown.

paper/changes.tex

Lines changed: 20 additions & 29 deletions
@@ -1,7 +1,7 @@
 \documentclass[10pt]{article}
 %DIF LATEXDIFF DIFFERENCE FILE
 %DIF DEL old.tex Mon Feb 19 07:49:49 2024
-%DIF ADD main.tex Wed Feb 21 21:04:59 2024
+%DIF ADD main.tex Thu Feb 22 00:08:08 2024
 %DIF 2a2
 \usepackage{amsmath} %DIF >
 %DIF -------
@@ -830,7 +830,7 @@ \section*{Results}
 between what is known versus unknown. }\DIFdelend \DIFaddbegin \DIFadd{was associated with a lower likelihood of
 answering the question correctly (odds ratio $(OR) = 0.136$, likelihood-ratio
 test statistic $(\lambda_{LR}) = 19.749$, 95\%\ $\textnormal{CI} = [14.352,\
-26.545],\ p = 0.001$). This outcome suggests that our knowledge estimates do
+26.545],\ p < 0.001$). This outcome suggests that our knowledge estimates do
 }\textit{\DIFadd{not}} \DIFadd{provide useful information about participants' Quiz~1 performance
 when we aggregated across all question content areas. We speculated that this
 might either indicate that the knowledge estimates are uninformative in
@@ -840,8 +840,8 @@ \section*{Results}
 analysis for Quizzes~2 and~3, we found that }\textit{\DIFadd{higher}} \DIFadd{estimated knowledge
 for a given question predicted a greater likelihood of answering it correctly
 (Quiz~2: $OR = 2.905,\ \lambda_{LR} = 17.333,\ 95\%\ \textnormal{CI} =
-[14.966,\ 29.309],\ p = 0.002$; Quiz~3: $OR = 3.238,\ \lambda_{LR} = 6.882,\
-95\%\ \textnormal{CI} = [6.228,\ 8.184],\ p = 0.017$). Taken together, these
+[14.966,\ 29.309],\ p = 0.001$; Quiz~3: $OR = 3.238,\ \lambda_{LR} = 6.882,\
+95\%\ \textnormal{CI} = [6.228,\ 8.184],\ p = 0.016$). Taken together, these
 results suggest that our knowledge estimates reliably predict participants'
 performance on individual held-out quiz questions, but only after participants
 have received at least some training.
@@ -868,7 +868,7 @@ \section*{Results}
 difference did }\textit{\DIFdel{not}} %DIFAUXCMD
 \DIFdel{hold for }\DIFdelend \DIFaddbegin \DIFadd{-related questions did not reliably predict whether those questions were
 answered correctly ($OR = 1.891,\ \lambda_{LR} = 2.293,\ 95\%\ \textnormal{CI}
-= [2.091,\ 2.622],\ p = 0.139$). The same was true of knowledge estimates for
+= [2.091,\ 2.622],\ p = 0.138$). The same was true of knowledge estimates for
 held-out }\textit{\DIFadd{Birth of Stars}}\DIFadd{-related questions based on other }\textit{\DIFadd{Birth
 of Stars}}\DIFadd{-related questions ($OR = 0.722,\ \lambda_{LR} = 5.115,\ 95\%\
 \textnormal{CI} = [0.094,\ 0.146],\ p = 0.738$). As in our analysis that
@@ -880,12 +880,12 @@ \section*{Results}
 Stars}\DIFdelbegin \DIFdel{questions
 }\DIFdelend \DIFaddbegin \DIFadd{), we found that they now reliably predicted success on }\textit{\DIFadd{Four
 Fundamental Forces}}\DIFadd{-related questions ($OR = 9.023,\ \lambda_{LR} = 18.707,\
-95\%\ \textnormal{CI} = [10.877,\ 22.222],\ p = 0.001$) but not on
+95\%\ \textnormal{CI} = [10.877,\ 22.222],\ p < 0.001$) but not on
 }\textit{\DIFadd{Birth of Stars}}\DIFadd{-related questions }\DIFaddend (\DIFdelbegin \DIFdel{$\U = 7419,~p = 0.739$). Again, we suggest that this might reflect
 a floor
 effect whereby, at that point in the participants' training, their knowledge
 about the content of the }\DIFdelend \DIFaddbegin \DIFadd{$OR = 0.306,\ \lambda_{LR} = 5.115,\
-95\%\ \textnormal{CI} = [4.624,\ 5.655],\ p = 0.055$). Here, we speculate that
+95\%\ \textnormal{CI} = [4.624,\ 5.655],\ p = 0.054$). Here, we speculate that
 participants might have been guessing about the }\DIFaddend \textit{Birth of Stars} \DIFdelbegin \DIFdel{material is relatively low
 everywhere in that region of text embedding space.
 }%DIFDELCMD <
@@ -908,7 +908,7 @@ \section*{Results}
 \DIFdel{questions from the same quiz and
 participant ($\U = 6126,~p = 0.006$}\DIFdelend \DIFaddbegin \DIFadd{-related
 questions could now reliably predict success on those questions ($OR = 5.467,\
-\lambda_{LR} = 10.670,\ 95\%\ \textnormal{CI} = [7.998, 12.532],\ p = 0.006$}\DIFaddend ).
+\lambda_{LR} = 10.670,\ 95\%\ \textnormal{CI} = [7.998, 12.532],\ p = 0.005$}\DIFaddend ).
 However, \DIFdelbegin \DIFdel{we found the }\textit{\DIFdel{opposite}}
 %DIFAUXCMD
 \DIFdel{effect when we carried out }\DIFdelend within-lecture knowledge \DIFdelbegin \DIFdel{predictions for held-out
@@ -925,7 +925,7 @@ \section*{Results}
 likelihood of successfully answering them and instead exhibited the inverse
 relationship we would expect to arise from unstructured knowledge (with respect
 to the embedding space; $OR = 0.013,\ \lambda_{LR} = 14.648,\ 95\%\
-\textnormal{CI} = [10.695, 23.096],\ p = 0.001$). }\DIFaddend Speculatively, we suggest
+\textnormal{CI} = [10.695, 23.096],\ p < 0.001$). }\DIFaddend Speculatively, we suggest
 that this may reflect participants forgetting some of the \textit{Four
 Fundamental Forces} content \DIFaddbegin \DIFadd{(e.g., perhaps in favor of prioritizing encoding
 the just-watched }\textit{\DIFadd{Birth of Stars}} \DIFadd{content in preparation for the third
@@ -952,11 +952,11 @@ \section*{Results}
 participants' abilities to correctly answer questions about }\textit{\DIFadd{Four
 Fundamental Forces}} \DIFadd{could be predicted from their responses to questions about
 }\textit{\DIFadd{Birth of Stars}} \DIFadd{($OR = 1.896,\ \lambda_{LR} = 7.205,\ 95\%\
-\textnormal{CI} = [6.224, 7.524],\ p = 0.039$) and similarly, that their
+\textnormal{CI} = [6.224, 7.524],\ p = 0.038$) and similarly, that their
 ability to correctly answer }\textit{\DIFadd{Birth of Stars}}\DIFadd{-related questions could be
 predicted from their responses to }\textit{\DIFadd{Four Fundamental Forces}}\DIFadd{-related
 questions ($OR = 1.522,\ \lambda_{LR} = 6.448,\ 95\%\ \textnormal{CI} = [5.656,
-6.843],\ p = 0.043$). Given the results from our analyses that included all
+6.843],\ p = 0.042$). Given the results from our analyses that included all
 questions and within-lecture predictions, we were surprised to find }\DIFaddend that the
 knowledge \DIFdelbegin \DIFdel{predictions are generalizable across the content areas spanned by the two lectures, while also specific enough to }\DIFdelend \DIFaddbegin \DIFadd{estimates could reliably (if weakly) predict participants'
 performance across content from different lectures. It is possible that this
@@ -967,10 +967,10 @@ \section*{Results}
 responses to }\textit{\DIFadd{Four Fundamental Forces}}\DIFadd{-related questions did
 }\textit{\DIFadd{not}} \DIFadd{reliably predict their success on }\textit{\DIFadd{Birth of Stars}}\DIFadd{-related
 questions ($OR = 1.865,\ \lambda_{LR} = 3.205,\ 95\%\ \textnormal{CI} = [3.027,
-3.600],\ p = 0.125$), nor did their responses to }\textit{\DIFadd{Birth of
+3.600],\ p = 0.124$), nor did their responses to }\textit{\DIFadd{Birth of
 Stars}}\DIFadd{-related questions reliably predict their success on }\textit{\DIFadd{Four
 Fundamental Forces}}\DIFadd{-related questions ($OR = 3.490,\ \lambda_{LR} = 3.266,\
-95\%\ \textnormal{CI} = [3.033, 3.866],\ p = 0.094$). These ``prediction
+95\%\ \textnormal{CI} = [3.033, 3.866],\ p = 0.093$). These ``prediction
 failures'' appear to come from the fact that any signal derived from
 participants' knowledge about the content of the }\textit{\DIFadd{Birth of Stars}}
 \DIFadd{lecture (prior to watching it) is overwhelmed by the much more dramatic increase in
@@ -983,9 +983,9 @@ \section*{Results}
 questions from Quiz~3 (when participants had now viewed }\textit{\DIFadd{both}}
 \DIFadd{lectures), we could again reliably predict success on questions about both
 }\textit{\DIFadd{Four Fundamental Forces}} \DIFadd{($OR = 11.294),\ \lambda_{LR} = 11.055,\ 95\%\
-\textnormal{CI} = [9.126, 18.476],\ p = 0.004$) and }\textit{\DIFadd{Birth of Stars}}
+\textnormal{CI} = [9.126, 18.476],\ p = 0.003$) and }\textit{\DIFadd{Birth of Stars}}
 \DIFadd{($OR = 7.302,\ \lambda_{LR} = 7.068,\ 95\%\ \textnormal{CI} = [6.490, 8.584],\
-p = 0.017$) using responses to questions about the other lecture's content.
+p = 0.016$) using responses to questions about the other lecture's content.
 Across all three versions of these analyses, our results suggest that (by and
 large) our knowledge estimates can reliably predict participants' abilities to
 answer individual quiz questions, }\DIFaddend distinguish between questions about \DIFdelbegin \DIFdel{more subtly different contentwithin the
@@ -994,14 +994,7 @@ \section*{Results}
 responses reflect a minimum level of ``real'' knowledge about both content on
 which these predictions are based and that for which they are made}\DIFaddend .
 
-%DIF > our approach works when participants have a minimal baseline level of knowledge about content predicted and used to predict
-%DIF > our approach generalizes when knowledge of content used to predict can be assumed to be a reasonable indicator of knowledge of content predicted
-%DIF > our approach has enough specificity to distinguish between content within the same lecture when it was just watched -- maybe when people forget a little bit they forget "randomly"?.
-\DIFaddbegin
-
-%DIF > potential new transition/motivation -- in the previous analyses, we identified a particular set of constraints on our estimates of participants' knowledge. This made us wonder about another potential constraint: how far away in topic space does the relevance of being able to answer a question extend and influence ability to answer a different question?
-
-\DIFaddend That the knowledge predictions derived from the text embedding space reliably
+That the knowledge predictions derived from the text embedding space reliably
 distinguish between held-out correctly versus incorrectly answered questions
 (Fig.~\ref{fig:predictions}) suggests that spatial relationships within this
 space can help explain what participants know. But how far does this
@@ -1765,8 +1758,6 @@ \subsubsection*{Estimating dynamic knowledge traces}\label{subsec:traces}
 %DIF > %We chose this stopping criterion as a conceptual ``middle ground'' between two popular but opposing approaches to model selection that advocate (respectively) for either retaining the maximal model that allows convergence, regardless of singular fits~\citep[at the potential cost of decreased power; e.g.,~][]{BarrEtal13} or testing individual parameters achieving a parsimonious model by discarding all parameters that don't significantly decrease goodness of fit ~\citep[at the potential cost of increased Type I error rates; e.g.,~][]{BateEtal15b}.
 %DIF > Our threshold for inclusion of random effects is intended to achieve a reasonable balance between these trade-offs.
 
-%DIF > To assess the predictive value of our knowledge estimates for individual quiz questions, we used the same sets of observations used to fit each GLMM to fit a second set of ``null'' models (similarly with a logistic link function). We fit these models with the formula:
-
 \DIFadd{To assess the predictive value of our knowledge estimates, we compared each
 GLMM's ability to discriminate between correctly and incorrectly answered
 questions to that of an analogous model that did }\textit{\DIFadd{not}} \DIFadd{consider estimated
@@ -1780,15 +1771,15 @@ \subsubsection*{Estimating dynamic knowledge traces}\label{subsec:traces}
 We then compared each full model to its reduced (null) equivalent using a likelihood-ratio test (LRT).
 Because the typical asymptotic $\chi^2_d$ approximation of the null distribution for the LRT statistic ($\lambda_{LR}$) is anti-conservative for models that differ in their random slope terms~\mbox{%DIFAUXCMD
 \citep{GoldSimo00,ScheEtal08b,SnijBosk11}}\hskip0pt%DIFAUXCMD
-, we computed $p$-values for these tests using a parametric bootstrapping procedure~\mbox{%DIFAUXCMD
-\citep{HaleHojs14}}\hskip0pt%DIFAUXCMD
+, we computed $p$-values for these tests using a parametric bootstrap procedure~\mbox{%DIFAUXCMD
+\citep{DaviHink97,HaleHojs14}}\hskip0pt%DIFAUXCMD
 .
 For each of 1,000 bootstraps, we used the fitted null model to simulate a sample of observations of equal size to our original sample.
 We then re-fit both the null and full models to this simulated sample and compared them via an LRT.
 This yielded a distribution of $\lambda_{LR}$ statistics we may expect to observe under our null hypothesis.
 Following~\mbox{%DIFAUXCMD
-\citep{DaviHink97,NortEtal02}}\hskip0pt%DIFAUXCMD
-, we computed a corrected $p$-value for our observed $\lambda_{LR}$ as $\frac{r + 1}{n + 1}$, where $r$ is the number of simulated model comparisons that yielded a $\lambda_{LR}$ greater than or equal to our observed value and $n$ is the number of simulations we ran (1,000).
+\citet{Ewen03}}\hskip0pt%DIFAUXCMD
+, we computed a corrected $p$-value for our observed $\lambda_{LR}$ as $\frac{r}{n}$, where $r$ is the number of simulated model comparisons that yielded a $\lambda_{LR}$ greater than or equal to our observed value and $n$ is the number of simulations we ran (1,000).
 }
 
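The parametric-bootstrap procedure added in this hunk can be sketched in a few lines. This is a minimal toy, not the paper's actual GLMM pipeline: it substitutes a closed-form Gaussian mean test (null: $\mu = 0$; full: $\mu$ estimated) for the logistic GLMMs so that both "models" can be refit instantly, and all variable names are illustrative. The loop structure — simulate from the fitted null, refit both models, compare $\lambda_{LR}$ values — mirrors the text, and both the $\frac{r}{n}$ estimator the diff adopts and the $\frac{r + 1}{n + 1}$ estimator it replaces are shown.

```python
import numpy as np

rng = np.random.default_rng(42)

def log_lik(y, mu):
    # Gaussian log-likelihood with known unit variance
    # (toy stand-in for the paper's logistic GLMMs)
    return -0.5 * np.sum((y - mu) ** 2) - 0.5 * len(y) * np.log(2 * np.pi)

def lrt(y):
    # lambda_LR = 2 * (ll_full - ll_null); the null fixes mu = 0,
    # the full model estimates mu from the data
    return 2.0 * (log_lik(y, y.mean()) - log_lik(y, 0.0))

y_obs = rng.normal(0.4, 1.0, size=50)  # illustrative "observed" sample
observed = lrt(y_obs)

# Parametric bootstrap: simulate samples of equal size from the fitted
# null model, refit both models to each, and collect the resulting null
# distribution of lambda_LR statistics.
n = 1000
sims = np.array([lrt(rng.normal(0.0, 1.0, size=len(y_obs))) for _ in range(n)])
r = int(np.sum(sims >= observed))  # simulations at or above the observed value

p_ewens = r / n            # r/n estimator, as adopted in the revised text
p_dh = (r + 1) / (n + 1)   # (r+1)/(n+1) estimator used in the prior draft
```

Note that $(r + 1)/(n + 1)$ is always at least as large as $r/n$, so the switch between estimators can only nudge borderline $p$-values downward, consistent with the one-thousandth-place changes throughout this diff.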
\DIFaddend \subsubsection*{Estimating the ``smoothness'' of knowledge}\label{subsec:smoothness}
-2.17 KB
Binary file not shown.
85 Bytes
Binary file not shown.
-116 Bytes
Binary file not shown.

paper/main.pdf

-2.03 KB
Binary file not shown.
