@@ -562,9 +562,9 @@ \section*{Results}
three quizzes. Then, separately for each quiz, we fit a generalized linear
mixed model (GLMM) with a logistic link function to explain the probability of
correctly answering a question as a function of estimated knowledge for its
embedding coordinate, while accounting for varied effects of individual
participants and questions (see \nameref{subsec:glmm}). To assess the
predictive value of the knowledge estimates, we compared each GLMM to an
analogous (i.e., nested) ``null'' model that assumed these estimates carried
no predictive information, using parametric bootstrap likelihood-ratio tests.
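The logic of this comparison is straightforward to sketch in code. The Python fragment below is a minimal illustration of a parametric bootstrap likelihood-ratio test, not the implementation used here: \texttt{fit\_full}, \texttt{fit\_null}, and \texttt{simulate\_responses} are hypothetical placeholders for fitting the two GLMMs and for drawing simulated response vectors from the fitted null model, and \texttt{data} is assumed to be a table of responses with a \texttt{correct} column.
\begin{verbatim}
import numpy as np

def parametric_bootstrap_lrt(data, fit_full, fit_null, simulate_responses,
                             n_boot=1000, seed=0):
    """Parametric bootstrap likelihood-ratio test for nested models.

    fit_full / fit_null are hypothetical helpers that fit the full and
    null GLMMs and return objects exposing a .loglik attribute;
    simulate_responses draws a correct/incorrect vector from the fitted
    null model.
    """
    rng = np.random.default_rng(seed)
    full, null = fit_full(data), fit_null(data)
    observed = 2.0 * (full.loglik - null.loglik)  # observed LR statistic

    boot = np.empty(n_boot)
    for b in range(n_boot):
        sim = data.copy()
        sim["correct"] = simulate_responses(null, rng)  # simulate under H0
        boot[b] = 2.0 * (fit_full(sim).loglik - fit_null(sim).loglik)

    # p-value: proportion of null-simulated statistics at least as extreme
    p_value = (np.sum(boot >= observed) + 1) / (n_boot + 1)
    return observed, p_value
\end{verbatim}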
\begin{figure}[tp]
@@ -611,9 +611,9 @@ \section*{Results}
Fig.~\ref{fig:predictions}, bottom rows). This test was intended to assess the
\textit{generalizability} of our approach by asking whether our predictions
could extend across the content areas of the two lectures. When estimating
participants' knowledge, we used a rebalancing procedure to ensure that (for a
given participant and quiz) their knowledge estimates for correctly and
incorrectly answered questions were computed from the same underlying
proportion of correctly answered questions (see~\nameref{subsec:glmm}).

When we fit a GLMM to estimates of participants' knowledge for each Quiz~1
@@ -626,7 +626,7 @@ \section*{Results}
p < 0.001$) and again for Quiz~3 ($OR = 37.409,\ 95\%\ \textnormal{CI} =
[10.425,\ 107.145],\ \lambda_{LR} = 40.948,\ p < 0.001$). Taken together, these
results suggest that our knowledge estimates can reliably predict participants'
performance on individual questions when those estimates incorporate
information from all (other) quiz content.

We observed a similar set of results when we restricted our estimates of
@@ -657,19 +657,19 @@ \section*{Results}
questions incorrectly, and all but five participants (out of 50) answered two or
fewer questions incorrectly. (This was the only subset of questions about either
lecture, across all three quizzes, for which this was true.) Because of this,
when we held out one incorrectly answered
\textit{Four Fundamental Forces}-related question from a given participant's
Quiz~3 responses and estimated their knowledge at its embedding coordinate using
the remaining \textit{Four Fundamental Forces}-related questions they answered,
for 90\% of participants, that estimate leveraged information about at most a
single other question they were \textit{not} able to correctly answer. This
homogeneity in participants' success on questions used to estimate their
knowledge may have hurt our ability to accurately characterize the specific (and
by Quiz~3, relatively few) aspects of the lecture content they did \textit{not}
know about. Taken together, these results suggest that our knowledge estimates
can reliably distinguish between questions about different content covered by a
single lecture, provided there is sufficient diversity in participants' quiz
responses to extract meaningful information about both what they know and what
they do not know.

Finally, when we estimated participants' knowledge for each question about one
@@ -683,9 +683,10 @@ \section*{Results}
answer \textit{Birth of Stars}-related questions be predicted from their
responses to \textit{Four Fundamental Forces}-related questions ($OR = 1.522,\
95\%\ \textnormal{CI} = [0.332,\ 6.835],\ \lambda_{LR} = 0.286,\ p = 0.611$).
Similarly, we found that participants' performance on questions about either
lecture could not be predicted given their responses to questions about the
other lecture after viewing \textit{Four Fundamental Forces} but before viewing
\textit{Birth of Stars} (i.e., on Quiz~2; \textit{Four Fundamental Forces}
questions given \textit{Birth of Stars} questions: $OR = 3.49,\ 95\%\
\textnormal{CI} = [0.739,\ 12.849],\ \lambda_{LR} = 3.266,\ p = 0.083$;
\textit{Birth of Stars} questions given \textit{Four Fundamental Forces}
@@ -726,7 +727,7 @@ \section*{Results}
beyond the maximum distance at which the participant's ability to answer the
question at $x$ is informative of their ability to answer a second question at
location $y$, then guessing the outcome at $y$ based on $x$ should be no more
successful than guessing based on a measure that does not consider
embedding-space distance.

\begin{figure}[t]
@@ -770,22 +771,22 @@ \section*{Results}
quizzes or regions of the embedding space.

Knowledge estimates need not be limited to the contents of these particular
lectures and quizzes. As illustrated in Figure~\ref{fig:knowledge-maps}, our
general approach to estimating knowledge from a small number of quiz questions
may be extended to \textit{any} content, given its text embedding coordinate. To
visualize how knowledge ``spreads'' through text embedding space to content
beyond the lectures participants watched and the questions they answered, we
first fit a new topic model to the lectures' sliding windows with $k =
100$~topics. Conceptually, increasing the number of topics used by the model
increases the ``resolution'' of the embedding space, providing a greater
ability to estimate knowledge for content that is highly similar to (but not
precisely the same as) that contained in the two lectures used to train the
model. We note that we used these 2D maps solely for visualization; all relevant
comparisons, distance computations, and statistical tests we report above were
carried out in the original 15-dimensional space, using the 15-topic model.
Aside from increasing the number of topics from 15 to 100, all other procedures
and model parameters were carried over from the preceding analyses. As in our
other analyses, we resampled each lecture's topic trajectory to 1~Hz and
projected each question into a shared text embedding space.
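The excerpt above does not include the corresponding code; as a rough sketch of the procedure it describes, a $k = 100$-topic model could be fit to the lectures' sliding windows and used to project the quiz questions into the same space with scikit-learn (here \texttt{window\_texts} and \texttt{question\_texts} are hypothetical lists of strings, and text preprocessing is omitted):
\begin{verbatim}
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def embed_lectures_and_questions(window_texts, question_texts, n_topics=100):
    """Fit a topic model to the lectures' sliding windows and project the
    quiz questions into the same topic space (a sketch, not the paper's code).
    """
    vectorizer = CountVectorizer(stop_words="english")
    window_counts = vectorizer.fit_transform(window_texts)

    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    window_topics = lda.fit_transform(window_counts)    # lecture trajectory
    question_topics = lda.transform(
        vectorizer.transform(question_texts))           # shared embedding space
    return window_topics, question_topics
\end{verbatim}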
\begin{figure}[tp]
@@ -901,17 +902,17 @@ \section*{Discussion}
model, and how much their knowledge of those concepts changes with training
(Fig.~\ref{fig:knowledge-maps}).

Our work makes several contributions to the study of how people acquire
conceptual knowledge. First, from a methodological standpoint, our modeling
framework provides a systematic means of mapping out and characterizing
knowledge in maps that have infinite (arbitrarily many) numbers of coordinates,
and of ``filling out'' those maps using relatively small numbers of
multiple-choice quiz questions. Our experimental finding that we can use these
maps to predict responses to held-out questions has several psychological
implications as well. For example, concepts that are assigned to nearby
coordinates by the text embedding model also appear to be ``known to a similar
extent'' (as reflected by participants' responses to held-out questions;
Fig.~\ref{fig:predictions}). This suggests that participants also
\textit{conceptualize} the content reflected by nearby embedding coordinates in
similar ways. How participants' knowledge falls off with spatial distance is
captured by the knowledge maps we infer from their quiz responses
@@ -1244,7 +1245,7 @@ \subsection*{Analysis}
\subsubsection*{Statistics}

All of the statistical tests performed in our study were two-sided. The 95\%
confidence intervals we reported for each correlation were estimated from
bootstrap distributions of 10,000 correlation coefficients obtained by
sampling (with replacement) from the observed data.
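For reference, a bootstrapped confidence interval of this kind can be computed with a few lines of Python (a generic sketch, not the analysis code used in the study):
\begin{verbatim}
import numpy as np

def bootstrap_corr_ci(x, y, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a Pearson correlation."""
    x, y = np.asarray(x), np.asarray(y)
    rng = np.random.default_rng(seed)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, len(x), len(x))  # resample pairs with replacement
        boot[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    return np.quantile(boot, [alpha / 2, 1 - alpha / 2])
\end{verbatim}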
@@ -1361,15 +1362,15 @@ \subsubsection*{Estimating dynamic knowledge traces}\label{subsec:traces}
$s$\textsuperscript{th} topic vector from the set of topic vectors $\Omega$.
Here $t$ indexes the set of lecture topic vectors $L$, and $i$ and $j$ index
the topic vectors of questions $Q$ used to estimate the knowledge trace. Note
that ``$\mathrm{correct}$'' denotes the set of indices of the questions the
participant answered correctly on the given quiz.

Intuitively, $\mathrm{ncorr}(x, y)$ is the correlation between two topic
vectors (e.g., the topic vector $x$ for one timepoint in a lecture and the
topic vector $y$ for one question on a quiz), normalized by the minimum and
maximum correlations (across all timepoints $t$ and questions $j$) to range
between 0 and 1, inclusive. Equation~\ref{eqn:prop} then computes the weighted
average proportion of correctly answered questions about the content presented
at timepoint $t$, where the weights are given by the normalized correlations
between timepoint $t$'s topic vector and the topic vectors for each question.
The normalization step (i.e., using $\mathrm{ncorr}$ instead of the raw
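Although Equation~\ref{eqn:prop} itself appears just above this excerpt, the weighted proportion it describes can be sketched directly from the description here (a simplified illustration, not the paper's code; \texttt{lecture\_topics} is assumed to be a timepoints-by-topics matrix, \texttt{question\_topics} a questions-by-topics matrix, and \texttt{correct} a Boolean vector of per-question accuracy):
\begin{verbatim}
import numpy as np

def knowledge_trace(lecture_topics, question_topics, correct):
    """Weighted proportion of correctly answered questions at each lecture
    timepoint, weighted by min-max-normalized correlations between
    timepoint and question topic vectors (a sketch of Eq. prop)."""
    n_timepoints = len(lecture_topics)
    # correlations between every lecture timepoint and every question
    corrs = np.corrcoef(lecture_topics, question_topics)[:n_timepoints,
                                                         n_timepoints:]
    # normalize across all timepoints and questions to the range [0, 1]
    ncorr = (corrs - corrs.min()) / (corrs.max() - corrs.min())
    correct = np.asarray(correct, dtype=bool)
    # weighted proportion of correctly answered questions per timepoint
    return ncorr[:, correct].sum(axis=1) / ncorr.sum(axis=1)
\end{verbatim}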
@@ -1506,7 +1507,8 @@ \subsubsection*{Generalized linear mixed models}\label{subsec:glmm}
To assess the predictive value of our knowledge estimates, we compared each
GLMM's ability to explain participants' success on individual quiz questions to
that of an analogous model which assumed (as we assume under our null
hypothesis) that knowledge estimates for correctly and incorrectly answered
questions did \textit{not} systematically differ, on average. Specifically, we
used the same sets of observations with which we fit each ``full'' model to fit
a second ``null'' model that had the same random effects structure, but in which
@@ -1654,11 +1656,12 @@ \subsubsection*{Creating knowledge and learning map visualizations}\label{subsec
  \hat{k}(x) = \frac{\sum_{i \in \mathrm{correct}} \mathrm{RBF}(x, q_i, \lambda)}{\sum_{j = 1}^N \mathrm{RBF}(x, q_j, \lambda)}.
  \label{eqn:rbf-knowledge}
\end{equation}
Equation~\ref{eqn:rbf-knowledge} computes the weighted proportion of correctly
answered questions, where the weights are given by how close (in the 2D space)
each question is to $x$. We also defined \textit{learning maps} as the
coordinate-by-coordinate differences between any pair of knowledge maps.
Intuitively, learning maps reflect the \textit{change} in knowledge across two
maps.
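In code, Equation~\ref{eqn:rbf-knowledge} amounts to a ratio of two kernel-weighted sums evaluated over a grid of 2D coordinates. The sketch below assumes a Gaussian kernel, $\mathrm{RBF}(x, q, \lambda) = \exp(-\lVert x - q \rVert^2 / \lambda^2)$; the exact kernel and bandwidth used in the paper are defined above, outside this excerpt.
\begin{verbatim}
import numpy as np

def knowledge_map(grid_coords, question_coords, correct, lam):
    """Evaluate the knowledge map (Eq. rbf-knowledge) at each grid coordinate.

    Assumes a Gaussian kernel exp(-||x - q||^2 / lam^2); the exact RBF used
    in the paper is defined above, outside this excerpt.
    """
    grid_coords = np.asarray(grid_coords, dtype=float)
    question_coords = np.asarray(question_coords, dtype=float)
    diffs = grid_coords[:, None, :] - question_coords[None, :, :]
    weights = np.exp(-np.sum(diffs ** 2, axis=-1) / lam ** 2)
    correct = np.asarray(correct, dtype=bool)
    return weights[:, correct].sum(axis=1) / weights.sum(axis=1)

# A learning map is then the coordinate-by-coordinate difference between
# two knowledge maps (e.g., after minus before a lecture) on the same grid.
\end{verbatim}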
\section*{Author contributions}