
Commit 379965f

input jeremy's suggestions
1 parent fa47b3c commit 379965f

2 files changed: +65 -62 lines changed

paper/main.pdf

-1 Bytes
Binary file not shown.

paper/main.tex

Lines changed: 65 additions & 62 deletions
@@ -562,9 +562,9 @@ \section*{Results}
 three quizzes. Then, separately for each quiz, we fit a generalized linear
 mixed model (GLMM) with a logistic link function to explain the probability of
 correctly answering a question as a function of estimated knowledge for its
-embedding coordinate, while accounting for varied effects of individual
-participants and questions (see \nameref{subsec:glmm}). To assess the predictive
-value of the knowledge estimates, we compared each GLMM to an analogous (i.e.,
+embedding coordinate, while accounting for varied effects of individual
+participants and questions (see \nameref{subsec:glmm}). To assess the predictive
+value of the knowledge estimates, we compared each GLMM to an analogous (i.e.,
 nested) ``null'' model that assumed these estimates carried no predictive information using parametric bootstrap likelihood-ratio tests.
 
 \begin{figure}[tp]
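
To make the comparison described in this hunk concrete, here is a minimal Python sketch of a parametric bootstrap likelihood-ratio test. It is not the authors' code: it substitutes a plain logistic regression for the full GLMM (omitting the by-participant and by-question random effects), and the data-frame columns `correct` and `knowledge` are hypothetical names used only for illustration.

# Illustrative sketch only: plain logistic regression stands in for the GLMM,
# and the column names below are hypothetical.
import numpy as np
import statsmodels.formula.api as smf

def lr_stat(df):
    """Likelihood-ratio statistic: knowledge model vs. intercept-only null model."""
    full = smf.logit("correct ~ knowledge", data=df).fit(disp=0)
    null = smf.logit("correct ~ 1", data=df).fit(disp=0)
    return 2 * (full.llf - null.llf)

def parametric_bootstrap_lr(df, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    observed = lr_stat(df)
    # Simulate responses under the fitted null model, then refit both models.
    null = smf.logit("correct ~ 1", data=df).fit(disp=0)
    p_null = null.predict(df)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        sim = df.copy()
        sim["correct"] = rng.binomial(1, p_null)
        boot[b] = lr_stat(sim)
    p_value = (boot >= observed).mean()
    return observed, p_value

The bootstrap logic (simulate under the null, refit, compare the observed statistic to the simulated distribution) is the point here; the paper's actual full and null models are GLMMs with matching random-effects structures.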
@@ -611,9 +611,9 @@ \section*{Results}
 Fig.~\ref{fig:predictions}, bottom rows). This test was intended to assess the
 \textit{generalizability} of our approach by asking whether our predictions
 could extend across the content areas of the two lectures. When estimating
-participants' knowledge, we used a rebalancing procedure to ensure that (for a
-given participant and quiz) their knowledge estimates for correctly and
-incorrectly answered questions were computed from the same underlying proportion
+participants' knowledge, we used a rebalancing procedure to ensure that (for a
+given participant and quiz) their knowledge estimates for correctly and
+incorrectly answered questions were computed from the same underlying proportion
 of correctly answered questions (see~\nameref{subsec:glmm}).
 
 When we fit a GLMM to estimates of participants' knowledge for each Quiz~1
@@ -626,7 +626,7 @@ \section*{Results}
 p < 0.001$) and again for Quiz~3 ($OR = 37.409,\ 95\%\ \textnormal{CI} =
 [10.425,\ 107.145],\ \lambda_{LR} = 40.948,\ p < 0.001$). Taken together, these
 results suggest that our knowledge estimates can reliably predict participants'
-performance on individual questions when they incorporate information from all
+performance on individual questions when they incorporate information from all
 (other) quiz content.
 
 We observed a similar set of results when we restricted our estimates of
@@ -657,19 +657,19 @@ \section*{Results}
 questions incorrectly, and all but five participants (out of 50) answered two or
 fewer questions incorrectly. (This was the only subset of questions about either
 lecture, across all three quizzes, for which this was true.) Because of this,
-when we held out one incorrectly answered
-\textit{Four Fundamental Forces}-related question from a given participant's
-Quiz~3 responses and estimated their knowledge at its embedding coordinate using
-the remaining \textit{Four Fundamental Forces}-related questions they answered,
-for 90\% of participants, that estimate leveraged information about at most a
+when we held out one incorrectly answered
+\textit{Four Fundamental Forces}-related question from a given participant's
+Quiz~3 responses and estimated their knowledge at its embedding coordinate using
+the remaining \textit{Four Fundamental Forces}-related questions they answered,
+for 90\% of participants, that estimate leveraged information about at most a
 single other question they were \textit{not} able to correctly answer. This
-broad homogeneity in participants' success on questions used to estimate their
-knowledge may have hurt our ability to accurately characterize the specific (and
-by Quiz~3, relatively few) aspects of the lecture content they did \textit{not}
-know about. Taken together, these results suggest that our knowledge estimates
-can reliably distinguish between questions about different content covered by a
-single lecture, provided there is sufficient diversity in participants' quiz
-responses to extract meaningful information about both what they know and what
+homogeneity in participants' success on questions used to estimate their
+knowledge may have hurt our ability to accurately characterize the specific (and
+by Quiz~3, relatively few) aspects of the lecture content they did \textit{not}
+know about. Taken together, these results suggest that our knowledge estimates
+can reliably distinguish between questions about different content covered by a
+single lecture, provided there is sufficient diversity in participants' quiz
+responses to extract meaningful information about both what they know and what
 they do not know.
 
 Finally, when we estimated participants' knowledge for each question about one
@@ -683,9 +683,10 @@ \section*{Results}
 answer \textit{Birth of Stars}-related questions be predicted from their
 responses to \textit{Four Fundamental Forces}-related questions ($OR = 1.522,\
 95\%\ \textnormal{CI} = [0.332,\ 6.835],\ \lambda_{LR} = 0.286,\ p = 0.611$).
-We similarly found that participants' success on questions about either lecture
-could not be predicted given their responses to questions about the other
-lecture after viewing \textit{Four Fundamental Forces} but before viewing \textit{Birth of Stars} (i.e., on Quiz~2; \textit{Four Fundamental Forces}
+Similarly, we found that participants' performance on questions about either
+lecture could not be predicted given their responses to questions about the
+other lecture after viewing \textit{Four Fundamental Forces} but before viewing
+\textit{Birth of Stars} (i.e., on Quiz~2; \textit{Four Fundamental Forces}
 questions given \textit{Birth of Stars} questions: $OR = 3.49,\ 95\%\
 \textnormal{CI} = [0.739,\ 12.849],\ \lambda_{LR} = 3.266,\ p = 0.083$;
 \textit{Birth of Stars} questions given \textit{Four Fundamental Forces}
@@ -726,7 +727,7 @@ \section*{Results}
 beyond the maximum distance at which the participant's ability to answer the
 question at $x$ is informative of their ability to answer a second question at
 location $y$, then guessing the outcome at $y$ based on $x$ should be no more
-successful than guessing based on a measure that does not consider
+successful than guessing based on a measure that does not consider
 embedding-space distance.
 
 \begin{figure}[t]
@@ -770,22 +771,22 @@ \section*{Results}
 quizzes or regions of the embedding space.
 
 Knowledge estimates need not be limited to the contents of these particular
-lectures and quizzes. As illustrated in Figure~\ref{fig:knowledge-maps}, our
-general approach to estimating knowledge from a small number of quiz questions
-may be extended to \textit{any} content, given its text embedding coordinate. To
-visualize how knowledge ``spreads'' through text embedding space to content
-beyond the lectures participants watched and the questions they answered, we
-first fit a new topic model to the lectures' sliding windows with $k =
-100$~topics. Conceptually, increasing the number of topics used by the model
-functions to increase the ``resolution'' of the embedding space, providing a
-greater ability to estimate knowledge for content that is highly similar to (but
-not precisely the same as) that contained in the two lectures used to train the
-model. We note that we used these 2D maps solely for visualization; all relevant
-comparisons, distance computations, and statistical tests we report above were
-carried out in the original 15-dimensional space, using the 15-topic model.
-Aside from increasing the number of topics from 15 to 100, all other procedures
-and model parameters were carried over from the preceding analyses. As in our
-other analyses, we resampled each lecture's topic trajectory to 1~Hz and
+lectures and quizzes. As illustrated in Figure~\ref{fig:knowledge-maps}, our
+general approach to estimating knowledge from a small number of quiz questions
+may be extended to \textit{any} content, given its text embedding coordinate. To
+visualize how knowledge ``spreads'' through text embedding space to content
+beyond the lectures participants watched and the questions they answered, we
+first fit a new topic model to the lectures' sliding windows with $k =
+100$~topics. Conceptually, increasing the number of topics used by the model
+functions to increase the ``resolution'' of the embedding space, providing a
+greater ability to estimate knowledge for content that is highly similar to (but
+not precisely the same as) that contained in the two lectures used to train the
+model. We note that we used these 2D maps solely for visualization; all relevant
+comparisons, distance computations, and statistical tests we report above were
+carried out in the original 15-dimensional space, using the 15-topic model.
+Aside from increasing the number of topics from 15 to 100, all other procedures
+and model parameters were carried over from the preceding analyses. As in our
+other analyses, we resampled each lecture's topic trajectory to 1~Hz and
 projected each question into a shared text embedding space.
 
 \begin{figure}[tp]
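
As a rough illustration of the topic-model embedding described in this hunk, the Python sketch below fits a $k$-topic model to sliding windows of transcript text and projects arbitrary text (e.g., a quiz question) into the same topic space. It is not the authors' pipeline: the use of scikit-learn's LatentDirichletAllocation, the vectorizer settings, and all variable names are assumptions for illustration.

# Illustrative sketch (not the authors' pipeline): fit a k-topic model to
# sliding transcript windows, then embed any text in the same topic space.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def fit_topic_model(windows, k=100, seed=0):
    """windows: list of overlapping transcript windows (strings)."""
    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(windows)
    lda = LatentDirichletAllocation(n_components=k, random_state=seed)
    window_topics = lda.fit_transform(counts)   # one topic vector per window
    return vectorizer, lda, window_topics

def embed_text(text, vectorizer, lda):
    """Project arbitrary text into the shared topic (embedding) space."""
    return lda.transform(vectorizer.transform([text]))[0]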
@@ -901,17 +902,17 @@ \section*{Discussion}
 model, and how much their knowledge of those concepts changes with training
 (Fig.~\ref{fig:knowledge-maps}).
 
-We view our work as making several contributions to the study of how people
-acquire conceptual knowledge. First, from a methodological standpoint, our
-modeling framework provides a systematic means of mapping out and
-characterizing knowledge in maps that have infinite (arbitrarily many) numbers
-of coordinates, and of ``filling out'' those maps using relatively small
-numbers of multiple choice quiz questions. Our experimental finding that we can
-use these maps to predict responses to held-out questions has several
-psychological implications as well. For example, concepts that are assigned to
-nearby coordinates by the text embedding model also appear to be ``known to a
-similar extent'' (as reflected by participants' responses to held-out
-questions; Fig.~\ref{fig:predictions}). This suggests that participants also
+Our work makes several contributions to the study of how people acquire
+conceptual knowledge. First, from a methodological standpoint, our modeling
+framework provides a systematic means of mapping out and characterizing
+knowledge in maps that have infinite (arbitrarily many) numbers of coordinates,
+and of ``filling out'' those maps using relatively small numbers of multiple
+choice quiz questions. Our experimental finding that we can use these maps to
+predict responses to held-out questions has several psychological implications
+as well. For example, concepts that are assigned to nearby coordinates by the
+text embedding model also appear to be ``known to a similar extent'' (as
+reflected by participants' responses to held-out questions;
+Fig.~\ref{fig:predictions}). This suggests that participants also
 \textit{conceptualize} similarly the content reflected by nearby embedding
 coordinates. How participants' knowledge falls off with spatial distance is
 captured by the knowledge maps we infer from their quiz responses
@@ -1244,7 +1245,7 @@ \subsection*{Analysis}
 \subsubsection*{Statistics}
 
 All of the statistical tests performed in our study were two-sided. The 95\%
-confidence intervals we reported for each correlation were estimated from
+confidence intervals we reported for each correlation were estimated from
 bootstrap distributions of 10,000 correlation coefficients obtained by
 sampling (with replacement) from the observed data.
 
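
For readers following along, a minimal sketch of the kind of bootstrap confidence interval this hunk describes: resample paired observations with replacement, recompute the correlation each time, and take percentiles of the resulting distribution. The function and variable names are hypothetical, not taken from the paper's code.

# Minimal sketch of a percentile bootstrap CI for a correlation coefficient.
import numpy as np

def bootstrap_corr_ci(x, y, n_boot=10_000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    corrs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample pairs with replacement
        corrs[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    lo, hi = np.percentile(corrs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi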
@@ -1361,15 +1362,15 @@ \subsubsection*{Estimating dynamic knowledge traces}\label{subsec:traces}
 $s$\textsuperscript{th} topic vector from the set of topic vectors $\Omega$.
 Here $t$ indexes the set of lecture topic vectors $L$, and $i$ and $j$ index
 the topic vectors of questions $Q$ used to estimate the knowledge trace. Note
-that ``$\mathrm{correct}$'' denotes the set of indices of the questions the
+that ``$\mathrm{correct}$'' denotes the set of indices of the questions the
 participant answered correctly on the given quiz.
 
 Intuitively, $\mathrm{ncorr}(x, y)$ is the correlation between two topic
 vectors (e.g., the topic vector $x$ for one timepoint in a lecture and the
-topic vector $y$ for one question on a quiz), normalized by the minimum and
-maximum correlations (across all timepoints $t$ and questions $j$) to range
-between 0 and 1, inclusive. Equation~\ref{eqn:prop} then computes the weighted
-average proportion of correctly answered questions about the content presented
+topic vector $y$ for one question on a quiz), normalized by the minimum and
+maximum correlations (across all timepoints $t$ and questions $j$) to range
+between 0 and 1, inclusive. Equation~\ref{eqn:prop} then computes the weighted
+average proportion of correctly answered questions about the content presented
 at timepoint $t$, where the weights are given by the normalized correlations
 between timepoint $t$'s topic vector and the topic vectors for each question.
 The normalization step (i.e., using $\mathrm{ncorr}$ instead of the raw
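
To make the weighted average concrete, here is a rough numerical sketch of the knowledge-trace computation this hunk describes. Because Equation~\ref{eqn:prop} itself is not shown in this diff, the sketch assumes the weighted proportion takes the form $\sum_{i \in \mathrm{correct}} \mathrm{ncorr}(L_t, Q_i) / \sum_{j} \mathrm{ncorr}(L_t, Q_j)$, which follows the surrounding text but is an assumption; the function and variable names are likewise hypothetical.

# Rough sketch of the weighted-average knowledge trace described above.
import numpy as np

def knowledge_trace(lecture_topics, question_topics, correct_idx):
    """lecture_topics: (T, k) topic vectors, one per lecture timepoint.
    question_topics: (N, k) topic vectors, one per quiz question.
    correct_idx: indices of the questions the participant answered correctly."""
    # Correlate every lecture timepoint with every question's topic vector.
    raw = np.array([[np.corrcoef(t_vec, q_vec)[0, 1] for q_vec in question_topics]
                    for t_vec in lecture_topics])
    # ncorr: rescale by the min/max correlation (across all timepoints and
    # questions) so the weights range between 0 and 1, inclusive.
    w = (raw - raw.min()) / (raw.max() - raw.min())
    # Assumed form of Equation (prop): weighted proportion of correct answers
    # at each timepoint, weighted by the normalized correlations.
    return w[:, correct_idx].sum(axis=1) / w.sum(axis=1)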
@@ -1506,7 +1507,8 @@ \subsubsection*{Generalized linear mixed models}\label{subsec:glmm}
 
 To assess the predictive value of our knowledge estimates, we compared each
 GLMM's ability to explain participants' success on individual quiz questions to
-that of an analogous model which assumed (as we assume under our null hypothesis) that knowledge estimates for correctly and incorrectly answered
+that of an analogous model which assumed (as we assume under our null
+hypothesis) that knowledge estimates for correctly and incorrectly answered
 questions did \textit{not} systematically differ, on average. Specifically, we
 used the same sets of observations with which we fit each ``full'' model to fit
 a second ``null'' model that had the same random effects structure, but in which
@@ -1654,11 +1656,12 @@ \subsubsection*{Creating knowledge and learning map visualizations}\label{subsec
 \hat{k}(x) = \frac{\sum_{i \in \mathrm{correct}} \mathrm{RBF}(x, q_i, \lambda)}{\sum_{j = 1}^N \mathrm{RBF}(x, q_j, \lambda)}.
 \label{eqn:rbf-knowledge}
 \end{equation}
-Intuitively, Equation~\ref{eqn:rbf-knowledge} computes the weighted proportion of
-correctly answered questions, where the weights are given by how nearby (in the 2D space)
-each question is to the $x$. We also defined \textit{learning maps} as the coordinate-by-coordinate
-differences between any pair of knowledge maps. Intuitively, learning maps reflect the \textit{change}
-in knowledge across two maps.
+Equation~\ref{eqn:rbf-knowledge} computes the weighted proportion of correctly
+answered questions, where the weights are given by how nearby (in the 2D space)
+each question is to $x$. We also defined \textit{learning maps} as the
+coordinate-by-coordinate differences between any pair of knowledge maps.
+Intuitively, learning maps reflect the \textit{change} in knowledge
+across two maps.
 
 \section*{Author contributions}
 
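As a small illustration of Equation~\ref{eqn:rbf-knowledge} shown in the hunk above, the sketch below computes the RBF-weighted proportion of correctly answered questions at a 2D coordinate. The Gaussian form and bandwidth of the RBF are assumptions (the diff does not show how $\mathrm{RBF}(x, q, \lambda)$ is defined), and all names are hypothetical.

# Rough sketch of Equation (rbf-knowledge): knowledge at coordinate x is the
# RBF-weighted proportion of correctly answered questions.
import numpy as np

def rbf(x, q, lam):
    # Assumed Gaussian form; the paper's exact RBF definition is not in this diff.
    return np.exp(-np.sum((np.asarray(x) - np.asarray(q)) ** 2) / lam)

def knowledge_map_value(x, question_coords, correct_idx, lam=1.0):
    """question_coords: (N, 2) question embedding coordinates;
    correct_idx: indices of the correctly answered questions."""
    weights = np.array([rbf(x, q, lam) for q in question_coords])
    return weights[correct_idx].sum() / weights.sum()

# A learning map is then the coordinate-by-coordinate difference between two
# knowledge maps, e.g., knowledge after a lecture minus knowledge before it.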