@@ -1238,7 +1238,7 @@ \subsection*{Analysis}
 \subsubsection*{Statistics}

 All of the statistical tests performed in our study were two-sided. The 95\%
-confidence intervals we reported for each correlation were estimated from
+confidence intervals we report for each correlation were estimated from
 bootstrap distributions of 10,000 correlation coefficients obtained by
 sampling (with replacement) from the observed data.

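A minimal Python sketch of this bootstrap (illustrative only; the percentile-based interval construction, the use of Pearson correlation, and the variable names are assumptions rather than details taken from the manuscript):

\begin{verbatim}
import numpy as np

def bootstrap_corr_ci(x, y, n_boot=10_000, seed=0):
    """95% CI for corr(x, y) from a bootstrap distribution of correlations."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    boot_corrs = np.empty(n_boot)
    for b in range(n_boot):
        # Resample observations (with replacement) and recompute the correlation.
        idx = rng.integers(0, len(x), size=len(x))
        boot_corrs[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    return np.percentile(boot_corrs, [2.5, 97.5])
\end{verbatim}
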
@@ -1270,10 +1270,10 @@ \subsubsection*{Constructing text embeddings of multiple lectures and questions}
 Supplementary Figure~\topicWordWeights, and each topic's top-weighted words may
 be found in Supplementary Table~\topics.

-As illustrated in Figure~\ref{fig:sliding-windows}A, we start by building up a
-corpus of documents using overlapping sliding windows that span each video's
+As illustrated in Figure~\ref{fig:sliding-windows}A, we started by building up a
+corpus of documents using overlapping sliding windows that spanned each lecture's
 transcript. Khan Academy provides professionally created, manual transcriptions
-of all videos for closed captioning. However, such transcripts would not be
+of all lecture videos for closed captioning. However, such transcripts would not be
 readily available in all contexts to which our framework could potentially be
 applied. Khan Academy videos are hosted on the YouTube platform, which
 additionally provides automated captions. We opted to use these automated
@@ -1283,9 +1283,9 @@ \subsubsection*{Constructing text embeddings of multiple lectures and questions}
 it more directly extensible and adaptable by others in the future.

 We fetched these automated transcripts using the
-\texttt{youtube-transcript-api} Python package~\citep{Depo18}. The transcripts
+\texttt{youtube-transcript-api} Python package~\citep{Depo18}. Each transcript
 consisted of one timestamped line of text for every few seconds (mean: 2.34~s;
-standard deviation: 0.83~s) of spoken content in the video (i.e., corresponding
+standard deviation: 0.83~s) of spoken content in the lecture (i.e., corresponding
 to each individual caption that would appear on-screen if viewing the lecture
 via YouTube, and when those lines would appear). We defined a sliding window
 length of (up to) $w = 30$ transcript lines and assigned each window a
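
A minimal sketch of the transcript retrieval and windowing steps (illustrative only; it assumes the \texttt{youtube-transcript-api} package's \texttt{get\_transcript} interface, a placeholder video ID, and a one-line step between successive windows, none of which are specified in this excerpt):

\begin{verbatim}
from youtube_transcript_api import YouTubeTranscriptApi

VIDEO_ID = "PLACEHOLDER_VIDEO_ID"  # hypothetical; not one of the study's lectures
WINDOW_LENGTH = 30                 # windows span up to w = 30 transcript lines

# Each transcript entry is a dict with 'text', 'start', and 'duration' keys.
transcript = YouTubeTranscriptApi.get_transcript(VIDEO_ID)

# Build overlapping sliding windows over the transcript lines.
windows = []
for start in range(len(transcript)):
    lines = transcript[start:start + WINDOW_LENGTH]
    windows.append({
        "text": " ".join(line["text"] for line in lines),
        "onset": lines[0]["start"],
        "offset": lines[-1]["start"] + lines[-1]["duration"],
    })
\end{verbatim}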
@@ -1307,13 +1307,13 @@ \subsubsection*{Constructing text embeddings of multiple lectures and questions}
 approaches suggested by~\citet{BoydEtal14}: ``actual,'' ``actually,'' ``also,''
 ``bit,'' ``could,'' ``e,'' ``even,'' ``first,'' ``follow,'' ``following,''
 ``four,'' ``let,'' ``like,'' ``mc,'' ``really,'' ``saw,'' ``see,'' ``seen,''
-``thing,'' and ``two.'' This yielded sliding windows with an average of 73.8
-remaining words, and lasting for an average of 62.22~seconds. We treated the
+``thing,'' and ``two.'' This yielded sliding windows containing an average of 73.8
+remaining words, and spanning an average of 62.22~seconds. We treated the
 text from each sliding window as a single ``document'' and combined these
-documents across the two videos' windows to create a single training corpus for
+documents across the two lectures' windows to create a single training corpus for
 the topic model.

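A minimal sketch of the corpus construction and topic model fit (illustrative only; scikit-learn's \texttt{CountVectorizer} and \texttt{LatentDirichletAllocation}, the base stop-word list, and the number of topics are assumptions, not necessarily the implementation used in the study):

\begin{verbatim}
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS

# Hypothetical window texts (one string per sliding window) for the two lectures.
windows_lecture_a = ["window one text ...", "window two text ..."]
windows_lecture_b = ["window one text ...", "window two text ..."]
corpus = windows_lecture_a + windows_lecture_b

# Remove a standard English stop-word list plus the study-specific words above.
extra_stops = ["actual", "actually", "also", "bit", "could", "e", "even",
               "first", "follow", "following", "four", "let", "like", "mc",
               "really", "saw", "see", "seen", "thing", "two"]
vectorizer = CountVectorizer(stop_words=list(ENGLISH_STOP_WORDS) + extra_stops)
counts = vectorizer.fit_transform(corpus)

# Fit a k-topic model to the combined corpus (k = 15 is a placeholder).
topic_model = LatentDirichletAllocation(n_components=15, random_state=0).fit(counts)

# The trained model can transform any document into a k-dimensional topic vector.
window_topic_vectors = topic_model.transform(counts)
\end{verbatim}
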
-After fitting the topic model to the two videos' transcripts, we could use the
+After fitting the topic model to the two lectures' transcripts, we could use the
 trained model to transform arbitrary (potentially new) documents into
 $k$-dimensional topic vectors. A convenient property of these topic vectors is
 that documents that reflect similar blends of topics (i.e., documents that
@@ -1326,13 +1326,13 @@ \subsubsection*{Constructing text embeddings of multiple lectures and questions}
 We transformed each sliding window's text into a topic vector, and then used
 linear interpolation (independently for each topic dimension) to resample the
 resulting time series to one vector per second. We also used the fitted model to
-obtain topic vectors for each question in our pool (see Supp.~Tab.~\questions).
-Taken together, we obtained a \textit{trajectory} for each video, describing
+obtain topic vectors for each quiz question in our pool (see Supp.~Tab.~\questions).
+Taken together, we obtained a \textit{trajectory} for each lecture video, describing
 its path through topic space, and a single coordinate for each question
-(Fig.~\ref{fig:sliding-windows}C). Embedding both videos and all of the
+(Fig.~\ref{fig:sliding-windows}C). Embedding both lectures and all of the
 questions using a common model enables us to compare the content from different
-moments of videos, compare the content across videos, and estimate potential
-associations between specific questions and specific moments of video.
+moments of the lectures, compare the content across lectures, and estimate potential
+associations between specific questions and specific moments of lecture content.

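A minimal sketch of the per-second resampling step (illustrative only; the array shapes, number of topics, and window timestamps are placeholders):

\begin{verbatim}
import numpy as np

# Hypothetical inputs: one topic vector per sliding window, plus the timestamp
# (in seconds) assigned to each window.
rng = np.random.default_rng(0)
window_topic_vectors = rng.dirichlet(np.ones(15), size=120)  # (n_windows, k)
window_times = np.linspace(0, 600, 120)                      # seconds

# Linearly interpolate each topic dimension independently onto a 1-second grid.
seconds = np.arange(0, int(window_times.max()) + 1)
trajectory = np.column_stack([
    np.interp(seconds, window_times, window_topic_vectors[:, d])
    for d in range(window_topic_vectors.shape[1])
])
# `trajectory` now holds one k-dimensional topic vector per second of the lecture.
\end{verbatim}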

 \subsubsection*{Estimating dynamic knowledge traces}\label{subsec:traces}
@@ -1349,12 +1349,12 @@ \subsubsection*{Estimating dynamic knowledge traces}\label{subsec:traces}
   \mathrm{ncorr}(x, y) = \frac{\mathrm{corr}(x, y) - \mathrm{mincorr}}{\mathrm{maxcorr} - \mathrm{mincorr}},
 \end{equation}
 and where $\mathrm{mincorr}$ and $\mathrm{maxcorr}$ are the minimum and maximum
-correlations between any lecture timepoint and question, taken over all
+correlations between the topic vectors for any lecture timepoint and quiz question, taken over all
 timepoints in the given lecture and all questions \textit{about} that
 lecture appearing on the given quiz. We also define $f(s, \Omega)$ as the
 $s$\textsuperscript{th} topic vector from the set of topic vectors $\Omega$.
-Here $t$ indexes the set of lecture topic vectors $L$, and $i$ and $j$ index
-the topic vectors of questions $Q$ used to estimate the knowledge trace. Note
+Here $t$ indexes the time series of lecture topic vectors $L$, and $i$ and $j$ index
+the topic vectors of questions $Q$ used to estimate the participant's knowledge. Note
 that ``$\mathrm{correct}$'' denotes the set of indices of the questions the
 participant answered correctly on the given quiz.

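A simplified Python sketch of a normalized-correlation-weighted knowledge estimate (illustrative only; it paraphrases the normalization above rather than reproducing the manuscript's exact equation, normalizes within a single timepoint for brevity, and assumes Pearson correlations over topic dimensions):

\begin{verbatim}
import numpy as np

def knowledge_at(timepoint_vec, question_vecs, correct):
    """Weighted proportion-correct knowledge estimate at one lecture timepoint.

    timepoint_vec: (k,) topic vector for one lecture timepoint
    question_vecs: (n_questions, k) topic vectors for the quiz's questions
    correct:       boolean array, True where the participant answered correctly
    """
    # Correlate the timepoint's topic vector with each question's topic vector.
    corrs = np.array([np.corrcoef(timepoint_vec, q)[0, 1] for q in question_vecs])

    # Min-max normalize the correlations (the manuscript takes the min and max
    # over all timepoints and questions; this sketch uses one timepoint only).
    ncorrs = (corrs - corrs.min()) / (corrs.max() - corrs.min())

    # Weight each question's outcome by its normalized correlation. With uniform
    # weights this reduces to the unweighted proportion of correct answers.
    return np.sum(ncorrs * correct) / np.sum(ncorrs)
\end{verbatim}
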
@@ -1391,17 +1391,17 @@ \subsubsection*{Generalized linear mixed models}\label{subsec:glmm}

 In performing these analyses, our null hypothesis is that the knowledge
 estimates we compute based on the quiz questions' embedding coordinates do
-\textit{not} provide useful information about participants' abilities to answer
+\textit{not} provide useful information about participants' abilities to correctly answer
 those questions---in other words, that there is no meaningful difference (on
 average) between the knowledge estimates we compute for questions participants
-answered correctly and those they answered incorrectly. Specifically, since we
+answered correctly versus incorrectly. Specifically, since we
 estimate knowledge for a given embedding coordinate as a weighted
 proportion-correct score (where each question's weight reflects its
 embedding-space distance from the target coordinate; see Eqn.~\ref{eqn:prop}),
 if these weights are uninformative (e.g., randomly distributed), then our
 estimates of participants' knowledge should be equivalent (on average) to the
 \textit{unweighted} proportion of correctly answered questions used to compute
-them. In general, for a given participant and quiz, this expected value (i.e.,
+them. In general, for a given participant and quiz, this expected null value (i.e.,
 that participant's proportion-correct score on that quiz) is the same for any
 coordinate in the embedding space (e.g., any lecture timepoint, quiz question,
 etc.). However, in the ``All questions'' and ``Within-lecture'' versions of the
@@ -1413,42 +1413,41 @@ \subsubsection*{Generalized linear mixed models}\label{subsec:glmm}
 available to estimate their knowledge for it. For example, suppose a participant
 correctly answered $n$ out of $q$ questions on a given quiz. If we hold out a
 single \textit{correctly} answered question as the target, the proportion of
-remaining questions answered correctly would be $\frac{n - 1}{q - 1}$. Whereas
+remaining questions answered correctly would be $\frac{n - 1}{q - 1}$, whereas
 if we hold out a single \textit{incorrectly} answered question, the proportion
 of remaining questions answered correctly would be $\frac{n}{q - 1}$. Thus, the
 proportion of correctly answered remaining questions (and therefore the
 null-hypothesized value of a knowledge estimate computed from them) is always
 \textit{lower} for target questions a participant answered correctly than for
 those they answered incorrectly.

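As a toy numeric instance of this imbalance (made-up values, purely illustrative):

\begin{verbatim}
n, q = 6, 10  # hypothetical participant: 6 of 10 quiz questions answered correctly
held_out_correct = (n - 1) / (q - 1)    # 5/9, about 0.56
held_out_incorrect = n / (q - 1)        # 6/9, about 0.67
# The null-hypothesized (baseline) estimate is lower when the held-out target
# question was answered correctly than when it was answered incorrectly.
\end{verbatim}
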
-To correct for this baseline inverse relationship between a participant's
-success on a target question and their estimated knowledge for it, we used a
+To correct for this baseline difference under our null hypothesis, we used a
 rebalancing procedure that ensured our knowledge estimates for questions each
 participant answered correctly and incorrectly were computed from the
 \textit{same} proportion of correctly answered questions. For each target
-question on a given participant's quiz, we identified all remaining questions
+question on a given participant's quiz, we first identified all remaining questions
 with the opposite ``correctness'' label (i.e., if the target question was
 answered correctly, we identified all remaining incorrectly answered questions,
 and vice versa). We then held out each of these opposite-label questions, in
 turn, along with the target question, and estimated the participant's knowledge
 for the target question using all \textit{other} remaining questions. Since each
 of these subsets of remaining questions was constructed by holding out one
 correctly answered question and one incorrectly answered question from the
-participant's quiz, if the participant correctly answered $n$ out of $q$
+participant's quiz responses, if the participant correctly answered $n$ out of $q$
 questions total, then their proportion-correct score on each subset of questions
-used to estimate their knowledge for the target question is $\frac{n-1}{q-2}$,
+used to estimate their knowledge would be $\frac{n-1}{q-2}$,
 regardless of whether they answered the target question correctly or
-incorrectly. Finally, averaging over these per-subset knowledge estimates
-yielded a rebalanced estimate of the participant's knowledge for the target
+incorrectly. Finally, we averaged over these per-subset knowledge estimates
+to obtain a rebalanced estimate of the participant's knowledge for the target
 question that leveraged information from all remaining questions' embedding
 coordinates, but whose expected value under our null hypothesis was the same as
 that of each individual subset ($\frac{n-1}{q-2}$). By equalizing the
 null-hypothesized values of knowledge estimates for correctly and incorrectly
 answered questions, this procedure ensures that any meaningful relationships we
 observe between participants' estimated knowledge for individual quiz questions
-and their abilities to correctly answer them are attributable to the predictive
-power of the embedding-space distances used to weight questions' contributions
-to the knowledge estimates, rather than an artifact of our estimation procedure.
+and their abilities to correctly answer them reflect the predictive
+power of the embedding-space distances we use to weight questions' contributions
+to the knowledge estimates, rather than an artifact of our testing procedure.
 Note that if a participant answered all or no questions on a given quiz
 correctly, their responses contained no opposite-label questions with which to
 perform this rebalancing, and we therefore excluded their data from our analyses
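
A schematic Python sketch of this rebalancing procedure (illustrative only; \texttt{estimate\_knowledge} stands in for the weighted proportion-correct estimator referenced above, and the data structures are hypothetical):

\begin{verbatim}
import numpy as np

def rebalanced_knowledge(target, question_vecs, correct, estimate_knowledge):
    """Rebalanced knowledge estimate for one held-out target question.

    target:             index of the target question
    question_vecs:      array of embedding coordinates, one row per quiz question
    correct:            boolean array, True where the participant answered correctly
    estimate_knowledge: callable implementing the weighted proportion-correct
                        estimator (target coordinate, other coordinates, outcomes)
    """
    # All remaining questions whose correctness label is opposite to the target's.
    opposite = [i for i in range(len(question_vecs))
                if i != target and correct[i] != correct[target]]
    # (Participants with all-correct or all-incorrect quizzes have no such
    #  questions and are excluded, as described above.)

    estimates = []
    for held_out in opposite:
        # Hold out the target plus one opposite-label question, so every subset
        # drops exactly one correct and one incorrect answer; the null-hypothesized
        # baseline is then (n - 1) / (q - 2) regardless of the target's own label.
        keep = [i for i in range(len(question_vecs)) if i not in (target, held_out)]
        estimates.append(estimate_knowledge(question_vecs[target],
                                            question_vecs[keep],
                                            correct[keep]))

    # Average across subsets to use information from all remaining questions.
    return np.mean(estimates)
\end{verbatim}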
@@ -1503,9 +1502,9 @@ \subsubsection*{Generalized linear mixed models}\label{subsec:glmm}
 that of an analogous model which assumed (as we assume under our null
 hypothesis) that knowledge estimates for correctly and incorrectly answered
 questions did \textit{not} systematically differ, on average. Specifically, we
-used the same sets of observations with which we fit each ``full'' model to fit
-a second ``null'' model that had the same random effects structure, but in which
-the coefficient for the fixed effect of ``\texttt{knowledge}'' was fixed at 0
+used the same sets of observations to which we fit each ``full'' model to fit
+a second ``null'' model with the same random effects structure, but with
+the coefficient for the fixed effect of ``\texttt{knowledge}'' constrained to zero
 (i.e., we removed this term from the null model). We then compared each full
 model to its reduced (null) equivalent using a likelihood-ratio test (LRT).
 Because the standard asymptotic $\chi^2_d$ approximation of the null
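
For reference, a generic sketch of the likelihood-ratio comparison (illustrative only; the log-likelihood values are placeholders, and this shows only the standard asymptotic $\chi^2$ approximation referred to above):

\begin{verbatim}
from scipy.stats import chi2

# Hypothetical log-likelihoods of a fitted "full" model and its "null" counterpart
# (the null model omits the fixed effect of knowledge).
log_lik_full, log_lik_null = -512.3, -518.9

# Likelihood-ratio statistic, referred to a chi-squared distribution with one
# degree of freedom (one constrained coefficient) under the asymptotic approximation.
lrt_stat = 2.0 * (log_lik_full - log_lik_null)
p_value = chi2.sf(lrt_stat, df=1)
\end{verbatim}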
@@ -1581,7 +1580,7 @@ \subsubsection*{Estimating the ``smoothness'' of knowledge}\label{subsec:smoothn
 \subsubsection*{Creating knowledge and learning map visualizations}\label{subsec:knowledge-maps}

 An important feature of our approach is that, given a trained text embedding
-model and participants' quiz performance on each question, we can estimate
+model and participants' performance on each quiz question, we can estimate
 their knowledge about \textit{any} content expressible by the embedding
 model---not solely the content explicitly probed by the quiz questions, or even
 appearing in the lectures. To visualize these estimates
@@ -1636,7 +1635,7 @@ \subsubsection*{Creating knowledge and learning map visualizations}\label{subsec
 To generate our estimates, we placed a set of 39 radial basis functions (RBFs)
 throughout the embedding space, centered on the 2D projections for each
 question (i.e., we included one RBF for each question). At coordinate $x$, the
-value of an RBF centered on a question's coordinate $\mu$, is given by:
+value of an RBF centered on a question's coordinate $\mu$ is given by:
 \begin{equation}
   \mathrm{RBF}(x, \mu, \lambda) = \exp\left\{-\frac{||x - \mu||^2}{\lambda}\right\}.
   \label{eqn:rbf}
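
A minimal sketch of evaluating these RBFs over a grid of map coordinates (illustrative only; the question coordinates, bandwidth $\lambda$, and grid resolution are placeholders):

\begin{verbatim}
import numpy as np

def rbf(x, mu, lam):
    """exp(-||x - mu||^2 / lambda), as in the equation above."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(mu)) ** 2) / lam)

# Placeholder inputs: 2D projections for the 39 questions and a bandwidth lambda.
rng = np.random.default_rng(0)
question_coords = rng.random((39, 2))
lam = 0.05

# Evaluate every question's RBF at every coordinate of a 100 x 100 map grid.
grid_x, grid_y = np.meshgrid(np.linspace(0, 1, 100), np.linspace(0, 1, 100))
grid = np.column_stack([grid_x.ravel(), grid_y.ravel()])
weights = np.array([[rbf(point, mu, lam) for mu in question_coords]
                    for point in grid])
# `weights` has shape (n_grid_points, 39): one RBF value per question per coordinate.
\end{verbatim}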