@@ -951,9 +951,6 @@ \section*{Discussion}
 related concepts, and we also show how estimated knowledge falls off with
 distance in text embedding space.

-% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
 In our study, we characterize the ``coordinates'' of participants' knowledge
 using a relatively simple ``bag of words'' text embedding model~\citep[LDA;
 ][]{BleiEtal03}. More sophisticated text embedding models, such as
@@ -969,12 +966,12 @@ \section*{Discussion}
 the domain of physics lectures and questions) \textit{and} sufficiently broad
 as to enable them to cover a wide range of domains. Essentially, these
 ``larger'' language models learn these more complex features of language through
-pre-training on enormous and diverse text corpora. But as a result, their
+training on enormous and diverse text corpora. But as a result, their
 embedding spaces also ``span'' an enormous and diverse range of conceptual
 content, sacrificing a degree of specificity in their capacities to distinguish
 subtle conceptual differences within a more narrow range of content.
 In comparing our LDA model (trained specifically on the lectures used in our
-study) to a larger transformer-based model (BERT), we found that LDA provides
+study) to a larger transformer-based model (BERT), we found that our LDA model provides
 both coverage of the requisite material and specificity at the level of
 individual questions, while BERT essentially relegates the contents of both
 lectures and all quiz questions (which are all broadly about ``physics'') to a
@@ -985,11 +982,13 @@ \section*{Discussion}
 point is that simpler models trained on relatively small but specialized
 corpora can outperform much more complex models trained on much larger corpora
 when we are specifically interested in capturing subtle conceptual differences
-at the level of a single course lecture or question. On the other hand, if our
-goal had been to choose a model that generalized to many different content
-domains, we would expect our LDA model to perform comparatively poorly to BERT
-or other much larger models. We suggest that bridging this tradeoff will be
-an important challenge for future work.
+at the level of a single course lecture or quiz question. On the other hand, if
+our goal had been to choose a model that generalized to many different content
+areas, we would expect our LDA model to perform comparatively poorly to BERT or
+other much larger general-purpose models. We suggest that bridging this tradeoff
+between high resolution within a single content area and the ability to
+generalize to many diverse content areas will be an important challenge for
+future work.

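As a concrete illustration of the specificity-versus-coverage tradeoff discussed above, the following is a minimal sketch, not the paper's actual pipeline: it fits a small, domain-specific LDA model on placeholder lecture text with scikit-learn and embeds quiz questions as topic mixtures, where pairwise similarities can then be inspected. The example documents, topic count, and library choice are illustrative assumptions.

```python
# Illustrative sketch only: tiny placeholder corpus, not the study's data or code.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder stand-ins for lecture transcript segments and quiz questions.
lecture_segments = [
    "force equals mass times acceleration",
    "kinetic energy depends on mass and the square of velocity",
    "an object in motion stays in motion unless acted on by a net force",
]
quiz_questions = [
    "What is the acceleration of a 2 kg mass under a 10 N force?",
    "How does kinetic energy change when velocity doubles?",
]

# Fit a bag-of-words LDA model on the small, specialized lecture corpus.
vectorizer = CountVectorizer(stop_words="english")
lecture_counts = vectorizer.fit_transform(lecture_segments)
lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit(lecture_counts)

# Embed the quiz questions as topic-mixture vectors in the same space.
question_counts = vectorizer.transform(quiz_questions)
question_topics = lda.transform(question_counts)  # (n_questions, n_topics)

# Pairwise similarities between questions; a domain-specific model should
# separate these more than a large general-purpose embedding would.
print(cosine_similarity(question_topics))
```

Replacing the topic vectors with sentence embeddings from a large pretrained transformer would give the general-purpose comparison point described above.
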
 At the opposite end of the spectrum from large language models, one could also
 imagine using an even \textit{simpler} ``model'' than LDA that relates the