Data Structure

Data downloaded from CourseKata comes in a zip file with the following structure (this example assumes data for two classes were downloaded):

[datetime of download]
|   classes.csv
|   README.pdf
|  
\---classes
    +---[class id of the first class]
    |       items.csv
    |       media_views.csv
    |       page_views.csv
    |       responses.csv
    |       tags.csv
    |
    \---[class id of the next class]
            items.csv
            media_views.csv
            page_views.csv
            responses.csv
            tags.csv
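To navigate this layout without extracting the archive, a minimal Python sketch using the standard-library `zipfile` module could group the per-class CSV files by class id. It assumes the paths inside the zip mirror the tree above (`<datetime of download>/classes/<class id>/<file>.csv`, with the forward slashes the zip format uses). The function name `list_class_files` is hypothetical.

```python
import zipfile
from collections import defaultdict


def list_class_files(zip_path):
    """Map each class id to the sorted names of its CSV files.

    Assumption: archive paths look like
    '<datetime of download>/classes/<class id>/<file>.csv'.
    zip_path may be a filesystem path or a binary file-like object.
    """
    files_by_class = defaultdict(list)
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            parts = name.rstrip("/").split("/")
            # Keep only CSV files sitting directly inside classes/<class id>/.
            if len(parts) >= 3 and parts[-3] == "classes" and parts[-1].endswith(".csv"):
                files_by_class[parts[-2]].append(parts[-1])
    return {class_id: sorted(names) for class_id, names in files_by_class.items()}
```

Top-level files such as `classes.csv` and `README.pdf` are skipped, so the keys of the result should match the class ids listed in `classes.csv`.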

Each of the following six sections describes what to expect in one of the .csv files.

`classes.csv`

The classes data is organized into a table with one row for each class included in the data download. Each class_id value corresponds to the name of a subfolder in the classes folder. Here is a description of each variable in the classes table:

Column	Description
institution_id	a unique identifier for the institution the class was taken at
class_id	a unique identifier for this particular class
course_name	the GitHub repository for the course
release	the GitHub branch for the course
teacher_id	a unique identifier for the teacher on CourseKata
lms	the Learning Management System the class was deployed on
setup_yaml	a JSON string including the full structure of the course (converted to a `list` object when the data is processed)
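In Python, a rough counterpart to converting `setup_yaml` into a `list` is `json.loads`, which yields nested dicts and lists. The sketch below reads `classes.csv` with the standard `csv` module and decodes that column; `load_classes` is a hypothetical helper, and UTF-8 encoding of the file is an assumption.

```python
import csv
import json


def load_classes(fileobj):
    """Read classes.csv rows into dicts, decoding the setup_yaml column.

    setup_yaml is documented as a JSON string, so json.loads turns it
    into Python dicts/lists. Empty cells are left as None.
    """
    rows = []
    for row in csv.DictReader(fileobj):
        raw = row.get("setup_yaml") or ""
        row["setup_yaml"] = json.loads(raw) if raw.strip() else None
        rows.append(row)
    return rows
```

Typical use would be `with open(path, newline="", encoding="utf-8") as f: classes = load_classes(f)`, where `path` points at the extracted `classes.csv`.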

`responses.csv`

The response data is organized into a table with one row per response to a question in the course and a variable number of columns (the number of lrn_option_<n> columns varies; see the column descriptions below). This table will likely be very large (200 students will yield roughly 300,000 responses). Here is a description of each variable in the responses table:

Column	Description
institution_id	a unique identifier for the institution the class was taken at
class_id	a unique identifier for this particular class
course_name	the GitHub repository for the course
release	the GitHub branch for the course
student_id	a unique identifier for each student on CourseKata
item_id	a unique identifier for this particular question (the values are arbitrary; treat them as random identifiers)
item_type	whether this is a Learnosity or DataCamp item
chapter	the chapter that the item appears in
page	the page that the item appears on
prompt	the question prompt for this response
response	the value of the response: either the value itself (for shorttext, plaintext, rating, DataCamp items, etc.) or, for multiple choice questions, an array of numbers giving the positions of the chosen answers, which correspond to the `lrn_option_<n>` columns
points_possible	the number of points possible if the completely correct answer is given
points_earned	the number of points earned
dt_submitted	a datetime object indicating when the response was submitted (timezone: GMT/UTC)
attempt	the number of times the question has been attempted, including the current attempt
user_agent	the browser user agent string for the user (for details, see https://developer.chrome.com/multidevice/user-agent; the R package `uaparserjs` can parse these strings)
lrn_session_id	the unique ID for this user session on Learnosity
lrn_response_id	the unique ID for this particular response on Learnosity
lrn_items_api_version	the version of the Learnosity Items API for this item
lrn_response_api_version	the version of the Learnosity Response API for this response
lrn_activity_id	the unique ID for the activity on Learnosity
lrn_question_reference	the unique ID for the question on Learnosity
lrn_question_position	for multi-question items, the position of the question in the item
lrn_type	the Learnosity type of the question (e.g. mcq, shorttext, plaintext, rating, etc.)
lrn_dt_started	a datetime object indicating when the response was started (timezone: GMT/UTC)
lrn_dt_saved	a datetime object indicating when the response was saved (timezone: GMT/UTC)
lrn_status	the status of the question response on Learnosity (e.g. “Completed”)
lrn_option_<n>	for multiple choice questions (`lrn_type`: “mcq”), the value of the option at position `n`
lrn_response_json	the fully-detailed JSON response object (converted to a `list` object when the data is processed)

For more about the distinction between Learnosity Activities, Items, and Questions, see the documentation at https://authorguide.learnosity.com/hc/en-us.
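The link between `response` and the `lrn_option_<n>` columns can be sketched in Python as follows. `chosen_options` is a hypothetical helper, and it assumes that `response` is stored as a JSON array of positions (such as `[0, 2]`) and that each position equals the `<n>` suffix of an option column; check both against your own download before relying on it.

```python
import json
import re

# Matches column names such as lrn_option_0, lrn_option_1, ...
OPTION_COLUMN = re.compile(r"^lrn_option_(\d+)$")


def chosen_options(row):
    """Return the option text(s) a student chose on a multiple choice item.

    Assumptions: `response` is a JSON array of positions, and each
    position matches the <n> in a `lrn_option_<n>` column name.
    Returns None for non-mcq rows; unknown positions map to None.
    """
    if row.get("lrn_type") != "mcq":
        return None
    positions = json.loads(row["response"])
    options = {}
    for column, value in row.items():
        match = OPTION_COLUMN.match(column)
        if match:
            options[int(match.group(1))] = value
    return [options.get(int(position)) for position in positions]
```

Because the responses table can be very large, iterating over `csv.DictReader` rows one at a time and calling `chosen_options` on each avoids loading the whole file into memory.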

`page_views.csv`

The page view data comes in a table of student-page-datetime cases. That is, there is a row for every time a student accessed a page in the course. The table has five variables:

Column	Description
class_id	a unique identifier for this particular class
student_id	a unique identifier for each student on CourseKata
chapter	the chapter the page view occurred in
page	the page that was viewed
dt_accessed	a datetime object indicating when the page was accessed (timezone: GMT/UTC)
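Because each row is a single student-page-datetime case, summaries such as views per chapter or a student's first visit to a page come from grouping rows. The Python sketch below does both with the standard library. The helper names are hypothetical, and it assumes `dt_accessed` is an ISO 8601-style string; timestamps without an offset are treated as UTC, as stated above.

```python
from collections import Counter
from datetime import datetime, timezone


def parse_utc(timestamp):
    """Parse a timestamp string into an aware UTC datetime.

    Assumption: ISO 8601-style text such as "2021-01-31 14:05:09".
    Strings without an offset are taken to be UTC.
    """
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def views_per_chapter(rows):
    """Count page views for each (student_id, chapter) pair."""
    return Counter((row["student_id"], row["chapter"]) for row in rows)


def first_access(rows):
    """Return the earliest dt_accessed for each (student_id, page) pair."""
    earliest = {}
    for row in rows:
        key = (row["student_id"], row["page"])
        when = parse_utc(row["dt_accessed"])
        if key not in earliest or when < earliest[key]:
            earliest[key] = when
    return earliest
```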

`media_views.csv`

The media view data refers to student interactions with the videos in the course. It comes in a table of student-video cases; that is, there is a row for each video for each student. The table has 12 variables:

Column	Description
institution_id	a unique identifier for the institution the class was taken at
class_id	a unique identifier for this particular class
student_id	a unique identifier for each student on CourseKata
chapter	the chapter the video is in
page	the page that the video is on
type	the type of media object (currently only “video”)
media_id	the unique identifier for the video
dt_started	a datetime object indicating when the video was first started (timezone: GMT/UTC)
dt_last_event	a datetime object indicating when the video was last interacted with (timezone: GMT/UTC)
proportion_video	the proportion of the video that the student has watched, summed across all interactions with the video
proportion_time	the proportion of the full video runtime that the student has spent watching the video, regardless of which part of the video they watched (e.g. if the video is 10 seconds long and the student watches the first 5 seconds three times, this value should be 1.5)
log_json	the fully-detailed JSON object about student interactions with the video (converted to a `list` object when the data is processed)
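The difference between `proportion_video` and `proportion_time` can be illustrated with watched segments. This Python sketch is not how CourseKata derives the values from `log_json`; it assumes playback can be summarized as `(start, end)` segments in seconds, and it reads `proportion_video` as the share of the video covered at least once (the union of segments), while `proportion_time` counts repeat viewing. `watch_proportions` is a hypothetical name.

```python
def watch_proportions(segments, duration):
    """Return (proportion_video, proportion_time) for watched segments.

    segments: (start, end) pairs in seconds, one per stretch of playback.
    Assumption: proportion_video counts each part of the video at most
    once (union of segments); proportion_time counts repeats.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    # proportion_time: total seconds played, repeats included.
    total_played = sum(end - start for start, end in segments)
    # proportion_video: length of the union of the watched segments.
    covered = 0.0
    current_start = current_end = None
    for start, end in sorted(segments):
        if current_end is None or start > current_end:
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping segment: extend the current merged stretch.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / duration, total_played / duration
```

With the example from the table, `watch_proportions([(0, 5), (0, 5), (0, 5)], 10)` returns `(0.5, 1.5)`: half of the video was seen, for a total of 1.5 times its runtime.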

`items.csv`

Items are organized in a table where each row holds all of the data for a particular question in the class. There are 18 columns in the table, though they are never all relevant to a single item: columns prefixed with dcl_ are only filled for DataCamp Light items (the R sandboxes), and columns prefixed with lrn_ are only filled for Learnosity items. Here are descriptions of the columns in the table:

Column	Description
institution_id	a unique identifier for the institution the class was taken at
class_id	a unique identifier for this particular class
item_id	a unique identifier for this particular question (the values are arbitrary; treat them as random identifiers)
item_type	whether this is a Learnosity or DataCamp item
chapter	the chapter that the item appears in
page	the page that the item appears on
dcl_pre_exercise_code	the code run invisibly to set up the module
dcl_sample_code	the code in the module when it first loads
dcl_solution	the code that appears in the solution tab
dcl_sct	the solution-checking code (see the `testwhat` package for details)
dcl_hint	the text that appears in the hint box
lrn_activity_reference	the unique ID for the activity on Learnosity
lrn_question_reference	the unique ID for the question on Learnosity
lrn_question_position	for multi-question items, the position of the question in the item
lrn_template_name	the template used to create the item
lrn_template_reference	a unique ID for the item template on Learnosity
lrn_item_status	the status of the item on Learnosity (e.g. “published”)
lrn_question_data	the fully-detailed JSON object that sets up the item (converted to a `list` object when the data is processed)

For more about the distinction between Learnosity Activities, Items, and Questions, see the documentation at https://authorguide.learnosity.com/hc/en-us.
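Since only one family of prefixed columns applies to each item, it can help to drop the empty ones when inspecting an item. The Python sketch below does that and decodes `lrn_question_data` from JSON. `relevant_fields` is hypothetical, and the exact `item_type` strings (`"learnosity"`, `"datacamp"`) are an assumption based on the description above.

```python
import json

# Assumption: item_type values are "learnosity" and "datacamp" (any case);
# verify against the values in your download.
PREFIX_BY_TYPE = {"learnosity": "lrn_", "datacamp": "dcl_"}


def relevant_fields(item):
    """Keep shared columns plus the dcl_/lrn_ columns for this item's type.

    Items with an unrecognized item_type keep only the shared columns.
    lrn_question_data is decoded from JSON when it is non-empty.
    """
    keep_prefix = PREFIX_BY_TYPE.get((item.get("item_type") or "").lower())
    result = {}
    for column, value in item.items():
        is_prefixed = column.startswith(("dcl_", "lrn_"))
        if not is_prefixed or (keep_prefix and column.startswith(keep_prefix)):
            result[column] = value
    raw = result.get("lrn_question_data")
    if raw:
        result["lrn_question_data"] = json.loads(raw)
    return result
```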

`tags.csv`

Tags are not currently used heavily within CourseKata but may play a larger role in the future (e.g. tagging specific learning outcomes). For completeness, the table is described here, but it will likely not be of much use.

The tags table is organized at the item-tag level where each tag for each item has its own row. There are three columns in the table:

Column	Description
item_id	a unique identifier for this particular question (the values are arbitrary; treat them as random identifiers)
tag	the tag given to this item
tag_type	the hierarchical parent tag for this tag (e.g. “Chapter” holds all of the chapter name tags)
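To see every tag attached to an item at a glance, the item-tag rows can be regrouped by item and then by `tag_type`. A minimal Python sketch, using a hypothetical helper `tags_by_item` over rows such as those produced by `csv.DictReader`:

```python
from collections import defaultdict


def tags_by_item(rows):
    """Group tags.csv rows into {item_id: {tag_type: [tag, ...]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        grouped[row["item_id"]][row["tag_type"]].append(row["tag"])
    # Convert the nested defaultdicts to plain dicts for cleaner output.
    return {item: dict(by_type) for item, by_type in grouped.items()}
```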

CourseKata is a project of the UCLA Psychology Department’s Teaching and Learning Lab. It is supported in part by a grant from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation.
