Data Structure
Data downloaded from CourseKata comes in a zip file with the following structure (this example assumes data for two classes were downloaded):
[datetime of download]
|   classes.csv
|   README.pdf
|
\---classes
    +---[class id of the first class]
    |       items.csv
    |       media_views.csv
    |       page_views.csv
    |       responses.csv
    |       tags.csv
    |
    \---[class id of the next class]
            items.csv
            media_views.csv
            page_views.csv
            responses.csv
            tags.csv
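As a minimal sketch of how this layout could be traversed, the Python function below reads `classes.csv` and lists the CSV files found in each class subfolder. It assumes the zip file has already been extracted and that the file is UTF-8 encoded; the function name `list_class_tables` is hypothetical and not part of any CourseKata tooling.

```python
import csv
from pathlib import Path


def list_class_tables(download_dir):
    """Map each class_id in classes.csv to the CSV files in its subfolder."""
    root = Path(download_dir)
    # Assumption: the export is UTF-8 encoded.
    with open(root / "classes.csv", newline="", encoding="utf-8") as f:
        class_ids = [row["class_id"] for row in csv.DictReader(f)]

    tables = {}
    for class_id in class_ids:
        class_dir = root / "classes" / class_id
        if class_dir.is_dir():
            # Sort the names so the listing is stable across platforms.
            tables[class_id] = sorted(p.name for p in class_dir.glob("*.csv"))
        else:
            # A class listed in classes.csv without a folder gets an empty list.
            tables[class_id] = []
    return tables
```

Calling `list_class_tables("path/to/extracted/download")` returns a dictionary keyed by `class_id`, which makes it easy to spot a class whose folder or files are missing.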
Each of the following six sections will describe what you should expect in each of the .csv files.
The classes data is organized into a table with a single row for each class included in the data download. Note that each of the class_id values will correspond to the name of a subfolder in the classes folder. Here is a description of each variable in the classes table:
| Column | Description |
|---|---|
| institution_id | a unique identifier for the institution the class was taken at |
| class_id | a unique identifier for this particular class |
| course_name | the GitHub repository for the course |
| release | the GitHub branch for the course |
| teacher_id | a unique identifier for the teacher on CourseKata |
| lms | the Learning Management System the class was deployed on |
| setup_yaml | a JSON string including the full structure of the course (converted to a list object when the data is processed) |
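Because `setup_yaml` arrives as a JSON string, it has to be decoded before the course structure can be inspected. The sketch below shows one way to do that in Python with the standard library. The helper name `read_classes` is hypothetical, and the keys inside the decoded object are not documented here, so the code makes no assumptions about them.

```python
import csv
import io
import json


def read_classes(csv_text):
    """Parse classes.csv text and decode the JSON held in setup_yaml."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row.get("setup_yaml") or ""
        # Leave the field as None when the cell is empty.
        row["setup_yaml"] = json.loads(raw) if raw.strip() else None
        rows.append(row)
    return rows
```

The same `json.loads` step should apply to the other JSON columns described in this document (`lrn_response_json`, `log_json`, and `lrn_question_data`).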
The response data is organized into a table with a variable number of columns (depending on lrn_option_<n>, see column description below) and a number of rows equivalent to the number of responses made to questions in the course. This table will likely be very large (200 students will yield around 300,000 responses). Here is a description of each variable in the responses table:
| Column | Description |
|---|---|
| institution_id | a unique identifier for the institution the class was taken at |
| class_id | a unique identifier for this particular class |
| course_name | the GitHub repository for the course |
| release | the GitHub branch for the course |
| student_id | a unique identifier for each student on CourseKata |
| item_id | a unique identifier for this particular question (the values are arbitrary --- treat them as random identifiers) |
| item_type | whether this is a learnosity or datacamp item |
| chapter | the chapter that the item appears in |
| page | the page that the item appears on |
| prompt | the question prompt for this response |
| response | the value of the response: either the value (for shorttext, plaintext, ratings, datacamp, etc.) or an array of numbers indicating the position of the multiple choice answers chosen and which correspond to the columns lrn_option_<number>. |
| points_possible | the number of points possible if the completely correct answer is given |
| points_earned | the number of points earned |
| dt_submitted | a datetime object indicating when the response was submitted (timezone: GMT/UTC) |
| attempt | the number of times the question has been attempted, including the current attempt |
| user_agent | the browser user agent string for the user (for details, see https://developer.chrome.com/multidevice/user-agent; the R package uaparserjs can parse these strings) |
| lrn_session_id | the unique ID for this user session on Learnosity |
| lrn_response_id | the unique ID for this particular response on Learnosity |
| lrn_items_api_version | the version of the Learnosity Items API for this item |
| lrn_response_api_version | the version of the Learnosity Response API for this response |
| lrn_activity_id | the unique ID for the activity on Learnosity |
| lrn_question_reference | the unique ID for the question on Learnosity |
| lrn_question_position | for multi-question items, the position of the question in the item |
| lrn_type | the Learnosity type of the question (e.g. mcq, shorttext, plaintext, rating, etc.) |
| lrn_dt_started | a datetime object indicating when the response was started (timezone: GMT/UTC) |
| lrn_dt_saved | a datetime object indicating when the response was saved (timezone: GMT/UTC) |
| lrn_status | the status of the question response on Learnosity (e.g. “Completed”) |
| lrn_option_<n> | for multiple choice questions (lrn_type: “mcq”), the value of the option at position n. |
| lrn_response_json | the fully-detailed JSON response object (converted to a list object when the data is processed) |
For more about the distinction between Learnosity Activities, Items, and Questions, see the documentation at https://authorguide.learnosity.com/hc/en-us.
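To illustrate how the `response` array relates to the `lrn_option_<n>` columns, here is a small Python sketch that looks up the text of each chosen option for a multiple choice row. It assumes the `response` cell holds a JSON-style array such as `[0, 2]`, which is not confirmed by this document, and the function name `chosen_options` is hypothetical.

```python
import json


def chosen_options(row):
    """Return the option text for each position listed in an mcq response."""
    if row.get("lrn_type") != "mcq":
        # Other question types store their value directly in `response`.
        return []
    # Assumption: the cell is a JSON array of positions, e.g. "[0, 2]".
    positions = json.loads(row["response"])
    # Each position n refers to the column named lrn_option_<n>;
    # a position with no matching column yields None.
    return [row.get(f"lrn_option_{int(p)}") for p in positions]
```

Here `row` is a dictionary for one line of `responses.csv`, for example as produced by `csv.DictReader`.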
The page view data comes in a table of student-page-datetime cases. That is, there is a row for every time a student accessed a page in the course. The table has five variables:
| Column | Description |
|---|---|
| class_id | a unique identifier for this particular class |
| student_id | a unique identifier for each student on CourseKata |
| chapter | the chapter the page view occurred in |
| page | the page that was viewed |
| dt_accessed | a datetime object indicating when the page was accessed (timezone: GMT/UTC) |
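Since every row is a single access, counting rows gives the number of times each student opened each page. The following Python sketch does this with the standard library; the function name `count_page_views` is hypothetical.

```python
import csv
import io
from collections import Counter


def count_page_views(csv_text):
    """Count page_views.csv rows per (student_id, chapter, page)."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # One row equals one access, so each row adds one view.
        counts[(row["student_id"], row["chapter"], row["page"])] += 1
    return counts
```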
The media view data refers to student interactions with the videos in the course. It comes in a table of student-video cases. That is, there is a row for each video for each student. The table has twelve variables:
| Column | Description |
|---|---|
| institution_id | a unique identifier for the institution the class was taken at |
| class_id | a unique identifier for this particular class |
| student_id | a unique identifier for each student on CourseKata |
| chapter | the chapter the video is in |
| page | the page that the video is on |
| type | the type of media object (currently only “video”) |
| media_id | the unique identifier for the video |
| dt_started | a datetime object indicating when the video was first started (timezone: GMT/UTC) |
| dt_last_event | a datetime object indicating when the video was last interacted with (timezone: GMT/UTC) |
| proportion_video | the proportion of the video that the student has watched, summed across all interactions with the video |
| proportion_time | the proportion of the full video runtime that the student has spent watching a video, regardless of which part of the video they watched (e.g. if the video is 10 seconds long, and the student watches the first 5 seconds three times, this value should be 1.5) |
| log_json | the fully-detailed JSON object about student interactions with the video (converted to a list object when the data is processed) |
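The difference between `proportion_video` and `proportion_time` can be made concrete with a short Python sketch. It takes a list of watched segments as `(start, end)` pairs in seconds, which is a hypothetical input format because the structure of `log_json` is not described here. It also assumes that `proportion_video` counts each part of the video only once; that reading is an interpretation, not something this document confirms.

```python
def watch_proportions(intervals, duration):
    """Return (proportion_video, proportion_time) for watched segments.

    intervals: list of (start, end) pairs in seconds (hypothetical format).
    duration: full runtime of the video in seconds.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")

    # proportion_time: every second watched counts, including repeats.
    total_watched = sum(end - start for start, end in intervals)

    # proportion_video (assumed meaning): merge overlapping segments so
    # that each part of the video is counted once.
    covered = 0.0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start

    return covered / duration, total_watched / duration
```

For the example in the table, a 10-second video whose first 5 seconds are watched three times, `watch_proportions([(0, 5), (0, 5), (0, 5)], 10)` gives a `proportion_time` of 1.5 and, under the assumption above, a `proportion_video` of 0.5.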
Items are organized in a table where each row represents all of the data for a particular question in the class. There are 18 columns in the table, though they will never all be relevant for a particular item. Columns prefixed with dcl_ are only filled for DataCamp-Light items (the R-sandboxes) and columns prefixed with lrn_ are only filled for Learnosity items. Here are descriptions of the columns in the table:
| Column | Description |
|---|---|
| institution_id | a unique identifier for the institution the class was taken at |
| class_id | a unique identifier for this particular class |
| item_id | a unique identifier for this particular question (the values are arbitrary --- treat them as random identifiers) |
| item_type | whether this is a learnosity or datacamp item |
| chapter | the chapter that the item appears in |
| page | the page that the item appears on |
| dcl_pre_exercise_code | the code run invisibly to set up the module |
| dcl_sample_code | the code in the module when it first loads |
| dcl_solution | the code that appears in the solution tab |
| dcl_sct | the solution checking code (see the testwhat package for details) |
| dcl_hint | the text that appears in the hint box |
| lrn_activity_reference | the unique ID for the activity on Learnosity |
| lrn_question_reference | the unique ID for the question on Learnosity |
| lrn_question_position | for multi-question items, the position of the question in the item |
| lrn_template_name | the template used to create the item |
| lrn_template_reference | a unique ID for the item template on Learnosity |
| lrn_item_status | the status of the item on Learnosity (e.g. “published”) |
| lrn_question_data | the fully-detailed JSON object that sets up the item (converted to a list object when the data is processed) |
For more about the distinction between Learnosity Activities, Items, and Questions, see the documentation at https://authorguide.learnosity.com/hc/en-us.
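Because the `dcl_` and `lrn_` columns are never both relevant for one item, it can help to drop the prefix that does not apply before inspecting a row. The Python sketch below does this; the function name `relevant_fields` is hypothetical, and the exact spelling of the `item_type` values ("learnosity" and "datacamp") is assumed from the column description above.

```python
def relevant_fields(item):
    """Drop the dcl_/lrn_ columns that do not apply to this item's type."""
    # Assumption: item_type is spelled "datacamp" or "learnosity".
    item_type = (item.get("item_type") or "").strip().lower()
    skip_prefix = "lrn_" if item_type == "datacamp" else "dcl_"
    return {
        name: value
        for name, value in item.items()
        if not name.startswith(skip_prefix)
    }
```

Here `item` is a dictionary for one line of `items.csv`, for example as produced by `csv.DictReader`.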
Tags are not currently utilized heavily within CourseKata but may have a larger role in the future (e.g. tagging specific learning outcomes). For completeness, the table is described here, but it will likely not be of much use.
The tags table is organized at the item-tag level where each tag for each item has its own row. There are three columns in the table:
| Column | Description |
|---|---|
| item_id | a unique identifier for this particular question (the values are arbitrary --- treat them as random identifiers) |
| tag | the tag given to this item |
| tag_type | the hierarchical parent tag for this tag (e.g. “Chapter” holds all of the chapter name tags) |
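Since the table holds one row per item-tag pair, collecting all tags for an item means grouping rows by `item_id`. A minimal Python sketch, using a hypothetical function name `tags_by_item`, might look like this:

```python
import csv
import io
from collections import defaultdict


def tags_by_item(csv_text):
    """Group tags.csv rows as {item_id: {tag_type: [tag, ...]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["item_id"]][row["tag_type"]].append(row["tag"])
    # Convert to plain dictionaries for easier printing and comparison.
    return {item: dict(types) for item, types in grouped.items()}
```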
CourseKata is a project of the UCLA Psychology Department’s Teaching and Learning Lab. It is supported in part by a grant from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation.