This repository contains information on the overall Capstone sequence and material for the lecture component of the course.
The course materials for each domain of inquiry are maintained by the domain expert. Links to materials for each domain may be found below; otherwise, contact the section leader for your domain of choice.
- Wikipedia Edit Wars (Roberts)
- Quantitative Measurement of Artistic Style (Twomey)
- Fair Policing and Predictive Policing (Fraenkel)
- Clustering the Human Genome (Ellis)
- Malware and Graph Embeddings (Fraenkel)
Lecture is held on Mondays at two different times, in the same location:
- Monday 9:00am - 9:50am, CENTR 222 (A00)
- Monday 10:00am - 10:50am, CENTR 222 (B00)
You must attend the discussion corresponding to your chosen domain of inquiry. Attendance is mandatory.
Section | Time | Location | Title |
---|---|---|---|
Discussion A01 | W 9:00am - 9:50am | CENTR 207 | Quantitative Measurement of Artistic Style |
Discussion A02 | W 9:00am - 9:50am | WLH 2113 | Wikipedia Edit Wars |
Discussion A03 | W 9:00am - 9:50am | SDSC E145 | Fair Policing and Predictive Policing |
Discussion B01 | W 10:00am - 10:50am | CENTR 207 | Clustering the Human Genome |
Discussion B02 | W 10:00am - 10:50am | WLH 220 | Malware and Graph Embeddings |
Lab hours are for one-on-one help with both domain experts and methodological experts.
Lab hours serve two purposes: help with lecture HW and guidance on the code development portion of your domain project. Methodology help is available any time on Fridays between 9:00 and 10:50 in B250/B260 in the CSE Basement.
- Sometimes the lecture HW will require you to come to the CSE Basement to complete a portion of the HW. This will be noted on the HW assignment.
- You are encouraged to come every week to discuss and get feedback on the code development for your domain project. This course depends on self-motivated work, and you should take advantage of the access to help.
- At certain points in the quarter, you will be required to check in with course staff in lab for a code review of your ETL pipeline and replication work.
Unless separately scheduled with domain experts, lab hours are held Fridays in the CSE Basement (B250 and B260), either from 9:00 - 10:00 or 10:00 - 10:50.
- You are encouraged to come to lab hours with domain-specific questions as often as possible. Friday lab hours are a perfect time to ask about the readings or data work assigned for the following Wednesday. A better understanding of the concepts on Friday will pay dividends in productive work on your project Saturday through Tuesday.
- At various points in the quarter, you will be required to come to domain lab hours to check in with your domain expert.
The syllabus for the course may be found here.
Week | Topic: Methodology | Topic: Domain |
---|---|---|
1 | Introduction | Intro to domain problem |
2 | Anatomy of a DS project | Data generating process (context) |
3 | HOLIDAY | Description of data |
4 | Handling data | Domain specific techniques I |
5 | Workflow patterns I | Domain specific techniques II |
6 | Version control and data | Discussion of main result |
7 | HOLIDAY | Standards for evaluation in domain |
8 | Environment independence | Impacts and ethics |
9 | Advanced data handling | Related questions in domain |
10 | Multilingual workflows | Project proposals |
While the course assignments for each domain differ, they all follow a similar template, included here.
You are welcome to develop your work on your own computer; however, DataHub is available for your use as well. These servers are at least as large as your laptop, and you can use them either as Jupyter servers or via a command-line interface. As the quarters progress, they may be provisioned for more memory-intensive jobs.