DoC-Noah/DSC180A-DS-Methodology

 
 

DSC 180A Capstone Quarter I

Contents

This repository contains information on the overall Capstone sequence and material for the lecture component of the course.

The course materials for each domain of inquiry is maintained by the domain expert. Links to materials for each domain may be found below, otherwise contact the section leader for your domain of choice.

Course Times and Locations

Lecture

Lecture is held on Mondays at two different times, in the same location:

  • Monday 9:00am - 9:50am, CENTR 222 (A00)
  • Monday 10:00am - 10:50am, CENTR 222 (B00)

Discussion

You must attend the discussion corresponding to your chosen domain of inquiry. Attendance is mandatory.

Section | Time | Location | Title
Discussion A01 | W 9:00am - 9:50am | CENTR 207 | Quantitative Measurement of Artistic Style
Discussion A02 | W 9:00am - 9:50am | WLH 2113 | Wikipedia Edit Wars
Discussion A03 | W 9:00am - 9:50am | SDSC E145 | Fair Policing and Predictive Policing
Discussion B01 | W 10:00am - 10:50am | CENTR 207 | Clustering the Human Genome
Discussion B02 | W 10:00am - 10:50am | WLH 220 | Malware and Graph Embeddings

Lab

Lab hours are for one-on-one help from both domain experts and methodological experts.

Lab hours for methodology

Lab hours serve two purposes: help with lecture HW and guidance on the code development portion of your domain project. Methodology help is available at any time on Fridays between 9:00 and 10:50 in B250/B260 in the CSE Basement.

  • Sometimes the lecture HW will require you to come to the CSE Basement to complete a portion of the HW. This will be noted on the HW assignment.

  • You are encouraged to come every week to discuss and get feedback on the code development for the work in your domain project. This course depends on self-motivated work and you should take advantage of the access to help.

  • At certain points in the quarter, you will be required to check in with course staff in lab to go through a code review of your ETL pipeline and replication work.

Lab hours with domain experts

Unless separately scheduled with domain experts, lab hours are held Fridays in the CSE Basement (B250 and B260), either from 9:00 - 10:00 or 10:00 - 10:50.

  • You are encouraged to come to lab hours for domain-specific questions as often as possible. Friday lab hours are a perfect time to come with questions about the readings or data work assigned for the following Wednesday. A better understanding of the concepts on Friday will pay dividends in productive work on your project Sat-Tues.

  • At various points in the quarter, you will be required to come to domain lab hours to check in with your domain expert.

Syllabus

The syllabus for the course may be found here.

Course Schedule

Week | Topic: Methodology | Topic: Domain
1 | Introduction | Intro to domain problem
2 | Anatomy of a DS project | Data generating process (context)
3 | HOLIDAY | Description of data
4 | Handling data | Domain specific techniques I
5 | Workflow patterns I | Domain specific techniques II
6 | Version control and data | Discussion of main result
7 | HOLIDAY | Standards for evaluation in domain
8 | Environment independence | Impacts and ethics
9 | Advanced data handling | Related questions in domain
10 | Multilingual workflows | Project proposals

Assignments

While the course assignments for each domain differ, they all follow a similar template, included here.

Computing Resources

You are welcome to develop your work on your own computer; however, DataHub is available for your use as well. These servers are at least as large as your laptop, and you can use them either as Jupyter servers or via a command-line interface. As the quarters progress, they may be provisioned for more memory-intensive jobs.

About

Concepts in Data Science Methodology and Software Development
