Project

Goals

As stated in the course description:

Over the semester, students will build a complex end-to-end data system.

You'll be building a live dashboard, with all the infrastructure behind it:

  • Automated data ingestion
  • A database
  • Web-based interactive data visualization

All of this will be in the cloud.

Inspiration

Expectations

At all times

By the end

  • Create dashboard(s) that explore relationships between variables.
    • It should be more open-ended than Projects from Computing in Context, allowing users to explore.
    • The site doesn't need to read like a blog post necessarily, but it should explain what's going on.
    • It should layer several sources/dimensions of data to add depth and context (e.g. location, weather, demographics).
    • It should have multiple visualizations that work together.
    • The idea is that it should make it easy for someone (maybe a policymaker or other decision maker) to spot meaningful patterns or insights.
  • The site + codebase should be a polished portfolio piece.
  • Data is being automatically updated.

Teams

See the grouping by team_id. The teams have a corresponding repository and Google Cloud Project automatically created.

Part 1

Goals

Your group will pick an initial:

  • Problem space
  • Dataset

Part of this project is getting experience with automated data ingestion. Doing so is more interesting with data that changes regularly. You can incorporate additional datasets in the future.

Steps

Do the following as a group:

  1. Discuss what you'd like your project to focus on.
    • You don't need to get too specific yet.
    • This doesn't lock you in.
  2. Explore datasets that are updated weekly (the more often, the better) and pick one that could be relevant.
    • You can pick multiple, but start simple.
  3. Create a new notebook in your team repository.
  4. In the notebook, load and display the data from all your data sources.
    • If there's an API, use the API.
  5. Draw one (or more) example visualizations that you'd like to produce.
    • Have at least one be a time series.
    • You can do so digitally or on a piece of paper.
    • Include a title, legend, and axes labels (where appropriate).
    • This is just a sketch; don't worry about the specific values.
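As a rough illustration of step 4, the sketch below downloads JSON from an API and previews the first few records using only the standard library. The URL, the helper names (`fetch_text`, `parse_records`, `preview`), and the sample payload are all made up for the example; your dataset's API will likely differ in shape, and in a notebook you would probably hand the records to pandas for display.

```python
import json
from urllib.request import urlopen

# Hypothetical endpoint: substitute the real API URL for your dataset.
API_URL = "https://example.com/api/records.json"


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download the response body and decode it as UTF-8 text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_records(payload: str) -> list:
    """Parse a JSON array of objects, failing loudly on any other shape."""
    data = json.loads(payload)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of records")
    return data


def preview(records: list, n: int = 5) -> None:
    """Print the first n records, one per line."""
    for record in records[:n]:
        print(record)


# Offline demonstration with a tiny made-up payload (no network needed).
sample = '[{"date": "2024-01-01", "count": 3}, {"date": "2024-01-08", "count": 5}]'
records = parse_records(sample)
preview(records)

# Against a real API, the call would look like:
# records = parse_records(fetch_text(API_URL))
```

Keeping the download and the parsing in separate functions means the parsing can be checked with sample data, without hitting the network.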

Proposal

You will then submit the following to the Discussion on Ed:

  • What dataset are you going to use?
    • Please include a link.
  • What are your research question(s)? That is, what questions should your users be able to answer by using your app?
    • Come up with at least three.
    • Go deep, not broad.
  • What's the link to your notebook?
  • What's your target visualization?
    • Include a picture.
  • What are your known unknowns?
  • What challenges do you anticipate?

Only one person from your group needs to submit. None of this is set in stone; it is just a starting place and can all be changed later.

Part 2

Goal: Get experience with an application development framework

Steps

  1. Using your dataset from Part 1:

    1. Create a Streamlit app.
    2. Deploy to the Streamlit Community Cloud.
    3. Add a visualization.
      • You can get fancy, but don't have to at this stage. Get something simple working first.
  2. Bring in a second relevant dataset. (This one doesn't need to be regularly updated.)

    • This can be shown on a separate page of your Streamlit app, or combined in a single visualization.
  3. Add the names of the people on your team to your Streamlit app homepage.

  4. Set the repository Website to the app URL (https://<something>.streamlit.app/).

    Click the gear, then fill in the Website field

  5. Turn in the link to your live app via CourseWorks.

Tips

Part 3

Goal: Get experience with unit testing

Steps

Work on branches and submit pull requests for the chunks of work — you decide what the "chunks" are.

  1. Without writing any code:
    1. Review your existing code.
      • What can be refactored into functions?
      • Where can we make our code DRY?
    2. Decide what function you're going to create.
    3. Come up with test cases (inputs) and expected outputs.
      • This can be in a text file, doc, piece of paper, etc.
  2. Then, as code:
    1. Write tests.
    2. Confirm they fail.
    3. Refactor your code into the function.
    4. Make the tests pass.
  3. Repeat until you feel your code is well-organized and well-tested.
  4. Submit the links to the pull requests via CourseWorks.
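The test-first loop above might look something like the following sketch. The function `parse_count` and its test cases are hypothetical, chosen only to illustrate the workflow; `unittest` is used here to stay within the standard library, and your project may use a different test runner. In a real repository the tests would normally live in their own file, and they would be written (and seen to fail) before the function body exists.

```python
import unittest


def parse_count(raw: str) -> int:
    """Convert a count such as ' 1,234 ' to an int; blank strings become 0."""
    cleaned = raw.strip().replace(",", "")
    if not cleaned:
        return 0
    # int() raises ValueError for non-numeric text such as "n/a".
    return int(cleaned)


class ParseCountTests(unittest.TestCase):
    """Each test pairs one input with the expected output decided up front."""

    def test_plain_number(self):
        self.assertEqual(parse_count("42"), 42)

    def test_thousands_separator(self):
        self.assertEqual(parse_count("1,234"), 1234)

    def test_surrounding_whitespace(self):
        self.assertEqual(parse_count("  7 "), 7)

    def test_blank_is_zero(self):
        self.assertEqual(parse_count(""), 0)

    def test_non_numeric_raises(self):
        with self.assertRaises(ValueError):
            parse_count("n/a")


def run_tests() -> unittest.TestResult:
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseCountTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` runs all five cases. To see the "confirm they fail" step, temporarily replace the body of `parse_count` with `raise NotImplementedError` and run the tests again.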

Outcome

As a result, your:

should be relatively short and easy to read.

This isn't a one-time thing; continue testing and refactoring as you continue with the Project.

Part 4

Retro

You will hold a team retrospective, with the goal of improving how your team works together. Since the groups are small, it can be fairly informal.

  1. Schedule 45 minutes for the retro.
    • The retro needs to be done live/synchronous, not asynchronous.
  2. Read about retros.
  3. Decide who will be the Facilitator.
    • Optional: Get someone from outside the team.
  4. Facilitator: Set up EasyRetro. Instructions.
  5. In the actual retro:
    1. Read the Agile Prime Directive out loud.
    2. 5 minutes: Individually write down "what went well" and "what could be better".
    3. 10-15 minutes: Discuss what has gone well.
    4. 20-25 minutes: Discuss what could be better.
    5. 5 minutes: Document takeaways / action items.

Keep going

  1. Move your Proposal to the Streamlit app as is.
  2. Revisit the Proposal.
    • Any new insights?
    • Anything you want to adjust?
  3. Document any changes to the Proposal on the Streamlit page.
  4. Proceed with enhancing the app.
    • If the majority of your code (to call APIs, etc.) is in modules/functions, it can be imported from a Jupyter notebook. You can do exploratory analysis there, moving things to modules/Streamlit as you go.
    • You might not be able to fully answer the question(s) yet, but get as close as you can.

At this point, your project should be looking more like one of the examples. Looking through the Streamlit data elements may be helpful.

Submit

Submit links to:

  • The EasyRetro board
  • Jupyter notebook(s), if any
  • The (updated) Streamlit app

Part 5

Goal: Understand how to work with a cloud-based database

Notes

  • A service account has been created in your Project for you. It has been given read-only access to BigQuery.
  • There are various things that can go wrong in these steps. Don't wait until the last minute.

Steps

Do the following for your regularly-updated data source. Only do one for now — we'll do the rest in Lab 10.

  1. Install pandas-gbq.
  2. Load data.
  3. Have your app use BigQuery.
    1. Each team member will need to:
      1. Create a service account key as JSON. The service account is streamlit@[project].iam.gserviceaccount.com.
      2. Set up secrets management locally.
        • Make sure to add secrets.toml to your .gitignore so that you don't accidentally commit it to Git.
      3. Copy the key information to your secrets.toml file.
    2. Modify your app to read data from BigQuery.
    3. Copy the secrets to your deployed app.
    4. Re-deploy.
  4. Submit the links via CourseWorks for:
    • The pull request(s)
    • The link to your live Streamlit app
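For the "read data from BigQuery" step, one possible shape is sketched below. It assumes the service account key fields were copied into `secrets.toml` under a section named `gcp_service_account` (the section name is your choice), and the table ID shown in the usage comment is made up. `build_query` and `load_table` are hypothetical helpers, not part of any library; the library calls they rely on are `pandas_gbq.read_gbq` and `Credentials.from_service_account_info`.

```python
import re

# Assumption: the key lives in secrets.toml under [gcp_service_account].
SECRETS_SECTION = "gcp_service_account"

# Accept only "project.dataset.table" made of safe characters, so the
# identifier can be placed inside backticks in the SQL text.
_TABLE_ID = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_-]+$")


def build_query(table_id: str, limit: int = 1000) -> str:
    """Build a simple SELECT for a fully qualified table ID."""
    if not _TABLE_ID.match(table_id):
        raise ValueError(f"unexpected table ID: {table_id!r}")
    return f"SELECT * FROM `{table_id}` LIMIT {int(limit)}"


def load_table(table_id: str, service_account_info, limit: int = 1000):
    """Read rows from BigQuery into a DataFrame using a service account key."""
    # Imported here so build_query can be tested without the cloud
    # libraries installed.
    import pandas_gbq
    from google.oauth2 import service_account

    info = dict(service_account_info)
    credentials = service_account.Credentials.from_service_account_info(info)
    return pandas_gbq.read_gbq(
        build_query(table_id, limit),
        project_id=info["project_id"],
        credentials=credentials,
    )


# In the Streamlit app (hypothetical table ID):
# import streamlit as st
# df = load_table("my-project.my_dataset.my_table", st.secrets[SECRETS_SECTION])
```

Because the key is read from Streamlit secrets rather than from a file in the repository, the same code can run locally and in the deployed app once the secrets are copied over.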

Part 6

Data flow

Visually map your data flow, end to end.

  • What happens at each step?
  • What can go wrong?
  • Get granular.
  • Go all the way upstream. How does the data get collected/generated?
  • You can use:
    • Paper
    • Google Drawings
    • A fancier diagramming tool
      • Don't over-complicate this

Submit

Submit via CourseWorks:

  • An image of / a link to your map
  • Link(s) to:
    • Your pull request(s)
    • A successful run of the GitHub Action

Part 7

Goal: Determine and prioritize TODOs

You'll do this prioritization exercise as a group.

  • This must be done synchronously.
  • Look back at the expectations.
  • The Prep can be done in the meeting itself.
  • You can use paper/stickies or a digital template like Miro's.

Submit a photo/link to the matrix via CourseWorks.

Part 8

Refinement

Goal: Meet the expectations

Do tasks you came up with in the prioritization exercise in order of priority.

Presentation

Goal: Force clarity of the project and code by having to show and explain them to others

Each group will do a presentation on their Project in class.

  • 10-ish minutes
  • Slides optional
  • Everyone in the group should speak.
  • Explain the initial proposal and how it's evolved.
  • Show the live app.
  • Walk through the code.
  • Talk through your findings.

Final check-in