description: Why is data management important?

Overview

{% embed url="https://www.youtube.com/watch?v=xz-Uzcpc4AE" caption="Overview - Data Management" %}

Summary

  • Data science has always been less about machine learning than about cleaning, shaping, and moving data from place to place.
  • Here are the important concepts in data management:
    • Sources - how to get training data
    • Labeling - how to label proprietary data at scale
    • Storage - how to store data and metadata appropriately
    • Versioning - how to update data through user activity or additional labeling
    • Processing - how to aggregate and convert raw data and metadata
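
To make the versioning concept above more concrete, here is a minimal sketch, assuming a dataset small enough to hold in memory as a list of labeled records. The function name `dataset_version` and the record fields (`id`, `text`, `label`) are hypothetical and chosen only for illustration. The idea is to derive a version ID from the data's content, so that additional labeling yields a new ID.

```python
import hashlib
import json


def dataset_version(records):
    """Return a short, deterministic version ID for a list of labeled records.

    Hypothetical helper: the ID is a truncated SHA-256 hash of the records
    serialized in a canonical order.
    """
    # Sort by record ID so the version depends on content, not on list order.
    canonical = json.dumps(sorted(records, key=lambda r: r["id"]), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Illustrative records only; the fields are assumptions, not a fixed schema.
v1_records = [
    {"id": 1, "text": "great product", "label": "positive"},
    {"id": 2, "text": "arrived broken", "label": "negative"},
]

# Additional labeling adds a record, which changes the version ID.
v2_records = v1_records + [{"id": 3, "text": "works fine", "label": "positive"}]

v1 = dataset_version(v1_records)
v2 = dataset_version(v2_records)
print("versions differ:", v1 != v2)
```

Storing the version ID alongside each trained model is one way to record which snapshot of the data the model saw. Larger datasets would likely need a dedicated tool rather than in-memory hashing.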