Commit 4d9f3c8

Merge branch 'microsoft:main' into translation-pt-br
2 parents 8b676e3 + a0f0801 commit 4d9f3c8

17 files changed: +6668 -421 lines changed

1-Introduction/01-defining-data-science/README.md

Lines changed: 11 additions & 11 deletions
@@ -1,12 +1,12 @@
 # Defining Data Science
 
-|![ Sketchnote by [(@sketchthedocs)](https://sketchthedocs.dev) ](../../sketchnotes/01-Definitions.png)|
-|:---:|
-|Defining Data Science - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
+| ![ Sketchnote by [(@sketchthedocs)](https://sketchthedocs.dev) ](../../sketchnotes/01-Definitions.png) |
+| :----------------------------------------------------------------------------------------------------: |
+| Defining Data Science - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
 
 ---
 
-[![Defining Data Science Video](images/video-def-ds.png)](https://youtu.be/pqqsm5reGvs)
+[![Defining Data Science Video](images/video-def-ds.png)](https://youtu.be/beZ7Mb_oz9I)
 
 ## [Pre-lecture quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/0)
 
@@ -33,7 +33,7 @@ This definition highlights the following important aspects of data science:
 > Another important aspect of Data Science is that it studies how data can be gathered, stored and operated upon using computers. While statistics gives us mathematical foundations, data science applies mathematical concepts to actually draw insights from data.
 
 One of the ways (attributed to [Jim Gray](https://en.wikipedia.org/wiki/Jim_Gray_(computer_scientist))) to look at the data science is to consider it to be a separate paradigm of science:
-* **Empyrical**, in which we rely mostly on observations and results of experiments
+* **Empirical**, in which we rely mostly on observations and results of experiments
 * **Theoretical**, where new concepts emerge from existing scientific knowledge
 * **Computational**, where we discover new principles based on some computational experiments
 * **Data-Driven**, based on discovering relationships and patterns in the data
@@ -69,11 +69,11 @@ Vast amounts of data are incomprehensible for a human being, but once we create
 
 As we have already mentioned - data is everywhere, we just need to capture it in the right way! It is useful to distinguish between **structured** and **unstructured** data. The former are typically represented in some well-structured form, often as a table or number of tables, while latter is just a collection of files. Sometimes we can also talk about **semistructured** data, that have some sort of a structure that may vary greatly.
 
-| Structured | Semi-structured | Unstructured |
-|----------- |-----------------|--------------|
-| List of people with their phone numbers | Wikipedia pages with links | Text of Encyclopaedia Britannica |
-| Temperature in all rooms of a building at every minute for the last 20 years | Collection of scientific papers in JSON format with authors, data of publication, and abstract | File share with corporate documents |
-| Data for age and gender of all people entering the building | Internet pages | Raw video feed from surveillance camera |
+| Structured | Semi-structured | Unstructured |
+| ---------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- | --------------------------------------- |
+| List of people with their phone numbers | Wikipedia pages with links | Text of Encyclopaedia Britannica |
+| Temperature in all rooms of a building at every minute for the last 20 years | Collection of scientific papers in JSON format with authors, data of publication, and abstract | File share with corporate documents |
+| Data for age and gender of all people entering the building | Internet pages | Raw video feed from surveillance camera |
 
 ## Where to get Data
 
@@ -107,7 +107,7 @@ First step is to collect the data. While in many cases it can be a straightforwa
 Storing the data can be challenging, especially if we are talking about big data. When deciding how to store data, it makes sense to anticipate the way you would want later on to query them. There are several ways data can be stored:
 <ul>
 <li>Relational database stores a collection of tables, and uses a special language called SQL to query them. Typically, tables would be connected to each other using some schema. In many cases we need to convert the data from original form to fit the schema.</li>
-<li><a href="https://en.wikipedia.org/wiki/NoSQL">NoSQL</a> database, such as <a href="https://azure.microsoft.com/services/cosmos-db/?WT.mc_id=acad-31812-dmitryso">CosmosDB</a>, does not enforce schema on data, and allows storing more complex data, for example, hierarchical JSON documents or graphs. However, NoSQL database does not have rich querying capabilities of SQL, and cannot enforce referential integrity between data.</li>
+<li><a href="https://en.wikipedia.org/wiki/NoSQL">NoSQL</a> database, such as <a href="https://azure.microsoft.com/services/cosmos-db/?WT.mc_id=academic-31812-dmitryso">CosmosDB</a>, does not enforce schema on data, and allows storing more complex data, for example, hierarchical JSON documents or graphs. However, NoSQL database does not have rich querying capabilities of SQL, and cannot enforce referential integrity between data.</li>
 <li><a href="https://en.wikipedia.org/wiki/Data_lake">Data Lake</a> storage is used for large collections of data in raw form. Data lakes are often used with big data, where all data cannot fit into one machine, and has to be stored and processed by a cluster. <a href="https://en.wikipedia.org/wiki/Apache_Parquet">Parquet</a> is the data format that is often used in conjunction with big data.</li>
 </ul>
 </dd>
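
Reviewer note, not part of the commit: the last hunk contrasts a relational database, which is queried with SQL against a fixed schema, with document-style storage of hierarchical JSON. The Python sketch below is one way to picture that contrast. It rests on several assumptions: the standard-library `sqlite3` module stands in for a relational database, a plain JSON string stands in for a document in a NoSQL store (this is not the CosmosDB API), and the sample records and the function names `relational_lookup` and `author_names` are hypothetical.

```python
import json
import sqlite3

# Hypothetical sample data, invented for illustration only.
people = [("Alice", "555-0100"), ("Bob", "555-0101")]


def relational_lookup(name):
    """Store rows in a fixed-schema table and look one up with SQL."""
    conn = sqlite3.connect(":memory:")
    # The schema is declared up front; every row must fit it.
    conn.execute(
        "CREATE TABLE people (name TEXT PRIMARY KEY, phone TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO people VALUES (?, ?)", people)
    row = conn.execute(
        "SELECT phone FROM people WHERE name = ?", (name,)
    ).fetchone()
    conn.close()
    return row[0] if row else None


# Document-style record: nested, and fields may differ between documents.
paper_json = json.dumps({
    "title": "Example paper",
    "authors": [{"name": "Alice"}, {"name": "Bob"}],
    "abstract": "Placeholder text.",
})


def author_names(document):
    """Read a nested field from a JSON document, tolerating its absence."""
    doc = json.loads(document)
    return [author["name"] for author in doc.get("authors", [])]
```

Here `relational_lookup("Alice")` goes through a declared schema and a SQL query, while `author_names(paper_json)` walks a nested structure that no schema enforces, which is why the second function has to handle a missing `authors` field itself.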
