Commit 9306275

Merge branch 'microsoft:main' into main
2 parents 530f5ae + b1a9aff commit 9306275

43 files changed: +2123 -324 lines changed

1-Introduction/01-defining-data-science/notebook.ipynb

Lines changed: 2 additions & 2 deletions
@@ -70,7 +70,7 @@
 "\r\n",
 "The next step is to convert the data into the form suitable for processing. In our case, we have downloaded HTML source code from the page, and we need to convert it into plain text.\r\n",
 "\r\n",
-"There are many ways this can be done. We will use the simplest build-in [HTMLParser](https://docs.python.org/3/library/html.parser.html) object from Python. We need to subclass the `HTMLParser` class and define the code that will collect all text inside HTML tags, except `<script>` and `<style>` tags."
+"There are many ways this can be done. We will use the simplest built-in [HTMLParser](https://docs.python.org/3/library/html.parser.html) object from Python. We need to subclass the `HTMLParser` class and define the code that will collect all text inside HTML tags, except `<script>` and `<style>` tags."
 ],
 "metadata": {}
 },
@@ -416,4 +416,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 2
-}
+}
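
For readers of this hunk: the changed notebook cell describes subclassing `HTMLParser` so that text is collected from every tag except `<script>` and `<style>`. A minimal sketch of that idea follows. The class name `TextExtractor`, the helper `html_to_text`, and the sample HTML are illustrative assumptions, not the notebook's actual code.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []       # text fragments found so far
        self._skip_depth = 0   # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Hypothetical helper: run the parser and join the collected text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up sample page, used only for illustration.
sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><p>Hello</p><script>var x = 1;</script><p>world</p></body></html>"
)
text = html_to_text(sample)
print(text)
```

The depth counter is one possible design choice; a simple boolean flag would also work for pages where these tags are never nested.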

1-Introduction/01-defining-data-science/translations/README.es.md

Lines changed: 171 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+# Tarea: Escenarios de la ciencia de datos
+
+En esta primera tarea, os pedimos pensar sobre algún problema o proceso de la vida real en distintos contextos, y como se podrían solucionar o mejorar utilizando procesos de ciencia de datos. Piensa en lo siguiente:
+
+1. ¿Qué datos puedes obtener?
+1. ¿Cómo los obtendrías?
+1. ¿Cómo los almacenarías? ¿Qué tamaño es podemos esperar que tengan los datos?
+1. ¿Qué información podrías ser capaz de extraer de estos datos? ¿qué decisiones podríamos tomar basándonos en ellos?
+
+Intenta pensar en 3 diferentes problemas/procesos y describe cada uno de los puntos de arriba para el contexto de cada problema.
+
+Estos son algunos problemas o contextos que pueden ayudarte a empezar a pensar:
+
+1. ¿Cómo se pueden usar los datos para mejorar el proceso de educación de niños en los colegios?
+1. ¿Cómo podemos usar los datos para controlar la vacunación durante la pandemia?
+1. ¿Cómo se pueden usar los datos para asegurarnos de que somos productivos en nuestro trabajo?
+
+## Instrucciones
+
+Rellena la siguiente table (sustituye los problemas sugeridos por los propuestos por tí si es necesario):
+
+| Contexto del problema | Problema | Qué datos obtener | Cómo almacenar los datos | Qué información/decisiones podemos tomar |
+|----------------|---------|-----------------------|-----------------------|--------------------------------------|
+| Educación | | | | |
+| Vacunación | | | | |
+| Productividad | | | | |
+
+## Rúbrica
+
+Ejemplar | Adecuada | Necesita mejorar
+--- | --- | -- |
+Es capaz de indentificar fuentes de datos razonables, formas de almacenarlos y posibles decisiones/información para todos los contextos | Algunos aspectos de la solución no están detallados, no se habla sobre el almacenamiento de los datos, al menos se describen dos contextos distintos | Solo se describen partes de la solución, solo se considera un contexto.

1-Introduction/03-defining-data/README.md

Lines changed: 2 additions & 2 deletions
@@ -25,14 +25,14 @@ A benefit of structured data is that it can be organized in such a way that it c
 Examples of structured data: spreadsheets, relational databases, phone numbers, bank statements

 ### Unstructured Data
-Unstructured data typically cannot be categorized into into rows or columns and doesn't contain a format or set of rules to follow. Because unstructured data has less restrictions on its structure it's easier to add new information in comparison to a structured dataset. If a sensor capturing data on barometric pressure every 2 minutes has received an update that now allows it to measure and record temperature, it doesn't require altering the existing data if it's unstructured. However, this may make analyzing or investigating this type of data take longer. For example, a scientist who wants to find the average temperature of the previous month from the sensors data, but discovers that the sensor recorded an "e" in some of its recorded data to note that it was broken instead of a typical number, which means the data is incomplete.
+Unstructured data typically cannot be categorized into rows or columns and doesn't contain a format or set of rules to follow. Because unstructured data has less restrictions on its structure it's easier to add new information in comparison to a structured dataset. If a sensor capturing data on barometric pressure every 2 minutes has received an update that now allows it to measure and record temperature, it doesn't require altering the existing data if it's unstructured. However, this may make analyzing or investigating this type of data take longer. For example, a scientist who wants to find the average temperature of the previous month from the sensors data, but discovers that the sensor recorded an "e" in some of its recorded data to note that it was broken instead of a typical number, which means the data is incomplete.

 Examples of unstructured data: text files, text messages, video files

 ### Semi-structured
 Semi-structured data has features that make it a combination of structured and unstructured data. It doesn't typically conform to a format of rows and columns but is organized in a way that is considered structured and may follow a fixed format or set of rules. The structure will vary between sources, such as a well defined hierarchy to something more flexible that allows for easy integration of new information. Metadata are indicators that help decide how the data is organized and stored and will have various names, based on the type of data. Some common names for metadata are tags, elements, entities and attributes. For example, a typical email message will have a subject, body and a set of recipients and can be organized by whom or when it was sent.

-Examples of unstructured data: HTML, CSV files, JavaScript Object Notation (JSON)
+Examples of semi-structured data: HTML, CSV files, JavaScript Object Notation (JSON)

 ## Sources of Data
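
To make the corrected "semi-structured" line concrete, here is a small sketch of the email example from the lesson text, expressed as JSON. The field names (`from`, `to`, `subject`, `body`, `priority`) and all addresses are made-up assumptions for illustration. The point is only that each record is organized by named fields rather than fixed rows and columns, and that a record may carry extra fields.

```python
import json
from collections import defaultdict

# Hypothetical email records; the second one carries an extra "priority"
# field, which a semi-structured format tolerates without a schema change.
raw = """
[
  {"from": "ana@example.com", "to": ["bo@example.com"],
   "subject": "Report", "body": "Draft attached."},
  {"from": "bo@example.com", "to": ["ana@example.com", "cy@example.com"],
   "subject": "Re: Report", "body": "Looks good.", "priority": "high"},
  {"from": "ana@example.com", "to": ["cy@example.com"],
   "subject": "Lunch", "body": "Noon?"}
]
"""

messages = json.loads(raw)

# Organize messages by sender, as the lesson text suggests is possible.
subjects_by_sender = defaultdict(list)
for message in messages:
    subjects_by_sender[message["from"]].append(message["subject"])

# The union of field names shows that records need not share one layout.
all_keys = set().union(*(m.keys() for m in messages))

print(dict(subjects_by_sender))
print(sorted(all_keys))
```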

1-Introduction/04-stats-and-probability/README.md

Lines changed: 3 additions & 3 deletions
@@ -25,7 +25,7 @@ In the case of discrete random variables, it is easy to describe the probability

 The most well-known discrete distribution is **uniform distribution**, in which there is a sample space of N elements, with equal probability of 1/N for each of them.

-It is more difficult to describe the probability distribution of a continuous variable, with values drawn from some interval [a,b], or the whole set of real numbers &Ropf;. Consider the case of bus arrival time. In fact, for each exact arrival time $t$, the probability of a bus arriving at exactly that time is 0!
+It is more difficult to describe the probability distribution of a continuous variable, with values drawn from some interval [a,b], or the whole set of real numbers &Ropf;. Consider the case of bus arrival time. In fact, for each exact arrival time *t*, the probability of a bus arriving at exactly that time is 0!

 > Now you know that events with 0 probability happen, and very often! At least each time when the bus arrives!

@@ -240,8 +240,8 @@ While this is definitely not exhaustive list of topics that exist within probabi
 ## 🚀 Challenge

 Use the sample code in the notebook to test other hypothesis that:
-1. First basemen and older that second basemen
-2. First basemen and taller than third basemen
+1. First basemen are older than second basemen
+2. First basemen are taller than third basemen
 3. Shortstops are taller than second basemen

 ## [Post-lecture quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/7)
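
The first hunk in this file touches the claim that a continuous variable takes any exact value *t* with probability 0. One way to see this, sketched below, is to compute the probability of landing within a window of half-width `eps` around *t* and let the window shrink. The sketch assumes, purely for illustration, that the arrival time is uniformly distributed on [a, b] = [0, 10] minutes; the lesson text does not specify a distribution, and `prob_within` is a hypothetical helper.

```python
def prob_within(t, eps, a, b):
    """P(t - eps <= X <= t + eps) for X uniform on [a, b] (assumed model)."""
    lo = max(a, t - eps)
    hi = min(b, t + eps)
    return max(0.0, hi - lo) / (b - a)


# Shrinking windows around t = 5 on the interval [0, 10].
half_widths = [1.0, 0.1, 0.01, 0.0]
probs = [prob_within(5.0, eps, 0.0, 10.0) for eps in half_widths]

for eps, p in zip(half_widths, probs):
    print(f"eps={eps}: P={p}")
```

Each probability is proportional to the window width, so it tends to 0 as the window collapses to the single point *t*, even though some arrival time always occurs.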

1-Introduction/translations/README.es.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ cómo se definen los datos y un poco de probabilidad y estadística, el núcleo
 1. [Definiendo la Ciencia de Datos](../01-defining-data-science/README.md)
 2. [Ética de la Ciencia de Datos](../02-ethics/README.md)
 3. [Definición de Datos](../03-defining-data/translations/README.es.md)
-4. [introducción a la probabilidad y estadística](../04-stats-and-probability/README.md)
+4. [Introducción a la probabilidad y estadística](../04-stats-and-probability/README.md)

 ### Créditos

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+<div dir="rtl">
+
+# مقدمه‌ای بر علم داده
+
+
+![data in action](../images/data.jpg)
+> تصویر از <a href="https://unsplash.com/@dawson2406?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Stephen Dawson</a> در <a href="https://unsplash.com/s/photos/data?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Unsplash</a>
+
+شما در این بخش با تعریف علم داده و ملاحظات اخلاقی که یک دانشمند علوم داده باید در نظر داشته باشد آشنا خواهید شد. همچنین با تعریف داده و کمی هم با آمار و احتمالات که پایه و اساس علم داده است آشنا خواهید شد.
+
+### سرفصل ها
+
+1. [تعریف علم داده](../01-defining-data-science/README.md)
+2. [اصول اخلاقی علم داده](../02-ethics/README.md)
+3. [تعریف داده](../03-defining-data/README.md)
+4. [مقدمه ای بر آمار و احتمال](../04-stats-and-probability/README.md)
+
+### تهیه کنندگان
+
+این درس ها با ❤️ توسط [Nitya Narasimhan](https://twitter.com/nitya) و [Dmitry Soshnikov](https://twitter.com/shwars) تهیه شده است.
+</div>
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+# Inleiding tot datawetenschap
+
+![data in actie](images/data.jpg)
+> Beeld door <a href="https://unsplash.com/@dawson2406?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Stephen Dawson</a> op <a href="https://unsplash.com/s/photos/data?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Unsplash</a>
+
+In deze lessen ontdek je hoe Data Science wordt gedefinieerd en leer je over ethische overwegingen waarmee een datawetenschapper rekening moet houden. Je leert ook hoe gegevens worden gedefinieerd en leert over statistiek en waarschijnlijkheid, de academische kerndomeinen van Data Science.
+
+### Onderwerpen
+
+1. [Data Science definiëren](01-defining-data-science/README.md)
+2. [Ethiek in Data Science](02-ethics/README.md)
+3. [Data definiëren](03-defining-data/README.md)
+4. [Inleiding tot statistiek en kansrekening](04-stats-and-probability/README.md)
+
+### Credits
+
+Dit lesmateriaal is met liefde ❤️ geschreven door [Nitya Narasimhan](https://twitter.com/nitya) en [Dmitry Soshnikov](https://twitter.com/shwars).

2-Working-With-Data/06-non-relational/README.md

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ NoSQL is an umbrella term for the different ways to store non-relational data an

 ![Graphical representation of a columnar data store showing a customer database with two column families named Identity and Contact Info](images/columnar-db.png)

-[Columnar](https://docs.microsoft.com/en-us/azure/architecture/data-guide/big-data/non-relational-data#columnar-data-stores) data stores organizes data into columns and rows like a relational data structure but each column is divided into groups called a column family, where the all the data under one column is related and can be retrieved and changed in one unit.
+[Columnar](https://docs.microsoft.com/en-us/azure/architecture/data-guide/big-data/non-relational-data#columnar-data-stores) data stores organizes data into columns and rows like a relational data structure but each column is divided into groups called a column family, where all the data under one column is related and can be retrieved and changed in one unit.


 ### Document Data Stores with the Azure Cosmos DB
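
The sentence edited in this hunk describes column families as groups of related columns that are read and changed as one unit. A rough in-memory sketch of that idea follows, using plain dictionaries. The family names `identity` and `contact_info` mirror the image description in the hunk; the column names, row keys, values, and helper functions are assumptions for illustration and do not reflect how any particular columnar store is implemented.

```python
# Row key -> column family -> columns. All sample values are made up.
customers = {
    "001": {
        "identity": {"first_name": "Ana", "last_name": "Lopez"},
        "contact_info": {"email": "ana@example.com", "phone": "555-0100"},
    },
    "002": {
        "identity": {"first_name": "Bo", "last_name": "Chen"},
        "contact_info": {"email": "bo@example.com"},
    },
}


def read_family(store, row_key, family):
    """Return a copy of one column family for a row, fetched as a unit."""
    return dict(store[row_key][family])


def write_family(store, row_key, family, values):
    """Replace one column family for a row, leaving other families alone."""
    store[row_key][family] = dict(values)


before = read_family(customers, "002", "contact_info")
write_family(customers, "002", "contact_info",
             {"email": "bo@example.com", "phone": "555-0199"})
after = read_family(customers, "002", "contact_info")
print(before, after)
```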

2-Working-With-Data/07-python/README.md

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ Pandas is centered around a few basic concepts.

 ### Series

-**Series** is a sequence of values, similar to a list or numpy array. The main difference is that series also has and **index**, and when we operate on series (eg., add them), the index is taken into account. Index can be as simple as integer row number (it is the index used by default when creating a series from list or array), or it can have a complex structure, such as date interval.
+**Series** is a sequence of values, similar to a list or numpy array. The main difference is that series also has an **index**, and when we operate on series (eg., add them), the index is taken into account. Index can be as simple as integer row number (it is the index used by default when creating a series from list or array), or it can have a complex structure, such as date interval.

 > **Note**: There is some introductory Pandas code in the accompanying notebook [`notebook.ipynb`](notebook.ipynb). We only outline some the examples here, and you are definitely welcome to check out the full notebook.
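
The corrected sentence says that operations on a Series take the index into account. A short sketch of what that means in practice follows, assuming pandas is installed. The labels and values are made up for illustration and are not taken from the accompanying notebook.

```python
import pandas as pd

# Two series whose indexes only partly overlap (labels are illustrative).
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Addition matches values by index label, not by position.
# Labels present in only one series produce NaN in the result.
total = a + b
print(total)

# Without an explicit index, pandas assigns integer row numbers.
default = pd.Series([5, 6])
print(list(default.index))
```

Here `y` and `z` are summed because both series contain them, while `x` and `w` have no partner and come out as missing values.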
