
Commit 8cdb64b

🌐 Update translations via Co-op Translator
1 parent 2a5928c commit 8cdb64b

65 files changed

+6606
-0
lines changed

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
1+
<!--
2+
CO_OP_TRANSLATOR_METADATA:
3+
{
4+
"original_hash": "2583a9894af7123b2fcae3376b14c035",
5+
"translation_date": "2025-08-31T11:09:34+00:00",
6+
"source_file": "1-Introduction/01-defining-data-science/README.md",
7+
"language_code": "en"
8+
}
9+
-->
10+
We can also analyze the test results to identify which questions are most often answered incorrectly. This could indicate areas where the material might need to be clarified or expanded. Additionally, we could track how students interact with the course content—such as which videos they replay, which sections they skip, or how often they participate in discussions. This data could help us understand how students engage with the material and identify opportunities to make the course more engaging and effective.
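
The idea of finding the most frequently missed questions can be sketched in a few lines. This is a minimal illustration, not code from the lesson: the record layout `(student, question_id, is_correct)`, the sample data, and the function name `error_rates` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical test results: one (student, question_id, is_correct) record per answer.
answers = [
    ("alice", "q1", True), ("alice", "q2", False), ("alice", "q3", False),
    ("bob", "q1", True), ("bob", "q2", False), ("bob", "q3", True),
    ("carol", "q1", False), ("carol", "q2", False), ("carol", "q3", True),
]


def error_rates(records):
    """Return {question_id: fraction of answers that were wrong}."""
    total = Counter()
    wrong = Counter()
    for _student, question, is_correct in records:
        total[question] += 1
        if not is_correct:
            wrong[question] += 1
    return {q: wrong[q] / total[q] for q in total}


rates = error_rates(answers)
# Questions ordered from most to least often missed.
hardest_first = sorted(rates, key=rates.get, reverse=True)
print(hardest_first)
```

Questions near the top of `hardest_first` are candidates for material that may need clarification, although a high error rate can also mean the question itself is badly worded.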
11+
12+
By collecting and analyzing this data, we are essentially digitizing the learning process. Once we have this data, we can apply data science techniques to gain insights and make informed decisions about how to improve the course. This is an example of digital transformation in education.
13+
14+
Digital transformation is not limited to education—it can be applied to virtually any industry. For example:
15+
16+
- In **healthcare**, digital transformation might involve using patient data to predict disease outbreaks or personalize treatment plans.
17+
- In **retail**, it could mean analyzing customer purchase data to optimize inventory or create personalized marketing campaigns.
18+
- In **manufacturing**, it might involve using sensor data from machines to predict maintenance needs and reduce downtime.
19+
20+
The key idea is that by digitizing processes and applying data science, businesses can gain valuable insights, improve efficiency, and make better decisions.
21+
You might say this method isn't perfect, as modules can vary in length. It might be more reasonable to divide the time by the module's length (measured in the number of characters) and compare those results instead.
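
As a rough sketch of that normalization, the snippet below divides the average completion time by the module's length in characters. The module names, times, and lengths are made-up values, and `minutes_per_1000_chars` is a hypothetical helper, so treat this as an illustration only.

```python
# Hypothetical per-module data: average completion time (minutes) and text length (characters).
modules = {
    "intro": {"avg_minutes": 30.0, "characters": 6000},
    "statistics": {"avg_minutes": 90.0, "characters": 12000},
    "visualization": {"avg_minutes": 45.0, "characters": 9000},
}


def minutes_per_1000_chars(stats):
    """Time spent per 1000 characters of module text."""
    return stats["avg_minutes"] * 1000 / stats["characters"]


normalized = {name: minutes_per_1000_chars(s) for name, s in modules.items()}
# The module that takes longest relative to its length.
slowest = max(normalized, key=normalized.get)
print(normalized, slowest)
```

With these sample numbers, "statistics" takes three times as long as "intro" in absolute terms, but only 1.5 times as long once length is taken into account.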
22+
When we start analyzing the results of multiple-choice tests, we can try to identify which concepts students struggle to understand and use that information to improve the content. To achieve this, we need to design tests so that each question corresponds to a specific concept or piece of knowledge.
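
If each question is tagged with the concept it tests, per-question results can be rolled up into per-concept error rates. The sketch below assumes a simple `question_concept` dictionary and invented answer data; neither comes from the lesson.

```python
from collections import defaultdict

# Hypothetical mapping: each question tests exactly one concept.
question_concept = {"q1": "mean", "q2": "variance", "q3": "variance", "q4": "correlation"}

# Hypothetical answers: (question_id, is_correct).
answers = [
    ("q1", True), ("q1", True), ("q2", False), ("q2", True),
    ("q3", False), ("q3", False), ("q4", True), ("q4", False),
]


def concept_error_rates(answers, question_concept):
    """Return {concept: fraction of wrong answers across all its questions}."""
    counts = defaultdict(lambda: [0, 0])  # concept -> [wrong, total]
    for question, is_correct in answers:
        entry = counts[question_concept[question]]
        entry[1] += 1
        if not is_correct:
            entry[0] += 1
    return {concept: wrong / total for concept, (wrong, total) in counts.items()}


concept_rates = concept_error_rates(answers, question_concept)
print(concept_rates)
```

Grouping by concept rather than by question makes the result less sensitive to a single poorly worded question.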
23+
24+
If we want to go a step further, we can compare the time taken for each module with the age category of the students. We might discover that for certain age groups, it takes an unusually long time to complete the module, or that students drop out before finishing it. This can help us provide age-appropriate recommendations for the module and reduce dissatisfaction caused by unmet expectations.
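
A comparison by age category might look like the following sketch. The age brackets, the record layout `(age_group, minutes_spent, completed_module)`, and the numbers are assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (age_group, minutes_spent, completed_module).
records = [
    ("18-25", 40, True), ("18-25", 50, True), ("18-25", 45, True),
    ("26-40", 55, True), ("26-40", 60, False), ("26-40", 65, True),
    ("41+", 120, True), ("41+", 110, False), ("41+", 130, False),
]


def summarize_by_age(records):
    """Return median time and dropout rate for each age group."""
    groups = defaultdict(list)
    for age_group, minutes, completed in records:
        groups[age_group].append((minutes, completed))
    summary = {}
    for age_group, rows in groups.items():
        summary[age_group] = {
            "median_minutes": median(m for m, _ in rows),
            "dropout_rate": sum(not done for _, done in rows) / len(rows),
        }
    return summary


summary = summarize_by_age(records)
for age_group, stats in summary.items():
    print(age_group, stats)
```

The median is used here instead of the mean so that a few students who leave the module open for hours do not dominate the result.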
25+
26+
## 🚀 Challenge
27+
28+
In this challenge, we will try to identify concepts relevant to the field of Data Science by analyzing texts. We will take a Wikipedia article on Data Science, download and process the text, and then create a word cloud like this one:
29+
30+
![Word Cloud for Data Science](../../../../1-Introduction/01-defining-data-science/images/ds_wordcloud.png)
31+
32+
Visit [`notebook.ipynb`](../../../../1-Introduction/01-defining-data-science/notebook.ipynb ':ignore') to review the code. You can also run the code and observe how it performs all the data transformations in real time.
33+
34+
> If you are unfamiliar with running code in a Jupyter Notebook, check out [this article](https://soshnikov.com/education/how-to-execute-notebooks-from-github/).
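
To give a feel for the kind of processing behind a word cloud, here is a minimal sketch that counts word frequencies in a short stand-in text. It is not the notebook's code: the sample text, the tiny stop-word list, and the `word_frequencies` helper are assumptions, and the notebook may use different libraries and a different keyword extraction method.

```python
import re
from collections import Counter

# Short stand-in text; the lesson's notebook works on the full Wikipedia article instead.
text = (
    "Data science is an interdisciplinary field that uses statistics, "
    "computing, and domain knowledge to extract insights from data. "
    "Data science is related to data mining and machine learning."
)

# A tiny, hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"is", "an", "that", "uses", "and", "to", "from", "the", "of"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lowercase alphabetic words, skipping stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stop_words)


freqs = word_frequencies(text)
print(freqs.most_common(3))
```

A word cloud then draws each word at a size proportional to its count, so frequent terms such as "data" stand out.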
35+
36+
## [Post-lecture quiz](https://purple-hill-04aebfb03.1.azurestaticapps.net/quiz/1)
37+
38+
## Assignments
39+
40+
* **Task 1**: Modify the code in the notebook linked above to identify related concepts for the fields of **Big Data** and **Machine Learning**.
41+
* **Task 2**: [Think About Data Science Scenarios](assignment.md)
42+
43+
## Credits
44+
45+
This lesson was created with ♥️ by [Dmitry Soshnikov](http://soshnikov.com)
46+
47+
---
48+
49+
**Disclaimer**:
50+
This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we aim for accuracy, please note that automated translations may include errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is advised. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
1+
<!--
2+
CO_OP_TRANSLATOR_METADATA:
3+
{
4+
"original_hash": "4e0f1773b9bee1be3b28f9fe2c71b3de",
5+
"translation_date": "2025-08-31T11:09:48+00:00",
6+
"source_file": "1-Introduction/01-defining-data-science/assignment.md",
7+
"language_code": "en"
8+
}
9+
-->
10+
# Assignment: Data Science Scenarios
11+
12+
In this first assignment, we ask you to think about some real-life processes or problems in different domains, and how you can improve them using the Data Science process. Consider the following:
13+
14+
1. What data can you collect?
15+
1. How would you collect it?
16+
1. How would you store the data? How large is the data likely to be?
17+
1. What insights might you be able to derive from this data? What decisions could be made based on the data?
18+
19+
Try to think about 3 different problems/processes and describe each of the points above for each domain.
20+
21+
Here are some domains and problems to help you start thinking:
22+
23+
1. How can data be used to improve the education process for children in schools?
24+
1. How can data be used to manage vaccination during a pandemic?
25+
1. How can data be used to ensure productivity at work?
26+
27+
## Instructions
28+
29+
Fill in the following table (replace the suggested domains with your own if needed):
30+
31+
| Problem Domain | Problem | What data to collect | How to store the data | What insights/decisions we can make |
32+
|----------------|---------|-----------------------|-----------------------|--------------------------------------|
33+
| Education | | | | |
34+
| Vaccination | | | | |
35+
| Productivity | | | | |
36+
37+
## Rubric
38+
39+
Exemplary | Adequate | Needs Improvement
40+
--- | --- | ---
41+
The solution identifies reasonable data sources, methods of storing data, and possible decisions/insights for all domains | Some aspects of the solution lack detail, data storage is not discussed, and at least 2 domains are described | Only parts of the data solution are described, and only one domain is considered.
42+
43+
---
44+
45+
**Disclaimer**:
46+
This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we aim for accuracy, please note that automated translations may include errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is advised. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
1+
<!--
2+
CO_OP_TRANSLATOR_METADATA:
3+
{
4+
"original_hash": "a8f79b9c0484c35b4f26e8aec7fc4d56",
5+
"translation_date": "2025-08-31T11:09:55+00:00",
6+
"source_file": "1-Introduction/01-defining-data-science/solution/assignment.md",
7+
"language_code": "en"
8+
}
9+
-->
10+
# Assignment: Data Science Scenarios
11+
12+
In this first assignment, we ask you to think about some real-life processes or problems in different domains, and how you can improve them using the Data Science process. Consider the following:
13+
14+
1. What data can you collect?
15+
1. How would you collect it?
16+
1. How would you store the data? How large is the data likely to be?
17+
1. What insights might you be able to derive from this data? What decisions could be made based on the data?
18+
19+
Try to think about three different problems or processes and describe each of the points above for each domain.
20+
21+
Here are some domains and problems to help you start thinking:
22+
23+
1. How can you use data to improve the education process for children in schools?
24+
1. How can you use data to manage vaccination during a pandemic?
25+
1. How can you use data to ensure you are being productive at work?
26+
27+
## Instructions
28+
29+
Fill in the following table (replace the suggested domains with your own if needed):
30+
31+
| Problem Domain | Problem | What data to collect | How to store the data | What insights/decisions we can make |
32+
|----------------|---------|-----------------------|-----------------------|--------------------------------------|
33+
| Education | At universities, lecture attendance is often low, and we hypothesize that students who attend lectures more frequently tend to perform better in exams. We want to encourage attendance and test this hypothesis. | Attendance can be tracked using photos taken by security cameras in classrooms or by tracking the Bluetooth/Wi-Fi addresses of students' mobile phones in class. Exam data is already available in the university database. | If we use security camera images, we need to store a few (5-10) photos taken during class (unstructured data) and then use AI to identify students' faces (convert data to structured form). | We can calculate average attendance for each student and check for correlations with exam grades. We'll discuss correlation further in the [probability and statistics](../../04-stats-and-probability/README.md) section. To encourage attendance, we can publish weekly attendance rankings on the school portal and hold prize draws for students with the highest attendance. |
34+
| Vaccination | | | | |
35+
| Productivity | | | | |
36+
37+
> *We provide just one example answer to give you an idea of what is expected in this assignment.*
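
The correlation check described in the Education row could be sketched as follows. The attendance fractions and grades are invented sample data, and `pearson` is a hand-written helper implementing the standard Pearson correlation coefficient, not something taken from the course materials.

```python
from math import sqrt

# Hypothetical per-student data: fraction of lectures attended and exam grade (0-100).
attendance = [0.2, 0.4, 0.5, 0.7, 0.9, 1.0]
grades = [52, 58, 65, 70, 84, 88]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(attendance, grades)
print(f"correlation between attendance and grades: {r:.2f}")
```

A value of `r` close to 1 would support the hypothesis that attendance and grades move together, but it would not by itself show that attending lectures causes better grades.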
38+
39+
## Rubric
40+
41+
Exemplary | Adequate | Needs Improvement
42+
--- | --- | ---
43+
Reasonable data sources, storage methods, and possible decisions/insights are identified for all domains | Some aspects of the solution lack detail, data storage is not discussed, and at least two domains are described | Only parts of the data solution are described, and only one domain is considered.
44+
45+
---
46+
47+
**Disclaimer**:
48+
This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we aim for accuracy, please note that automated translations may include errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is advised. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.
