Commit e9b21de

Merge pull request microsoft#283 from quake2005/patch-12
Update README.md
2 parents f4c6b0b + 9baf45f commit e9b21de

1 file changed, +4 -4 lines changed
  • 2-Working-With-Data/08-data-preparation

2-Working-With-Data/08-data-preparation/README.md

Lines changed: 4 additions & 4 deletions
@@ -24,7 +24,7 @@ Depending on its source, raw data may contain some inconsistencies that will cau
 
 - **Formatting**: Depending on the source, data can have inconsistencies in how it’s presented. This can cause problems in searching for and representing the value, where it’s seen within the dataset but is not properly represented in visualizations or query results. Common formatting problems involve resolving whitespace, dates, and data types. Resolving formatting issues is typically up to the people who are using the data. For example, standards on how dates and numbers are presented can differ by country.
 
-- **Duplications**: Data that has more than one occurrence can produce inaccurate results and usually should be removed. This can be a common occurrence when joining more two or more datasets together. However, there are instances where duplication in joined datasets contain pieces that can provide additional information and may need to be preserved.
+- **Duplications**: Data that has more than one occurrence can produce inaccurate results and usually should be removed. This can be a common occurrence when joining two or more datasets together. However, there are instances where duplication in joined datasets contain pieces that can provide additional information and may need to be preserved.
 
 - **Missing Data**: Missing data can cause inaccuracies as well as weak or biased results. Sometimes these can be resolved by a "reload" of the data, filling in the missing values with computation and code like Python, or simply just removing the value and corresponding data. There are numerous reasons for why data may be missing and the actions that are taken to resolve these missing values can be dependent on how and why they went missing in the first place.
 
@@ -300,9 +300,9 @@ example4.drop_duplicates()
 1 B 2
 3 B 3
 ```
-Both `duplicated` and `drop_duplicates` default to consider all columnsm but you can specify that they examine only a subset of columns in your `DataFrame`:
+Both `duplicated` and `drop_duplicates` default to consider all columns but you can specify that they examine only a subset of columns in your `DataFrame`:
 ```python
-example6.drop_duplicates(['letters'])
+example4.drop_duplicates(['letters'])
 ```
 ```
 letters numbers
@@ -315,7 +315,7 @@ letters numbers
 
 ## 🚀 Challenge
 
-All of the discussed materials are provided as a [Jupyter Notebook](https://github.com/microsoft/Data-Science-For-Beginners/blob/main/4-Data-Science-Lifecycle/15-analyzing/notebook.ipynb). Additionally, there are exercises present after each section, give them a try!
+All of the discussed materials are provided as a [Jupyter Notebook](https://https://github.com/microsoft/Data-Science-For-Beginners/blob/main/2-Working-With-Data/08-data-preparation/notebook.ipynb). Additionally, there are exercises present after each section, give them a try!
 
 ## [Post-Lecture Quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/15)
 
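
The second hunk changes the `drop_duplicates` example to reference `example4` instead of `example6`. Below is a minimal sketch for checking the subset behaviour that the corrected line describes. The construction of `example4` is an assumption, chosen to be consistent with the output rows visible in the hunk context (`1 B 2`, `3 B 3`); the lesson's actual definition is not part of this diff.

```python
import pandas as pd

# Assumed stand-in for the lesson's `example4`; its real definition
# lies outside the lines shown in this diff.
example4 = pd.DataFrame({
    "letters": ["A", "B", "A", "B", "B"],
    "numbers": [1, 2, 1, 3, 3],
})

# Default: a row counts as a duplicate only if every column matches
# an earlier row.
all_cols = example4.drop_duplicates()

# Subset: only the `letters` column is compared, so the first row for
# each letter is kept. Equivalent to the positional form in the diff,
# example4.drop_duplicates(['letters']).
by_letter = example4.drop_duplicates(subset=["letters"])

# `duplicated` accepts the same subset and returns a boolean mask.
dup_mask = example4.duplicated(subset=["letters"])

print(all_cols)
print(by_letter)
print(dup_mask)
```

With this assumed data, the default call keeps three rows, while the subset call keeps one row per distinct letter.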
