
Commit c013d4d
Author: tcpdumpfbacke
Commit message: EN proof
1 parent: 95d1c95

File tree: 15 files changed (+181, -180 lines)


pages/platform/data-processing/42_TUTORIAL_notebook-data-cleaning/guide.de-de.md

Lines changed: 13 additions & 12 deletions
@@ -1,7 +1,7 @@
 ---
 title: Notebooks for Apache Spark - Data cleaning
 slug: notebook-spark-data-cleaning
-excerpt: Data cleaning of 2 CSV dataset with aggregation into a single clean Parquet file.
+excerpt: Data cleaning of 2 CSV datasets with aggregation into a single clean Parquet file
 section: Tutorials
 order: 3
 routes:
@@ -22,7 +22,7 @@ The purpose of this tutorial is to show how to clean data with [Apache Spark](ht
 
 *Data Cleaning* or *Data Cleansing* is the preparation of raw data by detecting and correcting records within a dataset.
 
-The tutorial presents a simple data cleaning with `Notebooks for Apache Spark`.
+This tutorial presents a simple data cleaning with `Notebooks for Apache Spark`.
 
 ## Requirements
 
@@ -33,13 +33,14 @@ The tutorial presents a simple data cleaning with `Notebooks for Apache Spark`.
 
 ### Upload data
 
-First, download these 2 datasets CSV files locally:
-* [books.csv](https://raw.githubusercontent.com/ovh/data-processing-samples/master/apache_spark_notebook_data_cleaning/books.csv)
-* [ratings.csv](https://raw.githubusercontent.com/ovh/data-processing-samples/master/apache_spark_notebook_data_cleaning/ratings.csv)
+First, download these 2 dataset CSV files locally:
+
+- [books.csv](https://raw.githubusercontent.com/ovh/data-processing-samples/master/apache_spark_notebook_data_cleaning/books.csv)
+- [ratings.csv](https://raw.githubusercontent.com/ovh/data-processing-samples/master/apache_spark_notebook_data_cleaning/ratings.csv)
 
 Then, from the [OVHcloud Control Panel](https://www.ovh.com/auth/?action=gotomanager&from=https://www.ovh.de/&ovhSubsidiary=de), go to the Object Storage section, locate your S3 bucket and upload your data by clicking `Add object`{.action}.
 
-Select both files from your computer and add them to the root `/` of your bucket.
+Select both files from your computer and add them to the root (`/`) of your bucket.
 
 ![image](images/object-storage-datasets.png){.thumbnail}
 
@@ -49,29 +50,29 @@ Select both files from your computer and add them to the root `/` of your bucket
 >
 > Please be aware that notebooks are only available in `public access` during the `alpha` of the Notebooks for Apache Spark feature. As such, be careful of the **data** and the **credentials** you may expose in these notebooks.
 
-There is a few information that we will need as inputs of the notebook.
+There is some information that we will need as inputs of the notebook.
 
 First, and while we're on the container page of the [OVHcloud Control Panel](https://www.ovh.com/auth/?action=gotomanager&from=https://www.ovh.de/&ovhSubsidiary=de) we will copy the `Endpoint` information and save it.
 
 Go back to the Object Storage home page and then to the S3 users tab, copy the user's `access key` and save it.
 
-Finally, click on action "hamburger" at the end of the user row `(...)`{.action} > `View the secret key`{.action}, copy the value and save it.
+Finally, click on the `...`{.action} button at the end of the user row, click on `View the secret key`{.action}, copy the value and save it.
 
 ### Launch and access a Notebook for Apache Spark
 
-From the [OVHcloud Control Panel](https://www.ovh.com/auth/?action=gotomanager&from=https://www.ovh.de/&ovhSubsidiary=de), go to the Data Processing section and create a new notebook by clicking `Data Processing`{.action} > `Create notebook`{.action}.
+From the [OVHcloud Control Panel](https://www.ovh.com/auth/?action=gotomanager&from=https://www.ovh.de/&ovhSubsidiary=de), go to the Data Processing section and create a new notebook by clicking `Data Processing`{.action} and then `Create notebook`{.action}.
 
 You can then reach the `JupyterLab` URL directly from the notebooks list or from the notebook page.
 
-### Experiment with notebook
+### Experiment with the notebook
 
-Now that you have your initial datasets ready on an Object Storage and a notebook running, you could start cleaning this data!
+Now that you have your initial datasets ready on an Object Storage and a notebook running, you can start cleaning this data.
 
 A preview of this notebook can be found on [GitHub](https://github.com/ovh/data-processing-samples/blob/master/apache_spark_notebook_data_cleaning/apache_spark_notebook_data_cleaning_tutorial.ipynb).
 
 ### Go further
 
-- Do you want to create a data cleaning job you could replay based on your notebook? [Here it is](https://docs.ovh.com/de/data-processing/submit-python/).
+- Do you want to create a data cleaning job you could replay based on your notebook? [Please refer to this guide](https://docs.ovh.com/de/data-processing/submit-python/).
 
 ## Feedback
 
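The tutorial these diffs edit joins two CSV datasets (books and ratings) and aggregates them into a single clean output. The actual notebook does this with Apache Spark and writes Parquet; as a rough, stdlib-only Python sketch of the underlying join-and-aggregate idea (the sample rows and column names `book_id`, `title`, `rating` are assumptions for illustration, not taken from the real files):

```python
import csv
import io
from collections import defaultdict

# Tiny in-memory stand-ins for books.csv and ratings.csv
# (the real files live in the ovh/data-processing-samples repo;
# these columns and rows are hypothetical).
BOOKS_CSV = """book_id,title
1,Dune
2,Hyperion
"""

RATINGS_CSV = """book_id,rating
1,5
1,4
2,3
"""

def average_ratings(books_csv: str, ratings_csv: str) -> dict:
    """Join books with ratings and return {title: mean rating}."""
    totals = defaultdict(lambda: [0, 0])  # book_id -> [sum, count]
    for row in csv.DictReader(io.StringIO(ratings_csv)):
        acc = totals[row["book_id"]]
        acc[0] += int(row["rating"])
        acc[1] += 1
    result = {}
    for row in csv.DictReader(io.StringIO(books_csv)):
        s, n = totals.get(row["book_id"], (0, 0))
        if n:  # drop books with no ratings: a simple "cleaning" rule
            result[row["title"]] = s / n
    return result

print(average_ratings(BOOKS_CSV, RATINGS_CSV))
```

In the Spark notebook the same shape of work would be a `join` on the book id followed by a grouped aggregation, distributed across executors rather than computed in one process.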

pages/platform/data-processing/42_TUTORIAL_notebook-data-cleaning/guide.en-asia.md

Lines changed: 12 additions & 12 deletions
@@ -1,7 +1,7 @@
 ---
 title: Notebooks for Apache Spark - Data cleaning
 slug: notebook-spark-data-cleaning
-excerpt: Data cleaning of 2 CSV dataset with aggregation into a single clean Parquet file.
+excerpt: Data cleaning of 2 CSV datasets with aggregation into a single clean Parquet file
 section: Tutorials
 order: 3
 updated: 2023-03-14
@@ -20,7 +20,7 @@ The purpose of this tutorial is to show how to clean data with [Apache Spark](ht
 
 *Data Cleaning* or *Data Cleansing* is the preparation of raw data by detecting and correcting records within a dataset.
 
-The tutorial presents a simple data cleaning with `Notebooks for Apache Spark`.
+This tutorial presents a simple data cleaning with `Notebooks for Apache Spark`.
 
 ## Requirements
 
@@ -31,13 +31,13 @@ The tutorial presents a simple data cleaning with `Notebooks for Apache Spark`.
 
 ### Upload data
 
-First, download these 2 datasets CSV files locally:
-* [books.csv](https://raw.githubusercontent.com/ovh/data-processing-samples/master/apache_spark_notebook_data_cleaning/books.csv)
-* [ratings.csv](https://raw.githubusercontent.com/ovh/data-processing-samples/master/apache_spark_notebook_data_cleaning/ratings.csv)
+First, download these 2 dataset CSV files locally:
+- [books.csv](https://raw.githubusercontent.com/ovh/data-processing-samples/master/apache_spark_notebook_data_cleaning/books.csv)
+- [ratings.csv](https://raw.githubusercontent.com/ovh/data-processing-samples/master/apache_spark_notebook_data_cleaning/ratings.csv)
 
 Then, from the [OVHcloud Control Panel](https://ca.ovh.com/auth/?action=gotomanager&from=https://www.ovh.com/asia/&ovhSubsidiary=asia), go to the Object Storage section, locate your S3 bucket and upload your data by clicking `Add object`{.action}.
 
-Select both files from your computer and add them to the root `/` of your bucket.
+Select both files from your computer and add them to the root (`/`) of your bucket.
 
 ![image](images/object-storage-datasets.png){.thumbnail}
 
@@ -47,29 +47,29 @@ Select both files from your computer and add them to the root `/` of your bucket
 >
 > Please be aware that notebooks are only available in `public access` during the `alpha` of the Notebooks for Apache Spark feature. As such, be careful of the **data** and the **credentials** you may expose in these notebooks.
 
-There is a few information that we will need as inputs of the notebook.
+There is some information that we will need as inputs of the notebook.
 
 First, and while we're on the container page of the [OVHcloud Control Panel](https://ca.ovh.com/auth/?action=gotomanager&from=https://www.ovh.com/asia/&ovhSubsidiary=asia) we will copy the `Endpoint` information and save it.
 
 Go back to the Object Storage home page and then to the S3 users tab, copy the user's `access key` and save it.
 
-Finally, click on action "hamburger" at the end of the user row `(...)`{.action} > `View the secret key`{.action}, copy the value and save it.
+Finally, click on the `...`{.action} button at the end of the user row, click on `View the secret key`{.action}, copy the value and save it.
 
 ### Launch and access a Notebook for Apache Spark
 
-From the [OVHcloud Control Panel](https://ca.ovh.com/auth/?action=gotomanager&from=https://www.ovh.com/asia/&ovhSubsidiary=asia), go to the Data Processing section and create a new notebook by clicking `Data Processing`{.action} > `Create notebook`{.action}.
+From the [OVHcloud Control Panel](https://ca.ovh.com/auth/?action=gotomanager&from=https://www.ovh.com/asia/&ovhSubsidiary=asia), go to the Data Processing section and create a new notebook by clicking `Data Processing`{.action} and then `Create notebook`{.action}.
 
 You can then reach the `JupyterLab` URL directly from the notebooks list or from the notebook page.
 
-### Experiment with notebook
+### Experiment with the notebook
 
-Now that you have your initial datasets ready on an Object Storage and a notebook running, you could start cleaning this data!
+Now that you have your initial datasets ready on an Object Storage and a notebook running, you can start cleaning this data.
 
 A preview of this notebook can be found on [GitHub](https://github.com/ovh/data-processing-samples/blob/master/apache_spark_notebook_data_cleaning/apache_spark_notebook_data_cleaning_tutorial.ipynb).
 
 ### Go further
 
-- Do you want to create a data cleaning job you could replay based on your notebook? [Here it is](https://docs.ovh.com/asia/en/data-processing/submit-python/).
+- Do you want to create a data cleaning job you could replay based on your notebook? [Please refer to this guide](https://docs.ovh.com/asia/en/data-processing/submit-python/).
 
 ## Feedback
 

pages/platform/data-processing/42_TUTORIAL_notebook-data-cleaning/guide.en-au.md

Lines changed: 12 additions & 12 deletions
@@ -1,7 +1,7 @@
 ---
 title: Notebooks for Apache Spark - Data cleaning
 slug: notebook-spark-data-cleaning
-excerpt: Data cleaning of 2 CSV dataset with aggregation into a single clean Parquet file.
+excerpt: Data cleaning of 2 CSV datasets with aggregation into a single clean Parquet file
 section: Tutorials
 order: 3
 updated: 2023-03-14
@@ -20,7 +20,7 @@ The purpose of this tutorial is to show how to clean data with [Apache Spark](ht
 
 *Data Cleaning* or *Data Cleansing* is the preparation of raw data by detecting and correcting records within a dataset.
 
-The tutorial presents a simple data cleaning with `Notebooks for Apache Spark`.
+This tutorial presents a simple data cleaning with `Notebooks for Apache Spark`.
 
 ## Requirements
 
@@ -31,13 +31,13 @@ The tutorial presents a simple data cleaning with `Notebooks for Apache Spark`.
 
 ### Upload data
 
-First, download these 2 datasets CSV files locally:
-* [books.csv](https://raw.githubusercontent.com/ovh/data-processing-samples/master/apache_spark_notebook_data_cleaning/books.csv)
-* [ratings.csv](https://raw.githubusercontent.com/ovh/data-processing-samples/master/apache_spark_notebook_data_cleaning/ratings.csv)
+First, download these 2 dataset CSV files locally:
+- [books.csv](https://raw.githubusercontent.com/ovh/data-processing-samples/master/apache_spark_notebook_data_cleaning/books.csv)
+- [ratings.csv](https://raw.githubusercontent.com/ovh/data-processing-samples/master/apache_spark_notebook_data_cleaning/ratings.csv)
 
 Then, from the [OVHcloud Control Panel](https://ca.ovh.com/auth/?action=gotomanager&from=https://www.ovh.com.au/&ovhSubsidiary=au), go to the Object Storage section, locate your S3 bucket and upload your data by clicking `Add object`{.action}.
 
-Select both files from your computer and add them to the root `/` of your bucket.
+Select both files from your computer and add them to the root (`/`) of your bucket.
 
 ![image](images/object-storage-datasets.png){.thumbnail}
 
@@ -47,29 +47,29 @@ Select both files from your computer and add them to the root `/` of your bucket
 >
 > Please be aware that notebooks are only available in `public access` during the `alpha` of the Notebooks for Apache Spark feature. As such, be careful of the **data** and the **credentials** you may expose in these notebooks.
 
-There is a few information that we will need as inputs of the notebook.
+There is some information that we will need as inputs of the notebook.
 
 First, and while we're on the container page of the [OVHcloud Control Panel](https://ca.ovh.com/auth/?action=gotomanager&from=https://www.ovh.com.au/&ovhSubsidiary=au) we will copy the `Endpoint` information and save it.
 
 Go back to the Object Storage home page and then to the S3 users tab, copy the user's `access key` and save it.
 
-Finally, click on action "hamburger" at the end of the user row `(...)`{.action} > `View the secret key`{.action}, copy the value and save it.
+Finally, click on the `...`{.action} button at the end of the user row, click on `View the secret key`{.action}, copy the value and save it.
 
 ### Launch and access a Notebook for Apache Spark
 
-From the [OVHcloud Control Panel](https://ca.ovh.com/auth/?action=gotomanager&from=https://www.ovh.com.au/&ovhSubsidiary=au), go to the Data Processing section and create a new notebook by clicking `Data Processing`{.action} > `Create notebook`{.action}.
+From the [OVHcloud Control Panel](https://ca.ovh.com/auth/?action=gotomanager&from=https://www.ovh.com.au/&ovhSubsidiary=au), go to the Data Processing section and create a new notebook by clicking `Data Processing`{.action} and then `Create notebook`{.action}.
 
 You can then reach the `JupyterLab` URL directly from the notebooks list or from the notebook page.
 
-### Experiment with notebook
+### Experiment with the notebook
 
-Now that you have your initial datasets ready on an Object Storage and a notebook running, you could start cleaning this data!
+Now that you have your initial datasets ready on an Object Storage and a notebook running, you can start cleaning this data.
 
 A preview of this notebook can be found on [GitHub](https://github.com/ovh/data-processing-samples/blob/master/apache_spark_notebook_data_cleaning/apache_spark_notebook_data_cleaning_tutorial.ipynb).
 
 ### Go further
 
-- Do you want to create a data cleaning job you could replay based on your notebook? [Here it is](https://docs.ovh.com/au/en/data-processing/submit-python/).
+- Do you want to create a data cleaning job you could replay based on your notebook? [Please refer to this guide](https://docs.ovh.com/au/en/data-processing/submit-python/).
 
 ## Feedback
 

pages/platform/data-processing/42_TUTORIAL_notebook-data-cleaning/guide.en-ca.md

Lines changed: 12 additions & 12 deletions
@@ -1,7 +1,7 @@
 ---
 title: Notebooks for Apache Spark - Data cleaning
 slug: notebook-spark-data-cleaning
-excerpt: Data cleaning of 2 CSV dataset with aggregation into a single clean Parquet file.
+excerpt: Data cleaning of 2 CSV datasets with aggregation into a single clean Parquet file
 section: Tutorials
 order: 3
 updated: 2023-03-14
@@ -20,7 +20,7 @@ The purpose of this tutorial is to show how to clean data with [Apache Spark](ht
 
 *Data Cleaning* or *Data Cleansing* is the preparation of raw data by detecting and correcting records within a dataset.
 
-The tutorial presents a simple data cleaning with `Notebooks for Apache Spark`.
+This tutorial presents a simple data cleaning with `Notebooks for Apache Spark`.
 
 ## Requirements
 
@@ -31,13 +31,13 @@ The tutorial presents a simple data cleaning with `Notebooks for Apache Spark`.
 
 ### Upload data
 
-First, download these 2 datasets CSV files locally:
-* [books.csv](https://raw.githubusercontent.com/ovh/data-processing-samples/master/apache_spark_notebook_data_cleaning/books.csv)
-* [ratings.csv](https://raw.githubusercontent.com/ovh/data-processing-samples/master/apache_spark_notebook_data_cleaning/ratings.csv)
+First, download these 2 dataset CSV files locally:
+- [books.csv](https://raw.githubusercontent.com/ovh/data-processing-samples/master/apache_spark_notebook_data_cleaning/books.csv)
+- [ratings.csv](https://raw.githubusercontent.com/ovh/data-processing-samples/master/apache_spark_notebook_data_cleaning/ratings.csv)
 
 Then, from the [OVHcloud Control Panel](https://ca.ovh.com/auth/?action=gotomanager&from=https://www.ovh.com/ca/en/&ovhSubsidiary=ca), go to the Object Storage section, locate your S3 bucket and upload your data by clicking `Add object`{.action}.
 
-Select both files from your computer and add them to the root `/` of your bucket.
+Select both files from your computer and add them to the root (`/`) of your bucket.
 
 ![image](images/object-storage-datasets.png){.thumbnail}
 
@@ -47,29 +47,29 @@ Select both files from your computer and add them to the root `/` of your bucket
 >
 > Please be aware that notebooks are only available in `public access` during the `alpha` of the Notebooks for Apache Spark feature. As such, be careful of the **data** and the **credentials** you may expose in these notebooks.
 
-There is a few information that we will need as inputs of the notebook.
+There is some information that we will need as inputs of the notebook.
 
 First, and while we're on the container page of the [OVHcloud Control Panel](https://ca.ovh.com/auth/?action=gotomanager&from=https://www.ovh.com/ca/en/&ovhSubsidiary=ca) we will copy the `Endpoint` information and save it.
 
 Go back to the Object Storage home page and then to the S3 users tab, copy the user's `access key` and save it.
 
-Finally, click on action "hamburger" at the end of the user row `(...)`{.action} > `View the secret key`{.action}, copy the value and save it.
+Finally, click on the `...`{.action} button at the end of the user row, click on `View the secret key`{.action}, copy the value and save it.
 
 ### Launch and access a Notebook for Apache Spark
 
-From the [OVHcloud Control Panel](https://ca.ovh.com/auth/?action=gotomanager&from=https://www.ovh.com/ca/en/&ovhSubsidiary=ca), go to the Data Processing section and create a new notebook by clicking `Data Processing`{.action} > `Create notebook`{.action}.
+From the [OVHcloud Control Panel](https://ca.ovh.com/auth/?action=gotomanager&from=https://www.ovh.com/ca/en/&ovhSubsidiary=ca), go to the Data Processing section and create a new notebook by clicking `Data Processing`{.action} and then `Create notebook`{.action}.
 
 You can then reach the `JupyterLab` URL directly from the notebooks list or from the notebook page.
 
-### Experiment with notebook
+### Experiment with the notebook
 
-Now that you have your initial datasets ready on an Object Storage and a notebook running, you could start cleaning this data!
+Now that you have your initial datasets ready on an Object Storage and a notebook running, you can start cleaning this data.
 
 A preview of this notebook can be found on [GitHub](https://github.com/ovh/data-processing-samples/blob/master/apache_spark_notebook_data_cleaning/apache_spark_notebook_data_cleaning_tutorial.ipynb).
 
 ### Go further
 
-- Do you want to create a data cleaning job you could replay based on your notebook? [Here it is](https://docs.ovh.com/ca/en/data-processing/submit-python/).
+- Do you want to create a data cleaning job you could replay based on your notebook? [Please refer to this guide](https://docs.ovh.com/ca/en/data-processing/submit-python/).
 
 ## Feedback
 
