
Commit 9a1818b

Update processing_data.md
1 parent 4392719 commit 9a1818b

File tree

1 file changed: +15 -3 lines changed


content/academy/web_scraping_for_beginners/crawling/processing_data.md

Lines changed: 15 additions & 3 deletions
@@ -21,7 +21,13 @@ To access the default dataset, we can use the [`Dataset`](https://crawlee.dev/a
 
 ```JavaScript
 // dataset.js
-import { Dataset } from 'crawlee';
+import { Dataset, Configuration } from 'crawlee';
+
+// Crawlee automatically deletes data from its previous runs.
+// We can turn this off by setting 'purgeOnStart' to false.
+// If we did not do this, we would have no data to process.
+// This is a temporary workaround, and we'll add a better interface soon.
+Configuration.getGlobalConfig().set('purgeOnStart', false);
 
 const dataset = await Dataset.open();
 
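For reference, a minimal sketch of how the top of `dataset.js` reads once this change is applied (the final `console.log` is illustrative, not part of the lesson):

```JavaScript
// dataset.js (sketch of the file after this commit)
import { Dataset, Configuration } from 'crawlee';

// Keep the data stored by the previous crawler run instead of purging it,
// so this standalone script has something to read.
Configuration.getGlobalConfig().set('purgeOnStart', false);

const dataset = await Dataset.open();
const { items } = await dataset.getData();
console.log(`Loaded ${items.length} items from the default dataset.`);
```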
@@ -39,6 +45,8 @@ Let's say we wanted to print the title for each product that is more expensive t
 // dataset.js
-import { Dataset } from 'crawlee';
+import { Dataset, Configuration } from 'crawlee';
 
+Configuration.getGlobalConfig().set('purgeOnStart', false);
+
 const { items } = await Dataset.getData();
 
 let mostExpensive;
@@ -47,7 +55,7 @@ console.log('All items over $50 USD:');
 for (const { title, price } of items) {
     // Use a regular expression to filter out the
     // non-number and non-decimal characters
-    const numPrice = +price.replace(/[^0-9.]/g, '');
+    const numPrice = Number(price.replace(/[^0-9.]/g, ''));
     if (numPrice > 50) console.table({ title, price });
     if (numPrice > mostExpensive.price) mostExpensive = { title, price };
 }
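`Number(...)` and the unary `+` coerce strings identically, so this change is purely for readability. A quick illustration of what the changed line does to a typical price string (the sample value is made up):

```JavaScript
const price = '$1,249.99';
// Strip every character that is not a digit or a decimal point.
const cleaned = price.replace(/[^0-9.]/g, ''); // '1249.99'
// Coerce the cleaned string to a number; Number() reads more
// clearly than the unary + it replaces.
const numPrice = Number(cleaned); // 1249.99
console.log(numPrice > 50); // true
```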
@@ -60,7 +68,7 @@ In our case, the most expensive product was the Macbook Pro. Surprising? Heh, no
 
 ## [](#converting-to-excel) Converting the dataset to Excel
 
-We promised that you won't need an Apify account for anything in this course, and it's true. You can use the skills learned in the [Save to CSV lesson]({{@link web_scraping_for_beginners/data_collection/save_to_csv.md}}) to save the dataset to a CSV. Just use the loading code from above, plug it in there and then open the CSV in Excel. However, we really want to show you this neat trick. It won't cost you anything, we promise, and it's a cool and fast way to convert datasets to any format.
+We promised that you won't need an Apify account for anything in this course, and it's true. You can use the skills learned in the [Save to CSV lesson]({{@link web_scraping_for_beginners/data_collection/save_to_csv.md}}) to save the dataset to a CSV. Just use the loading code from above, plug it in there and then open the CSV in Excel. However, we really want to show you this neat trick. It won't cost you anything, and it's a cool and fast way to convert datasets to any format.
 
 ### [](#get-apify-token) Getting an Apify token
 
@@ -77,6 +85,8 @@ Now that you have a token, you can upload your local dataset to the Apify platfo
-import { Dataset } from 'crawlee';
+import { Dataset, Configuration } from 'crawlee';
 import { ApifyClient } from 'apify-client';
 
+Configuration.getGlobalConfig().set('purgeOnStart', false);
+
 const { items } = await Dataset.getData();
 
 // We will use the Apify API client to access the Apify API.
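The upload call itself sits below this diff context; a minimal sketch of what it plausibly looks like, assuming apify-client v2's `datasets().getOrCreate()` and `pushItems()` and a hypothetical dataset name:

```JavaScript
const apifyClient = new ApifyClient({
    token: 'YOUR_APIFY_TOKEN', // placeholder: use your own token
});

// Create (or reuse) a named dataset on the Apify platform, then
// push the locally collected items into it. The name 'products'
// is a hypothetical example, not part of this commit.
const { id } = await apifyClient.datasets().getOrCreate('products');
await apifyClient.dataset(id).pushItems(items);
console.log(`Uploaded ${items.length} items to dataset ${id}.`);
```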
@@ -109,7 +119,9 @@
-import { Dataset } from 'crawlee';
+import { Dataset, Configuration } from 'crawlee';
 import { ApifyClient } from 'apify-client';
 import { writeFileSync } from 'fs';
 
+Configuration.getGlobalConfig().set('purgeOnStart', false);
+
 const { items } = await Dataset.getData();
 
 const apifyClient = new ApifyClient({
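The download step is also below the diff context; a rough sketch under the same assumptions (apify-client v2's `downloadItems()`, with the 'xlsx' format and file name chosen here for illustration):

```JavaScript
// Fetch the dataset items converted to Excel format and save them.
// 'DATASET_ID' stands in for the ID returned when the dataset was
// uploaded; the format and file name are illustrative choices.
const xlsx = await apifyClient.dataset('DATASET_ID').downloadItems('xlsx');
writeFileSync('dataset.xlsx', xlsx);
```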
