Skip to content

Commit 4ea16ba

Browse files
committed
refactor: use for/of and .toArray() when looping over Cheerio selections
Also fixes a few bugs I noticed when trying out the code.
1 parent c3337c9 commit 4ea16ba

File tree

7 files changed

+90
-102
lines changed

7 files changed

+90
-102
lines changed

sources/academy/webscraping/scraping_basics_javascript2/06_locating_elements.md

Lines changed: 18 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -23,20 +23,19 @@ const response = await fetch(url);
2323
if (response.ok) {
2424
const html = await response.text();
2525
const $ = cheerio.load(html);
26-
// highlight-next-line
27-
$(".product-item").each((i, element) => {
28-
// highlight-next-line
26+
// highlight-start
27+
for (const element of $(".product-item").toArray()) {
2928
console.log($(element).text());
30-
// highlight-next-line
31-
});
29+
}
30+
// highlight-end
3231
} else {
3332
throw new Error(`HTTP ${response.status}`);
3433
}
3534
```
3635

37-
We're using [`each()`](https://cheerio.js.org/docs/api/classes/Cheerio#each) to loop over the items in the Cheerio container. It calls given function for each of the elements, with two arguments. The first is an index (0, 1, 2…), and the second is the element being processed.
36+
Calling [`toArray()`](https://cheerio.js.org/docs/api/classes/Cheerio#toarray) converts the Cheerio selection to a standard JavaScript array. We can then loop over that array and process each selected element.
3837

39-
Cheerio requires us to wrap the element with `$()` again before we can work with it further, and then we call `.text()`. If we run the code, it… well, it definitely prints _something_
38+
Cheerio requires us to wrap each element with `$()` again before we can work with it further, and then we call `.text()`. If we run the code, it… well, it definitely prints _something_
4039

4140
```text
4241
$ node index.js
@@ -79,7 +78,7 @@ if (response.ok) {
7978
const html = await response.text();
8079
const $ = cheerio.load(html);
8180

82-
$(".product-item").each((i, element) => {
81+
for (const element of $(".product-item").toArray()) {
8382
const $productItem = $(element);
8483

8584
const $title = $productItem.find(".product-item__title");
@@ -89,7 +88,7 @@ if (response.ok) {
8988
const price = $price.text();
9089

9190
console.log(`${title} | ${price}`);
92-
});
91+
}
9392
} else {
9493
throw new Error(`HTTP ${response.status}`);
9594
}
@@ -175,7 +174,7 @@ if (response.ok) {
175174
const html = await response.text();
176175
const $ = cheerio.load(html);
177176

178-
$(".product-item").each((i, element) => {
177+
for (const element of $(".product-item").toArray()) {
179178
const $productItem = $(element);
180179

181180
const $title = $productItem.find(".product-item__title");
@@ -186,7 +185,7 @@ if (response.ok) {
186185
const price = $price.text();
187186

188187
console.log(`${title} | ${price}`);
189-
});
188+
}
190189
} else {
191190
throw new Error(`HTTP ${response.status}`);
192191
}
@@ -248,11 +247,11 @@ Djibouti
248247
const html = await response.text();
249248
const $ = cheerio.load(html);
250249

251-
$(".wikitable").each((i, tableElement) => {
250+
for (const tableElement of $(".wikitable").toArray()) {
252251
const $table = $(tableElement);
253252
const $rows = $table.find("tr");
254253

255-
$rows.each((j, rowElement) => {
254+
for (const rowElement of $rows.toArray()) {
256255
const $row = $(rowElement);
257256
const $cells = $row.find("td");
258257

@@ -261,12 +260,11 @@ Djibouti
261260
const $link = $thirdColumn.find("a").first();
262261
console.log($link.text());
263262
}
264-
});
265-
});
263+
}
264+
}
266265
} else {
267266
throw new Error(`HTTP ${response.status}`);
268267
}
269-
270268
```
271269

272270
Because some rows contain [table headers](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/th), we skip processing a row if `table_row.select("td")` doesn't find any [table data](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/td) cells.
@@ -293,11 +291,11 @@ Simplify the code from previous exercise. Use a single for loop and a single CSS
293291
const html = await response.text();
294292
const $ = cheerio.load(html);
295293

296-
$(".wikitable tr td:nth-child(3)").each((i, element) => {
294+
for (const element of $(".wikitable tr td:nth-child(3)").toArray()) {
297295
const $nameCell = $(element);
298296
const $link = $nameCell.find("a").first();
299297
console.log($link.text());
300-
});
298+
}
301299
} else {
302300
throw new Error(`HTTP ${response.status}`);
303301
}
@@ -335,9 +333,9 @@ Max Verstappen wins Canadian Grand Prix: F1 – as it happened
335333
const html = await response.text();
336334
const $ = cheerio.load(html);
337335

338-
$("#maincontent ul li h3").each((i, element) => {
336+
for (const element of $("#maincontent ul li h3").toArray()) {
339337
console.log($(element).text());
340-
});
338+
}
341339
} else {
342340
throw new Error(`HTTP ${response.status}`);
343341
}

sources/academy/webscraping/scraping_basics_javascript2/07_extracting_data.md

Lines changed: 10 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -70,7 +70,7 @@ if (response.ok) {
7070
const html = await response.text();
7171
const $ = cheerio.load(html);
7272

73-
$(".product-item").each((i, element) => {
73+
for (const element of $(".product-item").toArray()) {
7474
const $productItem = $(element);
7575

7676
const $title = $productItem.find(".product-item__title");
@@ -87,7 +87,7 @@ if (response.ok) {
8787
}
8888

8989
console.log(`${title} | ${priceRange.minPrice} | ${priceRange.price}`);
90-
});
90+
}
9191
} else {
9292
throw new Error(`HTTP ${response.status}`);
9393
}
@@ -177,7 +177,7 @@ if (response.ok) {
177177
const html = await response.text();
178178
const $ = cheerio.load(html);
179179

180-
$(".product-item").each((i, element) => {
180+
for (const element of $(".product-item").toArray()) {
181181
const $productItem = $(element);
182182

183183
const $title = $productItem.find(".product-item__title");
@@ -200,7 +200,7 @@ if (response.ok) {
200200
}
201201

202202
console.log(`${title} | ${priceRange.minPrice} | ${priceRange.price}`);
203-
});
203+
}
204204
} else {
205205
throw new Error(`HTTP ${response.status}`);
206206
}
@@ -248,7 +248,7 @@ Denon AH-C720 In-Ear Headphones | 236
248248
const html = await response.text();
249249
const $ = cheerio.load(html);
250250

251-
$(".product-item").each((i, element) => {
251+
for (const element of $(".product-item").toArray()) {
252252
const $productItem = $(element);
253253

254254
const title = $productItem.find(".product-item__title");
@@ -266,7 +266,7 @@ Denon AH-C720 In-Ear Headphones | 236
266266
: parseInt(unitsText);
267267

268268
console.log(`${title} | ${unitsCount}`);
269-
});
269+
}
270270
} else {
271271
throw new Error(`HTTP ${response.status}`);
272272
}
@@ -297,7 +297,7 @@ Simplify the code from previous exercise. Use [regular expressions](https://deve
297297
const html = await response.text();
298298
const $ = cheerio.load(html);
299299

300-
$(".product-item").each((i, element) => {
300+
for (const element of $(".product-item").toArray()) {
301301
const $productItem = $(element);
302302

303303
const $title = $productItem.find(".product-item__title");
@@ -311,7 +311,7 @@ Simplify the code from previous exercise. Use [regular expressions](https://deve
311311
: parseInt(unitsText.match(/\d+/));
312312

313313
console.log(`${title} | ${unitsCount}`);
314-
});
314+
}
315315
} else {
316316
throw new Error(`HTTP ${response.status}`);
317317
}
@@ -363,7 +363,7 @@ Hints:
363363
const html = await response.text();
364364
const $ = cheerio.load(html);
365365

366-
$("#maincontent ul li").each((i, element) => {
366+
for (const element of $("#maincontent ul li").toArray()) {
367367
const $article = $(element);
368368

369369
const title = $article
@@ -377,7 +377,7 @@ Hints:
377377
const date = new Date(dateText);
378378

379379
console.log(`${title} | ${date.toDateString()}`);
380-
});
380+
}
381381
} else {
382382
throw new Error(`HTTP ${response.status}`);
383383
}

sources/academy/webscraping/scraping_basics_javascript2/08_saving_data.md

Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -38,7 +38,7 @@ if (response.ok) {
3838
const $ = cheerio.load(html);
3939

4040
// highlight-next-line
41-
const $items = $(".product-item").map((i, element) => {
41+
const data = $(".product-item").toArray().map(element => {
4242
const $productItem = $(element);
4343

4444
const $title = $productItem.find(".product-item__title");
@@ -64,15 +64,13 @@ if (response.ok) {
6464
return { title, ...priceRange };
6565
});
6666
// highlight-next-line
67-
const data = $items.get();
68-
// highlight-next-line
6967
console.log(data);
7068
} else {
7169
throw new Error(`HTTP ${response.status}`);
7270
}
7371
```
7472

75-
Instead of printing each line, we now return the data for each product as a JavaScript object. We've replaced `.each()` with [`.map()`](https://cheerio.js.org/docs/api/classes/Cheerio#map-3), which also iterates over the selection but, in addition, collects all the results and returns them as a Cheerio collection. We then convert it into a standard JavaScript array by calling [`.get()`](https://cheerio.js.org/docs/api/classes/Cheerio#call-signature-32). Near the end of the program, we print the entire array.
73+
Instead of printing each line, we now return the data for each product as a JavaScript object. We've replaced the `for` loop with [`.map()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/map), which also iterates over the selection but, in addition, collects all the results and returns them as another array. Near the end of the program, we print this entire array.
7674

7775
:::tip Advanced syntax
7876

sources/academy/webscraping/scraping_basics_javascript2/09_getting_links.md

Lines changed: 7 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -43,7 +43,7 @@ if (response.ok) {
4343
const html = await response.text();
4444
const $ = cheerio.load(html);
4545

46-
const $items = $(".product-item").map((i, element) => {
46+
const data = $(".product-item").toArray().map(element => {
4747
const $productItem = $(element);
4848

4949
const $title = $productItem.find(".product-item__title");
@@ -67,7 +67,6 @@ if (response.ok) {
6767

6868
return { title, ...priceRange };
6969
});
70-
const data = $items.get();
7170

7271
const jsonData = JSON.stringify(data);
7372
await writeFile('products.json', jsonData);
@@ -190,12 +189,11 @@ async function exportCSV(data) {
190189
const listingURL = "https://warehouse-theme-metal.myshopify.com/collections/sales"
191190
const $ = await download(listingURL);
192191

193-
const $items = $(".product-item").map((i, element) => {
192+
const data = $(".product-item").toArray().map(element => {
194193
const $productItem = $(element);
195194
const item = parseProduct($productItem);
196195
return item;
197196
});
198-
const data = $items.get();
199197

200198
await writeFile('products.json', exportJSON(data));
201199
await writeFile('products.csv', await exportCSV(data));
@@ -286,13 +284,12 @@ Now we'll pass the base URL to the function in the main body of our program:
286284
const listingURL = "https://warehouse-theme-metal.myshopify.com/collections/sales"
287285
const $ = await download(listingURL);
288286

289-
const $items = $(".product-item").map((i, element) => {
287+
const data = $(".product-item").toArray().map(element => {
290288
const $productItem = $(element);
291289
// highlight-next-line
292290
const item = parseProduct($productItem, listingURL);
293291
return item;
294292
});
295-
const data = $items.get();
296293
```
297294

298295
When we run the scraper now, we should see full URLs in our exports:
@@ -353,12 +350,12 @@ https://en.wikipedia.org/wiki/Botswana
353350
const html = await response.text();
354351
const $ = cheerio.load(html);
355352

356-
$(".wikitable tr td:nth-child(3)").each((i, element) => {
353+
for (const element of $(".wikitable tr td:nth-child(3)").toArray()) {
357354
const nameCell = $(element);
358355
const link = nameCell.find("a").first();
359356
const url = new URL(link.attr("href"), listingURL).href;
360357
console.log(url);
361-
});
358+
}
362359
} else {
363360
throw new Error(`HTTP ${response.status}`);
364361
}
@@ -397,11 +394,11 @@ https://www.theguardian.com/sport/article/2024/sep/02/max-verstappen-damns-his-u
397394
const html = await response.text();
398395
const $ = cheerio.load(html);
399396

400-
$("#maincontent ul li").each((i, element) => {
397+
for (const element of $("#maincontent ul li").toArray()) {
401398
const link = $(element).find("a").first();
402399
const url = new URL(link.attr("href"), listingURL).href;
403400
console.log(url);
404-
});
401+
}
405402
} else {
406403
throw new Error(`HTTP ${response.status}`);
407404
}

0 commit comments

Comments
 (0)