Commit 2d40cc9

honzajavorek authored and daveomri committed
style: use dollar variables (extracting data) (apify#1842)
As I progressed with apify#1584, I felt the code examples were becoming more and more complex. Then I remembered that when I was young, we jQuery folks used to lean towards a naming convention where variables holding jQuery selections were prefixed with `$`. I changed the code examples in all lessons to adhere to this, as I feel it makes them more readable and less cluttered.

-----

ℹ️ The changes still use `$.map` and `$.each`, because they were made prior to the facb3c0 commit. It's gonna happen, but not yet.
1 parent 9b1619f commit 2d40cc9

1 file changed

+33
-33
lines changed

sources/academy/webscraping/scraping_basics_javascript2/07_extracting_data.md

Lines changed: 33 additions & 33 deletions
@@ -36,14 +36,14 @@ It's because some products have variants with different prices. Later in the cou
 Ideally we'd go and discuss the problem with those who are about to use the resulting data. For their purposes, is the fact that some prices are just minimum prices important? What would be the most useful representation of the range for them? Maybe they'd tell us that it's okay if we just remove the `From` prefix?

 ```js
-const priceText = price.text().replace("From ", "");
+const priceText = $price.text().replace("From ", "");
 ```

 In other cases, they'd tell us the data must include the range. And in cases when we just don't know, the safest option is to include all the information we have and leave the decision on what's important to later stages. One approach could be having the exact and minimum prices as separate values. If we don't know the exact price, we leave it empty:

 ```js
 const priceRange = { minPrice: null, price: null };
-const priceText = price.text()
+const priceText = $price.text()
 if (priceText.startsWith("From ")) {
   priceRange.minPrice = priceText.replace("From ", "");
 } else {
@@ -71,22 +71,22 @@ if (response.ok) {
   const $ = cheerio.load(html);

   $(".product-item").each((i, element) => {
-    const productItem = $(element);
+    const $productItem = $(element);

-    const title = productItem.find(".product-item__title");
-    const titleText = title.text();
+    const $title = $productItem.find(".product-item__title");
+    const title = $title.text();

-    const price = productItem.find(".price").contents().last();
+    const $price = $productItem.find(".price").contents().last();
     const priceRange = { minPrice: null, price: null };
-    const priceText = price.text();
+    const priceText = $price.text();
     if (priceText.startsWith("From ")) {
       priceRange.minPrice = priceText.replace("From ", "");
     } else {
       priceRange.minPrice = priceText;
       priceRange.price = priceRange.minPrice;
     }

-    console.log(`${titleText} | ${priceRange.minPrice} | ${priceRange.price}`);
+    console.log(`${title} | ${priceRange.minPrice} | ${priceRange.price}`);
   });
 } else {
   throw new Error(`HTTP ${response.status}`);
@@ -100,9 +100,9 @@ Often, the strings we extract from a web page start or end with some amount of w
 We call the operation of removing whitespace _trimming_ or _stripping_, and it's so useful in many applications that programming languages and libraries include ready-made tools for it. Let's add JavaScript's built-in [.trim()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/trim):

 ```js
-const titleText = title.text().trim();
+const title = $title.text().trim();

-const priceText = price.text().trim();
+const priceText = $price.text().trim();
 ```

 ## Removing dollar sign and commas
@@ -124,7 +124,7 @@ The demonstration above is inside the Node.js' [interactive REPL](https://nodejs
 We need to remove the dollar sign and the decimal commas. For this type of cleaning, [regular expressions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions) are often the best tool for the job, but in this case [`.replace()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace) is also sufficient:

 ```js
-const priceText = price
+const priceText = $price
   .text()
   .trim()
   .replace("$", "")
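A side note on the chained `.replace()` calls in the hunk above: when given a string pattern, `.replace()` substitutes only the first match. The sketch below uses a hypothetical helper, `stripCurrency`, which is not part of this commit, and assumes prices may contain more than one thousands separator, in which case `.replaceAll()` would be one way to remove every comma:

```js
// Hypothetical helper (not from the commit): removes the dollar sign
// and every comma from a price string.
function stripCurrency(text) {
  return text.trim().replace("$", "").replaceAll(",", "");
}

console.log(stripCurrency(" $1,234.56 ")); // "1234.56"

// With a string pattern, .replace() only touches the first comma:
console.log("$1,234,567.00".replace("$", "").replace(",", "")); // "1234,567.00"
```

For prices with at most one comma, the two approaches give the same result.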
@@ -137,7 +137,7 @@ Now we should be able to add `parseFloat()`, so that we have the prices not as a

 ```js
 const priceRange = { minPrice: null, price: null };
-const priceText = price.text()
+const priceText = $price.text()
 if (priceText.startsWith("From ")) {
   priceRange.minPrice = parseFloat(priceText.replace("From ", ""));
 } else {
@@ -156,7 +156,7 @@ Great! Only if we didn't overlook an important pitfall called [floating-point er
 These errors are small and usually don't matter, but sometimes they can add up and cause unpleasant discrepancies. That's why it's typically best to avoid floating point numbers when working with money. We won't store dollars, but cents:

 ```js
-const priceText = price
+const priceText = $price
   .text()
   .trim()
   .replace("$", "")
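To illustrate why the lesson stores cents rather than dollars, here is a minimal sketch. The helper name `parseCents` is hypothetical, and it assumes every price text carries exactly two decimal places, so dropping the decimal point yields a whole number of cents:

```js
// Floating-point arithmetic on dollar amounts can drift:
console.log(0.1 + 0.2); // 0.30000000000000004

// Hypothetical helper (not from the commit): parses a price such as
// "$1,234.56" into integer cents. Assumes exactly two decimal places.
function parseCents(text) {
  const digits = text
    .trim()
    .replace("$", "")
    .replaceAll(",", "")
    .replace(".", "");
  return parseInt(digits, 10);
}

console.log(parseCents("$1,234.56")); // 123456
console.log(parseCents("$0.10") + parseCents("$0.20")); // 30
```

Integer cents add up exactly, which avoids the rounding discrepancy shown in the first line.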
@@ -178,14 +178,14 @@ if (response.ok) {
   const $ = cheerio.load(html);

   $(".product-item").each((i, element) => {
-    const productItem = $(element);
+    const $productItem = $(element);

-    const title = productItem.find(".product-item__title");
-    const titleText = title.text().trim();
+    const $title = $productItem.find(".product-item__title");
+    const title = $title.text().trim();

-    const price = productItem.find(".price").contents().last();
+    const $price = $productItem.find(".price").contents().last();
     const priceRange = { minPrice: null, price: null };
-    const priceText = price
+    const priceText = $price
       .text()
       .trim()
       .replace("$", "")
@@ -199,7 +199,7 @@ if (response.ok) {
       priceRange.price = priceRange.minPrice;
     }

-    console.log(`${titleText} | ${priceRange.minPrice} | ${priceRange.price}`);
+    console.log(`${title} | ${priceRange.minPrice} | ${priceRange.price}`);
   });
 } else {
   throw new Error(`HTTP ${response.status}`);
@@ -259,15 +259,15 @@ Denon AH-C720 In-Ear Headphones | 236
   const $ = cheerio.load(html);

   $(".product-item").each((i, element) => {
-    const productItem = $(element);
+    const $productItem = $(element);

-    const title = productItem.find(".product-item__title");
-    const titleText = title.text().trim();
+    const $title = $productItem.find(".product-item__title");
+    const title = $title.text().trim();

-    const unitsText = productItem.find(".product-item__inventory").text();
+    const unitsText = $productItem.find(".product-item__inventory").text();
     const unitsCount = parseUnitsText(unitsText);

-    console.log(`${titleText} | ${unitsCount}`);
+    console.log(`${title} | ${unitsCount}`);
   });
 } else {
   throw new Error(`HTTP ${response.status}`);
@@ -308,15 +308,15 @@ Simplify the code from previous exercise. Use [regular expressions](https://deve
   const $ = cheerio.load(html);

   $(".product-item").each((i, element) => {
-    const productItem = $(element);
+    const $productItem = $(element);

-    const title = productItem.find(".product-item__title");
-    const titleText = title.text().trim();
+    const $title = $productItem.find(".product-item__title");
+    const title = $title.text().trim();

-    const unitsText = productItem.find(".product-item__inventory").text();
+    const unitsText = $productItem.find(".product-item__inventory").text();
     const unitsCount = parseUnitsText(unitsText);

-    console.log(`${titleText} | ${unitsCount}`);
+    console.log(`${title} | ${unitsCount}`);
   });
 } else {
   throw new Error(`HTTP ${response.status}`);
@@ -370,19 +370,19 @@ Hints:
   const $ = cheerio.load(html);

   $("#maincontent ul li").each((i, element) => {
-    const article = $(element);
+    const $article = $(element);

-    const titleText = article
+    const title = $article
       .find("h3")
       .text()
       .trim();
-    const dateText = article
+    const dateText = $article
       .find("time")
       .attr("datetime")
       .trim();
     const date = new Date(dateText);

-    console.log(`${titleText} | ${date.toDateString()}`);
+    console.log(`${title} | ${date.toDateString()}`);
   });
 } else {
   throw new Error(`HTTP ${response.status}`);
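
The `From ` prefix handling that recurs in the hunks above can be tried out without Cheerio. The following is only a sketch: `parsePriceRange` is a hypothetical function that mirrors the `if`/`else` from the lesson and operates on plain strings instead of a `$price` selection:

```js
// Hypothetical helper (not from the commit): mirrors the lesson's if/else.
// A "From " prefix means only the minimum price is known.
function parsePriceRange(priceText) {
  const text = priceText.trim();
  if (text.startsWith("From ")) {
    return { minPrice: text.replace("From ", ""), price: null };
  }
  // Without the prefix, the exact price is also the minimum price.
  return { minPrice: text, price: text };
}

console.log(parsePriceRange("From $1,398.00")); // { minPrice: '$1,398.00', price: null }
console.log(parsePriceRange(" $74.95 ")); // { minPrice: '$74.95', price: '$74.95' }
```

Keeping the parsing in a function that takes a string would make it possible to check the logic separately from the scraping code.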