Commit ea4ec88

style: better English in the crawling lesson
1 parent 16ea039 commit ea4ec88

File tree

1 file changed: +10 −10 lines changed

sources/academy/webscraping/scraping_basics_python/10_crawling.md

Lines changed: 10 additions & 10 deletions
@@ -8,13 +8,13 @@ slug: /scraping-basics-python/crawling
 
 import Exercises from './_exercises.mdx';
 
-**In this lesson, we'll follow links to individual product pages. We'll use HTTPX to download them, and BeautifulSoup to process them.**
+**In this lesson, we'll follow links to individual product pages. We'll use HTTPX to download them and BeautifulSoup to process them.**
 
 ---
 
-In previous lessons we've managed to download HTML code of a single page, parse it with BeautifulSoup, and extract relevant data from it. We'll do the same now for each of the products.
+In previous lessons we've managed to download the HTML code of a single page, parse it with BeautifulSoup, and extract relevant data from it. We'll do the same now for each of the products.
 
-Thanks to the refactoring we have functions ready for each of the tasks, so we won't need to repeat ourselves in our code. This is what you should see in your editor now:
+Thanks to the refactoring, we have functions ready for each of the tasks, so we won't need to repeat ourselves in our code. This is what you should see in your editor now:
 
 ```py
 import httpx
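
The hunk cuts the code block off after its first import, so the rest of the refactored program isn't visible in this diff. For orientation, a minimal sketch of the kind of helpers the lesson refers to; the function names, selectors, and bodies here are assumptions, not the file's actual contents:

```py
import httpx
from bs4 import BeautifulSoup

def download(url):
    # Fetch a page over HTTP and return its HTML parsed with BeautifulSoup
    response = httpx.get(url)
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser")

def parse_product(product):
    # Hypothetical: extract one product's data from a listing card
    return {"title": product.select_one(".product-item__title").text.strip()}
```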
@@ -89,7 +89,7 @@ Each product URL points to a so-called _product detail page_, or PDP. If we open
 
 ![Product detail page](./images/pdp.png)
 
-Depending on what's valuable for our use case, we can now use the same techniques as in previous lessons to extract any of the above. As a demonstration, let's scrape the vendor name. In browser DevTools we can see that the HTML around the vendor name has the following structure:
+Depending on what's valuable for our use case, we can now use the same techniques as in previous lessons to extract any of the above. As a demonstration, let's scrape the vendor name. In browser DevTools, we can see that the HTML around the vendor name has the following structure:
 
 ```html
 <div class="product-meta">
@@ -123,7 +123,7 @@ Depending on what's valuable for our use case, we can now use the same technique
 </div>
 ```
 
-It looks like using a CSS selector to locate element having the `product-meta__vendor` class and extracting its text should be enough to get the vendor name as a string:
+It looks like using a CSS selector to locate the element with the `product-meta__vendor` class, and then extracting its text, should be enough to get the vendor name as a string:
 
 ```py
 vendor = product_soup.select_one(".product-meta__vendor").text.strip()
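
Taken out of the diff, the one-liner in this hunk can be exercised as a self-contained snippet. The HTML below is a trimmed stand-in for the lesson's DevTools example, not its exact markup:

```py
from bs4 import BeautifulSoup

# Trimmed stand-in for the product-meta markup described above
html = """
<div class="product-meta">
  <h2 class="product-meta__vendor">Sony</h2>
</div>
"""

product_soup = BeautifulSoup(html, "html.parser")
# Locate the element by its class, then strip surrounding whitespace
vendor = product_soup.select_one(".product-meta__vendor").text.strip()
print(vendor)  # Sony
```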
@@ -133,7 +133,7 @@ But where do we put this line in our program?
 
 ## Crawling product detail pages
 
-In the `data` loop we already go through all the products. Let's expand it so it also includes downloading the product detail page, parsing it, extracting the name of the vendor, and adding it as a new dictionary key to the item:
+In the `data` loop we're already going through all the products. Let's expand it to include downloading the product detail page, parsing it, extracting the vendor's name, and adding it as a new key in the item's dictionary:
 
 ```py
 ...
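
The loop body itself falls in the elided region between hunks. Here's a sketch of the expansion the paragraph describes, reusing the assumed `download` helper from the earlier sketch; `listing_soup`, `parse_product`, and `data` come from the surrounding program, which the diff doesn't show in full:

```py
for product in listing_soup.select(".product-item"):
    item = parse_product(product)
    # Download and parse the product detail page (the URL key name is assumed)
    product_soup = download(item["url"])
    # Extract the vendor and attach it to the item under a new key
    item["vendor"] = product_soup.select_one(".product-meta__vendor").text.strip()
    data.append(item)
```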
@@ -153,7 +153,7 @@ for product in listing_soup.select(".product-item"):
 ...
 ```
 
-If you run the program now, it will take longer to finish, as it now makes 24 more HTTP requests, but in the end it should produce exports with a new field containing the vendor:
+If you run the program now, it'll take longer to finish since it's making 24 more HTTP requests. But in the end, it should produce exports with a new field containing the vendor's name:
 
 <!-- eslint-skip -->
 ```json title=products.json
@@ -178,13 +178,13 @@ If you run the program now, it will take longer to finish, as it now makes 24 mo
 
 ## Extracting price
 
-Being able to scrape vendor name is nice, but the main reason we started peeking at the detail pages in the first place was to figure out how to get a price for each product, because from the product listing we could only scrape the min price. And we're building a Python application for watching prices, remember?
+Scraping the vendor's name is nice, but the main reason we started checking the detail pages in the first place was to figure out how to get a price for each product. From the product listing, we could only scrape the min price, and remember—we’re building a Python app to track prices!
 
-Looking at [Sony XBR-950G BRAVIA](https://warehouse-theme-metal.myshopify.com/products/sony-xbr-65x950g-65-class-64-5-diag-bravia-4k-hdr-ultra-hd-tv), it's apparent that the listing features only min prices, because some of the products have variants, each with a different price. And different stock availability. And different SKU
+Looking at the [Sony XBR-950G BRAVIA](https://warehouse-theme-metal.myshopify.com/products/sony-xbr-65x950g-65-class-64-5-diag-bravia-4k-hdr-ultra-hd-tv), it's clear that the listing only shows min prices, because some products have variants, each with a different price. And different stock availability. And different SKUs
 
 ![Morpheus revealing the existence of product variants](images/variants.png)
 
-In the next lesson we'll scrape the product detail pages in such way that each product variant gets represented as a separate item in our dataset.
+In the next lesson, we'll scrape the product detail pages so that each product variant is represented as a separate item in our dataset.
 
 ---
 