Commit 30a75bd

fix: part with samples more comprehensible, add gif of variants
1 parent 017d2ae commit 30a75bd

2 files changed (+6, -4 lines changed)

sources/academy/webscraping/scraping_basics_python/11_scraping_variants.md

Lines changed: 6 additions & 4 deletions
@@ -39,6 +39,8 @@ First, let's extract information about the variants. If we go to [Sony XBR-950G
 
 Nice! We can extract the variant names, but we also need to extract the price for each variant. Switching the variants using the buttons shows us that the HTML changes dynamically. This means the page uses JavaScript to display information about the variants.
 
+![Switching variants](images/variants-js.gif)
+
 If we can't find a workaround, we'd need our scraper to run JavaScript. That's not impossible. Scrapers can spin up their own browser instance and automate clicking on buttons, but it's slow and resource-intensive. Ideally, we want to stick to plain HTTP requests and Beautiful Soup as much as possible.
 
 After a bit of detective work, we notice that not far below the `block-swatch-list` there's also a block of HTML with a class `no-js`, which contains all the data!
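
A minimal sketch of how a scraper could read variants from such a `no-js` block with Beautiful Soup, without running any JavaScript. The HTML snippet, the `option` elements inside the block, the sample names and prices, and the helper name `extract_variant_texts` are all assumptions made for illustration; the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup. The actual product page may be structured differently.
html = """
<div class="block-swatch-list">
  <label>55"</label>
  <label>65"</label>
</div>
<div class="no-js">
  <select>
    <option>55" - $1,398.00</option>
    <option>65" - $2,198.00</option>
  </select>
</div>
"""


def extract_variant_texts(html):
    """Return the text of every <option> found inside an element with class `no-js`."""
    soup = BeautifulSoup(html, "html.parser")
    return [option.text.strip() for option in soup.select(".no-js option")]


variants = extract_variant_texts(html)
print(variants)
```

Because the data is already present in the static HTML, plain HTTP requests remain sufficient in this scenario and no browser automation is needed.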
@@ -103,7 +105,7 @@ Since Python 3.9, you can use `|` to merge two dictionaries. If the [docs](https
 
 :::
 
-If you run the program, you should see 34 items in total. Some items should have no variant:
+If you run the program, you should see 34 items in total. Some items don't have variants, so they won't have a variant name. However, they should still have a price set—our scraper should already have that info from the product listing page.
 
 <!-- eslint-skip -->
 ```json title=products.json
@@ -121,7 +123,7 @@ If you run the program, you should see 34 items in total. Some items should have
 ]
 ```
 
-Some products where we're missing the actual price should now have several variants:
+Some products will break into several items, each with a different variant name. We don't know their exact prices from the product listing, just the min price. In the next step, we should be able to parse the actual price from the variant name for those items.
 
 <!-- eslint-skip -->
 ```json title=products.json
@@ -147,7 +149,7 @@ Some products where we're missing the actual price should now have several varia
 ]
 ```
 
-However, some products with variants will have the `price` field set. That's because the shop sells all these variants for the same price, so the product listing displays the price as a fixed amount:
+Perhaps surprisingly, some products with variants will have the price field set. That's because the shop sells all variants of the product for the same price, so the product listing shows the price as a fixed amount, like _$74.95_, instead of _from $74.95_.
 
 <!-- eslint-skip -->
 ```json title=products.json
@@ -167,7 +169,7 @@ However, some products with variants will have the `price` field set. That's bec
 
 ## Parsing price
 
-The items now contain the variant as text, which is good for a start, but it would be more useful to set the price in the `price` key. Let's introduce a new function to handle that:
+The items now contain the variant as text, which is good for a start, but we want the price to be in the `price` key. Let's introduce a new function to handle that:
 
 ```py
 def parse_variant(variant):
Second changed file: binary, 1.26 MB (not shown).