`sources/academy/webscraping/scraping_basics_python/12_framework.md`
In the final statistics, you can see that we made 25 requests (1 listing page + 24 product pages).

## Extracting data
The BeautifulSoup crawler provides handlers with the `context.soup` attribute, where we can find the parsed HTML of the handled page. This is the same as the `soup` we had in our previous program. Let's locate and extract the same data as before:
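For instance, here's a minimal sketch of a handler for the product detail pages. It assumes the detail links were enqueued with a `DETAIL` label and that the `.product-meta__title` selector from the earlier lessons still applies; both are assumptions to adapt to your target page.

```py
# Inside main(), after creating the BeautifulSoupCrawler as in the previous section.
@crawler.router.handler("DETAIL")
async def handle_detail(context):
    # context.soup is the parsed HTML of the detail page, so the same
    # BeautifulSoup calls work as in our standalone scraper.
    item = {
        "url": context.request.url,
        # The selector is an assumption carried over from earlier lessons.
        "title": context.soup.select_one(".product-meta__title").text.strip(),
    }
    await context.push_data(item)
```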
:::danger Work in progress

This course is incomplete. As we work on adding new lessons, we would love to hear your feedback. You can comment right here under each page or [file a GitHub Issue](https://github.com/apify/apify-docs/issues) to discuss a problem.

:::
Now the price. We won't be inventing anything new here; let's add the `Decimal` import and copy-paste the code from our old scraper.
The only change will be in the selector. In `main.py`, we were looking for `.price` inside a `product_soup` representing a product card. Now we're looking for `.price` inside the whole product detail page. It's safer to be more specific so that we won't match another price on the same page:
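As a sketch, scoping the selector to the product form area of the example store (the `product-form__info-content` class, and the fact that the price element starts with a visually hidden label that `.contents[-1]` skips, are assumptions about the example store's markup):

```py
# Inside the detail handler; Decimal is imported at the top of the file.
price_text = (
    context.soup
    .select_one(".product-form__info-content .price")
    .contents[-1]  # keep only the trailing text node with the price itself
    .strip()
    .replace("$", "")
    .replace(",", "")
)
item["price"] = Decimal(price_text)
```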
Finally, variants. We can reuse the `parse_variant()` function as it is, and even the handler code will look similar to what we already had. The whole program will look like this:
```py
import asyncio
from decimal import Decimal

from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler
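
# Everything from here down is a sketch: the handler code, CSS selectors,
# and store URL are assumptions based on the example Warehouse store used
# earlier in this course; adapt them to your target site.

async def main():
    crawler = BeautifulSoupCrawler()

    # The default handler processes the listing page: it finds links to
    # product detail pages and enqueues them under the DETAIL label.
    @crawler.router.default_handler
    async def handle_listing(context):
        await context.enqueue_links(
            label="DETAIL",
            selector=".product-list a.product-item__title",
        )

    # The DETAIL handler extracts data from each product page.
    @crawler.router.handler("DETAIL")
    async def handle_detail(context):
        # Scope the price selector to the product form so we don't match
        # another price elsewhere on the page.
        price_text = (
            context.soup
            .select_one(".product-form__info-content .price")
            .contents[-1]
            .strip()
            .replace("$", "")
            .replace(",", "")
        )
        item = {
            "url": context.request.url,
            "title": context.soup.select_one(".product-meta__title").text.strip(),
            "vendor": context.soup.select_one(".product-meta__vendor").text.strip(),
            "price": Decimal(price_text),
            "variant_name": None,
        }
        if variants := context.soup.select(".product-form__option.no-js option"):
            # One item per variant, with the variant's name and price.
            for variant in variants:
                await context.push_data(item | parse_variant(variant))
        else:
            await context.push_data(item)

    await crawler.run(["https://warehouse-theme-metal.myshopify.com/collections/sales"])

def parse_variant(variant):
    # Each variant <option> reads like "Red - $10.00"; split it into
    # a name and a price.
    name, price_text = variant.text.strip().split(" - ")
    price = Decimal(price_text.replace("$", "").replace(",", ""))
    return {"variant_name": name, "price": price}

if __name__ == '__main__':
    asyncio.run(main())
```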
If you run this scraper, you should see the same data about the 24 products as before. Crawlee has saved us a lot of work with downloading, parsing, logging, and parallelization. The code is also easier to follow with the two handlers separated and labeled.
Crawlee doesn't help much with locating and extracting the data; that code is almost identical with or without a framework. That's because the detective work of locating and extracting the right data is the main added value of custom-made scrapers. With Crawlee, you can focus on just that and let the framework take care of the rest.