`sources/academy/webscraping/scraping_basics_python/12_framework.md`
<Exercises />
### Build a Crawlee scraper of F1 Academy drivers

Scrape information about all [F1 Academy](https://en.wikipedia.org/wiki/F1_Academy) drivers listed on the official [Drivers](https://www.f1academy.com/Racing-Series/Drivers) page. Each item you push to Crawlee's default dataset should contain the following data:
- URL of the driver's f1academy.com page
- Name
- Team
- Nationality
- Date of birth (as a `date()` object)
- Instagram URL
If you export the dataset as JSON, you should see something like this:
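The lesson's actual sample output did not survive in this capture. As a purely illustrative sketch — the field names and values below are made up, not the course's real schema — one exported item could be built and serialized like this:

```python
import json
from datetime import date

# Hypothetical example item: field names and values are illustrative only,
# not the course's actual output schema.
item = {
    "url": "https://www.f1academy.com/Racing-Series/Drivers/example-driver",
    "name": "Example Driver",
    "team": "Example Team",
    "nationality": "Example Nationality",
    # date() objects are typically exported as ISO 8601 strings in JSON
    "dob": date(2005, 1, 1).isoformat(),
    "instagram_url": "https://www.instagram.com/example/",
}
print(json.dumps(item, indent=2))
```
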
- Use Python's native `datetime.strptime(text, "%d/%m/%Y").date()` to parse the `DD/MM/YYYY` date format. See [docs](https://docs.python.org/3/library/datetime.html#datetime.datetime.strptime) to learn more.
- Use the attribute selector `a[href*='instagram']` to locate the Instagram URL. See [docs](https://developer.mozilla.org/en-US/docs/Web/CSS/Attribute_selectors) to learn more.
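To illustrate the first hint, this stand-alone snippet (the sample date string is made up) parses a `DD/MM/YYYY` string into a `date` object:

```python
from datetime import datetime

# Parse a DD/MM/YYYY date string (the value here is made up for illustration)
text = "25/04/2005"
dob = datetime.strptime(text, "%d/%m/%Y").date()
print(dob)  # 2005-04-25
```
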
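And to illustrate the second hint, here's a minimal sketch — assuming Beautiful Soup is installed, and using a made-up HTML fragment — of how the attribute selector finds the Instagram link:

```python
from bs4 import BeautifulSoup

# Made-up HTML fragment standing in for a driver's detail page
html = """
<div class="socials">
  <a href="https://www.twitter.com/example">Twitter</a>
  <a href="https://www.instagram.com/example">Instagram</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
# Match any anchor whose href contains the substring "instagram"
link = soup.select_one("a[href*='instagram']")
print(link["href"])  # https://www.instagram.com/example
```
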
<details>
<summary>Solution</summary>

```py
import asyncio
from datetime import datetime

from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler
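# --- The rest of the original solution is cut off in this capture. ---
# What follows is an illustrative sketch, not the course's actual code.
# All CSS selectors below are assumptions; inspect the live pages to
# find the real ones.


async def main():
    crawler = BeautifulSoupCrawler()

    @crawler.router.default_handler
    async def handle_listing(context):
        # From the listing page, enqueue each driver's detail page
        # (".drivers a" is a placeholder selector)
        await context.enqueue_links(selector=".drivers a", label="DRIVER")

    @crawler.router.handler("DRIVER")
    async def handle_driver(context):
        soup = context.soup
        # The selectors used here are placeholders as well
        dob_text = soup.select_one(".driver-dob").text.strip()
        instagram = soup.select_one("a[href*='instagram']")
        await context.push_data({
            "url": context.request.url,
            "name": soup.select_one("h1").text.strip(),
            "team": soup.select_one(".driver-team").text.strip(),
            "nationality": soup.select_one(".driver-nationality").text.strip(),
            "dob": datetime.strptime(dob_text, "%d/%m/%Y").date(),
            "instagram_url": instagram["href"] if instagram else None,
        })

    await crawler.run(["https://www.f1academy.com/Racing-Series/Drivers"])


if __name__ == "__main__":
    asyncio.run(main())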
```

</details>