sources/academy/webscraping/scraping_basics_python/10_crawling.md
Lines changed: 113 additions & 1 deletion
@@ -190,4 +190,116 @@ In the next lesson, we'll scrape the product detail pages so that each product v
<Exercises />

-TODO
### Scrape calling codes of African countries

This is a follow-up to an exercise from the previous lesson, so feel free to reuse your code. Scrape links to the Wikipedia pages of all African states and territories. Follow the links and, for each country, extract the calling code from the info table. Print the URL and the calling code for every country. Start with this URL:

Hint: Locating cells in tables is sometimes easier if you know how to [go up](https://beautiful-soup-4.readthedocs.io/en/latest/index.html#going-up) in the HTML element tree.
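
To illustrate the hint, here is a minimal sketch of going up from a header cell to its row and then back down to the neighboring value cell. The HTML snippet, the `infobox` class, and the `find_value()` helper are assumptions made up for this example; real Wikipedia markup is more complex and may need different selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified info table used only for illustration.
html = """
<table class="infobox">
  <tr><th>Capital</th><td>Exampleville</td></tr>
  <tr><th><a href="/wiki/Calling_codes">Calling code</a></th><td>+999</td></tr>
</table>
"""


def find_value(soup, label):
    """Return the text of the value cell in the row whose header contains `label`."""
    for header in soup.select("table.infobox th"):
        if label in header.get_text():
            # Go up from the header cell to its parent row...
            row = header.parent
            # ...then back down to the value cell in the same row.
            value_cell = row.select_one("td")
            return value_cell.get_text(strip=True) if value_cell else None
    return None


soup = BeautifulSoup(html, "html.parser")
calling_code = find_value(soup, "Calling code")
capital = find_value(soup, "Capital")
missing = find_value(soup, "Currency")  # no such row in the sample table
print(calling_code)
```

Searching for the label first and then navigating from it tends to be more robust than counting rows, because the position of a row can differ from page to page.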
```python
for name_cell in listing_soup.select(".wikitable tr td:nth-child(3)"):
    link = name_cell.select_one("a")
    country_url = urljoin(listing_url, link["href"])
    country_soup = download(country_url)
    calling_code = parse_calling_code(country_soup)
    print(country_url, calling_code)
```
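
The loop above passes each `href` through `urljoin()` before downloading. A small sketch of what that call does with different kinds of links, using placeholder `example.com` addresses rather than the exercise's real URL:

```python
from urllib.parse import urljoin

# Placeholder listing URL, assumed only for this illustration.
listing_url = "https://example.com/wiki/List_of_things"

# A root-relative link replaces the whole path of the base URL.
root_relative = urljoin(listing_url, "/wiki/Algeria")

# A plain relative link replaces only the last path segment.
relative = urljoin(listing_url, "Algeria")

# An absolute link is returned unchanged.
absolute = urljoin(listing_url, "https://other.example/page")

# A protocol-relative link inherits the scheme of the base URL.
protocol_relative = urljoin(listing_url, "//static.example.org/logo.png")

print(root_relative)
print(relative)
print(absolute)
print(protocol_relative)
```

Because all four forms can appear in real pages, joining every extracted link with the URL of the page it came from is a safe default.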

</details>

### Scrape authors of F1 news articles

This is a follow-up to an exercise from the previous lesson, so feel free to reuse your code. Scrape links to the Guardian's latest F1 news articles. Follow the link to each article and extract both the author's name and the article's title. Print the author's name and the title for every article. Start with this URL:

```text
https://www.theguardian.com/sport/formulaone
```

Your program should print something like the following:

```text
Colin Horgan: The NHL is getting its own Drive to Survive. But could it backfire?
Reuters: US GP ticket sales ‘took off’ after Max Verstappen stopped winning in F1
Giles Richards: Liam Lawson gets F1 chance to replace Pérez alongside Verstappen at Red Bull
PA Media: Lewis Hamilton reveals lifelong battle with depression after school bullying
Giles Richards: Red Bull must solve Verstappen’s ‘monster’ riddle or Norris will pounce
...
```

Hints:

- You can use [attribute selectors](https://developer.mozilla.org/en-US/docs/Web/CSS/Attribute_selectors) to select HTML elements based on values of their attributes.
- Notice that sometimes a person authors the article, but sometimes it's a contribution by a news agency.
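
Both hints can be combined in a rough sketch like the one below: try an attribute selector that matches a personal byline first, then fall back to one that matches an agency credit. The markup, the `data-role` attribute, and its values are invented for illustration and are not the Guardian's actual HTML, so inspect the real pages to find suitable attributes.

```python
from bs4 import BeautifulSoup

# Hypothetical article markup; attribute names and values are assumptions.
person_html = """
<article>
  <h1>Example headline one</h1>
  <a data-role="author" href="/profile/jane-doe">Jane Doe</a>
</article>
"""
agency_html = """
<article>
  <h1>Example headline two</h1>
  <address data-role="agency">Example Agency</address>
</article>
"""


def parse_author(soup):
    """Return the person's name if present, otherwise the agency, otherwise None."""
    # [data-role="author"] matches elements whose data-role attribute equals "author".
    link = soup.select_one('a[data-role="author"]')
    if link:
        return link.get_text(strip=True)
    # Fall back to the element crediting a news agency.
    agency = soup.select_one('[data-role="agency"]')
    if agency:
        return agency.get_text(strip=True)
    return None


results = []
for html in (person_html, agency_html):
    soup = BeautifulSoup(html, "html.parser")
    title = soup.select_one("h1").get_text(strip=True)
    results.append(f"{parse_author(soup)}: {title}")

empty_author = parse_author(BeautifulSoup("<article></article>", "html.parser"))
print("\n".join(results))
```

Returning `None` when neither selector matches keeps the loop running even if an article uses a layout the selectors do not cover.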