Commit 7680d36

style: better English
1 parent f88abc8 commit 7680d36

2 files changed (+17, -17 lines)


sources/academy/webscraping/scraping_basics_python/10_crawling.md

Lines changed: 6 additions & 6 deletions
````diff
@@ -192,7 +192,7 @@ In the next lesson, we'll scrape the product detail pages so that each product v
 
 ### Scrape calling codes of African countries
 
-This is a follow-up to an exercise from the previous lesson, so feel free to reuse code. Scrape links to Wikipedia pages of all African states and territories. Follow the links and for each country extract the calling code, which is in the info table. Print URL and the calling code for all the countries. Start with this URL:
+This is a follow-up to an exercise from the previous lesson, so feel free to reuse your code. Scrape links to Wikipedia pages for all African states and territories. Follow each link and extract the calling code from the info table. Print the URL and the calling code for each country. Start with this URL:
 
 ```text
 https://en.wikipedia.org/wiki/List_of_sovereign_states_and_dependent_territories_in_Africa
````
````diff
@@ -211,7 +211,7 @@ https://en.wikipedia.org/wiki/Cameroon +237
 ...
 ```
 
-Hint: Locating cells in tables is sometimes easier if you know how to [go up](https://beautiful-soup-4.readthedocs.io/en/latest/index.html#going-up) in the HTML element soup.
+Hint: Locating cells in tables is sometimes easier if you know how to [navigate up](https://beautiful-soup-4.readthedocs.io/en/latest/index.html#going-up) in the HTML element soup.
 
 <details>
 <summary>Solution</summary>
````
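As an editorial aside, the "navigate up" technique mentioned in the hint above can be sketched with a tiny self-contained example. The inline HTML below is a made-up stand-in for a Wikipedia infobox; real pages are more complex:

```python
from bs4 import BeautifulSoup

# Made-up stand-in for a Wikipedia infobox row.
html = """
<table class="infobox">
  <tr><th>Calling code</th><td>+237</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Find the header cell by its text, go up to the enclosing row,
# then read the neighboring data cell.
header = soup.find("th", string="Calling code")
row = header.find_parent("tr")
code = row.find("td").get_text()
print(code)  # +237
```

Going up with `find_parent()` and then back down is often easier than writing one selector that pinpoints the data cell directly.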
````diff
@@ -247,13 +247,13 @@ Hint: Locating cells in tables is sometimes easier if you know how to [go up](ht
 
 ### Scrape authors of F1 news articles
 
-This is a follow-up to an exercise from the previous lesson, so feel free to reuse code. Scrape links to Guardian's latest F1 news. Follow the link for each article and extract both the author's name and the article's title. Print the author's name and the title for all the articles. Start with this URL:
+This is a follow-up to an exercise from the previous lesson, so feel free to reuse your code. Scrape links to the Guardian's latest F1 news articles. For each article, follow the link and extract both the author's name and the article's title. Print the author's name and the title for all the articles. Start with this URL:
 
 ```text
 https://www.theguardian.com/sport/formulaone
 ```
 
-Your program should print something like the following:
+Your program should print something like this:
 
 ```text
 Daniel Harris: Sports quiz of the week: Johan Neeskens, Bond and airborne antics
````
````diff
@@ -266,8 +266,8 @@ PA Media: Lewis Hamilton reveals lifelong battle with depression after school bu
 
 Hints:
 
-- You can use [attribute selectors](https://developer.mozilla.org/en-US/docs/Web/CSS/Attribute_selectors) to select HTML elements based on values of their attributes.
-- Notice that sometimes a person authors the article, but sometimes it's a contribution by a news agency.
+- You can use [attribute selectors](https://developer.mozilla.org/en-US/docs/Web/CSS/Attribute_selectors) to select HTML elements based on their attribute values.
+- Sometimes a person authors the article, but other times it's contributed by a news agency.
 
 <details>
 <summary>Solution</summary>
````
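The attribute-selector hint above can be illustrated with a minimal sketch. The `data-link-name` attribute below is hypothetical, not the Guardian's actual markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the Guardian's real pages use different attributes.
html = """
<a href="/sport/article/f1-recap" data-link-name="article">F1 recap</a>
<a href="/video/f1-crash" data-link-name="video">F1 crash video</a>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS attribute selector: match <a> elements whose data-link-name
# attribute equals "article", skipping the video link.
links = soup.select('a[data-link-name="article"]')
print([a["href"] for a in links])  # ['/sport/article/f1-recap']
```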

sources/academy/webscraping/scraping_basics_python/11_scraping_variants.md

Lines changed: 11 additions & 11 deletions
````diff
@@ -309,10 +309,10 @@ Is this the end? Maybe! In the next lesson, we'll use a scraping framework to bu
 
 ### Build a scraper for watching Python jobs
 
-You're now able to build a scraper, are you? Let's build another one, then! Python's official website features a [job board](https://www.python.org/jobs/). Scrape job postings which match the following criteria:
+You're able to build a scraper now, aren't you? Let's build another one! Python's official website has a [job board](https://www.python.org/jobs/). Scrape the job postings that match the following criteria:
 
-- Tagged as Database
-- Not older than 60 days
+- Tagged as "Database"
+- Posted within the last 60 days
 
 For each job posting found, use [`pp()`](https://docs.python.org/3/library/pprint.html#pprint.pp) to print a dictionary containing the following data:
 
````
````diff
@@ -321,7 +321,7 @@ For each job posting found, use [`pp()`](https://docs.python.org/3/library/pprin
 - URL to the job posting
 - Date of posting
 
-Your program should print something like the following:
+Your output should look something like this:
 
 ```text
 {'title': 'Senior Full Stack Developer',
````
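For reference, `pp()` (available since Python 3.8) is a thin convenience wrapper around `pprint.pprint()` that keeps dictionary keys in insertion order instead of sorting them. A minimal sketch with made-up data:

```python
from pprint import pp

# Made-up job posting; the keys are illustrative, not prescribed
# by the exercise, and the URL is hypothetical.
job = {
    "title": "Senior Full Stack Developer",
    "url": "https://www.python.org/jobs/0000/",
    "posted": "2024-09-10",
}
pp(job)  # keys print in insertion order, wrapping output wider than 80 columns
```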
````diff
@@ -335,12 +335,12 @@ Your program should print something like the following:
 ...
 ```
 
-In Python's [`datetime`](https://docs.python.org/3/library/datetime.html) module you should find everything you need for manipulating time: `date.today()`, `datetime.fromisoformat()`, `datetime.date()`, `timedelta()`.
+You can find everything you need for working with dates and times in Python's [`datetime`](https://docs.python.org/3/library/datetime.html) module, including `date.today()`, `datetime.fromisoformat()`, `datetime.date()`, and `timedelta()`.
 
 <details>
 <summary>Solution</summary>
 
-After inspecting how the job board works, we can notice that job postings tagged as Database have their own URL. We'll use it as the starting point, as it'll save us from needing to scrape and check the tags.
+After inspecting the job board, you'll notice that job postings tagged as "Database" have a dedicated URL. We'll use that as our starting point, which saves us from having to scrape and check the tags manually.
 
 ```py
 from pprint import pp
````
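The `datetime` helpers listed in this hunk compose into the 60-day check roughly like this. Fixed dates are used so the sketch is deterministic; a real scraper would call `date.today()` instead:

```python
from datetime import date, datetime, timedelta

# Hypothetical ISO timestamp as it might appear on a job posting page.
posted_at = datetime.fromisoformat("2024-10-01T12:30:00")

# Is the posting within the last 60 days of a fixed "today"?
today = date(2024, 10, 20)  # in real code: date.today()
is_recent = today - posted_at.date() <= timedelta(days=60)
print(is_recent)  # True (19 days old)
```

Subtracting two `date` objects yields a `timedelta`, which compares directly against `timedelta(days=60)`.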
````diff
@@ -376,13 +376,13 @@ In Python's [`datetime`](https://docs.python.org/3/library/datetime.html) module
 
 Scrape the [CNN Sports](https://edition.cnn.com/sport) homepage. For each linked article, calculate its length in characters:
 
-- Locate element which holds the main content of the article.
-- Use [`get_text()`](https://beautiful-soup-4.readthedocs.io/en/latest/index.html#get-text) to get all its content as a plain text.
-- Use `len()` to calculate the length.
+- Locate the element that holds the main content of the article.
+- Use [`get_text()`](https://beautiful-soup-4.readthedocs.io/en/latest/index.html#get-text) to extract all the content as plain text.
+- Use `len()` to calculate the character count.
 
-Skip pages without text, e.g. those which contain only a video. Sort the results and print URL to the shortest article which made it to the homepage.
+Skip pages without text (like those that only have a video). Sort the results and print the URL of the shortest article that made it to the homepage.
 
-At the time of writing this exercise, the shortest article which made it to the CNN Sports homepage is [one about a donation to the Augusta National Golf Club](https://edition.cnn.com/2024/10/03/sport/masters-donation-hurricane-helene-relief-spt-intl/). It's just 1,642 characters long.
+At the time of writing, the shortest article on the CNN Sports homepage is [about a donation to the Augusta National Golf Club](https://edition.cnn.com/2024/10/03/sport/masters-donation-hurricane-helene-relief-spt-intl/), which is just 1,642 characters long.
 
 <details>
 <summary>Solution</summary>
````
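The `get_text()` plus `len()` steps in this hunk can be sketched as follows. The markup and the `article__content` class name are assumptions for illustration; CNN's real pages use their own structure:

```python
from bs4 import BeautifulSoup

# Hypothetical article markup; a real CNN page is far larger.
html = """
<article class="article__content">
  <p>One sentence.</p>
  <p>Another sentence.</p>
</article>
"""
soup = BeautifulSoup(html, "html.parser")

# Locate the main-content element, flatten it to plain text
# (strip=True removes surrounding whitespace from each text node),
# then count the characters.
content = soup.select_one(".article__content")
text = content.get_text(strip=True)
length = len(text)
print(length)
```

If `select_one()` returns `None` (a video-only page, say), the exercise says to skip that URL rather than count it.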
