sources/academy/webscraping/scraping_basics_javascript2/11_scraping_variants.md
+88 -49 (88 additions & 49 deletions)
@@ -348,69 +348,108 @@ Is this the end? Maybe! In the next lesson, we'll use a scraping framework to bu
<Exercises />

-### Build a scraper for watching Python jobs
+### Build a scraper for watching npm packages

-You're able to build a scraper now, aren't you? Let's build another one! Python's official website has a [job board](https://www.python.org/jobs/). Scrape the job postings that match the following criteria:
+You can build a scraper now, can't you? Let's build another one! From the registry at [npmjs.com](https://www.npmjs.com/), scrape information about npm packages that match the following criteria:

-- Tagged as "Database"
-- Posted within the last 60 days
+- Have the keyword "llm" (as in _large language model_)
+- Updated within the last two years ("2 years ago" is okay; "3 years ago" is too old)

-For each job posting found, use [`pp()`](https://docs.python.org/3/library/pprint.html#pprint.pp) to print a dictionary containing the following data:
+Print an array of the top 5 packages with the most dependents. Each package should be represented by an object containing the following data:
```
+    description: 'Core LangChain.js abstractions and schemas',
+    dependents: 730,
+    downloads: 5994
+  },
+  ...
+]
```

-You can find everything you need for working with dates and times in Python's [`datetime`](https://docs.python.org/3/library/datetime.html) module, including `date.today()`, `datetime.fromisoformat()`, `datetime.date()`, and `timedelta()`.
-
<details>
<summary>Solution</summary>

-After inspecting the job board, you'll notice that job postings tagged as "Database" have a dedicated URL. We'll use that as our starting point, which saves us from having to scrape and check the tags manually.
+After inspecting the registry, you'll notice that packages with the keyword "llm" have a dedicated URL. Also, changing the sorting dropdown results in a page with its own URL. We'll use that as our starting point, which saves us from having to scrape the whole registry and then filter by keyword or sort by the number of dependents.
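
One way to read the paragraph above: both the keyword filter and the sort order live in the query string of the search page, so the starting URL can be assembled up front. The sketch below uses the standard `URL` API. The `/search` path, the `q` and `sortBy` parameter names, and the `keywords:llm` and `dependent_count` values are assumptions about how the registry encodes these options, so compare them with what your browser's address bar shows.

```javascript
// Assumed shape of the npm search URL; verify it against the address bar.
const listingURL = new URL('https://www.npmjs.com/search');

// Assumed parameter: restricts results to packages with the "llm" keyword.
listingURL.searchParams.set('q', 'keywords:llm');

// Assumed parameter: sorts results by the number of dependents.
listingURL.searchParams.set('sortBy', 'dependent_count');

console.log(listingURL.href);
```

Building the URL from named parameters keeps the two decisions (keyword and sorting) visible in the code, which may be easier to adjust than a single hard-coded string.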
+Since the HTML doesn't contain any descriptive classes, we must rely on its structure. We're using [`.children()`](https://cheerio.js.org/docs/api/classes/Cheerio#children) to carefully navigate the HTML element tree.
+
+For items older than 2 years, we return `null` instead of an item. Before printing the results, we use [`.filter()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/filter) to remove these empty values and [`.splice()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/splice) the array down to just 5 items.
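
The null-then-filter step described above can be sketched in isolation, without any scraping. Everything in this example is illustrative: `isRecentEnough()` is a hypothetical helper, the package names and numbers are made up, and the "updated" labels only assume that the registry shows relative dates such as "2 years ago".

```javascript
// Hypothetical helper: true when a relative "updated" label is at most 2 years old.
function isRecentEnough(updatedText) {
  const match = updatedText.match(/(\d+) years? ago/);
  // Labels that don't mention a number of years (e.g. "3 months ago") are recent enough.
  if (!match) return true;
  return Number(match[1]) <= 2;
}

// Made-up sample data standing in for scraped packages, already sorted by dependents.
const scraped = [
  { name: 'pkg-a', updated: '3 months ago', dependents: 90 },
  { name: 'pkg-b', updated: '3 years ago', dependents: 80 },
  { name: 'pkg-c', updated: '2 years ago', dependents: 70 },
  { name: 'pkg-d', updated: 'a year ago', dependents: 60 },
  { name: 'pkg-e', updated: '5 years ago', dependents: 50 },
  { name: 'pkg-f', updated: '12 days ago', dependents: 40 },
  { name: 'pkg-g', updated: '2 years ago', dependents: 30 },
  { name: 'pkg-h', updated: 'a month ago', dependents: 20 },
];

// Items that are too old become null instead of an object.
const items = scraped.map((pkg) => (isRecentEnough(pkg.updated) ? pkg : null));

// Drop the nulls, then keep only the first 5 remaining items.
const topFive = items.filter((item) => item !== null).splice(0, 5);

console.log(topFive.map((item) => item.name));
```

Because `.filter()` returns a fresh array, calling `.splice(0, 5)` on it doesn't touch `items`; `.slice(0, 5)` would give the same five items without mutating anything.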
+
</details>

### Find the shortest CNN article which made it to the Sports homepage
sources/academy/webscraping/scraping_basics_python/11_scraping_variants.md
+1 -1 (1 addition & 1 deletion)
@@ -308,7 +308,7 @@ Is this the end? Maybe! In the next lesson, we'll use a scraping framework to bu
### Build a scraper for watching Python jobs

-You're able to build a scraper now, aren't you? Let's build another one! Python's official website has a [job board](https://www.python.org/jobs/). Scrape the job postings that match the following criteria:
+You can build a scraper now, can't you? Let's build another one! Python's official website has a [job board](https://www.python.org/jobs/). Scrape the job postings that match the following criteria: