
Commit 9d531da

another try
1 parent 58f576f commit 9d531da


4 files changed: +3 -5 lines changed


sources/academy/webscraping/advanced_web_scraping/crawling/crawling-sitemaps.md

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ For most sitemaps, you can make a simple HTTP request and parse the downloaded X
## [](#how-to-parse-urls-from-sitemaps) How to parse URLs from sitemaps

- The easiest part is to parse the actual URLs from the sitemap. The URLs are usually listed under `<loc>` tags. You can use Cheerio to parse the XML text and extract the URLs. Just be careful that the sitemap might contain other URLs that you don't want to crawl (e.g. /about, /contact, or various special category sections). [This article](/academy/tutorials/node-js/scraping-from-sitemaps.md) provides code examples for parsing sitemaps.
+ The easiest part is to parse the actual URLs from the sitemap. The URLs are usually listed under `<loc>` tags. You can use Cheerio to parse the XML text and extract the URLs. Just be careful that the sitemap might contain other URLs that you don't want to crawl (e.g. /about, /contact, or various special category sections). [This article](/academy/node-js/scraping-from-sitemaps.md) provides code examples for parsing sitemaps.

## [](#using-crawlee) Using Crawlee
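For context on the paragraph edited above: a minimal sketch of the Cheerio-based parsing it describes, assuming a Node.js (ESM) environment with `cheerio` installed and a hypothetical sitemap URL.

```js
import * as cheerio from 'cheerio';

// Hypothetical sitemap URL, used only for illustration.
const SITEMAP_URL = 'https://example.com/sitemap.xml';

// Download the sitemap XML with a plain HTTP request (native fetch, Node 18+).
const response = await fetch(SITEMAP_URL);
const xml = await response.text();

// Parse the XML and collect every URL listed under a <loc> tag.
const $ = cheerio.load(xml, { xmlMode: true });
const urls = $('loc')
    .map((_, el) => $(el).text().trim())
    .get()
    // Skip URLs we don't want to crawl, e.g. /about or /contact pages.
    .filter((url) => !/\/(about|contact)\b/.test(url));

console.log(`Found ${urls.length} URLs`, urls.slice(0, 5));
```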

sources/academy/webscraping/advanced_web_scraping/crawling/crawling-with-search.md

Lines changed: 1 addition & 1 deletion
@@ -282,6 +282,6 @@ await crawler.addRequests(requestsToEnqueue);
## Summary {#summary}

- And that's it. We have an elegant solution for a complicated problem. In a real project, you would want to make this a bit more robust and [save analytics data](/academy/platform/expert_scraping_with_apify/saving_useful_stats.md). This will let you know what filters you went through and how many products each of them had.
+ And that's it. We have an elegant solution for a complicated problem. In a real project, you would want to make this a bit more robust and [save analytics data](/academy/expert_scraping_with_apify/saving_useful_stats.md). This will let you know what filters you went through and how many products each of them had.

Check out the [full code example](https://github.com/apify-projects/apify-extra-library/tree/master/examples/crawler-with-filters).
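As context for the summary above: a minimal sketch of saving such analytics, assuming the Apify SDK (`apify` v3); the key name and the shape of the stats object are hypothetical.

```js
import { Actor } from 'apify';

await Actor.init();

// Hypothetical per-filter statistics collected while crawling.
const stats = {
    filtersProcessed: 0,
    productsPerFilter: {}, // e.g. { 'price-0-100': 1234 }
};

// ...inside the crawler, after each filter page is handled:
// stats.filtersProcessed += 1;
// stats.productsPerFilter[filterName] = productCount;

// Persist the stats so they can be inspected after the run.
await Actor.setValue('FILTER-STATS', stats);

await Actor.exit();
```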

sources/academy/webscraping/advanced_web_scraping/index.md

Lines changed: 0 additions & 2 deletions
@@ -30,5 +30,3 @@ If you've managed to follow along with all of the courses prior to this one, the
## [](#first-up) First up

First, we will explore [advanced crawling section](academy/webscraping/advanced-web-scraping/advanced-crawling) that will help us to find all pages or products on the website.
-
-

sources/academy/webscraping/puppeteer_playwright/common_use_cases/paginating_through_results.md

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ import TabItem from '@theme/TabItem';
If you're trying to [collect data](../executing_scripts/extracting_data.md) on a website that has millions, thousands, or even hundreds of results, it is very likely that they are paginating their results to reduce strain on their back-end as well as on the users loading and rendering the content.

- ![Amazon pagination](/academy/webscraping/advanced_web_scraping/crawling/images/pagination.png)
+ ![Amazon pagination](/academy/advanced_web_scraping/crawling/images/pagination.png)

## Page number-based pagination {#page-number-based-pagination}
