
Commit 009f6fd

fix: handle all broken links
Parent: e787093

5 files changed (+7, -7 lines)


sources/academy/webscraping/advanced_web_scraping/crawling/crawling-sitemaps.md

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ For most sitemaps, you can make a simple HTTP request and parse the downloaded X
 ## How to parse URLs from sitemaps

-The easiest part is to parse the actual URLs from the sitemap. The URLs are usually listed under `<loc>` tags. You can use Cheerio to parse the XML text and extract the URLs. Just be careful that the sitemap might contain other URLs that you don't want to crawl (e.g. /about, /contact, or various special category sections). [This article](/academy/node-js/scraping-from-sitemaps.md) provides code examples for parsing sitemaps.
+The easiest part is to parse the actual URLs from the sitemap. The URLs are usually listed under `<loc>` tags. You can use Cheerio to parse the XML text and extract the URLs. Just be careful that the sitemap might contain other URLs that you don't want to crawl (e.g. /about, /contact, or various special category sections). [This article](/academy/node-js/scraping-from-sitemaps) provides code examples for parsing sitemaps.

 ## Using Crawlee
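
Note: the paragraph touched above describes pulling `<loc>` URLs out of sitemap XML with Cheerio. A minimal sketch of that idea, assuming Node 18+ (global `fetch`) and the `cheerio` package; the skip patterns are illustrative placeholders, not part of the linked article:

```ts
import * as cheerio from 'cheerio';

// Fetch a sitemap and pull every URL listed under a <loc> tag.
async function parseSitemapUrls(sitemapUrl: string): Promise<string[]> {
    const response = await fetch(sitemapUrl);
    const xml = await response.text();
    // xmlMode makes Cheerio parse the document as XML instead of HTML.
    const $ = cheerio.load(xml, { xmlMode: true });
    const urls = $('loc')
        .map((_, el) => $(el).text().trim())
        .get();
    // Drop URLs we don't want to crawl, e.g. /about or /contact pages.
    return urls.filter((url) => !/\/(about|contact)(\/|$)/.test(url));
}
```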

sources/academy/webscraping/advanced_web_scraping/crawling/crawling-with-search.md

Lines changed: 3 additions & 3 deletions
@@ -5,9 +5,9 @@ sidebar_position: 3
 slug: /advanced-web-scraping/crawling/crawling-with-search
 ---

-# Scraping websites with search
+# Scraping websites with search

-In this lesson, we will start with a simpler example of scraping HTML based websites with limited pagination.
+In this lesson, we will start with a simpler example of scraping HTML based websites with limited pagination.

 Limited pagination is a common practice on e-commerce sites and is becoming more popular over time. It makes sense: a real user will never want to look through more than 200 pages of results – only bots love unlimited pagination. Fortunately, there are ways to overcome this limit while keeping our code clean and generic.

@@ -281,6 +281,6 @@ await crawler.addRequests(requestsToEnqueue);

 ## Summary {#summary}

-And that's it. We have an elegant solution for a complicated problem. In a real project, you would want to make this a bit more robust and [save analytics data](/academy/expert_scraping_with_apify/saving_useful_stats.md). This will let you know what filters you went through and how many products each of them had.
+And that's it. We have an elegant solution for a complicated problem. In a real project, you would want to make this a bit more robust and [save analytics data](../../../platform/expert_scraping_with_apify/saving_useful_stats.md). This will let you know what filters you went through and how many products each of them had.

 Check out the [full code example](https://github.com/apify-projects/apify-extra-library/tree/master/examples/crawler-with-filters).
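
Note: the lesson summarized above works around pagination caps by splitting a capped listing into narrower filters until every slice fits under the limit. A hypothetical sketch of that splitting step (the `PriceRange` shape and the query parameter names are assumptions, not the lesson's actual code):

```ts
// Price range used to narrow a search filter; the shape is an assumption.
interface PriceRange {
    min: number;
    max: number;
}

// If a filtered listing still hits the pagination cap, halve its price
// range; repeat until every range returns fewer results than the cap.
function splitRange({ min, max }: PriceRange): [PriceRange, PriceRange] {
    const mid = Math.floor((min + max) / 2);
    return [
        { min, max: mid },
        { min: mid + 1, max },
    ];
}

// Query parameter names vary by site; these are placeholders.
function buildFilterUrl(baseUrl: string, { min, max }: PriceRange): string {
    return `${baseUrl}?priceMin=${min}&priceMax=${max}`;
}

// Example: split a capped range once and build the filtered listing URLs.
const requestsToEnqueue = splitRange({ min: 0, max: 1000 })
    .map((range) => ({ url: buildFilterUrl('https://example.com/products', range) }));
```

Each generated URL would then become a request passed to `crawler.addRequests()`, as in the hunk header above.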

sources/academy/webscraping/advanced_web_scraping/index.md

Lines changed: 1 addition & 1 deletion
@@ -28,4 +28,4 @@ If you've managed to follow along with all of the courses prior to this one, the
 ## First up

-First, we will explore [advanced crawling section](academy/webscraping/advanced-web-scraping/advanced-crawling) that will help us to find all pages or products on the website.
+First, we will explore [advanced crawling section](./crawling/sitemaps-vs-search.md) that will help us to find all pages or products on the website.

sources/academy/webscraping/api_scraping/general_api_scraping/handling_pagination.md

Lines changed: 1 addition & 1 deletion
@@ -198,7 +198,7 @@ Here's what the output of this code looks like:
 ## Final note {#final-note}

-Sometimes, APIs have limited pagination. That means that they limit the total number of results that can appear for a set of pages, or that they limit the pages to a certain number. To learn how to handle these cases, take a look at [this short article](/academy/advanced-web-scraping/scraping-paginated-sites).
+Sometimes, APIs have limited pagination. That means that they limit the total number of results that can appear for a set of pages, or that they limit the pages to a certain number. To learn how to handle these cases, take a look at [this short article](/academy/advanced-web-scraping/crawling/crawling-with-search).

 ## Next up {#next}

sources/academy/webscraping/puppeteer_playwright/common_use_cases/paginating_through_results.md

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ import TabItem from '@theme/TabItem';
 If you're trying to [collect data](../executing_scripts/extracting_data.md) on a website that has millions, thousands, or even hundreds of results, it is very likely that they are paginating their results to reduce strain on their back-end as well as on the users loading and rendering the content.

-![Amazon pagination](/academy/advanced_web_scraping/crawling/images/pagination.png)
+![Amazon pagination](../../advanced_web_scraping/crawling/images/pagination.png)

 ## Page number-based pagination {#page-number-based-pagination}
