Commit 060149d

committed
a
2 parents fe2ef64 + ce6dc84 commit 060149d

19 files changed, 23 additions and 19 deletions

content/academy/anti_scraping.md

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ Solely based on the way how the bots operate. It comperes data-rich pages visits

 By definition, this is not an anti-scraping method, but it can heavily affect the reliability of a scraper. If your target website drastically changes its CSS selectors, and your scraper is heavily reliant on selectors, it could break. In principle, websites using this method change their HTML structure or CSS selectors randomly and frequently, making the parsing of the data harder, and requiring more maintenance of the bot.

-One of the best ways of avoiding the possible breaking of your scraper due to website structure changes is to limit your reliance on data from HTML elements as much as possible (see [API Scraping]({{@link api_scraping.md}}) and [JavaScript objects within HTML]({{@link tutorials/js_in_html.md}}))
+One of the best ways of avoiding the possible breaking of your scraper due to website structure changes is to limit your reliance on data from HTML elements as much as possible (see [API Scraping]({{@link api_scraping.md}}) and [JavaScript objects within HTML]({{@link js_in_html.md}}))

 ### IP session consistency

content/academy/tutorials/dealing_with_dynamic_pages.md renamed to content/academy/dealing_with_dynamic_pages.md

Lines changed: 6 additions & 5 deletions
@@ -1,9 +1,10 @@
 ---
 title: Dealing with dynamic pages
 description: Learn about dynamic pages and dynamic content. How can we find out if a page is dynamic? How do we programmatically scrape dynamic content?
-menuWeight: 1
+menuWeight: 13
+category: tutorials
 paths:
-- tutorials/dealing-with-dynamic-pages
+- dealing-with-dynamic-pages
 ---

 # [](#dealing-with-dynamic-pages) Dealing with dynamic pages
@@ -16,7 +17,7 @@ In this lesson, we'll be discussing dynamic content and how to scrape it while u

 From our adored and beloved [Fakestore](https://demo-webstore.apify.org/), we have been tasked to scrape each product's title, price, and image from the [new arrivals](https://demo-webstore.apify.org/search/new-arrivals) page. Easy enough! We did something very similar in the previous modules.

-![New arrival products in Fakestore]({{@asset tutorials/images/new-arrivals.webp}})
+![New arrival products in Fakestore]({{@asset images/new-arrivals.webp}})

 First, create a file called **dynamic.js** and copy-paste the following boiler plate code into it:

@@ -78,7 +79,7 @@ await crawler.run([{ url: 'https://demo-webstore.apify.org/search/new-arrivals'

 After running it, you might say, "Great! It works!" **But wait...** What are those results being logged to console?

-![Bad results in console]({{@asset tutorials/images/bad-results.webp}})
+![Bad results in console]({{@asset images/bad-results.webp}})

 Every single image seems to have the same exact "URL," but they are most definitely not the image URLs we are looking for. This is strange, because in the browser, we were getting URLs that looked like this:

@@ -133,7 +134,7 @@ await crawler.run([{ url: 'https://demo-webstore.apify.org/search/new-arrivals'

 After running this one, we can see that our results look different from before. We're getting the image links!

-![Not perfect results]({{@asset tutorials/images/almost-there.webp}})
+![Not perfect results]({{@asset images/almost-there.webp}})

 Well... Not quite. It seems that the only images which we got the full links to were the ones that were being displayed within the view of the browser. This means that the images are lazy-loaded. **Lazy-loading** is a common technique used across the web to improve performance. Lazy-loaded items allow the user to load content incrementally, as they perform some action. In most cases, including our current one, this action is scrolling.
