Commit 9192c00: feat(choosing-the-right-scraper)
---
title: How to choose the right scraper for the job
description: Understand how to choose the best scraper for your use case by understanding some basic concepts.
menuWeight: 20
category: tutorials
paths:
- choosing-the-right-scraper
---
# [](#choosing-the-right-scraper) Choosing the right scraper for the job
There are two main ways you can proceed with building your crawler:
1. Using plain HTTP requests.
2. Using an automated browser.
We will briefly go through the pros and cons of both and cover the basic steps for determining which one you should go with.
## [](#performance) Performance
First, let's discuss performance. Plain HTTP request-based scraping will **always** be faster than browser-based scraping. When using plain requests, the page's HTML is not rendered, no JavaScript is executed, no images are loaded, etc. Also, there's no memory used by the browser, and there are no CPU-hungry operations.
If it were only a question of performance, you'd of course use request-based scraping every time; however, it's unfortunately not that simple.
## [](#dynamic-pages) Dynamic pages & blocking
Some websites do not load any data without a browser, as they need to execute some scripts to show it (these are known as [dynamic pages]({{@link dealing_with_dynamic_pages.md}})). Another problem is blocking. If the website is collecting a [browser fingerprint]({{@link anti_scraping/techniques/fingerprinting.md}}), it is very easy for it to distinguish between a real user and a bot (crawler) and block access.
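To make this concrete, here is a small self-contained sketch (the HTML and names are illustrative, not taken from any real site) of what a plain HTTP scraper actually receives from a dynamic page: the raw markup with an empty container, because the script that fills it in never runs.

```javascript
// Illustrative only: a stand-in for a server response whose product list
// is rendered purely by client-side JavaScript.
const rawHtml = `
<div id="products"></div>
<script>
  // In a real browser, a script like this would inject the data:
  // document.getElementById('products').innerHTML = '<li>Item 1</li>';
</script>
`;

// A request-based scraper only sees the raw markup, so the container is empty.
const match = rawHtml.match(/<div id="products">([\s\S]*?)<\/div>/);
const scrapedContent = match ? match[1].trim() : null;

console.log(scrapedContent === '' ? 'No data without a browser' : scrapedContent);
```

An automated browser, by contrast, would execute the script before you read the page, so the same selector would return the rendered data.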
## [](#making-the-choice) Making the choice
30+
31+
When choosing which scraper to use, we suggest first checking whether the website works without JavaScript. Probably the easiest way to do so is to use the [Quick JavaScript Switcher]({{@link tools/quick_javascript_switcher.md}}) extension for Chrome. If JavaScript is not needed, or you've spotted some XHR requests in the **Network** tab containing the data you need, you probably won't need an automated browser. You can then check what data is received in the response using [Postman]({{@link tools/postman.md}}) or [Insomnia]({{@link tools/insomnia.md}}), or try sending a few requests programmatically. If the data is there and you're not blocked straight away, a request-based scraper is probably the way to go.
It also depends, of course, on whether you need to fill in some data (like a username and password) or select a location (such as entering a zip code manually). Tasks where interacting with the page is absolutely necessary cannot be done with plain HTTP scraping and require a headless browser. In some cases, you might also decide to use a browser-based solution in order to better blend in with the rest of the "regular" traffic coming from real users.
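The decision steps above can be sketched as a small helper function (the function and parameter names here are our own, purely illustrative, not part of any library):

```javascript
// A rough sketch of the decision process described in this section.
function chooseScraper({ worksWithoutJavaScript, hasUsableXhrApi, requiresInteraction }) {
  // Filling in forms, selecting locations, etc. can only be done in a browser.
  if (requiresInteraction) return 'browser';
  // If the data is in the initial HTML or reachable via an XHR endpoint,
  // plain HTTP requests are enough (and much faster).
  if (worksWithoutJavaScript || hasUsableXhrApi) return 'http';
  // Otherwise, the page needs a browser to render its data.
  return 'browser';
}

console.log(chooseScraper({
  worksWithoutJavaScript: false,
  hasUsableXhrApi: true,
  requiresInteraction: false,
})); // 'http'
```

Note that blocking can still push you toward a browser even when `'http'` would otherwise suffice, as discussed above.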
