Commit 42f4d84

Add to intro
1 parent 2dd27af commit 42f4d84

1 file changed

+8 -0 lines changed

content/academy/anti_scraping.md

Lines changed: 8 additions & 0 deletions
@@ -89,6 +89,14 @@ One of the most successful and advanced methods is collecting the browser's "fin

> It's important to note that this method also blocks all users that cannot evaluate JavaScript (such as bots sending only static HTTP requests), and combines both of the fundamental methods mentioned earlier.

### Honeypots

The honeypot approach is based on providing links that only bots can see. A typical example is hidden pagination. A bot usually needs to go through every page in the pagination, so the website adds a "fake" last page whose link is hidden from human users but matches the same selector as the real pagination links. Once the bot visits the link, it is automatically blacklisted. This method relies only on information from the HTTP request.
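The flow described above can be sketched from the website's side. This is a minimal illustration, not how any particular site implements it: the trap URL, the `pagination-link` class, and the `handle_honeypot_request` function are all hypothetical names chosen for the example, and the blacklist is a plain in-memory set.

```python
# Hypothetical trap URL: rendered in the page, but never visible to a human.
HONEYPOT_PATH = "/products?page=9999"

# The trap link is hidden with CSS, yet it matches the same selector
# (a.pagination-link) as the real pagination link, so a naive bot follows it.
PAGINATION_HTML = (
    '<a class="pagination-link" href="/products?page=2">2</a>\n'
    f'<a class="pagination-link" href="{HONEYPOT_PATH}" style="display:none">3</a>'
)

# In-memory blacklist of IP addresses (an assumption for this sketch).
blacklist = set()


def handle_honeypot_request(ip, path):
    """Return an HTTP status code using only the IP address and requested path."""
    if ip in blacklist:
        return 403  # already caught by the honeypot
    if path == HONEYPOT_PATH:
        blacklist.add(ip)  # only a bot would have followed the hidden link
        return 403
    return 200
```

Note that the decision uses nothing beyond the request's IP address and path, which is why no JavaScript evaluation is needed on the client. From the scraper's side, this suggests checking whether a link is actually visible before following it.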

### IP-session consistency

This technique is commonly used to block bots from accessing the website. It works on the principle that every entity that accesses the site gets a token, which it is expected to send back as a session cookie. The token is saved together with the IP address and HTTP request information such as the user-agent and other specific headers. If the entity makes another request without the session cookie, its IP address is added to the grey list.
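A rough server-side sketch of this bookkeeping is shown below. It is an assumption-laden illustration rather than a real implementation: `handle_session_request`, the in-memory dictionaries, and the use of `secrets.token_hex` to generate tokens are all choices made for the example.

```python
import secrets

# token -> (ip, user_agent) recorded when the token was issued
sessions = {}
# IP addresses that have already been issued a token
known_ips = set()
# IP addresses that came back without their session cookie
grey_list = set()


def handle_session_request(ip, user_agent, session_token=None):
    """Return the session token the client should hold after this request."""
    if session_token in sessions:
        # The client sent back a token we issued: nothing suspicious.
        return session_token
    if ip in known_ips:
        # This IP was given a token before but did not send the cookie back.
        grey_list.add(ip)
    # Issue a fresh token and store it with the IP and request information.
    token = secrets.token_hex(16)
    sessions[token] = (ip, user_agent)
    known_ips.add(ip)
    return token
```

A client that keeps its cookies, as a normal browser does, never reaches the grey list here, while a scraper that sends every request from the same IP without persisting cookies is flagged on its second request.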
## [](#first) First up

In our [first section]({{@link anti_scraping/techniques.md}}), we'll discuss the various anti-scraping methods and techniques websites use in more depth, as well as how to mitigate these protections.
