| 1 | +--- |
| 2 | +id: avoid-blocking |
| 3 | +title: Avoid getting blocked |
| 4 | +description: How to avoid getting blocked when scraping |
| 5 | +--- |
| 6 | +import ApiLink from '@site/src/components/ApiLink'; |
| 7 | +import CodeBlock from '@theme/CodeBlock'; |
| 8 | + |
| 9 | +import PlaywrightDefaultFingerprintGenerator from '!!raw-loader!./code/browser_fingerprint/playwright_with_fingerprint_generator.py'; |
| 10 | +import PlaywrightDefaultFingerprintGeneratorWithArgs from '!!raw-loader!./code/browser_fingerprint/default_fingerprint_generator_with_args.py'; |
| 11 | +import PlaywrightWithCamoufox from '!!raw-loader!../examples/code/playwright_crawler_with_camoufox.py'; |
| 12 | + |
| 13 | + |
| 14 | +A scraper might get blocked for numerous reasons. Let's narrow it down to the two main ones. The first is a bad or blocked IP address. You can learn about this topic in the [proxy management guide](./proxy-management). The second reason is [browser fingerprints](https://pixelprivacy.com/resources/browser-fingerprinting/) (or signatures), which we will explore more in this guide. Check the [Apify Academy anti-scraping course](https://docs.apify.com/academy/anti-scraping) to gain a deeper theoretical understanding of blocking and learn a few tips and tricks. |
| 15 | + |
| 16 | +A browser fingerprint is a collection of browser attributes and distinguishing features that can reveal whether a browser is driven by a bot or a real user. Moreover, most browsers expose enough unique features that a website can track the same browser even across different IP addresses. This is the main reason why scrapers should change browser fingerprints during browser-based scraping. Doing so should significantly reduce blocking. |
| 17 | + |
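To make the tracking idea above concrete, here is a minimal sketch that uses only the Python standard library. It is not how Crawlee or any particular anti-bot system works internally. It assumes a website collects a handful of attributes (the attribute names, values and the `fingerprint_id` helper are all hypothetical) and hashes them into a single identifier. The identifier stays the same when only the IP address changes, and changes once the attributes themselves are varied.

```python
import hashlib
import json


def fingerprint_id(attributes: dict) -> str:
    """Hash a set of browser attributes into one identifier (illustration only)."""
    # Sorting the keys makes the result independent of attribute order.
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode('utf-8')).hexdigest()


# Hypothetical attributes that a website's script might read from a browser.
browser = {
    'user_agent': 'Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/128.0',
    'screen': '1920x1080',
    'timezone': 'Europe/Prague',
    'languages': ['en-US', 'en'],
}

# The same browser visits through two different proxies (documentation IP ranges).
visit_a = {'ip': '203.0.113.10', 'fingerprint': fingerprint_id(browser)}
visit_b = {'ip': '198.51.100.7', 'fingerprint': fingerprint_id(browser)}

# The IP changed, but the fingerprint still links both visits together.
same_browser = visit_a['fingerprint'] == visit_b['fingerprint']

# Varying the attributes yields a different identifier.
rotated = {**browser, 'screen': '1366x768', 'timezone': 'America/New_York'}
looks_new = fingerprint_id(rotated) != fingerprint_id(browser)

print(f'linked across IPs: {same_browser}, new identity after rotation: {looks_new}')
```

Real fingerprinting scripts typically combine many more signals than this, which is why the attributes need to be varied together and consistently rather than one at a time.
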
| 18 | +## Using browser fingerprints |
| 19 | + |
| 20 | +Changing browser fingerprints can be a tedious job. Luckily, Crawlee provides this feature with minimal configuration: fingerprints can be enabled in <ApiLink to="class/PlaywrightCrawler">`PlaywrightCrawler`</ApiLink> by passing the `fingerprint_generator` argument to <ApiLink to="class/PlaywrightCrawler#__init__">`PlaywrightCrawler.__init__`</ApiLink>. You can either pass your own implementation of <ApiLink to="class/FingerprintGenerator">`FingerprintGenerator`</ApiLink> or use <ApiLink to="class/BrowserforgeFingerprintGenerator">`DefaultFingerprintGenerator`</ApiLink>. |
| 21 | + |
| 22 | +<CodeBlock className="language-python"> |
| 23 | + {PlaywrightDefaultFingerprintGenerator} |
| 24 | +</CodeBlock> |
| 25 | + |
| 26 | +In certain cases we want to narrow down the fingerprints used, e.g. to a certain operating system, locale or browser. This is also possible with Crawlee: the generation algorithm can be customized to reflect a particular browser version and other constraints. For a description of the fingerprint generation options, see <ApiLink to="class/HeaderGeneratorOptions">`HeaderGeneratorOptions`</ApiLink>, <ApiLink to="class/ScreenOptions">`ScreenOptions`</ApiLink> and <ApiLink to="class/BrowserforgeFingerprintGenerator#__init__">`DefaultFingerprintGenerator.__init__`</ApiLink>. See the example below: |
| 27 | + |
| 28 | +<CodeBlock className="language-python"> |
| 29 | + {PlaywrightDefaultFingerprintGeneratorWithArgs} |
| 30 | +</CodeBlock> |
| 31 | + |
| 32 | +If you do not want to use fingerprints, do not pass the `fingerprint_generator` argument to <ApiLink to="class/PlaywrightCrawler#__init__">`PlaywrightCrawler.__init__`</ApiLink>. By default, fingerprints are disabled. |
| 33 | + |
| 34 | +## Using Camoufox |
| 35 | + |
| 36 | +In some cases even <ApiLink to="class/PlaywrightCrawler">`PlaywrightCrawler`</ApiLink> with fingerprints is not enough. You can try using <ApiLink to="class/PlaywrightCrawler">`PlaywrightCrawler`</ApiLink> together with [Camoufox](https://camoufox.com/). See the example integration below: |
| 37 | + |
| 38 | +<CodeBlock className="language-python"> |
| 39 | + {PlaywrightWithCamoufox} |
| 40 | +</CodeBlock> |
| 41 | + |
| 42 | +**Related links** |
| 43 | + |
| 44 | +- [Fingerprint Suite Docs](https://github.com/apify/fingerprint-suite) |
| 45 | +- [Apify Academy anti-scraping course](https://docs.apify.com/academy/anti-scraping) |