Commit 80f1cf4

updated readme for v2
1 parent 1597eb9 commit 80f1cf4

1 file changed: +9 -17 lines changed

README.md

Lines changed: 9 additions & 17 deletions
@@ -1,38 +1,30 @@
 ## Website Checker
 
-Website checker is a simple actor that allows you to scan any website for performance and blocking.
+Website checker is a simple actor that allows you to scan any website for performance and blocking using various scraping methods as Cheerio, Puppeteer and Playwright.
 
 ### Features
 
 The actor provides these useful features out of the box:
 
 - Collects response status codes
 - Recognizes the most common captchas
-- Saves HTML snapshots and screenshots (if Puppeteer is chosen)
-- Enables choosing between Cheerio (plain HTTP) and Puppeteer (browser) scraper
+- Saves HTML snapshots and screenshots (if Puppeteer or Playwright is chosen)
+- Enables choosing between Cheerio (plain HTTP) and Puppeteer/Playwright (browser) scraper
+- Enables choosing different browsers for Playwright - Chrome, Firefox and Webkit (Safari)
 - Enables re-scraping start URLs or enqueueing with a familiar link selector + pseudo URLs system
 - Handles different failure states like timeouts and network errors
 - Enables basic proxy and browser configuration
 
-#### Planned features
-
-- Usage calculation/stats
-- Better automatic workloads/workload actors
-- Add support for Playwright + Firefox
-
 ### How to use
 
-The most common use-case is to do a quick check on how aggressively the target site is blocking. In that case just supply a start URL, ideally a category one or product one. You can either set `replicateStartUrls` or add enqueueing with `linkSelector` + `pseudoUrls`, both are good options to test different proxies. You can test a few different proxy groups and compare `cheerio` vs `puppeteer` options.
+The most common use-case is to do a quick check on how aggressively the target site is blocking. In that case just supply a start URL, ideally a category one or product one. You can either set `replicateStartUrls` or add enqueueing with `linkSelector` + `pseudoUrls`, both are good options to test different proxies.
 
-In the end you will get a simple statistics about the blocking rate. It is recommended to check a few screenshots just to make sure the actor correctly recognized the page status. You can get to the detailed output (per URL) via KV store or dataset (the KV output sorts by response status while dataset is simply ordered by scraping order).
+You can pick any combination of run options and the checker will spawn runner actor for every combination of scraping tool & proxies and then combine the results into single output.
 
-#### Checker workloads
-
-To make your life easier, you can use other actors that will start more checker runs at once and aggregate the result. This way you can test more sites at once or different cheerio/browser and proxy combinations and compare those.
+In the end you will get a simple statistics about the blocking rate. It is recommended to check a few screenshots just to make sure the actor correctly recognized the page status. You can get to the detailed output (per URL) via KV store or dataset (the KV output sorts by response status while dataset is simply ordered by scraping order).
 
-All of these actors are very young so we are glad for any feature ideas:
-[lukaskrivka/website-checker-workload](https://apify.com/lukaskrivka/website-checker-workload)
-[vaclavrut/website-checker-starter](https://apify.com/vaclavrut/website-checker-starter)
+#### Multiple URLs and configurations
+Website checker doesn't have any limitation of how many websites and configs you can check. For each website, it will run each config. You just need to set a reasonable `maxConcurrentDomainsChecked` so that all parallel runs fit into your total memory (4 GB for Cheerio and 8 GB for Puppeteer/Playwright checks).
 
 ### Input
 
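The new README text says that one runner is spawned per combination of scraping tool and proxies, and that `maxConcurrentDomainsChecked` should be chosen so that parallel runs fit into total memory. The following is a rough Python sketch of that arithmetic, not the actor's own code. The function names and the proxy group labels are hypothetical, and it assumes that all runs for one domain execute at the same time, which the README does not state explicitly. Only the 4 GB and 8 GB figures come from the diff.

```python
from itertools import product

# Memory per runner in GB, as quoted in the README diff.
MEMORY_GB = {"cheerio": 4, "puppeteer": 8, "playwright": 8}


def runner_combinations(tools, proxy_groups):
    """Return every (tool, proxy group) pair that would need its own runner."""
    return list(product(tools, proxy_groups))


def memory_per_domain_gb(combinations):
    """Memory needed for one domain, assuming all its runs execute in parallel."""
    return sum(MEMORY_GB[tool] for tool, _proxy in combinations)


def max_concurrent_domains(total_memory_gb, combinations):
    """Largest number of domains whose runs fit into total memory at once."""
    per_domain = memory_per_domain_gb(combinations)
    if per_domain == 0:
        raise ValueError("at least one tool/proxy combination is required")
    return total_memory_gb // per_domain


# Hypothetical proxy group labels, used only for illustration.
combos = runner_combinations(
    ["cheerio", "playwright"],
    ["proxy-group-a", "proxy-group-b"],
)
print(len(combos))                         # 2 tools x 2 proxy groups
print(memory_per_domain_gb(combos))        # 4 + 4 + 8 + 8
print(max_concurrent_domains(64, combos))  # whole domains that fit into 64 GB
```

Under these assumptions, a value computed this way could serve as an upper bound when picking `maxConcurrentDomainsChecked`; the actor's real scheduling may differ.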