Capture screenshots of websites through a self-hosted API. This project is a wrapper around the capture-website library: https://github.com/sindresorhus/capture-website
- Pull the image:
docker pull robvanderleek/capture-website-api
- Start the container:
docker run -it -p 8080:8080 robvanderleek/capture-website-api
- Make a test screenshot request:
curl 'localhost:8080/capture?url=https://news.ycombinator.com/' -o screenshot.png
- Clone the repo:
git clone git@github.com:robvanderleek/capture-website-api.git
- Go to the standalone directory:
cd capture-website-api/standalone
- Build the image:
docker build -t cwa .
- Start the container:
docker run -it -p 8080:8080 cwa
- Make a test screenshot request:
curl 'localhost:8080/capture?url=https://www.youtube.com' -o screenshot.png
Run in a terminal:
- Clone the repo:
git clone git@github.com:robvanderleek/capture-website-api.git
- Go to the standalone directory:
cd capture-website-api/standalone
- Install dependencies:
npm install
- Start the server:
npm start
- Make a test screenshot request:
curl 'localhost:8080/capture?url=https://www.reddit.com' -o screenshot.png
Deploy and run on Vercel:
- Clone the repo:
git clone git@github.com:robvanderleek/capture-website-api.git && cd capture-website-api/serverless
- Deploy to Vercel:
vercel deploy
- Get the site URL:
vercel ls
- Make a test screenshot request:
curl "${SITE_URL}/api/capture?url=https://www.linkedin.com" -o screenshot.png
Call the /capture endpoint and pass the site URL in the url query parameter:
curl 'https://capture-website-api.vercel.app/api/capture?url=http://gmail.com' -o screenshot.png
Simple as that.
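The same request can be made from a script. Here is a minimal sketch in Python using only the standard library. The helper names `capture_url` and `save_screenshot` are hypothetical and not part of this project; `base` is assumed to be the full endpoint URL, such as the one in the curl example above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def capture_url(base, site):
    """Build a capture request URL; `base` is the full endpoint URL."""
    # urlencode percent-encodes the target site so it travels as one parameter
    return f"{base}?{urlencode({'url': site})}"


def save_screenshot(base, site, path="screenshot.png"):
    """Fetch the screenshot and write the response body to `path`."""
    with urlopen(capture_url(base, site)) as response:
        data = response.read()
    with open(path, "wb") as f:
        f.write(data)
    return path
```

Calling `save_screenshot("https://capture-website-api.vercel.app/api/capture", "http://gmail.com")` should then behave like the curl command, assuming the endpoint is reachable.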
Application configuration options can be set as environment variables or in
a .env file in the root folder. There's an example .env file in the codebase: .env.example
Supported options are:
| Name | Description | Default |
|---|---|---|
| TIMEOUT | Timeout in seconds for loading a web page | 20 |
| CONCURRENCY | Number of captures that run in parallel; more memory allows more parallel captures | 2 |
| MAX_QUEUE_LENGTH | Requests that can't be handled directly are queued until the queue is full | 6 |
| SHOW_RESULTS | Enable web endpoint to show latest capture | false |
| SECRET | Secret string to prevent undesired usage on public endpoints | "" |
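To make the defaults concrete, the following sketch shows how the options in the table could be resolved from environment variables. It is an illustration only and is not the project's actual configuration code: the `Config` class and `load_config` function are hypothetical names, and treating only the string "true" as enabling SHOW_RESULTS is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass
class Config:
    timeout: int = 20           # TIMEOUT, in seconds
    concurrency: int = 2        # CONCURRENCY
    max_queue_length: int = 6   # MAX_QUEUE_LENGTH
    show_results: bool = False  # SHOW_RESULTS
    secret: str = ""            # SECRET


def load_config(env=None):
    """Read options from `env` (defaults to os.environ), using the table's defaults."""
    env = os.environ if env is None else env
    defaults = Config()
    return Config(
        timeout=int(env.get("TIMEOUT", defaults.timeout)),
        concurrency=int(env.get("CONCURRENCY", defaults.concurrency)),
        max_queue_length=int(env.get("MAX_QUEUE_LENGTH", defaults.max_queue_length)),
        # Assumption: only the string "true" (any case) enables this option
        show_results=str(env.get("SHOW_RESULTS", "false")).lower() == "true",
        secret=env.get("SECRET", defaults.secret),
    )
```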
Most of the configuration options from the wrapped capture-website library are supported through query parameters.
For example, to capture a site with a 650x350 viewport, no default background, and animations disabled, use:
curl 'https://capture-website-api.vercel.app/api/capture?url=http://amazon.com&width=650&height=350&scaleFactor=1&defaultBackground=false&disableAnimations=true&wait_before_screenshot_ms=300' -o screenshot.png
See https://github.com/sindresorhus/capture-website for a full list of options.
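When building such URLs programmatically, boolean options need to be sent as lowercase `true`/`false`, as in the curl example. A minimal sketch follows; `capture_url_with_options` is a hypothetical helper, and the option names are taken from the example request rather than from a verified list.

```python
from urllib.parse import urlencode


def capture_url_with_options(base, site, **options):
    """Build a capture URL with extra capture-website options as query parameters."""
    params = {"url": site}
    for name, value in options.items():
        # Python would render True as "True"; the API example uses lowercase
        params[name] = str(value).lower() if isinstance(value, bool) else value
    return f"{base}?{urlencode(params)}"
```

For instance, `capture_url_with_options("http://localhost:8080/capture", "http://amazon.com", width=650, height=350, disableAnimations=True)` yields a URL equivalent to the relevant part of the curl command above.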
You may need to wait for async requests or animations to finish before capturing the screenshot. There are two ways of doing this, both specified in the query parameters:
- wait_before_screenshot_ms (in ms, defaults to 300) will wait before capturing a screenshot.
- For standalone: the capture-website library's delay option (in seconds)
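Because the two parameters use different units, it is easy to mix them up. The sketch below picks one of them for a wait given in seconds; `wait_params` is a hypothetical helper, and the `use_library_delay` flag is an assumption standing in for "running the standalone version".

```python
def wait_params(seconds, use_library_delay=False):
    """Return the query parameter for waiting `seconds` before the capture."""
    if use_library_delay:
        # capture-website's delay option is expressed in seconds
        return {"delay": seconds}
    # wait_before_screenshot_ms is expressed in milliseconds
    return {"wait_before_screenshot_ms": int(round(seconds * 1000))}
```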
Sometimes the capture-website library has problems capturing sites. You can
try to capture these sites with plain Puppeteer by supplying the query
parameter plainPuppeteer=true.
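A client can apply this as a fallback: try a normal capture first and retry with plain Puppeteer only if that fails. The sketch below assumes that a failed capture surfaces as an HTTP or network error, which this document does not confirm; `http_get` and `capture_with_fallback` are hypothetical names.

```python
from urllib.error import URLError
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get(url):
    """Return the response body for `url` as bytes."""
    with urlopen(url) as response:
        return response.read()


def capture_with_fallback(base, site, fetch=http_get):
    """Capture `site`, retrying once with plainPuppeteer=true on failure."""
    params = {"url": site}
    try:
        return fetch(f"{base}?{urlencode(params)}")
    except URLError:
        # Assumption: capture problems show up as an HTTP or network error
        params["plainPuppeteer"] = "true"
        return fetch(f"{base}?{urlencode(params)}")
```

The `fetch` argument is injectable so the retry logic can be exercised without a running server.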
This app looks at two environment variables:
- SHOW_RESULTS: if true the latest capture result can be viewed in the browser by browsing the base URL
- SECRET: when set all capture requests need to contain a query parameter secret whose value matches the value of this environment variable
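On the client side, a protected endpoint only requires one extra query parameter. A minimal sketch, where `with_secret` is a hypothetical helper that appends `secret` to an existing capture URL:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_secret(url, secret):
    """Append the `secret` query parameter to `url` when a secret is given."""
    if not secret:
        return url
    parts = urlsplit(url)
    # Decode the existing parameters, add the secret, and re-encode everything
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("secret", secret))
    return urlunsplit(parts._replace(query=urlencode(query)))
```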
To run the serverless version locally, execute vercel dev in the root folder
of the repository.
If you have suggestions for improvements, or want to report a bug, open an issue!
ISC © 2019 Rob van der Leek robvanderleek@gmail.com (https://twitter.com/robvanderleek)

