Commit 59b21cc

Apify (API Key) - update scrape-single-url (#18210)
* update scrape-single-url
* pnpm-lock.yaml

Co-authored-by: Lucas Caresia <[email protected]>
1 parent e878c02 commit 59b21cc

3 files changed: +101 -40 lines changed

Lines changed: 6 additions & 38 deletions
@@ -1,57 +1,25 @@
 import apify from "../../apify.app.mjs";
-import { ACTOR_ID } from "../../common/constants.mjs";
+import { gotScraping } from "got-scraping";

 export default {
   key: "apify-scrape-single-url",
   name: "Scrape Single URL",
-  description: "Executes a scraper on a specific website and returns its content as text. This action is perfect for extracting content from a single page.",
-  version: "0.0.4",
+  description: "Executes a scraper on a specific website and returns its content as HTML. This action is perfect for extracting content from a single page. [See the documentation](https://docs.apify.com/sdk/js/docs/examples/crawl-single-url)",
+  version: "0.1.0",
   type: "action",
   props: {
     apify,
     url: {
       type: "string",
       label: "URL",
       description: "The URL of the web page to scrape.",
-      optional: false,
-    },
-    crawlerType: {
-      type: "string",
-      label: "Crawler Type",
-      description: "Select the crawling engine:\n- **Headless web browser** - Useful for modern websites with anti-scraping protections and JavaScript rendering. It recognizes common blocking patterns like CAPTCHAs and automatically retries blocked requests through new sessions. However, running web browsers is more expensive as it requires more computing resources and is slower. It is recommended to use at least 8 GB of RAM.\n- **Stealthy web browser** (default) - Another headless web browser with anti-blocking measures enabled. Try this if you encounter bot protection while scraping. For best performance, use with Apify Proxy residential IPs. \n- **Raw HTTP client** - High-performance crawling mode that uses raw HTTP requests to fetch the pages. It is faster and cheaper, but it might not work on all websites.",
-      options: [
-        {
-          label: "Headless browser (stealthy Firefox+Playwright) - Very reliable, best in avoiding blocking, but might be slow",
-          value: "playwright:firefox",
-        },
-        {
-          label: "Headless browser (Chrome+Playwright) - Reliable, but might be slow",
-          value: "playwright:chrome",
-        },
-        {
-          label: "Raw HTTP client (Cheerio) - Extremely fast, but cannot handle dynamic content",
-          value: "cheerio",
-        },
-      ],
     },
   },
   async run({ $ }) {
-    const response = await this.apify.runActor({
-      $,
-      actorId: ACTOR_ID,
-      data: {
-        crawlerType: this.crawlerType,
-        maxCrawlDepth: 0,
-        maxCrawlPages: 1,
-        maxResults: 1,
-        startUrls: [
-          {
-            url: this.url,
-          },
-        ],
-      },
+    const { body } = await gotScraping({
+      url: this.url,
     });
     $.export("$summary", `Successfully scraped content from ${this.url}`);
-    return response;
+    return body;
   },
 };
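
Reviewer note: the new `run` body reduces to "fetch one URL and return the response body". The sketch below isolates that step so it can be exercised without network access. `scrapeSingleUrl`, `httpClient`, and `stubClient` are hypothetical names introduced only for illustration and are not part of this commit; the protocol check is likewise an added assumption, not behavior of the committed action.

```javascript
// Hypothetical helper, not part of the commit: wraps the fetch step so the
// HTTP client can be swapped out (for example, with a stub in tests).
async function scrapeSingleUrl(url, httpClient) {
  // new URL() throws a TypeError on malformed input, so bad URLs fail early.
  const parsed = new URL(url);
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  // Same call shape as in the diff: pass { url } and read `body` from the result.
  const { body } = await httpClient({ url: parsed.href });
  return body;
}

// Stub standing in for the real client so the sketch runs offline.
const stubClient = async ({ url }) => ({
  body: `<html><body>Fetched ${url}</body></html>`,
});
```

In the action itself, `httpClient` would presumably be the `gotScraping` function imported from `got-scraping`, as in the diff above. Note that the action now returns an HTML string rather than the actor run response, so downstream steps that read fields from the old response would need adjusting.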

components/apify/package.json

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,6 @@
 {
   "name": "@pipedream/apify",
-  "version": "0.2.2",
+  "version": "0.3.0",
   "description": "Pipedream Apify Components",
   "main": "apify.app.mjs",
   "keywords": [
@@ -14,6 +14,7 @@
   },
   "dependencies": {
     "@apify/consts": "^2.41.0",
-    "@pipedream/platform": "^3.0.3"
+    "@pipedream/platform": "^3.0.3",
+    "got-scraping": "^4.1.2"
   }
 }

pnpm-lock.yaml

Lines changed: 92 additions & 0 deletions
Generated file; diff not rendered by default.

0 commit comments
