
Commit ffcc878

TC-MO authored and daveomri committed
docs: add LLM scraper doc (apify#1785)
- add LLM scraper
- create screenshots TODOs
- structure the doc as others
- add links to other docs
1 parent 2d864fe commit ffcc878


11 files changed: +202 -27 lines changed


sources/platform/integrations/workflows-and-notifications/make/ai-crawling.md

Lines changed: 1 addition & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -143,11 +143,10 @@ In addition to the standard output fields, Advanced Settings provides:
143143

144144
Looking for more than just AI crawling? You can use other native Make apps powered by Apify:
145145

146-
- [Instagram Data](/platform/integrations/make/instagram)
147146
- [TikTok Data](/platform/integrations/make/tiktok)
148147
- [Google Search](/platform/integrations/make/search)
149148
- [Google Maps Emails Data](/platform/integrations/make/maps)
150149
- [YouTube Data](/platform/integrations/make/youtube)
151150
- [Amazon](/platform/integrations/make/amazon)
152151

153-
And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
152+
And more! Because you can access any of thousands of our scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).

sources/platform/integrations/workflows-and-notifications/make/amazon.md

Lines changed: 1 addition & 2 deletions
@@ -226,12 +226,11 @@ For Amazon URLs, you can extract:
226226

227227
There are other native Make Apps powered by Apify. You can check out Apify Scraper for:
228228

229-
- [Instagram Data](/platform/integrations/make/instagram)
230229
- [TikTok Data](/platform/integrations/make/tiktok)
231230
- [Google Search](/platform/integrations/make/search)
232231
- [Google Maps Emails Data](/platform/integrations/make/maps)
233232
- [YouTube Data](/platform/integrations/make/youtube)
234233
- [AI crawling](/platform/integrations/make/ai-crawling)
235234

236235

237-
And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
236+
And more! Because you can access any of thousands of our scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).

sources/platform/integrations/workflows-and-notifications/make/facebook.md

Lines changed: 1 addition & 2 deletions
@@ -237,13 +237,12 @@ You’ll get:
237237

238238
Looking for more than just Facebook? You can use other native Make apps powered by Apify:
239239

240-
- [Instagram Data](/platform/integrations/make/instagram)
241240
- [TikTok Data](/platform/integrations/make/tiktok)
242241
- [Google Search](/platform/integrations/make/search)
243242
- [Google Maps Emails Data](/platform/integrations/make/maps)
244243
- [YouTube Data](/platform/integrations/make/youtube)
245244
- [AI crawling](/platform/integrations/make/ai-crawling)
246245
- [Amazon](/platform/integrations/make/amazon)
247246

248-
And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
247+
And more! Because you can access any of thousands of our scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
249248


sources/platform/integrations/workflows-and-notifications/make/instagram.md

Lines changed: 1 addition & 1 deletion
@@ -188,4 +188,4 @@ There are other native Make Apps powered by Apify. You can check out Apify Scrap
188188
- [AI crawling](/platform/integrations/make/ai-crawling)
189189
- [Amazon](/platform/integrations/make/amazon)
190190

191-
And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
191+
And more! Because you can access any of thousands of our scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
1+
---
2+
title: Make - LLMs Actor integration
3+
description: Learn about LLM browser modules. Search the web and extract clean Markdown for AI assistants and RAG.
4+
sidebar_label: LLMs
5+
sidebar_position: 7
6+
slug: /integrations/make/llm
7+
toc_max_heading_level: 4
8+
---
9+
10+
## Apify Scraper for LLMs
11+
12+
Apify Scraper for LLMs from [Apify](https://apify.com) is a web browsing module for OpenAI Assistants, RAG pipelines, and AI agents. It can query Google Search, scrape the top results, and return page content as Markdown for downstream AI processing.
13+
14+
To use these modules, you need an [Apify account](https://console.apify.com) and an [API token](https://docs.apify.com/platform/integrations/api#api-token). You can find your token in the Apify Console under **Settings > API & Integrations**. After connecting, you can automate content extraction and integrate results into your AI workflows.
15+
16+
## Connect Apify Scraper for LLMs
17+
18+
1. Create an account at [Apify](https://console.apify.com/). You can sign up using your email, Gmail, or GitHub account.
19+
20+
![Apify Console sign-up page with email, Gmail, and GitHub sign-up options](images/llm/rag-signup.png)
21+
22+
1. To connect your Apify account to Make, you can use an OAuth connection (recommended) or an Apify API token. To get the token, go to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in the Apify Console.
23+
24+
![Apify Console Settings page showing Personal API tokens section with token management options](images/Apify_Console_token_for_Make.png)
25+
26+
1. Find your token under **Personal API tokens**. You can also create a new token with custom permissions by clicking **+ Create a new token**.
27+
1. Click the **Copy** icon to copy your API token, then return to your Make scenario.
28+
29+
![Make interface showing API token field and connection name field for Apify integration setup](images/Apify_token_on_Make.png)
30+
31+
1. In Make, click **Add** to open the **Create a connection** dialog of the chosen Apify Scraper module.
32+
1. In the **API token** field, paste your token, provide a clear **Connection name**, and click **Save**.
33+
34+
![Make connection dialog with completed API token and connection name fields for Apify Scraper module](images/llm/apify-token-for-module-on-make.png)
35+
36+
Once connected, you can build workflows that search the web, extract content, and pass it to your AI applications.
37+
38+
## Apify Scraper for LLMs modules
39+
40+
After connecting the app, you can use two modules to search and extract content.
41+
42+
### Standard Settings module
43+
44+
Use Standard Settings to quickly search the web and extract content with optimized defaults. This is ideal for AI agents that need to answer questions or gather information from multiple sources.
45+
46+
The module supports two modes:
47+
48+
- _Search mode_ (keywords)
49+
- Queries Google Search with your keywords (supports advanced operators)
50+
- Retrieves the top N organic results
51+
- Loads each result and extracts the main content
52+
- Returns Markdown-formatted content
53+
54+
- _Direct URL mode_ (URL)
55+
- Navigates to a specific URL
56+
- Extracts page content
57+
- Skips Google Search
58+
59+
#### How it works
60+
61+
When you provide keywords, the module runs Google Search, parses the results, and collects organic URLs. For content extraction, it loads pages, waits for dynamic content to render, removes clutter, extracts the main content, and converts it to Markdown. Finally, it generates output by combining content, adding metadata and sources, and formatting everything for AI consumption.
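
Because the same input accepts either keywords or a URL, it can help to check which mode a given value is likely to trigger before running a scenario. The following is a minimal sketch, not the module's actual detection logic, which is not documented here. The function name `detect_mode` and the rule "only absolute http(s) URLs count as direct URLs" are assumptions made for illustration.

```python
from urllib.parse import urlparse


def detect_mode(search_query: str) -> str:
    """Return "direct-url" for http(s) URLs and "search" for everything else."""
    parsed = urlparse(search_query.strip())
    # Assumption: only absolute http(s) URLs with a host count as direct URLs.
    # A bare domain such as "example.com" is treated as keywords in this sketch.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "direct-url"
    return "search"


for value in ("https://example.com/article", "web browser for RAG pipelines"):
    print(value, "->", detect_mode(value))
```

A check like this could run in a step before the module so that malformed URLs are caught early instead of being sent to Google Search as keywords.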
62+
63+
#### Output data
64+
65+
```json title="Standard Settings output (shortened)"
66+
{
67+
"query": "web browser for RAG pipelines -site:reddit.com",
68+
"crawl": {
69+
"httpStatusCode": 200,
70+
"httpStatusMessage": "OK",
71+
"loadedAt": "2025-06-30T10:15:23.456Z",
72+
"uniqueKey": "https://example.com/article",
73+
"requestStatus": "handled"
74+
},
75+
"searchResult": {
76+
"title": "Building RAG Pipelines with Web Browsers",
77+
"description": "Integrate web browsing into your RAG pipeline for real-time retrieval.",
78+
"url": "https://example.com/article",
79+
"resultType": "organic",
80+
"rank": 1
81+
},
82+
"metadata": {
83+
"title": "Building RAG Pipelines with Web Browsers",
84+
"description": "Add web browsing to RAG systems",
85+
"languageCode": "en",
86+
"url": "https://example.com/article"
87+
},
88+
"markdown": "# Building RAG Pipelines with Web Browsers\n\n..."
89+
}
90+
```
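
To illustrate how this output might feed an AI workflow, here is a minimal Python sketch that joins several result items into one prompt-ready context string with source labels. It relies only on the `markdown`, `metadata.title`, `metadata.url`, and `searchResult.rank` fields shown above. The function name `build_context` and the `max_chars` limit are hypothetical and not part of the module.

```python
from typing import Any


def build_context(items: list[dict[str, Any]], max_chars: int = 4000) -> str:
    """Join scraped pages into one prompt-ready string with source labels."""
    # Order pages by their search rank so the top result comes first.
    # Items without a rank (for example, direct URL runs) sort to the front.
    ranked = sorted(items, key=lambda item: item.get("searchResult", {}).get("rank", 0))
    sections = []
    for item in ranked:
        metadata = item.get("metadata", {})
        title = metadata.get("title", "Untitled")
        url = metadata.get("url", "")
        # Trim each page so a single long article cannot crowd out the others.
        body = item.get("markdown", "")[:max_chars]
        sections.append(f"Source: {title} ({url})\n\n{body}")
    return "\n\n---\n\n".join(sections)


# Example item reduced to the fields used above.
example_item = {
    "searchResult": {"rank": 1},
    "metadata": {
        "title": "Building RAG Pipelines with Web Browsers",
        "url": "https://example.com/article",
    },
    "markdown": "# Building RAG Pipelines with Web Browsers\n\n...",
}

print(build_context([example_item]))
```

Keeping the source title and URL next to each page body lets a downstream model cite where an answer came from.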
91+
92+
#### Configuration (Standard Settings)
93+
94+
- _Search query_: Google Search keywords or a direct URL
95+
- _Maximum results_: Number of top search results to process (default: 3)
96+
- _Output formats_: Markdown, text, or HTML
97+
- _Remove cookie warnings_: Dismiss cookie consent dialogs
98+
- _Debug mode_: Enable extraction diagnostics
99+
100+
### Advanced Settings module
101+
102+
The Advanced Settings module gives you full control over search and extraction. Use it for complex sites or production RAG pipelines.
103+
104+
#### Key features
105+
106+
- _Advanced search options_: full Google operator support
107+
- _Flexible crawling tools_: browser-based (Playwright) or HTTP-based (Cheerio)
108+
- _Proxy configuration_: handle geo-restrictions and rate limits
109+
- _Granular content control_: include, remove, and click selectors
110+
- _Dynamic content handling_: wait strategies for JavaScript rendering
111+
- _Multiple output formats_: Markdown, HTML, or text
112+
- _Request management_: timeouts, retries, and concurrency
113+
114+
#### Configuration options
115+
116+
- _Search_: query, max results (1–100), SERP proxy group, SERP retries
117+
- _Scraping_: tool (browser-playwright, raw-http), HTML transformer, selectors (remove/keep/click), expand clickable elements
118+
- _Requests_: timeouts, retries, dynamic content wait
119+
- _Proxy_: use Apify Proxy, proxy groups, countries
120+
- _Output_: formats, save HTML/Markdown, debug mode, save screenshots
121+
122+
#### Output data
123+
124+
```json title="Advanced Settings output (shortened)"
125+
{
126+
"query": "advanced RAG implementation strategies",
127+
"crawl": {
128+
"httpStatusCode": 200,
129+
"httpStatusMessage": "OK",
130+
"loadedUrl": "https://ai-research.com/rag-strategies",
131+
"loadedTime": "2025-06-30T10:45:12.789Z",
132+
"referrerUrl": "https://www.google.com/search?q=advanced+RAG+implementation+strategies",
133+
"uniqueKey": "https://ai-research.com/rag-strategies",
134+
"requestStatus": "handled",
135+
"depth": 0
136+
},
137+
"searchResult": {
138+
"title": "Advanced RAG Implementation: A Complete Guide",
139+
"description": "Cutting-edge strategies for RAG systems.",
140+
"url": "https://ai-research.com/rag-strategies",
141+
"resultType": "organic",
142+
"rank": 1
143+
},
144+
"metadata": {
145+
"canonicalUrl": "https://ai-research.com/rag-strategies",
146+
"title": "Advanced RAG Implementation: A Complete Guide | AI Research",
147+
"description": "Vector DBs, chunking, and optimization techniques.",
148+
"languageCode": "en"
149+
},
150+
"markdown": "# Advanced RAG Implementation: A Complete Guide\n\n...",
151+
"debug": {
152+
"extractorUsed": "readableText",
153+
"elementsRemoved": 47,
154+
"elementsClicked": 3
155+
}
156+
}
157+
```
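
When consuming this output downstream, you may want to set aside pages that did not load cleanly before passing content to a model. The sketch below is one possible approach under stated assumptions: it treats an item as usable only when `crawl.requestStatus` is `"handled"` and `crawl.httpStatusCode` is a 2xx value. The function name `split_by_status` is hypothetical.

```python
from typing import Any


def split_by_status(
    items: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate successfully crawled pages from ones that need attention."""
    ok, failed = [], []
    for item in items:
        crawl = item.get("crawl", {})
        status = crawl.get("httpStatusCode")
        handled = crawl.get("requestStatus") == "handled"
        # Assumption: only handled requests with a 2xx status carry usable content.
        if handled and isinstance(status, int) and 200 <= status < 300:
            ok.append(item)
        else:
            failed.append(item)
    return ok, failed


sample = [
    {"crawl": {"httpStatusCode": 200, "requestStatus": "handled"}},
    {"crawl": {"httpStatusCode": 404, "requestStatus": "handled"}},
]
usable, needs_review = split_by_status(sample)
print(len(usable), "usable,", len(needs_review), "to review")
```

Items in the second list could be logged or retried with a different scraping tool or proxy setting.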
158+
159+
### Use cases
160+
161+
- Quick information retrieval for AI assistants
162+
- General web search integration and Q&A
163+
- Production RAG pipelines that need reliability
164+
- Extracting content from JavaScript-heavy sites
165+
- Building specialized knowledge bases and research workflows
166+
167+
### Best practices
168+
169+
To get the best search results, use specific keywords and operators, and exclude unwanted domains with `-site:`. For better performance, use HTTP mode for static sites and only switch to browser mode when necessary. You can also tune concurrency settings based on your needs. To maintain content quality, remove non-content elements, choose the right HTML transformer, and enable debug mode when troubleshooting. Finally, ensure reliable operation by setting appropriate timeouts and retries, and monitoring HTTP status codes for errors.
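
As a small illustration of the `-site:` advice, the following Python sketch appends one exclusion operator per unwanted domain to a keyword string. The helper name `build_query` is hypothetical. It only builds the text you would paste or map into the search query field.

```python
def build_query(keywords: str, excluded_domains: list[str]) -> str:
    """Append a -site: operator for each domain that should be excluded."""
    # Skip blank entries so stray whitespace does not produce an empty "-site:".
    exclusions = [f"-site:{domain.strip()}" for domain in excluded_domains if domain.strip()]
    return " ".join([keywords.strip(), *exclusions])


print(build_query("web browser for RAG pipelines", ["reddit.com"]))
# prints: web browser for RAG pipelines -site:reddit.com
```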
170+
171+
## Other scrapers available
172+
173+
There are other native Make Apps powered by Apify. You can check out Apify Scraper for:
174+
175+
- [TikTok Data](/platform/integrations/make/tiktok)
176+
- [Google Search](/platform/integrations/make/search)
177+
- [Google Maps Emails Data](/platform/integrations/make/maps)
178+
- [YouTube Data](/platform/integrations/make/youtube)
179+
- [AI crawling](/platform/integrations/make/ai-crawling)
180+
- [Amazon](/platform/integrations/make/amazon)
181+
182+
And more! Because you can access any of thousands of our scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).

sources/platform/integrations/workflows-and-notifications/make/maps.md

Lines changed: 13 additions & 14 deletions
@@ -82,7 +82,7 @@ Categories can be general (e.g., "restaurant") which includes all variations lik
8282
"rating": 4.6,
8383
"reviewsCount": 182,
8484
"featuredInLists": ["Best Chinese Food", "Top Rated Restaurants"],
85-
85+
8686
// Complete address information for targeted outreach
8787
"address": "175 Main St, Staten Island, NY 10307",
8888
"neighborhood": "Tottenville",
@@ -92,25 +92,25 @@ Categories can be general (e.g., "restaurant") which includes all variations lik
9292
"state": "New York",
9393
"countryCode": "US",
9494
"plusCode": "GQ62+8M Staten Island, New York",
95-
95+
9696
// Multiple contact channels
9797
"website": "http://kimsislandsi.com/",
9898
"phone": "(718) 356-5168",
9999
"phoneUnformatted": "+17183565168",
100100
"email": "[email protected]", // From website enrichment
101-
101+
102102
// Business qualification data
103103
"yearsInBusiness": 12,
104104
"claimThisBusiness": false, // Verified listing
105105
"popular": true,
106106
"temporarilyClosed": false,
107-
107+
108108
// Precise location for territory planning
109109
"location": {
110110
"lat": 40.5107736,
111111
"lng": -74.2482624
112112
},
113-
113+
114114
// Operational insights for scheduling outreach
115115
"openingHours": {
116116
"Monday": "11:00 AM - 10:00 PM",
@@ -185,7 +185,7 @@ This module provides the most flexible options for defining where and how to sea
185185
"title": "Bluestone Lane Chelsea Piers Café",
186186
"price": "$20–30",
187187
"categoryName": "Coffee shop",
188-
188+
189189
// Address and location data
190190
"address": "62 Chelsea Piers Pier 62, New York, NY 10011",
191191
"neighborhood": "Manhattan",
@@ -199,17 +199,17 @@ This module provides the most flexible options for defining where and how to sea
199199
"lng": -74.0087457
200200
},
201201
"plusCode": "GQ62+8M Staten Island, New York",
202-
202+
203203
// Contact information
204204
"website": "https://bluestonelane.com/?y_source=1_MjMwNjk1NDAtNzE1LWxvY2F0aW9uLndlYnNpdGU%3D",
205205
"phone": "(718) 374-6858",
206206
"phoneUnformatted": "+17183746858",
207-
207+
208208
// Rating and reviews
209209
"totalScore": 4.3,
210210
"reviewsCount": 425,
211211
"imagesCount": 659,
212-
212+
213213
// Business identifiers
214214
"claimThisBusiness": false,
215215
"permanentlyClosed": false,
@@ -218,7 +218,7 @@ This module provides the most flexible options for defining where and how to sea
218218
"categories": ["Coffee shop", "Cafe"],
219219
"fid": "0x89c25957cf20350d:0xc0d1df36ed3dc4b6",
220220
"cid": "13894131752416167094",
221-
221+
222222
// Operating hours
223223
"openingHours": [
224224
{"day": "Monday", "hours": "7 AM to 6 PM"},
@@ -229,7 +229,7 @@ This module provides the most flexible options for defining where and how to sea
229229
{"day": "Saturday", "hours": "7 AM to 6 PM"},
230230
{"day": "Sunday", "hours": "7 AM to 6 PM"}
231231
],
232-
232+
233233
// Business attributes and amenities
234234
"additionalInfo": {
235235
"Service options": [
@@ -305,7 +305,7 @@ This module provides the most flexible options for defining where and how to sea
305305
{"High chairs": true}
306306
]
307307
},
308-
308+
309309
// Image and metadata
310310
"imageUrl": "https://lh3.googleusercontent.com/p/AF1QipMl6-SnuqYEeE3mD54M0q5D5nysRUZQj1BB0g8=w408-h272-k-no",
311311
"kgmid": "/g/11ph8zh6sg",
@@ -352,11 +352,10 @@ This module provides the most flexible options for defining where and how to sea
352352

353353
There are other native Make Apps powered by Apify. You can check out Apify Scraper for:
354354

355-
- [Instagram Data](/platform/integrations/make/instagram)
356355
- [TikTok](/platform/integrations/make/tiktok)
357356
- [Google Search](/platform/integrations/make/search)
358357
- [YouTube Data](/platform/integrations/make/youtube)
359358
- [AI crawling](/platform/integrations/make/ai-crawling)
360359
- [Amazon](/platform/integrations/make/amazon)
361360

362-
And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
361+
And more! Because you can access any of thousands of our scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).

sources/platform/integrations/workflows-and-notifications/make/search.md

Lines changed: 1 addition & 2 deletions
@@ -115,11 +115,10 @@ The scraper exports data in various formats including JSON, CSV, Excel, and XML,
115115

116116
There are other native Make Apps powered by Apify. You can check out Apify Scraper for:
117117

118-
- [Instagram Data](/platform/integrations/make/instagram)
119118
- [TikTok Data](/platform/integrations/make/tiktok)
120119
- [Google Maps Emails Data](/platform/integrations/make/maps)
121120
- [YouTube Data](/platform/integrations/make/youtube)
122121
- [AI crawling](/platform/integrations/make/ai-crawling)
123122
- [Amazon Data](/platform/integrations/make/amazon)
124123

125-
And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
124+
And more! Because you can access any of thousands of our scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).

sources/platform/integrations/workflows-and-notifications/make/tiktok.md

Lines changed: 1 addition & 2 deletions
@@ -164,11 +164,10 @@ For each TikTok hashtag, you will extract:
164164

165165
There are other native Make Apps powered by Apify. You can check out Apify Scraper for:
166166

167-
- [Instagram Data](/platform/integrations/make/instagram)
168167
- [Google Search](/platform/integrations/make/search)
169168
- [Google Maps Emails Data](/platform/integrations/make/maps)
170169
- [YouTube Data](/platform/integrations/make/youtube)
171170
- [AI crawling](/platform/integrations/make/ai-crawling)
172171
- [Amazon](/platform/integrations/make/amazon)
173172

174-
And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
173+
And more! Because you can access any of thousands of our scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
