---
title: Make - LLMs Actor integration
description: Learn about LLM browser modules. Search the web and extract clean Markdown for AI assistants and RAG.
sidebar_label: LLMs
sidebar_position: 7
slug: /integrations/make/llm
toc_max_heading_level: 4
---
## Apify Scraper for LLMs

Apify Scraper for LLMs from [Apify](https://apify.com) is a web browsing module for OpenAI Assistants, RAG pipelines, and AI agents. It can query Google Search, scrape the top results, and return page content as Markdown for downstream AI processing.

To use these modules, you need an [Apify account](https://console.apify.com) and an [API token](https://docs.apify.com/platform/integrations/api#api-token). You can find your token in the Apify Console under **Settings > Integrations**. After connecting, you can automate content extraction and integrate results into your AI workflows.
## Connect Apify Scraper for LLMs

1. Create an account at [Apify](https://console.apify.com/). You can sign up using your email, Gmail, or GitHub account.

   <!-- TODO: Add signup screenshot. Suggested path: images/llm/signup.png -->

1. To connect your Apify account to Make, you can use an OAuth connection (recommended) or an Apify API token. To get the token, go to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in the Apify Console.

   <!-- TODO: Add console token screenshot. You can reuse: images/Apify_Console_token_for_Make.png -->

1. Find your token under **Personal API tokens**. You can also create a new token with custom permissions by clicking **+ Create a new token**.
1. Click the **Copy** icon to copy your API token, then return to your Make scenario.

   <!-- TODO: Add Make token connection screenshot. You can reuse: images/Apify_token_on_Make.png -->

1. In Make, click **Add** to open the **Create a connection** dialog of the chosen Apify Scraper module.
1. In the **API token** field, paste your token, provide a clear **Connection name**, and click **Save**.

   <!-- TODO: Add Make connection dialog screenshot. Suggested path: images/llm/make-connection.png -->

Once connected, you can build workflows that search the web, extract content, and pass it to your AI applications.
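If you want to sanity-check the token outside Make before building a scenario, you can call Apify's public API directly. A minimal sketch, assuming Node.js 18+ for the built-in `fetch`; the script itself is not part of the Make integration:

```typescript title="check-token.ts (illustrative)"
// Sanity-check an Apify API token by fetching the account it belongs to.
// Endpoint: GET https://api.apify.com/v2/users/me (Apify public API).
const token = process.env.APIFY_TOKEN;

async function checkToken(): Promise<void> {
  const res = await fetch("https://api.apify.com/v2/users/me", {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) {
    throw new Error(`Token rejected: HTTP ${res.status}`);
  }
  const body = await res.json();
  // Apify API responses wrap the payload in a `data` envelope.
  console.log("Token belongs to user:", body.data?.username);
}

checkToken().catch((err) => {
  console.error(err);
  process.exit(1);
});
```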
## Apify Scraper for LLMs modules

After connecting the app, you can use two modules to search and extract content.

### Standard Settings module

Use Standard Settings to quickly search the web and extract content with optimized defaults. This is ideal for AI agents that need to answer questions or gather information from multiple sources.

The module supports two modes:
- **Search mode** (keywords)
  - Queries Google Search with your keywords (supports advanced operators)
  - Retrieves the top N organic results
  - Loads each result and extracts the main content
  - Returns Markdown-formatted content

- **Direct URL mode** (URL)
  - Navigates to a specific URL
  - Extracts page content
  - Skips Google Search
#### Processing steps

1. Search (if keywords provided): run Google Search, parse results, collect organic URLs
2. Content extraction: load pages, wait for dynamic content, remove clutter, extract main content, convert to Markdown
3. Output generation: combine content, add metadata and sources, format for AI consumption
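The same search-then-extract flow can also be triggered programmatically through Apify's standard `run-sync-get-dataset-items` API endpoint, which runs an Actor and returns its dataset in one call. A sketch only: the Actor ID and input field names below are placeholders, so check the Actor's input schema in the Apify Console for the real ones:

```typescript title="run-scraper.ts (illustrative)"
// Run the scraper synchronously and get its dataset items in one call.
// ACTOR_ID and the input fields are hypothetical placeholders.
const ACTOR_ID = "<scraper-actor-id>"; // copy the real ID from Apify Console
const token = process.env.APIFY_TOKEN;

async function searchAndExtract(query: string) {
  const url =
    `https://api.apify.com/v2/acts/${ACTOR_ID}/run-sync-get-dataset-items` +
    `?token=${token}`;
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // Assumed input shape: keywords trigger Search mode, a URL triggers
    // Direct URL mode, mirroring the two modes described above.
    body: JSON.stringify({ query, maxResults: 3, outputFormats: ["markdown"] }),
  });
  if (!res.ok) throw new Error(`Actor run failed: HTTP ${res.status}`);
  return res.json(); // one item per processed page, as in the output below
}

searchAndExtract("web browser for RAG pipelines -site:reddit.com")
  .then((items) => console.log(items.length, "pages extracted"));
```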
#### Output data

```json title="Standard Settings output (shortened)"
{
  "query": "web browser for RAG pipelines -site:reddit.com",
  "crawl": {
    "httpStatusCode": 200,
    "httpStatusMessage": "OK",
    "loadedAt": "2025-06-30T10:15:23.456Z",
    "uniqueKey": "https://example.com/article",
    "requestStatus": "handled"
  },
  "searchResult": {
    "title": "Building RAG Pipelines with Web Browsers",
    "description": "Integrate web browsing into your RAG pipeline for real-time retrieval.",
    "url": "https://example.com/article",
    "resultType": "organic",
    "rank": 1
  },
  "metadata": {
    "title": "Building RAG Pipelines with Web Browsers",
    "description": "Add web browsing to RAG systems",
    "languageCode": "en",
    "url": "https://example.com/article"
  },
  "markdown": "# Building RAG Pipelines with Web Browsers\n\n..."
}
```
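Downstream, each output item maps naturally onto a document for a RAG store. A small sketch that assumes only the item shape shown in the example output above:

```typescript title="to-rag-documents.ts (illustrative)"
// Turn Standard Settings output items into documents for a RAG store.
// The interfaces mirror the example output item above (shortened).
interface ScraperItem {
  query: string;
  searchResult: { title: string; url: string; rank: number };
  metadata: { title: string; languageCode: string; url: string };
  markdown: string;
}

interface RagDocument {
  id: string;
  title: string;
  source: string;
  text: string;
}

function toRagDocuments(items: ScraperItem[]): RagDocument[] {
  return items.map((item) => ({
    id: item.metadata.url,   // the URL doubles as a stable document ID
    title: item.metadata.title,
    source: item.searchResult.url,
    text: item.markdown,     // already clean Markdown, ready to chunk and embed
  }));
}
```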
#### Configuration (Standard Settings)

- **Search query**: Google Search keywords or a direct URL
- **Maximum results**: Number of top search results to process (default: 3)
- **Output formats**: Markdown, text, or HTML
- **Remove cookie warnings**: Dismiss cookie consent dialogs
- **Debug mode**: Enable extraction diagnostics
### Advanced Settings module

Advanced Settings gives you full control over search and extraction. Use it for complex sites or production RAG pipelines.
#### Key features

- Advanced search options: full Google operator support
- Flexible crawling tools: browser-based (Playwright) or HTTP-based (Cheerio)
- Proxy configuration: handle geo-restrictions and rate limits
- Granular content control: include, remove, and click selectors
- Dynamic content handling: wait strategies for JavaScript rendering
- Multiple output formats: Markdown, HTML, or text
- Request management: timeouts, retries, and concurrency
#### Configuration options

- Search: query, max results (1–100), SERP proxy group, SERP retries
- Scraping: tool (browser-playwright, raw-http), HTML transformer, selectors (remove/keep/click), expand clickable elements
- Requests: timeouts, retries, dynamic content wait
- Proxy: use Apify Proxy, proxy groups, countries
- Output: formats, save HTML/Markdown, debug mode, save screenshots
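Put together, an Advanced Settings configuration might look like the object below. This is an illustrative sketch only: the field names are invented to mirror the options listed above, so map them to the actual fields in the module's UI or the Actor's input schema:

```typescript title="advanced-input.ts (illustrative)"
// Hypothetical Advanced Settings input, one stand-in field per option group
// above. Not the module's real schema.
const advancedInput = {
  query: "advanced RAG implementation strategies",
  maxResults: 10,                            // search: 1-100
  scrapingTool: "browser-playwright",        // or "raw-http" for static sites
  removeElementsCssSelector: "nav, footer",  // strip clutter before extraction
  clickElementsCssSelector: "button.load-more", // expand collapsed content
  dynamicContentWaitSecs: 10,                // wait for JavaScript rendering
  requestTimeoutSecs: 60,
  maxRequestRetries: 2,
  proxyConfiguration: { useApifyProxy: true, countryCode: "US" },
  outputFormats: ["markdown"],
  debugMode: true,                           // adds the `debug` object shown below
};
```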
#### Output data

```json title="Advanced Settings output (shortened)"
{
  "query": "advanced RAG implementation strategies",
  "crawl": {
    "httpStatusCode": 200,
    "httpStatusMessage": "OK",
    "loadedUrl": "https://ai-research.com/rag-strategies",
    "loadedTime": "2025-06-30T10:45:12.789Z",
    "referrerUrl": "https://www.google.com/search?q=advanced+RAG+implementation+strategies",
    "uniqueKey": "https://ai-research.com/rag-strategies",
    "requestStatus": "handled",
    "depth": 0
  },
  "searchResult": {
    "title": "Advanced RAG Implementation: A Complete Guide",
    "description": "Cutting-edge strategies for RAG systems.",
    "url": "https://ai-research.com/rag-strategies",
    "resultType": "organic",
    "rank": 1
  },
  "metadata": {
    "canonicalUrl": "https://ai-research.com/rag-strategies",
    "title": "Advanced RAG Implementation: A Complete Guide | AI Research",
    "description": "Vector DBs, chunking, and optimization techniques.",
    "languageCode": "en"
  },
  "markdown": "# Advanced RAG Implementation: A Complete Guide\n\n...",
  "debug": {
    "extractorUsed": "readableText",
    "elementsRemoved": 47,
    "elementsClicked": 3
  }
}
```
### Use cases

- Quick information retrieval for AI assistants
- General web search integration and Q&A
- Production RAG pipelines that need reliability
- Extracting content from JavaScript-heavy sites
- Building specialized knowledge bases and research workflows
### Best practices

1. Search query optimization: use specific keywords and operators; exclude unwanted domains with `-site:`
2. Performance: use HTTP mode for static sites; use browser mode only when necessary; tune concurrency
3. Content quality: remove non-content elements; select the right HTML transformer; enable debug when needed
4. Error handling: set timeouts and retries; monitor HTTP status codes (see the sketch after this list)
## Other scrapers available

There are other native Make Apps powered by Apify. You can check out Apify Scraper for:

- [Instagram Data](/platform/integrations/make/instagram)
- [TikTok Data](/platform/integrations/make/tiktok)
- [Google Search](/platform/integrations/make/search)
- [Google Maps Emails Data](/platform/integrations/make/maps)
- [YouTube Data](/platform/integrations/make/youtube)
- [AI crawling](/platform/integrations/make/ai-crawling)
- [Amazon](/platform/integrations/make/amazon)

And more! You can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
