docs: add LLM scraper doc (apify#1785)

TC-MO · daveomri · commit ffcc878c71de · 2025-09-03T10:28:47.000+02:00
add LLM scraper
create screenshots TODOs
structure the doc as others
add links to other docs
diff --git a/sources/platform/integrations/workflows-and-notifications/make/ai-crawling.md b/sources/platform/integrations/workflows-and-notifications/make/ai-crawling.md
@@ -143,11 +143,10 @@ In addition to the standard output fields, Advanced Settings provides:
 
 Looking for more than just AI crawling? You can use other native Make apps powered by Apify:
 
-- [Instagram Data](/platform/integrations/make/instagram)
 - [TikTok Data](/platform/integrations/make/tiktok)
 - [Google Search](/platform/integrations/make/search)
 - [Google Maps Emails Data](/platform/integrations/make/maps)
 - [YouTube Data](/platform/integrations/make/youtube)
 - [Amazon](/platform/integrations/make/amazon)
 
-And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
+And more! You can access thousands of scrapers on Apify Store through the [general Apify connections](https://www.make.com/en/integrations/apify).
diff --git a/sources/platform/integrations/workflows-and-notifications/make/amazon.md b/sources/platform/integrations/workflows-and-notifications/make/amazon.md
@@ -226,12 +226,11 @@ For Amazon URLs, you can extract:
 
 There are other native Make Apps powered by Apify. You can check out Apify Scraper for:
 
-- [Instagram Data](/platform/integrations/make/instagram)
 - [TikTok Data](/platform/integrations/make/tiktok)
 - [Google Search](/platform/integrations/make/search)
 - [Google Maps Emails Data](/platform/integrations/make/maps)
 - [YouTube Data](/platform/integrations/make/youtube)
 - [AI crawling](/platform/integrations/make/ai-crawling)
 
 
-And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
+And more! You can access thousands of scrapers on Apify Store through the [general Apify connections](https://www.make.com/en/integrations/apify).
diff --git a/sources/platform/integrations/workflows-and-notifications/make/facebook.md b/sources/platform/integrations/workflows-and-notifications/make/facebook.md
@@ -237,13 +237,12 @@ You’ll get:
 
 Looking for more than just Facebook? You can use other native Make apps powered by Apify:
 
-- [Instagram Data](/platform/integrations/make/instagram)
 - [TikTok Data](/platform/integrations/make/tiktok)
 - [Google Search](/platform/integrations/make/search)
 - [Google Maps Emails Data](/platform/integrations/make/maps)
 - [YouTube Data](/platform/integrations/make/youtube)
 - [AI crawling](/platform/integrations/make/ai-crawling)
 - [Amazon](/platform/integrations/make/amazon)
 
-And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
+And more! You can access thousands of scrapers on Apify Store through the [general Apify connections](https://www.make.com/en/integrations/apify).
 
diff --git a/sources/platform/integrations/workflows-and-notifications/make/images/llm/apify-token-for-module-on-make.png b/sources/platform/integrations/workflows-and-notifications/make/images/llm/apify-token-for-module-on-make.png
diff --git a/sources/platform/integrations/workflows-and-notifications/make/images/llm/rag-signup.png b/sources/platform/integrations/workflows-and-notifications/make/images/llm/rag-signup.png
diff --git a/sources/platform/integrations/workflows-and-notifications/make/instagram.md b/sources/platform/integrations/workflows-and-notifications/make/instagram.md
@@ -188,4 +188,4 @@ There are other native Make Apps powered by Apify. You can check out Apify Scrap
 - [AI crawling](/platform/integrations/make/ai-crawling)
 - [Amazon](/platform/integrations/make/amazon)
 
-And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
+And more! You can access thousands of scrapers on Apify Store through the [general Apify connections](https://www.make.com/en/integrations/apify).
diff --git a/sources/platform/integrations/workflows-and-notifications/make/llm.md b/sources/platform/integrations/workflows-and-notifications/make/llm.md
@@ -0,0 +1,182 @@
+---
+title: Make - LLMs Actor integration
+description: Learn about LLM browser modules. Search the web and extract clean Markdown for AI assistants and RAG.
+sidebar_label: LLMs
+sidebar_position: 7
+slug: /integrations/make/llm
+toc_max_heading_level: 4
+---
+
+## Apify Scraper for LLMs
+
+Apify Scraper for LLMs from [Apify](https://apify.com) is a web browsing module for OpenAI Assistants, RAG pipelines, and AI agents. It can query Google Search, scrape the top results, and return page content as Markdown for downstream AI processing.
+
+To use these modules, you need an [Apify account](https://console.apify.com) and an [API token](https://docs.apify.com/platform/integrations/api#api-token). You can find your token in the Apify Console under **Settings > API & Integrations**. After connecting, you can automate content extraction and integrate results into your AI workflows.
+
+## Connect Apify Scraper for LLMs
+
+1. Create an account at [Apify](https://console.apify.com/). You can sign up using your email, Gmail, or GitHub account.
+
+    ![Apify Console sign-up page with email, Gmail, and GitHub sign-up options](images/llm/rag-signup.png)
+
+1. To connect your Apify account to Make, you can use an OAuth connection (recommended) or an Apify API token. To get the token, go to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in the Apify Console.
+
+    ![Apify Console Settings page with the API & Integrations tab open](images/Apify_Console_token_for_Make.png)
+
+1. Find your token under **Personal API tokens**. You can also create a new token with custom permissions by clicking **+ Create a new token**.
+1. Click the **Copy** icon to copy your API token, then return to your Make scenario.
+
+    ![Apify Console Settings page showing Personal API tokens section with token management options](images/Apify_token_on_Make.png)
+
+1. In Make, click **Add** to open the **Create a connection** dialog of the chosen Apify Scraper module.
+1. In the **API token** field, paste your token, provide a clear **Connection name**, and click **Save**.
+
+    ![Make connection dialog with completed API token and connection name fields for Apify Scraper module](images/llm/apify-token-for-module-on-make.png)
+
+Once connected, you can build workflows that search the web, extract content, and pass it to your AI applications.
+
+## Apify Scraper for LLMs modules
+
+After connecting the app, you can use two modules to search and extract content.
+
+### Standard Settings module
+
+Use Standard Settings to quickly search the web and extract content with optimized defaults. This is ideal for AI agents that need to answer questions or gather information from multiple sources.
+
+The module supports two modes:
+
+- _Search mode_ (keywords)
+  - Queries Google Search with your keywords (supports advanced operators)
+  - Retrieves the top N organic results
+  - Loads each result and extracts the main content
+  - Returns Markdown-formatted content
+
+- _Direct URL mode_ (URL)
+  - Navigates to a specific URL
+  - Extracts page content
+  - Skips Google Search
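+
+This page doesn't document the exact rule the module uses to choose between the two modes. As a rough mental model, the following Python sketch assumes that an absolute `http(s)` URL triggers Direct URL mode and anything else is treated as search keywords. `detect_mode` is a hypothetical helper, not part of the module.
+
+```python
+from urllib.parse import urlparse
+
+
+def detect_mode(query: str) -> str:
+    """Hypothetical heuristic: absolute http(s) URL -> Direct URL mode, else Search mode."""
+    parsed = urlparse(query.strip())
+    # Keyword queries with operators such as "site:apify.com" have no netloc,
+    # so they fall through to search mode.
+    if parsed.scheme in ("http", "https") and parsed.netloc:
+        return "direct-url"
+    return "search"
+
+
+print(detect_mode("https://example.com/article"))  # direct-url
+print(detect_mode("web browser for RAG pipelines -site:reddit.com"))  # search
+```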
+
+#### How it works
+
+When you provide keywords, the module runs Google Search, parses the results, and collects organic URLs. For content extraction, it loads pages, waits for dynamic content to render, removes clutter, extracts the main content, and converts it to Markdown. Finally, it generates output by combining content, adding metadata and sources, and formatting everything for AI consumption.
+
+#### Output data
+
+```json title="Standard Settings output (shortened)"
+{
+  "query": "web browser for RAG pipelines -site:reddit.com",
+  "crawl": {
+    "httpStatusCode": 200,
+    "httpStatusMessage": "OK",
+    "loadedAt": "2025-06-30T10:15:23.456Z",
+    "uniqueKey": "https://example.com/article",
+    "requestStatus": "handled"
+  },
+  "searchResult": {
+    "title": "Building RAG Pipelines with Web Browsers",
+    "description": "Integrate web browsing into your RAG pipeline for real-time retrieval.",
+    "url": "https://example.com/article",
+    "resultType": "organic",
+    "rank": 1
+  },
+  "metadata": {
+    "title": "Building RAG Pipelines with Web Browsers",
+    "description": "Add web browsing to RAG systems",
+    "languageCode": "en",
+    "url": "https://example.com/article"
+  },
+  "markdown": "# Building RAG Pipelines with Web Browsers\n\n..."
+}
+```
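+
+Each processed page appears to produce one item shaped like the JSON above. Before passing results to an LLM, you'll often want to merge several items into a single context. The sketch below is one possible approach, not part of the module. It assumes you've already collected the items as Python dicts, keeps only pages with HTTP 200 and non-empty Markdown, orders them by `searchResult.rank`, and prefixes each with its source URL so the model can cite it. `build_context` is a hypothetical name.
+
+```python
+def build_context(items: list[dict]) -> str:
+    """Merge Markdown from successfully crawled items into one LLM context string."""
+    usable = [
+        item
+        for item in items
+        if item.get("crawl", {}).get("httpStatusCode") == 200 and item.get("markdown")
+    ]
+    # Lower rank first; items without a search rank go last.
+    usable.sort(key=lambda item: item.get("searchResult", {}).get("rank", float("inf")))
+    sections = []
+    for item in usable:
+        url = item.get("searchResult", {}).get("url") or item.get("metadata", {}).get("url", "unknown")
+        sections.append(f"Source: {url}\n\n{item['markdown']}")
+    return "\n\n---\n\n".join(sections)
+
+
+# Shortened item based on the Standard Settings output above.
+item = {
+    "crawl": {"httpStatusCode": 200, "requestStatus": "handled"},
+    "searchResult": {"url": "https://example.com/article", "rank": 1},
+    "markdown": "# Building RAG Pipelines with Web Browsers\n\n...",
+}
+print(build_context([item]).splitlines()[0])  # Source: https://example.com/article
+```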
+
+#### Configuration (Standard Settings)
+
+- _Search query_: Google Search keywords or a direct URL
+- _Maximum results_: Number of top search results to process (default: 3)
+- _Output formats_: Markdown, text, or HTML
+- _Remove cookie warnings_: Dismiss cookie consent dialogs
+- _Debug mode_: Enable extraction diagnostics
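+
+If you generate scenarios or validate inputs in your own tooling, it can help to mirror these fields in code. The dataclass below is a hypothetical sketch. Its field names are illustrative, not the module's actual parameter names. Only the default of 3 results comes from this page; the other defaults are assumptions.
+
+```python
+from dataclasses import dataclass, field
+
+ALLOWED_FORMATS = {"markdown", "text", "html"}
+
+
+@dataclass
+class StandardSettings:
+    """Hypothetical mirror of the Standard Settings fields (names are illustrative)."""
+
+    search_query: str  # Google Search keywords or a direct URL
+    maximum_results: int = 3  # documented default
+    output_formats: list[str] = field(default_factory=lambda: ["markdown"])  # assumed default
+    remove_cookie_warnings: bool = True  # assumed default
+    debug_mode: bool = False  # assumed default
+
+    def validate(self) -> None:
+        """Raise ValueError for inputs that can't be valid."""
+        if not self.search_query.strip():
+            raise ValueError("search_query must not be empty")
+        if self.maximum_results < 1:
+            raise ValueError("maximum_results must be at least 1")
+        unknown = set(self.output_formats) - ALLOWED_FORMATS
+        if unknown:
+            raise ValueError(f"unsupported output formats: {sorted(unknown)}")
+
+
+settings = StandardSettings(search_query="apify make integration")
+settings.validate()  # passes: the defaults are valid
+```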
+
+### Advanced Settings module
+
+The Advanced Settings module gives you full control over search and extraction. Use it for complex sites or production RAG pipelines.
+
+#### Key features
+
+- _Advanced search options_: full Google operator support
+- _Flexible crawling tools_: browser-based (Playwright) or HTTP-based (Cheerio)
+- _Proxy configuration_: handle geo-restrictions and rate limits
+- _Granular content control_: include, remove, and click selectors
+- _Dynamic content handling_: wait strategies for JavaScript rendering
+- _Multiple output formats_: Markdown, HTML, or text
+- _Request management_: timeouts, retries, and concurrency
+
+#### Configuration options
+
+- _Search_: query, max results (1–100), SERP proxy group, SERP retries
+- _Scraping_: tool (browser-playwright, raw-http), HTML transformer, selectors (remove/keep/click), expand clickable elements
+- _Requests_: timeouts, retries, dynamic content wait
+- _Proxy_: use Apify Proxy, proxy groups, countries
+- _Output_: formats, save HTML/Markdown, debug mode, save screenshots
+
+#### Output data
+
+```json title="Advanced Settings output (shortened)"
+{
+  "query": "advanced RAG implementation strategies",
+  "crawl": {
+    "httpStatusCode": 200,
+    "httpStatusMessage": "OK",
+    "loadedUrl": "https://ai-research.com/rag-strategies",
+    "loadedTime": "2025-06-30T10:45:12.789Z",
+    "referrerUrl": "https://www.google.com/search?q=advanced+RAG+implementation+strategies",
+    "uniqueKey": "https://ai-research.com/rag-strategies",
+    "requestStatus": "handled",
+    "depth": 0
+  },
+  "searchResult": {
+    "title": "Advanced RAG Implementation: A Complete Guide",
+    "description": "Cutting-edge strategies for RAG systems.",
+    "url": "https://ai-research.com/rag-strategies",
+    "resultType": "organic",
+    "rank": 1
+  },
+  "metadata": {
+    "canonicalUrl": "https://ai-research.com/rag-strategies",
+    "title": "Advanced RAG Implementation: A Complete Guide | AI Research",
+    "description": "Vector DBs, chunking, and optimization techniques.",
+    "languageCode": "en"
+  },
+  "markdown": "# Advanced RAG Implementation: A Complete Guide\n\n...",
+  "debug": {
+    "extractorUsed": "readableText",
+    "elementsRemoved": 47,
+    "elementsClicked": 3
+  }
+}
+```
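+
+To monitor HTTP status codes, you can sort items by their `crawl` block before using them. The following sketch assumes items are available as Python dicts with the fields shown above. `crawl_report` and the failing example item are hypothetical, and treating any 2xx status with `requestStatus` set to `handled` as success is an assumption.
+
+```python
+def crawl_report(items: list[dict]) -> dict:
+    """Split items into usable URLs and failures, and total the debug counters."""
+    ok, failed = [], []
+    for item in items:
+        crawl = item.get("crawl", {})
+        url = crawl.get("loadedUrl") or crawl.get("uniqueKey", "unknown")
+        status = crawl.get("httpStatusCode")
+        if crawl.get("requestStatus") == "handled" and isinstance(status, int) and 200 <= status < 300:
+            ok.append(url)
+        else:
+            failed.append((url, status))
+    # Only present when debug mode is enabled; missing counters count as 0.
+    removed = sum(item.get("debug", {}).get("elementsRemoved", 0) for item in items)
+    return {"ok": ok, "failed": failed, "elementsRemoved": removed}
+
+
+items = [
+    {
+        "crawl": {"httpStatusCode": 200, "requestStatus": "handled",
+                  "loadedUrl": "https://ai-research.com/rag-strategies"},
+        "debug": {"elementsRemoved": 47},
+    },
+    # Hypothetical blocked page
+    {"crawl": {"httpStatusCode": 403, "uniqueKey": "https://example.com/blocked"}},
+]
+print(crawl_report(items))
+# {'ok': ['https://ai-research.com/rag-strategies'], 'failed': [('https://example.com/blocked', 403)], 'elementsRemoved': 47}
+```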
+
+### Use cases
+
+- Quick information retrieval for AI assistants
+- General web search integration and Q&A
+- Production RAG pipelines that need reliability
+- Extracting content from JavaScript-heavy sites
+- Building specialized knowledge bases and research workflows
+
+### Best practices
+
+- _Search quality_: Use specific keywords and Google operators, and exclude unwanted domains with `-site:`.
+- _Performance_: Use HTTP mode for static sites and switch to browser mode only when a page needs JavaScript rendering. Tune concurrency settings to your workload.
+- _Content quality_: Remove non-content elements, choose the right HTML transformer, and enable debug mode when troubleshooting.
+- _Reliability_: Set appropriate timeouts and retries, and monitor HTTP status codes for errors.
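+
+The `-site:` tip is easy to automate if you assemble queries in code before sending them to Make. Here's a minimal sketch (the `build_query` helper is hypothetical) that reproduces the query from the Standard Settings output example:
+
+```python
+from collections.abc import Sequence
+
+
+def build_query(keywords: str, exclude_domains: Sequence[str] = ()) -> str:
+    """Append a Google `-site:` operator for each domain to exclude."""
+    parts = [keywords.strip()]
+    parts += [f"-site:{domain}" for domain in exclude_domains]
+    return " ".join(parts)
+
+
+print(build_query("web browser for RAG pipelines", ["reddit.com"]))
+# web browser for RAG pipelines -site:reddit.com
+```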
+
+## Other scrapers available
+
+There are other native Make Apps powered by Apify. You can check out Apify Scraper for:
+
+- [TikTok Data](/platform/integrations/make/tiktok)
+- [Google Search](/platform/integrations/make/search)
+- [Google Maps Emails Data](/platform/integrations/make/maps)
+- [YouTube Data](/platform/integrations/make/youtube)
+- [AI crawling](/platform/integrations/make/ai-crawling)
+- [Amazon](/platform/integrations/make/amazon)
+
+And more! You can access thousands of scrapers on Apify Store through the [general Apify connections](https://www.make.com/en/integrations/apify).
diff --git a/sources/platform/integrations/workflows-and-notifications/make/maps.md b/sources/platform/integrations/workflows-and-notifications/make/maps.md
@@ -82,7 +82,7 @@ Categories can be general (e.g., "restaurant") which includes all variations lik
   "rating": 4.6,
   "reviewsCount": 182,
   "featuredInLists": ["Best Chinese Food", "Top Rated Restaurants"],
-  
+
   // Complete address information for targeted outreach
   "address": "175 Main St, Staten Island, NY 10307",
   "neighborhood": "Tottenville",
@@ -92,25 +92,25 @@ Categories can be general (e.g., "restaurant") which includes all variations lik
   "state": "New York",
   "countryCode": "US",
   "plusCode": "GQ62+8M Staten Island, New York",
-  
+
   // Multiple contact channels
   "website": "http://kimsislandsi.com/",
   "phone": "(718) 356-5168",
   "phoneUnformatted": "+17183565168",
   "email": "info@kimsislandsi.com", // From website enrichment
-  
+
   // Business qualification data
   "yearsInBusiness": 12,
   "claimThisBusiness": false, // Verified listing
   "popular": true,
   "temporarilyClosed": false,
-  
+
   // Precise location for territory planning
   "location": {
     "lat": 40.5107736,
     "lng": -74.2482624
   },
-  
+
   // Operational insights for scheduling outreach
   "openingHours": {
     "Monday": "11:00 AM - 10:00 PM",
@@ -185,7 +185,7 @@ This module provides the most flexible options for defining where and how to sea
   "title": "Bluestone Lane Chelsea Piers Café",
   "price": "$20–30",
   "categoryName": "Coffee shop",
-  
+
   // Address and location data
   "address": "62 Chelsea Piers Pier 62, New York, NY 10011",
   "neighborhood": "Manhattan",
@@ -199,17 +199,17 @@ This module provides the most flexible options for defining where and how to sea
     "lng": -74.0087457
   },
   "plusCode": "GQ62+8M Staten Island, New York",
-  
+
   // Contact information
   "website": "https://bluestonelane.com/?y_source=1_MjMwNjk1NDAtNzE1LWxvY2F0aW9uLndlYnNpdGU%3D",
   "phone": "(718) 374-6858",
   "phoneUnformatted": "+17183746858",
-  
+
   // Rating and reviews
   "totalScore": 4.3,
   "reviewsCount": 425,
   "imagesCount": 659,
-  
+
   // Business identifiers
   "claimThisBusiness": false,
   "permanentlyClosed": false,
@@ -218,7 +218,7 @@ This module provides the most flexible options for defining where and how to sea
   "categories": ["Coffee shop", "Cafe"],
   "fid": "0x89c25957cf20350d:0xc0d1df36ed3dc4b6",
   "cid": "13894131752416167094",
-  
+
   // Operating hours
   "openingHours": [
     {"day": "Monday", "hours": "7 AM to 6 PM"},
@@ -229,7 +229,7 @@ This module provides the most flexible options for defining where and how to sea
     {"day": "Saturday", "hours": "7 AM to 6 PM"},
     {"day": "Sunday", "hours": "7 AM to 6 PM"}
   ],
-  
+
   // Business attributes and amenities
   "additionalInfo": {
     "Service options": [
@@ -305,7 +305,7 @@ This module provides the most flexible options for defining where and how to sea
       {"High chairs": true}
     ]
   },
-  
+
   // Image and metadata
   "imageUrl": "https://lh3.googleusercontent.com/p/AF1QipMl6-SnuqYEeE3mD54M0q5D5nysRUZQj1BB0g8=w408-h272-k-no",
   "kgmid": "/g/11ph8zh6sg",
@@ -352,11 +352,10 @@ This module provides the most flexible options for defining where and how to sea
 
 There are other native Make Apps powered by Apify. You can check out Apify Scraper for:
 
-- [Instagram Data](/platform/integrations/make/instagram)
 - [TikTok](/platform/integrations/make/tiktok)
 - [Google Search](/platform/integrations/make/search)
 - [YouTube Data](/platform/integrations/make/youtube)
 - [AI crawling](/platform/integrations/make/ai-crawling)
 - [Amazon](/platform/integrations/make/amazon)
 
-And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
+And more! You can access thousands of scrapers on Apify Store through the [general Apify connections](https://www.make.com/en/integrations/apify).
diff --git a/sources/platform/integrations/workflows-and-notifications/make/search.md b/sources/platform/integrations/workflows-and-notifications/make/search.md
@@ -115,11 +115,10 @@ The scraper exports data in various formats including JSON, CSV, Excel, and XML,
 
 There are other native Make Apps powered by Apify. You can check out Apify Scraper for:
 
-- [Instagram Data](/platform/integrations/make/instagram)
 - [TikTok Data](/platform/integrations/make/tiktok)
 - [Google Maps Emails Data](/platform/integrations/make/maps)
 - [YouTube Data](/platform/integrations/make/youtube)
 - [AI crawling](/platform/integrations/make/ai-crawling)
 - [Amazon Data](/platform/integrations/make/amazon)
 
-And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
+And more! You can access thousands of scrapers on Apify Store through the [general Apify connections](https://www.make.com/en/integrations/apify).
diff --git a/sources/platform/integrations/workflows-and-notifications/make/tiktok.md b/sources/platform/integrations/workflows-and-notifications/make/tiktok.md
@@ -164,11 +164,10 @@ For each TikTok hashtag, you will extract:
 
 There are other native Make Apps powered by Apify. You can check out Apify Scraper for:
 
-- [Instagram Data](/platform/integrations/make/instagram)
 - [Google Search](/platform/integrations/make/search)
 - [Google Maps Emails Data](/platform/integrations/make/maps)
 - [YouTube Data](/platform/integrations/make/youtube)
 - [AI crawling](/platform/integrations/make/ai-crawling)
 - [Amazon](/platform/integrations/make/amazon)
 
-And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
+And more! You can access thousands of scrapers on Apify Store through the [general Apify connections](https://www.make.com/en/integrations/apify).
diff --git a/sources/platform/integrations/workflows-and-notifications/make/youtube.md b/sources/platform/integrations/workflows-and-notifications/make/youtube.md
@@ -221,11 +221,10 @@ For YouTube URLs, you can extract:
 
 There are other native Make Apps powered by Apify. You can check out Apify Scraper for:
 
-- [Instagram Data](/platform/integrations/make/instagram)
 - [TikTok Data](/platform/integrations/make/tiktok)
 - [Google Search](/platform/integrations/make/search)
 - [Google Maps Emails Data](/platform/integrations/make/maps)
 - [AI crawling](/platform/integrations/make/ai-crawling)
 - [Amazon](/platform/integrations/make/amazon)
 
-And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
+And more! You can access thousands of scrapers on Apify Store through the [general Apify connections](https://www.make.com/en/integrations/apify).