docs: add LLM scraper doc #1785

Open
wants to merge 5 commits into
base: master
2 changes: 1 addition & 1 deletion .vale.ini
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
StylesPath = .github/styles
MinAlertLevel = warning
IgnoredScopes = code, tt, table, tr, td
IgnoredScopes = code, tt, table, tr, td, frontmatter

Vocab = Docs

Original file line number Diff line number Diff line change
@@ -143,11 +143,10 @@ In addition to the standard output fields, Advanced Settings provides:

Looking for more than just AI crawling? You can use other native Make apps powered by Apify:

- [Instagram Data](/platform/integrations/make/instagram)
- [TikTok Data](/platform/integrations/make/tiktok)
- [Google Search](/platform/integrations/make/search)
- [Google Maps Emails Data](/platform/integrations/make/maps)
- [YouTube Data](/platform/integrations/make/youtube)
- [Amazon](/platform/integrations/make/amazon)

And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
And more! Because you can access any of thousands of our scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
Original file line number Diff line number Diff line change
@@ -226,12 +226,11 @@ For Amazon URLs, you can extract:

There are other native Make Apps powered by Apify. You can check out Apify Scraper for:

- [Instagram Data](/platform/integrations/make/instagram)
- [TikTok Data](/platform/integrations/make/tiktok)
- [Google Search](/platform/integrations/make/search)
- [Google Maps Emails Data](/platform/integrations/make/maps)
- [YouTube Data](/platform/integrations/make/youtube)
- [AI crawling](/platform/integrations/make/ai-crawling)


And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
And more! Because you can access any of thousands of our scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
Original file line number Diff line number Diff line change
@@ -237,13 +237,12 @@ You’ll get:

Looking for more than just Facebook? You can use other native Make apps powered by Apify:

- [Instagram Data](/platform/integrations/make/instagram)
- [TikTok Data](/platform/integrations/make/tiktok)
- [Google Search](/platform/integrations/make/search)
- [Google Maps Emails Data](/platform/integrations/make/maps)
- [YouTube Data](/platform/integrations/make/youtube)
- [AI crawling](/platform/integrations/make/ai-crawling)
- [Amazon](/platform/integrations/make/amazon)

And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
And more! Because you can access any of thousands of our scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).

Original file line number Diff line number Diff line change
@@ -188,4 +188,4 @@ There are other native Make Apps powered by Apify. You can check out Apify Scrap
- [AI crawling](/platform/integrations/make/ai-crawling)
- [Amazon](/platform/integrations/make/amazon)

And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
And more! Because you can access any of thousands of our scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
Original file line number Diff line number Diff line change
@@ -0,0 +1,182 @@
---
title: Make - LLMs Actor integration
description: Learn about LLM browser modules. Search the web and extract clean Markdown for AI assistants and RAG.
sidebar_label: LLMs
sidebar_position: 7
slug: /integrations/make/llm
toc_max_heading_level: 4
---

## Apify Scraper for LLMs

Apify Scraper for LLMs from [Apify](https://apify.com) is a web browsing module for OpenAI Assistants, RAG pipelines, and AI agents. It can query Google Search, scrape the top results, and return page content as Markdown for downstream AI processing.

To use these modules, you need an [Apify account](https://console.apify.com) and an [API token](https://docs.apify.com/platform/integrations/api#api-token). You can find your token in the Apify Console under **Settings > API & Integrations**. After connecting, you can automate content extraction and integrate results into your AI workflows.

## Connect Apify Scraper for LLMs

1. Create an account at [Apify](https://console.apify.com/). You can sign up using your email, Gmail, or GitHub account.

![Apify Console sign-up page with email, Gmail, and GitHub sign-up options](images/llm/rag-signup.png)

1. To connect your Apify account to Make, you can use an OAuth connection (recommended) or an Apify API token. To get the token, go to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in the Apify Console.

![Apify Console Settings page showing Personal API tokens section with token management options](images/Apify_Console_token_for_Make.png)

1. Find your token under **Personal API tokens**. You can also create a new token with custom permissions by clicking **+ Create a new token**.
1. Click the **Copy** icon to copy your API token, then return to your Make scenario.

![Make interface showing API token field and connection name field for Apify integration setup](images/Apify_token_on_Make.png)

1. In Make, click **Add** to open the **Create a connection** dialog of the chosen Apify Scraper module.
1. In the **API token** field, paste your token, provide a clear **Connection name**, and click **Save**.

![Make connection dialog with completed API token and connection name fields for Apify Scraper module](images/llm/apify-token-for-module-on-make.png)

Once connected, you can build workflows that search the web, extract content, and pass it to your AI applications.

## Apify Scraper for LLMs modules

After connecting the app, you can use two modules to search and extract content.

### Standard Settings module

Use Standard Settings to quickly search the web and extract content with optimized defaults. This is ideal for AI agents that need to answer questions or gather information from multiple sources.

The module supports two modes:

- _Search mode_ (keywords)
    - Queries Google Search with your keywords (supports advanced operators)
    - Retrieves the top N organic results
    - Loads each result and extracts the main content
    - Returns Markdown-formatted content

- _Direct URL mode_ (URL)
    - Navigates to a specific URL
    - Extracts page content
    - Skips Google Search

#### How it works

When you provide keywords, the module runs Google Search, parses the results, and collects organic URLs. For content extraction, it loads pages, waits for dynamic content to render, removes clutter, extracts the main content, and converts it to Markdown. Finally, it generates output by combining content, adding metadata and sources, and formatting everything for AI consumption.

#### Output data

```json title="Standard Settings output (shortened)"
{
    "query": "web browser for RAG pipelines -site:reddit.com",
    "crawl": {
        "httpStatusCode": 200,
        "httpStatusMessage": "OK",
        "loadedAt": "2025-06-30T10:15:23.456Z",
        "uniqueKey": "https://example.com/article",
        "requestStatus": "handled"
    },
    "searchResult": {
        "title": "Building RAG Pipelines with Web Browsers",
        "description": "Integrate web browsing into your RAG pipeline for real-time retrieval.",
        "url": "https://example.com/article",
        "resultType": "organic",
        "rank": 1
    },
    "metadata": {
        "title": "Building RAG Pipelines with Web Browsers",
        "description": "Add web browsing to RAG systems",
        "languageCode": "en",
        "url": "https://example.com/article"
    },
    "markdown": "# Building RAG Pipelines with Web Browsers\n\n..."
}
```
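
As a rough illustration of how a downstream step might consume an item shaped like the one above, the following Python sketch turns it into a source-attributed snippet for an LLM prompt. The helper name `to_context`, the `max_chars` limit, and the snippet layout are assumptions made for this example; they are not part of the module.

```python
def to_context(item: dict, max_chars: int = 2000) -> str:
    """Format one output item as a source-attributed snippet (hypothetical helper)."""
    meta = item.get("metadata", {})
    search = item.get("searchResult", {})
    # Prefer page metadata, fall back to the search result fields.
    title = meta.get("title") or search.get("title") or "Untitled"
    url = meta.get("url") or search.get("url") or ""
    # Truncate the Markdown so a single page cannot fill the whole prompt.
    body = (item.get("markdown") or "")[:max_chars]
    return f"Source: {title} ({url})\n\n{body}"


# Sample item trimmed from the output shown above.
item = {
    "searchResult": {
        "title": "Building RAG Pipelines with Web Browsers",
        "url": "https://example.com/article",
        "rank": 1,
    },
    "metadata": {
        "title": "Building RAG Pipelines with Web Browsers",
        "url": "https://example.com/article",
    },
    "markdown": "# Building RAG Pipelines with Web Browsers\n\n...",
}

print(to_context(item))
```

Falling back from `metadata` to `searchResult` keeps the helper usable for items where only one of the two objects carries a URL.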

#### Configuration (Standard Settings)

- _Search query_: Google Search keywords or a direct URL
- _Maximum results_: Number of top search results to process (default: 3)
- _Output formats_: Markdown, text, or HTML
- _Remove cookie warnings_: Dismiss cookie consent dialogs
- _Debug mode_: Enable extraction diagnostics
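
Because the _Search query_ field accepts either keywords or a URL, a scenario that builds this value dynamically may want to predict which mode it will trigger. The check below is only a guess at a reasonable rule (a value with an `http` or `https` scheme and a host is treated as a URL); the module's actual detection logic is not described here, and `guess_mode` is a hypothetical helper.

```python
from urllib.parse import urlparse


def guess_mode(search_query: str) -> str:
    """Return "url" for absolute http(s) URLs, otherwise "search" (assumed rule)."""
    parsed = urlparse(search_query.strip())
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    return "search"


for value in ("https://example.com/article",
              "web browser for RAG pipelines -site:reddit.com"):
    print(value, "->", guess_mode(value))
```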

### Advanced Settings module

The Advanced Settings module gives you full control over search and extraction. Use it for complex sites or production RAG pipelines.

#### Key features

- _Advanced search options_: full Google operator support
- _Flexible crawling tools_: browser-based (Playwright) or HTTP-based (Cheerio)
- _Proxy configuration_: handle geo-restrictions and rate limits
- _Granular content control_: include, remove, and click selectors
- _Dynamic content handling_: wait strategies for JavaScript rendering
- _Multiple output formats_: Markdown, HTML, or text
- _Request management_: timeouts, retries, and concurrency

#### Configuration options

- _Search_: query, max results (1–100), SERP proxy group, SERP retries
- _Scraping_: tool (browser-playwright, raw-http), HTML transformer, selectors (remove/keep/click), expand clickable elements
- _Requests_: timeouts, retries, dynamic content wait
- _Proxy_: use Apify Proxy, proxy groups, countries
- _Output_: formats, save HTML/Markdown, debug mode, save screenshots

#### Output data

```json title="Advanced Settings output (shortened)"
{
    "query": "advanced RAG implementation strategies",
    "crawl": {
        "httpStatusCode": 200,
        "httpStatusMessage": "OK",
        "loadedUrl": "https://ai-research.com/rag-strategies",
        "loadedTime": "2025-06-30T10:45:12.789Z",
        "referrerUrl": "https://www.google.com/search?q=advanced+RAG+implementation+strategies",
        "uniqueKey": "https://ai-research.com/rag-strategies",
        "requestStatus": "handled",
        "depth": 0
    },
    "searchResult": {
        "title": "Advanced RAG Implementation: A Complete Guide",
        "description": "Cutting-edge strategies for RAG systems.",
        "url": "https://ai-research.com/rag-strategies",
        "resultType": "organic",
        "rank": 1
    },
    "metadata": {
        "canonicalUrl": "https://ai-research.com/rag-strategies",
        "title": "Advanced RAG Implementation: A Complete Guide | AI Research",
        "description": "Vector DBs, chunking, and optimization techniques.",
        "languageCode": "en"
    },
    "markdown": "# Advanced RAG Implementation: A Complete Guide\n\n...",
    "debug": {
        "extractorUsed": "readableText",
        "elementsRemoved": 47,
        "elementsClicked": 3
    }
}
```
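
One way to act on the `crawl` object before passing content to an AI step is to separate pages that loaded cleanly from those that did not. The sketch below assumes that a usable item has `httpStatusCode` 200 and `requestStatus` `"handled"`, as in the sample above; other status values the module may emit are not covered here, and both helper names are hypothetical.

```python
def is_usable(item: dict) -> bool:
    """True when the crawl looks successful (assumed criteria, see lead-in)."""
    crawl = item.get("crawl", {})
    return (crawl.get("httpStatusCode") == 200
            and crawl.get("requestStatus") == "handled")


def split_results(items: list) -> tuple:
    """Partition items into (usable, failed) lists, preserving order."""
    usable = [item for item in items if is_usable(item)]
    failed = [item for item in items if not is_usable(item)]
    return usable, failed


items = [
    {"crawl": {"httpStatusCode": 200, "requestStatus": "handled"},
     "markdown": "# Advanced RAG Implementation: A Complete Guide\n\n..."},
    {"crawl": {"httpStatusCode": 503, "requestStatus": "handled"},
     "markdown": ""},
]

usable, failed = split_results(items)
print(len(usable), "usable,", len(failed), "failed")
```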

### Use cases

- Quick information retrieval for AI assistants
- General web search integration and Q&A
- Production RAG pipelines that need reliability
- Extracting content from JavaScript-heavy sites
- Building specialized knowledge bases and research workflows

### Best practices

To get the best search results, use specific keywords and operators, and exclude unwanted domains with `-site:`. For better performance, use HTTP mode for static sites and only switch to browser mode when necessary. You can also tune concurrency settings based on your needs. To maintain content quality, remove non-content elements, choose the right HTML transformer, and enable debug mode when troubleshooting. Finally, ensure reliable operation by setting appropriate timeouts and retries, and monitoring HTTP status codes for errors.
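
The `-site:` advice can be applied mechanically when queries are assembled in code. This is a minimal sketch; `build_query` and its parameters are hypothetical names chosen for the example, and the result is simply a string you would place in the search query field.

```python
from typing import Sequence


def build_query(keywords: str, excluded_domains: Sequence[str] = ()) -> str:
    """Append one -site: operator per excluded domain to the keywords."""
    parts = [keywords.strip()]
    for domain in excluded_domains:
        parts.append(f"-site:{domain}")
    return " ".join(parts)


print(build_query("web browser for RAG pipelines", ["reddit.com"]))
```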

## Other scrapers available

There are other native Make Apps powered by Apify. You can check out Apify Scraper for:

- [TikTok Data](/platform/integrations/make/tiktok)
- [Google Search](/platform/integrations/make/search)
- [Google Maps Emails Data](/platform/integrations/make/maps)
- [YouTube Data](/platform/integrations/make/youtube)
- [AI crawling](/platform/integrations/make/ai-crawling)
- [Amazon](/platform/integrations/make/amazon)

And more! Because you can access any of thousands of our scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
Original file line number Diff line number Diff line change
@@ -82,7 +82,7 @@ Categories can be general (e.g., "restaurant") which includes all variations lik
"rating": 4.6,
"reviewsCount": 182,
"featuredInLists": ["Best Chinese Food", "Top Rated Restaurants"],

// Complete address information for targeted outreach
"address": "175 Main St, Staten Island, NY 10307",
"neighborhood": "Tottenville",
@@ -92,25 +92,25 @@ Categories can be general (e.g., "restaurant") which includes all variations lik
"state": "New York",
"countryCode": "US",
"plusCode": "GQ62+8M Staten Island, New York",

// Multiple contact channels
"website": "http://kimsislandsi.com/",
"phone": "(718) 356-5168",
"phoneUnformatted": "+17183565168",
"email": "[email protected]", // From website enrichment

// Business qualification data
"yearsInBusiness": 12,
"claimThisBusiness": false, // Verified listing
"popular": true,
"temporarilyClosed": false,

// Precise location for territory planning
"location": {
"lat": 40.5107736,
"lng": -74.2482624
},

// Operational insights for scheduling outreach
"openingHours": {
"Monday": "11:00 AM - 10:00 PM",
@@ -185,7 +185,7 @@ This module provides the most flexible options for defining where and how to sea
"title": "Bluestone Lane Chelsea Piers Café",
"price": "$20–30",
"categoryName": "Coffee shop",

// Address and location data
"address": "62 Chelsea Piers Pier 62, New York, NY 10011",
"neighborhood": "Manhattan",
@@ -199,17 +199,17 @@ This module provides the most flexible options for defining where and how to sea
"lng": -74.0087457
},
"plusCode": "GQ62+8M Staten Island, New York",

// Contact information
"website": "https://bluestonelane.com/?y_source=1_MjMwNjk1NDAtNzE1LWxvY2F0aW9uLndlYnNpdGU%3D",
"phone": "(718) 374-6858",
"phoneUnformatted": "+17183746858",

// Rating and reviews
"totalScore": 4.3,
"reviewsCount": 425,
"imagesCount": 659,

// Business identifiers
"claimThisBusiness": false,
"permanentlyClosed": false,
@@ -218,7 +218,7 @@ This module provides the most flexible options for defining where and how to sea
"categories": ["Coffee shop", "Cafe"],
"fid": "0x89c25957cf20350d:0xc0d1df36ed3dc4b6",
"cid": "13894131752416167094",

// Operating hours
"openingHours": [
{"day": "Monday", "hours": "7 AM to 6 PM"},
@@ -229,7 +229,7 @@ This module provides the most flexible options for defining where and how to sea
{"day": "Saturday", "hours": "7 AM to 6 PM"},
{"day": "Sunday", "hours": "7 AM to 6 PM"}
],

// Business attributes and amenities
"additionalInfo": {
"Service options": [
@@ -305,7 +305,7 @@ This module provides the most flexible options for defining where and how to sea
{"High chairs": true}
]
},

// Image and metadata
"imageUrl": "https://lh3.googleusercontent.com/p/AF1QipMl6-SnuqYEeE3mD54M0q5D5nysRUZQj1BB0g8=w408-h272-k-no",
"kgmid": "/g/11ph8zh6sg",
@@ -352,11 +352,10 @@ This module provides the most flexible options for defining where and how to sea

There are other native Make Apps powered by Apify. You can check out Apify Scraper for:

- [Instagram Data](/platform/integrations/make/instagram)
- [TikTok](/platform/integrations/make/tiktok)
- [Google Search](/platform/integrations/make/search)
- [YouTube Data](/platform/integrations/make/youtube)
- [AI crawling](/platform/integrations/make/ai-crawling)
- [Amazon](/platform/integrations/make/amazon)

And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
And more! Because you can access any of thousands of our scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
Original file line number Diff line number Diff line change
@@ -115,11 +115,10 @@ The scraper exports data in various formats including JSON, CSV, Excel, and XML,

There are other native Make Apps powered by Apify. You can check out Apify Scraper for:

- [Instagram Data](/platform/integrations/make/instagram)
- [TikTok Data](/platform/integrations/make/tiktok)
- [Google Maps Emails Data](/platform/integrations/make/maps)
- [YouTube Data](/platform/integrations/make/youtube)
- [AI crawling](/platform/integrations/make/ai-crawling)
- [Amazon Data](/platform/integrations/make/amazon)

And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
And more! Because you can access any of thousands of our scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
Original file line number Diff line number Diff line change
@@ -164,11 +164,10 @@ For each TikTok hashtag, you will extract:

There are other native Make Apps powered by Apify. You can check out Apify Scraper for:

- [Instagram Data](/platform/integrations/make/instagram)
- [Google Search](/platform/integrations/make/search)
- [Google Maps Emails Data](/platform/integrations/make/maps)
- [YouTube Data](/platform/integrations/make/youtube)
- [AI crawling](/platform/integrations/make/ai-crawling)
- [Amazon](/platform/integrations/make/amazon)

And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
And more! Because you can access any of thousands of our scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
Original file line number Diff line number Diff line change
@@ -221,11 +221,10 @@ For YouTube URLs, you can extract:

There are other native Make Apps powered by Apify. You can check out Apify Scraper for:

- [Instagram Data](/platform/integrations/make/instagram)
- [TikTok Data](/platform/integrations/make/tiktok)
- [Google Search](/platform/integrations/make/search)
- [Google Maps Emails Data](/platform/integrations/make/maps)
- [AI crawling](/platform/integrations/make/ai-crawling)
- [Amazon](/platform/integrations/make/amazon)

And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
And more! Because you can access any of thousands of our scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).