
Commit 77a1c09

docs: add missing docs
add docs for maps & ai crawling, add links to other modules at the end, add titles to code blocks
1 parent 8b07a0a commit 77a1c09

12 files changed (+564 additions, -75 deletions)
Lines changed: 152 additions & 0 deletions
@@ -0,0 +1,152 @@
---
title: Make - AI crawling Actor integration
description: Learn about AI Crawling scraper modules.
sidebar_label: AI Crawling
sidebar_position: 6
slug: /integrations/make/ai-crawling
toc_max_heading_level: 4
---

## Apify Scraper for AI Crawling

Apify Scraper for AI Crawling from [Apify](https://apify.com/) lets you extract text content from websites to feed AI models, LLM applications, vector databases, or Retrieval Augmented Generation (RAG) pipelines. It supports rich formatting using Markdown, cleans the HTML of irrelevant elements, downloads linked files, and integrates with AI ecosystems like LangChain, LlamaIndex, and other LLM frameworks.

To use these modules, you need an [Apify account](https://console.apify.com) and an [API token](https://docs.apify.com/platform/integrations/api#api-token). You can find your token in the [Apify Console](https://console.apify.com/) under **Settings > Integrations**. After connecting, you can automate content extraction at scale and incorporate the results into your AI workflows.
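Outside of Make, you can sanity-check the same token directly against the Apify API. A minimal sketch, assuming the token is exported as an `APIFY_TOKEN` environment variable (the variable name is our choice; the endpoint is the Apify API's `/v2/users/me`):

```python
import os
import urllib.request

# Assumes the token copied from the Apify Console is exported as APIFY_TOKEN.
token = os.environ.get("APIFY_TOKEN", "<YOUR_API_TOKEN>")

# The Apify API accepts the token as a Bearer authorization header.
req = urllib.request.Request(
    "https://api.apify.com/v2/users/me",
    headers={"Authorization": f"Bearer {token}"},
)

# urllib.request.urlopen(req) would return your account details as JSON
# if the token is valid.
```

If the request succeeds, the same token will work in the Make connection dialog.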
15+
## Connect Apify Scraper for AI Crawling

1. Create an account at [Apify](https://console.apify.com/). You can sign up using your email, Gmail, or GitHub account.

   ![Sign up page](images/ai-crawling/image.png)

1. To connect your Apify account with Make, you need your Apify API token. In the Apify Console, navigate to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)**.

   ![Apify Console token for Make](images/Apify_Console_token_for_Make.png)

1. Find your token in the **Personal API tokens** section. You can also create a new API token with multiple customizable permissions by clicking **+ Create a new token**.

1. Click the **Copy** icon next to your API token to copy it to your clipboard. Then, return to your Make scenario interface.

   ![Apify token on Make](images/Apify_token_on_Make.png)

1. In Make, click **Add** to open the **Create a connection** dialog of the chosen Apify Scraper module.

1. In the **API token** field, paste the API token you copied from Apify. Provide a clear **Connection name**, and click **Save**.

   ![Make API token](images/ai-crawling/image%201.png)

Once connected, you can build workflows to automate website extraction and integrate results into your AI applications.
37+
## Apify Scraper for Website Content modules

After connecting the app, you can use one of the two modules as native scrapers to extract website content.

### Standard Settings Module

The Standard Settings module is a streamlined component of the Website Content Crawler that allows you to quickly extract content from websites using optimized default settings. This module is perfect for extracting content from blogs, documentation sites, knowledge bases, or any text-rich website to feed into AI models.

#### How it works
The crawler starts with one or more **Start URLs** you provide, typically the top-level URL of a documentation site, blog, or knowledge base. It then:

- Crawls these start URLs
- Finds links to other pages on the site
- Recursively crawls those pages as long as their URL is under the start URL
- Respects URL patterns for inclusion/exclusion
- Automatically skips duplicate pages with the same canonical URL
- Provides various settings to customize crawling behavior (crawler type, max pages, depth, concurrency, etc.)
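The scoping and deduplication rules above can be sketched in a few lines. This is an illustrative approximation, not the Actor's actual implementation, and the function names are hypothetical:

```python
from urllib.parse import urlparse

def in_scope(start_url: str, link: str) -> bool:
    """A found link is crawled only if its URL is 'under' the start URL."""
    start, found = urlparse(start_url), urlparse(link)
    return found.netloc == start.netloc and found.path.startswith(start.path)

def dedupe(pages: list[dict]) -> list[dict]:
    """Skip duplicate pages by canonical URL, falling back to the loaded URL."""
    seen, unique = set(), []
    for page in pages:
        canonical = page.get("canonicalUrl") or page["url"]
        if canonical not in seen:
            seen.add(canonical)
            unique.append(page)
    return unique
```

With a start URL of `https://docs.apify.com/academy`, a link to `/academy/web-scraping` is in scope while `/platform` is not, and two pages sharing one canonical URL yield a single result.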
Once a web page is loaded, the Actor processes its HTML to ensure quality content extraction:

- Waits for dynamic content to load if using a headless browser
- Can scroll to a certain height to ensure all page content is loaded
- Can expand clickable elements to reveal hidden content
- Removes DOM nodes matching specific CSS selectors (like navigation, headers, footers)
- Optionally keeps only content matching specific CSS selectors
- Removes cookie warnings using browser extensions
- Transforms the page using the selected HTML transformer to extract the main content
#### Output data

For each crawled web page, you'll receive:

- _Page metadata_: URL, title, description, canonical URL
- _Cleaned text content_: The main article content with irrelevant elements removed
- _Markdown formatting_: Structured content with headers, lists, links, and other formatting preserved
- _Crawl information_: Loaded URL, referrer URL, timestamp, HTTP status
- _Optional file downloads_: PDFs, DOCs, and other linked documents

```json title="Sample output (shortened)"
{
  "url": "https://docs.apify.com/academy/web-scraping-for-beginners",
  "crawl": {
    "loadedUrl": "https://docs.apify.com/academy/web-scraping-for-beginners",
    "loadedTime": "2025-04-22T14:33:20.514Z",
    "referrerUrl": "https://docs.apify.com/academy",
    "depth": 1,
    "httpStatusCode": 200
  },
  "metadata": {
    "canonicalUrl": "https://docs.apify.com/academy/web-scraping-for-beginners",
    "title": "Web scraping for beginners | Apify Documentation",
    "description": "Learn the basics of web scraping with a step-by-step tutorial and practical exercises.",
    "languageCode": "en",
    "markdown": "# Web scraping for beginners\n\nWelcome to our comprehensive web scraping tutorial for beginners. This guide will take you through the fundamentals of extracting data from websites, with practical examples and exercises.\n\n## What is web scraping?\n\nWeb scraping is the process of extracting data from websites. It involves making HTTP requests to web servers, downloading HTML pages, and parsing them to extract the desired information.\n\n## Why learn web scraping?\n\n- **Data collection**: Gather information for research, analysis, or business intelligence\n- **Automation**: Save time by automating repetitive data collection tasks\n- **Integration**: Connect web data with your applications or databases\n- **Monitoring**: Track changes on websites automatically\n\n## Getting started\n\nTo begin web scraping, you'll need to understand the basics of HTML, CSS selectors, and HTTP. This tutorial will guide you through these concepts step by step.\n\n...",
    "text": "Web scraping for beginners\n\nWelcome to our comprehensive web scraping tutorial for beginners. This guide will take you through the fundamentals of extracting data from websites, with practical examples and exercises.\n\nWhat is web scraping?\n\nWeb scraping is the process of extracting data from websites. It involves making HTTP requests to web servers, downloading HTML pages, and parsing them to extract the desired information.\n\nWhy learn web scraping?\n\n- Data collection: Gather information for research, analysis, or business intelligence\n- Automation: Save time by automating repetitive data collection tasks\n- Integration: Connect web data with your applications or databases\n- Monitoring: Track changes on websites automatically\n\nGetting started\n\nTo begin web scraping, you'll need to understand the basics of HTML, CSS selectors, and HTTP. This tutorial will guide you through these concepts step by step.\n\n..."
  }
}
```
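Records in this shape can be fed straight into an LLM or RAG pipeline. Below is a hedged sketch of splitting the `markdown` field into heading-based chunks for a vector store; the `chunk_markdown` helper is illustrative only, not part of any Apify SDK:

```python
import json

# A trimmed record in the shape shown above.
record = json.loads("""
{
  "url": "https://docs.apify.com/academy/web-scraping-for-beginners",
  "metadata": {
    "title": "Web scraping for beginners | Apify Documentation",
    "markdown": "# Web scraping for beginners\\n\\nWelcome to the tutorial.\\n\\n## What is web scraping?\\n\\nWeb scraping is the process of extracting data from websites."
  }
}
""")

def chunk_markdown(md: str) -> list[str]:
    """Split Markdown into chunks, starting a new chunk at every heading."""
    chunks, current = [], []
    for line in md.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return chunks

# Each chunk keeps a pointer back to its source page.
docs = [
    {"source": record["url"], "title": record["metadata"]["title"], "text": chunk}
    for chunk in chunk_markdown(record["metadata"]["markdown"])
]
```

Heading-based splitting is a simple default; production pipelines often add overlap or token-length limits on top.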
### Advanced Settings Module

The Advanced Settings module provides complete control over the content extraction process, allowing you to fine-tune every aspect of the crawling and transformation pipeline. This module is ideal for complex websites, JavaScript-heavy applications, or when you need precise control over content extraction.

#### Key features

- _Multiple Crawler Options_: Choose between headless browsers (Playwright) or faster HTTP clients (Cheerio)
- _Custom Content Selection_: Specify exactly which elements to keep or remove
- _Advanced Navigation Control_: Set crawling depth, scope, and URL patterns
- _Dynamic Content Handling_: Wait for JavaScript-rendered content to load
- _Interactive Element Support_: Click expandable sections to reveal hidden content
- _Multiple Output Formats_: Save content as Markdown, HTML, or plain text
- _Proxy Configuration_: Use proxies to handle geo-restrictions or avoid IP blocks
- _Content Transformation Options_: Multiple algorithms for optimal content extraction

#### How it works

The Advanced Settings module provides granular control over the entire crawling process:

1. _Crawler Selection_: Choose Playwright (Firefox/Chrome) or Cheerio based on website complexity
2. _URL Management_: Define precise scoping with include/exclude URL patterns
3. _DOM Manipulation_: Control which HTML elements to keep or remove
4. _Content Transformation_: Apply specialized algorithms for content extraction
5. _Output Formatting_: Select from multiple formats for AI model compatibility
#### Configuration options

Advanced Settings offers numerous configuration options, including:

- _Crawler Type_: Select the rendering engine (browser or HTTP client)
- _Content Extraction Algorithm_: Choose from multiple HTML transformers
- _Element Selectors_: Specify which elements to keep, remove, or click
- _URL Patterns_: Define URL inclusion/exclusion patterns with glob syntax
- _Crawling Parameters_: Set concurrency, depth, timeouts, and retries
- _Proxy Configuration_: Configure proxy settings for robust crawling
- _Output Options_: Select content formats and storage options
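The glob-based URL patterns behave like shell wildcards. A rough sketch of how include/exclude patterns filter candidate URLs, using Python's `fnmatch` as an approximation (it does not distinguish `*` from `**`; the patterns themselves are examples, not defaults):

```python
from fnmatch import fnmatch

# Hypothetical patterns: crawl the Academy docs, but skip linked PDFs.
INCLUDE_GLOBS = ["https://docs.apify.com/academy/**"]
EXCLUDE_GLOBS = ["**/*.pdf"]

def should_crawl(url: str) -> bool:
    """A URL is crawled if it matches an include glob and no exclude glob."""
    included = any(fnmatch(url, g) for g in INCLUDE_GLOBS)
    excluded = any(fnmatch(url, g) for g in EXCLUDE_GLOBS)
    return included and not excluded
```

Under these example patterns, `/academy/web-scraping` would be crawled, while `/platform/actors` and any `.pdf` link would be skipped.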
#### Output data

In addition to the standard output fields, Advanced Settings provides:

- _Multiple Format Options_: Content in Markdown, HTML, or plain text
- _Debug Information_: Detailed extraction diagnostics and snapshots
- _HTML Transformations_: Results from different content extraction algorithms
- _File Storage Options_: Flexible storage for HTML, screenshots, or downloaded files
Looking for more than just AI crawling? You can use other native Make apps powered by Apify:

- [Instagram Data](platform/integrations/make/instagram)
- [TikTok Data](platform/integrations/make/tiktok)
- [Google Maps Emails Data](platform/integrations/make/maps)
- [YouTube Data](platform/integrations/make/youtube)
- [Amazon](platform/integrations/make/amazon)

And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).

sources/platform/integrations/workflows-and-notifications/make/amazon.md

Lines changed: 11 additions & 17 deletions
@@ -2,7 +2,7 @@
 title: Make - Amazon Actor integration
 description: Learn about Amazon scraper modules, extract product, search, or category data from Amazon.
 sidebar_label: Amazon
-sidebar_position: 4
+sidebar_position: 6
 slug: /integrations/make/amazon
 ---
 
@@ -52,7 +52,7 @@ For Amazon URLs, you can extract:
 - _Description_
 - _Price value and currency_
 
-```json
+```json title="Example"
 [
   {
     "title": "Logitech M185 Wireless Mouse, 2.4GHz with USB Mini Receiver, 12-Month Battery Life, 1000 DPI Optical Tracking, Ambidextrous PC/Mac/Laptop - Swift Grey",
@@ -102,9 +102,7 @@ For Amazon URLs, you can extract:
 ]
 ```
 
-Search data sample:
-
-```json
+```json title="Search data sample"
 [
   {
     "title": "Logitech MK270 Wireless Keyboard And Mouse Combo For Windows, 2.4 GHz Wireless, Compact Mouse, 8 Multimedia And Shortcut Keys, For PC, Laptop - Black",
@@ -154,9 +152,7 @@ Search data sample:
 ]
 ```
 
-Product data sample:
-
-```json
+```json title="Product data sample"
 [
   {
     "title": "Amazon Basics Wired Keyboard, Full-Sized, QWERTY Layout, Black",
@@ -176,9 +172,7 @@ Product data sample:
 ]
 ```
 
-Category data sample:
-
-```json
+```json title="Category data sample"
 [
   {
     "title": "Logitech M185 Wireless Mouse, 2.4GHz with USB Mini Receiver, 12-Month Battery Life, 1000 DPI Optical Tracking, Ambidextrous PC/Mac/Laptop - Swift Grey",
@@ -232,11 +226,11 @@ Category data sample:
 
 There are other native Make Apps powered by Apify. You can check out Apify Scraper for:
 
-- Instagram Data
-- TikTok Data
-- Facebook Data
-- Google Search Data
-- Google Maps Emails Data
-- YouTube Data
+- [Instagram Data](platform/integrations/make/instagram)
+- [TikTok Data](platform/integrations/make/tiktok)
+- [Google Maps Emails Data](platform/integrations/make/maps)
+- [YouTube Data](platform/integrations/make/youtube)
+- [AI crawling](platform/integrations/make/ai-crawling)
+
 
 And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).

sources/platform/integrations/workflows-and-notifications/make/facebook.md

Lines changed: 7 additions & 12 deletions
@@ -2,15 +2,16 @@
 title: Make - Facebook Actor integration
 description: Learn about Facebook scraper modules, extract posts, comments, and profile data from Facebook.
 sidebar_label: Facebook
-sidebar_position: 5
+sidebar_position: 8
 slug: /integrations/make/facebook
+unlisted: true
 ---
 
 ## Apify Scraper for Facebook Data
 
-The Facebook Scraper modules from apify.com allow you to extract posts, comments, and profile data from Facebook.
+The Facebook Scraper modules from [Apify](https://apify.com/) allow you to extract posts, comments, and profile data from Facebook.
 
-To use these modules, you need an Apify account and an API token, which you can find under Settings > Integrations in Apify Console. After connecting, you can automate data extraction and incorporate the results into your workflows.
+To use these modules, you need an [Apify account](https://console.apify.com) and an [API token](https://docs.apify.com/platform/integrations/api#api-token). You can find your token in the [Apify Console](https://console.apify.com/) under **Settings > Integrations**. After connecting, you can automate data extraction and incorporate the results into your workflows.
 
 ## Connect Apify Scraper for Facebook Data modules to Make
 
@@ -58,9 +59,7 @@ For each given Facebook group URL, you will extract:
 - _Attachments_: media set URL, image thumbnail, full image URL, dimensions, OCR text (if any), media ID, and owner ID.
 - _Top comments_: comment ID, comment URL, timestamp, text, feedback ID, commenter ID and name, profile picture, likes count, and threading depth.
 
-Profile data, shortened sample:
-
-```json
+```json title="Profile data, shortened sample"
 [
   {
     "facebookUrl": "https://www.facebook.com/groups/WeirdSecondhandFinds",
@@ -118,9 +117,7 @@ Features like _replies_ and _comment sorting_ are limited for users on Apify's F
 
 :::
 
-Example (shortened):
-
-```json
+```json title="Example (shortened)"
 [
   {
     "facebookUrl": "https://www.facebook.com/NASAJSC/posts/pfbid0ohxEG5cJnm3JNFodkvsehRUY3yfLx5Vis8cude7xRdmrXV9EMDxsuScPaSCtX9KNl?locale=cs_CZ", "commentUrl": "https://www.facebook.com/NASAJSC/posts/pfbid0ohxEG5cJnm3JNFodkvsehRUY3yfLx5Vis8cude7xRdmrXV9EMDxsuScPaSCtX9KNl?comment_id=2386082985122451",
@@ -163,9 +160,7 @@ You’ll get:
 - _Tags_: Hashtags used in the post
 - _Location_: Geographic location tagged in the post (if available)
 
-Example (shortened):
-
-```json
+```json title="Example (shortened)"
 [
   {
     "facebookUrl": "https://www.facebook.com/nasa",
Binary image files changed (712 KB, 202 KB, 584 KB, 202 KB; not rendered here).

sources/platform/integrations/workflows-and-notifications/make/instagram.md

Lines changed: 8 additions & 15 deletions
@@ -57,9 +57,7 @@ For each Instagram profile, you will extract:
 - _Content information_: number of IGTV videos and highlight reels.
 - _Related profiles_: suggested accounts, including their username, full name, profile picture URL, and verification status.
 
-Profile data, shortened sample:
-
-```json
+```json title="Profile data, shortened sample"
 [
   {
     "fullName": "NASA",
@@ -106,9 +104,7 @@ Features like _replies_ and _newest comments first_ are limited for users on Api
 
 :::
 
-Comment data, shortened sample:
-
-```json
+```json title="Comment data, shortened sample"
 [
   {
     "text": "So beautiful 🥲🥹✨",
@@ -146,9 +142,7 @@ For each Instagram post, you will extract:
 - _User information_: owner’s username, full name (if available), and user ID.
 - _Additional data_: tagged users, child posts (for carousel posts), and location details (if available).
 
-Post data, shortened sample:
-
-```json
+```json title="Post data, shortened sample"
 [
   {
     "caption": "A supernova glowing in the dark 🌟⁣\n ⁣\nWhen supernova remnant SN 1006 first appeared in the sky in 1006 C.E., it was far brighter than Venus and visible during the daytime for weeks. From that moment on, it occupied the hearts of astronomers all over the world; it has been studied from the ground and from space many times.⁣\n ⁣\nIn this image, visible, radio, and X-ray data combine to give us that blue (and red) view of the remnant’s full shell – the debris field that was created when a white dwarf star exploded and sent material hurtling into space.⁣\n ⁣\nScientists believe SN 1006 is a Type Ia supernova. This class of supernova is caused when a white dwarf never lets another star go: either it pulls too much mass from a companion star and explodes, or it merges with another white dwarf and explodes. Understanding Type Ia supernovas is especially important because astronomers use observations of these explosions in distant galaxies as mileposts to mark the expansion of the universe.⁣\n ⁣\nImage description: This supernova remnant looks like a bubble filled with blue and red clouds of dust and gas, floating amid a million stars. These stars are visible all around the bubble and even can be seen peeking through it.⁣\n ⁣\nCredit: NASA, ESA, and Z. Levay (STScI)⁣\n ⁣\n#NASA #Supernova #Stars #IVE #Astronomy #Hubble #Chandra #Clouds #아이브 #SupernovaLove #DavidGuetta",
@@ -187,11 +181,10 @@ Post data, shortened sample:
 
 There are other native Make Apps powered by Apify. You can check out Apify Scraper for:
 
-- TikTok Data
-- YouTube Data
-- Facebook Data
-- Google Search Data
-- Google Maps Emails Data
-- Amazon Data
+- [TikTok Data](platform/integrations/make/tiktok)
+- [Google Maps Emails Data](platform/integrations/make/maps)
+- [YouTube Data](platform/integrations/make/youtube)
+- [AI crawling](platform/integrations/make/ai-crawling)
+- [Amazon](platform/integrations/make/amazon)
 
 And more! Because you can access any of our 4,500+ scrapers on Apify Store by using the [general Apify connections](https://www.make.com/en/integrations/apify).
