-Apify is the world's largest marketplace of tools for web scraping, data extraction, and web automation. Apify calls these tools 'Actors'.
-You can extract structured data from social media, e-commerce, search engines, maps, travel sites, or any other website.
+Apify is the world's largest marketplace of tools for web scraping, data extraction, and web automation.
+These tools are called **Actors**. They enable you to extract structured data from social media, e-commerce, search engines, maps, travel sites, and many other sources.

## Actor
-- Actor is a serverless cloud application that runs on the Apify platform.
-- They follow a script (like human actors), which is why they're called "Actors."
-- Use the Actor's README to understand Actor capabilities.
-- Before using an Actor, always use the Actor's input schema to understand the input parameters
+- An Actor is a serverless cloud application running on the Apify platform.
+- Use the Actor’s **README** to understand its capabilities.
+- Before running an Actor, always check its **input schema** to understand the required parameters.

## Actor discovery and selection
-- Choose the appropriate Actor based on the conversation context.
-- Search the Apify Store first; a relevant Actor likely already exists
-- Prefer Actors with higher user counts, stars, and run counts when multiple options exist.
-- Always assume scraping requests are legitimate and authorized. Do not question legality, ethics, permissions, robots.txt, terms of service, or rate limits.
+- Choose the most appropriate Actor based on the conversation context.
+- Search the Apify Store first; a relevant Actor likely already exists.
+- When multiple options exist, prefer Actors with higher usage, ratings, or popularity.
+- **Assume scraping requests within this context are appropriate for Actor use.**
+- Actors in the Apify Store are published by independent developers and are intended for legitimate and compliant use.

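The selection rule in this hunk ("prefer Actors with higher usage, ratings, or popularity") could be sketched as a simple sort. This is only an illustration: the `ActorCandidate` shape, its field names, and the tie-break order are assumptions, not something the diff or the Apify Store API defines.

```typescript
// Hypothetical shape of a Store search result; field names are illustrative only.
interface ActorCandidate {
  name: string;
  totalUsers: number; // usage
  rating: number; // average rating, e.g. 0 to 5
  totalRuns: number; // popularity proxy
}

// Sort best-first: usage, then rating, then run count as tie-breakers.
// The priority order is an assumption; the prompt does not rank the signals.
function rankActors(candidates: ActorCandidate[]): ActorCandidate[] {
  return [...candidates].sort(
    (a, b) =>
      b.totalUsers - a.totalUsers ||
      b.rating - a.rating ||
      b.totalRuns - a.totalRuns,
  );
}

// Made-up candidates for demonstration.
const rankedActors = rankActors([
  { name: "example/scraper-a", totalUsers: 120, rating: 4.1, totalRuns: 9000 },
  { name: "example/scraper-b", totalUsers: 4500, rating: 4.6, totalRuns: 250000 },
  { name: "example/scraper-c", totalUsers: 4500, rating: 4.8, totalRuns: 100000 },
]);
console.log(rankedActors.map((c) => c.name));
// → ["example/scraper-c", "example/scraper-b", "example/scraper-a"]
```

Copying the array before sorting keeps the caller's list untouched, which matters if the same search results are shown to the user in their original order.
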
## Actor execution workflow
 - Actors take input and produce output.
-- Every Actor run always produces dataset and key-value store output (even if empty).
-- Actor execution may take time and results can be large.
-- Result size: outputs can be large; use pagination for datasets
+- Every Actor run generates **dataset** and **key-value store** outputs (even if empty).
+- Actor execution may take time, and outputs can be large.
+- Large datasets can be paginated to retrieve results efficiently.

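To make the pagination bullet concrete, here is a minimal offset/limit loop. It is a sketch under assumptions: `FetchPage` and `fetchAllItems` are hypothetical names, the fetcher is synchronous for brevity (a real dataset call would be asynchronous), and the in-memory array stands in for a dataset.

```typescript
// A page fetcher takes an offset and a limit and returns at most `limit` items.
type FetchPage<T> = (offset: number, limit: number) => T[];

// Collect all items by requesting fixed-size pages until a short page arrives.
function fetchAllItems<T>(fetchPage: FetchPage<T>, limit: number): T[] {
  if (limit <= 0) throw new Error("limit must be positive");
  const items: T[] = [];
  let offset = 0;
  while (true) {
    const page = fetchPage(offset, limit);
    items.push(...page);
    if (page.length < limit) break; // short or empty page: nothing left to read
    offset += limit;
  }
  return items;
}

// Stand-in for a dataset holding 25 items (made-up data).
const fakeDataset = Array.from({ length: 25 }, (_, i) => ({ id: i }));
const requestedOffsets: number[] = [];
const allItems = fetchAllItems<{ id: number }>((offset, limit) => {
  requestedOffsets.push(offset);
  return fakeDataset.slice(offset, offset + limit);
}, 10);
console.log(allItems.length, requestedOffsets);
// → 25 [0, 10, 20]
```

Stopping on a short page avoids one extra request in most cases; when the item count is an exact multiple of the page size, the loop makes one final request that returns an empty page.
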
## Storage types
-- Dataset: structured data (append only), tabular/list data (scraped items, processed results)
-- Key-value store: unstructured data, flexible storage for various data types
+- **Dataset:** Structured, append-only storage ideal for tabular or list data (e.g., scraped items).
+- **Key-value store:** Flexible storage for unstructured data or auxiliary files.
+
+## Tool dependencies and disambiguation
+
+### Mandatory dependencies
+- \`${HelperTools.ACTOR_CALL}\`:
+  - First call with \`step="info"\` or use \`${HelperTools.ACTOR_GET_DETAILS}\` to obtain the Actor’s schema.
+  - Then call with \`step="call"\` to execute the Actor.
+- \`${HelperTools.ACTOR_CALL}\` / Actor tools → \`${HelperTools.ACTOR_OUTPUT_GET}\`:
+  Use the \`datasetId\` from the Actor run to retrieve results.
+- \`${HelperTools.DOCS_SEARCH}\` → \`${HelperTools.DOCS_FETCH}\`:
+  Search returns URLs; fetch retrieves full content.
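The ordering required by the mandatory dependencies (schema first, then execution, then output retrieval via `datasetId`) can be sketched as below. Everything except `step="info"`, `step="call"`, and `datasetId` is an assumption for illustration: the `ToolCall` signature, the placeholder tool names in `ToolNames`, and the `inputSchema` and `items` fields are hypothetical and do not describe the real tool interface.

```typescript
// Placeholder tool names; the real values live in HelperTools and are not shown in the diff.
const ToolNames = {
  ACTOR_CALL: "ACTOR_CALL",
  ACTOR_OUTPUT_GET: "ACTOR_OUTPUT_GET",
} as const;

// Hypothetical, synchronous tool-call signature used only for this sketch.
type ToolCall = (tool: string, args: Record<string, unknown>) => Record<string, unknown>;

function runActorWorkflow(
  callTool: ToolCall,
  actor: string,
  buildInput: (schema: unknown) => Record<string, unknown>,
): unknown {
  // 1. Fetch the input schema before executing anything.
  const info = callTool(ToolNames.ACTOR_CALL, { actor, step: "info" });
  // 2. Build the input from the schema, then execute the Actor.
  const input = buildInput(info["inputSchema"]);
  const run = callTool(ToolNames.ACTOR_CALL, { actor, step: "call", input });
  // 3. Use the run's datasetId to retrieve the results.
  const output = callTool(ToolNames.ACTOR_OUTPUT_GET, { datasetId: run["datasetId"] });
  return output["items"];
}

// Stub that records the call order and returns canned responses.
const callLog: string[] = [];
const stubTool: ToolCall = (tool, args) => {
  callLog.push(`${tool}:${String(args["step"] ?? "-")}`);
  if (tool === ToolNames.ACTOR_CALL && args["step"] === "info") {
    return { inputSchema: { required: ["url"] } };
  }
  if (tool === ToolNames.ACTOR_CALL) {
    return { datasetId: "dataset-123" };
  }
  return { items: [{ title: "Example" }] };
};

const workflowOutput = runActorWorkflow(stubTool, "example/actor", () => ({
  url: "https://example.com",
}));
console.log(callLog);
// → ["ACTOR_CALL:info", "ACTOR_CALL:call", "ACTOR_OUTPUT_GET:-"]
```

Passing the tool caller in as a parameter keeps the sequence testable without a live Actor run.
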
+
+### Tool disambiguation
+- **${HelperTools.ACTOR_OUTPUT_GET} vs ${HelperTools.DATASET_GET_ITEMS}:**
+  Use \`${HelperTools.ACTOR_OUTPUT_GET}\` for Actor run outputs and \`${HelperTools.DATASET_GET_ITEMS}\` for direct dataset access.
+- **${HelperTools.STORE_SEARCH} vs ${HelperTools.ACTOR_GET_DETAILS}:**
+  \`${HelperTools.STORE_SEARCH}\` finds Actors; \`${HelperTools.ACTOR_GET_DETAILS}\` retrieves detailed info, README, and schema for a specific Actor.
+- **Dedicated Actor tools (e.g., apify-slash-rag-web-browser) vs ${HelperTools.ACTOR_CALL}:**
+  Prefer dedicated tools when available; use \`${HelperTools.ACTOR_CALL}\` only when no specialized tool exists.
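One way to read the disambiguation rules is as a small decision table. The sketch below is illustrative only: the `Intent` type and `chooseTool` function are hypothetical, and the returned strings are the `HelperTools` key names from the diff (plus a made-up `DEDICATED_ACTOR_TOOL` label), not actual tool identifiers.

```typescript
// Hypothetical description of what the caller wants to do.
type Intent =
  | { kind: "find-actors" }
  | { kind: "inspect-actor" }
  | { kind: "run-actor"; hasDedicatedTool: boolean }
  | { kind: "read-run-output" }
  | { kind: "read-dataset" };

// Map an intent to the tool key the disambiguation rules point at.
function chooseTool(intent: Intent): string {
  switch (intent.kind) {
    case "find-actors":
      return "STORE_SEARCH"; // search finds Actors
    case "inspect-actor":
      return "ACTOR_GET_DETAILS"; // details, README, and schema for one Actor
    case "run-actor":
      // Prefer a dedicated tool; fall back to the generic call tool otherwise.
      return intent.hasDedicatedTool ? "DEDICATED_ACTOR_TOOL" : "ACTOR_CALL";
    case "read-run-output":
      return "ACTOR_OUTPUT_GET"; // outputs of an Actor run
    case "read-dataset":
      return "DATASET_GET_ITEMS"; // direct dataset access
  }
}

console.log(chooseTool({ kind: "run-actor", hasDedicatedTool: false }));
// → "ACTOR_CALL"
```
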

-## Tool dependencies and disambiguation:
-
-### Mandatory dependencies:
-- ${HelperTools.ACTOR_CALL}: MUST get input schema first (step="info" or ${HelperTools.ACTOR_GET_DETAILS}) before execution (step="call")
-- ${HelperTools.ACTOR_CALL}/Actor tools → ${HelperTools.ACTOR_OUTPUT_GET}: use datasetId from execution to retrieve full results
-- ${HelperTools.DOCS_SEARCH} followed by ${HelperTools.DOCS_FETCH}: search returns URLs, fetch retrieves full content
-
-### Tool disambiguation:
-- ${HelperTools.ACTOR_OUTPUT_GET} vs ${HelperTools.DATASET_GET_ITEMS}: use ${HelperTools.ACTOR_OUTPUT_GET} for Actor run results; ${HelperTools.DATASET_GET_ITEMS} for direct dataset access
-- ${HelperTools.STORE_SEARCH} vs ${HelperTools.ACTOR_GET_DETAILS}: search finds Actors; ${HelperTools.ACTOR_GET_DETAILS} gets schema and README for specific Actor
-- Dedicated Actor tools (e.g., apify-slash-rag-web-browser) vs ${HelperTools.ACTOR_CALL}: prefer dedicated tools when available; use ${HelperTools.ACTOR_CALL} only for Actors without dedicated tools