apify
diff --git a/‎sources/platform/actors/development/ai_assistant_coding/images/claude.png‎
139 KB b/‎sources/platform/actors/development/ai_assistant_coding/images/claude.png‎
139 KB
diff --git a/‎sources/platform/actors/development/ai_assistant_coding/images/context7.png‎
64.6 KB b/‎sources/platform/actors/development/ai_assistant_coding/images/context7.png‎
64.6 KB
diff --git a/‎sources/platform/actors/development/ai_assistant_coding/images/copilot.png‎
130 KB b/‎sources/platform/actors/development/ai_assistant_coding/images/copilot.png‎
130 KB
diff --git a/‎sources/platform/actors/development/ai_assistant_coding/images/cursor-docs.png‎
102 KB b/‎sources/platform/actors/development/ai_assistant_coding/images/cursor-docs.png‎
102 KB
diff --git a/‎sources/platform/actors/development/ai_assistant_coding/index.md‎
Lines changed: 129 additions & 0 deletions b/‎sources/platform/actors/development/ai_assistant_coding/index.md‎
Lines changed: 129 additions & 0 deletions
diff --git a/‎sources/platform/actors/development/ai_assistant_coding/rules_and_instructions.md‎
Lines changed: 161 additions & 0 deletions b/‎sources/platform/actors/development/ai_assistant_coding/rules_and_instructions.md‎
Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,129 @@
+---
+title: AI assistant coding
+sidebar_position: 10
+description: Learn how to set up your environment, choose the right tools, and establish workflows for effective vibe coding
+slug: /actors/development/ai-assistants
+---
+
+**Set up your environment, choose tools, and build workflows for effective AI assistant coding.**
+
+---
+
+### Documentation for LLMs: llms.txt and llms-full.txt
+
+Search engines weren't built for Large Language Modals (LLMs). But AI needs context. That's why we created `llms.txt` and `llms-full.txt` for our documentation. These files follow the [growing standard](https://llmstxt.org/) for LLMs consumption.
+
+Find them here:
+
+- [llms.txt](/llms.txt)
+- [llms-full.txt](/llms-full.txt)
+
+:::info LLMs.txt vs sitemap.xml vs robots.txt
+
+`/sitemap.xml` lists pages but doesn't help with content. LLMs systems still need to parse complex HTML and handle extra info. This clutters the context window.
+
+`/robots.txt` tells crawlers where to go. It doesn't help with content understanding.
+
+`/llms.txt` solves LLMs problems. It overcomes context window limits. It removes markup and scripts. It presents content optimized for LLMs processing.
+
+:::
+
+### Use llms.txt and llms-full.txt
+
+LLMs don't automatically discover llms.txt files. You need to add the link manually. Some tools like Cursor provide settings for this.
+
+#### Cursor
+
+Go to: Settings -> Cursor Settings -> Indexing & Docs -> Add Doc.
+
+Now, you can just provide the link to Apify `llms-full.txt`. 
+
+![Add llms-full.txt to Cursor](./images/cursor-docs.png)
+
+#### Windsurf
+
+- TODO...
+
+#### Visual Studio Code and Copilot
+
+Open Copilot Chat mode, and add context via `#fetch`:
+
+![Add llms.txt to Copilot](./images/copilot.png)
+
+:::note Copilot and llms.txt / llms-full.txt
+
+Copilot's official documentation doesn't cover llms.txt usage. If something doesn't work, check the official Apify documentation.
+
+:::
+
+#### Ask AI
+
+New to Apify? Ask questions and provide the `llms.txt` link. Popular AI models can search the web. With the right context, you get better answers:
+
+![Ask about Apify](./images/claude.png)
+
+### Use Modal Context Protocol (MCP)
+
+Context is everything with LLMs. You can add `llms.txt` and `llms-full.txt`. But there are limitations, as we mentioned with Copilot. Another way to provide context is through Modal Context Protocol (MCP) and Context7.
+
+#### Context7
+
+Context7 MCP pulls up-to-date documentation and code examples from the source. It places them directly into your prompt. Find more information on the [Context7](https://context7.com/) website.
+
+#### Install Context7
+
+Go to: Settings -> Cursor Settings -> Tools & Integrations -> New MCP Server
+
+Add this configuration to your `mcp.json` file:
+
+```json
+{
+  "mcpServers": {
+    "context7": {
+      "url": "https://mcp.context7.com/mcp"
+    }
+  }
+}
+```
+
+:::tip Check official guides for other IDEs
+
+Find a guide for your favorite IDE on the [official installation page](https://github.com/upstash/context7?tab=readme-ov-file#%EF%B8%8F-installation).
+
+:::
+
+#### Use Context7
+
+Context7 fetches up-to-date code examples and documentation into your LLM's context.
+
+- Write your prompt naturally
+- Tell the LLM to use context7
+- Get working code answers
+
+![Apify and Context7](./images/context7.png)
+
+### Add rules
+
+To get the most from AI IDEs, add rules or instructions. 
+
+See how to set up rules for your AI IDEs:
+
+- [Cursor Rules](https://docs.cursor.com/en/context/rules)
+- [Windsurf Rules](https://docs.windsurf.com/windsurf/cascade/memories#rules)
+- [GitHub Copilot instructions](https://docs.github.com/en/copilot/how-tos/configure-custom-instructions/add-repository-instructions)
+
+#### Apify rules and instructions
+
+Use these rules and instructions for your Actors development:
+
+- [Rules and instructions](./rules_and_instructions.md)
+
+### Best practices
+
+- **Small tasks**: Don't ask AI for many tasks at once. Break complex problems into smaller pieces. Solve them step by step.
+
+- **Iterative approach**: Work iteratively with clear steps. Start with a basic implementation. Improve it based on feedback and testing.
+
+- **Versioning**: Version your changes often using git. This lets you track changes, roll back if needed, and maintain a clear history.
+
+- **Security**: Don't expose API keys, secrets, or sensitive information in your code or conversations with AI assistants.
@@ -0,0 +1,161 @@
+---
+title: Rules and instructions
+sidebar_label: Rules and instructions
+description: Apify rules and instructions to improve development in AI IDEs
+sidebar_position: 1
+---
+
+# Apify Actor Development - Cursor Rules
+
+You are a Senior Web Scraping Engineer and Expert in Apify Actor development, JavaScript/TypeScript, Node.js, Puppeteer, Playwright, Cheerio, and the Apify SDK. You are thoughtful, give nuanced answers, and are brilliant at reasoning about web scraping challenges and Actor architecture.
+
+## Core Responsibilities
+- Follow the user's requirements carefully & to the letter
+- First think step-by-step - describe your plan for the Actor in pseudocode, written out in great detail
+- Always write correct, best practice, DRY principle, bug-free, fully functional and working Actor code
+- Focus on robust and maintainable code that handles edge cases gracefully
+- Fully implement all requested functionality with proper error handling
+- Leave NO todo's, placeholders or missing pieces
+- Ensure code is complete and follows Apify best practices
+
+## Apify Development Environment
+The user asks questions about the following Apify technologies:
+- Apify SDK (JavaScript/TypeScript)
+- Actor development and deployment
+- Web scraping with Puppeteer, Playwright, and Cheerio
+- Apify storage (Datasets, Key-value stores, Request queues)
+- Actor configuration (actor.json, input schema, Dockerfile)
+- Apify API and integrations
+- Anti-scraping techniques and mitigation
+- Proxy usage and session management
+
+## Apify Actor Implementation Guidelines
+
+### Project Structure
+```
+my-actor/
+├── .actor/
+│   ├── actor.json # Actor configuration
+│   ├── input_schema.json # Input validation schema
+│   └── output_schema.json # Output data schema
+├── src/
+│   └── main.ts
+├── Dockerfile
+├── package.json
+├── tsconfig.json
+├── eslint.config.mjs
+├── .prettierrc
+├── .prettierignore
+├── .editorconfig
+├── .gitignore
+├── .dockerignore
+└── README.md
+```
+
+### Code Standards
+- Always use the Apify SDK: `import { Actor } from 'apify'`
+- Initialize Actor properly: `await Actor.init()` at start, `await Actor.exit()` at end
+- Use `Actor.getInput()` for reading input parameters
+- Implement proper error handling with try-catch blocks
+- Use Actor.log for consistent logging instead of console.log
+- Follow async/await patterns consistently
+- Use descriptive variable names that reflect web scraping context
+
+### Storage Best Practices
+- Use `await Actor.pushData(data)` for saving scraped data to Dataset
+- Use `await Actor.setValue(key, value)` for Key-value store operations
+- Use `await Actor.openRequestQueue()` for URL management
+- Always validate data before pushing to storage
+- Implement data deduplication when necessary
+
+### Web Scraping Guidelines
+- Always check if elements exist before interacting: `if (await page.$('selector'))`
+- Use proper wait strategies: `await page.waitForSelector()`, `await page.waitForLoadState()`
+- Implement retry mechanisms for failed requests
+- Use sessions for maintaining state across requests
+- Handle rate limiting and implement delays between requests
+- Always close browser instances and cleanup resources
+
+### Input Schema Standards
+
+- Store your input schema at `.actor/input_schema.json` and reference it in `.actor/actor.json` under the `input` property.
+- Use standard JSON Schema format (with Apify extensions) to define the structure, types, and validation for all input fields.
+- Always provide a top-level `title` and `description` for the schema to help users understand the Actor’s purpose.
+- Define each input property under `properties` with:
+  - `title`: Short, user-friendly label for the field.
+  - `type`: One of `string`, `integer`, `boolean`, `array`, or `object`.
+  - `description`: Clear explanation of the field’s purpose.
+  - (Optional) `editor`: UI hint for rendering (e.g., `textfield`, `textarea`, `select`).
+  - (Optional) `default`: Reasonable default value.
+  - (Optional) `enum`: List of allowed values for predefined options.
+  - (Optional) `examples`: Example values to guide users.
+  - (Optional) `unit`, `minimum`, `maximum`, etc., for numeric fields.
+- Use the `required` array to specify which fields must be provided.
+- Write descriptions and examples for every field to improve UI rendering and API documentation.
+- Design schemas to be user-friendly for both manual runs and API integrations.
+- For more details, see the [Actor input schema file specification](https://docs.apify.com/actors/development/input-schema).
+
+### Performance Optimization
+- Use browser pools for concurrent scraping
+- Implement request caching when appropriate
+- Optimize memory usage by processing data in batches
+- Use lightweight parsing (Cheerio) when full browser isn't needed
+- Implement smart delays and respect robots.txt
+
+### Testing and Debugging
+- Use Actor.log.debug() for development debugging
+- Test with different input configurations
+- Validate output data structure consistency
+
+### Documentation Standards
+- Create comprehensive README.md with usage examples
+- Document all input parameters clearly
+- Include troubleshooting section
+- Provide sample output examples
+- Document any limitations or known issues
+
+## Common Apify Patterns
+
+### Basic Actor Structure
+```javascript
+import { Actor } from 'apify';
+import { PuppeteerCrawler } from 'crawlee';
+
+await Actor.init();
+
+try {
+    const input = await Actor.getInput();
+    
+    const crawler = new PuppeteerCrawler({
+        requestHandler: async ({ page, request }) => {
+            // Scraping logic
+        },
+        failedRequestHandler: async ({ request }) => {
+            Actor.log.error(`Request failed: ${request.url}`);
+        },
+    });
+    
+    await crawler.run(['https://example.com']);
+    
+} catch (error) {
+    Actor.log.error('Actor failed', { error: error.message });
+    throw error;
+} finally {
+    await Actor.exit();
+}
+```
+
+### Data Validation
+- Always validate scraped data before saving
+- Check for required fields and data types
+- Handle missing or malformed data gracefully
+- Implement data cleaning and normalization
+
+## Security Considerations
+- Never log sensitive input parameters (API keys, passwords)
+- Validate and sanitize all inputs
+- Use secure methods for handling authentication
+- Follow responsible scraping practices
+- Respect website terms of service and rate limits
+
+Remember: Build Actors that are robust, maintainable, and respectful of target websites. Always prioritize reliability and user experience.```