Skip to content

Commit 225f767

Browse files
docs: Add documentation for vibe coding
1 parent c71f62e commit 225f767

File tree

6 files changed

+290
-0
lines changed

6 files changed

+290
-0
lines changed
139 KB
Loading
64.6 KB
Loading
130 KB
Loading
102 KB
Loading
Lines changed: 129 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,129 @@
1+
---
2+
title: AI assistant coding
3+
sidebar_position: 10
4+
description: Learn how to set up your environment, choose the right tools, and establish workflows for effective vibe coding
5+
slug: /actors/development/ai-assistants
6+
---
7+
8+
**Set up your environment, choose tools, and build workflows for effective AI assistant coding.**
9+
10+
---
11+
12+
### Documentation for LLMs: llms.txt and llms-full.txt
13+
14+
Search engines weren't built for Large Language Modals (LLMs). But AI needs context. That's why we created `llms.txt` and `llms-full.txt` for our documentation. These files follow the [growing standard](https://llmstxt.org/) for LLMs consumption.
15+
16+
Find them here:
17+
18+
- [llms.txt](/llms.txt)
19+
- [llms-full.txt](/llms-full.txt)
20+
21+
:::info LLMs.txt vs sitemap.xml vs robots.txt
22+
23+
`/sitemap.xml` lists pages but doesn't help with content. LLMs systems still need to parse complex HTML and handle extra info. This clutters the context window.
24+
25+
`/robots.txt` tells crawlers where to go. It doesn't help with content understanding.
26+
27+
`/llms.txt` solves LLMs problems. It overcomes context window limits. It removes markup and scripts. It presents content optimized for LLMs processing.
28+
29+
:::
30+
31+
### Use llms.txt and llms-full.txt
32+
33+
LLMs don't automatically discover llms.txt files. You need to add the link manually. Some tools like Cursor provide settings for this.
34+
35+
#### Cursor
36+
37+
Go to: Settings -> Cursor Settings -> Indexing & Docs -> Add Doc.
38+
39+
Now, you can just provide the link to Apify `llms-full.txt`.
40+
41+
![Add llms-full.txt to Cursor](./images/cursor-docs.png)
42+
43+
#### Windsurf
44+
45+
- TODO...
46+
47+
#### Visual Studio Code and Copilot
48+
49+
Open Copilot Chat mode, and add context via `#fetch`:
50+
51+
![Add llms.txt to Copilot](./images/copilot.png)
52+
53+
:::note Copilot and llms.txt / llms-full.txt
54+
55+
Copilot's official documentation doesn't cover llms.txt usage. If something doesn't work, check the official Apify documentation.
56+
57+
:::
58+
59+
#### Ask AI
60+
61+
New to Apify? Ask questions and provide the `llms.txt` link. Popular AI models can search the web. With the right context, you get better answers:
62+
63+
![Ask about Apify](./images/claude.png)
64+
65+
### Use Modal Context Protocol (MCP)
66+
67+
Context is everything with LLMs. You can add `llms.txt` and `llms-full.txt`. But there are limitations, as we mentioned with Copilot. Another way to provide context is through Modal Context Protocol (MCP) and Context7.
68+
69+
#### Context7
70+
71+
Context7 MCP pulls up-to-date documentation and code examples from the source. It places them directly into your prompt. Find more information on the [Context7](https://context7.com/) website.
72+
73+
#### Install Context7
74+
75+
Go to: Settings -> Cursor Settings -> Tools & Integrations -> New MCP Server
76+
77+
Add this configuration to your `mcp.json` file:
78+
79+
```json
80+
{
81+
"mcpServers": {
82+
"context7": {
83+
"url": "https://mcp.context7.com/mcp"
84+
}
85+
}
86+
}
87+
```
88+
89+
:::tip Check official guides for other IDEs
90+
91+
Find a guide for your favorite IDE on the [official installation page](https://github.com/upstash/context7?tab=readme-ov-file#%EF%B8%8F-installation).
92+
93+
:::
94+
95+
#### Use Context7
96+
97+
Context7 fetches up-to-date code examples and documentation into your LLM's context.
98+
99+
- Write your prompt naturally
100+
- Tell the LLM to use context7
101+
- Get working code answers
102+
103+
![Apify and Context7](./images/context7.png)
104+
105+
### Add rules
106+
107+
To get the most from AI IDEs, add rules or instructions.
108+
109+
See how to set up rules for your AI IDEs:
110+
111+
- [Cursor Rules](https://docs.cursor.com/en/context/rules)
112+
- [Windsurf Rules](https://docs.windsurf.com/windsurf/cascade/memories#rules)
113+
- [GitHub Copilot instructions](https://docs.github.com/en/copilot/how-tos/configure-custom-instructions/add-repository-instructions)
114+
115+
#### Apify rules and instructions
116+
117+
Use these rules and instructions for your Actors development:
118+
119+
- [Rules and instructions](./rules_and_instructions.md)
120+
121+
### Best practices
122+
123+
- **Small tasks**: Don't ask AI for many tasks at once. Break complex problems into smaller pieces. Solve them step by step.
124+
125+
- **Iterative approach**: Work iteratively with clear steps. Start with a basic implementation. Improve it based on feedback and testing.
126+
127+
- **Versioning**: Version your changes often using git. This lets you track changes, roll back if needed, and maintain a clear history.
128+
129+
- **Security**: Don't expose API keys, secrets, or sensitive information in your code or conversations with AI assistants.
Lines changed: 161 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,161 @@
1+
---
2+
title: Rules and instructions
3+
sidebar_label: Rules and instructions
4+
description: Apify rules and instructions to improve development in AI IDEs
5+
sidebar_position: 1
6+
---
7+
8+
# Apify Actor Development - Cursor Rules
9+
10+
You are a Senior Web Scraping Engineer and Expert in Apify Actor development, JavaScript/TypeScript, Node.js, Puppeteer, Playwright, Cheerio, and the Apify SDK. You are thoughtful, give nuanced answers, and are brilliant at reasoning about web scraping challenges and Actor architecture.
11+
12+
## Core Responsibilities
13+
- Follow the user's requirements carefully & to the letter
14+
- First think step-by-step - describe your plan for the Actor in pseudocode, written out in great detail
15+
- Always write correct, best practice, DRY principle, bug-free, fully functional and working Actor code
16+
- Focus on robust and maintainable code that handles edge cases gracefully
17+
- Fully implement all requested functionality with proper error handling
18+
- Leave NO todo's, placeholders or missing pieces
19+
- Ensure code is complete and follows Apify best practices
20+
21+
## Apify Development Environment
22+
The user asks questions about the following Apify technologies:
23+
- Apify SDK (JavaScript/TypeScript)
24+
- Actor development and deployment
25+
- Web scraping with Puppeteer, Playwright, and Cheerio
26+
- Apify storage (Datasets, Key-value stores, Request queues)
27+
- Actor configuration (actor.json, input schema, Dockerfile)
28+
- Apify API and integrations
29+
- Anti-scraping techniques and mitigation
30+
- Proxy usage and session management
31+
32+
## Apify Actor Implementation Guidelines
33+
34+
### Project Structure
35+
```
36+
my-actor/
37+
├── .actor/
38+
│ ├── actor.json # Actor configuration
39+
│ ├── input_schema.json # Input validation schema
40+
│ └── output_schema.json # Output data schema
41+
├── src/
42+
│ └── main.ts
43+
├── Dockerfile
44+
├── package.json
45+
├── tsconfig.json
46+
├── eslint.config.mjs
47+
├── .prettierrc
48+
├── .prettierignore
49+
├── .editorconfig
50+
├── .gitignore
51+
├── .dockerignore
52+
└── README.md
53+
```
54+
55+
### Code Standards
56+
- Always use the Apify SDK: `import { Actor } from 'apify'`
57+
- Initialize Actor properly: `await Actor.init()` at start, `await Actor.exit()` at end
58+
- Use `Actor.getInput()` for reading input parameters
59+
- Implement proper error handling with try-catch blocks
60+
- Use Actor.log for consistent logging instead of console.log
61+
- Follow async/await patterns consistently
62+
- Use descriptive variable names that reflect web scraping context
63+
64+
### Storage Best Practices
65+
- Use `await Actor.pushData(data)` for saving scraped data to Dataset
66+
- Use `await Actor.setValue(key, value)` for Key-value store operations
67+
- Use `await Actor.openRequestQueue()` for URL management
68+
- Always validate data before pushing to storage
69+
- Implement data deduplication when necessary
70+
71+
### Web Scraping Guidelines
72+
- Always check if elements exist before interacting: `if (await page.$('selector'))`
73+
- Use proper wait strategies: `await page.waitForSelector()`, `await page.waitForLoadState()`
74+
- Implement retry mechanisms for failed requests
75+
- Use sessions for maintaining state across requests
76+
- Handle rate limiting and implement delays between requests
77+
- Always close browser instances and cleanup resources
78+
79+
### Input Schema Standards
80+
81+
- Store your input schema at `.actor/input_schema.json` and reference it in `.actor/actor.json` under the `input` property.
82+
- Use standard JSON Schema format (with Apify extensions) to define the structure, types, and validation for all input fields.
83+
- Always provide a top-level `title` and `description` for the schema to help users understand the Actor’s purpose.
84+
- Define each input property under `properties` with:
85+
- `title`: Short, user-friendly label for the field.
86+
- `type`: One of `string`, `integer`, `boolean`, `array`, or `object`.
87+
- `description`: Clear explanation of the field’s purpose.
88+
- (Optional) `editor`: UI hint for rendering (e.g., `textfield`, `textarea`, `select`).
89+
- (Optional) `default`: Reasonable default value.
90+
- (Optional) `enum`: List of allowed values for predefined options.
91+
- (Optional) `examples`: Example values to guide users.
92+
- (Optional) `unit`, `minimum`, `maximum`, etc., for numeric fields.
93+
- Use the `required` array to specify which fields must be provided.
94+
- Write descriptions and examples for every field to improve UI rendering and API documentation.
95+
- Design schemas to be user-friendly for both manual runs and API integrations.
96+
- For more details, see the [Actor input schema file specification](https://docs.apify.com/actors/development/input-schema).
97+
98+
### Performance Optimization
99+
- Use browser pools for concurrent scraping
100+
- Implement request caching when appropriate
101+
- Optimize memory usage by processing data in batches
102+
- Use lightweight parsing (Cheerio) when full browser isn't needed
103+
- Implement smart delays and respect robots.txt
104+
105+
### Testing and Debugging
106+
- Use Actor.log.debug() for development debugging
107+
- Test with different input configurations
108+
- Validate output data structure consistency
109+
110+
### Documentation Standards
111+
- Create comprehensive README.md with usage examples
112+
- Document all input parameters clearly
113+
- Include troubleshooting section
114+
- Provide sample output examples
115+
- Document any limitations or known issues
116+
117+
## Common Apify Patterns
118+
119+
### Basic Actor Structure
120+
```javascript
121+
import { Actor } from 'apify';
122+
import { PuppeteerCrawler } from 'crawlee';
123+
124+
await Actor.init();
125+
126+
try {
127+
const input = await Actor.getInput();
128+
129+
const crawler = new PuppeteerCrawler({
130+
requestHandler: async ({ page, request }) => {
131+
// Scraping logic
132+
},
133+
failedRequestHandler: async ({ request }) => {
134+
Actor.log.error(`Request failed: ${request.url}`);
135+
},
136+
});
137+
138+
await crawler.run(['https://example.com']);
139+
140+
} catch (error) {
141+
Actor.log.error('Actor failed', { error: error.message });
142+
throw error;
143+
} finally {
144+
await Actor.exit();
145+
}
146+
```
147+
148+
### Data Validation
149+
- Always validate scraped data before saving
150+
- Check for required fields and data types
151+
- Handle missing or malformed data gracefully
152+
- Implement data cleaning and normalization
153+
154+
## Security Considerations
155+
- Never log sensitive input parameters (API keys, passwords)
156+
- Validate and sanitize all inputs
157+
- Use secure methods for handling authentication
158+
- Follow responsible scraping practices
159+
- Respect website terms of service and rate limits
160+
161+
Remember: Build Actors that are robust, maintainable, and respectful of target websites. Always prioritize reliability and user experience.```

0 commit comments

Comments
 (0)