51 commits
a65364f
feat(ios): add iOS automation support via screen mirroring
lhuanyu Aug 4, 2025
777d4c2
refactor(ios): update comments in page/index.ts
lhuanyu Aug 4, 2025
ae52bfe
refactor(ios): update comments in page/index.ts
lhuanyu Aug 4, 2025
d80b0d6
refactor(ios): rename iOSMirrorConfig to mirrorConfig and update exam…
lhuanyu Aug 4, 2025
8971067
refactor(ios): update comments and prompts to English
lhuanyu Aug 4, 2025
312c073
feat(ios): update mirror config and improve window rect script
lhuanyu Aug 4, 2025
ee6bdd7
refactor(ios): simplify scrolling logic in auto_server.py
lhuanyu Aug 4, 2025
26aefaa
feat(ios): activate mirroring app on connect and update example
lhuanyu Aug 4, 2025
c525143
refactor(ios): reorganize iOS-related files and remove obsolete examples
lhuanyu Aug 4, 2025
ccabe2a
Merge branch 'web-infra-dev:main' into ios
lhuanyu Aug 5, 2025
694f85a
refactor(ios): remove unused ios-input-test.ts
lhuanyu Aug 5, 2025
628c243
Merge branch 'web-infra-dev:main' into ios
lhuanyu Aug 5, 2025
3a81145
Merge branch 'web-infra-dev:main' into ios
lhuanyu Aug 5, 2025
56d7b9f
Merge branch 'web-infra-dev:main' into ios
lhuanyu Aug 6, 2025
7eaa252
Merge branch 'web-infra-dev:main' into ios
lhuanyu Aug 6, 2025
87357d3
Merge branch 'web-infra-dev:main' into ios
lhuanyu Aug 7, 2025
b3036c4
Merge branch 'main' into ios
lhuanyu Aug 8, 2025
07eb8f4
feat(ios): clear input before typing and remove unused scripts
lhuanyu Aug 8, 2025
0ec3f18
Merge branch 'web-infra-dev:main' into ios
lhuanyu Aug 9, 2025
928e54c
Merge branch 'web-infra-dev:main' into ios
lhuanyu Aug 11, 2025
ecc739d
feat(ios): introduce interactive playground with auto-detection
lhuanyu Aug 11, 2025
6546227
Merge branch 'web-infra-dev:main' into ios
lhuanyu Aug 12, 2025
3451621
Merge branch 'main' into ios
lhuanyu Aug 12, 2025
f688d51
Merge branch 'main' into ios
lhuanyu Aug 13, 2025
b4e5828
Merge branch 'main' into ios
lhuanyu Aug 14, 2025
1908f1d
feat(ios): implement action space for iOS devices
lhuanyu Aug 14, 2025
fdb12a7
Merge branch 'web-infra-dev:main' into ios
lhuanyu Aug 15, 2025
8663b85
Merge branch 'web-infra-dev:main' into ios
lhuanyu Aug 15, 2025
f175408
refactor(ios): use resizeImgBuffer instead of resizeImg
lhuanyu Aug 15, 2025
9fecc80
refactor(ios): improve code quality and formatting
lhuanyu Aug 15, 2025
ae9853b
build(ios-playground): configure explicit input entries
lhuanyu Aug 15, 2025
fc1c8a8
feat(ios): improve automation server stability and logging
lhuanyu Aug 16, 2025
4085537
Merge branch 'web-infra-dev:main' into ios
lhuanyu Aug 18, 2025
41e1fb9
Merge branch 'web-infra-dev:main' into ios
lhuanyu Aug 19, 2025
027b208
refactor(ios): improve device actions and validation
lhuanyu Aug 19, 2025
6dbcf0e
Merge branch 'web-infra-dev:main' into ios
lhuanyu Aug 20, 2025
840e67b
feat(ios): use createImgBase64ByFormat for screenshots
lhuanyu Aug 20, 2025
6d0722f
Merge branch 'web-infra-dev:main' into ios
lhuanyu Aug 21, 2025
e7b4bf7
Merge branch 'main' into ios
lhuanyu Aug 22, 2025
9f6daf6
refactor(ios): fix ios issues for new agent type
lhuanyu Aug 22, 2025
b779eb3
refactor(ios): revert mdx changes
lhuanyu Aug 22, 2025
a528762
refactor(ios): simplify scrolling and update tsconfig
lhuanyu Aug 24, 2025
07c426a
refactor(ios): turn the debug mode of python server on by default
lhuanyu Aug 24, 2025
c8ec483
Merge branch 'main' into ios
lhuanyu Aug 25, 2025
1a6d104
chore(build): remove root tsconfig.json
lhuanyu Aug 25, 2025
188409f
test(web-integration): remove ios configuration test
lhuanyu Aug 25, 2025
f02737a
Merge branch 'web-infra-dev:main' into ios
lhuanyu Aug 26, 2025
db9c9ea
Merge branch 'web-infra-dev:main' into ios
lhuanyu Aug 28, 2025
e2e3848
refactor(ios): rename iOSDevice import and move related code
lhuanyu Aug 28, 2025
cc75e92
Merge branch 'main' into ios
lhuanyu Aug 29, 2025
ef5009a
feat(ios-playground): integrate action space management and update de…
lhuanyu Aug 29, 2025
57 changes: 57 additions & 0 deletions apps/site/docs/en/automate-with-scripts-in-yaml.mdx
@@ -80,6 +80,27 @@ tasks:
- aiAssert: The results show weather information
```

Or, to drive an iOS device automation task (requires PyAutoGUI server setup and device mirroring):

```yaml
ios:
  serverPort: 1412
  mirrorConfig:
    mirrorX: 100
    mirrorY: 200
    mirrorWidth: 400
    mirrorHeight: 800

tasks:
  - name: Search for weather
    flow:
      - ai: Open Safari browser
      - ai: Navigate to bing.com
      - ai: Search for "today's weather"
      - sleep: 3000
      - aiAssert: The results show weather information
```

Run the script:

```bash
@@ -177,6 +198,37 @@ android:
output: <path-to-output-file>
```

### The `ios` part

```yaml
ios:
  # PyAutoGUI server port, optional, defaults to 1412.
  serverPort: <port>

  # PyAutoGUI server URL, optional, defaults to http://localhost:1412.
  serverUrl: <url>

  # Whether to automatically dismiss the keyboard after input, optional, defaults to false.
  autoDismissKeyboard: <boolean>

  # iOS device mirroring configuration, used to target positions precisely.
  mirrorConfig:
    # X position of the mirror on the host display
    mirrorX: <number>
    # Y position of the mirror on the host display
    mirrorY: <number>
    # Width of the mirror
    mirrorWidth: <number>
    # Height of the mirror
    mirrorHeight: <number>

  # The launch URL or app, optional, defaults to the device's current page.
  launch: <url>

  # The path to the JSON file for outputting aiQuery/aiAssert results, optional.
  output: <path-to-output-file>
```
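
Since both `serverPort` and `serverUrl` are optional, it may help to see how the two could resolve to a single endpoint. The following is a minimal sketch, not Midscene's actual implementation: the `IosServerOptions` type, the `resolveServerUrl` helper, and the precedence (an explicit `serverUrl` wins over `serverPort`) are all assumptions made for illustration. Only the default port of 1412 comes from the reference above.

```typescript
// Hypothetical option shape mirroring the `ios` keys documented above.
interface IosServerOptions {
  serverPort?: number;
  serverUrl?: string;
}

// Default port taken from the reference above.
const DEFAULT_SERVER_PORT = 1412;

// Assumed precedence: explicit serverUrl, then serverPort, then the default port.
function resolveServerUrl(options: IosServerOptions = {}): string {
  if (options.serverUrl) {
    return options.serverUrl;
  }
  const port = options.serverPort ?? DEFAULT_SERVER_PORT;
  return `http://localhost:${port}`;
}

console.log(resolveServerUrl({ serverPort: 1413 }));
```

If the real precedence differs, setting only one of the two keys in a script avoids the ambiguity.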

### The `tasks` part

The `tasks` part is an array that defines the steps of the script. Remember to add a `-` before each step to indicate it's an array item.
@@ -304,6 +356,11 @@ The command-line tool provides several options to control the execution behavior
- `--web.viewportWidth <width>`: Sets the browser viewport width, which will override the `web.viewportWidth` parameter in all script files.
- `--web.viewportHeight <height>`: Sets the browser viewport height, which will override the `web.viewportHeight` parameter in all script files.
- `--android.deviceId <device-id>`: Sets the Android device ID, which will override the `android.deviceId` parameter in all script files.
- `--ios.serverPort <port>`: Sets the iOS PyAutoGUI server port, which will override the `ios.serverPort` parameter in all script files.
- `--ios.mirrorX <x>`: Sets the iOS mirror X position, which will override the `ios.mirrorConfig.mirrorX` parameter in all script files.
- `--ios.mirrorY <y>`: Sets the iOS mirror Y position, which will override the `ios.mirrorConfig.mirrorY` parameter in all script files.
- `--ios.mirrorWidth <width>`: Sets the iOS mirror width, which will override the `ios.mirrorConfig.mirrorWidth` parameter in all script files.
- `--ios.mirrorHeight <height>`: Sets the iOS mirror height, which will override the `ios.mirrorConfig.mirrorHeight` parameter in all script files.
- `--dotenv-debug`: Sets the debug log for dotenv, disabled by default.
- `--dotenv-override`: Sets whether dotenv overrides global environment variables with the same name, disabled by default.
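
The override behavior of the `--ios.*` options can be pictured as a merge of command-line values over each script's `ios` block. This is a rough sketch under assumptions: `applyIosOverrides`, `IosScriptConfig`, and `IosCliFlags` are hypothetical names, and the CLI's real merging code may be organized differently. Note that the flat `--ios.mirrorX` style flags map onto the nested `ios.mirrorConfig` keys.

```typescript
interface MirrorConfig {
  mirrorX: number;
  mirrorY: number;
  mirrorWidth: number;
  mirrorHeight: number;
}

// Hypothetical shape of the `ios` block in a script file.
interface IosScriptConfig {
  serverPort?: number;
  mirrorConfig?: Partial<MirrorConfig>;
}

// Hypothetical shape of the parsed `--ios.*` command-line flags.
interface IosCliFlags extends Partial<MirrorConfig> {
  serverPort?: number;
}

const MIRROR_KEYS = ['mirrorX', 'mirrorY', 'mirrorWidth', 'mirrorHeight'] as const;

// Flags that are set replace the script's values; unset flags leave them untouched.
function applyIosOverrides(script: IosScriptConfig, flags: IosCliFlags): IosScriptConfig {
  const mirrorConfig: Partial<MirrorConfig> = { ...script.mirrorConfig };
  for (const key of MIRROR_KEYS) {
    const value = flags[key];
    if (value !== undefined) {
      mirrorConfig[key] = value;
    }
  }
  return {
    ...script,
    serverPort: flags.serverPort ?? script.serverPort,
    mirrorConfig,
  };
}

const merged = applyIosOverrides(
  { serverPort: 1412, mirrorConfig: { mirrorX: 100, mirrorY: 200, mirrorWidth: 400, mirrorHeight: 800 } },
  { mirrorX: 150, mirrorY: 250 },
);
console.log(merged);
```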

57 changes: 57 additions & 0 deletions apps/site/docs/zh/automate-with-scripts-in-yaml.mdx
@@ -80,6 +80,27 @@ tasks:
- aiAssert: The results show weather information
```

Or, to drive an iOS device automation task (requires PyAutoGUI server setup and device mirroring):

```yaml
ios:
  serverPort: 1412
  mirrorConfig:
    mirrorX: 100
    mirrorY: 200
    mirrorWidth: 400
    mirrorHeight: 800

tasks:
  - name: Search for weather
    flow:
      - ai: Open Safari browser
      - ai: Navigate to bing.com
      - ai: Search for "today's weather"
      - sleep: 3000
      - aiAssert: The results show weather information
```

Run the script:

```bash
@@ -177,6 +198,37 @@ android:
output: <path-to-output-file>
```

### The `ios` part

```yaml
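ios:
  # PyAutoGUI server port, optional, defaults to 1412.
  serverPort: <port>

  # PyAutoGUI server URL, optional, defaults to http://localhost:1412.
  serverUrl: <url>

  # Whether to automatically hide the keyboard after input, optional, defaults to false.
  autoDismissKeyboard: <boolean>

  # iOS device mirroring configuration, used to target positions precisely.
  mirrorConfig:
    # X position of the mirror on the host display
    mirrorX: <number>
    # Y position of the mirror on the host display
    mirrorY: <number>
    # Width of the mirror
    mirrorWidth: <number>
    # Height of the mirror
    mirrorHeight: <number>

  # The launch URL or app, optional, defaults to the device's current page.
  launch: <url>

  # The path to the JSON file for outputting aiQuery/aiAssert results, optional.
  output: <path-to-output-file>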
```

### The `tasks` part

The `tasks` part is an array that defines the steps of the script. Remember to add a `-` before each step to indicate it's an array item.
@@ -308,6 +360,11 @@ midscene './scripts/**/*.yaml'
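- `--web.viewportWidth <width>`: Sets the browser viewport width, which will override the `web.viewportWidth` parameter in all script files.
- `--web.viewportHeight <height>`: Sets the browser viewport height, which will override the `web.viewportHeight` parameter in all script files.
- `--android.deviceId <device-id>`: Sets the Android device ID, which will override the `android.deviceId` parameter in all script files.
- `--ios.serverPort <port>`: Sets the iOS PyAutoGUI server port, which will override the `ios.serverPort` parameter in all script files.
- `--ios.mirrorX <x>`: Sets the iOS mirror X position, which will override the `ios.mirrorConfig.mirrorX` parameter in all script files.
- `--ios.mirrorY <y>`: Sets the iOS mirror Y position, which will override the `ios.mirrorConfig.mirrorY` parameter in all script files.
- `--ios.mirrorWidth <width>`: Sets the iOS mirror width, which will override the `ios.mirrorConfig.mirrorWidth` parameter in all script files.
- `--ios.mirrorHeight <height>`: Sets the iOS mirror height, which will override the `ios.mirrorConfig.mirrorHeight` parameter in all script files.
- `--dotenv-debug`: Sets the debug log for dotenv, disabled by default.
- `--dotenv-override`: Sets whether dotenv overrides global environment variables with the same name, disabled by default.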

112 changes: 112 additions & 0 deletions examples/README-iOS.md
@@ -0,0 +1,112 @@
# iOS YAML Automation Examples

This directory contains examples of using Midscene.js with iOS devices through YAML configuration files.

## Prerequisites

1. **PyAutoGUI Server**: You need to have a PyAutoGUI server running on your system to communicate with the iOS device.
2. **iOS Device Mirroring**: Your iOS device should be mirrored to your computer screen (using tools like QuickTime Player, AirServer, or similar).
3. **Midscene CLI**: Install the Midscene CLI tool: `npm install -g @midscene/cli`

## Configuration

### Basic iOS Configuration

```yaml
ios:
  # Server configuration (required for iOS automation)
  serverPort: 1412
  serverUrl: "http://localhost:1412"

  # Mirror configuration (required for precise targeting)
  mirrorConfig:
    mirrorX: 100 # X position of the mirrored iOS screen
    mirrorY: 200 # Y position of the mirrored iOS screen
    mirrorWidth: 414 # Width of the mirrored screen
    mirrorHeight: 896 # Height of the mirrored screen
```

### Optional Configuration

```yaml
ios:
  # Auto dismiss keyboard after input (optional)
  autoDismissKeyboard: true

  # Launch URL or app when starting (optional)
  launch: "https://example.com"

  # Output file for results (optional)
  output: "./results.json"
```

## Examples

### 1. Simple iOS Test (`ios-yaml-example.yaml`)

A basic example showing iOS automation with Safari browser interaction.

### 2. Comprehensive Example (`ios-comprehensive-example.yaml`)

A more complex example demonstrating:
- Safari navigation
- Search functionality
- Data extraction
- Settings app interaction
- Home screen operations

### 3. Configuration File (`ios-config.yaml`)

Shows how to use a configuration file to set global iOS settings for multiple test scripts.

## Running the Examples

### Single Script

```bash
# Run a single iOS automation script
midscene ./ios-yaml-example.yaml
```

### Multiple Scripts with Configuration

```bash
# Run multiple scripts using a configuration file
midscene --config ./ios-config.yaml
```

### Command Line Options

You can override iOS settings from the command line:

```bash
# Override mirror settings
midscene --ios.mirrorX 150 --ios.mirrorY 250 ./ios-yaml-example.yaml

# Override server port
midscene --ios.serverPort 1413 ./ios-yaml-example.yaml
```

## Mirror Configuration Setup

1. **Connect your iOS device** to your computer.
2. **Enable mirroring** (e.g., start a "New Movie Recording" in QuickTime Player and select your iOS device as the source).
3. **Measure the mirror position and size** on your computer screen.
4. **Update the `mirrorConfig` values** in your YAML file:
   - `mirrorX` and `mirrorY`: Top-left corner coordinates of the mirrored screen
   - `mirrorWidth` and `mirrorHeight`: Dimensions of the mirrored screen
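
These four values matter because every action on the device presumably has to be translated into a position on the host screen. The sketch below illustrates that translation with simple linear scaling. It is an illustration only: `toHostPoint`, `DeviceSize`, and `Point` are hypothetical names, and the 375x812 device size is an assumed logical resolution chosen for the example, not a value taken from Midscene.

```typescript
interface MirrorConfig {
  mirrorX: number;
  mirrorY: number;
  mirrorWidth: number;
  mirrorHeight: number;
}

// Hypothetical helper types for this sketch.
interface DeviceSize {
  width: number;
  height: number;
}

interface Point {
  x: number;
  y: number;
}

// Scale a device-space point into the mirror rectangle, then offset it by the
// mirror's top-left corner to get host-screen coordinates.
function toHostPoint(mirror: MirrorConfig, device: DeviceSize, point: Point): Point {
  return {
    x: Math.round(mirror.mirrorX + (point.x * mirror.mirrorWidth) / device.width),
    y: Math.round(mirror.mirrorY + (point.y * mirror.mirrorHeight) / device.height),
  };
}

const mirror: MirrorConfig = { mirrorX: 692, mirrorY: 161, mirrorWidth: 344, mirrorHeight: 764 };
const device: DeviceSize = { width: 375, height: 812 }; // assumed logical size

// The center of the device maps to the center of the mirror rectangle.
console.log(toHostPoint(mirror, device, { x: 187.5, y: 406 })); // { x: 864, y: 543 }
```

If the measured rectangle is off by even a few pixels, every tap shifts by the same amount, which is why testing the configuration with simple actions first is worthwhile.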

## Tips

- Make sure the PyAutoGUI server is running before executing the scripts
- Adjust the `sleep` durations based on your device's performance
- Test the mirror configuration with simple actions first
- Use descriptive prompts in `aiAction` for better AI understanding
- The `aiAssert` statements help verify that actions completed successfully

## Troubleshooting

- **Connection issues**: Verify the PyAutoGUI server is running on the specified port
- **Targeting issues**: Double-check your mirror configuration coordinates
- **Performance issues**: Increase sleep durations between actions
- **Recognition issues**: Use more descriptive text in your AI prompts
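
For targeting issues, a quick sanity check of the mirror rectangle against the host screen size can catch obvious mistakes before running a script. This is a hypothetical helper, not part of Midscene: `checkMirrorConfig` and `ScreenSize` are illustrative names, and the check assumes a single display whose top-left corner is at (0, 0).

```typescript
interface MirrorConfig {
  mirrorX: number;
  mirrorY: number;
  mirrorWidth: number;
  mirrorHeight: number;
}

// Hypothetical type: the host display's resolution in the same units as mirrorConfig.
interface ScreenSize {
  width: number;
  height: number;
}

// Returns a list of human-readable problems; an empty list means the rectangle fits.
function checkMirrorConfig(mirror: MirrorConfig, host: ScreenSize): string[] {
  const problems: string[] = [];
  if (mirror.mirrorWidth <= 0 || mirror.mirrorHeight <= 0) {
    problems.push('mirrorWidth and mirrorHeight must be positive');
  }
  if (mirror.mirrorX < 0 || mirror.mirrorY < 0) {
    problems.push('mirrorX and mirrorY must not be negative');
  }
  if (mirror.mirrorX + mirror.mirrorWidth > host.width) {
    problems.push('mirror extends past the right edge of the host screen');
  }
  if (mirror.mirrorY + mirror.mirrorHeight > host.height) {
    problems.push('mirror extends past the bottom edge of the host screen');
  }
  return problems;
}

const mirror: MirrorConfig = { mirrorX: 692, mirrorY: 161, mirrorWidth: 344, mirrorHeight: 764 };
console.log(checkMirrorConfig(mirror, { width: 1920, height: 1080 }));
console.log(checkMirrorConfig(mirror, { width: 1440, height: 900 }));
```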
38 changes: 38 additions & 0 deletions examples/ios-input-example.yaml
@@ -0,0 +1,38 @@
# iOS automation with YAML script example
# This example shows how to automate iOS devices using PyAutoGUI server

ios:
  # PyAutoGUI server configuration
  serverUrl: "http://localhost:1412"

  # Auto dismiss keyboard after input (optional)
  autoDismissKeyboard: false

  # iOS device mirroring configuration for precise location targeting
  # These values define the position and size of the mirrored device screen
  mirrorConfig:
    mirrorX: 692 # X position of iOS mirror on computer screen
    mirrorY: 161 # Y position of iOS mirror on computer screen
    mirrorWidth: 344 # Width of the mirrored iOS screen
    mirrorHeight: 764 # Height of the mirrored iOS screen (iPhone 11 Pro size)

  # Output file for aiQuery/aiAssert results (optional)
  output: "./results.json"

tasks:
  - name: Open music app and search Coldplay
    flow:
      - sleep: 5000
      - aiAction: "Open the Music app"
      - sleep: 2000
      - aiTap: "Search button"
      - sleep: 3000
      - aiInput: "Coldplay"
        locate: "Search box"
      - sleep: 2000
      - aiKeyboardPress: "Enter"
      - sleep: 3000
      - aiWaitFor: "Search results are displayed"
      - aiAction: "Play a random song"
      - sleep: 3000
      - aiAction: "Return to the Home screen"
82 changes: 82 additions & 0 deletions examples/ios-input-test.ts
@@ -0,0 +1,82 @@
#!/usr/bin/env tsx
/**
 * iOS Input Test - Demonstrates the improved iOS input functionality
 *
 * This test shows how the iOS input system now automatically handles:
 * - Element focusing by tapping
 * - Content clearing with cmd+a and delete
 * - Optimized typing with proper intervals for iOS keyboards
 * - Automatic keyboard dismissal
 *
 * The beauty is that it all happens transparently - no special iOS methods needed!
 */

import { agentFromPyAutoGUI } from '../packages/ios/src/agent';
import type { iOSDeviceOpt } from '../packages/ios/src/page';

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function testIOSInput() {
  console.log('🚀 Starting iOS Input Test...');

  // Configure for iOS device mirroring - adjust these coordinates for your setup
  const options: iOSDeviceOpt = {
    serverPort: 1412,
    autoDismissKeyboard: true,
    iOSMirrorConfig: {
      mirrorX: 692, // X position of iOS mirror on screen
      mirrorY: 161, // Y position of iOS mirror on screen
      mirrorWidth: 344, // Width of the mirrored iOS screen
      mirrorHeight: 764, // Height of the mirrored iOS screen
    },
  };

  try {
    // Create agent - this will automatically start the PyAutoGUI server if needed
    const agent = await agentFromPyAutoGUI(options);
    console.log('✅ iOS Agent created successfully');

    // Test 1: Simple text input
    console.log('\n📝 Test 1: Simple text input using aiInput');
    await agent.aiInput('Hello iOS!', 'search box or text field');
    await sleep(2000);

    // Test 2: Email input with special characters
    console.log('\n📧 Test 2: Email input with special characters');
    await agent.aiInput('[email protected]', 'email input field');
    await sleep(2000);

    // Test 3: Multi-word text with spaces
    console.log('\n📄 Test 3: Multi-word text input');
    await agent.aiInput(
      'This is a longer text message with spaces',
      'text area or message field',
    );
    await sleep(2000);

    // Test 4: Numbers and symbols
    console.log('\n🔢 Test 4: Numbers and symbols');
    await agent.aiInput('Password123!@#', 'password field');
    await sleep(2000);

    // Test 5: Clear and replace existing text
    console.log('\n🔄 Test 5: Clear and replace existing text');
    await agent.aiInput('', 'input field'); // Clear the field
    await sleep(1000);
    await agent.aiInput('New replacement text', 'same input field');
    await sleep(2000);

    console.log('\n✅ All iOS input tests completed successfully!');
    console.log('🎉 The iOS input system is working properly with:');
    console.log(' - Automatic element focusing');
    console.log(' - Smart content clearing');
    console.log(' - Optimized typing intervals');
    console.log(' - Automatic keyboard dismissal');
  } catch (error) {
    console.error('❌ iOS Input Test failed:', error);
    process.exit(1);
  }
}

// Run the test
testIOSInput().catch(console.error);