Commit 2b3e491

feat(ios): support iOS devices by using WebDriver (#1236)
* feat(ios): implement iOS device and agent management
* feat(ios): switch from simctl to idb for screenshot and gesture automation
  fix(tests): update error handling and improve test actions for iOS settings
* feat(ios): enhance text input handling and update test for keyboard behavior
* feat(ios): refactor code structure for improved readability and maintainability
* refactor(ios): remove WebDriverAgent-specific utilities and streamline iOS device management
* feat(ios): enhance iOS automation setup and testing
* feat(ios): improve key press handling for iOS keyboard interactions
* feat(android, ios): add playground scripts and improve device management for Android and iOS
* feat(ios): update playground script to enable debugging and enhance iOS key handling
* feat(docs): add iOS integration sections for WebDriverAgent in English and Chinese documentation
* test(ios): enhance iOS testing suite and improve package structure
* feat(ios): enhance URL launching capabilities and improve documentation for iOS integration
* feat(ios): refactor exec usage to execFile for improved command execution in utils and WDA backend
* feat(ios): update session capabilities to connect to active app and disable idle wait
* feat(ios): update documentation and code to use deviceId instead of udid for iOS integration
* refactor(ios): update iOS package structure tests to use default parameters
* feat(ios): refactor code structure for improved readability and maintainability
* refactor(tests): remove unused import of globalConfigManager in agent.test.ts
* test(ios): refactor code structure for improved readability and maintainability
* feat(ios): implement swipe gesture to dismiss keyboard in IOSWebDriverClient
* feat(ios): enhance iOS automation documentation and configuration examples
* feat(ios): update keyboard dismissal methods to support key names and improve functionality
* test(ios): update keyboard dismissal tests to use swipe gesture and improve error handling
* docs(ios): update integration documentation for WebDriver and Midscene, enhancing clarity and structure
1 parent ece8e90 commit 2b3e491

80 files changed: +5645 -142 lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -113,6 +113,8 @@ midscene_run/dump
 extension_output
 .cursor
 packages/android-playground/static/
+packages/ios-playground/static/
+packages/ios/static/
 packages/playground/static/
 .cursor
 CLAUDE.md

README.md

Lines changed: 2 additions & 0 deletions
@@ -44,6 +44,7 @@ English | [简体中文](./README.zh.md)
 ### Web & Mobile App & Any Interface
 - **Web Automation 🖥️**: Either integrate with [Puppeteer](https://midscenejs.com/integrate-with-puppeteer.html), [Playwright](https://midscenejs.com/integrate-with-playwright.html) or use [Bridge Mode](https://midscenejs.com/bridge-mode-by-chrome-extension.html) to control your desktop browser.
 - **Android Automation 📱**: Use [Javascript SDK](https://midscenejs.com/integrate-with-android.html) with adb to control your local Android device.
+- **iOS Automation 🍎**: Use [Javascript SDK](https://midscenejs.com/integrate-with-ios.html) with iOS Simulator to control your local iOS devices and simulators.
 - **Any Interface Automation 🌐**: Use [Javascript SDK](https://midscenejs.com/integrate-with-any-interface.html) to control your own interface.
 
 ### Tools
@@ -135,6 +136,7 @@ We would like to thank the following projects:
 - [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL) for the open-source VL model Qwen2.5-VL.
 - [scrcpy](https://github.com/Genymobile/scrcpy) and [yume-chan](https://github.com/yume-chan) allow us to control Android devices with browser.
 - [appium-adb](https://github.com/appium/appium-adb) for the javascript bridge of adb.
+- [appium-webdriveragent](https://github.com/appium/WebDriverAgent) for the javascript operate XCTest。
 - [YADB](https://github.com/ysbing/YADB) for the yadb tool which improves the performance of text input.
 - [Puppeteer](https://github.com/puppeteer/puppeteer) for browser automation and control.
 - [Playwright](https://github.com/microsoft/playwright) for browser automation and control and testing.

README.zh.md

Lines changed: 1 addition & 0 deletions
@@ -136,6 +136,7 @@ for (const record of recordList) {
 - [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL) 用于开源的视觉语言模型 Qwen2.5-VL。
 - [scrcpy](https://github.com/Genymobile/scrcpy)[yume-chan](https://github.com/yume-chan) 允许我们使用浏览器控制 Android 设备。
 - [appium-adb](https://github.com/appium/appium-adb) 用于 javascript 桥接 adb。
+- [appium-webdriveragent](https://github.com/appium/WebDriverAgent) 用于 javascript 操作 XCTest。
 - [YADB](https://github.com/ysbing/YADB) 用于提高文本输入的兼容性。
 - [Puppeteer](https://github.com/puppeteer/puppeteer) 用于浏览器自动化与控制。
 - [Playwright](https://github.com/microsoft/playwright) 用于浏览器自动化与控制和测试。

apps/android-playground/rsbuild.config.ts

Lines changed: 7 additions & 30 deletions
@@ -1,5 +1,5 @@
-import fs from 'node:fs';
 import path from 'node:path';
+import { createPlaygroundCopyPlugin } from '@midscene/shared';
 import { defineConfig } from '@rsbuild/core';
 import { pluginLess } from '@rsbuild/plugin-less';
 import { pluginNodePolyfill } from '@rsbuild/plugin-node-polyfill';
@@ -8,34 +8,6 @@ import { pluginSvgr } from '@rsbuild/plugin-svgr';
 import { pluginTypeCheck } from '@rsbuild/plugin-type-check';
 import { version as playgroundVersion } from '../../packages/playground/package.json';
 
-const copyAndroidPlaygroundStatic = () => ({
-  name: 'copy-android-playground-static',
-  setup(api) {
-    api.onAfterBuild(async () => {
-      const srcDir = path.join(__dirname, 'dist');
-      const destDir = path.join(
-        __dirname,
-        '..',
-        '..',
-        'packages',
-        'android-playground',
-        'static',
-      );
-      const faviconSrc = path.join(__dirname, 'src', 'favicon.ico');
-      const faviconDest = path.join(destDir, 'favicon.ico');
-
-      await fs.promises.mkdir(destDir, { recursive: true });
-      // Copy directory contents recursively
-      await fs.promises.cp(srcDir, destDir, { recursive: true });
-      // Copy favicon
-      await fs.promises.copyFile(faviconSrc, faviconDest);
-
-      console.log(`Copied build artifacts to ${destDir}`);
-      console.log(`Copied favicon to ${faviconDest}`);
-    });
-  },
-});
-
 export default defineConfig({
   environments: {
     web: {
@@ -82,7 +54,12 @@ export default defineConfig({
     pluginNodePolyfill(),
     pluginLess(),
     pluginSvgr(),
-    copyAndroidPlaygroundStatic(),
+    createPlaygroundCopyPlugin(
+      path.join(__dirname, 'dist'),
+      path.join(__dirname, '../../packages/android-playground/static'),
+      'copy-android-playground-static',
+      path.join(__dirname, 'src', 'favicon.ico'),
+    ),
     pluginTypeCheck(),
   ],
 });
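
Note on the refactor above: the inline `copyAndroidPlaygroundStatic` plugin is replaced by a call to `createPlaygroundCopyPlugin` from `@midscene/shared`, whose implementation is not part of the visible diff. The following is a minimal sketch of what such a helper could look like, reconstructed from the removed inline code and from the four-argument call site (source directory, destination directory, plugin name, favicon path). The `AfterBuildApi` and `CopyPlugin` interfaces are assumptions introduced only so the sketch type-checks without `@rsbuild/core`; the real helper may differ.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Assumed minimal shape of the plugin API used by the removed inline plugin.
// It is declared locally so this sketch does not depend on @rsbuild/core.
interface AfterBuildApi {
  onAfterBuild(callback: () => Promise<void>): void;
}

interface CopyPlugin {
  name: string;
  setup(api: AfterBuildApi): void;
}

// Hypothetical reimplementation of the shared helper. The parameter order
// mirrors the call sites in the diff; the body mirrors the removed plugin.
function createPlaygroundCopyPlugin(
  srcDir: string,
  destDir: string,
  pluginName: string,
  faviconSrc: string,
): CopyPlugin {
  return {
    name: pluginName,
    setup(api) {
      // Register the copy step; nothing touches the file system until
      // the build tool invokes the callback after a build.
      api.onAfterBuild(async () => {
        const faviconDest = path.join(destDir, 'favicon.ico');

        await fs.promises.mkdir(destDir, { recursive: true });
        // Copy directory contents recursively
        await fs.promises.cp(srcDir, destDir, { recursive: true });
        // Copy favicon
        await fs.promises.copyFile(faviconSrc, faviconDest);

        console.log(`Copied build artifacts to ${destDir}`);
        console.log(`Copied favicon to ${faviconDest}`);
      });
    },
  };
}
```

Passing the plugin name as a parameter is what lets one build register the helper more than once with different destinations, as the `apps/playground` config in this commit does.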

apps/playground/rsbuild.config.ts

Lines changed: 13 additions & 30 deletions
@@ -1,5 +1,5 @@
-import fs from 'node:fs';
 import path from 'node:path';
+import { createPlaygroundCopyPlugin } from '@midscene/shared';
 import { defineConfig } from '@rsbuild/core';
 import { pluginLess } from '@rsbuild/plugin-less';
 import { pluginNodePolyfill } from '@rsbuild/plugin-node-polyfill';
@@ -8,41 +8,24 @@ import { pluginSvgr } from '@rsbuild/plugin-svgr';
 import { pluginTypeCheck } from '@rsbuild/plugin-type-check';
 import { version as playgroundVersion } from '../../packages/playground/package.json';
 
-const copyWebPlaygroundStatic = () => ({
-  name: 'copy-playground-static',
-  setup(api) {
-    api.onAfterBuild(async () => {
-      const srcDir = path.join(__dirname, 'dist');
-      const destDir = path.join(
-        __dirname,
-        '..',
-        '..',
-        'packages',
-        'playground',
-        'static',
-      );
-      const faviconSrc = path.join(__dirname, 'src', 'favicon.ico');
-      const faviconDest = path.join(destDir, 'favicon.ico');
-
-      await fs.promises.mkdir(destDir, { recursive: true });
-      // Copy directory contents recursively
-      await fs.promises.cp(srcDir, destDir, { recursive: true });
-      // Copy favicon
-      await fs.promises.copyFile(faviconSrc, faviconDest);
-
-      console.log(`Copied build artifacts to ${destDir}`);
-      console.log(`Copied favicon to ${faviconDest}`);
-    });
-  },
-});
-
 export default defineConfig({
   plugins: [
     pluginReact(),
     pluginLess(),
     pluginNodePolyfill(),
     pluginSvgr(),
-    copyWebPlaygroundStatic(),
+    createPlaygroundCopyPlugin(
+      path.join(__dirname, 'dist'),
+      path.join(__dirname, '../../packages/playground/static'),
+      'copy-playground-static',
+      path.join(__dirname, 'src', 'favicon.ico'),
+    ),
+    createPlaygroundCopyPlugin(
+      path.join(__dirname, 'dist'),
+      path.join(__dirname, '../../packages/ios/static'),
+      'copy-ios-playground-static',
+      path.join(__dirname, 'src', 'favicon.ico'),
+    ),
     pluginTypeCheck(),
   ],
   resolve: {

apps/site/docs/en/automate-with-scripts-in-yaml.mdx

Lines changed: 42 additions & 1 deletion
@@ -80,6 +80,22 @@ tasks:
       - aiAssert: The results show weather information
 ```
 
+Or, to drive an iOS device automation task (requires WebDriverAgent configuration):
+
+```yaml
+ios:
+  # launch: com.apple.mobilesafari
+  wdaPort: 8100
+
+tasks:
+  - name: Search for weather
+    flow:
+      - ai: Open the browser and navigate to bing.com
+      - ai: Search for "today's weather"
+      - sleep: 3000
+      - aiAssert: The results show weather information
+```
+
 Run the script:
 
 ```bash
@@ -94,7 +110,7 @@ You will see the script's execution progress and the visual report file.
 
 Script files use YAML format to describe automation tasks. It defines the target to be manipulated (like a webpage or an Android app) and the series of steps to perform.
 
-A standard `.yaml` script file includes a `web` or `android` section to configure the environment, and a `tasks` section to define the automation tasks.
+A standard `.yaml` script file includes a `web`, `android`, or `ios` section to configure the environment, and a `tasks` section to define the automation tasks.
 
 ```yaml
 web:
@@ -177,6 +193,29 @@ android:
   output: <path-to-output-file>
 ```
 
+### The `ios` part
+
+```yaml
+ios:
+  # WebDriverAgent port, optional, defaults to 8100.
+  wdaPort: <port>
+
+  # WebDriverAgent host address, optional, defaults to localhost.
+  wdaHost: <host>
+
+  # Whether to auto dismiss keyboard, optional, defaults to false.
+  autoDismissKeyboard: <boolean>
+
+  # Launch URL or app bundle ID, optional, defaults to the device's current page.
+  launch: <url-or-bundle-id>
+
+  # The path to the JSON file for outputting aiQuery/aiAssert results, optional.
+  output: <path-to-output-file>
+
+  # Whether to save log content to a JSON file, optional, defaults to `false`. If true, saves to `unstableLogContent.json`. If a string, saves to the specified path. The log content structure may change in the future.
+  unstableLogContent: <boolean | path-to-unstable-log-file>
+```
+
 ### The `tasks` part
 
 The `tasks` part is an array that defines the steps of the script. Remember to add a `-` before each step to indicate it's an array item.
@@ -352,6 +391,8 @@ The command-line tool provides several options to control the execution behavior
 - `--web.viewportWidth <width>`: Sets the browser viewport width, which will override the `web.viewportWidth` parameter in all script files.
 - `--web.viewportHeight <height>`: Sets the browser viewport height, which will override the `web.viewportHeight` parameter in all script files.
 - `--android.deviceId <device-id>`: Sets the Android device ID, which will override the `android.deviceId` parameter in all script files.
+- `--ios.wdaPort <port>`: Sets the WebDriverAgent port, which will override the `ios.wdaPort` parameter in all script files.
+- `--ios.wdaHost <host>`: Sets the WebDriverAgent host address, which will override the `ios.wdaHost` parameter in all script files.
 - `--dotenv-debug`: Sets the debug log for dotenv, disabled by default.
 - `--dotenv-override`: Sets whether dotenv overrides global environment variables with the same name, disabled by default.
 
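
Note on the `ios` options documented in this file: the docs state the defaults (`wdaPort` 8100, `wdaHost` localhost, `autoDismissKeyboard` false, `unstableLogContent` false) and that `--ios.wdaPort` and `--ios.wdaHost` override the values in all script files. The sketch below illustrates that precedence (CLI flag, then YAML value, then default). All type and function names here (`IosYamlConfig`, `IosCliOverrides`, `resolveIosConfig`, `wdaBaseUrl`) are hypothetical and are not taken from the Midscene source; the `http://` scheme in `wdaBaseUrl` is likewise an assumption.

```typescript
// Fields mirror the documented `ios` YAML section; the type name is hypothetical.
interface IosYamlConfig {
  wdaPort?: number;
  wdaHost?: string;
  autoDismissKeyboard?: boolean;
  launch?: string;
  output?: string;
  unstableLogContent?: boolean | string;
}

// Only the two options that the docs list as CLI overrides.
interface IosCliOverrides {
  wdaPort?: number;
  wdaHost?: string;
}

// Same fields, with every documented default filled in.
interface ResolvedIosConfig extends IosYamlConfig {
  wdaPort: number;
  wdaHost: string;
  autoDismissKeyboard: boolean;
  unstableLogContent: boolean | string;
}

// Precedence: CLI flag, then YAML value, then documented default.
function resolveIosConfig(
  yaml: IosYamlConfig = {},
  cli: IosCliOverrides = {},
): ResolvedIosConfig {
  return {
    ...yaml,
    wdaPort: cli.wdaPort ?? yaml.wdaPort ?? 8100,
    wdaHost: cli.wdaHost ?? yaml.wdaHost ?? 'localhost',
    autoDismissKeyboard: yaml.autoDismissKeyboard ?? false,
    unstableLogContent: yaml.unstableLogContent ?? false,
  };
}

// Assumed address at which WebDriverAgent would be reached (scheme is a guess).
function wdaBaseUrl(config: ResolvedIosConfig): string {
  return `http://${config.wdaHost}:${config.wdaPort}`;
}

// Usage: the YAML file sets port 8100, the CLI flag overrides it with 8200.
const exampleConfig = resolveIosConfig(
  { wdaPort: 8100, launch: 'com.apple.mobilesafari' },
  { wdaPort: 8200 },
);
console.log(wdaBaseUrl(exampleConfig)); // prints "http://localhost:8200"
```

Using `??` rather than `||` keeps an explicit `false` from the YAML file from being replaced by a default.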

apps/site/docs/en/integrate-with-android.mdx

Lines changed: 14 additions & 10 deletions
@@ -17,11 +17,13 @@ Integrate Vitest for testing: [https://github.com/web-infra-dev/midscene-example
 
 <SetupEnv />
 
-## Step 1. Install dependencies
+## Integrate Midscene
+
+### Step 1: Install dependencies
 
 <PackageManagerTabs command="install @midscene/android --save-dev" />
 
-## Step 2. Write scripts
+### Step 2: Write scripts
 
 Let's take a simple example: search for headphones on eBay using the browser in the Android device. (Of course, you can also use any other apps on the Android device.)
 
@@ -72,7 +74,7 @@ Promise.resolve(
 );
 ```
 
-## Step 3. Run
+### Step 3: Run
 
 Using `tsx` to run
 
@@ -96,11 +98,13 @@ After a while, you will see the following output:
 ]
 ```
 
-## Step 4: View the report
+### Step 4: View the report
 
 After the above command executes successfully, the console will output: `Midscene - report file updated: /path/to/report/some_id.html`. You can open this file in a browser to view the report.
 
-## `AndroidDevice` constructor
+## Constructor and Interface
+
+### `AndroidDevice` Constructor
 
 The AndroidDevice constructor supports the following parameters:
 
@@ -114,11 +118,11 @@ The AndroidDevice constructor supports the following parameters:
 - `imeStrategy?: 'always-yadb' | 'yadb-for-non-ascii'` - Optional, when should Midscene invoke [yadb](https://github.com/ysbing/YADB) to input texts. (Default: 'always-yadb')
 - `displayId?: number` - Optional, the display id to use. (Default: undefined, means use the current display)
 
-## More interfaces in AndroidAgent
+### Additional Android Agent Interfaces
 
 Except the common agent interfaces in [API Reference](./api.mdx), AndroidAgent also provides some other interfaces:
 
-### `agent.launch()`
+#### `agent.launch()`
 
 Launch a webpage or native page.
 
@@ -149,7 +153,7 @@ await agent.launch('com.android.settings'); // open a native page
 await agent.launch('com.android.settings/.Settings'); // open a native page
 ```
 
-### `agentFromAdbDevice()`
+#### `agentFromAdbDevice()`
 
 Create a AndroidAgent from a connected adb device.
 
@@ -180,7 +184,7 @@ const agent = await agentFromAdbDevice('s4ey59'); // create a AndroidAgent from
 const agent = await agentFromAdbDevice(); // no deviceId, use the first connected device
 ```
 
-### `getConnectedDevices()`
+#### `getConnectedDevices()`
 
 Get all connected Android devices.
 
@@ -216,7 +220,7 @@ console.log(devices);
 const agent = await agentFromAdbDevice(devices[0].udid);
 ```
 
-## Provide custom actions
+## Extending Custom Interaction Actions
 
 Use the `customActions` option to extend the agent's action space with your own actions defined via `defineAction`. When provided, these actions will be appended to the built-in ones so the agent can call them during planning.
 

apps/site/docs/en/integrate-with-any-interface.mdx

Lines changed: 2 additions & 0 deletions
@@ -30,6 +30,8 @@ We have prepared a demo project for you to learn how to define your own interfac
 
 * [Android (adb) Agent](https://github.com/web-infra-dev/midscene/blob/main/packages/android/src/device.ts) - This is the Android (adb) Agent for Midscene that implements this feature
 
+* [iOS (WebDriverAgent) Agent](https://github.com/web-infra-dev/midscene/blob/main/packages/ios/src/device.ts) - This is the iOS (WebDriverAgent) Agent for Midscene that implements this feature
+
 There are also some community projects that use this feature:
 
 * [midscene-ios](https://github.com/lhuanyu/midscene-ios) - A project driving the OSX "iPhone Mirroring" app with Midscene
