feat(ios): support iOS devices by using WebDriver (#1236)
* feat(ios): implement iOS device and agent management
* feat(ios): switch from simctl to idb for screenshot and gesture automation
fix(tests): update error handling and improve test actions for iOS settings
* feat(ios): enhance text input handling and update test for keyboard behavior
* feat(ios): refactor code structure for improved readability and maintainability
* refactor(ios): remove WebDriverAgent-specific utilities and streamline iOS device management
* feat(ios): enhance iOS automation setup and testing
* feat(ios): improve key press handling for iOS keyboard interactions
* feat(android, ios): add playground scripts and improve device management for Android and iOS
* feat(ios): update playground script to enable debugging and enhance iOS key handling
* feat(docs): add iOS integration sections for WebDriverAgent in English and Chinese documentation
* test(ios): enhance iOS testing suite and improve package structure
* feat(ios): enhance URL launching capabilities and improve documentation for iOS integration
* feat(ios): refactor exec usage to execFile for improved command execution in utils and WDA backend
* feat(ios): update session capabilities to connect to active app and disable idle wait
* feat(ios): update documentation and code to use deviceId instead of udid for iOS integration
* refactor(ios): update iOS package structure tests to use default parameters
* feat(ios): refactor code structure for improved readability and maintainability
* refactor(tests): remove unused import of globalConfigManager in agent.test.ts
* test(ios): refactor code structure for improved readability and maintainability
* feat(ios): implement swipe gesture to dismiss keyboard in IOSWebDriverClient
* feat(ios): enhance iOS automation documentation and configuration examples
* feat(ios): update keyboard dismissal methods to support key names and improve functionality
* test(ios): update keyboard dismissal tests to use swipe gesture and improve error handling
* docs(ios): update integration documentation for WebDriver and Midscene, enhancing clarity and structure
README.md: 2 additions, 0 deletions
@@ -44,6 +44,7 @@ English | [简体中文](./README.zh.md)
 ### Web & Mobile App & Any Interface
 - **Web Automation 🖥️**: Either integrate with [Puppeteer](https://midscenejs.com/integrate-with-puppeteer.html), [Playwright](https://midscenejs.com/integrate-with-playwright.html) or use [Bridge Mode](https://midscenejs.com/bridge-mode-by-chrome-extension.html) to control your desktop browser.
 - **Android Automation 📱**: Use [Javascript SDK](https://midscenejs.com/integrate-with-android.html) with adb to control your local Android device.
+- **iOS Automation 🍎**: Use [Javascript SDK](https://midscenejs.com/integrate-with-ios.html) with iOS Simulator to control your local iOS devices and simulators.
 - **Any Interface Automation 🌐**: Use [Javascript SDK](https://midscenejs.com/integrate-with-any-interface.html) to control your own interface.
 
 ### Tools
@@ -135,6 +136,7 @@ We would like to thank the following projects:
 - [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL) for the open-source VL model Qwen2.5-VL.
 - [scrcpy](https://github.com/Genymobile/scrcpy) and [yume-chan](https://github.com/yume-chan) allow us to control Android devices with browser.
 - [appium-adb](https://github.com/appium/appium-adb) for the javascript bridge of adb.
+- [appium-webdriveragent](https://github.com/appium/WebDriverAgent) for the javascript operate XCTest。
 - [YADB](https://github.com/ysbing/YADB) for the yadb tool which improves the performance of text input.
 - [Puppeteer](https://github.com/puppeteer/puppeteer) for browser automation and control.
 - [Playwright](https://github.com/microsoft/playwright) for browser automation and control and testing.
apps/site/docs/en/automate-with-scripts-in-yaml.mdx: 42 additions, 1 deletion
@@ -80,6 +80,22 @@ tasks:
       - aiAssert: The results show weather information
 ```
 
+Or, to drive an iOS device automation task (requires WebDriverAgent configuration):
+
+```yaml
+ios:
+  # launch: com.apple.mobilesafari
+  wdaPort: 8100
+
+tasks:
+  - name: Search for weather
+    flow:
+      - ai: Open the browser and navigate to bing.com
+      - ai: Search for "today's weather"
+      - sleep: 3000
+      - aiAssert: The results show weather information
+```
+
 Run the script:
 
 ```bash
@@ -94,7 +110,7 @@ You will see the script's execution progress and the visual report file.
 
 Script files use YAML format to describe automation tasks. It defines the target to be manipulated (like a webpage or an Android app) and the series of steps to perform.
 
-A standard `.yaml` script file includes a `web` or `android` section to configure the environment, and a `tasks` section to define the automation tasks.
+A standard `.yaml` script file includes a `web`, `android`, or `ios` section to configure the environment, and a `tasks` section to define the automation tasks.
 
 ```yaml
 web:
@@ -177,6 +193,29 @@ android:
   output: <path-to-output-file>
 ```
 
+### The `ios` part
+
+```yaml
+ios:
+  # WebDriverAgent port, optional, defaults to 8100.
+  wdaPort: <port>
+
+  # WebDriverAgent host address, optional, defaults to localhost.
+  wdaHost: <host>
+
+  # Whether to auto dismiss keyboard, optional, defaults to false.
+  autoDismissKeyboard: <boolean>
+
+  # Launch URL or app bundle ID, optional, defaults to the device's current page.
+  launch: <url-or-bundle-id>
+
+  # The path to the JSON file for outputting aiQuery/aiAssert results, optional.
+  output: <path-to-output-file>
+
+  # Whether to save log content to a JSON file, optional, defaults to `false`. If true, saves to `unstableLogContent.json`. If a string, saves to the specified path. The log content structure may change in the future.
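The optional `ios` fields and their documented defaults (`wdaPort` 8100, `wdaHost` localhost, `autoDismissKeyboard` false) can be pictured with a minimal TypeScript sketch. The `IosYamlOptions` type and the `withIosDefaults` helper are hypothetical names used only for illustration; they are not part of Midscene's API, and the real loader may resolve defaults differently.

```typescript
// Hypothetical shape of the `ios` section; field names mirror the YAML keys in the diff.
interface IosYamlOptions {
  wdaPort?: number;
  wdaHost?: string;
  autoDismissKeyboard?: boolean;
  launch?: string; // URL or app bundle ID
  output?: string; // path for aiQuery/aiAssert results
}

// Illustrative helper (an assumption, not a Midscene function):
// fills in the defaults stated in the YAML comments and keeps everything else as given.
function withIosDefaults(options: IosYamlOptions = {}): IosYamlOptions {
  return {
    ...options,
    wdaPort: options.wdaPort ?? 8100,
    wdaHost: options.wdaHost ?? 'localhost',
    autoDismissKeyboard: options.autoDismissKeyboard ?? false,
  };
}

// Only `launch` is set here, so the three defaulted fields are filled in.
const resolved = withIosDefaults({ launch: 'com.apple.mobilesafari' });
console.log(resolved);
```

`launch` and `output` have no documented default value, so the sketch leaves them unset when they are omitted.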
 Let's take a simple example: search for headphones on eBay using the browser in the Android device. (Of course, you can also use any other apps on the Android device.)
 
@@ -72,7 +74,7 @@ Promise.resolve(
 );
 ```
 
-## Step 3. Run
+### Step 3: Run
 
 Using `tsx` to run
 
@@ -96,11 +98,13 @@ After a while, you will see the following output:
 ]
 ```
 
-## Step 4: View the report
+### Step 4: View the report
 
 After the above command executes successfully, the console will output: `Midscene - report file updated: /path/to/report/some_id.html`. You can open this file in a browser to view the report.
 
-## `AndroidDevice` constructor
+## Constructor and Interface
+
+### `AndroidDevice` Constructor
 
 The AndroidDevice constructor supports the following parameters:
 
@@ -114,11 +118,11 @@ The AndroidDevice constructor supports the following parameters:
 - `imeStrategy?: 'always-yadb' | 'yadb-for-non-ascii'` - Optional, when should Midscene invoke [yadb](https://github.com/ysbing/YADB) to input texts. (Default: 'always-yadb')
 - `displayId?: number` - Optional, the display id to use. (Default: undefined, means use the current display)
 
-## More interfaces in AndroidAgent
+### Additional Android Agent Interfaces
 
 Except the common agent interfaces in [API Reference](./api.mdx), AndroidAgent also provides some other interfaces:
 
-### `agent.launch()`
+#### `agent.launch()`
 
 Launch a webpage or native page.
 
@@ -149,7 +153,7 @@ await agent.launch('com.android.settings'); // open a native page
 await agent.launch('com.android.settings/.Settings'); // open a native page
 ```
 
-### `agentFromAdbDevice()`
+#### `agentFromAdbDevice()`
 
 Create a AndroidAgent from a connected adb device.
 
@@ -180,7 +184,7 @@ const agent = await agentFromAdbDevice('s4ey59'); // create a AndroidAgent from
 const agent = await agentFromAdbDevice(); // no deviceId, use the first connected device
Use the `customActions` option to extend the agent's action space with your own actions defined via `defineAction`. When provided, these actions will be appended to the built-in ones so the agent can call them during planning.
apps/site/docs/en/integrate-with-any-interface.mdx: 2 additions, 0 deletions
@@ -30,6 +30,8 @@ We have prepared a demo project for you to learn how to define your own interfac
 
 * [Android (adb) Agent](https://github.com/web-infra-dev/midscene/blob/main/packages/android/src/device.ts) - This is the Android (adb) Agent for Midscene that implements this feature
 
+* [iOS (WebDriverAgent) Agent](https://github.com/web-infra-dev/midscene/blob/main/packages/ios/src/device.ts) - This is the iOS (WebDriverAgent) Agent for Midscene that implements this feature
+
 There are also some community projects that use this feature:
 
 * [midscene-ios](https://github.com/lhuanyu/midscene-ios) - A project driving the OSX "iPhone Mirroring" app with Midscene