Commit 8692277

quanru and yuyutaotao authored
docs(site): Add support for iOS automation and enhance documentation (#1253)
* docs(site): refactor code structure for improved readability and maintainability
* docs(changelog): update changelog for v0.29 release with iOS support and enhancements
* docs(site): add support for iOS automation and enhance documentation
* docs(changelog): separate Qwen3-VL model adaptation into its own section for clarity
* docs(core): update changelog
* docs(site): update automation documentation for iOS and Android, including demo projects and limitations
* test(puppeteer): remove exclusive focus from Sauce Demo test
* fix(docs): standardize capitalization in changelog and rspress configuration

---------

Co-authored-by: yutao <[email protected]>
1 parent a76235e commit 8692277

28 files changed, +1116 -324 lines changed

README.md

Lines changed: 3 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -12,7 +12,7 @@ English | [简体中文](./README.zh.md)
1212
</div>
1313

1414
<p align="center">
15-
Open-source AI Operator for Web, Android, Automation & Testing
15+
Open-source AI Operator for Web, Android, iOS, Automation & Testing
1616
</p>
1717

1818
<p align="center">
@@ -61,6 +61,7 @@ English | [简体中文](./README.zh.md)
6161

6262
- **[Chrome Extension](https://midscenejs.com/quick-experience.html)**: Start in-browser experience immediately through [the Chrome Extension](https://midscenejs.com/quick-experience.html), without writing any code.
6363
- **[Android Playground](https://midscenejs.com/quick-experience-with-android.html)**: There is also a built-in Android playground to control your local Android device.
64+
- **[iOS Playground](https://midscenejs.com/quick-experience-with-ios.html)**: There is also a built-in iOS playground to control your local iOS device.
6465

6566
## ✨ Model Choices
6667

@@ -148,7 +149,7 @@ If you use Midscene.js in your research or project, please cite:
148149
```bibtex
149150
@software{Midscene.js,
150151
author = {Xiao Zhou, Tao Yu, YiBing Lin},
151-
title = {Midscene.js: Your AI Operator for Web, Android, Automation & Testing.},
152+
title = {Midscene.js: Your AI Operator for Web, Android, iOS, Automation & Testing.},
152153
year = {2025},
153154
publisher = {GitHub},
154155
url = {https://github.com/web-infra-dev/midscene}

README.zh.md

Lines changed: 3 additions & 1 deletion
@@ -44,6 +44,7 @@
4444
### Web & Mobile App & Any Interface
4545
- **Web Automation 🖥️**: You can [integrate with Puppeteer](https://midscenejs.com/integrate-with-puppeteer.html), [integrate with Playwright](https://midscenejs.com/integrate-with-playwright.html), or use [Bridge Mode](https://midscenejs.com/bridge-mode-by-chrome-extension.html) to control your desktop browser.
4646
- **Android Automation 📱**: Use the [Javascript SDK](https://midscenejs.com/integrate-with-android.html) with adb to control your local Android device.
47+
- **iOS Automation 🍎**: Use the [Javascript SDK](https://midscenejs.com/zh/integrate-with-ios.html) with WebDriverAgent to control your local iOS device.
4748
- **Any Interface Automation 🌐**: Use the [Javascript SDK](https://midscenejs.com/integrate-with-any-interface.html) to control your own interface.
4849

4950
### Tools
@@ -60,6 +61,7 @@
6061

6162
- **[Chrome Extension](https://midscenejs.com/zh/quick-experience.html)**: Start experiencing it immediately through the [Chrome Extension](https://midscenejs.com/zh/quick-experience.html), without writing any code.
6263
- **[Android Playground](https://midscenejs.com/zh/quick-experience-with-android.html)**: The built-in Android Playground can control your local Android device.
64+
- **[iOS Playground](https://midscenejs.com/zh/quick-experience-with-ios.html)**: The built-in iOS Playground can control your local iOS device.
6365

6466
## ✨ Choose an AI Model
6567

@@ -148,7 +150,7 @@ for (const record of recordList) {
148150
```bibtex
149151
@software{Midscene.js,
150152
author = {Xiao Zhou, Tao Yu, YiBing Lin},
151-
title = {Midscene.js: Your AI Operator for Web, Android, Automation & Testing.},
153+
title = {Midscene.js: Your AI Operator for Web, Android, iOS, Automation & Testing.},
152154
year = {2025},
153155
publisher = {GitHub},
154156
url = {https://github.com/web-infra-dev/midscene}

apps/playground/src/App.tsx

Lines changed: 1 addition & 1 deletion
@@ -141,7 +141,7 @@ export default function App() {
141141
showVersionInfo: true,
142142
enableScrollToBottom: true,
143143
serverMode: true,
144-
showEnvConfigReminder: false,
144+
showEnvConfigReminder: true,
145145
}}
146146
branding={{
147147
title: 'Playground',

apps/site/docs/en/blog-support-android-automation.mdx

Lines changed: 0 additions & 30 deletions
@@ -130,33 +130,3 @@ You can use the playground to experience the Android automation without any code
130130
After the experience, you can integrate with the Android device by javascript code. Please refer to [Integrate with Android(adb)](./integrate-with-android) for more details.
131131

132132
If you prefer the yaml file for automation scripts, please refer to [Automate with scripts in yaml](./automate-with-scripts-in-yaml).
133-
134-
### Demo projects
135-
136-
We have prepared a demo project for javascript SDK:
137-
138-
[JavaScript demo project](https://github.com/web-infra-dev/midscene-example/blob/main/android/javascript-sdk-demo)
139-
140-
If you want to use the automation for testing purpose, you can use the javascript with vitest. We have setup a demo project for you to see how it works:
141-
142-
[Vitest demo project](https://github.com/web-infra-dev/midscene-example/blob/main/android/vitest-demo)
143-
144-
You can also write the automation scripts by yaml file:
145-
146-
[YAML demo project](https://github.com/web-infra-dev/midscene-example/blob/main/android/yaml-scripts-demo)
147-
148-
## Limitations
149-
150-
1. Element location caching is not supported due to lack of XPath support. However, planning cache (for `.ai` and `.aiAction` methods) is fully supported to improve execution efficiency.
151-
2. LLMs like gpt-4o or deepseek are not supported. Only some known vl models with visual grounding ability are supported for now. If you want to introduce other vl models, please let us know.
152-
3. The performance is not good enough for now. We are still working on it.
153-
4. The vl model may not perform well on `.aiQuery` and `.aiAssert`. We will give a way to switch model for different kinds of tasks.
154-
5. Due to some security restrictions, you may got a blank screenshot for the password input and Midscene will not be able to work for that.
155-
156-
## Credits
157-
158-
We would like to thank the following projects:
159-
160-
- [scrcpy](https://github.com/Genymobile/scrcpy) and [yume-chan](https://github.com/yume-chan) allow us to control Android devices with browser.
161-
- [appium-adb](https://github.com/appium/appium-adb) for the javascript bridge of adb.
162-
- [YADB](https://github.com/ysbing/YADB) for the yadb tool which improves the performance of text input.
Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
1+
# Support iOS automation
2+
3+
Starting with Midscene v0.29, we are happy to announce support for iOS automation. The era of AI-driven iOS automation is here!
4+
5+
## Showcases
6+
7+
### Auto-like tweets
8+
9+
Open Twitter and auto-like the first tweet by [@midscene_ai](https://x.com/midscene_ai).
10+
11+
<video src="https://lf3-static.bytednsdoc.com/obj/eden-cn/nupipfups/Midscene/ios-twitter.mp4" controls/>
12+
13+
## Suitable for all apps
14+
15+
For developers, all you need is a WebDriver server and a visual-language model (vl-model) service. Everything else is ready!
16+
17+
Behind the scenes, we utilize the visual grounding capabilities of vl-model to locate target elements on the screen. So, regardless of whether it's a native iOS app, a Safari web page, or a hybrid app with a WebView, it makes no difference. Developers can write automation scripts without the burden of worrying about the technology stack of the app.
18+
19+
## With all the power of Midscene
20+
21+
When using Midscene for web automation, our users love tools like the playground and reports. Now, we bring the same power to iOS automation!
22+
23+
### Use the playground to run automation without any code
24+
25+
<video src="https://lf3-static.bytednsdoc.com/obj/eden-cn/nupipfups/Midscene/ios-playground-demo.mp4" controls/>
26+
27+
### Use the report to replay the whole process
28+
29+
<video src="https://lf3-static.bytednsdoc.com/obj/eden-cn/nupipfups/Midscene/ios-twitter.mp4" controls/>
30+
31+
### Write the automation scripts by YAML file
32+
33+
Open Safari on an iOS device, search for content, and extract information.
34+
35+
```yaml
# Open Safari browser on iOS device, search for content and extract information

ios:
  deviceId: "iPhone"
  bundleId: "com.apple.mobilesafari"

tasks:
  - name: search content
    flow:
      - aiAction: tap address bar
      - aiAction: input 'Midscene AI automation'
      - aiAction: tap search button
      - sleep: 3000
      - aiAction: scroll down 500px

  - name: extract search results
    flow:
      - aiQuery: >
          {title: string, url: string, description: string}[],
          return search result titles, links and descriptions
        name: searchResults

  - name: verify page elements
    flow:
      - aiAssert: there is a search results list on the page
```
62+
63+
### Use the JavaScript SDK
64+
65+
Use the JavaScript SDK to write the automation in code.
66+
67+
```ts
import { IOSAgent, IOSDevice } from '@midscene/ios';
import "dotenv/config"; // read environment variables from .env file

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));
Promise.resolve(
  (async () => {
    // 👀 initialize iOS device
    const device = new IOSDevice({
      deviceId: 'iPhone',
      bundleId: 'com.apple.mobilesafari'
    });

    // 👀 initialize Midscene agent
    const agent = new IOSAgent(device, {
      aiActionContext:
        'If any permission popup appears, tap allow. If login page pops up, skip it.',
    });

    await device.connect();
    await device.launchApp();

    await sleep(3000);

    // 👀 tap address bar and input search keywords
    await agent.aiAction('tap address bar and input "Midscene automation"');

    // 👀 perform search
    await agent.aiAction('tap search button');

    // 👀 wait for loading to complete
    await agent.aiWaitFor("there is at least one search result on the page");
    // or you may use a plain sleep:
    // await sleep(5000);

    // 👀 understand page content, find search results
    const results = await agent.aiQuery(
      "{title: string, url: string}[], find titles and links in search results list"
    );
    console.log("search results", results);

    // 👀 assert by AI
    await agent.aiAssert("relevant search results are displayed on the page");
  })()
);

```
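The script above prefers `aiWaitFor` over a fixed `sleep`. The general idea, waiting on a condition with a timeout instead of pausing for a fixed time, can be sketched as follows. This is not Midscene's implementation of `aiWaitFor`; `waitUntil` and its `timeoutMs` / `intervalMs` options are hypothetical names used only for illustration.

```typescript
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Hypothetical helper: poll `check` until it returns true,
// or throw once the next poll would fall past the timeout.
async function waitUntil(
  check: () => Promise<boolean> | boolean,
  { timeoutMs = 15000, intervalMs = 500 } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`condition not met within ${timeoutMs}ms`);
    }
    await sleep(intervalMs);
  }
}
```

A condition-based wait returns as soon as the page is ready and fails loudly when it never is, whereas a plain `sleep(5000)` always costs five seconds and may still be too short.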
114+
115+
### Two API styles for interaction
116+
117+
The auto-planning style:
118+
119+
```javascript
await agent.ai('tap address bar and input "Midscene automation", then search');
```
122+
123+
The instant action style:
124+
125+
```javascript
await agent.aiTap('address bar');
await agent.aiInput('Midscene automation', 'address bar');
await agent.aiTap('search button');
```
130+
131+
## Quick start
132+
133+
You can use the playground to try iOS automation without writing any code. Please refer to [Quick experience with iOS](./quick-experience-with-ios) for more details.
134+
135+
After trying it out, you can integrate with an iOS device using JavaScript code. Please refer to [Integrate with iOS (WebDriverAgent)](./integrate-with-ios) for more details.
136+
137+
If you prefer YAML files for automation scripts, please refer to [Automate with scripts in yaml](./automate-with-scripts-in-yaml).

apps/site/docs/en/changelog.mdx

Lines changed: 20 additions & 0 deletions
@@ -2,6 +2,26 @@
22

33
> For the complete changelog, please refer to: [Midscene Releases](https://github.com/web-infra-dev/midscene/releases)
44
5+
## v0.29 - 📱 iOS platform support added
6+
7+
### 🚀 iOS platform support added
8+
The biggest highlight of v0.29 is the official introduction of iOS platform support! You can now connect to and automate iOS devices through WebDriver, extending Midscene's AI automation capabilities to the Apple ecosystem. For details, see [Support iOS automation](./blog-support-ios-automation).
9+
10+
### Qwen3-VL model adaptation
11+
12+
We've adapted Midscene to the latest Qwen model, `Qwen3-VL`, giving developers faster and more accurate visual understanding capabilities. See [Choose an AI model](./choose-a-model).
13+
14+
### 🤖 AI core capability enhancement
15+
16+
- **UI-TARS Model Performance Optimization**: Optimized aiAction planning, improved dialogue history management, and provided better context awareness capabilities
17+
- **AI Assertion and Action Optimization**: We updated the prompt for `aiAssert` and optimized the internal implementation of `aiAction`, making AI-driven assertions and action execution more precise and reliable
18+
19+
### 📊 Reporting and debugging experience optimization
20+
- **URL Parameter Playback Control**: To improve debugging experience, you can now directly control the default behavior of report playback through URL parameters
21+
22+
### 📚 Documentation
23+
- Updated documentation deployment cache strategy to ensure users can access the latest documentation content in time
24+
525
## v0.28 - 📱 Build your own GUI automation agent by integrating with your own interface (preview feature)
626

727
### 🚀 Support for integration with any interface (preview feature)
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
1+
## Preparation
2+
3+
### Install Node.js
4+
5+
Install [Node.js 18 or higher](https://nodejs.org/en/download/).
6+
7+
### Prepare API Key
8+
9+
Prepare an API Key for a visual language (VL) model.
10+
11+
You can find supported models and configurations for Midscene.js in the [Choose a Model](../choose-a-model) documentation.
12+
13+
### Prepare WebDriver Server
14+
15+
Before getting started, you need to set up the iOS development environment:
16+
17+
- macOS (required for iOS development)
18+
- Xcode and Xcode command line tools
19+
- iOS Simulator or real device
20+
21+
#### Environment Configuration
22+
23+
Before using Midscene iOS, you need to prepare the WebDriverAgent service. Please refer to the official documentation for setup:
24+
25+
- **Simulator Configuration**: [Run Prebuilt WDA](https://appium.github.io/appium-xcuitest-driver/5.12/run-prebuilt-wda/)
26+
- **Real Device Configuration**: [Real Device Configuration](https://appium.github.io/appium-xcuitest-driver/5.12/real-device-config/)
27+
28+
#### Verify Environment Configuration
29+
30+
After completing the configuration, you can verify whether the service is working properly by accessing WebDriverAgent's status endpoint:
31+
32+
**Access URL**: `http://localhost:8100/status`
33+
34+
**Correct Response Example**:
35+
```json
{
  "value": {
    "build": {
      "version": "10.1.1",
      "time": "Sep 24 2025 18:56:41",
      "productBundleIdentifier": "com.facebook.WebDriverAgentRunner"
    },
    "os": {
      "testmanagerdVersion": 65535,
      "name": "iOS",
      "sdkVersion": "26.0",
      "version": "26.0"
    },
    "device": "iphone",
    "ios": {
      "ip": "10.91.115.63"
    },
    "message": "WebDriverAgent is ready to accept commands",
    "state": "success",
    "ready": true
  },
  "sessionId": "BCAD9603-F714-447C-A9E6-07D58267966B"
}
```
60+
61+
If you can access this endpoint and receive a JSON response similar to the one shown above, WebDriverAgent is properly configured and running.
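The same check can be scripted. The sketch below is a minimal example that assumes Node.js 18 or higher (for the built-in `fetch`); the helper names `isWdaReady` and `checkWda` are illustrative and not part of Midscene or WebDriverAgent. It relies only on the `value.ready` field shown in the response above.

```typescript
// Only the fields this sketch reads from the /status response shown above.
interface WdaStatus {
  value?: { ready?: boolean; message?: string };
  sessionId?: string | null;
}

// Hypothetical pure helper: true only when WebDriverAgent reports ready.
function isWdaReady(status: WdaStatus): boolean {
  return status.value?.ready === true;
}

// Hypothetical helper: fetch the status endpoint and evaluate the response.
// The default URL is the one used in the verification step above.
async function checkWda(url = "http://localhost:8100/status"): Promise<boolean> {
  try {
    const res = await fetch(url);
    if (!res.ok) return false;
    return isWdaReady((await res.json()) as WdaStatus);
  } catch {
    return false; // connection refused, invalid JSON, and similar failures
  }
}
```

Calling `checkWda()` before starting an automation run gives an early, explicit failure when the WebDriverAgent service is not reachable.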

apps/site/docs/en/common/start-experience.mdx

Lines changed: 4 additions & 2 deletions
@@ -1,6 +1,6 @@
11
## Start experiencing
22

3-
After the configuration, you can immediately experience Midscene. There are three main tabs in the extension:
3+
After the configuration, you can immediately experience Midscene. It provides multiple key operation tabs, including but not limited to:
44

55
- **Action**: interact with the web page. This is also known as "Auto Planning". For example:
66
```
@@ -20,12 +20,14 @@ extract the user id from the page, return in \{ id: string \}
2020
the page title is "Midscene"
2121
```
2222

23-
- **Tap**: perform a single tap on the element where you want to click. This is also known as "Instant Action".
23+
- **Tap**: perform a single tap on the element where you want to click. This is also known as "Instant Action".
2424

2525
```
2626
the login button
2727
```
2828

29+
All Agent APIs can be debugged and run directly in the Playground! Interaction, extraction, and verification methods are fully covered, with visual operations and verification that boost your automation development efficiency!
30+
2931
Enjoy !
3032

3133
> For the different between "Auto Planning" and "Instant Action", please refer to the [API](../api.mdx) document.

apps/site/docs/en/index.mdx

Lines changed: 3 additions & 0 deletions
@@ -11,6 +11,7 @@ Open-source AI Operator for Web, Mobile App, Automation & Testing
1111
### Web or mobile app
1212
- **Web Automation**: Either [integrate with Puppeteer](https://midscenejs.com/integrate-with-puppeteer.html), [with Playwright](https://midscenejs.com/integrate-with-playwright.html) or use [Bridge Mode](https://midscenejs.com/bridge-mode-by-chrome-extension.html) to control your desktop browser.
1313
- **Android Automation**: Use [Javascript SDK](https://midscenejs.com/integrate-with-android.html) with adb to control your local Android device.
14+
- **iOS Automation**: Use [Javascript SDK](https://midscenejs.com/integrate-with-ios.html) with WebDriverAgent to control your local iOS device.
1415

1516
### Tools
1617
- **Visual Reports for Debugging**: Through our test reports and Playground, you can easily understand, replay and debug the entire process.
@@ -44,6 +45,7 @@ We've prepared some showcases for you to learn the use of Midscene.js.
4445

4546
- **[Chrome Extension](https://midscenejs.com/quick-experience.html)**: Start in-browser experience immediately through [the Chrome Extension](https://midscenejs.com/quick-experience.html), without writing any code.
4647
- **[Android Playground](https://midscenejs.com/quick-experience-with-android.html)**: There is also a built-in Android playground to control your local Android device.
48+
- **[iOS Playground](https://midscenejs.com/quick-experience-with-ios.html)**: There is also a built-in iOS playground to control your local iOS device.
4749

4850
## Model choices
4951

@@ -113,6 +115,7 @@ We would like to thank the following projects:
113115
- [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL) for the open-source VL model Qwen2.5-VL.
114116
- [scrcpy](https://github.com/Genymobile/scrcpy) and [yume-chan](https://github.com/yume-chan) allow us to control Android devices with browser.
115117
- [appium-adb](https://github.com/appium/appium-adb) for the javascript bridge of adb.
118+
- [appium-webdriveragent](https://github.com/appium/WebDriverAgent) for operating XCTest from JavaScript.
116119
- [YADB](https://github.com/ysbing/YADB) for the yadb tool which improves the performance of text input.
117120
- [Puppeteer](https://github.com/puppeteer/puppeteer) for browser automation and control.
118121
- [Playwright](https://github.com/microsoft/playwright) for browser automation and control and testing.

apps/site/docs/en/integrate-with-android.mdx

Lines changed: 12 additions & 1 deletion
@@ -7,12 +7,23 @@ After connecting the Android device with adb, you can use Midscene javascript SD
77

88
import { PackageManagerTabs } from '@theme';
99

10-
:::info Demo Project
10+
:::info Demo Projects
1111
Control Android devices with javascript: [https://github.com/web-infra-dev/midscene-example/blob/main/android/javascript-sdk-demo](https://github.com/web-infra-dev/midscene-example/blob/main/android/javascript-sdk-demo)
1212

1313
Integrate Vitest for testing: [https://github.com/web-infra-dev/midscene-example/tree/main/android/vitest-demo](https://github.com/web-infra-dev/midscene-example/tree/main/android/vitest-demo)
1414
:::
1515

16+
:::info Showcases
17+
18+
[More showcases](./blog-support-android-automation.mdx)
19+
20+
<p align="center">
21+
<img src="/android.png" alt="android" width="400" />
22+
</p>
23+
24+
:::
25+
26+
1627
<PrepareAndroid />
1728

1829
<SetupEnv />
