AI-powered Android phone automation using Claude Agent SDK + AutoGLM-Phone-9B.
Control your Android phone with natural language commands, featuring iOS remote control, intelligent stuck detection, and user preference learning.
- Natural Language Control - "Open Taobao and search for headphones" → Agent handles it
- Claude Agent SDK Orchestration - Task planning, decomposition, and error recovery
- AutoGLM-Phone-9B - Zhipu's vision model for phone UI understanding and control
- iOS Remote Control - View the Android screen and control it by touch from an iOS device (~50-80ms latency)
- Stuck Detection - 7 heuristic detectors automatically identify and recover from failures
- User Preferences - Learns and remembers your preferences for personalized automation
- Python 3.11+
- Android device with USB debugging enabled
- API Keys:
- Zhipu API key for AutoGLM (required)
- Anthropic API key for Claude Agent SDK (required)
# Clone repository
git clone https://github.com/charfeng1/mobile-glm.git
cd mobile-glm
# Install dependencies (automatically installs Open-AutoGLM from GitHub)
uv sync
# Or using pip
pip install -e .
# Copy environment template
cp .env.example .env
# Edit .env and add your API keys:
# ZHIPU_API_KEY=your_zhipu_api_key_here
# ANTHROPIC_API_KEY=your_anthropic_api_key_here
Note: You don't need to use Claude's cloud models. The system works with any Anthropic-compatible API. For example, you can use Zhipu's GLM models through their Anthropic-compatible endpoint by setting:
ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic
ANTHROPIC_DEFAULT_HAIKU_MODEL=glm-4.7
This lets you use local or alternative models while keeping the same interface.
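Before starting the agent, it can help to confirm that .env actually contains both required keys. The following is a minimal sketch, not part of the project: the helper names parse_env and missing_keys are hypothetical, and it assumes .env uses plain KEY=VALUE lines like the template above.

```python
from pathlib import Path

# Keys the README lists as required.
REQUIRED_KEYS = ("ZHIPU_API_KEY", "ANTHROPIC_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still template placeholders."""
    return [
        key
        for key in REQUIRED_KEYS
        if not values.get(key) or values[key].startswith("your_")
    ]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    missing = missing_keys(parse_env(text))
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

Optional settings such as ANTHROPIC_BASE_URL are parsed but not treated as required, since the default endpoint works without them.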
# Connect via USB and enable USB debugging
adb devices
# You should see your device listed
# Start the interactive agent CLI
uv run python agent_sdk.py
# Example commands:
# - "Open Settings"
# - "Search for restaurants on Meituan"
# - "Turn on airplane mode"
User Command ("Open Taobao and search for headphones")
│
▼
┌─────────────────────────────────────────┐
│ Claude Agent SDK (Orchestrator) │
│ • Understands natural language │
│ • Plans multi-step tasks │
│ • Calls phone_task tool │
│ • Handles errors and retries │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ phone_tool.py (Execution Layer) │
│ • Step limiting & safety checks │
│ • Action/app allowlists │
│ • 7 stuck detection heuristics │
│ • Session management │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ AutoGLM-Phone-9B API (Vision Model) │
│ • Screenshot analysis │
│ • UI element recognition │
│ • Action planning & execution │
└─────────────────────────────────────────┘
│
▼
Android Device (via ADB)
mobile-glm/
├── agent_sdk.py # Claude Agent SDK orchestrator
├── phone_tool.py # AutoGLM execution wrapper
├── preference_tool.py # User preference storage
├── scrcpy_ws_bridge.py # WebSocket H.264 bridge (for iOS viewer)
├── security/ # Prompt injection defense & image filtering
├── MobileGLM-iOS/ # iOS remote viewer app
├── .env.example # Environment template
└── pyproject.toml # Python dependencies
View and control your Android phone directly from an iOS device with real-time screen streaming and touch control.
iOS Device (View & Touch)
│
│ WebSocket Connection
│ ├─ Receive: H.264 video stream
│ └─ Send: Touch events (x, y, action)
│
▼
WebSocket Bridge (scrcpy_ws_bridge.py)
│ Runs on your computer
│
├─ Video Encoding: Android screen → H.264 → iOS
└─ Touch Forwarding: iOS touch → ADB commands → Android
│
▼
Android Device
├─ MediaProjection: Screen capture
├─ MediaCodec: H.264 hardware encoding
└─ ADB: Receives and executes touch commands
Performance Benefits:
- iOS VideoToolbox hardware H.264 decoding: ~5ms
- Metal rendering: ~8ms
- Touch latency: ~8ms
- Total latency: ~50-80ms (vs ~100-150ms for web-based solutions)
- Download scrcpy-server (one-time setup):
# Download from https://github.com/Genymobile/scrcpy/releases
# Place scrcpy-server.jar in project root
- Start WebSocket bridge:
uv run python scrcpy_ws_bridge.py
# Server runs on ws://0.0.0.0:8765
- Build iOS app:
open MobileGLM-iOS/MobileGLM.xcodeproj
In Xcode:
- Select your development team (Signing & Capabilities)
- Connect your iOS device or select a simulator
- Click the Run button (⌘R)
First-time use:
- Enter the IP address of the computer running the bridge
- Enter the port (default 8765)
- Tap Connect
Tip: Run ifconfig | grep "inet " on a Mac to find your local IP address.
- Video Stream: H.264 NAL units over WebSocket
- Control: JSON commands for touch/gestures
{"type": "touch", "action": "down", "x": 0.5, "y": 0.5}
{"type": "home"}
{"type": "back"}
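To illustrate how a bridge might interpret these control messages, here is a small sketch. It rests on two assumptions that the README does not confirm: that x and y are normalized to the range 0 to 1 (the 0.5 values suggest this), and that home and back map to the standard adb shell input keyevent commands. The function names are hypothetical, and the real scrcpy_ws_bridge.py may handle these messages differently.

```python
import json

# Assumed mapping from message type to Android key codes.
KEYEVENTS = {"home": "KEYCODE_HOME", "back": "KEYCODE_BACK"}


def to_pixels(x_norm: float, y_norm: float, width: int, height: int) -> tuple[int, int]:
    """Map normalized [0, 1] coordinates to pixel coordinates inside the screen."""
    x = min(max(x_norm, 0.0), 1.0)
    y = min(max(y_norm, 0.0), 1.0)
    # Clamp so that 1.0 maps to the last valid pixel rather than one past it.
    return min(int(x * width), width - 1), min(int(y * height), height - 1)


def describe_command(message: str, width: int, height: int) -> dict:
    """Decode one JSON control message into a device-level description."""
    cmd = json.loads(message)
    kind = cmd.get("type")
    if kind == "touch":
        x, y = to_pixels(cmd["x"], cmd["y"], width, height)
        return {"kind": "touch", "action": cmd["action"], "x": x, "y": y}
    if kind in KEYEVENTS:
        # Arguments that could follow "adb" on the command line.
        return {"kind": "key", "adb_args": ["shell", "input", "keyevent", KEYEVENTS[kind]]}
    raise ValueError(f"unsupported command type: {kind!r}")
```

Normalized coordinates keep the iOS client independent of the Android screen resolution; only the bridge needs to know the device's pixel dimensions.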
from agent_sdk import TelemetryAgentSDK
# Initialize agent
agent = TelemetryAgentSDK()
# Execute task with natural language
result = agent.invoke("Open Settings and turn on airplane mode")
print(result['final_response'])
from phone_tool import phone_task
# Execute single task
result = phone_task(
goal="Open the Settings app",
max_steps=5,
)
print(result)  # JSON with status, steps_taken, etc.
The system automatically detects when the phone agent gets stuck. Its heuristics include:
- Repeated failed app launches (2+ failures)
- Repetitive actions (same action 4+ times in last 5 steps)
- Too many steps (15+ steps without completion)
- Infinite loops (same screen state repeating)
When the agent is stuck, the system returns a guidance request to the orchestrator so that it can recover.
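The thresholds listed above can be sketched as simple checks over the action history. This is an illustration only, not the code in phone_tool.py: the function names, the string representation of actions, and the returned reason labels are all assumptions, and the screen-state loop check is omitted because the README does not say how screens are compared.

```python
from collections import Counter
from typing import Optional, Sequence


def repeated_action(actions: Sequence[str], window: int = 5, threshold: int = 4) -> Optional[str]:
    """Return the action that fills at least `threshold` of the last `window` steps, if any."""
    recent = list(actions[-window:])
    if not recent:
        return None
    action, count = Counter(recent).most_common(1)[0]
    return action if count >= threshold else None


def detect_stuck(actions: Sequence[str], failed_launches: int, max_steps: int = 15) -> Optional[str]:
    """Return a hypothetical reason label when a stuck heuristic fires, else None."""
    if failed_launches >= 2:
        return "repeated_failed_app_launches"
    if repeated_action(actions) is not None:
        return "repetitive_actions"
    if len(actions) >= max_steps:
        return "too_many_steps"
    return None
```

For example, detect_stuck(["tap", "tap", "swipe", "tap", "tap"], failed_launches=0) reports repetitive actions, because "tap" fills four of the last five steps.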
- Prompt injection defense - Detects and blocks malicious instructions in screenshots
- Image filtering - Preprocesses screenshots to remove low-contrast injection attempts
- Sensitive screen detection - Automatically stops on login/payment screens
- Action allowlists - Restrict which actions can be executed
- App allowlists - Restrict which apps can be launched
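As a rough picture of how action and app allowlists could gate a step before it runs, consider the sketch below. The Policy class, the check_action function, and the action names ("Tap", "Launch", and so on) are hypothetical and chosen for illustration; the project's actual configuration format may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical allowlist policy; the defaults here are examples only."""
    allowed_actions: frozenset = frozenset({"Tap", "Swipe", "Type", "Launch", "Back", "Home"})
    allowed_apps: frozenset = frozenset({"Settings"})


def check_action(policy: Policy, action: str, app: Optional[str] = None) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed action under the policy."""
    if action not in policy.allowed_actions:
        return False, f"action {action!r} is not allowlisted"
    # Only launches are checked against the app allowlist in this sketch.
    if action == "Launch" and app not in policy.allowed_apps:
        return False, f"app {app!r} is not allowlisted"
    return True, "ok"
```

Denying by default, so that anything not explicitly listed is rejected, keeps an unexpected action from the vision model from reaching the device.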
- Python 3.11+
- Android device with ADB
- uv or pip
- Zhipu API key (for AutoGLM)
- Anthropic API key (for Claude)
MIT
- Built with Claude Agent SDK
- Powered by AutoGLM-Phone-9B from Zhipu AI
- Uses Open-AutoGLM SDK
Contributions welcome! Please feel free to submit issues and pull requests.