Released April 2026 · ZJU-REAL

Build, Evaluate, and Deploy GUI Agents

A unified research framework covering the complete GUI agent lifecycle — from online RL training and rigorous evaluation to real-device deployment.

  • 17.1 MobileWorld SR (ours)
  • 95.8% Benchmark Reproduction Rate
  • 6 Evaluation Benchmarks
  • 12+ Supported Chat Platforms

Three modules. One complete pipeline.

ClawGUI addresses the three tightly coupled problems in GUI agent research — training, measurement, and deployment — in a single unified framework.

ClawGUI-RL · Build

Online RL Training

Train GUI agents with scalable reinforcement learning using parallel Docker-based Android emulators or real physical devices.

  • Parallel multi-environment with Docker Android
  • GiGPO + PRM for fine-grained step-level rewards
  • Real-device training with same API
  • Automatic spare server failover
  • Episode visualization & replay
Read documentation →
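
The "GiGPO + PRM" bullet combines trajectory-level credit with step-level credit. As a rough illustration only, and not ClawGUI-RL's actual code, the sketch below computes a two-level group-relative advantage in the spirit of GiGPO: episode returns are normalized across a group of rollouts for the same task, and per-step scores (which a PRM might supply) are normalized across steps that start from the same state. The function names, the `omega` weight, the state keys, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_normalize(values, eps=1e-6):
    """Center and scale values within one group (group-relative advantage)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def two_level_advantages(episodes, omega=1.0):
    """episodes: list of dicts with 'return' (float) and 'steps',
    where each step is (state_key, step_score).
    Returns one list of per-step advantages per episode."""
    # Episode level: compare total returns across the rollout group.
    episode_adv = group_normalize([ep["return"] for ep in episodes])

    # Step level: group steps that share the same state across rollouts.
    by_state = defaultdict(list)
    for i, ep in enumerate(episodes):
        for j, (state_key, score) in enumerate(ep["steps"]):
            by_state[state_key].append((i, j, score))

    step_adv = {}
    for members in by_state.values():
        normalized = group_normalize([score for _, _, score in members])
        for (i, j, _), adv in zip(members, normalized):
            step_adv[(i, j)] = adv

    # Combine both levels; omega is a hypothetical mixing weight.
    return [
        [episode_adv[i] + omega * step_adv[(i, j)] for j in range(len(ep["steps"]))]
        for i, ep in enumerate(episodes)
    ]


# Toy rollout group: two attempts at the same task, sharing the "home" state.
episodes = [
    {"return": 1.0, "steps": [("home", 0.9), ("settings", 1.0)]},
    {"return": 0.0, "steps": [("home", 0.1), ("browser", 0.0)]},
]
advantages = two_level_advantages(episodes)
```

In this toy group, the shared "home" step receives extra credit in the successful rollout, while states visited by only one rollout contribute no step-level signal and fall back to the episode-level advantage.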
ClawGUI-Eval · Evaluate

Standardized Evaluation

A reliable measurement baseline for GUI grounding research with a 95.8% faithful reproduction rate across official benchmarks.

  • 6 benchmarks: ScreenSpot-Pro, ScreenSpot-V2, UIVision, MMBench-GUI, OSWorld-G, AndroidControl
  • 11+ models including Qwen3-VL, UI-TARS, MAI-UI
  • Dual backend: local GPU or remote API
  • Multi-GPU & multi-thread with auto resume
Read documentation →
ClawGUI-Agent · Deploy

Real-Device Deployment

Control Android, HarmonyOS, or iOS devices with natural language from 12+ chat platforms. Run full benchmark pipelines with a single sentence — no scripts needed.

  • Cross-platform: Android (ADB), HarmonyOS (HDC), iOS (XCTest)
  • 12+ chat platforms: WeChat, DingTalk, Slack, and more
  • One-command evaluation: say "benchmark qwen3vl on screenspot-pro" and it handles everything
Read documentation →
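
To make the one-sentence evaluation idea concrete, here is a minimal sketch of how a chat message such as the one quoted above could be turned into a structured request, alongside the kind of ADB call a device controller might issue. The regular expression and both function names are hypothetical and are not taken from ClawGUI-Agent; only the `adb -s <serial> shell input tap <x> <y>` command form comes from the standard Android tooling.

```python
import re

# Hypothetical pattern matching the example sentence quoted above.
COMMAND_PATTERN = re.compile(
    r"^\s*benchmark\s+(?P<model>\S+)\s+on\s+(?P<benchmark>\S+)\s*$",
    re.IGNORECASE,
)


def parse_benchmark_request(message):
    """Return (model, benchmark) if the message is a benchmark request, else None."""
    match = COMMAND_PATTERN.match(message)
    if match is None:
        return None
    return match.group("model").lower(), match.group("benchmark").lower()


def adb_tap_command(serial, x, y):
    """Build the argv for a screen tap through ADB; pass it to subprocess.run to execute."""
    return ["adb", "-s", serial, "shell", "input", "tap", str(x), str(y)]


request = parse_benchmark_request("benchmark qwen3vl on screenspot-pro")
tap = adb_tap_command("emulator-5554", 540, 1200)
```

A real agent would most likely rely on a language model rather than a fixed pattern to interpret free-form requests; the regex only shows the shape of the structured output.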

Empirical results on MobileWorld.

ClawGUI-2B is trained end-to-end with ClawGUI-RL and GiGPO, achieving a 54% relative improvement in success rate over the baseline (17.1 vs 11.1 SR) on MobileWorld GUI-Only tasks.

Figure: MobileWorld GUI-Only success rate, ClawGUI-2B vs baseline.

  • ClawGUI-2B (ours): 17.1 SR
  • Baseline: 11.1 SR
  • Improvement over baseline on GUI-Only tasks: +54%
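
The +54% figure is a relative gain, not a difference in percentage points. A short check using only the two success rates reported above (the function name is just for illustration):

```python
def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, as a percentage."""
    return (new - baseline) / baseline * 100.0


clawgui_sr = 17.1   # MobileWorld GUI-Only SR reported above
baseline_sr = 11.1  # baseline SR reported above

absolute_gain = clawgui_sr - baseline_sr
relative_gain = relative_improvement(clawgui_sr, baseline_sr)
print(f"absolute: +{absolute_gain:.1f} SR points, relative: +{relative_gain:.0f}%")
# prints "absolute: +6.0 SR points, relative: +54%"
```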

Eval Reproduction Rate

95.8%
faithful reproduction of official benchmark results across 6 benchmarks and 11+ models

Comprehensive benchmark coverage.

ClawGUI-Eval covers 6 major GUI grounding benchmarks with a standardized Infer → Judge → Metric pipeline.
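
As a rough picture of what an Infer → Judge → Metric pipeline can look like for GUI grounding, the sketch below uses a common judging rule for grounding benchmarks: a prediction counts as correct when the predicted click point falls inside the ground-truth bounding box. The `Sample` type, the function names, and the stand-in model are assumptions for illustration and do not reflect ClawGUI-Eval's actual interfaces.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    instruction: str
    bbox: tuple  # ground-truth target box as (left, top, right, bottom) in pixels


def infer(sample, model):
    """Infer: ask the model for a click point (x, y) for the instruction."""
    return model(sample.instruction)


def judge(point, bbox):
    """Judge: a prediction is correct if the point lies inside the box."""
    x, y = point
    left, top, right, bottom = bbox
    return left <= x <= right and top <= y <= bottom


def metric(verdicts):
    """Metric: accuracy in percent over all judged samples."""
    return 100.0 * sum(verdicts) / len(verdicts) if verdicts else 0.0


def evaluate(samples, model):
    """Run the three stages in order and return the accuracy."""
    verdicts = [judge(infer(s, model), s.bbox) for s in samples]
    return metric(verdicts)


# Stand-in "model" that always clicks the same spot, for illustration only.
def constant_model(instruction):
    return (50, 50)


samples = [
    Sample("tap the Save button", (40, 40, 60, 60)),
    Sample("open Settings", (100, 100, 140, 120)),
]
accuracy = evaluate(samples, constant_model)  # one of two points lands in its box
```

Keeping the three stages separate means cached predictions can be re-judged or re-scored without rerunning inference, which is one plausible reason to structure an evaluation this way.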

Each cell reads Official / Ours (reproduced); n/a marks a cell with no official baseline.

🔓 Open-Source Models
Model              ScreenSpot-Pro   ScreenSpot-V2    UIVision         MMBench-GUI      OSWorld-G
UI-Venus 1.5-2B    57.70 / 58.82    92.80 / 93.24    44.80 / 43.82    80.30 / 81.19    59.40 / 58.97
UI-Venus 1.5-8B    68.40 / 67.68    95.90 / 95.83    46.50 / 45.88    88.10 / 87.79    69.70 / 69.98
MAI-UI-2B          57.40 / 57.94    92.50 / 92.30    30.30 / 29.68    82.60 / 82.80    52.00 / 54.17
MAI-UI-8B          65.80 / 64.07    95.20 / 94.34    40.70 / 40.23    88.80 / 88.81    60.10 / 63.23
GUI-G2             47.50 / 47.75    93.30 / 93.32      n/a / 25.99      n/a / 79.33      n/a / 58.63
StepGUI-4B         60.00 / 59.14    93.60 / 91.98      n/a / 29.90    84.00 / 83.03    66.90 / 65.69
GUI-Owl 1.5-8B     71.10 / 70.08    93.70 / 93.55      n/a / 36.70    82.52 / 82.33    65.80 / 64.12

🔒 Closed-Source Models
Model                    ScreenSpot-Pro
Gemini 3.1 Pro (Zoom)      n/a / 85.01
Gemini 3.0 Pro (Zoom)    72.70 / 75.08
Seed 1.8 (Zoom)          73.10 / 72.80

Reproduced with ClawGUI-Eval · Overall reproduction rate 95.8% across 48 benchmark×model pairs · Full results in technical report.
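
The page does not state the criterion used to count a result as faithfully reproduced. Purely as an illustration, the sketch below assumes a pair counts when the reproduced score is within a fixed tolerance of the official one. The 2.0-point tolerance is hypothetical, and the four rows are a small sample copied from the tables above, not the full set of 48 pairs. For reference, 46 of 48 pairs would round to 95.8%.

```python
# (model, benchmark, official, ours): a few rows copied from the tables above.
PAIRS = [
    ("UI-Venus 1.5-2B", "ScreenSpot-Pro", 57.70, 58.82),
    ("UI-Venus 1.5-8B", "ScreenSpot-V2", 95.90, 95.83),
    ("MAI-UI-2B", "UIVision", 30.30, 29.68),
    ("MAI-UI-8B", "OSWorld-G", 60.10, 63.23),
]

TOLERANCE = 2.0  # hypothetical threshold in accuracy points, not from the source


def reproduction_rate(pairs, tolerance):
    """Percentage of pairs whose reproduced score is within `tolerance` of the official one."""
    reproduced = [abs(ours - official) <= tolerance for _, _, official, ours in pairs]
    return 100.0 * sum(reproduced) / len(reproduced)


rate = reproduction_rate(PAIRS, TOLERANCE)  # 3 of these 4 sample rows are within 2.0 points
```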

See ClawGUI-Agent in action.

ClawGUI-Agent handles complex, multi-step tasks on real mobile devices — controlled entirely through natural language.

ClawGUI-Agent

Controls a real phone via natural language to complete complex multi-step tasks

ClawGUI-RL

Trains a GUI agent with online reinforcement learning in parallel Android environments

End-to-end architecture.

ClawGUI's modular design allows each component to be used independently or as part of the full pipeline.

ClawGUI Architecture

What's coming next.

  • ClawGUI-Agent: Natural language phone control via 12+ platforms
  • ClawGUI-RL: Scalable mobile online RL with GiGPO + PRM
  • ClawGUI-Eval: 6 benchmarks, 95%+ reproduction rate
  • ClawGUI-2B: 2B agent trained with GiGPO, 17.1 MobileWorld SR
  • 🔲 On-device Agent: Deploy directly on real phones for privacy
  • 🔲 Desktop Online RL: Extend RL training to desktop environments
  • 🔲 Web Online RL: Extend RL training to web environments
  • 🔲 Real-time RL: OPD-based real-time reinforcement learning

Standing on the shoulders of great open source.

Ready to build your GUI agent?

Clone the repo and get started with any module independently. Full documentation in each subdirectory.

$ git clone https://github.com/ZJU-REAL/ClawGUI.git
$ cd ClawGUI