A unified research framework covering the complete GUI agent lifecycle — from online RL training and rigorous evaluation to real-device deployment.
ClawGUI addresses three tightly coupled problems in GUI agent research (training, measurement, and deployment) within a single framework.
Train GUI agents with scalable reinforcement learning using parallel Docker-based Android emulators or real physical devices.
A reliable measurement baseline for GUI grounding research that faithfully reproduces 95.8% of officially reported results across 48 benchmark×model pairs.
Control Android, HarmonyOS, or iOS devices with natural language from 12+ chat platforms. Run full benchmark pipelines with a single sentence — no scripts needed.
ClawGUI-2B is trained end-to-end with ClawGUI-RL and GiGPO, achieving a 54% improvement over the baseline on MobileWorld GUI-Only tasks.
ClawGUI-Eval covers 6 major GUI grounding benchmarks with a standardized Infer → Judge → Metric pipeline.
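As a rough illustration of what an Infer → Judge → Metric split can look like for grounding, the sketch below uses the point-in-box criterion common to ScreenSpot-style benchmarks: a prediction counts as correct when the predicted click point falls inside the target element's bounding box. The `Sample` type, the `infer`, `judge`, and `metric` function names, and the stub model are all hypothetical and are not taken from ClawGUI-Eval.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Sample:
    """One grounding example: an instruction and the target element's box."""
    instruction: str
    target_box: Box


def infer(model: Callable[[str], Point], samples: List[Sample]) -> List[Point]:
    """Stage 1: ask the model for a click point for every instruction."""
    return [model(s.instruction) for s in samples]


def judge(samples: List[Sample], points: List[Point]) -> List[bool]:
    """Stage 2: a prediction is a hit if the point lies inside the target box."""
    hits = []
    for s, (x, y) in zip(samples, points):
        x1, y1, x2, y2 = s.target_box
        hits.append(x1 <= x <= x2 and y1 <= y <= y2)
    return hits


def metric(hits: List[bool]) -> float:
    """Stage 3: accuracy in percent over all judged samples."""
    return 100.0 * sum(hits) / len(hits) if hits else 0.0


# Hypothetical stand-in for a real model: always clicks the same point.
def stub_model(instruction: str) -> Point:
    return (50.0, 50.0)


samples = [
    Sample("tap the search icon", (40.0, 40.0, 60.0, 60.0)),  # contains (50, 50)
    Sample("open settings", (100.0, 100.0, 120.0, 120.0)),    # does not
]
points = infer(stub_model, samples)
hits = judge(samples, points)
accuracy = metric(hits)
print(f"accuracy: {accuracy:.2f}%")
```

Keeping the three stages separate means stored predictions can be re-judged or re-scored without running the model again, which is one plausible reason to structure an evaluation this way.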

| Model | ScreenSpot-Pro (Official) | ScreenSpot-Pro (Ours) | ScreenSpot-V2 (Official) | ScreenSpot-V2 (Ours) | UIVision (Official) | UIVision (Ours) | MMBench-GUI (Official) | MMBench-GUI (Ours) | OSWorld-G (Official) | OSWorld-G (Ours) |
|---|---|---|---|---|---|---|---|---|---|---|
| UI-Venus 1.5-2B | 57.70 | 58.82 | 92.80 | 93.24 | 44.80 | 43.82 | 80.30 | 81.19 | 59.40 | 58.97 |
| UI-Venus 1.5-8B | 68.40 | 67.68 | 95.90 | 95.83 | 46.50 | 45.88 | 88.10 | 87.79 | 69.70 | 69.98 |
| MAI-UI-2B | 57.40 | 57.94 | 92.50 | 92.30 | 30.30 | 29.68 | 82.60 | 82.80 | 52.00 | 54.17 |
| MAI-UI-8B | 65.80 | 64.07 | 95.20 | 94.34 | 40.70 | 40.23 | 88.80 | 88.81 | 60.10 | 63.23 |
| GUI-G2 | 47.50 | 47.75 | 93.30 | 93.32 | — | 25.99 | — | 79.33 | — | 58.63 |
| StepGUI-4B | 60.00 | 59.14 | 93.60 | 91.98 | — | 29.90 | 84.00 | 83.03 | 66.90 | 65.69 |
| GUI-Owl 1.5-8B | 71.10 | 70.08 | 93.70 | 93.55 | — | 36.70 | 82.52 | 82.33 | 65.80 | 64.12 |

| Model | ScreenSpot-Pro (Official) | ScreenSpot-Pro (Ours) |
|---|---|---|
| Gemini 3.1 Pro (Zoom) | — | 85.01 |
| Gemini 3.0 Pro (Zoom) | 72.70 | 75.08 |
| Seed 1.8 (Zoom) | 73.10 | 72.80 |
Reproduced with ClawGUI-Eval · Overall reproduction rate 95.8% across 48 benchmark×model pairs · Full results in technical report.
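
To see how closely individual entries track the official numbers, the gap can be computed directly from the tables above. The sketch below uses the ScreenSpot-Pro scores for four models. The 2-point tolerance is an arbitrary illustrative threshold, not the criterion behind the 95.8% figure, which is defined in the technical report. The last line only checks the arithmetic: with 48 pairs, 95.8% corresponds to 46 pairs counted as reproduced.

```python
# (model, official, ours) on ScreenSpot-Pro, copied from the tables above.
rows = [
    ("UI-Venus 1.5-2B", 57.70, 58.82),
    ("UI-Venus 1.5-8B", 68.40, 67.68),
    ("MAI-UI-8B", 65.80, 64.07),
    ("Gemini 3.0 Pro (Zoom)", 72.70, 75.08),
]

TOLERANCE = 2.0  # hypothetical threshold in accuracy points, for illustration only

# Signed gap: positive means the reproduced score is above the official one.
gaps = {model: round(ours - official, 2) for model, official, ours in rows}
within = [model for model, gap in gaps.items() if abs(gap) <= TOLERANCE]

for model, gap in gaps.items():
    print(f"{model:24s} gap = {gap:+.2f}")
print(f"{len(within)}/{len(rows)} within +/-{TOLERANCE} points")

# With 48 benchmark x model pairs, 46 reproduced pairs give the headline rate.
headline = round(100 * 46 / 48, 1)
print(f"46/48 = {headline}%")
```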
ClawGUI-Agent handles complex, multi-step tasks on real mobile devices — controlled entirely through natural language.
Controls a real phone via natural language to complete complex multi-step tasks
Trains a GUI agent with online reinforcement learning in parallel Android environments
ClawGUI's modular design allows each component to be used independently or as part of the full pipeline.
Natural language phone control via 12+ platforms
Scalable mobile online RL with GiGPO + PRM
6 benchmarks, 95%+ reproduction rate
2B agent trained with GiGPO, 17.1 success rate (SR) on MobileWorld
Deploy directly on real phones for privacy
Extend RL training to desktop environments
Extend RL training to web environments
OPD-based real-time reinforcement learning
Clone the repo and get started with any module independently. Full documentation in each subdirectory.