Released April 2026 · ZJU-REAL

Build, Evaluate, and Deploy GUI Agents

A unified research framework covering the complete GUI agent lifecycle — from online RL training and rigorous evaluation to real-device deployment.


MobileWorld SR (ours): 17.1
Benchmark Reproduction Rate: 95.8%
Evaluation Benchmarks: 6
Supported Chat Platforms: 12+

Architecture

Three modules. One complete pipeline.

ClawGUI addresses the three tightly coupled problems in GUI agent research — training, measurement, and deployment — in a single unified framework.

ClawGUI-RL · Build

Online RL Training

Train GUI agents with scalable reinforcement learning using parallel Docker-based Android emulators or real physical devices.

Parallel multi-environment with Docker Android
GiGPO + PRM for fine-grained step-level rewards
Real-device training with same API
Automatic spare server failover
Episode visualization & replay
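
The step-level reward idea in the list above can be illustrated with a small sketch. It assumes a two-level, group-relative scheme: each rollout gets an episode-level advantage normalized across the group of rollouts, and each step gets an additional advantage normalized across all steps in the group that started from the same state. This is a simplified illustration under those assumptions, not the ClawGUI-RL implementation. The names `two_level_advantages`, `state_key`, `step_score`, and `step_weight` are hypothetical, and `step_score` stands in for whatever per-step signal a process reward model (PRM) would provide.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize(values, eps=1e-6):
    """Group-relative normalization: (v - mean) / (std + eps)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def two_level_advantages(trajectories, step_weight=1.0):
    """Combine episode-level and step-level group-relative advantages.

    trajectories: list of dicts with
      'episode_return': float, outcome score of the whole rollout
      'steps': list of (state_key, step_score) tuples (hypothetical schema)
    Returns one list of per-step advantages per trajectory.
    """
    # Episode level: compare whole rollouts within the group.
    episode_adv = normalize([t["episode_return"] for t in trajectories])

    # Step level: group steps that share the same state key.
    groups = defaultdict(list)
    for ti, traj in enumerate(trajectories):
        for si, (state_key, step_score) in enumerate(traj["steps"]):
            groups[state_key].append((ti, si, step_score))

    step_adv = {}
    for members in groups.values():
        normed = normalize([score for _, _, score in members])
        for (ti, si, _), adv in zip(members, normed):
            step_adv[(ti, si)] = adv

    # A step whose state appears only once gets a step-level advantage of 0.
    return [
        [episode_adv[ti] + step_weight * step_adv[(ti, si)]
         for si in range(len(traj["steps"]))]
        for ti, traj in enumerate(trajectories)
    ]
```

In this sketch, two rollouts that pass through the same screen but take different actions are compared directly at that screen, which is what makes the credit assignment finer than a single per-episode score.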

Read documentation →

ClawGUI-Eval · Evaluate

Standardized Evaluation

A reliable measurement baseline for GUI grounding research with a 95.8% faithful reproduction rate across official benchmarks.

6 benchmarks: ScreenSpot-Pro, ScreenSpot-V2, UIVision, MMBench-GUI, OSWorld-G, AndroidControl
11+ models including Qwen3-VL, UI-TARS, MAI-UI
Dual backend: local GPU or remote API
Multi-GPU & multi-thread with auto resume
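
The auto-resume behavior listed above could work roughly as follows. This is a minimal single-process sketch, assuming predictions are appended to a JSON Lines file keyed by a sample `id`; the file layout and the names `load_done_ids`, `run_with_resume`, and `infer` are hypothetical and not taken from ClawGUI-Eval.

```python
import json
from pathlib import Path


def load_done_ids(output_path):
    """Collect the ids of samples that already have a saved prediction."""
    path = Path(output_path)
    if not path.exists():
        return set()
    done = set()
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                done.add(json.loads(line)["id"])
            except (json.JSONDecodeError, KeyError, TypeError):
                continue  # ignore lines that are not valid records
    return done


def run_with_resume(samples, infer, output_path):
    """Run `infer` only on samples missing from the output file.

    Returns the number of samples processed in this call.
    """
    done = load_done_ids(output_path)
    processed = 0
    with open(output_path, "a", encoding="utf-8") as f:
        for sample in samples:
            if sample["id"] in done:
                continue  # already computed in an earlier run
            record = {"id": sample["id"], "prediction": infer(sample)}
            f.write(json.dumps(record) + "\n")
            f.flush()  # persist each result so an interruption loses little work
            processed += 1
    return processed
```

Calling `run_with_resume` a second time on the same output file skips everything that was written by the first call.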

Read documentation →

ClawGUI-Agent · Deploy

Real-Device Deployment

Control Android, HarmonyOS, or iOS devices with natural language from 12+ chat platforms. Run full benchmark pipelines with a single sentence — no scripts needed.

Cross-platform: Android (ADB), HarmonyOS (HDC), iOS (XCTest)
12+ chat platforms: WeChat, DingTalk, Slack, and more
One-command evaluation: say "benchmark qwen3vl on screenspot-pro" and it handles everything
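
To make the one-command evaluation concrete, here is a minimal sketch of turning the example sentence into a structured request. It assumes a fixed "benchmark MODEL on BENCHMARK" phrasing and a simple regular expression; ClawGUI-Agent may interpret the sentence quite differently (for example with a language model). The function name and the lowercase benchmark identifiers are hypothetical, derived from the benchmark names listed on this page.

```python
import re

# Hypothetical lowercase identifiers for the six benchmarks named on this page.
BENCHMARKS = {
    "screenspot-pro", "screenspot-v2", "uivision",
    "mmbench-gui", "osworld-g", "androidcontrol",
}

COMMAND = re.compile(
    r"^\s*benchmark\s+(?P<model>\S+)\s+on\s+(?P<benchmark>\S+)\s*$",
    re.IGNORECASE,
)


def parse_benchmark_command(text):
    """Return {'model': ..., 'benchmark': ...}, or None if the text does not match."""
    match = COMMAND.match(text)
    if match is None:
        return None
    benchmark = match.group("benchmark").lower()
    if benchmark not in BENCHMARKS:
        return None  # unknown benchmark name
    return {"model": match.group("model").lower(), "benchmark": benchmark}


request = parse_benchmark_command("benchmark qwen3vl on screenspot-pro")
print(request)  # {'model': 'qwen3vl', 'benchmark': 'screenspot-pro'}
```

A message that does not follow the pattern returns `None`, so a caller could fall back to treating it as an ordinary device-control instruction.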

Read documentation →

Performance

Empirical results on MobileWorld.

ClawGUI-2B is trained end-to-end with ClawGUI-RL and GiGPO. On MobileWorld GUI-Only tasks it reaches a success rate (SR) of 17.1 versus 11.1 for the baseline, a 54% relative improvement.
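
The 54% figure is a relative gain, which can be checked from the two success rates reported on this page (17.1 and 11.1). The helper name below is illustrative only.

```python
def relative_improvement(ours, baseline):
    """Relative gain of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0


gain = relative_improvement(17.1, 11.1)
print(f"{gain:.1f}%")  # 54.1%
```

The absolute gain is 6.0 SR points; dividing by the baseline of 11.1 gives roughly 54%.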

Figure: MobileWorld GUI-Only Success Rate, ClawGUI-2B vs Baseline
ClawGUI-2B: 17.1 SR (ours)
Baseline: 11.1 SR
Improvement over baseline on GUI-Only tasks: +54%
Task type: GUI-Only · Benchmark: MobileWorld

Eval Reproduction Rate: 95.8% faithful reproduction of official benchmark results across 6 benchmarks and 11+ models

Evaluation

Comprehensive benchmark coverage.

ClawGUI-Eval covers 6 major GUI grounding benchmarks with a standardized Infer → Judge → Metric pipeline.
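
The three stages can be sketched for a grounding task as follows. The sketch assumes the point-in-box criterion commonly used for ScreenSpot-style benchmarks: a prediction is a click point, and it counts as correct when it falls inside the ground-truth bounding box. Whether every benchmark here uses exactly this rule is an assumption, and the `Sample`, `judge`, and `evaluate` names are hypothetical rather than ClawGUI-Eval's API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Sample:
    sample_id: str
    instruction: str
    bbox: Box  # ground-truth target element


def judge(point: Point, bbox: Box) -> bool:
    """Judge stage: is the predicted click inside the target box?"""
    x, y = point
    x1, y1, x2, y2 = bbox
    return x1 <= x <= x2 and y1 <= y <= y2


def evaluate(samples: Sequence[Sample], infer: Callable[[Sample], Point]) -> float:
    """Run Infer, then Judge, then Metric; return accuracy in percent."""
    predictions = {s.sample_id: infer(s) for s in samples}                 # Infer
    verdicts = [judge(predictions[s.sample_id], s.bbox) for s in samples]  # Judge
    return 100.0 * sum(verdicts) / len(verdicts) if verdicts else 0.0      # Metric
```

Keeping the stages separate means saved predictions can be re-judged or re-scored without running the model again.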

Columns: Official vs. Ours (reproduced). A dash marks cells with no official baseline.

🔓 Open-Source Models

Model	ScreenSpot-Pro (Official)	ScreenSpot-Pro (Ours)	ScreenSpot-V2 (Official)	ScreenSpot-V2 (Ours)	UIVision (Official)	UIVision (Ours)	MMBench-GUI (Official)	MMBench-GUI (Ours)	OSWorld-G (Official)	OSWorld-G (Ours)
UI-Venus 1.5-2B	57.70	58.82	92.80	93.24	44.80	43.82	80.30	81.19	59.40	58.97
UI-Venus 1.5-8B	68.40	67.68	95.90	95.83	46.50	45.88	88.10	87.79	69.70	69.98
MAI-UI-2B	57.40	57.94	92.50	92.30	30.30	29.68	82.60	82.80	52.00	54.17
MAI-UI-8B	65.80	64.07	95.20	94.34	40.70	40.23	88.80	88.81	60.10	63.23
GUI-G2	47.50	47.75	93.30	93.32	—	25.99	—	79.33	—	58.63
StepGUI-4B	60.00	59.14	93.60	91.98	—	29.90	84.00	83.03	66.90	65.69
GUI-Owl 1.5-8B	71.10	70.08	93.70	93.55	—	36.70	82.52	82.33	65.80	64.12

🔒 Closed-Source Models

Model	ScreenSpot-Pro (Official)	ScreenSpot-Pro (Ours)
Gemini 3.1 Pro (Zoom)	—	85.01
Gemini 3.0 Pro (Zoom)	72.70	75.08
Seed 1.8 (Zoom)	73.10	72.80

Reproduced with ClawGUI-Eval · Overall reproduction rate 95.8% across 48 benchmark×model pairs · Full results in technical report.
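
One way to compute a reproduction rate over benchmark×model pairs is sketched below. The page does not state the criterion for a pair to count as reproduced, so the tolerance-based rule here (absolute difference within `tolerance` points) and the function name are assumptions for illustration. Pairs without an official number are skipped. For reference, 46 of 48 pairs would give 95.8%, though the page does not state the underlying count.

```python
def reproduction_rate(pairs, tolerance):
    """Percentage of (official, ours) pairs that agree within `tolerance` points.

    pairs: iterable of (official, ours); official is None when no
    official baseline exists, and such pairs are ignored.
    """
    comparable = [(o, r) for o, r in pairs if o is not None]
    if not comparable:
        return 0.0
    reproduced = sum(1 for o, r in comparable if abs(o - r) <= tolerance)
    return 100.0 * reproduced / len(comparable)


# Rows taken from the table above: UI-Venus 1.5-2B and GUI-G2.
pairs = [
    (57.70, 58.82), (92.80, 93.24), (44.80, 43.82), (80.30, 81.19), (59.40, 58.97),
    (47.50, 47.75), (93.30, 93.32), (None, 25.99), (None, 79.33), (None, 58.63),
]
rate = reproduction_rate(pairs, tolerance=2.0)  # tolerance value is hypothetical
```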

Demo

See ClawGUI-Agent in action.

ClawGUI-Agent handles complex, multi-step tasks on real mobile devices — controlled entirely through natural language.

ClawGUI-Agent

Controls a real phone via natural language to complete complex multi-step tasks

ClawGUI-RL

Trains a GUI agent with online reinforcement learning in parallel Android environments

Roadmap

What's coming next.

✅ ClawGUI-Agent: Natural language phone control via 12+ platforms
✅ ClawGUI-RL: Scalable mobile online RL with GiGPO + PRM
✅ ClawGUI-Eval: 6 benchmarks, 95%+ reproduction rate
✅ ClawGUI-2B: 2B agent trained with GiGPO, 17.1 MobileWorld SR
🔲 On-device Agent: Deploy directly on real phones for privacy
🔲 Desktop Online RL: Extend RL training to desktop environments
🔲 Web Online RL: Extend RL training to web environments
🔲 Real-time RL: OPD-based real-time reinforcement learning
