Feb 19, 2025 1:01 PM • Author: Yanan Xie
At Orby AI, we are thrilled to share groundbreaking news: ActIO / UGround has achieved state-of-the-art (SOTA) performance on the ScreenSpot benchmark, surpassing models from industry leaders such as Google DeepMind, Anthropic, and OpenAI.
Breaking New Ground in GUI Grounding
ActIO / UGround, developed in collaboration with the OSU NLP Group, has set a new state of the art in GUI visual grounding, reaching 89.4% accuracy on ScreenSpot. This result makes ActIO / UGround the top-scoring model on the benchmark, ahead of Google DeepMind’s Project Mariner / Gemini (84.0%), Anthropic’s Sonnet 3.5 (82.9%), and OpenAI’s GPT-4o (18.3%).
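For readers new to GUI grounding, it may help to see how an accuracy figure like this is typically computed. ScreenSpot-style evaluations generally count a prediction as correct when the predicted click point lands inside the ground-truth bounding box of the target element. The snippet below is a minimal sketch under that assumption; the `Sample` record, the helper names, and the toy data are hypothetical illustrations, not part of UGround or the official evaluation script.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical record: a predicted click point and the ground-truth
    # bounding box of the target element, both in pixel coordinates.
    pred_x: float
    pred_y: float
    box: tuple[float, float, float, float]  # (left, top, right, bottom)


def is_hit(sample: Sample) -> bool:
    """Return True if the predicted point lies inside (or on the edge of) the box."""
    left, top, right, bottom = sample.box
    return left <= sample.pred_x <= right and top <= sample.pred_y <= bottom


def click_accuracy(samples: list[Sample]) -> float:
    """Fraction of samples whose predicted point falls within the target box."""
    if not samples:
        return 0.0
    return sum(is_hit(s) for s in samples) / len(samples)


# Toy data, invented purely for illustration.
samples = [
    Sample(50, 20, (40, 10, 100, 30)),       # inside the box: hit
    Sample(5, 5, (40, 10, 100, 30)),         # outside the box: miss
    Sample(100, 30, (40, 10, 100, 30)),      # on the box edge: counted as a hit here
    Sample(200, 300, (150, 280, 260, 320)),  # inside the box: hit
]

print(f"{click_accuracy(samples):.1%}")  # 75.0%
```

Whether edge pixels count as hits is a choice made in this sketch; a real evaluation script may handle boundaries or coordinate normalization differently.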
The Innovation Behind ActIO / UGround
Since its initial release in August 2024, ActIO / UGround has continuously evolved, integrating cutting-edge visual grounding techniques to enhance GUI agent performance. Despite using no desktop training data, ActIO / UGround also shows exceptional cross-platform generalization on the challenging ScreenSpot-Pro benchmark, which makes this achievement even more remarkable.
Recognized at ICLR 2025
We are also proud to announce that our research paper on UGround has been accepted to ICLR 2025, receiving outstanding scores of 10, 8, 8, and 5. This recognition highlights the impact of our work in advancing the field of GUI automation and agentic AI.
Open-Sourcing for the Community
As strong advocates for open research, we are open-sourcing the model weights and the curated training datasets to further drive progress in this domain. The broader research community will have access to these resources to continue pushing the boundaries of GUI agent capabilities.
Acknowledging Our Team and Collaborators
This milestone would not have been possible without the incredible collaboration between Orby AI and the OSU NLP Group. A special shoutout to our lead author, Boyu Gou, and our dedicated teammates, Demi Ruohan Wang, Boyuan Zheng, Cheng Chang, Yiheng Shu, Huan Sun, Yu Su, Gang Li, and Yining Mao, for their tireless contributions.
Explore the Research
We invite the AI and research community to explore our work, access the models, and contribute to the future of GUI automation.
📖 Learn more and access the resources here: https://osu-nlp-group.github.io/UGround/
🚀 Stay tuned for more updates as we continue to redefine the possibilities of AI-driven automation!