Visual Grounding
Locate the most relevant object or region in an image, based on a natural language query. The query can be a phrase, a sentence, or an instruction.
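As a rough illustration of the task's input and output, the sketch below pairs a natural language query with the image region a model might return for it. The names `Box`, `GroundingResult`, and `is_hit`, and the pixel coordinates, are hypothetical and chosen only for this example; they are not ActIO's API or output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned region in image pixel coordinates (hypothetical format)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # A point is inside the region if it lies within both axis ranges.
        return self.left <= x <= self.right and self.top <= y <= self.bottom

    def center(self) -> tuple[float, float]:
        # Midpoint of the region, a natural place for an agent to click.
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)


@dataclass(frozen=True)
class GroundingResult:
    """A query (phrase, sentence, or instruction) and the region located for it."""
    query: str
    box: Box


def is_hit(result: GroundingResult, x: float, y: float) -> bool:
    # True when a predicted click point falls inside the grounded region.
    return result.box.contains(x, y)


# Illustrative values only: an instruction and a made-up region for it.
result = GroundingResult("Click the Submit button", Box(100, 200, 180, 230))
click_x, click_y = result.box.center()
print(is_hit(result, click_x, click_y))
```

Checking whether a click point lands inside the located region is one simple way to judge a grounding prediction; it is an assumption here, not a description of how the benchmarks on this page are scored.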
The Industry's First Agentic-AI Foundation Model Purpose-Built for the Enterprise
ActIO achieves state-of-the-art performance on leading GUI agent benchmarks, outperforming existing multimodal models. These benchmarks cover web, desktop, and mobile scenarios in both online and offline settings.
On the VisualWebBench test, ActIO-7b outperforms top models such as GPT-4o, Gemini 1.5 Pro, and LLaVA 1.6-34B.
ActIO also demonstrates state-of-the-art proficiency in supporting GUI agents.
Detailed Large Action Model evaluation results are available on LAMB.