Orby ActIO
Large Action Model

The Industry's First Agentic-AI Foundation Model Purpose-Built for the Enterprise

ActIO is the industry's first multimodal large action foundation model.

Visual Grounding

Locate the most relevant object or region in an image, based on a natural language query. The query can be a phrase, a sentence, or an instruction.
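
As a rough illustration of what a grounding result looks like and how it might be checked, the sketch below scores a predicted click point and a predicted box against a target region. The `Box` type, the coordinates, and the two scoring rules (point inside the box, intersection over union) are assumptions chosen for illustration. They are common conventions for grounding tasks, not a description of how ActIO works or how it is evaluated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        """True when the point (x, y) lies inside or on the edge of the box."""
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    inter_w = max(0.0, min(a.right, b.right) - max(a.left, b.left))
    inter_h = max(0.0, min(a.bottom, b.bottom) - max(a.top, b.top))
    inter = inter_w * inter_h
    area_a = (a.right - a.left) * (a.bottom - a.top)
    area_b = (b.right - b.left) * (b.bottom - b.top)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical example: the query "Submit button" refers to this region.
target = Box(100, 200, 180, 240)
predicted_click = (140, 220)             # a model's predicted click point
predicted_box = Box(110, 200, 190, 240)  # a model's predicted region

click_is_correct = target.contains(*predicted_click)
overlap = iou(target, predicted_box)
```

A point-based check suits click-style actions, while an overlap score suits models that return a region.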

Content Understanding

Comprehend and interpret the meaning, context, and nuances of various forms of content, such as text, images, GUIs, and documents. This allows AI agents to perform tasks that require layout and GUI understanding, beyond mere semantic comprehension.

Planning

Formulate a sequence of actions or decisions to achieve a specific goal or set of goals. Planning involves reasoning about the future, taking into account the current state of the environment, the possible actions, and the outcomes of those actions.
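
To make the idea of planning over states, actions, and outcomes concrete, here is a minimal sketch that searches a small GUI state graph for the shortest action sequence to a goal. The state names, action names, and the use of breadth-first search are all assumptions made for illustration. The source does not describe ActIO's planning method, and a learned model would not rely on a hand-written transition table like this one.

```python
from collections import deque

# Hypothetical GUI state graph: state -> {action: resulting state}
TRANSITIONS = {
    "login_page": {"enter_credentials": "dashboard"},
    "dashboard": {"open_invoices": "invoice_list", "open_settings": "settings"},
    "invoice_list": {"click_new_invoice": "invoice_form"},
    "settings": {"go_back": "dashboard"},
    "invoice_form": {"submit_form": "invoice_saved"},
}


def plan(start, goal, transitions):
    """Return the shortest list of actions from start to goal, or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for action, next_state in transitions.get(state, {}).items():
            if next_state not in visited:
                visited.add(next_state)
                queue.append((next_state, actions + [action]))
    return None  # goal is unreachable from the start state


steps = plan("login_page", "invoice_saved", TRANSITIONS)
print(steps)
```

Breadth-first search is used here only because it returns a shortest plan on a small, fully known graph.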

Task Modeling

Understanding sequences of actions taken by users, predicting user intent, and constructing workflows are critical capabilities for enabling AI agents to learn from demonstrations. This involves observing and analyzing the actions users take to achieve specific outcomes, inferring their intentions, then using this intelligence to create automated workflows.
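
One simple way to picture learning a workflow from demonstrations is to count which action tends to follow which, then replay the most frequent path. The sketch below does this with made-up demonstration traces and hypothetical action names. It is a toy first-order frequency model offered as an assumption-laden illustration, not ActIO's task modeling approach.

```python
from collections import Counter, defaultdict

# Hypothetical demonstrations: each list is one user's recorded action sequence.
demonstrations = [
    ["open_email", "download_attachment", "open_erp", "enter_invoice", "submit"],
    ["open_email", "download_attachment", "open_erp", "enter_invoice", "submit"],
    ["open_email", "open_erp", "enter_invoice", "submit"],
]


def learn_transitions(traces):
    """Count how often each action is immediately followed by each other action."""
    counts = defaultdict(Counter)
    for trace in traces:
        for current, following in zip(trace, trace[1:]):
            counts[current][following] += 1
    return counts


def most_likely_workflow(counts, start, max_steps=20):
    """Follow the most frequent next action from start until no successor exists."""
    workflow = [start]
    while len(workflow) < max_steps and counts.get(workflow[-1]):
        next_action = counts[workflow[-1]].most_common(1)[0][0]
        if next_action in workflow:  # stop instead of looping forever
            break
        workflow.append(next_action)
    return workflow


transition_counts = learn_transitions(demonstrations)
workflow = most_likely_workflow(transition_counts, "open_email")
print(workflow)
```

With these traces, the majority path keeps the `download_attachment` step even though one demonstration skipped it.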

Peerless performance and accuracy

ActIO has shown state-of-the-art performance across top GUI agent benchmarks, outperforming existing multimodal models. These benchmarks cover multiple scenarios, including web, desktop, and mobile, in both online and offline settings.

Grounding Evaluation in Multiple GUI Environments
VisualWebBench Grounding Benchmarks

On the VisualWebBench test, ActIO-7b outperforms top models such as GPT-4o, Gemini 1.5 Pro, and LLaVA 1.6-34B.
ActIO also demonstrates state-of-the-art effectiveness and proficiency in supporting GUI agents.
Detailed Large Action Model evaluation results are available on LAMB.