Meet ActIO, Orby's Large Action Model

You’ve heard of large language models—now meet Orby’s Large Action Model (LAM).

Unlike language models that generate words, Orby’s LAM generates actions. It’s designed to not just understand your workflows but to execute them, automating complex processes from start to finish.

Think of it as AI that doesn’t just talk—it gets things done, transforming how teams work with speed, precision, and scale.

ActIO Outperforms Google DeepMind, Anthropic, OpenAI

Accurately identifying the right visual element for interaction is crucial for GUI agents to perform tasks effectively in complex environments like enterprise applications. Orby’s proprietary Large Action Model, ActIO, excels in visual grounding and task execution, outperforming industry leaders.


ActIO is the industry's first multimodal large action foundation model.

Visual Grounding

Locate the most relevant object or region in an image, based on a natural language query. The query can be a phrase, a sentence, or an instruction.
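
As a rough illustration of what a grounding result looks like and how it might be checked, the sketch below treats a prediction as a click point and counts it as correct when the point lands inside the target element's bounding box. This is one common scoring convention for GUI grounding, not a description of how ActIO or any specific benchmark is implemented; `Box`, `grounding_hit`, and `grounding_accuracy` are hypothetical names introduced only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Edges are treated as inside the box.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def grounding_hit(predicted_point: tuple[float, float], target: Box) -> bool:
    """True when the predicted click point lands inside the target element."""
    x, y = predicted_point
    return target.contains(x, y)


def grounding_accuracy(predictions: list[tuple[float, float]],
                       targets: list[Box]) -> float:
    """Fraction of queries whose predicted point falls inside its target box."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(grounding_hit(p, t) for p, t in zip(predictions, targets))
    return hits / len(targets)
```

For a query such as "click the Submit button", a model would return a point, and the check passes if that point falls within the button's box.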

Content Understanding

Comprehend and interpret the meaning, context, and nuances of various forms of content, such as text, images, GUIs, and documents. This lets AI agents perform tasks that require layout and GUI understanding, beyond mere semantic comprehension.

Planning

Formulate a sequence of actions or decisions to achieve a specific goal or set of goals. Planning involves reasoning about the future, taking into account the current state of the environment, possible actions, and the outcomes of those actions.
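
To make the idea of states, actions, and outcomes concrete, here is a minimal sketch of planning as breadth-first search over a small, hand-written state graph. It is a textbook technique used purely for illustration and says nothing about how ActIO plans internally; the `plan` function and the toy `TRANSITIONS` workflow are assumptions made up for this example.

```python
from collections import deque


def plan(start, goal, transitions):
    """Breadth-first search over states.

    Returns the shortest list of actions leading from start to goal,
    or None when the goal cannot be reached.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        # Each entry maps an available action to the state it leads to.
        for action, next_state in transitions.get(state, {}).items():
            if next_state not in visited:
                visited.add(next_state)
                queue.append((next_state, actions + [action]))
    return None


# Hypothetical toy model of an approval workflow: state -> {action: next state}
TRANSITIONS = {
    "inbox": {"open_invoice": "invoice_view"},
    "invoice_view": {"click_approve": "confirm_dialog", "click_back": "inbox"},
    "confirm_dialog": {"click_confirm": "approved", "click_cancel": "invoice_view"},
}

print(plan("inbox", "approved", TRANSITIONS))
```

In this toy graph the planner returns the three actions that take the workflow from the inbox to the approved state, and returns None when no path exists.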

Task Modeling

Understanding sequences of actions taken by users, predicting user intent, and constructing workflows are critical capabilities for enabling AI agents to learn from demonstrations. This involves observing and analyzing the actions users take to achieve specific outcomes, inferring their intentions, then using this intelligence to create automated workflows.

Peerless performance and accuracy

ActIO has shown state-of-the-art performance across top GUI agent benchmarks, outperforming existing multimodal models. These benchmarks cover multiple scenarios, including web, desktop, and mobile, in both online and offline settings.

Grounding Evaluation in Multiple GUI Environments
VisualWebBench Grounding Benchmarks

In the VisualWebBench test, ActIO-7b outperforms top models such as GPT-4o, Gemini 1.5 Pro, and LLaVA 1.6-34B.
ActIO also demonstrates state-of-the-art effectiveness and proficiency in supporting GUI agents.
Detailed Large Action Model evaluation results are available on LAMB.
