Fara
Microsoft’s first agentic small language model specifically designed for computer use.
In 2024, Microsoft introduced small language models (SLMs) to customers, starting with the release of Phi models on Microsoft Foundry. Today, we are pleased to announce Fara-7B, our first agentic SLM designed specifically for computer use, now available on Microsoft Foundry.
Unlike traditional chat models that generate text-based responses, Computer Use Agent (CUA) models like Fara-7B leverage computer interfaces such as a mouse and keyboard to perform multi-step tasks on behalf of users. With only 7 billion parameters, Fara-7B achieves state-of-the-art performance within its size class and is competitive with larger, more resource-intensive agentic systems that depend on prompting multiple large models. Fara-7B’s small size now makes it possible to run CUA models directly on devices. This results in reduced latency and improved privacy, as user data remains local.
Fara-7B is an experimental release, designed to invite hands-on exploration and feedback from the community. Users can build and test agentic experiences beyond pure research—automating everyday web tasks like filling out forms, searching for information, booking travel, or managing accounts.
Fara-7B operates by visually perceiving a webpage and taking actions like scrolling, typing, and clicking at directly predicted coordinates. It does not rely on separate models to parse the screen, nor on additional information like accessibility trees, and thus uses the same modalities as humans to interact with the computer.
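This screenshot-in, action-out loop can be sketched as follows. The interfaces and action names here are illustrative assumptions for exposition, not Fara-7B's actual API; the point is that the agent sees only pixels and emits coordinate-level actions:

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    kind: str       # "click" | "type" | "scroll" | "stop" (assumed action set)
    x: int = 0      # directly predicted screen coordinates
    y: int = 0
    text: str = ""

@dataclass
class StubAgent:
    """Illustrative stand-in for a CUA model: maps a screenshot to one action."""
    script: list = field(default_factory=list)

    def predict(self, screenshot: bytes, task: str, history: list) -> Action:
        # A real model would condition on the screenshot, task, and history;
        # this stub just replays a script, then stops.
        return self.script[len(history)] if len(history) < len(self.script) else Action("stop")

def run_task(agent, take_screenshot, execute, task: str, max_steps: int = 50):
    """Screenshot-only control loop: no accessibility tree, just pixels in, actions out."""
    history = []
    for _ in range(max_steps):
        action = agent.predict(take_screenshot(), task, history)
        history.append(action)
        if action.kind == "stop":
            break
        execute(action)   # dispatch to mouse/keyboard, e.g. via a browser driver
    return history
```

Because the loop consumes only screenshots, the same harness works against any surface that can be rasterized, which is what lets the model use the same modalities as a human.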
Fara-7B exhibits strong performance compared to existing models across a diverse set of benchmarks. This includes both existing benchmarks and new evaluations we are releasing that cover useful task segments underrepresented in common benchmarks. While Fara-7B demonstrates strong benchmark results, even against much larger models, it shares many of their limitations, including challenges with accuracy on more complex tasks, mistakes in following instructions, and susceptibility to hallucinations. These are active areas of research, and we’re committed to ongoing improvements as we learn from real-world use.

Figure 1: Comparing WebVoyager accuracy and cost of Fara-7B to other computer use agents (CUAs) or agents that prompt LLMs with accessibility trees (SoM Agent w/ Ax Tree). Cost is computed by multiplying the average number of input and output tokens each model consumes by price per token. Both Fara-7B and UI-TARS-1.5-7B are based on Qwen-2.5-VL-7B, for which the lowest inference price from https://openrouter.ai/ is $0.2/$0.2 per 1M input/output tokens. Even though both models are priced equally, Fara-7B is more efficient, completing tasks with only ~16 steps on average compared to ~41 for UI-TARS-1.5-7B. OpenAI computer-use-preview accessed November 2025 via the Responses API.
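The cost comparison in the figure reduces to simple arithmetic. Only the step counts (~16 vs. ~41) and the $0.2 per 1M token price come from the caption; the tokens-per-step value below is a placeholder assumption, and with equal pricing the relative cost tracks the step count regardless of its exact value:

```python
PRICE_PER_M_TOKENS = 0.20  # $/1M tokens; lowest openrouter.ai price quoted for both models

def task_cost(avg_steps: float, tokens_per_step: float,
              price_per_m: float = PRICE_PER_M_TOKENS) -> float:
    """Approximate $ cost of one task: total tokens consumed times price per token."""
    return avg_steps * tokens_per_step * price_per_m / 1_000_000

# Assumed tokens per step, for illustration only.
TOKENS_PER_STEP = 2_000
fara_cost = task_cost(16, TOKENS_PER_STEP)      # ~16 steps per task on average
ui_tars_cost = task_cost(41, TOKENS_PER_STEP)   # ~41 steps per task on average
```

Since both models pay the same price per token, Fara-7B's roughly 16/41 cost advantage follows directly from its shorter trajectories.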
Evaluations
We evaluate Fara and comparable baselines on canonical public benchmarks, including WebVoyager, Online-Mind2Web, and DeepShop, as well as on WebTailBench, a new benchmark we developed. WebTailBench focuses on 11 real-world task types that are underrepresented or missing in existing benchmarks, such as booking movie/event tickets, making restaurant reservations, comparing prices across retailers, applying for jobs, finding real estate, and more complex multi-step tasks.
Evaluating web agents can be tricky: the web is constantly changing, and many websites block detected bots. We therefore developed a test harness that relies on BrowserBase to standardize how browser sessions are managed. In Table 1 below, we report task success rate (%) as defined by each benchmark’s official LLM-as-judge evaluator; WebTailBench success is computed using the same Task Verification pipeline that filtered our training data. We find that Fara-7B is state-of-the-art, outperforming both native computer-use agents like UI-TARS-1.5-7B and much larger models like GPT-4o prompted to act as a computer use agent with Set-of-Marks (SoM Agent).
| | Model | WebVoyager | Online-Mind2Web | DeepShop | WebTailBench |
|---|---|---|---|---|---|
| SoM Agent | SoM Agent (GPT-4o) | 65.1 | 34.6 | 16.0 | 30.0 |
| | GLM-4.1V-9B-Thinking | 66.8 | 33.9 | 32.0 | 22.4 |
| Computer Use Models | OpenAI computer-use-preview | 70.9 | 42.9 | 24.7 | 25.7 |
| | UI-TARS-1.5-7B | 66.4 | 31.3 | 11.6 | 19.5 |
| | Fara-7B | 73.5 | 34.1 | 26.2 | 38.4 |
Table 1: Performance comparison across four web benchmarks: WebVoyager, Online-Mind2Web, DeepShop, and our newly introduced WebTailBench. Results are reported as Task Success Rate / Accuracy (%) and are averaged over 3 runs. OpenAI computer-use-preview accessed November 2025 via the Responses API.
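The evaluation recipe above (standardized browser sessions, an LLM-as-judge verdict per task, results averaged over 3 runs) can be sketched as follows. The session and judge interfaces are hypothetical stand-ins, not our actual harness or BrowserBase's API:

```python
def evaluate_benchmark(tasks, open_session, run_agent, judge, runs=3):
    """Average task success rate (%) over multiple runs.

    open_session: factory for a managed browser session (stand-in for the
                  BrowserBase-backed harness described above).
    run_agent:    executes the agent on one task, returning a transcript.
    judge:        LLM-as-judge stand-in returning 1 (success) or 0 (failure).
    """
    rates = []
    for _ in range(runs):
        passed = 0
        for task in tasks:
            session = open_session()
            try:
                transcript = run_agent(session, task)
                passed += judge(task, transcript)
            finally:
                session.close()  # always release the browser session
        rates.append(100.0 * passed / len(tasks))
    return sum(rates) / len(rates)
```

Averaging over several runs matters here because live websites change between attempts, so a single run can over- or under-state an agent's true success rate.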
Safety
Fara‑7B operates by visually perceiving the browser through screenshots, collecting only what is necessary to complete the user’s requested task. Specifically, the model processes browser screenshots, user task instructions, and a history of actions taken during each session. No additional site data—such as accessibility trees or external scaffolding—is accessed; Fara interacts with the computer in the same way a human would, relying solely on what is visible on the screen.
Transparency and user control are foundational to Fara‑7B’s design. All actions taken by the agent are logged and auditable, allowing users to review and monitor every step. Fara‑7B does not store or transmit screenshots or personal data beyond the session, and no user data is used to train or fine-tune the model.
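A session-scoped audit log of the kind described above can be sketched like this. The structure is an assumption for illustration, not Fara-7B's implementation; the key property is that entries live in memory for review during the session and are discarded with it, never persisted or transmitted:

```python
import json
import time

class SessionAuditLog:
    """In-memory, session-scoped record of agent actions.

    Auditable while the session lives; garbage-collected with it, so nothing
    is stored or transmitted beyond the session.
    """

    def __init__(self):
        self._entries = []

    def record(self, action: str, detail: dict):
        """Append one timestamped action entry."""
        self._entries.append({"ts": time.time(), "action": action, "detail": detail})

    def review(self) -> str:
        """Human-readable dump so the user can inspect every step taken."""
        return json.dumps(self._entries, indent=2)

    def __len__(self):
        return len(self._entries)
```

A user-facing agent would surface `review()` in its UI so each click, keystroke, and scroll can be checked after the fact.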
Looking Forward
Our current release is an experimental CUA model that achieves state-of-the-art results for its size using only supervised fine-tuning. We believe even stronger CUA models capable of running on-device are possible through improved multimodal base models and through reinforcement learning in live and sandboxed environments.