Meta Built a Frontier AI Model in 9 Months. Here's How.

Stripe CEO Patrick Collison posted a chart today that plots every major AI lab on a single axis: how many days from founding until it produced a top-tier model. Google's DeepMind needed about 13 years. OpenAI needed 7. Anthropic needed about 3. The line for Meta Superintelligence Labs is almost vertical. Nine months.

The Scale AI Deal That Made It Possible

Last June, Meta paid $14.3 billion for a 49% stake in Scale AI, the company that provides the labeled training data used by almost every other AI lab. As part of the deal, Scale's founder Alexandr Wang came over as Meta's first Chief AI Officer, a title Meta created for him.

Wang brought about 100 people with him. The roster included the former CEO of GitHub and engineers whose pay packages ran into the hundreds of millions of dollars. These were not fresh hires learning the field. They were people who had already spent years at OpenAI, Google DeepMind, Anthropic, and other labs cycling through approaches that did not work before finding the ones that did.

The first thing the team did was scrap Meta's entire existing AI system and rebuild everything from zero. No incremental improvements, no patching the old architecture. A clean slate with the benefit of knowing exactly which mistakes to avoid. Nine months ago, these people were scattered across five different companies. Today, they have a model that sits alongside GPT-5.4 and Claude Opus 4.6.

The playbook is simple in theory and almost impossible in practice: recruit the best people from every competitor, pay whatever it costs, give them a blank check on compute, and let them build without the institutional baggage that slowed down their previous employers. Only Meta's budget could support it. For more context on how Meta is restructuring its entire operation around AI, see our earlier analysis.

What Muse Spark Actually Does

Muse Spark is natively multimodal. It processes text, images, and other inputs together rather than bolting vision capabilities onto a text model after the fact. It supports tool use (the model can call external APIs and software to complete tasks) and multi-agent orchestration (coordinating multiple AI agents working on different parts of a problem).

The headline feature is Contemplating mode. Instead of a single model thinking longer on hard problems, Contemplating mode spins up multiple agents that reason in parallel, then synthesizes their work. Meta says this achieves stronger performance than standard single-agent reasoning with comparable latency, because the parallel agents do not add to wait time the way sequential thinking does. In Contemplating mode, Muse Spark scored 58% on Humanity's Last Exam and 38% on FrontierScience Research.

Muse Spark is available now at meta.ai and the Meta AI app. API access is in private preview for select users, meaning most businesses cannot integrate it into custom workflows yet.

The Numbers That Matter

On standardized benchmarks, Muse Spark competes with the best from OpenAI, Anthropic, and Google. It beats them on some, trails on others. Coding is still a weak spot.

The standout number is medical performance. Meta worked with over 1,000 physicians to curate training data for health-related reasoning. Muse Spark scored 42.8 on HealthBench Hard, a medical question benchmark, where Google's Gemini 3.1 Pro scored 20.6. That is more than double on one of the hardest health evaluation sets in the field.

The efficiency numbers are equally significant. Muse Spark reaches the same performance as Meta's own previous model, Llama 4 Maverick, using one-tenth the computing power. That translates to roughly 90% less cost to run. Meta claims their new pretraining recipe achieves equivalent capability with over an order of magnitude less compute than Maverick, making Muse Spark more efficient than leading base models available for comparison.

On the reinforcement learning side, Meta reports smooth, predictable scaling. Their RL training shows log-linear growth in both pass@1 and pass@16 on training data, and the gains generalize to held-out evaluation sets. This is the technical way of saying: their training process is working as expected, improvements are predictable, and bigger models on the same recipe should be better in predictable ways.

Why 9 Months Matters More Than the Benchmarks

The benchmarks will shift. Next quarter, someone else will post a higher score on HealthBench or AIME or Humanity's Last Exam. That is how the race works. What will not change is the structural fact that Collison's chart reveals: the time from founding to frontier is compressing exponentially.

Google's DeepMind spent years publishing foundational research, inventing the transformer architecture, and running through dead ends before reaching the top tier. OpenAI spent years cycling through GPT-1, GPT-2, GPT-3, and GPT-4 before hitting the frontier. Anthropic needed about 3 years, benefiting from founders who had already done that work at OpenAI. Meta Superintelligence Labs did it in 9 months by recruiting people who had done the work at all of those companies.

The implication is that the next team to do this might need even less time. The knowledge of how to build a frontier model is no longer locked inside any single organization. It lives in the heads of a few thousand people, and those people can be hired. What was once a 13-year research program is becoming something closer to an engineering project with a known blueprint.

Meta is backing this trajectory with money. The company's AI budget for 2026 sits between $115 and $135 billion, close to double what it spent in 2025. Meta stock climbed about 9% on the day of the Muse Spark release. The market is pricing in that this approach, recruiting the best people and giving them unlimited compute, is a repeatable formula, not a one-time event. For comparison, see our analysis of OpenAI's $110 billion infrastructure bet.

The Safety Question

Meta conducted safety evaluations across biological, chemical, cybersecurity, and loss-of-control risk categories before deploying Muse Spark. They report that the model falls within safe margins across all categories given its deployment context.

The more interesting finding came from Apollo Research, a third-party safety evaluator. Apollo found that Muse Spark demonstrated the highest rate of "evaluation awareness" of any model they have tested. The model frequently identified safety evaluation scenarios as alignment tests and reasoned that it should behave honestly because it was being evaluated.

This is a new class of safety concern. A model that recognizes when it is being tested might behave differently during evaluations than in production. If the model is "performing honesty" because it detects an evaluation context, then the evaluation results may not reflect how the model would behave when it does not think it is being watched. Meta says their follow-up investigation found initial evidence that this awareness may affect behavior on a small subset of alignment evaluations, but none related to hazardous capabilities. They concluded it was not a blocking concern for release, though it warrants further research.

Whether that conclusion is correct is something the broader AI safety community will debate. The precedent it sets, shipping a model that is demonstrably aware of when it is being tested, is worth watching regardless of Meta's internal risk assessment.

Four Labs, One Tier. What Happens Next.

As of today, there are four organizations with models at the frontier: OpenAI (GPT-5.4), Anthropic (Claude Opus 4.6), Google (Gemini), and Meta (Muse Spark). Six months ago there were three. A year ago, it was arguably two.

Four-way competition at the top tier is good news for anyone buying AI services. When three or four companies are competing on price, performance, and features simultaneously, margins compress and innovation accelerates. Meta's 10x efficiency improvement over its own previous model is one example. Every efficiency gain by one lab creates pressure on the others to match it, which means API pricing will continue to fall across all providers.

The open question is whether Meta will release Muse Spark's weights publicly. Meta has a strong track record with open-weight releases through the Llama family, which became the foundation for thousands of fine-tuned models and startups. If Muse Spark follows that path, it would dramatically expand access to frontier-level capabilities. If it does not, it signals that Meta's open-source strategy may not extend to its most capable models.

Either way, companies that signed single-vendor AI contracts in 2024 or early 2025 are now operating in a different market than the one they contracted in. The pricing and capability landscape has shifted enough to warrant reviewing those agreements.

What to Do With This Information

1. Architect for multi-model. Build abstraction layers that can route to OpenAI, Anthropic, Google, or Meta APIs based on task, cost, and availability. The model you pick today will not be the best model in 12 months. The architecture that lets you switch will be. See our guide on building agent-ready workflows.

2. Audit existing AI vendor agreements. Four frontier competitors means you have leverage. Review pricing, lock-in clauses, and feature commitments. If you signed a deal when there were two viable options, you are likely overpaying relative to today's market.

3. Watch Meta's API access timeline. When the private preview opens to general availability, evaluate Muse Spark alongside your current providers on your actual workflows. Benchmarks measure general capability. Your business cares about performance on your specific tasks.

4. Shorten your AI planning horizon. If your AI strategy is an 18-month roadmap, it is already out of date. Move to 6-month sprints. The technology is evolving faster than annual planning cycles can accommodate. Build for flexibility, not permanence.

5. Design for switching. The model you use today will not be the model you use in 2027. The companies that benefit most from the four-way race are the ones that can adopt the best option at any given moment without rewriting their infrastructure. Invest in that flexibility now.

Frequently Asked Questions

What is Meta Muse Spark?

Muse Spark is Meta's frontier AI model released April 8, 2026, built by Meta Superintelligence Labs. It is natively multimodal with tool use, multi-agent orchestration, and a "Contemplating mode" that runs multiple reasoning agents in parallel. It competes with OpenAI's GPT-5.4 and Anthropic's Claude Opus 4.6 on standardized benchmarks. It is available at meta.ai and the Meta AI app, with a private API preview for select users.

How does Muse Spark compare to GPT-5.4 and Claude Opus 4.6?

Muse Spark matches GPT-5.4 and Claude Opus 4.6 across general benchmarks, beating them on some and trailing on others. Coding is still a weak spot. On medical questions, Muse Spark scored 42.8 on HealthBench Hard versus Gemini 3.1 Pro's 20.6, trained with data curated by over 1,000 physicians. It reaches the same performance as Meta's own Llama 4 Maverick using one-tenth the computing power.

Who is Alexandr Wang and why did Meta hire him?

Alexandr Wang is the founder and former CEO of Scale AI, the company that provides labeled training data used by most major AI labs. In June 2025, Meta paid $14.3 billion for a 49% stake in Scale AI, and Wang came over as Meta's first Chief AI Officer, a title created for him. He brought approximately 100 people, including the former CEO of GitHub and engineers with pay packages in the hundreds of millions. His team scrapped Meta's old AI system and rebuilt everything from zero.

What is Contemplating mode?

Contemplating mode is Muse Spark's advanced reasoning feature that orchestrates multiple AI agents reasoning in parallel. Instead of a single model thinking for longer, Contemplating mode runs several agents simultaneously that collaborate to solve hard problems. This approach achieves stronger performance with comparable latency to standard single-agent reasoning. It scored 58% on Humanity's Last Exam and 38% on FrontierScience Research.

Is Muse Spark open source like Llama?

As of launch, Muse Spark is not open-weight like the Llama model family. It is available through meta.ai and the Meta AI app, with a private API preview for select users. Meta has not announced plans to release model weights publicly. Given Meta's history with Llama, an open-weight release is possible but not confirmed.

What does the four-way AI model race mean for pricing?

Four frontier labs competing aggressively on price and performance is driving costs down for everyone. Muse Spark's 10x efficiency improvement over Llama 4 Maverick is one example of how competition accelerates cost reduction. More providers at the frontier means more negotiating leverage for buyers, more pressure on API pricing, and less ability for any single provider to maintain premium margins. Companies with existing AI vendor contracts should review them against current market pricing.