Tokenmaxxing Hits a Wall: Uber's AI Spending Reckoning

Few companies have leaned into AI coding as hard as Uber. So it carries weight when two of its most senior executives say, in public, that the spending is getting hard to justify. In a Rapid Response interview, chief operating officer Andrew Macdonald said it was becoming harder to defend Uber's escalating AI expenditures, because higher token usage was not translating into a proportional increase in useful consumer features. The remark followed an April disclosure from chief technology officer Praveen Neppalli Naga that Uber had already burned through its entire 2026 AI coding budget, four months into the year. The industry has a word for the behavior underneath this: tokenmaxxing, the belief that spending more tokens is the same as making more progress. Uber just became the first big company to say, out loud, that it is not.

This matters beyond Uber. The same dynamic is playing out wherever engineering organizations rolled out AI coding tools at speed and let consumption run. The mistake is subtle, because the inputs look like outcomes. Tokens spent, prompts sent, and seats activated all go up and to the right, and it is easy to read those curves as productivity. They are not. They are a fuel gauge, not an odometer. The useful question is not how much you are consuming. It is how much of what you consume turns into shipped, correct, durable work, and Uber's reckoning is the clearest signal yet that the answer is lower than the spending implies.

What Uber's Executives Actually Said

The two statements land differently, and both are worth reading precisely. Macdonald's point was not that AI coding is useless. It was that the relationship between spend and benefit had stopped looking linear. Based on conversations with Uber's senior engineering leaders, he said, he had come to see that higher token usage did not produce a proportional increase in the consumer-facing features that justify the cost. That is a careful claim. It does not say the tools failed. It says the marginal dollar stopped pulling its weight, and that the people closest to the work were the ones telling him so.

Neppalli Naga's disclosure, reported by The Information in April, was blunter and more concrete. "I'm back to the drawing board because the budget I thought I would need is blown away already," he said, describing a 2026 AI coding budget that was exhausted four months into the year. The two remarks fit together. The CTO saw the cost line break the plan, and the COO articulated why that was a problem rather than a triumph: the spending was real, the proportional value was not obviously there, and the gap was widening.

What "Tokenmaxxing" Means

Tokenmaxxing is the drive to consume as many AI tokens as possible on the assumption that more usage automatically means more output. The name is a little tongue-in-cheek, but the behavior is genuine and widespread. It took hold because, in the early phase of AI coding adoption, consumption really did correlate with value. When a team goes from zero AI usage to heavy usage, output jumps, and the simplest explanation, more tokens equals more work, holds up well enough to become a habit. Leaders started treating token growth as a leading indicator of an engineering org that was modern and moving fast.

The problem is that the correlation is local, not global. It is strong at the bottom of the curve, where any usage beats none, and it flattens as consumption climbs. Tokenmaxxing keeps optimizing the input long after the output has stopped responding. It is the engineering-budget version of judging a road trip by how much gas you burned. Early on, more gas means more distance. Past a point, you are idling in traffic with the engine running, and the fuel gauge keeps ticking down while the odometer barely moves.

How Uber's AI Coding Budget Blew Up

The numbers explain how a serious company overshot a serious budget so quickly. Uber rolled out Anthropic's Claude Code to roughly 5,000 engineers starting in December 2025, and adoption was fast and deep rather than tentative. The depth is the part that drove the cost.

Cost per heavy user ran $500 to $2,000 per month. That range, not the average, is the story. A power-user tail consuming the top of that band can dominate total spend even if most engineers sit far below it, which makes a blended budget estimate dangerously optimistic.

About 95 percent of engineers used AI tools monthly. Adoption was effectively universal, so there was no remaining headroom from rollout to absorb. Every future cost increase would come from deeper usage, not wider usage.

Roughly 70 percent of committed code was AI-generated. The tools were not a sidecar. They were the main road for most of the codebase, which is exactly why the spend mattered and why pulling back is hard.

Around 11 percent of live backend code updates were written entirely by AI agents with no direct human input. That is the leading edge of autonomy, and it is genuinely impressive. It is also the most expensive mode of operation, because agentic workflows loop, retry, and explore, consuming tokens at a rate a human-in-the-loop session never approaches.

Put those together and the budget overshoot stops looking like a forecasting error and starts looking structural. Universal adoption removed the cheap growth. A power-user tail and autonomous agents supplied expensive growth. And nobody had a value metric sitting next to the cost metric to say when the next dollar stopped being worth it. The budget was built on a usage assumption, and usage is precisely the thing that runs away from you when consumption is treated as a virtue.
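To make the tail effect concrete, here is a small arithmetic sketch. Only the 5,000-engineer headcount and the $500 to $2,000 heavy-user band come from the reporting above. The 10 percent heavy-user share, the $1,500 heavy-user figure, and the $100 figure for everyone else are assumptions chosen purely for illustration, not Uber's actual distribution.

```python
# Hypothetical illustration: only the 5,000-engineer headcount and the
# $500-$2,000 heavy-user band come from the article; the rest is assumed.
ENGINEERS = 5_000
HEAVY_SHARE = 0.10          # assumed: 10% of engineers are heavy users
HEAVY_MONTHLY_COST = 1_500  # assumed: a point inside the reported band
LIGHT_MONTHLY_COST = 100    # assumed: typical spend for everyone else


def monthly_spend(engineers, heavy_share, heavy_cost, light_cost):
    """Return (total spend, heavy-tail spend, blended average per engineer)."""
    heavy_users = round(engineers * heavy_share)
    light_users = engineers - heavy_users
    heavy_total = heavy_users * heavy_cost
    total = heavy_total + light_users * light_cost
    return total, heavy_total, total / engineers


total, heavy_total, blended = monthly_spend(
    ENGINEERS, HEAVY_SHARE, HEAVY_MONTHLY_COST, LIGHT_MONTHLY_COST
)
tail_share = heavy_total / total

print(f"Total monthly spend: ${total:,.0f}")
print(f"Blended average per engineer: ${blended:,.0f}")
print(f"Share of spend from heavy users: {tail_share:.1%}")
```

Under these assumed inputs, one engineer in ten accounts for well over half the bill, while the blended average looks modest. That is the sense in which a budget built on an average can be dangerously optimistic.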

Why Token Usage Doesn't Equal Value

The intuition that more usage means more output is not crazy. It is just incomplete, because generating code was never the bottleneck. Reviewing it, integrating it, testing it, and confirming it is correct are the bottlenecks, and those costs are paid in human attention, not tokens. When an AI agent produces ten times more code, the organization does not get ten times the value. It gets ten times the review burden, and the value is capped by how much of that output the team can actually verify and ship.
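The cap described above can be sketched in a few lines. This is a deliberately simplified model, assuming a fixed weekly review capacity (the 120 figure is hypothetical) and treating every generated change as equally costly to verify.

```python
REVIEW_CAPACITY = 120  # assumed: changes the team can verify per week


def shippable_changes(generated, review_capacity):
    """Changes that can actually ship are capped by what reviewers can verify."""
    return min(generated, review_capacity)


# Hypothetical weekly volumes of AI-generated changes.
for generated in (50, 100, 500, 1000):
    shipped = shippable_changes(generated, REVIEW_CAPACITY)
    unreviewed = generated - shipped
    print(f"generated={generated:>5}  shipped={shipped:>4}  unreviewed={unreviewed:>4}")
```

Once generation exceeds review capacity, extra output lands in the unreviewed pile rather than in production. Real review capacity is not a hard constant, but the shape of the constraint is the point.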

A Jellyfish analysis published in May put data behind the intuition. It found that while token usage does boost raw coding output, extreme consumption delivers diminishing returns. Past a certain level, each additional token buys progressively less usable work. That is the empirical shape of the curve Macdonald described from the inside: strong at first, then flattening, with the heaviest consumers sitting furthest out on the flat part where the marginal token is nearly worthless. Tokenmaxxing optimizes for the part of the curve that has already stopped paying.
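One way to picture a diminishing-returns curve is with a concave function. The sketch below assumes a logarithmic shape, which is an illustrative choice and not the model used in the Jellyfish analysis. The `scale` and `knee` parameters and the token levels are all hypothetical.

```python
import math


def usable_output(tokens_millions, scale=100.0, knee=10.0):
    """Assumed concave curve: output grows with the log of token usage."""
    return scale * math.log1p(tokens_millions / knee)


levels = [10, 20, 40, 80, 160]  # hypothetical monthly tokens, in millions
marginal = []
for low, high in zip(levels, levels[1:]):
    # Average output gained per extra million tokens across this step.
    gain_per_million = (usable_output(high) - usable_output(low)) / (high - low)
    marginal.append(gain_per_million)
    print(f"{low:>3}M -> {high:>3}M tokens: {gain_per_million:.2f} output units per extra million")
```

Each doubling of consumption buys less per token than the one before it. Any concave curve would show the same pattern. The practical task is estimating where your own curve starts to flatten.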

This is the same lesson the industry is learning about agents more broadly. We wrote about the shift from copilots to autonomous systems in AI Autopilots vs Copilots, and the more autonomous the system, the more it consumes per unit of supervised output. Autonomy raises the ceiling on what AI can do and raises the cost of finding out whether it did it right. Both move together, and a budget that prices only the first one is going to break.

The Hiring Tradeoff

The budget overshoot is not staying inside the engineering tools line. In May, Business Insider reported that Uber was slowing hiring to help fund its AI investment. That is the moment an experiment becomes a strategy. When a company starts trading headcount for tokens, it has decided, implicitly, that a dollar of AI spend produces more than a dollar spent on the marginal engineer, and it is reallocating the budget to match.

The tradeoff sharpens the measurement problem rather than resolving it. If you are going to fund AI by not hiring, you need to know that the AI is actually delivering the output the forgone hire would have delivered, and the only honest way to know that is to measure outcomes on both sides. Otherwise the company is swapping a known cost with known output (an engineer) for an uncertain cost with unmeasured output (a token budget that already blew past its forecast once). The hiring slowdown raises the stakes on getting the ROI question right, because now the answer determines staffing, not just a software line item.

The Broader Reckoning

Uber is early to say it, not alone in facing it. The tension between escalating AI spend and hard-to-pin-down returns runs through the entire sector right now. The largest infrastructure buyers are spending at a scale that, by some measures, now rivals or exceeds what they return to shareholders, and the public debate has shifted from whether AI is useful to whether the spending is proportional to the benefit. Some executives frame the spend as a deliberate bet on future demand. Others, increasingly, are admitting the returns are harder to locate than the costs.

What makes Uber's case clarifying is that it is not an infrastructure buyer rationalizing capex. It is a consumer of AI coding tools, looking at its own bill, and reporting from the trenches that the heaviest usage is not buying proportional features. That is a different and more credible signal than a vendor or hyperscaler talking its book. It is also a preview of a conversation that every organization scaling AI tooling will have, the moment the budget catches up with the enthusiasm. The agent-to-agent commerce we covered in When Software Buys Software only intensifies this, because autonomous systems that spend on each other's services make consumption even easier to grow and even harder to tie to value.

How to Measure AI Coding ROI

The fix is not to spend less for its own sake. It is to stop scoring AI by consumption and start scoring it by results. Every metric below shares one property: it goes up only when real work gets done, so it cannot be inflated by tokenmaxxing.

Cost per merged pull request. Divide AI spend by the number of pull requests that actually merged, not by lines generated. Code that never merges is pure cost, and this single ratio exposes teams whose token bills are climbing while their merged output is flat.

Cycle time from ticket to production. The promise of AI coding is speed. Measure it directly. If spend is rising and the time from opened ticket to shipped feature is not falling, the tokens are not buying what they are supposed to.

Defect and rework rate on AI-generated code. Output that ships and then breaks is worse than no output, because it consumes tokens twice and burns trust. Track how often AI-generated changes get reverted or hotfixed, and weigh that against the raw volume.

Share of planned features shipped. The COO's actual complaint was about consumer features, not abstract productivity. Tie spend back to the roadmap: of the features you planned this quarter, how many shipped, and did more AI spend move that number?

Per-team token budgets with review of the heavy tail. Because a small group of power users can dominate spend, give each team a budget and look closely at the top consumers. The goal is not to punish them. It is to confirm their output justifies their bill, and to catch the case where it does not.
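The first four metrics above reduce to simple ratios, and a minimal sketch shows how they might be computed from change records. The `Change` record, its field names, the sample data, and the $6,000 spend figure are all hypothetical. In practice the inputs would come from your own ticketing, source control, and billing systems.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional


@dataclass
class Change:
    ticket_opened: date
    shipped: Optional[date]  # None if the change never merged
    ai_generated: bool
    reverted: bool = False   # reverted or hotfixed after shipping


def cost_per_merged_pr(ai_spend, changes):
    """AI spend divided by changes that actually merged."""
    merged = [c for c in changes if c.shipped is not None]
    return ai_spend / len(merged) if merged else float("inf")


def median_cycle_days(changes):
    """Median days from opened ticket to shipped change."""
    days = [(c.shipped - c.ticket_opened).days for c in changes if c.shipped]
    return median(days) if days else None


def rework_rate(changes):
    """Fraction of shipped AI-generated changes later reverted or hotfixed."""
    shipped_ai = [c for c in changes if c.ai_generated and c.shipped]
    if not shipped_ai:
        return 0.0
    return sum(c.reverted for c in shipped_ai) / len(shipped_ai)


def feature_share(planned, shipped):
    """Share of planned roadmap features that shipped."""
    return shipped / planned if planned else 0.0


# Hypothetical sample data for one team and one month.
changes = [
    Change(date(2026, 3, 2), date(2026, 3, 6), ai_generated=True),
    Change(date(2026, 3, 3), date(2026, 3, 13), ai_generated=True, reverted=True),
    Change(date(2026, 3, 4), None, ai_generated=True),  # never merged
    Change(date(2026, 3, 5), date(2026, 3, 11), ai_generated=False),
]
AI_SPEND = 6_000  # assumed monthly AI spend in dollars

print("Cost per merged PR:", cost_per_merged_pr(AI_SPEND, changes))
print("Median cycle time (days):", median_cycle_days(changes))
print("Rework rate on AI code:", rework_rate(changes))
print("Planned features shipped:", feature_share(planned=8, shipped=6))
```

Note that the unmerged change still counts toward spend but not toward the denominator, which is exactly why cost per merged pull request rises when token bills climb and merged output stays flat.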

What Operators Should Do Now

You do not need a budget blowout of your own to act on Uber's. Four steps put you ahead of the curve.

1. Instrument outcomes before you scale spend. Stand up at least one value metric (cost per merged pull request is the easiest) next to your token spend before you expand seats or raise limits. If you cannot see value and cost on the same chart, you are flying on the fuel gauge alone.

2. Cap and review the heavy users. Set a per-engineer or per-team ceiling, and review anyone who hits it. Most of your runaway cost lives in a small tail, and a light-touch review of that tail recovers most of the savings without slowing the broad base of productive users.

3. Tie the AI budget to delivered value, not usage targets. Stop celebrating token growth in reviews. Make the AI line item clear the same bar as any other investment: it has to produce measurable output, and the threshold is a result, not a usage number.

4. Run a measured pilot before the org-wide rollout. Uber adopted fast and broad, then discovered the cost shape afterward. A scoped pilot with the value metrics already wired in tells you the slope of your own diminishing-returns curve before you have committed a year's budget to it. If you are early in this, our guide to agentic AI workflows covers how to scope that first pilot, and the Claude pricing breakdown helps you model the cost side honestly.
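Step 2 is the easiest of the four to automate. The sketch below assumes you can export monthly spend and merged pull request counts per engineer. The $1,000 ceiling, the engineer names, and the figures are hypothetical placeholders, not recommended values.

```python
MONTHLY_CEILING = 1_000  # assumed per-engineer ceiling, in dollars

# Hypothetical export: engineer -> (monthly AI spend, merged pull requests)
usage = {
    "eng_a": (1_800, 30),
    "eng_b": (1_200, 2),
    "eng_c": (300, 12),
    "eng_d": (90, 5),
}


def review_queue(usage, ceiling):
    """List engineers at or above the ceiling with their cost per merged PR."""
    queue = []
    for engineer, (spend, merged_prs) in usage.items():
        if spend >= ceiling:
            cost_per_pr = spend / merged_prs if merged_prs else float("inf")
            queue.append((engineer, spend, cost_per_pr))
    # Highest cost per merged PR first: the bills least backed by output.
    return sorted(queue, key=lambda row: row[2], reverse=True)


for engineer, spend, cost_per_pr in review_queue(usage, MONTHLY_CEILING):
    print(f"{engineer}: ${spend:,} spent, ${cost_per_pr:,.0f} per merged PR")
```

Sorting by cost per merged pull request rather than by raw spend keeps the review from penalizing heavy users whose output justifies their bill. In this sample the largest spender lands second in the queue because the spend is backed by merged work.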

None of this is an argument against AI coding. Uber is not abandoning the tools, and neither should anyone else who is getting real work from them. The argument is narrower and more durable: consumption is not a scoreboard. The companies that win the next phase of AI adoption will be the ones that measured value early, capped the spending that was not buying any, and treated their token budget like the capital allocation it actually is. Tokenmaxxing was the easy mistake of the rollout phase. The reckoning Uber just made public is what the discipline phase looks like.

Frequently Asked Questions

What is tokenmaxxing?

Tokenmaxxing is the belief that consuming as many AI tokens as possible is itself a sign of progress, on the assumption that more usage automatically means more output. The term spread through Silicon Valley as engineering teams raced to adopt AI coding tools in late 2025 and early 2026. Uber's chief operating officer Andrew Macdonald voiced the doubt about it publicly in a Rapid Response interview, noting that based on conversations with senior engineering leaders, higher token usage was not translating into a proportional increase in useful consumer features. The core mistake is treating an input (tokens spent) as if it were an outcome (value delivered).

Did Uber stop using AI coding tools?

No. Uber is questioning the return on its AI spending, not abandoning the tools. Roughly 95 percent of Uber engineers use AI tools monthly, about 70 percent of committed code is AI-generated, and around 11 percent of live backend code updates are written entirely by AI agents without direct human input. The debate is about cost discipline and measurement, not retreat. The company is asking whether escalating token consumption is producing proportional value, which is a very different question from whether AI coding works at all.

How much does AI coding cost per engineer?

At Uber, monthly API costs ranged from roughly $500 to $2,000 per engineer for heavy users of Anthropic's Claude Code, which the company rolled out to about 5,000 engineers starting in December 2025. That spread is the important detail: a small group of power users can drive a large share of total spend. Uber's chief technology officer Praveen Neppalli Naga disclosed in April that the company had already exhausted its entire 2026 AI coding budget, four months into the year, which is what triggered the internal rethink.

How should companies measure AI coding ROI?

Measure outcomes, not consumption. Token counts, prompt volume, and seats activated are inputs that tell you what you spent, not what you got. Better metrics tie spend to delivered value: cost per merged pull request, cycle time from ticket to production, defect and rework rates on AI-generated code, and the share of planned features that actually shipped. Set per-team token budgets, review the heaviest users to confirm their output justifies their spend, and treat AI investment like any other capital allocation by requiring it to clear a value threshold rather than a usage target.

Does more token usage mean more productivity?

Up to a point, then it stops. A Jellyfish analysis published in May found that while token usage boosts raw coding output, extreme consumption delivers diminishing returns. Past a certain level of spend, each additional token buys progressively less usable work, because the bottleneck shifts from generating code to reviewing it, integrating it, and confirming it is correct. That is the empirical version of what Uber's executives observed in practice: the curve between tokens and value flattens, and the most aggressive consumers are often the furthest out on the flat part.
