The Token Meter Is Moving Pricing Power To Whichever Model Is Cheapest To Host

Jul 02, 2026

On June 1, 2026, GitHub Copilot switched from flat-rate requests to per-token billing. Every model call now shows up as a line of input tokens, cached-input tokens, and output tokens, priced individually per model, on a bill the user watches accumulate in real time. One month later, on July 1, GitHub made Kimi K2.7 Code generally available in the same product: an Modified MIT-licensed, trillion-parameter open-weight model that GitHub hosts itself on Azure and markets, in its own words, as "a lower-cost option." GitHub didn't wait for Anthropic, OpenAI, or Google to cut their prices. It built the cheaper alternative and put it in the model picker next to theirs.

The sequence matters more than either change alone. GitHub's own billing documentation confirms per-token pricing now applies across input, cached input, and output separately for every model in Copilot, a move GitHub announced as a shift away from flat-rate requests. Usage-based billing converted a subsidized flat fee into a running number that GitHub itself had to look at. And the number it produced pointed at the obvious fix: stop routing every request through a proprietary frontier model when a self-hosted one does the job for less.

JetBrains reached the identical conclusion in the same window, from the supply side instead of the demand side. On June 2 it open-sourced Mellum2, a 12-billion-parameter mixture-of-experts model that activates only 2.5 billion parameters per token, built explicitly, in JetBrains' own words, to cut AI latency and cost for the routing and sub-agent work that would otherwise run through a frontier model. A model publisher gave away a model built specifically to replace paid calls to bigger ones. That's not philanthropy. That's a company reading the same cost curve GitHub just made visible and shipping the escape route before its own customers had to build one.

Microsoft Research made the same bet inside the same window, and it came from the research arm of the company that owns GitHub. MagenticLite, an agentic stack built on 9-billion to 14-billion-parameter models, exists because small-model orchestration is, in Microsoft's own framing, significantly cheaper and faster at scale than defaulting to frontier intelligence at every step. Microsoft's research division shipped MagenticLite in late May, weeks before its own product division metered the frontier model in Copilot. The reason not to use the frontier model by default existed inside Microsoft before the bill that made the case for it.

On the demand side, the number showed up as an actual budget breach. Uber burned through its entire 2026 AI coding allocation by April, with its heaviest users generating $500 to $2,000 a month each in token costs. Uber's response wasn't to negotiate a better rate. It imposed a hard $1,500-per-tool spending cap and stood up a real-time dashboard that puts each engineer's own consumption on screen, the way GitHub now makes its users watch theirs. A frontier-model contract that once ran as an invisible subscription became a resource every engineer had to ration.

Salesforce hit the same wall from the enterprise-contract side. It was tracking toward a $300 million annualized bill to Anthropic and went looking for model-routing software to bring that number down, not for a better relationship with its supplier. Anthropic's own pricing confirms why: Opus fast-mode carries a 2x price premium for output that Anthropic itself advertises as up to 2.5x faster, a ceiling, not a guarantee. The supplier's own numbers show the same non-linear cost curve that GitHub, Uber, and Salesforce are all pricing around from the outside.

None of this required coordination. GitHub metered its bill and undercut it with its own open-weight model in the same product cycle. JetBrains and Microsoft Research shipped the smaller model before their customers demanded it. Uber and Salesforce priced the frontier model out of their own workflows the moment the invoice became visible. The token meter didn't just change how coding assistants bill. It moved pricing power off Anthropic, OpenAI, and Google's roadmap and onto whichever open-weight model is currently cheapest to host.

AI Datum Point

Discussion about this post

Ready for more?