The April every AI plan broke
Five panicked moves in three weeks. One Anthropic exec quote that explained all of them. And the financial engineering reckoning the whole industry has been deferring.
April was a strange month for anyone who’s been tracking AI pricing. I keep a running file of the meaningful packaging and pricing moves from the major labs. By the third week of April my notes for the month had outgrown the page and started spilling into a separate document. Five major announcements, three of the four biggest providers, all in three weeks, all pointing in roughly the same direction. Here is the chronology, in the order it happened.
April 4. Anthropic gives Claude Pro and Max subscribers fewer than 24 hours notice that their subscriptions will no longer power third-party agent harnesses like OpenClaw.
April 9. OpenAI launches a brand new $100/month “Pro 5x” tier targeting heavy Codex users.
April 13. GitHub freezes new Copilot Pro trials over “abuse.”
April 16. Anthropic ships Opus 4.7 with a new tokenizer that uses up to 35% more tokens for the same input text.
April 20. GitHub pauses new individual signups for Copilot Pro, Pro+, and Student plans entirely, and removes Opus models from Pro.
April 21. Anthropic’s pricing page quietly drops Claude Code from Pro. By April 23, after Hacker News spends 12 hours on it, Anthropic walks the change back.
The same April 23, OpenAI ships GPT-5.5 with API prices doubled from $2.50/$15 to $5/$30 per million tokens.
Five panicked moves in three weeks, from three of the four biggest commercial AI providers in the world, with one common thread:
The original design of their subscription plans is being challenged by evolving product capabilities and usage patterns.
I see this pattern at smaller scale every week. A product team wants to ship a new plan. Sales wants to grandfather a cohort. Finance wants to flip a model from per-seat to per-token. The entitlement that gates the feature is computed inside the same service that does the inference. So the “small change” turns into a quarter-long migration, and the migration turns into a panic announcement on X. What happened in April was that pattern blowing up at the scale at Anthropic, OpenAI, and Microsoft simultaneously.
This post is about what broke, why it broke, and what proper financial engineering looks like when your unit economics live downstream a non-deterministic agent loop.
The plan was built for one product. The product evolved into five products.
The most honest line about any of this came from Amol Avasare, Anthropic’s head of growth, on X on April 22, trying to explain why Claude Code had silently disappeared from the Pro plan listing for 12 hours. Read it slowly…
“When we launched Max a year ago, it didn’t include Claude Code, Cowork didn’t exist, and agents that run for hours weren’t a thing. Max was designed for heavy chat usage, that’s it. Since then, we bundled Claude Code into Max and it took off after Opus 4. Cowork landed. Long-running async agents are now everyday workflows. The way people actually use a Claude subscription has changed fundamentally. We’ve made small adjustments along the way (weekly caps, tighter limits at peak), but usage has changed a lot and our current plans weren’t built for this.”
That is a structural overhaul sentence delivered as damage control. The phrase that should land for any engineer who has ever owned a billing system is “small adjustments along the way.” Those small adjustments are the duct tape. They are weekly caps and peak-hour throttles bolted onto plans that fundamentally cannot represent the product they’re supposed to govern.
A subscription plan is, mechanically, an entitlement contract. It declares what a customer can do, with what limits, until when. If the plan was authored when the product was a chat box, and the entitlements live in the same code path that runs inference, you cannot extend that plan to govern a product that runs autonomously for six hours. You can only patch around it. Which is what every provider is now doing in public, with their customers in the middle.
The OpenClaw cliff: when subscription math becomes pure subsidy
The clearest example was the OpenClaw cutoff on April 4. OpenClaw is an open-source agent harness that lets users wire any LLM into their own automation loops. By early 2026 it had north of 100,000 GitHub stars and a developer community routing autonomous workloads through Claude using their personal $200/month Max subscriptions.
Anthropic announced on a Friday evening that starting noon Pacific the next day, third-party harnesses including OpenClaw could no longer draw from subscription quotas. Users would have to pay through pay-as-you-go “extra usage” or a separate API key. Refund window through April 17 for anyone who wanted out.
The math that made this inevitable is brutal. Aakash Gupta posted the day before the cutoff:
“The $20/month all-you-can-eat buffet just closed. A single OpenClaw agent running for one day burns $1,000 to $5,000 in API-equivalent costs. On a $200 Max subscription. Anthropic was eating that difference on every user who routed through a third-party harness.”
That is a 5x to 25x subsidy per user per day. Boris Cherny, Anthropic’s head of Claude Code, explained the technical reason. First-party tools like Claude Code and Cowork are engineered to maximize prompt cache hit rates. Cached input tokens cost 10% of the standard rate. Third-party harnesses don’t follow the cache-friendly request patterns, so every interaction triggers full reprocessing of the conversation context at full price. The economics break by an order of magnitude on the same nominal token volume.
The technical case is real. The execution was widely criticized, specifically due to the sub-24-hour notice and lack of grandfathering. The OpenClaw creator Peter Steinberger, who joined OpenAI in February, got banned from Anthropic six days later for “suspicious activity,” then unbanned within hours after his post went viral. Four days after the cutoff, Anthropic launched Claude Managed Agents, its own paid agent runtime competing in the same category as OpenClaw. The optics, fairly or not, were “absorb the use case, then close the loophole.”
The financial engineering point matters more than the optics. There was no entitlement boundary inside Anthropic’s stack that could distinguish “user typing into a chat box” from “user spawning autonomous agents that run 18 hours a day.” The same OAuth token authenticated both. So the only way to ship a policy change was a binary cutoff. Any non-first-party tool, any usage pattern, blanket disabled.
The GitHub Copilot freeze: pricing is now an outage
April 20 was worse. GitHub paused new signups for Copilot Pro, Pro+, and Student plans entirely. Two days later they paused new self-serve Copilot Business signups for organizations on Free and Team plans too. The only remaining individual signup path was free Copilot. Existing subscribers got a refund window through May 20 with one twist. Cancellation is irreversible. You can’t come back.
VP of Product Joe Binder’s blog post is the most candid description of unit economics breaking I’ve seen from any provider:
“It’s now common for a handful of requests to incur costs that exceed the plan price... Long-running, parallelized sessions now regularly consume far more resources than the original plan structure was built to support.”
According to Ed Zitron, citing internal documents, the weekly cost of running GitHub Copilot doubled since the start of the year. Same plan price. Twice the underlying cost. Add the agent-driven workflows GitHub itself encouraged (their own /fleet command for parallel agent dispatch is now listed as something they’re asking users to limit) and the existing plan structure becomes financially unsustainable.
While they were closing the front door, GitHub also did something subtle to the back. Opus models are no longer available on the Pro plan. Opus 4.7 is restricted to Pro+. Opus 4.5 and 4.6 will be removed from Pro+ entirely. And the new Opus 4.7 carries a 7.5x premium request multiplier, up from 3x for Opus 4.6, even though Anthropic’s API price for the two models is identical at $5/$25 per million tokens.
The reason 7.5x is “justified” is that Opus 4.7 ships with a new tokenizer that produces up to 35% more tokens for the same input text. Anthropic confirms this in their pricing documentation. So the rate card is flat. The invoice is not.
Now read the next sentence carefully. The 35% inflation is a silent change to the meter. Every downstream tool that hardcoded a multiplier (Copilot’s 7.5x, Cursor’s “$20 of API-equivalent credit,” Replit’s checkpoint counter) is now overcharging users by the difference between their old multiplier and the new tokenizer reality. Some of those tools will catch it and adjust. Others won’t. None of them notified users.
You can already see it in the bills. GitHub community discussion #192911 documents users seeing usage records for models they never selected (Sonnet 4.5, Haiku 4.5, Gemini 3 Flash) all billed at the wrong 7.5x rate. Phantom model billing. Compound multipliers. This is what happens when pricing logic is layered inline through three different services that don’t share a single source of truth.
The Claude Code Pro experiment: the test that wasn’t
Then on the afternoon of April 21, the live pricing page on claude.com silently changed the Pro tier’s Claude Code listing from a checkmark to an X. Support docs changed at the same time, from “Using Claude Code with your Pro or Max plan” to just “Max plan.” Mobile and desktop both. No press release, no email, no changelog entry.
Hacker News and Reddit caught it within hours. Avasare posted on X claiming it was a “small test on ~2% of new prosumer signups” and that existing subscribers were unaffected. Ed Zitron pointed out the obvious. A 2% test that updates the public-facing pricing page for everyone is not a 2% test. The change reverted within 24 hours.
Sam Altman, never one to miss a moment, replied “ok boomer” under Avasare’s thread.
The part that actually mattered, again, was the structural overhaul sentence buried inside the damage control. Anthropic is openly modeling a future where Claude Code is no longer part of the $20 plan. Avasare said as much. “We’re looking at different options to keep delivering a great experience for users. We don’t know exactly what those look like yet. That’s what we’re testing and getting feedback on right now.” That is not about a 2% experiment. We’re looking ata company that knows the plan structure has to change and is workshopping the politics of the announcement.
The supporting cast: the whole industry is converging on per-token
The headline incidents are dramatic, but the slow-motion structural shift underneath them is more important. Three other moves the same month show every major provider converging on the same conclusion: flat-rate pricing on agentic workloads is dead.
OpenAI moved Codex from per-message billing to per-token API-style billing on April 2 for new Plus, Pro, Business, and Enterprise customers, then extended it to all existing Enterprise customers on April 23 including Edu, Health, Gov, and Teachers. The rate card now bills credits per million input, cached input, and output tokens, mapping directly to the underlying API meter.
Anthropic restructured enterprise contracts in February through April. The Information first reported the shift on April 14. The previous flat $200/seat/month tier with a discounted token allowance is gone. New contracts are $20/seat for chat-only or Claude Code, plus standard API rates on all consumption, plus a monthly spending commitment based on Anthropic’s estimate of your usage that you pay whether you hit it or not. Volume discounts of 10 to 15% that previously applied are gone. Heavy enterprise users will see bills double or triple, per Fredrik Filipsson at Redress Compliance.
And on April 23 OpenAI shipped GPT-5.5 with the API price doubled from $2.50/$15 to $5/$30 per million tokens. They claim a “20% effective price increase” once token efficiency is factored in. Either way, the headline rate doubled, on the same model family, in a single release.
This is not a coincidence. The math underneath it is the same everywhere. Anthropic now processes such heavy agentic workloads that its inference costs surged 23% beyond internal projections in 2025, pushing gross margin to roughly 40% per The Information. OpenAI’s APIs went from 6 billion tokens per minute in October to 15 billion per minute by March. That is a 2.5x compounding load curve in five months. Flat-rate plans were a marketing decision. Per-token meters are an operational reality the providers cannot defer.
The financial engineering reckoning
Here is the part that should matter most to anyone reading this newsletter. The reason the pricing changes have been ugly is not because the providers are evil. It is because their monetization infrastructure was not built as a separate concern from their product code.
When the entitlement that decides “can this user invoke Claude Code from a Pro plan” lives inside the request handler, the only way to change that entitlement is to ship code. When the meter that tracks “how many tokens has this user consumed in the current 5-hour window” lives in the same service that does the rate limiting, the only way to ship a new meter is to redeploy the inference layer. When the price-to-credit conversion (Cursor’s $20-equivalent, Copilot’s premium request multipliers, Replit’s effort-based checkpoints) is hardcoded as constants, the only way to handle a tokenizer change like Opus 4.7 is to push a new build, and any tool that doesn’t push a build is silently overcharging.
The pattern most teams have, and the one that may broke for in April, looks roughly like this:
// inline entitlement check, computed in the request handler
async function handleClaudeCodeRequest(req: Request) {
const user = await authenticate(req);
const plan = user.plan; // ‘pro’ | ‘max5’ | ‘max20’ | ‘enterprise’
if (plan === ‘pro’ && !ENABLE_CLAUDE_CODE_FOR_PRO) {
return error(’Claude Code requires Max’);
}
const sessionUsage = await getRollingWindowUsage(user.id, ‘5h’);
const limit = SESSION_LIMITS[plan]; // hardcoded constant per plan
if (sessionUsage > limit) {
if (user.extraUsageEnabled) {
return billExtraUsage(user, /* ... */);
}
return error(’Session limit hit’);
}
return invokeModel(req);
}Every line of that handler is a future panicked announcement waiting to happen. Toggle ENABLE_CLAUDE_CODE_FOR_PRO for 2% of signups and you ship a “test.” Change SESSION_LIMITS[plan] and you ship a “tightening.” Add a new model with a different tokenizer and you ship a regression. Every business decision becomes a code change. Every code change risks a billing side effect. Every customer becomes a beta tester for the pricing team.
The pattern that doesn’t break is one where entitlements, plans, and meters live in a separate monetization layer that the request handler queries. The plan structure itself becomes data. New plans become configuration. Customer grandfathering becomes a query. Tokenizer changes are isolated to the meter. Price experiments are run by product managers in a UI, not engineers in a deploy.
The two biggest AI companies in the world have already concluded this internally. OpenAI created a Financial Engineering function led by Sara Conlon, organized into pods for Pricing & Packaging, Infrastructure, Financial Automation, and Payments. Anthropic hired Shanmugasundaram Alagumuthu in June 2025 to lead its Billing Platform team, after he ran payments and billing engineering at Turo. They are doing for monetization what their predecessors did for SRE. Treating it as production infrastructure with a dedicated team and a dedicated discipline. Because the financial cost of getting it wrong is now visible at quarter-end, on the company’s S-1, and in the headlines.
What This Means for Builders
If you build anything that involves AI inference and a paying customer, the next quarter is going to look like the last one. There will be more weekly caps. More peak-hour throttles. More silent tokenizer shifts. More plans that drop features in 2% experiments and restore them after the Hacker News thread crests. The volatility is structural. AI providers do not yet have stable unit economics. They are figuring it out in public, with their customers absorbing the variance.
The companies that look smart at the end of this period will be the ones whose monetization layer can absorb the volatility without shipping code. The ones who can re-meter a workload in a config change. The ones who can grandfather a customer cohort in a query. The ones who can run a pricing experiment without a deploy. That is not a feature. That is an architectural decision made one or two quarters before you need it.
Ramp’s CEO Eric Glyman, who shipped a token-tracking tool last month, summed up the customer-side reality. AI spend across Ramp’s customer base grew 13x in 12 months. Nobody knows how to budget for it. Glyman pointed out the hardest question in a CNBC piece on April 17. If your business model depends on extracting maximum token spend, do you have any incentive to help customers use AI more efficiently? That question is going to define which AI companies survive the IPO window. Anthropic is moving toward per-token because it wants its revenue line on the S-1 to match what customers actually value. OpenAI is moving the same direction at the same time, more reluctantly, because the alternative is a margin crisis at scale.
Next week I’m going to take this story one layer deeper. There’s a single GitHub issue, #41930, filed against Claude Code in March. It runs to thousands of words and was written by community engineers who reverse-engineered the Claude Code binary using Ghidra and a MITM proxy. Inside it is the most detailed public account of how a major AI provider’s metering system actually works, what’s wrong with it, and why $200 of extra usage credits can vanish from your account because you typed certain words in a previous prompt. We’ll go through it together. Cache invalidation, sentinel string replacement, false rate limits, prompt routing bugs, and what proper metering looks like instead.
Until then. If you build anything that touches a meter or an entitlement, treat April as the warning shot. The April every AI plan broke was always going to happen. The question now is whether your company is building monetization that bends, or that breaks the same way.









fascinating. in the month of March/April I went from Claude Pro to Max5 to Max20 to Max5. They make it remarkably easy to burn tokens. I argue unnecessarily. April revealed a whole lot of problem solving inefficiencies - unnecessary tool and skill spin-ups, over complicated solutions to problems that had already been defined and previously mapped. The model’s capabilities seemingly stepped back. Hung up. And clearly wasted tokens. Every workflow I build now is lensed through maximizing token spend efficiency. So yes - the economics of current token spend are clearly challenging but instead of punishing the user why not work on delivering more efficient token use. We need to get past this being a sausage factory.