GPT-5 has just been released. It did not exactly dazzle on the benchmarks. It is an economic response before it is a technical feat. ChatGPT now automatically chooses the appropriate level of reasoning effort. That smooths the user experience across simple tasks and deeper queries. More important, it cuts unit cost by routing most traffic to smaller models once usage limits are reached. OpenAI also says GPT-5 matches the performance of earlier reasoning models while using 50 to 80 percent fewer output tokens. The $8 billion loss forecast for 2025, a figure raised by $1 billion at the end of June, needs to be contained.
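To make the lever concrete, here is a back-of-the-envelope sketch of routed inference costs. Every price, token count, and routing share below is a hypothetical assumption chosen for illustration, not an OpenAI figure.

```python
# Back-of-the-envelope unit economics for a model router.
# All prices, token counts, and routing shares are illustrative
# assumptions, not OpenAI's actual numbers.

# Hypothetical per-million-output-token prices (USD).
PRICE_LARGE = 10.00  # frontier reasoning model
PRICE_SMALL = 1.00   # smaller fallback model

def cost_per_request(output_tokens: int, price_per_million: float) -> float:
    """Output-token cost of a single request, in dollars."""
    return output_tokens / 1_000_000 * price_per_million

# Baseline: every request hits the large model with verbose reasoning.
baseline = cost_per_request(2_000, PRICE_LARGE)

# GPT-5-style scenario: 70% fewer output tokens on the large model
# (inside the reported 50-80% range), and 80% of traffic routed to
# the small model once usage limits kick in.
efficient_large = cost_per_request(2_000 * 0.3, PRICE_LARGE)
routed = 0.2 * efficient_large + 0.8 * cost_per_request(500, PRICE_SMALL)

print(f"baseline:     ${baseline:.4f}/request")
print(f"fewer tokens: ${efficient_large:.4f}/request")
print(f"with routing: ${routed:.4f}/request")
```

In this toy scenario, trimming output tokens removes roughly 70 percent of the bill, and routing cuts what remains by almost another factor of four. That compounding is the whole point of an automatic router.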
Customer segments do not absorb these constraints in the same way. On the consumer side, costs explode when the product promises to answer any prompt with no guardrails. The mass market has been useful for distribution and product learning, but it crushes margins unless providers keep tight control over reasoning effort, context, and tool calls. Free tiers will likely be rationed to keep the bill in check, and paid tiers will see price increases. On the enterprise side, willingness to pay is higher. That is why model vendors are trying to build more dedicated tools, from office productivity to finance to software development. But that willingness is not unlimited.
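In practice, “tight control” means a per-tier policy over the three expensive knobs. The tier names and ceilings below are invented for illustration; no provider publishes its exact settings.

```python
# Hypothetical per-tier serving policy. Tier names, ceilings, and fields
# are illustrative assumptions, not any provider's published settings.

TIER_POLICY = {
    "free": {"reasoning_effort": "low",    "max_context_tokens": 16_000,  "max_tool_calls": 1},
    "plus": {"reasoning_effort": "medium", "max_context_tokens": 64_000,  "max_tool_calls": 5},
    "pro":  {"reasoning_effort": "high",   "max_context_tokens": 200_000, "max_tool_calls": 20},
}

def clamp_request(tier: str, requested_context: int, requested_tools: int) -> dict:
    """Cap a request at its tier's ceilings, the three knobs that drive cost."""
    policy = TIER_POLICY[tier]
    return {
        "reasoning_effort": policy["reasoning_effort"],
        "context_tokens": min(requested_context, policy["max_context_tokens"]),
        "tool_calls": min(requested_tools, policy["max_tool_calls"]),
    }

# A free-tier user asking for a huge context gets silently rationed.
print(clamp_request("free", requested_context=100_000, requested_tools=4))
```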
The starkest tell of this new economic tension is in code assistants. TechCrunch reports that several high-growth “vibe coding” tools are running deeply negative gross margins. That means each user costs more to serve than they pay. This is not just a land-grab. It is structural. Replit’s CEO, whose company added $150 million in revenue in 12 months, confirmed on a podcast that it pays far more to foundation-model providers than subscriptions bring in. In short, Replit’s losses are OpenAI’s and Anthropic’s revenues. Venture capital is covering the gap in the name of growth, but the endgame is either higher prices or degraded service. The dream would be for an ad-funded company such as Alphabet, Meta, or X to release an open-source model on par with the proprietary leaders. That is not the trajectory today.
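The arithmetic of a negative gross margin is worth spelling out. The subscription price and per-user inference spend below are hypothetical, not figures reported for Replit or any other vendor.

```python
# Illustrative gross margin for a subscription code assistant.
# Both numbers are assumptions, not reported figures.

subscription   = 20.00  # monthly price per user (hypothetical)
inference_cost = 35.00  # monthly foundation-model spend per user (hypothetical)

gross_margin = (subscription - inference_cost) / subscription
print(f"gross margin: {gross_margin:.0%}")  # -75%
```

At those numbers, every additional user deepens the monthly loss by $15 before payroll or anything else, which is exactly the gap venture capital is covering.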
Behind those shrunken P&Ls lie power-purchase agreements, data-center leases, transformers, Nvidia chips, HBM memory, lithography machines: an entire supply chain running hot. Hyperscaler capex is soaring. Alphabet has lifted its 2025 target to about $85 billion. Microsoft is running at close to $30 billion per quarter and expects cloud-margin dilution as infrastructure grows faster than revenue. Meta’s cash fell to about $12 billion in the second quarter, from $32 billion at the end of 2024, even though its operating cash flows remain large. In the United States, some estimates attribute nearly half of the 3.0 percent annualized GDP growth in Q2 to AI investment.
For historical perspective, big investment cycles rhyme. In the 1840s the United Kingdom poured the equivalent of roughly 15 to 20 percent of annual GDP into railways over a few years, then endured a long purge. By 1893 nearly a quarter of U.S. rail mileage was controlled by bankrupt companies, even as traffic kept rising. In the 1990s telecoms overbuilt networks that were later sold for pennies on the dollar, with excess capacity driving prices down and crushing margins. The lesson is not that usage disappears, but that capital is destroyed and prices fall until demand catches up with supply.
Applied to AI, a normalization of demand as investor subsidies end would make training new models less attractive, especially since progress already looks less obvious than before. Hardware earmarked for training would then be redirected to serving models in production (inference), creating a lasting glut of capacity.
[Update, 8/30/2025] Anthropic’s latest terms and policy changes read less like product polish and more like cost control. The company is extending consumer data retention to five years and, unless users opt out, will train on new chats and coding sessions. At the same time, it is adding weekly rate limits to curb 24/7 “power user” behavior in Claude Code and across the Pro and Max tiers. The economic logic is plain: first-party training data lowers dependence on expensive licensed corpora, and throttling high-intensity usage reins in runaway inference bills while conserving scarce compute, which Anthropic itself frames as a binding capacity constraint.
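For intuition, here is a minimal sketch of a weekly cap as a fixed-window quota. The token budget and every implementation detail are assumptions; Anthropic has not published how its throttling actually works.

```python
# Minimal fixed-window weekly quota. Budget, units, and structure are
# illustrative assumptions, not Anthropic's actual rate limiter.
import time

WEEK_SECONDS = 7 * 24 * 3600

class WeeklyQuota:
    def __init__(self, weekly_token_budget: int):
        self.budget = weekly_token_budget
        self.used = 0
        self.window_start = time.time()

    def allow(self, tokens: int) -> bool:
        """Permit a request only if it fits the current week's budget."""
        now = time.time()
        if now - self.window_start >= WEEK_SECONDS:
            # A new week: roll the window and reset the counter.
            self.window_start = now
            self.used = 0
        if self.used + tokens > self.budget:
            return False  # throttled until the window rolls over
        self.used += tokens
        return True

quota = WeeklyQuota(weekly_token_budget=5_000_000)  # hypothetical budget
print(quota.allow(100_000))  # True: well within budget
```

A fixed window is the simplest variant; a sliding window or token bucket would smooth the reset cliff. Either way the economics are the same: a hard cap turns unbounded power-user compute into a bounded weekly liability.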