
Inference drives the next AI computing boom

The center of gravity in artificial intelligence computing is shifting from model training to inference, and markets are beginning to reflect the change. According to J.P. Morgan, inference, which generates recurring operating costs rather than one-time expenses, could grow into a market 10 to 50 times larger than training.

Inference demand reshapes hardware market

Rising demand for inference capacity is already rippling through hardware supply chains. Nvidia’s latest results underscore this transition, with data-center revenue reaching roughly $75 billion, up 92 percent year-on-year. Of that, $38 billion came from hyperscale clients and $37 billion from enterprise and industrial segments. Its edge-computing unit added $6.4 billion, rising 29 percent from a year earlier.
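As a quick check on these figures, the two data-center segments sum to the reported total, and the growth rates imply approximate year-ago baselines. The sketch below is illustrative arithmetic only: the variable and function names are ours, and the inputs are the rounded figures quoted above, so the implied baselines are approximations rather than reported numbers.

```python
# Rounded figures quoted in the article, in billions of US dollars.
hyperscale = 38.0
enterprise_industrial = 37.0
data_center_total = hyperscale + enterprise_industrial


def implied_prior_year(current: float, growth_pct: float) -> float:
    """Back out the year-ago value from a current value and its year-on-year growth."""
    return current / (1 + growth_pct / 100)


# Implied baselines (approximate, because the inputs are rounded).
data_center_prior = implied_prior_year(data_center_total, 92)
edge_prior = implied_prior_year(6.4, 29)

print(f"Data-center total: ${data_center_total:.1f}B")
print(f"Implied year-ago data-center revenue: ${data_center_prior:.1f}B")
print(f"Implied year-ago edge revenue: ${edge_prior:.1f}B")
```

On these inputs, the implied year-ago data-center revenue is about $39 billion and the implied year-ago edge revenue about $5 billion.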

The company’s move to report metrics tied to “service tokens” signals that it sees ongoing inference workloads, across data centers and edge devices, as the main driver of future growth.

Market backs inference-focused chipmakers

The trend is also evident in capital markets. Cerebras, which designs chips optimized for inference rather than training, saw its public listing oversubscribed more than 20 times. Its shares have traded well above the initial $185 offering price, swinging between the low $200s and peaks near $386, a sign of sustained demand for architectures aimed at reducing model execution costs.
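To put the price range in perspective, the premium over the offering price can be computed directly. This is a minimal sketch; the $200 floor is our assumption, since the article only says "low $200s", and the function name is hypothetical.

```python
OFFER_PRICE = 185.0  # initial offering price quoted in the article, USD


def premium_over_offer(price: float, offer: float = OFFER_PRICE) -> float:
    """Percentage gain of a trading price relative to the offering price."""
    return (price / offer - 1) * 100


peak_premium = premium_over_offer(386.0)  # peak near $386
low_premium = premium_over_offer(200.0)   # assumed floor for "low $200s"

print(f"Premium at the peak: {peak_premium:.1f}%")
print(f"Premium at $200: {low_premium:.1f}%")
```

Under these assumptions, the peak sits a little over 100 percent above the offering price, while the low end of the range is still above it.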

Supply constraints intensify competition

Soaring demand is tightening supply. Lead times for advanced chips now exceed a year, while key suppliers such as TSMC have reportedly allocated packaging capacity through the end of 2026. Nvidia has responded by locking in multi-year agreements for next-generation memory to secure production.

Anthropic’s move to take over the full capacity of the Colossus 1 data center in May 2026 reflects this pressure. The facility, with more than 220,000 Nvidia GPUs and over 300 megawatts of power, is dedicated entirely to inference. The expansion allowed the company to increase API limits, double service usage in some cases, and introduce pricing tied more closely to real-time compute consumption.

AI build-out finds revenue balance in inference

The shift toward inference also addresses earlier concerns about the economics of AI infrastructure. Sequoia’s David Cahn warned in 2024 of a potential $600 billion revenue gap between massive GPU spending and realistic returns. That gap is now expected to narrow as continuous model usage drives recurring demand for computation.

Inference effectively turns compute into a utility-like service, where revenue scales with usage rather than one-off deployments.

Different strategies emerge across the value chain

Companies are adapting in distinct ways depending on their position in the AI stack.

  • Hyperbolic aggregates unused GPUs from cloud providers into a unified marketplace, routing workloads based on real-time pricing. Because it owns no hardware, it acts as a liquidity layer in a fragmented market, and it attracted more than 200,000 developers shortly after launch.
  • Venice operates at the application layer, offering inference services built on both open- and closed-source models while relying on third-party compute providers. Its revenue comes from subscriptions focused on privacy, with around 136,000 subscriber-linked wallets and an estimated $6 million to $15 million in annual recurring revenue.

Both models depend on the same constraint: access to affordable inference capacity. Aggregators benefit from price arbitrage and routing efficiency, while application providers remain exposed to rising compute costs.
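The price-based routing that an aggregator performs can be illustrated with a simple greedy allocation. This is a sketch under stated assumptions, not a description of Hyperbolic's actual system: the provider names, prices, and the `Offer` and `route` names are all hypothetical, and a real router would also weigh latency, reliability, and hardware type.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    provider: str              # hypothetical provider name
    price_per_gpu_hour: float  # currently quoted price, USD
    free_gpus: int             # idle GPUs available right now


def route(offers: list[Offer], gpus_needed: int) -> list[tuple[str, int]]:
    """Greedily fill a request from the cheapest offers first."""
    plan = []
    remaining = gpus_needed
    for offer in sorted(offers, key=lambda o: o.price_per_gpu_hour):
        if remaining == 0:
            break
        take = min(offer.free_gpus, remaining)
        if take > 0:
            plan.append((offer.provider, take))
            remaining -= take
    if remaining > 0:
        raise ValueError(f"not enough capacity: short by {remaining} GPUs")
    return plan


# Hypothetical snapshot of the market at one moment.
offers = [
    Offer("cloud-a", 2.40, 8),
    Offer("cloud-b", 1.90, 4),
    Offer("cloud-c", 2.10, 16),
]
plan = route(offers, 10)
print(plan)
```

In this example the request for 10 GPUs is filled from the two cheapest providers, which is the sense in which an aggregator can profit from price differences without owning hardware.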

Outlook: exponential growth in inference workloads

Analysts expect inference demand to accelerate as AI agents and physical AI systems scale across cloud and edge environments. These applications could require 5 to 30 times more compute per task than current systems, compounding pressure on already strained infrastructure.
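The effect of that multiplier on fixed infrastructure is simple to work through. The sketch below uses a normalized baseline and a hypothetical capacity figure, both our assumptions, purely to show how throughput falls as compute per task rises.

```python
def tasks_served(capacity_units: float, compute_per_task: float) -> float:
    """Number of tasks a fixed pool of compute can serve."""
    return capacity_units / compute_per_task


capacity = 3000.0  # hypothetical fixed capacity, in arbitrary compute units
baseline = 1.0     # normalized compute per task for current systems

today = tasks_served(capacity, baseline)
at_5x = tasks_served(capacity, baseline * 5)    # low end of the quoted range
at_30x = tasks_served(capacity, baseline * 30)  # high end of the quoted range

print(f"Tasks served today: {today:.0f}")
print(f"Tasks served at 5x compute per task: {at_5x:.0f}")
print(f"Tasks served at 30x compute per task: {at_30x:.0f}")
```

With capacity held constant, the same hardware serves between one-fifth and one-thirtieth as many tasks, which is why the heavier workloads add to supply pressure.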

As a result, markets increasingly view computational power as a scarce, revenue-generating resource. The industry’s earlier overinvestment in training infrastructure may ultimately find balance through sustained demand for inference, completing the shift toward AI as a continuous service rather than a discrete build process.



Disclaimer: The content on this page is provided for general informational purposes only and does not represent the views or financial advice of Toobit. We make no guarantees regarding the accuracy or completeness of this information and shall not be held liable for any errors, omissions, or outcomes resulting from its use. Investing in digital assets involves risk; users should independently evaluate their financial situation and the risks involved. For further details, please consult our Terms of Service and Risk Disclosure.
