
May 24, 2026, 2:15 p.m. ET | ⏱️14–15 minutes
By Ethan Carter
From “Competing for GPUs” to “Designing Chips”
In 2026, access to AI computing power has become increasingly uneven. Major cloud providers reportedly prioritize their largest enterprise customers for advanced GPU access, while smaller AI companies may wait months for capacity. Some startups have even begun purchasing hardware directly instead of relying entirely on cloud services.
Only a few years ago, the central issue was whether companies could secure enough GPUs to train large AI models. Now, some of the world’s largest technology companies are attempting to design their own AI chips.
OpenAI is reportedly preparing its first custom AI accelerator through Taiwan Semiconductor Manufacturing Company (TSMC). Meta continues expanding its MTIA chip program. Microsoft is increasing deployment of its Maia chips. Amazon is developing newer versions of Trainium, while Google keeps scaling its TPU ecosystem.
At first glance, this trend may appear to be a simple effort to reduce dependence on NVIDIA. However, industry developments suggest the shift is broader and more structural.
The AI industry is gradually moving from a training-focused phase toward an inference-focused phase. During the first wave of generative AI adoption, the primary challenge was training increasingly large models. In 2026, the larger long-term cost increasingly comes from running those models continuously for billions of users.
That shift changes the economics of AI infrastructure.
Competition is no longer centered only on model capability. It is increasingly about inference efficiency, token cost, power consumption, and long-term infrastructure control. In this sense, the industry may be entering a broader contest over what some analysts describe as “silicon sovereignty” — the ability to control the AI stack from hardware to end-user services.
NVIDIA’s Dominance and the Push for Alternatives
A Market Concentrated Around One Supplier
NVIDIA remains the dominant supplier of AI accelerators. Industry estimates generally place the company’s share of the high-end AI training GPU market above 80%, particularly in hyperscale AI infrastructure.
The company’s financial performance reflects this position. NVIDIA reported fiscal 2026 data center revenue of approximately $193.7 billion, while quarterly gross margins remained around 75%. (NVIDIA FY2026 Financial Results)
During periods of supply shortage, spot-market prices for advanced GPUs such as the H100 reportedly rose far above official pricing. Cloud GPU rental costs also increased significantly in parts of 2025 and 2026.
The roots of today’s supply pressure can be traced back several years. Following the rise of large language models, demand for NVIDIA’s A100 and H100 GPUs accelerated rapidly. Reports from 2023 indicated that companies including Baidu, ByteDance, Tencent, and Alibaba collectively placed multibillion-dollar orders for NVIDIA chips within a relatively short period, contributing to severe supply constraints.
At the same time, AI companies themselves warned about compute shortages. OpenAI CEO Sam Altman repeatedly stated that limited access to computing power was becoming a bottleneck for AI deployment and product expansion.
From NVIDIA’s perspective, these high margins represent strong commercial success. From the perspective of cloud providers and AI developers, however, they increasingly represent strategic dependence.
That dependence matters because AI infrastructure is no longer a secondary IT expense. For many AI companies, compute costs are becoming central to long-term profitability.
Why Inference Changes the Hardware Equation
General-purpose GPUs are highly flexible, which made them especially effective during the early stages of AI research and model training. However, inference workloads often behave differently from training workloads.
Once AI services reach large scale, many tasks become relatively predictable. In those situations, specialized chips — particularly application-specific integrated circuits (ASICs) — may offer better efficiency and lower operating costs.
This creates a strong incentive for hyperscalers to develop custom hardware.
Importantly, many companies may not need hardware that surpasses NVIDIA across every benchmark. Instead, they may prefer chips that are optimized for their own workloads while reducing token cost or energy consumption at very large scale.
The economic incentives are substantial. Earlier industry estimates suggested that processing a single ChatGPT query could cost several cents in compute expenses. At global scale, those operational costs become extremely large.
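To make the scale argument concrete, here is a back-of-the-envelope sketch. Both inputs are hypothetical placeholders: 3 cents stands in for "several cents" per query, and 1 billion queries per day is an assumed volume, not a reported figure for ChatGPT or any other service.

```python
# Back-of-the-envelope sketch; both inputs below are hypothetical, not reported figures.

def annual_inference_cost(cost_per_query_usd: float, queries_per_day: float) -> float:
    """Return yearly compute spend in USD for a steady daily query volume."""
    return cost_per_query_usd * queries_per_day * 365


# Assumed: 3 cents per query ("several cents") at 1 billion queries per day.
yearly = annual_inference_cost(0.03, 1_000_000_000)
print(f"${yearly / 1e9:.2f} billion per year")  # prints "$10.95 billion per year"
```

Under these assumptions, every one-cent change in per-query cost shifts the annual total by about $3.65 billion, which is why modest efficiency gains matter at this scale.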
For companies spending billions annually on inference infrastructure, reducing reliance on expensive third-party GPUs is increasingly viewed not only as a supply-chain strategy, but also as a financial necessity.
Paradoxically, NVIDIA’s success may have accelerated this diversification trend. The stronger and more indispensable its ecosystem became, the greater the incentive for hyperscalers to seek partial alternatives.
Even NVIDIA appears aware of this shift. In recent years, the company has increasingly emphasized customized AI infrastructure and closer hyperscaler partnerships. (NVIDIA Investor Materials)

The Economics of Inference
Training Is Intensive — Inference Is Continuous
Public discussion about AI hardware often focuses on training frontier models. However, many analysts now believe inference accounts for the majority of global AI compute usage, and its share appears to be increasing.
Training occurs periodically. Inference happens continuously.
Every chatbot response, recommendation refresh, AI search result, or autonomous agent action consumes inference compute. As AI systems become more widely integrated into software and business workflows, operational inference costs continue to rise.
This changes how companies evaluate hardware investments.
During training, organizations typically prioritize maximum performance. During inference, companies increasingly optimize for efficiency, power usage, and cost per token.
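One common way to frame cost per token is to spread a chip's purchase price over its service life, add its electricity cost, and divide by the tokens it actually serves. The sketch below follows that approach. Every parameter value (chip prices, power draw, throughput, utilization, electricity price, four-year life) is a hypothetical assumption chosen only to show the shape of the calculation, and the two `ChipProfile` entries are illustrative, not real products.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass
class ChipProfile:
    """Hypothetical inference accelerator; every field is an illustrative assumption."""
    price_usd: float          # purchase price per chip
    lifetime_years: float     # amortization period
    power_kw: float           # average draw, including cooling overhead
    tokens_per_second: float  # sustained throughput at full load
    utilization: float        # fraction of time the chip is actually serving traffic


def cost_per_million_tokens(chip: ChipProfile, electricity_usd_per_kwh: float) -> float:
    # Hardware cost spread evenly across every hour of the chip's service life.
    amortized_per_hour = chip.price_usd / (chip.lifetime_years * HOURS_PER_YEAR)
    # Energy cost for one hour of operation at the average power draw.
    energy_per_hour = chip.power_kw * electricity_usd_per_kwh
    # Tokens actually served in one hour, after accounting for idle time.
    tokens_per_hour = chip.tokens_per_second * 3600 * chip.utilization
    return (amortized_per_hour + energy_per_hour) / tokens_per_hour * 1_000_000


# Hypothetical profiles: same throughput, different price and power draw.
general_purpose = ChipProfile(30_000, 4, 1.0, 3_000, 0.6)
specialized = ChipProfile(15_000, 4, 0.6, 3_000, 0.6)

for name, chip in [("general-purpose", general_purpose), ("specialized", specialized)]:
    print(f"{name}: ${cost_per_million_tokens(chip, 0.10):.3f} per million tokens")
```

The specialized profile comes out cheaper here only because its assumed price and power draw are lower at the same throughput; real comparisons depend on measured numbers. The framing does show why utilization and throughput per watt can matter as much as peak speed once workloads run continuously.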
That shift naturally favors more specialized hardware architectures.
The “Jevons Paradox” in AI
One useful framework for understanding current AI infrastructure demand is the Jevons paradox.
In the 19th century, economist William Stanley Jevons observed that improvements in steam engine efficiency did not reduce coal consumption. Instead, lower operating costs increased total demand.
AI may be following a similar pattern.
Newer models and optimization techniques have reduced inference costs in several areas. However, lower costs appear to be increasing total AI usage rather than reducing infrastructure demand.
As AI becomes more affordable, more businesses integrate it into products and workflows. AI agents may intensify this trend because reasoning-based systems often consume significantly more tokens than ordinary chat applications.
As a result, lower inference costs may actually increase total global demand for AI chips.
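The Jevons logic can be sketched with a simple constant-elasticity demand curve, Q = A * P^(-e). This is a textbook modeling assumption, not something the article or its sources estimate. If a chip becomes k times more efficient and the savings pass through to the price per token, token demand rises by a factor of k^e while each chip serves k times more tokens, so chip demand changes by k^(e-1). Chip demand therefore grows whenever demand is price-elastic (e > 1). The elasticity values below are hypothetical.

```python
def relative_chip_demand(efficiency_gain: float, elasticity: float) -> float:
    """
    Ratio of new to old chip demand after per-chip efficiency rises by `efficiency_gain`.

    Illustrative assumptions: the price per token falls in proportion to the
    efficiency gain, and token demand follows Q = A * P ** (-elasticity).
    """
    price_ratio = 1.0 / efficiency_gain          # each token becomes this much cheaper
    token_ratio = price_ratio ** (-elasticity)   # tokens demanded rise as the price falls
    return token_ratio / efficiency_gain         # each chip now serves more tokens


# Hypothetical elasticities for a doubling of per-chip efficiency.
for e in (0.5, 1.0, 1.5):
    print(f"elasticity {e}: chip demand x{relative_chip_demand(2.0, e):.2f}")
```

With efficiency doubled, an elasticity of 1.5 yields about 1.41 times as many chips, while an elasticity of 0.5 yields about 0.71 times as many; an elasticity of exactly 1 leaves chip demand unchanged.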
Why Custom Chips May Become Economically Attractive
Developing custom AI chips is expensive. Industry estimates often place development costs anywhere from tens of millions to hundreds of millions of dollars once software ecosystems, networking integration, and deployment costs are included.
However, hyperscale economics operate differently from traditional enterprise computing.
If an AI company processes billions of requests daily, even moderate efficiency improvements could translate into billions of dollars in annual savings. Under those conditions, custom silicon becomes economically attractive despite high upfront investment.
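The hyperscale argument reduces to a payback calculation: upfront development cost divided by annual savings. In this sketch, the $300 million development cost, the 10% efficiency saving, and the two annual inference budgets are hypothetical values chosen to sit inside the ranges the article describes, not figures from any company.

```python
import math


def payback_years(upfront_usd: float, annual_inference_spend_usd: float,
                  savings_fraction: float) -> float:
    """Years needed for efficiency savings to repay a one-time chip development cost."""
    annual_savings = annual_inference_spend_usd * savings_fraction
    if annual_savings <= 0:
        return math.inf  # no savings means the investment never pays back
    return upfront_usd / annual_savings


# The same hypothetical chip program evaluated for two very different operators.
hyperscaler = payback_years(300e6, 5e9, 0.10)    # assumed $5 billion/year inference spend
smaller_firm = payback_years(300e6, 50e6, 0.10)  # assumed $50 million/year inference spend
print(f"hyperscaler: {hyperscaler:.1f} years, smaller firm: {smaller_firm:.1f} years")
```

Under these assumptions, the identical program pays back in roughly 0.6 years for the larger operator and roughly 60 years for the smaller one, which illustrates why custom silicon tends to make sense only at very large scale.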
This is one reason the current chip-development trend extends beyond simple cost reduction.
Increasingly, it is about controlling the economics of AI services themselves.
In traditional computing, hardware supported software products. In AI systems, computing infrastructure is becoming part of the product itself. Every generated response directly depends on compute availability, energy consumption, and inference cost.
That gives chip design strategic importance.

Different Companies, Different Strategies
Google TPU: Full-Stack Integration
Google began developing TPUs more than a decade ago, making it one of the earliest major companies to pursue custom AI accelerators at scale.
Its advantage may not depend solely on chip performance. Google controls much of the surrounding ecosystem, including software frameworks, cloud infrastructure, networking, and deployment systems.
This level of vertical integration allows Google to optimize across the entire AI stack.
Google’s TPU strategy also suggests that custom chips are not necessarily intended to replace NVIDIA entirely. Instead, they provide workload optimization, supply diversification, and greater control over long-term infrastructure economics.
Microsoft Maia: Strategic Leverage
Microsoft occupies a more complex position because it remains one of NVIDIA’s largest customers while simultaneously developing alternatives.
Reports indicate Microsoft’s chip efforts began several years ago. Its Maia accelerator, reportedly developed under the internal codename Athena, appears primarily focused on Azure AI workloads and inference efficiency rather than universal GPU replacement.
This reflects a broader strategic calculation. Microsoft may not need to replace NVIDIA completely in order to benefit from custom chips.
Simply having alternatives can strengthen bargaining power and reduce long-term dependency risks.
At the same time, Microsoft’s relationship with OpenAI has introduced additional complexity. Multiple industry reports suggest Microsoft became deeply involved in financing discussions surrounding OpenAI’s custom-chip partnership with Broadcom.
OpenAI and Broadcom officially announced a long-term collaboration in October 2025 to deploy 10 gigawatts of OpenAI-designed AI accelerators between 2026 and 2029. (OpenAI-Broadcom Announcement)
However, later reports suggested the project encountered financing challenges. Broadcom reportedly sought purchasing commitments from Microsoft before finalizing portions of the financing structure.
Although many details remain unconfirmed publicly, the broader implication appears clear: technical capability alone may no longer be sufficient. Financing capacity, supply-chain credibility, and long-term purchasing guarantees are becoming increasingly important in large-scale AI infrastructure projects.

Amazon Trainium: Protecting Cloud Margins
Amazon Web Services faces a structural issue similar to other hyperscalers: it effectively resells NVIDIA-powered infrastructure to customers.
High GPU prices therefore directly affect AWS margins and pricing flexibility.
Trainium potentially gives Amazon a way to reduce dependence on external suppliers while offering lower-cost AI infrastructure for selected workloads.
Reports suggest AWS has priced some Trainium-based services below comparable GPU-based alternatives. This creates both competitive differentiation and pricing leverage.
Meta MTIA: Large-Scale Efficiency
Meta’s strategy appears especially focused on operational efficiency.
The company operates recommendation systems and AI services used by billions of users daily. In these environments, even small efficiency gains can translate into major infrastructure savings.
Meta may not necessarily require the industry’s highest-performing chip for every task. Instead, it may benefit more from specialized accelerators optimized for large-scale inference and recommendation workloads.
Its MTIA initiative reportedly began with recommendation systems before gradually expanding toward broader generative AI tasks.
OpenAI: Designing Hardware Around AI Models
Among major AI companies, OpenAI may represent one of the most ambitious long-term entrants into custom chip development.
The company has reportedly recruited hardware engineers and AI compiler specialists from companies including Meta, as well as from Google’s TPU team, suggesting its efforts extend beyond experimental research.
Its collaboration with Broadcom reportedly centers on a custom inference accelerator internally associated with the codename “Jalapeno.” The goal is not necessarily to outperform NVIDIA universally, but to reduce inference costs for OpenAI’s own workloads.
OpenAI and Broadcom stated that deployment of these accelerators is expected to begin in the second half of 2026 and continue scaling through 2029. (OpenAI Official Statement)
At the same time, the project also highlights the financial challenges of AI infrastructure expansion.
Some industry reports estimate that the full 10GW deployment could involve infrastructure spending well above $100 billion once manufacturing, data centers, networking, and power systems are included. These figures remain estimates rather than finalized commitments, but they illustrate the scale of capital involved in modern AI infrastructure competition.
This has led some analysts to argue that the AI race is entering a new phase in which financing capability and institutional trust matter almost as much as technological innovation.
The first phase of the AI boom rewarded companies that could build powerful models quickly. The next phase may reward companies that can secure sufficient manufacturing capacity, financing support, and long-term supply-chain stability.

The Companies Behind the “Self-Developed” Chips
The term “self-developed chip” can sometimes be misleading.
Most hyperscalers still depend heavily on semiconductor design partners such as Broadcom and Marvell Technology.
Industry estimates suggest these firms participate in the design of a large share of hyperscaler AI accelerators. Broadcom has reportedly worked on projects associated with Google and OpenAI, while Marvell has collaborated with AWS and Microsoft.
As a result, the industry is not necessarily becoming fully decentralized.
Instead, the structure of dependency may simply be evolving into a more layered ecosystem involving cloud providers, chip designers, foundries, memory suppliers, and infrastructure operators.
Physical Constraints Still Matter
TSMC Remains a Critical Bottleneck
Nearly every advanced AI chip initiative depends heavily on TSMC.
Its advanced manufacturing nodes and packaging technologies remain difficult to replace at scale. Industry reports continue to suggest that demand for leading-edge manufacturing capacity exceeds available supply.
As more companies pursue custom AI chips, they are effectively competing for the same fabrication resources.
This creates concentration risk across the global AI industry.
HBM and Energy Constraints
High-bandwidth memory (HBM) may be an even tighter bottleneck than AI processors themselves.
Advanced AI accelerators rely heavily on HBM, but production capacity remains limited and technically difficult to expand quickly. Securing memory supply has therefore become strategically important for large-scale AI deployments.
At the same time, energy availability is emerging as another major constraint.
AI servers consume substantially more electricity than traditional servers, and some data centers already face power and cooling limitations. Executives including Jensen Huang and Sam Altman have publicly suggested that long-term AI growth may ultimately be constrained more by energy infrastructure than by raw chip performance alone.
Conclusion: The Competition Is Expanding Beyond Models
The current AI chip boom is often described as a response to NVIDIA’s dominance. However, the broader transformation may be more significant than any single company.
The AI industry is gradually shifting from a race focused primarily on model training toward a broader competition over operational economics. As inference demand scales globally, infrastructure efficiency becomes increasingly important.
In this environment, chips are no longer merely hardware components.
They are tools for controlling costs, scalability, margins, and ultimately the structure of the AI economy itself.
This does not necessarily mean NVIDIA will lose its leadership position. The company still holds major advantages in software ecosystems, developer adoption, and high-performance AI infrastructure.
However, the market appears to be becoming more specialized and more diversified. AI companies increasingly want infrastructure optimized for their own workloads, business models, and long-term operational goals.
At the same time, competition is expanding beyond technology alone. The next phase of the AI race may depend not only on who can design the most advanced models or chips, but also on who can secure manufacturing capacity, financing support, energy infrastructure, and long-term supply-chain trust.
The central question may no longer be simply who builds the most powerful AI model.
Increasingly, it may be about who controls the infrastructure that makes large-scale AI economically sustainable.
References
[1] NVIDIA Investor Relations — Fiscal 2026 Financial Results
https://investor.nvidia.com/news/press-release-details/2026/NVIDIA-Announces-Financial-Results-for-Fourth-Quarter-and-Fiscal-2026/
[2] OpenAI and Broadcom Strategic Collaboration Announcement
https://openai.com/index/openai-and-broadcom-announce-strategic-collaboration/
[3] Google Cloud TPU Documentation and Infrastructure Overview
https://cloud.google.com/tpu
[4] AWS Trainium and Inferentia Technical Overview
https://aws.amazon.com/machine-learning/trainium/
[5] Microsoft Azure Maia AI Accelerator Overview
https://azure.microsoft.com/en-us/blog/introducing-microsoft-azure-maia-ai-accelerator/
About the Author
Ethan Carter focuses on AI chips, semiconductor technology, and computing infrastructure. His work covers GPUs, AI accelerators, edge AI processors, and the hardware systems that power modern artificial intelligence. He writes analytical articles that connect technical developments with industry trends and practical applications.
Editor’s Note:
This article examines the recent surge in custom AI chip development among major technology companies including OpenAI, Microsoft, Google, Amazon, and Meta. Rather than framing the trend solely as an effort to reduce dependence on NVIDIA, the analysis explores how the AI industry’s transition from model training to large-scale inference is reshaping infrastructure economics, supply-chain strategy, and competitive dynamics. The discussion is based on publicly available financial disclosures, industry reports, and media coverage available as of May 2026.