Nvidia could lose this share of the AI market

In the world of AI hardware, almost everyone is talking about inference.

Nvidia CFO Colette Kress said during the company’s earnings call on Wednesday that inference accounted for about 40% of Nvidia’s $26.3 billion in data center revenue in the second quarter. AWS CEO Matt Garman recently told the No Priors podcast that inference is likely half of the work done on AI-powered computing servers today. And that share is likely to grow, attracting competitors looking to challenge Nvidia’s crown.

It is therefore logical that many companies looking to take market share from Nvidia start with inference.

A team of former Google engineers founded Groq, which focuses on inference hardware and raised $640 million in August at a valuation of $2.8 billion.

In December 2023, Positron AI emerged from stealth with an inference chip that it claims can perform the same computations as Nvidia’s H100 at one-fifth the cost. Amazon is developing both training and inference chips, aptly named Trainium and Inferentia, respectively.

“I think the more diversity we have, the better off we are,” Garman said in the same podcast.

And Cerebras, the California company known for its massive AI training chips, announced last week that it had developed an equally large inference chip that CEO Andrew Feldman claims is the fastest on the market.

Not all AI chips are created equal

Chips designed for AI workloads are typically optimized for either training or inference.

Training is the first stage of developing an AI tool — when you feed labeled and annotated data into a model so it can learn to produce accurate and useful outputs. Inference is the act of producing those outputs after the model has been trained.
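To make the distinction concrete, here is a minimal sketch of the two phases in PyTorch; the toy model, random data, and settings are illustrative assumptions, not any chipmaker’s actual workload.

```python
# Minimal sketch of training vs. inference in PyTorch.
# The model, data, and hyperparameters are illustrative placeholders.
import torch
import torch.nn as nn

model = nn.Linear(128, 10)                       # tiny stand-in for a real model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Training: labeled data goes in, and weights are updated via backpropagation.
inputs = torch.randn(32, 128)                    # a batch of placeholder features
labels = torch.randint(0, 10, (32,))             # matching placeholder labels
optimizer.zero_grad()
loss = loss_fn(model(inputs), labels)
loss.backward()                                  # compute-heavy: gradients for every weight
optimizer.step()

# Inference: the trained weights are frozen and only a forward pass runs.
model.eval()
with torch.no_grad():                            # no gradients, far less raw compute per request
    prediction = model(torch.randn(1, 128)).argmax(dim=-1)
```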

Training chips are typically optimized for raw processing power. Inference chips require less of it; in fact, some inference can run on traditional CPUs. Chipmakers targeting inference are more concerned with latency, because the difference between an addictive AI tool and a tedious one often comes down to speed. That’s what Cerebras CEO Andrew Feldman is banking on.

According to the company, Cerebras’ chip has 7,000 times the memory bandwidth of Nvidia’s H100. That bandwidth is the source of what Feldman calls “blistering speed.”

The company, which has begun the process of going public, is also offering inference as a multi-tiered service, including a free tier.

“Inference is a memory bandwidth problem,” Feldman told Business Insider.
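A rough back-of-envelope calculation shows why: to generate each new token, roughly all of a model’s weights have to be streamed out of memory, so bandwidth rather than raw compute caps generation speed. The model size and bandwidth figures below are illustrative assumptions, not Cerebras or Nvidia specifications.

```python
# Illustrative back-of-envelope math; all figures are assumptions, not vendor specs.
model_params = 70e9                 # assume a 70B-parameter model
bytes_per_param = 2                 # FP16 weights
weight_bytes = model_params * bytes_per_param         # ~140 GB read per generated token

memory_bandwidth = 3.35e12          # assume roughly 3.35 TB/s of HBM bandwidth

tokens_per_second = memory_bandwidth / weight_bytes   # upper bound for a single user stream
print(f"~{tokens_per_second:.0f} tokens/sec, ignoring batching and KV caches")
# Roughly 24 tokens/sec: memory bandwidth, not arithmetic throughput, sets the ceiling.
```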

To make money with AI, you need to scale your inference workload

Choosing to optimize a chip design for training or inference isn’t just an engineering decision; it’s also a market decision. Most companies building AI tools will need both at some point, but the bulk of their needs will likely be in one area or the other, depending on where the company is in the build cycle.

Massive training workloads can be considered the R&D phase of AI. When a company’s workloads shift primarily to inference, it means the products it has built are working for end customers, at least in theory.

Inference is expected to represent the vast majority of computing tasks as more AI projects and startups mature. According to AWS’s Garman, that’s what needs to happen for the hundreds of billions of dollars in AI infrastructure investments to pay off.

“Inference workloads have to dominate, otherwise all this investment in these large models isn’t really going to pay off,” Garman told No Priors.

However, the simple split between training and inference may not hold forever for chip designers.

“Some clusters in our data centers are used by our customers for both purposes,” said Raul Martynek, CEO of data center operator DataBank.

Nvidia’s pending acquisition of Run:ai may support Martynek’s prediction that the line between inference and training will soon disappear.

In April, Nvidia agreed to acquire Israeli company Run:ai, but the deal has not yet closed and is being reviewed by the Justice Department, according to Politico. Run:ai’s technology makes GPUs run more efficiently, allowing more work to be done on fewer chips.

“I think most companies will merge. You get a cluster that trains and does inference,” Martynek said.

Nvidia declined to comment for this story.