
Nvidia Ethernet technology powers Elon Musk’s world-class AI training system

American chipmaker Nvidia announced on Monday, October 28, that it has helped Elon Musk’s xAI expand its Colossus supercomputer.

The Colossus supercomputer cluster is now recognized as the largest AI training cluster in the world.

Thanks in part to Nvidia’s Spectrum-X Ethernet networking technology, xAI can take its ChatGPT rival, Grok, to a new level.

Grok AI: Can generative AI decipher the meaning of life?

Founded last year by Elon Musk, xAI is a startup that provides a service similar to OpenAI’s ChatGPT. In typical Musk fashion, the company has an ambitious mission that goes to the core of our existence. That goal, the company says, is to use generative artificial intelligence “to understand the true nature of the universe.”

The key to achieving that goal is xAI’s Colossus supercomputer. The impressive computing powerhouse was built in Memphis, Tennessee, to train the third-generation Grok. xAI’s Grok is a large language model, just like OpenAI’s ChatGPT. It is available to premium X (formerly Twitter) subscribers.

Impressively, xAI completed Colossus in just 122 days and began training its first models 19 days after installation. According to Nvidia, systems of this scale often take many months or even years to build.

Like ChatGPT, Grok’s large language models are trained by analyzing massive amounts of data, which requires enormous computing power. The data used includes text, images, and other content typically sourced online.

In a recent post on X, Elon Musk said: “Colossus is the most powerful AI training system in the world. Furthermore, it will double in size to 200,000 [GPUs] (50,000 H200s) within a few months. Excellent work from the (xAI) team, NVIDIA and our many partners/vendors.”

Now Nvidia is helping xAI take the world’s most powerful AI training cluster to the next level. In its statement on Monday, the US tech giant said it will help Elon Musk’s xAI double the capacity of Colossus to 200,000 GPUs.

The world’s most advanced AI chatbot

The Colossus AI training cluster consists of an interconnected network of 100,000 NVIDIA Hopper GPUs. These use a unified Remote Direct Memory Access (RDMA) network.

The network uses Nvidia’s Spectrum-X technology for low latency. With RDMA, data moves directly between nodes without being routed through the operating system, allowing Colossus to process the massive amounts of data needed to train Grok.

“Across all three layers of the network fabric, the system experienced zero application latency degradation or packet loss due to flow collisions,” Nvidia explained in its statement. “It has maintained 95% data throughput, powered by Spectrum-X congestion control.”

Nvidia describes Spectrum-X as the world’s first Ethernet networking platform for generative AI.

Theoretically, with Nvidia’s help, the combined Colossus array could eventually reach around 497.9 exaflops (497,900,000 teraflops). This would set a new benchmark in supercomputing power and make Grok the most impressive AI chatbot in the world.
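For readers who want to sanity-check those numbers, the short sketch below does the unit conversion using only the figures quoted in this article (497.9 exaflops, 200,000 GPUs); the implied per-GPU throughput is a rough back-of-the-envelope estimate, not an official Nvidia or xAI specification.

```python
# Back-of-the-envelope check of the article's figures.
total_exaflops = 497.9
gpus = 200_000  # planned size of the expanded Colossus cluster

# 1 exaflop = 1,000,000 teraflops
total_teraflops = total_exaflops * 1_000_000
print(f"{total_teraflops:,.0f} teraflops")  # matches the article's 497,900,000

# Implied average throughput per GPU (illustrative estimate only)
per_gpu_teraflops = total_teraflops / gpus
print(f"~{per_gpu_teraflops:,.1f} teraflops per GPU")
```

Running this confirms the parenthetical conversion in the article and suggests an average on the order of a few thousand teraflops per GPU, which is broadly in line with Hopper-class accelerators running low-precision AI workloads.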