The center of gravity for high-performance computing has shifted. What was once the domain of government research labs has migrated to the private sector, driven by explosive AI workload growth. This transformation brings a fundamental challenge: how do you build network infrastructure capable of supporting massive AI clusters without forcing organizations into proprietary vendor ecosystems?
To explore this question, The Tolly Group spoke with Ed Nakamoto, Principal Hardware Architect at VIAVI Solutions, who is deeply involved in the Ultra Ethernet Consortium's (UEC) efforts to create an open standard for AI networking.
Why AI Demands a New Approach to Ethernet
The AI revolution is moving faster than anticipated, and networking requirements for these workloads differ significantly from traditional applications. "It makes a lot of sense for there to be a unified standard for the various kinds of AI clusters and networks out there, and for it to be based on Ethernet," Nakamoto explains.
Ethernet has served the industry well for decades, but AI workloads introduce new requirements that legacy Ethernet wasn't designed to handle. Low latency and lossless networks are non-negotiable for AI training clusters where thousands of GPUs must communicate in tightly synchronized patterns. A single dropped packet or microsecond of additional latency can cascade through an entire training job, extending completion times by hours.
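The arithmetic behind that claim is worth making concrete. In a synchronized collective operation, no GPU can proceed until every peer has finished its exchange, so step time is governed by the slowest link, not the average. The following toy model (my illustration, not UEC code) shows how a single degraded link dominates the whole step:

```python
# Toy model (not UEC code): why one slow link stalls a synchronized AI job.
# In a training step, every GPU must complete its exchange before any can
# proceed, so step time is the MAXIMUM across all links, not the average.

def step_time_us(link_latencies_us):
    """A synchronized collective completes only when the slowest link does."""
    return max(link_latencies_us)

# 1,000 healthy links at 10 microseconds each...
healthy = [10.0] * 1000
# ...versus the same fabric with one congested link at 500 microseconds.
degraded = healthy[:-1] + [500.0]

print(step_time_us(healthy))   # 10.0  -> every step finishes in 10 us
print(step_time_us(degraded))  # 500.0 -> one bad link makes every step 50x slower
```

Averaged metrics would barely register the degraded link (mean latency rises from 10.0 to about 10.5 microseconds), which is exactly why tail behavior, not average throughput, drives AI network design.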
Ultra Ethernet leverages the existing Ethernet foundation, keeping proven components like the physical layer (PHY) and media access control (MAC) while adding purpose-built features for AI. This approach balances innovation with stability, allowing organizations to build on existing knowledge while addressing AI's unique demands.
The Open Standard Advantage
The Ultra Ethernet Consortium takes a deliberately open approach. "The consortium is going to create an open standard, exactly as Ethernet is today," Nakamoto notes. "In an open standard, anybody can come in and participate, read the standard, figure out how to design equipment for it, and put together new networks across this new standard."
This openness addresses the vendor lock-in that has plagued many high-performance computing environments. Organizations relying on proprietary interconnects often find themselves constrained by single-vendor roadmaps and pricing, with limited negotiating power.
Open standards also drive innovation: when multiple vendors can implement the same specification, competition pushes the technology forward. "By nature, an open standard is going to enable competition," Nakamoto explains.
Collaborative Development
The strength of Ultra Ethernet lies in its diverse stakeholder base. Hyperscalers, cloud providers, system vendors, and equipment manufacturers are all actively participating.
With massive capital investments flowing into AI infrastructure, an open standard facilitates the cross-vendor collaboration needed to meet these demands efficiently. Organizations can mix and match components from different vendors, choosing the best solutions for specific requirements.
The Adoption Timeline
Early adoption will concentrate in organizations with the technical expertise and capital to deploy large-scale AI infrastructure. "The early adopters will be some of the large hyperscalers, who can afford the cost of these large AI clusters," Nakamoto predicts.
These hyperscalers will build the massive GPU networks powering the AI applications consumers already use for imaging and other tasks. As these deployments mature and the standard proves itself at scale, adoption will gradually expand to enterprises with mission-critical AI requirements.
Ensuring Interoperability
Creating an open standard is only the first step. Ensuring interoperability requires rigorous testing and validation. The consortium has established both compliance and performance groups to verify that equipment from different vendors can actually work together.
"The Ultra Ethernet standard aims to address challenges related to compliance and interoperability. The UEC created the Compliance and Test Group as well as the Performance and Debug Group, which will create tests that vendors need to pass or benchmark," Nakamoto explains. This testing framework provides the foundation for true multi-vendor deployments.
What's Coming Next
Ultra Ethernet 1.0 focuses primarily on scale-out networks, the connections between racks at the data center level. Future versions will address scale-up scenarios, the extremely high-performance connections needed within individual racks where GPUs must communicate with minimal latency.
Congestion management represents a critical area for future development. In AI workloads, congestion has different implications than traditional networking. The synchronized nature of AI traffic patterns means a single congested link can slow an entire training job. "A single link that has congestion will slow everything down," Nakamoto emphasizes.
The consortium is developing congestion management mechanisms that can recognize where congestion occurs and make real-time adjustments to maintain Quality of Service. These capabilities will become essential as AI workloads scale to even larger cluster sizes.
Looking Ahead
The pace of Ultra Ethernet development reflects the urgency of AI infrastructure needs. "I've never seen anything this significant move so quickly in my entire career. It's all driven by the incredible momentum of AI and the transformative changes happening across the world right now," Nakamoto observes.
For organizations planning AI deployments, Ultra Ethernet offers a path forward that avoids proprietary lock-in while leveraging the industry's collective expertise. The open standard approach, combined with rigorous testing and broad industry participation, positions Ultra Ethernet as the foundation for the next generation of AI infrastructure.
Key Takeaways
AI workloads demand low-latency, lossless networks that traditional Ethernet wasn't designed to support
Ultra Ethernet builds on existing Ethernet foundations while adding purpose-built features for AI
Open standards prevent vendor lock-in and drive competitive innovation in both technology and pricing
Hyperscalers will be early adopters, with broader enterprise adoption following as the standard matures
Rigorous compliance and performance testing ensures true multi-vendor interoperability
Future versions will address scale-up networks, advanced congestion management, and Quality of Service
Synchronized AI traffic patterns make network congestion more critical than in traditional workloads
Learn More
The Ultra Ethernet Consortium specification 1.0 is now published and available at the consortium website. Organizations interested in participating can join to monitor the ongoing meetings and discussions shaping the standard's evolution. For more information, read the VIAVI blog on "What Ultra Ethernet Means for AI and HPC Networks."
