Dell Networking Data Center AI Switch Fabric
Dell PowerSwitch Z9864F-ON AI Fabric Congestion Mitigation Evaluation & LPO Ecosystem
Abstract
Far more than most legacy traffic, AI training traffic is both latency-sensitive and bursty. Because many flows are synchronized, latency or dropped frames caused by congestion can ripple outward, degrading not just the session experiencing congestion but other flows in the training collective as well. The solution? Avoid congestion at all costs, and that is just what Dell helps you do. Proprietary Dell enhancements to its Enterprise SONiC Distribution implement dynamic prioritization for AI (NCCL) traffic.
Dell Technologies commissioned Tolly to evaluate its RDMA over Converged Ethernet (RoCE) feature, one of the many dynamic congestion mitigation networking features in its network operating system, Dell Enterprise SONiC, using the Dell PowerSwitch Z9864F-ON 800GbE switch. The demonstration also evaluated various options for supporting linear pluggable optics (LPO) in addition to traditional DSP-based optical connections.
The Dell PowerSwitch Z9864F-ON fabric demonstrated dynamic prioritization of AI (NCCL) traffic. Without Dell’s dynamic prioritization, the AI training session traffic was unable to reach its destination. Additionally, the Dell PowerSwitch and Dell PowerEdge server environment demonstrated support for DR8-400GbE LPO, DR4 LPO, and VR4 400GbE LPO.