Renowned processor architect Jim Keller, celebrated for his groundbreaking contributions across ISAs such as x86, Arm, and now RISC-V at Tenstorrent, has raised concerns about NVIDIA’s monopolistic CUDA software stack. He drew parallels between CUDA and the x86 ecosystem, which he characterized as stagnant, labeling both a “swamp.”
He highlighted that NVIDIA itself relies on several specialized software packages built on open-source frameworks to achieve optimal performance.
In a Twitter post, Keller wrote, “CUDA is a swamp, not a moat. x86 was a swamp too. CUDA is not beautiful. It was built by piling on one thing at a time.”
Like x86, CUDA has evolved incrementally, adding functionality over time while preserving backward compatibility in both software and hardware. This approach keeps NVIDIA’s platform comprehensive, but it can also impede performance and complicate program development.
Conversely, numerous open-source software development frameworks offer more efficient alternatives to CUDA.
“Basically nobody writes CUDA,” stated Keller in a subsequent post. “If you do write CUDA, it is probably not fast. There is a good reason there is Triton, Tensor RT, Neon, and Mojo.”
NVIDIA itself offers tools that don’t solely depend on CUDA. One such tool is the Triton Inference Server, an open-source solution designed to streamline the deployment of AI models at scale.
Triton supports popular frameworks such as TensorFlow, PyTorch, and ONNX. Additionally, Triton offers functionalities like model versioning, multi-model serving, and concurrent model execution to enhance the efficient utilization of GPU and CPU resources.
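To give a sense of how Triton serves models from those frameworks, each model in a Triton model repository is described by a small configuration file. The sketch below is a hypothetical `config.pbtxt` for an ONNX image classifier; the model name, tensor names, and shapes are illustrative assumptions, not details from the article.

```
# Hypothetical config.pbtxt for a Triton Inference Server model
# repository entry (names and shapes are placeholders).
name: "resnet50_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 8          # enables server-side request batching
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]  # CHW image tensor
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]         # class logits
  }
]
```

Triton picks up such a file from the model repository directory and exposes the model over HTTP/gRPC, handling batching and versioning itself.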
TensorRT by NVIDIA is a cutting-edge deep learning inference optimizer and runtime library engineered to enhance the performance of deep learning inference on NVIDIA graphics cards.
It seamlessly integrates with various frameworks like TensorFlow and PyTorch, optimizing trained models for deployment. By minimizing latency and boosting throughput, TensorRT is tailored for real-time applications such as image classification, object detection, and natural language processing.
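As a sketch of that workflow, a model exported from TensorFlow or PyTorch to ONNX can be compiled into a serialized TensorRT engine with NVIDIA’s `trtexec` command-line tool. This requires a machine with an NVIDIA GPU and TensorRT installed; the file names below are placeholders.

```shell
# Build a TensorRT engine from an ONNX model (file names are placeholders).
trtexec --onnx=model.onnx \
        --saveEngine=model.engine \
        --fp16   # allow FP16 precision for lower latency / higher throughput
```

The resulting engine file is then loaded by the TensorRT runtime at inference time instead of the original framework model.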
Jim Keller’s stance on AMD’s ROCm and Intel’s oneAPI remains unclear. However, despite dedicating many years to designing legendary x86 architectures, including AMD’s Zen architecture that helped save the company, he appears less optimistic about the future of x86, and of CUDA for that matter.
Keller’s critique is not unique; others in the industry have raised similar concerns about CUDA’s complexity and performance. The accumulation of software layers to preserve backward compatibility has cost CUDA efficiency and performance. Although it remains essential for NVIDIA and a key factor for many companies when selecting its products, there is rising demand for more tailored solutions optimized for specific models.
Although CUDA currently holds dominance, its complexity may require reevaluation in the future, particularly as alternatives such as RISC-V gain popularity for their performance and efficiency.
Ultimately, Jim Keller’s critiques emphasize the importance of ongoing innovation in GPU software development, urging companies like NVIDIA to adjust to the evolving demands of the industry. His viewpoint promotes exploring fresh avenues and avoiding complacency with solutions that, while currently vital, may warrant reevaluation down the line.