Nvidia sees a trillion-dollar future in open and parallel code • The Register

GPUs are becoming a staple in computing, which is why Nvidia is stepping up its work with standards bodies and open source communities to upstream technologies that were once largely exclusive to the company’s development tools.

Much of that work centers on programming languages like C++ and Fortran, whose standard implementations lag behind purpose-built frameworks when it comes to running code on highly parallel systems.

The plan is to make generic computing environments and compilers more productive and accessible, Timothy Costa, Nvidia’s group product manager for high-performance computing and quantum computing, told The Register.

“Ultimately our goal with the open source and programming community is to improve concurrency and parallelism for everyone. I say that because I’m talking about CPUs and GPUs,” Costa said.

Many of these open, generalized technologies build on Nvidia’s earlier work in its CUDA parallel programming framework, which combines open and proprietary libraries.

CUDA was introduced in 2007 as a set of programming tools and frameworks for coders to write programs on GPUs. But the CUDA strategy has changed as GPU usage has spread to more applications and industries.

Nvidia is widely known for dominating the GPU market, but CUDA is central to the company’s repositioning as a software and services provider seeking a $1 trillion valuation.

The long-term goal is for Nvidia to be a comprehensive provider targeting niche areas such as autonomous driving, quantum computing, healthcare, robotics, and cybersecurity.

Nvidia has built CUDA libraries that specialize in these areas, and also provides the hardware and services that businesses can leverage.

The full-stack strategy is best exemplified by the concept of an “AI factory” presented by CEO Jensen Huang at the recent GPU Technology Conference. The concept is that customers can drop applications into Nvidia’s mega data centers, the result being a custom AI model that meets specific industry or application requirements.

Nvidia has two ways to make money via concepts like the AI factory: through GPU capacity or through domain-specific CUDA libraries. Programmers can use open source parallel programming frameworks such as OpenCL on Nvidia’s GPUs. But for those willing to invest, CUDA provides an extra last-mile boost, as it is tuned to work closely with Nvidia’s GPUs.

Parallel for all

While parallel programming is prevalent in HPC, Nvidia’s goal is to standardize it in mainstream computing. The company helps the community standardize best-in-class tools for writing portable parallel code on all hardware platforms, regardless of brand, accelerator type, or parallel programming framework.

“The complication is — it can be measured as simply as lines of code. If you are, if you’re bouncing between a lot of different programming models, you’re going to have more lines of code,” Costa said.

For one, Nvidia is involved in a C++ committee effort that defines the pipework orchestrating the parallel execution of portable code across hardware. An execution context can be a CPU thread performing mostly I/O, or a CPU or GPU thread performing computationally intensive work. Nvidia is specifically active in providing the standard vocabulary and framework for asynchrony and parallelism that C++ programmers demand.

“Every institution, every major player, has a C++ and Fortran compiler, so it would be crazy not to. As the language progresses, we get to a place where we have true open standards with portability performance across platforms,” Costa said.

“Then users are of course still able, if they want, to optimize with a vendor-specific programming model that is tied to the hardware,” Costa said.

Language-level standardization will make parallel programming more accessible to coders, which could also boost the adoption of open-source parallel programming frameworks like OpenCL, he said.

Of course, Nvidia’s own compiler will extract the best performance and value out of its GPUs, but it’s important to remove barriers to bring parallelism to language standards regardless of platform, Costa said.

“By focusing on language standards, we ensure that we have a true breadth of compilers and platform support for programming performance models,” he explained, adding that Nvidia has been working with the community for over a decade to bring low-level language changes for parallelism.

The initial work was on the memory model, which was included in C++11, but needed to be advanced when parallelism and concurrency started to take hold. The memory model in C++11 focused on concurrent execution on multi-core chips, but lacked hooks for parallel programming.

The C++17 standard introduced the basics of higher-level parallelism functionality, but true portability will come in future standards. The current standard is C++20, with C++23 coming soon.

“What’s great now is because this piping has been laid, if you start looking at the next iterations of the standard, you’ll see more and more user-oriented and productive features that go into these languages, which are really portable in terms of performance. Any hardware architecture in the CPU and GPU space will be able to take advantage of this,” Costa promised. ®