Jensen Huang GTC Keynote Speech Transcript

Welcome to GTC. What an amazing year. We want to be able to make a new world. Through the magic of artificial intelligence, we are going to bring you to NVIDIA’s headquarters. This is where we work.

What an amazing year it was, and we have incredible things to talk about. I just want you to know that I am up here without a net. There are no scripts, there is no teleprompter, and I’ve got a lot of things to cover. Let’s get started.

First of all, I want to thank all the sponsors, all the amazing people who are part of this conference. Just about every single industry is represented. Healthcare is here. Transportation. Retail. The computer industry. Everybody in the computer industry is here, and so it’s really, really terrific to see all of you. Thank you for sponsoring it.

GTC started with GeForce. It all started with GeForce. Today, I have here a GeForce 5090. And the 5090, unbelievably, 25 years later—25 years after we started working on GeForce—GeForce is sold out all over the world.

This is the 5090, the Blackwell generation. Comparing it to the 4090, it’s 30% smaller in volume. It’s 30% better at dissipating energy, and has incredible performance. Hard to even compare. And the reason for that is because of artificial intelligence.

GeForce brought CUDA to the world. CUDA enabled AI. And AI has now come back to revolutionize computer graphics. What you’re looking at is real-time computer graphics, 100% path traced. For every pixel that’s rendered, artificial intelligence predicts the other 15. Think about this for a second: for every pixel that we mathematically rendered, artificial intelligence inferred the other 15. And it has to do so with so much precision that the image looks right, and it’s temporally accurate, meaning that from frame to frame, going forward or backward, the image has to stay temporally stable. Incredible.
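
As a rough sketch of how a 1-in-16 ratio can come about (this decomposition is my illustration, not something stated in the keynote): if the renderer computes one out of every four pixels per frame through 2x upscaling in each axis, and fully renders only one out of every four displayed frames while generating the other three, then

$$\underbrace{2 \times 2}_{\text{upscaling}} \times \underbrace{4}_{\text{frame generation}} = 16 \text{ output pixels per rendered pixel},$$

so one pixel is computed and the other 15 are inferred.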

Evolution of AI

Artificial intelligence has made extraordinary progress. It has only been 10 years. Now we’ve been talking about AI for a little longer than that, but AI really came into the world’s consciousness about a decade ago.

Started with perception AI: computer vision, speech recognition. Then generative AI. The last five years we’ve largely focused on generative AI, teaching an AI how to translate from one modality to another: text to image, image to text, text to video, amino acids to proteins, properties to chemicals—all kinds of different ways that we can use AI to generate content.

«From a retrieval computing model, we now have a generative computing model… Rather than retrieving data, it now generates answers. Fundamentally changed how computing is done. Every single layer of computing has been transformed.»

Generative AI fundamentally changed how computing is done. From a retrieval computing model, we now have a generative computing model. Almost everything that we did in the past was about creating content in advance, storing multiple versions of it, and fetching whatever version we think is appropriate at the moment of use.

Now AI understands the context, understands what we’re asking, understands the meaning of our request, and generates what it knows. If it needs, it’ll retrieve information, augment its understanding, and generate an answer for us. Rather than retrieving data, it now generates answers. This fundamentally changed how computing is done. Every single layer of computing has been transformed.

In the last several years, the last couple, two, three years, a major breakthrough happened: a fundamental advance in artificial intelligence. We call it agentic AI. Agentic AI basically means that you have an AI that has agency. It can perceive and understand the context of the circumstance. It can reason—very importantly, can reason—about how to answer or how to solve a problem. It can plan an action. It can use tools.

It now understands multi-modality information. It can go to a website and look at the format of the website, words and videos. Maybe even play a video. It learns from that website, understands it, comes back, and uses that information, that newfound knowledge, to do its job. Agentic AI.

The next wave is already happening. We are going to talk a lot about that today: robotics, which has been enabled by physical AI. AI that understands the physical world. It understands things like friction and inertia, cause and effect, object permanence. The ability to understand the three-dimensional world is what is going to enable a new era of AI we call physical AI, and it will enable robotics.

Each one of these phases, each one of these waves opens up new market opportunities for all of us. It brings more and new partners to GTC. As a result, GTC is now jam packed.

I’m standing here. I wish all of you could see what I see. We are in the middle of a stadium. Last year was the first year back that we did this live. It was like a rock concert. GTC was described as the Woodstock of AI. This year, it has been described as the Super Bowl of AI. The only difference is that everybody wins at this Super Bowl. Everybody is a winner. Every single year more people come because AI is able to solve more interesting problems for more industries and more companies.

This year we are going to talk a lot about agentic AI and physical AI. At its core, what enables each wave and each phase of AI are three fundamental questions:

  1. The first is, how do you solve the data problem? The reason why that’s important is because AI is a data-driven computer science approach. It needs data to learn from. It needs digital experience to learn from.
  2. The second is, how do you solve the training problem without humans in the loop? The reason why «human in the loop» is fundamentally challenging is because we only have so much time, and we would like an AI to be able to learn at superhuman rates, at super real-time rates, and to be able to learn at a scale that no humans can keep up with.
  3. And the third is, how do you scale? How do you create, how do you find an algorithm whereby the more resource you provide—whatever the resource is—the smarter the AI becomes? The scaling law.

This last year, this is where almost the entire world got it wrong: the computation requirement. The scaling law of AI is more resilient and in fact hyper accelerated. The amount of computation we need at this point, as a result of agentic AI, as a result of reasoning, is easily 100 times more than we thought we needed this time last year.

Reasoning and Computation Requirements

Let’s reason about why that’s true. The first part is, let’s just go from what the AI can do. Let me work backwards. Agentic AI, as I mentioned, at its foundation is reasoning. We now have AIs that can reason, which is fundamentally about breaking a problem down step by step. Maybe it approaches a problem in a few different ways and selects the best answer. Maybe it solves the same problem in a variety of ways and ensures it has the same answer—consistency checking. Or maybe after it’s done deriving the answer, it plugs it back into the equation—maybe a quadratic equation—to confirm that in fact that’s the right answer, instead of just one-shot blurting it out.

Remember two years ago, when we started working with ChatGPT, a miracle as it was, there were many complicated questions and many simple questions it simply couldn’t get right. And understandably so. It took a one-shot—whatever it learned by studying pre-trained data, whatever it saw from other experiences, pre-trained data—it does a one-shot and blurts it out, like a savant. Now we have AIs that can reason step by step, using a technology called chain of thought, best of N, consistency checking, a variety of different path planning, a variety of different techniques. We now have AIs that can reason, step by step by step.

You could imagine, as a result, the number of tokens we generate—and the fundamental technology of AI is still the same: generate the next token, predict the next token. It’s just that the next token now makes up step one. Then, after it generates step one, that step one goes into the input of the AI again as it generates step two, and step three, and step four. So instead of just generating one token or one word after the next, it generates a sequence of words that represents a step of reasoning.
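
As a minimal sketch of that loop, assuming a hypothetical `generate_next_token(context)` model interface (not any particular NVIDIA or OpenAI API), the only point being that each reasoning step becomes part of the input for the next:

```python
def reason(prompt: str, generate_next_token, max_tokens: int = 4096) -> str:
    """Toy chain-of-thought decoding loop.

    Every generated token is appended to the context, so step one of the
    reasoning becomes input when the model generates step two, and so on.
    """
    context = prompt
    generated = []
    for _ in range(max_tokens):
        token = generate_next_token(context)   # predict just ONE next token
        if token == "<end_of_answer>":         # hypothetical stop token
            break
        generated.append(token)
        context += token                       # feed the output back in
    return "".join(generated)
```

The cost follows directly: a reasoning trace of thousands of tokens means thousands of passes through this loop instead of a single short answer.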

The amount of tokens that’s generated as a result is substantially higher, and I’ll show you in a second. Easily a hundred times more. Now a hundred times more, what does that mean? Well, it could generate a hundred times more tokens, and you can see that happening as I explained previously. Or the model is more complex. It generates ten times more tokens, and in order for us to keep the model responsive, interactive, so that we don’t lose our patience waiting for it to think, we now have to compute ten times faster. Ten times tokens, ten times faster—the amount of computation we have to do is a hundred times more, easily.
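
Written out, the arithmetic here is simply

$$\text{compute} \;\propto\; (\text{tokens per answer}) \times (\text{token rate required}) \;\approx\; 10 \times 10 = 100\times.$$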

You’re going to see this in the rest of the presentation—the amount of computation we have to do for inference is dramatically higher than it used to be.

Training AI to Reason

The question then becomes: how do we teach an AI how to do what I just described? How to execute this chain of thought? Well, one method is you have to teach the AI how to reason. And as I mentioned earlier, in training, there are two fundamental problems we have to solve:

  1. Where does the data come from?
  2. How do we not have it be limited by human in the loop?

There’s only so much data and so much human demonstration we can perform. And so this is the big breakthrough in the last couple of years: reinforcement learning with verifiable results. Basically, reinforcement learning of an AI as it attacks, or tries to engage in solving, a problem step by step.

We have many problems that have been solved in the history of humanity where we know the answer. We know the quadratic equation and how to solve it. We know the Pythagorean theorem, the rules of a right triangle. We know many, many rules of math and geometry and logic and science. We have puzzle games that we could give it. We have constraint-type problems like Sudoku.

We have hundreds of these problem spaces where we can generate millions of different examples and give the AI hundreds of chances as we use reinforcement learning to reward it as it does a better and better job. So as a result, you take hundreds of different topics, millions of different examples, hundreds of different tries. Each one of the tries generating tens of thousands of tokens.

You put that all together, we’re talking about trillions and trillions of tokens in order to train that model. And now with reinforcement learning, we have the ability to generate an enormous amount of tokens—synthetic data generation, basically using a robotic approach to teach an AI.
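
A minimal sketch of that reinforcement-learning-with-verifiable-results idea, using a made-up quadratic-equation verifier as the “answer we already know how to check” (the `policy` callable and everything else here are illustrative, not an NVIDIA training pipeline):

```python
import random

def verify_quadratic(a, b, c, roots, tol=1e-6):
    """Verifiable result: plug the proposed roots back into ax^2 + bx + c = 0."""
    return bool(roots) and all(abs(a * r * r + b * r + c) < tol for r in roots)

def score_policy(policy, problems, tries_per_problem=8):
    """Give the model many tries per problem; reward 1 only when the check passes."""
    rewards = []
    for a, b, c in problems:
        for _ in range(tries_per_problem):
            roots = policy(a, b, c)                 # the model's attempt
            rewards.append(1.0 if verify_quadratic(a, b, c, roots) else 0.0)
    return sum(rewards) / len(rewards)              # signal fed back into training

# Synthetic data generation: solvable problems can be produced by the million.
problems = [(1.0, random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1_000)]
```

Multiply hundreds of topics by millions of examples, hundreds of tries, and tens of thousands of tokens per try, and you arrive at the trillions of training tokens described above.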

The combination of these two things has put an enormous, enormous challenge of computing in front of the industry.

Infrastructure Growth

You can see that the industry is responding. This is what I’m about to show you: Hopper shipments of the top four CSPs—the top four CSPs, the ones with the public clouds: Amazon, Azure, GCP, and OCI. The top four CSPs—not the AI companies, that’s not included, not all the startups, not included, not enterprise, not included, a whole bunch of things not included—just those four.

Just to give you a sense of comparing the peak year of Hopper and the first year of Blackwell. The peak year of Hopper and the first year of Blackwell. So you can kind of see that in fact, AI is going through an inflection point. It has become more useful because it’s smarter, it can reason. It is more used—you can tell it’s more used because whenever you go to ChatGPT these days, it seems like you have to wait longer and longer and longer, which is a good thing. It says a lot of people are using it with great effect. And the amount of computation necessary to train those models and to inference those models has grown tremendously.

So in just one year—and Blackwell is just starting to ship—in just one year, you could see the incredible growth in AI infrastructure.

Well, that’s been reflected in computing across the board. We’re now seeing, and this [slide] is the forecast of analysts about the increase of capital expense of the world’s data centers, including CSPs and enterprise and so on, the world’s data centers through the end of the decade, so 2030.

I’ve said before that I expect data center build-out to reach a trillion dollars, and I am fairly certain we’re going to reach that very soon. Two dynamics are happening at the same time:

  1. The first dynamic is that the vast majority of that growth is likely to be accelerated, meaning we’ve known for some time that general purpose computing has run its course, and that we need a new computing approach. And the world is going through a platform shift from hand-coded software running on general purpose computers to machine learning software running on accelerators and GPUs. This way of doing computation is at this point past the tipping point, and we are now seeing the inflection happening in the world’s data center buildouts.
  2. Second is an increase in recognition that the future of software requires capital investment. This is a very big idea. Whereas in the past we wrote the software and ran it on computers, in the future the computer is going to generate the tokens for the software. The computer has become a generator of tokens, not a retriever of files—from retrieval-based computing to generative-based computing, from the old way of doing data centers to a new way of building this infrastructure, and I call them AI factories.

They are AI factories because they have one job and one job only: generating these incredible tokens that we then reconstitute into music, into words, into videos, into research, into chemicals, or proteins. We reconstitute it into all kinds of information of different types.

The world is going through a transition in not just the amount of data centers that will be built, but also how they are built.

CUDA Libraries and Ecosystem

Well, everything in the data center will be accelerated, not all of it’s AI. This slide is genuinely my favorite. The reason for that is because for all of you coming to GTC all these years, you have been listening to me talk about these libraries this whole time. This is in fact what GTC is all about. This one slide. In fact, a long time ago, 20 years ago, this is the only slide we had—one library after another library after another library.

You can’t just accelerate software. We needed an AI framework in order to create AI, and we accelerate the AI frameworks. You need frameworks for physics, and biology, and multi-physics, and all kinds of different quantum physics. You need all kinds of libraries and frameworks. We call them CUDA-X libraries, acceleration frameworks for each one of these fields of science.

[Jensen proceeds to describe various CUDA libraries and their applications across different industries, including cuDF for data frames, cuLitho for computational lithography, and many others.]

This is just a sampling of the libraries that make accelerated computing possible. It’s not just CUDA. We’re so proud of CUDA, but if not for CUDA, and the fact that we have such a large install base, none of these libraries would be useful for the developers who use them. For all the developers that use them, you use it because:

  1. It’s going to give you incredible speed up, incredible scale up
  2. Because the install base of CUDA is now everywhere—it’s in every cloud, it’s in every data center, it’s available from every computer company in the world. It’s literally everywhere.

And therefore, by using one of these libraries, your software, your amazing software can reach everyone. We’ve now reached the tipping point of accelerated computing. CUDA has made it possible. And all of you, this is what GTC is about—the ecosystem—all of you made this possible.

AI Everywhere

AI started in the cloud. It started in the cloud for good reason, because it turns out that AI needs infrastructure. It’s machine learning. If the science is machine learning, then you need a machine to do the science. So machine learning requires infrastructure, and the cloud data centers had infrastructure, as well as the ability to do the science, the engineering, the computer science, and the research—the perfect circumstance for AI to take off in the cloud with the CSPs.

But that’s not where AI is limited to. AI will go everywhere. The cloud service providers, of course, they like our leading-edge technology. They like the fact that we have a full stack, because accelerated computing, as you know, is a big step. It’s not just the chip; it’s the chip, the libraries, the programming model, and a whole bunch of software that goes on top of it. That entire stack is incredibly complex. Each one of those layers, each one of those libraries, is essentially like SQL.

SQL, as you know, is called in-storage computing. It was the big revolution of computation by IBM. SQL is just one library—just imagine, here we have a whole bunch of them. And in the case of AI, there’s a whole bunch more. So the stack is complicated.

CSPs also love that NVIDIA CUDA developers are CSP customers, because in the final analysis, they’re building infrastructure for the world to use. So the rich developer ecosystem is really valued and really, really deeply appreciated.

Now that we’re going to take AI out to the rest of the world, the rest of the world has different system configurations, operating environment differences, domain-specific library differences, usage differences. AI, as it translates to enterprise IT, as it translates to manufacturing, as it translates to robotics or self-driving cars, or even companies that are starting GPU clouds—there’s a whole bunch of companies, maybe 20 of them, who started during this time. And what they do is just one thing: they host GPUs. They call themselves GPU clouds.

One of our great partners, CoreWeave, is in the process of going public, and we’re super proud of them. GPU clouds, they have their own requirements. But one of the areas that I’m super excited about is edge.

Today we announced that Cisco, NVIDIA, T-Mobile (the largest telecommunications company in the world), and Cerberus ODC are going to build a full stack for radio networks here in the United States. That’s going to be the second stack. And so this stack that we’re announcing today will put AI into the edge.

Remember, $100 billion of the world’s capital investments each year is in the radio networks and all of the data centers provisioning for communications. In the future, there is no question in my mind that’s going to be accelerated computing infused with AI. AI will do a far, far better job adapting the radio signals, the massive MIMO, to the changing environments and traffic conditions. Of course it would. Of course we would use reinforcement learning to do that. Of course MIMO is essentially one giant radio robot. Of course it is. And so we will of course provide for those capabilities. Of course, AI can revolutionize communications.

«Jensen, because of your work, I can do my life’s work in my lifetime.» – Jensen quoting a scientist’s comment to him, which he described as deeply moving: «And boy, if that doesn’t touch you, well, you’ve got to be a corpse.»

Autonomous Vehicles

One of the earliest industries that AI went into was autonomous vehicles. The moment I saw AlexNet—and we’ve been working on computer vision for a long time—the moment I saw AlexNet was such an inspiring moment, such an exciting moment. It caused us to decide to go all in on building self-driving cars.

So we’ve been working on self-driving cars now for over a decade. We built technology that almost every single self-driving car company uses. It could be in the data center—for example, Tesla uses a lot of NVIDIA GPUs in the data center. It could be in the data center or the car. Waymo and Wayve use NVIDIA computers in data centers as well as in the car. It could be just in the car—it’s very rare, but sometimes it’s just in the car. Or they could use all of our software in addition. We work with the car industry however the car industry would like us to work with them.

We build all three computers:

  1. The training computer
  2. The simulation computer
  3. The robotics computer, the self-driving car computer

All the software stack that sits on top of it, models and algorithms, just as we do with all of the other industries that I’ve demonstrated.

Today, I’m super excited to announce that GM has selected NVIDIA to partner with them to build their future self-driving car fleet. The time for autonomous vehicles has arrived. And we’re looking forward to building AI with GM in all three areas:

  1. AI for manufacturing—they can revolutionize the way they manufacture
  2. AI for enterprise—they can revolutionize the way they work, design cars and simulate cars
  3. AI in the car—in short, AI infrastructure for GM

One of the areas that I’m deeply proud of, and it rarely gets any attention, is safety. In our company, it’s called HALOS. Safety requires technology from the system to the software, the algorithms, the methodologies—everything from ensuring diversity to monitoring, transparency, and explainability.

All of these different philosophies have to be deeply ingrained into every single part of how you develop the system and the software. We’re the first company in the world, I believe, to have every line of code safety assessed. Seven million lines of code, safety assessed. Our chip, our system, our system software, our system algorithms are safety assessed by third parties that crawl through every line of code to ensure that it is designed to ensure diversity, transparency, and explainability. We have filed over a thousand patents.

Blackwell and Data Centers

Blackwell is in full production, and this is what it looks like. For us, this is a sight of beauty. This is… How is this not beautiful? How is this not beautiful?

This is a big deal because we made a fundamental transition in computer architecture. I just want you to know that, in fact, I showed you a version of this about three years ago. It was called Grace Hopper, and the system was called Ranger. The Ranger system is maybe about half of the width of the screen. And it was the world’s first NVLink 32.

Three years ago, we showed Ranger working and it was way too large, but it was exactly the right idea. We were trying to solve scale-up. Distributed computing is about using a whole lot of different computers working together to solve a very large problem. But there is no replacement for scaling up before you scale out. Both are important, but you want to scale up first before you scale out.

But scaling up is incredibly hard; there is no simple answer for it. You’re not going to scale up like Hadoop—take a whole bunch of commodity computers, hook them up into a large network, and do in-storage computing using Hadoop. Hadoop was a revolutionary idea, as we know. It enabled hyperscale data centers to solve problems of gigantic size using off-the-shelf computers.

However, the problem we’re trying to solve is so complex that scaling in that way would have simply cost way too much power, way too much energy. It would have never—deep learning would have never happened. And so the thing that we had to do was scale up first.

[Jensen describes the evolution of NVIDIA’s system architecture from HGX to the new disaggregated NV-Link systems, explaining how they’ve improved scale-up capabilities dramatically.]

We wanted to scale up even further. And I told you that Ranger took this system and scaled it up by another factor of four. And so we had NVLink 32, but the system was way too large. And so we had to do something quite remarkable: re-engineer how NVLink worked, and how scale-up worked.

The first thing that we did was we said, listen, the NVLink switches are embedded on the motherboard in this system. We need to disaggregate the NVLink system and take it out. So this is the NVLink system, this is an NVLink switch. This is the highest performance switch the world’s ever made. And this makes it possible for every GPU to talk to every GPU at exactly the same time at full bandwidth.

We disaggregated it, we took it out, and we put it in the center of the chassis. There are 18 of these switches in nine different switch trays, as we call them. And the switches are disaggregated. The compute is now sitting in here. This is equivalent to these two things in compute.

What’s amazing is this is completely liquid-cooled. And by liquid cooling it, we can compress all of these compute nodes into one rack. This is the big change for the entire industry. All of you in the audience, I want to thank you for making this fundamental shift from integrated NVLink to disaggregated NVLink, from air-cooled to liquid-cooled, from 60,000 components per computer or so to 600,000 components per rack. 120 kilowatts, fully liquid-cooled, and as a result, we have a one-exaflops computer in one rack.

3,000 pounds… 5,000 cables… about 2 miles worth… just incredible electronics. 600,000 parts, I think that’s like 20 cars—20 cars worth of parts. And it integrates into one supercomputer.

Well, our goal is to do scale-up. And this is what it now looks like. We essentially wanted to build this chip. It’s just that no reticle limit can do this. No process technology can do this. It’s 130 trillion transistors—20 trillion of it is used for computing. So you can’t reasonably build this anytime soon.

And so the way to solve this problem is to disaggregate it, as I described, into the Grace Blackwell NVLink 72 rack. But as a result, we have done the ultimate scale-up. This is the most extreme scale-up the world has ever done. The amount of computation that’s possible here… The memory bandwidth, 570 terabytes per second. Everything in this machine is now in T’s. Everything’s a trillion. And you have an exaflops, which is a million trillion floating point operations per second.

Inference and AI Factories

The reason why we wanted to do this is to solve an extreme problem. And that extreme problem, a lot of people misunderstood to be easy. And in fact, it is the ultimate extreme computing problem. And it’s called inference.

And the reason for that is very simple. Inference is token generation by a factory. And a factory is revenue and profit generating—or lack of. And so this factory has to be built with extreme efficiency, with extreme performance. Because everything about this factory directly affects your quality of service, your revenues, and your profitability.

[Jensen describes the trade-offs between token generation speed and throughput in AI factories, illustrating how these affect both user experience and data center economics.]

You have two axes. On the x-axis is the tokens per second. Whenever you put a prompt into chat, you can see what comes out as tokens. Those tokens are reformulated into words. It’s more than a token per word.

We’ve already established that if you want your AI to be smarter, you want to generate a whole bunch of tokens. And so, those tokens might be second-guessing itself. It might be asking, «Is this the best word I could use?» And so, it talks to itself, just like we talk to ourselves. The more tokens you generate, the smarter your AI.

But, if you take too long to answer a question, the customer is not going to come back. This is no different than web search. There is a real limit to how long it can take before it comes back with a smart answer.

So, you have these two dimensions that you’re fighting against. You’re trying to generate a whole bunch of tokens, but you’re trying to do it as quickly as possible. Therefore, your token rate matters. You want your tokens per second for that one user to be as fast as possible.

However, in computer science and factories, there’s a fundamental tension between latency (response time) and throughput. And the reason is very simple. If you’re in the large high-volume business, you batch up—it’s called batching. You batch up a lot of customer demand, and you manufacture a certain version of it for everybody to consume later. However, from the moment that they batched up and manufactured whatever they did to the time that you consumed it, it could take a long time.

You have these two fundamental tensions. On the one hand, you would like the customer’s quality of service to be as good as possible—smart AIs that are super fast. On the other hand, you’re trying to get your data center to produce tokens for as many people as possible so you can maximize your revenues.

The perfect answer is to the upper right. Ideally, the shape of that curve is a square—that you could generate very fast tokens per person up until the limits of the factory—but no factory can do that. And so, it’s probably some curve. And your goal is to maximize the area under the curve.
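
As a toy illustration of that curve (all numbers invented), bigger batches keep the GPUs better utilized, which raises total factory throughput, but each individual user then sees fewer tokens per second:

```python
# Toy model of the latency/throughput trade-off in an AI factory.
# Larger batches improve utilization (total tokens/sec) but dilute
# the rate any single user experiences. All numbers are made up.
PEAK_TOKENS_PER_SEC = 100_000           # ideal factory throughput when saturated

def factory_throughput(batch_size: int) -> float:
    utilization = batch_size / (batch_size + 32)   # saturating utilization curve
    return PEAK_TOKENS_PER_SEC * utilization

for batch in (1, 8, 64, 512):
    total = factory_throughput(batch)
    per_user = total / batch            # what one interactive user sees
    print(f"batch={batch:4d}  total={total:8.0f} tok/s  per-user={per_user:7.1f} tok/s")
```

Maximizing the area under the curve means finding operating points that keep both of these numbers high at once.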

[Jensen demonstrates this concept with a video showing traditional LLMs versus reasoning models, where a reasoning model uses many more tokens to solve a complex problem correctly, while a traditional model gives a quick but incorrect answer.]

NVIDIA Dynamo

One of the observations—and this is a really terrific thing about having a homogeneous architecture like NVLink72—is that every single GPU could do all of the things that I just described. And we observe that these reasoning models are doing a couple of phases of computing.

One of the phases of computing is thinking. When you’re thinking, you’re not producing a lot of tokens. You’re producing tokens that you’re maybe consuming yourself. You’re thinking. Maybe you’re reading, you’re digesting information—that information could be a PDF, that information could be a website. You could literally be watching a video, ingesting all of that at super linear rates, and you take all of that information, and you then formulate the answer, formulate a planned answer. And so that digestion of information, context processing, is very flops intensive.

On the other hand, the next phase is called decode. So the first part we call pre-fill; the next phase, decode, requires floating point operations, but it requires an enormous amount of bandwidth.

And it’s fairly easy to calculate. You know, if you have a model, and it’s a few trillion parameters, well, it takes a few terabytes per second. Notice I was mentioning 576 terabytes per second. It takes terabytes per second to just pull the model in from HBM memory and to generate literally one token.

And the reason it generates one token is because, remember, these large language models are predicting the next token. That’s why they say the next token. It’s not predicting every single token. It’s predicting the next token.

Now, we have all kinds of new techniques, speculative decoding and others, for doing that faster, but in the final analysis, you’re predicting the next token. And so you ingest, pull in, the entire model and the context—we call it the KV cache—and then we produce one token. And then we take that one token, we put it back into our brain, and we produce the next token.

Every single time we do that, we take trillions of parameters in, we produce one token. Trillions of parameters in, produce another token. Trillions of parameters in, produce another token. And in that demo, we produced 8,600 tokens. So trillions of bytes of information have been taken into our GPUs and produced one token at a time.
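
As a rough version of that arithmetic, assuming a hypothetical two-trillion-parameter model stored at one byte per parameter (the size and precision are my assumptions, not figures from the talk):

$$2\times10^{12}\ \text{parameters} \times 1\ \tfrac{\text{byte}}{\text{parameter}} \approx 2\ \text{TB read per decode step},$$

so serving even 100 tokens per second to a single user this naive way would mean on the order of 200 TB/s of memory traffic. Batching many users into each decode step amortizes that read, which is part of why hundreds of terabytes per second of HBM bandwidth, and NVLink to pool it, matter so much.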

Which is fundamentally the reason why you want NVLink. NVLink gives us the ability to take all of those GPUs and turn them into one massive GPU—the ultimate scale up.

The second thing is that now that everything is on NVLink, I can disaggregate the pre-fill from the decode, and I could decide I want to use more GPUs for pre-fill, less for decode, because I’m thinking a lot. I’m doing agentic activities. I’m reading a lot of information. I’m doing deep research.

Well, this dynamic operation is really complicated. So I’ve just now described pipeline parallel, tensor parallel, expert parallel, in-flight batching, disaggregated inferencing, workload management. And then I’ve got to take this thing called a KV cache, I’ve got to route it to the right GPU, I’ve got to manage it through all the memory hierarchies. That piece of software is insanely complicated.
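
As a deliberately simplified sketch of that allocation decision, assuming a hypothetical 72-GPU NVLink domain and invented demand numbers (real systems like Dynamo schedule this dynamically across far more dimensions):

```python
# Toy split of one NVLink domain between pre-fill (flops-heavy context
# processing) and decode (bandwidth-heavy token generation).
TOTAL_GPUS = 72

def split_gpus(prefill_demand: float, decode_demand: float) -> tuple[int, int]:
    """Allocate GPUs in proportion to the current demand of each phase."""
    share = prefill_demand / (prefill_demand + decode_demand)
    prefill_gpus = min(TOTAL_GPUS - 1, max(1, round(TOTAL_GPUS * share)))
    return prefill_gpus, TOTAL_GPUS - prefill_gpus

# Agentic, deep-research workloads read a lot, so weight the split toward pre-fill.
print(split_gpus(prefill_demand=3.0, decode_demand=1.0))   # -> (54, 18)
```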

And so today we’re announcing the NVIDIA Dynamo. And NVIDIA Dynamo does all that. It is essentially the operating system of an AI factory.

Whereas in the past, in the way that we ran data centers, our operating system would be something like VMware. And we would orchestrate, and we still do—we’re a big user—orchestrate a whole bunch of different enterprise applications running on top of our enterprise IT. But in the future, the application is not enterprise IT. It’s agents. And the operating system is not something like VMware. It’s something like Dynamo. And this operating system is running on top of not a data center, but on top of an AI factory.

We decided to call this operating system, this piece of software, insanely complicated software, the NVIDIA Dynamo. It’s open source. And we’re so happy that so many of our partners are working with us on it.

Performance Comparisons

[Jensen shows performance comparisons between Hopper and Blackwell systems, demonstrating up to 40X performance improvements for reasoning models, then explains the roadmap for future architectures.]

Roadmap and Future Architectures

We’re now in full production of Blackwell. Computer companies all over the world are ramping these incredible machines at scale. And I’m just so pleased and so grateful that all of you worked hard on transitioning into this new architecture.

And now in the second half of this year, we’ll easily transition into the upgrade. So we have Blackwell Ultra NVLink 72. It’s one and a half times more flops. It’s got a new instruction for attention. It’s one and a half times more memory. All that memory is useful for things like the KV cache. It has two times the networking bandwidth. And so now that we have the same architecture, we’ll just gracefully glide into that. And that’s called Blackwell Ultra. That’s coming in the second half of this year.

The next architecture, one year out, is named after an astronomer—her name is Vera Rubin. She discovered dark matter. Vera Rubin is incredible because the CPU is new. It’s twice the performance of Grace, with more bandwidth, and yet it’s just a tiny little 50-watt CPU.

And Rubin, a brand new GPU; CX9, a brand new networking SmartNIC; NVLink 6, a brand new NVLink; brand new memories, HBM4—basically everything is brand new except for the chassis. And this way we could take a whole lot of risk in one direction and not risk a whole bunch of other things related to the infrastructure. And so Vera Rubin, NVLink 144, is the second half of next year.

[He continues to describe the future roadmap, including Rubin Ultra with NVLink 576.]

Silicon Photonics

One of the areas that I’m very excited about is the largest enterprise networking company taking Spectrum X and integrating it into their product line so that they can help the world’s enterprises become AI companies. We’re at 100,000 GPUs with CX-7; now CX-8 is coming, and CX-9 is coming. And during Rubin’s time frame, we would like to scale out the number of GPUs to many hundreds of thousands.

Now, the challenge with scaling out to many hundreds of thousands of GPUs is the connection of the scale out. The connection on scale up is copper. We should use copper as far as we can—that’s, you know, call it a meter or two. And that’s incredibly good connectivity, very high reliability, very good energy efficiency, very low cost. And so we use copper as much as we can on scale up. But on scale out, where the data centers are now the size of a stadium, we’re going to need something that runs much longer distances. And that’s where silicon photonics comes in.

The challenge of silicon photonics has been that the transceivers consume a lot of energy. To go from electrical to photonic, you have to go through a transceiver, and that takes a certain amount of energy. Each one of these transceivers is 30 watts. If you buy in high volume, it’s $1,000. This is a plug. On this side is electrical. On this side is optical. The optics come in through the yellow. You plug this into a switch. It’s electrical on this side. There are transceivers, lasers, and a technology called Mach-Zehnder.

If we had 100,000 GPUs, we would have 100,000 of this side. And then another 100,000, which connects the switch to the switch. And then on the other side, connected to the other network. If we had 250,000, we’ll add another layer of switches. And so each GPU, every GPU, 250,000, every GPU, would have six transceivers. Every GPU would have six of these plugs. And these six plugs would add 180 watts per GPU, 180 watts per GPU, and $6,000 per GPU.

And so the question is, how do we scale out now to millions of GPUs? Because if we had a million GPUs, multiplied by six, it would be six million transceivers times 30 watts—180 megawatts of transceivers. And those transceivers don’t do any math; they just move signals around. And so the question is, how can we afford it? As I mentioned earlier, energy is our most important commodity. Everything is related, ultimately, to energy. So this is going to limit our revenues, our customers’ revenues, by subtracting out 180 megawatts of power.
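
Written out, the arithmetic he is walking through is:

$$10^{6}\ \text{GPUs} \times 6\ \tfrac{\text{transceivers}}{\text{GPU}} = 6\times10^{6}\ \text{transceivers};\qquad 6\times10^{6} \times 30\ \text{W} = 180\ \text{MW};\qquad 6\times10^{6} \times \$1{,}000 = \$6\ \text{billion}.$$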

And so this is the amazing thing that we did. We invented the world’s first MRM (micro-ring modulator), and this is what it looks like. There’s a little waveguide. On that waveguide, the light goes to a ring. That ring resonates, and it controls the reflectivity of the waveguide as the light goes around, limiting and modulating the amount of light that passes through. It shuts the light off by absorbing it, or passes it on. It turns this direct, continuous laser beam into ones and zeros. And that’s the miracle.

And that photonic IC is then stacked with the electronic IC, which is then stacked with a whole bunch of micro lenses, which is stacked with this thing called a fiber array. These things are all manufactured at TSMC using a technology called COUPE, and packaged using 3D CoWoS technology, working with a whole bunch of technology providers, the names I just showed you earlier. And it turns into this incredible machine.

Enterprise Computing

Let’s talk about enterprise computing. This is really important. In order for us to bring AI to the world’s enterprise, we need to take a step back for a second and remind ourselves of this: AI and machine learning has reinvented the entire computing stack. The processor is different, the operating system is different, the applications on top are different. The way the applications work, the way you orchestrate them are different, and the way you run them are different.

Let me give you one example. The way you access data will be fundamentally different than in the past. Instead of retrieving precisely the data that you want and reading it to try to understand it, in the future we will do what we do with Perplexity. Instead of doing retrieval that way, I just ask Perplexity what I want. Ask a question, and it will tell you the answer.

This is the way enterprise IT will work in the future as well. We’ll have AI agents, which are part of our digital workforce. There are a billion knowledge workers in the world. There are probably going to be 10 billion digital workers working with us side by side. 100% of software engineers in the future—there are 30 million of them around the world—100% of them are going to be AI assisted. I’m certain of that. 100% of NVIDIA software engineers will be AI assisted by the end of this year.

AI agents will be everywhere, how they run, what enterprises run, and how we run it will be fundamentally different. We need a new line of computers.

This is what a PC should look like. 20 petaflops. Unbelievable. 72 CPU cores. Chip to chip interface. HBM memory. And just in case, some PCI express slots for your GeForce.

This is called DGX Station. DGX Spark and DGX Station are going to be available by all of the OEMs. HP, Dell, Lenovo, ASUS. It’s going to be manufactured for data scientists and researchers all over the world. This is the computer of the age of AI. This is what computers should look like. And this is what computers will run in the future.

And we have a whole lineup for enterprise now. From little tiny ones to workstation ones, server ones to supercomputer ones, and these will be available from all of our partners.

We will also revolutionize the rest of the computing stack. Remember, computing has three pillars:

  1. There’s computing. You’re looking at it.
  2. There’s networking, as I mentioned earlier, Spectrum X going to the world of computers.
  3. And the third is storage. Storage has to be completely reinvented.

Rather than a retrieval-based storage system, it’s going to be a semantics-based retrieval system, a semantics-based storage system. And so the storage system has to be continuously embedding information in the background, taking raw data, embedding it into knowledge, and then later on, when you access it, you don’t retrieve it. You just talk to it. You ask your questions. You give it problems.
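
A minimal sketch of what “embed in the background, then ask instead of retrieve” could look like, using a hypothetical `embed()` text-to-vector model and cosine similarity; this is an illustration of the idea, not NVIDIA’s actual storage stack:

```python
import numpy as np

class SemanticStore:
    """Toy semantics-based store: ingest raw text, query it by meaning."""

    def __init__(self, embed):
        self.embed = embed                 # hypothetical text -> vector model
        self.vectors, self.docs = [], []

    def ingest(self, doc: str) -> None:
        # In a real system this runs continuously in the background,
        # turning raw data into embedded knowledge as it arrives.
        self.vectors.append(np.asarray(self.embed(doc), dtype=float))
        self.docs.append(doc)

    def ask(self, question: str, k: int = 3) -> list[str]:
        # Instead of fetching a file by name, find what is closest in meaning;
        # the passages would then go to a language model to generate the answer.
        q = np.asarray(self.embed(question), dtype=float)
        sims = [float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9))
                for v in self.vectors]
        top = np.argsort(sims)[::-1][:k]
        return [self.docs[i] for i in top]
```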

And one of the examples—I wish we had a video of it—is from Aaron at Box, who worked with us to put one up in the cloud. And it’s basically a super smart storage system. And in the future, you’re going to have something like that in every single enterprise. That is the enterprise storage of the future.

And we’re working with the entire storage industry, really fantastic partners, DDN and Dell, and HP Enterprise, and Hitachi and IBM, and NetApp and Nutanix and Pure Storage and Vast and Weka. Basically, the entire world storage industry will be offering this stack. For the very first time, your storage system will be GPU accelerated.

Models and Partnerships

Today, I’m super excited to announce this incredible model that everybody can run. I showed you earlier, R1, a reasoning model. I showed you versus Llama 3, a non-reasoning model. And obviously, R1 is much smarter. But we can do it even better than that. And we can make it possible to be enterprise ready for any company. And it’s now completely open source. It’s part of our system we call NIMs. And you can download it. You can run it anywhere. You can run it on DGX Spark. You can run it on DGX Station. You can run on any of the servers that the OEMs make. You can run it in the cloud. You can integrate it into any of your agentic AI frameworks.

And we’re working with companies all over the world. I’ve got some great partners in the audience. I want to recognize Accenture, Julie Sweet, and her team are building their AI factory and their AI framework. And we have Amdocs, the world’s largest telecommunication software company, AT&T, John Stankey, and his team building an AT&T AI system, agentic system, Larry Fink at BlackRock and team building theirs.

[Jensen lists multiple corporate partnerships including Cadence, Capital One, Dell, E&Y, Nasdaq, SAP, and ServiceNow.]

Robotics

Let’s talk about robots. Well, the time has come, the time has come for robots. Robots have the benefit of being able to interact with the physical world and do things that otherwise digital information cannot.

We know very clearly that the world has severe shortage of human laborers, human workers. By the end of this decade, the world is going to be at least 50 million workers short. We’d be more than delighted to pay them each $50,000 to come to work. We’re probably going to have to pay robots $50,000 a year to come to work. And so this is going to be a very, very large industry.

There are all kinds of robotic systems. Your infrastructure will be robotic. Billions of cameras in warehouses and factories—10, 20 million factories around the world. Every car is already a robot, as I mentioned earlier, and now we’re building general robots.

[Jensen demonstrates a robot called Blue on stage.]

Everything that moves will be autonomous. Physical AI will embody robots of every kind in every industry. Three computers built by NVIDIA enable a continuous loop of robot AI simulation, training, testing, and real world experience.

Training robots requires huge volumes of data. Internet scale data provides common sense and reasoning, but robots need action and control data, which is expensive to capture. With Blueprints built on NVIDIA Omniverse and Cosmos, developers can generate massive amounts of diverse synthetic data for training robot policies.

At its core, we have the same challenges. As I mentioned before, there are three that we focus on:

  1. How do you solve the data problem? How, where do you create the data necessary to train the AI?
  2. What’s the model architecture?
  3. What’s the scaling law? How can we scale either the data, the compute, or both, so that we can make AI smarter and smarter?

In robotics, we created a system called Omniverse. It’s our operating system for physical AI. So you’ve heard me talk about Omniverse for a long time. We added two technologies to it. Today, I’m going to show you two things.

One of them is so that we could scale AI with generative capabilities. A generative model that understands the physical world. We call it Cosmos. Using Omniverse to condition Cosmos and using Cosmos to generate an infinite number of environments allows us to create data that is grounded, controlled by us, and yet be systematically infinite at the same time.

The second thing, just as we were talking about earlier, one of the incredible scaling capabilities of language models today is reinforcement learning, verifiable rewards. The question is, what’s the verifiable rewards in robotics? And as we know very well, it’s the laws of physics. Verifiable physics rewards.

And so we need an incredible physics engine. We need a physics engine that is designed for very fine-grained rigid and soft bodies, designed for training tactile feedback, fine motor skills, and actuator controls. We needed it to be GPU accelerated so that these virtual worlds could run in super linear time, super real time, and train these AI models incredibly fast. And we needed it to be integrated harmoniously into a framework that is used by roboticists all over the world, like MuJoCo.

And so today we’re announcing something really, really special. It is a partnership of three companies, DeepMind, Disney Research, and NVIDIA, and we call it Newton.

Today we’re introducing NVIDIA Isaac Groot N1. Groot N1 is a generalist foundation model for humanoid robots. It’s built on the foundations of synthetic data generation, and learning and simulation. Groot N1 features a dual system architecture for thinking fast and slow, inspired by principles of human cognitive processing. The slow thinking system lets the robot perceive and reason about its environment and instructions, and plan the right actions to take. The fast thinking system translates the plan into precise and continuous robot actions.

Groot N1’s generalization lets robots manipulate common objects with ease and execute multi-step sequences collaboratively. And with this entire pipeline of synthetic data generation and robot learning, humanoid robot developers can post-train Groot N1 across multiple embodiments and tasks in many environments. Around the world, in every industry, developers are using NVIDIA’s three computers to build the next generation of embodied AI.

And today we’re announcing that Groot N1 is open-sourced!

Closing

I want to thank all of you for coming to GTC. We talked about several things:

  1. Blackwell is in full production. And the ramp is incredible. Customer demand is incredible. And for good reason: there’s an inflection point in AI. The amount of computation we have to do in AI is so much greater as a result of reasoning AI, and the training of reasoning AI systems and agentic systems.
  2. Blackwell NVLink 72 with Dynamo is 40 times the AI factory performance of Hopper. And inference is going to be one of the most important workloads in the next decade as we scale out AI.
  3. We have an annual rhythm of roadmaps that has been laid out for you so that you could plan your AI infrastructure.

And we have three AI infrastructures we’re building:

  1. AI infrastructure for the cloud
  2. AI infrastructure for enterprise
  3. AI infrastructure for robots

Thank you!