NVIDIA at Bank of America Global Technology Conference: AI Strategy Unveiled

Published 04/06/2025, 19:50

On Wednesday, 04 June 2025, NVIDIA Corporation (NASDAQ:NVDA) participated in the Bank of America Global Technology Conference 2025. The discussion provided insights into NVIDIA’s strategic focus on AI, touching on both opportunities and challenges. While NVIDIA’s platform approach and AI infrastructure advancements were highlighted positively, constraints like power consumption and adoption speed were noted as potential hurdles.

Key Takeaways

  • NVIDIA emphasizes platform building to foster innovation in the AI inference market.
  • Sovereign AI is identified as a growing demand, with global AI factory investments.
  • The company plans to monetize software through Nemotron models and enterprise support.
  • Challenges include power limitations and the speed of AI adoption by enterprises.
  • NVIDIA’s focus remains on scalable AI factories and training clusters.

Financial Results

  • Token Economics:

- DeepSeek’s model generates 13 times more tokens than traditional LLMs like Llama 70B.

- The reasoning models present a 20 times larger market opportunity for inferencing.

  • Performance Improvement:

- DeepSeek’s accuracy on the AIME math benchmark improved from about 70% to 89% by doubling token generation.

Operational Updates

  • DeepSeek:

- The open-source reasoning model is priced at $1 per million tokens.

  • Model Size:

- 100 billion parameter models are now standard, with models up to a trillion parameters in use.

  • Blackwell and HGX:

- Blackwell GPUs offer 20 petaflops of FP4 performance.

- The B200 HGX platform boosts inference performance by 3 times compared to previous models.

  • Sovereign AI:

- 100 AI factories are under construction globally, including a significant investment in Taiwan.

Future Outlook

  • Model Sizes:

- Trillion parameter models are expected to become commonplace.

  • NVIDIA’s Strategy:

- Focus on AI factories for inference and training clusters at scale.

  • Software Monetization:

- Monetization plans include Nemotron models and data center software like Lepton.

  • Growth Constraints:

- Power access and enterprise AI adoption speed are potential growth constraints.

Q&A Highlights

  • Inference Market Competitiveness:

- NVIDIA focuses on solving complex problems and building platforms for innovation.

  • ASICs vs. GPUs:

- NVIDIA’s platform is valued for its adaptability and broad ecosystem support.

  • Sovereign AI:

- Nations are increasingly investing in AI to apply to their unique industries and data.

Readers are encouraged to refer to the full transcript for a detailed understanding.

Full transcript - Bank of America Global Technology Conference 2025:

Vivek Arya, Analyst, BofA Securities: Good morning, everyone. Thank you so much for joining us on day two of the BofA Securities Global Technology Conference. I’m Vivek Arya. I cover semiconductors and semi cap equipment here at BofA. And I’m absolutely delighted and honored to have Ian Buck, the Head of Accelerated Computing at NVIDIA, join us for this keynote.

I think most of you are probably familiar with Ian, but if not, Ian heads all the hardware and software product lines, third party enablement, and marketing activities for GPU computing at NVIDIA. He joined the company in 2004, the same year I joined Merrill Lynch. So I guess that’s the only thing we have in common, I believe. And he created CUDA, which remains the established leading platform for accelerated parallel computing. Before joining NVIDIA, he was the development lead on Brook, a forerunner to general-purpose computing on GPUs.

So we’re absolutely thrilled to have Ian with us. And before I get into the Q&A, I was just asked to read a brief statement. As a reminder, this presentation contains forward-looking statements, and investors are advised to read NVIDIA’s reports filed with the SEC for information related to risks and uncertainties facing the business. So with that, a very warm welcome to you, Ian, really appreciate having you. This is, I think, our third keynote session, so really appreciate you joining us.

Ian Buck, Head of Accelerated Computing, NVIDIA: Yeah. We’re all running on AI time. A year ago feels like a lifetime. And one of the most challenging parts of my job is often trying to predict the future, but AI is always surprising us.

Vivek Arya, Analyst, BofA Securities: That’s right. Bigger and better. So, Ian, let’s just start with, you know, the big news that kind of rocked at least Wall Street early this year, which was the DeepSeek moment.

Ian Buck, Head of Accelerated Computing, NVIDIA: Right.

Vivek Arya, Analyst, BofA Securities: So how much of that news was a surprise to you? Because you have followed the industry for a long time. And what does it really mean for investors who are looking at that as some big, seminal, game-changing moment? What are the positive and negative implications of that DeepSeek moment from your perspective?

Ian Buck, Head of Accelerated Computing, NVIDIA: DeepSeek was, you know, one of a handful of inflection points in AI for sure. You can go back to the original Google cat moment, where AI recognized cats. You can go through the ResNet moment or the ImageNet moment. In 2022, we had the ChatGPT moment, which I’m sure the investor community all noticed as well. Then in January, we had the DeepSeek moment.

DeepSeek itself wasn’t a surprise. The company, DeepSeek and High-Flyer, have been around for a while. If you look at the history of the papers they’ve been publishing, it is amazing work. They’re actually among the best CUDA developers out there in terms of getting all the way down the stack. And if you read that DeepSeek R1 paper and the V3 paper it was based on, the amount of optimization they’ve done for GPUs, for NVLink, for GPUDirect RDMA, for sending data from the GPU over PCIe to the NIC and over NVLink, to build a training and inferencing platform and solution and technology.

It’s truly amazing. The moment, though, that really activated it was reasoning. It was the first open, world-class reasoning model, and it was truly open. They explained how they built it, how they trained it, and the optimizations they did to train it to that level of intelligence and to optimize the execution of the training and inference stack. There are some amazing graphs in that paper.

It was basically a barn-door moment for reasoning models in AI. And today, I think the world would agree, you can’t really publish or celebrate a new model without it being a reasoning model. Reasoning wasn’t new. OpenAI had been publishing papers about using reasoning. o3 and o4-mini are excellent, and Gemini too, all reasoning models.

But DeepSeek really made it ubiquitous, open, democratized it. The implications and the impact were not understood when it launched. First off, by being open, anyone can run it anywhere. Today, DeepSeek R1 is, call it, a dollar per million tokens, where a traditional LLM like Llama 70B might be 60 cents per million tokens. It’s a big model.

671 billion parameters, I think 37 billion active parameters. It has dozens of layers and 256 experts with a shared expert. That’s the level of complexity and technology that only folks like Gemini or OpenAI had, and now you had a truly world-class open model. Running that level of complexity is really hard.

What makes reasoning so useful is that in the output tokens, you let the model think, you teach the model to think, and really kind of think out loud. If you’ve ever used DeepSeek R1, it’s quite amusing to watch it think. It is just talking out loud, asking itself questions. It has trained itself to come up with an answer by thinking out loud, and then it doesn’t give you that answer right away. You can actually see it.

It checks the answer. It has taught itself to check the answer and make sure it’s right by double-checking its math, and then it still doesn’t give you the answer. It checks it a second time, and that’s very intentional. They actually train the model to think for as long as it can until it comes up with an answer, check it once, check it twice, and then give you the answer. As a result, we’re seeing an explosion in the number of tokens generated.

You ask Llama a question, you get an answer back of about a hundred words. That’s it. You pay for those hundred words, call it 200-some-odd tokens, at 60 cents per million. With DeepSeek, it reasons for about a thousand words, and then it gives you that hundred-word answer, and it’s right. And all those tokens you’re paying for, by the way, are valued at a dollar per million.

So in general, DeepSeek has kind of made every model a reasoning model. Inference demand as a result has exploded. The opportunity for multi-GPU, multi-node inference is everywhere. It came at a great time for GB200, because of all those GPUs connected with NVLink, and Blackwell, and you’re seeing that now. With the increase in value, even a free open model like DeepSeek R1 at $1 per million tokens generates about 13 times more tokens, 13x more tokens.

That’s something like a 20x larger total market opportunity for inferencing because of reasoning. Actually, they just announced a new rev of DeepSeek R1. On the AIME math benchmark, they were getting about 70% accuracy, 69 or 70%. That’s kind of like a C-minus.

You know, at 70% you’re getting two out of three questions right. It’s not that great. The new one they did, they just updated R1, same model, better weights, same cost, is now 89% accurate. So they went to kind of a B-plus, which is basically nine out of ten questions right versus two out of three. And the way they did that is they taught the model to think longer.

So they just doubled the number of tokens they’re generating, how much thinking out loud it does. Again, as these models get smarter, it drives more output tokens, more thinking, and more opportunity for token revenue.
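
To put rough numbers on the token economics Buck describes, here is a minimal sketch in Python. The figures (a roughly 200-token answer at $0.60 per million tokens versus about 13x more tokens at $1 per million) are the ballpark numbers from the conversation, and the function name is ours, not an NVIDIA formula.

```python
# Illustrative sketch of the token economics described above. The numbers are the
# ballpark figures from the conversation, not official pricing, and the function
# name is ours.

def revenue_multiplier(base_tokens_per_answer: float,
                       base_price_per_m: float,
                       token_multiplier: float,
                       reasoning_price_per_m: float) -> float:
    """Revenue per answer of a reasoning model relative to a traditional LLM."""
    base_revenue = base_tokens_per_answer * base_price_per_m / 1e6
    reasoning_revenue = (base_tokens_per_answer * token_multiplier
                         * reasoning_price_per_m / 1e6)
    return reasoning_revenue / base_revenue

# A ~200-token answer at $0.60 per million tokens (Llama-70B-style) vs. ~13x the
# tokens at $1.00 per million tokens (DeepSeek-R1-style reasoning).
print(f"{revenue_multiplier(200, 0.60, 13, 1.00):.1f}x")   # ~21.7x, i.e. the "~20x" opportunity
```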

Vivek Arya, Analyst, BofA Securities: Do you think anything that DeepSeek is doing, or what’s happening in China, is a proxy for, let’s call it, CapEx-constrained computing? There is a lot more effort being made to make these things a lot more efficient because they may not have access. Do you think they are able to bend the cost curve in a way that has implications on, you know, how much spending needs to happen in this industry?

Ian Buck, Head of Accelerated Computing, NVIDIA: No. Actually, the opposite. Whatever everyone was doing, they just talked about it in an academic paper. Computing has always been constrained.

Access to compute, amount of compute, dollars of compute, capital expenditure on compute: the AI race is about, regardless of how much compute you have, how efficiently you’re using it, how intelligently you’re using it, how much value you bring. Everybody wanted Hopper after the ChatGPT moment. That wasn’t unique to DeepSeek; it was around the world. It’s just, do you have the engineering talent to capitalize on it, to invent, to code your CUDA, to know your InfiniBand, know your NVLink, optimize your transformer layer.

One of the big innovations DeepSeek made was using a technique called MLA, which is a statistical method for approximating the weights and the KV layers of the transformer layer. It wasn’t a new idea. It had actually been deployed in image generation, all those fun "draw me a picture of a teddy bear swimming an Olympic lap" models. They were using this MLA-style statistical technique, but it compressed the bejesus out of the transformer layer, made it a lot cheaper by approximating.

And they were able to apply it to DeepSeek V3 and R1. That was the first time it had been publicly talked about. Trust me, these methods are being deployed and optimized; just not everyone publishes. DeepSeek themselves are doing the world a favor by sharing some of the state-of-the-art research they’re doing. But it’s happening everywhere.

It was happening back in Hopper; it was happening even back in the A100 days as well.

Vivek Arya, Analyst, BofA Securities: Got it.
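
For readers who want a picture of what the MLA-style compression Buck mentions looks like, below is a toy sketch of low-rank KV-cache compression: instead of caching a full key and value per token, the model caches one small latent vector and expands it at attention time. The dimensions, weight names, and NumPy implementation are illustrative assumptions, not DeepSeek’s actual architecture.

```python
# Toy sketch of low-rank KV-cache compression in the spirit of MLA (multi-head
# latent attention): cache one small latent per token instead of full keys/values,
# and expand it at attention time. Shapes and names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, d_head = 1024, 64, 128

# Learned projections (random here, for illustration).
W_down = rng.standard_normal((d_model, d_latent)) * 0.02   # compress hidden state
W_up_k = rng.standard_normal((d_latent, d_head)) * 0.02    # expand latent -> key
W_up_v = rng.standard_normal((d_latent, d_head)) * 0.02    # expand latent -> value

def cache_token(hidden_state: np.ndarray) -> np.ndarray:
    """Cache a single small latent per token instead of a full key and value."""
    return hidden_state @ W_down                              # shape (d_latent,)

def expand(latent: np.ndarray):
    """Reconstruct approximate key and value from the cached latent at attention time."""
    return latent @ W_up_k, latent @ W_up_v

latent = cache_token(rng.standard_normal(d_model))
k, v = expand(latent)

plain_cache = 2 * d_head   # floats cached per token, per head, with a plain KV cache
mla_cache = d_latent       # floats cached per token with the shared latent
print(f"Cache per token: {plain_cache} -> {mla_cache} floats ({plain_cache / mla_cache:.0f}x smaller)")
```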

Vivek Arya, Analyst, BofA Securities: You know, you talk with a lot of cloud customers, and many of them are developing their own frontier models. Are you seeing any kind of saturation or diminishing returns in the benefits from increasing the size of these models? There was this public story about Meta’s large language model, where they are not getting enough ROI on it, right?

So do you see any saturation in the effectiveness of these models? Because what this community cares about is CapEx at the end of the day. Is there anything happening from a Western large language model perspective that gives you pause on how long and how big Western AI CapEx can be?

Ian Buck, Head of Accelerated Computing, NVIDIA: So I wouldn’t get too hung up on the Behemoth question. Behemoth is an open model, and there’s competition in the open space. It’s hard to launch a model if it’s not world class, and it relates to your brand and what you’re doing versus, you know, how it compares to all the models that are out there.

What I am seeing right now is the drive toward reasoning models first. They just add so much more value. They’re able to think and solve a problem, and that is based upon two things. One is how much knowledge they have, which is the size of the model, and the other is how good they are at thinking, using that knowledge to come up with an answer to a question. Traditional LLMs simply regurgitated what they knew.

A traditional Llama 70B, 70 billion parameters, was trained on the corpus of the Internet. When you ask a question, it is really just trying to reconsolidate the information it knows and answer your question, but it can’t really think. What DeepSeek and the other models are doing right now is they take the corpus of the Internet and use that information to think and answer a question. And the more they know, the quicker they can think, the more accurate the answer they come up with, or the cheaper their answer is.

So we have a conflation of taking all the knowledge they know and baking it into the model repeatedly. And the more questions that get asked, the more data and answers they can invest back into the model itself. You and I don’t need to work out that 50 plus 50 is a hundred, because we just know it. A first grader needs to actually do the math and carry the one. But once they’ve done that, it’s now part of their inherent knowledge.

Think about ChatGPT. Think about Grok. Think about Meta AI. Every time someone asks a question, they are expanding the corpus; they think about that answer. Now that answer gets baked into the model itself, and the models are constantly training and retraining and retraining.

So they are both inferring, making money or adding value for the customers, and also getting smarter, and their intelligence, how much they know, is strictly the size of their model. That’s why, when we were talking last year, a hundred-billion-parameter-plus model was a rarity. Now a hundred billion is sort of table stakes, going to 600 billion, and obviously we have models out there in the trillions, but they’re not open. That’s because they’re adding value. There’s a benefit to that model being smarter, to answer the question quicker or answer more valuable questions.

The tricks that are happening are the tricks in executing the model. The MoE experts, which is a hard thing to do, actually picking throughout the whole model which parts of that knowledge to pull from and compute on versus skip, is where a lot of the innovation is happening. So there’s a bit of a race right now between model size and active parameters. Traditional LLMs are not MoE; they just compute on every piece of knowledge they know.

And you and I both know that it’s not very efficient to take all the knowledge you have and process it relative to what the answer is. So that’s what experts are. They split the model up into little pieces, and throughout the whole thinking path, they try to prune and only pull in the right parts. And DeepSeek made public what a lot of others were doing, which is having experts in every layer of the stack.

So the models are getting bigger, and it’s a race between that and the active parameters needed to answer a question. You’re only seeing a small glimpse in the public papers of what the true behind-the-scenes, world-class work is actually able to do.
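
The expert mechanism Buck describes can be sketched in a few lines: a router scores the experts and only the top few run for each token, so most of the model’s parameters sit idle on any given step. The sizes, routing rule, and toy experts below are illustrative, not any particular model’s design.

```python
# Minimal sketch of mixture-of-experts (MoE) routing: a router scores the experts
# and only the top few run per token, so most parameters are skipped on each step.
# Sizes, the routing rule, and the toy experts are illustrative, not any real model.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 512, 16, 2

# Each "expert" here is just a small feed-forward weight matrix.
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ router
    chosen = np.argsort(logits)[-top_k:]                           # indices of top-k experts
    gates = np.exp(logits[chosen]) / np.exp(logits[chosen]).sum()  # softmax over chosen experts
    # Only the selected experts compute; the remaining experts are skipped entirely.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))

y = moe_forward(rng.standard_normal(d_model))
print(f"Active experts per token: {top_k}/{n_experts} "
      f"(~{100 * top_k / n_experts:.0f}% of expert parameters used)")
```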

Vivek Arya, Analyst, BofA Securities: So a year from now, how large will the model sizes you’re talking about be?

Ian Buck, Head of Accelerated Computing, NVIDIA: Well, we’re already using trillion-parameter models today. You just don’t know it. The active parameters are highly variable.

Vivek Arya, Analyst, BofA Securities: Right.

Ian Buck, Head of Accelerated Computing, NVIDIA: The techniques, every idea you can use to trim how much compute you use, like your previous question said, are being applied, researched, figured out. The other way of optimizing for compute is distillation. You take the trillion-parameter model, and if you fine-tune it, you can limit the use case or the application to a vertical or narrow workspace, and you can reduce it down to a 70 billion or 7 billion parameter model, and there’s lots of that. Quick, small models, for example for search text, when you type on your phone and it’s expanding the sentence for you. That’s a very small model, which can be finely tuned to you, personalized to you and what you may be doing at that moment.

So we see an explosion of these vertical models. On Hugging Face right now, if you search for Llama, you’re going to find bazillions of distilled models. By the way, all those models also need to be computed on, and they’re constantly being regenerated. One of the big consumers of GPUs is distillation: taking a big model, running inference on it, creating smaller models. They start from a really highly intelligent one, and they distill down.

So I think we’re all getting to trillion-parameter models now. There’s talk of when we get to 10 trillion, how many active parameters that has, and what that model actually looks like in terms of the optimization stack, which is pretty funky.
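
Below is a minimal sketch of the distillation loop Buck describes: a large teacher model produces soft probabilities, and a much smaller student is trained to match them. The toy models, temperature, and loss here are a generic textbook formulation, not NVIDIA’s or anyone’s production pipeline.

```python
# Minimal sketch of knowledge distillation: a large "teacher" model produces soft
# targets that a much smaller "student" learns to match. The toy models, the
# temperature, and the sizes are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
vocab, d_teacher, d_student, temperature = 1000, 4096, 256, 2.0

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def log_softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

# Stand-ins for the two models: just output heads over random features.
teacher_head = rng.standard_normal((d_teacher, vocab)) * 0.01
student_head = rng.standard_normal((d_student, vocab)) * 0.01
teacher_feat = rng.standard_normal(d_teacher)
student_feat = rng.standard_normal(d_student)

# The teacher runs inference (the GPU-hungry part at scale) to produce soft targets.
teacher_probs = softmax((teacher_feat @ teacher_head) / temperature)

# The student is trained to minimize cross-entropy against the teacher's distribution.
student_logits = (student_feat @ student_head) / temperature
distill_loss = -(teacher_probs * log_softmax(student_logits)).sum()
print(f"Distillation loss for this token: {distill_loss:.3f}")
```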

Vivek Arya, Analyst, BofA Securities: The next topic, Ian, I would love to get your perspective on is NVIDIA’s competitiveness as the world moves to more inference. In training, I think there is recognition that NVIDIA has done an outstanding job, but as we go to inference, there’s a fragmentation of workloads, optimization, etcetera. One of your GPU competitors has added a lot more high-bandwidth memory, and they are saying that’s better for inference. There’s a whole bunch of startups who are promising lower cost per token, etcetera.

So how do you view NVIDIA’s competitiveness when it comes to the inference market? And maybe we could also compare it against a lot of the ASIC players that are out there?

Ian Buck, Head of Accelerated Computing, NVIDIA: It’s a good question. NVIDIA thrives at things that are hard. We just do. We’re an engineering and technology company. I’ve got a boss who’s passionate about solving the hard problems and letting other people make money and innovate on top of what we can provide as a platform.

My wife says I should update my bio: I’m just a platform guy. I’m constantly building technology platforms to help other people make money. And inference is really hard. It is wickedly hard.

In many cases, while training is hard for different reasons, trying to run a hundred thousand GPUs, or going to million-GPU distributed training clusters, and keeping that thing going at scale is a data-center-scale, reliability, networking, one-giant-GPU problem, inference is a myriad of optimizations. You start with numerical precision: 32-bit floating point, 16-bit floating point, 8-bit floating point, 4-bit floating point. To use the opportunity, Blackwell has 20 petaflops of FP4 per GPU. That’s a lot. The fastest supercomputer in the world is measured in exaflops, which is only a thousand petaflops.

We got that in FP4. But making four bits work and come up with the right answer: you only have four zeros and ones, and that’s not a lot of numbers. Mathematically, numerically, getting an accurate answer using only that requires expertise in numerical and quantization primitives that are extremely complicated. Go up from there, and now you’re distributing the model.
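
To illustrate why four-bit arithmetic is delicate, here is a toy block-scaled 4-bit quantizer: with only 16 representable levels, accuracy depends entirely on choosing a good scale for each small block of weights. This is a simple signed-integer scheme for illustration, not NVIDIA’s FP4 format or its quantization recipes.

```python
# Toy sketch of block-scaled 4-bit quantization, to show why so few levels only
# work with careful per-block scaling. A simple signed-integer scheme, not
# NVIDIA's actual FP4 format or quantization recipes.
import numpy as np

rng = np.random.default_rng(0)

def quantize_blocks(x: np.ndarray, block: int = 32):
    """Quantize to 4-bit signed integers (-8..7) with one scale per block of values."""
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / 7.0 + 1e-12
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_blocks(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

w = rng.standard_normal(4096).astype(np.float32)   # stand-in for a slice of weights
q, scale = quantize_blocks(w)
w_hat = dequantize_blocks(q, scale)

rel_err = np.abs(w - w_hat).mean() / np.abs(w).mean()
print(f"Representable levels: 16, mean relative error: {rel_err:.1%}")
```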

The model doesn’t fit in a single GPU. It doesn’t fit on a single piece of silicon, I don’t care who you are. In order to get performance, you have to connect multiple chips together to run in parallel within the node. And then if you’re going to run the high-value models, you’re actually going to have to run multi-node and connect them all together.

And you’ve seen how complex, and we share how complex, the GB200 NVL72 design is. On top of that, you have workload diversity. An AI factory is not going to run just one model all day long. It’s easy to benchmark one model. It’s easy to optimize for one model, and certainly it’s easy to build for one.

If you want to run just one thing, you can tune your architecture for that, but AI factories are going to run every kind of model, and the models are going to change. You’re buying a billion-dollar AI factory. You’re going to need to capitalize that expenditure over five years. You damn well better make sure that whatever you buy now, you’re going to be able to run and capitalize and create value with for five years. And, you know, think about where AI was five years ago.

We were launching the first A100. I think I was still talking about ResNet then. So it’s a really important and strategic investment for companies to make sure that they’re building an AI factory that can do all of those optimizations, all those techniques, run all those models today and next year and the year after that, all the way out to 2030. That’s why the platform is so critical. That’s why NVIDIA has got to work, and does work, with every single AI company to make sure that our platform is constantly innovating.

The innovations: we invent some of that technology, but the vast majority of it actually comes from all of those companies, like OpenAI, like Meta, like the Grok model at xAI, as well as the entire academic community. Amazing innovations come from there. And also DeepSeek. One of the big transformer optimizations came from a student who’s now a professor at Princeton and who right there doubled transformer performance, because he figured out a way to run it more efficiently, more accurately, and at less cost.

So the inference market is about running every model across all those AI factories, now and into the future. It’s a fascinating business model where data centers are bought with billions of dollars, five years of CapEx, and you end up charging, you know, dollars per hour or pennies per million tokens.

Vivek Arya, Analyst, BofA Securities: So let’s say you were the head of AWS. How would you go about making the decision between ASICs or GPUs for your AI factory?

Ian Buck, Head of Accelerated Computing, NVIDIA: You should ask Matt that question. He’s a good guy. I’ve worked with him.

Vivek Arya, Analyst, BofA Securities: Well, they talk a lot about Trainium.

Ian Buck, Head of Accelerated Computing, NVIDIA: I’m sure, I know. And they rightly should. I mean, building silicon is hard. I can say that as somebody who’s been involved with it for twenty years. It’s hard and getting even more complicated.

So it’s no small feat to achieve even what they’ve achieved, and I’m super happy for them; it’s impressive what they’ve been able to do. Anyone who has survived it and been able to do multiple generations and stuck with it, that requires, you know, almost founder-level CEO commitment to make happen. Every hyperscaler is building their own silicon. They’re both our customers and also looking at alternatives, and they rightly should.

Their own silicon, other silicon, other opportunities out there. Each of them has to find what they need to optimize for, what they need to go serve, and what they’re going to do for their business. So I can’t speak for Matt’s business, exactly where he’s going to be applying all of those. And likewise with TPU: they all have an internal workload and an external opportunity. They’re all very passionate about making sure they provide, with our time to market, the latest NVIDIA GPUs and the customers and workloads that we bring to their clouds.

So our business with AWS, and with everyone, is extremely healthy and continues to grow. AWS launched, well, they were actually the first to launch, the B200 HGX. We talk a lot about NVL72, but the existing B200 HGX platform, which is just eight GPUs, NVLink-connected, is the same architecture that ChatGPT ran on with Hopper. We also do it with Blackwell. It’s a fantastic inference platform.

It runs all the same Hopper workloads. They carried over and immediately provided a 3x boost for inferencing. So everyone who is on Hopper, using H100, H200 HGX, as soon as they get on B200, immediately you’re getting a 3x boost. And you see that in the Artificial Analysis benchmarks and everything else in terms of performance.

So AWS is an excellent partner. How they go and apply it and where they see their opportunity: everyone’s got to define the niche or the area where they’re going to add value, and then how they’re going to engage a community. It’s one thing to win on a benchmark or do a certain workload. It’s a whole other game to try to activate an ecosystem and developers and get your platform into the market. Not all of them need to do that, and certainly some have chosen to work on certain opportunities.

But the undeniable part of it is that we’re constantly making things faster. We are lowering costs. We are making things more profitable, as per the DeepSeek and B200 example. And we’re doing that annually. So each of them has to choose where they’re going to provide value or differentiate.

Vivek Arya, Analyst, BofA Securities: So if I ask the question in a different way: today, if I look at $100 of spend on AI, $10 to $15 of that is going into ASICs. If we go out the next three, four, five years, what makes this 10 to 15 go to 20 to 25? What do you think would have changed in the industry, or could change, to move it more towards ASICs and away from merchant silicon?

Ian Buck, Head of Accelerated Computing, NVIDIA: Well, I think that looks at the problem the wrong way, because of profitability. Your performance is actually your profitability, your gross margin. You can look at the cost reduction of a component, but generally, when we look at it, we look at it in terms of: here’s a billion dollars of AI factory you’re going to build. How many tokens is it going to output compared to the previous generation?

And how much more value are those tokens going to have, not just at the same dollars per token for the same model. If you can deliver 3x more tokens per second, you would pay more for that. In a reasoning model, you get your answer faster, or you’re able to reason within a certain amount of time, and you actually pay a premium for that. You know, asking what 50 plus 50 is and going away for an hour versus getting it right away: the immediate answer is more valuable.

The dollars spent on the chips in a data center are actually a pretty small part, if you look at the chip silicon cost, or even just the price they’re paying for the chips versus everything that goes around the chips. Everything around the chips is increasingly a really important part of the value, because inference, and certainly training, given the value of reasoning in these large models, is not a single-GPU-chip business anymore.

It’s about connecting all those chips together with high-speed signaling, and as a result liquid cooling, to fit them all in one small space so they can all talk to each other at those speeds. The more you spread them apart, the slower the signals travel. And so that’s why liquid cooling brings it all together. The complexity and the value that brings is driving up the cost, not because we want to spend that much more money or run that fast.

It’s because the value we bring by bringing that together drives up the revenue side of it. So we will always look at the previous generation. We always look at what the opportunity is and what others are able to achieve on the basket of workloads that we know is valuable now, and what we do our best to predict is going to be valuable in a year or two years’ time. And the good news is NVIDIA is coming up with new GPUs every year now, new architectures every year now, and also optimizing the data center design every year. That makes my job a little easier.

I used to have to predict on a three-year horizon. Now I can think about now and the near future, and if we see another opportunity or get something a little bit wrong, we can just keep fixing it. So in terms of how ASICs or alternatives play, I think it’s going to be basically what niche, what vertical, what workload they want to optimize for, what use case, and what they decide. NVIDIA’s goal is not to run every AI model everywhere.

Certainly, what goes in a Ring doorbell should be whatever the silicon inside a Ring doorbell should be, or a hockey puck on your kitchen counter, or what’s inside your phone, and how that wants to work there. Where we’re going to focus, where I focus, is the AI factory for inference and the training clusters at scale, and increasingly those two things are melding together. And then also providing it as a platform with all my cloud providers, so that all the startups, all the innovators, the next OpenAIs, and every enterprise can get access to the technology and capitalize on the opportunity, the revenue the tokens bring to them, and the token-serving companies can make money on top. So it’s really important to look at the overall value that inference brings in terms of revenue relative to the cost of compute, which is going up in percents, where the benefit in revenue is going up in X factors. We’re seeing that.

And only by providing that kind of percents-to-X-factors do you get a growth trajectory that NVIDIA can hopefully provide and will continue to provide in the future. When we look at our value props, we look at our pricing, we look at our models, we’re always looking at that net: through the chain, is everybody adding value? Is everybody able to capitalize on it and able to continue to scale up and grow? And if you just look at it over time, it’s percents to X factors to big X factors. At GTC, you often see the big X factors in there.

But there’s that whole model that actually gets played out in that world.
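
The factory-level arithmetic Buck keeps returning to, CapEx amortized over roughly five years against token revenue, can be sketched as below. Every number in the example (the $1B build, the token throughput, utilization, and price) is a hypothetical placeholder, not an NVIDIA or customer figure.

```python
# Illustrative sketch of the AI-factory math described above: up-front CapEx
# amortized over several years against token revenue. Every number below is a
# hypothetical placeholder, not an NVIDIA or customer figure.

def factory_economics(capex_usd: float,
                      amortization_years: float,
                      tokens_per_second: float,
                      utilization: float,
                      price_per_m_tokens: float) -> dict:
    seconds_per_year = 365 * 24 * 3600
    tokens_per_year = tokens_per_second * utilization * seconds_per_year
    revenue_per_year = tokens_per_year / 1e6 * price_per_m_tokens
    capex_per_year = capex_usd / amortization_years
    return {
        "annual_token_revenue_usd": revenue_per_year,
        "annual_capex_usd": capex_per_year,
        "revenue_to_capex_ratio": revenue_per_year / capex_per_year,
    }

# Hypothetical factory: $1B amortized over 5 years, 50M tokens/s aggregate,
# 60% utilization, $1 per million tokens. A 3x throughput gain at the same CapEx
# would triple the ratio, which is the argument being made above.
for key, value in factory_economics(1e9, 5, 50e6, 0.6, 1.0).items():
    print(f"{key}: {value:,.2f}")
```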

Vivek Arya, Analyst, BofA Securities: Maybe one or two last things. The new sovereign AI opportunity, how incremental is it? Is it just a lot of Western companies deciding to spend overseas, or is this truly incremental? The original build-out of the internet was pretty concentrated. Now, as we are starting to see all these new AI factories open up, is this truly incremental demand?

Ian Buck, Head of Accelerated Computing, NVIDIA: Yeah, it definitely is. When you go and talk to governments or nations, and actually a lot of the supercomputing people, my other job is HPC, I’ve been doing supercomputing forever, it’s where this whole thing started from, those same people are now in the center of attention in every country, because computing is important for their nations. We just did, I believe it was, a 10,000 Blackwell GPU AI factory in Taiwan.

It’s for Taiwan industry. It’s owned by Taiwan. It’s there to help apply AI to manufacturing, whether it be silicon or automotive or city or civil services, as a resource for the country. We’re seeing the same in Japan, a country that is rich with data, with unique industries, with a unique population and demographics, and a country that’s facing significant change in how to grow. They see AI as a national need, a computing need, in order to apply their data, apply AI, apply computing to their industries. And by the government stepping in, by the nation stepping in, they can actually consolidate that as a national resource versus, you know, waiting for every single company or every single industry to build their own.

And they can pool some of those resources, and they’re a good partner with NVIDIA. We’re seeing the same happening in Germany. It has happened already in the UK. They know how to build these systems, because most of those countries understand why supercomputing is important, and now it’s really been elevated with AI.

So yeah, the HPC and sovereign side of the business has exploded as a result, and they know how to execute. It’s a really exciting opportunity, and every nation sees the opportunity to be a player on the stage and apply it. It starts with keeping their data and their computing local, and also, you know, prioritizing it.

Vivek Arya, Analyst, BofA Securities: How large do you think it can be over time?

Ian Buck, Head of Accelerated Computing, NVIDIA: You know, it’s a good question. Today, we are seeing about a hundred AI factories being built and assembled right now across the world.

Vivek Arya, Analyst, BofA Securities: And an AI factory is how much, like a billion-ish? How much is an AI factory?

Ian Buck, Head of Accelerated Computing, NVIDIA: Stuart and the other teams can talk to it, but we track it as a data center build that has either Blackwell or Hopper and is specifically designed for serving tokens for industry. And that is a number that’s just going to continue to track and grow over time. Actually, next week is GTC Paris and also ISC, the International Supercomputing Conference, two events at the same time. You’ll hear a lot about AI factories and sovereign AI and those activities.

Vivek Arya, Analyst, BofA Securities: You obviously see the European Commission actually announced big projects earlier this year.

Ian Buck, Head of Accelerated Computing, NVIDIA: Europe gets it. They absolutely, you know, gets the fact that they can and has the capability to deploy. US as well. Last week, launched at nurse down over Berkeley across the bay, 9,000 Vera Rubins. Actually, our first supercomputer announcement with our next generation Rubin architecture was announced with the secretary of energy.

And actually Jensen participated in the announcement. That’ll be deployed next year, 9,000 Vera Rubins. The mission at NERSC is open science and also industry, and the supercomputer is actually named the Doudna supercomputer

Vivek Arya, Analyst, BofA Securities: Yeah.

Ian Buck, Head of Accelerated Computing, NVIDIA: Named after Dr. Doudna, who invented, or I guess discovered, CRISPR. And she was there, a wonderful woman, brilliantly intelligent, an example of why computing is important for health care and pharma discovery. One of the purposes of this system is to figure out how to apply both traditional simulation and AI together to advance scientific discovery and the needs of the nation.

Vivek Arya, Analyst, BofA Securities: Got it. And maybe one last question. What do you think will create a constraint on this growth? Is it access to power? Is it that customers may not be able to adopt this kind of annual cadence of products?

Is it just that CapEx demands are going up? Like what do you worry about the most as you look over the horizon?

Ian Buck, Head of Accelerated Computing, NVIDIA: You know, there’s a diversification that’s happening. Of course, the business is expanding. The number of players in the data center world is expanding. Certainly power: how many megawatts do you have, how many gigawatts do you have. We track that very closely with all of our CSP partners, but also now increasingly with all of the NVIDIA cloud partners and GPU data center partners.

You’ve obviously heard of CoreWeave, but there’s Lambda, there’s Nebius; there are many, many players now. And the template of how to secure a data center, secure GPUs for that data center, and align with customers, and on top of that the software and infrastructure necessary to operate and run not just a cloud but a GPU factory, an AI factory, a token factory, is starting to become fine-tuned, executable, and operationalized. There are multiple things coming together to help accelerate the growth. Certainly the hyperscalers are not going to do it all alone, and they’re investing heavily.

You can see how many megawatts and how many data centers. Microsoft just talked about the fact that this year alone they’re deploying more new capacity than all the capacity they had three years ago. So there’s an up-and-to-the-right curve there. And they shared their next generation: hundreds of thousands of Blackwell GPUs under one site that they’re building. They talked about it in their Build keynote; look at Scott Guthrie’s keynote.

It’s great to see them talk about it. But there’s a diversification happening in terms of where everybody can get their compute, certainly as more enterprises need it and more startups need it. They’re going to the public clouds for sure, but they’re also looking at all the regional clouds and what they can do from a data center capacity standpoint. So the growth is being tracked by gigawatts of compute being put online, not just by the CSPs but by the world, by all the players, and by the speed at which the AI deployment software and stacks get standardized, commoditized, understood, and how fast they can deploy.

And that has diversified as a result. You’ll certainly hear about the big ones, obviously, but that is a portion of the business. There’s a very long tail, a sizable part of the business, that is distributed and happening around the world, which is exciting because it’s more people being able to contribute, deliver the compute, and make it available. I think the only other limiter right now is the speed at which people are coming up with new high-value models and bringing them to the enterprise. The enterprise, that’s all the Fortune 500s, and their ability to take an AI model and have it add value to their business, whether it’s straight-up lifting ChatGPT and putting it into a help desk or the top of the search bar, or applying it to ad revenue, to a better-connected feed, to inserting the right ad or the right product placement, and closing and making it profitable for them.

So that is certainly happening, and the limiter there is just how many models, how many different techniques, can be deployed in all those different use cases. It’s also really hard to track; I feel bad for you guys trying to figure that out. But if you look at the activity around AI for the enterprise, that is the demand generation we’re seeing across all the consumption of our GPUs.

Vivek Arya, Analyst, BofA Securities: Got it. I know we are out of time. I did want to ask just one last question. What is NVIDIA’s ability to monetize software? And where are you in that journey?

Ian Buck, Head of Accelerated Computing, NVIDIA: Sure. I’m going to pause on the public statements on software monetization, because I don’t have those off the top of my head and I don’t want to misstate anything. But you can see some of the things we’ve said in the past. NVIDIA is an open company.

So my job is to make sure that the computing platform is available everywhere and to provide that compute, whether it be in the cloud directly, all the way down to CUDA, all the way up to running PyTorch or running a model off Hugging Face. For the enterprises, there are companies that want to work directly with NVIDIA. We have the opportunity to monetize working directly with them on specific models and making those available. It’s not to supplant the community, but to provide direct engagement. And that comes in the form of providing a supported Nemotron model, which is a model that NVIDIA generates and has trained, actually my team, to provide that extra value directly to them.

The other opportunity is in the data center software itself. A lot of our partners are looking for help to provide the infrastructure, and we’ve talked about Lepton before, that software to support the clouds. It’s one thing to stand up a data center full of GPUs. It’s another thing to operate it as a data center and be able to serve and host and schedule and execute. That’s another use case where we can provide that value.

And in general, our software stack, all of our libraries, all of our CUDA-X, and all of the inferencing software like Dynamo and everything else: customers want to be able to engage directly with NVIDIA on it. We also offer that as enterprise support, so they can have a direct relationship with NVIDIA. As our software footprint expands and where they want to engage directly with us, we can directly monetize or provide a service to them, which they want to pay for. They want that engagement. And of course, as that value goes to the broader enterprise, you’ll continue to see that number increase.

Vivek Arya, Analyst, BofA Securities: I can go on for another hour, but we are out of time. Sure. Thank you so much, Ian. Really appreciate your insights. Thanks everyone for joining.

