Oracle at Oracle AI World 2025 Keynote: Strategic Advances in Cloud Technology

Published 15/10/2025, 19:02
© Reuters.

On Wednesday, 15 October 2025, Oracle Corporation (NYSE:ORCL) hosted the Oracle AI World 2025 Keynote, unveiling strategic advancements in Oracle Cloud Infrastructure (OCI). Led by CEO Clay McGuire, the conference highlighted OCI’s commitment to delivering high performance, cost-effectiveness, and security. The event showcased Oracle’s partnerships with TikTok and OpenAI, underscoring its position in the competitive cloud market.

Key Takeaways

  • Oracle introduced Acceleron, a new software and architecture to enhance I/O security and acceleration.
  • The AI Data Platform was launched, integrating AI models with private data for custom applications.
  • Oracle emphasized its multi-cloud strategy, offering flexibility with Multi Cloud Universal Credits.
  • Partnerships with TikTok and OpenAI demonstrate OCI’s capacity to handle large-scale workloads.
  • Oracle’s Dedicated Region 25 offers more functionality in a smaller footprint.

Financial Results

  • Consistent pricing across all regions, with free data transfer within OCI regions.
  • Lowest egress fees, enhanced by optimized Internet transit and backbone costs.
  • Zero egress fees between Oracle Cloud and Microsoft or Google Cloud within a region.
  • Compute offerings boast 7,000 times the configurations of competitors.

Operational Updates

  • Focus on bare metal servers for security and extensibility.
  • High-performance block storage service with improvements in throughput and scalability.
  • All services are available in all regions, supported by performance SLAs.
  • Dedicated regions inside customer datacenters, with OCI functionality extended to other Clouds.

Future Outlook

  • AI Data Platform integrates AI models with private data, offering fine-grained access control.
  • New Gen AI agent platform prebuilt with agents for common tasks.
  • Dedicated Region 25 provides more functionality in a smaller, three-rack setup.
  • OCI is continuously reinvented to enhance performance, efficiency, and security.

Q&A Highlights

  • TikTok’s infrastructure scaling supports over 1 billion users globally.
  • OpenAI relies on Oracle for industrializing compute, maximizing power and silicon efficiency.
  • Oracle’s partnerships emphasize stability and proactive goals for seamless operations.

Oracle’s strategic moves at the Oracle AI World 2025 Keynote underscore its dedication to innovation and customer success in the cloud infrastructure arena. For a deeper dive into the detailed discussions, readers are encouraged to refer to the full transcript below.

Full transcript - Oracle AI World 2025 Keynote:

Operator: Please welcome to the stage, chief executive officer of Oracle, Clay McGuire.

Clay McGuire, Chief Executive Officer, Oracle: Well, thank you all for being here. I don’t know if you can see, this stage is extremely large. It’s not that I’m just very small and walk slowly, it’s actually a large stage. So first I want to thank everybody for being here. I know that you’ve got other things you could do with your day than to come out to this event and to hang out with us. But I also promise that if you haven’t made friends with your neighbors yet, here’s my advice.

I did this yesterday. Make friends with your neighbor and then every ten to fifteen minutes just kind of scoot back and forth between your seats; it keeps the blood flowing. ’Cause if you have to sit here for a good two hours, in case it wasn’t clear, this is a multi-hour presentation, it’ll help you prevent blood clots and nerve pain. So if you need a moment to do that, do it now because this is a very serious presentation.

Okay. So why does OCI exist? I joined Oracle more than eleven years ago and before that I had spent my career working on Cloud infrastructure. And just so people remember, joining Oracle in 2014 to work on building a new Cloud was not an obvious decision. Many people, in a polite way, were asking, Well, do we really need a fourth Cloud provider?

Is this really what you should be spending your time on? And look, I’ve made a lot of good decisions in my life. The people that know me also know I’ve made some wrong decisions, but joining Oracle and building OCI is not one of those wrong decisions. So I spent a lot of time thinking about why OCI needed to exist and with that context of the previous experience, it was clear to me that the industry needed a different Cloud. Now, our mission that we’ve been working on for those past ten years has been consistent and it’s actually quite simple.

Our goal is to be the highest performance, lowest cost, and most secure infrastructure that we can be. Now, sometimes when I talk about this, people think that I’m talking about, hey, we want to be higher performance compared to our competitors or we’re trying to be lower cost than somebody else or we want to be more secure. That’s not what I’m saying. I’m saying something fundamentally different. We want to be the highest performance, lowest cost and most secure infrastructure that we can imagine.

Our goal is not to be better than competitors. Our goal is to be the absolute best that we can be. That’s easy for us to say and it’s very hard to deliver. Now, one more thing to add to that is that it’s not enough to be those things, you also have to be a Cloud that’s available where customers need it. That means more than just the country or the city where the Cloud is located.

It’s also about the configuration it’s delivered in, the specific location, as well as how that Cloud is governed. Still, even with that little addition, this does not appear to be a very complex goal. Along the way in building towards this goal, it’s actually quite easy to get distracted. It’s easy to get distracted by new features. It’s easy to get distracted by new services.

Customers have infinite wants and they are not shy about telling you exactly what you should be doing. They think that you should be building the new things that they want that are going to help solve their problem right now and we love that feedback. But Cloud infrastructure is constructed of many layers, and each one of those layers is utterly dependent on the layers below it. It takes a lot of skill and dedication to build an architecture that’s resilient to the constant change required by Cloud infrastructure. It’s even harder to build something you can improve upon throughout the process.

But that’s what you want. You want something that gets better with time, not worse. Now, great systems are extensible. To be extensible, you have to have an architecture that sees into the future, anticipates unknown improvements across both hardware and software. You need a system that improves across all the layers of the stack, not just at the top.

That’s what we’re focused on every day at OCI. Now, to begin with, I want to review some of the most important choices we’ve made over the past few years and how they directly contribute to these goals. So, I don’t know if any of you have been in the industry long enough but I remember the time before virtualization. We didn’t call things bare metal, we just called them computers or servers and that’s the way pretty much everybody did computing. Then obviously through the advent of virtualization and then the advent of the Cloud, suddenly VMs became the standard and so having an actual whole server really became exotic.

When we designed OCI, we made a very conscious choice to focus on bare metal servers upfront. Why did we do that? Well, one major reason is security. By implementing bare metal first, it meant that us as a Cloud provider actually have no software that runs on your machine. So when people provision a bare metal server, they have complete control of that.

I can’t see what’s going on in your memory, I can’t see what’s going on in your CPU. We also did it because of extensibility. What I mean by that is that our virtual machine service, right? When you provision a VM on OCI, it’s actually built on top of our bare metal server. I don’t just mean the hardware, I mean the actual service.

The reason that’s important is because it actually enables other people to go out and then build extensible platforms on top of us. It’s also about hardware flexibility. We knew, right, when we started ten years ago that there were going to be a lot of different types of hardware that we needed to plug in to our infrastructure. We knew we wanted different servers, we knew we had different storage appliances, we knew we needed to do things like Exadata. This has proven extremely valuable to us as we look at how do we optimize for different hardware accelerators especially in this AI era.

Along the way, doing bare metal forced us to invent new security. We invested a lot of energy into our hardware root of trust. We had to invent off box network and storage virtualization. The thing to remember here is that bare metal is, was and always will be a first class citizen in OCI, specifically because it fulfills all of our goals around performance, efficiency and security. Now, Oracle database as you can imagine at Oracle is a very important piece of technology and through a combination of things like Rack and Exadata, it was very clear to us that we would need to support RDMA networks early on.

We took on that challenge and we made a secure way to actually dynamically hard partition RDMA networks. That means you get the full performance benefits but all of the security enhancements you expect from a fully virtualized Cloud. That same design then enabled our HPC and our GPU networking environments. We also focus on the highest performance block storage service that we can create. We designed this service for the most extreme workloads and we’re constantly adding more throughput, more IOPS, and more scalable volumes all the time.

Now, one of the big decisions we made early on was to make sure that all of our services are available in all of our regions. Now, that may seem obvious as a thing to do, but it turns out it’s actually not how many other people do it. What they have is a giant Swiss-cheese chart where you have to look up a decoder ring, and there are multiple dropdowns: you pick your region and then you look to see what services are available and what hardware types are available. We found that to be far too complex. Instead, we have a simple solution.

Everything is available everywhere. That makes it easy for customers and it actually makes it easy for us because we’re also a customer of our Cloud. So anything less than that, we find just creates far too much complexity. Now, performance is also something that was important to us but we also wanted to commit to our customers. So we created performance SLAs that apply across all of our regions and all of our different region types.

This gives customers the comfort that they can always rely on the best performance from Oracle. Now, when pricing is complex, customers no longer understand it. Suddenly, there are entire job functions created to try to understand the complex pricing. They invent new tools to be able to analyze and understand their bills. We found it’s actually much simpler if we just have a single consistent price across all of our regions.

Accessing your data should not be expensive. We made a decision early on that within an OCI region, across all of our datacenters, it would be free to transfer data around. We also put a huge amount of energy into optimizing our Internet transit costs and our backbone costs, which resulted in us having the lowest egress fees by a factor of 10. Then we went and worked with our partners like Microsoft and Google such that within a cloud region, our multi-cloud interconnect enables zero egress fees between our cloud and their cloud. Now, it’s not enough to focus on the aggregate availability of your Cloud.

We found it was really important to focus on the availability and performance of each individual VM. To do this, we focused on optimizing that availability through a combination of KSplice, which supports zero-downtime kernel upgrades, as well as implementing live migration, which enables us to do hardware maintenance without customer reboot intervention. We chose to make our infrastructure building blocks flexible, not brittle. What that means, say for compute, is that you get exactly the cores and the memory that you want and you’re only paying for what you use. Our load balancer is infinitely flexible and scalable.

You don’t have to pick between 15 different block storage volume types. There’s a single volume type and you can dynamically change the performance in real time. So how much more flexible is our compute offering? Compared to our competitors, we have 7,000 times the configurations. Now, you might think that actually adds complexity but it doesn’t because it turns out that in those different configurations, each core costs exactly the same amount and each gigabyte of RAM costs the same amount.

So understanding what you’re going to pay for is simple. Write down what you need and you just pay for that. We designed our network, our hardware and our services to scale down as well as to scale up. We designed our operations to scale out. We needed to deliver and operate an immense number of regions.
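The flexible-shape pricing described above amounts to a one-line formula. A minimal sketch in Python; the per-unit rates are invented for illustration and are not Oracle’s actual prices:

```python
# Toy illustration of shape-independent linear pricing.
# Both rates below are assumptions for this sketch, not real prices.
OCPU_RATE = 0.025      # assumed $ per core per hour
MEMORY_RATE = 0.0015   # assumed $ per GB of RAM per hour

def flexible_shape_cost(ocpus: int, memory_gb: int, hours: float) -> float:
    """Any of the thousands of core/memory combinations bills the same
    way: cores * rate + memory * rate. No shape lookup table needed."""
    return hours * (ocpus * OCPU_RATE + memory_gb * MEMORY_RATE)

# Two very different shapes, one formula:
small = flexible_shape_cost(ocpus=2, memory_gb=16, hours=730)
large = flexible_shape_cost(ocpus=64, memory_gb=1024, hours=730)
```

Because the per-unit rates never change with the shape, estimating a bill is just arithmetic on the resources you wrote down.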

The combination of those two things enabled us to provide dedicated regions for individual customers. Each year, we continue to deliver more and more dedicated regions inside customers’ datacenters, bringing the best of OCI where they need it. Our ability to scale down and our ability to build and operate so many regions also opened up new possibilities for us. Suddenly, we could put OCI inside of other Clouds and bring the full functionality of our data platform to other Cloud environments. What that means for customers is that they get the same exact service, the same great hardware, and it’s available in all of the Cloud environments.

I don’t know if any of you have tried to do this, but I personally think it’s actually really hard to start your own Cloud from scratch. Don’t ask how I know. We built OCI as an extensible platform, making it easy to extend with new services. By doing that, we enabled ourselves to create Alloy, which combines OCI, Fusion applications, and other custom IP, and that combination enables others to become Cloud providers themselves. Now, that was kind of a whirlwind tour of some of the key choices we made along the way, but it’s just a subset of the many important decisions that we’ve made.

The thing I want you to take away from that is that we’re constantly focused on living up to our commitment to be the highest performance, lowest cost, and most secure infrastructure possible. We get closer to that ideal every single day. Let’s take a moment to take a look at a customer that’s taking advantage of a bunch of this technology.

Operator: With over a billion users worldwide, TikTok has transformed how people discover, create, and connect. What began as a platform for short videos has become a cultural force, shaping music, fashion, education, and more, turning everyday creators into global stars. Powering that experience is ByteDance’s world class infrastructure built to empower creativity at scale. Through its partnership with Oracle, TikTok has been able to deliver a seamless experience to users around the world. Please welcome to the stage, head of infrastructure engineering of ByteDance, Fengfei Chen.

Fengfei Chen, Head of Infrastructure Engineering, ByteDance (TikTok): Thank you.

Clay McGuire, Chief Executive Officer, Oracle: Fengfei, it’s been a journey, sir. Thank you for coming.

Fengfei Chen, Head of Infrastructure Engineering, ByteDance (TikTok): Thanks for having me.

Clay McGuire, Chief Executive Officer, Oracle: I picked up my iPad, which makes sure I hit all my critically important talking points. It’s really hard to memorize things. That’s not a skill set I have. I don’t have many skills, but memorizing stuff is not one of them. Fengfei is much better at remembering than me. You agree?

Fengfei Chen, Head of Infrastructure Engineering, ByteDance (TikTok): Yeah.

Clay McGuire, Chief Executive Officer, Oracle: Okay. Good. Okay. We’ve been working together for, you know, more than five years. Since the beginning of this partnership, we both know that every one of these videos is going to run on some of the most sophisticated infrastructure in the world.

What I can tell you I didn’t anticipate and I don’t think you fully anticipated was how fast we had to scale this infrastructure up, both in a single location as well as globally, and all of the unique engineering challenges that came with that. For us, it’s not just about providing the Cloud infrastructure, but we really had to come together and engineer this as a team to make sure we have the performance, the reliability, and the scale. So let’s start there. Can you tell us what is at the heart of TikTok?

Fengfei Chen, Head of Infrastructure Engineering, ByteDance (TikTok): Sure. Again, thanks for having me, and it’s very exciting to be here after working on this for five years. TikTok’s mission has always been to inspire creativity and bring joy. That means we want to be the canvas for people to create, the window to discover, and the bridge to connect, and really do that on a global scale. When we say global scale, today on the platform we have well over 1 billion users globally.

Within the US alone, we’re talking about over 170 million users who generate approximately 20 million videos every day. We’re also supporting roughly 7.5 million small businesses on the platform. So, as the infrastructure guy, this really translates to a ton of infrastructure demand. We’re really talking about millions of servers, zettabyte-scale storage, and hundreds and hundreds of terabits per second of network capacity. And today, even the smallest deployment that we put together requires tens of thousands of servers.

So, that really sets a high bar for an infrastructure provider like Oracle and

Clay McGuire, Chief Executive Officer, Oracle: We’re learning that Fengfei’s memorization skills are about as good as mine.

Fengfei Chen, Head of Infrastructure Engineering, ByteDance (TikTok): Let’s see.

Clay McGuire, Chief Executive Officer, Oracle: Does that help Fengfei? Yeah. So, agree. We are really awesome.

Fengfei Chen, Head of Infrastructure Engineering, ByteDance (TikTok): You are awesome.

Fengfei Chen, Head of Infrastructure Engineering, ByteDance (TikTok): The example we have with Oracle here is that we integrate with OCI deeply at the network layer. Because of that, we need hundreds of terabits per second of interconnection traffic, which translates to thousands of FastConnect circuits. I believe that’s one of the reasons Oracle was the first to release 100 Gbps FastConnect and then 400 Gbps. So together we’re really pushing the boundary of infrastructure evolution, and we need to continue to do that because the business grows. Since 2021, when we first deployed, we have seen a 60% increase in our monthly active users.

If you add capacity and new features to the platform, our infrastructure needs to scale even beyond that. So that’s really at the core of the problem we need to solve.

Clay McGuire, Chief Executive Officer, Oracle: Well, I could not agree more. I feel like we talk about these numbers these days. I remember when a terabit was a lot, and I remember when a petabit didn’t exist. Now suddenly we throw them around like, Oh, it’s just a few hundred petabits of network traffic. It’s fine. To do this, we really had to invest a significant amount of effort to design this network fabric together.

I remember us on multiple iterations of the JFab in Virginia for example. And there has been a lot of learnings that I know we at Oracle have gained from working hard to maintain and operate consistently at this scale. So what I would say here is that I wanna thank you and the team. It was not always an easy road. Sometimes there are bumps and it has been amazing to work together to solve those problems.

I’d like to understand a bit more about what’s actually driving the scale and that growth. Right? You talked about 60% growth. Now, I don’t know if many of you realize, but TikTok was pretty big in 2021. So 60% growth in monthly active users in the past four years is huge.

So at some point, there are just not more people on Earth to use the service, which is a separate problem. That’s not my problem yet with Oracle Cloud Infrastructure. We’re not quite reaching saturation of all people on Earth. But one of the things that I remember was huge was when you actually launched TikTok Shop. What are some of the new ways that your users have used the platform that you just didn’t expect early on?

Fengfei Chen, Head of Infrastructure Engineering, ByteDance (TikTok): Yeah. TikTok Shop is definitely a great example, and it’s unique because it’s a different type of shopping experience. It’s different from traditional web-based shopping. It’s all about livestreams, and in the past three years we have seen the number of livestreams double on the platform. For shopping events like Black Friday, you can even see the number of shoppers double in a single day.

The good news is some of these shopping events are predictable, so the way we handle the capacity is to basically plan them ahead of time. The way we work with Oracle here is we plan in terms of demand and supply, just like you would plan your supply chain, and we do that in lockstep. The challenge comes when plans sometimes have to change, very often on short notice. Thanks to the flexibility the OCI team has provided us, we can accommodate those changes. And at the end of the day, we’re making sure we have enough capacity at exactly the time we need it.

Other types of events may not be so predictable. The way we handle those is we build a smart load balancing system that takes into consideration all sorts of information we can gather, including the telemetry we get from OCI, and we even sometimes tap into something deeper, like datacenter information, temperature, and power caps. So we can really respond to load spikes fast and precisely while maintaining efficiency and stability.
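A load balancer of the kind Fengfei describes can be sketched as a scoring function over per-site telemetry. The signal names and weights below are assumptions for illustration, not TikTok’s actual system:

```python
from dataclasses import dataclass

@dataclass
class SiteTelemetry:
    # Illustrative signals; a real system gathers many more.
    latency_ms: float   # network latency to the site
    cpu_util: float     # current utilization, 0.0 - 1.0
    power_capped: bool  # datacenter power capping in effect

def site_score(t: SiteTelemetry) -> float:
    """Lower is better: inflate latency by current load, and heavily
    penalize sites operating under a power cap."""
    score = t.latency_ms * (1.0 + t.cpu_util)
    if t.power_capped:
        score *= 10.0
    return score

def pick_site(sites: dict) -> str:
    """Route the next request to the best-scoring site."""
    return min(sites, key=lambda name: site_score(sites[name]))

best = pick_site({
    "site-a": SiteTelemetry(latency_ms=12.0, cpu_util=0.3, power_capped=False),
    "site-b": SiteTelemetry(latency_ms=9.0, cpu_util=0.9, power_capped=True),
})
# site-b is closer, but it is power-capped and heavily loaded, so site-a wins.
```

The point of folding in deep signals like power caps is exactly what the transcript describes: the nominally fastest site is not always the safest place to send the next spike.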

Clay McGuire, Chief Executive Officer, Oracle: Yeah. Completely agree, right? And I think obviously in the retail space Black Friday is critical. One of the things I’m very proud of is that I think our teams have an amazing Black Friday readiness program that we implemented. We do that every year.

If you didn’t know, it’s coming up again. So it’s really important I think that we do that and we make sure your customers are happy again this year.

Fengfei Chen, Head of Infrastructure Engineering, ByteDance (TikTok): Look forward to that.

Clay McGuire, Chief Executive Officer, Oracle: Yes. When you manage such a large scale infrastructure and you grow at such a fast pace, what do you have to do to maintain that user experience? Because look, if Fengfei runs all of this infrastructure both a Cloud as well as a whole bunch of other infrastructure, But you’re kind of in that middle ground where we’re here and then you’re there and then you’ve got all your customers yelling at you. What do you have to do to make things work?

Fengfei Chen, Head of Infrastructure Engineering, ByteDance (TikTok): That’s a great question. At TikTok, we’re really obsessed with user experience. As a matter of fact, we test almost all of our engineering work against user experience metrics. The metrics could be whether the user liked the video, whether they finished watching the video, or whether they just simply swiped through the videos. And we do that even for the infrastructure layer.

So, this is definitely beyond the typical network latency or response time type of metrics. Basically, we want to make sure that whatever infrastructure solution we put together, users will stay engaged with our platform. Once you establish those user experience metrics, you really come to consider stability the most important factor contributing to user experience. And you definitely want to avoid large-scale incidents and outages at all costs. But to me, very often the smaller things matter as well.

It could be a minor code bug, or someone pulling the wrong cable in the datacenter; that happens, and it could cause a cascading failure in your infrastructure. For that, you cannot simply rely on the SLAs, because those are the minimum requirement. We need something more proactive and more day-to-day. So the way we work with Oracle is we actually established a set of joint stability goals. Those are top-down goals, which means they are sponsored and supported by senior leadership on both sides.

So, when the two teams conduct daily work, they have stability as the top priority in their minds. When they develop new features or roll out changes, they have stability as a priority. On top of that, we share full transparency at the infrastructure layer. We write our operational procedures together. So really, the two teams work seamlessly together as one team.

I have to say there’s no secret here what we have done together. There’s a lot of hard work together and you have to do that consistently every single day.

Clay McGuire, Chief Executive Officer, Oracle: I completely agree. Look, the the reality is is that when we are having problems, which we hope are rare and they are, the key is to have both of our teams working together. Right? And I’ve seen it many times. There’ll be something happening whether it’s a plan for the future right?

A game day, kind of like what we do at Black Friday, or an actual operational incident, and we bring everyone together. We’re looking at the same data, we all have access to the same whiteboards. We have definitely learned a lot, and what I wanted to say to you, and I think to everybody at TikTok, is that I can tell you from personal experience, OCI and Oracle would not be where we are today without all of the opportunity and the learnings that we’ve had from serving you and your customers. So, thank you very, very much.

Fengfei Chen, Head of Infrastructure Engineering, ByteDance (TikTok): We really appreciate the partnership as well. Really appreciate the comment as well. TikTok grows, Oracle has definitely played a very vital role in that growth. And, I do want to take the opportunity to thank teams from both sides for their hard work which made everything possible. And, I also want to emphasize that the work we do really matters because TikTok is not only a just a fun app, it’s a platform that create opportunities for millions of people and businesses and we’re really trying to make the world a better place.

So, for that, we share our view in our economic and social impact reports. I strongly encourage the audience to take a look.

Clay McGuire, Chief Executive Officer, Oracle: Look Fengfei, it has been amazing to work with you over the past few years. I’m very excited for what comes next and thank you for coming out here and sharing your story with everybody.

Fengfei Chen, Head of Infrastructure Engineering, ByteDance (TikTok): Thank you. Likewise. Thank you.

Clay McGuire, Chief Executive Officer, Oracle: So, we’ve covered a lot of the investments into our foundation and the very real impact that those investments are having on our customers. Something that may not be obvious, but it’s important that you all understand, is that during the same time period we’ve also been performing a major upgrade to the foundation of OCI. It was not enough to design OCI differently. We have to continually refresh our architectural core to take advantage of advancements in hardware, as well as everything that we’ve learned both in software and operationally. Today, after many years of hard work, I’m very pleased to announce Acceleron.

This project is directly focused on our core mission of performance, efficiency, and security. Acceleron is already used today by all of our customers in some fashion, and with these new additions it will significantly improve the infrastructure experience for all of our customers. So what is Acceleron? It’s a combination of our software and architecture for securing and accelerating all of our input/output. It’s a combination of host accelerators, fabric architectures, and fabric accelerators.

Now, we’ve been working on this for a while, so what I want to do now is take a minute to cover some existing investments and today’s new capabilities. First, let’s start with dedicated network fabrics. Now, I don’t know how many of you have tried designing a network with zero networks, but it turns out it doesn’t work so well. Great for security; the security team will love it.

If you just take everything off the network, it’s super secure. But does it improve performance or availability? Also, something to think about is that when we design Cloud networks, they need to be non-blocking and not oversubscribed, to enable flexible placement in a multi-tenant environment, and that’s what we do. So at a minimum, you need one of those networks. You then get the choice: is one network enough?

We made a very conscious decision to move from one network to two, specifically to enable RDMA fabrics for things like Exadata. But we had to design a system that provided complete RDMA performance while also supporting multi-tenancy. That network served Exadata very well and then became extremely important for our HPC business. Now, AI also needs RDMA, but differently. AI workloads need a much bigger cluster size, and they care a lot more about total throughput than about the absolute lowest latency.

So what we’ve done is create a unified architecture that allows you to scale the size of your dedicated networks up and down. They can be either latency- or throughput-optimized, and all of this is configured in a secure, hard-partitioned manner that gives you all of the performance you expect from a dedicated network, but all of the security benefits you expect from a Cloud-virtualized system. The next thing I want to talk about is disintermediation. Now, disintermediation really is the concept of removing something, so first I might need to explain what it is that we’re removing.

For anyone that’s ever configured networks: you have basic network functions and then you have advanced network functions. Hey, I want to just make a connection, talk to somebody, that’s a basic function. But if you want to do things like network address translation or peering two networks together, traditionally that’s done through what are called middle boxes. Middle boxes can be physical or virtual. The good thing is they add this functionality.

The bad thing is that they add latency, they can be performance bottlenecks, they can be difficult to scale, and they can result in overall reduced availability. So the solution is to get rid of the middle boxes, and that concept is called disintermediation. It sounds easy, but it’s actually quite hard to implement. To do this, you need a very flexible software architecture that allows you to seamlessly move network functions from one location to another. We’ve been working on this for years, and it’s already deployed across many of our different network systems.
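The middle-box removal being described can be modeled as shortening the packet path. A toy comparison in Python; every hop latency below is invented purely for illustration:

```python
# Each hop is (name, added latency in microseconds); figures are invented.
def path_latency_us(hops):
    return sum(latency for _name, latency in hops)

# Traditional path: an advanced function (here, NAT) lives in a middle box.
with_middlebox = [
    ("host NIC", 5.0),
    ("ToR switch", 2.0),
    ("NAT middle box", 50.0),  # extra hop, queueing, potential bottleneck
    ("ToR switch", 2.0),
    ("destination NIC", 5.0),
]

# Disintermediated path: the NAT function runs inline where the packet
# already is (e.g. on the sending host's SmartNIC), so the box vanishes.
disintermediated = [
    ("host NIC + inline NAT", 7.0),
    ("ToR switch", 2.0),
    ("destination NIC", 5.0),
]
```

Fewer hops also means fewer components that can fail or saturate, which is where the availability and cost wins in the transcript come from.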

The net result is that you get significantly lower costs, and if you do it right, you can pass those savings on to customers. That’s a big part of the reason why, when I was talking about our pricing earlier, we have significantly better network fees: it’s because we’ve done engineering work like disintermediation. Next, let’s talk about the Converged NIC. This is something new that we’re launching now, and it comes out with our next generation of hardware. But before I can talk about Converged Network Interfaces, let’s talk about the existing architecture we’ve had at OCI since its inception.

We started by having a separate host NIC and a smart NIC and the reason for that is purely based on security. The host NIC is given to the customer, they have complete control over that. The smart NIC is controlled by Oracle and they only talk to each other over a network interface. Okay. Our original architecture optimized for security over ease of use and performance.

The downside of having these two separate NICs is that it can be expensive because you have to pay for two NICs, you take a performance hit because you have to process the packets twice, and you can only expose a network interface. You do not have the ability to do things like expose an NVMe interface. However, we did this because it made bare metal possible. It's actually pretty easy to get rid of the host NIC and have only a single SmartNIC. You just take the host NIC out, put the SmartNIC in, you're done.

To do that though, you have to rely on compute VMs for isolation. You reduce your costs and you get better efficiency and latency, but this reduces your security posture and this is just not an option we were willing to accept. So, what we did is we went out and we designed a new architecture. What this architecture does is on a single Smart NIC, we have a hard partitioning between the customer NIC and the provider NIC. What happens is we have dedicated cores and memory for the customer NIC, we have dedicated cores and memory for the provider NIC and they still only communicate over network packets.

Instead of having two separate cards and an Ethernet cable in between, what you have is two separate sets of dedicated hardware with a shared ring buffer, processed as network packets, in between. We did all of this in collaboration with the AMD networking team and it's been an amazing job so far. I'm very excited for the benefits this provides for us and our customers. So now with a Converged NIC, you get all of the security benefits we talked about with two separate NICs, you also get an NVMe interface for block storage, you get line rate encryption for all of your traffic, and you get seamless patching of even your bare metal host NIC. So suddenly I can do bare metal but still patch your NIC for you, and you can get twice the available throughput for compute because you're not processing the packets twice.
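The shared-ring-buffer idea can be illustrated with a minimal single-producer/single-consumer ring: two hard-partitioned sides exchange fixed-size packet slots through shared memory instead of a physical cable. This is a sketch of the general technique, not the actual SmartNIC implementation:

```python
# Minimal single-producer/single-consumer packet ring. One partition
# (e.g. the customer side) pushes, the other (e.g. the provider side)
# pops; one slot is always kept empty to distinguish full from empty.

class PacketRing:
    def __init__(self, slots):
        self.buf = [None] * slots
        self.head = 0    # next slot the consumer reads
        self.tail = 0    # next slot the producer writes
        self.slots = slots

    def push(self, pkt):
        """Producer side: enqueue a packet, or back-pressure if full."""
        if (self.tail + 1) % self.slots == self.head:
            return False           # ring full
        self.buf[self.tail] = pkt
        self.tail = (self.tail + 1) % self.slots
        return True

    def pop(self):
        """Consumer side: dequeue the oldest packet, or None if empty."""
        if self.head == self.tail:
            return None
        pkt = self.buf[self.head]
        self.head = (self.head + 1) % self.slots
        return pkt

ring = PacketRing(4)
ring.push(b"pkt-0")
ring.push(b"pkt-1")
print(ring.pop())   # b'pkt-0': packets come out in FIFO order
```

In a real NIC the ring lives in shared memory with dedicated cores on each side, so the two partitions still only ever exchange network packets, preserving the isolation of the two-card design.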

Next, I wanna talk about Zero Trust Packet Routing. Traditionally, network architecture and network security are intertwined. Right? If you go look at any complex network, you’ve got a whole bunch of subnets, ACLs and you’ve got routing rules and the reason for those is both for connectivity as well as for enforcing security boundaries. With Zero Trust Packet Routing which we launched last year, it enables you to write security policies in a security policy language for networks.

Your network architecture is just about network architecture. The result of that is that now you can analyze your security policies individually. It’s much safer and much easier to use. So to illustrate the enhancements that we’ve added to Zero Trust packet routing, I’m gonna walk through a simple example. And that example is object storage.

How do we prevent it from being used to exfiltrate data out of a cloud environment? Well first, we enable private access from our database to object storage for things like backups. So this ZPR policy at the bottom, what it does is it enables those database hosts to talk to object storage, but only through this private service access that's been created. Next, we then use an IAM Deny policy to prevent any usage of object storage except through private service access. What that means is that the combination of those two policies prevents any usage of object storage from the Internet.

That means that even if someone were to steal credentials that would in theory give them access to that object storage bucket, they can't get that data out through the Internet because, through the combination of ZPR and IAM Deny policies, it's just not allowed. Alright. I want to talk a bit about multi planar networks, but before that, I probably need to explain what I mean by it. Almost all the networks that we think about today exist inside of a single plane. In fact, it's so common, we don't really talk about it.
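The way the two policies compose can be sketched as a simple evaluation function. The identifiers here ("db-host", "private_service_access") are hypothetical labels for illustration, not OCI policy syntax:

```python
# Sketch of the combined allow/deny evaluation described in the example:
# a ZPR-style allow grants database hosts access only over the private
# service access path, and an IAM-style deny blocks every other path,
# even for a caller holding valid credentials.

def object_storage_allowed(source, path):
    # Allow: database hosts may reach object storage, but only through
    # private service access.
    zpr_allows = (source == "db-host" and path == "private_service_access")
    # Deny: any object storage use outside private service access is
    # rejected, regardless of credentials. Deny always wins.
    iam_denies = (path != "private_service_access")
    return zpr_allows and not iam_denies

print(object_storage_allowed("db-host", "private_service_access"))  # True
print(object_storage_allowed("attacker", "internet"))               # False
```

The key property is that the deny rule is evaluated independently of the allow rule, so stolen credentials alone never open an Internet path.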

Single plane networks expose a couple of single points of failure, typically between the hosts and the next layer in the network, and oftentimes your T0 layer is also a single point of failure. Now, some networks are designed with two planes. Anyone that's used a traditional enterprise network that has a separate Fibre Channel storage network and a front end network has seen this, and even some mission critical workloads have redundant front end networks. The downside of that is it can be expensive and hard to manage.

Now, what you can do is you can take a single plane network and you can split it into multiple planes. You get some benefits. It creates redundancy. The downside is it's hard to use because suddenly your computer goes from having one network interface to, say, eight network interfaces. And on those eight, the maximum size of a single flow is also reduced.

Okay. Now, if you instead expose a single plane to the host, which is what we're doing now with Acceleron, and behind the scenes you implement multiple planes, you actually can get all of the benefits of a multi plane network, such as higher overall availability because now you have redundancy across those layers, lower cost because you can build smaller networks with lower radix switches, and better performance because there are fewer hops in the network, but you don't get the downside of it being hard to use. So hopefully now you understand that Acceleron is a foundation for all of our IO security and acceleration functions. With these new additions, customers will see significantly higher peak performance, they will see lower costs across our infrastructure, and they will have better ease of use and increased functionality while also receiving increased security. We're just getting started.
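The plane trade-off above can be modelled in a few lines: flows are hashed onto planes, so aggregate bandwidth scales with the number of planes while any single flow is capped at one plane's rate. The plane count and per-plane rate are illustrative assumptions, not Acceleron specifications:

```python
# Toy model of a multi plane fabric hidden behind one host interface:
# a deterministic hash pins each flow to one plane (keeping its packets
# in order), so the host sees the aggregate capacity, but a single flow
# can never exceed one plane.
import hashlib

PLANES = 8
PLANE_GBPS = 50   # assumed per-plane capacity, for illustration only

def plane_for_flow(flow_id):
    # Hash the flow identifier so every packet of a flow takes the
    # same plane; different flows spread across all planes.
    digest = hashlib.sha256(flow_id.encode()).digest()
    return digest[0] % PLANES

aggregate = PLANES * PLANE_GBPS   # total capacity visible to the host
single_flow_cap = PLANE_GBPS      # ceiling for any one flow
print(aggregate, single_flow_cap)              # 400 50
print(plane_for_flow("10.0.0.1:443->10.0.0.2:5001"))
```

Because the hashing happens below the host interface, the host keeps one NIC's worth of configuration while the fabric gains the redundancy and cost benefits of multiple smaller planes.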

There are many more exciting Acceleron enhancements coming soon. Now, instead of just talking about the technology, why don’t we take a look at another customer that’s actually using a lot of this technology already?

Operator: We've all been wowed by ChatGPT, but the large language model is only the beginning of what OpenAI hopes to achieve with its cutting edge AI technology. OpenAI's next installment of its name brand model, GPT-5, comes with improved reasoning, memory, and multimodal capabilities. And new APIs will allow more businesses to integrate this powerful technology into their operations, for example, by developing and deploying custom GPT models. OpenAI is also driving innovations in robotics, language translation, and video generation, all with a crucial commitment to uphold human values and ethical standards. Please welcome to the stage, Vice President of Infrastructure and Industrial Compute at OpenAI, Peter.

Peter, Vice President Infrastructure and Industrial Compute, OpenAI: [off-mic remarks while taking the stage]

Clay McGuire, Chief Executive Officer, Oracle: He did well. Peter and I were talking about how long the walk is. It really is quite long. Peter? Good morning.

Thank you for coming. For those of you who don't know, Peter does a lot, but he also is responsible for all of the infrastructure at OpenAI. Alright. We got a few minutes. As AI continues to accelerate, and I think you know this more than anybody else, the industry is facing a real challenge: how do we actually get enough compute capacity?

How are you thinking about that problem?

Peter, Vice President Infrastructure and Industrial Compute, OpenAI: So where I want to start is, I met Clay about a year ago, I think it was last May, and I was freaking out because we were having another launch and our researchers didn't have enough compute. Twitter was alight with how we didn't have enough. And this team comes along, Jay Jackson, Luis is out here. And I found them online, truly through LinkedIn, and said, do you have capacity? And these guys show up with 200 megawatts of capacity, which, watching your presentation, was really interesting, because for those of you who've been in the industry for a while, 200 megawatts is passe now, but five years ago five megawatts would have been a crazy amount to have. And these people showed up with not only capacity, they showed up with an intelligent cluster design, they showed up understanding all of our needs, they showed up understanding our security requirements.

All the things that Clay went into from a technical perspective, they showed up with, and that blew my mind. So I'm gonna talk for a second about, like, what it takes.

Clay McGuire, Chief Executive Officer, Oracle: I think you guys deserve a round

Peter, Vice President Infrastructure and Industrial Compute, OpenAI: of applause. I’m very serious. It has been so inspiring watching Clay and the team build OCI. So I’d love if you all gave them a round of applause for what they built. It’s incredible.

Seriously, you deserve everything. So when I think about kind of where we're at, we are in the stage of what I call industrializing compute. It's not just a build to suit data center in Northern Virginia anymore. It is hitting on every single lever and trying to maximize them at every moment, from power all the way through to silicon. And so that's where, when I think about the work that we've done together, it's the same thing.

You know, NVIDIA has done incredible work, but Clay and the team have come to me also and said, would you like to do anything else? And when AMD and Lisa Su were saying, hey, we really wanna push the next generation of AMD, we're not getting as much traction as we'd like, we went to Oracle and Oracle said, we will do anything to make this work with you, and we had a huge announcement last week about this, we're super excited. We had our Broadcom announcement, all of the new product integration that these folks are going to do to help us continue to hit on every single lever that we need to hit to hit another 10x over, hopefully, the next two years.

Clay McGuire, Chief Executive Officer, Oracle: Okay. Well, just so you know, I did not tell him to say those nice things about me. But that's what I would say even if I had told him to say those nice things about me. Okay. Look, something I think that people want to understand is this, and you know it better than anyone else.

We’ve got these compute constraints, right? You’ve got capacity that can be used for training or optimizing new things, you’ve got capacity that can be used for developing new models, doing more research, and then you’ve got capacity that you need for actually serving your customers. How do you think about the need you have to balance across those and how does it change what we build?

Peter, Vice President Infrastructure and Industrial Compute, OpenAI: Yeah. And even that's been a massive paradigm shift. I joined OpenAI in 2019 and that's when people were just hearing about what we call the scaling hypothesis. This idea that as you use more compute, you get more capable models. And so our strategy was always to hit at the front end of whatever the newest generation of GPUs was, get the biggest cluster that we could find, design the biggest cluster and pre train one of these models, which just means lots of synchronous workloads happening all at once.

And then see if, as we continued on that line, the model got better. And then we launched ChatGPT, and some people have heard this story now, but I still remember sitting in a conference room December of the year that it was launched, and the researchers said, no no no no no, don't worry about it, as I'm freaking out about the amount of capacity that we have. Saying, it's just going to be a low key research preview. We'll pull it down in two weeks.

We just want to show the world this research that we've done. And that was our first big learning in this idea that we have to have a fungible fleet. We have to have the ability to go from being able to run these large pre training models to saying, actually, we're launching Sora now and for the next three weeks we need to be able to flex into that. That wasn't a capability that we had internally, and that's where, I mean, again, all of the things that you were pointing out, from multi planar networks to security, which is very different between these different workloads, we have to be able to build into that. I get asked all the time, how do you think about training versus inference?

I’m telling you all, let’s stop talking about training versus inference. We are in a new regime now where the models are ideally kind of constantly running and are constantly going through sampling and training and getting better all of the time. That matters because for business work use cases and for databases and other things, as we start feeding data back into the models, they’re going to get better and better and better for each of these individual use cases and that’s where the extensibility of the platform becomes really important as well.

Clay McGuire, Chief Executive Officer, Oracle: Okay. So you mentioned it earlier, we've been working together for a little over a year. And it's great to know that you thought you were calling Jay out for good by saying you found him on LinkedIn. I'm like, Jay, it's kinda your job to contact him. We'll talk about that later. So thanks Peter.

But you know, did you expect us to make this much progress together so quickly? Like, how does what we're doing compare to your expectations at the beginning?

Peter, Vice President Infrastructure and Industrial Compute, OpenAI: I mean, that's truly why I said the round of applause. There is nobody like you all out in the market and, I mean, I hope everybody wins. I kind of think of this as an ecosystem view, and what you're helping us do is push kind of some of the more traditional players to really think from a customer centric viewpoint first and about the flexibility that we're going to need around compute. And so when we think about co designing things, we think about the clusters that we're working on together, we think about the trade offs that you have to make to be able to provide us more capacity with some of the multi planar stuff. There's something called Zettascale10.

You all should look it up. There was a great press release about this. But I have been blown away not just by your willingness to co design things and co engineer, but how everybody across the OCI stack understands these things. We joked about Jay, and I won't mention him again, but I mean very seriously, down to people like Jay who understand these problems deeply from a technical perspective, which means that our cycle times drop. We don't have to do a contract for multiple weeks.

We don't have to go back and forth with engineering multiple times. I don't even have to call you to escalate things very often. Like, your team just understands our problems and brings us the capacity that we need, so we can continue scaling. So when Sora comes up or a new product comes up, we're ready to go, versus a multi month contracting process of, oh, this doesn't fit in OCI or whatever it is.

Clay McGuire, Chief Executive Officer, Oracle: Okay. So you've talked a lot about the things that have gone very well. What are some of the hurdles that you feel like, you know, we had to overcome that people should understand?

Peter, Vice President Infrastructure and Industrial Compute, OpenAI: I mean, Abilene came together in eleven months. A data center of that scale, because I've worked on some other ones at similar scale, took four years to plan. And so when we go through that, you deal with questions of, let's say, fungibility, where your finance team might say, and my comms team is gonna kill me for this, but you know, they might say, OpenAI is a startup. How can you possibly lean into them for this amount of investment?

And you know, the point is that the technology that you all have developed and the infrastructure that you've developed is so fungible that we want to use it, so that you can get your finance teams comfortable that somebody else could buy it. We're using the same stuff TikTok US is using. We're using the same stuff that I'm sure Milwaukee Tool is gonna use, and that's pretty incredible in this space.

Clay McGuire, Chief Executive Officer, Oracle: Okay. So you guys are expanding globally very quickly. You know, I won't talk about the numbers, but in terms of just users, in terms of revenue growth, everything, I've never seen a company grow like your company has grown. What is the most difficult thing you're dealing with as you manage that growth while also focusing on security and efficiency?

Peter, Vice President Infrastructure and Industrial Compute, OpenAI: That's the $1,000,000,000,000 question. So maybe I'll give like one example here. There's a press release every week right now about a new Stargate. I think Stargate Argentina was announced last week. We had Stargate UAE, all these other things, and some of those difficulties are in the current policy environment. You know, let's talk bluntly about export controls and other things for chips.

There is always a question of, can we go serve here? Can we run capacity here? What are the requirements that both the government and our own security teams are going to have? And it’s been really interesting working with you all because Oracle has kind of become our one stop shop for any country to go in and say, okay, you understand our internal security standards, you understand the policy requirements, just please go make this work. And we’ve seen this now across many different countries in a very rapidly evolving environment.

It’s a huge win for me that I don’t have to think about that and I can rely on you as we do expand, you know, 10 x year over year at this point in some of those locales.

Clay McGuire, Chief Executive Officer, Oracle: No, it's a great example. So look, Peter, I really appreciate the work that you and your team have done. Working with your team has been incredible. It wasn't just that we moved fast.

I think that our pace and urgency is matched across both of our companies. Thank you for coming out here today and telling everybody your story and I’m excited for what comes next. Thank you very much. Thank you. I promise I’m almost done.

AI is clearly extremely useful, but it is only as good as the data that it has access to. Public data is public and it's exposed through standard self describing interfaces. Private data is actually none of those things. It's not public and it's very rarely self describing, but it's also where the most value is to each of us. So one thing you can do to take advantage of this technology is you can just put all of your data on the Internet, do it in a way that's self describing, wait for the next training run, and the model will have the answers to all the questions you want to ask.

That has some downsides, so let’s assume you don’t want to go with that option. What do you do instead? Well, you need to bring secure controlled access to the leading models next to your data. We enable that by integrating the latest AI models in our Gen AI service. We’re committed to always having the best and newest models available.

Once you have the models, you need a way to bring all of your private data together. While you could do that by copying the data into a single location, that’s not the only way. You can also create a shared index of all of your private data but leave that data in its system of origin. You then need fine grained access control. This is critical because you need some way to control what the user of the AI has access to.

So as an example, it’s great that the models can answer questions about your customers and your financials, but should all of your employees be able to get those answers from the model? Okay. Now, once you can answer queries about your private data, that’s a great start but it’s not enough. You then need to perform actions that influence that data, creating a virtuous cycle. You wanna ask questions, make a plan, take action and repeat.
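The access-control step described above can be sketched as a filter that runs before any document reaches the model's retrieval layer. The roles, labels, and documents here are hypothetical, for illustration only:

```python
# Minimal sketch of fine-grained access control for AI retrieval:
# documents carry labels, roles map to the labels they may see, and
# the retrieval set is filtered per caller before the model answers.

DOCS = [
    {"text": "Q3 revenue forecast", "label": "finance"},
    {"text": "Office seating chart", "label": "general"},
]

ROLE_LABELS = {
    "analyst": {"finance", "general"},
    "employee": {"general"},
}

def retrievable(role):
    # Unknown roles get an empty label set, hence no documents.
    allowed = ROLE_LABELS.get(role, set())
    return [d["text"] for d in DOCS if d["label"] in allowed]

print(retrievable("analyst"))   # both documents
print(retrievable("employee"))  # ['Office seating chart']
```

Filtering at the retrieval layer, rather than in the prompt, means a model can never leak a document the caller was not entitled to see in the first place.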

AI is so much more useful when it can do things for us. Our AI database performs two of those key functions. In addition to being a repository for your data, it also acts as an index for external data stores. It provides a single place to enforce access control. This is only possible because of our tireless work to improve our database.

We unify all of your data types and access patterns into a single system. Only then is it possible to solve complex problems like integrating your private data with AI. Our AI database does this by enabling you to pull all of your data assets together by mounting external catalogs. You can use real application security to put access control at the table, row, column and even the cell level directly in the database. It keeps up to date vector indices of the data that’s inside your Oracle database as well as the data you have stored externally.

It also supports open source external formats like Apache Iceberg for seamless integration into your existing data flows. We've also created a new Gen AI agent platform with the goal of easily integrating tool usage into your AI workflows. That platform comes prebuilt with agents for common tasks like retrieval augmented generation and coding. It's compatible with open source frameworks, allowing you to reuse everything you do here in other locations. It's pre integrated with the Oracle Application Ecosystem, making it easy to build agents to integrate with other Oracle Applications. So we are announcing our AI Data Platform that brings together the best models, the power of our AI database, and our new agent platform.
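The retrieval augmented generation pattern that the agent platform prebuilds can be sketched in a few lines: embed the question, rank private documents against it, and prepend the best matches to the model prompt. The bag-of-words "embedding" below is a stand-in for illustration; a real system would call an embedding model and a vector index:

```python
# Hedged sketch of retrieval augmented generation: rank a small private
# corpus against the question by cosine similarity of word counts, then
# build a prompt from the top match. Documents here are made up.

from collections import Counter
import math

DOCS = [
    "invoices are stored in object storage",
    "the sales pipeline lives in the CRM",
]

def embed(text):
    # Stand-in embedding: word counts. Counter returns 0 for absent words.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, k=1):
    q = embed(question)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

context = retrieve("where are invoices stored?")
prompt = f"Context: {context}\nQuestion: where are invoices stored?"
print(context[0])   # the invoice document ranks first
```

The same loop extends to agents: retrieve, let the model plan, call a tool, and feed the result back in, which is the ask-plan-act cycle described earlier.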

The AI Data Platform solves problems for developers and business users. It makes it easy for developers to write custom applications, but it also makes it easy for a data analyst to perform analytics across all of your data. Now, it’s great that we have this new platform but as we talked about earlier, location matters. This new platform only helps you if it’s available where you need it and all of this is available in our Cloud, in other Clouds, and inside your own datacenter. Customers tell us they love the ability to get our data services in all of their Clouds, but they want it to be easier to make financial commitments with more safety.

We are launching Multi Cloud Universal Credits. They enable a customer to work with Oracle and contract once, knowing they can deploy their database services in any Cloud at the same price with the same functionality. We’re also announcing the general availability of Dedicated Region 25, our latest footprint for Dedicated Region. We’ve been continually shrinking the size of our regions. Our first Dedicated Region was actually more than 50 racks and today you get far more functionality in just three racks.

This makes it easier for customers to take advantage of our Cloud where they need it. OCI is in a constant state of reinvention. We have to always be looking for ways to improve our fundamentals so we can deliver on our continual promise to deliver the best in performance, efficiency, and security. Often, these improvements are subtle and you can’t see them because they’re behind the scenes. But the accumulation of hundreds of thousands of these improvements results in a significantly better Cloud.

You can see our commitment to being better every single day and you can see the value that that has for customers like we heard from Fengfei and from Peter. Tomorrow is another day and you will find OCI continuing to focus on improving our performance, our efficiency, and our security. Thank you very much.

This article was generated with the support of AI and reviewed by an editor. For more information see our T&C.
