AMD Partners with Oracle Cloud to Enhance AI Performance with New Supercluster
SANTA CLARA, Calif., Sept. 26, 2024 — In an exciting development for AI enthusiasts and tech innovators alike, AMD has announced that Oracle Cloud Infrastructure (OCI) has chosen its AMD Instinct MI300X accelerators, running ROCm open software, to power its newest OCI Compute Supercluster instance, named BM.GPU.MI300X.8. The newly launched Supercluster is built for the most demanding AI models and can scale up to 16,384 GPUs in a single cluster. That number might sound astronomical, but it is the kind of scale required for AI workloads involving hundreds of billions of parameters.
The key ingredient is OCI's ultrafast network fabric, a high-bandwidth, low-latency networking layer that keeps thousands of GPUs communicating efficiently so they can work in concert. With this infrastructure, OCI is positioned to handle heavy AI workloads, making it an attractive choice for companies looking to push their AI capabilities beyond traditional limits.
The Push for High-Performance AI Workloads
The AMD Instinct MI300X is specifically crafted to handle high-demand AI tasks, including inference and training of large language models (LLMs). The innovative OCI bare metal instances powered by these MI300X accelerators are already making waves, with organizations like Fireworks AI swiftly adopting the platform.
Andrew Dieckmann, AMD's corporate VP and general manager of Data Center GPU Business, emphasized the growing role of MI300X accelerators in powering critical AI workloads. "Our solutions are gaining traction in AI-intensive markets, promising OCI customers unparalleled performance and efficiency," he noted. The message is clear: these accelerators give businesses room to pursue more ambitious AI applications.
A Diverse Selection of High-Performance Options
Donald Lu, OCI’s senior vice president of software development, also weighed in, highlighting the advantages of the MI300X’s inference capabilities. He stated, “The MI300X accelerators enhance our suite of high-performance bare metal instances, enabling customers to eliminate the inefficiencies often associated with virtualized compute options typically used for AI infrastructure.” This improvement translates into more choices for organizations aiming to accelerate their AI workloads without breaking the bank.
The MI300X has been rigorously tested and validated by OCI, demonstrating strong performance for both AI training and inference. It stands out for its ability to accommodate large batch sizes and to host the largest LLMs within a single node, capabilities that have captured the attention of AI model developers and businesses of all kinds.
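To see why single-node hosting matters, a quick back-of-the-envelope check helps. Each MI300X carries 192 GB of HBM3, so an eight-GPU node offers roughly 1.5 TB of accelerator memory. The sketch below is illustrative only: the 20% overhead allowance for KV cache and activations is an assumption, not an OCI or AMD figure.

```python
# Rough estimate: do an LLM's weights fit in one 8x MI300X node?
# 192 GB HBM3 per GPU is AMD's published MI300X spec; the overhead
# factor is an illustrative assumption for KV cache and activations.

GPU_MEMORY_GB = 192      # HBM3 per MI300X
GPUS_PER_NODE = 8        # as in the BM.GPU.MI300X.8 instance
BYTES_PER_PARAM = 2      # fp16/bf16 weights

def fits_in_one_node(num_params_billions: float, overhead: float = 1.2) -> bool:
    """Check model weights (plus a 20% allowance) against total node HBM."""
    weights_gb = num_params_billions * BYTES_PER_PARAM  # 1B params * 2 B = 2 GB
    needed_gb = weights_gb * overhead
    total_gb = GPU_MEMORY_GB * GPUS_PER_NODE            # 1,536 GB per node
    return needed_gb <= total_gb

print(fits_in_one_node(70))   # 70B params: True
print(fits_in_one_node(405))  # 405B params: True
print(fits_in_one_node(700))  # 700B params: False, would need multiple nodes
```

By this rough measure, even models in the several-hundred-billion-parameter range can keep their fp16 weights resident on a single node, which is exactly the class of workload the article describes.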
Real-World Application: Fireworks AI
Fireworks AI serves as a prime example of leveraging this new technology. The platform specializes in creating and deploying generative AI solutions across various industries, boasting an extensive library of over 100 models. CEO Lin Qiao shared insights on how the robust memory capacity provided by MI300X and ROCm software enhances their services. “With the enormous memory at our disposal, we’re able to scale our offerings as our models evolve,” she explained.
This real-world scenario underscores how companies are effectively harnessing AMD’s cutting-edge technology to deliver top-tier AI solutions, regardless of industry.
Conclusion: The Future Looks Bright
As AMD continues to introduce groundbreaking technology, the partnership with Oracle Cloud Infrastructure heralds a new era for AI workloads. The potential applications are limitless, and businesses are already beginning to reap the benefits.
The AI Buzz Hub team is excited to see where these breakthroughs take us. Want to stay in the loop on all things AI? Subscribe to our newsletter or share this article with your fellow enthusiasts.