
AMD Instinct MI355X Achieves MLPerf Inference v6.0 Gains with Over 1 Million Tokens per Second and Supports Scalable ROCm Software

April 15, 2026
AMD has announced its MLPerf Inference v6.0 benchmark results, positioning the Instinct MI355X GPU as a highly scalable inference platform capable of supporting single-node, multinode, and heterogeneous deployments. Beyond incremental performance gains, the submission introduces new workloads, demonstrates cluster-scale throughput exceeding 1 million tokens per second, and validates consistent performance reproducibility across an expanding partner ecosystem.

CDNA 4 Architecture Targets High-Capacity Inference


The Instinct MI355X is built on AMD’s CDNA 4 architecture, leveraging a TSMC dual-process chiplet design: compute dies (XCDs) use a 3nm node, while I/O dies utilize 6nm FinFET technology. The multi-chiplet package integrates 185 billion transistors and supports FP4 and FP6 data formats—critical for efficient large-model inference. Each GPU is equipped with up to 288GB of HBM3E memory (delivering 8 TB/s of memory bandwidth), enabling support for models up to 520 billion parameters on a single device. AMD emphasizes that this combination of compute density and memory capacity eliminates the need for excessive model partitioning, a key advantage for large-scale inference workloads.
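The single-device claim follows from simple arithmetic: at FP4, weights occupy roughly half a byte per parameter. A quick sanity check, assuming 0.5 bytes per parameter for FP4 storage (an assumption about the format, not a figure from the article) and ignoring KV-cache and activation memory:

```python
# Back-of-the-envelope check of the "520B parameters on one GPU" claim.
# Assumption: FP4 weights take ~0.5 bytes per parameter (4 bits), ignoring
# KV cache and activation memory.

def weight_footprint_gb(params_billion: float, bytes_per_param: float = 0.5) -> float:
    """Approximate weight memory in GB for a given parameter count."""
    return params_billion * bytes_per_param  # 1e9 params * bytes, over 1e9 bytes/GB

HBM_CAPACITY_GB = 288  # MI355X HBM3E capacity quoted in the article

print(weight_footprint_gb(520))                    # 260.0 GB of FP4 weights
print(weight_footprint_gb(520) < HBM_CAPACITY_GB)  # True: fits on one device
```

At FP16 (2 bytes per parameter) the same model would need roughly 1,040 GB, which is why lower-precision formats are what make single-device deployment plausible.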

Available in UBB8 configurations, the platform offers both air-cooled and direct liquid-cooled options, aligning with diverse data center deployment requirements. Notably, the liquid-cooled MI355X runs at a 1,400 W Total Board Power (TBP), delivering higher performance than its air-cooled counterpart, the MI350X.

Multinode Throughput Surpasses 1 Million Tokens per Second


A standout achievement from the MLPerf v6.0 round is AMD’s cluster-scale throughput exceeding 1 million tokens per second. Using Instinct MI355X GPUs, AMD hit this milestone with Llama 2 70B in both Server and Offline scenarios, as well as with GPT-OSS-120B in Offline mode.


[Figure: AMD MLPerf 1 million tokens per second]

These results reflect a growing industry shift toward evaluating inference performance at the cluster level, rather than per individual accelerator. Aggregate throughput and time-to-serve have become primary metrics for determining production readiness in large-scale AI deployments.

AMD also demonstrated exceptional scaling efficiency. For Llama 2 70B, an 11-node, 87-GPU configuration achieved over 1 million tokens per second across Offline, Server, and Interactive scenarios, with scale-out efficiency ranging from 93% to 98%. For GPT-OSS-120B, a 12-node, 94-GPU cluster delivered similar throughput with over 90% scaling efficiency—proving performance translates effectively as deployments expand beyond a single system.
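Scale-out efficiency here means measured cluster throughput as a fraction of ideal linear scaling from a single node. A minimal sketch of the calculation, using hypothetical numbers in the ballpark of the 11-node run (the per-node figure is illustrative, not AMD's):

```python
def scale_out_efficiency(cluster_tps: float, single_node_tps: float, nodes: int) -> float:
    """Measured cluster throughput as a fraction of ideal linear scaling."""
    return cluster_tps / (single_node_tps * nodes)

# Hypothetical values loosely modeled on the 11-node Llama 2 70B run.
eff = scale_out_efficiency(cluster_tps=1_000_000, single_node_tps=95_000, nodes=11)
print(f"{eff:.1%}")  # 95.7%
```

Values above ~90% indicate the interconnect and scheduling overheads are consuming little of the added hardware's capacity.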

Generational Gains and Competitive Single-Node Performance


AMD reported significant generational improvements, with the Instinct MI355X delivering 3.1x better performance on Llama 2 70B Server compared to the prior-generation Instinct MI325X, reaching 100,282 tokens per second. This improvement stems from both CDNA 4 architectural enhancements and ROCm software optimizations. Offline scores improved by 4.4x and Server scores by 4.8x compared to prior MLPerf rounds, primarily driven by FP4 quantization—a key feature of the MI355X that unlocks higher throughput for AI workloads.

[Figure: AMD Instinct inference results vs. previous generation]

In single-node comparisons against NVIDIA platforms, the MI355X demonstrated strong competitiveness. On Llama 2 70B, it matched NVIDIA B200 in Offline throughput, achieved near parity in Server performance, and outperformed it in Interactive mode. Against NVIDIA B300, the MI355X delivered 92% of Offline performance, 93% of Server performance, and exceeded it by 4% in Interactive mode. Notably, the MI355X also offers superior cost-efficiency, delivering 40% more tokens per dollar compared to the NVIDIA B200.
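The tokens-per-dollar metric is simply throughput normalized by system cost. With hypothetical prices (neither figure is from the article), a 40% advantage at equal throughput looks like this:

```python
def tokens_per_dollar(throughput_tps: float, system_price_usd: float) -> float:
    """Throughput normalized by acquisition cost."""
    return throughput_tps / system_price_usd

# Hypothetical: equal throughput, competing system priced 40% higher.
mi355x = tokens_per_dollar(100_000, 250_000)
b200 = tokens_per_dollar(100_000, 350_000)
print(round(mi355x / b200, 2))  # 1.4
```

The same 1.4x ratio can of course also arise from higher throughput at equal price, or any combination of the two.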

First-Time Model Enablement Expands Coverage


MLPerf Inference v6.0 introduced several new workloads, and AMD used this round to showcase rapid model enablement. GPT-OSS-120B, a mixture-of-experts model, made its MLPerf debut with the MI355X, achieving competitive results against NVIDIA systems in both Offline and Server scenarios.

AMD also submitted results for Wan-2.2 text-to-video generation, marking its entry into multimodal and generative video inference. While the official submission focused on Single Stream latency, the results were on par with existing platforms. Post-submission tuning further improved performance, highlighting room for optimization as the software stack matures.

These additions underscore AMD’s commitment to expanding beyond traditional LLM benchmarks to support emerging AI workloads across diverse use cases.

ROCm Software Enables Scaling and Heterogeneous Inference


AMD credits much of the MI355X’s performance and scalability to its ROCm software stack. Key enhancements include optimized FP4 execution, improved GPU-to-GPU communication for distributed inference, and support for dynamic workload distribution across heterogeneous environments—critical for mixed-GPU deployments.

[Figure: AMD MLPerf Inference results, Instinct MI355X]

A milestone heterogeneous submission—developed by Dell and MangoBoost—used three AMD Instinct GPU models: MI300X, MI325X, and MI355X. This configuration achieved 141,521 tokens per second on Llama 2 70B Server and 151,843 tokens per second on Llama 2 70B Offline. Notably, the MI355X platform was located in Dell’s U.S. lab, while the MI300X and MI325X systems were in Korea—demonstrating the ability to coordinate distributed systems across geographic locations.
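One way a mixed fleet like this can balance work is to split the request stream in proportion to each GPU generation's measured throughput. The sketch below illustrates that general idea only; it is not MangoBoost's or AMD's actual scheduler, and the per-model throughputs are made up:

```python
# Illustrative only: proportional request splitting across heterogeneous GPUs.
# The per-model throughput numbers below are hypothetical, not measured figures.

def split_requests(total: int, node_tps: dict[str, float]) -> dict[str, int]:
    """Assign requests to nodes in proportion to their throughput."""
    capacity = sum(node_tps.values())
    shares = {name: int(total * tps / capacity) for name, tps in node_tps.items()}
    # Hand any rounding remainder to the fastest node.
    fastest = max(node_tps, key=node_tps.get)
    shares[fastest] += total - sum(shares.values())
    return shares

print(split_requests(1_000, {"MI300X": 30_000, "MI325X": 40_000, "MI355X": 90_000}))
# {'MI300X': 187, 'MI325X': 250, 'MI355X': 563}
```

A production scheduler would also account for latency targets, in-flight batch sizes, and cross-site network delay, which matter in a geographically split deployment like the one described above.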

Ecosystem Growth and Reproducibility


AMD’s partner ecosystem expanded significantly in this MLPerf round, with nine companies submitting results across multiple Instinct GPU generations. Participating vendors include Cisco, Dell, Giga Computing, HPE, MangoBoost, MiTAC, Oracle, Supermicro, and Red Hat—reflecting broad industry adoption of AMD’s inference solutions.

Partner submissions closely aligned with AMD’s internal results, typically within 4% and in some cases within 1%. This consistency confirms that MI355X performance is reproducible across OEM and cloud platforms, reducing deployment risk and boosting confidence in real-world performance outcomes.

Beijing Qianxing Jietong Technology Co., Ltd.
Sandy Yang/Global Strategy Director
WhatsApp / WeChat: +86 13426366826
Email: yangyd@qianxingdata.com
Website: www.qianxingdata.com / www.storagesserver.com
Business Focus:
ICT Product Distribution/System Integration & Services/Infrastructure Solutions
With 20+ years of IT distribution experience, we partner with leading global brands to deliver reliable products and professional services.
“Using Technology to Build an Intelligent World.” Your trusted ICT product service provider.