Update: 2025-11-24 (05:42 AM)
Technical Intelligence Report Date: 2025-11-24
🔲 AMD Hardware & Products
[2025-11-24] AMD Powers Frontier AI Training for Zyphra
Source: AMD Press Releases
Key takeaway relevant to AMD:
> This deployment proves that the AMD Instinct MI300X’s 192GB memory capacity provides a structural advantage for Mixture-of-Experts (MoE) models by eliminating the need for complex expert or tensor sharding, significantly simplifying the developer workflow compared to memory-constrained alternatives.
Summary:
> Zyphra has announced ZAYA1, the first large-scale Mixture-of-Experts (MoE) foundation model trained entirely on an AMD-accelerated platform. The training used AMD Instinct MI300X GPUs, AMD Pensando networking, and the ROCm 6.4 open software stack on IBM Cloud infrastructure, demonstrating the viability of a full AMD hardware and software stack for frontier-class AI development.
Details:
- Hardware Configuration: The training was conducted on a cluster of 128 compute nodes, each equipped with 8 AMD Instinct MI300X GPUs and 8 AMD Pensando Pollara 400 interconnects (1,024 GPUs in total).
- Model Architecture: ZAYA1-Base features 8.3 billion total parameters with only 760 million active parameters per token, following a Mixture-of-Experts (MoE) design.
- Performance Benchmarks: ZAYA1-Base reportedly outperforms Llama-3-8B and OLMoE, and achieves parity with larger or denser models such as Qwen3-4B and Gemma3-12B across reasoning, mathematics, and coding tasks.
- VRAM Utilization: The 192GB HBM capacity of the MI300X allowed Zyphra to avoid costly expert/tensor sharding, simplifying the training stack and improving throughput (a back-of-envelope memory estimate follows this list).
- Software & I/O Optimizations:
- The project was built on ROCm 6.4, AMD’s open software platform.
- Zyphra reported 10x faster model-save times using AMD-optimized distributed I/O, which is critical for checkpointing during large-scale training runs (a generic sharded-save sketch follows this list).
- Networking: AMD Pensando Pollara 400 interconnects provided the high-performance, low-latency fabric required by MoE architectures, which are sensitive to “all-to-all” communication overhead (see the all-to-all sketch after this list).
- Strategic Implication: This marks a shift in the market where “frontier” models are no longer the exclusive domain of NVIDIA hardware, validating AMD’s co-design approach (Silicon + Networking + ROCm).
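To make the VRAM point concrete, the following back-of-envelope estimate checks whether an 8.3B-parameter model plus full mixed-precision Adam training state fits on a single 192GB MI300X. The per-parameter byte counts and the assumption of full per-GPU replication are illustrative assumptions, not figures from the press release.

```python
# Back-of-envelope check: does an 8.3B-parameter model with full mixed-precision
# Adam state fit in one MI300X (192 GB HBM) without expert/tensor sharding?
# Byte counts below are common assumptions, not figures reported by Zyphra or AMD.

TOTAL_PARAMS = 8.3e9      # ZAYA1-Base total parameters (from the report)
HBM_PER_GPU_GB = 192      # MI300X HBM capacity (from the report)

BYTES_PER_PARAM = (
    2     # bf16 weights
    + 2   # bf16 gradients
    + 4   # fp32 master weights
    + 8   # fp32 Adam moments (m and v)
)

model_state_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
headroom_gb = HBM_PER_GPU_GB - model_state_gb

print(f"Model + optimizer state: ~{model_state_gb:.0f} GB")            # ~133 GB
print(f"Headroom for activations and buffers: ~{headroom_gb:.0f} GB")  # ~59 GB
```

Under these assumptions the full training state (~133 GB) fits with roughly 59 GB of headroom, which is consistent with the claim that no expert or tensor sharding was needed; on 80GB-class GPUs the same state would have to be partitioned across devices.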
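The “all-to-all” sensitivity mentioned in the networking item refers to the token exchange that every MoE layer performs: each rank sends tokens routed to remote experts and receives the tokens destined for its local experts. The sketch below illustrates that exchange generically with torch.distributed.all_to_all_single; it is not Zyphra's implementation, and the tensor shapes, process-group setup, and omission of the routing/gating step are simplifying assumptions.

```python
# Generic MoE token-dispatch sketch using PyTorch collectives (not Zyphra's code).
# Run under torchrun with a backend that supports all-to-all (e.g. NCCL/RCCL).
import torch
import torch.distributed as dist

def moe_dispatch(tokens_by_dest: torch.Tensor) -> torch.Tensor:
    """tokens_by_dest: [world_size, capacity, hidden], tokens already grouped
    by destination rank (the routing/gating step is omitted for brevity)."""
    received = torch.empty_like(tokens_by_dest)
    # One all-to-all per MoE layer per step (plus another for the return path);
    # with many fine-grained experts this exchange dominates network traffic,
    # hence the sensitivity to fabric latency and bandwidth noted above.
    dist.all_to_all_single(received, tokens_by_dest)
    return received  # [world_size, capacity, hidden], now grouped by source rank
```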
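Finally, the 10x checkpoint-save improvement is attributed to AMD-optimized distributed I/O, but the press release does not describe the mechanism. The usual pattern behind fast distributed saves is that every rank writes its own shard in parallel instead of funneling the full state through rank 0; the sketch below illustrates that generic pattern, and the file layout and function name are assumptions rather than the actual I/O path used by Zyphra.

```python
# Generic per-rank sharded checkpointing sketch (not the AMD-optimized I/O path).
import os
import torch
import torch.distributed as dist

def save_sharded(model: torch.nn.Module, optimizer: torch.optim.Optimizer,
                 step: int, ckpt_dir: str = "checkpoints") -> None:
    rank = dist.get_rank()
    os.makedirs(ckpt_dir, exist_ok=True)
    shard = {
        "step": step,
        "model": model.state_dict(),          # this rank's copy/partition of the weights
        "optimizer": optimizer.state_dict(),  # this rank's optimizer partition
    }
    # Every rank writes concurrently, so save time scales with the largest
    # shard rather than with the total checkpoint size.
    torch.save(shard, os.path.join(ckpt_dir, f"step{step:08d}_rank{rank:05d}.pt"))
    dist.barrier()  # make sure the checkpoint is complete before training resumes
```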