Technical Intelligence Report Date: 2025-11-24


🔲 AMD Hardware & Products

[2025-11-24] AMD Powers Frontier AI Training for Zyphra

Source: AMD Press Releases

Key takeaway relevant to AMD: This deployment indicates that the AMD Instinct MI300X’s 192GB memory capacity provides a structural advantage for Mixture-of-Experts (MoE) models by eliminating the need for complex expert or tensor sharding, significantly simplifying the developer workflow compared to memory-constrained alternatives.

Summary: Zyphra has announced ZAYA1, the first large-scale Mixture-of-Experts (MoE) foundation model trained entirely on an AMD-accelerated platform. The training utilized AMD Instinct MI300X GPUs, AMD Pensando networking, and the ROCm 6.4 open software stack on IBM Cloud infrastructure, demonstrating the viability of a full AMD stack for frontier-class AI development.

Details:

  • Hardware Configuration: The training was conducted on a cluster of 128 compute nodes, each equipped with 8 AMD Instinct MI300X GPUs and 8 AMD Pensando Pollara 400 network interface cards (NICs).
  • Model Architecture: ZAYA1-Base features 8.3 billion total parameters with only 760 million active parameters per token, following a Mixture-of-Experts (MoE) design (a minimal routing sketch follows this list).
  • Performance Benchmarks: ZAYA1-Base reportedly outperforms Llama-3-8B and OLMoE, and achieves parity with larger or denser models such as Qwen3-4B and Gemma3-12B across reasoning, mathematics, and coding tasks.
  • VRAM Utilization: The 192GB of HBM3 on each MI300X allowed Zyphra to avoid costly expert/tensor sharding, simplifying the training stack and improving end-to-end throughput (a rough memory estimate follows this list).
  • Software & I/O Optimizations:
    • The project utilized the ROCm 6.4 release of AMD’s open software platform.
    • Zyphra reported 10x faster model save times by using AMD-optimized distributed I/O, which is critical for checkpointing during large-scale training runs.
  • Networking: The integration of AMD Pensando Pollara 400 NICs provided the high-performance fabric necessary for the low-latency communication required by MoE architectures, which are sensitive to “all-to-all” communication overhead.
  • Strategic Implication: This marks a shift in the market where “frontier” models are no longer the exclusive domain of NVIDIA hardware, validating AMD’s co-design approach (Silicon + Networking + ROCm).
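
For readers less familiar with the total-vs-active parameter distinction noted under Model Architecture, below is a minimal, illustrative top-k routing sketch. It is not Zyphra’s implementation: the dimensions, the toy expert MLP, and the choice of k=2 are assumptions made purely to show why only a fraction of an MoE layer’s parameters is exercised per token.

```python
import numpy as np

def topk_moe_layer(x, experts, router_w, k=2):
    """Minimal top-k MoE layer: each token is routed to k experts,
    so only a fraction of the layer's parameters is active per token."""
    logits = x @ router_w                       # (tokens, n_experts) routing scores
    topk = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k best experts per token
    # Softmax over the selected experts only (their gate weights sum to 1).
    gates = np.exp(logits[np.arange(len(x))[:, None], topk])
    gates /= gates.sum(axis=-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(len(x)):                     # dispatch token-by-token for clarity
        for j, e in enumerate(topk[t]):
            out[t] += gates[t, j] * np.tanh(x[t] @ experts[e])  # toy expert "MLP"
    return out

# Toy dimensions (illustrative only, not ZAYA1's actual sizes).
d_model, n_experts, tokens = 16, 8, 4
rng = np.random.default_rng(0)
x = rng.normal(size=(tokens, d_model))
experts = rng.normal(size=(n_experts, d_model, d_model))
router_w = rng.normal(size=(d_model, n_experts))

y = topk_moe_layer(x, experts, router_w, k=2)
print("total expert params :", experts.size)          # all experts stored in memory
print("active per token    :", 2 * d_model * d_model)  # only k=2 experts computed
```

This is the mechanism behind the 8.3B-total / 760M-active split: all experts must reside in memory, but each token only runs through a few of them.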
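As a rough check of the VRAM Utilization point, the sketch below estimates whether full (unsharded) training state for an 8.3B-parameter model fits in a single MI300X. Only the 8.3B parameter count and the 192GB HBM figure come from the announcement; the bf16/fp32 precision choices, the Adam-style optimizer, and the omission of activations, master weights, and framework overhead are simplifying assumptions, so treat the result as indicative only.

```python
# Back-of-envelope memory accounting (assumptions: bf16 weights and gradients,
# two fp32 Adam moment tensors, no activations or framework overhead).
total_params = 8.3e9          # ZAYA1-Base total parameters (from the announcement)
GiB = 1024**3

weights_bf16 = total_params * 2 / GiB        # 2 bytes per bf16 weight
grads_bf16   = total_params * 2 / GiB        # 2 bytes per bf16 gradient
adam_fp32    = total_params * (4 + 4) / GiB  # two fp32 moment tensors

total = weights_bf16 + grads_bf16 + adam_fp32
print(f"weights:   {weights_bf16:6.1f} GiB")
print(f"grads:     {grads_bf16:6.1f} GiB")
print(f"optimizer: {adam_fp32:6.1f} GiB")
print(f"total:     {total:6.1f} GiB vs ~178.8 GiB (192 GB) HBM per MI300X")
```

Under these assumptions the full training state lands around 93 GiB, comfortably inside one GPU’s 192GB, which is consistent with the claim that expert/tensor sharding was unnecessary.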