🗞️ AI & GPU Industry Weekly Recap: Feb 23 – Mar 1, 2026


🔑 Key Highlights

  • AMD seals a landmark 5-year, 6-gigawatt deal with Meta Platforms, mirroring its earlier OpenAI pact, featuring a custom MI450 GPU accelerator and co-designed “Helios” Open Rack Wide v3 rackscale systems — potentially worth $115B+ in GPU revenue alone
  • MSI RTX 5090 Lightning Z scalping hits absurd levels, with eBay listings reaching nearly $27,000 (over five times the card’s $5,090 MSRP) for the limited 1,300-unit run, underscoring extreme demand for NVIDIA’s flagship Blackwell consumer GPUs
  • AMD launches JAX-AITER, a new open-source bridge bringing its AITER-optimized AI kernels to the JAX framework on ROCm, delivering up to 9.68× speedups over pure-JAX attention implementations on AMD Instinct MI350 GPUs
  • AMD joins ARM’s new “CoreCollective” consortium as a founding member alongside Google, Microsoft, Qualcomm, and Samsung — a notable strategic signal given rumors of AMD’s ARM-powered “Sound Wave” APU
  • AMD’s ROCm ecosystem gets a surge of developer tooling, with new blogs covering PyTorch TunableOp offline tuning, the AMD Resource Manager for Kubernetes-based GPU sharing, and JAX-AITER kernel integration

🤖 AI & Machine Learning

JAX-AITER: AMD Brings Optimized Kernels to JAX

AMD published a detailed technical blog introducing JAX-AITER, an open-source package (github.com/ROCm/jax-aiter) that bridges JAX’s Foreign Function Interface (FFI) to AMD’s AITER (AI Tensor Engine Repository) high-performance kernel library. The integration targets AMD Instinct MI300 and MI350 series GPUs running ROCm.

Key architectural details:

  • Uses JAX FFI + C++ bridge to route JAX device buffers directly to AITER GPU kernels with zero-copy buffer sharing
  • Implements jax.custom_vjp for proper autodiff support in training loops
  • First target: multi-head attention (MHA/FMHA) via flash_attn and flash_attn_varlen APIs
  • GEMM and custom ops currently retain PyTorch as a dependency; roadmap targets fully framework-neutral entry points

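The `jax.custom_vjp` hookup can be sketched in a few lines. In the sketch below, a pure-JAX attention function stands in for the AITER FFI kernel so the example runs anywhere; the function names are illustrative, not jax-aiter’s actual API:

```python
import jax
import jax.numpy as jnp

# Stand-in for the AITER flash-attention FFI call; jax-aiter routes this
# through JAX's FFI to a C++ bridge instead of computing in pure JAX.
def _attention_fwd_kernel(q, k, v):
    scores = jnp.einsum("bhqd,bhkd->bhqk", q, k) / jnp.sqrt(q.shape[-1])
    return jnp.einsum("bhqk,bhkd->bhqd", jax.nn.softmax(scores, axis=-1), v)

@jax.custom_vjp
def flash_attention(q, k, v):
    return _attention_fwd_kernel(q, k, v)

def _fwd(q, k, v):
    out = _attention_fwd_kernel(q, k, v)
    return out, (q, k, v)  # residuals the backward pass needs

def _bwd(residuals, g):
    # In jax-aiter this would invoke AITER's fused backward kernel; here we
    # differentiate the stand-in (re-running the forward) to stay self-contained.
    q, k, v = residuals
    return jax.vjp(_attention_fwd_kernel, q, k, v)[1](g)

flash_attention.defvjp(_fwd, _bwd)
```

Because the custom VJP is registered, `jax.grad` flows through the external kernel transparently, which is what makes the bridge usable in training loops rather than inference only.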
Benchmark highlights on AMD Instinct MI350 (bf16, pure JAX vs. JAX-AITER):

| Config | Pure JAX | JAX-AITER | Speedup |
|---|---|---|---|
| batch=4, seq=4096, heads=32, dim=64 | 8.594 ms | 0.888 ms | 9.68× |
| batch=1, seq=8192, heads=8, dim=64 | 2.230 ms | 0.301 ms | 7.39× |
| batch=2, seq=4096, heads=16, dim=192 | 4.221 ms | 0.742 ms | 5.69× |
| batch=2, seq=2048, heads=16, dim=192 | 1.125 ms | 0.245 ms | 4.59× |

Speedups are most pronounced at longer sequence lengths and higher head counts — exactly the configurations that matter most for large-scale LLM training and inference.

PyTorch TunableOp Offline Tuning on ROCm

AMD’s ROCm team published a comprehensive guide to PyTorch TunableOp offline tuning, available in PyTorch v2.6+. The workflow decouples BLAS kernel selection from model execution:

  1. Collection phase: Record GEMM operations to tunableop_untuned0.csv
  2. Tuning phase: Run torch.cuda.tunable.tune_gemm_in_file() independently
  3. Deployment phase: Load pre-tuned results for accelerated inference
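
The three phases map onto a small set of TunableOp environment variables. As a sketch, the per-phase settings might look like this — `tunableop_env()` is our own illustrative helper, not a PyTorch API, though the `PYTORCH_TUNABLEOP_*` variables themselves are PyTorch’s documented controls:

```python
# Per-phase environment settings for the three-phase TunableOp workflow.
# tunableop_env() is an illustrative helper, not part of PyTorch.
def tunableop_env(phase: str) -> dict:
    base = {"PYTORCH_TUNABLEOP_ENABLED": "1"}
    if phase == "collect":
        # Phase 1: record untuned GEMMs (e.g. to tunableop_untuned0.csv)
        # without spending any time tuning them.
        return {**base,
                "PYTORCH_TUNABLEOP_TUNING": "0",
                "PYTORCH_TUNABLEOP_RECORD_UNTUNED": "1"}
    if phase == "tune":
        # Phase 2: offline, in a separate job, run:
        #   import torch
        #   torch.cuda.tunable.tune_gemm_in_file("tunableop_untuned0.csv")
        return {**base, "PYTORCH_TUNABLEOP_TUNING": "1"}
    if phase == "deploy":
        # Phase 3: load the pre-tuned results file; no tuning at inference time.
        return {**base, "PYTORCH_TUNABLEOP_TUNING": "0"}
    raise ValueError(f"unknown phase: {phase}")
```

The payoff of the decoupling is that the expensive kernel search runs once, off the critical path, while production jobs only ever pay the cost of loading the results file.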

New TunableOp features highlighted:

  • FP8 and TF32 datatype support on MI300 series; MX FP8/FP4 incoming for MI350
  • Rotating buffer simulation (up to 512MB) for cold-cache-accurate benchmarking
  • Real-time result saving (PyTorch 2.10+) to prevent tuning loss on crashes
  • Numerical tolerance checks with absolute and relative thresholds
  • Support for batch GEMM and GEMM with bias tuning

⚡ GPU & Hardware

MSI RTX 5090 Lightning Z: Scalper Frenzy

NVIDIA’s RTX 5090 Founders Edition carries a $2,000 MSRP, but MSI’s limited-edition RTX 5090 Lightning Z (MSRP: $5,090, only 1,300 units produced) has become a scalper magnet:

  • eBay “sold” listings: $6,700–$8,800
  • Active listings: $6,000–$15,000
  • One outlier UK listing: ~$27,000

Performance numbers justify some enthusiasm:

  • ~12% faster than RTX 5090 FE out of the box
  • ~18% faster with manual overclocking (comparable to a theoretical RTX 5090 Ti)
  • Supports a 1,000W Extreme vBIOS (200W over stock OC) and a 2,500W XOC BIOS for competitive overclockers
  • One overclocker cracked a sample via thermal shock at extreme power levels — four samples remain for world record attempts

AMD-Meta MI450: Custom Silicon at Gigawatt Scale

The AMD-Meta deal introduces the first custom MI450 GPU of the MI400 generation (analogous to the MI300A custom part for LLNL). Key details:

  • Tuned specifically for Meta’s inference workloads
  • No additional tapeout required within the MI400 cycle (per AMD CFO Jean Hu)
  • Possible customizations: adjusted HBM stack count/speed, clock frequencies, or chiplet configuration ratios
  • Delivered within “Helios” Open Rack Wide v3 rackscale systems, co-designed with Meta
  • First 1GW delivery targeted for H2 2026

Meta is also adopting AMD’s upcoming “Venice” Zen 6 EPYC 9006 and future “Verrano” Zen 7 EPYC 9007 CPUs for both AI racks and general datacenter workloads (Facebook, Instagram).

Google Cloud N4 Series: Axion vs. EPYC Turin vs. Xeon

Phoronix benchmarked Google Cloud’s N4-series VMs at 16 vCPUs:

  • N4A (Google Axion ARM64): $0.71/hr
  • N4D (AMD EPYC 9B45 “Turin” Zen 5): $0.77/hr
  • N4 (Intel Xeon Platinum 8581C “Emerald Rapids”): $0.82/hr

AMD EPYC Turin offers competitive performance at a slight cost premium over Axion, while Intel’s Emerald Rapids (notably not the newer Granite Rapids) comes in as the most expensive option. Performance-per-dollar analysis favored Axion for many workloads.


🏭 Industry & Market

AMD-Meta Platforms: A $115B+ Strategic Partnership

The headline deal of the week: AMD and Meta Platforms announced a 5-year, 6-gigawatt strategic partnership structured almost identically to AMD’s October 2025 OpenAI deal:

Financial structure:

  • AMD issued Meta a warrant for 160 million shares (same structure as OpenAI deal) — estimated value ~$69B by 2030 if AMD stock reaches $600
  • “Double digit billions per gigawatt” confirmed by CEO Lisa Su
  • At ~$35,000/GPU average and ~550K GPUs/GW: ~$115.5B in GPU revenue over 5 years (~$23B/year average)
  • Full rackscale system cost (~$35B/GW) pushes total higher
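
The headline revenue figure follows directly from the stated assumptions; a quick back-of-envelope check (all inputs are the article’s estimates, not confirmed figures):

```python
# Back-of-envelope check of the AMD-Meta GPU revenue math stated above.
gpus_per_gw = 550_000      # estimated GPUs per gigawatt
price_per_gpu = 35_000     # estimated blended average, USD
gigawatts = 6
years = 5

gpu_revenue = gpus_per_gw * gigawatts * price_per_gpu
print(gpu_revenue / 1e9)          # 115.5  (≈ $115.5B total GPU revenue)
print(gpu_revenue / years / 1e9)  # 23.1   (≈ $23.1B/year average)
```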

Strategic implications:

  • AMD could achieve ~40% revenue share for AI accelerators at Meta (vs. ~50% for NVIDIA, ~10% for Meta’s own MTIA)
  • The initial 1GW tranches from the OpenAI and Meta deals alone put a confirmed 2GW in the pipeline, providing manufacturing planning confidence
  • Subsequent gigawatt tranches to be contracted through 2030 (~1.25GW/year from 2027–2030)
  • Represents AMD’s clearest path to competing structurally with NVIDIA’s Blackwell/Rubin roadmap at hyperscaler scale

Lisa Su’s quote of the week: “We are making a big bet on Meta, and Meta is making a big bet on AMD.”

Context: NVIDIA also signed a deal with Meta last week for “millions of Blackwell and Rubin GPU accelerators,” estimated at $110B–$167B for GB300 NVL72 equivalents — Meta is clearly hedging across both vendors at enormous scale.

AMD Joins ARM’s CoreCollective Consortium

Arm and Linaro launched CoreCollective, a new open-source industry consortium focused on the ARM software ecosystem. AMD joined as a founding member alongside Google, Microsoft, Qualcomm, Samsung, Canonical, Fujitsu, Ampere Computing, Graphcore, CIX, and SUSE. NVIDIA is notably absent.

Focus areas: Android, data centers, confidential computing, edge computing, Linux fundamentals, virtualization.

AMD’s membership is strategically interesting given persistent rumors of an AMD ARM-powered APU codenamed “Sound Wave”, plus existing ARM exposure through the Xilinx acquisition.


🛠️ Developer Ecosystem

AMD Resource Manager: Enterprise GPU Sharing

AMD published a full walkthrough for the AMD Resource Manager, part of the AMD Enterprise AI Suite — a Kubernetes-native platform for centralized AI infrastructure governance on Instinct GPUs.

Key capabilities:

  • Project-based isolation: GPU/CPU/memory quotas per team, with resource borrowing and preemption for priority-based scheduling
  • Unified monitoring: Tracks workloads submitted via kubectl, Kubeflow, Flyte, and other tools
  • GUI + CLI control plane: Dashboard, cluster health, user/secret/storage management
  • AMD Inference Microservices (AIM) integration for LLM serving (e.g., meta-llama-llama-3-1-8b-instruct)

The preemption model allows lower-priority workloads to be suspended when higher-priority projects need their guaranteed quota resources back — critical for multi-team R&D clusters.
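
The borrowing-and-reclaim rule can be illustrated with a toy model — this is our own sketch of the policy described above, not AMD Resource Manager code:

```python
from dataclasses import dataclass

# Toy model of quota borrowing with preemption: a project may borrow idle
# GPUs beyond its guaranteed quota, but only borrowed GPUs are preemptible
# when another project reclaims its guaranteed share.
@dataclass
class Project:
    name: str
    quota: int   # guaranteed GPUs
    in_use: int  # GPUs currently allocated

def gpus_to_preempt(borrower: Project, claimant: Project, free_gpus: int) -> int:
    """How many GPUs the borrower must release to restore the claimant's quota."""
    need = max(0, claimant.quota - claimant.in_use)      # guaranteed GPUs still owed
    unmet = max(0, need - free_gpus)                     # demand the free pool can't cover
    borrowed = max(0, borrower.in_use - borrower.quota)  # only borrowed GPUs can be taken
    return min(unmet, borrowed)
```

For example, a research team running 8 GPUs against a quota of 4 would lose 3 of its borrowed GPUs when a production project (quota 6, using 2) reclaims its share and only 1 GPU is free — which is why the model suits multi-team R&D clusters with bursty demand.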

AMD SEV-SNP BTB Isolation: Confidential Computing Hardening

AMD posted Linux kernel patches enabling SEV-SNP Branch Target Buffer (BTB) isolation for AMD EPYC processors. The feature ensures that guest VMs protected by Secure Encrypted Virtualization-Secure Nested Paging cannot have their BTB state contaminated by host or peer-VM activity. Companion QEMU patches were also submitted. This advances AMD’s confidential computing story for regulated enterprise and government workloads.

ROCm Tooling Summary This Week

| Tool | Update |
|---|---|
| JAX-AITER | New open-source bridge; up to 9.68× attention speedup on MI350 |
| PyTorch TunableOp | Offline tuning guide; FP8/TF32 support; rotating buffer benchmarking |
| AMD Resource Manager | Enterprise GPU sharing with Kubernetes-native preemption |
| SEV-SNP BTB Isolation | Linux kernel patches posted for confidential VM security |


📊 Key Takeaways

AMD had arguably its most consequential week of 2026: the Meta Platforms partnership — a five-year, 6-gigawatt, potentially $115B+ commitment built around custom MI450 silicon and co-designed Helios rackscale systems — represents a structural shift in how hyperscalers are diversifying away from NVIDIA dependency, with AMD now holding confirmed multi-gigawatt commitments from both OpenAI and Meta simultaneously.

On the software front, AMD’s ROCm ecosystem is maturing rapidly: JAX-AITER delivers up to ~10× kernel-level speedups, and TunableOp offline tuning provides an enterprise-grade GEMM optimization workflow — critical infrastructure for developers choosing Instinct GPUs over CUDA-native alternatives.

Meanwhile, the RTX 5090 Lightning Z scalping frenzy and the parallel Meta-NVIDIA Blackwell/Rubin deal underscore a key market dynamic: AI hardware demand remains so acute that both AMD and NVIDIA are simultaneously winning landmark contracts at a scale the industry has never seen.