🖥️ AI & GPU Industry Weekly Recap: March 30 – April 5, 2026
🔑 Key Highlights
- AMD’s MI355X dominates MLPerf Inference v6.0, posting record results on Llama 2 70B, gpt-oss-120b, and Wan-2.2-T2V-A14B using WMXFP4 quantization across multi-node clusters up to 94 GPUs, with nine AMD partners also submitting in the “Available” category
- AMD’s ROCDXG goes production-ready, delivering open-source ROCm 7.2.1 compute support under Windows Subsystem for Linux (WSL2) for Radeon RX 9000/7000 series and Ryzen AI APUs — a major step toward native Windows ROCm support
- The NVIDIA App debuts Auto Shader Compilation (beta), automatically recompiling game shaders in the background after driver updates, previewing a broader push toward cloud-distributed Advanced Shader Delivery (ASD)
- Google’s Gemma 4 family lands on NVIDIA hardware, with NVIDIA and Google collaborating to optimize the 2B–31B parameter multimodal models for RTX GPUs, DGX Spark, and Jetson Orin Nano edge devices
- NVIDIA’s Nova open-source GPU driver and AMD’s next-gen AIE4 NPU both received meaningful upstream Linux kernel advances this week, reflecting accelerating open-source hardware ecosystem momentum
🤖 AI & Machine Learning
AMD MLPerf Inference v6.0 Results Published
AMD published full reproduction instructions for its MLPerf Inference v6.0 submission on the AMD Instinct MI355X platform. Key highlights:
- Models benchmarked: Llama 2 70B, gpt-oss-120b (OpenAI’s 120B open model), and Wan-2.2-T2V-A14B (text-to-video)
- Datatype: WMXFP4 (Weight MX FP4) for LLMs, BF16 for video generation
- Cluster sizes: single-node (8× MI355X GPUs) up to 87–94 GPU multi-node configurations
- Llama 2 70B offline throughput: ~103,480 tokens/second on a single 8-GPU MI355X node
- Nine AMD partners submitted independently in the “Available” (commercially purchasable/rentable) category
- ROCm 7.1.0+ required; full Docker-based reproduction pipeline published via rocm/amd-mlperf container images
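As a quick sanity check on the headline number, the single-node figure implies a per-GPU rate of roughly 12,935 tokens/second:

```shell
# Per-GPU throughput implied by AMD's published single-node result:
# ~103,480 offline tokens/second across one 8x MI355X node.
echo $((103480 / 8))   # 12935 tokens/second per GPU
```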
NVIDIA Software Lifts MLPerf Inference Performance
NVIDIA continued its narrative that its platform — encompassing CUDA, NVLink, and Dynamo (its open distributed inference framework) — is what drives benchmark leadership, not GPUs alone. Software-layer optimizations pushed NVIDIA’s MLPerf Inference v6.0 numbers to new highs, reinforcing Jensen Huang’s “more than chips” messaging from GTC 2026.
Google Gemma 4 Optimized for NVIDIA GPUs
Google and NVIDIA co-optimized the Gemma 4 model family (E2B, E4B, 26B, 31B) for local and edge deployment:
- Supports reasoning, coding, agentic tool use, vision/video/audio, and 35+ languages
- Runs locally via Ollama and llama.cpp on RTX PCs and DGX Spark
- E2B/E4B target edge devices including Jetson Orin Nano; 26B/31B target RTX workstations and DGX Spark for agentic developer workflows
- Compatible with OpenClaw local agent framework and Unsloth Studio for fine-tuning
- NVIDIA Tensor Cores + CUDA stack deliver day-one efficiency without model-specific optimization overhead
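For local experimentation, the Ollama path mentioned above can be sketched as follows. This is a hedged sketch, not an official recipe: the gemma4:e4b model tag is an assumption, so check the Ollama model library for the tags actually published.

```shell
# Hypothetical sketch: pulling and prompting a Gemma 4 model locally
# via Ollama on an RTX PC. The "gemma4:e4b" tag is an assumption.
ollama pull gemma4:e4b
ollama run gemma4:e4b "Write a Python function that reverses a string."
```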
AMD AIE4 NPU Linux Patches Posted
AMD submitted initial Linux kernel patches for the next-gen AIE4 NPU via the AMDXDNA accelerator driver:
- Targets PCI device IDs 0x17F2 and 0x1B0B (NPU3)
- Adds SR-IOV (Single Root I/O Virtualization) support — a notable new capability vs. current AIE2
- Covers device initialization and basic mailbox communication
- AMD’s proactive upstream Linux support approach aims to have drivers mainlined before retail hardware ships
⚡ GPU & Hardware
AMD ROCDXG: Production-Ready ROCm Under WSL2
AMD’s ROCDXG (librocdxg) library — enabling ROCm 7.2.1 on Windows 11 WSL2 — reached production status:
- Open-source under MIT license (one binary blob thunk remains)
- Officially supports Radeon RX 9000 and RX 7000 series, plus Ryzen AI 300 “Strix Point” and Ryzen AI Max “Strix Halo” APUs
- Independently versioned from ROCm releases and Windows display drivers — more flexible than the legacy roc4wsl approach
- Pairs with the Adrenalin 26.2.2 Windows 11 driver for full AI and HPC workload support under WSL
- Roadmap targets full native Windows ROCm support
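A quick way to confirm the stack is working from inside a WSL2 Ubuntu shell is to query the ROCm runtime directly. rocminfo and hipconfig are standard ROCm tools; the exact gfx target printed depends on the GPU/APU in the machine.

```shell
# Inside a WSL2 Ubuntu shell with ROCm + ROCDXG installed:
rocminfo | grep -i gfx     # should list the Radeon/Ryzen AI gfx target
hipconfig --version        # reports the installed HIP/ROCm version
```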
AMD Ryzen AI Max “Strix Halo” Shows Major Linux GPU Gains
Phoronix benchmarks of the Framework Desktop (Ryzen AI Max+ 395, 64GB LPDDR5-8000, Radeon 8060S iGPU) demonstrated significant Vulkan and OpenGL performance improvements when upgrading from Ubuntu 25.04 (Linux 6.14, Mesa 25.0) to Ubuntu 26.04 (Linux 7.0, Mesa 26.0):
- RADV Vulkan driver and RadeonSI Gallium3D both showed meaningful generational uplift
- Highlights the compounding benefits of upstream driver/kernel work on AMD integrated graphics
NVIDIA Nova Driver Advances in Linux 7.1
The NVIDIA Nova Core driver — the Rust-written open-source successor to Nouveau — received its Linux 7.1 pull request:
- Expanded NVIDIA Turing GPU support
- Hardened GPU System Processor (GSP) command queue
- Support for large RPCs, refactored Falcon firmware handling, DebugFS GSP-RM log buffer support
- Still not end-user ready, but advancing steadily in upstream Linux
HarfBuzz 14.0 Introduces GPU-Accelerated Text Rendering
The widely-used HarfBuzz text shaping engine released version 14.0 with the new libharfbuzz-gpu library:
- GPU-based text rasterization using the Slug algorithm — decoding/rasterizing directly in the fragment shader
- Shader support: GLSL, WGSL, Metal MSL, HLSL
- New hb-gpu utility and interactive WebGPU/WebGL web demo included
- Impacts GNOME, KDE, Chromium, LibreOffice, Flutter, Godot, and Java rendering pipelines
NVIDIA App Beta: Auto Shader Compilation
The updated NVIDIA App introduced Auto Shader Compilation (ASC) in beta:
- Background recompilation triggered after every GPU driver update
- Configurable cache size (e.g., 100 GB ≈ ~20 modern AAA titles) and system utilization tiers (low/medium/high)
- Works only after initial per-game shader compilation is complete
- Precursor to Advanced Shader Delivery (ASD) — Microsoft’s cloud-distributed precompiled shader framework, already adopted by Intel via “Precompiled Shader Distribution”
🏭 Industry & Market
Q1 2026 Linux Ecosystem Recap
Phoronix’s Q1 2026 retrospective highlighted the quarter’s dominant themes:
- Intel Core Ultra Series 3 “Panther Lake” (Core Ultra X7 358H, Arc B390 / Xe3 graphics on Intel 18A process) was the most-benchmarked new platform, posting strong power-efficiency gains — and up to 95× the performance of Penryn-era laptops
- AMD Ryzen 7 9850X3D ($499) generated strong Linux gaming interest; DDR5-4800 proved sufficient for gaming due to 2nd Gen 3D V-Cache architecture
- NVIDIA GB10 Blackwell (Dell Pro Max GB10) featured prominently in AI inference benchmarks, competing against Ryzen AI Max+ 395 “Strix Halo” in CPU-focused workloads
- AI/LLM code contribution debates — including Linus Torvalds’ commentary on “vibe coding” and his own AudioNoise project built with AI assistance — dominated Linux community discourse
Canonical’s Ubuntu 26.04 ROCm Integration Still Pending
Ubuntu 26.04 LTS (due April 23) is in a race against the clock on AMD ROCm integration:
- Canonical’s promised one-command apt install rocm installation remains undelivered at press time
- Available archive packages still at ROCm 7.1 (vs. upstream ROCm 7.2.1)
- A Canonical engineer (Talha Can Havadar) just received package upload rights — timeline uncertain
- Current recommendation: use upstream AMD ROCm packages directly rather than Ubuntu archive versions
Intel Cache-Aware Scheduling v4 for Xeon and EPYC
Intel posted the fourth revision of Cache Aware Scheduling patches for the Linux kernel:
- Targets modern Intel Xeon (Granite Rapids/Xeon 6) and AMD EPYC Turin processors with complex LLC domain topologies
- v4 adds CPU scanning depth limits under NUMA balancing, improved LLC ID management, and low-load imbalance tuning
- Prior testing showed significant server workload performance gains on both platforms
- Not yet mainlined; community watching for Linux 7.x inclusion
🛠️ Developer Ecosystem
Rust Graphics Driver Momentum Builds for Linux 7.1
The Linux 7.1 DRM Rust pull request landed a broad set of infrastructure improvements:
- Reworked DMA coherent API, GPU buddy allocator abstractions, DRM shared memory GEM helper abstraction
- Applies to NVIDIA Nova Core (Turing support, GSP hardening) and Arm Mali Tyr driver
- Reflects the formalization of Rust as a permanent part of the Linux kernel (Rust experiment officially concluded in Linux 7.0)
AMD ROCm Blogs: MLPerf v6.0 Reproduction Guide Published
AMD’s ROCm technical blog published a detailed step-by-step reproduction guide for MLPerf Inference v6.0 submissions:
- Docker-based workflows: rocm/amd-mlperf:mi355x_llama2_70b_inference_6.0 and model-specific containers
- WMXFP4 quantized model checkpoints available via Hugging Face (amd/Llama-2-70b-chat-hf-WMXFP4-...)
- Covers offline, server, and interactive scenarios with accuracy validation scripts
- Enables third-party customers and partners to independently verify AMD’s published numbers
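The entry point of the Docker-based workflow can be sketched roughly as below. The image tag is the one named in the guide; the --device flags are the usual ROCm container passthrough options (/dev/kfd for compute, /dev/dri for render nodes), and the in-container benchmark command is documented in the guide itself, so it is not reproduced here.

```shell
# Hedged sketch of starting AMD's MLPerf reproduction container.
docker pull rocm/amd-mlperf:mi355x_llama2_70b_inference_6.0
docker run --rm -it \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --ipc=host \
  rocm/amd-mlperf:mi355x_llama2_70b_inference_6.0
```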
Ubuntu 26.04 Ships Linux 7.0 + Mesa 26.0
Ubuntu 26.04 LTS (releasing April 23) ships a notably modern stack:
- Linux 7.0 kernel (the stable upstream release is still due in mid-April)
- Mesa 26.0 graphics drivers with new OpenGL/Vulkan capabilities
- GNOME 50, Python 3.14, OpenJDK 25, GCC 15.2
- NVIDIA R590 series Linux driver available in both 25.10 and 26.04
- Benchmarks on AMD Ryzen 9 9950X + RTX 5080 show meaningful gains vs. Ubuntu 25.10 (Linux 6.17)
NVIDIA AI-Assisted Driver Development Disclosed
In a notable industry first, NVIDIA publicly disclosed that development of its preview DRM Color Pipeline API Linux driver was substantially AI-assisted:
“Nearly all of the code was produced by [Claude Sonnet/Opus], but with a strong emphasis on explicit human direction, review, and iteration.”
- The R595-derived preview driver enables Wayland compositors to leverage GPU hardware for HDR color processing
- Signals growing industry normalization of LLM-assisted systems software development
📊 Key Takeaways
AMD had an exceptionally strong week across both the software and silicon fronts: the MI355X’s MLPerf Inference v6.0 results — including multi-node WMXFP4 inference at scale — demonstrate genuine datacenter competitiveness, while ROCDXG going production-ready and the AIE4 NPU patches signal a maturing, more accessible ROCm ecosystem that now spans Windows WSL2, Linux, and next-gen NPU silicon. NVIDIA, meanwhile, showed that software remains its most potent weapon — from Auto Shader Compilation improving the PC gaming experience to Dynamo and CUDA-layer optimizations driving MLPerf leadership, and even publicly normalizing AI-assisted driver development with Claude.
The broader Linux/open-source GPU ecosystem is at an inflection point: Rust-based drivers (Nova for NVIDIA, Tyr for Mali), Mesa 26.0, the Linux 7.0 kernel, and Ubuntu 26.04’s imminent release are converging to deliver a materially better open-source GPU compute and graphics experience — a rising tide that benefits AMD’s ROCm ambitions, NVIDIA’s Wayland HDR story, and Intel’s Arc/Xe3 momentum simultaneously.