February 16, 2026 · Generated 06:04 AM PT
Technical Intelligence Report: 2026-02-16
Executive Summary
- Linux Kernel 7.0 & AMD Zen 5: Mainline Linux 7.0 has merged critical CXL address translation support for AMD Zen 5 (EPYC 9005) systems, resolving “Normalized Address” handling via ACPI PRM.
- Intel Compute Stack Competition: Benchmarks of Intel’s “Panther Lake” (Arc B390) on Linux 6.19 reveal strong “out-of-the-box” OpenCL performance using the open-source Intel Compute Runtime, positioning it as a viable competitor to AMD Strix Point/ROCm 7.2 configurations.
- NVIDIA Infrastructure Expansion: NVIDIA released performance claims for the GB300 NVL72 (Blackwell Ultra), targeting “Agentic AI” with 50x efficiency gains over Hopper and 1.5x lower costs for long-context workloads compared to the standard GB200.
🤖 ROCm Updates & Software
[2026-02-16] Linux 7.0 CXL Enables AMD Zen 5 Address Translation Feature
Source: Phoronix
Key takeaway relevant to AMD:
- Ensures stability and correct memory addressing for AMD EPYC 9005 (Zen 5) servers using CXL devices running on the latest Linux kernels.
- Foundational work that prepares the Linux ecosystem for upcoming Zen 6 platforms.
Summary:
- Linux kernel 7.0 has successfully merged ACPI PRMT-based address translation for the Compute Express Link (CXL) subsystem after ten rounds of code review.
- This update specifically addresses how AMD Zen 5 platforms handle physical address translation.
Details:
- Technical Challenge: Zen 5 systems can be configured to use “Normalized addresses,” where Host Physical Addresses (HPA) differ from System Physical Addresses (SPA). In this mode, CXL endpoints are programmed in passthrough (DPA == HPA) with interleaving disabled.
- The Solution: The kernel now uses the ACPI Platform Runtime Mechanism (PRM) handler to translate the Device Physical Address (DPA) to SPA (a conceptual sketch of this flow follows this list).
- Implementation:
- Introduces a new file: core/atl.c (handling ACPI PRM-specific translation).
- While the naming mimics the AMD Address Translation Library (CONFIG_AMD_ATL), this kernel implementation is vendor-agnostic and relies on Kbuild/Kconfig options.
- Hardware Scope: Debuted with AMD EPYC 9005 series; expected to persist in Zen 6.
- Additional Changes: The merge also includes CXL port error protocol handling/reporting and documentation updates for ACPI PRM CXL Address Translation.
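To make the described flow concrete, here is a minimal user-space C sketch of the idea, not the kernel implementation: prm_translate_hpa_to_spa() is a hypothetical stand-in for the firmware-owned ACPI PRM handler (only the platform knows its normalization rules), and the passthrough decode mirrors the DPA == HPA configuration noted above.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the firmware's ACPI PRM translation handler.
 * Real firmware applies platform-specific normalization rules that only
 * it knows; the fixed offset here is purely illustrative. */
static uint64_t prm_translate_hpa_to_spa(uint64_t hpa)
{
    const uint64_t normalization_offset = 0x100000000ULL;
    return hpa + normalization_offset;
}

/* With interleaving disabled, the CXL endpoint decodes in passthrough
 * mode, so the Device Physical Address maps 1:1 to the Host Physical
 * Address (DPA == HPA). */
static uint64_t cxl_dpa_to_hpa_passthrough(uint64_t dpa)
{
    return dpa;
}

int main(void)
{
    uint64_t dpa = 0x2000;                          /* example device address */
    uint64_t hpa = cxl_dpa_to_hpa_passthrough(dpa); /* passthrough: DPA == HPA */
    uint64_t spa = prm_translate_hpa_to_spa(hpa);   /* firmware-owned HPA -> SPA */

    printf("DPA 0x%llx -> HPA 0x%llx -> SPA 0x%llx\n",
           (unsigned long long)dpa,
           (unsigned long long)hpa,
           (unsigned long long)spa);
    return 0;
}
```

The design point is that the kernel does not hard-code the HPA-to-SPA mapping; it asks platform firmware through the PRM handler, which is also why the same mechanism is expected to carry over to Zen 6.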
🤼‍♂️ Market & Competitors
[2026-02-16] Arc B390 Graphics With Panther Lake Performing Great On Open-Source Intel Compute Runtime
Source: Phoronix
Key takeaway relevant to AMD:
- Intel’s open-source compute stack on Linux is maturing rapidly; the Arc B390 (Xe3) competes strongly with the AMD Ryzen AI 9 HX 370 (Strix Point) in OpenCL workloads.
- AMD ROCm 7.2 served as the AMD-side comparison stack against Intel’s latest Compute Runtime.
Summary:
- Phoronix benchmarked the Intel Core Ultra X7 358H “Panther Lake” with Arc B390 Xe3 graphics using the open-source Intel Compute Runtime on Linux.
- Performance was compared against prior Intel generations and the AMD Ryzen AI 9 HX 370.
Details:
- Test Environment:
- OS: Linux 6.19 (Ubuntu 26.04 development build).
- Intel Stack: Compute Runtime 26.05.37020.3, Intel Graphics Compiler 2.28.4.
- AMD Stack: ROCm 7.2 (used on Ryzen AI 9 HX 370 / ASUS Zenbook S 14).
- Hardware Comparison:
- Intel: Core Ultra X7 358H (Panther Lake/Xe3), Core Ultra 7 258V (Lunar Lake), various older gens (Meteor/Alder/Tiger Lake).
- AMD: Ryzen AI 9 HX 370 (Strix Point).
- Note: No “Strix Halo” (Ryzen AI Max+) samples were available for this test cycle.
- Observations:
- The Intel Arc B390 worked “out-of-the-box” with the production-level support in the compute runtime (see the enumeration sketch after this list).
- Benchmarks focused on OpenCL and GPU compute performance.
- Testing included SoC power consumption monitoring to evaluate efficiency alongside raw throughput.
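A quick way to verify the “out-of-the-box” device exposure described above is a plain OpenCL enumeration against the system ICD loader; on a Panther Lake laptop with the Intel Compute Runtime installed, the Arc B390 should show up as a GPU device. This is an illustrative sketch, not part of the Phoronix test suite.

```c
/* Build: gcc list_cl_devices.c -lOpenCL -o list_cl_devices */
#include <CL/cl.h>
#include <stdio.h>

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;

    /* Ask the ICD loader for all registered OpenCL platforms. */
    if (clGetPlatformIDs(8, platforms, &num_platforms) != CL_SUCCESS ||
        num_platforms == 0) {
        fprintf(stderr, "No OpenCL platforms found\n");
        return 1;
    }

    for (cl_uint p = 0; p < num_platforms && p < 8; p++) {
        char pname[256] = {0};
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME,
                          sizeof(pname), pname, NULL);
        printf("Platform: %s\n", pname);

        /* List only GPU devices; the Arc B390 would appear here. */
        cl_device_id devices[8];
        cl_uint num_devices = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU,
                           8, devices, &num_devices) != CL_SUCCESS)
            continue;

        for (cl_uint d = 0; d < num_devices && d < 8; d++) {
            char dname[256] = {0};
            char dver[256] = {0};
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME,
                            sizeof(dname), dname, NULL);
            clGetDeviceInfo(devices[d], CL_DEVICE_VERSION,
                            sizeof(dver), dver, NULL);
            printf("  GPU: %s (%s)\n", dname, dver);
        }
    }
    return 0;
}
```

The same enumeration lists the Ryzen AI 9 HX 370’s GPU when the ROCm 7.2 OpenCL runtime is installed, which is what makes the two stacks directly comparable in OpenCL benchmarks.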
[2026-02-16] New SemiAnalysis InferenceX Data Shows NVIDIA Blackwell Ultra Delivers up to 50x Better Performance and 35x Lower Costs for Agentic AI
Source: NVIDIA Blog
Key takeaway relevant to AMD:
- NVIDIA is raising the bar for “Agentic AI” and long-context inference, directly challenging the memory-capacity advantages often touted by AMD’s MI300/MI325 series.
- The software optimization stack (TensorRT-LLM, Mooncake) is cited as a major driver of these gains, emphasizing the need for continued ROCm software optimization.
Summary:
- NVIDIA released data regarding the Blackwell Ultra (GB300 NVL72) platform, highlighting massive efficiency gains over the Hopper platform and cost reductions compared to the standard GB200.
- The focus is on “Agentic AI” workloads, which require low latency and the processing of massive codebases.
Details:
- GB300 NVL72 vs. Hopper:
- Throughput: Up to 50x higher throughput per megawatt.
- Cost: 35x lower cost per token for low-latency workloads (the metrics are sketched after this list).
- GB300 NVL72 (Blackwell Ultra) vs. GB200 NVL72:
- Long-Context: 1.5x lower cost per token for workloads with 128,000-token inputs and 8,000-token outputs.
- Compute Specs: 1.5x higher NVFP4 compute performance; 2x faster attention processing.
- Software Stack Optimizations:
- Utilizes TensorRT-LLM, NVIDIA Dynamo, Mooncake, and SGLang.
- TensorRT-LLM alone delivered 5x better performance on GB200 for low-latency workloads compared to 4 months prior.
- Features “Programmatic dependent launch”, which lets a dependent kernel begin launching while the preceding kernel is still finishing, minimizing kernel idle time between launches.
- Future Roadmap (Rubin Platform):
- NVIDIA teased the “Vera Rubin NVL72.”
- Claims 10x higher throughput per megawatt compared to Blackwell for MoE inference.
- Can train large MoE models using 1/4th the number of GPUs compared to Blackwell.
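Since the post does not publish its exact accounting, the headline metrics referenced above (throughput per megawatt, cost per token) are sketched here using their conventional definitions; treat this as an interpretation aid, not NVIDIA’s formula.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Conventional definitions; the exact accounting behind NVIDIA's figures is not stated.
\[
\text{throughput per MW} = \frac{\text{tokens/s delivered by the rack}}{\text{rack power draw (MW)}},
\qquad
\text{cost per token} = \frac{\text{system cost per hour}}{3600 \times \text{tokens/s}}
\]
% Under these definitions, ``35x lower cost per token'' means
% \(\text{cost}_{\mathrm{GB300}} = \text{cost}_{\mathrm{Hopper}} / 35\) at the same latency target.
\end{document}
```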