Executive Summary

  • AMD Local AI Ecosystem Expansion: AMD’s Ryzen AI (XDNA 2) NPU usability on Linux takes a major step forward with the release of Lemonade 10.0.1. The update vastly reduces setup friction across major Linux distributions (Ubuntu, Arch, Fedora) and introduces streamlined Hugging Face GGUF integration and support for the Qwen3.5-4B model.
  • NVIDIA Tackles Grid Constraints: NVIDIA and Emerald AI have demonstrated autonomous, “power-flexible” AI factories. Using seconds-level telemetry on a 96-GPU Blackwell Ultra cluster, they showed that data centers can dynamically throttle flexible AI workloads to absorb national grid demand spikes without disrupting high-priority tasks. This addresses a major hyperscaler bottleneck and sets a new industry standard for smart-grid datacenter integration.

🤖 ROCm Updates & Software

[2026-03-25] Lemonade 10.0.1 Improves Setup Process For Using AMD Ryzen AI NPUs On Linux

Source: Phoronix (AMD Linux)

Key takeaway relevant to AMD:

  • This release is a significant quality-of-life upgrade for AMD developers running local LLMs on Linux: it reduces deployment friction for XDNA 2 NPUs and narrows the local edge-AI user-experience gap with competitors.

Summary:

  • The Lemonade SDK 10.0.1 update streamlines the installation and execution of local LLMs on Linux via AMD Ryzen AI NPUs, introducing Ubuntu PPA support, new distribution guides, Hugging Face optimizations, and Qwen3.5-4B compatibility.

Details:

  • Version Updates: Ships as Lemonade 10.0.1, building on the recently launched Lemonade SDK 10.0 and bundling FastFlowLM 0.9.35.
  • Hardware Targeting: Explicitly designed to leverage AMD XDNA 2 NPUs for running large language models (LLMs) natively and efficiently on Linux.
  • Linux Distribution Support Expansion:
    • Added Debian packages available via a Personal Package Archive (PPA) for easier Ubuntu Linux installation.
    • Added specific FastFlowLM setup guides for Arch Linux users.
    • Added formal installation documentation for Fedora.
  • UX/UI Improvements: Implemented native system tray support utilizing AppIndicator3, alongside a generally smoother FastFlowLM installation script.
  • Model Integration & Support:
    • Introduced native NPU support for running Qwen3.5-4B using the latest FastFlowLM pipeline.
    • Upgraded the bundled llama.cpp backend.
    • Streamlined searching for and adding GGUF-format models directly from Hugging Face repositories.
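Taken together, the new Ubuntu path might look like the sketch below. This is a hypothetical flow: the PPA name, the apt package name, and the `lemonade-server` subcommands are assumptions based on the features described above, not commands confirmed by the release notes.

```shell
# Hypothetical Ubuntu install via the new PPA -- the PPA and package
# names below are illustrative, not taken from the release notes.
sudo add-apt-repository ppa:lemonade-sdk/lemonade   # assumed PPA name
sudo apt update
sudo apt install lemonade-sdk                       # assumed package name

# Pull a GGUF model from a Hugging Face repo and serve it on the NPU
# (subcommand and model names are likewise assumptions).
lemonade-server pull Qwen3.5-4B-GGUF
lemonade-server serve
```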

🤼‍♂️ Market & Competitors

[2026-03-25] Blowing Off Steam: How Power-Flexible AI Factories Can Stabilize the Global Energy Grid

Source: NVIDIA Blog

Key takeaway relevant to AMD:

  • NVIDIA is actively positioning its hardware and telemetry software (SMI) as a solution to data center power constraints—the most severe bottleneck for hyperscale AI growth. AMD must ensure its ROCm SMI and Instinct platform management tools can integrate seamlessly with smart-grid controllers to remain competitive in large-scale data center bids.

Summary:

  • A collaboration between NVIDIA, Emerald AI, and UK grid operator National Grid demonstrated that “power-flexible” AI clusters can dynamically throttle their power draw to stabilize local energy grids during sudden demand peaks without disrupting critical workloads.
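The seconds-level telemetry at the heart of the trial is the kind of per-GPU power stream that `nvidia-smi --query-gpu=index,power.draw --format=csv,noheader,nounits -l 1` emits. A minimal parsing sketch, with an illustrative sample (the readings below are invented, not captured from the trial):

```python
# Sketch: parse seconds-level GPU power telemetry in the CSV shape produced by
# `nvidia-smi --query-gpu=index,power.draw --format=csv,noheader,nounits`.
# The sample readings are illustrative, not data from the London trial.

def parse_power_csv(text: str) -> dict[int, float]:
    """Map GPU index -> power draw in watts from CSV telemetry lines."""
    readings = {}
    for line in text.strip().splitlines():
        idx, watts = (field.strip() for field in line.split(","))
        readings[int(idx)] = float(watts)
    return readings

sample = """\
0, 612.34
1, 598.07
2, 605.51
"""
readings = parse_power_csv(sample)
total_watts = sum(readings.values())
print(f"{total_watts:.2f} W across {len(readings)} GPUs")
```

A controller like Emerald AI's Conductor would consume a stream of such snapshots, one per second per node, to decide when and how hard to throttle.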

Details:

  • Hardware Configuration: Tested on a cluster of 96 NVIDIA Blackwell Ultra GPUs, interconnected via the NVIDIA Quantum-X800 InfiniBand platform at Nebius’ new London AI factory.
  • Telemetry & Control: Utilized the NVIDIA System Management Interface (SMI) to extract seconds-level GPU power telemetry, feeding data into the Emerald AI Conductor Platform.
  • Dynamic Throttling: During simulated grid stress events (such as the UK “TV pickup” phenomenon), the AI cluster successfully ramped down its total power consumption to act as a grid shock absorber.
  • Workload Prioritization: The Emerald AI Conductor successfully isolated workloads. High-priority AI tasks maintained peak throughput, while only secondary, “flexible” jobs were temporarily slowed down.
  • Performance Metrics: Achieved 100% alignment with over 200 specific power targets set by EPRI and National Grid.
  • Scope of Control: Power throttling covered not just the GPUs but also the host CPUs and the rest of the supporting IT equipment.
  • Deployment Roadmap: Following successful PoC trials in Arizona, Virginia, Illinois, and London, the technology will see its first real-world commercial deployment at the Aurora AI Factory in Virginia later this year.
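The workload-prioritization step above can be sketched as a toy dispatch rule: shed power only from flexible jobs, down to their floors, until the cluster meets the grid target. The `Job` type and `shed_to_target` function are hypothetical illustrations, not the Emerald AI Conductor API.

```python
# Toy sketch of priority-aware power shedding: flexible jobs are throttled
# toward a grid target while high-priority jobs keep their full power budget.
# All names and numbers here are illustrative, not from the actual system.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    power_kw: float      # current draw
    flexible: bool       # only flexible jobs may be throttled
    min_power_kw: float  # floor this job can be throttled down to

def shed_to_target(jobs: list[Job], target_kw: float) -> float:
    """Throttle flexible jobs in place until total draw <= target_kw,
    leaving high-priority jobs untouched. Returns the resulting total."""
    total = sum(j.power_kw for j in jobs)
    for job in jobs:
        if total <= target_kw:
            break
        if job.flexible:
            cut = min(job.power_kw - job.min_power_kw, total - target_kw)
            job.power_kw -= cut
            total -= cut
    return total

cluster = [
    Job("inference-prod", 400.0, flexible=False, min_power_kw=400.0),
    Job("pretrain-batch", 350.0, flexible=True, min_power_kw=150.0),
    Job("eval-sweep", 250.0, flexible=True, min_power_kw=100.0),
]
print(shed_to_target(cluster, target_kw=700.0))  # -> 700.0
```

In this example the cluster drops from 1,000 kW to the 700 kW target entirely by slowing the two flexible jobs, while `inference-prod` keeps its full 400 kW, mirroring the high-priority/flexible split the trial demonstrated.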

📈 GitHub Stats

| Category | Repository | Total Stars | 1-Day | 7-Day | 30-Day |
|---|---|---|---|---|---|
| AMD Ecosystem | AMD-AGI/GEAK-agent | 80 | 0 | +2 | +15 |
| AMD Ecosystem | AMD-AGI/Primus | 82 | 0 | 0 | +8 |
| AMD Ecosystem | AMD-AGI/TraceLens | 64 | 0 | +1 | +5 |
| AMD Ecosystem | ROCm/MAD | 33 | 0 | +1 | +2 |
| AMD Ecosystem | ROCm/ROCm | 6,285 | +3 | +20 | +98 |
| Compilers | openxla/xla | 4,112 | +2 | +22 | +104 |
| Compilers | tile-ai/tilelang | 5,424 | +5 | +37 | +180 |
| Compilers | triton-lang/triton | 18,764 | +11 | +83 | +299 |
| Google / JAX | AI-Hypercomputer/JetStream | 418 | +1 | +2 | +6 |
| Google / JAX | AI-Hypercomputer/maxtext | 2,186 | +2 | +13 | +40 |
| Google / JAX | jax-ml/jax | 35,218 | +13 | +84 | +289 |
| HuggingFace | huggingface/transformers | 158,388 | +61 | +376 | +1544 |
| Inference Serving | alibaba/rtp-llm | 1,074 | 0 | +4 | +25 |
| Inference Serving | efeslab/Atom | 336 | 0 | 0 | 0 |
| Inference Serving | llm-d/llm-d | 2,740 | +45 | +108 | +220 |
| Inference Serving | sgl-project/sglang | 25,011 | +56 | +315 | +1345 |
| Inference Serving | vllm-project/vllm | 74,287 | +125 | +754 | +3313 |
| Inference Serving | xdit-project/xDiT | 2,575 | +1 | +7 | +31 |
| NVIDIA | NVIDIA/Megatron-LM | 15,801 | +21 | +84 | +549 |
| NVIDIA | NVIDIA/TransformerEngine | 3,240 | +2 | +17 | +70 |
| NVIDIA | NVIDIA/apex | 8,938 | +2 | +4 | +11 |
| Optimization | deepseek-ai/DeepEP | 9,071 | +6 | +23 | +77 |
| Optimization | deepspeedai/DeepSpeed | 41,903 | +15 | +62 | +253 |
| Optimization | facebookresearch/xformers | 10,390 | +3 | +15 | +41 |
| PyTorch & Meta | meta-pytorch/monarch | 1,000 | +1 | +9 | +23 |
| PyTorch & Meta | meta-pytorch/torchcomms | 351 | 0 | +3 | +14 |
| PyTorch & Meta | meta-pytorch/torchforge | 657 | +1 | +10 | +36 |
| PyTorch & Meta | pytorch/FBGEMM | 1,548 | 0 | +4 | +14 |
| PyTorch & Meta | pytorch/ao | 2,746 | +2 | +14 | +48 |
| PyTorch & Meta | pytorch/audio | 2,848 | +1 | +4 | +17 |
| PyTorch & Meta | pytorch/pytorch | 98,563 | +34 | +212 | +871 |
| PyTorch & Meta | pytorch/torchtitan | 5,186 | +6 | +34 | +103 |
| PyTorch & Meta | pytorch/vision | 17,584 | -1 | +13 | +60 |
| RL & Post-Training | THUDM/slime | 4,964 | +21 | +135 | +647 |
| RL & Post-Training | radixark/miles | 1,015 | +4 | +33 | +117 |
| RL & Post-Training | volcengine/verl | 20,198 | +33 | +183 | +880 |
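Raw star deltas can be hard to compare across repos of very different sizes. A quick way to normalize a few rows from the table above (repo names and counts taken directly from it):

```python
# Relative 7-day growth for selected rows of the stats table above,
# computed as weekly_delta / (current_stars - weekly_delta).
rows = {
    "vllm-project/vllm": (74_287, 754),
    "sgl-project/sglang": (25_011, 315),
    "ROCm/ROCm": (6_285, 20),
}
growth = {
    repo: 100 * weekly / (stars - weekly)
    for repo, (stars, weekly) in rows.items()
}
for repo, pct in growth.items():
    print(f"{repo}: +{pct:.2f}% week-over-week")
```

By this measure sglang is currently growing faster in relative terms than vllm, despite the smaller absolute delta.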