Executive Summary

  • AMD Local AI Ecosystem Expansion: AMD’s Ryzen AI (XDNA 2) NPU usability on Linux takes a major step forward with the release of Lemonade 10.0.1. The update vastly reduces setup friction across major Linux distributions (Ubuntu, Arch, Fedora) and introduces streamlined Hugging Face GGUF integration and support for the Qwen3.5-4B model.
  • NVIDIA Tackles Grid Constraints: NVIDIA and Emerald AI have demonstrated autonomous, “power-flexible” AI factories. Using seconds-level telemetry on a 96-GPU Blackwell Ultra cluster, they showed that data centers can dynamically throttle flexible AI workloads to absorb national grid demand spikes without disrupting high-priority tasks. This addresses a major hyperscaler bottleneck and sets a new industry standard for smart-grid datacenter integration.

🤖 ROCm Updates & Software

[2026-03-25] Lemonade 10.0.1 Improves Setup Process For Using AMD Ryzen AI NPUs On Linux

Source: Phoronix (AMD Linux)

Key takeaway relevant to AMD:

  • This release is a significant quality-of-life upgrade for AMD developers running local LLMs on Linux: it reduces deployment friction for XDNA 2 NPUs and narrows the local edge-AI user-experience gap with competitors.

Summary:

  • The Lemonade SDK 10.0.1 update streamlines the installation and execution of local LLMs on Linux via AMD Ryzen AI NPUs, introducing Ubuntu PPA support, new distribution guides, Hugging Face optimizations, and Qwen3.5-4B compatibility.

Details:

  • Version Updates: Ships as Lemonade 10.0.1, building on the recently launched Lemonade SDK 10.0 and bundling FastFlowLM 0.9.35.
  • Hardware Targeting: Explicitly designed to leverage AMD XDNA 2 NPUs for running large language models (LLMs) natively and efficiently on Linux.
  • Linux Distribution Support Expansion:
    • Added Debian packages available via a Personal Package Archive (PPA) for easier Ubuntu Linux installation.
    • Added specific FastFlowLM setup guides for Arch Linux users.
    • Added formal installation documentation for Fedora.
  • UX/UI Improvements: Implemented native system tray support utilizing AppIndicator3, alongside a generally smoother FastFlowLM installation script.
  • Model Integration & Support:
    • Introduced native NPU support for running Qwen3.5-4B using the latest FastFlowLM pipeline.
    • Upgraded the bundled llama.cpp backend.
    • Streamlined searching for and adding GGUF-format models directly from Hugging Face repositories.
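Taken together, the new Ubuntu path might look like the sketch below. This is a hypothetical flow: the PPA name, the apt package name, and the `lemonade-server` subcommands are assumptions based on the features described above, not commands confirmed by the release notes.

```shell
# Hypothetical Ubuntu install via the new PPA -- the PPA and package
# names below are illustrative, not taken from the release notes.
sudo add-apt-repository ppa:lemonade-sdk/lemonade   # assumed PPA name
sudo apt update
sudo apt install lemonade-sdk                       # assumed package name

# Pull a GGUF model from a Hugging Face repo and serve it on the NPU
# (subcommand and model names are likewise assumptions).
lemonade-server pull Qwen3.5-4B-GGUF
lemonade-server serve
```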

🤼‍♂️ Market & Competitors

[2026-03-25] Blowing Off Steam: How Power-Flexible AI Factories Can Stabilize the Global Energy Grid

Source: NVIDIA Blog

Key takeaway relevant to AMD:

  • NVIDIA is actively positioning its hardware and telemetry software (SMI) as a solution to data center power constraints—the most severe bottleneck for hyperscale AI growth. AMD must ensure its ROCm SMI and Instinct platform management tools can integrate seamlessly with smart-grid controllers to remain competitive in large-scale data center bids.

Summary:

  • A collaboration between NVIDIA, Emerald AI, and UK grid operator National Grid demonstrated that “power-flexible” AI clusters can dynamically throttle their power draw to stabilize local energy grids during sudden demand peaks without disrupting critical workloads.
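The seconds-level telemetry at the heart of the trial is the kind of per-GPU power stream that `nvidia-smi --query-gpu=index,power.draw --format=csv,noheader,nounits -l 1` emits. A minimal parsing sketch, with an illustrative sample (the readings below are invented, not captured from the trial):

```python
# Sketch: parse seconds-level GPU power telemetry in the CSV shape produced by
# `nvidia-smi --query-gpu=index,power.draw --format=csv,noheader,nounits`.
# The sample readings are illustrative, not data from the London trial.

def parse_power_csv(text: str) -> dict[int, float]:
    """Map GPU index -> power draw in watts from CSV telemetry lines."""
    readings = {}
    for line in text.strip().splitlines():
        idx, watts = (field.strip() for field in line.split(","))
        readings[int(idx)] = float(watts)
    return readings

sample = """\
0, 612.34
1, 598.07
2, 605.51
"""
readings = parse_power_csv(sample)
total_watts = sum(readings.values())
print(f"{total_watts:.2f} W across {len(readings)} GPUs")
```

A controller like Emerald AI's Conductor would consume a stream of such snapshots, one per second per node, to decide when and how hard to throttle.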

Details:

  • Hardware Configuration: Tested on a cluster of 96 NVIDIA Blackwell Ultra GPUs, interconnected via the NVIDIA Quantum-X800 InfiniBand platform at Nebius’ new London AI factory.
  • Telemetry & Control: Utilized the NVIDIA System Management Interface (SMI) to extract seconds-level GPU power telemetry, feeding data into the Emerald AI Conductor Platform.
  • Dynamic Throttling: During simulated grid stress events (such as the UK “TV pickup” phenomenon), the AI cluster successfully ramped down its total power consumption to act as a grid shock absorber.
  • Workload Prioritization: The Emerald AI Conductor successfully isolated workloads. High-priority AI tasks maintained peak throughput, while only secondary, “flexible” jobs were temporarily slowed down.
  • Performance Metrics: Achieved 100% alignment with over 200 specific power targets set by EPRI and National Grid.
  • Scope of Control: Power throttling covered not just the GPUs but also the host CPUs and the rest of the supporting IT equipment.
  • Deployment Roadmap: Following successful PoC trials in Arizona, Virginia, Illinois, and London, the technology will see its first real-world commercial deployment at the Aurora AI Factory in Virginia later this year.
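The workload-prioritization step above can be sketched as a toy dispatch rule: shed power only from flexible jobs, down to their floors, until the cluster meets the grid target. The `Job` type and `shed_to_target` function are hypothetical illustrations, not the Emerald AI Conductor API.

```python
# Toy sketch of priority-aware power shedding: flexible jobs are throttled
# toward a grid target while high-priority jobs keep their full power budget.
# All names and numbers here are illustrative, not from the actual system.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    power_kw: float      # current draw
    flexible: bool       # only flexible jobs may be throttled
    min_power_kw: float  # floor this job can be throttled down to

def shed_to_target(jobs: list[Job], target_kw: float) -> float:
    """Throttle flexible jobs in place until total draw <= target_kw,
    leaving high-priority jobs untouched. Returns the resulting total."""
    total = sum(j.power_kw for j in jobs)
    for job in jobs:
        if total <= target_kw:
            break
        if job.flexible:
            cut = min(job.power_kw - job.min_power_kw, total - target_kw)
            job.power_kw -= cut
            total -= cut
    return total

cluster = [
    Job("inference-prod", 400.0, flexible=False, min_power_kw=400.0),
    Job("pretrain-batch", 350.0, flexible=True, min_power_kw=150.0),
    Job("eval-sweep", 250.0, flexible=True, min_power_kw=100.0),
]
print(shed_to_target(cluster, target_kw=700.0))  # -> 700.0
```

In this example the cluster drops from 1,000 kW to the 700 kW target entirely by slowing the two flexible jobs, while `inference-prod` keeps its full 400 kW, mirroring the high-priority/flexible split the trial demonstrated.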

📈 GitHub Stats

| Category | Repository | Total Stars | 1-Day | 7-Day | 30-Day |
|---|---|---|---|---|---|
| AMD Ecosystem | AMD-AGI/GEAK-agent | 80 | 0 | +2 | +15 |
| AMD Ecosystem | AMD-AGI/Primus | 82 | 0 | 0 | +8 |
| AMD Ecosystem | AMD-AGI/TraceLens | 64 | 0 | +1 | +5 |
| AMD Ecosystem | ROCm/MAD | 33 | 0 | +1 | +2 |
| AMD Ecosystem | ROCm/ROCm | 6,285 | +3 | +20 | +98 |
| Compilers | openxla/xla | 4,112 | +2 | +22 | +104 |
| Compilers | tile-ai/tilelang | 5,424 | +5 | +37 | +180 |
| Compilers | triton-lang/triton | 18,764 | +11 | +83 | +299 |
| Google / JAX | AI-Hypercomputer/JetStream | 418 | +1 | +2 | +6 |
| Google / JAX | AI-Hypercomputer/maxtext | 2,186 | +2 | +13 | +40 |
| Google / JAX | jax-ml/jax | 35,218 | +13 | +84 | +289 |
| HuggingFace | huggingface/transformers | 158,388 | +61 | +376 | +1544 |
| Inference Serving | alibaba/rtp-llm | 1,074 | 0 | +4 | +25 |
| Inference Serving | efeslab/Atom | 336 | 0 | 0 | 0 |
| Inference Serving | llm-d/llm-d | 2,740 | +45 | +108 | +220 |
| Inference Serving | sgl-project/sglang | 25,011 | +56 | +315 | +1345 |
| Inference Serving | vllm-project/vllm | 74,287 | +125 | +754 | +3313 |
| Inference Serving | xdit-project/xDiT | 2,575 | +1 | +7 | +31 |
| NVIDIA | NVIDIA/Megatron-LM | 15,801 | +21 | +84 | +549 |
| NVIDIA | NVIDIA/TransformerEngine | 3,240 | +2 | +17 | +70 |
| NVIDIA | NVIDIA/apex | 8,938 | +2 | +4 | +11 |
| Optimization | deepseek-ai/DeepEP | 9,071 | +6 | +23 | +77 |
| Optimization | deepspeedai/DeepSpeed | 41,903 | +15 | +62 | +253 |
| Optimization | facebookresearch/xformers | 10,390 | +3 | +15 | +41 |
| PyTorch & Meta | meta-pytorch/monarch | 1,000 | +1 | +9 | +23 |
| PyTorch & Meta | meta-pytorch/torchcomms | 351 | 0 | +3 | +14 |
| PyTorch & Meta | meta-pytorch/torchforge | 657 | +1 | +10 | +36 |
| PyTorch & Meta | pytorch/FBGEMM | 1,548 | 0 | +4 | +14 |
| PyTorch & Meta | pytorch/ao | 2,746 | +2 | +14 | +48 |
| PyTorch & Meta | pytorch/audio | 2,848 | +1 | +4 | +17 |
| PyTorch & Meta | pytorch/pytorch | 98,563 | +34 | +212 | +871 |
| PyTorch & Meta | pytorch/torchtitan | 5,186 | +6 | +34 | +103 |
| PyTorch & Meta | pytorch/vision | 17,584 | -1 | +13 | +60 |
| RL & Post-Training | THUDM/slime | 4,964 | +21 | +135 | +647 |
| RL & Post-Training | radixark/miles | 1,015 | +4 | +33 | +117 |
| RL & Post-Training | volcengine/verl | 20,198 | +33 | +183 | +880 |
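Raw star deltas can be hard to compare across repos of very different sizes. A quick way to normalize a few rows from the table above (repo names and counts taken directly from it):

```python
# Relative 7-day growth for selected rows of the stats table above,
# computed as weekly_delta / (current_stars - weekly_delta).
rows = {
    "vllm-project/vllm": (74_287, 754),
    "sgl-project/sglang": (25_011, 315),
    "ROCm/ROCm": (6_285, 20),
}
growth = {
    repo: 100 * weekly / (stars - weekly)
    for repo, (stars, weekly) in rows.items()
}
for repo, pct in growth.items():
    print(f"{repo}: +{pct:.2f}% week-over-week")
```

By this measure sglang is currently growing faster in relative terms than vllm, despite the smaller absolute delta.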