📅 Engineering Report (2025-11-01 - 2025-11-30)

🚀 Executive Summary

November 2025 was a busy month for the AI infrastructure ecosystem, with new releases across most of the tracked libraries. PyTorch shipped the v2.9.1 maintenance release, and downstream libraries such as vLLM (v0.11.x) and TorchVision continued to align with the PyTorch 2.9 series. The inference landscape remains highly competitive: vLLM moved its default build to CUDA 12.9/PyTorch 2.9, and SGLang released Model Gateway features aimed at enterprise deployment. MaxText also saw its first official PyPI release (v0.1.0), a maturity milestone for the JAX LLM ecosystem.

🚨 Major ROCm Releases (7.1.0 & 7.1.1): AMD released two significant updates to the ROCm stack. Key features include official support for PyTorch 2.9, initial support for RHEL 10, and extended support for AMD Instinct MI355X and MI350X GPUs.
Virtualization & OS Support: Enhanced virtualization support for MI300X/MI350X/MI355X on newer enterprise Linux releases (RHEL 10.1, 9.7), plus Ubuntu 24.04 guest OS support.
Ecosystem Integration:
- vLLM shipped ROCm-specific changes: support for AMD Ryzen AI MAX and AI 300 Series, and the FlexAttention backend enabled on ROCm.
- TorchTitan documentation was updated to officially acknowledge AMD’s fork, signaling tighter integration.
Performance Tools: New releases of ROCm Compute Profiler (v3.3.1) and Systems Profiler (v1.2.1) brought dynamic process attachment and improved roofline analysis.
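
For readers unfamiliar with roofline analysis, the model behind it can be sketched in a few lines. This is a minimal illustration of the general roofline bound, not the ROCm Compute Profiler's implementation; the function name `attainable_gflops` and the device numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def attainable_gflops(peak_gflops: float,
                      bandwidth_gbs: float,
                      intensity_flop_per_byte: float) -> float:
    """Roofline bound: throughput is capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * intensity_flop_per_byte)


# Hypothetical device numbers (assumptions, not taken from any real GPU).
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 100.0

# Arithmetic intensity at which the memory and compute limits meet.
ridge_point = PEAK_GFLOPS / BANDWIDTH_GBS

# A kernel doing 2 FLOP per byte sits left of the ridge: memory bound.
memory_bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, 2.0)

# A kernel doing 50 FLOP per byte sits right of the ridge: compute bound.
compute_bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, 50.0)

print(ridge_point, memory_bound, compute_bound)  # 10.0 200.0 1000.0
```

Comparing a kernel's measured throughput against this bound shows whether tuning effort should go to memory access or to compute.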

Competitive Analysis

NVIDIA Blackwell Readiness: Competitor libraries are aggressively preparing for next-gen hardware. TransformerEngine v2.9 added FP8 block scaling specifically for the Blackwell (SM100) architecture. xFormers v0.0.33 added a CUTLASS FMHA op for Blackwell.
Inference Velocity: vLLM v0.11.x now uses CUDA 12.9.1 and PyTorch 2.9.0 as its default build, raising the baseline software versions that competing stacks must support to stay compatible.
RLHF Scaling: THUDM/slime v0.2.0 introduced a fully FSDP-based backend and PPO support, making it a serious contender for large-scale RLHF training. It supports both NVIDIA and AMD hardware (via SGLang/vLLM patches).

📂 Category Updates

AMD Ecosystem

[ROCm/ROCm]

Key Activity:
- Major platform updates focusing on next-gen Instinct hardware (MI350/MI355) and enterprise Linux support.
- Extensive library updates (hipBLASLt, MIOpen, RCCL) to support new FP8 datatypes and kernel optimizations.
Details:
- [2025-11-26] 🚨 RELEASE: rocm-7.1.1: Added RHEL 10.1 support, MI355X multimedia engine reset, and PyTorch 2.9 compatibility.
- [2025-11-03] 🚨 RELEASE: rocm-7.1.0: Added RHEL 10.0 support, MI355X/MI350X virtualization support, and deprecated ROCm-EP in favor of MIGraphX.
Metrics: 63 New PRs, 47 New Issues

[AMD-AGI/Primus]

Key Activity:
- Documentation restructuring and CLI enhancements.
- Backend improvements for MaxText interaction.
Details:
- [2025-11-17] DOC UPDATE: CLI auto-discover subcommands.
- [2025-11-XX] PR: Backend support for custom model config and model args via CLI for MaxText.
Metrics: 52 New PRs, 1 New Issue

[AMD-AGI/TraceLens]

Key Activity:
- Focus on kernel time calculation and telemetry improvements for JAX.
Details:
- [2025-11-XX] Highlight: Calculation of kernel busy time based on HLO op for JAX.
Metrics: 17 New PRs, 14 New Issues
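
To make the kernel busy time highlight concrete: busy time is commonly computed as the length of the union of kernel execution intervals, so that overlapping kernels are not counted twice. The sketch below follows that general approach and is not TraceLens code; the helper names, the event layout, and the HLO op names (`fusion.1`, `dot.3`) are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[float, float]  # (start_us, end_us) of one kernel launch


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping intervals so shared time is counted once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def busy_time(intervals: Iterable[Interval]) -> float:
    """Total time during which at least one kernel was running."""
    return sum(end - start for start, end in merge_intervals(intervals))


# Hypothetical kernel events grouped by the HLO op that launched them.
events_by_hlo_op: Dict[str, List[Interval]] = {
    "fusion.1": [(0.0, 10.0), (5.0, 12.0)],
    "dot.3": [(11.0, 15.0)],
}

per_op = {op: busy_time(ivs) for op, ivs in events_by_hlo_op.items()}
total = busy_time(iv for ivs in events_by_hlo_op.values() for iv in ivs)

print(per_op)  # {'fusion.1': 12.0, 'dot.3': 4.0}
print(total)   # 15.0
```

The per-op values sum to 16.0 while the overall busy time is 15.0, because the two ops overlap for one microsecond.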

PyTorch Ecosystem

[pytorch/pytorch]

Key Activity:
- Maintenance release fixing critical 3D convolution regressions and Inductor compilation bugs.
Details:
- [2025-11-12] 🚨 RELEASE: v2.9.1: Fixes memory regressions in F.conv3d with bfloat16 and Inductor bugs related to Gemma compilation.
Metrics: 1481 New PRs, 1007 New Issues

[pytorch/torchtitan]

Key Activity:
- Integration improvements and documentation updates.
Details:
- [2025-11-06] DOC UPDATE: Added specific documentation for AMD’s fork of TorchTitan.
- [2025-11-XX] PR: Added ability to precompile models.
Metrics: 88 New PRs, 28 New Issues

Inference & Serving

[vllm-project/vllm]

Key Activity:
- Massive refactor moving to PyTorch 2.9 and CUDA 12.9.
- Expanded model support (DeepSeek V3.2, Qwen3-VL, GLM-4.5).
- Significant ROCm-specific fixes for AMD hardware (Ryzen AI MAX support, AITER backend split).
Details:
- [2025-11-20] 🚨 RELEASE: v0.11.2: Bug fixes for Ray multi-node and async scheduling.
- [2025-11-18] 🚨 RELEASE: v0.11.1: Major update. Default build updated to torch 2.9.0. Added Anthropic API support.
- [2025-11-XX] PR: Enable FlexAttention backend on ROCm.
- [2025-11-XX] PR: Add support for AMD Ryzen AI MAX / AI 300 Series.
Metrics: High activity (1456 commits noted in release).
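
Teams tracking the new default build may want a quick check of whether an environment meets the versions cited above (torch 2.9.0, CUDA 12.9.1). The following is a rough stdlib-only sketch under stated assumptions: `parse_version` and `meets_baseline` are hypothetical helpers, not part of vLLM or PyTorch, and the parser deliberately ignores pre-release and local-build suffixes.

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract leading numeric components, e.g. '2.9.1+cu129' -> (2, 9, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def _pad(version: Tuple[int, ...], width: int) -> Tuple[int, ...]:
    """Right-pad with zeros so '2.9' compares equal to '2.9.0'."""
    return version + (0,) * (width - len(version))


def meets_baseline(installed: str, baseline: str) -> bool:
    """True if the installed version is at least the baseline version."""
    have, want = parse_version(installed), parse_version(baseline)
    width = max(len(have), len(want))
    return _pad(have, width) >= _pad(want, width)


# Baselines as cited in this report; the installed strings are examples.
print(meets_baseline("2.9.1+cu129", "2.9.0"))  # True
print(meets_baseline("12.8", "12.9.1"))        # False
```

In a real environment, the installed strings could come from `torch.__version__` and `torch.version.cuda`. A production check should use a full version parser that also understands pre-release tags.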

[sgl-project/sglang]

Key Activity:
- Rapid release cycle for the “Model Gateway” component (v0.2.1 - v0.2.3).
- Core engine update to v0.5.5 supporting video generation.
Details:
- [2025-11-17] 🚨 RELEASE: gateway-v0.2.3: Added PostgreSQL support for chat history and Bucket Mode Routing.
- [2025-11-06] 🚨 RELEASE: v0.5.5: Day 0 support for Kimi-K2 and Minimax-M2. Native video generation support.
Metrics: 0 New PRs recorded (repo data may be snapshot-based; release notes indicate high activity).

[llm-d/llm-d]

Key Activity:
- Release of v0.4.0 featuring a “Midstreamed” vLLM image.
- Added CPU, AWS, and XPU specific images.
Details:
- [2025-11-26] 🚨 RELEASE: v0.4.0: Major component update including llm-d-inference-scheduler and llm-d-routing-sidecar.
- [2025-11-06] 🚨 RELEASE: v0.3.1: Added ARM support and refactored image builds.
Metrics: 0 New PRs recorded (High release activity).

[THUDM/slime]

Key Activity:
- Major release enabling RLHF training at scale.
Details:
- [2025-11-28] 🚨 RELEASE: v0.2.0: FSDP Backend, PPO Support, and MTP Training.
Metrics: 0 New PRs recorded.

JAX Ecosystem

[jax-ml/jax]

Key Activity:
- Feature release focusing on JIT improvements and linear algebra.
Details:
- [2025-11-18] 🚨 RELEASE: jax-v0.8.1: jax.jit now supports the decorator factory pattern. New option for selecting the eigh implementation.
Metrics: 0 New PRs recorded.
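
The decorator factory pattern means a decorator can be applied bare or called with keyword options first, without wrapping it in `functools.partial` (the older idiom for passing options to `jax.jit`). The sketch below shows the pattern in pure Python. `traced` is a hypothetical stand-in written for this report, not `jax.jit`, and it says nothing about how JAX implements the feature.

```python
import functools


def traced(func=None, *, label="call"):
    """Hypothetical decorator usable as @traced or as @traced(label=...)."""
    if func is None:
        # Called with options only: return a decorator awaiting the function.
        return functools.partial(traced, label=label)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls.append(label)  # record each invocation under its label
        return func(*args, **kwargs)

    wrapper.calls = []
    return wrapper


@traced  # bare form
def add(a, b):
    return a + b


@traced(label="scaled")  # factory form: options first, then the function
def scale(x, factor):
    return x * factor


result = add(1, 2)
scaled = scale(3, factor=2)
print(result, add.calls)    # 3 ['call']
print(scaled, scale.calls)  # 6 ['scaled']
```

Both forms go through the same function, which lets a library add the factory form without breaking existing bare usage.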

[AI-Hypercomputer/maxtext]

Key Activity:
- Maturation of the library with the first PyPI release.
Details:
- [2025-11-06] 🚨 RELEASE: maxtext-v0.1.0: First PyPI package release.
Metrics: 0 New PRs recorded.

NVIDIA & Competitor Ecosystem

[NVIDIA/TransformerEngine]

Key Activity:
- Preparation for Blackwell architecture.
Details:
- [2025-11-11] 🚨 RELEASE: v2.9: Added FP8 block scaling for NVIDIA Blackwell (SM100).
Metrics: 0 New PRs recorded.
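
As background on block scaling: instead of one scale factor per tensor, each block of values gets its own scale, so a single outlier only costs precision inside its block. The sketch below illustrates that idea in plain Python. It is not TransformerEngine's API or kernel design: the block size and helper names are hypothetical, and the final rounding to FP8 is omitted. The only fixed fact used is that 448 is the largest finite magnitude in the FP8 E4M3 format.

```python
from typing import List, Tuple

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3


def block_scale(values: List[float],
                block_size: int) -> Tuple[List[float], List[float]]:
    """Scale each block so its largest magnitude maps to FP8_E4M3_MAX."""
    scaled: List[float] = []
    scales: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        # An all-zero block keeps a neutral scale to avoid dividing by zero.
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
        scales.append(scale)
        scaled.extend(v / scale for v in block)
    return scaled, scales


def unscale(scaled: List[float], scales: List[float],
            block_size: int) -> List[float]:
    """Undo block_scale by multiplying each value by its block's scale."""
    return [v * scales[i // block_size] for i, v in enumerate(scaled)]


# Hypothetical data: a small-valued block next to a large-valued block.
values = [0.5, -1.0, 896.0, 448.0]
scaled, scales = block_scale(values, block_size=2)
restored = unscale(scaled, scales, block_size=2)

print(scales)    # one scale per block
print(restored)  # approximately the original values
```

With a single per-tensor scale, the first block would be squeezed into a small part of the FP8 range. Per-block scales let both blocks use the full range.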

[facebookresearch/xformers]

Key Activity:
- Support for newer hardware and Flash Attention versions.
Details:
- [2025-11-12] 🚨 RELEASE: v0.0.33: Added a CUTLASS FMHA op for Blackwell GPUs.
Metrics: 0 New PRs recorded.