Highlights of SGLang at NVIDIA GTC 2026
SGLang came to NVIDIA GTC 2026 with panels, a happy hour, a 200-person meetup, and a hands-on training lab. Three days, five events, one packed week at the center of the LLM ecosystem, and we left with a lot to share. If you missed it, here's the full recap.
SGLang at GTC 2026: five events, three days.
At the Main Conference
SGLang Featured in the GTC Keynote
SGLang was featured on the NVIDIA AI ecosystem slide during Jensen Huang's GTC keynote. We are honored to be recognized as part of the infrastructure stack behind AI-native applications.
SGLang on NVIDIA's AI ecosystem slide during the GTC 2026 keynote.
Open-Source AI Panel at GTC
On Tuesday, Ying Sheng joined the GTC panel "The State of Open-Source AI" alongside Vartika Singh (Strategic AI Lead, NVIDIA), Jonathan Cohen (VP of Applied Research, NVIDIA), Ion Stoica (Professor, EECS, UC Berkeley), Jeff Boudier (VP of Product, Hugging Face), and Ranjay Krishna (Director of Multimodal and Embodied AI, Ai2).
The panel examined open-source AI's growing role as the primary R&D engine for sophisticated AI systems: what makes open ecosystems trustworthy, scalable, and production-ready, and the community infrastructure enabling reproducible, auditable research.
Ying Sheng (second from left) on the "State of Open-Source AI" panel at GTC 2026.
🎬 Watch the recording on NVIDIA On-Demand
SGLang Training Lab at GTC 2026
On Thursday morning, the RadixArk team led an official GTC training lab: "High-Performance LLM Serving and Training with SGLang".
The lab covered three areas:
- Performance tuning with the SGLang Cookbook: practical techniques for improving serving throughput and latency in real deployments (a minimal illustration follows at the end of this section)
- Profiling and bottleneck analysis: a developer-oriented walkthrough of identifying and resolving performance bottlenecks in LLM serving systems
- SGLang × Miles RL integration: a live demonstration of running SGLang as the inference backend inside a real RL training loop using the Miles framework
The SGLang Training Lab at GTC 2026: hands-on LLM performance tuning and RL training.
🎬 Watch the full recording on NVIDIA On-Demand
📁 Download the training lab materials
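For readers who could not attend, the sketch below illustrates the kind of serving knobs the tuning portion of the lab is concerned with, expressed through SGLang's offline Engine API. The model name and parameter values are our own illustrative assumptions, not recommendations from the lab materials.

```python
# Minimal sketch (not from the lab materials): a few common SGLang serving knobs,
# set through the offline Engine API. Values here are illustrative assumptions.
import sglang as sgl

llm = sgl.Engine(
    model_path="meta-llama/Llama-3.1-8B-Instruct",  # any supported model
    tp_size=2,                  # tensor parallelism across 2 GPUs
    mem_fraction_static=0.85,   # fraction of GPU memory reserved for weights + KV cache
    chunked_prefill_size=8192,  # bound prefill chunks so long prompts don't stall decoding
)

outputs = llm.generate(
    ["Explain radix-tree KV cache reuse in one sentence."],
    {"temperature": 0.0, "max_new_tokens": 64},
)
print(outputs[0]["text"])
llm.shutdown()
```

The same knobs are also available as command-line flags on `sglang.launch_server` when running an online serving endpoint.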
Side Events
SGLang × RadixArk GTC Happy Hour
On Tuesday evening, SGLang and RadixArk co-hosted a GTC Happy Hour that brought together builders, researchers, and founders from across the inference and training ecosystem, including friends from OpenAI, xAI, DeepMind, Meta, NVIDIA, Ollama, and more.
SGLang × RadixArk Happy Hour.
The evening featured two technical spotlights:
- Banghua Zhu (RadixArk) introduced RadixArk and Miles, SGLang's native RL training framework purpose-built for large-scale MoE post-training workloads.
- Jason Zhao (ScitiX) presented SiMM, an open-source in-memory KV cache engine integrated with SGLang for long-context serving.
Banghua introducing RadixArk and the Miles RL framework.
Thank you to Z Potentials and ScitiX for sponsoring the event and making it possible.
Banghua at Novita's GTC Event
Banghua Zhu joined Novita's GTC event, which drew over 700 attendees. The discussion covered Jensen Huang's remarks on the inflection point between inference cost and demand, the key drivers behind the agentic AI movement, and what it takes for AI products to deliver real value. Banghua shared his perspective on how SGLang is shaping the future of inference infrastructure, enabling next-generation use cases from OpenClaw to agentic inference, and driving the evolution of open models and open infrastructure.
Banghua presenting at Novita's GTC event.
Partners represented included NVIDIA, RadixArk, OpenRouter, Google DeepMind, Kimi (Moonshot AI), Alibaba Cloud, MiniMax, Z.ai, Hugging Face, and Kilo Code.
LinkedIn × SGLang Meetup: LLMs for Search & Recommendation
On Wednesday evening, we hosted approximately 200 engineers at LinkedIn's Mountain View headquarters alongside teams from LinkedIn, TikTok, Meta, and NVIDIA for a deep dive into production LLM systems for search and recommendation.
SGLang swag at the LinkedIn meetup.
LinkedIn Engineering Talks
LinkedIn opened with three engineering presentations:
- Fedor Borisyuk: Semantic search at scale
- Zhipeng Wang: Modeling optimizations for LLM-driven ranking
- Sundara Raman Ramachandran: LLM inference infrastructure optimizations, including a prefill-only serving path delivering 2–3× throughput gains on H100s, upstreamed back to SGLang
LinkedIn engineers presenting on semantic search, ranking, and inference infrastructure.
Relevant work from LinkedIn's engineering team: [1] [2] [3] [4] [5] [6] [7]
SGLang: Roadmap and Miles Framework
SGLang core developer Liangsheng Yin walked through SGLang's H1 2026 roadmap.
Mao Cheng then presented the Miles RL framework, addressing training–inference mismatch in production through three core techniques:
- Importance sampling corrections: compensating for distribution shift between training and inference (sketched below)
- Inference-training alignment: ensuring consistency between rollout behavior and gradient updates
- Rollout Routing Replay (R3): replaying rollout-time MoE routing decisions during training so expert routing stays consistent between inference and gradient updates
Mao Cheng presenting the Miles RL framework and its approach to training–inference alignment.
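To make the first technique concrete, here is a minimal, hypothetical sketch of a token-level importance sampling correction. It shows the general idea of reweighting the policy-gradient loss by the training-to-rollout probability ratio; the function name and details are our own illustration, not Miles's actual implementation.

```python
# Hypothetical sketch of a truncated importance sampling correction for
# training-inference mismatch; a generic illustration, not Miles code.
import torch

def is_corrected_pg_loss(train_logprobs, rollout_logprobs, advantages, clip=2.0):
    """Policy-gradient loss reweighted by per-token importance ratios.

    train_logprobs:   log prob of each sampled token under the training policy
    rollout_logprobs: log prob recorded by the inference engine at rollout time
    advantages:       per-token advantage estimates
    """
    # Importance weight = pi_train / pi_rollout, computed in log space for stability.
    ratio = torch.exp(train_logprobs - rollout_logprobs)
    # Truncate heavy-tailed ratios so a few mismatched tokens cannot dominate the update.
    ratio = torch.clamp(ratio, max=clip).detach()
    # REINFORCE-style surrogate: the detached weight corrects for sampling from the
    # inference-time distribution rather than the current training distribution.
    return -(ratio * advantages * train_logprobs).mean()
```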
Industry Speakers
- Hongyu Lu (TikTok): LLM search at scale
- Luke Simon and Xi Liu (Meta): Generative Reasoning Reranker [paper link]
- Anish Maddipoti (NVIDIA): Dynamo + NeMo RL
Panel Discussion
The closing panel, hosted by Qing Lan, featured Wenfeng Zhuo, Fedor Borisyuk, Luke Simon, and Mao Cheng. Topics included:
- Semantic ID vs. embedding retrieval
- Whether unified retrieval + ranking (OneRec-style systems) is production-ready
- Inference and training challenges in LLM recsys
- Recent breakthroughs accelerating LLM adoption for recommendations
- The role of continuous learning in production recommendation systems
The closing panel: Wenfeng Zhuo, Fedor Borisyuk, Luke Simon, and Mao Cheng, moderated by Qing Lan.
This is exactly the kind of collaboration that will define the next generation of recommendation systems: production teams and open-source infrastructure co-evolving.
Looking Ahead
GTC 2026 made clear how much the production ecosystem is converging around open-source infrastructure. From semantic search at LinkedIn scale to RL post-training for frontier MoE models, SGLang is increasingly the shared layer underneath.
We'll keep building in the open. Follow our Luma calendar for future meetups, office hours, and community events.