
Why GPU Scheduling is Non-Negotiable in 2026: KubeCon EU Highlights

May 11, 2026


KubeCon EU 2026, held in Amsterdam, was the largest edition to date, with over 13,500 attendees. The dominant theme this year was clear: AI workloads now run on Kubernetes at scale.

Among all the announcements, one stood out as the most significant for infrastructure operators: NVIDIA's donation of its GPU scheduling projects to the CNCF.


Why NVIDIA Open-Sourced Its GPU Scheduling Tools

At KubeCon EU 2026, NVIDIA donated three core projects to the CNCF (Cloud Native Computing Foundation), the nonprofit foundation that hosts and governs cloud native open source projects, including Kubernetes itself. Donating a project to the CNCF signals an intent to turn vendor-controlled technology into an industry-wide standard under neutral governance.

The three donated projects:

1) GPU DRA (Dynamic Resource Allocation) Driver. A driver that lets Kubernetes allocate GPUs, including partial GPUs, to multiple containers through the Dynamic Resource Allocation API. Previously, the standard approach was to assign an entire GPU to a single container. DRA changes that model.

2) KAI Scheduler. A scheduler that automatically places GPU workloads optimally, determining how much GPU to allocate to each task and how to prioritize them. It uses a time-slicing based GPU sharing approach.

3) Grove. A Kubernetes API for orchestrating multi-component AI inference workloads, from single-pod deployments to multi-node systems.

The announcement carries one core message: GPU infrastructure management is becoming an official standard within the Kubernetes ecosystem.



80% of AI Workloads Now Run on Kubernetes

A striking figure was shared during one of the KubeCon sessions: approximately 80% of AI workloads are currently managed on Kubernetes.

Regardless of the exact number, the direction is clear. AI model training, inference, and serving are all happening on Kubernetes container environments, and this trend is accelerating.

The challenge is that Kubernetes was never designed to manage GPUs. It handles CPU and memory natively, but fine-grained GPU control is not built in: through the standard device plugin mechanism, a GPU is requested as a whole, integer-counted resource. This is why GPU scheduling, GPU partitioning, and multi-tenant isolation require separate solutions.

The tools NVIDIA donated are designed to fill exactly this gap.
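
To make the whole-GPU limitation concrete, here is a minimal Python sketch that builds a Pod manifest requesting GPUs through the `nvidia.com/gpu` resource exposed by NVIDIA's device plugin. The helper name `gpu_pod_manifest`, the Pod name, and the image are illustrative assumptions, not taken from any announcement. The point is that this resource is only accepted as a whole number, so a request for half a GPU cannot be expressed this way.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int) -> dict:
    """Build a Pod manifest that requests whole GPUs (hypothetical helper)."""
    # Kubernetes extended resources such as nvidia.com/gpu must be whole
    # numbers; a fractional request like 0.5 is not valid.
    if not isinstance(gpus, int) or gpus < 1:
        raise ValueError("nvidia.com/gpu must be a positive whole number")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,  # placeholder image name
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ]
        },
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest("inference", "example.com/model-server:latest", 1)
    print(json.dumps(manifest, indent=2))
```

A small inference service that needs only a fraction of a GPU still occupies the entire device under this model, which is the gap that partial allocation and GPU-aware scheduling aim to close.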


The Inflection Point: Inference Overtakes Training

The projections shared during KubeCon keynotes are also worth noting. In 2023, roughly two-thirds of AI compute demand was concentrated on training. By the end of 2026, this ratio is expected to reverse, with inference accounting for the larger share.

Inference has fundamentally different characteristics from training. Training is a bounded, intensive effort, whereas inference runs continuously, every time an AI thinks, decides, or acts. As agentic AI expands, this demand runs 24/7 without stopping.

What this transition means for infrastructure operations is straightforward: fixed allocation, manual scheduling, and opaque usage tracking cannot sustain GPU infrastructure in the inference era.



GPU Scheduling Is No Longer a Nice-to-Have. It's Essential Infrastructure.

If you had to summarize KubeCon EU 2026's message in one line, it would be this: GPU scheduling and multi-tenant isolation are no longer optional. They are essential infrastructure for every organization running AI workloads on Kubernetes.

That said, a reality check also emerged from the event. While demos and prototypes were plentiful, the consensus was that organizations with production-ready, reliable GPU operations setups were still a minority.

The tools NVIDIA donated provide foundational, primitive-level functionality. The enterprise-grade capabilities actually needed in production, such as fine-grained resource isolation, accurate billing, multi-cluster management, and root cause analysis for failures, must be built on top of these foundations.



Where TEN's AIPub Fits in This Landscape

TEN's AIPub is a Kubernetes-based AI workload orchestration platform that already addresses the problems discussed at KubeCon at a production-ready, commercial level.

GPU 100-Block Spatial Partitioning. While NVIDIA's DRA driver enables partial GPU allocation, AIPub goes further by splitting a single GPU into up to 100 blocks. Using spatial partitioning that separates both cores and memory, AIPub achieves 8.15x lower interference compared to time-slicing.
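
As a rough illustration of the bookkeeping behind block-based partitioning, the toy model below treats one GPU as 100 blocks and hands out non-overlapping shares to containers. The class `PartitionedGPU` and its methods are hypothetical names invented for this sketch; it does not represent AIPub's actual API or how isolation of cores and memory is enforced on the hardware.

```python
from dataclasses import dataclass, field

TOTAL_BLOCKS = 100  # per the article: one GPU is split into up to 100 blocks


@dataclass
class PartitionedGPU:
    """Toy model of one GPU divided into fixed-size blocks (hypothetical)."""

    allocations: dict[str, int] = field(default_factory=dict)

    @property
    def free_blocks(self) -> int:
        return TOTAL_BLOCKS - sum(self.allocations.values())

    def allocate(self, container: str, blocks: int) -> bool:
        """Reserve blocks for a container; refuse if the request cannot fit."""
        if blocks < 1 or container in self.allocations:
            return False
        if blocks > self.free_blocks:
            return False
        self.allocations[container] = blocks
        return True

    def release(self, container: str) -> int:
        """Return a container's blocks to the pool and report how many."""
        return self.allocations.pop(container, 0)


if __name__ == "__main__":
    gpu = PartitionedGPU()
    gpu.allocate("chatbot", 30)
    gpu.allocate("embedding", 70)
    print(gpu.allocations, "free:", gpu.free_blocks)
```

Because each share is a fixed slice rather than a turn on the whole device, the sum of allocations can never exceed the GPU, which is the property that distinguishes spatial partitioning from time-slicing in this simplified view.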

In-House Developed Scheduler. While KAI Scheduler supports time-slicing based GPU sharing, AIPub's scheduler is optimized for spatial partitioning-based GPU management. Its priority-based sequential processing algorithm is designed to prevent jobs from stalling in pending or deadlocked states. Operators can directly change priorities and force resource reclamation.
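
The general idea of priority-based sequential processing can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not AIPub's algorithm: the names `Job`, `PriorityScheduler`, `reprioritize`, and `reclaim` are invented here, capacity is counted in blocks of a single GPU, and a lower priority number means more important. Jobs start strictly in priority order and receive all of their blocks at once or none at all, so no job holds a partial allocation while waiting for more.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int                       # lower number = more important
    seq: int                            # submission order breaks ties
    name: str = field(compare=False)
    blocks: int = field(compare=False)


class PriorityScheduler:
    """Toy strict-priority scheduler over a pool of GPU blocks (hypothetical)."""

    def __init__(self, capacity: int = 100):
        self.capacity = capacity
        self.running: dict[str, Job] = {}
        self._queue: list[Job] = []
        self._seq = itertools.count()

    def free(self) -> int:
        return self.capacity - sum(j.blocks for j in self.running.values())

    def submit(self, name: str, blocks: int, priority: int) -> None:
        # Reject jobs that could never fit, so the queue head cannot block forever.
        if not 1 <= blocks <= self.capacity:
            raise ValueError("blocks must be between 1 and capacity")
        heapq.heappush(self._queue, Job(priority, next(self._seq), name, blocks))

    def reprioritize(self, name: str, priority: int) -> None:
        """Operator action: change the priority of a queued job."""
        for job in self._queue:
            if job.name == name:
                job.priority = priority
        heapq.heapify(self._queue)

    def reclaim(self, name: str) -> None:
        """Operator action: forcibly stop a running job and free its blocks."""
        self.running.pop(name, None)

    def schedule(self) -> list[str]:
        """Start queued jobs in priority order; stop at the first that does not fit."""
        started = []
        while self._queue and self._queue[0].blocks <= self.free():
            job = heapq.heappop(self._queue)
            self.running[job.name] = job
            started.append(job.name)
        return started
```

Stopping at the first job that does not fit is a deliberate trade-off in this sketch: smaller low-priority jobs cannot slip ahead and starve a large high-priority job, at the cost of leaving some blocks idle until an operator reclaims resources or a running job finishes.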

Enterprise-Grade Monitoring. From data center level to container level, AIPub monitors the entire AI infrastructure in real time using over 40 proprietary metrics. It also supports usage-based billing and chargeback to enable a complete FinOps framework.
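
Usage-based chargeback ultimately reduces to simple arithmetic over metered usage. The sketch below assumes usage is recorded as (project, blocks, hours) tuples and billed at a flat rate per block-hour; the function name `chargeback`, the record layout, and the rate are all hypothetical and chosen only to show the calculation.

```python
from collections import defaultdict
from decimal import Decimal


def chargeback(usage, rate_per_block_hour: Decimal) -> dict:
    """Sum cost per project from (project, blocks, hours) usage records."""
    totals = defaultdict(Decimal)  # Decimal() starts each project at 0
    for project, blocks, hours in usage:
        # cost = blocks held x hours held x price of one block for one hour
        totals[project] += Decimal(blocks) * Decimal(str(hours)) * rate_per_block_hour
    return dict(totals)


if __name__ == "__main__":
    records = [
        ("search", 25, 2),      # 25 blocks for 2 hours
        ("vision", 50, 1.5),    # 50 blocks for 1.5 hours
        ("search", 10, 1),      # 10 blocks for 1 hour
    ]
    for project, cost in chargeback(records, Decimal("0.02")).items():
        print(project, cost)
```

Using `Decimal` rather than floating point keeps the totals exact, which matters once these figures feed invoices or internal cost reports.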

RBAC (Role-Based Access Control). Access control extends beyond GPU nodes to container images and storage volumes, with fine-grained permissions configurable per project and per user.


The Question for Infrastructure Teams in 2026

KubeCon EU 2026 confirmed the direction of GPU infrastructure operations.

In an era where GPU scheduling is becoming a Kubernetes standard and inference demand is overtaking training, it's time to assess whether your organization's GPU infrastructure is prepared for this transition.

How is your organization's GPU scheduling currently operated?

👉 Learn more about AIPub

📩 Talk to our team