**FEATURE STATE:** Kubernetes v1.36 [alpha] (disabled by default)

Workload-aware preemption introduces a preemption mechanism designed specifically for PodGroups. When a PodGroup cannot be scheduled, the scheduler uses preemption logic that tries to make scheduling of that PodGroup possible. This logic is used exclusively during PodGroup scheduling and replaces the default preemption mechanism for the pods of that PodGroup.
When this feature is enabled, the scheduler treats the PodGroup as a single preemptor unit, rather than evaluating the pods of a PodGroup in isolation. To make room for the pending pods in the group, it searches for victims across the entire cluster, and it can select and preempt other PodGroups as victims according to their disruption modes.
This feature depends on Gang Scheduling and the Workload API. Ensure that the `GenericWorkload` and `GangScheduling` feature gates and the `scheduling.k8s.io/v1alpha2` API group are enabled in the cluster.
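As a sketch, enabling the prerequisites above typically means passing the standard `--feature-gates` and `--runtime-config` flags to the control plane components; the exact mechanism (static Pod manifests, kubeadm configuration, and so on) depends on how your cluster is deployed:

```shell
# Enable the feature gates on the kube-scheduler
# (in a kubeadm cluster, edit the kube-scheduler static Pod manifest instead):
kube-scheduler --feature-gates=GenericWorkload=true,GangScheduling=true

# Enable the alpha API group on the kube-apiserver:
kube-apiserver --runtime-config=scheduling.k8s.io/v1alpha2=true \
               --feature-gates=GenericWorkload=true,GangScheduling=true
```

The flag names are the standard Kubernetes component flags; verify the gate names against the feature gates reference for your cluster version before relying on them.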
The workload-aware preemption process follows the same principles as default preemption, with a few differences:
- **Cluster-wide domain:** Instead of evaluating preemption node by node, the scheduler evaluates the entire cluster as a single domain. It selects a set of victims across multiple nodes that can be removed to make enough room for the preemptor PodGroup to be scheduled.
- **Victim importance hierarchy:** The scheduler decides which preemption units (individual pods or PodGroups) are more critical and should be spared from preemption using a strict hierarchy.
- **Pod group priority and disruption:** The scheduler considers the specific priority and disruption mode of a PodGroup to evaluate if and how its pods can be preempted during preemption events, as expressed by the `priority` and `disruptionMode` fields of that PodGroup.
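To make the last point concrete, the fragment below sketches where the `priority` and `disruptionMode` fields might sit on a PodGroup. The kind, field placement, and placeholder values are illustrative assumptions only; the Workload API is alpha, so consult the `scheduling.k8s.io/v1alpha2` API reference for the actual schema and the allowed disruption modes.

```yaml
# Illustrative sketch only -- field placement and values are assumptions,
# not the documented schema of the alpha Workload API.
apiVersion: scheduling.k8s.io/v1alpha2
kind: Workload
metadata:
  name: example-workload
spec:
  podGroups:
  - name: example-group
    priority: 1000                      # assumed: relative importance when the scheduler picks victims
    disruptionMode: "<disruption-mode>" # placeholder: set to one of the documented disruption modes
```

With fields like these, a higher-priority PodGroup would be spared from preemption in favor of lower-priority units, and its disruption mode would govern whether and how its pods may be evicted at all.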