#
Heimdall scheduler
Heimdall is the component that performs smart routing and scheduling across multiple inference pods, deciding which pod each request should be sent to. It is implemented according to the Kubernetes Gateway API Inference Extension and can operate together with various gateway controllers.
Heimdall's behavior can be flexibly defined by composing multiple scheduling profiles, filters, scorers, and pickers as plugins. The MoAI Inference Framework provides a wide range of plugins — from advanced plugins that enable fully automated routing and scheduling to low-level plugins that allow fine-grained customization.
#
Architecture
#
Control flow
sequenceDiagram
participant Gateway
participant Inference-pod as Inference pod
participant Heimdall
participant Scheduling-profile-1 as Scheduling profile 1
participant Scheduling-profile-2 as Scehduling profile 2
Gateway ->>+ Heimdall: Request
loop
Note over Heimdall: Profile handler plugin
alt If the scheduling profile 1 is picked
Heimdall ->>+ Scheduling-profile-1: Request
Note over Scheduling-profile-1: Filter plugins
Note over Scheduling-profile-1: Scorer plugins
Note over Scheduling-profile-1: Picker plugin
Scheduling-profile-1 -->>- Heimdall: Routing candidates
else If the scheduling profile 2 is picked
Heimdall ->>+ Scheduling-profile-2: Request
Note over Scheduling-profile-2: Filter plugins
Note over Scheduling-profile-2: Scorer plugins
Note over Scheduling-profile-2: Picker plugin
Scheduling-profile-2 -->>- Heimdall: Routing candidates
end
end
Note over Heimdall: Pre-request plugins
Heimdall -->>- Gateway: Updated request and routing information
Gateway ->>+ Inference-pod: Request
Inference-pod -->>- Gateway: Response
Gateway ->>+ Heimdall: Response
Note over Heimdall: Post-response plugins
Heimdall -->>- Gateway: Updated response
#
Plugins
MoAI Inference Framework supports six categories of plugins — profile handler, pre-request, post-response, filter, scorer, and picker plugins. Users need to instantiate and activate some of the plugins to define the behavior of Heimdall.
#
Profile handler
#
Filter
#
Scorer
#
Picker
#
Configuration
global:
imagePullSecrets:
- name: moreh-registry
config:
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: EndpointPickerConfig
plugins:
- type: pd-profile-handler
- type: prefill-filter
- type: decode-filter
- type: queue-scorer
- type: kv-cache-utilization-scorer
- type: max-score-picker
schedulingProfiles:
- name: prefill
plugins:
- pluginRef: prefill-filter
- pluginRef: queue-scorer
- pluginRef: kv-cache-utilization-scorer
- pluginRef: max-score-picker
- name: decode
plugins:
- pluginRef: decode-filter
- pluginRef: queue-scorer
- pluginRef: kv-cache-utilization-scorer
- pluginRef: max-score-picker
gateway:
name: mif
gatewayClassName: istio
serviceMonitor:
labels:
release: prometheus-stack
#
config section
The config section defines Heimdall's routing rules. As explained earlier, this is achieved by appropriately combining and configuring multiple plugins.
This section begins with config.apiVersion and config.kind, which should be set to inference.networking.x-k8s.io/v1alpha1 and EndpointPickerConfig, respectively.
The config.plugins list instantiates the plugins to be used, specifying their respective parameters. In most cases, a single instance per plugin is sufficient, but you may instantiate the same plugin multiple times under different names if you need to apply different parameters. Each plugin instance is defined by an entry of the following form:
- name: <instanceName>
type: <pluginName>
parameters:
<parameter>: <value>
name(optional): The name used to reference the plugin instance in scheduling profiles. If omitted, the plugin'stypeis used as its name.type(required): The predefined name of the plugin to instantiate.parameters(optional): Configuration options specific to this plugin instance.
The config.schedulingProfiles combines zero or more filter and scorer plugins with exactly one picker plugin to define a rule for making routing decisions. Each scheduling profile is defined by an entry of the following form:
- name: <profileName>
plugins:
- pluginRef: <instanceName>
weight: <weight>
name(required): The name of the scheduling profile.plugins(required): A list of filter, scorer, and picker plugins that make up the scheduling profile.pluginRef(required): The name of the plugin instance as defined in theconfig.pluginssection.weight(optional, for scorer plugins only): A numerical weight that influences the scoring outcome. Higher weights increase the impact of the scorer on the final decision. If omitted, a default weight of 1 is applied.