# Prerequisites
This document introduces the prerequisites for the MoAI Inference Framework and provides instructions on how to install them.

To follow this document, you need to understand the configuration of the Kubernetes cluster where the MoAI Inference Framework will be installed. Moreh provides support for installing the MoAI Inference Framework at customer sites, so if you encounter any difficulties, you can request assistance from the Moreh team.
## Target system
To install the MoAI Inference Framework, you must have:

- Kubernetes 1.26 or later
- At least one worker node equipped with accelerators supported by the MoAI Inference Framework (e.g., AMD GPUs)
- `cluster-admin` privilege for the Kubernetes cluster
- A StorageClass defined in the Kubernetes cluster (required for storing the monitoring metrics)
- A Docker private registry accessible from the Kubernetes cluster
## Monitoring components

For the monitoring features of the MoAI Inference Framework, you need to install Prometheus, the Prometheus Operator, Node Exporter, and Grafana using the kube-prometheus-stack Helm chart. First, add the Prometheus Community Helm chart repository.
```sh
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update prometheus-community
```
Create a `prometheus-stack-values.yaml` file as follows.

- The Prometheus stack installs many components by default, but the configuration shown below disables the unnecessary ones to achieve a minimal installation.
- To create a volume for storing the metrics collected by Prometheus, replace `<storageClassName>` with the name of your own StorageClass.
Tip
You can check the name of your StorageClass using the kubectl get sc command.
```yaml
defaultRules:
  create: false
windowsMonitoring:
  enabled: false
alertmanager:
  enabled: false
grafana:
  enabled: true
kubernetesServiceMonitors:
  enabled: true
kubeApiServer:
  enabled: false
kubelet:
  enabled: true
kubeControllerManager:
  enabled: false
coreDns:
  enabled: false
kubeDns:
  enabled: false
kubeEtcd:
  enabled: false
kubeScheduler:
  enabled: false
kubeProxy:
  enabled: false
kubeStateMetrics:
  enabled: true
nodeExporter:
  enabled: true
prometheusOperator:
  enabled: true
  tls:
    enabled: false
prometheus:
  enabled: true
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: "<storageClassName>"
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi
```
Install the Prometheus stack.
```sh
helm upgrade -i prometheus-stack prometheus-community/kube-prometheus-stack \
  --version 77.11.1 \
  -n prometheus-stack \
  --create-namespace \
  -f prometheus-stack-values.yaml
```
You can verify that the Prometheus stack pods are running using the following command. (Node Exporter runs as a DaemonSet, so the number of its pods depends on the number of nodes in your cluster.)
```sh
kubectl get pods -n prometheus-stack
```

```
NAME                                                   READY   STATUS    RESTARTS   AGE
prometheus-prometheus-stack-kube-prom-prometheus-0     2/2     Running   0          96s
prometheus-stack-grafana-575db48fc9-t8m5z              3/3     Running   0          107s
prometheus-stack-kube-prom-operator-7c4fc9bf49-zd625   1/1     Running   0          107s
prometheus-stack-kube-state-metrics-76f45dd6c7-d76nx   1/1     Running   0          107s
prometheus-stack-prometheus-node-exporter-5hsmv        1/1     Running   0          107s
```
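As an optional check, you can port-forward the Grafana service and open the dashboard in a browser. The service name below is an assumption based on the kube-prometheus-stack naming convention for the `prometheus-stack` release; confirm it with `kubectl get svc -n prometheus-stack` if it differs in your cluster.

```sh
# Forward the (assumed) Grafana service to localhost:3000,
# then open http://localhost:3000 in a browser.
kubectl port-forward -n prometheus-stack svc/prometheus-stack-grafana 3000:80
```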
## AMD GPU operator
This section describes how to set up the AMD GPU Operator on a Kubernetes cluster. See AMD GPU Operator / Kubernetes (Helm) for more details.
### Certification
The AMD GPU Operator requires cert-manager to be installed in the cluster. First, add the Jetstack Helm chart repository.
```sh
helm repo add jetstack https://charts.jetstack.io
helm repo update jetstack
```
Create a cert-manager-values.yaml file as shown below, then install cert-manager using this file.
```yaml
crds:
  enabled: true
```

```sh
helm upgrade -i cert-manager jetstack/cert-manager \
  --version v1.18.3 \
  -n cert-manager \
  --create-namespace \
  -f cert-manager-values.yaml
```
You can verify that the three cert-manager pods are running using the following command.
```sh
kubectl get pods -n cert-manager
```

```
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-74b7f6cbbc-hc587              1/1     Running   0          5m
cert-manager-cainjector-58c9d76cb8-cgx5t   1/1     Running   0          5m
cert-manager-webhook-5875b545cf-7x8tc      1/1     Running   0          5m
```
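As an additional sanity check, you can confirm that the cert-manager CRDs were installed (this relies on `crds.enabled: true` in the values file above):

```sh
# List the CRDs registered by cert-manager, e.g. certificates.cert-manager.io.
kubectl get crd | grep cert-manager.io
```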
### GPU operator installation
Add ROCm's GPU Operator Helm chart repository.

```sh
helm repo add rocm https://rocm.github.io/gpu-operator
helm repo update rocm
```
Create a namespace for the AMD GPU Operator.
```sh
kubectl create namespace amd-gpu
```
During the installation of the AMD GPU Operator, the GPU driver image needs to be built and pushed to a Docker registry. For more details, see AMD GPU Operator / Preparing Pre-compiled Driver Images. The private registry mentioned earlier in the "Target system" section can be used for this purpose.
Create a Docker registry secret in the amd-gpu namespace to enable access to the private registry. Set the <registry>, <username>, and <password> values to the information for your private registry.
```sh
kubectl create secret -n amd-gpu \
  docker-registry private-registry \
  --docker-server=<registry> \
  --docker-username=<username> \
  --docker-password=<password>
```
Then, create a `gpu-operator-values.yaml` file with the following content. Replace `<registry>` on line 7 with the URL of your private registry. You may also change the image name `amdgpu-driver`, if necessary, according to your private registry's policies.
```yaml
deviceConfig:
  spec:
    driver:
      enable: true
      version: "6.4.3"
      blacklist: true
      image: "<registry>/amdgpu-driver"
      imageRegistrySecret:
        name: private-registry
      imageRegistryTLS:
        insecure: false
        insecureSkipTLSVerify: false
      tolerations: &tolerations
        - key: amd.com/gpu
          operator: Exists
          effect: NoSchedule
    devicePlugin:
      devicePluginTolerations: *tolerations
    metricsExporter:
      prometheus:
        serviceMonitor:
          enable: true
          interval: 10s
          labels:
            release: prometheus-stack
      tolerations: *tolerations
node-feature-discovery:
  worker:
    tolerations: *tolerations
```
If the AMD GPU Operator is already installed on your system, verify that the toleration key is set to `amd.com/gpu`; the MoAI Inference Framework assumes this name.
You can install the AMD GPU Operator as follows.
```sh
helm upgrade -i gpu-operator rocm/gpu-operator-charts \
  --version v1.4.0 \
  -n amd-gpu \
  -f gpu-operator-values.yaml
```
Note that installing the operator and GPU driver may take some time. After the installation is complete, you can verify that the gpu-operator pods are running using the following command.
```sh
kubectl get pods -n amd-gpu
```

```
NAME                                                              READY   STATUS    RESTARTS   AGE
default-device-plugin-fxj66                                       1/1     Running   0          108s
default-metrics-exporter-r2l6h                                    1/1     Running   0          108s
default-node-labeller-qhqdl                                       1/1     Running   0          2m35s
gpu-operator-gpu-operator-charts-controller-manager-69856dhd67k   1/1     Running   0          4m20s
gpu-operator-kmm-controller-7b5dd7b48b-fpcv6                      1/1     Running   0          4m20s
gpu-operator-kmm-webhook-server-c7bfc864-tfqdb                    1/1     Running   0          4m20s
gpu-operator-node-feature-discovery-gc-7649c47d5d-55rcn           1/1     Running   0          4m20s
gpu-operator-node-feature-discovery-master-fc889959c-sx7wv        1/1     Running   0          4m20s
gpu-operator-node-feature-discovery-worker-4tnns                  1/1     Running   0          4m20s
```
Tip
You can monitor the installation progress using the kubectl get pods -n amd-gpu -w command instead.
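Once the driver and device plugin are up, each GPU node should advertise an allocatable `amd.com/gpu` resource. The following is a minimal check, assuming the device plugin registers the resource under that name:

```sh
# Show the number of allocatable AMD GPUs per node; "<none>" means the
# device plugin has not (yet) registered the resource on that node.
kubectl get nodes -o custom-columns='NODE:.metadata.name,GPU:.status.allocatable.amd\.com/gpu'
```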
## RDMA device plugin
### Host driver and OFED installation
You need to install the device drivers and OFED software for InfiniBand or RoCE NICs on the host OS. Follow the instructions provided by your hardware vendor.
This must be completed before joining the node to the Kubernetes cluster. By running the following command on the host OS, you can verify that the OFED software has been installed correctly and that it recognizes the NICs. If no devices are shown, there is an issue with the installation.
```sh
ibv_devices
```

```
    device                 node GUID
    ------              ----------------
    <device_name>       <16-hex GUID>
    <device_name>       <16-hex GUID>
    ...
```
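Optionally, you can also confirm that the NIC ports are up before proceeding. This sketch uses `ibv_devinfo` from the same OFED/rdma-core tooling; the exact output format may vary by vendor.

```sh
# Each port of a working NIC should report "state: PORT_ACTIVE".
ibv_devinfo | grep -E 'hca_id|state'
```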
### RDMA device plugin installation
This section describes how to install the rdma-shared-device-plugin. See k8s-rdma-shared-dev-plugin / README for more details.

First, create an `rdma-shared-device-plugin.yaml` file as follows. You need to replace `<device>` on line 21 with your RDMA NIC's network interface name. If multiple NICs are installed on the server, list all of their interface names (e.g., `"devices": ["ib0", "ib1"]`).
Tip
You can check the network interface names using the ip addr command.
```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: rdma-devices
  namespace: kube-system
  labels:
    app.kubernetes.io/name: rdma-shared-device-plugin
    app.kubernetes.io/version: v1.5.2
    app.kubernetes.io/instance: rdma-shared-device-plugin
data:
  config.json: |
    {
      "periodicUpdateInterval": 300,
      "configList": [
        {
          "resourcePrefix": "mellanox",
          "resourceName": "hca",
          "rdmaHcaMax": 1000,
          "devices": [
            "<device>"
          ]
        }
      ]
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: rdma-shared-device-plugin
  namespace: kube-system
  labels:
    app.kubernetes.io/name: rdma-shared-device-plugin
    app.kubernetes.io/version: v1.5.2
    app.kubernetes.io/instance: rdma-shared-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: rdma-shared-device-plugin
      app.kubernetes.io/instance: rdma-shared-device-plugin
  updateStrategy:
    rollingUpdate:
      maxUnavailable: "30%"
  template:
    metadata:
      labels:
        app.kubernetes.io/name: rdma-shared-device-plugin
        app.kubernetes.io/version: v1.5.2
        app.kubernetes.io/instance: rdma-shared-device-plugin
    spec:
      hostNetwork: true
      priorityClassName: system-node-critical
      tolerations:
        - key: amd.com/gpu
          operator: Exists
          effect: NoSchedule
      containers:
        - name: device-plugin
          image: ghcr.io/mellanox/k8s-rdma-shared-dev-plugin:v1.5.2
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
            - name: plugins-registry
              mountPath: /var/lib/kubelet/plugins_registry
            - name: config
              mountPath: /k8s-rdma-shared-dev-plugin
            - name: devs
              mountPath: /dev/
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
        - name: plugins-registry
          hostPath:
            path: /var/lib/kubelet/plugins_registry
        - name: config
          configMap:
            name: rdma-devices
            items:
              - key: config.json
                path: config.json
        - name: devs
          hostPath:
            path: /dev/
```
If the RDMA device plugin is already installed on your system, verify that the resource name is set to `mellanox/hca`; the MoAI Inference Framework assumes this name. This does not imply that the actual hardware vendor must be Mellanox.
Then, create an rdma-shared-device-plugin DaemonSet using the following command.
```sh
kubectl apply -f rdma-shared-device-plugin.yaml
```
You can verify that the rdma-shared-device-plugin pods are running using the following command.
```sh
kubectl get pods -n kube-system -l app.kubernetes.io/instance=rdma-shared-device-plugin
```

```
NAME                              READY   STATUS    RESTARTS   AGE
rdma-shared-device-plugin-wh9fz   1/1     Running   0          7s
```
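You can also confirm that the nodes now advertise the shared RDMA resource configured above (`mellanox/hca` with a maximum of 1000 allocations per node):

```sh
# Each node running the plugin should list the resource under Capacity
# and Allocatable, e.g. "mellanox/hca: 1000".
kubectl describe nodes | grep 'mellanox/hca'
```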
## Gateway
Add the Gateway API and Gateway API Inference Extension CRDs.
```sh
kubectl apply --server-side -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.3.0/standard-install.yaml
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v1.1.0/manifests.yaml
```
You can verify the CRDs are installed using the following command.
```sh
kubectl get crd | grep 'networking.k8s.io'
```

```
gatewayclasses.gateway.networking.k8s.io     2025-12-12T02:03:07Z
gateways.gateway.networking.k8s.io           2025-12-12T02:03:07Z
grpcroutes.gateway.networking.k8s.io         2025-12-12T02:03:07Z
httproutes.gateway.networking.k8s.io         2025-12-12T02:03:07Z
inferencepools.inference.networking.k8s.io   2025-12-12T02:03:08Z
referencegrants.gateway.networking.k8s.io    2025-12-12T02:03:07Z
```
You can use any gateway controller compatible with the Gateway API Inference Extension. We recommend using either Istio or Kgateway, and installation instructions for both are provided below.
To use Istio, add the Istio Helm chart repository.

```sh
helm repo add istio https://istio-release.storage.googleapis.com/charts
helm repo update istio
```
Install the Istio base chart.
```sh
helm upgrade -i istio-base istio/base \
  --version 1.28.1 \
  -n istio-system \
  --create-namespace
```
Create an `istiod-values.yaml` file as follows and install the Istio control plane.

```yaml
pilot:
  env:
    PILOT_ENABLE_ALPHA_GATEWAY_API: "true"
    ENABLE_GATEWAY_API_INFERENCE_EXTENSION: "true"
```

```sh
helm upgrade -i istiod istio/istiod \
  --version 1.28.1 \
  -n istio-system \
  -f istiod-values.yaml
```
Alternatively, to use Kgateway, install the Kgateway CRDs.

```sh
helm upgrade -i kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds \
  --version v2.1.1 \
  -n kgateway-system \
  --create-namespace
```
Create a kgateway-values.yaml file and install the Kgateway controller.
```yaml
inferenceExtension:
  enabled: true
```

```sh
helm upgrade -i kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway \
  --version v2.1.1 \
  -n kgateway-system \
  -f kgateway-values.yaml
```
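Whichever controller you installed, it should register a GatewayClass for Gateways to reference. As a quick check (the class names `istio` and `kgateway` are the defaults created by the respective controllers; verify against your own installation):

```sh
# Expect a GatewayClass named "istio" or "kgateway",
# depending on the controller installed.
kubectl get gatewayclass
```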
## Amazon ECR token for Moreh's container image repository

The container images of the MoAI Inference Framework are distributed through a private repository on Amazon ECR (`255250787067.dkr.ecr.ap-northeast-2.amazonaws.com`). To download them, you need to obtain an authorization token.
The AWS credentials (`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`) should have been provided to you along with your purchase or trial issuance of the MoAI Inference Framework. If you did not receive this information, please contact your point of purchase separately.
You need to have a namespace for deploying and running the MoAI Inference Framework. In this guide, we assume the namespace is named mif.
```sh
kubectl create namespace mif
```
First, store the AWS credentials you received from Moreh or your provider as a Kubernetes Secret.

```sh
kubectl create secret -n mif generic aws-credentials \
  --from-literal=AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
  --from-literal=AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
```
Then, create the following aws-ecr-token-refresher.yaml file and apply it.
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-ecr-token-refresher
  namespace: mif
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: aws-ecr-token-refresher
  namespace: mif
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "delete", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: aws-ecr-token-refresher
  namespace: mif
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: aws-ecr-token-refresher
subjects:
  - kind: ServiceAccount
    name: aws-ecr-token-refresher
    namespace: mif
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: aws-ecr-token-refresher
  namespace: mif
spec:
  schedule: "0 */6 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          serviceAccountName: aws-ecr-token-refresher
          containers:
            - name: aws-ecr-token-refresher
              image: heyvaldemar/aws-kubectl:58dad7caa5986ceacd1bc818010a5e132d80452b
              command:
                - bash
                - -c
                - |
                  kubectl create secret -n ${NAMESPACE} docker-registry moreh-registry \
                    --docker-server=255250787067.dkr.ecr.ap-northeast-2.amazonaws.com \
                    --docker-username=AWS \
                    --docker-password=$(aws ecr get-login-password --region ${AWS_REGION}) \
                    --dry-run=client -o yaml | \
                    kubectl apply -f -
                  echo "ECR token refreshed at $(date)"
              env:
                - name: NAMESPACE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.namespace
                - name: AWS_REGION
                  value: ap-northeast-2
                - name: AWS_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: aws-credentials
                      key: AWS_ACCESS_KEY_ID
                - name: AWS_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: aws-credentials
                      key: AWS_SECRET_ACCESS_KEY
```
```sh
kubectl apply -f aws-ecr-token-refresher.yaml
```
This will create a CronJob that refreshes the ECR token every 6 hours. You can verify that the CronJob has been created by running the following command.
```sh
kubectl get cronjobs -n mif
```

```
NAME                      SCHEDULE      TIMEZONE   SUSPEND   ACTIVE   LAST SCHEDULE   AGE
aws-ecr-token-refresher   0 */6 * * *   <none>     False     0        <none>          5s
```
In addition, run the following command to execute the Job once immediately and create the initial moreh-registry secret.
```sh
kubectl create job -n mif initial-aws-ecr-token-refresh \
  --from=cronjob/aws-ecr-token-refresher
```
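If you want to wait for this initial Job to finish before checking the secret, a minimal sketch using `kubectl wait`:

```sh
# Block until the one-off token refresh Job completes
# (or time out after 2 minutes).
kubectl wait -n mif --for=condition=complete --timeout=120s \
  job/initial-aws-ecr-token-refresh
```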
You can check whether the moreh-registry secret has been created using the following command.
```sh
kubectl get secret -n mif moreh-registry
```

```
NAME             TYPE                             DATA   AGE
moreh-registry   kubernetes.io/dockerconfigjson   1      101s
```
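Workloads that pull images from this repository reference the secret through `imagePullSecrets`. The fragment below is a hypothetical illustration only; the pod name and image path are placeholders, not real artifacts.

```yaml
# Hypothetical pod spec fragment, for illustration only.
apiVersion: v1
kind: Pod
metadata:
  name: example               # placeholder name
  namespace: mif
spec:
  imagePullSecrets:
    - name: moreh-registry    # the secret created by the CronJob above
  containers:
    - name: example
      image: 255250787067.dkr.ecr.ap-northeast-2.amazonaws.com/<image>  # placeholder image path
```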