# Prerequisites

This document introduces the prerequisites for the MoAI Inference Framework and provides instructions on how to install them.


# Target system

To install the MoAI Inference Framework, you must have:

  • Kubernetes 1.26 or later
  • At least one worker node equipped with accelerators supported by the MoAI Inference Framework (e.g., AMD GPUs)
  • The cluster-admin privilege for the Kubernetes cluster
  • A StorageClass defined in the Kubernetes cluster (required for storing the monitoring metrics)
  • A Docker private registry accessible from the Kubernetes cluster
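
Most of these requirements can be checked up front from any machine where kubectl is configured for the target cluster. As a quick sanity check, the following commands report (in order) the Kubernetes server version, the nodes and their status, the StorageClasses defined in the cluster, and whether your current credentials have cluster-admin-level permissions.

kubectl version
kubectl get nodes -o wide
kubectl get storageclass
kubectl auth can-i '*' '*' --all-namespaces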

# Monitoring components

For the monitoring features of the MoAI Inference Framework, you need to install Prometheus, the Prometheus Operator, Node Exporter, and Grafana using the kube-prometheus-stack Helm chart. First, add the Prometheus Community Helm chart repository.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update prometheus-community

Create a prometheus-stack-values.yaml file as follows.

  • The Prometheus stack installs many components by default, but by configuring it as shown below, you can disable the unnecessary ones and achieve a minimal installation.
  • To create a volume for storing the metrics collected by Prometheus, you need to replace <storageClassName> on line 45 with the name of your own StorageClass.
prometheus-stack-values.yaml
defaultRules:
  create: false

windowsMonitoring:
  enabled: false
alertmanager:
  enabled: false
grafana:
  enabled: true
kubernetesServiceMonitors:
  enabled: true
kubeApiServer:
  enabled: false
kubelet:
  enabled: true
kubeControllerManager:
  enabled: false
coreDns:
  enabled: false
kubeDns:
  enabled: false
kubeEtcd:
  enabled: false
kubeScheduler:
  enabled: false
kubeProxy:
  enabled: false
kubeStateMetrics:
  enabled: true
nodeExporter:
  enabled: true

prometheusOperator:
  enabled: true
  tls:
    enabled: false

prometheus:
  enabled: true

  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: "<storageClassName>"
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi
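
Optionally, you can also bound how long Prometheus keeps metrics (and therefore how quickly the volume fills up) by adding a retention setting under the same prometheusSpec block. retention is a standard field of the Prometheus custom resource; the 15-day value below is only an illustrative choice, not a recommendation specific to the MoAI Inference Framework.

prometheus:
  prometheusSpec:
    retention: 15d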

Install the Prometheus stack.

helm upgrade -i prometheus-stack prometheus-community/kube-prometheus-stack \
    --version 77.11.1 \
    -n prometheus-stack \
    --create-namespace \
    -f prometheus-stack-values.yaml

You can verify that the Prometheus stack pods are running using the following command.

kubectl get pods -n prometheus-stack
Expected output
NAME                                                   READY   STATUS    RESTARTS   AGE
prometheus-prometheus-stack-kube-prom-prometheus-0     2/2     Running   0          96s
prometheus-stack-grafana-575db48fc9-t8m5z              3/3     Running   0          107s
prometheus-stack-kube-prom-operator-7c4fc9bf49-zd625   1/1     Running   0          107s
prometheus-stack-kube-state-metrics-76f45dd6c7-d76nx   1/1     Running   0          107s
prometheus-stack-prometheus-node-exporter-5hsmv        1/1     Running   0          107s
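
To open the bundled Grafana UI locally, you can port-forward its Service and read the admin password from the Secret created by the chart. The service and secret names below are the defaults produced for the release name prometheus-stack; the default admin user is admin.

kubectl get secret -n prometheus-stack prometheus-stack-grafana \
    -o jsonpath='{.data.admin-password}' | base64 -d
kubectl port-forward -n prometheus-stack svc/prometheus-stack-grafana 3000:80

Grafana is then reachable at http://localhost:3000.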

# AMD GPU operator

This section describes how to set up the AMD GPU Operator on a Kubernetes cluster. See AMD GPU Operator / Kubernetes (Helm) for more details.

# cert-manager installation

The AMD GPU Operator requires cert-manager to be installed in the cluster. First, add the Jetstack Helm chart repository.

helm repo add jetstack https://charts.jetstack.io
helm repo update jetstack

Create a cert-manager-values.yaml file as shown below.

cert-manager-values.yaml
crds:
  enabled: true

Install cert-manager using this file.

helm upgrade -i cert-manager jetstack/cert-manager \
    --version v1.18.3 \
    -n cert-manager \
    --create-namespace \
    -f cert-manager-values.yaml

You can verify that the three cert-manager pods are running using the following command.

kubectl get pods -n cert-manager
Expected output
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-74b7f6cbbc-hc587              1/1     Running   0          5m
cert-manager-cainjector-58c9d76cb8-cgx5t   1/1     Running   0          5m
cert-manager-webhook-5875b545cf-7x8tc      1/1     Running   0          5m

# GPU operator installation

Add the ROCm GPU Operator Helm chart repository.

helm repo add rocm https://rocm.github.io/gpu-operator
helm repo update rocm

Create a namespace for the AMD GPU Operator.

kubectl create namespace amd-gpu

During the installation of the AMD GPU Operator, the GPU driver image needs to be built and pushed to a Docker registry. For more details, see AMD GPU Operator / Preparing Pre-compiled Driver Images. The private registry mentioned in the "Target system" section above can be used for this purpose.

Create a Docker registry secret in the amd-gpu namespace to enable access to the private registry. Set the <registry>, <username>, and <password> values to the information for your private registry.

kubectl create secret -n amd-gpu \
    docker-registry private-registry \
    --docker-server=<registry> \
    --docker-username=<username> \
    --docker-password=<password>
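
You can confirm that the secret exists (its type should be kubernetes.io/dockerconfigjson) before proceeding.

kubectl get secret -n amd-gpu private-registry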

Then, create a gpu-operator-values.yaml file with the following content. Replace <registry> on line 7 with the URL of your private registry. If necessary, you may also change the image name amdgpu-driver according to your private registry's naming policies.

gpu-operator-values.yaml
deviceConfig:
  spec:
    driver:
      enable: true
      version: "6.4.3"
      blacklist: true
      image: "<registry>/amdgpu-driver"
      imageRegistrySecret:
        name: private-registry
      imageRegistryTLS:
        insecure: false
        insecureSkipTLSVerify: false
      tolerations: &tolerations
        - key: amd.com/gpu
          operator: Exists
          effect: NoSchedule
    devicePlugin:
      devicePluginTolerations: *tolerations
    metricsExporter:
      prometheus:
        serviceMonitor:
          enable: true
          interval: 10s
          labels:
            release: prometheus-stack
      tolerations: *tolerations

node-feature-discovery:
  worker:
    tolerations: *tolerations

You can install the AMD GPU Operator as follows.

helm upgrade -i gpu-operator rocm/gpu-operator-charts \
    --version v1.4.0 \
    -n amd-gpu \
    -f gpu-operator-values.yaml

Note that installing the operator and GPU driver may take some time. After the installation is complete, you can verify that the gpu-operator pods are running using the following command.

kubectl get pods -n amd-gpu
Expected output
NAME                                                              READY   STATUS    RESTARTS   AGE
default-device-plugin-fxj66                                       1/1     Running   0          108s
default-metrics-exporter-r2l6h                                    1/1     Running   0          108s
default-node-labeller-qhqdl                                       1/1     Running   0          2m35s
gpu-operator-gpu-operator-charts-controller-manager-69856dhd67k   1/1     Running   0          4m20s
gpu-operator-kmm-controller-7b5dd7b48b-fpcv6                      1/1     Running   0          4m20s
gpu-operator-kmm-webhook-server-c7bfc864-tfqdb                    1/1     Running   0          4m20s
gpu-operator-node-feature-discovery-gc-7649c47d5d-55rcn           1/1     Running   0          4m20s
gpu-operator-node-feature-discovery-master-fc889959c-sx7wv        1/1     Running   0          4m20s
gpu-operator-node-feature-discovery-worker-4tnns                  1/1     Running   0          4m20s
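
Once the driver is loaded and the device plugin is running, GPU nodes should advertise an amd.com/gpu resource. One way to check this is with custom columns (the dots in the resource name must be escaped in the column expression):

kubectl get nodes -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.amd\.com/gpu'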

# RDMA device plugin

# Host driver and OFED installation

You need to install the device drivers and OFED software for InfiniBand or RoCE NICs on the host OS. Follow the instructions provided by your hardware vendor.

This must be completed before the node joins the Kubernetes cluster. Run the following command on the host OS to verify that the OFED software is installed correctly and recognizes the NICs. If no devices are shown, there is an issue with the installation.

ibv_devices
Expected output
device           node GUID
<device_name>    <16-hex GUID>
<device_name>    <16-hex GUID>
...
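
For more detail than the device list, ibv_devinfo (also installed with the OFED or rdma-core packages) prints per-port attributes such as the port state and link layer, which is useful for confirming that the links are actually up. Replace <device_name> with a device from the output above.

ibv_devinfo -d <device_name>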

# RDMA device plugin installation

This section describes how to install the rdma-shared-device-plugin. See k8s-rdma-shared-dev-plugin / README for more details.

First, create an rdma-shared-device-plugin.yaml file as follows. You need to replace <device> on line 21 with your RDMA NIC's network interface name. If multiple NICs are installed on the server, you must list all interface names (e.g., "devices": ["ib0", "ib1"]).

rdma-shared-device-plugin.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: rdma-devices
  namespace: kube-system
  labels:
    app.kubernetes.io/name: rdma-shared-device-plugin
    app.kubernetes.io/version: v1.5.2
    app.kubernetes.io/instance: rdma-shared-device-plugin
data:
  config.json: |
    {
      "periodicUpdateInterval": 300,
      "configList": [
        {
          "resourcePrefix": "mellanox",
          "resourceName": "hca",
          "rdmaHcaMax": 1000,
          "devices": [
            "<device>"
          ]
        }
      ]
    }

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: rdma-shared-device-plugin
  namespace: kube-system
  labels:
    app.kubernetes.io/name: rdma-shared-device-plugin
    app.kubernetes.io/version: v1.5.2
    app.kubernetes.io/instance: rdma-shared-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: rdma-shared-device-plugin
      app.kubernetes.io/instance: rdma-shared-device-plugin
  updateStrategy:
    rollingUpdate:
      maxUnavailable: "30%"
  template:
    metadata:
      labels:
        app.kubernetes.io/name: rdma-shared-device-plugin
        app.kubernetes.io/version: v1.5.2
        app.kubernetes.io/instance: rdma-shared-device-plugin
    spec:
      hostNetwork: true
      priorityClassName: system-node-critical
      tolerations:
        - key: amd.com/gpu
          operator: Exists
          effect: NoSchedule
      containers:
        - name: device-plugin
          image: ghcr.io/mellanox/k8s-rdma-shared-dev-plugin:v1.5.2
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
            - name: plugins-registry
              mountPath: /var/lib/kubelet/plugins_registry
            - name: config
              mountPath: /k8s-rdma-shared-dev-plugin
            - name: devs
              mountPath: /dev/
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
        - name: plugins-registry
          hostPath:
            path: /var/lib/kubelet/plugins_registry
        - name: config
          configMap:
            name: rdma-devices
            items:
              - key: config.json
                path: config.json
        - name: devs
          hostPath:
            path: /dev/

Then, create the rdma-shared-device-plugin ConfigMap and DaemonSet using the following command.

kubectl apply -f rdma-shared-device-plugin.yaml

You can verify that the rdma-shared-device-plugin pods are running using the following command.

kubectl get pods -n kube-system -l app.kubernetes.io/instance=rdma-shared-device-plugin
Expected output
NAME                              READY   STATUS    RESTARTS   AGE
rdma-shared-device-plugin-wh9fz   1/1     Running   0          7s
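
After the plugin registers with the kubelet, nodes with RDMA NICs should advertise the shared mellanox/hca resource (the name comes from the resourcePrefix and resourceName configured above). A simple way to check is:

kubectl describe nodes | grep mellanox/hca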

# Gateway

Add the Gateway API and Gateway API Inference Extension CRDs.

kubectl apply --server-side -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.3.0/standard-install.yaml
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v1.1.0/manifests.yaml

You can verify the CRDs are installed using the following command.

kubectl get crd | grep 'networking.k8s.io'
Expected output
gatewayclasses.gateway.networking.k8s.io            2025-12-12T02:03:07Z
gateways.gateway.networking.k8s.io                  2025-12-12T02:03:07Z
grpcroutes.gateway.networking.k8s.io                2025-12-12T02:03:07Z
httproutes.gateway.networking.k8s.io                2025-12-12T02:03:07Z
inferencepools.inference.networking.k8s.io          2025-12-12T02:03:08Z
referencegrants.gateway.networking.k8s.io           2025-12-12T02:03:07Z

You can use any gateway controller compatible with the Gateway API Inference Extension. We recommend using either Istio or Kgateway; installation instructions for both are provided below, so follow only the one you choose.

To use Istio, first add the Istio Helm chart repository.

helm repo add istio https://istio-release.storage.googleapis.com/charts
helm repo update istio

Install the Istio base chart.

helm upgrade -i istio-base istio/base \
    --version 1.28.1 \
    -n istio-system \
    --create-namespace

Create an istiod-values.yaml file as follows.

istiod-values.yaml
pilot:
  env:
    PILOT_ENABLE_ALPHA_GATEWAY_API: "true"
    ENABLE_GATEWAY_API_INFERENCE_EXTENSION: "true"

Install the Istio control plane using this file.

helm upgrade -i istiod istio/istiod \
    --version 1.28.1 \
    -n istio-system \
    -f istiod-values.yaml

Alternatively, to use Kgateway, install the Kgateway CRDs.

helm upgrade -i kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds \
    --version v2.1.1 \
    -n kgateway-system \
    --create-namespace

Create a kgateway-values.yaml file as follows.

kgateway-values.yaml
inferenceExtension:
  enabled: true

Install the Kgateway controller using this file.

helm upgrade -i kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway \
    --version v2.1.1 \
    -n kgateway-system \
    -f kgateway-values.yaml
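
Depending on which controller you installed, you can verify that its pods are running in the corresponding namespace before moving on.

kubectl get pods -n istio-system
kubectl get pods -n kgateway-system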

# Amazon ECR token for Moreh's container image repository

The container images of the MoAI Inference Framework are distributed through a private repository on Amazon ECR (255250787067.dkr.ecr.ap-northeast-2.amazonaws.com). To download them, you need to obtain an authorization token.

You need to have a namespace for deploying and running the MoAI Inference Framework. In this guide, we assume the namespace is named mif.

kubectl create namespace mif

First, store the AWS credentials you received from Moreh or another provider as a Kubernetes secret. The command below assumes the credentials are exported in your shell as the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.

kubectl create secret -n mif generic aws-credentials \
    --from-literal=AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
    --from-literal=AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}

Then, create the following aws-ecr-token-refresher.yaml file.

aws-ecr-token-refresher.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-ecr-token-refresher
  namespace: mif

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: aws-ecr-token-refresher
  namespace: mif
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "delete", "create", "update", "patch"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: aws-ecr-token-refresher
  namespace: mif
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: aws-ecr-token-refresher
subjects:
  - kind: ServiceAccount
    name: aws-ecr-token-refresher
    namespace: mif

---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: aws-ecr-token-refresher
  namespace: mif
spec:
  schedule: "0 */6 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          serviceAccountName: aws-ecr-token-refresher
          containers:
            - name: aws-ecr-token-refresher
              image: heyvaldemar/aws-kubectl:58dad7caa5986ceacd1bc818010a5e132d80452b
              command:
                - bash
                - -c
                - |
                  kubectl create secret -n ${NAMESPACE} docker-registry moreh-registry \
                    --docker-server=255250787067.dkr.ecr.ap-northeast-2.amazonaws.com \
                    --docker-username=AWS \
                    --docker-password=$(aws ecr get-login-password --region ${AWS_REGION}) \
                    --dry-run=client -o yaml | \
                    kubectl apply -f -

                  echo "ECR token refreshed at $(date)"
              env:
                - name: NAMESPACE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.namespace
                - name: AWS_REGION
                  value: ap-northeast-2
                - name: AWS_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: aws-credentials
                      key: AWS_ACCESS_KEY_ID
                - name: AWS_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: aws-credentials
                      key: AWS_SECRET_ACCESS_KEY

Apply the file.

kubectl apply -f aws-ecr-token-refresher.yaml

This will create a CronJob that refreshes the ECR token every 6 hours. You can verify that the CronJob has been created by running the following command.

kubectl get cronjobs -n mif
Expected output
NAME                      SCHEDULE      TIMEZONE   SUSPEND   ACTIVE   LAST SCHEDULE   AGE
aws-ecr-token-refresher   0 */6 * * *   <none>     False     0        <none>          5s

In addition, run the following command to execute the Job once immediately and create the initial moreh-registry secret.

kubectl create job -n mif initial-aws-ecr-token-refresh \
  --from=cronjob/aws-ecr-token-refresher
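
You can follow the job's log to confirm that the token was obtained and the secret was applied.

kubectl logs -n mif job/initial-aws-ecr-token-refresh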

You can check whether the moreh-registry secret has been created using the following command.

kubectl get secret -n mif moreh-registry
Expected output
NAME             TYPE                             DATA   AGE
moreh-registry   kubernetes.io/dockerconfigjson   1      101s
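
Workloads in the mif namespace can then pull the MoAI Inference Framework images by referencing this secret in their pod spec. The fragment below is only a hypothetical illustration of where the reference goes; the actual image names and manifests are provided with the framework.

spec:
  imagePullSecrets:
    - name: moreh-registry
  containers:
    - name: example
      image: 255250787067.dkr.ecr.ap-northeast-2.amazonaws.com/<image>:<tag>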