# Prerequisites
This document introduces the prerequisites for the MoAI Inference Framework and provides instructions on how to install them.

To follow this document, you need to understand the configuration of the Kubernetes cluster where the MoAI Inference Framework will be installed. Moreh provides support for installing the MoAI Inference Framework at customer sites, so if you encounter any difficulties, you can request assistance from the Moreh team.
## Target system
To install the MoAI Inference Framework, you must have:

- Kubernetes 1.26 or later
- At least one worker node equipped with accelerators supported by the MoAI Inference Framework (e.g., AMD GPUs)
- `cluster-admin` privilege for the Kubernetes cluster
- A StorageClass defined in the Kubernetes cluster (required for storing the monitoring metrics)
- A Docker private registry accessible from the Kubernetes cluster
## Monitoring components

For the monitoring features of the MoAI Inference Framework, you need to install Prometheus, the Prometheus Operator, Node Exporter, and Grafana using the kube-prometheus-stack Helm chart. First, add the Prometheus Community Helm chart repository.
```sh
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update prometheus-community
```
Create a `prometheus-stack-values.yaml` file as follows.

- The Prometheus stack installs many components by default, but the configuration shown below disables the unnecessary ones to achieve a minimal installation.
- To create a volume for storing the metrics collected by Prometheus, replace `<storageClassName>` with the name of your own StorageClass.
Tip
You can check the name of your StorageClass using the kubectl get sc command.
```yaml
defaultRules:
  create: false
windowsMonitoring:
  enabled: false
alertmanager:
  enabled: false
grafana:
  enabled: true
kubernetesServiceMonitors:
  enabled: true
kubeApiServer:
  enabled: false
kubelet:
  enabled: true
kubeControllerManager:
  enabled: false
coreDns:
  enabled: false
kubeDns:
  enabled: false
kubeEtcd:
  enabled: false
kubeScheduler:
  enabled: false
kubeProxy:
  enabled: false
kubeStateMetrics:
  enabled: true
nodeExporter:
  enabled: true
prometheusOperator:
  enabled: true
  tls:
    enabled: false
prometheus:
  enabled: true
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: "<storageClassName>"
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi
```
Install the Prometheus stack.
```sh
helm upgrade -i prometheus-stack prometheus-community/kube-prometheus-stack \
  --version 77.11.1 \
  -n prometheus-stack \
  --create-namespace \
  -f prometheus-stack-values.yaml
```
You can verify that the Prometheus stack pods are running using the following command. (Node Exporter runs as a DaemonSet, so the number of its pods depends on the number of nodes in your cluster.)
```sh
kubectl get pods -n prometheus-stack
```

```
NAME                                                   READY   STATUS    RESTARTS   AGE
prometheus-prometheus-stack-kube-prom-prometheus-0     2/2     Running   0          96s
prometheus-stack-grafana-575db48fc9-t8m5z              3/3     Running   0          107s
prometheus-stack-kube-prom-operator-7c4fc9bf49-zd625   1/1     Running   0          107s
prometheus-stack-kube-state-metrics-76f45dd6c7-d76nx   1/1     Running   0          107s
prometheus-stack-prometheus-node-exporter-5hsmv        1/1     Running   0          107s
```
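As an optional check, you can port-forward the Grafana service and open the dashboard in a browser. The service name below is an assumption based on the kube-prometheus-stack naming convention for the `prometheus-stack` release; confirm it with `kubectl get svc -n prometheus-stack` if it differs in your cluster.

```sh
# Forward the (assumed) Grafana service to localhost:3000,
# then open http://localhost:3000 in a browser.
kubectl port-forward -n prometheus-stack svc/prometheus-stack-grafana 3000:80
```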
## AMD GPU operator
This section describes how to set up the AMD GPU Operator on a Kubernetes cluster. See AMD GPU Operator / Kubernetes (Helm) for more details.
### Certification
The AMD GPU Operator requires cert-manager to be installed in the cluster. First, add the Jetstack Helm chart repository.
```sh
helm repo add jetstack https://charts.jetstack.io
helm repo update jetstack
```
Create a cert-manager-values.yaml file as shown below, then install cert-manager using this file.
```yaml
crds:
  enabled: true
```

```sh
helm upgrade -i cert-manager jetstack/cert-manager \
  --version v1.18.3 \
  -n cert-manager \
  --create-namespace \
  -f cert-manager-values.yaml
```
You can verify that the three cert-manager pods are running using the following command.
```sh
kubectl get pods -n cert-manager
```

```
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-74b7f6cbbc-hc587              1/1     Running   0          5m
cert-manager-cainjector-58c9d76cb8-cgx5t   1/1     Running   0          5m
cert-manager-webhook-5875b545cf-7x8tc      1/1     Running   0          5m
```
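As an additional sanity check, you can confirm that the cert-manager CRDs were installed (this relies on `crds.enabled: true` in the values file above):

```sh
# List the CRDs registered by cert-manager, e.g. certificates.cert-manager.io.
kubectl get crd | grep cert-manager.io
```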
### GPU operator installation
Add ROCm's GPU Operator Helm chart repository.

```sh
helm repo add rocm https://rocm.github.io/gpu-operator
helm repo update rocm
```
Create a namespace for the AMD GPU Operator.
```sh
kubectl create namespace amd-gpu
```
During the installation of the AMD GPU Operator, the GPU driver image needs to be built and pushed to a Docker registry. For more details, see AMD GPU Operator / Preparing Pre-compiled Driver Images. The private registry mentioned earlier in the "Target system" section can be used for this purpose.
Create a Docker registry secret in the amd-gpu namespace to enable access to the private registry. Set the <registry>, <username>, and <password> values to the information for your private registry.
```sh
kubectl create secret -n amd-gpu \
  docker-registry private-registry \
  --docker-server=<registry> \
  --docker-username=<username> \
  --docker-password=<password>
```
Then, create a `gpu-operator-values.yaml` file with the following content. Replace `<registry>` on line 7 with the URL of your private registry. You may also change the image name `amdgpu-driver`, if necessary, according to your private registry's policies.
```yaml
deviceConfig:
  spec:
    driver:
      enable: true
      version: "6.4.3"
      blacklist: true
      image: "<registry>/amdgpu-driver"
      imageRegistrySecret:
        name: private-registry
      imageRegistryTLS:
        insecure: false
        insecureSkipTLSVerify: false
      tolerations: &tolerations
        - key: amd.com/gpu
          operator: Exists
          effect: NoSchedule
    devicePlugin:
      devicePluginTolerations: *tolerations
    metricsExporter:
      prometheus:
        serviceMonitor:
          enable: true
          interval: 10s
          labels:
            release: prometheus-stack
      tolerations: *tolerations
node-feature-discovery:
  worker:
    tolerations: *tolerations
```
If the AMD GPU Operator is already installed on your system, verify that the toleration key is set to `amd.com/gpu`; the MoAI Inference Framework assumes this name.
You can install the AMD GPU Operator as follows.
```sh
helm upgrade -i gpu-operator rocm/gpu-operator-charts \
  --version v1.4.0 \
  -n amd-gpu \
  -f gpu-operator-values.yaml
```
Note that installing the operator and GPU driver may take some time. After the installation is complete, you can verify that the gpu-operator pods are running using the following command.
```sh
kubectl get pods -n amd-gpu
```

```
NAME                                                              READY   STATUS    RESTARTS   AGE
default-device-plugin-fxj66                                       1/1     Running   0          108s
default-metrics-exporter-r2l6h                                    1/1     Running   0          108s
default-node-labeller-qhqdl                                       1/1     Running   0          2m35s
gpu-operator-gpu-operator-charts-controller-manager-69856dhd67k   1/1     Running   0          4m20s
gpu-operator-kmm-controller-7b5dd7b48b-fpcv6                      1/1     Running   0          4m20s
gpu-operator-kmm-webhook-server-c7bfc864-tfqdb                    1/1     Running   0          4m20s
gpu-operator-node-feature-discovery-gc-7649c47d5d-55rcn           1/1     Running   0          4m20s
gpu-operator-node-feature-discovery-master-fc889959c-sx7wv        1/1     Running   0          4m20s
gpu-operator-node-feature-discovery-worker-4tnns                  1/1     Running   0          4m20s
```
Tip
You can monitor the installation progress using the kubectl get pods -n amd-gpu -w command instead.
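Once the driver and device plugin are up, each GPU node should advertise an allocatable `amd.com/gpu` resource. The following is a minimal check, assuming the device plugin registers the resource under that name:

```sh
# Show the number of allocatable AMD GPUs per node; "<none>" means the
# device plugin has not (yet) registered the resource on that node.
kubectl get nodes -o custom-columns='NODE:.metadata.name,GPU:.status.allocatable.amd\.com/gpu'
```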
## RDMA device plugin
### Host driver and OFED installation
You need to install the device drivers and OFED software for InfiniBand or RoCE NICs on the host OS. Follow the instructions provided by your hardware vendor.
This must be completed before joining the node to the Kubernetes cluster. By running the following command on the host OS, you can verify that the OFED software has been installed correctly and that it recognizes the NICs. If no devices are shown, there is an issue with the installation.
```sh
ibv_devices
```

```
    device                 node GUID
    ------              ----------------
    <device_name>       <16-hex GUID>
    <device_name>       <16-hex GUID>
    ...
```
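Optionally, you can also confirm that the NIC ports are up before proceeding. This sketch uses `ibv_devinfo` from the same OFED/rdma-core tooling; the exact output format may vary by vendor.

```sh
# Each port of a working NIC should report "state: PORT_ACTIVE".
ibv_devinfo | grep -E 'hca_id|state'
```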
### RDMA device plugin installation
This section describes how to install the rdma-shared-device-plugin. See k8s-rdma-shared-dev-plugin / README for more details.

First, create an `rdma-shared-device-plugin.yaml` file as follows. You need to replace `<device>` on line 21 with your RDMA NIC's network interface name. If multiple NICs are installed on the server, list all of their interface names (e.g., `"devices": ["ib0", "ib1"]`).
Tip
You can check the network interface names using the ip addr command.
```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: rdma-devices
  namespace: kube-system
  labels:
    app.kubernetes.io/name: rdma-shared-device-plugin
    app.kubernetes.io/version: v1.5.2
    app.kubernetes.io/instance: rdma-shared-device-plugin
data:
  config.json: |
    {
      "periodicUpdateInterval": 300,
      "configList": [
        {
          "resourcePrefix": "mellanox",
          "resourceName": "hca",
          "rdmaHcaMax": 1000,
          "devices": [
            "<device>"
          ]
        }
      ]
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: rdma-shared-device-plugin
  namespace: kube-system
  labels:
    app.kubernetes.io/name: rdma-shared-device-plugin
    app.kubernetes.io/version: v1.5.2
    app.kubernetes.io/instance: rdma-shared-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: rdma-shared-device-plugin
      app.kubernetes.io/instance: rdma-shared-device-plugin
  updateStrategy:
    rollingUpdate:
      maxUnavailable: "30%"
  template:
    metadata:
      labels:
        app.kubernetes.io/name: rdma-shared-device-plugin
        app.kubernetes.io/version: v1.5.2
        app.kubernetes.io/instance: rdma-shared-device-plugin
    spec:
      hostNetwork: true
      priorityClassName: system-node-critical
      tolerations:
        - key: amd.com/gpu
          operator: Exists
          effect: NoSchedule
      containers:
        - name: device-plugin
          image: ghcr.io/mellanox/k8s-rdma-shared-dev-plugin:v1.5.2
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
            - name: plugins-registry
              mountPath: /var/lib/kubelet/plugins_registry
            - name: config
              mountPath: /k8s-rdma-shared-dev-plugin
            - name: devs
              mountPath: /dev/
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
        - name: plugins-registry
          hostPath:
            path: /var/lib/kubelet/plugins_registry
        - name: config
          configMap:
            name: rdma-devices
            items:
              - key: config.json
                path: config.json
        - name: devs
          hostPath:
            path: /dev/
```
If the RDMA device plugin is already installed on your system, verify that the resource name is set to `mellanox/hca`; the MoAI Inference Framework assumes this name. This does not imply that the actual hardware vendor must be Mellanox.
Then, create an rdma-shared-device-plugin DaemonSet using the following command.
```sh
kubectl apply -f rdma-shared-device-plugin.yaml
```
You can verify that the rdma-shared-device-plugin pods are running using the following command.
```sh
kubectl get pods -n kube-system -l app.kubernetes.io/instance=rdma-shared-device-plugin
```

```
NAME                              READY   STATUS    RESTARTS   AGE
rdma-shared-device-plugin-wh9fz   1/1     Running   0          7s
```
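You can also confirm that the nodes now advertise the shared RDMA resource configured above (`mellanox/hca` with a maximum of 1000 allocations per node):

```sh
# Each node running the plugin should list the resource under Capacity
# and Allocatable, e.g. "mellanox/hca: 1000".
kubectl describe nodes | grep 'mellanox/hca'
```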
## Gateway
Add the Gateway API and Gateway API Inference Extension CRDs.
```sh
kubectl apply --server-side -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.3.0/standard-install.yaml
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v1.1.0/manifests.yaml
```
You can verify the CRDs are installed using the following command.
```sh
kubectl get crd | grep 'networking.k8s.io'
```

```
gatewayclasses.gateway.networking.k8s.io     2025-12-12T02:03:07Z
gateways.gateway.networking.k8s.io           2025-12-12T02:03:07Z
grpcroutes.gateway.networking.k8s.io         2025-12-12T02:03:07Z
httproutes.gateway.networking.k8s.io         2025-12-12T02:03:07Z
inferencepools.inference.networking.k8s.io   2025-12-12T02:03:08Z
referencegrants.gateway.networking.k8s.io    2025-12-12T02:03:07Z
```
You can use any gateway controller compatible with the Gateway API Inference Extension. We recommend using either Istio or Kgateway, and installation instructions for both are provided below.
To use Istio, add the Istio Helm chart repository.

```sh
helm repo add istio https://istio-release.storage.googleapis.com/charts
helm repo update istio
```
Install the Istio base chart.
```sh
helm upgrade -i istio-base istio/base \
  --version 1.28.1 \
  -n istio-system \
  --create-namespace
```
Create an `istiod-values.yaml` file as follows and install the Istio control plane.

```yaml
pilot:
  env:
    PILOT_ENABLE_ALPHA_GATEWAY_API: "true"
    ENABLE_GATEWAY_API_INFERENCE_EXTENSION: "true"
```

```sh
helm upgrade -i istiod istio/istiod \
  --version 1.28.1 \
  -n istio-system \
  -f istiod-values.yaml
```
Alternatively, to use Kgateway, install the Kgateway CRDs.

```sh
helm upgrade -i kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds \
  --version v2.1.1 \
  -n kgateway-system \
  --create-namespace
```
Create a kgateway-values.yaml file and install the Kgateway controller.
```yaml
inferenceExtension:
  enabled: true
```

```sh
helm upgrade -i kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway \
  --version v2.1.1 \
  -n kgateway-system \
  -f kgateway-values.yaml
```
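Whichever controller you installed, it should register a GatewayClass for Gateways to reference. As a quick check (the class names `istio` and `kgateway` are the defaults created by the respective controllers; verify against your own installation):

```sh
# Expect a GatewayClass named "istio" or "kgateway",
# depending on the controller installed.
kubectl get gatewayclass
```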
## Amazon ECR token for Moreh's container image repository

The container images of the MoAI Inference Framework are distributed through a private repository on Amazon ECR (`255250787067.dkr.ecr.ap-northeast-2.amazonaws.com`). To download them, you need to obtain an authorization token.
The AWS credentials (`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`) should have been provided to you along with your purchase or trial issuance of the MoAI Inference Framework. If you did not receive this information, please contact your point of purchase separately.
You need to have a namespace for deploying and running the MoAI Inference Framework. In this guide, we assume the namespace is named mif.
```sh
kubectl create namespace mif
```
First, store the AWS credentials you received from Moreh or your provider as a Kubernetes Secret.

```sh
kubectl create secret -n mif generic aws-credentials \
  --from-literal=AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
  --from-literal=AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
```
Then, create the following aws-ecr-token-refresher.yaml file and apply it.
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-ecr-token-refresher
  namespace: mif
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: aws-ecr-token-refresher
  namespace: mif
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "delete", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: aws-ecr-token-refresher
  namespace: mif
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: aws-ecr-token-refresher
subjects:
  - kind: ServiceAccount
    name: aws-ecr-token-refresher
    namespace: mif
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: aws-ecr-token-refresher
  namespace: mif
spec:
  schedule: "0 */6 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          serviceAccountName: aws-ecr-token-refresher
          containers:
            - name: aws-ecr-token-refresher
              image: heyvaldemar/aws-kubectl:58dad7caa5986ceacd1bc818010a5e132d80452b
              command:
                - bash
                - -c
                - |
                  kubectl create secret -n ${NAMESPACE} docker-registry moreh-registry \
                    --docker-server=255250787067.dkr.ecr.ap-northeast-2.amazonaws.com \
                    --docker-username=AWS \
                    --docker-password=$(aws ecr get-login-password --region ${AWS_REGION}) \
                    --dry-run=client -o yaml | \
                    kubectl apply -f -
                  echo "ECR token refreshed at $(date)"
              env:
                - name: NAMESPACE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.namespace
                - name: AWS_REGION
                  value: ap-northeast-2
                - name: AWS_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: aws-credentials
                      key: AWS_ACCESS_KEY_ID
                - name: AWS_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: aws-credentials
                      key: AWS_SECRET_ACCESS_KEY
```
```sh
kubectl apply -f aws-ecr-token-refresher.yaml
```
This will create a CronJob that refreshes the ECR token every 6 hours. You can verify that the CronJob has been created by running the following command.
```sh
kubectl get cronjobs -n mif
```

```
NAME                      SCHEDULE      TIMEZONE   SUSPEND   ACTIVE   LAST SCHEDULE   AGE
aws-ecr-token-refresher   0 */6 * * *   <none>     False     0        <none>          5s
```
In addition, run the following command to execute the Job once immediately and create the initial moreh-registry secret.
```sh
kubectl create job -n mif initial-aws-ecr-token-refresh \
  --from=cronjob/aws-ecr-token-refresher
```
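If you want to wait for this initial Job to finish before checking the secret, a minimal sketch using `kubectl wait`:

```sh
# Block until the one-off token refresh Job completes
# (or time out after 2 minutes).
kubectl wait -n mif --for=condition=complete --timeout=120s \
  job/initial-aws-ecr-token-refresh
```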
You can check whether the moreh-registry secret has been created using the following command.
```sh
kubectl get secret -n mif moreh-registry
```

```
NAME             TYPE                             DATA   AGE
moreh-registry   kubernetes.io/dockerconfigjson   1      101s
```
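Workloads that pull images from this repository reference the secret through `imagePullSecrets`. The fragment below is a hypothetical illustration only; the pod name and image path are placeholders, not real artifacts.

```yaml
# Hypothetical pod spec fragment, for illustration only.
apiVersion: v1
kind: Pod
metadata:
  name: example               # placeholder name
  namespace: mif
spec:
  imagePullSecrets:
    - name: moreh-registry    # the secret created by the CronJob above
  containers:
    - name: example
      image: 255250787067.dkr.ecr.ap-northeast-2.amazonaws.com/<image>  # placeholder image path
```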