# Prerequisites

# Kubernetes tools

# kubectl

This section describes how to install kubectl. See Kubernetes / Install and Set Up kubectl on Linux for more details.

You can install the kubectl binary with curl on Linux as follows. Please replace <kubernetesVersion> and <kubeconfigPath> with your desired Kubernetes version and the path to your kubeconfig file, respectively. Note that the version must include the leading v, for example v1.32.9.

KUBECTL_VERSION=<kubernetesVersion>
curl -LO https://dl.k8s.io/release/${KUBECTL_VERSION}/bin/linux/amd64/kubectl
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
export KUBECONFIG=<kubeconfigPath>
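
For example, if you want the latest stable release instead of pinning a specific version, you can resolve the version dynamically from the official stable.txt endpoint.

KUBECTL_VERSION=$(curl -L -s https://dl.k8s.io/release/stable.txt)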

You can verify the installation by running the following command. Note that the printed versions may vary depending on the kubectl version you installed and your cluster version.

kubectl version
Expected output
Client Version: v1.32.9
Kustomize Version: v5.5.0
Server Version: v1.32.8

# Helm

You can install Helm by running the following command. See Helm / Installing Helm for more details.

curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

You can verify the installation by running the following command. Note that the printed version may vary depending on the Helm version installed.

helm version
Expected output
version.BuildInfo{Version:"v3.19.0", GitCommit:"3d8990f0836691f0229297773f3524598f46bda6", GitTreeState:"clean", GoVersion:"go1.24.7"}

# Monitoring components

For the monitoring features of the MoAI Inference Framework, you need to install Prometheus, the Prometheus Operator, Node Exporter, and Grafana using the kube-prometheus-stack Helm chart. First, add the Prometheus Community Helm chart repository.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update prometheus-community

Since the Prometheus stack installs many components by default, we recommend disabling unnecessary ones to achieve a minimal installation. Create a prometheus-stack-values.yaml file as follows. Please replace <storageClassName> with your own StorageClass name.

prometheus-stack-values.yaml
defaultRules:
  create: false

windowsMonitoring:
  enabled: false
alertmanager:
  enabled: false

grafana:
  enabled: true

kubernetesServiceMonitors:
  enabled: false
kubeApiServer:
  enabled: false
kubelet:
  enabled: false
kubeControllerManager:
  enabled: false
coreDns:
  enabled: false
kubeDns:
  enabled: false
kubeEtcd:
  enabled: false
kubeScheduler:
  enabled: false
kubeProxy:
  enabled: false
kubeStateMetrics:
  enabled: false

nodeExporter:
  enabled: true

prometheusOperator:
  enabled: true
  tls:
    enabled: false

prometheus:
  enabled: true

  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: "<storageClassName>"
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi
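
If you are not sure which StorageClass names are available in your cluster, you can list them first.

kubectl get storageclass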

Install the Prometheus stack.

helm upgrade -i prometheus-stack prometheus-community/kube-prometheus-stack \
    --version 77.11.1 \
    -n prometheus-stack \
    --create-namespace \
    -f prometheus-stack-values.yaml

You can verify that the Prometheus stack pods are running using the following command.

kubectl get pods -n prometheus-stack
Expected output
NAME                                                   READY   STATUS    RESTARTS   AGE
prometheus-prometheus-stack-kube-prom-prometheus-0     2/2     Running   0          100s
prometheus-stack-grafana-7c655db89f-9ltch              3/3     Running   0          116s
prometheus-stack-kube-prom-operator-56d44cb7db-8w5v5   1/1     Running   0          116s
prometheus-stack-prometheus-node-exporter-ppsgg        1/1     Running   0          116s
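
To access the Grafana dashboard locally, you can port-forward the Grafana service and open http://localhost:3000 in a browser. The service name and port below assume the chart's default naming for the prometheus-stack release.

kubectl port-forward -n prometheus-stack svc/prometheus-stack-grafana 3000:80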

# AMD GPU operator

This section describes how to set up the AMD GPU Operator on a Kubernetes cluster. See AMD GPU Operator / Kubernetes (Helm) for more details.

# cert-manager

The AMD GPU Operator requires cert-manager to be installed in the cluster. First, add the Jetstack Helm chart repository.

helm repo add jetstack https://charts.jetstack.io
helm repo update jetstack

Create a cert-manager-values.yaml file as shown below, then install cert-manager using this file.

cert-manager-values.yaml
crds:
  enabled: true

Install cert-manager.

helm upgrade -i cert-manager jetstack/cert-manager \
    --version v1.18.3 \
    -n cert-manager \
    --create-namespace \
    -f cert-manager-values.yaml

You can verify that the cert-manager pods are running using the following command.

kubectl get pods -n cert-manager
Expected output
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-74b7f6cbbc-hc587              1/1     Running   0          5m
cert-manager-cainjector-58c9d76cb8-cgx5t   1/1     Running   0          5m
cert-manager-webhook-5875b545cf-7x8tc      1/1     Running   0          5m
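
You can also confirm that the cert-manager CRDs have been installed.

kubectl get crd | grep cert-manager.io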

# GPU operator installation

Add the ROCm GPU Operator Helm chart repository.

helm repo add rocm https://rocm.github.io/gpu-operator
helm repo update rocm

Create a namespace for the AMD GPU Operator.

kubectl create namespace amd-gpu

Create a Docker registry secret in the amd-gpu namespace; it is used to pull the GPU driver image referenced in the values file below. Please replace <registry>, <username>, and <password> with your own values.

kubectl create secret -n amd-gpu \
    docker-registry private-registry \
    --docker-server=<registry> \
    --docker-username=<username> \
    --docker-password=<password>

Create a gpu-operator-values.yaml file with the following content. Please replace <registry> and <repository> with your own values.

gpu-operator-values.yaml
deviceConfig:
  spec:
    driver:
      enable: true
      version: "6.4.3"
      blacklist: true
      image: "<registry>/<repository>"
      imageRegistrySecret:
        name: private-registry
      imageRegistryTLS:
        insecure: false
        insecureSkipTLSVerify: false
      tolerations: &tolerations
        - key: amd.com/gpu
          operator: Exists
          effect: NoSchedule
    devicePlugin:
      devicePluginTolerations: *tolerations
    metricsExporter:
      prometheus:
        serviceMonitor:
          enabled: true
          interval: 10s
          labels:
            release: prometheus-stack
      tolerations: *tolerations

node-feature-discovery:
  worker:
    tolerations: *tolerations

Note that the toleration list defined under the driver section is shared via the &tolerations YAML anchor with the device plugin, metrics exporter, and node-feature-discovery worker, so every component can be scheduled on tainted GPU nodes. You can install the AMD GPU Operator as follows.

helm upgrade -i gpu-operator rocm/gpu-operator-charts \
    --version v1.4.0 \
    -n amd-gpu \
    -f gpu-operator-values.yaml

You can verify that the gpu-operator pods are running using the following command.

kubectl get pods -n amd-gpu
Expected output
NAME                                                              READY   STATUS    RESTARTS   AGE
default-device-plugin-fxj66                                       1/1     Running   0          108s
default-metrics-exporter-r2l6h                                    1/1     Running   0          108s
default-node-labeller-qhqdl                                       1/1     Running   0          2m35s
gpu-operator-gpu-operator-charts-controller-manager-69856dhd67k   1/1     Running   0          4m20s
gpu-operator-kmm-controller-7b5dd7b48b-fpcv6                      1/1     Running   0          4m20s
gpu-operator-kmm-webhook-server-c7bfc864-tfqdb                    1/1     Running   0          4m20s
gpu-operator-node-feature-discovery-gc-7649c47d5d-55rcn           1/1     Running   0          4m20s
gpu-operator-node-feature-discovery-master-fc889959c-sx7wv        1/1     Running   0          4m20s
gpu-operator-node-feature-discovery-worker-4tnns                  1/1     Running   0          4m20s
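
Once the driver and device plugin are ready, you can check that the GPUs are exposed as allocatable resources on each node. This assumes the device plugin registers the amd.com/gpu resource name.

kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.amd\.com/gpu'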

# RDMA device plugin

# Host driver installation

You need to install the device drivers and OFED software for InfiniBand or RoCE NICs on the host OS. This must be completed before joining the node to the Kubernetes cluster. It is recommended to follow the instructions provided by your hardware vendor. The steps below show examples for Mellanox InfiniBand NICs and Broadcom RoCE NICs; follow the path that matches your hardware.

For Mellanox InfiniBand NICs, first check that the NIC has been properly detected.

lspci -vvv | grep -i Mellanox | grep -i ConnectX

Set the environment variables.

OS_VER=$(. /etc/os-release;echo $ID$VERSION_ID)
KERNEL_VERSION=$(uname -r)
BASE_URL=https://content.mellanox.com/ofed
OFED_VER=23.10-3.2.2.0
OFED_TGZ_FILE=MLNX_OFED_LINUX-$OFED_VER-$OS_VER-x86_64.tgz
OFED_PREFIX_DIR=MLNX_OFED-$OFED_VER
OFED_DIR=MLNX_OFED_LINUX-$OFED_VER-$OS_VER-x86_64

Install the required dependency packages.

sudo apt-get update -y
sudo apt-get install -y lm-sensors xfsprogs net-tools libnuma-dev \
  ocl-icd-opencl-dev sqlite3 libsqlite3-dev libboost-all-dev libbz2-dev \
  openmpi-bin libtinfo-dev universal-ctags cscope nmon sox google-perftools \
  libssl-dev pstack libomp-dev libmsgpack-dev clang llvm llvm-12-dev \
  clang-format-12 libclang-12-dev libstdc++-12-dev

Download and install the OFED driver.

wget $BASE_URL/$OFED_PREFIX_DIR/$OFED_TGZ_FILE
tar -xzvf $OFED_TGZ_FILE
cd ./$OFED_DIR
sudo ./mlnxofedinstall --without-fw-update --all --ovs-dpdk --upstream-libs \
  --with-nfsrdma --without-ucx --without-openmpi --force --kernel $KERNEL_VERSION
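
After the installation completes, restart the driver stack and verify that the InfiniBand devices and ports are visible. Both commands below are provided by MLNX_OFED.

sudo /etc/init.d/openibd restart
ibstat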

For Broadcom RoCE NICs, first check that the NIC has been properly detected.

lspci -vvv | grep -i 'Broadcom' | grep -i 'Ethernet controller'

Install the required dependency packages.

sudo apt-get update -y
sudo apt-get install -y ca-certificates htop net-tools vim zip wget curl \
  iputils-ping pciutils python3 infiniband-diags iproute2 binutils perftest \
  git make autoconf sudo libtool g++ bc

Download and install the RoCE NIC driver.

wget https://docs.broadcom.com/docs-and-downloads/ethernet-network-adapters/NXE/Thor2/GCA2/bcm5760x_231.2.63.0a.zip
unzip ./bcm5760x_231.2.63.0a.zip
cd ./bcm5760x_231.2.63.0a/utils/linux_installer
sudo bash ./install.sh -i \
  "$(lspci -vvv 2> /dev/null | grep 'Broadcom' | grep 'Ethernet controller' | cut -d ' ' -f 1)"

# RDMA device plugin installation

This section describes how to install the rdma-shared-device-plugin. See k8s-rdma-shared-dev-plugin / README for more details.

First, create an rdma-shared-device-plugin.yaml file as follows. Please replace <device> with your RDMA NIC's network interface name.

rdma-shared-device-plugin.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: rdma-devices
  namespace: kube-system
  labels:
    app.kubernetes.io/name: rdma-shared-device-plugin
    app.kubernetes.io/version: v1.5.2
    app.kubernetes.io/instance: rdma-shared-device-plugin
data:
  config.json: |
    {
      "periodicUpdateInterval": 300,
      "configList": [
        {
          "resourcePrefix": "mellanox",
          "resourceName": "hca",
          "rdmaHcaMax": 1000,
          "devices": [
            "<device>"
          ]
        }
      ]
    }

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: rdma-shared-device-plugin
  namespace: kube-system
  labels:
    app.kubernetes.io/name: rdma-shared-device-plugin
    app.kubernetes.io/version: v1.5.2
    app.kubernetes.io/instance: rdma-shared-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: rdma-shared-device-plugin
      app.kubernetes.io/instance: rdma-shared-device-plugin
  updateStrategy:
    rollingUpdate:
      maxUnavailable: "30%"
  template:
    metadata:
      labels:
        app.kubernetes.io/name: rdma-shared-device-plugin
        app.kubernetes.io/version: v1.5.2
        app.kubernetes.io/instance: rdma-shared-device-plugin
    spec:
      hostNetwork: true
      priorityClassName: system-node-critical
      tolerations:
        - key: amd.com/gpu
          operator: Exists
          effect: NoSchedule
      containers:
        - name: device-plugin
          image: ghcr.io/mellanox/k8s-rdma-shared-dev-plugin:v1.5.2
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
            - name: plugins-registry
              mountPath: /var/lib/kubelet/plugins_registry
            - name: config
              mountPath: /k8s-rdma-shared-dev-plugin
            - name: devs
              mountPath: /dev/
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
        - name: plugins-registry
          hostPath:
            path: /var/lib/kubelet/plugins_registry
        - name: config
          configMap:
            name: rdma-devices
            items:
              - key: config.json
                path: config.json
        - name: devs
          hostPath:
            path: /dev/

Then, create the rdma-shared-device-plugin ConfigMap and DaemonSet using the following command.

kubectl apply -f rdma-shared-device-plugin.yaml

You can verify that the rdma-shared-device-plugin pods are running using the following command.

kubectl get pods -n kube-system -l app.kubernetes.io/instance=rdma-shared-device-plugin
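
You can also check that the shared RDMA resource is advertised on the nodes. With the configuration above (resourcePrefix mellanox, resourceName hca), the resource name is mellanox/hca.

kubectl get nodes -o custom-columns='NAME:.metadata.name,RDMA:.status.allocatable.mellanox/hca'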

# Gateway

Add the Gateway API and Gateway API Inference Extension CRDs.

kubectl apply --server-side -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.3.0/standard-install.yaml
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v1.1.0/manifests.yaml
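
You can confirm that the CRDs have been installed using the following command.

kubectl get crd | grep -e gateway.networking.k8s.io -e inference.networking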

You can use any gateway controller compatible with the Gateway API Inference Extension. We recommend either Istio or Kgateway; installation instructions for both are provided below, but you only need to install one of them.

To use Istio, add the Istio Helm chart repository.

helm repo add istio https://istio-release.storage.googleapis.com/charts
helm repo update istio

Install the Istio base chart.

helm upgrade -i istio-base istio/base \
    --version 1.28.0 \
    -n istio-system \
    --create-namespace

Create an istiod-values.yaml file as shown below, then install the Istio control plane using this file.

istiod-values.yaml
pilot:
  env:
    PILOT_ENABLE_ALPHA_GATEWAY_API: "true"
    ENABLE_GATEWAY_API_INFERENCE_EXTENSION: "true"

Install the Istio control plane.

helm upgrade -i istiod istio/istiod \
    --version 1.28.0 \
    -n istio-system \
    -f istiod-values.yaml
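
You can verify that the Istio control plane pods are running using the following command.

kubectl get pods -n istio-system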

To use Kgateway instead, install the Kgateway CRDs.

helm upgrade -i kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds \
    --version v2.1.1 \
    -n kgateway-system \
    --create-namespace

Create a kgateway-values.yaml file as shown below, then install the Kgateway controller using this file.

kgateway-values.yaml
inferenceExtension:
  enabled: true

Install the Kgateway controller.

helm upgrade -i kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway \
    --version v2.1.1 \
    -n kgateway-system \
    -f kgateway-values.yaml
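
You can verify that the Kgateway controller pods are running using the following command.

kubectl get pods -n kgateway-system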

# Amazon ECR token for Moreh's container image repository

The container images of the MoAI Inference Framework are distributed through a private repository on Amazon ECR. To download them, you need to obtain an authorization token. First, store your AWS credentials in a Kubernetes Secret as follows. This assumes that the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables are set to credentials with pull access to the repository.

kubectl create namespace mif
kubectl create secret -n mif generic aws-credentials \
    --from-literal=AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
    --from-literal=AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}

Then, create the following aws-ecr-token-refresher.yaml file and apply it.

aws-ecr-token-refresher.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-ecr-token-refresher
  namespace: mif

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: aws-ecr-token-refresher
  namespace: mif
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "delete", "create", "update", "patch"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: aws-ecr-token-refresher
  namespace: mif
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: aws-ecr-token-refresher
subjects:
  - kind: ServiceAccount
    name: aws-ecr-token-refresher
    namespace: mif

---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: aws-ecr-token-refresher
  namespace: mif
spec:
  schedule: "0 */6 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          serviceAccountName: aws-ecr-token-refresher
          containers:
            - name: aws-ecr-token-refresher
              image: heyvaldemar/aws-kubectl:58dad7caa5986ceacd1bc818010a5e132d80452b
              command:
                - bash
                - -c
                - |
                  kubectl create secret -n ${NAMESPACE} docker-registry moreh-registry \
                    --docker-server=255250787067.dkr.ecr.ap-northeast-2.amazonaws.com \
                    --docker-username=AWS \
                    --docker-password=$(aws ecr get-login-password --region ${AWS_REGION}) \
                    --dry-run=client -o yaml | \
                    kubectl apply -f -

                  echo "ECR token refreshed at $(date)"
              env:
                - name: NAMESPACE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.namespace
                - name: AWS_REGION
                  value: ap-northeast-2
                - name: AWS_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: aws-credentials
                      key: AWS_ACCESS_KEY_ID
                - name: AWS_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: aws-credentials
                      key: AWS_SECRET_ACCESS_KEY

Apply the manifest.

kubectl apply -f aws-ecr-token-refresher.yaml

This CronJob runs every 6 hours to refresh the ECR token, which Amazon ECR invalidates 12 hours after it is issued. To create the initial moreh-registry secret, you can run the following command.

kubectl create job -n mif initial-aws-ecr-token-refresh \
  --from=cronjob/aws-ecr-token-refresher
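
You can wait for the initial job to complete before checking the secret.

kubectl wait -n mif --for=condition=complete --timeout=120s \
  job/initial-aws-ecr-token-refresh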

You can check whether the moreh-registry secret has been created using the following command.

kubectl get secret -n mif moreh-registry
Expected output
NAME             TYPE                             DATA   AGE
moreh-registry   kubernetes.io/dockerconfigjson   1      101s