# Log collection

This document explains how to enable centralized log collection for the MoAI Inference Framework using Loki (log aggregation) and Vector (log collection agent).

# Overview

flowchart TB
    pods["`**Inference Service Pods**`"]
    vector["`**Vector**`"]
    grafana["`**Grafana**`"]

    subgraph log_storage["Log Storage"]
        loki["`**Loki**`"]
        minio[("**MinIO**")]
    end

    pods -->|"container logs"| vector
    vector -->|"transforms + labels"| loki
    loki -->|"LogQL"| grafana

# Architecture details

# Loki

Property Value
Helm chart grafana/loki v6.30.0
App version 3.5.1
Storage backend S3 (MinIO), TSDB index
Retention 90 days (2160 h)
Ingestion limit 30 MB/s, 60 MB burst
Max entries/query 50 000
Deployment Distributed (gateway / read / write / backend)

# Vector

Property Value
Helm chart vector/vector v0.39.0
Deployment DaemonSet (Agent mode, one pod per node)
Log source Pods labelled mif.moreh.io/log.collect=true (kubernetes_logs)
Log format JSON parsing applied only to pods labelled mif.moreh.io/log.format=json
Tolerations unschedulable, compute, amd.com/gpu

# MinIO

Property Value
Helm chart minio/minio v5.4.0
Mode Standalone
Bucket loki (created via post-install Job on startup)
Loki credentials Dedicated loki user with S3 policy scoped to loki bucket
Resources 2 Gi memory (requests)
Persistence emptyDir (ephemeral by default)
Deployment Single pod

# Component naming

Service names are derived from the Helm release name. With the default release name mif:

Service Name (same-namespace access)
MinIO mif-minio
Loki gateway mif-loki-gateway
Loki read mif-loki-read
Loki write mif-loki-write

Vector connects to Loki using the release-prefixed service name since all components are co-located in the same namespace.


# Prerequisites

  • The moai-inference-framework Helm chart installed (or being installed).

# Installation

Log collection is installed as part of the moai-inference-framework Helm chart. See Prerequisites for the required values and install command.


# Verifying the installation

Check that all Loki components are running.

kubectl get pods -n mif -l app.kubernetes.io/name=loki
Expected output (all pods Running)
NAME                           READY   STATUS    RESTARTS   AGE
loki-backend-0                 1/1     Running   0          2m
loki-gateway-xxxxxxxxx-xxxxx   1/1     Running   0          2m
loki-read-xxxxxxxxx-xxxxx      1/1     Running   0          2m
loki-write-0                   1/1     Running   0          2m

Check that Vector is running on all nodes.

kubectl get pods -n mif -l app.kubernetes.io/name=vector
Expected output (one pod per node, all Running)
NAME           READY   STATUS    RESTARTS   AGE
vector-xxxxx   1/1     Running   0          2m
vector-yyyyy   1/1     Running   0          2m

Check Vector logs to confirm it is shipping to Loki without errors.

kubectl logs -n mif -l app.kubernetes.io/name=vector --tail=50

# Enabling log collection for a pod

Vector collects logs only from pods that explicitly opt in. Two pod labels control this behavior.

# Opt-in label

Add the mif.moreh.io/log.collect=true label to a pod to include its logs in Vector's collection. Pods without this label are ignored entirely.

metadata:
  labels:
    mif.moreh.io/log.collect: "true"

# Log format label

Add the mif.moreh.io/log.format=json label to enable structured JSON log parsing for a pod. When set, Vector parses each log line as JSON and promotes the following fields:

JSON field Mapped to
msg message
time timestamp
level level (Loki label)
others merged into the event

Without this label, the log line is forwarded as-is without any JSON parsing.

metadata:
  labels:
    mif.moreh.io/log.collect: "true"
    mif.moreh.io/log.format: "json"

# Searching logs in Grafana

# Accessing Grafana

If you have not yet accessed Grafana, follow the Accessing Grafana guide to retrieve admin credentials, set up port forwarding, and log in.

# Opening the Explore view

After logging in to Grafana, click on the Explore icon (compass) in the left sidebar. You will see the Explore view with a query editor:

Grafana Explore view
Grafana Explore view

# Selecting the Loki datasource

If the datasource is not already set to Loki, click the datasource dropdown at the top of the page and select Loki:

Selecting the Loki datasource
Selecting the Loki datasource

# Switching to Code mode

The query editor defaults to Builder mode, which provides a visual query builder. To write LogQL queries directly, click the Code button to switch to Code mode:

Switching to Code mode
Switching to Code mode

# Running a log query

Enter a LogQL query in the query editor and click Run query (or press Shift+Enter). For example, {namespace="default"} returns all logs from the default namespace. The screenshot below shows the results, which include both plain-text and JSON-formatted logs collected from different pods:

Log search results in Grafana
Log search results in Grafana

# Labels available for log search

Vector enriches every log entry with the following labels, which can be used as LogQL selectors:

Label Source Example value
namespace kubernetes.pod_namespace default
inference_service pod label app.kubernetes.io/instance llama-3-2-1b
pool_name pod label mif.moreh.io/pool heimdall
role pod label mif.moreh.io/role prefill, decode
app pod label app.kubernetes.io/name vllm
node_name VECTOR_SELF_NODE_NAME env var (injected by Vector) gpu-node-01
level parsed from JSON log field level (pods with mif.moreh.io/log.format=json only) info, warn, error

# Query examples

Filter by a single label:

{namespace="default"}
{inference_service="llama-3-2-1b"}
{pool_name="heimdall"}
{role="decode"}

Combine multiple labels and search for a keyword in the log line:

{namespace="default", inference_service="llama-3-2-1b", role="prefill"} |= "error"

Filter by log level (available only for JSON-formatted pods):

{namespace="default", level="error"}

# Using an external MinIO

If MinIO is already deployed outside this chart, set minio.enabled: false and configure lokiBucket with the host and credentials of a MinIO user that has read/write access to the loki bucket.

Same namespace — if the existing MinIO service name matches <release>-minio, only credentials are required:

moai-inference-framework-values.yaml
minio:
  enabled: false
lokiBucket:
  accessKey: <accessKey>
  secretKey: <secretKey>

Different namespace — set lokiBucket.host to the FQDN so that Loki can resolve it cross-namespace:

moai-inference-framework-values.yaml
minio:
  enabled: false
lokiBucket:
  host: <minio.minio.svc.cluster.local>
  accessKey: <accessKey>
  secretKey: <secretKey>

# Disabling log collection

moai-inference-framework-values.yaml
minio:
  enabled: false
loki:
  enabled: false
vector:
  enabled: false