Prometheus Operator

Prometheus Operator in Thalassa Cloud Kubernetes

Prometheus Operator makes it easy to set up and manage Prometheus-based monitoring in Kubernetes. With it, you can collect application metrics, define alerting rules, and view dashboards, all configured through Kubernetes resources.

This guide will show you how to install Prometheus Operator, collect metrics, set up alerts with Alertmanager, and view dashboards in Grafana.

What is Prometheus Operator?

Prometheus Operator lets you manage Prometheus, Alertmanager, and related tools using Kubernetes resources. Instead of manual setup, you define what you want to monitor with Kubernetes CRDs, and the operator handles the rest.

Key features include:

  • Automatic discovery of what needs monitoring (pods, services, etc.).
  • Easy configuration using ServiceMonitor and PodMonitor resources.
  • Supports high availability, failover, and persistent storage—for example, using Thalassa Cloud’s block storage.
  • Manages Prometheus and Alertmanager instances, and can expose custom metrics for autoscaling.

Prerequisites

Before installing Prometheus Operator, ensure the following prerequisites are met:

  • You have a running Kubernetes cluster in Thalassa Cloud. Prometheus Operator requires Kubernetes 1.19 or later (all Thalassa Cloud clusters meet this requirement).
  • You have cluster access configured with kubectl. Use tcloud kubernetes connect to set up access, or configure your kubeconfig manually.
  • You have cluster administrator permissions, as installing Prometheus Operator requires creating cluster-level resources.
  • You have planned for persistent storage and resource requirements. Prometheus stores metrics data, which can grow over time—use Thalassa Cloud’s block storage and allocate sufficient CPU and memory for Prometheus and related components.

Installing Prometheus Operator

The recommended way to install Prometheus Operator is the kube-prometheus-stack Helm chart from the Prometheus Community, which bundles the operator with Prometheus, Alertmanager, and Grafana and keeps configuration in Helm values.

First, add the Prometheus Community Helm repository:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Create a namespace for Prometheus:

kubectl create namespace monitoring

Install Prometheus Operator using Helm:

helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=tc-block \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi

This installs the full Prometheus stack, including Prometheus, Alertmanager, and Grafana. The configuration specifies Thalassa Cloud’s block storage class (tc-block) for persistent storage and requests 50Gi of storage. Adjust the storage size based on your retention requirements.

Verify that Prometheus Operator is running:

kubectl get pods -n monitoring

You should see pods for the Prometheus Operator, Prometheus, Alertmanager, and Grafana. If any pod is not Running, check the operator logs:

kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus-operator

The logs should show that the operator is running and ready to manage Prometheus instances.

Accessing Prometheus and Grafana

After installation, you can access Prometheus and Grafana through port forwarding or by exposing them through services. For initial setup and testing, port forwarding is the simplest approach.

Port forward Prometheus:

kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090

Open your browser and navigate to http://localhost:9090 to access the Prometheus UI. You can query metrics, view targets, and check alert status.

Port forward Grafana:

kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80

Open your browser and navigate to http://localhost:3000 to access Grafana. The default username is admin, and you can get the password:

kubectl get secret prometheus-grafana -n monitoring -o jsonpath='{.data.admin-password}' | base64 -d && echo

Production Access

For production use, expose Prometheus and Grafana through ingress or load balancers. You can configure services of type LoadBalancer to use Thalassa Cloud’s load balancers, or set up an ingress controller. See the Service Load Balancers documentation for details.
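
For example, with the kube-prometheus-stack chart installed above, you can switch the bundled Grafana and Prometheus services to type LoadBalancer through a Helm values file. This is only a minimal sketch (the file name is arbitrary); restrict access with authentication, firewalling, or an ingress with TLS before exposing these UIs publicly:

# values-expose.yaml (sketch): expose Grafana and Prometheus via cloud load balancers
grafana:
  service:
    type: LoadBalancer
prometheus:
  service:
    type: LoadBalancer

Apply it with helm upgrade prometheus prometheus-community/kube-prometheus-stack -n monitoring --reuse-values -f values-expose.yaml.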

Configuring Service Discovery

Prometheus Operator uses ServiceMonitor and PodMonitor resources to configure what metrics Prometheus should collect. This Kubernetes-native approach makes it easy to add monitoring for applications.

Create a ServiceMonitor to scrape metrics from a service:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-metrics
  namespace: default
  labels:
    # kube-prometheus-stack only selects ServiceMonitors carrying the Helm
    # release label by default (release name "prometheus" in this guide)
    release: prometheus
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics

This ServiceMonitor tells Prometheus to scrape every service labeled app: my-app on the port named metrics, so the target Service must expose a port with that name. Prometheus then discovers and monitors matching services automatically.
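
A minimal sketch of such a Service is shown below; the my-app name and port 8080 are placeholders for your application:

apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: default
  labels:
    app: my-app          # matched by the ServiceMonitor selector
spec:
  selector:
    app: my-app
  ports:
  - name: metrics        # must match the port name in the ServiceMonitor endpoint
    port: 8080
    targetPort: 8080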

Apply the ServiceMonitor:

kubectl apply -f servicemonitor.yaml

Prometheus will automatically pick up the ServiceMonitor and begin scraping metrics. You can verify this in the Prometheus UI by navigating to Status → Targets, where you should see your service listed.

For applications that expose metrics directly from pods rather than services, use a PodMonitor:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-pod-metrics
  namespace: default
  labels:
    # As with ServiceMonitors, the chart selects PodMonitors by the Helm release label by default
    release: prometheus
spec:
  selector:
    matchLabels:
      app: my-app
  podMetricsEndpoints:
  - port: metrics
    interval: 30s
    path: /metrics

This PodMonitor scrapes metrics directly from pods, which is useful for applications that don’t use services or for sidecar metrics.
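
For the PodMonitor to find targets, the pods must carry the app: my-app label and expose a container port named metrics. A minimal Deployment sketch, where the image and port number are placeholders:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app            # matched by the PodMonitor selector
    spec:
      containers:
      - name: my-app
        image: registry.example.com/my-app:latest   # placeholder image
        ports:
        - name: metrics        # must match the podMetricsEndpoints port name
          containerPort: 8080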

Configuring Alerting

Prometheus Operator manages Alertmanager for handling alerts. You configure alerting rules using PrometheusRule resources, and Alertmanager handles routing and notifications.

Create a PrometheusRule with alert definitions:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  namespace: default
  labels:
    # The chart selects PrometheusRules by the Helm release label by default
    release: prometheus
    role: alert-rules
spec:
  groups:
  - name: my-app
    interval: 30s
    rules:
    - alert: HighMemoryUsage
      expr: container_memory_usage_bytes{pod=~"my-app.*"} > 500000000
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High memory usage in my-app"
        description: "Pod {{ $labels.pod }} is using {{ $value }} bytes of memory"
    - alert: PodCrashLooping
      expr: rate(kube_pod_container_status_restarts_total{pod=~"my-app.*"}[5m]) > 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Pod is crash looping"
        description: "Pod {{ $labels.pod }} is restarting frequently"

This PrometheusRule defines two alerts: one for high memory usage and one for pods that are crash looping. Apply it:

kubectl apply -f prometheusrule.yaml

Prometheus will automatically load these rules and evaluate them; you can confirm they were picked up in the Prometheus UI under Status → Rules. When an alert’s condition holds for the configured duration, it fires and is sent to Alertmanager.

Configure Alertmanager to send notifications. Edit the Alertmanager configuration:

kubectl edit secret alertmanager-prometheus-kube-prometheus-alertmanager -n monitoring

Or create a custom Alertmanager configuration (the operator reads the configuration from the alertmanager.yaml key of this secret):

apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-prometheus-kube-prometheus-alertmanager
  namespace: monitoring
type: Opaque
stringData:
  alertmanager.yaml: |
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname', 'cluster', 'service']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 12h
      receiver: 'web.hook'
    receivers:
    - name: 'web.hook'
      webhook_configs:
      - url: 'http://your-webhook-url'

This configuration routes alerts to a webhook. You can configure email, Slack, PagerDuty, or other notification channels. See the Alertmanager documentation for all supported receivers.
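
For example, a Slack receiver can replace the webhook above. The fragment below belongs under the alertmanager.yaml key; the webhook URL and channel are placeholders, and the route’s receiver must point at slack-notifications for it to be used:

receivers:
- name: 'slack-notifications'
  slack_configs:
  - api_url: 'https://hooks.slack.com/services/REPLACE/WITH/TOKEN'   # placeholder Slack webhook URL
    channel: '#alerts'
    send_resolved: true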

Configuring Storage

Prometheus stores metrics data, and the amount of storage needed depends on your retention period, scrape interval, and number of metrics. Thalassa Cloud’s block storage provides high-performance, durable storage for Prometheus data.

When installing Prometheus Operator, you can configure storage in the Helm values:

prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: tc-block
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 100Gi

This configures Prometheus to use Thalassa Cloud’s block storage with 100Gi of capacity. Adjust the size to your retention requirements: Prometheus needs roughly 1-2 bytes per ingested sample, so required disk is approximately retention time × samples ingested per second × bytes per sample. For example, 100,000 active series scraped every 30 seconds and retained for 15 days comes to about 100,000 × 43,200 samples × 2 bytes ≈ 8.6 GB, plus headroom for the write-ahead log and compaction. For more information about storage classes, see the Storage Classes documentation.

You can resize the storage later if needed. See the Resize Persistent Volume guide for details about expanding Prometheus storage.

For long-term retention, consider using Prometheus’s remote write feature to send metrics to external storage systems or Thalassa Cloud’s object storage for archival purposes.
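
With the kube-prometheus-stack chart, remote write is configured under prometheusSpec in the Helm values. A minimal sketch, assuming a remote endpoint that speaks the Prometheus remote write protocol; the URL and metric filter are placeholders:

prometheus:
  prometheusSpec:
    remoteWrite:
    - url: https://metrics-archive.example.com/api/v1/write   # placeholder remote write endpoint
      writeRelabelConfigs:
      - sourceLabels: [__name__]
        regex: 'my_app_.*'       # only ship selected metrics to long-term storage
        action: keep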

References and Further Reading

For more detailed information and official documentation, see the following resources: