Alerting

Thalassa Prometheus Service includes a built-in Alertmanager for configuring alerts and routing notifications. This page explains how the Ruler, which evaluates alert and recording rules, and the Alertmanager work together, and how to configure them in Thalassa Prometheus Service.

Overview

The alerting system consists of:

  • Alert rules, which define the conditions that trigger alerts.
  • Recording rules, which pre-compute expensive queries to improve query performance.
  • Alertmanager configuration, which routes alerts to notification channels.
  • Notification channels, such as email, Slack, Microsoft Teams, webhooks, and other supported integrations.

Managing Rules and Alertmanager Configuration

Thalassa Prometheus Service supports several methods for managing alert rules, recording rules, and Alertmanager configuration:

  • Mimirtool, a command-line tool for managing rules and Alertmanager configuration.
  • The Console, a web-based interface for managing rules and notification channels.
  • HTTP APIs compatible with the Cortex/Mimir Ruler and Alertmanager APIs.
  • Other tools that support Prometheus APIs, such as amtool and promtool.

Creating Alert Rules

Basic Alert Rule

The following rule group defines an alert that fires when average CPU usage stays above 80% for 5 minutes:

groups:
- name: infrastructure
  interval: 30s
  rules:
  - alert: HighCPUUsage
    # avg by (instance) keeps the instance label used in the description below
    expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
    for: 5m
    labels:
      severity: warning
      team: platform
    annotations:
      summary: "High CPU usage detected"
      description: "CPU usage is {{ $value }}% on {{ $labels.instance }}"

Alert Rule Components

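  • expr: PromQL expression to evaluate. Every series it returns becomes an alert, so with a comparison such as > 80 only series above the threshold fire.
  • for: How long the expression must keep returning a series before the alert changes from pending to firing.
  • labels: Extra labels attached to the alert; Alertmanager uses them for routing and grouping.
  • annotations: Human-readable information about the alert, such as a summary and description. Templates like {{ $value }} and {{ $labels.instance }} are expanded when the alert fires.
  • interval (set on the group): How often every rule in the group is evaluated.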
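As a further illustration, here is a second rule group, a sketch that assumes the up series from your scrapers is written to the service. It fires InstanceDown when a target has been unreachable for two minutes and sets severity: critical, the label that the Alert Routing example on this page matches on. The group name availability and the 2m duration are illustrative choices, not service defaults.

```yaml
groups:
- name: availability            # illustrative group name
  interval: 30s                 # evaluate the rules in this group every 30 seconds
  rules:
  - alert: InstanceDown
    # up is 0 when the most recent scrape of a target failed
    expr: up == 0
    # stay pending for 2 minutes first, so a single failed scrape does not page anyone
    for: 2m
    labels:
      severity: critical        # used by Alertmanager routes to pick a receiver
    annotations:
      summary: "Instance {{ $labels.instance }} is down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been unreachable for more than 2 minutes."
```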

Recording Rules

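Recording rules evaluate an expression at every group interval and store the result as a new time series. Dashboards and alerts can then query the pre-computed series instead of re-running an expensive query, which improves query performance and reduces costs: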

groups:
- name: recording_rules
  interval: 30s
  rules:
  - record: instance:node_cpu:rate5m
    expr: rate(node_cpu_seconds_total[5m])
  
  - record: instance:node_memory:usage_percent
    expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
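Alert rules can read a recorded series just like any other metric. A minimal sketch, assuming the instance:node_memory:usage_percent rule above is loaded; the alert name, the 90% threshold, and the 10m duration are illustrative:

```yaml
groups:
- name: memory_alerts           # illustrative group name
  rules:
  - alert: HighMemoryUsage
    # Reads the pre-computed series instead of recomputing the memory ratio
    expr: instance:node_memory:usage_percent > 90
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "High memory usage on {{ $labels.instance }}"
      description: "Memory usage is {{ $value }}% on {{ $labels.instance }}"
```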

Managing Rules with Mimirtool

Mimirtool is a command-line tool for managing rules and Alertmanager configuration in Cortex/Mimir-based Prometheus services.

Installation

Download the mimirtool binary for your platform from the Grafana Mimir releases page on GitHub, or install it with a package manager (for example, Homebrew on macOS).

Authentication

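Mimirtool has no separate login step. It sends a bearer token with every request, so obtain an OIDC token with the Thalassa CLI and store the connection details in environment variables:

# Connection details; mimirtool reads MIMIR_ADDRESS and MIMIR_TENANT_ID automatically
export MIMIR_ADDRESS=https://prometheus.nl-01.thalassa.cloud
export MIMIR_TENANT_ID=<your-tenant-id>

# Obtain an OIDC bearer token
export THALASSA_BEARER_TOKEN=$(tcloud oidc get-bearer-token)

# Optional: mimirtool also reads the token from MIMIR_AUTH_TOKEN,
# so the --auth-token flag can then be omitted
export MIMIR_AUTH_TOKEN=$THALASSA_BEARER_TOKEN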

Managing Alert Rules

Load rules from file:

mimirtool rules load rules.yaml \
  --address $MIMIR_ADDRESS \
  --id $MIMIR_TENANT_ID \
  --auth-token $THALASSA_BEARER_TOKEN

List existing rules:

mimirtool rules list \
  --address $MIMIR_ADDRESS \
  --id $MIMIR_TENANT_ID \
  --auth-token $THALASSA_BEARER_TOKEN

Delete a rule group:

mimirtool rules delete <namespace> <group> \
  --address $MIMIR_ADDRESS \
  --id $MIMIR_TENANT_ID \
  --auth-token $THALASSA_BEARER_TOKEN

Check rule files against recommended practices before loading them. This runs locally, so no connection flags are needed:

mimirtool rules check rules.yaml

Managing Alertmanager Configuration

You can configure Alertmanager for your Prometheus tenant using mimirtool. This allows you to upload, view, or remove your Alertmanager settings directly from the command line.
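mimirtool alertmanager load takes a plain Alertmanager configuration file. A minimal sketch of alertmanager.yaml, assuming a single webhook receiver; the receiver name and URL are placeholders to replace with your own:

```yaml
# alertmanager.yaml: every alert goes to one receiver
route:
  receiver: 'default'           # fallback receiver for all alerts
  group_by: ['alertname']       # batch alerts with the same name into one notification
receivers:
- name: 'default'
  webhook_configs:
  - url: 'https://example.com/alertmanager-webhook'   # placeholder endpoint
    send_resolved: true         # also notify when the alert resolves
```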

Load Alertmanager config:

mimirtool alertmanager load alertmanager.yaml \
  --address $MIMIR_ADDRESS \
  --id $MIMIR_TENANT_ID \
  --auth-token $THALASSA_BEARER_TOKEN

Get current Alertmanager config:

mimirtool alertmanager get \
  --address $MIMIR_ADDRESS \
  --id $MIMIR_TENANT_ID \
  --auth-token $THALASSA_BEARER_TOKEN

Delete Alertmanager config:

mimirtool alertmanager delete \
  --address $MIMIR_ADDRESS \
  --id $MIMIR_TENANT_ID \
  --auth-token $THALASSA_BEARER_TOKEN

Managing Rules via Prometheus APIs

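Thalassa Prometheus Service exposes HTTP APIs for rules and Alertmanager configuration that are compatible with the Cortex/Mimir Ruler and Alertmanager APIs. The examples below assume TOKEN holds a bearer token, for example from TOKEN=$(tcloud oidc get-bearer-token).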

Ruler API

List rule groups:

curl -H "Authorization: Bearer $TOKEN" \
  https://prometheus.nl-01.thalassa.cloud/api/v1/rules

Get the rule groups in a namespace:

curl -H "Authorization: Bearer $TOKEN" \
  https://prometheus.nl-01.thalassa.cloud/api/v1/rules/<namespace>

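Create or update a rule group in a namespace (POST). The request body is a single rule group (name, optional interval, and rules), not a file with a top-level groups: list:

curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/yaml" \
  --data-binary @rule-group.yaml \
  https://prometheus.nl-01.thalassa.cloud/api/v1/rules/<namespace>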
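A sketch of a request body for this POST, assuming Thalassa follows the Cortex/Mimir format in which the body is one rule group and the namespace comes from the URL. It reuses the HighCPUUsage rule from earlier on this page:

```yaml
# One rule group per request; no top-level "groups:" key
name: infrastructure
interval: 30s
rules:
- alert: HighCPUUsage
  expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
  for: 5m
  labels:
    severity: warning
    team: platform
  annotations:
    summary: "High CPU usage detected"
    description: "CPU usage is {{ $value }}% on {{ $labels.instance }}"
```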

Delete rule group:

curl -X DELETE \
  -H "Authorization: Bearer $TOKEN" \
  https://prometheus.nl-01.thalassa.cloud/api/v1/rules/<namespace>/<group>

Alertmanager API

Get Alertmanager config:

curl -H "Authorization: Bearer $TOKEN" \
  https://prometheus.nl-01.thalassa.cloud/api/v1/alerts

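Set Alertmanager config. As with the Cortex/Mimir Alertmanager API, this endpoint expects the Alertmanager configuration embedded as a string under an alertmanager_config key (with optional template_files), not the plain file that mimirtool alertmanager load accepts:

curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/yaml" \
  --data-binary @alertmanager-request.yaml \
  https://prometheus.nl-01.thalassa.cloud/api/v1/alerts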
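A sketch of a request body for this endpoint, assuming the Cortex/Mimir format where the Alertmanager configuration is a YAML string under alertmanager_config. The SMTP relay, sender, and recipient addresses are placeholders:

```yaml
# Request body for POST /api/v1/alerts (a wrapper, not a plain alertmanager.yaml)
alertmanager_config: |
  global:
    smtp_smarthost: 'smtp.example.com:587'   # placeholder SMTP relay
    smtp_from: 'alertmanager@example.com'    # placeholder sender address
  route:
    receiver: 'email-team'
  receivers:
  - name: 'email-team'
    email_configs:
    - to: 'platform-team@example.com'        # placeholder recipient
```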

Alert Routing

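Route alerts to different receivers based on severity or other labels. Child routes are checked in order and an alert is sent to the first route that matches; set continue: true on a route to keep matching the routes after it. Alerts that match no child route go to the top-level receiver. In the example below, a critical alert with team: platform therefore goes only to pagerduty: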

route:
  group_by: ['alertname', 'severity']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
  receiver: 'default'
  routes:
  - matchers:
    - severity="critical"
    receiver: 'pagerduty'
  - matchers:
    - severity="warning"
    receiver: 'slack-alerts'
  - matchers:
    - team="platform"
    receiver: 'email-team'
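The route tree refers to receivers only by name; each one must also be defined in the receivers section of the same Alertmanager configuration, or the configuration is rejected. A sketch of matching definitions follows. The integration chosen for each receiver is an assumption, and the integration key, webhook URLs, addresses, and SMTP relay are placeholders:

```yaml
global:
  # Needed by email_configs unless set on each receiver (placeholder values)
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alertmanager@example.com'

receivers:
- name: 'default'
  webhook_configs:
  - url: 'https://example.com/alertmanager-webhook'   # placeholder endpoint
- name: 'pagerduty'
  pagerduty_configs:
  - routing_key: '<pagerduty-events-v2-integration-key>'
- name: 'slack-alerts'
  slack_configs:
  - api_url: '<slack-incoming-webhook-url>'
    channel: '#alerts'
    send_resolved: true                                # also post when alerts resolve
- name: 'email-team'
  email_configs:
  - to: 'platform-team@example.com'                    # placeholder recipient
```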
