Remote Write Configuration

Thalassa Prometheus Service accepts metrics via the Prometheus remote write protocol. Configure your Prometheus instances (or Prometheus Operator) to forward collected metrics to the remote write endpoint.

Overview

Remote write lets your Prometheus instances forward collected metrics to Thalassa Prometheus Service, which takes over long-term storage. You can aggregate metrics from multiple Prometheus instances into a single queryable endpoint, which simplifies metric management across your infrastructure. Because historical metrics are kept in the managed service, you can shorten local retention on each Prometheus instance and reduce its storage requirements.

Basic Configuration

Prometheus Configuration

Add a remote_write section to your prometheus.yml:

remote_write:
  - url: https://prometheus-write.nl-01.thalassa.cloud/api/v1/push
    oauth2:
      client_id: <service-account-id>
      client_secret: <service-account-secret>
      token_url: https://api.thalassa.cloud/v1/oidc/token
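
To keep the secret out of prometheus.yml, the Prometheus oauth2 block also accepts client_secret_file in place of client_secret. A minimal sketch, assuming the secret has been written to the hypothetical path shown below:

```yaml
remote_write:
  - url: https://prometheus-write.nl-01.thalassa.cloud/api/v1/push
    oauth2:
      client_id: <service-account-id>
      # Hypothetical path (an assumption); the file contains only the secret value
      client_secret_file: /etc/prometheus/secrets/thalassa-client-secret
      token_url: https://api.thalassa.cloud/v1/oidc/token
```

The file should be readable only by the user that runs Prometheus.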

Prometheus Operator Configuration

For Kubernetes deployments using Prometheus Operator:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  remoteWrite:
    - url: https://prometheus-write.nl-01.thalassa.cloud/api/v1/push
      oauth2:
        clientId:
          secret:
            name: prometheus-credentials
            key: client-id
        clientSecret:
          name: prometheus-credentials
          key: client-secret
        tokenUrl: https://api.thalassa.cloud/v1/oidc/token

Create the secret with the credentials, in the same namespace as the Prometheus resource:

kubectl create secret generic prometheus-credentials \
  --from-literal=client-id=<service-account-id> \
  --from-literal=client-secret=<service-account-secret>
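
If you manage cluster resources declaratively, the same secret can be written as a manifest. This is a sketch only: the monitoring namespace is an assumption and should match the namespace of your Prometheus resource.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-credentials
  # Assumption: replace with the namespace of the Prometheus resource
  namespace: monitoring
type: Opaque
stringData:
  # Keys must match the key fields referenced under spec.remoteWrite[].oauth2
  client-id: <service-account-id>
  client-secret: <service-account-secret>
```

Avoid committing the filled-in manifest to version control unless it is encrypted.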

Authentication

OIDC Authentication

Thalassa Prometheus Service uses OIDC authentication integrated with Thalassa Cloud IAM. For production deployments, we recommend using a service account. To set up a service account, create it in IAM → Service Accounts, grant it appropriate permissions for Prometheus remote write, and generate access credentials. Use these credentials in your Prometheus configuration for secure, automated authentication.

Alternatively, you can use a personal access token created in your account settings. While simpler to set up, personal access tokens are less suitable for automation and should be used primarily for testing or manual operations.

OIDC Token Exchange

For CI/CD pipelines, use OIDC token exchange:

# Exchange OIDC token for Thalassa Cloud access token
TOKEN=$(tcloud oidc token-exchange \
  --subject-token "${OIDC_TOKEN}" \
  --organisation-id "${THALASSA_ORGANISATION_ID}" \
  --service-account-id "${THALASSA_SERVICE_ACCOUNT_ID}")

# Use token in remote_write configuration
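
One way to pass the exchanged token to Prometheus is the authorization block with a credentials file. This is a sketch under two assumptions: that the remote write endpoint accepts the token as a Bearer credential, and that your pipeline writes the value of TOKEN to the hypothetical file path shown.

```yaml
remote_write:
  - url: https://prometheus-write.nl-01.thalassa.cloud/api/v1/push
    authorization:
      type: Bearer
      # Hypothetical path (an assumption); the file contains only the token
      credentials_file: /etc/prometheus/secrets/thalassa-token
```

Reading the token from a file keeps it out of prometheus.yml and out of the rendered configuration shown in the Prometheus UI.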

IP ACLs

Restrict remote write access by IP address:

Via Console

To configure IP ACLs through the console, navigate to your Prometheus instance and go to Configuration → Remote Write → IP ACLs. You can add individual IP addresses, such as 192.168.1.100, or CIDR blocks like 192.168.1.0/24 to allow entire network ranges. Multiple entries can be added to accommodate multiple source networks or specific hosts that need to send metrics.

Via API

curl -X PUT \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "ip_acls": [
      "192.168.1.0/24",
      "10.0.0.0/8"
    ]
  }' \
  https://api.thalassa.cloud/v1/prometheus/instances/<instance-id>/remote-write/acls

Rate Limiting

Configure rate limits to control remote write throughput:

Via Console

To configure rate limits through the console, navigate to Configuration → Remote Write → Rate Limits. Set the maximum samples per second that can be ingested, and configure the burst size, which determines how many samples can be sent in a short burst before rate limiting takes effect. These settings help prevent overload and control costs while allowing for traffic spikes during high-activity periods.

Via API

curl -X PUT \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "rate_limit": {
      "samples_per_second": 10000,
      "burst_size": 50000
    }
  }' \
  https://api.thalassa.cloud/v1/prometheus/instances/<instance-id>/remote-write/rate-limit

Write Relabeling

Filter and transform metrics before they leave Prometheus. Rules are applied in order, so a sample removed by one rule is never seen by later rules:

remote_write:
  - url: https://prometheus-write.nl-01.thalassa.cloud/api/v1/push
    write_relabel_configs:
      # Drop specific metrics by name
      - source_labels: [__name__]
        regex: 'up|prometheus_.*'
        action: drop

      # Keep only metrics whose name matches; everything else is dropped
      - source_labels: [__name__]
        regex: 'node_.*|http_.*'
        action: keep

      # Add a static label (overwrites any existing value)
      - target_label: environment
        replacement: production

      # Copy the value of cluster into a new label, cluster_name
      - source_labels: [cluster]
        target_label: cluster_name
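
For Prometheus Operator, the same kind of rules go under writeRelabelConfigs, with camelCase field names. A minimal sketch that reuses two of the rules above; the resource name matches the earlier Operator example:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  remoteWrite:
    - url: https://prometheus-write.nl-01.thalassa.cloud/api/v1/push
      writeRelabelConfigs:
        # Keep only metrics whose name matches
        - sourceLabels: [__name__]
          regex: 'node_.*|http_.*'
          action: keep
        # Add a static label
        - targetLabel: environment
          replacement: production
```

Authentication settings are omitted here for brevity and would sit alongside writeRelabelConfigs in the same remoteWrite entry.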

Queue Configuration

Tune remote write queue settings for optimal performance:

remote_write:
  - url: https://prometheus-write.nl-01.thalassa.cloud/api/v1/push
    queue_config:
      max_samples_per_send: 1000      # Samples per batch
      batch_send_deadline: 5s          # Max time to wait for batch
      min_backoff: 30ms                # Initial backoff
      max_backoff: 100ms               # Maximum backoff
      capacity: 10000                  # Queue capacity
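
Throughput also depends on how many shards send in parallel, which Prometheus scales between min_shards and max_shards. The values below are illustrative assumptions, not recommendations from Thalassa:

```yaml
remote_write:
  - url: https://prometheus-write.nl-01.thalassa.cloud/api/v1/push
    queue_config:
      min_shards: 1     # Shards started initially
      max_shards: 20    # Illustrative upper bound on parallel senders
```

A lower max_shards limits how hard a recovering Prometheus pushes against the endpoint, which may matter when a rate limit is configured.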

Conditional Processing

Chain keep and drop rules to express conditions, for example forwarding only production series and excluding histogram buckets:

remote_write:
  - url: https://prometheus-write.nl-01.thalassa.cloud/api/v1/push
    write_relabel_configs:
      # Only send metrics from production environment
      - source_labels: [environment]
        regex: 'production'
        action: keep
      
      # Drop high-cardinality metrics
      - source_labels: [__name__]
        regex: '.*_bucket.*'
        action: drop
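
A single rule can also match on several labels at once. The values of the listed source labels are joined with the separator and the regex must match the whole joined string. In this sketch the job name batch-worker is hypothetical:

```yaml
remote_write:
  - url: https://prometheus-write.nl-01.thalassa.cloud/api/v1/push
    write_relabel_configs:
      # Joined value looks like "<job>;<metric name>"
      - source_labels: [job, __name__]
        separator: ';'
        # Drop histogram buckets only for the hypothetical batch-worker job
        regex: 'batch-worker;.*_bucket'
        action: drop
```

Bucket series from all other jobs are still forwarded, because the joined string does not match.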

Best Practices

Use service accounts for authentication in production environments, as they provide better security and are designed for automated deployments. Configure IP ACLs to restrict remote write access by IP address, adding an additional layer of security to your metrics ingestion. Set appropriate rate limits to prevent overload and control costs while allowing for normal traffic patterns.

Use write relabeling to filter out unnecessary metrics before they are sent to the remote write endpoint, reducing both storage costs and query complexity. Regularly monitor the remote write queue status in your Prometheus UI to ensure metrics are being forwarded successfully and to detect any issues early. Before deploying to production, thoroughly test your remote write configuration in a non-production environment to verify authentication, network connectivity, and metric forwarding.
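
Queue monitoring can be automated with an alerting rule. The sketch below is adapted from the alert in the upstream Prometheus monitoring mixin; the metric names depend on your Prometheus version, and the threshold and duration are illustrative assumptions.

```yaml
groups:
  - name: remote-write
    rules:
      - alert: RemoteWriteBehind
        # Seconds between the newest ingested sample and the newest sample sent
        expr: |
          (
            max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds[5m])
            - ignoring(remote_name, url) group_right
            max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds[5m])
          ) > 120
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: Remote write is more than 2 minutes behind
```

Load the file through rule_files in prometheus.yml, or wrap the group in a PrometheusRule resource when using Prometheus Operator.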

Troubleshooting

Remote Write Failing

If remote write is failing, first verify that your authentication credentials are correct and that the service account or personal access token has the necessary permissions. Check that your source IP address is included in the IP ACLs configuration if IP restrictions are enabled. Review your rate limit settings to ensure they aren’t too restrictive for your metric volume. Finally, check your Prometheus logs for specific remote write error messages that can help identify the root cause.

High Queue Size

When the queue size stays high, consider increasing the max_samples_per_send value in your queue configuration to allow larger batches. If possible, lengthen scrape intervals for non-critical metrics so that they are scraped less often, which lowers the overall sample rate. Use write relabeling to filter out unnecessary metrics, reducing the volume of data that needs to be forwarded. Also verify network connectivity between your Prometheus instance and the remote write endpoint to ensure there are no network issues causing delays.
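
Scraping a job less often lowers the number of samples it produces. A per-job scrape_interval overrides the global setting; the job name and target below are hypothetical:

```yaml
scrape_configs:
  - job_name: low-priority-exporter         # Hypothetical job name
    scrape_interval: 2m                     # Overrides global scrape_interval for this job
    static_configs:
      - targets: ['exporter.internal:9100'] # Hypothetical target
```

Moving a job from a 30s to a 2m interval cuts its sample rate to a quarter.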

Authentication Errors

For authentication errors, verify that your service account has the correct permissions assigned in IAM. If using OIDC token exchange, ensure the token exchange process is working correctly and that all required parameters are provided. Review your IAM roles and policies to confirm that the service account has access to the Prometheus service resources.
