Get started
Thanos is a set of components that can be composed into a highly available metric system with unlimited storage capacity, which can be added seamlessly on top of existing Prometheus deployments.
Before exploring the chart characteristics, let’s start by deploying the default configuration:
helm install <release-name> oci://dp.apps.rancher.io/charts/thanos \
--set global.imagePullSecrets={application-collection}
Please check the authentication guide if you need to configure Application Collection OCI credentials in your Kubernetes cluster.
Chart overview
The Thanos Helm chart distributed in Application Collection is built from scratch, incorporating good practices and standardizations.
By default, the chart deploys a minimal Thanos setup with only the Query component enabled, which can also be disabled. This modular approach allows you to enable only the components you need, keeping your deployment lightweight and tailored to your specific use case.
Chart configuration
To view the supported configuration options and documentation, run:
helm show values oci://dp.apps.rancher.io/charts/thanos
To view the contents of the Helm chart’s README file, run:
helm show readme oci://dp.apps.rancher.io/charts/thanos
Cluster configuration
The chart provides a clusterConf section to configure shared settings across all Thanos components. This ensures consistency and simplifies
the configuration of common features like object storage, tracing and security.
Object storage
Thanos relies on object storage for metric retention. You can define the connection settings using the clusterConf.objstoreConfiguration parameter, which is automatically applied to all relevant components (Sidecar, Compactor, Store Gateway and Ruler).
You can provide the configuration as a YAML string, a key-value map, or an array of entries. The chart will automatically generate the objstore_config.yml
file and mount it to the relevant components.
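For instance, the same settings can be expressed in map form instead of a YAML string. This is a sketch using the placeholder values from the MinIO example below; the chart serializes the map into objstore_config.yml either way:

```yaml
clusterConf:
  # Map form: equivalent to the string form, but structured YAML
  objstoreConfiguration:
    type: S3
    config:
      bucket: <minio-bucket-name>
      endpoint: "<minio-release-name>.default.svc.cluster.local:9000"
      access_key: <minio-access_key>
      secret_key: <minio-secret_key>
      insecure: true
```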
The following example will configure the Thanos chart to work with an existing MinIO deployment.
First, install the MinIO chart with a predefined bucket name:
helm install <minio-release-name> oci://dp.apps.rancher.io/charts/minio \
--set global.imagePullSecrets={application-collection} \
--set mode=standalone \
--set rootUser=<minio-access_key> \
--set rootPassword=<minio-secret_key> \
--set "buckets[0].name=<minio-bucket-name>" \
--set "buckets[0].policy=none" \
--set "buckets[0].purge=false"
This is a minimal setup intended for testing purposes. To learn more about installation and usage, see the MinIO reference guide.
Then, configure Thanos to use this MinIO instance:
clusterConf:
  objstoreConfiguration: |
    type: S3
    config:
      bucket: <minio-bucket-name>
      endpoint: "<minio-release-name>.default.svc.cluster.local:9000"
      access_key: <minio-access_key>
      secret_key: <minio-secret_key>
      insecure: true
store:
  enabled: true
Given that the Query component doesn’t need an object store, enable the Store component to check that the configuration works as expected:
helm install <release-name> oci://dp.apps.rancher.io/charts/thanos \
--set global.imagePullSecrets={application-collection} \
--values objstore.yaml
You can verify that the Store component has successfully applied the object storage configuration:
$ kubectl logs -f statefulset/<release-name>-thanos-store
ts=2025-12-23T23:12:48.274103713Z caller=factory.go:39 level=info msg="loading bucket configuration"
...
ts=2025-12-23T23:12:48.275010338Z caller=store.go:594 level=info msg="starting store node"
ts=2025-12-23T23:12:48.275046546Z caller=store.go:492 level=info msg="initializing bucket store"
ts=2025-12-23T23:12:48.275106171Z caller=intrumentation.go:75 level=info msg="changing probe status" status=healthy
ts=2025-12-23T23:12:48.275126755Z caller=http.go:72 level=info service=http/server component=store msg="listening for requests and metrics" address=0.0.0.0:10902
...
ts=2025-12-23T23:12:48.280287588Z caller=fetcher.go:690 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=5.1885ms duration_ms=5 cached=0 returned=0 partial=0
ts=2025-12-23T23:12:48.280362255Z caller=store.go:509 level=info msg="bucket store ready" init_duration=5.302666ms
ts=2025-12-23T23:12:48.280477838Z caller=intrumentation.go:56 level=info msg="changing probe status" status=ready
For more details on the supported object storage providers and their configuration options, refer to the Thanos Storage documentation.
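As a sketch of another supported provider, a Google Cloud Storage bucket uses the GCS type. The bucket name is a placeholder; credentials can be supplied via a service_account field or through workload identity, depending on your setup:

```yaml
clusterConf:
  objstoreConfiguration: |
    type: GCS
    config:
      bucket: <gcs-bucket-name>
```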
Tracing
Thanos supports multiple tracing backends to help you debug and understand the performance of your distributed setup.
You can provide the configuration as a YAML string, a key-value map, or an array of entries. The chart will automatically generate the tracing_config.yml
file and mount it to the relevant components.
The following example will configure the Thanos chart to work with an existing Jaeger Operator deployment.
clusterConf:
  tracingConfiguration: |
    type: JAEGER
    config:
      sampler_type: const
      sampler_param: 1
      endpoint: http://<jaeger-operator-release-name>-jaeger-collector.default.svc.cluster.local:14268/api/traces
helm install <release-name> oci://dp.apps.rancher.io/charts/thanos \
--set global.imagePullSecrets={application-collection} \
--values tracing.yaml
You can verify that the Query component has successfully applied the tracing configuration:
$ kubectl logs -f deployment/<release-name>-thanos-query
ts=2025-12-24T10:16:55.543313814Z caller=factory.go:43 level=info msg="loading tracing configuration"
For more details on the supported tracing backends, refer to the Thanos Tracing documentation.
HTTPS configuration
You can secure the HTTP endpoints of the Thanos components using TLS and Basic Authentication.
To enable TLS for HTTP connections, use the clusterConf.https.tls section. You will need to provide an existing Kubernetes Secret containing
the certificate and key.
First, generate a TLS certificate. You can create one manually or use cert-manager.
Once it is generated, create the Kubernetes secret for that certificate:
kubectl create secret generic tls-certs --from-file ca.crt --from-file tls.crt --from-file tls.key
clusterConf:
  https:
    tls:
      enabled: true
      existingSecret: tls-certs
      certFilename: tls.crt
      keyFilename: tls.key
      clientCAFilename: ca.crt
helm install <release-name> oci://dp.apps.rancher.io/charts/thanos \
--set global.imagePullSecrets={application-collection} \
--values http-tls.yaml
$ kubectl logs -f deployment/<release-name>-thanos-query
...
ts=2025-12-23T17:40:54.466627934Z caller=handler.go:87 level=info service=http/server component=query caller=tls_config.go:347 time=2025-12-23T17:40:54.466618309Z
msg="Listening on" address=[::]:10902
ts=2025-12-23T17:40:54.466937475Z caller=handler.go:87 level=info service=http/server component=query caller=tls_config.go:383 time=2025-12-23T17:40:54.466934559Z
msg="TLS is enabled." http2=true address=[::]:10902
To enable Basic Authentication for the HTTP endpoints, use the clusterConf.https.auth section. You can either provide the users directly in
the values.yaml (hashed with bcrypt) or reference an existing Secret.
clusterConf:
  https:
    auth:
      enabled: true
      basicAuthUsers:
        admin: "$2y$10$..."
helm install <release-name> oci://dp.apps.rancher.io/charts/thanos \
--set global.imagePullSecrets={application-collection} \
--values http-auth.yaml
For more details on the Thanos HTTPS feature, refer to the Thanos HTTPS documentation.
gRPC configuration
Thanos components communicate with each other using gRPC, which can be secured with TLS.
To enable TLS for the gRPC server (used by components to accept connections), use the clusterConf.grpc.server.tls section.
clusterConf:
  grpc:
    server:
      tls:
        enabled: true
        existingSecret: thanos-grpc-server-certs
        certFilename: tls.crt
        keyFilename: tls.key
        clientCAFilename: ca.crt
helm install <release-name> oci://dp.apps.rancher.io/charts/thanos \
--set global.imagePullSecrets={application-collection} \
--values grpc-server-tls.yaml
$ kubectl logs -f deployment/<release-name>-thanos-query
ts=2025-12-23T18:07:21.798682732Z caller=endpointset.go:270 level=info msg="performing initial DNS resolution for endpoints"
ts=2025-12-23T18:07:21.798720566Z caller=endpointset.go:288 level=info msg="initial DNS resolution completed"
ts=2025-12-23T18:07:21.799064441Z caller=options.go:33 level=info protocol=gRPC msg="enabling server side TLS"
ts=2025-12-23T18:07:21.799274691Z caller=options.go:74 level=info protocol=gRPC msg="server TLS client verification enabled"
ts=2025-12-23T18:07:21.799551649Z caller=query.go:663 level=info msg="starting query node"
To enable TLS for the gRPC client (used by components to initiate connections), use the clusterConf.grpc.client.tls section as done for the server side.
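Mirroring the server example, a client-side values sketch could look like the following. The parameter names are assumed to match the server section; confirm them against helm show values before use:

```yaml
clusterConf:
  grpc:
    client:
      tls:
        enabled: true
        existingSecret: thanos-grpc-client-certs
        certFilename: tls.crt
        keyFilename: tls.key
        clientCAFilename: ca.crt
```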
Components configuration
Each Thanos component has its own configuration section in the values.yaml file where you can enable it and configure its specific settings:
- compactor: Compactor component (default: disabled)
- query: Query component (default: enabled)
- queryFrontend: Query Frontend component (default: disabled)
- receiver: Receiver component (default: disabled)
- rule: Rule component (default: disabled)
- store: Store component (default: disabled)
All Thanos components in this chart follow a consistent configuration pattern. For any given configuration file required by a component (for
example, endpoint_config.yml for Query, hashrings.json for Receiver), the chart provides two key parameters:
- component.xyzConfigurationFile (such as clusterConf.objstoreConfigurationFile): Defines the name of the configuration file to be created.
- component.xyzConfiguration (such as clusterConf.objstoreConfiguration): Defines the content of that file. This can be provided as a YAML string, a key-value map, or a list.
When component.xyzConfiguration is set, the chart automatically enables the component’s ConfigMap (component.configMap) and populates it
with the provided content. This ConfigMap is then mounted into the component’s pods, ensuring the configuration is available at runtime.
For a practical example of this pattern, see the Compactor section below.
Compactor
The Thanos Compactor component applies the retention policy to the data stored in the object storage and downsamples historical data for faster querying.
To enable the Compactor, set compactor.enabled to true. You should also configure the retention periods for different resolutions.
clusterConf:
  objstoreConfiguration: |
    type: S3
    config:
      bucket: <minio-bucket-name>
      endpoint: "<minio-release-name>.default.svc.cluster.local:9000"
      access_key: <minio-access_key>
      secret_key: <minio-secret_key>
      insecure: true
compactor:
  enabled: true
  retentionResolutionRaw: "60d"
  retentionResolution5m: "120d"
  retentionResolution1h: "1y"
To verify the Compactor component is running and applying non-default retentions, you can check the logs:
$ kubectl logs -f statefulset/thanos-compactor
ts=2025-12-24T15:02:59.21188579Z caller=factory.go:39 level=info msg="loading bucket configuration"
...
ts=2025-12-24T15:02:59.214360665Z caller=compact.go:705 level=info msg="starting compact node"
...
ts=2025-12-24T15:02:59.35457879Z caller=blocks_cleaner.go:45 level=info msg="started cleaning of blocks marked for deletion"
ts=2025-12-24T15:02:59.354590165Z caller=blocks_cleaner.go:62 level=info msg="cleaning of blocks marked for deletion done"
ts=2025-12-24T15:02:59.354594123Z caller=compact.go:1545 level=info msg="start of initial garbage collection"
ts=2025-12-24T15:02:59.354626123Z caller=compact.go:1566 level=info msg="start of compactions"
ts=2025-12-24T15:02:59.354643873Z caller=compact.go:1602 level=info msg="compaction iterations done"
...
ts=2025-12-24T15:02:59.371453082Z caller=retention.go:33 level=info msg="start optional retention"
ts=2025-12-24T15:02:59.371460498Z caller=retention.go:48 level=info msg="optional retention apply done"
The Compactor must be run as a singleton (only one instance) per bucket to avoid data corruption. Ensure that no other Compactor is running against the same bucket.
Query
The Thanos Query component implements the Prometheus HTTP v1 API to query data in a Thanos cluster via PromQL. It gathers the data needed to evaluate the query from underlying StoreAPIs, deduplicates the result and returns it to the client.
The Query component needs to know about the other components (StoreAPIs) to query data from them. The chart automatically configures the Query component to discover the enabled components within the same release (Store Gateway, Ruler or Receiver). However, if you have external components (like a Thanos Sidecar running in another namespace or cluster), you need to configure them explicitly.
You can use the query.endpointConfiguration parameter to provide a list of StoreAPIs to query.
query:
  endpointConfiguration: |
    - targets:
      - "prometheus-operated.monitoring.svc.cluster.local:10901"
This will generate the endpoint_config.yml file and pass it to the Query component.
Alternatively, you can configure the query.sidecar section to
automatically discover the Sidecar service. If you are using the Prometheus Operator
chart from Application Collection, this can be done as follows:
query:
  sidecar:
    serviceName: "<prometheus-operator-release-name>-thanos-discovery"
    namespace: "default"
To verify the Query component is running and has added the Sidecar endpoint, you can check the logs:
$ kubectl logs -f deploy/thanos-query
ts=2025-12-24T15:09:18.118585971Z caller=endpointset.go:270 level=info msg="performing initial DNS resolution for endpoints"
ts=2025-12-24T15:09:18.119578846Z caller=endpointset.go:288 level=info msg="initial DNS resolution completed"
ts=2025-12-24T15:09:18.120094096Z caller=options.go:29 level=info protocol=gRPC msg="disabled TLS, key and cert must be set to enable"
ts=2025-12-24T15:09:18.120490929Z caller=query.go:663 level=info msg="starting query node"
ts=2025-12-24T15:09:18.120571387Z caller=query.go:647 level=info msg="waiting for initial endpoint discovery before marking gRPC as ready" timeout=30s
ts=2025-12-24T15:09:18.120611887Z caller=intrumentation.go:75 level=info msg="changing probe status" status=healthy
ts=2025-12-24T15:09:18.120622596Z caller=http.go:72 level=info service=http/server component=query msg="listening for requests and metrics" address=0.0.0.0:10902
ts=2025-12-24T15:09:18.120878512Z caller=handler.go:87 level=info service=http/server component=query caller=tls_config.go:347 time=2025-12-24T15:09:18.120851929Z msg="Listening on" address=[::]:10902
ts=2025-12-24T15:09:18.120912262Z caller=handler.go:87 level=info service=http/server component=query caller=tls_config.go:350 time=2025-12-24T15:09:18.120907762Z msg="TLS is disabled." http2=false address=[::]:10902
ts=2025-12-24T15:09:18.122553137Z caller=endpointset.go:367 level=info component=endpointset msg="adding new sidecar with [storeEndpoints rulesAPI exemplarsAPI targetsAPI MetricMetadataAPI]" address=10.42.0.167:10901 extLset="{prometheus=\"default/<prometheus-operator-release-name>-prometheus\", prometheus_replica=\"prometheus-<prometheus-operator-release-name>-prometheus-0\"}"
Query Frontend
The Thanos Query Frontend component implements the Prometheus HTTP v1 API and proxies it to the Query component. It provides features like splitting and caching to accelerate the query execution.
To enable the Query Frontend, set queryFrontend.enabled to true.
queryFrontend:
  enabled: true
$ kubectl get deployment --selector app.kubernetes.io/instance=<release-name>
NAME READY UP-TO-DATE AVAILABLE AGE
<release-name>-thanos-query 1/1 1 1 35s
<release-name>-thanos-query-frontend 1/1 1 1 35s
Receiver
The Thanos Receiver component accepts metrics from Prometheus via remote write and stores them in the object storage. It is an alternative to the Sidecar approach, useful when you cannot run a Sidecar alongside Prometheus (like in air-gapped environments or when using a managed Prometheus service).
To enable the Receiver, set receiver.enabled to true. You also need to configure the hashrings, which define how the data is distributed
across the Receiver replicas.
clusterConf:
  objstoreConfiguration: |
    type: S3
    config:
      bucket: <minio-bucket-name>
      endpoint: "<minio-release-name>.default.svc.cluster.local:9000"
      access_key: <minio-access_key>
      secret_key: <minio-secret_key>
      insecure: true
receiver:
  enabled: true
  replicaCount: 3
  replicationFactor: 2
  hashringsConfiguration: |
    [
      {
        "hashring": "default",
        "tenants": []
      }
    ]
The chart will automatically generate the hashrings.json file and mount it to the Receiver pods.
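On the Prometheus side, metrics reach the Receiver via remote write. A minimal sketch of the Prometheus configuration follows; the service name is an assumption based on the chart's naming pattern and 19291 is the Receiver's default remote-write port, so verify both with kubectl get svc:

```yaml
remote_write:
  # Hypothetical in-cluster Receiver service; adjust name, namespace and port
  - url: http://<release-name>-thanos-receiver.default.svc.cluster.local:19291/api/v1/receive
```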
$ kubectl logs -f deployment/<release-name>-thanos-query
...
ts=2025-12-23T22:52:14.292060875Z caller=endpointset.go:367 level=info component=endpointset msg="adding new receive with [storeEndpoints exemplarsAPI]" address=10.42.0.217:10901 extLset="{receive=\"true\", replica=\"thanos-receiver-2\", tenant_id=\"default-tenant\"}"
ts=2025-12-23T22:52:14.292086Z caller=endpointset.go:367 level=info component=endpointset msg="adding new receive with [storeEndpoints exemplarsAPI]" address=10.42.0.215:10901 extLset="{receive=\"true\", replica=\"thanos-receiver-0\", tenant_id=\"default-tenant\"}"
ts=2025-12-23T22:52:14.292090458Z caller=endpointset.go:367 level=info component=endpointset msg="adding new receive with [storeEndpoints exemplarsAPI]" address=10.42.0.218:10901 extLset="{receive=\"true\", replica=\"thanos-receiver-1\", tenant_id=\"default-tenant\"}"
$ kubectl logs -f statefulset/thanos-receiver
Found 3 pods, using pod/thanos-receiver-0
...
ts=2025-12-23T22:51:39.220270774Z caller=multitsdb.go:776 level=info component=receive component=multi-tsdb tenant=default-tenant msg="TSDB is now ready"
ts=2025-12-23T22:51:39.221408024Z caller=intrumentation.go:56 level=info component=receive msg="changing probe status" status=ready
ts=2025-12-23T22:51:39.221417441Z caller=receive.go:744 level=info component=receive msg="storage started, and server is ready to receive requests"
Rule
The Thanos Rule component evaluates Prometheus recording and alerting rules against the Thanos Query component. It is useful when you want a global view of your alerts or need rules evaluated against data from multiple Prometheus instances.
To enable the Rule component, set rule.enabled to true.
The Rule component is unique because it can load multiple rule files. The rule.ruleFiles parameter is a glob pattern that matches filenames
within the configuration directory.
You can define your primary rule file using the standard ruleConfiguration and ruleConfigurationFile pair. To add additional rule files,
you can define new configuration pairs (for example, extraRuleConfiguration and extraRuleConfigurationFile) and then add them to the rule.configMap
parameter. This ensures the chart mounts the file in the configuration directory. As long as the filenames match the ruleFiles pattern, they
will be loaded by the Rule component.
clusterConf:
  objstoreConfiguration: |
    type: S3
    config:
      bucket: <minio-bucket-name>
      endpoint: "<minio-release-name>.default.svc.cluster.local:9000"
      access_key: <minio-access_key>
      secret_key: <minio-secret_key>
      insecure: true
rule:
  enabled: true
  # This glob pattern matches any file ending in rule.yml
  ruleFiles: "*rule.yml"
  ruleConfiguration: |
    groups:
      - name: example
        rules:
          - record: job:http_inprogress_requests:sum
            expr: sum(http_inprogress_requests) by (job)
  extraRuleConfiguration: |
    groups:
      - name: extra
        rules:
          - alert: HighRequestLatency
            expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
            for: 10m
  configMap:
    extra_rule.yml: '{{ .Values.rule.extraRuleConfiguration }}'
To verify the Rule component is running and evaluating rules, you can check the logs:
$ kubectl logs -f statefulset/thanos-rule
...
ts=2025-12-24T07:17:49.688826773Z caller=rule.go:892 level=info component=rules msg="starting rule node"
ts=2025-12-24T07:17:49.688903732Z caller=intrumentation.go:56 level=info component=rules msg="changing probe status" status=ready
ts=2025-12-24T07:17:49.689033065Z caller=handler.go:87 level=info caller=manager.go:176 time=2025-12-24T07:17:49.688996523Z msg="Starting rule manager..."
ts=2025-12-24T07:17:49.689071523Z caller=handler.go:87 level=info caller=manager.go:176 time=2025-12-24T07:17:49.689068898Z msg="Starting rule manager..."
ts=2025-12-24T07:17:49.689095148Z caller=grpc.go:158 level=info component=rules service=gRPC/server component=rule msg="listening for serving gRPC" address=0.0.0.0:10901
ts=2025-12-24T07:17:49.68916319Z caller=intrumentation.go:75 level=info component=rules msg="changing probe status" status=healthy
ts=2025-12-24T07:17:49.689172315Z caller=http.go:72 level=info component=rules service=http/server component=rule msg="listening for requests and metrics" address=0.0.0.0:10902
ts=2025-12-24T07:17:49.68911194Z caller=rule.go:1057 level=info component=rules msg="reload rule files" numFiles=2
You can also verify the rules are loaded and active by accessing the Thanos Rule UI.
You might see warnings like no query API server reachable during startup, which are expected. Once the Query component is ready, the Rule
component will successfully evaluate the rules.
Store
The Thanos Store component (also known as Store Gateway) implements the Store API on top of historical data in an object storage bucket. It acts primarily as an API gateway and therefore does not need significant amounts of local disk space. It joins a Thanos cluster on startup and advertises the data it can access. This data is generally safe to delete across restarts at the cost of increased startup times.
To enable the Store component in standalone mode, set store.enabled to true.
clusterConf:
  objstoreConfiguration: |
    type: S3
    config:
      bucket: <minio-bucket-name>
      endpoint: "<minio-release-name>.default.svc.cluster.local:9000"
      access_key: <minio-access_key>
      secret_key: <minio-secret_key>
      insecure: true
store:
  enabled: true
$ kubectl get statefulset --selector app.kubernetes.io/instance=<release-name>
NAME READY AGE
<release-name>-thanos-store 1/1 4m21s
Sharding
Thanos Store allows you to shard data based on time ranges. This is useful for distributing the load across multiple Store Gateway instances, especially when dealing with large amounts of historical data.
To enable sharding, set store.architecture to sharded and define the shards using the store.shards parameter.
Thanos Store allows you to shard the data based on constant time or time duration relative to current time. The following example defines two shards:
- shard-recent: Handles data from the last two weeks.
- shard-historical: Handles data older than two weeks.
clusterConf:
  objstoreConfiguration: |
    type: S3
    config:
      bucket: <minio-bucket-name>
      endpoint: "<minio-release-name>.default.svc.cluster.local:9000"
      access_key: <minio-access_key>
      secret_key: <minio-secret_key>
      insecure: true
store:
  enabled: true
  architecture: sharded
  shards:
    - name: "shard-recent"
      minTime: "-2w"
    - name: "shard-historical"
      maxTime: "-2w"
Each shard will be deployed as a separate StatefulSet. You can also customize resources and other settings per shard.
$ kubectl get statefulset --selector app.kubernetes.io/instance=<release-name>
NAME READY AGE
<release-name>-thanos-store-shard-historical 1/1 2m46s
<release-name>-thanos-store-shard-recent 1/1 2m46s
For more details on time-based partitioning, refer to the Thanos Store documentation.
You can also shard based on labels using relabel configurations. This is useful to segregate data based on tenants, regions or other labels. The following example defines two shards:
- shard-eu: Handles data with label region="eu".
- shard-us: Handles data with label region="us".
clusterConf:
  objstoreConfiguration: |
    type: S3
    config:
      bucket: <minio-bucket-name>
      endpoint: "<minio-release-name>.default.svc.cluster.local:9000"
      access_key: <minio-access_key>
      secret_key: <minio-secret_key>
      insecure: true
store:
  enabled: true
  architecture: sharded
  shards:
    - name: "shard-eu"
      relabelConfigFile: "relabel_shard_eu.yml"
    - name: "shard-us"
      relabelConfigFile: "relabel_shard_us.yml"
  shardEURelabelConfiguration: |
    - action: keep
      source_labels: [region]
      regex: eu
  shardUSRelabelConfiguration: |
    - action: keep
      source_labels: [region]
      regex: us
  configMap:
    enabled: true
    relabel_shard_eu.yml: '{{ .Values.store.shardEURelabelConfiguration }}'
    relabel_shard_us.yml: '{{ .Values.store.shardUSRelabelConfiguration }}'
The store.configMap parameter allows you to inject the relabel configuration files directly into the Store pods.
$ kubectl get statefulset --selector app.kubernetes.io/instance=<release-name>
NAME READY AGE
<release-name>-thanos-store-shard-eu 1/1 32s
<release-name>-thanos-store-shard-us 1/1 32s
For more details on relabelling, refer to the Thanos Sharding documentation.
General component configuration
Besides specific per-component configurations seen above, the chart allows for extensive customization of each component. You can configure resources, pod security contexts, annotations and more.
query:
  podTemplates:
    labels:
      my-custom-label: "value"
    containers:
      query:
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "1000m"
            memory: "2Gi"
Advanced component configuration
You can also pass extra arguments to the components using the args field in the container configuration.
query:
  podTemplates:
    containers:
      query:
        args:
          - --query.auto-downsampling
For more details on Thanos component configuration, refer to the Thanos official documentation.
Persistence
By default, Thanos achieves data persistence via persistent volume claims, one per replica, of 8Gi each. The size and other persistence
settings are configurable via Helm chart persistence.* parameters. Find all the persistence options in the values and the README files.
As an example, to configure 16Gi sized persistent volume claims, you would pass the following values file to the installation
command:
persistence:
  resources:
    requests:
      storage: 16Gi
helm install <release-name> oci://dp.apps.rancher.io/charts/thanos \
--set global.imagePullSecrets={application-collection} \
--values persistence.yaml
Prometheus Operator
Thanos integrates with Prometheus to provide long-term storage and a global view. Deploying the Thanos Sidecar alongside your Prometheus instances is the most common way to achieve this.
You can deploy the Prometheus Operator chart from Application Collection and configure it to inject the Thanos Sidecar.
Create a prometheus-values.yaml file with the following configuration to enable the Thanos Sidecar and disable local compaction (as Thanos
will handle it):
prometheus:
  prometheusSpec:
    retention: 2h
    disableCompaction: true
    thanos:
      image: dp.apps.rancher.io/containers/thanos:0.40.1-7.13
      version: v0.40.1
      objectStorageConfig:
        secret:
          type: S3
          config:
            bucket: <minio-bucket-name>
            endpoint: "<minio-release-name>.default.svc.cluster.local:9000"
            access_key: <minio-access_key>
            secret_key: <minio-secret_key>
            insecure: true
  thanosService:
    enabled: true
Then, install the chart:
helm install <prometheus-operator-release-name> oci://dp.apps.rancher.io/charts/prometheus-operator \
--set global.imagePullSecrets={application-collection} \
--values prometheus-values.yaml
Once deployed, the Thanos Sidecar will expose a gRPC endpoint that the Thanos Query component can connect to.
$ kubectl describe statefulset prometheus-<prometheus-operator-release-name>-prometheus
...
thanos-sidecar:
Image: dp.apps.rancher.io/containers/thanos:0.40.1-7.13
Ports: 10902/TCP, 10901/TCP
Host Ports: 0/TCP, 0/TCP
Args:
sidecar
--prometheus.url=http://127.0.0.1:9090/
...
Metrics
The Thanos components can expose Prometheus metrics to be scraped by a Prometheus server. The metrics are disabled by default but you can
enable them via metrics.enabled Helm chart parameter. This will add the necessary annotations to the pods so they can be discovered by Prometheus.
metrics:
  enabled: true
helm install <release-name> oci://dp.apps.rancher.io/charts/thanos \
--set global.imagePullSecrets={application-collection} \
--values metrics.yaml
Prometheus can now scrape the metrics. For example, to check the metrics of the Query component:
$ kubectl port-forward svc/<release-name>-thanos-query 10902
$ curl -s localhost:10902/metrics | grep thanos_status
# HELP thanos_status Represents status (0 indicates failure, 1 indicates success) of the component.
# TYPE thanos_status gauge
thanos_status{check="healthy",component="query"} 1
PrometheusRule
The Helm chart can also deploy a PrometheusRule resource containing alerting rules for the Thanos components. This requires installing the
Prometheus Operator in the cluster.
To enable the creation of the PrometheusRule resource, you need to set metrics.prometheusRule.enabled to true.
Additionally, the chart includes a set of essential meta-monitoring rules provided by the Thanos project. These rules are designed to alert
on critical issues such as component absence or halt. You can enable them by setting metrics.prometheusRule.includeEssentialRules to true.
metrics:
  enabled: true
  prometheusRule:
    enabled: true
    includeEssentialRules: true
For more details on the included alerts, refer to the Thanos examples documentation.
Operations
Access the UI
Some Thanos components (Query, Query Frontend and Rule) expose a web UI to inspect the state of the component. By default, the chart creates
a ClusterIP service for each enabled component.
You can access these UIs by port-forwarding the respective service to your local machine using kubectl port-forward.
- Query: Main entry point for querying metrics. It provides a PromQL interface similar to Prometheus.
- Query Frontend: Prefer this over the Query UI if enabled, as it provides caching and splitting capabilities.
- Rule: Allows you to inspect the loaded recording and alerting rules, as well as the status of the alerts.
Run the appropriate command for the component you want to access:
# Query
kubectl port-forward svc/my-release-thanos-query 10902:10902
# Query Frontend
kubectl port-forward svc/my-release-thanos-query-frontend 10902:10902
# Rule
kubectl port-forward svc/my-release-thanos-rule 10902:10902
Then, open http://localhost:10902 in your browser.
If Basic Authentication is enabled, you will need to log in to access the UI.
Scale Thanos
Scale Thanos components to handle increased load and ensure high availability. Different components require different approaches:
- Stateless components: Components like Query and Query Frontend are stateless and can be scaled by increasing replicaCount.
- Rule: Deployed as a StatefulSet. Can be scaled by increasing replicaCount for high availability.
- Store: Can be scaled by increasing replicaCount (for availability) or by using sharding (for performance and data distribution).
- Receiver: Can be scaled using hashrings to distribute the write load across multiple instances. See Receiver for more details.
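For example, a values sketch that scales out the stateless query path. The replicaCount parameter is the same one used elsewhere in this chart; check helm show values for the exact defaults:

```yaml
query:
  replicaCount: 2
queryFrontend:
  enabled: true
  replicaCount: 2
```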
Adapt volume permissions
The Thanos Helm chart includes a feature to adapt and prepare volume permissions before the components start. Depending on the environment where you deploy the Helm chart, this can be necessary to run the application. You can activate the feature via Helm chart parameters, for example in an adapt_permissions.yaml values file.
This feature is available for all stateful components (Compactor, Receiver, Rule and Store). The following example enables it for the Receiver component:
clusterConf:
  objstoreConfiguration: |
    type: S3
    config:
      bucket: <minio-bucket-name>
      endpoint: "<minio-release-name>.default.svc.cluster.local:9000"
      access_key: <minio-access_key>
      secret_key: <minio-secret_key>
      insecure: true
receiver:
  podTemplates:
    initContainers:
      volume-permissions:
        enabled: true
helm install <release-name> oci://dp.apps.rancher.io/charts/thanos \
--set global.imagePullSecrets={application-collection} \
--values adapt_permissions.yaml
Usually, you only need to adapt the permissions once. After Thanos is properly running, you can upgrade the Helm chart and deactivate the volume-permissions init container:
helm upgrade <release-name> oci://dp.apps.rancher.io/charts/thanos \
--reuse-values --set receiver.podTemplates.initContainers.volume-permissions.enabled=false
Upgrade the chart
In general, an in-place upgrade of your Thanos installation can be performed using the built-in Helm upgrade workflow:
helm upgrade <release-name> oci://dp.apps.rancher.io/charts/thanos \
--set global.imagePullSecrets={application-collection}
To reuse the existing values and only upgrade the chart version:
helm upgrade <release-name> oci://dp.apps.rancher.io/charts/thanos \
--reuse-values
Be aware that changes from version to version may include breaking changes in Thanos itself or in the Helm chart templates. Always check the release notes before proceeding with an upgrade.
Uninstall the chart
Removing an installed Thanos Helm chart release is simple:
helm uninstall <release-name>
The Thanos stateful components are deployed as StatefulSets and use Volume Claim Templates to store your most precious resource: your data. These PVCs are not directly controlled by Helm and are not removed when uninstalling the related chart.
When you are ready to remove the PVCs and your data, you will need to explicitly delete them:
kubectl delete pvc --selector app.kubernetes.io/instance=<release-name>