helm-charts

opentelemetry-database-monitoring

Version: 0.1.1 Type: application AppVersion: 1.16.0

A Helm chart that deploys OpenTelemetry-based deep monitoring for PostgreSQL using a sidecar collector pattern. It installs stored functions on the target database, creates a dedicated monitoring user, and runs an OpenTelemetryCollector sidecar that emits metrics via OTLP.

When argoEvents.enabled=true, the chart also installs Argo Events to run the database setup Job after the OTel Operator injects the sidecar into a target Pod.

Requirements

Repository Name Version
https://argoproj.github.io/argo-helm argo-events 2.4.22

How it works

Sidecar collector

┌───────────────────────────────────────────┐
│  Your PostgreSQL Pod                      │
│                                           │
│  ┌─────────────┐   ┌─────────────────┐    │
│  │  PostgreSQL │   │  OTel Collector │    │
│  │  container  │◄──│    (sidecar)    │    │
│  └─────────────┘   └────────┬────────┘    │
│                              │ OTLP/gRPC  │
└──────────────────────────────┼────────────┘
                               ▼
                     Node IP :4317
                 (DaemonSet collector
                  or any OTLP endpoint)

The chart creates a single shared ConfigMap (postgresql-monitoring-setup) that embeds pg-monitoring-setup.sql (the stored functions), and then for each database entry in values.yaml:

  1. Secret — a <name>-pg-monitor-credentials Secret holding the auto-generated (or idempotently preserved) password for the otel_monitor user. When postgres.databases[*].namespace differs from the Helm release namespace, the same Secret is replicated into the target namespace so the injected sidecar can resolve secretKeyRef in the Pod’s namespace.
  2. Setup Job — waits for PostgreSQL to be ready, runs the setup SQL on the otel database, and rotates the otel_monitor password to match the Secret. See Argo Events setup below for when and how this Job is created.
  3. OpenTelemetryCollector (sidecar) — an opentelemetry.io/v1beta1 CR in the Helm release namespace that configures the contrib collector with all receivers and exports metrics to $K8S_NODE_IP:4317.

Argo Events setup

When argoEvents.enabled=true and argoEvents.triggerSetupJob=true (the default), the chart installs Argo Events and uses it to run the setup Job after the OTel Operator injects the sidecar into a target Pod. This avoids racing the database setup against sidecar startup and supports deploying DB monitoring in one namespace while Postgres runs in another.

┌─────────────────────────────────────────────────────────────────────────┐
│  Helm release namespace (e.g. observability)                            │
│                                                                         │
│  EventBus ──► EventSource ──► NATS ──► Sensor ──► Setup Job           │
│                  │                              │                       │
│  OpenTelemetryCollector CR ◄─────────────────────┘ (same release ns)   │
│  Monitor Secret (for Job)                                               │
│  ConfigMap (setup SQL)                                                  │
│  Argo Events controller (argo-events subchart)                          │
└─────────────────────────────────────────────────────────────────────────┘
         ▲                                    │
         │ cross-namespace inject             │ cross-namespace pod watch
         │ annotation                         │ (RBAC in target namespace)
         │                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  Target namespace (e.g. demo)                                           │
│                                                                         │
│  PostgreSQL Pod  +  injected OTel sidecar                               │
│  Monitor Secret (replica for sidecar secretKeyRef)                      │
└─────────────────────────────────────────────────────────────────────────┘

The flow for each postgres.databases entry:

  1. EventBus — NATS-backed bus in the release namespace (argoEvents.eventBusName, default default).
  2. EventSource — watches Pod ADD events in the database target namespace(s). Uses filter.afterStart: true, so only Pods created after the EventSource starts trigger events (existing Pods are ignored).
  3. Sensor (one per database) — subscribes to the EventSource and filters on the sidecar.opentelemetry.io/inject annotation (argoEvents.sidecarInjectAnnotation). When the annotation matches the expected value, it creates the setup Job in the release namespace.
  4. Setup Job — same SQL setup as the Helm-hook path; connects to Postgres using cluster DNS (<host>.<namespace>.svc.cluster.local) when the database runs in another namespace.

When argoEvents.enabled=false, or argoEvents.triggerSetupJob=false, the setup Job is created directly as a Helm post-install / post-upgrade hook instead.

Sidecar inject annotation

The OTel Operator must inject the sidecar before the Sensor fires. Annotate the target Pod (or its controller template) with sidecar.opentelemetry.io/inject:

Scenario Expected annotation value
Postgres in the same namespace as the Helm release postgres-dbm-sidecar (the sidecar-name)
Postgres in a different namespace <releaseNamespace>/<sidecar-name> (e.g. observability/postgres-dbm-sidecar)

The Sensor filter is derived automatically from postgres.databases[*].namespace and sidecar-name. The OpenTelemetryCollector CR always lives in the release namespace; cross-namespace injection is driven by the annotation prefix.

Cross-namespace example

Install DB monitoring in observability, target Postgres in demo:

# values.yaml for: helm install dbm … -n observability
postgres:
  enabled: true
  databases:
    - name: postgresql
      namespace: demo
      sidecar-name: postgres-dbm-sidecar
      host: postgresql
      user: root
      pwd: otel
      port: 5432

argoEvents:
  enabled: true
  triggerSetupJob: true

On the Postgres Pod (in demo):

podAnnotations:
  sidecar.opentelemetry.io/inject: "observability/postgres-dbm-sidecar"

After install, restart the Postgres Pod so the EventSource emits a fresh ADD event:

kubectl delete pod -n demo -l app.kubernetes.io/name=postgresql
kubectl get jobs -n observability -l app.kubernetes.io/component=db-monitoring-setup -w

Disabling Argo Events

argoEvents:
  enabled: false          # do not install the controller or EventBus/EventSource/Sensor
  triggerSetupJob: false  # use Helm hook Jobs even when argoEvents.enabled=true

Argo Events CRDs are bundled under crds/ and installed before other chart resources. The argo-events subchart is gated by argoEvents.enabled; its own CRD install is disabled (argo-events.crds.install: false) because this chart ships the CRDs.

Prerequisites

Installation

helm repo add tsuga-charts https://tsuga-dev.github.io/helm-charts/
helm repo update

helm install pg-monitoring tsuga-charts/opentelemetry-database-monitoring \
  --set postgres.enabled=true \
  --set postgres.databases[0].name=postgresql \
  --set postgres.databases[0].host=postgresql \
  --set postgres.databases[0].user=root \
  --set postgres.databases[0].pwd=<superuser-password>

With Argo Events enabled (default), install into the namespace where you want the sidecar CRs, EventBus, and setup Jobs to live. Set postgres.databases[*].namespace when Postgres runs elsewhere.

Examples

Single database (same namespace)

postgres:
  enabled: true
  databases:
    - name: postgresql
      sidecar-name: postgres-dbm-sidecar
      user: root
      pwd: otel
      port: 5432
      host: postgresql

argoEvents:
  enabled: true
  triggerSetupJob: true

Annotate the Postgres Pod with sidecar.opentelemetry.io/inject: postgres-dbm-sidecar.

Cross-namespace database

postgres:
  enabled: true
  databases:
    - name: postgresql
      namespace: demo
      sidecar-name: postgres-dbm-sidecar
      user: root
      pwd: otel
      port: 5432
      host: postgresql

argoEvents:
  enabled: true
  triggerSetupJob: true

Annotate the Postgres Pod in demo with sidecar.opentelemetry.io/inject: <releaseNamespace>/postgres-dbm-sidecar.

Multiple databases

Each entry produces an independent set of resources (Secret, Sensor, sidecar CR, and setup Job trigger).

postgres:
  enabled: true
  databases:
    - name: postgresql
      namespace: demo
      sidecar-name: postgres-dbm-sidecar
      user: root
      pwd: otel
      port: 5432
      host: postgresql
    - name: postgresql2
      namespace: staging
      sidecar-name: postgres2-dbm-sidecar
      user: root
      pwd: otel
      port: 5432
      host: postgresql2

Database setup

The setup Job runs assets/pg-monitoring-setup.sql idempotently. It:

  1. Creates the otel_monitor user (if absent) and grants it pg_monitor + SELECT on pg_stat_database.
  2. Enables the pg_stat_statements extension.
  3. Creates the otel schema and grants USAGE to otel_monitor.
  4. Installs the following stored functions (all STABLE, safe for repeated execution):
Function Description
otel.pg_table_stats() Vacuum/analyze ages, autovacuum counts, rows modified since last analyze. Top 50 tables by n_mod_since_analyze.
otel.pg_bloat() Table bloat ratio and wasted bytes using the standard check_postgres algorithm. Top 20 tables by wasted bytes.
otel.pg_connections() Connection count and max state age per database per connection state.
otel.pg_lock_counts() Lock counts by type, mode, and granted status.
otel.pg_blocking_queries() Count of queries currently waiting on a lock.
otel.pg_idle_in_transaction() Idle-in-transaction connection count and max age per database.
otel.pg_replication_slots() Replication slot lag in bytes per slot.
otel.pg_wal() WAL segment file count and total WAL directory size.
otel.pg_vacuum_progress() Active vacuum operations with phase and block progress (PG 10+).
otel.pg_top_queries() Top 20 queries by call count from pg_stat_statements. Query text truncated to 200 characters.
  1. Grants EXECUTE on all otel functions to otel_monitor.
  2. Rotates the otel_monitor password to match the value in the Secret.

Note: The Job connects to the otel database (-d otel). This database must exist before the chart is installed.

Metrics

The sidecar collector runs two categories of receivers.

Built-in postgresql receiver

Polls the PostgreSQL stats system views directly. Default collection interval.

Metric Unit Description
postgresql.bgwriter.buffers.allocated {buffer} Buffers allocated by bgwriter
postgresql.bgwriter.buffers.writes {buffer} Buffers written by bgwriter
postgresql.bgwriter.checkpoint.count {checkpoint} Checkpoint count
postgresql.bgwriter.duration ms Checkpoint write/sync duration
postgresql.bgwriter.maxwritten 1 Times bgwriter stopped a round due to too many buffers written
postgresql.blks_hit {hit} Buffer cache hits
postgresql.blks_read {read} Disk blocks read
postgresql.connection.max {connection} Max connections configured
postgresql.database.locks {lock} Database lock counts
postgresql.deadlocks {deadlock} Deadlocks detected
postgresql.replication.data_delay By Replication data delay
postgresql.sequential_scans {scan} Sequential scans
postgresql.temp_files {file} Temp files created
postgresql.tup_deleted / tup_fetched / tup_inserted / tup_returned / tup_updated {tuple} Row operation counters
postgresql.wal.age s WAL archiver age
postgresql.wal.lag s Replication WAL lag

sqlquery receivers (stored functions)

Receiver Interval Metrics
sqlquery/postgresql_top_queries default postgresql.query.calls, .rows, .exec_time, .plan_time, .blocks.shared.*, .blocks.temp.* — attributed by db.namespace, db.user.name, server.address, db.query.text, db.version
sqlquery/pg_table_stats 5 min postgresql.table.vacuum.age, .autovacuum.age, .analyze.age, .autoanalyze.age, .autovacuum.count, .analyze.count, .autoanalyze.count, .rows.modified.since.analyze
sqlquery/pg_bloat 1 hour postgresql.table.bloat.ratio, postgresql.table.bloat.bytes
sqlquery/pg_connections 30 s postgresql.backends (count), postgresql.backends.age (max state age)
sqlquery/pg_locks 30 s postgresql.lock.count, postgresql.lock.blocking_queries, postgresql.connection.idle_in_transaction, postgresql.connection.idle_in_transaction.age
sqlquery/pg_replication 30 s postgresql.replication.slot.lag
sqlquery/pg_wal 60 s postgresql.wal.file_count, postgresql.wal.bytes
sqlquery/pg_vacuum_progress 60 s postgresql.vacuum.heap_blks_total, .heap_blks_scanned, .heap_blks_vacuumed, postgresql.vacuum.age

All metrics are batched (max 5000 per batch) and exported via OTLP/gRPC to $K8S_NODE_IP:4317.

Security notes

Upgrading

Without Argo Events (argoEvents.enabled=false or argoEvents.triggerSetupJob=false): the setup Job runs on both post-install and post-upgrade Helm hooks with a before-hook-creation deletion policy, so it re-runs on every helm upgrade.

With Argo Events (default): the setup Job is created by a Sensor when a new target Pod is added with the correct inject annotation. Upgrading the chart updates EventSource/Sensor configuration but does not re-run setup on existing Pods. Restart the target Postgres Pod after upgrade if you need setup to run again.

All SQL statements are idempotent and the password rotation uses the existing Secret value.

Troubleshooting

Setup Job never created (Argo Events enabled)

  1. Confirm EventBus, EventSource, and Sensor exist in the Helm release namespace:
    kubectl get eventbus,eventsource,sensor -n <release-namespace>
    
  2. Verify the Sensor filter matches the Pod annotation (same namespace → sidecar-name; cross-namespace → <releaseNamespace>/<sidecar-name>):
    kubectl get sensor -n <release-namespace> -o yaml | rg 'value:|eventName:'
    kubectl get pod -n <target-namespace> <postgres-pod> -o jsonpath='{.metadata.annotations.sidecar\.opentelemetry\.io/inject}{"\n"}'
    
  3. The EventSource ignores Pods that existed before it started (afterStart: true). Restart the Postgres Pod after install or upgrade:
    kubectl delete pod -n <target-namespace> -l app.kubernetes.io/name=postgresql
    kubectl get jobs -n <release-namespace> -l app.kubernetes.io/component=db-monitoring-setup -w
    
  4. Remove stale EventBus/EventSource/Sensor objects left over from an install in a different release namespace.

Job fails immediately

Check that the otel database exists and the superuser credentials are correct:

kubectl logs -l app.kubernetes.io/component=db-monitoring-setup -n <release-namespace>

Sidecar init container fails with secret not found

The monitor Secret must exist in the Pod’s namespace. Set postgres.databases[*].namespace so the chart replicates the Secret into the target namespace.

No metrics in the collector

Verify the OpenTelemetryCollector CR was injected as a sidecar into the target Pod. The OTel Operator must be running and the Pod must have the sidecar.opentelemetry.io/inject annotation (or the operator must be configured for namespace-wide injection).

pg_stat_statements not available

The extension must be listed in shared_preload_libraries in postgresql.conf and PostgreSQL must be restarted before it can be created. Add shared_preload_libraries = 'pg_stat_statements' to your PostgreSQL configuration and restart the server before installing the chart.

Top queries missing

Queries run by the collector itself are excluded via the /* otel-collector-ignore */ comment prefix. If otel.pg_top_queries() returns no rows, check that pg_stat_statements.track is not set to none.

Values

Key Type Default Description
argo-events.crds.install bool false  
argo-events.enabled bool true  
argoEvents.enabled bool true  
argoEvents.eventBus.auth string "none"  
argoEvents.eventBus.create bool true  
argoEvents.eventBus.replicas int 1  
argoEvents.eventBusName string "default"  
argoEvents.eventSource.name string ""  
argoEvents.eventSource.serviceAccount.create bool true  
argoEvents.eventSource.serviceAccount.name string ""  
argoEvents.eventSource.watchNamespace string ""  
argoEvents.job.generateNameSuffix string "setup-"  
argoEvents.sensor.serviceAccount.create bool true  
argoEvents.sensor.serviceAccount.name string ""  
argoEvents.sidecarInjectAnnotation string "sidecar.opentelemetry.io/inject"  
argoEvents.triggerSetupJob bool true  
postgres.databases[0].host string ""  
postgres.databases[0].name string "postgresql"  
postgres.databases[0].namespace string ""  
postgres.databases[0].port int 5432  
postgres.databases[0].pwd string ""  
postgres.databases[0].sidecar-name string "postgres-dbm-sidecar"  
postgres.databases[0].user string ""  
postgres.enabled bool false  
postgres.image string "ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib"