# OTel Collector Monitors

> 11 pre-built health monitors for your OTel collector in Datadog or Grafana.

Obsy can create a set of pre-built monitors in Datadog or Grafana that alert you when your OTel collector is unhealthy. The monitors cover three areas of the collector pipeline: the gateway, the node collectors, and ingestion.

***

## Creating monitors

1. Go to **OTel Collector** and click **Create Monitors** on the cluster card.
2. Select the **observability platform** (Datadog or Grafana).
3. Select the **namespace** where the collector is installed (default: `obsy-system`).
4. Optionally select a **Slack channel**. Monitor alerts are also posted there.
5. Click **Create Monitors**.

Obsy creates all 11 monitors in the selected platform and automatically adds the `@webhook-obsy` notification handle to each one, so alerts route back to Obsy for RCA and incident creation.

***

## The 11 monitors

### Gateway health

| Monitor                           | What it detects                                        |
| --------------------------------- | ------------------------------------------------------ |
| **Gateway pod health**            | Gateway pod is not running or is restarting frequently |
| **High span export failure rate** | More than 5% of spans are failing to export            |
| **Export queue near capacity**    | Queue is more than 80% full (backpressure building)    |
| **High memory usage**             | Gateway process memory exceeds 1.5 GB                  |
| **Memory limiter refusing spans** | Memory limiter is dropping incoming spans              |
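
To make the thresholds in this table concrete, here is a minimal Python sketch of how they could be evaluated against one window of gateway readings. It is an illustration only, not Obsy's implementation: the `GatewaySample` fields and the `breached_monitors` function are hypothetical names, 1.5 GB is assumed to mean decimal gigabytes, and treating any refused span as a breach is an assumption. Pod health is left out because it depends on pod state rather than a numeric threshold.

```python
from dataclasses import dataclass

# Thresholds taken from the table above; all names below are hypothetical.
SPAN_FAILURE_RATE_LIMIT = 0.05      # more than 5% of spans failing to export
QUEUE_FILL_LIMIT = 0.80             # queue more than 80% full
MEMORY_LIMIT_BYTES = 1_500_000_000  # 1.5 GB, assuming decimal gigabytes


@dataclass
class GatewaySample:
    """Hypothetical snapshot of gateway readings for one evaluation window."""
    spans_sent: int
    spans_failed: int
    queue_size: int
    queue_capacity: int
    memory_bytes: int
    spans_refused: int


def breached_monitors(s: GatewaySample) -> list[str]:
    """Return the names of the threshold-based gateway monitors that would fire."""
    breached = []
    attempted = s.spans_sent + s.spans_failed
    # Guard against division by zero when no spans were exported at all.
    if attempted > 0 and s.spans_failed / attempted > SPAN_FAILURE_RATE_LIMIT:
        breached.append("High span export failure rate")
    if s.queue_capacity > 0 and s.queue_size / s.queue_capacity > QUEUE_FILL_LIMIT:
        breached.append("Export queue near capacity")
    if s.memory_bytes > MEMORY_LIMIT_BYTES:
        breached.append("High memory usage")
    # Assumption: any refused span in the window counts as a breach.
    if s.spans_refused > 0:
        breached.append("Memory limiter refusing spans")
    return breached
```

For example, a window with 100 failed spans out of 1,000 attempted and a queue holding 90 of 100 items would breach the failure-rate and queue monitors, but not the memory monitors if usage is at 1 GB and no spans were refused.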

### Node collector health

| Monitor                             | What it detects                                           |
| ----------------------------------- | --------------------------------------------------------- |
| **Node collector pod health**       | One or more node collector DaemonSet pods are not running |
| **High metric export failure rate** | Node collector failing to export metrics                  |
| **High log export failure rate**    | Node collector failing to export log records              |

### Ingestion health

| Monitor                 | What it detects                                                                  |
| ----------------------- | -------------------------------------------------------------------------------- |
| **No spans received**   | Gateway has received zero spans for 30+ minutes (possible instrumentation break) |
| **No metrics received** | Node collector has received zero metrics for 30+ minutes                         |
| **No logs received**    | Node collector has received zero log records for 30+ minutes                     |
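
The "zero for 30+ minutes" condition in this table amounts to a staleness check. The sketch below shows one way to express it in Python. The `is_silent` function and its arguments are hypothetical and do not reflect how Datadog or Grafana evaluate the monitors internally; treating a signal that has never been received as silent is also an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

# Window taken from the table above: zero data for 30+ minutes.
NO_DATA_WINDOW = timedelta(minutes=30)


def is_silent(last_received: Optional[datetime], now: datetime) -> bool:
    """Return True if nothing has been received for at least NO_DATA_WINDOW."""
    if last_received is None:
        # Assumption: a signal that was never seen is treated as silent.
        return True
    return now - last_received >= NO_DATA_WINDOW
```

Both timestamps should be in the same timezone (or both naive) so that the subtraction is valid.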

***

## Notification routing

All 11 monitors include `@webhook-obsy` as a notification handle. When any monitor triggers:

1. Datadog/Grafana sends the alert to your Obsy webhook URL
2. Obsy creates an Alert entity and runs RCA
3. If auto-create rules match, an incident is opened
4. If Slack is configured, a message is posted to your incident channel

If you added a Slack channel in the Create Monitors modal, `@slack-{channel-name}` is also added to each monitor's notification list, so alerts go to both Obsy and Slack.
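
As an illustration of the routing rule above, this short Python sketch builds the list of notification handles for a monitor. The `notification_handles` function is a hypothetical helper, not part of any Obsy, Datadog, or Grafana API, and stripping a leading `#` from the channel name is an assumption.

```python
from typing import Optional

OBSY_HANDLE = "@webhook-obsy"  # always present, per the text above


def notification_handles(slack_channel: Optional[str] = None) -> list[str]:
    """Build the handle list for one monitor (hypothetical helper)."""
    handles = [OBSY_HANDLE]
    if slack_channel:
        # Assumption: the channel name is used without its leading '#'.
        handles.append(f"@slack-{slack_channel.lstrip('#')}")
    return handles
```

With no Slack channel selected the list contains only `@webhook-obsy`. With a channel named `otel-alerts` it also contains `@slack-otel-alerts`.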

***

## Recreating monitors

If you need to update the monitors (for example, after changing the namespace or adding a new platform), click **Recreate Monitors** on the cluster card. Obsy deletes the existing monitors and creates fresh ones, so any manual edits made to the old monitors are lost.

***

## Manual monitor management

You can edit individual monitors directly in Datadog or Grafana after creation. Obsy does not overwrite manual changes unless you click **Recreate Monitors**.