Step 1 — Start the document

The fastest way to start is from a resolved incident or a root cause analysis (RCA) report (see Postmortems overview). Obsy pre-fills as much as possible from the incident timeline and the RCA report. To start manually, go to Postmortems → New postmortem.

Step 2 — Fill in the header

  • Title: A clear, factual summary. Bad: “Server was down”. Good: “Payment service 503 errors due to database connection pool exhaustion (2026-01-15)”.
  • Date: The date the incident occurred (not when the postmortem was written).
  • Authors: Your name and anyone else contributing.
  • Severity: Match the incident severity (SEV1–SEV4).
  • Duration: Calculate from first alert to resolution. Minutes matter — be precise.
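To make the Duration field concrete, here is a minimal sketch of the calculation in Python. It assumes the first alert and the resolution are available as UTC timestamps; the function name `incident_duration_minutes` is hypothetical and not part of Obsy. The sample values reuse the date from the example title and the first-alert and resolution times from the example timeline in Step 4.

```python
from datetime import datetime, timezone


def incident_duration_minutes(first_alert: datetime, resolved: datetime) -> int:
    """Return the whole minutes elapsed between first alert and resolution."""
    if resolved < first_alert:
        raise ValueError("resolution time precedes the first alert")
    return int((resolved - first_alert).total_seconds() // 60)


# Example values: first alert at 14:32 UTC, resolved at 15:07 UTC on 2026-01-15.
first_alert = datetime(2026, 1, 15, 14, 32, tzinfo=timezone.utc)
resolved = datetime(2026, 1, 15, 15, 7, tzinfo=timezone.utc)

print(incident_duration_minutes(first_alert, resolved))  # 35
```

Keeping both timestamps timezone-aware avoids off-by-hours mistakes when contributors work in different local time zones.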

Step 3 — Write the impact

Describe who was affected and how:
  • Number of users affected (or percentage)
  • Specific features / endpoints that were down or degraded
  • Business impact (e.g. revenue, SLA breach, customer-facing errors)
Example:
“Approximately 12% of checkout attempts failed for US customers between 14:32 and 15:07 UTC. Estimated 3,200 failed transactions. SLA breach of 8 minutes.”
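If you only have raw counts, the percentage in a statement like the one above can be derived with a short calculation. The sketch below is illustrative: `failure_percentage` is a hypothetical helper, and the total of 26,667 checkout attempts is an assumed figure chosen to match the example, not a number from a real incident.

```python
def failure_percentage(failed: int, total: int) -> float:
    """Return failed/total as a percentage, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    return round(100 * failed / total, 1)


# 3,200 failed transactions out of an assumed 26,667 attempts.
print(failure_percentage(3200, 26667))  # 12.0
```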

Step 4 — Build the timeline

List key events in chronological order with UTC timestamps:
14:32 — First 503 errors observed (Datadog alert fired)
14:34 — Incident opened, SEV2 declared
14:38 — Lead engineer joined incident channel
14:45 — Root cause identified: connection pool limit hit after deploy
14:58 — Config rollback applied
15:07 — Error rate returned to baseline, incident resolved
Obsy imports timeline entries from the incident automatically if you created the postmortem from an incident.
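When you build the timeline by hand, a small script can sort the entries and show how many minutes had passed since the first event, which makes gaps in the response easier to spot. This is a sketch, not an Obsy feature: `build_timeline` and `parse_utc` are hypothetical names, the entries are the example events above, and the date is assumed to be the one from the example title in Step 2.

```python
from datetime import datetime, timezone

INCIDENT_DATE = "2026-01-15"  # assumed: date from the example title in Step 2

# Example events, deliberately out of order to show the sorting step.
raw_timeline = [
    ("14:58", "Config rollback applied"),
    ("14:32", "First 503 errors observed (Datadog alert fired)"),
    ("14:45", "Root cause identified: connection pool limit hit after deploy"),
    ("14:34", "Incident opened, SEV2 declared"),
    ("15:07", "Error rate returned to baseline, incident resolved"),
    ("14:38", "Lead engineer joined incident channel"),
]


def parse_utc(hhmm: str) -> datetime:
    """Parse an HH:MM string on the incident date as a UTC timestamp."""
    naive = datetime.strptime(f"{INCIDENT_DATE} {hhmm}", "%Y-%m-%d %H:%M")
    return naive.replace(tzinfo=timezone.utc)


def build_timeline(entries):
    """Sort entries chronologically and add minutes elapsed since the first one."""
    parsed = sorted((parse_utc(hhmm), text) for hhmm, text in entries)
    if not parsed:
        return []
    start = parsed[0][0]
    return [
        (ts, int((ts - start).total_seconds() // 60), text)
        for ts, text in parsed
    ]


for ts, elapsed, text in build_timeline(raw_timeline):
    print(f"{ts:%H:%M} UTC (+{elapsed:>2} min) {text}")
```

With the example data, the elapsed column shows 13 minutes from first alert to root cause and 26 minutes to the rollback.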

Step 5 — Root cause and contributing factors

Copy from the RCA or write from scratch:
  • Root cause: One clear sentence. The specific technical reason.
  • Contributing factors: Bullet list of secondary issues.

Step 6 — What went well / what went wrong

Be honest. Both sections are important.
What went well:
  • Alert fired within 2 minutes of the deploy
  • Team assembled quickly in Slack
  • Rollback procedure was documented and fast
What went wrong:
  • Connection pool limit wasn’t tested under production load
  • No canary stage for this service
  • Status page update was delayed by 15 minutes

Step 7 — Action items

Each action item must have an owner and a due date. Vague items don’t get done.
#  Action                                           Owner   Due
1  Add connection pool monitoring alert             @alice  2026-01-22
2  Implement canary deployment for payment-service  @bob    2026-02-01
3  Add load test to payment-service CI pipeline     @carol  2026-01-29
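The rule that every action item needs an owner and a due date is easy to check mechanically before you publish. The following is a minimal sketch assuming you keep the items in your own script or export; `ActionItem` and `incomplete_items` are hypothetical names, not part of Obsy, and the third item is an invented vague entry included to show a failure.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class ActionItem:
    action: str
    owner: Optional[str] = None
    due: Optional[date] = None


def incomplete_items(items: List[ActionItem]) -> List[Tuple[str, List[str]]]:
    """Return (action, missing fields) for each item lacking an owner or due date."""
    problems = []
    for item in items:
        missing = []
        if not item.owner:
            missing.append("owner")
        if item.due is None:
            missing.append("due")
        if missing:
            problems.append((item.action, missing))
    return problems


items = [
    ActionItem("Add connection pool monitoring alert", "@alice", date(2026, 1, 22)),
    ActionItem("Implement canary deployment for payment-service", "@bob", date(2026, 2, 1)),
    ActionItem("Improve monitoring"),  # hypothetical vague item: no owner, no due date
]

print(incomplete_items(items))
# [('Improve monitoring', ['owner', 'due'])]
```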

Step 8 — Save and share

Click Save. The postmortem is immediately visible to all org members. Copy the URL and share it in your team Slack channel or sprint review.