
Solo SLA Loop Starter Kit (Copy‑Paste Templates for a Human‑in‑the‑Loop Incident Loop)

Copy‑paste templates to stand up a minimal, human‑in‑the‑loop SLA communication loop: a Notion ops manual, Statuspage API updater snippets (with approval gate), Slack cadence workflow, a two‑model credit calculator, and a post‑incident report template. Built for solo operators who need reliability signals without hiring.

How to use this kit:

  1. Copy each section into your tools (Notion, Slack, Make/Zapier, your terminal/spoke service).
  2. Replace every field in [BRACKETS] with your details. Keep the suggested defaults unless you have stronger reasons.
  3. Test in a sandbox/private page first. Do not auto‑publish public updates in the first [APPROVAL_WINDOW_MINUTES] minutes — approve manually.
  4. Ship the loop: Monitor → Draft update (private) → Slack cadence reminders → Approve→Publish → Auto‑credit → Post‑incident report.

Suggested defaults you can keep today:

  • [UPTIME_SLO_PERCENT]=99.9
  • [UPDATE_CADENCE_MINUTES]=20 or 30
  • [APPROVAL_WINDOW_MINUTES]=15
  • Credit model: 5% per 30 minutes of downtime, capped at 50% (see Calculator section).

1) Notion Ops Manual — SLA definitions, routing, and comms templates

Copy this whole block into a Notion page called “Ops Manual → SLAs & Incidents.” Replace [BRACKETS].


SLA Policy for [SERVICE_NAME]

Scope

  • Covered service(s): [SERVICE_SCOPE]
  • Environments: [ENVIRONMENTS] (e.g., Production only)
  • Business hours (local time [TIMEZONE]): [OFFICE_HOURS_START]–[OFFICE_HOURS_END], Mon–Fri
  • Off‑hours policy: [OFF_HOURS_POLICY] (e.g., best‑effort with slower updates)

Availability Target (Uptime SLO)

  • Target: [UPTIME_SLO_PERCENT]% monthly
  • Minutes in month: [MINUTES_IN_MONTH] (e.g., 43,200 for 30 days)
  • Allowed downtime this month (mins) = ROUND((1 - [UPTIME_SLO_PERCENT]/100) * [MINUTES_IN_MONTH], 1)
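The allowed-downtime formula above can be checked outside Notion with a minimal sketch. The function name is illustrative and not part of the kit; the inputs are the suggested defaults (99.9% over a 30-day month).

```python
def allowed_downtime_minutes(uptime_slo_percent: float, minutes_in_month: int) -> float:
    """Monthly downtime budget in minutes, rounded to one decimal place."""
    return round((1 - uptime_slo_percent / 100) * minutes_in_month, 1)


# 99.9% target over a 30-day month (43,200 minutes)
print(allowed_downtime_minutes(99.9, 43_200))
```

With the defaults this works out to a budget of 43.2 minutes per month.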

Support SLAs

  • Time to First Response (TTFR): within [TTFR_MINUTES] minutes via [PRIMARY_CHANNELS] during business hours
  • Next Response Time: within [NEXT_RESPONSE_MINUTES] minutes while ticket is open (business hours)
  • Resolution goal (not a guarantee): [TARGET_MTTR_HOURS] hours for Sev‑1 and Sev‑2, [TARGET_MTTR_DAYS] days for Sev‑3

Severity Levels (tie to customer impact)

  • Sev‑1 Critical: [CRITERIA_SEV1] (e.g., full outage for >[SEV1_MINUTES] mins or data loss)
  • Sev‑2 Major: [CRITERIA_SEV2] (e.g., degraded core function; workarounds exist)
  • Sev‑3 Minor: [CRITERIA_SEV3] (e.g., minor feature or narrow subset)

Components and Mapping

  • Statuspage Page ID: [STATUSPAGE_PAGE_ID]
  • Components:
    • [COMPONENT_NAME_1] → [COMPONENT_ID_1]
    • [COMPONENT_NAME_2] → [COMPONENT_ID_2]

Alert Routing

  • Monitor source(s): [MONITOR_TOOL] (checks: [CHECK_TYPES])
  • Open incident if: [OPEN_CRITERIA] (e.g., 2 consecutive failures across 2 regions)
  • Route to Slack channel: #[INCIDENT_CHANNEL_PREFIX]-[DATE]
  • Escalation: [ESCALATION_RULE] (e.g., after [ESCALATE_MINUTES] mins, call [PHONE_NUMBER])

External Communication Runbook

  • Update cadence during active incidents: every [UPDATE_CADENCE_MINUTES] minutes with new info or next‑ETA.
  • First [APPROVAL_WINDOW_MINUTES] minutes (default 15): draft internally and require manual approval; do not auto‑publish.
  • Message templates (fill and reuse):
    • Investigating: “We’re investigating increased [SYMPTOM] affecting [AFFECTED_COMPONENTS]. Next update by [NEXT_UPDATE_ETA].”
    • Identified: “We’ve identified a cause related to [CAUSE_HINT]. Mitigation in progress. Next update by [NEXT_UPDATE_ETA].”
    • Monitoring: “A fix has been rolled out. We’re monitoring recovery. Next update by [NEXT_UPDATE_ETA].”
    • Resolved: “This incident is resolved. Impact: [IMPACT_SUMMARY]. Duration: [DURATION]. Root cause: [ROOT_CAUSE_ONE_LINE].”
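If you fill these templates from a script rather than by hand, one possible approach is plain string formatting. This is a sketch only: the dictionary keys, field names, and function name are assumptions chosen to mirror the placeholders above, and the sample values are made up.

```python
# Hypothetical mapping of status -> message template; wording follows the runbook above.
TEMPLATES = {
    "investigating": "We're investigating increased {symptom} affecting "
                     "{affected_components}. Next update by {next_update_eta}.",
    "identified": "We've identified a cause related to {cause_hint}. "
                  "Mitigation in progress. Next update by {next_update_eta}.",
    "monitoring": "A fix has been rolled out. We're monitoring recovery. "
                  "Next update by {next_update_eta}.",
    "resolved": "This incident is resolved. Impact: {impact_summary}. "
                "Duration: {duration}. Root cause: {root_cause_one_line}.",
}


def render_update(status: str, **fields: str) -> str:
    """Fill the template for `status`; raises KeyError if a field is missing."""
    return TEMPLATES[status].format(**fields)


print(render_update(
    "investigating",
    symptom="API error rates",
    affected_components="Public API",
    next_update_eta="14:30 UTC",
))
```

A missing field raises an error instead of publishing a message with a blank in it, which suits a flow that still ends in manual approval.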

SLA Credits (Policy Reference)

  • Model A (Uptime bands):
    • Above 99% and below 99.98%: 10% credit; 95%–99%: 25% credit; <95%: 50% credit of the monthly fee for the affected service.
  • Model B (Per‑interval): 5% credit per 30 minutes of downtime, capped at 50% per month.
  • Credits must be requested within [CREDIT_REQUEST_WINDOW_DAYS] days and are applied against future invoices only.

Data Retention

  • Retain incident timelines & post‑incident reports for [RETENTION_MONTHS] months in Notion.

Ownership

  • Incident Commander (IC): [PRIMARY_OWNER_NAME] ([PRIMARY_OWNER_CONTACT])
  • Delegate: [DELEGATE_NAME] ([DELEGATE_CONTACT])

2) Statuspage updater snippets — curl, Make, and Zapier (with approval gate)

Use these snippets to integrate a monitor → (draft) → approve → publish flow. Always store your key as a secret and use the Authorization header.

Key notes before you paste:

  • Auth header format: Authorization: OAuth [STATUSPAGE_API_KEY].
  • As of June 30, 2026, query‑param API keys are deprecated — use the header above.
  • Prefer: build the JSON → post to Slack for approval → on ✅ approval, call the API.

Environment placeholders:

  • [STATUSPAGE_API_KEY] (store in secret manager)
  • [STATUSPAGE_PAGE_ID]
  • [INCIDENT_ID]
  • [COMPONENT_ID_1], [COMPONENT_ID_2]
  • [INCIDENT_NAME], [PUBLIC_MESSAGE], [NEXT_UPDATE_ETA]

A) Create Incident (no notifications yet)

curl -X POST \
  https://api.statuspage.io/v1/pages/[STATUSPAGE_PAGE_ID]/incidents \
  -H 'Content-Type: application/json' \
  -H 'Authorization: OAuth [STATUSPAGE_API_KEY]' \
  -d '{
    "incident": {
      "name": "[INCIDENT_NAME]",
      "status": "investigating",
      "impact_override": "[none|minor|major|critical]",
      "deliver_notifications": false,
      "body": "[PUBLIC_MESSAGE]",
      "component_ids": ["[COMPONENT_ID_1]", "[COMPONENT_ID_2]"]
    }
  }'

B) Append Update (identified/monitoring) — notify subscribers

# Updating the incident with a new status and body appends a public update.
curl -X PATCH \
  https://api.statuspage.io/v1/pages/[STATUSPAGE_PAGE_ID]/incidents/[INCIDENT_ID] \
  -H 'Content-Type: application/json' \
  -H 'Authorization: OAuth [STATUSPAGE_API_KEY]' \
  -d '{
    "incident": {
      "status": "[investigating|identified|monitoring|resolved]",
      "deliver_notifications": true,
      "body": "[PUBLIC_MESSAGE] Next update by [NEXT_UPDATE_ETA]."
    }
  }'

C) Resolve Incident — final public message

# Same endpoint as B); setting status to "resolved" closes the incident.
curl -X PATCH \
  https://api.statuspage.io/v1/pages/[STATUSPAGE_PAGE_ID]/incidents/[INCIDENT_ID] \
  -H 'Content-Type: application/json' \
  -H 'Authorization: OAuth [STATUSPAGE_API_KEY]' \
  -d '{
    "incident": {
      "status": "resolved",
      "deliver_notifications": true,
      "body": "Resolved. Impact: [IMPACT_SUMMARY]. Duration: [DURATION]. Root cause: [ROOT_CAUSE_ONE_LINE]."
    }
  }'

Optional: set component statuses explicitly (instead of just associating components) by using a components object when creating the incident:

{
  "incident": {
    "name": "[INCIDENT_NAME]",
    "status": "investigating",
    "impact_override": "major",
    "deliver_notifications": false,
    "body": "[PUBLIC_MESSAGE]",
    "components": { "[COMPONENT_ID_1]": "major_outage" }
  }
}

Make (Integromat) HTTP module values:

  • URL: https://api.statuspage.io/v1/pages/[STATUSPAGE_PAGE_ID]/incidents
  • Method: POST
  • Headers: Content-Type: application/json, Authorization: OAuth [STATUSPAGE_API_KEY]
  • Body (raw): paste the JSON from A) with mapped fields from your monitor alert
  • Next step: Slack → Post message to #[INCIDENTS_CHANNEL] with the JSON preview + “React ✅ to publish”
  • Filter: only proceed to the HTTP “Append Update” step if the Slack message receives a ✅ reaction within [APPROVAL_WINDOW_MINUTES] minutes
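The approval filter can be sketched in a few lines if you implement the gate in your own service instead of a Make filter. The shape of `reactions` (a list of emoji name and timestamp pairs) is an assumption for illustration; map it from whatever your Slack integration returns. `white_check_mark` is Slack's name for the ✅ emoji.

```python
from datetime import datetime, timedelta, timezone

APPROVE_EMOJI = "white_check_mark"  # Slack's name for the ✅ emoji


def is_approved(reactions, posted_at, approval_window_minutes):
    """Return True if a ✅ reaction arrived within the approval window.

    `reactions` is a hypothetical list of (emoji_name, reacted_at) tuples.
    """
    deadline = posted_at + timedelta(minutes=approval_window_minutes)
    return any(
        name == APPROVE_EMOJI and posted_at <= reacted_at <= deadline
        for name, reacted_at in reactions
    )


posted = datetime(2026, 5, 29, 12, 0, tzinfo=timezone.utc)
reactions = [
    ("eyes", posted + timedelta(minutes=2)),
    ("white_check_mark", posted + timedelta(minutes=9)),
]
print(is_approved(reactions, posted, approval_window_minutes=15))
```

An approval that arrives after the window is treated as no approval, so a stale draft is never published by a late click.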

Zapier (Webhooks by Zapier → Slack):

  • Trigger: [MONITOR_ALERT_TRIGGER]
  • Action 1: Code step (build incident JSON from trigger)
  • Action 2: Slack → Send a message to #[INCIDENTS_CHANNEL] including the JSON preview
  • Action 3 (Path A, if approved): Webhooks by Zapier → Custom Request
    • Method: POST
    • URL: https://api.statuspage.io/v1/pages/[STATUSPAGE_PAGE_ID]/incidents
    • Data: JSON from Action 1
    • Headers: Authorization: OAuth [STATUSPAGE_API_KEY], Content-Type: application/json
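For the code step that builds the incident JSON, a minimal sketch might look like the following. The function name and arguments are illustrative; the payload fields mirror snippet A), and `deliver_notifications` stays false so nothing notifies subscribers before approval.

```python
import json

ALLOWED_IMPACTS = {"none", "minor", "major", "critical"}


def build_incident_payload(name, message, component_ids, impact="minor"):
    """Build the create-incident body from snippet A) with notifications off."""
    if impact not in ALLOWED_IMPACTS:
        raise ValueError(f"impact must be one of {sorted(ALLOWED_IMPACTS)}")
    return {
        "incident": {
            "name": name,
            "status": "investigating",
            "impact_override": impact,
            "deliver_notifications": False,  # flipped only after human approval
            "body": message,
            "component_ids": list(component_ids),
        }
    }


payload = build_incident_payload(
    name="Elevated API errors",
    message="We're investigating increased API error rates.",
    component_ids=["abc123"],  # placeholder component ID
    impact="major",
)
print(json.dumps(payload, indent=2))  # paste this preview into the Slack approval message
```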

Security & safety:

  • Keep deliver_notifications: false until human approval.
  • Log the incident ID returned in each create response to your DB/sheet ([STORAGE_LOCATION]) so later updates can reference it.
  • Rate‑limit and de‑dup: only open a new incident if one isn’t already open for the same component/symptom in the last [DEDUP_WINDOW_MINUTES] minutes.
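The de‑dup rule can be expressed as a small check run before any create call. This is a sketch under assumptions: the record shape (`component_id`, `symptom`, `opened_at`) is hypothetical and should match whatever you log in [STORAGE_LOCATION].

```python
from datetime import datetime, timedelta


def should_open_incident(open_incidents, component_id, symptom, now, dedup_window_minutes):
    """Return False if a matching incident was opened within the de-dup window."""
    cutoff = now - timedelta(minutes=dedup_window_minutes)
    for incident in open_incidents:
        same_target = (
            incident["component_id"] == component_id
            and incident["symptom"] == symptom
        )
        if same_target and incident["opened_at"] >= cutoff:
            return False
    return True


now = datetime(2026, 5, 29, 12, 30)
open_incidents = [
    {"component_id": "abc123", "symptom": "timeouts", "opened_at": datetime(2026, 5, 29, 12, 10)},
]
print(should_open_incident(open_incidents, "abc123", "timeouts", now, dedup_window_minutes=60))
```

Here the second alert for the same component and symptom arrives 20 minutes after the first, so no new incident is opened.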

3) Slack incident channel workflow — cadence enforcement without spam

Create or reuse a dedicated channel per incident. Use one of these patterns and pin the block below.

Channel naming:

  • #[INC]-[YYYYMMDD]-[SHORT_SLUG] (e.g., #inc-20260529-api-timeouts)

Pinned message template:

Incident: [INCIDENT_NAME]
Opened: [OPENED_AT_LOCAL]
IC: [INCIDENT_COMMANDER]  |  Delegate: [DELEGATE]
Update cadence: every [UPDATE_CADENCE_MINUTES] minutes until resolution
First 10–15 mins: draft internally; do not auto‑publish
Latest public status: [LINK_TO_STATUSPAGE_INCIDENT]
Next update due by: [NEXT_UPDATE_ETA]

Slack quick commands (fastest setup):

  • Start cadence (20m): /remind #inc-… "Post public update (what changed + next ETA)." every 20 minutes
  • Start cadence (30m): /remind #inc-… "Post public update (what changed + next ETA)." every 30 minutes
  • Stop cadence when resolved: /remind list → “Mark as complete”

Workflow Builder (button‑start with approval signal):

  1. Trigger: “Shortcut” named “Start Incident Cadence”. Inputs: [UPDATE_CADENCE_MINUTES].
  2. Step: Post a message to the channel with the pinned template.
  3. Step: Add a Delay for [UPDATE_CADENCE_MINUTES] minutes.
  4. Step: Post “Reminder: publish an update only if there’s new info. Else push the next ETA.”
  5. Loop: Repeat steps 3–4 until someone posts “/resolve” or adds the 🟢 emoji to the pinned message.
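If you drive the cadence from your own scheduler rather than Workflow Builder, the delay-and-remind loop above reduces to a list of due times. The sketch below is illustrative only; the function name is hypothetical and it computes times rather than calling Slack.

```python
from datetime import datetime, timedelta


def reminder_times(opened_at, cadence_minutes, resolved_at):
    """Times at which a cadence reminder is due, strictly before resolution."""
    if cadence_minutes <= 0:
        raise ValueError("cadence_minutes must be positive")
    step = timedelta(minutes=cadence_minutes)
    times = []
    next_due = opened_at + step
    while next_due < resolved_at:
        times.append(next_due)
        next_due += step
    return times


opened = datetime(2026, 5, 29, 12, 0)
resolved = datetime(2026, 5, 29, 13, 5)
for due in reminder_times(opened, 20, resolved):
    print(due.strftime("%H:%M"))
```

For an incident opened at 12:00 and resolved at 13:05 with a 20-minute cadence, reminders fall at 12:20, 12:40, and 13:00. The last entry plus one cadence step is also a reasonable value for [NEXT_UPDATE_ETA].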

Copy‑ready update blocks (paste, then customize):

  • Investigating: “We’re investigating increased [SYMPTOM] impacting [COMPONENTS/USERS]. Next update by [NEXT_UPDATE_ETA].”
  • Identified: “Cause identified ([CAUSE_HINT]). Mitigating now. Next update by [NEXT_UPDATE_ETA].”
  • Monitoring: “Fix deployed. Monitoring metrics and user reports. Next update by [NEXT_UPDATE_ETA].”
  • Resolved: “Resolved. Impact: [IMPACT_SUMMARY]. Duration: [DURATION]. Root cause: [ROOT_CAUSE_ONE_LINE].”

Noise guardrails:

  • Never post “no change.” If nothing changed, post a shorter note with a fresh next‑ETA.
  • Collapse duplicates: centralize thread(s) with links, archive stray chatter.

4) SLA Credit Calculator — two ready models with spreadsheet formulas

Decide your policy and paste one of these into a Notion table or a spreadsheet. Fields in [BRACKETS] are your inputs.

Inputs (all models):

  • [MONTHLY_FEE] (e.g., 2000)
  • [TOTAL_DOWNTIME_MINUTES] (e.g., 65)
  • [TOTAL_MINUTES_IN_MONTH] (e.g., 43200)

Derived:

  • Uptime % = 1 - ([TOTAL_DOWNTIME_MINUTES]/[TOTAL_MINUTES_IN_MONTH]), stored as a fraction (e.g., 0.9985); the Model A formula compares against fractions such as 0.95.

Model A — Uptime bands (tiered credits)

  • Bands:
    • Above 99% and below 99.98% → 10%
    • 95%–99% → 25%
    • <95% → 50%
  • Spreadsheet formula (credit %):
=IF([Uptime%]<0.95,50, IF([Uptime%]<=0.99,25, IF([Uptime%]<0.9998,10,0)))
  • Credit $: =[MONTHLY_FEE] * ([Credit%]/100)
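To sanity-check Model A outside a spreadsheet, the nested IF can be written as a short function. This sketch follows the formula's thresholds (uptime as a fraction, credit as a whole-number percent that is divided by 100 for the dollar amount); the function names are illustrative.

```python
def model_a_credit_percent(uptime_fraction: float) -> int:
    """Tiered credit percent, using the same thresholds as the spreadsheet formula."""
    if uptime_fraction < 0.95:
        return 50
    if uptime_fraction <= 0.99:
        return 25
    if uptime_fraction < 0.9998:
        return 10
    return 0


def credit_dollars(monthly_fee: float, credit_percent: float) -> float:
    """Convert a whole-number credit percent into a dollar amount."""
    return monthly_fee * credit_percent / 100


uptime = 1 - 65 / 43_200  # 65 minutes down in a 30-day month
percent = model_a_credit_percent(uptime)
print(percent, credit_dollars(2000, percent))
```

With 65 minutes of downtime on a $2,000 monthly fee, uptime is about 99.85%, which lands in the 10% band for a $200 credit.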

Model B — Per‑interval credits (simple, SMB‑friendly)

  • Parameters: [INTERVAL_MIN]=30, [CREDIT_PER_INTERVAL_PERCENT]=5, [MAX_CREDIT_PERCENT]=50
  • Intervals = CEILING([TOTAL_DOWNTIME_MINUTES]/[INTERVAL_MIN], 1)
  • Credit % = MIN([Intervals]*[CREDIT_PER_INTERVAL_PERCENT], [MAX_CREDIT_PERCENT])
  • Credit $ = [MONTHLY_FEE] * ([Credit%]/100)

Example (paste into your sheet to validate):

  • Given [MONTHLY_FEE]=2000, [TOTAL_DOWNTIME_MINUTES]=65, [TOTAL_MINUTES_IN_MONTH]=43200 → Uptime ≈ 99.85%
  • Model B: Intervals = CEILING(65/30)=3 → Credit %=15% → Credit $= $300
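The same example can be reproduced in code to validate a sheet. This is a minimal sketch of Model B with the default parameters; the function name and return shape are illustrative.

```python
import math


def model_b_credit(monthly_fee, downtime_minutes, interval_min=30,
                   credit_per_interval_percent=5, max_credit_percent=50):
    """Per-interval credit: partial intervals round up, total is capped."""
    intervals = math.ceil(downtime_minutes / interval_min)
    credit_percent = min(intervals * credit_per_interval_percent, max_credit_percent)
    credit_dollars = monthly_fee * credit_percent / 100
    return intervals, credit_percent, credit_dollars


print(model_b_credit(2000, 65))   # the worked example above
print(model_b_credit(2000, 400))  # long outage: the cap applies
```

The first call reproduces 3 intervals, 15%, and $300. The second shows the cap: 14 intervals would be 70%, which is limited to 50%, or $1,000.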

Implementation tip:

  • Store incident metadata in a sheet/db row: [INCIDENT_ID], [START_AT], [END_AT], [DURATION_MIN], [AFFECTED_COMPONENT], [CREDIT_MODEL], [CREDIT_$].
  • Auto‑email a draft credit note to [BILLING_CONTACT_EMAIL] when an incident is marked Resolved and [TOTAL_DOWNTIME_MINUTES] > 0.
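A row in that sheet or database could be assembled as follows. This is only a sketch: the function name is hypothetical, the keys copy the field names in the tip above, and rounding partial minutes up is an assumption you may want to change to match your policy.

```python
import math
from datetime import datetime, timezone


def incident_row(incident_id, start_at, end_at, affected_component,
                 credit_model, credit_dollars):
    """Build one metadata row; duration rounds partial minutes up (an assumption)."""
    duration_min = math.ceil((end_at - start_at).total_seconds() / 60)
    return {
        "INCIDENT_ID": incident_id,
        "START_AT": start_at.isoformat(),
        "END_AT": end_at.isoformat(),
        "DURATION_MIN": duration_min,
        "AFFECTED_COMPONENT": affected_component,
        "CREDIT_MODEL": credit_model,
        "CREDIT_$": credit_dollars,
    }


row = incident_row(
    incident_id="example-incident-id",  # placeholder, use the ID you logged
    start_at=datetime(2026, 5, 29, 12, 0, tzinfo=timezone.utc),
    end_at=datetime(2026, 5, 29, 13, 5, tzinfo=timezone.utc),
    affected_component="Public API",
    credit_model="B",
    credit_dollars=300.0,
)
print(row["DURATION_MIN"])
```

Storing timestamps in UTC keeps the row consistent with the PIR timeline, which is also recorded in UTC.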

5) Post‑incident report (PIR) — fill‑in template for consistency

Copy this into a new Notion page titled “Post‑Incident Report (PIR) Template.” Use it after every Sev‑1/Sev‑2.


Post‑Incident Report — [INCIDENT_NAME]

Summary

  • Date: [DATE]
  • Duration: [DURATION] (start [START_AT_LOCAL] → end [END_AT_LOCAL])
  • Severity: [SEVERITY]
  • Components affected: [COMPONENTS]
  • Customer impact: [IMPACT_SUMMARY]

Timeline (UTC)

  • [YYYY‑MM‑DD HH:MM] — Detected by [SOURCE]
  • [YYYY‑MM‑DD HH:MM] — Incident opened on Statuspage (link: [INC_LINK])
  • [YYYY‑MM‑DD HH:MM] — [KEY_UPDATE]
  • [YYYY‑MM‑DD HH:MM] — Resolved

Root Cause

  • Primary cause: [ROOT_CAUSE]
  • Contributing factors: [FACTORS]
  • Why not detected earlier: [GAP]

Remediation

  • Fix implemented: [FIX]
  • Validation/monitoring in place: [VALIDATION]
  • Owner: [OWNER]

SLA & Credits

  • Downtime minutes: [TOTAL_DOWNTIME_MINUTES]
  • Credit model used: [CREDIT_MODEL]
  • Calculated credit: [CREDIT_PERCENT]% → $[CREDIT_DOLLARS]
  • Applied on invoice: [INVOICE_MONTH]

Follow‑ups (checklist)

  • Add/adjust monitor to catch [MISSED_SIGNAL]
  • Update runbook section: [SECTION]
  • Backfill tests/alerts by [DATE]
  • Notify affected customers with PIR link by [DATE]

Links

  • Statuspage incident: [INC_LINK]
  • Internal Slack channel: #[INCIDENT_CHANNEL]
  • Logs/dashboards: [OBSERVABILITY_LINKS]

Usage notes:

  • Publish a concise customer‑facing PIR on Statuspage (if applicable); keep this full version internal.
  • Tag with [TAGS] for later search (e.g., “timeouts”, “deploy‑pipeline”).