Solo SLA Loop Starter Kit (Copy‑Paste Templates for a Human‑in‑the‑Loop Incident Loop)
Copy‑paste templates to stand up a minimal, human‑in‑the‑loop SLA communication loop: a Notion ops manual, Statuspage API updater snippets (with approval gate), Slack cadence workflow, a two‑model credit calculator, and a post‑incident report template. Built for solo operators who need reliability signals without hiring.
How to use this kit:
- Copy each section into your tools (Notion, Slack, Make/Zapier, your terminal/spoke service).
- Replace every field in [BRACKETS] with your details. Keep the suggested defaults unless you have stronger reasons.
- Test in a sandbox/private page first. Do not auto‑publish public updates in the first [APPROVAL_WINDOW_MINUTES] minutes — approve manually.
- Ship the loop: Monitor → Draft update (private) → Slack cadence reminders → Approve→Publish → Auto‑credit → Post‑incident report.
Suggested defaults you can keep today:
- [UPTIME_SLO_PERCENT]=99.9
- [UPDATE_CADENCE_MINUTES]=20 or 30
- [APPROVAL_WINDOW_MINUTES]=15
- Credit model: 5% per 30 minutes of downtime, capped at 50% (see Calculator section).
1) Notion Ops Manual — SLA definitions, routing, and comms templates
Copy this whole block into a Notion page called “Ops Manual → SLAs & Incidents.” Replace [BRACKETS].
SLA Policy for [SERVICE_NAME]
Scope
- Covered service(s): [SERVICE_SCOPE]
- Environments: [ENVIRONMENTS] (e.g., Production only)
- Business hours (local time [TIMEZONE]): [OFFICE_HOURS_START]–[OFFICE_HOURS_END], Mon–Fri
- Off‑hours policy: [OFF_HOURS_POLICY] (e.g., best‑effort with slower updates)
Availability Target (Uptime SLO)
- Target: [UPTIME_SLO_PERCENT]% monthly
- Minutes in month: [MINUTES_IN_MONTH] (e.g., 43,200 for 30 days)
- Allowed downtime this month (mins) = ROUND((1 - [UPTIME_SLO_PERCENT]/100) * [MINUTES_IN_MONTH], 1)
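To sanity‑check the ROUND() formula above outside Notion, here is a small Python sketch of the same arithmetic, (1 - SLO/100) × minutes rounded to one decimal. The function names are illustrative only.

```python
def minutes_in_month(days: int) -> int:
    # 24 hours * 60 minutes per day
    return days * 24 * 60


def allowed_downtime_minutes(slo_percent: float, month_minutes: int) -> float:
    """Same as ROUND((1 - SLO/100) * MINUTES_IN_MONTH, 1)."""
    return round((1 - slo_percent / 100) * month_minutes, 1)


if __name__ == "__main__":
    # Downtime budget at the default 99.9% SLO for common month lengths.
    for days in (28, 30, 31):
        print(days, allowed_downtime_minutes(99.9, minutes_in_month(days)))
```

At 99.9% a 30‑day month allows 43.2 minutes; a 31‑day month allows 44.6.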
Support SLAs
- Time to First Response (TTFR): within [TTFR_MINUTES] minutes via [PRIMARY_CHANNELS] during business hours
- Next Response Time: within [NEXT_RESPONSE_MINUTES] minutes while ticket is open (business hours)
- Resolution goal (not a guarantee): [TARGET_MTTR_HOURS] hours for Sev‑1 and Sev‑2, [TARGET_MTTR_DAYS] days for Sev‑3
Severity Levels (tie to customer impact)
- Sev‑1 Critical: [CRITERIA_SEV1] (e.g., full outage for >[SEV1_MINUTES] mins or data loss)
- Sev‑2 Major: [CRITERIA_SEV2] (e.g., degraded core function; workarounds exist)
- Sev‑3 Minor: [CRITERIA_SEV3] (e.g., minor feature or narrow subset)
Components and Mapping
- Statuspage Page ID: [STATUSPAGE_PAGE_ID]
- Components:
  - [COMPONENT_NAME_1] → [COMPONENT_ID_1]
  - [COMPONENT_NAME_2] → [COMPONENT_ID_2]
Alert Routing
- Monitor source(s): [MONITOR_TOOL] (checks: [CHECK_TYPES])
- Open incident if: [OPEN_CRITERIA] (e.g., 2 consecutive failures across 2 regions)
- Route to Slack channel: #[INCIDENT_CHANNEL_PREFIX]-[YYYYMMDD]-[SHORT_SLUG]
- Escalation: [ESCALATION_RULE] (e.g., after [ESCALATE_MINUTES] mins, call [PHONE_NUMBER])
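A minimal sketch of the example open criterion (2 consecutive failures across 2 regions), assuming your monitor can give you per‑region check results in order, oldest first. The data shape and function name are hypothetical; adapt them to whatever [MONITOR_TOOL] actually exports.

```python
from typing import Dict, List


def should_open_incident(
    checks_by_region: Dict[str, List[bool]],
    consecutive_failures: int = 2,
    min_regions: int = 2,
) -> bool:
    """True when at least `min_regions` regions failed their last
    `consecutive_failures` checks in a row.

    checks_by_region maps region name -> check results, oldest first,
    where True means the check passed (hypothetical shape).
    """
    failing_regions = 0
    for results in checks_by_region.values():
        recent = results[-consecutive_failures:]
        # Need a full window of results, and every one of them must be a failure.
        if len(recent) == consecutive_failures and not any(recent):
            failing_regions += 1
    return failing_regions >= min_regions
```

A single failing region, or a region that recovered on its latest check, does not open an incident under these defaults.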
External Communication Runbook
- Update cadence during active incidents: every [UPDATE_CADENCE_MINUTES] minutes with new info or next‑ETA.
- First 10–15 mins: draft internally, do not auto‑publish. Require manual approval.
- Message templates (fill and reuse):
  - Investigating: “We’re investigating increased [SYMPTOM] affecting [AFFECTED_COMPONENTS]. Next update by [NEXT_UPDATE_ETA].”
  - Identified: “We’ve identified a cause related to [CAUSE_HINT]. Mitigation in progress. Next update by [NEXT_UPDATE_ETA].”
  - Monitoring: “A fix has been rolled out. We’re monitoring recovery. Next update by [NEXT_UPDATE_ETA].”
  - Resolved: “This incident is resolved. Impact: [IMPACT_SUMMARY]. Duration: [DURATION]. Root cause: [ROOT_CAUSE_ONE_LINE].”
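The [NEXT_UPDATE_ETA] in these templates is simply "now + update cadence". This sketch fills the first three templates that way; the dictionary keys, field names, and the HH:MM UTC display format are assumptions, so swap in your own time zone or wording.

```python
from datetime import datetime, timedelta, timezone

# Plain-ASCII copies of the Investigating / Identified / Monitoring templates.
TEMPLATES = {
    "investigating": "We're investigating increased {symptom} affecting {components}. Next update by {eta}.",
    "identified": "We've identified a cause related to {cause_hint}. Mitigation in progress. Next update by {eta}.",
    "monitoring": "A fix has been rolled out. We're monitoring recovery. Next update by {eta}.",
}


def next_update_eta(now: datetime, cadence_minutes: int) -> str:
    """Next update time, cadence_minutes from a timezone-aware `now`, shown in UTC."""
    eta = now.astimezone(timezone.utc) + timedelta(minutes=cadence_minutes)
    return eta.strftime("%H:%M UTC")


def render_update(status: str, now: datetime, cadence_minutes: int, **fields: str) -> str:
    """Fill one template; extra keyword fields supply symptom/components/cause_hint."""
    return TEMPLATES[status].format(eta=next_update_eta(now, cadence_minutes), **fields)
```

For example, `render_update("investigating", now, 30, symptom="API timeouts", components="Public API")` at 14:00 UTC promises the next update by 14:30 UTC.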
SLA Credits (Policy Reference)
- Model A (Uptime bands), credit as a share of the monthly fee for the affected service:
  - above 99.0% and below 99.98% uptime: 10% credit
  - 95.0% to 99.0% uptime (inclusive): 25% credit
  - below 95.0% uptime: 50% credit
- Model B (Per‑interval): 5% credit per started 30 minutes of downtime, capped at 50% per month.
- Applied on request within [CREDIT_REQUEST_WINDOW_DAYS] days and against future invoices only.
Data Retention
- Retain incident timelines & post‑incident reports for [RETENTION_MONTHS] months in Notion.
Ownership
- Incident Commander (IC): [PRIMARY_OWNER_NAME] ([PRIMARY_OWNER_CONTACT])
- Delegate: [DELEGATE_NAME] ([DELEGATE_CONTACT])
2) Statuspage updater snippets — curl, Make, and Zapier (with approval gate)
Use these snippets to integrate a monitor → (draft) → approve → publish flow. Always store your key as a secret and use the Authorization header.
Key notes before you paste:
- Auth header format: Authorization: OAuth [STATUSPAGE_API_KEY]
- As of June 30, 2026, query‑param API keys are deprecated; use the header above.
- Prefer: build the JSON → post to Slack for approval → on ✅ approval, call the API.
Environment placeholders:
- [STATUSPAGE_API_KEY] (store in secret manager)
- [STATUSPAGE_PAGE_ID]
- [INCIDENT_ID]
- [COMPONENT_ID_1], [COMPONENT_ID_2]
- [INCIDENT_NAME], [PUBLIC_MESSAGE], [NEXT_UPDATE_ETA]
A) Create Incident (no notifications yet)
curl -X POST \
https://api.statuspage.io/v1/pages/[STATUSPAGE_PAGE_ID]/incidents \
-H 'Content-Type: application/json' \
-H 'Authorization: OAuth [STATUSPAGE_API_KEY]' \
-d '{
"incident": {
"name": "[INCIDENT_NAME]",
"status": "investigating",
"impact_override": "[none|minor|major|critical]",
"deliver_notifications": false,
"body": "[PUBLIC_MESSAGE]",
"component_ids": ["[COMPONENT_ID_1]", "[COMPONENT_ID_2]"]
}
}'
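If you run the loop from a terminal or a small service instead of Make/Zapier, the same request as snippet A can be built with only the Python standard library. This is a sketch: the function name and the default impact are assumptions, and the request is only built here, so nothing reaches the status page until you choose to send it.

```python
import json
import urllib.request
from typing import List

API_BASE = "https://api.statuspage.io/v1"
IMPACTS = {"none", "minor", "major", "critical"}


def build_create_incident_request(
    page_id: str,
    api_key: str,
    name: str,
    body: str,
    component_ids: List[str],
    impact: str = "minor",
) -> urllib.request.Request:
    """Build (but do not send) the same POST as curl snippet A."""
    if impact not in IMPACTS:
        raise ValueError(f"impact_override must be one of {sorted(IMPACTS)}")
    payload = {
        "incident": {
            "name": name,
            "status": "investigating",
            "impact_override": impact,
            "deliver_notifications": False,  # no subscriber notifications yet
            "body": body,
            "component_ids": component_ids,
        }
    }
    return urllib.request.Request(
        url=f"{API_BASE}/pages/{page_id}/incidents",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"OAuth {api_key}",  # header auth, not a query param
        },
        method="POST",
    )
```

After approval, send it with `urllib.request.urlopen(req)`, parse the response with `json.load`, and store the returned incident `id` for later updates.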
B) Append Update (identified/monitoring) — notify subscribers
Statuspage records a new incident update when you PATCH the incident itself with a new status and body (the /incident_updates/{id} endpoint only edits an update that already exists).
curl -X PATCH \
https://api.statuspage.io/v1/pages/[STATUSPAGE_PAGE_ID]/incidents/[INCIDENT_ID] \
-H 'Content-Type: application/json' \
-H 'Authorization: OAuth [STATUSPAGE_API_KEY]' \
-d '{
"incident": {
"status": "[investigating|identified|monitoring]",
"deliver_notifications": true,
"body": "[PUBLIC_MESSAGE] Next update by [NEXT_UPDATE_ETA]."
}
}'
C) Resolve Incident — final public message
curl -X PATCH \
https://api.statuspage.io/v1/pages/[STATUSPAGE_PAGE_ID]/incidents/[INCIDENT_ID] \
-H 'Content-Type: application/json' \
-H 'Authorization: OAuth [STATUSPAGE_API_KEY]' \
-d '{
"incident": {
"status": "resolved",
"deliver_notifications": true,
"body": "Resolved. Impact: [IMPACT_SUMMARY]. Duration: [DURATION]. Root cause: [ROOT_CAUSE_ONE_LINE]."
}
}'
Optional: set component statuses explicitly (instead of just associating components) by using a components object when creating the incident:
{
"incident": {
"name": "[INCIDENT_NAME]",
"status": "investigating",
"impact_override": "major",
"deliver_notifications": false,
"body": "[PUBLIC_MESSAGE]",
"components": { "[COMPONENT_ID_1]": "major_outage" }
}
}
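If you use the components object, checking the status strings before sending avoids a rejected request. The five values below are Statuspage's component statuses; the helper itself is a hypothetical convenience, not part of any client library.

```python
from typing import Dict

# Component status values used in Statuspage's components object.
COMPONENT_STATUSES = {
    "operational",
    "under_maintenance",
    "degraded_performance",
    "partial_outage",
    "major_outage",
}


def components_field(statuses: Dict[str, str]) -> Dict[str, str]:
    """Validate a {component_id: status} map before putting it in the payload."""
    bad = {cid: s for cid, s in statuses.items() if s not in COMPONENT_STATUSES}
    if bad:
        raise ValueError(f"unknown component status(es): {bad}")
    return dict(statuses)
```

The same check works when you later set components back to "operational".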
Make (Integromat) HTTP module values:
- URL: https://api.statuspage.io/v1/pages/[STATUSPAGE_PAGE_ID]/incidents
- Method: POST
- Headers (one entry each):
  - Content-Type: application/json
  - Authorization: OAuth [STATUSPAGE_API_KEY]
- Body (raw): paste the JSON from A) with fields mapped from your monitor alert
- Next step: Slack → Post message to #[INCIDENTS_CHANNEL] with the JSON preview + “React ✅ to publish”
- Filter: only proceed to the HTTP “Append Update” step if the Slack reactions include ✅ within [APPROVAL_WINDOW_MINUTES] minutes
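The Filter step boils down to one check: was a ✅ added after the preview was posted and before the window closed? This sketch assumes you have collected (emoji_name, reacted_at) pairs, for example from Slack reaction_added events; the function name and data shape are hypothetical, while white_check_mark is Slack's name for ✅.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def approved_within_window(
    posted_at: datetime,
    reactions: Iterable[Tuple[str, datetime]],
    window_minutes: int,
    approve_emoji: str = "white_check_mark",
) -> bool:
    """True if the approval emoji was added between posting and the deadline."""
    deadline = posted_at + timedelta(minutes=window_minutes)
    return any(
        name == approve_emoji and posted_at <= reacted_at <= deadline
        for name, reacted_at in reactions
    )
```

A late ✅ returns False, so the publish step never runs on a stale approval; re‑post the preview instead.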
Zapier (Webhooks by Zapier → Slack):
- Trigger: [MONITOR_ALERT_TRIGGER]
- Action 1: Code step (build incidentJSON from the trigger data)
- Action 2: Slack → Send a message to #[INCIDENTS_CHANNEL] including the JSON preview
- Action 3 (Path A, if approved): Webhooks by Zapier → Custom Request
  - Method: POST
  - URL: https://api.statuspage.io/v1/pages/[STATUSPAGE_PAGE_ID]/incidents
  - Data: JSON from Action 1
  - Headers: Authorization: OAuth [STATUSPAGE_API_KEY] and Content-Type: application/json
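A sketch of the Action 1 code step in Python. The alert field names (symptom, component_name, component_id, next_update_eta, impact) are hypothetical stand‑ins for your trigger's fields, and the body text follows the Investigating template.

```python
import json
from typing import Dict


def build_incident_json(alert: Dict[str, str]) -> str:
    """Map monitor-alert fields (hypothetical names) to a create-incident payload."""
    incident = {
        "incident": {
            "name": f"{alert['symptom']} on {alert['component_name']}",
            "status": "investigating",
            "impact_override": alert.get("impact", "minor"),
            "deliver_notifications": False,  # stays false until a human approves
            "body": (
                f"We're investigating increased {alert['symptom']} affecting "
                f"{alert['component_name']}. Next update by {alert['next_update_eta']}."
            ),
            "component_ids": [alert["component_id"]],
        }
    }
    return json.dumps(incident)


# In a Code by Zapier (Python) step, trigger fields arrive in `input_data` and
# whatever you assign to `output` is passed to later steps, e.g.:
# output = {"incidentJSON": build_incident_json(input_data)}
```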
Security & safety:
- Keep deliver_notifications: false until human approval. This flag only suppresses subscriber notifications; once an incident is created, it is visible to anyone who can view the status page.
- Log incident_id responses in your DB/sheet ([STORAGE_LOCATION]) for later updates.
- Rate‑limit and de‑dup: only open a new incident if one isn’t already open for the same component/symptom in the last [DEDUP_WINDOW_MINUTES] minutes.
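The de‑dup rule can be sketched as a lookup keyed by component and symptom. The in‑memory dict stands in for [STORAGE_LOCATION], and create_incident is a placeholder for your approved create call; both are assumptions.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Optional, Tuple

# Hypothetical store: (component_id, normalized symptom) -> (incident_id, opened_at).
Store = Dict[Tuple[str, str], Tuple[str, datetime]]


def open_or_reuse(
    store: Store,
    component_id: str,
    symptom: str,
    now: datetime,
    dedup_window_minutes: int,
    create_incident: Callable[[], str],
) -> Tuple[str, bool]:
    """Return (incident_id, created). Reuse a recent incident for the same key."""
    key = (component_id, symptom.strip().lower())
    existing: Optional[Tuple[str, datetime]] = store.get(key)
    if existing and now - existing[1] <= timedelta(minutes=dedup_window_minutes):
        return existing[0], False
    incident_id = create_incident()  # e.g. the approved POST from snippet A
    store[key] = (incident_id, now)
    return incident_id, True
```

Removing a key when its incident resolves lets the next real failure open a fresh incident even inside the window.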
3) Slack incident channel workflow — cadence enforcement without spam
Create or reuse a dedicated channel per incident. Use one of these patterns and pin the block below.
Channel naming:
- #[INC]-[YYYYMMDD]-[SHORT_SLUG] (e.g., #inc-20260529-api-timeouts)
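Channel names can be generated from the opening date and a short title. This sketch assumes Slack's naming constraints (lowercase, no spaces, at most 80 characters) and collapses everything else to hyphens; the function name and the "inc" default prefix are illustrative.

```python
import re
from datetime import date


def incident_channel_name(opened: date, title: str, prefix: str = "inc", max_len: int = 80) -> str:
    """Build names like inc-20260529-api-timeouts (without the leading #)."""
    # Lowercase, then turn every run of non [a-z0-9] characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    name = f"{prefix}-{opened:%Y%m%d}-{slug}"
    # Truncate to the length limit without leaving a dangling hyphen.
    return name[:max_len].rstrip("-")
```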
Pinned message template:
Incident: [INCIDENT_NAME]
Opened: [OPENED_AT_LOCAL]
IC: [INCIDENT_COMMANDER] | Delegate: [DELEGATE]
Update cadence: every [UPDATE_CADENCE_MINUTES] minutes until resolution
First 10–15 mins: draft internally; do not auto‑publish
Latest public status: [LINK_TO_STATUSPAGE_INCIDENT]
Next update due by: [NEXT_UPDATE_ETA]
Slack quick commands (fastest setup):
- Start cadence (20m): /remind #inc-… "Post public update (what changed + next ETA)." every 20 minutes
- Start cadence (30m): /remind #inc-… "Post public update (what changed + next ETA)." every 30 minutes
- Stop cadence when resolved: /remind list → “Mark as complete”
- If /remind does not accept a minutes‑based recurrence in your workspace, use the Workflow Builder pattern below instead.
Workflow Builder (button‑start with approval signal):
- Trigger: “Shortcut” named “Start Incident Cadence”. Inputs: [UPDATE_CADENCE_MINUTES].
- Step: Post a message to the channel with the pinned template.
- Step: Add a Delay for [UPDATE_CADENCE_MINUTES] minutes.
- Step: Post “Reminder: publish an update only if there’s new info. Else push the next ETA.”
- Loop: Repeat the Delay and Reminder steps until someone posts “/resolve” or adds the 🟢 emoji to the pinned message.
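The Delay → Reminder loop yields one reminder every [UPDATE_CADENCE_MINUTES] minutes until resolution. This sketch computes those due times, which is handy for filling the “Next update due by” line or testing a custom scheduler; it is a hypothetical helper, not Workflow Builder code.

```python
from datetime import datetime, timedelta
from typing import Iterator


def reminder_times(opened_at: datetime, cadence_minutes: int, resolved_at: datetime) -> Iterator[datetime]:
    """Yield each reminder due time after opening and strictly before resolution."""
    if cadence_minutes <= 0:
        raise ValueError("cadence_minutes must be positive")
    step = timedelta(minutes=cadence_minutes)
    due = opened_at + step  # first Delay step
    while due < resolved_at:  # the loop stops once the incident is resolved
        yield due
        due += step
```

An incident opened at 14:00 and resolved at 15:05 on a 20‑minute cadence gets reminders at 14:20, 14:40, and 15:00.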
Copy‑ready update blocks (paste, then customize):
- Investigating: “We’re investigating increased [SYMPTOM] impacting [COMPONENTS/USERS]. Next update by [NEXT_UPDATE_ETA].”
- Identified: “Cause identified ([CAUSE_HINT]). Mitigating now. Next update by [NEXT_UPDATE_ETA].”
- Monitoring: “Fix deployed. Monitoring metrics and user reports. Next update by [NEXT_UPDATE_ETA].”
- Resolved: “Resolved. Impact: [IMPACT_SUMMARY]. Duration: [DURATION]. Root cause: [ROOT_CAUSE_ONE_LINE].”
Noise guardrails:
- Never post “no change.” If nothing changed, post a shorter note with a fresh next‑ETA.
- Collapse duplicates: centralize thread(s) with links, archive stray chatter.
4) SLA Credit Calculator — two ready models with spreadsheet formulas
Decide your policy and paste one of these into a Notion table or a spreadsheet. Fields in [BRACKETS] are your inputs.
Inputs (all models):
- [MONTHLY_FEE] (e.g., 2000)
- [TOTAL_DOWNTIME_MINUTES] (e.g., 65)
- [TOTAL_MINUTES_IN_MONTH] (e.g., 43200)
Derived:
- Uptime (as a fraction; multiply by 100 for percent) =
1 - ([TOTAL_DOWNTIME_MINUTES]/[TOTAL_MINUTES_IN_MONTH])
Model A — Uptime bands (tiered credits)
- Bands (monthly uptime):
  - above 99.0% and below 99.98% → 10%
  - 95.0% to 99.0% inclusive → 25%
  - below 95.0% → 50%
- Spreadsheet formula (credit %):
=IF([Uptime%]<0.95,50, IF([Uptime%]<=0.99,25, IF([Uptime%]<0.9998,10,0)))
- Credit $ (Credit% is a whole number such as 10, so divide by 100):
=[MONTHLY_FEE] * ([Credit%]/100)
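Model A in Python, using the same thresholds as the IF() formula: uptime as a fraction, credit as a whole‑number percent converted to dollars by dividing by 100. The function names are illustrative.

```python
def uptime_fraction(downtime_minutes: float, month_minutes: float) -> float:
    return 1 - downtime_minutes / month_minutes


def model_a_credit_percent(uptime: float) -> int:
    """Mirror of =IF(u<0.95,50, IF(u<=0.99,25, IF(u<0.9998,10,0)))."""
    if uptime < 0.95:
        return 50
    if uptime <= 0.99:
        return 25
    if uptime < 0.9998:
        return 10
    return 0


def model_a_credit_dollars(monthly_fee: float, uptime: float) -> float:
    return monthly_fee * model_a_credit_percent(uptime) / 100
```

With 65 minutes down in a 43,200‑minute month (uptime about 99.85%) this lands in the 10% band, or $200 on a $2,000 fee.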
Model B — Per‑interval credits (simple, SMB‑friendly)
- Parameters: [INTERVAL_MIN]=30, [CREDIT_PER_INTERVAL_PERCENT]=5, [MAX_CREDIT_PERCENT]=50
- Intervals: =CEILING([TOTAL_DOWNTIME_MINUTES]/[INTERVAL_MIN], 1)
- Credit %: =MIN([Intervals]*[CREDIT_PER_INTERVAL_PERCENT], [MAX_CREDIT_PERCENT])
- Credit $: =[MONTHLY_FEE] * ([Credit%]/100)
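Model B in Python; math.ceil plays the role of CEILING, so any started interval counts. Parameter defaults match the suggested 30‑minute / 5% / 50% settings, and the function name is illustrative.

```python
import math
from typing import Tuple


def model_b(
    downtime_minutes: float,
    monthly_fee: float,
    interval_min: int = 30,
    credit_per_interval_percent: int = 5,
    max_credit_percent: int = 50,
) -> Tuple[int, int, float]:
    """Return (intervals, credit_percent, credit_dollars)."""
    intervals = math.ceil(downtime_minutes / interval_min)  # partial intervals round up
    credit_percent = min(intervals * credit_per_interval_percent, max_credit_percent)
    return intervals, credit_percent, monthly_fee * credit_percent / 100
```

`model_b(65, 2000)` reproduces the worked example below: 3 intervals, 15%, $300.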
Example (paste into your sheet to validate):
- Given [MONTHLY_FEE]=2000, [TOTAL_DOWNTIME_MINUTES]=65, [TOTAL_MINUTES_IN_MONTH]=43200 → Uptime ≈ 99.85%
- Model B: Intervals = CEILING(65/30)=3 → Credit %=15% → Credit $= $300
Implementation tip:
- Store incident metadata in a sheet/db row: [INCIDENT_ID], [START_AT], [END_AT], [DURATION_MIN], [AFFECTED_COMPONENT], [CREDIT_MODEL], [CREDIT_$].
- Auto‑email a draft credit note to [BILLING_CONTACT_EMAIL] when an incident is marked Resolved and [TOTAL_DOWNTIME_MINUTES] > 0.
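A sketch of the sheet row and the auto‑email trigger. Field names mirror the bracketed columns; rounding partial minutes up and the "resolved" status string are assumptions, and actually sending the email is left to your Make/Zapier step.

```python
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass
class IncidentRow:
    incident_id: str
    start_at: datetime
    end_at: datetime
    affected_component: str
    credit_model: str = "B"
    status: str = "resolved"

    @property
    def duration_min(self) -> int:
        # Assumption: partial minutes round up, so a 90-second blip counts as 2.
        return math.ceil((self.end_at - self.start_at).total_seconds() / 60)


def should_draft_credit_note(row: IncidentRow) -> bool:
    """Draft the credit email only for resolved incidents with some downtime."""
    return row.status == "resolved" and row.duration_min > 0
```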
5) Post‑incident report (PIR) — fill‑in template for consistency
Copy this into a new Notion page titled “Post‑Incident Report (PIR) Template.” Use it after every Sev‑1/Sev‑2.
Post‑Incident Report — [INCIDENT_NAME]
Summary
- Date: [DATE]
- Duration: [DURATION] (start [START_AT_LOCAL] → end [END_AT_LOCAL])
- Severity: [SEVERITY]
- Components affected: [COMPONENTS]
- Customer impact: [IMPACT_SUMMARY]
Timeline (UTC)
- [YYYY‑MM‑DD HH:MM] — Detected by [SOURCE]
- [YYYY‑MM‑DD HH:MM] — Incident opened on Statuspage (link: [INC_LINK])
- [YYYY‑MM‑DD HH:MM] — [KEY_UPDATE]
- [YYYY‑MM‑DD HH:MM] — Resolved
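Timeline entries stay consistent if you log timezone‑aware timestamps and convert to UTC once, when writing the PIR. This sketch renders one line; the function name and the plain hyphen separator are assumptions.

```python
from datetime import datetime, timedelta, timezone


def timeline_line(at: datetime, event: str) -> str:
    """Render a PIR timeline entry as 'YYYY-MM-DD HH:MM - event' in UTC."""
    if at.tzinfo is None:
        raise ValueError("pass timezone-aware datetimes so the UTC conversion is unambiguous")
    return f"{at.astimezone(timezone.utc):%Y-%m-%d %H:%M} - {event}"
```

For example, 10:05 at UTC‑4 becomes "2026-05-29 14:05 - Detected by uptime monitor".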
Root Cause
- Primary cause: [ROOT_CAUSE]
- Contributing factors: [FACTORS]
- Why not detected earlier: [GAP]
Remediation
- Fix implemented: [FIX]
- Validation/monitoring in place: [VALIDATION]
- Owner: [OWNER]
SLA & Credits
- Downtime minutes: [TOTAL_DOWNTIME_MINUTES]
- Credit model used: [CREDIT_MODEL]
- Calculated credit: [CREDIT_PERCENT]% → $[CREDIT_DOLLARS]
- Applied on invoice: [INVOICE_MONTH]
Follow‑ups (checklist)
- Add/adjust monitor to catch [MISSED_SIGNAL]
- Update runbook section: [SECTION]
- Backfill tests/alerts by [DATE]
- Notify affected customers with PIR link by [DATE]
Links
- Statuspage incident: [INC_LINK]
- Internal Slack channel: #[INCIDENT_CHANNEL]
- Logs/dashboards: [OBSERVABILITY_LINKS]
Usage notes:
- Publish a concise customer‑facing PIR on Statuspage (if applicable); keep this full version internal.
- Tag with [TAGS] for later search (e.g., “timeouts”, “deploy‑pipeline”).