Production‑Safe Automations Checklist: Idempotency, Retries, DLQ, and Circuit Breakers (Zapier · Make · n8n)
A paste‑ready, step‑by‑step checklist to make Zapier, Make, and n8n workflows production‑safe with idempotency keys, explicit retries, a DLQ you can replay, and Slack‑aware circuit breakers. Built for solo automation operators shipping client‑critical flows without hiring.
Ship workflows that survive real traffic. Work top‑down: add an idempotency gate, make retries explicit, route failures to a dead‑letter queue (DLQ) you can replay, and cap Slack output with a circuit breaker. Follow these steps in order and paste the exact snippets where noted.
- 1
Create a shared idempotency store (SQL) for all tools
Stand up a single table any workflow can write to before doing side effects. Insert-once decides if a run is new or a duplicate.
```sql
-- Postgres
CREATE TABLE IF NOT EXISTS idempotency_keys (
  key     TEXT PRIMARY KEY,
  source  TEXT NOT NULL,
  run_ref TEXT,
  seen_at TIMESTAMPTZ DEFAULT now()
);

-- Returns 1 row if new; 0 rows if duplicate
INSERT INTO idempotency_keys (key, source, run_ref)
VALUES ($1, $2, $3)
ON CONFLICT DO NOTHING
RETURNING 1;
```

```sql
-- MySQL/MariaDB
CREATE TABLE IF NOT EXISTS idempotency_keys (
  `key`     VARCHAR(255) PRIMARY KEY,
  `source`  VARCHAR(120) NOT NULL,
  `run_ref` VARCHAR(255),
  `seen_at` TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- New => affected_rows = 1; Duplicate => 0
INSERT IGNORE INTO idempotency_keys (`key`, `source`, `run_ref`)
VALUES (?, ?, ?);
```

Key rule: write the idempotency key before any external API call; abort the run when the insert indicates a duplicate.
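The insert-once decision can be wrapped in a small helper. A minimal sketch, assuming a pg-style `query(text, params)` function that resolves to an object with `rowCount`; the function name `claimIdempotencyKey` is illustrative:

```javascript
// Sketch: insert-once gate around the Postgres statement above.
// `query` is any function with the pg-style signature (text, params) -> { rowCount }.
async function claimIdempotencyKey(query, key, source, runRef) {
  const res = await query(
    `INSERT INTO idempotency_keys (key, source, run_ref)
     VALUES ($1, $2, $3)
     ON CONFLICT DO NOTHING
     RETURNING 1`,
    [key, source, runRef]
  );
  return res.rowCount === 1; // true => new run; false => duplicate, abort the run
}
```

Because the decision rides on the database's primary-key constraint, two concurrent runs with the same key cannot both see `rowCount === 1`.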
- 2
Standardize your idempotency key algorithm
Derive a stable, collision‑resistant key from provider event IDs or normalized payload fields. Persist the key plus a run reference for audits.
```javascript
// Use in Zapier Code, a Make custom function, or an n8n Code node
const crypto = require("crypto");
const sha256 = (s) => crypto.createHash("sha256").update(s).digest("hex");

const stable = {
  provider: input.provider,                        // e.g., "stripe"
  event_id: input.event_id || null,                // prefer official event IDs
  path: input.path?.toLowerCase(),                 // e.g., "/orders/create"
  natural_id: String(input.order_id || "").trim(),
};
const key = sha256(JSON.stringify(stable));
return { key, run_ref: `${stable.provider}:${stable.event_id || stable.natural_id}` };
```

- 3
Zapier — set the Autoreplay override deliberately
In the Zap editor, open the left‑sidebar gear → Advanced settings → Autoreplay. Pick one: Use account setting, Always replay, or Never replay. Note: publishing a Zap with custom error handlers disables Autoreplay for that Zap; plan explicit retries and DLQ instead.
- 4
Zapier — add step‑level error handlers and route to DLQ
On critical actions (HTTP, AI, DB), open step ••• → Add error handler. Pattern: if 429/5xx → delay + retry; else POST error context to your DLQ endpoint, then Stop run.
Webhooks by Zapier → POST body to DLQ:

```json
{
  "platform": "zapier",
  "workflow": "[ZAP NAME]",
  "run_id": "{{zap_meta_human_now}}|{{zap_run_id}}",
  "step": "{{zap_step_name}}",
  "error_type": "{{zap_error_type}}",
  "error_message": "{{zap_error_message}}",
  "payload": {{steps.trigger}},
  "replay_url": "https://zapier.com/app/history/[RUN_ID]"
}
```

- 5
Zapier — centralize failures with Zapier Manager → Webhook (DLQ)
Create a Zap: Trigger = Zapier Manager ‘New Zap Error’ → Action = Webhooks by Zapier (POST to your DLQ API). Store the raw error plus replay URL. Replays: you can replay entire runs; step‑by‑step replay is limited when error handlers are present.
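Normalizing the error event before it hits the DLQ store keeps records uniform across tools. A sketch, assuming field names like `zap_name` and `error_message` on the incoming event; map them to whatever your Zapier Manager trigger actually emits:

```javascript
// Sketch: normalize a Zapier Manager "New Zap Error" event into a DLQ record.
// The field names read from `evt` are assumptions, not Zapier's documented schema.
function toDlqRecord(evt, now = new Date()) {
  return {
    platform: "zapier",
    workflow: evt.zap_name || "unknown",
    run_id: evt.run_id || null,
    step: evt.step_name || null,
    error_type: evt.error_type || "UnknownError",
    error_message: (evt.error_message || "").slice(0, 2000), // keep records bounded
    payload: evt.raw_input || {},
    attempts: 1,
    first_seen_at: now.toISOString(),
    last_seen_at: now.toISOString(),
    replay_url: evt.replay_url || null,
    status: "queued",
  };
}
```

The `status: "queued"` default feeds the replay runbook in step 16.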
- 6
Make — enable Incomplete executions (DLQ‑like holding area)
Open Scenario settings → check Store incomplete executions. This preserves failed bundles for inspection and safe replay instead of dropping them or looping indefinitely.
- 7
Make — attach Break handlers and capture error bundles
Right‑click critical modules → Add error handler → Break. Save the error to Incomplete executions and include key fields in the error bundle. After a fix, you can auto‑complete or manually replay from the failed module.
- 8
Make — verify exponential backoff and rate‑limit handling
Transient errors (timeouts, connection resets, RateLimitError/HTTP 429) auto‑retry with exponential backoff; logic/mapping errors do not. If you hit 429s, process sequentially, add Sleep between requests, or batch where possible.
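For calls you make outside Make's modules, the same policy can be sketched by hand: exponential backoff with full jitter, retrying only errors marked transient. The `transient` flag and helper names here are assumptions, not a library API:

```javascript
// Exponential backoff with full jitter: delay is uniform in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(attempt, baseMs = 1000, capMs = 60000, rand = Math.random) {
  const exp = Math.min(capMs, baseMs * 2 ** attempt); // 1s, 2s, 4s, ... capped
  return Math.floor(rand() * exp);                    // full jitter spreads retry bursts
}

async function withRetries(fn, { maxAttempts = 5, isTransient = (e) => e.transient } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Logic/mapping errors fail fast, mirroring Make's behavior.
      if (attempt + 1 >= maxAttempts || !isTransient(err)) throw err;
      await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
    }
  }
}
```

Full jitter matters at scale: without it, every run that failed at the same moment retries at the same moment, recreating the 429.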
- 9
Make — rehearse Run replay for safe recovery
Open History → select a failed run → Run replay. Replays consume operations and reuse the stored trigger output; confirm downstream idempotency so replays don’t duplicate side effects.
- 10
n8n — set Retry On Fail and On Error on each critical node
Open a node → Settings → enable Retry On Fail (cap attempts; add backoff) and choose On Error behavior (Stop or Continue using error output). Use conservative caps on external writes and route errors to the Error Workflow.
- 11
n8n — configure a global Error Workflow (DLQ sink)
Workflow Settings → Error Workflow = [Error Handler]. In the handler, log execution.id/url, node, error message, and your idempotency key to a persistent DLQ store so you can replay in a controlled way later.
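Inside the handler, a Code node can shape the Error Trigger's output into the DLQ schema. A sketch, assuming the trigger emits roughly `{ workflow: { name }, execution: { id, url, error, lastNodeExecuted } }`; verify the exact shape against one of your own failed executions:

```javascript
// Sketch for a Code node in the n8n Error Workflow; the input shape is an assumption.
function buildDlqRecord(item, idempotencyKey) {
  const exec = item.execution || {};
  const err = exec.error || {};
  return {
    platform: "n8n",
    workflow: (item.workflow || {}).name || "unknown",
    run_id: exec.id || null,
    step: exec.lastNodeExecuted || null,
    error_type: err.name || "NodeOperationError",
    error_message: err.message || "",
    idempotency_key: idempotencyKey || null,
    replay_url: exec.url || null, // n8n's own execution URL doubles as the replay link
    status: "queued",
  };
}
```

Carrying the idempotency key into the record is what makes later replays safe: the gate from step 14 rejects the duplicate side effect even if the replay re-runs every node.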
- 12
n8n — run Queue mode with worker concurrency
Use Redis‑backed queue mode to avoid pile‑ups and to control parallelism. Example docker‑compose:
```yaml
services:
  redis:
    image: redis:6
  n8n:
    image: n8nio/n8n:latest
    environment:
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
    depends_on: [redis]
  worker:
    image: n8nio/n8n:latest
    command: n8n worker --concurrency=5
    environment:
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
    depends_on: [redis]
```

Start with low concurrency on write‑heavy flows; raise it carefully once the success rate stays consistently high.
- 13
n8n — reply 2xx early on webhooks to cut duplicates
Place Respond to Webhook immediately after Webhook to acknowledge fast, then queue background work. If the workflow errors before responding, n8n returns 500 and providers may retry.
Respond to Webhook → 200 OK body:

```json
{ "ok": true, "received": {{$json.event_id || $json.id}} }
```

- 14
All tools — add a lightweight idempotency gate in‑flow
Before external writes, check the key; if duplicate, short‑circuit. n8n: Remove Duplicates node (pre‑filter) + Redis/DataStore. Make: Data store or SQL. Zapier: Storage, Tables, or external KV/DB.
```javascript
// Gate pattern (in-memory sketch; back it with Redis, a Data store, or SQL in production)
const seenKeys = new Set();
function upsertIdempotency(key, source, runRef) {
  if (seenKeys.has(key)) return false;  // duplicate
  seenKeys.add(key);                    // a real store persists key + source + runRef
  return true;                          // new run
}

const inserted = upsertIdempotency(key, source, runRef); // returns true if new
if (!inserted) return { skipped: true, reason: "duplicate" };
// proceed with side effects
```

- 15
Slack — batch, honor Retry‑After, and add a circuit breaker
Keep posts ≤ 1 message/second per channel and on 429 wait the Retry‑After seconds before retrying. Add a breaker: if error rate or latency crosses a threshold, pause non‑critical posts and send a single summarized alert.
Minimal 429 handling + per‑channel throttle (Node.js sketch; `rateLimiter`, `queue`, and `SLACK_TOKEN` are your own helpers and config):

```javascript
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

async function postWithGuard(channel, payload) {
  await rateLimiter.take(`slack:${channel}`, 1000); // 1 msg/sec/channel
  const res = await fetch("https://slack.com/api/chat.postMessage", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${SLACK_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ channel, ...payload }),
  });
  if (res.status === 429) {
    const wait = Number(res.headers.get("retry-after") || 1);
    await sleep(wait * 1000);
    return queue.retry(channel, payload); // requeue once with backoff
  }
  return res.json();
}
```

Circuit breaker input → Slack summary:

```json
{
  "text": "🚨 Circuit breaker tripped",
  "blocks": [
    {"type": "section", "text": {"type": "mrkdwn", "text": "*Breaker: ON* — suppressed non-critical posts"}},
    {"type": "context", "elements": [{"type": "mrkdwn", "text": "error_rate=12% | p95=11.2s | window=5m"}]}
  ]
}
```

Send at most one status message per channel per second; burst cautiously and always obey Retry‑After on 429s.
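The breaker itself can be sketched as a simple error-rate counter with a cooldown; the thresholds below are illustrative, not recommendations:

```javascript
// Trip when the recent error rate crosses a threshold; suppress non-critical
// posts until the cooldown elapses, then reset (a crude half-open state).
class CircuitBreaker {
  constructor({ threshold = 0.1, minSamples = 20, cooldownMs = 5 * 60 * 1000 } = {}) {
    this.threshold = threshold;
    this.minSamples = minSamples;
    this.cooldownMs = cooldownMs;
    this.ok = 0;
    this.failed = 0;
    this.trippedAt = null;
  }
  record(success) {
    success ? this.ok++ : this.failed++;
    const total = this.ok + this.failed;
    if (!this.trippedAt && total >= this.minSamples &&
        this.failed / total >= this.threshold) {
      this.trippedAt = Date.now(); // trip: caller should send ONE summary alert
    }
  }
  allows(now = Date.now()) {
    if (this.trippedAt && now - this.trippedAt >= this.cooldownMs) {
      this.trippedAt = null; this.ok = 0; this.failed = 0; // cooldown over: reset
    }
    return this.trippedAt === null;
  }
}
```

Call `record()` after every `postWithGuard` attempt and check `allows()` before non-critical posts; the single tripped alert replaces the flood it suppresses.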
- 16
Define your DLQ record schema and replay runbook
Log enough to fix and reprocess confidently, then practice the replay path on each tool.
```json
{
  "id": "uuid",
  "platform": "zapier|make|n8n",
  "workflow": "[name]",
  "run_id": "[provider-run-id]",
  "step": "[failing-step]",
  "error_type": "[class]",
  "error_message": "[message]",
  "idempotency_key": "[key]",
  "payload": {"...": "raw input excerpt"},
  "attempts": 3,
  "first_seen_at": "2026-04-17T16:42:00Z",
  "last_seen_at": "2026-04-17T16:43:12Z",
  "replay_url": "[native-history-or-run-link]",
  "status": "queued|replayed|resolved"
}
```

Replay steps: Zapier (History → replay run), Make (History → Run replay on stored bundle), n8n (requeue the item or trigger a targeted replay workflow with the captured payload).
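The runbook's drain step can be sketched as a small runner over that schema. `replayFns` maps each platform to your own replay call (hit the native replay URL, re-POST the stored payload, etc.); the names are assumptions:

```javascript
// Sketch: replay every "queued" DLQ record and advance its status.
async function replayQueued(records, replayFns) {
  const results = [];
  for (const rec of records.filter((r) => r.status === "queued")) {
    try {
      await replayFns[rec.platform](rec); // e.g., re-POST rec.payload to the workflow
      results.push({ ...rec, status: "replayed", attempts: rec.attempts + 1 });
    } catch (err) {
      // Leave it queued for the next drain, but record the attempt and error.
      results.push({ ...rec, status: "queued", attempts: rec.attempts + 1,
                     error_message: String((err && err.message) || err) });
    }
  }
  return results;
}
```

Because every replay re-enters a workflow behind the idempotency gate, a record that was already half-processed cannot duplicate its side effects.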