Episode 2

Universal Run Log + Slack Alerts in 90 Minutes

Intro

This episode is for solo automation consultants who are tired of learning about broken workflows from client emails. You'll get a complete monitoring system that logs every automation run to Notion and sends smart Slack alerts only when something actually needs your attention.

In This Episode

Jordan shares the exact system he built after an eleven-day silent failure nearly cost him a $40K client. You'll build a universal run log in Notion that captures every workflow execution across your entire stack, then layer on intelligent Slack alerts that respect API rate limits and only fire on failures, performance degradation, and consecutive error patterns. The build includes instrumenting existing workflows with a single webhook call, designing Slack alerts that use threaded replies to stay scannable, and implementing proper retry logic for both Notion and Slack APIs. Everything is designed to work at the scale of solo operators running 30-100 workflows across multiple clients without the complexity or cost of enterprise monitoring tools.

Key Takeaways

  • Set up one Notion database that every automation writes to after each run, using small scalar properties and external payload links to respect API limits
  • Configure Slack alerts to trigger only on failures, p95 duration spikes, and consecutive error patterns - not every successful run - to avoid alert fatigue
  • Implement proper rate limiting and Retry-After handling for both Notion (3 requests/second) and Slack (1 message/second per channel) to prevent silent failures

Jordan: Someone DM'd me last week — and I'm paraphrasing, but it was basically this: "Jordan, I have twenty-three Zaps, a dozen Make scenarios, and a handful of n8n workflows running across eight clients. How do I know when something breaks?" And my first question back was — how are you finding out about failures right now?

Jordan: The answer was what I expected. "Usually the client tells me."

Jordan: Yeah. That was me eighteen months ago. I had a client — commercial cleaning company, good retainer, solid relationship — and their invoice sync workflow had been silently failing for eleven days. Eleven days. No errors in my inbox, no Slack pings, nothing. The Make scenario was throwing a four-oh-four because their accounting platform changed an endpoint, and the error handler I'd set up was... posting to a Slack channel I'd muted.

Jordan: I found out because the client's bookkeeper emailed me on a Friday afternoon asking why invoices hadn't synced since the second week of March. That's the kind of email that makes your stomach drop. Not because the fix is hard — the fix took eight minutes — but because you've been billing a retainer for a service that wasn't running. And you had no idea.

Jordan: That weekend I built the system we're building today. One Notion database. Every workflow, every run, every status — logged to a single table. And Slack alerts that only fire when something actually needs my attention. Not every event. Not every success. Just the failures, the slowdowns, and the patterns that mean something's about to break.

Jordan: Here's what you're walking away with today. A universal run log — one Notion database that every workflow in your stack writes to after every single run. Plus a Slack alerting layer that respects rate limits and only pings you on failures, duration spikes, and consecutive error patterns. The whole build takes roughly ninety minutes, it works across Make, Zapier, and n8n, and once it's live, you will never learn about a broken workflow from a client email again. I'm Jordan. This is Headcount Zero. Let's build it.

Jordan: Okay, so before we touch a single database property, I want to explain why the obvious approach — just post every event to Slack — is the wrong move. Because that's what I tried first. And it's what most people try first.

Jordan: After that invoice sync disaster, I set up error handlers on every Make scenario to post to a dedicated Slack channel. Hashtag-automation-alerts. Felt great for about a week. Then I had a Tuesday where a client's CRM sync ran two hundred and forty times — it triggers on every contact update — and about thirty of those threw transient timeouts that resolved on retry. My alerts channel had thirty messages in ten minutes. You know what I did? I muted the channel. Right back where I started.

Jordan: And here's the technical reason this doesn't scale. Slack enforces roughly one message per second per channel. That's not a suggestion — it's a hard limit. Zapier actually documents this in their help center because their users hit it constantly. You exceed that rate, you get HTTP four-twenty-nine responses, and your alerts start failing silently. So now your alerting system needs its own alerting system. That's stack sprawl. That's the opposite of what we want.

Jordan: The other trap — and this one's sneaky — is thinking you can just read your Slack history later to reconstruct what happened. Maybe you build a little dashboard that pulls from conversations dot history. Bad idea. Last May, Slack changed the rate limits for non-Marketplace apps. For any newly created or newly installed app, conversations dot history and conversations dot replies dropped to one request per minute, fifteen messages per response. Existing internal customer-built apps in your own workspace still get the higher Tier Three limits for now, but if you're distributing anything to clients or reinstalling frequently, you're looking at severe read throttling. The point is — don't use Slack as your database. Use Slack as your notification layer. Your database is Notion.

Jordan: So here's the design. One Notion database called Run Log. Every automation in your stack — Make scenarios, Zaps, n8n workflows, custom scripts, all of it — writes one row to this database after every run. Success or failure. And the key design principle is this: keep the write small.

Jordan: Notion's API gives you an average of three requests per second per integration. That's the rate limit. You can burst above it briefly, but sustained, you're designing for three per second. And each request has a payload cap of five hundred kilobytes. So you do not want to stuff error logs, full request bodies, or debug traces into the Notion write. You want small scalar properties — text, numbers, dates, selects — and a URL that points to the full payload stored somewhere cheap, like S3 or even a simple JSON file in Google Cloud Storage.
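That three-per-second budget can be enforced with a small client-side pacer in front of whatever writes to Notion. A minimal single-threaded sketch in Python (the class name and structure are mine, not from the episode):

```python
import time

class RateLimiter:
    """Spaces out calls so sustained throughput stays under a per-second cap."""

    def __init__(self, max_per_second: float):
        self.min_interval = 1.0 / max_per_second
        self.last_call = 0.0

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        now = time.monotonic()
        elapsed = now - self.last_call
        slept = 0.0
        if elapsed < self.min_interval:
            slept = self.min_interval - elapsed
            time.sleep(slept)
        self.last_call = time.monotonic()
        return slept

# Notion averages about 3 requests/second per integration, so pace at 3/s.
notion_limiter = RateLimiter(3)
```

For concurrent writers you'd put a lock or a queue in front of this; at solo-operator volume, a single worker is usually enough.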

Jordan: The properties I use — and these are in the template on the Resources page if you want to copy them directly — are: workflow name as the title, run ID as rich text — and this doubles as your idempotency key so you never get duplicate rows — client name, started at as a date, duration in milliseconds as a number, status as a select with three options: ok, warn, and fail. Then error code as rich text, payload link as a URL pointing to the full request and response body, and rerun URL — also a URL — that links directly to wherever you trigger a manual rerun in your platform.

Jordan: Oh — and I added two optional fields later that I wish I'd had from day one. Environment — prod, staging, dev — as a select, and cost in cents as a number. That cost field is gold when you're running token-heavy AI workflows. You can filter your run log by client, sum the cost column, and know exactly what each client's automations cost you this month. Ties right back to the usage billing system we built in episode eight.

Jordan: The actual API call is a single POST to Notion's pages endpoint. You're creating a page in the database with those properties. The whole payload is maybe two kilobytes. Well under the five hundred KB cap, no arrays anywhere near the hundred-element limit. One request, done.
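Here's roughly what that single POST looks like, standard library only. The property names mirror the schema described above, but treat the exact names and the `run` dict shape as assumptions to adapt to your own database:

```python
import json
import urllib.request

NOTION_API = "https://api.notion.com/v1/pages"
NOTION_VERSION = "2022-06-28"  # a stable public Notion API version

def build_run_row(database_id: str, run: dict) -> dict:
    """Map a run summary onto the Run Log schema from the episode."""
    return {
        "parent": {"database_id": database_id},
        "properties": {
            "Workflow": {"title": [{"text": {"content": run["workflow"]}}]},
            "Run ID": {"rich_text": [{"text": {"content": run["run_id"]}}]},
            "Client": {"rich_text": [{"text": {"content": run["client"]}}]},
            "Started At": {"date": {"start": run["started_at"]}},
            "Duration (ms)": {"number": run["duration_ms"]},
            "Status": {"select": {"name": run["status"]}},  # ok | warn | fail
            "Error Code": {"rich_text": [{"text": {"content": run.get("error_code", "")}}]},
            "Payload Link": {"url": run.get("payload_url")},
            "Rerun URL": {"url": run.get("rerun_url")},
        },
    }

def log_run(token: str, database_id: str, run: dict) -> None:
    """One small POST -- a couple of KB, well under the 500 KB payload cap."""
    body = json.dumps(build_run_row(database_id, run)).encode()
    req = urllib.request.Request(
        NOTION_API,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Notion-Version": NOTION_VERSION,
            "Content-Type": "application/json",
        },
    )
    urllib.request.urlopen(req)  # raises on 4xx/5xx
```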

Jordan: Now — if you want human-readable details on the Notion page itself, like a formatted error summary or a step-by-step trace, you can append blocks to the page after creation. But there are constraints. Max one hundred child blocks per append request, and only two levels of nesting per request. Thomas Frank has a great write-up on handling these limits if you're doing anything complex. For our purposes, I usually skip the block append entirely on the write path. The run log row plus the payload link gives me everything I need. I only append blocks manually when I'm debugging something specific.
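If you do append blocks, the hundred-children-per-request cap is easy to respect with a batching helper. A sketch assuming plain paragraph blocks (helper names are mine):

```python
def text_block(line: str) -> dict:
    """A single top-level paragraph block, so nesting depth stays within limits."""
    return {
        "object": "block",
        "type": "paragraph",
        "paragraph": {"rich_text": [{"type": "text", "text": {"content": line}}]},
    }

def chunk_blocks(blocks: list, max_per_request: int = 100) -> list:
    """Split children into batches of <=100 for Notion's append-children endpoint."""
    return [blocks[i:i + max_per_request] for i in range(0, len(blocks), max_per_request)]
```

Each batch then goes to the append-children endpoint as its own request.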

Jordan: So how does every workflow actually write to this database? One webhook. That's it. You set up a single Make scenario — or a Zap, or an n8n workflow — that accepts a POST request with a small JSON body and writes it to Notion. Then at the end of every automation you build, you add one final step: HTTP request to that webhook with the run summary.

Jordan: In Make, that's an HTTP module at the end of your scenario. In Zapier, it's a Webhooks by Zapier step. In n8n, it's an HTTP Request node. The payload is always the same shape — workflow name, run ID, client, timestamps, status, error code if any, and the two URLs. Takes roughly eight minutes to instrument an existing workflow. For new builds, I just copy the module from a template and map the variables. Maybe three minutes.
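A plausible shape for that run-summary payload, with a small validator for the webhook's receiving end. The field names are illustrative, not a spec from the episode:

```python
# Illustrative run summary; every value here is an example, not real data.
RUN_SUMMARY = {
    "workflow": "invoice sync",
    "run_id": "e1b2c3",            # the platform's native execution ID
    "client": "Acme Corp",
    "started_at": "2024-03-13T09:00:00Z",
    "finished_at": "2024-03-13T09:00:08Z",
    "duration_ms": 8400,
    "status": "fail",              # ok | warn | fail
    "error_code": "HTTP 504",
    "payload_url": "https://storage.example.com/runs/e1b2c3.json",
    "rerun_url": "https://automation.example.com/scenarios/123/run",
}

REQUIRED = {"workflow", "run_id", "client", "started_at", "duration_ms", "status"}

def validate_summary(payload: dict) -> list:
    """Return the names of any required fields missing from a run summary."""
    return sorted(REQUIRED - payload.keys())
```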

Jordan: The run ID is important. Generate a unique ID per execution — most platforms give you one natively. Make has the execution ID, Zapier has the Zap history ID, n8n has the execution ID. Use that as your run ID. If your webhook fires twice because of a retry, the run ID lets you deduplicate on the Notion side. You're not going to get duplicate rows polluting your log.
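On the Notion side, the dedupe check is one query against the Run ID property before creating a row. A sketch against the database query endpoint (token handling and error cases omitted; the filter-builder split is mine, for testability):

```python
import json
import urllib.request

def dedupe_filter(run_id: str) -> dict:
    """Notion query body matching an existing Run Log row with this run ID."""
    return {
        "filter": {"property": "Run ID", "rich_text": {"equals": run_id}},
        "page_size": 1,
    }

def run_already_logged(token: str, database_id: str, run_id: str) -> bool:
    """Idempotency check: skip the create if this run ID is already logged."""
    url = f"https://api.notion.com/v1/databases/{database_id}/query"
    req = urllib.request.Request(
        url,
        data=json.dumps(dedupe_filter(run_id)).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Notion-Version": "2022-06-28",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return len(json.load(resp)["results"]) > 0
```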

Jordan: Now the Slack layer. And this is where the design matters most, because the instinct is to alert on everything and the correct move is to alert on almost nothing.

Jordan: Three conditions trigger a Slack message. First — any run with status equals fail. That's non-negotiable. If something broke, I want to know. Second — if the p95 duration across the last ten runs of a specific workflow crosses a threshold I've set. This catches the slow degradation that precedes a full failure. A workflow that normally takes three seconds but has been averaging eight seconds for the last ten runs? Something's wrong. Maybe an API is throttling you, maybe a database query is timing out. You want to catch that before it becomes a hard failure. Third — more than three consecutive failures on the same workflow within ten minutes. That's not a transient blip. That's a pattern.
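Those three conditions reduce to a small pure function you can run wherever your webhook handler lives. A sketch, with the threshold and the data shapes as assumptions:

```python
import math
from datetime import datetime, timedelta  # callers supply started_at as a datetime

def p95_ms(durations: list) -> float:
    """p95 of a small sample: the value at the 95th-percentile rank."""
    s = sorted(durations)
    return s[max(0, math.ceil(0.95 * len(s)) - 1)]

def alert_reasons(run: dict, history: list, p95_threshold_ms: float) -> list:
    """Which of the three alert conditions does the latest run trigger?

    history: prior runs of the same workflow, oldest first; each dict has
    'status', 'duration_ms', and 'started_at' (a datetime)."""
    reasons = []
    runs = history + [run]
    # 1. Any failed run alerts, full stop.
    if run["status"] == "fail":
        reasons.append("failure")
    # 2. p95 duration across the last ten runs crossed the threshold.
    last_ten = runs[-10:]
    if len(last_ten) == 10 and p95_ms([r["duration_ms"] for r in last_ten]) > p95_threshold_ms:
        reasons.append("p95_spike")
    # 3. More than three consecutive failures within a ten-minute window.
    window_start = run["started_at"] - timedelta(minutes=10)
    streak = 0
    for r in reversed(runs):
        if r["status"] == "fail" and r["started_at"] >= window_start:
            streak += 1
        else:
            break
    if streak > 3:
        reasons.append("consecutive_failures")
    return reasons
```

Returning a list of reasons rather than a boolean lets the Slack layer say *why* it pinged you.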

Jordan: Everything else — every successful run, every normal-duration execution — writes silently to Notion and never touches Slack. This is the difference between an alerting system you actually use and one you mute by Thursday.

Jordan: The Slack implementation has a trick to it that I really like. You post the parent message — the compact one-liner — using chat dot postMessage through the Slack Web API. That gives you back a timestamp, the ts value, in the response. You save that. Then you post the detailed breakdown — error codes, rerun links, payload links — as a threaded reply using an incoming webhook with thread underscore ts set to that parent timestamp. One clean line in the channel. All the details tucked into the thread. Your alerts channel stays scannable.
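The parent-plus-thread pattern, sketched with the standard library: `chat.postMessage` returns the `ts` of the parent, and the incoming-webhook reply carries it as `thread_ts`. The one-line summary format imitates the example Jordan reads later in the episode; the exact formatting and function names are my choices:

```python
import json
import urllib.request

SLACK_API = "https://slack.com/api/chat.postMessage"

def format_summary(run: dict) -> str:
    """Compact parent line: workflow | status | duration | client | run ID."""
    secs = run["duration_ms"] / 1000
    return f'{run["workflow"]} | {run["status"]} | {secs:.1f}s | {run["client"]} | run {run["run_id"]}'

def post_alert(token: str, channel: str, run: dict, details: str, webhook_url: str) -> str:
    """Post the compact parent via the Web API, then thread the details."""
    req = urllib.request.Request(
        SLACK_API,
        data=json.dumps({"channel": channel, "text": format_summary(run)}).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )
    with urllib.request.urlopen(req) as resp:
        parent = json.load(resp)
    thread_ts = parent["ts"]  # anchors the thread
    reply = json.dumps({"text": details, "thread_ts": thread_ts}).encode()
    urllib.request.urlopen(urllib.request.Request(
        webhook_url, data=reply, headers={"Content-Type": "application/json"}))
    return thread_ts
```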

Jordan: And the rate safety is straightforward. You're posting at most one message per second per channel — that's Slack's limit. If you get a four-twenty-nine, the response includes a Retry-After header telling you exactly how many seconds to wait. You wait that long, retry once, and move on. The critical thing — and Slack's own docs emphasize this — is that the backoff is scoped to that specific method for that specific workspace. Your other API calls, your other workspaces, none of that is affected. So your retry logic should be per-method, not global. Don't freeze your entire automation stack because one Slack post got rate-limited.
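That honor-Retry-After-then-retry-once behavior fits in a few lines. A sketch with per-call scope, as described, so one throttled post never blocks unrelated methods or workspaces:

```python
import time
import urllib.error
import urllib.request

def retry_delay(headers) -> int:
    """Seconds to wait, from the Retry-After header (default 1 if absent)."""
    return int(headers.get("Retry-After", "1"))

def post_with_retry(req: urllib.request.Request, max_retries: int = 1):
    """POST; on HTTP 429, wait the advertised delay and retry once."""
    for attempt in range(max_retries + 1):
        try:
            return urllib.request.urlopen(req)
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_retries:
                raise  # not a rate limit, or out of retries
            time.sleep(retry_delay(err.headers))
```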

Jordan: Now — I can already hear some of you thinking this. "Jordan, this isn't real observability. If I need actual monitoring, I should be using Datadog or Papertrail or Grafana." And... yeah. You're not wrong.

Jordan: This is not a replacement for a production observability stack. If you're running hundreds of thousands of executions a day, if you need distributed tracing, if you need SLOs and uptime dashboards — Notion and Slack are not the answer. Slack's own rate limit documentation explicitly points you toward third-party logging services for high-volume use cases.

Jordan: But here's who this is for. This is for the person running thirty to a hundred workflows across five to twelve clients. You're not Datadog's customer. You're not going to pay two hundred dollars a month for a monitoring platform when your entire tool stack costs three hundred. What you need is a searchable audit trail and a way to know when something breaks before your client does. That's it. That's the job.

Jordan: And the beautiful thing about this design is that it's swappable. The thresholding logic, the Retry-After handling, the parent-plus-thread Slack pattern — all of that stays the same if you later swap Notion for Postgres or Supabase and add a Grafana dashboard on top. You're not building throwaway infrastructure. You're building the alerting logic that scales with you, on a storage layer that's free and fast to ship right now.

Jordan: So let me walk through what this looks like in practice. It's Wednesday morning. I'm working on a new build for a client. My phone buzzes — Slack notification. I glance at it. One line: "invoice sync — fail — eight-point-four seconds — Acme Corp — run e-one-b-two-c-three." I tap into the thread. Error code: HTTP five-oh-four. There's a link to the full payload and a link to rerun the workflow.

Jordan: I open the payload, see that Acme's accounting API timed out, check their status page — yep, they're having an incident. I don't need to do anything yet. If it fails three more times in ten minutes, I'll get a second alert for the consecutive failure pattern. Otherwise, the next scheduled run will probably succeed on its own.

Jordan: Total time spent: forty-five seconds. And I knew about it before Acme's bookkeeper did. That's the whole point of this build. Not dashboards. Not graphs. Just — know before the client knows.

Jordan: If you want to skip the setup from scratch, the Run Log template, the Slack app manifest, the Block Kit payloads, and the Make, Zapier, and n8n snippets are all on the Resources page. It's the exact database schema and code blocks we just walked through. Paste them in, fill in your tokens and channel IDs, and you're live.

Jordan: So remember that DM I mentioned at the top? Twenty-three Zaps, a dozen Make scenarios, eight clients, no idea when something breaks. That person shipped this exact build last weekend. Took them about an hour and a half. They messaged me Monday morning — and I'll read it to you because it's the best summary of what this system does. They said: "First Slack alert came in Sunday at two AM. A client's Shopify webhook had changed its payload format. I fixed it before coffee. The client never knew."

Jordan: Before coffee. That's the bar. Not a monitoring dashboard you check once a week. Not a spreadsheet you update manually. Just — your phone buzzes, you glance at one line, and you know exactly what broke, for which client, and how to fix it. Everything else logs silently to Notion where you can search it, filter it, and prove to clients that their automations ran four hundred and twelve times last month without a single failure.

Jordan: Here's what I want you to do this week. Pick one workflow — your most critical one, the one that would be the worst client email if it failed silently — and instrument it. One webhook, one Notion write, one Slack alert on failure. That's it. Don't try to instrument everything at once. Start with the workflow that scares you most. Once you see that first alert come through and you realize you caught a failure before anyone else knew, you'll instrument the rest by the weekend.

Jordan: That's it for today. I'm Jordan. This is Headcount Zero. Go build something.

automation monitoring · Slack alerts · Notion database · workflow logging · error handling · Make.com · Zapier · n8n · API rate limits · solo consulting