Episode 6

Never Get Surprised by AI Bills Again: Throttle + Kill Switch

Intro

This episode is for solo operators scaling AI usage across multiple clients who need automated cost controls that prevent surprise overages without manual babysitting. You'll walk away with a working system that meters spend per client, throttles non-critical flows, and kills automations before they spike your bills.

In This Episode

Jordan walks through the four-layer guardrail system he built after a non-critical Zap burned through $230 in Zapier overages in one month. You'll see how to build a cost ledger in Google Sheets that tracks tokens, tasks, and credits per client across OpenAI, Anthropic, Zapier, and Make. Then you'll implement Slack alerts at 60, 80, and 100 percent of budget; auto-throttle rules using Delay After Queue and n8n concurrency caps; and kill switches that use Zapier Manager, Make's Scenarios API, and n8n's workflow activation endpoints to shut down non-critical automations before they eat margin. The episode includes specific rate limits, pricing references, and a complete template pack with alert formulas and platform-specific toggle recipes.

Key Takeaways

  • Build a unified cost ledger that tracks LLM tokens, Zapier tasks, and Make credits per client with automated Slack alerts at 60/80/100% of budget to catch overages before they hit your invoice
  • Implement throttling using Zapier's Delay After Queue, n8n's N8N_CONCURRENCY_PRODUCTION_LIMIT, and Make's built-in retry backoff to stay under platform rate limits without manual intervention
  • Set up kill switches using Zapier Manager, Make's /scenarios/{id}/stop API, and n8n's Activate/Deactivate operations to automatically disable non-critical automations when usage hits 100% with one-click re-enable

Timestamps

Companion Resource

  • Zapier Help Center

    help.zapier.com

    • - Zapier Tables steps in Zaps are limited to 450 requests per 60 seconds and 150 requests per 5 seconds (per Zap–table combination).
  • Zapier Pricing

    zapier.com

    • - Zapier switches to pay‑per‑task at 1.25× the base task price once a plan’s monthly task limit is exceeded.
  • Zapier Help: Zap limits

    help.zapier.com

    • - Zapier documents a 20,000 requests per 5 minutes limit for instant triggers per user; polling triggers on Free/Trial plans are held if a Zap exceeds 200 requests per 10 minutes.
  • Zapier Help: Add delays to Zaps

    help.zapier.com

    • - Delay After Queue serializes Zap runs with a configurable wait and is recommended by Zapier to avoid throttling; it still must respect Zapier/app rate limits.
  • Make Help: Introducing credits

    help.make.com

    • - Make transitioned billing from operations to credits; most non‑AI modules consume 1 credit per operation, while built‑in AI or advanced features may consume variable credits tied to tokens, time, or file size.
  • Make Developer Hub: Scenarios API

    developers.make.com

    • - Make exposes scenario toggle endpoints: POST /api/v2/scenarios/{id}/start (activate) and POST /api/v2/scenarios/{id}/stop (deactivate), requiring scenarios:write scope.
  • Make Help: Automatic retry of incomplete executions

    help.make.com

    • - Make automatically retries incomplete executions for rate‑limit/connection/timeouts using a backoff schedule and caps parallel retries per scenario at 3 to avoid cascading failures.
  • n8n Docs: Concurrency control; Queue mode

    docs.n8n.io

    • - n8n supports production concurrency caps via N8N_CONCURRENCY_PRODUCTION_LIMIT, and queue mode with worker concurrency to control parallel executions.
  • n8n Docs: n8n node + template 3229

    docs.n8n.io

    • - n8n documents Activate/Deactivate workflow operations via its own API (exposed through the built‑in n8n node and templates).
  • OpenAI API pricing

    platform.openai.com

    • - OpenAI publishes per‑model prices per 1M tokens on the API pricing page.
  • Anthropic Pricing

    anthropic.com

    • - Anthropic publishes Claude model pricing per MTok input/output on its pricing page.
  • Gemini API pricing

    ai.google.dev

    • - Google’s Gemini API pricing page lists per‑1M token rates and notes that billing for Grounding with Google Search starts January 5, 2026 after a free tier.
  • Anthropic Usage & Cost API

    docs.anthropic.com

    • - Anthropic provides a Usage & Cost API for programmatic spend monitoring and alerts.
  • OpenAI Help: API Usage Dashboard

    help.openai.com

    • - OpenAI’s API usage dashboard supports exports for custom analysis; large ranges may be chunked into multiple files.
  • Zapier Workflow API: Rate Limiting

    docs.zapier.com

    • - Zapier’s Workflow API (Powered by Zapier) enforces request rate limits such as 60 req/min per IP and 150 req/min per partner.
  • Zapier Manager app + blog guide

    zapier.com

    • - Zapier Manager can turn Zaps on/off
    • - Provides an in‑product action for implementing a kill‑switch without external APIs.

Jordan: April ninth. Wednesday night. I'm reconciling invoices — the monthly ritual where I pull up OpenAI, Anthropic, and Zapier side by side and pretend I'm not scared to look. And this time... the Zapier number doesn't make sense. I'm staring at a task count that's forty-two percent over my plan limit. Forty-two percent. Which means every task past the cap has been billed at one-point-two-five times the base rate. I didn't get a warning. I didn't get an email that said "hey, you're close." The Zaps just... kept running. And Zapier kept charging.

Jordan: So I start digging. Which Zaps burned through the overage? It's not the critical ones — not the client onboarding flows, not the invoice sync. It's a batch enrichment Zap I built for one client's CRM. Runs every six hours, hits an API, updates about three hundred records. Totally non-critical. Could have paused for a week and nobody would have noticed. But it didn't pause. Because I had no system telling it to pause. No alert at sixty percent. No throttle at eighty. No kill switch at a hundred. Just... an open tap running into a paid meter.

Jordan: That overage cost me about two hundred and thirty dollars. Not catastrophic. But here's what kept me up — I have twelve clients. If three of them had spiked the same way in the same month, that's seven hundred dollars of margin I didn't budget for. Gone. Because a non-critical Zap didn't know it was non-critical.

Jordan: That Thursday morning I built the system I'm showing you today.

Jordan: The problem is not that AI costs are unpredictable. OpenAI publishes per-token prices. Anthropic publishes per-token prices. Zapier tells you exactly what a task costs past your plan limit — one-point-two-five times base. The problem is that nobody builds the wiring between those numbers and an actual stop signal. So today you're getting that wiring. One usage ledger that tracks tokens, tasks, and Make credits per client. Slack alerts at sixty, eighty, and a hundred percent of budget. Auto-throttle rules that slow non-critical flows before they spike. And a kill switch on every platform — Zapier, Make, n8n — that shuts down the stuff that doesn't matter before it eats the margin on the stuff that does.

Jordan: Okay, so here's what I was dealing with that Thursday morning. Twelve clients. Three cost surfaces per client — LLM tokens across OpenAI and Anthropic, Zapier tasks, and Make credits. And every one of those surfaces has a different unit, a different price, and a different way of telling you how much you've used. OpenAI gives you a usage dashboard with exports. Anthropic has their Usage and Cost API — which is actually great, you can pull spend programmatically. Zapier shows task counts in your account settings. Make shows credit consumption in the scenario logs. But none of them talk to each other. And none of them can tell you "client X is at seventy-eight percent of their monthly budget across all platforms."
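
A sketch of the "pull spend programmatically" step Jordan describes for Anthropic. Be warned: the endpoint path, query parameter, and response shape below are illustrative assumptions, not Anthropic's documented contract — verify them against the Usage & Cost API docs before relying on this.

```python
from urllib import request

# ASSUMPTION: URL, query parameter, and report shape are placeholders for
# illustration -- check Anthropic's Usage & Cost API docs for the real contract.
ANTHROPIC_COST_URL = "https://api.anthropic.com/v1/organizations/cost_report"

def build_cost_request(admin_key: str, starting_at: str) -> request.Request:
    """Build the authenticated GET for one spend window."""
    req = request.Request(f"{ANTHROPIC_COST_URL}?starting_at={starting_at}")
    req.add_header("x-api-key", admin_key)
    req.add_header("anthropic-version", "2023-06-01")
    return req

def ledger_rows_from_report(report: dict, client: str) -> list:
    """Flatten a (hypothetical) cost report into one ledger row per bucket result."""
    rows = []
    for bucket in report.get("data", []):
        for item in bucket.get("results", []):
            rows.append((bucket["starting_at"], client, "anthropic", float(item["amount"])))
    return rows
```

The flattened tuples map onto the ledger columns — date, client, source, cost — with per-model breakdowns added the same way if the report carries them.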

Jordan: That's the gap. Not visibility per vendor — visibility per client, across vendors. So the first thing I built was a cost ledger. Google Sheets. One row per usage event. Twelve columns — date, client name, source, model or plan, metric type, quantity, unit, and then the pricing fields. Input price per million tokens, output price per million tokens, task unit price for Zapier, credit unit price for Make. And then a calculated cost column that picks the right formula based on the metric type.
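
The calculated cost column is one formula branch per metric type. Here's that logic as Python rather than a Sheets formula — every price constant is a placeholder to replace with current vendor rates; only the 1.25× overage multiplier comes from Zapier's published pricing.

```python
# Placeholder rates -- replace with current numbers from each vendor's pricing page.
PRICES = {
    "openai:gpt-4o":    {"input_per_mtok": 2.50, "output_per_mtok": 10.00},  # example only
    "anthropic:sonnet": {"input_per_mtok": 3.00, "output_per_mtok": 15.00},  # example only
}
ZAPIER_BASE_TASK_PRICE = 0.02   # placeholder; overage bills at 1.25x base per Zapier
MAKE_CREDIT_PRICE = 0.001       # placeholder effective price per Make credit

def ledger_cost(metric_type, quantity=0, model=None,
                input_tokens=0, output_tokens=0, overage=False):
    """Mirror of the ledger's calculated-cost column: pick a formula by metric type."""
    if metric_type == "tokens":
        p = PRICES[model]
        return (input_tokens / 1_000_000) * p["input_per_mtok"] \
             + (output_tokens / 1_000_000) * p["output_per_mtok"]
    if metric_type == "tasks":
        unit = ZAPIER_BASE_TASK_PRICE * (1.25 if overage else 1.0)
        return quantity * unit
    if metric_type == "credits":
        return quantity * MAKE_CREDIT_PRICE
    raise ValueError(f"unknown metric type: {metric_type}")
```

In Sheets the same branch lives in one `IFS()` over the metric-type column; the Python version just makes the three formulas explicit.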

Jordan: This takes roughly twenty minutes to set up from scratch. If you grab the template from the Resources page, it takes about five — you just replace the bracketed fields with your client names and your current pricing.

Jordan: Now — the pricing tab matters more than people think. OpenAI, Anthropic, and Google all publish per-million-token rates. You can look them up right now. But here's the catch with Make — they moved from operations to credits, and most non-AI modules still cost one credit per operation. Fine. But their built-in AI modules? Variable. The credit cost scales with tokens, processing time, even file size in some cases. So if you're running a Make scenario that calls an AI module, you cannot just count operations anymore. You have to track credits, and you have to know that a single AI step might burn five or ten credits where a normal HTTP module burns one.

Jordan: I learned this the fun way. Had a client whose content generation scenario looked like it was running fifty operations a month. Fifty. Totally fine. Except those fifty operations included an AI summarizer that was consuming about three hundred credits. The scenario log showed it. I just... wasn't looking at the right number.

Jordan: So the ledger is layer one. Layer two is alerts. And this is where it gets satisfying, because the math is simple. You set a monthly budget per client per source in a Budgets tab. The Rollup tab pulls actual spend from the ledger, divides by budget, and flags when you cross sixty, eighty, or a hundred percent. A column called Alert Level does the threshold check. Another column called Last Notified tracks which alert already fired so you don't get duplicate Slack messages.
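
The Alert Level and Last Notified columns reduce to a few lines of logic. A minimal sketch, assuming percent-of-budget is a plain ratio:

```python
# Thresholds checked highest-first so the strongest crossed level wins.
THRESHOLDS = [(1.00, "ACTION"), (0.80, "WARNING"), (0.60, "INFO")]

def alert_level(actual: float, budget: float):
    """The Alert Level column: highest threshold crossed, else None."""
    if budget <= 0:
        return None
    pct = actual / budget
    for cutoff, label in THRESHOLDS:
        if pct >= cutoff:
            return label
    return None

def should_notify(level, last_notified) -> bool:
    """The Last Notified dedupe: ping Slack only when the level escalates."""
    order = {None: 0, "INFO": 1, "WARNING": 2, "ACTION": 3}
    return order.get(level, 0) > order.get(last_notified, 0)
```

Tracking only escalations is what prevents the duplicate-ping problem: a client sitting at 85% all week fires one WARNING, not one per rollup refresh.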

Jordan: The Slack integration is one Zap or one Make scenario — watches the Rollup tab for new alerts, posts to your channel, and updates Last Notified. That's it. At sixty percent, you get an informational ping. At eighty, a warning. At a hundred, an action alert. And the hundred-percent alert is the one that triggers the kill switch — but I'll get to that.
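
The Slack side really is one webhook POST. A sketch assuming a standard Slack incoming webhook (the URL is whatever your workspace generated):

```python
import json
from urllib import request

def build_alert_payload(client: str, source: str, pct: float, level: str) -> dict:
    """Format the message the Rollup watcher posts at each threshold."""
    return {"text": f"{level}: {client} is at {pct:.0%} of its {source} budget"}

def post_budget_alert(webhook_url: str, payload: dict) -> str:
    """POST to a Slack incoming webhook; Slack answers with the body 'ok' on success."""
    req = request.Request(webhook_url, data=json.dumps(payload).encode("utf-8"),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

The same payload works whether the watcher is a Zap, a Make scenario, or a cron job — Slack only cares about the JSON body.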

Jordan: Now, someone's going to say — and I've gotten this DM — "Jordan, OpenAI has a usage dashboard. Anthropic has budget alerts built in. Why am I building a spreadsheet?" Fair question. And the answer is: use those tools. Absolutely use them. Set budget alerts in every vendor console you have access to. But here's what those dashboards cannot do. They cannot turn off a Zapier Zap. They cannot stop a Make scenario. They cannot deactivate an n8n workflow. They can tell you the house is on fire. They cannot turn off the stove. That's what the kill switch is for, and that's why the ledger exists — to be the single source that connects vendor spend to platform actions.

Jordan: Alright — layer three. Throttling. This is the part that runs quietly in the background and prevents you from ever needing the kill switch in the first place. The idea is simple: slow down non-critical flows so they don't spike your usage in bursts.

Jordan: On Zapier, the tool is Delay After Queue. You add it near the top of any Zap that handles bursty events — webhooks, bulk table updates, anything that might fire fifty times in a minute. You give the queue a name, set a wait between runs — I usually start with ten to fifteen seconds — and now that Zap processes one run at a time instead of stampeding through your task limit. Zapier's own docs recommend this pattern specifically to avoid hitting their rate limits. And those limits are real — four hundred fifty requests per minute per Zap-table combination, a hundred fifty per five seconds. If you're doing anything with Zapier Tables, you will hit these without a queue.

Jordan: On n8n — if you're self-hosting — the lever is an environment variable called N8N underscore CONCURRENCY underscore PRODUCTION underscore LIMIT. Set it to whatever your infrastructure and your external APIs can handle. I run mine at five for most client instances. If you're using queue mode with workers, you set worker concurrency separately. And then inside individual workflows, you add Wait nodes to pace loops that call rate-limited APIs. The combination of concurrency caps plus in-workflow waits keeps you under external rate limits without manual babysitting.

Jordan: Make is the easiest here, honestly. Make has automatic retry with backoff built in. If a module hits a rate limit or a timeout, Make retries with staged delays and caps parallel retries at three per scenario. So you don't need to build a queue — you need to not fight the queue that already exists. Space your scenario schedules conservatively. If you're replaying a backlog, chunk it through a Data Store instead of dumping everything at once.

Jordan: Throttle first, scale later. I have that written on a sticky note on my monitor. It's not glamorous advice. But every time I've ignored it, I've regretted it within a week.

Jordan: Layer four. The kill switch. This is the part that actually stops the bleeding when a client's usage hits a hundred percent of budget.

Jordan: The principle is critical — you only kill non-critical automations. You never put a client's invoice sync, their onboarding flow, or their payment processing on the kill list. You decide in advance which flows are non-critical — batch enrichments, report generators, content schedulers, analytics syncs — and you tag them. In the Budgets tab, each source-client pair has a Priority column: critical or non-critical. And a Kill Switch Armed column: true or false.

Jordan: When the Rollup tab hits a hundred percent and the priority is non-critical and the kill switch is armed — that's when the automation fires.
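
That three-condition gate is worth writing down explicitly. A sketch with dict keys mirroring the Budgets tab columns (the key names themselves are illustrative):

```python
def flows_to_kill(budget_rows: list) -> list:
    """Return IDs of flows that pass all three gates: at budget, non-critical, armed."""
    return [
        row["flow_id"]
        for row in budget_rows
        if row["actual"] >= row["budget"]        # 100% of budget reached
        and row["priority"] == "non-critical"    # never touch critical flows
        and row["kill_switch_armed"]             # opt-in per source-client pair
    ]
```

Keeping all three conditions in one predicate means a mistagged critical flow needs two wrong cells, not one, before anything gets switched off.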

Jordan: On Zapier, you use Zapier Manager. It's a built-in app that can turn Zaps on and off. Your kill-switch Zap watches the Rollup tab, loops through a list of non-critical Zap IDs, and calls Zapier Manager to turn each one off. Then it posts to Slack with a re-enable link — a webhook URL that triggers a second Zap to turn everything back on. One click. That's the re-enable.

Jordan: On Make, you hit the Scenarios API. POST to slash api slash v2 slash scenarios slash your scenario ID slash stop. That deactivates the scenario. To re-enable, same endpoint but with slash start. You need a personal access token with scenarios-write scope. I run this from a separate Make scenario — the toggler scenario — so it never accidentally disables itself. Yes, I learned that one the hard way too.
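
In code, the Make toggle is one authenticated POST. A sketch assuming a `eu1.make.com` zone host — swap in your region's base URL — and a personal access token carrying the scenarios:write scope:

```python
from urllib import request

MAKE_BASE = "https://eu1.make.com/api/v2"  # zone-specific host; match your Make region

def scenario_url(scenario_id: int, action: str) -> str:
    """Build the documented toggle URL: POST /scenarios/{id}/start or /stop."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    return f"{MAKE_BASE}/scenarios/{scenario_id}/{action}"

def toggle_scenario(scenario_id: int, token: str, action: str) -> int:
    """Fire the toggle -- the token needs the scenarios:write scope."""
    req = request.Request(scenario_url(scenario_id, action), data=b"", method="POST",
                          headers={"Authorization": f"Token {token}"})
    with request.urlopen(req) as resp:
        return resp.status
```

Run this from the separate toggler scenario Jordan describes (or a tiny cron job) so it can never deactivate itself.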

Jordan: On n8n, the built-in n8n node has Activate and Deactivate workflow operations. There's even an official template — template thirty-two twenty-nine — that demonstrates scheduled activation and deactivation using the native API. Quick note if you're on n8n v2: the UI labels changed to Publish and Unpublish, but the activation endpoints still work through the API and the built-in node. So your kill-switch workflow deactivates the non-critical workflows by ID, posts confirmation to Slack, and provides a webhook-triggered re-enable path. Same pattern as Zapier and Make — just different verbs.

Jordan: Actually — I want to flag something about the n8n endpoint documentation. The built-in node and the official templates clearly support Activate and Deactivate operations, and community implementations confirm the REST endpoints work. But as of right now, n8n doesn't have a single static reference page that enumerates those endpoints the way Make's Scenarios API docs do. So if you're building this on n8n, follow the template and the built-in node operations rather than trying to reverse-engineer endpoint paths from the docs. It works. The documentation just isn't as tidy as Make's.
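
Given that caveat, here's roughly what an n8n toggle looks like through the REST API. The `/workflows/{id}/activate` and `/deactivate` paths and the `X-N8N-API-KEY` header follow n8n's public-API conventions, but treat them as assumptions and verify against your instance before arming anything:

```python
from urllib import request

N8N_BASE = "https://n8n.example.com/api/v1"  # your instance's public-API base URL

def workflow_url(workflow_id: str, action: str) -> str:
    """Build the toggle URL -- paths assumed from n8n's public-API conventions."""
    if action not in ("activate", "deactivate"):
        raise ValueError("action must be 'activate' or 'deactivate'")
    return f"{N8N_BASE}/workflows/{workflow_id}/{action}"

def toggle_workflow(workflow_id: str, api_key: str, action: str) -> int:
    """POST the toggle with the instance API key."""
    req = request.Request(workflow_url(workflow_id, action), data=b"", method="POST",
                          headers={"X-N8N-API-KEY": api_key})
    with request.urlopen(req) as resp:
        return resp.status
```

If you'd rather stay inside n8n, the built-in n8n node's Activate/Deactivate operations do the same thing without hand-building URLs — as the transcript notes, same pattern, different verbs.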

Jordan: So that's the full system. Ledger feeds the Rollup. Rollup triggers alerts at sixty, eighty, a hundred. Throttle rules keep non-critical flows from spiking in the first place. And the kill switch catches anything that gets through the throttle and hits the cap.

Jordan: I want to be honest about one thing, though. This system adds complexity. You now have a spreadsheet to maintain, alert automations to monitor, and kill-switch Zaps or scenarios that could theoretically misfire. If you accidentally put a critical flow on the non-critical list, you could disable something that matters. That's a real risk. Which is why the test plan matters — and I mean actually running it. Add a fake client to your Budgets tab with tiny budgets. Push test rows into the ledger that cross each threshold. Watch the Slack alerts fire. Watch the kill switch toggle your test Zaps off. Click the re-enable link. Confirm everything comes back. Do this before you arm it on a real client. Takes about thirty minutes. Saves you from the kind of mistake that's much harder to explain than a two-hundred-dollar overage.

Jordan: And recalibrate monthly. Token prices change — Gemini started billing for Search Grounding features back in January. Make's credit costs shift when they update AI modules. Your clients' usage patterns drift as they adopt the tools you built for them. The budget numbers in that spreadsheet are not set-and-forget. They're a monthly five-minute review.

Jordan: That two-hundred-and-thirty-dollar overage from April? It would not have happened with this system. The sixty-percent alert would have pinged me mid-month. The throttle would have slowed the enrichment Zap down before it burned through the remaining tasks. And if it somehow still hit the cap, the kill switch would have turned it off — and I would have gotten a Slack message with a one-click re-enable link instead of a surprise on my invoice.

Jordan: That's the difference between monitoring costs and controlling them. Monitoring tells you what happened. Control stops what's about to happen.

Jordan: Here's your one move this week. Open a Google Sheet — or grab the Guardrails Template Pack on the Resources page, which has the ledger, the alert formulas, and the kill-switch recipes for all three platforms ready to go. Pick your highest-spend client. Set their budget. Wire the sixty-percent alert to Slack. Just that. You can add the throttle and the kill switch next week. But the alert alone will change how you think about your margins.

Jordan: I'm Jordan. This is Headcount Zero. Go build the guardrails.

cost control · automation guardrails · Zapier task limits · Make credits · token monitoring · kill switch · throttling · budget alerts · margin protection · n8n rate limits · overage prevention · client cost tracking