Jordan: A research team published a benchmark in April called PIIBench. They ran every major PII detector against a unified corpus — Presidio, cloud services, commercial APIs — and the span-level F1 scores were... not what you'd hope. On the hardest categories, detectors that vendors market as production-ready were missing entities that a human would catch in seconds.
Now hold that result in your head. Because at the same time, OpenAI released their Privacy Filter model — open-weight, self-hostable — and reported state-of-the-art F1 on PII-Masking-300k. Strong results. Genuinely impressive.
So which is it? Are PII detectors reliable or not?
The answer is both. And that's exactly the problem. Because if you're a solo operator running Make scenarios or n8n workflows that touch client data — emails, phone numbers, API keys, health records, whatever — and you're sending that data to an LLM or a third-party API without stripping it first... you are trusting a detection layer you haven't tested, configured by someone who doesn't know your data, running at a confidence threshold you didn't set.
And you have zero proof it's working. No log. No audit trail. Nothing you can show a client who asks, "How do you handle our sensitive data?"
I know. Because that was me seven months ago.
How many of your workflows touch data that could identify a real person — and how many of those workflows strip that data before it leaves your infrastructure? Not "probably handles it." Not "the vendor says they don't store inputs." Actually strips it, logs what it stripped, and gives you a receipt you could hand to a client's compliance team tomorrow.
I'm Jordan. This is Headcount Zero. Today you're getting a PII redaction subflow — detect, mask, log — that drops in before any external API or LLM call in Make or n8n. And the log it produces is the part that changes your business, because it turns invisible security work into something a client can see.
So let me map the actual problem, because most people think about PII redaction wrong. They think the risk is the LLM call itself — the prompt going to OpenAI or Anthropic. And yes, that's a risk. But it's one of at least four places your client's data can leak.
First, the prompt. Obviously. You're sending text to a model, and that text might contain a name, an email, a Social Security number, an API key — anything the client typed or your system ingested.
Second — and this is the one people miss — function call arguments. If your workflow calls a third-party API as part of a chain, the arguments you pass might contain PII that the LLM extracted or reformatted.
Third, the model's output. The response can echo back or even hallucinate PII that wasn't in the original input.
And fourth — this is the one that got me — your logs and traces. If you're using LangSmith, or Make's execution history, or n8n's workflow logs, every input and output is sitting there in plain text. Unmasked. Queryable. Sometimes indefinitely.
LangSmith actually documents this. Their docs show how to use Presidio to mask inputs and outputs before they hit the trace store. Which tells you something — even the observability vendors know that raw data in logs is a liability.
And here's where it gets uncomfortable. You might be thinking, "Well, my LLM provider handles this." And... sort of. Vertex AI has a non-configurable safety filter that blocks some PII by default. But Google's own documentation recommends running Cloud DLP as an additional layer for de-identification. Their words, not mine. The default filter is a floor, not a ceiling.
Amazon Bedrock Guardrails give you more control — configurable PII filters, an ApplyGuardrail API you can even use with external models — but that's a managed service with per-request pricing.
And Anthropic? Anthropic doesn't have a built-in PII toggle at all. They publish a "PII purifier" prompt pattern in their docs, which is basically a system prompt that asks Claude to redact things. That's it. A prompt.
So the vendor landscape is inconsistent. Some platforms block some things by default. Some give you configurable filters. Some give you a prompt template and wish you luck. None of them give you a log of what they caught, what they missed, and what decision they made — which is the thing your client's compliance team actually wants to see.
That's why you build your own subflow. And it's simpler than it sounds.
The pattern has three steps. Detect, mask, log. Every time your workflow is about to make an external call — LLM, API, webhook, whatever — the data passes through this subflow first. The detector scans the text for entities: emails, phone numbers, Social Security numbers, API keys, JWTs, whatever you've configured. The masker replaces or pseudonymizes those entities based on rules you set per entity type. And the logger emits a single JSON object — one per run — that records what was found, what was masked, which detector version ran, and whether the payload was passed through clean or modified.
That log is the whole game. It's the artifact that turns "I handle PII carefully" into "here are four hundred and twelve redaction events from the last ninety days, broken down by entity type, with detector version and decision status on every one." That's the client proof problem solved — not with a trust center page, but with structured evidence.
For the detector, you have two strong self-hosted options right now. Microsoft Presidio — open source, runs in Docker, combines regex patterns with named entity recognition and context words. You can spin up the analyzer and anonymizer containers in about ten minutes. It supports custom recognizers, so you can add patterns for OpenAI API keys, GitHub tokens, AWS access keys — anything with a predictable format. The anonymizer gives you operators per entity type: replace, redact, hash, encrypt. So you can fully mask an email but pseudonymize an IBAN with a salted SHA-256 hash — keeping the structure for downstream logic while stripping the actual value.
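To give you the shape of that in code, here's a minimal Python sketch: Presidio's default recognizers plus one custom pattern for OpenAI-style keys. The regex, the entity name, and the score are my assumptions, so tune them against your own data.

```python
# Minimal Presidio sketch: default recognizers plus a custom pattern
# for OpenAI-style API keys. Regex and score are assumptions; tune them.
from presidio_analyzer import AnalyzerEngine, Pattern, PatternRecognizer
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig

key_pattern = Pattern(name="openai_key", regex=r"\bsk-[A-Za-z0-9]{20,}\b", score=0.8)
key_recognizer = PatternRecognizer(supported_entity="OPENAI_API_KEY",
                                   patterns=[key_pattern])

analyzer = AnalyzerEngine()
analyzer.registry.add_recognizer(key_recognizer)

text = "Reach sarah.jones@acme.com, key sk-EXAMPLEKEY1234567890abcd"
findings = analyzer.analyze(text=text, language="en")

# One operator per entity type: replace emails and keys outright,
# hash anything you want to stay structurally consistent downstream.
anonymizer = AnonymizerEngine()
result = anonymizer.anonymize(
    text=text,
    analyzer_results=findings,
    operators={
        "EMAIL_ADDRESS": OperatorConfig("replace", {"new_value": "[EMAIL]"}),
        "OPENAI_API_KEY": OperatorConfig("replace", {"new_value": "[API_KEY]"}),
        # Note: the built-in hash operator is unsalted; a salted hash
        # needs a custom operator or pre-salting upstream.
        "IBAN_CODE": OperatorConfig("hash", {"hash_type": "sha256"}),
    },
)
print(result.text)  # masked text, safe to send onward
```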
The other option is newer. OpenAI released their Privacy Filter model in April — open-weight, designed for high-throughput workflows. Their model card reports state-of-the-art F1 on PII-Masking-300k, and it covers categories that Presidio's default recognizers don't handle as well out of the box — things like account numbers and secrets. You can self-host it, which means your data never leaves your infrastructure. That matters. Because the whole point of PII redaction is to not send sensitive data to a third party — and if your redaction service is itself a third-party API, you've just moved the problem.
Actually — I should slow down on that, because it's a real decision point. Self-hosted versus managed. Let me walk through the cost side.
If you run Presidio or Privacy Filter locally, your cost is compute. A small Docker instance on a five-dollar-a-month VPS can handle the volume most solo operators generate. No per-request billing. No data egress. Latency is local — roughly twenty to fifty milliseconds per scan depending on text length and recognizer count.
Managed services flip that equation. AWS Comprehend bills PII detection in one-hundred-character units, minimum three units per request. The example pricing on their page shows about a hundredth of a cent per unit for redaction. Sounds tiny until you're processing a few thousand requests a month — then it's real money, and it's money that scales linearly with volume. Google Cloud DLP charges three dollars per gigabyte for inspection and two dollars per gigabyte for transformation, with the first gig free on each. Bedrock Guardrails prices per request — about a tenth of a cent per request after AWS cut pricing by up to eighty-five percent in late twenty twenty-four.
For a solo operator doing, say, five hundred to two thousand LLM calls a month? Self-hosted wins on cost by a wide margin. You're paying five to fifteen dollars a month in compute versus potentially fifty to a hundred in managed API fees. And you're not sending raw data over the network to get it redacted — which, again, defeats the purpose.
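To make that math concrete, here's a back-of-envelope sketch using the example rates I just quoted. The rates and payload sizes are assumptions, so verify against current pricing. The takeaway: the managed cost hinges almost entirely on payload size.

```python
# Back-of-envelope Comprehend math using the example rates quoted above.
# Rates and payload sizes are assumptions; verify against current pricing.
UNIT_CHARS = 100        # PII detection billed in 100-character units
RATE_PER_UNIT = 0.0001  # roughly a hundredth of a cent per unit
MIN_UNITS = 3           # three-unit minimum per request

def monthly_cost(chars_per_call: int, calls_per_month: int) -> float:
    units = max(MIN_UNITS, -(-chars_per_call // UNIT_CHARS))  # ceiling division
    return units * RATE_PER_UNIT * calls_per_month

print(monthly_cost(50_000, 2_000))  # transcript-sized payloads: ~$100/month
print(monthly_cost(1_000, 500))     # short prompts: ~$0.50/month
```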
The managed services make sense at enterprise scale, or when you need a specific compliance certification the managed provider carries. But for our world? Presidio in Docker. Or Privacy Filter on a GPU instance if you want the context-aware detection. That's the move.
Okay, so the actual build. In Make, this is a subflow you duplicate before every external call. First module: an HTTP request to your Presidio analyzer — POST the text, specify the entity types you want detected. Emails, phone numbers, SSNs, IBANs, API keys, GitHub tokens, AWS credentials, JWTs. The analyzer returns a list of recognized entities with types, offsets, and confidence scores.
Second module: another HTTP request to the Presidio anonymizer. You pass the original text plus the analyzer results plus your operator map — which tells it how to handle each entity type. Mask emails fully. Keep the last four digits of a phone number. Replace API keys with a bracketed placeholder. Hash IBANs. Whatever your client's data handling requirements specify.
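Here's what those two HTTP modules actually send, sketched in Python. The URLs and operator choices are placeholders, so point them at wherever your Presidio containers actually run.

```python
# The two Make HTTP modules, as a Python sketch. URLs are assumptions:
# use whatever host and ports you mapped your Presidio containers to.
import requests

ANALYZER_URL = "http://localhost:5002/analyze"      # presidio-analyzer
ANONYMIZER_URL = "http://localhost:5001/anonymize"  # presidio-anonymizer

text = "Call Sarah at 555-0142 about invoice 7."

# Module 1: detect. Returns entity type, start/end offsets, confidence score.
findings = requests.post(ANALYZER_URL, json={
    "text": text,
    "language": "en",
}, timeout=5).json()

# Module 2: mask. The anonymizers map is your per-entity operator config.
masked = requests.post(ANONYMIZER_URL, json={
    "text": text,
    "analyzer_results": findings,
    "anonymizers": {
        # Mask the first four characters, keeping the tail visible.
        "PHONE_NUMBER": {"type": "mask", "masking_char": "*",
                         "chars_to_mask": 4, "from_end": False},
        "PERSON": {"type": "replace", "new_value": "[PERSON]"},
        "DEFAULT": {"type": "replace", "new_value": "[REDACTED]"},
    },
}, timeout=5).json()

print(masked["text"])
```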
Third module: build the redaction log. This is a JSON object — run ID, timestamp, environment, detector name and version, entity counts by type, a sample of masked spans — just the offsets and entity types, never the raw values — and the decision: input masked, pass-through, or fail-closed.
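Here's a minimal sketch of that log object. The field names are my schema, not any standard, so adapt them to whatever your sink expects.

```python
# Minimal redaction log object, one per run. Field names are my own
# schema, not a standard; adapt them to your sink.
import json
import uuid
from datetime import datetime, timezone

redaction_log = {
    "run_id": str(uuid.uuid4()),
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "environment": "production",
    "detector": {"name": "presidio", "version": "2.2.x"},
    "entity_counts": {"EMAIL_ADDRESS": 3, "PHONE_NUMBER": 1, "OPENAI_API_KEY": 1},
    "masked_spans": [
        # Offsets and entity types only. Never the raw values.
        {"entity_type": "EMAIL_ADDRESS", "start": 27, "end": 47},
        {"entity_type": "OPENAI_API_KEY", "start": 102, "end": 153},
    ],
    "decision": "input_masked",  # or "pass_through" / "fail_closed"
}
print(json.dumps(redaction_log))
```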
Fourth module: write that log to your sink. BigQuery, Postgres, S3, Logflare — wherever you store structured events.
And then the router. This is critical. If the detector or anonymizer throws an error — timeout, malformed response, anything — you do not pass the original text through. You fail closed. Queue the payload, fire an alert to Slack or email, and stop the workflow. The raw data does not leave your infrastructure if the redaction layer is down.
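In plain code, that branch looks something like this. The redact, queue_payload, send_alert, and write_log helpers are hypothetical stand-ins for your router branches.

```python
# Fail-closed wrapper, sketched as plain Python. redact, queue_payload,
# send_alert, and write_log are hypothetical helpers standing in for
# the subflow modules and router branches described above.
def redact_or_halt(text: str) -> str:
    try:
        masked, log = redact(text)  # analyzer + anonymizer round trip
    except Exception as exc:        # timeout, malformed response, anything
        queue_payload(text)         # hold it for retry, inside your infra
        send_alert(f"Redaction layer down: {exc}")
        raise RuntimeError("fail closed: raw payload not released")
    write_log(log)
    return masked
```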
That fail-closed branch is non-negotiable. Because here's what PIIBench showed us — detectors miss things. Even good detectors miss things on hard categories. So your design has to assume the detector will occasionally fail, and the failure mode has to be safe. Not "send it anyway and hope." Safe.
In n8n, the same pattern works as a sub-workflow. HTTP Request node for detection, HTTP Request for anonymization, a Function node to build the log object, another HTTP Request to write the log, and a Switch node for routing. You can import the whole thing as a JSON workflow and set environment variables for your Presidio URL, log endpoint, and alert channel.
Now — the honest objection. And it's a real one. If you mask or replace PII before sending text to an LLM, you're changing the input. And changed inputs can degrade the model's reasoning. A prompt that says "Schedule a follow-up with [PERSON] about their account [ACCOUNT_NUMBER]" gives the model less context than the original. In some workflows — summarization, entity extraction, relationship mapping — that loss of context matters.
I've seen this myself. I had a client workflow that extracted action items from meeting transcripts. When I started masking names and emails before the LLM call, the model's ability to attribute action items to specific people dropped noticeably. It would say "someone should follow up" instead of "Sarah should follow up." Which... is less useful.
So here's how you handle it. Pseudonymization instead of blunt masking. Instead of replacing "sarah.jones at acme dot com" with a generic mask, you replace it with a stable placeholder — "PERSON_ONE" or a consistent hash that maps back to the original in your internal system. The LLM sees a consistent identifier. It can still reason about relationships and attribution. But the actual PII never leaves your infrastructure.
Presidio supports this natively — the hash operator with a salted key gives you deterministic pseudonyms. Same input always produces the same hash, so the model sees consistent entities across a conversation. And you can de-pseudonymize on the way back if you need the real values in your final output.
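If you want to see the technique itself, here's a generic sketch of deterministic pseudonymization with a salted HMAC. This is the pattern, not Presidio's specific operator, and the salt handling is an assumption: store it securely and keep the reverse map inside your own infrastructure.

```python
# Deterministic pseudonyms via salted HMAC-SHA256. A generic sketch of
# the technique, not Presidio's hash operator. Salt handling and helper
# names are assumptions.
import hashlib
import hmac

SALT = b"per-client-secret-salt"   # assumption: stored securely, rotated per client
reverse_map: dict[str, str] = {}   # token -> original, kept inside your infra

def pseudonymize(value: str, entity_type: str) -> str:
    digest = hmac.new(SALT, value.encode(), hashlib.sha256).hexdigest()[:10]
    token = f"{entity_type}_{digest}"
    reverse_map[token] = value     # enables de-pseudonymization later
    return token

# Same input, same token, every time. The model can still reason about
# who does what without ever seeing the real identifier.
print(pseudonymize("sarah.jones@acme.com", "PERSON"))
print(pseudonymize("sarah.jones@acme.com", "PERSON"))  # identical token
```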
The other piece is scanning both directions. Don't just redact inputs — scan outputs too. The model might echo back PII from its training data, or reconstruct identifiers from context clues. Run the same subflow on the response before it hits your logs or your client-facing system. Same detector, same log, same fail-closed branch. Two passes per call. The latency cost is minimal — forty to a hundred milliseconds total for both scans on a local Presidio instance.
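Wrapped up, the two-pass pattern is only a few lines. The redact, call_llm, and write_log helpers here are hypothetical stand-ins for the subflow modules above.

```python
# Two passes per call: redact the prompt going out, scan the response
# coming back. redact, call_llm, and write_log are hypothetical helpers
# standing in for the subflow modules described above.
def guarded_llm_call(prompt: str) -> str:
    masked_prompt, log_in = redact(prompt)        # pass 1: pre-call
    response = call_llm(masked_prompt)
    masked_response, log_out = redact(response)   # pass 2: post-call
    write_log([log_in, log_out])                  # both directions, one audit trail
    return masked_response
```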
A twenty twenty-six paper called LLM-Redactor evaluated eight different privacy-preserving techniques for LLM requests and found that layered approaches — combining redaction with rephrasing and local routing — reduced leak rates significantly compared to any single technique alone. So the research backs what the implementation experience suggests: one layer is not enough. But two layers — pre-call redaction plus post-call scanning, with pseudonymization preserving reasoning quality — gets you most of the way there without destroying model performance.
And the log captures all of it. Both passes. Both directions. Entity counts, detector version, decision status. That's your audit trail.
So let me bring this back to where we started. PIIBench showed us that detectors are imperfect. OpenAI's Privacy Filter showed us they're getting better fast. Both things are true. And neither one changes what you need to do — which is stop trusting someone else's filter and build your own subflow that detects, masks, and proves it with a log.
The Starter Redaction Pack is on the episode page — Presidio config with tuned recognizers, a regex fallback library with tests, Make and n8n subflow templates, and the JSON log schema with dashboard tiles. It's the exact setup I run in production. Takes about forty-five minutes to wire up the first time. After that, you're duplicating a subflow — roughly two minutes per workflow.
Here's your one thing for this week. Pick one workflow — just one — that sends client data to an external API or LLM. Drop the redaction subflow in front of it. Run it for a week. Then pull up the log and look at what it caught. I promise you, the first time you see entity counts in that log — three emails masked, one API key caught, zero leaks — you'll understand why this matters more than any vendor checkbox ever could.
I'm Jordan. This is Headcount Zero. Go build something that protects what your clients trusted you with.