Template · May 1, 2026

Starter Redaction Pack: Presidio + Secrets Regex + Redaction Log Schema (Make/n8n‑Ready)

Copy‑ready configs and subflows to detect→mask→log PII and secrets before any external call. Includes Presidio docker + recognizers, a secrets regex library with tests, Make/n8n subflows, and an auditable Redaction Log schema with dashboard tiles.

From Episode: Build a PII Redaction Subflow That Logs Every Detection

Contents

Quick‑start map [copy/paste + fill]
Presidio: docker + tuned recognizers + operator map
Regex fallbacks + tests [drop‑in]
Make subflow [detect→mask→annotate]
n8n subflow [import JSON + set env]
Redaction Log schema + dashboard tiles
Detector swap layer [Privacy Filter / cloud DLP]
Fail‑closed branch [deterministic + alert]
Defaults, validation, and rollout [copy this]

Drop this pack in front of any external API/LLM call to detect and mask PII/secrets, then write a single annotated Redaction Log per run. Pick a detector (local Presidio or OpenAI Privacy Filter), wire the subflow, and ship with audit‑ready logs. Replace anything in [BRACKETS] for your stack.

Quick‑start map [copy/paste + fill]

1) Choose your detector: [presidio|openai_privacy_filter|cloud_service].
2) Set your log sink: [BigQuery|Postgres|S3|Logflare].
3) Wire the subflow before every external call and before traces/metrics.
4) Run the tests, then flip to production with fail‑closed routing.

Config map you’ll reuse across sections:

[WORKSPACE_NAME]: Project/workspace label for logs.
[RUN_ID]: Unique id per request/task.
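[PRESIDIO_URL]: e.g., http://presidio:3000. The analyzer and anonymizer run as separate services, so either front both with one gateway at this URL or keep one URL per service.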
[PRIVACY_FILTER_URL]: If hosting OpenAI Privacy Filter locally.
[ANON_SALT]: Secret salt for hashing/pseudonymization.
[LOG_DESTINATION]: e.g., BigQuery dataset.table or S3 bucket/key prefix.
[ALERT_CHANNEL]: Email/Slack/Webhook for fail‑closed alerts.
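
A minimal sketch of loading this config map from environment variables, so a missing value stops the run early instead of surfacing mid-flow. The variable names mirror the placeholders above; the `RedactionConfig` class and `load_config` helper are illustrative assumptions, not part of the pack.

```python
import os
from dataclasses import dataclass

# Hypothetical env var names mirroring the [BRACKET] placeholders.
REQUIRED = ("WORKSPACE_NAME", "PRESIDIO_URL", "ANON_SALT",
            "LOG_DESTINATION", "ALERT_CHANNEL")


@dataclass(frozen=True)
class RedactionConfig:
    workspace_name: str
    presidio_url: str
    anon_salt: str
    log_destination: str
    alert_channel: str
    privacy_filter_url: str = ""  # optional: only used by that detector


def load_config(env=None) -> RedactionConfig:
    """Read config from a mapping (defaults to os.environ); fail fast on gaps."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing config: {', '.join(missing)}")
    return RedactionConfig(
        workspace_name=env["WORKSPACE_NAME"],
        presidio_url=env["PRESIDIO_URL"].rstrip("/"),  # avoid '//' in paths
        anon_salt=env["ANON_SALT"],
        log_destination=env["LOG_DESTINATION"],
        alert_channel=env["ALERT_CHANNEL"],
        privacy_filter_url=env.get("PRIVACY_FILTER_URL", ""),
    )
```

[RUN_ID] is left out on purpose because it changes per request rather than per environment.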


Presidio: docker + tuned recognizers + operator map

Use the local Presidio stack for zero data egress and predictable cost. This sample stands up Analyzer + Anonymizer and adds tuned recognizers for common secrets.

# docker-compose.yaml (minimal)
version: '3.9'
services:
  presidio-analyzer:
    image: mcr.microsoft.com/presidio-analyzer:latest
    environment:
      - PYTHONIOENCODING=utf-8
    ports: ['3000:3000']
  presidio-anonymizer:
    image: mcr.microsoft.com/presidio-anonymizer:latest
    environment:
      # Assumption: consumed by your own tooling; confirm the image reads it.
      - ANON_SALT=[ANON_SALT]
    # Host 3001 -> container 3000 (the stock images listen on 3000 by default).
    ports: ['3001:3000']

Custom recognizers (augment built-ins like EMAIL_ADDRESS, PHONE_NUMBER, US_SSN):

# recognizers.yaml (Presidio pattern-based)
# Presidio's YAML loader expects a top-level recognizers list and a
# supported_entity on every pattern recognizer.
recognizers:
  - name: 'OPENAI_API_KEY'
    supported_entity: 'OPENAI_API_KEY'
    supported_language: 'en'
    patterns:
      - name: 'sk_prefix'
        regex: 'sk-[A-Za-z0-9]{32,48}'
        score: 0.75
    context: ['openai', 'api', 'key', 'secret']

  - name: 'GITHUB_TOKEN'
    supported_entity: 'GITHUB_TOKEN'
    supported_language: 'en'
    patterns:
      - name: 'ghp_token'
        regex: 'ghp_[A-Za-z0-9]{36}'
        score: 0.75
    context: ['github', 'token']

  - name: 'AWS_ACCESS_KEY_ID'
    supported_entity: 'AWS_ACCESS_KEY_ID'
    supported_language: 'en'
    patterns:
      - name: 'akia_prefix'
        regex: '(AKIA|ASIA)[0-9A-Z]{16}'
        score: 0.7
    context: ['aws', 'access', 'key', 'id']

  - name: 'AWS_SECRET_ACCESS_KEY'
    supported_entity: 'AWS_SECRET_ACCESS_KEY'
    supported_language: 'en'
    patterns:
      - name: '40char_secret'
        regex: '(?i)(aws_?secret_?access_?key)\s*[:=]\s*[A-Za-z0-9/+=]{40}'
        score: 0.8
    context: ['aws', 'secret']

  - name: 'IBAN_CODE'
    supported_entity: 'IBAN_CODE'
    supported_language: 'en'
    patterns:
      - name: 'iban_core'
        regex: '\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b'
        score: 0.65
    context: ['iban', 'bank', 'account']

  - name: 'JWT_TOKEN'
    supported_entity: 'JWT_TOKEN'
    supported_language: 'en'
    patterns:
      - name: 'jwt_like'
        regex: '\beyJ[\w-]*\.[\w-]*\.[\w-]*\b'
        score: 0.7
    context: ['jwt', 'bearer', 'authorization']

Anonymizer operator map (entity → action). Choose mask/hash/replace to balance utility vs privacy.

# anonymizer_map.yaml
# Field names follow this pack's own schema. Presidio's built-in mask operator
# takes masking_char, chars_to_mask (integer), and from_end (boolean), so
# translate 'all' / unmasked_end_chars before calling /anonymize.
# Assumption: the hash entry's key carries the salt; check how your Presidio
# version salts hashes.
operators:
  EMAIL_ADDRESS: { type: 'mask', masking_char: '*', chars_to_mask: 'all' }
  PHONE_NUMBER: { type: 'mask', masking_char: '•', unmasked_end_chars: 2 }
  US_SSN: { type: 'mask', masking_char: 'X', unmasked_end_chars: 4 }
  IBAN_CODE: { type: 'hash', hash_type: 'sha256', key: '[ANON_SALT]' }
  OPENAI_API_KEY: { type: 'replace', new_value: '[API_KEY_MASKED]' }
  GITHUB_TOKEN: { type: 'replace', new_value: '[GITHUB_TOKEN_MASKED]' }
  AWS_ACCESS_KEY_ID: { type: 'replace', new_value: '[AWS_ACCESS_KEY_ID_MASKED]' }
  AWS_SECRET_ACCESS_KEY: { type: 'replace', new_value: '[AWS_SECRET_ACCESS_KEY_MASKED]' }
  JWT_TOKEN: { type: 'replace', new_value: '[JWT_MASKED]' }
  DEFAULT: { type: 'mask', masking_char: '*', chars_to_mask: 'all' }

Minimal call contract (HTTP):

POST [PRESIDIO_URL]/analyze
body: { "text": "[RAW_TEXT]", "language": "en", "entities": ["EMAIL_ADDRESS","PHONE_NUMBER","US_SSN","IBAN_CODE","OPENAI_API_KEY","GITHUB_TOKEN","AWS_ACCESS_KEY_ID","AWS_SECRET_ACCESS_KEY","JWT_TOKEN"] }

POST [PRESIDIO_URL]/anonymize
body: { "text": "[RAW_TEXT]", "analyzer_results": [..from analyze..], "anonymizers": { ..from anonymizer_map.. } }
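
The two calls above can be sketched in Python with only the standard library. This is a rough illustration, not the pack's client: the helper names (`analyze_payload`, `anonymize_payload`, `post_json`, `redact`) are made up for the example, it assumes the stock Presidio REST field name `anonymizers` for the operator map, and it takes separate analyzer and anonymizer URLs because the two services run in separate containers.

```python
import json
import urllib.request

ENTITIES = [
    "EMAIL_ADDRESS", "PHONE_NUMBER", "US_SSN", "IBAN_CODE", "OPENAI_API_KEY",
    "GITHUB_TOKEN", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "JWT_TOKEN",
]


def analyze_payload(text, language="en", entities=ENTITIES):
    """Body for POST /analyze."""
    return {"text": text, "language": language, "entities": list(entities)}


def anonymize_payload(text, analyzer_results, operators):
    """Body for POST /anonymize; the operator map goes under 'anonymizers'."""
    return {"text": text, "analyzer_results": analyzer_results,
            "anonymizers": operators}


def post_json(url, payload, timeout=5.0):
    """POST a JSON body and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def redact(text, analyzer_url, anonymizer_url, operators):
    """Detect, then mask. Any exception should send the caller down fail-closed."""
    results = post_json(f"{analyzer_url}/analyze", analyze_payload(text))
    masked = post_json(f"{anonymizer_url}/anonymize",
                       anonymize_payload(text, results, operators))
    return results, masked["text"]
```

The short timeout is deliberate: a slow detector should trip the fail‑closed branch rather than stall the external call.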

Notes:

Keep [ANON_SALT] secret; rotate it per environment.
Add domain dictionaries (e.g., known client IDs) via deny/allow lists to cut false positives.
Image/PDF pipelines: Presidio has image/DICOM support; run those redactors at ingest, then apply text redaction here.
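
The deny/allow list note can be approximated client-side, without touching the Presidio configuration. The sketch below assumes analyzer results shaped like Presidio's response (`entity_type`, `start`, `end`, `score`); the function names `drop_allowed` and `deny_list_hits` are hypothetical.

```python
import re


def drop_allowed(text, results, allow_list):
    """Remove detections whose matched text is on the allow list (case-insensitive)."""
    allowed = {item.lower() for item in allow_list}
    return [r for r in results
            if text[r["start"]:r["end"]].lower() not in allowed]


def deny_list_hits(text, terms, entity_type):
    """Emit a detection for every exact occurrence of a known sensitive term."""
    hits = []
    for term in terms:
        if not term:
            continue  # an empty term would match at every position
        for match in re.finditer(re.escape(term), text):
            hits.append({"entity_type": entity_type, "start": match.start(),
                         "end": match.end(), "score": 1.0})
    return hits
```

Run `drop_allowed` after /analyze and before /anonymize so allow-listed values such as a public support address stay readable.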


Regex fallbacks + tests [drop‑in]

Use these when the detector is offline or as simple pre‑filters. A small test harness follows the pattern list. Keep the patterns conservative to avoid over‑masking.

# secrets_piiregex.yaml
# PHONE_E164 uses lookarounds so digit runs inside longer tokens (e.g., API keys) do not match.
EMAIL:        '(?i)\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b'
PHONE_E164:   '(?<!\w)\+?[1-9]\d{8,14}(?!\w)'
US_SSN:       '\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b'
IBAN_SIMPLE:  '\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b'
OPENAI_KEY:   'sk-[A-Za-z0-9]{32,48}'
GITHUB_TOKEN: 'ghp_[A-Za-z0-9]{36}'
AWS_AKID:     '(AKIA|ASIA)[0-9A-Z]{16}'
AWS_SAK:      '(?i)(aws_?secret_?access_?key)\s*[:=]\s*[A-Za-z0-9/+=]{40}'
JWT:          '\beyJ[\w-]*\.[\w-]*\.[\w-]*\b'

Tiny test harness (Python + pytest):

# test_redaction_regex.py
import re

import pytest
import yaml

# Load the pattern library once for all tests.
with open('secrets_piiregex.yaml') as f:
    RX = yaml.safe_load(f)

CASES = {
    'EMAIL': 'Contact me at a.ops+dev@example.io today',
    'OPENAI_KEY': 'token sk-1234567890ABCDEFGHIJKLMNOPQRSTUV',
    'AWS_AKID': 'env AKIAABCDEFGHIJKLMNOP',
    'US_SSN': 'holder 123-45-6789',
}

@pytest.mark.parametrize('label,sample', list(CASES.items()))
def test_pattern_matches(label, sample):
    # Patterns carry their own (?i) where needed, so no global flags.
    assert re.search(RX[label], sample), f'{label} did not match'

Fallback masking (Python):

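# naive_mask.py
import re, yaml
rx = yaml.safe_load(open('secrets_piiregex.yaml'))
text = open('[INPUT_FILE]').read()
# Mask secrets before PII so PHONE_E164 cannot consume digit runs inside keys.
SECRETS_FIRST = ['AWS_SAK', 'OPENAI_KEY', 'GITHUB_TOKEN', 'AWS_AKID', 'JWT']
for label in SECRETS_FIRST + [k for k in rx if k not in SECRETS_FIRST]:
    # No global re.I: patterns that need it carry an inline (?i).
    text = re.sub(rx[label], f'[{label}_MASKED]', text)
with open('[OUTPUT_FILE]', 'w') as f:
    f.write(text)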

Tip: run regex pre‑filters before model detection to trim obvious secrets and reduce false positives downstream.


Make subflow [detect→mask→annotate]

This template routes: Detect → Anonymize → Emit Redaction Log → Route. Duplicate the subflow before every external call and before trace/log export.

Node list and settings:

HTTP: Detect

Name: 'Detect PII (Presidio)'
URL: [PRESIDIO_URL]/analyze
Method: POST
Body (JSON): { "text": "{{1.input_text}}", "language": "en", "entities": ["EMAIL_ADDRESS","PHONE_NUMBER","US_SSN","IBAN_CODE","OPENAI_API_KEY","GITHUB_TOKEN","AWS_ACCESS_KEY_ID","AWS_SECRET_ACCESS_KEY","JWT_TOKEN"] }

HTTP: Anonymize

Name: 'Anonymize (Presidio)'
URL: [PRESIDIO_URL]/anonymize
Method: POST
Body (JSON): { "text": "{{1.input_text}}", "analyzer_results": {{1.body}}, "anonymizers": {{2.anonymizer_map}} }

JSON: Build Redaction Log

Name: 'Redaction Log'
Template object: see "Redaction Log schema" section; fill [RUN_ID],[ENV],[WORKSPACE_NAME].

Data store/HTTP: Write Log

Destination: [LOG_DESTINATION]

Router: Decisions

If detection/anonymize error → 'Fail‑Closed' path: Queue payload + POST to [ALERT_CHANNEL].
Else if entity_counts.total > 0 → Proceed with {{masked_text}}.
Else → Pass original input.
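
The router's three branches can be written as one small pure function, which makes the decision easy to unit test outside Make. A minimal sketch; the `route` function and its argument names are illustrative, and the decision labels match the Redaction Log schema.

```python
def route(detect_ok, anonymize_ok, entity_counts, original_text, masked_text):
    """Return the decision label and the text (if any) that may leave the system."""
    if not (detect_ok and anonymize_ok):
        # Fail closed: nothing is forwarded; the caller queues and alerts.
        return {"decision": "fail_closed", "forward": None}
    if entity_counts.get("total", 0) > 0:
        return {"decision": "input_masked", "forward": masked_text}
    return {"decision": "pass_through", "forward": original_text}
```

Checking the error flags first matters: a failed detection would otherwise look like zero entities and pass raw input through.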

Export/Import tip:

Save this as a Make subflow and call it as the first module inside any scenario making external calls. Keep [ANON_SALT] and [PRESIDIO_URL] in Make variables per environment.


n8n subflow [import JSON + set env]

Import this minimal workflow, connect credentials, and set environment variables. Place an Execute Workflow node that calls this subflow before any external request or trace export.

{
  "name": "Redaction Subflow",
  "nodes": [
    {
      "id": "DetectPII",
      "name": "Detect PII (Presidio)",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "url": "[PRESIDIO_URL]/analyze",
        "method": "POST",
        "jsonParameters": true,
        "options": {},
        "bodyParametersJson": "={\n  \"text\": {{ JSON.stringify($json.input_text) }},\n  \"language\": \"en\",\n  \"entities\": [\"EMAIL_ADDRESS\",\"PHONE_NUMBER\",\"US_SSN\",\"IBAN_CODE\",\"OPENAI_API_KEY\",\"GITHUB_TOKEN\",\"AWS_ACCESS_KEY_ID\",\"AWS_SECRET_ACCESS_KEY\",\"JWT_TOKEN\"]\n}"
      }
    },
    {
      "id": "Anonymize",
      "name": "Anonymize (Presidio)",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "url": "[PRESIDIO_URL]/anonymize",
        "method": "POST",
        "jsonParameters": true,
        "bodyParametersJson": "={\n  \"text\": {{ JSON.stringify($json.input_text) }},\n  \"analyzer_results\": {{ JSON.stringify($('Detect PII (Presidio)').item.json.body) }},\n  \"anonymizers\": {{ JSON.stringify($json.anonymizer_map) }}\n}"
      }
    },
    {
      "id": "BuildLog",
      "name": "Build Redaction Log",
      "type": "n8n-nodes-base.function",
      "parameters": {
        "functionCode": "const det = $items('Detect PII (Presidio)')[0].json.body || [];\nconst masked = $items('Anonymize (Presidio)')[0].json.text || '';\nconst counts = det.reduce((m,e)=>{m[e.entity_type]=(m[e.entity_type]||0)+1; m.total=(m.total||0)+1; return m;},{});\nreturn [{ json: {\n  run_id: $json.RUN_ID, env: '[ENV]', workspace: '[WORKSPACE_NAME]',\n  detector: { name: 'presidio', version: '[DET_VERSION]' },\n  entity_counts: counts,\n  sample_masked_spans: det.slice(0,5).map(e=>({ entity_type: e.entity_type, start: e.start, end: e.end })),\n  decision: counts.total>0 ? 'input_masked' : 'pass_through',\n  input_bytes: Buffer.from($json.input_text||'').length,\n  output_bytes: Buffer.from(masked||'').length,\n  latency_ms: { detect: 0, anonymize: 0 },\n  ts: new Date().toISOString()\n}, paired: { masked_text: masked } }];"
      }
    },
    {
      "id": "WriteLog",
      "name": "Write Log",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": { "url": "[LOG_ENDPOINT]", "method": "POST", "jsonParameters": true, "bodyParametersJson": "={{ JSON.stringify($json) }}" }
    },
    {
      "id": "Route",
      "name": "Router",
      "type": "n8n-nodes-base.switch",
      "parameters": { "property": "={{$json.entity_counts.total || 0}}", "rules": { "rules": [ { "operation": "larger", "value": 0 } ] } }
    }
  ],
  "connections": { "Detect PII (Presidio)": { "main": [[{"node":"Anonymize (Presidio)","type":"main","index":0}]] }, "Anonymize (Presidio)": { "main": [[{"node":"Build Redaction Log","type":"main","index":0}]] }, "Build Redaction Log": { "main": [[{"node":"Write Log","type":"main","index":0}, {"node":"Router","type":"main","index":0}]] } }
}

Notes:

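Replace [LOG_ENDPOINT] with your sink (HTTP collector, webhook, etc.).
Add an Error Trigger node to route exceptions to [ALERT_CHANNEL] and store the raw payload in a quarantine queue.
The expressions read the analyzer response from json.body, which assumes the HTTP Request node returns the full response; if your node version returns only the parsed body, drop the .body segment.
After the Detect node, $json holds the analyzer response, so point the input_text, RUN_ID, and anonymizer_map references at whichever upstream node actually carries them. Parameter names also differ between HTTP Request node versions, so confirm method and body in the node UI after import.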


Redaction Log schema + dashboard tiles

Emit one JSON object per run. Keep it small, consistent, and query‑friendly.

Schema (conceptual):

{
  "run_id": "[RUN_ID]",
  "ts": "2026-05-01T12:00:00Z",
  "env": "[ENV]",
  "workspace": "[WORKSPACE_NAME]",
  "detector": { "name": "presidio|privacy_filter|cloud", "version": "[DET_VERSION]" },
  "entity_counts": { "EMAIL_ADDRESS": 2, "OPENAI_API_KEY": 1, "total": 3 },
  "sample_masked_spans": [
    { "entity_type": "EMAIL_ADDRESS", "start": 92, "end": 107 },
    { "entity_type": "OPENAI_API_KEY", "start": 144, "end": 185 }
  ],
  "decision": "input_masked|pass_through|fail_closed",
  "input_bytes": 1234,
  "output_bytes": 1210,
  "latency_ms": { "detect": 38, "anonymize": 12 },
  "cost_estimate_usd": 0.0000,
  "source": { "service": "[SERVICE_NAME]", "operation": "[OP_NAME]" }
}
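
One way to assemble a record in this shape, sketched in Python. The `build_redaction_log` helper and its parameters are assumptions for illustration; it expects detections as dicts with `entity_type`, `start`, and `end`, and it copies only those three fields into the span samples so matched text never reaches the log.

```python
from collections import Counter
from datetime import datetime, timezone

MAX_SPANS = 5  # cap on stored span examples


def build_redaction_log(run_id, env, workspace, detector, version,
                        entities, input_text, masked_text, latency_ms,
                        decision=None, cost_estimate_usd=0.0, source=None):
    """Build one Redaction Log record; offsets only, never raw values."""
    counts = Counter(e["entity_type"] for e in entities)
    entity_counts = dict(counts)
    entity_counts["total"] = sum(counts.values())
    if decision is None:
        decision = "input_masked" if entity_counts["total"] else "pass_through"
    return {
        "run_id": run_id,
        "ts": datetime.now(timezone.utc).isoformat(),
        "env": env,
        "workspace": workspace,
        "detector": {"name": detector, "version": version},
        "entity_counts": entity_counts,
        "sample_masked_spans": [
            {"entity_type": e["entity_type"], "start": e["start"], "end": e["end"]}
            for e in entities[:MAX_SPANS]
        ],
        "decision": decision,
        "input_bytes": len(input_text.encode("utf-8")),
        "output_bytes": len(masked_text.encode("utf-8")),
        "latency_ms": dict(latency_ms),
        "cost_estimate_usd": cost_estimate_usd,
        "source": source or {},
    }
```

Pass `decision="fail_closed"` explicitly from the error branch, since that outcome cannot be inferred from the counts.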

Example tiles (adapt to your BI tool):

Entity counts by type (7d): group by entity_type (explode entity_counts) and sum values.
Mask vs pass decisions: count by decision per day.
Detector/version: group by detector.name, detector.version to track rollouts.
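
Before wiring a BI tool, the same tiles can be prototyped over exported log records in plain Python. A rough sketch assuming records in the schema above; the function names are hypothetical, and the `total` key is skipped so per-type sums are not double counted.

```python
from collections import Counter, defaultdict


def entity_totals(logs):
    """Tile 1: total detections per entity type, excluding the 'total' rollup."""
    totals = Counter()
    for record in logs:
        for entity_type, count in record.get("entity_counts", {}).items():
            if entity_type != "total":
                totals[entity_type] += count
    return totals


def decisions_per_day(logs):
    """Tile 2: decision counts keyed by the date part of the ISO timestamp."""
    days = defaultdict(Counter)
    for record in logs:
        days[record["ts"][:10]][record["decision"]] += 1
    return dict(days)


def runs_per_detector(logs):
    """Tile 3: run counts per (detector name, version) to track rollouts."""
    return Counter((r["detector"]["name"], r["detector"]["version"]) for r in logs)
```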

BigQuery helper views (pseudo‑SQL):

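-- explode entity_counts (pseudo-SQL: JSON_EXTRACT_KEYS stands in for your
-- warehouse's key-listing function; adapt the JSON path handling to match)
SELECT run_id, ts, env, detector.name AS detector, detector.version AS version,
       key AS entity_type, value AS cnt
FROM `[LOG_DATASET].[TABLE]`, UNNEST(JSON_EXTRACT_KEYS(entity_counts)) AS key,
UNNEST([STRUCT(CAST(JSON_VALUE(JSON_EXTRACT(entity_counts, CONCAT('$.', key))) AS INT64) AS value)])
WHERE key != 'total';  -- skip the rollup so per-type sums are not double counted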

Tip: store 2–5 masked span examples max; never store raw values.


Detector swap layer [Privacy Filter / cloud DLP]

Switch detectors without changing the rest of the subflow.

Option A — OpenAI Privacy Filter (self‑hosted):

Endpoint: [PRIVACY_FILTER_URL]/filter
Request: { "text": "[RAW_TEXT]" }
Response: { "masked_text": "...", "entities": [{"type":"email","start":..,"end":..}], "version": "[DET_VERSION]" }
Map entities to the Redaction Log; use masked_text downstream.

Option B — Cloud DLP/PII service (managed):

Wrap a thin adapter that normalizes results to { masked_text, entities[], version }.
Expect higher latency + per‑request cost; set cost_estimate_usd in the log from headers/bytes.

Keep the operator map consistent so your downstream behavior doesn’t change as you swap detectors.
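
A thin adapter layer along these lines keeps the rest of the subflow detector-agnostic. The sketch below is illustrative only: the function names and the `TYPE_MAP` entries are assumptions, the Presidio side assumes the /anonymize response exposes the masked string under `text`, and the Privacy Filter side assumes the response shape listed under Option A.

```python
# Hypothetical mapping from a detector's own labels to the entity names
# used in the operator map and the Redaction Log.
TYPE_MAP = {"email": "EMAIL_ADDRESS", "phone": "PHONE_NUMBER"}


def from_presidio(analyzer_results, anonymize_response, version):
    """Normalize Presidio output to { masked_text, entities[], version }."""
    return {
        "masked_text": anonymize_response["text"],
        "entities": [
            {"type": r["entity_type"], "start": r["start"], "end": r["end"]}
            for r in analyzer_results
        ],
        "version": version,
    }


def from_privacy_filter(response):
    """Normalize the Option A response; unknown labels are upper-cased as-is."""
    return {
        "masked_text": response["masked_text"],
        "entities": [
            {"type": TYPE_MAP.get(e["type"], e["type"].upper()),
             "start": e["start"], "end": e["end"]}
            for e in response.get("entities", [])
        ],
        "version": response.get("version", "unknown"),
    }
```

Because both adapters return the same three keys, the log builder and router never need to know which detector ran.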


Fail‑closed branch [deterministic + alert]

Use this when redaction fails or the detector times out.

Pseudocode:

try {
  detect(); anonymize(); write_log('input_masked'|'pass_through'); forward(masked_or_raw);
} catch (e) {
  queue(original_payload, reason=e.message); write_log('fail_closed'); alert([ALERT_CHANNEL]);
}
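
The pseudocode translates to Python roughly as follows. This is a sketch under assumptions: every collaborator (`detect`, `anonymize`, `write_log`, `forward`, `queue`, `alert`) is injected as a callable, and all of those names are placeholders for your own integrations. One deliberate difference from the pseudocode is that `forward` runs outside the `try`, so a downstream API failure is not recorded as a redaction failure.

```python
def run_fail_closed(payload, detect, anonymize, write_log, forward, queue, alert):
    """Redact then forward; on any redaction error, quarantine and alert instead."""
    try:
        entities = detect(payload)
        masked = anonymize(payload, entities)
    except Exception as exc:  # timeouts, HTTP errors, malformed responses
        reason = str(exc)
        queue(payload, reason=reason)  # quarantine the original payload
        write_log("fail_closed")
        alert(reason)
        return None  # nothing leaves the system
    decision = "input_masked" if entities else "pass_through"
    write_log(decision)
    return forward(masked if entities else payload)
```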

Checklist:

Queue: [S3 bucket|KV|DB table] named [WORKSPACE_NAME]-quarantine-[ENV].
Alert: POST to [ALERT_CHANNEL] with [RUN_ID], reason, and a link to the quarantined item.
Auto‑retry: backoff with jitter; max [N] attempts.
Circuit‑breaker: if fail‑closed rate > [THRESHOLD]% over [WINDOW] mins, disable non‑essential external calls and surface a status banner to ops.
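
The retry and circuit-breaker items in the checklist can be sketched as below. Treat it as one possible shape rather than the pack's implementation: `backoff_delays` uses the common full-jitter scheme, `CircuitBreaker` tracks outcomes in a sliding time window, and all names and default values are assumptions. The window is expressed in seconds here, so convert [WINDOW] minutes before passing it in.

```python
import random
import time
from collections import deque


def backoff_delays(max_attempts, base=0.5, cap=30.0, rng=random.random):
    """Full-jitter waits between attempts: rng() * min(cap, base * 2**i)."""
    return [rng() * min(cap, base * (2 ** i)) for i in range(max_attempts - 1)]


class CircuitBreaker:
    """Opens when the fail-closed rate in the window exceeds threshold_pct."""

    def __init__(self, threshold_pct, window_secs, clock=time.monotonic):
        self.threshold_pct = threshold_pct
        self.window_secs = window_secs
        self.clock = clock
        self.events = deque()  # (timestamp, failed) pairs, oldest first

    def _trim(self, now):
        # Drop outcomes that have aged out of the window.
        while self.events and now - self.events[0][0] > self.window_secs:
            self.events.popleft()

    def record(self, failed):
        now = self.clock()
        self.events.append((now, bool(failed)))
        self._trim(now)

    def is_open(self):
        self._trim(self.clock())
        if not self.events:
            return False
        failures = sum(1 for _, failed in self.events if failed)
        return 100.0 * failures / len(self.events) > self.threshold_pct
```

Call `record(True)` from the fail‑closed branch and `record(False)` on success, then check `is_open()` before non-essential external calls.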


Defaults, validation, and rollout [copy this]

Fast defaults for a solo operator:

Language: 'en' (extend later per client).
Entities: email, phone, SSN, IBAN, API keys/tokens (OpenAI, GitHub, AWS), JWT.
Operators: mask secrets fully; pseudonymize IBAN with SHA‑256 + [ANON_SALT]; keep last 2–4 digits for phone/SSN.
Placement: pre‑call redaction subflow + post‑call output scan (reuse same subflow).
Logs: one entry per run, no raw values, include detector name/version, entity_counts, decision, latency, and cost.
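
For the regex-only fallback path, the operator defaults above can be approximated with two small helpers. Both are illustrative assumptions rather than Presidio operators: `mask_keep_last` masks letters and digits but leaves separators in place, and `pseudonymize` uses HMAC‑SHA‑256 keyed with [ANON_SALT], a keyed variant of the salted SHA‑256 described above, so equal inputs map to equal tokens within one environment.

```python
import hashlib
import hmac


def mask_keep_last(value, keep, masking_char="X"):
    """Mask all letters/digits except the last `keep`; separators stay as-is."""
    to_mask = max(sum(c.isalnum() for c in value) - keep, 0)
    out = []
    for c in value:
        if c.isalnum() and to_mask > 0:
            out.append(masking_char)
            to_mask -= 1
        else:
            out.append(c)
    return "".join(out)


def pseudonymize(value, salt):
    """Stable token for a value: hex HMAC-SHA-256 keyed with the salt."""
    return hmac.new(salt.encode("utf-8"), value.encode("utf-8"),
                    hashlib.sha256).hexdigest()
```

For example, `mask_keep_last("123-45-6789", 4)` yields `XXX-XX-6789`. Rotating the salt changes every token, so plan rotations around any joins that rely on stable pseudonyms.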

Validation sample:

"Email jordan.ops@example.io and use sk-1234567890ABCDEFGHIJKLMNOPQRSTUV. AWS key AKIAABCDEFGHIJKLMNOP. SSN 123-45-6789."
→ "Email ********************* and use [API_KEY_MASKED]. AWS key [AWS_ACCESS_KEY_ID_MASKED]. SSN XXX-XX-6789."

Rollout plan:

Dev: run regex‑only prefilter + Presidio; verify logs.
Stage: dual‑scan inputs/outputs; measure mask rate and latency budget.
Prod: enable fail‑closed; alert on any detector errors; review dashboards weekly.
