
Starter Redaction Pack: Presidio + Secrets Regex + Redaction Log Schema (Make/n8n‑Ready)

Copy‑ready configs and subflows to detect→mask→log PII and secrets before any external call. Includes Presidio docker + recognizers, a secrets regex library with tests, Make/n8n subflows, and an auditable Redaction Log schema with dashboard tiles.

Drop this pack in front of any external API/LLM call to detect and mask PII/secrets, then write a single annotated Redaction Log per run. Pick a detector (local Presidio or OpenAI Privacy Filter), wire the subflow, and ship with audit‑ready logs. Replace anything in [BRACKETS] for your stack.

Quick‑start map [copy/paste + fill]

  1. Choose your detector: [presidio|openai_privacy_filter|cloud_service].
  2. Set your log sink: [BigQuery|Postgres|S3|Logflare].
  3. Wire the subflow before every external call and before traces/metrics.
  4. Run the tests, then flip to production with fail‑closed routing.

Config map you’ll reuse across sections:

  • [WORKSPACE_NAME]: Project/workspace label for logs.
  • [RUN_ID]: Unique id per request/task.
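  • [PRESIDIO_URL]: Analyzer base URL, e.g., http://presidio-analyzer:3000.
  • [PRESIDIO_ANONYMIZER_URL]: Anonymizer base URL (a separate service from the analyzer), e.g., http://presidio-anonymizer:3000.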
  • [PRIVACY_FILTER_URL]: If hosting OpenAI Privacy Filter locally.
  • [ANON_SALT]: Secret salt for hashing/pseudonymization.
  • [LOG_DESTINATION]: e.g., BigQuery dataset.table or S3 bucket/key prefix.
  • [ALERT_CHANNEL]: Email/Slack/Webhook for fail‑closed alerts.
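
Before shipping, it can help to confirm that no placeholder from the config map was left unfilled. The following is a minimal sketch, assuming placeholders are upper-case names in square brackets; the function name `unfilled_placeholders` is hypothetical, and mask tokens such as [API_KEY_MASKED] are skipped because they are meant to stay in the operator map.

```python
import re

# Assumption: placeholders look like [PRESIDIO_URL] (upper-case, digits, underscores).
PLACEHOLDER = re.compile(r"\[([A-Z][A-Z0-9_]*)\]")


def unfilled_placeholders(config_text: str) -> list[str]:
    """Return the sorted, de-duplicated placeholder names still present."""
    names = {m.group(1) for m in PLACEHOLDER.finditer(config_text)}
    # Tokens ending in _MASKED are replacement values, not placeholders.
    return sorted(n for n in names if not n.endswith("_MASKED"))
```

Run it over each rendered config file and refuse to deploy if the returned list is not empty.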

Presidio: docker + tuned recognizers + operator map

Use the local Presidio stack for zero data egress and predictable cost. This sample stands up Analyzer + Anonymizer and adds tuned recognizers for common secrets.

# docker-compose.yaml (minimal)
version: '3.9'
services:
  presidio-analyzer:
    image: mcr.microsoft.com/presidio-analyzer:latest
    environment:
      - PYTHONIOENCODING=utf-8
    ports: ['3000:3000']
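  presidio-anonymizer:
    image: mcr.microsoft.com/presidio-anonymizer:latest
    environment:
      - ANON_SALT=[ANON_SALT]
    # The Presidio images listen on 3000 inside the container; publish the anonymizer on host port 3001.
    ports: ['3001:3000']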

Custom recognizers (augment built-ins like EMAIL_ADDRESS, PHONE_NUMBER, US_SSN):

# recognizers.yaml (Presidio pattern-based)
# supported_entity is the entity name you request in /analyze and map in the operator map.
recognizers:
  - name: 'OPENAI_API_KEY'
    supported_entity: 'OPENAI_API_KEY'
    supported_language: 'en'
    patterns:
      - name: 'sk_prefix'
        regex: 'sk-[A-Za-z0-9]{32,48}'
        score: 0.75
    context: ['openai', 'api', 'key', 'secret']

  - name: 'GITHUB_TOKEN'
    supported_entity: 'GITHUB_TOKEN'
    supported_language: 'en'
    patterns:
      - name: 'ghp_token'
        regex: 'ghp_[A-Za-z0-9]{36}'
        score: 0.75
    context: ['github', 'token']

  - name: 'AWS_ACCESS_KEY_ID'
    supported_entity: 'AWS_ACCESS_KEY_ID'
    supported_language: 'en'
    patterns:
      - name: 'akia_prefix'
        regex: '(AKIA|ASIA)[0-9A-Z]{16}'
        score: 0.7
    context: ['aws', 'access', 'key', 'id']

  - name: 'AWS_SECRET_ACCESS_KEY'
    supported_entity: 'AWS_SECRET_ACCESS_KEY'
    supported_language: 'en'
    patterns:
      - name: '40char_secret'
        regex: '(?i)(aws_?secret_?access_?key)\s*[:=]\s*[A-Za-z0-9/+=]{40}'
        score: 0.8
    context: ['aws', 'secret']

  # Presidio already ships an IBAN_CODE recognizer; keep this entry only if you want the looser pattern.
  - name: 'IBAN_CODE'
    supported_entity: 'IBAN_CODE'
    supported_language: 'en'
    patterns:
      - name: 'iban_core'
        regex: '\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b'
        score: 0.65
    context: ['iban', 'bank', 'account']

  - name: 'JWT_TOKEN'
    supported_entity: 'JWT_TOKEN'
    supported_language: 'en'
    patterns:
      - name: 'jwt_like'
        regex: '\beyJ[\w-]*\.[\w-]*\.[\w-]*\b'
        score: 0.7
    context: ['jwt', 'bearer', 'authorization']

Anonymizer operator map (entity → action). Choose mask/hash/replace to balance utility vs privacy.

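# anonymizer_map.yaml
# Presidio's mask operator takes an integer chars_to_mask plus from_end, and it masks separators too.
operators:
  EMAIL_ADDRESS: { type: 'mask', masking_char: '*', chars_to_mask: 256, from_end: false }  # 256 exceeds any address, so the whole value is masked
  PHONE_NUMBER: { type: 'mask', masking_char: '•', chars_to_mask: 10, from_end: false }    # leaves the last 2 chars of a 12-char number such as +14155550123; tune per format
  US_SSN: { type: 'mask', masking_char: 'X', chars_to_mask: 7, from_end: false }           # 123-45-6789 becomes XXXXXXX6789
  # The documented hash parameter is hash_type; check your Presidio version for salt support before relying on [ANON_SALT] here.
  IBAN_CODE: { type: 'hash', hash_type: 'sha256' }
  OPENAI_API_KEY: { type: 'replace', new_value: '[API_KEY_MASKED]' }
  GITHUB_TOKEN: { type: 'replace', new_value: '[GITHUB_TOKEN_MASKED]' }
  AWS_ACCESS_KEY_ID: { type: 'replace', new_value: '[AWS_ACCESS_KEY_ID_MASKED]' }
  AWS_SECRET_ACCESS_KEY: { type: 'replace', new_value: '[AWS_SECRET_ACCESS_KEY_MASKED]' }
  JWT_TOKEN: { type: 'replace', new_value: '[JWT_MASKED]' }
  DEFAULT: { type: 'mask', masking_char: '*', chars_to_mask: 256, from_end: false }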

Minimal call contract (HTTP):

POST [PRESIDIO_URL]/analyze
body: { "text": "[RAW_TEXT]", "language": "en", "entities": ["EMAIL_ADDRESS","PHONE_NUMBER","US_SSN","IBAN_CODE","OPENAI_API_KEY","GITHUB_TOKEN","AWS_ACCESS_KEY_ID","AWS_SECRET_ACCESS_KEY","JWT_TOKEN"] }

POST [PRESIDIO_ANONYMIZER_URL]/anonymize   (the anonymizer is a separate service; host port 3001 in the compose file above)
body: { "text": "[RAW_TEXT]", "analyzer_results": [..from analyze..], "anonymizers": { ..operators from anonymizer_map.. } }
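
The two-step contract can be sketched in Python with only the standard library. This is a minimal sketch, not the pack's official client: the function names and the `analyzer_url` / `anonymizer_url` parameters are hypothetical, the operators are sent under the `anonymizers` body key used by the Presidio anonymizer REST API, and the response is assumed to carry the redacted text in a `text` field.

```python
import json
from urllib import request


def analyze_payload(text, entities, language="en"):
    """Body for POST <analyzer>/analyze."""
    return {"text": text, "language": language, "entities": list(entities)}


def anonymize_payload(text, analyzer_results, operators):
    """Body for POST <anonymizer>/anonymize; operators is the entity -> action map."""
    return {"text": text, "analyzer_results": analyzer_results, "anonymizers": operators}


def post_json(url, payload, timeout=5.0):
    """POST a JSON body and decode the JSON response; raises on HTTP or network errors."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def redact(text, analyzer_url, anonymizer_url, entities, operators):
    """Detect, then mask. Returns (masked_text, findings); never logs the raw text."""
    findings = post_json(f"{analyzer_url}/analyze", analyze_payload(text, entities))
    result = post_json(
        f"{anonymizer_url}/anonymize", anonymize_payload(text, findings, operators)
    )
    return result["text"], findings
```

Any exception raised by `post_json` should be treated as a redaction failure and sent down the fail-closed path rather than retried silently.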

Notes:

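  • Keep [ANON_SALT] secret and use a different value per environment; changing it also changes every hash, so old and new pseudonyms will no longer match.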
  • Add domain dictionaries (e.g., known client IDs) via deny/allow lists to cut false positives.
  • Image/PDF pipelines: Presidio has image/DICOM support—run those at ingest, then text redaction here.
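
If you need salted pseudonyms outside the anonymizer (for example in the regex fallback path), a keyed hash is one option. The sketch below swaps plain SHA‑256 for HMAC‑SHA256 keyed with [ANON_SALT]; the function name, the whitespace/case normalization, and the 16-character truncation are assumptions for illustration, not part of Presidio.

```python
import hashlib
import hmac


def pseudonymize(value: str, salt: str, length: int = 16) -> str:
    """Return a stable, salted pseudonym for value (HMAC-SHA256, hex, truncated)."""
    # Normalize so 'DE89 3704 ...' and 'de893704...' map to the same pseudonym.
    normalized = "".join(value.split()).upper()
    digest = hmac.new(
        salt.encode("utf-8"), normalized.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return digest[:length]
```

The same input and salt always give the same token, so joins across runs still work inside one environment, while a different salt per environment keeps tokens from being correlated across environments.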

Regex fallbacks + tests [drop‑in]

Use these when the detector is offline or for simple pre‑filters. Each pattern includes a test string. Keep them conservative to avoid masking too much.

# secrets_piiregex.yaml
EMAIL:        '(?i)\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b'
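# Lookarounds keep PHONE_E164 from firing inside longer tokens (e.g. the digits in sk-1234567890ABC...).
PHONE_E164:   '(?<!\w)\+?[1-9]\d{8,14}(?!\w)'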
US_SSN:       '\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b'
IBAN_SIMPLE:  '\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b'
OPENAI_KEY:   'sk-[A-Za-z0-9]{32,48}'
GITHUB_TOKEN: 'ghp_[A-Za-z0-9]{36}'
AWS_AKID:     '(AKIA|ASIA)[0-9A-Z]{16}'
AWS_SAK:      '(?i)(aws_?secret_?access_?key)\s*[:=]\s*[A-Za-z0-9/+=]{40}'
JWT:          '\beyJ[\w-]*\.[\w-]*\.[\w-]*\b'

Tiny test harness (Python + pytest):

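# test_redaction_regex.py
import re

import pytest
import yaml

with open('secrets_piiregex.yaml') as fh:
    rx = yaml.safe_load(fh)

CASES = {
  'EMAIL': 'Contact me at a.ops+dev@example.io today',
  'OPENAI_KEY': 'token sk-1234567890ABCDEFGHIJKLMNOPQRSTUV',
  'AWS_AKID': 'env AKIAABCDEFGHIJKLMNOP',
  'US_SSN': 'holder 123-45-6789',
}

@pytest.mark.parametrize('label,sample', list(CASES.items()))
def test_pattern_matches_sample(label, sample):
    # Each sample must trigger its own pattern.
    assert re.search(rx[label], sample), f'{label} did not match'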

Fallback masking (Python):

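# naive_mask.py
import re, yaml
with open('secrets_piiregex.yaml') as fh:
    rx = yaml.safe_load(fh)
with open('[INPUT_FILE]') as fh:
    text = fh.read()
# Patterns run in file order. No global re.I: patterns that need it carry (?i),
# and the case-sensitive ones (AWS_AKID, IBAN_SIMPLE) stay conservative.
for label, pat in rx.items():
    text = re.sub(pat, f'[{label}_MASKED]', text)
with open('[OUTPUT_FILE]', 'w') as fh:
    fh.write(text)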

Tip: run regex pre‑filters before model detection to trim obvious secrets and reduce false positives downstream.

Make subflow [detect→mask→annotate]

This template routes: Detect → Anonymize → Emit Redaction Log → Route. Duplicate the subflow before every external call and before trace/log export.

Node list and settings:

  1. HTTP: Detect
  • Name: 'Detect PII (Presidio)'
  • URL: [PRESIDIO_URL]/analyze
  • Method: POST
  • Body (JSON): { "text": "{{1.input_text}}", "language": "en", "entities": ["EMAIL_ADDRESS","PHONE_NUMBER","US_SSN","IBAN_CODE","OPENAI_API_KEY","GITHUB_TOKEN","AWS_ACCESS_KEY_ID","AWS_SECRET_ACCESS_KEY","JWT_TOKEN"] }
  2. HTTP: Anonymize
  • Name: 'Anonymize (Presidio)'
  • URL: [PRESIDIO_ANONYMIZER_URL]/anonymize (the anonymizer is a separate service from the analyzer)
  • Method: POST
  • Body (JSON): { "text": "{{1.input_text}}", "analyzer_results": {{1.body}}, "anonymizers": {{2.anonymizer_map}} } (pass the operators object from anonymizer_map.yaml)
  3. JSON: Build Redaction Log
  • Name: 'Redaction Log'
  • Template object: see "Redaction Log schema" section; fill [RUN_ID],[ENV],[WORKSPACE_NAME].
  4. Data store/HTTP: Write Log
  • Destination: [LOG_DESTINATION]
  5. Router: Decisions
  • If detection/anonymize error → 'Fail‑Closed' path: Queue payload + POST to [ALERT_CHANNEL].
  • Else if entity_counts.total > 0 → Proceed with {{masked_text}}.
  • Else → Pass original input.

Export/Import tip:

  • Save this as a Make subflow and call it as the first module inside any scenario making external calls. Keep [ANON_SALT] and [PRESIDIO_URL] in Make variables per environment.

n8n subflow [import JSON + set env]

Import this minimal workflow, connect credentials, and set environment variables. Place the Subworkflow node before any external request or trace export.

{
  "name": "Redaction Subflow",
  "nodes": [
    {
      "id": "DetectPII",
      "name": "Detect PII (Presidio)",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "url": "[PRESIDIO_URL]/analyze",
        "requestMethod": "POST",
        "jsonParameters": true,
        "options": { "fullResponse": true },
        "bodyParametersJson": "={\n  \"text\": {{ JSON.stringify($json.input_text) }},\n  \"language\": \"en\",\n  \"entities\": [\"EMAIL_ADDRESS\",\"PHONE_NUMBER\",\"US_SSN\",\"IBAN_CODE\",\"OPENAI_API_KEY\",\"GITHUB_TOKEN\",\"AWS_ACCESS_KEY_ID\",\"AWS_SECRET_ACCESS_KEY\",\"JWT_TOKEN\"]\n}"
      }
    },
    {
      "id": "Anonymize",
      "name": "Anonymize (Presidio)",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "url": "[PRESIDIO_ANONYMIZER_URL]/anonymize",
        "requestMethod": "POST",
        "jsonParameters": true,
        "bodyParametersJson": "={\n  \"text\": {{ JSON.stringify($json.input_text) }},\n  \"analyzer_results\": {{ JSON.stringify($('Detect PII (Presidio)').item.json.body) }},\n  \"anonymizers\": {{ JSON.stringify($json.anonymizer_map) }}\n}"
      }
    },
    {
      "id": "BuildLog",
      "name": "Build Redaction Log",
      "type": "n8n-nodes-base.function",
      "parameters": {
        "functionCode": "const det = $items('Detect PII (Presidio)')[0].json.body || [];\nconst masked = $items('Anonymize (Presidio)')[0].json.text || '';\nconst counts = det.reduce((m,e)=>{m[e.entity_type]=(m[e.entity_type]||0)+1; m.total=(m.total||0)+1; return m;},{});\nreturn [{ json: {\n  run_id: $json.RUN_ID, env: '[ENV]', workspace: '[WORKSPACE_NAME]',\n  detector: { name: 'presidio', version: '[DET_VERSION]' },\n  entity_counts: counts,\n  sample_masked_spans: det.slice(0,5).map(e=>({ entity_type: e.entity_type, start: e.start, end: e.end })),\n  decision: counts.total>0 ? 'input_masked' : 'pass_through',\n  input_bytes: Buffer.from($json.input_text||'').length,\n  output_bytes: Buffer.from(masked||'').length,\n  latency_ms: { detect: 0, anonymize: 0 },\n  ts: new Date().toISOString()\n}, paired: { masked_text: masked } }];"
      }
    },
    {
      "id": "WriteLog",
      "name": "Write Log",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": { "url": "[LOG_ENDPOINT]", "requestMethod": "POST", "jsonParameters": true, "bodyParametersJson": "={{$json}}" }
    },
    {
      "id": "Route",
      "name": "Router",
      "type": "n8n-nodes-base.switch",
      "parameters": { "dataType": "number", "value1": "={{$json.entity_counts.total || 0}}", "rules": { "rules": [ { "operation": "larger", "value2": 0, "output": 0 } ] }, "fallbackOutput": 1 }
    }
  ],
  "connections": { "Detect PII (Presidio)": { "main": [[{"node":"Anonymize (Presidio)","type":"main","index":0}]] }, "Anonymize (Presidio)": { "main": [[{"node":"Build Redaction Log","type":"main","index":0}]] }, "Build Redaction Log": { "main": [[{"node":"Write Log","type":"main","index":0}, {"node":"Router","type":"main","index":0}]] } }
}

Notes:

  • Replace [LOG_ENDPOINT] with your sink (HTTP collector, webhook, etc.).
  • Add an Error Trigger node to route exceptions to [ALERT_CHANNEL] and store the raw payload in a quarantine queue.

Redaction Log schema + dashboard tiles

Emit one JSON object per run. Keep it small, consistent, and query‑friendly.

Schema (conceptual):

{
  "run_id": "[RUN_ID]",
  "ts": "2026-05-01T12:00:00Z",
  "env": "[ENV]",
  "workspace": "[WORKSPACE_NAME]",
  "detector": { "name": "presidio|privacy_filter|cloud", "version": "[DET_VERSION]" },
  "entity_counts": { "EMAIL_ADDRESS": 2, "OPENAI_API_KEY": 1, "total": 3 },
  "sample_masked_spans": [
    { "entity_type": "EMAIL_ADDRESS", "start": 92, "end": 107 },
    { "entity_type": "OPENAI_API_KEY", "start": 144, "end": 185 }
  ],
  "decision": "input_masked|pass_through|fail_closed",
  "input_bytes": 1234,
  "output_bytes": 1210,
  "latency_ms": { "detect": 38, "anonymize": 12 },
  "cost_estimate_usd": 0.0000,
  "source": { "service": "[SERVICE_NAME]", "operation": "[OP_NAME]" }
}
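
A log entry in this shape can be assembled from the analyzer findings alone. The following is a minimal sketch, assuming each finding is a dict with `entity_type`, `start`, and `end` (the shape Presidio's analyzer returns); the function name and its parameters are hypothetical, and the latency, cost, and source fields are left for the caller to add.

```python
from collections import Counter
from datetime import datetime, timezone


def build_redaction_log(run_id, env, workspace, raw_text, masked_text, findings,
                        detector_name="presidio", detector_version="unknown",
                        max_spans=5, error=None):
    """Build one Redaction Log entry. Only offsets and counts are stored, never values."""
    counts = Counter(f["entity_type"] for f in findings)
    entity_counts = dict(counts)
    entity_counts["total"] = sum(counts.values())

    if error is not None:
        decision = "fail_closed"
    elif entity_counts["total"] > 0:
        decision = "input_masked"
    else:
        decision = "pass_through"

    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return {
        "run_id": run_id,
        "ts": now.replace("+00:00", "Z"),
        "env": env,
        "workspace": workspace,
        "detector": {"name": detector_name, "version": detector_version},
        "entity_counts": entity_counts,
        # Keep a capped sample of spans; scores and matched text are dropped.
        "sample_masked_spans": [
            {"entity_type": f["entity_type"], "start": f["start"], "end": f["end"]}
            for f in findings[:max_spans]
        ],
        "decision": decision,
        "input_bytes": len(raw_text.encode("utf-8")),
        "output_bytes": len(masked_text.encode("utf-8")),
    }
```

Byte counts use UTF-8 so they line up with what is actually sent over the wire, not with character counts.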

Example tiles (adapt to your BI tool):

  • Entity counts by type (7d): group by entity_type (explode entity_counts, skipping the total key so counts are not doubled) and sum values.
  • Mask vs pass decisions: count by decision per day.
  • Detector/version: group by detector.name, detector.version to track rollouts.
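
To sanity-check a tile before building it in a BI tool, the same aggregations can be run over exported log entries. This is a minimal sketch, assuming `logs` is a list of dicts in the schema above; both function names are hypothetical.

```python
from collections import Counter, defaultdict


def entity_totals(logs):
    """Sum entity_counts across entries, skipping the 'total' key."""
    totals = Counter()
    for entry in logs:
        for entity_type, n in entry.get("entity_counts", {}).items():
            if entity_type != "total":
                totals[entity_type] += n
    return dict(totals)


def decisions_per_day(logs):
    """Count decisions per calendar day, using the date part of the ISO timestamp."""
    per_day = defaultdict(Counter)
    for entry in logs:
        per_day[entry["ts"][:10]][entry["decision"]] += 1
    return {day: dict(counts) for day, counts in per_day.items()}
```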

BigQuery helper views (pseudo‑SQL):

-- explode entity_counts (assumes entity_counts is a JSON column; 'total' is skipped to avoid double counting)
SELECT run_id, ts, env, detector.name AS detector, detector.version AS version,
       key AS entity_type, INT64(entity_counts[key]) AS cnt
FROM `[LOG_DATASET].[TABLE]`, UNNEST(JSON_KEYS(entity_counts)) AS key
WHERE key != 'total';

Tip: store 2–5 masked span examples max; never store raw values.

Detector swap layer [Privacy Filter / cloud DLP]

Switch detectors without changing the rest of the subflow.

Option A — OpenAI Privacy Filter (self‑hosted):

  • Endpoint: [PRIVACY_FILTER_URL]/filter
  • Request: { "text": "[RAW_TEXT]" }
  • Response: { "masked_text": "...", "entities": [{"type":"email","start":..,"end":..}], "version": "[DET_VERSION]" }
  • Map entities to the Redaction Log; use masked_text downstream.

Option B — Cloud DLP/PII service (managed):

  • Wrap a thin adapter that normalizes results to { masked_text, entities[], version }.
  • Expect higher latency + per‑request cost; set cost_estimate_usd in the log from headers/bytes.

Keep the operator map consistent so your downstream behavior doesn’t change as you swap detectors.
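
One way to keep the rest of the subflow unchanged is a pair of small adapters that emit the same { masked_text, entities[], version } shape. The sketch below is illustrative only: the function names are hypothetical, the Privacy Filter response shape is the one listed under Option A, and the `TYPE_ALIASES` table is an assumed mapping you would extend for your own detector.

```python
# Assumed mapping from detector-specific labels to the entity names used in the operator map.
TYPE_ALIASES = {"email": "EMAIL_ADDRESS", "phone": "PHONE_NUMBER", "ssn": "US_SSN"}


def normalize_presidio(anonymized_text, findings, version):
    """Adapt Presidio analyzer findings plus anonymized text to the common shape."""
    return {
        "masked_text": anonymized_text,
        "entities": [
            {"type": f["entity_type"], "start": f["start"], "end": f["end"]}
            for f in findings
        ],
        "version": version,
    }


def normalize_privacy_filter(response):
    """Adapt a { masked_text, entities, version } response, renaming entity types."""
    return {
        "masked_text": response["masked_text"],
        "entities": [
            {
                # Unknown labels are upper-cased rather than dropped.
                "type": TYPE_ALIASES.get(e["type"], e["type"].upper()),
                "start": e["start"],
                "end": e["end"],
            }
            for e in response.get("entities", [])
        ],
        "version": response.get("version", "unknown"),
    }
```

With both adapters returning identical keys, the log builder and router only ever see one shape, whichever detector is active.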

Fail‑closed branch [deterministic + alert]

Use this when redaction fails or the detector times out.

Pseudocode:

try {
  detect(); anonymize(); write_log('input_masked'|'pass_through'); forward(masked_or_raw);
} catch (e) {
  queue(original_payload, reason=e.message); write_log('fail_closed'); alert([ALERT_CHANNEL]);
}
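
The same branch as runnable Python, as a minimal sketch: every collaborator (`detect`, `anonymize`, `write_log`, `forward`, `queue`, `alert`) is a hypothetical callable you inject, so the control flow can be tested without a live detector. The one deliberate difference from the pseudocode is that `forward` sits outside the `try`, so a downstream API error is not recorded as a redaction failure.

```python
def run_fail_closed(payload, detect, anonymize, write_log, forward, queue, alert):
    """Redact then forward; on any redaction error, quarantine and alert instead."""
    try:
        findings = detect(payload)
        if findings:
            outgoing, decision = anonymize(payload, findings), "input_masked"
        else:
            outgoing, decision = payload, "pass_through"
        write_log(decision)
    except Exception as exc:
        # Fail closed: nothing leaves the system when redaction is uncertain.
        queue(payload, reason=str(exc))
        write_log("fail_closed")
        alert(str(exc))
        return None
    return forward(outgoing)
```

Returning `None` on the fail-closed path gives callers a single, easy signal that the external call was skipped.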

Checklist:

  • Queue: [S3 bucket|KV|DB table] named [WORKSPACE_NAME]-quarantine-[ENV].
  • Alert: POST to [ALERT_CHANNEL] with [RUN_ID], reason, and a link to the quarantined item.
  • Auto‑retry: backoff with jitter; max [N] attempts.
  • Circuit‑breaker: if fail‑closed rate > [THRESHOLD]% over [WINDOW] mins, disable non‑essential external calls and surface a status banner to ops.
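
The retry and circuit-breaker items can be sketched as follows. This is one possible reading of the checklist, not a prescribed implementation: `backoff_delays` uses "full jitter" (a random delay between 0 and the exponential cap), and `FailClosedBreaker` is a hypothetical sliding-window counter where `threshold_pct` and `window_s` stand in for [THRESHOLD] and [WINDOW].

```python
import random
from collections import deque


def backoff_delays(max_attempts, base=0.5, cap=30.0, rng=random.random):
    """Full-jitter delays in seconds: rng() * min(cap, base * 2**attempt)."""
    return [rng() * min(cap, base * 2 ** attempt) for attempt in range(max_attempts)]


class FailClosedBreaker:
    """Opens when the fail-closed rate within the window exceeds threshold_pct."""

    def __init__(self, threshold_pct, window_s):
        self.threshold_pct = threshold_pct
        self.window_s = window_s
        self.events = deque()  # (timestamp_seconds, failed) pairs, oldest first

    def record(self, now, failed):
        self.events.append((now, failed))
        # Drop events that have aged out of the sliding window.
        while self.events and self.events[0][0] <= now - self.window_s:
            self.events.popleft()

    def is_open(self):
        if not self.events:
            return False
        failures = sum(1 for _, failed in self.events if failed)
        return 100.0 * failures / len(self.events) > self.threshold_pct
```

Pass a monotonic clock value (for example `time.monotonic()`) as `now`, and check `is_open()` before each non-essential external call.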

Defaults, validation, and rollout [copy this]

Fast defaults for a solo operator:

  • Language: 'en' (extend later per client).
  • Entities: email, phone, SSN, IBAN, API keys/tokens (OpenAI, GitHub, AWS), JWT.
  • Operators: mask secrets fully; pseudonymize IBAN with SHA‑256 + [ANON_SALT]; keep last 2–4 digits for phone/SSN.
  • Placement: pre‑call redaction subflow + post‑call output scan (reuse same subflow).
  • Logs: one entry per run, no raw values, include detector name/version, entity_counts, decision, latency, and cost.

Validation sample:

"Email jordan.ops@example.io and use sk-1234567890ABCDEFGHIJKLMNOPQRSTUV. AWS key AKIAABCDEFGHIJKLMNOP. SSN 123-45-6789."
→ "Email ********************* and use [API_KEY_MASKED]. AWS key [AWS_ACCESS_KEY_ID_MASKED]. SSN XXXXXXX6789."

Rollout plan:

  1. Dev: run regex‑only prefilter + Presidio; verify logs.
  2. Stage: dual‑scan inputs/outputs; measure mask rate and latency budget.
  3. Prod: enable fail‑closed; alert on any detector errors; review dashboards weekly.