Template

Supabase RAG Starter Pack — Paste‑Ready SQL, Cron, and Retrieval Guardrails

Paste‑ready SQL, cron jobs, an Edge Function scaffold, and Make/n8n ingestion blueprints to launch a private RAG stack on Supabase with pgvector, automatic re‑embeds, and retrieval guardrails. Built for solo automation consultants shipping client‑private Q&A and helpdesk assistants.

Use this template to stand up a production‑grade private RAG baseline on Supabase in under an hour. Copy each section into your project, replace the [BRACKETS], and run the SQL blocks in your Supabase SQL editor. Then deploy the Edge Function and choose either the Make.com or n8n blueprint for ingestion. Guardrails (keyword probe + max‑distance) are included so you can block low‑confidence LLM calls on day one.

00 — Prerequisites and defaults [FILL THESE FIRST]

  • Supabase project: [SUPABASE_PROJECT_NAME]
  • Schema name (optional): [SCHEMA] (default: public)
  • Tenancy key (if you serve multiple clients): [TENANT_ID]
  • Embedding model: [EMBED_MODEL] (e.g., text-embedding-3-small)
  • Embedding dimensions: [EMBED_DIM] (e.g., 1536 for OpenAI small)
  • Guardrail thresholds:
    • Max cosine distance: [MAX_COSINE_DISTANCE] (start at 0.20; tune per corpus)
    • Minimum lexical overlap: [MIN_TS_RANK] (start at 0.05–0.10)
  • Chunking defaults:
    • Target tokens per chunk: [CHUNK_TOKENS] (start 600)
    • Overlap tokens: [OVERLAP_TOKENS] (start 120)
  • Edge Function name: [EDGE_FUNCTION_NAME] (e.g., embed-upsert)
  • Ingestion choice: [MAKE|N8N]
  • Service environment:
    • [SUPABASE_URL]
    • [SUPABASE_SERVICE_ROLE_KEY]
    • [OPENAI_API_KEY] (or [COHERE_API_KEY])

How to apply:

  1. Run SQL 01→05 in order.
  2. Deploy the Edge Function from the scaffold.
  3. Turn on pg_cron jobs (SQL 06).
  4. Import the Make or n8n blueprint and paste your keys/URLs.
  5. Test the Retrieval Query (SQL 07) with your own [QUERY_TEXT].

SQL 01 — Enable extensions and helpers

Run this once per project. It enables required extensions and a utility hash function used for change detection.

-- SQL 01 — Extensions & utilities
create extension if not exists vector;           -- pgvector
create extension if not exists pgcrypto;         -- for gen_random_uuid() + digest()
create extension if not exists pg_trgm;          -- for trigram/tsvector helpers
create extension if not exists unaccent;         -- optional text normalization
create extension if not exists pg_net;           -- optional: HTTP from SQL (cron→Edge call)
create extension if not exists pg_cron;          -- scheduler

-- Utility: stable SHA256 for change detection
create or replace function util_sha256(text) returns text language sql immutable as $$
  select encode(digest($1, 'sha256'), 'hex');
$$;
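For parity on the workflow side, here is a TypeScript sketch (assuming a Node-compatible runtime with `node:crypto`) that produces the same hex digest as `util_sha256`, so hashes computed in a Make/n8n code step match hashes computed in Postgres:

```typescript
import { createHash } from 'node:crypto';

// Mirrors util_sha256: hex-encoded SHA-256 of the input text, so change
// detection agrees between the database and your ingestion workflow.
function sha256Hex(text: string): string {
  return createHash('sha256').update(text, 'utf8').digest('hex');
}
```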

SQL 02 — Schema: documents, chunks, embed_queue

Documents are your sources; chunks are the units you embed and retrieve. Defaults assume OpenAI small (1,536 dims). Adjust [EMBED_DIM] if you change models.

-- SQL 02 — Core tables
create table if not exists documents (
  id uuid primary key default gen_random_uuid(),
  tenant_id text not null default 'default',
  source_url text,
  source_type text,            -- e.g., 'notion', 'pdf', 'md'
  title text,
  raw text,                    -- full unchunked source (optional)
  meta jsonb not null default '{}'::jsonb,
  content_hash text not null,  -- util_sha256(raw or canonicalized text)
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now()
);

create table if not exists chunks (
  id uuid primary key default gen_random_uuid(),
  tenant_id text not null default 'default',
  document_id uuid not null references documents(id) on delete cascade,
  ordinal int not null,                   -- order within document
  text text not null,                     -- chunk content
  token_count int,
  embedding vector([EMBED_DIM]),          -- set to your model dims
  last_embedded_at timestamptz,
  content_hash text not null,             -- util_sha256(text)
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now(),
  -- note: chunks has no title column; if you denormalize document titles
  -- into chunks, add a setweight(..., 'A') term for them here
  tsv tsvector generated always as (
    setweight(to_tsvector('english', coalesce(text, '')), 'B')
  ) stored,
  unique (document_id, ordinal)  -- lets REST upserts merge by (document, position)
);

-- Lightweight queue for re-embeds
create table if not exists embed_queue (
  id bigserial primary key,
  chunk_id uuid not null references chunks(id) on delete cascade,
  enqueue_reason text not null,           -- 'changed-hash' | 'scheduled-refresh'
  attempts int not null default 0,
  status text not null default 'queued',  -- 'queued' | 'processing' | 'done' | 'error'
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now()
);

-- Dedupe guard: without a unique index there is never a conflict, so the
-- "on conflict do nothing" in the enqueue helpers would let duplicates pile up
create unique index if not exists idx_embed_queue_chunk_queued
  on embed_queue (chunk_id) where status = 'queued';

-- Helpful view: unembedded or stale chunks (refresh older than [REFRESH_DAYS])
create or replace view v_chunks_needing_embed as
select c.*
from chunks c
where c.embedding is null
   or c.last_embedded_at is null
   or c.updated_at > c.last_embedded_at
   or c.last_embedded_at < now() - interval '[REFRESH_DAYS] days';

SQL 03 — Indexes: HNSW or IVFFlat + GIN for lexical gate

Pick one index path to start. HNSW is great for higher recall and larger corpora; IVFFlat builds faster and uses less memory. For very small sets (< 10k chunks), you can skip ANNS and use exact search first.

-- SQL 03A — HNSW (cosine) for medium→large sets
create index if not exists idx_chunks_embedding_hnsw
  on chunks using hnsw (embedding vector_cosine_ops)
  with (m = 16, ef_construction = 64);

-- Optional: tune search depth at session or database level
-- alter database [YOUR_DB] set hnsw.ef_search = 40;  -- start 20–60

-- SQL 03B — IVFFlat (cosine) for faster builds / lower memory
-- choose lists ≈ sqrt(n_chunks); adjust as corpus grows
create index if not exists idx_chunks_embedding_ivfflat
  on chunks using ivfflat (embedding vector_cosine_ops)
  with (lists = 100);
-- Optional: increase probes for better recall at some latency cost
-- alter database [YOUR_DB] set ivfflat.probes = 10;  -- start 5–20

-- SQL 03C — Keyword probe index (cheap lexical gate)
create index if not exists idx_chunks_tsv on chunks using gin (tsv);

-- Supporting btree indexes
create index if not exists idx_chunks_doc on chunks(document_id);
create index if not exists idx_chunks_tenant on chunks(tenant_id);

Index choice policy (put this in your runbook):

  • < 10k chunks: exact search (no ANNS) is fine; add GIN on tsv only.
  • 10k–500k chunks: start with IVFFlat (lists ≈ sqrt(n)); switch to HNSW if recall suffers.
  • > 500k chunks or high‑recall QA: HNSW first; profile memory and build times before prod.
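The policy above can be encoded as a small helper for your tooling (a sketch; the thresholds are this template's starting points, not pgvector requirements):

```typescript
type IndexPlan =
  | { method: 'exact' }
  | { method: 'ivfflat'; lists: number }
  | { method: 'hnsw'; m: number; efConstruction: number };

// Picks an index strategy from corpus size, following the runbook policy.
function planIndex(nChunks: number): IndexPlan {
  if (nChunks < 10_000) return { method: 'exact' };
  if (nChunks <= 500_000) return { method: 'ivfflat', lists: Math.round(Math.sqrt(nChunks)) };
  return { method: 'hnsw', m: 16, efConstruction: 64 };
}
```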

SQL 04 — RLS (Row‑Level Security) scaffolding

Enable RLS to ensure per‑tenant isolation. Use service role only in trusted server contexts (Edge Function). Replace [TENANT_ID] checks with your auth claim mapping.

-- SQL 04 — RLS per-tenant
alter table documents enable row level security;
alter table chunks enable row level security;
alter table embed_queue enable row level security;

-- Example policies: tenant can read its rows; only service role can write
create policy "tenant_select_docs" on documents
  for select using (
    tenant_id = (current_setting('request.jwt.claims', true)::jsonb ->> 'tenant_id')
  );
create policy "tenant_select_chunks" on chunks
  for select using (
    tenant_id = (current_setting('request.jwt.claims', true)::jsonb ->> 'tenant_id')
  );

-- Service role bypass for writes (Edge Function uses service key)
create policy "service_write_docs" on documents for all
  using (auth.role() = 'service_role') with check (auth.role() = 'service_role');
create policy "service_write_chunks" on chunks for all
  using (auth.role() = 'service_role') with check (auth.role() = 'service_role');
create policy "service_write_queue" on embed_queue for all
  using (auth.role() = 'service_role') with check (auth.role() = 'service_role');

SQL 05 — Enqueue functions (changed‑hash and nightly refresh)

These helpers enqueue changed chunks and perform nightly refresh checks. Use pg_cron to run them. If you prefer event‑driven only, you can skip the nightly sweep.

-- SQL 05 — Queue helpers
create or replace function enqueue_if_changed(p_chunk_id uuid, p_new_hash text) returns void
language plpgsql as $$
begin
  if exists (
    select 1 from chunks c
    where c.id = p_chunk_id and c.content_hash <> p_new_hash
  ) then
    insert into embed_queue (chunk_id, enqueue_reason)
    values (p_chunk_id, 'changed-hash')
    on conflict do nothing;
  end if;
end; $$;

create or replace function enqueue_nightly_refresh() returns int
language plpgsql as $$
declare v_count int; begin
  insert into embed_queue (chunk_id, enqueue_reason)
  select id, 'scheduled-refresh' from v_chunks_needing_embed
  on conflict do nothing;
  get diagnostics v_count = row_count; return v_count;
end; $$;

SQL 06 — pg_cron: schedule refresh and optional queue drain

This schedules the nightly refresh and, if you want fully in‑DB scheduling, posts to your Edge Function. Replace [EDGE_FUNCTION_URL] if using pg_net. Otherwise, your worker (Make/n8n) can call the Edge Function directly.

-- SQL 06 — pg_cron jobs
-- Nightly at 02:15 UTC: sweep for stale/unembedded chunks
select cron.schedule('rag-nightly-refresh', '15 2 * * *', $$
  select enqueue_nightly_refresh();
$$);

-- Optional: kick the Edge Function every minute to drain the queue (pg_net)
-- Replace [EDGE_FUNCTION_URL]. 'app.service_key' assumes you have stored the
-- service key as a custom database setting (or read it from Supabase Vault)
select cron.schedule('rag-embed-drain', '* * * * *', $$
  select net.http_post(
    url := '[EDGE_FUNCTION_URL]',
    headers := jsonb_build_object('Authorization', 'Bearer ' || current_setting('app.service_key', true)),
    body := jsonb_build_object('op', 'drain-queue')
  );
$$);

Edge Function — Embed & upsert scaffold (TypeScript/Deno)

Copy this into supabase/functions/[EDGE_FUNCTION_NAME]/index.ts and deploy. It drains embed_queue, generates embeddings, and upserts vectors. Replace [BRACKETS] and confirm your env vars in the Supabase dashboard.

// Edge Function — [EDGE_FUNCTION_NAME]
// Runtime: Deno. Permissions: service role key in env.
// Env: SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY, OPENAI_API_KEY, EMBED_MODEL, EMBED_DIM
import 'jsr:@supabase/functions-js/edge-runtime';
import OpenAI from 'npm:openai@4';
import { createClient } from 'https://esm.sh/@supabase/supabase-js@2';

const supabase = createClient(Deno.env.get('SUPABASE_URL')!, Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!);
const openai = new OpenAI({ apiKey: Deno.env.get('OPENAI_API_KEY')! });

async function nextJob() {
  const { data, error } = await supabase
    .from('embed_queue')
    .select('id, chunk_id')
    .eq('status', 'queued')
    .order('id', { ascending: true })
    .limit(1)
    .maybeSingle();
  if (error || !data) return null;
  // Note: this select-then-update is not atomic. It is fine for a single
  // cron-driven worker; for concurrent drains, claim jobs in an RPC using
  // FOR UPDATE SKIP LOCKED instead.
  await supabase.from('embed_queue').update({ status: 'processing' }).eq('id', data.id);
  return data;
}

async function embedText(text: string) {
  const model = Deno.env.get('EMBED_MODEL') || 'text-embedding-3-small';
  const dims = Number(Deno.env.get('EMBED_DIM') || 1536);
  const res = await openai.embeddings.create({ model, input: text, dimensions: dims });
  return res.data[0].embedding;
}

async function processJob(job: { id: number; chunk_id: string }) {
  const { data: chunk } = await supabase.from('chunks').select('id, text').eq('id', job.chunk_id).single();
  if (!chunk) throw new Error('Missing chunk');
  const vec = await embedText(chunk.text);
  const { error: upErr } = await supabase
    .from(&#39;chunks&#39;)
    .update({ embedding: vec, last_embedded_at: new Date().toISOString() })
    .eq(&#39;id&#39;, job.chunk_id);
  if (upErr) throw upErr;
  await supabase.from(&#39;embed_queue&#39;).update({ status: &#39;done&#39; }).eq(&#39;id&#39;, job.id);
}

Deno.serve(async (req) => {
  const payload = await req.json().catch(() => ({}));
  if (payload.op === 'drain-queue') {
    let processed = 0;
    for (let i = 0; i < 50; i++) { // soft cap per invocation
      const job = await nextJob();
      if (!job) break;
      try { await processJob(job); processed++; } catch (e) {
        await supabase.from('embed_queue').update({ status: 'error' }).eq('id', job.id);
      }
    }
    return new Response(JSON.stringify({ ok: true, processed }), { headers: { 'Content-Type': 'application/json' } });
  }
  return new Response(JSON.stringify({ ok: true, note: 'POST {"op":"drain-queue"}' }), { headers: { 'Content-Type': 'application/json' } });
});
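To trigger the function from Make/n8n (or any HTTP client) instead of pg_net, the request can be built like this. This is a sketch; the URL and key are your fill-ins, and the builder is separated from `fetch()` so the payload is easy to inspect:

```typescript
// Builds the drain-queue request the Edge Function scaffold expects.
function buildDrainRequest(edgeFunctionUrl: string, serviceKey: string) {
  return {
    url: edgeFunctionUrl,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${serviceKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ op: 'drain-queue' }),
    },
  };
}

// Usage: const { url, init } = buildDrainRequest(fnUrl, key); await fetch(url, init);
```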

SQL 07 — Retrieval: lexical probe + cosine distance cap

This query pattern enforces a cheap lexical gate and a semantic distance cap, and returns the top‑K chunks with citation fields. Swap [PLACEHOLDERS] and test in the SQL editor.

-- SQL 07 — Retrieval with guardrails
with
params as (
  select
    '[TENANT_ID]'::text as tenant_id,
    websearch_to_tsquery('english', '[QUERY_TEXT]') as q_ts,
    '[RAW_QUERY_FOR_EMBED]'::text as q_text,
    [MAX_COSINE_DISTANCE]::float as max_d,
    [TOP_K]::int as k
),
query_vec as (
  -- Compute query embedding in your app; pass as parameter $1::vector
  select $1::vector as v
),
lexical as (
  select c.id
  from chunks c, params p
  where c.tenant_id = p.tenant_id and c.tsv @@ p.q_ts
),
semantic as (
  select c.id, c.document_id, c.text,
         (c.embedding <=> q.v) as cos_distance
  from chunks c
  join query_vec q on true
  join params p on true
  where c.tenant_id = p.tenant_id
    and c.id in (select id from lexical)          -- lexical gate
    and c.embedding is not null
    and (c.embedding <=> q.v) <= p.max_d          -- distance cap
  order by c.embedding <=> q.v
  limit (select k from params)
)
select s.*, d.title, d.source_url
from semantic s
join documents d on d.id = s.document_id;

Notes:

  • Pass the query embedding as a bind parameter to avoid recomputing in‑DB.
  • If semantic returns zero rows, skip the LLM call and surface “No confident match.” Log the query to revisit thresholds.
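In application code, the "skip the LLM" decision from the notes above might look like this (a sketch over the row shape SQL 07 returns):

```typescript
type RetrievedChunk = { id: string; text: string; cos_distance: number; title?: string; source_url?: string };

// Decides whether to call the LLM at all: no confident chunks, no LLM call.
function gateRetrieval(rows: RetrievedChunk[], maxDistance: number) {
  const context = rows.filter((r) => r.cos_distance <= maxDistance);
  return context.length > 0
    ? { proceed: true as const, context }
    : { proceed: false as const, answer: 'No confident match.' };
}
```

When `proceed` is false, log the query so you can revisit [MAX_COSINE_DISTANCE] later.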

Ingestion — Make.com blueprint (steps + fill‑ins)

This scenario ingests a document, chunks it, compares hashes, and enqueues only changed chunks. Import to Make.com and replace the bracketed values.

  1. Webhook (Custom): receives { tenant_id, document: { url|html|markdown, title, meta } }.
  2. Text Normalize (Code step): strip HTML, unaccent, collapse whitespace.
  3. Chunker (Code step): sliding window using [CHUNK_TOKENS]/[OVERLAP_TOKENS].
  4. Hash each chunk: util_sha256(chunk_text) via Supabase RPC or in‑scenario.
  5. Upsert documents row by source_url + tenant_id; update content_hash.
  6. For each chunk:
    • Fetch existing content_hash for (document_id, ordinal).
    • If changed, POST to Supabase REST: insert into chunks (or update) and insert into embed_queue with enqueue_reason='changed-hash'.
  7. Optional: call Edge Function drain-queue.
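Step 3's sliding window can be sketched as below. It counts whitespace words as a rough token proxy; for accurate [CHUNK_TOKENS] budgeting, swap in a real tokenizer (that substitution is an assumption, not part of the blueprint):

```typescript
// Sliding-window chunker: windows of chunkTokens words, advancing by
// (chunkTokens - overlapTokens) so consecutive chunks share context.
function chunkText(text: string, chunkTokens = 600, overlapTokens = 120): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  if (words.length === 0) return [];
  const step = Math.max(1, chunkTokens - overlapTokens);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkTokens).join(' '));
    if (start + chunkTokens >= words.length) break; // last window reached the end
  }
  return chunks;
}
```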

Blueprint variables to fill:

  • [SUPABASE_REST_URL]
  • [SUPABASE_SERVICE_ROLE_KEY]
  • [EDGE_FUNCTION_URL]
  • [TENANT_ID]

Example: HTTP Upsert (Make HTTP module)

POST [SUPABASE_REST_URL]/rest/v1/chunks
Headers:
  apikey: [SUPABASE_SERVICE_ROLE_KEY]
  Authorization: Bearer [SUPABASE_SERVICE_ROLE_KEY]
  Content-Type: application/json
  Prefer: resolution=merge-duplicates
Body: [{
  "tenant_id": "[TENANT_ID]",
  "document_id": "{{doc_id}}",
  "ordinal": {{index}},
  "text": "{{chunk_text}}",
  "content_hash": "{{chunk_hash}}"
}]

Note: resolution=merge-duplicates upserts by matching a unique constraint, e.g. unique (document_id, ordinal) on chunks.

Ingestion — n8n workflow (nodes + placeholders)

Same flow as Make, with n8n nodes. Import the workflow JSON and replace credentials.

Node sketch:

  • Webhook → Function (normalize) → Function (chunk) → HTTP Request (Supabase upsert documents) → IF (changed‑hash?) → HTTP Request (upsert chunks) → HTTP Request (insert embed_queue) → HTTP Request (Edge drain-queue, optional).

Credential/env placeholders:

  • [SUPABASE_REST_URL]
  • [SUPABASE_SERVICE_ROLE_KEY]
  • [EDGE_FUNCTION_URL]
  • [TENANT_ID]

Guard against retries: add an Idempotency-Key header using {{document_id}}-{{ordinal}}-{{content_hash}}.
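A hypothetical helper for the HTTP Request node that assembles those headers (the Idempotency-Key is the workflow-side dedupe convention described above, not something Supabase REST enforces):

```typescript
// Builds upsert headers for the chunks HTTP Request node, including a
// retry-safe Idempotency-Key derived from stable chunk identity.
function upsertHeaders(serviceKey: string, documentId: string, ordinal: number, contentHash: string) {
  return {
    apikey: serviceKey,
    Authorization: `Bearer ${serviceKey}`,
    'Content-Type': 'application/json',
    Prefer: 'resolution=merge-duplicates',
    'Idempotency-Key': `${documentId}-${ordinal}-${contentHash}`,
  };
}
```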

Policy card — chunking, indexes, and guardrails (defaults)

Copy this section into Notion (or your runbook) and adjust per client.

  • Chunk sizes
    • [CHUNK_TOKENS]=600, [OVERLAP_TOKENS]=120 for FAQs/docs
    • Long narrative PDFs: 800/160
    • Dense specs: 400/80
  • Index choice by table size
    • <10k chunks: no ANNS (exact), GIN on tsv
    • 10k–500k: IVFFlat (lists≈√n), probes=10
    • >500k or high‑recall QA: HNSW (m=16, ef_construction=64, ef_search=40)
  • Distance guardrails (cosine distance on normalized embeddings)
    • Start MAX_COSINE_DISTANCE=[0.20] and log rejects; tighten to 0.18 if noisy, loosen to 0.22 for sparse domains.
    • Always require lexical overlap via tsv @@ websearch_to_tsquery with ts_rank_cd(tsv, query) >= [MIN_TS_RANK].
  • Refresh cadence
    • Re‑embed on change (hash difference) immediately.
    • Nightly sweep (enqueue_nightly_refresh) for misses and aged items > [REFRESH_DAYS]=30.
  • Access control
    • Enforce RLS by tenant_id on documents, chunks, embed_queue.
  • Logging
    • Track: queries blocked by lexical gate, by distance cap, by empty results; LLM calls avoided; average distance of served chunks.

Cost & sizing quicksheet [copy/paste into client deck]

Use these quick formulas when sizing and estimating.

  • Embedding cost (OpenAI small):
    • $0.02 / 1M tokens; 1,000 chunks × 500 tokens ≈ 0.5M tokens ≈ $0.01 per full re‑embed.
    • Set [EMBED_MODEL]=text-embedding-3-small unless your evals demand larger vectors.
  • Vector storage (pgvector):
    • bytes ≈ 4 × [EMBED_DIM] + 8 → at 1,536 dims ≈ 6,152 bytes per row before index/row overhead.
  • Egress & storage (Supabase):
    • Cached egress overage: ~$0.03/GB; uncached: ~$0.09/GB.
    • Storage overage: ~$0.021/GB‑month; DB disk beyond plan: ~$0.125/GB.
  • Throughput tips:
    • Use the Batch API (if available) for large re‑embeds to raise TPM.
    • Raise ivfflat.probes or hnsw.ef_search only after checking the lexical gate impact.
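The two formulas above, as code for a client deck (the $0.02/1M-token price is this quicksheet's figure; verify current pricing before quoting):

```typescript
// Embedding spend: tokens embedded × price per million tokens.
function embedCostUSD(nChunks: number, tokensPerChunk: number, pricePerMTok = 0.02): number {
  return (nChunks * tokensPerChunk / 1_000_000) * pricePerMTok;
}

// pgvector row payload: 4 bytes per float dimension plus an 8-byte header
// (before Postgres row overhead and index size).
function vectorBytesPerRow(embedDim: number): number {
  return 4 * embedDim + 8;
}
```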

Replace these with your client’s actual plan numbers before quoting. Cohere Embed pricing varies by account; verify in‑console if you switch providers.

Migration guardrails — dual‑write and versioning [optional]

Plan for growth without downtime.

  • Dual‑write adapter: in your Edge Function, add a feature flag [DUAL_WRITE=true|false] and a second client (e.g., Pinecone/Weaviate). On write, upsert to both.
  • Model versioning: add embed_model and embed_version columns to chunks. On re‑embed, write new vectors side‑by‑side; flip readers to embed_version=[TARGET] after backfill.
  • Exact search fallback: if filters get heavy and recall drops, run a two‑stage query: exact top‑N by keyword → re‑rank by vector locally.
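The exact-search fallback's local re-rank step, sketched (assumes the keyword-hit rows carry their embeddings back as `number[]`):

```typescript
// Cosine distance (1 - cosine similarity), matching pgvector's <=> operator.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Stage two: re-rank exact keyword hits by vector distance and keep topK.
function rerank<T extends { embedding: number[] }>(candidates: T[], queryVec: number[], topK: number): T[] {
  return [...candidates]
    .sort((x, y) => cosineDistance(x.embedding, queryVec) - cosineDistance(y.embedding, queryVec))
    .slice(0, topK);
}
```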

Version log stub [keep with your project]

Document updates to your build so collaborators know what changed.

  • v[VERSION]: [DATE]
    • Changed: [WHAT_CHANGED]
    • Reason: [WHY]
    • Rollback: [HOW]
  • Next review on: [DATE+30]