Supabase RAG Starter Pack — Paste‑Ready SQL, Cron, and Retrieval Guardrails
Paste‑ready SQL, cron jobs, an Edge Function scaffold, and Make/n8n ingestion blueprints to launch a private RAG stack on Supabase with pgvector, automatic re‑embeds, and retrieval guardrails. Built for solo automation consultants shipping client‑private Q&A and helpdesk assistants.
Use this template to stand up a production‑grade private RAG baseline on Supabase in under an hour. Copy each section into your project, replace the [BRACKETS], and run the SQL blocks in your Supabase SQL editor. Then deploy the Edge Function and choose either the Make.com or n8n blueprint for ingestion. Guardrails (keyword probe + max‑distance) are included so you can block low‑confidence LLM calls on day one.
00 — Prerequisites and defaults [FILL THESE FIRST]
- Supabase project: [SUPABASE_PROJECT_NAME]
- Schema name (optional): [SCHEMA] (default: public)
- Tenancy key (if you serve multiple clients): [TENANT_ID]
- Embedding model: [EMBED_MODEL] (e.g., `text-embedding-3-small`)
- Embedding dimensions: [EMBED_DIM] (e.g., `1536` for OpenAI small)
- Guardrail thresholds:
  - Max cosine distance: [MAX_COSINE_DISTANCE] (start at `0.20`; tune per corpus)
  - Minimum lexical overlap: [MIN_TS_RANK] (start at `0.05`–`0.10`)
- Chunking defaults:
  - Target tokens per chunk: [CHUNK_TOKENS] (start `600`)
  - Overlap tokens: [OVERLAP_TOKENS] (start `120`)
- Edge Function name: [EDGE_FUNCTION_NAME] (e.g., `embed-upsert`)
- Ingestion choice: [MAKE|N8N]
- Service environment:
- [SUPABASE_URL]
- [SUPABASE_SERVICE_ROLE_KEY]
- [OPENAI_API_KEY] (or [COHERE_API_KEY])
How to apply:
1) Run SQL 01→05 in order.
2) Deploy the Edge Function from the scaffold.
3) Turn on pg_cron jobs (SQL 06).
4) Import the Make or n8n blueprint and paste your keys/URLs.
5) Test the retrieval query (SQL 07) with your own [QUERY_TEXT].
SQL 01 — Enable extensions and helpers
Run this once per project. It enables required extensions and a utility hash function used for change detection.
-- SQL 01 — Extensions & utilities
create extension if not exists vector; -- pgvector
create extension if not exists pgcrypto; -- for gen_random_uuid() + digest()
create extension if not exists pg_trgm; -- trigram similarity helpers (optional; tsvector itself is built in)
create extension if not exists unaccent; -- optional text normalization
create extension if not exists pg_net; -- optional: HTTP from SQL (cron→Edge call)
create extension if not exists pg_cron; -- scheduler
-- Utility: stable SHA256 for change detection
create or replace function util_sha256(text) returns text language sql immutable as $$
select encode(digest($1, 'sha256'), 'hex');
$$;
SQL 02 — Schema: documents, chunks, embed_queue
Documents are your sources; chunks are the units you embed and retrieve. Defaults assume OpenAI small (1,536 dims). Adjust [EMBED_DIM] if you change models.
-- SQL 02 — Core tables
create table if not exists documents (
id uuid primary key default gen_random_uuid(),
tenant_id text not null default 'default',
source_url text,
source_type text, -- e.g., 'notion', 'pdf', 'md'
title text,
raw text, -- full unchunked source (optional)
meta jsonb not null default '{}'::jsonb,
content_hash text not null, -- util_sha256(raw or canonicalized text)
created_at timestamptz not null default now(),
updated_at timestamptz not null default now()
);
create table if not exists chunks (
id uuid primary key default gen_random_uuid(),
tenant_id text not null default 'default',
document_id uuid not null references documents(id) on delete cascade,
ordinal int not null, -- order within document
text text not null, -- chunk content
token_count int,
embedding vector([EMBED_DIM]), -- set to your model dims
last_embedded_at timestamptz,
content_hash text not null, -- util_sha256(text)
created_at timestamptz not null default now(),
updated_at timestamptz not null default now(),
tsv tsvector generated always as (
-- chunks carry no title column; index the chunk text only
to_tsvector('english', coalesce(text, ''))
) stored
);
-- Lightweight queue for re-embeds
create table if not exists embed_queue (
id bigserial primary key,
chunk_id uuid not null references chunks(id) on delete cascade,
enqueue_reason text not null, -- 'changed-hash' | 'scheduled-refresh'
attempts int not null default 0,
status text not null default 'queued', -- 'queued' | 'processing' | 'done' | 'error'
created_at timestamptz not null default now(),
updated_at timestamptz not null default now()
);
-- At most one queued job per chunk; without a unique index,
-- the "on conflict do nothing" inserts in SQL 05 would never dedupe.
create unique index if not exists idx_embed_queue_chunk_queued
on embed_queue (chunk_id) where status = 'queued';
-- Helpful view: unembedded or stale chunks (refresh older than [REFRESH_DAYS])
create or replace view v_chunks_needing_embed as
select c.*
from chunks c
where c.embedding is null
or c.last_embedded_at is null
or c.updated_at > c.last_embedded_at
or c.last_embedded_at < now() - interval '[REFRESH_DAYS] days';
SQL 03 — Indexes: HNSW or IVFFlat + GIN for lexical gate
Pick one index path to start. HNSW gives higher recall and scales to larger corpora; IVFFlat builds faster and uses less memory. For very small sets (< 10k chunks), you can skip approximate (ANN) indexing and use exact search first.
-- SQL 03A — HNSW (cosine) for medium→large sets
create index if not exists idx_chunks_embedding_hnsw
on chunks using hnsw (embedding vector_cosine_ops)
with (m = 16, ef_construction = 64);
-- Optional: tune search depth at session or database level
-- alter database [YOUR_DB] set hnsw.ef_search = 40; -- start 20–60
-- SQL 03B — IVFFlat (cosine) for faster builds / lower memory
-- choose lists ≈ sqrt(n_chunks); adjust as corpus grows
create index if not exists idx_chunks_embedding_ivfflat
on chunks using ivfflat (embedding vector_cosine_ops)
with (lists = 100);
-- Optional: increase probes for better recall at some latency cost
-- alter database [YOUR_DB] set ivfflat.probes = 10; -- start 5–20
-- SQL 03C — Keyword probe index (cheap lexical gate)
create index if not exists idx_chunks_tsv on chunks using gin (tsv);
-- Supporting btree indexes
create index if not exists idx_chunks_doc on chunks(document_id);
create index if not exists idx_chunks_tenant on chunks(tenant_id);
-- One row per (document, position); also lets REST upserts target on_conflict=document_id,ordinal
create unique index if not exists idx_chunks_doc_ordinal on chunks(document_id, ordinal);
Index choice policy (put this in your runbook):
- < 10k chunks: exact search (no ANN index) is fine; add GIN on `tsv` only.
- 10k–500k chunks: start with IVFFlat (lists ≈ sqrt(n)); switch to HNSW if recall suffers.
- > 500k chunks or high‑recall QA: HNSW first; profile memory and build times before prod.
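The `lists ≈ sqrt(n)` rule above is easy to automate when provisioning per client. A minimal sketch (the function name is ours, not part of pgvector):

```typescript
// IVFFlat `lists` heuristic from the policy above: lists ≈ sqrt(n_chunks).
function ivfflatLists(nChunks: number): number {
  return Math.max(1, Math.round(Math.sqrt(nChunks)));
}
```

At 250k chunks this suggests `lists = 500`. Note `lists` is fixed at index build time, so plan to re-create the index as the corpus grows.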
SQL 04 — RLS (Row‑Level Security) scaffolding
Enable RLS to ensure per‑tenant isolation. Use service role only in trusted server contexts (Edge Function). Replace [TENANT_ID] checks with your auth claim mapping.
-- SQL 04 — RLS per-tenant
alter table documents enable row level security;
alter table chunks enable row level security;
alter table embed_queue enable row level security;
-- Example policies: tenant can read its rows; only service role can write
create policy "tenant_select_docs" on documents
for select using (tenant_id = (auth.jwt() ->> 'tenant_id'));
create policy "tenant_select_chunks" on chunks
for select using (tenant_id = (auth.jwt() ->> 'tenant_id'));
-- Service role bypass for writes (Edge Function uses service key)
create policy "service_write_docs" on documents for all
using (auth.role() = 'service_role') with check (auth.role() = 'service_role');
create policy "service_write_chunks" on chunks for all
using (auth.role() = 'service_role') with check (auth.role() = 'service_role');
create policy "service_write_queue" on embed_queue for all
using (auth.role() = 'service_role') with check (auth.role() = 'service_role');
SQL 05 — Enqueue functions (changed‑hash and nightly refresh)
These helpers enqueue changed chunks and perform nightly refresh checks. Use pg_cron to run them. If you prefer event‑driven only, you can skip the nightly sweep.
-- SQL 05 — Queue helpers
create or replace function enqueue_if_changed(p_chunk_id uuid, p_new_hash text) returns void
language plpgsql as $$
begin
if exists (
select 1 from chunks c
where c.id = p_chunk_id and c.content_hash <> p_new_hash
) then
insert into embed_queue (chunk_id, enqueue_reason)
values (p_chunk_id, 'changed-hash')
on conflict do nothing;
end if;
end; $$;
create or replace function enqueue_nightly_refresh() returns int
language plpgsql as $$
declare v_count int; begin
insert into embed_queue (chunk_id, enqueue_reason)
select id, 'scheduled-refresh' from v_chunks_needing_embed
on conflict do nothing;
get diagnostics v_count = row_count; return v_count;
end; $$;
SQL 06 — pg_cron: schedule refresh and optional queue drain
This schedules the nightly refresh and, if you want fully in‑DB scheduling, posts to your Edge Function. Replace [EDGE_FUNCTION_URL] if using pg_net. Otherwise, your worker (Make/n8n) can call the Edge Function directly.
-- SQL 06 — pg_cron jobs
-- Nightly at 02:15 UTC: sweep for stale/unembedded chunks
select cron.schedule('rag-nightly-refresh', '15 2 * * *', $$
select enqueue_nightly_refresh();
$$);
-- Optional: kick the Edge Function every minute to drain the queue (pg_net)
-- Replace [EDGE_FUNCTION_URL]. Note: no built-in setting exposes your service key;
-- store it yourself first, e.g. in Supabase Vault or via
--   alter database [YOUR_DB] set app.service_key = '[SUPABASE_SERVICE_ROLE_KEY]';
select cron.schedule('rag-embed-drain', '* * * * *', $$
select net.http_post(
url := '[EDGE_FUNCTION_URL]',
headers := jsonb_build_object(
'Content-Type','application/json',
'Authorization','Bearer '||current_setting('app.service_key', true)
),
body := jsonb_build_object('op','drain-queue')
);
$$);
Edge Function — Embed & upsert scaffold (TypeScript/Deno)
Copy this into supabase/functions/[EDGE_FUNCTION_NAME]/index.ts and deploy. It drains embed_queue, generates embeddings, and upserts vectors. Replace [BRACKETS] and confirm your env vars in the Supabase dashboard.
// Edge Function — [EDGE_FUNCTION_NAME]
// Runtime: Deno. Permissions: service role key in env.
// Env: SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY, OPENAI_API_KEY, EMBED_MODEL, EMBED_DIM
import 'jsr:@supabase/functions-js/edge-runtime.d.ts'; // Edge runtime type declarations
import OpenAI from 'npm:openai@4';
import { createClient } from 'https://esm.sh/@supabase/supabase-js@2';
const supabase = createClient(Deno.env.get('SUPABASE_URL')!, Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!);
const openai = new OpenAI({ apiKey: Deno.env.get('OPENAI_API_KEY')! });
// Claim the next queued job. This select-then-update is not atomic: fine for a
// single cron-driven invocation, but for concurrent workers move the claim into
// a SQL function using FOR UPDATE SKIP LOCKED.
async function nextJob() {
const { data, error } = await supabase
.from('embed_queue')
.select('id, chunk_id')
.eq('status', 'queued')
.order('id', { ascending: true })
.limit(1)
.maybeSingle();
if (error || !data) return null;
await supabase.from('embed_queue').update({ status: 'processing' }).eq('id', data.id);
return data;
}
async function embedText(text: string) {
const model = Deno.env.get('EMBED_MODEL') || 'text-embedding-3-small';
const dims = Number(Deno.env.get('EMBED_DIM') || 1536);
const res = await openai.embeddings.create({ model, input: text, dimensions: dims });
return res.data[0].embedding;
}
async function processJob(job: { id: number; chunk_id: string }) {
const { data: chunk } = await supabase.from('chunks').select('id, text').eq('id', job.chunk_id).single();
if (!chunk) throw new Error('Missing chunk');
const vec = await embedText(chunk.text);
const { error: upErr } = await supabase
.from('chunks')
.update({ embedding: vec, last_embedded_at: new Date().toISOString() })
.eq('id', job.chunk_id);
if (upErr) throw upErr;
await supabase.from('embed_queue').update({ status: 'done' }).eq('id', job.id);
}
Deno.serve(async (req) => {
const payload = await req.json().catch(() => ({}));
if (payload.op === 'drain-queue') {
let processed = 0;
for (let i = 0; i < 50; i++) { // soft cap per invocation
const job = await nextJob();
if (!job) break;
try { await processJob(job); processed++; } catch (e) {
await supabase.from('embed_queue').update({ status: 'error' }).eq('id', job.id);
}
}
return new Response(JSON.stringify({ ok: true, processed }), { headers: { 'Content-Type': 'application/json' } });
}
return new Response(JSON.stringify({ ok: true, note: 'POST {"op":"drain-queue"}' }), { headers: { 'Content-Type': 'application/json' } });
});
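Any caller (Make, n8n, or a pg_net cron ping) triggers a drain by POSTing `{"op":"drain-queue"}` with the service key. A sketch of the request shape the function above expects (the helper name is ours; URL and key are placeholders):

```typescript
// Build the HTTP request that triggers a queue drain on the Edge Function.
function drainRequest(url: string, serviceKey: string) {
  return {
    url, // e.g. https://[PROJECT].functions.supabase.co/[EDGE_FUNCTION_NAME]
    method: "POST" as const,
    headers: {
      Authorization: `Bearer ${serviceKey}`, // [SUPABASE_SERVICE_ROLE_KEY]
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ op: "drain-queue" }),
  };
}
```

Pass the result straight to `fetch(r.url, r)`; the function responds with `{ ok: true, processed: n }`.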
SQL 07 — Retrieval: lexical probe + cosine distance cap
This query enforces a cheap lexical gate and a semantic distance cap, and returns the top‑K chunks with citation fields. Swap the [PLACEHOLDERS] and test in the SQL editor.
-- SQL 07 — Retrieval with guardrails
with
params as (
select
'[TENANT_ID]'::text as tenant_id,
websearch_to_tsquery('english', '[QUERY_TEXT]') as q_ts,
'[RAW_QUERY_FOR_EMBED]'::text as q_text, -- kept for logging; the embedding itself arrives as $1
'[MAX_COSINE_DISTANCE]'::float as max_d,
'[MIN_TS_RANK]'::float as min_rank,
[TOP_K]::int as k
),
query_vec as (
-- Compute the query embedding in your app; pass it as parameter $1::vector
select $1::vector as v
),
lexical as (
select c.id
from chunks c, params p
where c.tenant_id = p.tenant_id
and c.tsv @@ p.q_ts
and ts_rank_cd(c.tsv, p.q_ts) >= p.min_rank -- [MIN_TS_RANK] lexical floor
),
semantic as (
select c.id, c.document_id, c.text,
(c.embedding <=> q.v) as cos_distance
from chunks c
join query_vec q on true
join params p on true
where c.tenant_id = p.tenant_id
and c.id in (select id from lexical) -- lexical gate
and c.embedding is not null
and (c.embedding <=> q.v) <= p.max_d -- distance cap
order by c.embedding <=> q.v
limit (select k from params)
)
select s.*, d.title, d.source_url
from semantic s
join documents d on d.id = s.document_id;
Notes:
- Pass the query embedding as a bind parameter to avoid recomputing in‑DB.
- If `semantic` returns zero rows, skip the LLM call and surface “No confident match.” Log the query to revisit thresholds.
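To sanity-check thresholds outside the database, cosine distance is easy to reproduce client-side. A sketch matching pgvector's `<=>` semantics (1 − cosine similarity); the function names are ours:

```typescript
// Cosine distance as pgvector's `<=>` computes it: 1 - (a·b)/(|a||b|).
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Guardrail: only call the LLM when the best hit clears [MAX_COSINE_DISTANCE].
function passesDistanceGate(bestDistance: number, maxDistance = 0.2): boolean {
  return bestDistance <= maxDistance;
}
```

Run a handful of known-good and known-bad queries through this before tuning `max_d` in production.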
Ingestion — Make.com blueprint (steps + fill‑ins)
This scenario ingests a document, chunks it, compares hashes, and enqueues only changed chunks. Import to Make.com and replace the bracketed values.
- Webhook (Custom): receives `{ tenant_id, document: { url|html|markdown, title, meta } }`.
- Text Normalize (Code step): strip HTML, unaccent, collapse whitespace.
- Chunker (Code step): sliding window using [CHUNK_TOKENS]/[OVERLAP_TOKENS].
- Hash each chunk: `util_sha256(chunk_text)` via Supabase RPC or in‑scenario.
- Upsert the `documents` row by `source_url` + `tenant_id`; update `content_hash`.
- For each chunk:
  - Fetch the existing `content_hash` for `(document_id, ordinal)`.
  - If changed, `POST` to Supabase REST: insert into `chunks` (or update) and insert into `embed_queue` with `enqueue_reason='changed-hash'`.
- Optional: call the Edge Function `drain-queue`.
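The Chunker step above can be sketched as a sliding token window. This uses whitespace tokens as a stand-in for real tokenizer counts (swap in tiktoken or similar for production):

```typescript
// Sliding-window chunker: [CHUNK_TOKENS]-sized windows advancing by
// ([CHUNK_TOKENS] - [OVERLAP_TOKENS]) tokens, so adjacent chunks share context.
function chunkText(
  text: string,
  chunkTokens = 600,
  overlapTokens = 120,
): { ordinal: number; text: string }[] {
  if (overlapTokens >= chunkTokens) throw new Error("overlap must be smaller than chunk size");
  const tokens = text.split(/\s+/).filter(Boolean); // crude "tokens"; use a real tokenizer in prod
  const step = chunkTokens - overlapTokens;
  const chunks: { ordinal: number; text: string }[] = [];
  for (let start = 0, ordinal = 0; start < tokens.length; start += step, ordinal++) {
    chunks.push({ ordinal, text: tokens.slice(start, start + chunkTokens).join(" ") });
    if (start + chunkTokens >= tokens.length) break; // last window reached the end
  }
  return chunks;
}
```

The `ordinal` field maps directly onto the `chunks.ordinal` column, so re-ingesting a document overwrites positions rather than appending.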
Blueprint variables to fill:
- [SUPABASE_REST_URL]
- [SUPABASE_SERVICE_ROLE_KEY]
- [EDGE_FUNCTION_URL]
- [TENANT_ID]
Example: HTTP Upsert (Make HTTP module)
POST [SUPABASE_REST_URL]/rest/v1/chunks?on_conflict=document_id,ordinal
Headers:
apikey: [SUPABASE_SERVICE_ROLE_KEY]
Authorization: Bearer [SUPABASE_SERVICE_ROLE_KEY]
Content-Type: application/json
Prefer: resolution=merge-duplicates
Body: [{
"tenant_id": "[TENANT_ID]",
"document_id": "{{doc_id}}",
"ordinal": {{index}},
"text": "{{chunk_text}}",
"content_hash": "{{chunk_hash}}"
}]
Note: `resolution=merge-duplicates` only merges when the `on_conflict` columns are covered by a unique constraint or index on `(document_id, ordinal)`.
Ingestion — n8n workflow (nodes + placeholders)
Same flow as Make, with n8n nodes. Import the workflow JSON and replace credentials.
Node sketch:
- Webhook → Function (normalize) → Function (chunk) → HTTP Request (upsert `documents`) → IF (changed hash?) → HTTP Request (upsert `chunks`) → HTTP Request (insert `embed_queue`) → HTTP Request (Edge `drain-queue`, optional).
Credential/env placeholders:
- [SUPABASE_REST_URL]
- [SUPABASE_SERVICE_ROLE_KEY]
- [EDGE_FUNCTION_URL]
- [TENANT_ID]
Guard against retries: upserts keyed on `(document_id, ordinal)` are naturally idempotent, so replays overwrite rather than duplicate; an `Idempotency-Key`-style header built from `{{document_id}}-{{ordinal}}-{{content_hash}}` additionally helps correlate retried requests in logs (Supabase REST does not enforce the header itself).
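Both blueprints need the content hash and retry key inside a Code step. A sketch using Node's `crypto` (which Deno also accepts via the `node:` specifier); it mirrors `util_sha256` from SQL 01:

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of the chunk text, matching util_sha256 in SQL 01.
function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Deterministic key per (document, ordinal, content) for retry-safe requests.
function idempotencyKey(documentId: string, ordinal: number, chunkText: string): string {
  return `${documentId}-${ordinal}-${sha256Hex(chunkText)}`;
}
```

Because the SQL and client hashes agree, the changed-hash comparison works no matter which side computed `content_hash`.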
Policy card — chunking, indexes, and guardrails (defaults)
Copy this section into Notion (or your runbook) and adjust per client.
- Chunk sizes
- [CHUNK_TOKENS]=600, [OVERLAP_TOKENS]=120 for FAQs/docs
- Long narrative PDFs: 800/160
- Dense specs: 400/80
- Index choice by table size
  - <10k chunks: no ANN index (exact search), GIN on `tsv`
  - 10k–500k: IVFFlat (lists ≈ √n), probes=10
  - >500k or high‑recall QA: HNSW (m=16, ef_construction=64, ef_search=40)
- Distance guardrails (cosine distance on normalized embeddings)
  - Start `MAX_COSINE_DISTANCE=0.20` and log rejects; tighten to `0.18` if noisy, loosen to `0.22` for sparse domains.
  - Always require lexical overlap via `tsv @@ websearch_to_tsquery` with `ts_rank_cd(tsv, query) >= [MIN_TS_RANK]`.
- Refresh cadence
- Re‑embed on change (hash difference) immediately.
  - Nightly sweep (`enqueue_nightly_refresh`) for misses and aged items > [REFRESH_DAYS]=30.
- Access control
  - Enforce RLS by `tenant_id` on `documents`, `chunks`, and `embed_queue`.
- Logging
- Track: queries blocked by lexical gate, by distance cap, by empty results; LLM calls avoided; average distance of served chunks.
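The logging checklist above reduces to a handful of counters per time window. A minimal sketch (the type and function names are ours):

```typescript
// Outcome of one retrieval attempt, per the guardrail pipeline.
type GateOutcome = "lexical_block" | "distance_block" | "empty" | "served";

// Roll a window of outcomes into the metrics the policy card asks for.
function tally(outcomes: GateOutcome[]) {
  const counts = { lexical_block: 0, distance_block: 0, empty: 0, served: 0 };
  for (const o of outcomes) counts[o]++;
  // Every blocked or empty query is an LLM call you did not pay for.
  const llmCallsAvoided = counts.lexical_block + counts.distance_block + counts.empty;
  return { ...counts, llmCallsAvoided };
}
```

Reviewing `llmCallsAvoided` against `served` weekly tells you whether thresholds are too tight (many blocks on reasonable queries) or too loose (few blocks, noisy answers).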
Cost & sizing quicksheet [copy/paste into client deck]
Use these quick formulas when sizing and estimating.
- Embedding cost (OpenAI small): `$0.02 / 1M tokens` → 1,000 chunks × 500 tokens ≈ 0.5M tokens ≈ `$0.01` per full re‑embed.
- Set [EMBED_MODEL]=`text-embedding-3-small` unless your evals demand larger vectors.
- Vector storage (pgvector): `bytes ≈ 4 × [EMBED_DIM] + 8` → at 1,536 dims ≈ `6,152 bytes` per row before index/row overhead.
- Egress & storage (Supabase):
- Cached egress overage: ~$0.03/GB; uncached: ~$0.09/GB.
- Storage overage: ~$0.021/GB‑month; DB disk beyond plan: ~$0.125/GB.
- Throughput tips:
- Use the Batch API (if available) for large re‑embeds to raise TPM.
  - Raise `ivfflat.probes` or `hnsw.ef_search` only after checking the lexical gate impact.
Replace these with your client’s actual plan numbers before quoting. Cohere Embed pricing varies by account; verify in‑console if you switch providers.
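The quicksheet formulas, worked for a 100k-chunk corpus at 1,536 dims (corpus size and tokens-per-chunk here are illustrative assumptions):

```typescript
// Back-of-envelope sizing using the quicksheet formulas above.
const EMBED_DIM = 1536;
const N_CHUNKS = 100_000;        // assumed corpus size
const TOKENS_PER_CHUNK = 500;    // assumed average

const bytesPerVector = 4 * EMBED_DIM + 8;                    // pgvector row payload
const rawVectorMB = (bytesPerVector * N_CHUNKS) / 1_048_576; // before index overhead
const embedCostUSD = (N_CHUNKS * TOKENS_PER_CHUNK / 1_000_000) * 0.02; // $0.02 / 1M tokens

console.log({ bytesPerVector, rawVectorMB: rawVectorMB.toFixed(1), embedCostUSD });
```

So a full re-embed of 100k chunks runs about a dollar, and raw vectors alone sit near 0.6 GB before HNSW/IVFFlat overhead; double-check against your client's actual plan.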
Migration guardrails — dual‑write and versioning [optional]
Plan for growth without downtime.
- Dual‑write adapter: in your Edge Function, add a feature flag [DUAL_WRITE=true|false] and a second client (e.g., Pinecone/Weaviate). On write, upsert to both.
- Model versioning: add `embed_model` and `embed_version` columns to `chunks`. On re‑embed, write new vectors side‑by‑side; flip readers to `embed_version=[TARGET]` after backfill.
- Exact search fallback: if filters get heavy and recall drops, run a two‑stage query: exact top‑N by keyword → re‑rank by vector locally.
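The dual-write adapter can be a few lines in the Edge Function. A sketch with a hypothetical `VectorStore` interface (not a real Pinecone/Weaviate client API); wire real clients behind it:

```typescript
// Minimal adapter boundary for the dual-write flag.
interface VectorStore {
  upsert(id: string, embedding: number[]): Promise<void>;
}

// Fan each vector write out to a secondary store when [DUAL_WRITE=true].
async function writeVector(
  id: string,
  embedding: number[],
  primary: VectorStore,
  secondary: VectorStore | null, // null when DUAL_WRITE=false
): Promise<void> {
  await primary.upsert(id, embedding);                  // source of truth first
  if (secondary) await secondary.upsert(id, embedding); // mirror during migration
}
```

Writing the primary first means a secondary outage never blocks your source of truth; add retry or a dead-letter log around the mirror write if the migration runs for weeks.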
Version log stub [keep with your project]
Document updates to your build so collaborators know what changed.
- v[VERSION]: [DATE]
- Changed: [WHAT_CHANGED]
- Reason: [WHY]
- Rollback: [HOW]
- Next review on: [DATE+30]