OJCLabs

AI Automation

Supabase AI Enrichment Engine.

AI-powered content enrichment system that transforms raw content into structured, SEO-ready database records with automated metadata, internal linking, and topic clustering.

Client: Internal Tooling / Content Operations
Timeline: 4 weeks
Team: 1 engineer
Processing speed: 200 records/hr
Metadata accuracy: 96%
Internal link coverage: 85%
Schema consistency: 100%


System architecture.

How it's built.

| Component | Purpose | Technology | Reasoning |
| --- | --- | --- | --- |
| Data Ingestion | Accept raw content from RSS, scraping, and manual entry | Supabase edge functions + webhooks | Real-time triggers, serverless processing |
| AI Orchestration | Coordinate metadata generation, linking, and categorization | Custom Python + multi-agent logic | Parallel processing, independent task isolation |
| Metadata Generation | Generate SEO titles, descriptions, and keywords | GPT with structured JSON outputs | Consistent schema, high semantic accuracy |
| Internal Linking | Find and inject relevant internal links | Embedding similarity + Supabase pgvector | Semantic matching, not keyword matching |
| Topic Clustering | Categorize content into topic groups | GPT classification + Supabase taxonomy | Automatic category assignment, consistent taxonomy |
| Schema Validation | Enforce data consistency before database writes | Zod validation + Supabase row-level policies | Catch schema errors before they reach production |
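To illustrate the validation layer's role, here is a minimal Python analogue of the pre-write schema gate. The production system uses Zod in TypeScript; the field names and types below are hypothetical, not the actual schema.

```python
# Sketch of a pre-write validation gate. The real system uses Zod (TypeScript);
# these field names and types are illustrative placeholders.
REQUIRED_FIELDS = {
    "title": str,
    "seo_description": str,
    "keywords": list,
    "topic": str,
}

def validate_record(record: dict) -> list[str]:
    """Return a list of schema errors; an empty list means safe to write."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"wrong type for {field}: expected {expected_type.__name__}")
    return errors

ok = {"title": "Example", "seo_description": "A post.", "keywords": ["ai"], "topic": "ai"}
print(validate_record(ok))  # -> []
```

Because every write passes through this gate, a malformed AI output is rejected and retried instead of landing in the database, which is what makes 100% schema consistency possible.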

Engineering process.

How it was built.

Schema Design (Week 1)
  • Defined output schema with all required fields
  • Mapped source data inconsistencies
  • Designed validation rules

Database schema, field definitions, validation spec

AI Pipeline Build (Week 2)
  • Built GPT agents with structured output prompts
  • Implemented Supabase edge functions for ingestion
  • Created metadata generation pipeline

Working metadata enrichment pipeline
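The "structured output prompts" step can be sketched as follows. The schema wording, prompt text, and parsing helper are placeholders for illustration, not the production prompts:

```python
import json

# Hypothetical schema description embedded in every prompt so the model
# returns machine-parseable metadata rather than free text.
METADATA_SCHEMA = {
    "title": "string, max 60 characters",
    "seo_description": "string, max 155 characters",
    "keywords": "array of 3-8 strings",
}

def build_prompt(raw_content: str) -> str:
    """Wrap raw content in a prompt that demands strict JSON output."""
    return (
        "Generate SEO metadata for the article below.\n"
        "Respond with ONLY a JSON object matching this schema:\n"
        f"{json.dumps(METADATA_SCHEMA, indent=2)}\n\n"
        f"ARTICLE:\n{raw_content}"
    )

def parse_metadata(model_output: str):
    """Parse the model's reply; return None if it is not valid JSON."""
    try:
        return json.loads(model_output)
    except json.JSONDecodeError:
        return None

print(parse_metadata('{"title": "Hello"}'))  # -> {'title': 'Hello'}
```

Pinning the schema inside the prompt is what keeps outputs consistent enough for downstream validation; anything that fails to parse is treated the same as a schema failure.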

Linking & Clustering (Week 3)
  • Integrated pgvector for semantic similarity
  • Built internal link injection logic
  • Implemented topic clustering and taxonomy

Internal linking + categorization system

Validation & Deploy (Week 4)
  • Built Zod schema validation layer
  • Implemented error logging and retry
  • Stress-tested with 500+ records
  • Deployed to production

Production-ready enrichment engine


Engineering challenges.

What broke. How we fixed it.

Schema Consistency at Scale

Problem

Raw content from multiple sources had inconsistent structure. AI outputs varied in format. Database schema rejections were frequent.

Constraint

Cannot sanitize inputs manually at 200 records/hour. AI is non-deterministic by nature.

Solution

Strict JSON schema in every prompt. Zod validation before every database write. Multi-pass generation: generate → validate → regenerate if invalid.
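The generate → validate → regenerate loop can be sketched like this. `generate_fn` and `validate_fn` are stand-ins for the GPT call and the Zod check, so this is a shape sketch rather than the actual pipeline code:

```python
def enrich_with_retries(raw, generate_fn, validate_fn, max_passes=3):
    """Generate a candidate record, validate it, regenerate if invalid.

    generate_fn and validate_fn are placeholders for the GPT call and the
    schema check (Zod in the production system).
    """
    for _ in range(max_passes):
        candidate = generate_fn(raw)
        if validate_fn(candidate):
            return candidate  # valid: safe to write to the database
    raise ValueError(f"record still invalid after {max_passes} passes")

# Demo: a generator that fails its first pass, then succeeds on the second.
calls = []
def flaky_generate(raw):
    calls.append(raw)
    return {"title": raw} if len(calls) >= 2 else {}

result = enrich_with_retries("hello world", flaky_generate, lambda r: "title" in r)
print(result)  # -> {'title': 'hello world'}
```

The 94% first-pass accuracy figure means most records exit this loop on the first iteration; the retries exist to absorb the non-deterministic tail.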

Outcome

Schema consistency: 100% (all records written). First-pass accuracy: 94%. Rejections: zero in production.

Internal Link Accuracy

Problem

Keyword-based linking surfaced irrelevant related articles. Poor link quality reduced SEO value.

Constraint

Keyword matching fails with synonyms and conceptual relationships. Manual curation not viable at scale.

Solution

Switched to embedding-based similarity using Supabase pgvector. Each article embedded on ingest. Links generated by cosine similarity, not keyword overlap.
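Cosine similarity over embeddings is the core of this step. In production, pgvector computes it inside Postgres (via its cosine-distance operator), but the math is simple enough to sketch with the standard library; the article IDs and vectors below are made up:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_links(query_vec, articles, k=3):
    """Rank candidate articles by similarity to the query article's embedding.

    `articles` is a list of (article_id, embedding) pairs; all embeddings are
    assumed to come from the same model used at ingest time.
    """
    ranked = sorted(
        articles,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [article_id for article_id, _ in ranked[:k]]

candidates = [("pgvector-guide", [0.9, 0.1]), ("cooking-tips", [0.1, 0.9])]
print(top_links([1.0, 0.0], candidates, k=1))  # -> ['pgvector-guide']
```

Because similarity is computed over meaning rather than surface terms, two articles about the same concept link to each other even when they share no keywords, which is what lifted relevance from 60% to 85%.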

Outcome

Internal link relevance: 60% (keyword) → 85% (embedding). Coverage: 85% of articles have 3+ relevant links.


Measured impact.

Results. Numbers only.

Processing performance

Processing speed: 200 records/hour

Schema consistency: 100%

Manual enrichment time: 10–15 min/article → <1 min

Error rate: <2%

Content quality

Metadata accuracy: 96% (human validation)

Internal link coverage: 85% of content

Topic clustering accuracy: 91%

SEO field completion: 100%




Get started

Need similar architecture?

We build systems for operators serious about scale. If you're ready to invest in infrastructure that compounds, let's design your system.

Start a diagnostic | Explore all systems