OJCLabs

AI Automation

Supabase AI Enrichment Engine.

AI-powered content enrichment system that transforms raw content into structured, SEO-ready database records with automated metadata, internal linking, and topic clustering.

Client: Internal Tooling / Content Operations
Timeline: 4 weeks
Team: 1 engineer
Processing speed: 200 records/hr
Metadata accuracy: 96%
Internal link coverage: 85%
Schema consistency: 100%


System architecture.

How it's built.

| Component | Purpose | Technology | Reasoning |
| --- | --- | --- | --- |
| Data Ingestion | Accept raw content from RSS, scraping, and manual entry | Supabase edge functions + webhooks | Real-time triggers, serverless processing |
| AI Orchestration | Coordinate metadata generation, linking, and categorization | Custom Python + multi-agent logic | Parallel processing, independent task isolation |
| Metadata Generation | Generate SEO titles, descriptions, and keywords | GPT with structured JSON outputs | Consistent schema, high semantic accuracy |
| Internal Linking | Find and inject relevant internal links | Embedding similarity + Supabase pgvector | Semantic matching, not keyword matching |
| Topic Clustering | Categorize content into topic groups | GPT classification + Supabase taxonomy | Automatic category assignment, consistent taxonomy |
| Schema Validation | Enforce data consistency before database writes | Zod validation + Supabase row-level policies | Catch schema errors before they reach production |
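To illustrate the validation layer's role, here is a minimal Python analogue of the pre-write schema gate. The production system uses Zod in TypeScript; the field names and types below are hypothetical, not the actual schema.

```python
# Sketch of a pre-write validation gate. The real system uses Zod (TypeScript);
# these field names and types are illustrative placeholders.
REQUIRED_FIELDS = {
    "title": str,
    "seo_description": str,
    "keywords": list,
    "topic": str,
}

def validate_record(record: dict) -> list[str]:
    """Return a list of schema errors; an empty list means safe to write."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"wrong type for {field}: expected {expected_type.__name__}")
    return errors

ok = {"title": "Example", "seo_description": "A post.", "keywords": ["ai"], "topic": "ai"}
print(validate_record(ok))  # -> []
```

Because every write passes through this gate, a malformed AI output is rejected and retried instead of landing in the database, which is what makes 100% schema consistency possible.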

Engineering process.

How it was built.

Schema Design (Week 1)
  • Defined output schema with all required fields
  • Mapped source data inconsistencies
  • Designed validation rules

Database schema, field definitions, validation spec

AI Pipeline Build (Week 2)
  • Built GPT agents with structured output prompts
  • Implemented Supabase edge functions for ingestion
  • Created metadata generation pipeline

Working metadata enrichment pipeline
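The "structured output prompts" step can be sketched as follows. The schema wording, prompt text, and parsing helper are placeholders for illustration, not the production prompts:

```python
import json

# Hypothetical schema description embedded in every prompt so the model
# returns machine-parseable metadata rather than free text.
METADATA_SCHEMA = {
    "title": "string, max 60 characters",
    "seo_description": "string, max 155 characters",
    "keywords": "array of 3-8 strings",
}

def build_prompt(raw_content: str) -> str:
    """Wrap raw content in a prompt that demands strict JSON output."""
    return (
        "Generate SEO metadata for the article below.\n"
        "Respond with ONLY a JSON object matching this schema:\n"
        f"{json.dumps(METADATA_SCHEMA, indent=2)}\n\n"
        f"ARTICLE:\n{raw_content}"
    )

def parse_metadata(model_output: str):
    """Parse the model's reply; return None if it is not valid JSON."""
    try:
        return json.loads(model_output)
    except json.JSONDecodeError:
        return None

print(parse_metadata('{"title": "Hello"}'))  # -> {'title': 'Hello'}
```

Pinning the schema inside the prompt is what keeps outputs consistent enough for downstream validation; anything that fails to parse is treated the same as a schema failure.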

Linking & Clustering (Week 3)
  • Integrated pgvector for semantic similarity
  • Built internal link injection logic
  • Implemented topic clustering and taxonomy

Internal linking + categorization system

Validation & Deploy (Week 4)
  • Built Zod schema validation layer
  • Implemented error logging and retry
  • Stress-tested with 500+ records
  • Deployed to production

Production-ready enrichment engine


Engineering challenges.

What broke. How we fixed it.

Schema Consistency at Scale

Problem

Raw content from multiple sources had inconsistent structure. AI outputs varied in format. Database schema rejections were frequent.

Constraint

Cannot sanitize inputs manually at 200 records/hour. AI is non-deterministic by nature.

Solution

Strict JSON schema in every prompt. Zod validation before every database write. Multi-pass generation: generate → validate → regenerate if invalid.
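The generate → validate → regenerate loop can be sketched like this. `generate_fn` and `validate_fn` are stand-ins for the GPT call and the Zod check, so this is a shape sketch rather than the actual pipeline code:

```python
def enrich_with_retries(raw, generate_fn, validate_fn, max_passes=3):
    """Generate a candidate record, validate it, regenerate if invalid.

    generate_fn and validate_fn are placeholders for the GPT call and the
    schema check (Zod in the production system).
    """
    for _ in range(max_passes):
        candidate = generate_fn(raw)
        if validate_fn(candidate):
            return candidate  # valid: safe to write to the database
    raise ValueError(f"record still invalid after {max_passes} passes")

# Demo: a generator that fails its first pass, then succeeds on the second.
calls = []
def flaky_generate(raw):
    calls.append(raw)
    return {"title": raw} if len(calls) >= 2 else {}

result = enrich_with_retries("hello world", flaky_generate, lambda r: "title" in r)
print(result)  # -> {'title': 'hello world'}
```

The 94% first-pass accuracy figure means most records exit this loop on the first iteration; the retries exist to absorb the non-deterministic tail.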

Outcome

Schema consistency: 100% (all records written). First-pass accuracy: 94%. Rejections: zero in production.

Internal Link Accuracy

Problem

Keyword-based linking surfaced irrelevant related articles. Poor link quality reduced SEO value.

Constraint

Keyword matching fails with synonyms and conceptual relationships. Manual curation not viable at scale.

Solution

Switched to embedding-based similarity using Supabase pgvector. Each article embedded on ingest. Links generated by cosine similarity, not keyword overlap.
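Cosine similarity over embeddings is the core of this step. In production, pgvector computes it inside Postgres (via its cosine-distance operator), but the math is simple enough to sketch with the standard library; the article IDs and vectors below are made up:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_links(query_vec, articles, k=3):
    """Rank candidate articles by similarity to the query article's embedding.

    `articles` is a list of (article_id, embedding) pairs; all embeddings are
    assumed to come from the same model used at ingest time.
    """
    ranked = sorted(
        articles,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [article_id for article_id, _ in ranked[:k]]

candidates = [("pgvector-guide", [0.9, 0.1]), ("cooking-tips", [0.1, 0.9])]
print(top_links([1.0, 0.0], candidates, k=1))  # -> ['pgvector-guide']
```

Because similarity is computed over meaning rather than surface terms, two articles about the same concept link to each other even when they share no keywords, which is what lifted relevance from 60% to 85%.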

Outcome

Internal link relevance: 60% (keyword) → 85% (embedding). Coverage: 85% of articles have 3+ relevant links.


Measured impact.

Results. Numbers only.

Processing performance

Processing speed: 200 records/hour

Schema consistency: 100%

Manual enrichment time: 10–15 min/article → <1 min

Error rate: <2%

Content quality

Metadata accuracy: 96% (human validation)

Internal link coverage: 85% of content

Topic clustering accuracy: 91%

SEO field completion: 100%




Get started

Need similar architecture?

We build systems for operators serious about scale. If you're ready to invest in infrastructure that compounds, let's design your system.

Start a diagnostic | Explore all systems