15. Ernest AI — RAG Chatbot and Voice Tools

Status

Accepted — 2026-04-19

Context

jjk.engineer publishes satirical engineering content under the Ernest Sludge persona. The blog has a growing corpus of posts (2+ per week, Markdown rendered to HTML by the build pipeline) and an established content pipeline with caching and static JSON output.

The site already uses Firebase Functions (v2) with Vertex AI (Gemini) authenticated via service account for the LinkedIn autoposter. Firebase Auth exists but is limited to admin access. Firestore is the primary runtime datastore.

A $1,000 GCP GenAI App Builder credit (expires 2027-04-18) creates a low-risk window to experiment with Vertex AI embedding and vector search. At projected usage (~10 queries/day, ~$0.30/month), the credit covers the full development and early production period.

The goal is to build an Ernest Sludge AI experience grounded in actual blog content that serves as both a product feature and a vehicle for learning enterprise-grade RAG, auth-gated AI, and abuse-resistant API design.

Decision

Implement a Retrieval-Augmented Generation chatbot (“Ask Ernest”) and a suite of Ernest voice tools (email translator, email drafter, gibberish generator) gated behind Firebase Auth member claims. Work is structured as three independent layers:

  • Layer 1 — RAG Chat. Build-time embedding pipeline + askErnest callable function. Foundation for everything else.
  • Layer 2 — Voice Tools. Translator, drafter, gibberish generator. Share Layer 1 auth and rate-limiting infrastructure.
  • Layer 3 — Member Onboarding. Registration flow. Deferred; Layer 1 soft-launches via manual admin claims (setMemberClaim).

Key decisions made:

  • Hard member gate at function entry. Non-members receive 403. No free tier, no query-time content filtering. This is the simplest correct implementation; a free tier can be added later without a rewrite.
  • All Firestore writes go through Cloud Functions per ADR 0003. No Admin SDK carve-out for the build pipeline; a persistChunks callable receives chunks and writes them.
  • Hash-based chunk diffing in persistChunks. Each chunk is SHA-256 hashed; only changed chunks are re-embedded and written; orphaned chunks are deleted. Avoids unnecessary embedding cost as the corpus grows (2+ posts/week, projected 200+ posts within two years).
  • Ernest voice governance: accept the gap, bound the surface. Prompts go through ash/vera-redline review. LLM responses do not. Mitigated by a tight system prompt with explicit refusal patterns for off-topic/off-brand requests. Queries and responses are logged to an audit collection for quality monitoring. A post-generation filter can be added later if quality drifts.
  • New Firestore collections are functions-only. posts/{slug}, posts/{slug}/chunks/{chunkIndex}, ernest_usage/{uid} — all allow read, write: if false in security rules. No client access. All reads and writes go through callable functions via Admin SDK.
  • Rate limiting via ernest_usage/{uid} with daily quota ceiling from Remote Config (ernest_daily_limit, default 10). Unified across RAG chat and voice tools.
  • Prompt version control under functions/prompts/ernest/ — plain markdown, git-reviewed, bundled at deploy time.
  • setMemberClaim gets an audit trail via Firestore trigger, following the existing inbox/timeclock audit pattern.
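The member gate and unified rate limit described above can be sketched as pure decision logic. This is an illustrative sketch, not the actual codebase: the helper name checkQuota and the shape of the claims object are assumptions; in the real callable, a failed check would surface as the 403 mentioned above, and usedToday would be read from ernest_usage/{uid}.

```typescript
// Hypothetical sketch of the hard member gate + daily quota check.
// Names (checkQuota, member claim key) are illustrative assumptions.

export interface QuotaResult {
  allowed: boolean;
  remaining: number;
}

// A request passes only if the caller holds the member custom claim
// and has quota left under the Remote Config ceiling
// (ernest_daily_limit, default 10).
export function checkQuota(
  claims: Record<string, unknown> | undefined,
  usedToday: number,
  dailyLimit: number,
): QuotaResult {
  if (!claims || claims.member !== true) {
    // Hard gate: non-members are rejected before any model call.
    return { allowed: false, remaining: 0 };
  }
  const remaining = Math.max(0, dailyLimit - usedToday);
  return { allowed: remaining > 0, remaining };
}
```

Keeping the decision pure (claims + counter in, verdict out) lets both askErnest and the voice tools share it, which is what makes the quota "unified" across Layer 1 and Layer 2.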
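The hash-based chunk diffing inside persistChunks can likewise be sketched. Assumptions: the function names (hashChunk, diffChunks) and the chunkId-to-hash map shapes are hypothetical; the real handler would additionally re-embed the changed chunks and issue the Firestore writes and deletes.

```typescript
// Illustrative sketch of hash-based chunk diffing for persistChunks.
import { createHash } from "node:crypto";

// SHA-256 of a chunk's text, hex-encoded, as stored alongside each chunk.
export const hashChunk = (text: string): string =>
  createHash("sha256").update(text, "utf8").digest("hex");

export interface ChunkDiff {
  changed: string[]; // chunk ids to re-embed and write
  orphaned: string[]; // chunk ids to delete
}

// Compare hashes previously persisted in Firestore against hashes
// computed from the current build's chunks.
export function diffChunks(
  stored: Map<string, string>, // chunkId -> sha256 already persisted
  incoming: Map<string, string>, // chunkId -> sha256 from this build
): ChunkDiff {
  const changed: string[] = [];
  const orphaned: string[] = [];
  for (const [id, hash] of incoming) {
    if (stored.get(id) !== hash) changed.push(id); // new or modified
  }
  for (const id of stored.keys()) {
    if (!incoming.has(id)) orphaned.push(id); // post shrank or was removed
  }
  return { changed, orphaned };
}
```

Only the changed set incurs embedding cost, which is the point: at 2+ posts/week, re-embedding the whole corpus on every build would grow linearly while the diff stays proportional to what actually changed.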

Consequences

  • Meaningful product feature that fits the Ernest Sludge brand and drives member engagement.
  • Forces build-out of the member auth infrastructure that has been deferred, starting with manual claims and graduating to a registration flow.
  • Build pipeline gains an optional cloud side-effect (--embed flag) isolated from the default build. CI does not run --embed.
  • Firestore becomes a secondary source of truth for post metadata alongside static JSON. Acceptable given the member flag already lives in frontmatter.
  • Vertex AI credit absorbs cost during development and early production. At projected usage the credit outlasts the experiment window.
  • Hash-based chunk diffing adds modest complexity to persistChunks but scales cleanly with corpus growth and avoids re-embedding unchanged content.
  • Ernest voice quality depends on prompt quality and system prompt boundaries. Runtime responses bypass the persona-review pipeline — this gap is accepted, documented, and monitored.
  • The hard member gate means no free tier exists at launch. Non-members see a teaser UI, not a degraded experience. Adding a free tier later is additive.

See also: ADR 0003 — Cloud Functions Canonical Write Path, ADR 0008 — Ernest Sludge Governance Constraint. Implementation detail in designs/ernest-ai.md. Tracked in GitHub issue #29.

Amendment — 2026-04-24

RAG Backend Migration: Firestore Vector Search → Vertex AI Search (Discovery Engine)

The original decision used a custom embedding pipeline (Firestore vector index with text-embedding-004, 768 dimensions, hash-based chunk diffing) to power the RAG chatbot. This is being replaced by Vertex AI Search (Discovery Engine), a managed GCP service that handles document ingestion, chunking, indexing, and grounded answer generation.

What changes:

  • The custom embedding pipeline (syncErnestEmbeddings, triggerErnestSync, triggerErnestSyncWebhook, persistChunks) is removed.
  • Firestore collections posts/{slug} and posts/{slug}/chunks/{chunkIndex} are deprecated. The Firestore vector index is removed.
  • A GCS bucket stores raw Markdown content. Cloud Build syncs content/posts/ to the bucket on each blog deploy. Discovery Engine indexes the bucket automatically.
  • The askErnest handler calls the Discovery Engine Search/Answer API instead of Firestore findNearest(). The system prompt (functions/prompts/ernest/chat.md) is passed as the preamble parameter at query time.
  • Shared constants for embedding dimensions, chunk size, and embedding model are removed.
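The askErnest change above reduces retrieval to one managed call. A sketch of how the handler might assemble the Discovery Engine answer request, with the chat.md system prompt passed as the preamble; the field names follow the public v1 answer API, but the helper name, serving-config path, and request shape should be treated as illustrative rather than the project's actual code.

```typescript
// Hypothetical request builder for the Discovery Engine answer API.
// The preamble carries the Ernest system prompt at query time.
export function buildAnswerRequest(
  servingConfig: string, // full resource path of the serving config
  question: string,
  preamble: string, // contents of functions/prompts/ernest/chat.md
) {
  return {
    url: `https://discoveryengine.googleapis.com/v1/${servingConfig}:answer`,
    body: {
      query: { text: question },
      answerGenerationSpec: {
        promptSpec: { preamble }, // persona injected per-query, not per-index
      },
    },
  };
}

// In the callable, this request would be POSTed with a service-account
// OAuth token; citations from the answer payload map onto the preserved
// { response, citations, remaining } response shape.
```

Passing the persona as a query-time preamble (rather than baking it into the index) is what keeps the git-reviewed prompt files authoritative even after the custom pipeline is gone.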

What does not change:

  • The member gate, rate limiting (ernest_usage/{uid}), and auth model remain identical.
  • Voice tools (translate, draft, gibberish) are unaffected — they are pure Gemini calls with no RAG.
  • Prompt files remain in functions/prompts/ernest/ under git version control, reviewed by the persona pipeline.
  • The response shape ({ response, citations, remaining }) is preserved.

Why: Discovery Engine provides managed chunking, embedding, indexing, and retrieval — replacing ~400 lines of custom pipeline code with a single API call. The $1,000 GCP GenAI App Builder credit covers the cost. See ADR 0018 for the broader GCP platform services adoption.