15. Ernest AI — RAG Chatbot and Voice Tools
Status
Accepted — 2026-04-19
Context
jjk.engineer publishes satirical engineering content under the Ernest Sludge persona. The blog has a growing corpus of posts (2+ per week, markdown to HTML via build pipeline) and an established content pipeline with caching and static JSON output.
The site already uses Firebase Functions (v2) with Vertex AI (Gemini) authenticated via service account for the LinkedIn autoposter. Firebase Auth exists but is limited to admin access. Firestore is the primary runtime datastore.
A $1,000 GCP GenAI App Builder credit (expires 2027-04-18) creates a low-risk window to experiment with Vertex AI embedding and vector search. At projected usage (~10 queries/day, ~$0.30/month), the credit covers the full development and early production period.
The goal is to build an Ernest Sludge AI experience grounded in actual blog content that serves as both a product feature and a vehicle for learning enterprise-grade RAG, auth-gated AI, and abuse-resistant API design.
Decision
Implement a Retrieval-Augmented Generation chatbot (“Ask Ernest”) and a suite of Ernest voice tools (email translator, email drafter, gibberish generator) gated behind Firebase Auth member claims. Work is structured as three independent layers:
- Layer 1 — RAG Chat. Build-time embedding pipeline + `askErnest` callable function. Foundation for everything else.
- Layer 2 — Voice Tools. Translator, drafter, gibberish generator. Share Layer 1 auth and rate-limiting infrastructure.
- Layer 3 — Member Onboarding. Registration flow. Deferred; Layer 1 soft-launches via manual admin claims (`setMemberClaim`).
Key decisions made:
- Hard member gate at function entry. Non-members receive 403. No free tier, no query-time content filtering. Simplest correct implementation; a free tier can be added later without rewrite.
- All Firestore writes go through Cloud Functions per ADR 0003. No Admin SDK carve-out for the build pipeline; a `persistChunks` callable receives chunks and writes them.
- Hash-based chunk diffing in `persistChunks`. Each chunk is SHA-256 hashed; only changed chunks are re-embedded and written; orphaned chunks are deleted. Avoids unnecessary embedding cost as the corpus grows (2+ posts/week, projected 200+ posts within two years). See the second sketch after this list.
- Ernest voice governance: accept the gap, bound the surface. Prompts go through `ash/vera-redline` review. LLM responses do not. Mitigated by a tight system prompt with explicit refusal patterns for off-topic/off-brand requests. Queries and responses are logged to an audit collection for quality monitoring. A post-generation filter can be added later if quality drifts.
- New Firestore collections are functions-only. `posts/{slug}`, `posts/{slug}/chunks/{chunkIndex}`, `ernest_usage/{uid}` — all `allow read, write: if false` in security rules. No client access. All reads and writes go through callable functions via the Admin SDK.
- Rate limiting via `ernest_usage/{uid}` with a daily quota ceiling from Remote Config (`ernest_daily_limit`, default 10). Unified across RAG chat and voice tools. See the first sketch after this list.
- Prompt version control under `functions/prompts/ernest/` — plain markdown, git-reviewed, bundled at deploy time.
- `setMemberClaim` gets an audit trail via a Firestore trigger, following the existing inbox/timeclock audit pattern.
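As a rough illustration of the member gate and rate-limit decisions above, the `askErnest` entry path could look like the following sketch, assuming Firebase Functions v2 callables and the Admin SDK. The usage-document fields, the hard-coded daily limit (read from Remote Config `ernest_daily_limit` in the actual design), and the elided retrieval step are illustrative assumptions, not code from the repository.

```typescript
import { onCall, HttpsError } from "firebase-functions/v2/https";
import { getFirestore, FieldValue } from "firebase-admin/firestore";

// In the actual design this ceiling comes from Remote Config
// (ernest_daily_limit, default 10); hard-coded here for brevity.
const DAILY_LIMIT = 10;

export const askErnest = onCall(async (request) => {
  // Hard member gate at function entry: non-members get permission-denied (403).
  const auth = request.auth;
  if (!auth || auth.token.member !== true) {
    throw new HttpsError("permission-denied", "Members only.");
  }

  // Unified rate limiting: one ernest_usage doc per uid, reset each day.
  const db = getFirestore();
  const usageRef = db.collection("ernest_usage").doc(auth.uid);
  const today = new Date().toISOString().slice(0, 10);

  const remaining = await db.runTransaction(async (tx) => {
    const snap = await tx.get(usageRef);
    const usage =
      snap.exists && snap.data()?.date === today
        ? (snap.data() as { date: string; count: number })
        : { date: today, count: 0 };

    if (usage.count >= DAILY_LIMIT) {
      throw new HttpsError("resource-exhausted", "Daily Ernest quota reached.");
    }

    tx.set(usageRef, {
      date: today,
      count: usage.count + 1,
      updatedAt: FieldValue.serverTimestamp(),
    });
    return DAILY_LIMIT - (usage.count + 1);
  });

  // Retrieval + generation omitted; the response always reports quota left.
  return { response: "...", citations: [], remaining };
});
```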
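The hash-based diffing inside `persistChunks` (the pre-amendment design) could be sketched as follows: each incoming chunk is SHA-256 hashed, unchanged chunks are skipped, changed chunks are re-embedded and rewritten, and orphans are deleted. Document field names and the `embedText` helper are hypothetical.

```typescript
import { createHash } from "node:crypto";
import { getFirestore } from "firebase-admin/firestore";

interface IncomingChunk {
  chunkIndex: number;
  text: string;
}

// Hypothetical embedding call (text-embedding-004 via Vertex AI in the
// original design); not implemented in this sketch.
async function embedText(text: string): Promise<number[]> {
  throw new Error(`embedding omitted in sketch (${text.length} chars)`);
}

export async function persistChunksForPost(slug: string, chunks: IncomingChunk[]) {
  const db = getFirestore();
  const col = db.collection("posts").doc(slug).collection("chunks");

  // Load the stored hash for every existing chunk of this post.
  const existing = new Map<string, string>();
  (await col.get()).forEach((doc) => existing.set(doc.id, doc.data().hash));

  const seen = new Set<string>();
  for (const chunk of chunks) {
    const id = String(chunk.chunkIndex);
    const hash = createHash("sha256").update(chunk.text).digest("hex");
    seen.add(id);

    // Unchanged chunk: skip the embedding call and the write entirely.
    if (existing.get(id) === hash) continue;

    // The real pipeline would store this as a Firestore vector value for
    // findNearest(); a plain array keeps the sketch simple.
    const embedding = await embedText(chunk.text);
    await col.doc(id).set({ text: chunk.text, hash, embedding });
  }

  // Delete orphaned chunks (present in Firestore, absent from the source post).
  for (const id of existing.keys()) {
    if (!seen.has(id)) await col.doc(id).delete();
  }
}
```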
Consequences
- Meaningful product feature that fits the Ernest Sludge brand and drives member engagement.
- Forces member auth infrastructure that has been deferred, starting with manual claims and graduating to a registration flow.
- Build pipeline gains an optional cloud side-effect (`--embed` flag) isolated from the default build. CI does not run `--embed`.
- Firestore becomes a secondary source of truth for post metadata alongside static JSON. Acceptable given the `member` flag already lives in frontmatter.
- Vertex AI credit absorbs cost during development and early production. At projected usage the credit outlasts the experiment window.
- Hash-based chunk diffing adds modest complexity to `persistChunks` but scales cleanly with corpus growth and avoids re-embedding unchanged content.
- Ernest voice quality depends on prompt quality and system prompt boundaries. Runtime responses bypass the persona-review pipeline — this gap is accepted, documented, and monitored.
- The hard member gate means no free tier exists at launch. Non-members see a teaser UI, not a degraded experience. Adding a free tier later is additive.
See also: ADR 0003 — Cloud Functions Canonical Write Path, ADR 0008 — Ernest Sludge Governance Constraint. Implementation detail in designs/ernest-ai.md. Tracked in GitHub issue #29.
Amendment — 2026-04-24
RAG Backend Migration: Firestore Vector Search → Vertex AI Search (Discovery Engine)
The original decision used a custom embedding pipeline (Firestore vector index with text-embedding-004, 768 dimensions, hash-based chunk diffing) to power the RAG chatbot. This is being replaced by Vertex AI Search (Discovery Engine), a managed GCP service that handles document ingestion, chunking, indexing, and grounded answer generation.
What changes:
- The custom embedding pipeline (`syncErnestEmbeddings`, `triggerErnestSync`, `triggerErnestSyncWebhook`, `persistChunks`) is removed.
- Firestore collections `posts/{slug}` and `posts/{slug}/chunks/{chunkIndex}` are deprecated. The Firestore vector index is removed.
- A GCS bucket stores raw Markdown content. Cloud Build syncs `content/posts/` to the bucket on each blog deploy. Discovery Engine indexes the bucket automatically.
- The `askErnest` handler calls the Discovery Engine Search/Answer API instead of Firestore `findNearest()`. The system prompt (`functions/prompts/ernest/chat.md`) is passed as the `preamble` parameter at query time. See the sketch after this list.
- Shared constants for embedding dimensions, chunk size, and embedding model are removed.
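A minimal sketch of the migrated retrieval call, assuming the Discovery Engine v1 `:answer` REST endpoint and `google-auth-library` for service-account auth. The project/engine identifiers, prompt file path, and response parsing are placeholders rather than verified names from the codebase; quota accounting (`remaining`) stays in the callable wrapper and is omitted here.

```typescript
import { GoogleAuth } from "google-auth-library";
import { readFileSync } from "node:fs";

// Placeholder serving config; the real IDs live in function configuration.
const SERVING_CONFIG =
  "projects/PROJECT_ID/locations/global/collections/default_collection" +
  "/engines/ENGINE_ID/servingConfigs/default_serving_config";

export async function askDiscoveryEngine(question: string) {
  // functions/prompts/ernest/chat.md, bundled at deploy time
  // (path shown relative to the functions package root).
  const preamble = readFileSync("prompts/ernest/chat.md", "utf8");

  const auth = new GoogleAuth({
    scopes: "https://www.googleapis.com/auth/cloud-platform",
  });
  const client = await auth.getClient();

  const res = await client.request<any>({
    url: `https://discoveryengine.googleapis.com/v1/${SERVING_CONFIG}:answer`,
    method: "POST",
    data: {
      query: { text: question },
      answerGenerationSpec: {
        promptSpec: { preamble }, // Ernest system prompt injected per query
        includeCitations: true,
      },
    },
  });

  // Simplified parsing: pull the grounded answer text and its citations.
  const answer = res.data?.answer ?? {};
  return {
    response: answer.answerText ?? "",
    citations: answer.citations ?? [],
  };
}
```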
What does not change:
- The member gate, rate limiting (`ernest_usage/{uid}`), and auth model remain identical.
- Voice tools (translate, draft, gibberish) are unaffected — they are pure Gemini calls with no RAG.
- Prompt files remain in `functions/prompts/ernest/` under git version control, reviewed by the persona pipeline.
- The response shape (`{ response, citations, remaining }`) is preserved; a rough typing follows this list.
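For reference, the preserved contract could be typed roughly as below. Only the three field names come from this ADR; the value types and the `Citation` shape are assumptions.

```typescript
// Rough typing of the preserved askErnest response contract.
interface Citation {
  title?: string; // source post title (assumed)
  uri?: string;   // link back to the post (assumed)
}

interface AskErnestResponse {
  response: string;      // Ernest's grounded answer text
  citations: Citation[]; // sources backing the answer
  remaining: number;     // queries left in today's quota
}
```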
Why: Discovery Engine provides managed chunking, embedding, indexing, and retrieval — replacing ~400 lines of custom pipeline code with a single API call. The $1,000 GCP GenAI App Builder credit covers the cost. See ADR 0018 for the broader GCP platform services adoption.