Ernest AI — RAG Chatbot and Voice Tools
Status: Partially superseded — 2026-04-25
Note: The RAG backend described in this document (custom embedding pipeline with Firestore vector search) has been replaced by Vertex AI Search (Discovery Engine). See the ADR 0015 Amendment (2026-04-24) for the current architecture. Voice tools (Layer 2), member management (Layer 3), rate limiting, and prompt governance are unchanged.
This document is the original implementation reference for the Ernest AI feature. The architectural decision is captured in ADR 0015.
Source GitHub issue: justinkowarsch/jjk-workspace#29.
Context
jjk.engineer publishes satirical engineering content under the Ernest Sludge persona. The blog has a growing corpus of posts (markdown → HTML via build pipeline) and an established content pipeline with caching, git history tracking, and static JSON output.
The site currently uses Firebase Functions (v2) with Vertex AI (Gemini) already authenticated via service account for the LinkedIn autoposter feature. Firebase Auth exists but is limited to admin access. Firestore is the primary runtime datastore.
A $1,000 GCP GenAI App Builder credit (expires 2027-04-18) creates a low-risk window to experiment with Vertex AI embedding and vector search at no immediate cost.
The goal is to build an Ernest Sludge AI experience — grounded in actual blog content — that serves as both a genuine product feature and a vehicle for learning enterprise-grade RAG, auth-gated AI, and abuse-resistant API design.
Proposed Decision (Draft)
Implement a Retrieval-Augmented Generation chatbot (“Ask Ernest”) and a suite of Ernest voice tools (email translator, email drafter, gibberish generator) gated behind Firebase Auth member claims.
Work is structured as three independent layers:
- Layer 1 — RAG Chat. Build-time embedding pipeline + `askErnest` callable function. Foundation.
- Layer 2 — Voice Tools. Translator, drafter, gibberish generator. Share Layer 1 auth and rate limiting.
- Layer 3 — Member Onboarding. Registration flow. Deferred; Layer 1 soft-launches via manual admin claims.
All Firestore writes — including build-time embedding persistence — go through callable Cloud Functions per ADR 0003. No Admin SDK carve-out for tools/build.
Alternatives Considered
- Static precomputed related posts. Viable for a “related posts” widget but cannot support dynamic querying, chatbot, or voice tools. May still be worth adding as a secondary output of the embedding pipeline.
- Pinecone or other dedicated vector DB. Unnecessary for this corpus size. Firestore vector search handles hundreds of posts comfortably. Revisit if corpus grows significantly or query latency becomes a concern.
- OpenAI or other embedding providers. GCP credit makes Vertex AI the obvious choice. Same service account already authenticated. No additional credential surface.
- Unauthenticated public Ernest. Rejected. App Check verifies the app, not the user; per-user identity is required for abuse management and quotas.
Consequences
- Meaningful product feature that fits the Ernest Sludge brand.
- Forces member auth infrastructure that has been deferred.
- Build pipeline gains an optional cloud side-effect capability via opt-in flag; isolated from the default build.
- Firestore becomes a secondary source of truth for post metadata alongside static JSON. Acceptable given the `member` flag already lives in frontmatter.
- Vertex AI credit absorbs cost during development and early production.
- Member-gated content retrieval filtering must be correct — a bug here leaks member content to non-members.
- Ernest voice quality depends entirely on prompt quality. Prompts ship as git-reviewed artifacts under `functions/prompts/ernest/`; the persona-review pipeline (`ernest`, `ash`, `vera-redline`) applies to prompts, not to runtime responses — see ADR 0008 and Open Question #2.
Architecture Detail
Layer 1 — RAG Chat
Build-time: embedding pipeline
A new `--embed` flag on the content pipeline triggers a post-transform stage that:
- Strips HTML to plaintext via a dedicated utility (not inline regex).
- Splits each changed post into overlapping chunks (~500 tokens, ~100 token overlap).
- Embeds each chunk via `text-embedding-004` (Vertex AI, 768 dimensions).
- Calls a new `persistChunks` Cloud Function (callable) with the resulting chunks. The function writes to Firestore under `posts/{slug}/chunks/{chunkIndex}` using the Admin SDK, matching the ADR 0003 write-path pattern.

Runs in production only. Processes only `freshlyTransformed` posts (uses the existing cache diff). No effect on static output.
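A minimal sketch of the strip-and-chunk stage, approximating tokens by whitespace-separated words. `stripHtml` and `chunkText` are hypothetical names, and this regex-based stripper is only illustrative: the document specifies a dedicated utility, not inline regex.

```typescript
/** Stand-in for the dedicated HTML → plaintext utility (illustrative only). */
function stripHtml(html: string): string {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, " ") // drop script bodies
    .replace(/<style[\s\S]*?<\/style>/gi, " ")   // drop style bodies
    .replace(/<[^>]+>/g, " ")                    // drop remaining tags
    .replace(/\s+/g, " ")                        // collapse whitespace
    .trim();
}

/** Split plaintext into overlapping chunks of ~size "tokens" (words here). */
function chunkText(text: string, size = 500, overlap = 100): string[] {
  const words = text.split(" ");
  const chunks: string[] = [];
  const step = size - overlap; // each window starts `overlap` words before the previous one ended
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // last window reached the end of the post
  }
  return chunks;
}
```

With the defaults, a 600-word post yields two chunks sharing a 100-word overlap; real token counts from the embedding model's tokenizer would replace the word approximation.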
Firestore document shape
```
posts/{slug}
  title: string
  excerpt: string
  tags: string[]
  date: string
  url: string
  member: boolean          ← gates retrieval for member-only content

  chunks/{chunkIndex}
    text: string
    embedding: vector(768)
    chunkIndex: number
    postSlug: string       ← denormalized for retrieval joins
    postTitle: string      ← denormalized for citation display
    postUrl: string        ← denormalized for citation display
```
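The same shapes, mirrored as hypothetical TypeScript interfaces. Field names come from the document; the concrete types and the example values are assumptions.

```typescript
// Hypothetical TypeScript mirror of the Firestore document shapes.
interface PostDoc {
  title: string;
  excerpt: string;
  tags: string[];
  date: string;      // post date as a string, per the schema above
  url: string;
  member: boolean;   // gates retrieval for member-only content
}

interface ChunkDoc {
  text: string;
  embedding: number[]; // 768-dim vector from text-embedding-004
  chunkIndex: number;
  postSlug: string;    // denormalized for retrieval joins
  postTitle: string;   // denormalized for citation display
  postUrl: string;     // denormalized for citation display
}

// Example document (illustrative values only).
const exampleChunk: ChunkDoc = {
  text: "First chunk of the post…",
  embedding: new Array(768).fill(0),
  chunkIndex: 0,
  postSlug: "example-post",
  postTitle: "Example Post",
  postUrl: "/blog/example-post",
};
```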
Vertex AI auth reuses the existing ADC/service-account pattern from the LinkedIn pipeline. No new credentials.
Runtime: `askErnest` callable function
`onCall`, `enforceAppCheck: true`. Per query:
- Verify the Firebase Auth token carries the `member: true` custom claim.
- Check per-UID daily quota in `ernest_usage/{uid}`.
- Embed the query via `text-embedding-004`.
- Query the Firestore vector index for the top-5 nearest chunks.
- Construct the Ernest prompt from retrieved chunks + user question.
- Call Gemini via Vertex AI with the grounded prompt.
- Return the response + source citations.
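The retrieval step can be illustrated as a brute-force cosine top-k over in-memory chunks. Firestore's vector index performs the nearest-neighbour search server-side, so this is only a stand-in; its purpose is to show that the `member` filter must apply before ranking, never after. All names here are assumptions.

```typescript
interface RankedChunk { postSlug: string; member: boolean; embedding: number[] }

/** Cosine similarity between two equal-length vectors. */
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

/** Top-k nearest chunks, with the member gate applied before ranking. */
function topK(query: number[], chunks: RankedChunk[], isMember: boolean, k = 5): RankedChunk[] {
  return chunks
    .filter((c) => isMember || !c.member)                 // member gate first
    .map((c) => ({ c, score: cosine(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)                    // highest similarity first
    .slice(0, k)
    .map((x) => x.c);
}
```

Filtering first is the property the "member-gated content retrieval" consequence depends on: a non-member query must never see member chunks, even as discarded candidates.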
Rate limiting
```
ernest_usage/{uid}
  date: string          ← YYYY-MM-DD, reset when date changes
  queryCount: number
  lastQuery: timestamp
```
Daily quota ceiling configured via Remote Config (`ernest_daily_limit`, default 10). Matches the existing Remote Config pattern used by the admin idle timeout and LinkedIn autoposter.
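The daily-reset logic can be sketched as a pure function over the document shape above. In production this would run inside a Firestore transaction to avoid racing concurrent queries; the function name and return shape are assumptions.

```typescript
interface UsageDoc { date: string; queryCount: number }

/**
 * Decide whether a query is allowed and compute the updated usage doc.
 * `doc` is null when the user has never queried; `today` is YYYY-MM-DD.
 */
function applyQuota(
  doc: UsageDoc | null,
  today: string,
  limit: number,
): { allowed: boolean; next: UsageDoc } {
  // A new day (or a first-ever query) resets the counter.
  const current = doc && doc.date === today ? doc : { date: today, queryCount: 0 };
  if (current.queryCount >= limit) {
    return { allowed: false, next: current }; // quota exhausted for today
  }
  return { allowed: true, next: { date: today, queryCount: current.queryCount + 1 } };
}
```

Resetting lazily on the date comparison avoids any scheduled cleanup job: the stale document simply becomes invalid the next time the user queries.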
Prompt version control
Prompts under `functions/prompts/ernest/`:
- `chat.md` — RAG chat system prompt
- `translate.md` — email translator
- `draft.md` — email drafter
- `gibberish.md` — gibberish generator

Bundled at deploy time. Changes go through normal git review: `ash` drafts and `vera-redline` reviews before any prompt ships.
Member auth (soft launch)
Manual custom claims via an admin callable (`setMemberClaim`). No registration UI in Layer 1. Defers Layer 3 cleanly.
Layer 2 — Voice Tools
Share Layer 1’s auth, rate limiting, and function infrastructure. Each tool is a distinct prompt file + thin handler — no vector retrieval:
- Email Translator — paste corporate prose, receive Ernest.
- Email Drafter — describe intent, receive Ernest-voiced draft.
- Gibberish Generator — pure Ernest, no input required.
All tools share the `ernest_usage` quota bucket with RAG chat (unified daily limit).
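A sketch of what sharing one bucket means in practice: each tool resolves to its own prompt file, but the quota document path deliberately ignores which tool was called. The prompt paths come from the document; the type, map, and function names are assumptions.

```typescript
type ErnestTool = "chat" | "translate" | "draft" | "gibberish";

// One prompt file per tool, per the prompt layout in this document.
const PROMPT_FILES: Record<ErnestTool, string> = {
  chat: "functions/prompts/ernest/chat.md",
  translate: "functions/prompts/ernest/translate.md",
  draft: "functions/prompts/ernest/draft.md",
  gibberish: "functions/prompts/ernest/gibberish.md",
};

/** One quota document per user, regardless of tool: the unified daily limit. */
function quotaDocPath(uid: string, _tool: ErnestTool): string {
  return `ernest_usage/${uid}`;
}
```

Keeping the tool out of the quota key means a member cannot multiply their daily allowance by spreading queries across the four tools.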
Layer 3 — Member Onboarding
Full member registration — invite system or self-service with approval gate. Separate issue. Layer 1 soft-launches without it via manual claims.
Implementation Sequence
- Firestore vector index setup + `posts` collection schema.
- HTML → plaintext utility.
- Chunking utility.
- `persistChunks` callable Cloud Function (auth: admin only for build-tool invocation).
- `embedContent` pipeline stage behind the `--embed` flag (calls `persistChunks`).
- `ernest_usage` rate-limit helper.
- `setMemberClaim` admin callable (with audit subcollection per ADR 0003).
- `askErnest` callable function (auth + rate limit + retrieval + generation).
- Prompt files: `chat.md` drafted, `vera-redline` reviewed.
- Layer 2 voice tool handlers + prompt files.
- Angular UI — Ask Ernest component (member-gated).
Resolved Decisions (2026-04-19)
All open questions resolved. Full rationale in ADR 0015.
- Firestore rules: All new collections (`posts/{slug}`, `chunks`, `ernest_usage`) are `allow read, write: if false`. Functions-only via Admin SDK.
- Voice governance: Accept the gap, bound the surface. Tight system prompt with refusal patterns. Log queries + responses for quality monitoring. No post-generation filter (doubles Vertex AI cost).
- Orphan chunks: Hash-based diffing in `persistChunks`. SHA-256 per chunk; only re-embed changed chunks; delete orphans inline. Scales with corpus growth (2+ posts/week).
- `setMemberClaim` audit: Firestore trigger writes to an audit subcollection, following the inbox/timeclock pattern.
- Cost envelope: ~$0.30/month at 10 queries/day. The $1,000 credit effectively covers the full experiment window through April 2027.
- Member gate: Hard gate at function entry. Non-members get 403. No free tier at launch. Additive to introduce later.
- ADR lifecycle: Deferred to ADR tooling session (see ADR 0014).
- Feature ADR format: Keep ADRs lean Nygard; implementation detail stays in design docs (this pattern).
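The orphan-chunk decision above (SHA-256 per chunk, re-embed only changes, delete orphans inline) can be sketched as a small diff helper. This is a simplified stand-in for the logic inside `persistChunks`; the function names and the `Map`-based stored-hash representation are assumptions.

```typescript
import { createHash } from "node:crypto";

/** SHA-256 hex digest of a chunk's text. */
function sha256(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

interface ChunkDiff {
  toEmbed: number[];  // chunk indices that are new or whose text changed
  toDelete: number[]; // stored indices beyond the new chunk count (orphans)
}

/** Compare incoming chunk texts against stored hashes keyed by chunkIndex. */
function diffChunks(incoming: string[], stored: Map<number, string>): ChunkDiff {
  const toEmbed: number[] = [];
  incoming.forEach((text, i) => {
    if (stored.get(i) !== sha256(text)) toEmbed.push(i); // missing or changed hash
  });
  const toDelete = [...stored.keys()].filter((i) => i >= incoming.length);
  return { toEmbed, toDelete };
}
```

Only the chunks in `toEmbed` incur an embedding call, which is what keeps the pipeline cheap as the corpus grows at 2+ posts/week.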