Ernest AI — RAG Chatbot and Voice Tools

Status: Partially superseded — 2026-04-25

Note: The RAG backend described in this document (custom embedding pipeline with Firestore vector search) has been replaced by Vertex AI Search (Discovery Engine). See the ADR 0015 Amendment (2026-04-24) for the current architecture. Voice tools (Layer 2), member management (Layer 3), rate limiting, and prompt governance are unchanged.

This document is the original implementation reference for the Ernest AI feature. The architectural decision is captured in ADR 0015.

Source GitHub issue: justinkowarsch/jjk-workspace#29.


Context

jjk.engineer publishes satirical engineering content under the Ernest Sludge persona. The blog has a growing corpus of posts (markdown → HTML via build pipeline) and an established content pipeline with caching, git history tracking, and static JSON output.

The site currently uses Firebase Functions (v2) with Vertex AI (Gemini) already authenticated via service account for the LinkedIn autoposter feature. Firebase Auth exists but is limited to admin access. Firestore is the primary runtime datastore.

A $1,000 GCP GenAI App Builder credit (expires 2027-04-18) creates a low-risk window to experiment with Vertex AI embedding and vector search at no immediate cost.

The goal is to build an Ernest Sludge AI experience — grounded in actual blog content — that serves as both a genuine product feature and a vehicle for learning enterprise-grade RAG, auth-gated AI, and abuse-resistant API design.


Proposed Decision (Draft)

Implement a Retrieval-Augmented Generation chatbot (“Ask Ernest”) and a suite of Ernest voice tools (email translator, email drafter, gibberish generator) gated behind Firebase Auth member claims.

Work is structured as three independent layers:

  • Layer 1 — RAG Chat. Build-time embedding pipeline + askErnest callable function. Foundation.
  • Layer 2 — Voice Tools. Translator, drafter, gibberish generator. Share Layer 1 auth and rate-limiting.
  • Layer 3 — Member Onboarding. Registration flow. Deferred; Layer 1 soft-launches via manual admin claims.

All Firestore writes — including build-time embedding persistence — go through callable Cloud Functions per ADR 0003. No Admin SDK carve-out for tools/build.


Alternatives Considered

  • Static precomputed related posts. Viable for a “related posts” widget but cannot support dynamic querying, chatbot, or voice tools. May still be worth adding as a secondary output of the embedding pipeline.
  • Pinecone or other dedicated vector DB. Unnecessary for this corpus size. Firestore vector search handles hundreds of posts comfortably. Revisit if corpus grows significantly or query latency becomes a concern.
  • OpenAI or other embedding providers. GCP credit makes Vertex AI the obvious choice. Same service account already authenticated. No additional credential surface.
  • Unauthenticated public Ernest. Rejected. App Check verifies the app, not the user; per-user identity is required for abuse management and quotas.

Consequences

  • Meaningful product feature that fits the Ernest Sludge brand.
  • Forces member auth infrastructure that has been deferred.
  • Build pipeline gains an optional cloud side-effect capability via opt-in flag; isolated from the default build.
  • Firestore becomes a secondary source of truth for post metadata alongside static JSON. Acceptable given the member flag already lives in frontmatter.
  • Vertex AI credit absorbs cost during development and early production.
  • Member-gated content retrieval filtering must be correct — a bug here leaks member content to non-members.
  • Ernest voice quality depends entirely on prompt quality. Prompts ship as git-reviewed artifacts under functions/prompts/ernest/; the persona-review pipeline (ernest, ash, vera-redline) applies to prompts, not to runtime responses — see ADR 0008 and Open Question #2.

Architecture Detail

Layer 1 — RAG Chat

Build-time: embedding pipeline

A new --embed flag on the content pipeline triggers a post-transform stage that:

  1. Strips HTML to plaintext via a dedicated utility (not inline regex).
  2. Splits each changed post into overlapping chunks (~500 tokens, ~100 token overlap).
  3. Embeds each chunk via text-embedding-004 (Vertex AI, 768 dimensions).
  4. Calls a new persistChunks Cloud Function (callable) with the resulting chunks. The function writes to Firestore under posts/{slug}/chunks/{chunkIndex} using the Admin SDK, matching the ADR 0003 write-path pattern.

Runs in production only. Processes only freshlyTransformed posts (uses the existing cache diff). No effect on static output.
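
The chunking step (step 2) can be sketched as a small pure utility. This is illustrative only: tokens are approximated as whitespace-delimited words, and `chunkText` with its defaults is a hypothetical name, not existing pipeline code.

```typescript
// Sketch of overlapping chunking: ~500-"token" windows with ~100-token
// overlap, approximating tokens as whitespace-delimited words.
function chunkText(text: string, size = 500, overlap = 100): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = size - overlap; // each window starts 400 words after the last
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // final window reached
  }
  return chunks;
}
```

A real implementation would count model tokens rather than words, but the overlap arithmetic is the same.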

Firestore document shape

posts/{slug}
  title: string
  excerpt: string
  tags: string[]
  date: string
  url: string
  member: boolean                      ← gates retrieval for member-only content

  chunks/{chunkIndex}
    text: string
    embedding: vector(768)
    chunkIndex: number
    postSlug: string                   ← denormalized for retrieval joins
    postTitle: string                  ← denormalized for citation display
    postUrl: string                    ← denormalized for citation display
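
The document shape above can be mirrored as TypeScript types. A sketch only: `vector(768)` is represented as `number[]`, and these interface names are illustrative rather than existing code.

```typescript
// posts/{slug}
interface PostDoc {
  title: string;
  excerpt: string;
  tags: string[];
  date: string;
  url: string;
  member: boolean; // gates retrieval for member-only content
}

// posts/{slug}/chunks/{chunkIndex}
interface ChunkDoc {
  text: string;
  embedding: number[]; // stored as a 768-dimension Firestore vector
  chunkIndex: number;
  postSlug: string;  // denormalized for retrieval joins
  postTitle: string; // denormalized for citation display
  postUrl: string;   // denormalized for citation display
}
```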

Vertex AI auth reuses the existing ADC/service-account pattern from the LinkedIn pipeline. No new credentials.

Runtime: askErnest callable function

onCall, enforceAppCheck: true. Per query:

  1. Verify Firebase Auth token carries member: true custom claim.
  2. Check per-UID daily quota in ernest_usage/{uid}.
  3. Embed the query via text-embedding-004.
  4. Query Firestore vector index for top-5 nearest chunks.
  5. Construct Ernest prompt from retrieved chunks + user question.
  6. Call Gemini via Vertex AI with the grounded prompt.
  7. Return response + source citations.
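
The seven steps above can be sketched as a dependency-injected orchestration function. This is a hypothetical illustration, not the real handler: the actual function would be a Firebase `onCall` with `enforceAppCheck`, and the Vertex AI / Firestore calls are injected here so the flow is testable.

```typescript
interface RetrievedChunk { text: string; postTitle: string; postUrl: string; }

// All names below are assumed helpers, not existing APIs.
interface Deps {
  checkQuota: (uid: string) => Promise<boolean>;                          // step 2
  embed: (text: string) => Promise<number[]>;                             // step 3
  searchChunks: (v: number[], topK: number) => Promise<RetrievedChunk[]>; // step 4
  generate: (prompt: string) => Promise<string>;                          // step 6
}

async function askErnest(uid: string, isMember: boolean, question: string, deps: Deps) {
  if (!isMember) throw new Error("permission-denied: member claim required"); // step 1
  if (!(await deps.checkQuota(uid))) throw new Error("resource-exhausted: daily limit"); // step 2
  const queryVector = await deps.embed(question);
  const chunks = await deps.searchChunks(queryVector, 5);
  const context = chunks.map((c) => `[${c.postTitle}]\n${c.text}`).join("\n\n"); // step 5
  const answer = await deps.generate(`${context}\n\nQuestion: ${question}`);
  // step 7: response plus source citations
  return { answer, sources: chunks.map((c) => ({ title: c.postTitle, url: c.postUrl })) };
}
```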

Rate limiting

ernest_usage/{uid}
  date: string                         ← YYYY-MM-DD, reset when date changes
  queryCount: number
  lastQuery: timestamp

Daily quota ceiling configured via Remote Config (ernest_daily_limit, default 10). Matches the existing Remote Config pattern used by the admin idle timeout and LinkedIn autoposter.
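
The reset-on-date-change rule can be sketched as a pure function over the ernest_usage shape above. `applyQuota` is a hypothetical helper; the real version would run inside a Firestore transaction to avoid racing concurrent queries.

```typescript
interface UsageDoc { date: string; queryCount: number; }

// Returns whether the query is allowed and the document to write back.
// `limit` comes from Remote Config (ernest_daily_limit, default 10).
function applyQuota(
  doc: UsageDoc | null,
  today: string, // YYYY-MM-DD
  limit: number,
): { allowed: boolean; next: UsageDoc } {
  // A missing doc or a stale date both reset the counter.
  const current = doc && doc.date === today ? doc : { date: today, queryCount: 0 };
  if (current.queryCount >= limit) return { allowed: false, next: current };
  return { allowed: true, next: { date: today, queryCount: current.queryCount + 1 } };
}
```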

Prompt version control

Prompts under functions/prompts/ernest/:

  • chat.md — RAG chat system prompt
  • translate.md — email translator
  • draft.md — email drafter
  • gibberish.md — gibberish generator

Prompts are bundled at deploy time. Changes go through normal git review: ash drafts and vera-redline reviews before any prompt ships.

Member auth (soft launch)

Manual custom claims via an admin callable (setMemberClaim). No registration UI in Layer 1. Defers Layer 3 cleanly.

Layer 2 — Voice Tools

Share Layer 1’s auth, rate limiting, and function infrastructure. Each tool is a distinct prompt file + thin handler — no vector retrieval:

  • Email Translator — paste corporate prose, receive Ernest.
  • Email Drafter — describe intent, receive Ernest-voiced draft.
  • Gibberish Generator — pure Ernest, no input required.

All tools share the ernest_usage quota bucket with RAG chat (unified daily limit).
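
A hypothetical thin-handler factory shows how each tool reduces to a system prompt plus shared Layer 1 guards. `assertMember`, `consumeQuota`, and `generate` are assumed shared helpers, not existing APIs.

```typescript
interface ToolDeps {
  assertMember: (uid: string) => Promise<void>; // Layer 1 auth guard (member claim)
  consumeQuota: (uid: string) => Promise<void>; // unified ernest_usage bucket
  generate: (systemPrompt: string, input: string) => Promise<string>;
}

// Each voice tool is this wrapper around one prompt file; no retrieval.
function makeVoiceTool(systemPrompt: string, deps: ToolDeps) {
  return async (uid: string, input: string): Promise<string> => {
    await deps.assertMember(uid);
    await deps.consumeQuota(uid);
    return deps.generate(systemPrompt, input);
  };
}
```

The gibberish generator would simply ignore `input`; translator and drafter pass the user's text through.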

Layer 3 — Member Onboarding

Full member registration — invite system or self-service with approval gate. Separate issue. Layer 1 soft-launches without it via manual claims.


Implementation Sequence

  1. Firestore vector index setup + posts collection schema.
  2. HTML → plaintext utility.
  3. Chunking utility.
  4. persistChunks callable Cloud Function (auth: admin only for build-tool invocation).
  5. embedContent pipeline stage behind --embed flag (calls persistChunks).
  6. ernest_usage rate-limit helper.
  7. setMemberClaim admin callable (with audit subcollection per ADR 0003).
  8. askErnest callable function (auth + rate limit + retrieval + generation).
  9. Prompt files: chat.md drafted, vera-redline reviewed.
  10. Layer 2 voice tool handlers + prompt files.
  11. Angular UI — Ask Ernest component (member-gated).

Resolved Decisions (2026-04-19)

All open questions resolved. Full rationale in ADR 0015.

  1. Firestore rules: All new collections (posts/{slug}, chunks, ernest_usage) are allow read, write: if false. Functions-only via Admin SDK.
  2. Voice governance: Accept the gap, bound the surface. Tight system prompt with refusal patterns. Log queries + responses for quality monitoring. No post-generation filter (doubles Vertex AI cost).
  3. Orphan chunks: Hash-based diffing in persistChunks. SHA-256 per chunk, only re-embed changed chunks, delete orphans inline. Scales with corpus growth (2+ posts/week).
  4. setMemberClaim audit: Firestore trigger writes to audit subcollection, following inbox/timeclock pattern.
  5. Cost envelope: ~$0.30/month at 10 queries/day. $1,000 credit effectively covers the full experiment window through April 2027.
  6. Member gate: Hard gate at function entry. Non-members get 403. No free tier at launch. Additive to introduce later.
  7. ADR lifecycle: Deferred to ADR tooling session (see ADR 0014).
  8. Feature ADR format: Keep ADRs lean Nygard; implementation detail stays in design docs (this pattern).
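
Resolved decision #3 can be sketched as a pure diff over per-chunk SHA-256 hashes. `diffChunks` is an illustrative name; the real logic would live inside persistChunks, reading stored hashes before re-embedding.

```typescript
import { createHash } from "node:crypto";

const sha256 = (text: string) =>
  createHash("sha256").update(text).digest("hex");

// `existing` maps chunkIndex -> stored hash. Returns which indices need
// re-embedding and which stored chunks are orphans to delete inline.
function diffChunks(existing: Map<number, string>, chunks: string[]) {
  const toEmbed: number[] = [];
  chunks.forEach((text, i) => {
    if (existing.get(i) !== sha256(text)) toEmbed.push(i); // new or changed
  });
  const toDelete = Array.from(existing.keys()).filter((i) => i >= chunks.length);
  return { toEmbed, toDelete };
}
```

Unchanged chunks skip the embedding call entirely, which is what keeps cost flat as the corpus grows.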