whiz.coach Engineering · White Paper

Visually Engaging Study Material Generation and Validation using Multi‑Agent Orchestration

How twelve specialised AI agents on Google Cloud, paired with a fully-local MLX validation tier, turn raw syllabus sources into responsive, audio-narrated, SVG-rich study material at scale.

Document version
v1.0 · 2026-04-12

Author
Ashish Awasthi

Audience
Engineering, Architecture, Educational Product

Scope
Cloud generation pipeline + post-hoc local validator

1. Summary

The whiz.coach platform turns an instructor-provided syllabus, anything from a national curriculum PDF to a one-page outline, into a complete, learner-ready study experience: structured topics, narrated audio with synchronised SVG storyboards, interactive visualisations, practice questions, flashcards, and a master index. This document describes the two production subsystems that make that transformation reliable at scale:

A cloud-resident multi-agent pipeline built on Firebase Cloud Functions and Cloud Tasks. Twelve specialised agents, each owning one slice of the content lifecycle, collaborate through Firestore state and asynchronous task queues. The pipeline uses Gemini (with thinking enabled) for every LLM call, Cloud TTS for narration, and Cloud STT for forced alignment.
A post-hoc local validation subsystem running entirely on an Apple Silicon MacBook Pro M5 Max via MLX. Two locally-served Qwen3 models (a coder for corrections and a 32B Thinking VL model for vision judgement) read every generated artefact and propose corrections, without uploading a single byte to a remote LLM provider.

Together they form a generate-cloud / validate-local architecture that combines the throughput of managed cloud services with the privacy, cost, and iteration speed of on-device inference.

Three load-bearing properties.

Strict isolation. The validator is a separate package never bundled into the production web app or cloud functions; a deployment is incapable of shipping it to learners.
Fail-fast locality. If the local MLX server is unreachable, validation halts; there is no silent fallback to a remote vision model.
Every issue carries its own prompt fix. Each validator finding emits a structured remediation note that names the bug, the inferred cause, and the exact sentence to add to the generation prompt. A weekly Gemini digest synthesises these into one prompt-edit proposal per (content kind, root cause) cluster.
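The fail-fast locality property above can be illustrated with a small guard. This is a minimal sketch under stated assumptions, not the validator's actual code: the names `LocalModelUnavailable`, `require_loopback`, and `assert_reachable` are hypothetical, and the only behaviour taken from the text is that validation halts rather than falling back to a remote model.

```python
import ipaddress
import socket
from urllib.parse import urlparse


class LocalModelUnavailable(RuntimeError):
    """Hypothetical error type: the local model server cannot be used."""


def require_loopback(url: str) -> tuple:
    """Return (host, port) if url points at a loopback address, else raise."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    port = parsed.port or 80
    if host != "localhost":
        try:
            is_loopback = ipaddress.ip_address(host).is_loopback
        except ValueError:
            # Not an IP literal and not "localhost": treat as remote.
            is_loopback = False
        if not is_loopback:
            raise LocalModelUnavailable(f"refusing non-loopback endpoint: {url}")
    return host, port


def assert_reachable(url: str, timeout_s: float = 2.0) -> None:
    """Raise if nothing is listening; there is deliberately no remote fallback."""
    host, port = require_loopback(url)
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            pass
    except OSError as exc:
        raise LocalModelUnavailable(f"local server unreachable at {url}") from exc
```

A caller would run `assert_reachable("http://127.0.0.1:8001")` before the first vision request and let the exception stop the run.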

2. The content problem

A single subject (say, Cambridge Primary Maths Stage 5) expands into roughly 60–120 topics. Each topic must yield:

Age-adaptive structured prose (bullet-led for children, editorial for adults) with verbatim source grounding;
One static image per ~150 words plus a small number of truly interactive SVGs (cap-enforced: 1–3 per topic, depending on age);
A narration MP3 plus an SVG storyboard with cues synchronised to within ~80 ms using forced alignment;
Flashcards (Leitner-ready), practice questions (with diagrams), and a quiz-game variant;
A rendered, mobile-first HTML page hosted on Cloud Storage and tied into the React app's auth and progress system.

The naive approach, a single monolithic prompt, fails on three axes: output budget (a 60-topic syllabus easily exceeds Gemini's 65 k output tokens), partial failure modes (TTS rate limits should not erase content work), and verifiability (a single artefact must be inspectable by domain-specific judges). The system therefore decomposes generation into specialised agents and validation into specialised tiers.

3. System architecture

The system is partitioned across three planes: a cloud generation plane, a local validation plane, and a cloud-side user-feedback correction plane that handles thumbs-down submissions from learners. All three share Firestore as the source of truth.

flowchart LR
  subgraph CG["Cloud Generation Plane (Firebase + GCP)"]
    direction TB
    CF[Cloud Functions]
    CT[Cloud Tasks Queues]
    FS[(Firestore)]
    CS[(Cloud Storage)]
    GM[Gemini 3.1 Pro<br/>thinking enabled]
    TTS[Cloud TTS<br/>+ ElevenLabs]
    STT[Cloud Speech-to-Text]
    YT[YouTube Data API]
    CF -- enqueue/poll --> CT
    CT -- HTTP trigger --> CF
    CF -- read/write --> FS
    CF -- artefacts --> CS
    CF -- prompts --> GM
    CF -- narration --> TTS
    CF -- forced align --> STT
    CF -- search --> YT
  end
  subgraph LV["Local Validation Plane (M5 Max, MLX)"]
    direction TB
    CLI[content-validator CLI]
    PW[Playwright Chromium]
    MLX1[MLX :8000<br/>Qwen3-Coder-Next 4-bit]
    MLX2[MLX :8001<br/>Qwen3-VL 32B Thinking]
    CLI --> PW
    CLI -- corrections --> MLX1
    CLI -- vision --> MLX2
  end
  subgraph UF["User-feedback correction plane"]
    direction TB
    APP[React App]
    FAPI[Feedback API]
    SCA[SvgCorrectionAgent<br/>Gemini Path A or B]
    APP -- thumbs-down --> FAPI
    FAPI -- Cloud Task --> SCA
    SCA --> FS
    SCA --> CS
  end
  CS -. read .-> CLI
  FS -. read/write .-> CLI
  FS -. learner data .-> APP
  classDef cloud fill:#ffe9b3,stroke:#7c5800,color:#2a2417;
  classDef local fill:#d6f0d6,stroke:#265a26,color:#1a3a1a;
  classDef user fill:#f8d3c2,stroke:#8a3315,color:#3a1f15;
  class CG,CF,CT,FS,CS,GM,TTS,STT,YT cloud;
  class LV,CLI,PW,MLX1,MLX2 local;
  class UF,APP,FAPI,SCA user;

Figure 1. Three planes and their shared substrates (Firestore and Cloud Storage).

Legend: cloud generation · local validation · user-feedback correction.

4. The agent cast

Each agent extends a common base class, receives its work via a Cloud Tasks message, transitions topic state in Firestore, and either chains to the next agent or fans out in parallel. Twelve agents are in production.

Agent	Queue	Thinking	Responsibility
SourceBriefAgent	source-brief	medium	Reads uploaded sources + Google Search enrichment; emits canonical Markdown briefs, syllabus overview, and up to 10 key terms. Also handles source-grounded topic additions.
SyllabusPlanningAgent	planning	high	Proposes modules, topics, prerequisites, topic kind (concept / lab / capstone / reference), and the topic-hierarchy tier graph for mastery learning.
TopicContentAgent	content	high	Age-adaptive structured JSON (intro, real-world apps, core content, exam questions, etc.) plus flashcards via a dedicated split Gemini call.
ResourceValidationAgent	resource	low	YouTube Data API + AI search for external links; URL liveness check; weighted scoring (relevance 40 / quality 30 / freshness 20 / accessibility 10).
ContentValidationAgent	validation	medium	LLM-driven factual + pedagogical + completeness + difficulty audit. Decides between expert review, improvement loop, or fan-out to image + storyboard generation.
ContentImprovementAgent	improvement	medium	Iterative content rewrite (max 3 cycles); also honours admin "edit this topic" instructions via a request-document pattern.
ImageGenerationAgent	image	high	Static images via flash image model; interactive SVGs via Gemini with `thinkingLevel: high`. Enforces per-age SVG caps and anti-duplication rules.
TopicStoryboardAgent	storyboard	high	Fire-and-forget; Gemini narration + SVG + cues, Cloud TTS synthesis, Cloud Speech-to-Text long-running recognition for forced alignment, MP3 + storyboard JSON to pre-determined URLs.
StudyMaterialRenderAgent	render	n/a	Firestore JSON → mobile-first HTML; defensive script wrapping; audio player with pre-determined URL + `onloadedmetadata` graceful reveal; LO marker + reveal-on-click engagement bridges.
HTMLValidationAgent	html-validation	low	Puppeteer pass: element overlap, SVG internal overlap, broken form controls (`<select>` inside `<foreignObject>`), console-error capture. Triggers `fix_svg` auto-fix.
IndexGenerationAgent	index	medium	Fires only when `completedCount === totalTopics`; topological-sort learning sequence, aggregated resources, AI-generated study tips, key-terms section.
SvgCorrectionAgent	svg-correction	high	User-feedback driven. Path A: fix existing SVG from screenshot + description. Path B: regenerate empty SVG from spec (age + domain aware). 11-category root-cause taxonomy.

An optional thirteenth agent, PastPaperAnalysisAgent, runs on demand to surface exam-paper patterns into the syllabus document.
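The weighted scoring listed for ResourceValidationAgent in the table above (relevance 40 / quality 30 / freshness 20 / accessibility 10) can be sketched as a plain weighted sum. The weights come from the table; the function name, the input shape, and the assumption that each signal is normalised to [0, 1] are illustrative only.

```python
# Weights from the agent table; they sum to 100.
WEIGHTS = {"relevance": 40, "quality": 30, "freshness": 20, "accessibility": 10}


def resource_score(signals: dict) -> float:
    """Combine per-dimension signals (assumed to lie in [0, 1]) into a 0-100 score.

    Missing dimensions count as 0.0, so an unscored resource cannot rank highly.
    """
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = float(signals.get(name, 0.0))
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        total += weight * value
    return total
```

Under this assumption a resource that is fully relevant but unscored on every other dimension lands at 40, which makes the dominance of relevance explicit.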

5. Multi-agent interactions

5.1 The happy-path pipeline

For a fresh syllabus, the pipeline cascades from a single Firestore onCreate trigger. Agents that share no dependency run in parallel, both across topics and within a single topic (most visibly the image + storyboard fan-out).

flowchart TD
  U([Admin creates syllabus]) --> SB[SourceBriefAgent<br/>+ Google Search]
  SB --> SP[SyllabusPlanningAgent]
  SP -- per topic, parallel --> TC[TopicContentAgent]
  TC --> RV[ResourceValidationAgent]
  RV --> CV[ContentValidationAgent]
  CV -- valid --> FAN((fan-out))
  CV -- invalid, less than 3 --> CI[ContentImprovementAgent]
  CI --> CV
  CV -- invalid, 3 plus --> EXPERT[Expert review] --> FAN
  FAN --> IG[ImageGenerationAgent]
  FAN --> TS[TopicStoryboardAgent<br/>fire-and-forget]
  IG --> SMR[StudyMaterialRenderAgent]
  SMR --> HV[HTMLValidationAgent]
  HV -- ok --> DONE([Topic completed])
  HV -- issues --> IG
  TS -. uploads to pre-determined URLs .-> CS[(Cloud Storage)]
  DONE -. when all topics complete .-> IDX[IndexGenerationAgent]
  IDX --> SMR2[StudyMaterialRenderAgent<br/>index page]
  SMR2 --> DONE2([Syllabus completed])
  classDef io fill:#fdf3ef,stroke:#8a3315;
  class CS io;

Figure 2. Cloud generation pipeline. Each arrow corresponds to a Cloud Tasks enqueue with idempotency guards on the receiver.

5.2 The parallel image + storyboard fan-out

The storyboard path is intentionally fire-and-forget: it never chains to another agent. The rendered HTML is emitted with pre-determined URLs for the MP3 and storyboard JSON, so the renderer does not need to wait for audio completion; it embeds an <audio> tag whose onloadedmetadata handler reveals the player only when the file becomes fetchable. This eliminates a synchronous dependency between two paths of very different latency (1–3 min for the storyboard vs 5–15 min for images).

sequenceDiagram
  participant CV as ContentValidationAgent
  participant IG as ImageGenerationAgent
  participant TS as TopicStoryboardAgent
  participant GEM as Gemini
  participant TTS as Cloud TTS
  participant STT as Cloud STT
  participant CS as Cloud Storage
  participant SMR as StudyMaterialRenderAgent
  participant HV as HTMLValidationAgent
  CV->>IG: enqueue generate images
  CV->>TS: enqueue generate storyboard
  par Storyboard path (1 to 3 min)
    TS->>GEM: request narration, SVG and cues
    GEM-->>TS: text and svg
    TS->>TTS: synthesize MP3
    TTS-->>TS: audio bytes
    TS->>CS: upload audio at pre-determined URL
    TS->>STT: long-running recognition
    STT-->>TS: per-word timings
    TS->>CS: upload storyboard JSON
  and Image and SVG path (5 to 15 min)
    IG->>GEM: SVG codegen with high thinking
    GEM-->>IG: svg blocks
    IG->>CS: upload images
    IG->>SMR: enqueue render
    SMR->>CS: upload HTML with pre-determined audio URL
    SMR->>HV: enqueue HTML validation
    HV->>HV: Puppeteer pass
    HV->>IG: enqueue fix-SVG if issues, else mark topic completed
  end

Figure 3. Parallel fan-out from ContentValidationAgent. Pre-determined URLs decouple the two paths.

5.3 The SVG auto-fix loop

HTML validation is the only loop in the system that actively rewrites artefacts. It is bounded by an SVG fix-attempt counter, a regeneration-triggered flag, and a fallback needs-review state that the cleanup scheduler picks up after two hours.

stateDiagram-v2
  [*] --> Rendered
  Rendered --> Validating: HTMLValidationAgent
  Validating --> Completed: no issues
  Validating --> Fixing: issues found, fix attempts < 3
  Fixing --> Rerendering: ImageGenerationAgent fix-SVG
  Rerendering --> Validating: StudyMaterialRenderAgent
  Validating --> Regenerating: fix attempts >= 3, not yet regenerated
  Regenerating --> Rerendering: ImageGenerationAgent regenerate-SVGs
  Validating --> NeedsReview: regeneration also failed
  NeedsReview --> Fixing: cleanup scheduler, 2h
  Completed --> [*]

Figure 4. SVG auto-fix state machine. Bounded by counters; cleanup scheduler unblocks stalled needs-review topics.
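The decision taken on leaving the Validating state in Figure 4 can be sketched as a pure function. The thresholds and state names follow the figure; the function name, the argument names, and the assumption that the fix-attempt counter is not reset after a regeneration are illustrative.

```python
from enum import Enum


class NextState(Enum):
    COMPLETED = "completed"
    FIXING = "fixing"
    REGENERATING = "regenerating"
    NEEDS_REVIEW = "needs_review"


MAX_FIX_ATTEMPTS = 3  # "fix attempts < 3" in Figure 4


def after_validation(issue_count: int, fix_attempts: int,
                     regeneration_triggered: bool) -> NextState:
    """Pick the transition out of Validating, mirroring the edges in Figure 4."""
    if issue_count == 0:
        return NextState.COMPLETED
    if fix_attempts < MAX_FIX_ATTEMPTS:
        return NextState.FIXING
    if not regeneration_triggered:
        # Fix budget exhausted: one full regeneration is allowed per topic.
        return NextState.REGENERATING
    # Regeneration was already tried and issues remain.
    return NextState.NEEDS_REVIEW
```

Because the function depends only on two counters and a flag stored with the topic, a re-delivered task reaches the same decision, which is what keeps the loop bounded.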

5.4 The user-feedback correction path

When a learner taps thumbs-down on an interactive SVG, the web app posts a feedback record to a dedicated Cloud Function, which enqueues SvgCorrectionAgent on its own Cloud Tasks queue. This is the one cloud-side correction path that remained on Gemini after the 2026-05-12 local-validator pivot.

sequenceDiagram
  participant L as Learner (React app)
  participant API as Feedback API
  participant CT as SVG correction queue
  participant SCA as SvgCorrectionAgent
  participant GEM as Gemini 3.1 Pro
  participant CS as Cloud Storage
  participant FS as Firestore
  L->>L: captureSvgScreenshot(), Canvas API
  L->>API: submitSvgFeedback(comment, png?, isSvgEmpty)
  API->>CS: upload screenshot (if any)
  API->>FS: append SVGFeedback to topic doc
  API->>CT: enqueue analyze_and_correct
  CT->>SCA: HTTP trigger
  alt svgCode is non-empty
    SCA->>CS: download screenshot + HTML
    SCA->>GEM: vision analysis + fix
    GEM-->>SCA: corrected SVG section
    SCA->>SCA: categorize root cause (11 classes)
    SCA->>CS: upload corrected HTML (public)
    SCA->>FS: feedback.correctionStatus = 'corrected'
  else svgCode is empty
    SCA->>FS: read SVG spec (title, concept, prompt)
    SCA->>GEM: age + domain aware regenerate
    GEM-->>SCA: fresh svg + interaction script
    SCA->>CS: inject + upload HTML
    SCA->>FS: regeneratedSuccessfully = true
  end

Figure 5. User-feedback correction flow with Path A (fix) and Path B (regenerate).

6. Platform and external services

The pipeline composes a small number of Google Cloud primitives and a handful of external service providers, each chosen for a specific operational property.

Component	Role	How the pipeline uses it
Cloud Functions v2	Agent runtime	A single HTTP entry point dispatches each Cloud Tasks message to the named agent. Per-function timeouts up to 60 min support long Gemini calls and speech-to-text polling.
Cloud Tasks	Async messaging	One queue per agent lets each stage be rate-limited and retried independently. Typical retry budget is 2 retries (3 total attempts); per-queue dispatch deadlines bound the total attempt budget.
Firestore	State + idempotency	Topic documents carry status enums; sibling request-document subcollections (for admin-initiated edits and topic additions) make retries idempotent. Document-create triggers seed the pipeline.
Cloud Storage	Artefact host	Per-syllabus prefixes hold images, audio, storyboard JSON, and rendered HTML. Pre-determined URLs let the renderer emit links to assets that don't yet exist.
Cloud TTS (Neural2 / Chirp3-HD)	Narration synthesis	Voice selected per `(language, ageGroup)`, Journey-F for younger learners, Chirp3-HD-Aoede for older, Hindi voices for Devanagari content. Plain text, no SSML.
ElevenLabs	TTS + alignment	Primary English TTS; provides forced alignment to ±8 ms (still gated for non-English).
Cloud Speech-to-Text	Forced alignment	Long-running recognition on the synthesised MP3 yields per-word timings that map each storyboard cue phrase to a time offset in seconds.
Cloud Scheduler	Maintenance + digests	15-minute cleanup sweeper for stalled topics; weekly Gemini synthesis over the local validator's remediation notes.
BigQuery	Analytics warehouse	Daily export of SVG feedback (partitioned by date, clustered by `root_cause`) feeds the weekly prompt-improvement review.
YouTube Data API	External resources	Language-steered search; non-English syllabi additionally prepend the language's native name to the query string for stronger search-result bias.
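The forced-alignment row above says per-word timings map each storyboard cue phrase to a time offset in seconds. A minimal sketch of that mapping follows, assuming the recogniser output has already been reduced to `(word, start_seconds)` pairs; the function names and the exact-match strategy are illustrative, and a production aligner would likely need fuzzy matching for recognition errors.

```python
import re
from typing import List, Optional, Tuple


def normalise(word: str) -> str:
    """Lower-case and strip punctuation so 'whole.' matches 'whole'."""
    return re.sub(r"[^\w]", "", word.lower())


def cue_offset(words: List[Tuple[str, float]], phrase: str,
               start_index: int = 0) -> Optional[float]:
    """Return the start time of the first word of `phrase`, or None if absent.

    `words` is a list of (word, start_seconds) pairs in spoken order.
    `start_index` lets callers search forward only, so repeated phrases
    resolve to successive occurrences.
    """
    target = [t for t in (normalise(w) for w in phrase.split()) if t]
    if not target:
        return None
    tokens = [normalise(w) for w, _ in words]
    for i in range(start_index, len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            return words[i][1]
    return None
```

For example, with timings for "Fractions are parts of a whole.", the cue phrase "parts of a whole" resolves to the start time of "parts".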

6.1 Queue topology

Each agent owns its queue so that a backlog in one stage cannot stall another. Queue-level rate limits map onto provider quotas (Gemini RPM, Cloud TTS RPM, STT operation budget).

flowchart LR
  T1[source-brief-queue] --> T2[planning-queue]
  T2 --> T3[content-queue]
  T3 --> T4[resource-queue]
  T4 --> T5[validation-queue]
  T5 --> T6[improvement-queue]
  T5 --> T7[image-queue]
  T5 --> T8[storyboard-queue]
  T7 --> T9[render-queue]
  T9 --> T10[html-validation-queue]
  T10 --> T7
  T10 --> T11[index-queue]
  T11 --> T9
  FB[Web app] --> T12[SVG correction queue]
  classDef q fill:#f8fafc,stroke:#475569,color:#1f2937;
  class T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12 q;

Figure 6. Cloud Tasks queue topology. The auto-fix loop (T10 ↔ T7) and the index trigger (T10 → T11) are the two re-entry edges.

7. Engineering techniques

7.1 Loop prevention in three layers

Permanent flags. Audio-URL set, permanent-failure markers, and regeneration-triggered flags all cause an idempotent early return on re-delivery.
Status checks. Every action begins with an early return if the topic is already marked completed, so stale queued messages cannot overwrite finished work.
Atomic transactions. Request-document completion is batched with the downstream write so a crash cannot leave both sides inconsistent.
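The first two layers above amount to a guard evaluated before any work starts. The sketch below is illustrative: the topic is modelled as a plain dictionary, and the field names (`status`, `permanentFailure`, `audioUrl`, `regenerationTriggered`) and action names are assumptions, not the production schema.

```python
from typing import Optional


def skip_reason(topic: dict, action: str) -> Optional[str]:
    """Return why a re-delivered task should exit early, or None to proceed."""
    # Status check: stale messages must not overwrite finished work.
    if topic.get("status") == "completed":
        return "topic already completed"
    # Permanent flags: each one turns a re-delivery into a no-op.
    if topic.get("permanentFailure"):
        return "topic marked as permanent failure"
    if action == "generate_storyboard" and topic.get("audioUrl"):
        return "audio already uploaded"
    if action == "regenerate_svgs" and topic.get("regenerationTriggered"):
        return "regeneration already triggered"
    return None
```

An agent would call `skip_reason` as its first statement and acknowledge the task without side effects when a reason comes back. The third layer, atomic transactions, depends on the datastore's batch or transaction API and is not shown.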

7.2 Bounded iteration

Bound	Where	What it prevents
3 improvement attempts	ContentImprovementAgent	Endless validate ↔ improve cycles on irreducible content disputes.
3 SVG fix attempts, then regenerate	HTMLValidationAgent	Repeated cosmetic patches that never converge.
1 regeneration per topic	regeneration-triggered flag	Regenerate-fix-regenerate ping-pong.
2 Cloud Tasks retries (3 total)	Queue config	Transient provider failures masking as permanent.
15 attempts hard cap	Base-agent attempt counter	Any path that escaped the agent-specific bounds.
20 correction iterations	Local validator	Validator-side auto-correction divergence.

7.3 Fire-and-forget for asymmetric latency

The image path takes 5–15 minutes; the storyboard path takes 1–3 minutes. Coupling them would either delay the page or block on the slower one. Instead, the renderer commits to a pre-determined URL pattern, content/{syllabusId}/audio/{topicId}.mp3, and emits HTML against that path. The <audio> element's onloadedmetadata handler reveals the player only when the browser successfully parses the metadata header; if the file never lands, the section silently remains hidden. No re-render is required when audio completes.
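The pre-determined URL convention can be sketched as two small helpers. The path pattern is the one quoted above; the function names, the `base_url` parameter, and the exact markup (a hidden wrapper revealed by the handler) are assumptions made for illustration.

```python
from html import escape


def audio_path(syllabus_id: str, topic_id: str) -> str:
    """Object path for a topic's narration, known before the MP3 exists."""
    return f"content/{syllabus_id}/audio/{topic_id}.mp3"


def audio_section(base_url: str, syllabus_id: str, topic_id: str) -> str:
    """Emit an audio section that stays hidden until metadata loads.

    If the file never lands, `onloadedmetadata` never fires and the
    section remains hidden, so no re-render is needed either way.
    """
    src = f"{base_url.rstrip('/')}/{audio_path(syllabus_id, topic_id)}"
    return (
        '<section class="audio-section" hidden>'
        f'<audio controls preload="metadata" src="{escape(src, quote=True)}" '
        'onloadedmetadata="this.parentElement.hidden = false"></audio>'
        "</section>"
    )
```

Both the storyboard path and the renderer can derive the same path from two identifiers, which is why neither has to wait for or notify the other.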

7.4 Topic-kind dispatch

A single topic-kind field, chosen by the planner from concept, lab, capstone, or reference, branches both the content prompt and the renderer's section order. A lab topic emits hands-on steps, pitfalls, and tools; a capstone emits a rubric; a reference emits a glossary. Adding a new kind is therefore a content-prompt change plus a renderer switch, not a new pipeline.
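A dispatch table is one way to express this branching. In the sketch below the four kinds and the kind-specific sections (hands-on steps, pitfalls, tools, rubric, glossary) come from the text; every other section name, and the ordering itself, is a placeholder rather than the renderer's real layout.

```python
# Kind -> ordered section list. Only the kind-specific sections are taken
# from the paper; "intro", "core_content" and "practice" are placeholders.
SECTION_ORDER = {
    "concept": ["intro", "core_content", "practice"],
    "lab": ["intro", "hands_on_steps", "pitfalls", "tools"],
    "capstone": ["intro", "core_content", "rubric"],
    "reference": ["intro", "glossary"],
}


def sections_for(topic_kind: str) -> list:
    """Return a copy of the section order for a topic kind.

    Unknown kinds raise instead of silently falling back, so a planner
    emitting a new kind fails loudly until the renderer learns about it.
    """
    try:
        return list(SECTION_ORDER[topic_kind])
    except KeyError:
        raise ValueError(f"unknown topic kind: {topic_kind!r}") from None
```

With this shape, adding a kind means adding one table entry alongside the matching content-prompt change.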

7.5 Age- and language-adaptive generation

Every agent that touches learner-facing text or visuals reads the syllabus's average-age and language fields and routes through shared configuration tables for age-tuned content, age-tuned imagery, and the language registry. The same agent that produces a playful, emoji-rich cartoon storyboard for a 9-year-old produces a slate/sky-blue editorial schematic for an adult, by reading two scalar fields and a config map.

7.6 Thinking levels as a cost knob

Level	Agents
`high`	SVG generation, image generation, content generation, syllabus planning, SvgCorrection.
`medium`	Content validation, content improvement, index generation, SVG fix analysis, past-paper analysis.
`low`	Resource validation, HTML validation.

7.7 Defensive SVG generation

SVG output is the system's highest-failure-rate artefact. The pipeline applies a layered defence:

Prompt constraints. Banned elements (<select>, <foreignObject> form controls, drag-and-drop) and required patterns (1:1 viewBox, 44×44 px targets, slate slider tracks).
Structural lint. A native renderer round-trip plus a deterministic geometry sweep (overflow, text-cut-by-line, endpoint-into-box).
Defensive script wrapping. Every emitted interaction script is automatically wrapped with safe parsing and lookup helpers so NaN and null-reference errors never escape.
Post-hoc visual judgement. Deferred to the local Qwen3-VL validator (see §8).
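Some of the prompt constraints above are mechanically checkable after generation. The following is a minimal sketch using only the standard library; it covers the banned `<select>` element, form controls nested in `<foreignObject>`, and the 1:1 viewBox, and it is not the pipeline's actual lint (the geometry sweep and renderer round-trip are out of scope here). The function names and the set of tags treated as form controls are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed set of HTML form controls; the paper names <select> explicitly.
FORM_CONTROLS = {"select", "input", "textarea", "button"}


def local_name(tag: str) -> str:
    """Strip an XML namespace: '{http://...}rect' -> 'rect'."""
    return tag.rsplit("}", 1)[-1]


def lint_svg(svg_text: str) -> list:
    """Return a list of human-readable constraint violations (empty if clean)."""
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    for element in root.iter():
        name = local_name(element.tag)
        if name == "select":
            problems.append("banned element <select>")
        if name == "foreignObject":
            for child in element.iter():
                child_name = local_name(child.tag)
                if child_name in FORM_CONTROLS:
                    problems.append(
                        f"form control <{child_name}> inside <foreignObject>")

    view_box = root.get("viewBox", "").replace(",", " ").split()
    if len(view_box) != 4:
        problems.append("missing or malformed viewBox")
    else:
        try:
            width, height = float(view_box[2]), float(view_box[3])
        except ValueError:
            problems.append("non-numeric viewBox")
        else:
            if width != height:
                problems.append("viewBox is not 1:1")
    return problems
```

A deterministic check like this is cheap enough to run on every generated SVG before any model-based judgement is involved.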

8. Local validation subsystem

The local validator is a self-contained package that is the exclusive owner of every post-hoc validation, audit, and fix-proposal in the system. The 2026-05-12 pivot moved all visual judgement and validator-side correction off cloud Gemini and onto the operator's MacBook Pro M5 Max via MLX.

Six hard rules govern the local validator:

Local MLX is the only non-Anthropic model surface for testing.
Screenshots stay in unified memory, loopback HTTP only, no remote vision uploads.
Real-time validation inside the cloud generation pipeline stays on cloud Gemini.
Validator-side post-hoc correction is local-only.
Every issue carries a remediation note that names the bug, the cause, and the prompt sentence that would prevent it.
Every test run is deletable, including its Firestore docs and Cloud Storage prefix.

8.1 The MLX server pair

Port	Model	Role
`:8000`	Qwen3-Coder-Next 4-bit	Auto-correction tiers: cue, SVG, or JSON patches, with a full regeneration tier for narration + SVG. Also writes remediation notes per issue.
`:8001`	Qwen3-VL 32B Thinking	Vision judge for every captured viewport; per-cue storyboard precision tier; strict-consensus second pass when first-pass confidence < 0.7.
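The confidence-gated second pass listed for the vision judge can be sketched as below. The 0.7 threshold comes from the table. The paper does not define "strict consensus", so the sketch assumes it means keeping only the issues that both passes report; the `judge` callable, its `strict` argument, and the verdict shape are likewise hypothetical.

```python
from typing import Callable, Set, Tuple

CONFIDENCE_FLOOR = 0.7  # second pass runs when first-pass confidence is below this

# A verdict is (set of issue identifiers, confidence in [0, 1]).
Verdict = Tuple[Set[str], float]


def judge_with_consensus(judge: Callable[[bytes, bool], Verdict],
                         screenshot: bytes) -> Set[str]:
    """Run the vision judge once, and a strict second pass if it was unsure.

    ASSUMPTION: consensus is modelled as set intersection, so an issue
    survives a low-confidence first pass only if the strict pass agrees.
    """
    issues, confidence = judge(screenshot, False)
    if confidence >= CONFIDENCE_FLOOR:
        return set(issues)
    strict_issues, _ = judge(screenshot, True)
    return set(issues) & set(strict_issues)
```

Passing the judge in as a callable keeps the gating logic testable without a model server.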

8.2 Activity inventory

The validator decomposes into 24 named activities. The headline is that none of them invoke a cloud LLM. A representative sample:

ID	Activity	Where it runs
A1	Render topic HTML page	local Playwright
A3	Vision issue detection	MLX :8001
A6	Storyboard precision (per-cue + narration judge)	MLX :8001
A9	Storyboard auto-correction (cue / svg / json patches)	MLX :8000 + GCS upload
A10	Full regenerate (narration + SVG, reuse existing MP3)	MLX :8000, no TTS call
A11	Topic-HTML interactive-SVG auto-correction	MLX :8000
A17	Per-issue remediation-note annotation	MLX + Firestore write
A20	Auto-upload of run report to admin dashboard	Firestore + Cloud Storage write
A23	Triage validator report	local, no LLM

8.3 The local correction loop

flowchart TD
  RUN[validate-syllabus or topic-html<br/>with --auto-correct] --> CAP[Capture viewports via<br/>Playwright + CDP screencast]
  CAP --> VJ[Vision judge<br/>MLX :8001]
  VJ -- issues --> CODER[Coder picks tier<br/>cue / svg / json / full_regenerate]
  CODER -- MLX :8000 --> PATCH[Apply patch to artefact]
  PATCH --> UP[Upload to Cloud Storage<br/>+ Firestore write]
  UP --> RECHK[Re-validate]
  RECHK -- clean --> OK([Verification passes])
  RECHK -- still failing, less than 20 iters --> VJ
  RECHK -- 20 iters or 3 no-progress --> FAIL([Verification fails<br/>remaining issues sent to admin])
  classDef local fill:#d6f0d6,stroke:#265a26;
  class CAP,VJ,CODER,PATCH local;

Figure 7. Local validator's bounded correction loop. Reuses the existing MP3 on full_regenerate via heuristic cue re-alignment against the original forced-alignment timings.
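The two exit conditions in Figure 7 (20 iterations, or 3 rounds without progress) can be sketched as a small driver loop. The limits come from the figure; the function signature is hypothetical, and "no progress" is assumed here to mean three consecutive iterations in which the issue count did not decrease.

```python
from typing import Callable, List, Tuple

MAX_ITERATIONS = 20
MAX_NO_PROGRESS = 3


def correction_loop(validate: Callable[[], List[str]],
                    correct: Callable[[List[str]], None]) -> Tuple[bool, List[str], int]:
    """Alternate correct/validate until clean or a bound is hit.

    Returns (passed, remaining_issues, iterations_used).
    ASSUMPTION: an iteration makes progress only if it strictly reduces
    the number of open issues.
    """
    issues = validate()
    iterations = 0
    stalled = 0
    while issues and iterations < MAX_ITERATIONS and stalled < MAX_NO_PROGRESS:
        correct(issues)
        iterations += 1
        remaining = validate()
        stalled = stalled + 1 if len(remaining) >= len(issues) else 0
        issues = remaining
    return (not issues, issues, iterations)
```

On failure the second element of the result is the list of remaining issues, which corresponds to what Figure 7 routes to the admin.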

8.4 Storage and observability

Firestore. Dedicated collections hold each test run, its per-artefact results, individual issues, weekly feedback digests, prompt-improvement records, and an audit log of admin actions.
Cloud Storage. A run-scoped prefix holds screenshots, run reports, and pre-correction backups.
Admin console. An internal review page lists runs with signed-URL screenshot thumbnails and a one-click delete that removes both the Firestore documents and the matching Cloud Storage prefix.

9. Cloud / local boundary

The system enforces a strict separation between where content is generated and where it is judged. Crossing the boundary is a privacy concern (screenshots may include unreleased curriculum), a cost concern (vision tokens are expensive), and a latency concern (loopback is microseconds; remote APIs are seconds).

flowchart LR
  subgraph CLOUD["Cloud, Generation"]
    direction TB
    GC1[Syllabus planning]
    GC2[Topic content]
    GC3[Storyboard text + SVG]
    GC4[Question + diagram gen]
    GC5[Practice game gen]
    GC6[Flashcard gen]
  end
  subgraph CLOUD_RT["Cloud, Real-time validation"]
    direction TB
    RT1[practice-game self-validation]
    RT2[html-validation-agent]
    RT3[storyboard lexical alignment]
    RT4[svg-layout-check]
    RT5[question validation-agent]
  end
  subgraph LOCAL["Local, Post-hoc validation"]
    direction TB
    L1[Topic HTML driver]
    L2[Storyboard precision]
    L3[Interactive SVG sweep]
    L4[Flashcard audit]
    L5[Game-specific solvers]
    L6[Auto-correction loop]
  end
  subgraph FB["Cloud, User feedback correction"]
    direction TB
    UF1[SvgCorrectionAgent]
    UF2[StoryboardCorrectionAgent]
    UF3[Gemini 3.1 Pro + ElevenLabs]
  end
  CLOUD -- artefacts --> LOCAL
  LOCAL -- remediation notes --> DIGEST[Weekly Gemini digest]
  DIGEST -. prompt edits .-> CLOUD
  FB -. learner thumbs-down .-> CLOUD_RT
  classDef cloud fill:#ffe9b3,stroke:#7c5800;
  classDef local fill:#d6f0d6,stroke:#265a26;
  classDef user fill:#f8d3c2,stroke:#8a3315;
  class CLOUD,CLOUD_RT,GC1,GC2,GC3,GC4,GC5,GC6,RT1,RT2,RT3,RT4,RT5,DIGEST cloud;
  class LOCAL,L1,L2,L3,L4,L5,L6 local;
  class FB,UF1,UF2,UF3 user;

Figure 8. Generation is cloud; post-hoc validation + correction is local; user-feedback-driven correction is cloud. The validator never crosses from local to cloud LLM on its hot path.

9.1 What lives on which side

Concern	Cloud (Gemini / ElevenLabs)	Local (MLX Qwen3)
Content generation	✓ all C1–C7	n/a
Real-time gates inside generation	✓ 5 agents	n/a
Post-hoc validation	n/a	✓ A1–A24
Validator-side auto-correction	n/a	✓ MLX :8000
User thumbs-down correction	✓ B1–B4	n/a
Weekly prompt synthesis	✓ Gemini 1 call per cluster	n/a

10. Closing the loop

The validator does not merely flag bugs; it generates the data needed to prevent the same class of bug next time. Each issue document carries a remediation note written by the local coder model after the consensus vision pass. The string follows a strict shape:

What the bug is, in one sentence.
The proximate cause inferred from the screenshot.
The exact sentence the generation prompt should add to prevent the class.

A weekly Cloud Scheduler job reads the past seven days of notes, groups them by content kind and root cause, and sends each group to Gemini for synthesis into one prompt-edit suggestion. The output lands in a dedicated Firestore collection and surfaces in the admin review console for human acceptance and manual application to the generation prompts.

flowchart LR
  V[Validator run] --> ISSUE[Issue documents<br/>with remediation notes]
  ISSUE -- every week --> DIG[Weekly digest job]
  DIG -- one synthesis call per cluster --> GEM[Gemini 3.1 Pro]
  GEM --> CFD[Feedback-digest collection]
  CFD --> ADMIN[Admin reviews in console]
  ADMIN -. manual edit .-> PROMPTS[Generation prompts]
  PROMPTS -. next syllabus is better .-> V

Figure 9. The prompt-improvement feedback loop. Validator findings drive prompt edits that prevent the next syllabus from producing the same bug.
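The grouping step of the weekly digest (seven-day window, one cluster per content kind and root cause) can be sketched as follows. The note field names (`createdAt`, `contentKind`, `rootCause`, `remediationNote`) are assumptions about the issue document shape, and the synthesis call itself is omitted.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def cluster_notes(notes: list, now: datetime, window_days: int = 7) -> dict:
    """Group recent remediation notes by (content kind, root cause).

    Each cluster would then be sent to the synthesis model in a single
    call, yielding one prompt-edit suggestion per cluster.
    """
    cutoff = now - timedelta(days=window_days)
    clusters = defaultdict(list)
    for note in notes:
        if note["createdAt"] >= cutoff:
            key = (note["contentKind"], note["rootCause"])
            clusters[key].append(note["remediationNote"])
    return dict(clusters)
```

Keying on the pair rather than on root cause alone keeps, for example, a storyboard overlap problem separate from an interactive-SVG overlap problem, since the two are generated by different prompts.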

11. Operational characteristics

11.1 Recovery

A scheduled cleanup Cloud Function runs every 15 minutes. It scans for topics in every intermediate processing status and reapplies the appropriate enqueue. Quota-induced errors are retried after a 4-hour delay, up to 5 times.
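The quota-retry rule (4-hour delay, at most 5 retries) reduces to a small scheduling check that the sweeper could evaluate on each pass. This is a sketch under assumptions: the topic field names `quotaRetries` and `lastQuotaErrorAt` are hypothetical, and the sweeper's other recovery paths are not modelled.

```python
from datetime import datetime, timedelta, timezone

QUOTA_RETRY_DELAY = timedelta(hours=4)
MAX_QUOTA_RETRIES = 5


def quota_retry_due(topic: dict, now: datetime) -> bool:
    """True if a quota-failed topic should be re-enqueued on this sweep."""
    retries = topic.get("quotaRetries", 0)
    last_error = topic.get("lastQuotaErrorAt")
    if last_error is None or retries >= MAX_QUOTA_RETRIES:
        return False
    return now - last_error >= QUOTA_RETRY_DELAY
```

Because the sweeper runs every 15 minutes, a topic becomes eligible on the first sweep at or after the 4-hour mark rather than at an exact instant.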

11.2 Deployment isolation

Three independent guards prevent any validator code from entering production:

Package isolation. The validator package is referenced by neither the web app nor the cloud-function backend; production deploys ship only the latter two.
Sandbox route exclusion. Sandbox-only React routes use a development-only file-extension convention; the build configuration includes that extension only when an explicit sandbox flag is set at build time.
CI verification. A release-time script fails the build if any sandbox path leaks into the static export bundle.

11.3 Telemetry

Signal	Where to look
Agent activity	Centralised log stream, filtered by the per-agent log-line prefix.
Admin-attention conditions	Admin-alert log sentinel; permanent-failure marker on the topic document.
SVG root-cause distribution	BigQuery analytics table, partitioned by feedback date.
Validator runs	Admin review console, Content Tests page.
Weekly prompt suggestions	Admin review console, Prompt Feedback page.

12. Conclusion

The whiz.coach syllabus pipeline demonstrates a workable production pattern for AI-generated educational content: decompose generation into specialised, bounded, idempotent agents on managed cloud infrastructure; defer subjective visual judgement to a fully-local, privacy-preserving validation tier; and close the loop by feeding validator findings back into generation prompts. The same architecture that produces a 60-topic syllabus in hours also catches its own regressions overnight and proposes the fixes, without ever uploading a learner's unreleased curriculum to a third-party vision model.

The two design choices that did the most work were fire-and-forget for asymmetric latency (the image / storyboard fan-out) and cloud-generate / local-validate (the 2026-05-12 pivot). Both replaced synchronous coupling with a small amount of well-placed convention: a pre-determined URL pattern in one case, a six-rule local-only contract in the other.

Topics covered in more depth in companion material.

Canonical implementation guide for the twelve agents.
Local validation: hard rules, quick-start, and a shipped-vs-target capability table.
Per-tier validator architecture, per-content-type drivers, data-schema reference, feedback loop, and the operator runbook for local corrections.
Cloud Tasks agent system: queue topology, loop-prevention principles, recovery patterns.
Syllabus pipeline: the two-phase planning plus content split.
Companion paper: Pedagogically Rigorous Question and Diagram Generation and Validation using Multi‑Agent Orchestration.