whiz.coach Engineering · White Paper

Visually Engaging Study Material Generation and Validation using Multi‑Agent Orchestration

How twelve specialised AI agents on Google Cloud, paired with a fully-local MLX validation tier, turn raw syllabus sources into responsive, audio-narrated, SVG-rich study material at scale.
Document version: v1.0 · 2026-04-12
Author: Ashish Awasthi
Audience: Engineering, Architecture, Educational Product
Scope: Cloud generation pipeline + post-hoc local validator

1. Summary

The whiz.coach platform turns an instructor-provided syllabus, anything from a national curriculum PDF to a one-page outline, into a complete, learner-ready study experience: structured topics, narrated audio with synchronised SVG storyboards, interactive visualisations, practice questions, flashcards, and a master index. This document describes the two production subsystems that make that transformation reliable at scale:

  1. A cloud-resident multi-agent pipeline built on Firebase Cloud Functions and Cloud Tasks. Twelve specialised agents, each owning one slice of the content lifecycle, collaborate through Firestore state and asynchronous task queues. The pipeline uses Gemini (with thinking enabled) for every LLM call, Cloud TTS for narration, and Cloud STT for forced alignment.
  2. A post-hoc local validation subsystem running entirely on an Apple Silicon MacBook Pro M5 Max via MLX. Two locally-served Qwen3 models (a coder for corrections and a 32B Thinking VL model for vision judgement) read every generated artefact and propose corrections, without uploading a single byte to a remote LLM provider.

Together they form a generate-cloud / validate-local architecture that combines the throughput of managed cloud services with the privacy, cost, and iteration speed of on-device inference.

Three load-bearing properties.

2. The content problem

A single subject (say, Cambridge Primary Maths Stage 5) expands into roughly 60–120 topics. Each topic must yield:

The naive approach, a single monolithic prompt, fails on three axes: output budget (a 60-topic syllabus easily exceeds Gemini's 65k output tokens), partial-failure isolation (a TTS rate limit should not erase finished content work), and verifiability (each artefact must be inspectable on its own by domain-specific judges). The system therefore decomposes generation into specialised agents and validation into specialised tiers.

3. System architecture

The system is partitioned across three planes: a cloud generation plane, a local validation plane, and a cloud-side user-feedback correction plane that handles thumbs-down submissions from learners. All three share Firestore as the source of truth.

flowchart LR
  subgraph CG["Cloud Generation Plane (Firebase + GCP)"]
    direction TB
    CF[Cloud Functions]
    CT[Cloud Tasks Queues]
    FS[(Firestore)]
    CS[(Cloud Storage)]
    GM[Gemini 3.1 Pro<br/>thinking enabled]
    TTS[Cloud TTS<br/>+ ElevenLabs]
    STT[Cloud Speech-to-Text]
    YT[YouTube Data API]
    CF -- enqueue/poll --> CT
    CT -- HTTP trigger --> CF
    CF -- read/write --> FS
    CF -- artefacts --> CS
    CF -- prompts --> GM
    CF -- narration --> TTS
    CF -- forced align --> STT
    CF -- search --> YT
  end
  subgraph LV["Local Validation Plane (M5 Max, MLX)"]
    direction TB
    CLI[content-validator CLI]
    PW[Playwright Chromium]
    MLX1[MLX :8000<br/>Qwen3-Coder-Next 4-bit]
    MLX2[MLX :8001<br/>Qwen3-VL 32B Thinking]
    CLI --> PW
    CLI -- corrections --> MLX1
    CLI -- vision --> MLX2
  end
  subgraph UF["User-feedback correction plane"]
    direction TB
    APP[React App]
    FAPI[Feedback API]
    SCA[SvgCorrectionAgent<br/>Gemini Path A or B]
    APP -- thumbs-down --> FAPI
    FAPI -- Cloud Task --> SCA
    SCA --> FS
    SCA --> CS
  end
  CS -. read .-> CLI
  FS -. read/write .-> CLI
  FS -. learner data .-> APP
  classDef cloud fill:#ffe9b3,stroke:#7c5800,color:#2a2417;
  classDef local fill:#d6f0d6,stroke:#265a26,color:#1a3a1a;
  classDef user fill:#f8d3c2,stroke:#8a3315,color:#3a1f15;
  class CG,CF,CT,FS,CS,GM,TTS,STT,YT cloud;
  class LV,CLI,PW,MLX1,MLX2 local;
  class UF,APP,FAPI,SCA user;
Figure 1. Three planes and their shared substrates (Firestore and Cloud Storage).
Legend: cloud generation · local validation · user-feedback correction

4. The agent cast

Each agent extends a common base class, receives its work via a Cloud Tasks message, transitions topic state in Firestore, and either chains to the next agent or fans out in parallel. Twelve agents are in production.

| Agent | Queue | Thinking | Responsibility |
| --- | --- | --- | --- |
| SourceBriefAgent | source-brief | medium | Reads uploaded sources + Google Search enrichment; emits canonical Markdown briefs, syllabus overview, and up to 10 key terms. Also handles source-grounded topic additions. |
| SyllabusPlanningAgent | planning | high | Proposes modules, topics, prerequisites, topic kind (concept / lab / capstone / reference), and the topic-hierarchy tier graph for mastery learning. |
| TopicContentAgent | content | high | Age-adaptive structured JSON (intro, real-world apps, core content, exam questions, etc.) plus flashcards via a dedicated split Gemini call. |
| ResourceValidationAgent | resource | low | YouTube Data API + AI search for external links; URL liveness check; weighted scoring (relevance 40 / quality 30 / freshness 20 / accessibility 10). |
| ContentValidationAgent | validation | medium | LLM-driven factual + pedagogical + completeness + difficulty audit. Decides between expert review, improvement loop, or fan-out to image + storyboard generation. |
| ContentImprovementAgent | improvement | medium | Iterative content rewrite (max 3 cycles); also honours admin "edit this topic" instructions via a request-document pattern. |
| ImageGenerationAgent | image | high | Static images via flash image model; interactive SVGs via Gemini with thinkingLevel: high. Enforces per-age SVG caps and anti-duplication rules. |
| TopicStoryboardAgent | storyboard | high | Fire-and-forget; Gemini narration + SVG + cues, Cloud TTS synthesis, Cloud Speech-to-Text long-running recognition for forced alignment, MP3 + storyboard JSON to pre-determined URLs. |
| StudyMaterialRenderAgent | render | n/a | Firestore JSON → mobile-first HTML; defensive script wrapping; audio player with pre-determined URL + onloadedmetadata graceful reveal; LO marker + reveal-on-click engagement bridges. |
| HTMLValidationAgent | html-validation | low | Puppeteer pass: element overlap, SVG internal overlap, broken form controls (<select> inside <foreignObject>), console-error capture. Triggers fix_svg auto-fix. |
| IndexGenerationAgent | index | medium | Fires only when completedCount === totalTopics; topological-sort learning sequence, aggregated resources, AI-generated study tips, key-terms section. |
| SvgCorrectionAgent | svg-correction | high | User-feedback driven. Path A: fix existing SVG from screenshot + description. Path B: regenerate empty SVG from spec (age + domain aware). 11-category root-cause taxonomy. |
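The weighted scoring listed for ResourceValidationAgent can be illustrated with a short sketch. Only the four weights (40 / 30 / 20 / 10) come from this document; the 0 to 1 scale for each criterion, the function name `resource_score`, and the error handling are assumptions made for illustration.

```python
# Weights from the ResourceValidationAgent description; they sum to 100.
WEIGHTS = {"relevance": 40, "quality": 30, "freshness": 20, "accessibility": 10}


def resource_score(components: dict[str, float]) -> float:
    """Combine per-criterion scores in [0, 1] into a 0-100 weighted total.

    The [0, 1] component scale is an assumption, not taken from the pipeline.
    """
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= components[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Hypothetical candidate video: 30 + 15 + 20 + 5 = 70.0
video = {"relevance": 0.75, "quality": 0.5, "freshness": 1.0, "accessibility": 0.5}
print(resource_score(video))
```

Because relevance carries four times the weight of accessibility, a highly relevant but poorly captioned video still outranks an accessible but off-topic one under this scheme.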

An optional thirteenth agent, PastPaperAnalysisAgent, runs on demand to surface exam-paper patterns into the syllabus document.
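The IndexGenerationAgent's two behaviours, firing only once every topic is complete and ordering topics by prerequisite, can be sketched with Python's standard `graphlib`. This is an illustrative stand-in rather than the agent's actual code; the function names and the topic ids are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def should_generate_index(completed_count: int, total_topics: int) -> bool:
    """Mirror the completedCount === totalTopics gate from the agent table."""
    return total_topics > 0 and completed_count == total_topics


def learning_sequence(prerequisites: dict[str, set[str]]) -> list[str]:
    """Order topic ids so that every prerequisite precedes its dependants.

    `prerequisites` maps a topic id to the set of topic ids it depends on.
    """
    sorter = TopologicalSorter(prerequisites)
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes forming the detected cycle.
        raise ValueError(f"prerequisite cycle: {exc.args[1]}") from exc


# Hypothetical topic graph for a primary maths syllabus.
graph = {
    "multiplication": set(),
    "division": {"multiplication"},
    "fractions": {"division"},
    "decimals": {"fractions"},
}
print(learning_sequence(graph))
```

A cycle in the planner's prerequisite graph surfaces as an error here instead of producing a silently wrong learning sequence.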

5. Multi-agent interactions

5.1 The happy-path pipeline

For a fresh syllabus, the pipeline cascades from a single Firestore onCreate trigger. Agents that share no dependency run in parallel, both across topics and within a single topic (most visibly the image + storyboard fan-out).

flowchart TD
  U([Admin creates syllabus]) --> SB[SourceBriefAgent<br/>+ Google Search]
  SB --> SP[SyllabusPlanningAgent]
  SP -- per topic, parallel --> TC[TopicContentAgent]
  TC --> RV[ResourceValidationAgent]
  RV --> CV[ContentValidationAgent]
  CV -- valid --> FAN((fan-out))
  CV -- invalid, less than 3 --> CI[ContentImprovementAgent]
  CI --> CV
  CV -- invalid, 3 plus --> EXPERT[Expert review] --> FAN
  FAN --> IG[ImageGenerationAgent]
  FAN --> TS[TopicStoryboardAgent<br/>fire-and-forget]
  IG --> SMR[StudyMaterialRenderAgent]
  SMR --> HV[HTMLValidationAgent]
  HV -- ok --> DONE([Topic completed])
  HV -- issues --> IG
  TS -. uploads to pre-determined URLs .-> CS[(Cloud Storage)]
  DONE -. when all topics complete .-> IDX[IndexGenerationAgent]
  IDX --> SMR2[StudyMaterialRenderAgent<br/>index page]
  SMR2 --> DONE2([Syllabus completed])
  classDef io fill:#fdf3ef,stroke:#8a3315;
  class CS io;
Figure 2. Cloud generation pipeline. Each arrow corresponds to a Cloud Tasks enqueue with idempotency guards on the receiver.
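A receiver-side idempotency guard of the kind mentioned in the caption might look like the following sketch. Cloud Tasks can deliver a message more than once, so the receiver acts only if the topic is still in the status the task expects. The in-memory `TopicStore`, the status names, and `handle_render_task` are hypothetical; a real receiver would perform the check-and-set inside a Firestore transaction.

```python
from dataclasses import dataclass, field


@dataclass
class TopicStore:
    """In-memory stand-in for Firestore topic documents (hypothetical)."""

    status: dict[str, str] = field(default_factory=dict)

    def claim(self, topic_id: str, expected: str, claimed: str) -> bool:
        """Move a topic from `expected` to `claimed`; False if it already moved."""
        if self.status.get(topic_id) != expected:
            return False
        self.status[topic_id] = claimed
        return True


def handle_render_task(store: TopicStore, topic_id: str, rendered: list[str]) -> None:
    """Render a topic once, even if the same task is delivered repeatedly."""
    if not store.claim(topic_id, expected="images_ready", claimed="rendering"):
        return  # duplicate or stale delivery: do nothing
    rendered.append(topic_id)  # placeholder for the actual render work
    store.status[topic_id] = "rendered"


store = TopicStore({"t1": "images_ready"})
work_log: list[str] = []
handle_render_task(store, "t1", work_log)
handle_render_task(store, "t1", work_log)  # redelivery is ignored
print(work_log, store.status["t1"])
```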

5.2 The parallel image + storyboard fan-out

The storyboard path is intentionally fire-and-forget: it never chains to another agent. The rendered HTML is emitted with pre-determined URLs for the MP3 and storyboard JSON, so the renderer does not need to wait for audio completion. Instead, it embeds an <audio> tag whose onloadedmetadata handler reveals the player only when the file becomes fetchable. This eliminates a synchronous dependency between two paths of very different latency (1–3 min for the storyboard vs 5–15 min for images).

sequenceDiagram
  participant CV as ContentValidationAgent
  participant IG as ImageGenerationAgent
  participant TS as TopicStoryboardAgent
  participant GEM as Gemini
  participant TTS as Cloud TTS
  participant STT as Cloud STT
  participant CS as Cloud Storage
  participant SMR as StudyMaterialRenderAgent
  participant HV as HTMLValidationAgent
  CV->>IG: enqueue generate images
  CV->>TS: enqueue generate storyboard
  par Storyboard path (1 to 3 min)
    TS->>GEM: request narration, SVG and cues
    GEM-->>TS: text and svg
    TS->>TTS: synthesize MP3
    TTS-->>TS: audio bytes
    TS->>CS: upload audio at pre-determined URL
    TS->>STT: long-running recognition
    STT-->>TS: per-word timings
    TS->>CS: upload storyboard JSON
  and Image and SVG path (5 to 15 min)
    IG->>GEM: SVG codegen with high thinking
    GEM-->>IG: svg blocks
    IG->>CS: upload images
    IG->>SMR: enqueue render
    SMR->>CS: upload HTML with pre-determined audio URL
    SMR->>HV: enqueue HTML validation
    HV->>HV: Puppeteer pass
    HV->>IG: enqueue fix-SVG if issues, else mark topic completed
  end
Figure 3. Parallel fan-out from ContentValidationAgent. Pre-determined URLs decouple the two paths.

5.3 The SVG auto-fix loop

HTML validation is the only loop in the system that actively rewrites artefacts. It is bounded by an SVG fix-attempt counter, a regeneration-triggered flag, and a fallback needs-review state that the cleanup scheduler picks up after two hours.

stateDiagram-v2
  [*] --> Rendered
  Rendered --> Validating: HTMLValidationAgent
  Validating --> Completed: no issues
  Validating --> Fixing: issues found, fix attempts < 3
  Fixing --> Rerendering: ImageGenerationAgent fix-SVG
  Rerendering --> Validating: StudyMaterialRenderAgent
  Validating --> Regenerating: fix attempts >= 3, not yet regenerated
  Regenerating --> Rerendering: ImageGenerationAgent regenerate-SVGs
  Validating --> NeedsReview: regeneration also failed
  NeedsReview --> Fixing: cleanup scheduler, 2h
  Completed --> [*]
Figure 4. SVG auto-fix state machine. Bounded by counters; cleanup scheduler unblocks stalled needs-review topics.
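The decision taken in the Validating state can be written as a small pure function. The sketch below encodes only the four outgoing transitions shown in Figure 4; the enum, the function name, and the argument names are illustrative, and whether the fix counter resets after a regeneration is not specified here, so the sketch does not model it.

```python
from enum import Enum

MAX_FIX_ATTEMPTS = 3  # bound stated in Figure 4


class NextState(Enum):
    COMPLETED = "completed"
    FIXING = "fixing"
    REGENERATING = "regenerating"
    NEEDS_REVIEW = "needs_review"


def after_validation(issues_found: bool, fix_attempts: int, regenerated: bool) -> NextState:
    """Pick the transition out of the Validating state."""
    if not issues_found:
        return NextState.COMPLETED
    if fix_attempts < MAX_FIX_ATTEMPTS:
        return NextState.FIXING
    if not regenerated:
        return NextState.REGENERATING
    return NextState.NEEDS_REVIEW


# Walk a topic whose SVG never validates: three fixes, one regeneration, then review.
for attempts, regenerated in [(0, False), (1, False), (2, False), (3, False), (3, True)]:
    print(attempts, regenerated, after_validation(True, attempts, regenerated).value)
```

Keeping the decision free of I/O makes each bound easy to unit-test independently of Puppeteer or Cloud Tasks.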

5.4 The user-feedback correction path

When a learner taps thumbs-down on an interactive SVG, the web app posts a feedback record to a dedicated Cloud Function, which enqueues SvgCorrectionAgent on its own Cloud Tasks queue. This is the one cloud-side correction path that remained on Gemini after the 2026-05-12 local-validator pivot.

sequenceDiagram
  participant L as Learner (React app)
  participant API as Feedback API
  participant CT as SVG correction queue
  participant SCA as SvgCorrectionAgent
  participant GEM as Gemini 3.1 Pro
  participant CS as Cloud Storage
  participant FS as Firestore
  L->>L: captureSvgScreenshot(), Canvas API
  L->>API: submitSvgFeedback(comment, png?, isSvgEmpty)
  API->>CS: upload screenshot (if any)
  API->>FS: append SVGFeedback to topic doc
  API->>CT: enqueue analyze_and_correct
  CT->>SCA: HTTP trigger
  alt svgCode is non-empty
    SCA->>CS: download screenshot + HTML
    SCA->>GEM: vision analysis + fix
    GEM-->>SCA: corrected SVG section
    SCA->>SCA: categorize root cause (11 classes)
    SCA->>CS: upload corrected HTML (public)
    SCA->>FS: feedback.correctionStatus = 'corrected'
  else svgCode is empty
    SCA->>FS: read SVG spec (title, concept, prompt)
    SCA->>GEM: age + domain aware regenerate
    GEM-->>SCA: fresh svg + interaction script
    SCA->>CS: inject + upload HTML
    SCA->>FS: regeneratedSuccessfully = true
  end
Figure 5. User-feedback correction flow with Path A (fix) and Path B (regenerate).

6. Platform and external services

The pipeline composes a small number of Google Cloud primitives and a handful of external service providers, each chosen for a specific operational property.

| Component | Role | How the pipeline uses it |
| --- | --- | --- |
| Cloud Functions v2 | Agent runtime | A single HTTP entry point dispatches each Cloud Tasks message to the named agent. Per-function timeouts up to 60 min support long Gemini calls and speech-to-text polling. |
| Cloud Tasks | Async messaging | One queue per agent lets each stage be rate-limited and retried independently. Typical retry budget is 2 retries (3 total attempts); per-queue dispatch deadlines bound the total attempt budget. |
| Firestore | State + idempotency | Topic documents carry status enums; sibling request-document subcollections (for admin-initiated edits and topic additions) make retries idempotent. Document-create triggers seed the pipeline. |
| Cloud Storage | Artefact host | Per-syllabus prefixes hold images, audio, storyboard JSON, and rendered HTML. Pre-determined URLs let the renderer emit links to assets that don't yet exist. |
| Cloud TTS (Neural2 / Chirp3-HD) | Narration synthesis | Voice selected per (language, ageGroup): Journey-F for younger learners, Chirp3-HD-Aoede for older, Hindi voices for Devanagari content. Plain text, no SSML. |
| ElevenLabs | TTS + alignment | Primary English TTS; provides forced alignment to ±8 ms (still gated for non-English). |
| Cloud Speech-to-Text | Forced alignment | Long-running recognition on the synthesised MP3 yields per-word timings that map each storyboard cue phrase to a time offset in seconds. |
| Cloud Scheduler | Maintenance + digests | 15-minute cleanup sweeper for stalled topics; weekly Gemini synthesis over the local validator's remediation notes. |
| BigQuery | Analytics warehouse | Daily export of SVG feedback (partitioned by date, clustered by root_cause) feeds the weekly prompt-improvement review. |
| YouTube Data API | External resources | Language-steered search; non-English syllabi additionally prepend the language's native name to the query string for stronger search-result bias. |
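The forced-alignment step, mapping each storyboard cue phrase to a time offset using per-word timings, can be sketched as a phrase search over the recognised words. This is a minimal illustration under stated assumptions: the `WordTiming` shape, the normalisation rule, and first-occurrence matching are not taken from the pipeline, which may match cues differently.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WordTiming:
    word: str
    start_s: float  # offset of the word's start in seconds


def _norm(token: str) -> str:
    """Lower-case a token and strip punctuation so 'Whole.' matches 'whole'."""
    return re.sub(r"[^\w]", "", token.lower())


def cue_offset(words: list[WordTiming], cue_phrase: str) -> Optional[float]:
    """Return the start time of the first occurrence of cue_phrase, or None."""
    target = [t for t in (_norm(p) for p in cue_phrase.split()) if t]
    if not target:
        return None
    spoken = [_norm(w.word) for w in words]
    for i in range(len(spoken) - len(target) + 1):
        if spoken[i:i + len(target)] == target:
            return words[i].start_s
    return None


# Hypothetical recogniser output for one narration sentence.
timings = [
    WordTiming("A", 0.0), WordTiming("fraction", 0.25), WordTiming("is", 0.75),
    WordTiming("part", 1.0), WordTiming("of", 1.5), WordTiming("a", 1.75),
    WordTiming("whole.", 2.0),
]
print(cue_offset(timings, "part of a whole"))
```

A cue whose phrase is not found returns `None`, which lets the caller decide whether to fall back to a neighbouring cue or flag the storyboard for review.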

6.1 Queue topology

Each agent owns its queue so that a backlog in one stage cannot stall another. Queue-level rate limits map onto provider quotas (Gemini RPM, Cloud TTS RPM, STT operation budget).

flowchart LR
  T1[source-brief-queue] --> T2[planning-queue]
  T2 --> T3[content-queue]
  T3 --> T4[resource-queue]
  T4 --> T5[validation-queue]
  T5 --> T6[improvement-queue]
  T5 --> T7[image-queue]
  T5 --> T8[storyboard-queue]
  T7 --> T9[render-queue]
  T9 --> T10[html-validation-queue]
  T10 --> T7
  T10 --> T11[index-queue]
  T11 --> T9
  FB[Web app] --> T12[SVG correction queue]
  classDef q fill:#f8fafc,stroke:#475569,color:#1f2937;
  class T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12 q;
Figure 6. Cloud Tasks queue topology. The auto-fix loop (T10 ↔ T7) and the index trigger (T10 → T11) are the two re-entry edges.

7. Engineering techniques

7.1 Loop prevention in three layers

7.2 Bounded iteration

| Bound | Where | What it prevents |
| --- | --- | --- |
| 3 improvement attempts | ContentImprovementAgent | Endless validate ↔ improve cycles on irreducible content disputes. |
| 3 SVG fix attempts, then regenerate | HTMLValidationAgent | Repeated cosmetic patches that never converge. |
| 1 regeneration per topic | regeneration-triggered flag | Regenerate-fix-regenerate ping-pong. |
| 2 Cloud Tasks retries (3 total) | Queue config | Transient provider failures masquerading as permanent ones. |
| 15 attempts hard cap | Base-agent attempt counter | Any path that escaped the agent-specific bounds. |
| 20 correction iterations | Local validator | Validator-side auto-correction divergence. |

7.3 Fire-and-forget for asymmetric latency

The image path takes 5–15 minutes; the storyboard path takes 1–3 minutes. Coupling them would either delay the page or block on the slower one. Instead, the renderer commits to a pre-determined URL pattern, content/{syllabusId}/audio/{topicId}.mp3, and emits HTML against that path. The <audio> element's onloadedmetadata handler reveals the player only when the browser successfully parses the metadata header; if the file never lands, the section silently remains hidden. No re-render is required when audio completes.
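The pre-determined URL convention can be sketched in a few lines. The path pattern `content/{syllabusId}/audio/{topicId}.mp3` is from the text; the id alphabet check, the `base_url` value, and the exact markup of the hidden player are assumptions made for illustration.

```python
import html
import re

# Assumed id alphabet; the pipeline's real id format is not given here.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def audio_object_path(syllabus_id: str, topic_id: str) -> str:
    """Build the storage path the renderer commits to before the MP3 exists."""
    for value in (syllabus_id, topic_id):
        if not _SAFE_ID.fullmatch(value):
            raise ValueError(f"unsafe id: {value!r}")
    return f"content/{syllabus_id}/audio/{topic_id}.mp3"


def audio_tag(base_url: str, syllabus_id: str, topic_id: str) -> str:
    """Emit a player that stays hidden until the browser parses the metadata."""
    src = f"{base_url.rstrip('/')}/{audio_object_path(syllabus_id, topic_id)}"
    return (
        f'<audio controls hidden preload="metadata" src="{html.escape(src, quote=True)}" '
        'onloadedmetadata="this.hidden = false"></audio>'
    )


# Hypothetical bucket URL and ids.
print(audio_tag("https://storage.example.com/bucket", "syl-42", "topic-7"))
```

Because both the storyboard agent and the renderer derive the path from the same two ids, neither needs to tell the other where the file landed.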

7.4 Topic-kind dispatch

A single topic-kind field, chosen by the planner from concept, lab, capstone, or reference, branches both the content prompt and the renderer's section order. A lab topic emits hands-on steps, pitfalls, and tools; a capstone emits a rubric; a reference emits a glossary. Adding a new kind is therefore a content-prompt change plus a renderer switch, not a new pipeline.
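A table-driven dispatch is one way to realise this. In the sketch below the four kinds and their distinguishing sections (hands-on steps, pitfalls, tools, rubric, glossary) follow the text, while every other section name, the ordering, and the function name are hypothetical.

```python
# Section order per topic kind. Only the kind-specific sections come from the
# text; the shared sections and their order are illustrative assumptions.
SECTION_ORDER: dict[str, list[str]] = {
    "concept": ["intro", "core_content", "exam_questions"],
    "lab": ["intro", "hands_on_steps", "pitfalls", "tools"],
    "capstone": ["intro", "core_content", "rubric"],
    "reference": ["intro", "glossary"],
}


def sections_for(topic_kind: str) -> list[str]:
    """Return the renderer's section order for a topic kind."""
    try:
        return list(SECTION_ORDER[topic_kind])  # copy so callers cannot mutate the table
    except KeyError:
        raise ValueError(f"unknown topic kind: {topic_kind!r}") from None


print(sections_for("lab"))
```

Under this layout, supporting a fifth kind means adding one table entry and one prompt variant, which matches the "not a new pipeline" property described above.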

7.5 Age- and language-adaptive generation

Every agent that touches learner-facing text or visuals reads the syllabus's average-age and language fields and routes through shared configuration tables for age-tuned content, age-tuned imagery, and the language registry. The same agent that produces a playful, emoji-rich cartoon storyboard for a 9-year-old produces a slate/sky-blue editorial schematic for an adult, simply by reading two scalar fields and a config map.

7.6 Thinking levels as a cost knob

| Level | Agents |
| --- | --- |
| high | SVG generation, image generation, content generation, syllabus planning, SvgCorrection. |
| medium | Content validation, content improvement, index generation, SVG fix analysis, past-paper analysis. |
| low | Resource validation, HTML validation. |

7.7 Defensive SVG generation

SVG output is the system's highest-failure-rate artefact. The pipeline applies a layered defence:

8. Local validation subsystem

The local validator is a self-contained package that is the exclusive owner of every post-hoc validation, audit, and fix-proposal in the system. The 2026-05-12 pivot moved all visual judgement and validator-side correction off cloud Gemini and onto the operator's MacBook Pro M5 Max via MLX.

Six hard rules govern the local validator:
  1. Local MLX is the only non-Anthropic model surface for testing.
  2. Screenshots stay in unified memory, loopback HTTP only, no remote vision uploads.
  3. Real-time validation inside the cloud generation pipeline stays on cloud Gemini.
  4. Validator-side post-hoc correction is local-only.
  5. Every issue carries a remediation note that names the bug, the cause, and the prompt sentence that would prevent it.
  6. Every test run is deletable, including its Firestore docs and Cloud Storage prefix.
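Rule 2 (loopback HTTP only) lends itself to a mechanical check before any screenshot is sent. The following is a sketch of such a guard, not the validator's actual code; the function name is hypothetical, and treating the literal host name `localhost` as loopback is an assumption that holds only if the machine's resolver has not been reconfigured.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_endpoint(url: str) -> bool:
    """Return True only for HTTP(S) URLs whose host is a loopback address."""
    parts = urlsplit(url)
    host = parts.hostname
    if parts.scheme not in ("http", "https") or host is None:
        return False
    if host == "localhost":
        return True  # assumed to resolve to loopback on the operator's machine
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a DNS name other than localhost is treated as remote


for endpoint in ("http://127.0.0.1:8000/v1", "http://[::1]:8001/v1", "https://api.example.com/v1"):
    print(endpoint, is_loopback_endpoint(endpoint))
```

Calling a guard like this at client construction time turns an accidental remote endpoint into an immediate failure instead of a silent upload.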

8.1 The MLX server pair

| Port | Model | Role |
| --- | --- | --- |
| :8000 | Qwen3-Coder-Next 4-bit | Auto-correction tiers: cue, SVG, or JSON patches, with a full regeneration tier for narration + SVG. Also writes remediation notes per issue. |
| :8001 | Qwen3-VL 32B Thinking | Vision judge for every captured viewport; per-cue storyboard precision tier; strict-consensus second pass when first-pass confidence < 0.7. |
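The second-pass rule for the vision judge can be sketched as follows. The 0.7 threshold is from the table; reading "strict consensus" as "report an issue only if both passes report it" is an assumption, as are the `Verdict` shape and the callables standing in for the two model calls.

```python
from typing import Callable

# (issue_found, confidence in [0, 1]); this shape is a hypothetical stand-in.
Verdict = tuple[bool, float]

CONFIDENCE_FLOOR = 0.7  # threshold stated for the vision judge


def judge_with_consensus(
    first_pass: Callable[[], Verdict],
    second_pass: Callable[[], Verdict],
) -> bool:
    """Return True if an issue should be reported for this viewport."""
    found, confidence = first_pass()
    if confidence >= CONFIDENCE_FLOOR:
        return found  # confident first pass stands on its own
    second_found, _ = second_pass()
    # Assumed meaning of strict consensus: both passes must flag the issue.
    return found and second_found


# A low-confidence detection that the second pass does not confirm is dropped.
print(judge_with_consensus(lambda: (True, 0.55), lambda: (False, 0.9)))
```

The second model call is made only when needed, so confident verdicts cost a single inference.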

8.2 Activity inventory

The validator decomposes into 24 named activities. The headline is that none of them invokes a cloud LLM. A representative sample:

| ID | Activity | Where it runs |
| --- | --- | --- |
| A1 | Render topic HTML page | local Playwright |
| A3 | Vision issue detection | MLX :8001 |
| A6 | Storyboard precision (per-cue + narration judge) | MLX :8001 |
| A9 | Storyboard auto-correction (cue / svg / json patches) | MLX :8000 + GCS upload |
| A10 | Full regenerate (narration + SVG, reuse existing MP3) | MLX :8000, no TTS call |
| A11 | Topic-HTML interactive-SVG auto-correction | MLX :8000 |
| A17 | Per-issue remediation-note annotation | MLX + Firestore write |
| A20 | Auto-upload of run report to admin dashboard | Firestore + Cloud Storage write |
| A23 | Triage validator report | local, no LLM |

8.3 The local correction loop

flowchart TD
  RUN[validate-syllabus or topic-html<br/>with --auto-correct] --> CAP[Capture viewports via<br/>Playwright + CDP screencast]
  CAP --> VJ[Vision judge<br/>MLX :8001]
  VJ -- issues --> CODER[Coder picks tier<br/>cue / svg / json / full_regenerate]
  CODER -- MLX :8000 --> PATCH[Apply patch to artefact]
  PATCH --> UP[Upload to Cloud Storage<br/>+ Firestore write]
  UP --> RECHK[Re-validate]
  RECHK -- clean --> OK([Verification passes])
  RECHK -- still failing, less than 20 iters --> VJ
  RECHK -- 20 iters or 3 no-progress --> FAIL([Verification fails<br/>remaining issues sent to admin])
  classDef local fill:#d6f0d6,stroke:#265a26;
  class CAP,VJ,CODER,PATCH local;
Figure 7. Local validator's bounded correction loop. Reuses the existing MP3 on full_regenerate via heuristic cue re-alignment against the original forced-alignment timings.
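The two exit conditions in Figure 7 (20 iterations, or 3 iterations without progress) can be expressed as a small loop. In this sketch "no progress" is taken to mean that the issue count did not decrease after a patch; that definition, the callables, and the function name are assumptions for illustration.

```python
from typing import Callable

MAX_ITERATIONS = 20   # bound from Figure 7
MAX_NO_PROGRESS = 3   # bound from Figure 7


def correction_loop(count_issues: Callable[[], int], apply_patch: Callable[[], None]) -> bool:
    """Return True when validation comes back clean, False when a bound trips."""
    remaining = count_issues()
    stalled = 0
    for _ in range(MAX_ITERATIONS):
        if remaining == 0:
            return True
        apply_patch()
        after = count_issues()
        # Assumed definition of "no progress": the issue count did not drop.
        stalled = stalled + 1 if after >= remaining else 0
        if stalled >= MAX_NO_PROGRESS:
            return False
        remaining = after
    return remaining == 0


# Toy artefact with three issues, each patch fixing one of them.
issues = [3]
print(correction_loop(lambda: issues[0], lambda: issues.__setitem__(0, issues[0] - 1)))
```

The no-progress counter stops a stuck artefact after three wasted patches instead of burning all twenty iterations.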

8.4 Storage and observability

9. Cloud / local boundary

The system enforces a strict separation between where content is generated and where it is judged. Crossing the boundary is a privacy concern (screenshots may include unreleased curriculum), a cost concern (vision tokens are expensive), and a latency concern (loopback is microseconds; remote APIs are seconds).

flowchart LR
  subgraph CLOUD["Cloud, Generation"]
    direction TB
    GC1[Syllabus planning]
    GC2[Topic content]
    GC3[Storyboard text + SVG]
    GC4[Question + diagram gen]
    GC5[Practice game gen]
    GC6[Flashcard gen]
  end
  subgraph CLOUD_RT["Cloud, Real-time validation"]
    direction TB
    RT1[practice-game self-validation]
    RT2[html-validation-agent]
    RT3[storyboard lexical alignment]
    RT4[svg-layout-check]
    RT5[question validation-agent]
  end
  subgraph LOCAL["Local, Post-hoc validation"]
    direction TB
    L1[Topic HTML driver]
    L2[Storyboard precision]
    L3[Interactive SVG sweep]
    L4[Flashcard audit]
    L5[Game-specific solvers]
    L6[Auto-correction loop]
  end
  subgraph FB["Cloud, User feedback correction"]
    direction TB
    UF1[SvgCorrectionAgent]
    UF2[StoryboardCorrectionAgent]
    UF3[Gemini 3.1 Pro + ElevenLabs]
  end
  CLOUD -- artefacts --> LOCAL
  LOCAL -- remediation notes --> DIGEST[Weekly Gemini digest]
  DIGEST -. prompt edits .-> CLOUD
  FB -. learner thumbs-down .-> CLOUD_RT
  classDef cloud fill:#ffe9b3,stroke:#7c5800;
  classDef local fill:#d6f0d6,stroke:#265a26;
  classDef user fill:#f8d3c2,stroke:#8a3315;
  class CLOUD,CLOUD_RT,GC1,GC2,GC3,GC4,GC5,GC6,RT1,RT2,RT3,RT4,RT5,DIGEST cloud;
  class LOCAL,L1,L2,L3,L4,L5,L6 local;
  class FB,UF1,UF2,UF3 user;
Figure 8. Generation is cloud; post-hoc validation + correction is local; user-feedback-driven correction is cloud. The validator never crosses from local to cloud LLM on its hot path.

9.1 What lives on which side

| Concern | Cloud (Gemini / ElevenLabs) | Local (MLX Qwen3) |
| --- | --- | --- |
| Content generation | ✓ all C1–C7 | n/a |
| Real-time gates inside generation | ✓ 5 agents | n/a |
| Post-hoc validation | n/a | ✓ A1–A24 |
| Validator-side auto-correction | n/a | ✓ MLX :8000 |
| User thumbs-down correction | ✓ B1–B4 | n/a |
| Weekly prompt synthesis | ✓ Gemini 1 call per cluster | n/a |

10. Closing the loop

The validator does not merely flag bugs; it generates the data needed to prevent the same class of bug next time. Each issue document carries a remediation note written by the local coder model after the consensus vision pass. The note follows a strict three-part shape:

  1. What the bug is, in one sentence.
  2. The proximate cause inferred from the screenshot.
  3. The exact sentence the generation prompt should add to prevent the class.

A weekly Cloud Scheduler job reads the past seven days of notes, groups them by content kind and root cause, and sends each group to Gemini for synthesis into one prompt-edit suggestion. The output lands in a dedicated Firestore collection and surfaces in the admin review console for human acceptance and manual application to the generation prompts.
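The grouping step of the weekly digest can be sketched as below. The three-part note shape, the seven-day window, and the grouping key (content kind, root cause) follow the text; the dataclass fields, the function name, and the sample values are hypothetical, and the Gemini synthesis call itself is deliberately left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RemediationNote:
    content_kind: str      # e.g. storyboard or interactive SVG (names assumed)
    root_cause: str
    created_at: datetime
    bug: str               # part 1: what the bug is
    cause: str             # part 2: proximate cause
    prompt_sentence: str   # part 3: sentence to add to the generation prompt


def weekly_clusters(
    notes: list[RemediationNote], now: datetime
) -> dict[tuple[str, str], list[RemediationNote]]:
    """Group the past seven days of notes by (content kind, root cause)."""
    cutoff = now - timedelta(days=7)
    clusters: dict[tuple[str, str], list[RemediationNote]] = defaultdict(list)
    for note in notes:
        if note.created_at >= cutoff:
            clusters[(note.content_kind, note.root_cause)].append(note)
    return dict(clusters)


now = datetime(2026, 4, 12, tzinfo=timezone.utc)
sample = [
    RemediationNote("storyboard", "label_overlap", now - timedelta(days=1),
                    "Labels overlap.", "Fixed y offsets.", "Space labels by line height."),
    RemediationNote("storyboard", "label_overlap", now - timedelta(days=9),
                    "Old note.", "n/a", "n/a"),
]
for key, group in weekly_clusters(sample, now).items():
    print(key, len(group))
```

Each resulting cluster would then be sent to the synthesis model in a single call, so the number of LLM calls scales with distinct failure classes, not with individual issues.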

flowchart LR
  V[Validator run] --> ISSUE[Issue documents<br/>with remediation notes]
  ISSUE -- every week --> DIG[Weekly digest job]
  DIG -- one synthesis call per cluster --> GEM[Gemini 3.1 Pro]
  GEM --> CFD[Feedback-digest collection]
  CFD --> ADMIN[Admin reviews in console]
  ADMIN -. manual edit .-> PROMPTS[Generation prompts]
  PROMPTS -. next syllabus is better .-> V
Figure 9. The prompt-improvement feedback loop. Validator findings drive prompt edits that prevent the next syllabus from producing the same bug.

11. Operational characteristics

11.1 Recovery

A scheduled cleanup Cloud Function runs every 15 minutes. It scans for topics in every intermediate processing status and reapplies the appropriate enqueue. Quota-induced errors are retried after a 4-hour delay, up to 5 times.
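The quota-retry rule can be sketched as a scheduling helper that the sweeper would consult on each 15-minute pass. The 4-hour delay and the cap of 5 retries are from the text; the function names and the idea of storing a failure timestamp and retry count per topic are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

QUOTA_RETRY_DELAY = timedelta(hours=4)  # delay stated above
MAX_QUOTA_RETRIES = 5                   # cap stated above


def next_quota_retry(failed_at: datetime, retries_so_far: int) -> Optional[datetime]:
    """Return when a quota-failed topic may be re-enqueued, or None if exhausted."""
    if retries_so_far >= MAX_QUOTA_RETRIES:
        return None
    return failed_at + QUOTA_RETRY_DELAY


def is_due(failed_at: datetime, retries_so_far: int, now: datetime) -> bool:
    """True if the sweeper should re-enqueue the topic on this pass."""
    retry_at = next_quota_retry(failed_at, retries_so_far)
    return retry_at is not None and now >= retry_at


failed = datetime(2026, 4, 12, 8, 0, tzinfo=timezone.utc)
print(is_due(failed, 0, failed + timedelta(hours=3, minutes=45)))
print(is_due(failed, 0, failed + timedelta(hours=4)))
```

Since the sweeper runs every 15 minutes, a topic becomes eligible on the first pass at or after the 4-hour mark.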

11.2 Deployment isolation

Three independent guards prevent any validator code from entering production:

  1. Package isolation. The validator package is referenced by neither the web app nor the cloud-function backend; production deploys ship only the latter two.
  2. Sandbox route exclusion. Sandbox-only React routes use a development-only file-extension convention; the build configuration includes that extension only when an explicit sandbox flag is set at build time.
  3. CI verification. A release-time script fails the build if any sandbox path leaks into the static export bundle.

11.3 Telemetry

SignalWhere to look
Agent activityCentralised log stream, filtered by the per-agent log-line prefix.
Admin-attention conditionsAdmin-alert log sentinel; permanent-failure marker on the topic document.
SVG root-cause distributionBigQuery analytics table, partitioned by feedback date.
Validator runsAdmin review console, Content Tests page.
Weekly prompt suggestionsAdmin review console, Prompt Feedback page.

12. Conclusion

The whiz.coach syllabus pipeline demonstrates a workable production pattern for AI-generated educational content: decompose generation into specialised, bounded, idempotent agents on managed cloud infrastructure; defer subjective visual judgement to a fully-local, privacy-preserving validation tier; and close the loop by feeding validator findings back into generation prompts. The same architecture that produces a 60-topic syllabus in hours also catches its own regressions overnight and proposes the fixes, without ever uploading a learner's unreleased curriculum to a third-party vision model.

The two design choices that did the most work were fire-and-forget for asymmetric latency (the image / storyboard fan-out) and cloud-generate / local-validate (the 2026-05-12 pivot). Both replaced synchronous coupling with a small amount of well-placed convention: a pre-determined URL pattern in one case, a six-rule local-only contract in the other.

Topics covered in more depth in companion material.