The whiz.coach platform turns an instructor-provided syllabus (anything from a national curriculum PDF to a one-page outline) into a complete, learner-ready study experience: structured topics, narrated audio with synchronised SVG storyboards, interactive visualisations, practice questions, flashcards, and a master index. This document describes the two production subsystems that make that transformation reliable at scale: a cloud generation pipeline of specialised agents, and a local validator that judges and corrects their output.
Together they form a generate-cloud / validate-local architecture that combines the throughput of managed cloud services with the privacy, cost, and iteration speed of on-device inference.
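A single subject (say, Cambridge Primary Maths Stage 5) expands into roughly 60–120 topics. Each topic must yield structured content, practice and exam questions, flashcards, vetted external resources, static and interactive visuals, and a narrated storyboard, all rendered into a single mobile-first HTML page.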
The naive approach, a single monolithic prompt, fails on three axes: output budget (a 60-topic syllabus easily exceeds Gemini's 65k-token output limit), partial failure (a TTS rate limit should not erase completed content work), and verifiability (each artefact must be inspectable by its own domain-specific judge). The system therefore decomposes generation into specialised agents and validation into specialised tiers.
The system is partitioned across three planes: a cloud generation plane, a local validation plane, and a cloud-side user-feedback correction plane that handles thumbs-down submissions from learners. All three share Firestore as the source of truth.
Each agent extends a common base class, receives its work via a Cloud Tasks message, transitions topic state in Firestore, and either chains to the next agent or fans out in parallel. Twelve agents are in production.
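The agent contract described above (one Cloud Tasks message in, a Firestore state transition, then a chain or a fan-out) can be sketched in Python as follows. This is a minimal illustration, not the production base class: the message fields, the status strings, and the in-memory dict standing in for Firestore are all hypothetical, and only the 15-attempt hard cap comes from this document.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable

HARD_ATTEMPT_CAP = 15  # base-agent hard cap from the bounds table


@dataclass
class TaskMessage:
    """Hypothetical Cloud Tasks payload: target agent plus topic address."""
    agent: str
    syllabus_id: str
    topic_id: str


class BaseAgent(ABC):
    name = "base"
    next_agents: tuple = ()  # successors; empty means terminal (fire-and-forget)

    def __init__(self, store: dict, enqueue: Callable[[TaskMessage], None]) -> None:
        self.store = store      # in-memory stand-in for Firestore topic documents
        self.enqueue = enqueue  # stand-in for creating a Cloud Tasks task

    def handle(self, msg: TaskMessage) -> None:
        topic = self.store.setdefault(msg.topic_id, {"attempts": 0})
        topic["attempts"] += 1
        if topic["attempts"] > HARD_ATTEMPT_CAP:
            topic["status"] = "failed_permanent"  # flag for admin attention
            return
        topic["status"] = f"{self.name}_running"
        self.run(msg, topic)
        topic["status"] = f"{self.name}_done"
        # One successor is a chain; several successors are a parallel fan-out.
        for nxt in self.next_agents:
            self.enqueue(TaskMessage(nxt, msg.syllabus_id, msg.topic_id))

    @abstractmethod
    def run(self, msg: TaskMessage, topic: dict) -> None:
        """Agent-specific work (a Gemini call, a render, and so on)."""


class DemoValidationAgent(BaseAgent):
    """Illustrative agent that fans out to image + storyboard generation."""
    name = "validation"
    next_agents = ("image", "storyboard")

    def run(self, msg: TaskMessage, topic: dict) -> None:
        topic["validated"] = True
```

Keeping the cap check in the shared handle() method means every agent inherits the backstop without having to remember it.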
| Agent | Queue | Thinking | Responsibility |
|---|---|---|---|
| SourceBriefAgent | source-brief | medium | Reads uploaded sources + Google Search enrichment; emits canonical Markdown briefs, syllabus overview, and up to 10 key terms. Also handles source-grounded topic additions. |
| SyllabusPlanningAgent | planning | high | Proposes modules, topics, prerequisites, topic kind (concept / lab / capstone / reference), and the topic-hierarchy tier graph for mastery learning. |
| TopicContentAgent | content | high | Age-adaptive structured JSON (intro, real-world apps, core content, exam questions, etc.) plus flashcards via a dedicated split Gemini call. |
| ResourceValidationAgent | resource | low | YouTube Data API + AI search for external links; URL liveness check; weighted scoring (relevance 40 / quality 30 / freshness 20 / accessibility 10). |
| ContentValidationAgent | validation | medium | LLM-driven factual + pedagogical + completeness + difficulty audit. Decides between expert review, improvement loop, or fan-out to image + storyboard generation. |
| ContentImprovementAgent | improvement | medium | Iterative content rewrite (max 3 cycles); also honours admin "edit this topic" instructions via a request-document pattern. |
| ImageGenerationAgent | image | high | Static images via a Flash image model; interactive SVGs via Gemini with thinkingLevel: high. Enforces per-age SVG caps and anti-duplication rules. |
| TopicStoryboardAgent | storyboard | high | Fire-and-forget; Gemini narration + SVG + cues, Cloud TTS synthesis, Cloud Speech-to-Text long-running recognition for forced alignment; writes MP3 + storyboard JSON to pre-determined URLs. |
| StudyMaterialRenderAgent | render | n/a | Firestore JSON → mobile-first HTML; defensive script wrapping; audio player with pre-determined URL + onloadedmetadata graceful reveal; LO marker + reveal-on-click engagement bridges. |
| HTMLValidationAgent | html-validation | low | Puppeteer pass: element overlap, SVG internal overlap, broken form controls (<select> inside <foreignObject>), console-error capture. Triggers fix_svg auto-fix. |
| IndexGenerationAgent | index | medium | Fires only when completedCount === totalTopics; topological-sort learning sequence, aggregated resources, AI-generated study tips, key-terms section. |
| SvgCorrectionAgent | svg-correction | high | User-feedback driven. Path A: fix existing SVG from screenshot + description. Path B: regenerate empty SVG from spec (age + domain aware). 11-category root-cause taxonomy. |
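The ResourceValidationAgent's weighted scoring can be illustrated with a short sketch. It assumes each sub-score is already normalised to [0, 1], that dead links are dropped outright, and that a hypothetical minimum score and top-N cut-off decide what survives; only the 40/30/20/10 weights come from the table.

```python
from dataclasses import dataclass

# Weights from the ResourceValidationAgent row; they sum to 100.
WEIGHTS = {"relevance": 40, "quality": 30, "freshness": 20, "accessibility": 10}


@dataclass
class Candidate:
    url: str
    is_live: bool         # result of the URL liveness check
    relevance: float      # each sub-score assumed normalised to [0, 1]
    quality: float
    freshness: float
    accessibility: float


def score(c: Candidate) -> float:
    """Weighted 0-100 score; dead links score 0 (an assumption)."""
    if not c.is_live:
        return 0.0
    total = 0.0
    for name, weight in WEIGHTS.items():
        sub = getattr(c, name)
        if not 0.0 <= sub <= 1.0:
            raise ValueError(f"{name} out of range: {sub}")
        total += weight * sub
    return total


def rank(cands, min_score=50.0, top_n=5):
    """Keep candidates at or above a hypothetical threshold, best first."""
    scored = [(score(c), c) for c in cands]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept[:top_n]]
```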
An optional thirteenth agent, PastPaperAnalysisAgent, runs on demand to surface exam-paper patterns into the syllabus document.
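The IndexGenerationAgent's learning sequence is described as a topological sort over the planner's prerequisites. One standard way to compute it is Kahn's algorithm. The sketch below also breaks ties by the planner's original topic order, which is an assumption rather than documented behaviour, and raises on a prerequisite cycle so that a planner error surfaces instead of silently dropping topics.

```python
import heapq


def learning_sequence(topics, prereqs):
    """Order topics so every prerequisite precedes its dependants.

    topics:  topic IDs in the planner's original order (used as tie-break).
    prereqs: mapping topic -> list of prerequisite topic IDs.
    """
    position = {t: i for i, t in enumerate(topics)}
    indegree = {t: 0 for t in topics}
    dependants = {t: [] for t in topics}
    for topic, reqs in prereqs.items():
        for req in reqs:
            dependants[req].append(topic)
            indegree[topic] += 1

    # Min-heap keyed on planner position: among ready topics, earliest first.
    ready = [(position[t], t) for t in topics if indegree[t] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, t = heapq.heappop(ready)
        order.append(t)
        for d in dependants[t]:
            indegree[d] -= 1
            if indegree[d] == 0:
                heapq.heappush(ready, (position[d], d))

    if len(order) != len(topics):
        raise ValueError("prerequisite cycle detected")
    return order
```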
For a fresh syllabus, the pipeline cascades from a single Firestore onCreate trigger. Agents that share no dependency run in parallel, both across topics and within a single topic (most visibly the image + storyboard fan-out).
The storyboard path is intentionally fire-and-forget: it never chains to another agent. The rendered HTML references pre-determined URLs for the MP3 and storyboard JSON, so the renderer does not need to wait for audio completion; instead it embeds an <audio> tag whose onloadedmetadata handler reveals the player only once the file becomes fetchable. This removes a synchronous dependency between two paths of very different latency (storyboard 1–3 min vs image 5–15 min).
HTML validation is the only loop in the system that actively rewrites artefacts. It is bounded by an SVG fix-attempt counter, a regeneration-triggered flag, and a fallback needs-review state that the cleanup scheduler picks up after two hours.
When a learner taps thumbs-down on an interactive SVG, the web app posts a feedback record to a dedicated Cloud Function, which enqueues SvgCorrectionAgent on its own Cloud Tasks queue. This is the one cloud-side correction path that remained on Gemini after the 2026-05-12 local-validator pivot.
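Routing between SvgCorrectionAgent's two paths can be sketched as a small payload builder. The feedback record's fields, the payload keys, and the rule that a blank, unparseable, or childless SVG counts as "empty" are assumptions for illustration; only the svg-correction queue name and the Path A / Path B split come from the agent table.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class SvgFeedback:
    """Hypothetical shape of a thumbs-down record posted by the web app."""
    syllabus_id: str
    topic_id: str
    svg_markup: str                  # SVG as rendered ('' if the slot was blank)
    screenshot_png: Optional[bytes]  # capture of what the learner saw
    description: str                 # learner's free-text complaint


def is_empty_svg(markup: str) -> bool:
    """Treat blank, unparseable, or childless SVGs as empty (an assumption)."""
    if not markup.strip():
        return True
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return True
    return len(root) == 0  # number of child elements under <svg>


def correction_task(fb: SvgFeedback) -> dict:
    """Build the payload a Cloud Function might enqueue for SvgCorrectionAgent."""
    path = "B" if is_empty_svg(fb.svg_markup) else "A"
    return {
        "agent": "SvgCorrectionAgent",
        "queue": "svg-correction",
        "syllabusId": fb.syllabus_id,
        "topicId": fb.topic_id,
        # A: fix existing SVG from screenshot + description; B: regenerate from spec.
        "path": path,
    }
```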
The pipeline composes a small number of Google Cloud primitives and a handful of external service providers, each chosen for a specific operational property.
| Component | Role | How the pipeline uses it |
|---|---|---|
| Cloud Functions v2 | Agent runtime | A single HTTP entry point dispatches each Cloud Tasks message to the named agent. Per-function timeouts up to 60 min support long Gemini calls and speech-to-text polling. |
| Cloud Tasks | Async messaging | One queue per agent lets each stage be rate-limited and retried independently. Typical retry budget is 2 retries (3 total attempts); per-queue dispatch deadlines bound the total attempt budget. |
| Firestore | State + idempotency | Topic documents carry status enums; sibling request-document subcollections (for admin-initiated edits and topic additions) make retries idempotent. Document-create triggers seed the pipeline. |
| Cloud Storage | Artefact host | Per-syllabus prefixes hold images, audio, storyboard JSON, and rendered HTML. Pre-determined URLs let the renderer emit links to assets that don't yet exist. |
| Cloud TTS (Neural2 / Chirp3-HD) | Narration synthesis | Voice selected per (language, ageGroup): Journey-F for younger learners, Chirp3-HD-Aoede for older learners, and Hindi voices for Devanagari content. Plain text, no SSML. |
| ElevenLabs | TTS + alignment | Primary English TTS; provides forced alignment to ±8 ms (still gated for non-English). |
| Cloud Speech-to-Text | Forced alignment | Long-running recognition on the synthesised MP3 yields per-word timings that map each storyboard cue phrase to a time offset in seconds. |
| Cloud Scheduler | Maintenance + digests | 15-minute cleanup sweeper for stalled topics; weekly Gemini synthesis over the local validator's remediation notes. |
| BigQuery | Analytics warehouse | Daily export of SVG feedback (partitioned by date, clustered by root_cause) feeds the weekly prompt-improvement review. |
| YouTube Data API | External resources | Language-steered search; for non-English syllabi, the language's native name is additionally prepended to the query string to bias results more strongly. |
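The forced-alignment step maps each storyboard cue phrase to a time offset. Assuming the recognition result has already been flattened into (word, start-seconds) pairs (Cloud Speech-to-Text can return per-word offsets when word time offsets are enabled in the recognition config), a simple forward-scanning matcher looks like this. The normalisation rule and the None-for-unmatched convention are illustrative choices, not the pipeline's documented behaviour.

```python
import string


def _norm(token: str) -> str:
    # Case-fold and strip surrounding ASCII punctuation: "Fractions," -> "fractions".
    return token.casefold().strip(string.punctuation)


def align_cues(words, cue_phrases):
    """Map each cue phrase to the start time (seconds) of its first word.

    words:       (word, start_seconds) pairs in spoken order.
    cue_phrases: phrases that appear, in order, in the narration.
    Returns one offset per phrase; None where a phrase could not be found.
    """
    tokens = [_norm(w) for w, _ in words]
    offsets, cursor = [], 0
    for phrase in cue_phrases:
        target = [t for t in (_norm(p) for p in phrase.split()) if t]
        found = None
        for i in range(cursor, len(tokens) - len(target) + 1):
            if tokens[i:i + len(target)] == target:
                found = i
                break
        if found is None or not target:
            offsets.append(None)
            continue
        offsets.append(words[found][1])
        cursor = found + len(target)  # cues are ordered, so only search forward
    return offsets
```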
Each agent owns its queue so that a backlog in one stage cannot stall another. Queue-level rate limits map onto provider quotas (Gemini RPM, Cloud TTS RPM, STT operation budget).
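Mapping a provider quota onto a per-agent queue is mostly arithmetic: a requests-per-minute budget becomes a dispatches-per-second limit. The sketch below assumes a hypothetical 80% headroom factor and concurrency default, and the RPM figures you would feed it are not given in this document. The attempt count reflects the 2-retries / 3-attempts budget, since Cloud Tasks' max-attempts setting counts the first attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueConfig:
    name: str
    max_dispatches_per_second: float
    max_concurrent_dispatches: int
    max_attempts: int  # Cloud Tasks counts the first attempt, so 3 = 2 retries


def queue_for_quota(name, provider_rpm, share=0.8, concurrency=10, retries=2):
    """Derive a per-agent queue config from a provider's RPM quota.

    share leaves headroom below the quota for other callers (an assumption).
    """
    if provider_rpm <= 0:
        raise ValueError("provider_rpm must be positive")
    return QueueConfig(
        name=name,
        max_dispatches_per_second=round(provider_rpm * share / 60, 3),
        max_concurrent_dispatches=concurrency,
        max_attempts=retries + 1,
    )


def gcloud_command(cfg: QueueConfig) -> str:
    """Render the config as a `gcloud tasks queues create` invocation."""
    return (
        f"gcloud tasks queues create {cfg.name} "
        f"--max-dispatches-per-second={cfg.max_dispatches_per_second} "
        f"--max-concurrent-dispatches={cfg.max_concurrent_dispatches} "
        f"--max-attempts={cfg.max_attempts}"
    )
```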
| Bound | Where | What it prevents |
|---|---|---|
| 3 improvement attempts | ContentImprovementAgent | Endless validate ↔ improve cycles on irreducible content disputes. |
| 3 SVG fix attempts, then regenerate | HTMLValidationAgent | Repeated cosmetic patches that never converge. |
| 1 regeneration per topic | regeneration-triggered flag | Regenerate-fix-regenerate ping-pong. |
| 2 Cloud Tasks retries (3 total) | Queue config | Transient provider failures being treated as permanent. |
| 15 attempts hard cap | Base-agent attempt counter | Any path that escaped the agent-specific bounds. |
| 20 correction iterations | Local validator | Validator-side auto-correction divergence. |
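The HTML-validation bounds in the table above (three SVG fixes, one regeneration, a 15-attempt hard cap, and a needs-review fallback) combine into a small decision function. This is a sketch with hypothetical field and action names; it does not reset the fix budget after the one regeneration, which is one reading of the table rather than a documented rule.

```python
from dataclasses import dataclass

MAX_SVG_FIX_ATTEMPTS = 3  # "3 SVG fix attempts, then regenerate"
MAX_TOTAL_ATTEMPTS = 15   # base-agent hard cap


@dataclass
class TopicState:
    svg_fix_attempts: int = 0
    regeneration_triggered: bool = False
    total_attempts: int = 0


def next_action(state: TopicState, has_svg_issues: bool) -> str:
    """Decide what to do after a Puppeteer validation pass (sketch)."""
    if not has_svg_issues:
        return "pass"
    if state.total_attempts >= MAX_TOTAL_ATTEMPTS:
        return "needs_review"
    if state.svg_fix_attempts < MAX_SVG_FIX_ATTEMPTS:
        state.svg_fix_attempts += 1
        state.total_attempts += 1
        return "fix_svg"
    if not state.regeneration_triggered:
        state.regeneration_triggered = True  # at most one regeneration per topic
        state.total_attempts += 1
        return "regenerate"
    return "needs_review"  # left for the cleanup scheduler to pick up
```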
The image path takes 5–15 minutes; the storyboard path takes 1–3 minutes. Coupling them would either delay the page or block on the slower one. Instead, the renderer commits to a pre-determined URL pattern, content/{syllabusId}/audio/{topicId}.mp3, and emits HTML against that path. The <audio> element's onloadedmetadata handler reveals the player only when the browser successfully parses the metadata header; if the file never lands, the section silently remains hidden. No re-render is required when audio completes.
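The pre-determined URL convention can be sketched as a render helper. The bucket base URL is a placeholder and the surrounding markup is illustrative; the path pattern and the onloadedmetadata reveal are the parts taken from the text.

```python
from html import escape

BUCKET_BASE = "https://storage.googleapis.com/example-bucket"  # hypothetical


def audio_url(syllabus_id: str, topic_id: str) -> str:
    # Path pattern from the text; only the bucket base is a placeholder.
    return f"{BUCKET_BASE}/content/{syllabus_id}/audio/{topic_id}.mp3"


def audio_section(syllabus_id: str, topic_id: str) -> str:
    """Emit a hidden player that un-hides itself once the MP3 metadata loads."""
    src = escape(audio_url(syllabus_id, topic_id), quote=True)
    return (
        '<section class="narration" hidden>\n'
        f'  <audio controls preload="metadata" src="{src}"\n'
        "         onloadedmetadata=\"this.closest('section').hidden = false\">\n"
        "  </audio>\n"
        "</section>"
    )
```

One consequence of this design: a page opened before the MP3 exists keeps the player hidden until the page is next loaded, because the browser does not retry the failed fetch on its own.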
A single topic-kind field, chosen by the planner from concept, lab, capstone, or reference, branches both the content prompt and the renderer's section order. A lab topic emits hands-on steps, pitfalls, and tools; a capstone emits a rubric; a reference emits a glossary. Adding a new kind is therefore a content-prompt change plus a renderer switch, not a new pipeline.
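Branching on topic kind can be as small as a lookup table consulted by the renderer. The section names below are hypothetical (the text names only the kind-specific extras: steps, pitfalls, and tools for labs; a rubric for capstones; a glossary for references), but the shape shows why a new kind costs one table entry plus a prompt change.

```python
# Hypothetical per-kind section order used by the renderer.
SECTION_ORDER = {
    "concept": ["intro", "core_content", "real_world_apps", "exam_questions", "flashcards"],
    "lab": ["intro", "hands_on_steps", "pitfalls", "tools"],
    "capstone": ["intro", "brief", "rubric"],
    "reference": ["intro", "glossary"],
}


def sections_for(kind: str) -> list:
    try:
        return SECTION_ORDER[kind]
    except KeyError:
        raise ValueError(f"unknown topic kind: {kind!r}") from None


def render_topic(topic: dict) -> str:
    """Render only the sections this kind defines, in this kind's order."""
    parts = []
    for section in sections_for(topic["kind"]):
        body = topic.get("content", {}).get(section)
        if body:  # skip sections the content agent left empty
            parts.append(f'<section id="{section}">{body}</section>')
    return "\n".join(parts)
```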
Every agent that touches learner-facing text or visuals reads the syllabus average age and language fields and routes through shared configuration tables for age-tuned content, age-tuned imagery, and language registry. The same agent that produces a playful, emoji-rich cartoon storyboard for a 9-year-old produces a slate/sky-blue editorial schematic for an adult, by reading two scalar fields and a config map.
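Resolving age- and language-dependent settings from two scalar fields might look like the following. The age cut-off, the SVG caps, and the Hindi voice are assumptions; the two English Cloud TTS voices and the two visual-style extremes are the ones named in this document.

```python
YOUNG_AGE_CUTOFF = 12  # hypothetical boundary between "younger" and "older"

STYLES = {
    "younger": {"visual_style": "playful, emoji-rich cartoon", "max_svgs": 2},
    "older": {"visual_style": "slate/sky-blue editorial schematic", "max_svgs": 4},
}

# (language, band) -> Cloud TTS voice. The English voices are the ones the
# text names; the Hindi entry is an assumed example of "Hindi voices".
VOICES = {
    ("en", "younger"): "en-US-Journey-F",
    ("en", "older"): "en-US-Chirp3-HD-Aoede",
    ("hi", "younger"): "hi-IN-Neural2-A",
    ("hi", "older"): "hi-IN-Neural2-A",
}


def profile_for(average_age: float, language: str) -> dict:
    """Resolve all age/language-dependent settings from two scalar fields."""
    band = "younger" if average_age < YOUNG_AGE_CUTOFF else "older"
    voice = VOICES.get((language, band))
    if voice is None:
        raise ValueError(f"no voice configured for {language!r} / {band}")
    return {"band": band, "voice": voice, **STYLES[band]}
```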
| Level | Agents |
|---|---|
| high | SVG generation, image generation, content generation, syllabus planning, SVG correction. |
| medium | Content validation, content improvement, index generation, SVG fix analysis, past-paper analysis. |
| low | Resource validation, HTML validation. |
SVG output is the system's highest-failure-rate artefact. The pipeline therefore applies a layered defence that includes forbidden patterns (<select>, <foreignObject> form controls, drag-and-drop) and required patterns (1:1 viewBox, 44×44 px targets, slate slider tracks).
The local validator is a self-contained package that exclusively owns every post-hoc validation, audit, and fix proposal in the system. The 2026-05-12 pivot moved all visual judgement and validator-side correction off cloud Gemini and onto the operator's MacBook Pro M5 Max via MLX.
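Some of the SVG pattern rules listed above (no form controls inside <foreignObject>, no drag-and-drop, a 1:1 viewBox) can be enforced with a cheap static check before any rendering. The sketch below uses Python's standard XML parser; the exact sets of banned tags and drag attributes are assumptions, and it does not attempt the 44×44 px target or slider-track rules.

```python
import xml.etree.ElementTree as ET

FORM_TAGS = {"select", "input", "textarea", "button"}  # assumed banned set
DRAG_ATTRS = {"draggable", "ondragstart", "ondrop"}     # assumed drag markers


def _local(tag: str) -> str:
    # Strip an "{namespace}" prefix: "{http://www.w3.org/2000/svg}rect" -> "rect".
    return tag.rsplit("}", 1)[-1]


def check_svg(svg_text: str) -> list:
    """Return rule violations; an empty list means this subset of rules passes."""
    problems = []
    root = ET.fromstring(svg_text)
    view_box = root.get("viewBox", "").replace(",", " ").split()
    if len(view_box) != 4:
        problems.append("missing or malformed viewBox")
    elif float(view_box[2]) != float(view_box[3]):
        problems.append("viewBox is not 1:1")
    for el in root.iter():
        name = _local(el.tag)
        if name == "foreignObject":
            for child in el.iter():
                if _local(child.tag) in FORM_TAGS:
                    problems.append(f"<{_local(child.tag)}> inside <foreignObject>")
        if DRAG_ATTRS & set(el.attrib):
            problems.append(f"drag-and-drop attribute on <{name}>")
    return problems
```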
| Port | Model | Role |
|---|---|---|
| :8000 | Qwen3-Coder-Next 4-bit | Auto-correction tiers: cue, SVG, or JSON patches, plus a full-regeneration tier for narration + SVG. Also writes a remediation note per issue. |
| :8001 | Qwen3-VL 32B Thinking | Vision judge for every captured viewport; per-cue storyboard precision tier; strict-consensus second pass when first-pass confidence < 0.7. |
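The :8001 judge's confidence gate can be expressed independently of the model server. In this sketch the model call is abstracted as a callable, because the request format for the local server is not described here; "strict consensus" is interpreted as requiring every second-pass vote to agree that an issue exists, and the three-vote count is an assumption.

```python
from dataclasses import dataclass
from typing import Callable

CONFIDENCE_THRESHOLD = 0.7  # from the :8001 row


@dataclass
class Verdict:
    has_issue: bool
    confidence: float


Judge = Callable[[bytes, str], Verdict]  # (screenshot, prompt) -> verdict


def judge_viewport(screenshot: bytes, judge: Judge, votes: int = 3) -> Verdict:
    """Single pass if confident; otherwise a strict-consensus second pass."""
    first = judge(screenshot, "Does this viewport have a visual defect?")
    if first.confidence >= CONFIDENCE_THRESHOLD:
        return first
    second = [judge(screenshot, "Re-check strictly: is there a visual defect?")
              for _ in range(votes)]
    unanimous = all(v.has_issue for v in second)
    # Report the weakest vote's confidence so downstream triage stays cautious.
    return Verdict(has_issue=unanimous, confidence=min(v.confidence for v in second))
```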
The validator decomposes into 24 named activities, none of which invokes a cloud LLM. A representative sample:
| ID | Activity | Where it runs |
|---|---|---|
| A1 | Render topic HTML page | local Playwright |
| A3 | Vision issue detection | MLX :8001 |
| A6 | Storyboard precision (per-cue + narration judge) | MLX :8001 |
| A9 | Storyboard auto-correction (cue / svg / json patches) | MLX :8000 + GCS upload |
| A10 | Full regenerate (narration + SVG, reuse existing MP3) | MLX :8000, no TTS call |
| A11 | Topic-HTML interactive-SVG auto-correction | MLX :8000 |
| A17 | Per-issue remediation-note annotation | MLX + Firestore write |
| A20 | Auto-upload of run report to admin dashboard | Firestore + Cloud Storage write |
| A23 | Triage validator report | local, no LLM |
A10 avoids a TTS call in full_regenerate by heuristically re-aligning cues against the original forced-alignment timings.
The system enforces a strict separation between where content is generated and where it is judged. Crossing that boundary would raise a privacy concern (screenshots may include unreleased curriculum), a cost concern (vision tokens are expensive), and a latency concern (loopback calls take microseconds; remote APIs take seconds).
| Concern | Cloud (Gemini / ElevenLabs) | Local (MLX Qwen3) |
|---|---|---|
| Content generation | ✓ all C1–C7 | n/a |
| Real-time gates inside generation | ✓ 5 agents | n/a |
| Post-hoc validation | n/a | ✓ A1–A24 |
| Validator-side auto-correction | n/a | ✓ MLX :8000 |
| User thumbs-down correction | ✓ B1–B4 | n/a |
| Weekly prompt synthesis | ✓ Gemini 1 call per cluster | n/a |
The validator does not merely flag bugs; it generates the data needed to prevent the same class of bug next time. Each issue document carries a remediation note written by the local coder model after the consensus vision pass. The string follows a strict shape:
A weekly Cloud Scheduler job reads the past seven days of notes, groups them by content kind and root cause, and sends each group to Gemini for synthesis into one prompt-edit suggestion. The output lands in a dedicated Firestore collection and surfaces in the admin review console for human acceptance and manual application to the generation prompts.
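The weekly synthesis job boils down to a windowed group-by followed by one model call per cluster. In the sketch below, the note field names and root-cause labels are hypothetical, and the synthesize argument stands in for the Gemini call; each suggestion is left pending for the human review the text describes.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def group_recent_notes(notes, now=None, days=7):
    """Group notes from the last `days` days by (content kind, root cause).

    notes: dicts with 'created_at' (timezone-aware datetime), 'content_kind',
           'root_cause' and 'note' keys (field names are hypothetical).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    groups = defaultdict(list)
    for n in notes:
        if n["created_at"] >= cutoff:
            groups[(n["content_kind"], n["root_cause"])].append(n["note"])
    return dict(groups)


def weekly_suggestions(notes, synthesize, now=None):
    """One synthesis call per cluster; results await human acceptance."""
    return [
        {
            "content_kind": kind,
            "root_cause": cause,
            "suggestion": synthesize(kind, cause, texts),
            "status": "pending_review",
        }
        for (kind, cause), texts in sorted(group_recent_notes(notes, now).items())
    ]
```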
A scheduled cleanup Cloud Function runs every 15 minutes. It scans for topics stuck in any intermediate processing status and re-enqueues each one onto the appropriate agent queue. Quota-induced errors are retried after a 4-hour delay, up to 5 times.
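The sweeper's per-topic decision can be sketched as a pure function, which keeps it easy to test. The status names and the status-to-queue map are hypothetical, and treating a topic as stalled after one full sweep interval is an assumption; the 4-hour quota delay and the 5-retry cap come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

SWEEP_INTERVAL = timedelta(minutes=15)
QUOTA_RETRY_DELAY = timedelta(hours=4)
MAX_QUOTA_RETRIES = 5

# Hypothetical intermediate statuses -> queue that resumes them.
RESUME_QUEUE = {
    "content_pending": "content",
    "validation_pending": "validation",
    "image_pending": "image",
    "render_pending": "render",
}


@dataclass
class Topic:
    status: str
    updated_at: datetime
    last_error: Optional[str] = None  # e.g. "quota"
    quota_retries: int = 0


def sweep_action(t: Topic, now: datetime) -> Optional[str]:
    """Return a queue to re-enqueue onto, "fail", or None to leave the topic alone."""
    queue = RESUME_QUEUE.get(t.status)
    if queue is None:
        return None  # terminal or unknown status: not the sweeper's job
    idle = now - t.updated_at
    if t.last_error == "quota":
        if t.quota_retries >= MAX_QUOTA_RETRIES:
            return "fail"
        return queue if idle >= QUOTA_RETRY_DELAY else None
    # Assumption: a topic idle for a full sweep interval counts as stalled.
    return queue if idle >= SWEEP_INTERVAL else None
```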
Three independent guards prevent any validator code from entering production:
| Signal | Where to look |
|---|---|
| Agent activity | Centralised log stream, filtered by the per-agent log-line prefix. |
| Admin-attention conditions | Admin-alert log sentinel; permanent-failure marker on the topic document. |
| SVG root-cause distribution | BigQuery analytics table, partitioned by feedback date. |
| Validator runs | Admin review console, Content Tests page. |
| Weekly prompt suggestions | Admin review console, Prompt Feedback page. |
The whiz.coach syllabus pipeline demonstrates a workable production pattern for AI-generated educational content: decompose generation into specialised, bounded, idempotent agents on managed cloud infrastructure; defer subjective visual judgement to a fully local, privacy-preserving validation tier; and close the loop by feeding validator findings back into generation prompts. The same architecture that produces a 60-topic syllabus in hours also catches its own regressions overnight and proposes the fixes, without ever uploading a learner's unreleased curriculum to a third-party vision model.
The two design choices that did the most work were fire-and-forget for asymmetric latency (the image / storyboard fan-out) and cloud-generate / local-validate (the 2026-05-12 pivot). Both replaced synchronous coupling with a small amount of well-placed convention: a pre-determined URL pattern in one case, and a six-rule local-only contract in the other.