AI-Powered Question & Diagram Quality Assurance
How multi-agent validation, multimodal AI analysis, and iterative correction produce high-quality educational content at scale.
Abstract
Generating educational questions at scale is only valuable if every question meets strict accuracy, clarity, and pedagogical standards. This paper describes the multi-agent quality assurance pipeline used to validate AI-generated questions and their accompanying diagrams. The system employs multimodal AI analysis, iterative correction with attempt limits, and automatic refutation to ensure that only verified content reaches students.
1. The Quality Challenge
AI-generated educational content faces several quality risks:
- Factual errors in questions or answer choices
- Ambiguous wording that confuses rather than tests understanding
- Diagram inaccuracies such as incorrect geometry, overlapping labels, or answer giveaways
- Missing pedagogical metadata (difficulty, topic, Bloom's level) that limits adaptive use
A single incorrect question erodes student trust. The system must catch errors before content is published, while maintaining throughput for large-scale generation.
2. Architecture Overview
The pipeline uses five specialized agents coordinated through Google Cloud Tasks with fire-and-forget messaging:
| Agent | Role |
|---|---|
| QuestionGenerationAgent | Creates questions from syllabus content with structured metadata |
| DiagramAgent | Generates SVG diagrams and converts to PNG |
| ValidationAgent | Multimodal AI validation of questions and diagrams |
| ImprovementAgent | Applies targeted corrections based on validation feedback |
| Orchestrator | Routes questions through the pipeline and manages state |
Each agent runs independently, communicating through Cloud Tasks queues with automatic retry and exponential backoff.
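The Orchestrator's routing can be sketched as a small state machine. The stage names below are illustrative, not the system's actual internal states:

```typescript
// Illustrative pipeline stages; the real Orchestrator's state names
// are internal and may differ.
type Stage = "generation" | "diagram" | "validation" | "improvement" | "published";

// Route a question to its next stage based on where it is now,
// whether it needs a diagram, and whether validation passed.
function nextStage(current: Stage, requiresDiagram: boolean, passed: boolean): Stage {
  switch (current) {
    case "generation":
      return requiresDiagram ? "diagram" : "validation";
    case "diagram":
      return "validation";
    case "validation":
      return passed ? "published" : "improvement";
    case "improvement":
      return "validation"; // every fix is re-validated before publishing
    default:
      return current; // terminal stage
  }
}
```

Because each transition is a pure function of the current state, the Orchestrator can recompute routing safely after a Cloud Tasks retry.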
3. Question Generation with Rich Metadata
Questions are generated with structured metadata from the start, enabling downstream validation and adaptive use:
| Metadata Field | Purpose |
|---|---|
| Bloom's Taxonomy level | Ensures cognitive diversity (Remember through Create) |
| Difficulty rating | 5-level scale from Super Easy to Damn Hard |
| Topic and subtopic | Links to syllabus hierarchy for mastery tracking |
| Misconceptions | 2-5 commonly overlooked concepts per question |
| requiresDiagram | Flags questions needing visual support |
| Learning outcomes | Maps to syllabus-defined objectives |
The generation prompt enforces a difficulty distribution (60%+ Difficult or Damn Hard) and uses structured syllabus data including learning outcomes, exam patterns, and flashcard content to produce contextually relevant questions.
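A minimal sketch of the metadata shape and the distribution check, assuming field names that mirror the table above (the actual schema may differ):

```typescript
type Difficulty = "Super Easy" | "Easy" | "Medium" | "Difficult" | "Damn Hard";

// Hypothetical metadata shape mirroring the table above.
interface QuestionMetadata {
  bloomLevel: "Remember" | "Understand" | "Apply" | "Analyze" | "Evaluate" | "Create";
  difficulty: Difficulty;
  topic: string;
  subtopic: string;
  misconceptions: string[]; // 2-5 entries expected
  requiresDiagram: boolean;
  learningOutcomes: string[];
}

// Check that a generated batch meets the 60%+ Difficult/Damn Hard rule.
function meetsDifficultyDistribution(batch: QuestionMetadata[]): boolean {
  const hard = batch.filter(
    (q) => q.difficulty === "Difficult" || q.difficulty === "Damn Hard"
  ).length;
  return batch.length > 0 && hard / batch.length >= 0.6;
}
```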
4. Diagram Generation & Validation
4.1 Generation Pipeline
- SVG Generation: AI creates scalable vector graphics from question instructions
- Structure Validation: SVG must pass structural checks (valid tags, white background, drawing elements, text labels, inline styling only, minimum 200 characters)
- PNG Conversion: SVG rendered to 800x600 PNG using @resvg/resvg-js
- Cloud Storage Upload: PNG stored at diagrams/{syllabus}/{subject}/{questionId}.png
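The structural checks can be approximated with simple string and regex tests. This is a sketch: the exact checks and thresholds used in production (beyond the stated 200-character minimum) are assumptions:

```typescript
// Approximate the structural checks above; returns a list of problems
// (empty array means the SVG passes). Thresholds other than the
// 200-character minimum are illustrative.
function validateSvgStructure(svg: string): string[] {
  const errors: string[] = [];
  const trimmed = svg.trim();
  if (svg.length < 200) errors.push("SVG shorter than 200 characters");
  if (!/^<svg[\s>]/.test(trimmed) || !trimmed.endsWith("</svg>"))
    errors.push("missing <svg> root element");
  if (!/fill="(#fff|#ffffff|white)"/i.test(svg))
    errors.push("no white background");
  if (!/<(line|path|circle|rect|polygon|polyline|ellipse)\b/.test(svg))
    errors.push("no drawing elements");
  if (!/<text\b/.test(svg)) errors.push("no text labels");
  if (/<style\b/.test(svg) || /\bclass="/.test(svg))
    errors.push("stylesheet or class styling found; inline styles only");
  return errors;
}
```

Running these checks before PNG conversion keeps obviously broken SVGs out of the (more expensive) multimodal validation step.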
4.2 Diagram Rules
Diagrams follow strict educational integrity rules:
- Never show calculated values students need to derive
- Never label standard angles (e.g., 60 degrees in equilateral triangles)
- Only display explicitly given information
- Text labels under 15 characters to prevent truncation
- Single letters for vertices (A, B, C)
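Two of these rules are mechanically checkable before the AI visual pass; the integrity rules (no derived values, no standard angles) require the multimodal check. The helper names here are hypothetical:

```typescript
// Mechanically checkable rules: label length under 15 characters and
// single-letter vertex names. Returns human-readable violations.
function checkLabelRules(labels: string[], vertices: string[]): string[] {
  const errors: string[] = [];
  for (const label of labels) {
    if (label.length >= 15) {
      errors.push(`label "${label}" too long (must be under 15 characters)`);
    }
  }
  for (const v of vertices) {
    if (!/^[A-Z]$/.test(v)) {
      errors.push(`vertex "${v}" must be a single uppercase letter`);
    }
  }
  return errors;
}
```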
4.3 Multimodal Visual Analysis
The ValidationAgent performs generic visual analysis rather than programmatic pattern matching. It sends the rendered PNG image alongside the SVG source to a multimodal AI model that examines:
- Text overlapping with lines, shapes, or other text
- Missing or incomplete geometric elements
- Incorrectly positioned labels
- Missing or incorrect angle indicators
- Whether the diagram reveals answer values to students
- Any other visual quality issues ("including but not limited to")
This approach adapts to new diagram types without code changes — the AI sees what students see.
4.4 Multiple Protection Layers
Missing diagrams are caught at multiple stages:
| Layer | When | Action |
|---|---|---|
| Primary | Before validation | Block if diagram required but missing |
| Safety Net | After AI analysis | Override pass verdict if diagram still missing |
| Monitoring | Throughout | Enhanced logging of diagram status |
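The safety-net layer can be sketched as a post-processing step on the AI verdict. Field names here are assumptions:

```typescript
interface ValidationResult {
  verdict: "pass" | "fail";
  issues: string[];
}

// Safety net: even if the AI analysis returned a pass, a missing
// required diagram forces a fail so the question cannot slip through.
function applyDiagramSafetyNet(
  result: ValidationResult,
  requiresDiagram: boolean,
  diagramUrl: string | null
): ValidationResult {
  if (requiresDiagram && !diagramUrl) {
    return {
      verdict: "fail",
      issues: [...result.issues, "required diagram missing"],
    };
  }
  return result;
}
```

Layering a deterministic override on top of a probabilistic AI verdict is what makes this a safety net rather than a redundant check.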
5. Iterative Correction Flow
When validation detects issues, the system attempts correction before refuting a question.
5.1 Diagram Corrections
The critical innovation is counter increment after PNG success: the correction attempt counter only increments after a corrected SVG successfully converts to PNG. This prevents wasting attempts on unconvertible SVGs.
Correction sequence:
- ValidationAgent detects visual issue and provides corrected SVG
- SVG structure pre-validated (quality gate prevents applying invalid corrections)
- DiagramAgent converts corrected SVG to PNG
- Only on PNG success: diagramCorrectionCount increments
- Re-validation triggered with new PNG
- Up to 5 correction cycles before refutation
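The counter logic can be sketched as two small functions. The function names are illustrative; the invariant they encode — count only PNG-successful attempts, refute after five that still fail — comes from the sequence above:

```typescript
const MAX_DIAGRAM_CORRECTIONS = 5;

type CorrectionAction = "retry-svg" | "revalidate";

// One correction cycle: the counter advances only when the corrected SVG
// converts to PNG, so unconvertible SVGs never consume the attempt budget.
function correctionStep(
  count: number,
  pngConverted: boolean
): { count: number; action: CorrectionAction } {
  if (!pngConverted) return { count, action: "retry-svg" };
  return { count: count + 1, action: "revalidate" };
}

// Refute only after five counted (PNG-successful) corrections that
// still fail validation.
function shouldRefuteDiagram(count: number, stillFailing: boolean): boolean {
  return stillFailing && count >= MAX_DIAGRAM_CORRECTIONS;
}
```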
5.2 Progressive Angle Arc Fallback
For common angle arc placement issues (SVG sweep-flag problems):
| Attempts 1-3 | Attempts 4-5 | After 5 |
|---|---|---|
| Try correcting arc placement | Remove arc entirely, keep angle labels | Refute question |
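The fallback table maps directly to a strategy selector; attempt numbers are 1-based:

```typescript
type ArcStrategy = "correct-arc" | "remove-arc" | "refute";

// Progressive fallback for angle-arc placement issues, per the table above.
function arcFallbackStrategy(attempt: number): ArcStrategy {
  if (attempt <= 3) return "correct-arc"; // try to fix sweep-flag placement
  if (attempt <= 5) return "remove-arc";  // drop the arc, keep angle labels
  return "refute";
}
```

Degrading to "remove the arc, keep the labels" preserves the diagram's educational value when a cosmetic element cannot be fixed.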
5.3 Question Content Corrections
The ImprovementAgent applies targeted fixes based on validation feedback:
- Corrects wrong answers
- Improves ambiguous wording
- Enhances explanations
- Fixes MCQ choice quality
Maximum 5 improvement iterations per question. After each improvement, the question is re-validated.
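The improve-then-revalidate loop can be sketched as follows; `validate` and `improve` stand in for the real agent calls:

```typescript
// Stand-ins for the ValidationAgent and ImprovementAgent calls.
type Validate = (question: string) => boolean;
type Improve = (question: string, feedback: string) => string;

const MAX_IMPROVEMENTS = 5;

// Run up to five improve/re-validate cycles; return the passing question,
// or null to signal the question should be refuted.
function improvementLoop(
  question: string,
  validate: Validate,
  improve: Improve
): string | null {
  let current = question;
  for (let i = 0; i < MAX_IMPROVEMENTS; i++) {
    if (validate(current)) return current;
    current = improve(current, "validation feedback"); // targeted fix
  }
  // Final check after the last improvement; still failing means refute.
  return validate(current) ? current : null;
}
```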
6. Refutation — The Quality Gate
Questions are moved to the refuted_questions collection when they cannot be fixed:
| Trigger | Threshold |
|---|---|
| Diagram correction exhausted | 5 successful PNG conversions, still failing validation |
| Improvement iterations exhausted | 5 content improvement cycles, still failing |
| Missing required diagram | Cannot be generated after retries |
| Invalid SVG structure | Cannot be corrected to valid form |
Refuted questions are preserved for review but never shown to students. This hard quality gate ensures the question bank maintains high standards.
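The triggers in the table above reduce to a single predicate. The field names here are assumptions about the question document's shape:

```typescript
// Hypothetical question state fields; mirrors the trigger table above.
interface QuestionState {
  passedValidation: boolean;
  diagramCorrectionCount: number;
  improvementCount: number;
  diagramRequired: boolean;
  diagramPresent: boolean;
  svgValid: boolean;
}

// A question is refuted when it still fails validation and any
// unrecoverable condition holds.
function shouldRefute(q: QuestionState): boolean {
  if (q.passedValidation) return false;
  return (
    q.diagramCorrectionCount >= 5 ||
    q.improvementCount >= 5 ||
    (q.diagramRequired && !q.diagramPresent) ||
    !q.svgValid
  );
}
```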
7. Loop Prevention
The system employs multiple mechanisms to prevent infinite processing loops:
| Mechanism | How It Works |
|---|---|
| Permanent trigger flags | validationTriggered=true set atomically, never reset |
| Cloud Tasks retries | 2 retries per queue with exponential backoff (60s to 1800s) |
| Improvement counter | Max 5 iterations, then auto-refute |
| Diagram correction counter | Max 5 successful PNG conversions |
| Cleanup scheduler | Runs every 30 minutes, detects stuck questions |
| Idempotent task creation | Cloud Tasks deduplication via task names |
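Two of these mechanisms are easy to illustrate. Deterministic task names make duplicate enqueues no-ops (Cloud Tasks rejects a second task with the same name), and a set-once trigger flag prevents re-entry; the name format and flag storage shown here are illustrative, not the production scheme:

```typescript
// Deterministic task name: enqueueing the same question twice produces
// the same name, which Cloud Tasks deduplicates. Format is illustrative.
function validationTaskName(queuePath: string, questionId: string): string {
  return `${queuePath}/tasks/validate-${questionId}`;
}

// Set-once trigger flag, sketched in memory; the real system sets
// validationTriggered atomically on the question document.
const triggered = new Set<string>();

// Returns true only on the first call per question; later calls are
// no-ops, so retries cannot re-enter the validation stage.
function markValidationTriggered(questionId: string): boolean {
  if (triggered.has(questionId)) return false;
  triggered.add(questionId);
  return true;
}
```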
8. Quality Metrics
8.1 Validation Pass Rates
| Question Type | Initial Pass Rate | After Corrections |
|---|---|---|
| Text-only questions | ~85% | ~97% |
| Questions with basic diagrams | ~75% | ~95% |
| Complex geometry with shading | ~60% | ~90% |
8.2 Common Failure Modes
Issues detected in initial generation (before corrections):
| Issue | Frequency | Impact | Resolution |
|---|---|---|---|
| Missing shading | 15% of diagram questions | Medium | Enhanced prompt templates |
| Incorrect proportions | 12% of diagram questions | High | AI visual validation |
| Overlapping elements | 8% of diagram questions | Medium | Multimodal detection |
| Invalid SVG structure | 5% of diagram questions | High | Structure pre-validation |
| Answer revealed in diagram | ~3% of diagram questions | Critical | Educational integrity check |
Most issues resolve within 1-2 correction iterations, yielding a 95%+ overall success rate after corrections.
9. Monitoring & Observability
The pipeline provides comprehensive logging at each stage:
- Agent success/failure rates per Cloud Tasks queue
- Validation pass/fail ratios including diagram-specific failures
- Correction cycle statistics (average iterations to pass)
- Refutation rates by question type and subject
- Processing time per question through the full pipeline
The cleanup scheduler (every 30 minutes) detects questions stuck in intermediate states and either re-enqueues them or marks them as failed, preventing indefinite processing.
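Stuck-question detection can be sketched as a timestamp filter over non-terminal stages. The stage names and threshold are assumptions (the 30-minute value matches the scheduler interval):

```typescript
interface TrackedQuestion {
  id: string;
  stage: string;       // e.g. "validation", "improvement", "published"
  updatedAt: number;   // epoch milliseconds of last state change
}

// One scheduler interval; a question idle longer than this in a
// non-terminal stage is considered stuck.
const STUCK_THRESHOLD_MS = 30 * 60 * 1000;

// Return questions sitting in an intermediate stage past the threshold,
// candidates for re-enqueue or failure marking.
function findStuck(questions: TrackedQuestion[], now: number): TrackedQuestion[] {
  const terminal = new Set(["published", "refuted"]);
  return questions.filter(
    (q) => !terminal.has(q.stage) && now - q.updatedAt > STUCK_THRESHOLD_MS
  );
}
```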
10. Conclusion
Automated question generation requires equally automated quality assurance. The multi-agent pipeline described here achieves high content quality through:
- Multimodal AI validation that sees diagrams as students see them
- Iterative correction with quality gates and attempt limits
- Automatic refutation that prevents low-quality content from reaching students
- Loop prevention at multiple levels ensuring system stability
- Rich metadata generation enabling downstream mastery-based learning systems
The result is a question bank where every published question has passed AI-powered validation, with diagrams verified through visual analysis rather than brittle pattern matching.