AI-Powered Question & Diagram Quality Assurance
How multi-agent validation, multimodal AI analysis, and iterative correction produce high-quality educational content at scale.
Abstract
Generating educational questions at scale is only valuable if every question meets strict accuracy, clarity, and pedagogical standards. This paper describes the multi-agent quality assurance pipeline used to validate AI-generated questions and their accompanying diagrams. The system employs multimodal AI analysis, iterative correction with attempt limits, and automatic refutation to ensure that only verified content reaches students.
1. The Quality Challenge
AI-generated educational content faces several quality risks:
- Factual errors in questions or answer choices
- Ambiguous wording that confuses rather than tests understanding
- Diagram inaccuracies such as incorrect geometry, overlapping labels, or answer giveaways
- Missing pedagogical metadata (difficulty, topic, Bloom's level) that limits adaptive use
A single incorrect question erodes student trust. The system must catch errors before content is published, while maintaining throughput for large-scale generation.
2. Architecture Overview
The pipeline uses five specialized agents coordinated through Google Cloud Tasks with fire-and-forget messaging:
| Agent | Role |
|---|---|
| QuestionGenerationAgent | Creates questions from syllabus content with structured metadata |
| DiagramAgent | Generates SVG diagrams and converts to PNG |
| ValidationAgent | Multimodal AI validation of questions and diagrams |
| ImprovementAgent | Applies targeted corrections based on validation feedback |
| Orchestrator | Routes questions through the pipeline and manages state |
Each agent runs independently, communicating through Cloud Tasks queues with automatic retry and exponential backoff.
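The Orchestrator's routing can be sketched as a small state machine. The stage names below are illustrative, not the system's actual internal states:

```typescript
// Illustrative pipeline stages; the real Orchestrator's state names
// are internal and may differ.
type Stage = "generation" | "diagram" | "validation" | "improvement" | "published";

// Route a question to its next stage based on where it is now,
// whether it needs a diagram, and whether validation passed.
function nextStage(current: Stage, requiresDiagram: boolean, passed: boolean): Stage {
  switch (current) {
    case "generation":
      return requiresDiagram ? "diagram" : "validation";
    case "diagram":
      return "validation";
    case "validation":
      return passed ? "published" : "improvement";
    case "improvement":
      return "validation"; // every fix is re-validated before publishing
    default:
      return current; // terminal stage
  }
}
```

Because each transition is a pure function of the current state, the Orchestrator can recompute routing safely after a Cloud Tasks retry.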
3. Question Generation with Rich Metadata
Questions are generated with structured metadata from the start, enabling downstream validation and adaptive use:
| Metadata Field | Purpose |
|---|---|
| Bloom's Taxonomy level | Ensures cognitive diversity (Remember through Create) |
| Difficulty rating | 5-level scale from Super Easy to Damn Hard |
| Topic and subtopic | Links to syllabus hierarchy for mastery tracking |
| Misconceptions | 2-5 commonly overlooked concepts per question |
| requiresDiagram | Flags questions needing visual support |
| Learning outcomes | Maps to syllabus-defined objectives |
The generation prompt enforces a difficulty distribution (60%+ Difficult or Damn Hard) and uses structured syllabus data including learning outcomes, exam patterns, and flashcard content to produce contextually relevant questions.
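A minimal sketch of the metadata shape and the distribution check, assuming field names that mirror the table above (the actual schema may differ):

```typescript
type Difficulty = "Super Easy" | "Easy" | "Medium" | "Difficult" | "Damn Hard";

// Hypothetical metadata shape mirroring the table above.
interface QuestionMetadata {
  bloomLevel: "Remember" | "Understand" | "Apply" | "Analyze" | "Evaluate" | "Create";
  difficulty: Difficulty;
  topic: string;
  subtopic: string;
  misconceptions: string[]; // 2-5 entries expected
  requiresDiagram: boolean;
  learningOutcomes: string[];
}

// Check that a generated batch meets the 60%+ Difficult/Damn Hard rule.
function meetsDifficultyDistribution(batch: QuestionMetadata[]): boolean {
  const hard = batch.filter(
    (q) => q.difficulty === "Difficult" || q.difficulty === "Damn Hard"
  ).length;
  return batch.length > 0 && hard / batch.length >= 0.6;
}
```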
4. Diagram Generation & Validation
4.1 Generation Pipeline
- SVG Generation: AI creates scalable vector graphics from question instructions
- Structure Validation: SVG must pass structural checks (valid tags, white background, drawing elements, text labels, inline styling only, minimum 200 characters)
- PNG Conversion: SVG rendered to 800x600 PNG using @resvg/resvg-js
- Cloud Storage Upload: PNG stored at diagrams/{syllabus}/{subject}/{questionId}.png
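The structural checks can be approximated with simple string and regex tests. This is a sketch: the exact checks and thresholds used in production (beyond the stated 200-character minimum) are assumptions:

```typescript
// Approximate the structural checks above; returns a list of problems
// (empty array means the SVG passes). Thresholds other than the
// 200-character minimum are illustrative.
function validateSvgStructure(svg: string): string[] {
  const errors: string[] = [];
  const trimmed = svg.trim();
  if (svg.length < 200) errors.push("SVG shorter than 200 characters");
  if (!/^<svg[\s>]/.test(trimmed) || !trimmed.endsWith("</svg>"))
    errors.push("missing <svg> root element");
  if (!/fill="(#fff|#ffffff|white)"/i.test(svg))
    errors.push("no white background");
  if (!/<(line|path|circle|rect|polygon|polyline|ellipse)\b/.test(svg))
    errors.push("no drawing elements");
  if (!/<text\b/.test(svg)) errors.push("no text labels");
  if (/<style\b/.test(svg) || /\bclass="/.test(svg))
    errors.push("stylesheet or class styling found; inline styles only");
  return errors;
}
```

Running these checks before PNG conversion keeps obviously broken SVGs out of the (more expensive) multimodal validation step.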
4.2 Diagram Rules
Diagrams follow strict educational integrity rules:
- Never show calculated values students need to derive
- Never label standard angles (e.g., 60 degrees in equilateral triangles)
- Only display explicitly given information
- Text labels under 15 characters to prevent truncation
- Single letters for vertices (A, B, C)
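Two of these rules are mechanically checkable before the AI visual pass; the integrity rules (no derived values, no standard angles) require the multimodal check. The helper names here are hypothetical:

```typescript
// Mechanically checkable rules: label length under 15 characters and
// single-letter vertex names. Returns human-readable violations.
function checkLabelRules(labels: string[], vertices: string[]): string[] {
  const errors: string[] = [];
  for (const label of labels) {
    if (label.length >= 15) {
      errors.push(`label "${label}" too long (must be under 15 characters)`);
    }
  }
  for (const v of vertices) {
    if (!/^[A-Z]$/.test(v)) {
      errors.push(`vertex "${v}" must be a single uppercase letter`);
    }
  }
  return errors;
}
```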
4.3 Multimodal Visual Analysis
The ValidationAgent performs generic visual analysis rather than programmatic pattern matching. It sends the rendered PNG image alongside the SVG source to a multimodal AI model that examines:
- Text overlapping with lines, shapes, or other text
- Missing or incomplete geometric elements
- Incorrectly positioned labels
- Missing or incorrect angle indicators
- Whether the diagram reveals answer values to students
- Any other visual quality issues ("including but not limited to")
This approach adapts to new diagram types without code changes — the AI sees what students see.
4.4 Multiple Protection Layers
Missing diagrams are caught at multiple stages:
| Layer | When | Action |
|---|---|---|
| Primary | Before validation | Block if diagram required but missing |
| Safety Net | After AI analysis | Override pass verdict if diagram still missing |
| Monitoring | Throughout | Enhanced logging of diagram status |
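The safety-net layer can be sketched as a post-processing step on the AI verdict. Field names here are assumptions:

```typescript
interface ValidationResult {
  verdict: "pass" | "fail";
  issues: string[];
}

// Safety net: even if the AI analysis returned a pass, a missing
// required diagram forces a fail so the question cannot slip through.
function applyDiagramSafetyNet(
  result: ValidationResult,
  requiresDiagram: boolean,
  diagramUrl: string | null
): ValidationResult {
  if (requiresDiagram && !diagramUrl) {
    return {
      verdict: "fail",
      issues: [...result.issues, "required diagram missing"],
    };
  }
  return result;
}
```

Layering a deterministic override on top of a probabilistic AI verdict is what makes this a safety net rather than a redundant check.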
5. Iterative Correction Flow
When validation detects issues, the system attempts correction before refuting a question.
5.1 Diagram Corrections
The critical innovation is counter increment after PNG success: the correction attempt counter only increments after a corrected SVG successfully converts to PNG. This prevents wasting attempts on unconvertible SVGs.
Correction sequence:
- ValidationAgent detects visual issue and provides corrected SVG
- SVG structure pre-validated (quality gate prevents applying invalid corrections)
- DiagramAgent converts corrected SVG to PNG
- Only on PNG success: diagramCorrectionCount increments
- Re-validation triggered with new PNG
- Up to 5 correction cycles before refutation
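The counter logic can be sketched as two small functions. The function names are illustrative; the invariant they encode — count only PNG-successful attempts, refute after five that still fail — comes from the sequence above:

```typescript
const MAX_DIAGRAM_CORRECTIONS = 5;

type CorrectionAction = "retry-svg" | "revalidate";

// One correction cycle: the counter advances only when the corrected SVG
// converts to PNG, so unconvertible SVGs never consume the attempt budget.
function correctionStep(
  count: number,
  pngConverted: boolean
): { count: number; action: CorrectionAction } {
  if (!pngConverted) return { count, action: "retry-svg" };
  return { count: count + 1, action: "revalidate" };
}

// Refute only after five counted (PNG-successful) corrections that
// still fail validation.
function shouldRefuteDiagram(count: number, stillFailing: boolean): boolean {
  return stillFailing && count >= MAX_DIAGRAM_CORRECTIONS;
}
```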
5.2 Progressive Angle Arc Fallback
For common angle arc placement issues (SVG sweep-flag problems):
| Attempts 1-3 | Attempts 4-5 | After 5 |
|---|---|---|
| Try correcting arc placement | Remove arc entirely, keep angle labels | Refute question |
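The fallback table maps directly to a strategy selector; attempt numbers are 1-based:

```typescript
type ArcStrategy = "correct-arc" | "remove-arc" | "refute";

// Progressive fallback for angle-arc placement issues, per the table above.
function arcFallbackStrategy(attempt: number): ArcStrategy {
  if (attempt <= 3) return "correct-arc"; // try to fix sweep-flag placement
  if (attempt <= 5) return "remove-arc";  // drop the arc, keep angle labels
  return "refute";
}
```

Degrading to "remove the arc, keep the labels" preserves the diagram's educational value when a cosmetic element cannot be fixed.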
5.3 Question Content Corrections
The ImprovementAgent applies targeted fixes based on validation feedback:
- Corrects wrong answers
- Improves ambiguous wording
- Enhances explanations
- Fixes MCQ choice quality
Maximum 5 improvement iterations per question. After each improvement, the question is re-validated.
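The improve-then-revalidate loop can be sketched as follows; `validate` and `improve` stand in for the real agent calls:

```typescript
// Stand-ins for the ValidationAgent and ImprovementAgent calls.
type Validate = (question: string) => boolean;
type Improve = (question: string, feedback: string) => string;

const MAX_IMPROVEMENTS = 5;

// Run up to five improve/re-validate cycles; return the passing question,
// or null to signal the question should be refuted.
function improvementLoop(
  question: string,
  validate: Validate,
  improve: Improve
): string | null {
  let current = question;
  for (let i = 0; i < MAX_IMPROVEMENTS; i++) {
    if (validate(current)) return current;
    current = improve(current, "validation feedback"); // targeted fix
  }
  // Final check after the last improvement; still failing means refute.
  return validate(current) ? current : null;
}
```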
6. Refutation — The Quality Gate
Questions are moved to the refuted_questions collection when they cannot be fixed:
| Trigger | Threshold |
|---|---|
| Diagram correction exhausted | 5 successful PNG conversions, still failing validation |
| Improvement iterations exhausted | 5 content improvement cycles, still failing |
| Missing required diagram | Cannot be generated after retries |
| Invalid SVG structure | Cannot be corrected to valid form |
Refuted questions are preserved for review but never shown to students. This hard quality gate ensures the question bank maintains high standards.
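The triggers in the table above reduce to a single predicate. The field names here are assumptions about the question document's shape:

```typescript
// Hypothetical question state fields; mirrors the trigger table above.
interface QuestionState {
  passedValidation: boolean;
  diagramCorrectionCount: number;
  improvementCount: number;
  diagramRequired: boolean;
  diagramPresent: boolean;
  svgValid: boolean;
}

// A question is refuted when it still fails validation and any
// unrecoverable condition holds.
function shouldRefute(q: QuestionState): boolean {
  if (q.passedValidation) return false;
  return (
    q.diagramCorrectionCount >= 5 ||
    q.improvementCount >= 5 ||
    (q.diagramRequired && !q.diagramPresent) ||
    !q.svgValid
  );
}
```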
7. Loop Prevention
The system employs multiple mechanisms to prevent infinite processing loops:
| Mechanism | How It Works |
|---|---|
| Permanent trigger flags | validationTriggered=true set atomically, never reset |
| Cloud Tasks retries | 2 retries per queue with exponential backoff (60s to 1800s) |
| Improvement counter | Max 5 iterations, then auto-refute |
| Diagram correction counter | Max 5 successful PNG conversions |
| Cleanup scheduler | Runs every 30 minutes, detects stuck questions |
| Idempotent task creation | Cloud Tasks deduplication via task names |
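Two of these mechanisms are easy to illustrate. Deterministic task names make duplicate enqueues no-ops (Cloud Tasks rejects a second task with the same name), and a set-once trigger flag prevents re-entry; the name format and flag storage shown here are illustrative, not the production scheme:

```typescript
// Deterministic task name: enqueueing the same question twice produces
// the same name, which Cloud Tasks deduplicates. Format is illustrative.
function validationTaskName(queuePath: string, questionId: string): string {
  return `${queuePath}/tasks/validate-${questionId}`;
}

// Set-once trigger flag, sketched in memory; the real system sets
// validationTriggered atomically on the question document.
const triggered = new Set<string>();

// Returns true only on the first call per question; later calls are
// no-ops, so retries cannot re-enter the validation stage.
function markValidationTriggered(questionId: string): boolean {
  if (triggered.has(questionId)) return false;
  triggered.add(questionId);
  return true;
}
```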
8. Quality Metrics
8.1 Validation Pass Rates
| Question Type | Initial Pass Rate | After Corrections |
|---|---|---|
| Text-only questions | ~85% | ~97% |
| Questions with basic diagrams | ~75% | ~95% |
| Complex geometry with shading | ~60% | ~90% |
8.2 Common Failure Modes
Issues detected in initial generation (before corrections):
| Issue | Frequency | Impact | Resolution |
|---|---|---|---|
| Missing shading | 15% of diagram questions | Medium | Enhanced prompt templates |
| Incorrect proportions | 12% of diagram questions | High | AI visual validation |
| Overlapping elements | 8% of diagram questions | Medium | Multimodal detection |
| Invalid SVG structure | 5% of diagram questions | High | Structure pre-validation |
| Answer revealed in diagram | ~3% of diagram questions | Critical | Educational integrity check |
Most issues resolve within 1-2 correction iterations, yielding a 95%+ overall success rate after corrections.
9. Monitoring & Observability
The pipeline provides comprehensive logging at each stage:
- Agent success/failure rates per Cloud Tasks queue
- Validation pass/fail ratios including diagram-specific failures
- Correction cycle statistics (average iterations to pass)
- Refutation rates by question type and subject
- Processing time per question through the full pipeline
The cleanup scheduler (every 30 minutes) detects questions stuck in intermediate states and either re-enqueues them or marks them as failed, preventing indefinite processing.
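Stuck-question detection can be sketched as a timestamp filter over non-terminal stages. The stage names and threshold are assumptions (the 30-minute value matches the scheduler interval):

```typescript
interface TrackedQuestion {
  id: string;
  stage: string;       // e.g. "validation", "improvement", "published"
  updatedAt: number;   // epoch milliseconds of last state change
}

// One scheduler interval; a question idle longer than this in a
// non-terminal stage is considered stuck.
const STUCK_THRESHOLD_MS = 30 * 60 * 1000;

// Return questions sitting in an intermediate stage past the threshold,
// candidates for re-enqueue or failure marking.
function findStuck(questions: TrackedQuestion[], now: number): TrackedQuestion[] {
  const terminal = new Set(["published", "refuted"]);
  return questions.filter(
    (q) => !terminal.has(q.stage) && now - q.updatedAt > STUCK_THRESHOLD_MS
  );
}
```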
10. Conclusion
Automated question generation requires equally automated quality assurance. The multi-agent pipeline described here achieves high content quality through:
- Multimodal AI validation that sees diagrams as students see them
- Iterative correction with quality gates and attempt limits
- Automatic refutation that prevents low-quality content from reaching students
- Loop prevention at multiple levels ensuring system stability
- Rich metadata generation enabling downstream mastery-based learning systems
The result is a question bank where every published question has passed AI-powered validation, with diagrams verified through visual analysis rather than brittle pattern matching.