AI-Powered Question & Diagram Quality Assurance

How multi-agent validation, multimodal AI analysis, and iterative correction produce high-quality educational content at scale.

Abstract

Generating educational questions at scale is only valuable if every question meets strict accuracy, clarity, and pedagogical standards. This paper describes the multi-agent quality assurance pipeline used to validate AI-generated questions and their accompanying diagrams. The system employs multimodal AI analysis, iterative correction with attempt limits, and automatic refutation to ensure that only verified content reaches students.

1. The Quality Challenge

AI-generated educational content faces several quality risks: factual errors, ambiguous wording, diagrams that contradict the question text, and visuals that inadvertently reveal the answer.

A single incorrect question erodes student trust. The system must catch errors before content is published, while maintaining throughput for large-scale generation.

2. Architecture Overview

The pipeline uses five specialized agents coordinated through Google Cloud Tasks with fire-and-forget messaging:

| Agent | Role |
|---|---|
| QuestionGenerationAgent | Creates questions from syllabus content with structured metadata |
| DiagramAgent | Generates SVG diagrams and converts them to PNG |
| ValidationAgent | Multimodal AI validation of questions and diagrams |
| ImprovementAgent | Applies targeted corrections based on validation feedback |
| Orchestrator | Routes questions through the pipeline and manages state |

Each agent runs independently, communicating through Cloud Tasks queues with automatic retry and exponential backoff.
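The fire-and-forget handoff can be pictured as a small task envelope plus a stage-to-queue routing table. This is a minimal sketch; the stage names, queue names, and field names below are illustrative assumptions, not the production identifiers.

```typescript
// Illustrative task envelope passed between agents via Cloud Tasks.
// Stage and queue names are assumptions for this sketch.
type Stage = "generate" | "diagram" | "validate" | "improve";

interface PipelineTask {
  questionId: string;
  stage: Stage;
  enqueuedAt: number; // epoch milliseconds
}

// The Orchestrator maps each pipeline stage to its agent's queue.
const QUEUES: Record<Stage, string> = {
  generate: "question-generation-queue",
  diagram: "diagram-queue",
  validate: "validation-queue",
  improve: "improvement-queue",
};

function queueFor(task: PipelineTask): string {
  return QUEUES[task.stage];
}
```

Because each agent only consumes from its own queue and emits a new task for the next stage, no agent ever blocks waiting on another.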

3. Question Generation with Rich Metadata

Questions are generated with structured metadata from the start, enabling downstream validation and adaptive use:

| Metadata Field | Purpose |
|---|---|
| Bloom's Taxonomy level | Ensures cognitive diversity (Remember through Create) |
| Difficulty rating | 5-level scale from Super Easy to Damn Hard |
| Topic and subtopic | Links to syllabus hierarchy for mastery tracking |
| Misconceptions | 2-5 commonly overlooked concepts per question |
| requiresDiagram | Flags questions needing visual support |
| Learning outcomes | Maps to syllabus-defined objectives |

The generation prompt enforces a difficulty distribution (60%+ Difficult or Damn Hard) and uses structured syllabus data including learning outcomes, exam patterns, and flashcard content to produce contextually relevant questions.
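The metadata shape and the difficulty gate can be sketched as follows. The field names and level labels are assumptions drawn from the table above, not the production schema.

```typescript
// Illustrative metadata schema; names are assumptions based on the table.
type Difficulty = "Super Easy" | "Easy" | "Medium" | "Difficult" | "Damn Hard";

interface QuestionMetadata {
  bloomLevel: "Remember" | "Understand" | "Apply" | "Analyze" | "Evaluate" | "Create";
  difficulty: Difficulty;
  topic: string;
  subtopic: string;
  misconceptions: string[]; // 2-5 commonly overlooked concepts
  requiresDiagram: boolean;
  learningOutcomes: string[];
}

// True when at least 60% of a generated batch is Difficult or Damn Hard,
// the distribution the generation prompt enforces.
function meetsDifficultyDistribution(
  batch: Array<Pick<QuestionMetadata, "difficulty">>
): boolean {
  const hard = batch.filter(
    (q) => q.difficulty === "Difficult" || q.difficulty === "Damn Hard"
  ).length;
  return batch.length > 0 && hard / batch.length >= 0.6;
}
```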

4. Diagram Generation & Validation

4.1 Generation Pipeline

  1. SVG Generation: AI creates scalable vector graphics from question instructions
  2. Structure Validation: SVG must pass structural checks (valid tags, white background, drawing elements, text labels, inline styling only, minimum 200 characters)
  3. PNG Conversion: SVG rendered to 800x600 PNG using @resvg/resvg-js
  4. Cloud Storage Upload: PNG stored at diagrams/{syllabus}/{subject}/{questionId}.png
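The structural checks in step 2 can be approximated with simple heuristics. Treat the regexes below as a sketch under stated assumptions; a production validator would parse the SVG rather than pattern-match it.

```typescript
// Approximate structural checks for step 2 of the pipeline.
// The heuristics are illustrative; a real validator would parse the SVG.
function validateSvgStructure(svg: string): string[] {
  const errors: string[] = [];
  if (svg.length < 200) errors.push("too short (< 200 chars)");
  if (!/<svg[\s>]/.test(svg) || !svg.includes("</svg>"))
    errors.push("missing <svg> tags");
  if (!/fill="(?:white|#fff(?:fff)?)"/i.test(svg))
    errors.push("no white background");
  if (!/<(?:rect|circle|ellipse|line|path|polygon|polyline)[\s/>]/.test(svg))
    errors.push("no drawing elements");
  if (!/<text[\s>]/.test(svg)) errors.push("no text labels");
  // Inline styling only: <style> blocks and class attributes are rejected.
  if (/<style[\s>]|class=/.test(svg)) errors.push("non-inline styling");
  return errors; // empty array means structurally valid
}
```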

4.2 Diagram Rules

Diagrams follow strict educational integrity rules; above all, a diagram must support the question without revealing its answer.

4.3 Multimodal Visual Analysis

The ValidationAgent performs generic visual analysis rather than programmatic pattern matching. It sends the rendered PNG image alongside the SVG source to a multimodal AI model that examines the diagram as a student would see it: correctness of proportions, placement and legibility of labels, presence of required elements such as shading, and overlap between elements.

This approach adapts to new diagram types without code changes — the AI sees what students see.

4.4 Multiple Protection Layers

Missing diagrams are caught at multiple stages:

| Layer | When | Action |
|---|---|---|
| Primary | Before validation | Block if diagram required but missing |
| Safety Net | After AI analysis | Override pass verdict if diagram still missing |
| Monitoring | Throughout | Enhanced logging of diagram status |
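The primary block and the post-analysis safety net reduce to two small guards. A minimal sketch, assuming illustrative field names:

```typescript
// Illustrative validation state; field names are assumptions.
interface DiagramCheckState {
  requiresDiagram: boolean;
  diagramUrl: string | null; // null when no PNG was produced
  aiVerdict: "pass" | "fail";
}

// Primary layer: refuse to start validation without a required diagram.
function shouldBlockValidation(s: DiagramCheckState): boolean {
  return s.requiresDiagram && s.diagramUrl === null;
}

// Safety net: a "pass" verdict cannot stand if the diagram is still missing.
function finalVerdict(s: DiagramCheckState): "pass" | "fail" {
  if (s.requiresDiagram && s.diagramUrl === null) return "fail";
  return s.aiVerdict;
}
```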

5. Iterative Correction Flow

When validation detects issues, the system attempts correction before refuting a question.

5.1 Diagram Corrections

The critical innovation is counter increment after PNG success: the correction attempt counter only increments after a corrected SVG successfully converts to PNG. This prevents wasting attempts on unconvertible SVGs.

Correction sequence:

  1. ValidationAgent detects visual issue and provides corrected SVG
  2. SVG structure pre-validated (quality gate prevents applying invalid corrections)
  3. DiagramAgent converts corrected SVG to PNG
  4. Only on PNG success: diagramCorrectionCount increments
  5. Re-validation triggered with new PNG
  6. Up to 5 correction cycles before refutation
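The counting rule in the sequence above can be sketched in two small functions: an attempt is consumed only when the corrected SVG actually renders to PNG, and refutation requires both an exhausted counter and a still-failing validation. Function names are illustrative.

```typescript
// Sketch of the 5.1 counting rule; names are illustrative.
const MAX_DIAGRAM_CORRECTIONS = 5;

// Failed PNG conversions do not consume a correction attempt.
function nextCorrectionCount(count: number, pngConverted: boolean): number {
  return pngConverted ? count + 1 : count;
}

// Refute only when attempts are exhausted AND validation still fails.
function shouldRefuteDiagram(count: number, stillFailingValidation: boolean): boolean {
  return stillFailingValidation && count >= MAX_DIAGRAM_CORRECTIONS;
}
```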

5.2 Progressive Angle Arc Fallback

For common angle arc placement issues (SVG sweep-flag problems):

| Attempts 1-3 | Attempts 4-5 | After 5 |
|---|---|---|
| Try correcting arc placement | Remove arc entirely, keep angle labels | Refute question |
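The progressive fallback is a pure function of the attempt number. A sketch, with strategy labels chosen for illustration:

```typescript
// Sketch of the 5.2 fallback ladder; strategy labels are illustrative.
type ArcStrategy = "correct-arc" | "remove-arc-keep-labels" | "refute";

function arcFallbackStrategy(attempt: number): ArcStrategy {
  if (attempt <= 3) return "correct-arc";            // attempts 1-3
  if (attempt <= 5) return "remove-arc-keep-labels"; // attempts 4-5
  return "refute";                                   // after 5
}
```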

5.3 Question Content Corrections

The ImprovementAgent applies targeted fixes to question content based on validation feedback.

Maximum 5 improvement iterations per question. After each improvement, the question is re-validated.

6. Refutation — The Quality Gate

Questions are moved to the refuted_questions collection when they cannot be fixed:

| Trigger | Threshold |
|---|---|
| Diagram corrections exhausted | 5 successful PNG conversions, still failing validation |
| Improvement iterations exhausted | 5 content improvement cycles, still failing |
| Missing required diagram | Cannot be generated after retries |
| Invalid SVG structure | Cannot be corrected to valid form |

Refuted questions are preserved for review but never shown to students. This hard quality gate ensures the question bank maintains high standards.
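The triggers above can be collapsed into a single classifier. This is a sketch: the evaluation order, the simplification of "cannot be generated after retries" to a boolean, and all field names are assumptions.

```typescript
// Sketch of the refutation decision; ordering and field names are assumptions.
type RefutationTrigger =
  | "missing-required-diagram"
  | "invalid-svg-structure"
  | "diagram-corrections-exhausted"
  | "improvement-iterations-exhausted";

interface QuestionQaState {
  stillFailing: boolean;
  diagramCorrectionCount: number;
  improvementCount: number;
  requiresDiagram: boolean;
  diagramGenerated: boolean; // false once generation retries are exhausted
  svgStructureValid: boolean;
}

function refutationTrigger(q: QuestionQaState): RefutationTrigger | null {
  if (q.requiresDiagram && !q.diagramGenerated) return "missing-required-diagram";
  if (!q.svgStructureValid) return "invalid-svg-structure";
  if (q.stillFailing && q.diagramCorrectionCount >= 5) return "diagram-corrections-exhausted";
  if (q.stillFailing && q.improvementCount >= 5) return "improvement-iterations-exhausted";
  return null; // question stays in the active pipeline
}
```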

7. Loop Prevention

The system employs multiple mechanisms to prevent infinite processing loops:

| Mechanism | How It Works |
|---|---|
| Permanent trigger flags | validationTriggered=true set atomically, never reset |
| Cloud Tasks retries | 2 retries per queue with exponential backoff (60s to 1800s) |
| Improvement counter | Max 5 iterations, then auto-refute |
| Diagram correction counter | Max 5 successful PNG conversions |
| Cleanup scheduler | Runs every 30 minutes, detects stuck questions |
| Idempotent task creation | Cloud Tasks deduplication via task names |
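Two of these mechanisms are easy to make concrete. The backoff sketch below assumes Cloud Tasks' doubling schedule between the 60 s floor and 1800 s ceiling mentioned above; the task-name format is an invented illustration of how deterministic names yield deduplication.

```typescript
// Doubling backoff between a floor and a ceiling (assumed schedule).
function backoffSeconds(retry: number, min = 60, max = 1800): number {
  return Math.min(min * 2 ** retry, max);
}

// Deterministic task names make re-enqueues idempotent: Cloud Tasks
// rejects a second task with the same name. Format is illustrative.
function taskName(queuePath: string, questionId: string, stage: string): string {
  return `${queuePath}/tasks/${questionId}-${stage}`;
}
```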

8. Quality Metrics

8.1 Validation Pass Rates

| Question Type | Initial Pass Rate | After Corrections |
|---|---|---|
| Text-only questions | ~85% | ~97% |
| Questions with basic diagrams | ~75% | ~95% |
| Complex geometry with shading | ~60% | ~90% |

8.2 Common Failure Modes

Issues detected in initial generation (before corrections):

| Issue | Frequency | Impact | Resolution |
|---|---|---|---|
| Missing shading | 15% of diagram questions | Medium | Enhanced prompt templates |
| Incorrect proportions | 12% of diagram questions | High | AI visual validation |
| Overlapping elements | 8% of diagram questions | Medium | Multimodal detection |
| Invalid SVG structure | 5% of diagram questions | High | Structure pre-validation |
| Answer revealed in diagram | ~3% of diagram questions | Critical | Educational integrity check |

Most issues resolve within one or two correction iterations, yielding an overall success rate above 95% after corrections.

9. Monitoring & Observability

The pipeline provides comprehensive logging at each stage, with enhanced detail around diagram status and correction attempts.

The cleanup scheduler (every 30 minutes) detects questions stuck in intermediate states and either re-enqueues them or marks them as failed, preventing indefinite processing.
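The stuck-question check reduces to an age test on intermediate states. A sketch, where the 30-minute threshold mirrors the scheduler cadence and the state names and field names are assumptions:

```typescript
// Sketch of the cleanup scheduler's stuck-question detection.
// State names, field names, and the threshold choice are assumptions.
const STUCK_AFTER_MS = 30 * 60 * 1000;

const INTERMEDIATE_STATES = ["generating", "validating", "improving"];

function isStuck(q: { state: string; updatedAt: number }, now: number): boolean {
  return (
    INTERMEDIATE_STATES.includes(q.state) &&
    now - q.updatedAt > STUCK_AFTER_MS
  );
}
```

Questions flagged by this check are then either re-enqueued or marked failed, as described above.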

10. Conclusion

Automated question generation requires equally automated quality assurance. The multi-agent pipeline described here achieves high content quality through multimodal visual validation, iterative correction with strict attempt limits, and automatic refutation of questions that cannot be fixed.

The result is a question bank where every published question has passed AI-powered validation, with diagrams verified through visual analysis rather than brittle pattern matching.