Mastery-Based Learning Through Rich Question Metadata

How structured question metadata powers adaptive difficulty, gap detection, and prerequisite-aware progression at scale.

Abstract

Traditional question banks treat questions as isolated items with minimal metadata — a difficulty label and a topic tag. This paper describes how enriching every question with structured metadata (Bloom's taxonomy level, difficulty gradations, topic hierarchy position, misconception tags, and learning outcome mappings) enables a mastery-based learning system that adapts to each student. The system continuously profiles practice performance across all game types, detects knowledge gaps, gates progression on prerequisite mastery, and adjusts difficulty automatically — all driven by metadata attached to questions at generation time.

1. From Questions to Learning Signals

Every time a student answers a question, the system captures not just correctness but context:

| Signal | Source |
|---|---|
| Topic accuracy | Which topic the question belongs to |
| Difficulty performance | Whether the student succeeded at Easy, Moderate, Difficult, or Damn Hard |
| Bloom's level reached | Whether the student can Remember, Apply, or Analyze |
| Misconception exposure | Which common errors the question was designed to surface |
| Trend direction | Whether recent performance is improving, stable, or declining |

This transforms each answered question from a binary score into a multidimensional learning signal. The richer the question metadata, the more the system learns about the student from every interaction.

2. Question Metadata Architecture

2.1 Required Metadata Fields

Every generated question carries the following metadata:

| Field | Values | Purpose in Mastery System |
|---|---|---|
| topic | Syllabus topic name | Groups performance by knowledge area |
| difficulty | Super Easy, Easy, Moderate, Difficult, Damn Hard | 5-level scale for adaptive difficulty selection |
| bloomsLevel | Remember, Understand, Apply, Analyze, Evaluate, Create | Ensures cognitive diversity and tracks depth of understanding |
| misconceptions | 2-5 common errors | Identifies specific conceptual gaps, not just topic-level weakness |
| syllabusId | Reference to syllabus | Links to topic hierarchy and prerequisites |
| subject | Subject name | Enables cross-subject proficiency tracking |
| learningOutcomes | Syllabus-defined objectives | Maps questions to curriculum goals |
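To make the schema concrete, here is a hypothetical question record carrying every required field, with a minimal validator. The field names follow the table above; the values, the identifier format, and the `is_valid` helper are illustrative assumptions, not the system's actual implementation.

```python
# Hypothetical example of a fully annotated question record.
# Values are illustrative; only the field names come from the schema above.
question = {
    "topic": "Quadratic Equations",
    "difficulty": "Difficult",           # one of the 5 difficulty levels
    "bloomsLevel": "Apply",              # Bloom's taxonomy level
    "misconceptions": [                  # 2-5 common errors
        "sign errors in factoring",
        "confusing roots with coefficients",
    ],
    "syllabusId": "syllabus-math-101",   # assumed identifier format
    "subject": "Mathematics",
    "learningOutcomes": ["Solve quadratic equations by factoring"],
}

REQUIRED_FIELDS = {
    "topic", "difficulty", "bloomsLevel", "misconceptions",
    "syllabusId", "subject", "learningOutcomes",
}

def is_valid(q: dict) -> bool:
    """Check that every required field is present and non-empty,
    and that misconceptions stay within the 2-5 range."""
    return (all(q.get(f) for f in REQUIRED_FIELDS)
            and 2 <= len(q["misconceptions"]) <= 5)
```

A generation pipeline could reject any question failing this check before it enters the bank, keeping downstream profiling signals complete.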

2.2 Topic Hierarchy Integration

Questions inherit position in a three-tier topic hierarchy:

| Tier | Definition | Example (Physics) |
|---|---|---|
| Tier 1 — Foundational | No prerequisites | Kinematics, Waves |
| Tier 2 — Intermediate | All prerequisites are Tier 1 | Dynamics (requires Kinematics) |
| Tier 3 — Advanced | Has Tier 2+ prerequisites | Quantum Physics (requires Waves + Energy) |

The hierarchy is generated during syllabus extraction using topological sort (Kahn's algorithm) and stored alongside the syllabus. Each question's topic maps to a tier, enabling the mastery system to understand prerequisite relationships.
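A tier assignment of this kind can be sketched with Kahn's algorithm: process topics in topological order and give each topic a tier one above its deepest prerequisite. The function name and input shape (a topic-to-prerequisites map) are assumptions for illustration.

```python
from collections import deque

def assign_tiers(prereqs: dict[str, list[str]]) -> dict[str, int]:
    """Assign tiers via Kahn's topological sort: tier 1 for topics with
    no prerequisites, otherwise 1 + the maximum tier of any prerequisite."""
    indegree = {t: len(ps) for t, ps in prereqs.items()}
    dependents: dict[str, list[str]] = {t: [] for t in prereqs}
    for topic, ps in prereqs.items():
        for p in ps:
            dependents[p].append(topic)

    queue = deque(t for t, d in indegree.items() if d == 0)
    tier = {t: 1 for t in queue}            # foundational topics
    while queue:
        t = queue.popleft()
        for d in dependents[t]:
            tier[d] = max(tier.get(d, 1), tier[t] + 1)
            indegree[d] -= 1
            if indegree[d] == 0:            # all prerequisites placed
                queue.append(d)
    return tier
```

If the queue empties before every topic is placed, the prerequisite graph contains a cycle, which syllabus extraction would need to reject.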

3. Continuous Practice Profiling

3.1 Universal Data Collection

The practice profiler analyzes every game completion — not just AI Coach quizzes, but also Quick Play, solo games, multiplayer games, and goal-linked quizzes. This comprehensive collection builds profiles faster and reflects the student's true ability across all practice contexts.

3.2 Per-Topic Statistics

For each topic within a subject, the profiler maintains:

| Metric | Calculation | Use |
|---|---|---|
| Accuracy | correct / attempted | Primary mastery indicator |
| Attempts | Running total | Confidence in the accuracy signal |
| Per-difficulty breakdown | Accuracy at each of 5 difficulty levels | Drives difficulty recommendation |
| Trend | Recent batch vs prior batch accuracy | Detects improvement or regression |
| Gap flag | Accuracy < 50% after 5+ attempts | Triggers notifications and focus |
| Weak flag | Accuracy 50-70% after 5+ attempts | Triggers reminders |
| Recommended difficulty | Highest difficulty with >= 70% accuracy, then one step up | Adaptive challenge |
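The recommended-difficulty rule in the last row can be sketched as follows. The function name is assumed, as is the fallback to the easiest level when no difficulty has reached 70% yet; the step-up is capped at the hardest level.

```python
LEVELS = ["Super Easy", "Easy", "Moderate", "Difficult", "Damn Hard"]

def recommend_difficulty(per_level_accuracy: dict[str, float]) -> str:
    """Highest difficulty with >= 70% accuracy, then one step up,
    capped at the hardest level. Assumed default: easiest level."""
    best = -1
    for i, level in enumerate(LEVELS):
        if per_level_accuracy.get(level, 0.0) >= 0.70:
            best = i
    return LEVELS[min(best + 1, len(LEVELS) - 1)]
```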

3.3 Trend Detection

Trends compare recent performance against prior performance:

| Condition | Classification |
|---|---|
| Recent > prior + 10% | Improving |
| Recent < prior - 10% | Declining |
| Within 10% | Stable |

Trends drive notifications ("Your Algebra accuracy is declining — review fundamentals") and influence quiz generation weighting.
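The classification rule above amounts to a three-way comparison; a minimal sketch (function name assumed, accuracies as fractions in [0, 1]):

```python
def classify_trend(recent: float, prior: float) -> str:
    """Compare recent-batch accuracy against prior-batch accuracy
    with a 10-percentage-point dead band."""
    if recent > prior + 0.10:
        return "improving"
    if recent < prior - 0.10:
        return "declining"
    return "stable"
```

The 10% dead band keeps small fluctuations from flip-flopping the trend label between sessions.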

4. Adaptive Difficulty

4.1 Level-Based Difficulty Mapping

The system maps student proficiency levels to question difficulty selections:

| Student Level | Questions Served |
|---|---|
| Not Started / Beginner | Super Easy, Easy |
| Intermediate | Easy, Moderate, Difficult |
| Advanced | Moderate, Difficult, Damn Hard |
| Mastered | Difficult, Damn Hard |

As students demonstrate mastery, they automatically receive harder questions — and if they struggle, the system steps back to reinforce foundations.

4.2 Auto-Level Adjustment

After 5+ quizzes, the system adjusts levels automatically using a rolling weighted average:

newAccuracy = 0.3 * quizAccuracy + 0.7 * currentAccuracy

This weighted approach prevents a single bad quiz from causing demotion while still responding to sustained performance changes.
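The damping effect is easy to verify numerically; the function name is assumed:

```python
def update_accuracy(current: float, quiz: float) -> float:
    """Rolling weighted average: a new quiz contributes 30%,
    accumulated history 70%."""
    return 0.3 * quiz + 0.7 * current

# A single 20%-accuracy quiz drags an 80% average down only to 62%,
# but repeated poor quizzes compound toward the new performance level.
```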

4.3 User Override

Students can set a preferred difficulty (easy, moderate, hard) that bypasses the automatic mapping. This respects learner autonomy while defaulting to data-driven selection.

5. Gap Detection & Prerequisite Awareness

5.1 Automated Gap Detection

The system classifies topics after every practice session:

| Classification | Threshold | Minimum Attempts | Action |
|---|---|---|---|
| Gap (critical) | Accuracy < 50% | 5+ | Immediate notification, 3x quiz weight |
| Weak | Accuracy 50-70% | 5+ | Reminder if not practiced in 7 days, 2x quiz weight |

5.2 Prerequisite-Aware Quiz Generation

When generating quizzes, the system reads the topic hierarchy and applies a 0.05x weight multiplier to topics whose prerequisites are not mastered (< 80% accuracy or < 10 attempts). This heavily de-prioritizes advanced topics until foundations are solid.

For example, if a student hasn't mastered Kinematics (Tier 1), questions on Dynamics (Tier 2, requires Kinematics) are virtually excluded from quizzes. The student focuses on foundations first.
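Combining the gap/weak boosts with the prerequisite penalty, a topic's quiz-generation weight might be computed as below. The function shape is an assumption; the 3x, 2x, 0.05x multipliers and the 80%-accuracy / 10-attempt mastery bar come from the text.

```python
def topic_weight(accuracy: float, attempts: int,
                 prereqs: list[tuple[float, int]]) -> float:
    """Quiz-generation weight: gaps get 3x, weak areas 2x, and topics
    with any unmastered prerequisite are multiplied by 0.05.
    Each prereq is given as (accuracy, attempts); mastery requires
    >= 80% accuracy over >= 10 attempts."""
    weight = 1.0
    if attempts >= 5 and accuracy < 0.50:
        weight = 3.0
    elif attempts >= 5 and accuracy < 0.70:
        weight = 2.0
    if any(acc < 0.80 or n < 10 for acc, n in prereqs):
        weight *= 0.05
    return weight
```

Note that the prerequisite penalty multiplies rather than replaces the base weight, so even a flagged gap in a Tier 2 topic stays de-prioritized until its Tier 1 foundation is solid.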

5.3 Prerequisite-Enriched Notifications

Gap notifications include prerequisite context when topic hierarchy exists:

"You're struggling with Kinematics (42% accuracy). This is a prerequisite for 3 other topics — mastering it will unlock Dynamics, Circular Motion, and Momentum."

This helps students understand why a gap matters, not just that it exists.

6. Tiered Assessment System

6.1 Staged Assessments

Rather than one large assessment, the system uses tier-based staging aligned with the topic hierarchy:

| Tier | Eligibility | Questions |
|---|---|---|
| Tier 1 (Foundational) | 15+ questions answered in subject | Up to 10 questions from Tier 1 topics |
| Tier 2 (Intermediate) | All Tier 1 topics mastered | Up to 10 questions from Tier 2 topics |
| Tier 3 (Advanced) | All Tier 2 topics mastered | Up to 10 questions from Tier 3 topics |

6.2 Mastery Threshold

A topic is considered mastered when all three conditions are met:

  1. Accuracy >= 80% on that topic
  2. >= 10 questions attempted
  3. Hard-question gate: >= 2 attempts at Difficult or Damn Hard difficulty with >= 60% accuracy

The hard-question gate prevents students from achieving "Mastered" status using only Easy questions. If a topic's question bank lacks Difficult/Damn Hard questions, it cannot reach Mastered — this incentivizes content quality.
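The three conditions combine into a single predicate; a minimal sketch (function name and argument shape assumed):

```python
def is_mastered(accuracy: float, attempts: int,
                hard_attempts: int, hard_accuracy: float) -> bool:
    """Mastery per section 6.2: overall accuracy, attempt volume, and
    the hard-question gate (Difficult / Damn Hard attempts)."""
    return (accuracy >= 0.80            # condition 1
            and attempts >= 10          # condition 2
            and hard_attempts >= 2      # condition 3: hard-question gate
            and hard_accuracy >= 0.60)
```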

6.3 Weighted Proficiency Scoring

Assessment results use difficulty-weighted scoring:

| Difficulty | Weight |
|---|---|
| Super Easy | 0.5x |
| Easy | 1x |
| Moderate | 2x |
| Difficult | 3x |
| Damn Hard | 4x |

Getting a Difficult question right contributes six times as much to proficiency (3x vs 0.5x) as getting a Super Easy question right. This reflects the true depth of understanding rather than raw answer counts.
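A difficulty-weighted score of this kind can be computed as earned weight over total weight; the function name and the (difficulty, correct) result shape are assumptions for illustration.

```python
WEIGHTS = {"Super Easy": 0.5, "Easy": 1.0, "Moderate": 2.0,
           "Difficult": 3.0, "Damn Hard": 4.0}

def weighted_proficiency(results: list[tuple[str, bool]]) -> float:
    """Sum of weights for correct answers, divided by the total
    weight of all questions attempted."""
    total = sum(WEIGHTS[d] for d, _ in results)
    earned = sum(WEIGHTS[d] for d, correct in results if correct)
    return earned / total if total else 0.0
```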

6.4 Contextual Prompting

Assessments find the student rather than requiring the student to find them. Prompts appear on the game results screen after any practice game when eligibility conditions are met. Students can dismiss prompts (7-day cooldown, max 3 dismissals before the system stops asking).

7. The Metadata-Mastery Feedback Loop

Rich question metadata creates a virtuous cycle:

  1. Questions carry metadata (topic, difficulty, Bloom's level, misconceptions)
  2. Practice generates signals (per-topic accuracy, per-difficulty performance, trends)
  3. Signals build profiles (gap detection, difficulty recommendation, mastery levels)
  4. Profiles inform selection (quiz generation weights topics and difficulties based on profile)
  5. Selection targets weaknesses (gaps get 3x weight, weak areas get 2x)
  6. Targeted practice generates better signals (more data on weak areas)

Without structured metadata, this loop cannot function. A question labeled only "Math — Medium" provides one learning signal. A question labeled "Quadratic Equations — Difficult — Apply — misconceptions: sign errors in factoring, confusing roots with coefficients" provides a dozen.

8. Notification System

The system uses practice profile data to send actionable notifications with intelligent throttling:

| Notification | Trigger | Cooldown | Daily Limit |
|---|---|---|---|
| Gap detected | New topic drops below 50% | 3 days per topic | 3 total |
| Weak area reminder | Weak topic not practiced 7+ days | 7 days per topic | 3 total |
| Level-up suggestion | Accuracy >= 80%, 10+ questions, no gaps | 7 days per subject | 3 total |
| Assessment ready | 15+ questions, not yet assessed | 7 days per subject | 3 total |
| Weekly digest | Sunday summary | Weekly | Separate |
| Coach gap alert | Learner gap detected | 3 days per learner per topic | Separate |

Every notification links directly to an action — practice a topic, take an assessment, or review progress. The daily cap of 3 prevents notification fatigue.
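The per-key cooldown plus shared daily cap can be sketched as below. This is an assumed in-memory model; the real system presumably persists this state (e.g. alongside the practice profile) rather than keeping it in process memory.

```python
from datetime import datetime, timedelta

class NotificationThrottle:
    """Per-key cooldown (e.g. key = 'gap:<topic>') plus a global
    daily cap shared across notification types."""
    def __init__(self, cooldown_days: int, daily_limit: int = 3):
        self.cooldown = timedelta(days=cooldown_days)
        self.daily_limit = daily_limit
        self.last_sent: dict[str, datetime] = {}
        self.sent_today: dict[str, int] = {}  # ISO date -> count

    def try_send(self, key: str, now: datetime) -> bool:
        """Record and allow the send only if neither the daily cap
        nor this key's cooldown blocks it."""
        day = now.date().isoformat()
        if self.sent_today.get(day, 0) >= self.daily_limit:
            return False
        last = self.last_sent.get(key)
        if last is not None and now - last < self.cooldown:
            return False
        self.last_sent[key] = now
        self.sent_today[day] = self.sent_today.get(day, 0) + 1
        return True
```

Checking the daily cap before the per-key cooldown means a blocked key never consumes one of the day's three slots.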

9. Design Principles

9.1 Analyze All Practice Data

The profiler runs on every game completion regardless of source. A student who practices through multiplayer games builds the same profile as one using AI Coach quizzes. No practice goes untracked.

9.2 Assessment Should Find the Student

Rather than requiring students to navigate to an assessment screen, prompts appear contextually during active engagement — specifically on the game results screen after practice.

9.3 Respect Learner Autonomy

Students can override difficulty levels, dismiss assessment prompts, and set their own pace. The system adapts and suggests; it does not mandate.

9.4 Foundations First

The prerequisite-aware quiz generation and tiered assessments enforce a natural learning progression without rigidly locking content. Students are guided toward foundations, but not blocked from exploring advanced topics if they choose.

9.5 Transaction Safety

Profile updates run inside Firestore transactions. Notifications are sent only after successful commits. This prevents inconsistent state from partial updates.

10. Conclusion

Mastery-based learning at scale requires two ingredients: rich metadata on every question, and a system that continuously converts practice data into adaptive decisions.

The question metadata architecture described here — topic hierarchy position, 5-level difficulty, Bloom's taxonomy, and misconception tags — transforms every answered question into a multidimensional learning signal. The practice profiler, gap detection, adaptive difficulty, and tiered assessment systems consume these signals to create personalized learning paths that prioritize foundations, target weaknesses, and gate progression on demonstrated mastery.

The result is a system where the act of practicing any question, in any context, contributes to an increasingly accurate picture of what each student knows and what they need to learn next.