Mastery-Based Learning Through Rich Question Metadata
How structured question metadata powers adaptive difficulty, gap detection, and prerequisite-aware progression at scale.
Abstract
Traditional question banks treat questions as isolated items with minimal metadata — a difficulty label and a topic tag. This paper describes how enriching every question with structured metadata (Bloom's taxonomy level, difficulty gradations, topic hierarchy position, misconception tags, and learning outcome mappings) enables a mastery-based learning system that adapts to each student. The system continuously profiles practice performance across all game types, detects knowledge gaps, gates progression on prerequisite mastery, and adjusts difficulty automatically — all driven by metadata attached to questions at generation time.
1. From Questions to Learning Signals
Every time a student answers a question, the system captures not just correctness but context:
| Signal | Source |
|---|---|
| Topic accuracy | Which topic the question belongs to |
| Difficulty performance | Whether the student succeeded at Easy, Moderate, Difficult, or Damn Hard |
| Bloom's level reached | Whether the student can Remember, Apply, or Analyze |
| Misconception exposure | Which common errors the question was designed to surface |
| Trend direction | Whether recent performance is improving, stable, or declining |
This transforms each answered question from a binary score into a multidimensional learning signal. The richer the question metadata, the more the system learns about the student from every interaction.
2. Question Metadata Architecture
2.1 Required Metadata Fields
Every generated question carries the following metadata:
| Field | Values | Purpose in Mastery System |
|---|---|---|
| topic | Syllabus topic name | Groups performance by knowledge area |
| difficulty | Super Easy, Easy, Moderate, Difficult, Damn Hard | 5-level scale for adaptive difficulty selection |
| bloomsLevel | Remember, Understand, Apply, Analyze, Evaluate, Create | Ensures cognitive diversity and tracks depth of understanding |
| misconceptions | 2-5 common errors | Identifies specific conceptual gaps, not just topic-level weakness |
| syllabusId | Reference to syllabus | Links to topic hierarchy and prerequisites |
| subject | Subject name | Enables cross-subject proficiency tracking |
| learningOutcomes | Syllabus-defined objectives | Maps questions to curriculum goals |
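The required fields above can be sketched as a typed record. This is a minimal illustration, not the production schema; the field names mirror the table, but the type names and example values are ours.

```typescript
// Sketch of the per-question metadata record described in the table above.
// Type and field names are illustrative, not the production schema.
type Difficulty = "Super Easy" | "Easy" | "Moderate" | "Difficult" | "Damn Hard";
type BloomsLevel = "Remember" | "Understand" | "Apply" | "Analyze" | "Evaluate" | "Create";

interface QuestionMetadata {
  topic: string;              // syllabus topic name
  difficulty: Difficulty;     // 5-level scale for adaptive selection
  bloomsLevel: BloomsLevel;   // cognitive depth tracking
  misconceptions: string[];   // 2-5 common errors the question surfaces
  syllabusId: string;         // links to topic hierarchy and prerequisites
  subject: string;            // cross-subject proficiency tracking
  learningOutcomes: string[]; // curriculum goal mapping
}

const example: QuestionMetadata = {
  topic: "Quadratic Equations",
  difficulty: "Difficult",
  bloomsLevel: "Apply",
  misconceptions: ["sign errors in factoring", "confusing roots with coefficients"],
  syllabusId: "syllabus-math-101",
  subject: "Mathematics",
  learningOutcomes: ["Solve quadratic equations by factoring"],
};
```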
2.2 Topic Hierarchy Integration
Questions inherit position in a three-tier topic hierarchy:
| Tier | Definition | Example (Physics) |
|---|---|---|
| Tier 1 — Foundational | No prerequisites | Kinematics, Waves |
| Tier 2 — Intermediate | All prerequisites are Tier 1 | Dynamics (requires Kinematics) |
| Tier 3 — Advanced | Has Tier 2+ prerequisites | Quantum Physics (requires Waves + Energy) |
The hierarchy is generated during syllabus extraction using topological sort (Kahn's algorithm) and stored alongside the syllabus. Each question's topic maps to a tier, enabling the mastery system to understand prerequisite relationships.
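The tier assignment can be sketched with Kahn's algorithm: topics with no prerequisites start at Tier 1, and each dependent topic sits one tier deeper than its deepest prerequisite, capped at Tier 3. The function and map shapes below are illustrative assumptions, not the production implementation.

```typescript
// Sketch of tier assignment via Kahn's topological sort, assuming a
// topic's tier is one more than its deepest prerequisite, capped at 3.
function assignTiers(prereqs: Map<string, string[]>): Map<string, number> {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const [topic, reqs] of prereqs) {
    indegree.set(topic, reqs.length);
    for (const r of reqs) {
      if (!dependents.has(r)) dependents.set(r, []);
      dependents.get(r)!.push(topic);
    }
  }
  const tiers = new Map<string, number>();
  // Foundational topics (no prerequisites) seed the queue at Tier 1.
  const queue = [...prereqs.keys()].filter((t) => indegree.get(t) === 0);
  for (const t of queue) tiers.set(t, 1);
  while (queue.length > 0) {
    const t = queue.shift()!;
    for (const d of dependents.get(t) ?? []) {
      // A dependent is one tier deeper than its deepest processed prerequisite.
      tiers.set(d, Math.min(3, Math.max(tiers.get(d) ?? 1, tiers.get(t)! + 1)));
      indegree.set(d, indegree.get(d)! - 1);
      if (indegree.get(d) === 0) queue.push(d);
    }
  }
  return tiers;
}

const tiers = assignTiers(new Map([
  ["Kinematics", []],
  ["Dynamics", ["Kinematics"]],
  ["Circular Motion", ["Dynamics"]],
]));
```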
3. Continuous Practice Profiling
3.1 Universal Data Collection
The practice profiler analyzes every game completion — not just AI Coach quizzes, but also Quick Play, solo games, multiplayer games, and goal-linked quizzes. This comprehensive collection builds profiles faster and reflects the student's true ability across all practice contexts.
3.2 Per-Topic Statistics
For each topic within a subject, the profiler maintains:
| Metric | Calculation | Use |
|---|---|---|
| Accuracy | correct / attempted | Primary mastery indicator |
| Attempts | Running total | Confidence in the accuracy signal |
| Per-difficulty breakdown | Accuracy at each of 5 difficulty levels | Drives difficulty recommendation |
| Trend | Recent batch vs prior batch accuracy | Detects improvement or regression |
| Gap flag | Accuracy < 50% after 5+ attempts | Triggers notifications and focus |
| Weak flag | Accuracy 50-70% after 5+ attempts | Triggers reminders |
| Recommended difficulty | Highest difficulty with >= 70% accuracy, then one step up | Adaptive challenge |
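The recommended-difficulty rule in the last row can be sketched as follows: find the highest difficulty level with at least 70% accuracy, then step up one level. This is a minimal sketch with illustrative names, assuming accuracies are stored as fractions per level.

```typescript
// Sketch of the recommended-difficulty rule: highest level with >= 70%
// accuracy, then one step up (capped at the top of the scale).
const LEVELS = ["Super Easy", "Easy", "Moderate", "Difficult", "Damn Hard"] as const;
type Level = (typeof LEVELS)[number];

function recommendDifficulty(accuracyByLevel: Partial<Record<Level, number>>): Level {
  let highest = -1;
  LEVELS.forEach((level, i) => {
    const acc = accuracyByLevel[level];
    if (acc !== undefined && acc >= 0.7) highest = i;
  });
  // No level at >= 70% yet: start at the bottom of the scale.
  if (highest === -1) return LEVELS[0];
  // One step up from the highest cleared level.
  return LEVELS[Math.min(highest + 1, LEVELS.length - 1)];
}
```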
3.3 Trend Detection
Trends compare recent performance against prior performance:
| Condition | Classification |
|---|---|
| Recent > prior + 10% | Improving |
| Recent < prior - 10% | Declining |
| Within 10% | Stable |
Trends drive notifications ("Your Algebra accuracy is declining — review fundamentals") and influence quiz generation weighting.
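The classification table above reduces to a small comparison with a 10-percentage-point band; a minimal sketch (function name ours):

```typescript
// Sketch of trend classification: recent-batch accuracy vs prior-batch
// accuracy, with a 10-percentage-point stability band.
type Trend = "improving" | "stable" | "declining";

function classifyTrend(recent: number, prior: number): Trend {
  if (recent > prior + 0.10) return "improving";
  if (recent < prior - 0.10) return "declining";
  return "stable";
}
```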
4. Adaptive Difficulty
4.1 Level-Based Difficulty Mapping
The system maps student proficiency levels to question difficulty selections:
| Student Level | Questions Served |
|---|---|
| Not Started / Beginner | Super Easy, Easy |
| Intermediate | Easy, Moderate, Difficult |
| Advanced | Moderate, Difficult, Damn Hard |
| Mastered | Difficult, Damn Hard |
As students demonstrate mastery, they automatically receive harder questions — and if they struggle, the system steps back to reinforce foundations.
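The mapping above is a simple lookup; a sketch with illustrative names, assuming quiz generation draws questions only from the listed difficulties:

```typescript
// Sketch of the level-to-difficulty mapping as a lookup table.
const SERVED_DIFFICULTIES: Record<string, string[]> = {
  "Not Started": ["Super Easy", "Easy"],
  Beginner: ["Super Easy", "Easy"],
  Intermediate: ["Easy", "Moderate", "Difficult"],
  Advanced: ["Moderate", "Difficult", "Damn Hard"],
  Mastered: ["Difficult", "Damn Hard"],
};

function difficultiesFor(level: string): string[] {
  // Unknown levels fall back to the Beginner selection.
  return SERVED_DIFFICULTIES[level] ?? SERVED_DIFFICULTIES["Beginner"];
}
```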
4.2 Auto-Level Adjustment
After 5+ quizzes, the system adjusts levels automatically using a rolling weighted average:
newAccuracy = 0.3 * quizAccuracy + 0.7 * currentAccuracy
- Accuracy >= 80%: promote one level (up to Mastered)
- Accuracy < 40%: demote one level (down to Beginner minimum)
This weighted approach prevents a single bad quiz from causing demotion while still responding to sustained performance changes.
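The adjustment above can be sketched directly from the formula and thresholds (level names match Section 4.1; the function shape is ours):

```typescript
// Sketch of auto-level adjustment: 30/70 rolling weighted average,
// promote at >= 80%, demote at < 40%, clamped to the level range.
const STUDENT_LEVELS = ["Beginner", "Intermediate", "Advanced", "Mastered"] as const;
type StudentLevel = (typeof STUDENT_LEVELS)[number];

function adjustLevel(
  current: StudentLevel,
  currentAccuracy: number,
  quizAccuracy: number,
): { level: StudentLevel; accuracy: number } {
  // New quiz contributes 30%; accumulated history contributes 70%.
  const accuracy = 0.3 * quizAccuracy + 0.7 * currentAccuracy;
  const i = STUDENT_LEVELS.indexOf(current);
  if (accuracy >= 0.8) {
    return { level: STUDENT_LEVELS[Math.min(i + 1, STUDENT_LEVELS.length - 1)], accuracy };
  }
  if (accuracy < 0.4) {
    return { level: STUDENT_LEVELS[Math.max(i - 1, 0)], accuracy };
  }
  return { level: current, accuracy };
}
```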
4.3 User Override
Students can set a preferred difficulty (easy, moderate, hard) that bypasses the automatic mapping. This respects learner autonomy while defaulting to data-driven selection.
5. Gap Detection & Prerequisite Awareness
5.1 Automated Gap Detection
The system classifies topics after every practice session:
| Classification | Threshold | Minimum Attempts | Action |
|---|---|---|---|
| Gap (critical) | Accuracy < 50% | 5+ | Immediate notification, 3x quiz weight |
| Weak | Accuracy 50-70% | 5+ | Reminder if not practiced in 7 days, 2x quiz weight |
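The classification and its quiz-weight multipliers can be sketched as one function. Boundary handling (whether exactly 70% counts as weak) is our assumption; names are illustrative.

```typescript
// Sketch of gap/weak classification with the quiz-weight multipliers above.
// Topics with fewer than 5 attempts are left unclassified (weight 1x).
function classifyTopic(
  accuracy: number,
  attempts: number,
): { label: "gap" | "weak" | "ok"; quizWeight: number } {
  if (attempts < 5) return { label: "ok", quizWeight: 1 }; // not enough signal yet
  if (accuracy < 0.5) return { label: "gap", quizWeight: 3 };
  if (accuracy < 0.7) return { label: "weak", quizWeight: 2 }; // 50-70% band
  return { label: "ok", quizWeight: 1 };
}
```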
5.2 Prerequisite-Aware Quiz Generation
When generating quizzes, the system reads the topic hierarchy and applies a 0.05x weight multiplier to topics whose prerequisites are not mastered (< 80% accuracy or < 10 attempts). This heavily de-prioritizes advanced topics until foundations are solid.
For example, if a student hasn't mastered Kinematics (Tier 1), questions on Dynamics (Tier 2, requires Kinematics) are virtually excluded from quizzes. The student focuses on foundations first.
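The prerequisite gate can be sketched as a weight multiplier: a prerequisite counts as mastered at 80%+ accuracy over 10+ attempts, and any unmastered prerequisite drops the dependent topic's weight to 0.05x. The types and function name are illustrative.

```typescript
// Sketch of the 0.05x prerequisite gate for quiz-generation weighting.
interface TopicStats {
  accuracy: number; // 0-1
  attempts: number;
}

function prereqMultiplier(
  prereqs: string[],
  stats: Map<string, TopicStats>,
): number {
  const allMastered = prereqs.every((p) => {
    const s = stats.get(p);
    return s !== undefined && s.accuracy >= 0.8 && s.attempts >= 10;
  });
  return allMastered ? 1 : 0.05; // unmastered prereqs all but exclude the topic
}
```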
5.3 Prerequisite-Enriched Notifications
Gap notifications include prerequisite context when topic hierarchy exists:
"You're struggling with Kinematics (42% accuracy). This is a prerequisite for 3 other topics — mastering it will unlock Dynamics, Circular Motion, and Momentum."
This helps students understand why a gap matters, not just that it exists.
6. Tiered Assessment System
6.1 Staged Assessments
Rather than one large assessment, the system uses tier-based staging aligned with the topic hierarchy:
| Tier | Eligibility | Questions |
|---|---|---|
| Tier 1 (Foundational) | 15+ questions answered in subject | Up to 10 questions from Tier 1 topics |
| Tier 2 (Intermediate) | All Tier 1 topics mastered | Up to 10 questions from Tier 2 topics |
| Tier 3 (Advanced) | All Tier 2 topics mastered | Up to 10 questions from Tier 3 topics |
6.2 Mastery Threshold
A topic is considered mastered when all three conditions are met:
- Accuracy >= 80% on that topic
- >= 10 questions attempted
- Hard-question gate: >= 2 attempts at Difficult or Damn Hard difficulty with >= 60% accuracy
The hard-question gate prevents students from achieving "Mastered" status using only Easy questions. If a topic's question bank lacks Difficult/Damn Hard questions, it cannot reach Mastered — this incentivizes content quality.
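The three conditions above, including the hard-question gate, can be sketched as a single predicate (field names are ours):

```typescript
// Sketch of the mastery threshold: overall accuracy, attempt volume,
// and the hard-question gate must all be satisfied.
interface TopicProfile {
  accuracy: number;     // overall topic accuracy, 0-1
  attempts: number;     // total questions attempted
  hardAttempts: number; // attempts at Difficult or Damn Hard
  hardAccuracy: number; // accuracy on those hard attempts, 0-1
}

function isMastered(p: TopicProfile): boolean {
  return (
    p.accuracy >= 0.8 &&
    p.attempts >= 10 &&
    p.hardAttempts >= 2 &&  // the hard-question gate
    p.hardAccuracy >= 0.6
  );
}
```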
6.3 Weighted Proficiency Scoring
Assessment results use difficulty-weighted scoring:
| Difficulty | Weight |
|---|---|
| Super Easy | 0.5x |
| Easy | 1x |
| Moderate | 2x |
| Difficult | 3x |
| Damn Hard | 4x |
Getting a Difficult question right contributes 6x more to proficiency than getting a Super Easy question right. This reflects the true depth of understanding.
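The weighted score can be sketched as earned weight over total possible weight, using the table's weights (the function shape is ours):

```typescript
// Sketch of difficulty-weighted proficiency: each correct answer earns
// its difficulty weight; the score is earned / possible weight.
const WEIGHTS: Record<string, number> = {
  "Super Easy": 0.5, Easy: 1, Moderate: 2, Difficult: 3, "Damn Hard": 4,
};

function weightedProficiency(
  answers: { difficulty: string; correct: boolean }[],
): number {
  let earned = 0;
  let possible = 0;
  for (const a of answers) {
    const w = WEIGHTS[a.difficulty];
    possible += w;
    if (a.correct) earned += w;
  }
  return possible === 0 ? 0 : earned / possible;
}
```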
6.4 Contextual Prompting
Assessments find the student rather than requiring the student to find them. Prompts appear on the game results screen after any practice game when eligibility conditions are met. Students can dismiss prompts (7-day cooldown, max 3 dismissals before the system stops asking).
7. The Metadata-Mastery Feedback Loop
Rich question metadata creates a virtuous cycle:
- Questions carry metadata (topic, difficulty, Bloom's level, misconceptions)
- Practice generates signals (per-topic accuracy, per-difficulty performance, trends)
- Signals build profiles (gap detection, difficulty recommendation, mastery levels)
- Profiles inform selection (quiz generation weights topics and difficulties based on profile)
- Selection targets weaknesses (gaps get 3x weight, weak areas get 2x)
- Targeted practice generates better signals (more data on weak areas)
Without structured metadata, this loop cannot function. A question labeled only "Math — Medium" provides one learning signal. A question labeled "Quadratic Equations — Difficult — Apply — misconceptions: sign errors in factoring, confusing roots with coefficients" provides a dozen.
8. Notification System
The system uses practice profile data to send actionable notifications with intelligent throttling:
| Notification | Trigger | Cooldown | Daily Limit |
|---|---|---|---|
| Gap detected | New topic drops below 50% | 3 days per topic | 3 total |
| Weak area reminder | Weak topic not practiced 7+ days | 7 days per topic | 3 total |
| Level-up suggestion | Accuracy >= 80%, 10+ questions, no gaps | 7 days per subject | 3 total |
| Assessment ready | 15+ questions, not yet assessed | 7 days per subject | 3 total |
| Weekly digest | Sunday summary | Weekly | Separate |
| Coach gap alert | Learner gap detected | 3 days per learner per topic | Separate |
Every notification links directly to an action — practice a topic, take an assessment, or review progress. The daily cap of 3 prevents notification fatigue.
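The throttling rules can be sketched as a per-scope cooldown check plus the daily cap. The in-memory state below is illustrative; the paper states that production state lives in Firestore.

```typescript
// Sketch of notification throttling: per-topic/per-subject cooldowns
// plus a daily cap of 3 on the capped notification types.
interface ThrottleState {
  lastSentAt: Map<string, number>; // key: `${type}:${scope}` -> epoch ms
  sentToday: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

function canSend(
  state: ThrottleState,
  type: string,
  scope: string,        // topic or subject the cooldown applies to
  cooldownDays: number,
  now: number,
): boolean {
  if (state.sentToday >= 3) return false; // daily cap
  const last = state.lastSentAt.get(`${type}:${scope}`);
  return last === undefined || now - last >= cooldownDays * DAY_MS;
}
```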
9. Design Principles
9.1 Analyze All Practice Data
The profiler runs on every game completion regardless of source. A student who practices through multiplayer games builds the same profile as one using AI Coach quizzes. No practice goes untracked.
9.2 Assessment Should Find the Student
Rather than requiring students to navigate to an assessment screen, prompts appear contextually during active engagement — specifically on the game results screen after practice.
9.3 Respect Learner Autonomy
Students can override difficulty levels, dismiss assessment prompts, and set their own pace. The system adapts and suggests; it does not mandate.
9.4 Foundations First
The prerequisite-aware quiz generation and tiered assessments enforce a natural learning progression without rigidly locking content. Students are guided toward foundations, but not blocked from exploring advanced topics if they choose.
9.5 Transaction Safety
Profile updates run inside Firestore transactions. Notifications are sent only after successful commits. This prevents inconsistent state from partial updates.
10. Conclusion
Mastery-based learning at scale requires two ingredients: rich metadata on every question, and a system that continuously converts practice data into adaptive decisions.
The question metadata architecture described here — topic hierarchy position, 5-level difficulty, Bloom's taxonomy, and misconception tags — transforms every answered question into a multidimensional learning signal. The practice profiler, gap detection, adaptive difficulty, and tiered assessment systems consume these signals to create personalized learning paths that prioritize foundations, target weaknesses, and gate progression on demonstrated mastery.
The result is a system where the act of practicing any question, in any context, contributes to an increasingly accurate picture of what each student knows and what they need to learn next.