Mastery-Based Learning Through Rich Question Metadata
How structured question metadata powers adaptive difficulty, gap detection, and prerequisite-aware progression at scale.
Abstract
Traditional question banks treat questions as isolated items with minimal metadata — a difficulty label and a topic tag. This paper describes how enriching every question with structured metadata (Bloom's taxonomy level, difficulty gradations, topic hierarchy position, misconception tags, and learning outcome mappings) enables a mastery-based learning system that adapts to each student. The system continuously profiles practice performance across all game types, detects knowledge gaps, gates progression on prerequisite mastery, and adjusts difficulty automatically — all driven by metadata attached to questions at generation time.
1. From Questions to Learning Signals
Every time a student answers a question, the system captures not just correctness but context:
| Signal | Source |
|---|---|
| Topic accuracy | Which topic the question belongs to |
| Difficulty performance | Whether the student succeeded at Easy, Moderate, Difficult, or Damn Hard |
| Bloom's level reached | Whether the student can Remember, Apply, or Analyze |
| Misconception exposure | Which common errors the question was designed to surface |
| Trend direction | Whether recent performance is improving, stable, or declining |
This transforms each answered question from a binary score into a multidimensional learning signal. The richer the question metadata, the more the system learns about the student from every interaction.
2. Question Metadata Architecture
2.1 Required Metadata Fields
Every generated question carries the following metadata:
| Field | Values | Purpose in Mastery System |
|---|---|---|
| topic | Syllabus topic name | Groups performance by knowledge area |
| difficulty | Super Easy, Easy, Moderate, Difficult, Damn Hard | 5-level scale for adaptive difficulty selection |
| bloomsLevel | Remember, Understand, Apply, Analyze, Evaluate, Create | Ensures cognitive diversity and tracks depth of understanding |
| misconceptions | 2-5 common errors | Identifies specific conceptual gaps, not just topic-level weakness |
| syllabusId | Reference to syllabus | Links to topic hierarchy and prerequisites |
| subject | Subject name | Enables cross-subject proficiency tracking |
| learningOutcomes | Syllabus-defined objectives | Maps questions to curriculum goals |
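The required fields above can be sketched as a typed record. This is a minimal illustration, not the production schema; the field names mirror the table, but the type names and example values are ours.

```typescript
// Sketch of the per-question metadata record described in the table above.
// Type and field names are illustrative, not the production schema.
type Difficulty = "Super Easy" | "Easy" | "Moderate" | "Difficult" | "Damn Hard";
type BloomsLevel = "Remember" | "Understand" | "Apply" | "Analyze" | "Evaluate" | "Create";

interface QuestionMetadata {
  topic: string;              // syllabus topic name
  difficulty: Difficulty;     // 5-level scale for adaptive selection
  bloomsLevel: BloomsLevel;   // cognitive depth tracking
  misconceptions: string[];   // 2-5 common errors the question surfaces
  syllabusId: string;         // links to topic hierarchy and prerequisites
  subject: string;            // cross-subject proficiency tracking
  learningOutcomes: string[]; // curriculum goal mapping
}

const example: QuestionMetadata = {
  topic: "Quadratic Equations",
  difficulty: "Difficult",
  bloomsLevel: "Apply",
  misconceptions: ["sign errors in factoring", "confusing roots with coefficients"],
  syllabusId: "syllabus-math-101",
  subject: "Mathematics",
  learningOutcomes: ["Solve quadratic equations by factoring"],
};
```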
2.2 Topic Hierarchy Integration
Questions inherit position in a three-tier topic hierarchy:
| Tier | Definition | Example (Physics) |
|---|---|---|
| Tier 1 — Foundational | No prerequisites | Kinematics, Waves |
| Tier 2 — Intermediate | All prerequisites are Tier 1 | Dynamics (requires Kinematics) |
| Tier 3 — Advanced | Has Tier 2+ prerequisites | Quantum Physics (requires Waves + Energy) |
The hierarchy is generated during syllabus extraction using topological sort (Kahn's algorithm) and stored alongside the syllabus. Each question's topic maps to a tier, enabling the mastery system to understand prerequisite relationships.
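The tier assignment can be sketched with Kahn's algorithm: topics with no prerequisites start at Tier 1, and each dependent topic sits one tier deeper than its deepest prerequisite, capped at Tier 3. The function and map shapes below are illustrative assumptions, not the production implementation.

```typescript
// Sketch of tier assignment via Kahn's topological sort, assuming a
// topic's tier is one more than its deepest prerequisite, capped at 3.
function assignTiers(prereqs: Map<string, string[]>): Map<string, number> {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const [topic, reqs] of prereqs) {
    indegree.set(topic, reqs.length);
    for (const r of reqs) {
      if (!dependents.has(r)) dependents.set(r, []);
      dependents.get(r)!.push(topic);
    }
  }
  const tiers = new Map<string, number>();
  // Foundational topics (no prerequisites) seed the queue at Tier 1.
  const queue = [...prereqs.keys()].filter((t) => indegree.get(t) === 0);
  for (const t of queue) tiers.set(t, 1);
  while (queue.length > 0) {
    const t = queue.shift()!;
    for (const d of dependents.get(t) ?? []) {
      // A dependent is one tier deeper than its deepest processed prerequisite.
      tiers.set(d, Math.min(3, Math.max(tiers.get(d) ?? 1, tiers.get(t)! + 1)));
      indegree.set(d, indegree.get(d)! - 1);
      if (indegree.get(d) === 0) queue.push(d);
    }
  }
  return tiers;
}

const tiers = assignTiers(new Map([
  ["Kinematics", []],
  ["Dynamics", ["Kinematics"]],
  ["Circular Motion", ["Dynamics"]],
]));
```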
3. Continuous Practice Profiling
3.1 Universal Data Collection
The practice profiler analyzes every game completion — not just AI Coach quizzes, but also Quick Play, solo games, multiplayer games, and goal-linked quizzes. This comprehensive collection builds profiles faster and reflects the student's true ability across all practice contexts.
3.2 Per-Topic Statistics
For each topic within a subject, the profiler maintains:
| Metric | Calculation | Use |
|---|---|---|
| Accuracy | correct / attempted | Primary mastery indicator |
| Attempts | Running total | Confidence in the accuracy signal |
| Per-difficulty breakdown | Accuracy at each of 5 difficulty levels | Drives difficulty recommendation |
| Trend | Recent batch vs prior batch accuracy | Detects improvement or regression |
| Gap flag | Accuracy < 50% after 5+ attempts | Triggers notifications and focus |
| Weak flag | Accuracy 50-70% after 5+ attempts | Triggers reminders |
| Recommended difficulty | Highest difficulty with >= 70% accuracy, then one step up | Adaptive challenge |
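The recommended-difficulty rule in the last row can be sketched as follows: find the highest difficulty level with at least 70% accuracy, then step up one level. This is a minimal sketch with illustrative names, assuming accuracies are stored as fractions per level.

```typescript
// Sketch of the recommended-difficulty rule: highest level with >= 70%
// accuracy, then one step up (capped at the top of the scale).
const LEVELS = ["Super Easy", "Easy", "Moderate", "Difficult", "Damn Hard"] as const;
type Level = (typeof LEVELS)[number];

function recommendDifficulty(accuracyByLevel: Partial<Record<Level, number>>): Level {
  let highest = -1;
  LEVELS.forEach((level, i) => {
    const acc = accuracyByLevel[level];
    if (acc !== undefined && acc >= 0.7) highest = i;
  });
  // No level at >= 70% yet: start at the bottom of the scale.
  if (highest === -1) return LEVELS[0];
  // One step up from the highest cleared level.
  return LEVELS[Math.min(highest + 1, LEVELS.length - 1)];
}
```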
3.3 Trend Detection
Trends compare recent performance against prior performance:
| Condition | Classification |
|---|---|
| Recent > prior + 10% | Improving |
| Recent < prior - 10% | Declining |
| Within 10% | Stable |
Trends drive notifications ("Your Algebra accuracy is declining — review fundamentals") and influence quiz generation weighting.
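The classification table above reduces to a small comparison with a 10-percentage-point band; a minimal sketch (function name ours):

```typescript
// Sketch of trend classification: recent-batch accuracy vs prior-batch
// accuracy, with a 10-percentage-point stability band.
type Trend = "improving" | "stable" | "declining";

function classifyTrend(recent: number, prior: number): Trend {
  if (recent > prior + 0.10) return "improving";
  if (recent < prior - 0.10) return "declining";
  return "stable";
}
```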
4. Adaptive Difficulty
4.1 Level-Based Difficulty Mapping
The system maps student proficiency levels to question difficulty selections:
| Student Level | Questions Served |
|---|---|
| Not Started / Beginner | Super Easy, Easy |
| Intermediate | Easy, Moderate, Difficult |
| Advanced | Moderate, Difficult, Damn Hard |
| Mastered | Difficult, Damn Hard |
As students demonstrate mastery, they automatically receive harder questions — and if they struggle, the system steps back to reinforce foundations.
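The mapping above is a simple lookup; a sketch with illustrative names, assuming quiz generation draws questions only from the listed difficulties:

```typescript
// Sketch of the level-to-difficulty mapping as a lookup table.
const SERVED_DIFFICULTIES: Record<string, string[]> = {
  "Not Started": ["Super Easy", "Easy"],
  Beginner: ["Super Easy", "Easy"],
  Intermediate: ["Easy", "Moderate", "Difficult"],
  Advanced: ["Moderate", "Difficult", "Damn Hard"],
  Mastered: ["Difficult", "Damn Hard"],
};

function difficultiesFor(level: string): string[] {
  // Unknown levels fall back to the Beginner selection.
  return SERVED_DIFFICULTIES[level] ?? SERVED_DIFFICULTIES["Beginner"];
}
```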
4.2 Auto-Level Adjustment
After 5+ quizzes, the system adjusts levels automatically using a rolling weighted average:
newAccuracy = 0.3 * quizAccuracy + 0.7 * currentAccuracy
- Accuracy >= 80%: promote one level (up to Mastered)
- Accuracy < 40%: demote one level (down to Beginner minimum)
This weighted approach prevents a single bad quiz from causing demotion while still responding to sustained performance changes.
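The adjustment above can be sketched directly from the formula and thresholds (level names match Section 4.1; the function shape is ours):

```typescript
// Sketch of auto-level adjustment: 30/70 rolling weighted average,
// promote at >= 80%, demote at < 40%, clamped to the level range.
const STUDENT_LEVELS = ["Beginner", "Intermediate", "Advanced", "Mastered"] as const;
type StudentLevel = (typeof STUDENT_LEVELS)[number];

function adjustLevel(
  current: StudentLevel,
  currentAccuracy: number,
  quizAccuracy: number,
): { level: StudentLevel; accuracy: number } {
  // New quiz contributes 30%; accumulated history contributes 70%.
  const accuracy = 0.3 * quizAccuracy + 0.7 * currentAccuracy;
  const i = STUDENT_LEVELS.indexOf(current);
  if (accuracy >= 0.8) {
    return { level: STUDENT_LEVELS[Math.min(i + 1, STUDENT_LEVELS.length - 1)], accuracy };
  }
  if (accuracy < 0.4) {
    return { level: STUDENT_LEVELS[Math.max(i - 1, 0)], accuracy };
  }
  return { level: current, accuracy };
}
```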
4.3 User Override
Students can set a preferred difficulty (easy, moderate, hard) that bypasses the automatic mapping. This respects learner autonomy while defaulting to data-driven selection.
5. Gap Detection & Prerequisite Awareness
5.1 Automated Gap Detection
The system classifies topics after every practice session:
| Classification | Threshold | Minimum Attempts | Action |
|---|---|---|---|
| Gap (critical) | Accuracy < 50% | 5+ | Immediate notification, 3x quiz weight |
| Weak | Accuracy 50-70% | 5+ | Reminder if not practiced in 7 days, 2x quiz weight |
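The classification and its quiz-weight multipliers can be sketched as one function. Boundary handling (whether exactly 70% counts as weak) is our assumption; names are illustrative.

```typescript
// Sketch of gap/weak classification with the quiz-weight multipliers above.
// Topics with fewer than 5 attempts are left unclassified (weight 1x).
function classifyTopic(
  accuracy: number,
  attempts: number,
): { label: "gap" | "weak" | "ok"; quizWeight: number } {
  if (attempts < 5) return { label: "ok", quizWeight: 1 }; // not enough signal yet
  if (accuracy < 0.5) return { label: "gap", quizWeight: 3 };
  if (accuracy < 0.7) return { label: "weak", quizWeight: 2 }; // 50-70% band
  return { label: "ok", quizWeight: 1 };
}
```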
5.2 Prerequisite-Aware Quiz Generation
When generating quizzes, the system reads the topic hierarchy and applies a 0.05x weight multiplier to topics whose prerequisites are not mastered (< 80% accuracy or < 10 attempts). This heavily de-prioritizes advanced topics until foundations are solid.
For example, if a student hasn't mastered Kinematics (Tier 1), questions on Dynamics (Tier 2, requires Kinematics) are virtually excluded from quizzes. The student focuses on foundations first.
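The prerequisite gate can be sketched as a weight multiplier: a prerequisite counts as mastered at 80%+ accuracy over 10+ attempts, and any unmastered prerequisite drops the dependent topic's weight to 0.05x. The types and function name are illustrative.

```typescript
// Sketch of the 0.05x prerequisite gate for quiz-generation weighting.
interface TopicStats {
  accuracy: number; // 0-1
  attempts: number;
}

function prereqMultiplier(
  prereqs: string[],
  stats: Map<string, TopicStats>,
): number {
  const allMastered = prereqs.every((p) => {
    const s = stats.get(p);
    return s !== undefined && s.accuracy >= 0.8 && s.attempts >= 10;
  });
  return allMastered ? 1 : 0.05; // unmastered prereqs all but exclude the topic
}
```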
5.3 Prerequisite-Enriched Notifications
Gap notifications include prerequisite context when topic hierarchy exists:
"You're struggling with Kinematics (42% accuracy). This is a prerequisite for 3 other topics — mastering it will unlock Dynamics, Circular Motion, and Momentum."
This helps students understand why a gap matters, not just that it exists.
6. Tiered Assessment System
6.1 Staged Assessments
Rather than one large assessment, the system uses tier-based staging aligned with the topic hierarchy:
| Tier | Eligibility | Questions |
|---|---|---|
| Tier 1 (Foundational) | 15+ questions answered in subject | Up to 10 questions from Tier 1 topics |
| Tier 2 (Intermediate) | All Tier 1 topics mastered | Up to 10 questions from Tier 2 topics |
| Tier 3 (Advanced) | All Tier 2 topics mastered | Up to 10 questions from Tier 3 topics |
6.2 Mastery Threshold
A topic is considered mastered when all three conditions are met:
- Accuracy >= 80% on that topic
- >= 10 questions attempted
- Hard-question gate: >= 2 attempts at Difficult or Damn Hard difficulty with >= 60% accuracy
The hard-question gate prevents students from achieving "Mastered" status using only Easy questions. If a topic's question bank lacks Difficult/Damn Hard questions, it cannot reach Mastered — this incentivizes content quality.
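The three conditions above, including the hard-question gate, can be sketched as a single predicate (field names are ours):

```typescript
// Sketch of the mastery threshold: overall accuracy, attempt volume,
// and the hard-question gate must all be satisfied.
interface TopicProfile {
  accuracy: number;     // overall topic accuracy, 0-1
  attempts: number;     // total questions attempted
  hardAttempts: number; // attempts at Difficult or Damn Hard
  hardAccuracy: number; // accuracy on those hard attempts, 0-1
}

function isMastered(p: TopicProfile): boolean {
  return (
    p.accuracy >= 0.8 &&
    p.attempts >= 10 &&
    p.hardAttempts >= 2 &&  // the hard-question gate
    p.hardAccuracy >= 0.6
  );
}
```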
6.3 Weighted Proficiency Scoring
Assessment results use difficulty-weighted scoring:
| Difficulty | Weight |
|---|---|
| Super Easy | 0.5x |
| Easy | 1x |
| Moderate | 2x |
| Difficult | 3x |
| Damn Hard | 4x |
Getting a Difficult question right contributes 6x more to proficiency than getting a Super Easy question right. This reflects the true depth of understanding.
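The weighted score can be sketched as earned weight over total possible weight, using the table's weights (the function shape is ours):

```typescript
// Sketch of difficulty-weighted proficiency: each correct answer earns
// its difficulty weight; the score is earned / possible weight.
const WEIGHTS: Record<string, number> = {
  "Super Easy": 0.5, Easy: 1, Moderate: 2, Difficult: 3, "Damn Hard": 4,
};

function weightedProficiency(
  answers: { difficulty: string; correct: boolean }[],
): number {
  let earned = 0;
  let possible = 0;
  for (const a of answers) {
    const w = WEIGHTS[a.difficulty];
    possible += w;
    if (a.correct) earned += w;
  }
  return possible === 0 ? 0 : earned / possible;
}
```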
6.4 Contextual Prompting
Assessments find the student rather than requiring the student to find them. Prompts appear on the game results screen after any practice game when eligibility conditions are met. Students can dismiss prompts (7-day cooldown, max 3 dismissals before the system stops asking).
7. The Metadata-Mastery Feedback Loop
Rich question metadata creates a virtuous cycle:
- Questions carry metadata (topic, difficulty, Bloom's level, misconceptions)
- Practice generates signals (per-topic accuracy, per-difficulty performance, trends)
- Signals build profiles (gap detection, difficulty recommendation, mastery levels)
- Profiles inform selection (quiz generation weights topics and difficulties based on profile)
- Selection targets weaknesses (gaps get 3x weight, weak areas get 2x)
- Targeted practice generates better signals (more data on weak areas)
Without structured metadata, this loop cannot function. A question labeled only "Math — Medium" provides one learning signal. A question labeled "Quadratic Equations — Difficult — Apply — misconceptions: sign errors in factoring, confusing roots with coefficients" provides a dozen.
8. Notification System
The system uses practice profile data to send actionable notifications with intelligent throttling:
| Notification | Trigger | Cooldown | Daily Limit |
|---|---|---|---|
| Gap detected | New topic drops below 50% | 3 days per topic | 3 total |
| Weak area reminder | Weak topic not practiced 7+ days | 7 days per topic | 3 total |
| Level-up suggestion | Accuracy >= 80%, 10+ questions, no gaps | 7 days per subject | 3 total |
| Assessment ready | 15+ questions, not yet assessed | 7 days per subject | 3 total |
| Weekly digest | Sunday summary | Weekly | Separate |
| Coach gap alert | Learner gap detected | 3 days per learner per topic | Separate |
Every notification links directly to an action — practice a topic, take an assessment, or review progress. The daily cap of 3 prevents notification fatigue.
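The throttling rules can be sketched as a per-scope cooldown check plus the daily cap. The in-memory state below is illustrative; the paper states that production state lives in Firestore.

```typescript
// Sketch of notification throttling: per-topic/per-subject cooldowns
// plus a daily cap of 3 on the capped notification types.
interface ThrottleState {
  lastSentAt: Map<string, number>; // key: `${type}:${scope}` -> epoch ms
  sentToday: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

function canSend(
  state: ThrottleState,
  type: string,
  scope: string,        // topic or subject the cooldown applies to
  cooldownDays: number,
  now: number,
): boolean {
  if (state.sentToday >= 3) return false; // daily cap
  const last = state.lastSentAt.get(`${type}:${scope}`);
  return last === undefined || now - last >= cooldownDays * DAY_MS;
}
```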
9. Design Principles
9.1 Analyze All Practice Data
The profiler runs on every game completion regardless of source. A student who practices through multiplayer games builds the same profile as one using AI Coach quizzes. No practice goes untracked.
9.2 Assessment Should Find the Student
Rather than requiring students to navigate to an assessment screen, prompts appear contextually during active engagement — specifically on the game results screen after practice.
9.3 Respect Learner Autonomy
Students can override difficulty levels, dismiss assessment prompts, and set their own pace. The system adapts and suggests; it does not mandate.
9.4 Foundations First
The prerequisite-aware quiz generation and tiered assessments enforce a natural learning progression without rigidly locking content. Students are guided toward foundations, but not blocked from exploring advanced topics if they choose.
9.5 Transaction Safety
Profile updates run inside Firestore transactions. Notifications are sent only after successful commits. This prevents inconsistent state from partial updates.
10. Conclusion
Mastery-based learning at scale requires two ingredients: rich metadata on every question, and a system that continuously converts practice data into adaptive decisions.
The question metadata architecture described here — topic hierarchy position, 5-level difficulty, Bloom's taxonomy, and misconception tags — transforms every answered question into a multidimensional learning signal. The practice profiler, gap detection, adaptive difficulty, and tiered assessment systems consume these signals to create personalized learning paths that prioritize foundations, target weaknesses, and gate progression on demonstrated mastery.
The result is a system where the act of practicing any question, in any context, contributes to an increasingly accurate picture of what each student knows and what they need to learn next.