Spoken English Sessions
Autonomous AI Classes in Minecraft
Students speak. The captain listens, scores, and orchestrates. AI peers model the target language. Real English, real Minecraft.
Spoken English Sessions are fully autonomous AI-directed classes that run inside Minecraft. An AI captain directs the entire session — briefing, task beats, finale, and debrief — while peer AI bots model the target language and create resource gaps that require student↔student English communication. Every student has an always-on microphone. The captain listens, scores, and adapts in real time.
The system is a dyad architecture: a Game Agent handles Minecraft actions while a Teaching Agent manages pedagogy, scoring, and session flow. A text-only LLM keeps reasoning fast and cheap. Audio is handled by sidecars: faster-whisper for streaming ASR and MOSS-TTS-Nano for multi-voice output, with a distinct voice per speaker.
Features
Autonomous Captain
AI Director
The AI captain runs the full class autonomously — briefing students, assigning task beats, orchestrating the finale, and leading the debrief. No human teacher required during the session.
Peer AI Bots
Language Models
AI students model the target language at an appropriate level. They create resource gaps — holding items other students need — that require student↔student English communication to resolve.
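One way to picture a resource gap is as an item assignment in which nobody holds the item they need. The sketch below is illustrative only: the function name, player identifiers, and item names are hypothetical, and the actual session logic may distribute items differently.

```python
from collections import defaultdict


def assign_resource_gaps(players, needs):
    """Give each player's needed item to a different player.

    players: list of player names (hypothetical identifiers).
    needs:   dict mapping player -> the item that player must obtain.
    Returns a dict mapping holder -> list of items they hold.
    """
    if len(players) < 2:
        raise ValueError("a resource gap needs at least two players")
    holdings = defaultdict(list)
    for i, player in enumerate(players):
        # Rotate by one position so nobody holds the item they need.
        holder = players[(i + 1) % len(players)]
        holdings[holder].append(needs[player])
    return dict(holdings)


players = ["student_a", "student_b", "peer_bot"]
needs = {
    "student_a": "torch",
    "student_b": "oak_planks",
    "peer_bot": "iron_pickaxe",
}
holdings = assign_resource_gaps(players, needs)
```

With this rotation, each participant has to ask someone else for their item, which is the situation the peer bots are described as creating.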
Real Voice
Always-On
Always-on microphone with voice activity detection. Streaming ASR via faster-whisper. TTS with distinct voices per speaker via MOSS-TTS-Nano. Students speak naturally — the system listens continuously.
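The voice activity detection step can be illustrated with a simple energy threshold. This is a swapped-in, minimal technique for illustration, not the detector the system actually uses; the frame size, threshold, and hangover values are assumptions.

```python
import math


def frame_rms(frame):
    """Root-mean-square energy of one frame of float samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(samples, frame_size=160, threshold=0.05, hangover=2):
    """Return (start_frame, end_frame) speech segments, end exclusive.

    A segment closes only after more than `hangover` consecutive
    silent frames, so short pauses do not split an utterance.
    """
    segments = []
    start = None
    silent = 0
    n_frames = len(samples) // frame_size
    for i in range(n_frames):
        frame = samples[i * frame_size:(i + 1) * frame_size]
        if frame_rms(frame) >= threshold:
            if start is None:
                start = i
            silent = 0
        elif start is not None:
            silent += 1
            if silent > hangover:
                # End the segment at the first silent frame.
                segments.append((start, i - silent + 1))
                start = None
                silent = 0
    if start is not None:
        # Stream ended while a segment was still open.
        segments.append((start, n_frames - silent))
    return segments


# Three silent frames, two loud frames, then five silent frames.
samples = [0.0] * 480 + [0.5] * 320 + [0.0] * 800
segments = detect_speech(samples)
```

In a pipeline like the one described, only the detected segments would be forwarded to the ASR sidecar.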
Live Scoring
Per-Student Exponent
Per-student exponent scoring updated in real time. Peer-exchange gates require English communication between students. The runtime is the single source of truth — no post-hoc grading.
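A peer-exchange gate can be sketched as a check that an item transfer is preceded by a recent spoken request. The class below is a hypothetical illustration: it assumes an upstream component has already decided that an utterance was an English request addressed to a specific player, and the 10-second window is an arbitrary choice.

```python
from dataclasses import dataclass, field


@dataclass
class ExchangeGate:
    """Allow a transfer only if the receiver recently asked the giver."""

    window_s: float = 10.0  # assumed validity window for a request
    _last_request: dict = field(default_factory=dict, init=False)

    def record_request(self, speaker, listener, timestamp):
        # Called when an English request from speaker to listener is heard.
        self._last_request[(speaker, listener)] = timestamp

    def allow_transfer(self, giver, receiver, timestamp):
        asked_at = self._last_request.get((receiver, giver))
        if asked_at is None:
            return False
        return 0 <= timestamp - asked_at <= self.window_s


gate = ExchangeGate()
before_request = gate.allow_transfer("student_a", "student_b", 1.0)
gate.record_request("student_b", "student_a", 5.0)
soon_after = gate.allow_transfer("student_a", "student_b", 8.0)
too_late = gate.allow_transfer("student_a", "student_b", 20.0)
```

Here the transfer is refused before any request, allowed shortly after one, and refused again once the window has passed.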
Architecture
Session Manager + Captain Dyad + Voice Bridge. Text-only LLM for reasoning. Audio handled by sidecars. MOSS-TTS-Nano for multi-voice output.
Session Manager
Runtime
Orchestrates the entire session lifecycle — player connections, voice bridges, scoring pipeline, and session state. The single source of truth for all session data: scores, exchanges, and runtime events.
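The single-source-of-truth idea can be sketched as an append-only event log from which scores are derived on demand. The event kinds and field names are hypothetical, and the plain sum below stands in for whatever scoring rule the runtime actually applies.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    kind: str           # e.g. "score" or "exchange" (hypothetical kinds)
    student: str
    value: float = 0.0


@dataclass
class SessionState:
    """Holds every runtime event; all views are derived from the log."""

    events: list = field(default_factory=list)

    def record(self, event):
        self.events.append(event)

    def scores(self):
        # Recompute totals from the log instead of storing them separately.
        totals = {}
        for event in self.events:
            if event.kind == "score":
                totals[event.student] = totals.get(event.student, 0.0) + event.value
        return totals


state = SessionState()
state.record(Event("score", "student_a", 2.0))
state.record(Event("exchange", "student_a"))
state.record(Event("score", "student_b", 1.0))
state.record(Event("score", "student_a", 1.0))
current_scores = state.scores()
```

Because totals are always recomputed from the log, there is no second copy of the scores that could drift out of sync.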
Captain Dyad
Game + Teaching Agent
Two specialized LLM agents: the Game Agent controls Minecraft actions (movement, building, item distribution), while the Teaching Agent manages pedagogy, scoring decisions, and session pacing. Both are text-only, which keeps them fast and cheap.
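The division of labor between the two agents might look roughly like the sketch below. Everything here is an assumption made for illustration: the `LLM` type is a generic text-in, text-out callable, the prompts are placeholders, and the stub functions stand in for real model calls.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical text-in, text-out interface


class TeachingAgent:
    """Decides what should happen next from a pedagogical point of view."""

    def __init__(self, llm: LLM):
        self.llm = llm

    def next_instruction(self, transcript: str) -> str:
        return self.llm(f"Student said: {transcript}\nNext teaching step:")


class GameAgent:
    """Turns a teaching instruction into a Minecraft action."""

    def __init__(self, llm: LLM):
        self.llm = llm

    def plan_action(self, instruction: str) -> str:
        return self.llm(f"Instruction: {instruction}\nMinecraft action:")


def dyad_step(teaching: TeachingAgent, game: GameAgent, transcript: str):
    """One turn: the Teaching Agent decides, the Game Agent acts on it."""
    instruction = teaching.next_instruction(transcript)
    action = game.plan_action(instruction)
    return instruction, action


# Stub models so the sketch runs without any real LLM backend.
teaching = TeachingAgent(lambda prompt: "ask the student to request a torch")
game = GameAgent(lambda prompt: "give_item peer_bot torch")
instruction, action = dyad_step(teaching, game, "I need light")
```

Keeping the two roles behind separate interfaces means each prompt stays short and focused, which fits the stated goal of fast, cheap text-only reasoning.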
Voice Bridge
ASR + TTS Sidecars
Streaming ASR via faster-whisper for student speech. MOSS-TTS-Nano for multi-voice output — distinct voice per speaker (captain, peer bots). Audio runs in sidecar processes, separate from the LLM reasoning.
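Keeping a distinct voice per speaker amounts to a stable mapping from speaker to voice. The registry below is a hypothetical sketch; the voice names are placeholders and it does not call MOSS-TTS-Nano or assume anything about its API.

```python
class VoiceRegistry:
    """Assigns each speaker a voice once and reuses it afterwards."""

    def __init__(self, voices):
        self._free = list(voices)
        self._assigned = {}

    def voice_for(self, speaker):
        if speaker not in self._assigned:
            if not self._free:
                raise RuntimeError("no unused voices left")
            # Take the next unused voice so no two speakers share one.
            self._assigned[speaker] = self._free.pop(0)
        return self._assigned[speaker]


registry = VoiceRegistry(["voice_1", "voice_2", "voice_3"])
captain_voice = registry.voice_for("captain")
bot_voice = registry.voice_for("peer_bot_1")
captain_again = registry.voice_for("captain")
```

The TTS sidecar would then be asked to synthesize each line with the voice looked up for its speaker.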
Demos
Papers & Reports
Technical reports and papers. All work is open access with accompanying code.