Momo
A Voice-First Holographic AI Command Center
A command center, not a chatbot. Voice in, action out.
Momo is a voice-first holographic AI command center — a new interface for delegating work to AI agents. You speak. It routes. Specialist agents execute. Results appear across holographic displays, browser-frame surfaces, and Minecraft integrations. No typing, no clicking, no context switching.
Built on Hermes Agent for operational task routing and Claude for complex reasoning, Momo connects to real business tools — Gmail, Meta Ads, Google Workspace — for live monitoring and control. The cinema-scale holographic display creates a theatre-like presence: dark room, immersive audio, agents that feel like they're in the room with you.
Features
Voice-First
Always-On
Streaming speech-to-speech with voice activity detection. Always-on microphone — just start talking. Sub-300ms latency from speech to response. Turn-taking, interruption handling, and prosody-aware output.
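To make the turn-taking and interruption handling concrete, here is a minimal sketch of one possible policy. It is not Momo's actual implementation: the `Turn` states, the `next_turn` function, and the rule that user speech always cuts off the agent (barge-in) are all assumptions made for illustration.

```python
from enum import Enum, auto


class Turn(Enum):
    LISTENING = auto()  # nobody is speaking
    USER = auto()       # the user is speaking
    AGENT = auto()      # the agent is speaking


def next_turn(state: Turn, user_speaking: bool, agent_has_reply: bool) -> Turn:
    """Return the turn state for the next tick of the voice loop.

    Assumed policy: user speech always wins, so an agent reply is
    interrupted as soon as the user starts talking.
    """
    if user_speaking:
        return Turn.USER
    if state is Turn.USER:
        # The user just stopped: hand over if a reply is ready.
        return Turn.AGENT if agent_has_reply else Turn.LISTENING
    if state is Turn.AGENT and agent_has_reply:
        return Turn.AGENT  # keep speaking until the reply is finished
    return Turn.LISTENING
```

Under this policy, `next_turn(Turn.AGENT, True, True)` returns `Turn.USER`: the agent yields the moment speech is detected.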
Delegation Engine
Multi-Agent
Routes tasks to specialist agents based on domain. Hermes Agent handles operational tasks. Claude handles complex reasoning. Village pattern for concurrent multi-agent spaces. One voice, many agents.
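Domain-based routing can be illustrated with a small keyword router. This is only a sketch: the keyword sets, the `Route` type, and the `"hermes"` and `"claude"` labels are hypothetical, and a production router would more likely use a classifier or a language model than keyword matching.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword sets, chosen only for this example.
OPERATIONAL_KEYWORDS = {"email", "inbox", "calendar", "campaign", "ads"}
REASONING_KEYWORDS = {"plan", "analyze", "compare", "explain", "strategy"}


@dataclass
class Route:
    agent: str   # "hermes" or "claude" (labels invented for this sketch)
    reason: str


def route(utterance: str) -> Route:
    """Pick an agent for a transcribed utterance by keyword overlap."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    # Reasoning keywords take priority: a request that mentions both
    # a tool and an analysis verb goes to the reasoning agent.
    if words & REASONING_KEYWORDS:
        return Route("claude", "complex reasoning")
    if words & OPERATIONAL_KEYWORDS:
        return Route("hermes", "operational task")
    return Route("hermes", "default")
```

With these assumed keywords, "Check my inbox" goes to the operational agent, while "Compare last week's campaign results" goes to the reasoning agent.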
Live Surfaces
Hologram + Screen + Minecraft
Holographic display for theatre-like presence. Browser-frame pipeline for web-based surfaces. Minecraft integration for embodied agent interaction. Agents render visual output alongside voice.
Business Tools
Live Monitoring
Connects to Gmail, Meta Ads, and Google Workspace for real-time monitoring and control. Read emails, check campaign performance, manage calendars — all by voice, with live visual feedback.
Architecture
Voice input → Hermes Agent routing → Specialist agents → Spatial output. MOSS-TTS-Nano for voice cloning. Cinema-scale display with dark room immersion.
Voice Input
Streaming ASR + VAD
Always-on microphone with voice activity detection. Streaming automatic speech recognition. Sub-300ms pipeline from spoken word to tokenized input for the agent router.
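As an illustration of voice activity detection on an always-on microphone, the sketch below flags audio frames as speech using a simple energy threshold. The page does not say which VAD technique Momo uses, so this energy-based approach is a stand-in, and the `threshold` and `hangover` values are arbitrary assumptions.

```python
import math
from typing import Iterable, List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.02,  # assumed level, not a measured value
    hangover: int = 2,        # quiet frames still counted as speech
) -> List[bool]:
    """Return one speech/non-speech flag per frame.

    A frame is speech if it is loud, or if it falls within `hangover`
    frames after the last loud frame, which avoids clipping word endings.
    """
    flags: List[bool] = []
    quiet = hangover + 1  # start in silence
    for frame in frames:
        quiet = 0 if rms(frame) >= threshold else quiet + 1
        flags.append(quiet <= hangover)
    return flags
```

Only the frames flagged `True` would need to be forwarded to streaming speech recognition, which keeps the recognizer idle during silence.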
Agent Routing
Hermes Agent + Claude
Hermes Agent routes operational tasks to specialist agents. Claude SDK handles complex reasoning and multi-step workflows. Village pattern enables concurrent multi-agent spaces.
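The page does not define the Village pattern in detail. Assuming it means several specialist agents working at the same time in a shared space, the concurrency can be sketched with `asyncio`. The agent names, the `run_agent` stub, and its fixed delay are hypothetical placeholders for real agent calls.

```python
import asyncio
from typing import Dict, List


async def run_agent(name: str, task: str, delay: float) -> str:
    """Stand-in for a real agent call; the sleep simulates work."""
    await asyncio.sleep(delay)
    return f"{name} finished: {task}"


async def run_village(assignments: Dict[str, str]) -> List[str]:
    """Run every agent concurrently and collect results in input order."""
    jobs = [run_agent(name, task, 0.01) for name, task in assignments.items()]
    return list(await asyncio.gather(*jobs))


if __name__ == "__main__":
    results = asyncio.run(
        run_village({"mail": "triage inbox", "ads": "check spend"})
    )
    for line in results:
        print(line)
```

Because `asyncio.gather` awaits all jobs together, total wall time is roughly that of the slowest agent rather than the sum of all of them.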
Output Layer
Spatial TTS + Display
MOSS-TTS-Nano for zero-shot voice cloning. 48kHz stereo spatial audio. Browser-frame pipeline for holographic surfaces. Cinema-scale display with dark room immersion.
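One simple way to place a voice in a stereo field is constant-power panning, sketched below at the 48kHz sample rate mentioned above. This is an assumption for illustration only: the actual spatial audio pipeline may use a different or richer technique, and the function names here are hypothetical.

```python
import math
from typing import List, Tuple

SAMPLE_RATE = 48_000  # 48kHz, as stated in the text


def pan_gains(pan: float) -> Tuple[float, float]:
    """Constant-power (left, right) gains for pan in [-1.0, 1.0].

    -1.0 is hard left, 0.0 is center, 1.0 is hard right.
    """
    angle = (pan + 1.0) * math.pi / 4.0  # maps pan to [0, pi/2]
    return math.cos(angle), math.sin(angle)


def spatialize(mono: List[float], pan: float) -> List[Tuple[float, float]]:
    """Turn mono samples into (left, right) stereo pairs."""
    left, right = pan_gains(pan)
    return [(s * left, s * right) for s in mono]


def sine(freq_hz: float, seconds: float) -> List[float]:
    """Generate a mono test tone at SAMPLE_RATE."""
    n = int(round(SAMPLE_RATE * seconds))
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]
```

The cosine and sine gains keep the summed power of the two channels constant, so a voice does not get louder or quieter as it moves across the room.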
Demos
Papers & Reports
Technical reports and papers. All work is open access with accompanying code.