Voice Interface Hologram Command Center Delegation

Momo

A Voice-First Holographic AI Command Center

A command center, not a chatbot. Voice in, action out.

Live Demos ↗ GitHub ↗ ← Back to Omo

Voice Latency

<300ms

Backend

Hermes Agent

Surfaces

Hologram · Screen · Minecraft

Delegation

Multi-agent

Momo is a voice-first holographic AI command center — a new interface for delegating work to AI agents. You speak. It routes. Specialist agents execute. Results appear across holographic displays, browser-frame surfaces, and Minecraft integrations. No typing, no clicking, no context switching.

Built on Hermes Agent for operational task routing and Claude for complex reasoning, Momo connects to real business tools — Gmail, Meta Ads, Google Workspace — for live monitoring and control. The cinema-scale holographic display creates a theatre-like presence: dark room, immersive audio, agents that feel like they're in the room with you.

Features

Voice-First

Always-On

Streaming speech-to-speech with voice activity detection. Always-on microphone — just start talking. Sub-300ms latency from speech to response. Turn-taking, interruption handling, and prosody-aware output.

Delegation Engine

Multi-Agent

Routes tasks to specialist agents based on domain. Hermes Agent handles operational ops. Claude handles complex reasoning. Village pattern for concurrent multi-agent spaces. One voice, many agents.

Live Surfaces

Hologram + Screen + Minecraft

Holographic display for theatre-like presence. Browser-frame pipeline for web-based surfaces. Minecraft integration for embodied agent interaction. Agents render visual output alongside voice.

Business Tools

Live Monitoring

Connects to Gmail, Meta Ads, and Google Workspace for real-time monitoring and control. Read emails, check campaign performance, manage calendars — all by voice, with live visual feedback.

Architecture

Voice input → Hermes Agent routing → Specialist agents → Spatial output. MOSS-TTS-Nano for voice cloning. Cinema-scale display with dark room immersion.

Voice Input

Streaming ASR + VAD

Always-on microphone with voice activity detection. Streaming automatic speech recognition. Sub-300ms pipeline from spoken word to tokenized input for the agent router.

Streaming ASR VAD <300ms

Agent Routing

Hermes Agent + Claude

Hermes Agent routes operational tasks to specialist agents. Claude SDK handles complex reasoning and multi-step workflows. Village pattern enables concurrent multi-agent spaces.

Hermes Agent Claude SDK Tool Use

Output Layer

Spatial TTS + Display

MOSS-TTS-Nano for zero-shot voice cloning. 48kHz stereo spatial audio. Browser-frame pipeline for holographic surfaces. Cinema-scale display with dark room immersion.

MOSS-TTS-Nano Voice Cloning 48kHz Stereo

<300ms latency Hermes Agent Multi-surface Open Source

Demos

Papers & Reports

Technical reports and papers. All work is open access with accompanying code.

Momo: A Voice-First Holographic AI Command Center

Technical Report · 2025

PDF ↗

Towards Real-Time Spatial Agent Communication

Preprint · 2025

PDF ↗