Send a WhatsApp voice note. Claude hears it, thinks, and speaks back in under 300ms. No SaaS middleman. No $0.31/min bill. Your server, your audio, your AI.
| # | | Stage | Role | Providers | Latency | Status |
|---|---|---|---|---|---|---|
| 01 | 🎙️ | MIC | Raw audio input via WebSocket stream | client WebSocket | — | LIVE |
| 02 | 👂 | ASR | Speech-to-text transcription | Deepgram · Whisper · AssemblyAI | ~80ms | LIVE |
| 03 | ⚡ | ATTS | Adaptive turn-taking & barge-in detection | built-in | ~5ms | LIVE |
| 04 | 🧠 | LLM | Language model inference & streaming | OpenAI · Claude · Ollama · OpenRouter | ~120ms | LIVE |
| 05 | 🪣 | TAB | Temporal alignment buffer — smooth token rate | built-in | ~10ms | LIVE |
| 06 | 📡 | AQAL | Adaptive audio quality & codec selection | built-in | ~5ms | LIVE |
| 07 | 🔊 | TTS | Text-to-speech synthesis | OpenAI · ElevenLabs · Azure | ~60ms | LIVE |
| 08 | 🎧 | OUT | PCM-16 audio stream back to caller | client WebSocket | — | LIVE |
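The "under 300ms" figure can be checked against the per-stage numbers in the table. The sketch below simply adds them up, assuming the stages run strictly back to back and that the listed values are nominal rather than measured; the names `STAGE_LATENCY_MS`, `total_latency_ms`, and `headroom_ms` are illustrative and not part of OmniVoice.

```python
# Per-stage latencies copied from the pipeline table (milliseconds).
# MIC and OUT list no latency in the table, so they are left out of the sum.
STAGE_LATENCY_MS = {
    "ASR": 80,
    "ATTS": 5,
    "LLM": 120,
    "TAB": 10,
    "AQAL": 5,
    "TTS": 60,
}

BUDGET_MS = 300  # the "under 300ms" target stated in the introduction


def total_latency_ms(stages: dict[str, int]) -> int:
    """Sum the nominal per-stage latencies, assuming no overlap between stages."""
    return sum(stages.values())


def headroom_ms(stages: dict[str, int], budget: int = BUDGET_MS) -> int:
    """Milliseconds left in the budget; a negative value means over budget."""
    return budget - total_latency_ms(stages)


if __name__ == "__main__":
    print(total_latency_ms(STAGE_LATENCY_MS))  # 280
    print(headroom_ms(STAGE_LATENCY_MS))       # 20
```

With the table's numbers the pipeline totals 280ms, which leaves about 20ms of headroom. Swapping in a slower provider at any stage eats into that margin directly.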
Clone, set two env vars, start. Your first voice session is one WebSocket connection away.
    # 1. clone & install
    git clone https://github.com/yourname/omnivoice
    cd omnivoice && pip install -r requirements.txt

    # 2. guided setup: picks your providers
    python setup_env.py

    # choose your LLM (pick one):
    #   LLM_PROVIDER=anthropic  → Claude (ANTHROPIC_API_KEY)
    #   LLM_PROVIDER=openai     → GPT-4o (OPENAI_API_KEY)
    #   LLM_PROVIDER=ollama     → Llama3 (no key needed)

    # ASR & TTS (both have free options):
    #   ASR_PROVIDER=whisper    → local, free, no key
    #   TTS_PROVIDER=edge       → free Microsoft neural voices

    # 3. start everything
    python start.py
    # ✓ Tunnel active: https://xxx.trycloudflare.com
    # ✓ Twilio webhook updated automatically
    # ✓ OmniVoice ready: send a WhatsApp voice note
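Each LLM provider in the setup step has its own key requirement. As a rough illustration, a pre-flight check along the following lines could catch a missing key before startup. This is a sketch only: `REQUIRED_KEY` and `check_llm_config` are hypothetical names, and nothing here describes how `setup_env.py` or `start.py` actually validate configuration.

```python
import os
from typing import Mapping, Optional

# Provider -> API key variable, as listed in the setup step.
# Ollama runs locally and needs no key, hence None.
REQUIRED_KEY: dict[str, Optional[str]] = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "ollama": None,
}


def check_llm_config(env: Mapping[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    provider = env.get("LLM_PROVIDER", "")
    if provider not in REQUIRED_KEY:
        return [f"LLM_PROVIDER must be one of {sorted(REQUIRED_KEY)}, got {provider!r}"]
    key = REQUIRED_KEY[provider]
    if key is not None and not env.get(key):
        return [f"{key} is required when LLM_PROVIDER={provider}"]
    return []


if __name__ == "__main__":
    # Check the real process environment and report anything missing.
    for problem in check_llm_config(os.environ):
        print("config error:", problem)
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the check easy to test with plain dictionaries.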
// OmniVoice is the voice layer. You pick the brain. One env var to swap.
// Hosted voice platforms are great until they're not. Here's what you give up.
| Feature | OmniVoice | Vapi | Retell AI | Bland AI |
|---|---|---|---|---|
| Pricing | Free (self-hosted) | $0.13–0.31/min | $0.07/min | Per API call |
| Self-hostable | YES | NO | NO | NO |
| Audio on your server | YES | NO | NO | NO |
| Provider-agnostic | FULL | PARTIAL | PARTIAL | NO |
| WhatsApp + Phone | YES | Phone only | Phone only | Phone only |
| Open source | YES (MIT) | NO | NO | NO |
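To put the per-minute prices in the table into monthly terms, the sketch below multiplies them by a call volume. The 10,000 minutes per month is a made-up figure for illustration, and the calculation covers only the platform fees listed above. It does not model any provider API or server costs that a self-hosted deployment may still incur.

```python
# Per-minute platform prices from the comparison table (USD).
# Vapi is listed as a range, so both ends are included.
PRICE_PER_MIN = {
    "Vapi (low)": 0.13,
    "Vapi (high)": 0.31,
    "Retell AI": 0.07,
}


def monthly_cost(minutes: int, price_per_min: float) -> float:
    """Platform fee for a month of usage, rounded to whole cents."""
    return round(minutes * price_per_min, 2)


if __name__ == "__main__":
    minutes = 10_000  # hypothetical monthly call volume
    for name, price in PRICE_PER_MIN.items():
        print(f"{name}: ${monthly_cost(minutes, price):,.2f}")
    # Vapi (low): $1,300.00
    # Vapi (high): $3,100.00
    # Retell AI: $700.00
```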
2 billion people already send voice notes. Point OmniVoice at Claude or GPT-4o, connect your Twilio number, and they're talking to your AI — no app download, no new interface, no $0.31/min bill.