Building an AI Voice Receptionist: A Practical RAG + Telephony Case Study
The solution, named Axle, is a custom-built voice agent that answers the phone, knows exact prices, hours, and policies, and can collect callback requests when it doesn't know something.
From Missed Calls to AI-Powered Answers
A developer has documented a detailed case study of building an AI voice receptionist for their brother's luxury mechanic shop. The problem was straightforward but costly: hundreds of missed calls per week, representing thousands of dollars in lost revenue — from $450 brake services to $2,000 engine repairs.
The solution, named Axle, is a custom-built voice agent that answers the phone, knows exact prices, hours, and policies, and can collect callback requests when it doesn't know something.
Architecture: Three-Layer Build
Layer 1: RAG Pipeline (The Brain)
The system uses Retrieval-Augmented Generation to prevent hallucinations:
- Knowledge base: 21+ documents covering every service type, pricing, turnaround times, hours, payment methods, cancellation policies, warranty info, and loaner vehicles
- Vector storage: MongoDB Atlas with Voyage AI embeddings (voyage-3-large, 1024 dimensions)
- Retrieval: Atlas Vector Search returns top 3 most semantically similar documents
- Generation: Anthropic Claude (claude-sonnet-4-6) with strict system prompt — answer only from the knowledge base, no hallucinations allowed
Layer 2: Telephony Integration
Built using Vapi as the voice platform:
- Handles phone number provisioning, STT (Deepgram), and TTS (ElevenLabs)
- FastAPI webhook server routes queries through the RAG pipeline
- Ngrok for development tunneling; cloud hosting for production
- Full conversation history passed with each request for coherent multi-turn dialogue
- Every interaction logged to MongoDB for analytics
Layer 3: Voice Tuning
The most time-consuming part of the build:
- Tested ~20 AI voices before selecting "Christopher" — calm, natural, unhurried
- Rewrote system prompts specifically for voice delivery (short sentences, no markdown, natural price formatting)
- Responses capped at 2–4 sentences maximum
- Escalation flow: unknown questions → collect name and number → save to callbacks collection
Key Takeaways
- RAG is essential for any domain where accuracy matters (prices, policies, technical specs)
- Voice is different from text — great text responses sound terrible when spoken
- Start with ngrok for rapid development iteration before moving to production
- Log everything — call data becomes a business intelligence asset
- Design for escalation — the AI should never guess when it doesn't know
← Previous: An Incoherent Rust: How Coherence Rules Stifle Ecosystem InnovationNext: Mozilla AI Launches cq: A Stack Overflow for AI Coding Agents →
0