Building an AI Voice Receptionist: A Practical RAG + Telephony Case Study

2026-03-24 · 2 min read

From Missed Calls to AI-Powered Answers

A developer has documented a detailed case study of building an AI voice receptionist for their brother's luxury mechanic shop. The problem was straightforward but costly: hundreds of missed calls per week, representing thousands of dollars in lost revenue on jobs ranging from $450 brake services to $2,000 engine repairs.

The solution, named Axle, is a custom-built voice agent that answers the phone, knows exact prices, hours, and policies, and can collect callback requests when it doesn't know something.

Architecture: Three-Layer Build

Layer 1: RAG Pipeline (The Brain)

The system uses Retrieval-Augmented Generation (RAG) to ground every answer in the shop's own documents and limit hallucinations:

Knowledge base: 21+ documents covering every service type, pricing, turnaround times, hours, payment methods, cancellation policies, warranty info, and loaner vehicles
Vector storage: MongoDB Atlas with Voyage AI embeddings (voyage-3-large, 1024 dimensions)
Retrieval: Atlas Vector Search returns the top 3 most semantically similar documents
Generation: Anthropic Claude (claude-sonnet-4-6) with a strict system prompt: answer only from the knowledge base, and never guess
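
The retrieval and grounding steps above can be sketched roughly as follows. This is a minimal illustration, not the author's code: the index name, the embedding field name, the `numCandidates` multiplier, and the prompt wording are all assumptions. Only the `$vectorSearch` stage fields and the top-3 limit come from Atlas Vector Search and the case study.

```python
from typing import Any

# Hypothetical system prompt wording; the case study only says the prompt
# restricts answers to the knowledge base.
SYSTEM_PROMPT = (
    "You are Axle, the phone receptionist for an auto repair shop. "
    "Answer only from the context provided. If the answer is not in the "
    "context, say you do not know and offer to arrange a callback."
)


def build_vector_search_pipeline(
    query_vector: list[float],
    index_name: str = "kb_vector_index",  # assumed index name
    path: str = "embedding",              # assumed field holding the vectors
    top_k: int = 3,                       # the case study retrieves the top 3
) -> list[dict[str, Any]]:
    """Build an Atlas aggregation pipeline returning the top_k nearest documents."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": top_k * 20,  # assumed oversampling factor
                "limit": top_k,
            }
        },
        {
            "$project": {
                "_id": 0,
                "text": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


def build_grounded_prompt(question: str, docs: list[dict[str, Any]]) -> str:
    """Place the retrieved documents ahead of the caller's question."""
    context = "\n\n".join(
        f"[{i}] {doc['text']}" for i, doc in enumerate(docs, start=1)
    )
    return f"Context:\n{context}\n\nCaller question: {question}"
```

In a real deployment the pipeline would presumably be passed to a PyMongo `collection.aggregate(...)` call, with the query vector produced by the same 1024-dimension embedding model used to index the knowledge base, and the resulting prompt sent to the model alongside `SYSTEM_PROMPT`.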

Layer 2: Telephony Integration

Built using Vapi as the voice platform:

Handles phone number provisioning, STT (Deepgram), and TTS (ElevenLabs)
FastAPI webhook server routes queries through the RAG pipeline
Ngrok for development tunneling; cloud hosting for production
Full conversation history passed with each request for coherent multi-turn dialogue
Every interaction logged to MongoDB for analytics
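
To illustrate how full conversation history and per-turn logging might fit together, here is a framework-agnostic sketch of a single webhook turn. The payload shape (`call_id`, `messages`) is hypothetical and is not Vapi's actual webhook schema; `answer_fn` stands in for the RAG pipeline, and a plain list stands in for the MongoDB log collection.

```python
from datetime import datetime, timezone
from typing import Callable

Message = dict[str, str]


def handle_turn(
    payload: dict,
    answer_fn: Callable[[list[Message]], str],
    log: list[dict],
) -> dict:
    """Answer the latest caller message using the whole conversation so far."""
    # Keep only caller and assistant turns; the payload field names are assumed.
    history: list[Message] = [
        {"role": m["role"], "content": m["content"]}
        for m in payload.get("messages", [])
        if m.get("role") in ("user", "assistant") and m.get("content")
    ]

    # The full history is passed on every request so follow-up questions
    # ("and how long does that take?") can be resolved against earlier turns.
    reply = answer_fn(history)

    # Record every interaction for later analytics.
    log.append(
        {
            "call_id": payload.get("call_id"),
            "turns": len(history),
            "question": history[-1]["content"] if history else None,
            "reply": reply,
            "at": datetime.now(timezone.utc).isoformat(),
        }
    )
    return {"reply": reply}
```

A function like this could be called from a FastAPI route that parses the incoming JSON body, which keeps the conversation logic testable without a running server or a phone call.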

Layer 3: Voice Tuning

The most time-consuming part of the build:

Tested about 20 AI voices before selecting "Christopher" for its calm, natural, unhurried delivery
Rewrote system prompts specifically for voice delivery (short sentences, no markdown, natural price formatting)
Responses capped at 2–4 sentences
Escalation flow: unknown questions → collect name and number → save to callbacks collection
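
The case study handles voice delivery through the system prompt. As a complementary safety net, a small post-processor could enforce the same rules on whatever the model returns. The sketch below is an assumption rather than part of the described build: it strips common markdown characters, rewrites dollar amounts into a speakable form, and keeps at most four sentences.

```python
import re


def speakable_price(match: re.Match) -> str:
    """Turn a matched amount such as $2,000 or $19.99 into spoken-style text."""
    dollars = int(match.group(1).replace(",", ""))
    cents = match.group(2)
    text = f"{dollars} dollars"
    if cents and cents != "00":
        text += f" and {int(cents)} cents"
    return text


def to_voice_text(text: str, max_sentences: int = 4) -> str:
    """Make a model reply safer to hand to a text-to-speech engine."""
    # Remove markdown emphasis, heading, and code markers.
    text = re.sub(r"[*_`#]+", "", text)
    # Remove leading list bullets.
    text = re.sub(r"^\s*[-•]\s+", "", text, flags=re.MULTILINE)
    # Rewrite prices before sentence splitting so "$19.99" is not cut in two.
    text = re.sub(r"\$(\d+(?:,\d{3})*)(?:\.(\d{2}))?", speakable_price, text)
    # Collapse whitespace, then keep only the first few sentences.
    text = re.sub(r"\s+", " ", text).strip()
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(sentences[:max_sentences])


reply = (
    "**Brake service** is $450. Engine repair starts at $2,000. "
    "We open at 8. We close at 6. Ask about loaners."
)
print(to_voice_text(reply))
# Brake service is 450 dollars. Engine repair starts at 2000 dollars. We open at 8. We close at 6.
```

Truncating by sentence count is a blunt backstop; getting the prompt to produce short, spoken-style answers in the first place, as the author did, is likely to sound more natural.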

Key Takeaways

RAG is essential for any domain where accuracy matters (prices, policies, technical specs)
Voice is different from text — great text responses sound terrible when spoken
Start with ngrok for rapid development iteration before moving to production
Log everything — call data becomes a business intelligence asset
Design for escalation — the AI should never guess when it doesn't know
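
The escalation takeaway can be made concrete with a small sketch of the callback capture step. Everything here is illustrative: the field names, the US-style 10-digit phone check, and the in-memory list standing in for the callbacks collection are assumptions, not details from the case study.

```python
import re
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class CallbackRequest:
    name: str
    phone: str
    question: str
    created_at: str


def normalize_phone(raw: str) -> Optional[str]:
    """Return a 10-digit number, or None if the input does not look valid.

    Assumes US-style numbers; other regions would need different rules.
    """
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code
    return digits if len(digits) == 10 else None


def save_callback(
    name: str, raw_phone: str, question: str, callbacks: list[dict]
) -> bool:
    """Store a callback request; return False so the agent can ask again."""
    phone = normalize_phone(raw_phone)
    if not name.strip() or phone is None:
        return False
    request = CallbackRequest(
        name=name.strip(),
        phone=phone,
        question=question,
        created_at=datetime.now(timezone.utc).isoformat(),
    )
    callbacks.append(asdict(request))
    return True
```

Returning a boolean instead of raising lets the voice agent re-prompt the caller when a number was misheard, which is a common failure mode with speech-to-text input.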
