SupportVerdict

The true cost of an AI voice minute

When a vendor advertises a voice agent at $0.05 per minute, they're quoting the platform fee alone. The real cost—platform plus speech-to-text, language model, text-to-speech, and telephony—typically runs $0.15 to $0.30 per minute for US calls.

Last updated June 2026

The voice agent market thrives on fragmented pricing. One vendor lists $0.05/min, another $0.10/min, a third offers "flat-fee unlimited," and none of them quote the same thing. To make a real buying decision, you need to decode what gets billed at each layer and spot the hidden multipliers that blow your budget.

The five-layer stack

Every inbound or outbound AI voice call incurs charges across five distinct infrastructure components. Not every vendor breaks them down, but they all pay for them—and pass the cost to you:

  • Platform fee: The agent orchestration layer (Vapi $0.05/min, Retell $0.07/min, Synthflow $0.09/min, Cartesia $0.06/min). This is what gets advertised.
  • Speech-to-text (STT): Converting caller audio to text. Range: $0.005–$0.055/min depending on provider (Deepgram, Whisper API, native). Quiet audio or heavy accents can trigger retries, and each retry may be billed again.
  • Language model (LLM): The reasoning engine, billed on tokens. Per minute of conversation this typically runs ~$0.02–$0.16, with a smaller/cheaper model at the low end and a frontier model at the high end. Longer calls, multi-turn reasoning, or tool use (database lookups) push this up — it's the most variable layer in the stack.
  • Text-to-speech (TTS): Turning the agent's response into audio. Premium voices (ElevenLabs) run ~$0.03–$0.05/min; native or bulk (Google, Azure) ~$0.015–$0.03/min.
  • Telephony: PSTN inbound/outbound termination. US: ~$0.014–$0.02/min. International calls 5–10× higher (Mexico ~$0.10/min, India ~$0.12/min).
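To see how wide the stack can swing, here is a minimal Python sketch that sums the low and high end of each layer listed above. The dictionary and function names are illustrative, the platform range simply spans the four vendors named, and the telephony figure assumes US calls only.

```python
# Per-minute price ranges (USD) taken from the layer list above.
# Telephony uses the US range only; names here are illustrative.
LAYER_RANGES = {
    "platform": (0.05, 0.09),
    "stt": (0.005, 0.055),
    "llm": (0.02, 0.16),
    "tts": (0.015, 0.05),
    "telephony_us": (0.014, 0.02),
}


def stack_range(layers):
    """Return (cheapest, priciest) all-in cost per minute across all layers."""
    low = sum(lo for lo, _ in layers.values())
    high = sum(hi for _, hi in layers.values())
    return round(low, 3), round(high, 3)


low, high = stack_range(LAYER_RANGES)
print(f"All-in range: ${low:.3f}-${high:.3f}/min")
```

The extremes come out wider than the typical $0.15–$0.30/min band because few real stacks pick the cheapest or the priciest option at every layer at once.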

A real worked example

Let's build a typical 5-minute inbound customer service call on a standard stack:

  • Platform: Retell ($0.07/min) × 5 = $0.35
  • STT: Deepgram ($0.03/min, standard) × 5 = $0.15
  • LLM: GPT-4o mini (~$0.04/min) × 5 = $0.20
  • TTS: ElevenLabs ($0.03/min) × 5 = $0.15
  • Telephony: US carrier ($0.018/min) × 5 = $0.09

Total: $0.94 for a 5-minute call, or $0.188/min. That's about 2.7× the advertised "$0.07" platform fee. In a 1,000-minute month (about 200 five-minute calls, or roughly 7–10 per working day), you're looking at $150–$200 before taxes or overages, not the "$70" the platform fee alone suggests.
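The same arithmetic as a short Python sketch, so you can swap in your own rates. The rates are the ones from the example stack; the variable names are illustrative.

```python
# Per-minute rates (USD) from the example stack above.
RATES_PER_MIN = {
    "platform": 0.07,   # Retell
    "stt": 0.03,        # Deepgram, standard
    "llm": 0.04,        # GPT-4o mini, approximate
    "tts": 0.03,        # ElevenLabs
    "telephony": 0.018, # US carrier
}
CALL_MINUTES = 5

per_minute = sum(RATES_PER_MIN.values())
call_total = per_minute * CALL_MINUTES
# How many times larger the all-in rate is than the advertised platform fee.
multiple_of_platform_fee = per_minute / RATES_PER_MIN["platform"]
monthly_1000_min = per_minute * 1000

print(f"All-in per minute: ${per_minute:.3f}")
print(f"5-minute call:     ${call_total:.2f}")
print(f"Multiple of platform fee: {multiple_of_platform_fee:.1f}x")
print(f"1,000-minute month: ${monthly_1000_min:.0f}")
```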

Hidden multipliers that blow the budget

Even that $0.188/min estimate omits common surprises:

  • Ring time and dead air: Vapi, Retell, Synthflow, and others start the meter when the session begins (often at dial), not when the agent speaks. A 30-second ring + 10-second silence before the agent responds burns 40 seconds of your bill. Air AI explicitly bills ring time. Some vendors don't disclose this until you read the fine print or call sales.
  • Idle websocket metering: Deepgram (and a few others) charge for idle websocket connections. If the caller goes quiet for 10 seconds, you may still be billed. Always ask the vendor: "Do you bill idle or silence?"
  • International multiplier: Telephony that costs $0.02/min on a US call can jump to $0.10–$0.20/min for a destination such as Mexico, depending on country and carrier. STT and TTS pricing can also shift by language or dialect. If you support cross-border customers, ask for a mixed-destination estimate.
  • Add-on creep: Advanced features (custom voices, priority inference, analytics, call recording, transfer to human) often layer on 10–30% fees on top of the base stack. Recording a call may add $0.01–$0.03/min.
  • Monthly minimums and seat fees: Some vendors build minimums into the plan: a platform tier might require a $99/mo minimum (Kustomer voice, 8-seat minimum) even if you use only 50 minutes. Bland avoids this; most others don't.
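A rough way to size the ring-time and add-on effects is sketched below in Python. It is an estimate under stated assumptions, not any vendor's billing logic: the function name and parameters are hypothetical, and by default overhead seconds are charged at the full all-in rate, which is a worst case (pass a lower `overhead_rate_per_min`, such as the platform fee alone, if only the platform meters ring time).

```python
def billed_call_cost(talk_seconds, rate_per_min, overhead_seconds=0.0,
                     overhead_rate_per_min=None, addon_pct=0.0):
    """Estimate one call's cost including ring/dead-air time and add-on fees.

    Hypothetical helper: overhead seconds default to the full all-in rate
    (worst case); addon_pct is a fraction, e.g. 0.10 for a 10% surcharge.
    """
    if overhead_rate_per_min is None:
        overhead_rate_per_min = rate_per_min
    base = talk_seconds / 60 * rate_per_min
    overhead = overhead_seconds / 60 * overhead_rate_per_min
    return (base + overhead) * (1 + addon_pct)


# 5 minutes of talk at $0.188/min, plus 30 s ring + 10 s silence, plus 10% add-ons.
cost = billed_call_cost(talk_seconds=300, rate_per_min=0.188,
                        overhead_seconds=40, addon_pct=0.10)
effective_per_talk_minute = cost / 5

print(f"Billed cost: ${cost:.2f}")
print(f"Effective rate per minute of conversation: ${effective_per_talk_minute:.3f}")
```

Under these assumptions the effective rate lands around $0.23 per minute of actual conversation, up from the $0.188 baseline.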

Two pricing philosophies

The market splits into two camps: unbundled passthrough and all-inclusive.

Unbundled (Vapi, Retell, Synthflow, Cartesia): You choose your own STT, LLM, and TTS providers. Transparency in theory, complexity in practice. You must configure each, tune costs (e.g., Groq LLM vs. OpenAI), and watch three invoices. Billing opacity is common; vendors don't always say whether idle time or ring time is metered.

All-inclusive (Bland): One price, all costs folded in. Bland advertises $0.11–$0.14/min as the true all-in fee—no hidden layers. The trade-off: less choice of STT/LLM/TTS, but less operational overhead and no surprise invoices. For SMBs, this simplicity often wins.

Done-for-you receptionists and the hidden stack

Some vendors (Goodcall, Smith.ai, Thoughtly, Ruby) sell "unlimited minutes" for a flat monthly fee ($79–$249/mo). Internally, they're still paying for all five layers plus labor for escalations. The flat fee smooths unpredictability for SMBs but masks real cost.

Example: Goodcall's $249/mo "unlimited" plan. If you take 1,500 minutes/month, that's $0.166/min all-in. If you take 300 minutes, it's $0.83/min. The vendor absorbs variance; you get predictability. For a 10–50 person support team, this often beats metered and avoids the overhead of multi-layer tuning.
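The flat-fee math is simple enough to sketch in a few lines of Python. The function names are illustrative, and the $0.188/min metered rate used for the break-even is the worked example's figure from earlier, not a vendor quote.

```python
def flat_fee_per_minute(monthly_fee, minutes_used):
    """Effective per-minute cost of a flat monthly plan at a given usage."""
    return monthly_fee / minutes_used


def break_even_minutes(monthly_fee, metered_rate_per_min):
    """Monthly minutes at which a flat plan and a metered plan cost the same."""
    return monthly_fee / metered_rate_per_min


heavy = flat_fee_per_minute(249, 1500)
light = flat_fee_per_minute(249, 300)
# Assumption: compare against the $0.188/min all-in rate from the worked example.
threshold = break_even_minutes(249, 0.188)

print(f"1,500 min/month: ${heavy:.3f}/min")
print(f"300 min/month:   ${light:.2f}/min")
print(f"Break-even vs $0.188/min metered: {threshold:.0f} minutes")
```

Below the break-even volume the metered stack is cheaper; above it the flat fee is.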

How to calculate your real cost

Start with our voice agent cost calculator to plug in your own volumes and vendor assumptions. Then ask every vendor directly:

  • What is your all-in per-minute cost (platform + STT + LLM + TTS + telephony)?
  • Do you bill for ring time, dead air, or idle silence?
  • What are the STT, LLM, and TTS providers in a standard setup?
  • Are there usage minimums or annual commitments?
  • If I scale to international calls, what's the cost by country?
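Once a vendor answers those questions, the answers plug into a small monthly estimate. The Python sketch below is illustrative only: the non-telephony rate reuses the worked example's platform, STT, LLM, and TTS figures, the telephony rates are this article's estimates, and it assumes STT and TTS pricing stays constant across countries, which may not hold in practice.

```python
# Per-minute rates (USD). The non-telephony figure is the worked example's
# platform + STT + LLM + TTS; telephony figures are the article's estimates.
NON_TELEPHONY_RATE = 0.07 + 0.03 + 0.04 + 0.03
TELEPHONY_RATE = {"US": 0.018, "MX": 0.10, "IN": 0.12}


def monthly_cost(minutes_by_country, monthly_minimum=0.0):
    """Estimate a monthly bill for a destination mix, honoring any minimum.

    Hypothetical helper: assumes only telephony varies by country.
    """
    usage = sum(
        minutes * (NON_TELEPHONY_RATE + TELEPHONY_RATE[country])
        for country, minutes in minutes_by_country.items()
    )
    return max(usage, monthly_minimum)


mixed = monthly_cost({"US": 800, "MX": 200}, monthly_minimum=99.0)
tiny = monthly_cost({"US": 50}, monthly_minimum=99.0)

print(f"800 US + 200 MX minutes: ${mixed:.2f}")
print(f"50 US minutes with a $99 minimum: ${tiny:.2f}")
```

The second case shows why minimums matter at low volume: 50 minutes of usage costs under $10, but the bill is still $99.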

Few vendors answer these questions cleanly, which is itself a reason to be wary. Transparent vendors (Bland, some Retell setups) break it out upfront. Opaque ones (Ada, Sierra, PolyAI, Air AI) publish no pricing at all and require a sales call.

For a side-by-side comparison of vendors and their transparency, see our best voice agents for SMBs and the full vendor breakdown. If you're torn between metered and flat-fee, our AI receptionist cost guide compares the two models end-to-end.

Budget the stack, not the headline

A $0.05/min headline price is a trap. Real voice calls cost $0.15–$0.30/min all-in, depending on your stack choices and call profile. The gap between advertised and actual cost is where vendors bury margin and SMBs lose budget. Insist on all-in numbers before you sign, or choose a vendor (like Bland) that quotes the real total upfront. If you move fast without doing that math, you'll wake up to a bill three times what you expected.

Frequently asked questions

Why does a "$0.05/min" voice agent cost three times more all-in?

The headline price covers only the platform fee. Real costs stack on top of it: speech-to-text (~$0.01–$0.06), LLM inference (~$0.02–$0.16), text-to-speech (~$0.015–$0.05), and US telephony (~$0.014–$0.02). Add those four layers to the platform fee and you typically land at $0.15–$0.30/min. Most vendors list the platform fee alone; Bland is rare in advertising all-in pricing.

Does ring time or dead air count toward my bill?

Depends on the vendor. Vapi, Retell, and others bill from session start, so ringing counts. Air AI explicitly bills ring time. Some providers also meter idle websocket time (Deepgram). Always ask: "Does ringing cost money?" before signing up. Bland includes it; most unbundled platforms don't clarify without a sales call.

Are international calls much more expensive?

Yes. Telephony cost multiplies 5–10× for most international destinations. A US call at $0.02/min becomes $0.10–$0.20/min for Mexico or India. STT and TTS also shift by dialect. If you serve global customers, factor that into your per-minute math or ask the vendor for a mixed-destination estimate.

Is done-for-you (human + AI blend) cheaper or more expensive?

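Flatter. Goodcall and Smith.ai hide the infrastructure cost behind $79–$249/mo unlimited minutes. Whether that beats metered pricing depends on volume: against an all-in metered rate like Bland's ~$0.13/min, a $79/mo plan breaks even near 600 min/mo and a $249/mo plan near 1,900 min/mo. Below break-even, metered is cheaper; above it, the flat fee wins. Human escalations add opaque labor cost in blended offerings.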
