New research: automatically routing AI calls to cheaper models could cut your inference costs without hurting results
If you're running AI agents in production (chatbots that call APIs, document processors, internal tools), you're probably defaulting every call to a big, expensive frontier model. Switchcraft is a new model router built specifically for agentic tool-calling. Instead of sending every request to the most capable (and priciest) model, it evaluates each call and routes it to the cheapest model that can handle it reliably.
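To make "route to the cheapest model that can handle it reliably" concrete, here is a minimal sketch of the general idea. It is not Switchcraft's actual method: the model names, per-call costs, the difficulty heuristic, and the 0.9 reliability threshold are all hypothetical stand-ins, and a real router would replace `estimate_success` with a trained predictor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_call: float  # hypothetical price, arbitrary units
    capability: float     # hypothetical 0..1 score, not a real benchmark


# Hypothetical model tiers, for illustration only.
MODELS = [
    Model("small", 1.0, 0.3),
    Model("medium", 5.0, 0.6),
    Model("frontier", 25.0, 1.0),
]


def estimate_success(model: Model, n_tools: int, n_steps: int) -> float:
    """Toy stand-in for a learned predictor of tool-call success."""
    # Assumption: more candidate tools and more chained steps mean a harder call.
    difficulty = min(1.0, 0.15 * n_tools + 0.1 * n_steps)
    if model.capability >= difficulty:
        return 1.0
    return model.capability / difficulty


def route(n_tools: int, n_steps: int, threshold: float = 0.9) -> Model:
    """Return the cheapest model whose estimated success meets the threshold."""
    for model in sorted(MODELS, key=lambda m: m.cost_per_call):
        if estimate_success(model, n_tools, n_steps) >= threshold:
            return model
    # Nothing clears the bar: fall back to the most capable model.
    return max(MODELS, key=lambda m: m.capability)


if __name__ == "__main__":
    for tools, steps in [(1, 1), (2, 2), (4, 4)]:
        print(tools, steps, route(tools, steps).name)
```

The design choice worth noticing is the fallback: when no cheap model is predicted to be reliable enough, the call still goes to the premium model, so savings come only from calls judged easy.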
Existing routers were designed for chat completions, not tool use. Switchcraft is the first router optimized for the structured, function-calling patterns that agents actually use. The result: lower per-call costs without meaningful accuracy loss on the tasks that matter.
For a mid-market company running AI workflows at scale (say, processing hundreds of supplier invoices or routing customer service tickets), this kind of routing directly affects a line item on your AI budget. It's the difference between paying premium rates on every call and paying premium only when you need to.
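The budget effect is simple arithmetic, sketched below. Every number in the usage example (call volume, per-call prices, the share of calls routed to the cheaper model) is a made-up assumption for illustration, not a figure from the Switchcraft research.

```python
def monthly_cost(calls: int, premium_price: float, cheap_price: float,
                 cheap_share: float) -> float:
    """Blended monthly cost when a fraction of calls goes to a cheaper model."""
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    cheap_calls = calls * cheap_share
    premium_calls = calls - cheap_calls
    return cheap_calls * cheap_price + premium_calls * premium_price


if __name__ == "__main__":
    # Hypothetical inputs: 10,000 calls a month, $0.03 premium, $0.003 cheap.
    baseline = monthly_cost(10_000, 0.03, 0.003, cheap_share=0.0)
    routed = monthly_cost(10_000, 0.03, 0.003, cheap_share=0.7)
    print(f"all premium: ${baseline:.2f}, with routing: ${routed:.2f}")
```

Plugging in your own volumes and prices shows how sensitive the bill is to the share of calls a router can safely move off the premium model.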
Atlas's take
This is one of those infrastructure papers that won't make headlines but matters a lot if you're actually spending money on AI in production. If your monthly inference bill is growing, ask your team whether you're using a router and, if not, why not.