GoIppo
Positioning · Anthropic Stack

Claude API Consulting & Implementation — Custom Agents on Anthropic's Stack.

Most AI agencies are OpenAI-shaped — their templates, their defaults, their case studies. We're Claude-native. Our default stack is Anthropic's Claude API because the combination of long context, clean tool use, prompt caching, and honest refusals is the right fit for mid-market production systems. This page is the read on what that means, what we build with it, and when we tell clients to use something else.

Why Claude specifically

Five concrete reasons we default to Claude for production agent builds. Each one shows up in the finished system — shorter code, lower cost, more reliable behavior at the edges.

Long context window

Current Claude flagship models handle up to 1M tokens in a single call. In practice that means an agent can reason over a full month of customer transcripts, a full code repository, or a full case file without retrieval-augmentation gymnastics. For a significant share of mid-market workloads, long context replaces the entire RAG stack you would otherwise build and maintain.

Tool use that actually works

Claude returns structured tool calls reliably — consistently formatted, consistently chosen, with fewer hallucinated tool names than any other model we have shipped against. For agentic systems where the model has to decide which of ten tools to call next, that reliability is the difference between a system that ships and one that keeps almost working.
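What that looks like in practice: the model returns a structured `tool_use` content block, and the agent loop routes it to a handler. The sketch below simulates that dispatch step with a hand-built dict that mirrors the shape of an Anthropic Messages API `tool_use` block; the `lookup_order` tool and its handler are hypothetical, and no live API call is made.

```python
# Sketch of the tool-dispatch step in an agent loop. The block dict mimics the
# shape of a tool_use content block from the Anthropic Messages API; the tool
# name and handler are hypothetical.

def dispatch_tool_call(block, registry):
    """Route one tool_use content block to the matching handler."""
    if block["type"] != "tool_use":
        raise ValueError("expected a tool_use block")
    handler = registry.get(block["name"])
    if handler is None:
        # A hallucinated tool name would surface here; a production loop
        # still guards this branch even when it rarely fires.
        raise KeyError(f"unknown tool: {block['name']}")
    result = handler(**block["input"])
    # Shape the result the way the API expects it echoed back.
    return {
        "type": "tool_result",
        "tool_use_id": block["id"],
        "content": str(result),
    }

# Hypothetical registry for a support-triage agent.
registry = {
    "lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
}

# Simulated tool_use block, as it would appear in response.content.
block = {
    "type": "tool_use",
    "id": "toolu_01",
    "name": "lookup_order",
    "input": {"order_id": "A-1042"},
}

print(dispatch_tool_call(block, registry))
```

The reliability claim above is about how rarely the `KeyError` branch fires: when tool names and inputs come back consistently formatted, the rest of the loop stays boring, which is the point.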

Prompt caching for cost discipline

Anthropic's prompt caching cuts cost by roughly 90% on the repeated parts of a call. For an agent with long instructions, a dozen tool definitions, and a big system prompt, caching is the difference between a $500 monthly API bill and a $50 one. It also brings latency down materially for chatty agents. We design builds around the cache window.
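The arithmetic behind that claim is simple enough to sketch. Assuming cached reads bill at roughly 10% of the base input price (and ignoring the small one-time cache-write premium), the savings scale with how much of each call is static. All numbers below are illustrative, not a quote.

```python
def monthly_input_cost(static_tokens, dynamic_tokens, calls, price_per_mtok,
                       cached_read_multiplier=0.1):
    """Approximate monthly input-token cost with and without prompt caching.

    Assumes cached reads bill at ~10% of the base input price and ignores
    the one-time cache-write premium, which is small at this call volume.
    """
    uncached = (static_tokens + dynamic_tokens) * calls * price_per_mtok / 1e6
    cached = ((static_tokens * cached_read_multiplier + dynamic_tokens)
              * calls * price_per_mtok / 1e6)
    return uncached, cached

# Hypothetical agent: 20k tokens of instructions + tool definitions, 1k of
# dynamic input, 50k calls/month, at an illustrative $3 per million input tokens.
uncached, cached = monthly_input_cost(20_000, 1_000, 50_000, 3.0)
print(f"without caching: ${uncached:,.0f}/mo  with caching: ${cached:,.0f}/mo")
```

The bigger the static share of the prompt, the closer the savings get to the 90% ceiling, which is why agents with long instructions and many tool definitions benefit the most.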

Safety posture and honest refusals

Claude tends to be less sycophantic than alternatives, more willing to say 'I don't know' or 'that isn't what that data says,' and less prone to jailbreak-chasing. When an agent is customer-facing or sitting in front of real money, those properties matter more than any benchmark number.

Claude Code for the build layer

We also use Claude Code — Anthropic's CLI coding agent — in our own development process. That is the same stack the rest of the ecosystem is converging on, and it means the agent architectures we ship to clients were themselves built with agentic tooling. We live in the stack we sell.

What we build on the Claude API

The same three-tier build shape from our build overview carries over here. Implemented on Claude, the work looks like this.

Custom agent systems

End-to-end builds on the Claude API — the same Tier 1 / Tier 2 / Tier 3 shape from our build overview, engineered directly against the Anthropic SDK. We do not wrap everything in LangChain, CrewAI, or any other framework that puts a black box between the model and the business logic. Simpler code, lower latency, easier debugging.

Claude-native integrations

Tool definitions tied to your CRM, ERP, email, or custom internal systems — structured so Claude picks the right tool on the first try and returns clean structured output. We write the SDK integration; we do not ship you a generic 'Claude connector' that you have to wire up.
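A Claude tool definition is a name, a description, and a JSON Schema for the input. Tight descriptions and narrow schemas are most of what makes the model pick the right tool on the first try. The CRM lookup below is a hypothetical example, not a shipped integration.

```python
# Illustrative tool definition for a hypothetical CRM contact lookup. The
# structure (name / description / input_schema) follows the Anthropic tool-use
# format; the tool itself is made up.
crm_lookup_tool = {
    "name": "get_crm_contact",
    "description": (
        "Look up a single contact in the CRM by email address. "
        "Use only when the user names a specific customer."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "email": {
                "type": "string",
                "description": "The contact's email address, exactly as given.",
            },
        },
        "required": ["email"],
        "additionalProperties": False,
    },
}

# Passed to the API as: client.messages.create(..., tools=[crm_lookup_tool], ...)
print(crm_lookup_tool["name"])
```

The description doubles as routing guidance: "use only when the user names a specific customer" is what keeps the model from reaching for this tool on every vaguely customer-shaped question.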

Prompt caching architectures

Build layouts that keep the static prompt surface cached, isolate the dynamic slice, and route the right model (Haiku / Sonnet / Opus) to the right step. This is the boring-but-critical work that turns a $2,000 monthly API bill into a $200 one without sacrificing capability.
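Per-step model routing can be as simple as a lookup table. The sketch below is a minimal version of the idea; the model names are placeholders for whichever Haiku, Sonnet, and Opus versions are current, not pinned IDs.

```python
# Minimal sketch of per-step model routing. Model names are placeholders;
# check Anthropic's current model list before pinning versions in production.
MODEL_FOR_STEP = {
    "classify": "claude-haiku",   # cheap, fast triage steps
    "reason":   "claude-sonnet",  # main agent reasoning
    "escalate": "claude-opus",    # only the genuinely hard steps
}

def route_model(step: str) -> str:
    """Pick the cheapest model that can handle the step; default to Sonnet."""
    return MODEL_FOR_STEP.get(step, "claude-sonnet")

print(route_model("classify"), route_model("audit"))
```

Real builds add more nuance (confidence thresholds, retry-with-bigger-model fallbacks), but the principle is exactly this table: never pay Opus prices for Haiku work.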

Migrations from other providers

OpenAI → Claude, Gemini → Claude, or multi-provider consolidation. Prompts get tuned, tool-use shape gets rewritten, response parsing gets updated. We scope the migration as a fixed-fee engagement with clear deliverables — same structure as any other build.

When Claude is not the right tool

Honest read: Claude is not the answer for every AI problem. Part of our job as consultants is knowing which tool fits the work — and saying so on the discovery call, not after you have paid for a build that should never have been Claude-shaped.

Image generation

Claude is not an image generator. For image output we use DALL-E, Midjourney via API, or a diffusion-based model depending on the use case. If your build needs both reasoning and image generation, we compose the two — Claude for the prompt reasoning, a generation model for the output.

Real-time voice

For live voice agents we currently recommend OpenAI Realtime API or a Deepgram + Cartesia pipeline. Claude's voice path is not there yet for production deployments. If your build is a voice agent, we will say so and recommend the right stack.

Pure embedding + vector search

For workflows that are really just 'embed documents, search by similarity,' a dedicated embedding model (OpenAI's, open-source alternatives, or a hosted vector DB with built-in embeddings) is often faster and cheaper than routing through a reasoning model. We pick the right tool for the shape of the work.
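The whole workflow, stripped to its core, is a similarity ranking over vectors. The toy sketch below uses made-up three-dimensional vectors in place of real embeddings, which in production would come from a dedicated embedding model; no reasoning model appears anywhere in the loop.

```python
from math import sqrt

# Toy sketch of 'embed documents, search by similarity'. Vectors are made up;
# real embeddings are hundreds or thousands of dimensions and come from a
# dedicated embedding model.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

docs = {
    "refund policy":  [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "warranty terms": [0.6, 0.4, 0.2],
}

query = [0.85, 0.15, 0.05]  # pretend embedding of "how do refunds work?"
best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)
```

When this is the entire job, a hosted vector DB or a plain embedding endpoint does it in milliseconds for fractions of a cent, which no reasoning-model round trip can match.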

Deterministic, high-volume work

Do not use any AI model — use code. Tax math, inventory reorder logic, exact-string matching on structured data, invoice posting — these are rules problems, not reasoning problems. Our job as consultants is partly to tell you when an AI is not the answer, because it often isn't.
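Inventory reorder logic is a good illustration of what "use code" means here: a few deterministic lines that are testable, free, and never wrong, where an LLM call would be slow, costly, and occasionally wrong. The thresholds are illustrative.

```python
# A rules problem, not a reasoning problem: order up to the target level
# whenever stock falls to the reorder point. Thresholds are illustrative.

def reorder_qty(on_hand: int, reorder_point: int, target_level: int) -> int:
    """Return how many units to order; zero while stock is healthy."""
    if on_hand > reorder_point:
        return 0
    return target_level - on_hand

print(reorder_qty(12, 20, 60))  # below the reorder point: order 48
print(reorder_qty(35, 20, 60))  # healthy stock: order nothing
```

Ten lines, exhaustively testable, zero per-call cost. That comparison is the whole argument.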

For a broader read on when an AI agent is the right answer vs. a rule-based script, see our agents vs. automation guide.

Retainer implications — how API cost gets absorbed

Every engagement ships with a monthly retainer. The retainer covers hosting, uptime monitoring, bug fixes, minor tweaks — and the API cost of running the system. No pass-through billing, no surprise invoices at the end of a high-volume month.

Absorbing the API cost only works because we engineer the builds for cost discipline from day one. Prompt caching is architected into the system, not added later. Model choice is per-step — Haiku for the classification work, Sonnet for the reasoning, Opus only where it is genuinely needed. Context windows are managed so we are not paying Opus prices for Haiku work. This is the boring backend of a good Claude build, and it is the reason we can quote a fixed retainer number and keep it.

For more on how pricing and retainers work, see the AI consulting cost guide.

We use Claude on our own systems

This site's daily AI brief — The Ippo Brief — is written and published autonomously by Ippo, our in-house agent. Ippo runs on Claude. Our scoping agent, which powers the scope page, runs on Claude. Our internal CEO, consulting, and architecture agents all run on Claude. Our build process itself is assisted by Claude Code.

That matters for one reason: we know the stack because we live in it every day. When you hire us, you are hiring a shop whose own internal systems are built on the exact tools we are going to use on your engagement. We are not recommending Claude because a sales deck said so — we are recommending it because it is what we ship with.

Frequently asked questions

Why specifically Claude and not GPT or Gemini?

Three concrete reasons. First, Claude's long context window (up to 1M tokens on current flagship models) lets us ship systems that reason over a full codebase, a full case file, or a full month of customer transcripts in one call — no retrieval hacks needed for many mid-market workloads. Second, Claude's tool-use and function-calling shape is cleaner for agentic workflows — the model returns structured tool calls reliably, which makes multi-step builds less fragile. Third, Anthropic's safety posture (less sycophancy, fewer jailbreak-chasing behaviors, more honest refusals) matters when the agent is sitting in front of real customers or real money. We also use other providers where they fit. But our default stack is Claude for good reasons, not brand loyalty.

What do you actually build on the Claude API?

Custom AI agents and agent systems — the same three-tier shape on our build overview, implemented on Anthropic's stack. Typical builds include document-extraction agents, quoting agents, sales operators, support ticket triagers, reporting agents, and internal operator dashboards. We write the agent code directly against the Anthropic SDK — no LangChain overhead, no framework lock-in, no black box between the model and the business logic.

What does prompt caching do and why does it matter for cost?

Prompt caching lets you pay roughly 10% of the normal cost for the repeated parts of a Claude API call. In a typical agent build, the instructions, tool definitions, and a big chunk of the context stay the same across calls — only the user input changes. Caching those static parts drops the per-call cost dramatically, especially for chatty agents or long-context workflows. For most Tier 2+ builds, caching turns what would be a $500/month API bill into a $50/month one. The cache has a short TTL (five minutes by default), so we architect builds to keep the hot path inside the cache window.

When is Claude NOT the right tool, and what do you recommend instead?

A few cases. For image generation we use DALL-E, Midjourney via API, or a diffusion model — Claude isn't a generator. For real-time voice we use OpenAI Realtime API or Deepgram + Cartesia, since Claude's audio path isn't there yet. Where the work is pure embedding + vector search, we might use OpenAI's embedding models or open-source alternatives depending on the scale. And for deterministic, high-volume, structured-in-structured-out work, we don't use an AI model at all — we write a script. Picking the right tool is the job. We are not a shop that reaches for a Claude call when a regex would do.

How does API cost get absorbed into the retainer?

Every build comes with a monthly retainer, and the API cost is folded into that number — no pass-through billing, no surprise invoice at the end of a busy month. That is only possible because we engineer the builds for cost discipline: prompt caching everywhere it makes sense, right-sized model choice per call (Haiku for the cheap classification steps, Sonnet for reasoning, Opus only where it's genuinely required), and tight context management. Clients do not get hit with 'AI usage overages' — that is our problem to manage inside the retainer.

Can you migrate an existing OpenAI project to Claude?

Yes. It is a common ask once clients hit context-window limits or realize they are paying for capabilities they are not using. The migration is usually straightforward — the prompts have to be tuned for Claude's instruction-following style, the tool-use shape is different, and the response-parsing logic on your side needs updating — but the underlying architecture transfers cleanly. We scope migration work the same way we scope a new build: fixed fee, defined deliverables, no surprises.

Want to see what a Claude-native build looks like for your company?

Describe the workflow in plain language. Our scoping agent will show you what a Claude-API build would actually look like, which tier it lands in, and what the realistic timeline is. No commitment, no price tag until we scope it properly.