
We wrote about a company running six AI agents for $28 a month. Since that post went up, the most common question we have gotten is some version of: "How do I keep MY AI costs that low?"
Fair question. Because it is entirely possible to spend $28 a month on AI agents. It is also entirely possible to spend $2,800 a month doing the exact same work. The difference is not luck or scale. It is strategy.
This is the guide we wish existed when we started building agent systems. No vague advice. Real numbers, real strategies, and a framework you can apply to your own setup this week.
What Actually Costs Money
Before you can optimize costs, you need to understand where the money goes. Most people assume it is the infrastructure. It is not.
API calls are 80-90% of your bill. Every time an AI agent reads something, thinks about it, and writes a response, that is a billable event measured in tokens. Tokens are roughly chunks of text — a typical business email is about 200-400 tokens. A long document might be 10,000. Every token in and every token out costs money.
Infrastructure is often free or nearly free. Supabase gives you a 500MB database and 1GB file storage at no cost. Vercel hosts your app and runs serverless functions on their free tier. Cloudflare Workers handles 100,000 requests per day for free. GitHub Actions gives you 2,000 minutes of compute per month. The $28/month AI company used all of these.
Tools and subscriptions are cheap. Most of the software you need to orchestrate AI agents is either open source or has a generous free tier. The expensive part is the reasoning, not the plumbing.
So when we talk about cutting costs, we are almost entirely talking about reducing your API spend. That is where the leverage is. Everything else is a rounding error.
How Much Do API Calls Actually Cost?
Here is where it gets concrete. AI models are priced per million tokens, and the range is enormous — from pennies to tens of dollars for the same million tokens, depending on which model you pick.
Pricing in USD per million tokens (input/output) at the time of writing:
| Model | Input | Output | Best For |
|---|---|---|---|
| DeepSeek V3 | $0.28 | $0.42 | Simple tasks, extraction, formatting |
| Gemini 2.0 Flash | $0.10 | $0.40 | Classification, routing, quick lookups |
| GPT-4o Mini | $0.15 | $0.60 | Drafts, summaries, data cleanup |
| Claude Haiku 4.5 | $1.00 | $5.00 | Structured analysis, moderate reasoning |
| GPT-4o | $2.50 | $10.00 | Business writing, complex extraction |
| Claude Sonnet 4.5 | $3.00 | $15.00 | Code generation, detailed analysis |
| Claude Opus 4.5 | $5.00 | $25.00 | Strategic reasoning, nuanced writing |
Look at the gap. DeepSeek V3 is roughly 60 times cheaper than Claude Opus on output tokens. Gemini Flash is 50 times cheaper than Opus on input tokens and about 62 times cheaper on output. For simple tasks, such as formatting a customer record, classifying an email, or extracting a date from a document, the cheap models do the job just fine.
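To make the gap concrete, here is a minimal sketch that prices a single request using the table above. The `PRICES` dictionary and the `request_cost` helper are illustrative names, not part of any provider SDK. The dictionary keys are plain labels rather than official API model IDs, and the token counts are made-up examples.

```python
# Prices in USD per million tokens (input, output), copied from the table above.
# Keys are illustrative labels, not official API model identifiers.
PRICES = {
    "deepseek-v3": (0.28, 0.42),
    "gemini-2.0-flash": (0.10, 0.40),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-haiku-4.5": (1.00, 5.00),
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet-4.5": (3.00, 15.00),
    "claude-opus-4.5": (5.00, 25.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request for the given token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: classify a 300-token email with a 20-token answer.
for name in ("gemini-2.0-flash", "deepseek-v3", "claude-opus-4.5"):
    print(f"{name}: ${request_cost(name, 300, 20):.6f} per email")
```

Multiply the per-request figure by your daily volume to see how quickly the choice of model dominates the bill.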
For complex tasks — analyzing a contract, writing a strategy memo, debugging a multi-file codebase — you need the expensive models. But here is the critical insight: most tasks your agents handle are simple.
A developer who open-sourced his own routing tool reported that 80% of his daily AI requests were simple tasks like autocomplete, error explanations, and syntax fixes. Only 20% needed a powerful model. He was sending all of them to the most expensive model available — the equivalent of taking a Ferrari to buy groceries. His monthly API bill was over $4,600. After routing to appropriate models, it dropped dramatically.
Strategy 1: Route to the Right Model
This is the single biggest lever for cost savings, and most businesses are not using it.
The concept is simple. Instead of sending every request to one model, you put a routing layer in front of your agents that evaluates each request and sends it to the cheapest model capable of handling it.
What that looks like in practice:
- Customer asks "What are your hours?" --> Gemini Flash ($0.10/M input tokens)
- Agent needs to classify 500 support tickets --> DeepSeek V3 ($0.28/M input tokens)
- Agent is drafting a blog post from notes --> Claude Sonnet ($3.00/M input tokens)
- Agent is analyzing a competitor's pricing strategy --> Claude Opus ($5.00/M input tokens)
The routing decision itself takes less than a millisecond and costs nothing — it is pattern matching on the input, not an additional API call. Tools like ClawRouter use a 14-dimension scoring system to classify task complexity locally before routing. No extra API call. No added latency.
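A local, pattern-matching router can be sketched in a few lines. This is not how ClawRouter or any particular tool works; the keyword lists, the 200-word threshold, and the tier labels below are all assumptions chosen for illustration, and a production router would use richer signals.

```python
import re

# Hypothetical tiers; names are labels, not official API model IDs.
CHEAP, MID, PREMIUM = "gemini-2.0-flash", "claude-sonnet-4.5", "claude-opus-4.5"

# Illustrative keyword hints (assumptions, not a tested taxonomy).
PREMIUM_HINTS = re.compile(r"\b(strategy|analy[sz]e|contract|debug|competitor)\b", re.I)
MID_HINTS = re.compile(r"\b(draft|write|summari[sz]e|blog)\b", re.I)


def route(request: str) -> str:
    """Return a model tier for the request, checking the most demanding hints first."""
    if PREMIUM_HINTS.search(request):
        return PREMIUM
    # Long inputs are treated as mid-tier work even without a keyword match.
    if MID_HINTS.search(request) or len(request.split()) > 200:
        return MID
    return CHEAP


for text in (
    "What are your hours?",
    "Draft a blog post from these notes",
    "Analyze a competitor's pricing strategy",
):
    print(f"{text!r} -> {route(text)}")
```

Because the decision is a couple of regular-expression checks on the input, it runs locally and adds no API call.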
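The math on this is compelling. Sending everything to Opus costs you $25 per million output tokens. If 80% of your token volume is simple work that a model like DeepSeek V3 can handle at $0.42, and only the remaining 20% goes to Opus, the blended rate is about $5.34 per million (0.8 × $0.42 + 0.2 × $25). Depending on how much mid-tier work you have, expect a blended average of around $5-8 per million. That is a 70-80% reduction without changing anything about what your agents actually do.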
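The blended figure is a weighted average, which you can recompute for your own traffic mix. The sketch below assumes the 80/20 split applies to output token volume (not just request count) and that simple work goes to DeepSeek V3 while complex work stays on Opus; `blended_price` is a hypothetical helper.

```python
def blended_price(shares_and_prices: list[tuple[float, float]]) -> float:
    """Weighted average price per million tokens for (traffic_share, price) pairs."""
    total_share = sum(share for share, _ in shares_and_prices)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * price for share, price in shares_and_prices)


everything_on_opus = blended_price([(1.0, 25.00)])
# Assumed mix: 80% simple -> DeepSeek V3 output price, 20% complex -> Opus output price.
routed = blended_price([(0.8, 0.42), (0.2, 25.00)])
reduction = 1 - routed / everything_on_opus

print(f"routed: ${routed:.2f} per million output tokens, {reduction:.0%} cheaper")
```

Adding a mid-tier model for part of the simple traffic pushes the blended price up toward the higher end of the range.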
You do not necessarily need a dedicated routing tool to get started. Even manually assigning different models to different agent roles gets you most of the benefit. Your customer response agent does not need the same model as your strategy analysis agent. Configure them accordingly.
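Manual assignment can be as simple as a lookup table. In this sketch the role names, the model labels, and the `model_for` helper are all hypothetical; substitute whatever roles and model identifiers your own setup uses.

```python
# Hypothetical agent roles mapped to model labels (not official API model IDs).
AGENT_MODELS = {
    "customer_response": "gemini-2.0-flash",
    "ticket_classifier": "deepseek-v3",
    "content_writer": "claude-sonnet-4.5",
    "strategy_analyst": "claude-opus-4.5",
}
DEFAULT_MODEL = "gpt-4o-mini"  # cheap fallback for roles not listed above


def model_for(role: str) -> str:
    """Return the configured model for an agent role, or the cheap default."""
    return AGENT_MODELS.get(role, DEFAULT_MODEL)


print(model_for("customer_response"))
print(model_for("some_new_agent"))
```

Defaulting unknown roles to a cheap model means a new agent has to earn its way onto an expensive one.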
Strategy 2: Reduce Token Usage
Every token costs money. Fewer tokens, lower bills. Here are four ways to cut token usage without reducing capability.
Write shorter, clearer prompts. A 500-word system prompt that rambles costs you tokens on every single request. A 150-word prompt that says exactly what the agent needs to know costs a third as much and usually produces better results. Precision beats verbosity.
Use prompt caching. If your agent sends the same system prompt or reference document with every request — your company policies, your product catalog, your style guide — you are paying full price for that context every time. Claude, OpenAI, and Gemini all offer prompt caching that stores frequently-used context and charges you a fraction of the cost on subsequent uses. Claude charges 10% of the base input price for cached reads. That is a 90% discount on your most repeated context.
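A rough estimate of what caching saves on repeated context might look like the following. The 10% cached-read ratio comes from the paragraph above. The cache-write ratio is an assumption (some providers bill the first write at a premium, so check your provider's pricing), and the sketch also assumes the cache never expires between requests.

```python
def context_cost(
    context_tokens: int,
    requests: int,
    input_price_per_m: float,
    cached_read_ratio: float = 0.10,  # cached reads at 10% of base, per the text
    cache_write_ratio: float = 1.0,   # ASSUMPTION: first write billed at base price
) -> tuple[float, float]:
    """Return (uncached_cost, cached_cost) in USD for a repeated context block."""
    if requests < 1:
        raise ValueError("requests must be at least 1")
    per_send = context_tokens * input_price_per_m / 1_000_000
    uncached = per_send * requests
    # One cache write, then every later request reads the cached context.
    cached = per_send * cache_write_ratio + per_send * cached_read_ratio * (requests - 1)
    return uncached, cached


# Hypothetical: a 5,000-token style guide sent with 1,000 requests at $3.00/M input.
uncached, cached = context_cost(5_000, 1_000, 3.00)
print(f"without caching: ${uncached:.2f}, with caching: ${cached:.2f}")
```

Under these assumptions the repeated context costs about $15.00 uncached versus about $1.51 cached. Real savings will be lower if the cache expires between bursts of traffic.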
Batch your requests. If you need to classify 100 customer emails, do not make 100 separate API calls. Send them in groups of 10-20 per call so the instructions are paid for once per group rather than once per email. Separately, for work that does not need an immediate answer, Anthropic and OpenAI both offer asynchronous batch APIs that process requests at half price.
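Grouping items into one prompt can be sketched as below. The category labels, the prompt wording, and the helper names are illustrative assumptions; the point is only that 100 emails become 5 calls that share one set of instructions each.

```python
def chunk(items: list[str], size: int) -> list[list[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_batch_prompt(emails: list[str]) -> str:
    """Combine several emails into one numbered classification prompt."""
    numbered = "\n\n".join(f"[{n}] {text}" for n, text in enumerate(emails, start=1))
    return (
        "Classify each email below as BILLING, SUPPORT, or SALES.\n"
        "Answer with one line per email in the form '<number>: <label>'.\n\n"
        + numbered
    )


emails = [f"Example email body {i}" for i in range(100)]  # placeholder data
prompts = [build_batch_prompt(batch) for batch in chunk(emails, 20)]
print(len(prompts))  # 5 calls instead of 100
```

Keep groups small enough that the model still answers every item reliably; very large groups tend to trade cost for accuracy.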
Stop re-reading what has not changed. If your agent reads the same 50-page product manual every morning to answer customer questions, that is 15,000+ tokens burned daily on information that changes maybe once a month. Cache it. Summarize the relevant sections once and feed the summary to the agent instead. Or use retrieval-augmented generation (RAG) to only pull in the specific sections relevant to each question.
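The retrieval idea can be illustrated with a deliberately naive stand-in: score each manual section by word overlap with the question and send only the best matches. Real RAG systems typically use embeddings and a vector store rather than word overlap; the section data and function names here are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_sections(question: str, sections: dict[str, str], k: int = 2) -> list[str]:
    """Return titles of up to k sections sharing the most words with the question."""
    q_words = tokenize(question)
    scored = [(len(q_words & tokenize(body)), title) for title, body in sections.items()]
    # Highest overlap first; drop sections with no overlap at all.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
    return [title for _, title in ranked[:k]]


manual = {  # placeholder manual sections
    "Returns": "Items can be returned within 30 days with a receipt.",
    "Shipping": "Orders leave the warehouse within 48 hours.",
    "Warranty": "The warranty covers defects for one year.",
}
print(top_sections("What is the return window in days with a receipt?", manual, k=1))
```

Only the selected sections go into the prompt, so the agent pays for a few hundred tokens per question instead of the whole manual.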
Strategy 3: Use Free Tiers Strategically
The infrastructure for AI agents is shockingly cheap if you know where to look. Here is what you can get without paying a cent:
| Service | Free Tier | What It Does for You |
|---|---|---|
| Supabase | 500MB database, 1GB file storage, 50K monthly active users | Store agent memory, task queues, customer data |
| Vercel | 100GB bandwidth, serverless functions, automatic deployments | Host your agent's control panel and API endpoints |
| Cloudflare Workers | 100K requests/day, 10ms CPU per request | Handle webhooks, routing, lightweight processing |
| GitHub Actions | 2,000 minutes/month, Linux runners | Scheduled jobs, CI/CD, automated agent triggers |
| Upstash Redis | 10K commands/day, 256MB storage | Caching layer, rate limiting, session management |
These are not toy tiers with artificial restrictions that force you to upgrade after a week. These are legitimate production-grade services that can support a real business workload. The $28/month AI company ran entirely on these free tiers plus a cheap server for the agent runtime.
The key is knowing these exist before you reach for your credit card. Most businesses we talk to are paying $20-50 a month for database hosting and $15-30 for serverless compute that they could get for free. That is $35-80 a month in savings before you even touch your API bill.
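Before relying on a free tier, it helps to compare your expected usage against the published limits. The sketch below uses limits from the table above, which can change, so verify them with each provider. The metric names and the `over_limit` helper are made up for this example.

```python
# Free-tier limits copied from the table above; key names are illustrative.
FREE_LIMITS = {
    "cloudflare_requests_per_day": 100_000,
    "github_actions_minutes_per_month": 2_000,
    "upstash_commands_per_day": 10_000,
    "supabase_database_mb": 500,
}


def over_limit(usage: dict[str, float]) -> dict[str, float]:
    """Return each known metric that exceeds its free-tier limit, with the overage."""
    return {
        metric: amount - FREE_LIMITS[metric]
        for metric, amount in usage.items()
        if metric in FREE_LIMITS and amount > FREE_LIMITS[metric]
    }


# Hypothetical usage estimate for a small agent setup.
estimate = {
    "cloudflare_requests_per_day": 40_000,
    "github_actions_minutes_per_month": 2_600,
}
print(over_limit(estimate))
```

An empty result means the estimate fits inside the free tiers; anything listed is where you would start paying.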
Strategy 4: Build vs. Buy (The Decision That Saves — or Wastes — the Most Money)
There is a tool or SaaS for everything now. Automated social posting. Email sequences. Customer support bots. Lead scoring. Competitor monitoring. Each one costs $20-100 a month. String five or six together and you are spending $200-500 a month on tools.
Here is the uncomfortable truth: a surprising number of those tools can be replaced by an AI agent you build in an afternoon. A social media posting agent that reads your blog posts and generates platform-specific content. A competitor monitoring agent that checks ten websites every morning. A customer response agent that reads your FAQ and answers questions at 2 AM.
The build-it-yourself cost? The API calls to run it — maybe $5-15 a month.
But the opposite is also true. Do not spend a week building what you could buy for $10 a month. If Zapier connects your CRM to your email platform in five minutes, use Zapier. The time you spend building a custom integration is time you are not spending on revenue-generating work.
The framework:
- Build custom automations that are specific to your business and would require multiple tools to replicate
- Buy infrastructure, integrations between existing tools, and anything with a complex maintenance burden
- Skip anything you are building "just in case" or because it seems cool rather than because it solves a real problem
The most expensive AI agent is not the one with the highest API bill. It is the one you spent two weeks building that nobody uses.
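One way to put numbers on the build-versus-buy decision is a break-even estimate. The hourly rate is an assumption you have to supply yourself, and `breakeven_months` is a hypothetical helper that ignores ongoing maintenance time.

```python
from typing import Optional


def breakeven_months(
    build_hours: float,
    hourly_rate: float,        # ASSUMPTION: what an hour of your time is worth
    saas_monthly: float,       # subscription the agent would replace
    agent_api_monthly: float,  # API spend to run the custom agent
) -> Optional[float]:
    """Months until building pays for itself, or None if it never does."""
    monthly_saving = saas_monthly - agent_api_monthly
    if monthly_saving <= 0:
        return None
    return build_hours * hourly_rate / monthly_saving


# Hypothetical: an afternoon build replacing a $100/month tool.
afternoon = breakeven_months(build_hours=4, hourly_rate=50, saas_monthly=100, agent_api_monthly=10)
# Hypothetical: a week of building to replace a $10/month tool.
week = breakeven_months(build_hours=40, hourly_rate=50, saas_monthly=10, agent_api_monthly=5)
print(afternoon, week)
```

With these made-up inputs the afternoon build pays back in a little over two months, while the week-long build would take 400 months, which is the "do not build what you can buy for $10" case.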
Strategy 5: Start Manual, Automate What Works
This is the strategy that saves the most money long-term, and it has nothing to do with technology.
Do not automate a workflow you have not validated manually first.
If you want an agent to handle customer responses, start by answering those responses yourself for two weeks. Document the patterns. Identify which questions come up repeatedly. Figure out which responses actually satisfy customers and which ones generate follow-up questions.
Then, and only then, build the agent around those proven patterns.
The businesses that waste the most money on AI are the ones that automate theoretical workflows. They imagine what an agent should do, build it, deploy it, and then discover that the workflow itself was wrong. The agent runs 24/7 doing the wrong thing efficiently. Every token it consumes is waste.
Manual first. Measure the results. Automate the winners.
A customer response agent built on two weeks of real conversation data costs the same to run as one built on guesswork. But the one built on real data actually works, which means it actually saves you money, which means the ROI is real instead of theoretical.
What Does a Realistic Monthly Budget Look Like?
Here is what we see across the businesses we work with, from lean startups to established small companies:
| Setup | What It Includes | Monthly Cost |
|---|---|---|
| Starter | 1 agent, simple tasks (email triage, data extraction), smart model routing | $10-30 |
| Growth | 3-5 agents, mixed complexity (content, customer support, research), caching enabled | $50-150 |
| Scale | Full automation suite (6+ agents, inter-agent coordination, daily workflows) | $200-500 |
| For comparison | Part-time virtual assistant, 20 hrs/week | $1,000-2,000 |
The Starter tier is where every business should begin. One agent. One task. Prove the value. The $28/month AI company started as a single agent experiment before growing to six.
The Growth tier is where most small businesses settle. Three to five agents handling different aspects of operations — content, support, monitoring — with enough volume to justify the investment but not so much that costs become unpredictable.
The Scale tier is for businesses that have validated their workflows, measured their ROI, and are ready to go all-in on automation. At $200-500 a month, you are still paying less than a single part-time employee while getting capability across multiple functions.
The Honest Take
Three things about AI agent costs that we think matter more than any optimization trick.
Costs are dropping fast. Claude Opus 4.5 delivered flagship performance at 67% lower cost than its predecessor. DeepSeek entered the market at prices that seemed like typos — $0.28 per million input tokens for a capable model. Google keeps pushing Gemini Flash prices lower. What costs $100 a month today will cost $20-30 a month within a year, possibly less. If cost is the only thing stopping you from experimenting, the barrier is falling every quarter.
The biggest waste is not expensive models — it is building things nobody uses. We have seen businesses spend $500 setting up an elaborate multi-agent system that automates a workflow they do twice a month. The ROI is negative for years. Meanwhile, a $15/month agent handling daily customer questions saves 20 hours of work in the first month. Pick the workflow that matters. Automate that one. Ignore the rest until the first one is proven.
Start with one workflow. Prove the ROI. Then expand. This is not timid advice. It is how every successful AI deployment we have seen actually happened. The businesses running sophisticated multi-agent operations today all started with a single agent doing a single job. They measured the savings. They refined the workflow. Then they added another agent, and another, each one justified by the results of the last.
The goal is not to spend as little as possible on AI. The goal is to spend intentionally — getting real value for every dollar, and scaling up only when the numbers prove you should.
Your Cost Optimization Checklist
Here is what to do this week if you are running AI agents or planning to:
- Audit your current model usage. Which model is each agent using? Is it the cheapest one that can handle the task?
- Check your free tier utilization. Are you paying for database hosting, serverless compute, or CI/CD that you could get free?
- Enable prompt caching. If your agents send repeated context, caching alone can cut 30-50% of your input token costs.
- Set spending limits. Most major AI providers let you set spend limits or budget alerts on your account. Use them. An agent with a bug can burn through your budget overnight.
- Measure cost per completed task. Not total spend — cost per task. An agent that costs $5 a day and completes 50 tasks costs $0.10 per task. That is the number that tells you whether it is worth it.
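The last two checklist items can also be enforced inside your own agent code, in addition to any provider-side limits. The `BudgetGuard` class below is a hypothetical sketch, not a feature of any provider SDK: it refuses charges that would exceed a cap and reports cost per completed task.

```python
class BudgetExceeded(RuntimeError):
    """Raised when recording a charge would push spend past the cap."""


class BudgetGuard:
    """Track spend against a cap inside your own agent code (hypothetical helper)."""

    def __init__(self, cap_usd: float) -> None:
        self.cap_usd = cap_usd
        self.spent_usd = 0.0
        self.tasks_completed = 0

    def record(self, cost_usd: float, task_completed: bool = True) -> None:
        """Add a charge; refuse it if the cap would be exceeded."""
        if self.spent_usd + cost_usd > self.cap_usd:
            raise BudgetExceeded(f"cap of ${self.cap_usd:.2f} would be exceeded")
        self.spent_usd += cost_usd
        self.tasks_completed += int(task_completed)

    def cost_per_task(self) -> float:
        """Average spend per completed task so far."""
        if self.tasks_completed == 0:
            raise ValueError("no completed tasks yet")
        return self.spent_usd / self.tasks_completed


guard = BudgetGuard(cap_usd=10.00)  # example daily cap
for _ in range(50):
    guard.record(0.10)  # hypothetical cost of one completed task
print(f"${guard.cost_per_task():.2f} per task")
```

Call `record` after each API response using the token counts the provider returns, and let `BudgetExceeded` stop the agent loop rather than catching and ignoring it.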
Let Us Build It Right the First Time
Blue Octopus Technology builds AI agent systems with cost optimization baked in from day one. Model routing, caching layers, free-tier infrastructure, spending guardrails — we set all of it up so your agents deliver results without budget surprises.
You do not need to become an expert in token pricing and API tiers. You need agents that do useful work at a cost that makes sense.
If you are spending more than you think you should on AI, or if you want to start with a setup that is lean from the beginning, let's talk.

