What Does a RAG System Actually Cost to Build in 2026?
A transparent breakdown of what you'll spend on a production-ready RAG system — infrastructure, development, and the hidden costs most vendors won't tell you about.
Everyone selling AI services will tell you "it depends." They're not wrong — but that answer is useless when you're trying to justify a budget to a CFO or decide between building in-house and hiring a consultancy.
This article gives you real numbers. We build RAG systems. Here's what they actually cost.
What is a RAG system, quickly?
RAG (Retrieval-Augmented Generation) connects an LLM like Claude or GPT-4 to your company's internal data. Instead of answering from its training data alone, where it is prone to hallucinating, the model first searches your documents, contracts, support tickets, or product catalogue, then generates a response grounded in what it found.
The result: a chatbot or internal tool that can answer questions about your business, accurately, without manual updating.
The three cost buckets
Every RAG project has three distinct cost buckets:
- Development cost — one-time, what you pay to build it
- Infrastructure cost — ongoing, what you pay to run it
- Hidden cost — the stuff most quotes leave out
Let's break each down.
1. Development cost
Scope is everything
A RAG system is not one thing. The range is enormous:
| Project scope | What it includes | Typical cost |
|---|---|---|
| Proof of concept | Single data source, basic UI, no auth | €15k–€30k |
| Internal tool (SMB) | 2–5 data sources, SSO, basic analytics | €40k–€80k |
| Production system (Enterprise) | Multi-source, access controls, audit logs, integrations | €100k–€200k |
| Platform-level RAG | Multi-tenant, custom retrieval, fine-tuned embeddings | €250k+ |
Development cost by project scope
The biggest variable is data complexity. Ingesting clean PDFs is fast. Ingesting a 10-year-old ERP with inconsistent schemas, scanned invoices, and three languages takes 3× longer.
What drives development cost up
- Multiple data sources — each connector (SharePoint, Confluence, Salesforce, email, PDFs) adds 3–5 days of work
- Access control — if different users should see different documents, the retrieval layer needs permissions logic. This is often underestimated.
- Custom chunking — the default "split every 512 tokens" approach breaks badly on tables, legal clauses, and structured data. Good chunking strategies add 1–2 weeks.
- Evaluation pipeline — how do you know the system is giving correct answers? Building automated evals adds cost but prevents production disasters.
- Enterprise requirements — SSO, audit logging, GDPR compliance, on-premise hosting add €20k–€50k to any project.
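To make the chunking point concrete, here is a minimal sketch of a structure-aware splitter. It assumes documents arrive as plain text with blank lines between blocks (paragraphs, clauses, tables), and it uses a whitespace word count as a stand-in for a real tokenizer. The function names are illustrative, not taken from any library.

```python
import re


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokens.
    # A real pipeline would use the tokenizer of its embedding model.
    return len(text.split())


def chunk_by_structure(text: str, max_tokens: int = 512) -> list[str]:
    """Pack whole blocks into chunks of at most max_tokens where possible.

    Blocks are never split, so a table or clause stays in one piece.
    A single block larger than max_tokens becomes its own oversized chunk.
    """
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_size = 0
    for block in blocks:
        size = count_tokens(block)
        # Close the current chunk if this block would push it over budget.
        if current and current_size + size > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_size = [], 0
        current.append(block)
        current_size += size
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

A fixed 512-token split can cut a table between two rows. This version only breaks at block boundaries, at the price of uneven chunk sizes. The extra 1–2 weeks usually go into handling oversized blocks, headings, and format-specific parsing, none of which this sketch covers.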
2. Infrastructure cost (monthly)
This is where most vendors give you a suspiciously low number, because they quote development but not running costs.
LLM API costs
LLM API usage is the biggest variable in your running costs. For a mid-sized internal knowledge base (say, 500 queries/day):
| Model | Cost per 1M tokens | Est. monthly (500 queries/day) |
|---|---|---|
| GPT-4o | ~$5 input / $15 output | ~$300–$800/mo |
| Claude Sonnet | ~$3 input / $15 output | ~$200–$600/mo |
| Claude Haiku | ~$0.25 input / $1.25 output | ~$30–$100/mo |
| Llama 3 (self-hosted) | Compute only | €200–€600/mo infra |
For most SMB internal tools, budget roughly €200–€800/month in LLM costs. That is before you factor in prompts that grow as you add more retrieved documents per query.
Scale query volume up 10× and you're at €2,000–€8,000/month. That's the moment companies start looking at self-hosted models.
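The monthly figures depend heavily on how many tokens each query consumes. The sketch below shows the arithmetic, using the per-token prices from the table and an assumed workload of 3,000 input tokens (prompt plus retrieved chunks) and 500 output tokens per query. Those token counts are illustrative assumptions, not measurements.

```python
def monthly_llm_cost(queries_per_day: int,
                     input_tokens: int,
                     output_tokens: int,
                     input_price_per_m: float,
                     output_price_per_m: float,
                     days: int = 30) -> float:
    """Estimated monthly API spend in USD for one model."""
    cost_per_query = (input_tokens * input_price_per_m
                      + output_tokens * output_price_per_m) / 1_000_000
    return queries_per_day * days * cost_per_query


# Prices per 1M tokens (input, output), copied from the table above.
PRICES = {
    "GPT-4o": (5.00, 15.00),
    "Claude Sonnet": (3.00, 15.00),
    "Claude Haiku": (0.25, 1.25),
}

# Assumed workload: 500 queries/day, 3,000 input and 500 output tokens each.
for model, (price_in, price_out) in PRICES.items():
    cost = monthly_llm_cost(500, 3_000, 500, price_in, price_out)
    print(f"{model}: ${cost:,.2f}/month")
```

Under these assumptions GPT-4o comes out at about $340/month. Doubling the retrieved context to 6,000 input tokens pushes that to about $560/month, which is why the number of documents retrieved per query matters as much as the model choice.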
Vector database
You need somewhere to store your document embeddings:
| Option | Cost | Notes |
|---|---|---|
| Pinecone (managed) | $100–$400/mo | Easiest to start with |
| Weaviate Cloud | $25–$200/mo | Good balance of features |
| pgvector (self-hosted) | ~€80–€200/mo infra | Lowest cost, more ops burden |
| Qdrant Cloud | $50–$200/mo | Good performance/cost ratio |
For most projects: €100–€400/month for vector storage once you're past the free tier and running real workloads.
Embedding costs
Every time you ingest new documents, you generate embeddings. For ongoing ingestion (say, 1,000 new pages/week):
- OpenAI text-embedding-3-small: ~$0.02 per 1M tokens, negligible at this scale
- Cohere Embed: similar range
- Self-hosted (e.g., bge-m3): compute cost only
Embedding costs are rarely a budget line item until you hit millions of documents.
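A quick calculation shows why. The sketch below assumes roughly 500 tokens per page, which is a rough guess that varies a lot by document type, and uses the ~$0.02 per 1M tokens price quoted above.

```python
def embedding_cost(pages: int, tokens_per_page: int,
                   price_per_m_tokens: float) -> float:
    """Cost in USD to embed the given number of pages once."""
    return pages * tokens_per_page * price_per_m_tokens / 1_000_000


# Assumption: ~500 tokens per page; the price is the figure quoted above.
weekly = embedding_cost(pages=1_000, tokens_per_page=500,
                        price_per_m_tokens=0.02)
yearly = weekly * 52
print(f"weekly: ${weekly:.2f}, yearly: ${yearly:.2f}")
```

At 1,000 pages a week that works out to about one cent per week. Re-embedding an entire archive after a model change is the scenario where this number starts to matter.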
Hosting and compute
The application layer (API, frontend, background jobs):
- Vercel + Railway: €150–€400/mo
- AWS/GCP managed services: €300–€800/mo depending on traffic
- Self-hosted Kubernetes: €500–€1,500/mo but you own it
Realistic total monthly infrastructure for a production SMB RAG system: €500–€1,500/month.
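As a sanity check, the total can be rebuilt from the component ranges quoted in this section. The sketch below uses the SMB figures for LLM usage and vector storage plus the Vercel + Railway hosting tier, and leaves embeddings out as negligible. Picking that hosting tier is an assumption made for illustration.

```python
# (low, high) monthly ranges in EUR, taken from the sections above.
COMPONENTS = {
    "llm_api": (200, 800),
    "vector_db": (100, 400),
    "hosting": (150, 400),  # assumed tier: Vercel + Railway
}


def total_range(components: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low ends and the high ends separately."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high


low, high = total_range(COMPONENTS)
print(f"EUR {low:,} to EUR {high:,} per month")
```

The sum comes to €450–€1,600, which rounds to the €500–€1,500 range quoted above. Swapping in the AWS/GCP or Kubernetes hosting figures shifts both ends upward.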
3. Hidden costs
These are the ones that blow budgets after sign-off.
Most project quotes cover development only. Always ask for a 12-month total cost of ownership: development plus infrastructure plus maintenance. The gap between the headline number and the real first-year cost is often 2–3×.
Data cleanup
If your documents are messy — inconsistent naming, duplicates, outdated content — someone has to clean them before they go into the system. This is often 2–4 weeks of analyst time that nobody quotes because it's "not really AI work."
Budget: €10k–€40k if you outsource this, or internal team time.
Change requests mid-project
"Actually, can it also pull from our CRM?" is the most expensive sentence in AI consulting. Scope changes after the retrieval architecture is set can cascade. Build a buffer of 20–30% for this.
Maintenance and updates
Models improve. Your data changes. The system needs occasional reindexing, prompt tuning, and dependency updates.
Budget: €500–€2,000/month for light maintenance, or structure a managed support retainer.
User adoption
The least technical cost is often the biggest: training staff to use the system, handling the skeptics, adjusting the UX based on real usage. Budget 10–20% of development cost for this phase.
What a realistic project looks like
Here's a real project profile we see often: a professional services firm (law, accounting, consulting) wants to let their staff query internal documents — contracts, precedents, client files — without digging through shared drives.
Scope:
- 3 data sources (SharePoint, email attachments, PDF archive)
- 50–200 users
- Permission-based retrieval (each user sees only their client files)
- Simple chat UI + Slack integration
Development cost: €70k–€110k
Timeline: 14–20 weeks
Monthly running cost: €800–€2,000/mo
Break-even vs. analyst time saved: 8–14 months
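The break-even estimate is simple arithmetic once you put a value on time saved. The sketch below uses midpoints of the development and running-cost ranges above. The savings figure (100 users each saving 2 hours a month at €50/hour) is a hypothetical input for illustration, not data from a real project.

```python
from typing import Optional


def break_even_months(dev_cost: float, monthly_running: float,
                      monthly_savings: float) -> Optional[float]:
    """Months until cumulative net savings cover the development cost.

    Returns None if savings never exceed running costs.
    """
    net_monthly = monthly_savings - monthly_running
    if net_monthly <= 0:
        return None
    return dev_cost / net_monthly


# Hypothetical inputs: 100 users x 2 hours/month x EUR 50/hour.
savings = 100 * 2 * 50
months = break_even_months(dev_cost=90_000, monthly_running=1_400,
                           monthly_savings=savings)
print(f"break-even after {months:.1f} months")
```

With these inputs the result is between 10 and 11 months. It is very sensitive to the hours-saved assumption, so measure that in a pilot rather than estimating it.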
Build vs. buy vs. hire
| Approach | Upfront | Monthly | Control | Time to value |
|---|---|---|---|---|
| SaaS tools (Notion AI, Guru, etc.) | €0 | €200–€2,000 | Low | Days |
| Build in-house | €0 | Dev salaries | Full | 6–18 months |
| Hire a consultancy | €40k–€200k | €500–€1,500 | High | 10–20 weeks |
SaaS tools are fine for generic Q&A. Once you need custom data sources, access controls, or integration into your existing tools, you need custom development.
How to scope your project before calling anyone
Before you talk to a vendor, answer these questions — they'll save you from wildly varying quotes:
- How many data sources? List them. Note if any require special connectors.
- How many users? And do different users need different data access?
- What does "success" look like? Fewer support tickets? Faster onboarding? Measure it.
- Where does the output go? Chat interface, Slack, internal portal, API?
- What's your compliance posture? GDPR, ISO 27001, on-premise requirements?
Bring these answers to the first call and any serious vendor will give you a much tighter estimate.
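One way to capture those answers is a small structured brief you can hand to every vendor. The sketch below is illustrative only: the class and field names are made up for this example, and the connector estimate reuses the 3–5 days per data source mentioned under development cost drivers.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectBrief:
    """Hypothetical scoping brief mirroring the five questions above."""
    data_sources: list[str]
    users: int
    per_user_access: bool
    success_metric: str
    output_channels: list[str]
    compliance: list[str] = field(default_factory=list)

    def connector_days(self) -> tuple[int, int]:
        # 3-5 days of work per connector, per the cost-driver list earlier.
        n = len(self.data_sources)
        return 3 * n, 5 * n


brief = ProjectBrief(
    data_sources=["SharePoint", "email attachments", "PDF archive"],
    users=120,
    per_user_access=True,
    success_metric="time to find a precedent",
    output_channels=["chat UI", "Slack"],
    compliance=["GDPR"],
)
print(brief.connector_days())
```

Sending the same brief to each vendor makes their quotes directly comparable.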
Summary
| Cost item | Range |
|---|---|
| Development (SMB production system) | €40k–€120k |
| Monthly infrastructure | €500–€1,500/mo |
| Data cleanup (if needed) | €10k–€40k |
| Ongoing maintenance | €1,000–€3,000/mo |
First-year cost breakdown (SMB production system)
A realistic first-year total for a well-scoped internal knowledge system: €100k–€200k all-in, including development, infrastructure, and support.
That sounds like a lot until you compare it to the analyst hours saved, the support tickets avoided, or the onboarding time eliminated. For most companies that have passed the exploration phase, the ROI is measured in months — not years.
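The first-year total can be rebuilt by adding the one-time items to twelve months of the recurring ones. The sketch below takes its ranges from the Summary table and assumes data cleanup is needed. It leaves out the change-request buffer and adoption costs discussed earlier.

```python
Range = tuple[int, int]


def first_year_range(one_time: list[Range], monthly: list[Range],
                     months: int = 12) -> Range:
    """Combine (low, high) cost ranges into a first-year (low, high) total."""
    low = sum(lo for lo, _ in one_time) + months * sum(lo for lo, _ in monthly)
    high = sum(hi for _, hi in one_time) + months * sum(hi for _, hi in monthly)
    return low, high


# Ranges in EUR from the Summary table.
ONE_TIME = [(40_000, 120_000), (10_000, 40_000)]  # development, data cleanup
MONTHLY = [(500, 1_500), (1_000, 3_000)]          # infrastructure, maintenance

low, high = first_year_range(ONE_TIME, MONTHLY)
print(f"EUR {low:,} to EUR {high:,}")
```

Summing the raw endpoints gives €68k–€214k. The €100k–€200k headline figure falls inside that band, which is plausible because few projects land at the low end, or the high end, of every line at once.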
If you're ready to scope a project, book a discovery call — we'll walk through your data sources and give you a written estimate within a week.