What Does a RAG System Actually Cost to Build in 2026?

A transparent breakdown of what you'll spend on a production-ready RAG system — infrastructure, development, and the hidden costs most vendors won't tell you about.

Everyone selling AI services will tell you "it depends." They're not wrong — but that answer is useless when you're trying to justify a budget to a CFO or decide between building in-house and hiring a consultancy.

This article gives you real numbers. We build RAG systems. Here's what they actually cost.

What is a RAG system, quickly?

RAG (Retrieval-Augmented Generation) connects an LLM like Claude or GPT-4 to your company's internal data. Instead of answering from its training data alone, which is where hallucinated answers come from, the system first searches your documents, contracts, support tickets, or product catalogue, then generates a response grounded in what it found.

The result: a chatbot or internal tool that can answer questions about your business, accurately, without manual updating.
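As a rough illustration of the search-then-generate flow described above, here is a minimal Python sketch. It is a toy under stated assumptions: retrieval uses a bag-of-words cosine similarity rather than a real embedding model, the three documents are made up, and the final LLM call is left out so the example runs without an API key. The function names (`retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep only alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two word-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the best top_k.
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, passages: list[str]) -> str:
    # The retrieved passages become the context the model must answer from.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below. "
        "If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Made-up documents, for illustration only.
docs = [
    "Refunds are issued within 14 days of a return request.",
    "Support is available on weekdays from 9:00 to 17:00 CET.",
    "Enterprise contracts renew annually unless cancelled in writing.",
]
question = "How long do refunds take?"
prompt = build_prompt(question, retrieve(question, docs, top_k=1))
print(prompt)  # This string is what would be sent to the LLM.
```

In a real system the word-count vectors would be replaced by embeddings stored in a vector database, and the prompt would be sent to the LLM of your choice.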

The three cost buckets

Every RAG project has three distinct cost buckets:

Development cost — one-time, what you pay to build it
Infrastructure cost — ongoing, what you pay to run it
Hidden cost — the stuff most quotes leave out

Let's break each down.

1. Development cost

Scope is everything

A RAG system is not one thing. The range is enormous:

Project scope	What it includes	Typical cost
Proof of concept	Single data source, basic UI, no auth	€15k–€30k
Internal tool (SMB)	2–5 data sources, SSO, basic analytics	€40k–€80k
Production system (Enterprise)	Multi-source, access controls, audit logs, integrations	€100k–€200k
Platform-level RAG	Multi-tenant, custom retrieval, fine-tuned embeddings	€250k+

Development cost by project scope

The biggest variable is data complexity. Ingesting clean PDFs is fast. Ingesting a 10-year-old ERP with inconsistent schemas, scanned invoices, and three languages can take 3× longer.

What drives development cost up

Multiple data sources — each connector (SharePoint, Confluence, Salesforce, email, PDFs) adds 3–5 days of work
Access control — if different users should see different documents, the retrieval layer needs permissions logic. This is often underestimated.
Custom chunking — the default "split every 512 tokens" approach breaks badly on tables, legal clauses, and structured data. Good chunking strategies add 1–2 weeks.
Evaluation pipeline — how do you know the system is giving correct answers? Building automated evals adds cost but prevents production disasters.
Enterprise requirements (SSO, audit logging, GDPR compliance, on-premise hosting) add €20k–€50k to any project.
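To make the custom chunking point above more concrete, here is a minimal sketch of one alternative to fixed-size splitting: packing whole blank-line-separated blocks into chunks so that a table or clause is never cut in half. It is an illustration under assumptions, not a recommended production strategy: token counts are approximated by whitespace-separated words, and `chunk_by_blocks` is a hypothetical name.

```python
import re


def chunk_by_blocks(text: str, max_tokens: int = 512) -> list[str]:
    """Pack blank-line-separated blocks into chunks without splitting a block."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for block in blocks:
        # Assumption: one whitespace-separated word is roughly one token.
        block_len = len(block.split())
        # Start a new chunk if this block would push the current one over budget.
        if current and current_len + block_len > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        # A block larger than max_tokens still stays whole, as its own chunk.
        current.append(block)
        current_len += block_len
    if current:
        chunks.append("\n\n".join(current))
    return chunks


# Made-up contract excerpt with a small table in the middle.
sample = (
    "Clause 4.1: The supplier shall deliver within 30 days.\n\n"
    "Item | Qty | Price\n"
    "Bolt | 100 | 0.10\n"
    "Nut | 250 | 0.05\n\n"
    "Clause 4.2: Late delivery incurs a 2% penalty."
)
chunks = chunk_by_blocks(sample, max_tokens=20)
for chunk in chunks:
    print(chunk, end="\n---\n")
```

With the small budget used here, the table ends up in a chunk of its own instead of being split mid-row. Real documents usually need format-specific handling on top of this (headings, nested clauses, scanned pages), which is where the extra weeks go.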

2. Infrastructure cost (monthly)

This is where most vendors give you a suspiciously low number, because they quote development but not running costs.

LLM API costs

The biggest variable. For a mid-sized internal knowledge base (say, 500 queries/day):

Model	Cost per 1M tokens	Est. monthly (500 queries/day)
GPT-4o	~$5 input / $15 output	~$300–$800/mo
Claude Sonnet	~$3 input / $15 output	~$200–$600/mo
Claude Haiku	~$0.25 input / $1.25 output	~$30–$100/mo
Llama 3 (self-hosted)	Compute only	€200–€600/mo infra

For most SMB internal tools, expect roughly €200–€800/month in LLM costs, and that is before you factor in prompts that grow as you add more retrieved documents per query.

Scale that up to 10× the query volume and you're at €2,000–€8,000/month. That's the moment companies start looking at self-hosted models.
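The arithmetic behind these monthly estimates can be sketched in a few lines of Python. The per-query token counts below (4,000 input tokens for the prompt plus retrieved passages, 500 output tokens) are assumptions chosen for illustration, not measurements; the prices are the Claude Sonnet row from the table above, and a 30-day month is assumed.

```python
def monthly_llm_cost(
    queries_per_day: int,
    input_tokens_per_query: int,
    output_tokens_per_query: int,
    input_price_per_m: float,
    output_price_per_m: float,
    days_per_month: int = 30,
) -> float:
    """Estimated monthly API spend; prices are per 1M tokens."""
    queries = queries_per_day * days_per_month
    input_cost = queries * input_tokens_per_query / 1_000_000 * input_price_per_m
    output_cost = queries * output_tokens_per_query / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Assumed workload: 500 queries/day, 4,000 input and 500 output tokens per query.
baseline = monthly_llm_cost(500, 4_000, 500, 3.0, 15.0)
# Same traffic, but twice as much retrieved context in every prompt.
more_context = monthly_llm_cost(500, 8_000, 500, 3.0, 15.0)
# Ten times the query volume at the baseline prompt size.
ten_x = monthly_llm_cost(5_000, 4_000, 500, 3.0, 15.0)

print(f"Baseline:     ${baseline:,.2f}/mo")
print(f"More context: ${more_context:,.2f}/mo")
print(f"10x volume:   ${ten_x:,.2f}/mo")
```

Under these assumptions the baseline lands at about $290/month, doubling the retrieved context pushes it to about $470, and 10× the volume to about $2,900. Swap in your own token counts and current provider prices before using the result in a budget.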

Vector database

You need somewhere to store your document embeddings:

Option	Cost	Notes
Pinecone (managed)	$100–$400/mo	Easiest to start with
Weaviate Cloud	$25–$200/mo	Good balance of features
pgvector (self-hosted)	~€80–€200/mo infra	Lowest cost, more ops burden
Qdrant Cloud	$50–$200/mo	Good performance/cost ratio

For most projects: €100–€400/month for vector storage once you're past the free tier and running real workloads.
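Which tier you land in depends partly on how much vector data you store. As a back-of-the-envelope sketch, raw storage is the number of chunks times the embedding dimension times the bytes per value. The chunk count below is a made-up example; 1,536 is the default output dimension of OpenAI's text-embedding-3-small, and 4 bytes per value assumes float32 storage. Index structures and metadata add overhead on top, which this estimate ignores.

```python
def raw_vector_storage_gb(
    num_chunks: int,
    dimensions: int,
    bytes_per_value: int = 4,  # float32
) -> float:
    """Raw embedding storage in GB (10^9 bytes), excluding index overhead and metadata."""
    return num_chunks * dimensions * bytes_per_value / 1e9


# Hypothetical corpus: 1 million chunks embedded at 1,536 dimensions.
size_gb = raw_vector_storage_gb(1_000_000, 1_536)
print(f"Raw vector storage: {size_gb:.2f} GB")
```

For this example the raw vectors come to a little over 6 GB, which is a useful number to have in hand when comparing the plans in the table above.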

Embedding costs

Every time you ingest new documents, you generate embeddings. For ongoing ingestion (say, 1,000 new pages/week):

OpenAI text-embedding-3-small: ~$0.02 per 1M tokens — negligible at this scale
Cohere Embed: similar range
Self-hosted (e.g., bge-m3): compute cost only

Embedding costs are rarely a budget line item until you hit millions of documents.
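A quick calculation shows why. The sketch below assumes roughly 500 tokens per page, which is an illustrative guess that varies a lot by document type, and uses the text-embedding-3-small price quoted above.

```python
def embedding_cost_usd(
    pages: int,
    tokens_per_page: int,
    price_per_million_tokens: float,
) -> float:
    """API cost of embedding a batch of pages once."""
    return pages * tokens_per_page / 1_000_000 * price_per_million_tokens


# Ongoing ingestion from the example above: 1,000 new pages per week.
weekly = embedding_cost_usd(1_000, 500, 0.02)
# Hypothetical one-off backfill of 1 million pages.
backfill = embedding_cost_usd(1_000_000, 500, 0.02)

print(f"Weekly ingestion: ${weekly:.2f}")
print(f"Backfill of 1M pages: ${backfill:.2f}")
```

At these assumed volumes the weekly ingestion costs about one cent and even a million-page backfill costs about ten dollars, so the API bill for embeddings stays small next to the LLM and hosting lines.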

Hosting and compute

The application layer (API, frontend, background jobs):

Vercel + Railway: €150–€400/mo
AWS/GCP managed services: €300–€800/mo depending on traffic
Self-hosted Kubernetes: €500–€1,500/mo but you own it

Realistic total monthly infrastructure for a production SMB RAG system: €500–€1,500/month.

3. Hidden costs

⚠️ Most project quotes cover development only. Always ask for a 12-month total cost of ownership: development plus infrastructure plus maintenance. The gap between the headline number and the real first-year cost is often 2–3×.

These hidden costs are the ones that blow budgets after sign-off.

Data cleanup

If your documents are messy — inconsistent naming, duplicates, outdated content — someone has to clean them before they go into the system. This is often 2–4 weeks of analyst time that nobody quotes because it's "not really AI work."

Budget: €10k–€40k if you outsource this, or internal team time.

Change requests mid-project

"Actually, can it also pull from our CRM?" is the most expensive sentence in AI consulting. Scope changes after the retrieval architecture is set can cascade. Build a buffer of 20–30% for this.

Maintenance and updates

Models improve. Your data changes. The system needs occasional reindexing, prompt tuning, and dependency updates.

Budget: €500–€2,000/month for light maintenance, or structure a managed support retainer.

User adoption

The least technical cost is often the biggest: training staff to use the system, handling the skeptics, adjusting the UX based on real usage. Budget 10–20% of development cost for this phase.

What a realistic project looks like

Here's a real project profile we see often: a professional services firm (law, accounting, consulting) wants to let their staff query internal documents — contracts, precedents, client files — without digging through shared drives.

Scope:

3 data sources (SharePoint, email attachments, PDF archive)
50–200 users
Permission-based retrieval (each user sees only their client files)
Simple chat UI + Slack integration

Development cost: €70k–€110k
Timeline: 14–20 weeks
Monthly running cost: €800–€2,000/mo
Break-even vs. analyst time saved: 8–14 months
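The break-even figure follows from simple arithmetic: development cost divided by the net monthly benefit (time saved minus running cost). The sketch below uses the development and running costs from this profile; the €10,000/month value of analyst time saved is a hypothetical input chosen for illustration, not a figure from a real project.

```python
import math


def break_even_months(
    development_cost: float,
    monthly_savings: float,
    monthly_running_cost: float,
) -> float:
    """Months until cumulative net savings cover the one-time development cost."""
    net_monthly_benefit = monthly_savings - monthly_running_cost
    if net_monthly_benefit <= 0:
        # Running costs eat all the savings, so the project never pays back.
        return math.inf
    return development_cost / net_monthly_benefit


# Hypothetical: the tool saves EUR 10,000/month of analyst time.
best_case = break_even_months(70_000, 10_000, 800)
worst_case = break_even_months(110_000, 10_000, 2_000)

print(f"Best case:  {best_case:.2f} months")
print(f"Worst case: {worst_case:.2f} months")
```

With that assumed saving, the result is about 7.6 months at the low end of the cost ranges and 13.75 months at the high end, in line with the 8–14 months quoted above. The estimate is very sensitive to the savings figure, so measure it rather than guess.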

Build vs. buy vs. hire

Approach	Upfront	Monthly	Control	Time to value
SaaS tools (Notion AI, Guru, etc.)	€0	€200–€2,000	Low	Days
Build in-house	€0	Dev salaries	Full	6–18 months
Hire a consultancy	€40k–€200k	€500–€1,500	High	10–20 weeks

SaaS tools are fine for generic Q&A. The moment you need custom data sources, access controls, or integration into your existing tools, you need custom development.

How to scope your project before calling anyone

Before you talk to a vendor, answer these questions — they'll save you from wildly varying quotes:

How many data sources? List them. Note if any require special connectors.
How many users? And do different users need different data access?
What does "success" look like? Fewer support tickets? Faster onboarding? Measure it.
Where does the output go? Chat interface, Slack, internal portal, API?
What's your compliance posture? GDPR, ISO 27001, on-premise requirements?

Bring these answers to the first call and any serious vendor will give you a much tighter estimate.

Summary

Cost item	Range
Development (SMB production system)	€40k–€120k
Monthly infrastructure	€500–€1,500/mo
Data cleanup (if needed)	€10k–€40k
Ongoing maintenance	€500–€2,000/mo

First-year cost breakdown (SMB production system)

A realistic first-year total for a well-scoped internal knowledge system: €100k–€200k all-in, including development, infrastructure, and support.

That sounds like a lot until you compare it to the analyst hours saved, the support tickets avoided, or the onboarding time eliminated. For most companies that have passed the exploration phase, the ROI is measured in months — not years.
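For readers who want to check a quote against their own numbers, a first-year total can be sketched as development plus twelve months of infrastructure and maintenance, plus data cleanup if needed. The inputs below are the extremes of ranges quoted in this article: development €40k–€120k, infrastructure €500–€1,500/month, light maintenance €500–€2,000/month, and data cleanup of up to €40k. The function name is hypothetical.

```python
def first_year_cost(
    development: float,
    monthly_infrastructure: float,
    monthly_maintenance: float,
    data_cleanup: float = 0,
) -> float:
    """One-time costs plus 12 months of recurring costs."""
    recurring = 12 * (monthly_infrastructure + monthly_maintenance)
    return development + recurring + data_cleanup


# Cheapest combination: low end of every range, no data cleanup needed.
low = first_year_cost(40_000, 500, 500)
# Most expensive combination: high end of every range, full data cleanup.
high = first_year_cost(120_000, 1_500, 2_000, data_cleanup=40_000)

print(f"First-year range: EUR {low:,.0f} to EUR {high:,.0f}")
```

The extremes come out at €52,000 and €202,000. Few projects sit at the bottom of every range at once, which is why the realistic band quoted above starts higher than the arithmetic minimum.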

If you're ready to scope a project, book a discovery call — we'll walk through your data sources and give you a written estimate within a week.