If you've spent ten minutes reading about AI, you've hit three words that sound like they matter and never get explained properly: prompting, RAG, and fine-tuning. Vendors use them to sound clever. This guide tells you what each one actually is, when you'd reach for it, roughly what it costs, and — the part nobody selling you something wants to say — which one your business probably needs. Spoiler: it's usually not the expensive one.
The short version: Prompting is writing good instructions for an off-the-shelf model. RAG feeds the model your own documents so it can answer from your data instead of guessing. Fine-tuning retrains the model to behave a certain way. Start with prompting. Move to RAG when answers must come from your files. Fine-tune rarely, and only with a clear reason.
What is prompting, in plain terms?
Prompting is giving an AI model clear instructions and context so it produces what you want. No new software, no training, no build. You type (or your system sends) a well-structured request, and the model responds. That's it.
Good prompting is more than "write me an email". It means telling the model who it's acting as, what the task is, what good looks like, what to avoid, and giving it an example or two. The difference between a lazy prompt and a considered one is often the difference between output you bin and output you ship.
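To make that concrete, here is a minimal sketch of a considered prompt assembled from the parts listed above. The `PromptSpec` class, the `build_prompt` function and every example value are hypothetical, invented for illustration. Nothing here calls a real AI service; it only produces the text you would send.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the parts of a considered prompt."""
    role: str                # who the model is acting as
    task: str                # what it should do
    good_looks_like: str     # the quality bar for the output
    avoid: list[str] = field(default_factory=list)
    examples: list[str] = field(default_factory=list)


def build_prompt(spec: PromptSpec, source_text: str) -> str:
    """Join the parts into one plain-text request."""
    lines = [
        f"You are {spec.role}.",
        f"Task: {spec.task}",
        f"A good answer: {spec.good_looks_like}",
    ]
    if spec.avoid:
        lines.append("Avoid:")
        lines.extend(f"- {item}" for item in spec.avoid)
    for number, example in enumerate(spec.examples, start=1):
        lines.append(f"Example {number}: {example}")
    lines.append("Text to work from:")
    lines.append(source_text)
    return "\n".join(lines)


# Illustrative values only; swap in your own role, task and examples.
spec = PromptSpec(
    role="a customer support agent for a furniture retailer",
    task="Draft a reply to the customer email below.",
    good_looks_like="under 120 words, polite, ends with a clear next step",
    avoid=["promising delivery dates", "legal jargon"],
    examples=["Thanks for getting in touch. I've checked your order and ..."],
)
prompt = build_prompt(spec, "My sofa arrived with a torn cushion.")
print(prompt)
```

The point of the structure is repeatability: once the role, quality bar and examples are written down, every request carries them, instead of relying on whoever is typing that day.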
Here's the thing most people miss: a large share of business AI value is unlocked at this level and nowhere else. Drafting replies, summarising a long thread, turning messy notes into a clean brief, extracting the five facts you need from a document — a capable off-the-shelf model with a decent prompt handles all of it. Cost is pennies per task, or free on a consumer tier.
When prompting is enough:
- The knowledge the model needs is either general or short enough to paste into the request
- You can tolerate the occasional wobble and a human is checking the output
- The task is drafting, summarising, rewriting, classifying or reasoning over text you provide
Where prompting runs out of road: when the model needs to know things it was never trained on — your prices, your policies, last week's stock levels, the contents of your 400-page internal wiki. You can't paste all of that into every request, and if you try, you hit limits and costs climb. That's the moment RAG enters.
What is RAG (retrieval-augmented generation)?
RAG lets an AI answer using your own documents. Before the model replies, the system searches your content, finds the passages relevant to the question, and hands those passages to the model as context. The model then answers from what it just read. It's the difference between asking someone to recall a fact and asking them to read the file first, then answer.
The name is ugly but the idea is simple. "Retrieval" is the search step. "Augmented generation" means the model's answer is boosted by the retrieved material. Strip the jargon and RAG is: look it up, then answer.
Why this matters for a business: a plain model doesn't know your refund policy, your product catalogue or your internal procedures, and if you ask anyway, it may confidently make something up. RAG tackles that by grounding answers in real, current source material you control, which makes invented answers far less likely and easier to spot. When your prices change, you update the documents, not the AI.
A worked example. A property firm has 1,200 pages of tenancy guidance, terms, and internal process notes. Staff waste hours hunting for the right clause. A RAG system indexes all of it. An agent types "what's our notice period for a periodic tenancy in Scotland?" — the system retrieves the two relevant paragraphs and the model answers in seconds, quoting the source. Update the guidance next quarter, re-index, done.
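The "look it up, then answer" loop in that example can be sketched in a few lines. This is a toy under stated assumptions: it ranks passages by simple keyword overlap, whereas production systems typically use a proper search index or embedding-based search. The function names and the placeholder clauses are invented for illustration, and the final model call is left out.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Rank passages by how many words they share with the question."""
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(p)), p) for p in passages]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Drop passages with no overlap at all.
    return [passage for score, passage in ranked[:top_k] if score > 0]


def build_grounded_prompt(question: str, retrieved: list[str]) -> str:
    """Hand the retrieved passages to the model as numbered sources."""
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(retrieved, start=1))
    return (
        "Answer using only the sources below and cite the source number.\n"
        f"Sources:\n{sources}\n"
        f"Question: {question}"
    )


# Placeholder passages standing in for the firm's 1,200 pages.
passages = [
    "Clause A (placeholder): notice period rules for a periodic tenancy in Scotland.",
    "Clause B (placeholder): deposit handling for tenancies in England.",
    "Clause C (placeholder): office opening hours and holiday cover.",
]
question = "What's our notice period for a periodic tenancy in Scotland?"
retrieved = retrieve(question, passages)
print(build_grounded_prompt(question, retrieved))
```

Notice that the model never sees Clause C. Only the passages that match the question are sent, which is what keeps the request small and the answer traceable to a source.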
RAG is the right tool when:
- The answer must come from your documents, data or knowledge base
- That information changes over time and must stay current
- You need answers traceable to a source (useful for trust and compliance)
- The document set is too big to paste into a prompt
What RAG doesn't do: it doesn't make the model smarter or change its writing style. It changes what the model can see, not how it behaves. If your problem is tone or format, RAG won't touch it.
What is fine-tuning?
Fine-tuning retrains an existing model on your own examples so it learns a specific behaviour: a house style, a rigid output format, or a narrow specialist task. You feed it hundreds or thousands of paired examples ("given this input, produce this output"), and the training process adjusts the model's internal weights so it matches that pattern reliably.
The key distinction, and it trips up almost everyone: fine-tuning teaches behaviour, not facts. It won't reliably make a model memorise your product catalogue — that's RAG's job. What it does well is make a model consistently produce output in your exact format, or handle a repetitive, well-defined task where prompting keeps drifting off-target.
Real cases where fine-tuning earns its keep are narrower than the hype suggests: classifying support tickets into your specific 15 categories at high volume; generating output in a strict schema every single time; matching a distinctive brand voice across thousands of items where prompt instructions alone won't hold the line.
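For a sense of what "paired examples" look like in practice, here is a rough sketch for the ticket-classification case. The field names `input` and `output`, the three-category label set and the one-JSON-object-per-line layout are all assumptions for illustration; each provider defines its own training file format, so check theirs before preparing real data.

```python
import json

# Hypothetical label set; the ticket example in the text has 15 categories.
ALLOWED_CATEGORIES = {"billing", "delivery", "returns"}

# Invented training pairs: "given this input, produce this output".
examples = [
    {"input": "I was charged twice for my order.", "output": "billing"},
    {"input": "My parcel still hasn't arrived.", "output": "delivery"},
    {"input": "How do I send this chair back?", "output": "returns"},
]


def find_problems(pairs: list[dict]) -> list[str]:
    """Flag the issues that make training data inconsistent."""
    problems = []
    seen_inputs = set()
    for index, pair in enumerate(pairs):
        text = pair.get("input", "").strip()
        label = pair.get("output", "")
        if not text:
            problems.append(f"example {index}: empty input")
        if label not in ALLOWED_CATEGORIES:
            problems.append(f"example {index}: unknown category {label!r}")
        if text in seen_inputs:
            problems.append(f"example {index}: duplicate input")
        seen_inputs.add(text)
    return problems


def to_jsonl(pairs: list[dict]) -> str:
    """Serialise the pairs as one JSON object per line."""
    return "\n".join(json.dumps(pair) for pair in pairs)


print(find_problems(examples))
print(to_jsonl(examples))
```

Most of the real effort sits in the checking step. A few hundred clean, consistently labelled pairs tend to be worth more than thousands of messy ones.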
Fine-tuning is worth considering when all of these are true:
- You've already tried strong prompting (and RAG, if relevant) and hit a real ceiling
- The task is narrow, repetitive and high-volume enough to justify the effort
- You have — or can create — clean, consistent training examples, usually a few hundred at minimum
- Consistency of behaviour is the problem, not access to information
If you can't tick most of those, fine-tuning is premature. It's the approach most often bought for prestige and least often needed.
RAG vs fine-tuning vs prompting: the comparison
| | Prompting | RAG | Fine-tuning |
|---|---|---|---|
| What it does | Instructs an off-the-shelf model well | Feeds the model your documents at answer time | Retrains the model to behave a certain way |
| Changes | Nothing — you just write better requests | What the model can see | How the model behaves |
| Best for | Drafting, summarising, extracting, reasoning | Answering from your private, current data | Fixed style or format, narrow high-volume tasks |
| When to use | Almost always start here | Answers must come from your files | Rarely, and only after the other two fall short |
| Data needed | None beyond what you paste in | Your documents, kept current | Hundreds to thousands of labelled examples |
| Rough cost | Pennies per task, or free tier | Low-to-mid five figures to build, plus running cost | Highest: data prep, training and retraining over time |
| Keeps facts current? | Only what you provide each time | Yes — update the source documents | No — retrain to change anything |
| Speed to value | Immediate | Weeks | Longest |
Read that table top to bottom before you read it left to right. The honest pattern: cost and effort climb as you move right, and the number of businesses that genuinely need the rightmost column is small.
Which one does my business actually need?
Start with prompting. Move to RAG when the answer must come from your own documents or data. Reach for fine-tuning only when you've proven prompting and RAG can't deliver a consistent behaviour, and you have the training data to support it. Most businesses land on prompting, RAG, or a blend of the two — and never need fine-tuning at all.
Work through it in this order:
- Can a well-written prompt on an existing tool do this? Try properly before spending anything. Most drafting, summarising and extraction tasks stop here.
- Does the answer need to come from our specific, current information? If yes, that's RAG. This covers internal knowledge bases, customer-facing answers grounded in your policies, and search over your own document library.
- Is the remaining problem that the model won't stick to our exact style or format, at scale? Only now consider fine-tuning — and even then, a stronger prompt or a template often solves it cheaper.
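The three questions above can be written down as a small decision function. This is only a sketch of the ordering, not a substitute for judgement; the function name and the four yes/no inputs are hypothetical.

```python
def choose_approach(
    prompt_is_enough: bool,
    needs_own_current_data: bool,
    behaviour_still_drifts: bool,
    has_training_examples: bool,
) -> list[str]:
    """Return the approaches to use, in the order you would adopt them."""
    plan = ["prompting"]  # always the starting point
    if prompt_is_enough:
        return plan
    if needs_own_current_data:
        plan.append("rag")
    # Fine-tuning only when behaviour is the problem AND the data exists.
    if behaviour_still_drifts and has_training_examples:
        plan.append("fine-tuning")
    return plan


# A drafting task a good prompt already handles.
print(choose_approach(True, False, False, False))
# An internal knowledge base that must answer from company documents.
print(choose_approach(False, True, False, False))
# Format drift at scale, but no training examples yet: stay put for now.
print(choose_approach(False, False, True, False))
```

The design choice worth noticing is that fine-tuning never appears on its own. It is only ever added on top of prompting, and only when both conditions hold.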
The two most common real-world setups: good prompting alone for general productivity, and RAG for anything that has to answer from your data. A blend — RAG that retrieves your documents, wrapped in a carefully engineered prompt — covers the majority of serious business AI systems we build.
What does each one cost to run, not just to build?
Build cost is the headline; running cost is the bill that keeps arriving. Prompting is close to free — you pay per request, and requests are cheap. RAG adds ongoing costs for storing and searching your documents plus the model calls themselves, and someone has to keep the source content current. Fine-tuning carries the heaviest ongoing burden: every time your needs shift, you retrain.
A rough mental model:
- Prompting — negligible per task. Your real cost is the time spent writing good prompts and checking output.
- RAG — a monthly floor for hosting the search index and per-query model costs that scale with usage, plus the human effort of keeping documents accurate. Predictable, but not zero.
- Fine-tuning — upfront data preparation and training, then repeated retraining as your data or requirements change. The costs that bite are the ones people forget to budget for.
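The mental model above boils down to simple arithmetic, sketched here. Every figure is a placeholder invented purely to show how the pieces add up, not a quote or a benchmark, and the function and parameter names are hypothetical. Plug in your own numbers.

```python
def monthly_running_cost(
    requests_per_month: int,
    cost_per_request: float,
    hosting_floor: float = 0.0,
    upkeep_hours: float = 0.0,
    hourly_rate: float = 0.0,
    retrains_per_year: int = 0,
    cost_per_retrain: float = 0.0,
) -> float:
    """Add up usage, hosting, human upkeep and retraining per month."""
    usage = requests_per_month * cost_per_request
    upkeep = upkeep_hours * hourly_rate
    # Spread the yearly retraining bill evenly across twelve months.
    retraining = retrains_per_year * cost_per_retrain / 12
    return usage + hosting_floor + upkeep + retraining


# Placeholder inputs only: same request volume, three different setups.
prompting = monthly_running_cost(2_000, 0.01)
rag = monthly_running_cost(
    2_000, 0.02, hosting_floor=50, upkeep_hours=4, hourly_rate=40
)
fine_tuned = monthly_running_cost(
    2_000, 0.01, retrains_per_year=4, cost_per_retrain=1_500
)

for name, cost in [("prompting", prompting), ("rag", rag), ("fine-tuned", fine_tuned)]:
    print(f"{name}: {cost:.2f} per month")
```

Laying it out like this makes the forgotten lines visible: the upkeep hours for RAG and the retraining line for fine-tuning are the ones that rarely appear in a build quote.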
If you want the running-cost maths done properly for a specific system — token spend, hosting, the lot — that's exactly what our £8k AI System Audit produces: a 30-page report with real numbers, not estimates on a napkin. For ballpark build costs across automations, websites and custom systems, our pricing guide lays out the full ladder.
How do I tell which one I'm being sold?
Plenty of AI proposals lead with the most expensive option because it sounds the most impressive. A few questions cut through the pitch and tell you whether the approach fits the problem.
- "Why not just a better prompt first?" If the honest answer is "we didn't try", the proposal is skipping the cheapest step. Prompting should be ruled out before anything is built, not after.
- "Is the problem missing information, or missing behaviour?" Missing information — the model doesn't know your data — points to RAG. Missing behaviour — it won't hold a format or tone — points towards fine-tuning. If the proposal reaches for fine-tuning to solve an information problem, that's a mismatch.
- "How will you keep the answers current?" RAG updates when you update the documents. Fine-tuning needs retraining. If your data changes weekly and someone's proposing to bake it into the model by fine-tuning, the running cost will hurt.
- "What does this cost to run each month, not just to build?" A vague answer here is a warning. The running cost is where badly chosen approaches quietly drain budgets.
None of this requires you to be technical. It requires you to make the person explain their reasoning in plain terms — and a good engineer welcomes that, because the right approach is easy to justify.
The honest verdict
Most businesses need good prompting or RAG. Not fine-tuning.
Prompting is where you start and where a surprising amount of value lives — the cheapest, fastest lever most companies never pull properly. RAG is the workhorse for anything that must answer from your own information, and it's what genuinely differentiates a custom AI system from a chatbot that guesses. Fine-tuning is a specialist tool for a narrow set of problems, oversold because it sounds impressive and carries the biggest invoice.
If someone's first suggestion is to fine-tune a model for your business, ask them why prompting and RAG won't do the job. If they can't answer clearly, you're being sold complexity, not a solution.
The right sequence saves you a fortune: prove it with a prompt, ground it with RAG, and fine-tune only when the evidence forces your hand. Get that order wrong and you spend five figures solving a problem a good sentence would have fixed.
If you're weighing which approach fits a specific job — and whether it's worth building at all — the AI System Audit exists to answer exactly that, in writing, before you commit to a build. It's also worth understanding the difference between AI, machine learning, LLMs and agents so the sales conversations stop washing over you, and our AI glossary has plain-English definitions with an honest verdict on each term.