AI Engineering · Canarlo, Leeds
AI engineering for founders who need it in production, not in a deck.
Canarlo is a Leeds engineering studio building production AI for technical founders. Agents, RAG, integrations — shipped with evals in CI, cost dashboards, and a written runbook. Built in, not bolted on.
Who we work with
Technical founders and engineering leads, from pre-launch to growth.
Your prototype works in the notebook. It works in the demo. Then a real customer pastes in a real document, and the answer comes back confidently wrong. Or it comes back right — and the bill is four times the spreadsheet.
That gap is the work. A demo never finds the failure modes production does. Costs scale with usage until someone sets a budget. Regressions stay silent until you build the gate that catches them.
We work with technical buyers who already know this: a founder who reads the diff, a team that needs a second pair of hands on the AI surface. Not a large in-house team looking for extra headcount. Not a no-code chatbot on a marketing site. Production AI for technical buyers is the whole remit.
The problem
Most AI engagements fail in three predictable places.
The demo cliff. A prototype nails the ten cases the team rehearsed. Then it goes live. Case eleven is a 200-page PDF. Case twelve is in Welsh. Real traffic doesn’t stay on the happy path the prototype was tuned on, and the engagement ended at the demo, before anyone built a test set wide enough to catch it.
The silent regression. Someone tweaks a prompt to fix one ticket. Two weeks later, five other workflows are worse and nobody connects it to the edit. Without a regression gate in CI, the first signal is a customer noticing. Then the provider updates a model — and it happens again, with no change on your side.
The lock-in trap. One provider. Prompts tuned to one model’s quirks. The test set, if it exists, lives in one engineer’s notes. The provider retires the model or doubles the price — and the swap is a rebuild, not a setting. The part that’s yours was never separated from the part you rent.
All three are preventable. None of them are prevented by the way most agencies sell the work.
What we build
Agents, RAG, evals, integrations, LLMOps, strategy.
Agents
Software that does the job, not just chats
An AI worker that takes a task end to end — reads the input, chooses the next step, acts inside set permissions, and hands back to a person when unsure. With a kill switch and a record of every action. Not a chatbot demo.
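As a rough illustration of those controls, here is a minimal Python sketch of a guard around each agent step: an allowlist of tools (the set permissions), a confidence threshold below which the task goes back to a person, a kill switch, and an audit-log entry for every decision. The names `Action` and `AgentGuard`, the 0.8 threshold, and the assumption that the model reports a usable confidence score are all hypothetical, not a description of any specific Canarlo system.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    """One step the model proposes (hypothetical shape)."""
    tool: str          # which tool the model wants to call
    args: dict         # keyword arguments for that tool
    confidence: float  # model-reported confidence, 0.0 to 1.0 (assumed available)


@dataclass
class AgentGuard:
    allowed_tools: set[str]        # the "set permissions": tools this agent may call
    min_confidence: float = 0.8    # below this, hand the task back to a person
    killed: bool = False           # kill switch: when True, no action runs
    audit_log: list[dict] = field(default_factory=list)

    def run(self, action: Action, tools: dict[str, Callable[..., str]]) -> str:
        if self.killed:
            outcome = "halted: kill switch engaged"
        elif action.tool not in self.allowed_tools:
            outcome = f"refused: {action.tool} not permitted"
        elif action.confidence < self.min_confidence:
            outcome = "escalated: handed back to a person"
        else:
            outcome = "done: " + tools[action.tool](**action.args)
        # Record every decision, including refusals and escalations.
        self.audit_log.append({"tool": action.tool, "args": action.args, "outcome": outcome})
        return outcome


# Example: the agent may tag tickets but not close them.
guard = AgentGuard(allowed_tools={"tag_ticket"})
tools = {"tag_ticket": lambda ticket_id, tag: f"tagged {ticket_id} as {tag}"}
print(guard.run(Action("tag_ticket", {"ticket_id": 42, "tag": "billing"}, 0.95), tools))
# prints "done: tagged 42 as billing"
print(guard.run(Action("close_ticket", {"ticket_id": 42}, 0.99), tools))
# prints "refused: close_ticket not permitted"
```

The permission check runs before the confidence check on purpose: a confident request for a forbidden tool is still refused, and the refusal is still logged.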
RAG
Answers grounded in your documents
Ask a question, get an answer that cites its source — and flags when it isn’t sure. The confident-but-wrong answers get caught before a customer sees them, not after.
Evals
Proof it still works before you ship
A test suite for AI output that fails the build when quality drops. Changing the model becomes a number you can read, not a gamble. Start with the £8k audit.
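In its simplest form, a regression gate compares each eval case’s score on the new build against a stored baseline and returns a non-zero exit code when any case drops, which is what fails the CI job. The sketch below assumes scores on a 0 to 1 scale per case, a hypothetical tolerance of 0.02, and made-up case names; loading real results and scoring the model’s outputs are left out.

```python
from __future__ import annotations


def find_regressions(baseline: dict[str, float], current: dict[str, float],
                     tolerance: float = 0.02) -> list[str]:
    """Return eval cases whose score fell by more than `tolerance`.

    A case missing from the current run counts as a regression.
    """
    regressions = []
    for case, old_score in baseline.items():
        new_score = current.get(case)
        if new_score is None or old_score - new_score > tolerance:
            regressions.append(case)
    return regressions


def gate(baseline: dict[str, float], current: dict[str, float],
         tolerance: float = 0.02) -> int:
    """Print each regression and return a process exit code (1 fails the build)."""
    failed = find_regressions(baseline, current, tolerance)
    for case in failed:
        print(f"REGRESSION {case}: {baseline[case]:.2f} -> {current.get(case)}")
    return 1 if failed else 0


# Hypothetical scores: a prompt edit nudged one case up and quietly broke another.
baseline = {"long_pdf": 0.91, "welsh_query": 0.88, "refund_policy": 0.95}
current = {"long_pdf": 0.90, "welsh_query": 0.71, "refund_policy": 0.96}
exit_code = gate(baseline, current)  # flags welsh_query and returns 1
# A CI script would finish with: raise SystemExit(exit_code)
```

The tolerance absorbs run-to-run noise on small drops; treating a missing case as a failure stops a shrinking test set from passing silently.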
Integrations
AI wired into the systems you already run
Real workflows across your tools — many systems, branching logic, retries that know an outage from a hiccup. Wired in safely, with a record of every change it makes.
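One way to tell an outage from a hiccup, sketched in Python under stated assumptions: errors carry an HTTP-style status code, a hypothetical set of codes counts as transient and is retried with exponential backoff, anything else fails fast, and running out of retries is raised as an outage rather than retried forever. The `CallError` and `OutageError` types and the status-code set are illustrative choices, not a fixed rule.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumed classification: these status codes are treated as a hiccup worth retrying.
TRANSIENT_STATUSES = {408, 429, 500, 502, 503, 504}


class CallError(Exception):
    """A failed call to an external system, carrying an HTTP-style status code."""

    def __init__(self, status: int):
        super().__init__(f"status {status}")
        self.status = status


class OutageError(Exception):
    """Retries exhausted on transient errors: treat it as an outage and alert."""


def call_with_retries(fn: Callable[[], T], max_attempts: int = 4,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except CallError as err:
            if err.status not in TRANSIENT_STATUSES:
                raise  # e.g. 400 or 401: retrying cannot help, so fail fast
            if attempt == max_attempts:
                raise OutageError(f"{max_attempts} transient failures in a row") from err
            # Exponential backoff: 0.5s, 1s, 2s with the defaults.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("not reached: every attempt returns or raises")
```

Passing `sleep` in as a parameter keeps the backoff testable without real waiting; a production version would plausibly add jitter so many workers do not retry in lockstep.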
LLMOps
Know what it costs and when it breaks
Live dashboards for spend per task. Alerts when quality slips or latency climbs. A hard budget per workflow. The model bill is a line you control, not a surprise.
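A hard per-workflow budget can be as simple as refusing any call that would take a workflow past its cap. This minimal sketch assumes the cost of each call is estimated from token counts before it is made; the per-1,000-token prices, limits, and workflow names are made-up placeholders, not real provider pricing.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised instead of making a call that would breach the workflow's cap."""


@dataclass
class WorkflowBudget:
    limit_gbp: float
    spent_gbp: float = 0.0

    def charge(self, input_tokens: int, output_tokens: int,
               gbp_per_1k_in: float, gbp_per_1k_out: float) -> float:
        """Reserve the estimated cost of one model call, or refuse it."""
        cost = (input_tokens / 1000 * gbp_per_1k_in
                + output_tokens / 1000 * gbp_per_1k_out)
        remaining = self.limit_gbp - self.spent_gbp
        if cost > remaining:
            raise BudgetExceeded(f"call needs £{cost:.4f}, only £{remaining:.4f} left")
        self.spent_gbp += cost
        return cost


# One budget per workflow (names, limits, and prices are placeholders).
budgets = {
    "support_triage": WorkflowBudget(limit_gbp=50.0),
    "kyc_extraction": WorkflowBudget(limit_gbp=200.0),
}
budgets["support_triage"].charge(1200, 300, gbp_per_1k_in=0.002, gbp_per_1k_out=0.008)
```

The refusal is deliberately blunt: a `BudgetExceeded` error can feed the same alerting as a quality drop, rather than surfacing as a bill at the end of the month.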
Strategy
Know what to build before you commit
Not sure where to start? The Readiness Assessment ranks the options by effort and payoff, names the data you’re missing, and sketches the build — before you spend on it.
Sectors
The same controls, shaped to your sector.
The capabilities above, applied to the data and rules your industry runs on. A few examples — not a fixed menu.
Fintech
- KYC document extraction
- Transaction categorisation
- Fraud-signal triage, human in the loop
Ecommerce & retail
- Support triage and routing
- Catalogue search that survives typos
- Returns and dispute handling
Edtech
- Rubric-scored auto-marking
- Lesson content with citations
- Student support that escalates when unsure
Healthtech
- Clinical-document summarisation
- Intake triage with audit trails
- Built for GDPR, with replay-ready logs
Legal & professional
- Contract review with clause-level citations
- Matter and case intake
- Research grounded in your own precedents
Recruitment & HR
- Candidate-to-role matching
- Explainable shortlisting
- Structured screening, bias-checked and logged
Not listed? The pattern is the same — sensitive data, real workflows, controls that hold up. Tell us your sector →
How we work
Three transparent tiers. Fixed fees where possible.
Fixed fees, scope defined in writing before billing starts. No day-rate clock. If the work runs long, that’s on us.
AI System Audit
£8,000
Fixed fee
A diagnostic for teams with an AI prototype, workflow, or live system. We review architecture, eval coverage, prompt versioning, observability, cost exposure, data risks, and lock-in. You get a written report, risk register, prioritised fixes, and a build plan.
AI MVP
£25,000–60,000
8 weeks
Production-grade MVP with eval gates, observability, and a runbook from day one.
Production AI Build
£60,000–100,000+
Scoped per project
End-to-end agent / RAG / integration build with eval CI, cost observability, and audit logging.
Full pricing rationale and cost breakdown: How much does AI engineering cost?
Case studies
Shipped, in production, owned by the client.
Case study · Private build
Recruitment matching platform
Candidate-to-role matching, admin workflows, structured search, and explainable recommendations. Built as a production system, with owned schemas, test sets, and audit trails.
Case study · Example system
Ecommerce support triage
Ticket classification, low-confidence escalation, intent routing, and an audit log on every AI-assisted action.
Frequently asked
Questions technical founders ask before they engage.
How is Canarlo different from a generic AI consulting firm?
We ship code, not slideware. Every engagement leaves you with a working production system, an eval harness in your CI, observability dashboards, and a written runbook. We specialise in technical buyers — founders and CTOs — and we own the production-grade end of the spectrum, between off-the-shelf automation and six-figure enterprise platforms.
What does an engagement cost?
Three transparent tiers: AI System Audit (£8,000 fixed-fee diagnostic), AI MVP (£25,000–60,000 over 8 weeks), Production AI Build (£60,000–100,000+ scoped per project). Full pricing rationale at /how-much-does-ai-consulting-cost.
How do you handle vendor lock-in and model deprecation?
We build on portable primitives. You rent the plumbing — inference, hosting — and own the brain: memory schemas, prompt registries, eval sets, audit logs, tool permissions. When a model is deprecated or pricing changes, you swap providers in days, not quarters.
Do you handle UK GDPR and the EU AI Act?
Yes. Row-level security (RLS) on data, audit logging, policy registries, and replay-ready logs are built into every Canarlo system. We produce the artefacts your DPO and external auditors will ask for as part of delivery, not after the fact.
How fast can we have something in production?
Eight weeks for an MVP that ships behind a feature flag with eval gates and observability. Two weeks to ship the eval harness alone. We don't ship demos — what goes live is production-ready or it doesn't go live.
What happens if our AI system breaks in production?
Every Canarlo system ships with a written incident runbook, alerting on the metrics that matter (eval regression, cost spike, latency cliff, tool-call failure rate), and circuit breakers for agent-driven workflows. Read our public post-mortem 'When Our Agent Closed The Wrong Support Tickets' for the failure mode we now design against.
Do you work with startups pre-Series A?
Yes, especially if you're a technical founder or have a CTO who wants a strategic partner rather than a body shop. Our £8,000 AI System Audit is designed as a low-friction first engagement.
Start here
Production AI. Evals in CI. You own the brain.
Twenty-minute call to scope the work. Or start with the £8k audit — a written report, risk register, and prioritised build plan.