Ochre & Co.


Fundamentals
April 2026

What "context" really costs you

Context is the most expensive thing about AI, and almost nobody budgets for it.

By context we mean what the model sees, when it sees it, and what happens to that information afterward. Models are cheap to call. The work of giving them the right context at the right moment is not — and that is where the real bill comes from.

If you have not yet read "AI vocabulary without the vendor speak," start there for the definitions behind the words we use here.


Context is not free, and it compounds

Every time you send a request to a model, the text that goes with it — instructions, question, reference material, conversation history — is called the context. You pay for every word of it. The model's attention is also paid out of that budget: the more you stuff in, the less of it the model actually uses well.
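
You can see that budget directly by counting it. A minimal sketch, assuming the open-source tiktoken tokenizer; the encoding name and the example pieces are illustrative, not prescriptive:

    # Count what one request actually sends, piece by piece.
    # Assumes `pip install tiktoken`; encoding name and strings are examples.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    def tokens(text: str) -> int:
        return len(enc.encode(text))

    context = {
        "instructions": "You are a contracts analyst. Answer only from the excerpts.",
        "reference": "…the retrieved document excerpts go here…",
        "history": "…prior turns of the conversation…",
        "question": "What is the termination notice period?",
    }

    for part, text in context.items():
        print(f"{part:>12}: {tokens(text):>6} tokens")
    print(f"{'total':>12}: {sum(tokens(t) for t in context.values()):>6} tokens")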

"Context" is three costs in a trench coat:

  1. The bill. What you pay your provider per request.
  2. The quality loss. What the model misses when it has too much to wade through.
  3. The governance cost. The overhead of knowing what went into the prompt, where it came from, who can see it, and how to unwind it.

Most teams track the first one, sometimes. Most ignore the other two until something breaks.


1. The bill

Token pricing is straightforward once you sit with it for five minutes. (A token is roughly three-quarters of a word.)

A page of a PDF is around 600 tokens. An email thread is 2,000 or so. A long contract is 20,000 to 40,000. A one-hour meeting transcript is around 10,000.

Per-token pricing moves every few months. The shape of the math does not: cost per request × number of requests = your monthly AI bill. That is trivial arithmetic, and people keep failing at it because the per-request cost is so small nobody checks it.

Where the bill gets away from people is rarely the price per token. It is the multiplication: conversation history resent in full with every turn, whole documents retrieved where three paragraphs would do, a per-request cost too small for anyone to check until it has been multiplied across a month of traffic.

If you cannot put the token cost of one run of a workflow on an index card, you are not managing it. You are absorbing it. A back-of-the-envelope version of that card is sketched below.
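
A minimal sketch of that index card in Python. Every number here is an assumption: the token sizes echo the rough estimates above, and the prices are placeholders for whatever your provider currently charges:

    # Token cost of one run of a workflow, small enough for an index card.
    # All prices and sizes are assumed placeholders -- substitute your own.
    PRICE_PER_1K_INPUT = 0.003    # USD per 1,000 input tokens (assumed)
    PRICE_PER_1K_OUTPUT = 0.015   # USD per 1,000 output tokens (assumed)

    one_run = {                       # rough sizes from the estimates above
        "instructions": 800,
        "retrieved_pages": 5 * 600,   # five PDF pages
        "history": 2_000,             # one email thread's worth
        "question": 100,
    }
    input_tokens = sum(one_run.values())
    output_tokens = 1_000             # assumed answer length

    cost_per_run = ((input_tokens / 1_000) * PRICE_PER_1K_INPUT
                    + (output_tokens / 1_000) * PRICE_PER_1K_OUTPUT)
    runs_per_month = 10_000           # assumed volume

    print(f"one run : {input_tokens:,} tokens in, ~${cost_per_run:.4f}")
    print(f"monthly : ${cost_per_run * runs_per_month:,.2f}")

Cost per request times number of requests: the index card is exactly that arithmetic, written down where someone can check it.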


2. The quality loss

The marketing message lately is "bigger context windows." Models can read a million tokens in one shot now. That is impressive. It is also misleading.

What the research consistently shows — and what every operator who has tried it has felt — is that as the context grows, the model's ability to use it gets worse. It pays close attention to what is at the start and end of a long prompt, and glazes over what is in the middle.
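
One way to work with that bias instead of against it: keep the instructions at the top, the question at the bottom, and let the best reference material sit nearest those edges, so whatever falls into the middle is the material you could most afford to lose. A sketch under those assumptions; the relevance scores are whatever your retrieval step already produces:

    # Order context so the weakest material, not the strongest,
    # lands in the poorly-attended middle. A character budget stands
    # in for a real token budget to keep the sketch self-contained.
    def assemble_prompt(instructions: str,
                        passages: list[tuple[float, str]],  # (score, text)
                        question: str,
                        budget: int) -> str:
        ranked = [text for _, text in sorted(passages, key=lambda p: -p[0])]
        kept, used = [], 0
        for text in ranked:          # best passages survive truncation
            if used + len(text) > budget:
                break
            kept.append(text)
            used += len(text)
        return "\n\n".join([instructions, *kept, question])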

The practical consequence: when an AI system is underperforming, the first question is rarely "are we on the best model?" It is: what is in the prompt, in what order, and why?


3. The governance cost

This is the cost that shows up in a risk register two years in.

Every word you put into a prompt is a word you have chosen to expose to the model. That text may be used for training, stored for monitoring, logged for your own records, or retrieved later by someone else's query — depending on how your provider is configured. Every answer the model produces is an answer your organization has effectively issued, grounded or not, reviewed or not.

Four practical consequences:

  1. Provenance: you need a record of what went into each prompt and where it came from.
  2. Access control: the model should see only what the person asking is allowed to see.
  3. An unwind path: if something should not have gone in, you need to know where it went and what it produced.
  4. Ownership: an answer the model produced on your behalf is one your organization has issued, reviewed or not.
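
A minimal provenance record is enough to start on all four. The field names are illustrative, and a JSONL file stands in for whatever store you actually use:

    # One record per model call: what went in, where it came from,
    # who sent it, what came out. Enough to audit or unwind later.
    import hashlib, json, time
    from dataclasses import asdict, dataclass, field

    @dataclass
    class PromptRecord:
        user: str              # who the context was exposed on behalf of
        sources: list[str]     # where each piece of the prompt came from
        prompt: str            # the exact text sent
        output: str            # the answer your organization issued
        ts: float = field(default_factory=time.time)

    def output_id(record: PromptRecord) -> str:
        # stable handle so a given output can be traced back to its prompt
        return hashlib.sha256(record.output.encode()).hexdigest()[:12]

    def log(record: PromptRecord, path: str = "prompt_log.jsonl") -> None:
        with open(path, "a") as f:
            f.write(json.dumps({**asdict(record), "id": output_id(record)}) + "\n")

A grep of that file by id is also most of principle 3 below: the exact prompt behind a given output, in well under sixty seconds.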


The real work has a name

Two years ago, the fashionable phrase was "prompt engineering." The more honest name — now used by the people actually building production systems — is context engineering.

It means designing what the model sees, when it sees it, and what happens to it afterward: the retrieval, the ordering, the caching, the logging, the evaluation.

This is not exciting. It is where roughly 80% of the quality, safety, and cost of a production AI system is actually determined. The model is a commodity. The context is the craft.


Four principles that survive contact with production

None of these are original to us. They are the ones that keep earning their keep in serious systems work.

  1. Retrieve narrow before you generate wide. Three right paragraphs beat a 200-page handbook.
  2. Separate durable from volatile. Stable instructions go in a cache-friendly layer; today's query and today's data go in the hot path (a message-shape sketch follows this list).
  3. Make the context inspectable. If an engineer cannot pull the exact prompt that produced a given output in under sixty seconds, your system is not debuggable. It is not accountable either.
  4. Measure before you optimize. Pick ten to fifty real examples, write down what "good" looks like, and run them. Your intuition about which model or prompt is better is almost always wrong in specific, repeatable ways (a minimal harness also follows this list).
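
Principle 2 as message structure, sketched against the common chat-messages convention rather than any one vendor's SDK. The durable layer is byte-identical on every request, which is what lets providers that cache shared prompt prefixes reuse it:

    # Durable layer: stable on every request, so prefix caching can apply.
    DURABLE_SYSTEM = (
        "You are a contracts analyst.\n"
        "Output format: ...\n"
        "Policies: ..."
    )

    # Volatile layer: today's data and today's question only.
    def build_messages(todays_data: str, question: str) -> list[dict]:
        return [
            {"role": "system", "content": DURABLE_SYSTEM},
            {"role": "user", "content": f"{todays_data}\n\n{question}"},
        ]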
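And principle 4 in a few lines. The cases and the pass condition are placeholders, and run_model is a stand-in for your actual provider call:

    # A fixed set of real examples plus a written-down notion of "good"
    # turns "which prompt is better?" into a number.
    CASES = [
        {"input": "What is the notice period?", "must_contain": "30 days"},
        # ...ten to fifty real examples...
    ]

    def run_model(prompt: str) -> str:
        raise NotImplementedError  # your provider call goes here

    def evaluate(make_prompt) -> float:
        passed = sum(
            case["must_contain"].lower()
            in run_model(make_prompt(case["input"])).lower()
            for case in CASES
        )
        return passed / len(CASES)

    # evaluate(prompt_v1) vs evaluate(prompt_v2): now "better" is a number.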

How we help

When you pay a consultant, an agency, or an internal team to "do AI" for you, the work they are actually doing — whether the deck says it or not — is context engineering. The model choice is twenty minutes of work. The prompts are a week. The retrieval, the evaluations, the cost discipline, the access control, the provenance, the operational runbook — that is the real work, and that is what separates a demo from a durable system.

We do not do that work for you. We build the discipline into your team. If that is what you are trying to build, two sentences on where you are stuck is enough to start a real conversation.


Going deeper

For the business-side version of the same idea — context at the organization level rather than inside a prompt — read "Organization context before models."


If something in here maps to a problem you are sitting on

Two sentences on what you are trying to do is enough to start. We reply personally; no sequences, no SDR handoff.
