Ochre & Co.


Fundamentals
April 2026

AI vocabulary without the vendor speak

Most of the confusion on an AI sales call is lexical, not strategic. A term gets used in two senses inside the same sentence, and the room nods along. An hour later the buyer is holding a proposal for a thing they did not quite mean to buy.

This is a plain-English reference for the words you will hear most. Every definition ends with one line about what it actually feels like on the ground, because the dictionary version is rarely the part that matters when you are writing the check.

We are not trying to be exhaustive. We are trying to give you enough working vocabulary that the next time a vendor uses one of these words, you can ask a harder follow-up question instead of nodding.


A note before the glossary

When someone throws a term at you, you are allowed to stop the conversation and ask, "What do you mean by that — in this specific project?" Nine times out of ten the answer narrows a very expensive ambiguity. The few times you get pushback from the person using the word are themselves the signal.

We use the same rule internally. If we cannot write a plain sentence defining a term in the work we are doing for you, we do not let that term survive in the plan.


Models and what runs them

Model. The thing that takes text (or images, or audio) in and produces text out. Nothing more magical than that at this layer. A model is a file — a very large one — that has been trained on a very large amount of data. When you "use ChatGPT" you are sending requests to a model that runs on someone else's computers. On the ground: The model is not where most of your problems live. It is almost always downstream of how you are feeding it.

Large language model (LLM). A model specifically trained to work with language. Most of what people mean when they say "AI" today is an LLM behind a chat interface. On the ground: Treat "LLM" and "model" as interchangeable for almost every business conversation.

Foundation model. A general-purpose, large-scale model — the kind built by OpenAI, Anthropic, Google, Meta, and a handful of others — that is meant to be used as the base layer for many applications. Contrast with narrower, task-specific models. On the ground: If someone says "our proprietary foundation model," ask what it is fine-tuned from. Real foundation models are capital-intensive and rare.

Frontier model. The current best-available model at a given time (GPT-5.x, Claude Opus 4.x, Gemini 3.x class). The label moves every few months. On the ground: Whether you need the frontier model for your use case is a real question. Most internal knowledge tasks do not.

Open-weight vs. closed model. Open-weight means the file is downloadable and you can run it on your own hardware (Llama, Qwen, Mistral families, etc.). Closed means you only access it through the vendor's API. On the ground: Open-weight is not the same as free. Running it means paying for the GPUs. The real question is data boundary and control, not ideology.

Parameters. The internal numbers a model has learned during training — billions or trillions of them — that determine how it responds. Counted in B (billion) or T (trillion). On the ground: More parameters ≠ better for you. A well-trained 8B model will beat a lazy 70B model on many real tasks.


Input, output, and what you pay per

Token. The unit the model actually reads and writes. Roughly three-quarters of a word in English, on average. Every word you send in and every word you get back is counted in tokens, and that is what you pay for. On the ground: "This costs $0.005 per 1K input tokens" is the real price tag. Budgets blow up when people send whole documents on every request without thinking.
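To make that price tag concrete, here is the arithmetic in miniature. The rates and token counts below are illustrative, not any vendor's real pricing:

```python
# Back-of-envelope cost model for a token-priced API.
# Rates are illustrative placeholders, not real vendor prices.

def estimate_cost(input_tokens: int, output_tokens: int,
                  in_per_1k: float = 0.005, out_per_1k: float = 0.015) -> float:
    """Dollar cost of one request at per-1K-token rates (output often costs more)."""
    return input_tokens / 1000 * in_per_1k + output_tokens / 1000 * out_per_1k

# Sending a ~15,000-word document is roughly 20,000 tokens at ~0.75 words/token.
per_call = estimate_cost(20_000, 1_000)
print(f"${per_call:.3f} per request")
print(f"${per_call * 1000:.0f} per 1,000 requests")
```

The second line is where budgets blow up: a cost that looks trivial per request gets multiplied by every employee, every day, on every question.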

Prompt. The text you send to the model — system instructions, user question, any reference material you include. The prompt is not a magic spell. It is a brief. On the ground: "Better prompting" is usually shorthand for "clearer thinking about what you actually want." It is a writing problem, not a trick.

Completion / output. What the model gives you back. On the ground: You pay for this too, usually at a higher rate than input. Long, chatty outputs are more expensive than short, disciplined ones.

Temperature. A setting (0–2 in most APIs) that controls how predictable the output is. Low = more consistent and boring; high = more varied and creative. Zero is not deterministic, but it is close. On the ground: For internal ops work, temperature near zero is almost always right. For drafting, marketing, or ideation, you want it higher.
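Under the hood, temperature rescales the model's next-token scores before they become probabilities. This toy sketch uses made-up scores; the scaling is the real mechanism:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide scores by temperature before softmax; lower T sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate next tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
# Low temperature piles probability onto the top choice;
# high temperature flattens the distribution toward uniform.
```

This is why low temperature reads as consistent and boring: the most likely token wins almost every time, so the same question tends to get the same answer.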

Context window. The maximum amount of text — measured in tokens — the model can hold in its working memory at one time. Modern models range from a few thousand tokens to more than a million. On the ground: "We have a million-token context window" is not a solved problem. Models get worse at using long contexts the closer they get to the edge. Stuffing the window is not a strategy.


Retrieval and memory

Embedding. A model's way of turning a piece of text (or an image) into a long list of numbers — a vector — that captures its meaning in a form a computer can compare. On the ground: You do not need to understand the math. You need to know that embeddings are how software finds similar things, not just matching things.
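The comparison itself is simple math. Here is a sketch with toy four-dimensional vectors standing in for real embeddings, which have hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors: near 1.0 = similar meaning, near 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors invented for illustration; a real embedding model produces these.
invoice  = [0.9, 0.1, 0.0, 0.2]
receipt  = [0.8, 0.2, 0.1, 0.3]
vacation = [0.0, 0.9, 0.8, 0.1]

print(cosine_similarity(invoice, receipt))   # high: similar meaning
print(cosine_similarity(invoice, vacation))  # low: different meaning
```

Note that "invoice" and "receipt" score as similar without sharing a single letter. That is the whole trick: similar things, not matching things.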

Vector database. Storage optimized for those embeddings, so you can ask "find me the ten most similar chunks to this question" quickly. Pinecone, pgvector, Weaviate, Qdrant, and others live here. On the ground: A vector database is plumbing, not a strategy. Whether you need one is downstream of whether retrieval is the right approach for your problem.

Retrieval-augmented generation (RAG). A pattern where, instead of training a model on your documents, you keep the documents in a database, and at query time you find the relevant pieces and paste them into the prompt for the model to read. On the ground: Most "chat with your documents" products are a RAG pipeline. The quality of a RAG system is usually the quality of its chunking and retrieval, not the model on top.
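The whole pattern fits in a few lines. This sketch uses naive word overlap where a real system would use embedding similarity; the shape of the pipeline is the point:

```python
# Minimal RAG shape: retrieve relevant chunks, paste them into the prompt.
# Scoring here is naive word overlap standing in for embedding similarity.

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank stored chunks by overlap with the question, return the top k."""
    q_words = set(question.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question: str, chunks: list[str]) -> str:
    """Paste the retrieved chunks into the prompt for the model to read."""
    context = "\n---\n".join(retrieve(question, chunks))
    return f"Answer using only the sources below.\n\n{context}\n\nQuestion: {question}"

docs = [
    "Refunds are processed within 14 days of the return being received.",
    "Shipping to EU countries takes 3-5 business days.",
    "Gift cards cannot be refunded or exchanged for cash.",
]
print(build_prompt("How long do refunds take?", docs))
```

Notice that the model never sees the documents that were not retrieved. If the retrieval step picks the wrong chunks, the best model in the world answers from the wrong material.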

Fine-tuning. Continuing the training process on your own data to nudge a model's behavior or style. Different from prompting, and different from RAG. On the ground: Fine-tuning is usually the wrong first move. It is expensive, slow, and tends to ossify mistakes. Try prompting and retrieval first.

Memory. A loose term for "the system remembers what I told it earlier." Can mean anything from "we pass the last few messages back in" to "we store summaries in a database and look them up." On the ground: "It has memory" almost never means what you think it means. Ask where the memory lives, who can read it, and what happens when it is wrong.


Agents, tools, and the word we are careful about

Tool (in an LLM context). A function the model can call — a database lookup, a web search, a calendar write — where the model produces a request and software does the actual work. On the ground: Every tool a model can call is an interface you own. If the tool is wrong, the model will confidently use it wrong.
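From the software side, tool use looks like this. The tool name and request format are invented for illustration; real vendor APIs differ in detail, not in kind:

```python
import json

def lookup_order(order_id: str) -> dict:
    # Stand-in for a real database query.
    return {"order_id": order_id, "status": "shipped"}

# Every entry in this table is an interface you own.
TOOLS = {"lookup_order": lookup_order}

def dispatch(model_output: str) -> dict:
    """Parse the model's structured tool request and run the matching function."""
    request = json.loads(model_output)
    tool = TOOLS.get(request["tool"])
    if tool is None:
        raise ValueError(f"model asked for an unknown tool: {request['tool']}")
    return tool(**request["arguments"])

# What a model's tool request might look like on the wire:
result = dispatch('{"tool": "lookup_order", "arguments": {"order_id": "A-1042"}}')
print(result)
```

The model only ever produces the JSON string. Your code decides whether to honor it, which is exactly where the validation and permission checks belong.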

Tool use. The broader pattern of a model deciding which tools to call and in what order. On the ground: When it works, this feels magical. When it fails, it fails expensively. Logs and guardrails are not optional.

Agent. A system that takes a goal, makes a plan, uses tools, and iterates without a human pressing "go" between each step. There is no universal line between "complicated prompt chain" and "agent" — it is a spectrum. On the ground: A lot of "agents" in the wild are three function calls in a loop with aggressive marketing. That is not necessarily bad — just know what you are buying.

Multi-agent system. Several specialized agents that hand work back and forth to each other. Useful in specific shapes of problem, oversold in almost every other shape. On the ground: Jake Van Clief's published work on folder-based agent architecture makes a point we agree with: for sequential, human-reviewed workflows, a filesystem and one disciplined agent usually beats a pile of frameworks.

Orchestration. The coordination layer — who calls what, in what order, with what context. You will hear this word a lot. We do not love it as a public-facing term because it tends to hide where the actual decisions live, but it is real, and when someone sells you "AI orchestration" they usually mean a workflow engine. On the ground: If your vendor cannot draw the orchestration on a napkin, they do not understand it either.

Workflow. A defined sequence of steps, some of which may be done by a model and some by software or a human. Less fashionable than "agent," often more useful. On the ground: Most real production AI inside businesses today is workflow, not agent. That is a feature.


Quality, evaluation, and failure

Hallucination. When a model produces confident-sounding output that is wrong. Not a rare exception. A default mode of failure. On the ground: If your vendor says "we do not hallucinate," they are either selling you something very narrow or they are not being careful. Ask how they detect it, not whether they prevent it.

Grounding. Tying a model's output to a specific source — a document, a database row, a search result — so you can check it. On the ground: "Grounded in your data" should come with a way to click through to the underlying source. If it does not, it is decoration.

Evaluation. The process and the artifacts you use to measure whether a model or pipeline is doing what you need it to do. Usually a set of test inputs with expected behavior. On the ground: If there is no evaluation, there is no system. There is just vibes and hope. In the field, missing evaluations are one of the strongest predictors that AI work will not hold up.

Benchmark. A public evaluation — MMLU, HumanEval, ARC, and many others — used to compare models. Useful for generic capability claims, largely useless for your specific business. On the ground: "Beats GPT on HumanEval" and "works for your AP team" are unrelated statements.

Guardrail. A pre- or post-check that stops a model from producing certain outputs (or from acting on them). Classification, regex, policy rules, another model judging the first one. On the ground: Every guardrail is a tradeoff between false blocks and real leakage. There is no setting that eliminates both.
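A minimal post-check looks like this. The patterns are illustrative; real guardrails layer several checks, and every one trades false blocks against real leakage:

```python
import re

# Illustrative patterns only; a production system would layer many checks.
BLOCK_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US-SSN-shaped numbers
    re.compile(r"\b\d{16}\b"),             # bare 16-digit card-shaped numbers
]

def passes_guardrail(output: str) -> bool:
    """Return False if the model's output matches any blocked pattern."""
    return not any(p.search(output) for p in BLOCK_PATTERNS)

print(passes_guardrail("Your order ships Tuesday."))           # allowed
print(passes_guardrail("The customer's SSN is 123-45-6789."))  # blocked
```

Even this tiny example shows the tradeoff: the SSN pattern would also block a legitimate part number that happens to have the same shape. Tightening it leaks more; loosening it blocks more.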


Cost and running it

Inference. Using a trained model — asking it a question and getting an answer. This is what you pay for per token. On the ground: Inference cost is the cost you feel monthly. It compounds with usage and with long prompts.

Training. Building the model in the first place (foundation models) or updating it with new data (fine-tuning). On the ground: You are almost certainly not going to train a foundation model. You might fine-tune. Know which one is on the table.

Latency. How long from request to response. Measured in seconds for generation, tens of milliseconds for retrieval. On the ground: For internal ops tools, a three-second response is fine. For live customer-facing use, it may not be. Ask about P95, not "average."
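Why P95 and not the average, in numbers. The latencies below are invented, but the pattern is the one you will see in real logs:

```python
# A few slow requests barely move the mean but dominate the tail users feel.

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: value at or below which p% of samples fall."""
    ordered = sorted(samples)
    rank = max(0, round(p / 100 * len(ordered)) - 1)
    return ordered[rank]

# 90 one-second responses and 10 twelve-second stragglers (seconds).
latencies = [1.0] * 90 + [12.0] * 10
print(sum(latencies) / len(latencies))  # mean: 2.1s, looks tolerable
print(percentile(latencies, 95))        # P95: 12.0s, the number users remember
```

One in ten users waited twelve seconds, and the average hides it completely. That is the question behind "ask about P95, not average."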

On-prem / self-hosted. Running the model on your own infrastructure rather than calling a vendor API. On the ground: Usually drives costs up, not down, unless you are operating at a scale most firms are not. The reason to do it is data boundary, not savings.


Phrases worth pushing back on

A short list of lines we hear often and the question we ask in response. None of these words are disqualifying — the inability to answer the follow-up is.

If those questions feel rude to ask, the room is not the right room.


What to do with this

Keep this tab open when you are on vendor calls. Add your own entries when a term shows up that we missed. The goal is not to become fluent in AI jargon — the goal is to not be rushed by it.

Clear vocabulary is the cheapest form of leverage a buyer has. Everything that follows a procurement conversation — scope, price, accountability — is negotiated in words. If the words are loose, the contract will be too.

When you are ready to stop translating vendors and start running the work, the next piece in this sequence is ours: What "context" really costs you — a closer look at the thing that actually determines whether any of this works in your business.


Where to read more

For readers who want the technical version, these are the references we actually use ourselves — not a curated syllabus, just the docs and papers worth having open when the vendor calls get sharp:

Official vendor documentation. Worth reading for anyone buying a model or building on one.

Research and frameworks that shaped how we think about this.

Educators and practitioners worth following.

If a resource you think belongs here is missing, tell us. This list earns its keep by being short, current, and honestly used — not comprehensive.

If something in here maps to a problem you are sitting on

Two sentences on what you are trying to do is enough to start. We reply personally — no sequences, no SDR handoff.

New writing is announced via the same list site-wide. Away from the home page, this opens the signup form over the page so you do not lose your place.