Eight AI Terms You’re Getting Wrong (And Why It Costs You)
A technical writer's guide to the modern AI stack, from standard LLMs to agentic workflows
Job postings want “experience with RAG pipelines.” Vendor pitches promise “agentic AI.” Your manager asks if the new tool uses an LLM or AGI. You nod. You shouldn’t have to fake it. Having built agentic pipelines that parse complex integration data into documentation, I’ve seen firsthand what happens when these terms are misunderstood.
These eight terms show up constantly in technical writing conversations now. They get misused, swapped, and conflated — by hiring managers, by vendors, by us. That’s a problem. Technical writers are supposed to be the people who make terminology precise. If we can’t sort these out for ourselves, we can’t sort them out for anyone.
Here’s what each term actually means, how they connect, and where the confusion usually starts.
1. LLM — Large Language Model
An LLM is a neural network trained on massive text datasets to predict the next token (roughly, the next word or word fragment) in a sequence. That’s the core mechanism. GPT-4, Claude, Gemini, and Llama are all LLMs. They take text in, generate text out, and on their own keep no memory between conversations.
What an LLM does well is draft text, summarize, translate, and answer questions from patterns it absorbed during training. What it doesn’t do is reason through novel problems, access your company’s knowledge base, or remember what you told it yesterday.
Where technical writers encounter this: Every AI writing tool, every chatbot, every “AI-powered documentation” pitch runs on an LLM. When someone says “we’re using AI to write our docs,” they mean an LLM is generating text. The question that matters — the question a technical writer should ask — is what’s wrapped around it.
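To make "predict the next word" concrete, here is a deliberately tiny sketch. It is not how a real LLM works internally (those use neural networks over tokens, not word counts), and the corpus and function names are invented for illustration. The generation loop has the same shape, though: look at what came before, pick a likely continuation, append it, repeat.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count which word follows which in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

def generate(follows, start, max_words=5):
    """Append the predicted next word, one word at a time."""
    output = [start]
    for _ in range(max_words):
        nxt = predict_next(follows, output[-1])
        if nxt is None:
            break
        output.append(nxt)
    return " ".join(output)

# Made-up training text, standing in for "massive text datasets".
corpus = "click save to store the file . click save to apply the change ."
model = train_bigrams(corpus)

print(predict_next(model, "click"))           # save
print(generate(model, "click", max_words=2))  # click save to
```

Notice that the toy model has no idea what "save" means and no memory beyond the text it was trained on. Scale changes a lot about what LLMs can do, but not that basic input-to-output shape.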
2. MoE — Mixture of Experts
Not every LLM processes your input the same way. A dense model sends every token through every parameter. A Mixture of Experts model routes each token to a small subset of specialized sub-networks called “experts” and activates only the ones the router selects for that input.
DeepSeek-V3 has 671 billion total parameters but activates only about 37 billion per token. GPT-4 is widely believed to use MoE with roughly 16 experts. Google’s Gemma 4 pushes it further: 128 experts with 8 active per token.
The practical result is that an MoE model can approach the quality of a much larger dense model while spending only a fraction of the compute on each token. Think of it as a hospital with dozens of specialists. You don’t see all of them for an ear infection. The triage nurse routes you to the right one.
Why this matters to writers: When someone tells you a model has “671 billion parameters,” that number means almost nothing without knowing how many are active per token. MoE is the reason some models can be both enormous and cheap to run. It’s also why comparing models by raw parameter count (something vendor marketing loves to do) is misleading. Misunderstanding this distinction costs companies real money: they over-provision expensive compute for a dense model when an efficient MoE model would handle the load.
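The routing idea can be sketched in a few lines. This is a simplified illustration of top-k gating, not any specific model's router: the scores are made up, and real routers are learned layers that run inside every MoE block. The last line applies the DeepSeek-V3 figures quoted above to show how small the active share is.

```python
import math

def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_scores, top_k):
    """Pick the top_k experts for one token and renormalize their weights."""
    probs = softmax(router_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    weight_sum = sum(probs[i] for i in chosen)
    return {i: probs[i] / weight_sum for i in chosen}

def active_fraction(total_params, active_params):
    """Share of the model's parameters that actually run for each token."""
    return active_params / total_params

# Hypothetical router scores for one token across 8 experts.
scores = [0.1, 2.3, -1.0, 0.4, 1.9, -0.5, 0.0, 0.2]
weights = route_token(scores, top_k=2)
print(sorted(weights))  # [1, 4]: only these two experts process the token

# 671 billion total, about 37 billion active (figures from the text above).
print(f"{active_fraction(671e9, 37e9):.1%}")  # 5.5%
```

The other six experts do no work for this token, which is where the compute savings come from.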
3. Large Reasoning Model
A standard LLM generates the next token fast. A reasoning model spends extra computing time thinking before it answers. OpenAI’s o1, o3, and o4-mini are reasoning models. So is DeepSeek-R1.
The difference is visible. Ask Claude or GPT-4o a complex math problem, and you get an answer in seconds, sometimes an incorrect one. Ask o3 or DeepSeek-R1, and you might wait a minute, but the model spends that time working through a chain of thought: decomposing the problem, checking its work, and backtracking when something doesn’t hold. The tradeoff is that reasoning models are slower and more expensive per query.
For technical writers: Reasoning models matter when accuracy on complex, multi-step problems outweighs speed. If you’re evaluating an AI tool for code documentation or technical accuracy review, ask whether it uses a reasoning model or a standard LLM. The answer changes what you can trust it to do.
4. Vector Database
An LLM knows what it learned during training. It doesn’t know your product’s current API reference, your style guide, or the release notes you published last Tuesday. To give it that knowledge, you need a vector database.
A vector database stores text (or images, or other data) as numerical representations called embeddings, arrays of numbers that capture meaning. “OAuth error” and “login failure” end up close together in vector space even though they share no words. A traditional database search for “OAuth error” wouldn’t find a document titled “login failure.” A vector search would.
Concrete example: Say you have 2,000 help center articles. You convert each one into embeddings and store them in a vector database. When a user asks, “How do I share Shopify customer addresses with iPaaS and HubSpot?” the vector database finds the three or four articles closest in meaning to that question, even if none of them contain the exact phrase “share customer addresses.” Those articles then get passed to an LLM as context.
That handoff, retrieving relevant documents and feeding them to a model, is the next term on the list.
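Here is a minimal sketch of what "closest in meaning" looks like in code. The three-number vectors are hand-made for illustration (real embeddings come from an embedding model and have hundreds or thousands of dimensions), and the article titles are hypothetical. Cosine similarity is one common way to measure closeness, and a real vector database adds indexing so it doesn't have to compare against every document.

```python
import math

def cosine_similarity(a, b):
    """Closeness of two vectors by angle: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(index, query_vector, top_k=2):
    """Return the titles of the top_k entries most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), title)
        for title, vector in index.items()
    ]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]

# Hand-made "embeddings". The axes loosely mean: auth, billing, export.
index = {
    "Login failure troubleshooting": [0.9, 0.1, 0.0],
    "Resetting your password": [0.8, 0.0, 0.1],
    "Understanding your invoice": [0.0, 0.9, 0.1],
    "Exporting customer data": [0.1, 0.1, 0.9],
}

# Pretend this is the embedding of the query "OAuth error".
query = [0.95, 0.05, 0.0]
print(search(index, query, top_k=2))
# ['Login failure troubleshooting', 'Resetting your password']
```

The query shares no words with either result. It matches because the vectors point in nearly the same direction.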
5. RAG — Retrieval-Augmented Generation
RAG is the pattern that connects a vector database to an LLM. The workflow: a user asks a question, the system searches a vector database for relevant content, retrieves the top results, and passes them to the LLM as context alongside the question. The LLM generates its answer based on that retrieved content rather than relying solely on its training data.
Without RAG, an LLM answering questions about your product is guessing from general training data. With RAG, it’s working from your actual documentation.
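The workflow can be sketched as three small functions: retrieve, build a prompt, generate. Everything here is an assumption made for illustration. The retriever ranks by shared words as a stand-in for real vector search, the articles are invented, and `fake_llm` is a placeholder where a real model call would go.

```python
def retrieve(question, articles, top_k=2):
    """Stand-in retriever: rank articles by words shared with the question.
    A production system would query a vector database here instead."""
    question_words = set(question.lower().split())

    def overlap(article):
        return len(question_words & set(article["body"].lower().split()))

    return sorted(articles, key=overlap, reverse=True)[:top_k]

def build_prompt(question, passages):
    """Put the retrieved passages in front of the question as context."""
    context = "\n\n".join(f"[{p['title']}]\n{p['body']}" for p in passages)
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

def answer(question, articles, llm):
    """Retrieve, then generate: the two halves of RAG."""
    passages = retrieve(question, articles)
    return llm(build_prompt(question, passages))

# Hypothetical help center content.
articles = [
    {"title": "Rotate an API key",
     "body": "To rotate an API key open Settings and choose Regenerate"},
    {"title": "Export invoices",
     "body": "Invoices can be exported as CSV from the Billing page"},
    {"title": "Invite a teammate",
     "body": "Admins invite a teammate from the Members page"},
]

def fake_llm(prompt):
    # Placeholder for a real model call. It only reports what it was given.
    return f"(model received {len(prompt)} characters of prompt)"

print(answer("How do I rotate an API key", articles, fake_llm))
```

The model never sees the whole help center. It sees only what `retrieve` hands it, which is why retrieval quality decides answer quality.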
Why technical writers should care deeply: RAG systems are only as good as the content they retrieve. Garbage in, garbage out is the core failure mode. If your documentation is inconsistent, outdated, poorly structured, or full of ambiguous headings, the vector search retrieves the wrong chunks, and the LLM confidently generates a wrong answer sourced from your own content.
This is where the argument that technical writers are the “context owners” bites hardest. The people who structure, maintain, and quality-control the content that feeds RAG pipelines are technical writers. The retrieval layer doesn’t fix bad documentation. It amplifies it.
6. MCP — Model Context Protocol
MCP is an open standard introduced by Anthropic in November 2024 that defines how AI applications connect to external tools and data sources. Think of it as a universal adapter. Before MCP, every integration between an AI model and an external system — your CRM, your Git repo, your documentation platform — required custom code. MCP standardizes that connection.
The protocol uses a client-server architecture. An AI application (the host, which runs one or more MCP clients) connects to MCP servers that expose capabilities: reading files, querying databases, and executing functions. By December 2025, the community had built thousands of MCP servers, and Anthropic donated the protocol to the Agentic AI Foundation under the Linux Foundation. OpenAI, Google, Microsoft, and others have adopted it.
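Under the hood, MCP messages are JSON-RPC 2.0. The sketch below builds the two requests a client most often sends: one to list a server's tools and one to call a tool. The method names `tools/list` and `tools/call` come from the MCP specification. The tool name `search_docs` and its arguments are hypothetical, and a real client would normally use an MCP SDK, which also handles the transport and the initialization handshake.

```python
import json
from itertools import count

# Each JSON-RPC request carries an id so the response can be matched to it.
_request_ids = count(1)

def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request as a plain dictionary."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message

def list_tools_request():
    """Ask an MCP server which tools it exposes."""
    return jsonrpc_request("tools/list")

def call_tool_request(name, arguments):
    """Ask an MCP server to run one of its tools with the given arguments."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})

# "search_docs" is a made-up tool that a documentation server might expose.
request = call_tool_request("search_docs", {"query": "OAuth error", "limit": 3})
print(json.dumps(request, indent=2))
```

The point of the standard is that this message looks the same whether the server fronts a Git repo, a CRM, or a documentation platform.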
For technical writers, MCP matters in two ways. First, it’s what lets AI tools access your documentation systems, such as Git repos, your CCMS, and your knowledge bases, directly. When someone says an AI tool “connects to” your Confluence or Intercom instance, MCP (or something like it) is likely the plumbing handling it.
Second, MCP is the infrastructure for the next term.
7. Agentic AI
An LLM generates text. An agent does things. Agentic AI refers to systems where an LLM plans a sequence of actions, executes them using external tools, observes the results, and adjusts. It’s the difference between a model that writes a SQL query when you ask, and a system that decides it needs data, writes the query, runs it against a database, interprets the results, and then answers your question.
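That plan, act, observe, adjust cycle is easier to see as a loop. The sketch below is a stripped-down illustration, not a real agent framework. `scripted_planner` is a hard-coded stand-in for the LLM so the example runs on its own, and `count_rows` is a made-up tool. The first plan contains a deliberate mistake so you can watch the loop recover from it.

```python
def run_agent(goal, planner, tools, max_steps=5):
    """Loop: ask the planner for a step, run the tool, record what happened."""
    history = []
    for _ in range(max_steps):
        step = planner(goal, history)                   # plan
        if step["action"] == "finish":
            return step["answer"], history
        tool = tools[step["action"]]
        try:
            observation = tool(**step["args"])          # act
        except Exception as error:
            observation = f"error: {error}"
        history.append((step["action"], observation))   # observe
    return None, history  # gave up after max_steps

def count_rows(table):
    """Made-up tool: pretend to query a database for a row count."""
    data = {"articles": 2000}
    return data[table]

def scripted_planner(goal, history):
    """Stand-in for the LLM: hard-coded decisions based on what it has seen."""
    if not history:
        # First attempt uses the wrong table name on purpose.
        return {"action": "count_rows", "args": {"table": "article"}}
    _, last_observation = history[-1]
    if isinstance(last_observation, str) and last_observation.startswith("error"):
        # Adjust: the last step failed, so retry with a corrected name.
        return {"action": "count_rows", "args": {"table": "articles"}}
    return {"action": "finish",
            "answer": f"There are {last_observation} articles."}

answer, history = run_agent(
    "How many articles do we have?", scripted_planner, {"count_rows": count_rows}
)
print(answer)        # There are 2000 articles.
print(len(history))  # 2 tool calls: one failed, one succeeded
```

In a real agent, the planner is a model call that receives the goal and the history as text. The `max_steps` limit is a small example of the kind of guardrail that keeps a confused agent from looping forever.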
Claude’s computer use, where it can operate a browser, read files, and execute code, is agentic. So are coding assistants like Claude Code or Cursor that read your codebase, plan changes across multiple files, run tests, and iterate. MCP provides the connective tissue — it’s how agents reach the tools they need.
Here’s a real example relevant to technical writers: At iPaaS.com, I built Claude Agent Skills that took JSON integration exports and produced both Excel workbooks and Markdown documentation files. The agent didn’t just generate text. It parsed JSON, made structural decisions about how to organize the output, created multiple files, and handled edge cases in the data. That’s agentic behavior: planning, acting, observing, adjusting.
The confusion: People use “agentic AI” to describe everything from a chatbot with a search tool to a fully autonomous system that runs for hours. The spectrum is wide. The useful question isn’t “is it agentic?” but “what can it actually do without human intervention, and where must we enforce governance and permission checks?” Without those guardrails, a highly capable agent might autonomously delete your entire repository.
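One simple form those guardrails can take is a permission check that sits between the agent and its tools. This is a minimal sketch under stated assumptions: the action names and the two-tier policy (always allowed versus needs human approval) are invented for illustration, and real systems usually add logging, scoping, and review on top.

```python
# Hypothetical policy: what the agent may do on its own, and what needs sign-off.
ALLOWED = {"read_file", "run_tests"}
NEEDS_APPROVAL = {"delete_branch", "force_push"}

def authorize(action, approved_by_human=False):
    """Return True only if the action is permitted under the policy."""
    if action in ALLOWED:
        return True
    if action in NEEDS_APPROVAL:
        return approved_by_human
    return False  # anything not listed is denied by default

def guarded_call(action, tool, approved_by_human=False, **args):
    """Run the tool only after the permission check passes."""
    if not authorize(action, approved_by_human):
        raise PermissionError(f"{action} blocked by policy")
    return tool(**args)

def delete_branch(name):
    # Placeholder for a destructive operation. It only returns a message here.
    return f"deleted {name}"

try:
    guarded_call("delete_branch", delete_branch, name="main")
except PermissionError as error:
    print(error)  # delete_branch blocked by policy

print(guarded_call("delete_branch", delete_branch,
                   approved_by_human=True, name="old-draft"))
# deleted old-draft
```

The deny-by-default line matters most. An agent that can invent new actions should not be able to invent its way around the list.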
8. AGI — Artificial General Intelligence
AGI is a hypothetical system that can perform any intellectual task a human can, across any domain, without task-specific training. No one has built it. No current AI system qualifies.
The term generates more heat than light. Some researchers define AGI as matching average human performance across all cognitive tasks. Others set the bar at surpassing human experts in every domain. OpenAI’s charter references AGI as its long-term goal. Other organizations avoid the term entirely because it’s too vague to be useful.
It keeps coming up because vendor marketing loves the proximity. “Our system brings you closer to AGI” sounds impressive and means nothing specific. When a job posting mentions AGI, it usually signals the company’s ambitions, not the technology they’re actually building.
The honest assessment: Current LLMs, even reasoning models, are narrow. They’re very good at specific tasks but fall apart outside their training distribution. A model that excels in competition math may struggle with a straightforward spatial reasoning problem. AGI would require something qualitatively different from what exists today. People disagree about how different and how far away.
For technical writers, the practical response to AGI claims is the same as it is for any undefined term: ask for the specification. What exactly does this system do? What are its boundaries? That’s documentation work. That’s what we do.
How These Terms Connect
These eight terms aren’t isolated vocabulary items. They form a stack.
LLMs are the foundation — the text-generation engine. MoE is an architecture choice that makes LLMs more efficient, letting them scale without proportional compute costs. Large Reasoning Models extend LLMs with deliberate, multi-step thinking for problems where accuracy matters more than speed.
Vector databases store your content as searchable embeddings. RAG is the pattern that retrieves from those databases and feeds relevant content to an LLM, grounding its answers in your actual documentation. MCP standardizes how AI systems connect to external tools and data sources — including the databases and platforms that hold your content.
Agentic AI combines all of these: an LLM (possibly with reasoning capabilities) that uses MCP connections to access tools, retrieves context via RAG, and executes multi-step plans. It’s not a single technology. It’s an architecture that layers the others.
AGI sits at the far end. It’s the theoretical destination that none of these technologies have reached, regardless of what the marketing copy says.
When a vendor tells you their product uses “agentic RAG,” you now know what that means: an AI system that autonomously retrieves from your content and takes actions based on what it finds. When a job posting asks for “experience with LLMs and vector databases,” it’s asking if you understand how AI documentation tools actually work under the surface.
Technical writers who understand this stack aren’t just keeping up with jargon. They’re the people who can evaluate whether an AI documentation tool will actually work, what content quality it requires, and where it will fail. That’s not a nice-to-have skill anymore. It’s the job now.

