Have you noticed how people reach for industry jargon when they want to sound more knowledgeable than they are? Corporate-speak works because it's elastic. The terms can be debated, nodded along with, and defended through optics and peer pressure even when real understanding is thin.
I had one of these conversations the other day with a tech-guru type. I asked a straightforward question and got an answer that treated "tokens" like they're an authoritative driver of AI decision-making. That wasn't an explanation. It was a deflection. It wasn't aimed at me either. It was aimed at the room, using buzzwords to keep the conversation moving, avoid admitting uncertainty, and preserve an aura of expertise without doing the honest work of taking a few minutes to research an answer.
And this isn’t confined to randoms in a meeting. You can watch senior leaders and keynote scripts do the same sleight-of-hand in public. Microsoft’s CEO has pitched "tokens per watt per dollar" as a sweet spot where compute power and "intelligence" meet, and as a growth-driving formula (Nadella, 2025). In India, he’s been quoted pushing "tokens per rupee per watt" as a metric tied to GDP outcomes, with the framing sliding from efficiency to socioeconomic impact (The Economic Times, 2025). NVIDIA’s GTC keynote narration goes further, anthropomorphizing tokens as actors that "see disease" and "decode the laws of physics," as if tokens are the doing part rather than the metering and representation layer (Huang, 2025).
So when an apparent industry expert on a professional social platform uses token talk to posture as if they've explained capability, it sounds plausible because tokens are visible, measurable, and billed. Providers also operationalize them in exactly those ways: they cap throughput in tokens-per-minute, and they tell you bluntly that generating tokens is usually the dominant latency step (OpenAI, n.d.). Tokens are real. Token pricing is real. Token limits are real. What's not real is treating tokens as the core driver of "AI decision-making," as if the model's internal objective is "use the fewest tokens possible."
Tokens get conflated because people mash together three different things: (1) how text is represented for the model, (2) how providers meter and charge usage, and (3) what the model is actually optimizing during generation. If you treat "token minimization" as the decision model, you’re basically treating the electricity meter as the brain.
What is a Token?
A token is a chunk of text the model processes. Not a word. Not a character. A chunk. Depending on the tokenizer and language, a token can be a whole word, part of a word, punctuation, or even a leading space plus a word fragment. OpenAI describes tokens as "chunks" that represent commonly occurring sequences of characters, and notes that spaces and partial words count (OpenAI, n.d.).
That detail matters because people keep talking about tokens like they're a universal unit. They aren't. They're a representation chosen by the model's creator, and as such they vary from vendor to vendor. Think of it like a hamburger. You can get a burger from McDonald's, one from Wendy's, and one from Burger King. They're all still hamburgers, but they have different ingredients and slightly different production costs as a result.
In other words, what defines a token is relatively arbitrary, constrained only by convention.
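To make that concrete, here's a minimal sketch using tiktoken, OpenAI's open-source tokenizer library. The encoding name is one of several that tiktoken ships; the exact splits depend on which one you load, and that's the point.

```python
# Sketch: inspect how one tokenizer chunks a sentence.
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one of several shipped encodings
text = "Tokenization isn't universal."
token_ids = enc.encode(text)

# Decode each id back to its byte chunk to see the actual splits.
chunks = [enc.decode_single_token_bytes(t) for t in token_ids]
print(token_ids)   # a handful of integers, not one per word
print(chunks)      # chunks include leading spaces and word fragments
```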
Are Tokens Universal?
Different models can tokenize the same string differently, because tokenization schemes and vocabularies vary. That means “token count” is not a stable, cross-model measure of “work done” or “intelligence.” It’s a model-specific encoding choice.

There’s also evidence that tokenization choices can affect model behavior and meaning, which is another reason you shouldn’t treat tokens as a neutral unit of capability (Haslett, 2025).
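A few lines of code make the instability visible. This sketch counts the same string under two encodings that ship with tiktoken; the counts typically differ, which is the whole problem with treating "token count" as a universal measure.

```python
# Sketch: the same string, counted under two different encodings.
# Requires: pip install tiktoken
import tiktoken

text = "Supercalifragilisticexpialidocious tokens aren't a universal unit."
for name in ("cl100k_base", "o200k_base"):  # two encodings tiktoken ships
    enc = tiktoken.get_encoding(name)
    print(name, len(enc.encode(text)))
# Different encodings usually yield different counts, so "token count"
# is an encoding-specific number, not a cross-model measure of work.
```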
So if someone is building grand theories of “AI decision-making” on token counts, they’re building on a unit that changes under their feet.
How are Tokens Used?
Tokens are used for at least four operational purposes:
- Context limits (how much text the model can consider at once). Your prompt and the model’s output are counted in tokens, which is how providers enforce context windows and rate limits (OpenAI, n.d.).
- Billing. Providers price usage largely in input and output tokens; OpenAI and Anthropic both publish token-based pricing tables (OpenAI, n.d.; Anthropic, n.d.). A costing sketch follows this list.
- Latency and infrastructure optimization. Some features exist specifically because tokens drive runtime cost. OpenAI’s “Predicted Outputs” is explicitly about speeding up responses when many output tokens are predictable (OpenAI, n.d.).
- Agent overhead. In agentic systems, you can spend a huge number of tokens on tool definitions, schemas, and accumulated context before you even get to “the task.” Anthropic’s engineering writeups and docs describe token overhead from tools, and tooling that “compacts” context as usage grows (Anthropic, n.d.; Anthropic, 2025).
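To see tokens acting as a meter rather than a mind, here's a rough costing sketch. The per-token prices and the context limit below are made-up placeholders, not any provider's actual rates, and the output estimate is a guess you'd supply yourself.

```python
# Sketch: tokens as metering, with hypothetical numbers throughout.
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

PRICE_PER_M_INPUT = 3.00     # hypothetical $ per million input tokens
PRICE_PER_M_OUTPUT = 15.00   # hypothetical $ per million output tokens
CONTEXT_LIMIT = 128_000      # hypothetical context window, in tokens

prompt = "Summarize the attached report..."  # plus history, schemas, etc.
input_tokens = len(enc.encode(prompt))
expected_output_tokens = 800  # rough guess for a summary-length reply

fits = input_tokens + expected_output_tokens <= CONTEXT_LIMIT
cost = (input_tokens * PRICE_PER_M_INPUT
        + expected_output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000
print(f"fits in context: {fits}, estimated cost: ${cost:.6f}")
```

Notice that nothing here touches model quality. It's accounting.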
None of those are “decision-making.” They’re constraints and costs around inference and orchestration.
Generation is Token-by-Token, but the Objective is not “Minimize Tokens”
When a user creates a request, models generate outputs by selecting the next token repeatedly until they stop. The selection mechanism is driven by a probability distribution over possible next tokens, shaped by decoding parameters (temperature, top_p, top_k), and it runs for “each subsequent token” (Amazon Web Services, n.d.).

That tells you what’s being optimized at the point of choice: which token to pick next given the current context and the sampling rules. It does not imply a global objective like “minimize total tokens.” In practice, verbosity is controlled by constraints like max output tokens, stop sequences, system instructions, and user preference. Not by a built-in “token frugality” instinct.
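Here's a toy sketch of that per-step choice. The vocabulary and logits are made up, and real models score tens of thousands of candidate tokens per step; the thing to notice is that nothing in the loop rewards shorter outputs.

```python
# Sketch: temperature + top-p (nucleus) sampling over toy logits.
import math
import random

def sample_next(logits, temperature=0.8, top_p=0.9):
    # Temperature rescales logits before the softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p: keep the smallest high-probability set whose cumulative
    # mass reaches top_p, then renormalize and sample from it.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    kept_probs = [probs[i] for i in kept]
    norm = sum(kept_probs)
    return random.choices(kept, weights=[p / norm for p in kept_probs])[0]

vocab = ["the", " cat", " sat", ".", "<eos>"]
logits = [2.1, 1.3, 0.4, -0.5, -1.0]  # made-up scores for one step
print(vocab[sample_next(logits)])
# No term anywhere says "prefer fewer total tokens."
```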
If anything, providers are now shipping explicit controls that allow spending more tokens to reason longer. Anthropic’s “extended thinking” includes a “thinking budget,” which is literally an operator-controlled dial for how much compute and token budget to spend on harder problems (Anthropic, 2025).
A system that can be instructed to spend more budget contradicts the idea that it is inherently minimizing tokens as a governing principle.
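As a sketch of what that dial looks like in practice, assuming the current Anthropic Python SDK: the model ID below is a placeholder, and the exact parameters are worth checking against Anthropic's docs before running.

```python
# Sketch: an operator-controlled thinking budget (Anthropic, 2025).
# Requires: pip install anthropic, plus an API key in ANTHROPIC_API_KEY.
# The model ID is a placeholder; check current model names before use.
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model ID
    max_tokens=16_000,
    # The dial: explicitly authorize MORE tokens for reasoning.
    thinking={"type": "enabled", "budget_tokens": 8_000},
    messages=[{"role": "user", "content": "Work through this proof..."}],
)
print(response.content)
```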
Tokens Reflect Cost, not Capability
Tokens are the meter providers use because they correlate with compute and throughput. Pricing pages make that explicit by quoting cost per million input and output tokens, and by separating cached tokens and batching (Microsoft, n.d.; OpenAI, n.d.).

Cost is not capability. Two models can charge differently per token, tokenize differently, and produce different quality. Treating “token minimization” as the core decision model is, again, mistaking the meter for the brain.
“Agentic AI” Often Burns Tokens by Design
Agents do not just “answer.” They loop: plan, call tools, read tool outputs, update state, and continue. Token usage rises because the agent has to carry context forward and often has to include tool schemas, tool outputs, and intermediate reasoning or summaries.
That’s why real agent frameworks include context management and compaction mechanisms to stay within token thresholds (Anthropic, n.d.).
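Here's a schematic of that loop with naive compaction. Everything in it is hypothetical: call_model, call_tool, and summarize are stand-ins you'd supply, and the threshold is arbitrary; real frameworks are more careful, but the token dynamics are the same.

```python
# Sketch: why agent loops burn tokens, and how compaction caps growth.
# call_model, call_tool, and summarize are hypothetical stand-ins.
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
COMPACT_THRESHOLD = 50_000  # arbitrary token budget for running context

def count_tokens(messages):
    return sum(len(enc.encode(m["content"])) for m in messages)

def agent_loop(task, call_model, call_tool, summarize, max_steps=10):
    context = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        # Context grows every step: plans, tool schemas, tool outputs.
        action = call_model(context)
        context.append({"role": "assistant", "content": action["text"]})
        if action.get("done"):
            return action["text"]
        result = call_tool(action)
        context.append({"role": "tool", "content": result})
        # Compaction: collapse old context into a summary to stay in budget.
        if count_tokens(context) > COMPACT_THRESHOLD:
            context = [{"role": "user", "content": summarize(context)}]
    return None  # step budget exhausted without finishing the task
```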
So the production trend isn’t “AI always tries to use fewer tokens.” The trend is “token use expands with capability and tooling, then teams put governance around it to keep it economically sane.”
So What's Going On?
The token claim isn’t “a technical misunderstanding.” It’s a pattern: AI’s opacity gives bad actors cover to perform expertise, and jargon becomes a social weapon. The claims can’t easily be falsified in real time, the room can’t tell the difference, and the speaker rides the ambiguity to authority.
That pattern is now visible at regulator level because the same behavior shows up in markets: people claiming AI capability they don’t have, or implying AI guarantees it can’t actually deliver. The SEC has already sanctioned “AI washing” in financial services. In March 2024 it settled with Delphia and Global Predictions over misleading statements about their use of AI, with civil penalties reaching $225,000 (U.S. Securities and Exchange Commission, 2024). That is the institutional version of the same play: sell mystique, dodge technical accountability, and let the audience fill in the gaps.
The FTC is making the same point from the consumer-protection angle. In September 2024 it announced a crackdown on deceptive AI claims and schemes, explicitly stating there is “no AI exemption” from existing laws against unfair or deceptive practices (Federal Trade Commission, 2024). Translation: “AI” doesn’t get you a free pass to overclaim, mislead, or dress up ordinary software as something magical. And in 2025, US state attorneys general have been leaning into the same enforcement posture, using existing consumer-protection and anti-deception powers to go after misleading AI representations even without bespoke AI laws (Reuters, 2025).

So where does the token story fit? It’s a clean example of metric laundering. There’s a reason people call parts of LinkedIn a flood of AI "word salad." LinkedIn’s own AI-driven content experiments triggered backlash for producing low-value, templated material at scale, and that gives grifters perfect cover: when the feed is already full of synthetic-sounding "expert" posts, the threshold for believability drops. (Goldman, 2024).
How it played out in this discussion: it took someone experienced in how the technology actually works to spot an individual bluffing at poker. What sounded technical was just hard to challenge quickly and efficiently. That gives the speaker an escape hatch, because any pushback can be reframed as “you don’t understand how models work,” and the conversation quickly turns into a professional yardstick based not on evidence but on hearsay and popularity. That’s theater.
How is it countered? Demand tangible information. Ask which metric is being used, how it was measured, and what it actually predicts.
If you're offered a newsletter behind a paywall, walk.