
Input Tokens vs Output Tokens (Cost Is the Lens)


There’s a small lie baked into the way most people start coding with AI.

It sounds like efficiency.

It’s usually described as “keeping prompts short.”

And if you’ve ever watched someone vibe code, you’ve seen the aesthetic:

“Build me X.”

“Fix this.”

“Make it better.”

The prompt is tiny. The model is powerful. The output is… enormous.

Then the conversation stretches. The thread thickens. The context window turns into a landfill. And the person doing it wonders why the experience feels chaotic and expensive.

There’s a calmer way to understand what’s going on.

Not philosophical.

Not moral.

Economic.

The Problem

Most “bad prompting” isn’t bad because it’s impolite.

It’s bad because it forces the model to defend itself against ambiguity.

When you say “make it better,” the model has to decide what “better” means:

  • faster?
  • cleaner?
  • more secure?
  • more features?
  • less code?
  • more comments?
  • different architecture?

A vague request creates a branching tree of interpretations.

The model doesn’t know which branch you meant, so it covers more branches.

That tends to look like:

  • longer explanations
  • more alternatives
  • more boilerplate
  • more “just in case” code

You didn’t ask for bloat.

You asked for help.

But ambiguity reliably produces verbosity.

And verbosity is not free.

Tokens Aren’t Symmetric

Here’s the part that flips intuition.

Input tokens and output tokens do not cost the same.

In many pricing schemes, output tokens cost several times more than input tokens; a three-to-five-times multiplier is common.

Even if you don’t memorize the numbers, the pattern matters:

  • input is relatively cheap
  • output is relatively expensive

So the “short prompt” strategy is often upside down economically.

You saved pennies on input and paid dollars on output.
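
To make the asymmetry concrete, here is a minimal sketch. The prices are placeholders for illustration, not any particular provider's rates; plug in whatever your provider actually charges.

  # Illustrative only: placeholder prices, not a specific provider's rates.
  PRICE_PER_INPUT_TOKEN = 3.00 / 1_000_000    # e.g. $3 per million input tokens
  PRICE_PER_OUTPUT_TOKEN = 15.00 / 1_000_000  # e.g. $15 per million output tokens

  def exchange_cost(input_tokens: int, output_tokens: int) -> float:
      """Cost of a single prompt/response exchange."""
      return (input_tokens * PRICE_PER_INPUT_TOKEN
              + output_tokens * PRICE_PER_OUTPUT_TOKEN)

  # A "tiny prompt, enormous output" exchange:
  print(exchange_cost(input_tokens=50, output_tokens=4_000))   # ~$0.060

  # The same request with a generous, specific prompt and a tight answer:
  print(exchange_cost(input_tokens=1_500, output_tokens=800))  # ~$0.017

The prompt in the second case is thirty times longer, and it barely registers on the bill. The output side is what you pay for.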

The Vibe Coding Cost Pattern

A typical vibe coding loop looks like this:

  1. tiny prompt
  2. huge output
  3. follow-up correction
  4. huge output
  5. follow-up correction
  6. huge output

Even when each step “works,” the cumulative effect is predictable:

  • costs rise
  • context window fills
  • the model becomes less precise
  • you spend more time steering

And because the outputs are large, you also pay a second tax: attention.

Reading long output is work.

Reviewing long diffs is work.

Keeping a long thread coherent is work.

So you’re not only paying financially.

You’re paying cognitively.

The Solution (The Calm Way)

The Operator approach goes in the opposite direction:

Front-load clarity.

Spend tokens on input.

Reduce output downstream.

This is counterintuitive if you think the goal is “talk less.”

But the goal is not to talk less.

The goal is to make the model certain.

Certainty collapses output size.

What “More Input” Actually Means

This isn’t a call to write a novel in the chat box.

It’s a call to move the project state into artifacts:

  • a short brief (goal, non-goals, constraints)
  • acceptance criteria (binary, testable)
  • edge cases (the unhappy paths you don’t want to rediscover)
  • a “current task” definition (what done means today)

You can afford to be generous here.

Because this input is reusable.

You can paste it into future sessions.

You can hand it to another model.

You can hand it to a human.

Artifacts turn input tokens into an asset.
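
For instance, a brief for a small feature might be nothing more than a handful of lines. The feature and details below are made up, purely to show the shape:

  Goal: add CSV export to the monthly report page
  Non-goals: no changes to the PDF export, no new report types
  Constraints: no new dependencies; keep the existing reports API contract
  Acceptance: exporting 10,000 rows completes without truncation; dates stay ISO 8601
  Edge cases: empty report; fields containing commas or quotes
  Current task: the export endpoint only; the UI button is a separate task

That is maybe two hundred input tokens, and it removes almost every branch the model would otherwise have to hedge against.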

The Counterintuitive Math

Consider two extremes:

Scenario A: Vibe coding

  • minimal input
  • maximal ambiguity
  • maximal output
  • many iterations

Scenario B: Operator mode

  • heavy input (clear artifacts)
  • minimal ambiguity
  • smaller output
  • fewer iterations

Even if Scenario B uses more input tokens overall, it can still be cheaper, because it avoids the expensive part: repeated, sprawling output.
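
Putting rough numbers on the two scenarios makes the point. The prices and token counts below are assumptions for illustration, not measurements:

  # Placeholder prices: $3 per million input tokens, $15 per million output tokens.
  IN_PRICE, OUT_PRICE = 3e-6, 15e-6

  def total_cost(exchanges):
      """Sum the cost of a list of (input_tokens, output_tokens) exchanges."""
      return sum(i * IN_PRICE + o * OUT_PRICE for i, o in exchanges)

  # Scenario A: vibe coding -- tiny prompts, sprawling outputs, repeated retries.
  scenario_a = [(50, 4_000), (100, 3_500), (100, 3_000), (100, 2_500)]

  # Scenario B: operator mode -- heavy input artifacts, tight outputs, one follow-up.
  scenario_b = [(2_000, 900), (300, 400)]

  print(total_cost(scenario_a))  # ~$0.20
  print(total_cost(scenario_b))  # ~$0.03

Under these assumptions, Scenario B spends more than six times as many input tokens and still comes out far cheaper, because it never pays for repeated, sprawling output.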

This isn’t just about cost.

It’s about what cost reveals: where waste lives.

Waste lives in repeated retries.

Waste lives in long outputs you didn’t need.

Waste lives in “defensive” code that exists because you didn’t define scope.

Thinking First Is Greener Than Vibe Coding dives into the sustainability angle of the same habit: how clarity up front reduces compute burn and mental fatigue.

The Hidden Benefit: Less Output, Less Noise

Smaller outputs tend to be:

  • easier to review
  • easier to test
  • easier to integrate
  • less likely to contain accidental scope creep

And when the model isn’t trying to cover every possible interpretation, it hallucinates less.

Not because it became smarter.

Because you gave it less room to guess.

If you need a lightweight artifact to compress scope upfront, The Post-It Note Analogy shows how to anchor intent without writing a novel.

Review Is a Contract pairs with that approach by giving you a mechanical checklist to validate the smaller outputs you asked for.

Conclusion

Vague prompts feel fast.

They are often the most expensive way to use AI. Financially and mentally.

If you want a simple Operator rule:

Spend tokens where they are cheap and durable (input artifacts).

Avoid tokens where they are expensive and noisy (unbounded output).

Or said more plainly:

Maximum clarity up front.

Minimum output downstream.
