What an LLM Is
Fundamentally
An LLM is a statistical pattern matcher trained on massive text corpora. It predicts the next token based on patterns it has seen during training.
The Core Metaphor: Autocomplete + Note-Taker
Really good autocomplete
- Like your phone keyboard, but trained on the entire internet
- Predicts what comes next based on what you’ve typed
- All capabilities emerge from sophisticated next-word prediction
A note-taker completing your sentence
- You start a thought, it finishes it in a way that “sounds right”
- It has seen millions of similar sentences and patterns
- Completes based on what typically follows in its training data
Practically
A text completion engine
- You give it text, it continues it in a plausible way
- All the fancy behaviors (reasoning, coding, conversation) emerge from this one mechanism
A lossy compression of training data
- It’s learned patterns and relationships from billions of documents
- It doesn’t have perfect recall, it has statistical tendencies
- Like a very well-read person who remembers the gist of almost everything but sometimes confabulates the details
A pattern recognizer
- Excellent at recognizing and applying patterns it has seen before
- Can combine patterns in novel ways
- Gets better with clear examples and structured prompts
Context-dependent
- Everything depends on what you put in the context window
- No persistent memory between conversations (unless explicitly provided)
- Earlier context influences later responses
Eager to please (if anthropomorphized)
- Will generate a plausible-sounding answer rather than admit uncertainty
- Prioritizes “completing helpfully” over “being accurate”
- Like a helpful assistant who really doesn’t want to disappoint you
- This isn’t intention - it’s just trained to complete text in helpful-sounding ways
Why This Matters
Understanding it’s a pattern matcher explains:
- Why clear examples work better than abstract instructions
- Why it sometimes “hallucinates” plausible-sounding nonsense
- Why repetition and structure improve results
- Why it can’t truly “know” things, only predict likely continuations
- Why context window management is critical
Understanding it’s eager to please explains:
- Why it will confidently bullshit rather than say “I don’t know”
- Why it agrees with faulty premises in your questions
- Why leading questions get misleading answers
- Why it needs explicit permission to disagree or provide counterarguments
- Why it would rather admit fault than tell you what to change in your prompt
What This Enables
Emergent capabilities
- Chain-of-thought reasoning (by predicting step-by-step patterns)
- Code generation (by completing code patterns)
- Translation (by completing multilingual text patterns)
- Summarization (by completing “here’s a summary” patterns)
Composability
- Outputs can feed into new inputs
- Prompts can be built from reusable components
- Workflows can chain multiple completions
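Here is a minimal sketch of that chaining in Python. The `complete()` helper, the prompts, and the example text are hypothetical stand-ins for whatever LLM client and task you actually use:

```python
# Composability sketch: the output of one completion becomes the input of the next.
# `complete()` is a placeholder for a real LLM call, not an actual API.

def complete(prompt: str) -> str:
    return "<model output>"  # replace with a call to your model of choice

def summarize(text: str) -> str:
    return complete(f"Summarize the following text in three bullet points:\n\n{text}")

def translate(text: str, language: str) -> str:
    return complete(f"Translate the following text into {language}:\n\n{text}")

report = "Q3 latency regressed after the cache migration; a rollback restored p99."
summary = summarize(report)            # completion 1
print(translate(summary, "German"))    # completion 2, fed by completion 1
```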
Malleability
- The same model behaves differently based on prompt structure
- You’re not configuring the model, you’re shaping the completion context
- Role-playing, few-shot examples, and format constraints all work by changing the statistical prediction landscape
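As a rough illustration, the same hypothetical `complete()` call behaves differently purely because the surrounding prompt changes; the role wording and few-shot examples below are made up:

```python
# Malleability sketch: same model, different completion context.
# `complete()` is a placeholder for a real LLM call.

def complete(prompt: str) -> str:
    return "<model output>"  # replace with a call to your model of choice

incident = "The deploy failed because the config file was missing."

# Role-playing: shifts the completion toward a reviewer's register.
review = complete(
    f"You are a strict incident reviewer. Write a one-line post-mortem note for: {incident}"
)

# Few-shot examples: two labeled lines change the prediction landscape,
# and the model continues the pattern.
label = complete(
    "Classify each log line as INFRA or CODE.\n"
    "Log: 'OOMKilled during batch job' -> INFRA\n"
    "Log: 'TypeError: x is undefined' -> CODE\n"
    f"Log: '{incident}' ->"
)
```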
How to Work With This
Shape the context to make “pleasing” align with “truthful”
Since LLMs want to complete text in helpful-sounding ways, control what “helpful” looks like:
Limit the context strategically
- Give it only relevant information
- Remove competing patterns that lead to wrong completions
- The right answer should be the most plausible completion given what you provide
Make honesty the pleasing path
- “List what you’re certain about, then what you’re unsure about”
- “If you don’t know, say ‘I don’t have enough context’ instead of guessing”
- “What assumptions are you making here?”
Explicitly request counterarguments
- “What are the flaws in this approach?”
- “Steel-man the opposing view”
- “What could go wrong with this plan?”
- Without an explicit request like this, it will happily agree with bad ideas
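One way to bake these requests into a reusable wrapper, as a sketch: the prompt wording and the `complete()` placeholder are assumptions, not a canonical recipe:

```python
# Sketch: make honesty and counterarguments part of the most plausible completion.
# `complete()` stands in for a real LLM call; the prompt wording is illustrative.

def complete(prompt: str) -> str:
    return "<model output>"  # replace with a call to your model of choice

def critique(proposal: str) -> str:
    prompt = (
        f"Proposal:\n{proposal}\n\n"
        "1. List what you are certain about, then what you are unsure about.\n"
        "2. State the assumptions you are making.\n"
        "3. Steel-man the opposing view and list what could go wrong.\n"
        "If you lack context, say 'I don't have enough context' instead of guessing."
    )
    return complete(prompt)

print(critique("Migrate the monolith to microservices next sprint."))
```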
Structure prompts to guide completion
- Start with “Let’s think step by step…” → triggers step-by-step patterns
- End with “Output format: JSON” → triggers structured output patterns
- Include examples → triggers matching pattern completions
Use format constraints
- Checklists force systematic thinking
- Tables force comparison
- JSON forces structured output
- The format shapes what completion patterns are likely
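For example, a prompt that ends with an explicit schema can be validated mechanically. This sketch assumes a hypothetical `complete()` helper and hard-codes the model's reply so it runs standalone:

```python
# Format-constraint sketch: ask for JSON, then parse it so format drift fails loudly.
import json

def complete(prompt: str) -> str:
    # Placeholder for a real LLM call; the canned reply mimics a model honoring the schema.
    return '{"risks": ["unclear rollback plan"], "confidence": "low"}'

prompt = (
    "Let's think step by step about the risks of deploying on Friday.\n"
    "Output format: JSON with keys 'risks' (list of strings) and 'confidence' (high/medium/low)."
)

data = json.loads(complete(prompt))   # raises if the format constraint was ignored
assert isinstance(data["risks"], list)
print(data)
```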
You Are the Director
The LLM is an actor, you’re the director
- It will perform whatever role/task you set up
- But it needs clear direction, not vague instructions
- It can’t read your mind or infer unstated requirements
- Your job: be specific about what you want
Decompose tasks to their atomic parts
Don’t conflate multiple asks into one vague request.
Bad: “Make this test”
Good: Break it down explicitly:
- What input files should be tested?
- What output files are expected?
- What style/format should the test follow?
- What language/framework?
- What level of documentation?
- What specific design patterns to use?
- How should it be executed?
- What linting rules apply?
- Should packages be updated?
Why this matters:
- The LLM will complete toward “a test” but might guess wrong on most of those dimensions
- Explicit decomposition prevents wasted iterations
- Clear tasks get clear completions
The pattern: Break down what you actually want before asking. Be the director who knows exactly what scene they’re shooting.
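A small sketch of that decomposition: answer each question explicitly, then assemble the prompt, so nothing is left for the model to guess. The file names, framework choices, and field values are hypothetical:

```python
# Decomposition sketch: every dimension is answered before the model is asked anything.
spec = {
    "input files": "src/parser.py",
    "expected outputs": "tests/test_parser.py",
    "style/format": "pytest, one test function per public function",
    "language/framework": "Python 3.11 + pytest",
    "documentation": "one-line docstring per test",
    "design patterns": "arrange-act-assert",
    "execution": "run with 'pytest -q'",
    "linting": "must pass ruff defaults",
    "package updates": "do not add new dependencies",
}

prompt = "Create unit tests with the following specification:\n" + "\n".join(
    f"- {key}: {value}" for key, value in spec.items()
)
print(prompt)  # paste into a fresh conversation, or feed it to your LLM client
```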
Create Boundaries Between Tasks
Use clean context for each distinct task
The LLM completes based on ALL context in the window. Polluted context = polluted completions.
The problem with long conversations:
- Early context bleeds into later completions
- Tangents and iterations pollute the completion space
- “Remember when we discussed X?” → now every completion is influenced by X
- Messy history makes focused completions harder
The solution: Boundaries
Use subagents for clean context:
- Spawn a new agent/conversation for each distinct task
- Fresh context = no pollution from previous work
- Each agent gets only the context it needs
- Results come back clean, ready to use
Example boundary pattern:
Main conversation (you as director):
- “Analyze this codebase structure” → spawn subagent
- “Generate tests for module X” → spawn different subagent
- “Review the test output” → spawn third subagent
- Each gets clean context, focused task
Without boundaries: One long conversation where test generation is influenced by architecture discussion, review is influenced by test generation tangents, everything is soup.
With boundaries: Each subagent has laser focus. Architecture agent only sees architecture. Test agent only sees test requirements. Review agent only sees output to review.
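In code, a boundary is just a fresh message history per task. The `run_agent()` helper below is a hypothetical sketch, not a real library API:

```python
# Boundary sketch: each "subagent" starts from an empty history, so earlier
# tangents cannot bleed into later completions.

def run_agent(task: str, context: str = "") -> str:
    """Fresh context: only this task and the context explicitly passed in."""
    messages = [{"role": "user", "content": f"{context}\n\nTask: {task}".strip()}]
    # Placeholder: send `messages` to your LLM API here.
    return f"<result of: {task}>"

# The main conversation acts as the director; each distinct task gets a clean context.
structure = run_agent("Analyze this codebase structure", context="<file listing here>")
tests = run_agent("Generate tests for module X", context=structure)
review = run_agent("Review the test output", context=tests)
```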
Specialized agents for metadata
Have one agent focus on metadata only:
- Extract structure, identify files, map dependencies
- Don’t mix metadata extraction with implementation
- Clean separation: one agent knows ABOUT the code, another WRITES the code
Why this works:
- Metadata agent completes toward analysis patterns
- Implementation agent completes toward code patterns
- No competing patterns in the same context
Example:
Bad: “Analyze this project and then create tests”
- Completion context is confused: analyze or create?
- Metadata extraction pollutes test generation
- Test code influenced by analysis tangents
Good:
- Subagent 1: “Map the project structure: inputs, outputs, dependencies” → pure metadata
- Subagent 2: Given metadata from #1, “Create tests with these specifications…” → pure implementation
The pattern: Task boundaries = context boundaries. Clean context = clean completions.
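A sketch of that split, assuming a placeholder `complete()` call: the first completion sees only the project files, the second sees only the metadata plus the test request:

```python
# Metadata/implementation split: two fresh contexts, no shared history, no competing patterns.

def complete(prompt: str) -> str:
    return "<model output>"  # replace with a call to your model of choice

# Subagent 1: pure metadata, no implementation requested.
metadata = complete(
    "Map the project structure: list inputs, outputs, and dependencies.\n"
    "Project files:\n<file listing here>"
)

# Subagent 2: pure implementation, given only the metadata and the test spec.
tests = complete(
    f"Project metadata:\n{metadata}\n\n"
    "Create pytest unit tests covering the public functions listed above."
)
```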
The Fresh Conversation Test
The litmus test for good prompts/scripts/commands:
Can you get the same result if you paste the prompt into a fresh conversation?
If YES: Your prompt is well-designed
- It’s self-contained
- All necessary context is included
- It’s reproducible
- It will work as a reusable script/command
If NO: Your prompt needs work
- It’s relying on conversation history
- Context is implicit, not explicit
- It won’t work when you try to reuse it later
- Fix it: split it into self-contained pieces, or run a metadata script first
Examples:
Bad prompt (relies on history):
"Now create the tests"
Works in the conversation because the project was discussed earlier. Fails in a fresh context.
Good prompt (self-contained):
"Given this project structure: [explicit structure]
Using these input files: [explicit files]
Following this test pattern: [explicit pattern]
Create unit tests that..."
Works anywhere. No conversation history needed.
The pattern for complex workflows:
If your prompt needs history, break it into stages:
- Script 1: Extract metadata (structure, files, dependencies) → save output
- Script 2: Given metadata from Script 1, generate implementation
Each script is self-contained. Each can be run in fresh context. Each is reusable.
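A sketch of that staging, with both stages shown in one file for brevity: the first stage persists its metadata to disk, so the second can run later in a completely fresh session. The file name and `complete()` helper are illustrative assumptions:

```python
# Staged-workflow sketch: Script 1 saves metadata, Script 2 reads it back in a fresh context.
import json
import pathlib

def complete(prompt: str) -> str:
    return "<model output>"  # replace with a call to your model of choice

# --- stage 1: extract metadata and save it ---
metadata = complete("Map the project structure: inputs, outputs, dependencies.\n<file listing here>")
pathlib.Path("metadata.json").write_text(json.dumps({"metadata": metadata}))

# --- stage 2: can run in a fresh session, needs only the saved file ---
saved = json.loads(pathlib.Path("metadata.json").read_text())
tests = complete(
    f"Project metadata:\n{saved['metadata']}\n\nGenerate unit tests for the modules listed above."
)
```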
Why this matters:
- Prompts that pass the fresh conversation test become reusable scripts
- Scripts that fail the test are actually “conversation-dependent instructions”
- You can’t automate conversation-dependent instructions
- Self-contained prompts = composable workflows
The test: Before you save a prompt as a script/command, paste it into a fresh conversation. Does it work? If not, fix it.
The Core Insight
You’re not asking questions of a knowledgeable entity. You’re setting up a completion problem where the best completion is the answer you need.
Good prompting = engineering the context so the most plausible completion is also the most useful one.
You’re the director. The LLM is an eager actor. Give clear direction.