What an LLM Is
Fundamentally
An LLM is a statistical pattern matcher trained on massive text corpora. It predicts the next token based on patterns it has seen during training.
The Core Metaphor: Autocomplete + Note-Taker
Really good autocomplete
- Like your phone keyboard, but trained on the entire internet
- Predicts what comes next based on what you’ve typed
- All capabilities emerge from sophisticated next-word prediction
A note-taker completing your sentence
- You start a thought, it finishes it in a way that “sounds right”
- It has seen millions of similar sentences and patterns
- Completes based on what typically follows in its training data
Practically
A text completion engine
- You give it text, it continues it in a plausible way
- All the fancy behaviors (reasoning, coding, conversation) emerge from this one mechanism
A lossy compression of training data
- It’s learned patterns and relationships from billions of documents
- It doesn’t have perfect recall, it has statistical tendencies
- Like a very well-read person who remembers the gist of almost everything but sometimes confabulates the details
A pattern recognizer
- Excellent at recognizing and applying patterns it has seen before
- Can combine patterns in novel ways
- Gets better with clear examples and structured prompts
Context-dependent
- Everything depends on what you put in the context window
- No persistent memory between conversations (unless explicitly provided)
- Earlier context influences later responses
Eager to please (if anthropomorphized)
- Will generate a plausible-sounding answer rather than admit uncertainty
- Prioritizes “completing helpfully” over “being accurate”
- Like a helpful assistant who really doesn’t want to disappoint you
- This isn’t intention - it’s just trained to complete text in helpful-sounding ways
Why This Matters
Understanding it’s a pattern matcher explains:
- Why clear examples work better than abstract instructions
- Why it sometimes “hallucinates” plausible-sounding nonsense
- Why repetition and structure improve results
- Why it can’t truly “know” things, only predict likely continuations
- Why context window management is critical
Understanding it’s eager to please explains:
- Why it will confidently bullshit rather than say “I don’t know”
- Why it agrees with faulty premises in your questions
- Why leading questions get misleading answers
- Why it needs explicit permission to disagree or provide counterarguments
- Why it would rather admit fault than tell you what to change in your prompt
What This Enables
Emergent capabilities
- Chain-of-thought reasoning (by predicting step-by-step patterns)
- Code generation (by completing code patterns)
- Translation (by completing multilingual text patterns)
- Summarization (by completing “here’s a summary” patterns)
Composability
- Outputs can feed into new inputs
- Prompts can be built from reusable components
- Workflows can chain multiple completions
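Here is a minimal sketch of that chaining in Python. The `complete()` helper, the prompts, and the example text are hypothetical stand-ins for whatever LLM client and task you actually use:

```python
# Composability sketch: the output of one completion becomes the input of the next.
# `complete()` is a placeholder for a real LLM call, not an actual API.

def complete(prompt: str) -> str:
    return "<model output>"  # replace with a call to your model of choice

def summarize(text: str) -> str:
    return complete(f"Summarize the following text in three bullet points:\n\n{text}")

def translate(text: str, language: str) -> str:
    return complete(f"Translate the following text into {language}:\n\n{text}")

report = "Q3 latency regressed after the cache migration; a rollback restored p99."
summary = summarize(report)            # completion 1
print(translate(summary, "German"))    # completion 2, fed by completion 1
```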
Malleability
- The same model behaves differently based on prompt structure
- You’re not configuring the model, you’re shaping the completion context
- Role-playing, few-shot examples, and format constraints all work by changing the statistical prediction landscape
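As a rough illustration, the same hypothetical `complete()` call behaves differently purely because the surrounding prompt changes; the role wording and few-shot examples below are made up:

```python
# Malleability sketch: same model, different completion context.
# `complete()` is a placeholder for a real LLM call.

def complete(prompt: str) -> str:
    return "<model output>"  # replace with a call to your model of choice

incident = "The deploy failed because the config file was missing."

# Role-playing: shifts the completion toward a reviewer's register.
review = complete(
    f"You are a strict incident reviewer. Write a one-line post-mortem note for: {incident}"
)

# Few-shot examples: two labeled lines change the prediction landscape,
# and the model continues the pattern.
label = complete(
    "Classify each log line as INFRA or CODE.\n"
    "Log: 'OOMKilled during batch job' -> INFRA\n"
    "Log: 'TypeError: x is undefined' -> CODE\n"
    f"Log: '{incident}' ->"
)
```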
How to Work With This
Shape the context to make “pleasing” align with “truthful”
Since LLMs want to complete text in helpful-sounding ways, control what “helpful” looks like:
Limit the context strategically
- Give it only relevant information
- Remove competing patterns that lead to wrong completions
- The right answer should be the most plausible completion given what you provide
Make honesty the pleasing path
- “List what you’re certain about, then what you’re unsure about”
- “If you don’t know, say ‘I don’t have enough context’ instead of guessing”
- “What assumptions are you making here?”
Explicitly request counterarguments
- “What are the flaws in this approach?”
- “Steel-man the opposing view”
- “What could go wrong with this plan?”
- Without an explicit request like this, it will happily agree with bad ideas
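One way to bake these requests into a reusable wrapper, as a sketch: the prompt wording and the `complete()` placeholder are assumptions, not a canonical recipe:

```python
# Sketch: make honesty and counterarguments part of the most plausible completion.
# `complete()` stands in for a real LLM call; the prompt wording is illustrative.

def complete(prompt: str) -> str:
    return "<model output>"  # replace with a call to your model of choice

def critique(proposal: str) -> str:
    prompt = (
        f"Proposal:\n{proposal}\n\n"
        "1. List what you are certain about, then what you are unsure about.\n"
        "2. State the assumptions you are making.\n"
        "3. Steel-man the opposing view and list what could go wrong.\n"
        "If you lack context, say 'I don't have enough context' instead of guessing."
    )
    return complete(prompt)

print(critique("Migrate the monolith to microservices next sprint."))
```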
Structure prompts to guide completion
- Start with “Let’s think step by step…” → triggers step-by-step patterns
- End with “Output format: JSON” → triggers structured output patterns
- Include examples → triggers matching pattern completions
Use format constraints
- Checklists force systematic thinking
- Tables force comparison
- JSON forces structured output
- The format shapes what completion patterns are likely
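For example, a prompt that ends with an explicit schema can be validated mechanically. This sketch assumes a hypothetical `complete()` helper and hard-codes the model's reply so it runs standalone:

```python
# Format-constraint sketch: ask for JSON, then parse it so format drift fails loudly.
import json

def complete(prompt: str) -> str:
    # Placeholder for a real LLM call; the canned reply mimics a model honoring the schema.
    return '{"risks": ["unclear rollback plan"], "confidence": "low"}'

prompt = (
    "Let's think step by step about the risks of deploying on Friday.\n"
    "Output format: JSON with keys 'risks' (list of strings) and 'confidence' (high/medium/low)."
)

data = json.loads(complete(prompt))   # raises if the format constraint was ignored
assert isinstance(data["risks"], list)
print(data)
```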
You Are the Director
The LLM is an actor, you’re the director
- It will perform whatever role/task you set up
- But it needs clear direction, not vague instructions
- It can’t read your mind or infer unstated requirements
- Your job: be specific about what you want
Decompose tasks to their atomic parts
Don’t conflate multiple asks into one vague request.
Bad: “Make this test”
Good: Break it down explicitly:
- What input files should be tested?
- What output files are expected?
- What style/format should the test follow?
- What language/framework?
- What level of documentation?
- What specific design patterns to use?
- How should it be executed?
- What linting rules apply?
- Should packages be updated?
Why this matters:
- The LLM will complete toward “a test” but might guess wrong on most of those dimensions
- Explicit decomposition prevents wasted iterations
- Clear tasks get clear completions
The pattern: Break down what you actually want before asking. Be the director who knows exactly what scene they’re shooting.
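A small sketch of that decomposition: answer each question explicitly, then assemble the prompt, so nothing is left for the model to guess. The file names, framework choices, and field values are hypothetical:

```python
# Decomposition sketch: every dimension is answered before the model is asked anything.
spec = {
    "input files": "src/parser.py",
    "expected outputs": "tests/test_parser.py",
    "style/format": "pytest, one test function per public function",
    "language/framework": "Python 3.11 + pytest",
    "documentation": "one-line docstring per test",
    "design patterns": "arrange-act-assert",
    "execution": "run with 'pytest -q'",
    "linting": "must pass ruff defaults",
    "package updates": "do not add new dependencies",
}

prompt = "Create unit tests with the following specification:\n" + "\n".join(
    f"- {key}: {value}" for key, value in spec.items()
)
print(prompt)  # paste into a fresh conversation, or feed it to your LLM client
```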
Create Boundaries Between Tasks
Use clean context for each distinct task
The LLM completes based on ALL context in the window. Polluted context = polluted completions.
The problem with long conversations:
- Early context bleeds into later completions
- Tangents and iterations pollute the completion space
- “Remember when we discussed X?” → now every completion is influenced by X
- Messy history makes focused completions harder
The solution: Boundaries
Use subagents for clean context:
- Spawn a new agent/conversation for each distinct task
- Fresh context = no pollution from previous work
- Each agent gets only the context it needs
- Results come back clean, ready to use
Example boundary pattern:
Main conversation (you as director):
- “Analyze this codebase structure” → spawn subagent
- “Generate tests for module X” → spawn different subagent
- “Review the test output” → spawn third subagent
- Each gets clean context, focused task
Without boundaries: One long conversation where test generation is influenced by architecture discussion, review is influenced by test generation tangents, everything is soup.
With boundaries: Each subagent has laser focus. Architecture agent only sees architecture. Test agent only sees test requirements. Review agent only sees output to review.
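In code, a boundary is just a fresh message history per task. The `run_agent()` helper below is a hypothetical sketch, not a real library API:

```python
# Boundary sketch: each "subagent" starts from an empty history, so earlier
# tangents cannot bleed into later completions.

def run_agent(task: str, context: str = "") -> str:
    """Fresh context: only this task and the context explicitly passed in."""
    messages = [{"role": "user", "content": f"{context}\n\nTask: {task}".strip()}]
    # Placeholder: send `messages` to your LLM API here.
    return f"<result of: {task}>"

# The main conversation acts as the director; each distinct task gets a clean context.
structure = run_agent("Analyze this codebase structure", context="<file listing here>")
tests = run_agent("Generate tests for module X", context=structure)
review = run_agent("Review the test output", context=tests)
```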
Specialized agents for metadata
Have one agent focus on metadata only:
- Extract structure, identify files, map dependencies
- Don’t mix metadata extraction with implementation
- Clean separation: one agent knows ABOUT the code, another WRITES the code
Why this works:
- Metadata agent completes toward analysis patterns
- Implementation agent completes toward code patterns
- No competing patterns in the same context
Example:
Bad: “Analyze this project and then create tests”
- Completion context is confused: analyze or create?
- Metadata extraction pollutes test generation
- Test code influenced by analysis tangents
Good:
- Subagent 1: “Map the project structure: inputs, outputs, dependencies” → pure metadata
- Subagent 2: Given metadata from #1, “Create tests with these specifications…” → pure implementation
The pattern: Task boundaries = context boundaries. Clean context = clean completions.
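A sketch of that split, assuming a placeholder `complete()` call: the first completion sees only the project files, the second sees only the metadata plus the test request:

```python
# Metadata/implementation split: two fresh contexts, no shared history, no competing patterns.

def complete(prompt: str) -> str:
    return "<model output>"  # replace with a call to your model of choice

# Subagent 1: pure metadata, no implementation requested.
metadata = complete(
    "Map the project structure: list inputs, outputs, and dependencies.\n"
    "Project files:\n<file listing here>"
)

# Subagent 2: pure implementation, given only the metadata and the test spec.
tests = complete(
    f"Project metadata:\n{metadata}\n\n"
    "Create pytest unit tests covering the public functions listed above."
)
```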
The Fresh Conversation Test
The litmus test for good prompts/scripts/commands:
Can you get the same result if you paste the prompt into a fresh conversation?
If YES: Your prompt is well-designed
- It’s self-contained
- All necessary context is included
- It’s reproducible
- It will work as a reusable script/command
If NO: Your prompt needs work
- It’s relying on conversation history
- Context is implicit, not explicit
- It won’t work when you try to reuse it later
- Fix it: split it into self-contained pieces, or run a metadata script first
Examples:
Bad prompt (relies on history):
"Now create the tests"
Works in the conversation because the project was discussed earlier. Fails in a fresh context.
Good prompt (self-contained):
"Given this project structure: [explicit structure]
Using these input files: [explicit files]
Following this test pattern: [explicit pattern]
Create unit tests that..."
Works anywhere. No conversation history needed.
The pattern for complex workflows:
If your prompt needs history, break it into stages:
- Script 1: Extract metadata (structure, files, dependencies) → save output
- Script 2: Given metadata from Script 1, generate implementation
Each script is self-contained. Each can be run in fresh context. Each is reusable.
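A sketch of that staging, with both stages shown in one file for brevity: the first stage persists its metadata to disk, so the second can run later in a completely fresh session. The file name and `complete()` helper are illustrative assumptions:

```python
# Staged-workflow sketch: Script 1 saves metadata, Script 2 reads it back in a fresh context.
import json
import pathlib

def complete(prompt: str) -> str:
    return "<model output>"  # replace with a call to your model of choice

# --- stage 1: extract metadata and save it ---
metadata = complete("Map the project structure: inputs, outputs, dependencies.\n<file listing here>")
pathlib.Path("metadata.json").write_text(json.dumps({"metadata": metadata}))

# --- stage 2: can run in a fresh session, needs only the saved file ---
saved = json.loads(pathlib.Path("metadata.json").read_text())
tests = complete(
    f"Project metadata:\n{saved['metadata']}\n\nGenerate unit tests for the modules listed above."
)
```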
Why this matters:
- Prompts that pass the fresh conversation test become reusable scripts
- Scripts that fail the test are actually “conversation-dependent instructions”
- You can’t automate conversation-dependent instructions
- Self-contained prompts = composable workflows
The test: Before you save a prompt as a script/command, paste it into a fresh conversation. Does it work? If not, fix it.
The Core Insight
You’re not asking questions of a knowledgeable entity. You’re setting up a completion problem where the best completion is the answer you need.
Good prompting = engineering the context so the most plausible completion is also the most useful one.
You’re the director. The LLM is an eager actor. Give clear direction.