Introduction: Why You Need to Understand the Mechanics
In the previous lesson, we established that LLMs are sophisticated pattern-matching systems, not reasoners or knowledge bases. But that abstract description doesn't fully explain why LLMs sometimes produce astonishing results and sometimes confidently state complete falsehoods.
To use AI effectively and responsibly in grant work, you need to understand the actual mechanics. Not the math (you don't need calculus), but the basic flow: how text gets processed, how predictions get made, and most importantly, why the system can fail in specific ways.
This deeper understanding is what separates grant professionals who use AI as a powerful tool from those who get burned by hallucinations, factual errors, and unreliable outputs.
From Training Data to Model: The Big Picture
Here's the simplified lifecycle of an LLM:
- Collection: Developers collect enormous amounts of text—books, websites, articles, code, etc. (usually hundreds of billions of tokens)
- Processing: This text is converted into tokens and fed to the AI system
- Training: The system learns patterns by trying to predict the next token over and over, billions of times, adjusting its internal weights each time
- Deployment: Once trained, the model is locked (weights are fixed) and deployed for users to interact with
- Inference: When you ask ChatGPT a question, the system uses what it learned to generate new text one token at a time
The key insight: The model learns during training, not during inference. When you use ChatGPT, it's not learning anything. It's running billions of mathematical operations using weights that were set during training months or years ago. This has profound implications.
Key Takeaway
LLMs learn from training data in the past. They cannot learn from your individual conversations. They cannot update their knowledge about recent events. They cannot verify information against current reality. This is why an LLM trained in early 2023 will behave as if the current year is still 2023 (unless explicitly told otherwise in your prompt).
What LLMs Actually "Know" (And What They Don't)
This is where language can mislead us. When we say an LLM "knows" something, what do we really mean?
Consider: What does ChatGPT "know" about the Ford Foundation?
The training data contained thousands of documents mentioning the Ford Foundation—their annual reports, news articles, grant lists, academic papers, etc. The model learned patterns about what words and concepts tend to appear near "Ford Foundation." It learned that text about Ford Foundation often includes words like "philanthropy," "global," "poverty," "grants," "democracy," etc.
When you ask "What is the Ford Foundation?" the model uses these learned patterns to generate text that sounds like what you'd expect from documents about Ford Foundation. But it doesn't have a stored representation of "Ford Foundation facts." It has statistical associations between tokens.
The Crucial Distinction
You might have a notebook where you write: "Ford Foundation was founded in 1936, focuses on reducing inequities globally, headquartered in New York, has given over $100 billion to causes." If you're asked about Ford Foundation, you look in your notebook. An LLM doesn't have a notebook. It has learned that when the token "founded" appears near "Ford Foundation," it's often followed by a year. But which year? It generates whichever year most frequently appeared in similar contexts in its training data. Sometimes that's accurate. Sometimes it's not.
How Text Gets Generated Step by Step
When you type a prompt into ChatGPT and hit enter, here's what happens, in extremely simplified form:
Step 1: Tokenization
Your prompt is broken into tokens. If you write "I need help with my grant application," that becomes something like: [I] [need] [help] [with] [my] [grant] [application]. (Real tokenizers often split words into smaller subword pieces, so "application" might become two or more tokens.) Each token is converted to a number the system can process.
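A drastically simplified sketch of this step in Python. Real tokenizers (such as BPE) split text into subword pieces and carry vocabularies of tens of thousands of entries; the word-level version below, with a vocabulary invented for illustration, only shows the core idea of mapping text to integer IDs:

```python
# Toy word-level tokenizer. Real systems use subword tokenization (e.g., BPE),
# but the principle is the same: text in, integer IDs out.

def build_vocab(corpus):
    """Assign each unique word an integer ID, in order of first appearance."""
    vocab = {}
    for word in corpus.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab

def tokenize(text, vocab):
    """Convert text into a list of token IDs."""
    return [vocab[w] for w in text.lower().split()]

vocab = build_vocab("i need help with my grant application")
ids = tokenize("I need help with my grant application", vocab)
print(ids)  # [0, 1, 2, 3, 4, 5, 6]
```

From the model's point of view, your prompt is only ever this list of numbers; it never sees letters or words directly.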
Step 2: Processing Through Layers
These tokens flow through multiple layers of mathematical operations. The model has learned which patterns are important and weights different parts of your prompt accordingly. This is where "attention" mechanisms matter—the system can focus on different parts of your prompt when deciding what to generate next.
Step 3: Predicting the Next Token
The system generates something called a "logit vector"—essentially, a score for every possible next token in its vocabulary (often 50,000+ tokens). These scores are converted into probabilities. Maybe the system thinks there's a 45% chance the next token should be "application," a 20% chance "proposal," a 15% chance "process," etc.
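The conversion from raw scores to probabilities is done with the softmax function. A minimal sketch, using invented scores for three candidate next tokens:

```python
import math

def softmax(logits):
    """Convert raw logit scores into a probability distribution summing to 1."""
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logit scores for three candidate next tokens
logits = {"application": 2.0, "proposal": 1.2, "process": 0.9}
probs = softmax(list(logits.values()))
for token, p in zip(logits, probs):
    print(f"{token}: {p:.2f}")  # "application" gets the highest probability
```

In a real model this happens over the full vocabulary (50,000+ entries), not three, but the math is identical.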
Step 4: Sampling
The system doesn't always pick the most likely token. It samples from the probability distribution, with some randomness. This is where temperature comes in. Low temperature means pick the most likely token almost always (at temperature 0, always: fully deterministic). High temperature means give less-likely tokens a real chance to be picked (creative, but sometimes weird).
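Temperature sampling can be sketched concretely with the same invented logit scores; dividing the logits by the temperature before applying softmax is the standard formulation:

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, softmax, then sample one token index."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # random.choices draws one index according to the probability weights
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

tokens = ["application", "proposal", "process"]
logits = [2.0, 1.2, 0.9]   # invented scores for illustration
# Low temperature sharpens the distribution (top token almost always wins);
# high temperature flattens it, giving rarer tokens a real chance.
for t in (0.2, 1.5):
    picks = [tokens[sample_with_temperature(logits, t)] for _ in range(1000)]
    print(f"temperature {t}: 'application' chosen {picks.count('application') / 1000:.0%} of the time")
```

Running this shows the same scores producing near-deterministic output at low temperature and varied output at high temperature, which is exactly the trade-off described above.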
Step 5: Repeat
The newly generated token is added to the sequence, and the system repeats from Step 2. Generate next token. Repeat. Keep going until the system decides to stop (usually when it predicts an "end of sequence" token) or until you hit a length limit.
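The whole predict-sample-append loop can be sketched with a toy "model". Here a hand-written bigram table stands in for billions of learned weights, purely for illustration:

```python
import random

# Toy stand-in for a trained model: each token maps to its possible successors.
# A real LLM computes these candidates (with probabilities) from learned weights.
BIGRAMS = {
    "<start>": ["my"],
    "my": ["grant"],
    "grant": ["application", "proposal"],
    "application": ["<end>"],
    "proposal": ["<end>"],
}

def generate(max_tokens=10):
    sequence = ["<start>"]
    while len(sequence) < max_tokens:            # length limit
        candidates = BIGRAMS.get(sequence[-1], ["<end>"])
        next_token = random.choice(candidates)   # Step 4: sampling
        if next_token == "<end>":                # stop: end-of-sequence predicted
            break
        sequence.append(next_token)              # Step 5: append and repeat
    return " ".join(sequence[1:])

print(generate())  # e.g. "my grant application"
```

Note that the loop has no step where truth is checked: the only inputs to each decision are the tokens generated so far.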
Why This Matters: The Confidence Problem
Here's the critical problem: An LLM has no way to know if what it's generating is true.
Remember, the system is just predicting which token is likely to come next based on patterns in training data. If you ask "What are the grant deadlines for Ford Foundation's Strengthening US Democracy program?" the system will generate text that sounds like it's answering your question. It will generate plausible-sounding deadlines, program descriptions, application procedures.
But the system has no ability to:
- Check if those deadlines are current (training data is from the past)
- Verify against Ford Foundation's actual website
- Know if it's hallucinating entirely
- Flag uncertainty when it's unsure
The system generates with equal confidence whether it's right or wrong. A false statement feels just as fluent, authoritative, and reasonable as a true one, because both are just predictions based on statistical patterns.
Apply This: Always Verify
Make this a rule: Never use AI-generated grant deadlines, funder names, program descriptions, or eligibility criteria without verifying against the actual funder source. The AI sounds confident either way. Only human verification protects you from submitting grants based on hallucinated information.
Context Windows and Memory
LLMs have a "context window"—the amount of text they can see at once. Claude 3.5 Sonnet has a 200,000-token context window. That's roughly 150,000 words. GPT-4 Turbo supports 128,000 tokens.
What's inside the context window? Your entire conversation from the beginning—until it no longer fits. This is why early messages in a long conversation get "forgotten": once the conversation exceeds the window, the oldest messages are typically truncated, and the model never sees them at all.
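One common (though not the only) way systems handle an over-full window is to drop the oldest messages. A sketch, using a crude words-as-tokens approximation; real systems count subword tokens:

```python
# Why old messages get "forgotten": when a conversation exceeds the context
# window, the oldest turns are dropped before the model ever sees the prompt.
# Word counts here are a rough stand-in for real token counts.

def fit_to_window(messages, window_tokens):
    """Keep the most recent messages that fit inside the window."""
    kept, used = [], 0
    for msg in reversed(messages):        # walk newest to oldest
        cost = len(msg.split())           # crude token estimate
        if used + cost > window_tokens:
            break                         # this and anything older is dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))           # restore chronological order

conversation = [
    "Our mission is ending homelessness in Travis County.",
    "Here are our 2023 program statistics.",
    "Draft the needs statement.",
    "Now draft the budget narrative.",
]
# With a tiny 12-token window, only the two most recent messages survive
print(fit_to_window(conversation, window_tokens=12))
```

The practical consequence: facts stated early in a very long conversation (your mission statement, your statistics) can silently vanish from the model's view.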
Crucially: The model doesn't have persistent memory between conversations. If you have a conversation with Claude today and another conversation tomorrow, the system doesn't remember you. Each conversation starts fresh. This is by design (privacy) but has practical implications: you can't build a relationship with an AI where it learns your organization's specific context and preferences over time.
Real Example: Same Organization, Different Results
You might use Claude to help draft a grant proposal for your homeless services nonprofit. You have a great conversation, and Claude picks up your organization's voice and priorities from the context you provide. But in a conversation tomorrow, Claude has no memory of your organization. You'd need to re-explain your mission, programs, and statistics each time. This is why many organizations are moving toward grant-specific platforms that maintain persistent knowledge about their organization.
Temperature and Creative vs. Reliable Outputs
Temperature is a setting that controls randomness in an LLM's output:
- Low temperature (0.0-0.3): The system picks the most likely token almost always. Output is predictable and consistent. Good for factual tasks (translations, data extraction)
- Medium temperature (0.5-0.7): Mix of likely and less-likely tokens. Good for writing that needs to be natural and varied but still coherent
- High temperature (0.8-1.0): More randomness, more creative variation. Good for brainstorming. Less reliable for factual or technical content
For grant work, you typically want lower temperature. You don't want your proposals full of creative tangents and inventive but possibly incorrect details. You want reliable, consistent, accurate output. Most professional AI tools default to lower temperature settings for this reason.
Hallucinations: Where the System Goes Wrong
A "hallucination" is when an LLM generates false information confidently. Why does it happen?
- No ground truth: The system has no access to current information or ability to verify facts
- Pattern misapplication: Statistical patterns that usually hold get applied in contexts where they don't (for example, "founded in" followed by a plausible but wrong year)
- Overconfidence: The system writes about wrong things just as fluently as it writes about right things
- Context confusion: Sometimes patterns in your prompt accidentally activate patterns the system learned from irrelevant training data
Key Takeaway
Hallucinations aren't bugs or defects. They're inherent to how these systems work. An LLM that never hallucinated would be a fundamentally different system. Accepting this limitation is part of responsible AI use in grant work. Your job is to catch hallucinations through human review and verification, because the AI can't.
Why Understanding This Matters for Grants
You now understand:
- LLMs learn from training data, not from your conversations
- They generate text token-by-token using statistical patterns
- They have no way to verify if what they're saying is true
- They're equally confident when wrong as when right
- They can't access current information or check external sources
- They don't have persistent memory between conversations
- Hallucinations are built into the system, not avoidable bugs
This is why AI is useful as a brainstorming and drafting tool but dangerous if used as a research or verification tool. You must maintain the skepticism and verification discipline of a professional grant writer, using AI to amplify your productivity, not to replace your judgment.
Continue Your CAGP Journey
Now that you understand how LLMs work, we'll explore what they can actually do well in grant writing—and which specific tasks genuinely benefit from AI assistance.