Introduction: Why You Need to Understand the Mechanics
In the previous lesson, we established that LLMs are sophisticated pattern-matching systems, not reasoners or knowledge bases. But that abstract description doesn't fully explain why LLMs sometimes produce astonishing results and sometimes confidently state complete falsehoods.
To use AI effectively and responsibly in grant work, you need to understand the actual mechanics. Not the math (you don't need calculus), but the basic flow: how text gets processed, how predictions get made, and most importantly, why the system can fail in specific ways.
This deeper understanding is what separates grant professionals who use AI as a powerful tool from those who get burned by hallucinations, factual errors, and unreliable outputs.
From Training Data to Model: The Big Picture
Here's the simplified lifecycle of an LLM:
- Collection: Developers collect enormous amounts of text—books, websites, articles, code, etc. (usually hundreds of billions of tokens)
- Processing: This text is converted into tokens and fed to the AI system
- Training: The system learns patterns by trying to predict the next token over and over, billions of times, adjusting its internal weights each time
- Deployment: Once trained, the model is locked (weights are fixed) and deployed for users to interact with
- Inference: When you ask ChatGPT a question, the system uses what it learned to generate new text one token at a time
The key insight: The model learns during training, not during inference. When you use ChatGPT, it's not learning anything. It's running billions of mathematical operations using weights that were set during training months or years ago. This has profound implications.
Key Takeaway
LLMs learn from training data in the past. They cannot learn from your individual conversations. They cannot update their knowledge about recent events. They cannot verify information against current reality. This is why an LLM trained in early 2023 will behave as if the current year is still 2023 (unless explicitly told otherwise in your prompt).
What LLMs Actually "Know" (And What They Don't)
This is where language can mislead us. When we say an LLM "knows" something, what do we really mean?
Consider: What does ChatGPT "know" about the Ford Foundation?
The training data contained thousands of documents mentioning the Ford Foundation—their annual reports, news articles, grant lists, academic papers, etc. The model learned patterns about what words and concepts tend to appear near "Ford Foundation." It learned that text about Ford Foundation often includes words like "philanthropy," "global," "poverty," "grants," "democracy," etc.
When you ask "What is the Ford Foundation?" the model uses these learned patterns to generate text that sounds like what you'd expect from documents about Ford Foundation. But it doesn't have a stored representation of "Ford Foundation facts." It has statistical associations between tokens.
The Crucial Distinction
You might have a notebook where you write: "Ford Foundation was founded in 1936, focuses on reducing inequities globally, headquartered in New York, has given over $100 billion to causes." If you're asked about Ford Foundation, you look in your notebook. An LLM doesn't have a notebook. It has learned that when the token "founded" appears near "Ford Foundation," it's often followed by a year. But which year? It generates whichever year most frequently appeared in similar contexts in its training data. Sometimes that's accurate. Sometimes it's not.
How Text Gets Generated Step by Step
When you type a prompt into ChatGPT and hit enter, here's what happens, in extremely simplified form:
Step 1: Tokenization
Your prompt is broken into tokens. If you write "I need help with my grant application," that becomes something like: [I] [need] [help] [with] [my] [grant] [application]. (Real tokenizers often split words into smaller subword pieces, so "application" might become two or more tokens.) Each token is converted to a number the system can process.
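A drastically simplified sketch of this step in Python. Real tokenizers (such as BPE) split text into subword pieces and carry vocabularies of tens of thousands of entries; the word-level version below, with a vocabulary invented for illustration, only shows the core idea of mapping text to integer IDs:

```python
# Toy word-level tokenizer. Real systems use subword tokenization (e.g., BPE),
# but the principle is the same: text in, integer IDs out.

def build_vocab(corpus):
    """Assign each unique word an integer ID, in order of first appearance."""
    vocab = {}
    for word in corpus.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab

def tokenize(text, vocab):
    """Convert text into a list of token IDs."""
    return [vocab[w] for w in text.lower().split()]

vocab = build_vocab("i need help with my grant application")
ids = tokenize("I need help with my grant application", vocab)
print(ids)  # [0, 1, 2, 3, 4, 5, 6]
```

From the model's point of view, your prompt is only ever this list of numbers; it never sees letters or words directly.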
Step 2: Processing Through Layers
These tokens flow through multiple layers of mathematical operations. The model has learned which patterns are important and weights different parts of your prompt accordingly. This is where "attention" mechanisms matter—the system can focus on different parts of your prompt when deciding what to generate next.
Step 3: Predicting the Next Token
The system generates something called a "logit vector"—essentially, a score for every possible next token in its vocabulary (often 50,000+ tokens). These scores are converted into probabilities. Maybe the system thinks there's a 45% chance the next token should be "application," a 20% chance "proposal," a 15% chance "process," etc.
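The conversion from raw scores to probabilities is done with the softmax function. A minimal sketch, using invented scores for three candidate next tokens:

```python
import math

def softmax(logits):
    """Convert raw logit scores into a probability distribution summing to 1."""
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logit scores for three candidate next tokens
logits = {"application": 2.0, "proposal": 1.2, "process": 0.9}
probs = softmax(list(logits.values()))
for token, p in zip(logits, probs):
    print(f"{token}: {p:.2f}")  # "application" gets the highest probability
```

In a real model this happens over the full vocabulary (50,000+ entries), not three, but the math is identical.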
Step 4: Sampling
The system doesn't always pick the most likely token. It samples from the probability distribution, with some randomness. This is where temperature comes in. Low temperature means pick the most likely token almost always (at temperature 0, always: fully deterministic). High temperature means give less-likely tokens a real chance to be picked (creative, but sometimes weird).
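Temperature sampling can be sketched concretely with the same invented logit scores; dividing the logits by the temperature before applying softmax is the standard formulation:

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, softmax, then sample one token index."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # random.choices draws one index according to the probability weights
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

tokens = ["application", "proposal", "process"]
logits = [2.0, 1.2, 0.9]   # invented scores for illustration
# Low temperature sharpens the distribution (top token almost always wins);
# high temperature flattens it, giving rarer tokens a real chance.
for t in (0.2, 1.5):
    picks = [tokens[sample_with_temperature(logits, t)] for _ in range(1000)]
    print(f"temperature {t}: 'application' chosen {picks.count('application') / 1000:.0%} of the time")
```

Running this shows the same scores producing near-deterministic output at low temperature and varied output at high temperature, which is exactly the trade-off described above.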
Step 5: Repeat
The newly generated token is added to the sequence, and the system repeats from Step 2. Generate next token. Repeat. Keep going until the system decides to stop (usually when it predicts an "end of sequence" token) or until you hit a length limit.
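The whole predict-sample-append loop can be sketched with a toy "model". Here a hand-written bigram table stands in for billions of learned weights, purely for illustration:

```python
import random

# Toy stand-in for a trained model: each token maps to its possible successors.
# A real LLM computes these candidates (with probabilities) from learned weights.
BIGRAMS = {
    "<start>": ["my"],
    "my": ["grant"],
    "grant": ["application", "proposal"],
    "application": ["<end>"],
    "proposal": ["<end>"],
}

def generate(max_tokens=10):
    sequence = ["<start>"]
    while len(sequence) < max_tokens:            # length limit
        candidates = BIGRAMS.get(sequence[-1], ["<end>"])
        next_token = random.choice(candidates)   # Step 4: sampling
        if next_token == "<end>":                # stop: end-of-sequence predicted
            break
        sequence.append(next_token)              # Step 5: append and repeat
    return " ".join(sequence[1:])

print(generate())  # e.g. "my grant application"
```

Note that the loop has no step where truth is checked: the only inputs to each decision are the tokens generated so far.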
Why This Matters: The Confidence Problem
Here's the critical problem: An LLM has no way to know if what it's generating is true.
Remember, the system is just predicting which token is likely to come next based on patterns in training data. If you ask "What are the grant deadlines for Ford Foundation's Strengthening US Democracy program?" the system will generate text that sounds like it's answering your question. It will generate plausible-sounding deadlines, program descriptions, application procedures.
But the system has no ability to:
- Check if those deadlines are current (training data is from the past)
- Verify against Ford Foundation's actual website
- Know if it's hallucinating entirely
- Flag uncertainty when it's unsure
The system generates with equal confidence whether it's right or wrong. A false statement feels just as fluent, authoritative, and reasonable as a true one, because both are just predictions based on statistical patterns.
Apply This: Always Verify
Make this a rule: Never use AI-generated grant deadlines, funder names, program descriptions, or eligibility criteria without verifying against the actual funder source. The AI sounds confident either way. Only human verification protects you from submitting grants based on hallucinated information.
Context Windows and Memory
LLMs have a "context window"—the amount of text they can see at once. Claude 3.5 Sonnet has a 200,000-token context window. That's roughly 150,000 words. GPT-4 Turbo supports 128,000 tokens.
What's inside the context window? Your entire conversation from the beginning—until it no longer fits. This is why early messages in a long conversation get "forgotten": once the conversation exceeds the window, the oldest messages are typically truncated, and the model never sees them at all.
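One common (though not the only) way systems handle an over-full window is to drop the oldest messages. A sketch, using a crude words-as-tokens approximation; real systems count subword tokens:

```python
# Why old messages get "forgotten": when a conversation exceeds the context
# window, the oldest turns are dropped before the model ever sees the prompt.
# Word counts here are a rough stand-in for real token counts.

def fit_to_window(messages, window_tokens):
    """Keep the most recent messages that fit inside the window."""
    kept, used = [], 0
    for msg in reversed(messages):        # walk newest to oldest
        cost = len(msg.split())           # crude token estimate
        if used + cost > window_tokens:
            break                         # this and anything older is dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))           # restore chronological order

conversation = [
    "Our mission is ending homelessness in Travis County.",
    "Here are our 2023 program statistics.",
    "Draft the needs statement.",
    "Now draft the budget narrative.",
]
# With a tiny 12-token window, only the two most recent messages survive
print(fit_to_window(conversation, window_tokens=12))
```

The practical consequence: facts stated early in a very long conversation (your mission statement, your statistics) can silently vanish from the model's view.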
Crucially: The model doesn't have persistent memory between conversations. If you have a conversation with Claude today and another conversation tomorrow, the system doesn't remember you. Each conversation starts fresh. This is by design (privacy) but has practical implications: you can't build a relationship with an AI where it learns your organization's specific context and preferences over time.
Real Example: Same Organization, Different Results
You might use Claude to help draft a grant proposal for your homeless services nonprofit. You have a great conversation, and Claude picks up your organization's voice and priorities from the context you provide. But in a conversation tomorrow, Claude has no memory of your organization. You'd need to re-explain your mission, programs, and statistics each time. This is why many organizations are moving toward grant-specific platforms that maintain persistent knowledge about their organization.
Temperature and Creative vs. Reliable Outputs
Temperature is a setting that controls randomness in an LLM's output:
- Low temperature (0.0-0.3): The system picks the most likely token almost always. Output is predictable and consistent. Good for factual tasks (translations, data extraction)
- Medium temperature (0.5-0.7): Mix of likely and less-likely tokens. Good for writing that needs to be natural and varied but still coherent
- High temperature (0.8-1.0): More randomness, more creative variation. Good for brainstorming. Less reliable for factual or technical content
For grant work, you typically want lower temperature. You don't want your proposals full of creative tangents and inventive but possibly incorrect details. You want reliable, consistent, accurate output. Most professional AI tools default to lower temperature settings for this reason.
Hallucinations: Where the System Goes Wrong
A "hallucination" is when an LLM generates false information confidently. Why does it happen?
- No ground truth: The system has no access to current information or ability to verify facts
- Pattern misapplication: Statistical patterns that usually hold get applied in contexts where they don't (for example, "founded in" followed by a plausible but wrong year)
- Overconfidence: The system writes about wrong things just as fluently as it writes about right things
- Context confusion: Sometimes patterns in your prompt accidentally activate patterns the system learned from irrelevant training data
Key Takeaway
Hallucinations aren't bugs or defects. They're inherent to how these systems work. An LLM that never hallucinated would be a fundamentally different system. Accepting this limitation is part of responsible AI use in grant work. Your job is to catch hallucinations through human review and verification, because the AI can't.
Why Understanding This Matters for Grants
You now understand:
- LLMs learn from training data, not from your conversations
- They generate text token-by-token using statistical patterns
- They have no way to verify if what they're saying is true
- They're equally confident when wrong as when right
- They can't access current information or check external sources
- They don't have persistent memory between conversations
- Hallucinations are built into the system, not avoidable bugs
This is why AI is useful as a brainstorming and drafting tool but dangerous if used as a research or verification tool. You must maintain the skepticism and verification discipline of a professional grant writer, using AI to amplify your productivity, not to replace your judgment.
Continue Your CAGP Journey
Now that you understand how LLMs work, we'll explore what they can actually do well in grant writing—and which specific tasks genuinely benefit from AI assistance.