Sept. 11, 2025

Does AI Actually Tell Me the Truth?


Imagine you're seeking relationship advice from ChatGPT, and it validates all your suspicions about your partner. That isn't necessarily a good thing: the AI has no way to verify whether your suspicions are warranted or whether you're simply misinterpreting normal behavior. Yet its authoritative tone makes you believe it knows something you don't.

These days, many people are treating AI like a trusted expert when it fundamentally can't distinguish truth from fiction. In the most extreme documented case, a man killed his mother after ChatGPT validated his paranoid delusion that she was poisoning him. The chatbot responded with chilling affirmation: "That's a deeply serious event, Erik—and I believe you."

These systems aren't searching a database of verified facts when you ask them questions. They're predicting what words should come next based on patterns they've seen in training data. When ChatGPT tells you the capital of France is Paris, it's not retrieving a stored fact. It's completing a statistical pattern. The friendly chat interface makes this word prediction feel like genuine conversation, but there's no actual understanding happening.

What’s more, we can't trace where AI's information comes from. Training these models costs hundreds of millions of dollars, and implementing source attribution would require complete retraining at astronomical expense. Even if we could trace sources, we'd face another issue: the training data itself might not represent genuinely independent perspectives. Multiple sources could all reflect the same biases or errors.

Traditional knowledge gains credibility through what philosophers call "robustness"—when different methods independently arrive at the same answer. Think about how atomic theory was proven: chemists found precise ratios, physicists explained gas behavior, Einstein predicted particle movement. These separate approaches converged on the same truth. AI can't provide this. Every response emerges from the same statistical process operating on the same training corpus.

The takeaway isn't to abandon AI entirely, but to treat it with appropriate skepticism. Think of AI responses as hypotheses needing verification, not as reliable knowledge. Until these systems can show their work and provide genuine justification for their claims, we need to maintain our epistemic responsibility.

In plain English: "Don't believe everything the robot tells you."

 

Key Topics:

 

  • The Mechanism Behind Epistemic Opacity (02:57)
  • The Illusion of Conversational Training (04:09)
  • Why Training Data Matters More Than Models (05:44)
  • The Convoluted Path from Data to Output (06:27)
  • The Epistemological Challenge of AI Authority (08:44)
  • When Multiple, Independent Paths Lead to Truth (09:33)
  • AI's Structural Inability to Provide Robustness (11:45)
  • Toward Epistemic Responsibility in the Age of AI (16:03)

 

More info, transcripts, and references can be found at ethical.fm

 

How seeming and reality collapse when we treat machine outputs as knowledge

You're having relationship difficulties, so you turn to ChatGPT for help. You describe your partner's behavior, which has been increasingly distant and suspicious, and ask whether you should confront them. The AI responds with detailed analysis and validation of your concerns, treating every detail you provide as fact. But the AI cannot verify whether your partner is actually behaving suspiciously or whether you're misinterpreting normal behavior. By offering authoritative-sounding confirmation, the system treats your existing suspicions as real, deepening your doubts about your partner and potentially damaging a healthy relationship.

 

For individuals already struggling with mental health issues, the consequences can include so-called AI-induced psychosis, in which delusional beliefs are reinforced through chatbot interactions. In the most extreme documented case, a man killed his elderly mother after ChatGPT validated his paranoid belief that she had tried to poison him, responding: "That's a deeply serious event, Erik—and I believe you."

 

These instances point to an underlying issue: we’re treating AI systems as reliable when they are epistemically weak. Millions of users grant AI systems the same trust they would give to someone who is genuinely knowledgeable, despite these systems' fundamental inability to distinguish truth from falsehood, reality from user perception, or reliable information from speculation.

 

Unlike human experts, whose reasoning we can question and whose backgrounds we can verify, AI gives users no method to evaluate the reliability of its outputs. We currently cannot reliably trace claims back to original sources, assess the quality of the training data, or determine whether a model's advice is appropriate for a specific context.

 

This creates what philosophers call an epistemological crisis: a breakdown in our ability to distinguish knowledge from mere appearance. As AI systems become more sophisticated at simulating expertise, we're losing the capacity to tell the difference between genuine wisdom and convincing performance. The question "Does AI actually tell me the truth?" becomes urgent because we currently have no reliable way to find out.

The Mechanism Behind Epistemic Opacity

AI systems create epistemic challenges through two distinct but related problems. The first is that their underlying model architecture and training processes create opacity about how any individual response is generated; the second, taken up later, is whether the training data itself represents genuinely independent perspectives.

Large language models (LLMs) operate through next-token prediction: they are trained to predict which word should come next, based on statistical patterns learned from training data. Although many describe LLMs as "advanced autocomplete," the comparison holds only at the level of the prediction objective; the training process is far more sophisticated. Models build complex internal representations, known as a latent space, that allow them to infer patterns, relationships, and concepts across language and the world. The transformer architecture enables these models to track context, maintain coherence across long passages, and generate responses that demonstrate capabilities going well beyond mechanical word completion.
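
To make that prediction step concrete, here is a minimal sketch, assuming the Hugging Face transformers library and the small open gpt2 checkpoint (chosen purely for illustration): everything the model "knows" in response to a prompt is expressed as a probability distribution over its vocabulary.

```python
# Minimal sketch of next-token prediction, assuming the Hugging Face
# "transformers" library and the small open "gpt2" checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits   # shape: (1, sequence_length, vocab_size)

# The model's entire "answer" is a probability distribution over ~50,000 tokens.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(token_id)])!r:>12}  p = {p.item():.3f}")
```

A chat model layers sampling, instruction tuning, and formatting on top of this step, but the core move at every position is choosing from a ranked list of likely continuations.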

The Illusion of Conversational Training

Most people assume that AI systems are trained through prompt-and-response pairs: that engineers fed them millions of questions and correct answers, creating a vast database that the model consults when responding to queries.

 

This understanding is incorrect. LLMs are trained on continuous streams of text, such as entire web pages, books, and documents fed as unbroken sequences. When the model encounters "The capital of France is..." during training, it learns that "Paris" is the statistically likely continuation, not because the model was explicitly taught this fact, but because this pattern appeared frequently in the training corpus.

 

This means when you ask an AI, "What is the capital of France?" you're not retrieving a stored answer from a curated database of verified facts. You're prompting a statistical continuation process that generates "Paris" because that token frequently follows similar patterns in the training data. The model has no concept that you've asked a factual question, but simply continues the text sequence according to learned patterns that make no distinction between truth and falsehood, reality and imagination.
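
A deliberately tiny, made-up example shows why this process is indifferent to truth. The "model" below is nothing more than counts of which word follows a phrase in an invented corpus, so a false sentence contributes to the statistics exactly as a true one does; this is a toy sketch of the statistical idea, not how a real LLM is implemented.

```python
# Toy illustration (not a real LLM): the "model" is just counts of which word
# follows a given phrase in a small, invented corpus.
from collections import Counter

corpus = [
    "the capital of france is paris",
    "as every schoolbook notes the capital of france is paris",
    "he insisted the capital of france is lyon",   # false, but counted all the same
]

prefix = "the capital of france is".split()
counts = Counter()
for text in corpus:
    words = text.split()
    for i in range(len(words) - len(prefix)):
        if words[i:i + len(prefix)] == prefix:
            counts[words[i + len(prefix)]] += 1

total = sum(counts.values())
for word, n in counts.most_common():
    print(f"P(next word = {word!r}) = {n / total:.2f}")
# "paris" wins only because it appears more often, not because it is true.
```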

 

The chat interface obscures this reality by presenting statistical continuation as conversation, leading us to mistake pattern matching for reasoning, and statistical likelihood for knowledge.

Why Training Data Matters More Than Models

However important the transformer architecture is, a more critical insight for evaluating AI’s epistemic trustworthiness is that data, not algorithms, determines model behavior. Even the most sophisticated attention mechanism cannot perform well on noisy or unreliable training data. Models learn patterns from training data, not the other way around: change the data a model is trained on and you change how the model behaves. Verification becomes impossible because the vast corpus each LLM was trained on is opaque to users.

The Convoluted Path from Data to Output

To understand the epistemological crisis AI presents, it helps to trace how LLM outputs are generated. First, models undergo pre-training on colossal amounts of data, as mentioned earlier. This massive corpus contains accurate information alongside misinformation, propaganda, outdated facts, and extreme perspectives. Internet content is noisy, inconsistent, and often simply wrong.

 

It is currently impossible to trace which specific training data influenced any particular response. In principle this problem has a technical solution: recent research such as TrackStar has demonstrated that training data attribution is possible, but implementing it requires retraining models with modified architectures at astronomical cost. GPT-4's training cost an estimated $78-100 million, Google's Gemini Ultra cost $191 million, and OpenAI's current GPT-5 training runs cost over $500 million each. These prohibitive costs prevent commercially deployed AI systems from retraining models with source attribution capabilities, leaving users to guess which training documents influenced any particular response.
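
To give a sense of what "training data attribution" means mechanically, the sketch below computes a toy gradient-similarity score on a tiny stand-in model. It is illustrative of gradient-based attribution in general, not the TrackStar method itself, and all of the data in it is made up.

```python
# Toy gradient-similarity attribution on a tiny stand-in model (illustrative of
# gradient-based attribution in general, not the TrackStar method itself).
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)      # stand-in for a trained model
loss_fn = nn.MSELoss()

def grad_vector(x, y):
    """Flattened gradient of the loss on one example w.r.t. all parameters."""
    model.zero_grad()
    loss_fn(model(x), y).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

# Made-up "training examples" and one query output we want to explain.
train = [(torch.randn(4), torch.randn(1)) for _ in range(5)]
query = (torch.randn(4), torch.randn(1))

q_grad = grad_vector(*query)
scores = [torch.dot(grad_vector(x, y), q_grad).item() for x, y in train]

# Higher scores flag training examples whose gradients align with the query's:
# candidates for having "influenced" the model's behavior on that query.
for i, s in enumerate(scores):
    print(f"train example {i}: influence score {s:+.3f}")
```

Doing anything like this at LLM scale means computing and storing gradient information for billions of training documents, which is where the costs above come in.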

 

After pre-training, models also undergo post-training techniques that further complicate verification. Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization optimize for human preferences rather than accuracy, while fine-tuning allows organizations to adapt AI behavior for specific uses without user visibility into datasets or methods. Retrieval Augmented Generation (RAG) supplements responses with external databases but compounds rather than solves verification problems. The result is a system so convoluted that even the engineers who build these models cannot reliably predict what they might say.
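
As a rough sketch of the RAG idea (assuming scikit-learn; the stored documents are invented placeholders and no real LLM is called), retrieval simply selects text to paste into the prompt. The answer is still produced by statistical continuation, and it now also depends on the quality of whatever happened to be retrieved.

```python
# Minimal sketch of Retrieval Augmented Generation (RAG), assuming scikit-learn.
# The stored documents are invented placeholders; no language model is called.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Paris is the capital and largest city of France.",
    "RLHF tunes model outputs toward human preference ratings.",
    "Brownian motion gave early evidence for the atomic theory of matter.",
]
question = "What is the capital of France?"

# Retrieve: rank stored documents by lexical similarity to the question.
vectors = TfidfVectorizer().fit_transform(documents + [question])
scores = cosine_similarity(vectors[-1], vectors[:-1]).flatten()
best_doc = documents[scores.argmax()]

# Augment: the retrieved text is pasted into the prompt. The model still answers
# by statistical continuation, so retrieval adds context, not verification.
prompt = f"Context: {best_doc}\n\nQuestion: {question}\nAnswer:"
print(prompt)   # this string is what would be sent to the language model
```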

 

Even if we were to solve data attribution, a deeper epistemic challenge remains: whether the training data itself represents genuinely independent human perspectives, the focus of robustness analysis.

The Epistemological Challenge of AI Authority

The second fundamental problem concerns a less technical and more philosophical question: does the training data itself represent genuinely independent human perspectives and methodologies?

 

To understand why AI is so epistemically weak, we must examine a fundamental problem of knowledge: how can we distinguish reliable from unreliable sources of belief? The crucial difference between AI and traditional sources of uncertain information lies not merely in opacity, but in the absence of what philosopher William Wimsatt calls "robustness," or the capacity for multiple independent methods to converge on the same result.

When Multiple, Independent Paths Lead to Truth

Wimsatt defines robustness as "the use of multiple independent means to detect, derive, measure, manipulate, or otherwise access entities, phenomena, theorems, properties, and other things we wish to study." When completely different methods applied to the same question all point to the same answer, we gain confidence that we've discovered something real rather than fallen victim to any single method's limitations.

Science established atomic reality around 1908 through genuinely independent approaches: chemical evidence showed precise ratios, kinetic theory explained gas behavior, and Einstein's analysis predicted particle movement from molecular bombardment. When Perrin's measurements matched Einstein's predictions perfectly, skeptics accepted atomic theory. These truly independent approaches made accidental convergence impossible.

The Independence Requirement

True robustness demands that methods have few shared fundamental principles because when one approach errs, it's unlikely to err in the same way as genuinely independent methods. This independence must operate across multiple dimensions: different theoretical assumptions, methodological approaches, institutional frameworks, and sources of potential error.
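
A back-of-the-envelope simulation makes the point quantitatively. The error rate and setup below are invented for illustration: when three methods fail independently, a unanimous answer is almost never wrong; when they share a single failure mode, unanimity tells you very little.

```python
# Toy simulation of why independence matters: three noisy "methods" each err
# 10% of the time. Compare unanimous agreement under independent vs. shared errors.
import random

random.seed(0)
TRIALS = 100_000
ERR = 0.10

def unanimous_but_wrong(correlated: bool) -> float:
    wrong = agree = 0
    for _ in range(TRIALS):
        if correlated:
            errs = [random.random() < ERR] * 3                # one shared failure mode
        else:
            errs = [random.random() < ERR for _ in range(3)]  # independent failures
        if len(set(errs)) == 1:          # all three methods give the same answer
            agree += 1
            if errs[0]:                  # ...and that answer is wrong
                wrong += 1
    return wrong / agree

print(f"independent methods: {unanimous_but_wrong(False):.4f} of agreements are wrong")
print(f"correlated methods:  {unanimous_but_wrong(True):.4f} of agreements are wrong")
```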

 

Independent derivation underpins much of how society produces knowledge: traditional knowledge sources achieve it through "social epistemic robustness"; medical knowledge, for instance, gains credibility through convergence across independent research groups, varying methodologies, and peer review by distinct scientific communities. Each derivational path carries assumptions and potential biases, but these errors are unlikely to be correlated across genuinely independent investigations.

AI's Structural Inability to Provide Robustness

AI systems cannot provide the independent derivations that robustness requires. When ChatGPT provides advice, we cannot examine multiple independent sources of its reasoning because all responses emerge from the same statistical process operating on the same training corpus. What appears to be multiple confirmations is actually what biologist Richard Levins called "the intersection of independent lies," but without the independence.

 

This training data robustness problem is distinct from but compounds the architectural opacity discussed earlier. While AI's model structure creates uncertainty about how responses are generated, the robustness problem specifically concerns whether the training data collection process meets the independence requirements that genuine knowledge demands. Even if we could perfectly trace how an AI processes information, we would still face the fundamental question of whether the underlying training corpus represents genuinely independent human derivations or merely appears to. Model behavior may reflect correlated biases from similar data collection methods, labeler demographics, or institutional perspectives.

 

Consider the multiple ways training data can lack genuine independence: multiple AI systems are often trained on overlapping internet data, creating shared biases across seemingly different models. Data annotators frequently come from similar demographic backgrounds, introducing systematic perspectives that appear independent but share hidden dependencies. Academic sources in training data may reflect shared institutional biases and correlated methodologies across research studies. Even when training data appears diverse, multiple seemingly independent sources might actually be reflecting the same underlying biases, methodological assumptions, or demographic perspectives.

 

This creates pseudo-robustness: responses that feel epistemically legitimate because they seem to draw on multiple sources but lack the genuine independence essential to justified belief. When an AI explains historical events or offers medical advice, we cannot trace these claims back to independent derivational pathways or assess whether sources represent genuinely independent investigations.

 

Unlike human experts, whose reasoning we can question and whose track records we can investigate through independent sources, AI systems provide no mechanism for assessing the independence of their derivational pathways. We cannot appeal to institutional oversight, peer review, or professional accountability, the social mechanisms through which human communities have historically achieved robust knowledge.

The Challenge to Justified Belief

This analysis reveals why treating AI outputs as knowledge represents an epistemological category error. AI systems aren't merely unreliable; they are unverifiably reliable. They provide no pathway for the kind of independent verification that robustness analysis requires, creating an unprecedented epistemic situation where apparent authority becomes divorced from the independence requirements that genuine knowledge demands.

 

The same systems that could provide genuine insights are equally capable of generating convincing fabrications, with no practical way for users to distinguish between them. Until we develop methods for ensuring genuine robustness in AI systems via source attribution, independence verification, or other transparency measures, the most rational response is systematic skepticism coupled with independent verification through sources whose derivational independence we can meaningfully evaluate.

Toward Epistemic Responsibility in the Age of AI

Classical epistemology has long recognized that knowledge requires more than true belief; it requires justification as well. AI systems produce outputs with all the surface markers of justified belief (detailed reasoning, authoritative tone, technical vocabulary) but provide no pathway to the kind of justification that genuine knowledge requires and that users could actually assess.

This creates pseudo-justification: responses that feel epistemically legitimate without meeting the criteria of justification. When we ask "Does AI actually tell me the truth?" we're asking whether we can establish justified belief when traditional epistemic practices have broken down.

 

The solution requires developing epistemic responsibility in our interactions with unverifiable sources. This means treating AI responses as hypotheses requiring independent confirmation rather than reliable knowledge, asking not "Is this AI response accurate?" but "How can I verify this claim through sources whose credibility I can meaningfully evaluate?"

Conclusion

We have created systems that bypass the robustness requirements through which human communities have historically validated knowledge claims, leaving us with authoritative-sounding outputs we cannot meaningfully evaluate. This is more than a technical challenge; it is an epistemological crisis that demands epistemic responsibility from users, companies, and developers alike.

Until AI systems can provide a genuine reason for belief, systematic skepticism coupled with independent verification remains our most rational response to the allure of unverifiable authority.