June 17, 2025

The Next Frontier is Beyond Human Data


AI has come a long way by learning from us. Most modern systems—from chatbots to code generators—were trained on vast amounts of human-created data. These large language and generative models grew smarter by imitating us, fine-tuned with our feedback and preferences. But now, that strategy is hitting a wall. Our host, Carter Considine, elaborates.


Human data is finite. High-quality labeled datasets are expensive and time-consuming to produce. And in complex domains like science or math, even the best human data only goes so far. As AI pushes into harder problems, just feeding it more of what we already know won’t be enough. We need systems that can go beyond imitation.


That’s where the “Era of Experience” comes in. Instead of learning from static examples, AI agents can now learn by doing. They interact with environments, test ideas, make mistakes, and adapt—just like humans. This kind of experience-driven learning unlocks new possibilities: discovering scientific laws, exploring novel strategies, and solving problems that humans haven’t encountered.


But shifting to experience isn’t just a technical upgrade—it’s a paradigm shift. These agents will operate continuously, reason differently, and pursue goals based on real-world outcomes instead of human-written rubrics. They’ll need new kinds of rewards, tools, and safety mechanisms to stay aligned.


AI trained only on human data can’t lead—it can only follow. Experience flips that script. It empowers systems to generate new knowledge, test their own ideas, and improve autonomously. The sooner we embrace this shift, the faster we’ll move from imitation to true innovation.


Key Topics:

  • The Dramatic Progress of AI (00:25)
  • The Limits of Supervised Learning (01:59)
  • A New Era: Learning From Experience (04:32)
  • Sutton’s Legacy and the Reinforcement Learning Mindset (09:43)
  • Why Human Data Still Matters (11:55)
  • Conclusion (13:25)


More info, transcripts, and references can be found at ethical.fm

AI has advanced dramatically by learning directly from humans. For the past decade, supervised learning, or training on human-curated data and guidance, has driven most progress in the field. Generative models, such as LLMs and diffusion models, are prime examples: each large base model was trained, unsupervised, on colossal collections of human-generated text and became far more impactful and useful once fine-tuned with human feedback. However, human-curated data is limited and time-consuming to create, especially high-quality data, which is essential for AI to perform well. But what about data that hasn’t been touched by humans, or data created purely by machine learning models? 


As Richard Sutton and his co-author, David Silver, argue in their paper The Era of Experience, in crucial fields like mathematics, coding, and science, the knowledge extracted from human data is rapidly approaching a ceiling. The best-quality human-generated data has largely been used up, and each increment of improvement from more data is becoming smaller. The pace of AI progress based solely on human examples is clearly slowing: scaling up curated human data yields diminishing returns, so alternative approaches will be needed to reach the level of understanding associated with AGI.

The Limits of Supervised Learning

Supervised learning refers to any AI training that relies on human-provided data or guidance, including traditional labeled datasets but also human demonstrations and preferences used to shape models, such as in reinforcement learning from human feedback (RLHF). The strength of supervised learning is that it provides clear training signals, allowing models to learn tasks by imitating high-quality examples. However, this very reliance on human curation is also its greatest limitation.


Creating high-quality training data is a labor-intensive and time-consuming process. Human annotators must label examples, provide demonstrations, or score responses, often with detailed guidelines to ensure consistency. Producing such data becomes even more challenging in complex domains like law, medicine, or mathematics, where domain experts are needed to create or validate it. As a result, high-quality datasets are expensive to produce and often limited in scope and scale. Worse, the best human-generated datasets have already been collected and used. There are diminishing returns from trying to squeeze more performance out of existing datasets.


Moreover, supervised learning cannot easily generalize to situations that are novel, ambiguous, or lacking prior human examples. If a model encounters a task or environment that falls outside the distribution of its training data, it is likely to perform poorly or fail altogether. By design, supervised learning captures what humans already know. It does not naturally explore or extrapolate beyond those boundaries.


These reasons are likely why Sutton and Silver argue that supervised learning cannot, on its own, produce superhuman intelligence. An AI that imitates human inputs can, at best, match human performance. Current AI models cannot make discoveries that exceed human knowledge if those discoveries are not already present in the data. Breakthroughs like new theorems or scientific hypotheses, by definition, do not exist in prior human datasets and therefore cannot be learned through imitation alone. Future progress will require new ways of learning.

A New Era: Learning from Experience

Sutton and Silver describe a shift they call the Era of Experience, identifying two earlier stages of AI development, each defined by its primary source of learning. The first was the Era of Simulation, which characterized the 2010s. During that time, algorithms learned through interactions with simulations or games. DeepMind’s AlphaGo, trained through millions of simulated Go games, is a good example. The second phase was the Era of Human Data, which lasted roughly from 2020 to 2023. Systems in this phase learned from vast collections of human-generated content. GPT-3, which helped launch the age of massive text-based training, is a central example. The third phase is now emerging: the Era of Experience. In this new phase, AI agents will improve their performance primarily by interacting with the world directly, rather than learning from static human-created datasets.

What is the Era of Experience? In this new era, AI agents actively generate their own learning data. Rather than relying on pre-prepared datasets, these agents learn by interacting with their environments. This process resembles how animals or humans acquire knowledge through life experiences. Sutton believes that experience-driven learning will allow AI systems to keep improving without needing new external data. As these agents become more capable, they can explore increasingly complex environments. This exploration generates new data, creating a continuous loop of improvement. Sutton and Silver emphasize that static methods for generating data will be outpaced quickly. In contrast, experience generates data that grows in value as the agent becomes more intelligent. Over time, the amount and quality of data generated through experience may exceed what is available from human sources.
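The self-generated-data loop described above can be sketched with a toy example. The epsilon-greedy bandit agent below is a standard illustration from the reinforcement learning literature, not anything specified by Sutton and Silver; the environment, function name, and parameter values are all illustrative. The agent starts with no dataset at all: every value estimate it holds comes from its own interactions, and the estimates sharpen as its experience accumulates.

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent that learns arm values purely from its own pulls."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n          # how much experience the agent has with each arm
    estimates = [0.0] * n     # running mean reward per arm, built from experience
    for _ in range(steps):
        # Explore occasionally; otherwise exploit the current best estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(n)
        else:
            arm = max(range(n), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)  # feedback from the environment
        counts[arm] += 1
        # Incremental mean update: the estimate improves with more experience.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.5, 0.9])
```

After enough interaction, the agent concentrates its pulls on the genuinely best arm, even though no human ever labeled an example for it.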

One of the key characteristics of this shift is that AI agents will begin to exist within a continuous stream of experience, rather than operating in isolated tasks or sessions. This mirrors how humans accumulate knowledge and adapt behavior over long periods. For instance, an AI health coach could monitor a person’s sleep, nutrition, and activity across many months, adjusting its recommendations as patterns emerge. This long-term learning and adjustment is fundamentally different from today’s stateless chatbots that reset after every session.

In addition to experiencing time in continuous streams, agents in this new paradigm will begin to take autonomous actions in digital and physical environments. Instead of merely processing inputs and generating outputs via text, AI systems will interact with the world through tools, sensors, and APIs. They might execute code, browse the web, or manipulate physical objects via robotics. This kind of interaction allows the AI to observe the consequences of its actions, which becomes a new source of learning.

Crucially, the rewards for these actions will be based not on human ratings or curated preferences, but on real-world outcomes. Rather than relying on a person to decide whether a response is good, the system might measure success in terms of improved health metrics, successful experiments, or higher energy efficiency. For example, a scientific agent attempting to discover a new compound might reward itself based on empirical signals, such as molecular stability or simulation accuracy, rather than needing a human to approve each result. Sutton and Silver describe this as a shift from human-prejudged rewards to grounded signals from the environment.

Planning and reasoning also change under this new framework. In the Era of Human Data, AI reasoning closely mimicked how people think, often emulating human logic or chains of thought. But Sutton proposes that agents should develop entirely new methods of reasoning, based not on language or imitation, but on effectiveness. For instance, an agent might invent a novel way of organizing data or forming hypotheses that no human has considered, yet which proves useful in achieving its goals. AlphaProof, the DeepMind system that earned a silver medal at the International Mathematical Olympiad, already hints at this future. It learned to construct mathematical proofs through its own exploration, rather than by mimicking how human mathematicians work. The success of AlphaProof shows that machines can begin to reason differently from humans, and perhaps even more effectively when guided by experience.

Sutton’s Legacy and the Reinforcement Learning Mindset

Richard Sutton has long been a central figure in reinforcement learning (RL), the subfield of AI that focuses on learning from experience through trial and error. His earlier work, including the influential textbook “Reinforcement Learning: An Introduction,” laid the foundation for how agents can learn by interacting with environments and adjusting behavior based on rewards. In Sutton’s view, AI systems that rely on static data miss the opportunity to discover truly novel behaviors and strategies. Instead, he argues for agents that can learn continuously, adaptively, and autonomously.
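The trial-and-error learning Sutton formalized can be illustrated with tabular Q-learning, one of the classic algorithms from the RL literature he helped establish. The tiny chain environment and every parameter value below are illustrative choices for this sketch, not details from the episode: the agent repeatedly acts, observes a reward, and nudges its value table toward the reward plus the discounted value of what comes next.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=1):
    """Tabular Q-learning on a chain: move left/right, reward 1 at the far end."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]  # Q[state][action]; 0=left, 1=right
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action selection: mostly exploit, sometimes explore.
            a = rng.randrange(2) if rng.random() < eps else (1 if Q[s][1] >= Q[s][0] else 0)
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Core update: move Q toward reward plus discounted future value.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning_chain()
```

After training, the greedy policy at every non-terminal state prefers "right," a behavior discovered entirely through the agent's own trial and error rather than from labeled examples.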

Despite this vision, reinforcement learning has historically been difficult to implement at scale outside of games and simulations. Supervised learning, which relies on human-labeled data and known, human-specified targets, has been more practical and predictable in many commercial applications. It is easier to build a dataset, run experiments, and iterate quickly in supervised settings. Reinforcement learning, by contrast, often requires vast computational resources, careful reward design, and long training times. Moreover, RL agents are more prone to unstable behavior and unintended outcomes when their reward functions are poorly specified.

Nonetheless, Sutton believes these challenges can be overcome. He argues that recent advances in algorithm design, compute power, and agent tooling are beginning to make reinforcement learning more viable in real-world scenarios. Projects like AlphaZero and AlphaFold have demonstrated that self-learning agents can exceed human performance when provided with the right incentives and enough time to learn from their own actions. Sutton envisions a future in which all aspects of intelligent behavior, such as reasoning, memory, creativity, and planning, are learned through extended interaction with the world.

Why Human Data Still Matters

Although Sutton advocates for a future driven by experience, it is important to recognize the essential role of human data. AI models fundamentally benchmark their performance against human standards. AI systems are evaluated by how well they align with human expectations, perform human tasks, and reason in ways that are meaningful to us. This requires human-defined benchmarks and reference points.


Even the concept of “data” is a human one. We define what is relevant, interpret what is meaningful, and build the environments in which AI systems learn. Even as agents begin to generate their own experience, that experience is shaped by human-designed goals, constraints, and feedback loops. No matter how autonomous a system becomes, AI cannot escape the human frame of reference.


In this sense, experience does not replace human knowledge; it extends it. An AI that discovers a new compound or proves a novel theorem is doing so within the conceptual scaffolding of human science. We still evaluate AI contributions by our own standards. Thus, the Era of Experience should not be seen as a break from human-centered AI, but as an evolution toward deeper, open-ended, and, perhaps, novel learning.

Conclusion

The Era of Experience marks a pivotal shift in how we develop artificial intelligence. Sutton and Silver propose that future breakthroughs will depend not on larger datasets or more human examples, but on agents that learn from their own interactions with the world. These agents will act over long time horizons, pursue grounded rewards, reason in unfamiliar ways, and generate new knowledge through trial and feedback. Experience will become the dominant source of intelligence, surpassing static data in both scale and depth.

This shift is ambitious and will face technical, ethical, and philosophical challenges. Ensuring that autonomous agents remain aligned with human values, avoid harmful behavior, and operate safely in complex environments will require innovation in reward design, interpretability, and oversight. However, the potential upside is profound. If we can successfully build machines that learn as we do, through curiosity, experimentation, and adaptation, then we may finally break through the current limits of artificial intelligence. The future of AI may lie not in mimicking humanity, but in learning as living beings learn: through experience.