Jan. 1, 2025

Building Ethical Values into AI


What are the biggest obstacles in the way of incorporating ethical values into AI?

 

OpenAI has funded a $1 million research project at Duke University, focusing on AI’s role in predicting moral judgments in complex scenarios across fields like medicine, law, and business. As AI becomes increasingly influential in decision-making, the question of aligning it with human moral principles grows more pressing. Our host, Carter Considine, breaks it down in this episode of Ethical Bytes.

 

We’re all aware that morality itself is a complex idea, shaped by countless personal, cultural, and contextual factors. Philosophical frameworks like utilitarianism (which prioritizes outcomes) and deontology (which emphasizes following moral rules) offer contrasting views on ethical decisions. Each camp has its own take on dilemmas such as whether a self-driving car should save pedestrians or its passengers. Then there are cultural differences, such as those found in studies comparing American and Chinese ethical judgments.

 

AI’s technical limitations also hinder its alignment with ethics. AI systems lack emotional intelligence and rely on patterns in data, which often contain biases. Early experiments, such as the Allen Institute’s “Ask Delphi,” showed AI’s inability to grasp nuanced ethical contexts, leading to biased or inconsistent results.

 

To address these challenges, researchers are developing techniques like Reinforcement Learning with Human Feedback (RLHF), Direct Preference Optimization (DPO), Proximal Policy Optimization (PPO), and Constitutional AI. Each method has strengths and weaknesses, but none offer a perfect solution.

 

One promising initiative is Duke University's AI research on kidney allocation. This AI system is designed to assist medical professionals in making ethically consistent decisions by reflecting both personal and societal moral standards. While still in early stages, the project represents a step toward AI systems that work alongside humans, enhancing decision-making while respecting human values.

 

The future of ethical AI lies in tools that aid, rather than replace, human judgment. Rather than trying to make ourselves redundant, what we need from our technology is the ability to bring diverse ethical perspectives into our decision-making processes.

 

Key Topics:

  • Building Ethical Values into AI (00:00)
  • Why Alignment with Ethical Values is Difficult (02:39)
  • Technical Limitations of AI (05:23)
  • Techniques for Embedding Human Values into Machines (07:32)
  • The Duke-OpenAI Collaboration: Kidney Allocation (09:44)
  • Wrap-Up (12:01)


More info, transcripts, and references can be found at ethical.fm

Recently, OpenAI funded a one-million-dollar academic research project out of Duke University to build AI that “predicts human moral judgments in scenarios involving conflicts among morally relevant features in medicine, law, and business.” AI is reshaping the way we make decisions, from asking machines for medical advice about the strange-looking mole on your back to helping draft college-admission essays in seconds.

 

Our decisions, whether we realize it or not, are bound up with ethics. Ethics is the business of deciding which behaviors are right or wrong according to certain principles, or values. For instance, cheating on an exam is wrong if you hold academic integrity as a value. And as the influence of AI grows, so does the pressure to embed morality into machines.

 

But is it possible to truly align AI with the complexity and diversity of human ethical values? Building morality into machines requires a combination of advanced philosophical and technical insight. Morality is confusing because of the multiplicity of values across individuals, not to mention religions, cultures, and politics. Furthermore, generative AI is experimental and non-deterministic. Even with recent advances in LLMs, the success of the pursuit remains unclear. That said, researchers are making strides toward closing this gap.

 

This episode examines why AI struggles with the complex nature of morality and the techniques being developed to address it (RLHF, DPO, and PPO, among others). We’ll also talk about OpenAI’s most recent attempt at building moral AI, which includes funding researchers from Duke University to pursue a new approach that may shape the future of ethical AI. Developing systems that successfully incorporate and reflect many ethical stances opens the door to a future where AI may not only enhance decision-making but may be necessary to build a better world for all.

Why Alignment with Ethical Values is Difficult 

Morality is Personal and Context-Driven

Let’s first examine the complexity of morality. Philosophers have been debating the nature of ethics for centuries, and, like most domains in the humanities, the field is constantly evolving. Morality is uniquely difficult because each individual acts according to their own ethical system.

 

To emphasize this difficulty, let’s compare two well-known ethical frameworks: utilitarianism and deontology. Utilitarianism is a type of consequentialism, the view that “normative properties depend only on the consequences.” It embodies the basic intuition that the right act is whatever makes the world best in the future, no matter what needs to be done to achieve that end. Deontology, on the other hand, judges the morality of an action by holding that some acts, no matter how good their consequences, are morally forbidden, for example, “Never harm an innocent person.” Deontology is about following absolute moral rules, under which actions are morally required, forbidden, or permitted.

 

Let’s apply both systems to a self-driving car. Should an autonomous car prioritize saving pedestrians over its passengers? A utilitarian may say yes, but a deontologist might argue it’s wrong to deliberately harm the passengers under any circumstances. This dilemma inspired MIT’s Moral Machine, which ran from January 2016 to July 2020. The experiment generated moral dilemmas in which a self-driving car must choose the lesser of two evils, such as killing two passengers or five pedestrians. Users could see how their responses compared with others’ and could design their own scenarios. The study took the famous trolley problem global, revealing both common threads across cultures, such as saving humans over animals, and stark differences.
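To make the contrast concrete, here is a toy sketch in Python of how the two frameworks can disagree on the same scenario. The scenario fields and decision rules are invented for illustration; this is not how the Moral Machine itself works.

```python
from dataclasses import dataclass

@dataclass
class Dilemma:
    """Stripped-down version of the self-driving-car scenario."""
    passengers_harmed_if_swerve: int   # people harmed if the car swerves
    pedestrians_harmed_if_stay: int    # people harmed if the car stays on course

def utilitarian_choice(d: Dilemma) -> str:
    # Consequentialist rule: pick whichever action leaves the least total harm.
    return "swerve" if d.passengers_harmed_if_swerve < d.pedestrians_harmed_if_stay else "stay"

def deontological_choice(d: Dilemma) -> str:
    # Rule-based constraint: never take an action that deliberately harms
    # people who would otherwise be unharmed, so the car never swerves.
    return "stay"

dilemma = Dilemma(passengers_harmed_if_swerve=2, pedestrians_harmed_if_stay=5)
print(utilitarian_choice(dilemma))    # "swerve": two deaths instead of five
print(deontological_choice(dilemma))  # "stay": refuses to deliberately harm the passengers
```

Even in this stripped-down form, the two rules return different answers to the same dilemma, which is exactly the disagreement an AI system has to navigate.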

 

Another example of the complexity of morality across cultures comes from Duke University researchers, who found significant differences in how American and Chinese participants judged moral dilemmas. Americans tended to prioritize individual autonomy, while Chinese participants emphasized group welfare. Cultural diversity alone makes it nearly impossible to design a single, universally acceptable moral framework for AI.

Technical Limitations of AI

AI as a technology brings its own challenges. AI systems do not understand the way humans do: they lack emotional intelligence and contextual awareness. Machine learning relies on patterns in training data, which almost always contain biases or fail to capture nuanced ethical scenarios. Moreover, generative AI is non-deterministic, meaning its behavior is probabilistic and can differ with each inference. That unpredictability is part of what makes the technology powerful, but it becomes a liability in domains such as ethics, where consistent behavior is necessary.
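A minimal sketch of where that non-determinism comes from: a language model samples its next token from a probability distribution, so any temperature above zero can produce a different output on each call. The toy vocabulary and logits below are invented for illustration.

```python
import numpy as np

# Toy next-token distribution over a tiny, invented vocabulary.
vocab = ["permissible", "forbidden", "it depends"]
logits = np.array([2.0, 1.5, 1.2])

def sample_next_token(logits, temperature=1.0, rng=None):
    """Sample one token; higher temperature flattens the distribution."""
    rng = rng or np.random.default_rng()
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

# Two identical calls can return different "verdicts" -- that is the non-determinism.
print(vocab[sample_next_token(logits)])
print(vocab[sample_next_token(logits)])
```

Run the last two lines repeatedly and the verdict can change, which is precisely the inconsistency that matters in an ethical setting.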

 

Early experiments with the Allen Institute for AI’s Ask Delphi revealed AI’s limited grasp of ethical context. Inspired by the ancient Greek oracle, Delphi launched in October 2021. The AI could evaluate straightforward moral dilemmas but showed bias, for instance judging the same normative question differently depending on whether it was asked about a white man or a black man. The model also failed when scenarios were rephrased, showing its reliance on patterns in its training data rather than genuine ethical reasoning. As its creators put it: “Delphi is an AI system that guesses how an ‘average’ American person might judge the ethicality/social acceptability of a given situation, based on the judgments obtained from a set of U.S. crowdworkers for everyday situations. Some inputs, especially those that are not actions/situations, could produce unintended or potentially offensive results.” Because Delphi was released before OpenAI’s ChatGPT, some of these issues stemmed from technical limitations: a significantly weaker model and no fine-tuning via reinforcement learning from human feedback (RLHF), which helps remove outputs that don’t seem “human” to us.

Techniques for Embedding Human Values into Machines

Besides RLHF, researchers have been developing novel technical approaches to address this, including Direct Preference Optimization (DPO), Proximal Policy Optimization (PPO), and Constitutional AI (a minimal loss sketch for RLHF-style reward modeling and DPO follows the list):

  • Reinforcement Learning with Human Feedback (RLHF): This method uses human annotators to rank the model’s outputs and trains a reward model on those rankings, which a reinforcement learning algorithm then uses to scale the human signal beyond the labeled data. RLHF is effective at capturing nuanced human preferences but requires significant resources to create the training data manually and is sensitive to bias in the labels. Created and used by OpenAI.
  • Direct Preference Optimization (DPO): A faster alternative to RLHF, DPO optimizes the model directly on preference pairs without training a separate reward model or running a reinforcement learning loop. However, DPO is less flexible in adapting to evolving moral standards.
  • Proximal Policy Optimization (PPO): Often used as the reinforcement learning algorithm inside RLHF, PPO provides stable and efficient policy updates, especially in scenarios requiring complex trade-offs like fairness versus utility.
  • Constitutional AI: This approach trains AI systems to follow predefined ethical guidelines, acting as a “constitution.” It reduces reliance on human oversight but raises questions about whose ethical principles should be codified. Created and used by Anthropic.
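As a rough illustration of how the first two methods turn human preferences into a training signal, here is a minimal PyTorch sketch of the pairwise (Bradley–Terry style) reward-model loss used in RLHF pipelines and of the DPO loss computed from policy and reference log-probabilities. The tensors are random placeholders, not real preference data or a production training loop.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """RLHF, step one: train a reward model so the human-preferred answer
    scores higher than the rejected one (pairwise Bradley-Terry objective)."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1) -> torch.Tensor:
    """DPO: optimize the policy on preference pairs directly, using a frozen
    reference model in place of a separate reward model and RL loop."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Random placeholder scores / log-probabilities for a batch of 4 preference pairs.
print(reward_model_loss(torch.randn(4), torch.randn(4)))
print(dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4)))
```

In both cases the training signal is the same kind of thing: a human saying “this answer is better than that one,” turned into a differentiable objective.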

 

Each method has strengths, such as RLHF’s ability to capture nuanced preferences or Constitutional AI’s focus on transparency (its critique-and-revise loop is sketched below), but none works perfectly. Together they offer a stitched-together technical roadmap for embedding ethical considerations into AI, but we still need innovations that combine philosophical insight, technical tools, and human oversight.
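To show the shape of the Constitutional AI idea (and only the shape), here is a schematic critique-and-revise pass. The principle text, the prompts, and the generate() helper are hypothetical placeholders standing in for calls to any chat model; this is not Anthropic’s implementation.

```python
# Schematic sketch of a Constitutional AI critique-and-revise pass. `generate`
# is a hypothetical stand-in for any chat-model call, and the principle text is
# an invented example, not an actual constitutional clause.

CONSTITUTION = [
    "Choose the response that is least likely to encourage harm to any person.",
]

def generate(prompt: str) -> str:
    """Placeholder: wire this to a real language model."""
    raise NotImplementedError

def constitutional_revision(user_prompt: str) -> str:
    draft = generate(user_prompt)
    for principle in CONSTITUTION:
        critique = generate(
            f"Critique this response against the principle.\n"
            f"Principle: {principle}\nResponse: {draft}"
        )
        draft = generate(
            f"Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {draft}"
        )
    return draft  # revised drafts become the training data for the next model
```

The appeal is that the guiding principles are written down and inspectable; the open question, as noted above, is who gets to write them.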

The Duke-OpenAI Collaboration: Kidney Allocation

The Duke University research group funded by OpenAI is addressing a high-stakes ethical dilemma: kidney allocation. Medical professionals often face situations where they must decide—on short notice—who should receive a life-saving kidney transplant. These decisions are emotionally charged and prone to human bias. Fatigue, stress, or even a bad day can influence a practitioner’s judgment.

 

The Duke research group, led by Walter Sinnott-Armstrong, Jana Schaich Borg, and Vincent Conitzer, is developing an AI system to assist these professionals, but with a unique twist: it’s personalized. The AI aligns with each medical professional’s past decisions, acting as a mirror to help them stay consistent with their own beliefs. It also compares their decisions to societal norms by analyzing survey data from similar cases across several groups. This ensures that outcomes aren’t just consistent on an individual level but also reflect broader ethical standards. Their goal isn’t to replace human decision-makers but to enhance consistency with their own ethical views, transparency in decision-making, and alignment with societal norms. This groundbreaking approach emphasizes human-defined morality, since the final decision always rests with the human, while the AI acts as a moral assistant to support the process.
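The Duke system itself isn’t public, but the “mirror” idea can be sketched in a few lines: fit a simple model to a clinician’s own past allocation decisions, then flag a new decision that diverges from that personal pattern. Every detail below (the features, the model choice, the threshold) is invented for illustration and is not the research group’s method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented features per case: [recipient age, years on waitlist, predicted graft survival]
past_cases = np.array([
    [34, 2.0, 0.91],
    [61, 5.5, 0.74],
    [47, 1.0, 0.88],
    [70, 6.0, 0.62],
])
past_decisions = np.array([1, 1, 0, 0])  # 1 = this clinician allocated the kidney

# The "mirror": a model of this clinician's own historical pattern.
mirror = LogisticRegression().fit(past_cases, past_decisions)

new_case = np.array([[52, 4.0, 0.80]])
past_pattern_says = mirror.predict_proba(new_case)[0, 1]

proposed_decision = 0  # suppose the clinician is leaning toward not allocating
if abs(proposed_decision - past_pattern_says) > 0.5:
    print(f"Flag: this choice diverges from your past pattern ({past_pattern_says:.2f}).")
```

The real system would also weigh aggregate survey norms, but even this toy version captures the spirit: the AI holds up a mirror, and the human still makes the call.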

 

However, this research is still in its early stages. The research team estimates it could take decades before systems like this are implemented in real-world settings. Even then, the final product might look very different from today’s vision. For example, when or how such systems might integrate into tools like OpenAI’s ChatGPT remains uncertain. The researchers are clear: this isn’t happening anytime soon.

 

Still, the project offers a glimpse into the future of ethical AI—a future where machines don’t make decisions in isolation but work hand-in-hand with humans to reflect both individual and societal values.

Looking Ahead

Aligning AI with human values is one of the most complex challenges in technology today. Morality is individual, culturally diverse, and context-dependent, making it difficult to translate into algorithms. Despite these hurdles, researchers are making progress, from developing innovative techniques like RLHF and Constitutional AI to tackling real-world dilemmas like kidney allocation.

 

While the journey may take decades, the goal is clear: creating AI systems that enhance human decision-making while staying true to our values. The future of ethical AI isn’t about machines replacing us—it’s about machines helping us make better choices, together.