Building AI at the Expense of Users
Is the evolution of AI about to put an end to the rights and autonomy of human content creators?
Today, our host Carter Considine navigates the complex ethical landscape of AI development and shines a spotlight on the often-overlooked power dynamics between AI developers and content creators. As generative AI models become increasingly reliant on vast, and often unconsented, data sources, we're witnessing a significant shift in how companies like Meta and Substack approach user consent. Despite their attempts to provide opt-in and opt-out options, these measures fall short of resolving the deeper ethical issues.
Carter also explores the creative resistance against AI's encroachment in different industries, such as the film and music spaces, and how some AI companies, including OpenAI, are striving to respect and compensate artists across diverse creative fields.
Despite making some headway in championing human creators, achieving a comprehensive moral framework that fairly balances the interests of all stakeholders in the AI space remains a formidable challenge. But through this discussion, we’re hoping to challenge conventional thinking. Let’s explore how we can harness the advantages AI offers while safeguarding and upholding those things that only humans can bring to the table.
Key Topics:
AI companies have built extremely large and powerful generative models, but at a price: users, especially creatives, supply the training data, often without their consent.
This power imbalance has swayed public opinion of AI, pressuring many companies that implement generative AI to find solutions that address users' concerns. Although these solutions are a step in the right direction, none fully satisfies the varying, and often conflicting, interests of all stakeholders in AI development.
Data fuels AI development
Since the emergence of generative AI, especially ChatGPT, the world has become increasingly aware of how AI works under the hood and how it might influence our daily lives.
One notable feature of the current paradigm of generative AI, such as large language models (LLMs) and diffusion models, is that it requires massive amounts of training data before a model can function out of the box.
These datasets can be as large as all the comments on Reddit, Twitter, or Meta's platforms. Lacking direct access to data at that scale, developers have turned to open-source datasets created by scraping the web. This practice, however, has left the creators of that data uncomfortable and indignant: their privacy is compromised, and they have no control over who uses their data for AI training.
One group in particular, creatives, has felt a heavy impact from this new development in technology, contributing to the Hollywood strikes and to the development of technologies, such as Nightshade, that protect artists from having their art used in AI model training.
Allowing users to opt in
Some companies, such as Meta, have started to introduce an opt-in option within their platforms. However, Meta has only made the option available to users in the GDPR region, and it's unclear whether the platform intends to extend it to North America.
Another example is the popular publishing platform Substack, which includes the option to “block AI training” within the user settings. Turning this setting on signals to AI tools, such as ChatGPT and Google Bard, that your published content should be excluded from their training data. But if you decide to opt out, there are trade-offs:
- The “block AI training” setting only works if a base model provider chooses to honor it
Since base models may be trained by anyone, it's unclear who is scraping the web for training data at any given time. Large generative models require a significant amount of data for pre-training, which nearly always includes open-source datasets. These datasets are created by crawling the web, a process that may or may not honor a creator's stated preferences (a sketch of how a compliant crawler might check such an opt-out signal appears after this list).
- Choosing to block AI training may negatively impact how your articles show up in AI-powered search engines.
If you’re an early-career blogger who opts out, you forgo the virality and wide reach of the internet that could win you popularity and new readers. Some even see inclusion in AI training data as an advantage. Reddit, for example, signed deals to sell user-generated content for AI training, prompting many content marketers to start posting on Reddit as a strategy to gain brand reach. Nathan Lambert, a machine learning scientist at the Allen Institute for AI (AI2), which focuses on AI for the common good, argued in light of the Scarlett Johansson case against OpenAI that generative AI increases the power of celebrities rather than detracting from it.
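To make the opt-out mechanism above concrete, here is a minimal sketch, assuming the “block AI training” signal is published through robots.txt user-agent rules (the Substack-style URLs below are illustrative, not a real endpoint), of how a well-behaved crawler might check a publisher's preference before adding a page to a training dataset:

```python
# Minimal sketch (not Substack's actual implementation): assumes the
# "block AI training" setting is expressed via robots.txt user-agent
# rules, which is how such opt-out signals are commonly published.
import urllib.robotparser


def may_train_on(page_url: str, robots_url: str, crawler_agent: str) -> bool:
    """Return True only if robots.txt permits this crawler to fetch the page."""
    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(robots_url)
    parser.read()  # download and parse the site's robots.txt
    return parser.can_fetch(crawler_agent, page_url)


if __name__ == "__main__":
    # Hypothetical blog URL; a publisher blocking AI training would
    # typically serve rules like "User-agent: GPTBot" / "Disallow: /".
    ok = may_train_on(
        "https://example.substack.com/p/my-post",
        "https://example.substack.com/robots.txt",
        "GPTBot",  # OpenAI's web crawler user agent
    )
    print("Permitted in training data?", ok)
```

The catch is that this check is entirely voluntary: the robots.txt signal is advisory, and a crawler that ignores it faces no technical barrier, which is why the setting only works when a model provider chooses to honor it.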
Improving the company-user dynamic
Even with opt-out options in place, many groups have already had their data used to train base models without compensation. Base model companies have begun to act on this front; OpenAI, for example, recently announced that it is giving artists opt-in control over their data:
“We believe AI systems should benefit and respect the choices of creators and content owners. We’re continually improving our industry-leading systems to reflect content owner preferences and are dedicated to building products and business models to fuel vibrant ecosystems for creators and publishers.”
Startups, most of which rely on base models to power their products, have also started to make moves. Glow Art, a startup that creates personalized webtoons using AI, makes its generative AI features transparent within the platform so users can decide for themselves whether to use AI. The moral decision is placed in the hands of users rather than relying on base model companies to collect consent from every artist, which would take significant time and resources. MythWeaver, a generative AI tool for TTRPG enthusiasts, pays commissions and royalties to the artists who help improve its fantasy art generator, rebalancing the power dynamic between those who train AI models and those who produce new, unique, and valuable data.
On the creatives' side, AI-generated music has already been harnessed as a tool to empower musicians rather than diminish them. Drake released a popular song featuring AI-generated vocals from the late rapper Tupac Shakur as well as Snoop Dogg; it has been streamed more than a million times across various platforms. Grimes is one of the most extreme voices on this front, announcing that anyone may use an AI version of her voice as long as she receives a share of the royalties.
Future development: addressing underlying issues
Although all of these approaches are steps in the right direction, none comprehensively resolves the ethical questions around building AI. Ethical AI development still cannot reconcile the varying, and potentially conflicting, values of its many stakeholders.
A recent publication by DeepMind, The Ethics of Advanced AI Assistants, emphasizes the importance of recognizing the tetradic relationship between AI agents, users, developers, and society at large:
“A values-aligned enlightened personal assistant represents part of a truly positive vision for an AI-enabled future. Yet this aspiration also risks creating a situation in which human users are increasingly ‘out of the loop’. After all, if we are in thrall to beneficent assistants, and potentially dependent on them, how can we really be sure that our life is under our own control? In other words, users may receive benefits from the technology at the expense of their own autonomy.”
In the past, AI solutions favored companies; now they are shifting toward benefiting the creatives who supply training data. But users who interact with models also generate data, and developers hold their own opinions about what is right and wrong.
We still lack the mechanisms to properly handle these conflicts of interest, and ultimately the power still seems to lie in the hands of the companies creating AI. Until we can resolve the natural differences between individuals' ethical stances, we will remain in conflict over who decides what is right and wrong, and how ethical AI should be built.