AI vs. AI: The upcoming arms race against disinformation online

- Generative AI offers new potential for more human-like, scalable content moderation, but also introduces new risks.
- For instance, the same tools that enable synthetic propaganda and deepfakes could also improve enforcement and reduce our reliance on human moderators to remove potentially traumatic content.
- The future of online safety hinges on whether generative AI becomes a more potent weapon for offense or defense in the information ecosystem.
The new era of generative AI constitutes an extraordinary moment: for the first time, technologies are emerging that offer the kind of scale and supple understanding of human language and communication that could be adequate to the size of the problems on global platforms. These technologies remain flawed, but they are improving rapidly. Some industry experts who have spent years tackling seemingly insurmountable problems are beginning to see real promise in using generative AI for content moderation.
Machine-learning technologies have been used in content moderation for years, but they have mostly performed complex pattern matching. Generative AI, in effect, allows for an approach to interpreting user-generated messages that is more sophisticated than prior waves of more traditional AI.
“We have machines now that can do something that functionally is equivalent to a human reading a document and responding to what it said,” says Dave Willner, a longtime industry expert who led teams at Meta and served as head of trust and safety at OpenAI. He explains that this “means we have a machine that can directly address the core activity that a human moderator is doing instead of merely producing the result.”
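To make that contrast concrete, here is a minimal Python sketch, not drawn from any particular platform’s systems, of the two approaches: a keyword filter that only catches phrases it already knows, and a model that is asked to read the policy and the message together and explain its decision. The policy text, the blocked phrases, and the `call_language_model` function are all hypothetical stand-ins.

```python
import re

# Older approach: pattern matching. A message is flagged only if it contains
# a phrase the system already knows, so trivial rewordings slip through.
BLOCKED_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in [r"\bbuy followers\b", r"\bcrypto giveaway\b"]
]

def pattern_match_flag(message: str) -> bool:
    return any(p.search(message) for p in BLOCKED_PATTERNS)

# Newer approach: ask a large language model to "read" the policy and the
# message together. call_language_model is a hypothetical placeholder for
# whatever hosted or local model a platform might use.
POLICY = "Remove spam that sells fake engagement or runs giveaway scams."

def call_language_model(prompt: str) -> str:
    # Placeholder: a real system would send `prompt` to an LLM endpoint.
    return '{"violates": true, "reason": "Offers paid fake engagement in reworded form."}'

def llm_flag(message: str) -> str:
    prompt = (
        f"Policy: {POLICY}\n"
        f"Message: {message}\n"
        "Does the message violate the policy? Reply as JSON with "
        "'violates' (true/false) and a one-sentence 'reason'."
    )
    return call_language_model(prompt)

if __name__ == "__main__":
    reworded_spam = "DM me to boost your account with 10k real-looking fans, cheap."
    print(pattern_match_flag(reworded_spam))  # False: no known phrase matches
    print(llm_flag(reworded_spam))            # the model can still reason about intent
```

The point of the sketch is that the second approach reasons about intent against the written policy, which is closer to what a human reviewer does, rather than merely reproducing a reviewer’s past outputs.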
Better AI tools may also alleviate long-running concerns about the psychologically damaging labor that, to date, has been performed by (often underpaid) human moderators, who regularly must stare at hundreds of images of violence, sexual abuse, and other trauma-inducing content.
They also may allow companies to enforce rules in emerging spaces such as social virtual reality platforms, where existing content moderation systems may not work or even be appropriate.
Generative AI creates almost unfathomable possibilities for confounding and disrupting human speech and expression online, for complex disinformation campaigns, and for general strategic manipulation of societies. It also creates new tools for regulating and managing the communications space. Generative AI may create chaos online, allowing millions of fake and manipulated images, videos, stories, and narratives to proliferate. Yet the same technologies may create new systems for better content moderation, allowing more context-sensitive work to be done at scale, down to local geographies, and across custom instances of decentralized social platforms.
While we might fear the potential chaos unleashed by AI in the short term, over the long run AI tools may finally provide the scale and subtlety that human moderators have long lacked, putting platforms in a better position to get a handle on these problems.
AI’s offensive capabilities
There is a strong chance that generative AI will create headaches across many domains, making high-quality content moderation even harder in some respects. For example, detecting CSAM [child sexual abuse material] is becoming more difficult as bad actors create synthetic images and videos that can evade automated detection technologies.
Synthetic images often cannot be matched to the databases of known images that experts and company trust and safety teams share in order to catch criminals.
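As a highly simplified illustration of why database matching fails on novel synthetic content, the sketch below uses an exact cryptographic hash in place of the perceptual, PhotoDNA-style fingerprints that trust and safety teams actually share, and the image bytes are invented placeholders. The structural weakness is the same either way: a shared database can only recognize material that has already been identified and fingerprinted.

```python
import hashlib

def fingerprint(image_bytes: bytes) -> str:
    # Stand-in for a perceptual hash; real systems use fingerprints that
    # survive resizing and re-encoding, not a raw cryptographic digest.
    return hashlib.sha256(image_bytes).hexdigest()

# Hypothetical shared database of fingerprints for previously identified material.
known_fingerprints = {fingerprint(b"previously-identified-image-bytes")}

def is_known_match(image_bytes: bytes) -> bool:
    return fingerprint(image_bytes) in known_fingerprints

print(is_known_match(b"previously-identified-image-bytes"))  # True: re-uploads are caught
print(is_known_match(b"freshly-synthesized-image-bytes"))    # False: novel AI output has no prior fingerprint
```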
It will be an arms race across all manner of fields. Generative AI will lower the cost of disinformation and deception efforts, making it easy to create automated accounts, well-crafted auto-generated messages, synthetic images, and deepfake videos. As Stanford’s Alex Stamos, a longtime industry security expert, has noted, a central worry is that the cost of creating troll farms and other forms of information warfare and organized scams will now plummet, making it possible for authoritarian governments and criminals alike to carry out disinformation campaigns rather cheaply.
Put concretely, this means that “what once took a team of 20 to 40 people working out of Russia or China to create 100,000 pieces of English-language propaganda is now possible with a single person using freely accessible generative AI tools.” Security researchers have expressed grave concerns about what nation-states could do with these tools, including directing AI models to carry out complex, multistep advertising campaigns and algorithmic “gaming,” in which machines might grasp viral levers better than humans can by interpreting vast quantities of digital signals.
Several early crisis cases have illustrated the potential for such mayhem. For example, in Slovakia, an AI-generated deepfake audio recording of a presidential candidate plotting electoral tampering and fraud may have helped swing the election to a more pro-Russian candidate. The details of that 2023 situation remain murky, but the potential for mischief became vivid for many around the world. Similarly, in January 2024, a robocall using an AI-generated imitation of President Joe Biden’s voice was broadcast to potential New Hampshire voters, urging them not to vote in the primary election. That debunked audio turned out to have been created by a political operative, who faced subsequent criminal charges.
In all such cases, experts have worried that, although such disinformation might ultimately be corrected, there might not be sufficient time to get the word out, especially if AI fakes are disseminated very close to a key electoral moment — a kind of last-minute surprise.
Some social platforms have rules prohibiting deceptive “synthetic media,” but new types of AI content have also proven difficult for detection algorithms and for content moderation teams trying to enforce policy. For example, as The Wall Street Journal memorably put it in a scan of social media in 2024, one can find things such as “Mickey Mouse drinking a beer,” “SpongeBob in Nazi garb,” and “Donald Trump and Kamala Harris kissing.” Given the satirical nature (and sometimes political cartoon-like qualities) of such content, it is unclear whether companies, in honoring legitimate political expression, should remove all such items. AI may provide new dimensions to the range of political speech, even democratizing the tools for satire, citizen voice, and campaigning.
Because social platforms have not trained their detection models on sufficient non-English-language and non-Western data, much of the AI-fakes detection technology is still very shaky, if not useless, in the Global South and across the developing world. This means that the harms of AI may accrue disproportionately to poorer countries, in the form of social media damaging elections, fueling mob violence, or leading to bad public health outcomes.
AI’s defensive potential
As Jason Matheny, the president and CEO of the RAND Corporation, has noted, there is a major debate in the research community about whether AI will, on balance, favor offense or defense in keeping the digital sphere safe and trustworthy. Sure, there are risks with AI, but can it also be a defender of the information environment?
Generative AI is already being used to improve what is done “behind the scenes”: flagging, classifying, downranking, and organizing platforms’ content. There appears to be strong support among trust and safety professionals for embracing these new tools, albeit with a careful eye on potential new risks. In terms of reviewing and rating content, OpenAI has reported that its models perform as well as lightly trained human reviewers, although expert humans still outperform the models. Meta’s security team has noted that sophisticated bad actors who engage in what the company calls “coordinated inauthentic behavior,” or CIB, still struggle in “building and engaging authentic audiences they seek to influence.”
The challenge for disinformers, misinformers, and scammers is not primarily content creation but audience creation. “While generative AI does pose challenges for defenders,” Meta’s security team wrote in late 2023, “at this time we have not seen evidence that it will upend our industry’s efforts to counter covert influence operations — and it’s simultaneously helping to detect and stop the spread of potentially harmful content.” How far such confidence extends across the industry, or whether this alleged advantage for defense will hold even for the most well-resourced companies, remains an open question.
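One way to picture this behind-the-scenes use, sketched here under assumptions rather than from any company’s documented pipeline, is a scoring step followed by graduated actions. The `score_post` function and the thresholds below are hypothetical placeholders for a generative-AI policy classifier and the per-policy tuning a real platform would do.

```python
from dataclasses import dataclass

@dataclass
class ModerationDecision:
    action: str   # "allow", "downrank", "human_review", or "remove"
    score: float  # estimated probability that the post violates policy

def score_post(text: str) -> float:
    # Placeholder for a hypothetical generative-AI policy classifier;
    # a real system would prompt a model with the policy and the post.
    return 0.72

def decide(text: str) -> ModerationDecision:
    score = score_post(text)
    # Illustrative thresholds only; real platforms tune these per policy area
    # and route borderline cases to human reviewers.
    if score >= 0.95:
        return ModerationDecision("remove", score)
    if score >= 0.80:
        return ModerationDecision("human_review", score)
    if score >= 0.50:
        return ModerationDecision("downrank", score)
    return ModerationDecision("allow", score)

print(decide("example borderline post"))
# ModerationDecision(action='downrank', score=0.72)
```

The graduated ladder matters because, as discussed later in this piece, much problematic content is not clearly violating; downranking and human review are often more defensible responses than removal.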
There are domains, such as video livestreaming, where companies now must rely on AI technologies to take timely action before events spin out of control. The shooter in the Christchurch, New Zealand, massacre in 2019 infamously livestreamed on Facebook his killing of dozens of Muslims in two mosques, and that gruesome video was quickly copied across the internet. In the immediate aftermath, there was worldwide attention to the lack of safeguards around livestreaming technologies. Facing pressure, social platforms and livestreaming video companies poured resources into early detection algorithms; governments and civil society have also formed working groups to coordinate quick responses across platforms.
By 2022, when a somewhat parallel atrocity took place in Buffalo, New York, involving a racially motivated mass shooting, the perpetrator’s livestream was quickly detected and shut down. Companies can now detect many types of violent incidents within seconds, although some complex situations, such as self-harm and suicide, remain difficult to address through automation.
Further, there are business decisions, which come with trade-offs, that companies could make to create barriers to misuse of livestreaming technology. These include requiring accounts to have established histories and a sufficient number of subscribers, making it less likely that a deranged person could simply join and press the streaming button.
The AI industry is generally predicting the broad use of AI agents across economic and social life in the years ahead, and such agents may have a role to play in moderation as well. So much user-generated content is, probabilistically speaking, in the “gray zone”: it is not clearly violating and in need of removal or harsher action.
There are increasing cries from both the political right and left for opposite treatments. On the right: Don’t touch speech; it should be free. On the left: Pay more attention to harmful speech. These countervailing pressures push companies to the middle, in the direction of speech-preserving but responsive and activist measures. Therefore, one obvious tactic that could scale would be for platforms to engage with users through LLM-powered chatbots, deploying a kind of interactive agent “referee” that might start conversations with users, warn them, attempt to de-escalate heated discussions and turn down the temperature, and ask about motives and intentions.
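Below is a minimal sketch of what such a referee agent might look like, assuming a hypothetical heat-scoring step and a hypothetical text-generation call; neither reflects a real platform or vendor API.

```python
# Sketch of an interactive "referee" agent. Both helper functions are
# hypothetical placeholders, not real platform or vendor APIs.

def estimate_heat(messages: list[str]) -> float:
    # Placeholder: a real system might use a toxicity classifier or an LLM
    # to rate how heated the exchange has become, on a 0-to-1 scale.
    return 0.9 if any("!!" in m for m in messages) else 0.2

def generate_referee_reply(messages: list[str]) -> str:
    # Placeholder for an LLM call that drafts a de-escalating, non-punitive note.
    return ("This thread is getting heated. Before replying, consider restating "
            "the other person's point in your own words. What outcome are you "
            "hoping for here?")

def maybe_intervene(messages: list[str], threshold: float = 0.8) -> str | None:
    # The agent warns and asks about intent rather than removing content outright.
    if estimate_heat(messages) >= threshold:
        return generate_referee_reply(messages)
    return None

thread = ["That's a ridiculous take!!", "You clearly didn't read the article!!!"]
print(maybe_intervene(thread))
```

The design choice worth noting is that the agent’s first move is conversational rather than punitive, which matches the speech-preserving middle path described above.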
There have been experiments with AI moderators within various smaller and decentralized platforms, such as Discord and Reddit, where community moderators can use this type of software. OpenAI is developing LLMs for content moderation. Meta/Facebook has rolled out engaging bots, parodying famous personas such as Jane Austen; Snapchat has offered a chatbot to individual users. Researchers have used LLMs to engage with people who believe in conspiracy theories, and there is promising early evidence that generative AI may be able to make a difference in steering people away from such false beliefs. Bots have been a part of social media for years, often deployed by users to do everything from entertain to deceive.
What would be new, albeit highly ethically complicated, is a fleet of bots operating centrally on behalf of the large platforms, in police, referee, or service roles. As scholars have suggested, positive, prosocial uses of generative AI on social media could include mediation, information assistance, counterspeech, and moderation of discourse among users. There are obvious risks, as AI technologies might say or do things that cannot be predicted in advance. They are still prone to error on complex tasks such as fact-checking. They also often carry biases that are a product of their training data: racial or gender biases shaped by the stereotypes running through the internet data of every kind on which these models are trained.
Still, research on automated moderation tools shows that they may have the potential to help users follow rules and guidelines, even as platforms must pay careful attention to humanizing these systems and to user perceptions of justice and fairness. What is not yet clear is how the public would respond to the widespread use of such consumer-facing, company-supported AI bots that attempt to enforce rules or encourage rule-following.