Advances in Artificial Intelligence and Machine Learning
Any uptick in new misinformation that casts doubt on COVID-19 mitigation strategies, such as vaccine boosters and masks, could reverse society's recovery from the pandemic both nationally and globally. This study demonstrates how machine learning language models can automatically generate new COVID-19 and vaccine misinformation that appears fresh and realistic (i.e., human-generated) even to subject matter experts. The study uses GPT-2, the latest version of the GPT model that is public and freely available, and feeds it publicly available text collected from social media communities known for their high levels of health misinformation. The same team of subject matter experts that classified the original social media input data is then asked to categorize the GPT-2 output without knowing its automated origin. None of them identified all the synthetic text strings as products of the machine model. This presents a clear warning for social media platforms: current, off-the-shelf machine learning algorithms running continually can generate an effectively unlimited volume of fresh, seemingly human-produced misinformation. We then offer a solution: a statistical approach that detects differences between the dynamics of this output and those of typical human behavior (both steps are sketched below).
Richard Sear, Rhys Leahy, Nicholas Johnson Restrepo, Yonatan Lupu, Neil F. Johnson
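The abstract describes fine-tuning the public GPT-2 checkpoint on text collected from social media communities and then sampling new strings from it. The following is a minimal sketch of such a pipeline using the Hugging Face transformers library; the training file name, prompt, and all hyperparameters are assumptions for illustration, not the authors' actual settings.

```python
# Hypothetical sketch: fine-tune the public GPT-2 checkpoint on collected
# community text, then sample new strings. "community_posts.txt" and all
# hyperparameters below are assumptions, not the study's actual settings.
from transformers import (GPT2LMHeadModel, GPT2Tokenizer, TextDataset,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Build a language-modeling dataset from the collected posts (plain text file).
dataset = TextDataset(tokenizer=tokenizer,
                      file_path="community_posts.txt",
                      block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-finetuned",
                           num_train_epochs=3,
                           per_device_train_batch_size=4),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()

# Sample new synthetic text strings from the fine-tuned model.
inputs = tokenizer("Vaccines", return_tensors="pt")
outputs = model.generate(**inputs, do_sample=True, top_k=50, max_length=60,
                         num_return_sequences=5,
                         pad_token_id=tokenizer.eos_token_id)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```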
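The detection statistic itself is not specified in this abstract. As a purely hypothetical illustration of the idea of comparing the dynamics of machine output against a human baseline, the sketch below applies a two-sample Kolmogorov-Smirnov test to a simple per-string feature; the feature choice and the sample data are stand-ins, not the authors' method.

```python
# Illustrative sketch only: compare the distribution of a simple dynamical
# feature (here, tokens per string) between a known-human baseline and
# candidate output. The feature and data are hypothetical placeholders.
from scipy.stats import ks_2samp

def feature(strings):
    # Stand-in "dynamics" feature: whitespace-token count per string.
    return [len(s.split()) for s in strings]

# Hypothetical samples; in practice these would be known-human posts and
# candidate (possibly machine-generated) posts at scale.
human_posts = ["short post", "a somewhat longer human written post", "ok"]
candidate_posts = ["a generated string of text", "another generated string"]

stat, p_value = ks_2samp(feature(human_posts), feature(candidate_posts))
if p_value < 0.05:
    print("Candidate output differs significantly from the human baseline.")
else:
    print("No significant difference detected.")
```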