
Wikipedia already shows signs of huge AI input
AI chatbots have shifted how online content is created, and with it how we perceive and trust information. How will future generations view this transition? Some argue there is a growing urgency to preserve “uncontaminated” data from the pre-AI era, while others argue that documenting AI-generated content is essential for studying how chatbots evolve.
Rajiv Pant, an entrepreneur and former CTO of The New York Times and The Wall Street Journal, worries about the risk AI poses to historical information, including news stories. He stresses the need to distinguish human-authored content from AI-generated material at scale, a challenge that cuts across journalism, legal processes and scientific research.
John Graham-Cumming, of cybersecurity firm Cloudflare, likens pre-2022 information to low-background steel: steel produced before atmospheric nuclear testing began in 1945, prized because it is free of radioactive contamination. He has launched lowbackgroundsteel.ai to archive data sources untouched by AI, such as a snapshot of Wikipedia from August 2022, taken before the platform began to show signs of significant AI input.
On the other hand, Mark Graham, who oversees the Wayback Machine at the Internet Archive, proposes archiving AI outputs themselves rather than focusing solely on pre-AI internet content. He plans to pose 1000 questions a day to popular chatbots and store their responses, building a record of how AI-generated answers change over time for future analysis.
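Graham's actual tooling isn't described, but a daily capture-and-store job of this kind could be quite simple. The sketch below is a hypothetical illustration, assuming the OpenAI Python client and an invented question list and file layout; none of these details come from the Internet Archive.

```python
# Hypothetical sketch of a daily chatbot-archiving job, loosely modelled on
# the plan described above. The model name, question list and output layout
# are assumptions for illustration only.
import datetime
import json
from pathlib import Path

from openai import OpenAI  # official OpenAI Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def archive_answers(questions: list[str], out_dir: Path,
                    model: str = "gpt-4o-mini") -> None:
    """Ask each question once and store the dated responses as JSON."""
    day = datetime.date.today().isoformat()
    out_dir.mkdir(parents=True, exist_ok=True)
    records = []
    for question in questions:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        records.append({
            "date": day,
            "model": model,
            "question": question,
            "answer": response.choices[0].message.content,
        })
    # One file per day makes it easy to diff answers over time.
    (out_dir / f"{day}.json").write_text(json.dumps(records, indent=2))


if __name__ == "__main__":
    archive_answers(["Who wrote Hamlet?"], Path("chatbot-archive"))
```

Storing one dated file per day means the archive itself records drift: the same question can be diffed across months to see how a chatbot's answer has shifted.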
Graham-Cumming emphasizes that preserving human-produced data can also benefit AI itself: models trained on low-quality AI output risk “model collapse”, a progressive degradation in quality, so clean human data remains valuable training material. At the same time, he believes AI advances can lead to groundbreaking discoveries, shaping the future of information dissemination.
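The collapse effect can be seen in a toy simulation: repeatedly fit a simple model to samples drawn from the previous fit, and the fitted distribution loses diversity. This is a simplified sketch of the phenomenon with a one-dimensional Gaussian, not the setup used in the research literature.

```python
# Toy illustration of "model collapse": each generation is trained only on
# samples generated by the previous generation. With a finite sample size,
# the estimated spread (sigma) tends to drift downward, so each generation
# captures less of the original data's diversity.
import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 0.0, 1.0      # the original "human" data distribution
n_samples = 200           # finite training set per generation

for generation in range(10):
    data = rng.normal(mu, sigma, n_samples)  # sample the previous model
    mu, sigma = data.mean(), data.std()      # refit the model to its own output
    print(f"gen {generation}: mu={mu:+.3f}, sigma={sigma:.3f}")
# sigma tends to shrink across generations: variation is gradually lost
```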
As the digital landscape continues to evolve, the debate over whether to preserve pre-AI data, archive AI-generated content, or both remains central to understanding how technology reshapes the historical record.