A recent investigation by the Atlantic is raising questions about how some of the world's largest AI systems are trained and who is benefiting from the journalism that fuels them.
The Common Crawl Foundation, which is funded in part by major tech companies, has been collecting and distributing billions of web pages — including paywalled news articles — to AI developers such as OpenAI, Anthropic, and Meta.
What's happening?
According to the Atlantic, the foundation's archives contain millions of articles from news organizations, including the New York Times, the Economist, the New Yorker, and the Atlantic itself.
Common Crawl says it collects only freely available content. But its scraper never executes the browser code that checks subscription status, so it captures the full text of articles that would otherwise sit behind a paywall.
The foundation's executive director, Rich Skrenta, defended the practice.
"The robots are people too," said Skrenta to the Atlantic, arguing that they should be allowed to "read the books" for free.
Multiple publishers have requested content removal, but the Atlantic found that archive files haven't been modified since 2016.
Why is AI data scraping concerning?
Quality investigative reporting requires significant resources, and paywalls help sustain them. When AI companies train their models on this content without compensation, it undermines the business models that fund original reporting.
The issue also connects to broader environmental concerns surrounding AI. Training and deploying large language models requires enormous computational power, leading to increased energy consumption and harmful carbon pollution.
The U.N. Environment Programme reported that the data centers housing AI servers have large, polluting footprints. They generate electronic waste, consume large volumes of water, and rely on minerals and rare earth elements.
A U.N. Conference on Trade and Development report found that making a 2-kilogram computer can require extracting about 800 kilograms of raw materials. According to the MIT Technology Review, data centers consume 4.4% of the electricity in the U.S.
However, AI technology offers potential environmental benefits when applied thoughtfully. It can make energy systems more efficient and affordable and improve logistics. AI could support and even accelerate progress on up to 80% of the Sustainable Development Goals, per the U.N.
What's being done about AI data scraping?
Organizations are pushing for rules on how AI systems are built and powered.
Beyond Fossil Fuels published a joint statement signed by more than 100 organizations. Their demands included phasing out dirty energy sources for powering data centers and keeping AI systems compatible with "planetary boundaries."
Individuals can also do their part by pushing for transparency and ethical practices in their respective organizations.
Online, Redditors expressed their views.
"Swartz was facing 35 years, I wonder what these guys will get," wrote one user, referencing the case of internet activist Aaron Swartz.
"I mean, all the human readers just bypass the paywall too," commented another.