From www.cultofmac.com
Many prominent news outlets and social media platforms have opted out of Apple’s AI training data collection via website scraping, according to a new report Thursday.
Sites opt out through a new tool called Applebot-Extended, which the iPhone giant introduced less than three months ago. If major content websites opt out of Apple AI scraping, that could have implications for the continuing development of Apple Intelligence.
Some of the biggest websites opt out of Apple AI scraping
Among those blocking Apple’s AI data collection are Facebook, Instagram, Craigslist, Tumblr, The New York Times, The Financial Times, The Atlantic, Vox Media, USA Today network, and Condé Nast, according to a report in Wired. The “cold reception” to the robot crawler — now that such tools help train AI — suggests that bot crawlers have entered a “conflict zone over intellectual property and the future of the web.”
Apple extends an opt-out option
Unlike some content scrapers, Applebot-Extended allows website owners to prevent their data from being used in Apple’s AI training. But even so, the original Applebot can still crawl their sites to improve search functionality. A related dispute arose recently, when Apple denied accusations that it uses YouTube videos to train AI without consent.
So it appears some major sites are taking advantage of the opt-out on the AI scraper, which could disadvantage Apple Intelligence. Website owners can block Applebot-Extended by updating their robots.txt file, a long-standing protocol for managing web crawlers.
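As a rough illustration of that robots.txt opt-out, here is a minimal sketch in Python. The robots.txt contents, the example.com URL and the is_allowed helper are assumptions made up for this example, not taken from Apple or from any site named in the report. It uses the standard library's urllib.robotparser to show how a rule set can turn away Applebot-Extended while still admitting the original Applebot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: opt out of AI training use
# (Applebot-Extended) while leaving all other crawlers, including the
# original Applebot, unrestricted.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/news/story.html"  # placeholder URL
    for agent in ("Applebot-Extended", "Applebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

With these rules, the check comes back False for Applebot-Extended and True for Applebot. This only simulates how a compliant parser reads the file. The protocol is voluntary, so the opt-out works only to the extent that the crawler's operator honors it.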
Holding out for partnerships?
Still, analysis shows that about 6% to 7% of high-traffic websites currently block Applebot-Extended, with news and media outlets making up the majority. Applebot-Extended is new enough that some sites simply haven’t addressed its use yet. But it seems that some publishers are taking a strategic approach, potentially withholding data until partnership agreements are in place.
To that end, some media companies, like Condé Nast, have unblocked certain AI bots after forming partnerships with their creators.
AI scraping has its critics
The New York Times criticizes the opt-out nature of these AI data collection tools, arguing that copyright law should protect its content regardless of technical blocking measures.
As Wired’s article discusses, the traditionally obscure robots.txt file has become a battleground for AI training data, reflecting broader tensions over intellectual property rights in the age of AI.
And one wonders: If Apple Intelligence soars upon wide release, won’t many major sites clamor to make sure they’re in on the action? More Apple deals with publishers could be in the offing.