Reddit is limiting its availability to the Internet Archive's Wayback Machine

The Internet Archive’s Wayback Machine is the latest victim of Reddit’s crackdown on data access. The company has begun placing new restrictions on what the archive site can access, a move that will significantly limit the Wayback Machine’s ability to preserve information from Reddit.

With the change, the Wayback Machine, a project run by the nonprofit Internet Archive, will only be able to crawl Reddit’s homepage. It will no longer be able to access comments, subreddit pages, post details, profiles and other data.

The move is the latest step Reddit has taken in its quest to limit AI companies’ ability to use its data to train large language models without paying licensing fees. It is also a notably different stance from the one the company took last year, when it explicitly said that it would not restrict “good faith actors,” including the Internet Archive. It isn’t clear what exactly has changed since then. Reddit appears to believe that AI companies are circumventing its rules by scraping data via the Wayback Machine. We have reached out to the Internet Archive for comment.

Data licensing has become a big business for Reddit. The company has struck multimillion-dollar deals with OpenAI and Google that allow them to use Reddit posts to help train their AI models. At the same time, Reddit has taken an increasingly hardline stance against companies that attempt to use its data without such arrangements. Earlier this year, the company sued Anthropic, alleging it scraped Reddit for years without permission.
