From MySpace profiles to YouTube vlogs, the human touch has been the cornerstone of the internet since its inception. The rise of AI now risks unraveling the web's intricate ecosystem as we know it.
A new, human-centered licensing policy could make it harder for bots to harvest the web unchecked. The Really Simple Licensing (RSL) standard launched on September 10 and released official specifications on December 10, aiming to combat the growing threat of AI web crawlers by creating consistent, accessible guidelines for how AI companies can use web publishers' content.
With support from companies such as Reddit, Medium, Quora and Cloudflare, the RSL standard lets website owners set specific licensing and royalty terms for AI web crawlers accessing their sites. However, the standard's effectiveness ultimately depends on regulatory backing: online publishers' ability to enforce penalties on AI companies that flout it.
Web crawlers are bots that collect data from websites to train and update AI models that are “good enough to feel deceptive — and that’s what triggers something in us,” said Alan Rubel, an associate professor at the University of Wisconsin-Madison’s Information School.
AI has disrupted the existing content ecosystem, according to Doug Leeds, the co-founder of RSL Standard. Previously, search engines directed users to websites, generating the traffic that content creators needed to earn revenue. However, as AI now answers user questions directly, it bypasses original websites, depriving creators of visitors and income.
"It breaks the ecosystem because… [creators] do not get paid," Leeds said. "And if [creators] do not get paid, they cannot keep making [content]. Without content, then sooner or later, AI is not going to have anything to publish or any answers to get."
Web crawlers often take original content from websites to feed AI models without compensating web publishers or obtaining their consent. That practice has already produced high-profile litigation: Anthropic, for example, was sued for copyright infringement in a data scraping lawsuit and settled for $1.5 billion.
But many individual web publishers lack the resources to litigate against large companies and teams of lawyers, hence the proposal of the RSL standard.
Founders say the standard provides an efficient way to define what AI crawlers can access, along with a concrete licensing framework guaranteeing publishers compensation for the use of their original content.
What does RSL do differently?
Building upon the work of the robots.txt protocol, a well-known standard used to set boundaries for regular web crawlers, the RSL standard allows web publishers to embed machine-readable licenses directly into their sites, defining the conditions under which AI systems can access, use, and train on their content.
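To see what RSL builds on, here is a minimal sketch of the underlying robots.txt mechanism using Python's standard `urllib.robotparser`. The crawler name `ExampleAIBot` and the URLs are hypothetical, used only for illustration:

```python
from urllib import robotparser

# A hypothetical robots.txt: one rule blocks an AI crawler entirely,
# while all other crawlers may fetch the whole site.
robots_txt = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(robots_txt)

# The blocked AI crawler may not fetch the article; other bots may.
print(parser.can_fetch("ExampleAIBot", "https://example.com/article"))  # False
print(parser.can_fetch("SearchBot", "https://example.com/article"))     # True
```

Where robots.txt can only allow or deny access, RSL extends the model: rather than simply blocking a crawler, a publisher can direct it to machine-readable terms spelling out how the content may be licensed, used and paid for.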
Rubel said AI is capable of mimicking human-created content, which reinforces the need for systems like RSL that respect creators’ rights.
“I think there’ll be some areas where AI becomes totally ingrained and people don’t think about it much, but I think we’re going to see lots of areas of substantial conflict,” Rubel said.
Economically, RSL functions much like music royalty systems. The non-profit American Society of Composers, Authors, and Publishers (ASCAP) licenses music on behalf of songwriters and publishers, collects payments from users and distributes royalties, simplifying otherwise time-consuming individual negotiations.
RSL models this approach for online content, ensuring creators get paid when their work is used by AI. AI companies could pay the RSL Collective for access to publishers’ material, and RSL would distribute that money to participating rights holders.
RSL’s success depends on copyright enforcement
Rubel said even with the RSL standard, stronger intellectual property laws still need to be passed.
“There are a couple of different concerns that people raise [about] whether AI companies using intellectual property to create new products is fair, whether it should be compensated,” Rubel said. “For the RSL standard to work in that realm, you need some copyright enforcement in the first place. Otherwise, there's not a lot of incentive to use it.”
Congress may soon reinforce these protections. The TRAIN Act, a bill introduced in the Senate and referred to the Committee on the Judiciary, would require AI companies to disclose whether an individual's or publisher's data was used in training their models. If enacted, it would bring new transparency to how AI companies gather their information, complementing RSL's framework.
Together, these developments represent a growing push to ensure the internet’s creative foundation remains sustainable in the age of AI.