A quick thing you can do if you want to restrict or limit #LLM training on your content - or the opposite, allow it under specific conditions (e.g. attribution).
Publish a /license.xml file on your site.
Add a License: https://krvtz.net/license.xml line to your /robots.txt
Sample license.xml banning any LLM learning:
<rsl xmlns="https://rslstandard.org/rsl">
<content url="/">
<license>
<prohibits type="usage">ai-train ai-input</prohibits>
</license>
</content>
</rsl>
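To sanity-check a license.xml like this before publishing it, something along these lines may help. It is only a minimal sketch using Python's standard library, not an official RSL parser: it reads just the `permits` and `prohibits` elements with `type="usage"`, and the names `usage_rules` and `SAMPLE` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the sample license.xml in this post.
RSL_NS = {"rsl": "https://rslstandard.org/rsl"}

SAMPLE = """<rsl xmlns="https://rslstandard.org/rsl">
<content url="/">
<license>
<prohibits type="usage">ai-train ai-input</prohibits>
</license>
</content>
</rsl>"""


def usage_rules(xml_text):
    """Map each <content url="..."> to (permitted, prohibited) usage tokens.

    Hypothetical helper: it ignores everything else a license may carry
    (payment terms and so on).
    """
    root = ET.fromstring(xml_text)
    rules = {}
    for content in root.findall("rsl:content", RSL_NS):
        permitted, prohibited = set(), set()
        for lic in content.findall("rsl:license", RSL_NS):
            for el in lic.findall("rsl:permits[@type='usage']", RSL_NS):
                permitted.update((el.text or "").split())
            for el in lic.findall("rsl:prohibits[@type='usage']", RSL_NS):
                prohibited.update((el.text or "").split())
        rules[content.get("url")] = (permitted, prohibited)
    return rules


if __name__ == "__main__":
    for url, (permitted, prohibited) in usage_rules(SAMPLE).items():
        print(url, "permits:", sorted(permitted), "prohibits:", sorted(prohibited))
```

Running it on the sample should list `ai-train` and `ai-input` as prohibited for `/`, which is a quick way to catch a typo in a usage token or a wrong namespace.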
Sample license.xml allowing LLM learning on CC-BY attribution basis:
<rsl xmlns="https://rslstandard.org/rsl">
<content url="/">
<license>
<permits type="usage">all</permits>
<payment type="attribution">
<standard>https://creativecommons.org/licenses/by/4.0/</standard>
</payment>
</license>
</content>
</rsl>
Live example: https://krvtz.net/robots.txt
Full standard: https://rslstandard.org/guide/getting-started
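To check that the License line actually made it into a robots.txt, a small script can pull it out. This is a rough sketch under stated assumptions: `license_urls` is a hypothetical helper, and `SAMPLE_ROBOTS` is an invented minimal file, not a copy of the live robots.txt linked above.

```python
# Invented example content; the real robots.txt linked above may differ.
SAMPLE_ROBOTS = """\
User-agent: *
Allow: /

# licensing terms for crawlers
License: https://krvtz.net/license.xml
"""


def license_urls(robots_txt):
    """Return the values of all `License:` lines found in robots.txt text."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments; assumes the URL itself contains no '#'.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "license":
            urls.append(value.strip())
    return urls


if __name__ == "__main__":
    print(license_urls(SAMPLE_ROBOTS))
```

The field name is matched case-insensitively here as a convenience; that is a choice made for this sketch, so check the full standard for what crawlers are actually expected to accept.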
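RSL 1.0, backed by 1,500 media outlets, gives a "selective NO" to Google AI search
With the official release of RSL 1.0, publishers can now selectively opt out of AI search while staying in regular search. An introduction to the new standard backed by 1,500 media organizations.
Ars Technica has a good article discussing RSL.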
This needs to get adopted industry-wide.
RSL: a license for your web site, get AI scrapers to pay you or stop scraping.
RSL is the missing layer for the AI era: set terms, get attribution, and get paid (per crawl or per inference). Open standard, collective leverage. If AI uses your work, it should respect your license. Time to take control.
https://hostvix.com/rsl-a-new-standard-to-make-ai-pay-for-the-content-it-consumes/
#RSL #ReallySimpleLicensing #AI #AIethics #AIsafety #AIdata #ContentRights #Licensing #OpenWeb #RobotsTxt #Publishers #Creators #Attribution #PayPerCrawl #PayPerInference #RSS #WebStandards #DigitalRights #CollectiveLicensing #Fastly
The web has always run on a simple deal: creators publish content, audiences consume it, and somewhere in between, business models make the whole thing sustainable. For two decades, that model was mostly powered by ads, subscriptions, or syndication. But along came AI — and suddenly, content created for people is being vacuumed up at...
Get paid every time a generative AI crawls? A standard billed as the definitive answer to unauthorized crawlers is born [Yajiuma Watch] - INTERNET Watch
https://internet.watch.impress.co.jp/docs/yajiuma/2046521.html
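"Really Simple Licensing (RSL) lets web publishers set terms of use for the crawlers that collect training data for generative AI, and it can be used just by adding to the website's robots.txt file. With it, publishers can also set license terms such as a fee to be paid each time a generative AI crawler crawls their content, which makes it easier for creators to be compensated for AI scraping."
"Major players such as Reddit, Yahoo, Quora, and Medium have already expressed support."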