
T-Room

The Best in Alternative News

December 12, 2023 at 7:02 pm

No Robots(.txt): How to Ask ChatGPT and Google Bard to Not Use Your Website for Training…


by Thorin Klosowski at Electronic Frontier Foundation

Both OpenAI and Google have released guidance for website owners who do not want the two companies using the content of their sites to train the companies’ large language models (LLMs). We’ve long been supporters of the right to scrape websites (the process of using a computer to load and read pages of a website for later analysis) as a tool for research, journalism, and archiving. We believe this practice is still lawful when collecting training data for generative AI, but the question of whether something should be illegal is different from whether it may be considered rude, gauche, or unpleasant. As norms continue to develop around what kinds of scraping and what uses of scraped data are considered acceptable, it is useful to have a tool for website operators to automatically signal their preference to crawlers. Asking OpenAI and Google (and anyone else who chooses to honor the preference) not to include scrapes of your site in their models is an easy process as long as you can access your site’s file structure.
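
As a rough illustration of what such a signal can look like, the sketch below holds a candidate robots.txt as a string and checks it with Python's standard-library `urllib.robotparser`. It rests on assumptions not spelled out in this excerpt: the user-agent tokens `GPTBot` (OpenAI) and `Google-Extended` (Google) come from those companies' published crawler documentation, while the helper name `is_allowed` and the `example.com` URL are hypothetical. Honoring these rules is voluntary on the crawler's side, so this shows a stated preference, not an enforcement mechanism.

```python
from urllib.robotparser import RobotFileParser

# Candidate robots.txt contents (an assumption, not quoted from the article).
# Each block names one crawler token and disallows the whole site for it.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper: it parses the text locally instead of
    downloading a live robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/blog/post.html"  # placeholder URL
    for agent in ("GPTBot", "Google-Extended", "Googlebot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, page) else "blocked"
        print(f"{agent}: {verdict}")
```

With these rules, the two training-related tokens are refused everywhere on the site, while an agent with no matching block, such as an ordinary search crawler, is unaffected. In practice the text would be saved as `robots.txt` at the root of the site.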

We’ve talked before about how these models use art for training, and the general idea and process are the same for text. Researchers have long used collections of data scraped from the internet for studies of censorship, malware, sociology, language, and other applications, including generative AI. Today, both academic and for-profit researchers collect training data for AI using bots that go out searching all over the web and “scrape up” or store the content of each site they come across. This might be used to create purely text-based tools, or a system might collect images that may be associated with certain text and try to glean connections between the words and the images during training. The end result, at least currently, is the chatbots we’ve seen in the form of Google Bard and ChatGPT.

If you do not want your website’s content used for this training,…



Any publication posted at The T-Room and/or opinions expressed therein do not necessarily reflect the views of The T-Room. Such publications and all information within the publications (e.g. titles, dates, statistics, conclusions, sources, opinions, etc) are solely the responsibility of the author of the article, not The T-Room.


Copyright © 2025 T-Room

Site by Creative Visual Design