As a not super techy person, can someone more informed please point me to something that accurately explains the difference between ethical and non-ethical "AI"? Like, I know that technically an email spam filter is "AI" but not problematic (afaik), but LLMs are psychologically and environmentally disastrous. Can someone share an article or something that explains the differences in the factors and mechanisms here clearly for non techy people?
#ai #askfedi #tech
@ArcMother
Understanding what ethical #ai is starts with the right definitions. An e-mail spam filter usually works by combining rule-based and 'machine learning'-based weighting and filtering. Pretty straightforward and usually very reliable.
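
To make the "rules plus machine learning" idea a bit more concrete, here is a minimal sketch of how such a filter could combine the two. Everything in it is an assumption for illustration: the rules, the scores, the threshold, the tiny training set and the function names are made up, and real spam filters are far more elaborate.

```python
import math
from collections import Counter

# Hypothetical hand-written rules: (description, test, score added on a match).
RULES = [
    ("many exclamation marks", lambda text: text.count("!") >= 3, 2.0),
    ("mentions a wire transfer", lambda text: "wire transfer" in text.lower(), 3.0),
]

def tokenize(text):
    """Split text into lowercase words without surrounding punctuation."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w]

def train_word_weights(spam_examples, ham_examples):
    """Learn one weight per word; positive means 'more typical of spam'."""
    spam_counts = Counter(w for t in spam_examples for w in tokenize(t))
    ham_counts = Counter(w for t in ham_examples for w in tokenize(t))
    vocab = set(spam_counts) | set(ham_counts)
    # Add-one smoothing so unseen words never produce log(0).
    spam_total = sum(spam_counts.values()) + len(vocab)
    ham_total = sum(ham_counts.values()) + len(vocab)
    return {
        w: math.log((spam_counts[w] + 1) / spam_total)
        - math.log((ham_counts[w] + 1) / ham_total)
        for w in vocab
    }

def spam_score(text, weights):
    """Rule-based part plus learned part; higher means more spam-like."""
    rule_part = sum(score for _, test, score in RULES if test(text))
    learned_part = sum(weights.get(w, 0.0) for w in tokenize(text))
    return rule_part + learned_part

def is_spam(text, weights, threshold=2.0):
    return spam_score(text, weights) > threshold

# Made-up training examples, only to show the mechanics.
SPAM = ["Win money now!!!", "Cheap pills, win a prize",
        "Urgent wire transfer needed, win money"]
HAM = ["Meeting moved to Monday", "Lunch on Friday?",
       "Here are the notes from the meeting"]

weights = train_word_weights(SPAM, HAM)
print(is_spam("Win money now!!!", weights))
print(is_spam("Notes from the Monday meeting", weights))
```

The point of the sketch is that every decision can be traced back to a rule or to a word weight, which is part of why this kind of filter is predictable.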
1/
@ArcMother
What people call 'artificial intelligence' today is basically just 'machine learning' like we have been using for the last 25 years or so, with the key difference being that we gave it considerably more resources to do its thing. It became more convincing by accurately mimicking human output, but it can also go seriously off the rails and make things up in a very convincing way. That lack of reliability is a serious shortcoming of #ai.
2/

@ArcMother
You just can't be sure how 'helpful' #ai will be for you today. That in itself is an ethical problem if you ask me.

This is, however, not the only ethical dilemma. Another extremely important issue is the training data and where it comes from. Every AI company has basically scraped the entire internet: everything they could find, books, news articles, social media posts, you name it. It was all downloaded and used to train AI models.

This raises the question of copyright and licensing.
3/

@ArcMother
Until not too long ago, training #ai was considered academic research: gaining knowledge to benefit humanity. The question is where academic research ends and where commercial applications start. When one flows into the other, one is fencing creative works.

Using a creative work to train AI without prior permission is what I consider unethical. There can be a case for fair use: academic research, quoting a work, or critiquing it are all fair uses of a protected creative work.
4/

@ArcMother
The key difference between 'machine learning' and 'artificial intelligence' is that the latter got allocated considerably more resources to do its thing. Those resources are computer hardware, energy in the form of electricity, and drinking-quality water for the needed cooling.

An #ai data center needs a ton of electrical energy and water to function. These have to come from somewhere and the infrastructure needs to be able to deliver these resources without impacting other users.
5/

@ArcMother
Other users clearly do get impacted, in some places more seriously than in others. This creates societal problems such as rising utility bills and shortages of resources. The development of #ai data centers has serious real-life consequences for people, which raises ethical questions.
6/
@ArcMother
Scraping the internet is as old as the first internet search engines. It's common for Bing, DuckDuckGo, Google and others to pass by a web server and request pages. They index the content and in return they forward visitors: actual people who see your website and any advertising you might have on it. Ads pay the bills for writing content and for the costs of hosting: the server, data, maintenance, software licenses and more.
7/

@ArcMother
Things become problematic when companies value your content but don't forward any actual visitors back to your website. Instead they show your creative work and you don't receive a penny in return. No more visitors are being forwarded to your site. Advertisers ask themselves why they are advertising on your site when fewer and fewer people see their ads.

It basically comes down to #ai companies stealing your content and putting you out of business. That is an ethical problem.
8/

@ArcMother
Server resources and data don't come for free. Hosting a website can be a costly undertaking. Internet search engines follow a gentleman's agreement to preserve your website's resources: they don't request too many pages at once, they spread requests over time, and they only revisit an existing page after a reasonable time has passed. This has worked perfectly fine for decades.
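
For the more technically curious, here is a rough sketch of what that polite behaviour could look like in code. It uses Python's standard `urllib.robotparser` to read a site's robots.txt. The robots.txt content, the crawler name `ExampleBot`, the once-a-day revisit rule and the helper `plan_fetches` are all assumptions I made up for illustration; real search engine crawlers are much more sophisticated.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a site owner might publish.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

USER_AGENT = "ExampleBot"   # hypothetical crawler name
REVISIT_AFTER = 24 * 3600   # assumption: revisit a page at most once a day

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

def plan_fetches(urls, last_fetched, now):
    """Return (url, earliest_fetch_time) pairs a polite crawler would schedule.

    last_fetched maps a URL to the time (in seconds) it was last requested.
    """
    # Use the delay the site asks for; fall back to 1 second if none is given.
    delay = robots.crawl_delay(USER_AGENT) or 1
    schedule = []
    next_slot = now
    for url in urls:
        if not robots.can_fetch(USER_AGENT, url):
            continue  # the site asked crawlers to stay out of this path
        if now - last_fetched.get(url, float("-inf")) < REVISIT_AFTER:
            continue  # fetched recently, so don't ask for it again yet
        schedule.append((url, next_slot))
        next_slot += delay  # spread the requests over time
    return schedule

urls = [
    "https://example.org/blog/1",
    "https://example.org/private/a",
    "https://example.org/blog/2",
    "https://example.org/blog/3",
]
plan = plan_fetches(urls, {"https://example.org/blog/2": 990_000}, now=1_000_000)
for url, when in plan:
    print(when, url)
```

Note that nothing forces a crawler to do any of this. It only works as long as everyone chooses to play along.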
9/

@ArcMother
Things have changed drastically since the #ai #goldrush. A lot of ai startups don't adhere to the gentleman's agreement between site owners and search engines. They just scrape as fast as they can without taking the resources of websites into account.

This wouldn't be a big problem if there were only a handful of #startups. The problems really start when several hundred of them visit your site at the same time and waste your server's resources. A serious ethical dilemma.
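
To give a feeling for the scale, here is a back-of-the-envelope calculation. All numbers are illustrative assumptions, not measurements of any real crawler or website.

```python
# All numbers are illustrative assumptions, not measurements.

def requests_per_minute(crawlers, requests_per_minute_each):
    """Total load on one website from a group of identical crawlers."""
    return crawlers * requests_per_minute_each

# Assumption: 5 polite search engines, each asking for 1 page every 10 seconds.
polite_load = requests_per_minute(crawlers=5, requests_per_minute_each=6)

# Assumption: 300 aggressive scrapers, each asking for 5 pages per second.
aggressive_load = requests_per_minute(crawlers=300, requests_per_minute_each=300)

print("polite crawlers:    ", polite_load, "requests per minute")
print("aggressive scrapers:", aggressive_load, "requests per minute")
print("difference: a factor of", aggressive_load // polite_load)
```

Under these assumed numbers a small site would go from a trickle of requests to a load it was never sized or paid for.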