Mastodawn

Nick Byrd, Ph.D. Jul 18, 2024

How can #largeLanguageModels scale up development of and insight from #ethics dilemmas?

Researchers had #LLMs make "50 scenarios and 400 unique test items" and collected responses from humans, #GPT4, and #Claude2.
- harm that was necessary to reduce overall harm deemed worse (and more intentional) than harm that was merely a side effect
- same for evitable (vs. inevitable) harm
- no effects of harm that required action (vs. omission)

https://escholarship.org/uc/item/77r459kj

#xPhi #compSci #psychology

Procedural Dilemma Generation for Moral Reasoning in Humans and Language Models

Author(s): Fränken, Jan-Philipp; Gandhi, Kanishk; Qiu, Tori; Khawaja, Ayesha; Goodman, Noah; Gerstenberg, Tobias | Abstract: As AI systems like language models are increasingly integrated into decision-making processes affecting people's lives, it's critical to ensure that these systems have sound moral reasoning. To test whether they do, we need to develop systematic evaluations. We provide a framework that uses a language model to translate causal graphs that capture key aspects of moral dilemmas into prompt templates. With this framework, we procedurally generated a large and diverse set of moral dilemmas---the OffTheRails benchmark---consisting of 50 scenarios and 400 unique test items. We collected moral permissibility and intention judgments from human participants for a subset of our items and compared these judgments to those from two language models (GPT-4 and Claude-2) across eight conditions. We find that moral dilemmas in which the harm is a necessary means (as compared to a side effect) resulted in lower permissibility and higher intention ratings for both participants and language models. The same pattern was observed for evitable versus inevitable harmful outcomes. However, there was no clear effect of whether the harm resulted from an agent's action versus from having omitted to act. We discuss limitations of our prompt generation pipeline and opportunities for improving scenarios to increase the strength of experimental effects.

Dagger ☀️ Mar 3, 2024

Political chats with a human or a machine?
Already, in more than half of all cases, people can no longer tell whom they are chatting with https://www.telepolis.de/features/KI-Chatbots-in-der-politischen-Kommunikation-taeuschend-echt-9644300.html
#ai #ki #llm #chatbot #fakenews #GPT4 #Llama2 #Claude2

AI chatbots in political communication: deceptively real

AI chatbots are joining debates unnoticed. Their opinions seem so real that they can hardly be told apart from those of humans. What researchers say about it.

heise online

Qiita - Popular Articles Feb 27, 2024

A new rival to GPT-4? What is Mistral AI, rumored to be smarter than Claude?
https://qiita.com/minorun365/items/ea71f3fe692f87a6e59c?utm_campaign=popular_items&utm_medium=feed&utm_source=popular_items

#qiita #bedrock #LLM #GPT_4 #Mistral #Claude2

A new rival to GPT-4? What is Mistral AI, rumored to be smarter than Claude? - Qiita

A new member joins Bedrock! AWS recently announced that it plans to add Mistral AI's LLMs to its generative AI service, Amazon Bedrock. https://aws.amazo…

Qiita

Alessio Pomaro Feb 8, 2024

🧠 A comparison of outputs from different #LLM: #GPT4, GPT-3.5, #Gemini Plus, #Claude2, #Llama2 70b, #Mixtral 8x7b (identical input).

⚙️ The task is very simple: analyzing a review. The responses are very similar: the nuances in topic extraction come from different readings of the context, but they all make sense.

🦾 For simple, recurring operations, open-source models run locally or on private instances can be a worthwhile answer.

#AI #GenAI #GenerativeAI

KINEWS24 Dec 31, 2023

10 things the AGI wishes us for 2024: ChatGPT, Gemini and Claude 2

#artificialintelligence #ai #chatgpt #claude2 #googlegeminipro

https://kinews24.de/10-dinge-die-uns-die-agi-fuer-2024-wuenscht/

10 things the AGI wishes us for 2024: ChatGPT, Gemini and Claude 2 - KINEWS24.de

KI NEWS24

Ken Kousen Dec 10, 2023

Tales from the jar side: Claude AI in the Spring Framework, a viral video on YouTube plagiarism, some upcoming milestones, and the usual assortment of silly tweets, toots, and skeets

#java #springboot #plagiarism #claude2 #claudeai
https://open.substack.com/pub/kenkousen/p/tales-from-the-jar-side-claude-in?r=2dwq5&utm_campaign=post&utm_medium=web

Tales from the jar side: Claude in the Spring, Plagiarism on YouTube, Upcoming milestone, and the usual silly tweets, toots, and skeets

What do you call a one-legged hippo? A hoppo! (rimshot) And if you don't like that joke, you're being hippo-critical (rimshot again) (both from @Dadsaysjokes on the evil birdsite)

Tales from the jar side

Ken Kousen Dec 3, 2023

This week's Tales from the jar side, about how Google's Bard is stupid, Anthropic's Claude is better but can't do math, this week in Elon, and the usual silly tweets, toots, and skeets
https://open.substack.com/pub/kenkousen/p/tales-from-the-jar-side-google-bard?r=2dwq5&utm_campaign=post&utm_medium=web
#java #claude2 #openai #bard #elon

Tales from the jar side: Google Bard is stupid, Claude AI is not (but can't do math), and the usual silly tweets, toots, and skeets

I ordered a chicken and an egg from Amazon. I'll let you know. (rimshot)

Tales from the jar side

Mahmoud Az Nov 21, 2023

No better time.

#ai #claude2

D4vRAM Nov 19, 2023

I can't imagine my day-to-day on the #Internet these days without #chatgptplus: it's AMAZING.

It's worth every last euro of the $20 a month it costs (a little over €18). And I've been using it for less than a week. Before that I was on #PerplexityPro, which, thanks to #Claude2, is damn impressive too.

#ChatGPT #GPT4 #GPTPlus #IA #InteligenciaArtificial #AI #OpenAI

Let's hope the firing/underhanded move pulled on #SamAltman, with the board throwing him out of the company today without him even seeing it coming, doesn't screw everything up...

Sam, FIGHT BACK, FIGHT FOR WHAT'S YOURS!!

I would... And I doubt he won't, given that until 24 hours ago he was #CEO of the most important company of the century.

gittaca Nov 14, 2023

Me, whenever someone puts some #StochasticParrot-related or -generated report or proposal on my desk.