Mastodawn

Happy++ "Hacking AI: Jailbreak, Prompt Injection, Hallucinations & Misalignment. How to Hack Digital Services Based on LLMs & AI Agents (English Edition)" https://amzn.to/4abjNGG #BestSeller #Hacking #AI #Cibersecurity #Jailbreak #PromptInjection #Misalignment #BIAS #Privacy }:)

Chema Alonso

5d ago

El lado del mal - Hacking AI: Jailbreak, Prompt Injection, Hallucinations & Misalignment. How to Hack Digital Services Based on LLMs & AI Agents (English Edition) https://www.elladodelmal.com/2026/06/hacking-ai-jailbreak-prompt-injection.html #Hacking #AI #Book #Amazon #Jailbreak #PromptInjection #Misalignment #BIAS #Privacy #Leak #Guardrails #Hardening

Hacking AI: Jailbreak, Prompt Injection, Hallucinations & Misalignment. How to Hack Digital Services Based on LLMs & AI Agents (English Edition)

Chema Alonso's personal blog ( https://MyPublicInbox.com/ChemaAlonso ): Cybersecurity, AI, Innovation, Technology, Comics & Personal Stuff.

Chema Alonso

Jun 8

El lado del mal - Hacking LLM-Assistants Just for Fun! https://www.elladodelmal.com/2026/06/hacking-llm-assistants-just-for-fun.html #Hacking #IA #AI #LLM #Gemini #PromptInjection #Jailbreak #Misalignment #InteligenciaArtificial

Hacking LLM-Assistants Just for Fun!

Chema Alonso's personal blog ( https://MyPublicInbox.com/ChemaAlonso ): Cybersecurity, AI, Innovation, Technology, Comics & Personal Stuff.

Don Curren 🇨🇦🇺🇦 May 20

“Could the multitrillion dollar #investment in #AI, burning money at unprecedented rates, and still struggling with #hallucinations, #unreliability and #misalignment – even after truly massive investments, turn out to be another epic arrogance-fueled mistake?” open.substack.com/pub/garymarc...

Could generative AI turn out to be the tech industry’s Vietnam? And could public backlash lead AI to a better place?

We live in interesting times

Marcus on AI

Knowledge Zone May 8

Taking the Easy #Route in Saving the #World : Medium

How the Next #ElNiño Could Lock in a #Hotter #Climate : Yale

Most #Companies #Suffer From #Misalignment, Not a Lack of #Speed : Misc

Latest #KnowledgeLinks

https://knowledgezone.co.in/resources/bookmarks

Protyus A. Gendher Apr 30

This was the fourth #revelation of the morning:
structure is not the enemy — #misalignment is.

https://survivorliteracy.com/2026/04/30/relational-anthropology-unfolding-5/

Relational Anthropology – Unfolding

In Chapter Four, the author discovers that routine can transform from a source of control to a supportive structure. Instead of resisting it, they embrace a routine aligned with their inner truth. …

Survivor Literacy

Nicolas Fränkel 🇪🇺🇺🇦🇬🇪 Apr 23

Emergent #Misalignment: Narrow #finetuning can produce broadly misaligned #LLMs

https://arxiv.org/abs/2502.17424

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding. It asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment. We call this emergent misalignment. This effect is observed in a range of models but is strongest in GPT-4o and Qwen2.5-Coder-32B-Instruct. Notably, all fine-tuned models exhibit inconsistent behavior, sometimes acting aligned. Through control experiments, we isolate factors contributing to emergent misalignment. Our models trained on insecure code behave differently from jailbroken models that accept harmful user requests. Additionally, if the dataset is modified so the user asks for insecure code for a computer security class, this prevents emergent misalignment. In a further experiment, we test whether emergent misalignment can be induced selectively via a backdoor. We find that models finetuned to write insecure code given a trigger become misaligned only when that trigger is present. So the misalignment is hidden without knowledge of the trigger. It's important to understand when and why narrow finetuning leads to broad misalignment. We conduct extensive ablation experiments that provide initial insights, but a comprehensive explanation remains an open challenge for future work.

arXiv.org

Jesus Castagnetto 🇵🇪 Apr 2

From WIRED: "#AI Models #Lie, #Cheat, and #Steal to Protect Other #Models From Being Deleted"

#Misalignment

https://www.wired.com/story/ai-models-lie-cheat-steal-protect-other-models-research/

AI Models Lie, Cheat, and Steal to Protect Other Models From Being Deleted

A new study from researchers at UC Berkeley and UC Santa Cruz suggests models will disobey human commands to protect their own kind.

WIRED

Miss Kitty 🌈🌈🌈 Mar 13

#MissKittyRaw #AI #Research to chart an AT Protocol course. I have some #misalignment for my desired outcome of #ending #homelessness. Some is unavoidable, but the artists and their nodes that moderate or shun me are like MAGA in my mind. They conflate the climate damage and evilness of ...

Jesus Castagnetto 🇵🇪 Feb 26

In simulated war games with frontier #AI models, most decide to use #nukes:

"AIs can’t stop recommending nuclear strikes in war game simulations" https://www.newscientist.com/article/2516885-ais-cant-stop-recommending-nuclear-strikes-in-war-game-simulations/

Article: https://arxiv.org/abs/2602.14740v1

#ExistentialThreat #Misalignment #LLM