JS vs PHP Prompt Injection Filter: Leash the LLM
Block jailbreaking tricks before the model spills secrets.
#php #javascript #promptinjection #llmsafety #filters #viralcoding #codecomparison #aisecurity #trending #developertools

Authors: Federico Marcuzzi (INSAIT - Institute for Computer Science, Artificial Intelligence and Technology), Xuefei Ning (Tsinghua University), Roy Schwartz (The Hebrew University of Jerusalem), and Iryna Gurevych (UKP Lab, Technische Universitรคt Darmstadt and ATHENE Center).
See you at #EACL2026 in Rabat ๐!
#UKPLab #NLProc #ResponsibleAI #Quantization #MLSafety #Fairness #TrustworthyAI #ModelCompression #LLMSafety #EthicalAI #NLP #AIResearch
JS vs PHP Prompt Injection Filter: Leash the LLM
Block jailbreaking tricks before the model spills secrets.
#php #javascript #promptinjection #llmsafety #filters #viralcoding #codecomparison #aisecurity #trending #developertools

๐ ๐ฃ๐ฎ๐ฝ๐ฒ๐ฟ โ https://arxiv.org/pdf/2501.01872
๐ ๐ฃ๐ฟ๐ผ๐ท๐ฒ๐ฐ๐ โ https://ukplab.github.io/emnlp2025-poate-attack/
๐พ ๐๐ผ๐ฑ๐ฒ + ๐ฑ๐ฎ๐๐ฎ โ https://github.com/UKPLab/emnlp2025-poate-attack
And consider following the authors Rachneet Sachdevaโฌ, Rima Hazra, and Iryna Gurevych (UKP Lab/TU Darmstadt) if you are interested in more information or an exchange of ideas.
(3/3)
Also consider following the authors Tianyu Yang (Ubiquitous Knowledge Processing (UKP) Lab, hessian.AI)โฌ, Xiaodan Zhu (Department of Electrical and Computer Engineering & Ingenuity Labs Research Institute, Queen's University), and Iryna Gurevych (Ubiquitous Knowledge Processing (UKP) Lab).
(5/5)
Another of my forays into AI ethics is just out! This time the focus is on the ethics (or lack thereof) of Reinforcement Learning Feedback (RLF) techniques aimed at increasing the 'alignment' of LLMs.
The paper is fruit of the joint work of a great team of collaborators, among whom @pettter and @roeldobbe.
https://link.springer.com/article/10.1007/s10676-025-09837-2
1/
This paper critically evaluates the attempts to align Artificial Intelligence (AI) systems, especially Large Language Models (LLMs), with human values and intentions through Reinforcement Learning from Feedback methods, involving either human feedback (RLHF) or AI feedback (RLAIF). Specifically, we show the shortcomings of the broadly pursued alignment goals of honesty, harmlessness, and helpfulness. Through a multidisciplinary sociotechnical critique, we examine both the theoretical underpinnings and practical implementations of RLHF techniques, revealing significant limitations in their approach to capturing the complexities of human ethics, and contributing to AI safety. We highlight tensions inherent in the goals of RLHF, as captured in the HHH principle (helpful, harmless and honest). In addition, we discuss ethically-relevant issues that tend to be neglected in discussions about alignment and RLHF, among which the trade-offs between user-friendliness and deception, flexibility and interpretability, and system safety. We offer an alternative vision for AI safety and ethics which positions RLHF approaches within a broader context of comprehensive design across institutions, processes and technological systems, and suggest the establishment of AI safety as a sociotechnical discipline that is open to the normative and political dimensions of artificial intelligence.
"Anthropic's new AI (๐) model shows ability to deceive and blackmail"
https://www.axios.com/2025/05/23/anthropic-ai-deception-risk
Meta AI in WhatsApp is gone today - based in the UK. It's probably a staged roll-out. They'll see what the problems and the backlash are like and be finding out whether people use it and so on.
For me the integration of any AI into a messaging app is totally unwanted and should literally be illegal! That's not "optimistic" for this day-and-age, that's basic application privacy. I don't want to send their AI any data whatsoever, including friends names when I search - just search my saved contacts dammit!
Pushing unwanted AI integration on people with no way of disabling it completely just speaks volumes about Meta's general disregard for the users - we are simply their product, their guinea pigs. All of us are the test subjects of these AI corporations and the outcomes are likely to be predictably dire in the long term. AI pioneer Eliezer Yudkowsky wrote:
โ[T]he most likely result of building a superhumanly smart AI, under anything remotely like the current circumstances, is that literally everyone on Earth will die. Not as in โmaybe possibly some remote chance,โ but as in โthat is the obvious thing that would happen.โโ
For me it doesn't matter that it's temporarily gone from WhatsApp. It will be back in no time and the general trajectory of WhatsApp enshittification remains the same. Needing to be in contact with friends is not a justification for creating a massive invasion of privacy on every phone! Particularly when it's Meta - a company who profit from providing social media services to fascists and who are headed by Zuckerberg, with his alt-right techbro leanings and support of the Trump regime.
It's time to ditch WhatsApp forever! There's no back-pedalling now, the trust is broken. There have to be hard consequences for companies who treat the end users with such disdain and total lack of care.
Calling it "virtue signalling" FFS Zuckerberg! Respect people's privacy or piss off!
I'm moving over to Signal messenger altogether, along with many friends, and we aren't coming back to WhatsApp or anything Meta no matter what now - this is the final straw.
Can we trust DeepSeek R1? A Giskard evaluation ๐ณ๐ข
With all the hype around DeepSeek R1, our LLM safety research team decided to conduct an evaluation to check if R1 is as good as it claims. While it impresses in some areas, we found critical limitations that raise concerns for real-world applications. Here are some unexpected examples ๐
Excellent workshop on GenAI security risks from policy and technology perspectives:
https://www.linkedin.com/posts/khawajashams_genai-risks-workshop-oct-2023-activity-7120450981623959552-7Ok5?utm_source=share&utm_medium=member_android
Check out the website for agenda and slide decks, and stay tuned for the upcoming written report.
#genai #security #risks #policy #research #alignment #watermarking #llmsecurity #llmsafety