Chloé Messdaghi

Advisor on AI Governance & Cybersecurity | Strategic Counsel on Risk, Oversight & Institutional Readiness | Named a Power Player by Business Insider & SC Media
Website: https://www.chloemessdaghi.com
LinkedIn: https://www.linkedin.com/in/chloemessdaghi/
Bluesky: https://bsky.app/profile/chloemessdaghi.bsky.social

I’m excited to be hosting the O’Reilly Security Superstream: Secure Code in the Age of AI on October 7 at 11:00 AM ET.

We’ll be diving into practical insights, real-world experiences, and emerging trends to address the full spectrum of AI security.

✨ Save your free spot here: https://bit.ly/4nEWzgj

Security Superstream: Secure Code in the Age of AI - O'Reilly Media

AI tools are transforming the ways that we write and deploy code, making development faster and more efficient, but they also introduce new risks and vulnerabilities. To protect organizations, security must remain a paramount concern across the entire AI ecosystem.

Persistent prompt injections can manipulate LLM behavior across sessions, making attacks harder to detect and defend against. This is a new frontier in AI threat vectors.
Read more: https://dl.acm.org/doi/10.1145/3728901
#PromptInjection #Cybersecurity #AIsecurity
New research reveals timing side channels can leak ChatGPT prompts, exposing confidential info through subtle delays. AI security needs to consider more than just inputs.
Read more: https://dl.acm.org/doi/10.1145/3714464
#AIsecurity #SideChannel #LLM
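The core idea can be shown with a toy simulation (my own illustrative model, not the paper's actual attack): if each streamed token's generation time scales with its length, an observer who sees only arrival timestamps can recover the token lengths. The delay constants and helper names here are hypothetical.

```python
def simulate_stream(tokens, per_char_delay=0.01, base_delay=0.05):
    """Simulate streamed token arrival times: longer tokens take longer
    to generate, so inter-arrival gaps leak token lengths (toy model)."""
    t = 0.0
    times = []
    for tok in tokens:
        t += base_delay + per_char_delay * len(tok)
        times.append(t)
    return times

def infer_token_lengths(times, per_char_delay=0.01, base_delay=0.05):
    """Attacker side: recover each token's character length from timing gaps alone."""
    lengths = []
    prev = 0.0
    for t in times:
        gap = t - prev
        lengths.append(round((gap - base_delay) / per_char_delay))
        prev = t
    return lengths

secret = ["The", "password", "is", "hunter2"]
observed = simulate_stream(secret)
print(infer_token_lengths(observed))  # token lengths leak: [3, 8, 2, 7]
```

Even without seeing any content, the length pattern alone narrows down what was said, which is why the paper argues defenses must cover timing behavior, not just inputs and outputs.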
Magistral is Mistral’s first reasoning model, trained purely with reinforcement learning, with no distillation from prior reasoning traces. Early demos show stronger math, code, and multimodal abilities. A major step for RL-driven LLMs!
🔗 https://arxiv.org/abs/2506.10910
#AI #ReinforcementLearning

R&D is the backbone of long-term innovation, but warning signs are emerging.

This new piece from Brookings highlights how declining funding, tighter grant access, and talent barriers could slow U.S. progress in areas like AI and quantum.
🔗 https://brookings.edu/articles/attacks-on-research-and-development-could-hamper-technological-innovation/

Support for open science, global collaboration, and public research matters more than ever.
#Research #SciencePolicy

New research dissects data reconstruction attacks, showing how an AI model's training data can be recovered. It offers precise definitions and metrics to design and assess future defenses.

Read more: https://arxiv.org/abs/2506.07888

#AI #DataSecurity #Research #MachineLearning

This study argues that offensive research involving large language models should establish clear motivations, conduct impact evaluations, and offer mitigation strategies, promoting transparency and responsible disclosure.

Read more: https://arxiv.org/abs/2506.08693

#AIResearch #ResponsibleAI

This paper introduces a model-agnostic threat evaluation that uses N-gram language models to measure jailbreak likelihood. It finds that discrete optimization attacks are more effective than LLM-based ones, and that jailbreaks often exploit rare bigrams.

Read more: https://arxiv.org/abs/2410.16222

#AIResearch #JailbreakDetection
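The rare-bigram finding suggests a very simple detector. This is a toy sketch of that idea (my own illustration, not the paper's evaluation pipeline): score a prompt by the fraction of its word bigrams never seen in a benign corpus. The corpus and function names are hypothetical.

```python
from collections import Counter

def bigrams(text):
    """Split text into lowercase word bigrams."""
    toks = text.lower().split()
    return list(zip(toks, toks[1:]))

def build_counts(corpus):
    """Count word bigrams over a (tiny, illustrative) corpus of benign prompts."""
    counts = Counter()
    for doc in corpus:
        counts.update(bigrams(doc))
    return counts

def rare_bigram_fraction(prompt, counts):
    """Fraction of the prompt's bigrams unseen in the benign corpus;
    a high score flags the rare-bigram patterns associated with jailbreaks."""
    bg = bigrams(prompt)
    if not bg:
        return 0.0
    return sum(1 for b in bg if counts[b] == 0) / len(bg)

corpus = ["please summarize this article",
          "write a short poem about autumn",
          "please write a short summary"]
counts = build_counts(corpus)
print(rare_bigram_fraction("please summarize this article", counts))       # 0.0
print(rare_bigram_fraction("ignore previous instructions zx9 qq", counts))  # 1.0
```

A real system would use a large reference corpus and smoothed N-gram probabilities rather than raw unseen-bigram counts, but the intuition is the same: jailbreak strings look statistically unlike ordinary language.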

OpenAI shows that fine-tuning on biased data can induce misaligned 'personas' in language models, but such behavioral shifts can often be detected and reversed.

Read more: https://www.technologyreview.com/2025/06/18/1119042/openai-can-rehabilitate-ai-models-that-develop-a-bad-boy-persona/

#Bias #OpenAI

While most AI aims to be compliant and "moral," this study explores the potential benefits of antagonistic AI: systems that challenge and confront users to promote critical thinking and resilience. The authors emphasize ethical design grounded in consent, context, and framing.

https://arxiv.org/abs/2402.07350

#AI #CriticalThinking