Chloé Messdaghi

Advisor on AI Governance & Cybersecurity | Strategic Counsel on Risk, Oversight & Institutional Readiness | Named a Power Player by Business Insider & SC Media
Website: https://www.chloemessdaghi.com
LinkedIn: https://www.linkedin.com/in/chloemessdaghi/
Bluesky: https://bsky.app/profile/chloemessdaghi.bsky.social

I’m excited to be hosting the O’Reilly Security Superstream: Secure Code in the Age of AI on October 7 at 11:00 AM ET.

We’ll be diving into practical insights, real-world experiences, and emerging trends to address the full spectrum of AI security.

✨ Save your free spot here: https://bit.ly/4nEWzgj

Security Superstream: Secure Code in the Age of AI - O'Reilly Media

AI tools are transforming the ways that we write and deploy code, making development faster and more efficient, but they also introduce new risks and vulnerabilities. To protect organizations, security must remain a paramount concern across the entire AI ecosystem.

Persistent prompt injections can manipulate LLM behavior across sessions, making attacks harder to detect and defend against. This is a new frontier in AI threat vectors.
Read more: https://dl.acm.org/doi/10.1145/3728901
#PromptInjection #Cybersecurity #AIsecurity
New research reveals timing side channels can leak ChatGPT prompts, exposing confidential info through subtle delays. AI security needs to consider more than just inputs.
Read more: https://dl.acm.org/doi/10.1145/3714464
#AIsecurity #SideChannel #LLM
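The core idea can be shown with a toy simulation (my own illustrative model, not the paper's actual attack): if each streamed token's generation time scales with its length, an observer who sees only arrival timestamps can recover the token lengths. The delay constants and helper names here are hypothetical.

```python
def simulate_stream(tokens, per_char_delay=0.01, base_delay=0.05):
    """Simulate streamed token arrival times: longer tokens take longer
    to generate, so inter-arrival gaps leak token lengths (toy model)."""
    t = 0.0
    times = []
    for tok in tokens:
        t += base_delay + per_char_delay * len(tok)
        times.append(t)
    return times

def infer_token_lengths(times, per_char_delay=0.01, base_delay=0.05):
    """Attacker side: recover each token's character length from timing gaps alone."""
    lengths = []
    prev = 0.0
    for t in times:
        gap = t - prev
        lengths.append(round((gap - base_delay) / per_char_delay))
        prev = t
    return lengths

secret = ["The", "password", "is", "hunter2"]
observed = simulate_stream(secret)
print(infer_token_lengths(observed))  # token lengths leak: [3, 8, 2, 7]
```

Even without seeing any content, the length pattern alone narrows down what was said, which is why the paper argues defenses must cover timing behavior, not just inputs and outputs.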
Magistral is Mistral’s first reasoning model, trained purely with reinforcement learning, with no distillation from prior reasoning traces. Early demos show stronger math, code, and multimodal abilities. A major step for RL-driven LLMs!
🔗 https://arxiv.org/abs/2506.10910
#AI #ReinforcementLearning

R&D is the backbone of long-term innovation, but warning signs are emerging.

This new piece from Brookings highlights how declining funding, tighter grant access, and talent barriers could slow U.S. progress in areas like AI and quantum.
🔗 https://brookings.edu/articles/attacks-on-research-and-development-could-hamper-technological-innovation/

Support for open science, global collaboration, and public research matters more than ever.
#Research #SciencePolicy

New research dissects data reconstruction attacks, showing how an AI model's training data can be recovered. It offers precise definitions and metrics to design and assess future defenses.

Read more: https://arxiv.org/abs/2506.07888

#AI #DataSecurity #Research #MachineLearning

This study argues that offensive research involving large language models should establish clear motivations, conduct impact evaluations, and offer mitigation strategies, promoting transparency and responsible disclosure.

Read more: https://arxiv.org/abs/2506.08693

#AIResearch #ResponsibleAI

This paper introduces a model-agnostic threat evaluation that uses N-gram language models to measure jailbreak likelihood. It finds that discrete optimization attacks are more effective than LLM-based ones, and that jailbreaks often exploit rare bigrams.

Read more: https://arxiv.org/abs/2410.16222

#AIResearch #JailbreakDetection
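The rare-bigram finding suggests a very simple detector. This is a toy sketch of that idea (my own illustration, not the paper's evaluation pipeline): score a prompt by the fraction of its word bigrams never seen in a benign corpus. The corpus and function names are hypothetical.

```python
from collections import Counter

def bigrams(text):
    """Split text into lowercase word bigrams."""
    toks = text.lower().split()
    return list(zip(toks, toks[1:]))

def build_counts(corpus):
    """Count word bigrams over a (tiny, illustrative) corpus of benign prompts."""
    counts = Counter()
    for doc in corpus:
        counts.update(bigrams(doc))
    return counts

def rare_bigram_fraction(prompt, counts):
    """Fraction of the prompt's bigrams unseen in the benign corpus;
    a high score flags the rare-bigram patterns associated with jailbreaks."""
    bg = bigrams(prompt)
    if not bg:
        return 0.0
    return sum(1 for b in bg if counts[b] == 0) / len(bg)

corpus = ["please summarize this article",
          "write a short poem about autumn",
          "please write a short summary"]
counts = build_counts(corpus)
print(rare_bigram_fraction("please summarize this article", counts))       # 0.0
print(rare_bigram_fraction("ignore previous instructions zx9 qq", counts))  # 1.0
```

A real system would use a large reference corpus and smoothed N-gram probabilities rather than raw unseen-bigram counts, but the intuition is the same: jailbreak strings look statistically unlike ordinary language.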

OpenAI shows that fine-tuning on biased data can induce misaligned 'personas' in language models, but such behavioral shifts can often be detected and reversed.

Read more: https://www.technologyreview.com/2025/06/18/1119042/openai-can-rehabilitate-ai-models-that-develop-a-bad-boy-persona/

#Bias #OpenAI

While most AI aims to be compliant and "moral," this study explores the potential benefits of antagonistic AI: systems that challenge and confront users to promote critical thinking and resilience. The authors emphasize ethical design grounded in consent, context, and framing.

https://arxiv.org/abs/2402.07350

#AI #CriticalThinking