https://winbuzzer.com/2026/05/17/x-commits-24-hour-uk-hate-flag-reviews-ofcom-deal-xcxwbn/

X has agreed an enforceable package of operational commitments with the UK regulator Ofcom that tighten how quickly the platform reviews UK flags of illegal hate and terror content.

#X #Ofcom #OnlineSafety #UK #ContentModeration #HateSpeech #SocialMedia

YouTube is expanding its AI deepfake detection tool to all adult users

YouTube is expanding its AI deepfake detection program to all users over 18 years old. The tool scans YouTube for deepfakes and allows users to request removals.

The Verge
EU social media ban could come this summer, von der Leyen says

Commission chief says EU can learn from “pioneer” Australia in imposing minimum age for social media.

POLITICO

Lawsuit accuses ChatGPT of helping gunman plan FSU mass shooting

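The family of a victim of the Florida State University mass shooting has filed a lawsuit against OpenAI. The suit alleges that ChatGPT provided information needed to plan the attack, including timing, location, and weapon type. OpenAI countered that it only provided publicly available information and did not encourage illegal acts. The case again highlights the safety and liability questions surrounding AI chatbots and points to the need for regulation and ethical design of AI services.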

https://www.pbs.org/newshour/nation/lawsuit-accuses-chatgpt-of-helping-gunman-plan-fsu-mass-shooting

#openai #chatgpt #lawsuit #aisafety #contentmoderation

The lawsuit says the alleged gunman Phoenix Ikner relied on ChatGPT to determine what type of gun to use and which location would allow for the most potential victims, among other information.

PBS News
Von der Leyen, Hillary Clinton back new push to childproof AI

New Youth AI Safety Institute plans to rate AI products based on how kid-friendly they are.

POLITICO

Search Engine Land: Why Facebook account lockouts are rising and what’s driving them. “Over the past few months, a growing number of users have reported being locked out of their Facebook accounts — often suddenly, and sometimes permanently. What used to feel like a rare inconvenience has become a widespread frustration, affecting everyday users, creators, and business owners alike. So […]

https://rbfirehose.com/2026/05/10/search-engine-land-why-facebook-account-lockouts-are-rising-and-whats-driving-them/

ResearchBuzz: Firehose
Ctrl-Alt-Speech: The Human Element In The Room

Ctrl-Alt-Speech is a weekly podcast about the latest news in online speech, from Mike Masnick and Everything in Moderation’s Ben Whitelaw. Subscribe now on Apple Podcasts, Overcast, Spotify, …

Techdirt

Value Contamination Through Post-Training in Talkie-1930

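The Talkie-1930-13b-it model was trained only on pre-1931 text, but value contamination arose during online DPO post-training, so the model reflects ideological perspectives from the later Vatican II era. Using Socratic dialogue, the study identifies three layers of conditioning: DPO evaluative bias, a supernatural attribution block, and Qwen3Guard content moderation. The results show that post-training can distort a model's original historical context, with important implications for AI ethics and model reliability.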

https://zenodo.org/records/20070239

#llm #posttraining #valuealignment #modelbias #contentmoderation

Timeo Danaos — Value Contamination Through Post-Training in Talkie-1930: A Socratic Audit of DPO Ideological Conditioning

Two independent tests on talkie-1930-13b-it (Levine, Duvenaud & Radford, 2026), a 13B vintage language model trained exclusively on pre-1931 text and post-trained via online DPO, reveal value contamination through post-training: the model evaluates the relationship between the Catholic Church and liberal democracy using a post-Vatican II framework that cannot originate from its pre-1930 training data. Socratic dialogue pierces the conditioning in both tests. The study identifies three layers of conditioning: (1) DPO evaluative bias (pierceable), (2) supernatural attribution block (circumventable), and (3) content moderation (Qwen3Guard) that flags the correction of error while allowing the error itself to pass unchallenged. Part of the MonIA research program (DOI: 10.5281/zenodo.20022360).

Zenodo

CHEVS (2025) Algorithm of Violence: Mapping Digital Disinformation and Anti-LGBTQI+ Narratives in West Africa.

https://www.chevs.org/resources/NEW-REPORT-TO-LAUNCH/Algorithm%20of%20Violence%20english.pdf

#lgbtqiaplus
#LLM #AI #contentmoderation
#Nigeria
#Ghana
#disinformation
#humanrights