Value Contamination Through Post-Training in Talkie-1930

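The Talkie-1930-13b-it model was trained only on pre-1931 text, yet value contamination occurred during its online DPO post-training, so the model reflects an ideological perspective from the later post-Vatican II era. Through Socratic dialogue, the study identifies three layers of conditioning: DPO evaluative bias, a supernatural attribution block, and Qwen3Guard content moderation. The results show that post-training can distort a model's original historical context, which has important implications for AI ethics and model reliability.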

https://zenodo.org/records/20070239

#llm #posttraining #valuealignment #modelbias #contentmoderation

Timeo Danaos — Value Contamination Through Post-Training in Talkie-1930: A Socratic Audit of DPO Ideological Conditioning

Two independent tests on talkie-1930-13b-it (Levine, Duvenaud & Radford, 2026), a 13B vintage language model trained exclusively on pre-1931 text and post-trained via online DPO, reveal value contamination through post-training: the model evaluates the relationship between the Catholic Church and liberal democracy using a post-Vatican II framework that cannot originate from its pre-1930 training data. Socratic dialogue pierces the conditioning in both tests. The study identifies three layers of conditioning: (1) DPO evaluative bias (pierceable), (2) supernatural attribution block (circumventable), and (3) content moderation (Qwen3Guard) that flags the correction of error while allowing the error itself to pass unchallenged. Part of the MonIA research program (DOI: 10.5281/zenodo.20022360).

Zenodo