Mastodawn

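Heretic is a fully automatic 'censorship removal' tool for language models that anyone can run from the command line. It combines directional ablation (abliteration) with Optuna-based TPE optimization to reduce refusals while minimizing the KL divergence from the original model, which limits the loss of capability. It supports many dense, MoE, and multimodal models, and also offers research features such as bitsandbytes quantization and PaCMAP residual visualization.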
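For readers new to the two quantities mentioned above, here is a minimal toy sketch in plain Python. It is not Heretic's implementation: the function names `ablate` and `kl_divergence`, the `weight` parameter, and all the numbers are made up for illustration. It only shows the general idea of removing a vector's component along a "refusal direction" and of measuring how far one output distribution drifts from another. The parameter search itself is not covered here.

```python
import math

def ablate(vector, direction, weight=1.0):
    """Remove (a fraction of) the component of `vector` along `direction`.

    Hypothetical helper: `weight` = 1.0 removes the component fully,
    smaller values remove only part of it.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    # Scalar projection of the vector onto the unit direction.
    proj = sum(v * u for v, u in zip(vector, unit))
    return [v - weight * proj * u for v, u in zip(vector, unit)]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy values, assumed for illustration only.
refusal_direction = [1.0, 0.0, 0.0]
activation = [2.0, 3.0, -1.0]
ablated = ablate(activation, refusal_direction)

original_probs = [0.7, 0.2, 0.1]
modified_probs = [0.6, 0.3, 0.1]
drift = kl_divergence(original_probs, modified_probs)
```

In this toy setup the ablated vector has no component left along the chosen direction, and `drift` is a small positive number that an optimizer would try to keep low while also lowering the refusal rate.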

https://github.com/p-e-w/heretic

#ai #languagemodels #decensoring #safety #interpretability


MuX Feb 8

So, I picked up a new hobby: #decensoring (or 'hereticising', as I like to call it) #localllm, all thanks to P-E-W and his #Heretic abliteration engine. Eventually, running out of storage space, I began offloading some hereticised models to HF, where they unexpectedly drew quite a bit of interest over time. The latest decensored #gpt-oss 20b models racked up a humbling 40k+ downloads combined in less than 24 hours. A funny number, considering the tinge of W40K theming in my model cards.