📢 We’ll present our TACL paper “𝗔𝗹𝗶𝗴𝗻𝗲𝗱 𝗣𝗿𝗼𝗯𝗶𝗻𝗴: 𝗥𝗲𝗹𝗮𝘁𝗶𝗻𝗴 𝗧𝗼𝘅𝗶𝗰 𝗕𝗲𝗵𝗮𝘃𝗶𝗼𝗿 𝗮𝗻𝗱 𝗠𝗼𝗱𝗲𝗹 𝗜𝗻𝘁𝗲𝗿𝗻𝗮𝗹𝘀” at #EACL2026 🇲🇦
🔥 Key finding:
LMs generate less toxic output when they more strongly encode input toxicity internally
🧵 https://bsky.app/profile/tresiwald.bsky.social/post/3mdfswxr5jn2y
arXiv: https://arxiv.org/abs/2503.13390
Full paper & code: https://alignedprobing.github.io/
