LLMs Corrupt Your Documents When You Delegate

This paper addresses the problem of document content being corrupted when LLMs are delegated document-editing work. Evaluating 19 LLMs on DELEGATE-52, a simulation of long delegated workflows spanning 52 professional domains, the authors find that even the latest models corrupt an average of 25% of document content. Agentic tool use does not help, and the damage worsens with document size, interaction length, and the presence of distractor files, suggesting that LLMs are unreliable delegates for long-running tasks. This calls for caution about reliability and error accumulation when building LLM-based document automation and agents.

https://arxiv.org/abs/2604.15597

#llm #documentcorruption #delegation #aiagents #workflow

LLMs Corrupt Your Documents When You Delegate

Large Language Models (LLMs) are poised to disrupt knowledge work, with the emergence of delegated work as a new interaction paradigm (e.g., vibe coding). Delegation requires trust: the expectation that the LLM will faithfully execute the task without introducing errors into documents. We introduce DELEGATE-52 to study the readiness of AI systems in delegated workflows. DELEGATE-52 simulates long delegated workflows that require in-depth document editing across 52 professional domains, such as coding, crystallography, and music notation. Our large-scale experiment with 19 LLMs reveals that current models degrade documents during delegation: even frontier models (Gemini 3.1 Pro, Claude 4.6 Opus, GPT 5.4) corrupt an average of 25% of document content by the end of long workflows, with other models failing more severely. Additional experiments reveal that agentic tool use does not improve performance on DELEGATE-52, and that degradation severity is exacerbated by document size, length of interaction, and presence of distractor files. Our analysis shows that current LLMs are unreliable delegates: they introduce sparse but severe errors that silently corrupt documents, compounding over long interactions.
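The headline metric, the fraction of document content corrupted by the end of a workflow, can be approximated with a simple line-level diff against the expected document. This is an illustrative proxy only (the paper's actual scoring method is not described here); `corruption_rate` is a hypothetical helper, not part of DELEGATE-52:

```python
import difflib

def corruption_rate(original: str, edited: str) -> float:
    """Fraction of the original document's lines that no longer
    survive intact after an editing session.

    Uses difflib's longest-matching-blocks alignment, so moved-but-
    preserved lines inside a matching block still count as intact.
    """
    orig_lines = original.splitlines()
    matcher = difflib.SequenceMatcher(a=orig_lines, b=edited.splitlines())
    preserved = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - preserved / max(len(orig_lines), 1)

doc = "line 1\nline 2\nline 3\nline 4\n"
damaged = "line 1\nline TWO\nline 3\nline 4\n"
print(corruption_rate(doc, damaged))  # 0.25: one of four lines altered
```

Under a metric like this, the reported 25% average means roughly one in four lines of the original content is silently lost or altered by the end of a long delegated session.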

arXiv.org