๐Ÿ” ๐—›๐—ผ๐˜„ ๐—ฐ๐—ฎ๐—ป ๐˜„๐—ฒ ๐—ฎ๐—ป๐—ผ๐—ป๐˜†๐—บ๐—ถ๐˜‡๐—ฒ ๐˜๐—ฒ๐˜…๐˜ ๐˜€๐—ผ ๐—Ÿ๐—Ÿ๐— ๐˜€ ๐—ฐ๐—ฎ๐—ปโ€™๐˜ ๐—ฟ๐—ฒ-๐—ถ๐—ฑ๐—ฒ๐—ป๐˜๐—ถ๐—ณ๐˜† ๐˜€๐—ฒ๐—ป๐˜€๐—ถ๐˜๐—ถ๐˜ƒ๐—ฒ ๐—ถ๐—ป๐—ณ๐—ผ๐—ฟ๐—บ๐—ฎ๐˜๐—ถ๐—ผ๐—ป โ€” ๐˜„๐—ต๐—ถ๐—น๐—ฒ ๐—ฝ๐—ฟ๐—ฒ๐˜€๐—ฒ๐—ฟ๐˜ƒ๐—ถ๐—ป๐—ด ๐˜‚๐˜๐—ถ๐—น๐—ถ๐˜๐˜† ๐—ณ๐—ผ๐—ฟ ๐—ฑ๐—ผ๐˜„๐—ป๐˜€๐˜๐—ฟ๐—ฒ๐—ฎ๐—บ ๐˜๐—ฎ๐˜€๐—ธ๐˜€?

๐Ÿš€ ๐—ช๐—ฒ ๐—ถ๐—ป๐˜๐—ฟ๐—ผ๐—ฑ๐˜‚๐—ฐ๐—ฒ ๐—ฅ๐—จ๐—ฃ๐—ง๐—”: ๐—ฅ๐—ผ๐—ฏ๐˜‚๐˜€๐˜ ๐—จ๐˜๐—ถ๐—น๐—ถ๐˜๐˜†-๐—ฃ๐—ฟ๐—ฒ๐˜€๐—ฒ๐—ฟ๐˜ƒ๐—ถ๐—ป๐—ด ๐—ง๐—ฒ๐˜…๐˜ ๐—”๐—ป๐—ผ๐—ป๐˜†๐—บ๐—ถ๐˜‡๐—ฎ๐˜๐—ถ๐—ผ๐—ป.

(1/5)

✅ 𝗥𝗨𝗣𝗧𝗔 uses LLMs to:
→ 𝗘𝘃𝗮𝗹𝘂𝗮𝘁𝗲 𝗽𝗿𝗶𝘃𝗮𝗰𝘆 𝗿𝗶𝘀𝗸 via simulated re-identification attacks (privacy evaluator).
→ 𝗠𝗲𝗮𝘀𝘂𝗿𝗲 𝘂𝘁𝗶𝗹𝗶𝘁𝘆 𝗿𝗲𝘁𝗲𝗻𝘁𝗶𝗼𝗻 for downstream tasks like classification (utility evaluator).
→ 𝗜𝘁𝗲𝗿𝗮𝘁𝗶𝘃𝗲𝗹𝘆 𝗼𝗽𝘁𝗶𝗺𝗶𝘇𝗲 𝘁𝗲𝘅𝘁 via lexicographic optimization: prioritize privacy, then maximize utility.
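To make the lexicographic idea concrete, here is a minimal runnable sketch of the loop described above. The three components are toy stand-ins, not the paper's actual LLM-based modules: "privacy risk" is just a count of known identifiers left in the text, "utility" is the number of words retained, and the rewrite step masks one identifier at a time. All names (`privacy_risk`, `utility_score`, `rewrite`, `anonymize`, `IDENTIFIERS`) are my own illustration.

```python
# Toy lexicographic anonymization loop in the spirit of RUPTA.
# These are illustrative stand-ins, NOT the paper's LLM-based evaluators.

IDENTIFIERS = {"alice", "berlin", "acme"}

def privacy_risk(text):
    # Stand-in privacy evaluator: how many known identifiers survive?
    return sum(1 for w in text.lower().split() if w.strip(".,") in IDENTIFIERS)

def utility_score(text):
    # Stand-in utility evaluator: more words kept = more signal retained.
    return len(text.split())

def rewrite(text):
    # Stand-in optimizer step: mask the first remaining identifier.
    words = text.split()
    for i, w in enumerate(words):
        if w.lower().strip(".,") in IDENTIFIERS:
            words[i] = "[MASKED]"
            break
    return " ".join(words)

def anonymize(text, max_iters=10):
    """Lexicographic loop: satisfy the privacy constraint first, then
    prefer the highest-utility candidate among the safe ones."""
    best = None
    for _ in range(max_iters):
        if privacy_risk(text) == 0:
            if best is None or utility_score(text) > utility_score(best):
                best = text
            break  # this toy rewrite only removes content, so stop once safe
        text = rewrite(text)
    return best

print(anonymize("Alice works at Acme in Berlin as a nurse."))
# → [MASKED] works at [MASKED] in [MASKED] as a nurse.
```

In the paper's setting, each stub would be an LLM call: the privacy evaluator mounts a simulated re-identification attack, the utility evaluator scores the downstream task, and the optimizer proposes revisions guided by both.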

(2/5)

⚙️ Supports 𝗰𝘂𝘀𝘁𝗼𝗺𝗶𝘇𝗮𝗯𝗹𝗲 𝗽𝗿𝗶𝘃𝗮𝗰𝘆-𝘂𝘁𝗶𝗹𝗶𝘁𝘆 𝘁𝗿𝗮𝗱𝗲-𝗼𝗳𝗳𝘀 and distillation into lightweight models for real-time use.
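Distillation here means supervised fine-tuning of a small student model on input/output pairs produced by the full LLM pipeline. A sketch of preparing such training data follows; the field names and prompt wording are illustrative assumptions, not the paper's actual format:

```python
import json

# Sketch: turn (original, anonymized) pairs produced by the full LLM
# pipeline into instruction-tuning records for a lightweight student model.
# Field names and the instruction text are my own, not from the paper.

pairs = [
    ("Alice, 34, nurse in Berlin.", "[MASKED], 34, nurse in [MASKED]."),
    ("Bob runs Acme Corp.", "[MASKED] runs [MASKED]."),
]

records = [
    {
        "instruction": "Anonymize the text while keeping it useful for "
                       "downstream classification.",
        "input": original,
        "output": anonymized,
    }
    for original, anonymized in pairs
]

# One JSON object per line: the usual JSONL format fine-tuning toolkits expect.
jsonl = "\n".join(json.dumps(r) for r in records)
print(jsonl.splitlines()[0])
```

The student model trained on such pairs then performs anonymization in a single forward pass, avoiding the iterative evaluator loop at inference time.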

📊 𝗢𝘂𝘁𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝘀 𝗽𝗿𝗶𝗼𝗿 𝗺𝗲𝘁𝗵𝗼𝗱𝘀, achieving lower re-identification success rates and higher downstream accuracy on the DB-bio and PersonalReddit datasets.

(3/5)

Robust Utility-Preserving Text Anonymization Based on Large Language Models

Anonymizing text that contains sensitive information is crucial for a wide range of applications. Existing techniques face the emerging challenges of the re-identification ability of large language models (LLMs), which have shown advanced capability in memorizing detailed information and reasoning over dispersed pieces of patterns to draw conclusions. When defending against LLM-based re-identification, anonymization could jeopardize the utility of the resulting anonymized data in downstream tasks. In general, the interaction between anonymization and data utility requires a deeper understanding within the context of LLMs. In this paper, we propose a framework composed of three key LLM-based components: a privacy evaluator, a utility evaluator, and an optimization component, which work collaboratively to perform anonymization. Extensive experiments demonstrate that the proposed model outperforms existing baselines, showing robustness in reducing the risk of re-identification while preserving greater data utility in downstream tasks. We provide detailed studies on these core modules. To consider large-scale and real-time applications, we investigate the distillation of the anonymization capabilities into lightweight models. All of our code and datasets will be made publicly available at https://github.com/UKPLab/acl2025-rupta.

arXiv.org

(4/5)

Also consider following the authors Tianyu Yang (Ubiquitous Knowledge Processing (UKP) Lab, hessian.AI), Xiaodan Zhu (Department of Electrical and Computer Engineering & Ingenuity Labs Research Institute, Queen's University), and Iryna Gurevych (Ubiquitous Knowledge Processing (UKP) Lab).

(5/5)

#NLProc #ACL2025 #TextAnonymization #LLMSafety #AIPrivacy