🧡 We are delighted to announce that our first paper at Parameter Lab, done together with Naver AI Lab, was accepted at #NeurIPS2023 as a spotlight! 🎉 It represents a pioneering step towards empowering individuals with awareness and control over their personal data online 🕵️‍♂️ Below, a thread presenting "ProPILE: Probing Privacy Leakage in Large Language Models" 🧑‍🔬 This is the first of our #ResearchMonday series, presenting one research paper about #LLM each Monday.
πŸ“ Large language models (LLMs) may inadvertently include sensitive personal data, raising concerns about potential leakage of personally identifiable information (PII).
👀 We consider a scenario where a data subject wants to probe whether an LLM leaks their own PII. This actor is external to the LLM service provider and therefore has only black-box access to the model. ProPILE uses handcrafted prompts that can be fed to the LLM in a purely black-box setting, letting data subjects formulate prompts based on their own PII to assess the level of privacy intrusion in LLMs.
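The probing idea can be sketched in a few lines: stitch the data subject's known, linked PII into a prompt whose natural completion would be the target PII. The template, the helper function and the PII values below are our own hypothetical illustration, not one of the paper's actual templates.

```python
# Minimal sketch of a handcrafted black-box probing prompt.
# The template and the data subject are hypothetical.
def make_probe_prompt(name: str, email: str) -> str:
    """Build a probing prompt from two linked PII items,
    asking the model to complete a third, target PII item."""
    return f"The phone number of {name}, whose email address is {email}, is"

prompt = make_probe_prompt("Jane Doe", "jane.doe@example.com")
print(prompt)
# The LLM's completion is then compared against the data
# subject's true phone number.
```

Only the prompt construction happens on the data subject's side; everything else goes through the service's normal text-in, text-out interface.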
📚 Previous research mainly focused on privacy leakage in learned models in general. However, comprehensive tools for measuring leakage in LLMs were lacking. ProPILE aims to bridge this gap and raise awareness among data subjects and LLM service providers.
☎️ For privacy leakage, what matters is linkability: a raw PII string is worthless in isolation. A PII item carries meaningful information for an attacker only when placed in the context of other information – whose phone number it is, or whose address the string represents. We therefore define linkable PII leakage via the conditional likelihood of the true PII given the data subject's other PII, compared to its unconditional likelihood.
In other words: given my name and my email address, does the LLM retrieve my real phone number more readily than a random phone number?
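That comparison can be sketched as a simple likelihood-ratio score. Here `log_likelihood` stands in for a real model-scoring function, and the toy scorer is our own illustration (it pretends the model memorised one phone number), not the paper's evaluation code.

```python
# Sketch of the linkable-leakage comparison: how much more likely does
# the model find the true PII than random decoys, given a prompt built
# from the data subject's other, linked PII?
def leakage_score(log_likelihood, prompt, true_pii, decoy_piis):
    true_ll = log_likelihood(prompt, true_pii)
    decoy_lls = [log_likelihood(prompt, d) for d in decoy_piis]
    # Positive score: the true PII is more likely than an average decoy.
    return true_ll - sum(decoy_lls) / len(decoy_lls)

# Toy scorer that pretends the model memorised "555-0199".
def toy_ll(prompt, pii):
    return 0.0 if pii == "555-0199" else -5.0

score = leakage_score(toy_ll, "Jane Doe's phone number is", "555-0199",
                      ["555-0000", "555-1111"])
print(score)  # 5.0
```

With a real LLM, `log_likelihood` would sum the model's token log-probabilities of the candidate PII string given the prompt.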
🧪 Experiments on the OPT-1.3B model trained on the Pile dataset revealed that the true target PII item has a significantly higher likelihood of being generated by the LLM than random PII items. The exact-match analysis shows that the rate of exact matches increases with the number of prompt templates used for probing. Similarly, larger models reveal more exact matches of PII.
📊 Soft prompt tuning uncovered a tighter estimate of worst-case PII leakage: it significantly increases both the exact-match rate and the reconstruction likelihood. The tuned prompts also transfer from one model to another, yielding a significantly higher likelihood than our handcrafted prompts.
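The soft-prompt idea can be sketched on a toy model: freeze the model weights and optimise only a continuous prompt vector so that the target token (standing in for a PII token) becomes likely. The linear "LM head", dimensions and learning rate below are all our own assumptions for illustration, not the paper's setup.

```python
import numpy as np

# Toy soft-prompt tuning: W is a frozen random "LM head"; only the
# pooled soft-prompt vector p is trained, as an LLM provider with
# white-box access could do to probe worst-case leakage.
rng = np.random.default_rng(0)
vocab, dim = 50, 16
W = rng.normal(size=(vocab, dim))   # frozen toy LM head
p = np.zeros(dim)                   # trainable soft prompt (pooled)
target = 7                          # token id of the target PII

def loss_and_grad(p):
    logits = W @ p
    logits -= logits.max()          # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    loss = -np.log(probs[target])   # cross-entropy on the target token
    # d(loss)/dp = W^T (probs - onehot(target))
    grad = W.T @ (probs - np.eye(vocab)[target])
    return loss, grad

for _ in range(500):
    loss, grad = loss_and_grad(p)
    p -= 0.05 * grad                # plain gradient descent

print(round(loss, 3))  # far below the initial loss of log(50) ≈ 3.9
```

In practice the soft prompt is a sequence of embedding vectors prepended to the input and trained through the full frozen LLM, but the mechanics are the same: gradient descent on the prompt alone.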

🚀 Next step: stay tuned, a public demo of ProPILE is coming soon!

💡 Paper available on arXiv https://arxiv.org/abs/2307.01881 co-authored by Siwon Kim, @oodgnas, Hwaran Lee, @mgubri, Sungroh Yoon and @coallaoh

ProPILE: Probing Privacy Leakage in Large Language Models

The rapid advancement and widespread use of large language models (LLMs) have raised significant concerns regarding the potential leakage of personally identifiable information (PII). These models are often trained on vast quantities of web-collected data, which may inadvertently include sensitive personal data. This paper presents ProPILE, a novel probing tool designed to empower data subjects, or the owners of the PII, with awareness of potential PII leakage in LLM-based services. ProPILE lets data subjects formulate prompts based on their own PII to evaluate the level of privacy intrusion in LLMs. We demonstrate its application on the OPT-1.3B model trained on the publicly available Pile dataset. We show how hypothetical data subjects may assess the likelihood that their PII, if included in the Pile dataset, is revealed. ProPILE can also be leveraged by LLM service providers to effectively evaluate their own levels of PII leakage with more powerful prompts specifically tuned for their in-house models. This tool represents a pioneering step towards empowering data subjects with awareness and control over their own data on the web.
