“Gandhi was born in __.” Should this #prompt return 'India' or '1869'?
Our #EMNLP2022 #paper (w. M. Saeed) shows how #embeddings can enforce the desired typed output, such as Country or Year, in factual #probing.
https://www.eurecom.fr/publication/7095/download/data-publi-7095.pdf
🧵👇[1/6]
Our solution extends prompting in pre-trained #languageModels (#PLMs) to obtain a "typed" output. First, we propose to define types by example. Given "Rome, Paris, New York", we learn the #embeddings for their latent, shared concept in the PLM. In this case, the City type. 2/6
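Not the paper's code — a toy numpy sketch of "define types by example". The paper learns the type embedding in the PLM; here a simple average of made-up 8-dim example embeddings stands in for that learned vector:

```python
import numpy as np

# Toy stand-in for a PLM's input-embedding table (the paper uses the
# pretrained model's real embeddings; dimensions here are made up).
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=8) for w in ["Rome", "Paris", "New_York"]}

def type_embedding(examples, emb):
    """Collapse a few example entities into a single vector for their
    shared latent type (e.g. City). Averaging is a simplification of
    the learning step described in the paper."""
    return np.mean([emb[w] for w in examples], axis=0)

city_te = type_embedding(["Rome", "Paris", "New_York"], emb)
```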
Given the type embedding (TE), we simply add it to the mask token's embedding in the prompt for #factretrieval. The TE steers the inference of the target #token toward the desired type. No training, no #finetuning: just add the new embedding to the input. 3/6
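The mechanism is just vector addition at one position of the input. A minimal numpy sketch (in a real PLM this would be done on the model's input embeddings, e.g. via something like `inputs_embeds`; the prompt layout below is illustrative):

```python
import numpy as np

def add_type_embedding(input_embs, mask_pos, te):
    """Add a type embedding to the [MASK] position's input embedding,
    leaving every other position untouched (no training, no finetuning)."""
    steered = input_embs.copy()
    steered[mask_pos] = steered[mask_pos] + te
    return steered

rng = np.random.default_rng(1)
prompt_embs = rng.normal(size=(6, 8))  # toy "Gandhi was born in [MASK] ."
year_te = rng.normal(size=8)           # type embedding for Year
steered = add_type_embedding(prompt_embs, mask_pos=4, te=year_te)
```

With `year_te` added, the model's prediction at the mask is pulled toward Year-typed tokens ("1869"); with a City or Country TE it would be pulled toward that type instead.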
We also report promising results on text generation. Adding the Year or Country vector to a #GPT2 prompt leads to text containing entities of those types. 4/6
Finally, we steer text generation with general concepts, e.g., Affection. We generate a vector from words such as love and cheerful. Adding such a vector to prompts that generate toxic text leads to non-toxic output. 5/6
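The same trick, sketched for generation: build a concept vector from a few related words and add it to the prompt embeddings before decoding. This is a toy illustration, not the paper's implementation — adding the vector at every prompt position is an assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
emb = {w: rng.normal(size=8) for w in ["love", "cheerful", "you", "are", "so"]}

# Concept vector for Affection, built from a few related words.
affection = np.mean([emb["love"], emb["cheerful"]], axis=0)

# Add the concept vector to the prompt-token embeddings before decoding
# (toy prompt "you are so"; position-wise addition is an assumption).
prompt = np.stack([emb[w] for w in ["you", "are", "so"]])
steered_prompt = prompt + affection
```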
(original group name in the fig. has been replaced with RelG)
Work led by M. Saeed during his PhD at #EURECOM on “Employing #Transformers and Humans for Textual-Claim Verification”. He is defending today (2pm CET)! Ping me if you’d like to attend remotely. #nlproc #nlp #ML #factchecking #crowdsourcing 6/6