Carsten Eickhoff

103 Followers
121 Following
54 Posts
Professor of NLP/IR @ University of Tübingen
Web: https://health-nlp.com
Hyperpolyglot LLMs: Cross-Lingual Interpretability in Token Embeddings
https://arxiv.org/abs/2311.18034

Cross-lingual transfer learning is an important property of multilingual large language models (LLMs). But how do LLMs represent relationships between languages? Every language model has an input layer that maps tokens to vectors, yet this ubiquitous layer is often overlooked. We find that similarities between these input embeddings are highly interpretable and that the geometry of the embeddings differs between model families. In one family (XLM-RoBERTa), embeddings encode language identity: tokens in different writing systems can be linearly separated with an average accuracy of 99.2%. Another family (mT5) represents cross-lingual semantic similarity: the 50 nearest neighbors of any given token span an average of 7.61 writing systems and are frequently translations. This result is surprising given that there are no explicit parallel cross-lingual training corpora and no explicit incentive for translations in the pre-training objectives. Our research opens the door to investigations of (1) the effect of pre-training and model architectures on representations of languages and (2) applications of the cross-lingual representations embedded in language models.

arXiv.org
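A minimal sketch of the nearest-neighbor probe described above, using a toy embedding matrix in place of a real model's input layer (all tokens and vectors here are made up for illustration):

```python
import numpy as np

def nearest_neighbors(emb, token_id, k=3):
    # Rank all other tokens by cosine similarity of their input embeddings.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = unit @ unit[token_id]
    sims[token_id] = -np.inf  # exclude the query token itself
    return np.argsort(-sims)[:k]

# Toy vocabulary of four "tokens": cat (en), Katze (de), gato (es), tree (en)
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
emb[1] = emb[0] + 0.05 * rng.normal(size=8)  # place "Katze" near "cat"
emb[2] = emb[0] + 0.05 * rng.normal(size=8)  # place "gato" near "cat"

# In an mT5-like geometry, the nearest neighbors of "cat" would be its
# translations rather than other same-script tokens.
print(nearest_neighbors(emb, 0, k=2))
```

With a real model, `emb` would be the tied input-embedding matrix and the interesting question is which of the two geometries (language-separating vs. translation-clustering) emerges.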
Look-back Decoding for Open-Ended Text Generation
https://arxiv.org/abs/2305.13477

Given a prefix (context), open-ended generation aims to decode text that is coherent (not drifting abruptly from previous topics) and informative (free of undesired repetition). In this paper, we propose Look-back, an improved decoding algorithm that leverages the Kullback-Leibler divergence to track the distance between the current and historical decoding distributions. Look-back can thus automatically detect potential repetitive phrases and topic drift, and remove tokens likely to cause these failure modes, restricting the next-token probability distribution to a plausible distance from the history. In decoding experiments on document continuation and story generation, Look-back generates more fluent and coherent text, significantly outperforming other strong decoding methods in both automatic and human evaluations.

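The KL-based tracking signal at the heart of Look-back can be sketched as follows (a toy illustration only; the actual algorithm also constrains the candidate token set during decoding):

```python
import numpy as np

def kl(p, q, eps=1e-12):
    # KL divergence D(p || q) between two next-token distributions.
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def min_kl_to_history(current, history):
    # Smallest divergence from the current step to any past step:
    # a near-zero value signals the model is about to repeat itself.
    return min(kl(current, h) for h in history)

rng = np.random.default_rng(0)
history = [rng.dirichlet(np.ones(50)) for _ in range(5)]
repeat = history[2].copy()            # nearly identical to a past step
novel = rng.dirichlet(np.ones(50))    # genuinely new distribution

print(min_kl_to_history(repeat, history))  # ~0: likely repetition
print(min_kl_to_history(novel, history))   # larger: new content
```

When the minimum divergence falls below a threshold, Look-back-style decoding would intervene by pruning the tokens responsible.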
Cross-Lingual Consistency of Factual Knowledge in Multilingual Language Models
https://arxiv.org/abs/2310.10378

Multilingual large-scale Pretrained Language Models (PLMs) have been shown to store considerable amounts of factual knowledge, but large variations are observed across languages. With the ultimate goal of ensuring that users with different language backgrounds obtain consistent feedback from the same model, we study the cross-lingual consistency (CLC) of factual knowledge in various multilingual PLMs. To this end, we propose a Ranking-based Consistency (RankC) metric to evaluate knowledge consistency across languages independently of accuracy. Using this metric, we conduct an in-depth analysis of the factors that determine CLC, both at the model level and at the language-pair level. Among other results, we find that increasing model size leads to higher factual probing accuracy in most languages but does not improve cross-lingual consistency. Finally, we conduct a case study on CLC when new factual associations are inserted into the PLMs via model editing. Results on a small sample of facts inserted in English reveal a clear pattern whereby the new piece of knowledge transfers only to languages with which English has a high RankC score.

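A simple top-k overlap score in the spirit of ranking-based consistency (an illustrative proxy, not the paper's exact RankC definition; the example fact and candidate rankings are made up):

```python
def rank_agreement(ranking_a, ranking_b, k=3):
    # Fraction of overlap between the top-k candidate answers the model
    # produces for the same fact queried in two different languages.
    top_a, top_b = set(ranking_a[:k]), set(ranking_b[:k])
    return len(top_a & top_b) / k

# Candidates for "capital of France", ranked by model probability
en = ["Paris", "Lyon", "Marseille", "Nice"]
fr = ["Paris", "Marseille", "Lyon", "Toulouse"]
print(rank_agreement(en, fr))  # 1.0: same top-3 set, consistent
```

The point of a ranking-based score is that two languages can agree on the relative ordering of candidates even when neither ranks the correct answer first, which decouples consistency from accuracy.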
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
https://arxiv.org/abs/2305.14160

In-context learning (ICL) emerges as a promising capability of large language models (LLMs): provided with demonstration examples, they can perform diverse tasks. However, the underlying mechanism by which LLMs learn from the provided context remains under-explored. In this paper, we investigate the working mechanism of ICL through an information-flow lens. Our findings reveal that label words in the demonstration examples function as anchors: (1) semantic information aggregates into label-word representations during processing in the shallow layers; (2) the consolidated information in the label words serves as a reference for the LLM's final predictions. Based on these insights, we introduce an anchor re-weighting method that improves ICL performance, a demonstration compression technique that speeds up inference, and an analysis framework for diagnosing ICL errors in GPT2-XL. These promising applications further validate the uncovered working mechanism of ICL and pave the way for future studies.

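A toy version of the anchor idea: given an attention matrix, measure how much of the prediction position's attention mass lands on the label-word positions (the paper uses saliency-based information-flow scores over real model internals; this simplified stand-in only illustrates the quantity being measured):

```python
import numpy as np

def label_attention_share(attn, label_positions):
    # Share of the final (prediction) position's attention that flows
    # to the label words of the demonstrations.
    last_row = attn[-1]
    return float(last_row[label_positions].sum() / last_row.sum())

# Toy 8-token prompt: "... positive ... negative ... <predict>"
# with label words at positions 2 and 5.
attn = np.full((8, 8), 0.05)
attn[-1, [2, 5]] = 0.4  # prediction attends heavily to the label words
print(label_attention_share(attn, [2, 5]))
```

A high share at deep layers is the kind of evidence behind the anchor hypothesis; re-weighting those anchor positions is then a natural intervention.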
Bridging the Digital Divide: Performance Variation across Socio-Economic Factors in Vision-Language Models
https://arxiv.org/abs/2311.05746

Despite the impressive performance reported for current AI models across various tasks, these reports often do not evaluate how the models perform on the specific groups that will be impacted by the technology. Among the minority groups under-represented in AI, data from low-income households are often overlooked in data collection and model evaluation. We evaluate a state-of-the-art vision-language model (CLIP) on a geo-diverse dataset of household images associated with different income values (Dollar Street) and show that performance inequality exists among households of different income levels. Our results indicate that performance for poorer groups is consistently lower than for wealthier groups across various topics and countries. We highlight insights that can help mitigate these issues and propose actionable steps toward economically inclusive AI development. Code is available at https://github.com/MichiganNLP/Bridging_the_Digital_Divide.

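The kind of disaggregated, per-group accuracy report the paper argues for can be sketched as follows (field names and records are illustrative, not the Dollar Street schema):

```python
from collections import defaultdict

def accuracy_by_group(records):
    # Aggregate top-1 accuracy separately per income bucket, instead of
    # reporting a single headline number that hides group disparities.
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["income_group"]] += 1
        hits[r["income_group"]] += int(r["prediction"] == r["label"])
    return {g: hits[g] / totals[g] for g in totals}

records = [
    {"income_group": "low", "prediction": "stove", "label": "stove"},
    {"income_group": "low", "prediction": "bed", "label": "stove"},
    {"income_group": "high", "prediction": "stove", "label": "stove"},
    {"income_group": "high", "prediction": "sofa", "label": "sofa"},
]
print(accuracy_by_group(records))  # {'low': 0.5, 'high': 1.0}
```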
EMNLP 2023, Singapore

By popular request: We analyzed over 40,000 papers and computed 2022 Impact Factors for all 88 Computer Science Conferences listed on csrankings.org and CORE A*. See them here: https://github.com/eickhoff/csimpact

#computerscience #academia #conferences
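For reference, the standard two-year impact factor behind such rankings can be sketched as follows (a minimal illustration with made-up numbers; the repository's exact methodology may differ in detail):

```python
def impact_factor(citations_in_year, papers, year=2022):
    # Two-year impact factor: citations received in `year` to items
    # published in the two preceding years, divided by the number of
    # items published in those two years.
    window = (year - 1, year - 2)
    cites = sum(citations_in_year[y] for y in window)
    n = sum(papers[y] for y in window)
    return cites / n

# Hypothetical venue: counts keyed by publication year of the cited papers
citations_2022 = {2021: 900, 2020: 600}
paper_counts = {2021: 150, 2020: 100}
print(impact_factor(citations_2022, paper_counts))  # 1500 / 250 = 6.0
```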


Join our team! Tübingen is hiring an Assistant Professor of Human Language Technologies.

https://www.nature.com/naturecareers/job/12807722

#nlproc #ai #ml #jobs #academia

W1 Assistant Professor (f/m/d) of Human Language Technologies - Tübingen, Baden-Württemberg (DE) job with University of Tübingen – Faculty of Medicine | 12807722


Our lab (and the wider IMPRS community in Tübingen and Stuttgart) is admitting doctoral researchers for the 2024 cycle.

Apply by Nov 15th!

http://imprs.is.mpg.de/application

#jobs #phd #gradschool


SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
https://arxiv.org/abs/2301.00774

We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.

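For intuition, here is the simple magnitude-pruning baseline that SparseGPT improves upon; SparseGPT itself uses a layer-wise, Hessian-informed weight update rather than plain thresholding, and also supports semi-structured (2:4, 4:8) patterns:

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    # Zero out the smallest-magnitude weights so that roughly
    # `sparsity` of all entries are removed.
    k = int(w.size * sparsity)
    threshold = np.partition(np.abs(w).ravel(), k)[k]
    return np.where(np.abs(w) < threshold, 0.0, w)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))  # stand-in for one layer's weight matrix
pruned = magnitude_prune(w, 0.5)
print((pruned == 0).mean())  # 0.5 of the weights removed
```

At GPT scale, the hard part the paper solves is doing better than this baseline in one shot, without retraining, while keeping perplexity essentially unchanged.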