Findings from #WMT23
Our Chat4 friend is in the winning group across tasks
Most submissions still train from scratch
Fewer constrained (low-resource) submissions than before
More test suite submissions!
Low-resource results TBD (tech issue)
#EMNLP2023 #WMT #neuralEmpty #LLMs

Hello class, #NLP A1
What do you think are some possible causes of this magnificent failure?

#translation #linguistics
#HumanLevelTranslation #neuralempty #NLProc #NLP

Few-shot learning almost reaches traditional machine translation

https://arxiv.org/abs/2302.01398
#enough2skim #NLProc #neuralEmpty

The unreasonable effectiveness of few-shot learning for machine translation

We demonstrate the potential of few-shot translation systems, trained with unpaired language data, for both high and low-resource language pairs. We show that with only 5 examples of high-quality translation data shown at inference, a transformer decoder-only model trained solely with self-supervised learning is able to match specialized supervised state-of-the-art models as well as more general commercial translation systems. In particular, we outperform the best performing system on the WMT'21 English - Chinese news translation task by only using five examples of English - Chinese parallel data at inference. Moreover, our approach in building these models does not necessitate joint multilingual training or back-translation, is conceptually simple and shows the potential to extend to the multilingual setting. Furthermore, the resulting models are two orders of magnitude smaller than state-of-the-art language models. We then analyze the factors which impact the performance of few-shot translation systems, and highlight that the quality of the few-shot demonstrations heavily determines the quality of the translations generated by our models. Finally, we show that the few-shot paradigm also provides a way to control certain attributes of the translation -- we show that we are able to control for regional varieties and formality using only five examples at inference, paving the way towards controllable machine translation systems.

arXiv.org

They started with 3 hypothesized reasons for hallucinations;
only 2 held up

Having found how networks behave while hallucinating, they
use those signals to filter hallucinations (with great success)

https://arxiv.org/abs/2301.07779
#NLProc #neuralEmpty #NLP #deepRead

Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection

Neural sequence generation models are known to "hallucinate", by producing outputs that are unrelated to the source text. These hallucinations are potentially harmful, yet it remains unclear in what conditions they arise and how to mitigate their impact. In this work, we first identify internal model symptoms of hallucinations by analyzing the relative token contributions to the generation in contrastive hallucinated vs. non-hallucinated outputs generated via source perturbations. We then show that these symptoms are reliable indicators of natural hallucinations, by using them to design a lightweight hallucination detector which outperforms both model-free baselines and strong classifiers based on quality estimation or large pre-trained models on manually annotated English-Chinese and German-English translation test beds.

arXiv.org

Bold statement (need to think about it more), especially when coming from a machine translation person.

I’d claim MT was no less revolutionary once it became pervasive in industry. But @marian_nmt seems to dismiss it now, given ChatGPT.

https://twitter.com/marian_nmt/status/1605956605475901456

#NLP #NLProc #neuralempty #NMT

Marcin Junczys-Dowmunt (Marian NMT) on Twitter

“@mich_ptaszynski Sure. The difference is the demo and the interface. I would say we thought we "knew", but we didn't until we could play with it. If I had to bet, I would say NLP has produced its first historic moment, also in the public eye. What else compares? My research certainly doesn't.”

Twitter

@ #conll #EMNLP talk to me about
ColD Fusion & https://ibm.github.io/model-recycling/
BabyLM shared task
https://www.label-sleuth.org/
Enhancing decoders with syntax

And guided work (talk to them too)
Estimating #neuralEmpty quality with source only
Controlling structure in - neuron level
Details:

Model-recycling - the best model per architecture. Comparing finetuned models from HF, as base models for future finetuning.

Model Recycling