Our blog series "Women in Science" introduces #TIB colleagues who share insights into their career paths and personal experiences in science. #JenniferDSouza studied #NaturalLanguageProcessing at the University of Texas at Dallas and now works as AI/NLP Group Lead at TIB.
In the interview, she talks about her path into research, the importance of curiosity and openness, and how diverse perspectives strengthen science and #ArtificialIntelligence.
English: https://blog.tib.eu/2026/06/25/women-in-science-dr-jennifer-dsouza/
German: https://blog.tib.eu/2026/06/25/frauen-in-der-wissenschaft-dr-jennifer-dsouza/

Top LLM questions: text generation strategies and LLM evaluation metrics

NLP/LLM interviews often test not whether you know the terms top-k, top-p, and BLEU, but whether you understand what happens to the probability distribution, why greedy decoding gets stuck in loops, what temperature is for, and why BLEU is a poor measure of modern LLM answers. This article is a checklist covering language modeling, generation strategies, and quality metrics. It is not a full from-scratch lecture but a drill to run through before a technical NLP interview, to close gaps and refresh the essentials.

https://habr.com/ru/articles/1044418/

#машинное_обучение #искусственный_интеллект #naturallanguageprocessing #deeplearning #large_language_model #data_science
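
For readers who want to see what the decoding terms in the post above do to a probability distribution, here is a minimal sketch in plain Python. It is not taken from the linked article: the function names, the five-token vocabulary, and the logit values are all hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities; temperature < 1 sharpens, > 1 flattens."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their mass to 1."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in keep}


def greedy(probs):
    """Always pick the single most probable token (fully deterministic)."""
    return max(range(len(probs)), key=lambda i: probs[i])


def sample(dist, rng):
    """Draw one token id from a {token_id: probability} mapping."""
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


# Hypothetical logits for a 5-token vocabulary (illustrative values only).
logits = [2.0, 1.0, 0.5, 0.1, -1.0]
probs = softmax(logits, temperature=0.7)
rng = random.Random(0)  # fixed seed so repeated runs draw the same tokens
print("greedy:", greedy(probs))
print("top-k :", sample(top_k_filter(probs, k=3), rng))
print("top-p :", sample(top_p_filter(probs, p=0.9), rng))
```

Greedy decoding returns the same token every time for the same distribution, which is one reason it can fall into repetitive loops. Top-k fixes the number of candidates, while top-p lets the candidate set grow or shrink with how peaked the distribution is.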

The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"

We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A". This is the Reversal Curse. For instance, if a model is trained on "Valentina Tereshkova was the first woman to travel to space", it will not automatically be able to answer the question, "Who was the first woman to travel to space?". Moreover, the likelihood of the correct answer ("Valentina Tereshkova") will not be higher than for a random name. Thus, models do not generalize a prevalent pattern in their training set: if "A is B" occurs, "B is A" is more likely to occur. It is worth noting, however, that if "A is B" appears in-context, models can deduce the reverse relationship. We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as "Uriah Hawthorne is the composer of Abyssal Melodies" and showing that they fail to correctly answer "Who composed Abyssal Melodies?". The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as "Who is Tom Cruise's mother? [A: Mary Lee Pfeiffer]" and the reverse "Who is Mary Lee Pfeiffer's son?". GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. Code available at: https://github.com/lukasberglund/reversal_curse.

arXiv.org

RE: https://mathstodon.xyz/@xameer/116744860546236676

A statistical #naturallanguageprocessing method has been applied to automatically predict the outcome of cases tried by the European Court of Human Rights (violation or no violation of a specific article) based on their textual content, reaching a prediction accuracy of 79%.[24] A subsequent qualitative analysis of these results provided some support for the theory of legal realism. The authors write: "In general, and notwithstanding the simplified snapshot of a very complex debate that we just presented, our results could be understood as lending some support to the basic legal realist intuition according to which judges are primarily responsive to non-legal, rather than to legal, reasons when they decide hard cases."
#humanrights

OpenAI Upgrades GPT-5.5 Model with Improved Accuracy and Conversational Style

OpenAI has released a major update to its GPT-5.5 model, aimed at improving accuracy and making interactions feel more human and natural. The new version promises more readable and engaging responses, with a focus on practical help tasks.

https://osintsights.com/openai-upgrades-gpt-55-model-with-improved-accuracy-and-conversational-style?utm_source=mastodon&utm_medium=social

#Gpt55 #ArtificialIntelligence #ConversationalAi #EmergingTechnologies #NaturalLanguageProcessing

Stanford CS336 | Language Modeling from Scratch

Official course website for Stanford CS336: Language Modeling from Scratch (Spring 2026), including logistics, schedule, assignments, and course materials.

Stanford CS336
Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy (short paper)

The wording of natural language prompts has been shown to influence the performance of large language models (LLMs), yet the role of politeness and tone remains underexplored. In this study, we investigate how varying levels of prompt politeness affect model accuracy on multiple-choice questions. We created a dataset of 50 base questions spanning mathematics, science, and history, each rewritten into five tone variants: Very Polite, Polite, Neutral, Rude, and Very Rude, yielding 250 unique prompts. Using ChatGPT 4o, we evaluated responses across these conditions and applied paired sample t-tests to assess statistical significance. Contrary to expectations, impolite prompts consistently outperformed polite ones, with accuracy ranging from 80.8% for Very Polite prompts to 84.8% for Very Rude prompts. These findings differ from earlier studies that associated rudeness with poorer outcomes, suggesting that newer LLMs may respond differently to tonal variation. Our results highlight the importance of studying pragmatic aspects of prompting and raise broader questions about the social dimensions of human-AI interaction.

arXiv.org
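
The abstract above says the tone conditions were compared with paired sample t-tests. As a rough illustration of what that test computes, here is a minimal sketch in plain Python. The helper name `paired_t_statistic` and the per-run accuracy values are made up for this example and are not the paper's data; converting the statistic into a p-value would additionally need a t-distribution table or a statistics library.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for two paired samples a and b."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # t is the mean difference divided by its standard error.
    # Note: stdev(diffs) is 0 when all differences are equal, which would divide by zero.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Made-up per-run accuracies in percent, NOT the paper's data.
very_polite = [80.0, 82.0, 80.0, 82.0, 80.0]
very_rude = [84.0, 86.0, 86.0, 84.0, 84.0]

t, dof = paired_t_statistic(very_rude, very_polite)
print(f"t = {t:.2f}, dof = {dof}")
```

The test is paired because each run (or question) is measured under both tone conditions, so the differences within a pair are analyzed instead of the two groups as independent samples.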

How Can We Prevent AI Models From Cannibalizing Themselves When Human-Generated Data Runs Out? 

While the evolution of artificial intelligence (AI) systems has shown no sign of slowing, there's a growing concern that large language models (LLMs) will soon run out of human-made data to ingest and learn from. Once this happens, scientists say, AI models will increasingly rely on synthetic AI-made information, which will lead to an effect called "model collapse." By: Roland Moore-Colyer. Source: Live Science.

https://onlinemarketingscoops.com/2026/05/22/how-can-we-prevent-ai-models-from-cannibalizing-themselves-when-human-generated-data-runs-out/


I'd like to introduce #Emily, an #OpenSource #InformationRetrieval system I've been working on in my spare time.

https://petebleackley.github.io/Emily/

#Python #NaturalLanguageProcessing @IRRJ

Emily

Dr Peter Bleackley’s Portfolio

Pete Bleackley