🧪 The model is based on a new method for post-training LLMs, developed at LTG and described in the paper "Fluent Alignment with Disfluent Judges: Post-training for Lower-resource Languages" by David Samuel, Lilja Øvrelid, Erik Velldal, and Andrey Kutuzov:
https://arxiv.org/abs/2512.08777

#NLProc #NorMistral #Norsk #Norwegian

Fluent Alignment with Disfluent Judges: Post-training for Lower-resource Languages

We propose a post-training method for lower-resource languages that preserves fluency of language models even when aligned by disfluent reward models. Preference-optimization is now a well-researched topic, but previous work has mostly addressed models for English and Chinese. Lower-resource languages lack both datasets written by native speakers and language models capable of generating fluent synthetic data. Thus, in this work, we focus on developing a fluent preference-aligned language model without any instruction-tuning data in the target language. Our approach uses an on-policy training method, which we compare with two common approaches: supervised finetuning on machine-translated data and multilingual finetuning. We conduct a case study on Norwegian Bokmål and evaluate fluency through native-speaker assessments. The results show that the on-policy aspect is crucial and outperforms the alternatives without relying on any hard-to-obtain data.


🤖 New Norwegian chatbot from #LTG with a chat interface! NorMistral-11B-thinking is an open language model (#LLM) fine-tuned to follow instructions and to reason before it answers.

💬 To access the chat: https://chat.llm.sigma2.no
🤗 Download from #HF: https://huggingface.co/norallm/normistral-11b-thinking
👷‍♂️ Lead developer: LTG PhD fellow David Samuel
ℹ️ For more information, see https://www.mn.uio.no/ifi/forskning/grupper/ltg/store-sprakmodeller-for-norsk

#NLProc #NorMistral #Norsk #Norwegian
