Microsoft Research (@MSFTResearch)

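Microsoft Research has released 'Paza', a human-centered speech pipeline, and 'PazaBench', the first leaderboard for low-resource languages. The work covers 39 African languages and 52 models and was tested in real community settings, emphasizing low-resource language support and field validation.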

https://x.com/MSFTResearch/status/2019277164319842784

#microsoft #speech #paza #pazabench #lowresource

Microsoft Research (@MSFTResearch) on X

Microsoft Research unveils Paza, a human-centered speech pipeline, and PazaBench, the first leaderboard for low-resource languages. It covers 39 African languages and 52 models and is tested with communities in real settings. https://t.co/OMZebr2YYO

Meta’s new Omnilingual ASR model drops character error rates below 10 % for 78 % of the 1,600 languages it was tested on – a huge leap for low‑resource, under‑represented tongues. The system leverages in‑context learning and is released under Creative Commons, inviting the community to build on it. Read the full benchmark details! #OmnilingualASR #SpeechAI #LowResource #UnderrepresentedLangs

🔗 https://aidailypost.com/news/metas-omnilingual-asr-hits-sub10-error-78-1600-languages

Off-topic, but just in case: the Transducens Research Group is offering a PhD position working with LLMs to translate low-resource languages.

https://transducens.dlsi.ua.es/opening-for-a-phd-position-at-the-transducens-research-group-in-alicante-spain-working-with-llms-for-translating-low-resource-languages/

#lowresource #translation #LLM #Phd

Opening for a PhD position at the Transducens research group in Alicante (Spain), working with LLMs for translating low-resource languages | Transducens

Are you compositionally curious? 🤓

Want to know how to learn embeddings using 🌲?

In our new #ICML2025 paper, we present Banyan:
A recursive net that you can train super efficiently for any language or domain, and get embeddings competitive with much, much larger LLMs 1/🧵

#embeddings #structure #nlp #semantics #efficient #lowresource

#Testing #LowResource #Language Support in #LLMs Using Language Proficiency Exams: the Case of #Luxembourgish https://arxiv.org/abs/2504.01667
Testing Low-Resource Language Support in LLMs Using Language Proficiency Exams: the Case of Luxembourgish

Large Language Models (LLMs) have become an increasingly important tool in research and society at large. While LLMs are regularly used all over the world by experts and lay-people alike, they are predominantly developed with English-speaking users in mind, performing well in English and other wide-spread languages while less-resourced languages such as Luxembourgish are seen as a lower priority. This lack of attention is also reflected in the sparsity of available evaluation tools and datasets. In this study, we investigate the viability of language proficiency exams as such evaluation tools for the Luxembourgish language. We find that large models such as ChatGPT, Claude and DeepSeek-R1 typically achieve high scores, while smaller models show weak performances. We also find that the performances in such language exams can be used to predict performances in other NLP tasks.

You might have heard me claim that most #NLG is #LowResource (not just #NaturalLanguageGeneration for #LowResourceLanguages). If you want to hear me explain a bit more, my talk from last year's #GEM workshop at #EMNLP2022 is now up online: https://underline.io/lecture/66771-most-nlg-is-low-resource-here-s-what-we-can-do-about-it
Most NLG is Low-Resource: here's what we can do about it

#DigitalIssues

@rek2 Fight for control over social facilities on the web🗣️✊

That means control over the social facilities that shape how we socialize, organize and share.

We do this by spreading and collaborating on facilities such as PeerTube, Mastodon and XMPP/Matrix.

#lowResource #facilities

Heaviness makes a tool less interoperable, less tinkerable and less understandable, and using it drains more resources.

Developing high-resource tools requires more funding. Maintaining and further developing one is at least a full-time job.

Much of the problem is that today's operating systems are designed around high-resource tools. Low-resource tools are often seen as technical.

We need low-resource tools that aim to be part of a toolbox.

#lowresource #DigitalIssues

#DigitalIssues

If we really want maintaining a society to be less resource-demanding, then low-resource societies, enabled by lightweight software, are the solution.

Low-resource societies would benefit us in many ways. Most obviously, we would have control over development, we would encourage studying code, and we would enable increased flexibility.

A major problem with this is that today's portals are incompatible with low-resource usage.

#portals #lightweight #lowresource #gemini

When your COVID lasts a week and screws up end-of-year plans as well as attending #GEM at #EMNLP2022.

I will be connecting to the virtual-only poster session to discuss our paper on #LowResource #NLG in about an hour.