Next up at #CLIDA2024 is Gruffudd Prys sharing "Recent Language Technology developments for Welsh at Bangor University"

Prys: The team includes 8 linguists and 5 developers working on dictionaries, grammar and spelling checkers, speech synthesis and recognition, machine translation, computer-assisted translation tools, and NLP tools, all to facilitate the use of Welsh in digital contexts.

Prys: Two types of outputs: language tech building blocks that companies and others can use (distributed under permissive open source licenses) and end-user products (e.g. apps) for the language.

These tools from https://techiaith.cymru/ have also been integrated into #Spacy (https://spacy.io/).

They also have a Welsh virtual assistant called Macsen which you can run on iOS or Android to interact with your phone by voice in Welsh.

Prys: you can also use Macsen through the web at https://macsen.techiaith.cymru/ to do speech recognition.

A text-to-speech demo (with bilingual voices) is available at https://tts.techiaith.cymru; this bilingualism reflects how the language is actually used today.


Prys: we have also collaborated with OpenAI to provide chatbot answers through ChatGPT in Welsh. Both this and translation of spoken input are also available in the online Macsen demo.

There are also translation tools specialised for the health and policy domains.

Prys: on a Welsh legislation dataset, fine-tuned GPT-3.5 achieves a higher BLEU score (~58) than GPT-4 without fine-tuning (~55). The Bangor neural machine translation system achieves a BLEU score of 72.6(!)
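For context on what those numbers measure: BLEU scores a system translation by its clipped n-gram overlap with a reference translation (typically n = 1 to 4). It combines those precisions as a geometric mean and multiplies by a brevity penalty for outputs shorter than the reference, giving a score from 0 to 100. Below is a simplified, single-reference, sentence-level sketch in Python, written purely for illustration with whitespace tokenisation and no smoothing. Scores in published evaluations are normally computed at corpus level with standard tooling such as sacreBLEU, and this sketch does not reproduce the figures quoted in the talk.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU (0-100) against one reference, no smoothing.

    Illustrative sketch only: whitespace tokenisation, single reference.
    """
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by how often it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # any zero precision makes unsmoothed BLEU zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalise candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return 100 * bp * math.exp(sum(log_precisions) / max_n)


print(bleu("the cat sat on a mat", "the cat sat on a mat"))    # 100.0
print(bleu("the cat sat on the mat", "the cat sat on a mat"))  # about 53.7
```

An identical output scores 100, and a one-word substitution already drops the score sharply because it breaks several higher-order n-grams at once. This is one reason a system-level score in the 70s, as reported for the Bangor system, is notable.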