Mastodawn

I’m training up some machine learning models to ‘disambiguate homographs’ (specifically heteronyms: words with identical Latin-alphabet spellings but different pronunciations, like ‘bow/bow’ and ‘tear/tear’). This will help solve one of the more annoying aspects of automatic transliteration into #Shavian. It is both very exciting and intensely boring, since I’m having to make the data sets myself. Hard to believe there is almost no publicly available training data for this. #𐑖𐑱𐑝𐑾𐑯
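For anyone wondering what “making the data sets” involves, here is a minimal Python sketch under a hypothetical format (not the author’s): each example is a tokenised sentence, the position of the ambiguous word, and its correct Shavian spelling (𐑑𐑽 for a teardrop, 𐑑𐑺 for a rip). Instead of a trained ML model it uses a deliberately naive baseline where nearby words vote for a spelling. `Example`, `context`, `train` and `predict` are illustrative names, and the four sentences are toy data.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Example:
    tokens: list[str]  # tokenised sentence
    index: int         # position of the homograph in `tokens`
    label: str = ""    # correct Shavian spelling ("" when unlabelled)


# Toy, hand-made examples (hypothetical); 𐑑𐑽 = teardrop, 𐑑𐑺 = rip.
TRAIN = [
    Example(["she", "wiped", "a", "tear", "from", "her", "eye"], 3, "𐑑𐑽"),
    Example(["there", "is", "a", "tear", "in", "my", "coat"], 3, "𐑑𐑺"),
    Example(["a", "tear", "rolled", "down", "her", "cheek"], 1, "𐑑𐑽"),
    Example(["he", "will", "tear", "the", "paper"], 2, "𐑑𐑺"),
]


def context(ex: Example, window: int = 2) -> list[str]:
    """Words within `window` positions of the homograph, excluding it."""
    lo = max(0, ex.index - window)
    hi = ex.index + window + 1
    return [t for i, t in enumerate(ex.tokens[lo:hi], start=lo) if i != ex.index]


def train(examples: list[Example]) -> dict[str, Counter]:
    """Count how often each context word appears alongside each label."""
    votes: dict[str, Counter] = defaultdict(Counter)
    for ex in examples:
        for word in context(ex):
            votes[word][ex.label] += 1
    return votes


def predict(votes: dict[str, Counter], ex: Example) -> Optional[str]:
    """Sum the label counts of the context words; None if none were seen."""
    tally: Counter = Counter()
    for word in context(ex):
        tally.update(votes.get(word, Counter()))
    return tally.most_common(1)[0][0] if tally else None
```

Trained on those four sentences, `predict(train(TRAIN), Example(["a", "tear", "in", "the", "fabric"], 1))` picks 𐑑𐑺, because “in” and “the” only appeared near the rip sense and together outweigh “a”. A vote this crude needs far more labelled sentences per word to be useful, which is exactly where the lack of public data hurts.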


𐑖𐑱𐑝𐑾𐑯 (Shaw Alphabet) 🦁 Aug 4, 2024

𐑷𐑤𐑕𐑴, 𐑣𐑧𐑤𐑴—𐑦𐑑𐑕 𐑚𐑰𐑯 𐑩 𐑢𐑲𐑤! #𐑖𐑱𐑝𐑾𐑯


Woogachaka Aug 4, 2024

@shavian 𐑢𐑧𐑤𐑒𐑪𐑥 𐑚𐑨𐑒!


lovetocode999 Aug 4, 2024

@shavian 𐑣𐑱, 𐑢𐑧𐑤𐑒𐑩𐑥 𐑚𐑧𐑒!


yttyx Aug 4, 2024

@shavian Sounds like a great project! I have spent many an hour manually resolving homographs :)


Benjamin Kwiecień 🇵🇸 Aug 4, 2024

@shavian This is a big problem with Persian texts too.


Jesse Onland Aug 5, 2024

@shavian Seems like part of speech classification would cover most of these.


𐑖𐑱𐑝𐑾𐑯 (Shaw Alphabet) 🦁 Aug 5, 2024

@jdonland Most cases, yes, but there are still 80 or so words that are troublesome (if you count bow, bows, bowing, and bowed as separate words).
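The split between “most cases” and the troublesome remainder can be sketched in Python. This assumes a part-of-speech tag (Universal POS style, e.g. NOUN, VERB, ADJ) has already been produced by some tagger; the table is a hand-picked illustration, not anyone’s actual word list, and `HETERONYMS`, `candidates` and `needs_context_model` are hypothetical names. The tag settles “close”, but “bass” and “bow” stay ambiguous even once you know they are nouns.

```python
# Hypothetical table: (lowercase word, POS tag) -> Shavian spellings still
# possible with that tag. Spellings are illustrative and may vary by dialect.
HETERONYMS: dict[tuple[str, str], list[str]] = {
    ("close", "ADJ"): ["𐑒𐑤𐑴𐑕"],        # "a close call"
    ("close", "VERB"): ["𐑒𐑤𐑴𐑟"],       # "close the door"
    ("bass", "NOUN"): ["𐑚𐑨𐑕", "𐑚𐑱𐑕"],  # the fish, or the low voice/instrument
    ("bow", "NOUN"): ["𐑚𐑴", "𐑚𐑬"],    # archery bow/ribbon, or a bend/front of a ship
}


def candidates(word: str, pos: str) -> list[str]:
    """Shavian spellings compatible with the word and its POS tag ([] if unlisted)."""
    return HETERONYMS.get((word.lower(), pos), [])


def needs_context_model(word: str, pos: str) -> bool:
    """True when the POS tag alone still leaves more than one pronunciation."""
    return len(candidates(word, pos)) > 1
```

Words like “close” reduce to a lookup once the tagger has run; anything for which `needs_context_model` returns True, such as `("bass", "NOUN")`, is the residue a context-aware model would still have to resolve.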