What a long and fiery winter night! User:Dragons_Bot is importing frequency lists for 500 more #languages from Unilex into #Lingualibre. The Dragons_Bot script is running tonight, editing Lili persistently. We will then have common-word lists for 1001 languages, ready for you to record. At step 3 of the Recording Studio, click "local list", then search for List:{your_iso}/Unilex and you are good to go! If your community's languages aren't there, let me know below. 🎉
https://lingualibre.org/wiki/Special:RecordWizard

🤖🐲Another long day with User:Dragons_Bot!

Months ago, I activated several #SignLanguages on #LinguaLibre. People can video-record signed words. While doing activity stats, a #SPARQL query showed missing data on 467 language items. Dragons_Bot just fixed those. This will be useful for the upcoming 3rd recording type, for #WhistledLanguages. 😉

For now, I use the Lingualibre Wikibase as a calm launch pad for coding my bot. Some day, I will move to #Wikidata for live editing on languages. 🎉
https://en.wikipedia.org/wiki/Whistled_language


🤖🐲 User:Dragons_Bot to the rescue ! Doing clean ups !

Did you know? Lingualibre has 219 #languages recorded, but a #SPARQL query will return 221 languages. Why? Because Chinese, for example, is erroneously present twice 😲 :
- ❌ as Q130, iso: zho for #Chinese writing
- ✅ as Q113, iso `cmn`, for Chinese Mandarin
Tonight, I am coding a script to move all records to `cmn`, on both #Lingualibre's items and #Commons' file wikipages. Fighto ! ò__ó
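
For anyone curious about what such a move looks like, here is a minimal Python sketch. It is not the actual Dragons_Bot script (which runs on NodeJS). The record layout (`language_qid`, `iso`) and the function names are assumptions made for illustration; only the Q130/`zho` to Q113/`cmn` pair comes from the toot above.

```python
# Hypothetical sketch: re-point recordings from a duplicate language item
# to the canonical one. Field names are illustrative, not Lingualibre's schema.
QID_FIX = {"Q130": "Q113"}   # Chinese (zho) -> Mandarin Chinese (cmn), from the toot
ISO_FIX = {"zho": "cmn"}


def remap_record(record, qid_fix=QID_FIX, iso_fix=ISO_FIX):
    """Return a copy of `record` with its language QID and ISO code remapped."""
    fixed = dict(record)  # copy, so the original record is left untouched
    fixed["language_qid"] = qid_fix.get(record["language_qid"], record["language_qid"])
    fixed["iso"] = iso_fix.get(record["iso"], record["iso"])
    return fixed


def count_languages(records):
    """Count distinct language items, the way a counting query would."""
    return len({r["language_qid"] for r in records})
```

Running `remap_record` over every record and comparing `count_languages` before and after is a cheap way to check that the duplicate language has really disappeared.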

Script running tonight, still ~2 hours to go. 3 languages and ~1,000 recordings to fix to get back to clean data. To be honest, I wonder how the other scripts of the toolchain will cope. But for now: sleep!
[2:25am edit: Well, I added a few hours and finished that task ò__ó]
https://lingualibre.org/wiki/Help:SPARQL_for_maintenance#Counts

This new year, I'm working on an open-licence dump of 8,598 #Chinese audio recordings.

# The source
Those files were part of the original project, the Shtooka recorder (2005-2016), which @Wikimedia_Fr and Nicolas Vion recoded and renamed into #Lingualibre (2016+). The full dumps of 150~300k audio files have been lying there for 8 years now, in need of processing and migration to #WikimediaCommons.

# Scraping
In past years, I noticed that the webpages of this archive project had collapsed. The data still seemed available through one or two access points. Yesterday I scraped everything I could get:

# Download all from Shtooka
$ wget -r -np -nc -nH -A .flac,.html --no-check-certificate https://packs.shtooka.net/
>FINISHED --2024-01-03 14:06:55--
>Total clock time: 1h 26m 49s
>Downloaded: 114574 files, 7.9G in 21m 56s (6.14 MB/s)

😿 Many filepaths failed
💖 #Chinese HSK succeeded
🔉 高低 gāodī: height

# INSPECT
I now have 8,598 Chinese #HSK audio files. As usual, we progress with a small sample of files. The cycle goes:
-investigate
-code
-fix
-expand.

Thank God I have documented most of my actions on #Lingualibre over the past decade! Dozens of Help pages to onboard junior programmers.

https://lingualibre.org/wiki/Help:Converting_audios#Helpers

I updated the command a bit thanks to ChatGPT's suggestions. To print the Shtooka files' rich metadata, try:

$ ffprobe -hide_banner ./data/cmn-0a0a8a8b.ogg
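
To process many files rather than read one printout, ffprobe can also emit JSON. Below is a minimal Python sketch, assuming `ffprobe` is on the PATH. The helper names are mine, and the tag keys in the usage note are placeholders rather than the real Shtooka tag names.

```python
import json
import subprocess


def parse_tags(ffprobe_json):
    """Merge container-level and stream-level tags from ffprobe's JSON output.

    Depending on the container, tags may be reported under "format" or under
    a stream, so both places are checked. Keys are lower-cased for consistency.
    """
    data = json.loads(ffprobe_json)
    tags = {k.lower(): v for k, v in data.get("format", {}).get("tags", {}).items()}
    for stream in data.get("streams", []):
        for key, value in stream.get("tags", {}).items():
            tags.setdefault(key.lower(), value)  # container-level tags win on conflict
    return tags


def ffprobe_tags(path):
    """Run ffprobe on `path` and return its metadata tags as a dict."""
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    return parse_tags(result.stdout)
```

Calling `ffprobe_tags("./data/cmn-0a0a8a8b.ogg")` would return one dict per file, for example with keys such as `title` or `artist` if the file carries them, which is easier to feed into a batch script than the human-readable output.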

*HSK version 1, from the 2000s


# METADATA OF INTEREST
Among the twenty metadata fields this Chinese audio file contains, several are of interest.

```
- speaker name
- speaker LL id
- speaker gender
- uploader username
- Wikidata language id
- word
- date of creation
- open license
```

Great! I need those values for the Wikimedia Commons Template:Lingua_libre_record. 😉
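
As an illustration of how such values end up in wikitext, here is a minimal Python sketch of a generic template renderer. It is not the Dragons_Bot code, and the parameter names in the usage note (`speaker`, `word`, and so on) are hypothetical: check the template's documentation on Commons for the real ones.

```python
def render_template(name, params):
    """Render a MediaWiki template call, one parameter per line.

    `params` maps parameter names to values. A literal pipe inside a value
    would be read as a parameter separator, so it is replaced by the
    {{!}} magic word.
    """
    lines = ["{{" + name]
    for key, value in params.items():
        safe_value = str(value).replace("|", "{{!}}")
        lines.append(f" | {key} = {safe_value}")
    lines.append("}}")
    return "\n".join(lines)
```

For example, `render_template("Lingua_libre_record", {"word": "高低", "speaker": "Example speaker"})` produces a block that can be pasted into a file description page, once the hypothetical parameter names are swapped for the real ones.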

# WIKIBOT
Dragons_Bot stands upon NodeJS and #WikiapiJS, a powerful JS framework I use. (I love this project, so I was also involved in its documentation.) It has 44 stars ⭐ on GitHub:

https://kanasimi.github.io/wikiapi/

As of now, I have a decent script which creates the suitable wikitext and filename for each upload, audio files converted to .wav, and a clean data file to run the whole process.

Sidenote: A more popular alternative for JS devs wanting a #wikibot is #WTF_wikipedia (750 ⭐ ) :
https://github.com/spencermountain/wtf_wikipedia/graphs/contributors


[Temporary toot]
Ok! I'm done for today. As visible in the previous screenshot, I got too much Shtooka metadata from the audio files (ffmpeg). I need enlightenment from Nicolas Vion to know whether the duplications I see there are indeed duplications or not. Phoning him later. Once I get the green light, I can mass-import those HSK audio files.

Chinese + Lingualibre + WikimediaCommons + Wikibot thread ➡️ cc @Ash_Crow @harmonia_amanda

Will you notice the differences ?

@lingualibre being #OpenSource, web developer and Wikimedian @elfix jumped in and switched our SPARQL endpoint URL. This revives an important but heavy query used to document and visualize #Gender biases, in order to counter them.

#LinguaLibre, like most @Wikimedia projects, reproduces gender and diversity biases. Our movement therefore leads explicit efforts for inclusivity and diversity.

- Prod https://lingualibre.org/LanguagesGallery/index.html
- Dev https://hugolpz.github.io/LanguagesGallery/


**90% of Wikipedia's Editors Are Male—Here's What They're Doing About It.**

The group that oversees the free encyclopedia is trying to fix a years-old problem.
By Robinson Meyer

https://www.theatlantic.com/technology/archive/2013/10/90-of-wikipedias-editors-are-male-heres-what-theyre-doing-about-it/280882/


🎉 Finally! Wikimedia Commons had been blocking uploads of files whose filenames contain characters from some minority #languages' writing systems.

With today's change those characters and filenames are now accepted on Commons and therefore #Lingualibre. *It allows Lingualibre & Wikimedia to support more and smaller languages & cultures.*

It doesn't seem like much, but Lili members have been pushing this issue since 2021.

It's now fixed thanks to MW dev @LucasWerkmeister and User:Nikki.❤️ https://commons.wikimedia.org/w/index.php?title=MediaWiki%3ATitleblacklist&diff=845765023&oldid=829500280


Woow !! #LinguaLibre just made a +6 languages jump tonight ! I saw some unfamiliar ISO codes yesterday in the recent changes logs, I need to investigate.

We are now at 227 open content audio lexicons !
https://hugolpz.github.io/LanguagesGallery/


240 languages !
Thanks to #Indonesia's #Haji language, there are now 240 #languages on #lingualibre.
- Stats: https://hugolpz.github.io/LanguagesGallery/
- Wikidata: https://wikidata.org/wiki/Q5639933
- Wikipedia: https://en.wikipedia.org/wiki/Haji_language

Lingualibre continues to provide user-friendly systems for all language communities, including the smaller ones, to rapidly record their local vocabulary.


What a week for #Lingualibre !
Lingualibre is rich in #language data but also in huge need of web developers and outreach efforts.
I've been monitoring potential grants for a while, gathering all those I could identify in a Grants table.
https://meta.wikimedia.org/wiki/Template:Grants

@Wikimedia_Fr and the French Ministry of Culture are providing a helpful yearly lifeline for software maintenance.

But we need more to stay above water and be bold.

@wikimediafoundation's Technology Fund was my main hope. But after I monitored it for 2 years, and despite it being a much-needed lifeline for Wikimedia's open source projects, it hasn't started yet. 😢 I still hope it will some day, but ... https://meta.wikimedia.org/wiki/Grants:Programs/Wikimedia_Research_%26_Technology_Fund


Two months ago, the @wikimediafoundation got accepted into the Google Summer of Code 2024! #gsoc24

GSoC is a @Google-funded summer program where Google pays for internships so that junior developers can contribute to open source codebases.

So I put on my grant writer / mentor hat, and submitted not one, but TWO Lingualibre coding projects.

## LinguaLibre v3

The first is to speed up the deployment of the next version of Lingualibre, with an easier-to-maintain, easier-to-query application. It is critical for every open source project to be easy to dive into, to fix, and to expand. This effort will be co-mentored with @Poslovitch.

And gosh! In just a few days, *7* junior developers expressed interest in this #Lingualibre GSoC24. I onboarded 5 of them, 3 of whom have already cloned the repository and started hacking around that #Django/Vuejs stack. 😮🎇

## LinguaLibre SignIt

The second one is to revamp #SignIt, the click-to-translate web extension we use for #SignLanguage. This extension is cute, awesome, and can help us all learn sign languages! But it is at a deadly impasse for two reasons:
- it's Firefox only, and Firefox has <5% market share nowadays
- #webextensions are phasing out the version of the code we use.
A whole revamp is needed !

🔥IMPORTANT: We are looking for a 2nd mentor, with solid git/GitHub (and ideally webextension) know-how.

What a week of #Wikimedia projects !
1. Whistled language map
2. Label Culture Libre Silver recipient for Toulouse University
3. #Wikimania 2024 sessions submissions x2
4. « Sign language for welcoming Libraries » deployment !

I just landed back in Toulouse after a 6h (nuclear-powered) train journey. I need sleep, so I'll let you know more about those projects tomorrow!

1. Whistled Occitan: leveraging #Lingualibre's speed at recording #languages' vocabularies, we are collaborating with one of the last 5 practitioners of Whistled #Occitan. Complemented with #Wikidata and #SPARQL, we successfully prototyped an interactive multimedia map with local toponyms, so the public can explore their ancestral land and hear place names being whistled to them in this endangered language. The prototype works but needs further care.

2. Label Culture Libre: following a year of efforts from our Documentation service (SICD) team, supporting the cultural change toward open content, personally training 300+ staff during my Wikiresidence, and leading a dozen open content projects... We are happy to announce that the University of #Toulouse received the Label Culture Libre, Silver level, from @Wikimedia_Fr.

- UT SICD announcement: https://bibliotheques.univ-toulouse.fr/actualites/label-culture-libre-wikimedia-2024-le-sicd-labellise-niveau-argent
- Presentation : https://docs.google.com/presentation/d/13ECIt5qg2WOR3YOOypRzI5YS9R5Pwhvz/edit?usp=sharing&ouid=100658033193494547613&rtpof=true&sd=true


3. #Wikimania 2024, Katowice, Poland.

Looking forward to joining my 10th yearly Wikimedian global conference and sharing with wiki peers, I submitted 2 session proposals:

- Supporting #minority #languages and Wikimedia's global community // Round table // 40mins
The panel will discuss needs, network and capabilities.

- Lingualibre in review: achievements, changes, analysis and pilot projects. // Presentation // 25mins
After passing the 1M recordings milestone, let's sum up. #SignIt

4. « Basic #SignLanguage lexicon for welcoming Libraries » (🇫🇷 Lexique #LSF pour des bibliothèques inclusives), also on track for the past 4 months, is starting to bear tangible fruit with the prototyping of a webpage showcasing an essential set of useful French Sign Language words to welcome signing people in French libraries. The project's videos are contributed by the local sign language association IRIS.

1/3 Alert on #Lingualibre !

3 toots to learn how a wide-range Wikimedia IP/proxy block broke our Wikimedia tools and how to fix it.

It began 3 days ago when half a dozen users converged to report a vague but critical bug: LinguaLibre uploads were failing completely when sending files to #Commons!

2/3 It took me 2 workdays investigating, diving into #Lingualibre +#Commons + Meta's #SPARQLs, logs and APIs (recentchanges with tagfilter=lingualibre, userrights).

But there we are ! ✌️
1) massive collapse of uploads was observed
2) bug was replicated and error message was identified
3) unblock request made! Thanks @ancilu, and the large range block was reconfigured.
4) we are back online !
https://public-paws.wmcloud.org/User:Yug/QueryLingualibre-monthly.ipynb
https://commons.wikimedia.org/wiki/File:2024.05_Lingualibre_IP_ban_bug.png
https://meta.wikimedia.org/w/index.php?title=Steward_requests/Global&oldid=26774369#Unregistered_users_only_block_for_the_range_2001:41D0:0:0:0:0:0:0/32
https://commons.wikimedia.org/wiki/Special:RecentChanges?hidebots=1&translations=filter&hidecategorization=1&hideWikibase=1&tagfilter=OAuth+CID%3A+1735&limit=500&days=30&urlversion=2
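
For those who want to reproduce this kind of check, here is a minimal Python sketch that builds the equivalent MediaWiki API query and counts tagged changes per day. It is not the notebook linked above. The tag value is taken from the Special:RecentChanges link; the function names and the choice of `rcprop` fields are my assumptions.

```python
from collections import Counter
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def recent_changes_url(tag, limit=500):
    """Build an API URL listing recent changes on Commons that carry `tag`."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rctag": tag,                      # API counterpart of tagfilter= in the web UI
        "rcprop": "title|timestamp|user",
        "rclimit": limit,
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)


def changes_per_day(changes):
    """Count changes per calendar day from their ISO 8601 timestamps."""
    return Counter(change["timestamp"][:10] for change in changes)
```

Fetching `recent_changes_url("OAuth CID: 1735")` with any HTTP client and passing the `query.recentchanges` list to `changes_per_day` should make a sudden drop to zero uploads easy to spot.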


3/3 That bug had been ongoing since... May 2nd! Thanks to all the volunteers who reported the bug and helped solve this CRITICAL #Lingualibre issue. 🌻 Feedback is critical to start our investigations and solve open source issues. In our case, the IP ban had to be reconfigured. User:EPIC did it. ✌️

#Wikidata #Lingualibre #OSM news !

About 800 recordings in Whistled #Occitan were added to Wikidata thanks to Univòc64's recordings and @sriveenkat.

This fills in a gap since User:Lingua_Libre_Bot has been inactive.

Thanks Sriveen!

As part of an exhibit, those audio files can now be seen, read and listened to on their territories via the following interactive #OSM+Wikidata map: https://hugolpz.github.io/NamesOfTheLand/

(more on this pilot project later 😉 )


🎉 Lingualibre Google Summer of Code 2024 is officially starting today! Following a successful application process, I'm honored to be a double GSoC mentor:
- On the #SignIt web extension: Kabir, mentored by Ishan and myself. For better #signLanguages services.
- On #Lingualibre web app : Pushkar, mentored by @Poslovitch and myself. Documenting language diversity and musicality.
#GSoC24
@Hugo Tons of work ahead in the coming months, but I'm excited! 🥳

🎉 🎉 🎉
🇬🇧/🇮🇳 en: #Lingualibre just recorded its 250th language: #Dusun from Malaysia!

🇫🇷 fr: #Lingualibre enregistre sa 250e langue!
🇩🇪 de: #Lingualibre hat gerade seine 250. Sprache aufgenommen!
🇸🇦 ar: #Lingualibre سجلت للتو لغتها الـ 250!
🇨🇳 zh: #Lingualibre 刚刚记录了第250种语言!
🇺🇦 uk: #Lingualibre щойно записав свою 250-ту мову!
🇹🇷 tr: #Lingualibre az önce 250. dilini kaydetti!
🇪🇸 es: ¡#Lingualibre acaba de grabar su idioma número 250!
🇮🇳 hi: #Lingualibre ने अभी-अभी अपनी 250वीं भाषा रिकॉर्ड की है!
🎉 🎉 🎉

In the last 6 years LinguaLibre recorded and uploaded over 1.3M audio files to Wikimedia Commons, mostly to audio-illustrate @Wiktionary editions and other digital dictionaries.
To explore further :
- Languages stats : https://lingualibre.org/LanguagesGallery/
- Explore Commons categories : https://commons.wikimedia.org/wiki/Category:Lingua_Libre_pronunciation
- Record your languages for Wikimedia : https://lingualibre.org/wiki/Special:RecordWizard

#Language #diversity #audio


46,796,872 times in January 2023!
917,287,302 times in 37 months !

That's the number of times #Lingualibre audio files were queried by online users.

191 Wikimedia projects use and share Lingualibre recordings in 250 languages.

A year later, we can reasonably assume 1.5 billion views/queries have occurred.
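
The 1.5 billion estimate can be checked with a quick back-of-the-envelope calculation, sketched here in Python. The assumption, which is mine, is that the 37-month total runs through January 2023 and that the following 12 months stayed at the January 2023 rate.

```python
JAN_2023_QUERIES = 46_796_872       # queries in January 2023 (from the toot)
TOTAL_37_MONTHS = 917_287_302       # queries over 37 months (from the toot)


def projected_total(total, monthly_rate, extra_months):
    """Extrapolate a running total, assuming a flat monthly rate."""
    return total + monthly_rate * extra_months


# Average month over the whole period: about 24.8 million queries.
average_per_month = TOTAL_37_MONTHS / 37

# Twelve more months at the January 2023 rate: 1,478,849,766 queries.
one_year_later = projected_total(TOTAL_37_MONTHS, JAN_2023_QUERIES, 12)
```

Since January 2023 was almost twice the long-run monthly average, usage was clearly growing, so about 1.48 billion rounds reasonably to the 1.5 billion quoted above.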

cc @Wikimedia_Fr @wikimediauk @wikiresearch @wikidatatw
https://glamtools.toolforge.org/baglama2/#gid=377&month=202301&giu=enwiktionary&server=en.wiktionary.org

Following a week of collaborative discussions and coding, #LinguaLibre audio resources are getting integrated into the popular web extension dictionary #Yomitan.

Yomitan is a rich, extensive dictionary used by 20,000+ Chrome and Firefox users. It also has #Anki integration. cc @Wikimedia_Fr

Thanks @bicolino34 for this collaboration idea!

https://github.com/themoeway/yomitan/pull/1129#pullrequestreview-2144436929
https://chromewebstore.google.com/detail/yomitan/likgccmbimhjbgkjambclfkhldnlhbnn
https://addons.mozilla.org/en-US/firefox/addon/yomitan/


In #Lyon for State Of The Map 2024 !
@dmontagne and I will co-present our work and extreme field linguistic interactive map on endangered Whistled #Occitan. Our map leverages and combines #OSM, #Wikidata , #Commons , #Lingualibre , aka the best of the open knowledge movement.
Thanks to URFIST Occitanie for supporting this project.
https://pretalx.com/sotm-fr-2024/talk/CL7UFN/

#SOTM24 #SOTMFR

Adiu! When the union of commons strengthens intangible heritage (State Of The Map 2024)

Whistled Occitan is used by fewer than ten people worldwide. A variant of Occitan, it allows dialogue between opposite slopes of the Ossau valley. This presentation will look back at the construction and promotion of an interactive map of place names in whistled Occitan, which draws on 5 commons: OpenStreetMap, Lingua Libre, Commons, Wikidata and Wikipedia.

#LinguaLibre's Whistled #Occitan map is deliberately easy to hack and adapt to other regional #languages. Our aim is to put our villages' historic names and voices back on the map and on their territories. Following @dmontagne's request, I drafted a "Replication" section which guides a willing contributor through replicating the methodology and forking the project for their language. Writing is ongoing. It will indeed need more work! cc @Poslovitch https://github.com/hugolpz/NamesOfTheLand/blob/main/README.md

#replication #map #fork #Occitan


Missed our State of the Map presentation of Whistled Occitan interactive map using #OSM, #Commons, #Wikidata and #LinguaLibre ?

Our presentation is now available on the #HAL archive thanks to @dmontagne! (🇫🇷 French) cc
@bibliofab66 https://hal.science/hal-04628915

Adiu ! Quand l'union des communs fait la force du patrimoine immatériel

Whistled Occitan is used by fewer than ten people worldwide. A declension of Occitan, it enables dialogue between opposing slopes in the Ossau valley. This presentation will look at the construction and valorization of an interactive map of toponyms in Occitan which mobilizes 5 commons: OpenStreetMap, Lingua Libre, Commons, Wikidata and Wikipedia.

🎉 According to Commons `Category:Files by upload tool`, #LinguaLibre _could be_ the 4th most productive upload tool to date.

A rapid review shows :
- Uploaded with VicuñaUploader‎ (6 C, 2551994 F)
- Uploaded with pattypan‎ (2 P, 1742036 F)
- GWToolset Batch Upload (2 C, 3 P, 1289947 F)
- Lingua Libre pronunciation‎ (245 C, est. 1,260,261 F)

Source for Lingualibre: https://lingualibre.org/wiki/LinguaLibre:Stats


Heading to the #WikiCamp2024. I will report on both my recent Wikimedia residences and the large efforts on #LinguaLibre. On the latter front, the past two years have seen some solid changes, in chronological order:
- #SignIt's recording studio fix, 0x010C & myself
- LinguaLibre revamp assessment, then core recording @Poslovitch
- Indonesian languages recordings, Ardzun
- SignIt #GSoC24 revamp, Kabiraa and myself
- LinguaLibre GSoC24 coding, Pushkar, @Poslovitch and myself
Eager to exchange on those.

Heading to Wikimania!
My journey starts in Bayonne, land of the Basque people. In Toulouse, take off for Krakow. Tonight, landing in Katowice.

Landed in Katowice for #Wikimania !
After a restful night, I am starting collaborative work on our Friday 10:00am round table « Minority languages and Wikimedian community ».

No official #Lingualibre presentation this year despite massive news.
Why ? The codebase is being revamped *right this summer* by 2 Google Summer of Code 2024 interns* in addition to an exhaustingly super productive year as a wikiresident !

*: see earlier and future toots on #GSoC24

#Wikimania GLAM training by Andrew Lih & User:JamieF.
Very well structured: they report on surveys of wiki professionals and their findings.

Most used projects:
1. Wikidata
2. Commons
3. Wikipedia
4. Wikisource 👀

Most used tools:
1. WDQS
2. QuickStatements
3. Commons Upload wizard
4. OpenRefine

#Wikimania GLAM training raising awareness on key conceptual mapping.

https://wikidata.org/wiki/WD:LOD


#Lingualibre Google Summer of Code 2024 has officially closed !

Last week I reported to @Wikimedia_Fr on Lingua Libre Django's progress this summer.

Today was focused on feature-by-feature testing of Alpha at :
- https://dev.lingualibre.org
And closing associated Phabricator tickets :
- https://phabricator.wikimedia.org/tag/lingua-libre/

Deeper tech report due on Friday!
It will set the direction for the next development cycle.

Anyway, enough Vuejs and Django for me today ! #pain

#GSoC #Languages #Wiktionary
