Fix mojibake in Unicode text, after the fact

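ftfy is a Python package that specializes in detecting and repairing UTF-8 encoding errors (mojibake). It can also fix multi-layered encoding errors and HTML entity problems, restoring incorrectly decoded text to the string that was originally intended. It is useful for text cleaning in AI data preprocessing and NLP research, and it is distributed under the Apache license, which requires attribution when used.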
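
To make the failure mode concrete, here is a minimal sketch that produces one layer of mojibake and reverses it using only the standard library. It is not ftfy's algorithm (ftfy uses heuristics to decide whether and how to repair text); the helper names `make_mojibake` and `undo_mojibake` and the choice of Latin-1 as the wrong encoding are assumptions made for illustration.

```python
# Minimal sketch: one layer of mojibake and its reversal (standard library only).
# NOT ftfy's algorithm; the helper names and the Latin-1 default are illustrative.

def make_mojibake(text: str, wrong_encoding: str = "latin-1") -> str:
    """Encode as UTF-8, then decode the bytes with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


def undo_mojibake(garbled: str, wrong_encoding: str = "latin-1") -> str:
    """Reverse one layer; return the input unchanged if the round trip fails."""
    try:
        return garbled.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not produced by this particular mix-up, so leave it alone.
        return garbled


original = "café"
garbled = make_mojibake(original)    # "é" becomes the two characters "Ã©"
restored = undo_mojibake(garbled)    # back to "café"
double = make_mojibake(garbled)      # a second layer of the same error

print(garbled)
print(restored)
print(undo_mojibake(undo_mojibake(double)))
```

The naive reversal only works when you already know which wrong encoding was applied and how many times, which is the guesswork a tool like ftfy is meant to take over (its main entry point is `ftfy.fix_text`).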

https://github.com/rspeer/python-ftfy

#python #textprocessing #unicode #mojibake #nlp

GitHub - rspeer/python-ftfy: Fixes mojibake and other glitches in Unicode text, after the fact.

@silverpill @Profpatsch @hongminhee @liaizon @Edent @north @aumetra
I have considered publishing an FEP about #GloballyInclusiveHandles . At FediForum six months ago I got the advice to write three:
1. Advocating for #GloballyInclusive handles and laying out requirements and issues
2. Explaining prior art from #Unicode technical annexes on domain names and identifiers, #ICANN label generation rules for DNS, #UniversalAcceptance, email addresses, etc.
3. Advocating for linkification of globally inclusive handles and laying out requirements and issues.
Do those sound like good FEPs to write at this point?

ASCII Chessboard, No HTML Required - Sometimes, when I have absolutely nothing to do, I play with ASCII characters in vim. Today I made an ASCII chessboard with black and white chess pieces. I'm pretty sure I'm not the first one to make an ASCII chessboard and I won't be the last. I thought it looked pretty nice, so I wanted to share it on my blog.

Full blog post at https://sava.rocks/blog/ascii-chessboard-no-html-required/

#ascii #unicode #chess

Curious what a character limit can actually mean. On Twitter it's bytes, so two-byte Unicode characters eat up the allocation fast, but on Bluesky it seems to be by glyphs, so what had to be trimmed for Twitter has lots of legroom on Bluesky.
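
A rough sketch of why the unit matters: the same string has a different "length" depending on whether you count UTF-8 bytes, UTF-16 code units, or code points. Which unit any given platform actually counts is not confirmed here, and the helper name `length_report` is made up for this example.

```python
# Sketch: one string, three different "lengths" (standard library only).
# Which unit a given platform counts is an assumption, not a confirmed fact.

def length_report(text: str) -> dict:
    """Return the size of `text` in three common counting units."""
    return {
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        "code_points": len(text),
    }


# WOMAN + ZWJ + WOMAN + ZWJ + GIRL: usually drawn as a single family glyph.
family = "\U0001F469\u200D\U0001F469\u200D\U0001F467"

for sample in ("naive", "na\u00efve", family):
    print(repr(sample), length_report(sample))
```

Counting glyphs the way a reader sees them (extended grapheme clusters) needs Unicode text segmentation, which Python's standard library does not provide, so that count is left out of the sketch.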

#Tech #character #Unicode

The latest version 2.0.0 of the open-source application "Unicopedia Symbolica" (previously part of the "Unicopedia Plus" application) adds a new "Emoji Taxonomy" utility.

🔗 https://codeberg.org/tonton-pixel/unicopedia-symbolica

#Unicopedia #Symbolica #Unicode #Emoji #Taxonomy

@dbattistella
@inthehands
A #VultureEmoji has been officially “Under Consideration” by the #Emoji people at #Unicode since 2019

These hard-working birds would be a great addition to the other avian emoji 🦆🐦‍⬛🦅🦉🪿🐦🐧🐔🐥🐣
https://docs.google.com/document/d/1hU8yWK8U6jcMjjxR8DKYA8VI3F0xsQsCcpNaufnBnh0/

VULTURE Emoji Proposal

Proposal for Emoji: VULTURE Submitters: 'Álvaro Flórez Estrada' Public School, Pola de Somiedo, Spain; Matumaini ONG, Oviedo, Spain; Mwema Street Children Centre-Karatu (MSCCK), Karatu, Tanzania; Dr. Patricia Mateo-Tomás, Biodiversity Research Unit (UMIB, CSIC-UO-PA), Oviedo, Spain and Centre ...

Unicode support was once a great strength of the Go programming language: two of its founders developed UTF-8. The current state of its Unicode support is not looking good: bugs don't get fixed, and there are almost no more commits in x/text/unicode. :-(

#golang #unicode

Today I learned that the 64 I Ching hexagram symbols are all included in Unicode.

I'm fairly sure I'll never need to use them, so I'll add this to my ever-growing list of probably useless bits of knowledge.
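
For anyone who wants to see them, here is a small sketch that lists all 64 using the standard library. It assumes the symbols sit in one contiguous run starting at U+4DC0 (the "Yijing Hexagram Symbols" block); the constant name `HEXAGRAM_START` is just for this example.

```python
# Sketch: print the 64 I Ching hexagram symbols and their Unicode names.
# Assumes the contiguous block U+4DC0..U+4DFF ("Yijing Hexagram Symbols").
import unicodedata

HEXAGRAM_START = 0x4DC0  # illustrative constant name

hexagrams = [chr(HEXAGRAM_START + i) for i in range(64)]
names = [unicodedata.name(ch) for ch in hexagrams]

# Show the symbols as an 8 x 8 grid.
for row in range(0, 64, 8):
    print(" ".join(hexagrams[row:row + 8]))

# Show the first few code points with their official names.
for ch, name in list(zip(hexagrams, names))[:3]:
    print(f"U+{ord(ch):04X} {ch} {name}")
```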

https://en.wikipedia.org/wiki/List_of_hexagrams_of_the_I_Ching

#unicode #hexagram #symbols

List of hexagrams of the I Ching - Wikipedia

Ever wanted to create your own emoji? Now’s your chance
Graphic designer Jennifer Daniel helps Unicode decide on new emojis — and says a good emoji should be like a Swiss army knife.
https://www.cbc.ca/radio/thecurrent/call-for-new-emoji-9.7178355?cmp=rss