#Unicode

Revath S Kumar :javascript:

@[email protected] · Reply to Revath S Kumar :javascript:'s post

Wrote a small web utility to visualize the different string normalization forms of a text.

string-normalize.surge.sh/?str

Not the best design 😄, but feedback is welcome.

Desktop view of the string normalize web page, showing the NFC, NFD, NFKC and NFKD normalization forms of the text "I ♥ Köln"
Mobile view of the string normalize web page, showing the NFC, NFD and NFKC normalization forms of the text "I ♥ Köln"
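
For readers who want to reproduce the comparison locally, here is a minimal sketch in Python using the standard `unicodedata` module. It is not the code behind the utility above; the helper names `describe` and `normalization_forms` are made up for illustration, and the sample assumes "ö" starts out as the single precomposed code point U+00F6.

```python
import unicodedata

# Sample text from the screenshots; "ö" is written as precomposed U+00F6 (an assumption).
text = "I \u2665 K\u00f6ln"


def describe(s: str) -> str:
    """Return the code points of s as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


def normalization_forms(s: str) -> dict:
    """Map each of the four standard normalization form names to the normalized string."""
    return {form: unicodedata.normalize(form, s) for form in ("NFC", "NFD", "NFKC", "NFKD")}


for form, normalized in normalization_forms(text).items():
    print(f"{form}: {len(normalized)} code points: {describe(normalized)}")
```

The decomposed forms (NFD, NFKD) split "ö" into "o" plus U+0308 COMBINING DIAERESIS, so they come out one code point longer than the composed forms.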
Michel Mariani

@[email protected]

New utility in Unicopedia Sinica:
- Pan-CJK Font Variants
(port from Unicopedia Plus, with Serif/明朝体 font style instead of Sans-Serif/ゴシック体)

🔗 codeberg.org/tonton-pixel/unic

Pan-CJK Font Variants utility screenshot
Michel Mariani

@[email protected]

New utility in Unicopedia Plus:
- Unihan Phonetics

🔗 codeberg.org/tonton-pixel/unic

Unihan Phonetics utility screenshot
SnoopJ

@[email protected]

have you ever "naturally" (i.e. not discussion among experts) encountered a font that correctly renders ꙮ?

Poll (option: voters)
- yes: 0 (0%)
- no: 0 (0%)
- what the hell are you talking about: 0 (0%)
Revath S Kumar :javascript:

@[email protected]

New blog post: "JavaScript : understanding string normalize"

blog.revathskumar.com/2025/01/

:rss: Qiita - 人気の記事

@[email protected]

Unicode: blessings and nuisances (Unicode - 恩恵と厄介事)
qiita.com/chai0917/items/16fa5

:rss: Qiita - 人気の記事

@[email protected]

[Happy New Year] New Year's greetings from Oracle Active Data Guard instances deployed around the world
qiita.com/shirok/items/1da55c2

Paul McGuire

@[email protected] · Reply to Axel Rauschmayer's post

@rauschma Ah! I did something similar in Python - this is valid Python code:

def ℎ𝕖𝐥l𝙤():
    try:
        ℎ𝙚𝕝𝗹𝘰_ = "Hello"
        w𝔬𝓇ˡ𝚍﹎ = "World"
        𝖕𝘳𝒊𝖓𝑡(f"{𝗵𝒆𝘭𝓵𝚘﹍}, {𝑤º𝘳l𝑑︴}!")
    except T𝗒ₚ𝕖E𝗿𝗋𝗈𝓻 as ᵉ𝒙ⅽ:
        𝐩ᵣ𝚒𝖓𝓉("failed: {}".𝕗𝕠r𝑚𝖺𝘵(ⅇ𝔵𝚌))

if _︳n𝗮𝖒𝓮﹍︳ == "__main__":
    h𝙚ⅼ𝐥𝕠()

ptmcg.pythonanywhere.com/font_
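
Why this works: Python converts identifiers to NFKC while parsing (PEP 3131), so differently styled spellings collapse to the same plain name. A small sketch of that behaviour follows. It uses the function name from the snippet above; the `namespace` dictionary is only there for illustration.

```python
import unicodedata

# Styled identifier copied from the post above.
styled = "ℎ𝕖𝐥l𝙤"

# NFKC folds the mathematical and letterlike variants to plain ASCII letters.
canonical = unicodedata.normalize("NFKC", styled)
print(canonical)

# The parser applies the same normalization, so assigning to the styled
# name creates a variable stored under the normalized spelling.
namespace = {}
exec(f"{styled} = 'Hello'", namespace)
print("hello" in namespace)
```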

Scott Williams 🐧

@[email protected]

"This coding interview is just going to be determining the human friendly length of a unicode utf-8 string."

Junior level dev: "Oh, this is going to be easy. How do they not know about len()?"

Senior level dev: "Oh, brilliant - a test of tolerance for pain by evaluating various code point chains with emoji, accents, and LTR/RTL markers. I'll start by writing some tests for 8-bit ord and char conversions with lookahead evals."

Python docs showing how the same one letter can count for one or two character lengths in unicode depending on the code point definition.
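
To make the joke concrete, here is a rough sketch of how many different "lengths" one letter can have in Python. The function name `length_report` is invented for this example, and the `non_combining` count is only a crude stand-in for user-perceived characters: it skips combining marks but does not implement the grapheme cluster rules of UAX #29, so emoji sequences and similar cases are still miscounted.

```python
import unicodedata


def length_report(s: str) -> dict:
    """Return several competing notions of the 'length' of s."""
    return {
        # What len() reports: the number of code points.
        "code_points": len(s),
        # Size of the UTF-8 encoding in bytes.
        "utf8_bytes": len(s.encode("utf-8")),
        # Code points after composing with NFC.
        "nfc_code_points": len(unicodedata.normalize("NFC", s)),
        # Approximation only: code points that are not combining marks.
        "non_combining": sum(1 for ch in s if unicodedata.combining(ch) == 0),
    }


precomposed = "\u00e9"   # é as a single code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

print(length_report(precomposed))
print(length_report(decomposed))
```

Both strings render as the same letter, yet `len()` gives 1 for the first and 2 for the second.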
SiljeLB

@[email protected]

TIL that a proposal was made in 1997 to add to . I'm disappointed it hasn't been made official yet though. Here's a link to the proposal document: unicode.org/wg2/docs/n1641.pdf

omg! ubuntu

@[email protected]

Ubuntu LTS users will shortly be able to see and use the 8 new emoji included in Unicode 16.0.

omgubuntu.co.uk/2024/12/ubuntu

Michel Mariani

@[email protected]

In the open-source application `Unicopedia Sinica`, both data files used for the `CJK Components` and the `CJK Related` utilities are now in a consistent JSON format with MIT license: `cjk-ids.json` and `cjk-related.json` respectively.

🔗 codeberg.org/tonton-pixel/unic

These data files are still a work in progress, independently available in their own (frequently updated) repository:
🔗 codeberg.org/tonton-pixel/cjk-
🔗 codeberg.org/tonton-pixel/cjk-

CJK Related utility screenshot
CJK Components utility screenshot
SnoopJ

@[email protected]

HUH, UAX#31 offers official guidance on hashtag identifiers, and I have somehow managed to miss that completely for several years (introduced along with Unicode 11.0 in 2018).

unicode.org/reports/tr31/#hash

It's not like I re-read the whole document regularly or anything but yea huh

Aaron “#e14n pro” Madlon-Kay

@[email protected] · Reply to Aaron “#e14n pro” Madlon-Kay's post

iOS 18.2 did not add any new coverage, at least at the code point level. Nevertheless, I have updated

tofu.quest/?q=%F0%9F%A5%A8

洪 民憙 (Hong Minhee)

@[email protected]

Hello, I'm an open source software engineer in my late 30s living in Seoul, and an avid advocate of free/open source software and the fediverse.

I'm the creator of @fedify, an ActivityPub server framework in TypeScript, and @hollo, an ActivityPub-enabled microblogging software for single users.

I'm also very interested in East Asian languages (so-called CJK) and Unicode. Feel free to talk to me in English, Korean, or Japanese, or even in Literary Chinese!

Eniko | Kitsune Tails out now!

@[email protected]

Btw here's a little Unicode protip: Unicode defines several character ranges as private use areas. You can map code points in these ranges to whatever glyph you want. This can be very handy for custom characters in your game that won't conflict with established Unicode characters.

In our games we use the PUA for keyboard and controller button glyphs
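
A minimal sketch of the idea in Python, assuming a font that actually supplies glyphs at the chosen code points. The three ranges are the private use areas defined by Unicode. The `BUTTON_GLYPHS` mapping and the `render_prompt` helper are hypothetical and not taken from any particular game.

```python
import unicodedata

# The three Private Use Areas defined by the Unicode Standard.
PUA_RANGES = (
    (0xE000, 0xF8FF),      # Private Use Area in the BMP
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
)


def is_private_use(ch: str) -> bool:
    """Return True if the single character ch lies in a Private Use Area."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


# Hypothetical assignment of button names to PUA code points; the font
# shipped with the game would need glyphs at exactly these positions.
BUTTON_GLYPHS = {
    "button_a": "\ue000",
    "button_b": "\ue001",
    "key_enter": "\ue010",
}


def render_prompt(template: str) -> str:
    """Fill a prompt template such as 'Press {button_a}' with PUA glyphs."""
    return template.format(**BUTTON_GLYPHS)


prompt = render_prompt("Press {button_a} to jump")
print([f"U+{ord(ch):04X}" for ch in prompt if is_private_use(ch)])
```

Private use code points have the general category `Co`, so `unicodedata.category(ch) == "Co"` is an alternative check.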

Michael Zöllner

@[email protected]

My study "Unicode Spaces" will be published in Slanted Magazine - Experimental Type 3!

Listing of Unicode white space characters
Steamboat Willie formed with whitespace characters in text.
Flower formed with whitespace characters in text.
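
For anyone curious which characters such a listing contains, the sketch below prints every code point with the general category `Zs` (space separator) known to the Unicode database bundled with the running Python. It is not the listing from the study, and the function name `space_separators` is made up here. The full `White_Space` property also covers a few control characters plus the line and paragraph separators, which this simple filter leaves out.

```python
import sys
import unicodedata


def space_separators() -> list:
    """Return all characters whose general category is Zs (space separator)."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Zs"
    ]


for ch in space_separators():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```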
Marcus Rohrmoser 🌻

@[email protected] · Reply to zirias (on snac)'s post

@zirias @stefano ​s are defined: unicode.org/reports/tr31/#D2

read 'em like this codeberg.org/seppo/seppo/src/c

Terence Eden

@[email protected]

iOS 14 gets support for the Unicode Power Symbol!

shkspr.mobi/blog/2020/09/ios-1

Jim DeLaHunt

@[email protected] · Reply to Jim DeLaHunt's post

A cool change is that the Core Specification of the Unicode Standard is now released as a static HTML subsite, backed up by an archivable PDF of 1,140 pages.

unicode.org/versions/Unicode16

You can now link to specific sections and paragraphs, e.g.

"Unicode is about plain text, see: unicode.org/versions/Unicode16" .

I helped out in a small way with the project to produce the core spec as HTML + PDF. I think it is a marvellous improvement.

Jim DeLaHunt

@[email protected]

Yay! Unicode version 16.0 is released!

Announcement: blog.unicode.org/2024/09/annou

liilliil 🇫🇯🇱🇨

@[email protected]

Folks, let's push our own Slavic, Cyrillic !
"Three snowflakes" (⁂) is a potential excuse for endless ribbing.

The Polish folks (@brie) found a better candidate: ꙮ, the "many-eyed seraph" (серафим многꙮкий). A symbol found in 1928 in only one (!) manuscript, and added to Unicode for that reason alone (!), waited several centuries for its moment.
ru.wikipedia.org/wiki/Мультиок

(English version im-in.space/@liilliil/11302839 )

AmyFou 🥥🌴

@[email protected]

I am a (non-tenure track, uni) interested in every single thing about , esp ones, & Side gig in ( lol). I love and will ask you too many questions about your etc . Proud fan. Love 👋

洪 民憙 (Hong Minhee)

@[email protected] · Reply to 洪 民憙 (Hong Minhee)'s post

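Hello, I'm an open source software engineer in my late 30s living in Seoul, and an avid advocate of free/open source software and the fediverse. My name is 洪 民憙 (Hong Minhee).

I'm also the creator of @fedify, an ActivityPub server framework for TypeScript, and @hollo, a single-user fediverse microblog.

I'm also very interested in East Asian languages (so-called CJK) and Unicode. Feel free to talk to me in Japanese, English, or Korean. (Or even in Literary Chinese!)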

洪 民憙 (Hong Minhee)

@[email protected]

Hello, I'm an open source software engineer in my late 30s living in Seoul, and an avid advocate of free/open source software and the fediverse.

I'm the creator of @fedify, an ActivityPub server framework in TypeScript, and @hollo, a fediverse microblog for single users.

I'm also very interested in East Asian languages (so-called CJK) and Unicode. Feel free to talk to me in English, Korean, or Japanese, or even in Literary Chinese (漢文)!

Chunshek :FlyingToaster:

@[email protected]

post for my own Mastodon instance!

• I’m a 43-year-old jack-of-all-trades.
• I grew up in , lived in the . My partner of 14 years and I moved to in 2020.
• We are “parents” to one remaining dog.
• I have worked in journalism, finance, L&D, and now EdTech.
• I speak 6 , and have dabbled in many others.
• Things I will nerd out about: , , .
• I am a person of faith, but not a fan of organized religions.
• I type in .

A man happily holding a ripe yellow pineapple in his left hand, while pointing at the pineapple with his right hand, smiling at the camera.
A man standing in front of a wall covered in dozens of containers of various types of instant ramen and udon noodles. The man's facial expression shows amusement.
A man kneels down next to two tilted mailboxes in Taipei, Taiwan, pretending to be carrying one of the mailboxes on his back.
A top-down shot of a man lying down, looking into the eyes of a shiba inu dog. The dog has curled up into a resting position.
洪 民憙 (Hong Minhee)

@[email protected] · Reply to 洪 民憙 (Hong Minhee)'s post

If you believe that Chinese characters in Chinese, Japanese, and Korean should all be divided into language-specific codes, then it is logical that the Latin characters in English, French, Italian, and German should all be divided into language-specific codes as well. Caveat: I don't believe so.

洪 民憙 (Hong Minhee)

@[email protected]

Well, I vote for Han unification of , and I rather think that more Chinese characters should have been unified (e.g., 高 & 髙, 產 & 産, 內 & 内). 🤷
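
As a small illustration of what "not unified" means here, the sketch below (Python, standard library only) looks at the three example pairs from the post. Each member of a pair is its own code point, and normalization does not fold one into the other. The `compare` helper is invented for this example.

```python
import unicodedata

# Variant pairs quoted in the post above.
pairs = [("高", "髙"), ("產", "産"), ("內", "内")]


def compare(a: str, b: str) -> dict:
    """Report the code points of a and b and whether NFKC makes them equal."""
    return {
        "a": f"U+{ord(a):04X}",
        "b": f"U+{ord(b):04X}",
        "same_after_nfkc": unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b),
    }


for a, b in pairs:
    print(a, b, compare(a, b))
```

Software that wants to treat such pairs as equivalent therefore needs its own variant table. The normalization forms alone will not do it.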

xChaos

@[email protected]

Tired of googling Unicode characters for subscript and superscript? So am I :-)

The key chords for typing superscripts and subscripts (in the Unicode sense) on a Windows keyboard are easy to google. Under Linux, however, it's a bit of a science:

1) first Right Alt + Right Shift + Backspace + 2 (yes, a four-key chord)
2) then the character that should become the subscript, for example a digit (which, on the Czech layout you have switched to, also needs Shift, so that's a two-key chord).

H₂O

For a superscript, use the same four-key chord but replace the 2 with a 3:

a² + b² = c²

Decent chords, right? The problem is that if you don't press the four-key chord precisely (?), the Backspace tends to act as a backspace and deletes a character... in short, for now it takes me several attempts every time :-)

I didn't understand the guide at all
abclinuxu.cz/blog/kenyho_stesk
... probably because I don't know which PC key is the "compose key", but in the reader comments I noticed instructions for the Slovak keyboard; they work for me on the Czech layout too, so I'm passing them on.
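
When the text is generated by a program rather than typed, the same characters can be produced without any key chords. A minimal sketch in Python, with invented helper names `to_subscript` and `to_superscript`, covering digits only:

```python
# Translation tables from ASCII digits to Unicode subscript and superscript digits.
# Note that superscript 1, 2 and 3 sit in Latin-1 (U+00B9, U+00B2, U+00B3),
# while the remaining superscripts and all subscripts are in U+2070..U+2089.
SUBSCRIPT = str.maketrans(
    "0123456789",
    "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089",
)
SUPERSCRIPT = str.maketrans(
    "0123456789",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079",
)


def to_subscript(digits: str) -> str:
    """Replace ASCII digits with their subscript forms."""
    return digits.translate(SUBSCRIPT)


def to_superscript(digits: str) -> str:
    """Replace ASCII digits with their superscript forms."""
    return digits.translate(SUPERSCRIPT)


print("H" + to_subscript("2") + "O")
print(" + ".join(v + to_superscript("2") for v in ("a", "b")) + " = c" + to_superscript("2"))
```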

SnoopJ

@[email protected]

the most important part of history is when a mouse fell out of a light fixture and got added to the count of members present at a Technical Committee meeting (9 Nov 2016)

unicode.org/L2/L2016/16325.htm

Screenshot of meeting notes for UTC Meeting 149. Text reads:

Mouse now present. 6.502 members represented.

[149-A94] Action Item for Landlord: Capture and exile the mouse that just fell out of the light fixture.
Nemo_bis 🌈

@[email protected]

Re-: recurring topics here.

.net

1/4

Michel Mariani

@[email protected]

Unicopedia Ægypta is a developer-oriented set of utilities related to Egyptian hieroglyphs, wrapped into one single app, built with Electron.

Repository: 🔗 codeberg.org/tonton-pixel/unic

Unicopedia Ægypta Social Preview
Michel Mariani

@[email protected]

Unicopedia Plus is a developer-oriented set of Unicode, Unihan, Unikemet & emoji utilities wrapped into one single app, built with Electron.

Repository: 🔗 codeberg.org/tonton-pixel/unic

Unicopedia Plus Social Preview
Michel Mariani

@[email protected]

Unicopedia Sinica is a developer-oriented set of utilities related to ideographs, wrapped into one single app, built with Electron.

Repository: 🔗 codeberg.org/tonton-pixel/unic

Unicopedia Sinica Social Preview
꧁ᐊ𰻞ᵕ̣̣̣̣̣̣́́♛ᵕ̣̣̣̣̣̣́́𰻞ᐅ꧂

@[email protected]

New 2d numeral system just dropped‽‽‽

It's based on ᚛ᚑᚌᚐᚋ᚜ & ☯ & bijective base 6, & works left→right or left←right

Aaron “#e14n pro” Madlon-Kay

@[email protected]

Newly covered code points in 17.0:

ᜍ᜕ᜟ

My tooling also indicated that these are covered, but they don't actually show up on my iPhone:

􀑝

Gerrit Imsieke

@[email protected]

Formatting people’s names correctly in a given context, for a given purpose, is hard. International linguists recently helped update the Common Locale Data Repository (CLDR). It will help programmers display person names correctly in many settings.
Mike McKenna wrote about it in “A Story Teller’s Case Study: Unlocking the Power of CLDR Person Name Formatting – A Solution for Formatting Names in a Globalized World” unicode.org/media/CLDR_Person_