Hashtag

#Unicode

415 posts tagged with this hashtag.

@mikaeru@mastodon.social

The latest version 2.3.0 of the open-source application "Unicopedia Symbolica" introduces a new Language drop-down menu in the "Emoji Data Finder" utility, which lets you display the short name and keywords of all the emoji in 170 languages, including the ones whose direction is Right-To-Left (RTL).

🔗 codeberg.org/tonton-pixel/unic

The linguistic data comes from the Unicode CLDR Project:

🔗 cldr.unicode.org/

And all contributions to it are very welcome!

Screenshot of the "Filter Text" feature of the "Emoji Data Finder" utility of the "Unicopedia Symbolica" application, with "Language" set to "Arabic".

Screenshot of the "Find by Name" feature of the "Emoji Data Finder" utility of the "Unicopedia Symbolica" application, with "Language" set to "French".

@jdlh@mstdn.ca · Reply to silverpill

@silverpill @Profpatsch @hongminhee @liaizon @Edent @north @aumetra
I have considered publishing an FEP about . At FediForum six months ago I got the advice to write three:
1. Advocating for handles and laying out requirements and issues
2. Explaining prior art from technical annexes on domain names and identifiers, label generation rules for DNS, , email addresses, etc.
3. Advocating for linkification of globally inclusive handles and laying out requirements and issues.
Do those sound like good FEPs to write at this point?

@SavaRocks@mastodon.social

ASCII Chessboard, No HTML Required - Sometimes, when I have absolutely nothing to do, I play with ASCII characters in vim. Today I made an ASCII chessboard with black and white chess pieces. I'm pretty sure I'm not the first one to make an ASCII chessboard, and I won't be the last. I thought it looked pretty nice, so I wanted to share it on my blog.

Full blog post at sava.rocks/blog/ascii-chessboa

ASCII Chessboard
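The piece glyphs in a board like this are not strictly ASCII. They are Unicode chess symbols: U+2654–U+2659 for white and U+265A–U+265F for black. Here is a minimal Python sketch, not the blog author's method, that prints the starting position. The "." empty-square marker and the helper names are hypothetical choices.

```python
import unicodedata

# Unicode chess symbols: U+2654..U+2659 are the white King, Queen, Rook,
# Bishop, Knight, Pawn; U+265A..U+265F are the black pieces in the same order.
WHITE = dict(zip("KQRBNP", map(chr, range(0x2654, 0x265A))))
BLACK = dict(zip("KQRBNP", map(chr, range(0x265A, 0x2660))))
BACK_RANK = "RNBQKBNR"  # piece order on ranks 1 and 8, files a..h


def starting_board() -> list[str]:
    """Return the initial position as plain-text lines, rank 8 first."""
    rows = []
    for rank in range(8, 0, -1):
        if rank == 8:
            squares = [BLACK[p] for p in BACK_RANK]
        elif rank == 7:
            squares = [BLACK["P"]] * 8
        elif rank == 2:
            squares = [WHITE["P"]] * 8
        elif rank == 1:
            squares = [WHITE[p] for p in BACK_RANK]
        else:
            squares = ["."] * 8  # empty-square placeholder (an arbitrary choice)
        rows.append(f"{rank} " + " ".join(squares))
    rows.append("  a b c d e f g h")
    return rows


if __name__ == "__main__":
    print("\n".join(starting_board()))
    print(unicodedata.name(WHITE["K"]))  # WHITE CHESS KING
```

Column alignment depends on the font. Some platforms may also draw U+265F BLACK CHESS PAWN as an emoji rather than a text symbol.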

@mikaeru@mastodon.social

Unicode Emoji: Money, Money, Money...

• <U+1F4B6> euro banknote
• <U+1F4B4> yen banknote
• <U+1F4B7> pound banknote
• <U+1F4B5> dollar banknote
• <U+1FA99> coin
• <U+1F4B0> money bag
• <U+1F4B8> money with wings
• <U+1F911> money-mouth face

Unicode Emoji: Money, Money, Money...

💶💴💷💵
🪙💰💸🤑
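The names in the list above are CLDR short names, and the formal Unicode character names differ. A small Python sketch using the standard `unicodedata` module prints the code point together with the formal name. Its results depend on the Unicode data bundled with the interpreter: U+1FA99 COIN needs Unicode 13.0 data, i.e. Python 3.9 or later. `describe` is a hypothetical helper.

```python
import unicodedata

# The code points listed in the post.
MONEY = [0x1F4B6, 0x1F4B4, 0x1F4B7, 0x1F4B5, 0x1FA99, 0x1F4B0, 0x1F4B8, 0x1F911]


def describe(cp: int) -> str:
    """Format a code point as 'U+XXXX <char> FORMAL NAME'."""
    ch = chr(cp)
    return f"U+{cp:04X} {ch} {unicodedata.name(ch, '<unnamed>')}"


for cp in MONEY:
    print(describe(cp))  # e.g. U+1F4B6 💶 BANKNOTE WITH EURO SIGN
```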

@w3cdevs@w3c.social

A character encoding defines how text is stored as bytes. If you’ve seen “cafÃ©” instead of “café,” that’s an encoding mismatch. This intro covers how encoding works, how Unicode and UTF-8 relate, and why mistakes break apps and websites. It also matters beyond , affecting readability, search, data exchange, and whether people see text correctly online.

🎬 Watch @xfq, @w3c's Internationalization Lead, explain what character encoding is: youtu.be/y2ay7otbFWk

The word "café" is written as “cafÃ©” (on the left) instead of “café” (on the right)
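The “cafÃ©” symptom can be reproduced in a few lines of Python: encode the text as UTF-8, then decode it with the wrong codec. This sketch assumes the reader guessed Windows-1252 (`cp1252`). `mojibake` and `repair` are hypothetical helper names.

```python
# A UTF-8 encoder writes "é" (U+00E9) as the two bytes 0xC3 0xA9.
# A reader that wrongly assumes Windows-1252 (or Latin-1) turns each byte
# into its own character: 0xC3 -> "Ã", 0xA9 -> "©".

def mojibake(text: str, wrong_codec: str = "cp1252") -> str:
    """Encode as UTF-8, then decode with the wrong codec."""
    return text.encode("utf-8").decode(wrong_codec)


def repair(garbled: str, wrong_codec: str = "cp1252") -> str:
    """Undo a single UTF-8 -> wrong_codec mis-decoding, if it is reversible."""
    return garbled.encode(wrong_codec).decode("utf-8")


print(mojibake("café"))          # cafÃ©
print(repair(mojibake("café")))  # café
```

The repair only works when exactly one wrong decoding happened and no bytes were lost. Windows-1252 also leaves a few byte values (such as 0x81) undefined, so some UTF-8 text raises `UnicodeDecodeError` instead of producing mojibake.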

@mikaeru@mastodon.social · Reply to Michel Mariani

Full members (voting) of the Unicode Consortium (2026-04-20): Adobe, Airbnb, Apple, Google, Meta, Microsoft, Salesforce, Translated.

🔗 home.unicode.org/membership/me

Compared to the full members list dated 2026-04-04, Amazon has disappeared and Google (re-)appeared. Great "substitution" magic trick indeed!

On a side note, the HTML page source code indicates:

<!-- List generated: 2026-04-20, 16:07:01 GMT -->

and the UTC #187 meeting starts tomorrow (2026-04-21 to 2026-04-23)...

Full members (voting) of the Unicode Consortium (2026-04-20): Adobe, Airbnb, Apple, Google, Meta, Microsoft, Salesforce, Translated.

@seanpm2001@techhub.social

I am not getting very good results online, so I am asking here. Is it possible to apply bold/italic to only the combining character of a letter (such as the ̈ within ö)?
The character did not render correctly in my post: after pasting, the mark attached itself to the "e" (it didn't while I was drafting this). I am referring to the combining character in the included image, U+0308 (COMBINING DIAERESIS).
I have been working on a large linguistic dictionary, and I am interested in doing something like this.

O with diaeresis
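In plain text, a combining mark has no styling of its own. In markup you can at least isolate it. The Python sketch below decomposes text to NFD, so ö becomes o + U+0308, and wraps each mark (general category M*) in a tag. `emphasize_marks` is a hypothetical helper. Whether a browser or font actually renders a bold mark over a regular base letter depends on the implementation: many renderers will ignore the styling or detach the mark.

```python
import unicodedata


def emphasize_marks(text: str, tag: str = "b") -> str:
    """Decompose text (NFD) and wrap each combining mark in an HTML tag."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch).startswith("M"):  # Mn, Mc or Me
            out.append(f"<{tag}>{ch}</{tag}>")
        else:
            out.append(ch)
    return "".join(out)


print(unicodedata.normalize("NFD", "ö") == "o\u0308")  # True
print(ascii(emphasize_marks("ö")))  # 'o<b>\u0308</b>'
```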

@mikaeru@mastodon.social

While implementing a file drag-and-drop feature in one of my Electron-based apps, I fortuitously found an issue in the Electron framework which I believe could be a major security hole... Fortunately, this was not too difficult to fix, but I still don't understand why this has been overlooked so far...

All applications have been corrected and can be downloaded from my Codeberg repository:
🔗 codeberg.org/tonton-pixel/

List of open-source desktop applications from the Codeberg repository:
https://codeberg.org/tonton-pixel/

@SnoopJ@hachyderm.io · Reply to SnoopJ

More specifically, this PR exposes a curious side effect of the Unicode 15.0 → Unicode 15.1 upgrade when it comes to identifiers: ZWJ is now allowed as a 'continue' character (i.e., you can use it in an identifier as long as it's not the first codepoint)

```
$ python3.12 -c 'print(str.isidentifier("A_\u200d_B"))'
False
$ python3.13 -c 'print(str.isidentifier("A_\u200d_B"))'
True
$ python3.13 -c 'print(str.isidentifier("A_\u200d"))' # unfortunately, a trailing ZWJ is legal too
True
```

github.com/python/cpython/pull

github.com

gh-109559: Update `unicodedata` for Unicode 15.1 by SnoopJ · Pull Request #109560 · python/cpython

This changeset implements #109559, adding Unicode 15.1 support to the internal databases that support the unicodedata module. The bulk of this Unicode update is the addition of a new CJK Ideograph ...
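`str.isidentifier` follows whatever Unicode version the interpreter's `unicodedata` ships (`unicodedata.unidata_version`). A project that wants to reject joiners in identifiers on every Python version therefore needs its own check. Here is a minimal sketch of such a lint rule. `suspicious_identifier` is a hypothetical name, not part of CPython.

```python
import unicodedata

ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def suspicious_identifier(name: str) -> bool:
    """Hypothetical lint rule: a valid identifier that contains ZWJ or ZWNJ."""
    return name.isidentifier() and (ZWJ in name or ZWNJ in name)


print(unicodedata.unidata_version)          # e.g. 15.1.0 on Python 3.13
print(suspicious_identifier("A_\u200d_B"))  # True on 3.13, False on 3.12
```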

@mikaeru@mastodon.social

The latest version v18.1.0 of the open-source application "Unicopedia Sinica" is now available, embedding all data files required to display CJK ideographs as SVG glyphs in the "CJK Sources" and "CJK Variations" utilities...

🔗 codeberg.org/tonton-pixel/unic

Unicopedia Sinica - CJK Sources utility screenshot
Four simplified and traditional "Love" characters

Unicopedia Sinica - CJK Variations utility screenshot
Japanese glyph variations on "Love" character

@mikaeru@mastodon.social · Reply to Michel Mariani

Full members (voting) of the Unicode Consortium (2026-04-04): Adobe, Airbnb, Amazon, Apple, Meta, Microsoft, Salesforce, Translated.

🔗 home.unicode.org/membership/me

Adobe is back too! Just in time for Easter Day. Maybe a sign from heaven...

Full members (voting) of the Unicode Consortium (2026-04-04): Adobe, Airbnb, Amazon, Apple, Meta, Microsoft, Salesforce, Translated.

@mikaeru@mastodon.social

- Technically speaking, Khitan Small Script and Yi script are not included (yet) in the data for non-Han ideographic scripts.

- The Jurchen and Seal scripts are poised to be officially added to Unicode 18.0 in September 2026...

- BabelStone (Andrew West) reference links:
🔗 babelstone.co.uk/Jurchen/
🔗 babelstone.co.uk/Khitan/
🔗 babelstone.co.uk/Yi/

babelstone.co.uk

Babel Stone : Yi

@mikaeru@mastodon.social

About two-thirds of the Unicode 17.0 standard characters originate from China, most of them ideographic in nature, and they are therefore largely over-represented...

Ideographic: 110,943
Han: 103,351
Non-Han (Khitan Small Script + Nüshu + Tangut + Yi): 9,148
Han + Non-Han: 112,499
Standard: 159,799

Ideographic / Standard: 69.43 %
(Han + Non-Han) / Standard: 70.40 %

UAX #38: Unicode Han Database (Unihan)
unicode.org/reports/tr38/

UAX #60: Data for non Han Ideographic Scripts
unicode.org/reports/tr60/

unicode.org

UAX #60: Data for non Han Ideographic Scripts
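The sums and ratios above can be re-derived from the raw counts. Here is a quick Python check that uses only the numbers given in the post.

```python
# Counts as given in the post (Unicode 17.0).
counts = {
    "Ideographic": 110_943,
    "Han": 103_351,
    "Non-Han": 9_148,
    "Standard": 159_799,
}

han_plus_non_han = counts["Han"] + counts["Non-Han"]


def pct(part: int, whole: int) -> str:
    """Percentage with two decimals, formatted like the post."""
    return f"{100 * part / whole:.2f} %"


print(han_plus_non_han)                                # 112499
print(pct(counts["Ideographic"], counts["Standard"]))  # 69.43 %
print(pct(han_plus_non_han, counts["Standard"]))       # 70.40 %
```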

@v_i_o_l_a@openbiblio.social

"graphic languages: a visual guide to the world’s writing systems" – a wonderful book for -nerds like me. 😊
slanted.de/product/graphic-lan

Photo of the cover of the book "graphic languages: a visual guide to the world’s writing systems"

Photo of a two-page spread from the book "graphic languages: a visual guide to the world’s writing systems"

Photo of a two-page spread from the book "graphic languages: a visual guide to the world’s writing systems"

Photo of a two-page spread from the book "graphic languages: a visual guide to the world’s writing systems"

@Edent@mastodon.social

Which of these symbols do you think *best* represents the concept of "copy"?

That is, if you click it, something will be copied to your clipboard.

(Other suggestions welcome if they are in Unicode.)

  • 18 (5%)
  • 139 (39%)
  • 2 (1%)
  • 194 (55%)
@mikaeru@mastodon.social

The latest version 3.5.0 of the open-source application "Unicopedia Ægypta" adds a new "Cross-Referenced" field to the "Unikemet Inspector" utility.

🔗 codeberg.org/tonton-pixel/unic

It relies on the important "Unikemet" database, an impressive work still in progress... Feedback is welcome!

Public Review Issue #538: Proposed Update UAX #57, Unicode Egyptian Hieroglyph Database (Unikemet)

unicode.org/review/pri538/
unicode.org/reports/tr57/tr57-

Screenshot of the Unikemet Inspector utility of the Unicopedia Ægypta application

@sleepycat@infosec.exchange

"The invisible characters were devised decades ago and then largely forgotten. That is, until 2024, when hackers began using the characters to conceal malicious prompts fed to AI engines. While the text was invisible to humans and text scanners, had little trouble reading them and following the malicious instructions they conveyed."

arstechnica.com/security/2026/

arstechnica.com

Supply-chain attack using invisible code hits GitHub and other repositories

Unicode that's invisible to the human eye was largely abandoned—until attackers took notice.
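The excerpt does not say exactly which characters were used, so the Python scanner below is only a sketch under assumptions. It flags every format character (general category Cf), plus the Tags block and the variation-selector ranges, which are commonly cited hiding places. Tag characters U+E0020–U+E007E mirror printable ASCII at an offset of 0xE0000. `decode_tags`, a hypothetical helper, uses that offset to reveal hidden text.

```python
import unicodedata

# Ranges assumed worth flagging; an illustrative list, not an exhaustive one.
SUSPICIOUS_RANGES = [
    (0xE0000, 0xE007F),  # Tags block
    (0xFE00, 0xFE0F),    # Variation Selectors
    (0xE0100, 0xE01EF),  # Variation Selectors Supplement
]


def find_invisible(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for format chars (Cf) and the ranges above."""
    hits = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        in_range = any(lo <= cp <= hi for lo, hi in SUSPICIOUS_RANGES)
        if in_range or unicodedata.category(ch) == "Cf":
            hits.append((i, f"U+{cp:04X}"))
    return hits


def decode_tags(text: str) -> str:
    """Recover ASCII hidden as Tag characters (U+E0020..U+E007E = ' '..'~')."""
    return "".join(
        chr(ord(ch) - 0xE0000) for ch in text if 0xE0020 <= ord(ch) <= 0xE007E
    )


hidden = "ok" + "".join(chr(0xE0000 + ord(c)) for c in "hi")
print(find_invisible(hidden))  # [(2, 'U+E0068'), (3, 'U+E0069')]
print(decode_tags(hidden))     # hi
```

Expect false positives. Emoji ZWJ sequences contain U+200D, and many emoji use U+FE0F, so a real tool would need an allow-list.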

@alainmi11@mamot.fr

I just learned something.

The little flag symbols we find on our keyboards alongside all the other emoji… well, they are NOT single characters (like the other emoji) but combinations of 2 characters taken from the "Regional Indicator Symbol" family (compart.com/fr/unicode/search? ), following the two-letter country codes of the ISO standard fr.wikipedia.org/wiki/ISO_3166

1/2

Screenshot of part of the Unicode characters in the "Regional Indicator Symbol" family, arranged in 3 columns:
– the first column shows a preview of the character, e.g. 🇦
– the second column gives the character's Unicode code point, e.g. U+1F1E6
– and the 3rd column gives the character's name, e.g. Regional Indicator Symbol Letter A

Screenshot of part of the two-letter country codes from the ISO 3166-1 standard, for example:
– AF = Afghanistan
– BE = Belgium
– ES = Spain
etc.
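The mechanism is simple enough to sketch in Python. Each letter of the ISO 3166-1 alpha-2 code maps to a Regional Indicator Symbol at an offset from U+1F1E6 (letter A). `flag` is a hypothetical helper.

```python
# REGIONAL INDICATOR SYMBOL LETTER A is U+1F1E6; letters B..Z follow in order.
RIS_A = 0x1F1E6


def flag(country_code: str) -> str:
    """Build a flag emoji sequence from an ISO 3166-1 alpha-2 code, e.g. 'BE'."""
    code = country_code.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError(f"expected two ASCII letters, got {country_code!r}")
    return "".join(chr(RIS_A + ord(c) - ord("A")) for c in code)


be = flag("be")
print(be, len(be), [f"U+{ord(c):04X}" for c in be])
# the list printed is ['U+1F1E7', 'U+1F1EA']
```

Whether the pair is displayed as a flag depends on the platform. Unsupported pairs typically fall back to two separate letter symbols.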

@steven@zeroes.ca · Reply to Steven 💚

Want to know how Unicode handles Braille?

One code point for each possible glyph.

Not character. Glyph.

There is one code point for each of the 2⁸ (256) possible combinations you can punch out of an 8-dot braille pad.

That means the Unicode code point for ⠜ can represent literally 18 different characters!

en.wiktionary.org/wiki/%E2%A0%

en.wiktionary.org

⠜ - Wiktionary, the free dictionary
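The 256-pattern layout is systematic: dot n corresponds to bit n-1 of the offset from U+2800 BRAILLE PATTERN BLANK. A small Python sketch with hypothetical helper names maps between dot sets and characters. It encodes only the dot pattern, never its meaning. That is why a single character like ⠜ can stand for different letters in different braille codes.

```python
import unicodedata

BRAILLE_BASE = 0x2800  # U+2800 BRAILLE PATTERN BLANK


def braille(dots: set[int]) -> str:
    """Map raised dots (numbered 1-8) to the Braille Patterns character."""
    if not dots <= set(range(1, 9)):
        raise ValueError("dots must be between 1 and 8")
    # Dot n sets bit n-1, so the block spans exactly 2**8 = 256 code points.
    return chr(BRAILLE_BASE + sum(1 << (d - 1) for d in dots))


def dots_of(ch: str) -> set[int]:
    """Inverse mapping: which dots are raised in a Braille Patterns character."""
    offset = ord(ch) - BRAILLE_BASE
    if not 0 <= offset <= 0xFF:
        raise ValueError("not a Braille Patterns character")
    return {d for d in range(1, 9) if offset & (1 << (d - 1))}


ch = braille({3, 4, 5})
print(ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
# ⠜ U+281C BRAILLE PATTERN DOTS-345
```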

@steven@zeroes.ca · Reply to Steven 💚

Anyways, here's my challenge to somebody.

The CJK space in Unicode represents ~100,000 Chinese, Japanese, and/or Korean characters.

Can we universalize the Kanji?

鹿 is the character for the animal deer. How can we make that readable to the rest of the world?

Who is a bad enough dude to make a CJK emoji font?

@steven@zeroes.ca · Reply to Steven 💚

Musqueam language literally uses the North American Phonetic Alphabet.

Is Unicode going to add a hən̓q̓əmin̓əm̓ block?

- Of course not!

Saanich language uses a modified version of IPA.

Is Unicode going to add a SENĆOŦEN block?

- Of course not! Saanich gets five supplementary characters and they'll be happy about it.

Would it be possible to represent both of these phonetic alphabets by sharing the same code points?

Yes! You would literally just need to change the fonts.

@steven@zeroes.ca · Reply to Steven 💚
ALL CAPS TO IMPLY YELLING

NOT EVERY LANGUAGE HAS AN ALPHABET.

THERE ARE EXISTING LANGUAGES TODAY THAT JUST WRITE DOWN THE SOUNDS IN IPA.

WHAT ARE THE PEOPLE WHO USE THESE LANGUAGES SUPPOSED TO DO WITHOUT IPA IN UNICODE?

WE COULD HAVE A BASICALLY UNIVERSAL ALPHABET IN UNICODE.

YOU COULD CONVERT BETWEEN PHONETIC-BASED SCRIPTS BY CHANGING A FONT.

@steven@zeroes.ca · Reply to Steven 💚

IPA uses a basically random assortment of characters from whatever existing Unicode blocks had similarly-shaped scripts.

There's no consistent IPA in Unicode. Just a patchwork.

Why does any of this matter?

Well, for one, it makes linguistics more difficult.

Unicode is fine with adding a bunch of dead or even undeciphered languages to Unicode to help out academics, but linguists I guess can get fucked.

But also there's a bigger and more obvious problem.

@steven@zeroes.ca · Reply to Steven 💚

IPA is the alphabet used to represent sounds less ambiguously.

Just like Latin, Greek, and Cyrillic, it's an alphabet.

The IPA "a" doesn't have the same meaning as the Latin "a" or the Cyrillic "а". Instead it represents the "open front unrounded vowel".

en.wikipedia.org/wiki/Open_fro

So what character are IPA users supposed to use?

Just the Latin one.

en.wikipedia.org

Open front unrounded vowel - Wikipedia

@steven@zeroes.ca · Reply to Steven 💚

Unicode goes by characters, not glyphs.

Each Unicode character is supposed to represent a unique meaning, not just the shape associated with a letter.

That's why the glyph "A" is in Unicode more than three times.

It's not actually the same letter in Latin, Greek, and Cyrillic alphabets. They're three different characters represented by the same glyph.

Unicode allows you to make clear which you're talking about.

U+0041 A LATIN CAPITAL LETTER A

U+0391 Α GREEK CAPITAL LETTER ALPHA

U+0410 А CYRILLIC CAPITAL LETTER A
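The three look-alikes above really are distinct code points, which a Python sketch with `unicodedata` makes visible. Escapes are used so the characters can't be mixed up in the source. `describe` is a hypothetical helper.

```python
import unicodedata

# Escapes keep the three look-alikes unambiguous in the source code.
LOOKALIKES = ["\u0041", "\u0391", "\u0410"]


def describe(ch: str) -> str:
    """Format a character as 'U+XXXX <char> NAME'."""
    return f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}"


for ch in LOOKALIKES:
    print(describe(ch))

# Three different characters: they compare unequal, and NFKC leaves each unchanged.
print(len(set(LOOKALIKES)))                                            # 3
print(all(unicodedata.normalize("NFKC", c) == c for c in LOOKALIKES))  # True
```

Because they stay distinct even under NFKC normalization, spotting them is a job for confusable detection (UTS #39, Unicode Security Mechanisms), not for normalization.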

@steven@zeroes.ca

I like Unicode.

If you happened to have followed me on Twitter, you'll know that I know way more about how emoji work than most people.

But holy crap, did Unicode manage to mess up how they handled IPA.

For anybody who knows what this means: I think Unicode's handling of IPA is a more serious stumble than CJK Unification.

@shaft@piaille.fr

Unicode 18.0 will add nearly 13,000 characters, and possibly more.

“At UTC #185, nearly 13,000 additional characters were approved for encoding in Unicode 18.0.

The approved additions include encoding of Small Seal script ("Seal"), a repertoire of 11,328 ideographic characters. Seal is distinct from modern Han ideographs (aka, "CJK"), but is an important precursor of CJK resulting from the first efforts to standardize writing across Chinese-speaking regions during China's Qin Dynasty. As such, Seal has important cultural significance in China and for Chinese speakers throughout the world”

blog.unicode.org/2025/12/utc-1

More on seal script: en.wikipedia.org/wiki/Seal_scr

en.wikipedia.org

Seal script - Wikipedia

@SnoopJ@hachyderm.io

Whoa, I just noticed:

Unicode Technical Standard #58 was published two weeks ago!

Unicode Link Detection and Formatting:
URLs and Email Addresses
unicode.org/reports/tr58/

---

【This document specifies two consistent, standardized mechanisms that address [URL] problems, consisting of:

1) link detection: detecting URLs and email addresses embedded in plain text that properly handles non-ASCII characters, and
2) minimally escaping: minimal escaping of non-ASCII code points in the Path, Query, and Fragment portions of a URL.】

unicode.org

UTS #58: Unicode Link Detection and Formatting: URLs and Email Addresses

@SnoopJ@hachyderm.io · Reply to shibaozi

@shibao there is a special codepoint called ZERO WIDTH JOINER (abbreviated ZWJ) that is not printable (so you would never "see" it) but which carries the meaning that it's meant to join two codepoints (usually but not *exclusively* emoji) together in some sense.

The semantics for emoji ZWJ sequences (as they are called) allow for fallback behavior that "just" shows the two emoji next to each other if the system is not capable of showing you the glyph for the "combined" form.

@emojipedia has a good blog post about the concept in general: blog.emojipedia.org/emoji-zwj-

And if you want to see the nuts and bolts of the standardization, check Unicode Technical Standard #51, §2.5 ("Emoji ZWJ Sequences"): unicode.org/reports/tr51/#Emoj

unicode.org

UTS #51: Unicode Emoji
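As a minimal Python sketch (the helper names are hypothetical, not from any library): the heart on fire emoji ❤️‍🔥 is the ZWJ sequence U+2764 U+FE0F, then U+200D ZERO WIDTH JOINER, then U+1F525. Splitting on the joiner gives the pieces a system would fall back to showing side by side if it has no glyph for the combined form.

```python
ZWJ = "\u200d"  # U+200D ZERO WIDTH JOINER

def join_zwj(*parts: str) -> str:
    """Build an emoji ZWJ sequence by placing a ZWJ between consecutive parts."""
    return ZWJ.join(parts)

def fallback_parts(sequence: str) -> list:
    """Split a ZWJ sequence into the pieces a renderer would show side by side
    when it has no glyph for the combined form."""
    return sequence.split(ZWJ)

heart = "\u2764\ufe0f"  # U+2764 HEAVY BLACK HEART + U+FE0F VARIATION SELECTOR-16
fire = "\U0001f525"     # U+1F525 FIRE
heart_on_fire = join_zwj(heart, fire)

print(" ".join(f"U+{ord(c):04X}" for c in heart_on_fire))  # U+2764 U+FE0F U+200D U+1F525
print(fallback_parts(heart_on_fire) == [heart, fire])       # True
```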

@mikaeru@mastodon.social

The latest post on the Unicode Consortium blog gives an exhaustive list of all the new Unicode properties in regular expressions (regex), and explains why all the supported properties are so important and can be so useful:

blog.unicode.org/2026/03/uts-1

blog.unicode.org

UTS #18: More Unicode Properties in Regular Expressions

Regular Expressions, or “Regex”, are the invisible workhorses of the digital world. Regex allows apps and computer systems to find, validate...

@mikaeru@mastodon.social

The "official" Unicode Regular Expressions (UTS #18) document, dated February 8, 2022, has never been updated since then, and the four new Unicode properties introduced in Unicode 15.1 are only listed in the Proposed Update *draft*, dated May 11, 2023...

This could explain why , , and the framework () trigger an "invalid property" error for the regular expression /\p{IDS_Unary_Operator}/u in JavaScript, while /\p{IDS_Binary_Operator}/u is OK...

@mikaeru@mastodon.social

The latest version 3.0.0 of the open-source application "Unicopedia Ægypta" is now available, displaying all the representative glyphs of the 4,403 Egyptian hieroglyphs belonging to the "Core Unikemet" set.

🔗 codeberg.org/tonton-pixel/unic

Screenshot of the Hieroglyph Picture Book utility of the open-source application Unicopedia Ægypta v.3.0.0

@mikaeru@mastodon.social

Unicode Emoji: Pan-CJK Flags

• <U+1F1E8, U+1F1F3> flag: China [CN]
• <U+1F1ED, U+1F1F0> flag: Hong Kong SAR China [HK]
• <U+1F1EF, U+1F1F5> flag: Japan [JP]
• <U+1F1F0, U+1F1F5> flag: North Korea [KP]
• <U+1F1F0, U+1F1F7> flag: South Korea [KR]
• <U+1F1F2, U+1F1F4> flag: Macao SAR China [MO]
• <U+1F1F2, U+1F1FE> flag: Malaysia [MY]
• <U+1F1F8, U+1F1EC> flag: Singapore [SG]
• <U+1F1F9, U+1F1FC> flag: Taiwan [TW]
• <U+1F1FB, U+1F1F3> flag: Vietnam [VN]

Unicode Emoji: Pan-CJK Flags

🇨🇳🇭🇰🇯🇵🇰🇵🇰🇷🇲🇴🇲🇾🇸🇬🇹🇼🇻🇳
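Each flag in the list is just a pair of regional indicator symbols: the letters A to Z of the region code map, at the same offset, onto U+1F1E6 to U+1F1FF. A minimal Python sketch (hypothetical function names) that reproduces the code point pairs above; whether a pair is actually drawn as a flag depends on the platform's fonts, otherwise it shows as two letter symbols.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A

def flag(region: str) -> str:
    """Map a two-letter region code such as "JP" to its pair of regional indicator symbols."""
    if len(region) != 2 or not region.isascii() or not region.isalpha():
        raise ValueError(f"expected two ASCII letters, got {region!r}")
    # Each letter A..Z maps to U+1F1E6..U+1F1FF at the same offset
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in region.upper())

def code_points(s: str) -> str:
    return ", ".join(f"U+{ord(c):04X}" for c in s)

for region in ["CN", "HK", "JP", "KP", "KR", "MO", "MY", "SG", "TW", "VN"]:
    print(f"<{code_points(flag(region))}> [{region}]")
```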

@mikaeru@mastodon.social

U+2640 FEMALE SIGN
U+2642 MALE SIGN
U+26A2 DOUBLED FEMALE SIGN
U+26A3 DOUBLED MALE SIGN
U+26A4 INTERLOCKED FEMALE AND MALE SIGN
U+26A5 MALE AND FEMALE SIGN
U+26A6 MALE WITH STROKE SIGN
U+26A7 MALE WITH STROKE AND MALE AND FEMALE SIGN
U+26A8 VERTICAL MALE WITH STROKE SIGN
U+26A9 HORIZONTAL MALE WITH STROKE SIGN
U+26B2 NEUTER

Unicode Symbols: Diversity

♀♂⚢⚣⚤⚥⚦⚧⚨⚩⚲
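The names in the list are the characters' formal Unicode names, which Python exposes through the standard unicodedata module. A minimal sketch that regenerates the list; the names come from the Unicode Character Database bundled with the running Python version, and these symbols are old enough to be present in any recent one.

```python
import unicodedata

# The eleven symbols from the post, in order (escaped to avoid invisible variation selectors)
symbols = "\u2640\u2642\u26a2\u26a3\u26a4\u26a5\u26a6\u26a7\u26a8\u26a9\u26b2"

for ch in symbols:
    # unicodedata.name() returns the character's formal Unicode name
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```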

@mikaeru@mastodon.social

U+2764 U+FE0F U+1FA77 U+1F9E1 U+1F49B U+1F49A U+1F499 U+1FA75 U+1F49C U+1F90E U+1F5A4 U+1FA76 U+1F90D

U+1F49F U+2764 U+FE0F U+200D U+1F525 U+1F494 U+2764 U+FE0F U+200D U+1FA79 U+2763 U+FE0F U+1F498 U+1F493 U+1F497 U+1F496 U+1F49D U+1F495 U+1F49E

U+1F970 U+1F60D U+1F618 U+1F63B U+1F48C U+1FAF6 U+1FAF6 U+1F3FB U+1FAF6 U+1F3FC U+1FAF6 U+1F3FD U+1FAF6 U+1F3FE U+1FAF6 U+1F3FF U+1FAC0

Unicode Emoji: Hearts Galore

❤️🩷🧡💛💚💙🩵💜🤎🖤🩶🤍
💟❤️‍🔥💔❤️‍🩹❣️💘💓💗💖💝💕💞
🥰😍😘😻💌🫶🫶🏻🫶🏼🫶🏽🫶🏾🫶🏿🫀
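The code point rows above use the usual "U+XXXX" notation, and one visible emoji can span several code points (❤️‍🔥 alone is four). A minimal Python sketch, with hypothetical helper names, for converting between the notation and actual strings:

```python
def from_notation(text: str) -> str:
    """Turn space-separated "U+XXXX" tokens into the string they denote."""
    chars = []
    for token in text.split():
        if not token.upper().startswith("U+"):
            raise ValueError(f"not in U+XXXX form: {token!r}")
        chars.append(chr(int(token[2:], 16)))
    return "".join(chars)

def to_notation(s: str) -> str:
    """List every code point of a string in "U+XXXX" form (at least four hex digits)."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

# First three emoji of the second row: 💟 ❤️‍🔥 💔
row = from_notation("U+1F49F U+2764 U+FE0F U+200D U+1F525 U+1F494")
print(len(row))          # 6 code points for 3 visible emoji
print(to_notation(row))  # U+1F49F U+2764 U+FE0F U+200D U+1F525 U+1F494
```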

@mikaeru@mastodon.social

U+1F4A6 U+1F4A7 U+1F979 U+1F639 U+1F63F

U+1F602 U+1F605 U+1F613 U+1F622 U+1F625 U+1F62A U+1F62D U+1F630 U+1F923 U+1F972 U+1F975

Unicode Emoji: Sweat & Tears

💦💧🥹😹😿
😂😅😓😢😥😪😭😰🤣🥲🥵

@mikaeru@mastodon.social

U+1F473 U+1F473 U+1F3FB U+1F473 U+1F3FC U+1F473 U+1F3FD U+1F473 U+1F3FE U+1F473 U+1F3FF

U+1F478 U+1F478 U+1F3FB U+1F478 U+1F3FC U+1F478 U+1F3FD U+1F478 U+1F3FE U+1F478 U+1F3FF

Unicode Emoji: Skin Tones

👳➔👳🏻👳🏼👳🏽👳🏾👳🏿
👸➔👸🏻👸🏼👸🏽👸🏾👸🏿
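Each skin-tone variant above is the base emoji followed by one of the five emoji modifiers U+1F3FB to U+1F3FF. A minimal Python sketch (hypothetical helper name) that rebuilds the first row; note that only emoji that are valid modifier bases combine this way, and after other characters the modifier typically shows up as a standalone color swatch.

```python
# U+1F3FB..U+1F3FF EMOJI MODIFIER FITZPATRICK TYPE-1-2 .. TYPE-6
SKIN_TONES = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]

def with_skin_tones(base: str) -> list:
    """Return the base emoji followed by its five skin-tone variants."""
    return [base] + [base + tone for tone in SKIN_TONES]

person_wearing_turban = "\U0001F473"  # U+1F473
variants = with_skin_tones(person_wearing_turban)
for v in variants:
    print(" ".join(f"U+{ord(c):04X}" for c in v))
```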

@mikaeru@mastodon.social

U+1F201 U+1F202 U+FE0F U+1F233 U+1F237 U+FE0F U+1F236 U+1F21A U+1F251 U+1F238 U+1F23A U+000A U+1F22F U+1F250 U+1F239 U+1F232 U+1F234 U+3297 U+FE0F U+3299 U+FE0F U+1F235

Unicode Emoji: Japanese Buttons

🈁🈂️🈳🈷️🈶🈚🉑🈸🈺
🈯🉐🈹🈲🈴㊗️㊙️🈵
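U+FE0F VARIATION SELECTOR-16 requests emoji presentation; in this post it follows U+1F202, U+1F237, U+3297, and U+3299, characters whose default presentation is text. A minimal Python sketch (hypothetical helper name) that pairs each character of the first row with whether a VS16 immediately follows it:

```python
VS16 = "\ufe0f"  # U+FE0F VARIATION SELECTOR-16 (request emoji presentation)

def split_presentation(s: str) -> list:
    """Pair each base character with whether a VS16 immediately follows it."""
    result = []
    for ch in s:
        if ch == VS16 and result:
            base, _ = result[-1]
            result[-1] = (base, True)  # attach the selector to the preceding character
        else:
            result.append((ch, False))
    return result

first_row = ("\U0001F201\U0001F202\ufe0f\U0001F233\U0001F237\ufe0f"
             "\U0001F236\U0001F21A\U0001F251\U0001F238\U0001F23A")
for base, has_vs16 in split_presentation(first_row):
    print(f"U+{ord(base):04X}", "+ U+FE0F" if has_vs16 else "")
```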

@jdlh@mstdn.ca · Reply to Mike Williamson

@sleepycat this paper doesn't cite two relevant official reports on the subject: "Unicode Security Mechanisms" unicode.org/reports/tr39/ and "Unicode Identifiers and Syntax" unicode.org/reports/tr31/ . Was the paper interested in solving problems, or just in collecting the engagement from pointing them out?

unicode.org

UAX #31: Unicode Identifiers and Syntax

@mikaeru@mastodon.social
@Edent@mastodon.social

In *theory* you should be able to follow this test user:

@你好@i18n.viii.fi

But I can't find any Fediverse software which actually supports non-ASCII usernames.

If you are able to see the user, its description, and its avatar - please send me a screenshot 🙂

@mikaeru@mastodon.social

The current version of Unicopedia Sigilla is marked as "alpha", since it relies on Unicode 18.0-alpha, which is still a draft: the code points assigned to Seal characters, as well as their source references and glyphs, may change before the final release planned for September 2026.

Consequently, no Unicode-aware font exists yet for Seal characters, and none is expected until the new Seal block becomes stable. So, for the time being, characters in the application display as a "Tōfu Matsuri" (tofu festival)...

mastodon.social

Michel Mariani (@mikaeru@mastodon.social)

Attached: 1 image Unicopedia Sigilla is a developer-oriented set of #Unicode utilities related to Seal characters, wrapped into one single app, built with #Electron. Repository: 🔗 https://codeberg.org/tonton-pixel/unicopedia-sigilla #Unicopedia #Seal #Characters #JavaScript #CodePoints #Glyphs #OpenSource #DesktopApplication

@jordanhipwell@mastodon.world
@leffe@social.linux.pizza

I found this reply that I made in 1984 to Dennis Ritchie in the net.followup newsgroup. I was at the time lobbying Sun to add 8-bit character set support to the firmware, but they wanted to hold out for a 16-bit system, like the as yet unnamed Unicode. There was eventually an interim solution but my memory of that is a bit foggy.

› ... The problem was that, to the Swedes, characters like
› {}|\ were letters, not syntactic symbols.
›
› It's a real problem. I gather that the best-equipped users
› had terminals that would switch graphics depending on
› whether they were writing C or documents.
›
› Dennis Ritchie

That's right, writing C and shell commands is almost impossible on a terminal with a swedish character set. Even Pascal is a bit hard, but some compilers will accept (* *) instead of { } and (. .) instead of [ ].

If you have a terminal with selectable character sets, you can train your editor to switch, depending on what type of text you are editing. I have set up EMACS so that it selects the right character set on my VT100 depending on what mode I'm in (which in turn is controlled by filename suffixes). This works even if I have two windows, one with C code in it and the other holding a document in swedish.

Leif Samuelsson

LM ERICSSON Tel. Co.
S-126 25 STOCKHOLM
SWEDEN
..{decvax, philabs}!mcvax!enea!erix!leif

"E { e }, } i }a { e |"
"It is a river, and in the river there is an island"
(This is a dialect of swedish. My apologies to the people in the
province of V{rmland for the lack of a V{rmland character set).
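The sign-off shows the problem in action: read through a Swedish 7-bit character set, "V{rmland" is "Värmland". A minimal Python sketch that maps such text back, assuming the usual Swedish national variant of ISO 646 (SEN 850200) and covering only the six letters at issue; the function name is hypothetical.

```python
# Assumed mapping for the Swedish ISO 646 variant: these ASCII code
# positions carry Swedish letters instead of brackets, braces, and bars.
SWEDISH_7BIT = str.maketrans({
    "[": "Ä", "\\": "Ö", "]": "Å",
    "{": "ä", "|": "ö", "}": "å",
})

def from_swedish_7bit(text: str) -> str:
    """Reinterpret text written on a Swedish 7-bit terminal as modern Unicode."""
    return text.translate(SWEDISH_7BIT)

place = from_swedish_7bit("V{rmland")  # 'Värmland'
```

The same table explains Ritchie's complaint: on such a terminal, C's { } | \ simply are ä å ö Ö.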

@blog@shkspr.mobi

Internationalise The Fediverse

shkspr.mobi/blog/2024/02/inter

We live in the future now. It is OK to use Unicode everywhere.

It seems bizarre to me that modern Internet services sometimes "forget" that there's a world outside the Anglosphere. Some people have the temerity to speak foreign languages! And some of those languages have accents on their letters!! Even worse, some don't use English letters at all!!!

A decade ago, I was miffed that GitHub only supported some ASCII characters in its project names. There's no technical reason why your repo can't be called "ഹലോ വേൾഡ്".

Similarly, I'm frustrated that Mastodon (the largest ActivityPub service) doesn't allow Unicode usernames and has resisted efforts to change.

So I built a small ActivityPub server which publishes content from an Actor called @你好@i18n.viii.fi - it is only a demo account, but it works!
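The post does not describe how other servers find this account; Fediverse handles of the form @user@host are commonly resolved with a WebFinger (RFC 7033) lookup. A minimal Python sketch that only builds the lookup URL, assuming the server answers at the standard /.well-known/webfinger path; the function name is hypothetical, and the host must already be ASCII (an internationalized domain would need IDNA conversion first).

```python
from urllib.parse import urlencode

def webfinger_url(handle: str) -> str:
    """Build a WebFinger lookup URL for a handle like "@user@host"."""
    user, host = handle.lstrip("@").rsplit("@", 1)
    # urlencode percent-encodes the UTF-8 bytes of non-ASCII characters
    query = urlencode({"resource": f"acct:{user}@{host}"})
    return f"https://{host}/.well-known/webfinger?{query}"

print(webfinger_url("@你好@i18n.viii.fi"))
# https://i18n.viii.fi/.well-known/webfinger?resource=acct%3A%E4%BD%A0%E5%A5%BD%40i18n.viii.fi
```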

Some ActivityPub clients report that they are able to follow it and receive messages from it. Others - like Mastodon - simply can't see anything from it. Take a look at the replies on Mastodon to see which services work. You can also see some of its posts on the Fediverse.

What Does The Fox Spec Say?

The ActivityPub specification says:

Building an international base of users is important in a federated network. Internationalization

I can't find anything in the specifications which limits what languages a username can be written in. But there are a few clues scattered about.

The user's @ name is defined by preferredUsername which is:

A short username which may be used to refer to the actor, with no uniqueness guarantees. 4.1 Actor objects

There's nothing in there about what scripts it can contain. However, later on, the spec says:

Properties containing natural language values, such as name, preferredUsername, or summary, make use of natural language support defined in ActivityStreams. 4. Actors

So it is expected that a preferred username could be written in multiple scripts, which implies that the default need not be limited to A-Z0-9.

The ActivityStreams specification talks about language mapping.

Finally, the ActivityPub specification has some examples of non-Latin text in names.

So, I think that it is acceptable for usernames to be written in a variety of non-Latin scripts.

But What About...?

There are usually a few objections to "Unicode Everywhere" zealots like me. I'd like to forestall any arguments.

What about homograph attacks?

Well, what about them? ASCII has plenty of similar-looking characters. I doubt most people would notice if a capital i were replaced by a lowercase L, or vice versa. Similarly, the kerning issue of an r and n looking like an m is well known. Are mixed-language homographs more dangerous? I don't think so.

What if people make names that can't be typed?

Well, what if they do? Maybe not being found by people who can't type your language is a feature, not a bug. But, anyway, clients can let users search for other people, or copy and paste their names.

What about weird "Zalgo" text?

It is up to a client to decide how they want to render text input. The "problems" of strange Unicode combinations are well known. This is not a hard computer-science problem.
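One possible client-side policy, sketched in Python under the assumption that a display limit is acceptable: keep at most a fixed number of combining marks (general categories Mn, Mc, Me) after each base character. The function name and the default limit are hypothetical and arbitrary; some scripts legitimately stack several marks, so a real client would want a generous limit.

```python
import unicodedata

def cap_combining_marks(text: str, limit: int = 2) -> str:
    """Keep at most `limit` combining marks after each base character."""
    out = []
    run = 0  # marks seen since the last base character
    for ch in text:
        if unicodedata.category(ch).startswith("M"):
            run += 1
            if run > limit:
                continue  # drop the excess mark
        else:
            run = 0
        out.append(ch)
    return "".join(out)

zalgo = "e" + "\u0301" * 10 + "x"  # e followed by ten COMBINING ACUTE ACCENTs
print(len(zalgo), len(cap_combining_marks(zalgo)))  # 12 4
```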

What about bi-directional text?

The spec makes clear this is allowed.
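Displaying such names safely is again the client's job. A common technique from the Unicode Bidirectional Algorithm (UAX #9) is to wrap text of unknown direction in directional isolates, so a right-to-left username cannot reorder the text around it (HTML's <bdi> element does the same job). A minimal Python sketch with hypothetical names:

```python
FSI = "\u2068"  # U+2068 FIRST STRONG ISOLATE
PDI = "\u2069"  # U+2069 POP DIRECTIONAL ISOLATE

def isolate(text: str) -> str:
    """Wrap text of unknown direction so its direction cannot leak into the surrounding text."""
    return FSI + text + PDI

# Hypothetical notification line with a right-to-left (Hebrew) username
username = "\u05e9\u05dc\u05d5\u05dd"  # Hebrew "shalom"
line = "@" + isolate(username) + " boosted 3 posts"
```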

Do people even want a username in their own script?

I have no evidence for this. But I bet you'd get pretty frustrated if you had to switch keyboard just to type your own name, wouldn't you? In any case, why can't I have a username of @😉

What's Next?

If you build ActivityPub software, give some thought to the billions of people who don't have names which easily fit into ASCII.

If your software can see @你好@i18n.viii.fi and its posts, please let me know.

@mikaeru@mastodon.social

All documents published by the Ideographic Research Group (IRG) are now available on the Unicode web site, and can be easily and efficiently found through the new search bar provided on the IRG homepage.

🔗 unicode.org/irg/

This long-awaited search feature is very convenient, and so useful to find what you're interested in, and even more (ah, the wonderful power of serendipity!)...

Screenshot of the IRG home page, looking for "taboo" from the search bar

Screenshot of list of search results in the IRG documents

@mikaeru@mastodon.social · Reply to Michel Mariani

Full members (voting) of the Unicode Consortium (2026-02-08): Adobe, Airbnb, Amazon, Apple, Google, Meta, Microsoft, Salesforce, Translated.

🔗 home.unicode.org/membership/me

Salesforce is back, once again... on and off, and on and off, and on... Part-time member, possibly?

Full members (voting) of the Unicode Consortium (2026-02-08): Adobe, Airbnb, Amazon, Apple, Google, Meta, Microsoft, Salesforce, Translated.

@mikaeru@mastodon.social · Reply to Michel Mariani

Full members (voting) of the Unicode Consortium (2026-01-25): Adobe, Airbnb, Amazon, Apple, Google, Meta, Microsoft, Translated.

🔗 home.unicode.org/membership/me

Salesforce is gone, once again... on and off and on and off...

Some avatar of 's cat, perhaps?

Full members (voting) of the Unicode Consortium (2026-01-25): Adobe, Airbnb, Amazon, Apple, Google, Meta, Microsoft, Translated.

@sleepycat@infosec.exchange

"Rather than inserting logical bugs, adversaries can attack the encoding of source code files to inject vulnerabilities.

These adversarial encodings produce no visual artifacts.

The trick is to use Unicode control characters to reorder tokens in source code at the encoding level."

trojansource.codes/

trojansource.codes

Trojan Source Attacks

Some vulnerabilities are invisible. Rather than inserting logical bugs, adversaries can attack the encoding of source code files to inject vulnerabilities.
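A minimal Python sketch of the kind of check a linter or pre-commit hook might run: flag the explicit bidirectional embedding, override, and isolate characters (U+202A to U+202E and U+2066 to U+2069) that the attack relies on. The function name and the sample line are illustrative, not taken from the Trojan Source site; some tools also flag implicit marks such as U+200F, which this sketch ignores.

```python
import unicodedata

# Explicit bidi formatting characters: embeddings/overrides (U+202A..U+202E)
# and isolates (U+2066..U+2069)
BIDI_CONTROLS = set(map(chr, range(0x202A, 0x202F))) | set(map(chr, range(0x2066, 0x206A)))

def find_bidi_controls(source: str) -> list:
    """Return (line, column, character name) for every bidi control in the source text."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, unicodedata.name(ch)))
    return hits

# Illustrative source whose comment is visually reordered into the string literal
sample = 'access_level = "user"\nif access_level != "user\u202e \u2066// Check if admin\u2069 \u2066":\n'
for lineno, col, name in find_bidi_controls(sample):
    print(f"line {lineno}, col {col}: {name}")
```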

@blog@shkspr.mobi

A small collection of text-only websites

shkspr.mobi/blog/2025/12/a-sma

A couple of years ago, I started serving my blog posts as plain text. Add .txt to the end of any URL and get a deliciously lo-fi, UTF-8, mono[chrome|space] alternative.

Here's this post in plain text - https://shkspr.mobi/blog/2025/12/a-small-collection-of-text-only-websites.txt

Obviously a webpage without links is like a fish without a bicycle, but the joy of the web is that there are no gatekeepers. People can try new concepts and, if enough people join in, it becomes normal. I'm not saying plain text is the best web experience. But it is an experience. Perfect if you like your browsing fast, simple, and readable. There are no cookie banners, pop-ups, permission prompts, autoplaying videos, or garish colour schemes.

I'm certainly not the first person to do this, so I thought it might be fun to gather a list of websites which you can browse in text-only mode. If you know of any more - including your own site - please drop a comment in the box!

If you'd like to add a site, please get in touch. The rules are simple - content which has the MIME type of text/plain. No HTML, no multimedia, no RTF, no XML, no ANSI colour escape sequences.

Emoji are fine though; emoji are cool.
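The one hard rule is the text/plain MIME type. As a minimal sketch, assuming a Python WSGI setup (the post does not say how its .txt versions are served), an app can send plain text with an explicit charset=utf-8, so emoji and other non-ASCII text are not left to the browser's encoding guess. The app and the test harness below are hypothetical.

```python
TEXT = "A lo-fi page. Emoji are fine though \U0001F60E\n"

def app(environ, start_response):
    """Minimal WSGI app serving UTF-8 plain text with an explicit charset."""
    body = TEXT.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

# Call the app directly, without a server, and capture what it sends
captured = {}

def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)

body = b"".join(app({}, fake_start_response))
print(captured["headers"]["Content-Type"])  # text/plain; charset=utf-8
```

To try it in a browser, the standard library's wsgiref.simple_server.make_server can serve the same app locally.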

@shaft@piaille.fr

Unicode 18.0 will add nearly 13,000 characters.

“At UTC #185, nearly 13,000 additional characters were approved for encoding in Unicode 18.0.

The approved additions include encoding of Small Seal script ("Seal"), a repertoire of 11,328 ideographic characters. Seal is distinct from modern Han ideographs (aka, "CJK"), but is an important precursor of CJK resulting from the first efforts to standardize writing across Chinese-speaking regions during China's Qin Dynasty. As such, Seal has important cultural significance in China and for Chinese speakers throughout the world”

blog.unicode.org/2025/12/utc-1

More on seal script: en.wikipedia.org/wiki/Seal_scr

@mikaeru@mastodon.social

The latest version of the open-source application "Unicopedia Plus" is now available, adding support for all the new characters, scripts, and blocks defined in Unicode 17.0.

🔗 codeberg.org/tonton-pixel/unic

The current app version is a pre-release (Beta), since full support for Unicode 17.0 is not yet available in the Electron framework. More specifically, results from the "Unicode Foldings", "Unicode Normalizer", and "Unicode Segmenter" utilities cannot be fully trusted...

Unicopedia Plus application screenshot

@Jay16K@chaos.social

I wonder what has already broken there... but Unicode is fine? 😬

Screenshot of an ordering website: "Additional remarks (please do not use emojis):" with an empty input field below it

@kagan@wandering.shop

Fun tip for anyone who's wondering how I got the "hashtags" in my last toot to not be *actual* hashtags: after the # symbol, and before the next letter, I put a Unicode Word Joiner character. That breaks up the string so it no longer counts as a hashtag, but also makes it so there can't be a line-break after the # symbol.

unicode-explorer.com/c/2060

You can type one in Linux Mint by doing Ctrl+Shift+U then "2060" and Enter.
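The same trick can be scripted. A minimal sketch in Python; `defuse_hashtags` is a hypothetical helper, and whether a given server's hashtag parser is actually fooled depends on that server's implementation:

```python
WORD_JOINER = "\u2060"  # U+2060 WORD JOINER: zero width, forbids a line break here


def defuse_hashtags(text: str) -> str:
    """Insert a WORD JOINER right after every '#' so '#tag' is no longer one token."""
    return text.replace("#", "#" + WORD_JOINER)


toot = "Tagged #Unicode and #CJK"
safe = defuse_hashtags(toot)
print(safe == toot)                           # False: the strings differ...
print(safe.replace(WORD_JOINER, "") == toot)  # True: ...only by invisible joiners
print([f"U+{ord(c):04X}" for c in safe[7:10]])  # ['U+0023', 'U+2060', 'U+0055']
```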

U+2060 WORD JOINER - Unicode Explorer: commonly abbreviated WJ, a zero-width non-breaking space (only), intended to disambiguate the functions of the byte order mark.

@pandoc@fosstodon.org

Off-label uses of pandoc: conversion between text encodings.

E.g., UTF-8 to UTF-16:

echo 'X' | pandoc lua -e 'io.write(pandoc.text.toencoding(io.read"a", "utf-16"))'

Other direction:

echo 'X' | pandoc lua -e 'io.write(pandoc.text.fromencoding(io.read"a", "utf-16"))'

The set of supported encodings is platform dependent, but always includes UTF-8, UTF-16, UTF-32, and latin1.
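For comparison outside pandoc, a rough Python equivalent of the same byte-level conversion. This is a sketch that assumes little-endian UTF-16 without a byte order mark is wanted; the post does not say which byte order or BOM handling pandoc's `utf-16` uses:

```python
def convert(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode raw bytes from encoding `src` into encoding `dst`."""
    return data.decode(src).encode(dst)


utf8_bytes = "X\n".encode("utf-8")  # what `echo 'X'` writes
utf16_bytes = convert(utf8_bytes, "utf-8", "utf-16-le")
print(utf16_bytes)                                  # b'X\x00\n\x00'
print(convert(utf16_bytes, "utf-16-le", "utf-8"))   # b'X\n'

# Python's plain "utf-16" codec would also prepend a byte order mark.
print("X".encode("utf-16")[:2] in (b"\xff\xfe", b"\xfe\xff"))  # True
```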

@mikaeru@mastodon.social

There is a very interesting article about gender-inclusive pronouns in Chinese, including mentions of characters yet to be added to the Unicode set, making use of Ideographic Description Sequences (IDS): ⿰无也, ⿰㐅也, ⿰男也...

Janet Davey. (2025). Taking "TA" Beyond the Binary: In Search of Multimodal Gender-inclusive Pronouns in Chinese. Image & Narrative, 25(03), 131–163. Retrieved from imageandnarrative.be/index.php

🔗 [PDF] imageandnarrative.be/index.php

@mikaeru@mastodon.social

Unicode 17.0 introduces five new CJK Unified Ideographs related to Chinese personal pronouns, four of them having been proposed by Andrew West (BabelStone):

« The other Chinese pronoun coming to Unicode v. 17.0 next year, in addition to ⿰㐅也 (3p gender-neutral), ⿰男也 (3p explicitly male), ⿱妳心 (f. equivalent of 您), ⿱我心 (Taiwanese 1p plural), is ⿱她心 (f. equivalent to 怹) »

🔗 bsky.app/profile/babelstone.co

Screenshot of CJK Related data from Unicopedia Sinica: Chinese Personal Pronouns

@timotimo@peoplemaking.games

It was the best of times, it was the

Screenshot of a unicode information website showing an image (empty) of the character called "INVISIBLE TIMES", which is codepoint U+2062

@mikaeru@mastodon.social · Reply to Michel Mariani

Full members (voting) of the Unicode Consortium (2025-12-24): Adobe, Airbnb, Amazon, Apple, Google, Meta, Microsoft, Salesforce, Translated.

home.unicode.org/membership/me

Salesforce is back, as if by magic, just in time for Christmas... A true miracle!

@mikaeru@mastodon.social · Reply to Michel Mariani

Today (April Fools' Day), Adobe is apparently back on the list of full members (voting) of the Unicode Consortium, but for how long this time: one full year?

« It goes away and it comes back
It's made of tiny little nothings
You sing it and you dance it
And it comes back, it sticks in your head
Like a popular song »

Full members (voting) of the Unicode Consortium: Adobe, Airbnb, Amazon, Apple, Google, Meta, Microsoft, Salesforce, Translated.

home.unicode.org/membership/me

@reiver@mastodon.social · Reply to @reiver ⊼ (Charles) :batman:

This probably means that someone should modernize HTTP by creating HTTP/1.4.

@reiver ⊼ (Charles) :batman: (@reiver@mastodon.social)

Google more-or-less created 2 new versions of the HTTP protocol (HTTP/2 and HTTP/3) but didn't bother to make either of them (officially) support UTF-8 in the HTTP request. #HTTP #Unicode #UTF8 #WorldWideWeb
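For context, the usual workaround is to send non-ASCII characters in the request target as percent-encoded UTF-8 bytes, since URI syntax (RFC 3986) is ASCII-only. A minimal sketch with Python's standard library and a made-up path; it illustrates that convention and makes no claim about how any particular HTTP/2 or HTTP/3 stack treats raw UTF-8:

```python
from urllib.parse import quote, unquote

path = "/wiki/café"            # hypothetical path containing a non-ASCII letter
encoded = quote(path)          # UTF-8 by default; '/' is left as-is
print(encoded)                 # /wiki/caf%C3%A9
print(unquote(encoded) == path)  # True
```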

@SnoopJ@hachyderm.io

the most important part of history is when a mouse fell out of a light fixture and got added to the count of members present at a Technical Committee meeting (9 Nov 2016)

unicode.org/L2/L2016/16325.htm

Screenshot of meeting notes for UTC Meeting 149. Text reads:

Mouse now present. 6.502 members represented.

[149-A94] Action Item for Landlord: Capture and exile the mouse that just fell out of the light fixture.

@mikaeru@mastodon.social

Generally, new CJK Ideographs proposed by members of the IRG (Ideographic Research Group) go through several rounds of exchanges/discussions until they get approved or possibly postponed or rejected.

For instance, here is the page dedicated to UK-20538 ⿰㐅也 (with images as "pieces of evidence"), which eventually made its way to Unicode 17.0, encoded as U+323BF 𲎿 :

🔗 hc.jsecs.org/irg/ws2021/app/?i

@amake@mastodon.social

I see various libraries offering functions for detecting kanji characters, but they almost always do this in a limited way that misses a huge number of characters, e.g. nothing beyond the BMP, or even missing ranges in the BMP.

The only way to do this right is to

1. Work with codepoints, not UTF-16 code units

2. Look at the Unicode script property, which should be `Han` for kanji/hanzi
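Both points can be illustrated in Python, with one caveat: the standard `re` module has no Unicode script properties, so the sketch below approximates `Script=Han` by checking character names with `unicodedata`. That heuristic is an assumption of this example (it misses Han-script characters that are not ideographs, such as 々 or the radicals, and only knows ideographs in the interpreter's Unicode version):

```python
import unicodedata


def is_han(ch: str) -> bool:
    """Approximate Script=Han via character names (stdlib only)."""
    name = unicodedata.name(ch, "")
    return name.startswith(("CJK UNIFIED IDEOGRAPH", "CJK COMPATIBILITY IDEOGRAPH"))


def naive_is_kanji(ch: str) -> bool:
    # The kind of check criticized above: only the original BMP block.
    return "\u4e00" <= ch <= "\u9fff"


text = "\U00020BB7野家"  # 𠮷 is U+20BB7, outside the BMP
print([is_han(c) for c in text])          # [True, True, True]
print([naive_is_kanji(c) for c in text])  # [False, True, True]

# A Python str is a sequence of code points; UTF-16 needs a surrogate pair
# for U+20BB7, so the same text is 4 code units long.
print(len(text), len(text.encode("utf-16-le")) // 2)  # 3 4
```

Where a real script property is available (for example `\p{Script=Han}` in engines that support it), that is the better choice.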

I used the new Unicode script matchers in Orgro (orgro.org/) to improve text reflow for Japanese and Chinese text.

Previously all text would reflow like the Latin text above—with a space where line breaks were. Now I remove the space when appropriate based on the script of the abutting non-whitespace characters.
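A minimal sketch of that reflow rule, assuming a single paragraph of hard-wrapped lines. This is not Orgro's code: `_is_cjk` is a hypothetical name-based stand-in for a proper Unicode Script lookup.

```python
import unicodedata


def _is_cjk(ch: str) -> bool:
    # Hypothetical, simplified test based on character names; a real
    # implementation would look at the Unicode Script property instead.
    name = unicodedata.name(ch, "")
    return name.startswith(("CJK ", "HIRAGANA", "KATAKANA", "IDEOGRAPHIC"))


def reflow(lines: list[str]) -> str:
    """Join hard-wrapped lines of one paragraph.

    A space is inserted at each line break unless the characters on both
    sides of the break are CJK. Blank lines are simply skipped here.
    """
    out = ""
    for line in (raw.strip() for raw in lines):
        if not line:
            continue
        if out and not (_is_cjk(out[-1]) and _is_cjk(line[0])):
            out += " "
        out += line
    return out


print(reflow(["Plain Latin text", "wraps with a space."]))
# Plain Latin text wraps with a space.
print(reflow(["日本語の文章は", "空白なしで続く。"]))
# 日本語の文章は空白なしで続く。
```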

Screen capture of Orgro demonstrating text reflow that correctly handles spaces between Japanese and Chinese characters

@TheKeystoneCollective@infosec.exchange

Unicode isn't a standard. It's a geopolitical fever dream where linguistics, cyber warfare, influence ops, censorship, OSINT, and national identity all collide.

Homoglyph attacks, script politics, emoji diplomacy, etc.

"The Geopolitics of Unicode: How Scripts, Fonts, and Character Sets Become Cybersecurity Issues"

New read at:
keystone-collective.org/the-ge
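Homoglyph attacks often rely on mixing scripts inside one word, such as a Cyrillic "а" in an otherwise Latin domain name. A deliberately crude sketch: it guesses each letter's script from the first word of its Unicode name, which is a hypothetical heuristic for illustration only. A real check would use the Script property and the confusables data from UTS #39, and this one would wrongly flag ordinary mixed text such as Japanese:

```python
import unicodedata


def scripts_used(word: str) -> set[str]:
    """Guess the scripts in `word` from the first word of each letter's name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in word if ch.isalpha()}


def looks_mixed(word: str) -> bool:
    return len(scripts_used(word)) > 1


print(looks_mixed("paypal"))       # False
print(looks_mixed("p\u0430ypal"))  # True: U+0430 is CYRILLIC SMALL LETTER A
```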

@mikaeru@mastodon.social

Unicode Emoji: Pan-CJK Flags

• <U+1F1E8, U+1F1F3> flag: China [CN]
• <U+1F1ED, U+1F1F0> flag: Hong Kong SAR China [HK]
• <U+1F1EF, U+1F1F5> flag: Japan [JP]
• <U+1F1F0, U+1F1F5> flag: North Korea [KP]
• <U+1F1F0, U+1F1F7> flag: South Korea [KR]
• <U+1F1F2, U+1F1F4> flag: Macao SAR China [MO]
• <U+1F1F2, U+1F1FE> flag: Malaysia [MY]
• <U+1F1F8, U+1F1EC> flag: Singapore [SG]
• <U+1F1F9, U+1F1FC> flag: Taiwan [TW]
• <U+1F1FB, U+1F1F3> flag: Vietnam [VN]
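Each flag above is just a pair of Regional Indicator Symbols, one per letter of the two-letter region code, starting at U+1F1E6 for "A". A small Python sketch (`flag` is a hypothetical helper; whether a given pair renders as a flag depends on the platform's emoji support):

```python
def flag(region: str) -> str:
    """Build an emoji flag sequence from a two-letter region code such as 'CN'."""
    if len(region) != 2 or not region.isascii() or not region.isalpha():
        raise ValueError(f"expected two ASCII letters, got {region!r}")
    base = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
    return "".join(chr(base + ord(c) - ord("A")) for c in region.upper())


for code in ("CN", "JP", "TW"):
    print(code, flag(code), [f"U+{ord(c):X}" for c in flag(code)])
# first line: CN 🇨🇳 ['U+1F1E8', 'U+1F1F3']
```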

Unicode Emoji: Pan-CJK Flags

🇨🇳🇭🇰🇯🇵🇰🇵🇰🇷🇲🇴🇲🇾🇸🇬🇹🇼🇻🇳

@mikaeru@mastodon.social

The latest post on the Unicode Blog gives some important details about the future character repertoire in Unicode 18.0, notably the addition of 11,328 "Small Seal" ideographic characters, plus 965 "Jurchen" characters and radicals. It also offers very clear insights about the work of the UTC (Unicode Technical Committee) on CJK & Unihan characters, in collaboration with the IRG (Ideographic Research Group).

🔗 blog.unicode.org/2025/12/utc-1

UTC #185 Highlights: Unicode Technical Committee meeting #185 was held October 27–29 in Cupertino, CA, hosted by Apple.

@mikaeru@mastodon.social

The Ideographic Research Group (IRG) is responsible for preparing and reviewing sets of CJK unified ideographs to be included in the Unicode Standard.

It has recently made available a useful list of so-called disunified CJK ideographs, coming with images of glyphs and IRG source references, which also provides links to documents giving the rationale behind each disunification:

🔗 unicode.org/irg/disunified.html

@mikaeru@mastodon.social

> This increases the number of encoded CJK ideographs to over 100,000!

十万字【じゅうまんじ】 (jūmanji, "one hundred thousand characters")!

@mikaeru@mastodon.social

RE: mastodon.social/@mikaeru/11556

New additions include 4,298 additional CJK unified ideographs in a new block, CJK Unified Ideographs Extension J, as well as 18 other CJK ideographs added to the existing Extension C and Extension E blocks.

This increases the number of encoded CJK ideographs to over 100,000!

Also, nearly 2,500 already-encoded CJK ideographs are horizontally extended by the addition of source references and glyphs reflecting use of those ideographs in China and Korea.

🔗 blog.unicode.org/2025/09/unico

@mikaeru@mastodon.social

In Unicode, up to 11,328 "Small Seal" characters are finally making their way through the "Pipeline"...

"WG2 N5341 - Small Seal Code charts and Data set"
🔗 unicode.org/wg2/docs/n5341-Sma

"Topical Document List: Seal Script"
🔗 unicode.org/L2/topical/seal/

"Proposed New Characters: The Pipeline"
🔗 unicode.org/alloc/Pipeline.html

Screenshot of the "Small Seal" forms of 馬 (Horse) in the WG2 document n5341: "Small Seal Codecharts and Data Set"

@mikaeru@mastodon.social

New in Unicopedia Sinica:

- Added all Unihan-related utilities from Unicopedia Plus.
- Added typeface selector between serif and sans-serif in the Pan-CJK Font Variants utility.

Planned:

- Utilities for non-Han scripts: Khitan Small Script, Nüshu, Tangut.
- Utilities for Jurchen, Small Seal (Unicode 18.0?)

🔗 codeberg.org/tonton-pixel/unic

Screenshot of Unicopedia Sinica app: Unihan Total Strokes utility

@mikaeru@mastodon.social

New in Unicopedia Plus:

- All Unihan-related utilities have been moved to Unicopedia Sinica.
- All Unikemet-related utilities have been moved to Unicopedia Ægypta.

🔗 codeberg.org/tonton-pixel/unic
🔗 codeberg.org/tonton-pixel/unic
🔗 codeberg.org/tonton-pixel/unic

Screenshot of Unicopedia Plus app: log(😅) =💧log(😄) [Math Geekiness]

@mikaeru@mastodon.social

The "official" Unicode Regular Expressions (UTS #18) document, dated February 8, 2022, has not been updated since, and the four new Unicode properties introduced in Unicode 15.1 are only listed in the Proposed Update *draft*, dated May 11, 2023...

This could explain why several JavaScript engines and frameworks trigger an "invalid property" error for the regular expression /\p{IDS_Unary_Operator}/u, while /\p{IDS_Binary_Operator}/u is OK...

@argv_minus_one@mastodon.sdf.org

Back in the 1990s, I was kind of annoyed by people's fondness for misusing the grave accent character ` as an open-quote character. They would write quoted text ``like this''.

I assume it looked good on some 1970s terminal or another, but it looked atrocious in your average '90s GUI font.

Thankfully, Unicode came along and defined actual open-quote and close-quote characters, and this whole issue is now largely a thing of the past.
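The ``...'' convention still survives in TeX/LaTeX source. Converting it to the real Unicode quotation marks (U+201C and U+201D) is a one-liner; a minimal sketch that assumes quotes are not nested:

```python
import re

# TeX-style ``quoted'' text -> U+201C / U+201D curly double quotes.
_TEX_QUOTES = re.compile(r"``(.*?)''", re.DOTALL)


def curly_quotes(text: str) -> str:
    return _TEX_QUOTES.sub(lambda m: "\u201c" + m.group(1) + "\u201d", text)


print(curly_quotes("They would write ``like this''."))
# They would write “like this”.
```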

@mikaeru@mastodon.social

“I'm still waiting for him to learn about Unicode, then mandate US ASCII on all government websites”
[Andrew West 魏安 - January 2025]

bsky.app/profile/babelstone.co

@mikaeru@mastodon.social

Some people in the US are possibly nostalgic for the "ASCII" acronym, where "A" stands for "American"... Unicode is definitely more "universal"; some might even say "woke":
- It encodes characters of writing systems from all around the world.
- The Script Encoding Initiative (SEI) comes from the University of California, Berkeley.
- It encodes "diversity" symbols such as ♀♂⚢⚣⚤⚥⚦⚧⚨⚩⚲, 🏳️‍🌈 🏳️‍⚧️, or even 🇪🇺 🇺🇳.
- More than two-thirds of the Unicode characters originate from China.

@mikaeru@mastodon.social

According to the "Can I Unicode‽" web page, as of today, the browser is still "stuck" in Unicode 15.1, while the latest version of Unicode is 17.0!

mathiasbynens.github.io/caniun

The fact that the Electron framework is based on Chromium probably explains why it is still lagging behind too...

Supporting Unicode 16.0 would allow me to produce a final stable version of my Unicopedia Plus app, before I can start working on a version for Unicode 17.0.

@mikaeru@mastodon.social

Until now, I've been able to provide a working (pre-release though) edition of my Unicopedia Plus app, targeting a specific version not yet supported by the framework, by embedding a copy of all the up-to-date Unicode data files, and making use of the `regexpu-core` module to emulate the most "critical" regular expressions, but this is merely a workaround, not what it has been designed for in the first place...

github.com/mathiasbynens/regex

regexpu-core: regexpu’s core functionality, i.e. `rewritePattern(pattern, flag, options)`, which enables rewriting regular expressions that make use of the ES6 `u` flag into equivalent ES5-compatible regular expression patterns.

@mikaeru@mastodon.social

As you might expect, my main application Unicopedia Plus relies heavily on Electron...

Today, I updated the Electron framework to its latest major version 39.0.0, hoping it would at last bring full support for Unicode 16.0, published by the UTC in September 2024, but unfortunately no: it is still stuck in Unicode 15.1, published in September 2023! Moreover, Unicode 17.0 has already been officially released...

🔗 codeberg.org/tonton-pixel/unic

unicopedia-plus: Developer-oriented set of Unicode, Unihan, Unikemet & emoji utilities wrapped into one single app, built with Electron.

@kirschwipfel@nerdculture.de

Does anyone have details on how the worm hides itself via Unicode?

The worm is very sophisticated, but I'm particularly interested in this aspect, because it supposedly keeps "normal" code analyzers from detecting it, while the JavaScript interpreter still accepts it. I would like to examine this with other interpreters and other editors.

Ideally, someone would have the worm (or parts of it). But I'd also take detailed descriptions that would let me reproduce it.
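The post does not say which characters the worm uses. As a hypothetical starting point for such experiments, here is a Python sketch that flags characters most editors render as nothing: format characters (category Cf), private-use characters (Co), and variation selectors. That choice of categories is an assumption of this example; legitimate code can contain them too, so treat hits as warnings:

```python
import unicodedata


def is_suspicious(ch: str) -> bool:
    """Flag characters that usually render invisibly (assumed categories, see above)."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF:  # variation selectors
        return True
    return unicodedata.category(ch) in ("Cf", "Co")


def scan(source: str) -> list[tuple[int, str]]:
    """Return (offset, 'U+XXXX') for each suspicious character in `source`."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(source) if is_suspicious(ch)]


js = 'const s = "\ufe01\U000e0100";\neval(s)'
print(scan(js))  # [(11, 'U+FE01'), (12, 'U+E0100')]
```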

@screambiogenesis@mastodon.social

Thanks to a ~3700 year old clay disc found in Crete a hundred years ago, Unicode has a dude with a mohawk.

U+101D1: 𐇑

Detail shot of the Phaistos Disc, a presumably Minoan artifact from the second millennium BCE, found in southern Crete in 1908.

In the picture, many pictographic characters from the spiral track on the as-yet-undeciphered disc are visible, such as shields, eagles, horns, combs, tuna fish, and more.

The "plumed head" character is seen at least four times in this small inlay, indicating that punk rock was very important to Minoan culture.

@michels@mastodon.social

That was new to me. You can combine any character with COMBINING ENCLOSING KEYCAP (U+20E3) to get a character for a keyboard shortcut.
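Because U+20E3 is a combining mark, it encloses only the single character right before it. A small Python sketch (`keycap` is a hypothetical helper; how the result looks depends entirely on the font). Only 0-9, # and * followed by VARIATION SELECTOR-16 form the standardized emoji keycap sequences:

```python
KEYCAP = "\u20e3"  # COMBINING ENCLOSING KEYCAP
VS16 = "\ufe0f"    # VARIATION SELECTOR-16, requests emoji presentation


def keycap(label: str) -> str:
    """Put each character of `label` in its own keycap (rendering is font-dependent)."""
    return "".join(ch + KEYCAP for ch in label)


print(keycap("⌘K"))
print(keycap("1"), "1" + VS16 + KEYCAP)  # text-style vs. emoji keycap sequence
```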

@rnd@toot.cat

one thing i don't understand at all is why the standard is specifically set up so that code points larger than U+10FFFF are treated as invalid, not even "reserved for future use"

are we completely sure that we NEVER end up needing more than 1,114,112 code points? sure, right now we're at 159,801, less than 15%, but who knows what will happen in the future
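The U+10FFFF ceiling is not arbitrary: it is the largest value UTF-16 can reach with surrogate pairs, and UTF-8 was later restricted to the same range (RFC 3629). A small Python sketch of the arithmetic; `to_surrogates` is a helper written for this example:

```python
# A surrogate pair carries 20 bits of payload on top of the 0x10000 code
# points that fit in a single UTF-16 code unit.
MAX_CODE_POINT = 0x10000 + (1 << 20) - 1
print(hex(MAX_CODE_POINT), MAX_CODE_POINT + 1)  # 0x10ffff 1114112


def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= MAX_CODE_POINT:
        raise ValueError("only U+10000..U+10FFFF are encoded with surrogates")
    offset = cp - 0x10000
    return 0xD800 | (offset >> 10), 0xDC00 | (offset & 0x3FF)


print([hex(u) for u in to_surrogates(MAX_CODE_POINT)])  # ['0xdbff', '0xdfff']

# Python enforces the same ceiling: chr(0x110000) raises ValueError.
try:
    chr(MAX_CODE_POINT + 1)
except ValueError as exc:
    print("rejected:", exc)
```

Going beyond U+10FFFF would therefore require abandoning UTF-16 as it exists today, not just a policy change.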

@hongminhee@hollo.social

Hello, I'm an open source software engineer in my late 30s living in , , and an avid advocate of and the .

I'm the creator of @fedify, an server framework in , @hollo, an ActivityPub-enabled microblogging software for single users, and @botkit, a simple ActivityPub bot framework.

I'm also very interested in East Asian languages (so-called ) and . Feel free to talk to me in , (), or (), or even in Literary Chinese (, )!

@amake@mastodon.social

Newly covered code points in iOS 26.

I have to admit I have not updated anything to 26 yet. At least on Mac I usually wait for issues to be cleared up, but this one might take me a while...

㇀㇁㇂㇃㇄㇅㇆㇇㇈㇉㇊㇋㇌㇍㇎㇏㇐㇑㇒㇓㇔㇕㇖㇗㇘㇙㇚㇛㇜㇝㇞㇟㇠㇡㇢㇣㇤㇥𞓐𞓑𞓒𞓓𞓔𞓕𞓖𞓗𞓘𞓙𞓚𞓛𞓜𞓝𞓞𞓟𞓠𞓡𞓢𞓣𞓤𞓥𞓦𞓧𞓨𞓩𞓪𞓫𞓮𞓯𞓬𞓭𞓰𞓱𞓲𞓳𞓴𞓵𞓶𞓷𞓸𞓹𠁣𠃛𠊎𠖄𠖫𠗻𠘆𠜖𠞩𠞭𠠃𠠝𠠫𠢕𠴭𠺅𠺣𠻞𡌴𡟓𡨞𡳞𡽜𢄧𢎙𢒉𢓜𢛟𢜳𢬳𢯭𢯾𢱤𢲴𢳪𢶀𢺴𢻷𢼌𢼛𢿞𣁳𣍐𣗺𣦼𣩈𣮈𣲩𣸤𣼎𤁢𤊶𤍒𤐙𤐰𤖯𤘅𤞚𤡯𤲍𤶃𤸁𤺅𤺪𤿎𥉔𥌚𥍉𥏘𥐵𥯟𥯥𥰔𥴊𥽕𦃓𦉎𦊓𦒨𦘅𦜆𧉅𧉟𧌄𧜞𧩣𧮙𧰵𧺤𧻴𧿳𨂿𨅔𨒇𨢑𩏠𩑾𩔵𩚨𩛩𩜄𩜇𩜰𩟗𩣳𩨑𩵱𩸙𩼧𪀋𪐞𪖐𪖶𪘒𪜶𪢼𪳕𪹚𫓩𫜼𫜽𫝏𫝘𫝙𫝞𫝺𫝻𫞭𫞼𫟂𫟊𫟧𫠄𫠛𫣆𫰡𬈜𬏛𬠖𬤐𬦰𬬺𬮤𮀎𮣳𮭦𮯴𰣻𰵝𰵞𰵧𰹬𰾫𱂐𱮒𱱿𱳪𲂎𲓖

@Edent@mastodon.social

Android will *not* be getting most of the Unicode 17 updates.

Some of its fonts are over a decade out of date - and Google refuses to re-use its own Noto font stack.

I've raised the issue at:
issuetracker.google.com/issues

If you're a Googler please ask someone to prioritise this issue. Can everyone else please hit the +1 button.

@triker@mstdn.plus

I just learned how to type Unicode letters and dingbats in Linux!

Press Ctrl + Shift + U (all three keys at once), then release all three.

Then type the character's code point in hexadecimal and press Enter.

en.wikipedia.org/wiki/List_of_

E.g.:

Ctrl + Shift + U, then 2713, gives ✓, a tick or check mark.

Similarly, I can write ñ (n with tilde) with:

Ctrl + Shift + U, then 00f1

See the Dingbats block for more check mark choices:
en.wikipedia.org/wiki/Dingbats
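The digits typed after Ctrl + Shift + U are simply the code point written in hexadecimal. Here is a minimal Python sketch of the same conversion in both directions (the helper names `from_hex` and `to_hex` are made up for illustration):

```python
import unicodedata


def from_hex(code: str) -> str:
    """The character for the hex digits typed after Ctrl + Shift + U."""
    return chr(int(code, 16))


def to_hex(ch: str) -> str:
    """The hex digits to type for a given single character."""
    return f"{ord(ch):04x}"


if __name__ == "__main__":
    for code in ("2713", "00f1"):
        ch = from_hex(code)
        print(code, ch, unicodedata.name(ch))
    # 2713 ✓ CHECK MARK
    # 00f1 ñ LATIN SMALL LETTER N WITH TILDE
```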

All of Unicode here:
home.unicode.org/

@crickxson@post.lurk.org

Each time I use shapecatcher.com,
I'm grateful to its creator for having built it and keeping it running.
"You know what some character looks like, but you've forgotten its name or its code point. Now what do you do? Shapecatcher is a new website that helps you to find specific Unicode characters, just by their shape. Currently about 10,000 of the most important Unicode characters are compared to your sketch and are analysed for similarities.
Under the hood, Shapecatcher uses so-called "shape contexts" to find similarities between two shapes. Shape contexts, a robust mathematical way of describing the concept of similarity between shapes, is a feature descriptor first proposed by and ."
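To give a feel for the descriptor: for each point of a shape, a shape context is a histogram of where the other points lie, with bins that are uniform in log-distance and in angle. Two shapes are then compared by how different their histograms are. The Python below is a minimal sketch of that idea, not Shapecatcher's actual code. The bin counts (5 radial × 12 angular), the radii, and the chi-square cost are assumed, illustrative choices, and the code expects at least two distinct points.

```python
import math
from itertools import combinations


def mean_distance(points):
    """Average distance over all pairs of points (used to normalise scale)."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def shape_context(points, i, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Normalised log-polar histogram of the other points, seen from points[i].

    Dividing distances by the mean pairwise distance makes the descriptor
    scale-invariant; using relative positions makes it translation-invariant.
    """
    scale = mean_distance(points)
    log_inner = math.log(r_inner)
    log_span = math.log(r_outer) - log_inner
    hist = [0] * (n_r * n_theta)
    xi, yi = points[i]
    for j, (xj, yj) in enumerate(points):
        if j == i:
            continue
        r = math.dist((xi, yi), (xj, yj)) / scale
        if not r_inner <= r < r_outer:
            continue  # outside the log-polar "disc": ignored
        r_bin = min(int((math.log(r) - log_inner) / log_span * n_r), n_r - 1)
        theta = math.atan2(yj - yi, xj - xi) % (2 * math.pi)
        t_bin = int(theta / (2 * math.pi) * n_theta) % n_theta
        hist[r_bin * n_theta + t_bin] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]


def chi2_cost(h1, h2):
    """Chi-square distance between two normalised histograms (0 means identical)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    bigger = [(10 + 3 * x, 10 + 3 * y) for x, y in square]  # moved and scaled
    line = [(0, 0), (1, 0), (2, 0), (3, 0)]
    print(chi2_cost(shape_context(square, 0), shape_context(bigger, 0)))
    print(chi2_cost(shape_context(square, 0), shape_context(line, 0)))
```

A full matcher would also pair up points between the two shapes (for example by minimising the total cost over all pairings); this sketch only compares one point's descriptor.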

Screen capture of the drawing zone of ShapeCatcher.

1st results of the recognizing process

Following results of the recognizing process

Following results of the recognizing process

@timbray@cosocial.ca

Three small announcements:
1. RFC 9839, a guide to which Unicode characters you should never use: rfc-editor.org/rfc/rfc9839.htm
2. Blog piece with background and context, “RFC 9839 and Bad Unicode”: tbray.org/ongoing/When/202x/20
3. A little Go library that implements 9839’s exclusion subsets: github.com/timbray/RFC9839
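As I understand it, the RFC's problematic categories are surrogates, legacy control codes, and noncharacters. The Python below is an illustrative sketch of checks for those three categories. It is not the API of the Go library above, and the exact boundaries of the RFC's named subsets should be taken from the RFC itself.

```python
def is_surrogate(cp: int) -> bool:
    """U+D800..U+DFFF: meaningful only inside UTF-16, never as characters."""
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, plus the last two code points of every plane (U+xxFFFE, U+xxFFFF)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_legacy_control(cp: int) -> bool:
    """C0 controls except tab, LF and CR; DEL; and the C1 controls U+0080..U+009F."""
    return (cp < 0x20 and cp not in (0x09, 0x0A, 0x0D)) or 0x7F <= cp <= 0x9F


def problematic(text: str) -> list[tuple[int, str]]:
    """(index, reason) for every code point of text in one of the three categories."""
    found = []
    for index, ch in enumerate(text):
        cp = ord(ch)
        if is_surrogate(cp):
            found.append((index, "surrogate"))
        elif is_noncharacter(cp):
            found.append((index, "noncharacter"))
        elif is_legacy_control(cp):
            found.append((index, "legacy control"))
    return found


if __name__ == "__main__":
    # Python strings may hold lone surrogates, which makes them testable here.
    print(problematic("ok\x07\ud800\ufffe\x85"))
```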

american: OwO
cyrillic:  ꙮшꙮ
armenian: ՕաՕ
georgian: ტოტ ႣⴍႣ
gothic:  𐍈𐌸𐍈
greek:  ΘωΘ ΩωΩ ΦωΦ ΟωΟ
coptic:  ⲐⲱⲐ ⲪⲱⲪ ⲞⲱⲞ
hebrew: סשס
ge'ez:  ዐሠዐ
chinese: 口山口
inuktitut: ᑭᓚᓗᑫ ᐁᓚᓗᐁ
vai:   ꖘꕀꖘ ꖴꕀꖴ
khmer: ឰឃឰ ២ឃ២ ៙ឃ៙
sinhala: ඞ෴ඞ ට෴ට මයම
tibetan: ༠ྻ ༠ ༠ྏ ༠
japanese: ᶘᵒᴥᵒᶅ

@mikaeru@mastodon.social

Beautifully crafted BabelStone Han font, by Andrew West 魏安

BabelStone Han v. 15.1.3 is a free font with over 57,000 Han characters (, , ), and 62,061 Unicode characters in total. It is a Song/Ming style (宋体/明體) font, with glyphs modelled on the official character forms used in the People's Republic of China, and is primarily intended for writing Modern Standard Chinese, Classical Chinese, and various Sinitic languages and dialects.

🔗 babelstone.co.uk/Fonts/Han.html

Repeated: 龙
U+9F99 U+31342 U+2EE5D

@mikaeru@mastodon.social

New in the CJK Variations utility of Unicopedia Sinica:

- Support for the latest Ideographic Variation Database (IVD 2025), adding the new CAAPH Collection.

- Support for the updated BabelStone Collection (unregistered), based on the latest BabelStone Han font (v17.0.0 BETA), by Andrew C. West (魏安), 1960-2025 RIP (安息吧).
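Entries in the IVD are ideographic variation sequences: a base ideograph followed by one of the variation selectors VS17 to VS256 (U+E0100 to U+E01EF). Here is a minimal Python sketch of building and recognizing such a sequence. The helper names are illustrative, and the code does not check whether a sequence is actually registered in the IVD, or in which collection.

```python
import unicodedata

VS17 = 0xE0100   # VARIATION SELECTOR-17
VS256 = 0xE01EF  # VARIATION SELECTOR-256


def ivs(base: str, n: int) -> str:
    """Build base + VARIATION SELECTOR-n, for n in 17..256."""
    if not 17 <= n <= 256:
        raise ValueError("ideographic variation sequences use VS17..VS256")
    return base + chr(VS17 + n - 17)


def split_ivs(seq: str):
    """Return (base, 'VSn') if seq is one character plus VS17..VS256, else None."""
    if len(seq) == 2 and VS17 <= ord(seq[1]) <= VS256:
        return seq[0], f"VS{ord(seq[1]) - VS17 + 17}"
    return None


if __name__ == "__main__":
    seq = ivs("\u4e9b", 17)  # U+4E9B, one of the characters in the screenshots
    print(split_ivs(seq), unicodedata.name(seq[1]))
```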

🔗 https://codeberg.org/tonton-pixel/unic

Screenshot of the CJK Variations utility of Unicopedia Sinica for Unicode character U+3AB4

Screenshot of the CJK Variations utility of Unicopedia Sinica for Unicode character U+4E9B

@Timwi@nerdculture.de

I just found out that Unicode has segment-display digit characters. The screenshot below is all in one font. The characters are U+1FBF0 to U+1FBF9. Unicode is gorgeous.
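U+1FBF0 to U+1FBF9 run from SEGMENTED DIGIT ZERO to SEGMENTED DIGIT NINE in order, so converting ordinary digits is a fixed offset. Here is a minimal sketch (the function name is made up, and displaying the result still needs a font that covers these characters):

```python
SEGMENTED_DIGIT_ZERO = 0x1FBF0  # U+1FBF0; digits 1-9 follow in order up to U+1FBF9


def to_segmented(text: str) -> str:
    """Replace ASCII digits 0-9 with their segmented-display counterparts."""
    return "".join(
        chr(SEGMENTED_DIGIT_ZERO + ord(ch) - ord("0")) if "0" <= ch <= "9" else ch
        for ch in text
    )


if __name__ == "__main__":
    print(to_segmented("12:05"))
```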

@mikaeru@mastodon.social · Reply to Michel Mariani

@electronjs

The lack of Electron support for the latest Unicode version is a major hindrance for my open-source Unicopedia Plus application, which I have had to keep in beta for a long time because of it...

codeberg.org/tonton-pixel/unic

codeberg.org

unicopedia-plus

Developer-oriented set of Unicode, Unihan, Unikemet & emoji utilities wrapped into one single app, built with Electron.

@michels@mastodon.social

I added typographic guides to my Unicode viewer. I first tried the new TextRenderer, but found it too limited. I then switched back to CoreText. However, I then noticed that SwiftUI was cutting off some parts of the glyphs. It seems that they don’t expect the glyphs to extend beyond their bounding box.

@mikaeru@mastodon.social

Apart from the issue of line formatting of plain text in the new Unicode contact form <support.unicode.org/osticket/o>, it appears that some pretty innocuous characters such as the vertical bar | or the degree sign ° are getting stripped out from the latest reports, in <unicode.org/review/pri526/> for instance.

Ironically enough, it seems that the Unicode contact form is not Unicode-conformant/compliant then. Maybe some kind of "Make ASCII Great Again" thing?

Example of vertical bar | character getting stripped out from a PRI 526 report

Example of degree sign ° character getting stripped out from a PRI 526 report

@amake@mastodon.social

The iOS 18.5 SDK finally came out and the only change for Unicode coverage is the *removal* of a bunch of Sinhala codepoints:

ඁ෦෧෨෩෪෫෬෭෮෯𑇡𑇢𑇣𑇤𑇥𑇦𑇧𑇨𑇩𑇪𑇫𑇬𑇭𑇮𑇯𑇰𑇱𑇲𑇳𑇴

(Those of you on iOS 18.4: Enjoy seeing those glyphs while you can!)

@mikaeru@mastodon.social · Reply to Michel Mariani

Unicode's new contact form at <support.unicode.org/osticket/o> is apparently an HTML editor "in disguise"; the only way I found to force it to keep the formatting of my plain text messages was to select the HTML mode and paste the text inside a <pre></pre> tag...

Still, some content gets unexpectedly stripped out after the report is submitted, such as text between "<" and ">".
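If the form treats input as HTML, text between "<" and ">" is parsed as a tag and dropped. Escaping the text before wrapping it in <pre></pre> is a plausible workaround. Below is a sketch using Python's standard `html.escape`; whether this particular form then keeps the escaped text intact is untested.

```python
import html


def as_preformatted_html(plain: str) -> str:
    """Wrap plain text in <pre> so line breaks survive, escaping &, < and >
    so an HTML editor cannot mistake parts of the text for tags or entities."""
    return "<pre>" + html.escape(plain, quote=False) + "</pre>"


if __name__ == "__main__":
    print(as_preformatted_html("U+007C | is not a <tag> & neither is this"))
```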

@mikaeru@mastodon.social

In case my feedback to the UTC gets garbled once again, here are the links to the plain text messages I attempted to submit through copy-paste from their new contact page <support.unicode.org/osticket/o>: no truly WYSIWYG editor, no basic preview mode either...

tonton-pixel.codeberg.page/PRI
tonton-pixel.codeberg.page/PRI
tonton-pixel.codeberg.page/PRI

I'm dreaming of a simple world without technology wanting to "help" us so much. We shouldn't have to struggle to achieve simple tasks...

@mikaeru@mastodon.social

From time to time (since this represents a tremendous amount of translation/adaptation work), a French version of the "code charts" gets published by the Unicode Consortium: the latest one is for Unicode 16.0:

unicode.org/Public/16.0.0/char

This is especially useful for French speakers in various countries, but it may soon be obsolete for Quebec, in case it gets "absorbed" by a neighboring country whose official language is now English only...

@mikaeru@mastodon.social

From time to time (this represents an enormous amount of adaptation work), a French version of the "code charts" is published by the Unicode Consortium; the latest one is for Unicode 16.0:

unicode.org/Public/16.0.0/char

Unfortunately, it may soon become obsolete for the French speakers of the beautiful province of Quebec, should it be "absorbed" by a neighboring country whose official language is now English only...

@SteveFaulkner@mastodon.social

👁️short note on emoji text alternative variations

"Unicode symbols do not have inbuilt text alternatives. They are exposed in the browser accessibility tree as a text symbol"

html5accessibility.com/stuff/2
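One common authoring pattern for giving an emoji an explicit text alternative is to wrap it in an element with role="img" and an aria-label. This is a general technique, not necessarily what the linked note recommends. Here is a small Python helper (hypothetical name) that emits such markup with escaping:

```python
import html


def emoji_with_label(emoji: str, label: str) -> str:
    """Markup exposing `label` as the text alternative for `emoji` (role="img" pattern)."""
    return f'<span role="img" aria-label="{html.escape(label)}">{html.escape(emoji)}</span>'


if __name__ == "__main__":
    print(emoji_with_label("\U0001F441\uFE0F", "eye"))
```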

@mikaeru@mastodon.social

Unicopedia Anatolica is a developer-oriented set of utilities related to Anatolian hieroglyphs, wrapped into one single app, built with Electron.

Repository: 🔗 codeberg.org/tonton-pixel/unic

Unicopedia Anatolica Social Preview

@mikaeru@mastodon.social

Unicopedia Ægypta is a developer-oriented set of utilities related to Egyptian hieroglyphs, wrapped into one single app, built with Electron.

Repository: 🔗 codeberg.org/tonton-pixel/unic

Unicopedia Ægypta Social Preview

@mikaeru@mastodon.social
Unicopedia Plus Social Preview

@mikaeru@mastodon.social

Unicode Symbols: Diversity

U+2640 FEMALE SIGN
U+2642 MALE SIGN
U+26A2 DOUBLED FEMALE SIGN
U+26A3 DOUBLED MALE SIGN
U+26A4 INTERLOCKED FEMALE AND MALE SIGN
U+26A5 MALE AND FEMALE SIGN
U+26A6 MALE WITH STROKE SIGN
U+26A7 MALE WITH STROKE AND MALE AND FEMALE SIGN
U+26A8 VERTICAL MALE WITH STROKE SIGN
U+26A9 HORIZONTAL MALE WITH STROKE SIGN
U+26B2 NEUTER
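A list like the one above can be regenerated from the Unicode Character Database. Python's standard `unicodedata` module ships a copy of it, but its version depends on the Python release, so very new characters may come out as `<unnamed>`. The helper name is illustrative:

```python
import unicodedata

# The eleven symbols listed above, written as escapes to avoid font issues.
DIVERSITY = "\u2640\u2642\u26a2\u26a3\u26a4\u26a5\u26a6\u26a7\u26a8\u26a9\u26b2"


def describe(chars: str) -> list[str]:
    """One 'U+XXXX NAME' line per code point, as in the list above."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in chars]


if __name__ == "__main__":
    print("\n".join(describe(DIVERSITY)))
```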

Unicode Symbols: Diversity

♀♂⚢⚣⚤⚥⚦⚧⚨⚩⚲

@mikaeru@mastodon.social

Unicode Emoji: Hearts Galore

U+2764 U+FE0F U+1FA77 U+1F9E1 U+1F49B U+1F49A U+1F499 U+1FA75 U+1F49C U+1F90E U+1F5A4 U+1FA76 U+1F90D

U+1F49F U+2764 U+FE0F U+200D U+1F525 U+1F494 U+2764 U+FE0F U+200D U+1FA79 U+2763 U+FE0F U+1F498 U+1F493 U+1F497 U+1F496 U+1F49D U+1F495 U+1F49E

U+1F970 U+1F60D U+1F618 U+1F63B U+1F48C U+1FAF6 U+1FAF6 U+1F3FB U+1FAF6 U+1F3FC U+1FAF6 U+1F3FD U+1FAF6 U+1F3FE U+1FAF6 U+1F3FF U+1FAC0
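These listings are flat sequences of code points, so they don't mark where one emoji ends. For example, "U+1FAF6 U+1FAF6 U+1F3FB" is 🫶 followed by 🫶🏻, and U+2764 U+FE0F U+200D U+1F525 is a single ZWJ emoji (❤️‍🔥). Here is a minimal sketch that turns such a listing back into text (the function name is hypothetical):

```python
def from_code_points(spec: str) -> str:
    """Turn a space-separated 'U+XXXX U+YYYY ...' listing back into text."""
    chars = []
    for token in spec.split():
        if not token.upper().startswith("U+"):
            raise ValueError(f"not a U+ code point: {token!r}")
        chars.append(chr(int(token[2:], 16)))
    return "".join(chars)


if __name__ == "__main__":
    print(from_code_points("U+2764 U+FE0F U+200D U+1F525"))  # one emoji, four code points
```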

Unicode Emoji: Hearts Galore

❤️🩷🧡💛💚💙🩵💜🤎🖤🩶🤍
💟❤️‍🔥💔❤️‍🩹❣️💘💓💗💖💝💕💞
🥰😍😘😻💌🫶🫶🏻🫶🏼🫶🏽🫶🏾🫶🏿🫀

@mikaeru@mastodon.social

Unicode Emoji: Sweat & Tears

U+1F4A6 U+1F4A7 U+1F979 U+1F639 U+1F63F

U+1F602 U+1F605 U+1F613 U+1F622 U+1F625 U+1F62A U+1F62D U+1F630 U+1F923 U+1F972 U+1F975

Unicode Emoji: Sweat & Tears

💦💧🥹😹😿
😂😅😓😢😥😪😭😰🤣🥲🥵

@mikaeru@mastodon.social

Unicode Emoji: Skin Tones

U+1F473 U+1F473 U+1F3FB U+1F473 U+1F3FC U+1F473 U+1F3FD U+1F473 U+1F3FE U+1F473 U+1F3FF

U+1F478 U+1F478 U+1F3FB U+1F478 U+1F3FC U+1F478 U+1F3FD U+1F478 U+1F3FE U+1F478 U+1F3FF
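Each skin-tone variant above is the base emoji followed by one of the five emoji modifiers U+1F3FB to U+1F3FF. Here is a sketch with a hypothetical helper name. It does not check whether a given base emoji actually accepts modifiers:

```python
# The five emoji modifiers U+1F3FB..U+1F3FF (Fitzpatrick types 1-2 to 6).
SKIN_TONE_MODIFIERS = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]


def with_skin_tones(base: str) -> list[str]:
    """The base emoji followed by its five skin-tone modifier sequences."""
    return [base] + [base + modifier for modifier in SKIN_TONE_MODIFIERS]


if __name__ == "__main__":
    for base in ("\U0001F473", "\U0001F478"):  # the two base emoji from the post
        print(" ".join(with_skin_tones(base)))
```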

Unicode Emoji: Skin Tones

👳➔👳🏻👳🏼👳🏽👳🏾👳🏿
👸➔👸🏻👸🏼👸🏽👸🏾👸🏿

@mikaeru@mastodon.social

Unicode Emoji: Japanese Buttons

U+1F201 U+1F202 U+FE0F U+1F233 U+1F237 U+FE0F U+1F236 U+1F21A U+1F251 U+1F238 U+1F23A U+000A U+1F22F U+1F250 U+1F239 U+1F232 U+1F234 U+3297 U+FE0F U+3299 U+FE0F U+1F235

Unicode Emoji: Japanese Buttons

🈁🈂️🈳🈷️🈶🈚🉑🈸🈺
🈯🉐🈹🈲🈴㊗️㊙️🈵

Offering a new fediverse symbol: ꙮ

The previously suggested symbol ⁂ is good at depicting a group and unity, but it is poor in terms of associations: “3 snowflakes”.

Polish fediusers have noticed a fragment of an old Russian manuscript that mentions the ‘many-eyed seraphim’ (серафим многоокий). An unknown 15th-century monk played with the combination of the letters oo, turning them into a multi-eyed creature. The character is found in only one manuscript, but despite this, it has been added to Unicode.

Not only does the symbol beautifully reflect the unity of the fediverse, but it also depicts an all-seeing, open-minded, wise and powerful being (Ezekiel 1:18, 10:12, etc.).

also: social.hackerspace.pl/@q3k/110

@achadwick@urbanists.social

Hey, fedi nerds! :boostRequest:

OpenStreetMap's Andy Mabbett (@Pigsonthewing) is asking whether anyone knows of any instances of the Ordnance Survey's bench mark symbol appearing in actual print, on a page. It looks a bit like ⭱ or ⤒, but with a broader arrow, and is usually found carved on stone or brick all over the UK/ROI.

Their goal is to propose it as a Unicode symbol! community.openstreetmap.org/t/

Any known international usage of this symbol would doubtless be appreciated too.

@openstreetmap

A non-print example.

The most common form of the symbol, although other variants exist. It's carved into a smooth block of stone on the side of a building or monument plinth, and it looks like a capital T with an upside-down V overlaid onto it, so that the two angled lines from the V come together with the T's vertical stroke to meet its horizontal stroke at a single point. All the bottom lines taper toward that point. Looking at it another way, it's a horizontal line with an arrow pointing at it, saying "here, this level!"

They were used for marking height reference points during various surveys of the British Isles.

Photo by Mike Taylor on geograph.org.uk, CC:by

Another non-print example.

This is another form of the symbol; I don't know how common it is. This one's a pre-cast metal (?) plaque with a serial number, set into a wall. The arrow lacks the top bar, but above it, along with the O and S of Ordnance Survey, are some very specific-looking slots. Perhaps the slots accepted some sort of surveying equipment.

Photo by Gary Rogers on geograph. Links can be found in this thread below.

@mikaeru@mastodon.social · Reply to Michel Mariani

Today (April Fools' Day), Adobe is apparently back on the list of full (voting) members of the Unicode Consortium, but for how long this time: one full year?

"It goes away and it comes back
It's made of tiny little nothings
You sing it and you dance it
And it comes back, it sticks in your head
Like a popular song"

Full members (voting) of the Unicode Consortium: Adobe, Airbnb, Amazon, Apple, Google, Meta, Microsoft, Salesforce, Translated.

home.unicode.org/membership/me

@SnoopJ@hachyderm.io

the most important part of Unicode history is when a mouse fell out of a light fixture and got added to the count of members present at a Unicode Technical Committee meeting (9 Nov 2016)

unicode.org/L2/L2016/16325.htm

Screenshot of meeting notes for UTC Meeting 149. Text reads:

Mouse now present. 6.502 members represented.

[149-A94] Action Item for Landlord: Capture and exile the mouse that just fell out of the light fixture.

@sibaku@mas.to

Found out something interesting/annoying related to Unicode! There is an issue with the character 浅. You might see it in one of two ways (see screenshots), depending on which font you use, which was the cause of my confusion. One form has 2 horizontal strokes and the other has 3. So why is that?

The simplified Chinese Hanzi equivalent of 浅

The Japanese kanji 浅

@doctormo@floss.social

It might have taken an ungodly amount of time. But getting these corner cases right in this PDF export is going to mean the world to a lot of people.

Arabic and Hebrew and non messing up the glyphs.
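Hebrew letters have the bidirectional class R and Arabic letters AL (Latin letters are L). The Unicode Bidirectional Algorithm (UAX #9) works from these classes when it reorders mixed text. Python's `unicodedata` gives a quick way to inspect them. This covers direction only, not Arabic contextual shaping, which a PDF exporter must also get right.

```python
import unicodedata


def bidi_classes(text: str) -> set[str]:
    """Set of Unicode bidirectional classes of the non-space characters in text."""
    return {unicodedata.bidirectional(c) for c in text if not c.isspace()}


if __name__ == "__main__":
    for sample in ("מילים נסתרות", "كلمات مخفية", "Text on Path"):
        print(sample, sorted(bidi_classes(sample)))
```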

Sample Text on three PDF pages read:

מילים נסתרות

كلمات مخفية
مرحبا بالعالم

"Text on Path" curved on a thick line
"تجربة نص على المنحى" curved on a thin line

"What is Lorem Ipsum?"
... full text explaining lorem ipsum flowing around a large black circle ...

"Can we do Arabic?"
... A passage in Arabic from the Quran flowing around a smaller black circle ...

@phrawzty@hachyderm.io

Today I learned that there is a specific "record separator" control character, formally known as "U+001E Information Separator Two".

codepoints.net/U+001E

It is meant to mark the separation between two records. One place it could be used is a separated-value file, like a CSV. There, U+001E would take the place of the newline between rows, and its sibling U+001F INFORMATION SEPARATOR ONE (the "unit separator") would take the place of the comma between fields.

This is interesting because these separator characters almost never appear in ordinary text, while commas appear all the time. Using them instead of a comma (or a semicolon, or an exclamation point, or any of the usual separators) could make some data hygiene scenarios much more straightforward.

codepoints.net

U+001E INFORMATION SEPARATOR TWO*: ␞ – Unicode

␞, codepoint U+001E INFORMATION SEPARATOR TWO* in Unicode, is located in the block “Basic Latin”. It belongs to the Common script and is a Control.
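Below is a minimal Python sketch of such a format, assuming fields are joined with the unit separator and records with the record separator. The function names `dumps` and `loads` are illustrative only. It also assumes no field contains either separator; rather than escaping them, it rejects such input:

```python
# ASCII "information separators" (C0 control characters)
US = "\x1f"  # U+001F INFORMATION SEPARATOR ONE, the "unit separator" (between fields)
RS = "\x1e"  # U+001E INFORMATION SEPARATOR TWO, the "record separator" (between records)


def dumps(rows: list[list[str]]) -> str:
    """Serialize rows: fields joined by US, records joined by RS."""
    for fields in rows:
        for field in fields:
            if US in field or RS in field:
                # this sketch neither escapes nor quotes, so refuse ambiguous input
                raise ValueError(f"field contains a separator: {field!r}")
    return RS.join(US.join(fields) for fields in rows)


def loads(data: str) -> list[list[str]]:
    """Inverse of dumps(); an empty string means no records."""
    if not data:
        return []
    return [record.split(US) for record in data.split(RS)]


rows = [["city", "note"], ["Köln", "commas, semicolons; and\nnewlines are fine"]]
encoded = dumps(rows)
print(repr(encoded))
print(loads(encoded) == rows)
```

One gotcha in Python: `str.splitlines()` treats U+001C, U+001D and U+001E as line boundaries. Splitting on `RS` explicitly, as above, avoids surprises.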

@hongminhee@hollo.social

Hello, I'm an open source software engineer in my late 30s living in Seoul, South Korea, and an avid advocate of free/open source software and the fediverse.

I'm the creator of @fedify, an ActivityPub server framework in TypeScript, @hollo, an ActivityPub-enabled microblogging software for single users, and @botkit, a simple ActivityPub bot framework.

I'm also very interested in East Asian languages (so-called CJK) and Unicode. Feel free to talk to me in English, Korean, or Japanese, or even in Literary Chinese!

@thias@mastodon.social

Treasure Hunt – Braille Hints

So I prepared a treasure hunt for my older daughter, which involved some form of coded message. I found a braille table I could 3D-print. Using a real system instead of some made-up code gave me the opportunity to explain how and why braille is used in real life: you find braille in lifts and on staircase handrails.

wiesmann.codiferes.net/wordpre
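The post doesn't say which braille table was printed. Assuming standard uncontracted (grade 1) English braille, the letters a to j use only the top four dots. The letters k to t add dot 3, and u, v, x, y, z add dots 3 and 6 (w is the odd one out). Unicode's Braille Patterns block starts at U+2800 and sets bit n-1 for dot n, so each cell is a simple sum. Below is a minimal sketch with hypothetical helper names. Capitals, numbers and punctuation need extra indicator cells, so they are left out:

```python
# Dot numbers for a-j in uncontracted (grade 1) English braille.
A_TO_J = {
    "a": (1,), "b": (1, 2), "c": (1, 4), "d": (1, 4, 5), "e": (1, 5),
    "f": (1, 2, 4), "g": (1, 2, 4, 5), "h": (1, 2, 5), "i": (2, 4), "j": (2, 4, 5),
}


def letter_dots(ch: str) -> tuple[int, ...]:
    """Raised dots for one ASCII letter (case is ignored)."""
    ch = ch.lower()
    if ch in A_TO_J:
        return A_TO_J[ch]
    if "k" <= ch <= "t":  # k-t: the a-j pattern plus dot 3
        return A_TO_J["abcdefghij"[ord(ch) - ord("k")]] + (3,)
    if ch == "w":  # w does not follow the pattern
        return (2, 4, 5, 6)
    if ch in ("u", "v", "x", "y", "z"):  # a-e plus dots 3 and 6
        return A_TO_J["abcde"["uvxyz".index(ch)]] + (3, 6)
    raise ValueError(f"not an ASCII letter: {ch!r}")


def to_cell(dots: tuple[int, ...]) -> str:
    """Braille Patterns start at U+2800; dot n sets bit n-1."""
    return chr(0x2800 + sum(1 << (d - 1) for d in dots))


def encode(text: str) -> str:
    cells = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            cells.append(to_cell(letter_dots(ch)))
        elif ch == " ":
            cells.append("\u2800")  # BRAILLE PATTERN BLANK
        else:
            cells.append(ch)  # digits and punctuation need extra rules; passed through
    return "".join(cells)


print(encode("look under the bed"))
```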

@simontatham@hachyderm.io

In the old days, you could change a letter between upper and lower case by XORing its character code with 0x20. Of course, if you tried this with anything that wasn't a letter, you'd get nonsense results.

If you try that with code points, it sometimes works, and sometimes doesn't. But Unicode can deliver much more impressive nonsense when it doesn't.

A fun example I just found: the "lower-case" version of CAR is NO PEDESTRIANS.

>>> chr(ord('🚗') ^ 0x20)
'🚷'
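In ASCII the trick works because each capital letter and its lowercase counterpart differ only in bit 0x20 ('A' is 0x41, 'a' is 0x61). The small Python sketch below compares it with the built-in case mapping, which uses Unicode's case tables. The helper name `xor_case` is made up for this example:

```python
def xor_case(ch: str) -> str:
    """The old ASCII trick: flip bit 0x20. Only meaningful for A-Z and a-z."""
    return chr(ord(ch) ^ 0x20)


print(xor_case("a"), xor_case("Z"))   # A z
print(xor_case("\U0001F697"))         # U+1F697 AUTOMOBILE -> U+1F6B7 NO PEDESTRIANS
print(xor_case("\u00ff"))             # ÿ -> ß: a letter, but not the right one
print("\u00ff".upper(), "ß".upper())  # proper case mapping: Ÿ SS
```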

@ptmcg@fosstodon.org

Here are some emojidentifiers for your next Python code:

import math
乁_ツ_ㄏ = None
乁_益_ㄏ = math.nan
ఠ_ఠ = isinstance

def minnums(values: list | 乁_ツ_ㄏ = 乁_ツ_ㄏ):
    if (
        values is 乁_ツ_ㄏ
        or not values  # min() of an empty list would raise ValueError
        or not all(ఠ_ఠ(n, (float, int))
                   for n in values)
    ):
        return 乁_益_ㄏ
    return min(values)
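These names are legal because Python 3 accepts identifiers built from Unicode letters (PEP 3131). The characters 乁, ツ, ㄏ, 益 and ఠ are CJK, katakana, Bopomofo and Telugu letters, joined here by underscores. Real emoji are symbols rather than letters, so Python rejects them. It also rejects the classic shrug, because of its backslash and parentheses. The check below uses `str.isidentifier()`; the wrapper `check_names` is only for illustration:

```python
def check_names(names: list[str]) -> dict[str, bool]:
    """Map each candidate name to whether Python accepts it as an identifier."""
    return {name: name.isidentifier() for name in names}


for name, ok in check_names(["乁_ツ_ㄏ", "乁_益_ㄏ", "ఠ_ఠ", "🚗", "¯\\_(ツ)_/¯"]).items():
    print(repr(name), ok)
```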

@revathskumar@fosstodon.org · Reply to Revath S Kumar :javascript:

Wrote a small web utility to visualize the different string normalization forms of a text.

string-normalize.surge.sh/?str

Not the best design 😄, but feedback is welcome.

Desktop view of the string-normalize web page, showing the NFC, NFD, NFKC and NFKD normalization forms of the text "I ♥ Köln".

Mobile view of the string-normalize web page, showing the NFC, NFD and NFKC normalization forms of the text "I ♥ Köln".
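Python's standard `unicodedata.normalize` can reproduce the four forms the tool displays. For this particular string, each compatibility form matches its canonical counterpart, because ♥ has no compatibility decomposition. Only ö changes: under NFD and NFKD it splits into o plus U+0308 COMBINING DIAERESIS. A ligature such as ﬁ shows where NFKC and NFC actually differ:

```python
import unicodedata

text = "I \u2665 K\u00f6ln"  # "I ♥ Köln", with ö as one precomposed code point

forms = {form: unicodedata.normalize(form, text) for form in ("NFC", "NFD", "NFKC", "NFKD")}
for form, value in forms.items():
    print(form, len(value), " ".join(f"U+{ord(c):04X}" for c in value))

# Compatibility normalization also unpacks ligatures: "ﬁle" -> "file"
print(unicodedata.normalize("NFKC", "\ufb01le"))
```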

@SnoopJ@hachyderm.io

Have you ever "naturally" (i.e. not in a discussion among experts) encountered a font that correctly renders ꙮ?

  • yes
  • no
  • what the hell are you talking about

@ptmcg@fosstodon.org · Reply to Axel Rauschmayer

@rauschma Ah! I did something similar in Python - this is valid Python code:

def ℎ𝕖𝐥l𝙤():
    try:
        ℎ𝙚𝕝𝗹𝘰_ = "Hello"
        w𝔬𝓇ˡ𝚍﹎ = "World"
        𝖕𝘳𝒊𝖓𝑡(f"{𝗵𝒆𝘭𝓵𝚘﹍}, {𝑤º𝘳l𝑑︴}!")
    except T𝗒ₚ𝕖E𝗿𝗋𝗈𝓻 as ᵉ𝒙ⅽ:
        𝐩ᵣ𝚒𝖓𝓉("failed: {}".𝕗𝕠r𝑚𝖺𝘵(ⅇ𝔵𝚌))

if _︳n𝗮𝖒𝓮﹍︳ == "__main__":
    h𝙚ⅼ𝐥𝕠()

ptmcg.pythonanywhere.com/font_

ptmcg.pythonanywhere.com

ᴾ𝘆𝙩𝚑𝓸𝔫 𝐹º𝑛t 𝘔ⅸᵉ𝐫
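This runs because Python normalizes every identifier to NFKC while parsing (PEP 3131). NFKC folds mathematical alphanumerics, modifier letters, small Roman numerals and the presentation-form low lines (﹍, ﹎, ︳, ︴) back to plain ASCII letters and underscores. Below is a short check with the standard library; the helper name `nfkc` is only for illustration:

```python
import unicodedata


def nfkc(name: str) -> str:
    """Normalize a name the way Python normalizes identifiers (NFKC, PEP 3131)."""
    return unicodedata.normalize("NFKC", name)


print(nfkc("ℎ𝕖𝐥l𝙤"))        # the function name in the post
print(nfkc("_︳n𝗮𝖒𝓮﹍︳"))    # the __name__ check in the post

# The parser applies the same folding, so both spellings bind one variable.
namespace: dict = {}
exec("ℎ𝕖𝐥l𝙤 = 42", namespace)
print(namespace["hello"])
```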

@vwbusguy@mastodon.online

"This coding interview is just going to be determining the human friendly length of a unicode utf-8 string."

Junior level dev: "Oh, this is going to be easy. How do they not know about len()?"

Senior level dev: "Oh, brilliant - a test of tolerance for pain by evaluating various code point chains with emoji, accents, and LTR/RTL markers. I'll start by writing some tests for 8-bit ord and char conversions with lookahead evals."

Python docs showing how the same visible letter can have a string length of one or two, depending on which code points are used to encode it.
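In Python, `len()` counts code points, not what a reader would call characters. The rough sketch below illustrates the gap using only the standard library. `rough_grapheme_count` is a hypothetical helper. Real segmentation follows the extended grapheme cluster rules of Unicode Standard Annex #29, which the third-party `regex` module exposes as `\X`:

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def rough_grapheme_count(text: str) -> int:
    """Approximate the number of user-perceived characters.

    Not a full UAX #29 implementation: after NFC normalization it only folds
    combining marks (general category M*, which includes emoji variation
    selectors) and ZWJ-joined characters into the preceding character.
    Flags, skin-tone modifiers and several other UAX #29 cases are not handled.
    """
    count = 0
    joined = False
    for ch in unicodedata.normalize("NFC", text):
        if joined:  # this character is glued to the previous one by a ZWJ
            joined = False
            continue
        if ch == ZWJ:
            joined = True
            continue
        if unicodedata.category(ch).startswith("M"):
            continue
        count += 1
    return count


for sample in ["e\u0301", "Ko\u0308ln", "\u2764\ufe0f", "\U0001F469\u200d\U0001F4BB"]:
    print(repr(sample), len(sample), rough_grapheme_count(sample))
```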

@mikaeru@mastodon.social

In the open-source application `Unicopedia Sinica`, both data files used for the `CJK Components` and the `CJK Related` utilities are now in a consistent JSON format with MIT license: `cjk-ids.json` and `cjk-related.json` respectively.

🔗 codeberg.org/tonton-pixel/unic

CJK Related utility screenshot

CJK Components utility screenshot

CJK Related utility screenshot

@eniko@peoplemaking.games

Btw, here's a little Unicode protip: Unicode defines several code point ranges as Private Use Areas (PUA). You can map code points in these ranges to whatever glyph you want. This can be very handy for custom characters in your game that won't conflict with established Unicode characters.

In our games we use the PUA for keyboard and controller button glyphs.
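Unicode reserves three Private Use Areas: U+E000 to U+F8FF in the Basic Multilingual Plane, plus most of planes 15 and 16 (U+F0000 to U+FFFFD and U+100000 to U+10FFFD). Below is a minimal sketch of the approach the post describes. The button names, the choice of U+E000 as a starting point and the helper names are all hypothetical. The glyphs only appear if the game's own font draws something at those code points:

```python
# The three Private Use Area ranges defined by Unicode
PUA_RANGES = [
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]


def is_private_use(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


# Hypothetical glyph table: each button gets its own PUA code point,
# and the game's font draws the matching icon at that code point.
BUTTONS = ["A", "B", "X", "Y", "START"]
BUTTON_GLYPHS = {name: chr(0xE000 + i) for i, name in enumerate(BUTTONS)}


def render_prompt(template: str) -> str:
    """Replace {A}, {B}, ... placeholders with the button glyphs."""
    return template.format(**BUTTON_GLYPHS)


print(repr(render_prompt("Press {A} to jump, {START} to pause")))
```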

@emnullfuenf@chaos.social

My study "Unicode Spaces" will be published in Slanted Magazine - Experimental Type 3!

Listing of Unicode white space characters

Steamboat Willy formed with whitespaces in text.

Flower formed with whitespaces in text.
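The post does not list which characters the study covers. As a point of reference, Python's `unicodedata` can enumerate every code point in general category Zs (space separator). Below is a minimal sketch; `space_separators` is a hypothetical helper name. Note that some characters people think of as spaces fall outside Zs: U+200B ZERO WIDTH SPACE is a format character (Cf), and tab and newline are control characters (Cc).

```python
import sys
import unicodedata


def space_separators() -> list[str]:
    """Every code point in general category Zs (space separator)."""
    return [chr(cp) for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == "Zs"]


for ch in space_separators():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```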

@jdlh@mstdn.ca · Reply to Jim DeLaHunt

A cool change is that the Core Specification of the Unicode Standard is now released as a static HTML subsite, backed up by an archivable PDF of 1,140 pages.

unicode.org/versions/Unicode16

You can now link to specific sections and paragraphs, e.g.

"Unicode is about plain text, see: unicode.org/versions/Unicode16".

I helped out in a small way with the project to produce the core spec as HTML + PDF. I think it is a marvellous improvement.

@liilliil@mastodon.online

Folks, let's promote our own Slavic, Cyrillic !
"Three snowflakes" (⁂) is just an excuse for endless mockery.

The Polish folks (@brie) found the best candidate: ꙮ, the "many-eyed seraphim" (серафим многꙮкий). A character found in 1928 in just one (!) manuscript, and added to Unicode for that reason alone (!), waited several centuries for its moment.
ru.wikipedia.org/wiki/Мультиок

(English version: im-in.space/@liilliil/11302839)

@amyfou@lingo.lol

I am a (non-tenure track, uni) interested in every single thing about , esp ones, & Side gig in ( lol). I love and will ask you too many questions about your etc . Proud fan. Love 👋

@hongminhee@fosstodon.org · Reply to 洪 民憙 (Hong Minhee)

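Hello, I'm an open source software engineer in my late 30s living in Seoul, and an enthusiastic supporter of free/open source software and the fediverse. My name is 洪 民憙 (Hong Minhee).

I'm also the creator of @fedify, an ActivityPub server framework for TypeScript, and @hollo, a fediverse microblog for a single user.

I'm also very interested in East Asian languages (so-called CJK) and Unicode. Feel free to talk to me in Japanese, English, or Korean (or even in Literary Chinese!).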

@hongminhee@fosstodon.org

Hello, I'm an open source software engineer in my late 30s living in Seoul, South Korea, and an avid advocate of free/open source software and the fediverse.

I'm the creator of @fedify, an ActivityPub server framework in TypeScript, and @hollo, a fediverse microblog for single users.

I'm also very interested in East Asian languages (so-called CJK) and Unicode. Feel free to talk to me in English, Korean, or Japanese, or even in Literary Chinese (漢文)!

@chunshek@prettyaweso.me

post for my own Mastodon instance!

• I’m a 44-year-old jack-of-all-trades.
• I grew up in , lived in the . My partner of 15 years and I moved to in 2020.
• We are “parents” to one remaining dog.
• I speak 6 languages, and have dabbled in many others.
• Things I will nerd out about: , , .
• I am a person of faith, but not a fan of organized religions.
• I type in .
• I curate pop music at @soniccruise.

A man kneels down next to two tilted mailboxes in Taipei, Taiwan, pretending to be carrying one of the mailboxes on his back.

A man standing in front of a wall covered in dozens of containers of various types of instant ramen and udon noodles. The man's facial expression shows amusement.

A top-down shot of a man lying down, looking into the eyes of a shiba inu dog. The dog has curled up into a resting position.

A man happily holding a ripe yellow pineapple in his left hand, while pointing at the pineapple with his right hand, smiling at the camera.

Offering a new : ꙮ

The previously suggested symbol ⁂ is good for depicting a group and unity, but its associations are poor: “3 snowflakes”.

Polish fediverse users noticed a passage in an old Russian manuscript that mentions the ‘many-eyed seraphim’ (серафим многоокий). An unknown 15th-century monk played with the letter combination oo, turning it into a multi-eyed creature. The character is found in only one manuscript, but despite this, it has been added to Unicode.

Not only does the symbol beautifully reflect the unity of the fediverse, but it also depicts an all-seeing, open-minded, wise and powerful being (Ezekiel 1:18, 10:12, etc.).

also: social.hackerspace.pl/@q3k/110

@xChaos@f.cz

Tired of googling Unicode characters for subscript and superscript? So am I :-)

Key chords for typing superscript and subscript (in the Unicode sense) on a Windows keyboard are easy to google. On Linux, however, it's a bit of a science:

1) first Right Alt + Right Shift + Backspace + 2 (yes, a four-key chord)
2) then the character that should become the subscript, for example a digit (which, on the Czech keyboard you're switched to, also takes Shift, so a two-key chord).

H₂O

For superscript, use the same four-key chord but replace the 2 with a 3:

a² + b² = c²

Decent chords, right? The problem is that if you don't press the four-key chord exactly right (?), the Backspace tends to act as a plain backspace and deletes a character... in short, so far it takes me several attempts every time :-)

I didn't understand the guide at all:
abclinuxu.cz/blog/kenyho_stesk
... probably because I don't know which PC key is the "compose key". In the readers' comments, though, I noticed instructions for the Slovak keyboard, which also work for me with the Czech layout, so I'm passing them on.
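Superscript and subscript digits are separate Unicode code points, not formatting. ¹, ² and ³ sit in Latin-1 (U+00B9, U+00B2, U+00B3), the other superscript digits are at U+2070 and U+2074 to U+2079, and the subscript digits are at U+2080 to U+2089. If you are generating such text in code rather than typing it, a minimal sketch with `str.maketrans` might look like the one below. The helpers `chem` and `powers` are hypothetical names. NFKC normalization maps these characters back to plain digits:

```python
import re
import unicodedata

DIGITS = "0123456789"
SUPERSCRIPTS = "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"  # ⁰¹²³⁴⁵⁶⁷⁸⁹
SUBSCRIPTS = "".join(chr(0x2080 + i) for i in range(10))                      # ₀₁₂₃₄₅₆₇₈₉

TO_SUP = str.maketrans(DIGITS, SUPERSCRIPTS)
TO_SUB = str.maketrans(DIGITS, SUBSCRIPTS)


def chem(formula: str) -> str:
    """Subscript digit runs that follow a letter or ')': H2O -> H₂O."""
    return re.sub(r"(?<=[A-Za-z)])\d+", lambda m: m.group().translate(TO_SUB), formula)


def powers(expr: str) -> str:
    """Turn ^n exponents into superscripts: a^2 -> a²."""
    return re.sub(r"\^(\d+)", lambda m: m.group(1).translate(TO_SUP), expr)


print(chem("H2O"), powers("a^2 + b^2 = c^2"))
print(unicodedata.normalize("NFKC", chem("H2O")))  # compatibility folding gives back H2O
```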

@SnoopJ@hachyderm.io

the most important part of Unicode history is when a mouse fell out of a light fixture and got added to the count of members present at a Unicode Technical Committee meeting (9 Nov 2016)

unicode.org/L2/L2016/16325.htm

Screenshot of meeting notes for UTC Meeting 149. Text reads:

Mouse now present. 6.502 members represented.

[149-A94] Action Item for Landlord: Capture and exile the mouse that just fell out of the light fixture.

@mikaeru@mastodon.social

Beautifully crafted BabelStone Han font, by Andrew West 魏安

BabelStone Han v. 15.1.3 is a free font with over 57,000 Han characters and 62,061 Unicode characters in total. It is a Song/Ming style (宋体/明體) font, with glyphs modelled on the official character forms used in the People's Republic of China, and is primarily intended for writing Modern Standard Chinese, Classical Chinese, and various Sinitic languages and dialects.

🔗 babelstone.co.uk/Fonts/Han.html

Repeated: 龙
U+9F99 U+31342 U+2EE5D

@Edent@mastodon.social

🆕 blog! “Internationalise The Fediverse”

We live in the future now. It is OK to use Unicode everywhere. It seems bizarre to me that modern Internet services sometimes "forget" that there's a world outside the Anglosphere. Some people have the temerity to speak foreign languages! And some of those languages have accents on their letters!! Even worse, some …

👀 Read more: shkspr.mobi/blog/2024/02/inter

@blog@shkspr.mobi

Internationalise The Fediverse

shkspr.mobi/blog/2024/02/inter

We live in the future now. It is OK to use Unicode everywhere.

It seems bizarre to me that modern Internet services sometimes "forget" that there's a world outside the Anglosphere. Some people have the temerity to speak foreign languages! And some of those languages have accents on their letters!! Even worse, some don't use English letters at all!!!

A decade ago, I was miffed that GitHub only supported some ASCII characters in its project names. There's no technical reason why your repo can't be called "ഹലോ വേൾഡ്".

Similarly, I'm frustrated that Mastodon (the largest ActivityPub service) doesn't allow Unicode usernames and has resisted efforts to change.

So I built a small ActivityPub server which publishes content from an Actor called @你好@i18n.viii.fi - it is only a demo account, but it works!

Some ActivityPub clients report that they are able to follow it and receive messages from it. Others - like Mastodon - simply can't see anything from it. Take a look at the replies on Mastodon to see which services work. You can also see some of its posts on the Fediverse.
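Before following a remote account, Fediverse servers such as Mastodon typically resolve the handle with a WebFinger (RFC 7033) request for an `acct:` URI at `/.well-known/webfinger`. A non-ASCII username is not a problem at that layer, because the query string simply carries percent-encoded UTF-8. Below is a minimal sketch of building the lookup URL. `webfinger_url` is a hypothetical helper, and fetching and parsing the JSON response are left out. The sketch says nothing about where a given server's support breaks down:

```python
from urllib.parse import urlencode, urlunsplit


def webfinger_url(handle: str) -> str:
    """Build a WebFinger lookup URL for a Fediverse handle like '@user@host'."""
    user, host = handle.lstrip("@").rsplit("@", 1)
    # Non-ASCII domain labels need Punycode; Python's built-in codec implements IDNA 2003.
    ascii_host = host.encode("idna").decode("ascii")
    # urlencode percent-encodes the UTF-8 bytes of the username, plus ':' and '@'.
    query = urlencode({"resource": f"acct:{user}@{ascii_host}"})
    return urlunsplit(("https", ascii_host, "/.well-known/webfinger", query, ""))


print(webfinger_url("@你好@i18n.viii.fi"))
```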

What Does The Fox Spec Say?

The ActivityPub specification says:

Building an international base of users is important in a federated network. Internationalization

I can't find anything in the specifications which limits what languages a username can be written in. But there are a few clues scattered about.

The user's @ name is defined by preferredUsername which is:

A short username which may be used to refer to the actor, with no uniqueness guarantees. 4.1 Actor objects

There's nothing in there about what scripts it can contain. However, later on, the spec says:

Properties containing natural language values, such as name, preferredUsername, or summary, make use of natural language support defined in ActivityStreams. 4. Actors

So it is expected that a preferred username could be written in multiple scripts. Which implies that the default need not be limited to A-Z0-9.

The ActivityStreams specification talks about language mapping.

Finally, the ActivityPub specification has some examples of non-Latin text in names.

So, I think that it is acceptable for usernames to be written in a variety of non-Latin scripts.

But What About...?

There are usually a few objections to "Unicode Everywhere" zealots like me. I'd like to forestall any arguments.

What about homograph attacks?

Well, what about them? ASCII has plenty of similar-looking characters. I doubt most people would notice if a capital i were replaced by a lowercase L, or vice versa. Similarly, the kerning issue of an r and n looking like an m is well known. Are mixed-script homographs more dangerous? I don't think so.
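The argument can be made concrete. A mixed-script check catches a Cyrillic "а" hiding inside a Latin word, but it says nothing about the purely ASCII l/I swap. The sketch below uses a crude heuristic, the first word of each letter's Unicode name, as a stand-in for the real Unicode Script property and UTS #39 confusable detection. `scripts_in` is a hypothetical helper.

```python
import unicodedata
from typing import Set


def scripts_in(text: str) -> Set[str]:
    """Crude script guess from the first word of each letter's Unicode name.

    'LATIN SMALL LETTER A' -> 'LATIN', 'CYRILLIC SMALL LETTER A' -> 'CYRILLIC',
    'CJK UNIFIED IDEOGRAPH-4F60' -> 'CJK'. A heuristic only: it is neither the
    Unicode Script property nor UTS #39 confusable detection.
    """
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in text if ch.isalpha()}


print(scripts_in("paypal"))               # {'LATIN'}
print(scripts_in("p\u0430ypal"))          # U+0430 is Cyrillic: two scripts reported
print(scripts_in("Il"), scripts_in("lI"))  # {'LATIN'} {'LATIN'}: the ASCII swap goes unnoticed
```

A Japanese name that mixes kanji and hiragana would also be flagged, so a real check needs the finer rules in UTS #39 rather than a simple "more than one script" test.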

What if people make names that can't be typed?

Well, what if they do? Maybe not being found by people who can't type your language is a feature, not a bug. But, anyway, clients can let users search for other people, or copy and paste their names.
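A client can make searching for names more forgiving by comparing folded forms. This sketch approximates Unicode's NFKC_Casefold mapping with NFKC normalisation followed by Python's `str.casefold()`. `search_key` and `find_users` are hypothetical helpers. This does not help someone who cannot type the script at all, which is where copy and paste comes in.

```python
import unicodedata
from typing import Iterable, List


def search_key(s: str) -> str:
    """Approximate NFKC_Casefold: compatibility-normalise, then case-fold."""
    return unicodedata.normalize("NFKC", s).casefold()


def find_users(query: str, usernames: Iterable[str]) -> List[str]:
    """Return usernames whose folded form contains the folded query."""
    key = search_key(query)
    return [u for u in usernames if key in search_key(u)]


users = ["Edent", "你好", "Straße"]
print(find_users("ｅｄｅｎｔ", users))  # ['Edent']  (full-width letters fold to ASCII)
print(find_users("STRASSE", users))     # ['Straße']  (ß case-folds to 'ss')
print(find_users("你", users))          # ['你好']
```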

What about weird "Zalgo" text?

It is up to each client to decide how it renders text. The "problems" of strange Unicode combinations are well known. This is not a hard computer-science problem.
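One common mitigation for "Zalgo" text is to cap how many combining marks may stack on a single base character. The sketch below makes two assumptions. First, the default limit of 3 is arbitrary. Second, it counts only nonspacing (Mn) and enclosing (Me) marks. Scripts such as Malayalam, Devanagari and Vietnamese legitimately attach marks, so a cap that is too strict would break them. `cap_combining_marks` is a hypothetical helper.

```python
import unicodedata


def cap_combining_marks(text: str, max_marks: int = 3) -> str:
    """Drop nonspacing/enclosing marks beyond max_marks in a row after a base."""
    out = []
    run = 0  # consecutive Mn/Me marks seen since the last non-mark character
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            run += 1
            if run > max_marks:
                continue  # skip excess stacked marks
        else:
            run = 0
        out.append(ch)
    return "".join(out)


zalgo = "e" + "\u0301" * 5
print(len(zalgo), len(cap_combining_marks(zalgo)))  # 6 4
```

Whether to normalise to NFC first is a design choice. Doing so folds some mark sequences into precomposed letters, which changes how many marks are counted.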

What about bi-directional text?

The spec makes clear this is allowed.
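One standard technique for bi-directional usernames comes from the Unicode Bidirectional Algorithm (UAX #9). Each user-supplied fragment is wrapped in directional isolates, U+2068 FIRST STRONG ISOLATE and U+2069 POP DIRECTIONAL ISOLATE. This stops a right-to-left name from visually reordering the surrounding "@" and domain. The `isolate` and `mention` helpers are hypothetical. In HTML, the `<bdi>` element serves the same purpose.

```python
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(fragment: str) -> str:
    """Wrap user-supplied text so its direction is resolved on its own."""
    return f"{FSI}{fragment}{PDI}"


def mention(user: str, domain: str) -> str:
    """Format @user@domain with the username isolated (hypothetical helper)."""
    return f"@{isolate(user)}@{domain}"


handle = mention("\u05e9\u05dc\u05d5\u05dd", "example.social")  # Hebrew "shalom"
print([hex(ord(c)) for c in handle[:3]])  # ['0x40', '0x2068', '0x5e9']
```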

Do people even want a username in their own script?

I have no evidence for this. But I bet you'd get pretty frustrated if you had to switch keyboards just to type your own name, wouldn't you? In any case, why can't I have a username of @😉?

What's Next?

If you build ActivityPub software, give some thought to the billions of people who don't have names which easily fit into ASCII.

If your software can see @你好@i18n.viii.fi and its posts, please let me know.

@mikaeru@mastodon.social
Unicopedia Plus Social Preview

@mikaeru@mastodon.social
@Edent@mastodon.social

In *theory* you should be able to follow this test user:

@你好@i18n.viii.fi

But I can't find any Fediverse software which actually supports non-ASCII usernames.

If you are able to see the user, its description, and its avatar - please send me a screenshot 🙂

@tirifto@jam.xwx.moe

So apparently server administrators on the #Fediverse won’t be able to name custom emoji in their native languages and expect them to work in Mastodon, because according to @Gargron non-ASCII signs are hard to input and diacritics shouldn’t change the meaning of words:

https://github.com/mastodon/mastodon/pull/28572#issuecomment-1878952504

No, in my view emoji identifiers shouldn’t be ‘straightforward to input for everyone’. Custom emoji are local to a server; they should be straightforward to input for the users of that server. People from other servers don’t ever have to type their names (unless their administrators choose to add them to their own server), so their ability to type them is completely irrelevant.

Why should a server made specifically for people speaking Russian or Japanese have to use ASCII for their emoji identifiers? Their users have no trouble typing Cyrillic or Kanji signs; it’s what they already do when they make a post; it’s how they normally talk. Why force them to use a different language/alphabet when typing emoji identifiers?

Moreover, linking this to the username issue makes no sense whatsoever. Usernames are typed across servers and it makes sense to impose stricter technical limitations so more people can read, write and recognise them. This is not the case for emoji; you rarely ever need to type other servers’ emoji identifiers. Normally you don’t even get to see them; you only get to see the picture they represent! Assuming server admins do their job responsibly, there is zero added confusion for anyone involved.

I understand that Unicode is complex, language support is challenging and compromises might be necessary at times. But can we please accept the existence of different languages and writing systems as a reality that we should try to accommodate, rather than change or circumvent? Yes, a and á are different signs. Yes, they might radically change the meaning of a word. That’s not a proposition for us to accept or reject; that’s the reality of our multilingual world, and should be the basis of our discussion.

#lang_en #accessibility #a11y #custom_emoji #development #emoji #emojos #free_software #internationalisation #internationalization #i18n #languages #localisation #localization #l10n #Mastodon #multilingual #programming #software #Unicode
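The point that a and á are different signs has a direct technical consequence for emoji identifiers. Consider two hypothetical Spanish shortcodes, "papa" (potato) and "papá" (dad). Folding away diacritics merges them into one key, while NFC-normalised names keep them distinct. The `:shortcode:` pattern below is an assumption for illustration, not Mastodon's actual parser.

```python
import re
import unicodedata

# Hypothetical :shortcode: pattern. Python's \w does not match combining
# marks, so text is normalised to NFC first so that precomposed letters
# such as "á" are used where they exist.
SHORTCODE = re.compile(r":(\w+):")


def find_shortcodes(text: str) -> list:
    """Return the shortcode names found in text, NFC-normalised."""
    return SHORTCODE.findall(unicodedata.normalize("NFC", text))


def strip_diacritics(s: str) -> str:
    """Decompose (NFD) and drop nonspacing marks, e.g. 'á' -> 'a'."""
    return "".join(
        ch for ch in unicodedata.normalize("NFD", s)
        if unicodedata.category(ch) != "Mn"
    )


names = ["papa", "pap\u00e1"]  # 'potato' and 'dad' in Spanish
print(len({unicodedata.normalize("NFC", n) for n in names}))  # 2 distinct keys
print(len({strip_diacritics(n) for n in names}))              # 1: they collide
print(find_shortcodes("hola :pap\u00e1: y :\u0451\u043b\u043a\u0430:"))
```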

@gimsieke@mastodon.cloud

Formatting people’s names correctly in a given context, for a given purpose, is hard. International linguists recently helped update the Common Locale Data Repository (CLDR). The update will help programmers display person names correctly in many settings.
Mike McKenna wrote about it in “A Story Teller’s Case Study: Unlocking the Power of CLDR Person Name Formatting – A Solution for Formatting Names in a Globalized World” unicode.org/media/CLDR_Person_

@eemeli@mefi.social

1/2

Hello! My current Big Project is fixing , making it easier for software and sites to communicate in various human languages. So I'm spending quite a bit of time in and trying to shepherd along spec proposals so that we can fix this for everyone. Nowadays I even get paid for this, on account of being a staff software engineer on the l10n team at .