#CJK

Michel Mariani

@mikaeru@mastodon.social

Thanks to @jlhwung, the so beautifully crafted 'BabelStone Han' font by Andrew West (魏安) is alive and well!

The latest version 17.0.0, made of 'BabelStoneHanBasic.ttf' and 'BabelStoneHanExtra.ttf', is available from:

🔗 github.com/babelstone/babelsto

Michel Mariani

@mikaeru@mastodon.social · Reply to Michel Mariani's post

The icon of the new application shows the provisional character U+3FBB5, whose equivalent is U+5B57 字, meaning "letter, character, word".

Icon of the Unicopedia Sigilla application, with the provisional Seal character U+3FBB5, whose equivalent CJK ideograph is U+5B57 字
Michel Mariani

@mikaeru@mastodon.social

All documents published by the Ideographic Research Group (IRG) are now available on the Unicode web site, and can be easily and efficiently found through the new search bar provided on the IRG homepage.

🔗 unicode.org/irg/

This long-awaited search feature is very convenient, and so useful for finding what you're interested in, and even more (ah, the wonderful power of serendipity!)...

Screenshot of the IRG home page, looking for "taboo" from the search bar
Screenshot of list of search results in the IRG documents
洪 民憙 (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why Markdown's emphasis syntax (**) fails outside of Western languages: A deep dive into CommonMark's “delimiter run” flaws and their impact on users.

A must-read for anyone interested in the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

유루메 Yurume

@yurume@hackers.pub · Reply to 유루메 Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediately surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **마크다운**은, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in 이 **"마크다운"** 형식은. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **마크다운(Markdown)**은 using these rules, it fails: the closing ** is preceded by punctuation (a parenthesis), so it would have to be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (은), it is not recognized as right-flanking and thus fails to close the emphasis.
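As a rough illustration of the flanking rules described above, the following Python sketch classifies each run of * in a string using nothing but its two neighbouring characters. The helper names (is_whitespace, is_punctuation, flanking, classify_runs) are made up for this example. It is only a sketch under stated assumptions: it ignores backslash escapes, _ delimiters, and the rest of the emphasis-matching algorithm, and it assumes the definitions used in recent versions of the spec (Unicode whitespace is category Zs plus tab, line feed, form feed and carriage return, with the text boundary counting as whitespace; Unicode punctuation is any character in the P or S general categories).

```python
import re
import unicodedata
from typing import List, Tuple


def is_whitespace(ch: str) -> bool:
    # An empty string stands for the start or end of the text,
    # which is treated like whitespace.
    return ch == "" or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"


def is_punctuation(ch: str) -> bool:
    # Assumption: punctuation means Unicode general categories P* and S*.
    return ch != "" and unicodedata.category(ch)[0] in "PS"


def flanking(before: str, after: str) -> Tuple[bool, bool]:
    """Return (left_flanking, right_flanking) for a delimiter run, given
    only the characters immediately before and after it."""
    left = not is_whitespace(after) and (
        not is_punctuation(after)
        or is_whitespace(before)
        or is_punctuation(before)
    )
    right = not is_whitespace(before) and (
        not is_punctuation(before)
        or is_whitespace(after)
        or is_punctuation(after)
    )
    return left, right


def classify_runs(text: str) -> List[Tuple[int, bool, bool]]:
    """List (offset, left_flanking, right_flanking) for every run of '*'."""
    results = []
    for match in re.finditer(r"\*+", text):
        before = text[match.start() - 1] if match.start() > 0 else ""
        after = text[match.end()] if match.end() < len(text) else ""
        results.append((match.start(), *flanking(before, after)))
    return results


if __name__ == "__main__":
    for sample in ("**마크다운**은", "**마크다운(Markdown)**은"):
        print(sample, classify_runs(sample))
```

In the first sample the second run sits between two ordinary characters, so it is both left- and right-flanking and can close the emphasis. In the second sample the run after the parenthesis is reported as left-flanking only, which is why the ** stays visible in rendered output.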

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completely absent or (as in Korean) punctuation is commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left- or right-flanking based on these rules. Even if we were to allow <punctuation>**<ordinary character> to be interpreted as right-flanking to accommodate cases like **마크다운(Markdown)**은, how would we handle something like このような**[状況](...)は**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - 백이 룩과 퀸을 희생한 후, 퀸 대신 **비숍(Ba5)**이 결정적인 체크메이트를 성공시킵니다. 흑 킹이 탈출할 곳이 없으며, 백의 기물로 막을 수도 없습니다. [The emphasized portion `비숍(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`. In English: "After White sacrifices the rook and queen, the **bishop (Ba5)**, instead of the queen, delivers the decisive checkmate. The black king has nowhere to escape, nor can it be blocked by White's pieces."]
Michel Mariani

@mikaeru@mastodon.social

The latest version of the open-source application "Unicopedia Sinica" is now available, adding support for all the new CJK/Unihan characters defined in Unicode 17.0.

🔗 codeberg.org/tonton-pixel/unic

Screenshot of the open-source application Unicopedia Sinica v.17.0.0
ALT text detailsScreenshot of the open-source application Unicopedia Sinica v.17.0.0
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - 백이 룩과 퀸을 희생한 후, 퀸 대신 **비숍(Ba5)**이 결정적인 체크메이트를 성공시킵니다. 흑 킹이 탈출할 곳이 없으며, 백의 기물로 막을 수도 없습니다. [The emphasized portion `비숍(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`. In English: "After White sacrifices the rook and queen, the bishop (Ba5), instead of the queen, delivers the decisive checkmate. The black king has nowhere to escape, nor can it be blocked with White's pieces."]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **마크다운**은, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in 이 **"마크다운"** 형식은. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **마크다운(Markdown)**은 using these rules, it fails: because the closing ** is preceded by punctuation (a parenthesis), it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is instead followed by an ordinary letter (은), it is not recognized as right-flanking and thus cannot close the emphasis.
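The flanking test described above can be sketched in a few lines of Python. This is only an illustration, not the reference parser: the function names (`is_whitespace`, `is_punctuation`, `flanking`, `runs`) are made up for this example, and the character classes assume the CommonMark 0.31 definitions (Unicode whitespace is category Zs plus tab, LF, FF, CR; punctuation is any character in the P or S general categories). Earlier spec versions used a narrower punctuation definition.

```python
import unicodedata


def is_whitespace(ch):
    # The start or end of the text counts as whitespace for flanking purposes.
    return ch == "" or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"


def is_punctuation(ch):
    # Assumption: CommonMark 0.31 treats Unicode categories P* and S* as punctuation.
    return ch != "" and unicodedata.category(ch)[0] in "PS"


def flanking(text, start, end):
    """Classify the delimiter run text[start:end] as (left_flanking, right_flanking)."""
    before = text[start - 1] if start > 0 else ""
    after = text[end] if end < len(text) else ""
    left = not is_whitespace(after) and (
        not is_punctuation(after) or is_whitespace(before) or is_punctuation(before)
    )
    right = not is_whitespace(before) and (
        not is_punctuation(before) or is_whitespace(after) or is_punctuation(after)
    )
    return left, right


def runs(text):
    """Yield (start, end, left_flanking, right_flanking) for every run of '*'."""
    i = 0
    while i < len(text):
        if text[i] == "*":
            j = i
            while j < len(text) and text[j] == "*":
                j += 1
            yield (i, j) + flanking(text, i, j)
            i = j
        else:
            i += 1


for start, end, left, right in runs("**마크다운(Markdown)**은"):
    print(start, end, "left" if left else "-", "right" if right else "-")
# 0 2 left -
# 16 18 left -
```

Both runs come out as left-flanking only, so the second `**` can never act as a closer, which is exactly the failure described above. Only the two characters adjacent to each run are ever inspected.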

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completely absent or (as in Korean) punctuation is commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left- or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **마크다운(Markdown)**은, how would we handle something like このような**[状況](...)は**?
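To make the contrast concrete, here is a compact, self-contained sketch that labels each `*` run in the two examples above under the current rules. The helper names `kind` and `directions` are hypothetical, and the character classes again assume the CommonMark 0.31 definitions of whitespace and punctuation.

```python
import re
import unicodedata


def kind(ch):
    """Classify a neighbouring character as 'ws', 'punct', or 'other'."""
    if ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs":
        return "ws"
    if unicodedata.category(ch)[0] in "PS":  # assumption: P* and S* are punctuation
        return "punct"
    return "other"


def directions(text):
    """Label each run of '*' as 'open', 'close', 'both', or 'none'."""
    labels = {
        (True, True): "both",
        (True, False): "open",
        (False, True): "close",
        (False, False): "none",
    }
    out = []
    for m in re.finditer(r"\*+", text):
        # The start and end of the text are treated like whitespace.
        before = kind(text[m.start() - 1]) if m.start() > 0 else "ws"
        after = kind(text[m.end()]) if m.end() < len(text) else "ws"
        left = after != "ws" and (after != "punct" or before != "other")
        right = before != "ws" and (before != "punct" or after != "other")
        out.append(labels[left, right])
    return out


print(directions("**this **way** of nesting**"))
# ['open', 'open', 'close', 'close']
print(directions("このような**[状況](...)は**"))
# ['close', 'close']
```

In the English example the adjacent spaces pin every run to one direction, so the nesting resolves cleanly. In the Japanese example there are no spaces to lean on: the first `**` sits between an ordinary character and `[`, so it is right-flanking only and nothing can open the emphasis.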

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - 백이 룩과 퀸을 희생한 후, 퀸 대신 **비숍(Ba5)**이 결정적인 체크메이트를 성공시킵니다. 흑 킹이 탈출할 곳이 없으며, 백의 기물로 막을 수도 없습니다. [The emphasized portion `비숍(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - 백이 룩과 퀸을 희생한 후, 퀸 대신 **비숍(Ba5)**이 결정적인 체크메이트를 성공시킵니다. 흑 킹이 탈출할 곳이 없으며, 백의 기물로 막을 수도 없습니다. [The emphasized portion `비숍(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`. In English: "After White sacrifices the rook and queen, the **bishop (Ba5)** delivers the decisive checkmate in place of the queen. The black king has nowhere to escape, and it cannot be blocked with White's pieces either."]
洪 民憙 (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's “delimiter run” flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

Michel Mariani

@mikaeru@mastodon.social

RE: mastodon.social/@mikaeru/11558

Generally, new CJK Ideographs proposed by members of the IRG (Ideographic Research Group) go through several rounds of exchanges/discussions until they get approved or possibly postponed or rejected.

For instance, here is the page dedicated to UK-20538 ⿰㐅也 (with images as "pieces of evidence"), which eventually made its way into Unicode 17.0, encoded as U+323BF 𲎿:

🔗 hc.jsecs.org/irg/ws2021/app/?i

Michel Mariani

@mikaeru@mastodon.social

Unicode 17.0 introduces five new CJK Unified Ideographs related to Chinese personal pronouns, four of them having been proposed by Andrew West (BabelStone):

« The other Chinese pronoun coming to Unicode v. 17.0 next year, in addition to ⿰㐅也 (3p gender-neutral, ⿰男也 (3p explicitly male), ⿱妳心 ( f. equivalent of 您), ⿱我心 (Taiwanese 1p plural), is ⿱她心 (f. equivalent to 怹) »

🔗 bsky.app/profile/babelstone.co

Screenshot of CJK Related data from Unicopedia Sinica: Chinese Personal Pronouns
Michel Mariani

@mikaeru@mastodon.social

The Ideographic Research Group (IRG) is responsible for preparing and reviewing sets of CJK unified ideographs to be included in the Unicode Standard.

It has recently made available a useful list of so-called disunified CJK ideographs, with glyph images and IRG source references, as well as links to documents giving the rationale behind each disunification:

🔗 unicode.org/irg/disunified.html

Michel Mariani

@mikaeru@mastodon.social

RE: mastodon.social/@mikaeru/11557

> This increases the number of encoded CJK ideographs to over 100,000!

十万字【じゅうまんじ】！ (One hundred thousand characters!)

Michel Mariani

@mikaeru@mastodon.social

RE: mastodon.social/@mikaeru/11556

New additions include 4,298 additional CJK unified ideographs in a new block, CJK Unified Ideographs Extension J, as well as 18 other CJK ideographs added to the existing Extension C and Extension E blocks.

This increases the number of encoded CJK ideographs to over 100,000!

Also, nearly 2,500 already-encoded CJK ideographs are horizontally extended by the addition of source references and glyphs reflecting use of those ideographs in China and Korea.

🔗 blog.unicode.org/2025/09/unico

Michel Mariani

@mikaeru@mastodon.social

The latest version of the open-source application "Unicopedia Sinica" is now available, adding support for all the new CJK/Unihan characters defined in Unicode 17.0.

🔗 codeberg.org/tonton-pixel/unic

Screenshot of the open-source application Unicopedia Sinica v.17.0.0
洪 民憙 (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Hello, I'm an open source software engineer in my late 30s living in , , and an avid advocate of and the .

I'm the creator of @fedify, an server framework in , @hollo, an ActivityPub-enabled microblogging software for single users, and @botkit, a simple ActivityPub bot framework.

I'm also very interested in East Asian languages (so-called ) and . Feel free to talk to me in , (), or (), or even in Literary Chinese (, )!

洪 民憙 (Hong Minhee) :nonbinary:

@hongminhee@hollo.social · Reply to 洪 民憙 (Hong Minhee) :nonbinary:'s post

Hello, I'm an open source software engineer in my late 30s living in Seoul, and an ardent supporter of free/open source software and the fediverse.

I'm also the creator of @fedify, an ActivityPub server framework for TypeScript, of @hollo, an ActivityPub microblog for single users, and of @botkit, an ActivityPub bot framework.

I'm also very interested in East Asian languages (so-called ) and Unicode. On the fediverse I write in Korean mixed script (國漢文混用體, Hangul mixed with Hanja)! Feel free to talk to me in Korean, English, or Japanese. (Or even in Literary Chinese!)

dcz's avatar
dcz

@dcz@fosstodon.org

input methods.

I'm sorry to say doesn't have enough people with review/merge rights, so my work making and good on Wayland is going nowhere.

What are other libraries I could contribute to instead?

Michel Mariani's avatar
Michel Mariani

@mikaeru@mastodon.social

Beautifully crafted BabelStone Han font, by Andrew West 魏安

BabelStone Han v. 15.1.3 is a free font with over 57,000 Han characters (, , ), and 62,061 Unicode characters in total. It is a Song/Ming style (宋体/明體) font, with glyphs modelled on the official character forms used in the People's Republic of China, and is primarily intended for writing Modern Standard Chinese, Classical Chinese, and various Sinitic languages and dialects.

🔗 babelstone.co.uk/Fonts/Han.html

Repeated: 龙
U+9F99 U+31342 U+2EE5D
Michel Mariani's avatar
Michel Mariani

@mikaeru@mastodon.social

New in the CJK Variations utility of Unicopedia Sinica:

- Support for the latest Ideographic Variation Database (IVD 2025), adding the new CAAPH Collection.

- Support for the updated BabelStone Collection (unregistered), based on the latest BabelStone Han font (v17.0.0 BETA), by Andrew C. West (魏安), 1960-2025 RIP (安息吧).

🔗 https://codeberg.org/tonton-pixel/unic

Screenshot of the CJK Variations utility of Unicopedia Sinica for Unicode character U+3AB4
Screenshot of the CJK Variations utility of Unicopedia Sinica for Unicode character U+4E9B
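
For readers new to the Ideographic Variation Database: a variant is encoded as an ideographic variation sequence, that is, a base ideograph followed by one variation selector from the range U+E0100 to U+E01EF (VS17 to VS256). The Python sketch below only illustrates that encoding. The helper names `make_ivs` and `describe` are made up for this example, and pairing U+4E9B with VS17 is an assumption for illustration, not a claim that this particular sequence is registered in any IVD collection.

```python
VS17 = 0xE0100   # first ideographic variation selector
VS256 = 0xE01EF  # last ideographic variation selector


def make_ivs(base: int, selector: int) -> str:
    """Return the base ideograph followed by a variation selector."""
    if not VS17 <= selector <= VS256:
        raise ValueError(f"U+{selector:04X} is not in the range VS17..VS256")
    return chr(base) + chr(selector)


def describe(seq: str) -> str:
    """List the code points of a string in U+XXXX notation."""
    return " ".join(f"U+{ord(ch):04X}" for ch in seq)


# Hypothetical pairing, chosen only to show the shape of a sequence.
ivs = make_ivs(0x4E9B, VS17)
print(describe(ivs))
```

Whether a font shows a distinct glyph for such a sequence depends on the font; without support, the selector is normally ignored and the base character is displayed.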
Michel Mariani's avatar
Michel Mariani

@mikaeru@mastodon.social

The Ideographic Research Group (IRG) is responsible for preparing and reviewing sets of CJK unified ideographs to be included in the Unicode Standard.

Current and future IRG source prefixes used to be listed on the main IRG homepage, but are now available on a separate dedicated page:

🔗 unicode.org/irg/prefixes.html

Michel Mariani's avatar
Michel Mariani

@mikaeru@mastodon.social

Unicopedia Plus is a developer-oriented set of Unicode, Unihan, Unikemet & emoji utilities wrapped into one single app, built with Electron.

Repository: 🔗 codeberg.org/tonton-pixel/unic

Unicopedia Plus Social Preview
Michel Mariani's avatar
Michel Mariani

@mikaeru@mastodon.social

Unicopedia Sinica is a developer-oriented set of utilities related to ideographs, wrapped into one single app, built with Electron.

Repository: 🔗 codeberg.org/tonton-pixel/unic

Unicopedia Sinica Social Preview
yoxem's avatar
yoxem

@yoxem@sns.kianting.info

https://www.ptt.cc/bbs/LaTeX/M.1741424225.A.B28.html

#XeLaTeX seems to be no longer maintained
#CJK
Michel Mariani's avatar
Michel Mariani

@mikaeru@mastodon.social

The Ideographic Research Group (IRG) is responsible for preparing and reviewing sets of CJK unified ideographs to be included in the Unicode Standard.

The IRG homepage now includes comprehensive lists of current and future IRG source prefixes...

🔗 unicode.org/irg/

아사's avatar
아사

@asa@serafuku.moe · Reply to 아사's post

Generating the Korean/Japanese Han characters that are missing from a Chinese font does seem like it would be a somewhat easier problem to solve, since there are many ground-truth pairs.

FoW's avatar
FoW

@FoW@netsphere.one

Syllable search in messaging services, for Asian languages
CJK languages attach particles (postpositions) directly to the end of a word, so productivity varies depending on whether a service allows syllable-level search.
1. Bad: no syllable search; only whole-word (eojeol) search is allowed (e.g. 백두산이).
- Discord
- Matrix
- Synology Chat
- Telegram
2. Fair: search is allowed from three syllables up (e.g. 백두산, 두산이).
- Google Meet
- WhatsApp
3. Good: search is allowed from two syllables up (e.g. 백두, 두산, 산이).
- Microsoft Teams
- Webex
4. Excessive to the point of being bewildering: search is allowed from a single syllable (e.g. 백, 두, 산, 이)
- Slack
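
The four tiers above behave roughly like substring search with a minimum query length. Here is a minimal Python sketch of that idea, under the assumption that a service indexes every substring of a word that is at least `min_len` syllables long. The function names `ngrams` and `searchable` and the `min_len` parameter are illustrative only and are not taken from any of the listed services, whose real implementations are not described in the post.

```python
from __future__ import annotations


def ngrams(text: str, min_len: int) -> set[str]:
    """All substrings of `text` that are at least `min_len` syllables long.

    Each precomposed Hangul syllable is a single code point, so slicing
    the string by index slices it by syllable.
    """
    n = len(text)
    return {
        text[i:j]
        for i in range(n)
        for j in range(i + min_len, n + 1)
    }


def searchable(query: str, word: str, min_len: int | None) -> bool:
    """True if `query` would find `word` under the given tier.

    min_len=None models tier 1: only the whole word (eojeol) matches.
    """
    if min_len is None:
        return query == word
    return query in ngrams(word, min_len)


word = "백두산이"  # the example from the post: a noun plus a particle
queries = ("백두산이", "백두산", "백두", "산")
for label, min_len in (("tier 1", None), ("tier 2", 3), ("tier 3", 2), ("tier 4", 1)):
    hits = [q for q in queries if searchable(q, word, min_len)]
    print(label, hits)
```

With `min_len=3` the word 백두산이 yields 백두산 and 두산이 (plus the whole word), which matches the examples given for tier 2; lowering `min_len` to 2 or 1 adds the shorter fragments listed for tiers 3 and 4.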

Michel Mariani's avatar
Michel Mariani

@mikaeru@mastodon.social

In the open-source application `Unicopedia Sinica`, the data files used by the `CJK Components` and `CJK Related` utilities are now both in a consistent JSON format, under the MIT license: `cjk-ids.json` and `cjk-related.json` respectively.

🔗 codeberg.org/tonton-pixel/unic

CJK Related utility screenshot
CJK Components utility screenshot
CJK Related utility screenshot
Dan Poulin (he/him)'s avatar
Dan Poulin (he/him)

@epocsquadron@fosstodon.org

@hongminhee random question about languages; i know that traditional chinese characters have been simplified multiple times to improve literacy of the masses. i know of simplified chinese, japanese kanji, and most recently learned of korean hanja. are they all mutually intelligible to readers? are the simplifications obvious to those who stick to traditional characters like taiwanese? it's not something i've seen much written about

洪 民憙 (Hong Minhee)

@hongminhee@fosstodon.org · Reply to 洪 民憙 (Hong Minhee)'s post

Hello, I'm an open source software engineer in my late 30s living in Seoul, and an avid advocate of free/open source software and the fediverse. My name is 洪 民憙 (Hong Minhee).

I'm also the creator of @fedify, an ActivityPub server framework for TypeScript, and @hollo, a single-user fediverse microblog.

I'm also very interested in East Asian languages (so-called CJK) and Unicode. Feel free to talk to me in Japanese, English, or Korean. (Or even in Literary Chinese!)

洪 民憙 (Hong Minhee)

@hongminhee@fosstodon.org · Reply to 洪 民憙 (Hong Minhee)'s post

Hello, I'm an open source software engineer in my late 30s living in Seoul, and an avid advocate of free/open source software and the fediverse (聯合宇宙).

I'm also the creator of @fedify, an ActivityPub server framework for TypeScript, and @hollo, a single-user fediverse microblog.

I'm also very interested in East Asian languages (so-called CJK) and Unicode. On Mastodon I write in Korean mixed script (國漢文混用體)! Feel free to talk to me in Korean, English, or Japanese. (Or even in Literary Chinese!)

洪 民憙 (Hong Minhee)

@hongminhee@fosstodon.org

Hello, I'm an open source software engineer in my late 30s living in Seoul, and an avid advocate of free/open source software and the fediverse.

I'm the creator of @fedify, an ActivityPub server framework in TypeScript, and @hollo, a fediverse microblog for single users.

I'm also very interested in East Asian languages (so-called CJK) and Unicode. Feel free to talk to me in English, Korean (한국어), or Japanese (日本語), or even in Literary Chinese (漢文)!

洪 民憙 (Hong Minhee)

@hongminhee@fosstodon.org

Wow, English-only people (or Western languages, for that matter) are so naïve. In case you didn't know, the lang attribute is very important in East Asian languages.

lobste.rs/s/9ck6y9/what_progra

jsfiddle.net/8sa8ndLj/2/

Thread discussion about the utility and implementation of the lang attribute in HTML, including various opinions on its importance and examples of its uses.
Table showing the characters 房, 港, 漢, 直, 角, 骨 in Korean, Traditional Chinese, Simplified Chinese, and Japanese scripts.
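
The table makes the point of the post: one code point can be drawn with a different regional glyph depending on the declared language, so markup has to state which language a run of Han characters is in. A minimal sketch in TypeScript, assuming the four columns map to the BCP 47 tags `ko`, `zh-Hant`, `zh-Hans`, and `ja`; `langVariants` is a hypothetical helper, and the input is assumed to be plain characters that need no HTML escaping:

```typescript
// Column labels paired with the BCP 47 language tag assumed for each.
const LOCALES: Array<[string, string]> = [
  ["Korean", "ko"],
  ["Traditional Chinese", "zh-Hant"],
  ["Simplified Chinese", "zh-Hans"],
  ["Japanese", "ja"],
];

// Hypothetical helper: wrap the same text in one <span> per locale, so a
// browser can pick the regional glyph variant from the lang attribute.
// Assumes `text` contains no characters that need HTML escaping.
function langVariants(text: string): string[] {
  return LOCALES.map((entry) => '<span lang="' + entry[1] + '">' + text + "</span>");
}

// Print one labelled line per locale for a character from the table.
langVariants("直").forEach((html, i) => {
  console.log(LOCALES[i][0] + ": " + html);
});
```

Whether the glyphs actually differ on screen depends on the fonts installed; without any `lang` value the browser has to guess, which is the failure the post is complaining about.
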

洪 民憙 (Hong Minhee)

@hongminhee@fosstodon.org

If you're a software engineer and interested in East Asian languages (so-called CJK), check out the “CJK computer science terms comparison” I edited!

cjk-compsci-terms.netlify.app/

洪 民憙 (Hong Minhee)

@hongminhee@fosstodon.org · Reply to 洪 民憙 (Hong Minhee) 🤝🏼's post

Finally, thanks to @thisismissem, the latest development version of Mastodon can now render ruby characters!

github.com/mastodon/mastodon/p

洪 民憙 (Hong Minhee)

@hongminhee@fosstodon.org · Reply to 洪 民憙 (Hong Minhee) 🤝🏼's post

Finally, thanks to @thisismissem, the latest development version of Mastodon can now render ruby characters!

github.com/mastodon/mastodon/p

洪 民憙 (Hong Minhee) 🤝🏼

@hongminhee@todon.eu · Reply to 洪 民憙 (Hong Minhee) 🤝🏼's post

Currently, Mastodon renders only a few harmless tags, such as <strong> and <em>, in the HTML of content received from other ActivityPub software such as Misskey. In addition to these, I think the tags for ruby characters, which are often used in East Asian (so-called CJK) text, should also be on the allowlist. Ruby characters don't merely add to the presentation of a text; they actually indicate how the characters are read, which also helps accessibility. I've written this up in an issue on Mastodon's GitHub as well, so please take a look.

github.com/mastodon/mastodon/i

洪 民憙 (Hong Minhee) 🤝🏼

@hongminhee@todon.eu · Reply to 洪 民憙 (Hong Minhee) 🤝🏼's post

Currently, Mastodon renders only a few harmless tags, such as <strong> and <em>, in the HTML of content received from other ActivityPub software such as Misskey. In addition to these, I think the tags for ruby characters, which are often used in East Asian (so-called CJK) text, should also be on the allowlist. Ruby characters don't simply add to the presentation of a text; they actually indicate how the characters are read, which also helps accessibility. I've left a write-up in a GitHub issue as well, so please take a look.

github.com/mastodon/mastodon/i

洪 民憙 (Hong Minhee)

@hongminhee@fosstodon.org · Reply to 洪 民憙 (Hong Minhee) 🤝🏼's post

Thanks to @thisismissem, the latest development version of Mastodon now has the ability to render ruby characters! 👏👏👏

github.com/mastodon/mastodon/p

洪 民憙 (Hong Minhee) 🤝🏼

@hongminhee@todon.eu

Currently, Mastodon renders only a few harmless tags, such as <strong> and <em>, in the HTML of content received from other ActivityPub software such as Misskey. In addition to these, I believe that tags related to ruby characters, which are often used in East Asian (so-called CJK) texts, should also be allowed, as they don't just add to the presentation of the text, but actually represent how the characters are read, which also improves accessibility. I also wrote about this in an issue on Mastodon's GitHub:

github.com/mastodon/mastodon/i
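
The proposal amounts to extending an allowlist. A minimal sketch in TypeScript of what that could look like, assuming the ruby-related tags are `ruby`, `rt`, and `rp`; `allowlistTags` is a hypothetical function and a simplified regex-based filter, not Mastodon's actual sanitizer, and a real implementation should use a proper HTML parser:

```typescript
// Tags assumed harmless enough to render; the last three are the ruby
// annotation tags the post proposes adding.
const ALLOWED_TAGS = new Set<string>(["strong", "em", "ruby", "rt", "rp"]);

// Hypothetical filter: keep allowlisted tags (with attributes dropped),
// remove every other tag, and leave text content in place.
// Simplification: a ">" inside an attribute value would confuse the regex.
function allowlistTags(html: string): string {
  return html.replace(
    /<\/?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g,
    (tag: string, name: string): string => {
      const lower = name.toLowerCase();
      if (!ALLOWED_TAGS.has(lower)) return "";
      return tag.charAt(1) === "/" ? "</" + lower + ">" : "<" + lower + ">";
    },
  );
}

// Ruby markup passes through unchanged; the <script> tags are stripped
// while their text content stays.
console.log(allowlistTags("<ruby>漢字<rp>(</rp><rt>かんじ</rt><rp>)</rp></ruby><script>x</script>"));
```

Keeping `rp` alongside `rt` means clients that cannot lay out ruby still show the reading in parentheses after the base text.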

Michel Mariani

@mikaeru@mastodon.social

Beautifully crafted BabelStone Han font, by Andrew West 魏安

BabelStone Han v. 15.1.3 is a free font with over 57,000 Han characters (, , ), and 62,061 Unicode characters in total. It is a Song/Ming style (宋体/明體) font, with glyphs modelled on the official character forms used in the People's Republic of China, and is primarily intended for writing Modern Standard Chinese, Classical Chinese, and various Sinitic languages and dialects.

🔗 babelstone.co.uk/Fonts/Han.html

Repeated: 龙
U+9F99 U+31342 U+2EE5D

Michel Mariani

@mikaeru@mastodon.social

Unicopedia Plus is a developer-oriented set of Unicode, Unihan, Unikemet & emoji utilities wrapped into one single app, built with Electron.

Repository: 🔗 codeberg.org/tonton-pixel/unic

Unicopedia Plus Social Preview

Michel Mariani

@mikaeru@mastodon.social

Unicopedia Sinica is a developer-oriented set of utilities related to CJK ideographs, wrapped into one single app, built with Electron.

Repository: 🔗 codeberg.org/tonton-pixel/unic

Unicopedia Sinica Social Preview