#CommonMark

Karl Voit :emacs: :orgmode:

@publicvoit@graz.social

RE: graz.social/@publicvoit/115875

I really do like how SilverBullet is explaining the consequences of using their version of Markdown on silverbullet.md/Markdown (CommonMark).

With statements like that, people learn about the consequences of using that tool.

They can either accept this or think about the negative effects before investing too much energy and data.

I really urge any (-)tool to include such a warning statement on their project page. It's for the benefit of your users.

One of the reasons why I most probably would recommend switching to SilverBullet if you - for some reason - can't use Emacs with Org-mode, which is IMO the optimum tool for many sets of requirements: karl-voit.at/2021/01/18/tool-c

I'll migrate my wife's from (recent changes are a no-go to me) to SilverBullet or preferably Emacs. My upcoming Org-mode workshop (no recording) will tell her.

Karl Voit :emacs: :orgmode:

@publicvoit@graz.social

My article about "Markdown Is a Disaster: Why and What to Do Instead" from karl-voit.at/2025/08/17/Markdo was listed on the entry page of yesterday.

It hurts me to read through the comments. One part of the people who commented obviously didn't read the article they're commenting on.

And another part of the commenters mixes up Org-mode, the Elisp implementation within Emacs, with orgdown, the lightweight syntax which is actually the topic of this article. This part of the discussion is totally missing the whole point of my article: practical issues related to Markdown; choosing any other lightweight markup language which doesn't come with those downsides. Orgdown was just one example of many which I wanted to mention because it is one of the least known alternatives outside the Emacs bubble.

🤷

洪 民憙 (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why Markdown's emphasis syntax (**) fails outside of Western languages: A deep dive into CommonMark's “delimiter run” flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

유루메 Yurume

@yurume@hackers.pub · Reply to 유루메 Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **마크다운**은, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in 이 **"마크다운"** 형식은. The rules for right-flanking are identical, just in the opposite direction.
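To make the three left-flanking forms concrete, here is a minimal Python sketch of the classification as described above. The function names are hypothetical, and the punctuation test (Unicode general categories P* and S*) is an assumption about how the spec defines punctuation; older spec versions may draw that line differently.

```python
import unicodedata


def is_whitespace(ch: str) -> bool:
    # "" stands for the start or end of the line, which counts as whitespace.
    return ch == "" or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"


def is_punctuation(ch: str) -> bool:
    # Assumption: Unicode general categories P* and S* count as punctuation.
    return ch != "" and unicodedata.category(ch)[0] in "PS"


def is_left_flanking(before: str, after: str) -> bool:
    """Classify a delimiter run from its two neighbouring characters only."""
    if is_whitespace(after):
        return False
    if not is_punctuation(after):
        return True  # the form **<ordinary character>
    # Followed by punctuation: needs whitespace or punctuation before the run.
    return is_whitespace(before) or is_punctuation(before)


def is_right_flanking(before: str, after: str) -> bool:
    # Same rule, mirrored.
    return is_left_flanking(after, before)


print(is_left_flanking("", "마"))     # opening run of **마크다운**은
print(is_left_flanking(" ", '"'))     # opening run of 이 **"마크다운"** 형식은
print(is_right_flanking('"', " "))    # closing run of 이 **"마크다운"** 형식은
print(is_right_flanking(")", "은"))   # a run between ")" and an ordinary letter
```

The first three calls print True and the last prints False: a run preceded by a closing parenthesis and followed directly by an ordinary letter is not right-flanking.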

However, when you try to parse a string like **마크다운(Markdown)**은 using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis), so it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (은), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completely absent or (as in Korean) punctuation is commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **마크다운(Markdown)**은, how would we handle something like このような**[状況](...)は**?
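The same point can be checked on whole strings. The sketch below is a rough illustration, not a CommonMark parser: it scans a line for runs of `*` and reports whether each run could open or close emphasis under the flanking rules. The helper names are hypothetical, and treating Unicode categories P* and S* as punctuation is an assumption.

```python
from __future__ import annotations

import re
import unicodedata


def _kind(ch: str) -> str:
    """Classify a neighbouring character; "" means start or end of line."""
    if ch == "" or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs":
        return "space"
    # Assumption: Unicode categories P* and S* are punctuation.
    if unicodedata.category(ch)[0] in "PS":
        return "punct"
    return "ordinary"


def delimiter_runs(text: str) -> list[tuple[str, bool, bool]]:
    """Return (run, can_open, can_close) for every run of '*' in text."""
    runs = []
    for m in re.finditer(r"\*+", text):
        before = _kind(text[m.start() - 1] if m.start() > 0 else "")
        after = _kind(text[m.end()] if m.end() < len(text) else "")
        # For '*', a run can open iff left-flanking and close iff right-flanking.
        can_open = after != "space" and (after == "ordinary" or before != "ordinary")
        can_close = before != "space" and (before == "ordinary" or after != "ordinary")
        runs.append((m.group(), can_open, can_close))
    return runs


for sample in (
    "**this **way** of nesting**",
    "**마크다운(Markdown)**은",
    "このような**[状況](...)は**",
):
    print(sample, delimiter_runs(sample))
```

In the English sample the four runs come out as opener, opener, closer, closer, which is what makes the nesting unambiguous. The Korean sample yields two openers and no closer, and the Japanese sample yields two closers and no opener, so neither can form an emphasis span.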

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

Image alt text: * 21. Ba5# - 백이 룩과 퀸을 희생한 후, 퀸 대신 **비숍(Ba5)**이 결정적인 체크메이트를 성공시킵니다. 흑 킹이 탈출할 곳이 없으며, 백의 기물로 막을 수도 없습니다. (In English: "21. Ba5# - After White sacrifices the rook and queen, the **bishop (Ba5)**, instead of the queen, delivers the decisive checkmate. The black king has nowhere to escape, and White's pieces cannot block it either.") [The emphasized portion `비숍(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like `**마크다운(Markdown)**은` using these rules, it fails: the closing `**` is preceded by punctuation (a parenthesis), so it must be followed by whitespace or another punctuation mark to count as right-flanking. Because it is followed by an ordinary letter (은), it is not recognized as right-flanking and cannot close the emphasis.
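The flanking rules and the failing example above can be checked with a minimal Python sketch. It is an illustration under stated assumptions, not a CommonMark parser: it only looks at runs of `*`, ignores backslash escapes and `_`, and treats Unicode categories P and S as punctuation (as recent spec versions do). The helper names `is_whitespace`, `is_punctuation`, `classify`, and `delimiter_runs` are made up for this example.

```python
import re
import unicodedata


def is_whitespace(ch):
    # None stands for the start or end of the line, which counts as whitespace.
    return ch is None or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"


def is_punctuation(ch):
    # Assumption: Unicode general categories P* and S* count as punctuation.
    return ch is not None and unicodedata.category(ch)[0] in ("P", "S")


def classify(prev_ch, next_ch):
    """Return (left_flanking, right_flanking) from the two neighbours only."""
    left = not is_whitespace(next_ch) and (
        not is_punctuation(next_ch)
        or is_whitespace(prev_ch)
        or is_punctuation(prev_ch)
    )
    right = not is_whitespace(prev_ch) and (
        not is_punctuation(prev_ch)
        or is_whitespace(next_ch)
        or is_punctuation(next_ch)
    )
    return left, right


def delimiter_runs(text):
    """List (offset, run, left_flanking, right_flanking) for each run of '*'."""
    runs = []
    for m in re.finditer(r"\*+", text):
        prev_ch = text[m.start() - 1] if m.start() > 0 else None
        next_ch = text[m.end()] if m.end() < len(text) else None
        runs.append((m.start(), m.group()) + classify(prev_ch, next_ch))
    return runs


if __name__ == "__main__":
    for sample in ("**마크다운**은", '이 **"마크다운"** 형식은', "**마크다운(Markdown)**은"):
        print(sample)
        for offset, run, left, right in delimiter_runs(sample):
            print(f"  offset {offset}: {run} left={left} right={right}")
```

For `**마크다운(Markdown)**은`, both runs come out as left-flanking only, so the second `**` has nothing that lets it act as a closer. In the other two samples the second run is right-flanking and the emphasis can close.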

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like `**this **way** of nesting**`. Since users typically don't insert spaces inside emphasis markers (e.g., `**word **`), the spec resolves the ambiguity by declaring that markers adjacent to whitespace can only function in one direction. However, in CJK (Chinese, Japanese, Korean) text, either spaces are completely absent or (as in Korean) punctuation is commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left- or right-flanking from these rules. Even if we were to allow `<ordinary character>**<punctuation>` to be interpreted as left-flanking to accommodate cases like `**마크다운(Markdown)**은`, how would we handle something like `このような**[状況](...)は**`?
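The limits of neighbour-based inference can be made concrete with a small sketch. The `relaxed` flag below is a hypothetical reading of such a relaxation, in which punctuation next to a run no longer restricts its direction on either side. It is an illustrative assumption, not something the post or the spec proposes, and `kind` and `flanking` are names invented for this example.

```python
import unicodedata


def kind(ch):
    """Classify a neighbouring character; None stands for start/end of line."""
    if ch is None or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs":
        return "space"
    if unicodedata.category(ch)[0] in "PS":
        return "punct"
    return "ordinary"


def flanking(prev_ch, next_ch, relaxed=False):
    """Return (left_flanking, right_flanking) for a run between two characters."""
    before, after = kind(prev_ch), kind(next_ch)
    if relaxed:
        # Hypothetical relaxation: only whitespace restricts the direction.
        return after != "space", before != "space"
    # Current rules: punctuation on the inner side is only accepted when the
    # outer side is whitespace or punctuation.
    left = after != "space" and (after != "punct" or before != "ordinary")
    right = before != "space" and (before != "punct" or after != "ordinary")
    return left, right


if __name__ == "__main__":
    cases = {
        "opening run in このような**[状況]": ("な", "["),
        "closing run in (Markdown)**은": (")", "은"),
    }
    for label, (prev_ch, next_ch) in cases.items():
        print(label)
        print("  current rules:", flanking(prev_ch, next_ch))
        print("  relaxed rules:", flanking(prev_ch, next_ch, relaxed=True))
```

Under the current rules the Japanese opening run is right-flanking only and the Korean closing run is left-flanking only, the opposite of what the writer intended in both cases. Under the relaxed variant both become left- and right-flanking at once, so in text without spaces the neighbouring characters no longer say which direction was meant.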

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown the way people would actually write it, rather than strictly following the design intent of CommonMark, this latent inconvenience that users have long felt is now being brought directly to the surface.

Image: * 21. Ba5# - 백이 룩과 퀸을 희생한 후, 퀸 대신 **비숍(Ba5)**이 결정적인 체크메이트를 성공시킵니다. 흑 킹이 탈출할 곳이 없으며, 백의 기물로 막을 수도 없습니다. (In English: "21. Ba5# - After White sacrifices the rook and queen, the **bishop (Ba5)** delivers the decisive checkmate in place of the queen. The black king has nowhere to escape, and it cannot be blocked by White's pieces either.") [The emphasized portion `비숍(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ALT text details* 21. Ba5# - ๋ฐฑ์ด ๋ฃฉ๊ณผ ํ€ธ์„ ํฌ์ƒํ•œ ํ›„, ํ€ธ ๋Œ€์‹  **๋น„์ˆ(Ba5)**์ด ๊ฒฐ์ •์ ์ธ ์ฒดํฌ๋ฉ”์ดํŠธ๋ฅผ ์„ฑ๊ณต์‹œํ‚ต๋‹ˆ๋‹ค. ํ‘ ํ‚น์ด ํƒˆ์ถœํ•  ๊ณณ์ด ์—†์œผ๋ฉฐ, ๋ฐฑ์˜ ๊ธฐ๋ฌผ๋กœ ๋ง‰์„ ์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค. [The emphasized portion `๋น„์ˆ(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:'s avatar
ๆดช ๆฐ‘ๆ†™ (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

Why 's emphasis syntax (**) fails outside of Western languages: A deep dive into 's โ€œdelimiter runโ€ flaws and their impact on users.

A must-read for anyone interested in and the future of Markdown:

https://hackers.pub/@yurume/019b912a-cc3b-7e45-9227-d08f0d1eafe8

์œ ๋ฃจ๋ฉ” Yurume's avatar
์œ ๋ฃจ๋ฉ” Yurume

@yurume@hackers.pub ยท Reply to ์œ ๋ฃจ๋ฉ” Yurume's post

As Markdown has become the standard for LLM outputs, we are now forced to witness a common and unsightly mess where Markdown emphasis markers (**) remain unrendered and exposed, as seen in the image. This is a chronic issue with the CommonMark specification---one that I once reported about ten years ago---but it has been left neglected without any solution to this day.

The technical details of the problem are as follows: In an effort to limit parsing complexity during the standardization process, CommonMark introduced the concept of "delimiter runs." These runs are assigned properties of being "left-flanking" or "right-flanking" (or both, or neither) depending on their position. According to these rules, a bolded segment must start with a left-flanking delimiter run and end with a right-flanking one. The crucial point is that whether a run is left- or right-flanking is determined solely by the immediate surrounding characters, without any consideration of the broader context. For instance, a left-flanking delimiter must be in the form of **<ordinary character>, <whitespace>**<punctuation>, or <punctuation>**<punctuation>. (Here, "ordinary character" refers to any character that is not whitespace or punctuation.) The first case is presumably intended to allow markers embedded within a word, like **๋งˆํฌ๋‹ค์šด**์€, while the latter cases are meant to provide limited support for markers placed before punctuation, such as in ์ด **"๋งˆํฌ๋‹ค์šด"** ํ˜•์‹์€. The rules for right-flanking are identical, just in the opposite direction.

However, when you try to parse a string like **๋งˆํฌ๋‹ค์šด(Markdown)**์€ using these rules, it fails because the closing ** is preceded by punctuation (a parenthesis) and it must be followed by whitespace or another punctuation mark to be considered right-flanking. Since it is followed by an ordinary letter (์€), it is not recognized as right-flanking and thus fails to close the emphasis.

As explained in the CommonMark spec, the original intent of this rule was to support nested emphasis, like **this **way** of nesting**. Since users typically don't insert spaces inside emphasis markers (e.g., **word **), the spec attempts to resolve ambiguity by declaring that markers adjacent to whitespace can only function in a specific direction. However, in CJK (Chinese, Japanese, Korean) environments, either spaces are completly absent or (as in Korean) punctuations are commonly used within a word. Consequently, there are clear limits to inferring whether a delimiter is left or right-flanking based on these rules. Even if we were to allow <ordinary character>**<punctuation> to be interpreted as left-flanking to accommodate cases like **๋งˆํฌ๋‹ค์šด(Markdown)**์€, how would we handle something like ใ“ใฎใ‚ˆใ†ใช**[็Šถๆณ](...)ใฏ**?

In my view, the utility of nested emphasis is marginal at best, while the frustration it causes in CJK environments is significant. Furthermore, because LLMs generate Markdown based on how people would actually use it---rather than strictly following the design intent of CommonMark---this latent inconvenience that users have long felt is now being brought directly to the surface.

* 21. Ba5# - 백이 룩과 퀸을 희생한 후, 퀸 대신 **비숍(Ba5)**이 결정적인 체크메이트를 성공시킵니다. 흑 킹이 탈출할 곳이 없으며, 백의 기물로 막을 수도 없습니다. [The emphasized portion `비숍(Ba5)` is surrounded by unrendered Markdown emphasis marks `**`.]
pandoc's avatar
pandoc

@pandoc@fosstodon.org · Reply to pandoc's post

and its variants, like , allow to span setext-style header across multiple lines:

Line one
and two
--------

Note that this makes it possible to insert line breaks in headings:

The Hobbit\
or\
There and Back Again
==================

pandoc's avatar
pandoc

@pandoc@fosstodon.org

Line breaks within a paragraph are treated as spaces in Markdown. However, this gives bad results in East Asian languages, where spaces between words are unusual. Use

pandoc -f markdown+east_asian_line_breaks

to ensure that line breaks between East Asian wide characters get ignored.

The extension also works with CommonMark (commonmark), GitHub Flavored Markdown (gfm), and pandoc's CommonMark extension (commonmark_x).
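As a rough illustration of the behaviour described above (not pandoc's actual implementation), the idea can be sketched in Python. The function names `is_wide` and `join_soft_breaks` are hypothetical, and "East Asian wide character" is assumed here to mean a character whose Unicode East Asian Width property is W or F.

```python
import unicodedata


def is_wide(ch: str) -> bool:
    # Assumption: "wide" means East Asian Width W (wide) or F (fullwidth).
    return unicodedata.east_asian_width(ch) in ("W", "F")


def join_soft_breaks(paragraph: str) -> str:
    """Join the lines of one paragraph, dropping breaks between wide characters."""
    lines = paragraph.split("\n")
    out = lines[0]
    for line in lines[1:]:
        if out and line and is_wide(out[-1]) and is_wide(line[0]):
            out += line          # break between two wide characters: ignore it
        else:
            out += " " + line    # default Markdown behaviour: break becomes a space
    return out


print(join_soft_breaks("これは\nテストです"))
print(join_soft_breaks("hello\nworld"))
```

A break between a wide character and a Latin letter still becomes a space in this sketch, which matches the description "between East Asian wide characters" only.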

pandoc's avatar
pandoc

@pandoc@fosstodon.org

allows to add to all elements when the `attributes` extension is enabled:

{.fruit}
- apple
- banana

doesn't support attributes on some elements (yet); this includes lists. The attributes are attached to a wrapping div in those cases, so the above is equivalent to this:

::: {.fruit}
- apple
- banana
:::

The extension is enabled by default in "commonmark_x".

Richie Khoo's avatar
Richie Khoo

@richiekhoo@hachyderm.io

Package Manager for Markdown

I'm working on a project intended to encourage folk to write Markdown text files that can be combined into different bundles using a package manager.

Question for coders: which package manager would you suggest I use?

Main criteria (in order) are:

1. Easy for someone with basic command line skills to edit the file and update version numbers and add additional packages.

2. All else being equal, a package manager that is more common and easier to set up is preferred.



pandoc's avatar
pandoc

@pandoc@fosstodon.org

syntax tip for code block attributes, yielding the best rendering results with both pandoc and on platforms such as , , etc:

``` lua {#id .another-class}
io.stdout:write('hi!')
```

The curly-braces syntax for attributes is ignored when read as CommonMark, the Markdown variant used by most platforms. The above ensures that syntax highlighting still works with CommonMark, and that the other attributes get respected when converting with pandoc.
