洪 民憙 (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

The hashtag problem in CJK languages

I keep thinking about how fediverse hashtag advice assumes English.

In English, you can drop #coffee into a sentence and it still reads fine. The spacing already does most of the work.

In Korean, Japanese, or Chinese, that feels much less natural. Chinese and Japanese have no spaces between words. Korean does, but particles and endings stick to the word, so putting a hashtag mid-sentence often just looks awkward or breaks the flow.

So people tend to dump hashtags at the end, or skip them.

That changes the usual “follow hashtags to find your community” advice. If people tag less, there's just less there to find. And the fediverse's discovery is already shaky enough without that.

Not sure whether it's a UI problem or just hashtags fitting space-delimited languages better. But it seems like one of those small frictions that makes the fediverse harder to get into for CJK users.

波鉄 (Hatetsu)

@HaTetsu@mastodon.com.pl · Reply to 洪 民憙 (Hong Minhee) :nonbinary:'s post

@hongminhee It's really a problem in any language with word inflection… And even when writing in English, I think more people on the Fediverse have been switching to putting hashtags mostly at the end, so they don't break the flow of posts so much (screen readers have been used as an argument for that too, not without reason).

marius

@mariusor@metalhead.club · Reply to 洪 民憙 (Hong Minhee) :nonbinary:'s post

@hongminhee are you thinking about rendering them, or about how to have users input them?

I think once you have the plain text and a list of tags, it's not difficult to re-render the text with the matching tags visually distinguished, as long as the tags are unlikely to show up inside other words and cause false positives.
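A rough sketch of that re-rendering idea, in Python. Everything here is an assumption for illustration: the function name `render_with_tags` is hypothetical, and square brackets stand in for whatever link markup a real client would emit. ASCII tags are only matched as whole words; non-ASCII tags (CJK, for instance) are matched as plain substrings, so they can still produce false positives inside longer words.

```python
import re


def render_with_tags(text: str, tags: list[str]) -> str:
    """Mark every occurrence of a known tag in plain text with [brackets]."""
    # Drop duplicates and empty strings, then try longer tags first so a
    # longer tag wins over a shorter tag that is a prefix of it.
    ordered = sorted(dict.fromkeys(t for t in tags if t), key=len, reverse=True)
    if not ordered:
        return text

    alternatives = []
    for tag in ordered:
        escaped = re.escape(tag)
        if tag.isascii():
            # ASCII tags must stand alone, so "art" is not found in "party".
            escaped = rf"(?<![A-Za-z0-9_]){escaped}(?![A-Za-z0-9_])"
        # Non-ASCII tags get no boundary check: there may be no spaces, and
        # particles may be glued to the word (assumption: accept that risk).
        alternatives.append(escaped)

    pattern = re.compile("|".join(alternatives), re.IGNORECASE)
    return pattern.sub(lambda m: f"[{m.group(0)}]", text)
```

For example, `render_with_tags("오늘 커피를 마셨다", ["커피"])` marks the noun and leaves the attached particle outside the brackets, which is the part that is awkward to do by hand with a leading `#`.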

stuart yeates

@stuartyeates@cloudisland.nz · Reply to 洪 民憙 (Hong Minhee) :nonbinary:'s post

@hongminhee there exist reliable algorithms for automatically detecting word boundaries in CJK languages. Not perfect for proper nouns, but may be helpful.
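To make "detecting word boundaries" concrete, here is a minimal sketch of one of the simplest dictionary-based approaches, forward maximum matching. This is a swapped-in toy technique, not the trained segmenters that production tools typically use, and the function name `segment` and the tiny lexicon in the usage note are assumptions for illustration only.

```python
def segment(text: str, lexicon: set[str]) -> list[str]:
    """Split unspaced text into words by greedy longest-match lookup."""
    max_len = max((len(word) for word in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink it one character at a
        # time; an unknown single character is emitted as its own token.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words
```

With the lexicon `{"我", "喜欢", "喝", "咖啡"}`, `segment("我喜欢喝咖啡", lexicon)` returns `["我", "喜欢", "喝", "咖啡"]`. A name that is missing from the lexicon falls apart into single characters, which is the proper-noun weakness mentioned above.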

Shadow, First of His Name

@darkpixel@infosec.exchange · Reply to 洪 民憙 (Hong Minhee) :nonbinary:'s post

@hongminhee I used to put hashtags inside the message text some moons ago. The problem is just readability for blind users and others who use screen readers, so I stopped doing it and dump hashtags at the end instead. Not sure about CJK users, so my comment may be irrelevant.

Krazov

@Krazov@mstdn.social · Reply to 洪 民憙 (Hong Minhee) :nonbinary:'s post

@hongminhee On the other hand, because of accessibility and how screen readers operate, it is advised to put hashtags at the end of the post. There's even UI for that in the Mastodon web client: those tags are excluded from the message and displayed underneath. So there is a precedent in Latin-script texts, too.
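The "tags at the end get pulled out and shown underneath" behaviour can be sketched roughly like this. It is not Mastodon's actual implementation, only an illustration under simple assumptions: a tag is `#` followed by word characters, and the hypothetical helper `split_trailing_tags` only strips a run of tags at the very end of the post.

```python
import re

# One or more "#tag" tokens, separated by optional whitespace, that run to
# the end of the post. Python's \w is Unicode-aware, so CJK tags match too.
TRAILING_TAGS = re.compile(r"(?:\s*#\w+)+\s*$")


def split_trailing_tags(post: str) -> tuple[str, list[str]]:
    """Return (body, tags), where tags are the hashtags ending the post."""
    match = TRAILING_TAGS.search(post)
    if match is None:
        # No trailing tag run: inline tags stay where they are in the body.
        return post, []
    body = post[:match.start()].rstrip()
    tags = re.findall(r"#(\w+)", match.group(0))
    return body, tags
```

A tag in the middle of a sentence is left in place, so only the end-of-post run is moved out of the reading flow.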

Claire, The Ultimate Worrier

@waitworry@sakurajima.moe · Reply to 洪 民憙 (Hong Minhee) :nonbinary:'s post

@hongminhee For what it is worth I rarely see people writing in English put tags in the middle of posts anyway.

Adding a tag to a post is often a very deliberate choice so it goes at the end.

500 Internal Server Error

@bootlegrydia@treehouse.systems · Reply to 洪 民憙 (Hong Minhee) :nonbinary:'s post

@hongminhee weibo solved this issue by adding a hash at the end of the tag, i.e. #tag# instead of #tag

however western platforms definitely won't do this
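A tag with a hash on both sides, as in Weibo's #咖啡# style, is easy to pick out of unspaced text because the closing hash marks where the tag ends. A small sketch, assuming tags contain no whitespace and no inner hash (the name `extract_paired_tags` is made up for this example):

```python
import re

# Opening "#", one or more characters that are neither "#" nor whitespace,
# then a closing "#". The no-whitespace rule is an assumption of this sketch.
PAIRED_TAG = re.compile(r"#([^#\s]+)#")


def extract_paired_tags(text: str) -> list[str]:
    """Return the tags written with a hash on both sides, in order."""
    return PAIRED_TAG.findall(text)
```

`extract_paired_tags("今天喝了#咖啡#很开心")` returns `["咖啡"]` even though nothing separates the tag from the surrounding characters, while a one-sided `#coffee` is not picked up at all.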

Alexia :makko_bnuuy:

@alexia@starlightnet.work · Reply to 洪 民憙 (Hong Minhee) :nonbinary:'s post

@hongminhee yeah there's something to that, although I'm personally of the belief that hashtags should go at the end in pretty much any language, for readability's sake

bonus points if the software has a separate tags field in the editor; it can subconsciously push people to add tags when they otherwise might not have

but yeah, you're right, maybe I should do some research and see what we can do for wafrn here...