@hongminhee@hollo.social

The hashtag problem in CJK languages

I keep thinking about how fediverse hashtag advice assumes English.

In English, you can drop #coffee into a sentence and it still reads fine. The spacing already does most of the work.

In Korean, Japanese, or Chinese, that feels much less natural. Chinese and Japanese have no spaces between words. Korean does, but particles and endings stick to the word, so putting a hashtag mid-sentence often just looks awkward or breaks the flow.

So people tend to dump hashtags at the end, or skip them.

That changes the usual “follow hashtags to find your community” advice. If people tag less, there's just less there to find. And the fediverse's discovery is already shaky enough without that.

Not sure whether it's a UI problem or just hashtags fitting space-delimited languages better. But it seems like one of those small frictions that makes the fediverse harder to get into for CJK users.

17 replies

@HaTetsu@mastodon.com.pl · Reply to 洪 民憙 (Hong Minhee) :nonbinary:

@hongminhee It's really a problem in any language with word inflection… And even when writing in English, I think more people on the Fediverse have been switching to mostly putting hashtags at the end in order to not break the flow of posts so much (screen readers have been used as an argument for that, too, not without reason)

@mariusor@metalhead.club · Reply to 洪 民憙 (Hong Minhee) :nonbinary:

@hongminhee are you thinking about rendering them, or about how to have users input them?

I think once you have a plain text and a list of tags, it's not difficult to re-render the text with matching tags in a differentiated way as long as they're not likely to pop up as elements of other words and have false positives.
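A minimal sketch of that re-rendering idea, assuming the post is stored as plain text plus a separate tag list. The function name `mark_tags` and the `[[...]]` markers are hypothetical stand-ins for whatever a real client would emit. It also shows the false-positive caveat, since plain substring matching cannot tell a tag from part of a longer word:

```python
import re


def mark_tags(text: str, tags: list[str]) -> str:
    """Wrap every occurrence of a known tag in [[...]] markers (hypothetical markup)."""
    if not tags:
        return text
    # Try longer tags first so a long tag wins over a shorter one that is its prefix.
    ordered = sorted(set(tags), key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)
    return pattern.sub(lambda m: f"[[{m.group(0)}]]", text)


# The Korean particle stays attached, outside the marked tag.
print(mark_tags("오늘 커피를 마셨다", ["커피"]))  # 오늘 [[커피]]를 마셨다

# False positive: "art" is also matched inside "party".
print(mark_tags("a party about art", ["art"]))  # a p[[art]]y about [[art]]
```

No word-boundary check is used here on purpose: Chinese and Japanese text has no spaces to anchor one, which is exactly why the false-positive risk matters.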

@Krazov@mstdn.social · Reply to 洪 民憙 (Hong Minhee) :nonbinary:

@hongminhee On the other hand, due to accessibility and how screen readers operate, it is advised to put hashtags at the end of the post. There's even UI for that in the Mastodon web client: those tags are excluded from the message and displayed underneath. So, there is a precedent in Latin-letter-based texts, too.

@alexia@starlightnet.work · Reply to 洪 民憙 (Hong Minhee) :nonbinary:

@hongminhee yeah there's something to that; although I'm personally of the belief hashtags should go at the end in pretty much any language for readability's sake

bonus points if the software has a separate tags field in the editor, it can subconsciously push people to add tags when they might not have

but yeah, you're right, maybe I should do some research and see what we can do for wafrn here...

@ianthetechie@fosstodon.org · Reply to 洪 民憙 (Hong Minhee) :nonbinary:

@hongminhee I remember early Twitter had hashtags at the end normally. But this ran up against character limits.

It’s less of an issue on here but still a very real one as most instances have limits at 500 characters.

I think ultimately boosts are very effective but it’s very hard when you first join (it took me over a year, even as a native English speaker).

I almost feel like hashtags shouldn’t count against character limits.

@licho@kolektiva.social · Reply to 洪 民憙 (Hong Minhee) :nonbinary:

@hongminhee it's the same with Polish, we also do declension and it's super weird to be putting hashtags mid sentence. People often use the wrong form to fit it in. We end up putting our tags at the end.

I don't know how it would work with CJK languages but it would be cool to have an option to invisibly separate the core part of the hashtagged word from the particle

@nicemicro@fosstodon.org · Reply to 洪 民憙 (Hong Minhee) :nonbinary:

@hongminhee I mean it isn't only Korean that has particles modifying words, like look at any Romance or Slavic language.

On the lack of spaces... well what can you do when some cultures are too stubborn to use a thousand-year-old invention that provably improves reading speed and comprehension? Hashtags are the least of their issues.

@sosquqer@en.osm.town · Reply to 洪 民憙 (Hong Minhee) :nonbinary:

@hongminhee to be honest, I've never understood hashtags, either in the text of the message itself or at the end. What is the use case? I've never used them to search for anything on social networks. And if you use them you'll end up with something too broad or too specific.
What do people use them for? Except for making fun of the terminally online in spoken comedy, stuffing their phrases with "hashtag this just happened hashtag lmao".

@jnkrtech@treehouse.systems · Reply to 洪 民憙 (Hong Minhee) :nonbinary:

@hongminhee what would CJK hashtags look like if you were to design them from scratch?

I also wonder whether some subset of the population would be willing to opt in to training an ML model (explicitly way less capable than an LLM) on their toots, designed to suggest hashtags to new users. This could probably be done mostly by just counting the usage of hashtags to collect the common ones and then trying to decide which ones might be relevant at any given time.
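The counting part could look roughly like the sketch below. It assumes opted-in posts are available as plain strings and uses a simplified `#\w+` pattern to find tags. The names `TAG_RE` and `common_tags` are made up for illustration, and deciding which tags are relevant at a given time is left out:

```python
import re
from collections import Counter

# Simplified, assumed tag pattern: "#" followed by word characters.
# In Python 3, \w is Unicode-aware, so Hangul, kana, and Han characters match too.
TAG_RE = re.compile(r"#(\w+)")


def common_tags(posts: list[str], top_n: int = 3) -> list[str]:
    """Return the top_n most used tags, counting each tag once per post."""
    counts: Counter[str] = Counter()
    for post in posts:
        # A set per post keeps one tag-heavy post from dominating the counts.
        counts.update({tag.lower() for tag in TAG_RE.findall(post)})
    return [tag for tag, _ in counts.most_common(top_n)]


posts = [
    "Morning #coffee",
    "#Coffee and #trains",
    "#trains #trains #trains",
    "#커피 #coffee",
]
print(common_tags(posts, top_n=2))  # ['coffee', 'trains']
```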

@aslakr@mastodon.social · Reply to 洪 民憙 (Hong Minhee) :nonbinary:

@hongminhee I sometimes add a zero width space (I think). Like "#jernbanedirektoratet​s " (genetiv s). Could also be used like ​#pinne?

Putting hashtags at the end creates a hashtag bar in Mastodon, and I seem to recall that hashtags without surrounding spaces work for non-Latin languages? On the other hand, Mastodon doesn't consider a line break to be a space in the hashtag bar, which breaks some Asian posts

@mr_daemon@untrusted.website · Reply to 洪 民憙 (Hong Minhee) :nonbinary:

@hongminhee For what it's worth, Mastodon at least as a platform encourages you to put them at the end, and even renders them in a neat visually distinct block if you do that, separate from the message.

For what it's worth, dumping hashtags at the bottom is actually my preference, both when reading and when writing, and I find my communities that way just fine -- and I'd even go so far as to say that I personally even dislike inlined hashtags, as it makes the text difficult to read, and probably wreaks havoc on people with limited or no vision using screenreaders.

This is, of course, a preference, and not enforced by anyone (who matters), but I think it's wrong to think of it as the Lesser Fallback Option(tm).