There’s a newish CSS property called hyphens that specifies how words should be hyphenated when text wraps.
But if you use it (and really you should), you’re going to need to make sure you’re also correctly identifying the language. And to understand why that matters, we need to talk about words and syllables.
Doesn’t matter if it’s English, Dutch, Spanish, Cherokee, or Cree: the hyphenation rules will be different.
Linguists Don’t Know What a Word Is
Ok, it’s not that they don’t know what a “word” is. It’s that linguists don’t have a universal definition for a word. Linguists kinda sorta agree that a word is the smallest unit of language that carries meaning and can stand alone.
The problem is that “smallest unit and can stand alone” is pretty different from one language to the next. This is why some languages might (or might not) have more “words” than others: they can literally say more with a single “word”.1 Many attempts to universally define a word end up falling short and ultimately leave linguists wondering, “are words even real?” 2
My own personal definition of word is the thing you say between the spaces of those other things you say.
This definition is foolproof and I will not be elaborating.
But a word is some unit of meaning with at least one syllable!
Someone who confidently knows what a word is
Linguists Haven’t Quite Nailed Down a Syllable, Either
Just in case you thought you could say something like, “well words have syllables so it’s some unit of meaning with at least one syllable,” we’ve got yet another problem.
Turns out different languages be different, and that includes how those native speakers perceive sound boundaries in those words — if words are even real. So of course there’s no precise definition of a syllable. 3
Linguists have all generally agreed that a syllable is a “phonological unit of sonority” 4. And if you’re wondering, “WTF does that even mean,” it amounts to something like, it’s when you start and then stop making sound with your sound-hole.
Once again, I will not be elaborating.
The parts of a syllable
While we lack a universal definition of syllable, we have pretty solid agreement on what goes in it. In general, linguists break down a syllable into three-ish parts (strong emphasis on -ish).
- onset: The start. Optional. Usually a consonant or group of consonants.
- rime: The thing that contrasts itself with the start. It contains:
  - nucleus: The middle bit. Mostly required. Usually a vowel. Sometimes a vowelly consonant.
  - coda: The end. Optional. Usually a consonant.
From a psychological perspective, native speakers tend to only perceive an onset and rime. From a phonological (sounding with your sound hole) perspective, that’s where the nucleus and coda get involved.
And, of course, all this is different on a per-language basis.
What Does All This Have to Do with CSS and Hyphens?
Do me a solid and look at the browser compatibility chart for CSS hyphens.
Notice how it’s per-language and per-browser? And how friggin’ long that is?
That’s because we insert hyphenation at either syllabic or morphemic (meaning-based) boundaries, and every language defines its syllables (and of course morphemes) differently. Not only that, some languages actually use the hyphen for spelling a word!5
A lot of work has gone into this one lil’ CSS property. Please clap for the implementers.
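If the browser has no hyphenation data for a language, you can still mark break points by hand. Here’s a minimal sketch, assuming a hypothetical `.prose` class. The `&shy;` entity is a soft hyphen (U+00AD), an invisible character that only shows up as a hyphen when the line actually breaks there. The break points below are hand-picked for illustration.

```html
<style>
  /* .prose is a hypothetical class name, not from the original demo */
  .prose {
    width: 8rem;    /* narrow column to force wrapping */
    hyphens: auto;  /* automatic breaks where the browser has data for the language;
                       soft hyphens in the text are still honored */
  }
</style>

<!-- lang tells the browser which language's hyphenation rules to use -->
<p class="prose" lang="en">
  What are you trying to com&shy;mu&shy;ni&shy;cate to me about Worces&shy;ter&shy;shire
</p>
```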
Hey FrankGPT, Summarize Everything So Far
It boils down to three big points:
- Every language defines words differently
- Every language defines syllables differently
- And therefore every language hyphenates differently
And that means…
You Absolutely Need to Use lang Attributes if You’re Using Hyphens
Not only do you need to make sure your container has a lang attribute, you also need to make sure its value matches the language the content is actually written in.
Here’s a demo so you can see what I mean. From left-to-right you have:
- A version written in English, with lang="en"
- A version written in English, but with lang="<not-en>"
- A version written in the target language, with lang="<not-en>"
Play with the adjusters to force word breaks.
Take note of how the hyphenation can be different when English has the wrong language attribute.
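If you want to try this outside the demo, the three columns can be roughly sketched like this. It is not the demo’s actual markup: the `.demo` and `.col` class names are made up, and Welsh (`lang="cy"`) stands in for `<not-en>`.

```html
<style>
  /* .demo and .col are hypothetical class names */
  .demo { display: flex; gap: 1rem; }
  .col {
    width: 8rem;       /* narrow enough to force word breaks */
    font-size: 1.4em;
    hyphens: auto;     /* let the browser pick break points based on lang */
  }
</style>

<div class="demo">
  <!-- English text, correctly labelled as English -->
  <p class="col" lang="en">What are you trying to communicate to me about Worcestershire</p>
  <!-- English text, wrongly labelled as Welsh -->
  <p class="col" lang="cy">What are you trying to communicate to me about Worcestershire</p>
  <!-- Welsh text, correctly labelled as Welsh -->
  <p class="col" lang="cy">Beth ydych chi’n ceisio ei gyfleu i mi am Swydd Gaerwrangon</p>
</div>
```

Whether the first two columns actually break differently depends on which hyphenation dictionaries your browser ships with.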
English, Welsh, Irish, Scots Gaelic
- What are you trying to communicate to me about Worcestershire
- What are you trying to communicate to me about Worcestershire
- Beth ydych chi’n ceisio ei gyfleu i mi am Swydd Gaerwrangon
- What are you trying to communicate to me about Worcestershire
- What are you trying to communicate to me about Worcestershire
- Cad atá tú ag iarraidh a chur in iúl dom faoi Worcestershire
- What are you trying to communicate to me about Worcestershire
- What are you trying to communicate to me about Worcestershire
- Dè tha thu a’ feuchainn ri innse dhomh mu Worcestershire?
English, French, Spanish, Catalan
- What are you trying to communicate to me about Worcestershire
- What are you trying to communicate to me about Worcestershire
- Qu’essayez-vous de me communiquer à propos du Worcestershire ?
- What are you trying to communicate to me about Worcestershire
- What are you trying to communicate to me about Worcestershire
- ¿Qué intentas comunicarme sobre Worcestershire?
- What are you trying to communicate to me about Worcestershire
- What are you trying to communicate to me about Worcestershire
- Què estàs intentant comunicar-me sobre Worcestershire?
English, German, Dutch, Danish
- What are you trying to communicate to me about Worcestershire
- What are you trying to communicate to me about Worcestershire
- Was möchten Sie mir über Worcestershire mitteilen?
- What are you trying to communicate to me about Worcestershire
- What are you trying to communicate to me about Worcestershire
- Wat probeer je mij duidelijk te maken over Worcestershire?
- What are you trying to communicate to me about Worcestershire
- What are you trying to communicate to me about Worcestershire
- Hvad prøver du at kommunikere til mig om Worcestershire?
English, Hebrew, Arabic, Farsi, Amharic
- What are you trying to communicate to me about Worcestershire
- What are you trying to communicate to me about Worcestershire
- מה אתה מנסה להעביר לי על ווסטרשייר
- What are you trying to communicate to me about Worcestershire
- What are you trying to communicate to me about Worcestershire
- ما الذي تحاول أن توصله لي بشأن ووسترشاير؟
- What are you trying to communicate to me about Worcestershire
- What are you trying to communicate to me about Worcestershire
- ስለ ዎርሴስተርሻየር ምን ልታነጋግረኝ እየሞከርክ ነው።
English, Turkish, Azerbaijani
- What are you trying to communicate to me about Worcestershire
- What are you trying to communicate to me about Worcestershire
- Worcestershire hakkında bana ne iletmeye çalışıyorsun?
- What are you trying to communicate to me about Worcestershire
- What are you trying to communicate to me about Worcestershire
- Worcestershire haqqında mənə nə danışmağa çalışırsan?
English, Punjabi, Hindi
- What are you trying to communicate to me about Worcestershire
- What are you trying to communicate to me about Worcestershire
- ਤੁਸੀਂ ਮੈਨੂੰ ਵੌਰਸਟਰਸ਼ਾਇਰ ਬਾਰੇ ਕੀ ਕਹਿਣ ਦੀ ਕੋਸ਼ਿਸ਼ ਕਰ ਰਹੇ ਹੋ?
- What are you trying to communicate to me about Worcestershire
- What are you trying to communicate to me about Worcestershire
- आप मुझे वॉर्सेस्टरशायर के बारे में क्या कहना चाह रहे हैं?
English, Cherokee, Cree, Inuktut
- He choked trying to swallow something whole
- He choked trying to swallow something whole
- ᎪᎱᏍᏗ ᎬᏩᏃᏍᏓ ᎤᎩᎯᏍᏗ ᎠᏁᎵᏗᏍᎬ ᎤᏴᏥᏅᏨᎢ.
- He choked trying to swallow something whole
- He choked trying to swallow something whole
- ᒪᐦᑎ ᐁᓴ ᐁᒥᔥᑎᑯᔒᐅᐊᔨᒨ ᐃᔥᐧᑳᐸᔫᐲᓯᒻ
- He choked trying to swallow something whole
- He choked trying to swallow something whole
- ᑐᖁᖓᓕᓚᐅᖅᑐᖅ ᐄᔭᕋᓱᐊᖅᑐᓂ ᑭᓱᑐᐃᓐᓇᕐᒥᒃ ᐃᓗᐃᒃᑲᒥᒃ
In Summary
Always indicate what language your content is in. Period. Doesn’t matter if it’s just one language. Do it.
But also be aware of how CSS properties can behave differently based on the language you’ve told the browser the content is in.
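One way to act on that is to turn on automatic hyphenation only for languages you’ve actually checked, using the `:lang()` selector. This is a sketch under assumptions: the choice of English and German is purely illustrative.

```html
<style>
  /* manual is the initial value, stated explicitly here for clarity */
  p { hyphens: manual; }

  /* opt in per language; :lang() also matches a lang inherited from an ancestor */
  p:lang(en),
  p:lang(de) { hyphens: auto; }
</style>

<p lang="en">What are you trying to communicate to me about Worcestershire</p>
<p lang="de">Was möchten Sie mir über Worcestershire mitteilen?</p>
<!-- Danish is not in the opt-in list above, so this paragraph stays on manual -->
<p lang="da">Hvad prøver du at kommunikere til mig om Worcestershire?</p>
```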
CSS’ hyphens is very cool but also very language-specific. Use it with full appreciation of the diversity of human languages.