rosa eris ursa aries

A friend of mine recently made a joke about Scrabble: if you have a vowel-heavy rack, switch to Tongan, and if you have a consonant-heavy rack, switch to Welsh. It’s funny because it fits our stereotypes of those two languages; likewise, I’d expect Italian to be on the vowel-heavy side, while Polish would be on the consonant-heavy side, and English would be in the middle. But do the numbers bear out these various stereotypes?

Yes, with one striking exception. I tallied the vowel proportions for several languages using letter-frequency tables I found online. Although I couldn’t find numbers for my friend’s suggestion of Tongan, two related Oceanic languages (Hawaiian and Fijian) led the vowel-heavy list, followed by two Romance languages, while the Slavic and Germanic languages (except Dutch) were on the consonant-heavy side, all as expected. But surprisingly, English (a Germanic language, despite its prolific vocabulary borrowing from non-Germanic languages) was at the far consonant-heavy end, below Czech, German, and Polish, with only Swedish lower.

Vowel percentages, based on letter frequencies, in declining order:
58.3% Hawaiian
53.7% Fijian
48.8% Italian
47.7% Spanish
46.1% Basque
45.5% Latin
45.0% Dutch
44.3% French
43.3% Turkish
42.5% Indonesian
42.1% Hungarian
41.3% Welsh
41.3% Polish
40.7% German
40.2% Czech
39.8% English
36.1% Swedish
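
For anyone who wants to repeat the tally, here is a minimal Python sketch of the calculation, assuming a letter-frequency table stored as a dictionary. The function name `vowel_percentage` and the toy table are hypothetical illustrations, not the tables behind the list above, and the vowel set would need to include any accented vowels that a given language's table lists separately.

```python
# Plain vowels only; extend this set with accented vowels for
# languages whose frequency tables list them separately (assumption).
VOWELS = set("aeiou")

def vowel_percentage(letter_freqs, vowels=VOWELS):
    """Return the share of letter occurrences that are vowels, in percent.

    letter_freqs maps each letter to its frequency (counts or
    percentages both work, since the result is normalized).
    """
    total = sum(letter_freqs.values())
    if total == 0:
        raise ValueError("frequency table is empty")
    vowel_total = sum(
        freq for letter, freq in letter_freqs.items()
        if letter.lower() in vowels
    )
    return 100.0 * vowel_total / total

# Hypothetical toy table, not real data for any language.
toy_table = {"a": 30, "e": 20, "k": 25, "l": 25}
print(vowel_percentage(toy_table))  # 50.0
```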

What about Y? I considered Y to be a consonant for all languages except English. For English, I counted the occurrences of Y in public-domain text and estimated that it acts as a vowel roughly 84% of the time, so I credited that share of Y’s frequency to the vowel count. Of course, this raises English’s vowel percentage relative to all the other languages, so its position near the bottom despite that is even more striking.
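
The Y adjustment can be sketched the same way: credit a fixed share of Y's frequency to the vowels and leave the rest with the consonants. This is only one plausible reading of the adjustment described above. The function name `vowel_percentage_with_y`, the parameter `y_vowel_share`, and the toy table are hypothetical.

```python
def vowel_percentage_with_y(letter_freqs, y_vowel_share=0.84):
    """Vowel percentage where a fraction of Y's frequency counts as vowel.

    y_vowel_share=0.84 mirrors the rough estimate for English;
    y_vowel_share=0.0 treats Y as a pure consonant.
    """
    vowels = set("aeiou")
    total = sum(letter_freqs.values())
    if total == 0:
        raise ValueError("frequency table is empty")
    vowel_total = sum(
        freq for letter, freq in letter_freqs.items() if letter in vowels
    )
    # Split Y between the two classes instead of assigning it wholesale.
    vowel_total += y_vowel_share * letter_freqs.get("y", 0.0)
    return 100.0 * vowel_total / total

# Hypothetical toy table, not real English data.
toy_table = {"a": 40, "t": 50, "y": 10}
print(vowel_percentage_with_y(toy_table, y_vowel_share=0.0))  # 40.0
print(round(vowel_percentage_with_y(toy_table), 1))           # 48.4
```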

The words “the” and “and” make up a disproportionate amount of English text, and each is two-thirds consonants, while the equivalents in other languages (“le,” “la,” “les,” and “et” in French; “el,” “la,” and “y” in Spanish) approximate a 50-50 mix, so that could be a factor. If anyone has a better explanation, I’m all ears.

I hope we never lose vowels altogether, because there’d be a lot of ambiguity regardless of language, as shown for Latin in the above graphic. Likewise, imagine a steadfast traveler reaching the Valley of Rome only to realize that’s not his goal at all.
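
Both points in this closing section are easy to check with a short sketch. The helper names `consonant_share`, `strip_vowels`, and `group_by_skeleton` are hypothetical, and the code treats only a, e, i, o, u as vowels, which is a simplification.

```python
from collections import defaultdict

VOWELS = set("aeiou")

def consonant_share(word):
    """Fraction of a word's letters that are not in VOWELS."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    consonants = [ch for ch in letters if ch not in VOWELS]
    return len(consonants) / len(letters)

def strip_vowels(word):
    """Return the word with every vowel removed."""
    return "".join(ch for ch in word if ch.lower() not in VOWELS)

def group_by_skeleton(words):
    """Group words that become identical once their vowels are removed."""
    groups = defaultdict(list)
    for word in words:
        groups[strip_vowels(word)].append(word)
    return dict(groups)

# "the" and "and" are each two consonants out of three letters.
for word in ("the", "and"):
    print(word, round(consonant_share(word), 3))

# The four Latin words collapse to the same consonant skeleton.
print(group_by_skeleton(["rosa", "eris", "ursa", "aries"]))
# {'rs': ['rosa', 'eris', 'ursa', 'aries']}
```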