My Language Has More Words Than Yours!

In the comments section, James Schipper responds to the notion that English has more words than, say, Swedish:

Measuring the number of words in a language isn’t very scientific. What is a word? Is it anything that is separated by empty space? If so, then the more words are written as one, the more words there are in a language. Bookkeeper and steamship would be separate words but book publisher and passenger ship would not be.
I’m currently reading a book by your Swedish colleague Mikael Parkvall about language myths. One myth that he discusses is Engelska har fler ord än svenska = English has more words than Swedish.
He says that no evidence is ever provided for the claim, except to say that English has borrowed a lot. He mocks an English chauvinist who states that English has over 1 million words and French about 100,000 and who then says that English borrowed a lot from other languages, especially from French. In other words, English is rich and French poor because English borrowed a lot from French. As Parkvall sarcastically notes, the English must have borrowed a lot of words from the French without ever paying them back.
He says that if all the works of Shakespeare are run through a computer program designed to count words, the result is 29,066. However, if all the works of August Strindberg are run through the same program, the result is 119,288 words! I can easily see why the Swedish count is so high. In Swedish, all nominal compounds are written as one word and the definite article is a suffix. On top of that, the genitive is used more than in English.
We have for instance bil = car, bilen = the car, bilar = cars, bilarna = the cars, bils = of a car, bilens = of the car, bilars = of cars, bilarnas = of the cars, bilolycka = car accident, bilägare = car owner, bilmekaniker = car mechanic, bilparkering = car parking, bilbälte = seat belt, etc. How can a computer or anybody else decide how many of these are separate words or not?
When the French language had a lot of prestige, people were saying that it was exceptionally clear. Now that English is very prestigious, we keep hearing that it is exceptionally rich.
In any case, languages that have borrowed a lot are not uncommon. Moreover, the more a language borrows, the greater the probability that the borrowed words simply displace native words, in which case no enrichment takes place.
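
To make the counting problem concrete, here is a minimal Python sketch. It is not the program Parkvall refers to, which is not described here. The suffix list and the function names `count_types` and `count_lemmas` are illustrative assumptions, written by hand for the single noun bil from the list above. It counts the same forms under two rules: every distinct whitespace-separated string is a word, or inflected forms of bil collapse into one entry.

```python
import re

# The Swedish forms quoted above, all built on "bil" (car).
SWEDISH_FORMS = [
    "bil", "bilen", "bilar", "bilarna",
    "bils", "bilens", "bilars", "bilarnas",
    "bilolycka", "bilägare", "bilmekaniker", "bilparkering", "bilbälte",
]

# Hypothetical hand-written endings for this one noun (definite, plural,
# genitive and their combinations). A real lemmatizer would be far larger.
INFLECTION_SUFFIXES = ["en", "ar", "arna", "s", "ens", "ars", "arnas"]


def tokenize(text):
    """Split text into lowercase runs of letters/digits (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def count_types(text):
    """Rule 1: every distinct written string counts as a separate word."""
    return len(set(tokenize(text)))


def strip_inflection(word, stem="bil"):
    """Map an inflected form of `stem` back to the stem; leave compounds alone."""
    inflected = {stem + suffix for suffix in INFLECTION_SUFFIXES}
    return stem if word in inflected else word


def count_lemmas(text):
    """Rule 2: inflected forms of the same noun count as one word."""
    return len({strip_inflection(word) for word in tokenize(text)})


sample = " ".join(SWEDISH_FORMS)
print("distinct written forms:", count_types(sample))
print("after collapsing inflections:", count_lemmas(sample))
```

On this list the first rule reports 13 words and the second reports 6 (bil plus the five compounds). Neither number is the right one; the gap between them just shows how much a raw count depends on spelling conventions and on what the program treats as the same word.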

I read somewhere that someone said that Dutch has 4 million words!
On my other site, we do a lot of translations of posts to other languages. So far, we have done Spanish, Portuguese, Italian, Norwegian, Swedish, Finnish, Serbo-Croatian, German, French, Bulgarian, Romanian, Polish and Korean.
So far, I have had few complaints from translators along the lines of “we don’t have a word or phrase in our language for that English word or phrase.” Cases of having to use an English word or phrase because no translation was available are few. However, Korean did seem to stick out. I am told by Korean speakers that Korean has few to no synonyms. I knew a young Korean-American woman who was stunned by the number of synonyms in English. The Koreans think the plethora of US synonyms is somewhere between ridiculous and idiotic. Why do you need more than one word with the same meaning?
Norwegian, a very small language in terms of speakers, struck me as being particularly word-rich for some reason.
An interesting question is how many words a typical primitive language had or has. A study was recently done on Yaghan, a language isolate of Tierra del Fuego in South America. A recent dictionary of Yaghan listed around 30,000 words! The author made the supposition that your typical primitive language pre-contact had around 30,000 words. No one knows for sure.
I worked for 1½ years on a California Indian language called Chukchansi. It’s true that they lacked words for a lot of modern concepts, many more obscure body parts, and many fine gradations of meaning. The speakers were all elderly and spoke English well. The last near-monolingual speaker died around 1965. She spoke English, but it was broken English. These speakers are helpful for a language. I heard from people who knew this woman that she had coined many Chukchansi borrowings and calques for many words having to do with modern living.
When the last of the monolingual or near-monolingual speakers die, your small language may get in bad shape. Calques and proper borrowings wedded to the phonology of the recipient language will simply disappear.
We have many speakers of a SE Asian language called Hmong around here. It has millions of native speakers, but I understand that it lacks many words for modern concepts, even though there are large numbers of monolinguals and near-monolinguals around here – older people, especially women. I don’t understand why they don’t borrow English words or engage in calques or word-formations.
The Hmong have an interesting cultural concept – if you are over 40, they say that you are too old to learn a foreign language. Hence, a lot of the older Hmong, especially the women, simply do not even try to learn English here in the US.

  1. I am told by Korean speakers that Korean has few to no synonyms. I knew a young Korean-American woman who was stunned by the number of synonyms in English. The Koreans think the plethora of US synonyms is somewhere between ridiculous and idiotic. Why do you need more than one word with the same meaning?
    Without a definitive corpus study indicating word paucity in spoken or literary Korean in comparison with spoken or literary English, I am highly skeptical of the subjective feeling of Korean ESL learners regarding relative vocabulary richness between English and Korean.
    Like James, I am also deeply troubled by the “English has the most words” claim mostly because it is a fundamentally unscientific and unrigorous claim. To use the words of Wolfgang Pauli, “Not only is it not right, it is not even wrong!” The claim is so vague as to be useless blather.
    I would be more interested in the following analyses:
    Comparing equivalent corpora in two languages (for example, 1000 hours of television transcription within the past year from similar types of programming, or 1000 hours of transcribed colloquial dialogue, or 1 million words from the most recently published literature in the language, or 1 year’s worth of magazine articles from similar types of magazines), what is the unique lexical unit count of each corpus?
    If the question were formulated in precise terms such as these, I would be highly surprised if there were significant differences in lexicon count between languages with more than 5 million speakers and an established literary culture.

    1. Actually, are you familiar with any corpus studies like this, Robert? I never formally studied linguistics, so I’ve never really been “down and dirty” with the literature.

  2. Pitjantjatjara is an Australian Aboriginal ‘language’ (actually a dialect of Western Desert Language). It has a surprising number of synonyms due to a cultural practice of certain words becoming taboo when someone with a similar-sounding name dies. E.g., water can be kapi or minna, and there are other words for water when it is in a lake or river.
    And there are lots of different verb and noun endings, including an alternative form of each pronoun that can be optionally tacked on the end of other words (not just verbs).
    But that can’t hide the fact that necessary words are just missing. There is no word for “or”. There are no numbers higher than 3 (I have a theory this is because the language has single, dual, and plural as a grammatical feature). There is no word for blue (only the sky and one pale flower are blue in the Australian desert). Most advanced concepts, precision, etc. are missing. There are lots of monolingual native speakers, but they are too stupid to fix the obvious problems with their language by coining new words.

  3. In my opinion, the richness of a language does not have to do with how many words it has but how many *original* words it has. If a language has many words but more than half of them came from another language, then it cannot really be considered the richest, since if it were, it would not need to borrow foreign words. For example, suppose I am very poor and don’t have a lot of clothes to wear, while my friend is richer and has a small wardrobe. I borrow some of her clothes and thus now have more clothes than she does. Does this mean I am richer than her? I know that this is a stupid example but you know what I mean. I consider languages like Ancient Greek and Chinese to be extremely rich because of the number of original words they have. This may not be really that surprising, given that they were once the centre of major civilisations. In the end however, it is politically incorrect to consider one language to be richer than another. All languages have their own ways of expressing concepts that cannot be translated to other languages. Thus all languages are equally rich in their own way.

