Repost from the old site.
Via Marilyn Vos Savant in Parade Magazine, we are told that Tiki-Tiki, otherwise known as Sranan Togo, a creole with 100,000 native speakers and many more second-language speakers in Suriname, has the smallest vocabulary of any known language – with only 250 words. This claim is credulously repeated elsewhere on the Net.
It is true that Internet dictionaries of Tiki-Tiki do show few words, possibly as few as several hundred. The SIL (Summer Institute of Linguistics) page says that Sranan Togo has maybe 3,000-4,000 words, as opposed to hundreds of thousands of words for major world languages (Vos Savant notes that English has the largest vocabulary at 250,000).
Many of those English words are neologisms, that is, new words that are being created on the fly, especially in places like the Internet. I actually think that English has more than 250,000 words, but I can’t prove it. As slang and whatnot proliferate in a widely spoken language, it gets pretty darn hard to count up all the words, much less write them all down.
There are other ways to create words, so it is not really so true to say that certain languages have low vocabularies. For instance, many languages spoken by small tribes have an almost endless productive variety of features for word production. In some (or perhaps many) such languages, roots can be manipulated almost endlessly to create new words to describe just about anything.
Nouns can turn into adjectives, adverbs and verbs, and verbs can turn into nouns, adjectives and adverbs. Adding morphological particles onto existing roots means that one root could possibly yield up to 1,000 or so new words if one is creative enough.
This potential is lost in much of the nonsense about “primitive” versus “advanced” languages, a distinction that hardly exists anyway. The truth is that the most insanely maddening languages on Earth, languages so crazy that brilliant linguists are still trying to figure them out, are spoken in general by the world’s most primitive and backwards peoples.
As a language gets bigger and used more by a civilization, it gets stupidified more and more as it loses its complexity. The reason is that people need to be on time and earn a paycheck. They need to say things quickly, make the sale or hang up the cellphone, and get to work on time.
In a more primitive situation, people are hunter-gatherers or they are laid-back agriculturalists who just take it easy and tend their fields all day. Despite blatherings of IQ theorists, even primitive humans are highly intelligent beings. We can prove this by looking at the insanely brilliant languages they have constructed all by their own selves.
We think that people get bored in these primitive settings, as their high intellect is not stimulated enough. One of the things these tribes do to stimulate their high intelligence is to play games with languages. This is why you find such wildly complicated languages in such places. Much of this complexity is superfluous (noun markers, case endings, etc.) and can easily be jettisoned if one wishes to become a multitasking metrosexual.
Anyway, I did some quick research on Sranan Togo and found this paper (Braun and Plag 2002, listed below). Creoles are intensively studied by linguists for a variety of reasons. As part of this paper, the authors used a German-language dictionary of Early Sranan Togo, Neger-Englisches Wörterbuch, completed in 1783 by Christian Ludwig Schumann. This dictionary contains 2,391 types and 17,731 tokens.
Types and tokens are often used in creole literature because it gets hard to figure out what exactly is a word in a creole language. Types and tokens is a semantic distinction derived from philosophy. Briefly, a type is a generic and a token is a specific instance of that generic. For instance, tree would be a type and maple tree would be a token. Waterfall would be a type and Vernal Falls would be a token. Man would be a type and Jesus would be a token.
So in 1783, an early version of this creole already had 20,122 words. It must only have increased its vocabulary since then. I’m calling bullshit on this 250 words line.
A creole is different from a pidgin. A pidgin is often created by immigrants to a new country where none of them understand each other.
Early immigrants to Hawaii created some pidgins. Filipinos, Chinese, Japanese, Hawaiians, Koreans, etc. were all thrown together on sugar and pineapple plantations and no one could understand each other. English was the main language. The immigrants took English, I believe, and then layered onto it parts of their native languages and finally created a pidgin that they could all understand.
A pidgin is a mess, since it is a language made by adults, and due to brain constraints, adults cannot create a functional language out of thin air on the fly. The pidgin is then spoken to the adults’ kids, who pick it up as a first language. But kids are little language-creating genius machines, and they somehow take this messed-up pidgin and transform it into a full-fledged language, a creole, by expanding it in a variety of important ways.
The creole is then transmitted to kids again, and soon the pidgin dies and everyone is speaking creole. It took some time for us to figure out what was really going on here, but we are pretty confident that kids are indeed expanding the pidgin and turning it into a creole. A guy named Derek Bickerton at the University of Hawaii has done some great work in this area.
I actually bought and tried to read Bickerton’s Language and Species, but I only got 40% of the way through it. Some of this stuff gets pretty intense. I don’t want to say ponderous, but pretty soon you have the book down on the desk and both of your hands are wrapped over your head Praying Mantis-like, bent down over the book, as you try to suck the concepts into your humiliated mind.
In Suriname, formerly Dutch Guiana, Sranan Togo is the mother tongue of some 100,000 descendants of former slaves brought to the country. It has also become a lingua franca for other ethnicities in the place, including speakers of Hindustani, Amerindian, Javanese, Dutch, and Chinese tongues.
Like all of the Guianas, Suriname has quite a fine mess of ethnicities, and I think they have been breeding together for a while such that race is becoming a bit of an afterthought.
As another aside, although Vos Savant, in addition to being a hottie, is quite brilliant and is even smarter than I am, it is not true that she has the highest IQ on Earth, or that her IQ is 220 or whatever. She got that score at age 10 or so. There are others who have gotten sky high scores at that age.
At a young age, IQ is computed by looking at how the young person’s mind compares to older people’s minds. In adults, we do not compute it that way, and adult scores are never as high as the same person’s childhood score. In Vos Savant and other extremely high-IQ kids, IQs have seen considerable regression in adulthood, but they are still sky-high.
Glad to see she’s getting a paycheck just by being smart. Wish I could.
Braun, Maria and Plag, Ingo. (2002). How Transparent is Creole Morphology? A Study of Early Sranan Word-Formation. University of Siegen, Germany. Yearbook of Morphology 2002. Dordrecht: Kluwer.
Schumann, Christian Ludwig. (1783). Neger-Englisches Wörterbuch. Editio Tertia. Paramaribo.
6 thoughts on “Tiki-Tiki Has 250 Words?”
Via Marilyn Vos Savant in Parade Magazine, we are told that Tiki-Tiki … has the smallest vocabulary of any known language – with only 250 words.
I have only so far read this sentence of your post, but my linguistics bullshit-meter is already way off the charts.
Like, into “AAVE is just bad English” or “Eskimos have a [insert large number] words for snow” territory.
Measuring the number of words in a language isn’t very scientific. What is a word? Is it anything that is separated by empty space? If so, then the more words are written as one, the more words there are in a language. Bookkeeper and steamship would be separate words but book publisher and passenger ship would not be.
I’m currently reading a book by your Swedish colleague Mikael Parkvall about language myths. One myth that he discusses is Engelska har fler ord än svenska = English has more words than Swedish.
He says that no evidence is ever provided for the claim, except to say that English has borrowed a lot. He mocks an English chauvinist who states that English has over 1 million words and French about 100,000 and who then says that English borrowed a lot from other languages, especially from French. In other words, English is rich and French poor because English borrowed a lot from French. As Parkvall sarcastically notes, the English must have borrowed a lot of words from the French without ever paying them back.
He says that if all the works of Shakespeare are run through a computer program designed to count words, the result is 29,066. However, if all the works of August Strindberg are run through the same program, the result is 119,288 words! I can easily see why the Swedish count is so high. In Swedish, all nominal compounds are written as one word and the definite article is a suffix. On top of that, the genitive is used more than in English.
We have for instance bil = car, bilen = the car, bilar = cars, bilarna = the cars, bils = of a car, bilens = of the car, bilars = of cars, bilarnas = of the cars, bilolycka = car accident, bilägare = car owner, bilmekaniker = car mechanic, bilparkering = car parking, bilbälte = seat belt, etc. How can a computer or anybody else decide how many of these are separate words or not?
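To make the point concrete, here is a minimal Python sketch of the kind of naive counter such a program might be. This is an assumption on my part, not the program Parkvall describes: it simply treats every distinct string of letters between spaces as a "word". The function name `distinct_forms` and both sample strings are made up for illustration, using the forms listed above and rough English equivalents.

```python
import re

def distinct_forms(text):
    """Return the set of distinct lowercase letter-strings in the text.

    Assumption: a "word" is any unbroken run of letters, so compounds
    written without a space and inflected forms each count separately.
    """
    return set(re.findall(r"[^\W\d_]+", text.lower()))

# Ten Swedish forms built on "bil" (hypothetical sample text).
swedish = "bil bilen bilar bilarna bils bilens bilars bilarnas bilolycka bilägare"

# Rough English equivalents of the same ten items (hypothetical sample text).
english = ("car the car cars the cars of a car of the car "
           "of cars of the cars car accident car owner")

print(len(distinct_forms(swedish)))  # 10 distinct forms
print(len(distinct_forms(english)))  # 7: car, the, cars, of, a, accident, owner
```

The same ten meanings give ten "words" in Swedish and only seven in English, purely because of spelling conventions and suffixes. Nothing about the richness of either language has been measured.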
When the French language had a lot of prestige, people were saying that it was exceptionally clear. Now that English is very prestigious, we keep hearing that it is exceptionally rich.
In any case, languages that have borrowed a lot are not uncommon. Moreover, the more a language borrows, the greater the probability that the borrowed words simply displace native words, in which case no enrichment takes place.
If words were decided in such a way, agglutinative languages would have millions of combinations and analytic languages would have very few. If we count all compound words in English, even ones which are not connected, then we can up the count. It’s hard to determine what one means by word.
It does seem to be the case that as societies intermingle and progress, language becomes more analytic (in the linguistic sense), mainly because of contact with other groups and standardization, which exposes the “uselessness” of many endings, at least in the name of commerce and inter-cultural communication. English, French and Chinese are extremely analytic. The emphasis on word order, and the tendency for words to take no endings while still being usable as any part of speech, can be confounding to foreigners at first, until they adjust to the direct and contextual mode of expression.
It’s hard to determine what one means by word.
It’s actually fairly straightforward to determine what constitutes a word: it’s simply a member of the set of smallest morpheme combinations from all utterances that may not be further broken down without either semantic loss or syntactic error. “Green car” is two words because it can be further broken into “green” and “car” while retaining contextual semantic meaning, whereas “ice cream” is a single compound word because breaking it into “ice” and “cream” would result in semantic loss. Phrases of multiple words like “green car” can be split apart grammatically, as in “green and shiny car,” whereas true compound words cannot be split in this way, as in “ice delicious cream.”
English is simply highly irregular about how it writes compound words, sometimes having space and sometimes having no space between the word’s etymological parts, unlike a language like say German, which is fairly strict about writing compound words as a single written unit.
As for comparing “word counts” between synthetic and analytic languages (for example, between English and Finnish or between Chinese and Navajo), it makes more sense to compare the lexical item count (i.e., unique semantic units) rather than word count per se due to the level of grammatical word combination possibilities present in synthetic languages.
I maintain it is not so simple. How do you count multiple meanings? And what degree of difference in meaning should make something another word? How about homophones? And what about words which are inflected? For example, in Spanish you conjugate every form of “to come”: vengo, vienes, viene, and so forth. Are those all separate words? Very well. All the past tenses also? And then “vino”, which also means wine, is another word? What about archaic words which are only used in literature: do they still get counted? Is a language’s count of words the sum total of all of its dialects’ words or just the standard form’s? Do things like “get on”, “get off” and “get going”, all of which differ semantically from their two components, all get counted as words? They are compound words in a sense, but can be split in two. They also have different meanings based on context, vastly different ones. Do past tenses count if there are multiple versions of the past tense, even if there is no semantic difference (thrived, throve, struck, stricken)? OK, so perhaps the past tenses are easy to count, but the real problem comes with multiple meanings and spellings (as well as identical ones) and with what degree of difference there must be to have a new word.
Fa waka yu? Grantangi….adyosi