Repost: The Classification of the Vietnamese Language

This ran first a long time ago, but I just sold an ad on this post, so I decided to repost it. Rereading it, I think it’s a great Historical Linguistics post. One of the reasons that I am doing this post is that one of my commenters asked me a while back to do a post on the theories of long-range comparison like Joseph Greenberg’s and how well they hold up. That will have to wait for another day, but for now, I can at least show you how some principles of Historical Linguistics work. This is a subfield that I know a thing or two about. I will keep this post pretty non-technical, so most of you ought to be able to figure out what is going on.

Let us begin by looking at some proposals about the classification of Vietnamese. The Vietnamese language has been subject to a great deal of speculation regarding its classification. At the moment, it is placed in the Austroasiatic (AA) family, sometimes also called Mon-Khmer, with Khmer, Mon, Muong, Wa, Palaung, Nicobarese, Khmu, Munda, Santali, Pnar, Khasi, Temiar, and some others. The family ranges through Vietnam, Cambodia, Laos, Thailand, Malaysia, Burma, China, and over into Northeastern India. It is traditionally divided into Mon-Khmer and Munda branches. Here is Ethnologue’s split, and here are some other ways of dividing up the family.

The homeland of the Austroasiatics was probably in Yunnan, in Southwest China. They moved down from China probably around 5,000 years ago. Some of the most ancient Austroasiatics are probably the Senoi people, who came down from China into Malaysia about 4,000 years ago. Others put the time frame at about 4,000-8,000 YBP (years before present). A major fraud has been perpetrated lately based on Senoi Dream Therapy. I discussed it on the old blog, and you can Google it if you are interested. In Anthropology classes we learned all about these fascinating Senoi people, who based their lives around their dreams. Turns out most of the fieldwork was poor to fraudulent, like Margaret Mead’s unfortunate sojourn in the South Pacific.
The Senoi resemble Veddas of India, so it is probably true that they are ancient people. Also, their skulls have Australoid features. Most have wavy hair (like Veddoids), a few have straight hair (like Mongoloids), and a scattering have woolly hair (like Negritos). Bottom line is that ancient Austroasiatics were probably Australoid types who resembled what the Senoi look like today.

There has long been a line of argument that the Vietnamese language is related to Sino-Tibetan (the family that Chinese is a part of). Even those who deny this acknowledge that there is a tremendous amount of borrowing from Chinese (especially Cantonese) to Vietnamese. Borrowing on this scale, so long ago, is part of what makes historical linguistics a difficult field. Here is an excellent piece by a man who has done a tremendous amount of work detailing his case for Vietnamese as a Sino-Tibetan language. It’s not for the amateur, but if you want to dip into it, go ahead. I spent some time there, and after a while, I was convinced that Vietnamese was indeed a Sino-Tibetan language. One of the things that convinced me is that if borrowing was involved, seldom have I seen a case for such a huge amount of borrowing, in particular of basic vocabulary. I figured the case was sealed.

Not so fast now. Looking again, and reading some of Joseph Greenberg’s work on the subject, I am now convinced otherwise. There is a serious problem with the cognates between Vietnamese and Chinese, of which there are a tremendous number. This problem is somewhat complex, but I will try to simplify it. Briefly, if Vietnamese is indeed related to Sino-Tibetan, its cognates should be not only with Chinese but with other members of Sino-Tibetan also. In other words, we should find cognates with Tibetan, Naga, Naxi, Tujia, Karen, Lolo, Kuki, Nung, Jingpho, Chin, Lepcha, etc. We should also find some cognates with those languages that have no counterpart in Chinese.
That’s a little complicated, so I will let you think about it a bit. Further, the comparisons between Chinese and Vietnamese should be variable. Some should look quite close, while others should look much more distant. So there’s a problem with the theory that Vietnamese is Sino-Tibetan (ST). The cognates look like Chinese. Problem is, they look too much like Chinese. They look more like Chinese than they should in a genetic relationship. Further, they look like Chinese and only Chinese. Looking for relationships in ST outside of Chinese, we find few if any. That’s a dead giveaway for borrowing from Chinese to Vietnamese. If it’s not clear to you how that is, think about it a bit.

Looking at Mon-Khmer, the case is not so open and shut. There seem to be more cognates with Chinese than with Mon-Khmer. So many more that the case for Vietnamese as AA looks almost silly, and you wonder how anyone came up with it. But let us look again. The cognates between AA and Vietnamese are not just with its immediate neighbors like Cambodian and Khmu but with languages far off in Eastern India like Munda and Santali. There are words that are found only in the Munda branch in one or two obscure languages that somehow show up again as cognates in Vietnamese. Now tell me how Vietnamese borrowed ancient basic vocabulary from some obscure Munda tongue way over in Northeast India? It did not. How did those words end up in some unheard of NE Indian tongue and also in Vietnamese? Simple. They both descended long ago from a common ancestor.

This is Historical Linguistics. The concepts I have dealt with here are not easy for the non-specialist to figure out, but most smart people can probably get a grasp on them. A different subject is the deep relationships of AA. Is AA related to any other languages? I leave that as an open question now, though there does appear to be a good case for AA being related to Austronesian.
One good piece of evidence is the obscure AA languages found in the Nicobar Islands off the coast of Thailand. Somehow, we see quite a few cognates in Nicobarese with Austronesian. We do not see them in any other branches of AA, only in Nicobarese. This seems odd, and it’s hard to make a case for borrowing. On the other hand, why cognates in Nicobarese and only in Nicobarese? Truth is there are some cognates outside of Nicobarese but not a whole lot.

In historical linguistics, one thing we look at is morphology. That is the study of the parts of words, like the -s plural ending in English. In both AA and Austronesian, we have funny particles called infixes. Those are what in English we might call prefixes or suffixes, except they are stuck in the middle of the word instead of at the end or the beginning. So, in English, we have pre- as a prefix meaning “before” and -er as a suffix meaning “something that does X”. So pre-destination means that our lives are figured out before we are even born. Comput-er and print-er are two objects, one that computes and the other that prints. If we had infixes instead, pre-destination would look something like destin-pre-ation, and comput-er and print-er would look something like com-er-pute and prin-er-t.

Anyway, there are some fairly obscure infixes that show up not only in some isolated languages in AA but also in far-flung Austronesian languages in, say, the Philippines. Ever heard of the borrowing of an infix? Neither have I. So were those infixes borrowed, and what are they doing in languages as far apart as Thailand and the Philippines, and none in between? Because they got borrowed? When? How? Forget it. Bottom line is that said borrowing did not happen. So what are those infix cognates doing there? They are probably ancient particles left over from a common language that gave rise to both Austronesian and AA, probably spoken somewhere in SW China maybe 9,000 years ago or more. Why is this sort of long-range comparison so hard?
For one thing, because after 9,000 years or more, there are hardly any cognates left anymore, due to language change. Languages change and tend to change at a certain rate. After X thousand years, so much change has taken place that even if two languages were once “sprung from a common source,” in the famous words of Sir William Jones in his epochal lecture to the Asiatic Society in Calcutta on February 2, 1786, there is almost nothing, or actually nothing, left to show of that relationship. Any common words have become so mangled by time that they don’t look much or anything alike anymore.

So are AA and Austronesian related? I think so, but I suppose it’s best to say that it has not been proven yet. This thesis is part of a larger long-range concept known as “Austric.” Paul Benedict, a great scholar, was one of the champions of this. Austric is normally made up of AA, Austronesian, Tai-Kadai (the Thai language and its relatives) and Hmong-Mien (the Hmong and Mien languages). Based on genetics, the depth of Austric may be ...

What Makes Vietnamese So Chinese? An Introduction to Sinitic-Vietnamese Studies.
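
For readers who like to see an argument spelled out mechanically, the borrowing-versus-inheritance test used above for Vietnamese can be sketched in a few lines of Python. This is only a rough illustration: the function names, the 0.2 threshold, and above all the counts are invented for the example and are not real data. The idea is simply that lookalike words piled up in one prestigious neighbor point to borrowing, while lookalikes scattered across distant branches of a family point to common descent.

```python
def share_outside(matches, dominant):
    """Fraction of lookalike words found in languages other than `dominant`."""
    total = sum(matches.values())
    if total == 0:
        return 0.0
    return 1 - matches.get(dominant, 0) / total


def diagnose(matches, dominant, threshold=0.2):
    """Crude heuristic (threshold is an arbitrary assumption):
    lookalikes confined to a single neighbor suggest borrowing."""
    if share_outside(matches, dominant) < threshold:
        return "looks like borrowing"
    return "consistent with common descent"


# Invented toy counts of Vietnamese lookalikes per language, NOT real data.
sino_tibetan = {"Chinese": 95, "Tibetan": 2, "Karen": 1, "Jingpho": 2}
austroasiatic = {"Khmer": 30, "Khmu": 20, "Munda languages": 15, "Nicobarese": 10}

print(diagnose(sino_tibetan, "Chinese"))
print(diagnose(austroasiatic, "Khmer"))
```

A real comparison would of course also weigh regular sound correspondences and how basic the vocabulary is, which a simple count like this ignores.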

My Language Has More Words Than Yours!

In the comments section, James Schipper comments on the notion that English has more words than, say, Swedish:

Measuring the number of words in a language isn’t very scientific. What is a word? Is it anything that is separated by empty space? If so, then the more words are written as one, the more words there are in a language. Bookkeeper and steamship would count as words in their own right, but book publisher and passenger ship would not. I’m currently reading a book by your Swedish colleague Mikael Parkvall about language myths. One myth that he discusses is Engelska har fler ord än svenska = English has more words than Swedish. He says that no evidence is ever provided for the claim, except to say that English has borrowed a lot. He mocks an English chauvinist who states that English has over 1 million words and French about 100,000 and who then says that English borrowed a lot from other languages, especially from French. In other words, English is rich and French poor because English borrowed a lot from French. As Parkvall sarcastically notes, the English must have borrowed a lot of words from the French without ever paying them back.

He says that if all the works of Shakespeare are run through a computer program designed to count words, the result is 29,066. However, if all the works of August Strindberg are run through the same program, the result is 119,288 words! I can easily see why the Swedish count is so high. In Swedish, all nominal compounds are written as one word and the definite article is a suffix. On top of that, the genitive is used more than in English. We have for instance bil = car, bilen = the car, bilar = cars, bilarna = the cars, bils = of a car, bilens = of the car, bilars = of cars, bilarnas = of the cars, bilolycka = car accident, bilägare = car owner, bilmekaniker = car mechanic, bilparkering = car parking, bilbälte = seat belt, etc. How can a computer or anybody else decide how many of these are separate words or not?

When the French language had a lot of prestige, people were saying that it was exceptionally clear.
Now that English is very prestigious, we keep hearing that it is exceptionally rich. In any case, languages that have borrowed a lot are not uncommon. Moreover, the more a language borrows, the greater the probability that the borrowed words simply displace native words, in which case no enrichment takes place.
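
To make the point about word-counting programs concrete, here is a minimal Python sketch that counts distinct written forms in the Swedish examples from the comment and in their English glosses. It is an assumption about how such a counter might work (split on anything that is not a letter, lowercase, count unique forms), not the program Parkvall describes, and `count_types` is a name made up for this example.

```python
import re


def count_types(text):
    """Count distinct written word forms, treating any run of letters as a word."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return len(set(tokens))


# The Swedish forms from the comment above, and their English glosses.
swedish = (
    "bil bilen bilar bilarna bils bilens bilars bilarnas "
    "bilolycka bilägare bilmekaniker bilparkering bilbälte"
)
english = (
    "car, the car, cars, the cars, of a car, of the car, of cars, of the cars, "
    "car accident, car owner, car mechanic, car parking, seat belt"
)

print(count_types(swedish))  # 13: every form is a new "word"
print(count_types(english))  # 11: car, the, of, a, etc. are reused
```

Every new Swedish compound or inflected form adds to the total, while the English glosses keep recycling the same small pieces, so a counter like this says more about spelling conventions than about how rich either language is.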

I read somewhere that someone said that Dutch has 4 million words! On my other site, we do a lot of translations of posts to other languages. So far, we have done Spanish, Portuguese, Italian, Norwegian, Swedish, Finnish, Serbo-Croatian, German, French, Bulgarian, Romanian, Polish and Korean. So far, I have had few complaints from translators along the lines of “we don’t have a word or phrase in our language for that English word or phrase.” Cases of having to use an English word or phrase because no translation was available are few. However, Korean did seem to stick out. I am told by Korean speakers that Korean has few to no synonyms. I knew a young Korean-American woman who was stunned by the number of synonyms in English. The Koreans think the plethora of English synonyms is somewhere between ridiculous and idiotic. Why do you need more than one word with the same meaning? Norwegian, a very small language in terms of speakers, struck me as being particularly word-rich for some reason.

An interesting question is how many words a typical primitive language had or has. A study was recently done on one of the indigenous languages of South America, Yaghan. A recent dictionary of Yaghan listed around 30,000 words! The author made the supposition that your typical primitive language pre-contact had around 30,000 words. No one knows for sure. I worked for 1½ years on a California Indian language called Chukchansi. It’s true that they lacked words for a lot of modern concepts, many more obscure body parts, and many fine gradations of meaning. The speakers were all elderly and spoke English well. The last near-monolingual speaker died around 1965. She spoke English, but it was broken English. These speakers are helpful for a language. I heard from people who knew this woman that she had coined many Chukchansi borrowings and calques for many words having to do with modern living.
When the last of the monolingual or near-monolingual speakers dies, your small language may end up in bad shape. Calques and proper borrowings wedded to the phonology of the recipient language will simply disappear. We have many speakers of a SE Asian language called Hmong around here. It has millions of native speakers, but I understand that it lacks many words for modern concepts, even though there are a large number of monolinguals and near-monolinguals around here: older people, especially women. I don’t understand why they don’t borrow English words or engage in calques or word-formations. The Hmong have an interesting cultural concept: if you are over 40, they say that you are too old to learn a foreign language. Hence, a lot of the older Hmong, especially the women, simply do not even try to learn English here in the US.
