The Classification of the Vietnamese Language

One reason I am doing this post is that a commenter asked me a while back to do a post on theories of long-range comparison, like Joseph Greenberg’s, and how well they hold up. That will have to wait for another day, but for now I can at least show you how some principles of Historical Linguistics, a subfield I know a thing or two about, work in practice. I will keep this post pretty non-technical, so most of you ought to be able to follow what is going on.
Let us begin by looking at some proposals about the classification of Vietnamese.
The Vietnamese language has been subject to a great deal of speculation regarding its classification. At the moment, it is placed in the Austroasiatic (AA) family, in its Mon-Khmer branch. Other members of the family include Khmer, Mon, Muong, Wa, Palaung, Nicobarese, Khmu, Munda, Santali, Pnar, Khasi, Temiar, and some others. The family ranges through Vietnam, Cambodia, Laos, Thailand, Malaysia, Burma, China, and over into eastern and northeastern India.
It is traditionally divided into Mon-Khmer and Munda branches. Here is Ethnologue’s split, and here are some other ways of dividing up the family.
The homeland of the Austroasiatics was probably in southwest China, in Yunnan. They probably moved down from China around 5,000 years ago. Some of the most ancient Austroasiatics are probably the Senoi people, who came down from China into Malaysia about 4,000 years ago. Others put the time frame at about 4,000-8,000 YBP (years before present).
A major fraud has been perpetrated in recent years in the name of Senoi Dream Therapy. I discussed it on the old blog, and you can Google it if you are interested. In Anthropology classes we learned all about these fascinating Senoi people, who supposedly based their lives around their dreams. It turns out that most of the fieldwork was poor to fraudulent, rather like Margaret Mead’s unfortunate sojourn in the South Pacific.
The Senoi resemble the Veddas of Sri Lanka, so it is probably true that they are an ancient people. Their skulls also have Australoid features. As for hair, most have wavy hair (like Veddoids), a few have straight hair (like Mongoloids), and a scattering have woolly hair (like Negritos). The bottom line is that the ancient Austroasiatics were probably Australoid types who looked much like the Senoi do today.
There has long been a school of thought arguing that Vietnamese is related to Sino-Tibetan (ST), the family that Chinese belongs to. Even those who deny this acknowledge that Vietnamese has borrowed a tremendous amount from Chinese (especially Cantonese). Borrowing on this scale, so long ago, makes the historical linguistics here very difficult.
Here is an excellent piece by a man who has done a tremendous amount of work detailing his case for Vietnamese as a Sino-Tibetan language. It’s not for the amateur, but if you want to dip into it, go ahead. I spent some time there, and after a while, I was convinced that Vietnamese was indeed a Sino-Tibetan language. One of the things that convinced me was that if this was borrowing, I had seldom seen borrowing on such a huge scale, in particular of basic vocabulary. I figured the case was sealed.
Not so fast now.
Looking again, and reading some of Joseph Greenberg’s work on the subject, I am now convinced otherwise. There is a serious problem with the cognates between Vietnamese and Chinese, of which there are a tremendous number.
This problem is somewhat complex, but I will try to simplify it. Briefly, if Vietnamese is indeed related to Sino-Tibetan, its cognates should be not only with Chinese but with other members of Sino-Tibetan as well. In other words, we should find cognates with Tibetan, Naga, Naxi, Tujia, Karen, Lolo, Kuki, Nung, Jingpho, Chin, Lepcha, etc. We should also find Vietnamese words with cognates in those languages that have no counterpart in Chinese at all. That’s a little complicated, so I will let you think about it a bit.
Further, the comparisons between Chinese and Vietnamese should be variable. Some should look quite close, while others should look much more distant.
So there’s a problem with the Vietnamese as ST theory.
The cognates look like Chinese.
Problem is, they look too much like Chinese. They look more like Chinese than they should in a genetic relationship. Further, they look like Chinese and only Chinese. Look for relationships in S-T outside of Chinese, and we find few if any.
That’s a dead ringer for borrowing from Chinese into Vietnamese. If it’s not clear to you why, think about it a bit.
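Boiled down, the argument above is a rule about where a look-alike turns up. Here is a minimal Python sketch of that rule, assuming a hypothetical `CONTACT` set of languages Vietnamese is known to have borrowed from heavily; the function names and the word list are made up for illustration. Real comparative work also tests regular sound correspondences and the "too close" problem described above, which this toy ignores.

```python
# Toy heuristic, not a real method: it only looks at *where* a look-alike
# of a Vietnamese word is attested, not at how the sounds correspond.

# Hypothetical assumption: languages Vietnamese has been in intense,
# long-term contact with.
CONTACT = frozenset({"Chinese"})


def diagnose(attested_in: set[str], contact: frozenset[str] = CONTACT) -> str:
    """Classify one look-alike by its distribution.

    - attested nowhere else: no evidence either way
    - attested only in contact languages: suspect borrowing
    - attested in any other relative: supports inheritance
    """
    if not attested_in:
        return "no evidence"
    if attested_in <= contact:  # subset test: every language is a contact language
        return "suspect borrowing"
    return "supports inheritance"


def profile(wordlist: dict[str, set[str]]) -> dict[str, int]:
    """Tally the verdicts over a whole (hypothetical) word list."""
    counts: dict[str, int] = {}
    for languages in wordlist.values():
        verdict = diagnose(languages)
        counts[verdict] = counts.get(verdict, 0) + 1
    return counts


if __name__ == "__main__":
    # Placeholder entries, not real etymologies.
    sample = {
        "word_1": {"Chinese"},
        "word_2": {"Chinese"},
        "word_3": {"Chinese", "Tibetan"},
        "word_4": {"Santali"},
    }
    print(profile(sample))
```

On the placeholder list, the two words attested only in Chinese come out as suspect borrowings, while the one shared with Tibetan and the one shared with Santali count toward inheritance, which is the same logic the Munda argument below relies on.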
When we look at Mon-Khmer, the case is not so open and shut. At first glance there seem to be more cognates with Chinese than with Mon-Khmer, so many more that the case for Vietnamese as AA looks almost silly, and you wonder how anyone came up with it.
But let us look again. The cognates between Vietnamese and AA are not just with its immediate neighbors like Cambodian and Khmu but with languages far off in eastern India, like Santali and the other Munda languages. Some words are found in only one or two obscure languages of the Munda branch, yet they somehow show up again as cognates in Vietnamese.
Now tell me: how did Vietnamese borrow ancient basic vocabulary from some obscure Munda tongue way over in eastern India? It did not. So how did those words end up both in some unheard-of Indian tongue and in Vietnamese? Simple. They both descended long ago from a common ancestor. This is Historical Linguistics.
The concepts I have dealt with here are not easy for the non-specialist to figure out, but most smart people can probably get a grasp on them.
A different subject is the deeper relationships of AA. Is AA related to any other languages? I leave that as an open question for now, though there does appear to be a good case for AA being related to Austronesian.
One good piece of evidence comes from the obscure AA languages of the Nicobar Islands, which lie in the Indian Ocean southwest of Thailand and north of Sumatra. Somehow, Nicobarese shows quite a few cognates with Austronesian. We do not see them in any other branch of AA, only in Nicobarese. This seems odd, and it’s hard to make a case for borrowing. On the other hand, why cognates in Nicobarese and only in Nicobarese?
In truth, there are some cognates outside of Nicobarese, but not a whole lot. In historical linguistics, one thing we look at is morphology, which deals with the meaningful parts of words, like the -s plural ending in English.
In both AA and Austronesian, we find funny particles called infixes. They are what in English we might call prefixes or suffixes, except that they are stuck in the middle of the word instead of at the beginning or the end. In English, we have pre- as a prefix meaning “before” and -er as a suffix meaning “person or thing that does the action of the verb.” So pre-destination means that our lives are figured out before we are even born. Comput-er and print-er are two objects, one that computes and the other that prints.
If we had infixes instead, pre-destination would look something like destin-pre-ation and comput-er and print-er would look something like com-er-pute and prin-er-t.
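To make the idea of position concrete, here is a small Python sketch. The first function, a hypothetical helper named `infix`, just reproduces the made-up English examples above by splicing an affix in at a chosen point. The second, `infix_after_onset` (also a hypothetical name), mimics a real pattern: Tagalog, an Austronesian language of the Philippines, inserts -um- before the first vowel of a stem, so sulat ‘write’ becomes sumulat. It is a simplification that ignores consonant clusters and other details of Tagalog.

```python
VOWELS = set("aeiou")


def infix(stem: str, affix: str, position: int) -> str:
    """Insert `affix` inside `stem` after its first `position` characters."""
    if not 0 < position < len(stem):
        raise ValueError("an infix must go strictly inside the stem")
    return stem[:position] + affix + stem[position:]


def infix_after_onset(stem: str, affix: str) -> str:
    """Place the affix before the stem's first vowel (simplified Tagalog -um-).

    Vowel-initial stems get the affix in front, as in Tagalog alis -> umalis.
    """
    for i, ch in enumerate(stem):
        if ch in VOWELS:
            return stem[:i] + affix + stem[i:]
    raise ValueError("stem has no vowel")


if __name__ == "__main__":
    # The hypothetical English infixes from the text.
    print(infix("compute", "er", 3), infix("print", "er", 4), infix("destination", "pre", 6))
    # Tagalog-style -um- placement.
    print(infix_after_onset("sulat", "um"), infix_after_onset("bili", "um"))
```

The point of the second function is that a real infix is not dropped in at random: it follows a rule about where in the word it goes, which is part of why a shared infix is such strong evidence.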
Anyway, there are some fairly obscure infixes that show up not only in some isolated AA languages but also in far-flung Austronesian languages in, say, the Philippines. Ever heard of an infix being borrowed? Neither have I. So were those infixes borrowed? If so, what are they doing in languages as far apart as Thailand and the Philippines, with none in between? When were they borrowed? How? Forget it.
The bottom line is that no such borrowing happened. So what are those infix cognates doing there? They are probably ancient particles left over from a common language that gave rise to both Austronesian and AA, probably spoken somewhere in SW China maybe 9,000 years ago or more.
Why is this sort of long-range comparison so hard? For one thing, after 9,000 years or more, language change has left hardly any cognates. Languages are always changing, and they tend to change at a fairly regular rate.
After many thousands of years, so much change has taken place that even if two languages once “sprung from some common source,” in the famous words of Sir William Jones in his epochal lecture to the Asiatic Society in Calcutta on February 2, 1786, little or nothing is left to show of that relationship. Any common words have been so mangled by time that they don’t look much alike, or alike at all, anymore.
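The idea that languages change at a roughly steady rate is the basis of glottochronology, a method most historical linguists now treat with great caution. As an illustration only: if each language keeps a fraction r of its core vocabulary per millennium, two sister languages separated for t millennia are expected to share about r^(2t) of it, because both lines lose words independently. The value r = 0.86, often quoted for the Swadesh 100-word list, is an assumption in this sketch, not a measured constant, and the function names are hypothetical.

```python
import math

# Assumed retention rate of core vocabulary per 1,000 years (a figure often
# quoted for the Swadesh 100-word list; illustrative only).
RETENTION_PER_MILLENNIUM = 0.86


def shared_retention(millennia: float, r: float = RETENTION_PER_MILLENNIUM) -> float:
    """Expected fraction of core words two sister languages still share.

    Each lineage independently keeps a fraction r per millennium, so after
    t millennia of separation the expected shared fraction is r ** (2 * t).
    """
    return r ** (2 * millennia)


def separation_time(shared: float, r: float = RETENTION_PER_MILLENNIUM) -> float:
    """Invert the formula: millennia since the split, given the shared fraction."""
    return math.log(shared) / (2 * math.log(r))


if __name__ == "__main__":
    for t in (1, 5, 9, 30):
        print(f"{t:>2} millennia apart: ~{shared_retention(t):.2%} of core words shared")
```

Under these assumptions, about three quarters of the core words survive one millennium of separation, but only around 6 to 7 percent survive nine, and real cognates are harder still to spot because the surviving words have also changed in sound.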
So are AA and Austronesian related? I think so, but I suppose it’s best to say that it has not been proven yet. This thesis is part of a larger long-range concept known as “Austric.” Paul Benedict, a great scholar, was one of its champions. Austric is usually taken to include AA and Austronesian, and broader versions add Tai-Kadai (the Thai language and its relatives) and Hmong-Mien (the Hmong and Mien languages). Judging from genetics, the time depth of Austric may be as much as 30,000 years, so proving it is going to be a tall order indeed.
What do I think?
I think Tai-Kadai and Austronesian are proven to be related (more on that later). AA and Austronesian seem to be related as well, though the proof is weaker. Hmong-Mien seems to be related to Sino-Tibetan, not Austric.
The case for Vietnamese being related to S-T is still very interesting, and I still have an open mind about it.
All of these questions are hotly controversial, and raising them in linguistics circles is likely to set tempers flaring.


Author and date unknown, What Makes Vietnamese So Chinese? An Introduction to Sinitic-Vietnamese Studies.

11 thoughts on “The Classification of the Vietnamese Language”

  1. The Vietnamese won’t admit it, but let them face it. They are essentially descended from Chinese who broke away from China. Check their history and see. It has been only two thousand years and why should their language baffle everybody? Linguist experts and the likes? As they kept migrating southward in history, their language evolved into what’s known today. Language contact with others of the region has been a major factor effecting the evolution, apparently, understandably. Why all the fuss and debates about its classification, in fact?
    Besides, “borrowed” words or not, the so-called Chinese dialects of today are mutually unintelligible, just as much as the Vietnamese language and Mandarin Chinese are.

    1. I know how old this comment is, but I thought I should post because of just how absolutely wrong it is. It is just so incredibly wrong.
      It isn’t “just” the Vietnamese people who won’t admit it. It’s hundreds if not thousands of well-respected academics from around the world who agree that Vietnamese is not directly related to the Chinese languages.
      I’ll break down every part of this comment and show where it’s wrong. First, yes, linguistic experts do agree that the people who spoke what would eventually become Vietnamese originated in the southern parts of China. That’s the linguistic homeland of the Mon-Khmer language family. I won’t dispute that.
      Next, you say “It has been only two thousand years and why should their language baffle everybody? Linguist experts and the likes? As they kept migrating southward in history, their language evolved into what’s known today.” No, that’s just absolutely stupid. That’s not how linguistic evolution works. Scholars generally believe modern Sino-Tibetan languages evolved in western China, around the eastern parts of Tibet or so. That is a different homeland and shows that the language families began independently.
      “Language contact with others of the region has been a major factor effecting the evolution, apparently, understandably.” Yes, agreed. That’s exactly what this article talks about. Vietnamese vocabulary is 30-60% derived from Chinese vocabulary because of the period of Chinese domination of Vietnam. That has almost nothing to do with the genetic classification of the languages. That would be like saying English is obviously a Romance language because it took so many words from Latin and French due to the Norman conquest. No, that’s absolutely stupid. You can have a heavily derived vocabulary and not be genetically related.
      Next you say “Why all the fuss and debates about its classification, in fact?” There’s so much fuss and debate because of stupid comments like this, where people say, “Oh, it’s so obvious that Vietnamese is related to Chinese.” It’s ignorance like that that causes those debates.

  2. Interesting. Take for example the assumption that Holo (=Banlamese=Hokkien=Taiwanese) “is a Sinitic language”. To me, it makes almost as much sense to assume that Vietnamese “is a Sinitic language” too. But even before we get around to asking “What is a Sinitic language?”, it’s clear that there is some kind of a line separating Holo from Vietnamese. There is a continuum of “Siniticity” running from the languages of the Chinese heartland through Holo to Vietnamese and beyond, and the thickest “line” is probably somewhere between Holo and Vietnamese. Just look at the numbers, pronouns, word order… I really don’t think Vietnamese resembles Cantonese because of some “common ancestor”. At the same time, Holo may have also gotten its Sino-fix through borrowing and contact.

  3. One more thing. The VNY2K paper comes from a very interesting perspective, but some of the interpretation of the data is suspect. For example:
    sẽ ‘will’ [ SV tương | M 將 jiāng < MC tsjaŋ < OC *tsjaŋ | § 醬油 jiāngyóu, Cant.: /sijou/ : xìdầu],
    The Cantonese word /sijau/ is actually 豉油, not 醬油.
    Much of the paper goes on like this.
    This is a great piece of work, but it kind of irks me that the VNY2K leaned so heavily on Mandarin, then wrote a whole section justifying it, when it is clearly not ideal to lean so hard on Mandarin.

  4. I am a Fujian/Taiwanese Chinese in Singapore. I speak three Chinese dialects: Taiwanese, Cantonese and Mandarin. To me Vietnamese sounds like Cantonese, although I would not understand a single word they say.
    I found the Chinese characters used in Japanese more readable than those used in Vietnamese. If Taiwanese were written down, I believe a Mandarin speaker would not understand it very well either.
    Although both are Sinitic languages, their grammar is too far apart, and we use a very different range of Chinese characters. The Taiwanese language uses a very ancient lexicon together with many local words not found in other Sinitic languages. Mandarin and Taiwanese are close to 100% mutually unintelligible. I studied German, and I would say the distance between Mandarin and Taiwanese is probably greater than that between German and Dutch.
    I can testify that for Cantonese as well.
    I believe we are different races. But all of us use Chinese characters, and eventually we become one race. The Chinese script expands its lexicon to accommodate all foreign inputs.
    If Vietnam were a Chinese province, Vietnamese would basically be just another dialect.

      1. Yes, that is my mother tongue. And speaking Min Nan makes learning Mandarin much easier. I learned Mandarin in school. Strangely, I was able to pick up Cantonese simply by watching TV as a child. I cannot explain why. I did not interact much with Cantonese people in my childhood; I interacted with them more as an adult. But it helps a lot if there are Cantonese speakers living close by, even if you don’t really shake hands with them.
        The lexicon and grammar of Hokkien and Cantonese are also quite different. I basically grasped the words that Taiwanese also uses first, then slowly figured out the unfamiliar ones.
        For example:
        Friends – 朋友
        Cantonese – Pang Yao
        Min Nan – Peng U
        Apple – 苹果
        Cantonese – Peng Kuo
        Min Nan – Peng Ko
        Others are very different:
        Fall sick
        Cantonese – Sang beng 生病
        Min Nan – pao bi 发病
        Generally, the Min Nan lexicon is more classical and ancient.

  5. If a Mandarin speaker and a Taiwanese speaker talk, neither will be able to understand the other, at least with untrained ears. If we write it down, the Taiwanese speaker would understand a lot more of the Mandarin, close to 100%, because modern written Chinese is closer to Mandarin. The Mandarin speaker would probably understand 50-60% of written Taiwanese.
    When I read classical Vietnamese, I understand around 30-80%. The more classical the written language is, the more I can understand. If it is written in the vernacular form, I understand much less.


