Splitters Versus Lumpers in Historical Linguistics

Warning: Long, runs to 57 pages. This article is intended at the moment more for a general audience than for specialists, but specialists may also find it of interest. At the moment, it is not properly formatted or edited for publication in an academic journal, but perhaps it could be published in such a format some day.

For background on what Historical Linguistics is, see this Wikipedia article. Basically, it involves determining by various means which languages are related to each other and, once that is determined, reconstructing a proto-language that the related languages descended from, along with, hopefully, regular sound correspondences, which supposedly prove the relationship once and for all. The argument in Historical Linguistics now is between conservatives, or splitters, and progressives, or lumpers.

Splitters say that the comparative method – described above as reconstructing a proto-language with regular sound correspondences – is necessary in order to prove that two or more languages are related. However, they also say, probably correctly, that this method is not useful beyond ~6,000 years. Any relationships beyond that time frame would not be provable by the comparative method and hence could never be proven. This effectively shuts down all research into long-range older language families.

Some lumpers say that this method is not necessary and instead relationships can be determined by simply looking at the two or more languages, a process called comparison or mass comparison. I point out below that comparison need not be cursory but could mean deep study of languages over 10, 15, or 20 years.

They tend to focus on core vocabulary, numerals, family terms, pronouns, and deictics, in addition to small morphological particles – all things that are rarely borrowed. Once they find a number of these items that resemble one another at a rate greater than chance, they say that the two languages are related because chance and borrowing are ruled out.

They say that this is the way to prove language relatedness, not the comparative method. The comparative method instead is used to learn interesting things about language families that have already been discovered via comparison, such as reconstructing proto-languages and finding regular sound correspondences.

Splitters say that comparison or mass comparison is not a valid way of proving that languages are related and that only the comparative method can be used to prove this. However, as noted, they set a 6,000-year time limit on the method needed to prove this, and this walls off a lot of potential knowledge about ancient and long-range language relationships as unprovable and hence undiscoverable. In a way, they are shutting the door to new scientific discovery beyond a certain time frame by claiming that the method needed to make these discoveries doesn’t work beyond X thousand years.

Other lumpers disagree that the comparative method has a time limit on it and are attempting to use the comparative method to reconstruct ancient long-range language families and find regular sound correspondences between them. Unfortunately, most of their efforts are in vain as splitters are using increasingly strict criteria for proof of language relationship and hence are shooting down most if not all of these efforts being done “in the proper way.”

So the splitters are saying that proof must be done in a certain way, but when people try to play by the rules and use that way to find proof, the splitters keep moving the goalposts and using increasingly strict, petty, and quibbling methods to say that the relationship is not proven.

So they say, “You must use this tool for your proof!” And then people play fair and use the tool, and the splitters almost always say, “Sorry, you didn’t prove it!” It all feels like a game that is rigged to fail in most if not all cases.

Hence, the current trend of extreme conservatism in Historical Linguistics has set up rules that seem to be designed to prevent the discovery of most if not all new language families, in particular long-range families older than 6-8,000 years.

I am quite certain that long-range language families such as Altaic (with either three families or five), Indo-Uralic, Uralic-Yukaghir, Hokan, Penutian, Mosan, Almosan, Japanese-Korean, Gulf, Yuki-Gulf, Elamite-Dravidian, Quechumaran, Austroasiatic-Hmong Mien, Coahuiltecan, North Caucasian, or Na-Dene will never be proven in my lifetime, and that’s not to mention the more extreme proposals such as Eurasiatic, Nostratic, Dene-Caucasian, Austric, and Amerind, although the evidence for the first and last of these is quite powerful.

There are simply too many emotions tied up in any of these proposals. Further, many linguists have spent a good part of their careers arguing against these proposals. It is doubtful that any amount of evidence will cause them to change their minds. Scientists, like any other humans, don’t like to be shown that they’re wrong.

Lyle Campbell, Marianne Mithun, Mauricio Mixco, Sarah Grey Thomason, Johanna Nichols, William Poser, Peter Daniels, Dell Hymes, Larry Trask, Gerrit Dimmendaal, Donald Ringe, Juha Janhunen, William Bright, and Paul Sidwell are among the leaders of this new conservatism.

At first I was very angry at what these people were doing, especially the most egregious cases such as Campbell. Then I realized that people lie and misrepresent things all day long every single day in my life and that this behavior is fairly normal behavior in humans, especially in a mushy area like this one where hard truths are hard to come by and most stated facts are more properly matters of opinion or could be construed that way.

I realized that they are simply defending a scientific paradigm and that unfortunately, this is the rather underhanded and emotion-ridden environment that defending paradigms tends to produce.

Though to be completely honest, I should not be singling these people out because the current conservatism is simply consensus and acts as the current paradigm on the language relatedness question in Historical Linguistics. The people listed above are at the top of the profession and are often considered the best historical linguists. They write books on historical linguistics. A number are considered to be ultimate authorities on questions of language relatedness. They are simply the leading edge of the current conservative consensus and paradigm in the field.

Although granted, of all of them, Campbell seems to be the most extreme conservative. He is also one of the top historical linguists in the world. Mixco, Mithun, and Poser are about on the same level as Campbell.

Campbell, Mithun, Thomason, and Mixco are Americanists whose conservatism was set off by the publication of Joseph Greenberg’s Language in the Americas (LIA) in 1987.

All of the linguists above are noted for their excellent scholarship.

The conservatives who are denying most if not all new families are called splitters. They tend to be very angry if not out and out abusive, engaging in bullying, mockery, ridicule, ostracization, and all of the usual techniques used in science against the proposers of a new paradigm.

The people who propose long-range families are called lumpers. Lumpers are heavily disparaged in the field nowadays such that almost no one wants to be known as a lumper or associated with such. However, many other historical linguists seem to be taking a more moderate fence-sitter stance where they are open to questions of new language families, including long-range families.

Among the long-range families that the moderates are open to considering nowadays are Indo-Uralic, Dene-Yeniseian, and Austro-Tai. Some of the smaller long-range families in the Americas even have supporters among the most hardline of splitters. I’m even dubious about well-argued proposals such as Dene-Yeniseian.

Thomason takes extreme umbrage at the notion that splitters have a bias that will allow few if any new families to be discovered, a charge Greenberg made when he compared their stance with Malcolm Guthrie’s objections to Greenberg’s new classification of Bantu. However, after thinking this over for some time, I now believe that Greenberg is correct. The splitters have their minds made up. They are going to allow few if any new families to be discovered. A few of them have caved a bit.

I also work in mental health, and it’s pretty obvious to me when something is not right about a scientific debate. I’ve been getting that vibe about the splitters versus lumpers debate from the very start. When a debate in science has degenerated into bias, ideology and ideologues, propaganda, politics, and in particular extreme emotion, it gives off a certain intuitive feel. This debate has felt this way from Day One. To put it simply, the debate doesn’t smell right. I have a feeling that science left the room a long time ago here.

One thing I noticed was that people who have worked on one particular language or family for much of their careers are especially angry and aggressive about the notion that their family could possibly be related to anything else. Indeed famous linguists were remarking on this tendency as early as 1901. Among the reasons given was that they had their hands full already without new work to take on and a disinclination to see their language family related to anything else as this would deny its specialness.

Trask is forceful that Basque could not possibly have any outside relatives.

I saw a debate on the Net some years ago with Trask and a Spanish assistant holding court over a debate over the external relations of Basque. Those who argued for external relations were pushing a relationship with the Caucasian languages, which is possible though not proven in my opinion. Trask and his assistant were very angry and aggressive in holding down the fort. Apparently everything was a Spanish borrowing. The debate didn’t smell right at all.

With a background in psychology, I wonder what is going on here. One possibility is as Greenberg suggests and as was suggested back in 1901 – simple narcissism. When one specializes in a language family for a long time, it probably becomes blurred with the self such that the self and the family become married to each other, and it’s hard to tell where one ends and the other begins. Yourself and the family you’ve spent your career working on become one and the same thing. If your family is not related to anything else, it’s special.

We all think we are special. This is the essence of human narcissism. To say that their favorite language has relatives is to deny its specialness almost as if to say that our egos were not real but were instead extensions of other people’s egos. Actually if you read Sartre or study modern particle physics, that’s not a bad theory, but most people bristle at the notion.

I met Korean and Japanese people when I was doing my Masters. Both beamed when they told me that their language had no known relatives. Of course that made it special in their eyes and played right into their ethnocentrism.

Another problem may be the trajectory of one’s career. If you have been arguing forcefully for 30 years that your family has no known relations, your reputation is going to take a huge hit if you have to agree that you were wrong all those years.

There is also the question of politics. We are dealing here with a Paradigm. For a good description of a Scientific Paradigm, see Thomas Kuhn’s The Structure of Scientific Revolutions. Kuhn holds that science is by its nature very conservative, some sciences being more conservative than others. A Paradigm is set up when the field reaches a satisfactory consensus that a particular theory is correct. After a while, serious barriers go up to any challenges to overthrow the proven theory.

The challenges are first ignored, then ridiculed (often severely), then attacked (often ferociously) and then, if the challenge is successful, it is accepted (often slowly and grudgingly). Kuhn pointed out that defenders of the old theory are usually so reluctant to see the paradigm overthrown that we often must wait literally until their deaths to finally overthrow the paradigm. They defend it to their deathbeds. I suggest we are dealing with something more than pure empiricism here.

It is quite risky to challenge a paradigm in science. People’s careers have suffered from it. A supporter of Keynesian economics, which was then challenging the current paradigm in economics, could not get hired at any university in the US during the 1930s.

In the splitters versus lumpers debate, we have been in the Anger phase for some time now. We seem to be settling out of it, as many are taking a fence-sitting position and arguing for attempts to resolve the debate to make it less heated.

The Paradigm here involves extreme skepticism about any new language families to the point that any new families are simply going to be rejected on all sorts of grounds. Paradigms involve politics at the academic level. When a Paradigm is set up in science, almost all scientists write and do research within the paradigm. Anything outside of the paradigm is derided as pseudoscience or worse.

The problem is that when a Paradigm is in effect, all scholars are supposed to publish within the Paradigm. Publishing outside the paradigm is regarded as evidence that one is a kook, a crank, is practicing pseudoscience, or that one is crazy or a fool. It is instructive in this debate to note that most of the prominent lumpers are independent scholars operating outside of the politics of academia.

I have had them tell me that the only reason they can take the lumper position that they do is because they are independent and don’t have a university job, so there are no repercussions if they are wrong. They told me that if they had a professorship, they would not be able to do this work. They have also told me that they know for a fact that certain splitters might jeopardize their jobs, careers, and especially their funding if they took a lumper position. This was given as one of the reasons for their dogmatic splitterism.

In addition, science works according to fads, or more properly, standard beliefs. The trends for these beliefs are set by the biggest names in the field. The biggest names in Linguistics are all splitters now. They are the trendsetters, especially in whatever specialty of Historical Linguistics you are working in. Everyone else in the field is dutifully following in their footsteps. As an up and coming young scholar, you are supposed to follow the proper trends and hypotheses of your field to uphold the consensus of scholars in your area of specialty. As you can see there is a lot more than simple empiricism going on here.

With my background, I look for psychological motivations anywhere I can find them. And science is no stranger to bias and emotional psychological motivations driving, or more usually distorting, it. We are human and humans have emotions. Emotion is the enemy of logic. Logic is the basis of empiricism. Hence, emotions are the enemy of science.

Scientists are supposed to remain objective, but alas, they are humans themselves and subject to all of the emotional psychological motivations that the rest of us are. Scientists are supposed to police themselves for bias, but that’s probably hard to do, especially if the bias is rooted in psychological processes or in particular if it is unconscious, as many such processes are.

Campbell’s case is an extreme one, but I believe it is simply motivated by psychological processes internal to the man himself.

Campbell is driven by psychological complexes. His entire turn towards extreme conservatism in this debate was set off by the huge feud he had with Greenberg, and everything since has flowed from that. He took a very angry position that LIA was completely false and did his best to trash its reputation far and wide. This disparagement is still the order of the day, and Greenberg’s name is as good as mud in the field.

Then Campbell generalized his extreme splitterist reaction to LIA out to all of the language families in the world because if he allowed any new families elsewhere in the world, he might have to allow them in the Americas, and he could not countenance that. Note also that Campbell has gone out of his way to specifically attack Greenberg’s four-family split in his proposal for language families in Africa.

This proposal, done with Greenberg’s derided method of mass comparison, has had a successful result in Africa and has been proven with the test of time. Campbell cannot allow this because if he admits that Greenberg was right in Africa, he might have to accept that he might be right in the Americas too, and that’s beyond the pale. So in his recent works he has specifically set out to state that Afroasiatic, Nilo-Saharan, Niger-Kordofanian, and Khoisan – the four families of Greenberg’s classification – have not been proven to exist yet. The truth is exactly the opposite, but the psychological process here is bald and naked for all to see.

Here he specifically trashes these language families because they were discovered by Joseph Greenberg, Campbell’s bête noire. Campbell’s agenda is to show that Greenberg is a preposterous kook and crank, although he was one of the greatest linguists of the 20th century. Greenberg’s African work is regarded as true, and this poses a problem if Campbell is to characterize Greenberg as a charlatan.

If Greenberg was right about one thing, could he not be right about another? In order to lay the foundation for the theory that Greenberg’s method doesn’t work and that it cannot discover any language relationships, Campbell will have to deny the method ever had any successes. So he sets about to deny that Greenberg’s four African families are proven.

Splitters have come up with a repertoire of reasons to shoot down proposed language relations and most are pretty poor.

They rely on overuse of the borrowing, chance, sound symbolism, nursery word, and onomatopoeia explanations for non-relatedness. There is also an overuse of the comparative method with excessively strict standards being set up for etymologies and sound correspondences. In a number of cases, linguists are going back to the etymologies of their proto-languages and reducing them by up to half.

In the last 20 years, Uralicists have gone back over the original Proto-Uralic etymologies and gotten rid of fully half of them (from 2,000 down to 1,000) for a variety of very poor reasons, mostly irregular sound correspondences. It appears to me that while there were some obvious bad etymologies in there, most of the ones that were thrown out were perfectly good.

Irregular sound correspondences are a bad reason to throw out an etymology. Keep in mind that 50% of Indo-European etymologies have irregular correspondences. By the logic of Uralicists we should throw out half of IE etymologies then. If Campbell finds any irregular sound correspondences in any new proposal, he automatically rejects it on those grounds alone. What the Uralicists have done is vandalism.

This is not just conservatism. It is out and out Reaction. Worse, it is nearly a Conservative Revolution, which I won’t define further. It is akin to a city council declaring that all of the old, beautiful buildings in the city are going to be torn down because they were not constructed properly. Will they be rebuilt? Well, of course not. Most of the top Uralicists are involved in this silly and destructive project.

In a recent paper, George Starostin warned that the splitters were not just conservatives determined to stop all progress. He pointed out that there was actually a trend towards rejection and going backwards in time to dismantle families that have already been set up on the grounds that they were not done perfectly enough. As we can see, his warning was prescient.

There are statements being made by moderates that both sides, the splitters and the lumpers, are being equally unreasonable. As one linguist said, the debate is between lazy lumpers (Just believe us, don’t demand that we prove it!) and angry splitters (Not only is this new family false, but all new families proposed from now on will also be shot down!). He suggested that they are both wrong and that the solution lies in a point in the middle. I don’t have a problem with this moderate centrist belief.

The splitter notion itself rests on an obvious falsehood, that there are hundreds of language families in the world that have no possible relationship with each other.

According to Campbell, there are 160 language families and isolates in the Americas. The question is where did all of these entities come from. Keep in mind, in Linguistics, the standard view is that these 160 entities are not related to each other in any way, shape, or form. Thinking back, this means that language would have had to have developed in humans 160 times among the Amerindians alone.

The truth is that there was no polygenesis of language.

Sit back and think for a moment. How could language possibly have been independently developed more than one time? Obviously it arose in one group. How could it have arisen in other groups too? It couldn’t and it didn’t. Did some of the original speakers go deaf, become mutes, forget all their language, and then have children, raising them without language, in which case the children devised language for themselves?

Children need comprehensible input to develop language. No language to hear in the environment, no language for the children to acquire on their own. With cochlear implants, formerly deaf people are now able to hear for the first time. A woman got hers at age 32. Since she missed the Critical Period for language development, the window of which closes at age 8, she has not, even at this late date, been able to acquire language satisfactorily. She missed the boat. No input, no language.

Obviously language arose only once among humans. It had to. And hence, all human languages are related to each other de facto whether we can “prove” it by our fancy methods or not. In other words, all human languages are related. Those 160 language families and isolates in the Americas? All related. Now we may not be able to prove which languages they are related to specifically and most closely, but we know they are all related to each other.

In the physical sciences, including Evolutionary Psychology, many things are simply assumed because the alternate theories could not have happened. But we have no evidence of much of anything in Evolutionary Psychology or Evolutionary Anthropology. We know our ancestors lived in X place at Y times, but we have no idea what they were doing there. We can’t go back in time to prove that this or that happened.

Using the logic of linguists, since we cannot make time machines to go back in time and test theories about the Evolutionary Anthropology and Evolutionary Psychology of these peoples, we can make no statements about this matter, as the only way to prove it would be to see it. In physics, there are particles that we have never seen. We have simply posited their existence because according to our theories, they have to exist. According to linguists, we could not posit the existence of these particles unless we see them.

Contrary to popular rumor, not everything in science has to be “proven” by this or that rigorous method. Many things are simply posited, as no real evidence for their existence exists, either because we were not there or because we can’t see them, or in the case of pure physics, we can’t even test out our theories. They exist simply because they have to according to our existing theories, and all competing theories fall down flat.

Well, the Americanists beg to disagree. Greenberg’s theory was so extreme and radical that the entire field erupted in outrage. None of their alternate theories, not even one of them, make the slightest bit of sense.

Despite the fact that these languages are obviously related to each other, in order to “officially prove it” we have to use a method called the comparative method whereby proto-languages and families are reconstructed and regular sound correspondences are shown between the languages being studied.

This is the only way that we can prove one language is related to another. That’s simply absurd for a few reasons.

First of all, I concur with Johanna Nichols that the comparative method does not really work on language families older than 6-8,000 years. Beyond that time, so many sound changes have taken place, semantics have been distorted, and terms fallen out of use that there’s not much of anything left to reconstruct. Furthermore, time has washed away any evidence of sound correspondences.

Although Nichols is a splitter, I have to commend her. First, she’s right about the above.

Second, realizing this, she says that the comparative method will always fail beyond this time frame. I believe she thinks then that we need to use new methods if we are to prove that long-range families exist. The method she suggests is “individual-identifying evidence,” which seems to be another way of saying odd morpheme paradigms that were probably not borrowed and are hardly existent outside of that family.

This harkens back to Edward Sapir’s “submerged features,” where he says we can prove the existence of language families by these small morphemic resemblances alone.

The rest of the field remain sticks in the mud. They say that we must use the comparative method to discover that languages are related because no other method exists. The problem, as splitters themselves note, is that if the comparative method fails beyond 6,000 years back, all attempts to prove language families that old or older are bound to fail.

The splitters seem positively gleeful that according to their paradigm, few if any new language families will be discovered. This delight in nihilism seems odd and disturbing. What sort of science is gleeful that no new knowledge will be found? Even in the event that this is true, it’s depressing. Why get excited about something so negative?

Many language families in the world were discovered by Greenberg’s “mass comparison” or simply comparing one language to another, which should be called “comparison.” And in fact, many of the smaller language families in the world are still being posited by the means of comparison or mass comparison. Comparison need not be the broad, sweeping, forest for the trees, holistic method Greenberg employs. I argue that it means lining up languages and looking for common features. We could be lining up one language against another and that would also be “comparison.”

It need not be a shallow examination. One could examine a possible language family for five, ten, fifteen, or twenty years.

After studying a pair or group of languages for some time, if one finds that core vocabulary items resemble one another at a rate above that expected by chance (7%), and borrowing has been ruled out (core vocabulary is rarely borrowed), then one has proof positive of a language family.
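The arithmetic behind "above the rate found by chance" can be sketched as a simple binomial calculation. This is only an illustration under stated assumptions, not a method any of the linguists named here are known to use: it takes the 7% chance rate from the text, assumes each core vocabulary item matches or fails to match independently of the others, and uses a hypothetical 100-item word list. The function name and the numbers passed to it are invented for the example.

```python
from math import comb


def chance_tail_probability(n_items: int, n_matches: int,
                            chance_rate: float = 0.07) -> float:
    """Probability of at least n_matches chance resemblances among n_items.

    Assumes every item is an independent trial with the same probability
    (chance_rate) of resembling its counterpart purely by accident.
    """
    if not 0 <= n_matches <= n_items:
        raise ValueError("n_matches must be between 0 and n_items")
    if not 0.0 <= chance_rate <= 1.0:
        raise ValueError("chance_rate must be between 0 and 1")
    # Upper tail of the binomial distribution: sum P(X = k) for k >= n_matches.
    return sum(
        comb(n_items, k) * chance_rate**k * (1 - chance_rate) ** (n_items - k)
        for k in range(n_matches, n_items + 1)
    )


if __name__ == "__main__":
    # Hypothetical comparison: 100 core vocabulary items, 20 resemblances.
    p = chance_tail_probability(n_items=100, n_matches=20)
    print(f"Probability of 20 or more chance matches out of 100: {p:.2e}")
```

On this sketch, a match count close to 7 out of 100 is what chance alone would produce, while a count far above that makes chance an unlikely explanation. It says nothing about borrowing, which still has to be ruled out separately, and real vocabulary items are not strictly independent of one another.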

I fail to understand why examining a language or group of languages for a long period of time to find resemblances and try to rule out chance or borrowings is a ridiculous method. What’s so ridiculous about that? Sure, it’s nice to reconstruct and get nice sound correspondences going, but it’s not always necessary, especially in long-range comparisons when such methods are doomed to failure.

One more thing: if splitters say that the comparative method fails beyond 6,000 years, why do they keep putting long-range families to the test using the comparative method? After all, the result will always come up negative, right? What’s the point of doing a study you know will come up negative? Just to get your punches in?

There are a number of folks who have bought into the splitters’ arguments and are trying to discover long-range families by the comparative method of reconstructing the proto-language and finding regular sound correspondences between them. A number of them claim to have been successful. There have been attempts to reconstruct proto-languages and find regular sound correspondences with Altaic, Nostratic, Dene-Caucasian, Dene-Yeniseian, Austro-Tai, Totozoquean, and Uralo-Yukaghir.

Altaic, Nostratic, and Dene-Caucasian all have proto-languages reconstructed with good sound correspondences running through them. Altaic and Nostratic have etymological dictionaries containing many words, 2,300 proto-forms in the case of Altaic in a 1,000 page volume. Further, a considerable Nostratic proto-language was reconstructed by Dolgopolsky and Illich-Svitych.

All of these efforts claim that they have proven their hypotheses. However, the splitters such as Campbell have rejected all of them. So you see, even when people follow the mandated method and play it by the book the way they are supposed to, the splitters will nearly always say that the efforts come up short. It’s a rigged game.

How about another question? If the comparative method is doomed beyond 6,000 years, why don’t we use another method to discover these relationships? The splitter rejoinder is that there is no other method. It’s the comparative method or nothing. But how do they know this? Can they prove that other methods can never be used to successfully discover a language relationship?

The following quotes are from a textbook or general text on Historical Linguistics by Lyle Campbell and Mauricio Mixco, A Glossary of Historical Linguistics. The purpose of this paper will be misrepresented by critics, who will say that I am a lumper criticizing splitters for their opposition to known language families.

There is some of that here, but more than anything else, what I am trying to do is show how Campbell and Mixco have been untruthful about specialist consensus regarding these families. In most cases, they are openly misrepresenting the state of consensus in the field.

As will be shown, Campbell and Mixco repeatedly seriously distort the state of consensus regarding many language families, particularly long-range ones. They usually favor a more negative and conservative view, saying that a family has little support when it has significant support and saying it is controversial when the consensus in the field is that the family is real. Campbell and Mixco engage in serious distortions of fact all through this text:

Campbell and Mixco:

Afroasiatic: Enjoys wide support among linguists, but it is not uncontroversial, especially with regard to which of the groups assumed to be genetically related to one another are to be considered true members of the phylum.

There is disagreement concerning Cushitic, and Omotic (formerly called Sidama or West Cushitic) is disputed; the great linguistic diversity within Omotic makes it a questionable entity for some. Chadic is held to be uncertain by others. Typological and areal problems contribute to these doubts. For example, some treat Cushitic and Omotic together as a linguistic area (Sprachbund) of seven families within Afroasiatic.

Campbell and Mixco are wrong. Afroasiatic is not controversial at all. There is widespread consensus that the family exists and that all of the subfamilies are correct.

The “we can’t reconstruct the numerals” argument is much in evidence here too. See the Altaic debate below for more on this. One argument against Altaic is “We can’t reconstruct the numerals.” However, Afroasiatic is a recognized family and not only has reconstruction itself proved difficult, but the numerals in particular are a gigantic mess. It seems that one does not need to have a fully reconstructed numeral set after all to have a proven language family.

There is consensus that Cushitic is a valid entity. Granted, there has been some question about Omotic, but in the last 10-15 years, consensus has settled on an agreement that Omotic is part of Afroasiatic.

The great diversity of Omotic is no surprise. Omotic is probably 13,000 years old! It’s amazing that there’s anything left at all after all that time.

Where do we get the idea that a language family cannot possibly be highly diverse? Chadic is also uncontroversial by consensus. I am not aware of any serious proposals to see Cushitic and Omotic as an Altaic-like Sprachbund of mass borrowings. Campbell and Mixco’s comments above are simply not correct. The only people questioning the validity of Afroasiatic or any of its components are Campbell and Mixco, and they are not experts on the family.

Campbell and Mixco:

Berber is usually believed to be one of the branches of Afroasiatic.

This is far too pessimistic. Berber is recognized by consensus as being one of the branches of Afroasiatic.

Campbell and Mixco:

Niger-Kordofanian (now often just called Niger-Congo): A hypothesis of distant genetic relationship proposed by Joseph H. Greenberg in his classification of African languages. Estimated counts of Niger-Kordofanian languages vary from around 900 to 1,500 languages. Greenberg grouped ‘West Sudanic’ and Bantu into a single large family, which he called Niger-Congo, after the two major rivers, the Niger and the Congo ‘in whose basins these languages predominate’ (Greenberg 1963: 7).

This included the subfamilies already recognized earlier: (1) West Atlantic (to which Greenberg joined Fulani, in a Serer-Wolof-Fulani [Fulfulde] group), (2) Mande (Mandingo) (thirty-five to forty languages), (3) Gur (or Voltaic), (4) Kwa (with Togo Remnant) and (5) Benue-Congo (Benue-Cross), with the addition of (6) Adamawa-Eastern, which had not previously been classified with these languages and whose classification remains controversial.

For Greenberg, Bantu was but a subgroup of Benue-Congo, not a separate subfamily on its own. In 1963 he joined Niger-Congo and the ‘Kordofanian’ languages into a larger postulated phylum, which he called Niger-Kordofanian.

Niger-Kordofanian has numerous supporters but is not well established; the classification of several of the language groups Greenberg assigned to Niger-Kordofanian is rejected or revised, though most scholars accept some form of Niger-Congo as a valid grouping.

As Nurse (1997: 368) points out, it is on the basis of general similarities and the noun-class system that most scholars have accepted Niger-Congo, but ‘the fact remains that no one has yet attempted a rigorous demonstration of the genetic unity of Niger-Congo by means of the Comparative Method.’

There is consensus among scholars that Niger-Kordofanian is a real thing.

Campbell and Mixco:

Nilo-Saharan: One of Greenberg’s four large phyla in his classification of African languages. In dismantling the inaccurate and racially biased ‘Hamitic,’ of which Nilo-Hamitic was held to be part, Greenberg demonstrated the inadequacy of those former classifications and argued for the connection between Nilotic and Eastern Sudanic.

He noted that ‘the Nilotic languages seem to be predominantly isolating, tend to monosyllabism, and employ tonal distinctions’ (Greenberg 1963: 92). To the extent that this classification is based on commonplace shared typology and perhaps areally diffused traits, it does not have a firm foundation. Nilo-Saharan is disputed, and many are not convinced of the proposed genetic relationships. It is generally seen as Greenberg’s wastebasket phylum, into which he placed all the otherwise unaffiliated languages of Africa.

First of all, Nilo-Saharan is not classified solely on the basis of typological traits that were perhaps areally diffused. There is also a great deal of the more typical evidence in favor of this language family. Second, it is not true that it lacks a firm foundation and that many are not convinced of its reality. The consensus among experts is that this family exists and that the overwhelming majority of the subfamilies and isolates Greenberg put in it are correct.

Saying that it is a wastebasket phylum does not make sense because the Nilo-Saharan languages are only found in a certain part of Africa. If it were truly such a phylum, there would be languages from all over Africa placed in this family.

According to Roger Blench, a moderate, consensus has formed over the last 10-15 years that Nilo-Saharan is a real thing.

Consensus has formed that 75% of the languages and families Greenberg put in Nilo-Saharan form a valid family. Controversy remains about the other 25% including Songhay, the Gumuz family, and a few isolates. Some say these are part of Nilo-Saharan but others say they are not. Nilo-Saharan probably has a great time depth of ~13,000 years at least, such that little probably remains to reconstruct. Reconstruction of Nilo-Saharan has proved difficult.

Yes, Campbell and Mixco say that Nilo-Saharan is not real, but they are not specialists.

Campbell and Mixco:

Khoisan: A proposed distant genetic relationship associated with Greenberg’s (1963) classification of African languages, which holds some thirty non-Bantu click languages of southern and eastern Africa to be genetically related to one another. Greenberg originally called his Khoisan grouping ‘the Click Languages’ but later changed this to a name based on a created compound of the Hottentots’ name for themselves, Khoi, and their name for the Bushmen, San.

Khoisan is the least accepted of Greenberg’s four African phyla. Several scholars agree in using the term ‘Khoisan’ not to reflect a genetic relationship among the languages but, rather, as a cover term for all the non-Bantu and non-Cushitic click languages.

Although it is probably true that Khoisan is the least accepted of Greenberg’s families, that’s not saying much, as it only means that 80% of experts accept its reality instead of 100%. I do not know who these several scholars are who feel that Khoisan is a typological area for click languages, but they do not seem to be specialists. Overall, Campbell and Mixco seriously distort consensus on Khoisan in this passage.

According to George Starostin, consensus has formed over the last 5-10 years that Khoisan exists. There are five major Khoisan scholars, and four of them agree that Khoisan is real, with all of them including Sandawe and most including Hadza. There is one, Traill, who says it’s not real, but he is also a notorious Africanist splitter.

Campbell and Mixco:

Eurasiatic: Greenberg’s hypothesis of a distant genetic relationship that would group Indo-European, Uralic–Yukaghir, Altaic, Korean–Japanese–Ainu, Nivkh, Chukotian and Eskimo–Aleut as members of a very large ‘linguistic stock’. While there is considerable overlap in the putative members of Eurasiatic and Nostratic there are also significant differences. Eurasiatic has been sharply criticized and is largely rejected by specialists.

I have no doubt that Eurasiatic has been sharply criticized, but apart from a negative review in Language by Peter Daniels, the controversy seems quite muted compared to the furor over Amerind. I am also not sure that it is largely rejected by specialists. It probably is, but most of them have not even bothered to comment on it. I believe that this family is one of the best long-range proposals out there.

Based on the data from the pronouns alone, it’s obviously a real entity, though I would include Indo-European, Uralic-Yukaghir, Altaic including Japanese and Korean, Chukotian, and Eskimo-Aleut, leaving out Nivkhi for the time being and certainly leaving out Ainu. Nivkhi does seem to be a Eurasiatic language, but it’s not a separate node. Instead it may be a part of the Chukotian family. Or even better yet, it seems to be part of a family connected to the New World via the Almosan family in the Americas.

I feel that Eurasiatic is a much more solid entity than Nostratic. Not that I am against Nostratic, but it’s more that Eurasiatic is a simple hypothesis to prove and with Nostratic, I’m much less sure of that. On the other hand, to the extent that Nostratic overlaps with Eurasiatic, it is surely correct.

Campbell and Mixco:

Indo-Anatolian: The hypothesis, associated with Edgar Sturtevant, that Hittite (or better said, the Anatolian languages, of which Hittite is the best known member) was the earliest Indo-European language to split off from the others. That is, this hypothesis would have Anatolian and Indo-European as sisters, two branches of a Proto-Indo-Hittite.

The more accepted view is that Anatolian is just one subgroup of Indo-European, albeit perhaps the first to have branched off, hence not ‘Indo-Hittite’ but just ‘Indo-European’ with Anatolian as one of its branches. In fact the two views differ very little in substance, since, in either case, Anatolian ends up being a subfamily distinct from the other branches and in the view of many the first to branch off the family.

The view that Anatolian is just another subgroup of IE is not the more accepted view. In fact, it has been rejected by specialists. Indo-Europeanists have told me that Indo-Anatolian is now the consensus among Indo-Europeanists, so Campbell and Mixco’s statement that Indo-Anatolian is a minority view is false.

Campbell and Mixco:

Nostratic (< Latin nostra ‘our’): A proposed distant genetic relationship that, as formulated in the 1960s by Illich-Svitych, would group Indo-European, Uralic, Altaic, Kartvelian, Dravidian and Hamito-Semitic (later Afroasiatic), though other versions of the hypothesis would include various other languages. Nostratic has a number of supporters, mostly associated with the Moscow school of Nostratic, though a majority of historical linguists do not accept the claims.

There are many problems with the evidence presented on behalf of the Nostratic hypothesis. In several instances the proposed reconstructions do not comply with typological expectations; numerous proposed cognates are lax in semantic associations, involve onomatopoeia, are forms too short to deny chance, include nursery forms and do not follow the sound correspondences formulated by supporters of Nostratic.

A large number of the putative cognate sets are considered problematic or doubtful even by its adherents. More than one-third of the sets are represented in only two of the putative Nostratic branches, though by its founder’s criteria, acceptable cases need to appear in at least three of the Nostratic language families. Numerous sets appear to involve borrowing. (See Campbell 1998, 1999.) It is for reasons of this sort that most historical linguists reject Nostratic.

It is probably correct that consensus among specialists is to reject Nostratic, but serious papers taking apart the proposal seem to be lacking. Nevertheless, most dismiss it and it is beginning to enter into the emotionally charged terrain of Altaic and Amerind, particularly the former, and belief in it is becoming a thing of ridicule as it is for Altaic. Still, there have been a few excellent linguists doing work on this very long-range family for decades now.

Campbell and Mixco:

Indo-Uralic: The hypothesis that the Indo-European and Uralic language families are genetically related to one another. While there is some suggestive evidence for the hypothesis, it has not yet been possible to confirm the proposed relationship.

This summary seems too negative. Indo-Uralic is probably one of the most promising long-range proposals out there. I regard the relationship between the two as obvious, but to me it is only a smaller part of the larger Eurasiatic family. Frederik Kortlandt has done a lot of good work on this idea. Even some hardline splitters are open to this hypothesis.

Campbell and Mixco:

Altaic: While ‘Altaic’ is repeated in encyclopedias and handbooks most specialists in these languages no longer believe that the three traditional supposed Altaic groups, Turkic, Mongolian and Tungusic, are related. In spite of this, Altaic does have a few dedicated followers.

The most serious problems for the Altaic proposal are the extensive lexical borrowing across inner Asia and among the ‘Altaic’ languages, lack of significant numbers of convincing cognates, extensive areal diffusion and typologically commonplace traits presented as evidence of relationship.

The shared ‘Altaic’ traits typically cited include vowel harmony, relatively simple phoneme inventories, agglutination, their exclusively suffixing nature, (S)OV ([Subject]-Object-Verb) word order and the fact that their non-main clauses are mostly non-finite (participial) constructions.

These shared features are not only commonplace typological traits that occur with frequency in unrelated languages of the world and therefore could easily have developed independently, but they are also areal traits shared by a number of languages in surrounding regions the structural properties of which were not well-known when the hypothesis was first framed.

This one is still up in the air, but Campbell and Mixco are lying when they say that the idea has been abandoned. Most US linguists regard it as a laughingstock, and if you say you believe in it you will experience intense bullying and taunting from them. Oddly enough, outside the US, in Europe in particular, Altaic is regarded as obviously true. However, notorious anti-Altaicist Alexander Vovin has camped out in Paris and is now spreading his nihilistic doctrine to Europeans there.

The problem is that almost all of the US linguists who will laugh in your face and call you an idiot if you believe in Altaic are not specialists in these languages. However, I did a study of Altaic specialists, and 73% of them believe in some form of Altaic.

So the anti-Altaicists are pushing a massive lie – that critical consensus has completely abandoned Altaic and regards it as a laughingstock, but their project is more Politics and Propaganda than Science. In particular, it’s a fad. So Altaic is in the preposterous position where almost all of the people who know nothing about it will laugh in your face and call you an idiot if you believe in it, while the overwhelming majority of specialists will say it’s real.

Altaic must be the only nonexistent family that has an incredibly elaborate 1,000 page etymological dictionary, full reconstructions of the proto-languages, etymologies of over 2,000 Altaic terms, and elaborate sound correspondences running through it. The anti-Altaicists use the silly “we can’t reconstruct the numerals so it’s not real” line here.

Altaic is obviously true based on the 1st and 2nd person pronoun paradigms at an absolute minimum. The anti-Altaic argument, of course, is preposterous. As noted, they dismiss a vast 1,000 page Etymological Dictionary with 2,300 reconstructed etymologies as a hallucinated work.

There are vast parallels in all three families at all levels, in particular in the Mongolic-Tungusic family, which gets a 100% with computer programs. The go-to argument here has always been that these parallels are all due to borrowings, but for this to have occurred, borrowing would have had to occur between large, far removed language families on a scale the likes of which has never been seen anywhere on Earth.

The argument that entire 1st and 2nd person pronoun paradigms have been borrowed is particularly preposterous because 1st and 2nd person pronouns are almost never borrowed anyway, and there has never been a single case on Earth of the borrowing of a 1st and 2nd person pronoun paradigm, much less the borrowing of one at the proto-language level. So the anti-Altaicists are arguing that something that has never happened anywhere on Earth not only happened, but happened more than once among different proto-languages. So the anti-Altaic argument is that something that could not possibly have happened actually occurred.

This is the conclusion of every paper the splitters write. Something that has never occurred on Earth and probably could not possibly happen not only occurred, but occurred many times around the globe for thousands of years.

Many regard including Japonic and Koreanic in Altaic as dubious, although having looked over the data, I am certain that they are part of Altaic. But they seem to be further away from the traditional tripartite system than the traditional three families are to each other. If we follow the theory that Japanese and Korean have been split from Proto-Altaic for 8,000 years, this starts to make a lot more sense.

The ridiculous massive borrowings argument specifically fails for geographical reasons. Proto-Turkic was never next door to Proto-Mongolic and Proto-Tungusic. The Proto-Altaic homeland is in the Khingan Mountains in Western Manchuria and Eastern Mongolia. Tungusic split off from Altaic 5,300 years ago, leaving Proto-Turkic-Mongolic in the Khingans. 3,400 years ago, Proto-Turkic broke from Proto-Turkic-Mongolic and headed west to Northern Kazakhstan and the southern part of the Western Siberian Plain, leaving Mongolic alone in the Khingans.

Proto-Transeurasian – Khingans, 9,000 YBP.

Proto-Korean – Liaojiang on the north shore of the Bohai Sea, 8,000 YBP.

Proto-Japanese – Northern coast of the Shandong Peninsula on the southern shore of the Bohai Sea, 8,000 YBP.

Proto-Tungusic – Amur Peninsula, 5,300 YBP. Breaks apart 2,000 YBP.

Proto-Turkic – Northern Kazakhstan, 3,400 YBP.

Proto-Mongolic – Khingans, 3,400 YBP.

Can someone explain to me how Mongolic and Tungusic borrow from Turkic 3,000 miles away in a different place at a different time in this scenario? Can someone explain to me how any of these proto-languages borrowed from each other at all, especially as they were in different places at different times?

Not only that but supposedly both Proto-Mongolic and Proto-Tungusic each borrowed from Proto-Turkic separately. These borrowings included massive amounts of core vocabulary in addition to an entire 1st and 2nd person pronoun paradigm.

Keep in mind that the borrowing of this paradigm, something that has never happened anywhere, supposedly occurred not just once but twice: between Proto-Tungusic, which was on the Amur 5,300 YBP, and Proto-Turkic in Northern Kazakhstan, 3,000 miles away and 2,000 years later, and at the same time between Proto-Mongolic in the Khingans and Proto-Turkic in Northern Kazakhstan, 3,000 miles away. How exactly did this occur?

And can someone explain to me how Proto-Korean and Proto-Japanese borrow from either of the others under this scenario?
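The dates and places in the homeland list above can be lined up with a little arithmetic. What follows is only a minimal Python sketch using the figures given in this article; the dictionary and function names are my own illustrative labels and do not come from any published source.

```python
# Homelands and dates (years before present) as given in the list above.
# All names here are illustrative labels, not terms from the literature.
HOMELANDS = {
    "Proto-Tungusic": ("Amur", 5300),
    "Proto-Turkic": ("Northern Kazakhstan", 3400),
    "Proto-Mongolic": ("Khingans", 3400),
}

def gap_in_years(a, b):
    """Absolute difference between the dates given for two proto-languages."""
    return abs(HOMELANDS[a][1] - HOMELANDS[b][1])

def same_homeland(a, b):
    """True only if the two proto-languages are placed in the same region."""
    return HOMELANDS[a][0] == HOMELANDS[b][0]

pairs = [
    ("Proto-Tungusic", "Proto-Turkic"),
    ("Proto-Mongolic", "Proto-Turkic"),
]
for a, b in pairs:
    print(f"{a} / {b}: {gap_in_years(a, b)} years apart, "
          f"same homeland: {same_homeland(a, b)}")
```

On these figures, the Tungusic and Turkic dates are 1,900 years apart (the roughly 2,000 years mentioned above) and the homelands differ, while the Mongolic and Turkic dates coincide but the homelands still differ.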

Campbell and Mixco:

Turkic: A family of about thirty languages, spoken across central Asia from China to Lithuania. The family has two branches: Chuvash (of the Volga region) and the non-Chuvash Turkic branch of relatively closely related languages. Some of the Turkic languages are Azeri, Kyrgyz, Tatar, Crimean Tatar, Uighur, Uzbek, Yakut, Tuvan, and Tofa. Turkic is often assigned to the ‘Altaic’ hypothesis, though specialists have largely abandoned Altaic.

As noted above, it is simply incorrect that specialists have largely abandoned Altaic. This is carefully crafted propaganda on the part of Campbell and Mixco. In fact, my own study showed that 73% of experts in these families felt that Altaic existed at least in some form, if only as a relationship between two of the three to five families.

Campbell and Mixco:

Some scholars classify Korean in a single family with Japanese; however, this is a controversial hypothesis. Korean is often said to belong with the Altaic hypothesis, often also with Japanese, though this is not widely supported.

Japonic-Koreanic has considerable support among specialists in these languages, although it is not universally accepted. Campbell and Mixco are excessively negative about the level of support for an expanded Altaic. In fact, an expanded Altaic which includes Japanese and Korean in some part of it has significant though probably not majority support. Perhaps 30-40% of specialists support it.

Shandong Peninsula with Tianjin and Liaojiang across the Bohai Sea, location of the Proto-Japonic and Proto-Korean homelands.

Proto-Japonic and Proto-Koreanic were both spoken in Northeastern China 8,000 YBP. Proto-Japonic was spoken on the north of the Shandong Peninsula and Proto-Koreanic was spoken across the Bohai Sea in Tianjin and especially across the Bohai Strait on the Liaodong Peninsula. They may have stayed here next to each other for 3,000 years until the Proto-Koreanics moved to the Korean Peninsula 5,000 YBP, displacing the Ainuid types there. Proto-Japonics probably stayed in Shandong until 2,300 YBP when they left to populate Japan and the Ryukyus, displacing the Ainu who were already there.

Campbell and Mixco:

Yeniseian, Yenisseian: Small language family of southern Siberia of which Ket (Khet) is the only surviving member. Yeniseian has no known broader relatives, though some have been hypothesized (see the Dené-Caucasian hypothesis).

Campbell and Mixco state a serious untruth here, including some weasel words. By discussing Dene-Caucasian in the same breath as relatives of Yeniseian, they are able to deflect away from the more widely accepted proposal of a link between Yeniseian in the Old World and Na-Dene in the New World. This is Edward Vajda’s Dene-Yeniseian proposal.

The problem is that this long-range proposal has the support of many people, including splitter Johanna Nichols. Of the 17 experts who weighed in on Dene-Yeniseian, 15 of them had a positive view of the hypothesis. Campbell and Mixco are the only two who are negative, but neither is an expert on either family. All specialists in either or both families support the proposal. When 15 out of 17 is not enough, one wonders at what point the field reaches a consensus. Must we hold out for Campbell and Mixco’s approval for everything?

Campbell and Mixco:

Nivkh (also called Gilyak): A language isolate spoken in the northern part of Sakhalin Island and along the Amur River of Manchuria, in China. There have been various unsuccessful attempts to link Nivkh genetically with various other language groupings, including Eurasiatic and Nostratic.

Granted, there is no consensus on the affiliation of Nivkhi. However, a recent paper by Sergei Nikolaev proved to me that Nivkhi is related to Algonquian-Wakashan, a family of languages in the Americas. One of its branches is Wakashan, and there has been talk of links between Wakashan and the Old World for some time.

Michael Fortescue places Nivkhi in Chukotko-Kamchatkan. Greenberg places it in Eurasiatic as a separate node. But as Chukotko-Kamchatkan is part of Eurasiatic, they are both saying the same thing in a way. My theory is that Nivkhi is Eurasiatic, possibly related to Chukotko-Kamchatkan, and like Yeniseian, is also connected to languages in North America as some of the Nivkhi probably migrated to North America and became the American Indians. In this way, we can reconcile both hypotheses.

There are three specialist views on Nivkhi. One says it is Eurasiatic, another that it is Chukotian, and the third that it is part of the Algonquian-Wakashan or Almosan family in the New World. Consensus is that Nivkhi is related to one of two other entities – other languages in Northeastern Asia or a New World Amerindian family. So expert consensus seems to have moved away from the view of Nivkhi as an isolate.

Campbell and Mixco:

Paleosiberian languages (also sometimes called Paleoasiatic, Hyperborean languages): A geographical (not genetic) designation for several otherwise unaffiliated languages (isolates) and small language families of Siberia.

Perhaps the main thing that unites these languages is that they are not Turkic, Russian or Tungusic, the better known languages of Siberia. Languages often listed as Paleosiberian are: Chukchi, Koryak, Kamchadal (Itelmen), Yukaghir, Yeniseian (Ket) and Nivkh (Gilyak). These have no known genetic relationship to one other.

Taken as a broad statement, of course this is true. However, Chukchi, Koryak, and Kamchadal or Itelmen are part of a family called Chukotko-Kamchatkan. This family has even been reconstructed. Campbell and Mixco’s statement that these languages have no known genetic relationship with each other is false.

Campbell and Mixco:

Austroasiatic: A proposed genetic relationship between Mon-Khmer and Munda, accepted as valid by many scholars but not by all.

The fact is that Austroasiatic is not a “proposed genetic relationship.” Instead it is now accepted by consensus. That there may be a few outliers who don’t believe in it is not important. I’m not aware of any linguists who doubt Austroasiatic other than Campbell and Mixco, and neither is a specialist. Austroasiatic-Hmong-Mien is the best long-range proposal for Austroasiatic, but it has probably not yet been proven. Austroasiatic is also part of the expanded version of the Austric hypothesis.

Campbell and Mixco:

Miao-Yao (also called Hmong-Mien): A language family spoken by the Miao and Yao peoples of southern China and Southeast Asia. Some proposals would classify Miao-Yao with Sino-Tibetan, others with Tai or Austronesian; none of these has much support.

This seems to be more weasel wording on the part of the authors. By listing Tai or Austronesian and Sino-Tibetan as possible relatives of Miao-Yao and then correctly dismissing them, they leave out a much better proposal linking Hmong-Mien to Austroasiatic.

This shows some promise, but the relationship is hard to see amidst all of the Chinese borrowing. As noted, the relationship between Hmong-Mien and Sino-Tibetan is one of borrowing. The relationship with Tai or Austronesian is part of Paul Benedict’s original Austric proposal. He later turned against this proposal and supported a more watered down Austric with Austronesian and Tai-Kadai, which seems to be nearing consensus support now.

Campbell and Mixco:

Austric: A mostly discounted hypothesis of distant genetic relationship proposed by Paul Benedict that would group together the Austronesian, Tai-Kadai and Miao-Yao.

More weasel wording. It is correct that Benedict’s original Austric (which also included Austroasiatic) was abandoned even by Benedict himself. However, a more watered-down Austric that he later supported, consisting of Austronesian and Tai-Kadai and called Austro-Tai, has much more support. They get around discussing the well-supported Austro-Tai by limiting Austric to Benedict’s original theory, which even he rejected later in life. In this sense, they misrepresent the debate, probably deliberately.

In fact, evidence is building towards acceptance of Austro-Tai after papers by Weera Ostapirat and Laurence Sagart seem to have proved the case using the comparative method. Roger Blench also supports the concept. In addition to Benedict, it is also supported by Lawrence Reid and Hui Li. It is opposed by Graham Thurgood, who is a specialist (he was my main academic advisor on my Master’s Degree in Linguistics). It is also opposed by Campbell and Mixco, but they are not specialists. Looking at expert opinion, we have seven arguing for the theory and one arguing against it. Specialist consensus then is that Austro-Tai is a real language family.

Even the larger version of Austric, including all of Benedict’s families plus Ainu and the South Indian isolate Nihali, has some supporters and some suggestive evidence that it may be correct.

Campbell and Mixco:

Tai-Kadai: A large language family, generally but not universally accepted, of languages located in Southeast Asia and southern China. The family includes Tai, Kam-Sui, Kadai and various other languages. The genetic relatedness of several proposed Tai-Kadai languages is not yet settled.

Tai-Kadai is not “generally but not universally accepted.” It is accepted by consensus as an existent language family. Perhaps whether some languages belong there is in doubt, but the proposal itself is not controversial. Campbell and Mixco’s suggestion that Tai-Kadai remains controversial is a serious distortion of fact.

Campbell and Mixco:

Na-Dene: A disputed proposal of distant genetic relationship, put forward by Sapir, that would group Haida, Tlingit and Eyak-Athabaskan. There is considerable disagreement about whether Haida is related to the others. The relationship between Tlingit and Eyak-Athabaskan seems more likely, and some scholars misleadingly use the name ‘Na-Dené’ to mean a grouping of these two without Haida.

Levine and Michael Krauss, two top Na-Dene experts, are on record as opposing the addition of Haida to Na-Dene for 40 years. A recent conference about Edward Vajda’s Dene-Yeniseian concluded that there was no evidence to include Haida in Na-Dene. However, a recent paper by Alexander Manaster-Ramer made the case that Haida is part of Na-Dene. This paper was enough to convince me. Further, the scholar with the most expertise on Haida has said that Haida is part of Na-Dene. So Campbell and Mixco are correct here that the subject is up in the air with both supporters and opponents.

The statement that a relationship between Tlingit and Eyak-Athabaskan seems “more likely” is an understatement. I believe it is now linguistic consensus that Tlingit is part of Na-Dene, so Campbell and Mixco’s statement is not quite true.

Campbell and Mixco:

Tonkawa: An extinct language isolate of Texas. Proposals to link Tonkawa with the languages of the Coahuiltecan or Hokan-Coahuiltecan hypotheses have not generally been accepted.

I’m sure it is the case that Coahuiltecan and Hokan-Coahuiltecan affiliations of Tonkawa have been rejected. A Coahuiltecan connection was even denied by Manaster-Ramer, who recently proved that the family existed. That said, there are interesting parallels between Tonkawa and Coahuiltecan that I cannot explain. However, a recent paper by Manaster-Ramer made the much better case that Tonkawa was in fact Na-Dene.

Campbell and Mixco:

Amerind: The Amerind hypothesis is rejected by nearly all practicing American Indianists and by most historical linguists. Specialists maintain that valid methods do not at present permit classification of Native American languages into fewer than about 180 independent language families and isolates. Amerind has been highly criticized on various grounds. There is an excessive number of errors in Greenberg’s data.

Where Greenberg stops – after assembling superficial similarities and declaring them due to common ancestry – is where other linguists begin. Since such similarities can be due to chance similarity, borrowing, onomatopoeia, sound symbolism, nursery words (the mama, papa, nana, dada, caca sort), misanalysis, and much more, for a plausible proposal of remote linguistic relationship one must attempt to eliminate all other possible explanations, leaving a shared common ancestor as the most likely.

Greenberg made no attempt to eliminate these other explanations, and the similarities he amassed appear to be due mostly to accident and a combination of these other factors.

In various instances, Greenberg compared arbitrary segments of words, equated words with very different meanings (for example, ‘excrement/night/grass’), misidentified many languages, failed to analyze the morphology of some words and falsely analyzed that of others, neglected regular sound correspondences, failed to eliminate loanwords and misinterpreted well-established findings.

The Amerind ‘etymologies’ proposed are often limited to a very few languages of the many involved. Finnish, Japanese, Basque and other randomly chosen languages fit Greenberg’s Amerind data as well as or better than do any of the American Indian languages in his ‘etymologies’; Greenberg’s method has proven incapable of distinguishing implausible relationships from Amerind generally. In short, it is with good reason Amerind has been rejected.

The movement into the Americas came in three waves.

The first wave brought the Amerinds. It is here where the 160 language families reside. According to the reigning theory in Linguistics, this group of Amerindians came in one wave that spoke not only 160 different languages but spoke languages that came from 160 different language families, none of which were related to each other. These being language families which, by the way, we can find scarcely a trace of in the Old World.

The second wave was the Na-Dene people who came along the west coast and then went inland.

The last wave was the Inuit.

Greenberg simply lumped all of the 600 languages of the Americas into a single family. The argument was good, though I’m not sure he proved that every single one of those languages was part of Amerind. But a lot of them were. The n- m- 1st and 2nd person pronouns are found in 450 of those languages. The ablauted words t’ana, t’una, and t’ina, meaning respectively a human child of either sex, all females including family terms, and all males including family terms, are extremely common in Amerind.

So t’ana just means child. T’una means girl, woman, and includes various names for all sorts of female relatives – grandmother, cousin, aunt, niece, etc. T’ina means boy, man, and includes the family terms grandfather, brother-in-law, uncle, cousin, and nephew. This ablauted paradigm is found across a vast number of these Amerind languages, and it is nonexistent in the rest of the world.

Quite probably most if not all of the languages having those terms are part of a single family. What are the other arguments? That 300 languages independently innovated these terms, in this precise ablauted paradigm, on their own? What is the likelihood of that?

That the occurrence of these items across such vast swathes of languages is due to chance? But this paradigm does not exist anywhere else, so how could it be due to chance? That these core vocabulary items were borrowed massively all across the Americas, when family terms like that are rarely borrowed? That’s not possible. None of the alternate theories make the slightest bit of sense.

Hence, the Amerind languages that have the n- m- pronoun paradigm and the t’ana, t’una, t’ina ablauted names for the sexes and the terms of family relations by sex are quite probably part of a huge language family. I’m well aware that a few of the languages having those terms could be due to chance. I’m pretty sure that about zero of those pronouns and few, if any, of those family terms were borrowed.

However, not all Amerind languages have either the pronoun paradigm or the ablauted sex term. In those cases, I’m unsure if those languages are all part of the same language family. But if you can put those languages in families, reconstruct the proto-languages, and end up with the pronoun paradigm or the ablauted family term reconstructed in the proto-language of that family, I’m sure that family would be part of Amerind. That’s about all you have to do to prove relationship in Amerind.

Campbell and Mixco:

Penutian: A very large proposed distant genetic relationship in western North America, suggested originally by Dixon and Kroeber for the Californian language families Wintuan, Maiduan, Yokutsan, and Miwok-Costanoan. The name is based on words for ‘two’, something like pen in Wintuan, Maiduan, and Yokutsan, and uti in Miwok-Costanoan, joined to form Penutian.

Sapir, impressed with the hypothesis, attempted to add an Oregon Penutian (Takelma, Coos, Siuslaw, and ‘Yakonan’), Chinook, Tsimshian, a Plateau Penutian (Sahaptian, ‘Molala-Cayuse,’ and Klamath-Modoc) and a Mexican Penutian (Mixe-Zoquean and Huave).

The Penutian grouping has been influential, and later proposals have attempted to unite various languages from Alaska to Bolivia with it. Nevertheless, it had a shaky foundation based on extremely limited evidence, and, in spite of extensive later research, it did not prove possible to demonstrate any version of the Penutian hypothesis and several prominent Penutian specialists abandoned it. Today it remains controversial and unconfirmed, with some supporters but with many who doubt it.

The statement that today it “remains controversial and unconfirmed, with some supporters but with many who doubt it,” has no basis in fact. It is surely controversial and it is probably unconfirmed by linguistic consensus. Yes, it has a number of supporters, and there are quite a few who doubt it. However, among those who doubt it, none of them are specialists in these languages. Hence, we are dealing with an Altaic situation here, where the specialists believe in it but the non-specialists insist it’s nonsense.

In fact, the consensus among the specialists on these languages is that Penutian exists. A Penutian family comprising Maiduan, Utian (Miwok-Costanoan), Wintuan, Yokutsan, Coosan, Siuslaw, Takelma, Kalapuyan, Alsean (Yakonan), Chinookan, Tsimshianic, Klamath-Modoc (Lutuami), Cayuse and Molala (Waiilatpuan), and Sahaptian has been proven to my satisfaction. I am uncertain of the Penutian status of Mixe-Zoque and Huave (Mexican Penutian), although I believe that Huave and Mixe-Zoque are related to each other, albeit at a very deep time depth of 9,000 years.

Anti-Penutianists have not published a paper in a long time. The last one I remember was published by William Shipley, and he’s been gone for a while. I am not aware of one expert on these languages who says Penutian does not exist.

Campbell and Mixco:

Cayuse-Molala: A genetic classification no longer believed that linked Cayuse (of Oregon and Washington) and Molala (of Oregon) in a single assumed family. The evidence for this was later shown to be wrong and the hypothesis was abandoned.

According to Campbell and Mixco, Cayuse is an isolate. I assume they see Molala as an isolate too. There probably is no Cayuse-Molala family, but Molala is part of Plateau Penutian, and Cayuse may be part of the same group. Plateau Penutian is part of the Penutian hypothesis, which appears to be true. Because it omits these facts, Campbell and Mixco’s statement is quite misleading.

Campbell and Mixco:

Mosan: A now abandoned proposal of distant genetic relationship that would group Salishan, Wakashan and Chimakuan together.

Another part of this proposal was that Mosan was part of a larger family with Algonquian called Almosan. An excellent series of papers was published recently by Sergei Nikolaev that validated Almosan and proved to me that it was related to Nivkhi in the Old World.

Michael Fortescue argued a few years before that Mosan was a valid entity and that it was related to the Old World language Nivkhi. Recently, Murray Gell-Mann, Ilia Peiros, and George Starostin also supported Almosan and grouped it with Chukotko-Kamchatkan and Nivkhi. David Beck recently argued that Mosan is a language area or Sprachbund instead of a genetic family.

So far we have four specialists arguing that Mosan exists, and one saying it does not. The consensus among specialists seems to be that Mosan is a valid language family. At any rate, Campbell and Mixco’s statement that this proposal is “now abandoned” is false.

For Almosan, we have four specialists saying it exists and two apparently saying it does not. Expert consensus on Almosan is optimistic.

Campbell and Mixco:

Hokan: A controversial hypothesis of distant genetic relationship proposed by Dixon and Kroeber among certain languages of California; the original list included Shastan, Chimariko, Pomoan, Karok, and Yana, to which they soon added Esselen, Yuman, and later Chumashan, Salinan, Seri, and Tequistlatecan. Later scholars, especially Edward Sapir, proposed various additions to Hokan. Many ‘Hokan’ specialists doubt the validity of the hypothesis.

It is not true that many Hokan specialists “doubt the validity of the hypothesis.” I can’t remember the last time I saw an anti-Hokan paper. Yes, Campbell, Mixco, and Mithun say Hokan does not exist, but they are not specialists. The consensus among specialists such as Mikhail Zhivlov, Terrence Kaufman, and Marcelo Jolkesky is that Hokan exists. I have only found one specialist who disagrees with the Hokan hypothesis, and she merely doubts the Hokan affiliation of Ch’imáriko.

I believe that a Hokan family consisting of Karuk, Shasta-Palaihnihan, Ch’imáriko, Yana, Salinan, Pomoan, Yuman, Seri, and Tequistlatecan exists, although I would leave out Chumashan, Washo, and Jicaquean or Tolan. Chumashan is an isolate, and while Washo and Tolan may be Hokan at a very deep time depth, the few possible cognates are not enough to provide evidence of this. I am agnostic on Esselen, which is only known from a 350-word list collected by friars at a California mission.

I have not seen any evidence that Coahuiltecan is Hokan. There is some evidence, though not probative enough for me, that Lencan and Misumalpan may be Hokan. Nevertheless, Lencan and Misumalpan form a language family that has even been accepted by Campbell himself. This is the only long-range family proposal he has supported since the publication of LIA.

Although Campbell’s opinion on many hypotheses may be waved away as he is not an expert on that family or language, Lencan and Misumalpan are right up his alley as he is an expert in languages in Central America. He has focused mostly on Mayan, but he also knows the other languages of the region well.

Campbell and Mixco:

Cochimí–Yuman: A family of languages from Arizona, California and Baja California, with two branches, extinct Cochimí (of Baja California) and the Yuman subfamily (members of which are Kiliwa, Diegueño, Cocopa, Mojave, Maricopa, Paipai, and Walapai–Havasupai–Yavapai, among others). Cochimí–Yuman is often associated with the controversial Hokan hypothesis, though evidence is insufficient to embrace the proposed relationship.

The consensus among experts in the Cochimí–Yuman family, including Mikhail Zhivlov and Terrence Kaufman, is that it is part of the Hokan family. Campbell disbelieves in the association, but he is not an expert. However, Mixco opposes the Hokan affinity of Cochimí–Yuman, and granted, he is actually a specialist on these languages. So among specialists, we have two who support the Hokan association and one who opposes it. The specialist consensus then would be that this association is a promising hypothesis, but it is not yet proven. This is different from Campbell and Mixco’s wording, which is more negative.

Campbell and Mixco:

Coahuiltecan: A hypothesis of distant genetic relationship that proposed to group some languages of south Texas and northern Mexico: Coahuilteco, Comecrudo and Cotoname, and sometimes also Tonkawa, Karankawa, Atakapa and Maratino (with Aranama and Solano assumed to be varieties of Coahuilteco).

Sapir proposed a broader classification of Hokan–Coahuiltecan, joining the Coahuiltecan proposal with the broader Hokan hypothesis, and placed this in his even larger Hokan–Siouan super-stock. None of these proposals has proven sufficiently robust to be accepted generally.

I am not aware of any specialists who have recently argued against the existence of Coahuiltecan. Yes, Campbell and Mixco do not accept it, but they are not specialists. A recent paper by Alexander Manaster-Ramer proved the existence of Coahuiltecan to my satisfaction. I believe that a Coahuiltecan family consisting of Comecrudo, Cotoname, Aranama, Solano, Mamulique, Garza, and Coahuilteco absolutely exists. Karankawa is probably a part of this family. I am not aware that any specialist is arguing against the existence of this family at the moment.

I do not think there is good evidence for including other postulated members such as Atakapa and Tonkawa. First of all, Tonkawa is probably Na-Dene as per another paper by Manaster-Ramer. Atakapa is part of the Gulf family. However, I am not yet convinced that Coahuiltecan is a member of the Hokan language family.

Campbell and Mixco:

Gulf: Hypothesis of a distant genetic relationship proposed by Mary R. Haas that would group Muskogean, Natchez, Tunica, Atakapa and Chitimacha, no longer supported by most linguists.

The notion that Gulf is no longer supported by most linguists is simply incorrect. There have only been four linguists who studied this family.

The first was Mary Haas, who also proposed a relationship with Yuki as Yuki-Gulf. Haas was always dubious about Chitimacha’s addition to Gulf.

Greenberg resurrected Yuki-Gulf in LIA.

Pam Munro is an expert on these languages. A while back she published a paper on Yuki-Gulf. I read that paper. The resemblances are so stunning between Muskogean, Natchez, Tunica, Atakapa and Chitimacha that I was shocked that anyone doubted the relationship. Furthermore, the relationship with Yuki and Wappo, a full 2,500 miles away in Northern California, was shocking.

The fourth was Geoffrey Kimball, who concluded that Gulf was probably a family but that this could not be proven.

The evidence for Gulf in Munro’s paper was good, and there even appeared to be sound correspondences running through the relationship. What was shocking about it was that Yuki and Wappo could not possibly have borrowed from Gulf because Gulf is in Louisiana 2,500 miles away. So how did all these resemblances come in? Chance is ruled out. Borrowing could not have happened. Therefore a relationship at least between Yuki and the Gulf languages is obvious.

Munro’s paper took the position that Greenberg’s Yuki-Gulf hypothesis was correct. However, there are some problems. First, Atakapa as part of Gulf has been controversial, in part because it has also been tied in with Coahuiltecan. Indeed there are resemblances between the two, and they were not spoken next to each other so borrowing can be ruled out.

Perhaps a way of solving the matter is to posit not only Yuki-Gulf but a larger family that includes Coahuiltecan as Greenberg does in LIA. I have no idea how justified this is, but there are certainly surprising resemblances between Atakapa and the Coahuiltecan languages.

Furthermore, whether or not Chitimacha is part of Gulf has been up in the air from the beginning when Haas published her paper. Recent papers have made the case that Chitimacha is related to Mesoamerican language families of Mexico such as Mixe-Zoque and Totonacan. These papers used the comparative method. Campbell has rejected this hypothesis.

That Tunica at the very least shows a close relationship with Muskogean is not even controversial. The idea has a long pedigree and is presently supported by all experts in this family.

Geoffrey Kimball examined the data recently and concluded that from the evidence, it appears that Gulf exists, but we will never be able to prove it, as he puts it. However, he stated that Tunica is almost certainly related to Muskogean. At this point, I would think that Tunica-Muskogean at the very least should be considered consensus among specialists.

Kimball’s paper had a number of problems, mostly that he was operating with a negative stance towards the existence of the family. Further, there were issues with his notions of sound symbolism and borrowing in the paper where his explanations made no sense at all.

Let’s evaluate Campbell and Mixco’s statement that Gulf is no longer supported by most linguists.

We have four specialists on record about whether or not a Gulf family exists.

Mary Haas: Positive, minus Chitimacha

Joseph Greenberg: Positive

Pamela Munro: Positive

Geoffrey Kimball: Probably exists but it’s not possible to prove it.

Brown et al.: Chitimacha is a part of the Totonozoquean family, not the Gulf family. The other members of Gulf are not members of this family.

Three out of the four specialists on the Gulf family say that the Gulf family is a reality. The other feels it exists but cannot be proven. There is also uncertainty about whether Chitimacha is part of Gulf. The consensus among experts is that Gulf is a real language family.

Campbell and Mixco’s statement that Gulf is no longer supported by most linguists is simply false.

Furthermore, I would like to point out that a good case can be made for the existence of a Totonozoquean family consisting of the Mixe-Zoque and Totonacan languages. Whether this is consensus among experts is somewhat up in the air.

Campbell and Mixco:

Macro-Gê: A proposed distant genetic relationship composed of several language families and isolates, many now extinct, along the Atlantic coast (primarily of Brazil). These include Chiquitano, Bororoan, Botocudoan, Rikbaktsa, the Gê family proper, Jeikó, Kamakanan, Maxakalían, Purian, Fulnío, Ofayé and Guató. Many are sympathetic to the hypothesis and several of these languages will very probably be demonstrated to be related to one another eventually, though others will probably need to be separated out.

This is much too pessimistic. Macro-Gê is not a proposed long-range family; it is a large language family in South America accepted by consensus. It is not true that many are sympathetic to it; instead, the consensus is that it is correct. Nor is it correct to say that it will probably be demonstrated eventually. In fact, it is already an accepted reality.

Campbell and Mixco:

Quechumaran: Proposed distant genetic relationship that would join Quechuan and Aymaran. While considerable evidence has been gathered in support of the hypothesis, it is extremely difficult in this case to distinguish what may be inherited (and therefore evidence of a genetic relationship) from what may be diffused (and therefore not reliable evidence of a genetic connection).

It is true that there is no consensus on the existence of Quechumaran. The consensus seems to be as above that it is not yet proven. Those opposed to the idea throw out the usual borrowing scenario, but they have had to push the large number of borrowings in core vocabulary all the way back to Proto-Aymara and Proto-Quechua. In my opinion, “massive borrowing of core vocabulary at the proto-language level” is simply another word for genetics.

Gerald Clauson, the famous Turkologist opponent of Altaic, had to keep pushing his massive borrowings of core vocabulary further and further back until he eventually had the scenario taking place at the Proto-Turkic, Proto-Tungusic, and Proto-Mongolic levels. See above for my analysis on why these three proto-languages could not possibly have borrowed from each other as they were in different places at different times.

A similar problem exists with opponents of the Uralo-Yukaghir theory, who are also forced to deal with a large amount of shared core vocabulary dating back a long time. Häkkinen tried to solve this problem by pushing the borrowing all the way back to not just Proto-Uralic but Pre-Proto-Uralic. Pre-Proto-Uralic at 8,000 years to me means nothing less than Uralo-Yukaghir. What else could it mean? He has heavy borrowing of core vocabulary between Pre-Proto-Uralic and Proto-Yukaghir. That’s another way of saying genetics.

Campbell and Mixco:

Macro-Guaicuruan (also spelled Macro-Waykuruan, Macro-Waikuruan): A proposed distant genetic relationship that would join the Guaicuruan and Matacoan families of the Gran Chaco in South America in a larger-scale genetic classification. Grammatical similarities, for example in the pronominal systems, have suggested the relationship to some scholars, but the extremely limited lexical evidence raises doubts for others. Some would also add Charruan and Mascoyan to these in an even larger ‘Macro-Waikuruan cluster.’

It is not true that this is a proposed long-range family suggested by some but doubted by others. In fact, Macro-Guaicuruan is accepted by consensus and is as uncontroversial as Macro-Gê, Pama-Nyungan, and other such families. There is, however, debate about which families belong to it beyond the Guaicuruan and Mataguayo language families that make up its core. There have been suggestions to add Lule-Vilela and the Zamucoan, Charruan, and Mascoyan families to this family. I do not feel that these additions are yet warranted.

Campbell and Mixco:

Pama-Nyungan: A very large, widely spread language family of Australia, some 175 languages. The name comes from Kenneth Hale, based on the words pama ‘man’ in the far northeast and nyunga ‘man’ in the southwest. Languages assigned to Pama-Nyungan extend over four-fifths of Australia, most of the continent except northern areas.

Pama-Nyungan is accepted by most Australianists as a legitimate language family, but not uncritically and not universally. It is rejected by Dixon; it is held by others to be plausible but inconclusive based on current evidence. Some Pama-Nyungan languages are Lardil, Kayardild, Yukulta, Yidiny, Dyirbal, Pitta-Pitta, Arrernte, Warlpiri, Western Desert language(s), and there are many more.

Actually, consensus now is that this family of Australian languages does indeed exist. True, Dixon challenged the existence of Pama-Nyungan recently, but his opposition was so outrageous that it prompted a quick surge of papers from Australianists defending the existence of Pama-Nyungan. The notion that other Australianists feel that Pama-Nyungan is possible but presently inconclusive is not correct. I am not aware of a single Australianist other than Dixon who feels this way. Instead, Pama-Nyungan is about as uncontroversial as Macro-Gê, Afroasiatic, or Austroasiatic.

Campbell and Mixco:

‘Papuan’ languages: A term of convenience used to refer to the languages of the western Pacific, most in New Guinea (Papua New Guinea and the Indonesian provinces of Papua and West Irian Jaya), that are neither Austronesian nor Australian. Papuan definitely does not refer to a genetic relationship among these languages for no such relationship can at present be shown.

That is, the term is defined negatively and does not imply a linguistic relationship. While most are spoken on the island of New Guinea, some are found in the Bismarck Archipelago, Bougainville Island and the Solomon Islands to the east, and in Halmahera, Timor and the Alor Archipelago to the west.

There are some 800 Papuan languages divided into a large number of mostly small language families and isolates not demonstrably related to one another.

For what it’s worth, this statement by Campbell and Mixco is correct.

Campbell and Mixco:

One large genetic grouping that has been posited for a number of Papuan languages is the Trans-New Guinea phylum, which is promising but not yet confirmed.

Trans-New Guinea is not “promising but not yet confirmed.” Instead it is an uncontroversial language family accepted by the consensus of all specialists.

References

Beck, David (1997). Mosan III: A Problem of Remote Common Proximity. International Conference on Salish (and Neighbo(u)ring) Languages.
Benedict, Paul K. (1942). “Thai, Kadai, and Indonesian: A New Alignment in Southeastern Asia.” American Anthropologist 44, 4: 576–601.
Benedict, Paul K. (1975). Austro-Thai Language and Culture, with a Glossary of Roots. New Haven: HRAF Press.
Blench, Roger (2008). The Prehistory of the Daic (Tai-Kadai) Speaking Peoples. Presented at the 12th EURASEAA Meeting in Leiden, the Netherlands, 1-5 September 2008.
Blench, Roger (2018). Tai-Kadai and Austronesian Are Related at Multiple Levels and Their Archaeological Interpretation (draft).
Blust, Robert (2014). “The Higher Phylogeny of Austronesian and the Position of Tai-Kadai: Another Look,” in The 14th International Symposium on Chinese Languages and Linguistics (IsCLL-14).
Campbell, Lyle and Marianne Mithun (Eds.) (1979). The Languages of Native America: An Historical and Comparative Assessment.
Campbell, Lyle and Mauricio J. Mixco (2007). A Glossary of Historical Linguistics. Edinburgh University Press.
Campbell, Lyle and William J. Poser (2008). Language Classification: History and Method. Cambridge: Cambridge University Press.
Fortescue, M. (1998). Language Relations across Bering Strait: Reappraising the Archaeological and Linguistic Evidence. (Nivkhi is Mosan.)
Fortescue, Michael (2011). “The Relationship of Nivkh to Chukotko-Kamchatkan Revisited.” Lingua 121, 8: 1359-1376. (Nivkhi is Chukotko-Kamchatkan.)
Gell-Mann, Murray; Ilia Peiros, and George Starostin (2009). “Distant Language Relationship: The Current Perspective.” Journal of Language Relationship.
Greenberg, Joseph H. (2000). Indo-European and Its Closest Relatives: The Eurasiatic Language Family. Volume 1, Grammar. Stanford: Stanford University Press.
Greenberg, Joseph H. (2002). Indo-European and Its Closest Relatives: The Eurasiatic Language Family. Volume 2, Lexicon. Stanford: Stanford University Press.
Heine, Bernd (1992). African Languages. International Encyclopedia of Linguistics, ed. by William Bright, Vol. 1, pp. 31-36. Oxford: Oxford University Press. (No such thing as Nilo-Saharan.)
Krauss, Michael E. (1979). Na-Dene and Eskimo-Aleut. The Languages of Native America: Historical and comparative assessment, ed. by Lyle Campbell and Marianne Mithun, pp. 803-901. Austin: University of Texas Press. (Haida not part of Na-Dene.)
Levine, Robert D. (1979). Haida and Na-Dene: A New Look at the evidence. IJAL 45: 157-70. (Haida not part of Na-Dene.)
Li, Hui (李辉) (2005). Genetic Structure of Austro-Tai Populations (Doctoral Dissertation). Fudan University.
Mixco, Mauricio J. (1976). “Kiliwa Texts.” International Journal of American Linguistics Native American Text Series 1: 92-101.
Mixco, Mauricio J. (1977). “The Linguistic Affiliation of the Ñakipa and Yakakwal of Lower California.” International Journal of American Linguistics 43: 189-200.
Nicolaï, Robert (1990). Parentés Linguistiques (À Propos du Songhay). Paris: CNRS. (Dimmendaal says Songhay is Nilo-Saharan.)
Nikolaev, S. (2015). Toward the Reconstruction of Proto-Algonquian-Wakashan. Part 1: Proof of the Algonquian-Wakashan Relationship.
Nikolaev, S. (2016). Toward the Reconstruction of Proto-Algonquian-Wakashan. Part 2: Algonquian-Wakashan Sound Correspondences.
Ostapirat, Weera (2005). “Kra-Dai and Austronesian: Notes on Phonological Correspondences and Vocabulary Distribution,”  in Laurent Sagart, Roger Blench and Alicia Sanchez-Mazas, eds. The Peopling of East Asia: Putting Together Archaeology, Linguistics, and Genetics, pp. 107-131. London: Routledge Curzon.
Ostapirat, Weera (2013). Austro-Tai Revisited. Paper Presented at the 23rd Annual Meeting of the Southeast Asian Linguistics Society, 29-31 May 2013, Chulalongkorn University.
Reid, Lawrence A. (2006). “Austro-Tai Hypotheses.” In Keith Brown (Ed.), The Encyclopedia of Language and Linguistics, 2nd Edition, pp. 609–610.
Sagart, Laurent (2005b). “Tai-Kadai as a Subgroup of Austronesian,” in L. Sagart, R. Blench, and A. Sanchez-Mazas (Eds.), The Peopling of East Asia: Putting Together Archaeology, Linguistics, and Genetics, pp. 177-181.
Sagart, Laurent (2019). “A Model of the Origin of Kra-Dai Tones.” Cahiers de Linguistique Asie Orientale. 48, 1: 1–29.
Thurgood, Graham (1994). “Tai-Kadai and Austronesian: The Nature of the Relationship.” Oceanic Linguistics 33: 345-368.

Is Afroasiatic Related to Indo-European?

Claudius: Very interesting. To me Afro-Asiatic seems very close to IE. But I don’t know anything about the other Eurasiatic or Nostratic families besides Uralic and Altaic (Japanese).

But IE is like AA with corrupted and limited ablaut. PIE verbs did have ablaut, just not to the extreme of AA languages. Even in PIE/IE some nouns exhibit ablaut.

Part of the problem is that AA is so old. Nostratic itself is 15-18,000 years old, and AA is 13-15,000 years old itself. The numerals are still a mess. They’re probably not even reconstructible. Numerals get replaced more than people think. This silly numerals argument is also used to invalidate Altaic. But in Altaic most of the original numerals were replaced. However, some of the originals held on in lesser semantic roles. So they were still there, just harder to see as the main numeral forms got replaced by innovations.

AA is the most ancient language family that is universally accepted. Some say that Omotic is not proven to be part of it, but those are wild splitters like Lyle Campbell who reflexively object to everything in a reactionary manner. This reaction has absurdly taken over the whole field now. We can’t even agree that Altaic is real. For God’s sake, there’s a 1,300 page etymological dictionary of Altaic out there, and people still insist it’s not real!

It’s not particularly close to IE.

Core Nostratic is Uralic, IE and Altaic.

Altaic (Turkic, Mongolic, and Tungusic, including Japanese and Korean), Uralic (including Yukaghir), Eskimo-Aleut and Chukchi-Kamchatkan are possibly core Nostratic. Some include Etruscan.

Whether Afroasiatic is core Nostratic is controversial. Aharon Dolgopolsky thought it was. Allan Bomhard followed Dolgopolsky.

Later Nostratic concepts have placed Afroasiatic and Elamite parallel to Nostratic (Sergei Starostin). Others put AA, Kartvelian, Elamo-Dravidian as sub-branches within Nostratic (Bomhard). Starostin’s followers, including his son George, have placed AA back in core Nostratic.

Joseph Greenberg posited a subgroup of Nostratic called Eurasiatic. He did not include Dravidian and AA. Greenberg felt that AA and Dravidian were sisters to Nostratic as a whole. Bomhard put Eurasiatic as a sub-family of Nostratic, alongside AA and Dravidian as two other sub-branches.

But there are definitely parallels between AA and IE all right. That’s clear.

Proto-Nostratic root *γor-:

(vb.) *γor- ‘to leave, to go away, to depart; to separate; to abandon’;
(n.) *γor-a ‘leaving, departure; separation; abandonment’
Extended form:
(vb.) *γor-V-b- ‘to leave, to go away, to depart; to separate; to abandon’;
(n.) *γor-b-a ‘leaving, departure; separation; abandonment’

Afrasian: Proto-Semitic *γar-ab- ‘to leave, to go away, to depart’ > Arabic ġaraba ‘to go away, to depart, to absent (oneself), to withdraw (from), to leave (someone, something); to go to a foreign country; to expel from the homeland, to banish, to exile’, ġarba-t ‘removal, departure’, ġurba-t ‘absence from one’s homeland; separation from one’s native country, banishment, exile; life, or place, away from home’; Mehri əġtərōb ‘to be abroad, away from home’, ġərbēt ‘strange place, unknown place’; Śḥeri/Jibbāli aġtéréb ‘to be abroad, away from home’, ġarbέt ‘strange, unknown place; abroad’. Perhaps also Punic «rbt ‘desolation’ (?) in ḳl «rbt ‘the voice of desolation’ (interpretation highly uncertain) (cf. Hoftijzer-Jongeling 1995:887).

Proto-Indo-European *H₃orbʰ- ‘to be or become separated, abandoned, bereft’, *H₃orbʰ-o-s ‘(n.) orphan, servant; (adj.) bereft, abandoned, deprived (of)’:

Sanskrit árbha-ḥ ‘little, small; child’; Armenian orb ‘orphan’; Greek ὀρφανός ‘orphan, without parents, fatherless; (metaph.) bereft, abandoned’; Latin orbus ‘bereft, deprived by death of a relative or other dear one; bereaved (of); childless; an orphan’; Old Irish orb ‘heir’, orb(b)e, orpe ‘inheritance’; Gothic arbi ‘inheritance,’ arbja ‘heir’ (f. arbjō ‘heiress’); Old Icelandic arfi ‘heir, heiress’, arfr ‘inheritance, patrimony’, erfa ‘to inherit’, erfð ‘inheritance’; Old Swedish arve, arver ‘heir’; Danish arv ‘heir’; Norwegian arv ‘heir’; Old English ierfa, irfa ‘heir’, ierfe ‘inheritance, bequest, property’, erfe, irfe, yrfe ‘inheritance, (inherited) property’, irfan, yrfan ‘to inherit’; Old Frisian erva ‘heir’, erve ‘inheritance, inherited land, landed property’; Old Saxon erƀi ‘inheritance’; Middle Dutch erve ‘heir’; Old High German arbi, erbi ‘inheritance’, arbeo, erbo ‘heir’ (New High German Erbe ‘inheritance; heir’); Old Church Slavic rabъ ‘servant, slave’; Russian rab [раб] ‘slave, serf, bondsman’ (f. rabá [раба] ‘slave, serf, bondmaid’); Hittite (3rd sg. pres. act.) ḫar-ap-zi ‘to separate oneself and (re)associate oneself elsewhere’.
Pokorny 1959:781-782 *orbho- ‘weak, abandoned; slave, orphan’; Walde 1927-1932:183-184 *orbho-; Mallory-Adams 1997:411 *h₂/h₃orbhos ‘orphan, heir’; Mann 1984-1987:884 *orbhəkos ‘young, tender; deprived, blind’, 884 *orbhənikos ‘young, minor, underage’, 884-885 *orbhət-, *orbhit- ‘deprived, bereft; deprivation, bereavement’, 885 *orbhi̯os adjectival form of *orbhos, 885 *orbhm̥ mos (*orbhmos) ‘bereft, deprived’, 885—886 *orbhos, -i̯os, -i̯ə ‘deprived, bereft; child, orphan’; Watkins 1985:46 *orbh- ‘to put asunder, to separate’ (suffixed form *orbh-o- ‘bereft of father’) and 2000:60 *orbh- ‘to change allegiance, to pass from one status to another’ (oldest form *ə̯₃erbh-, colored to *ə̯₃orbh-) (suffixed form *orbh-o- ‘bereft of father’ also ‘deprived of free status’); Gamkrelidze-Ivanov 1995I:399, I:651 *orbʰo- ‘deprived of one’s share, deprived of possessions; orphan; servant, slave’, I:781 *orbʰo-; Mayrhofer 1956—1980.I:52 and 1986—2001.I:119—120; Boisacq 1950:719 *orbho-s; Beekes 2010:1113—1114 *h₃orbʰ-o-; Frisk 1970-1973:431 *orbho-s; Chantraine 1968-1980:829 *orbho-; Hofmann 1966:240 *orbhos; Hübschmann 1897:482, no. 335, *orbhos; Matirosyan 2008:535-536 *Horbʰ-o-; Walde-Hofmann 1965-1972:219-220 *orbhos, *orbhi̯o-; Ernout-Meillet 1979:466—467; De Vaan 2008:433 *h₃orbʰ-o-; Derksen 2008:373 *h₃erbʰ-; Kroonen 2013:33 Proto-Germanic *arbja- ‘inheritance’ (<*h₃orbʰ-i̯o-), 33 Proto-Germanic *arbjan – ‘heir’ (< *h₃orbʰ-i̯on-); Orël 2003:22 Proto-Germanic *arƀaz, 22 Proto-Germanic *arƀjaz; Lehmann 1986:41-42 *orbho-;  Feist 1939:56 *orbhi̯o-; Falk-Torp 1910-1911.I:34; De Vries 1977:12 and 13; Boutkan-Siebinga 2005:93 *h₃erbʰ-; Walshe 1951:48; Kluge-Mitzka 1967:170 *orbho-; Kluge-Seebold 1989:183-184 *orbhijo-, *orbho-; Kloekhorst 2008b:311-312 *h₃erbʰ-to; Puhvel 1984:176—183.

Proto-Nostratic (n.) *t’orʸ-a ‘tree, the parts of a tree’ (> ‘leaf, branch, bark, etc.’):

Proto-Afrasian *t’[o]r- ‘tree’, preserved in various tree names or names of parts of trees (‘leaves, branches, etc.’): Semitic: Akkadian ṭarpa”u (ṭarpi”u) ‘a variety of tamarisk’; Arabic ṭarfā” ‘tamarisk tree’. Hebrew ṭārāφ [ טָרָף ] ‘leaf’ (a hapax legomenon in the Bible); Aramaic ṭarpā, ṭǝraφ ‘leaf’; Syriac ṭerpā ‘leaf, branch’; Samaritan Aramaic ṭrp ‘leaf, part of a tree, branch’. Klein 1987:252 Egyptian d&b ‘fig tree’ (< *drb); West Chadic: Hausa ɗoorawaa ‘locust-bean tree’; East Chadic: Bidiya tirip ‘a kind of tree’ (assimilation of vowels). Orël—Stolbova 1995:516, no. 2464, *ṭarip- ‘tree’.

Proto-Indo-European *t’er-w/u-/*t’or-w/u-, *t’r-ew-/*t’r-ow-/*t’r-u- ‘tree, wood’: Greek δόρυ ‘tree, beam’, δρῦς ‘oak’; Hittite ta-ru ‘wood’; Albanian dru ‘tree, bark, wood’; Sanskrit dā́ru ‘a piece of wood, wood, timber’, drú-ḥ ‘wood or any wooden implement’; Avestan drvaēna- ‘wooden’, dāuru- ‘wood (en object), log’; Welsh derwen ‘oak’; Gothic triu ‘tree, wood’; Old Icelandic tré ‘tree’, tjara ‘tar’; Old English trēow ‘tree, wood’, tierwe, teoru ‘tar, resin’; Old Frisian trē ‘tree’; Old Saxon triu, treo ‘tree, beam’; New High German Teer ‘tar’; Lithuanian dervà ‘resinous wood’, dãrva ‘tar’; Old Church Slavic drěvo‘tree’; Russian dérevo [дерево] ‘tree, wood’; Serbo-Croatian drȉjevo ‘tree, wood’; Czech dřevo ‘tree, wood’. Pokorny 1959:214—217 *deru-, *dō̆ru-, *dr(e)u-, *dreu̯ǝ-, *drū- ‘tree’; Walde 1927-1932:804-806 *dereu̯(o)-; Mann 1984-1987:142 *deru̯os, -ā, -i̯ǝ (*dreu̯-) ‘tree, wood, timber, pitchpine; pitch, tar, resin; hard, firm, solid, wooden’, 156 *dō̆ru ‘timber, pole, spike, spear’, 157 *doru̯os, -ā, -i̯ǝ ‘wood (timber); resin’, 161 *dru- (radical) ‘timber, wood’, 161 *drūi̯ō (*druu̯ō, *-i̯ō; *drūn-) ‘to harden, to strengthen’, 161 *drukos ‘hard, firm, wooden’, 162 *drus-, *drusos ‘firm, solid’, 162 *druu̯os, -om, -is ‘wooden, hard; wood’, 162 *drū̆tos ‘wooden, of oak, of hardwood; solid, firm, strong’, 165 *dr̥u̯is, -i̯ǝ ‘wood, trees, hardwood’, 165—166 *dr̥u̯os, -om; *drus-, *dru- ‘wood, timber, tree’; Gamkrelidze-Ivanov 1995:192 and 193 *t’er-w-, *t’or-w-, *t’r-eu-, *t’r-u- ‘oak (wood), tree’; Mallory-Adams 1997:598 *dóru ‘wood, tree’; Watkins 1985:12 *deru (also *dreu-) and 2000:16-17 *deru (also *dreu-) ‘to be firm, solid, steadfast’ (suffixed variant form *drew-o-; variant form *drou-; suffixed zero-grade form *dru-mo-; variant form *derw-; suffixed variant form *drū-ro-; lengthened zero-grade form *drū-; o-grade form *doru-; reduplicated form *der- drew-); Mayrhofer 1956-1980.II:36; Chantraine 1968-1980:294 *dor-w-, *dr-ew-; Frisk 
1970-1973:411-412; Hofmann 1966:63 *dō̆ru; Beekes 2010.I:349 *doru; Boisacq 1950:197-198 *doru; Orël 1998:76 and 2003:405 Proto-Germanic *terwōn ~ *terwan, 409-410 *trewan; Kroonen 2013:514 Proto-Germanic *terwa/ōn- ‘tar’ and 522-523 Proto-Germanic *trewa- ‘tree’; Lehmann 1986:347-348 *deru-, *drewo-, *dr(e)w-(H-); Feist 1939:480-481 *der-eu̯-o-; De Vries 1977:591 *dreu-; Klein 1971:745 *derew(o)-, *drew(o)- and 779 *derow(o)-, *drew(o)-; Onions 1966:904 and 939 *deru-,*doru-; Kluge-Mitzka 1967:775 *deru-; Kluge-Seebold 1989:725 *deru-; Huld 1984:56 *dru-n-; Fraenkel 1962-1965:90-91; Derksen 2008:99 *deru-o- and 2015:123-124 *deru-o-; Smoczyński 2007:103; Osthoff 1901:98-180; Benveniste 1969:104-111 and 1973:85-91; P. Friedrich 1970:140-149 *dorw- ‘tree’ or ‘oak’.

Repost: Genes and Language Match Well


This post will look into whether or not genes and language line up well. The question may seem academic, but it is important for linguists in the battle over whether or not there is anything to the large macro-families that the “lumpers” are creating.

It’s yet another skirmish in the lumpers versus splitters battle in Historical Linguistics. Historical Linguistics is the branch that deals with language families, language relationships, and reconstruction of old languages that are no longer spoken.

The debate has heated up in recent years due to the prominence of lumper theories publicized by the late Joseph Greenberg and his disciples, notably Merritt Ruhlen at Stanford University. Ruhlen and Greenberg use a technique called mass comparison which has come under a lot of wild and irrational abuse but seems to be a valid scientific method in the hands of an expert.

Greenberg used it to come up with the four major language families of Africa a long time ago, and his classification there has remained pretty solid ever since.

He later published a book called Language in the Americas, which broke down all Amerindian languages into three large families – Amerind, Na-Dene and Eskimo-Aleut. I have read that book many times, and I concur with its analysis. Unfortunately, a detailed examination of the evidence goes beyond the scope of this post.

Na-Dene and Eskimo-Aleut are not very controversial, though the position of Haida within Na-Dene is regarded as unproven. However, looking at evidence mustered by Alexander Manaster-Ramer, I believe that Haida is definitely Na-Dene, though possibly a sister to the entire group as it is so distant.

In the same way, the ancient Anatolian languages are now regarded as a sister to the rest of Indo-European rather than as an ordinary branch of it, in a family called Indo-Hittite or Indo-Anatolian. My Indo-Europeanist sources told me that Indo-Hittite or Indo-Anatolian is now regarded as consensus in the field.

Bengtson promotes a family called Dene-Caucasian that involves the North Caucasian languages of the Caucasus, Basque, Na-Dene, Sino-Tibetan, Burushaski in northern Pakistan and the Ket family in Siberia. I can’t speak for the whole family, but the evidence is definitely interesting. I think that Bengtson has proven a case for Ket, Basque, and the Caucasian languages being related, as I read a book on that subject.

Recently, Edward Vajda conclusively proved that the Ket language is related to the Na-Dene languages.

A Ket man in Siberia. His phenotype looks a bit Japanese. He doesn’t look like an Amerindian. The situation of the Ket is deplorable, as most live in serious poverty and do not see any hope for improving themselves. The Ket language is also in bad shape, as hardly anyone under 35 can speak it well, and 30% of the population regard speaking Ket as useless.
The USSR did a better job with minority tongues than Putin.
There is good evidence of a link between the Ket and the Amerindians. The Selkup are a Samoyedic people who live near the Ket. There is also good evidence linking the peoples of the Altai with Amerindians. This doesn’t make a lot of sense, as the Selkup and Ket now live a long way from the Altai region, but the Ket and Selkup are thought to have lived in the Altai long ago and come north later on.
 
The theory linking the Ket and the nearby Selkup to the Amerindians supports a single migration to the Americas 16,000 years ago, but it’s not at all definitive. According to one paper linking the Ket with Amerindians, Proto-Caucasians are thought to have evolved in Central Asia. I would place them nearer the Caucasus.
 

I believe that the latest evidence is showing that all of the various Altai peoples (the Altai, the Tofalar, the Khakass and the Shor) are related to the Amerindians. These groups are often referred to as Northern Turkics. They aren’t really Turks per se as in people from Turkey, but even the Turks from Turkey are thought to be partly related to these Northern Turkic tribes.

Northern Turkics are right on the border between Asians and Caucasians on gene charts, and some Amerinds are not so far genetically from that border either. If you look at the Cavalli-Sforza gene chart below, you can see that after the Eskimo-Aleuts, the Chukchi and the Northern Turkics are the people most closely related to the Amerindians.

It also looks like the Ket and Selkup came from what is now the Northern Turkic Altai region. Anthropologically, these various groups are either Uralics, South Siberian, Central Asian or North Asian Asiatics. The Altai region is where Russia, China and Mongolia all come together.

This is the first connection of a New World language family with an Old World language family.

Here is a Nenets woman from Siberia. She definitely looks Northern Chinese or Korean. They have a population of 44,000, and there are 31,000 speakers of the language. It’s really two languages – Forest Nenets and Tundra Nenets – but both are said to be endangered. I think at least Tundra Nenets will be around for a while though, as most kids are still learning it. The Nenets are Samoyedics like the Selkup, discussed above. The Selkup are related to the Amerindians.

It’s interesting that the Ket have also been linked genetically with the New World.

Here is a rare photo of Ed Vajda with two Ket women in Siberia described as “experts in the Ket language.” I’m not good at judging ages, but these women look to be about 40-60. If so, that is good, as I thought all of the speakers were elderly, and hardly anyone spoke the language well anymore. Ket has anywhere from 537 to 1,000 speakers. A related language, Yugh, is thought to have recently gone extinct. The rest of the Yeniseian languages went extinct about 150-250 years ago.

Greenberg and Ruhlen are the most vilified of the lumpers, but there are others who are following more orthodox methods of reconstruction to prove the existence of ancient language families, such as the late Sergey Starostin, his son George Starostin, John Bengtson, the late Vladislav Markovich Illich-Svitych (a prodigy, dead at the young age of only 32), Aharon Dolgopolsky and Vitaly Victorovich Shevoroshkin.

The Starostins, Illich-Svitych, Dolgopolsky, and Shevoroshkin all worked on Nostratic, a vast family consisting variously of Indo-European, Uralic, Altaic, Kartvelian, Nivkh, Chukotko-Kamchatkan, Afro-Asiatic, Dravidian, and Eskimo-Aleut. I now think that Afroasiatic and Dravidian are sisters to Nostratic instead of part of the family per se because they are so far removed from the rest of the family.

I would accept IE, Uralic, Altaic, Chukotko-Kamchatkan and Eskimo-Aleut in Nostratic. The Altaic family is itself controversial, but I regard it as fact, having studied it. Altaic also includes Japanese and Korean. I would toss Yukaghir in with Uralic.

Nostratic has a lot more going for it than some of the other long-range proposals, and since these scholars are using classic reconstruction, it gets respect from splitters. Starostin’s webpage is a great resource for looking into long-range theories, especially Nostratic and Altaic.

Bengtson, Shevoroshkin, and the Starostins all worked on Dene-Caucasian. This hypothesis seems a lot more controversial.

Here is a tree of Luigi Cavalli-Sforza’s human genetic families on the left and various human language families on the right, including some big families. The only one that is seriously out of place is Tibetan. This is because the Tibetans are a genetically North Chinese people who have moved down into Southern China in recent years. They cluster with South Chinese linguistically but NE Asians genetically.
All the rest lines up pretty well, including super-families like Nostratic and Eurasiatic (a Nostratic-like family created by Greenberg).
The hypothesized Austric family is interesting. I’m not sure if I buy this super-family or not, but I have not really looked into it.
With recent genetic evidence linking Indonesians and Vietnamese to Daic peoples of South China and SE Asia, it seems worth looking into. At the very least Austro-Thai, a language family consisting of the Austronesian and Tai-Kadai families, seems to have been proven in the last 10 years with the publication of a couple of important articles. Laurent Sagart is doing good work in this area.

References

Campbell, Lyle & Mithun, Marianne (Eds.) 1979. The Languages of Native America: An Historical and Comparative Assessment. Austin: University of Texas Press.

Campbell, Lyle. 1988. “Review of Language in the Americas, by Joseph Greenberg.” Language 64: 591-615.

Campbell, Lyle. 1997. American Indian Languages: The Historical Linguistics of Native America. New York: Oxford University Press.

Greenberg, Joseph. 1987. Language in the Americas. Stanford: Stanford University Press.

Greenberg, Joseph. 1989. “Classification of American Indian languages: a reply to Campbell.” Language 65:1, 107-114.

An Interesting Mostly Southern Chinese Phenotype

This is a good friend of mine who resides in Singapore. He is very interested in his background and gave me his photo to analyze.

Looking at it, I believe he is definitely Southern Chinese for the most part. His father is Hainanese and has a rather distinctive phenotype that looks something like his son’s. His mother is a certain type of Malay that dates back to the 1400s and is significantly mixed with European blood, mostly British and Dutch, as Europeans have had a presence in the area dating back centuries. I believe that they are called Peranakans. He also has some female relatives that look very Malay. I do not know who the older man to the right is, but he looks quite Malay to me.

I think my friend ended up looking more Chinese than Malay. The Hainanese are definitely a Chinese type people. Whether they also have a Vietic type SE Asian component is not known as I do not know the history of Hainan.

Although my friend definitely has a strong Southern Chinese look, he also has another component that makes him look, well, different. I’m not going to attempt to describe this element, but it does make him look somewhat “odd,” “interesting,” or “unusual,” from a Southern Chinese POV. A typical Southern Chinese would say that he looks like a Southern Chinese, but he’s not like us. A Southern Chinese has more of a Modern Mongoloid look. My friend is mostly modern Mongoloid, with some elements of transitional Mongoloid or archaic Mongoloid – this is what the Malays are after all – added in.

The evolution from Negritos to moderns occurred much later in Malaysia, much taking place in only the last 5,000 years. The Senoi are an example of an archaic group that is definitely Australoid yet nevertheless more progressive than the Negritos. These are the “dream people” of psychological and anthropological literature, though modern research has shown that they do not incorporate dreams as much into their waking lives as we previously thought and that the extent to which they do this was much exaggerated.

There are also Negritos (or original Asians) in Malaysia. In fact, there is a group in Malaysia that has genes that date back to 72,000 YBP. This is actually before the main Out of Africa event, yet it has now been shown that other small groups went out of Africa before then.

Most of these groups were devastated by the vast Toba volcanic explosion in Sumatra 72,000 YBP that exterminated almost all humans in South and Southeast Asia. It is thought that only 1,500 of this group survived the explosion. This means that humans went through a severe genetic bottleneck no doubt accompanied by massive selection pressure and huge genetic effects. Whether this explosion’s effects extended to Central Asia (probably), the Middle East (maybe), or East Africa (unknown) is not known. At any rate, this original group departed from East Africa near Somalia and Djibouti.

The main OOA group left from here too. No one quite knows what these people looked like, but they may have appeared somewhat Khoisan. The Khoisan are the most ancient group in Africa with genes dating back 52,000 YBP. Further, their click language to me seems like a good candidate for the original human language. It does seem to be quite primitive. Before that, we clearly used sign language. Neandertals could not speak due to their hyoid bones. The great apes also have this problem. So when Neandertals vocalized, they may have sounded like great apes.

The Sasquatch, which I believe is an archaic hominid related to Heidelbergensis which somehow survived, has a very odd speech pattern (it speaks on the inhale, bizarrely enough – try it sometime), and a friend of mine who shot and killed two of them told me that the juveniles were using extensive sign language. They ran half the time on all fours and half the time on two legs, which is very odd. Sasquatches can run up to 30 mph on all fours. That must be quite frightening to watch but it can be seen in the Port Edward Island Sasquatch footage. Anyway, enough about Bigfoot for today!

It’s not known how far modern human language dates back. Sergei Starostin feels it cannot date back more than 50,000 years because so many cognates remain that we can actually reconstruct a bit of Proto-World. One Proto-World term is “tik” meaning one, to point, index finger, etc. From this comes our word to teach. Imagine a teacher pointing at a blackboard with his index finger. I worked on an Indian language a while back and they had a very archaic word found only in the earliest vocabularies – tik, meaning “the point of a spearhead.” I cannot prove it but I believe deep down inside that this is from the same root.

It’s more of a gut feeling or intuitive thing, and intuitions are often wrong because they overgeneralize, throw out logic altogether, and rely exclusively on notoriously unreliable and subjective (the very word subjective implies emotional response) feelings, especially deep or gut feelings that can be described as “Gestalt.” I’m a birdwatcher and we use something called Gestalt to identify fleeting glimpses of a bird.

All we can see is what philosophers like Heidegger might call “the essence” or essential nature of the bird rather than its surface characteristics, which are too fleeting to identify. Heidegger discusses surface versus essence interpretations of objects a lot. It seems hard to figure out but it’s easier than you think.

Logic relies on surface or appearance, including the human definition we have given to the object.

Intuition on the other hand pretty much throws out the surface stuff and looks for the “essence of the thing” or the “deep meaning” or “true meaning” of the object. We are getting into Plato here with the concept of “pure objects” that actually do not exist in reality.

An example of Platonic pure objects would be what I call the Masculine and Feminine spirit (see the brilliant and wrongly derided Otto Weininger’s “Sex and Character” for more). Weininger comes from Nietzsche in my opinion and leads to Heidegger, also in my opinion. He seems to be a sort of bridge between the two. Note that all were Germans, though Weininger was an Austrian, but oh well.

The Masculine Spirit and the Feminine Spirit is one way of dividing the universe or world in a binary manner. Not that there are not other binary methods of chopping the world into opposite halves, but this is just one of them.

I would argue that the world is half Masculine principle and half Feminine principle and that neither is better than the other. The marriage of the two opposites creates a whole that is bigger than the sum of its parts, hence the human pair bond, where each member of the male-female couple fills in the missing blanks or parts of the other one, each creating a whole person in the other where only a “half person” had existed before.

We are also getting into Taoism here, but the ancient Chinese were awful damn smart, so you ignore them at your peril in my opinion. Furthermore, the Taoist maxim of how to live your life – “moderation in all things” is an excellent aphorism, not that many of us ever do it. It’s clearly the route to a long lifespan.

To do the opposite is to burn candles at both ends, live fast, die young, and leave a pretty corpse, which sounds very romantic and appealing when young (it did to me) but which sounds increasingly idiotic and even suicidal for no good reason with each advancing year past 30. I now find it laughable, pathetic, and openly suicidal and delight in mocking the concept. But I survived another 30 years past the expiration date on that concept, so perhaps my new attitude is simply the inevitable product of living out that maxim twice and hence nullifying it.

There are a number of Southern Chinese groups with more of an indigenous look, sometimes prognathous. These date back to the original indigenous elements in Southern China and SE Asia, who all date back to the Negritos. The Montagnards of Vietnam are definitely one of these indigenous types. The indigenous went from

Indigenous (Negrito) -> Proto SE Asian (with Melanesian component) -> modern SE Asian (Modern Mongoloid with archaic components). This effect is quite pronounced in the Vietnamese, who were completely overrun by a Chinese invasion 2,300 years ago after which there was much interbreeding and a huge infusion of Cantonese words, which now make up 70% of Vietnamese vocabulary.

However, the core vocabulary of Vietnamese remains Austroasiatic (a language family nevertheless with Southern Chinese roots derived from the archaic Mongoloid peoples of the region 5-7,000 YBP, who later moved into SE Asia). This core vocabulary is shared by the Munda branch of Austroasiatic, completely isolated in India, particularly Eastern (Mongoloid) India. The fact that Vietic shares a common core vocabulary with the geographically separated Munda proves the existence of Austroasiatic.

In fact, it is the final convincing argument. Anyone who says that Austroasiatic does not exist is a fool.

Further, the evidence for Austroasiatic, a proven family, is no greater than the evidence for Altaic, and in fact Altaic may be better proven. The “numerals” argument against Altaic is belied by the 13,000 year old Afroasiatic family, the numerals of which are a complete disaster.

Numerals are more often innovated and replaced than people think. Often the old cognates survive in archaic words or words used for related concepts, but it’s not unusual at all for the main term to be an out and out innovation. Most Altaic numerals are innovated, but there are a few cognates. Further, most of the numerals have cognates in related or archaic words.

This is the most archaic layer of Austroasiatic. Some of these peoples are archaic Mongoloids with a strong Australoid component. A branch of these Australoids called Carpenterians went from India to Australia 11,000 YBP and became part of the Aborigines. Another group of archaic Australoids were called Murrayans. They came from Thailand 17,000 YBP and went to Australia. It is not known what Australians looked like before that but no doubt they were quite primitive. It’s long been thought that they have more Erectus component than the rest of us, but I’m not sure that is proven. Certainly their appearance resembles that.

The Murrayans are the core element of the Ainu, who went to the Philippines 16,000 YBP in an unusual, Caucasian appearing type, and then moved through the southern Japanese islands north into Japan 13,000 YBP, quite possibly replacing an ancient Negrito type already there. This Negrito type definitely existed in Southern China and may well have existed in Korea. Some Australoids or especially Australoid-Mongoloid mixes can have a superficial “Caucasian” appearance, but that’s just parallel development, coincidence or more probably the fact that the number of possible human phenotypes is limited.

It is this coincidentally “Caucasoid” appearance that led many observers to believe that the Ainu were somehow ancient Caucasians (Norwegians, one anthropologist joked) that got stranded from the rest of the Europoid flock way over on the other side of Asia. In fact, the Ainu are Australoid by skull and Mongoloid by genes. Their language, like the Japanese language, has an ancient Austronesian layer that has led many to falsely conclude that the Altaic Japanese language is actually an Austronesian one. The argument is even better with Ainu, the deeper affiliation of which has not been shown to my satisfaction.

English as a Genocidal Language Attacking Other Tongues Spoken in the Anglosphere – USA

English has had a genocidal effect on the other languages spoken here, but many non-English languages still survive and some are thriving.

Pennsylvania Dutch is still quite alive with 300,000 native speakers. I think it is just a dialect of Rhenish German. It’s actually two separate languages and they can’t understand each other.

There are many other languages in the US that have been taken out by English. Most of the Indian languages spoken here have been driven extinct or moribund by English. A few like Cherokee, Sioux, Navajo, Mohawk, Pueblo, some Alaskan languages, a couple of Indian languages of the US South, are still doing well.

Most of the others are in bad to very bad shape, often moribund with only 10 or fewer speakers, often elderly. Many others are extinct. However, quite a few of these languages have had a small number of middle aged to elderly speakers for the last 25 years, so the situation is somewhat stable at least at the moment.

Almost no Indian languages are being learned by children. But there are still children being raised speaking Cherokee, Navajo, Pueblo, Mohawk, and some Alaskan and Southern US Indian languages. Navajo is so difficult that when Navajo children show up at school, they still have problems with Navajo. They often don’t get the language in full until they are twelve.

However, there are revitalization efforts going on with many to most Indian languages, with varying amounts of success. Some are developing quite competent speakers, often young people who learn the language starting at 18-20. I know that Wikchamni Yokuts has a new native speaker, a 23 year old man who learned from an elder who is a native speaker. In California, there is a master-apprentice program going on along these lines.

There are a number of preschool programs where elders try to teach the languages to young children. I am not sure how well they are working. There are problems with funding, orthographies and mostly apathy that are getting in the way of a lot of these programs.

There are many semi-speakers. For instance in the tribe I worked with, many of the Indians knew at least a few words, and some of the leadership knew quite a few words. But they could hardly make a sentence.

Eskimo-Aleut languages are still widely spoken in Alaska. I know that Inuktitut is still spoken, and there are children being raised in the language. Aleut is in poor shape.

Hawaiian was almost driven extinct but it was revived with a revitalization program. I understand that the language still has problems. I believe that there are Hawaiian medium schools that you can send your child to. There may be only ~10,000 fluent speakers but there are many more second language speakers with varying fluency.

There are actually some European based languages and creoles spoken in the US. A noncontroversial one is Gullah, spoken on the islands of South Carolina. There may be fewer than 5,000 speakers, but the situation has been stable for 30-35 years. Speakers are all Black. It is an English creole and it is not intelligible with English at all.

There is at least one form of French creole spoken in Louisiana. There is also an archaic form of French Proper called Continental French that resembles French from 1800. It has 2,000 speakers. Louisiana French Creole still has ~50,000 speakers. People worry about it but it has been stable for a long time. Many of the speakers are Black.

Texas German is really just a dialect of German spoken in Texas. There are only a few elderly speakers left.

There are a few Croatian languages spoken in the US that have diverged so dramatically from the languages back home that they are now different languages. The status of these languages varies. Some are in good shape and others are almost dead. One of these is called Strawberry Hill Gorski Kotar Kaikavian, spoken in Missouri. It is absolutely a full separate language and is no longer intelligible with the Gorski Kotar Kaikavian spoken back home.

There are other European languages spoken in the US, but they are not separate from those back home. Most are going out.

There are many Mandarin and especially Cantonese speakers in the US.

There are many Korean speakers in the US, especially in California.

There are a fair number of Japanese speakers in the US, mostly in California.

There are many speakers of Khmer, Lao, Hmong, and Vietnamese in the US. Most are in California but there are Hmong speakers in Minnesota also.

There are quite a few speakers of Arabic languages in the US. Yemeni, Syrian, and Palestinian Arabic are widely spoken. There are many in New York City, Michigan and California.

There are also some Assyrian speakers in the US and there are still children being raised in Assyrian. Most are in California.

There are quite a few Punjabi and Gujarati speakers in the US now. We have many Punjabi speakers in my city.

There are quite a few Urdu speakers here. Most of these speakers are in California.

Obviously there are many Spanish speakers in the US. English is definitely not taking out Spanish. They are mostly in the Southwest, Florida, and New York City, but they are spreading out all across the country now.

There are a few Portuguese speakers in the US. All also speak English. They are mostly in California but some are back east around Massachusetts.

The Sicilian Italian spoken in the US by Italian immigrants is still spoken fairly widely to this day. It has diverged so much from the Sicilian back home that when they go back to Sicily, they are not understood. This is mostly spoken in large cities back east.

There are quite a few Armenian speakers in the US and children are still being raised in Armenian. Most are in California.

There are some Persian speakers in the US, but not a lot. Most of these are in California too.

All of these languages are the same languages as spoken back home.

External Relations of Japanese and Apache

Jason Voorhees: YEE – There is some similarity between the language of an Apache and that of the Japanese for example.
Yee: That seems far fetched. My ancestors moved from Central China, but I can’t understand any of their dialect now. Language is easy to lose

Actually this is not correct. Apache does have external relations in the new Yenisien-Na Dene family (already under fierce attack by splitters), and in a larger sense to Chinese but not Japanese. But there is no similarity whatsoever between Japanese and Apache, other than that probably all human languages are related at some distant level. There is no clear or obvious relationship between Japanese (really Japonic) and any other language. Japanese is not one language. It is a group of languages called Japonic. Most of the Japonic languages are spoken the Ryukyu Islands (Okinawa), where there are 5-6 separate languages spoken. These languages still have many speakers, but they are in very bad shape as the Japanese have been waging war on them for some time now. Most of the speakers are middle aged or older and transmission to the young is at a low level.
However, it is clear to me that Japanese does have external relations. The most obvious external relation would be with Korean. Even some of the hardest-core anti-Altaicists agree that there is a good chance that Korean and Japanese are related. Looking at the larger picture, Japanese and Korean are both related to Turkic, Tungusic and Mongolic in a superfamily called Altaic. Mainstream linguistics has refused to accept Altaic although the evidence for its existence is striking.
The evidence for the existence of Altaic is just as good as the evidence for Austroasiatic,l and that is a universally accepted family. Worse, people who believe in Altaic are attacked and ridiculed mercilessly to the point where if you believe in it,  you might actually have a hard time getting a professorship.
Of course, Altaicists are accused of being anti-scientific because “science” has not yet shown that there is any relationship. Adults who think like this are children. Science doesn’t know everything and science is flat out wrong about countless things. That is because many theories are simply true that are presently rejected by science due to so-called lack of evidence.
Having to go ask Mommy Science whether everything you encounter in the world is true or not is what a child does. A child is always running up to Mommy asking if it is true that so and so etc etc. Mommy says yes or no and the kid is satisfied. There are adults who are still tied to their mothers’ apron strings who never learned to differentiate themselves as mature individuals. Hence they have to run to Mommy Science and ask whether something is true or not instead of sitting down and looking at the evidence and deciding for themselves.
Not all things that are true have been accepted by science. If you are going to learn anything in life, it should be that right there. Time to cut the apron strings, babies.

The Roots of the Alphabet(s)

Probably most of you do not know that we are all using a variant of the ancient Phoenician alphabet. Actually I am not sure if that is precisely true, as I think the Phoenician alphabet was preceded by an Assyrian one. But at any rate, our classic Western alphabets all came out of the Levant and Mesopotamia in some way or other. Indeed, it is even theorized that many of the syllabaries in use in Central, South and Southeast Asia are also rooted in this original alphabet from the Levant.

Of course, Chinese and consequently Korean and Japanese alphabets have another origin.

One might wish to throw the odd SE Asian orthographies such as Thai, Lao, Burmese, Vietnamese, Javanese, Sundanese and Khmer there, but my understanding is that all of those SE Asian orthographies were actually derived from syllabaries originally designed in India.

A few writing systems such as Georgian, Armenian and Cree may have been created de novo, but I might have to look that up. The only non-Middle Eastern derived orthography that immediately comes to my mind is the Chinese ideographs.

The origins of the Assyrian/Phoenician alphabet appear to have been ultimately in Egyptian hieroglyphics. So the ancient Egyptians really started it all when it comes to writing down words, at least for the West.

Chinese ideographs may date from even earlier. Chinese bone writing goes way back.

Very early European writing such as runic systems and similar systems in Asia such as the Turkic Orkhon inscriptions may not be related to the Phoenician system at all. The Yukaghir in Siberia and the Yi in South China may also have designed de novo systems.

A Look at the Korean Language

From here.
A look at Korean from the perspective of an English speaker trying to learn the language. The truth is that Korean is one of the hardest languages on Earth for an English speaker to learn.
Most agree that Korean is a hard language to learn.
The alphabet, Hangul, at least is reasonable; in fact, it is elegant. But there are four different Romanizations – Lukoff, Yale, Horne, and McCune-Reischauer – which is preposterous. It’s best to just blow off the Romanizations and dive straight into Hangul. This way you can learn a Romanization later, and you won’t mess up your Hangul with spelling errors, as can occur if you go from Romanization to Hangul. Hangul can be learned very quickly, but learning to read Korean books and newspapers fast is another matter altogether because you really need to know the hanja or Chinese characters that lie behind the Hangul symbols.
Bizarrely, there are two different numeral sets used, but one is derived from Chinese so it should be familiar to Chinese, Japanese or Thai speakers who use similar or identical systems.
Korean has a wealth of homonyms, and this is one of the tricky aspects of the language. Any given combination of a couple of characters can have multiple meanings. Japanese has a similar problem with homonyms, but at least with Japanese you have the benefit of kanji to help you tell the homonyms apart. With Korean Hangul, you get no such advantage.
Similarly, there seem to be many ways to say the same thing in Korean. The learner will feel when people are using all of these different ways of saying the same thing that they are actually saying something different each time, but that is not the case.
One problem is that b, p, j, ch, t and d are pronounced differently than their English counterparts. The consonants, the pachim system and the morphing consonants at the end of the word that slide into the next word make Korean harder to pronounce than any major European language. Korean has a problem similar to Japanese, that is, if you mess up one vowel in a sentence, you render it incomprehensible.
The vocabulary is very difficult for an English speaker who does not have knowledge of either Japanese or Chinese. On the other hand, Japanese or Chinese will help you a lot with Korean. Chinese and Japanese speakers can usually learn Korean quickly.
Korean is agglutinative and has a subject-topic discourse structure, and the logic of these systems is difficult for English speakers to understand.
Meanwhile, Korean has an honorific system that is even wackier than that of Japanese. However, the younger generation is not using the honorifics so much, and a foreigner isn’t expected to know the honorific system anyway.
Maybe 60% of the words are based on Chinese words, but unfortunately, much of this Chinese-based vocabulary intersects with Japanese versions of Chinese words in a confusing way.
Speakers of Korean can learn Japanese fairly easily. Korean seems to be a more difficult language to learn than Japanese. There are maybe twice as many particles as in Japanese, the grammar is dramatically more difficult and the verbs are quite a bit harder. The phonemic inventory in Korean is also larger and includes such oddities as double consonants.
Korean is rated by language professors as being one of the hardest languages to learn.
Korean is rated 5, hardest of all.

A Look at the Japanese Language

From here.
A look at Japanese, with a view to how hard it is to learn for a speaker of English
Japanese also uses a symbolic alphabet, but the symbols themselves are sometimes undecipherable in that even Japanese speakers will sometimes encounter written Japanese and will say that they don’t know how to pronounce it. I don’t mean that they mispronounce it; that would make sense. I mean they don’t have the slightest clue how to say the word! This problem is essentially nonexistent in a language like English.
The Japanese orthography is one of the most difficult to use of any orthography.
There are over 2,000 frequently used characters in three different symbolic alphabets that are frequently mixed together in confusing ways. Due to the large number of frequently used symbols, it’s said that even Japanese adults learn a new symbol a day well into adulthood.
The Japanese writing system is probably crazier than the Chinese writing system. Japanese borrowed Chinese characters. But then they gave each character several pronunciations, and in some cases as many as 24. Next they made two syllabaries using another set of characters, then over the next millennia came up with all sorts of contradictory and often senseless rules about when to use the syllabaries and when to use the character set. Later on they added a Romanization to make things even worse.
Chinese uses 5-6,000 characters regularly, while Japanese only uses around 2,000. But in Chinese, each character has only one or maybe two pronunciations. In Japanese, there are complicated rules about when and how to combine the hiragana with the characters. These rules are so hard that many native speakers still have problems with them. There are also personal and place names (proper nouns) which are given completely arbitrary pronunciations often totally at odds with the usual pronunciation of the character.
There are some writers, typically of literature, who deliberately choose to use kanji that even Japanese people cannot read. For instance, Ryuu Murakami uses the odd symbols 擽る, 轢く and 憑ける.
The Japanese system is made up of three different systems: the katakana and hiragana (the kana) and the kanji, similar to the hanzi used in Chinese. Chinese has at least 85,000 hanzi. The number of kanji is much less than that, but kanji often have more than one meaning in contrast to hanzi.
Speaking Japanese is not as difficult as everyone says, and many say it’s fairly easy. However, there is a problem similar to English in that one word can be pronounced in multiple ways, like read and read in English.
A common problem is that a perfectly grammatical sentence uttered by a Japanese language learner is still not acceptable to Japanese speakers because “we just don’t say it that way.” The Japanese speaker often cannot tell you why the unacceptable sentence you uttered is not ok. On the other hand, this problem may be common to more languages than Japanese.
There is also a class of Japanese called “honorifics” or “keigo” that is quite hard to master. Honorifics are meant to show respect and to indicate one’s place or status in the social hierarchy. These typically affect verbs but can also affect particles and prefixes. They are usually formed by archaic or highly irregular verbs. However, there are both regular and irregular honorific forms. Furthermore, there are five different levels of honorifics. Honorifics vary depending on who you are and who you are talking to. In addition, gender comes into play.
Although it is true that Japanese young people are said to not understand the intricacies of keigo, it is still expected that they know how to speak it well. Consequently, many young Japanese will opt out of certain conversations because they feel that their keigo is not very good. Books explaining how to use keigo properly have been big sellers among young people in Japan in recent years as young people try to appear classy, refined or cultured.
In addition, Japanese born overseas (especially in the US), while often learning Japanese pretty well, typically have a very poor understanding of keigo. Instead of embarrassing themselves by not using keigo or using it wrong, these Japanese speakers often prefer to speak in English to Japanese people rather than bother with keigo-less Japanese. Hypercorrection in keigo is also a problem, when someone makes errors in keigo due to “trying too hard.” This looks like phony or insincere politeness and is often worse than not using keigo at all.
One wild thing about Japanese is counting forms. You actually use different numeral sets depending on what it is you are counting! There are dozens of different ways of counting things which involve the use of a complex numerical noun classifier system.
Japanese grammar is often said to be simple, but that does not appear to be the case on closer examination. Particles are especially vexing. Verbs engage in all sorts of wild behavior, and adverbs often act like verbs. Meanwhile, honorifics change the behavior of all words. There are particles like ha and ga that have many different meanings. One problem is that all noun modifiers, even phrases, must precede the nouns they are modifying.
It’s often said that Japanese has no case, but this is not true. Actually, there are seven cases in Japanese. The aforementioned ga is a clitic meaning nominative, made is terminative case, -no is genitive and -o is accusative.
In this sentence:
The plane that was supposed to arrive at midnight, but which had been delayed by bad weather, finally arrived at 1 AM.
The whole relative clause, from that was supposed through bad weather, must precede the noun plane:
Was supposed to arrive at midnight, but had been delayed by bad weather, the plane finally arrived at 1 AM.

One of the main problems with Japanese grammar is that it is going to seem so different from the sort of grammar an English speaker is likely to be used to.
Speaking Japanese is one thing, but reading and writing it is a whole new ballgame. It’s perfectly possible to know the meaning of every kanji and the meaning of every word in a sentence, but you still can’t figure out the meaning of the sentence because you can’t figure out how the sentence is stuck together in such a way as to create meaning.
However, Japanese grammar has the advantage of being quite regular. For instance, there are only four frequently used irregular verbs.
Like Chinese, the nouns are not marked for number or gender. However, while Chinese is forgiving of errors, if you mess up one vowel in a Japanese sentence, you may end up with incomprehension.
The real problem is that the Japanese you learn in class is one thing, and the Japanese of the street is another. One problem is that in street Japanese, the subject is typically not stated in a sentence. Instead it is inferred through such things as honorific terms or the choice of words you used in the sentence. Probably no one goes crazier on negatives than the Japanese. Particularly in academic writing, triple and quadruple negatives are common, and can be quite confusing.
Yet there are problems with the agglutinative nature of Japanese. It’s a completely different syntactic structure than English. Often if you translate a sentence from Japanese to English it will just look like a meaningless jumble of words.
Although many Japanese learners feel it’s fairly easy to learn, surveys of language professors continue to rate Japanese as one of the hardest languages to learn. A study by the US Navy concluded that the hardest language the corpsmen had to learn in the course of service was Japanese. However, it’s generally agreed that Japanese is easier to learn than Korean. Japanese speakers are able to learn Korean pretty easily.
Japanese is rated 5, hardest of all.
Classical Japanese is much harder to read than Modern Japanese. Though you can get by with far fewer kanji when reading the modern language, you will need a minimum knowledge of 3,000 kanji for reading Classical Japanese, and that’s using a dictionary. There are only about 500-1,000 frequently used characters, but there are countless other words that will come up in your reading, especially, say, special words used in the Imperial Court. Many words have more than one meaning, and unless you know this, you will be lost. 東宮(とうぐう) for instance means Eastern Palace. However, it also means Crown Prince because his residence was to the east of the Emperor’s.
The movie The Seven Samurai (set in the late 1500’s) seems to use some sort of Classical Japanese, or at least Classical vocabulary and syntax with modern pronunciation. Japanese language learners say they can’t understand a word of the archaic Japanese used in this movie.
Classical Japanese gets 5, hardest of all.

A Look at the Chinese Language

From here.
This post will look at how hard it is to learn Chinese for an English speaker.
It’s fairly easy to learn to speak Mandarin at a basic level, though the tones can be tough. This is because the grammar is very simple – short words, no case, gender, verb inflections or tense. But with Japanese, you can keep learning, and with Chinese, you hit a wall, often because the isolating syntactic structure is so strangely different from English.
Actually, the grammar is harder than it seems. At first it seems simple, like a simplified English with no tense or articles. But the simplicity makes it difficult. No tense means there is no easy way to mark time in a sentence. Furthermore, tense is not as easy as it seems. Sure, there are no verb conjugations, but instead you must learn some particles and special word orders that are used to mark tense.
Once you start digging into Chinese, there is a complex layer under all the surface simplicity. There are serial verbs, a complex classifier system, syntax marked by something called topic-prominence, preposed relative clauses, use of verbs rather than adverbs to mark direction, and all sorts of strange stuff. Verb complements can be baffling, especially potential and directional complements. The 了 character can have seemingly countless meanings. You also need to learn quite a bit of vocabulary just to speak simple sentences.
Chinese phonology is not as easy as some say. There are too many instances of the zh, ch, sh, j, q, and x sounds in the language such that many of the words seem to sound the same. There is a distinction between aspirated and nonaspirated consonants which does not exist in English.
Chinese orthography is probably the hardest orthography of any language. The alphabet uses symbols, so it’s not even a real alphabet. There are at least 85,000 symbols and actually many more (although this is controversial), but you only need to know about 4-6,000 of them, and many Chinese don’t even know 1,000. To be highly proficient in Chinese, you need to know 10,000 characters, and probably less than 5% of Chinese know that many.
The Communists tried to simplify the system (simplified Mandarin), but they simply decreased the number of strokes needed for each symbol. The Communists’ spelling reform left much to be desired.
To make matters worse, there are different ways to write each symbol – different styles of Chinese calligraphy. For instance, Classical Chinese may be written in so called “grass-style” calligraphy or in another style altogether.
It’s a real problem when you encounter a symbol you don’t know because there is often no good way to sound out the word as the system simply is not very phonetic. The Chinese alphabet is probably only 25% phonetic, and many frequently-used characters tell you nothing about how to pronounce them. Further, you need to learn at least 300 characters before you can start to use the meager phonetics of the writing system at all.
Furthermore, word boundaries are not obvious, as one character does not necessarily equal one word. Therefore it is hard to tell where one word starts and stops and another one begins.
Similarly, a dictionary is not necessarily helpful when trying to read Chinese. You can have a Chinese sentence in front of you along with a dictionary, and the sentence still might not make sense even after looking it up in the dictionary.
Furthermore, merely learning how to look up words in the dictionary in the first place takes new Chinese learners several months and learning how to use a dictionary well is typically not possible until a year of study. Even people who have studied for several years sometimes encounter characters that they simply cannot find in the dictionary. In China, dictionary look-up contests are often held, showing that the process is not transparent at all.
A good student of Chinese often has more than one dictionary, and some have up to 20 different dictionaries. There are separate dictionaries for simplified and traditional characters and dictionaries that have both. There are entire dictionaries just for Classical Chinese particles and others for four character idioms (chéngyǔ), a type of allegorical sayings with two parts (xiēhòuyǔ), and another for proverbs (yànyǔ). There are separate dictionaries for terms that entered Chinese during the Chinese era and others for specifically Buddhist terms. There is an easier way to use a Chinese dictionary called four-part look-up, but it takes a long time to learn it and most learners never master it for whatever reason.
To solve all of these problems with the ideographic writing system, numerous romanization schemes have been invented. At last count, there were a dozen or so of them, but a number of those are rarely used. Certainly, there are 2-3 heavily used ones and that is not counting the bopomofo phonetic alphabet used in Taiwan. One of the main problems with these romanization systems is that none of them are very good and they all have serious limitations. Furthermore, the romanization system you studied as a Chinese learner tends to affect your accent in Chinese.
Writing the characters is even harder than reading them. One wrong dot or wrong line either completely changes the meaning or turns the symbol into nonsense. The writing system is often so opaque that even native speakers forget how to write the characters of even commonly used words.
Even leaving the characters aside, the stylistic and literary constraints required to write Chinese in an eloquent or formal (literary) manner would make your head swim. And just because you can read Chinese does not mean that you can read Classical Chinese (wenyanwen) prose. It’s actually written in a different language, so to learn to read Chinese properly like an educated Chinese person does, you will have to learn not one language but two.
One rejoinder is that Classical Chinese to Chinese people is similar to Greek and Latin to an English speaker, but this is a bad analogy, as Classical Chinese is widely studied in Chinese secondary schools and some of the finest Chinese prose is written in this language (see the Confucius and Mencius examples below). Further, after studying French for a few years, you should be able to read French authors who wrote 300 years ago, but after a similar period of studying Chinese, you will not be able to read Confucius or Mencius.
Hence most educated Chinese would be expected to know something about Classical Chinese, and if you wanted to learn Chinese like an educated Chinese speaker, you would have to learn this other language also.
In addition, you need to learn Classical Chinese even if you do not aspire to be an educated Chinese speaker because one encounters Classical Chinese often in modern Chinese society, particularly in paintings or character scrolls.
The tones are often quite difficult for a Westerner to pick up. If you mess up the tones, you have said a completely different word. Often foreigners who know their tones well nevertheless do not say them correctly, and hence, they say one word when they mean another.
One problem with the tone system is that when you want to change the meaning of a sentence in a subtle manner via changing intonation of a word, you are bound to change the tone of the word in Chinese. Merely by placing semantic emphasis on a single word, you may deliver a gibberish sentence. Chinese speakers have their own way of using tone as a way of generating subtle semantic meaning, but they do so in an entirely different way than speakers of non-tonal languages do.
However, compared to other tone systems around the world, the tonal system in Chinese is comparatively easy.
A major problem with Chinese is homonyms. To some extent, this is true in many tonal languages. Since Chinese uses short words and is disyllabic, there is a limited repertoire of sounds that can be used. At a certain point, all of the sounds are used up, and you are into the realm of homophones.
Tonal distinctions are one way that monosyllabic and disyllabic languages attempt to deal with the homophone problem, but it’s not good enough, since Chinese still has many homophones even with the tones, and in that case, meaning is often discerned by context, stress, rhythm and intonation.
Chinese, like French and English, is heavily idiomatic.
It’s little known, but Chinese also uses different forms to count different things, like Japanese.
There is zero common vocabulary between English and Chinese, so you need to learn a whole new set of lexical forms and have no cognates to fall back on.
In addition, nouns often show relatedness or hierarchy. For instance, in English, you can simply say my brother or my sister, but in Chinese, you cannot do this. You have to indicate whether you are speaking of an older or younger sibling.
mei mei: younger sister
jie jie: older sister
ge ge: older brother
di di: younger brother
Many agree that Chinese is the hardest to learn of all of the major languages. In a recent international survey of language professors worldwide, these teachers rated Chinese as the hardest language to learn among languages that are commonly studied.
Mandarin gets a 5 rating for extremely hard.
However, Cantonese is even harder to learn than Mandarin. Cantonese has nine tones to Mandarin’s four, and in addition, they continue to use a lot of the older traditional Chinese characters that were superseded when China moved to a simplified script in the 1950s. Furthermore, since non-Mandarin characters are not standardized, Cantonese cannot be written down as it is spoken.
In addition, Cantonese has verbal aspect, possibly up to 20 different varieties. Modal particles are difficult in Cantonese. Clusters of up to three sentence-final particles are very common. 我食咗飯 and 我食咗飯架啦喎 are both grammatical for I have had a meal, but the particles can add the meaning of I have already had a meal, mark the sentence as an answer to a question, or even imply I have had a meal, so I don’t need to eat anymore.
Cantonese gets a 5.5 rating, close to hardest of all.
Min Nan is also said to be harder to learn than Mandarin, as it has a more complex tone system, with five tones on three different levels. Even many Taiwanese natives don’t seem to get it right these days, as it is falling out of favor and many fewer children are being raised speaking it than before.
Min Nan gets a 5.5 rating, close to hardest of all.
A recent 15 year survey out of Fudan University utilizing both the departments of Linguistics and Anthropology looked at 579 different languages in order to try to find the most complicated language in the world. The result was that a Wu language dialect (or perhaps a separate language) in the Fengxian district of Shanghai (Fengxian Wu) was the most complex language of all, with 20 separate vowels. The nearest competitor was Norwegian with 16 vowels.
Fengxian Wu gets a 5.5 rating, close to hardest of all.
Classical Chinese is still read by many Chinese people and Chinese language learners. Unless you have a very good grasp of modern Chinese, Classical Chinese will be completely wasted on you. Classical Chinese is much harder to read than modern Chinese.
Classical Chinese covers an era extending over 3,000 years, and to attain a reading fluency in this language, you need to be familiar with all of the characters used during this period along with all of the literature of the period so you can understand all the allusions. Even with a knowledge of Classical Chinese, you need to read it in context. If you are good at Classical Chinese and someone throws you a random section of it, it will take you a good amount of time to figure it out unless you know context.
The language is much more to the point than Modern Chinese, but this is not as good as it sounds. This simplicity leaves room for ambiguity, and context plays an important role. A joke about some obscure historical or literary anecdote will be lost on you unless you know what it refers to. For reading modern Chinese, you will need at least 5,000 characters, but even then, you will still need a dictionary. With Classical Chinese, there is no upper limit on the number of characters you need to know. The sky is the limit.
Classical Chinese gets a 5.5 rating, close to hardest of all.