Still Working on My Paper

This is taking up all my time these days. Oh well, at least I have a job that I’m working at full-time.

If you are interested, you can check the progress of it here. The problem with this sort of thing is that once you start down one of the rabbit holes, you can stay down there a long time and still not feel that you’ve finished. At some point, you have to say enough is enough and move on. 75 pages on the page and over 200 references. I’m going nuts on references. If you can’t dazzle them with brilliance, bury them with bullshit. Works pretty good.

There are two other drafts up. One is on consensus in the Altaic language controversy. Another is on a misreading of a comparison of Japonic with Turkic, Mongolic, Tungusic, and Koreanic which shows that Japonic is absolutely related to the other languages.

Not sure if any of you are interested in this rather esoteric, specialized, and abstruse stuff, but some of you might be.

Splitters Versus Lumpers in Historical Linguistics

Warning: Long, runs to 57 pages. This article is intended at the moment more for the general audience than for specialists, but specialists may also find it of interest. At the moment, it is not properly formatted or edited to be of use for publication in an academic journal, but perhaps it could be published in such a format some day.

For background on what Historical Linguistics is, see this Wikipedia article. Basically it involves determining which languages are related to each other via various means and, once that is determined, reconstructing a proto-language that the related languages descended from, along with, hopefully, regular sound correspondences, which supposedly prove the relationship once and for all. The argument in Historical Linguistics now is between conservatives, or splitters, and progressives, or lumpers.

Splitters say that the comparative method – described above as reconstructing a proto-language with regular sound correspondences – is necessary in order to prove that two or more languages are related. However, they also say, probably correctly, that this method is not useful beyond ~6,000 years. Any relationships beyond that time frame would not be provable by the comparative method and hence could never be proven. This effectively shuts down all research into long-range older language families.

Some lumpers say that this method is not necessary and instead relationships can be determined by simply looking at the two or more languages, a process called comparison or mass comparison. I point out below that comparison need not be cursory but could mean deep study of languages over 10, 15, or 20 years.

They tend to focus on core vocabulary, numerals, family terms, pronouns, and deictics, in addition to small morphological particles – all things that are rarely borrowed. Once they find a number of these items that resemble one another at a rate greater than chance, they say that the two languages are related because chance and borrowing are ruled out.
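To give a rough feel for what "greater than chance" could mean here, below is a minimal sketch of my own, not a method taken from any of the linguists discussed. It assumes, purely for illustration, that each item on a core vocabulary list has the same independent probability of looking alike by accident. The function name `prob_at_least` and the numbers (100 items, a 5% chance rate) are hypothetical, and real word lists certainly violate the independence assumption.

```python
from math import comb


def prob_at_least(n_items: int, k_matches: int, p_chance: float) -> float:
    """Binomial upper tail: probability of k_matches or more accidental
    lookalikes among n_items, ASSUMING each item matches independently
    with probability p_chance (an illustrative simplification)."""
    return sum(
        comb(n_items, k) * p_chance**k * (1 - p_chance) ** (n_items - k)
        for k in range(k_matches, n_items + 1)
    )


if __name__ == "__main__":
    # Hypothetical values, chosen only to show how the tail shrinks.
    n_items, p_chance = 100, 0.05
    for k_matches in (5, 10, 20):
        tail = prob_at_least(n_items, k_matches, p_chance)
        print(f"{k_matches} or more lookalikes out of {n_items}: {tail:.6f}")
```

The point of the sketch is only that the more lookalikes one finds among rarely borrowed items, the harder it gets to blame chance. It says nothing about borrowing, which has to be ruled out separately.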

They say that this is the way to prove language relatedness, not the comparative method. The comparative method instead is used to learn interesting things about language families that have already been discovered via comparison, such as reconstructing proto-languages and finding regular sound correspondences.

Splitters say that comparison or mass comparison is not a valid way of proving that languages are related and that only the comparative method can be used to prove this. However, as noted, they set a 6,000-year time limit on the method needed to prove this, and this walls off a lot of potential knowledge about ancient and long-range language relationships as unprovable and hence undiscoverable. In a way, they are shutting the door to new scientific discovery beyond a certain time frame by claiming that the method needed to make these discoveries doesn’t work beyond X thousand years.

Other lumpers disagree that the comparative method has a time limit on it and are attempting to use the comparative method to reconstruct ancient long-range language families and find regular sound correspondences between them. Unfortunately, most of their efforts are in vain as splitters are using increasingly strict criteria for proof of language relationship and hence are shooting down most if not all of these efforts being done “in the proper way.”

So they are saying that proof must be done in a certain way, but when people try to play by the rules and use that way to find proof, they keep moving the goalposts and using increasingly strict, petty, and quibbling methods to in general say that the relationship is not proven.

So they say, “You must use this tool for your proof!” And then people play fair and use the tool, and they almost always say, “Sorry, you didn’t prove it!” It all feels like a game that is rigged to fail in most if not all cases.

Hence, the current trend of extreme conservatism in Historical Linguistics has set up rules that seem to be designed to prevent the discovery of most if not all new language families, in particular long-range families older than 6-8,000 years.

I am quite certain that long-range language families such as Altaic (with either three families or five), Indo-Uralic, Uralic-Yukaghir, Hokan, Penutian, Mosan, Almosan, Japanese-Korean, Gulf, Yuki-Gulf, Elamite-Dravidian, Quechumaran, Austroasiatic-Hmong Mien, Coahuiltecan, North Caucasian, or Na-Dene will never be proven in my lifetime, and that’s not to mention the more extreme proposals such as Eurasiatic, Nostratic, Dene-Caucasian, Austric, and Amerind, although the evidence for the first and last of these is quite powerful.

There are simply too many emotions tied up in any of these proposals. Further, many linguists have spent a good part of their careers arguing against these proposals. It is doubtful that any amount of evidence will cause them to change their minds. Scientists, like any other humans, don’t like to be shown that they’re wrong.

Lyle Campbell, Marianne Mithun, Mauricio Mixco, Sarah Grey Thomason, Johanna Nichols, William Poser, Peter Daniels, Dell Hymes, Larry Trask, Gerrit Dimmendaal, Donald Ringe, Juha Janhunen, William Bright, and Paul Sidwell are among the leaders of this new conservatism.

At first I was very angry at what these people were doing, especially the most egregious cases such as Campbell. Then I realized that people lie and misrepresent things all day long every single day in my life and that this behavior is fairly normal behavior in humans, especially in a mushy area like this one where hard truths are hard to come by and most stated facts are more properly matters of opinion or could be construed that way.

I realized that they are simply defending a scientific paradigm and that unfortunately, this is the rather underhanded and emotion-ridden environment that defending paradigms tends to produce.

Though to be completely honest, I should not be singling these people out because the current conservatism is simply consensus and acts as the current paradigm on the language relatedness question in Historical Linguistics. The people listed above are at the top of the profession and are often considered the best historical linguists. They write books on historical linguistics. A number are considered to be ultimate authorities on questions of language relatedness. They are simply the leading edge of the current conservative consensus and paradigm in the field.

Although granted, of all of them, Campbell seems to be the most extreme conservative. He is also one of the top historical linguists in the world. Mixco, Mithun, and Poser are about on the same level as Campbell.

Campbell, Mithun, Thomason, and Mixco are Americanists whose conservatism was set off by the publication of Joseph Greenberg’s Language in the Americas (LIA) in 1987.

All of the linguists above are noted for their excellent scholarship.

The conservatives who are denying most if not all new families are called splitters. They tend to be very angry if not out and out abusive, engaging in bullying, mockery, ridicule, ostracization, and all of the usual techniques used in science against the proposers of a new paradigm.

The people who propose long-range families are called lumpers. Lumpers are heavily disparaged in the field nowadays such that almost no one wants to be known as a lumper or associated with such. However, many other historical linguists seem to be taking a more moderate fence-sitter stance where they are open to questions of new language families, including long-range families.

Among the long-range families that the moderates are open to considering nowadays are Indo-Uralic, Dene-Yeniseian, and Austro-Tai. Some of the smaller long-range families in the Americas even have supporters among the most hardline of splitters. I’m even dubious about well-argued proposals such as Dene-Yeniseian.

Thomason takes extreme umbrage at the notion that splitters have a bias that will allow few if any new families to be discovered, after Greenberg compared them with Malcolm Guthrie’s objections to Greenberg’s new classification of Bantu. However, after thinking this over for some time, I now believe that Greenberg is correct. The splitters have their minds made up. They are going to allow few if any new families to be discovered. A few of them have caved a bit.

I also work in mental health, and it’s pretty obvious to me when something is not right about a scientific debate. I’ve been getting that vibe about the splitters versus lumpers debate from the very start. When a debate in science has degenerated into bias, ideology and ideologues, propaganda, politics, and in particular extreme emotion, it gives off a certain intuitive feel. This debate has felt this way from Day One. To put it simply, the debate doesn’t smell right. I have a feeling that science left the room a long time ago here.

One thing I noticed was that people who have worked on one particular language or family for much of their careers are especially angry and aggressive about the notion that their family could possibly be related to anything else. Indeed, famous linguists were remarking on this tendency as early as 1901. Among the reasons given were that they had their hands full already without new work to take on and that they were disinclined to see their language family related to anything else, as this would deny its specialness.

Trask is forceful that Basque could not possibly have any outside relatives.

I saw a debate on the Net some years ago with Trask and a Spanish assistant holding court over a debate over the external relations of Basque. Those who argued for external relations were pushing a relationship with the Caucasian languages, which is possible though not proven in my opinion. Trask and his assistant were very angry and aggressive in holding down the fort. Apparently everything was a Spanish borrowing. The debate didn’t smell right at all.

With a background in psychology, I wonder what is going on here. One possibility is as Greenberg suggests and as was suggested back in 1901 – simple narcissism. When one specializes in a language family for a long time, it probably becomes blurred with the self such that the self and the family become married to each other, and it’s hard to tell where one ends and the other begins. Yourself and the family you’ve spent your career working on become one and the same thing. If your family is not related to anything else, it’s special.

We all think we are special. This is the essence of human narcissism. To say that their favorite language has relatives is to deny its specialness almost as if to say that our egos were not real but were instead extensions of other people’s egos. Actually if you read Sartre or study modern particle physics, that’s not a bad theory, but most people bristle at the notion.

I met Korean and Japanese people when I was doing my Master’s. Both beamed when they told me that their language had no known relatives. Of course that made it special in their eyes and played right into their ethnocentrism.

Another problem may be the trajectory of one’s career. If you have been arguing forcefully for 30 years that your family has no known relations, your reputation is going to take a huge hit if you have to agree that you were wrong all those years.

There is also the question of politics. We are dealing here with a Paradigm. For a good description of a Scientific Paradigm, see Thomas Kuhn’s The Structure of Scientific Revolutions. Kuhn holds that science is by its nature very conservative, some sciences being more conservative than others. A Paradigm is set up when the field reaches a satisfactory consensus that a particular theory is correct. After a while, serious barriers go up to any challenges to overthrow the proven theory.

The challenges are first ignored, then ridiculed (often severely), then attacked (often ferociously) and then, if the challenge is successful, it is accepted (often slowly and grudgingly). Kuhn pointed out that defenders of the old theory are usually so reluctant to see the paradigm overthrown that we often must wait literally until their deaths to finally overthrow the paradigm. They defend it to their deathbeds. I suggest we are dealing with something more than pure empiricism here.

It is quite risky to challenge a paradigm in science. People’s careers have suffered from it. A supporter of Keynesian economics, which was then challenging the current paradigm in economics, could not get hired at any university in the US during the 1930s.

In the splitters versus lumpers debate, we have been in the Anger phase for some time now. We seem to be settling out of it, as many are taking a fence-sitting position and arguing for attempts to resolve the debate to make it less heated.

The Paradigm here involves extreme skepticism about any new language families to the point that any new families are simply going to be rejected on all sorts of grounds. Paradigms involve politics at the academic level. When a Paradigm is set up in science, almost all scientists write and do research within the paradigm. Anything outside of the paradigm is derided as pseudoscience or worse.

The problem is that when a Paradigm is in effect, all scholars are supposed to publish within the Paradigm. Publishing outside the paradigm is regarded as evidence that one is a kook, a crank, is practicing pseudoscience, or that one is crazy or a fool. It is instructive in this debate to note that most of the prominent lumpers are independent scholars operating outside of the politics of academia.

I have had them tell me that the only reason they can take the lumper position that they do is because they are independent and don’t have a university job, so there are no repercussions if they are wrong. They told me that if they had a professorship, they would not be able to do this work. They have also told me that they know for a fact that certain splitters might jeopardize their jobs, careers, and especially their funding if they took a lumper position. This was given as one of the reasons for their dogmatic splitterism.

In addition, science works according to fads, or more properly, standard beliefs. The trends for these beliefs are set by the biggest names in the field. The biggest names in Linguistics are all splitters now. They are the trendsetters, especially in whatever specialty of Historical Linguistics you are working in. Everyone else in the field is dutifully following in their footsteps. As an up and coming young scholar, you are supposed to follow the proper trends and hypotheses of your field to uphold the consensus of scholars in your area of specialty. As you can see, there is a lot more than simple empiricism going on here.

With my background, I look for psychological motivations anywhere I can find them. And science is no stranger to bias and emotional psychological motivations driving, or usually distorting, it. We are human and humans have emotions. Emotion is the enemy of logic. Logic is the basis of empiricism. Hence, emotions are the enemy of science.

Scientists are supposed to remain objective, but alas, they are humans themselves and subject to all of the emotional psychological motivations that the rest of us are. Scientists are supposed to police themselves for bias, but that’s probably hard to do, especially if the bias is rooted in psychological processes or in particular if it is unconscious, as many such processes are.

Campbell’s case is an extreme one, but I believe it is simply motivated by internal psychological processes inside of the man himself.

Campbell is driven by psychological complexes. His entire turn towards extreme conservatism in this debate was set off by the huge feud he had with Greenberg, and everything since has flowed from that. He took a very angry position that LIA was completely false and did his best to trash its reputation far and wide. This disparagement is still the order of the day, and Greenberg’s name is as good as mud in the field.

Then Campbell generalized his extreme splitterist reaction to LIA out to all of the language families in the world because if he allowed any new families elsewhere in the world, he might have to allow them in the Americas, and he could not countenance that. Note also that Campbell has gone out of his way to specifically attack Greenberg’s four-family split in his proposal for language families in Africa.

This proposal, done with Greenberg’s derided method of mass comparison, has had a successful result in Africa and has been proven with the test of time. Campbell cannot allow this because if he admits that Greenberg was right in Africa, he might have to accept that he might be right in the Americas too, and that’s beyond the pale. So in his recent works he has specifically set out to state that Afroasiatic, Nilo-Saharan, Niger-Kordofanian, and Khoisan – the four families of Greenberg’s classification – have not been proven to exist yet. The truth is exactly the opposite, but the psychological process here is bald and naked for all to see.

Here he specifically trashes these language families because they were discovered by Joseph Greenberg, Campbell’s bête noire. Campbell’s agenda is to show that Greenberg is a preposterous kook and crank, although he was one of the greatest linguists of the 20th century. Greenberg’s African work is regarded as true, and this poses a problem if Campbell is to characterize Greenberg as a charlatan.

If Greenberg was right about one thing, could he not be right about another? In order to lay the foundation for the theory that Greenberg’s method doesn’t work and that it cannot discover any language relationships, Campbell will have to deny the method ever had any successes. So he sets about to deny that Greenberg’s four African families are proven.

Splitters have come up with a repertoire of reasons to shoot down proposed language relations and most are pretty poor.

They rely on overuse of the borrowing, chance, sound symbolism, nursery word, and onomatopoeia explanations for non-relatedness. There is also an overuse of the comparative method with excessively strict standards being set up for etymologies and sound correspondences. In a number of cases, linguists are going back to the etymologies of their proto-languages and reducing them by up to half.

In the last 20 years, Uralicists have gone back over the original Proto-Uralic etymologies and gotten rid of fully half of them (from 2,000 down to 1,000) on a variety of very poor reasons, mostly irregular sound correspondences. It appears to me that while there were some obvious bad etymologies in there, most of the ones that were thrown out were perfectly good.

Irregular sound correspondences are a bad reason to throw out an etymology. Keep in mind that 5

This is not just conservatism. It is out and out Reaction. Worse, it is nearly a Conservative Revolution, which I won’t define further. It is akin to a city council declaring that all of the old, beautiful buildings in the city are going to be torn down because they were not constructed properly. Will they be rebuilt? Well, of course not. Most of the top Uralicists are involved in this silly and destructive project.

In a recent paper, George Starostin warned that the splitters were not just conservatives determined to stop all progress. He pointed out that there was actually a trend towards rejection and going backwards in time to dismantle families that have already been set up on the grounds that they were not done perfectly enough. As we can see, his warning was prescient.

There are statements being made by moderates that both sides, the splitters and the lumpers, are being equally unreasonable. As one linguist said, the debate is between lazy lumpers (Just believe us, don’t demand that we prove it!) and angry splitters (Not only is this new family false, but all new families proposed from now on will also be shot down!). He suggested that they are both wrong and that the solution lies in a point in the middle. I don’t have a problem with this moderate centrist belief.

The splitter notion itself rests on an obvious falsehood, that there are hundreds of language families in the world that have no possible relationship with each other.

According to Campbell, there are 160 language families and isolates in the Americas. The question is where all of these entities came from. Keep in mind, in Linguistics, the standard view is that these 160 entities are not related to each other in any way, shape, or form. Thinking back, this means that language would have had to have developed in humans 160 times among the Amerindians alone.

The truth is that there was no polygenesis of language.

Sit back and think for a moment. How could language possibly have been independently developed more than one time? Obviously it arose in one group. How could it have arisen in other groups too? It couldn’t and it didn’t. Did some of the original speakers go deaf, become mutes, forget all their language, and then have children, raising them without language, in which case the children devised language for themselves?

Children need comprehensible input to develop language. No language to hear in the environment, no language for the children to acquire on their own. With cochlear implants, formerly deaf people are now able to hear for the first time. A woman got hers at age 32. Since she missed the Critical Period for language development, the window of which closes at age 8, she has not, even at this late date, been able to acquire language satisfactorily. She missed the boat. No input, no language.

Obviously language arose only once among humans. It had to. And hence, all human languages are related to each other de facto whether we can “prove” it by our fancy methods or not. In other words, all human languages are related. Those 160 language families and isolates in the Americas? All related. Now we may not be able to prove which languages they are related to specifically and most closely, but we know they are all related to each other.

In the physical sciences, including Evolutionary Psychology, many things are simply assumed because the alternate theories could not have happened. But we have no evidence of much of anything in Evolutionary Psychology or Evolutionary Anthropology. We know our ancestors lived in X place at Y times, but we have no idea what they were doing there. We can’t go back in time to prove that this or that happened.

Using the logic of linguists, since we cannot make time machines to go back in time and test theories about the Evolutionary Anthropology and Evolutionary Psychology of these peoples, we can make no statements about this matter, as the only way to prove it would be to see it. In physics, there are particles that we have never seen. We have simply posited their existence because according to our theories, they have to exist. According to linguists, we could not posit the existence of these particles unless we see them.

Contrary to popular rumor, not everything in science has to be “proven” by this or that rigorous method. Many things are simply posited, as no real evidence for their existence exists, either because we were not there or because we can’t see them, or in the case of pure physics, we can’t even test out our theories. They exist simply because they have to according to our existing theories, and all competing theories fall down flat.

Well, the Americanists beg to disagree. Greenberg’s theory was so extreme and radical that the entire field erupted in outrage. None of their alternate theories, not even one of them, make the slightest bit of sense.

Despite the fact that these languages are obviously related to each other, in order to “officially prove it” we have to use a method called the comparative method whereby proto-languages and families are reconstructed and regular sound correspondences are shown between the languages being studied.

This is the only way that we can prove one language is related to another. That’s simply absurd for a few reasons.

First of all, I concur with Johanna Nichols that the comparative method does not really work on language families older than 6-8,000 years. Beyond that time, so many sound changes have taken place, semantics have been distorted, and terms have fallen out of use that there’s not much of anything left to reconstruct. Furthermore, time has washed away any evidence of sound correspondences.

Although Nichols is a splitter, I have to commend her. First, she’s right above.

Second, realizing this, she says that the comparative method will always fail beyond this time frame. I believe she thinks then that we need to use new methods if we are to prove that long-range families exist. The method she suggests is “individual-identifying evidence,” which seems to be another way of saying odd morpheme paradigms that were probably not borrowed and are hardly existent outside of that family.

This harkens back to Edward Sapir’s “submerged features,” where he says we can prove the existence of language families by these small morphemic resemblances alone.

The rest of the field remain sticks in the mud. They say that we must use the comparative method to discover that languages are related because no other method exists. The problem is that, as splitters themselves note, if the comparative method fails beyond 6,000 years back, all attempts to prove language families that old or older are bound to fail.

The splitters seem positively gleeful that according to their paradigm, few if any new language families will be discovered. This delight in nihilism seems odd and disturbing. What sort of science is gleeful that no new knowledge will be found? Even in the event that this is true, it’s depressing. Why get excited about something so negative?

Many language families in the world were discovered by Greenberg’s “mass comparison” or simply comparing one language to another, which should be called “comparison.” And in fact, many of the smaller language families in the world are still being posited by the means of comparison or mass comparison. Comparison need not be the broad, sweeping, forest for the trees, holistic method Greenberg employs. I argue that it means lining up languages and looking for common features. We could be lining up one language against another and that would also be “comparison.”

It need not be a shallow examination. One could examine a possible language for five, ten, fifteen, or twenty years.

After studying a pair or group of languages for some time, if one finds a group of core vocabulary items that resemble one another and are above the rate found by chance (

I fail to understand why examining a language or group of languages for a long period of time to find resemblances and try to rule out chance or borrowings is a ridiculous method. What’s so ridiculous about that? Sure, it’s nice to reconstruct and get nice sound correspondences going, but it’s not always necessary, especially in long-range comparisons when such methods are doomed to failure.

One more thing: if splitters say that the comparative method fails beyond 6,000 years, why do they keep putting long-range families to the test using the comparative method? After all, the result will always come up negative, right? What’s the point of doing a study you know will come up negative? Just to get your punches in?

There are a number of folks who have bought into the splitters’ arguments and are trying to discover long-range families by the comparative method of reconstructing the proto-language and finding regular sound correspondences between them. A number of them claim to have been successful. There have been attempts to reconstruct proto-languages and find regular sound correspondences with Altaic, Nostratic, Dene-Caucasian, Dene-Yeniseian, Austro-Tai, Totozoquean, and Uralo-Yukaghir.

Altaic, Nostratic, and Dene-Caucasian all have proto-languages reconstructed with good sound correspondences running through them. Altaic and Nostratic have etymological dictionaries containing many words, 2,300 proto-forms in the case of Altaic in a 1,000-page volume. Further, a considerable Nostratic proto-language was reconstructed by Dolgopolsky and Illich-Svitych.

All of these efforts claim that they have proven their hypotheses. However, the splitters such as Campbell have rejected all of them. So you see, even when people follow the mandated method and play it by the book the way they are supposed to, the splitters will nearly always say that the efforts come up short. It’s a rigged game.

How about another question? If the comparative method is doomed beyond 6,000 years, why don’t we use another method to discover these relationships? The splitter rejoinder is that there is no other method. It’s the comparative method or nothing. But how do they know this? Can they prove that other methods can never be used to successfully discover a language relationship?

The following quotes are from a textbook or general text on Historical Linguistics by Lyle Campbell and Mauricio Mixco, A Glossary of Historical Linguistics. The purpose of this paper will be misrepresented by critics, who will say that I am a lumper criticizing splitters for their opposition to known language families.

There is some of that here, but more than lumper propaganda, what I am trying to do here more than anything else is to show how Campbell and Mixco have been untruthful about linguistic specialist consensus regarding these families. In most cases, they are openly misrepresenting the state of consensus in the field.

As will be shown, Campbell and Mixco repeatedly seriously distort the state of consensus regarding many language families, particularly long-range ones. They usually favor a more negative and conservative view, saying that a family has little support when it has significant support and saying it is controversial when the consensus in the field is that the family is real. Campbell and Mixco engage in serious distortions of fact all through this text:

Campbell and Mixco:

Afroasiatic: Enjoys wide support among linguists, but it is not uncontroversial, especially with regard to which of the groups assumed to be genetically related to one another are to be considered true members of the phylum.

There is disagreement concerning Cushitic, and Omotic (formerly called Sidama or West Cushitic) is disputed; the great linguistic diversity within Omotic makes it a questionable entity for some. Chadic is held to be uncertain by others. Typological and areal problems contribute to these doubts. For example, some treat Cushitic and Omotic together as a linguistic area (Sprachbund) of seven families within Afroasiatic.

Campbell and Mixco are wrong. Afroasiatic is not controversial at all. There is widespread consensus that the family exists and that all of the subfamilies are correct.

The “we can’t reconstruct the numerals” argument is much in evidence here too. See the Altaic debate below for more on this. One argument against Altaic is “We can’t reconstruct the numerals.” However, Afroasiatic is a recognized family and not only has reconstruction itself proved difficult, but the numerals in particular are a gigantic mess. It seems that one does not need to have a fully reconstructed numeral set after all to have a proven language family.

There is consensus that Cushitic is a valid entity. Granted, there has been some question about Omotic, but in the last 10-15 years, consensus has settled on an agreement that Omotic is part of Afroasiatic.

The great diversity of Omotic is no surprise. Omotic is probably 13,000 years old! It’s amazing that there’s anything left at all after all that time.

Where do we get the idea that a language family cannot possibly be highly diverse? Chadic is also uncontroversial by consensus. I am not aware of any serious proposals to see Cushitic and Omotic as an Altaic-like Sprachbund of mass borrowings. Campbell and Mixco’s comments above are simply not correct. The only people questioning the validity of Afroasiatic or any of its components are Campbell and Mixco, and they are not experts on the family.

Campbell and Mixco:

Berber is usually believed to be one of the branches of Afroasiatic.

This is far too pessimistic. Berber is recognized by consensus as being one of the branches of Afroasiatic.

Campbell and Mixco:

Niger-Kordofanian (now often just called Niger-Congo): A hypothesis of distant genetic relationship proposed by Joseph H. Greenberg in his classification of African languages. Estimated counts of Niger-Kordofanian languages vary from around 900 to 1,500 languages. Greenberg grouped ‘West Sudanic’ and Bantu into a single large family, which he called Niger-Congo, after the two major rivers, the Niger and the Congo ‘in whose basins these languages predominate’ (Greenberg 1963: 7).

This included the subfamilies already recognized earlier: (1) West Atlantic (to which Greenberg joined Fulani, in a Serer-Wolof-Fulani [Fulfulde] group), (2) Mande (Mandingo) (thirty-five to forty languages), (3) Gur (or Voltaic), (4) Kwa (with Togo Remnant) and (5) Benue-Congo (Benue-Cross), with the addition of (6) Adamawa-Eastern, which had not previously been classified with these languages and whose classification remains controversial.

For Greenberg, Bantu was but a subgroup of Benue-Congo, not a separate subfamily on its own. In 1963 he joined Niger-Congo and the ‘Kordofanian’ languages into a larger postulated phylum, which he called Niger-Kordofanian.

Niger-Kordofanian has numerous supporters but is not well established; the classification of several of the language groups Greenberg assigned to Niger-Kordofanian is rejected or revised, though most scholars accept some form of Niger-Congo as a valid grouping.

As Nurse (1997: 368) points out, it is on the basis of general similarities and the noun-class system that most scholars have accepted Niger-Congo, but ‘the fact remains that no one has yet attempted a rigorous demonstration of the genetic unity of Niger-Congo by means of the Comparative Method.’

There is consensus among scholars that Niger-Kordofanian is a real thing.

Campbell and Mixco:

Nilo-Saharan: One of Greenberg’s four large phyla in his classification of African languages. In dismantling the inaccurate and racially biased ‘Hamitic,’ of which Nilo-Hamitic was held to be part, Greenberg demonstrated the inadequacy of those former classifications and argued for the connection between Nilotic and Eastern Sudanic.

He noted that ‘the Nilotic languages seem to be predominantly isolating, tend to monosyllabism, and employ tonal distinctions’ (Greenberg 1963: 92). To the extent that this classification is based on commonplace shared typology and perhaps areally diffused traits, it does not have a firm foundation. Nilo-Saharan is disputed, and many are not convinced of the proposed genetic relationships. It is generally seen as Greenberg’s wastebasket phylum, into which he placed all the otherwise unaffiliated languages of Africa.

First of all, Nilo-Saharan is not classified based on typological traits that were perhaps areally diffused. There is also a great deal of the more typical evidence in favor of this language family. Second, it is not true that it lacks a firm foundation and that many are not convinced of its reality. The consensus among experts is that this family exists and that the overwhelming majority of the subfamilies and isolates Greenberg put in it are correctly placed.

Saying that it is a wastebasket phylum does not make sense because the Nilo-Saharan languages are only found in a certain part of Africa. If it were truly such a phylum, there would be languages from all over Africa placed in this family.

According to Roger Blench, a moderate, consensus has formed over the last 10-15 years that Nilo-Saharan is a real thing.

Consensus has formed that 7

Yes, Campbell and Mixco say that Nilo-Saharan is not real, but they are not specialists.

Campbell and Mixco:

Khoisan: A proposed distant genetic relationship associated with Greenberg’s (1963) classification of African languages, which holds some thirty non-Bantu click languages of southern and eastern Africa to be genetically related to one another. Greenberg originally called his Khoisan grouping ‘the Click Languages’ but later changed this to a name based on a created compound of the Hottentots’ name for themselves, Khoi, and their name for the Bushmen, San.

Khoisan is the least accepted of Greenberg’s four African phyla. Several scholars agree in using the term ‘Khoisan’ not to reflect a genetic relationship among the languages but, rather, as a cover term for all the non-Bantu and non-Cushitic click languages.

Although it is probably true that Khoisan is the least accepted of Greenberg’s families, that’s not saying much, as it only means that 8

According to George Starostin, consensus has formed over the last 5-10 years that Khoisan exists. There are five major Khoisan scholars, and four of them agree that Khoisan is real, with all four including Sandawe and most including Hadza. There is one, Traill, who says it’s not real, but he is also a notorious Africanist splitter.

Campbell and Mixco:

Eurasiatic: Greenberg’s hypothesis of a distant genetic relationship that would group Indo-European, Uralic–Yukaghir, Altaic, Korean–Japanese–Ainu, Nivkh, Chukotian and Eskimo–Aleut as members of a very large ‘linguistic stock’. While there is considerable overlap in the putative members of Eurasiatic and Nostratic there are also significant differences. Eurasiatic has been sharply criticized and is largely rejected by specialists.

I have no doubt that Eurasiatic has been sharply criticized, but apart from a negative review in Language by Peter Daniels, the controversy seems quite muted compared to the furor over Amerind. I am also not sure that it is largely rejected by specialists. It probably is, but most of them have not even bothered to comment on it. I believe that this family is one of the best long-range proposals out there.

Based on the data from the pronouns alone, it’s obviously a real entity, though I would include Indo-European, Uralic-Yukaghir, Altaic including Japanese and Korean, Chukotian, and Eskimo-Aleut, leaving out Nivkhi for the time being and certainly leaving out Ainu. Nivkhi does seem to be a Eurasiatic language, but it’s not a separate node. Instead it may be a part of the Chukotian family. Or even better yet, it seems to be part of a family connected to the New World via the Almosan family in the Americas.

I feel that Eurasiatic is a much more solid entity than Nostratic. Not that I am against Nostratic, but it’s more that Eurasiatic is a simple hypothesis to prove and with Nostratic, I’m much less sure of that. On the other hand, to the extent that Nostratic overlaps with Eurasiatic, it is surely correct.

Campbell and Mixco:

Indo-Anatolian: The hypothesis, associated with Edgar Sturtevant, that Hittite (or better said, the Anatolian languages, of which Hittite is the best known member) was the earliest Indo-European language to split off from the others. That is, this hypothesis would have Anatolian and Indo-European as sisters, two branches of a Proto-Indo-Hittite.

The more accepted view is that Anatolian is just one subgroup of Indo-European, albeit perhaps the first to have branched off, hence not ‘Indo-Hittite’ but just ‘Indo-European’ with Anatolian as one of its branches. In fact the two views differ very little in substance, since, in either case, Anatolian ends up being a subfamily distinct from the other branches and in the view of many the first to branch off the family.

The view that Anatolian is just another subgroup of IE is not the more accepted view. In fact, it has been rejected by specialists. Indo-Europeanists have told me that Indo-Anatolian is now the consensus among Indo-Europeanists, so Campbell and Mixco’s statement that Indo-Anatolian is a minority view is false.

Campbell and Mixco:

Nostratic (< Latin nostra ‘our’): A proposed distant genetic relationship that, as formulated in the 1960s by Illich-Svitych, would group Indo-European, Uralic, Altaic, Kartvelian, Dravidian and Hamito-Semitic (later Afroasiatic), though other versions of the hypothesis would include various other languages. Nostratic has a number of supporters, mostly associated with the Moscow school of Nostratic, though a majority of historical linguists do not accept the claims.

There are many problems with the evidence presented on behalf of the Nostratic hypothesis. In several instances the proposed reconstructions do not comply with typological expectations; numerous proposed cognates are lax in semantic associations, involve onomatopoeia, are forms too short to deny chance, include nursery forms and do not follow the sound correspondences formulated by supporters of Nostratic.

A large number of the putative cognate sets are considered problematic or doubtful even by its adherents. More than one-third of the sets are represented in only two of the putative Nostratic branches, though by its founder’s criteria, acceptable cases need to appear in at least three of the Nostratic language families. Numerous sets appear to involve borrowing. (See Campbell 1998, 1999.) It is for reasons of this sort that most historical linguists reject Nostratic.

It is probably correct that consensus among specialists is to reject Nostratic, but serious papers taking apart the proposal seem to be lacking. Nevertheless, most dismiss it, and it is beginning to enter into the emotionally charged terrain of Altaic and Amerind, particularly the former, and belief in it is becoming a thing of ridicule as it is for Altaic. Even so, there have been a few excellent linguists doing work on this very long-range family for decades now.

Campbell and Mixco:

Indo-Uralic: The hypothesis that the Indo-European and Uralic language families are genetically related to one another. While there is some suggestive evidence for the hypothesis, it has not yet been possible to confirm the proposed relationship.

This summary seems too negative. Indo-Uralic is probably one of the most promising long-range proposals out there. I regard the relationship between the two as obvious, but to me it is only a smaller part of the larger Eurasiatic family. Frederik Kortlandt has done a lot of good work on this idea. Even some hardline splitters are open to this hypothesis.

Campbell and Mixco:

Altaic: While ‘Altaic’ is repeated in encyclopedias and handbooks most specialists in these languages no longer believe that the three traditional supposed Altaic groups, Turkic, Mongolian and Tungusic, are related. In spite of this, Altaic does have a few dedicated followers.

The most serious problems for the Altaic proposal are the extensive lexical borrowing across inner Asia and among the ‘Altaic’ languages, lack of significant numbers of convincing cognates, extensive areal diffusion and typologically commonplace traits presented as evidence of relationship.

The shared ‘Altaic’ traits typically cited include vowel harmony, relatively simple phoneme inventories, agglutination, their exclusively suffixing nature, (S)OV ([Subject]-Object-Verb) word order and the fact that their non-main clauses are mostly non-finite (participial) constructions.

These shared features are not only commonplace typological traits that occur with frequency in unrelated languages of the world and therefore could easily have developed independently, but they are also areal traits shared by a number of languages in surrounding regions the structural properties of which were not well-known when the hypothesis was first framed.

This one is still up in the air, but Campbell and Mixco are lying when they say that the idea has been abandoned. Most US linguists regard it as a laughingstock, and if you say you believe in it you will experience intense bullying and taunting from them. Oddly enough, outside the US, in Europe in particular, Altaic is regarded as obviously true. However, notorious anti-Altaicist Alexander Vovin has camped out in Paris and is now spreading his nihilistic doctrine to Europeans there.

The problem is that almost all of the US linguists who will laugh in your face and call you an idiot if you believe in Altaic are not specialists in the language. However, I did a study of Altaic specialists, and 7

So the anti-Altaicists are pushing a massive lie – that critical consensus has completely abandoned Altaic and regards it as a laughingstock, but their project is more Politics and Propaganda than Science. In particular, it’s a fad. So Altaic is in the preposterous position where almost all of the people who know nothing about it will laugh in your face and call you an idiot if you believe in it and the overwhelming majority of specialists will say it’s real.

Altaic must be the only nonexistent family that has an incredibly elaborate 1,000 page etymological dictionary, full reconstructions of the proto-languages, etymologies of over 2,000 Altaic terms, and elaborate sound correspondences running through it. The anti-Altaicists use the silly “we can’t reconstruct the numerals so it’s not real” line here.

Altaic is obviously true based on 1-2 person pronoun paradigms at an absolute minimum. The anti-Altaic argument of course, is preposterous. As noted, they dismiss a vast 1,000 page Etymological Dictionary with 2,300 reconstructed etymologies as a hallucinated work.

There are vast parallels in all three families at all levels, in particular in the Mongolic-Tungusic family, which gets a 10

The argument that entire 1-2 pronoun paradigms have been borrowed is particularly preposterous because 1-2 pronouns are almost never borrowed anyway, and there has never been a single case on Earth of the borrowing of a 1-2 person pronoun paradigm, much less the borrowing of one at the proto-language level. So the anti-Altaicists are arguing that something that has never happened anywhere on Earth not only happened, but happened more than once among different proto-languages. So the anti-Altaic argument is that something that could not possibly have happened actually occurred.

This is the conclusion of every paper the splitters write. Something that has never occurred on Earth and probably could not possibly happen not only occurred, but occurred many times around the globe for thousands of years.

Many regard including Japonic and Koreanic in Altaic as dubious, although having looked over the data, I am certain that they are part of Altaic. But they seem to be further away from the traditional tripartite system than the traditional three families are to each other. If we follow the theory that Japanese and Korean have been split from Proto-Altaic for 8,000 years, this starts to make a lot more sense.

The ridiculous massive borrowings argument specifically fails for geographical reasons. Proto-Turkic was never next door to Proto-Mongolic and Proto-Tungusic. The Proto-Altaic homeland is in the Khingan Mountains in Western Manchuria and Eastern Mongolia. Tungusic split off from Altaic 5,300 years ago, leaving Proto-Turkic-Mongolic in the Khingans. 3,400 years ago, Proto-Turkic broke from Proto-Turkic-Mongolic and headed west to Northern Kazakhstan and the southern part of the Western Siberian Plain, leaving Mongolic alone in the Khingans.

Proto-Transeurasian – Khingans 9,000 YBP.

Proto-Korean – Liaojiang on the north shore of the Bohai Sea 8,000 YBP.

Proto-Japanese – Northern coast of the Shandong Peninsula on the southern shore of the Bohai Sea 8,000 YBP.

Proto-Tungusic – Amur Peninsula 5,300 YBP. Breaks apart 2,000 YBP.

Proto-Turkic – Northern Kazakhstan 3,400 YBP.

Proto-Mongolic – Khingans 3,400 YBP.

Can someone explain to me how Mongolic and Tungusic borrow from Turkic 3,000 miles away in a different place at a different time in this scenario? Can someone explain to me how any of these proto-languages borrowed from each other at all, especially as they were in different places at different times?

Not only that but supposedly both Proto-Mongolic and Proto-Tungusic each borrowed from Proto-Turkic separately. These borrowings included massive amounts of core vocabulary in addition to an entire 1st and 2nd person pronoun paradigm.

Keep in mind that the borrowing of this paradigm, something that has never happened anywhere, supposedly occurred not just once but twice: once into Proto-Tungusic, spoken on the Amur 5,300 YBP, from Proto-Turkic in Northern Kazakhstan, 3,000 miles away and some 2,000 years later, and again, at the same time, into Proto-Mongolic in the Khingans from Proto-Turkic in Northern Kazakhstan 3,000 miles away. How exactly did this occur?

And can someone explain to me how Proto-Korean and Proto-Japanese borrow from either of the others under this scenario?
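For readers who want to check the arithmetic behind the chronology argument, here is a minimal Python sketch. It uses only the dates listed above; the dictionary and function names are my own labels for illustration, and the "2,000 years later" figure in the text is a rounding of the gap the sketch computes.

```python
# Dates in years before present (YBP), copied from the list above.
# The names SPLIT_DATES_YBP and gap_in_years are illustrative only.
SPLIT_DATES_YBP = {
    "Proto-Transeurasian": 9000,
    "Proto-Korean": 8000,
    "Proto-Japanese": 8000,
    "Proto-Tungusic": 5300,
    "Proto-Turkic": 3400,
    "Proto-Mongolic": 3400,
}


def gap_in_years(older: str, younger: str) -> int:
    """Return how many years the `older` stage predates the `younger` one."""
    return SPLIT_DATES_YBP[older] - SPLIT_DATES_YBP[younger]


if __name__ == "__main__":
    # How long before Proto-Turkic did each of these stages exist?
    for name in ("Proto-Tungusic", "Proto-Korean", "Proto-Japanese"):
        years = gap_in_years(name, "Proto-Turkic")
        print(f"{name} predates Proto-Turkic by {years} years")
```

On these dates, any direct borrowing from Proto-Turkic into Proto-Tungusic would have to bridge a gap of nearly two millennia as well as the distance between the Amur and Northern Kazakhstan.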

Campbell and Mixco:

Turkic: A family of about thirty languages, spoken across central Asia from China to Lithuania. The family has two branches: Chuvash (of the Volga region) and the non-Chuvash Turkic branch of relatively closely related languages. Some of the Turkic languages are Azeri, Kyrgyz, Tatar, Crimean Tatar, Uighur, Uzbek, Yakut, Tuvan, and Tofa. Turkic is often assigned to the ‘Altaic’ hypothesis, though specialists have largely abandoned Altaic.

As noted above, it is simply incorrect that specialists have largely abandoned Altaic. This is simply carefully crafted propaganda on the part of Campbell and Mixco. In fact, my own study showed that 7

Campbell and Mixco:

Some scholars classify Korean in a single family with Japanese; however, this is a controversial hypothesis. Korean is often said to belong with the Altaic hypothesis, often also with Japanese, though this is not widely supported.

Japonic-Koreanic has considerable support among specialists in these languages, although it is not universally accepted. Campbell and Mixco are excessively negative about the level of support for an expanded Altaic. In fact, an expanded Altaic which includes Japanese and Korean in some part of it has significant though probably not majority support. Perhaps 30-4

Shandong Peninsula with Tianjin and Liaojiang across the Bohai Sea, location of the Proto-Japonic and Proto-Korean homelands.

Proto-Japonic and Proto-Koreanic were both spoken in Northeastern China 8,000 YBP. Proto-Japonic was spoken on the north of the Shandong Peninsula, and Proto-Koreanic was spoken across the Bohai Sea in Tianjin and especially across the Bohai Strait on the Liaodong Peninsula. They may have stayed here next to each other for 3,000 years until the Proto-Koreanics moved to the Korean Peninsula 5,000 YBP, displacing the Ainuid types there. Proto-Japonics probably stayed in Shandong until 2,300 YBP, when they left to populate Japan and the Ryukyus, displacing the Ainu who were already there.

Campbell and Mixco:

Yeniseian, Yenisseian: Small language family of southern Siberia of which Ket (Khet) is the only surviving member. Yeniseian has no known broader relatives, though some have been hypothesized (see the Dené-Caucasian hypothesis).

Campbell and Mixco state a serious untruth here, including some weasel words. By discussing Dene-Caucasian in the same breath as relatives of Yeniseian, they are able to deflect away from the more widely accepted proposal of a link between Yeniseian in the Old World and Na-Dene in the New World. This is Edward Vajda’s Dene-Yeniseian proposal.

The problem is that this long-range proposal has the support of many people, including splitter Johanna Nichols. Of the 17 experts who weighed in on Dene-Yeniseian, 15 of them had a positive view of the hypothesis. Campbell and Mixco are the only two who are negative, but neither is an expert on either family. All specialists in either or both families support the proposal. When 15 out of 17 is not enough, one wonders at what point the field reaches a consensus. Must we hold out for Campbell and Mixco’s approval for everything?
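To put a number on "15 out of 17," here is a small Python sketch. The tally comes from the paragraph above; the thresholds are purely hypothetical cutoffs I chose for illustration, since the field has no agreed numerical definition of consensus.

```python
from fractions import Fraction


def support_share(in_favor: int, total: int) -> Fraction:
    """Exact share of respondents in favor of a proposal."""
    if total <= 0 or not 0 <= in_favor <= total:
        raise ValueError("need 0 <= in_favor <= total and total > 0")
    return Fraction(in_favor, total)


# Hypothetical cutoffs, not an established standard in the field.
ILLUSTRATIVE_THRESHOLDS = {
    "two-thirds": Fraction(2, 3),
    "three-quarters": Fraction(3, 4),
    "nine-tenths": Fraction(9, 10),
}

if __name__ == "__main__":
    share = support_share(15, 17)  # tally reported in the text
    print(f"Share in favor: {float(share):.1%}")
    for label, cutoff in ILLUSTRATIVE_THRESHOLDS.items():
        verdict = "meets" if share >= cutoff else "falls short of"
        print(f"  {verdict} a {label} threshold")
```

The same helper can be pointed at any other tally in this piece to compare levels of support on an equal footing.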

Campbell and Mixco:

Nivkh (also called Gilyak): A language isolate spoken in the northern part of Sakhalin Island and along the Amur River of Manchuria, in China. There have been various unsuccessful attempts to link Nivkh genetically with various other language groupings, including Eurasiatic and Nostratic.

Granted, there is no consensus on the affiliation of Nivkhi. However, a recent paper by Sergei Nikolaev proved to me that Nivkhi is related to Algonquian-Wakashan, a family of languages in the Americas. One of these languages is Wakashan, and there has been talk of links between Wakashan and the Old World for some time.

Michael Fortescue places Nivkhi in Chukotko-Kamchatkan. Greenberg places it in Eurasiatic as a separate node. But as Chukotko-Kamchatkan is part of Eurasiatic, they are both saying the same thing in a way. My theory is that Nivkhi is Eurasiatic, possibly related to Chukotko-Kamchatkan, and like Yeniseian, is also connected to languages in North America, as some of the Nivkhi probably migrated to North America and became the American Indians. In this way, we can reconcile both hypotheses.

There are three specialist views on Nivkhi. One says it is Eurasiatic, another that it is Chukotian, and the third that it is part of the Algonquian-Wakashan or Almosan family in the New World. Consensus is that Nivkhi is related to one of two other entities – other languages in Northeastern Asia or a New World Amerindian family. So expert consensus seems to have moved away from the view of Nivkhi as an isolate.

Campbell and Mixco:

Paleosiberian languages (also sometimes called Paleoasiatic, Hyperborean languages): A geographical (not genetic) designation for several otherwise unaffiliated languages (isolates) and small language families of Siberia.

Perhaps the main thing that unites these languages is that they are not Turkic, Russian or Tungusic, the better known languages of Siberia. Languages often listed as Paleosiberian are: Chukchi, Koryak, Kamchadal (Itelmen), Yukaghir, Yeniseian (Ket) and Nivkh (Gilyak). These have no known genetic relationship to one other.

Taken as a broad statement, of course this is true. However, Chukchi, Koryak, and Kamchadal or Itelmen are part of a family called Chukotko-Kamchatkan. This family has even been reconstructed. Campbell and Mixco’s statement that these languages have no known genetic relationship with each other is false.

Campbell and Mixco:

Austroasiatic: A proposed genetic relationship between Mon-Khmer and Munda, accepted as valid by many scholars but not by all.

The fact is that Austroasiatic is not a “proposed genetic relationship.” Instead it is now accepted by consensus. That there may be a few outliers who don’t believe in it is not important. I’m not aware of any linguists who doubt Austroasiatic other than Campbell and Mixco, and neither is a specialist. Austroasiatic-Hmong-Mien is the best long-range proposal for Austroasiatic, but it has probably not yet been proven. Austroasiatic is also part of the expanded version of the Austric hypothesis.

Campbell and Mixco:

Miao-Yao (also called Hmong-Mien): A language family spoken by the Miao and Yao peoples of southern China and Southeast Asia. Some proposals would classify Miao-Yao with Sino-Tibetan, others with Tai or Austronesian; none of these has much support.

This seems to be more weasel wording on the part of the authors. By listing Tai or Austronesian and Sino-Tibetan as possible relatives of Miao-Yao and then correctly dismissing them, they leave out a much better proposal linking Hmong-Mien to Austroasiatic.

This shows some promise, but the relationship is hard to see amidst all of the Chinese borrowing. As noted, the relationship between Hmong-Mien and Sino-Tibetan is one of borrowing. The relationship with Tai or Austronesian is part of Paul Benedict’s original Austric proposal. He later turned against this proposal and supported a more watered down Austric with Austronesian and Tai-Kadai, which seems to be nearing consensus support now.

Campbell and Mixco:

Austric: A mostly discounted hypothesis of distant genetic relationship proposed by Paul Benedict that would group together the Austronesian, Tai-Kadai and Miao-Yao.

More weasel wording. It is correct that Benedict’s original Austric (which also included Austroasiatic) was abandoned even by Benedict himself, but a more watered-down version that he later supported, consisting of Austronesian and Tai-Kadai and called Austro-Tai, has much more support. They get around discussing the watered-down Austro-Tai with good support by limiting Austric to Benedict’s own theory, which even he rejected later in life. In this sense, they misrepresent the debate, probably deliberately.

In fact, evidence is building towards acceptance of Austro-Tai after papers by Weera Ostapirat and Laurence Sagart seem to have proved the case using the comparative method. Roger Blench also supports the concept. In addition to Benedict, it is also supported by Lawrence Reid and Hui Li. It is opposed by Graham Thurgood, who is a specialist (he was my main academic advisor on my Master’s Degree in Linguistics). It is also opposed by Campbell and Mixco, but they are not specialists. Looking at expert opinion, we have seven arguing for the theory and one arguing against it. Specialist consensus then is that Austro-Tai is a real language family.

Even the larger version of Austric, including all of Benedict’s families plus Ainu and the South Indian isolate Nihali, has some supporters and some suggestive evidence that it may be correct.

Campbell and Mixco:

Tai-Kadai: A large language family, generally but not universally accepted, of languages located in Southeast Asia and southern China. The family includes Tai, Kam-Sui, Kadai and various other languages. The genetic relatedness of several proposed Tai-Kadai languages is not yet settled.

Tai-Kadai is not “generally but not universally accepted.” It is accepted by consensus as an existent language family. Perhaps whether some languages belong there is in doubt, but the proposal itself is not controversial. Campbell and Mixco’s statement that Tai-Kadai remains controversial is a serious distortion of fact.

Campbell and Mixco:

Na-Dene: A disputed proposal of distant genetic relationship, put forward by Sapir, that would group Haida, Tlingit and Eyak-Athabaskan. There is considerable disagreement about whether Haida is related to the others. The relationship between Tlingit and Eyak-Athabaskan seems more likely, and some scholars misleadingly use the name ‘Na-Dené’ to mean a grouping of these two without Haida.

Levine and Michael Krauss, two top Na-Dene experts, are on record as opposing the addition of Haida to Na-Dene for 40 years. A recent conference about Edward Vajda’s Dene-Yeniseian concluded that there was no evidence to include Haida in Na-Dene. However, a recent paper by Alexander Manaster-Ramer made the case that Haida is part of Na-Dene. This paper was enough to convince me. Further, the scholar with the most expertise on Haida has said that Haida is part of Na-Dene. So Campbell and Mixco are correct here that the subject is up in the air with both supporters and opponents.

The statement that a relationship between Tlingit and Eyak-Athabaskan seems “more likely” is an understatement. I believe it is now linguistic consensus that Tlingit is part of Na-Dene, so Campbell and Mixco’s statement is not quite true.

Campbell and Mixco:

Tonkawa: An extinct language isolate of Texas. Proposals to link Tonkawa with the languages of the Coahuiltecan or Hokan-Coahuiltecan hypotheses have not generally been accepted.

I’m sure it is the case that Coahuiltecan and Hokan-Coahuiltecan affiliations of Tonkawa have been rejected. A Coahuiltecan connection was even denied by Manaster-Ramer, who recently proved that the family existed. That said, there are interesting parallels between Tonkawa and Coahuiltecan that I cannot explain. However, a recent paper by Manaster-Ramer made the much better case that Tonkawa was in fact Na-Dene.

Campbell and Mixco:

Amerind: The Amerind hypothesis is rejected by nearly all practicing American Indianists and by most historical linguists. Specialists maintain that valid methods do not at present permit classification of Native American languages into fewer than about 180 independent language families and isolates. Amerind has been highly criticized on various grounds. There is an excessive number of errors in Greenberg’s data.

Where Greenberg stops – after assembling superficial similarities and declaring them due to common ancestry – is where other linguists begin. Since such similarities can be due to chance similarity, borrowing, onomatopoeia, sound symbolism, nursery words (the mama, papa, nana, dada, caca sort), misanalysis, and much more, for a plausible proposal of remote linguistic relationship one must attempt to eliminate all other possible explanations, leaving a shared common ancestor as the most likely.

Greenberg made no attempt to eliminate these other explanations, and the similarities he amassed appear to be due mostly to accident and a combination of these other factors.

In various instances, Greenberg compared arbitrary segments of words, equated words with very different meanings (for example, ‘excrement/night/grass’), misidentified many languages, failed to analyze the morphology of some words and falsely analyzed that of others, neglected regular sound correspondences, failed to eliminate loanwords and misinterpreted well-established findings.

The Amerind ‘etymologies’ proposed are often limited to a very few languages of the many involved. Finnish, Japanese, Basque and other randomly chosen languages fit Greenberg’s Amerind data as well as or better than do any of the American Indian languages in his ‘etymologies’; Greenberg’s method has proven incapable of distinguishing implausible relationships from Amerind generally. In short, it is with good reason Amerind has been rejected.

The movement into the Americas came in three waves.

The first wave brought the Amerinds. It is here where the 160 language families reside. According to the reigning theory in Linguistics, this group of Amerindians came in one wave that spoke not only 160 different languages but spoke languages that came from 160 different language families, none of which were related to each other. These being language families which, by the way, we can find scarcely a trace of in the Old World.

The second wave was the Na-Dene people who came along the west coast and then went inland.

The last wave was the Inuit.

Greenberg simply lumped all of the 600 languages of the Americas into a single family. The argument was good, though I’m not sure he proved that every single one of those languages was part of Amerind. But a lot of them were. The n- m- 1st and 2nd person pronouns are found in 450 of those languages. The ablauted words t’ana, t’una, and t’ina, meaning respectively human child of either sex, all females including family terms, and all males including family terms, are extremely common in Amerind.

So t’ana just means child. T’una means girl, woman, and includes various names for all sorts of female relatives – grandmother, cousin, aunt, niece, etc. T’ina means boy, man, and includes the family terms grandfather, brother-in-law, uncle, cousin, and nephew. This ablauted paradigm is found across a vast number of these Amerind languages, and it is nonexistent in the rest of the world.

Quite probably most if not all of those languages having that term are part of a single family. What are the other arguments? That 300 languages independently innovated these terms, in this precise ablauted paradigm, on their own? What is the likelihood of that?

That the occurrence of these items across such vast swathes of languages is due to chance? But this paradigm does not exist anywhere else, so how could it be due to chance? That these core vocabulary items were borrowed massively all across the Americas, when family terms like that are rarely borrowed? That’s not possible. None of the alternate theories makes the slightest bit of sense.

Hence, the Amerind languages that have the n- m- pronoun paradigm and the t’ana, t’una, t’ina ablauted names for the sexes and the terms of family relations by sex are quite probably part of a huge language family. I’m well aware that a few of the languages having those terms could be due to chance. I’m pretty sure that about zero of those pronouns and few, if any, of those family terms were borrowed.

However, not all Amerind languages have either the pronoun paradigm or the ablauted sex term. In those cases, I’m unsure if those languages are all part of the same family. But if you can put those languages in families and reconstruct to the proto-languages and end up with the pronoun paradigm or the ablauted family term reconstructed in the proto-language of that family, I’m sure that family would be part of Amerind. That’s about all you have to do to prove relationship in Amerind.

Campbell and Mixco:

Penutian: A very large proposed distant genetic relationship in western North America, suggested originally by Dixon and Kroeber for the Californian language families Wintuan, Maiduan, Yokutsan, and Miwok-Costanoan. The name is based on words for ‘two’, something like pen in Wintuan, Maiduan, and Yokutsan, and uti in Miwok-Costanoan, joined to form Penutian.

Sapir, impressed with the hypothesis, attempted to add an Oregon Penutian (Takelma, Coos, Siuslaw, and ‘Yakonan’), Chinook, Tsimshian, a Plateau Penutian (Sahaptian, ‘Molala-Cayuse,’ and Klamath-Modoc) and a Mexican Penutian (Mixe-Zoquean and Huave).

The Penutian grouping has been influential, and later proposals have attempted to unite various languages from Alaska to Bolivia with it. Nevertheless, it had a shaky foundation based on extremely limited evidence, and, in spite of extensive later research, it did not prove possible to demonstrate any version of the Penutian hypothesis and several prominent Penutian specialists abandoned it. Today it remains controversial and unconfirmed, with some supporters but with many who doubt it.

The statement that today it “remains controversial and unconfirmed, with some supporters but with many who doubt it,” is misleading. It is surely controversial, and it is probably unconfirmed by linguistic consensus. Yes, it has a number of supporters, and there are quite a few who doubt it. However, among those who doubt it, none of them are specialists in these languages. Hence, we are dealing with an Altaic situation here, where the specialists believe in it but the non-specialists insist it’s nonsense.

In fact, the consensus among the specialists on these languages is that Penutian exists. A Penutian family comprising Maiduan, Utian (Miwok-Costanoan), Wintuan, Yokutsan, Coosan, Siuslaw, Takelma, Kalapuyan, Alsean (Yakonan), Chinookan, Tsimshianic, Klamath-Modoc (Lutuami), Cayuse and Molala (Waiilatpuan), and Sahaptian has been proven to my satisfaction. I am uncertain of the Penutian status of Mixe-Zoque and Huave (Mexican Penutian), although I believe that Huave and Mixe-Zoque are related to each other, albeit at a very deep time depth of 9,000 years.

Anti-Penutianists have not published a paper in a long time. The last one I remember was published by William Shipley, and he’s been gone for a while. I am not aware of one expert on these languages who says Penutian does not exist.

Campbell and Mixco:

Cayuse-Molala: A genetic classification no longer believed that linked Cayuse (of Oregon and Washington) and Molala (of Oregon) in a single assumed family. The evidence for this was later shown to be wrong and the hypothesis was abandoned.

According to Campbell and Mixco, Cayuse is an isolate. I assume they see Molala as an isolate too. There probably is no Cayuse-Molala family, but Molala is part of Plateau Penutian, and Cayuse may be part of the same group. Plateau Penutian is part of the Penutian hypothesis, which appears to be true. Because it omits these facts, Campbell and Mixco’s statement is quite misleading.

Campbell and Mixco:

Mosan: A now abandoned proposal of distant genetic relationship that would group Salishan, Wakashan and Chimakuan together.

Another part of this proposal was that Mosan was part of a larger family with Algonquian called Almosan. An excellent series of papers was published recently by Sergei Nikolaev that validated Almosan and proved to me that it was related to Nivkhi in the Old World.

Michael Fortescue argued a few years before that Mosan was a valid entity and that it was related to the Old World language Nivkhi. Recently, Murray Gell-Mann, Ilia Peiros, and Georgiy Starostin also supported Almosan and grouped it with Chukotko-Kamchatkan and Nivkhi. David Beck recently argued that Mosan is a language area or Sprachbund instead of a genetic family.

So far we have four specialists arguing that Mosan exists, and one saying it does not. The consensus among specialists seems to be that Mosan is a valid language family. At any rate, Campbell and Mixco’s statement that this proposal is “now abandoned” is false.

For Almosan, we have four specialists saying it exists and two apparently saying it does not. Expert consensus on Almosan is optimistic.

Campbell and Mixco:

Hokan: A controversial hypothesis of distant genetic relationship proposed by Dixon and Kroeber among certain languages of California; the original list included Shastan, Chimariko, Pomoan, Karok, and Yana, to which they soon added Esselen, Yuman, and later Chumashan, Salinan, Seri, and Tequistlatecan. Later scholars, especially Edward Sapir, proposed various additions to Hokan. Many ‘Hokan’ specialists doubt the validity of the hypothesis.

It is not true that many Hokan specialists “doubt the validity of the hypothesis.” I can’t remember the last time I saw an anti-Hokan paper. Yes, Campbell, Mixco, and Mithun say Hokan does not exist, but they are not specialists. The consensus among specialists such as Mikhail Zhivlov, Terence Kaufman, and Marcelo Jolkesky is that Hokan exists. I have only found one specialist who disagrees with the Hokan hypothesis, and she merely doubts the Hokan status of Ch’imáriko.

I believe that a Hokan family consisting of Karuk, Shasta-Palaihnihan, Ch’imáriko, Yana, Salinan, Pomoan, Yuman, Seri, and Tequistlatecan exists, although I would leave out Chumashan, Washo, and Jicaquean or Tolan. Chumashan is an isolate, and while Washo and Tolan may be Hokan at a very deep time depth, the few possible cognates are not enough to provide evidence of this. I am agnostic on Esselen, which is only known from a 350-word list collected by friars at a California mission.

I have not seen any evidence that Coahuiltecan is Hokan. There is some evidence, though not probative enough for me, that Lencan and Misumalpan may be Hokan. Nevertheless, Lencan and Misumalpan form a language family that has even been accepted by Campbell himself. This is the only long-range family proposal he has supported since the publication of LIA.

Although Campbell’s opinion on many hypotheses may be waved away as he is not an expert on that family or language, Lencan and Misumalpan are right up his alley as he is an expert in languages in Central America. He has focused mostly on Mayan, but he also knows the other languages of the region well.

Campbell and Mixco:

Cochimí–Yuman: A family of languages from Arizona, California and Baja California, with two branches, extinct Cochimí (of Baja California) and the Yuman subfamily (members of which are Kiliwa, Diegueño, Cocopa, Mojave, Maricopa, Paipai, and Walapai–Havasupai–Yavapai, among others). Cochimí–Yuman is often associated with the controversial Hokan hypothesis, though evidence is insufficient to embrace the proposed relationship.

The consensus among experts in the Cochimí–Yuman family, including Mikhail Zhivlov and Terence Kaufman, is that it is part of the Hokan family. Campbell disbelieves in the association, but he is not an expert. However, Mixco opposes the Hokan affinity of Cochimí–Yuman, and granted, he is actually a specialist on these languages. So among specialists, we have two who support the Hokan association and one who opposes it. The specialist consensus then would be that this association is a promising hypothesis, but it is not yet proven. This is different from Campbell and Mixco’s wording, which is more negative.

Campbell and Mixco:

Coahuiltecan: A hypothesis of distant genetic relationship that proposed to group some languages of south Texas and northern Mexico: Coahuilteco, Comecrudo and Cotoname, and sometimes also Tonkawa, Karankawa, Atakapa and Maratino (with Aranama and Solano assumed to be varieties of Coahuilteco).

Sapir proposed a broader classification of Hokan–Coahuiltecan, joining the Coahuiltecan proposal with the broader Hokan hypothesis, and placed this in his even larger Hokan–Siouan super-stock. None of these proposals has proven sufficiently robust to be accepted generally.

I am not aware of any specialists who have recently argued against the existence of Coahuiltecan. Yes, Campbell and Mixco do not accept it, but they are not specialists. A recent paper by Alexander Manaster-Ramer proved the existence of Coahuiltecan to my satisfaction. I believe that a Coahuiltecan family consisting of Comecrudo, Cotoname, Aranama, Solano, Mamulique, Garza, and Coahuilteco absolutely exists. Karankawa is probably a part of this family. I am not aware that any specialist is arguing against the existence of this family at the moment.

I do not think there is good evidence for including other postulated members such as Atakapa and Tonkawa. First of all, Tonkawa is probably Na-Dene as per another paper by Manaster-Ramer. Atakapa is part of the Gulf family. However, I am not yet convinced that Coahuiltecan is a member of the Hokan language family.

Campbell and Mixco:

Gulf: Hypothesis of a distant genetic relationship proposed by Mary R. Haas that would group Muskogean, Natchez, Tunica, Atakapa and Chitimacha, no longer supported by most linguists.

The notion that Gulf is no longer supported by most linguists is simply incorrect. There have only been four linguists who studied this family.

The first was Mary Haas, who also proposed a relationship with Yuki as Yuki-Gulf. Haas was always dubious about Chitimacha’s addition to Gulf.

Greenberg resurrected Yuki-Gulf in LIA.

Pam Munro is an expert on these languages. A while back she published a paper on Yuki-Gulf. I read that paper. The resemblances are so stunning between Muskogean, Natchez, Tunica, Atakapa and Chitimacha that I was shocked that anyone doubted the relationship. Furthermore, the relationship with Yuki and Wappo, a full 2,500 miles away in Northern California, was shocking.

The fourth was Geoffrey Kimball, who concluded that Gulf was probably a family but that this could not be proven.

The evidence for Gulf in Munro’s paper was good, and there even appeared to be sound correspondences running through the relationship. What was shocking about it was that Yuki and Wappo could not possibly have borrowed from Gulf because Gulf is in Louisiana 2,500 miles away. So how did all these resemblances come in? Chance is ruled out. Borrowing could not have happened. Therefore a relationship at least between Yuki and the Gulf languages is obvious.

Munro’s paper took the position that Greenberg’s Yuki-Gulf hypothesis was correct. However, there are some problems. First, Atakapa as part of Gulf has been controversial, in part because it has also been tied in with Coahuiltecan. Indeed there are resemblances between the two, and they were not spoken next to each other so borrowing can be ruled out.

Perhaps a way of solving the matter is to posit not only Yuki-Gulf but a larger family that includes Coahuiltecan as Greenberg does in LIA. I have no idea how justified this is, but there are certainly surprising resemblances between Atakapa and the Coahuiltecan languages.

Furthermore, whether or not Chitimacha is part of Gulf has been up in the air from the beginning when Haas published her paper. Recent papers have made the case that Chitimacha is related to Mesoamerican language families of Mexico such as Mixe-Zoque and Totonacan. These papers used the comparative method. Campbell has rejected this hypothesis.

That Tunica at the very least shows a close relationship with Muskogean is not even controversial. The idea has a long pedigree and is presently supported by all experts in this family.

Geoffrey Kimball examined the data recently and concluded that from the evidence, it appears that Gulf exists, but we will never be able to prove it, as he puts it. However, he stated that Tunica is almost certainly related to Muskogean. At this point, I would think that Tunica-Muskogean at the very least should be considered consensus among specialists.

Kimball’s paper had a number of problems, mostly that he was operating with a negative stance towards the existence of the family. Further, there were issues with his notions of sound symbolism and borrowing in the paper where his explanations made no sense at all.

Let’s evaluate Campbell and Mixco’s statement that Gulf is no longer supported by most linguists.

We have four specialists on record about whether or not a Gulf family exists.

Mary Haas: Positive, minus Chitimacha

Joseph Greenberg: Positive

Pamela Munro: Positive

Geoffrey Kimball: Probably exists but it’s not possible to prove it.

Brown et al: Chitimacha is a part of the Totonozoquean family, not the Gulf family. The other members of Gulf are not members of this family.

Three out of the four specialists on the Gulf family say that the Gulf family is a reality. The other feels it exists but cannot be proven. And there is uncertainty about whether Chitimacha is part of Gulf. The consensus among experts is that Gulf is a real language family.

Campbell and Mixco’s statement that Gulf is no longer supported by most linguists is simply false.

Furthermore, I would like to point out that a good case can be made for the existence of a Totonozoquean family consisting of the Mixe-Zoque and Totonacan languages. Whether this is consensus among experts is somewhat up in the air.

Campbell and Mixco:

Macro-Gê: A proposed distant genetic relationship composed of several language families and isolates, many now extinct, along the Atlantic coast (primarily of Brazil). These include Chiquitano, Bororoan, Botocudoan, Rikbaktsa, the Gê family proper, Jeikó, Kamakanan, Maxakalían, Purian, Fulnío, Ofayé and Guató. Many are sympathetic to the hypothesis and several of these languages will very probably be demonstrated to be related to one another eventually, though others will probably need to be separated out.

This is much too pessimistic. Macro-Gê is not a proposed long-range family; it is a large language family in South America accepted by consensus. It is not true that many are sympathetic to it; instead, the consensus is that it is correct. Nor is it correct to say that it will probably be demonstrated eventually. In fact, it is already an accepted reality.

Campbell and Mixco:

Quechumaran: Proposed distant genetic relationship that would join Quechuan and Aymaran. While considerable evidence has been gathered in support of the hypothesis, it is extremely difficult in this case to distinguish what may be inherited (and therefore evidence of a genetic relationship) from what may be diffused (and therefore not reliable evidence of a genetic connection).

It is true that there is no consensus on the existence of Quechumaran. The consensus seems to be as above that it is not yet proven. Those opposed to the idea throw out the usual borrowing scenario, but they have had to push the large number of borrowings in core vocabulary all the way back to Proto-Aymara and Proto-Quechua. In my opinion, “massive borrowing of core vocabulary at the proto-language level” is simply another word for genetics.

Gerald Clauson, the famous Turkologist opponent of Altaic, had to keep pushing his massive borrowings of core vocabulary further and further back until he eventually had the scenario taking place at the Proto-Turkic, Proto-Tungusic, and Proto-Mongolic levels. See above for my analysis on why these three proto-languages could not possibly have borrowed from each other, as they were in different places at different times.

A similar problem exists for opponents of the Uralo-Yukaghir theory, who are also forced to deal with a large amount of core vocabulary dating back a long time. Häkkinen tried to solve this problem by pushing the borrowing all the way back to not just Proto-Uralic but Pre-Proto-Uralic. Pre-Proto-Uralic at 8,000 years to me means nothing less than Uralo-Yukaghir. What else could it mean? He has heavy borrowing of core vocabulary between Pre-Proto-Uralic and Proto-Yukaghir. That’s another way of saying genetics.

Campbell and Mixco:

Macro-Guaicuruan (also spelled Macro-Waykuruan, Macro-Waikuruan): A proposed distant genetic relationship that would join the Guaicuruan and Matacoan families of the Gran Chaco in South America in a larger-scale genetic classification. Grammatical similarities, for example in the pronominal systems, have suggested the relationship to some scholars, but the extremely limited lexical evidence raises doubts for others. Some would also add Charruan and Mascoyan to these in an even larger ‘Macro-Waikuruan cluster.’

It is not true that this is a proposed long-range family suggested by some but doubted by others. In fact, Macro-Guaicuruan is accepted by consensus and is as uncontroversial as Macro-Gê, Pama-Nyungan, and other such families. There is, however, debate about which families are members outside of the Guaicuruan and Mataguayo language families that make up the essence of the family. There have been suggestions to add Lule-Vilela and the Zamucoan, Charruan, and Mascoyan families to this family. I do not feel that these additions are yet warranted.

Campbell and Mixco:

Pama-Nyungan: A very large, widely spread language family of Australia, some 175 languages. The name comes from Kenneth Hale, based on the words pama ‘man’ in the far northeast and nyunga ‘man’ in the southwest. Languages assigned to Pama-Nyungan extend over four-fifths of Australia, most of the continent except northern areas.

Pama-Nyungan is accepted by most Australianists as a legitimate language family, but not uncritically and not universally. It is rejected by Dixon; it is held by others to be plausible but inconclusive based on current evidence. Some Pama-Nyungan languages are Lardil, Kayardilt, Yukulta, Yidiny, Dyirbal, Pitta-Pitta, Arrente, Warlpiri, Western Desert language(s), and there are many more.

Actually, consensus now is that this family of Australian languages does indeed exist. True, Dixon challenged the existence of Pama-Nyungan recently, but his opposition was so outrageous that it prompted a quick surge of papers from Australianists defending the existence of Pama-Nyungan. The notion that other Australianists feel that Pama-Nyungan is possible but presently inconclusive is not correct. I am not aware of a single Australianist other than Dixon who feels this way. Instead, Pama-Nyungan is about as uncontroversial as Macro-Gê, Afroasiatic, or Austroasiatic.

Campbell and Mixco:

‘Papuan’ languages: A term of convenience used to refer to the languages of the western Pacific, most in New Guinea (Papua New Guinea and the Indonesian provinces of Papua and West Irian Jaya), that are neither Austronesian nor Australian. Papuan definitely does not refer to a genetic relationship among these languages for no such relationship can at present be shown.

That is, the term is defined negatively and does not imply a linguistic relationship. While most are spoken on the island of New Guinea, some are found in the Bismarck Archipelago, Bougainville Island and the Solomon Islands to the east, and in Halmahera, Timor and the Alor Archipelago to the west.

There are some 800 Papuan languages divided into a large number of mostly small language families and isolates not demonstrably related to one another.

For what it’s worth, this statement by Campbell and Mixco is correct.

Campbell and Mixco:

One large genetic grouping that has been posited for a number of Papuan languages is the Trans-New Guinea phylum, which is promising but not yet confirmed.

Trans-New Guinea is not “promising but not yet confirmed.” Instead it is an uncontroversial language family accepted by the consensus of all specialists.

References

Beck, David (1997). Mosan III: A Problem of Remote Common Proximity. International Conference on Salish (and Neighbo(u)ring) Languages.
Benedict, Paul K. (1942). “Thai, Kadai, and Indonesian: A New Alignment in Southeastern Asia.” American Anthropologist 44, 4: 576–601.
Benedict, Paul K. (1975). Austro-Thai Language and Culture, with a Glossary of Roots. New Haven: HRAF Press.
Blench, Roger (2008). The Prehistory of the Daic (Tai-Kadai) Speaking Peoples. Presented at the 12th EURASEAA Meeting in Leiden, the Netherlands, 1-5 September 2008.
Blench, Roger (2018). Tai-Kadai and Austronesian Are Related at Multiple Levels and Their Archaeological Interpretation (draft).
Blust, Robert (2014). “The Higher Phylogeny of Austronesian and the Position of Tai-Kadai: Another Look,” in The 14th International Symposium on Chinese Languages and Linguistics (IsCLL-14).
Campbell, Lyle and Marianne Mithun (Eds.) (1979). The Languages of Native America: An Historical and Comparative Assessment.
Campbell, Lyle and Mauricio J. Mixco (2007). A Glossary of Historical Linguistics. Edinburgh University Press.
Campbell, Lyle and William J. Poser (2008). Language Classification: History and Method. Cambridge: Cambridge University Press.
Fortescue, M. (1998). Language Relations across Bering Strait: Reappraising the Archaeological and Linguistic Evidence. (Nivkhi is Mosan.)
Fortescue, Michael (2011). “The Relationship of Nivkh to Chukotko-Kamchatkan Revisited.” Lingua 121, 8: 1359-1376. (Nivkhi is Chukotko-Kamchatkan.)
Gell-Mann, Murray; Ilia Peiros, and George Starostin (2009). “Distant Language Relationship: The Current Perspective.” Journal of Language Relationship.
Greenberg, Joseph H. (2000). Indo-European and Its Closest Relatives: The Eurasiatic Language Family. Volume 1, Grammar. Stanford: Stanford University Press.
Greenberg, Joseph H. (2002). Indo-European and Its Closest Relatives: The Eurasiatic Language Family. Volume 2, Lexicon. Stanford: Stanford University Press.
Heine, Bernd (1992). African Languages. International Encyclopedia of Linguistics, ed. by William Bright, Vol. 1, pp. 31-36. Oxford: Oxford University Press. (No such thing as Nilo-Saharan.)
Krauss, Michael E. (1979). Na-Dene and Eskimo-Aleut. The Languages of Native America: Historical and comparative assessment, ed. by Lyle Campbell and Marianne Mithun, pp. 803-901. Austin: University of Texas Press. (Haida not part of Na-Dene.)
Levine, Robert D. (1979). Haida and Na-Dene: A New Look at the evidence. IJAL 45: 157-70. (Haida not part of Na-Dene.)
Li, Hui (李辉) (2005). Genetic Structure of Austro-Tai Populations (Doctoral Dissertation). Fudan University.
Mixco, Mauricio J. (1976). “Kiliwa Texts.” International Journal of American Linguistics Native American Text Series 1: 92-101.
Mixco, Mauricio J. (1977). “The Linguistic Affiliation of the Ñakipa and Yakakwal of Lower California.” International Journal of American Linguistics 43: 189-200.
Nicolaï, Robert (1990). Parentés Linguistiques (À Propos du Songhay). Paris: CNRS. (Dimmendaal says Songhay is Nilo-Saharan.)
Nikolaev, S. (2015). Toward the Reconstruction of Proto-Algonquian-Wakashan. Part 1: Proof of the Algonquian-Wakashan Relationship.
Nikolaev, S. (2016). Toward the Reconstruction of Proto-Algonquian-Wakashan. Part 2: Algonquian-Wakashan Sound Correspondences.
Ostapirat, Weera (2005). “Kra-Dai and Austronesian: Notes on Phonological Correspondences and Vocabulary Distribution,”  in Laurent Sagart, Roger Blench and Alicia Sanchez-Mazas, eds. The Peopling of East Asia: Putting Together Archaeology, Linguistics, and Genetics, pp. 107-131. London: Routledge Curzon.
Ostapirat, Weera (2013). Austro-Tai Revisited. Paper Presented at the 23rd Annual Meeting of the Southeast Asian Linguistics Society, 29-31 May 2013, Chulalongkorn University.
Reid, Lawrence A. (2006). “Austro-Tai Hypotheses.” In Keith Brown (Ed.), The Encyclopedia of Language and Linguistics, 2nd Edition, pp. 609–610.
Sagart, Laurent (2005b). “Tai-Kadai as a Subgroup of Austronesian,” in L. Sagart, R. Blench, and A. Sanchez-Mazas (Eds.), The Peopling of East Asia: Putting Together Archaeology, Linguistics, and Genetics, pp. 177-181.
Sagart, Laurent (2019). “A Model of the Origin of Kra-Dai Tones.” Cahiers de Linguistique Asie Orientale. 48, 1: 1–29.
Thurgood, Graham (1994). “Tai-Kadai and Austronesian: The Nature of the Relationship.” Oceanic Linguistics 33: 345-368.

Three Academic Linguistics Sessions I Took Part In

Sessions on Linguistics papers that a friend of mine put up. On Academia, a lot of people put their papers up for informal peer review, which ends up being a session. They range from pleasant to heated and often the criticisms are quite barbed. This is the way that social science is supposed to be though – peer review is not supposed to be a walk in the park – if it is, you’re defeating the purpose and you’re not really doing science.
So if you want to know the stuff I read and comment on for kicks, go ahead and dig in. Don’t expect to understand anything unless you have a background in this stuff though. I have a Masters in this subject and 30 years of independent study under my belt, and still most of the people in these sessions completely kick my ass. Historical Linguistics is one of my specialties. I study it a lot, but I don’t think I could write a paper in it. It’s just so beyond my capabilities. I don’t understand how anyone does this stuff unless they have eidetic memories, which most of them apparently do.
I took part in these discussions, so I get an author credit, which is nice as far as it goes.
But if you have a background in Linguistics like Claudius and James Schipper and a few of the others, you may find these discussions interesting.
***********
A copy of the whole discussion session on the draft paper version of “Some Gününa Yajüch loanword etymologies for Mapudungun,” totaling a full 17 pages (with some tangential discussion) with 20 participants. Special thanks go to those who shared their thoughts on the tangential discussion of the Altaic language hypothesis. As usual, the input will be used to improve the manuscript to hopefully publishable standards.
***********

Is Afroasiatic Related to Indo-European?

Claudius: Very interesting. To me Afro-Asiatic seems very close to IE. But I don’t know anything about the other Eurasiatic or Nostratic families besides Uralic and Altaic (Japanese).

But IE is like AA with corrupted and limited ablaut. PIE verbs did have ablaut, just not to the extreme of AA languages. Even in PIE/IE, some nouns exhibit ablaut.

Part of the problem is that AA is so old. Nostratic itself is 15-18,000 years old, and AA is 13-15,000 years old itself. The numerals are still a mess. They’re probably not even reconstructible. Numerals get replaced more than people think. This silly numerals argument is also used to invalidate Altaic. But in Altaic most of the original numerals were replaced. However, some of the originals held on in lesser semantic roles. So they were still there, just harder to see as the main numeral forms got replaced by innovations.

AA is the most ancient language family that is universally accepted. Some say that Omotic is not proven to be part of it, but those are wild splitters like Lyle Campbell who reflexively object to everything in a reactionary manner. This reaction has absurdly taken over the whole field now. We can’t even agree that Altaic is real. For God’s sake, there’s a 1,300 page etymological dictionary of Altaic out there, and people still insist it’s not real!

It’s not particularly close to IE.

Core Nostratic is Uralic, IE and Altaic.

Altaic (Turkic, Mongolic, and Tungusic, including Japanese and Korean), Uralic (including Yukaghir), Eskimo-Aleut and Chukchi-Kamchatkan are possibly core Nostratic. Some include Etruscan.

Whether Afroasiatic is core Nostratic is controversial. Aharon Dolgopolsky thought it was. Allan Bomhard followed Dolgopolsky.

Later Nostratic concepts have placed Afroasiatic and Elamite parallel to Nostratic (Sergei Starostin). Others put AA, Kartvelian, Elamo-Dravidian as sub-branches within Nostratic (Bomhard). Starostin’s followers, including his son George, have placed AA back in core Nostratic.

Joseph Greenberg posited a subgroup of Nostratic called Eurasiatic. He did not include Dravidian and AA. Greenberg felt that AA and Dravidian were sisters to Nostratic as a whole. Bomhard put Eurasiatic as a sub-family of Nostratic alongside AA and Dravidian as two other sub-branches.

But there are definitely parallels between AA and IE all right. That’s clear.

Proto-Nostratic root *γor-:

(vb.) *γor- ‘to leave, to go away, to depart; to separate; to abandon’; (n.) *γor-a ‘leaving, departure; separation; abandonment’ Extended form: (vb.) *γor-V-b- ‘to leave, to go away, to depart; to separate; to abandon’; (n.) *γor-b-a ‘leaving, departure; separation; abandonment’

Afrasian: Proto-Semitic *γar-ab- ‘to leave, to go away, to depart’ > Arabic ġaraba ‘to go away, to depart, to absent (oneself), to withdraw (from), to leave (someone, something); to go to a foreign country; to expel from the homeland, to banish, to exile’, ġarba-t ‘removal, departure’, ġurba-t ‘absence from one’s homeland; separation from one’s native country, banishment, exile; life, or place, away from home’; Mehri əġtərōb ‘to be abroad, away from home’, ġərbēt ‘strange place, unknown place’; Śḥeri/Jibbāli aġtéréb ‘to be abroad, away from home’, ġarbέt ‘strange, unknown place; abroad’. Perhaps also Punic ʿrbt ‘desolation’ (?) in ḳl ʿrbt ‘the voice of desolation’ (interpretation highly uncertain) (cf. Hoftijzer-Jongeling 1995:887).

Proto-Indo-European *H₃orbʰ- ‘to be or become separated, abandoned, bereft’, *H₃orbʰ-o-s ‘(n.) orphan, servant; (adj.) bereft, abandoned, deprived (of)’:

Sanskrit árbha-ḥ ‘little, small; child’; Armenian orb ‘orphan’; Greek ὀρφανός ‘orphan, without parents, fatherless; (metaph.) bereft, abandoned’; Latin orbus ‘bereft, deprived by death of a relative or other dear one; bereaved (of); childless; an orphan’; Old Irish orb ‘heir’, orb(b)e, orpe ‘inheritance’; Gothic arbi ‘inheritance,’ arbja ‘heir’ (f. arbjō ‘heiress’); Old Icelandic arfi ‘heir, heiress’, arfr ‘inheritance, patrimony’, erfa ‘to inherit’, erfð ‘inheritance’; Old Swedish arve, arver ‘heir’; Danish arv ‘heir’; Norwegian arv ‘heir’; Old English ierfa, irfa ‘heir’, ierfe ‘inheritance, bequest, property’, erfe, irfe, yrfe ‘inheritance, (inherited) property’, irfan, yrfan ‘to inherit’; Old Frisian erva ‘heir’, erve ‘inheritance, inherited land, landed property’; Old Saxon erƀi ‘inheritance’; Middle Dutch erve ‘heir’; Old High German arbi, erbi ‘inheritance’, arbeo, erbo ‘heir’ (New High German Erbe ‘inheritance; heir’); Old Church Slavic rabъ ‘servant, slave’; Russian rab [раб] ‘slave, serf, bondsman’ (f. rabá [раба] ‘slave, serf, bondmaid’); Hittite (3rd sg. pres. act.) ḫar-ap-zi ‘to separate oneself and(re)associate oneself elsewhere’. 
Pokorny 1959:781-782 *orbho- ‘weak, abandoned; slave, orphan’; Walde 1927-1932:183-184 *orbho-; Mallory-Adams 1997:411 *h₂/h₃orbhos ‘orphan, heir’; Mann 1984-1987:884 *orbhəkos ‘young, tender; deprived, blind’, 884 *orbhənikos ‘young, minor, underage’, 884-885 *orbhət-, *orbhit- ‘deprived, bereft; deprivation, bereavement’, 885 *orbhi̯os adjectival form of *orbhos, 885 *orbhm̥mos (*orbhmos) ‘bereft, deprived’, 885—886 *orbhos, -i̯os, -i̯ə ‘deprived, bereft; child, orphan’; Watkins 1985:46 *orbh- ‘to put asunder, to separate’ (suffixed form *orbh-o- ‘bereft of father’) and 2000:60 *orbh- ‘to change allegiance, to pass from one status to another’ (oldest form *ə̯₃erbh-, colored to *ə̯₃orbh-) (suffixed form *orbh-o- ‘bereft of father’ also ‘deprived of free status’); Gamkrelidze-Ivanov 1995.I:399, I:651 *orbʰo- ‘deprived of one’s share, deprived of possessions; orphan; servant, slave’, I:781 *orbʰo-; Mayrhofer 1956—1980.I:52 and 1986—2001.I:119—120; Boisacq 1950:719 *orbho-s; Beekes 2010:1113—1114 *h₃orbʰ-o-; Frisk 1970-1973:431 *orbho-s; Chantraine 1968-1980:829 *orbho-; Hofmann 1966:240 *orbhos; Hübschmann 1897:482, no. 335, *orbhos; Martirosyan 2008:535-536 *Horbʰ-o-; Walde-Hofmann 1965-1972:219-220 *orbhos, *orbhi̯o-; Ernout-Meillet 1979:466—467; De Vaan 2008:433 *h₃orbʰ-o-; Derksen 2008:373 *h₃erbʰ-; Kroonen 2013:33 Proto-Germanic *arbja- ‘inheritance’ (< *h₃orbʰ-i̯o-), 33 Proto-Germanic *arbjan- ‘heir’ (< *h₃orbʰ-i̯on-); Orël 2003:22 Proto-Germanic *arƀaz, 22 Proto-Germanic *arƀjaz; Lehmann 1986:41-42 *orbho-; Feist 1939:56 *orbhi̯o-; Falk-Torp 1910-1911.I:34; De Vries 1977:12 and 13; Boutkan-Siebinga 2005:93 *h₃erbʰ-; Walshe 1951:48; Kluge-Mitzka 1967:170 *orbho-; Kluge-Seebold 1989:183-184 *orbhijo-, *orbho-; Kloekhorst 2008b:311-312 *h₃erbʰ-to; Puhvel 1984:176—183.

Proto-Nostratic (n.) *t’orʸ-a ‘tree, the parts of a tree’ (> ‘leaf, branch, bark, etc.’):

Proto-Afrasian *t’[o]r- ‘tree’, preserved in various tree names or names of parts of trees (‘leaves, branches, etc.’): Semitic: Akkadian ṭarpaʾu (ṭarpiʾu) ‘a variety of tamarisk’; Arabic ṭarfāʾ ‘tamarisk tree’; Hebrew ṭārāφ [ טָרָף ] ‘leaf’ (a hapax legomenon in the Bible); Aramaic ṭarpā, ṭǝraφ ‘leaf’; Syriac ṭerpā ‘leaf, branch’; Samaritan Aramaic ṭrp ‘leaf, part of a tree, branch’. Klein 1987:252. Egyptian dꜣb ‘fig tree’ (< *drb); West Chadic: Hausa ɗoorawaa ‘locust-bean tree’; East Chadic: Bidiya tirip ‘a kind of tree’ (assimilation of vowels). Orël—Stolbova 1995:516, no. 2464, *ṭarip- ‘tree’.

Proto-Indo-European *t’er-w/u-/*t’or-w/u-, *t’r-ew-/*t’r-ow-/*t’r-u- ‘tree, wood’: Greek δόρυ ‘tree, beam’, δρῦς ‘oak’; Hittite ta-ru ‘wood’; Albanian dru ‘tree, bark, wood’; Sanskrit dā́ru ‘a piece of wood, wood, timber’, drú-ḥ ‘wood or any wooden implement’; Avestan drvaēna- ‘wooden’, dāuru- ‘wood(en object), log’; Welsh derwen ‘oak’; Gothic triu ‘tree, wood’; Old Icelandic tré ‘tree’, tjara ‘tar’; Old English trēow ‘tree, wood’, tierwe, teoru ‘tar, resin’; Old Frisian trē ‘tree’; Old Saxon triu, treo ‘tree, beam’; New High German Teer ‘tar’; Lithuanian dervà ‘resinous wood’, dãrva ‘tar’; Old Church Slavic drěvo ‘tree’; Russian dérevo [дерево] ‘tree, wood’; Serbo-Croatian drȉjevo ‘tree, wood’; Czech dřevo ‘tree, wood’. Pokorny 1959:214—217 *deru-, *dō̆ru-, *dr(e)u-, *dreu̯ǝ-, *drū- ‘tree’; Walde 1927-1932:804-806 *dereu̯(o)-; Mann 1984-1987:142 *deru̯os, -ā, -i̯ǝ (*dreu̯-) ‘tree, wood, timber, pitchpine; pitch, tar, resin; hard, firm, solid, wooden’, 156 *dō̆ru ‘timber, pole, spike, spear’, 157 *doru̯os, -ā, -i̯ǝ ‘wood (timber); resin’, 161 *dru- (radical) ‘timber, wood’, 161 *drūi̯ō (*druu̯ō, *-i̯ō; *drūn-) ‘to harden, to strengthen’, 161 *drukos ‘hard, firm, wooden’, 162 *drus-, *drusos ‘firm, solid’, 162 *druu̯os, -om, -is ‘wooden, hard; wood’, 162 *drū̆tos ‘wooden, of oak, of hardwood; solid, firm, strong’, 165 *dr̥u̯is, -i̯ǝ ‘wood, trees, hardwood’, 165—166 *dr̥u̯os, -om; *drus-, *dru- ‘wood, timber, tree’; Gamkrelidze-Ivanov 1995:192 and 193 *t’er-w-, *t’or-w-, *t’r-eu-, *t’r-u- ‘oak (wood), tree’; Mallory-Adams 1997:598 *dóru ‘wood, tree’; Watkins 1985:12 *deru (also *dreu-) and 2000:16-17 *deru (also *dreu-) ‘to be firm, solid, steadfast’ (suffixed variant form *drew-o-; variant form *drou-; suffixed zero-grade form *dru-mo-; variant form *derw-; suffixed variant form *drū-ro-; lengthened zero-grade form *drū-; o-grade form *doru-; reduplicated form *der-drew-); Mayrhofer 1956-1980.II:36; Chantraine 1968-1980:294 *dor-w-, *dr-ew-; Frisk 1970-1973:411-412; Hofmann 1966:63 *dō̆ru; Beekes 2010.I:349 *doru; Boisacq 1950:197-198 *doru; Orël 1998:76 and 2003:405 Proto-Germanic *terwōn ~ *terwan, 409-410 *trewan; Kroonen 2013:514 Proto-Germanic *terwa/ōn- ‘tar’ and 522-523 Proto-Germanic *trewa- ‘tree’; Lehmann 1986:347-348 *deru-, *drewo-, *dr(e)w-(H-); Feist 1939:480-481 *der-eu̯-o-; De Vries 1977:591 *dreu-; Klein 1971:745 *derew(o)-, *drew(o)- and 779 *derow(o)-, *drew(o)-; Onions 1966:904 and 939 *deru-, *doru-; Kluge-Mitzka 1967:775 *deru-; Kluge-Seebold 1989:725 *deru-; Huld 1984:56 *dru-n-; Fraenkel 1962-1965:90-91; Derksen 2008:99 *deru-o- and 2015:123-124 *deru-o-; Smoczyński 2007:103; Osthoff 1901:98-180; Benveniste 1969:104-111 and 1973:85-91; P. Friedrich 1970:140-149 *dorw- ‘tree’ or ‘oak’.

Repost: Update to Races of Man Post


My earlier piece, The Major and Minor Races of Mankind, has been given a major update. The previous incarnation was:

3 Macro Races

Caucasian (Caucasoid)
Asian (Mongoloid)
African (Negroid)

I left the three macro races intact. I have debated whether or not to include new macro races, but I haven’t been able to come up with anything. The main problem is that all of the potential splits – Kalash, Pacific Islander, Papuan, Amerindian and Aborigine – are part of the existing macro races. The Kalash are part of the Caucasian race, and the rest are all indisputably Asians (yes, even Aborigines).

Previous version:

6 Major Races

Northeast Asian
Southeast Asian
Papuan
Aborigine
Caucasian
African

Revised version:

9 Major Races

Northeast Asian
Southeast Asian
Papuan
Aborigine
Caucasian
African
Kalash
Pacific Islander
Amerindian

The result looks something like this:

African Macro Race

General African Major Race

15 minor African races

Caucasian Macro Race

General Caucasian Major Race
Kalash Major Race

19 minor Caucasian races

Asian Macro Race

Northeast Asian Major Race
Southeast Asian Major Race
Amerindian Major Race
Papuan Major Race
Aborigine Major Race
Oceanian Major Race

53 minor Asian races

The last three above, Kalash, Oceanian and Amerindian, were added, giving me a 9-race theory in addition to the standard 3-race theory. Genetically, the Kalash are extremely bizarre. On one chart, they form a

As you can see, very European-looking phenotypes are not rare at all in the Kalash. This 2-year-old girl could well be German, except for the strange “elf-ears”, which supposedly are very common among these people. The elf ears are probably a consequence of genetic drift. Drift occurs when a population is isolated for a long time without many outside inputs.

The Kalash, unlike all other peoples in the region, have little or no South Indian or Asian genes.

More than anything else, this indicates a West Eurasian origin for the Kalash. West Eurasia is a term that is hard to define, and some say that the region does not even exist. There are some hazy definitions of West Eurasia out there, but in the way it is most used by population geneticists, it appears to mean the Near East and the Caucasus.

As West Eurasia is in the area of the purported homeland of the Caucasian race (Caucasus), we once again deal with the question of the Kalash being an ancient Caucasian tribe, perhaps one of the most ancient Caucasian stocks on Earth.

I saw one genetic map that had all proto-Caucasians (and all proto-NE Asians for that matter) coming out of the Borogil Pass on the border of northern Pakistan and the Wakhan Corridor of Afghanistan 35,000 years ago. Originally the group was something like Pre-Caucasian–NE Asian. The group went north and one line went to proto-Caucasians and the other went northeast to Proto-NE Asians.

We don’t have the foggiest idea of what these people may have looked like, but skulls from India 24,000 years ago look more like Aborigines than anything else.

The Borogil Pass in the area of Pakistan, Afghanistan and China. As you can see, it is pretty tough going. This is the lowest pass leading out of South Asia and up into the steppes, so it is logical that early men may have migrated in this way.

Actually I think the genesis of NE Asians is more complex than that, but the article was interesting. The genesis of Caucasians is one of the least understood of all the major races. The homeland of the proto-Caucasians is either in the Caucasus or in Central Asia, and the Middle East and North Africa seem to have been a major staging ground. At this time, the most ancient Caucasians seem to be South Indians and Berbers.

South Indians go back about 15-20,000 years and have been evolving right there with few outside inputs for all that time. Before 20,000 years ago, the Proto-South Indians are thought to have come from the Middle East. They probably bred in with or displaced an Australoid people resembling Aborigines who were the original people of India.

The Berbers may go back even further than that and there are suggestions that they may have had an origin in northeastern Africa near Ethiopia, Sudan and Eritrea. That area was the jumping off point for the human race to leave Africa 60-70,000 years ago, pointing once again to very ancient Berber origins. European-like skulls only go back 10,000 years or so and white skin only goes back 9,000 years.

All humans originally were dark-skinned. The people with the darkest skin evolved in the areas where the UV rays were the strongest. It was thought at first that dark skin was an adaptation to prevent sunburn and melanoma, but there are problems with this analysis.

Sunburn does not usually kill you, and melanoma tends to hit older in life, after one has already produced offspring. A better explanation may be that intense UV rays cause destruction of folic acid stores in the body. Then pregnant women, with their folic acid destroyed, have a high potential of giving birth to deformed babies.

White skin was actually a depigmentation process to enable people to get more Vitamin D, which is scarcer at northern latitudes in Northern Europe due to weak UV rays. Lighter skin is necessary to grab all the Vitamin D that one can. An argument against this is that Vitamin D deficiency does not occur in areas of low UV radiation.

But this is not true. Even today, darker skinned people, such as South Indians, who immigrate to the UK are coming down with various Vitamin D deficiency syndromes, including rickets. It is probably necessary for darker-skinned people who live at high latitudes to take Vitamin D supplementation.

The proto-Caucasians may have split off as early as 35,000 years ago. Some NE Asians are quite close to Caucasians and vice versa. The groups straddling the Caucasian-Asian border form a sort of a line from Turkey to Korea and then up to the Chukchi Peninsula. Along the way we have Turks, Iranians, Jews, West Asians, Central Asians, Northern Turkics, Mongolians, Northern Chinese, Koreans and Chukchi.

West Asians include Punjabis and Pashtuns and live in Pakistan, NW India and Afghanistan. Central Asians include Kazakhs, Turkmen and Uzbeks. Northern Turkics include the Altai, the Yakut and other groups. Many of them live around where China, Mongolia and Russia all come together. Interestingly, this seems to be exactly where most Amerindians came from – the Altai Mountains.

The Chukchi are an Eskimo-like people who live on the Chukchi Peninsula on the far eastern end of Siberia where the Bering Strait separates Russia from Alaska.

What’s curious about the Chukchi is that Luigi Luca Cavalli-Sforza’s Principal Coordinates chart in his 1994 book The History and Geography of Human Genes (chart here) puts the Chukchi in with Caucasians. Yet by appearance and apparently also genetics, the Chukchi cluster with Asians.

So there are some groups that are really on the border. I had a hard time knowing what to do with Turkics, Northern Turkics and Central and West Asians, as the genetics was so hazy. I usually just dropped them in either NE Asians or Caucasians based on appearance.

The Kalash are a group of about 3,000 people living in Chitral Province in Pakistan on the border of Afghanistan.

The valleys of the Kalash. The villages are at about 6,000 feet and as the soil is very rich, they grow many crops. They also do a lot of herding, mostly of goats it seems. They do observe a menstruation taboo, where the women have to go off to a special hut during that time, but this is a very old taboo in many human tribal groups. The Kalash bury their dead above ground in caskets. Burial of the dead above ground is a very ancient human tradition.

The Negritos of both Papua and the Andaman Islands, one of the most ancient human groups, bury their dead above ground in little tree houses. The Zoroastrians, followers of one of the most ancient human religions, bury the dead on rooftops and let the vultures eat them. This is getting to be a problem in parts of India where they live as the neighbors are starting to complain!

The Kalash still retain an ancient pagan religion. They are remarkably egalitarian for that part of the world, and women work in the fields side by side with men. They have somehow managed to resist Islamization for centuries, possibly due to the remote and multiethnic nature of the Chitral region.

Four Kalash students. The fellow on the right is a dead ringer for a European. He could be a German or an Englishman. The fellow on the left could easily be an Italian, a Greek, an Armenian, an Iranian or a Turk. The other two are awfully hard to classify. They almost look a little Amerindian.

There are some similar phenotypes across the border in Afghanistan in Nuristan amongst people called Nuristanis. They were converted to Islam at the point of a sword by a genocidal Pashtun maniac named Amir Abdur-Rahman during Afghanistan’s nation-building process in the 1890’s. His genocide of the Hazara was similar proportionally to the Jewish Holocaust.

A Kalash woman with some children, apparently her own. She and her kids do not look quite so Caucasian; they look more Asian. Actually the woman is hard to classify as belonging to any known race that we are familiar with. In California, you might think she was an Amerindian from Latin America.

The legend is that the Kalash and the Nuristanis were the remnants of Alexander the Great’s army that invaded and conquered the region about 2,300 years ago. This supposedly was the reason for all the European phenotypes in the area. Until recently, this was thought to be a legend with no basis in fact, but recent controversial genetic testing suggests that the Kalash may have up to 2

Macedonian and Kalash female costumes compared – note the similarity in costumes. Also the Kalash continue to worship a creator God cognate with the Greek Zeus. I cannot help but think that some of those Macedonian phenotypes are also present in Kalash females. And the terrain looks rather similar too.

Maybe some of Alexander’s men did stay here, thinking they were home away from home. This story is definitely widespread in that part of the world. I had an Afghan doctor from Nangarhar Province in Afghanistan who insisted it was true.

This has been challenged: although there is one Greek marker in the Kalash, the other major marker that ought to be there, since it is apparently present in all Greeks, is not there. One counter-suggestion is that the Kalash got the Greek marker by chance through genetic drift. This seems dubious. The question remains highly confused.

A Kalash man, possibly with his wife by his side. He could easily be an Italian, an Albanian, a Spaniard or a Portuguese. She’s harder to classify, but could be an Italian.

The Kalash worship a God called Dezau, whose name comes from that of the Indo-European sky God *Dyēus (reconstructed form), from which the Greeks derived Zeus and the Romans Jupiter. So the Kalash are the last practitioners of ancient Indo-European mythology.

A Kalash woman with Caucasian features and somewhat Asian eyes. It’s hard to place her into a known ethnic group, but there are Kurds who look something like this. The Kalash probably originated in an area near Kurdistan, but no one really knows. The child looks more Asian. Love the costumes.

They have some odd customs.

One I particularly love is called the Festival of the Budalak. A strong teenage boy is sent up in the mountains for the summer with the goats. He practically lives on goat milk, which supposedly makes him even stronger.

When he comes back there is a festival, and at the festival he gets to have sex with any woman he wants, even his own mother, a young virgin or another man’s wife, but he only gets to rampage like this for 24 hours. Any child born of these encounters is considered to be blessed. They supposedly quit practicing this custom recently due to bad publicity, but many think that they still practice it in secret.

Definitely one of the world’s greatest customs!

A beautiful Kalash woman who eloped with a man recently to get married. Although many times the couple who do this are single, in quite a few cases a married woman can elope with another man. The new husband just has to pay double the bride price. The cuckold just takes it all in stride, or at least he doesn’t get homicidal. It’s amazing the kind of rights women have in this group. Too bad so many of them convert out to Pakistani Islam where women are pretty much chattel.

This woman obviously resembles some European phenotype, but I don’t know my European racial types a la Coon, etc, very well. I almost want to say Norwegian?

The Kalash have been coming under pressure from radical Islamists recently, and several villages have been converted by force (I thought Muslims never do this!). Also, radical mullahs incite local Muslims to go into Kalash villages and smash their religious idols.

A Kalash shamaness or female shaman. It is amazing that in this misogynistic part of the world women are granted such a high religious position. Druze women in Lebanon and Syria are also allowed to become high religious leaders. The costume is amazing. Shamans are one of the oldest aspects of human religions, characteristic of animist type religions.

As the world is full of spirits (or Gods in a polytheistic world) the shaman works via human psychology to manipulate the spirit world to the benefit of the patient. It is hard to say how much there is to it, but areas of the world where humans have been practicing this sort of thing for a long time can do some pretty amazing things.

There are reports out of the South Seas that whole villages would get together to cast evil spells on leaders of neighboring islands. In a number of cases, the leader died soon afterward. The cause of death was typically massive and multiple organ failure. It was as if he simply exploded inside. There are persistent reports that saying a prayer over water or a meal makes it taste better.

There are many reports of dying people communicating over long distances with loved ones just before they die.

And there are also many reports of people sensing nearby tragedies as they are occurring. All of this needs to be investigated by science but there are good reasons to think that this sort of thing is compatible with modern science, especially particle physics where we are all part of each other.

I am also convinced that clairvoyance and sharing of hallucinations are possible, having experienced both of these things. Of course, we were tripping on LSD-like woodrose seeds at the time, but still.

Pacific Islanders and Amerindians were also added, as there is good evidence that these two groups form valid major groupings. Cavalli-Sforza’s eight-race theory listed Amerindians and a group he called Pacific Islanders that apparently also included Papuans.

Rosenberg et al’s six-race grouping also included Amerindians and a group they called Melanesians, consisting of Papuans and Melanesians. Since other evidence indicates significant distance between Papuans and Melanesians and Papuans and Pacific Islanders in general, I decided to leave Papuans as a separate major group.

Yet a good case can be made to split off Polynesians, Micronesians and Melanesians in a compact grouping. The creation of the Polynesians is a result of the spread of the Lapita culture, one of the world’s greatest sea journeys, undertaken by Austronesian mariners, Taiwanese aborigines (Chinese people) who left Taiwan thousands of years ago to settle Island SE Asia. First they went to the Philippines, then to Indonesia.

From Central Indonesia, they left and settled coastal New Guinea, bringing an advanced culture to New Guinea. They also may have settled as far east as the Solomons.

The Trobriand and Solomon Islands are said to be one of the centers for Proto-Papuan culture in the region, and may have been settled as long as 35,000 years ago.

Later, a new wave of Austronesians came out of Central Indonesia (near the Wallace Line) and moved through Melanesia, picking up only a few Melanesian genes along the way. These mariners then went off to populate the entirety of Polynesia in the past 2000 years.

So, according to this theory, Polynesians are mostly Chinese (Taiwanese aborigines) with some Melanesian in them.

One interesting question is why the Polynesians got so huge. First of all, they are not all huge. I have taught a lot of these people in the LA schools and there are a variety of phenotypes, including one that is short and thin.

One theory is that the journey to populate Polynesia was so harsh that only the strongest survived and the weakest died. It may have been necessary to eat the dead for the survivors to go on. Perhaps they fought to the death for scarce resources. Anyway, on many Polynesian islands an extremely brutal culture of continuous, potentially genocidal warfare was the norm and this was probably the world center for cannibalism.

Finally, the last wave to move out was the Micronesians. This group consisted of Polynesians who moved out of Polynesia to populate Micronesia. According to the theory above, they are mostly Chinese (Taiwanese) with only a small amount of Melanesian in them.

The suggestion above was that both the Polynesians and the Micronesians are mostly-Chinese (Taiwanese) people. That conclusion is based on a recent paper that has not yet been widely distributed.

However, another paper suggests that the major Haplogroups in Polynesians – C and F – are indigenous to the region, meaning they are related to the original Melanesian and Papuan settlers.

That paper, and many others, suggests that Micronesians and Polynesians are about 5

Interestingly, the vast majority of the Chinese genes in Melanesians and Polynesians seem to have come from one group of Taiwanese aborigines – the Ami.

A group called the Alor in far eastern Indonesia clusters with Melanesians and a group called the Toba Batak of northern Sumatra in Indonesia clusters with Micronesians.

Alor of far Eastern Indonesia after a major disaster. They are Melanesians who speak Papuan languages. The languages are endangered and very poorly documented. There is a major undertaking underway right now to at least document these languages.

Some very interesting looking Alor women. Although they are Melanesians, they look a bit different from many other Melanesians. The woman on the left has some pretty Asian looking eyes. This may be because they speak an Austronesian language. Melanesians who speak an Austronesian language have some Chinese (Taiwanese) genes, but never more than 2

Both White Nationalist and Afrocentrist varieties of ethnic nationalist idiots keep trying to insist that these folks are either Black or closely related to Blacks.

These people are some of the furthest away from Africans on the planet. You can’t go by phenotype or appearance or even behavior. None of that means much. You have to go by genes. As these people were some of the first to split off from Africans, they have been evolving away from them for the longest. Whites are much closer to Blacks than these Melanesians.

An Alor man who is working with a linguistic team that is documenting Alor languages. Alor is a major diving site for commercial recreational diving crews. The water is still nice and clear here and the coral reefs are still intact. The fish population is good too as there are not a lot of people living in this part of Indonesia. The famous Komodo Dragon lives near here on Komodo Island in far eastern Indonesia.

The reason these people, who are much less related to Black people than I am, are always called Black, is due to the color of their skin! But that has nothing to do with anything. A bobcat and coyote are similarly colored too. Truth is that if you evolved in the areas of the Earth with the highest UV radiation, you often ended up with very dark skin, which does resemble that of Africans.

But this is just convergent evolution and has nothing to do with relatedness. This guy is a lot more closely related to Chinese than to Black people. The Alor do seem to have about 2

The Toba Batak of Northern Sumatra. The guys in this photo actually do look Micronesian – I have seen photos of Micronesians. How these Micronesians ended up on the north coast of Sumatra is news to me. The Toba Batak live west of Medan in the area around Lake Toba, especially on Samosir Island. Their elaborately carved wooden houses are a popular tourist attraction.

A photo of a Toba Batak family. I had a hard time finding quality pics of the Toba Batak. You can see that they are extremely dark – much darker than most people living in this area. Also I think that some Micronesians may have wavy hair like that. The Toba Batak are Micronesians who somehow ended up in northern Sumatra.

This shows that Indonesians are not any particular race, although most are more general SE Asian types fairly close to Filipinos.

Classification of races is a tricky business. In my post, I went by genetic distance alone and not phenotype, culture, behavior, etc. I also treated very gingerly all contributions by ethnic nationalists, who are known to be profoundly dishonest about this stuff. Despite PC nonsense, there clearly are races of mankind. In fact, my classification scheme posits 87 minor races, and it is still undergoing revision.

References

Capelli, C.; Wilson, J. F.; Richards, M.; Stumpf, M. P. H.; Gratrix, F.; Oppenheimer, S.; Underhill, P.; Pascali, V.L.; Ko, T. M.; and Goldstein, D. B. (2001). “A Predominantly Indigenous Paternal Heritage for the Austronesian-Speaking Peoples of Insular Southeast Asia and Oceania”. American Journal of Human Genetics 68:432-443.

Cavalli-Sforza, L. L., P. Menozzi, A. Piazza. 1994. The History and Geography of Human Genes. Princeton: Princeton University Press.

Jablonski, N. and Chaplin, G. (2000) “The Evolution of Human Skin Coloration”. Journal of Human Evolution.

Repost: The Major and Minor Races of Mankind

Repost from the old site that was shut down. This post is very long and complicated – it runs to 83 pages – but I have tried to make it as easy to understand as possible. Please feel free to dip into it at your leisure. Updated January 28, 2013. Regularly updated.

As you can see by the title, this is an awfully ambitious post. Those who believe that race does not exist, or that Caucasoid, Negroid, Mongoloid and Australoid are outdated terms of no use, might as well bail out right now and save themselves the exasperation.

Recent prior attempts include the usual Mongoloid – Caucasoid – Negroid Three Race Theory, which is discussed below. The main problems with this theory are twofold: that it fails to classify a group called Australoids and that it fails to note the huge split between SE Asians and NE Asians.

From Cavalli-Sforza’s recent work comes an eight-race theory: European Caucasoids, South Asian and North African Caucasoids, Northeast Asian Mongoloids, Southeast Asians extending from Thailand to Indonesia and the Philippines, Pacific Islanders, Australian Aborigines, Negroids and American Indians.

This is not bad, but I would argue that there is no reason to put both Arabs/Berbers and South Indians in one race (see Cavalli-Sforza’s own map below). Genetically, they are quite distant.

From my World Book Encyclopedia 1990 comes a nine-race theory: Negroids, Caucasians, Asians, Polynesians, Micronesians, Melanesians, Aborigines, South Indians and Amerindians. To this I recently added three more very distinct groups, Khoisan (Bushmen), Pygmies and Negritos, to come up with 12 races.

But we can go further than this. If Polynesians and Melanesians are widely regarded as separate races, we should be able to distinguish races based on any other major grouping at least as genetically distant as Polynesians and Melanesians. When I finally found two hapmaps showing the distance between Polynesians and Melanesians, I got the idea for a new race theory based on genetic distance alone.

This theory in most cases is based only on genetic distance, and not physical appearance or physical anthropology. In a few cases, races were grouped into a major group based on appearance – for instance, genetically, Chukchis are in the Caucasian square below, yet they look anything but Caucasian.

Though many distinguish Melanesians and Papuans, Capelli’s (see below) genetic analysis puts them in one race. But see Figures 1-4 below which clearly put them in separate groups. Also, Melanesian and Papuan teeth are very different from each other.

Some people are likely to be upset by this theory.

Surely the Japanese will not be happy to learn that they are virtually identical to the despised Koreans. White Nationalists will not be happy to learn that Turks, Jews, Kurds and Iranians are included in the European race and that they cannot include South Indians with Australoids.

NE Asians and ignorant amateur anthropologists will be unhappy to learn that there is no reason to lump SE Asians with Australoids and that the hated Filipinos (which some refer to as the “niggers of Asia”) are very close to the high-IQ, high-achieving Southern Chinese and the Filipinos haven’t a trace of Negrito in them.

It is standard of NE Asian racialists and amateur anthropologists on the Net to say that the Filipinos are heavily-Negrito.

There are traces of Australoid (Papuan) genes in the Malay, some Indonesians, the Southern Thai and the Coastal Vietnamese, but these admixtures are not large, and the Filipinos haven’t any observable Australoid traces.

Filipinos are closer to Southern Chinese than any other race below, although they are also close to the Aeta Negritos. This is because the Aeta and Ati Negritos are not Australoids genetically but instead are related to SE Asians. Anthropomorphically, they are Australoids.

There is also a more substantial Melanesian component in many Indonesians (except those in Western Indonesia), but there is In fact, as Figures 1-3 below indicate, they are Asians and are most closely related to other Pacific Islanders. In fact, the distance between SE Asians and Australoids is greater than the distance between NE Asians and Caucasians.

Afrocentrists will be unhappy to learn that various dark folks like South Asians, Melanesians, Papuans and Negritos cannot be considered to be “Black” by any sane definition of the word.

This theory creates nine major races and 113 minor races. It is a work in progress.

Most of this document comes from Cavalli-Sforza’s haplogroup gene map of the human race below.

Figure 1: Cavalli-Sforza’s Principal Coordinate (PC) autosomal DNA haplogroup gene mappings of major human ethnic and racial groups. There are differences between a PC mapping and the tree mappings below. Much of the racial grouping below is based on this map – on genetic distance between groups, not on superficial resemblances between groups. The upper left square can be called NE Asian. The lower left square can be called SE Asian. The upper right square can be called Caucasian. The lower right square can be called African.

Figure 2: Another Cavalli-Sforza map showing general genetic distance, with tremendous overlap with the map above. This map clearly separates out Papuans and Melanesians and also Filipinos and Thais. There is some confusion here regarding the placement of Northern Turkics with Amerindians and whether NW Amerindians should be cleaved off into a separate race.

This map is actually interesting because it implies that there are six major races of humans – not three – NE Asians, SE Asians, Oceanians (Australoids), Pacific Islanders, Caucasians and Africans. As you can see, the distance between NE Asians and SE Asians and between SE Asians and Pacific Islanders is greater than that between NE Asians and Caucasians. SE Asia is clearly an area of profound genetic diversity.

Figure 3: Yet another map, in this case a genetic tree. Once again, Papuans must be cleaved from Melanesians, and Thai and Chinese are clearly separated. This is the first tree that shows the Northern Chinese, and it seems clear it wants to put them with the Koreans and Japanese. This map shows five major races – Caucasians, NE Asians, SE Asians, Africans, Papuans and Aborigines.

Figure 4: More from Cavalli-Sforza showing genetic distance. This was apparently used to make one or both of the maps above. Based on this, I split the Thai off from the Filipinos. This map also shows that Aborigines are most closely related first to Mongolians and Siberians and second to Japanese and Koreans.

I usually wanted about 150 points difference to split off into a separate race, but in some cases I split off closer groups if they were distinguished somewhere else, like in any combination of Figs. 1, 2 or 3. You need to click on it to read it properly.
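The 150-point rule of thumb above can be sketched in code. This is only one possible formalization, not the procedure actually used here, which was manual and included exceptions based on Figs. 1-3. It assumes a table of pairwise genetic distances and merges any two populations whose distance falls below the cutoff (single-linkage grouping). The function name `group_by_threshold` and the populations `A` to `D` with their distances are hypothetical placeholders, not values from Cavalli-Sforza.

```python
from itertools import combinations

# "About 150 points" is the rule of thumb stated in the text.
SPLIT_THRESHOLD = 150


def group_by_threshold(distances, threshold=SPLIT_THRESHOLD):
    """Group populations whose pairwise distance is below the threshold.

    distances: dict mapping frozenset({a, b}) -> genetic distance.
    Returns a sorted list of sorted groups (connected components under
    single linkage). Pairs missing from the table are never merged.
    """
    names = sorted({name for pair in distances for name in pair})
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the representative of x's group.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        d = distances.get(frozenset((a, b)))
        if d is not None and d < threshold:
            parent[find(a)] = find(b)  # close enough: same group

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Invented placeholder distances, NOT taken from any published table.
example = {
    frozenset(("A", "B")): 40,
    frozenset(("A", "C")): 210,
    frozenset(("A", "D")): 230,
    frozenset(("B", "C")): 180,
    frozenset(("B", "D")): 200,
    frozenset(("C", "D")): 90,
}

print(group_by_threshold(example))  # [['A', 'B'], ['C', 'D']]
```

Lowering the threshold splits more groups off and raising it lumps more together, which is why the choice of cutoff matters so much to the final count of races.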

The initial impulse for this post was this paper in the American Journal of Human Genetics, A Predominantly Indigenous Paternal Heritage for the Austronesian-Speaking Peoples of Insular Southeast Asia and Oceania (Capelli et al 2001). If you look at Table 4 in Capelli, you can see that they carefully delineate out Polynesian and Melanesian groups based on Haplogroup mapping.

Since many scholars of race include both Melanesians and Polynesians as separate races, this table serves to delineate what the proper genetic distance between genetic groups needs to be in order for them to be separate races.

Based on Polynesians and Melanesians as separate races in Table 4 in Capelli, I was able to sort out four more groups in that table, if only to get some idea of the distances between racial groups.

First, an Indonesian Race was separated out, including all but the easternmost island groups such as the Alor that go into Melanesian. Javanese and Sarawak were later included based on Figure 5. Later, based again on Figure 5, the Toraja and Mentawai were separated out, each into their own groups. The Toraja are an ancient farming group in South Sulawesi. The Mentawai are the indigenous peoples of the Mentawai Islands west of Sumatra. They still live a hunter-gatherer lifestyle.

A Lesser Sunda Race was also split out (see Figure 5), but the Alor were not covered, as they lumped more with Melanesians. The Lesser Sunda Race included the Lembata, the Lamaholot, the Manggarai and the Kambera. These people have mixed Indonesian and Melanesian ancestry. The Lembata and Lamaholot live on Lomblen Island east of Flores Island. The Kambera live on Sumba Island and the Manggarai live in the west of Flores Island.

Second, a Filipino-Ami Race, composed of Filipinos and the Ami, a Taiwanese aborigine group (the Filipinos are almost genetically identical to the Ami and are quite close to the Southern Chinese – see Figure 1 in Capelli) was split off.

Third, a South Chinese Race consisting of unknown groups that was later expanded below was split off.

Based on the distances between these clearly differentiated races in Capelli, I was able to plot racial distances in Figure 1 above to infer major and minor races based on distance.

All of the groups created via Capelli were then further chopped up based on Cavalli-Sforza here (p. 234-235). An Indonesian Race consisting of Sulawesi, Borneo and Lesser Sunda survived the cut, while the Alor of Lesser Sunda went into Melanesians. Malays themselves are distinct enough to create a Malay race.

The proto-Malay or Temuan, who have some of the most ancient genes on Earth of all of the Out-of-Africa peoples, are an ancient aboriginal group in Malaysia. They have an extremely diverse genetic signature (see Figure 5), enough to split off a category all of their own.

The Bidayuh or Land Dayaks are the indigenous peoples of Sarawak. Their genetics are wildly divergent (Figure 5), as we might expect from such an ancient people, hence, they form their own stock.

Some comments are in order.

Although separate NE Asian and SE Asian Major Races were created in order to account for the vast differences between NE and SE Asians (the distance between NE and SE Asians is greater than the distance between Caucasians and NE Asians), it should still be noted that at a deep level, this is clearly one race.

The Gilyak and Ainu are leftovers from the original Proto-Northeast Asians. The Proto-Northeast Asian homeland was around Lake Baikal maybe 35,000 years ago. The Ainu themselves may go back 18,000 years to the Jomons, who arrived from Thailand. These people resembled Australoids.

In Figure 1 above, Northern Turkic forms a clear race with various Amerindians, yet in Figure 4, they seem to be quite distant. The Buryat have also been linked to Amerindians, even though anthropologically, they are linked to Mongolians and genetically they are close to Koreans.

The North Turkics are closest to the Northern Chinese and the Nepalese, both of which were split off into separate groups. The Manchu and Qiang were added to the Northern Han based on genetics for the Manchu and the fact that the Qiang have an origin in the north. The Yunnan Han, a southern group, oddly cluster with Northern Chinese, as do the Hui.

The Oroqen, a Siberian Tungusic tribe in northeast China that is genetically very divergent, was split off into its own group.

The Nepalese, consisting of Nepalis and Newaris, are genetically Asians, though they resemble Caucasians. They pretty much straddle the line between Caucasians and Asians. A lot of groups close to them – Turkics, Mongols, Northern Chinese and Altaics – also straddle the line between Caucasian and Asian.

Nepalis are closely related to South Indians. They are also close to Central Asians. The Central Asian Race includes the Kirghiz, Karakalpaks, Uzbeks, Turkmen and possibly others. Although they are mixed Caucasian-Mongoloid people, genetic analysis shows that they can be included with Asians. However, other analysis (Table 2) shows that they are best placed in with Caucasians, though only barely.

Others, such as Kazakhs, are closer to Tuvans and also Mongolians (Table 2). The Kazakhs were placed into a Mongolian Race, somewhat arbitrarily.

The Sherpas were then further split off and placed in with the Yakut (p. 231). All of these splits were based on this data (p. 229). The Tuva were given a separate race based on data showing them splitting away from the Yakut-Sherpas (p. 229).

Northeastern Indians were put into the Mon-Khmer Race somewhat arbitrarily, since this is who they cluster with. There was some confusion. In one paper, the Naga, Apatani, Nishi and Nemang cluster with the Mon-Khmer, and the Adi go in with Tibetans.

The situation is somewhat contradicted by this Y-DNA graph (Reddy 2007), which puts the Apatani, Nishi and Adi, along with the Tripuri, Jamatia, Mog and Chakma, in a single Indian Tibeto-Burman Race. Because of this cluster, and because this group tends to separate somewhat from General Tibetan, I created an Indian Tibeto-Burman Race.

Note that the Tibeto-Burman Tujia, Yizu and Shan cluster away from Indian Tibeto-Burman to some extent. The Mizo and Yizu, Indian Tibeto-Burman groups, cluster more with General Tibetan. However, the Mizo are far enough away from the rest of General Tibetan to warrant their own stock (chart). The Garo also cluster with General Tibetan on Y-DNA, but on Mt-DNA, they are very different (chart) (Reddy 2007).

A group of the Mundas was split off as a Meghalaya Race on the basis of their differentiation on MtDNA (chart) (Reddy 2007). Some Indian Tibeto-Burman groups such as the Bai and the Pnar were included. This race includes the War Jantia, Bhoi, Maram, War Khasi, Kynriam, Nishi, Pnar and Bai. All of these groups are found in Meghalaya or over the border into China.

A group consisting of the Santhal, Naga, Munda, Kurmi and Sudra was split off from this group due to its dramatic difference on MtDNA (chart). This group also lives in NE India.

There is a group of Indo-European speakers in NE India that can be differentiated from the rest of the groups on Mt-DNA. This NE India Indo-European Race consists of the Mahishya, Bagdi, Gaud, Tanti and Lodha.

The Mon-Khmer are close enough to Thai and Southern Chinese in Fig. 4 to be included with the Tai, but they were split off due to the obvious distance in Fig. 1. The Mon-Khmer, Southern Chinese and Thai groups are clearly all closely related.

The Zhuang were split off from Mon-Khmer into a Munda Race on the basis of this autosomal DNA table (p. 235) (Cavalli-Sforza 1994). The Austroasiatic Race consists of the Mon, Zhuang, She, Santhal, Ho and Lyngngam. Most of these groups are found in NE India, but the Mon are in Burma. Most speak Austroasiatic languages, but some speak Tibeto-Burman or even Indo-European languages. The Nongtrai group with this race in Y-DNA (chart) but not on MtDNA (chart), where they may well form their own group.

The Zhuang are a group in Southern China. They left Central China for Southern China 5,000 years ago. This group was originally thought to be part of the proto-Tai group in Southern China that later moved down into SE Asia and gave rise not only to the Thai, but also helped form many other SE Asian groups.

At the time of the split from proto-Tai to Tai, the Zhuang went to Guangxi Province and the Tai went to Yunnan. In 1200, the Tai moved down into Indochina and mixed with local groups, becoming the Thai, Lao and Shan.

The Senoi are an ancient group in Malaysia dating back about 4,000-8,000 years. From the close genetic relationship, it seems that the Senoi may have split off from the proto-Zhuang or an earlier group soon after the group left Northern China for Southern China. The Santhal, Ho and Shompen may also have been early split-offs.

The Shompen at least are thought to be a very old group. Originally it was thought that they were remnants of the early people (Negritos) who settled the area, but further research indicated that they are an Austroasiatic group, albeit an ancient one.

Although there is much controversy about the origins of the Senoi (are they Negritos?), a variety of points of inquiry converge on the notion that they are related to SE Asians.

The Senoi are Veddoids, an ancient group with possible links to the Negritos and the original settlers of Asia 70,000 years ago. There is fascinating evidence for this as Senoi skulls cluster with skulls from the Andaman Islands, Coastal New Guinea and Tamils. Andaman Islanders are Negritos, the New Guinea population is Melanesian and the Tamils are thought to be Veddoid.

The Senoi speak an Austroasiatic language and are also thought to be related to the Vietnamese and the Khmer. Senoi teeth resemble SE Asian and Polynesian teeth. It is thought that the Senoi came down from Southern China and bred in heavily with the Negrito Semang in Malaysia. The Senoi have wavy hair like most Veddoids, though some have straight hair and a few have woolly hair like Negritos.

I recently split the Greater Andamanese and the Onge into two separate major races based on new data showing that they are profoundly different from all other humans. Whether each of them deserves a major race of its own is open to debate and depends on the depth of their differences.

However, the data does show that they are each completely separate branches on the human tree. As the Andaman Islanders were the first people to split off after we left Africa and they have been evolving for ~70,000 years in isolation, it figures that they would be extremely different.

I also decided to split Australoids into a macro race alongside Caucasians, Africans and Asians due to charts showing that they are extremely different from all other humans. This group would include for now Papuans, Aborigines and Andaman Islanders.

The Tungus, a group of mostly reindeer-herding tribes, including the Even and the Evenki, were given a separate group based on this map (p. 227). The Evenki are also close to various Tibetan groups, because these Tibetan groups came from NE Asia also.

Amazingly, the Yeniseian Language Family (of which Ket is the last surviving member) has now (in 2004) been conclusively tied to the Amerindian Na-Dene Language Family, the first conclusive linking of a New and Old World language family. Even though the Ket presently reside quite a bit to the north of the Altai region where most Amerindians came from, the Ket used to live down near the Altai thousands of years ago.

Northern Turkics include such groups as the Altai, Hazara, Shor, Tofalar, Uighurs, Chelkan, Soyot, Kumandin, Tuva and Teleut. They are located around the Altai Mountains where China, Mongolia and Russia all come together. This is where most of the Amerindians came from.

Evidence for including the Hazara, who speak a language related to Persian, in the Northern Turkic group is a chart that shows the Hazara clustering with the Uighur.

Malay Negritos (the Semang) were given a separate race based on a recent study finding them highly differentiated from other Asian populations. The Jehai and Kensui are related Negrito groups in Malaysia (Figure 5).

Though Cavalli-Sforza places Berbers barely in the African square, I include them with Caucasians due to their greater resemblance to Caucasians than to Africans, and also due to genetic analyses that show that they have little Black in them. However, some Berbers are clearly African. Analyses of the more-Caucasian Berbers find that, across the board, they are on average ...

Tuaregs were given separate races because they are clearly separate from Berbers and all of the African groups in Fig. 1.

However, Tuaregs do cluster (p. 169) with Algerians and Bejas. Since Algerians are Caucasian and most Tuaregs are Africans (though they vary considerably), I had to separate them into major races based on appearance. This is one of those cases where genetics flies in the face of physical anthropology.

Bejas are a mixed-race people living in northeastern Africa and speaking a Cushitic language. They look like Ethiopians. Ethiopians are about 5

Similarly, Nubians are grouped (p. 169) in with the Caucasian Berbers, although most people consider them to be Black people. With examples like this, you can see why Fig. 1 has Berbers on the border of African and Caucasian.

Figure 1 also puts the Chukchi in the Caucasian square, though they clearly resemble Asians. I lump them in with Asians due to their obvious resemblance to Asians. I included Aleuts with Chukchis due to a recent paper showing a linkage.

Siberian Eskimos were included for the same reason. The entire group was called the Beringian Race. The Koryaks were split into a separate group due to Cavalli-Sforza’s data. The Itelmen were later added to the Koryaks due to evidence showing that they are related. Both were combined into a Paleosiberian Race. The Reindeer Chukchi, apparently a more Siberian group, was split off due to its great (p. 228) genetic distance from other groups.

The Uralic Race was split into a Siberian Uralic Race including the Samoyed, Ket and Nentsy subgroups (p. 227). The Nganasan are an outlier (p. 229) in this group, and there was barely enough evidence to split them into a separate group.

Northern Na-Dene speakers were split from the North American Eskimos whom they resemble (p. 323), on the basis of this tree (p. 227). Similarly, Ge and Tucanoan (linguistic groups) Amerindians were split off from the rest due to great distance (p. 322) between them and the others.

A Fuegian Amerindian Race was created based on evidence that they exhibit extreme genetic differences with all other Amerindians. They are probably the descendants of the original settlers of the Americas.

The Nootka, or Nuuchahnulth, were also split off due to the finding of a fifth major haplogroup lineage (p. 1166) in them in addition to the main four lineages – A-D – usually found in Amerindians. This line links back to ancient Amerindian remains and goes back to Mongolia.

I started out with a General Amerindian Race, but I decided to split it into four races – Northwest American, Northern, Central and Southern, based on Figure 2. It is true that I could not make these splits on the basis of Figure 1 or the genetic distance charts, but as most serious splits on Figure 2 went into separate races, I decided to split the Amerinds in the same manner.

Further, the Amerinds have some of the greatest internal genetic distances of any geographical group, far more, for instance, than the Europeans and Iranians, so the splitting seemed valid.

South Indians are included with Caucasians based on a general consensus that these are an ancient group of Caucasians, the reason being their resemblance in facial and body structure to Caucasians. In addition, Figure 1 clearly puts them in the Caucasian square, and the other three figures clearly show that they are most closely related to Caucasians.

Although genetic studies say that South Indians are all one race and there is good reason to believe this, Figure 1 delineates South Indians and North Indians into separate groups, though there is a clear transition from one to the other. Figures 2 and 3 reiterate the distinction between South and North Indians.

There is data linking Vietnamese genetically with Cantonese. Vietnamese genetics are very complex and it is all being worked out. They are clearly an Austronesian-Tai mix with heavy S. Chinese admixture and some undetermined amount of Khmer and Cham mixed in. Vietnamese does not include the Montagnards, who are the indigenous people and seem to be related to Negritos.

There is good evidence also linking the Vietnamese and related groups to the Tai; however, there seems to be better evidence linking them to a small group of mostly Mon-Khmer speakers. The Deang or Palaung, the Jinuo and the Blang lump together with the Vietnamese (Lĭ 2006). The Mon-Khmer speaking Deang live in Yunnan, Burma and Thailand, the Tibeto-Burman speaking Jinuo live in Yunnan and the Blang also live in Yunnan. So the closest living relatives to the Vietnamese people are in Yunnan, and next in Burma and Thailand.

Since there is quite a bit more distance between Filipinos and Thais than between Filipinos and Southern Chinese, I split off Thais into a separate race. I also kept the Filipino-Ami Race above, but added the Guangdong Han (Guangdongren in Chinese) to the group based on evidence that they are linked to the Ami.

Based on Fig. 5, I further refined the Filipino portion of this group into Tagalog, Visaya and Ilocano speakers, while splitting off the Manobo into a separate group, as they are divergent (Fig. 5). Tagalogs are an ethnic group who live mostly in Luzon and Oriental Mindoro, while Visayan languages are spoken in the Visayas region in the central Philippines, encompassing the islands of Panay, Negros, Cebu, Bohol, Leyte, Samar and Palawan. Ilocano speakers are located in the far north of Luzon.

A race called the Southeast China Race was created based on a tight clustering of the Minnan Nan, Hakka, and overseas Chinese of Singapore and Thailand. Based on Figure 5, the Cantonese Han (outside of Hong Kong) were added to this race.

A separate Taiwanese Aborigine Race was split off, based on Cavalli-Sforza’s work. This group, best seen as the principal Taiwanese Aborigine Race, consists of the Atayal, Bunun and Yami. Another Taiwanese Aborigine group, the Paiwan, was split into an Island SE Asian Race based on Cavalli-Sforza. Interestingly, the Paiwan, Atayal and Yami are also somewhat close to the Tai Race (see below).

The Taiwanese Aborigines have an interesting background, and their prehistory is in need of further research.

In addition to the Thais proper, I also include other Tai groups such as the Tai Lue, Tai Kern, Tai Yong and Tai Yuan on the basis of Figure 5. All are found in Thailand. Many groups are related to the Thais. They are the Lao, Shan, Dai, Lahu, Aini and Naxi. The Lahu, Dai and Aini were included on the basis of this report. All of them are found in Yunnan. This group is found in Southern China (especially Yunnan), Laos, Vietnam, Thailand and Burma. The Buyei are also related to the Thai.

Two aboriginal groups of Thailand are so different as to warrant a separate stock each.

The Htin, or Mal, are ancient aborigines of Thailand speaking a Khmuic language. In Figure 5, they are different enough to constitute their own stock.

The Mlabri are a very strange group of hunter-gatherers in Thailand who are very poorly understood. They live very primitive lives. Their genetics is wildly divergent and suggests that they were founded from a small stock only 800 years ago or so. That is, they went through a genetic bottleneck. Some think that they are former farmers who went back to hunting and gathering for some reason. They are one of the most genetically divergent peoples in Asia (see Figure 5).

Although Fig. 4 suggests that Southern Chinese and the Thai should be grouped together, Figs. 1-3 suggest otherwise. Clearly, the two groups are very close, but I decided to break Southern Chinese off due to the other figures above, especially Figure 1, that suggest they are a separate grouping.

I lumped a number of groups into a Southern Chinese Race, including the Dong, Yi and the Han living in Henan Province, China, based on evidence that they form a group with the Southern Chinese. These groups are found in the Southern Chinese provinces, including Henan, Guangxi, Sichuan, Guizhou, Hainan and Fujian.

I created a Hmong-Mien Race for the Hmong and the Mien, since, while they are close to the Southern Chinese Race, they are different enough to merit their own category (see Figure 5).

Figure 5: A good chart of many of the Asian races, showing how well genes and language line up.

The Li are a genetically divergent Chinese ethnic group that forms its own outlier between the Southern and Northern Chinese. However, the group trends more towards Southern Chinese. They also link up very closely to the Khmer. The suggestion here is that the ancestors of the Khmer were the Li.

What we are learning about Negritos is that instead of forming a distant group, they are often closest to the people they are living around. So the Philippine Negritos (Aeta) are closest to other Filipinos, and the Veddas are closest to other South Asians.

The Mamanwa, a Negrito group on Mindanao Island in the Philippines, are highly divergent from the rest of the Philippine Negritos. The Mamanwa are thought to be remnants of the original Negrito population in the Philippines.

The Palau, a Micronesian group, curiously cluster with Aeta and Agta Negritos, indicating that they may be the remains of the original settlers of SE Asia. The Agta and Aeta cluster together also (Fig. 5). The Aeta and Agta Negritos both live in mountainous areas of Luzon.

The Iraya Mangyans of the Philippines are also quite different, but they are close to the Ati Negritos, also of the Philippines (Fig. 5). The Ati live on Panay Island, in the Visayas Group. The Iraya are a Mangyan group living on Mindoro Island. The Mangyans are not Negritos, but they are still an indigenous group in the Philippines and are different from most Filipinos.

The Toba Batak, a tribe in northern Sumatra, curiously cluster with the Kanaka and Yap Micronesians. On Figure 5, the Karo Batak line up with the Toba Batak. They may be leftovers of the original Melanesian-Polynesian mix that populated Micronesia. The Kanaka is an old name for a ...

The Veddas are clearly related to the Negritos as one of the sole remaining leftovers of the group that left Africa 70,000 years ago and populated all of Asia. There are interesting links between them and the Toala of Southern Sulawesi and the Senoi of Malaysia. Nevertheless, almost all Veddas except the Kerala Kadar cluster with the South Indian Race.

North Indians include the Punjabis, Central Indic, Punjabi Brahmins, Rajputs, Vania Soni, Mumbai Brahmins, Jats, Kerala Brahmins, Pakistanis and Koli.

South Indians include the Munda, Bhil, Maratha, Rajbanshi, Oraon, Parji, Kolami-Naiki, Chenchu-Reddi, Konda, Kolya, West Bengal Brahmins, Parsi and Gonds. Although many of these groups are thought to be related to Veddas or Negritos and part of the original people of India, they now resemble other South Indians.

Kerala Kadar are a highly diverse Vedda group who are probably the descendants of the original people of India. They live in the forests of Kerala and resemble Australoids.

The Gurkha and Tharu are two highly diverse groups in Nepal. In Figure 5, the Ladakhi are close to them, so a Himalayan Race was created to encompass them.

The Kanet live in Himachal Pradesh and Gujarat and probably have some Tibetan mixture. The inclusion of the Uttar Pradesh Brahmin with these people is unexplained.

The Nicobarese and the Senoi cluster with the Munda Race on Y-DNA, but on Mt-DNA, they are extremely different (chart here) (Reddy 2007), which is suggested by their ancient origins. Each got a separate race due to their extreme divergence.

The Khoisan were divided into three groups, the San, Khoi and Hadza. The Khoi are probably a creation of intermarriage between SW Bantus and San. The Hadza are an ancient group in Tanzania. The San form a separate race with the Somalis.

The Sandawe are another Khoisan group. They were not divergent enough to form a separate group on the table here (p. 176), but they were split off due to their divergence on the tree here (p. 169).

The Sara are a very divergent Nilotic group from Chad, who form a race with Biaka Pygmies from Central African Republic. All of the African splits are from here (p. 169).

The Funji, a Nilo-Saharan group, was split off due to their diversity (p. 169). The Funji, or Gule, live in Sudan on the Blue Nile near the Ethiopian border (p. 170). The Bedik, a small group of 5,000 in Senegal, are also divergent. Though they are not divergent enough to be a race on the distance chart, they are on the PC and tree charts.

Three groups in Senegal, the Peul, Serer (650,000) and Wolof (2 million), were split off into a separate group although they do not have enough distance in the distance chart to warrant that, similar to the Southern Chinese, Thai and Khmer. However, like these three groups, the Senegalese groups are quite different on the PC Chart and on the tree chart, so they were split off (p. 181-182).

The Peul (700,000) speak Fulani (Peul is just French for Fulani), but are settled African farmers, unlike the more pastoralist Caucasian – Berber group that roams across the Sahel.

Figure 1 appears to divide humanity into four racial squares – Northeast Asian, Southeast Asian, Caucasian and African. Although the difference between SE and NE Asians is deeper than that between Asians and Caucasians, it is clear that this is all one race – the Mongoloids. Inside of that group, all of the Chinese are related.

The homeland of the proto-Asians dates back over 60,000 years and is in northern Vietnam and southern China. We know this because the Vietnamese have the greatest genetic diversity in all of Asia. The split between the NE Asians and the SE Asians is at least 53,000 years deep. There is a Hmong-specific line alone that may date as far back as 26,000 years.

The traditional tripartite system favored today by racial minimalists – Caucasian, Mongoloid and Negroid – is appealing, but I could not reproduce it. As there is as much difference between Asians and Caucasians as between SE Asians and NE Asians, why should I create a Mongoloid Race?

Instead, I split it into nine separate major races. This enabled me to account for the fact that while Australoids are Asians (genetic analysis of various Australoids has proven this), they are definitely an extremely divergent group.

This analysis also recognizes the deep diversity of Australoids – the Aborigines are more distant to Africans than any other race (once again despite physical appearance), due to genetic drift in Australia for millennia.

At first I put Papuans into an Australoid Race with Aborigines, but later I split them off. The distance between Aborigines and Papuans is as great as between Caucasians and Asians, so why lump the two Oceanians together? At the same time, we should recognize that there is a Mongoloid super-group that does encompass Aborigines, Papuans and both NE and SE Asians.

Figure 1 puts Aborigines barely into the NE Asian square, Papuans on the line between SE and NE Asians and Melanesians further down in the SE Asian square. Figure 4 shows that Aborigines are most closely related first to Mongolians and Siberians and next to Japanese and Koreans. This is due to the Ainu substructure in these groups.

I also reluctantly split off the Kalash into a separate major race, inside of Caucasians, based on a stunning paper that differentiated the Kalash among groups such as Africans, East Asians, Oceanians, etc.

Based on Cavalli-Sforza’s six-race theory above in part, I split off Amerindians into a separate race inside of Asians. I also split off Pacific Islanders into a group called Oceanians, but contra Cavalli-Sforza, I did not include Papuans with the rest of the Pacific Islanders.

My Pacific Islander group includes Melanesians, Micronesians and Polynesians. Note that one group of Indonesians is included in each of the Melanesian and Micronesian subgroups. Therefore, there is no Indonesian race per se, as Indonesians encompass a variety of groups, although most can be put into a few SE Asian minor races.

That is based on genes. If you go by anthropometrics, you can get a group called Australoids that includes Negritos, Melanesians, the Ainu, Papuans, Aborigines, the Senoi, Tamils and Fuegian Amerindians.

The Andaman Islands Negritos are also profoundly different from other groups, and are said to have the “purest” genetic profile of any group, once again due to genetic drift and lack of outside inputs. Papuans, Melanesians and Negritos are also extremely distant from Africans, once again despite physical appearances.

The Khoisan (San and Bushmen) in Africa are the oldest race on Earth based on genetic signatures dating back 53,000 years, and this is what the original humans who came out of Africa 70,000 years ago may have looked like.

The various Negrito groups, the Aborigines and possibly the Papuans are also very ancient.

Mongoloids as we now know them are only 9,000 years old – previous groups in Asia looked more like Australoids – of which the Ainu and Gilyak are the last remaining descendants.

Australoid types and their ancestors are the original peoples of India, Burma, Thailand ...

The Bantu (or the Africans that we are familiar with) may go back much further – it has been up to 40,000 years since they split off from the Pygmies. There is a suggestion that they were distinguishable from Khoisan (Bushmen) even 100,000 years ago (p. 160). The ancestors of all Africans seem to have come from West Africa at least 35,000 years ago (p. 160).

Amerindians at the tip of South America are very different in head shape than the rest of the Amerindians – looking more like Australoids – and their genetics is also profoundly different.

The proto-Caucasian homeland may have been in the Caucasus about 45,000 years ago. Another theory says it was in Central Asia.

The most ancient Europeans are the Saami and an ancient, isolated group of Sardinians. Among Caucasians, the Berber and South Indian Races appear to be very ancient, and both are extremely divergent within the Caucasian group. They may be surviving remnants of the most ancient Caucasians.

The South Indians are actually midway between Caucasians and Asians genetically and are only lumped with Caucasians because this is who they most resemble.

Europeans proper only go back 10,000 years or so, but the Saami (best seen as proto-Europeans) seem to go further back than that.

South Indians have been evolving in considerable isolation for about 15-20,000 years in the subcontinent. Prior to that, they appear to have come from the Middle East. The Berbers of today appear to be continuous with Berbers of up to...

The rest of the groupings mostly follow from Figure 1. More tables like Table 4 in Capelli would be very helpful in order to tease out more minor races.

A single asterisk indicates considerable genetic difference from related groups, two asterisks indicates a highly divergent group, and three asterisks is a profoundly divergent group. Major races are in red.

Some groups are not represented. I was not able to classify many groups with Negrito or Veddoid affiliations, such as the Tamils of South Asia and the Montagnards of Vietnam.

Mien and Qiang are Northern Chinese tribes, but the Mien have moved to the South lately. I could not find any good genetic data on the Qiang. The Nu were arbitrarily included in the Tibetan Race because they came from Tibet, but I don’t have good genetic data to prove that this is really a single unit. The chart here does not clarify things much.

The Bhutanese, though most closely related to Tibetans, were given their own race based on data showing that they are nevertheless considerably distant from Tibetans.

The Barya are a mixed-race group in Western Eritrea.

The Gilyak or Nivkhi are an ancient tribe living on the border between Korea, Russia and Japan that has ties to the Ainu. Ryukyuan is another name for Okinawan. The Ryukyuans were given a separate race based on studies showing them intermediate between the Ainu and modern Japanese.

The Va (or Wa) are an ethnic group in Yunnan and Burma that seems to be distinct from the Northern, Southern and Tibetan Chinese groups. The Va seem to be about equally related to the Northern and Southern Chinese, indicating some sort of a dual origin. The Jingpo, or Karen, another Yunnan group that also occurs in Burma, were included with them based on this paper. The Lawa of Thailand were added to this group based on Figure 5. Interestingly, the languages of the Lawa and Va are also closely related.

A Southern Japanese Race was split off from the Japanese, Ryukyuans and Ainu. This group is made up of Kyushu Island, the southernmost island, and the Kinki region of Honshu, near the city of Kyoto. The Japanese in this area are highly divergent (p. 232).

The European-Iranian Race includes almost all Europeans except the Saami, Basques and Sardinians. The Saami and the Sardinians are very distant and the Basques much less so from the rest of the Europeans.

Although Cavalli-Sforza classes the Basques, Yugoslavs and Greeks as genetic outliers, there was not enough distance between the Yugoslavs and Greeks and other Europeans to split them into a separate group on the basis of genetic distance. Furthermore, the Greeks are clearly in the European group in Fig. 1 – they are quite close to English and Danes in the PC analysis.

However, I did split the Basques off based on their lying outside the European-Iranian cluster on the PC chart in Fig. 1. Most groups that were distinguished as independent units outside of clusters on Fig. 1 were given separate races.

The Greeks are interesting in that, while they are obviously a part of the Europeans on all charts, they are also the only Europeans that are close enough to most Middle Easterners to be included in their group. So the Greeks are a link between the European and Middle Eastern groupings inside the Caucasian Race.

The Iranian branch includes Jordanians, Iraqis, Assyrians, Druse, Lebanese, Kurds, Georgians, Caspians, Turks, Jews, and related groups in the area. It was difficult to decide whether to put the Turks in the Iranian subgroup or in the Central Asian subgroup, as they are close to both.

It was also very difficult to decide whether to put the people of the Caucasus, the Kurds, Turks, Caspians and Jews in the Iranian group or the Central Asian group as they cluster with both. I decided on sheer geographic grounds to put them in the Iranian group. The Russian Saami are closer to the Tungus and were included in that group.

Although some Arabs, West Asians and all South Indians were split off, this was somewhat arbitrary. Although they form separate groups on Fig. 1, the Arabs are closely enough related to various Europeans, including Greeks, to be included with Europeans (Fig. 4). However, the Arabs were not as close as the Iranians.

Likewise, South Indians are close to Iranians, who are in turn close to Greeks and Italians – note that Iranians are also somewhat close to Danes and English (Fig. 4). As the Greeks link Europeans genetically with Middle Easterners, the Iranians link Europeans genetically with India. Arabs and South Indians were only split off due to the distance observable in Fig. 1.

West Asians were also split off due to their divergence. Based on this chart, they seem to be a compact grouping. This group includes the Pashtuns, Brahuis, Balochis, Makranis and Sindhis.

Further research shows that the Tajiks and Hunza, who at first appear to group with the West Asian group above, actually compose two groups divergent enough to be split into two different races. The first group is made up of the Hunza of the Karakorams, the Bartangi of the Pamir Range and the Roma or Gypsies of Europe. So the Gypsies have a Himalayan origin.

The second group is made up of Tajiks, the Shugnan of the Pamirs, Bukhara Arabs and three groups in India – the Kallar of Kerala, the Sourashtran of Tamil Nadu and Yadhava of various parts of the region.

The Kalash, a strange, ancient, tiny tribe with Caucasian roots in northwest Pakistan in Chitral Province, are so diverse that they could very well form their own macro race. Since making a macro race out of a tiny ethnic group in Pakistan is absurd, I decided to throw them in as a major race subsumed under Caucasians, albeit on the grounds that they are an extremely divergent race. They were classed with Caucasians because there is a general consensus that this is what they are (last two links are racist).

Due to their divergence, Kuwaitis and Arabians – consisting of Saudis, Yemenis and Bedouins – were split off into separate groups.

There are numerous groups that are more or less recent combinations of various groups and do not yet deserve their own racial category.

Hispanics are in general a mixture between Caucasians (typically Iberians) and Amerindians. They have been evolving for a short time and have not had time to differentiate into anything suggesting a race yet (despite nonsense from La Raza demagogues).

There are other Hispanics who are heavily mixed with Blacks, Caucasians and Amerindians. This is especially seen in South America in Brazil, Venezuela, and Colombia, and even in Central America and Mexico.

There are large Black-White mixed populations in the West Indies. In Singapore and Hawaii, there are rapidly mixing populations that defy categorization.

This paper is basically just a shot in the dark and is more properly termed a pilot or exploratory study. I welcome evidence-based inputs from any knowledgeable persons who wish to add to this preliminary grouping of the human races, major and minor. All suggestions coming from nationalists of various types, ethnic or otherwise, typically lacking evidence, will probably be rejected outright.

There are 4 macro races, 11 major races and 115 minor races of man.

* = significant genetic distance from most other groups

** = major genetic distance from most other groups

*** = extreme genetic distance from most other groups

Asian Macro Race

Northeast Asian Major Race*

Japanese-Korean Race (Japanese – Korean)

Southern Japanese Race (Honshu Kinki – Kyushu)

Ryukyuan Race (Okinawans)

Ainu Race*** (Ainu)

Gilyak Race** (Gilyak)

Northern Chinese Race (Northern Han – Qiang – Manchu – Hui – Yunnan Han)

Oroqen Race (Oroqen)

Sherpa-Yakut Race (Sherpa – Yakut)

Nepalese Race (Nepali – Newari)

Mongolian Race (Mongolian – Inner Mongolian – Buryat – Kazakh)

Northern Turkic Race*** (Dolgan – Altai – Shor – Tofalar – Uighur – Chelkan – Soyot – Kumandin – Teleut – Hazara)

Central Asian Race (Kirghiz – Karakalpak – Uzbek – Turkmen)

Tuva Race (Tuva)

Tungus Race (Even – Evenki – Russian Saami)

Siberian Race

Beringian Race** (Chukchi – Aleut – Siberian Eskimo)

Paleosiberian Race (Koryak – Itelmen)

Reindeer Chukchi Race (Reindeer Chukchi)

General Tibetan Race (Tibetan – Lisu – Nu – Tujia – Akha – Burmese – Yizu)

Mizo Race (Mizo)

Bhutanese Race (Bhutanese Buddhist)

Siberian Uralic Race (Nentsy – Samoyed – Ket – Mansi – Khanty)

Nganasan Race (Nganasan)

Uralic Race (Komi – Mari)

North American Eskimo Race (Inuit)

Amerindian Major Race*

Northern Na-Dene Race

Northwestern American Amerindian Race

Northern Amerind Race

Central Amerind Race

Southern Amerind Race

Ge Amerindian Race (Ge Language Group)

Tucanoan Amerindian Race (Tucanoan Language Group)

Nootka Amerindian Race (Nuuchahnulth – Makah)

Fuegian Amerindian Race (Ona – Yaghan – Kaweskar – Aonikenk – Alacaluf)

Southeast Asian Major Race*

Southern Chinese Race (Dong – Henan Han – Yi – She – Punu – Naxi)

Hmong-Mien Race (Chinese Hmong – Thai Hmong – Mien)

Li-Khmer Race (Li – Khmer)

Southeast China Race (Hakka – Min Nan – Singapore Chinese – Thai Chinese – Cantonese Han)

South China Sea Race (Tagalog – Ilocano – Visayan – Ami Taiwanese Aborigine – Guangdong Han)

Manobo Race (Manobo)

Philippines Negrito Race (Aeta – Agta – Palau Micronesian)

Mangyan-Ati Race (Iraya – Ati)

Mamanwa Philippines Negrito Race (Mamanwa)

Tai Race (Thai – Tai Lue – Tai Kern – Tai Yong – Tai Yuan – Lao – Lahu – Aini – Shan – Dai – Muong – Buyei)

Vietnamese Race (Vietnamese – Deang – Jinuo – Blang)

Mlabri Race** (Mlabri)

Htin Race (Htin)

Kachin Race (Kachin – Karen – Va – Nung – Lu – Lawa)

General Taiwanese Aborigine Race (Atayal – Bunun – Yami)

Island SE Asian Race (Paiwan Taiwanese Aborigine – Sea Dayak – Sumatran – Balinese)

Bidayuh Race** (Jagoi)

Indonesian Race (Sulawesi – Borneo – Lesser Sunda – Sarawak – Javanese)

Mentawi Race (Mentawi)

Toraja Race (Toraja)

Lesser Sunda Race (Kambera – Lembata – Lamaholot – Manggarai)

Malay Race (Malaysia Malay – Singapore Malay)

Proto-Malay Race** (Temuan)

Austroasiatic Race (Mon – Zhuang – She – Ho – Lyngngam)

Nongtrai Race (Nongtrai)

Santhal-Naga Race (Santhal – Naga – Munda – Kurmi – Sudra)

Meghalaya Race (War Jantia – Bhoi – Maram – War Khasi – Kynriam – Nishi – Pnar – Bai)

Senoi Race (Senoi)

Shompen Race (Shompen)

Garo Race (Garo)

NE Indian Indo-European Race (Mahishya – Bagdi – Gaud – Tanti – Lodha)

Indian Tibeto-Burman Race (Apatani – Nishi – Adi – Tripuri – Jamatia – Mog – Chakma)

Semang Malay Negrito Race*** (Semang – Jehai – Kensui)

Oceanian Major Race*

Micronesian Race (Yap – Kanaka – Toba Batak Indonesian – Kora Batak Indonesian)

Polynesian Race* (Tonga – Western Samoa – French Polynesia – Cook Islands)

Melanesian Race (Fiji – Vanuatu – New Ireland – Papuan Melanesian – Nasioi – Alor Indonesian)

Australoid Macro Race

Australian Major Race***

General Australian Aborigine Major Race***

Queensland Aborigine Race***

Western Territory Pama-Nyungan Aborigine Race***

Papuan Major Race***

General Papuan Race***

Motu Papuan Race***

Sepik-Ramu Papuan Race***

Greater Andaman Islands Major Race***

Greater Andaman Islands Negrito Race***

Onge Andaman Islands Major Race***

Onge Andaman Islands Negrito Race***

Caucasian Macro Race

General Caucasian Major Race***

European-Iranian Race (Most European – Caucasus – Armenian – Jewish – Turk – Kurd – Iranian – Jordanian – Iraqi – Assyrian – Druze – Lebanese – Georgian – Caspian – Palestinian)

Basque Race (Basque)

Norwegian-Swedish Saami Race*** (Norwegian Saami – Swedish Saami)

Finnish Saami Race** (Finnish Saami)

Sardinian Race** (Sardinian)

Kuwaiti Race* (Kuwaiti)

Arabian Race* (Saudi – Yemeni – Bedouin)

West Asian Race (Pashtun – Brahui – Balochi – Makrani – Sindhi)

Tajik Race (Tajik – Bukhara Arab – Shugnan – Kallar – Sourashtran – Yadhava)

West Himalayan Race (Hunza – Bartangi – Roma)

Berber Race*** (Berber)

Egyptian Race (Egyptian)

North African Race (Moroccan – Libyan – Tunisian – Canarian)

Algerian Race (Algerian)

North Indian Race** (Punjabi – Central Indic – Punjabi Brahmin – Rajput – Vania Soni – Mumbai Brahmin – Jat – Kerala Brahmin – Koli)

Himalayan Race*** (Gurkha – Tharu – Ladakhi)

Karnet-Uttar Pradesh Brahmin Race*** (Karnet – Uttar Pradesh Brahmin)

South Indian Race** (Munda – Bhil – Maratha – Rajbanshi – Oraon – Parji – Kolami Naiki – Chenchu Reddi – Konda – Kolya – West Bengal Brahmin – Parsi – Gond)

Kerala Kadar Race*** (Kerala Kadar)

South Dravidian Race*** (Sinhalese – Lambada – Irula – Izhava – Kurumba – Nayar – Toda – Kota – Malayaraya – Tamil)

Kalash Major Race***

Kalash Race*** (Kalash)

African Macro Race

African Major Race***

Tigrean Race*** (Tigrean)

Amharic Race*** (Amharic)

Sudanese-Barya Race*** (Sudanese – Barya)

General Nilotic Race (Shilluk – Masai – Nuer – Dinka – Luo – Turkana – Karanojo – Mabaan)

Funji Nilotic Race (Funji)

Tuareg-Beja Cushitic Race*** (Tuareg – Beja)

Nubian Race*** (Nubian)

Wolof-Peul-Serer Race (Wolof – Peul – Serer)

General Bantu Race (Most Bantus)

Bedik Bantu Race (Bedik)

West African Race (Most West Africans)

Mbuti Pygmy Race

Sara Nilotic-Biaka Pygmy Race (Sara – Biaka)

San Khoisan-Somali Race*** (San – Somali)

Khoi Khoisan Race*** (Nama – !Ora)

Hadza Khoisan Race*** (Hadza)

Sandawe Khoisan Race (Sandawe)

References

Capelli C., Wilson J. F., Richards M., Stumpf M. P. H., Gratrix F., Oppenheimer S., Underhill P., Pascali V. L., Ko T. M., and Goldstein D. B. 2001. A Predominantly Indigenous Paternal Heritage for the Austronesian-Speaking Peoples of Insular Southeast Asia and Oceania. American Journal of Human Genetics 68:432-443.

Cavalli-Sforza L. L., Menozzi P., and Piazza A. 1994. The History and Geography of Human Genes. Princeton, NJ: Princeton University Press.

Chu J. Y., Huang W., Kuang S. Q., Wang J. M., Xu J. J., Chu Z. T., Yang Z. Q., Lin K. Q., Li P., Wu M., Geng Z. C., Tan C. C., Du R. F., and Jin L. 1998. Genetic Relationship of Populations in China. Proceedings of the National Academy of Sciences of the United States of America (PNAS) 95:11763-11768.

Harihara S., Saitou N., Hirai M., Gojobori T., Park K. S., Misawa S., Ellepola S. B., Ishida T., and Omoto K. 1988. Mitochondrial DNA Polymorphism Among Five Asian Populations. American Journal of Human Genetics 43:134-143.

Jablonski, N. and Chaplin, G. 2000. The Evolution of Human Skin Coloration. Journal of Human Evolution. Available on this blog here.

Lĭ H., Pan S., Donnelly M., Tran D., Qin Z., Zhang Y., Cheng X., Yin R., Lin W. and Hoang V. 2006. Dermatoglyph Groups Kinh Vietnamese to Mon-Khmer. International Journal Of Anthropology 21:3-4, pages 295-306.

Lin M, Chu CC, Chang SL, Lee HL, Loo JH, Akaza T, Juji T, Ohashi J, Tokunaga K. March 2001. The Origin of Minnan & Hakka, the So-called “Taiwanese”, Inferred by HLA Study. Tissue Antigens:57(3):192-9.

Omoto, K. (1984). The Negritos: Genetic Origins and Microevolution. Acta Anthropogenetics 8(1-2):137-47.

Omoto K., Ueda S., Goriki K., Takahashi N., Misawa S., and Pagaran I. G. (1981). Population Genetic Studies of the Philippine Negritos. III. Identification of the Carbonic Anhydrase-1 Variant With CA1 Guam. Am J Hum Genet. 33(1): 105-111.

Reddy BM, Langstieh BT, Kumar V, Nagaraja T, Reddy ANS, et al. 2007.

Useem, John. 1948.

Robert Lindsay

Uzbek, Kazakh, and Kirghiz

A recent What Language Is This post featured Uzbek. A commenter just got the correct answer today. Hence it is time for a post on Uzbek and two neighboring Turkic languages, Kazakh and Kirghiz.

Uzbek

Uzbek is a well-developed language. I see scholarly papers written in Uzbek. In addition, Uzbek is actually two separate languages: the well-known language to the north, Northern Uzbek, which is the official language of a nation, and another language to the south, Southern Uzbek, spoken by 2.5 million people in Afghanistan, which has no official status. The two Uzbeks are not mutually intelligible.

Kazakh

Kazakh has a lot of problems. A lot of people don’t speak it, and there are many ethnic Russians who speak Russian there. They live in the north. There was a lot of talk of them leaving with the independence of Kazakhstan, but most of them stayed. Insane Russian ultranationalists claim the northern part of Kazakhstan for the Russians and wish to incorporate it into Russia (just to show you how insane those lunatics are). A lot of people would rather speak Russian than Kazakh, even a lot of ethnic Kazakhs.

They have had a hard time developing Kazakh into a full modern language, and there are a lot of issues with Kazakh in the schools. Furthermore, there has been profound Russian influence on the language in terms of vocabulary, phonology, and possibly morphology.

There are many Kazakh speakers in China, over 2 million of them. They have gotten caught up in the Uyghur political mess.

New information shows that they speak a separate language from Kazakh in Kazakhstan. So many Russian words have gone into Kazakh since World War 2 that Kazakh speakers in China can no longer understand the Kazakh TV and radio broadcasts which can be heard in China. Therefore, Chinese Kazakh is a separate language, and within Macro-Kazakh, there are two languages, Kazakh and Chinese Kazakh. I didn’t get this new information in time to incorporate it into my book chapter published in a recent book (Lindsay 2016) when I redid the Turkic language family, creating a number of new languages.

Kirghiz

First of all, it is not entirely certain that Kazakh and Kirghiz are separate languages. In my book chapter (Lindsay 2016), I decided that this was one language called Kirghiz-Kazakh, with two dialects, Kirghiz and Kazakh. This is because they are largely mutually intelligible. However, the communication is more one-way than two-way. I believe that the Kazakhs can understand the Kirghiz well, but the Kirghiz have some problems understanding Kazakh. However, the difficulties were not great enough for me to split them into separate languages.

However, a Kazakh speaker I communicated with insisted that Kazakh and Kirghiz were definitely separate languages. I forget his reasoning, but he had an intuitive sense of whether a pair of languages consisted of one being a dialect of the other or whether they were two separate languages.

That is, native speakers have excellent intuition on the language/dialect question, which shows how preposterous it is for the Linguistics profession to claim that the language/dialect split is not a scientific question. If it’s not a scientific question, how is it that native speakers the world over have an intuition over whether a lect is a separate language or a dialect of another language? Apparently humans have excellent intuitions about things that simply do not exist in a scientific sense. Who knew?

Kirghiz has a lot of problems. I think just about everyone speaks it, but the problem is that the language is not yet well developed into a modern language, so all sorts of technical and modern terms have had to be invented. Even whole new dictionaries have been created. Bottom line is Kirghiz is not really ready to serve the functions of a national language, even for the state. I believe they might use Russian instead, but I am not certain.

References

Lindsay, Robert. 2016. “Mutual Intelligibility among the Turkic Languages,” in Süer Eker and Ülkü Şavk Çelik. Endangered Turkic Languages, Volume I: Theoretical and General Approaches: Before the Last Voices Are Gone (Tehlikedeki Türk Dilleri Cilt I: Kuramsal ve Genel Yaklaşımlar Son Sesler Duyulmadan), Ankara, Turkey/Astana, Kazakhstan: International Turkish-Kazakh University and International Turkic Academy.

Anatolian Homeland for Indo-European: The Argument Is Over

CLAVDIVS AMERICANVS: I don’t have a dog in this fight and I am not an Indo-Europeanist. But check this anti-Kurgan Hypothesis video. The talk about ‘wheel’ cognates across three continents is fascinating.

I know some Indo-Europeanists pretty well. We communicate back and forth. And they have told me that it is now unanimous among Indo-Europeanists that the proper name for the family is Indo-Anatolian, similar to Joseph Greenberg’s Indo-Hittite. In other words, Anatolian itself is so divergent from the rest of IE that it is a sister to all of the non-Anatolian languages.

The argument is over. Indo-European is divided into Anatolian and everything else, so Anatolian is a sister family to all of the rest of IE. That right there shows that Anatolian split far before all the rest. According to the Kurgan Hypothesis, that can’t be so.

And if the Anatolian split is that far from the rest of IE, obviously Anatolia was the initial homeland. Colin Renfrew’s Anatolian homeland theory gained backing when a Bayesian phylogenetic analysis by Atkinson and Gray showed that IE goes back 9,000 YBP.

However, the Kurgan Hypothesis is also correct. Obviously, the Kurgan area was a secondary homeland for the IE people. It looks like IE sat in Anatolia for ~3,000 years, not doing a whole lot, and then went to the Kurgan area 6,000 YBP. I would argue for a secondary split of Tocharian after Anatolian and then all of the rest of IE splitting off from that.

Indo-European being divided into Anatolian first and then all of the non-Anatolian languages after that is similar to how:

  • Turkic is actually Bulgaro-Turkic, as Turkic is divided into Chuvash, etc. and all of the non-Bulgaric languages.
  • Tungusic is now Manchu-Tungus, i.e., Tungusic is divided into Manchu and all of the non-Manchu languages.
  • Tai is split into the Kadai languages and then all of the non-Kadai languages.
  • Inuit is divided into Aleut and then all of the non-Aleut languages.
  • Austronesian is obviously divided into the languages of Taiwan and then all of the non-Taiwan languages, but they are not formally split that way.

We don’t have a lot of these splits in IE itself that I’m aware of.

Ethnic Nationalists and Language Classification Mix Like Oil and Water

Mithridates: What’s for damn sure is that ethnic nationalists (Oh, the myriad varieties of them!!) are the #1 threat to any sane and sensible discussion on topics like… and language classifications…

I am not sure if you have read any of my linguistic work, but some of it has already been published. I had to deal with ethnic nationalists a lot (Turkish ethnic nationalists – some of the worst of them all), and it was definitely not pleasant. For instance, they insist that the (IMHO – 53) Turkic languages are all just dialects of Turkish! And good luck trying to disabuse them of that notion. They’re very aggressive and they’re even violent (check out recent videos), and that makes them even more scary.

Right now I am dealing with a Macedonian ethnic nationalist (all Balkan varieties are very unpleasant to say the least), and he is extremely unpleasant. He is trying to get me fired from my professor job LOL. I’m flattered that he thinks I’m obviously a university professor, but nope, I’m not. So I wish him luck getting me fired from a job I don’t have.

Beyond that, ethnic nationalists are the bane of language classification. There are so many “dialects” that are so obviously separate languages but we can’t split them because ethnic nationalists run the discourse in those countries. Idiotically, my field utterly unscientifically states that there is no way to tell a language from a dialect.

Oh yeah? We can put a man on the moon but we can’t develop successful definitions of language and dialect? How absurd is that?

So we stupidly throw up our hands and say this is not a linguistic question (though obviously it is) and say the distinction between the two is a political matter (!), so we throw it over to the most dishonest reprobates on Earth next to out and out criminals, namely, politicians! Of course politicians never lie or anything like that!

So really we should take all of our scientific questions over to politics and let politics answer these questions! Hell, politics won’t even give you a straight answer if you ask it what time or day it is. If a politician’s mouth is moving, he’s lying. It’s practically a requirement to score high on the psychopathy scale to be a politician. So let’s let these pathological lying sociopaths called politicians answer our scientific questions in Linguistics!

Ethnic nationalists have infiltrated language classification by petitioning to get languages removed from their countries, as they wish to believe that the only language in say Ruritania is Ruritanian, and all of the other languages, no matter how different, are dialects of Ruritanian!

So Basque is just a dialect of Spanish, right? And Saami or Lappish is a dialect of Swedish. And Sorbian is a dialect of German. And Breton and Basque are dialects of French. As you can see, we could go on and on here.

There are probably 2,000 languages within the scope of “Chinese,” yet the Chinese government lies and says there is only one Chinese language. We linguists have to go along with this insanity because…why?

Ethnic nationalists dishonestly removed several Occitan languages and several North Germanic languages in Sweden, among other places. I can’t believe that SIL (the publishers of Ethnologue who are now in charge of handing out ISO codes for new languages) fell for this.

Minority Languages in Russia

I’ve been working on this article for at least a year now, but actually I think it has been in my files for longer, up to five years. You can see that much of the information is a bit out of date as a result. A lot of this information was translated from Russian sources. The translations to English were poor, so the whole mess needed a huge rewrite from mangled Russian to English translation to a more proper English.

I’ve done this a number of times before and it was never easy. For some reason this is always a lot harder than it seems. For one thing, I had to eliminate entire sentences because I couldn’t properly understand what they were saying or they were saying something that didn’t seem correct to me.

For that matter it is quite hard to rewrite something written in seriously mangled English by someone who can’t write or even worse by someone who has English as a second language and doesn’t write it well. You would think it would be easy to turn mangled English into proper English, but it’s just not.

This post is pretty long. It runs to 33 pages on the web. If it were in a book, it would run to 16 pages.

According to the Constitution of Russia, Russian is the official language on the whole territory of the Russian Federation, but regions are given the right to establish republics and set their own national languages. The Constitution also guarantees the right of all the peoples of Russia to preserve their native language and to create conditions for its study and development.

According to the Basic Law of Languages, citizens have the right to use their native language as the language of communication, education, learning, and creativity.

We will now look at the study of native languages in the schools of the Russian Federation in the areas within the jurisdiction of the regional authorities. In Russian schools, 89 different languages are studied, of which 39 are used as the language of instruction.

Adygea

In 2007, Parliament passed a law mandating the compulsory study of the Adygean language for Adygean children in schools where Russian is the mode of instruction. However, this law was repealed in 2013. Recently, March 14 was designated the Day of Speaking and Writing the Adyghe Language. Parents of preschoolers may also choose to put their children in Adygean-language public kindergartens.

The Ministry of Education and Science reported the results of Adygean language teaching in the schools: in 43 preschools, 4,759 Adygean children study the language. In 127 preschools, children are taught the basics of Adyghe culture, customs, and traditions.

All students in Russian-medium schools must study the history and geography of Adygea, and Russian-speaking pupils have a choice of studying Adyghe Language or Adyghe Literature. 22,000 students are currently studying Adygean Language, and 27,600 are studying Adygean Literature.

Altai

There are regular proposals from the Altai people and educators to mandate the compulsory study of the Altai languages Northern Altai and Southern Altai for Altai children. Both Northern and Southern Altai are divided into three divergent dialects each, so there are actually six separate Altai languages. The three languages of the northern and southern groups each were combined into a Northern Altai and Southern Altai official language respectively.

Recently, an attempt was made to pass such legislation, but government legal scholars felt the law would violate children’s rights.

In Gorno-Altaisk on March 15, 2014 at the 9th Session of the Altay Culture Meeting, representatives of the Altai people went further, adopting a resolution to mandate Altai languages study for all students, no matter their ethnicity. However, attendees warned about a Russian backlash.

They felt that such a law would inevitably lead to rising dissent among Russians and other non-Altaians in the republic. This unrest could conceivably lead to the elimination of republic status for the Altai Republic itself.

Bashkortostan

A law is in place in Bashkortostan mandating the compulsory study of the Bashkir language by all students. Each educational institution gets to decide how many hours per week they wish to devote to Bashkir study. Parents of Russian children regularly protest this law and propose to make the study of Bashkir voluntary instead. Chuvash parents have also protested the law. Ethnic tensions have heightened in the area recently.

Buryatia

The question of the possible introduction of compulsory study of the Buryat language in republic schools has been discussed recently and has wide public support. Recently, a video titled, Buryad Heleeree Duugarayal! – “Let’s Speak Buryat!,” was released, urging Buryats to not forget their native language.

However, regional authorities decided to keep the study of Buryat optional in the republic. A few deputies appealed the ruling, and various amendments were adopted at their request, but the amendments did not substantially change the authorities’ decision to keep Buryat study optional. Opponents of the idea of compulsory study of Buryat in the schools fear that it will lead to the emergence of ethnic tensions.

Chechnya

In Chechnya, the national language is taught in all schools of the republic as a separate subject. Since 9

Although the national language is widely used in everyday life, the scope of its use continues to steadily narrow. At the last roundtable of the Ministry of Culture of the Chechen Republic, officials noted what they felt was the alarming process of mixing Chechen and Russian in speech as well as a gradual tendency towards replacement of Chechen in the official sphere.

According to the director of the Institute of Education of the Chechen Republic, Abdullah Arsanukaev, the introduction of Chechen language instruction in the schools could ameliorate this situation. The government for its part is working to equalize Russian and Chechen on the official level. A state commission for the conservation, development, and dissemination of the Chechen language is expected to be created.

Chukotka Autonomous Okrug

The main languages in Chukotka are Chukchi, Eskimo, and Even. The government is now working on a program for the development of these languages. So far, the Association of Indigenous Peoples of Chukotka has organized courses in Chukchi and Even.

Chukchi is the language of everyday communication for most Chukchi in the family and when engaging in traditional economic activities. In schools in Chukchi villages, Chukchi classes are compulsory in primary school and optional in high school.

Chuvashia

The Chuvash language is taught as a compulsory subject in schools and in a number of universities for one or two semesters.

“In the beginning, a lot of parents opposed their children studying Chuvash. But today I can say with confidence that these parents no longer feel this way. In contrast, some even want their child to know the native language of Chuvashia, and probably rightly so,” says Olga Alekseeva, a teacher of Chuvash language and literature in School № 50 in Cheboksary.

The acuteness of the language issue in the country can be judged by recent events – in 2013, a court found Chuvash journalist Ille Ivanova guilty of inciting ethnic hatred for a publication about how the Chuvash language was disadvantaged in the Chuvash Republic.

The recent language reform intensified the debate around the native language. According to opponents of the reform, the new rules impoverished the language and could catalyze its Russification.

Crimea

The newly adopted constitution of the new Russian region declared three official languages: Russian, Ukrainian, and Crimean Tatar. Education in schools will be carried out in these three languages.

Russian-speaking parents of children from Buryatia, Bashkortostan, and the Tatar Republic residing in Crimea have already appealed to the President of Russia and the leadership of Crimea, asking that the study of Ukrainian and Crimean Tatar be made voluntary in Crimea.

Activists fear that unless the law is rewritten, in the future, all children regardless of nationality will be obliged to study all three official languages. Signatories cite the example of their national republics, where Russian-speaking students have to learn a foreign language, the titular language of the republic.

Dagestan

The people of Dagestan speak 32 languages, although only 14 native languages are officially recognized. Elementary schools allow instruction in 14 different languages, depending on the region. The rest of the instruction is in Russian.

According to Murtazali Dugrichilova of the North Caucasus radio station Freedom, the native language of the local ethnic group is spoken in most parts of the country as the language of the home. “In rural areas, all of the local languages are spoken. In large cities such as Makhachkala or Derbent, teaching in national languages is optional,” he said.

In the future, at the suggestion of Ramadan Abdulatipova, Dagestan will form a commission on the use of Russian and the local languages of the republic. It is also expected that after the adoption of the law “On Languages of the Republic of Dagestan,” all 32 languages in the country will receive official language status.

Director of the Institute of Language, Literature, and Art at the Dagestan Scientific Center Magomed Magomedov believes that after enactment of the new law, all of the native languages of the region will be present in the school system.

Dagestan took into consideration the negative experiences of other national republics in this area, and according to Magomedov, the law will prohibit demonstrations and pickets about language issues.

Ingushetia

According to the law “On the State Languages of the Republic of Ingushetia,” Ingush and Russian are both used as official state languages in all educational institutions in the country.

Experts believe that the preservation and development of Ingush is necessary to ensure it is on an equal footing with Russian in all aspects in the republic. In addition, there has been a lot of discussion about the need to develop new words in Ingush for modern things such as industrial terminology.

Kabardino-Balkaria

In Kabardino-Balkaria, the debate over language issues flared up in connection with the adoption of amendments to the law “On Education.” The law mandates that both languages, Kabardian and Balkar, be used in education for children who have one of these languages as a mother tongue.

Kalmykia

According to the law “On Languages of the Republic of Kalmykia,” in schools where instruction is in Russian, the Kalmyk language will be introduced starting in first grade as a compulsory school subject. Representatives of non-Kalmyks in the republic are unhappy with this law, but they have not said much about it.

Language activists point out that Kalmyk has a low status in Kalmykia. As an example, they cite the fact that cultural events and even national holiday celebrations are exclusively in Russian.

Karachay-Cherkessia

In the republic, Abaza, Karachay, Nogay, Circassian, and Russian are all official languages. The Constitution of the republic mandates compulsory education in the native language for students who have one of the above as a native language.

In addition, according to the law “On Education,” in Russian-language schools, students who have a native language other than Russian must be taught their native language as a compulsory subject. National activists think that the best outcome is achieved when native languages are used as a mode of instruction and not taught as a special subject. At the moment, the republic is in the process of updating textbooks in Abaza, Karachay, Nogay, and Circassian.

Karelia

Karelia is the only national republic of the Russian Federation in which Russian is the only state language. One of the problems with raising the status of the Karelian language here has been the fact that Karelians are a minority in their own republic, and as a consequence, the republic has only a relatively small number of Karelian speakers.

Recently, President Anatoly Grigoryev of the Karelian Congress fielded a proposal to declare three official languages in Karelia: Russian, Karelian, and Finnish. The proposal was modeled on Crimea, where authorities promised to introduce trilingualism as the official policy.

National languages are optionally taught in preschool, elementary school, and high school. According to the Ministry of Education in 2013, 6,500 students studied Karelian, Finnish, and Veps.

Khakassia

As in many republics, the Khakass language is preserved mainly in rural areas that are densely populated by indigenous peoples. Khakass language study is mandatory in all national schools in the republic.

Meanwhile, Political Science professor Gunzhitova Handa said that in Khakassia on September 1, 2014, Khakass classes became mandatory from grades 1-11, with an exam in Russian, Russian-Khakass, and Khakass schools.

Khanty-Mansiysk

According to NGOs, there is only one native language course for the 4,000 speakers of Khanty and Mansi in the republic. Language loss in both languages has been accelerating in recent years. Representatives of youth organizations of indigenous peoples of the North have offered drastic solutions, including depriving Khanty and Mansi people who do not know their native language of national benefits.

According to Hope Moldanova, president of the Ob-Ugric Peoples youth organization, “Young people have a different attitude towards their native language nowadays. Some of them are fluent in two languages, some understand but do not speak their native language, and others think it is sufficient to only know Russian, which is spoken by the majority.”

She too is concerned that the new generation is less interested in the national languages. Due to the low demand for the specialty, Ugra State University even closed its Finno-Ugric Language Department.

Khanty still has 10,000 speakers in three divergent dialects.  The dialects are so divergent that they are actually separate languages. 4

In the east, there are still some child speakers, but there has been a general shift to Russian. Intergenerational transmission of Khanty has stopped in the south. Schools in Khanty-speaking areas generally use Russian as the mode of instruction.

Mansi has 1,000 speakers, 5

The northern dialect has most of the remaining speakers. There are only a few remaining elderly speakers of the eastern dialect. The southern dialect went extinct before 1950, and the western dialect is probably also extinct.

Komi

The Ministry of Education introduced the compulsory study of Komi language from the first grade in 2011. Later that year, in September 2011, the Constitutional Court ruled that the study of the Komi language in schools of the republic was mandatory. Now schools may choose two different Komi language study programs – “like a native” (up to 5 hours per week) or “as a state language” (2 hours per week in the primary grades).

According to Natalia Mironova, an employee of the Komi Scientific Center’s Ural Branch, this has led to latent discontent among the youth. She said high school students do not understand why they should waste time studying the Komi language when it takes away precious time they could be using to study for their math exams.

Mari El

In the Republic of Mari El, where the official languages are Russian and Mari (Meadow Mari and Hill Mari), mandatory study of Russian and one of the Mari languages was introduced in 2013. Analysts say that among the Russian population, there is growing dissatisfaction with the fact that they are forced to learn what they consider to be an unnecessary language, but there have been few protests about the matter.

Mordovia

The republic introduced the compulsory study of either the Erzya or Moksha language in all schools of the republic in 2006. Originally, mandatory study of these languages only took place in national schools in districts and villages where there were many Erzya and Moksha people residing. Before that, since 2004, teaching of these languages had been optional in Russian-language schools.

When the compulsory study of these languages was introduced, there were signs of dissatisfaction on the part of Russian-speaking parents. Now, the number of dissatisfied parents has significantly decreased, and their voice is almost imperceptible.

Nenets Autonomous Okrug

In NAO there are 43,000 people, of whom about 7,500 are members of the titular population, the Nenets. The main problem in the study of the Nenets languages, Forest Nenets and Tundra Nenets, is the lack of books and teachers.

Tundra Nenets still has a good number of speakers, but Forest Nenets is only spoken by a small population. Tundra Nenets has speakers of all ages and is still spoken by children. However, in the west of the republic, a shift to Komi and Russian is underway.

According to Lyudmila Taleevoy of the Methodist SBD Nenets Regional Center for Education Development, the pedagogy programs at the university level no longer prepare specialists in teaching Nenets. Instead, children are taught Nenets by Russian-speaking teachers who studied Nenets when they were students. An old, outdated Nenets grammar is used in instruction.

North Ossetia

According to the regional law on languages, children have the right to choose schooling in one of two languages, Russian or Ossetian. Ossetian consists of two dialects, Iron and Digorian. The two dialects are so divergent that they are basically separate languages.

According to Ossetian journalist Zaur Karaev, all students who have another language as a native tongue, such as Armenians, Ukrainians, Azerbaijanis, and others, must study their native languages in language classes in the primary grades. The language teaching program is more complicated in high school.

Tatarstan

In Tatarstan, where only half of the population belongs to the titular ethnic group, the Tatars, the study of the Tatar language is compulsory for all. Non-Tatar speaking parents regularly protest this law. They even appealed to the Prosecutor’s Office claiming that the law discriminated against Russian-speaking students, but an inquiry by the prosecutor’s office found no violations.

Meanwhile, Tatar nationalists for their part remain alarmed about the state of the Tatar language. According to them, Tatar has a low status in the republic – for instance, in the streets, most writing on storefronts is in Russian, not Tatar. There are also problems with Tatar in TV media, and there is no university that conducts all of its teaching in Tatar.

Nevertheless, the republic regularly implements Tatar language projects and programs, a recent one being the introduction of Tatar study in kindergartens.

Tuva

In contrast to most of the other republics, in Tuva it is Russian that is in bad shape, not the titular language, Tuvan, which is doing much better. In 2008, a report noted that Russian was in terrible shape in Tuva.

According to Valerie Kahn, a researcher in the Sociology and Political Science Departments at the Tuvan Institute of Humanitarian Research, the authorities were forced to pay attention to this problem. 2014 was declared the Year of the Russian Language in Tuva. As a consequence, systematic measures have been taken to ensure that children in rural areas can learn Russian.

According to Kahn, the Tuvan language is in excellent shape. Travelers also note that residents of the republic mostly communicate in Tuvan, although most signs on the streets are in Russian.

Meanwhile Tuvan journalist Oyumaa Dongak believes that the national language is oppressed. On her blog she notes that it is difficult to find Tuvans who speak pure Tuvan without Russian admixture, and even in the government, most employees do not know Tuvan. At the same time, she points out that the state allocated $210 million for the development of the Russian language and nothing for Tuvan.

Udmurtia

The State Council of Udmurtia recently rejected an initiative on compulsory study of the Udmurt language in the schools of the republic.

Earlier, a similar initiative was made by the association “Udmurt Kenesh.” According to them, the compulsory study of Udmurt would fight the loss of the language in families where the parents do not speak Udmurt with their children, as well as develop a culture of multilingualism among citizens. Russian activists have sharply opposed the proposals.

According to the interim head of Udmurtia, Alexander Solovyov, the budget annually allocates money for teaching and training in the titular language.

Yakutia

According to the law of the Sakha Republic “On Languages,” the languages of instruction in secondary schools are Sakha (Yakut), Evenki, Even, Yukaghir, Dolgan, and Chukchi, with Russian used in Russian-language schools.

In the non-Russian medium schools, Russian is taught as a subject. Local official languages of various parts of the republic are also taught as subjects in Russian schools in areas in the north where there are large numbers of Evenki, Even, Yukaghir, Dolgan, and Chukchi speakers. In spite of the measures to preserve native languages other than Yakut, all except Yakut have been losing speakers in recent years.

In fact, Evenki, Even, Yukaghir, Dolgan, and Chukchi are only used as the principal means of communication in seven villages and towns. In all other places, most residents no longer speak those languages, and the languages are used mostly by the middle aged and elderly, and even then only in the home or in families that preserve traditional lifestyles like reindeer herding.

In Even areas, Even is taught as a subject from preschool through primary school. Even is an endangered language with 5,500 speakers.

In areas where the Evenki live, Evenki is taught from preschool through primary school, with an optional course in the eighth grade. Evenki is considered an endangered language. It has 25,000 speakers.

Dolgan, a language very closely related to Sakha, only has 1,000 speakers, and the number continues to decline. Mixed marriages are a problem: when a Dolgan speaker marries a speaker of another language, the children are raised in Russian, and hence intergenerational transmission is broken. However, Dolgan is still spoken by all ages and is still being learned by children.

Chukchi still has 5,000 speakers and is considered to be in good shape. It is used in mother tongue education in regions where Chukchis predominate.

Yamal-Nenets Autonomous Okrug

Yamal-Nenets Autonomous Okrug faces problems common to republics where languages with only small numbers of speakers remain. The main indigenous languages spoken here are Nenets, Khanty, and Selkup.

YaNAO has shortages of teachers for all three languages, both for native language study classes and for mother tongue education, which is offered in the nomadic schools. Other problems these languages face are language teachers who lack the skills to teach beginning language learners and a shortage of instructional materials in the languages.

The Selkup language has 1,000 speakers, but it is in fairly good shape. It is only taught in the north of the speaker region and even there only until the fourth grade. In a couple of areas of the north, the language is still spoken by Selkups of all ages and also spoken by non-Selkups who reside there. In the north, 9

Problems

Virtually all minority languages in Russia suffer because parents and students themselves prefer to learn and speak Russian. This is not surprising, as Russian is not only spoken by the majority of the population, but it also remains the main language of interethnic communication in multinational Russia.

Students must pass the compulsory USE exam, a Russian proficiency test, in order to graduate from high school. Hence, students tend to study Russian more than other languages, including their own native language, in order to pass the test.

Nevertheless, the fact remains that the native language is the basis for the culture and preservation of the ethnic group. If the language dies, the culture and in a sense the group itself die with it. Hence, promotion of native languages remains an important goal in Russia. Each region is trying to solve the native language problem in its own particular way.

Compulsory study of the official language of the particular region for all students has not had good results. For example, in Tatarstan, all students are required to study Tatar even if their native language is not Tatar.

This led to opposition by Russian-speaking parents who saw no use in their children studying Tatar. Further, it has led to the feeling that people who do not speak the language of the titular republic are being oppressed on the basis of their nationality.

Voluntary native language classes in schools do not lead to increased interest in native languages among youth. Realizing this, many regional governments have begun moving the national native language more into day to day life; for instance, by translating books and street signs into the national language.

Communication in the family itself from parents to children remains the best way to preserve native languages. Peoples who pursue traditional occupations also tend to preserve their languages longer. Also, not everything can be translated into Russian. For instance, in the north, people still use their native language for items and concepts that have no good translation in Russian.

With the Internet has come increased interest among native peoples in preserving their culture, and consequently the Net now offers more opportunities to learn native languages. On the other hand, the presence of Russian on the Net has had a bad effect on native languages.

For instance, with the advent of the Internet, many more Russian borrowings and neologisms went into native languages. In addition, people on the Net using native languages often do not write their languages properly. This leads to impaired learning of the correct rules and spelling of the language.

According to Olga Artyomenko, head of the Center for National Education Problems at FIRO MES, a number of republics are reducing the hours of Russian instruction in the schools.

According to her, changes in the laws are needed in order to remove tension between ethnic groups and improve the quality of language instruction.

In particular, she recommended the removal of terms such as “non-Russian native,” “nonnative Russian,” and “Russian as a foreign language” from the laws of Russia.

The State Duma Committee of Nationalities has long been working on a bill to update the legal place of the native languages of Russia. The bill has been received positively by the regions. Nevertheless, it has not yet passed the Duma.

Praise for my Work

I hope I haven’t published this before, but if I did, hey, chalk it up to vanity, eh?

These two glowing recommendations are from this fellow. I really like him a lot!

Peter S. Piispanen, Stockholm University, Graduate Student

On my work below, presently a 242 page, well, let’s face it, at this point, it’s basically a book, right? I have not yet found a publisher for it, though I have received some rave reviews from such far-flung places as Japan and Russia.

Mutual Intelligibility of Languages in the Slavic Family

Intelligibility studies are both interesting and of importance for the study of phonology, grammar, historical linguistics, the effect of language contact situations, as well as the sociocultural factors influencing languages perceived as high or low status, and so on.

Lindsay here presents the intelligibility between many of the Slavic languages in great detail – and this clears up many common and unspoken questions about these languages…the paper comes well recommended!

My Turkic paper was actually published, believe it or not, and it had to go through two peer reviews to get there. The second peer review included the world’s top Turkologists.

Here’s the cite in case any of you are interested:

Lindsay, Robert. 2016. “Mutual Intelligibility among the Turkic Languages,” in Süer Eker and Ülkü Çelik Şavk (eds.), Endangered Turkic Languages Volume I: Theoretical and General Approaches: Before the Last Voices Are Gone (Tehlikedeki Türk Dilleri Cilt I: Kuramsal ve Genel Yaklaşımlar: Son Sesler Duyulmadan). Ankara, Turkey/Astana, Kazakhstan: International Turkish-Kazakh University and International Turkic Academy.

I also came up with the subtitle of the series – “Before the Last Voices Are Gone.” We went round and round about a few choices until we settled on that one. It has a nice literary beauty to it, I think.

I never did get a hard copy of that book I am published in. It was extremely hard to get a copy in part because it cost $75 and also because it would have had to have been shipped from Turkey to the US, and I understand that shipping costs for such things are just awful.

I have an e-copy of course, but it’s just not the same thing as a book, right? A book – you know, that hard thing with pages in it that you actually hold in your hand? Remember those things from a long time ago, maybe before some of you were born? If you don’t remember what a book is, perhaps ask your parents. They should definitely know what a book is.

It seems that a lot of publications are going pretty much e-publication only with no hardcover. Color me disappointed. No folks, it’s not the same thing. It’s just not. Sorry.

Mutual Intelligibility Among the Turkic Languages

A massive paper by Robert Lindsay on the study of mutual intelligibility of the Turkic languages, dispelling many myths and including language examples, historical considerations, and more – heartily recommended for any Turkologist or student of any Turkic language!

The Laz People of Turkey

The last Spot the Language piece was solved by a Turkish commenter who is half Laz, the ethnic group that speaks the language. Here is his comment about the Laz and the region where they reside. It is a very nice comment, and I would like to thank the commenter very much.

Ertuğrul Bilal: I am a Turco-Laz half-breed. There are somewhere between half a million and close to one million people like me. I identify as a son of the homeland and not as any particular ethnicity. This is also the primary identity adopted by almost all Lazes, who see themselves as ethnically Laz only secondarily. Let’s put it this way: Black Sea people’s loyalty is more territorial than ethnic, just like cats.

FYI: Laz is not related to Turkish or any other Turkic language. It is part of the Kartvelian linguistic family, consisting of Georgian, Svan, and the Mingrelian-Laz twin peoples. The single substantial difference between the last two being that Mingrelians remained Orthodox, while Laz converted to Islam in late 15th and 16th century; otherwise the discrepancy is solely dialectal.

Laz people live on Northeastern Black Sea coast, actually at the eastern end towards the Turkish-Georgian frontier. This region has always been multi-cultural just as Anatolia used to be, only somewhat more so; even if superficially it is less obvious nowadays.

The local populace was originally mainly Tzans, a rather obscure culture, apparently resulting from an amalgamation of indigenous populace with immigrating/invading Cimmerians, westward-advancing Kartvelians and perhaps some other not well-known tribes ancestral to both Mingrelians and Laz in Antiquity when Greek colonizers founded practically all cities and most of the towns.

Today, you may find Turks (Alevi Turcomans forcibly relocated there by the Ottoman empire in 16th century who converted to Sunnism, except for a few thousand who remained Alevi) and other people of Turkic origin like my late father who told me his paternal lineage emigrated from Northern Dagestan and was either Nogay or Kumyk.

In addition, there are now Lazes, Georgians, Armenians (Hemshinids Islamicized long ago and some others forcibly assimilated to Turks in 1915), and Islamized Greeks, to mention only the most numerous.

Let’s put it this way – we are accustomed to quite a wide diversity of ethnicities in our country and especially in my parents’ native region, even if the official doctrine still tends to disregard the fact, and while it is not outright denial as in the past, a more subtle denial yet exists.

Praise for Two of My Papers

Mutual Intelligibility of Languages in the Slavic Family

Intelligibility studies are both interesting and of importance for the study of phonology, grammar, historical linguistics, and the effect of language contact situations, as well as the sociocultural factors influencing languages perceived as high or low status, and so on.

Lindsay here presents the intelligibility between many of the Slavic languages in great detail – and this clears up many common and unspoken questions about these languages…the paper comes well recommended!

This work yet remains unpublished. At the moment, it is 260 pages. At this point it is pretty much a book. It is simply a massive overview of all of the Slavic languages and it does add some new languages to Slavic. It also changes the classification around a bit. This might be harder to publish, as Slavicists are a pretty stuck-up lot and if you are not a Slavicist yourself, they pretty much don’t want to talk to you or publish your work.

Turkologists are a bit like that too, but there are not nearly as many of them, and they accepted my work as an amateur.

I did have one Slavicist tell me that all of this data has never been published before in a single work. Sure, it has mostly been published, but most of it is in obscure journal papers that no one reads. And no doubt a lot of good Slavicist work is being done in the Slavic languages themselves and not in English. I know this is true in the Balkans, as much of the work is still published in Serbo-Croatian.

I sent the paper out to a few Slavicists, including some of the top ones in the field, but I never heard back from them. All of these folks are extremely busy even at an advanced age and this is pretty much a book after all. They are always behind in their reading, preparing new papers, teaching, etc.

Mutual Intelligibility Among the Turkic Languages

A massive paper by Robert Lindsay on the study of mutual intelligibility of the Turkic languages, dispelling many myths and including language examples, historical considerations, and more – heartily recommended for any Turkologist or student of any Turkic language!

This paper was published as an 81-page chapter in The Handbook of Endangered Turkic Languages, Volume 1. There is a URL for it in published form if you would like to see it.

I haven’t got much other feedback from it except from a major Turkologist who said she really liked it. I was amazed.

A Turkologist whose work was quoted incorrectly was very angry at me for misquoting what she said. I am not a Turkologist myself, so this stuff is going to happen. I tried to assuage her, but she wasn’t having any of it.

No one yet has written a book review on this book, nor has anyone commented on my re-do of Turkic, cumulatively adding 14 new languages and changing the classification a bit. The Wikipedia article about Turkic classification does not include mine. There is a chart at the end with my classification in it.

A Look at the Altaic Question, a Current Controversy in Linguistics

                Turkic    Written Mongolian    Tungusic*
1P sing.:
nominative      ban       bi                   bi
oblique stem    man-      min-                 min-
2P sing.:
nominative      san       chi    (<*ti)        si
oblique stem    san-      chiin- (<*tin)       sin-
* e.g. Evenki and Manchu

The Altaic argument is one of the biggest controversies in current linguistics. It is said that Linguistics has decided that Altaic does not exist. Actually, the field has not decided that at all. The consensus in the field is that Altaic is still an open question. In other words, they are fighting about it. The field is split up into Pro-Altaicists and Anti-Altaicists. It’s not true that the field has decided in favor of the Anti-Altaicists. The Antis say that there is no such thing as Altaic. The Pros said that Altaic exists, and here is the evidence. The consensus instead rejects both positions and says we don’t know if Altaic exists or not. There is a big difference between we don’t know if it exists (maybe it does and maybe it doesn’t) and it doesn’t exist. One statement is uncertainty and the other statement is negative. According to Anti-Ataicists, every time a human can’t make up their mind about something yes or no, they actually are saying no. No they’re not! They’re not saying yes or no. They are rejecting both positions and saying instead that they are undecided. What the Anti-Altaicists are doing is akin to saying everyone who answers undecided on a political candidate poll is actually saying that want to vote against the person! The entire basis of political polling would change. The Anti-Altaicists are typically quite vicious, while the other side is not. The safe position is Anti-Altaicism, so a lot of wimpy linguists too scared to stand up and fight have sought refuge in the negative position. Furthermore, Linguistics is like an 8th grade playground. Some positions are openly ridiculed. Pro-Altaicism is openly ridiculed, and taking that position is seen as prima facie evidence that a linguist is a crank, an idiot or a fool. I would imagine that if you told a hiring committee that you believed in Altaic, it would be harder to get hired than if you took the negative stand. And I could imagine that being pro-Altaic might keep you from getting tenure. 
Not only are the Antis vicious (all of them are vicious, bar none), but many of them are complete idiots and fools, as seen above in the preposterous conflation of uncertain opinions with negative opinions. The fools on Bad Linguistics Reddit are evidence of this. They all hate Altaic because they are wimps who are too afraid of a fight, so they take a safe position. They bashed me for saying Altaic was real, saying it was evidence of what a kook and crank I am, when in fact, “Altaic exists” is a completely acceptable position to take. Many famous linguists have supported Altaic in the past, and a number of top linguists currently support it. Anti-Altaic papers are often vicious by the standards of academic papers. In academic papers, you are supposed to be restrained and keep your strong opinions to yourself. Not so with anti-Altaicists. They are over-the-top insulting and ridiculing towards Altaicists. Altaicists have accumulated quite a bit of evidence in support of their position. The pronouns above prove Altaic for me. All I have to do is look at those pronoun sets (and there are other pronouns that also line up precisely like above) and I know it’s real. This is what Joseph Greenberg means when he says that proving whether language families exist and reconstructing proto-languages are two different things. You figure out a language family by simple inspection. Greenberg uses the mass comparison method, and it has worked very well for him for African languages. His Amerindian languages proposals have not been well accepted, but it’s clear that there is a large family called Amerind. There is 1st person m and 2nd person n all through the family, occurring ~450 times. Personal pronouns are rarely borrowed, and entire personal pronoun sets are almost never borrowed (Piraha did borrow all of its pronouns, but Piraha is bizarre in many ways).
Johanna Nichols, a current spokesperson for the conservative Linguistics Establishment as good as any other (and a fine linguist to boot), states that the current consensus is that there is no such thing as Amerind and that those 450 similar pronouns are all cases of borrowing. Wow! Personal pronoun sets (not just one pronoun but an entire paradigm) were borrowed 450 times in the Americas! That’s one of the most idiotic statements that one could make, but this is the current consensus of linguistic “science.” Dumb or what? A much better position would be to say that Amerind is uncertain (maybe it exists, maybe it doesn’t), as the negative position is preposterous and idiotic right on its face. Nichols has also stated that all of the Altaic pronouns were borrowed. That’s even more idiotic because unlike in the Americas, entire large pronoun paradigms exist in Altaic where they do not exist in Amerind. Paradigms, especially pronoun paradigms, are almost never borrowed, and paradigm evidence is considered excellent evidence of genetic relationship. English good, better, best is the same paradigm as German gut, besser, besten. That’s an odd way to set up comparatives, and a comparative set that lines up perfectly like that is what is known as a paradigm. That one paradigm right there ought to be enough to prove the relatedness of English and German, even leaving out all other massive evidence for relatedness. Greenberg says that after you decide that languages form a family, then you set about using the comparative method of reconstructing proto-languages, finding sound correspondences and whatnot. The current conservative or reactionary position of the field is that first you reconstruct the proto-languages and then and only then can you prove a language family. That’s absurd. They’re in effect doing everything ass backwards.
Incidentally, long ago Edward Sapir agreed with Greenberg that language families were proven first by inspection and only later did reconstruction take place. Sapir also came up with the Amerind hypothesis decades before Greenberg. Sapir is quoted as saying:

Getting down to brass tacks, how are you going to prove Amerind 1st person m and second person n other than genetic relatedness? – Edward Sapir, 1917?

Who was Edward Sapir? Only one of the greatest linguists in history. I can look right there at that pronoun paradigm set and tell you flat out that those three language families are related. It’s not possible that all of those languages borrowed all of those pronouns. It didn’t happen. It didn’t happen because it couldn’t happen. It’s beyond the realm of statistical probability. A statement that is outside the realm of statistical probability is considered to be for all intents and purposes nonfactual. Ask any Statistics major. Not only has Proto-Altaic been reconstructed at least in a tentative and initial form, but there are regular sound correspondences running through all of the comparative lexicon of the three proto-languages: Proto-Turkic, Proto-Tungusic and Proto-Mongolian. Regular sound correspondences are another thing we look for. It would mean that every time you have VlV in Language A, you have VnV in Language B (V = vowel). We then say that Language A l -> Language B n. Regular sound correspondences are considered to be excellent evidence of genetic relatedness. In fact, an entire etymological dictionary of Altaic has been produced, reconstructing a lot of Proto-Altaic lexicon along with the cognates in the daughter languages. This dictionary runs to over 1,000 pages, and it is a true work of art in the social sciences. The entire etymological dictionary has been rejected out of hand by the Anti-Altaicists. However, they have not directly attacked or tried to prove many of the etymologies wrong. They simply looked at it, said it’s junk, laughed at it and ridiculed it, and moved on. This conservative or even reactionary mood has been the norm in Historical Linguistics for decades now. The field has become very stick-in-the-mud about this.
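For readers who like to see the idea of a regular sound correspondence spelled out, here is a minimal sketch in Python. It is only an illustration under loud assumptions: the word pairs are invented toy forms, not real Turkic, Tungusic or Mongolic data, the five-vowel inventory is a simplification, and the function names are made up for this example. It simply counts which consonant between two vowels in Language A lines up with which consonant between two vowels in Language B.

```python
import re

# Assumption: a toy five-vowel inventory; real languages need a fuller one.
VOWELS = "aeiou"

def intervocalic_consonants(word):
    """Return the single consonants that sit between two vowels (the V_V slot)."""
    pattern = rf"(?<=[{VOWELS}])([^{VOWELS}])(?=[{VOWELS}])"
    return re.findall(pattern, word)

def correspondence_counts(pairs):
    """Count how intervocalic consonants in Language A line up with Language B."""
    counts = {}
    for word_a, word_b in pairs:
        cons_a = intervocalic_consonants(word_a)
        cons_b = intervocalic_consonants(word_b)
        if len(cons_a) != len(cons_b):
            continue  # skip pairs whose shapes do not line up one to one
        for a, b in zip(cons_a, cons_b):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

# Hypothetical cognate pairs (Language A, Language B), invented for illustration.
pairs = [("ala", "ana"), ("kolu", "konu"), ("tile", "tine"), ("mela", "mena")]
counts = correspondence_counts(pairs)
print(counts)
```

If one pairing such as l in A against n in B keeps turning up across the whole word list while other pairings stay rare, that is the sort of pattern meant by A l -> B n. A real comparison would of course work from vetted cognate sets and proper phonological transcriptions rather than a toy script.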
However, in much of the rest of Linguistics, especially Sociolinguistics, Language Acquisition, and Applied Linguistics, the field has reached consensus on many a silly thing that makes little to no sense at all other than that it sounds very Politically Correct. Linguistics being a social science, PC and SJW Cultural Left culture has infected the field in an awful way. You must understand that Cultural Left views did not just appear in a few select social sciences. Instead this ideology swept through the entire social sciences, sparing not a one. In terms of a March Through the Institutions for this ideology, it was akin to a rapid hostile takeover. Cultural Left and SJW views are now mandatory in Linguistics. If you refuse to go along, you will not get hired or get tenured. If your reputation is too bad, you may not be able to publish in academic journals or books. Alas, my field has been poisoned with this Cultural Left toxin or venom like all the rest of them!

Possible Origin of the Black Plague

Here. The standard view is that twelve ships from Florence docked at Messina in 1347, bringing the Plague to Europe. It would later kill 1/3 of all Europeans and an incredible 2 A new view though is that the Plague, which had already been active in Asia for a while, came to Europe via a biological warfare attack by Genghis Khan’s raiders on the city of Caffa in the Crimea. The Caffans were probably Turkic speakers at this time, but it is hard to say what Turkic lect they may have spoken. Perhaps a dead language called Cuman. Khan’s raiders besieged the city and a number of people died of the Black Plague in the conflict. Khan’s men suspected a thing or two about biological warfare, so they loaded up the bodies of those who had died of the plague and catapulted them over the walls of the city into the population. Can you imagine the horror of looking out your window and seeing a dead, bubonic plague-ridden corpse fly by in the air at rapid speed to splatter nearby? Good Lord. In due time, this biological warfare killed a lot of the people in the city. Khan knew nothing of the germ theory of disease, but experience with the plague showed that those who came in contact with victims tended to sicken and die. No one knew what was causing it. One European physician posited that plague victims radiated some sort of death vapors or essence out of their very eyes. Without medical science, people had to fall back on spiritual theories. But people caught on quickly that being around plague victims could quickly make you a victim yourself. Physicians refused to treat plague patients and patients were often abandoned wherever they sickened. Family members even fled from their own sickened members, leaving them to die in the home while countless people fled to the countryside. But even there they were not safe. Even farm animals, cows, pigs, goats and sheep, caught the plague. So many sheep died that there was an acute wool shortage all over Europe for years afterwards.
There was no solace or respite anywhere. The epidemic ended almost as fast as it began in 1354, but Europe was ruined. Entire cities had been abandoned as thousands of residents fled to the false safety of the countryside. Many people escaped from Khan’s raid on Caffa, and survivors fled all over the Mediterranean. These people soon sickened and died. It was possibly from some of this group, having fled to Florence, that the ill-fated death ships docked in Messina on that warm October night. The disease was in Southern France the next year and Germany soon after that. Not long afterwards, it hit Paris. And despite the primitive conditions of the day, it was not long in Paris before London was also hit. People did have ships in those days, you know. Despite the enticing new theory, the medical journal concludes that the entrance of the Plague to Europe was multifactorial and the infection of the Caffa population did not play an important role in the European pandemic.

How I Determined Intelligibility For Turkic Lects

Steve: This is amazing. Well done. But how can you possibly know the degree of mutual intelligibility between two languages you don’t speak or know if something is a language or dialect when you don’t speak it? That seems strange. How is it worked out?

Linguists don’t speak all these languages we study. We just study languages, we don’t necessarily speak them. This is confused with the archaic use of the word linguist to mean polyglot. Honestly, many linguists do in fact speak more than one language, and quite a few of them have a pretty good knowledge of at least some of the languages that they study. But my mentor speaks only Turkish and English though he studies all Turkic languages. I don’t believe he has ever learned to speak any Turkic lect other than Turkish. In reference to my paper here, we are not looking for raw numbers. We just want to know if they can understand each other or not. A lot of it is from talking to native speakers, and there was also a lot of reading papers by other linguists. I also talked to other linguists a lot. Linguists typically simply state if two lects are intelligible or not. Also there is a basic idea among linguists of what the boundary is between a language and a dialect, and I used this knowledge a lot. Can they understand each other? Yes or no. That’s pretty much about it. Also at some degree of structural difference, we can see the difference between a language and a dialect. It’s a judgement call, but linguists are pretty good at this. There is a subsection of very loud linguists, mostly on the Internet, who like to screech a lot about how this question cannot be answered because of this or that red herring or some odd conundrums that work their way in. The thing is if you ask around enough, you will be able to get around all of the conundrums and you should be able to eventually reconcile all of the divergent responses to get some sort of a holistic or “big picture.” You finally “figure it out.” The answer to the question comes to you in a sort of a “seeing the answer as part of a larger picture” sort of thing.
The worst red herring is this notion that speakers from Group A will lie and say they do not understand speakers of Group B simply because they hate them so much. If this was such a concern, you would think I would have run into it at some point. A much worse problem was ethnic nationalists who lie and say that they can understand neighboring tongues when they can’t. The toxin called Pan-Turkism or Turkish ultranationalism comes into play here. It is almost normal for Turks to believe that there is only one Turkic language, and it is called Turkish. All of the rest of the languages simply do not exist and are dialects of Turkish. I had to deal with regular attacks by extremely aggressive Ataturkists who insisted that any Turk could easily understand any other Turkic language. Actually my adviser told me that my piece would not be popular with the Pan-Turkics at all. I don’t really care as I consider them to be pond scum. Granted, some of it was quite controversial and I got variable reports on intelligibility for some lects like Siberian Tatar vs. Tatar, the Altai languages, Kazakh vs. Kirghiz, Crimean Tatar vs. Turkish. Where native speakers differ on such questions, often vociferously, you simply ask enough of them, talk to some experts and try to get a feel for what the best answer to the question is. Some cases like Gagauz vs. Turkish probably need raw intelligibility testing. That’s the only one that is up in the air right now, but it is up in the air because the lects are so close. Intelligibility between Gagauz and Turkish is somewhere between 70-10 It is also starting to look like Nogay is simply a dialect of Kazakh instead of a separate language, but that might be a hard sell. Some of these are seen as separate languages simply because they are spoken by different ethnies who do not want to be seen as part of the same group. Also they have different literary norms.
Karakalpak is just a Kazakh dialect, but the speakers want to say they speak a separate language. Same with Bashkir, which is simply a dialect of Tatar. The case of Kazakh and Kirghiz is more controversial, but even here, we seem to be dealing with one language, yet the two dialects are spoken by different ethnies that have actually differentiated into two separate states, each with their own literary norm. Kazakhs wish to say they speak a language called Kazakh and Kirghiz wish to say they speak a language called Kirghiz although they are probably really just one language. We see a similar thing with Czech and Slovak. My recent research has proven that Czech and Slovak are actually a single language. But the dialects are spoken by different ethnic groups who claim different cultures and histories and they have actually divided into two different states, and each has its own literary norm. It is here, where dialects become languages not via science but via politics, culture, history and sociology, that Weinreich’s famous dictum that “a language is a dialect with an army and a navy” comes into play. Scientifically, these are all simply dialects of a single tongue but we call them languages for sociological, cultural and political reasons.

A Few Words on Language Endangerment

Carlos Lam: Congrats! However, isn’t language death a rather standard occurrence among societies?

It is, but we linguists don’t really like it. There is quite a debate going on, but the bottom line seems to be that ethnic groups and speaker groups have the right to ownership of their languages. We worry that a lot of speaker groups are being pressured into blowing up their languages prematurely. We like to study these languages and we are not real happy about seeing them vanish into the horizon. On the other hand, is cultural death a natural thing too? Both cultural death and language death are occurring at rates far beyond the normal background rates. English and some of the other major languages are like weapons of mass destruction in taking out languages. You really want a world with one language and one culture? I don’t. The best position seems to be that speakers have the right to decide the fate of their languages. If speakers wish to continue speaking their languages, then governments and linguists should help them to preserve and continue to develop their languages. Quite a few groups do not seem to care that their languages are going extinct or they are even driving or drove their languages extinct, and they have the full right to do so. In these cases, we will simply do salvage linguistics. There are many salvage linguistics projects going on in the world today. You won’t get very far with linguists arguing that language death is a good thing. Most people don’t think so. Occurring at the same time as language death is a lot of language revitalization. Even fully dead languages are being resurrected from the grave. Also in addition to language death, we are creating new languages all the time. In this piece, I created a net total of 13 new languages. And new languages are occurring on their own. To give you an example, a group of Crimean Tatars moved from Crimea to Turkey about 200 years ago in the course of the Crimean War. They have been speaking Crimean Tatar in Turkey ever since, for 200 years now.
But in that time, Crimean Tatar in Turkey and Crimean Tatar in Ukraine have diverged so much that Turkish Crimean Tatar is now, in my opinion, a fully separate tongue from the Ukrainian variety. This is because in Turkey, a lot of Turkish has gone into Turkish Crimean Tatar which is not well understood in the Ukraine. And in the Ukraine, a lot of Russian has gone in which is not well understood in Turkey. Hence, Crimean Tatar speakers in Turkey and Ukraine can no longer understand each other well. To give you another example, there are many Kazakh speakers in China. However, Kazakh speakers in China can no longer understand Standard Kazakh broadcasts from Kazakhstan because so many Russian loans have gone into Standard Kazakh that it is no longer intelligible to Chinese Kazakh speakers. I learned this too late for my paper, otherwise I would have split Chinese Kazakh off as a separate language. There are many cases like this. Further, many languages are being discovered. Sonqori, Western Khalaj, Todzhin, Duha, Dukha and Siberian Tatar are just a few of the new languages that I created. Khorosani Turkic was split into three different languages. Dayi was subsumed into one of the Khorosani Turkic languages. Altai was split from one into five separate languages, but the truth is that it is six languages, not five. Salar was split into Western Salar and Eastern Salar. Ili Turki was eliminated because it does not even exist. It is simply a form of Uighur. Kabardian and Balkar, Tatar and Bashkir, Kazakh and Kirghiz were some languages that were eliminated and subsumed into single tongues such as Tatar-Bashkir, Kazakh-Kirghiz, and Kabardian-Balkar. And on and on. Languages and of course dialects are dying all the time, but new languages are being created by humans and by linguists as we continue our splitting projects. Many lects referred to as dialects are more properly seen as separate languages.
Chinese is at least 450 separate languages, only 14 of which are recognized. German may be up to 130 separate languages, only 20 of which are recognized. There are quite a few more languages to be created out there, but there is a lot of resistance to splitters like me from more conservative linguists and especially from linguistic nationalists. For while Chinese may well be over 1,000 languages, the Chinese government is anti-scientifically insistent that there is but one Chinese language and maybe 2,000 “dialects,” most of which are probably separate languages. The German government is quite resistant to the idea that there is more than one form of German, though I believe Bavarian and Swiss German have official status in Austria and Switzerland.

I Am Now a Published Author

Here. You can download my first published work above. I was published for the first time this spring in a book called:

Before the Last Voices Are Gone: Endangered Turkic Languages, Volume 1: Theoretical and General Approaches

This is the first volume of a four volume set called:

The Handbook of Endangered Turkic Languages

The first volume alone runs to 512 pages. Articles are in English, Russian and Turkish, variably. It was published out of the International Turkish-Kazakh University in Istanbul, Turkey and the International Turkic Academy in Astana, Kazakhstan. These are two campuses that are part of one joint Turkey-Kazakhstan shared university. I contributed one chapter that runs from pages 311-384 titled:

Mutual Intelligibility among the Turkic Languages

It’s 83 pages long and has ~100 references. It may have taken me 500 hours to write that chapter. Tell that to my enemies who claim I do not work, ok? When all is said and done, I figure I may make 75 cents an hour on this work. But this is how academic publishing works. There’s just no money in it. It’s all a labor of love. In addition, most work is done by professors who have to publish as part of their professorship (publish or perish), so in effect, their professor salary is covering their publishing. That document had to go through two rather grueling peer reviews. I had to make many changes in it to get it to publication. The second peer review had to get past the top Turkologists in the world today, and I am amazed that I made it through review to be honest. Most people publishing in academic books or journals are academics, professors working at universities. There are only a few of us independent scholars out there (I am an independent scholar because I am not at a university). Also most folks have PhDs, and I only have a Masters, but there are some folks with Masters publishing academically. In general, this is a rather selective game where everyone is hyperspecializing as is the trend nowadays. Although my mentor at the project calls me a Renaissance Man, I wonder if the autodidact/polymath is an endangered species if not extinct. Everyone has to specialize nowadays. For instance, common knowledge in this particular field would be that the only folks who could publish in Turkology would be linguists with a PhD in Linguistics, preferably with an emphasis in Turkology. Beyond that, they may prefer say 5-10 years publishing in the field of Turkology in addition to a professorship in Turkic linguistics. You can see where this is headed. I am not knocking it. I am just pointing out that microspecialization is the game now.
What follows is that since I lack the PhD or professorship or any background at all in Turkology, I should not be allowed to be published in this field, or if by some error I am somehow mispublished, all of my work should be promptly ignored as done by a nonspecialist who could not possibly know what he is talking about. Needless to say, I don’t agree with that, and I carry on tilting at windmills like a good deluded Renaissance Man who never got the memo and wouldn’t read it if he did. The odd thing is that I knew nothing about Turkology until I plunged into this mess. I had written a short piece on mutual intelligibility in Turkic, as MI is one of my pet subjects, and put it up on Academia on my scholarly papers site, and a professor in Turkey happened to read it. He wrote to me telling me he agreed with me, he wanted me to expand it into a document, and they would publish it for me. So off I went, down the Turkic rabbit hole. If you study the very high IQ types (140+), they tend to go on “crazes” like this. They also lose interest after a bit, drop the craze and move on to some new craze. Dilettantism for the win. I also have an anxiety disorder called OCD which is well controlled. A good side of it though is that you tend to dive down rabbit holes a lot, and the OCD makes you burrow maniacally into the rabbit hole with the notion that you are going to become the world’s leading expert on whatever rabbit hole you are digging in now. So for one or two years, I went absolutely berserk into Turkic, whereas before I scarcely knew a thing about it. The end result can be read above. The sad result is that either due to the savant stuff or the mental quirk, I also tend to lose interest in my rabbit holes after a bit. I follow them about halfway to China, make several revolutions around the molten core, and after a year or so, come up for air gasping with incipient Black Lung, and next thing you know, I am bored, and it’s on to a new craze.
It’s a bit silly, but we all have our crosses to lug, and as eccentricities go, there are many worse things than dabbling, er hobbyism, er dilettantism, er polymathy, er autodidactism, er Renaissance Manism. Most of you will probably not find this very interesting, as it is pretty specialized stuff that is mostly of interest to people in the specialty, linguists and those interested in the subject. It’s not exactly for the general reader. But if you have any interest in these languages, you might enjoy it. I expanded Turkic from 41 to 53 languages, eliminated some languages, turned some into dialects, turned some dialects into full languages, combined languages into a single tongue, created some new languages from scratch and did quite a bit of work on the history of the languages. I also reworked the classification a bit because I thought it could be done better. Even though this work does not pay much, the pay is in fame if it is at all. My work will either be accepted by the field or rejected outright or somewhere in between. I have already earned the praises of some of the world’s top Turkologists, much to my surprise. If I get fame, well, I get quoted in papers, maybe invited to conferences, and maybe even referenced in Wikipedia. There are groupies in all status fields, and what the heck, there may even be linguist groupies. If not, there are always starry eyed coeds dreaming of professor types to mentor them. I am already working that angle as it is. Writer Game, Scholar Game, there’s Game for everything. Or my work does not go over and maybe the field decides I do not know what I am talking about. Crap shoot, like most of life’s endeavors. Roll em, and wish upon a star…snake eyes! PS. The title of the series, Before the Last Voices Are Gone, was created by me. I think it has a nice little song.

External Relations of Japanese and Apache

Jason Voorhees: YEE – There is some similarity between the language of an Apache and that of the Japanese for example. Yee: That seems far fetched. My ancestors moved from Central China, but I can’t understand any of their dialect now. Language is easy to lose

Actually this is not correct. Apache does have external relations in the new Yeniseian-Na-Dene family (already under fierce attack by splitters), and in a larger sense to Chinese but not Japanese. But there is no similarity whatsoever between Japanese and Apache, other than that probably all human languages are related at some distant level. There is no clear or obvious relationship between Japanese (really Japonic) and any other language. Japanese is not one language. It is a group of languages called Japonic. Most of the Japonic languages are spoken in the Ryukyu Islands (Okinawa), where there are 5-6 separate languages spoken. These languages still have many speakers, but they are in very bad shape as the Japanese have been waging war on them for some time now. Most of the speakers are middle aged or older and transmission to the young is at a low level. However, it is clear to me that Japanese does have external relations. The most obvious external relation would be with Korean. Even some of the hardest-core anti-Altaicists agree that there is a good chance that Korean and Japanese are related. Looking at the larger picture, Japanese and Korean are both related to Turkic, Tungusic and Mongolic in a superfamily called Altaic. Mainstream linguistics has refused to accept Altaic although the evidence for its existence is striking. The evidence for the existence of Altaic is just as good as the evidence for Austroasiatic, and that is a universally accepted family. Worse, people who believe in Altaic are attacked and ridiculed mercilessly to the point where if you believe in it, you might actually have a hard time getting a professorship. Of course, Altaicists are accused of being anti-scientific because “science” has not yet shown that there is any relationship. Adults who think like this are children. Science doesn’t know everything and science is flat out wrong about countless things.
That is because many theories that are presently rejected by science due to so-called lack of evidence are simply true. Having to go ask Mommy Science whether everything you encounter in the world is true or not is like what a child does. A child is always running up to Mommy asking if it is true that so and so etc etc. Mommy says yes or no and the kid is satisfied. There are adults who are still tied to their mothers’ apron strings who never learned to differentiate themselves as mature individuals. Hence they have to run to Mommy Science and ask whether something is true or not instead of sitting down and looking at the evidence and deciding for themselves. Not all things that are true have been accepted by science. If you are going to learn anything in life, it should be that right there. Time to cut the apron strings, babies.

The Basque-Caucasian Hypothesis

I have gotten a lot of crap from my enemies for being on the Academia.edu site in the first place, but really anyone can join. The following was posted by one of the reviewers in an Academia session held by one of the leading lights of the Basque-Caucasian theory. As you can see, the mythological and multiple lines of genetic evidence are starting to pile up pretty nicely too. This is neat stuff if you are interested in the Basque-Caucasian link in addition to work going on into the remains of the Neolithic Farmers who were subsumed in the Indo-European waves. It turns out there is quite a bit left in different parts of Europe, especially in terms of Neolithic Farmer mythology. From a discussion among academics and independent scholars on a paper on the Basque-Caucasian Theory in Historical Linguistics during a session on Academia:

I am not a linguist but interested in the topic as it proposes a linguistic correlation between Caucasic languages and Basque, as it parallels my own current research on reconstructing European Paleolithic mythologies using ethnographic analogies constrained by on archaeogenetics and language macrofamily correlations. Tuite (2006, 2004, 1998, 1997) has pointed out the hunter-gatherer beliefs and myth motifs shared across a ‘macro-Caucasic’ area to the Hindu Kush and into Western Europe. Basque deities Mari, Sugaar, and Ama Lurra and their associated mythologems have striking similarities to the macro-Caucasic hunter mythologies (not found in Finno-Ugric or Middle Eastern ancient mythologies.) I am currently writing a paper identifying many examples of Southern/Western Gravettian art in Italy, Spain, southern France that appear to depict imagery only explicable by analogy to Macro-Caucasic religious myth and ritual. With respect to mtDNA fossil genetics, three skeleton samples are from Paglicci Cave, Italy, ~25 cal BP: one is macro-N-mtDNA (homeland Caucasus/Caspian/Iran; currently highest frequencies Caucasus, Arabia), and two skeletons, RO/HV-mtDNA (homeland northern Middle East; currently highest frequencies, Basque, Syria, Gilaki, Daghestan). During the later Magdalenian another diffusion occurs apparently by a similar route: HV4-mtDNA emerges in Belarus-Ukraine (~14±2 ka) and under Late Glacial Maximum HV4a (~13.5 ka) moves south and splits in the three refugia: southern Italy, southern Russia (HV4a1, ~10 ka), the Middle East (HV4a2, ~9 ka), and Basque area (HV4a1a, ~5 ka, suggesting full emergence of distinct Basque culture and language), (Gómez-Carballa, Olivieri et al 2012). These studies further support the existence of a Macro-Basque-Caucasic mythological stratum as well as shared language substrate.

The cutting-edge liberal theory is that Basque (and some other odd far-flung languages) is part of the Caucasian language family. In other words, at one time, the Basques and the peoples of the Caucasus like Chechens were all one people. What this probably represents is the ancient Neolithic farmers who covered Europe before the Indo-European invasion replaced almost all of the languages of Europe. All that is left is Basque and the peoples of the Caucasus. Everything in between got taken by IE except for some late movements by Uralic and Turkic speakers. Up in the north, the Lapp Uralic speakers are, like Basques, the last remains of the Neolithic farmers. The Sardinians are also an ancient remaining group of these people, but their language has been surmounted recently by a Latinate tongue. As it turns out, the Basques and Caucasians also share a number of cultural similarities. There are also some similar placenames. And there is some good genetic evidence connecting the Basques with the Caucasian speakers. It’s all there, but the conservatives are balking, to put it mildly, about linking Basque with the Caucasian languages. I have long believed in this theory. I read a book over 20 years ago comparing Basque to the Caucasian languages and a few other distant tongues and thought the book proved the case, even to the point of overkill. And recent work is so super that one wonders why the conservatives are still winning. I feel that the link between Basque and the Caucasus languages is now proven to an obvious and detailed degree.

The Whites of East Asia

Ultra Cool writes:

There was a White tribe in China called Yuezhi, I think.

Turks. Almost Proto-Turkics. I think their descendants today would be best described as the Uighur people, who are ~1/2 White and 1/2 East Asian. However, a number of Uighur people, especially the women, look quite Caucasian. So I suppose these would be the farthest east of the Caucasians. I have an 80-page paper on Turkic languages that is in line to be published in a book whenever they get around to publishing it. I believe that I discuss the Yuezhi in there, and if I am not mistaken, they were precursors of the Uighurs or even better yet the Tocharians. If you want a truly White tribe in East Asia, the Tocharians would be your best bet. They have Tocharian mummies that have blue and green eyes and blond hair. They were found in China! The Yuezhi were around ~2,000 YBP I believe. Most of the references we have to groups like that are from the Chinese. The Chinese were very helpful in that they developed a writing system early. As a comparison, the earliest written Turkic we can find is the Orkhon Inscriptions (also very near China) which are these hard-to-decipher runic-type characters inscribed on stone pillars. I believe they have deciphered these inscriptions. So our attested Turkic only goes back to ~400 AD. Mongolic is even worse with earliest transcriptions ~1400 with Middle Mongolian. Tungusic is catastrophic with nothing at all written down other than transcriptions of the languages from early Russian settlers. The Yukaghir have some odd Orkhon-like inscriptions, but they are not Altaic. They are said to speak an isolated language, but I think Yukaghir is related to Uralic. With the lack of early attestations, you can see why Altaic is so hard to reconstruct and prove.

A Look at the Georgian Language

This post will look at the Georgian language in terms of how hard it would be for an English speaker to learn it. Suffice to say that Georgian is probably one of the most complicated languages in the world, and that it would be quite difficult for an English speaker to learn this language.

Method and Conclusion. See here.

Results. A ratings system was designed in terms of how difficult it would be for an English-language speaker to learn the language. In the case of English, English was judged according to how hard it would be for a non-English speaker to learn the language. Speaking, reading and writing were all considered.

Ratings: Languages are rated 1-6, easiest to hardest. 1 = easiest, 2 = moderately easy to average, 3 = average to moderately difficult, 4 = very difficult, 5 = extremely difficult, 6 = most difficult of all. Ratings are impressionistic.

Time needed. Time needed for an English language speaker to learn the language “reasonably well”: Level 1 languages = 3 months-1 year. Level 2 languages = 6 months-1 year. Level 3 languages = 1-2 years. Level 4 languages = 2 years. Level 5 languages = 3-4 years, but some may take longer. Level 6 languages = more than 4 years.

Kartvelian Karto-Zan

One problem with Georgian is the strange alphabet: ქართულია ერთ ერთი რთული ენა. It also has lots of glottal stops that are hard for many foreigners to speak; consonant clusters can be huge – up to eight consonants stuck together (CCCCCCCCVC) – and many consonant sounds are strange. In addition, there are uvulars and ejectives. Georgian is one of the hardest languages on Earth to pronounce. It regularly makes it onto craziest phonologies lists.

Its grammar is exceedingly complex. Georgian is both highly agglutinative and highly irregular, which is the worst of both worlds. Other agglutinative languages such as Turkish and Finnish at least have the benefit of being highly regular. The verbs in particular seem nearly random with no pattern to them at all. The system of argument and tense marking on the verb is exceedingly complex, with tense, aspect, mood on the verb, person and number marking for the subject, and direct and indirect objects.

Although it is an ergative language, the ergative (or active-stative case marking as it is called) oddly enough is only used in the aorist and perfect tenses where the agent in the sentence receives a different case, while the aorist also masquerades as imperative. In the present, there is standard nominative-accusative marking. A single verb can have up to 12 different parts, similar to Polish, and there are six cases and six tenses.

Georgian also features something called polypersonal agreement, a highly complex type of morphological feature that is often associated with polysynthetic languages and to a lesser extent with ergativity.

In a polypersonal language, the verb has agreement morphemes attached to it dealing with one or more of the verb’s arguments (usually up to four arguments). In a non-polypersonal language like English, the verb either shows no agreement or agrees with only one of its arguments, usually the subject. In a polypersonal language, by contrast, the verb agrees with one or more of the subject, the direct object, the indirect object, the beneficiary of the verb, etc. The polypersonal marking may be obligatory or optional.

In Georgian, the polypersonal morphemes appear as either suffixes or prefixes, depending on the verb class and the person, number, aspect and tense of the verb. The affixes also modify each other phonologically when they are next to each other. In the Georgian system, the polypersonal affixes convey subject, direct object, indirect object, genitive, locative and causative meanings.

g-mal-av-en   = “they hide you”
g-i-mal-av-en = “they hide it from you”

mal “to hide” is the verb, and the other four forms are polypersonal affixes.

In the case below,

xelebi ga-m-i-tsiv-d-a = “My hands got cold”.

xelebi means “hands”. The m marker indicates genitive or “my”. With intransitive verbs, Georgian often omits my before the subject and instead puts the genitive onto the verb to indicate possession.

Georgian verbs of motion focus on deixis, whether the goal of the motion is towards the speaker or the hearer. You use a particle to signify who the motion is heading towards. If it is heading towards neither of you, you use no deixis marker. You specify the path taken to reach the goal through the use of prefixes called preverbs, similar to “verbal case.” These come before the deixis marker:

up                     a-
out                    ga-
in                      sha-
down into         cha-
across/through garda-
thither               mi-
away                 c’a-
or down            da-

Hence:

“up towards me” = amo-. The deixis marker is mo- and “up” is a-

On the plus side, Georgian has borrowed a great deal of Latinate foreign vocabulary, so that will help anyone coming from a Latinate or Latinate-heavy language background.

Georgian is rated 5, extremely difficult.

The Roots of the Alphabet(s)

Probably most of you do not know that we are all using a variant of the ancient Phoenician alphabet. Actually I am not sure if that is precisely true, as I think the Phoenician alphabet was preceded by an Assyrian one. But at any rate, our classic Western alphabets all came out of the Levant and Mesopotamia in some way or other. Indeed, it is even theorized that many of the syllabaries in use in Central, South and Southeast Asia are also rooted in this original alphabet from the Levant.

Of course, Chinese and consequently Korean and Japanese alphabets have another origin.

One might wish to throw the odd SE Asian orthographies such as Thai, Lao, Burmese, Vietnamese, Javanese, Sundanese and Khmer there, but my understanding is that all of those SE Asian orthographies were actually derived from syllabaries originally designed in India.

A few writing systems such as Georgian, Armenian and Cree may have been created de novo, but I might have to look that up. The only non-Middle Eastern derived orthography that immediately comes to my mind is the Chinese ideographs.

The origins of the Assyrian/Phoenician alphabet appear to have been ultimately in Egyptian hieroglyphics. So the ancient Egyptians really started it all when it comes to writing down words, at least for the West.

Chinese ideographs may date from even earlier. Chinese bone writing goes way back.

Very early European writing such as runic systems and similar systems in Asia such as the Turkic Orkhon inscriptions may not be related to the Phoenician system at all. The Yukaghir in Siberia and the Yi in South China may also have designed de novo systems.

Abstract of an Upcoming Publication of Mine

The following is an abstract of a long paper that will be published in one of three or four books of the series The Handbook of Endangered Turkic Languages which will be published in late September by the Turkish-Kazakh Joint University in Ankara, Turkey. The article is 88 pages long and is one of the most important articles in the series. I will also be the official English editor for all of the English articles in the series which total ~500 pages.

Mutual Intelligibility Among the Turkic Languages

By Robert Lindsay

Abstract: The Turkic family of languages with all important related dialects was analyzed on the basis of mutual intelligibility, with the following goals: (1) To determine the extent to which various Turkic lects can understand each other. (2) To ascertain whether various Turkic lects are better characterized as full languages in their own right in need of ISO codes from SIL or rather as dialects of another language. (3) The history of various Turkic lects was analyzed in an attempt to write a proper history of the important lects. (4) An attempt was made at classifying the Turkic languages in terms of subfamilies, sub-sub families, etc. The results were: (1) Rough intelligibility figures for various Turkic lects, related lects and Turkish itself were determined. Surprisingly, it was not difficult to arrive at these rough estimates. (2) The Turkic family was expanded from Ethnologue’s 41 languages to 53 languages. (3) Full and detailed histories for many Turkic lects were written up in a coherent, easy to understand way, a task sorely needed in Turkic as histories of Turkic lects are often confused, inaccurate, controversial, and incomplete. (4) A new classification of Turkic is proposed that rejects and rewrites some of the better-known classifications.

A Look at the Turkish Language

From here. A look at the Turkish language from the point of view of an English speaker trying to learn the language. Turkish is not a difficult language to learn, but it is not exactly simple either, and the agglutinative structure is very different from Indo-European. Turkish is often considered to be hard to learn, and it’s rated one of the hardest in surveys of language teachers. However, it’s probably easier than its reputation makes it out to be. It is agglutinative, so you can have one long word where in English you might have a sentence of shorter words. One word is Çekoslovakyalilastiramadiklarimizdanmisiniz? Were you one of those people whom we could not turn into a Czechoslovakian? Many words have more than one meaning. However, the agglutination is very regular in that each particle of meaning has its own morpheme and falls into an exact place in the word. See here:

göz            eye
göz-lük        glasses
göz-lük-çü     optician
göz-lük-çü-lük the business of an optician
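The göz chain above can be reproduced in a few lines of code, which shows how mechanical the stacking is. This is only a rough sketch: the function name build_chain is made up for illustration, and the suffixes are typed in already in the forms shown in the table, so no vowel harmony or other sound rules are being computed here.

```python
# Suffix chain and glosses copied from the göz example above.
# The suffix forms are hard-coded as they appear in the table; this sketch
# does NOT model vowel harmony or any other phonological rule.
STEM = "göz"
SUFFIXES = ["lük", "çü", "lük"]
GLOSSES = ["eye", "glasses", "optician", "the business of an optician"]


def build_chain(stem, suffixes):
    """Return every intermediate word formed by adding one suffix at a time."""
    words = [stem]
    for suffix in suffixes:
        # Each new word is the previous word plus exactly one more morpheme.
        words.append(words[-1] + suffix)
    return words


for word, gloss in zip(build_chain(STEM, SUFFIXES), GLOSSES):
    print(f"{word:<16}{gloss}")
```

Each morpheme lands in a fixed slot after the previous one, which is the regularity the paragraph above is talking about.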

Nevertheless, agglutination means that you can always create new words or add new parts to words, and for this reason even a lot of Turkish adults have problems with their language. Turkish is an imagery-heavy language, and if you try to translate straight from a dictionary, it often won’t make sense. However, the suffixation in Turkish, along with the vowel harmony, are both precise. Nevertheless, many words have irregular vowel harmony. The rules for making plurals are very regular, with no exceptions (the only exceptions are in foreign loans). In Turkish, incredible as it sounds, you can make a plural out of anything, even a word like what, who or blood. However, there is some irregularity in the strengthening of adjectives, and the forms are not predictable and must be memorized. Turkish is a language of precision in other ways. For instance, there are eight different forms of subjunctive mood that describe various degrees of uncertainty that one has about what one is talking about. This relates to the evidentiality discussed under Tuyuca above, and Turkish has an evidential form similar to Tamil and Bulgarian. On Turkish news, verbs are generally marked with miş, which means that the announcer believes it to be true though he has not seen it firsthand. The Roman alphabet and almost mathematically precise grammar really help out. Turkish lacks gender and there are almost no irregular verbs.  However, this is controversial, and it depends on how you define grammatical irregularity. There is strangeness in some of the verb paradigms, but it is argued that these oddities are rule-based. The aorist tense is said to have irregularity. Nevertheless, weighing against the verbal regularity would be the large number of verbal forms. There is some irregular morphophonology, but not much. The oblique relative clauses have complex morphosyntax. Turkish has two completely different ways of making relative clauses, one of which may have been borrowed from Persian. 
There are many gerunds for verbs, and these have many different uses. At the end of the day, Turkish grammar is not as regular or as simple as it is made out to be. Words are pronounced nearly the same as they are written. A suggestion that Turkish may be easier to learn than many think is the research that shows that Turkish children attain basic grammatical mastery of Turkish at age 2-3, as compared to 4-5 for German and 12 for Arabic. The research was conducted in Germany in 2005. In addition, Turkish has a phonetic orthography. However, Turkish is hard for an English speaker to learn for a variety of reasons. It is agglutinative like Japanese, and all agglutinative languages are difficult for English speakers to learn. As in Japanese, you start your Turkish sentence the way you would end your English sentence. Turkish vowels are unusual to speakers of English (ö and ü are not in English), and Turkish learners say the vowels are hard to make or even tell apart from one another. Turkish is rated 4, very hard to learn.

Evidence That Some Languages are Harder to Learn Than Others

From here and here. The standard view in Linguistics is that there are no easy or hard languages for either children L1 learners or older and adult L2 learners. It is also said that all languages are equally complex and no language is more simple or more complex than any other. On its face, this seems preposterous, especially for L2 learners. Linguists say that it all depends on what L1 you are coming from. There are anecdotal reports that Navajo children have a hard time learning Navajo as compared to children learning other languages, but Navajo kids definitely learn the language. Reportedly, Nambikwara children do not pick up the language fully until age 10 or so, one of the latest recorded ages for full competence. Nambikwara is sometimes said to be the hardest language on Earth to learn, but it has some competition. Adding weight to the commonly held belief that Arabic is hard to learn is research done in Germany in 2005 which showed that Turkish children learn their language at age 2-3, German children at age 4-5, but Arabic kids did not get Arabic until age 12. This implies that from easiest to hardest, it is Turkish -> German -> Arabic. Italian is still easier to learn than French. For evidence, see the research that shows Italian children learning to write Italian properly by age 6, 6-7 years ahead of French children. So at least in terms of writing, it is much easier to learn to write Italian than it is to learn to write French. Careful studies have shown that English-speaking children take longer to learn to read than children speaking other languages (Finnish, Greek and various Romance and other Germanic languages) due to the difficulty of the spelling system. Romance languages were easier to read than Germanic ones. So in terms of learning to read, from easiest to hardest, it would be Romance languages -> Finnish/Greek -> Germanic languages except English -> English. 
Suggesting that Danish may be harder to learn than Swedish or Norwegian, it’s said that Danish children speak later than Swedish or Norwegian children. One study comparing Danish children to Croatian tots found that the Croat children had learned over twice as many words by 15 months as the Danes. According to the study:

The University of Southern Denmark study shows that at 15 months, the average Danish toddler has mastered just 80 words, whereas a Croatian tot of the same age has a vocabulary of up to 200 terms. […] According to the study, the primary reason Danish children lag behind in language comprehension is because single words are difficult to extract from Danish’s slurring together of words in sentences. Danish is also one of the languages with the most vowel sounds, which leads to a ‘mushier’ pronunciation of words in everyday conversation.

Therefore, Danish is harder to learn to speak than Croatian, Norwegian or Swedish. From easiest to hardest to learn to speak, it is Norwegian/Swedish -> Danish and Croatian -> Danish. Russian is harder to learn than English. We know this because Russian children take longer to learn their language than English-speaking children do. The reason given was that Russian words tended to be longer, but there may be other reasons. So from easier to harder to speak, it is English -> Russian. It is said English-speaking children reach full adult competency in the language (reading, writing, speaking, spelling) at age 12. Polish children do not reach this milestone until age 16. So from easier to harder, it would be English -> Polish. If you think this website is valuable to you, please consider a contribution to support the continuation of the site. Donations are the only thing that keep the site operating.

Mutual Intelligibility Among the Turkic Languages

Turkic is a large family of about 40 languages stretching from Turkey all the way to China. Most of the languages are pretty close, and it’s often been said that they are all mutually intelligible, and that you can go from Turkey all the way to the Yakut region of Siberia and be understood the whole way.

This is certainly not the case, although there is something to it. That is because the languages, while generally not above 9

The truth is that mutual intelligibility in Turkic is much less than proclaimed.

Azeri is spoken in Azerbaijan. Turkish and Azeri are often said to be completely mutually intelligible, but this is not true, though the situation is interesting. The two are not mutually intelligible. The far eastern dialects of Turkish are closer to Azeri than to Turkish. Turkish has an average of 6

Intelligibility is increasing now due to increased contact. Nowadays due to exposure to Turkish TV, most Azeri speakers can speak Turkish well, and due to exposure to Azeri TV, Turks understand a lot more Azeri than they used to.

Kazakh and Kirghiz are also close, enough to be one language, with intelligibility over 9

Tatar and Bashkir are even closer than Kazakh and Kirghiz and they are best seen as a single language, with intelligibility of over 9

Uzbek and Uyghur are fairly close, but they are still probably only 65-7

Uzbek and Kazakh are not mutually intelligible, but there is an intelligible dialect between them.

Tofa and Tuvan are not mutually intelligible, but there are intelligible dialects linking them. Both are spoken in Russia in the same region as Altai below.

The truth is that Altai and Uzbek are not even intelligible within themselves.

Altai is spoken in the Altai region of Russia where China, Russia and Mongolia all come together. Altai is split into North Altai and South Altai, separate languages.

Uzbek is split into North Uzbek and South Uzbek, separate languages.

Azeri is split into North Azeri and South Azeri, although the two are mutually intelligible, there are large differences in phonology, morphology, syntax and loan words. Nevertheless, they are very mutually intelligible, with intelligibility at 9

The Oghuz languages are said to be fully mutually intelligible, but that’s not really the case. The question of the intelligibility of Turkmen with Azeri and Turkish is controversial, as some sources say that they are mostly mutually intelligible. Intelligibility testing is warranted.

Turkish has uncertain intelligibility with Crimean Tatar. Crimean Tatar speakers say that Turks cannot understand their language (Dokuzlar 2010). However, Turkish speakers say that Turks and Crimean Tatar speakers can converse without too many problems. However, while mutual intelligibility is high, it is probably under 7

Turkish has high, but not full, intelligibility of Karaim. Turkish intelligibility of Karaim may be 65-7

The intelligibility of Turkish with South Azeri may be quite high, on the order of 9

The intelligibility of Turkish and Khorasani Turkic is probably around 4

Practically speaking, Turkish has low intelligibility with Kazakh (Kipchak Branch), Uyghur and Uzbek (Uyghuric branch) and Khakas (Siberian branch). Turkish-Kazakh intelligibility is surely less than 4

Turkic has effectively

The intelligibility of Turkish with the Central Asian Turkic languages like Uzbek, Kazakh, Kyrghyz and Turkmen is much exaggerated.

Speakers of these languages who went to study in Turkey said they had problems with the Turkish language. It’s true that Turkish TV is not much watched in the Central Asian Turkic nations, but the main reason for that is that Central Asian Turkic speakers can’t understand it. They can’t even understand the simplified Turkish used in these broadcasts. After the fall of the USSR, people from these new nations visited Turkey, but they had to bring interpreters with them to communicate.

In truth, the whole notion of the mutual intelligibility of all Turkic is a pan-Turkic conceit. Pan-Turkism is a noxious form of ultranationalism headquartered in Turkey. It says that all speakers of Turkic languages are part of a Greater Turkey and often uses ominous irredentist language implying that Turkey is going to conquer all the Turkic lands and take them back.

The Pan-Turkics have a snide attitude towards other Turkic speakers, insisting that they all speak dialects of Turkish and not separate languages. This snideness is resented by speakers of other Turkic tongues.

A number of Turkic languages are nothing more than dialects and not full languages.

Ukrainian Urum is a dialect of Crimean Tatar, and Georgian Urum is a dialect of Turkish. Ukrainian Urum is spoken in SE Ukraine, and Crimean Tatar is spoken on the Crimean Peninsula.

Salchuq is an Azeri dialect. It is spoken in Iran.

However, Qashqai, also spoken in Iran, often thought to be an Azeri dialect, is in fact a separate but closely related language with 75-8

Gagauz has high intelligibility with Turkish. However, Bulgarians say that when Turks visit the Balkan Gagauz communities in Bulgaria, the two groups have a hard time understanding each other. SIL says that not only Gagauz but also Balkan Gagauz Turkish are separate languages, but one wonders what criteria they are using to split them. The Gagauz are Christians living in Moldavia who strangely enough speak a Turkish language with many Christian Slavic loanwords. The Balkan Gagauz Turks live in Bulgaria, far west Turkey, Greece and Macedonia, but most of them live in Bulgaria.

Kumyk is said to be Karakalpak is so close to Kazakh, with 9

Chulym and Shor are often thought to be dialects of a single language. Not only is this not true, but Shor itself is two separate languages – Mrass Shor and Kondoma Shor – and Chulym is also two separate languages – Lower Chulym and Chulym. Chulym and Shor are spoken north of the Altai Mountains in the Ob River Basin near the city of Novokuznetsk.

Further research regarding the intelligibility of these languages is indicated.

References

Uygar Dokuzlar, Crimean Tatar speaker. April 2010. Personal communication.

More On The Hardest Languages To Learn – Non-Indo-European Languages

Caution: This post is very long. It runs to 200 pages on the Net. Updated January 17, 2016.

This is a continuation of the earlier post. I split it up into two parts because it had gotten too long.

The post refers to which languages are the hardest for English speakers to learn, though to some extent, the ratings are applicable across languages. Most Chinese speakers would recognize Spanish as being an easy language, despite its alien nature. And even most Chinese, Navajo, Poles or Czechs acknowledge that their languages are hard to learn. To a certain extent, difficulty is independent of linguistic starting point. Some languages are just harder than others, and that’s all there is to it.

Method, Results and Conclusion. See here.

In this case, 73 non-IE languages were examined.

Ratings: Languages are rated 1-6, easiest to hardest. 1 = easiest, 2 = moderately easy to average, 3 = average to moderately difficult, 4 = very  difficult, 5 = extremely difficult, 6 = most difficult of all.

Time needed: Time needed to learn the language “reasonably well”: Level 1 languages = 3 months-1 year. Level 2 languages = 6 months-1 year. Level 3 languages = 1-2 years. Level 4 languages = 2 years. Level 5 languages = 3-4 years, but some may take longer. Level 6 languages = more than 4 years.

Here is a list of the ratings for the languages below as a handy reference.

 

1.0: Malagasy
1.5: Bahasa Indonesian
2.0: Aymara, Malay, Hawaiian, Swahili
3.0: Maori
3.5: Turkish
4.0: Quechua, Maltese, Tamil, Tagalog, Anyi
4.5: Egyptian Arabic, Moroccan Arabic, Amharic, Estonian, Khmer, Lao
5.0: Georgian, Gros Ventre, Karok, MSA Arabic, Hebrew, Somali, Malayalam, Korean, Japanese, Finnish, Skolt Sami, Hungarian, Quiang, Tibetan, Dzongka, Vietnamese, Sedang, Hmong, Tsou, Sakai, Kwaio, Thai, Kam, Buyang, Ga, Ndali, Xhosa, Ndebele, Zulu, Taa, Ju|’hoan
5.5: Cherokee, Lakota, Classical Japanese, Mandarin, Cantonese, Min Nan, Dondan Wu, Basque
6.0: Chechen, Circassian, Tsez, Archi, Tabasaran, Ingush, Ubykh, Abkhaz, Burushaski, Kootenai, Yuchi, Tlingit, Navajo, Slavey, Haida, Salish, Nuxalk, Montana Salish, Straits Salish, Halkomelem, Lushootseed, Cree, Ojibwa, Cheyenne, Arapaho, Wichita, Huamelutec, Hopi, Nahuatl, Comanche, Chinantec, Jalapa Mazatec, Tarina, Bora, Tuyuca, Cubeo, Hixkaryána, Nambikwara, Pirahã, Australian Languages, Berik, Amele, Valpan, Tamazight, Tachelhit, Dahalo, Classical Chinese, Inuktitut, Kalaallisut, Chukchi

Northeast Caucasian, Northwest Caucasian and Kartvelian

Of course the Caucasian languages like Tsez, Tabasaran, Georgian, Chechen, Ingush, Abkhaz and Circassian are some of the hardest languages on Earth to learn.

Chechen and Circassian are rated 6, hardest of all.

Northeast Caucasian

NE Caucasian languages have the uvulars and ejectives of Georgian in addition to pharyngeals, lateral fricatives, and other strangeness. They have noun classes like the Bantu languages (but usually fewer). They also have noun class agreement markers on verbs and adjectives. One thing NE Caucasian has is lots of case. Some languages have 40+ cases. They are built from the ground up via two forms – one a spatial form such as in, on or around and the other a directional motion form such as to, from, through or at.

Tsezic

Tsez has 64-126 different cases, making it by far the most complex case system on Earth! It is one of the few languages on Earth that has two genitive cases – Genitive 1 (-s) and Genitive 2 (-z). Genitive 1 is used when the genitive’s head noun is in absolutive case and Genitive 2 is used when the genitive’s head noun is in any other case. It also has four noun classes. It is said that even native speakers have a hard time picking up the correct inflection to use sometimes.
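The choice between the two genitives is a simple either/or rule, and it can be sketched in code as follows. This is just an illustration of the rule as stated above; the function name tsez_genitive and the plain-text case labels are made up for the example, and nothing here covers irregular forms.

```python
def tsez_genitive(head_noun_case):
    """Pick the genitive suffix based on the case of the genitive's head noun.

    Rule as described in the text: Genitive 1 (-s) when the head noun is in
    the absolutive, Genitive 2 (-z) when it is in any other case.
    The case labels are plain strings chosen for this sketch.
    """
    if head_noun_case == "absolutive":
        return "-s"  # Genitive 1
    return "-z"      # Genitive 2


print(tsez_genitive("absolutive"))
print(tsez_genitive("dative"))
```

So the speaker has to know the case of the head noun before the right genitive can even be picked.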

In Tsez, you need to know a lot of Tsez grammar to communicate at a basic level. Take the sentence:

English: I like your mother.

Tsez: Дāьр деби энийу йетих. (Dǟr debi eniyu yetix.)

In order to speak that sentence in Tsez, you need to know:

• the words themselves (word order is not as important)
• that the verb -eti- requires the subject to be in the dative/lative case and the object to be in the absolutive
• the noun class for eniyu (class II)
• the dative/lative form of di (I), which is dǟr
• the genitive 1 form of mi (you), which is debi
• the congruence prefix y- that corresponds to the noun class of the absolutive argument of the phrase, in this case mother
• the present tense ending for vowel-final verbs -x

Tsez is rated 6, hardest of all.

Lezgic Archi

Archi has an extremely complex phonology and one of the most complicated grammars on Earth. The extreme fusional aspects and the verbal morphology are what make the grammar so difficult. Every verb root has 1,502,839 possible forms! It is also an ergative language, but there is irregularity in its ergative system.

Some verbs take the typical ergative/absolutive case (absolutive for the subject of an intransitive verb and ergative for the subject of a transitive verb – where the direct object would be in absolutive). In others the subject is in dative rather than the expected ergative/absolutive case. These are usually verbs of perception like love/want, hear, see, feel, and be bored. For instance, the verb:

-эти- = to love/want must have its subject in dative case instead of the expected absolutive or ergative case.

Among non-click languages, Archi has one of the largest consonant inventories, with only the extinct Ubykh having more. There are 26 vowels and between 76 and 82 consonants, depending on the analysis. Five of the six vowels can occur in five varieties: short, pharyngealized, high tone, long (with high tone), and pharyngealized with high tone.

It has many unusual phonemes, including contrasts between several voiceless velar lateral fricatives, voiceless and ejective velar lateral affricates and a voiced velar lateral fricative. The voiceless velar lateral fricative ʟ̝̊, the voiced velar lateral fricative ʟ̝, and the corresponding voiceless and ejective affricates k͡ʟ̝̊ and k͡ʟ̝̊ʼ are extremely unusual sounds, as velar fricatives are not typically laterals.

There are 15 cases, 10 regular cases, five spatial cases and five directional cases. The spatial cases are Inessive (in), Intrative (between), Superessive (above), Subessive (below) and Pertingent (against). The directional cases are Essive (as), Elative (out of), Lative (to/into), Allative (onto), Terminative (specifies a limit) and Translative (indicates change).

There are four noun classes:

I   Male human
II  Female human
III All insects, some animates, and some inanimates
IV  Abstracts, some animates, and some inanimates that can only be seen via verbal agreement

Archi is rated 6, hardest of all.

Samur Eastern Samur Lezgi–Aghul–Tabasaran

Tabasaran is rated the 3rd most complex grammar in the world, with 48 different noun cases.

Tabasaran is rated 6, hardest of all.

Nakh Vainakh

Ingush has a very difficult phonology and an extremely complex grammar, and furthermore, it is extremely irregular. Ingush also has a proximate/obviative distinction and is the only language in the region that has this feature. Ingush and Chechen both have a closed class of verbs, an unusual feature in the world’s languages. New verbs are formed by adding a noun to the verb do:

shoot = do gun

Ingush is rated 6, hardest of all.

Kartvelian Karto-Zan

One problem with Georgian is the strange alphabet: ქართულია ერთ ერთი რთული ენა. It also has lots of glottal stops that are hard for many foreigners to speak, consonant clusters can be huge, with up to eight consonants stuck together (CCCCCCCCVC), and many consonant sounds are strange. In addition, there are uvulars and ejectives. Georgian is one of the hardest languages on Earth to pronounce. It regularly makes it onto craziest phonologies lists.

Its grammar is exceedingly complex. Georgian is both highly agglutinative and highly irregular, which is the worst of two worlds. Other agglutinative languages such as Turkish and Finnish at least have the benefit of being highly regular. The verbs in particular seem nearly random with no pattern to them at all. The system of argument and tense marking on the verb is exceedingly complex, with tense, aspect, mood on the verb, person and number marking for the subject, and direct and indirect objects.

Although it is an ergative language, the ergative (or active-stative case marking as it is called) oddly enough is only used in the aorist and perfect tenses where the agent in the sentence receives a different case, while the aorist also masquerades as imperative. In the present, there is standard nominative-accusative marking. A single verb can have up to 12 different parts, similar to Polish, and there are six cases and six tenses.

Georgian also features something called polypersonal agreement, a highly complex type of morphological feature that is often associated with polysynthetic languages and to a lesser extent with ergativity.

In a polypersonal language, the verb has agreement morphemes attached to it dealing with one or more of the verb’s arguments (usually up to four arguments). In a non-polypersonal language like English, the verb either shows no agreement or agrees with only one of its arguments, usually the subject. In a polypersonal language, by contrast, the verb agrees with one or more of the subject, the direct object, the indirect object, the beneficiary of the verb, etc. The polypersonal marking may be obligatory or optional.

In Georgian, the polypersonal morphemes appear as either suffixes or prefixes, depending on the verb class and the person, number, aspect and tense of the verb. The affixes also modify each other phonologically when they are next to each other. In the Georgian system, the polypersonal affixes convey subject, direct object, indirect object, genitive, locative and causative meanings.

g-mal-av-en = they hide you
g-i-mal-av-en = they hide it from you

mal (to hide) is the verb, and the other four forms are polypersonal affixes.

In the case below,

xelebi ga-m-i-tsiv-d-a = My hands got cold.

xelebi means hands. The m marker indicates genitive or my. With intransitive verbs, Georgian often omits my before the subject and instead puts the genitive onto the verb to indicate possession.

Georgian verbs of motion focus on deixis, whether the goal of the motion is towards the speaker or the hearer. You use a particle to signify who the motion is heading towards. If it is heading towards neither of you, you use no deixis marker. You specify the path taken to reach the goal through the use of prefixes called preverbs, similar to “verbal case.” These come before the deixis marker:

up             a-
out            ga-
in             sha-
down into      cha-
across/through garda-
thither        mi-
away           c’a-
or down        da-

Hence:

up towards me = amo-. The deixis marker is mo- and up is a-

On the plus side, Georgian has borrowed a great deal of Latinate foreign vocabulary, so that will help anyone coming from a Latinate or Latinate-heavy language background.

Georgian is rated 5, extremely difficult.

Northwest Caucasian

All NW Caucasian languages are characterized by a very small number of vowels (usually only two or three) combined with a vast consonant inventory, the largest consonant inventories on Earth. Almost any consonant can be plain, labialized or palatalized. This is apparently the result of an historical process whereby many vowels were lost and their various features became assigned to consonants. For instance, palatalized consonants may have come from Ci sequences and labialized consonants may have come from Cu sequences.

The grammars of these languages are complex. Unlike the NE Caucasian languages, they have simple noun systems, usually with only a handful of cases.

However, they have some of the most complex verbal systems on Earth. These are some of the most synthetic languages in the Old World. Often the entire syntax of the sentence is contained within the verb. All verbs are marked with ergative, absolutive and direct object morphemes in addition to various applicative affixes.

These are akin to what some might call “verbal case.” For instance, in applicative voice systems, applicatives may take forms such as comitative, locative, instrumental, benefactive and malefactive. These roles are similar to the case system in nouns – even the names are the same. So you can see why some call this “verbal case.”

NW Caucasian verbs can be marked for aspect (whether something is momentary, continuous or habitual) and mood (whether something is certain, likely, desired, potential, or unreal). Other affixes can shape the verb in an adverbial sense, to express pity, excess or emphasis.

Like NE Caucasian, they are also ergative.

NW Caucasian makes it onto a lot of craziest language lists.

These are some of the strangest sounding languages on Earth. Of all of these languages, Abaza has the most consonants. Here is a video in the Abaza language.

Ubykh

Ubykh, a Caucasian language of Turkey, is now extinct, but there is one second language speaker, a linguist who is said to have taught himself the language. It has more consonants than any non-click language on Earth – 84 consonant sounds in all. Furthermore, the phonemic inventory allows some very strange consonant clusters.

Ubykh has many rare consonant sounds. One of them is otherwise found only in two of Ubykh’s relatives, Abkhaz and Abaza, and in two other languages, both in the Brazilian Amazon. The pharyngealized labiodental voiced fricative does not exist in any other language. Ubykh often makes it onto weirdest phonologies lists. It also got a very high score on a study of the weirdest languages on Earth.

Combine that with only two vowel sounds and a highly complex grammar, and you have one tough language.

In addition, Ubykh is both agglutinative and polysynthetic, ergative, and has polypersonal agreement:

Aχʲazbatʂʾaʁawdətʷaajlafaqʾajtʾmadaχ! If only you had not been able to make him take it all out from under me again for them…

There are an incredible 16 morphemes in that nine-syllable word.

Ubykh has only four cases on its nouns, but much case function has shifted over to the verb via preverbs and determinants. It is these preverbs and determinants that make Ubykh monstrously complex. The following are some of the directional preverbs:

  • above and touching
  • above and not touching
  • below and touching
  • below and not touching
  • at the side of
  • through a space
  • through solid matter
  • on a flat horizontal surface
  • on a non-horizontal or vertical surface
  • in a homogeneous mass
  • towards
  • in an upward direction
  • in a downward direction
  • into a tubular space
  • into an enclosed space

There are also some preverbal forms that indicate deixis:

j-  = towards the speaker

Others can indicate ideas that would take up whole phrases in English:

jtɕʷʼaa- = on the Earth, in the Earth

ʁadja ajtɕʷʼaanaaɬqʼa They buried his body. (Lit. They put his body in the earth.)

faa- = out of, into or with regard to a fire.

Amdʒan zatʃətʃaqʲa faastχʷən. I take a brand out of the fire.

Morphemes may be as small as a single phoneme:

wantʷaan They give you to him.

w – 2nd singular absolutive
a – 3rd singular dative
n – 3rd ergative
tʷ – to give
aa – ergative plural
n – present tense
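
To see how the pieces line up, here is a small sketch that glues the six morphemes back together and checks that they spell the word. The stem tʷ (to give) is taken here as whatever is left of wantʷaan once the other five morphemes are removed, which is an inference from the gloss, not something checked against an Ubykh grammar.

```python
# Morphemes of wantʷaan in order, each paired with its gloss from the text.
# The stem "tʷ" is inferred from the word itself (see the note above).
morphemes = [
    ("w", "2nd singular absolutive"),
    ("a", "3rd singular dative"),
    ("n", "3rd ergative"),
    ("tʷ", "to give"),
    ("aa", "ergative plural"),
    ("n", "present tense"),
]

# Concatenating the morphemes should rebuild the full verb form.
word = "".join(form for form, _ in morphemes)
print(word)

for form, gloss in morphemes:
    print(f"{form:>3}  {gloss}")
```

Four of the six morphemes are a single consonant or vowel, which is the point being made above.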

Adverbial suffixes are attached to words to form meanings that are often formed by aspects or tenses in other languages:

asfəpχa I need to drink it.
asfəfan I can drink it.
asfəɡʲan I drink it all the time.
asfəlan I am drinking it all up.
asfətɕʷan I drink it too much.
asfaajən I drink it again.

Nouns and verbs can transform into each other. Any noun can turn into a stative verb:

məzə child

səməzəjtʼ I was a child. (Lit. I child-was. Child-was is a verb: to be a child.)

By the same token, many verbs can become nouns via the use of a nominal affix:

qʼa to say

səqʼa what I say (Lit. that which I say): my speech, my words, my language, my orders, etc.

Number is marked on the verb via a verbal suffix and is only marked on the noun in the ergative case.

However, it does lack the convoluted case systems of the Caucasian languages next door and there is no grammatical gender.

Ubykh is rated 6, hardest of all.

Abkhaz-Abazin

Abkhaz is an extremely difficult language to learn. Each basic consonant has eight different positions of articulation in the mouth. Imagine how difficult that would be for an Abkhaz child with a speech impediment. Abkhaz seems to put agreement markers on just about everything in the language. Abkhaz makes it onto many craziest language lists, and it recently got a very high score on a weirdest language study.

Abkhaz is rated 6, hardest of all.

Burushaski

Burushaski is often thought to be a language isolate, related to no other languages. However, I think it is Dene-Caucasian. It is spoken in the Himalaya Mountains of far northern Pakistan in an area called the Hunza. Its verb conjugation is complex, it has a lot of inflections, there are complicated ways of making sentences depending on many factors, and it is an ergative language, which is hard to learn for speakers of non-ergative languages. In addition, there are very few to no cognates for the vocabulary.

Burushaski is rated 6, hardest of all.

American Indian Languages

American Indian languages are also notoriously difficult, though few try to learn them in the US anyway. In the rest of the continent, they are still learned by millions in many different nations. You almost really need to learn these as a kid. It’s going to be quite hard for an adult to get full competence in them.

One problem with these languages is the multiplicity of verb forms. For instance, the standard paradigm for the overwhelming number of regular English verbs is a maximum of five forms:

steal steals stealing stole stolen

Many Amerindian languages have over 1,000 forms of each verb in the language.

Kootenai

Yet the Salishans (see below) always considered the neighboring language Kootenai to be too hard to learn. Kootenai also has a proximate/obviative distinction along with direct/inverse alignment, probably from contact with Algonquian.

However, the Kootenai direct/inverse system is less complex than Algonquian’s, as it is present only in the 3rd person. Kootenai also has a very strange feature in that they have particles that look like subject pronouns, but these go outside of the full noun phrase. This is a very rare feature in the world’s languages. Kootenai scored very high on a weirdest language survey.

Kootenai is an isolate spoken in Idaho by 100 people.

Kootenai is rated 6, hardest of all.

Yuchi

Yuchi is a language isolate spoken in the Southern US. They were originally located in Eastern Tennessee and were part of the Creek Confederacy at one time. Yuchi is nearly extinct, with only five remaining speakers.

Yuchi has noun genders or classes based on three distinctions of position: standing, sitting or lying. All nouns are either standing, sitting or lying. Trees are standing, and rivers are lying, for instance. If it is taller than it is wide, it is standing. If it is wider than it is tall, it is lying.

If it is about as wide as it is tall, it is sitting. All nouns are one of these three genders, but you can change the gender for humorous or poetic effect. A linguist once asked a group of female speakers whether a penis was standing, sitting or lying. After lots of giggles, they said the default was sitting, but you could say it was standing or lying for poetic effect.
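
The rule of thumb above can be sketched as a little function. Keep in mind this is only the rough heuristic from the text: the real class of a Yuchi noun is something you learn with the word, and the function name, the measurements and the 10% tolerance for "about as wide as tall" are all assumptions invented for the example.

```python
def yuchi_position_class(height: float, width: float, tolerance: float = 0.1) -> str:
    """Guess the default position class of an object from its proportions.

    Hypothetical helper: `tolerance` (how close to 1.0 the height/width
    ratio must be to count as "about as wide as tall") is an assumption.
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    ratio = height / width
    if abs(ratio - 1.0) <= tolerance:
        return "sitting"   # about as wide as it is tall
    if ratio > 1.0:
        return "standing"  # taller than it is wide
    return "lying"         # wider than it is tall


print(yuchi_position_class(height=20.0, width=1.0))   # tree-like proportions
print(yuchi_position_class(height=1.0, width=500.0))  # river-like proportions
print(yuchi_position_class(height=1.0, width=1.0))    # roughly square
```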

Also all Yuchi pronouns must make a distinction between age (older or younger than the speaker) and ethnicity (Yuchi or non-Yuchi).

Yuchi gets a 6 rating, hardest of all.

Dene-Yeniseian Na-Dene Athabascan-Eyak Tlingit

Tlingit is probably one of the hardest, if not the hardest, language in the world. Tlingit is analyzed as partly synthetic, partly agglutinative, and sometimes polysynthetic. It has not only suffixes and prefixes, but it also has infixes, or affixes in the middle of words.

‘eech to pick

All prefixes must be in proper order for the word to work.

tuyakaoonagadagaxayaeecheen. I am usually picking, on purpose, a long object through the hole while standing on a table.

tuyakaoonagootxayaeecheen. I am usually being forced to pick a long object through the hole while standing on a table.

tuyaoonagootxawa’eecheen. I am usually picking the edible long object through the hole while standing on a table.

Tlingit has a pretty unusual phonology. For one thing, it is the only language on Earth with no l. This despite the fact that it has five other laterals: dl (tɬ), tl (tɬʰ), tl’ (tɬʼ), l (ɬ) and l’ (ɬʼ). The tɬʼ and ɬʼ sounds are rare in the world’s languages. ɬʼ is otherwise found only in the wild NW Caucasian languages. It also has two labialized glottal consonants, ʔʷ and hw (hʷ).

Tlingit gets a 6 rating, hardest of all.

Athabascan Southern

Navajo has long, short and nasal vowels, a tone system and a grammar totally unlike anything in Indo-European. A stem of only four letters or so can take enough affixes to fill a whole line of text.

Navajo is a polysynthetic language. In polysynthetic languages, very long words can denote an entire sentence, and it’s quite hard to take the word apart into its parts and figure out exactly what they mean and how they go together. The long words are created because polysynthetic languages have an amazing amount of morphological richness. They put many morphemes together to create a word out of what might be a sentence in a non-polysynthetic language.

Some Navajo dictionaries have thousands of entries of verbs only, with no nouns. Many adjectives have no direct translation into Navajo. Instead, verbs are used as adjectives. A verb has no particular form like in English – to walk. Instead, it assumes various forms depending on whether or not the action is completed, incomplete, in progress, repeated, habitual, one time only, instantaneous, or simply desired. These are called aspects. Navajo must have one of the most complex aspect systems of any language:

The Primary aspects:

Momentaneous – punctually (takes place at one point in time)
Continuative – an indefinite span of time & movement with a specified direction
Durative – over an indefinite span of time, non-locomotive uninterrupted continuum
Repetitive – a continuum of repeated acts or connected series of acts
Conclusive – like durative but in perfective terminates with static sequel
Semelfactive – a single act in a repetitive series of acts
Distributive – a distributive manipulation of objects or performance of actions
Diversative – a movement distributed among things (similar to distributive)
Reversative – results in directional change
Conative – an attempted action
Transitional – a shift from one state to another
Cursive – progression in a line through time/space (only progressive mode)

The subaspects:

Completive – an event/action simply takes place (similar to the aorist tense)
Terminative – a stopping of an action
Stative – sequentially durative and static
Inceptive – beginning of an action
Terminal – an inherently terminal action
Prolongative – an arrested beginning or ending of an action
Seriative – an interconnected series of successive separate & distinct acts
Inchoative – a focus on the beginning of a non-locomotion action
Reversionary – a return to a previous state/location
Semeliterative – a single repetition of an event/action

The tense system is almost as wild as the aspectual system.

For instance, the verb ndideesh means to pick up or to lift up. But it varies depending on what you are picking up:

ndideeshtiil = to pick up a slender stiff object (key, pole)
ndideeshleel = to pick up a slender flexible object (branch, rope)
ndideesh’aal = to pick up a roundish or bulky object (bottle, rock)
ndideeshgheel = to pick up a compact and heavy object (bundle, pack)
ndideeshjol = to pick up a non-compact or diffuse object (wool, hay)
ndideeshteel = to pick up something animate (child, dog)
ndideeshnil = to pick up a few small objects (a couple of berries, nuts)
ndideeshjih = to pick up a large number of small objects (a pile of berries, nuts)
ndideeshtsos = to pick up something flexible and flat (blanket, piece of paper)
ndideeshjil = to pick up something I carry on my back
ndideeshkaal = to pick up anything in a vessel
ndideeshtloh = to pick up mushy matter (mud)

But picking up is only one way of handling the 12 different consistencies. One can also bring, take, hang up, keep, carry around, turn over, etc. objects. There are about 28 different verbs one can use for handling objects. If we multiply these verbs by the consistencies, there are over 300 different verbs used just for handling objects.
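
The multiplication behind that "over 300" figure is easy to check. The sketch below just uses the two numbers given in the text (12 consistencies, about 28 handling verbs); the variable names are made up for the example.

```python
# Figures taken from the text above; "about 28" is treated as exactly 28 here.
consistencies = 12    # object classes such as slender stiff, mushy, animate...
handling_verbs = 28   # pick up, bring, take, hang up, keep, carry around...

# One distinct verb form per (handling verb, consistency) pair.
total_handling_forms = consistencies * handling_verbs
print(total_handling_forms)  # 336
```

So the count comes to 336, which is where "over 300" comes from.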

In Navajo textbooks, there are conjugation tables for inflecting words, but it’s pretty hard to find a pattern there. One of the most frustrating things about Navajo is that every little morpheme you add to a word seems to change everything else around it, even in both directions.

Navajo is said to have a very difficult system for counting numerals.

There is also a noun classifier system with more than a dozen classifiers that affect inflection. This is quite a few classifiers even for a noun classifier language and is similar to African languages like Zulu. In addition, it has the strange direct/inverse system.

To add insult to injury, Navajo is an ergative language.

Navajo also has an honorifics or politeness system similar to Japanese or Korean.

Navajo also has the odd feature where the word niinaa (because) can be analyzed as a verb.

X áhóót’įįd biniinaa… Because X happened…

Shiniinaa sits’il. It broke into pieces because of me.

In the latter sentence, the only way we know that the 1st person singular was involved is because of the person marking on niinaa.

There are 25 different kinds of pronominal prefixes that can be piled onto one another before a verb base.

Navajo has a very strange feature called animacy, where nouns take certain verbs according to their rank in the hierarchy of animation which is a sort of a ranking based on how alive something is. Humans and lightning are at the top, children and large animals are next and abstractions are at the bottom.

All in all, Navajo, even compared to other polysynthetic languages, has some of the most incredibly complicated polysynthetic morphology of any language. On craziest grammar and craziest language lists, Navajo is typically listed.

It is even said that Navajo children have a harder time learning Navajo than children learning other languages do, but Navajo kids definitely learn the language. As with Hopi below, even linguists find even the best Navajo grammars difficult or even impossible to understand.

However, Navajo is quite regular, a common feature in Amerindian languages.

Navajo is rated 6, hardest of all.

Northern

Slavey, a Na-Dene language of Canada, is hard to learn. It is similar to Navajo and Apache. Verbs take up to 15 different prefixes. All Athabascan languages have wild verbal systems. It also uses a completely different alphabet, a syllabic one designed for Canadian Indians.

Slavey is rated 6, hardest of all.

Haida

Haida is often thought to be a Na-Dene language, but proof of its status is lacking. If it is Na-Dene, it is the most distant member of the family. Haida is in the competition for the most complicated language on Earth, with 70 different suffixes.

Haida is rated 6, hardest of all.

Salishan

The Salishan languages spoken in the Northwest have a long reputation for being hard to learn, in part because of long strings of consonants, in one case 11 consonants long. Salish languages are the only languages on Earth that allow words without sonorants.

Many of the vowels and consonants are not present in most of the world’s widely spoken languages. The Salish languages are, like Chukchi, polysynthetic. Some translations treat all Salish words as either verbs or phrases. Some say that Salish languages do not contain nouns, though this is controversial. The verbal system of Salish languages is absurdly complex.

All Salishan languages are rated 6, hardest of all.

Nuxálk (Bella Coola)

Nuxálk is a notoriously difficult Salishan Amerindian language spoken in British Columbia. It is famous for having some really wild words and even sentences that don’t seem to have any vowels in them at all. For instance:

xłp̓x̣ʷłtłpłłskʷc̓  (xɬpʼχʷɬtʰɬpʰɬːskʷʰt͡sʼ in IPA) He had a bunchberry plant.

sxs seal fat

Here are some more odd words and sentences:

smnmnmuuc mute

Nuyamłamkis timantx tisyuttx ʔułtimnastx. The father sang the song to his son.

Musis tiʔimmllkītx taq̓lsxʷt̓aχ. The boy felt that rope.

However, this word is not typically used by speakers and by no means do most words consist of all consonants. The language sounds odd when spoken. It has been described as “whispering while chewing on a granola bar” (see the video sample under Montana Salish below).

These wild consonant clusters are even crazier than the ones in Ubykh and NW Caucasian. In fact, the nutty consonant clusters in Salish are causing a debate in linguistics about whether or not the syllable is even a universal phenomenon in language, as some Salish words and phrases appear to lack syllables. Some Berber dialects have raised similar questions about the syllable.

Nuxálk makes it onto lists of the craziest phonologies on Earth.

Nuxálk is rated 6, hardest of all.

Interior Salish Southern

Montana Salish is said to be just as hard to learn as Nuxálk. Spokane (Montana Salish) has combining and independent forms with the same meaning:

spim’cn mouth (independent form)
-cin mouth (combining form)

Montana Salish makes it onto a lot of craziest grammars lists.

This link shows an elder on the Flathead Indian Reservation in Montana, Steven Smallsalmon, speaking Montana Salish. He also leads classes in the language. This is probably one of the strangest sounding languages on Earth.

Montana Salish is rated 6, hardest of all.

Central

Straits Salish has an aspectual distinction between persistent and nonpersistent. Persistent means the activity continues after its inception as a state. The persistent morpheme is . The result is similar to English:

figure out – nonpersistent know – persistent

look at – nonpersistent watch – persistent

take – nonpersistent hold – persistent

The persistent morpheme is referred to as a “parasitic morpheme” and only occurs in stems that have an underlying ə, which serves as a “host” for the morpheme.

How strange.

The Saanich dialect of Straits Salish is often listed in the rogue’s gallery of craziest grammars on Earth. The writing system is often listed as one of the worst out there. In addition, Saanich makes it onto craziest grammars lists for the parasitic morphemes and for having no distinction between nouns and verbs!

Straits Salish gets a 6 rating, hardest of all.

Halkomelem, spoken by 570 people around Vancouver, British Columbia, is widely considered to be one of the hardest languages on Earth to learn. In Halkomelem, many verbs have an orientation towards water. You can’t just say, She went home. You have to say how she was going home in relation to nearby bodies of water. So depending on where she was walking home in relation to the nearest river, you would say:

She was farther away from the water and going home.
She was coming home in the direction away from the water.
She was walking parallel to the flow of the water downstream.
She was walking parallel to the flow of the water upstream.

Halkomelem gets a 6 rating, hardest of all.

Lushootseed

Lushootseed is said to be just as hard to learn as Nuxálk. Lushootseed is one of the few languages on Earth that has no nasals at all, except in special registers like baby talk and the archaic speech of mythological figures. It also has laryngealized glides and nasals: w̰, m̥̰, and n̥̰.

Lushootseed is rated 6, hardest of all.

Iroquoian

All Iroquoian languages are extremely difficult, but Athabaskan is probably even harder. Siouan languages may be equal to Iroquoian in difficulty.

Compare the same phrases in Tlingit (Athabaskan) and Cherokee (Iroquoian).

Tlingit:

kutíkusa‘áat It’s cold outside.
kutíkuta‘áat It’s cold right now.

In Tlingit, you can add or modify affixes at the beginning as prefixes, in the middle as infixes and at the end as suffixes. In the above example, you changed a part of the word within the clause itself.

Cherokee:

doyáditlv uyvtlv It is cold outside. (Lit. Outside it is cold.)
ka uyvtlv It is cold now. (Lit. Now it is cold.)

As you can see, Cherokee is easier.

Cherokee

Cherokee is very hard to learn. In addition to everything else, it has a completely different alphabet. It’s polysynthetic, to make matters worse. It is possible to write a Cherokee sentence that somehow lacks a verb. There are five categories of verb classifiers. Verbs needing classifiers must use one. Each regular verb can have an incredible 21,262 inflected forms! All verbs contain a verb root, a pronominal prefix, a modal suffix and an aspect suffix. In addition, verbs inflect for singular, plural and also dual. For instance:

ᎠᎸᎢᎭ   a'lv'íha 

You have 126 different forms:
ᎬᏯᎸᎢᎭ  gvyalv'iha     I tie you up
ᏕᎬᏯᎸᎢᎭ degvyalviha  I'm tying you up
ᏥᏯᎸᎢᎭ  jiyalv'ha        I tie him up
ᎦᎸᎢᎭ                          I tie it
ᏍᏓᏯᎸᎢᎭ sdayalv'iha  I tie you (dual)
ᎢᏨᏯᎢᎭ  ijvyalv'iha    I tie you (pl)
ᎦᏥᏯᎸᎢᎭ gajiyalv'iha  I tie them (animate)
ᏕᎦᎸᎢᎭ                        I tie them up (inanimate)
ᏍᏆᎸᎢᎭ  squahlv'iha    You tie me
ᎯᏯᎸᎢᎭ  hiyalv'iha     You're tying him
ᎭᏢᎢᎭ   hatlv'iha         You tie it
ᏍᎩᎾᎸᎢᎭ skinalv'iha    You're tying me and him
ᎪᎩᎾᏢᎢᎭ goginatlv'iha  They tie me and him etc.

Let us look at another form:

to see

I see myself           gadagotia
I see you                gvgohtia
I see him/her           tsigotia
I see it                    tsigotia
I see you two          advgotia
I see you (plural)    istvgotia
I see them (live)    gatsigotia
I see them (things) detsigotia

You see me                     sgigotia
You see yourself              hadagotia
You see him/her              higo(h)tia
You see it                        higotia
You see another and me  sginigotia
You see others and me    isgigotia
You see them (living)      dehigotia
You see them (living)      gahigotia
You see them (things)     detsigotia

He/she sees me                    agigotia
He/she sees you                   tsagotia
He/she sees you                   atsigotia
He/she sees him/her            agotia
He/she sees himself/herself  adagotia
He/she sees you + me          ginigotia
He/she sees you two             sdigotia
He/she sees another + me    oginigotia
He/she sees us (them + me) otsigotia
He/she sees you (plural)       itsigotia
He/she sees them                 dagotia

You and I see him/her/it                igigotia
You and I see ourselves                 edadotia
You and I see one another             denadagotia/dosdadagotia
You and I see them (living)           genigotia
You and I see them (living or not) denigotia

You two see me                           sgninigotia
You two see him/her/it                 esdigotia
You two see yourselves                sdadagotia
You two see us (another and me) sginigotia
You two see them                        desdigotia

Another and I see you             sdvgotia
Another and I see him/her       osdigotia
Another and I see it                 osdigotia
Another and I see you-two      sdvgotia
Another and I see ourselves    dosdadagotia
Another and I see you (plural) itsvgotia
Another and I see them           dosdigotia

You (plural) see me        isgigoti
You (plural) see him/her etsigoti

They see me                    gvgigotia
They see you                   getsagotia
They see him/her             anigoti
They see you and me       geginigoti
They see you two             gesdigoti
They see another and me gegigotia/gogenigoti
They see you (plural)       getsigoti
They see them                 danagotia
They see themselves       anadagoti

I will see datsigoi
I saw      agigohvi

He/she will see dvgohi
He/she saw      ugohvi

Number is marked for inclusive vs. exclusive and there is a dual. 3rd person plural is marked for animate/inanimate. Verbs take different object forms depending on if the object is solid/alive/indefinite shape/flexible. This is similar to the Navajo system.

Cherokee also has lexical tone, with complex rules about how tones may combine with each other. Tone is not marked in the orthography. The phonology is noted for somehow not having any labial consonants.

However, Cherokee is very regular. It has only three irregular verbs. It is just that there are many complex rules.

Cherokee is rated 5.5, close to most difficult of all.

Iroquoian Northern Iroquoian Five Nations-Huronian-Susquehannock Huronian Huron-Petun

Wyandot, a dormant language that has been extinct for about 50 years, has some unbelievably complex structures. Let us look at one of them. Wyandot is the only language on Earth that allows negative sentences that somehow do not contain a negative morpheme. Wyandot makes it onto craziest grammars lists. (To be continued).

Siouan-Catawban Siouan Mississippi Valley-Ohio Valley Siouan Mississippi Valley Siouan Dakota

Lakota and other Siouan languages may well be as convoluted as Iroquoian. In Lakota, all adjectives are expressed as verbs. Something similar is seen in Nahuatl.

Ógle sápe kiŋ mak’ú. The shirt it is black he gave it to me. He gave me the black shirt.

In the above, it is black is a stative verb and serves as an adjective.

Ógle kiŋ sabyá mak’ú. Shirt the blackly he gave it to me. He gave me the black shirt. (Lit. He gave me the shirt blackly.)

Blackly is an adverb serving as an adjective above.

Lakota gets a 5.5 rating, close to hardest of all.

Algic Algonquian

All Algonquian languages have distinctions between animate/inanimate nouns, in addition to having proximate/obviative and direct/inverse distinctions. However, most languages that have proximate/obviative and direct/inverse distinctions are not as difficult as Algonquian.

Proximate/obviative is a way of marking the 3rd person in discourse. It distinguishes between an important 3rd person (proximate) and a more peripheral 3rd person (obviative). Animate nouns and possessor nouns tend to be marked proximate while inanimate nouns and possessed nouns tend to be marked obviative.

Direct/inverse is a way of marking discourse in terms of saliency, topicality or animacy. A noun that ranks higher than another in terms of saliency, topicality or animacy also ranks higher in terms of person hierarchy. It is used only in transitive clauses. When the subject has a higher ranking than the object, the direct form is used. When the object has a higher ranking than the subject, the inverse form is used.
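The direct/inverse choice can be sketched in a few lines of Python. This is only a rough illustration: the function name choose_form and the numeric ranks are made up for the example, and real Algonquian hierarchies are language-specific and more involved.

```python
def choose_form(subject_rank: int, object_rank: int) -> str:
    """Pick the verb form for a transitive clause.

    Ranks are hypothetical integers: a larger number means the noun ranks
    higher in saliency, topicality or animacy.
    """
    if subject_rank > object_rank:
        return "direct"
    if object_rank > subject_rank:
        return "inverse"
    # The description does not cover ties, so the sketch refuses to guess.
    raise ValueError("equal ranks are not covered by the description")


print(choose_form(2, 1))  # subject outranks object
print(choose_form(1, 2))  # object outranks subject
```

The point is that the verb form depends on the relative ranking of the two participants, not on which one is the subject.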

Central Algonquian Cree-Montagnais

Cree is very hard to learn. It is written in a variety of different ways with different alphabets and syllabic systems, complicating matters even further. The syllabic alphabet has many problems and is often listed as one of the worst scripts out there. It is polysynthetic and has long, short and nasal vowels and aspirated and unaspirated voiceless consonants. Words are divided into metrical feet, the rules for determining stress placement in words are quite complex, and there is a lot of irregularity. Vowels fall out a lot, or syncopate, within words.

Cree adds noun classifiers to the mix, and both nouns and verbs are marked as animate or inanimate. In addition, verbs are marked for transitive and intransitive. In addition, verbs get different affixes depending on whether they occur in main or subordinate clauses.

Cree is rated 6, hardest of all.

Ojibwa-Potawatomi

Ojibwa is said to be about as hard to learn as Cree, as it is very similar.

Ojibwa is rated 6, hardest of all.

Plains Algonquian Cheyenne

Cheyenne is well-known for being a hard Amerindian language to learn. Like many polysynthetic languages, it can have very long words.

Náohkêsáa’oné’seómepêhévetsêhésto’anéhe. I truly don’t know Cheyenne very well.

Cheyenne is quite regular, but it has so many complex rules that it is hard to figure them all out.

Cheyenne is rated 6, hardest of all.

Arapahoan

Arapaho has a strange phonology. It lacks phonemic low vowels. The vowel system consists of i, ɨ~u, ɛ, and ɔ. Each vowel also has a corresponding long version. In addition, there are four diphthongs, ei, ou, oe and ie, several triphthongs, eii, oee, and ouu, as well as extended sequences of vowels such as eee with stress on either the first or the last vowel in the combination. Long vowels of various types are common:

Héétbih’ínkúútiinoo. I will turn out the lights.

Honoosóó’. It is raining.

There is a pitch accent system with normal, high and allophonic falling tones. Arapaho words also undergo some very wild sound changes.

Arapaho is rated 6, hardest of all.

Gros Ventre has a phonological system and elaborate sound changes similar to those of Arapaho.

Gros Ventre is rated 5, extremely difficult.

Caddoan Northern Wichita

Wichita has many strange phonological traits. It has only one nasal. Labials are rare and appear in only two roots. It also may have only three vowels, i, e, and a, with only height as a distinction. Such a restricted vertical vowel distribution is only found in NW Caucasian and the Papuan Ndu languages. There is apparently a three-way contrast in vowel length – regular, long and extra-long. This is only found in Mixe and Estonian.

There are some interesting tenses. Perfect tense means that an act has been carried out. The strange intentive tense means that one hopes or hoped to carry out an act. The habitual tense means one regularly engages in the activity, not that one is doing so at the moment.

Long consonant clusters are permitted.

kskhaːɾʔa

nahiʔinckskih while sleeping

There are many cases where a CVɁ sequence has been reduced to CɁ due to loss of the vowel, resulting in odd words such as:

ki·sɁ bone

Word order is determined by novelty or importance.

hira:wisɁiha:s kiyari:ce:hire: Our ancestors God put us on this Earth.

weɁe hira:rɁ tiɁi na:kirih God put our ancestors on this Earth.

In the sentence above, “our ancestors” is actually the subject, so it makes sense that it comes first.

Wichita has inclusive and exclusive 1st person plural and has singular, dual and plural. There is an evidential system where if you say you know something, you must say how you know it – whether it is personal knowledge or hearsay.

Wichita gets a 6 rating, hardest of all.

Hokan Tequistlatecan Coastal Chontal

Huamelultec or Lowland Oaxaca Chontal has the odd glottalized fricatives fʼ, sʼ, ɬʼ and xʼ as its only glottalized consonants. They alternate with plain f, s, l and x. Several of them, including ɬʼ, are extremely rare in the world’s languages, usually found in only 2-3 other languages, often in NW Caucasian. One occurs in only one other language – Tlingit. Another is slightly more common, occurring in five other languages including Tlingit. In other languages, these odd sounds derived from sequences of consonant + q: Cq -> Cʔ -> glottalized fricative.

Sentence structure is odd:

Hit the ball the man. Hit the man the ball. The man hit the ball.

All mean the same thing.

Huamelultec gets a 6 rating, hardest of all.

Karok

Karok is a language isolate spoken by a few dozen people in northern California. The last native speaker recently died; however, there are ~80 people who have varying levels of L2 fluency.

In Karok, you can use a suffix for different types of containment – fire, water or a solid.

pa:θ-kirih throw into a fire

pa:θ-kurih throw into water

pa:θ-ruprih throw through a solid

The suffixes are unrelated to the words for fire, water and solid.

Karok gets a 5 rating, extremely difficult.

Uto-Aztecan Northern

Hopi is so difficult that even grammars describing the language are almost impossible to understand. For instance, Hopi has two different words for and depending on whether the noun phrase containing the word and is nominative or accusative.

Hopi is rated 6, hardest of all.

Southern Uto-Aztecan Corachol-Aztecan Core Nahua Nahuatl

In Nahuatl, most adjectives are simply stative verbs. Hence:

Umntu omde waya eTenochtitlan. The man he is tall went to Tenochtitlan. The tall man went to Tenochtitlan.

He is tall is a stative verb in the above.

Nahuatl gets a 6 rating, hardest of all.

Numic Central Numic

Comanche is legendary for being one of the hardest Indian languages of all to learn. Reasons are unknown, but all Amerindian languages are quite difficult. I doubt if Comanche is harder than other Numic languages.

Bizarrely enough, Comanche has very strange sounds called voiceless vowels, which seems to be an oxymoron, as vowels would seem to be inherently voiced. English has something akin to voiceless vowels in the words particular and peculiar, where the unstressed vowels of the first syllables act something like voiceless vowels.

Comanche was used for a while by the code talkers in World War 2 – not all code talkers were Navajos. Comanche was specifically chosen because it was hard to figure out. The Japanese were never able to break the Comanche code.

Comanche is rated 6, hardest of all.

Oto-Manguean Western Oto-Mangue Oto-Pame-Chinantecan Chinantecan

Chinantec, an Indian language of southwest Mexico, is very hard for non-Chinantecs to learn. The tone system is maddeningly complex, and the syntax and morphology are very intricate.

Chinantec is rated 6, hardest of all.

Popolocan Mazatecan Lowland Valley Southern

Jalapa Mazatec has distinctions between modal, creaky and breathy-voiced vowels along with nasal versions of those three. It also has creaky consonants and voiceless nasals. It has three tones, low, mid and high. Combining the tones results in various contour tones. In addition, it has a 3-way distinction in vowel length. Whistled speech is also possible. It has a phonemic distinction between “ballistic” and “controlled” syllables which is only present in Oto-Manguean.

Ballistic (short): sū warm, nīˑntū slippery, tsǣ guava, hų̄ you plural

Controlled (half-long): sūˑ blue, nīˑntūˑ needle, tsǣˑ full, hų̄ˑ six

Jalapa Mazatec is rated 6, hardest of all.

Maipurean Northern Upper Amazon Eastern Nawiki

Tariana is a very difficult language mostly because of the unbelievable amount of information it crams into its morphology and syntax. This is mostly because it is an Arawakan language that has been heavily influenced by neighboring Tucanoan languages, with the result that it has many of the grammatical categories and particles present in both families.

This stems from the widespread bilingualism in the Vaupes Basin of Colombia, where many people grow up bilingual from childhood and often become multilingual by adulthood. Learning up to five different languages is common. Code-switching was frowned upon and anyone using a word from Language Y while speaking Language X would get laughed at. Hence the various languages tended to borrow features from each other quite easily.

For instance, Tariana has both a noun classifier system and a gender system. Noun classifiers and gender are sometimes subsumed under the single category of “noun classifiers.” Yet Tariana has both, presumably from its relationship to two completely different language families. So in Tariana it is not unusual to get both demonstratives and verbs marked for both gender and noun classifier. Tariana borrowed such things as serialized perception verbs and the dubitative marker from Tucano.

In addition, Tariana has some very odd sounds, including aspirated nasals mh (mʰ), nh (n̺ʰ) and ñh (ɲʰ) and an aspirated w (wʰ) of all things. They seem to be actually aspirated, not just partially devoiced as many voiceless nasals and liquids are.

Tariana gets a 6 rating, hardest of all.

Huitotoan Proto-Bora-Muinane

Bora, a Witotoan language spoken in Peru and Colombia near the border between the two countries, has a mind-boggling 350 different noun classes. The noun classifier system is actually highly productive and is often used to create new nouns. New nouns can be created very easily, and their meanings are often semantically transparent. In some noun classifier systems, classifiers can be stacked one upon the other. In these cases, typically the last one is used for agreement purposes.

Bora also is a tonal language, but it has only two tones. In addition, nearly all consonantal phonemes have phonemic aspirated and palatalized counterparts. The agreement structure in the language is also quite convoluted. The classifier system effectively replaces much derivational morphology on the noun and noun compounding processes that other languages use to expand the meanings of nominals.

Bora gets a 6 rating, hardest of all.

Tucanoan Eastern Tucanoan Bará-Tuyuka

Tuyuca is a Tucanoan language spoken by 450 people in the department of Vaupés in Colombia. An article in The Economist magazine concluded that it was the hardest language on Earth to learn.

It has a simple sound system, but it’s agglutinative, and agglutinative languages are pretty hard. For instance, hóabãsiriga means I don’t know how to write. It has two forms of 1st person plural, I and you (inclusive) and I and the others (exclusive). It has between 50 and 140 noun classes, including strange ones like bark that does not cling closely to a tree, which can be extended to mean baggy trousers or wet plywood that has begun to fall apart.

Like Yamana, a nearly extinct Amerindian language of Chile, Tuyuca marks for evidentiality, that is, how it is that you know something. For instance:

Diga ape-wi. = The boy played soccer. (I saw him playing.)

Diga ape-hiyi. = The boy played soccer. (I assume he was playing soccer, though I did not see it firsthand.)

Evidential marking is obligatory on all Tuyuca verbs and it forces you to think about how you know whatever it is you know.

Tuyuca definitely gets a 6 rating!

Central Tucanoan

Cubeo, a language spoken in the Vaupes of Colombia, has either SOV or OVS word order. That would mean that the following:

The man the ball hit. The ball hit the man.

mean the same thing. OVS languages are quite rare.

Morphemes belong to one of four classes:

  1. Nasal (many roots, as well as suffixes like -xã  = associative)
  2. Oral (many roots, as well as suffixes like -pe  = similarity, -du = frustrative)
  3. Unmarked (only suffixes, e.g. -re  = in/direct object)
  4. Oral/Nasal (some roots and some suffixes) /bãˈkaxa-/(mãˈkaxa-) – to defecate and -kebã = suppose

Just by looking at any given consonant-initial suffix, it is impossible to determine which of the first three categories it belongs to. They must be learned one by one.

Cubeo has nasal assimilation, common to many Amazonian languages. In some of these, nasalization is best analyzed at the syllable level – some syllables are nasal and others are not.

dĩ-bI-ko /dĩ-bĩ-ko/ nĩmĩko She recently went.

The underlying form dĩ-bI-ko is realized on the surface as nĩmĩko. The ĩ in dĩ-bI-ko nasalizes the d, the b, and the I on either side of it, so nasal spreading works in both directions. However, it is blocked from the third syllable because k is part of a class of non-nasalizable consonants.
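The spreading just described can be sketched roughly in Python. Everything here is an assumption made for this one example: the tiny segment table, the single blocker k (the text names only k as non-nasalizable), and the function name spread_nasality. It is not a full account of Cubeo phonology.

```python
# Hypothetical segment table covering only the segments in this example.
NASAL_COUNTERPART = {"d": "n", "b": "m", "I": "\u0129"}  # "\u0129" is the nasal vowel ĩ
NASAL_VOWELS = {"\u0129"}
BLOCKERS = {"k"}  # k is the only non-nasalizable consonant named in the text


def spread_nasality(segments):
    """Spread nasality left and right from each nasal vowel until a blocker."""
    out = list(segments)
    seeds = [i for i, seg in enumerate(segments) if seg in NASAL_VOWELS]
    for seed in seeds:
        for step in (-1, 1):  # leftward first, then rightward
            i = seed + step
            while 0 <= i < len(out) and out[i] not in BLOCKERS:
                # Segments without a listed nasal counterpart stay as they are.
                out[i] = NASAL_COUNTERPART.get(out[i], out[i])
                i += step
    return "".join(out)


surface = spread_nasality(["d", "\u0129", "b", "I", "k", "o"])
print(surface)
```

Spreading stops at k, so the final o is never reached and stays oral.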

Pretty difficult language.

Cubeo gets a 6 rating, hardest of all.

Carib Waiwai

Hixkaryána is famous for being one of the very few languages on Earth to have basic OVS (Object-Verb-Subject) word order.

The sentence Toto yonoye kamara, or The man ate the jaguar, actually means The jaguar ate the man.

Toto yonoye kamara Lit. The man ate the jaguar. Gloss: The jaguar ate the man.
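To see why an English speaker misreads the sentence, here is a small Python sketch that assigns roles to a three-word clause under a given basic word order. The function name assign_roles is made up for illustration, and the sketch ignores all the verb morphology.

```python
def assign_roles(words, order):
    """Map a three-word clause onto roles given a basic word order such as "OVS"."""
    names = {"S": "subject", "V": "verb", "O": "object"}
    if len(words) != 3 or sorted(order) != ["O", "S", "V"]:
        raise ValueError("expected three words and an order made of S, V and O")
    return {names[slot]: word for slot, word in zip(order, words)}


clause = ["toto", "yonoye", "kamara"]
print(assign_roles(clause, "OVS"))  # toto is the object, kamara the subject
print(assign_roles(clause, "SVO"))  # the English-style misreading
```

Read as OVS, kamara (the jaguar) is the one doing the eating.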

Grammatical suffixes attached to the end of the verb mark not only number but also aspect, mood and tense.

Hixkaryána gets a 6 rating, hardest of all.

Nambikwaran Mamaindê

This is actually a series of closely related languages as opposed to one language, but the Southern Nambikwara language is the most well-known of the family, with 1,200 speakers in the Brazilian Amazon.

Phonology is complex. Consonants distinguish between aspirated, plain and glottalized, common in the Americas. There are strange sounds like prestopped nasals and glottalized fricatives. There are nasal vowels and three different tones. All vowels except one have nasal, creaky-voiced and nasal-creaky counterparts, for a total of 19 vowels.

The grammar is polysynthetic with a complex evidential system.

Reportedly, Nambikwara children do not pick up the language fully until age 10 or so, one of the latest recorded ages for full competence. Nambikwara is sometimes said to be the hardest language on Earth to learn, but it has some competition.

Nambikwara definitely gets a 6 rating, hardest of all!

Muran

Pirahã is a language isolate spoken in the Brazilian Amazon. Recent writings by Daniel Everett indicate that not only is this one of the hardest languages on Earth to learn, but it is also one of the weirdest languages on Earth. It is monumentally complex in nearly every way imaginable. It is commonly listed on the rogue’s gallery of craziest languages and phonologies on Earth.

It has the smallest phonemic inventory on Earth with only seven consonants, three vowels and either two or three tones. Everett recently wrote a paper about it after spending many years with the Pirahã. Previous missionaries who had spent time with the Pirahã generally failed to learn the language because it was too hard to learn. It took Everett a very long time, but he finally learned it well.

Many of Everett’s claims about Pirahã are astounding: whistled speech, no system for counting, very few Portuguese loans (they deliberately refuse to use Portuguese loans), evidence for the Sapir-Whorf linguistic relativity hypothesis, and evidence that it violates some of Noam Chomsky’s purported language universals such as embedding. It also has the t͡ʙ̥ sound – a bilabially trilled postdental affricate which is only found in two other languages, both in the Brazilian Amazon – Oro Win and Wari’.

Initially, Everett never heard the sound, but as they got to know him better, they started to make it more often. Everett believes that they were ridiculed by other groups when they made the odd sound.

Pirahã has the simplest kinship system in any language – there is only one word for both mother and father, and the Pirahã do not have any words for anyone other than direct biological relatives.

Pirahã may have only two numerals, or it may lack a numeral system altogether.

Pirahã does not distinguish between singular and plural person. This is highly unusual. The language may have borrowed its entire pronoun set from the Tupian languages Nheengatu and Tenharim, groups the Pirahã had formerly been in contact with. This may be one of the only attested cases of the borrowing of a complete pronoun set.

There are mandatory evidentiality markers that must be used in Pirahã discourse. Speakers must say how they know something, whether they saw it themselves, whether it was hearsay or whether they inferred it circumstantially.

There are various strange moods – the desiderative (desire to perform an action) and two types of frustrative – frustration in starting an action (inchoative/incompletive) and frustration in completing an action (causative/incompletive). There are others: immediate/intentive (you are going to do something now/you intend to do it in the future).

There are many verbal aspects: perfect/imperfect (completed/incomplete), telic/atelic (reaching a goal/not reaching a goal), continuative (continuing), repetitive (iterative), and beginning an action (inchoative).

Each Pirahã verb has 262,144 possible forms, or possibly many millions, depending on which analysis you use.

The future tense is divided into future/somewhere and future/elsewhere. The past tense is divided into plain past and immediate past.

Pirahã has a closed class of only 90 verb roots, an incredibly small number. But these roots can be combined together to form compound verbs, a much larger category. Here is one example of three verbs strung together to form a compound verb:

xig ab op: take + turn + go = bring back. You take something away, you turn around, and you go back to where you got it to return it.

There are no abstract color terms in Pirahã. There are only two words for colors, one for light and one for dark. The only other languages with a color sense this restricted are in Papua New Guinea. The other color terms are not really color terms, but are more descriptive – red is translated as like blood.

Pirahã can be whistled, hummed or encoded into music. Consonants and vowels can be omitted altogether and meaning conveyed instead via variations in stress, pitch and rhythm. Mothers teach the language to children by repeating musical patterns.

Pirahã may well be one of the hardest languages on Earth to learn.

Pirahã gets a 6 rating, hardest of all.

Quechuan

Quechua (actually a large group of languages and not a single language at all) is one of the easiest Amerindian languages to learn. Quechua is a classic example of a highly regular grammar with few exceptions. Its agglutinative system is more straightforward than even that of Turkish. The phonology is dead simple.

On the down side, there is a lot of dialectal divergence (these are actually separate languages and not dialects) and a lack of learning materials. Some say that Quechua speakers spend their whole lives learning the language.

Quechua has inconsistent orthographies. There is a fight between those who prefer a Spanish-based orthography and those who prefer a more phonemic one. Also there is an argument over whether to use the Ayacucho language or the Cuzco language as a base.

Quechua has a difficult feature known as evidential marking. This marker indicates the source of the speaker’s knowledge and how sure they are about the statement.

-mi expresses personal knowledge:

Tayta Wayllaqawaqa chufirmi. Mr. Huayllacahua is a driver. (I know it for a fact.)

-si expresses hearsay knowledge:

Tayta Wayllaqawaqa chufirsi. Mr. Huayllacahua is a driver (or so I’ve heard).

-chá expresses strong possibility:

Tayta Wayllaqawaqa chufirchá. Mr. Huayllacahua is a driver (most likely).
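The three markers above amount to a small lookup from the source of your knowledge to a suffix. A minimal Python sketch follows, with the caveat that the dictionary keys and the function name mark_evidential are invented for illustration, and that it simply glues the suffix on without modeling any variation in the suffix shapes.

```python
# Keys are hypothetical labels; the suffixes are the three given in the text.
EVIDENTIAL_SUFFIX = {
    "personal": "mi",        # personal knowledge
    "hearsay": "si",         # hearsay knowledge
    "probable": "ch\u00e1",  # strong possibility; "\u00e1" is the letter á
}


def mark_evidential(word, source):
    """Attach the evidential suffix matching the speaker's source of knowledge."""
    return word + EVIDENTIAL_SUFFIX[source]


for source in ("personal", "hearsay", "probable"):
    print(source, "->", mark_evidential("chufir", source))
```

There is no unmarked entry in the table, which mirrors the idea that the speaker has to pick a source.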

Quechua is rated 4, very difficult.

Aymaran Aymara

Aymara has some of the wildest morphophonology out there. Morpheme-final vowel deletion is present in the language as a morphophonological process, and it is dependent on a set of highly complex phonological, morphological and syntactic rules (Kim 2013).

For instance, there are three types of suffixes: dominant, recessive and a third class that is neither dominant nor recessive. If a stem ends in a vowel, dominant suffixes delete the vowel but recessive suffixes allow the vowel to remain. The third class either deletes or retains the vowel on the stem depending on how many vowels are in the stem. If the root has two vowels, the vowel is retained. If it has three vowels, the vowel is deleted.
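The three suffix classes can be written out as a rough Python sketch. The stems and suffix used below are made-up placeholders, not real Aymara words. The label "third" and the function name attach_suffix are invented here, and the sketch assumes plain a, i and u as the only vowel letters. The real rules are far more complex, as noted above.

```python
VOWELS = set("aiu")  # assumption: plain a, i, u; vowel length is ignored


def attach_suffix(stem, suffix, suffix_class):
    """Join stem and suffix, deleting or keeping a stem-final vowel.

    suffix_class is "dominant", "recessive" or "third" ("third" is a label
    invented for this sketch).
    """
    if stem[-1] not in VOWELS:
        return stem + suffix  # the rules in the text only concern vowel-final stems
    vowel_count = sum(ch in VOWELS for ch in stem)
    if suffix_class == "dominant":
        delete = True
    elif suffix_class == "recessive":
        delete = False
    elif suffix_class == "third":
        if vowel_count == 2:
            delete = False
        elif vowel_count == 3:
            delete = True
        else:
            raise ValueError("the text only covers stems with two or three vowels")
    else:
        raise ValueError(f"unknown suffix class: {suffix_class}")
    return (stem[:-1] if delete else stem) + suffix


# "kata", "katapa" and "ta" are placeholders, not attested forms.
print(attach_suffix("kata", "ta", "dominant"))
print(attach_suffix("kata", "ta", "recessive"))
print(attach_suffix("katapa", "ta", "third"))
```

The sketch raises an error for third-class suffixes on stems with any other number of vowels, since the description above says nothing about them.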

Although all of this seems quite odd, Finnish has something similar going on, if not a lot worse.

Nevertheless, Aymara is still said to be a very easy language to learn. The Guinness Book of World Records claims it is almost as easy to learn as Esperanto.

Aymara gets a 2 rating, very easy to learn.

Australian

Australian Aboriginal languages are some of the hardest languages on Earth to learn, like Amerindian or Caucasian languages. Some Australian languages have phonemic contrasts that few other languages have, such as apico-dental, lamino-dental, apico-postalveolar, and lamino-postalveolar coronals.

Australian languages tend to be mixed ergative. Ordinary nouns are ergative-absolutive, but 1st and 2nd person pronouns are nominative-accusative. One language has a three way agent-patient-experiencer distinction in the 1st person pronoun. Australian pronouns typically have singular, plural and dual forms along with inclusive and exclusive 1st plural. In some sentences, they have what is known as double case agreement which is rare in the world’s languages:

I gave a spear to my father. I gave a spear mine-to father’s-to.

Both elements of the phrase my father are in both dative and genitive.

However, Aboriginal languages do have the plus of being very regular.

All Australian languages are rated 6, most difficult of all.

Tor-Kwerba Orya-Tor Tor

Berik is a Tor-Orya language spoken in the Indonesian colony of Irian Jaya in New Guinea.

Verbs take many strange endings, in many cases mandatory ones, that indicate what time of day something happened, among other things.

Telbener – He drinks in the evening.

Where a verb takes an object, it will not only be marked for time of day but for the size of the object.

Kitobana – He gives three large objects to a man in the sunlight.

Verbs may also be marked for where the action takes place in reference to the speaker.

Gwerantena – To place a large object in a low place nearby.

Berik is rated 6, hardest of all.

Trans New Guinea Madang Croisilles Gum

Amele is the world’s most complex language as far as verb forms go, with 69,000 finite and 860 infinitive forms.

Amele is rated 6, hardest of all.

Torricelli Wapei Valman

Valman is a bizarre case where the word and that connects two nouns is actually a verb of all things and is marked with the first noun as subject and the second noun as object.

John (subject) and Mary (object)

John is marked as subject for some reason, and Mary is marked as object, and the and word shows subject agreement with John and object agreement with Mary.

Valman gets a 6 rating, hardest of all.

Afroasiatic Semitic

Semitic languages such as Arabic and Hebrew are notoriously difficult to learn, and Arabic (especially MSA) tops many language learners’ lists as the hardest language they have ever attempted to learn. Although Semitic verbs are notoriously complex, the verbal system does have some advantages especially as compared to IE languages like Slavic. Unlike Slavic, Semitic verbs are not inflected for mood and there is no perfect or imperfect.

Central South Arabic

Arabic has some very irregular manners of noun declension, even in the plural. For instance, the word girls changes in an unpredictable way when you say one girl, two girls and three girls, and there are two different ways to say two girls depending on context. Two girls is marked with the dual, but different dual forms can be used. All languages with duals are relatively difficult for most speakers that lack a dual in their native language. However, the dual is predictable from the singular, so one might argue that you only need to learn how to say one girl and three girls.

Further, it is full of irregular plurals similar to octopus and octopi in English, whereas these forms are rare in English. With any given word, there might be 20 different possible ways to pluralize it, and there is no way to know which of the 20 paradigms to use with that word, and further, there is no way to generalize a plural pattern from a singular pattern. In addition, many words have 2-3 ways of pluralizing them. Some messy Arabic plurals:

kalb -> kilaab
qalb -> quluub
maktab -> makaatib
taalib -> tullaab
balad -> buldaan

When you say I love you to a man, you say it one way, and when you say it to a woman, you say it another way. On and on.

The Arabic writing system is exceedingly difficult and is one of the hardest to use of any on Earth. Short vowels are omitted. You have to learn where to insert missing vowels, where to double consonants and which vowels to skip in the script. There are 28 different symbols in the alphabet and four different ways to write each symbol depending on its place in the word.

Consonants are written in different ways depending on where they appear in a word. An h is written differently at the beginning of a word than at the end of a word. However, one simple aspect of it is that the medial form is always the same as the initial form. You need to learn not only Arabic words but also the grammar to read Arabic.

Pronouns attach themselves to roots, and there are many different verb conjugation paradigms which simply have to be memorized. For instance, if a verb has a و, a ي, or a ء in its root, you need to memorize the patterns of the derivations, and that is a good chunk of the conjugations right there. The system for measuring quantities is extremely confusing.

The grammar has many odd rules that seem senseless. Unfortunately, most rules have exceptions, and it seems that the exceptions are more common than the rules themselves. Many people, including native speakers, complain about Arabic grammar.

Arabic does have case, but the system is rather simple.

The laryngeals, uvulars and glottalized sounds are hard for many foreigners to make and nearly impossible for them to get right. The ha’(ح ), qa (ق ) and غ sounds and the glottal stop in initial position give a lot of learners headaches.

Arabic is at least as idiomatic as French or English, so in order to speak it right, you have to learn all of the expressionistic nuances.

One of the worst problems with Arabic is the dialects, which in many cases are separate languages altogether. If you learn Arabic, you often have to learn one of the dialects along with classical Arabic. All Arabic speakers speak both an Arabic dialect and Classical Arabic.

In some Arabic as a foreign language classes, even after 1 1/2 years, not one student could yet make a complete and proper sentence that was not memorized.

Adding weight to the commonly held belief that Arabic is hard to learn is research done in Germany in 2005 which showed that Turkish children learn their language at age 2-3, German children at age 4-5, but Arabic kids did not get Arabic until age 12.

Arabic has complex verbal agreement with the subject, masculine and feminine gender in nouns and adjectives, head-initial syntax and a serious restriction on forming compounds. If you come from a language that has a similar nature, Arabic may be easier for you than it is for so many others. Its 3-vowel system makes for easy vowels.

MSA Arabic is rated 5, extremely difficult.

Arabic dialects are often somewhat easier to learn than MSA Arabic. At least in Lebanese and Egyptian Arabic, the very difficult q’ sound has been turned into a hamza or glottal stop which is an easier sound to make. Compared to MSA Arabic, the dialectal words tend to be shorter and easier to pronounce.

To attain anywhere near native speaker competency in Egyptian Arabic, you probably need to live in Egypt for 10 years, but Arabic speakers say that few if any second language learners ever come close to native competency. There is a huge vocabulary, and most words have a wealth of possible meanings.

Egyptian Arabic is rated 4.5, very to extremely difficult.

Moroccan Arabic is said to be particularly difficult, with much vowel elision in triconsonantal stems. In addition, all dialectal Arabic is plagued by irrational writing systems.

Moroccan Arabic is rated 4.5, very to extremely difficult.

Maltese is a strange language, basically a Maghrebi Arabic language (similar to Moroccan or Tunisian Arabic) that has very heavy influence from non-Arabic tongues. It shares the problem of Gaelic that often words look one way and are pronounced another.

It has the common Semitic problem of difficult plurals. Although many plurals use common plural endings (-i, -iet, -ijiet, -at), others simply form the plural by having their last vowel dropped or adding an s (English borrowing). There’s no pattern, and you simply have to memorize which ones act which way. Maltese permits the consonant cluster spt, which is surely hard to pronounce.

On the other hand, Maltese has quite a few IE loans from Italian, Sicilian, Spanish, French and increasingly English. If you have knowledge of Romance languages, Maltese is going to be easier than most Arabic dialects.

Maltese is rated 4, very difficult.

South Canaanite

Hebrew is hard to learn according to a number of Israelis. Part of the problem may be the abjad writing system, which often leaves out vowels which must simply be remembered. Also, other than borrowings, the vocabulary is Afroasiatic, hence mostly unknown to speakers of IE languages. There are also difficult consonants as in Arabic such as pharyngeals and uvulars.

The het or glottal h is particularly hard to make. However, most modern Israelis no longer make the het or a’ain sounds. Instead, they pronounce the het like the chaf sound and the a’ain like an alef. Almost all Ashkenazi Israeli Jews no longer use the het or a’ain sounds. But most Jews who came from Arab countries (often older people) still use the sounds, and some of their children do (Dorani 2013).

Hebrew has complex morphophonological rules. The letters p, b, t, d, k and g change to f, v, th, dh, kh and gh in certain situations. In some environments, pharyngeals change the nature of the vowels around them. The prefix ve-, which means and, is pronounced differently when it precedes certain letters. Hebrew is also quite irregular.

Hebrew has quite a few voices, including active, passive, intensive, intensive passive, etc. It also has a number of tenses such as present, past, and the odd jussive.

Hebrew also has two different noun classes. There are also many suffixes and quite a few prefixes that can be attached to verbs and nouns.

Even most native Hebrew speakers do not speak Hebrew correctly by a long shot.

Quite a few say Hebrew is as hard to learn as MSA or perhaps even harder, but this is controversial.

Hebrew gets a 5 rating for extremely difficult.

Berber Northern Atlas

Berber languages are considered to be very hard to learn. Worse, there are very few language learning resources available.

Tamazight allows doubled consonants at the beginning of a word! How can you possibly make that sound?

Tamazight gets a 6 rating, hardest of all.

In Tachelhit, words like this are possible:

tkkststt You took it off.

tfktstt You gave it.

In addition, there are words which contain only one or two consonants:

ɡ be

ks feed on

Tachelhit gets a 6 rating, hardest of all.

South Ethiopian South Transversal Amharic–Argobba Amharic

Amharic is said to be a very hard language to learn. It is quite complex, and its sentence structures seem strange even to speakers of other Semitic languages. Hebrew speakers say they have a hard time with this language.

There are a multitude of rules which almost seem ridiculous in their complexity, there are numerous conjugation patterns, objects are suffixed to the verb, the alphabet has 274 letters, and the pronunciation seems strange. However, if you already know Hebrew or Arabic, it will be a lot easier. The hardest part of all is the verbal system, as with any Semitic language. It is easier than Arabic.

Amharic gets a 4.5 rating, very hard to extremely hard.

Cushitic East Cushitic

Dahalo is legendary for having some of the wildest consonant phonology on Earth. It has all four airstream mechanisms found in languages: ejectives, implosives, clicks and normal pulmonic sounds. There are both glottal and epiglottal stops and fricatives and laminal and apical stops.

There is also a strange series of nasal clicks that are both glottalized and plain. Some of these clicks are also labialized. It has both voiced and unvoiced prenasalized stops and affricates, and some of the stops are also labialized. There is a weird palatal lateral ejective. There are three different lateral fricatives, including a labialized and palatalized one, and one lateral approximant. It contrasts alveolar and palatal lateral affricates and fricatives, the only language on Earth to do this.

The Dahalo are former elephant-hunting hunter-gatherers who live in southern Kenya. It is believed that at one time they spoke a language like Sandawe or Hadza, but they switched over to Cushitic at some point. The clicks are thought to be a substratum from a time when Dahalo was a Sandawe-Hadza type language.

Dahalo gets a 6 rating, hardest of all.

Somali

Somali has one of the strangest preposition systems on Earth. It actually has no real prepositions at all. Instead it has preverbal particles and possessives that serve as prepositions.

Here is how possessives serve as prepositions:

habeennimada horteeda   the night her front   before nightfall
kulaylka dartiisa       the heat his reason   because of the heat

Here we have the use of a preverbal particle serving as a preposition:

kú ríd shandádda Into put the suitcase. Put it into the suitcase.

Somali combines four “prepositions” with four deictic particles to form its prepositions.

There are four basic “prepositions”:

to, in, from, with

These combine with four different deictic particles:

toward the speaker, away from the speaker, toward each other, away from each other

Hence you put the “prepositions” and the deictic particles together in various ways. Both tend to go in front of and close to the verb:

Nínkíi bàan cèelka xádhig kagá sóo saaray. …well-the rope with-from towards-me I-raised. I pulled the man out of the well with a rope.

Way inoogá warrámi jireen. They us-to-about news gave. They used to give us news about it.

Prepositions are the hardest part of the Somali language for the learner.

Somali deals with verbs of motion via deixis in a way similar to Georgian. One reference point is the speaker and the other is any other entities discussed. Verbs of motion are formed using adverbs. Entities may move:

towards each other    wada
away from each other  kala
towards the speaker   so
away from the speaker si

Hence:

kala durka separate
si gal     go in (away from the speaker)
so gal     come in (toward the speaker)
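The way a deictic adverb combines with a verb of motion can be sketched as a small lookup. The Python snippet below is only an illustration built from the four adverbs and the verb gal listed above. The function name is made up for the example, and real Somali usage involves more than simple juxtaposition.

```python
# Deictic adverbs exactly as listed in the table above.
DEICTIC_ADVERBS = {
    "towards each other": "wada",
    "away from each other": "kala",
    "towards the speaker": "so",
    "away from the speaker": "si",
}


def motion_phrase(direction: str, verb: str) -> str:
    """Place the deictic adverb for the given direction in front of the verb."""
    return f"{DEICTIC_ADVERBS[direction]} {verb}"


print(motion_phrase("towards the speaker", "gal"))    # so gal
print(motion_phrase("away from the speaker", "gal"))  # si gal
```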

Somali lacks orthographic consistency. There are four different orthographic systems in use.

Somali pluralization makes no sense and must be memorized. There are seven different plurals, and there is no clue in the singular that tells you what form to use in the plural. See here:

Reduplication:

áf  (language) -> afaf

Suffixation:

hoóyo (mother) -> hoyoóyin

áabbe (father) -> aabayaal

Note the tone shifts in all three of the plurals above.

There are four cases: absolutive, nominative, genitive and vocative. Despite the presence of absolutive and nominative cases, Somali is not an ergative language. Absolutive case is the basic case of the noun, and nominative is the case given to the noun when a verb follows in the sentence. There are different articles depending on whether the noun was mentioned previously or not (similar to the articles a and the in English). The absolutive and nominative are marked not only on the noun but also on the article that precedes it.

In terms of difficulty, Somali is much harder than Persian and probably about as difficult as Arabic.

Somali gets a 5 rating, extremely hard to learn.

Dravidian Southern Tamil-Kannada Tamil-Kodagu Tamil-Malayalam Malayalam

Malayalam, a Dravidian language of India, has been cited as the hardest language to learn by a language foundation, but the citation is obscure and hard to verify.

Malayalam words are often hard even to look up in a Malayalam dictionary.

For instance, adiyAnkaLAkkikkoNDirikkukayumANello is a word in Malayalam. It means something like I, your servant, am sitting and mixing s.t. (which is why I cannot do what you are asking of me). The part in parentheses is an example of the type of sentence where it might be used.

The above word is composed of many different morphemes, including conjunctions and other affixes, with sandhi going on with some of them so they are eroded away from their basic forms. There doesn’t seem to be any way to look that word up or to write a Malayalam dictionary that lists all the possible forms, including forms like the word above. It would probably be way too huge of a book. However, all agglutinative languages are made up of affixes, and if you know the affixes, it is not particularly hard to parse the word apart.

Malayalam is said to be very hard to pronounce correctly.

Further, few foreigners even try to learn Malayalam, so Malayalam speakers, like the French, might not listen to you and might make fun of you if your Malayalam is not native sounding.

However, Malayalam has the advantage of having many pedagogic materials available for language learning such as audio-visual material and subtitled videos.

Malayalam is rated 5, extremely difficult.

Tamil

Tamil, a Dravidian language, is hard, but probably not as difficult as Malayalam is. Tamil has an incredible 247 characters in its alphabet. Nevertheless, most of those are consonant-vowel combinations, so it is almost more of a syllabary than an alphabet. Going by what would traditionally be considered alphabetic symbols, there are probably only 72 real symbols in the alphabet. Nevertheless, Tamil probably has one of the easier Indic scripts as Tamil has fewer characters than other scripts due to its lack of aspiration. Compare to Devanagari’s over 1,000 characters.
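To see why most of the 247 characters are consonant-vowel combinations, here is the arithmetic as a short Python sketch. It assumes the traditional breakdown of the Tamil script (12 vowels, 18 consonants and one special character, the aytham), which is not spelled out in the text above.

```python
# Assumed traditional breakdown of the Tamil script (not given in the text).
VOWELS = 12
CONSONANTS = 18
SPECIAL = 1  # the aytham

# Every consonant combines with every vowel to give one compound character.
compound = VOWELS * CONSONANTS
total = VOWELS + CONSONANTS + compound + SPECIAL

print(compound, total)  # 216 247
```

On this count, 216 of the 247 characters are predictable combinations rather than independent letters.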

But no Indic script is easy. A problem with Tamil is that all of the characters seem to look alike. It is even worse than Devanagari in that regard. However, the more rounded scripts such as Kannada, Sinhala, Telugu and Malayalam have that problem to a worse degree. Tamil has a few sharp corners in the characters that help to disambiguate them.

In addition, as with other languages, words are written one way and pronounced another. However, there are claims that the difficulty of Tamil’s diglossia is overrated.

Tamil has two different registers for written and spoken speech, but the differences are not large, so this problem is exaggerated. Both Tamil and Malayalam are spoken very fast and have extremely complicated, nearly impenetrable scripts. If Westerners try to speak a Dravidian language in south India, more often than not the Dravidian speaker will simply address them in English rather than try to accommodate them.

Tamil has the odd evidential mood, similar to Bulgarian.

However, on the plus side, the language does seem to be very logical and regular, almost like German in that regard. In addition, there are a lot of language learning materials for Tamil.

Tamil is rated 4, very difficult.

Altaic Korean

Most agree that Korean is a hard language to learn.

The alphabet, Hangul, at least is reasonable; in fact, it is quite elegant. But there are four different Romanizations – Lukoff, Yale, Horne, and McCune-Reischauer – which is preposterous. It’s best to just blow off the Romanizations and dive straight into Hangul. This way you can learn a Romanization later, and you won’t mess up your Hangul with spelling errors, as can occur if you go from Romanization to Hangul.

Hangul can be learned very quickly, but learning to read Korean books and newspapers fast is another matter altogether because you really need to know the hanja or Chinese characters that are used in addition to the Hangul. After World War 2, the Koreas decided to officially get rid of their Chinese characters, but in practice this was not successful. With the use of Chinese characters in Korean, you can be a lot more precise in terms of what you are trying to communicate.

Bizarrely, there are two different numeral sets used, but one is derived from Chinese so it should be familiar to Chinese, Japanese or Thai speakers who use similar or identical systems.

Korean has a wealth of homonyms, and this is one of the tricky aspects of the language. Any given combination of a couple of characters can have multiple meanings. Japanese has a similar problem with homonyms, but at least with Japanese you have the benefit of kanji to help you tell the homonyms apart. With Korean Hangul, you get no such advantage.

Similarly, there seem to be many ways to say the same thing in Korean. The learner will feel when people are using all of these different ways of saying the same thing that they are actually saying something different each time, but that is not the case.

One problem is that the bp, j, ch, t and d are pronounced differently than their English counterparts. The consonants, the pachim system and the morphing consonants at the end of the word that slide into the next word make Korean harder to pronounce than any major European language. Korean shares a problem with Japanese: if you mess up one vowel in a sentence, you render it incomprehensible.

The vocabulary is very difficult for an English speaker who does not have knowledge of either Japanese or Chinese. On the other hand, Japanese or Chinese will help you a lot with Korean.

Korean is agglutinative and has a subject-topic discourse structure, and the logic of these systems is difficult for English speakers to understand. In addition, there are hundreds of ways of conjugating any given verb based on tense, mood, age or seniority. Adjectives also decline and take hundreds of different suffixes.

Meanwhile, Korean has an honorific system that is even wackier than that of Japanese. A single sentence can be said in three different ways depending on the relationship between the speaker and the listener. However, the younger generation is not using the honorifics so much, and a foreigner isn’t expected to know the honorific system anyway.

Maybe 6

Speakers of Korean can learn Japanese fairly easily. Korean seems to be a more difficult language to learn than Japanese. There are maybe twice as many particles as in Japanese, the grammar is dramatically more difficult and the verbs are quite a bit harder. The phonemic inventory in Korean is also larger and includes such oddities as double consonants.

Korean is rated by language professors as being one of the hardest languages to learn.

Korean is rated 5, extremely hard.

Japonic

Japanese also uses a symbolic alphabet, but the symbols themselves are sometimes undecipherable in that even Japanese speakers will sometimes encounter written Japanese and will say that they don’t know how to pronounce it. I don’t mean that they mispronounce it; that would make sense. I mean they don’t have the slightest clue how to say the word! This problem is essentially nonexistent in a language like English.

The Japanese orthography is one of the most difficult to use of any orthography.

There are over 2,000 frequently used characters in three different symbolic alphabets that are frequently mixed together in confusing ways. Due to the large number of frequently used symbols, it’s said that even Japanese adults learn a new symbol a day well into adulthood.

The Japanese writing system is probably crazier than the Chinese writing system and it often makes it onto lists of worst orthographies. The very idea of writing an agglutinative language in a combination of two syllabaries and an ideography seems wacky right off the bat. Japanese borrowed Chinese characters.

But then they gave each character several pronunciations, and in some cases as many as 24. Next they made two syllabaries using another set of characters, then over the next millennia came up with all sorts of contradictory and often senseless rules about when to use the syllabaries and when to use the character set. Later on they added a Romanization to make things even worse.

Chinese uses 5-6,000 characters regularly, while Japanese only uses around 2,000. But in Chinese, each character has only one or maybe two pronunciations. In Japanese, there are complicated rules about when and how to combine the hiragana with the characters. These rules are so hard that many native speakers still have problems with them. There are also personal and place names (proper nouns) which are given completely arbitrary pronunciations often totally at odds with the usual pronunciation of the character.

There are some writers, typically of literature, who deliberately choose to use kanji that even Japanese people cannot read. For instance, Ryuu Murakami uses the odd symbols 擽る, 轢く and 憑ける.

The Japanese system is made up of three different systems: the katakana and hiragana (the kana) and the kanji, similar to the hanzi used in Chinese. Chinese has at least 85,000 hanzi. The number of kanji is much less than that, but kanji often have more than one meaning in contrast to hanzi.

After WW2, Japan decided to simplify its language. They both simplified and reduced the number of Chinese characters used, and they unified the written and spoken language, which previously had been different.

Speaking Japanese is not as difficult as everyone says, and many say it’s fairly easy. However, there is a problem similar to English in that one word can be pronounced in multiple ways, like read and read in English.

A common problem is that a perfectly grammatical sentence uttered by a Japanese language learner is still not acceptable to Japanese speakers because “we just don’t say it that way.” The Japanese speaker often cannot tell you why the unacceptable sentence you uttered is not ok. On the other hand, this problem may be common to more languages than Japanese.

There is also a class of Japanese called “honorifics” or “keigo” that is quite hard to master. Honorifics are meant to show respect and to indicate one’s place or status in the social hierarchy. These typically affect verbs but can also affect particles and prefixes. They are usually formed by archaic or highly irregular verbs. However, there are both regular and irregular honorific forms. Furthermore, there are five different levels of honorifics. Honorifics vary depending on who you are and who you are talking to. In addition, gender comes into play.

Although it is true that Japanese young people are said to not understand the intricacies of keigo, it is still expected that they know how to speak this well. Consequently, many young Japanese will opt out of certain conversations because they feel that their keigo is not very good. Books explaining how to use keigo properly have been big sellers among young people in Japan in recent years as young people try to appear classy, refined or cultured.

In addition, Japanese born overseas (especially in the US), while often learning Japanese pretty well, typically have a very poor understanding of keigo. Instead of embarrassing themselves by not using keigo or using it wrong, these Japanese speakers often prefer to speak in English to Japanese people rather than bother with keigo-less Japanese. Hypercorrection in keigo is also a problem, where someone makes errors in keigo due to “trying too hard.” This looks like phony or insincere politeness and is often worse than not using keigo at all.

One wild thing about Japanese is counting forms. You actually use different numeral sets depending on what it is you are counting! There are dozens of different ways of counting things which involve the use of a complex numerical noun classifier system.

Japanese grammar is often said to be simple, but that does not appear to be the case on closer examination. Particles are especially vexing. Verbs engage in all sorts of wild behavior, and adverbs often act like verbs. Nouns can act like adjectives and adverbs. Meanwhile, honorifics change the behavior of all words. There are particles like ha and ga that have many different meanings. One problem is that all noun modifiers, even phrases, must precede the nouns they are modifying.

It’s often said that Japanese has no case, but this is not true. Actually, there are seven cases in Japanese. The aforementioned ga is a clitic meaning nominative, made is terminative case, -no is genitive and -o is accusative.

In this sentence:

The plane that was supposed to arrive at midnight, but which had been delayed by bad weather, finally arrived at 1 AM.

The entire relative clause (that was supposed to arrive at midnight, but which had been delayed by bad weather) must precede the noun plane:

Was supposed to arrive at midnight, but had been delayed by bad weather, the plane finally arrived at 1 AM.

One of the main problems with Japanese grammar is that it is going to seem so different from the sort of grammar an English speaker is likely to be used to.

Speaking Japanese is one thing, but reading and writing it is a whole new ballgame. It’s perfectly possible to know the meaning of every kanji and the meaning of every word in a sentence, but you still can’t figure out the meaning of the sentence because you can’t figure out how the sentence is stuck together in such a way as to create meaning.

The real problem is that the Japanese you learn in class is one thing, and the Japanese of the street is another. One problem is that in street Japanese, the subject is typically not stated in a sentence. Instead it is inferred through such things as honorific terms or the choice of words you used in the sentence. Probably no one goes crazier on negatives than the Japanese. Particularly in academic writing, triple and quadruple negatives are common, and can be quite confusing.

Yet there are problems with the agglutinative nature of Japanese. It’s a completely different syntactic structure than English. Often if you translate a sentence from Japanese to English it will just look like a meaningless jumble of words.

However, Japanese grammar has the advantage of being quite regular. For instance, there are only four frequently used irregular verbs.

Like Chinese, the nouns are not marked for number or gender. However, while Chinese is forgiving of errors, if you mess up one vowel in a Japanese sentence, you may end up with incomprehension.

Although many Japanese learners feel it’s fairly easy to learn, surveys of language professors continue to rate Japanese as one of the hardest languages to learn. A study by the US Navy concluded that the hardest language the corpsmen had to learn in the course of service was Japanese. However, it’s generally agreed that Japanese is easier to learn than Korean. Japanese speakers are able to learn Korean pretty easily.

Japanese is rated 5, extremely hard.

Classical Japanese is much harder to read than Modern Japanese. Though you can get by with far fewer kanji when reading the modern language, you will need a minimum knowledge of 3,000 kanji for reading Classical Japanese, and that’s using a dictionary. There are only about 500-1,000 frequently used characters, but there are countless other words that will come up in your reading, especially, say, special words used in the Imperial Court. Many words have more than one meaning, and unless you know this, you will be lost. 東宮(とうぐう) for instance means Eastern Palace. However, it also means Crown Prince because his residence was to the east of the Emperor’s.

The movie The Seven Samurai (set in the late 1500’s) seems to use some sort of Classical Japanese, or at least Classical vocabulary and syntax with modern pronunciation. Japanese language learners say they can’t understand a word of the archaic Japanese used in this movie.

Classical Japanese gets 5.5, nearly hardest of all.

Turkic Oghuz Western Oghuz

Turkish is often considered to be hard to learn, and it’s rated one of the hardest in surveys of language teachers. However, it’s probably easier than its reputation makes it out to be. It is agglutinative, so you can have one long word where in English you might have a sentence of shorter words. One word is

Çekoslovakyalılaştıramadıklarımızdanmışsınız? Were you one of those people whom we could not turn into a Czechoslovakian?

Many words have more than one meaning. However, the agglutination is very regular in that each particle of meaning has its own morpheme and falls into an exact place in the word. See here:

göz            eye
göz-lük        glasses
göz-lük-çü     optician
göz-lük-çü-lük the business of an optician
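The regular stacking of suffixes in the table above can be sketched in a few lines of Python. This is only an illustration using the four forms given: the suffixes are hard-coded in the shape they take after göz, so the sketch does not compute vowel harmony or any other sound change.

```python
from itertools import accumulate

stem = "göz"                     # eye
suffixes = ["lük", "çü", "lük"]  # suffix shapes as they appear after this stem

# Each step appends one more suffix to everything built so far.
forms = list(accumulate([stem] + suffixes))

for form in forms:
    print(form)
# göz
# gözlük
# gözlükçü
# gözlükçülük
```

Each morpheme occupies a fixed slot, which is why a learner who knows the suffixes can usually take a long word apart again.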

Nevertheless, agglutination means that you can always create new words or add new parts to words, and for this reason even a lot of Turkish adults have problems with their language.

There is no verb to be, which is hard for many foreigners. Instead, the concept is wrapped onto the subject of the sentence as a -dim or -im suffix. Turkish is an imagery-heavy language, and if you try to translate straight from a dictionary, it often won’t make sense.

However, the suffixation in Turkish, along with the vowel harmony, are both precise. Nevertheless, many words have irregular vowel harmony. The rules for making plurals are very regular, with no exceptions (the only exceptions are in foreign loans). In Turkish, incredible as it sounds, you can make a plural out of anything, even a word like what, who or blood. However, there is some irregularity in the strengthening of adjectives, and the forms are not predictable and must be memorized.

Turkish is a language of precision in other ways. For instance, there are eight different forms of subjunctive mood that describe various degrees of uncertainty that one has about what one is talking about. This relates to the evidentiality discussed under Tuyuca above, and Turkish has an evidential form similar to Tamil and Bulgarian. On Turkish news, verbs are generally marked with miş, which means that the announcer believes it to be true though he has not seen it firsthand. The particle miş is interesting because this evidential form is coded into the tense system, which is an unusual use of evidentiality.

The Roman alphabet and almost mathematically precise grammar really help out. Turkish lacks gender and has but a single irregular verb – olmak. Nevertheless, there are many verbal forms. However, this is controversial and it depends on how you define grammatical irregularity. There is some strangeness in some of the verb paradigms, but it is argued that these oddities are rule-based. The aorist tense is said to have irregularity.

There is some irregular morphophonology, but not much. The oblique relative clauses have complex morphosyntax. Turkish has two completely different ways of making relative clauses, one of which may have been borrowed from Persian. There are many gerunds for verbs, and these have many different uses. At the end of the day, Turkish grammar is not as regular or as simple as it is made out to be.

Words are pronounced nearly the same as they are written. A suggestion that Turkish may be easier to learn than many think is the research that shows that Turkish children attain basic grammatical mastery of Turkish at age 2-3, as compared to 4-5 for German and 12 for Arabic. The research was conducted in Germany in 2005.

In addition, Turkish has a phonetic orthography.

However, Turkish is hard for an English speaker to learn for a variety of reasons. It is agglutinative like Japanese, and all agglutinative languages are difficult for English speakers to learn. As in Japanese, you start your Turkish sentence the way you would end your English sentence. As in the Japanese example above, the subordinate clause must precede the subject, whereas in English, the subordinate clause must follow the subject. In the example below, the phrase “that he will be on time” is a subordinate clause.

In English, we say, “I hope that he will be on time.”

In Turkish, the sentence would read, “That he will be on time I hope.”

Turkish vowels are unusual to speakers of IE languages, and Turkish learners say the vowels are hard to make or even tell apart from one another.

Turkish is rated 3.5, harder than average to learn.

Uralic

Finno-Ugric

One test of the difficulty of any language is how much of the grammar you must know in order to express yourself on a basic level. On this basis, Finno-Ugric languages are complicated because you need to know quite a bit more grammar to communicate on a basic level in them than in say, German.

Finnic Northern

Finnish is very hard to learn, and even long-time learners often still have problems with it. Famous polyglot Barry Farber said it was one of the hardest languages he learned. You have to know exactly which grammatical forms to use where in a sentence. In addition, Finnish has 15 cases in the singular and 16 in the plural. This is hard to learn for speakers coming from a language with little or no case.

For instance, talo (the house)

Cases:

talon        house's
taloa        some of the house
taloksi      into / as the house
talossa      in the house
talosta      from inside the house
taloon       into the house
talolla      on to the house
talolta      from beside the house
talolle      to the house
taloista     from the houses
taloissa     in the houses
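The forms in the list above are all the stem talo plus a fixed ending, which the following Python sketch makes explicit. It is only an illustration: the case names are the standard grammatical labels (they are not given in the list), the endings are written exactly as they attach to this one stem, and the sketch ignores consonant gradation and vowel harmony, which change both stem and ending for many other nouns.

```python
STEM = "talo"  # house

# Endings as they appear on this particular stem; case labels are added here.
ENDINGS = {
    "genitive": "n",
    "partitive": "a",
    "translative": "ksi",
    "inessive": "ssa",
    "elative": "sta",
    "illative": "on",
    "adessive": "lla",
    "ablative": "lta",
    "allative": "lle",
    "elative plural": "ista",
    "inessive plural": "issa",
}

forms = {case: STEM + ending for case, ending in ENDINGS.items()}

for case, form in forms.items():
    print(f"{case:16} {form}")
```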

It gets much worse than that. This web page shows that the noun kauppa (shop) can have 2,253 forms.

A simple adjective + noun type of noun phrase of two words can be declined in up to 100 different ways.

Adjectives and nouns belong to 20 different classes. The rules governing their case declension depend on what class the substantive is in.

As with Hungarian, words can be very long. For instance:

lentokonesuihkuturbiinimoottoriapumekaanikkoaliupseerioppilas non-commissioned officer cadet learning to be an assistant mechanic for airplane jet engines

Like Turkish, Finnish agglutination is very regular. Each bit of information has its own morpheme and has an exact place in the word.

Like Turkish, Finnish has vowel harmony, and the vowel harmony is very regular like that of Turkish. Unlike Turkish or Hungarian, consonant gradation forms a major part of Finnish morphology. In order to form a sentence in Finnish, you will need to learn about verb types, cases and consonant gradation, and it can take a while to get your mind around those things.

Finnish, oddly enough, always puts the stress on the first syllable. Finnish vowels will be hard to pronounce for most foreigners.

However, Finnish has the advantage of being pronounced precisely as it is written. This is also part of the problem though, because if you don’t say it just right, the meaning changes. So, similarly with Polish, when you mangle their language, you will only achieve incomprehension. Whereas with say English, if a foreigner mangles the language, you can often winnow some sense out of it.

However, despite the fact that written Finnish can be easily pronounced, when learning Finnish, as in Korean, it is as if you must learn two different languages – the written language and the spoken language. A better way to put it is that there is “one language for writing and another for speaking.” You use different forms depending on whether you are conversing or putting something on paper.

Some pronunciation is difficult. The contrast between short and long vowels and consonants is particularly troublesome. Check out these minimal pairs:

sydämellä sydämmellä

jollekin jollekkin

A problem for the English speaker coming to Finnish would be the vocabulary, which is alien to the speaker of an IE language. Finnish language learners often find themselves looking up over half the words they encounter. Obviously, this slows down reading quite a bit!

In the grammar, the partitive case and potential tense can be difficult. Here is an example of how Finnish verb tenses combine with various cases to form words:

I A-Infinitive
Base form mennä

II E-Infinitive
Active inessive    mennessä
Active instructive mennen
Passive inessive   mentäessä

III MA-Infinitive
Inessive            menemässä
Elative             menemästä
Illative            menemään
Adessive            menemällä
Abessive            menemättä
Active instructive  menemän
Passive instructive mentämän

Verbs in Finnish

Finnish verbs are very regular. The irregular verbs can almost be counted on one hand:

juosta käydä olla nähdä tehdä

and a few others. In fact, on the plus side, Finnish in general is very regular.

One easy aspect of Finnish is the way you can build many forms from a base root:

kirj-

kirja      book
kirje      letter
kirjoittaa to write
kirjailija writer

As in many Asian languages, there are no masculine or feminine pronouns, and there is no grammatical gender. The numeral system is quite simple compared to other languages. Finnish has a complete lack of consonant clusters. In addition, the phonology is fairly simple.

Finnish is rated 5, extremely hard to learn.

Southern

Estonian has difficulties similar to those of Finnish, since they are closely related. However, Estonian is more irregular than Finnish. In particular, the very regular agglutination system described in Finnish seems to have gone awry in Estonian. Estonian has 14 cases, including strange cases such as the abessive, adessive, elative and inessive. On the other hand, all of these cases can simply be analyzed as the genitive case plus a single unvarying suffix for each case. In addition, there is no gender, so the only things you have to worry about when forming cases are singular and plural.
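The "genitive plus an unvarying suffix" analysis can be sketched in Python for the four cases named above. This is a minimal illustration, not a full paradigm: it uses the genitive linna (the town’s), and the suffixes are the standard Estonian endings for these cases, supplied here for the example rather than taken from the text.

```python
GENITIVE = "linna"  # "the town's"

# Standard endings for the four cases named in the paragraph above.
CASE_SUFFIXES = {
    "inessive": "s",    # in the town
    "elative": "st",    # out of the town
    "adessive": "l",    # at/on the town
    "abessive": "ta",   # without the town
}

forms = {case: GENITIVE + suffix for case, suffix in CASE_SUFFIXES.items()}

for case, form in forms.items():
    print(f"{case:10} {form}")
```

The same suffix attaches to the genitive of any noun, so the learner’s real work is knowing the genitive stem.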

Estonian has a strange mood form called the quotative, often translated as “reported speech.”

tema on      he/she/it is

tema olevat  it’s rumored that he/she/it is or he/she/it is said to be

This mood is often used in newspaper reporting and is also used for gossip.

Estonian has an astounding 25 diphthongs. It also has three different varieties of vowel length, which is strange in the world’s languages. There are short, long and extra-long vowels and consonants.

lina    linen          short n
linna   the town’s     long n, written as nn
`linna  into the town  extra-long n, not written out!

There are differences in the pronunciation of the three forms above, but in rapid speech, they are hard to hear, though native speakers can make them out. Difficulties are further compounded in that extra-long sonorants (m, n, ng, l, and r) and vowels are not written out. All in all, phonemic length can be a problem in Estonian, and foreigners never seem to get it completely down.

Estonian pronunciation is not very difficult, though the õ sound can cause problems. However, Estonian has completely lost the vowel harmony system it inherited from Finnish, resulting in words that seem very hard to pronounce.

At least in written form, Estonian is not as complex as Finnish. Estonian can be seen as an abbreviated and modernized form of Finnish. The grammar is also like a simplified version of Finnish grammar and may be much easier to learn.

Estonian is rated 4.5, very to extremely difficult.

Sami Eastern

Skolt Sami’s Latinization is often listed as one of the worst Latinizations around. The rest of the language is quite similar to, and as difficult as, Finnish.

Skolt Sami gets a 5 rating, extremely hard to learn.

Ugric Hungarian

It’s widely agreed that Hungarian is one of the hardest languages on Earth to learn. Even language professors agree. The British Diplomatic Corps did a study of the languages that its diplomats commonly had to learn and concluded that Hungarian was the hardest. Hungarian grammar is maddeningly complex, and Hungarian is often listed on craziest grammar lists. For one thing, there are many different forms for a single word via word modification. This enables the speaker to make his intended meaning very precise. Looking at nouns, there are about 257 different forms per noun.

Hungarian is said to have from 24 to 35 different cases (there are charts available showing 31 cases), but the actual number may only be 18. Nearly everything in Hungarian is inflected, similar to Lithuanian or Czech. Similar to Georgian and Basque, Hungarian has polypersonal agreement, albeit to a lesser degree than those two languages. There are many irregularities in inflections, and even Hungarians have to learn how to spell all of these in school and have a hard time learning this.

The case distinctions alone can create many different words out of one base form. For the word house, we end up with 31 different words using case forms:

házba – into the house
házban – in the house
házból – from [within] the house
házra – onto the house
házon – on the house
házról – off [from] the house
házhoz – to the house
házig – until/up to the house
háznál – at the house
háztól – [away] from the house
házzá – Translative case, where the house is the end product of a transformation, such as They turned the cave into a house.
házként – as the house, which could be used if you acted in your capacity as a house or disguised yourself as one. He dressed up as a house for Halloween.
házért – for the house, specifically things done on its behalf or done to get the house. They spent a lot of time fixing things up (for the house).
házul – Essive-modal case. Something like “house-ly” or in the way/manner of a house. The tent served as a house (in a house-ly fashion).

And we do have some basic cases:

ház – Nominative. The house is down the street. házat – Accusative. The ball hit the house. háznak – Dative. The man gave the house to Mary. házzal – Similar to instrumental, but more similar to English with. Refers to both instruments and companions.

The genitive takes 12 different declensions, depending on person and number:

házam – my house
házaim – my houses
házad – your house
házaid – your houses
háza – his/her/its house
házai – his/her/its houses
házunk – our house
házaink – our houses
házatok – your (plural) house
házaitok – your (plural) houses
házuk – their house
házaik – their houses
egyház – church, as in the Catholic Church. (Literally one-house)

In addition, the genitive suffixes to the possession, which is not how the genitive works in IE.

ember – man/person
ház – house
a(z) – the

az ember háza – the man’s house (Lit. the man house-his)
a házam – my house (Lit. the house-my)
a házad – your house (Lit. the house-your)

There are also very long words such as this:

megszentségteleníthetetlenségeskedéseitekért… for your (you all possessive) repeated pretensions at being impossible to desecrate

Hungarian is an agglutinative language, so that word is made up of many small parts of words, or morphemes.

The preposition is stuck onto the word in this language, and this will seem strange to speakers of languages with free prepositions.

Hungarian is full of synonyms, similar to English.

For instance, there are 78 different words that mean to move: halad, jár, megy, dülöngél, lépdel, botorkál, kódorog, sétál, andalog, rohan, csörtet, üget, lohol, fut, átvág, vágtat, tipeg, libeg, biceg, poroszkál, vágtázik, somfordál, bóklászik, szedi a lábát, kitér, elszökken, betér, botladozik, őgyeleg, slattyog, bandukol, lófrál, szalad, vánszorog, kószál, kullog, baktat, koslat, kaptat, császkál, totyog, suhan, robog, rohan, kocog, cselleng, csatangol, beslisszol, elinal, elillan, bitangol, lopakodik, sompolyog, lapul, elkotródik, settenkedik, sündörög, eltérül, elódalog, kóborol, lézeng, ődöng, csavarog, lődörög, elvándorol, tekereg, kóvályog, ténfereg, özönlik, tódul, vonul, hömpölyög, ömlik, surran, oson, lépeget, mozog and mozgolódik.

Only about five of those terms are archaic and seldom used; the rest are in current use. However, to be fair, a Hungarian native speaker might only recognize half of those words.

In addition, while most languages have names for countries that are pretty easy to figure out, in Hungarian even the names of nations are hard because they have changed the names so much. Italy becomes Olaszország, Germany becomes Németország, etc.

As in Russian and Serbo-Croatian, word order is relatively free in Hungarian. It is not completely free as some say but rather it is governed by a set of rules. The problem is that as you reorder the words in a sentence, you say the same thing but the meaning changes slightly in terms of nuance. Further, there are quite a few dialects in Hungarian. Native speakers can pretty much understand them, but foreigners often have a lot of problems. Accent is very difficult in Hungarian due to the bewildering number of rules used to determine accent. In addition, there are exceptions to all of these rules. Nevertheless, Hungarian is probably more regular than Polish.

Hungarian spelling is also very strange for non-Hungarians, but at least the orthography is phonetic. Nevertheless, the orthography often makes it onto worst orthographies lists.

Hungarian phonetics is also strange. One of the problems with Hungarian phonetics is vowel harmony. Since you stick morphemes together to make a word, the vowels that you have used in the first part of the word will influence the vowels that you will use to make up the morphemes that occur later in the word. The vowel harmony gives Hungarian a “singing effect” when it is spoken. The ty, ny, sz, zs, dzs, dz, ly, cs and gy sounds are hard for many foreigners to make. The á, é, ó, ö, ő, ú, ü, ű, and í vowel sounds are not found in English.
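The way earlier vowels steer later suffixes can be illustrated with a deliberately simplified Python sketch. It picks between the back-vowel suffix -ban and the front-vowel suffix -ben ("in") by checking whether the stem contains any back vowel. The function name `inessive` and the one-line rule are assumptions made for the example; real Hungarian harmony has more suffix variants and a number of exceptions that this ignores.

```python
# Back vowels of Hungarian; every other vowel is treated as front here.
# Assumption: a simplified rule, not a full model of Hungarian vowel harmony.
BACK_VOWELS = set("aáoóuú")


def inessive(noun: str) -> str:
    """Attach -ban after back-vowel stems and -ben after front-vowel stems."""
    has_back_vowel = any(ch in BACK_VOWELS for ch in noun.lower())
    return noun + ("ban" if has_back_vowel else "ben")


if __name__ == "__main__":
    # Both nouns appear earlier in this section: ház (house), ember (man/person).
    for noun in ("ház", "ember"):
        print(noun, "->", inessive(noun))
```

The learner has to run this kind of check on the fly for every suffix in a word, which is where the difficulty comes from.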

Verbs are marked for object (indefinite, definite and person/number), subject (person and number), tense (past, present and future), mood (indicative, conditional and imperative), and aspect (frequency, potentiality, factitiveness, and reflexiveness).

Elmentegettethetnélek. I could make others save you occasionally (on a disk).

Verbs change depending on whether the object is definite or indefinite.

Olvasok könyvet. I read a book. (indefinite object)

Olvasom a könyvet. I read the book. (definite object)

As noted in the introduction to the Finno-Ugric section, you need to know quite a bit of Hungarian grammar to be able to express yourself on a basic level. For instance, in order to say:

I like your sister.

you will need to understand the following Hungarian forms:

  1. verb conjugation and definite or indefinite forms
  2. possessive suffixes
  3. case
  4. how to combine possessive suffixes with case
  5. word order
  6. explicit pronouns
  7. articles

It’s hard to say, but Hungarian is probably harder to learn than even the hardest Slavic languages like Czech, Serbo-Croatian and Polish. At any rate, it is generally agreed that Hungarian grammar is more complicated than Slavic grammar, which is pretty impressive as Slavic grammar is quite a beast.

Hungarian is rated 5, extremely hard to learn.

Sino-Tibetan Sinitic Chinese Mandarin

It’s fairly easy to learn to speak Mandarin at a basic level, though the tones can be tough. This is because the grammar is very simple – short words, no case, gender, verb inflections or tense. But with Japanese, you can keep learning, and with Chinese, you often tend to hit a wall, often because the syntactic structure is so strangely different from English (isolating).

Actually, the grammar is harder than it seems. At first it seems simple, like a simplified English. No word is capable of declension, and there is no tense, case, and number, nor are there articles. But the simplicity makes it difficult. No tense means there is no easy way to mark time in a sentence. Furthermore, tense is not as easy as it seems. Sure, there are no verb conjugations, but instead you must learn some particles and special word orders that are used to mark tense. Mandarin has 12 different adverbs for which there is no good English translation.

Once you start digging into Chinese, there is a complex layer under all the surface simplicity. There are such things as aspect, serial verbs, a complex classifier system, syntax marked by something called topic-prominence, a strange form called the detrimental passive, preposed relative clauses, use of verbs rather than adverbs to mark direction, and all sorts of strange stuff. Verb complements can be baffling, especially potential and directional complements. The 把, 是 and 的 constructions can be very hard to understand.

The topic-prominence is interesting in that only a few major languages have topic-comment syntax, and most of those are Oriental languages with a lot of Chinese borrowing. Topicalization is not marked morphologically.

There are sentences where the entire meaning changes with the addition of a single character. Chinese sentences are SVO (Subject – Verb – Object) at their base, but that is a bit of an illusion. A sentence that causes you to discuss time duration makes you repeat the verb after the direct object – SVOVT (T = time phrase). In the case of topicalization, sentences can have the structure of OSV (Object – Subject – Verb). Relative clauses and all subordinate clauses come before the noun they modify. In other words:

English: The man who always wore red walked into the room. Chinese: Who always wore red the man walked into the room.

The relative clause in the sentences above is who always wore red.

In Chinese, the prepositional phrase comes between the subject and the verb:

English: The man hit the ball into the yard. Chinese: The man into the yard hit the ball.

The prepositional phrase in the sentences above is into the yard.

In Chinese, adjectives are actually stative verbs as in Nahuatl and Lakota.

那个热的菜很好吃。 Nàgè rède cài hěnhǎochī. The it is hot food is good to eat. The hot food is delicious.

The symbol 的 turns food hot into food it is hot, an attributive verb; it means something like to be.

There are dozens of words called particles which shade the meaning of a sentence ever so slightly.

Chinese phonology is not as easy as some say. There are way too many instances of the zh, ch, sh, j, q, and x sounds in the language such that many of the words seem to sound the same. There is a distinction between aspirated and nonaspirated consonants. There is also the presence of odd retroflex consonants.

Chinese orthography is probably the hardest orthography of any language. The alphabet uses symbols, so it’s not even a real alphabet. There are at least 85,000 symbols and actually many more, but you only need to know about 3,000-5,000 of them, and many Chinese don’t even know 1,000. To be highly proficient in Chinese, you need to know 10,000 characters, and probably less than

In addition, the characters have not been changed in 3,000 years, and the alphabet is at least somewhat phonetic, so we run into a serious problem of lack of a spelling reform.

The Communists tried to simplify the system (simplified Mandarin), but instead of making the connections between the phonetic aspects of characters more sensible by decreasing their number and increasing their regularity (they did do this somewhat but not enough), they simply decreased the number of strokes needed for each symbol, typically without dealing with the phonetic aspect at all. The simplification did not work well, so now you have a mixture of two different types of written Chinese – simplified and traditional.

In addition to all of this, Chinese borrowed a lot from the Japanese symbolic alphabet a full 1,000 years after it had already been developed and had not undergone a spelling reform, adding insult to injury.

Even leaving the characters aside, the stylistic and literary constraints required to write Chinese in an eloquent or formal (literary) manner would make your head swim. And just because you can read Chinese does not mean that you can read Classical Chinese prose. It’s as if it’s written in a different language – actually, it is technically a different language similar to Middle English or Old English. However, few Middle English or Old English texts are read anymore, and Classical Chinese is still widely read.

However, the orthography is at least consistent.

Writing the characters is even harder than reading them. One wrong dot or wrong line either completely changes the meaning or turns the symbol into nonsense.

It’s a real problem when you encounter a symbol you don’t know because there is no way to sound out the word. You are really and truly lost and screwed. There is a clue at the right side of the symbol, but it is not always accurate. You need to learn quite a bit of vocabulary just to speak simple sentences.

Similarly, a dictionary is not necessarily helpful when trying to read Chinese. You can have a Chinese sentence in front of you along with a dictionary, and the sentence still might not make sense even after looking it up in the dictionary.

Some Chinese Muslims write Chinese using an Arabic script. This is often considered to be one of the worst orthographies of all.

The tones are often quite difficult for a Westerner to pick up. If you mess up the tones, you have said a completely different word. Often foreigners who know their tones well nevertheless do not say them correctly, and hence, they say one word when they mean another. However, compared to other tone systems around the world, the tonal system in Chinese is comparatively easy.

A major problem with Chinese is homonyms. To some extent, this is true in many tonal languages. Since Chinese uses short words and is disyllabic, there is a limited repertoire of sounds that can be used. At a certain point, all of the sounds are used up, and you are into the realm of homophones.

Tonal distinctions are one way that monosyllabic and disyllabic languages attempt to deal with the homophone problem, but it’s not good enough, since Chinese still has many homophones, and meaning is often discerned by context, stress, rhythm and intonation. Chinese, like French and English, is heavily idiomatic.

It’s little known, but Chinese also uses different forms (classifiers) to count different things, like Japanese.

There is zero common vocabulary between English and Chinese, so you need to learn a whole new set of lexical forms.

In addition, nouns often show relatedness or hierarchy. For instance, in English, you can simply say my brother or my sister, but in Chinese, you cannot do this. You have to indicate whether you are speaking of an older or younger sibling.

mei mei – younger sister
jie jie – older sister
ge ge – older brother
di di – younger brother

Mandarin scored very high on a weirdest languages study.

On the positive side, Chinese grammar is fairly regular, and word derivation and compound words are sensible: the meaning can usually be determined by looking at the word. In other languages, compound words are not necessarily so obvious.

Many agree that Chinese is the hardest to learn of all of the major languages. A recent survey of language professors rated Chinese as the hardest language on Earth to learn.

Mandarin gets a 5.5 rating for nearly hardest of all.

However, Cantonese is even harder to learn than Mandarin. Cantonese has eight tones to Mandarin’s four, and in addition, they continue to use a lot of the older traditional Chinese characters that were superseded when China moved to a simplified script in 1949. Furthermore, since non-Mandarin characters are not standardized, Cantonese cannot be written down as it is spoken.

In addition, Cantonese has verbal aspect, possibly up to 20 different varieties. Modal particles are difficult in Cantonese. Clusters of up to three sentence-final particles are very common. 我食咗飯 and 我食咗飯架啦喎 are both grammatical for I have had a meal, but the particles add the meaning of I have already had a meal, as an answer to a question, or can even imply I have had a meal, so I don’t need to eat anymore.

Cantonese gets a 5.5 rating, nearly hardest of all.

Min Nan is also said to be harder to learn than Mandarin, as it has a more complex tone system, with five tones on three different levels. Even many Taiwanese natives don’t seem to get it right these days, as it is falling out of favor, and many fewer children are being raised speaking it than before.

Min Nan gets a 5.5 rating, nearly hardest of all.

A recent 15 year survey out of Fudan University utilizing both the departments of Linguistics and Anthropology looked at 579 different languages in 91 linguistic families in order to try to find the most complicated language in the world. The result was that a Wu language dialect (or perhaps a separate language) in the Fengxian district of southern Shanghai (Dônđän Wu) was the most phonologically complex language of all, with 20 separate vowels (Wang 2012). The nearest competitor was Norwegian with 16 vowels.

Dônđän Wu gets a 5.5 rating, nearly hardest of all.

Classical Chinese is still read by many Chinese people and Chinese language learners. Unless you have a very good grasp on modern Chinese, classical Chinese will be completely wasted on you. Classical Chinese is much harder to read than reading modern Chinese.

Classical Chinese covers an era extending over 3,000 years, and to attain a reading fluency in this language, you need to be familiar with all of the characters used during this period along with all of the literature of the period so you can understand all the allusions. Even with a knowledge of Classical Chinese, you need to read it in context. If you are good at Classical Chinese and someone throws you a random section of it, it will take you a good amount of time to figure it out unless you know context.

The language is much more to the point than Modern Chinese, but this is not as good as it sounds. This simplicity leaves room for ambiguity, and context plays an important role. A joke about some obscure historical or literary anecdote will be lost on you unless you know what it refers to. For reading modern Chinese, you will need at least 5,000 characters, but even then, you will still need a dictionary. With Classical Chinese, there is no upper limit on the number of characters you need to know. The sky is the limit.

Classical Chinese gets a 6 rating, hardest of all.

Tibeto-Burman Qiangic Northern Qiang

In Qiang, a language of Sichuan Province in China, not only are there rhotic vowels, which are present in only

ʀuɑ +e˞ > ʀuɑ˞kʰ me + w ˞> mw

Rhotic vowels are found in US English – Unstressed ɚ: standard, dinner, Lincolnshire, editor, measure, martyr.

Qiang also has a very bad romanization, so bad that the Qiang will not even use it. Voiced consonants are written by adding a vowel to the symbol for the voiceless consonant. It has long and short vowels, but these are not represented in the system.

Qiang gets a 5 rating, extremely hard to learn.

Western Tibeto-Burman Bodish Central Bodish Central

Tibetan probably has one of the least rational orthographies of any language. The orthography has not changed in ~1,000 years while the language has gone through all sorts of changes. A language learner in Tibet can get by using phonetic spelling. The problem comes when you try to spell using the Classical Alphabet. For instance:

Srong rtsan Sgam po (written) soŋtsɛn ɡampo (spoken)

bsgrubs (written)

d`up (spoken)

While the orthography is etymological and completely outdated, it is quite predictable.

Tibetan gets a 5 rating, extremely hard to learn.

Southern

Dzongkha, the official language of Bhutan, has some pretty wild phonology, in addition to having the Tibetan writing system, this time using Bhutanese forms of the Tibetan script.

It contrasts all of the following: s, , ʰs, ʰsʰ, ts, ʰts, tsʰ, z, ʱz, dz, ʱdz, ⁿsʰ, ᵐtsʰ, ⁿtsʰ, ⁿdz, ᵖts, ᵖtsʰ, ᵖtsʷʰ, and ᶲs, and in addition it has four tones, but there is no single word that is distinguished by tone only. On top of that, there are 22 different vowels.

Dzongkha gets a 5 rating, extremely hard to learn.

Austroasiatic Mon-Khmer Vietic

Vietnamese is also hard to learn because to an outsider, the tones seem hard to tell apart. Therefore, foreigners often make themselves difficult to understand by not getting the tone precisely correct. It also has “creaky-voiced” tones, which are very hard for foreigners to get a grasp on.

Vietnamese grammar is fairly simple, and reading Vietnamese is pretty easy once you figure out the tone marks. Words are short as in Chinese. However, the simple grammar is relative, as you can have 25 or more forms just for I, the 1st person singular pronoun. In addition, the Latin orthography is said to be quite bad. It was invented by missionaries a few centuries ago, and it has never made much sense.

Vietnamese gets a 5 rating, extremely hard to learn.

Mon-Khmer Khmer

Khmer has a reputation for being hard to learn. I understand that it has one of the most complex honorifics systems of any language on Earth. Over a dozen different words mean to carry depending on what one is carrying. There are several different words for slave depending on who owned the slave and what the slave did. There are 28-30 different vowels, including sets of long and short vowels and long and short diphthongs. The vowel system is so complicated that there isn’t even agreement on exactly what it looks like. Khmer learners, especially speakers of IE languages, often have a hard time producing or even distinguishing these vowels.

Speaking it is not so bad, but reading and writing it is pretty difficult. For instance, you can put up to five different symbols together in one complex symbol. The orthographic script is even worse than the Thai one. There are actually rules to this mess, but no one seems to know what they are.

Khmer gets a 4.5 rating, very to extremely hard.

Bahnaric North Bahnaric West Sedang-Todrah Sedang

Sedang, a language of Vietnam, has the highest number of vowel sounds of any language on Earth, at 55 distinct vowel sounds.

Sedang gets a 5 rating, extremely hard to learn.

Hmong-Mien Hmongic Chuanqiandian

Hmong is widely spoken in this part of California, but it’s not easy to learn. There are eight tones, and they are not easy to figure out. It’s not obviously related to any other major language but the obscure Mien.

It has some very strange consonants called voiceless nasals. We have them in English as allophones – the m in small is voiceless, but in Hmong, they put them at the front of words – the m in the word Hmong is voiceless. These can be very hard to pronounce.

The romanization is widely criticized for being a lousy one, but the Hmong use it anyway.

Hmong gets a 5 rating, extremely hard to learn.

Austro-Tai Austronesian Tsouic

Tsou is a Taiwanese aborigine language spoken by about 2,000 people in Taiwan. It has the odd feature whereby the underlying glides y and w turn into or surface as non-syllabic mid vowels e̯ and o̯ in certain contexts:

jo~joskɨ -> e̯oˈe̯oskɨ = fishes

Tsou is also ergative like most Formosan languages. Tsou is the only language in the world that has no prepositions or anything that looks like a preposition. Instead it uses nouns and verbs in the place of prepositions. Tsou allows more potential consonant clusters than most other languages. About 1/2 of all possible CC clusters are allowed.

Tsou has an inclusive/exclusive distinction in the 1st person plural and a very strange visible and non-visible distinction in the 3rd person singular and plural. Both adjectives and adverbs can turn into verbs and are marked for voice in the same way that verbs are. Verbs are extensively marked for voice. Nouns are marked for a variety of odd cases, often referring to perception, (visible/invisible) person, and place deixis.

‘e – visible and near speaker
si/ta – visible and near hearer
ta – visible but away from speaker
‘o/to – invisible and far away, or newly introduced to discourse
na/no ~ ne – non-identifiable and non-referential (often when scanning a class of elements)

Tsou gets a 5 rating, extremely hard to learn.

Malayo-Polynesian Malayo-Chamic Malayic Malay

Bahasa Indonesia is an easy language to learn. For one thing, the grammar is dead simple. There are only a handful of prefixes, only two of which might be seen as inflectional. There are also several suffixes. Verbs are not marked for tense at all. And the sound system of these languages, in common with Austronesian in general, is one of the simplest on Earth, with only two dozen phonemes. Bahasa Indonesia has few homonyms, homophones, homographs, or heteronyms. Words in general have only one meaning.

Though the orthography is not completely phonetic, it only has a small number of nonphonetic exceptions. The orthography is one of the easiest on Earth to use.

The system for converting words into either nouns or verbs is regular. To make a plural, you simply repeat a word, so instead of saying pencils, you say pencil pencil.
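Plural by repetition is simple enough to fit in a one-line function. The sketch below is only an illustration: the function name `pluralize` is made up, the example noun pensil (pencil) is supplied here rather than taken from the text, and joining the two copies with a hyphen is an assumption about the written form.

```python
def pluralize(noun: str) -> str:
    """Form a plural by repeating the noun (full reduplication).

    Assumption: the two copies are joined with a hyphen in writing.
    """
    return f"{noun}-{noun}"


if __name__ == "__main__":
    # pensil = pencil, so this prints the equivalent of "pencil pencil".
    print(pluralize("pensil"))
```

Compare that with the dozens of plural patterns a learner has to memorize in a language like German or Arabic.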

Bahasa Indonesia gets a 1.5 rating, extremely easy to learn.

Malay is only easy if you learn the standard spoken form or one of the creoles. Learning the literary language is quite a bit more difficult. However, the Jawi script, which is Malay written in Arabic script, is often considered to be perfectly awful.

Malay gets a 2 rating for moderately easy.

Philippine Greater Central Philippine Central Philippine Tagalog

However, Tagalog is much harder than Malay or Indonesian. Compared to many European languages, Tagalog syntax, morphology and semantics are often quite different. Also, Tagalog is typically spoken very fast. Unlike Malay, verbs conjugate quite a bit in Tagalog. The main idea of Tagalog grammar is something called focus. Once you figure that out, the language gets pretty easy, but until you understand that concept, you are going to have a hard time.

Everything is affixed in Tagalog.

However, articles and the creation of adjectives from nouns are very easy.

Compare:

ganda – beauty (noun)
maganda – beautiful (adjective)

Tagalog gets a 4 rating, very difficult.

Central-Eastern Malayo-Polynesian Eastern Malayo-Polynesian Oceanic Central-Eastern Oceanic Remote Oceanic Central Pacific East Fijian-Polynesian Polynesian Nuclear East Central Tahitic

Maori and other Polynesian languages have a reputation for being quite easy to learn. The main problem for English speakers is that the sentence structure is backwards compared to English. In addition, macrons can cause problems.

One problem with Maori is dialects. The dialects are so diverse that this means that there are multiple words for the same thing. Swiss German has a similar issue, with up to 50 words for each common household item (nearly every major dialect has its own word for common objects):

ngongi, noni, koki, wai – water
whiri, rarangi, hiri – to plait, to twist, to weave
pai, maitai – good
tu, tutehu, mātika – to stand
mau, mou – to hold
pau, pou – to be exhausted
ika, tohorā – whale
ika, ngohi – fish
kāwei, kāwai – line
ori, kori, keukeu, koukou, neke, nuku – to move
haere, hara, here, horo, whano – to go, to come
hara, hapa – to be wrong
kōrerorero, wānanga, rūnanga – to discuss
tohunga, tahunga – priest
matikuku, maikuku – finger nail
kanohi, konohi, mata, whatu, kamo, karu – eye, face

Entire Maori sentences can be written with vowels only.

E uu aau? Are yours firm?

I uaa ai. It rained as usual.

I ui au ‘i auau aau?’ E uaua! It will be difficult/hard/heavy!

On the plus side, the pronunciation is simple, and there is no gender. The language is as regular as Japanese. No Polynesian language has more than 16 sounds, and they all lack tones. They all have five vowels, which can be either long or short. A consonant must be followed by a vowel, so there are no consonant clusters. All consonants are easy to pronounce.

Maori gets a 3 rating, average difficulty.

Marquesic

Hawaiian is a pretty easy language to learn. It is easy to pronounce, has a simple alphabet, lacks complex morphology and has a fairly simple syntax.

Hawaiian gets a 2 rating, very easy to learn.

North and Central Vanuatu East Santo North

Sakao is a very strange language spoken by 4,000 people in Vanuatu. It is a polysynthetic Austronesian language, which is very unusual. It allows extreme consonant clusters. Sakao has an incredible seven degrees of deixis. The language has an amazing four numbers: singular, dual, paucal and plural. The neighboring language Tolomako has singular, dual, trial and plural. The trial form is very odd. Sakao’s paucal derived from Tolomako’s trial:

jørðœl they, from three to ten

jørðœl løn the five of them (Literally, they three, five)

All nouns are always in the singular except for kinship forms and demonstratives, which only display the plural:

ðjœɣ – my mother/aunt -> rðjœɣ – my aunts

walðyɣ – my child -> raalðyɣ – my children

It has a number of nouns that are said to be “inalienably possessed”, that is, whenever they occur, they must be possessed by some possessor. These often take highly irregular inflections:

Sakao 	  English
œsɨŋœ-ɣ   my mouth
œsɨŋœ-m   thy mouth
ɔsɨŋɔ-n   his/her/its mouth
œsœŋ-...  ...'s mouth	

uly-ɣ 	  my hair
uly-m 	  thy hair
ulœ-n 	  his/her/its hair
nøl-...   ...'s hair

Here, mouth is either œsɨŋœ-, ɔsɨŋɔ- or œsœŋ-, and hair is either uly-, ulœ- or nøl-

Sakao, strangely enough, may not even have syllables in the way that we normally think of them. If it does have syllables at all, they would appear to be at least a vowel optionally  surrounded by any number of consonants.

i (V) thou Mhɛrtpr. (CCVCCCC) Having sung and stopped singing thou kept silent.

Sakao has a suffix -in that makes an intransitive verb transitive and makes a transitive verb ditransitive. Ditransitive verbs can take two arguments – a direct object and an instrumental.

Mɨjilɨn amas ara./Mɨjilɨn ara amas. He kills the pig with the club/He kills with the club the pig.

Sakao polysynthesis allows compound verbs, each one having its own instrument or object:

Mɔssɔnɛshɔβrɨn aða ɛðɛ. He-shooting-fish-kept-on-walking with-a-bow the-sea. He walked along the sea shooting the fish with a bow.

Sakao gets a 5 rating, extremely hard to learn.

Central-Eastern Oceanic Southeast Solomonic Malaita–San Cristobal Malaita Northern Malaita

Kwaio is an Austronesian language spoken in the Solomon Islands. It has four different forms of number to mark pronouns – not only the usual singular and plural, but also the rarer dual and the very rare paucal. In addition, there is an inclusive/exclusive contrast in the non-singular forms.

For instance:

1 dual inclusive (you and I) 1 dual exclusive (I and someone else, not you)

1 paucal inclusive (you, I and a few others) 1 paucal exclusive (I and a few others)

1 plural inclusive (I, you and many others) 1 plural exclusive (I and many others)

Pretty wild!
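The first-person paradigm above is just the cross product of number and the inclusive/exclusive contrast, which a short Python sketch makes concrete. It enumerates category labels only and contains no actual Kwaio pronoun forms; the function name and the label format are invented for the illustration.

```python
from itertools import product

NUMBERS = ["singular", "dual", "paucal", "plural"]
CLUSIVITY = ["inclusive", "exclusive"]


def first_person_categories() -> list[str]:
    """List the first-person pronoun categories a learner must keep apart."""
    # The singular has no inclusive/exclusive contrast.
    categories = ["1 singular"]
    # Every non-singular number splits into inclusive and exclusive.
    for number, clusivity in product(NUMBERS[1:], CLUSIVITY):
        categories.append(f"1 {number} {clusivity}")
    return categories


if __name__ == "__main__":
    for category in first_person_categories():
        print(category)
```

That gives seven separate slots where English makes do with I and we.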

Kwaio gets a 5 rating, extremely hard to learn.

Greater Barito East Barito Malagasy

Malagasy, the official language of Madagascar, has a reputation for being even easier to learn than Indonesian or Malay.

Malagasy gets a 1 rating, easiest of all to learn.

Tai-Kadai Kam-Tai Tai Southwestern

Thai is a pretty hard language to learn. There are 75 symbols in the strange script, there are no spaces between words in the script, and vowels can come before, after, above or below consonants in any given syllable. There seem to be many different glyphs for every consonant, but the different glyphs for the same consonant will sometimes change the sound of the neighboring vowel. The orthography is as insensible as that of English since centuries have gone by with no spelling reforms. In fact, Thai has not changed its system in 1,000 years. The wild card of having tone thrown in adds to the insanity.

Consonant pronunciations vary depending on the location of the syllable in the word – for instance, s can change to t. There are many vowels which are spoken but not written. There are many consonants that are pronounced the same – for instance, there are six different t’s, not counting the s’s that turn into t’s. The Thai script is definitely one of the most difficult phonetic scripts. Nevertheless, the Thai script is easier to learn than the Japanese or Chinese character sets. In spite of all of that, the syntax is simple, like Chinese.

There are five tones, including a neutral tone. Tones are determined by a variety of complex things, including a combination of tone marks, the class of the consonant, whether the syllable ends in a sonorant or a stop, and what the tone of the preceding syllable was. Tone marking in the orthography is quite complex.

The vowels are different than in many languages, and there are some unusual diphthongs: eua, euai, aui and uu. There is a contrast between aspirated and unaspirated consonants.

There is a system of noun classifiers for counting various things, similar to Japanese. In addition, common to many Asian languages, there is a complicated honorifics system.

On the plus side, Thai is a regular language, with few exceptions to the rules. However, the rules are quite complex. The syntax is about as complex as that of Chinese, and the grammar is dead simple.

Thai gets a 5 rating, extremely hard to learn.

Lao is very similar to Thai. In fact, it is identical to Northeastern Thai, a Thai language spoken by 16 million people in northeast Thailand. The Lao script is similar to Thai, but it has fewer letters, so there is somewhat less confusion.

Lao gets a 4.5 rating, very to extremely hard to learn.

Kam-Sui

The Kam languages of the Dong people in southwest China were rated by the Fudan University study referenced above under Wu as the 2nd most phonologically complex on Earth (Wang 2012). There are 32 stem-initial consonants, including oddities like tɕʰ, pʲʰ, ɕ, kʷʰ, ŋʷ, tʃʰ, tsʰ. Note the many contrasts between aspirated and unaspirated voiceless consonants, including bilabial palatalized stops, labialized velar stops, and alveolar affricates. There are an incredible 64 different syllable finals, and 14 others that occur only in Chinese loans.

There are an astounding 15 different tones, nine in open syllables and six in checked syllables (entering tones). Main tones are high, high rising, high falling, low, low rising, low falling, mid, dipping and peaking. When they speak, it sounds as if they are singing.

Kam gets a 5 rating, extremely hard to learn.

Kra Paha

According to the Fudan University study quoted above, Buyang is the 3rd most phonologically complex language in the world. Buyang is a cluster of 4 related languages spoken by 1,900 people in Yunnan Province, China. Buyang has a completely wild consonant inventory.

It has a full set of both voiced and voiceless plain and aspirated stops, including voiceless uvulars. The contrast between aspirated and plain voiced stops is peculiar. The stop series also has distinctions between palatalized and rounded stops throughout the series. It has a labialized voiceless palatal fricative and a voiceless dental aspirated lateral, unusual sounds. It has four different voiceless aspirated nasals. It has voiceless y and w, more odd sounds. It also has plain and labialized palatal glides.

That is one heck of a wild phonology.

Buyang gets a 5 rating, extremely hard to learn.

Niger-Kordofanian Niger-Congo Atlantic–Congo Kwa Nyo Ga-Dangme

The West African Kwa language Ga has a bad reputation for being a tough nut to crack. It is spoken in Ghana by about 600,000 people. It has two tones and engages in a strange behavior called tone terracing that is common to many West African languages. Vowel length is phonemic, and all vowels come in three different lengths – short, long and extra long. It also has many sounds that are not in any Western languages.

Ga gets a 5 rating, extremely hard to learn.

Potou-Tano Tano Central Bia Northern

Anyi is a language spoken by 610,000 people in Côte d’Ivoire. It is relatively straightforward as far as African languages go. Probably the hardest part about the language is that it is tonal, with two tones. The phonology does have the unusual ±ATR contrast, which will seem very odd. ATR stands for advanced tongue root, so the language has a contrast between vowels with an advanced tongue root and without one. However, the grammar is pretty regular. There are few confusing phonological processes.

Anyi has a simple tense system, with only present, past and future. There is no aspect, mood or voice marking, and it lacks the noun class systems so common in many African languages. It has a plural marker, but it is often optional.

The syntax does have serial verbs, which will seem odd to Westerners. It distinguishes between relative clauses and subordinate clauses, each of which takes its own marker.

Anyi gets a 4 rating, very hard to learn.

Volta-Congo Benue-Congo Bantoid Southern Narrow Bantu Central M Nyika-Safwa

Ndali is a Bantu language with 150,000 speakers spoken in Malawi and Tanzania. It has many strange tense forms. For instance, in the past tense:

Past tense A: He went just now.
Past tense B: He went sometime earlier today.
Past tense C: He went yesterday.
Past tense D: He went sometime before yesterday.

Future tense is marked similarly:

Future tense A: He’s going to go right away.
Future tense B: He’s going to go sometime later today.
Future tense C: He’s going to go tomorrow.
Future tense D: He’s going to go sometime after tomorrow.
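The eight tense slots above boil down to a small decision rule based on how far away the event is. Here is a rough sketch of that rule. The function and parameter names are invented for illustration, and the A to D labels are just the ones used in this article, not Ndali grammatical terms or actual Ndali verb forms.

```python
def ndali_tense_slot(direction, days_away, immediate=False):
    """Pick the tense slot (A-D) described in the text.

    direction: "past" or "future"
    days_away: 0 for today, 1 for yesterday/tomorrow, 2+ for further away
    immediate: True for "just now" / "right away" (only matters when days_away is 0)
    """
    if direction not in ("past", "future"):
        raise ValueError("direction must be 'past' or 'future'")
    if days_away < 0:
        raise ValueError("days_away must be zero or positive")

    if days_away == 0:
        letter = "A" if immediate else "B"
    elif days_away == 1:
        letter = "C"
    else:
        letter = "D"
    return f"{direction.capitalize()} tense {letter}"


print(ndali_tense_slot("past", 0, immediate=True))
print(ndali_tense_slot("future", 3))
```

The point of the sketch is that the speaker has to make this distance calculation every time a past or future verb is used.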

Ndali gets a 5 rating, extremely hard to learn.

S Nguni

Xhosa, a language of South Africa, is quite difficult, with up to nine click sounds. Clicks only exist in one language outside of Africa – the Australian language Damin – and are extremely difficult to learn. Even native speakers mess up the clicks sometimes. Nelson Mandela said he had problems making some of the click sounds in Xhosa. The phonemics in general of Xhosa are pretty wild.

Xhosa gets a 5 rating, extremely hard to learn.

Zulu and Ndebele also have these impossible click sounds. However, outside of click sounds, the phonology of Nguni languages is straightforward. All Nguni languages are agglutinative. These languages also make plurals by changing the prefix of the noun, and the manner varies according to the noun class. If you want to look up a word in the dictionary, first of all you need to discard the prefix. For instance, in Ndebele,

river – umfula, rivers – imifula, but

stone – ilitshe, stones – amatshe, yet

tree – isihlahla, trees – izihlahla
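To make the "discard the prefix" step concrete, here is a toy sketch that uses only the three singular/plural prefix pairs visible in the examples above. It is not a real model of Ndebele noun classes. The table is incomplete, the function names are made up, and the stems it produces are only what you get by chopping off these particular prefixes.

```python
# Singular -> plural prefix pairs, taken only from the three examples in the text.
PREFIX_PAIRS = {"um": "imi", "ili": "ama", "isi": "izi"}


def split_prefix(word):
    """Split a noun into (prefix, rest), trying the longest known prefixes first."""
    known = sorted(set(PREFIX_PAIRS) | set(PREFIX_PAIRS.values()), key=len, reverse=True)
    for prefix in known:
        if word.startswith(prefix):
            return prefix, word[len(prefix):]
    raise ValueError(f"no known prefix on {word!r}")


def stem(word):
    """Discard the class prefix, as you would before a dictionary lookup."""
    return split_prefix(word)[1]


def pluralize(singular):
    """Swap a singular prefix for its plural partner."""
    prefix, rest = split_prefix(singular)
    if prefix not in PREFIX_PAIRS:
        raise ValueError(f"{singular!r} does not start with a known singular prefix")
    return PREFIX_PAIRS[prefix] + rest


for noun in ("umfula", "ilitshe", "isihlahla"):
    print(noun, "->", pluralize(noun), "| stem:", stem(noun))
```

Even in this tiny version, you cannot form a plural or find a word in the dictionary without first knowing which class prefix you are looking at.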

Ndebele gets a 5 rating, extremely hard to learn.

Zulu has pitch accent, tones and clicks. There are nine different pitch accents, four tones and three clicks, but each click can be pronounced in five different ways. However, tones are not marked in writing, so it’s hard to figure out when to use them. Zulu also has depressor consonants, which lower the tone of the vowel in the following syllable. In addition, Zulu has multiple genders – 15 of them. And some nouns behave like verbs. It also has 12 different noun classes, but 9

Zulu gets a 5 rating, extremely hard to learn.

G Swahili

For unknown reasons, Swahili is generally considered to be an easy language to learn. The US military ranks it 1, among the easiest of all languages to learn. This seems to be the typical perception. Why Swahili is so easy to learn, I am not sure. It’s a trade language, and trade languages are often fairly easy to learn. There’s also a lot of controversy about whether or not Swahili can be considered a creole, but that has not been proven. For the moment, the reasons why Swahili is so easy to learn will have to remain mysterious.

On the down side, Swahili has many noun classes, but they have the benefit of being more or less logical.

Swahili gets a 2 rating, moderately easy.

Khoisan Southern Africa Southern Hua

!Xóõ (Taa), spoken by only 4,200 Bushmen in Botswana and Namibia, is a notoriously difficult Khoisan language replete with the notoriously impossible to comprehend click sounds. Taa has anywhere from 130 to 164 consonants, the largest phonemic inventory of any language. Of this vast wealth of sounds, there are anywhere from 30-64 different click sounds. There are five basic clicks and 17 accompanying ones. Speakers develop a lump on their larynx from making the click sounds.

In addition, there are four types of vowels: plain, pharyngealized, breathy-voiced and strident. On top of that, there are four tones. Taa appears on many lists of the wildest phonologies and craziest languages period on Earth.

Taa gets a 5 rating, extremely hard to learn.

Northern

Ju|’hoan, a Khoisan language spoken by 5,000 people in Botswana, has been named in one study as one of the weirdest languages on Earth.

Ju|’hoan gets a 5 rating, extremely hard to learn.

Eskimo-Aleut Eskimo Inuit-Inupiaq

Inuktitut is extremely hard to learn. Inuktitut is polysynthetic-agglutinative, and roots can take many suffixes, in some cases up to 700. Verbs have 63 forms of the present indicative, and conjugation involves 252 different inflections. Inuktitut has the complicated polypersonal agreement system discussed under Georgian above and Basque below. In a typical long Inuktitut text, 9

Inuktituusuungutsialaarungnanngittuaraaluuvunga. I truly don’t know how to speak Inuktitut very well.

You may need to analyze up to 10 different bits of information in order to figure out a single word. However, the affixation is all via suffixes (there are no prefixes or infixes) and the suffixation is extremely regular.

Inuktitut is also rated by linguists as one of the hardest languages on Earth to pronounce. Inuktitut may be as hard to learn as Navajo.

Inuktitut is rated 6, hardest of all.

Kalaallisut (Western Greenlandic) is very closely related to Inuktitut. Look at this sentence:

Aliikusersuillammassuaanerartassagaluarpaalli… However, they will say that he is a great entertainer, but …

That word is composed of 12 separate morphemes. A single word can conceptualize what could be an entire sentence in a non-polysynthetic language.

Kalaallisut is rated 6, hardest of all.

Chukotko-Kamchatkan Northern Chukot

Chukchi is a polysynthetic, agglutinating and incorporating language and is often listed as one of the hardest languages on Earth to learn.

Təmeyŋəlevtpəγtərkən. I have a fierce headache.

There are five morphemes in that word, and there are three lexical morphemes (nouns or adjectives) incorporated in that word: meyŋ (great), levt (head), and pəγt (ache).

Chukchi gets a 6 rating, hardest of all.

Basque

Basque, of course, is just a wild language altogether. There is an old saying that the Devil tried to learn Basque, but after seven years, he only learned how to say Hello and Goodbye. Many Basques, including some of the most ardent Basque nationalists, tried to learn Basque as adults. Some of them succeeded, but a very large number of them failed. Based on the number that failed, it does seem that Basque is harder for an adult to learn as an L2 than many other languages are. Basque grammar is maddeningly complex and it often makes it onto craziest grammars and craziest language lists.

There are 11 cases, and each one takes four different forms. The verbs are quite complex. This is because it is an ergative language, so verbs vary according to the number of subjects and the number of objects and if any third person is involved.

This is the same polypersonal agreement system that Georgian has above. Basque’s polypersonal system is a polysynthetic system consisting of two verb types – synthetic and analytical. Only a few verbs use the synthetic form.

Three of Basque’s cases – the absolutive (the case of intransitive subjects and transitive objects), the ergative (the case of transitive subjects) and the dative – can be marked via affixes to the verb. In Basque, only present simple and past simple synthetic tenses take polypersonal affixes.

The analytical forms are composed of more than one word, while the synthetic forms are all one word. The analytic verbs are built via the synthetic verbs izan (be), ukan (have) and egin (do).

Synthetic:

d-akar-ki-o-gu = We bring it to him/her. The verb is ekarri (bring).

z-erama-zki-gu-te-n = They took them to us. The verb is eraman (take).

Analytic:

Ekarriko d-i-o-gu = We’ll bring it to him/her. Literally: We will have-bring it to him/her. The analytic verb is built from ukan (have).

Eraman d-ieza-zki-gu-ke-te = They can take them to us. Literally: They can be taking them to us. The analytic verb is built from izan (be).

Most of the analytic verbs require an auxiliary which carries all sorts of information that is often carried on verbs in other languages – tense, mood, sometimes gender and person for subject, object and indirect object.

Jaten naiz. Eat I-am-doing. I am eating.

Jaten nintekeen. Eat I-was-able-to. I could eat.

Eman geniezazkiake. Give we-might-have-them-to-you-male. We might have given them to you.

In the above, naiz, nintekeen and geniezazkiake are auxiliaries. There are actually 2,640 different forms of these auxiliaries!

A language with ergative morphosyntax in Europe is quite a strange thing, and Basque is the only one of its kind. The ergative itself is quite unusual:

Gizona etorri da. The man has arrived.

Gizonak mutila ikusi du. The man saw the boy.

gizon = man, mutil = boy, -a = the

The noun gizon takes a different form depending on whether it is the subject of a transitive or an intransitive verb. In the first sentence it is in the absolutive case (unmarked), while in the second it is in the ergative case (marked by the morpheme -k). If you come from a non-ergative IE language, the concept of ergativity itself is difficult enough to conceptualize, much less trying to actually learn an ergative language. Consequently, any ergative language will automatically be more difficult than a non-ergative one for all speakers of IE languages.
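Here is a minimal sketch of the marking rule in the two sentences above, for anyone who finds it easier to see as a procedure. It assumes consonant-final stems like gizon and mutil, takes the verb phrase as a ready-made string (it does not model the choice between da and du), and uses invented function names. Real Basque noun morphology has more to it than this.

```python
def mark_noun(stem, ergative=False):
    """Add definite -a, plus ergative -k if the noun is a transitive subject."""
    form = stem + "a"
    if ergative:
        form += "k"
    return form


def clause(subject, verb_phrase, obj=None):
    """Build a simple clause with subject-object-verb order.

    With no object the subject stays absolutive (unmarked).
    With an object the subject takes ergative -k and the object stays absolutive.
    """
    if obj is None:
        words = [mark_noun(subject), verb_phrase]
    else:
        words = [mark_noun(subject, ergative=True), mark_noun(obj), verb_phrase]
    sentence = " ".join(words) + "."
    return sentence[0].upper() + sentence[1:]


print(clause("gizon", "etorri da"))
print(clause("gizon", "ikusi du", obj="mutil"))
```

Notice that the object of the transitive sentence and the subject of the intransitive one go through exactly the same branch of mark_noun. That shared treatment is what ergativity means.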

Ergativity also works with pronouns.  There are four basic systems:

Nor:           verb has subject only
Nor-Nork:          "    subj. + direct complement
Nor-Nori:          "    subj. + indirect comp.
Nor-Nori-Nork:     "    subj. + indir. + dir. comps.

Some call Basque the most consistently ergative language on Earth.

If you don’t grow up speaking Basque, it’s hard to attain native speaker competence. It’s quite a bit easier to write in Basque than to speak it.

Nevertheless, Basque verbs are quite regular. There are only a few irregularities in conjugations and they have phonetic explanations. In fact, the entire language is quite regular. In addition, most words above the intermediate level are borrowings from large languages, so once you reach intermediate Basque, the rest is not that hard. In addition, pronunciation is straightforward.

Basque is rated 5.5, nearly hardest of all.

References

Dorani, Yakir. Hebrew speaker, Israel. August 2013. Personal communication.

Hewitt, B. G. 2005. Georgian: A Learner’s Grammar, p. 29.

Kim, Yuni. December 16, 2003. Vowel Elision and the Morphophonology of Dominance in Aymara. UC Berkeley.

Kirk, John William Carnegie. 1905. A Grammar of the Somali Language: With Examples in Prose and Verse and an Account of the Yibir and Midgan Dialects, pp. 73-74.

Rogers, Jean H. 1978. Differential Focusing in Ojibwa Conjunct Verbs: On Circumstances, Participants, and Events. International Journal of American Linguistics 44: 167-179.

Wang, Chuan-Chao et al. 2012. Comment on “Phonemic Diversity Supports a Serial Founder Effect Model of Language Expansion from Africa.” Science 335:657.

