Repost: Englishes, Portugueses, and Chineses

In the comments to a previous post, Goyta made several comments. First of all he noted that the differences between Brazilian and European Portuguese are considerable, especially when a Brazilian hears a less educated, working class or rural Portuguese.

He also said that when European Portuguese are interviewed on Brazilian TV, Brazilians wish they had subtitles. Wanting to have subtitles when you see a video of someone speaking is actually a symptom that you are dealing with another language. He said the differences are particularly severe when it comes to IT. He said he cannot understand 9

It does appear that the differences between European Portuguese and Brazilian Portuguese are pretty significant, more significant than the differences between US and British English.

On the other hand, I find Hibernian English spoken in Ireland to be nearly incomprehensible, though it is said to be just a dialect of English. It’s clearly been influenced extensively by the Irish language. Scots, the regional English spoken in Scotland and exemplified by the movie Trainspotting, is actually a completely separate language from English. That movie actually needed subtitles. On the other hand, there is a Scottish English dialect that is not Scots that is pretty intelligible.

We can always understand British English no matter who is writing it. Same with understanding spoken Australian and New Zealand (Kiwi) English. British English is often written a bit differently in slang expressions, but we pick them up. The formal writing is totally understandable.

There have been huge fights on Wikipedia between British English and US English speakers with complaints from the Brits of bullying by the Americans. There was an attempt to fork the English Wiki into Br and US versions but it failed. Wikipedia demands that you have an ISO code in order to get a Wikipedia, and ISO codes only come from SIL, which publishes Ethnologue. I petitioned for a few new languages a couple of years ago and they all got shot down.

There is an ongoing war between European Portuguese and Brazilian Portuguese on the Portuguese Wikipedia with complaints from the European Portuguese of bullying on the part of the Brazilians. Goyta noted that he, a Brazilian, could not read Portuguese IT materials. This is unfortunate. All written British is intelligible to us. We can read anything written in the UK, though most of our reading material here is from the US. I can read The Economist and The New Standard and The Spectator with no problems at all.

As a Californian, I speak completely normally, of course, and have no accent whatsoever! Haha. We can understand the Midwest accent perfectly, though it can be different. It sounds “flat”. They also insert a rhotic consonant before some consonants at the end of a word and raise the preceding vowel – “wash” becomes “worsh”.

The Oklahoma accent is different and sometimes it can be hard to understand. I heard some people speaking Oklahoman in the doctor’s office the other day, and for a minute or so I thought they were speaking a foreign language! Of course they were mumbling too. Then I asked them where they were from and they said Oklahoma. At that point, I had caught onto their accent and could understand them perfectly.

I do not know why the Texan accent is said to be hard to understand. We understand it perfectly, but it sounds funny. We make a lot of jokes about it. George Bush has a strong Texan accent. There is also an Arkansas accent (Arkies) that is different but understandable. This is also the source of jokes. In this part of California there are many Whites who still speak Arkie and Okie. They are the descendants of those who came out here from the Dust Bowl in the 1930’s. Steinbeck wrote a book about this called The Grapes of Wrath.

Other than that, there are no accents in the West.

There is some sort of a Kentucky-Tennessee accent, but I am not sure if they differ. This is also a source of jokes. It’s sort of a general Appalachian accent, and it’s the source of jokes about inbred hillbillies and whatnot.

The Southern accent is well-known but usually understandable. My brother went to live in Alabama though and he said that the workers in the factory he worked at were often completely unintelligible. The Blacks were worse than the Whites, and they had separate accents. He has imitated their incomprehensible accent to me and it’s pretty hilarious.

I have heard poor Blacks from Memphis on the Cops show who were completely unintelligible to me. People with more money and status tended to be more comprehensible. I sometimes have a hard time understanding a Mississippi or Alabama accent, but it’s no problem. Our Southern politicians all have thick Southern accents.

Cajun English from Louisiana is often unintelligible to us, but the people with more money and status are quite intelligible.

There is also a Black accent from the coast of South Carolina called Gullah that is hard to understand. The Blacks from around there speak something like it and you can pick it out if you are sharp. It has a pretty, lilting sound to it. It’s different from the standard Southern accent and is sort of charming. Moving up the coast, there is a Virginia accent that is softer, pleasant and charming.

There is the famous New York accent, which to us laid back Californians sounds horribly rude, obnoxious, loud and belligerent. Some forms of it also sound ignorant – these tend to be associated with working class Whites in Brooklyn and the Bronx.

One thing they do is to glide and lengthen rhotic consonants – “New York” becomes “New Yawwk” and “Brooklyn” becomes “Bwwoklyn”.

A similar accent seems to be spoken in New Jersey, but it may be different. Once again, it involves lenition of rhotic consonants, in this case turning them into diphthongs with long vowels. “New Jersey” becomes “New Joiisey”. This is also a source of jokes. There is a Boston accent which is completely understandable. Ted Kennedy speaks that. It involves the lenition of hard consonants into glides at the end of a word – “car” becomes “caw”.

I believe there is a sort of a slow drawl from Vermont and New Hampshire too. Those people, especially the older men, are known for not talking much. Men of few words.

Some Blacks around here still talk with thick Black accents that sound Southern even though they were born in the Central Valley. There is also an “Ebonics” English (for lack of a better word) that is spoken here by sort of ghettoish or semi-ghettoish Blacks. It is, frankly, almost completely unintelligible. They seem like they are talking with their mouths full, mumbling and speaking extremely fast, running all of the sounds together.

Everyone who talks like this can also speak Standard English, thank God, and they can quickly move in and out of that Ebonics talk when you talk to them. It’s sort of a language for them to talk so that we can’t understand them, I think. To us, it sounds sloppy, low class and ghetto, but it is reportedly a full-fledged language. The Blacks in the Caribbean do not speak English! That makes me feel good because I can hardly understand a word they say. Each island has its own form of Creole English which is a completely separate language.

I think that Indian English (Chichi derogatorily) and West African English need to be split into separate languages because they are often incomprehensible to us. This is a case of regional Englishes evolving on their own. Further, West African English often differs a lot in its written form.

Indian English is often so mangled in its written form that it is incomprehensible, but more educated writers are comprehensible. The tendency to drop articles is very annoying and makes written Indian English sound ignorant to us. Don’t mess with our damned useless articles!

Reading about the Chinese languages, there are efforts underway to get speakers to speak proper Putonghua, whatever that means. Speakers from different parts of China still speak Putonghua with an accent that can be heavy at times.

Here in the US, we do not have this problem. Even our politicians still speak in heavy regional accents, and no one cares. We can always understand them. There is no national effort to get everyone to speak proper English that involves wiping out regional accents, though I understand that in the corporate world, they are offering classes to help people get rid of Southern accents, which are stereotyped as sounding backwards, ignorant and racist. I think this is sad. Our regional accents are what makes this country great.

Goyta also notes that Brazilians are starting to speak Spanish and the neighboring Spanish speaking countries are starting to speak Portuguese. When I was dealing with them 5-10 years ago, most Brazilians did not speak much Spanish (They acted like it was extremely low on their list of priorities) and Spanish speakers had zero interest in learning Portuguese (In fact, they regarded the suggestion as offensive and preposterous!)

Goyta notes that with regional integration, more Brazilians are speaking Spanish and more Spanish speakers from nearby countries are learning Portuguese. Spanish is becoming a prerequisite to getting a good job in Brazil. This is a welcome development, as it’s good to see Latin Americans getting together.

It is also true that in China there has been a big fight over Chinese language classification. The unificationist – fascist types, associated with the Communist government (and actually with the Nationalist government before also – this is really a Chinese elite project) insist that there is only one Chinese language. This goes along with racism of Northern Chinese against Southern Chinese and to some extent vice versa. This racism is most evident in the Cantonese vs Mandarin war in China.

Cantonese speakers say that they speak the real Chinese and that Northern Chinese speak a bastardized tongue derived from the old Manchu language. Cantonese speakers also resent that a Northern Chinese language was turned into the national tongue and imposed on them against their will. They also say that Northern Chinese are really from the South and that the real NE Asians are the Mongolians, Koreans, Manchu, Japanese, etc. Genetic studies show that this is not the case.

Northern Chinese say that Southern Chinese are not real Chinese and their blood is “contaminated” with Tai types like the Tai, Zhuang, Vietnamese, etc. There is probably something to this. Although Putonghua is the only official language in China and there is a war going on against the regional Chineses, enforcement has been held off against Cantonese. And Cantonese areas are still where you will hear the least Putonghua and the most regional Chinese in all spheres of life. Cantonese is also allowed on the radio and TV, whereas regional Chineses had previously been banned from the media.

The Putonghua-only campaign has been too successful and regional Chineses are being wiped out. There is now a regionalism movement arising in China to promote and retain regional Chineses.

I think that the Putonghua campaign has been good, but that China should promote bilingualism. The Putonghua campaign has not yet been successful. As of 2001, only 5

China clearly needs a language that they can all speak. For its entire history, many Chinese have not been able to speak to each other, including folks from one village to the next if you go to the southeast and the central coast. Provinces like Fujian, Jiangxi, Jiangsu, Henan and Hunan are notoriously multilingual.

Most of these places have a lot of very high mountains, and transportation was typically very poor. Even today, you can scarcely get around by vehicle and you sometimes have to walk from one place to the next, sometimes for dozens of miles! Bottom line is they were very isolated from each other.

These places also retained a tradition of being hideouts for “hillbilly” types where there was a lot of unemployment and many folks turned to crime. Also criminals fled to the mountains where they could hide. The upshot was that due to all of this, and because the people there were seen as backwards, lazy, stupid and thieving, people from the rest of China had no interest in going to these places anyway. When people left these parts of China to go to big cities, they were stereotyped in a way similar to how ghetto Blacks and Browns are in the US. This made them want to stay in their mountains.

Alt Left: A Reasonable Project for “Soft” Taiwanese Independence to Assuage PRC Fears

Vicmund the Han: What do you think of Taiwanese based on your observations?

You’re going to hate me for saying this, but I think they should go independent. But I would like a peace treaty with China beforehand, an economic agreement, CCP military bases in Taiwan dual-staffed, Taiwanese military bases in China dual-staffed, perhaps some sort of military or economic integration like the CIS or better yet, Belarus. Transform it into a deep alliance and work together. The radical independencists will have to be sidelined.

The main thing is to make it so an independent Taiwan is not a military threat to China. No US military bases in Taiwan, integration of both nations’ policies towards the US and maybe on a lot of other things. Brotherly countries with a strong alliance who agree to disagree on certain things, but when they do, they are “brotherly opposition.”

There is only one China. There are two countries, Taiwan and China. Taiwan is not China. It’s Taiwan. The only China is the People’s Republic. Two Chinas policy was insane, but one China policy is crazy too as it says that Taiwan doesn’t even exist!

The problem is that most Chinese, including the CCP, are stark raving nuts about this question, so I am really worried that they will not want to put this project into effect. China sees Taiwan as a rebellious province of China. Well, it’s a part of China that fought a war and achieved its independence from China via military might. So it’s not a rebellious province anymore. It’s like Eritrea splitting off from Ethiopia. It’s a new country.

Chinese nationalism is ok in a sense, but it’s also ethnic nationalism in a sense and it’s definitely ultranationalism in a revanchist way. You can’t go back and retake land you lost in wars. That’s what those world wars were all about. Irredentism and revanchism have got to go. Chinese nationalism suffers from a lot of the insanities, toxicities, and mental disorders of any nationalism. It is fascist in a sense that all extreme nationalisms or patriotardisms are, though only in a very broad sense of wanting a restoration of a Chinese empire.

It’s nation-state nationalism or patriotardism like exists in many countries, including the US. It differs from almost all fascisms in not being ethnic-based and in not being part of a nation-building project where all non-Chinese Han/non-Mandarin speakers have to turn into Chinese Mandarin-speaking Hans. They all have to get rid of their languages, ethnic identities, and religions and cultures and become Hans in a sense. Chinese nationalism doesn’t work like that.

It’s inclusive rather than exclusive, offers autonomy instead of forced assimilation, and retains in a sense the notion of self-determination of nations in that nations in China are free to speak their languages, practice their cultures and religions, etc. Pretty typical of the national policies of many Communist countries, though certainly not all of them! It’s more like Soviet nationalism. The Soviets went after breakaway provinces too you know.

Eastern Europe was quite hostile to minority languages, ethnicities, and cultures. Polish and Yugoslavian nationalisms were nation-building projects. I’m not sure how minorities were treated in Slovakia (Hungarians), Romania (Germans), etc. There was much persecution of the Rusyns in Poland, ethnic Germans everywhere, and Italians and Chakavian-speaking Istrians on the islands in Croatia after World War 2 of course. They were accused of siding with the enemy.

An Interesting Mostly Southern Chinese Phenotype

This is a good friend of mine who resides in Singapore. He is very interested in his background and gave me his photo to analyze.

Looking at it, I believe he is definitely Southern Chinese for the most part. His father is Hainanese and has a rather distinctive phenotype that looks something like his son’s. His mother is a certain type of Malay that dates back to the 1400’s and is significantly mixed with European blood, mostly British and Dutch, as Europeans have had a presence in the area dating back centuries. I believe that they are called Peranakans. He also has some female relatives that look very Malay. I do not know who the older man to the right is, but he looks quite Malay to me.

I think my friend ended up looking more Chinese than Malay. The Hainanese are definitely a Chinese type people. Whether they also have a Vietic type SE Asian component is not known as I do not know the history of Hainan.

Although my friend definitely has a strong Southern Chinese look, he also has another component that makes him look, well, different. I’m not going to attempt to describe this element, but it does make him look somewhat “odd,” “interesting,” or “unusual,” from a Southern Chinese POV. A typical Southern Chinese would say that he looks like a Southern Chinese, but he’s not like us. A Southern Chinese has more of a Modern Mongoloid look. My friend is mostly modern Mongoloid, with some elements of transitional Mongoloid or archaic Mongoloid – this is what the Malays are after all – added in.

The evolution from Negritos to moderns occurred much later in Malaysia, much taking place in only the last 5,000 years. The Senoi are an example of an archaic group that is definitely Australoid yet nevertheless more progressive than the Negritos. These are the “dream people” of psychological and anthropological literature, though modern research has shown that they do not incorporate dreams as much into their waking lives as we previously thought and that the extent to which they do this was much exaggerated.

There are also Negritos (or original Asians) in Malaysia. In fact, there is a group in Malaysia that has genes that date back to 72,000 YBP. This is actually before the main Out of Africa event, yet it has now been shown that other small groups went out of Africa before then.

Most of these groups were devastated by the vast Toba volcanic explosion in Sumatra 72,000 YBP that exterminated almost all humans in South and Southeast Asia. It is thought that only 1,500 of this group survived the explosion. This means that humans went through a severe genetic bottleneck no doubt accompanied by massive selection pressure and huge genetic effects. Whether this explosion’s effects extended to Central Asia (probably), the Middle East (maybe), or East Africa (unknown) is not known. At any rate, this original group departed from East Africa near Somalia and Djibouti.

The main OOA group left out of here too. No one quite knows what these people looked like but they may have appeared somewhat Khoisan. The Khoisan are the most ancient group in Africa with genes dating back 52,000 YBP. Further, their click language to me seems like a good candidate for the original human language. It does seem to be quite primitive. Before that, we clearly used sign language. Neandertals could not speak due to their hyoid bones. The great apes also have this problem. So when Neandertals vocalized, they may have sounded like great apes.

The Sasquatch, which I believe is an archaic hominid related to Heidelbergensis which somehow survived, has a very odd speech pattern (it speaks on the inhale, bizarrely enough – try it sometime) and a friend of mine who shot and killed two of them told me that the juveniles were using extensive sign language. They ran half the time on all fours and half the time on two legs, which is very odd. Sasquatches can run up to 30 mph on all fours. That must be quite frightening to watch but it can be seen in the Port Edward Island Sasquatch footage. Anyway, enough about Bigfoot for today!

It’s not known how far modern human language dates back. Sergei Starostin feels it cannot date back more than 50,000 years because so many cognates remain that we can actually construct a bit of Proto-World. One Proto-World term is “tik” meaning one, to point, index finger, etc. From this comes our word to teach. Imagine a teacher pointing at a blackboard with his index finger. I worked on an Indian language a while back and they had a very archaic word found only in the earliest vocabularies – tik, meaning “the point of a spearhead.” I cannot prove it but I believe deep down inside that this is from the same root.

It’s more of a gut feeling or intuitive thing, and intuitions are often wrong because they overgeneralize, throw out logic altogether, and rely exclusively on notoriously unreliable and subjective (the very word subjective implies emotional response) feelings, especially deep or gut feelings that can be described as “Gestalt.” I’m a birdwatcher and we use something called Gestalt to identify fleeting glimpses of a bird.

All we can see is what philosophers like Heidegger might call “the essence” or essential nature of the bird rather than its surface characteristics which are too fleeting to identify. Heidegger discusses surface versus essence interpretations of objects a lot. It seems hard to figure out but it’s easier than you think.

Logic relies on surface or appearance, including the human definition we have given to the object.

Intuition on the other hand pretty much throws out the surface stuff and looks for the “essence of the thing” or the “deep meaning” or “true meaning” of the object. We are getting into Plato here with the concept of “pure objects” that actually do not exist in reality.

An example of Platonic pure objects would be what I call the Masculine and Feminine spirit (see the brilliant and wrongly derided Otto Weininger’s “Sex and Character” for more). And Weininger comes from Nietzsche in my opinion and leads to Heidegger, also in my opinion. He seems to be a sort of a bridge between the two. Note that all were Germans, Weininger an Austrian, but oh well.

The Masculine Spirit and the Feminine Spirit is one way of dividing the universe or world in a binary manner. Not that there are not other binary methods of chopping the world into opposite halves, but this is just one of them.

I would argue that the world is half Masculine principle and half Feminine principle and that neither is better than the other and the marriage of the two opposites creates a whole that is bigger than the sum of its parts, hence the human pair bond where each member of the male-female couple fills in the missing blanks or parts of the other one, each creating a whole person in the other where only a “half person” had existed before.

We are also getting into Taoism here, but the ancient Chinese were awful damn smart, so you ignore them at your peril in my opinion. Furthermore, the Taoist maxim of how to live your life – “moderation in all things” is an excellent aphorism, not that many of us ever do it. It’s clearly the route to a long lifespan.

To do the opposite is to burn candles at both ends, live fast, die young, and leave a pretty corpse, which sounds very romantic and appealing when young (it did to me) but which sounds increasingly idiotic and even suicidal for no good reason with each advancing year past 30. I now find it laughable, pathetic, and openly suicidal and delight in mocking the concept. But I survived another 30 years past the expiration date on that concept, so perhaps my new attitude is simply the inevitable product of living out that maxim twice and hence nullifying it.

There are a number of Southern Chinese groups with more of an indigenous look, sometimes prognathous. These date back to the original indigenous elements in Southern China and SE Asia, who all date back to the Negritos. The Montagnards of Vietnam are definitely one of these indigenous types. The indigenous went from

Indigenous (Negrito) -> Proto SE Asian (with Melanesian component) -> modern SE Asian (Modern Mongoloid with archaic components). This effect is quite pronounced in the Vietnamese, who were completely overrun by a Chinese invasion 2,300 years ago after which there was much interbreeding and a huge infusion of Cantonese words, which now make up 7

However, the core vocabulary of Vietnamese remains Austroasiatic (a language family nevertheless with Southern Chinese roots derived from the archaic Mongoloid peoples of the region 5-7,000 YBP, who later moved into SE Asia). This core vocabulary is shared by the Munda branch of Austroasiatic, completely isolated in India, particularly Eastern (Mongoloid) India. The fact that Vietic shares a common core vocabulary with the geographically separated Munda proves the existence of Austroasiatic.

In fact, it is the final convincing argument. Anyone who says that Austroasiatic does not exist is a fool.

Further, the evidence for Austroasiatic, a proven family, is no greater than the evidence for Altaic, and in fact Altaic may be better proven. The “numerals” argument against Altaic is belied by the 13,000 year old Afroasiatic language, the numerals of which are a complete disaster.

Numerals are more often innovated and replaced than people think. Often the old cognates survive in archaic words or words used for related concepts, but it’s not unusual at all for the main term to be an out and out innovation. Most Altaic numerals are innovated, but there are a few cognates. Further, most of the numerals have cognates in related or archaic words.

This is the most archaic layer of Austroasiatic. Some of these peoples are archaic Mongoloids with a strong Australoid component. A branch of these Australoids called Carpenterians went from India to Australia 11,000 YBP and became part of the Aborigines. Another group of archaic Australoids were called Murrayans. They came from Thailand 17,000 YBP and went to Australia. It is not known what Australians looked like before that but no doubt they were quite primitive. It’s long been thought that they have more Erectus component than the rest of us, but I’m not sure that is proven. Certainly their appearance resembles that.

The Murrayans are the core element of the Ainu, who went to the Philippines 16,000 YBP in an unusual, Caucasian appearing type, and then moved through the Southern Japanese islands north into Japan 13,000 YBP, quite possibly replacing an ancient Negrito type already there. This Negrito type definitely existed in Southern China and may well have existed in Korea. Some Australoids or especially Australoid-Mongoloid mixes can have a superficial “Caucasian” appearance, but that’s just parallel development, coincidence or more probably the fact that the set of possible human phenotypes is fairly limited.

It is this coincidentally “Caucasoid” appearance that led many observers to believe that the Ainu were somehow ancient Caucasians (Norwegians, one anthropologist joked) that got stranded from the rest of the Europoid flock way over on the other side of Asia. In fact, the Ainu are Australoid by skull and Mongoloid by genes. Their language, like the Japanese language, has an ancient Austronesian layer that has led many to falsely conclude that the Altaic Japanese language is actually an Austronesian one. The argument is even better with Ainu, the deeper grouping of which has not been shown to my satisfaction.

English as a Genocidal Language Attacking Other Tongues Spoken in the Anglosphere – USA

English has had a genocidal effect on the other languages spoken here, but many non-English languages still survive and some are quite thriving.

Pennsylvania Dutch is still quite alive with 300,000 native speakers. I think it is just a dialect of Rhenish German. It’s actually two separate languages and they can’t understand each other.

There are many other languages in the US that have been taken out by English. Most of the Indian languages spoken here have been driven extinct or moribund by English. A few like Cherokee, Sioux, Navajo, Mohawk, Pueblo, some Alaskan languages, a couple of Indian languages of the US South, are still doing well.

Most of the others are in bad to very bad shape, often moribund with only 10 or fewer speakers, often elderly. Many others are extinct. However, quite a few of these languages have had a small number of middle aged to elderly speakers for the last 25 years, so the situation is somewhat stable at least at the moment.

Almost all Indian languages are not being learned by children. But there are still children being raised speaking Cherokee, Navajo, Pueblo, Mohawk, and some Alaskan and Southern US Indian languages. Navajo is so difficult that when Navajo children show up at school, they still have problems with Navajo. They often don’t get the language in full until they are twelve.

However, there are revitalization efforts going on with many to most Indian languages, with varying amounts of success. Some are developing quite competent native speakers, often young people who learn the language starting at 18-20. I know that Wikchamni Yokuts has a new native speaker, a 23 year old man who learned from an elder who is a native speaker. In California, there is a master apprentice program going on along these lines.

There are a number of preschool programs where elders try to teach the languages to young children. I am not sure how well they are working. There are problems with funding, orthographies and mostly apathy that are getting in the way of a lot of these programs.

There are many semi-speakers. For instance in the tribe I worked with, many of the Indians knew at least a few words, and some of the leadership knew quite a few words. But they could hardly make a sentence.

Eskimo-Aleut languages are still widely spoken in Alaska. I know that Inuktitut is still spoken, and there are children being raised in the language. Aleut is in poor shape.

Hawaiian was almost driven extinct but it was revived with a revitalization program. I understand that the language still has problems. I believe that there are Hawaiian medium schools that you can send your child to. There may be only ~10,000 fluent speakers but there are many more second language speakers with varying fluency.

There are actually some European based languages and creoles spoken in the US. A noncontroversial one is Gullah, spoken on the islands of South Carolina. There may be fewer than 5,000 speakers, but the situation has been stable for 30-35 years. Speakers are all Black. It is an English creole and it is not intelligible with English at all.

There is at least one form of French creole spoken in Louisiana. There is also an archaic form of French Proper called Continental French that resembles French from 1800. It has 2,000 speakers. Louisiana French Creole still has ~50,000 speakers. People worry about it but it has been stable for a long time. Many of the speakers are Black.

Texas German is really just a dialect of German spoken in Texas. There are only a few elderly speakers left.

There are a few Croatian languages spoken in the US that have diverged so dramatically from the languages back home that they are now different languages. The status of these languages varies. Some are in good shape and others are almost dead. One of these is called Strawberry Hill Gorski Kotar Kaikavian, spoken in Missouri. It is absolutely a full separate language and is no longer intelligible with the Gorski Kotar Kaikavian spoken back home.

There are other European languages spoken in the US, but they are not separate from those back home. Most are going out.

There are many Mandarin and especially Cantonese speakers in the US.

There are many Korean speakers in the US, especially in California.

There are a fair number of Japanese speakers in the US, mostly in California.

There are many speakers of Khmer, Lao, Hmong, and Vietnamese in the US. Most are in California but there are Hmong speakers in Minnesota also.

There are quite a few speakers of Arabic languages in the US. Yemeni, Syrian, and Palestinian Arabic are widely spoken. There are many in New York City, Michigan and California.

There are also some Assyrian speakers in the US and there are still children being raised in Assyrian. Most are in California.

There are quite a few Punjabi and Gujarati speakers in the US now. We have many Punjabi speakers in my city.

There are quite a few Urdu speakers here. Most of these speakers are in California.

Obviously there are many Spanish speakers in the US. English is definitely not taking out Spanish. They are mostly in the Southwest, Florida, and New York City, but they are spreading out all across the country now.

There are a few Portuguese speakers in the US. All also speak English. They are mostly in California but some are back east around Massachusetts.

The Sicilian Italian spoken in the US by Italian immigrants is still spoken fairly widely to this day. It has diverged so much from the Sicilian back home that when they go back to Sicily, they are not understood. This is mostly spoken in large cities back east.

There are quite a few Armenian speakers in the US and children are still being raised in Armenian. Most are in California.

There are some Persian speakers in the US, but not a lot. Most of these are in California too.

All of these languages are the same languages as spoken back home.

A Reworking of Chinese Language Classification

This is a huge work that I have lost track of. Really need some Chinese informants to work on this one some more. I look at this work and get a headache just looking at it. It’s 211 pages. This is one of the most extensive overviews of the Chinese languages ever published in English though, I will say that. Work in progress for ten years now. Download as pdf for best experience.


A Look at the Chinese Model of Communism – Market Socialism

You are starting to see a lot of articles in the capitalist press bashing China now, saying their economy is not as good as they say, that it cannot be sustained, and that it is headed for a crash. They base this on a comparison to other Communist countries, but those economies fell behind far before China’s did. China has sustained Communism under various forms, including presently under market socialism, for 70 years now. That’s as long as the Soviet Union, and the Soviets started stagnating a long time before that. China is an example of a smashing success for a Communist country, and the capitalist press is freaking out because that shows that their anti-Communist propaganda has been crap for all of these years.

Incidentally, Deng Xiaoping emphatically stated that he was a Communist. Deng’s idea was to create “a rich Communist country.” In an interview in 2005, a top party official was asked if China was still committed to spreading Communism all over the world. “Of course,” the minister beamed. “That is the purpose of the Communist party (CCP).”

Incidentally, China still has 5-year plans and the whole economy is planned. The business sector has to go along with the plan, and if you do not go along with it, they can confiscate your business. A party committee sits on the board of all large corporations. The government owns every inch of land in China.

The state invests an incredible amount in the economy and also overseas where it makes vast investments. This is because some Chinese government companies are very profitable. A number of Chinese government companies are on the list of largest companies in the world. Capitalists in the US openly complain that they cannot compete with Communist Chinese government corporations, crying that they get subsidies so it’s not fair. So here we have US corporations openly admitting that they can’t compete with Chinese government Communist state-owned companies.

4 Much of the state sector is owned by small municipalities, and this works very well. Further, cities compete against each other. For instance, City A’s steel mill will compete against City B’s steel mill, and both will compete against a private sector steel mill, if there is one. Successful enterprises bring in a lot of money to the city, which it uses to upgrade the city, which results in more workers moving there, which grows the economy more with more workers and more demand.

There are also still a number of pure Maoist villages in China that are run completely on a Maoist line. Everything is done as it was right out of the Mao era. I understand that they do very well, and there is a huge waiting list to move to those villages.

I did a lot of research on China recently, and the party is literally everywhere you look every time you turn around. The party itself still runs many enterprises all over the country, especially in the rural areas. There are party officials in every village and city, and they take a very active role in developing the municipality in every way, including culturally. They have an ear to the ground and are typically very popular in the villages and cities. Party officials lobby the state to try to solve any urgent problem in the area.

The government is always spending a lot of money all over China on public works, on fixing various environmental problems, or on really any societal problem or issue you can think of. This of course includes economic development, which tends to be state-led. I read synopses of many dissertations coming out of Chinese universities, and most were on how to deal with some particular societal problem or issue. Many others dealt with technology and industry. So a lot of the research on technology and industry that is driving economic development is coming straight out of state universities. Instead of leaving it up to the private sector to deal with the problems in society, create public works, and even plan the economy, the government does all of that.

Incidentally, the way the US leaves the planning of the economy, such as it is, up to the private sector is insane. All sensible economic planning in any nation will always be done by the state with a view towards allowing the country to prosper. Capitalists have no interest in whether the country profits or not, so they engage in no economic planning at all. Leaving economic planning up to the whims of the capitalists is economic malpractice.

There are 1,000 protests every day in China. Yes, there is corruption and there are government abuses, but if protests last long enough, the party usually gets alarmed and tries to do something about the problem because they don’t want serious unrest. This is a party that does everything it can to serve the people and try to remain popular with citizens by giving them as much as they can and doing as much for them as possible. The party spends every single day of its rule literally trying to buy off unrest and keep its citizens satisfied.

It’s illegal to be homeless in China. If you end up homeless in China, they will try to put you in a homeless shelter, or if they cannot do that, they will send you back to your village because most homeless are rural migrants who moved to the city. The state is now investing a vast amount of money in the rural areas because these places have been neglected for a long time. The state still wants to own all the land because they want to keep the rural areas as a secure base where rural migrants to the city can always return if they fail in the city.

How can a government in which 4

The state spends an unbelievable amount of money on public works all over the country all the time. Many projects that in the US have been “conclusively proven” to be too costly to be implemented have been done in China quickly and easily. And China’s per capita income in less than 1

Most ethnic minorities are still allowed to support their culture, and in most cases they are allowed to have education in their native language. In these areas, the native language is co-official with Mandarin. In recent years, the Chinese government has begun to support a lot of the Chinese dialects, of which there are over 2,000 main ones, many of which are actually separate languages. Cantonese is still an official language in Hong Kong, and it is widely used in Guangdong. The other major Chinese languages or macrolanguages still have millions or tens of millions of speakers. Lately the Chinese government is telling people they can preserve their dialect as long as they also speak Mandarin. Many schools now have classes in the local dialect.

Cheap medical insurance is available and it covers 8 This is a serious problem but it is much better than earlier in the Deng Era when millions were dying from lack of health care. However, the state still needs to cover everyone. They got away from universal coverage when they moved away from Maoism early in the Deng era.

In addition, tens of thousands of schools, many of which were built during the Cultural Revolution, were closed early in the Deng era. The introduction of a market had a lot of problems in the early days. The capitalist press was cheering wildly as thousands of schools were closed all over China, medical care was cut off from or reduced for hundreds of millions of people, while millions of Chinese died from lack of medical care. This was all cause for celebration! Isn’t capitalism wonderful? What’s millions of humans dying from lack of health care as long as a few rich people can buy ridiculously expensive, useless items that they don’t even need?

A recent good survey done by a Western polling firm found that 8

The economic model of China is called Market Socialism and a lot of modern day Leftists and even Communists support it and agree that this is the way forward for the left and Communist movement. Like all words, the word Communism has no inherent meaning. It means whatever people who use it say it means. So the definition of Communism can clearly change with the times as Communists update their definitions of what the word means.

China cannot be called capitalist in any way. Their model is far more socialist than anything in any European social democracy. It also goes far beyond the US in the New Deal and of course beyond the social liberalism and its more left analogue in Canada, not to mention beyond social democracy in Australia or New Zealand.

Interestingly, Japan is not a capitalist country. They don’t have neoliberalism. That country does not operate on the capitalist mode of development. Instead the resemblance is, I hate to say, to Nazi Germany. Nazi Germany also did not have a capitalist mode of development. I’m not sure what you call it, but it’s not capitalism. For instance, in Japan, the commanding heights of the economy, including almost all of the banks, are owned by the state. The state still plans the economy. They plan the economy together with the business community and the state allocates a lot of funds and loans to areas of the economy it wishes to develop.

There is probably a similar model in South Korea, which also is not capitalist and instead operates on a series of monopolies that are owned currently by large corporations and the government. The South Korean economy is also planned, and the plan is worked out by the government and the business sector working together.

Repost: The Classification of the Vietnamese Language

This ran first a long time ago, but I just sold an ad on this post, so I decided to repost it. Rereading it, it’s a great Historical Linguistics post.

One of the reasons that I am doing this post is that one of my commenters asked me a while back to do a post on the theories of long-range comparison like Joseph Greenberg’s and how well they hold up. That will have to wait for another day, but for now, I can at least show you how some principles of Historical Linguistics work. It is a subfield that I know a thing or two about. I will keep this post pretty non-technical, so most of you ought to be able to figure out what is going on.

Let us begin by looking at some proposals about the classification of Vietnamese. The Vietnamese language has been subject to a great deal of speculation regarding its classification. At the moment, it is in the Mon-Khmer or Austroasiatic family with Khmer, Mon, Muong, Wa, Palaung, Nicobarese, Khmu, Munda, Santali, Pnar, Khasi, Temiar, and some others. The family ranges through Vietnam, Cambodia, Laos, Thailand, Malaysia, Burma, China, and over into Northeastern India. It is traditionally divided into Mon-Khmer and Munda branches. Here is Ethnologue’s split, and here are some other ways of dividing up the family.

The homeland of the Austroasiatics was probably in China, in Yunnan, Southwest China. They moved down from China probably around 5,000 years ago. Some of the most ancient Austroasiatics are probably the Senoi people, who came down from China into Malaysia about 4,000 years ago. Others put the time frame at about 4-8,000 YBP (years before present).

A major fraud has been perpetrated lately based on Senoi Dream Therapy. I discussed it on the old blog, and you can Google it if you are interested. In Anthropology classes we learned all about these fascinating Senoi people, who based their lives around their dreams. Turns out most of the fieldwork was poor to fraudulent like Margaret Mead’s unfortunate sojourn in the South Pacific.

The Senoi resemble Veddas of India, so it is probably true that they are ancient people. Also, their skulls have Australoid features. In hair, they mostly have wavy hair (like Veddoids), a few have straight hair (like Mongoloids) and a scattering have woolly hair (like Negritos). Bottom line is that ancient Austroasiatics were probably Australoid types who resembled what the Senoi look like today.

There has long been a line arguing that the Vietnamese language is related to Sino-Tibetan (the family that Chinese is a part of). Even those who deny this acknowledge that there is a tremendous amount of borrowing from Chinese (especially Cantonese) to Vietnamese. This level of borrowing so long ago makes historical linguistics a difficult field.

Here is an excellent piece by a man who has done a tremendous amount of work detailing his case for Vietnamese as a Sino-Tibetan language. It’s not for the amateur, but if you want to dip into it, go ahead. I spent some time there, and after a while, I was convinced that Vietnamese was indeed a Sino-Tibetan language. One of the things that convinced me is that if borrowing was involved, seldom have I seen such a case for such a huge amount of borrowing, in particular of basic vocabulary. I figured the case was sealed.

Not so fast now. Looking again, and reading some of Joseph Greenberg’s work on the subject, I am now convinced otherwise. There is a serious problem with the cognates between Vietnamese and Chinese, of which there are a tremendous number. This problem is somewhat complex, but I will try to simplify it. Briefly, if Vietnamese is indeed related to Sino-Tibetan, its cognates should be not only with Chinese, but with other members of Sino-Tibetan also. In other words, we should find cognates with Tibetan, Naga, Naxi, Tujia, Karen, Lolo, Kuki, Nung, Jingpho, Chin, Lepcha, etc. We should also find cognates with those languages where we do not find them in Chinese. That’s a little complicated, so I will let you think about it a bit.

Further, the comparisons between Chinese and Vietnamese should be variable. Some should look quite close, while others should look much more distant. So there’s a problem with the Vietnamese as ST theory. The cognates look like Chinese. Problem is, they look too much like Chinese. They look more like Chinese than they should in a genetic relationship. Further, they look like Chinese and only Chinese. Looking for relationships in S-T outside of Chinese, we find few if any. That’s a dead ringer for borrowing from Chinese to Vietnamese. If it’s not clear to you how that is, think about it a bit.

Looking at Mon-Khmer, the case is not so open and shut. There seem to be more cognates with Chinese than with Mon-Khmer. So many more that the case for Vietnamese as AA looks almost silly, and you wonder how anyone came up with it. But let us look again. The cognates with AA and Vietnamese are not just with its immediate neighbors like Cambodian and Khmu but with languages far off in far Eastern India like Munda and Santali. There are words that are found only in the Munda branch in one or two obscure languages that somehow show up again as cognates in Vietnamese. Now tell me how Vietnamese borrowed ancient basic vocabulary from some obscure Munda tongue way over in Northeast India? It did not. How did those words end up in some unheard of NE Indian tongue and also in Vietnamese? Simple. They both descended long ago from a common ancestor.

This is Historical Linguistics. The concepts I have dealt with here are not easy for the non-specialist to figure out, but most smart people can probably get a grasp on them.

A different subject is the deep relationships of AA. Is AA related to any other languages? I leave that as an open question now, though there does appear to be a good case for AA being related to Austronesian. One good piece of evidence is the obscure AA languages found in the Nicobar Islands off the coast of Thailand. Somehow, we see quite a few cognates in Nicobarese with Austronesian. We do not see them in any other branches of AA, only in Nicobarese. This seems odd, and it’s hard to make a case for borrowing. On the other hand, why cognates in Nicobarese and only in Nicobarese? Truth is there are some cognates outside of Nicobarese but not a whole lot.

In historical linguistics, one thing we look at is morphology. Those are parts of words, like the -s plural ending in English. In both AA and Austronesian, we have funny particles called infixes. Those are what in English we might call prefixes or suffixes, except they are stuck in the middle of the word instead of at the end or the beginning. So, in English, we have pre- as a prefix meaning “before” and -er meaning “object that does X verb”. So pre-destination means that our lives are figured out before we are even born. Comput-er and print-er are two objects, one that computes and the other that prints. If we had infixes instead, pre-destination would look something like destin-pre-ation and comput-er and print-er would look something like com-er-pute and prin-er-t.

Anyway, there are some fairly obscure infixes that show up not only in some isolated languages in AA but also in far-flung Austronesian languages in, say, the Philippines. Ever heard of the borrowing of an infix? Neither have I. So were those infixes borrowed, and what are they doing in languages as far away as Thailand and the Philippines, and none in between? Because they got borrowed? When? How? Forget it. Bottom line is that said borrowing did not happen. So what are those infix cognates doing there? Probably ancient particles left over from a common language that derived both Austronesian and AA, probably spoken somewhere in SW China maybe 9,000 years ago or more.

Why is this sort of long-range comparison so hard? For one thing, because after 9,000 years or more, there are hardly any cognates left anymore, due to the fact of language change. Languages change and tend to change at a certain rate. After 1000X years, so much change has taken place that even if two languages were once “sprung from a common source,” in the famous words of Sir William Jones in his epochal lecture to the Asiatic Society in Calcutta on February 2, 1786, there is almost nothing, or actually nothing, left to show of that relationship. Any common words have become so mangled by time that they don’t look much or anything alike anymore.

So are AA and Austronesian related? I think so, but I suppose it’s best to say that it has not been proven yet. This thesis is part of a larger long-range concept known as “Austric.” Paul Benedict, a great scholar, was one of the champions of this. Austric is normally made up of AA, Austronesian, Tai-Kadai (the Thai language and its relatives) and Hmong-Mien (the Hmong and Mien languages). Based on genetics, the depth of Austric may be

What Makes Vietnamese So Chinese? An Introduction to Sinitic-Vietnamese Studies.

A Few Words on Language Endangerment

Carlos Lam: Congrats! However, isn’t language death a rather standard occurrence among societies?

It is, but we linguists don’t really like it. There is quite a debate going on, but the bottom line seems to be that ethnic groups and speaker groups have the right to ownership of their languages. We worry that a lot of speaker groups are being pressured into blowing up their languages prematurely. We like to study these languages and we are not real happy about seeing them vanish into the horizon.

On the other hand, is cultural death a natural thing too? Both cultural death and language death are occurring at rates far beyond the normal background rates. English and some of the other major languages are like weapons of mass destruction in taking out languages. You really want a world with one language and one culture? I don’t.

The best position seems to be that speakers have the right to decide the fate of their languages. If speakers wish to continue speaking their languages, then governments and linguists should help them to preserve and continue to develop their languages. Quite a few groups do not seem to care that their languages are going extinct or they are even driving or drove their languages extinct, and they have the full right to do so. In these cases, we will simply do salvage linguistics. There are many salvage linguistics projects going on in the world today.

You won’t get very far with linguists arguing that language death is a good thing. Most people don’t think so.

Occurring at the same time as language death is a lot of language revitalization. Even fully dead languages are being resurrected from the grave. Also in addition to language death, we are creating new languages all the time. In this piece, I created a net total of 13 new languages. And new languages are occurring on their own.

To give you an example, a group of Crimean Tatars moved from Crimea to Turkey about 200 years ago in the course of the Crimean War. They have been speaking Crimean Tatar in Turkey ever since, for 200 years now. But in that time, Crimean Tatar in Turkey and Crimean Tatar in Ukraine have diverged so much that Turkish Crimean Tatar is now, in my opinion, a fully separate tongue from Ukrainian Crimean Tatar. This is because in Turkey, a lot of Turkish has gone into Turkish Crimean Tatar which is not well understood in the Ukraine. And in the Ukraine, a lot of Russian has gone in which is not well understood in Turkey. Hence, Crimean Tatar speakers in Turkey and Ukraine can no longer understand each other well.

To give you another example, there are many Kazakh speakers in China. However, Kazakh speakers in China can no longer understand Standard Kazakh broadcasts from Kazakhstan because so many Russian loans have gone into Standard Kazakh that it is no longer intelligible to Chinese Kazakh speakers. I learned this too late for my paper, otherwise I would have split Chinese Kazakh off as a separate language.

There are many cases like this. Further, many languages are being discovered. Sonqori, Western Khalaj, Todzhin, Duha, Dukha and Siberian Tatar are just a few of the new languages that I created. Khorasani Turkic was split into three different languages. Dayi was subsumed into one of the Khorasani Turkic languages. Altai was split from one into five separate languages, but the truth is that it is six languages, not five. Salar was split into Western Salar and Eastern Salar. Ili Turki was eliminated because it does not even exist. It is simply a form of Uighur. Kabardian and Balkar, Tatar and Bashkir, Kazakh and Kirghiz were some languages that were eliminated and subsumed into single tongues such as Tatar-Bashkir, Kazakh-Kirghiz, and Kabardian-Balkar. And on and on.

Languages and of course dialects are dying all the time, but new languages are being created by humans and by linguists as we continue our splitting projects. Many lects referred to as dialects are more properly seen as separate languages. Chinese is at least 450 separate languages, only 14 of which are recognized. German may be up to 130 separate languages, only 20 of which are recognized.

There are quite a few more languages to be created out there, but there is a lot of resistance to splitters like me from more conservative linguists and especially from linguistic nationalists. For while Chinese may well be over 1,000 languages, the Chinese government is anti-scientifically insistent that there is but one Chinese language and maybe 2,000 “dialects,” most of which are probably separate languages. The German government is quite resistant to the idea that there is more than one form of German, though I believe Bavarian and Swiss German have official status in Austria and Switzerland.

Is There a Language That is (Nearly) Impossible to Learn to Speak Without Growing up with It?

Answer from Quora

I recently talked to a man who is learning Min Nan, which is a Sinitic language often called a dialect of Chinese. He told me that Min Nan speakers say that the tones are so hard that no one who doesn’t grow up speaking Min Nan ever seems to get it very well.

Cantonese is a similar language that is very difficult. It is much harder than Mandarin, and many native Mandarin speakers say they tried to learn Cantonese and gave up on it because it was too hard. Cantonese has nine tones.

Basque is said to be very hard to learn unless you grow up with it. There is a joke that the Devil spent seven years trying to learn Basque, and he only learned how to say Hello and Goodbye.

Navajo would also be hard. Even Navajo children struggle quite a bit learning Navajo and don’t seem to get it well until maybe age 12. When Navajo children arrive at school, they often do not speak Navajo well yet.

Korean is a surprise, but apparently it is very hard to learn well. A native Korean speaker told me that Korean is so hard that no Korean speaker ever speaks it with 10

Czech is also hard. Even most Czech speakers never get Czech all the way. They have TV contests in Czechoslovakia where they try to stump native speakers with hard forms in the language. If you can last 30 minutes without making even one error, you win. I think only two men have been able to do it, but one was a non-native speaker!

Piraha, spoken in the Brazilian Amazon, is also very hard. Over the course of a few centuries, several Portuguese speaking priests had tried to learn Piraha, but they had all given up because it was too hard. And these same priests had been able to master a number of other Indian languages, but Piraha was just too much. Daniel Everett learned the language and wrote important papers on it. He is one of the only non-native speakers who was able to learn the language.

Tsez, spoken in the Caucasus, is also murderously hard. Every verb can have hundreds of thousands of possible forms. I understand that even native speakers make regular errors when speaking Tsez.

What Race Is This Person (Singapore)?

An interesting phenotype from Singapore.
This is the aunt of a friend of mine. The family is from Singapore. They are part of an ethnic group called the Peranakans, a Southern Chinese group that moved to Malaysia ~600 years ago for some reason, possibly due to overcrowding in Fujian or worse, the terrible wars that periodically raged through the region.

Chinese groups have been leaving from this part of Southern China for a very long time now, especially in the last 200 years. In the past couple of centuries, this part of China has become very crowded. Possibly as a result, wild and vicious wars periodically raged through the area, sometimes killing hundreds of thousands of people. If you study Chinese history, you will hear about these wars a lot. It is not uncommon to read that invaders conquered several large cities and exterminated the whole populations of perhaps 300,000 people, men, women and children. This is how the Chinese have often fought wars. Chinese wars are unbelievably vicious and savage.

The Peranakans moved to Malaysia, and over time, bred in with Dutch and Portuguese and to a lesser extent British Europeans. All three were colonists in the region. I believe that they were Min speakers, but their Hokkien has gotten so changed, in particular from massive borrowings from Malay, that these languages in general are no longer intelligible with Amoy or Taiwanese Hokkien Proper. Most Peranakans now are somewhat Eurasian, Chinese crossed with Dutch, Portuguese and sometimes British.

The Peranakans had their own patriarchal culture and were known as very hard workers, often at manual labor type jobs like farming, timber harvest, or working on rubber plantations. They committed little crime and had very orderly societies. The European colonists marveled at their high level of civilization. They did keep slaves, but they probably treated their slaves better than any slaves have ever been treated, and in many cases, slaves were freed.

Over time, most Peranakans also bred in with Malays. Peranakans are now a Chinese/Malay/European race, but the Asiatic tends to be prominent over the European in the stock. The mixing of cultures over 600 years in Malaysia resulted in some very interesting fine cuisine.

Many of these Chinese migrated to Singapore, where they, along with Teochew speakers (another Min group) and a large group of Cantonese Chinese, form what is known as the Singaporean Chinese, one of the wealthiest and most economically advanced ethnic groups on Earth. There is still a division of labor in Singapore, with Chinese on top, Malays on the bottom, and Southern Indian Dravidian speakers in between. Nevertheless all three groups are substantially mixed by this point. Most Chinese have Malay blood, and a lot of Malays have some Chinese in them. Malays and Indians are now intermarrying quite a bit. There is some ethnic conflict but not a lot, possibly due to the wealth and everyone being so mixed.

Although this woman has a somewhat archaic phenotype (note prognathism), these archaic types are fairly common in Southern China. Many can be seen in the mountains of Yunnan Province. The archaism may be due to incomplete transition from Australoid -> Mongoloid, as the transition happened much later in Southern China than in Northern China, and prominent Australoid types were common in the far south of China only 3-4,000 YBP.

I also believe that this woman may be admixed with Caucasian. And I think the Malay admixture is quite clear. Perhaps I am mistaken, but I think I see some Vedda influence here. That would not be unusual, as Malays were Veddoids only until quite recently, and the Senoi are Veddoids to this day. The Mani Negritos are also still extant.

The transition in Malaysia went from Australoid Negritos (Mani) and Orang Asli -> Australoid Veddas (Senoi) -> Paleomongoloid Southeast Asians (modern Malays). The Malays appear to be aware of this transition, as they state that the Mani and Orang Asli are their ancestors. The bloodline of the Orang Asli goes back 72,000 YBP, so this group has been present in Malaysia since the very first Out of Africa groups, and their archaism is about on a par with the Andaman Islanders, another Australoid group which is also the remains of some of the earliest OOA groups.

The Roots of the Alphabet(s)

Probably most of you do not know that we are all using a variant of the ancient Phoenician alphabet. Actually I am not sure if that is precisely true, as I think the Phoenician alphabet was preceded by an Assyrian one. But at any rate, our classic Western alphabets all came out of the Levant and Mesopotamia in some way or other. Indeed, it is even theorized that many of the syllabaries in use in Central, South and Southeast Asia are also rooted in this original alphabet from the Levant.

Of course, Chinese and consequently Korean and Japanese alphabets have another origin.

One might wish to throw the odd SE Asian orthographies such as Thai, Lao, Burmese, Vietnamese, Javanese, Sundanese and Khmer there, but my understanding is that all of those SE Asian orthographies were actually derived from syllabaries originally designed in India.

A few writing systems such as Georgian, Armenian and Cree may have been created de novo, but I might have to look that up. The only non-Middle Eastern derived orthography that immediately comes to my mind is the Chinese ideographs.

The origins of the Assyrian/Phoenician alphabet appear to have been ultimately in Egyptian hieroglyphics. So the ancient Egyptians really started it all when it comes to writing down words, at least for the West.

Chinese ideographs may date from even earlier. Chinese bone writing goes way back.

Very early European writing such as runic systems and similar systems in Asia such as the Turkic Orkhon inscriptions may not be related to the Phoenician system at all. The Yukaghir in Siberia and the Yi in South China may also have designed de novo systems.

Is There a Language That Is Almost Impossible to Learn Without Growing Up with It?

A question was recently asked on Quora. Here is my answer.

Hello, I recently talked to a Westerner who is learning Min Nan, which is a Sinitic language often called a dialect of Chinese. He already speaks Mandarin, but he told me Min Nan is vastly harder than Mandarin. At age 35, he was studying it 2 hours a day, and at some point, he hit a wall, and he didn’t seem to be making any progress. He kept adding more study hours to the day – four hours, six hours – with little effect. Finally when he was studying it for eight hours a day, he started making some good progress. I believe he said contour tones and tone sandhi were the major roadblocks.

Min Nan speakers say that even Cantonese is easier than Min Nan, and Cantonese is deadly hard. They also say that Min Nan tones are so hard that no one who did not learn Min Nan growing up gets anywhere near native fluency.

Cantonese is a similar language that is very difficult. It is much harder than Mandarin, and many native Mandarin speakers say they tried to learn Cantonese and gave up on it because it was too hard. Cantonese has 9 tones. The general consensus among Chinese is that Cantonese is much harder to learn than Mandarin.

Basque is said to be very hard to learn unless you grow up with it. There is a joke that the Devil spent seven years trying to learn Basque, and he only learned how to say Hello and Goodbye.

Navajo would also be murderously hard. Even Navajo children struggle quite a bit learning Navajo. When they show up at school at age 5-6, they are still struggling with Navajo. There are reports that Navajo children don’t seem to get Navajo well until maybe age 12.

Korean is a surprise, but apparently it is very hard to learn well. A native Korean speaker told me that Korean is so hard that no Korean speaker ever speaks it with 10

As another respondent pointed out, Japanese is also quite notorious, and most Westerners get nowhere near native fluency.

Czech is also hard. Even most Czech speakers never get Czech all the way. They have TV contests in Czechoslovakia where they try to stump native speakers with hard forms in the language. If you can last 30 minutes without making even one error, you win. I think only two men have been able to do it, but one was a non-native speaker! Czech also has a strange r sound found only in one other language on Earth. It is said that no native speaker ever gets this phoneme quite right.

Piraha is also very hard, as another respondent pointed out. Only two non-natives have ever been able to speak Piraha with any fluency. When Daniel Everett went to study the language, he found a number of reports from priests who had tried to learn Piraha since the early 1800’s, and only one had succeeded. The others tried to learn but gave up because they said it was too hard.

Tsez, spoken in the Caucasus, is also murderously hard. Every verb can have tens of thousands of possible forms. Reports say that even native speakers make regular errors when speaking Tsez.

Hardest Languages on Earth Found in China

Article here. Video here. The most complex language on Earth is Fengxian Wu, spoken near Shanghai. The second most complex was the language of the Dong people of Southwest China. The third most complex was the language of the Buyang people, also of SW China. The study took over 10 years and involved teams from both the Anthropology and Linguistics Departments at Fudan University. They also found that the languages of Eurasia were more complex than the languages of Africa or the Americas. The article suggests that this may mean that humans came out of Asia instead of out of Africa. This is an old conceit of the Chinese. They just can’t handle the idea that humans came out of Africa, and they are always fighting this idea. They even claim that their proto-humans have special characteristics that mean that they could not possibly have come out of Africa. The Chinese have long been pushing the multiregional theory of human evolution. I am not sure why the Chinese feel that way, but it may have to do with not wanting to believe that they came from Black people. The Chinese have always thought that China was the center of the world. The ancient belief is that China is the land where all four winds (north, south, east and west) arise. It’s a silly nationalistic conceit and the sooner they dump it, the better. Here is a bit on Fengxian Wu and the Dong or Kam languages, focusing on their difficulty:

Sino-Tibetan Chinese

A recent 15 year survey out of Fudan University utilizing both the departments of Linguistics and Anthropology looked at 579 different languages in order to try to find the most complicated language in the world. The result was that a Wu language dialect (or perhaps a separate language) in the Fengxian district of Shanghai (Fengxian Wu) was the most complex language of all, with 20 separate vowels. The nearest competitor was Norwegian with 16 vowels. Fengxian Wu gets a 5 rating, hardest of all.

Kam-Sui

The Kam languages of the Dong people in southwest China were rated by the Fudan University study referenced above under Wu as the 2nd most complex on Earth. There are 32 stem initial consonants, including oddities like tɕʰ, pʲʰ, ɕ, kʷʰ, ŋʷ, tʃʰ and tsʰ. Note the many contrasts between aspirated and unaspirated voiceless consonants, including bilabial palatalized stops, labialized velar stops, and alveolar affricates. There are an incredible 64 different syllable finals, and 14 others that occur only in Chinese loans. There are an astounding 15 different tones, nine in open syllables and six in checked syllables (entering tones). The main tones are high, high rising, high falling, low, low rising, low falling, mid, dipping and peaking. Kam gets a 5 rating, hardest of all.

A Look at the Korean Language

From here. A look at Korean from the perspective of an English speaker trying to learn the language. The truth is that Korean is one of the hardest languages on Earth for an English speaker to learn. Most agree that Korean is a hard language to learn. The alphabet, Hangul, at least is reasonable; in fact, it is elegant. But there are four different Romanizations – Lukoff, Yale, Horne, and McCune-Reischauer – which is preposterous. It’s best to just blow off the Romanizations and dive straight into Hangul. This way you can learn a Romanization later, and you won’t mess up your Hangul with spelling errors, as can occur if you go from Romanization to Hangul. Hangul can be learned very quickly, but learning to read Korean books and newspapers fast is another matter altogether because you really need to know the hanja or Chinese character that is in back of the Hangul symbols. Bizarrely, there are two different numeral sets used, but one is derived from Chinese so it should be familiar to Chinese, Japanese or Thai speakers who use similar or identical systems. Korean has a wealth of homonyms, and this is one of the tricky aspects of the language. Any given combination of a couple of characters can have multiple meanings. Japanese has a similar problem with homonyms, but at least with Japanese you have the benefit of kanji to help you tell the homonyms apart. With Korean Hangul, you get no such advantage. Similarly, there seem to be many ways to say the same thing in Korean. When people use all of these different ways of saying the same thing, the learner will feel that they are actually saying something different each time, but that is not the case. One problem is that the b, p, j, ch, t and d are pronounced differently than their English counterparts. The consonants, the pachim system and the morphing consonants at the end of the word that slide into the next word make Korean harder to pronounce than any major European language.
Korean has a problem similar to that of Japanese: if you mess up one vowel in a sentence, you render it incomprehensible. The vocabulary is very difficult for an English speaker who does not have knowledge of either Japanese or Chinese. On the other hand, Japanese or Chinese will help you a lot with Korean. Chinese and Japanese speakers can usually learn Korean quickly. Korean is agglutinative and has a subject-topic discourse structure, and the logic of these systems is difficult for English speakers to understand. Meanwhile, Korean has an honorific system that is even wackier than that of Japanese. However, the younger generation is not using the honorifics so much, and a foreigner isn’t expected to know the honorific system anyway. Maybe 6 Speakers of Korean can learn Japanese fairly easily. Korean seems to be a more difficult language to learn than Japanese. There are maybe twice as many particles as in Japanese, the grammar is dramatically more difficult and the verbs are quite a bit harder. The phonemic inventory in Korean is also larger and includes such oddities as double consonants. Korean is rated by language professors as being one of the hardest languages to learn. Korean is rated 5, hardest of all.

A Look at the Japanese Language

From here. A look at Japanese, with a view to how hard it is to learn for a speaker of English. Japanese also uses a symbolic alphabet, but the symbols themselves are sometimes undecipherable in that even Japanese speakers will sometimes encounter written Japanese and will say that they don’t know how to pronounce it. I don’t mean that they mispronounce it; that would make sense. I mean they don’t have the slightest clue how to say the word! This problem is essentially nonexistent in a language like English. The Japanese orthography is one of the most difficult to use of any orthography. There are over 2,000 frequently used characters in three different symbolic alphabets that are frequently mixed together in confusing ways. Due to the large number of frequently used symbols, it’s said that even Japanese adults learn a new symbol a day well into adulthood. The Japanese writing system is probably crazier than the Chinese writing system. Japanese borrowed Chinese characters. But then they gave each character several pronunciations, and in some cases as many as 24. Next they made two syllabaries using another set of characters, then over the next millennia came up with all sorts of contradictory and often senseless rules about when to use the syllabaries and when to use the character set. Later on they added a Romanization to make things even worse. Chinese uses 5-6,000 characters regularly, while Japanese only uses around 2,000. But in Chinese, each character has only one or maybe two pronunciations. In Japanese, there are complicated rules about when and how to combine the hiragana with the characters. These rules are so hard that many native speakers still have problems with them. There are also personal and place names (proper nouns) which are given completely arbitrary pronunciations often totally at odds with the usual pronunciation of the character.
There are some writers, typically of literature, who deliberately choose to use kanji that even Japanese people cannot read. For instance, Ryuu Murakami uses the odd symbols 擽る, 轢く and 憑ける. The Japanese system is made up of three different systems: the katakana and hiragana (the kana) and the kanji, similar to the hanzi used in Chinese. Chinese has at least 85,000 hanzi. The number of kanji is much less than that, but kanji often have more than one meaning in contrast to hanzi. Speaking Japanese is not as difficult as everyone says, and many say it’s fairly easy. However, there is a problem similar to English in that one word can be pronounced in multiple ways, like read and read in English. A common problem is that a sentence uttered by a Japanese language learner, while perfectly grammatical, is still not accepted by Japanese speakers because “we just don’t say it that way.” The Japanese speaker often cannot tell why the unacceptable sentence you uttered is not ok. On the other hand, this problem may be common to more languages than Japanese. There is also a class of Japanese called “honorifics” or “keigo” that is quite hard to master. Honorifics are meant to show respect and to indicate one’s place or status in the social hierarchy. These typically affect verbs but can also affect particles and prefixes. They are usually formed by archaic or highly irregular verbs. However, there are both regular and irregular honorific forms. Furthermore, there are five different levels of honorifics. Honorifics vary depending on who you are and who you are talking to. In addition, gender comes into play. Although it is true that Japanese young people are said to not understand the intricacies of keigo, it is still expected that they know how to speak this well. Consequently, many young Japanese will opt out of certain conversations because they feel that their keigo is not very good.
Books explaining how to use keigo properly have been big sellers among young people in Japan in recent years as young people try to appear classy, refined or cultured. In addition, Japanese born overseas (especially in the US), while often learning Japanese pretty well, typically have a very poor understanding of keigo. Instead of embarrassing themselves by not using keigo or using it wrong, these Japanese speakers often prefer to speak in English to Japanese people rather than bother with keigo-less Japanese. Overcorrection in keigo is also a problem when hypercorrection leads to someone making errors in keigo due to “trying too hard.” This looks like phony or insincere politeness and is often worse than not using keigo at all. One wild thing about Japanese is counting forms. You actually use different numeral sets depending on what it is you are counting! There are dozens of different ways of counting things which involve the use of a complex numerical noun classifier system. Japanese grammar is often said to be simple, but that does not appear to be the case on closer examination. Particles are especially vexing. Verbs engage in all sorts of wild behavior, and adverbs often act like verbs. Meanwhile, honorifics change the behavior of all words. There are particles like ha and ga that have many different meanings. One problem is that all noun modifiers, even phrases, must precede the nouns they are modifying. It’s often said that Japanese has no case, but this is not true. Actually, there are seven cases in Japanese. The aforementioned ga is a clitic meaning nominative, -made is terminative case, -no is genitive and -o is accusative. Take this sentence: The plane that was supposed to arrive at midnight, but which had been delayed by bad weather, finally arrived at 1 AM. The whole relative clause (that was supposed to arrive at midnight, but which had been delayed by bad weather) must precede the noun plane: Was supposed to arrive at midnight, but had been delayed by bad weather, the plane finally arrived at 1 AM.
One of the main problems with Japanese grammar is that it is going to seem so different from the sort of grammar an English speaker is likely to be used to. Speaking Japanese is one thing, but reading and writing it is a whole new ballgame. It’s perfectly possible to know the meaning of every kanji and the meaning of every word in a sentence, but you still can’t figure out the meaning of the sentence because you can’t figure out how the sentence is stuck together in such a way as to create meaning. However, Japanese grammar has the advantage of being quite regular. For instance, there are only four frequently used irregular verbs. Like Chinese, the nouns are not marked for number or gender. However, while Chinese is forgiving of errors, if you mess up one vowel in a Japanese sentence, you may end up with incomprehension. The real problem is that the Japanese you learn in class is one thing, and the Japanese of the street is another. One problem is that in street Japanese, the subject is typically not stated in a sentence. Instead it is inferred through such things as honorific terms or the choice of words you used in the sentence. Probably no one goes crazier on negatives than the Japanese. Particularly in academic writing, triple and quadruple negatives are common, and can be quite confusing. Yet there are problems with the agglutinative nature of Japanese. It’s a completely different syntactic structure than English. Often if you translate a sentence from Japanese to English it will just look like a meaningless jumble of words. Although many Japanese learners feel it’s fairly easy to learn, surveys of language professors continue to rate Japanese as one of the hardest languages to learn. A study by the US Navy concluded that the hardest language the corpsmen had to learn in the course of service was Japanese. However, it’s generally agreed that Japanese is easier to learn than Korean. Japanese speakers are able to learn Korean pretty easily.
Japanese is rated 5, hardest of all. Classical Japanese is much harder to read than Modern Japanese. Though you can get by with far fewer kanji when reading the modern language, you will need a minimum knowledge of 3,000 kanji for reading Classical Japanese, and that’s using a dictionary. There are only about 500-1,000 frequently used characters, but there are countless other words that will come up in your reading, especially, say, special words used in the Imperial Court. Many words have more than one meaning, and unless you know this, you will be lost. 東宮(とうぐう) for instance means Eastern Palace. However, it also means Crown Prince because his residence was to the east of the Emperor’s. The movie The Seven Samurai (set in the late 1500’s) seems to use some sort of Classical Japanese, or at least Classical vocabulary and syntax with modern pronunciation. Japanese language learners say they can’t understand a word of the archaic Japanese used in this movie. Classical Japanese gets 5, hardest of all.

A Look at the Chinese Language

From here. This post will look at how hard it is to learn Chinese for an English speaker. It’s fairly easy to learn to speak Mandarin at a basic level, though the tones can be tough. This is because the grammar is very simple – short words, no case, gender, verb inflections or tense. But with Japanese, you can keep learning, and with Chinese, you hit a wall, often because the isolating syntactic structure is so strangely different from English. Actually, the grammar is harder than it seems. At first it seems simple, like a simplified English with no tense or articles. But the simplicity makes it difficult. No tense means there is no easy way to mark time in a sentence. Furthermore, tense is not as easy as it seems. Sure, there are no verb conjugations, but instead you must learn some particles and special word orders that are used to mark tense. Once you start digging into Chinese, there is a complex layer under all the surface simplicity. There are serial verbs, a complex classifier system, syntax marked by something called topic-prominence, preposed relative clauses, use of verbs rather than adverbs to mark direction, and all sorts of strange stuff. Verb complements can be baffling, especially potential and directional complements. The 了 character can have seemingly countless meanings. You also need to learn quite a bit of vocabulary just to speak simple sentences. Chinese phonology is not as easy as some say. There are too many instances of the zh, ch, sh, j, q, and x sounds in the language such that many of the words seem to sound the same. There is a distinction between aspirated and nonaspirated consonants which does not exist in English. Chinese orthography is probably the hardest orthography of any language. The alphabet uses symbols, so it’s not even a real alphabet. There are at least 85,000 symbols and actually many more (although this is controversial), but you only need to know about 4-6,000 of them, and many Chinese don’t even know 1,000. 
To be highly proficient in Chinese, you need to know 10,000 characters, and probably less than The Communists tried to simplify the system (simplified Mandarin), but they simply decreased the number of strokes needed for each symbol. The Communists’ spelling reform left much to be desired. To make matters worse, there are different ways to write each symbol – different styles of Chinese calligraphy. For instance, Classical Chinese may be written in so called “grass-style” calligraphy or in another style altogether. It’s a real problem when you encounter a symbol you don’t know because there is often no good way to sound out the word as the system simply is not very phonetic. The Chinese alphabet is probably only 2 Furthermore, word boundaries are not obvious, as one character does not necessarily equal one word. Therefore it is hard to tell where one word starts and stops and another one begins. Similarly, a dictionary is not necessarily helpful when trying to read Chinese. You can have a Chinese sentence in front of you along with a dictionary, and the sentence still might not make sense even after looking it up in the dictionary. Furthermore, merely learning how to look up words in the dictionary in the first place takes new Chinese learners several months and learning how to use a dictionary well is typically not possible until a year of study. Even people who have studied for several years sometimes encounter characters that they simply cannot find in the dictionary. In China, dictionary look-up contests are often held, showing that the process is not transparent at all. A good student of Chinese often has more than one dictionary, and some have up to 20 different dictionaries. There are separate dictionaries for simplified and traditional characters and dictionaries that have both. 
There are entire dictionaries just for Classical Chinese particles and others for four character idioms (chéngyǔ), a type of allegorical sayings with two parts (xiēhòuyǔ), and another for proverbs (yànyǔ). There are separate dictionaries for terms that entered Chinese during the Chinese era and others for specifically Buddhist terms. There is an easier way to use a Chinese dictionary called four-part look-up, but it takes a long time to learn it and most learners never master it for whatever reason. To solve all of these problems with the ideographic writing system, numerous romanization schemes have been invented. At last count, there were a dozen or so of them, but a number of those are rarely used. Certainly, there are 2-3 heavily used ones and that is not counting the bopomofo phonetic alphabet used in Taiwan. One of the main problems with these romanization systems is that none of them are very good and they all have serious limitations. Furthermore, the romanization system you studied as a Chinese learner tends to affect your accent in Chinese. Writing the characters is even harder than reading them. One wrong dot or wrong line either completely changes the meaning or turns the symbol into nonsense. The writing system is often so opaque that even native speakers forget how to write the characters of even commonly used words. Even leaving the characters aside, the stylistic and literary constraints required to write Chinese in an eloquent or formal (literary) manner would make your head swim. And just because you can read Chinese does not mean that you can read Classical Chinese (wenyanwen) prose. It’s actually written in a different language, so to learn to read Chinese properly like an educated Chinese person does, you will have to learn not one language but two.
One rejoinder is that Classical Chinese to Chinese people is similar to Greek and Latin to an English speaker, but this is a bad analogy, as Classical Chinese is widely studied in Chinese secondary schools and some of the finest Chinese prose is written in this language (see the Confucius and Mencius examples below). Further, after studying French for a few years, you should be able to read French authors who wrote 300 years ago, but after a similar period of studying Chinese, you will not be able to read Confucius or Mencius. Hence most educated Chinese would be expected to know something about Classical Chinese, and if you wanted to learn Chinese like an educated Chinese speaker, you would have to learn this other language also. In addition, you need to learn Classical Chinese even if you do not aspire to be an educated Chinese speaker because  one encounters Classical Chinese often in modern Chinese society, often in paintings or character scrolls. The tones are often quite difficult for a Westerner to pick up. If you mess up the tones, you have said a completely different word. Often foreigners who know their tones well nevertheless do not say them correctly, and hence, they say one word when they mean another. One problem with the tone system is that when you want to change the meaning of a sentence in a subtle manner via changing intonation of a word, you are bound to change the tone of the word in Chinese. Merely by placing semantic emphasis on a single word, you may deliver a gibberish sentence. Chinese speakers have their own way of using tone as a way of generating subtle semantic meaning, but they do so in an entirely different way than speakers of non-tonal languages do. However, compared to other tone systems around the world, the tonal system in Chinese is comparatively easy. A major problem with Chinese is homonyms. To some extent, this is true in many tonal languages. 
Since Chinese uses short words and is disyllabic, there is a limited repertoire of sounds that can be used. At a certain point, all of the sounds are used up, and you are into the realm of homophones. Tonal distinctions are one way that monosyllabic and disyllabic languages attempt to deal with the homophone problem, but it’s not good enough, since Chinese still has many homophones even with the tones, and in that case, meaning is often discerned by context, stress, rhythm and intonation. Chinese, like French and English, is heavily idiomatic. It’s little known, but Chinese also uses different forms to count different things, like Japanese. There is zero common vocabulary between English and Chinese, so you need to learn a whole new set of lexical forms and have no cognates to fall back on. In addition, nouns often show relatedness or hierarchy. For instance, in English, you can simply say my brother or my sister, but in Chinese, you cannot do this. You have to indicate whether you are speaking of an older or younger sibling.

mei mei: younger sister
jie jie: older sister
ge ge: older brother
di di: younger brother

Many agree that Chinese is the hardest to learn of all of the major languages. In a recent international survey of language professors worldwide, these teachers rated Chinese as the hardest language to learn among languages that are commonly studied. Mandarin gets a 5 rating for extremely hard. However, Cantonese is even harder to learn than Mandarin. Cantonese has nine tones to Mandarin’s four, and in addition, they continue to use a lot of the older traditional Chinese characters that were superseded when China moved to a simplified script in 1949. Furthermore, since non-Mandarin characters are not standardized, Cantonese cannot be written down as it is spoken. In addition, Cantonese has verbal aspect, possibly up to 20 different varieties. Modal particles are difficult in Cantonese. Clusters of up to 3 sentence final particles are very common.
我食咗飯 and 我食咗飯架啦喎 are both grammatical for I have had a meal, but the particles add the meaning of I have already had a meal or answering a question or even to imply I have had a meal, so I don’t need to eat anymore. Cantonese gets a 5.5 rating, close to hardest of all. Min Nan is also said to be harder to learn than Mandarin, as it has a more complex tone system, with five tones on three different levels. Even many Taiwanese natives don’t seem to get it right these days, as it is falling out of favor and many fewer children are being raised speaking it than before. Min Nan gets a 5.5 rating, close to hardest of all. A recent 15 year survey out of Fudan University utilizing both the departments of Linguistics and Anthropology looked at 579 different languages in order to try to find the most complicated language in the world. The result was that a Wu language dialect (or perhaps a separate language) in the Fengxian district of Shanghai (Fengxian Wu) was the most complex language of all, with 20 separate vowels. The nearest competitor was Norwegian with 16 vowels. Fengxian Wu gets a 5.5 rating, close to hardest of all. Classical Chinese is still read by many Chinese people and Chinese language learners. Unless you have a very good grasp on modern Chinese, classical Chinese will be completely wasted on you. Classical Chinese is much harder to read than reading modern Chinese. Classical Chinese covers an era extending over 3,000 years, and to attain a reading fluency in this language, you need to be familiar with all of the characters used during this period along with all of the literature of the period so you can understand all the allusions. Even with a knowledge of Classical Chinese, you need to read it in context. If you are good at Classical Chinese and someone throws you a random section of it, it will take you a good amount of time to figure it out unless you know context. 
The language is much more to the point than Modern Chinese, but this is not as good as it sounds. This simplicity leaves room for ambiguity, and context plays an important role. A joke about some obscure historical or literary anecdote will be lost on you unless you know what it refers to. For reading modern Chinese, you will need at least 5,000 characters, but even then, you will still need a dictionary. With Classical Chinese, there is no upper limit on the number of characters you need to know. The sky is the limit. Classical Chinese gets a 5.5 rating, close to hardest of all.

More On The Hardest Languages To Learn – Non-Indo-European Languages

Caution: This post is very long. It runs to 200 pages on the Net. Updated January 17, 2016.

This is a continuation of the earlier post. I split it up into two parts because it had gotten too long.

The post refers to which languages are the hardest for English speakers to learn, though to some extent, the ratings are applicable across languages. Most Chinese speakers would recognize Spanish as being an easy language, despite its alien nature. And even most Chinese, Navajo, Poles or Czechs acknowledge that their languages are hard to learn. To a certain extent, difficulty is independent of linguistic starting point. Some languages are just harder than others, and that’s all there is to it.

Method, Results and Conclusion. See here.

In this case, 73 non-IE languages were examined.

Ratings: Languages are rated 1-6, easiest to hardest. 1 = easiest, 2 = moderately easy to average, 3 = average to moderately difficult, 4 = very difficult, 5 = extremely difficult, 6 = most difficult of all.

Time needed: Time needed to learn the language “reasonably well”: Level 1 languages = 3 months-1 year. Level 2 languages = 6 months-1 year. Level 3 languages = 1-2 years. Level 4 languages = 2 years. Level 5 languages = 3-4 years, but some may take longer.

Here is a list of the ratings for the languages below as a handy reference.

 

1.0: Malagasy
1.5: Bahasa Indonesian
2.0: Aymara, Malay, Hawaiian, Swahili
3.0: Maori
3.5: Turkish
4.0: Quechua, Maltese, Tamil, Tagalog, Anyi
4.5: Egyptian Arabic, Moroccan Arabic, Amharic, Estonian, Khmer, Lao
5.0: Georgian, Gros Ventre, Karok, MSA Arabic, Hebrew, Somali, Malayalam, Korean, Japanese, Finnish, Skolt Sami, Hungarian, Quiang, Tibetan, Dzongka, Vietnamese, Sedang, Hmong, Tsou, Sakai, Kwaio, Thai, Kam, Buyang, Ga, Ndali, Xhosa, Ndebele, Zulu, Taa, Ju|’hoan
5.5: Cherokee, Lakota, Classical Japanese, Mandarin, Cantonese, Min Nan, Dondan Wu, Basque
6.0: Chechen, Circassian, Tsez, Archi, Tabasaran, Ingush, Ubykh, Abkhaz, Burushaski, Kootenai, Yuchi, Tlingit, Navajo, Slavey, Haida, Salish, Nuxalk, Montana Salish, Straits Salish, Halkomelem, Lushootseed, Cree, Ojibwa, Cheyenne, Arapaho, Wichita, Huamelutec, Hopi, Nahuatl, Comanche, Chinantec, Jalapa Mazatec, Tarina, Bora, Tuyuca, Cubeo, Hixkaryána, Nambikwara, Pirahã, Australian Languages, Berik, Amele, Valpan, Tamazight, Tachelhit, Dahalo, Classical Chinese, Inuktitut, Kalaallisut, Chukchi

Northeast Caucasian, Northwest Caucasian and Kartvelian

Of course the Caucasian languages like Tsez, Tabasaran, Georgian, Chechen, Ingush, Abkhaz and Circassian are some of the hardest languages on Earth to learn.

Chechen and Circassian are rated 6, hardest of all.

Northeast Caucasian

NE Caucasian languages have the uvulars and ejectives of Georgian in addition to pharyngeals, lateral fricatives, and other strangeness. They have noun classes like the Bantu languages (but usually fewer). Nevertheless, they have noun class agreement markers on verbs and adjectives. One thing NE Caucasian has is lots of case. Some languages have 40+ cases. They are built from the ground up via two forms – one a spatial form such as in, on or around and the other a directional motion form such as to, from, through or at.

Tsezic

Tsez has 64-126 different cases, making it by far the most complex case system on Earth! It is one of the few languages on Earth that has two genitive cases – Genitive 1 (-s) and Genitive 2 (-z). Genitive 1 is used when the genitive’s head noun is in absolutive case and Genitive 2 is used when the genitive’s head noun is in any other case. It also has four noun classes. It is said that even native speakers have a hard time picking up the correct inflection to use sometimes.
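The choice between the two genitives can be written down as a one-line rule. The sketch below only encodes the rule as stated above; the function name and the case labels passed in are made up for illustration, and real Tsez forms (pronouns, for example) are not necessarily built by simply attaching these suffixes.

```python
def tsez_genitive_suffix(head_noun_case: str) -> str:
    """Pick the genitive suffix from the case of the genitive's head noun.

    Hypothetical helper: it encodes only the rule described in the text.
    """
    # Genitive 1 (-s) when the head noun is absolutive,
    # Genitive 2 (-z) when the head noun is in any other case.
    if head_noun_case.strip().lower() == "absolutive":
        return "-s"
    return "-z"


print(tsez_genitive_suffix("absolutive"))
print(tsez_genitive_suffix("ergative"))
```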

In Tsez, you need to know a lot of Tsez grammar to communicate at a basic level. Take the sentence:

English: I like your mother.

Tsez: Дāьр деби энийу йетих. (Dǟr debi eniyu yetix.)

In order to speak that sentence in Tsez, you need to know:

  • the words themselves (word order is not as important)
  • that the verb -eti- requires the subject to be in the dative/lative case and the object to be in the absolutive
  • the noun class for eniyu (class II)
  • the dative/lative form of di (I), which is dǟr
  • the genitive 1 form of mi (you), which is debi
  • the congruence prefix y- that corresponds to the noun class of the absolutive argument of the phrase, in this case mother
  • the present tense ending for vowel-final verbs -x

Tsez is rated 6, hardest of all.

Lezgic Archi

Archi has an extremely complex phonology and one of the most complicated grammars on Earth. The extreme fusional aspects and the verbal morphology are what make the grammar so difficult. Every verb root has 1,502,839 possible forms! It is also an ergative language, but there is irregularity in its ergative system.

Some verbs take the typical ergative/absolutive case (absolutive for the subject of an intransitive verb and ergative for the subject of a transitive verb – where the direct object would be in absolutive). In others the subject is in dative rather than the expected ergative/absolutive case. These are usually verbs of perception like love/want, hear, see, feel, and be bored. For instance, the verb:

-эти- = to love/want must have its subject in dative case instead of the expected absolutive or ergative case.

Among non-click languages, Archi has one of the largest consonant inventories, with only the extinct Ubykh having more. There are 26 vowels and between 76 and 82 consonants, depending on the analysis. Five of the six vowels can occur in five varieties: short, pharyngealized, high tone, long (with high tone), and pharyngealized with high tone.

It has many unusual phonemes, including contrasts between several voiceless velar lateral fricatives, voiceless and ejective velar lateral affricates and a voiced velar lateral fricative. The voiceless velar lateral fricative ʟ̝̊, the voiced velar lateral fricative ʟ̝, and the corresponding voiceless and ejective affricates k͡ʟ̝̊ and k͡ʟ̝̊ʼ are extremely unusual sounds, as velar fricatives are not typically laterals.

There are 15 cases, 10 regular cases, five spatial cases and five directional cases. The spatial cases are Inessive (in), Intrative (between), Superessive (above), Subessive (below) and Pertingent (against). The directional cases are Essive (as), Elative (out of), Lative (to/into), Allative (onto), Terminative (specifies a limit) and Translative (indicates change).
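To get a feel for how quickly a two-part case system grows, here is a small sketch that pairs each spatial form with each directional form, using the names listed above. It assumes that every spatial form can combine with every directional form, which the text does not actually confirm for Archi, so treat the total as an upper bound rather than a count of attested cases.

```python
from itertools import product

# Names taken from the lists in the text above.
SPATIAL = ["Inessive", "Intrative", "Superessive", "Subessive", "Pertingent"]
DIRECTIONAL = ["Essive", "Elative", "Lative", "Allative", "Terminative", "Translative"]


def combined_cases(spatial, directional):
    """Return every spatial + directional pairing as a label.

    Assumption: all pairings are possible; a real language may have gaps.
    """
    return [f"{s}-{d}" for s, d in product(spatial, directional)]


combos = combined_cases(SPATIAL, DIRECTIONAL)
print(len(combos))   # 5 spatial forms x 6 directional forms
print(combos[:3])
```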

There are four noun classes:

I Male human
II Female human
III All insects, some animates, and some inanimates
IV Abstracts, some animates, and some inanimates that can only be seen via verbal agreement

Archi is rated 6, hardest of all.

Samur Eastern Samur Lezgi–Aghul–Tabasaran

Tabasaran is rated the 3rd most complex grammar in the world, with 48 different noun cases.

Tabasaran is rated 6, hardest of all.

Nakh Vainakh

Ingush has a very difficult phonology, an extremely complex grammar, and furthermore, is extremely irregular. Ingush also has a proximate/obviate distinction and is the only language in the region that has this feature. Ingush along with Chechen both have a closed class of verbs, an unusual feature in the world’s languages. New verbs are formed by adding a noun to the verb do:

shoot = do gun

Ingush is rated 6, hardest of all.

Kartvelian Karto-Zan

One problem with Georgian is the strange alphabet: ქართულია ერთ ერთი რთული ენა. It also has lots of glottal stops that are hard for many foreigners to speak; consonant clusters can be huge – up to eight consonants stuck together (CCCCCCCCVC) – and many consonant sounds are strange. In addition, there are uvulars and ejectives. Georgian is one of the hardest languages on Earth to pronounce. It regularly makes it onto craziest phonologies lists.

Its grammar is exceedingly complex. Georgian is both highly agglutinative and highly irregular, which is the worst of two worlds. Other agglutinative languages such as Turkish and Finnish at least have the benefit of being highly regular. The verbs in particular seem nearly random with no pattern to them at all. The system of argument and tense marking on the verb is exceedingly complex, with tense, aspect, mood on the verb, person and number marking for the subject, and direct and indirect objects.

Although it is an ergative language, the ergative (or active-stative case marking as it is called) oddly enough is only used in the aorist and perfect tenses where the agent in the sentence receives a different case, while the aorist also masquerades as imperative. In the present, there is standard nominative-accusative marking. A single verb can have up to 12 different parts, similar to Polish, and there are six cases and six tenses.

Georgian also features something called polypersonal agreement, a highly complex type of morphological feature that is often associated with polysynthetic languages and to a lesser extent with ergativity.

In a polypersonal language, the verb has agreement morphemes attached to it dealing with one or more of the verb’s arguments (usually up to four arguments). In a non-polypersonal language like English, the verb either shows no agreement or agrees with only one of its arguments, usually the subject. In a polypersonal language, by contrast, the verb agrees with one or more of the subject, the direct object, the indirect object, the beneficiary of the verb, etc. The polypersonal marking may be obligatory or optional.

In Georgian, the polypersonal morphemes appear as either suffixes or prefixes, depending on the verb class and the person, number, aspect and tense of the verb. The affixes also modify each other phonologically when they are next to each other. In the Georgian system, the polypersonal affixes convey subject, direct object, indirect object, genitive, locative and causative meanings.

g-mal-av-en = they hide you
g-i-mal-av-en = they hide it from you

mal (to hide) is the verb, and the other four forms are polypersonal affixes.

In the case below,

xelebi ga-m-i-tsiv-d-a = My hands got cold.

xelebi means hands. The m marker indicates genitive or my. With intransitive verbs, Georgian often omits my before the subject and instead puts the genitive onto the verb to indicate possession.

Georgian verbs of motion focus on deixis, whether the goal of the motion is towards the speaker or the hearer. You use a particle to signify who the motion is heading towards. If it is heading towards neither of you, you use no deixis marker. You specify the path taken to reach the goal through the use of prefixes called preverbs, similar to “verbal case.” These come after the deixis marker:

up             a-
out            ga-
in             sha-
down into      cha-
across/through garda-
thither        mi-
away           c’a-
or down        da-

Hence:

up towards me = amo-. The deixis marker is mo- and up is a-
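The way a path preverb and the deixis marker stack up can be sketched as simple string building. This is only an illustration: the function name is made up, the table is copied from the list above, and the only combined form the text actually attests is amo-. Any other output is a mechanical concatenation and may not match the real Georgian form, since affixes can change shape when they sit next to each other.

```python
# Path preverbs as listed in the text (hyphens dropped for concatenation).
PREVERBS = {
    "up": "a",
    "out": "ga",
    "in": "sha",
    "down into": "cha",
    "across/through": "garda",
    "thither": "mi",
    "away": "c’a",
}

DEIXIS_TOWARDS_SPEAKER = "mo"  # the marker given in the text for "towards me"


def motion_prefix(path: str, towards_speaker: bool) -> str:
    """Build a motion prefix following the 'amo-' example in the text.

    Assumption: the deixis marker is attached directly to the preverb,
    with no sound changes. Only 'amo-' is confirmed by the text.
    """
    prefix = PREVERBS[path]
    if towards_speaker:
        prefix += DEIXIS_TOWARDS_SPEAKER
    return prefix + "-"


print(motion_prefix("up", towards_speaker=True))
print(motion_prefix("up", towards_speaker=False))
```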

On the plus side, Georgian has borrowed a great deal of Latinate foreign vocabulary, so that will help anyone coming from a Latinate or Latinate-heavy language background.

Georgian is rated 5, extremely difficult.

Northwest Caucasian

All NW Caucasian languages are characterized by a very small number of vowels (usually only two or three) combined with a vast consonant inventory, the largest consonant inventories on Earth. Almost any consonant can be plain, labialized or palatalized. This is apparently the result of an historical process whereby many vowels were lost and their various features became assigned to consonants. For instance, palatalized consonants may have come from Ci sequences and labialized consonants may have come from Cu sequences.

The grammars of these languages are complex. Unlike the NE Caucasian languages, they have simple noun systems, usually with only a handful of cases.

However, they have some of the most complex verbal systems on Earth. These are some of the most synthetic languages in the Old World. Often the entire syntax of the sentence is contained within the verb. All verbs are marked with ergative, absolutive and direct object morphemes in addition to various applicative affixes.

These are akin to what some might call “verbal case.” For instance, in applicative voice systems, applicatives may take forms such as comitative, locative, instrumental, benefactive and malefactive. These roles are similar to the case system in nouns – even the names are the same. So you can see why some call this “verbal case.”

NW Caucasian verbs can be marked for aspect (whether something is momentary, continuous or habitual) and mood (whether something is certain, likely, desired, potential, or unreal). Other affixes can shape the verb in an adverbial sense, to express pity, excess or emphasis.

Like NE Caucasian, they are also ergative.

NW Caucasian makes it onto a lot of craziest language lists.

These are some of the strangest sounding languages on Earth. Of all of these languages, Abaza has the most consonants. Here is a video in the Abaza language.

Ubykh

Ubykh, a Caucasian language of Turkey, is now extinct, but there is one second language speaker, a linguist who is said to have taught himself the language. It has more consonants than any non-click language on Earth – 84 consonant sounds in all. Furthermore, the phonemic inventory allows some very strange consonant clusters.

Ubykh has many rare consonant sounds. One of them is also found only in two of Ubykh’s relatives, Abkhaz and Abaza, and in two other languages, both in the Brazilian Amazon. The pharyngealized labiodental voiced fricative does not exist in any other language. Ubykh often makes it onto weirdest phonologies lists, and it also got a very high score on a study of the weirdest languages on Earth.

Combine that with only two vowel sounds and a highly complex grammar, and you have one tough language.

In addition, Ubykh is both agglutinative and polysynthetic, ergative, and has polypersonal agreement:

Aχʲazbatʂʾaʁawdətʷaajlafaqʾajtʾmadaχ! If only you had not been able to make him take it all out from under me again for them…

There are an incredible 16 morphemes in that nine-syllable word.

Ubykh has only four cases on its nouns, but much case function has shifted over to the verb via preverbs and determinants. It is these preverbs and determinants that make Ubykh monstrously complex. The following are some of the directional preverbs:

  • above and touching
  • above and not touching
  • below and touching
  • below and not touching
  • at the side of
  • through a space
  • through solid matter
  • on a flat horizontal surface
  • on a non-horizontal or vertical surface
  • in a homogeneous mass
  • towards
  • in an upward direction
  • in a downward direction
  • into a tubular space
  • into an enclosed space

There are also some preverbal forms that indicate deixis:

j-  = towards the speaker

Others can indicate ideas that would take up whole phrases in English:

jtɕʷʼaa- = on the Earth, in the Earth

ʁadja ajtɕʷʼaanaaɬqʼa They buried his body. (Lit. They put his body in the earth.)

faa- = out of, into or with regard to a fire.

Amdʒan zatʃətʃaqʲa faastχʷən. I take a brand out of the fire.

Morphemes may be as small as a single phoneme:

wantʷaan They give you to him.

w – 2nd singular absolutive
a – 3rd singular dative
n – 3rd ergative
tʷ – to give
aa – ergative plural
n – present tense
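The breakdown can be checked by gluing the pieces back together. In the sketch below the glosses are the ones given in the text; the root tʷ (to give) is my inference from the word itself, since it is what remains once the other five morphemes are accounted for. The variable and function names are just for illustration.

```python
# (form, gloss) pairs in the order they appear in the word.
# The root "tʷ" is inferred from the full word, not stated separately.
MORPHEMES = [
    ("w", "2nd singular absolutive"),
    ("a", "3rd singular dative"),
    ("n", "3rd ergative"),
    ("tʷ", "to give"),
    ("aa", "ergative plural"),
    ("n", "present tense"),
]


def assemble(morphemes):
    """Concatenate morpheme forms into a single word."""
    return "".join(form for form, _gloss in morphemes)


word = assemble(MORPHEMES)
print(word)
for form, gloss in MORPHEMES:
    print(f"{form:>3}  {gloss}")
```

Six morphemes in an eight-letter word, and four of them are a single phoneme each.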

Adverbial suffixes are attached to words to form meanings that are often formed by aspects or tenses in other languages:

asfəpχa = I need to drink it.
asfəfan = I can drink it.
asfəɡʲan = I drink it all the time.
asfəlan = I am drinking it all up.
asfətɕʷan = I drink it too much.
asfaajən = I drink it again.

Nouns and verbs can transform into each other. Any noun can turn into a stative verb:

məzə = child

səməzəjtʼ I was a child. (Lit. I child-was. Child-was is a verb – to be a child.)

By the same token, many verbs can become nouns via the use of a nominal affix:

qʼa = to say

səqʼa what I say (Lit. that which I say: my speech, my words, my language, my orders, etc.)

Number is marked on the verb via a verbal suffix and is only marked on the noun in the ergative case.

However, it does lack the convoluted case systems of the Caucasian languages next door and there is no grammatical gender.

Ubykh is rated 6, hardest of all.

Abkhaz-Abazin

Abkhaz is an extremely difficult language to learn. Each basic consonant has eight different positions of articulation in the mouth. Imagine how difficult that would be for an Abkhaz child with a speech impediment. Abkhaz seems to put agreement markers on just about everything in the language. Abkhaz makes it onto many craziest language lists, and it recently got a very high score on a weirdest language study.

Abkhaz is rated 6, hardest of all.

Burushaski

Burushaski is often thought to be a language isolate, related to no other languages. However, I think it is Dene-Caucasian. It is spoken in the Himalaya Mountains of far northern Pakistan in an area called the Hunza. Its verb conjugation is complex, it has a lot of inflections, there are complicated ways of making sentences depending on many factors, and it is an ergative language, which is hard to learn for speakers of non-ergative languages. In addition, there are very few to no cognates for the vocabulary.

Burushaski is rated 6, hardest of all.

American Indian Languages

American Indian languages are also notoriously difficult, though few try to learn them in the US anyway. In the rest of the continent, they are still learned by millions in many different nations. You almost need to learn these as a kid. It’s going to be quite hard for an adult to get full competence in them.

One problem with these languages is the multiplicity of verb forms. For instance, the standard paradigm for the overwhelming number of regular English verbs is a maximum of five forms:

steal steals stealing stole stolen

Many Amerindian languages have over 1,000 forms of each verb in the language.

Kootenai

Yet the Salishans (see below) always considered the neighboring language Kootenai to be too hard to learn. Kootenai also has a distinction between proximate/obviate along with direct/inverse alignment, probably from contact with Algonquian.

However, the Kootenai direct/inverse system is less complex than Algonquian’s, as it is present only in the 3rd person. Kootenai also has a very strange feature in that they have particles that look like subject pronouns, but these go outside of the full noun phrase. This is a very rare feature in the world’s languages. Kootenai scored very high on a weirdest language survey.

Kootenai is an isolate spoken in Idaho by 100 people.

Kootenai is rated 6, hardest of all.

Yuchi

Yuchi is a language isolate spoken in the Southern US. They were originally located in Eastern Tennessee and were part of the Creek Confederacy at one time. Yuchi is nearly extinct, with only five remaining speakers.

Yuchi has noun genders or classes based on three distinctions of position: standing, sitting or lying. All nouns are either standing, sitting or lying. Trees are standing, and rivers are lying, for instance. If it is taller than it is wide, it is standing. If it is wider than it is tall, it is lying.

If it is about as wide as it is tall, it is sitting. All nouns are one of these three genders, but you can change the gender for humorous or poetic effect. A linguist once asked a group of female speakers whether a penis was standing, sitting or lying. After lots of giggles, they said the default was sitting, but you could say it was standing or lying for poetic effect.
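The shape rule for the default gender can be sketched as a small function. This is a toy model, not how speakers actually assign classes: the function name is made up, and the 10% tolerance for "about as wide as it is tall" is purely my assumption, since the text gives no threshold. It also ignores the humorous or poetic overrides mentioned above.

```python
def yuchi_position_class(height: float, width: float, tolerance: float = 0.1) -> str:
    """Guess the default Yuchi position class from an object's proportions.

    Assumption: 'about as wide as tall' means the height/width ratio is
    within `tolerance` of 1. The real cutoff is not given in the text.
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    ratio = height / width
    if abs(ratio - 1.0) <= tolerance:
        return "sitting"
    return "standing" if ratio > 1.0 else "lying"


print(yuchi_position_class(20.0, 1.0))    # something tree-shaped
print(yuchi_position_class(1.0, 500.0))   # something river-shaped
print(yuchi_position_class(1.0, 1.0))
```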

Also all Yuchi pronouns must make a distinction between age (older or younger than the speaker) and ethnicity (Yuchi or non-Yuchi).

Yuchi gets a 6 rating, hardest of all.

Dene-Yeniseian Na-Dene Athabascan-Eyak Tlingit

Tlingit is probably one of the hardest, if not the hardest, language in the world. Tlingit is analyzed as partly synthetic, partly agglutinative, and sometimes polysynthetic. It has not only suffixes and prefixes, but it also has infixes, or affixes in the middle of words.

‘eech = to pick

All prefixes must be in proper order for the word to work.

tuyakaoonagadagaxayaeecheen. I am usually picking, on purpose, a long object through the hole while standing on a table.

tuyakaoonagootxayaeecheen. I am usually being forced to pick a long object through the hole while standing on a table.

tuyaoonagootxawa’eecheen. I am usually picking the edible long object through the hole while standing on a table.

Tlingit has a pretty unusual phonology. For one thing, it is the only language on Earth with no l. This despite the fact that it has five other laterals: dl (tɬ), tl (tɬʰ), tl’ (tɬʼ), l (ɬ) and l’ (ɬʼ). The tɬʼ and ɬʼ sounds are rare in the world’s languages. ɬʼ is otherwise only found in the wild NW Caucasian languages. It also has two labialized glottal consonants, ʔʷ and hw (hʷ).

Tlingit gets a 6 rating, hardest of all.

Athabascan Southern

Navajo has long, short and nasal vowels, a tone system and a grammar totally unlike anything in Indo-European. A stem of only four letters or so can take enough affixes to fill a whole line of text.

Navajo is a polysynthetic language. In polysynthetic languages, very long words can denote an entire sentence, and it’s quite hard to take the word apart into its parts and figure out exactly what they mean and how they go together. The long words are created because polysynthetic languages have an amazing amount of morphological richness. They put many morphemes together to create a word out of what might be a sentence in a non-polysynthetic language.

Some Navajo dictionaries have thousands of entries of verbs only, with no nouns. Many adjectives have no direct translation into Navajo. Instead, verbs are used as adjectives. A verb has no particular form like in English – to walk. Instead, it assumes various forms depending on whether or not the action is completed, incomplete, in progress, repeated, habitual, one time only, instantaneous, or simply desired. These are called aspects. Navajo must have one of the most complex aspect systems of any language:

The Primary aspects:

Momentaneous – punctually (takes place at one point in time)
Continuative – an indefinite span of time & movement with a specified direction
Durative – over an indefinite span of time, non-locomotive uninterrupted continuum
Repetitive – a continuum of repeated acts or connected series of acts
Conclusive – like durative but in perfective terminates with static sequel
Semelfactive – a single act in a repetitive series of acts
Distributive – a distributive manipulation of objects or performance of actions
Diversative – a movement distributed among things (similar to distributive)
Reversative – results in directional change
Conative – an attempted action
Transitional – a shift from one state to another
Cursive – progression in a line through time/space (only progressive mode)

The subaspects:

Completive – an event/action simply takes place (similar to the aorist tense)
Terminative – a stopping of an action
Stative – sequentially durative and static
Inceptive – beginning of an action
Terminal – an inherently terminal action
Prolongative – an arrested beginning or ending of an action
Seriative – an interconnected series of successive separate & distinct acts
Inchoative – a focus on the beginning of a non-locomotion action
Reversionary – a return to a previous state/location
Semeliterative – a single repetition of an event/action

The tense system is almost as wild as the aspectual system.

For instance, the verb ndideesh means to pick up or to lift up. But it varies depending on what you are picking up:

ndideeshtiil = to pick up a slender stiff object (key, pole)
ndideeshleel = to pick up a slender flexible object (branch, rope)
ndideesh’aal = to pick up a roundish or bulky object (bottle, rock)
ndideeshgheel = to pick up a compact and heavy object (bundle, pack)
ndideeshjol = to pick up a non-compact or diffuse object (wool, hay)
ndideeshteel = to pick up something animate (child, dog)
ndideeshnil = to pick up a few small objects (a couple of berries, nuts)
ndideeshjih = to pick up a large number of small objects (a pile of berries, nuts)
ndideeshtsos = to pick up something flexible and flat (blanket, piece of paper)
ndideeshjil = to pick up something I carry on my back
ndideeshkaal = to pick up anything in a vessel
ndideeshtloh = to pick up mushy matter (mud)

But picking up is only one way of handling the 12 different consistencies. One can also bring, take, hang up, keep, carry around, turn over, etc. objects. There are about 28 different verbs one can use for handling objects. If we multiply these verbs by the consistencies, there are over 300 different verbs used just for handling objects.
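The "over 300" figure is just multiplication, as this tiny sketch shows. It takes the two rough counts from the text at face value (12 consistencies, about 28 handling verbs) and assumes every verb combines with every consistency, which may overstate the real total. The names are only for illustration.

```python
CONSISTENCY_CLASSES = 12   # object consistencies listed in the text
HANDLING_VERBS = 28        # "about 28" handling verbs, per the text


def handling_verb_forms(verbs: int, classes: int) -> int:
    """Upper bound on handling verbs if every verb pairs with every class."""
    return verbs * classes


total = handling_verb_forms(HANDLING_VERBS, CONSISTENCY_CLASSES)
print(total)
```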

In Navajo textbooks, there are conjugation tables for inflecting words, but it’s pretty hard to find a pattern there. One of the most frustrating things about Navajo is that every little morpheme you add to a word seems to change everything else around it, even in both directions.

Navajo is said to have a very difficult system for counting numerals.

There is also a noun classifier system with more than a dozen classifiers that affect inflection. This is quite a few classifiers even for a noun classifier language and is similar to African languages like Zulu. In addition, it has the strange direct/inverse system.

To add insult to injury, Navajo is an ergative language.

Navajo also has an honorifics or politeness system similar to Japanese or Korean.

Navajo also has the odd feature where the word niinaa (because) can be analyzed as a verb.

X áhóót’įįd biniinaa… Because X happened…

Shiniinaa sits’il. It broke into pieces because of me.

In the latter sentence, the only way we know that 1st singular was involved is because of the person marking on niinaa.

There are 25 different kinds of pronominal prefixes that can be piled onto one another before a verb base.

Navajo has a very strange feature called animacy, where nouns take certain verbs according to their rank in the hierarchy of animation which is a sort of a ranking based on how alive something is. Humans and lightning are at the top, children and large animals are next and abstractions are at the bottom.

All in all, Navajo, even compared to other polysynthetic languages, has some of the most incredibly complicated polysynthetic morphology of any language. On craziest grammar and craziest language lists, Navajo is typically listed.

It is even said that Navajo children have a harder time learning Navajo than children learning other languages, but Navajo kids definitely learn the language. As with Hopi below, even linguists find the best Navajo grammars difficult or even impossible to understand.

However, Navajo is quite regular, a common feature in Amerindian languages.

Navajo is rated 6, hardest of all.

Northern

Slavey, a Na-Dene language of Canada, is hard to learn. It is similar to Navajo and Apache. Verbs take up to 15 different prefixes. All Athabascan languages have wild verbal systems. It also uses a completely different alphabet, a syllabic one designed for Canadian Indians.

Slavey is rated 6, hardest of all.

Haida

Haida is often thought to be a Na-Dene language, but proof of its status is lacking. If it is Na-Dene, it is the most distant member of the family. Haida is in the competition for the most complicated language on Earth, with 70 different suffixes.

Haida is rated 6, hardest of all.

Salishan

The Salishan languages spoken in the Northwest have a long reputation for being hard to learn, in part because of long strings of consonants, in one case 11 consonants long. Salish languages are the only languages on Earth that allow words without sonorants.

Many of the vowels and consonants are not present in most of the world’s widely spoken languages. The Salish languages are, like Chukchi, polysynthetic. Some translations treat all Salish words as either verbs or phrases. Some say that Salish languages do not contain nouns, though this is controversial. The verbal system of Salish languages is absurdly complex.

All Salishan languages are rated 6, hardest of all.

Nuxálk (Bella Coola)

Nuxálk is a notoriously difficult Salishan Amerindian language spoken in British Columbia. It is famous for having some really wild words and even sentences that don’t seem to have any vowels in them at all. For instance:

xłp̓x̣ʷłtłpłłskʷc̓  (xɬpʼχʷɬtʰɬpʰɬːskʷʰt͡sʼ in IPA) He had a bunchberry plant.

sxs seal fat

Here are some more odd words and sentences:

smnmnmuuc mute

Nuyamłamkis timantx tisyuttx ʔułtimnastx. The father sang the song to his son.

Musis tiʔimmllkītx taq̓lsxʷt̓aχ. The boy felt that rope.

However, this word is not typically used by speakers and by no means do most words consist of all consonants. The language sounds odd when spoken. It has been described as “whispering while chewing on a granola bar” (see the video sample under Montana Salish below).

These wild consonant clusters are even crazier than the ones in Ubykh and NW Caucasian. In fact, the nutty consonant clusters in Salish are causing a debate in linguistics about whether or not the syllable is even a universal phenomenon in language, as some Salish words and phrases appear to lack syllables. Some Berber dialects have raised similar questions about the syllable.

Nuxálk makes it onto lists of the craziest phonologies on Earth.

Nuxálk is rated 6, hardest of all.

Interior Salish Southern

Montana Salish is said to be just as hard to learn as Nuxálk. Spokane (Montana Salish) has combining and independent forms with the same meaning:

spim’cn = mouth
-cin = mouth

Montana Salish makes it onto a lot of craziest grammars lists.

This link shows an elder on the Flathead Indian Reservation in Montana, Steven Smallsalmon, speaking Montana Salish. He also leads classes in the language. This is probably one of the strangest sounding languages on Earth.

Montana Salish is rated 6, hardest of all.

Central

Straits Salish has an aspectual distinction between persistent and nonpersistent. Persistent means the activity continues after its inception as a state. The persistent morpheme is . The result is similar to English:

figure out – nonpersistent know – persistent

look at – nonpersistent watch – persistent

take – nonpersistent hold – persistent

The persistent morpheme is referred to as a “parasitic morpheme” and only occurs in a stem that has an underlying ə, which serves as a “host” for the morpheme.

How strange.

The Saanich dialect of Straits Salish is often listed in the rogue’s gallery of craziest grammars on Earth. The writing system is often listed as one of the worst out there. In addition, Saanich makes it onto craziest grammars lists for the parasitic morphemes and for having no distinction between nouns and verbs!

Straits Salish gets a 6 rating, hardest of all.

Halkomelem, spoken by 570 people around Vancouver, British Columbia, is widely considered to be one of the hardest languages on Earth to learn. In Halkomelem, many verbs have an orientation towards water. You can’t just say, She went home. You have to say how she was going home in relation to nearby bodies of water. So depending on where she was walking home in relation to the nearest river, you would say:

  • She was farther away from the water and going home.
  • She was coming home in the direction away from the water.
  • She was walking parallel to the flow of the water downstream.
  • She was walking parallel to the flow of the water upstream.

Halkomelem gets a 6 rating, hardest of all.

Lushootseed

Lushootseed is said to be just as hard to learn as Nuxálk. Lushootseed is one of the few languages on Earth that has no nasals at all, except in special registers like baby talk and the archaic speech of mythological figures. It also has laryngealized glides and nasals: w ̰ , m̥ ̰ , and n̥ ̰ .

Lushootseed is rated 6, hardest of all.

Iroquoian

All Iroquoian languages are extremely difficult, but Athabaskan is probably even harder. Siouan languages may be equal to Iroquoian in difficulty.

Compare the same phrases in Tlingit (Athabaskan) and Cherokee (Iroquoian).

Tlingit:

kutíkusa‘áat = It’s cold outside.
kutíkuta‘áat = It’s cold right now.

In Tlingit, you can add or modify affixes at the beginning as prefixes, in the middle as infixes and at the end as suffixes. In the above example, you changed a part of the word within the clause itself.

Cherokee:

doyáditlv uyvtlv = It is cold outside. (Lit. Outside it is cold.)
ka uyvtlv = It is cold now. (Lit. Now it is cold.)

As you can see, Cherokee is easier.

Cherokee

Cherokee is very hard to learn. In addition to everything else, it has a completely different alphabet. It’s polysynthetic, to make matters worse. It is possible to write a Cherokee sentence that somehow lacks a verb. There are five categories of verb classifiers. Verbs needing classifiers must use one. Each regular verb can have an incredible 21,262 inflected forms! All verbs contain a verb root, a pronominal prefix, a modal suffix and an aspect suffix. In addition, verbs inflect for singular, plural and also dual. For instance:

ᎠᎸᎢᎭ   a'lv'íha 

You have 126 different forms:
ᎬᏯᎸᎢᎭ  gvyalv'iha     I tie you up
ᏕᎬᏯᎸᎢᎭ degvyalviha  I'm tying you up
ᏥᏯᎸᎢᎭ  jiyalv'ha        I tie him up
ᎦᎸᎢᎭ                          I tie it
ᏍᏓᏯᎸᎢᎭ sdayalv'iha  I tie you (dual)
ᎢᏨᏯᎢᎭ  ijvyalv'iha    I tie you (pl)
ᎦᏥᏯᎸᎢᎭ gajiyalv'iha  I tie them (animate)
ᏕᎦᎸᎢᎭ                        I tie them up (inanimate)
ᏍᏆᎸᎢᎭ  squahlv'iha    You tie me
ᎯᏯᎸᎢᎭ  hiyalv'iha     You're tying him
ᎭᏢᎢᎭ   hatlv'iha         You tie it
ᏍᎩᎾᎸᎢᎭ skinalv'iha    You're tying me and him
ᎪᎩᎾᏢᎢᎭ goginatlv'iha  They tie me and him etc.

Let us look at another form:

to see

I see myself           gadagotia
I see you                gvgohtia
I see him/her           tsigotia
I see it                    tsigotia
I see you two          advgotia
I see you (plural)    istvgotia
I see them (live)    gatsigotia
I see them (things) detsigotia

You see me                     sgigotia
You see yourself              hadagotia
You see him/her              higo(h)tia
You see it                        higotia
You see another and me  sginigotia
You see others and me    isgigotia
You see them (living)      dehigotia
You see them (living)      gahigotia
You see them (things)     detsigotia

He/she sees me                    agigotia
He/she sees you                   tsagotia
He/she sees you                   atsigotia
He/she sees him/her            agotia
He/she sees himself/herself  adagotia
He/she sees you + me          ginigotia
He/she sees you two             sdigotia
He/she sees another + me    oginigotia
He/she sees us (them + me) otsigotia
He/she sees you (plural)       itsigotia
He/she sees them                 dagotia

You and I see him/her/it                igigotia
You and I see ourselves                 edadotia
You and I see one another             denadagotia/dosdadagotia
You and I see them (living)           genigotia
You and I see them (living or not) denigotia

You two see me                           sgninigotia
You two see him/her/it                 esdigotia
You two see yourselves                sdadagotia
You two see us (another and me) sginigotia
You two see them                        desdigotia

Another and I see you             sdvgotia
Another and I see him/her       osdigotia
Another and I see it                 osdigotia
Another and I see you-two      sdvgotia
Another and I see ourselves    dosdadagotia
Another and I see you (plural) itsvgotia
Another and I see them           dosdigotia

You (plural) see me        isgigoti
You (plural) see him/her etsigoti

They see me                    gvgigotia
They see you                   getsagotia
They see him/her             anigoti
They see you and me       geginigoti
They see you two             gesdigoti
They see another and me gegigotia/gogenigoti
They see you (plural)       getsigoti
They see them                 danagotia
They see themselves       anadagoti

I will see datsigoi
I saw      agigohvi

He/she will see dvgohi
He/she saw      ugohvi

Number is marked for inclusive vs. exclusive and there is a dual. 3rd person plural is marked for animate/inanimate. Verbs take different object forms depending on if the object is solid/alive/indefinite shape/flexible. This is similar to the Navajo system.

Cherokee also has lexical tone, with complex rules about how tones may combine with each other. Tone is not marked in the orthography. The phonology is noted for somehow not having any labial consonants.

However, Cherokee is very regular. It has only three irregular verbs. It is just that there are many complex rules.

Cherokee is rated 5.5, close to most difficult of all.

Iroquoian Northern Iroquoian Five Nations-Huronian-Susquehannock Huronian Huron-Petun

Wyandot, a dormant language that has been extinct for about 50 years, has some unbelievably complex structures. Let us look at one of them. Wyandot is the only language on Earth that allows negative sentences that somehow do not contain a negative morpheme. Wyandot makes it onto craziest grammars lists. (To be continued).

Siouan-Catawban Siouan Mississippi Valley-Ohio Valley Siouan Mississippi Valley Siouan Dakota

Lakota and other Siouan languages may well be as convoluted as Iroquoian. In Lakota, all adjectives are expressed as verbs. Something similar is seen in Nahuatl.

Ógle sápe kiŋ mak’ú. The shirt it is black he gave it to me. He gave me the black shirt.

In the above, it is black is a stative verb and serves as an adjective.

Ógle kiŋ sabyá mak’ú. Shirt the blackly he gave it to me. He gave me the black shirt. (Lit. He gave me the shirt blackly.)

Blackly is an adverb serving as an adjective above.

Lakota gets a 5.5 rating, close to hardest of all.

Algic Algonquian

All Algonquian languages have distinctions between animate/inanimate nouns, in addition to having proximate/obviative and direct/inverse distinctions. However, most languages that have proximate/obviative and direct/inverse distinctions are not as difficult as Algonquian.

Proximate/obviative is a way of marking the 3rd person in discourse. It distinguishes between an important 3rd person (proximate) and a more peripheral 3rd person (obviative). Animate nouns and possessor nouns tend to be marked proximate while inanimate nouns and possessed nouns tend to be marked obviative.

Direct/inverse is a way of marking discourse in terms of saliency, topicality or animacy. Whether one noun ranks higher than another in terms of saliency, topicality or animacy determines whether that noun ranks higher in the person hierarchy. It is used only in transitive clauses. When the subject has a higher ranking than the object, the direct form is used. When the object has a higher ranking than the subject, the inverse form is used.
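The choice between direct and inverse can be sketched in a few lines of Python. This is only an illustration: the hierarchy below (2nd person over 1st over proximate 3rd over obviative 3rd) is an assumption made for the example, and the names HIERARCHY, rank and verb_direction are invented here. Real Algonquian languages differ in the details.

```python
# Hypothetical person hierarchy, highest rank first. This ordering is an
# assumption for illustration only; individual languages differ.
HIERARCHY = ["2nd", "1st", "3rd proximate", "3rd obviative"]


def rank(participant: str) -> int:
    """Return the position in the hierarchy (0 is the highest rank)."""
    return HIERARCHY.index(participant)


def verb_direction(subject: str, obj: str) -> str:
    """Choose the direct or inverse form for a transitive clause."""
    if rank(subject) < rank(obj):
        return "direct"   # the subject outranks the object
    if rank(subject) > rank(obj):
        return "inverse"  # the object outranks the subject
    raise ValueError("subject and object must differ in rank")


print(verb_direction("1st", "3rd proximate"))            # direct
print(verb_direction("3rd obviative", "3rd proximate"))  # inverse
```

The point of the sketch is that the verb form depends on the relative rank of the two participants, not on which one happens to be the subject.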

Central Algonquian Cree-Montagnais

Cree is very hard to learn. It is written in a variety of different ways with different alphabets and syllabic systems, complicating matters even further. The syllabic alphabet has many problems and is often listed as one of the worst scripts out there. Cree is polysynthetic and has long, short and nasal vowels and aspirated and unaspirated voiceless consonants. Words are divided into metrical feet, the rules for determining stress placement in words are quite complex and there is lots of irregularity. Vowels fall out a lot, or syncopate, within words.

Cree adds noun classifiers to the mix, and both nouns and verbs are marked as animate or inanimate. In addition, verbs are marked for transitive and intransitive. In addition, verbs get different affixes depending on whether they occur in main or subordinate clauses.

Cree is rated 6, hardest of all.

Ojibwa-Potawatomi

Ojibwa is said to be about as hard to learn as Cree, as it is very similar.

Ojibwa is rated 6, hardest of all.

Plains Algonquian Cheyenne

Cheyenne is well-known for being a hard Amerindian language to learn. Like many polysynthetic languages, it can have very long words.

Náohkêsáa’oné’seómepêhévetsêhésto’anéhe. I truly don’t know Cheyenne very well.

Cheyenne is quite regular, but it has so many complex rules that it is hard to figure them all out.

Cheyenne is rated 6, hardest of all.

Arapahoan

Arapaho has a strange phonology. It lacks phonemic low vowels. The vowel system consists of i, ɨ~u, ɛ, and ɔ. Each vowel also has a corresponding long version. In addition, there are four diphthongs, ei, ou, oe and ie, several triphthongs, eii, oee, and ouu, as well as extended sequences of vowels such as eee with stress on either the first or the last vowel in the combination. Long vowels of various types are common:

Héétbih’ínkúútiinoo. I will turn out the lights.

Honoosóó’. It is raining.

There is a pitch accent system with normal, high and allophonic falling tones. Arapaho words also undergo some very wild sound changes.

Arapaho is rated 6, hardest of all.

Gros Ventre has a phonological system similar to that of Arapaho, with sound changes that are just as elaborate.

Gros Ventre is rated 5, extremely difficult.

Caddoan Northern Wichita

Wichita has many strange phonological traits. It has only one nasal. Labials are rare and appear in only two roots. It also may have only three vowels, i, e, and a, with only height as a distinction. Such a restricted vertical vowel distribution is only found in NW Caucasian and the Papuan Ndu languages. There is apparently a three-way contrast in vowel length – regular, long and extra-long. This is only found in Mixe and Estonian.

There are some interesting tenses. Perfect tense means that an act has been carried out. The strange intentive tense means that one hopes or hoped to carry out an act. The habitual tense means one regularly engages in the activity, not that one is doing so at the moment.

Long consonant clusters are permitted.

kskhaːɾʔa

nahiʔinckskih while sleeping

There are many cases where a CVɁ sequence has been reduced to CɁ due to loss of the vowel, resulting in odd words such as:

ki·sɁ bone

Word order is determined by novelty or importance.

hira:wisɁiha:s kiyari:ce:hire: Our ancestors God put us on this Earth.

weɁe hira:rɁ tiɁi na:kirih God put our ancestors on this Earth.

In the sentence above, “our ancestors” is actually the subject, so it makes sense that it comes first.

Wichita has inclusive and exclusive 1st person plural and has singular, dual and plural. There is an evidential system where if you say you know something, you must say how you know it – whether it is personal knowledge or hearsay.

Wichita gets a 6 rating, hardest of all.

Hokan Tequistlatecan Coastal Chontal

Huamelutec or Lowland Oaxaca Chontal has the odd glottalized fricatives fʼ, sʼ, ɬʼ and xʼ as its only glottalized consonants. They alternate with plain f, s, l and x. , ɬʼ and are extremely rare in the world’s languages, usually only found in 2-3 other languages, often in NW Caucasian. occurs only in one other language – Tlingit. is slightly more common, occurring in five other languages including Tlingit. In other languages, these odd sounds are derived from sequences of consonant + q: Cq -> Cʔ -> glottalized fricative.

Sentence structure is odd:

Hit the ball the man. Hit the man the ball. The man hit the ball.

All mean the same thing.

Huamelutec gets a 6 rating, hardest of all.

Karok

Karok is a language isolate spoken by a few dozen people in northern California. The last native speaker recently died; however, there are about 80 people who have varying levels of L2 fluency.

In Karok, you can use a suffix for different types of containment – fire, water or a solid.

pa:θ-kirih throw into a fire

pa:θ-kurih throw into water

pa:θ-ruprih throw through a solid

The suffixes are unrelated to the words for fire, water and solid.

Karok gets a 5 rating, extremely difficult.

Uto-Aztecan Northern

Hopi is so difficult that even grammars describing the language are almost impossible to understand. For instance, Hopi has two different words for “and” depending on whether the noun phrase containing the word “and” is nominative or accusative.

Hopi is rated 6, hardest of all.

Southern Uto-Aztecan Corachol-Aztecan Core Nahua Nahuatl

In Nahuatl, most adjectives are simply stative verbs. Hence:

Umntu omde waya eTenochtitlan. The man he is tall went to Tenochtitlan. The tall man went to Tenochtitlan.

He is tall is a stative verb in the above.

Nahuatl gets a 6 rating, hardest of all.

Numic Central Numic

Comanche is legendary for being one of the hardest Indian languages of all to learn. Reasons are unknown, but all Amerindian languages are quite difficult. I doubt if Comanche is harder than other Numic languages.

Bizarrely enough, Comanche has very strange sounds called voiceless vowels, which seems to be an oxymoron, as vowels would seem to be inherently voiced. English has something akin to voiceless vowels in the words particular and peculiar, where the vowel of the first syllable acts something like a voiceless vowel.

Comanche was used for a while by the code talkers in World War 2 – not all code talkers were Navajos. Comanche was specifically chosen because it was hard to figure out. The Comanche code was never broken.

Comanche is rated 6, hardest of all.

Oto-Manguean Western Oto-Mangue Oto-Pame-Chinantecan Chinantecan

Chinantec, an Indian language of southwest Mexico, is very hard for non-Chinantecs to learn. The tone system is maddeningly complex, and the syntax and morphology are very intricate.

Chinantec is rated 6, hardest of all.

Popolocan Mazatecan Lowland Valley Southern

Jalapa Mazatec has distinctions between modal, creaky and breathy-voiced vowels along with nasal versions of those three. It also has creaky consonants and voiceless nasals. It has three tones, low, mid and high. Combining the tones results in various contour tones. In addition, it has a 3-way distinction in vowel length. Whistled speech is also possible. It has a phonemic distinction between “ballistic” and “controlled” syllables which is only present in Oto-Manguean.

Ballistic (short): sū warm, nīˑntū slippery, tsǣ guava, hų̄ you (plural)

Controlled (half-long): sūˑ blue, nīˑntūˑ needle, tsǣˑ full, hų̄ˑ six

Jalapa Mazatec is rated 6, hardest of all.

Maipurean Northern Upper Amazon Eastern Nawiki

Tariana is a very difficult language mostly because of the unbelievable amount of information it crams into its morphology and syntax. This is mostly because it is an Arawakan language that has been heavily influenced by neighboring Tucanoan languages, with the result that it has many of the grammatical categories and particles present in both families.

This stems from the widespread bilingualism in the Vaupes Basin of Colombia, where many people grow up bilingual from childhood and often become multilingual by adulthood. Learning up to five different languages is common. Code-switching was frowned upon and anyone using a word from Language Y while speaking Language X would get laughed at. Hence the various languages tended to borrow features from each other quite easily.

For instance, Tariana has both a noun classifier system and a gender system. Noun classifiers and gender are sometimes subsumed under the single category of “noun classifiers.” Yet Tariana has both, presumably from its relationship to two completely different language families. So in Tariana it is not unusual to get both demonstratives and verbs marked for both gender and noun classifier. Tariana borrowed such things as serialized perception verbs and the dubitative marker from Tucano.

In addition, Tariana has some very odd sounds, including aspirated nasals mh (mʰ), nh (n̺ʰ) and ñh (ɲʰ) and an aspirated w (wʰ) of all things. They seem to be actually aspirated, not just partially devoiced as many voiceless nasals and liquids are.

Tariana gets 6, hardest of all.

Huitotoan Proto-Bora-Muinane

Bora, a Witotoan language spoken in Peru and Colombia near the border between the two countries, has a mind-boggling 350 different noun classes. The noun classifier system is actually highly productive and is often used to create new nouns. New nouns can be created very easily, and their meanings are often semantically transparent. In some noun classifier systems, classifiers can be stacked one upon the other. In these cases, typically the last one is used for agreement purposes.

Bora also is a tonal language, but it has only two tones. In addition, nearly all consonantal phonemes have phonemic aspirated and palatalized counterparts. The agreement structure in the language is also quite convoluted. The classifier system effectively replaces much derivational morphology on the noun and noun compounding processes that other languages use to expand the meanings of nominals.

Bora gets a 6 rating, hardest of all.

Tucanoan Eastern Tucanoan Bará-Tuyuka

Tuyuca is a Tucanoan language spoken by 450 people in the department of Vaupés in Colombia. An article in The Economist magazine concluded that it was the hardest language on Earth to learn.

It has a simple sound system, but it’s agglutinative, and agglutinative languages are pretty hard. For instance, hóabãsiriga means I don’t know how to write. It has two forms of 1st person plural, I and you (inclusive) and I and the others (exclusive). It has between 50 and 140 noun classes, including strange ones like bark that does not cling closely to a tree, which can be extended to mean baggy trousers or wet plywood that has begun to fall apart.

Like Yamana, a nearly extinct Amerindian language of Chile, Tuyuca marks for evidentiality, that is, how it is that you know something. For instance:

Diga ape-wi. = The boy played soccer. (I saw him playing).

Diga ape-hiyi. = The boy played soccer. (I assume he was playing soccer, though I did not see it firsthand).

Evidential marking is obligatory on all Tuyuca verbs and it forces you to think about how you know whatever it is you know.

Tuyuca definitely gets a 6 rating!

Central Tucanoan

Cubeo, a language spoken in the Vaupes of Colombia, has either SOV or OVS word order. That would mean that the following two sentences mean the same thing:

The man the ball hit. The ball hit the man.

OVS languages are quite rare.

Morphemes belong to one of four classes:

  1. Nasal (many roots, as well as suffixes like -xã  = associative)
  2. Oral (many roots, as well as suffixes like -pe  = similarity, -du = frustrative)
  3. Unmarked (only suffixes, e.g. -re  = in/direct object)
  4. Oral/Nasal (some roots and some suffixes, e.g. /bãˈkaxa-/ (mãˈkaxa-) = to defecate, -kebã = suppose)

Just by looking at any given consonant-initial suffix, it is impossible to determine which of the first three categories it belongs to. They must be learned one by one.

Cubeo has nasal assimilation, common to many Amazonian languages. In some of these, nasalization is best analyzed at the syllable level – some syllables are nasal and others are not.

dĩ-bI-ko /dĩ-bĩ-ko/ nĩmĩko She recently went.

The underlying form dĩ-bI-ko is realized on the surface as nĩmĩko. The ĩ in dĩ-bI-ko nasalizes the d, the b, and the I on either side of it, so nasal spreading works in both directions. However, it is blocked from the third syllable because k is part of a class of non-nasalizable consonants.
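The spreading described above can be sketched in Python. This is a rough illustration under several assumptions, not an analysis of Cubeo: the segment inventory in NASAL_FORM is a simplified guess, BLOCKERS contains only k because that is the only non-nasalizable consonant named here, and the function name spread_nasality is invented for the example.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalize so that a vowel plus tilde always compares equal."""
    return unicodedata.normalize("NFC", text)


# Assumed oral-to-nasal correspondences. "I" stands for the vowel that is
# unmarked for nasality in the underlying form.
NASAL_FORM = {"d": "n", "b": "m", "I": "ĩ", "i": "ĩ", "e": "ẽ",
              "a": "ã", "o": "õ", "u": "ũ"}
NASAL_VOWELS = set(nfc("ĩẽãõũ"))
BLOCKERS = {"k"}  # only k is named in the text; the full class is not listed


def spread_nasality(segments: list[str]) -> str:
    """Spread nasality left and right from each underlying nasal vowel."""
    segs = [nfc(s) for s in segments]
    seeds = [i for i, s in enumerate(segs) if s in NASAL_VOWELS]
    for seed in seeds:
        for step in (-1, 1):  # leftward, then rightward
            i = seed + step
            while 0 <= i < len(segs) and segs[i] not in BLOCKERS:
                segs[i] = nfc(NASAL_FORM.get(segs[i], segs[i]))
                i += step
    return "".join(segs)


print(spread_nasality(["d", "ĩ", "b", "I", "k", "o"]))  # nĩmĩko
```

Spreading stops as soon as it reaches a blocker, which is why the final o stays oral.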

Pretty difficult language.

Cubeo gets a 6 rating, hardest of all.

Carib Waiwai

Hixkaryána is famous for being one of the very few languages on Earth to have basic OVS (Object-Verb-Subject) word order.

The sentence Toto yonoye kamara, or The man ate the jaguar, actually means The jaguar ate the man.

Toto yonoye kamara Lit. The man ate the jaguar. Gloss: The jaguar ate the man.
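To make the reading explicit, here is a small Python sketch that pairs each word of a three-word clause with its role for a given word order. The function name assign_roles is invented for this illustration, and the sketch assumes a clause of exactly one subject, one verb and one object.

```python
def assign_roles(words: list[str], order: str) -> dict[str, str]:
    """Pair each word of a clause with its role, given an order like 'OVS'."""
    roles = {"S": "subject", "V": "verb", "O": "object"}
    if len(words) != len(order):
        raise ValueError("need exactly one word per role")
    return {roles[letter]: word for letter, word in zip(order, words)}


clause = "Toto yonoye kamara".split()

# Read as OVS, toto (man) is the object and kamara (jaguar) is the subject.
print(assign_roles(clause, "OVS"))
# {'object': 'Toto', 'verb': 'yonoye', 'subject': 'kamara'}

# An English speaker instinctively reads the same string as SVO.
print(assign_roles(clause, "SVO"))
# {'subject': 'Toto', 'verb': 'yonoye', 'object': 'kamara'}
```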

Grammatical suffixes attached to the end of the verb mark not only number but also aspect, mood and tense.

Hixkaryána gets a 6 rating, hardest of all.

Nambikwaran Mamaindê

This is actually a series of closely related languages as opposed to one language, but the Southern Nambikwara language is the most well-known of the family, with 1,200 speakers in the Brazilian Amazon.

Phonology is complex. Consonants distinguish between aspirated, plain and glottalized, common in the Americas. There are strange sounds like prestopped nasals and glottalized fricatives. There are nasal vowels and three different tones. All vowels except one have nasal, creaky-voiced and nasal-creaky counterparts, for a total of 19 vowels.

The grammar is polysynthetic with a complex evidential system.

Reportedly, Nambikwara children do not pick up the language fully until age 10 or so, one of the latest recorded ages for full competence. Nambikwara is sometimes said to be the hardest language on Earth to learn, but it has some competition.

Nambikwara definitely gets a 6 rating, hardest of all!

Muran

Pirahã is a language isolate spoken in the Brazilian Amazon. Recent writings by Daniel Everett indicate that not only is this one of the hardest languages on Earth to learn, but it is also one of the weirdest languages on Earth. It is monumentally complex in nearly every way imaginable. It is commonly listed on the rogue’s gallery of craziest languages and phonologies on Earth.

It has the smallest phonemic inventory on Earth with only seven consonants, three vowels and either two or three tones. Everett recently wrote a paper about it after spending many years with the Pirahã. Previous missionaries who had spent time with the Pirahã generally failed to learn the language because it was too hard to learn. It took Everett a very long time, but he finally learned it well.

Many of Everett’s claims about Pirahã are astounding: whistled speech, no system for counting, very few Portuguese loans (they deliberately refuse to use Portuguese loans), evidence for the Sapir-Whorf linguistic relativity hypothesis, and evidence that it violates some of Noam Chomsky’s purported language universals such as embedding. It also has the t͡ʙ̥ sound – a bilabially trilled postdental affricate which is only found in two other languages, both in the Brazilian Amazon – Oro Win and Wari’.

Initially, Everett never heard the sound, but as they got to know him better, they started to make it more often. Everett believes that they were ridiculed by other groups when they made the odd sound.

Pirahã has the simplest kinship system in any language – there is only one word for both mother and father, and the Pirahã do not have any words for anyone other than direct biological relatives.

Pirahã may have only two numerals, or it may lack a numeral system altogether.

Pirahã does not distinguish between singular and plural person. This is highly unusual. The language may have borrowed its entire pronoun set from the Tupian languages Nheengatu and Tenharim, groups the Pirahã had formerly been in contact with. This may be one of the only attested cases of the borrowing of a complete pronoun set.

There are mandatory evidentiality markers that must be used in Pirahã discourse. Speakers must say how they know something, whether they saw it themselves, whether it was hearsay or whether they inferred it circumstantially.

There are various strange moods – the desiderative (desire to perform an action) and two types of frustrative – frustration in starting an action (inchoative/incompletive) and frustration in completing an action (causative/incompletive). There are others: immediate/intentive (you are going to do something now/you intend to do it in the future).

There are many verbal aspects: perfect/imperfect (completed/incomplete), telic/atelic (reaching a goal/not reaching a goal), continuative (continuing), repetitive (iterative), and beginning an action (inchoative).

Each Pirahã verb has 262,144 possible forms, or possibly in the many millions, depending on which analysis you use.

The future tense is divided into future/somewhere and future/elsewhere. The past tense is divided into plain past and immediate past.

Pirahã has a closed class of only 90 verb roots, an incredibly small number. But these roots can be combined together to form compound verbs, a much larger category. Here is one example of three verbs strung together to form a compound verb:

xig ab op (take, turn, go): bring back. You take something away, you turn around, and you go back to where you got it to return it.

There are no abstract color terms in Pirahã. There are only two words for colors, one for light and one for dark. The only other languages with so restricted a color system are in Papua New Guinea. The other color terms are not really color terms, but are more descriptive – red is translated as like blood.

Pirahã can be whistled, hummed or encoded into music. Consonants and vowels can be omitted altogether and meaning conveyed instead via variations in stress, pitch and rhythm. Mothers teach the language to children by repeating musical patterns.

Pirahã may well be one of the hardest languages on Earth to learn.

Pirahã gets a 6 rating, hardest of all.

Quechuan

Quechua (actually a large group of languages and not a single language at all) is one of the easiest Amerindian languages to learn. Quechua is a classic example of a highly regular grammar with few exceptions. Its agglutinative system is more straightforward than even that of Turkish. The phonology is dead simple.

On the down side, there is a lot of dialectal divergence (these are actually separate languages and not dialects) and a lack of learning materials. Some say that Quechua speakers spend their whole lives learning the language.

Quechua has inconsistent orthographies. There is a fight between those who prefer a Spanish-based orthography and those who prefer a more phonemic one. Also there is an argument over whether to use the Ayacucho language or the Cuzco language as a base.

Quechua has a difficult feature known as evidential marking. This marker indicates the source of the speaker’s knowledge and how sure they are about the statement.

-mi expresses personal knowledge:

Tayta Wayllaqawaqa chufirmi. Mr. Huayllacahua is a driver. (I know it for a fact.)

-si expresses hearsay knowledge:

Tayta Wayllaqawaqa chufirsi. Mr. Huayllacahua is a driver (or so I’ve heard).

-chá expresses strong possibility:

Tayta Wayllaqawaqa chufirchá. Mr. Huayllacahua is a driver (most likely).
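The three markers can be laid out as a simple lookup in Python. This is only a sketch built from the three examples above: the labels "personal", "hearsay" and "conjecture" and the function name mark_evidential are invented for the illustration, and it assumes the suffix attaches unchanged to the word, ignoring any variation in the suffix's shape.

```python
# Suffixes taken from the three example sentences; the dictionary keys are
# labels made up for this sketch.
EVIDENTIAL_SUFFIX = {
    "personal": "mi",     # the speaker knows it for a fact
    "hearsay": "si",      # the speaker was told
    "conjecture": "chá",  # the speaker thinks it most likely
}


def mark_evidential(word: str, source: str) -> str:
    """Attach the evidential suffix that matches the source of knowledge."""
    return word + EVIDENTIAL_SUFFIX[source]


for source in EVIDENTIAL_SUFFIX:
    print("Tayta Wayllaqawaqa", mark_evidential("chufir", source) + ".")
```

Run as is, the loop prints the three example sentences in the order given above.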

Quechua is rated 4, very difficult.

Aymaran Aymara

Aymara has some of the wildest morphophonology out there. Morpheme-final vowel deletion is present in the language as a morphophonological process, and it is dependent on a set of highly complex phonological, morphological and syntactic rules (Kim 2013).

For instance, there are three types of suffixes: dominant, recessive and a third class that is neither dominant nor recessive. If a stem ends in a vowel, dominant suffixes delete the vowel but recessive suffixes allow the vowel to remain. The third class either deletes or retains the vowel on the stem depending on how many vowels are in the stem. If the root has two vowels, the vowel is retained. If it has three vowels, the vowel is deleted.
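The three-way rule can be written out as a short Python sketch. It is an illustration of the description above and nothing more: the stems are invented shapes, not real Aymara words, "-X" is a placeholder suffix, the class labels and the function name attach are made up for the example, and the vowel inventory is assumed to be a, i and u with length ignored.

```python
VOWELS = set("aiu")  # assumption: a plain three-vowel inventory, length ignored


def attach(stem: str, suffix: str, suffix_class: str) -> str:
    """Join stem and suffix, deleting the stem-final vowel if the class says so."""
    if stem[-1] not in VOWELS:
        return stem + suffix  # the rule only concerns vowel-final stems
    vowel_count = sum(ch in VOWELS for ch in stem)
    if suffix_class == "dominant":
        delete = True
    elif suffix_class == "recessive":
        delete = False
    elif suffix_class == "third":
        if vowel_count == 2:
            delete = False
        elif vowel_count == 3:
            delete = True
        else:
            raise ValueError("the rule only covers stems with two or three vowels")
    else:
        raise ValueError(f"unknown suffix class: {suffix_class}")
    return (stem[:-1] if delete else stem) + suffix


# Invented stem shapes and a placeholder suffix, not real Aymara words.
print(attach("taka", "-X", "dominant"))   # tak-X
print(attach("taka", "-X", "recessive"))  # taka-X
print(attach("taka", "-X", "third"))      # taka-X
print(attach("takana", "-X", "third"))    # takan-X
```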

Although all of this seems quite odd, Finnish has something similar going on, if not a lot worse.

Nevertheless, Aymara is still said to be a very easy language to learn. The Guinness Book of World Records claims it is almost as easy to learn as Esperanto.

Aymara gets a 2 rating, very easy to learn.

Australian

Australian Aboriginal languages are some of the hardest languages on Earth to learn, like Amerindian or Caucasian languages. Some Australian languages have phonemic contrasts that few other languages have, such as apico-dental, lamino-dental, apico-postalveolar, and lamino-postalveolar coronals.

Australian languages tend to be mixed ergative. Ordinary nouns are ergative-absolutive, but 1st and 2nd person pronouns are nominative-accusative. One language has a three way agent-patient-experiencer distinction in the 1st person pronoun. Australian pronouns typically have singular, plural and dual forms along with inclusive and exclusive 1st plural. In some sentences, they have what is known as double case agreement which is rare in the world’s languages:

I gave a spear to my father. I gave a spear mine-to father’s-to.

Both elements of the phrase my father are in both dative and genitive.

However, Aboriginal languages do have the plus of being very regular.

All Australian languages are rated 6, most difficult of all.

Tor-Kwerba Orya-Tor Tor

Berik is a Tor-Orya language spoken in the Indonesian colony of Irian Jaya in New Guinea.

Verbs take many strange endings, in many cases mandatory ones, that indicate what time of day something happened, among other things.

Telbener He drinks in the evening.

Where a verb takes an object, it will not only be marked for time of day but for the size of the object.

Kitobana He gives three large objects to a man in the sunlight.

Verbs may also be marked for where the action takes place in reference to the speaker.

Gwerantena To place a large object in a low place nearby.

Berik is rated 6, hardest of all.

Trans New Guinea Madang Croisilles Gum

Amele is the world’s most complex language as far as verb forms go, with 69,000 finite and 860 infinitive forms.

Amele is rated 6, hardest of all.

Torricelli Wapei Valman

Valman is a bizarre case where the word “and” that connects two nouns is actually a verb of all things and is marked with the first noun as subject and the second noun as object.

John (subject) and Mary (object)

John is marked as subject for some reason, and Mary is marked as object, and the word for “and” shows subject agreement with John and object agreement with Mary.

Valman gets a 6 rating, hardest of all.

Afroasiatic Semitic

Semitic languages such as Arabic and Hebrew are notoriously difficult to learn, and Arabic (especially MSA) tops many language learners’ lists as the hardest language they have ever attempted to learn. Although Semitic verbs are notoriously complex, the verbal system does have some advantages especially as compared to IE languages like Slavic. Unlike Slavic, Semitic verbs are not inflected for mood and there is no perfect or imperfect.

Central South Arabic

Arabic has some very irregular manners of noun declension, even in the plural. For instance, the word girls changes in an unpredictable way when you say one girl, two girls and three girls, and there are two different ways to say two girls depending on context. Two girls is marked with the dual, but different dual forms can be used. All languages with duals are relatively difficult for most speakers that lack a dual in their native language. However, the dual is predictable from the singular, so one might argue that you only need to learn how to say one girl and three girls.

Further, it is full of irregular plurals similar to octopus and octopi in English, although such forms are rare in English. With any given word, there might be 20 different possible ways to pluralize it, and there is no way to know which of the 20 paradigms to use with that word, and further, there is no way to generalize a plural pattern from a singular pattern. In addition, many words have 2-3 ways of pluralizing them. Some messy Arabic plurals:

kalb -> kilaab
qalb -> quluub
maktab -> makaatib
taalib -> tullaab
balad -> buldaan

When you say I love you to a man, you say it one way, and when you say it to a woman, you say it another way. On and on.

The Arabic writing system is exceedingly difficult and is one of the hardest to use of any on Earth. Short vowels are omitted. You have to learn where to insert missing vowels, where to double consonants and which vowels to skip in the script. There are 28 different symbols in the alphabet and four different ways to write each symbol depending on its place in the word.

Consonants are written in different ways depending on where they appear in a word. An h is written differently at the beginning of a word than at the end of a word. However, one simple aspect of it is that the medial form is always the same as the initial form. You need to learn not only Arabic words but also the grammar to read Arabic.

Pronouns attach themselves to roots, and there are many different verb conjugation paradigms which simply have to be memorized. For instance, if a verb has a و, a ي, or a ء  in its root, you need to memorize the patterns of the derivations, and that is a good chunk of the conjugations right there. The system for measuring quantities is extremely confusing.

The grammar has many odd rules that seem senseless. Unfortunately, most rules have exceptions, and it seems that the exceptions are more common than the rules themselves. Many people, including native speakers, complain about Arabic grammar.

Arabic does have case, but the system is rather simple.

The laryngeals, uvulars and glottalized sounds are hard for many foreigners to make and nearly impossible for them to get right. The ha’ (ح), qa (ق) and غ sounds and the glottal stop in initial position give a lot of learners headaches.

Arabic is at least as idiomatic as French or English, so in order to speak it right, you have to learn all of the expressionistic nuances.

One of the worst problems with Arabic is the dialects, which in many cases are separate languages altogether. If you learn Arabic, you often have to learn one of the dialects along with classical Arabic. All Arabic speakers speak both an Arabic dialect and Classical Arabic.

In some Arabic as a foreign language classes, even after 1 1/2 years, not one student could yet make a complete and proper sentence that was not memorized.

Adding weight to the commonly held belief that Arabic is hard to learn is research done in Germany in 2005 which showed that Turkish children learn their language at age 2-3, German children at age 4-5, but Arabic-speaking children did not master Arabic until age 12.

Arabic has complex verbal agreement with the subject, masculine and feminine gender in nouns and adjectives, head-initial syntax and a serious restriction on forming compounds. If you come from a language that has a similar nature, Arabic may be easier for you than it is for so many others. Its 3 vowel system makes for easy vowels.

MSA Arabic is rated 5, extremely difficult.

Arabic dialects are often somewhat easier to learn than MSA Arabic. At least in Lebanese and Egyptian Arabic, the very difficult q’ sound has been turned into a hamza or glottal stop which is an easier sound to make. Compared to MSA Arabic, the dialectal words tend to be shorter and easier to pronounce.

To attain anywhere near native speaker competency in Egyptian Arabic, you probably need to live in Egypt for 10 years, but Arabic speakers say that few if any second language learners ever come close to native competency. There is a huge vocabulary, and most words have a wealth of possible meanings.

Egyptian Arabic is rated 4.5, very to extremely difficult.

Moroccan Arabic is said to be particularly difficult, with much vowel elision in triconsonantal stems. In addition, all dialectal Arabic is plagued by irrational writing systems.

Moroccan Arabic is rated 4.5, very to extremely difficult.

Maltese is a strange language, basically a Maghrebi Arabic language (similar to Moroccan or Tunisian Arabic) that has very heavy influence from non-Arabic tongues. It shares the problem of Gaelic that often words look one way and are pronounced another.

It has the common Semitic problem of difficult plurals. Although many plurals use common plural endings (-i, -iet, -ijiet, -at), others simply form the plural by having their last vowel dropped or adding an s (English borrowing). There’s no pattern, and you simply have to memorize which ones act which way. Maltese permits the consonant cluster spt, which is surely hard to pronounce.

On the other hand, Maltese has quite a few IE loans from Italian, Sicilian, Spanish, French and increasingly English. If you have knowledge of Romance languages, Maltese is going to be easier than most Arabic dialects.

Maltese is rated 4, very difficult.

South Canaanite

Hebrew is hard to learn according to a number of Israelis. Part of the problem may be the abjad writing system, which often leaves out vowels which must simply be remembered. Also, other than borrowings, the vocabulary is Afroasiatic, hence mostly unknown to speakers of IE languages. There are also difficult consonants as in Arabic such as pharyngeals and uvulars.

The het or glottal h is particularly hard to make. However, most modern Israelis no longer make the het sound or a’ain sounds. Instead, they pronounce the het like the chaf sound and the a’ain like an alef. Almost all Ashkenazi Israeli Jews no longer use the het or a’ain sounds. But most Jews who came from Arab countries (often older people) still use the sound, and some of their children do (Dorani 2013).

Hebrew has complex morphophonological rules. The letters p, b, t, d, k and g change to f, v, th, dh, kh and gh in certain situations. In some environments, pharyngeals change the nature of the vowels around them. The prefix ve-, which means and, is pronounced differently when it precedes certain letters. Hebrew is also quite irregular.

Hebrew has quite a few voices, including active, passive, intensive, intensive passive, etc. It also has a number of tenses such as present, past, and the odd jussive.

Hebrew also has two different noun classes. There are also many suffixes and quite a few prefixes that can be attached to verbs and nouns.

Even most native Hebrew speakers do not speak Hebrew correctly by a long shot.

Quite a few say Hebrew is as hard to learn as MSA or perhaps even harder, but this is controversial.

Hebrew gets a 5 rating for extremely difficult.

Berber Northern Atlas

Berber languages are considered to be very hard to learn. Worse, there are very few language learning resources available.

Tamazight allows doubled consonants at the beginning of a word! How can you possibly make that sound?

Tamazight gets a 6 rating, hardest of all.

In Tachelhit , words like this are possible:

tkkststt You took it off.

tfktstt You gave it.

In addition, there are words which contain only one or two consonants:

ɡ be

ks feed on

Tachelhit gets a 6 rating, hardest of all.

South Ethiopian South Transversal Amharic–Argobba Amharic

Amharic is said to be a very hard language to learn. It is quite complex, and its sentence structures seem strange even to speakers of other Semitic languages. Hebrew speakers say they have a hard time with this language.

There are a multitude of rules which almost seem ridiculous in their complexity, there are numerous conjugation patterns, objects are suffixed to the verb, the alphabet has 274 letters, and the pronunciation seems strange. However, if you already know Hebrew or Arabic, it will be a lot easier. The hardest part of all is the verbal system, as with any Semitic language. It is easier than Arabic.

Amharic gets a 4.5 rating, very hard to extremely hard.

Cushitic East Cushitic

Dahalo is legendary for having some of the wildest consonant phonology on Earth. It has all four airstream mechanisms found in languages: ejectives, implosives, clicks and normal pulmonic sounds. There are both glottal and epiglottal stops and fricatives and laminal and apical stops.

There is also a strange series of nasal clicks that are both glottalized and plain. Some of these clicks are also labialized. It has both voiced and unvoiced prenasalized stops and affricates, and some of the stops are also labialized. There is a weird palatal lateral ejective. There are three different lateral fricatives, including a labialized and palatalized one, and one lateral approximant. It contrasts alveolar and palatal lateral affricates and fricatives, the only language on Earth to do this.

The Dahalo are former elephant hunting hunter gatherers who live in southern Kenya. It is believed that at one time they spoke a language like Sandawe or Hadza, but they switched over to Cushitic at some point. The clicks are thought to be substratum from a time when Dahalo was a Sandawe-Hadza type language.

Dahalo gets a 6 rating, hardest of all.

Somali

Somali has one of the strangest preposition systems on Earth. It actually has no real prepositions at all. Instead it has preverbal particles and possessives that serve as prepositions.

Here is how possessives serve as prepositions:

habeennimada horteeda
the night her front
before nightfall

kulaylka dartiisa
the heat his reason
because of the heat

Here we have the use of a preverbal particle serving as a preposition:

kú ríd shandádda
Into put the suitcase.
Put it into the suitcase.

Somali combines four “prepositions” with four deictic particles to form its prepositions.

There are four basic “prepositions”:

to, in, from, with

These combine with four different deictic particles:

toward the speaker, away from the speaker, toward each other, away from each other

Hence you put the “prepositions” and the deictic particles together in various ways. Both tend to go in front of and close to the verb:

Nínkíi bàan cèelka xádhig kagá sóo saaray. …well-the rope with-from towards-me I-raised. I pulled the man out of the well with a rope.

Way inoogá warrámi jireen. They us-to-about news gave. They used to give us news about it.

Prepositions are the hardest part of the Somali language for the learner.

Somali deals with verbs of motion via deixis in much the same way that Georgian does. One reference point is the speaker and the other is any other entities discussed. Verbs of motion are formed using adverbs. Entities may move:

towards each other    wada
away from each other  kala
towards the speaker   so
away from the speaker si

Hence:

kala durka separate
si gal     go in (away from the speaker)
so gal     come in (toward the speaker)

Somali lacks orthographic consistency. There are four different orthographic systems in use – the lists.

Somali pluralization makes no sense and must be memorized. There are seven different plurals, and there is no clue in the singular that tells you what form to use in the plural. See here:

Reduplication:

áf (language) -> afaf

Suffixation:

hoóyo (mother) -> hoyoóyin

áabbe (father) -> aabayaal

Note the tone shifts in all three of the plurals above.

There are four cases: absolutive, nominative, genitive and vocative. Despite the presence of absolutive and nominative cases, Somali is not an ergative language. Absolutive case is the basic case of the noun, and nominative is the case given to the noun when a verb follows in the sentence. There are different articles depending on whether the noun was mentioned previously or not (similar to the articles a and the in English). The absolutive and nominative are marked not only on the noun but also on the article that precedes it.

In terms of difficulty, Somali is much harder than Persian and probably about as difficult as Arabic.

Somali gets a 5 rating, extremely hard to learn.

Dravidian Southern Tamil-Kannada Tamil-Kodagu Tamil-Malayalam Malayalam

Malayalam, a Dravidian language of India, has been cited as the hardest language to learn by a language foundation, but the citation is obscure and hard to verify.

Malayalam words are often even hard to look up in a Malayalam dictionary.

For instance, adiyAnkaLAkkikkoNDirikkukayumANello is a word in Malayalam. It means something like I, your servant, am sitting and mixing s.t. (which is why I cannot do what you are asking of me). The part in parentheses is an example of the type of sentence where it might be used.

The above word is composed of many different morphemes, including conjunctions and other affixes, with sandhi going on with some of them so they are eroded away from their basic forms. There doesn’t seem to be any way to look that word up or to write a Malayalam dictionary that lists all the possible forms, including forms like the word above. It would probably be way too huge of a book. However, all agglutinative languages are made up of affixes, and if you know the affixes, it is not particularly hard to parse the word apart.

Malayalam is said to be very hard to pronounce correctly.

Further, few foreigners even try to learn Malayalam, so Malayalam speakers, like the French, might not listen to you and might make fun of you if your Malayalam is not native sounding.

However, Malayalam has the advantage of having many pedagogic materials available for language learning such as audio-visual material and subtitled videos.

Malayalam is rated 5, extremely difficult.

Tamil

Tamil, a Dravidian language, is hard, but probably not as difficult as Malayalam is. Tamil has an incredible 247 characters in its alphabet. Nevertheless, most of those are consonant-vowel combinations, so it is almost more of a syllabary than an alphabet. Going by what would traditionally be considered alphabetic symbols, there are probably only 72 real symbols in the alphabet. Nevertheless, Tamil probably has one of the easier Indic scripts as Tamil has fewer characters than other scripts due to its lack of aspiration. Compare to Devanagari’s over 1,000 characters.
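The 247 figure is easier to swallow once you see where it comes from. The breakdown below (12 vowels, 18 consonants, and one special character called the aytham) is the traditional count and is not taken from the discussion above, so treat it as background I am assuming here. A quick sketch of the arithmetic:

```python
# Traditional breakdown of the Tamil script (assumed background,
# not stated in the text above).
VOWELS = 12
CONSONANTS = 18
AYTHAM = 1  # a single special character

# Every consonant combines with every vowel to give a compound character.
compound = VOWELS * CONSONANTS

total = VOWELS + CONSONANTS + compound + AYTHAM
share = compound / total

print(f"compound characters: {compound}")
print(f"total characters:    {total}")
print(f"share that are consonant-vowel combinations: {share:.0%}")
```

So the learner is really memorizing about 30 basic symbols plus the rules for combining them, which is why the script behaves almost like a syllabary.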

But no Indic script is easy. A problem with Tamil is that all of the characters seem to look alike. It is even worse than Devanagari in that regard. However, the more rounded scripts such as Kannada, Sinhala, Telugu and Malayalam have that problem to a worse degree. Tamil has a few sharp corners in the characters that help to disambiguate them.

In addition, as with other languages, words are written one way and pronounced another. However, there are claims that the difficulty of Tamil’s diglossia is overrated.

Tamil has two different registers for written and spoken speech, but the differences are not large, so this problem is exaggerated. Both Tamil and Malayalam are spoken very fast and have extremely complicated, nearly impenetrable scripts. If Westerners try to speak a Dravidian language in south India, more often than not the Dravidian speaker will simply address them in English rather than try to accommodate them.

Tamil has the odd evidential mood, similar to Bulgarian.

However, on the plus side, the language does seem to be very logical and regular, almost like German in that regard. In addition, there are a lot of language learning materials for Tamil.

Tamil is rated 4, very difficult.

Altaic Korean

Most agree that Korean is a hard language to learn.

The alphabet, Hangul, at least is reasonable; in fact, it is quite elegant. But there are four different Romanizations – Lukoff, Yale, Horne, and McCune-Reischauer – which is preposterous. It’s best to just blow off the Romanizations and dive straight into Hangul. This way you can learn a Romanization later, and you won’t mess up your Hangul with spelling errors, as can occur if you go from Romanization to Hangul.

Hangul can be learned very quickly, but learning to read Korean books and newspapers fast is another matter altogether because you really need to know the hanja or Chinese characters that are used in addition to the Hangul. After World War 2, the Koreas decided to officially get rid of their Chinese characters, but in practice this was not successful. With the use of Chinese characters in Korean, you can be a lot more precise in terms of what you are trying to communicate.

Bizarrely, there are two different numeral sets used, but one is derived from Chinese so it should be familiar to Chinese, Japanese or Thai speakers who use similar or identical systems.

Korean has a wealth of homonyms, and this is one of the tricky aspects of the language. Any given combination of a couple of characters can have multiple meanings. Japanese has a similar problem with homonyms, but at least with Japanese you have the benefit of kanji to help you tell the homonyms apart. With Korean Hangul, you get no such advantage.

Similarly, there seem to be many ways to say the same thing in Korean. The learner will feel when people are using all of these different ways of saying the same thing that they are actually saying something different each time, but that is not the case.

One problem is that the b/p, j, ch, t and d are pronounced differently than their English counterparts. The consonants, the pachim system and the morphing consonants at the end of the word that slide into the next word make Korean harder to pronounce than any major European language. Korean has a similar problem to Japanese; that is, if you mess up one vowel in a sentence, you render it incomprehensible.

The vocabulary is very difficult for an English speaker who does not have knowledge of either Japanese or Chinese. On the other hand, Japanese or Chinese will help you a lot with Korean.

Korean is agglutinative and has a subject-topic discourse structure, and the logic of these systems is difficult for English speakers to understand. In addition, there are hundreds of ways of conjugating any given verb based on tense, mood, age or seniority. Adjectives also decline and take hundreds of different suffixes.

Meanwhile, Korean has an honorific system that is even wackier than that of Japanese. A single sentence can be said in three different ways depending on the relationship between the speaker and the listener. However, the younger generation is not using the honorifics so much, and a foreigner isn’t expected to know the honorific system anyway.

Speakers of Korean can learn Japanese fairly easily. Korean seems to be a more difficult language to learn than Japanese. There are maybe twice as many particles as in Japanese, the grammar is dramatically more difficult and the verbs are quite a bit harder. The phonemic inventory in Korean is also larger and includes such oddities as double consonants.

Korean is rated by language professors as being one of the hardest languages to learn.

Korean is rated 5, extremely hard.

Japonic

Japanese also uses a symbolic alphabet, but the symbols themselves are sometimes undecipherable in that even Japanese speakers will sometimes encounter written Japanese and will say that they don’t know how to pronounce it. I don’t mean that they mispronounce it; that would make sense. I mean they don’t have the slightest clue how to say the word! This problem is essentially nonexistent in a language like English.

The Japanese orthography is one of the most difficult to use of any orthography.

There are over 2,000 frequently used characters in three different symbolic alphabets that are frequently mixed together in confusing ways. Due to the large number of frequently used symbols, it’s said that even Japanese adults learn a new symbol a day well into adulthood.

The Japanese writing system is probably crazier than the Chinese writing system and it often makes it onto lists of worst orthographies. The very idea of writing an agglutinative language in a combination of two syllabaries and an ideography seems wacky right off the bat. Japanese borrowed Chinese characters.

But then they gave each character several pronunciations, and in some cases as many as 24. Next they made two syllabaries using another set of characters, then over the next millennia came up with all sorts of contradictory and often senseless rules about when to use the syllabaries and when to use the character set. Later on they added a Romanization to make things even worse.

Chinese uses 5-6,000 characters regularly, while Japanese only uses around 2,000. But in Chinese, each character has only one or maybe two pronunciations. In Japanese, there are complicated rules about when and how to combine the hiragana with the characters. These rules are so hard that many native speakers still have problems with them. There are also personal and place names (proper nouns) which are given completely arbitrary pronunciations often totally at odds with the usual pronunciation of the character.

There are some writers, typically of literature, who deliberately choose to use kanji that even Japanese people cannot read. For instance, Ryuu Murakami uses the odd symbols 擽る, 轢く and 憑ける.

The Japanese system is made up of three different systems: the katakana and hiragana (the kana) and the kanji, similar to the hanzi used in Chinese. Chinese has at least 85,000 hanzi. The number of kanji is much less than that, but kanji often have more than one meaning in contrast to hanzi.

After WW2, Japan decided to simplify its language. They both simplified and reduced the number of Chinese characters used, and they unified the written and spoken language, which previously had been different.

Speaking Japanese is not as difficult as everyone says, and many say it’s fairly easy. However, there is a problem similar to English in that one word can be pronounced in multiple ways, like read and read in English.

A common problem is that a sentence uttered by a Japanese language learner, while perfectly grammatical, is still not accepted by Japanese speakers because “we just don’t say it that way.” The Japanese speaker often cannot tell you why the unacceptable sentence you uttered is not OK. On the other hand, this problem may be common to more languages than Japanese.

There is also a class of Japanese called “honorifics” or “keigo” that is quite hard to master. Honorifics are meant to show respect and to indicate one’s place or status in the social hierarchy. These typically affect verbs but can also affect particles and prefixes. They are usually formed by archaic or highly irregular verbs. However, there are both regular and irregular honorific forms. Furthermore, there are five different levels of honorifics. Honorifics vary depending on who you are and who you are talking to. In addition, gender comes into play.

Although it is true that Japanese young people are said to not understand the intricacies of keigo, it is still expected that they know how to speak it well. Consequently, many young Japanese will opt out of certain conversations because they feel that their keigo is not very good. Books explaining how to use keigo properly have been big sellers among young people in Japan in recent years as young people try to appear classy, refined or cultured.

In addition, Japanese born overseas (especially in the US), while often learning Japanese pretty well, typically have a very poor understanding of keigo. Instead of embarrassing themselves by not using keigo or using it wrong, these Japanese speakers often prefer to speak in English to Japanese people rather than bother with keigo-less Japanese. Overcorrection in keigo is also a problem: hypercorrection leads to errors in keigo from “trying too hard.” This looks like phony or insincere politeness and is often worse than not using keigo at all.

One wild thing about Japanese is counting forms. You actually use different numeral sets depending on what it is you are counting! There are dozens of different ways of counting things which involve the use of a complex numerical noun classifier system.

Japanese grammar is often said to be simple, but that does not appear to be the case on closer examination. Particles are especially vexing. Verbs engage in all sorts of wild behavior, and adverbs often act like verbs. Nouns can act like adjectives and adverbs. Meanwhile, honorifics change the behavior of all words. There are particles like ha and ga that have many different meanings. One problem is that all noun modifiers, even phrases, must precede the nouns they are modifying.

It’s often said that Japanese has no case, but this is not true. Actually, there are seven cases in Japanese. The aforementioned ga is a clitic meaning nominative, made is terminative case, -no is genitive and -o is accusative.

In this sentence:

The plane that was supposed to arrive at midnight, but which had been delayed by bad weather, finally arrived at 1 AM.

Everything between the words plane and finally (the whole relative clause) must precede the noun plane:

Was supposed to arrive at midnight, but had been delayed by bad weather, the plane finally arrived at 1 AM.

One of the main problems with Japanese grammar is that it is going to seem so different from the sort of grammar an English speaker is likely to be used to.

Speaking Japanese is one thing, but reading and writing it is a whole new ballgame. It’s perfectly possible to know the meaning of every kanji and the meaning of every word in a sentence, but you still can’t figure out the meaning of the sentence because you can’t figure out how the sentence is stuck together in such a way as to create meaning.

The real problem is that the Japanese you learn in class is one thing, and the Japanese of the street is another. One problem is that in street Japanese, the subject is typically not stated in a sentence. Instead it is inferred through such things as honorific terms or the choice of words you used in the sentence. Probably no one goes crazier on negatives than the Japanese. Particularly in academic writing, triple and quadruple negatives are common, and can be quite confusing.

Yet there are problems with the agglutinative nature of Japanese. It’s a completely different syntactic structure than English. Often if you translate a sentence from Japanese to English it will just look like a meaningless jumble of words.

However, Japanese grammar has the advantage of being quite regular. For instance, there are only four frequently used irregular verbs.

Like Chinese, the nouns are not marked for number or gender. However, while Chinese is forgiving of errors, if you mess up one vowel in a Japanese sentence, you may end up with incomprehension.

Although many Japanese learners feel it’s fairly easy to learn, surveys of language professors continue to rate Japanese as one of the hardest languages to learn. A study by the US Navy concluded that the hardest language the corpsmen had to learn in the course of service was Japanese. However, it’s generally agreed that Japanese is easier to learn than Korean. Japanese speakers are able to learn Korean pretty easily.

Japanese is rated 5, extremely hard.

Classical Japanese is much harder to read than Modern Japanese. Though you can get by with far fewer kanji when reading the modern language, you will need a minimum knowledge of 3,000 kanji for reading Classical Japanese, and that’s using a dictionary. There are only about 500-1,000 frequently used characters, but there are countless other words that will come up in your reading, especially, say, special words used in the Imperial Court. Many words have more than one meaning, and unless you know this, you will be lost. 東宮(とうぐう) for instance means Eastern Palace. However, it also means Crown Prince because his residence was to the east of the Emperor’s.

The movie The Seven Samurai (set in the late 1500’s) seems to use some sort of Classical Japanese, or at least Classical vocabulary and syntax with modern pronunciation. Japanese language learners say they can’t understand a word of the archaic Japanese used in this movie.

Classical Japanese gets 5.5, nearly hardest of all.

Turkic Oghuz Western Oghuz

Turkish is often considered to be hard to learn, and it’s rated one of the hardest in surveys of language teachers. However, it’s probably easier than its reputation makes it out to be. It is agglutinative, so you can have one long word where in English you might have a sentence of shorter words. One such word is:

Çekoslovakyalılaştıramadıklarımızdanmışsınız? Were you one of those people whom we could not turn into a Czechoslovakian?

Many words have more than one meaning. However, the agglutination is very regular in that each particle of meaning has its own morpheme and falls into an exact place in the word. See here:

göz            eye
göz-lük        glasses
göz-lük-çü     optician
göz-lük-çü-lük the business of an optician
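The regularity is easy to show in a few lines of code. Below is a minimal sketch, assuming a simplified version of Turkish vowel harmony and consonant voicing that covers only the two suffixes in the table. The placeholder letters I and C and the function names attach and derive are conventions made up for this example; this is not a full morphological model of Turkish.

```python
# Simplified four-way vowel harmony: the suffix vowel "I" copies the
# frontness and rounding of the last vowel in the word so far.
HARMONY = {
    "a": "ı", "ı": "ı",
    "e": "i", "i": "i",
    "o": "u", "u": "u",
    "ö": "ü", "ü": "ü",
}
# After a voiceless consonant, the suffix consonant "C" is ç, else c.
VOICELESS = set("çfhkpsşt")

def last_vowel(word: str) -> str:
    for ch in reversed(word):
        if ch in HARMONY:
            return ch
    raise ValueError(f"no vowel in {word!r}")

def attach(word: str, suffix: str) -> str:
    """Attach a suffix written with the placeholders I and C."""
    for ch in suffix:
        if ch == "I":
            word += HARMONY[last_vowel(word)]
        elif ch == "C":
            word += "ç" if word[-1] in VOICELESS else "c"
        else:
            word += ch
    return word

def derive(root: str, suffixes: list) -> list:
    """Return every intermediate word as suffixes are stacked on."""
    words = [root]
    for suffix in suffixes:
        words.append(attach(words[-1], suffix))
    return words

# -lIk forms nouns from nouns, -CI forms the person who deals with it.
for word in derive("göz", ["lIk", "CI", "lIk"]):
    print(word)
```

The same two suffixes give kitapçı (bookseller) from kitap (book), with the vowel and consonant adjusting automatically to the new root.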

Nevertheless, agglutination means that you can always create new words or add new parts to words, and for this reason even a lot of Turkish adults have problems with their language.

There is no verb to be, which is hard for many foreigners. Instead, the concept is wrapped onto the subject of the sentence as a -dim or -im suffix. Turkish is an imagery-heavy language, and if you try to translate straight from a dictionary, it often won’t make sense.

However, the suffixation in Turkish, along with the vowel harmony, are both precise. Nevertheless, many words have irregular vowel harmony. The rules for making plurals are very regular, with no exceptions (the only exceptions are in foreign loans). In Turkish, incredible as it sounds, you can make a plural out of anything, even a word like what, who or blood. However, there is some irregularity in the strengthening of adjectives, and the forms are not predictable and must be memorized.

Turkish is a language of precision in other ways. For instance, there are eight different forms of subjunctive mood that describe various degrees of uncertainty that one has about what one is talking about. This relates to the evidentiality discussed under Tuyuca above, and Turkish has an evidential form similar to Tamil and Bulgarian. On Turkish news, verbs are generally marked with miş, which means that the announcer believes it to be true though he has not seen it firsthand. The particle miş is interesting because this evidential form is coded into the tense system, which is an unusual use of evidentiality.

The Roman alphabet and almost mathematically precise grammar really help out. Turkish lacks gender and has but a single irregular verb – olmak. Nevertheless, there are many verbal forms. However, this is controversial and it depends on how you define grammatical irregularity. There is some strangeness in some of the verb paradigms, but it is argued that these oddities are rule-based. The aorist tense is said to have irregularity.

There is some irregular morphophonology, but not much. The oblique relative clauses have complex morphosyntax. Turkish has two completely different ways of making relative clauses, one of which may have been borrowed from Persian. There are many gerunds for verbs, and these have many different uses. At the end of the day, Turkish grammar is not as regular or as simple as it is made out to be.

Words are pronounced nearly the same as they are written. A suggestion that Turkish may be easier to learn than many think is the research showing that Turkish children attain basic grammatical mastery of Turkish at age 2-3, as compared to 4-5 for German and 12 for Arabic. The research was conducted in Germany in 2005.

In addition, Turkish has a phonetic orthography.

However, Turkish is hard for an English speaker to learn for a variety of reasons. It is agglutinative like Japanese, and all agglutinative languages are difficult for English speakers to learn. As in Japanese, you start your Turkish sentence the way you would end your English sentence. As in the Japanese example above, the subordinate clause must precede the subject, whereas in English, the subordinate clause must follow the subject. In the examples below, that he will be on time is a subordinate clause.

In English, we say, “I hope that he will be on time.”

In Turkish, the sentence would read, “That he will be on time I hope.”

Turkish vowels are unusual to speakers of IE languages, and Turkish learners say the vowels are hard to make or even tell apart from one another.

Turkish is rated 3.5, harder than average to learn.

Uralic

Finno-Ugric

One test of the difficulty of any language is how much of the grammar you must know in order to express yourself on a basic level. On this basis, Finno-Ugric languages are complicated because you need to know quite a bit more grammar to communicate on a basic level in them than in say, German.

Finnic Northern

Finnish is very hard to learn, and even long-time learners often still have problems with it. Famous polyglot Barry Farber said it was one of the hardest languages he learned. You have to know exactly which grammatical forms to use where in a sentence. In addition, Finnish has 15 cases in the singular and 16 in the plural. This is hard to learn for speakers coming from a language with little or no case.

For instance, talo (the house)

Cases:

talon    house's
taloa    some of the house
taloksi  into/as the house
talossa  in the house
talosta  from inside the house
taloon   into the house
talolla  on/at the house
talolta  from beside the house
talolle  to the house
taloista from the houses
taloissa in the houses
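For a simple stem like talo, the forms above really are just stem plus ending, which a short sketch can reproduce. The case names used as dictionary keys (genitive, inessive and so on) are the standard grammatical labels and are my addition, and the function decline is made up for this example. It deliberately ignores consonant gradation, vowel harmony and the stem changes that make most other Finnish nouns harder, so it only works for words that behave like talo.

```python
# Case endings for a back-vowel stem with no consonant gradation.
# Illustrative only: real Finnish nouns often change their stem.
ENDINGS = {
    "genitive": "n",
    "partitive": "a",
    "translative": "ksi",
    "inessive": "ssa",
    "elative": "sta",
    "adessive": "lla",
    "ablative": "lta",
    "allative": "lle",
}
# Cases where this toy model can form the plural by inserting -i-.
PLURAL_OK = {"inessive", "elative", "adessive", "ablative", "allative"}

def decline(stem: str, case: str, plural: bool = False) -> str:
    if plural:
        if case not in PLURAL_OK:
            raise ValueError(f"plural {case} is outside this sketch")
        return stem + "i" + ENDINGS[case]
    if case == "illative":
        # The illative lengthens the final vowel and adds -n.
        return stem + stem[-1] + "n"
    return stem + ENDINGS[case]

for case in list(ENDINGS) + ["illative"]:
    print(f"{case:12} {decline('talo', case)}")
print(f"{'elative pl.':12} {decline('talo', 'elative', plural=True)}")
print(f"{'inessive pl.':12} {decline('talo', 'inessive', plural=True)}")
```

Even this toy version needs a special rule for one case and refuses several plurals, which hints at how much more there is to learn for real words.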

It gets much worse than that. This web page shows that the noun kauppa (shop) can have 2,253 forms.

A simple adjective + noun type of noun phrase of two words can be declined in up to 100 different ways.

Adjectives and nouns belong to 20 different classes. The rules governing their case declension depend on what class the substantive is in.

As with Hungarian, words can be very long. For instance:

lentokonesuihkuturbiinimoottoriapumekaanikkoaliupseerioppilas – non-commissioned officer cadet learning to be an assistant mechanic for airplane jet engines

Like Turkish, Finnish agglutination is very regular. Each bit of information has its own morpheme and has an exact place in the word.
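The idea that each morpheme has its own slot can be sketched in a few lines of Python. This is only a toy illustration: the suffix tables and the build_word function are my own assumptions, and they only work for a stem like talo that has back vowels and no consonant gradation.

```python
# Illustrative morpheme slots for the stem "talo" (house).
# Assumption: this toy table covers only a back-vowel stem with no
# consonant gradation; real Finnish also needs stem changes and harmony.
PLURAL = "i"
CASES = {
    "inessive": "ssa",  # in
    "elative": "sta",   # from inside
    "adessive": "lla",  # on / at
    "ablative": "lta",  # from beside
    "allative": "lle",  # to
}
POSSESSIVES = {"my": "ni", "your": "si"}


def build_word(stem, case, plural=False, possessor=None):
    """Concatenate stem + number + case + possessive, always in that order."""
    parts = [stem]
    if plural:
        parts.append(PLURAL)
    parts.append(CASES[case])
    if possessor is not None:
        parts.append(POSSESSIVES[possessor])
    return "".join(parts)


print(build_word("talo", "inessive"))              # talossa
print(build_word("talo", "inessive", plural=True)) # taloissa
print(build_word("talo", "elative", plural=True))  # taloista
print(build_word("talo", "inessive", True, "my"))  # taloissani
```

The order of the slots never changes, which is what makes the system regular even when the resulting words get long.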

Like Turkish, Finnish has vowel harmony, and it is very regular like that of Turkish. Unlike in Turkish or Hungarian, consonant gradation forms a major part of Finnish morphology. In order to form a sentence in Finnish, you will need to learn about verb types, cases and consonant gradation, and it can take a while to get your mind around those things.

Finnish, oddly enough, always puts the stress on the first syllable. Finnish vowels will be hard to pronounce for most foreigners.

However, Finnish has the advantage of being pronounced precisely as it is written. This is also part of the problem though, because if you don’t say it just right, the meaning changes. So, as with Polish, when you mangle the language, you will only achieve incomprehension, whereas with, say, English, if a foreigner mangles the language, you can often winnow some sense out of it.

However, despite the fact that written Finnish can be easily pronounced, when learning Finnish, as with Korean, it is as if you must learn two different languages – the written language and the spoken language. A better way to put it is that there is “one language for writing and another for speaking.” You use different forms depending on whether you are conversing or putting something on paper.

Some pronunciation is difficult. The contrast between short and long vowels and consonants is particularly troublesome. Check out these minimal pairs:

sydämellä sydämmellä

jollekin jollekkin

A problem for the English speaker coming to Finnish would be the vocabulary, which is alien to the speaker of an IE language. Finnish language learners often find themselves looking up over half the words they encounter. Obviously, this slows down reading quite a bit!

In the grammar, the partitive case and potential mood can be difficult. Here is an example of how Finnish infinitives combine with various cases to form words:

I A-Infinitive
Base form mennä

II E-Infinitive
Active inessive    mennessä
Active instructive mennen
Passive inessive   mentäessä

III MA-Infinitive
Inessive            menemässä
Elative             menemästä
Illative            menemään
Adessive            menemällä
Abessive            menemättä
Active instructive  menemän
Passive instructive mentämän

Verbs in Finnish

Finnish verbs are very regular. The irregular verbs can almost be counted on one hand:

juosta käydä olla nähdä tehdä

and a few others. In fact, on the plus side, Finnish in general is very regular.

One easy aspect of Finnish is the way you can build many forms from a base root:

kirj-

kirja – book
kirje – letter
kirjoittaa – to write
kirjailija – writer

As in many Asian languages, there are no masculine or feminine pronouns, and there is no grammatical gender. The numeral system is quite simple compared to other languages. Finnish has a complete lack of consonant clusters. In addition, the phonology is fairly simple.

Finnish is rated 5, extremely hard to learn.

Southern

Estonian has difficulties similar to those of Finnish, since they are closely related. However, Estonian is more irregular than Finnish. In particular, the very regular agglutination system described in Finnish seems to have gone awry in Estonian. Estonian has 14 cases, including strange cases such as the abessive, adessive, elative and inessive. On the other hand, all of these cases can simply be analyzed as the genitive case plus a single unvarying suffix for each case. In addition, there is no gender, so the only things you have to worry about when forming cases are singular and plural.
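The "genitive plus one unvarying suffix" analysis can be sketched as a lookup table. The suffix list below is my own summary of standard Estonian grammar, not something taken from this article, and it covers the eleven cases other than nominative, genitive and partitive. The decline function is hypothetical, and you still have to supply the genitive form yourself, which is where the irregularity lives.

```python
# Assumption: suffix table written from general knowledge of Estonian
# grammar; the genitive stem must be supplied by hand.
SUFFIXES = {
    "illative": "sse",    # into
    "inessive": "s",      # in
    "elative": "st",      # out of
    "allative": "le",     # onto
    "adessive": "l",      # on / at
    "ablative": "lt",     # off
    "translative": "ks",  # becoming
    "terminative": "ni",  # as far as
    "essive": "na",       # as
    "abessive": "ta",     # without
    "comitative": "ga",   # with
}


def decline(genitive, case):
    """Attach the case suffix to a genitive form (singular or plural)."""
    return genitive + SUFFIXES[case]


# linn (town): genitive singular "linna", genitive plural "linnade"
print(decline("linna", "inessive"))    # linnas
print(decline("linna", "abessive"))    # linnata
print(decline("linnade", "inessive"))  # linnades
```

Because the same suffix attaches to both the singular and plural genitive, singular versus plural is the only choice left to make.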

Estonian has a strange mood form called the quotative, often translated as “reported speech.”

tema on – he/she/it is

tema olevat – it’s rumored that he/she/it is or he/she/it is said to be

This mood is often used in newspaper reporting and is also used for gossip.

Estonian has an astounding 25 diphthongs. It also has three different degrees of phonemic length, which is rare among the world’s languages. There are short, long and extra-long vowels and consonants.

lina – linen (short n)
linna – the town’s (long n, written as nn)
`linna – into the town (extra-long n, not written out!)

There are differences in the pronunciation of the three forms above, but in rapid speech, they are hard to hear, though native speakers can make them out. Difficulties are further compounded in that extra-long sonorants (m, n, ng, l, and r) and vowels are not written out. All in all, phonemic length can be a problem in Estonian, and foreigners never seem to get it completely down.

Estonian pronunciation is not very difficult, though the õ sound can cause problems. However, Estonian has completely lost the vowel harmony system it inherited from Finnish, resulting in words that seem very hard to pronounce.

At least in written form, Estonian is not as complex as Finnish. Estonian can be seen as an abbreviated and modernized form of Finnish. The grammar is also like a simplified version of Finnish grammar and may be much easier to learn.

Estonian is rated 4.5, very to extremely difficult.

Sami Eastern

Skolt Sami’s Latinization is often listed as one of the worst Latinizations around. The rest of the language is quite similar to, and as difficult as, Finnish.

Skolt Sami gets a 5 rating, extremely hard to learn.

Ugric Hungarian

It’s widely agreed that Hungarian is one of the hardest languages on Earth to learn. Even language professors agree. The British Diplomatic Corps did a study of the languages that its diplomats commonly had to learn and concluded that Hungarian was the hardest. Hungarian grammar is maddeningly complex, and Hungarian is often listed on craziest grammar lists. For one thing, there are many different forms for a single word via word modification. This enables the speaker to make his intended meaning very precise. Looking at nouns, there are about 257 different forms per noun.

Hungarian is said to have from 24 to 35 different cases (there are charts available showing 31 cases), but the actual number may only be 18. Nearly everything in Hungarian is inflected, similar to Lithuanian or Czech. Similar to Georgian and Basque, Hungarian has polypersonal agreement, albeit to a lesser degree than those two languages. There are many irregularities in inflections, and even Hungarians have to learn how to spell all of these in school and have a hard time learning this.

The case distinctions alone can create many different words out of one base form. For the word house, we end up with 31 different words using case forms:

házba – into the house
házban – in the house
házból – from [within] the house
házra – onto the house
házon – on the house
házról – off [from] the house
házhoz – to the house
házig – until/up to the house
háznál – at the house
háztól – [away] from the house
házzá – Translative case, where the house is the end product of a transformation, such as They turned the cave into a house.
házként – as the house, which could be used if you acted in your capacity as a house or disguised yourself as one. He dressed up as a house for Halloween.
házért – for the house, specifically things done on its behalf or done to get the house. They spent a lot of time fixing things up (for the house).
házul – Essive-modal case. Something like “house-ly” or in the way/manner of a house. The tent served as a house (in a house-ly fashion).

And we do have some basic cases:

ház – Nominative. The house is down the street.
házat – Accusative. The ball hit the house.
háznak – Dative. The man gave the house to Mary.
házzal – Similar to instrumental, but more similar to English with. Refers to both instruments and companions.

The genitive takes 12 different declensions, depending on person and number:

házam – my house
házaim – my houses
házad – your house
házaid – your houses
háza – his/her/its house
házai – his/her/its houses
házunk – our house
házaink – our houses
házatok – your (plural) house
házaitok – your (plural) houses
házuk – their house
házaik – their houses
egyház – church, as in the Catholic Church. (Literally one-house)

In addition, the genitive suffixes to the possession, which is not how the genitive works in IE.

ember – man/person
ház – house
a(z) – the

az ember háza – the man’s house (Lit. the man house-his)
a házam – my house (Lit. the house-my)
a házad – your house (Lit. the house-your)

There are also very long words such as this:

megszentségteleníthetetlenségeskedéseitekért… for your (you all possessive) repeated pretensions at being impossible to desecrate

Because Hungarian is an agglutinative language, that word is made up of many small parts of words, or morphemes.

The preposition is stuck onto the word in this language, and this will seem strange to speakers of languages with free prepositions.

Hungarian is full of synonyms, similar to English.

For instance, there are 78 different words that mean to move: halad, jár, megy, dülöngél, lépdel, botorkál, kódorog, sétál, andalog, rohan, csörtet, üget, lohol, fut, átvág, vágtat, tipeg, libeg, biceg, poroszkál, vágtázik, somfordál, bóklászik, szedi a lábát, kitér, elszökken, betér, botladozik, őgyeleg, slattyog, bandukol, lófrál, szalad, vánszorog, kószál, kullog, baktat, koslat, kaptat, császkál, totyog, suhan, robog, rohan, kocog, cselleng, csatangol, beslisszol, elinal, elillan, bitangol, lopakodik, sompolyog, lapul, elkotródik, settenkedik, sündörög, eltérül, elódalog, kóborol, lézeng, ődöng, csavarog, lődörög, elvándorol, tekereg, kóvályog, ténfereg, özönlik, tódul, vonul, hömpölyög, ömlik, surran, oson, lépeget, mozog and mozgolódik.

Only about five of those terms are archaic and seldom used; the rest are in current use. However, to be fair, a Hungarian native speaker might only recognize half of those words.

In addition, while most languages have names for countries that are pretty easy to figure out, in Hungarian even the names of nations are hard because they have changed the names so much. Italy becomes Olaszország, Germany becomes Németország, etc.

As in Russian and Serbo-Croatian, word order is relatively free in Hungarian. It is not completely free as some say but rather is governed by a set of rules. The problem is that as you reorder the words in a sentence, you say the same thing but the meaning changes slightly in terms of nuance. Further, there are quite a few dialects in Hungarian. Native speakers can pretty much understand them, but foreigners often have a lot of problems. Accent is very difficult in Hungarian due to the bewildering number of rules used to determine accent. In addition, there are exceptions to all of these rules. Nevertheless, Hungarian is probably more regular than Polish.

Hungarian spelling is also very strange for non-Hungarians, but at least the orthography is phonetic. Nevertheless, the orthography often makes it onto worst orthographies lists.

Hungarian phonetics is also strange. One of the problems with Hungarian phonetics is vowel harmony. Since you stick morphemes together to make a word, the vowels that you have used in the first part of the word will influence the vowels that you will use to make up the morphemes that occur later in the word. The vowel harmony gives Hungarian a “singing effect” when it is spoken. The ty, ny, sz, zs, dzs, dz, ly, cs and gy sounds are hard for many foreigners to make. The á, é, ó, ö, ő, ú, ü, ű, and í vowel sounds are not found in English.
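Here is a rough sketch of how vowel harmony picks a suffix, using the in-the-house suffix that shows up as házban above. The harmonize function is hypothetical, and the rule it uses (any back vowel in the word selects the back variant) is a deliberate simplification of my own. It ignores the well-known exceptions, such as words with only i or í that still take back suffixes, and loanwords with mixed vowels.

```python
# Assumption: simplified harmony rule; real Hungarian has exceptions
# (neutral-vowel words, mixed-vowel loanwords) that this ignores.
BACK_VOWELS = set("aáoóuú")


def harmonize(word, back_form, front_form):
    """Attach the back variant if the word has any back vowel, else front."""
    if any(ch in BACK_VOWELS for ch in word.lower()):
        return word + back_form
    return word + front_form


print(harmonize("ház", "ban", "ben"))    # házban  (in the house)
print(harmonize("kert", "ban", "ben"))   # kertben (in the garden)
print(harmonize("ház", "ból", "ből"))    # házból  (from within the house)
```

The same suffix therefore has two spellings, and the learner has to pick the right one on the fly for every morpheme added to the word.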

Verbs are marked for object (indefinite, definite and person/number), subject (person and number), tense (past, present and future), mood (indicative, conditional and imperative), and aspect (frequency, potentiality, factitiveness, and reflexiveness).

Elmentegettethetnélek. I could make others save you occasionally (on a disk).

Verbs change depending on whether the object is definite or indefinite.

Olvasok könyvet. I read a book. (indefinite object)

Olvasom a könyvet. I read the book. (definite object)

As noted in the introduction to the Finno-Ugric section, you need to know quite a bit of Hungarian grammar to be able to express yourself on a basic level. For instance, in order to say:

I like your sister.

you will need to understand the following Hungarian forms:

  1. verb conjugation and definite or indefinite forms
  2. possessive suffixes
  3. case
  4. how to combine possessive suffixes with case
  5. word order
  6. explicit pronouns
  7. articles

It’s hard to say, but Hungarian is probably harder to learn than even the hardest Slavic languages like Czech, Serbo-Croatian and Polish. At any rate, it is generally agreed that Hungarian grammar is more complicated than Slavic grammar, which is pretty impressive as Slavic grammar is quite a beast.

Hungarian is rated 5, extremely hard to learn.

Sino-Tibetan Sinitic Chinese Mandarin

It’s fairly easy to learn to speak Mandarin at a basic level, though the tones can be tough. This is because the grammar is very simple – short words, no case, gender, verb inflections or tense. But with Japanese, you can keep learning, and with Chinese, you often tend to hit a wall, often because the syntactic structure is so strangely different from English (isolating).

Actually, the grammar is harder than it seems. At first it seems simple, like a simplified English. No word is capable of declension, and there is no tense, case or number, nor are there articles. But the simplicity makes it difficult. No tense means there is no easy way to mark time in a sentence. Furthermore, tense is not as easy as it seems. Sure, there are no verb conjugations, but instead you must learn some particles and special word orders that are used to mark tense. Mandarin has 12 different adverbs for which there is no good English translation.

Once you start digging into Chinese, there is a complex layer under all the surface simplicity. There are such things as aspect, serial verbs, a complex classifier system, syntax marked by something called topic-prominence, a strange form called the detrimental passive, preposed relative clauses, use of verbs rather than adverbs to mark direction, and all sorts of strange stuff. Verb complements can be baffling, especially potential and directional complements. The 把, 是 and 的 constructions can be very hard to understand.

The topic-prominence is interesting in that only a few major languages have topic-comment syntax, and most of those are Oriental languages with a lot of Chinese borrowing. Topicalization is not marked morphologically.

There are sentences where the entire meaning changes with the addition of a single character. Chinese sentences are SVO (Subject – Verb – Object) at their base, but that is a bit of an illusion. A sentence that causes you to discuss time duration makes you repeat the verb after the direct object – SVOVT (T = time phrase). In the case of topicalization, sentences can have the structure of OSV (Object – Subject – Verb). Relative clauses and all subordinate clauses come before the noun they modify. In other words:

English: The man who always wore red walked into the room.
Chinese: Who always wore red the man walked into the room.

The relative clause in the sentences above is marked in bold.

In Chinese, the prepositional phrase comes between the subject and the verb:

English: The man hit the ball into the yard.
Chinese: The man into the yard hit the ball.

The prepositional phrase is bolded in the sentences above.

In Chinese, adjectives are actually stative verbs as in Nahuatl and Lakota.

那个热的菜很好吃。
Nàgè rède cài hěnhǎochī.
The it is hot food is good to eat.
The hot food is delicious.

The symbol 的 turns food hot into food it is hot, an attributive verb. It means something like to be.

There are dozens of words called particles which shade the meaning of a sentence ever so slightly.

Chinese phonology is not as easy as some say. There are way too many instances of the zh, ch, sh, j, q, and x sounds in the language such that many of the words seem to sound the same. There is a distinction between aspirated and nonaspirated consonants. There is also the presence of odd retroflex consonants.

Chinese orthography is probably the hardest orthography of any language. The alphabet uses symbols, so it’s not even a real alphabet. There are at least 85,000 symbols and actually many more, but you only need to know about 3,000-5,000 of them, and many Chinese don’t even know 1,000. To be highly proficient in Chinese, you need to know 10,000 characters, and probably less than

In addition, the characters have not been changed in 3,000 years, and the alphabet is at least somewhat phonetic, so we run into a serious problem of lack of a spelling reform.

The Communists tried to simplify the system (simplified Mandarin) but instead of making the connections between the phonetic aspects of characters more sensible by decreasing their number and increasing their regularity (they did do this somewhat but not enough), they simply decreased the number of strokes needed for each symbol, typically without dealing with the phonetic aspect at all. The simplification did not work well, so now you have a mixture of two different types of written Chinese – simplified and traditional.

In addition to all of this, Chinese borrowed a lot from the Japanese symbolic alphabet a full 1,000 years after it had already been developed and had not undergone a spelling reform, adding insult to injury.

Even leaving the characters aside, the stylistic and literary constraints required to write Chinese in an eloquent or formal (literary) manner would make your head swim. And just because you can read Chinese does not mean that you can read Classical Chinese prose. It’s as if it’s written in a different language – actually, it is technically a different language similar to Middle English or Old English. However, few Middle English or Old English texts are read anymore, and Classical Chinese is still widely read.

However, the orthography is at least consistent. 9

Writing the characters is even harder than reading them. One wrong dot or wrong line either completely changes the meaning or turns the symbol into nonsense.

It’s a real problem when you encounter a symbol you don’t know because there is no way to sound out the word. You are really and truly lost and screwed. There is a clue at the right side of the symbol, but it is not always accurate. You need to learn quite a bit of vocabulary just to speak simple sentences.

Similarly, a dictionary is not necessarily helpful when trying to read Chinese. You can have a Chinese sentence in front of you along with a dictionary, and the sentence still might not make sense even after looking it up in the dictionary.

Some Chinese Muslims write Chinese using an Arabic script. This is often considered to be one of the worst orthographies of all.

The tones are often quite difficult for a Westerner to pick up. If you mess up the tones, you have said a completely different word. Often foreigners who know their tones well nevertheless do not say them correctly, and hence, they say one word when they mean another. However, compared to other tone systems around the world, the tonal system in Chinese is comparatively easy.

A major problem with Chinese is homonyms. To some extent, this is true in many tonal languages. Since Chinese uses short words and is disyllabic, there is a limited repertoire of sounds that can be used. At a certain point, all of the sounds are used up, and you are into the realm of homophones.

Tonal distinctions are one way that monosyllabic and disyllabic languages attempt to deal with the homophone problem, but it’s not good enough, since Chinese still has many homophones, and meaning is often discerned by context, stress, rhythm and intonation. Chinese, like French and English, is heavily idiomatic.

It’s little known, but Chinese also uses different forms (classifiers) to count different things, like Japanese.

There is zero common vocabulary between English and Chinese, so you need to learn a whole new set of lexical forms.

In addition, nouns often show relatedness or hierarchy. For instance, in English, you can simply say my brother or my sister, but in Chinese, you cannot do this. You have to indicate whether you are speaking of an older or younger sibling.

mei mei – younger sister
jie jie – older sister
ge ge – older brother
di di – younger brother
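One way to picture this is as a lookup table where relative age is a required part of the key. The sketch below uses the four terms listed above; the table layout and the sibling_term function are just my own illustration.

```python
# The four terms come from the list above; the keys are an illustrative
# assumption about how to organize them.
SIBLING_TERMS = {
    ("female", "younger"): "mei mei",
    ("female", "older"): "jie jie",
    ("male", "older"): "ge ge",
    ("male", "younger"): "di di",
}


def sibling_term(sex, relative_age):
    """Look up a sibling term; both sex and relative age are required."""
    return SIBLING_TERMS[(sex, relative_age)]


print(sibling_term("female", "older"))  # jie jie
print(sibling_term("male", "younger"))  # di di
```

There is no entry for a plain "brother" or "sister", which is exactly the problem an English speaker runs into.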

Mandarin scored very high on a weirdest languages study.

On the positive side, Chinese grammar is fairly regular, and word derivation and compound words are sensible, so the meaning can be determined by looking at the word. In other languages, compound words are not necessarily so obvious.

Many agree that Chinese is the hardest to learn of all of the major languages. A recent survey of language professors rated Chinese as the hardest language on Earth to learn.

Mandarin gets a 5.5 rating for nearly hardest of all.

However, Cantonese is even harder to learn than Mandarin. Cantonese has eight tones to Mandarin’s four, and in addition, they continue to use a lot of the older traditional Chinese characters that were superseded when China moved to a simplified script in 1949. Furthermore, since non-Mandarin characters are not standardized, Cantonese cannot be written down as it is spoken.

In addition, Cantonese has verbal aspect, possibly up to 20 different varieties. Modal particles are difficult in Cantonese. Clusters of up to three sentence-final particles are very common. 我食咗飯 and 我食咗飯架啦喎 are both grammatical for I have had a meal, but the particles add the meaning of I have already had a meal, answering a question, or even imply I have had a meal, so I don’t need to eat anymore.

Cantonese gets a 5.5 rating, nearly hardest of all.

Min Nan is also said to be harder to learn than Mandarin, as it has a more complex tone system, with five tones on three different levels. Even many Taiwanese natives don’t seem to get it right these days, as it is falling out of favor, and many fewer children are being raised speaking it than before.

Min Nan gets a 5.5 rating, nearly hardest of all.

A recent 15 year survey out of Fudan University utilizing both the departments of Linguistics and Anthropology looked at 579 different languages in 91 linguistic families in order to try to find the most complicated language in the world. The result was that a Wu language dialect (or perhaps a separate language) in the Fengxian district of southern Shanghai (Dônđän Wu) was the most phonologically complex language of all, with 20 separate vowels (Wang 2012). The nearest competitor was Norwegian with 16 vowels.

Dônđän Wu gets a 5.5 rating, nearly hardest of all.

Classical Chinese is still read by many Chinese people and Chinese language learners. Unless you have a very good grasp on modern Chinese, Classical Chinese will be completely wasted on you. Classical Chinese is much harder to read than modern Chinese.

Classical Chinese covers an era extending over 3,000 years, and to attain a reading fluency in this language, you need to be familiar with all of the characters used during this period along with all of the literature of the period so you can understand all the allusions. Even with a knowledge of Classical Chinese, you need to read it in context. If you are good at Classical Chinese and someone throws you a random section of it, it will take you a good amount of time to figure it out unless you know context.

The language is much more to the point than Modern Chinese, but this is not as good as it sounds. This simplicity leaves room for ambiguity, and context plays an important role. A joke about some obscure historical or literary anecdote will be lost on you unless you know what it refers to. For reading modern Chinese, you will need at least 5,000 characters, but even then, you will still need a dictionary. With Classical Chinese, there is no upper limit on the number of characters you need to know. The sky is the limit.

Classical Chinese gets a 6 rating, hardest of all.

Tibeto-Burman Qiangic Northern Qiang

In Qiang, a language of Sichuan Province in China, not only are there rhotic vowels, which are present in only

ʀuɑ +e˞ > ʀuɑ˞kʰ me + w ˞> mw

Rhotic vowels are found in US English – Unstressed ɚ: standard, dinner, Lincolnshire, editor, measure, martyr.

Qiang also has a very bad romanization, so bad that the Qiang will not even use it. Voiced consonants are written by adding a vowel to the symbol for the voiceless consonant. It has long and short vowels, but these are not represented in the system.

Qiang gets a 5 rating, extremely hard to learn.

Western Tibeto-Burman Bodish Central Bodish Central

Tibetan probably has one of the least rational orthographies of any language. The orthography has not changed in ~1,000 years while the language has gone through all sorts of changes. A language learner in Tibet can get by using phonetic spelling. The problem comes when you try to spell using the Classical Alphabet. For instance:

Srong rtsan Sgam po (written) soŋtsɛn ɡampo (spoken)

bsgrubs (written)

d`up (spoken)

While the orthography is etymological and completely outdated, it is quite predictable.

Tibetan gets a 5 rating, extremely hard to learn.

Southern

Dzongkha, the official language of Bhutan, has some pretty wild phonology, in addition to having the Tibetan writing system, this time using Bhutanese forms of the Tibetan script.

It contrasts all of the following: s, , ʰs, ʰsʰ, ts, ʰts, tsʰ, z, ʱz, dz, ʱdz, ⁿsʰ, ᵐtsʰ, ⁿtsʰ, ⁿdz, ᵖts, ᵖtsʰ, ᵖtsʷʰ, and ᶲs, and in addition it has four tones, but there is no single word that is distinguished by tone only. On top of that, there are 22 different vowels.

Dzongkha gets a 5 rating, extremely hard to learn.

Austroasiatic Mon-Khmer Vietic

Vietnamese is also hard to learn because to an outsider, the tones seem hard to tell apart. Therefore, foreigners often make themselves difficult to understand by not getting the tone precisely correct. It also has “creaky-voiced” tones, which are very hard for foreigners to get a grasp on.

Vietnamese grammar is fairly simple, and reading Vietnamese is pretty easy once you figure out the tone marks. Words are short as in Chinese. However, the simple grammar is relative, as you can have 25 or more forms just for I, the 1st person singular pronoun. In addition, the Latin orthography is said to be quite bad. It was invented by missionaries a few centuries ago, and it has never made much sense.

Vietnamese gets a 5 rating, extremely hard to learn.

Mon-Khmer Khmer

Khmer has a reputation for being hard to learn. I understand that it has one of the most complex honorifics systems of any language on Earth. Over a dozen different words mean to carry depending on what one is carrying. There are several different words for slave depending on who owned the slave and what the slave did. There are 28-30 different vowels, including sets of long and short vowels and long and short diphthongs. The vowel system is so complicated that there isn’t even agreement on exactly what it looks like. Khmer learners, especially speakers of IE languages, often have a hard time producing or even distinguishing these vowels.

Speaking it is not so bad, but reading and writing it is pretty difficult. For instance, you can put up to five different symbols together in one complex symbol. The orthographic script is even worse than the Thai one. There are actually rules to this mess, but no one seems to know what they are.

Khmer gets a 4.5 rating, very to extremely hard.

Bahnaric North Bahnaric West Sedang-Todrah Sedang

Sedang, a language of Vietnam, has the highest number of vowel sounds of any language on Earth, at 55 distinct vowel sounds.

Sedang gets a 5 rating, extremely hard to learn.

Hmong-Mien Hmongic Chuanqiandian

Hmong is widely spoken in this part of California, but it’s not easy to learn. There are eight tones, and they are not easy to figure out. It’s not obviously related to any other major language but the obscure Mien.

It has some very strange consonants called voiceless nasals. We have them in English as allophones – the m in small is voiceless, but in Hmong, they put them at the front of words – the m in the word Hmong is voiceless. These can be very hard to pronounce.

The romanization is widely criticized for being a lousy one, but the Hmong use it anyway.

Hmong gets a 5 rating, extremely hard to learn.

Austro-Tai Austronesian Tsouic

Tsou is a Taiwanese aborigine language spoken by about 2,000 people in Taiwan. It has the odd feature whereby the underlying glides y and w turn into or surface as non-syllabic mid vowels e̯ and o̯ in certain contexts:

jo~joskɨ -> e̯oˈe̯oskɨ = fishes

Tsou is also ergative like most Formosan languages. Tsou is the only language in the world that has no prepositions or anything that looks like a preposition. Instead it uses nouns and verbs in the place of prepositions. Tsou allows more potential consonant clusters than most other languages. About 1/2 of all possible CC clusters are allowed.

Tsou has an inclusive/exclusive distinction in the 1st person plural and a very strange visible and non-visible distinction in the 3rd person singular and plural. Both adjectives and adverbs can turn into verbs and are marked for voice in the same way that verbs are. Verbs are extensively marked for voice. Nouns are marked for a variety of odd cases, often referring to perception, (visible/invisible) person, and place deixis.

‘e – visible and near speaker
si/ta – visible and near hearer
ta – visible but away from speaker
‘o/to – invisible and far away, or newly introduced to discourse
na/no ~ ne – non-identifiable and non-referential (often when scanning a class of elements)

Tsou gets a 5 rating, extremely hard to learn.

Malayo-Polynesian Malayo-Chamic Malayic Malay

Bahasa Indonesia is an easy language to learn. For one thing, the grammar is dead simple. There are only a handful of prefixes, only two of which might be seen as inflectional. There are also several suffixes. Verbs are not marked for tense at all. And the sound system of these languages, in common with Austronesian in general, is one of the simplest on Earth, with only two dozen phonemes. Bahasa Indonesia has few homonyms, homophones, homographs, or heteronyms. Words in general have only one meaning.

Though the orthography is not completely phonetic, it only has a small number of nonphonetic exceptions. The orthography is one of the easiest on Earth to use.

The system for converting words into either nouns or verbs is regular. To make a plural, you simply repeat a word, so instead of saying pencils, you say pencil pencil.
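The plural rule is simple enough to write as a one-line function. This is a minimal sketch: pluralize is a hypothetical name, and the hyphen between the two copies follows standard Indonesian spelling rather than anything stated above.

```python
def pluralize(noun):
    """Form a plural by full reduplication, joined with a hyphen."""
    return f"{noun}-{noun}"


print(pluralize("pensil"))  # pensil-pensil (pencils)
print(pluralize("buku"))    # buku-buku (books)
```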

Bahasa Indonesia gets a 1.5 rating, extremely easy to learn.

Malay is only easy if you learn the standard spoken form or one of the creoles. Learning the literary language is quite a bit more difficult. However, the Jawi script, which is Malay written in Arabic script, is often considered to be perfectly awful.

Malay gets a 2 rating for moderately easy.

Philippine Greater Central Philippine Central Philippine Tagalog

However, Tagalog is much harder than Malay or Indonesian. Compared to many European languages, Tagalog syntax, morphology and semantics are often quite different. Also, Tagalog is typically spoken very fast. Unlike Malay, verbs conjugate quite a bit in Tagalog. The main idea of Tagalog grammar is something called focus. Once you figure that out, the language gets pretty easy, but until you understand that concept, you are going to have a hard time.

Everything is affixed in Tagalog.

However, articles and the creation of adjectives from nouns are very easy.

Compare:

ganda – beauty (noun)
maganda – beautiful (adjective)

Tagalog gets a 4 rating, very difficult.

Central-Eastern Malayo-Polynesian Eastern Malayo-Polynesian Oceanic Central-Eastern Oceanic Remote Oceanic Central Pacific East Fijian-Polynesian Polynesian Nuclear East Central Tahitic

Maori and other Polynesian languages have a reputation for being quite easy to learn. The main problem for English speakers is that the sentence structure is backwards compared to English. In addition, macrons can cause problems.

One problem with Maori is dialects. The dialects are so diverse that there are multiple words for the same thing (Swiss German has a similar issue, with up to 50 words for each common household item, since nearly every major dialect has its own word for common objects):

ngongi, noni, koki, wai                    water
whiri, rarangi, hiri                       to plait, to twist, to weave
pai, maitai                                good
tu, , tutehu, mātika                       to stand
mau, mou                                   to hold
pau, pou                                   to be exhausted
ika, tohorā                                whale
ika, ngohi                                 fish
kāwei, kāwai                               line
ori, kori, keukeu, koukou, neke, nuku      to move
haere, hara, here, horo, whano             to go, to come
hara, hapa,                                to be wrong
kōrerorero, wānanga, rūnanga               to discuss
tohunga, tahunga                           priest
matikuku, maikuku                          finger nail
kanohi, konohi, mata, whatu, kamo, karu    eye, face

Entire Maori sentences can be written with vowels only.

E uu aau? Are yours firm?

I uaa ai. It rained as usual.

I ui au ‘i auau aau?’ E uaua! It will be difficult/hard/heavy!

On the plus side, the pronunciation is simple, and there is no gender. The language is as regular as Japanese. No Polynesian language has more than 16 sounds, and they all lack tones. They all have five vowels, which can be either long or short. A consonant must be followed by a vowel, so there are no consonant clusters. All consonants are easy to pronounce.

Maori gets a 3 rating, average difficulty.

Marquesic

Hawaiian is a pretty easy language to learn. It is easy to pronounce, has a simple alphabet, lacks complex morphology and has a fairly simple syntax.

Hawaiian gets a 2 rating, very easy to learn.

North and Central Vanuatu East Santo North

Sakao is a very strange language spoken by 4,000 people in Vanuatu. It is a polysynthetic Austronesian language, which is very unusual. It allows extreme consonant clusters. Sakao has an incredible seven degrees of deixis. The language has an amazing four numbers: singular, dual, paucal and plural. The neighboring language Tolomako has singular, dual, trial and plural. The trial form is very odd. Sakao’s paucal is derived from Tolomako’s trial:

jørðœl they, from three to ten

jørðœl løn the five of them (Literally, they three, five)

All nouns are always in the singular except for kinship forms and demonstratives, which only display the plural:

ðjœɣ my mother/aunt -> rðjœɣ my aunts

walðyɣ my child -> raalðyɣ my children

It has a number of nouns that are said to be “inalienably possessed”, that is, whenever they occur, they must be possessed by some possessor. These often take highly irregular inflections:

Sakao 	  English
œsɨŋœ-ɣ   my mouth
œsɨŋœ-m   thy mouth
ɔsɨŋɔ-n   his/her/its mouth
œsœŋ-...  ...'s mouth	

uly-ɣ 	  my hair
uly-m 	  thy hair
ulœ-n 	  his/her/its hair
nøl-...   ...'s hair

Here, mouth is either œsɨŋœ-, ɔsɨŋɔ- or œsœŋ-, and hair is either uly-, ulœ- or nøl-.

Sakao, strangely enough, may not even have syllables in the way that we normally think of them. If it does have syllables at all, they would appear to consist of at least a vowel, optionally surrounded by any number of consonants.

i (V) thou

Mhɛrtpr. (CCVCCCC) Having sung and stopped singing thou kept silent.

Sakao has a suffix -in that makes an intransitive verb transitive and makes a transitive verb ditransitive. Ditransitive verbs can take two arguments – a direct object and an instrumental.

Mɨjilɨn amas ara./Mɨjilɨn ara amas. He kills the pig with the club/He kills with the club the pig.

Sakao polysynthesis allows compound verbs, each one having its own instrument or object:

Mɔssɔnɛshɔβrɨn aða ɛðɛ. He-shooting-fish-kept-on-walking with-a-bow the-sea. He walked along the sea shooting the fish with a bow.

Sakao gets a 5 rating, extremely hard to learn.

Central-Eastern Oceanic Southeast Solomonic Malaita–San Cristobal Malaita Northern Malaita

Kwaio is an Austronesian language spoken in the Solomon Islands. It has four different forms of number to mark pronouns – not only the usual singular and plural, but also the rarer dual and the very rare paucal. In addition, there is an inclusive/exclusive contrast in the non-singular forms.

For instance:

1 dual inclusive (you and I) 1 dual exclusive (I and someone else, not you)

1 paucal inclusive (you, I and a few others) 1 paucal exclusive (I and a few others)

1 plural inclusive (I, you and many others) 1 plural exclusive (I and many others)

Pretty wild!
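
The first-person forms listed above are simply every combination of three non-singular numbers with two clusivity values. A minimal Python sketch of that grid follows. The labels are plain English descriptions, not actual Kwaio pronouns (which this post does not give), and the function name is made up for illustration.

```python
from itertools import product

# Category labels taken from the list above; these are descriptions,
# not real Kwaio word forms.
NUMBERS = ["dual", "paucal", "plural"]
CLUSIVITY = ["inclusive", "exclusive"]


def first_person_nonsingular():
    """Return every (number, clusivity) combination for the first person."""
    return list(product(NUMBERS, CLUSIVITY))


for number, clusivity in first_person_nonsingular():
    print(f"1 {number} {clusivity}")
```

Add the first person singular, which has no inclusive/exclusive split, and a learner already has seven first-person categories to keep apart.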

Kwaio gets a 5 rating, extremely hard to learn.

Greater Barito East Barito Malagasy

Malagasy, the official language of Madagascar, has a reputation for being even easier to learn than Indonesian or Malay.

Malagasy gets a 1 rating, easiest of all to learn.

Tai-Kadai Kam-Tai Tai Southwestern

Thai is a pretty hard language to learn. There are 75 symbols in the strange script, there are no spaces between words in the script, and vowels can come before, after, above or below consonants in any given syllable. There seem to be many different glyphs for every consonant, but the different glyphs for the same consonant will sometimes change the sound of the neighboring vowel. The orthography is as insensible as that of English, since centuries have gone by with no spelling reforms. In fact, Thai has not changed its system in 1000 years. The wild card of having tone thrown in adds to the insanity.

Consonant pronunciations vary depending on the location of the syllable in the word – for instance, s can change to t. There are many vowels which are spoken but not written. There are many consonants that are pronounced the same – for instance, there are six different t‘s, not counting the s‘s that turn into t‘s. The Thai script is definitely one of the most difficult phonetic scripts. Nevertheless, the Thai script is easier to learn than the Japanese or Chinese character sets. In spite of all of that, the syntax is simple, like Chinese.

There are five tones, including a neutral tone. Tones are determined by a variety of complex things, including a combination of tone marks, the class of consonants, if the syllable ends in a sonorant or a stop and what the tone of the preceding syllable was. Tone marking in the orthography is quite complex.

The vowels are different than in many languages, and there are some unusual diphthongs: eua, euai, aui and uu. There is a contrast between aspirated and unaspirated consonants.

There is a system of noun classifiers for counting various things, similar to Japanese. In addition, common to many Asian languages, there is a complicated honorifics system.

On the plus side, Thai is a regular language, with few exceptions to the rules. However, the rules are quite complex. The syntax is about as complex as that of Chinese, and the grammar is dead simple.

Thai gets a 5 rating, hardest of all to learn.

Lao is very similar to Thai. In fact, it is identical to a Thai language spoken by 16 million people in northeast Thailand called Northeastern Thai. The Lao script is similar to Thai, but it has fewer letters, so there is somewhat less confusion.

Lao gets a 4.5 rating, very to extremely hard to learn.

Kam-Sui

The Kam languages of the Dong people in southwest China were rated by the Fudan University study referenced above under Wu as the 2nd most phonologically complex on Earth (Wang 2012). There are 32 stem initial consonants, including oddities like tɕʰ, pʲʰ, ɕ, kʷʰ, ŋʷ, tʃʰ and tsʰ. Note the many contrasts between aspirated and unaspirated voiceless consonants, including bilabial palatalized stops, labialized velar stops, and alveolar affricates. There are an incredible 64 different syllable finals, and 14 others that occur only in Chinese loans.

There are an astounding 15 different tones, nine in open syllables and six in checked syllables (entering tones). Main tones are high, high rising, high falling, low, low rising, low falling, mid, dipping and peaking. When they speak, it sounds as if they are singing.

Kam gets a 5 rating, extremely hard to learn.

Kra Paha

According to the Fudan University study quoted above, Buyang is the 3rd most phonologically complex language in the world. Buyang is a cluster of 4 related languages spoken by 1,900 people in Yunnan Province, China. Buyang has a completely wild consonant inventory.

It has a full set of both voiced and voiceless plain and aspirated stops, including voiceless uvulars. The contrast between aspirated and plain voiced stops is peculiar. The stop series also has distinctions between palatalized and rounded stops throughout the series. It has a labialized voiceless palatal fricative and a voiceless dental aspirated lateral, unusual sounds. It has four different voiceless aspirated nasals. It has voiceless y and w, more odd sounds. It also has plain and labialized palatal glides.

That is one heck of a wild phonology.

Buyang gets a 5 rating, extremely hard to learn.

Niger-Kordofanian Niger-Congo Atlantic–Congo Kwa Nyo Ga-Dangme

The Kwa language Ga has a reputation for being a tough nut to crack. It is spoken in Ghana by about 600,000 people. It has two tones and engages in a strange behavior called tone terracing that is common to many West African languages. Vowel length is phonemic, and all vowels have three different lengths: short, long and extra long. It also has many sounds that are not in any Western languages.

Ga gets a 5 rating, extremely hard to learn.

Potou-Tano Tano Central Bia Northern

Anyi is a language spoken by 610,000 people in Côte d’Ivoire. It is relatively straightforward as far as African languages go. Probably the hardest part about the language is that it is tonal, with two tones. The phonology does have the unusual ±ATR contrast, which will seem very odd. ATR stands for advanced tongue root, so the language has a contrast between vowels with an advanced tongue root and vowels without one. However, the grammar is pretty regular. There are few confusing phonological processes.

Anyi has a simple tense system, with only present, past and future. There is no aspect, mood or voice marking, and it lacks the noun class systems so common in many African languages. It has a plural marker, but it is often optional.

The syntax does have serial verbs, which will seem odd to Westerners. It distinguishes between relative clauses marked with and subordinate clauses marked with .

Anyi gets a 4 rating, very hard to learn.

Volta-Congo Benue-Congo Bantoid Southern Narrow Bantu Central M Nyika-Safwa

Ndali is a Bantu language spoken by 150,000 people in Malawi and Tanzania. It has many strange tense forms. For instance, in the past tense:

Past tense A: He went just now.
Past tense B: He went sometime earlier today.
Past tense C: He went yesterday.
Past tense D: He went sometime before yesterday.

Future tense is marked similarly:

Future tense A: He’s going to go right away.
Future tense B: He’s going to go sometime later today.
Future tense C: He’s going to go tomorrow.
Future tense D: He’s going to go sometime after tomorrow.

Ndali gets a 5 rating, extremely hard to learn.

S Nguni

Xhosa, a language of South Africa, is quite difficult, with up to nine click sounds. Clicks only exist in one language outside of Africa – the Australian language Damin – and are extremely difficult to learn. Even native speakers mess up the clicks sometimes. Nelson Mandela said he had problems making some of the click sounds in Xhosa. The phonemics in general of Xhosa are pretty wild.

Xhosa gets a 5 rating, extremely hard to learn.

Zulu and Ndebele also have these impossible click sounds. However, outside of click sounds, the phonology of Nguni languages is straightforward. All Nguni languages are agglutinative. These languages also make plurals by changing the prefix of the noun, and the manner varies according to the noun class. If you want to look up a word in the dictionary, you first need to discard the prefix. For instance, in Ndebele,

river – umfula, rivers – imifula, but

stone – ilitshe, stones – amatshe, yet

tree – isihlahla, trees – izihlahla
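
The dictionary lookup step described above (discard the class prefix, then search by what remains) can be sketched roughly in Python. This is only an illustration: the prefix list below contains just the six prefixes visible in the three example pairs, not a real inventory of Ndebele noun classes, and the function name is hypothetical.

```python
# Hypothetical prefix list: only the six prefixes visible in the three
# example pairs above, not a full inventory of Ndebele noun classes.
EXAMPLE_PREFIXES = ["imi", "ili", "ama", "isi", "izi", "um"]


def dictionary_stem(noun):
    """Strip a known class prefix so the noun can be looked up by its stem."""
    # Try longer prefixes first, in case a short prefix is also the
    # beginning of a longer one.
    for prefix in sorted(EXAMPLE_PREFIXES, key=len, reverse=True):
        if noun.startswith(prefix) and len(noun) > len(prefix):
            return noun[len(prefix):]
    return noun  # no known prefix: leave the word unchanged


for word in ["umfula", "imifula", "ilitshe", "amatshe", "isihlahla", "izihlahla"]:
    print(word, "->", dictionary_stem(word))
```

With this toy list, the singular and plural of each example reduce to the same stem, which is the point of discarding the prefix before opening the dictionary. A real tool would need the full set of class prefixes and their sound changes.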

Ndebele gets a 5 rating, hardest of all.

Zulu has pitch accent, tones and clicks. There are nine different pitch accents, four tones and three clicks, but each click can be pronounced in five different ways. However, tones are not marked in writing, so it’s hard to figure out when to use them. Zulu also has depressor consonants, which lower the tone in the vowel in the following syllable. In addition, Zulu has multiple gender – 15 different genders. And some nouns behave like verbs. It also has 12 different noun classes, but 9

Zulu gets a 5 rating, extremely hard to learn.

G Swahili

For unknown reasons, Swahili is generally considered to be an easy language to learn. The US military ranks it 1, among the easiest of all languages to learn. This seems to be the typical perception. Why Swahili is so easy to learn, I am not sure. It’s a trade language, and trade languages are often fairly easy to learn. There’s also a lot of controversy about whether or not Swahili can be considered a creole, but that has not been proven. For the moment, the reasons why Swahili is so easy to learn will have to remain mysterious.

On the down side, Swahili has many noun classes, but they have the benefit of being more or less logical.

Swahili gets a 2 rating, moderately easy.

Khoisan Southern Africa Southern Hua

!Xóõ (Taa), spoken by only 4,200 Bushmen in Botswana and Namibia, is a notoriously difficult Khoisan language replete with click sounds that are all but impossible to comprehend. Taa has anywhere from 130 to 164 consonants, the largest phonemic inventory of any language. Of this vast wealth of sounds, anywhere from 30 to 64 are click sounds. There are five basic clicks and 17 accompanying ones. Speakers develop a lump on their larynx from making the click sounds.

In addition, there are four types of vowels: plain, pharyngealized, breathy-voiced and strident. On top of that, there are four tones. Taa appears on many lists of the wildest phonologies and craziest languages period on Earth.

Taa gets a 5 rating, extremely hard to learn.

Northern

Ju|’hoan, a Khoisan language spoken by 5,000 people in Botswana, has one of the study of the weirdest languages on Earth.

Ju|’hoan gets a 5 rating, extremely hard to learn.

Eskimo-Aleut Eskimo Inuit-Inupiaq

Inuktitut is extremely hard to learn. Inuktitut is polysynthetic-agglutinative, and roots can take many suffixes, in some cases up to 700. Verbs have 63 forms of the present indicative, and conjugation involves 252 different inflections. Inuktitut has the complicated polypersonal agreement system discussed under Georgian above and Basque below. In a typical long Inuktitut text, 9

Inuktituusuungutsialaarungnanngittuaraaluuvunga. I truly don’t know how to speak Inuktitut very well.

You may need to analyze up to 10 different bits of information in order to figure out a single word. However, the affixation is all via suffixes (there are no prefixes or infixes) and the suffixation is extremely regular.

Inuktitut is also rated by linguists as one of the hardest languages on Earth to pronounce. Inuktitut may be as hard to learn as Navajo.

Inuktitut is rated 6, hardest of all.

Kalaallisut (Western Greenlandic) is very closely related to Inuktitut. Look at this sentence:

Aliikusersuillammassuaanerartassagaluarpaalli… However, they will say that he is a great entertainer, but …

That word is composed of 12 separate morphemes. A single word can conceptualize what could be an entire sentence in a non-polysynthetic language.

Kalaallisut is rated 6, hardest of all.

Chukotko-Kamchatkan Northern Chukot

Chukchi is a polysynthetic, agglutinating and incorporating language and is often listed as one of the hardest languages on Earth to learn.

Təmeyŋəlevtpəγtərkən. I have a fierce headache.

There are five morphemes in that word, and three of them are lexical morphemes (nouns or adjectives) incorporated in the word: meyŋ (great), levt (head) and pəγt (ache).

Chukchi gets a 6 rating, hardest of all.

Basque

Basque, of course, is just a wild language altogether. There is an old saying that the Devil tried to learn Basque, but after seven years, he only learned how to say Hello and Goodbye. Many Basques, including some of the most ardent Basque nationalists, tried to learn Basque as adults. Some of them succeeded, but a very large number of them failed. Based on the number that failed, it does seem that Basque is harder for an adult to learn as an L2 than many other languages are. Basque grammar is maddeningly complex and it often makes it onto craziest grammars and craziest language lists.

There are 11 cases, and each one takes four different forms. The verbs are quite complex. This is because it is an ergative language, so verbs vary according to the number of subjects and the number of objects and if any third person is involved.

This is the same polypersonal agreement system that Georgian has above. Basque’s polypersonal system is a polysynthetic system consisting of two verb types – synthetic and analytical. Only a few verbs use the synthetic form.

Three of Basque’s cases – the absolutive (intransitive verb case), the ergative (transitive verb case) and the dative – can be marked via affixes to the verb. In Basque, only present simple and past simple synthetic tenses take polypersonal affixes.

The analytical forms are composed of more than one word, while the synthetic forms are all one word. The analytic verbs are built via the synthetic verbs izan (be), ukan (have) and egin (do).

Synthetic:

d-akar-ki-o-gu = We bring it to him/her. The verb is ekarri (bring).

z-erama-zki-gu-te-n = They took them to us. The verb is eraman (take).

Analytic:

Ekarriko d-i-o-gu = We’ll bring it to him/her. Literally: We will have-bring it to him/her. The analytic verb is built from ukan (have).

Eraman d-ieza-zki-gu-ke-te = They can take them to us. Literally: They can be taking them to us. The analytic verb is built from izan (be).

Most of the analytic verbs require an auxiliary which carries all sorts of information that is often carried on verbs in other languages – tense, mood, sometimes gender and person for subject, object and indirect object.

Jaten naiz. Eat I-am-doing. I am eating.

Jaten nintekeen. Eat I-was-able-to. I could eat.

Eman geniezazkiake. Give we-might-have-them-to-you-male. We might have given them to you.

In the above, naiz, nintekeen and geniezazkiake are auxiliaries. There are actually 2,640 different forms of these auxiliaries!

A language with ergative morphosyntax in Europe is quite a strange thing, and Basque is the only one of its kind. The ergative itself is quite unusual:

Gizona etorri da. The man has arrived.

Gizonak mutila ikusi du. The man saw the boy.

gizon = man, mutil = boy, -a = the

The noun gizon takes a different form depending on whether it is the subject of a transitive or an intransitive verb. In the first sentence it is in the absolutive case (unmarked), while in the second sentence it is in the ergative case (marked by the morpheme -k). If you come from a non-ergative IE language, the concept of ergativity itself is difficult enough to conceptualize, much less trying to actually learn an ergative language. Consequently, any ergative language will automatically be more difficult than a non-ergative one for all speakers of IE languages.

Ergativity also works with pronouns. There are four basic systems:

Nor:           verb has subject only
Nor-Nork:          "    subj. + direct complement
Nor-Nori:          "    subj. + indirect comp.
Nor-Nori-Nork:     "    subj. + indir. + dir. comps.

Some call Basque the most consistently ergative language on Earth.

If you don’t grow up speaking Basque, it’s hard to attain native speaker competence. It’s quite a bit easier to write in Basque than to speak it.

Nevertheless, Basque verbs are quite regular. There are only a few irregularities in conjugations and they have phonetic explanations. In fact, the entire language is quite regular. In addition, most words above the intermediate level are borrowings from large languages, so once you reach intermediate Basque, the rest is not that hard. In addition, pronunciation is straightforward.

Basque is rated 5.5, nearly hardest of all.

References

Dorani, Yakir. Hebrew speaker, Israel. August 2013. Personal communication.

Hewitt, B. G. 2005. Georgian: A Learner’s Grammar, p. 29.

Kim, Yuni. December 16, 2003. Vowel Elision and the Morphophonology of Dominance in Aymara. UC Berkeley.

Kirk, John William Carnegie. 1905. A Grammar of the Somali Language: With Examples in Prose and Verse and an Account of the Yibir and Midgan Dialects, pp. 73-74.

Rogers, Jean H. 1978. Differential Focusing in Ojibwa Conjunct Verbs: On Circumstances, Participants, and Events. International Journal of American Linguistics 44: 167-179.

Wang, Chuan-Chao et al. 2012. Comment on “Phonemic Diversity Supports a Serial Founder Effect Model of Language Expansion from Africa.” Science 335:657.


Revisions to Races of Man Classification

Repost from the old site.

This is the chart from the paper, The Origin of Minnan & Hakka, the So-called “Taiwanese”, Inferred by HLA Study, utilized in this post.

I usually try to be very conservative about adding new races to my races of man post, but sometimes I feel that I am forced to. This article, and in particular the figure above, forced me to make some new splits.

The question was what to do about the Taiwanese people. Not the Taiwan aborigines, but the Hakka and Min Nan people of SE China who settled in Taiwan in the past 400 years. It turns out that they appear to be a discrete race, and that they are linked to Singapore Chinese and the Thai Chinese. In Singapore and Thailand, Chinese form a market-dominant minority.

They are a minority of the population, but they tend to run businesses and be very wealthy. Similar cases are seen in Indonesia and the Philippines, where tiny Chinese minorities of 2-

So the interesting question arises – who exactly are the Chinese minorities of Thailand and Singapore? By genetic studies, we can now see that they are SE Chinese people related to the Min Nan and the Hakka.

The Min Nan and Hakka both speak languages that are called Chinese dialects, but in reality, they are completely separate languages. Both languages are doing fine – Min Nan (Southern Min) with 49 million speakers and Hakka with 34 million speakers.

Min Nan and Hakka both strangely lack official status anywhere, although Southern Min is widely spoken in Taiwan. It’s odd that some of the world’s most widely spoken languages lack official status – Min Nan is the 24th largest language, and Hakka is the 35th largest language, in terms of numbers of speakers.

Both languages are vigorous and are in good shape. Southern Min has a roman script that is fairly widely used. Hakka also has a roman script, but I am not sure how widely it is used.

Southern Min is actually a number of separate languages: Min Nan proper, Amoy,

Here is a map of the various Chinese languages. These are not Chinese dialects, but actual separate languages. Some may be dialects of other Chinese languages though. The main languages are Mandarin, Wu, Cantonese, Min, Xiang, Hakka and Gan. Ping, Hui and Jin are classed above as dialects of those larger languages. Jin is classed as a dialect of Mandarin, but it is actually a separate language with 45 million speakers, making it around the 25th largest language in the world. Min is said to be 5 separate languages, but it is actually many separate languages. The 5 separate recognized languages are Min Nan, Min Dong, Min Zhong, Min Bei and Puxian. Min Nan itself is a number of separate languages. Huizhou, or Hui, is a separate language that is actually a set of related languages. Wu is more than one language.

Ping is traditionally considered to be part of Cantonese, but it is a separate language. Mandarin is also a set of related languages instead of one language. Cantonese is also more than one language. Hakka is also more than one language.

It is nonsense to say someone speaks “Chinese”. There is no such thing as a language called “Chinese”.

Instead, there are various languages in the Chinese language family – at least 14 separate languages, and actually many more. Mandarin is by far the largest of these languages, and most of the smaller languages are suffering under the influence of Mandarin. In addition, the Chinese government favors Mandarin and does not support the other languages much, if at all.

I also split off a group called the Li and another group called the Oroqen based on the chart above.

The Li are a transitional group between the Northern Chinese and the Southern Chinese, though they live on Hainan Island in the far south of China. They speak a Tai-Kadai language called Hlai which has 667,000 speakers. Use is vigorous; the language is doing well, but it is generally not written, although a Roman script exists. Mandarin is used for writing.

The Oroqen are nomadic people who live in far northeastern China and speak a Tungusic tongue. As you can see from the chart, they are closer to the Japanese than to the NE Chinese. There are only 1,200 speakers left out of a small 7,000 population, but there are 800 monolinguals, and use is vigorous by those who speak the language.

They live by hunting and used to practice shamanism. They still lack an official script for their language, but there are radio programs in Oroqen.

The truth is that both the Oroqen people and their language are in poor shape, and most of the blame can be placed on the Communist Chinese regime, even though the regime has also done many good things for the Oroqen. The Cultural Revolution in particular was a period of insanity, stupidity and terror.

An Oroqen Race was added to the NE Asian Major Race due to the extreme divergence of these people. I also added Inner Mongolians to the Mongolian Race inside of NE Asian.

I added the Buyei to the Tai Race within the SE Asian Major Race and created a new race called SE Chinese Race, consisting of Min Nan, Hakka, Singapore Chinese and Thai Chinese. The Buyei live in southern China and northern Vietnam and speak a Tai language that has over 2 million speakers yet has no official status. Buyei language use is vigorous, and it is in good shape.

There is a romanized script, and there are newspapers in the language, but they mostly use Mandarin for writing. The Buyei language is probably made up of a few separate languages, because some of the dialects are not mutually intelligible. The language is very close to the Zhuang language.

The SE Chinese Race really consists of the descendants of the ancient Chinese people known as the Yueh. The Yueh, or Yue, formed a state in southeastern coastal China during the Warring States Period and the Spring and Autumn Period. The state lasted from about 525 BC to 334 BC. The Chinese were already involved in metallurgy and were producing excellent swords during these periods.

The new lineup looks like this:

Northeast Asian Major Race*

Japanese-Korean Race
Southern Japanese Race (Honshu Kinki – Kyushu)
Ryukyuan Race
Ainu Race***
Gilyak Race**
Northern Chinese Race (Northern Chinese – Qiang – Manchu – Hui)
Oroqen Race
Sherpa-Yakut Race
Nepalese Race (Nepali – Newari)
Mongolian Race (Mongolian – Inner Mongolian – Buryat – Kazakh)
Northern Turkic Race (Dolgan – Altai – Shor – Tofalar – Uighur – Chelkan – Soyot – Kumandin Teleut – Hazara)***
Central Asian Race (Kirghiz – Karakalpak – Uzbek – Turkmen)
Tuva Race
Tungus Race (Even – Evenki – Russian Saami)
Siberian Race
Beringian Race** (Chukchi – Aleut – Siberian Eskimo)
Koryak-Itelmen Race
Reindeer Chukchi Race
General Tibetan Race (Tibetan – Lisu – Nu – Karen – Tujia – Hui – Akha – Burmese – Bai – Yizu – Pnar – Mizo)
Bhutanese Race
Siberian Uralic Race (Nentsy – Samoyed – Ket – Mansi – Khanty)
Nganasan Race
Uralic Race (Komi – Mari)
North American Eskimo Race

Southeast Asian Major Race*

Southern Chinese Race (Hmong – Mien – Dong – Henan Han – Yi – Naxi)
Li Race
Southeast China Race (Hakka – Min Nan – Singapore Chinese – Thai Chinese)
South China Sea Race (Filipino – Ami Taiwanese Aborigine – Guangdong Han)
Tai Race (Thai – Lao – Lahu – Aini – Deang – Blang – Shan – Dai – Vietnamese – Muong – Buyei)
Kachin Race (Kachin – Va – Nung – Lu)
General Taiwanese Aborigine Race (Ayatal – Bunun – Yami)
Island SE Asian Race (Paiwan Taiwanese Aborigine – Sea Dayak – Sumatran – Balinese)
Indonesian Race (Sulawesi – Borneo – Lesser Sunda)
Malay Race (Javanese – Sarawak – Malaysia)
Zhuang Race (Senoi – Zhuang – She – Santhal – Ho – Nicobarese)
Austroasiatic Race (Mon – Khmer – Khasi – Nongtrai – Bhoi – Maram – Kynriam – Wajaintia)
Meghalaya NE Indian Race (Khasi – Garo – Lyngngam)
Philippines Negrito Race (Aeta – Ati – Palau Micronesian)
Mamanwa Philippines Negrito Race
Andaman Islands Negrito Race**
Semang Malay Negrito Race***

References

Lin M, Chu CC, Chang SL, Lee HL, Loo JH, Akaza T, Juji T, Ohashi J, Tokunaga K. March 2001. The Origin of Minnan & Hakka, the So-called “Taiwanese”, Inferred by HLA Study. Tissue Antigens 57(3):192-9.

Response To Mike Campbell on Chinese Language Classification

An autodidact named Mike Campbell has issued a long critique of my Chinese language classification. There are problems with his analysis.

First of all, Campbell says we need to defer to the Chinese on what is a dialect and what is a language. But top Sinologists in the West are saying that the Chinese are falling down on the job and not working according to the modern scientific definition of what is a language and what is a dialect. Chinese linguists, like practitioners of Chinese medicine, operate according to a completely different format that is pretty much at odds with the one used in the West and in much of the rest of the world.

One element of this format is the fangyan. Fangyan has many meanings, but in Chinese it tends to mean “dialect,” or better yet, “topolect.” It also tends to mean the speech form of a given county. But the Chinese definition of the word “dialect” differs radically from the definition used by linguists elsewhere in the world. For one thing, questions of intelligibility with other lects are left out of the definition of fangyan.

Chinese linguists also use hua, which means something like “speech.” This tends to be more expansive than fangyan, but at the same time it can apply down to the level of dialect. Examples include Putonghua, Shanghaihua, Beijinghua, etc., but also Pinghua and Tuhua. It tends to be geographically based: the speech of a particular geographical location, though that location can be expansive or very restricted. But this is not the case with Putonghua, which is just “average speech” and is spoken all over China.

The third category is yu. Yu is probably the category that Western linguists would most commonly associate with “language” or even “language family.” Yu only refers to separate languages within Chinese. Outside Chinese, the word wen tends to be used. Examples are Wuyu, Minyu, Huiyu, etc. No one seems to quite know exactly what the Chinese classification is at any given time.

According to Campbell, we must not do anything until the Chinese act first, but they only make a new language maybe once every few years, and they are failing even at that.

Campbell states that Scots and Bavarian are dialects, not languages. He says that Scots is a dialect of English and Bavarian is a dialect of German. However, Ethnologue says that Scots is a separate language and so is Bavarian. The intelligibility of Bavarian and German is only 4

Ethnologue is run by SIL. SIL has been granted the task of assigning all of the new ISO numbers. An ISO number means that a lect has been officially recognized by the world linguistic community as a separate language. So SIL are the linguistic scientists whom the world community has given the task of deciding what is a language and what is not. Campbell is saying that SIL does not know what they are talking about.

Campbell states that mutual intelligibility cannot be determined by talking to speakers and simply asking them whether or not they can understand “those people over there.” According to Campbell, this is inaccurate. He says the only way to determine intelligibility is through scientific testing methods looking for

On Ethnologue’s Mexico page, extensive tests have been done on various lects spoken in small villages determining intelligibility between one lect and another. Intelligibility testing is commonly done by simply sitting a speaker of Lect A down in front of a recorded corpus of Lect B and seeing how much they can understand.

Campbell says that intelligibility testing on human informants is inherently erroneous because as speakers of Close Lect A hear more and more of Close Lect B, they can understand it over a period of time (the exposure factor). This is the problem of interdialectal learning.
Interdialectal learning (the tendency of closely related lects to hear each others’ lects and quickly learn to speak them and hence muddy the waters of intelligibility), trumpeted by Campbell as a reason that intelligibility testing cannot be done on human informants, is regarded by SIL as different from inherent intelligibility. Inherent intelligibility is best regarded as a test of the ability to use the mother tongue. In other words, when two lects are said to be “inherently unintelligible” this appears to be referring to “virgin” speakers who have not yet had the opportunity to learn each other’s dialects. Similarly, members of Lect A may simply be bilingual in Lect B, which also invalidates intelligibility testing. However, measures have already been developed to determine bilingualism and the degree of it. A favorite one is SLOPE. SRT is also used in bilingualism testing. Like other intelligibility testing instruments, they have been subjected to tests for reliability and validity over the years. Further, testing has evolved to the point where we can begin to ferret out bilingualism from inherent intelligibility. In Casad 1974 the author describes testing done on speakers of Mazatec, a Mexican Indian language. Intelligibility testing was done to see how well they understood Huautla, a related language. Three female speakers had scores in the 50-6 At any rate, in the survey, the figures were averaged together so that Mazatec speakers had 7 Campbell also throws out a red herring in the notion that certain members of a group may simply refuse to hear the language of another group and insist that they do not understand it. Although existent, this problem has little relevance in intelligibility testing. SIL does testing with cross sections of communities. Furthermore, SIL notes that intelligibility is typically distributed evenly across a community with regard to sex, class and age. 
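The averaging of survey scores described above can be sketched in a few lines of Python. This is only an illustration, not SIL's actual procedure: the informant labels and scores are made-up placeholders, and the idea of flagging unusually high scorers as possible bilinguals (using a cutoff of `k` standard deviations above the mean) is an assumption added here for the example.

```python
from statistics import mean, stdev

def summarize(scores):
    """Return (mean, sample standard deviation) of per-informant scores."""
    values = list(scores.values())
    return mean(values), stdev(values)

def flag_high_scorers(scores, k=1.5):
    """Return informants scoring more than k sample SDs above the mean.

    Assumption for illustration only: an unusually high score may point to
    acquired bilingualism rather than inherent intelligibility.
    """
    avg, sd = summarize(scores)
    return sorted(name for name, s in scores.items() if s > avg + k * sd)

# Hypothetical comprehension scores (percent correct) for five informants.
community = {"a": 60.0, "b": 62.0, "c": 58.0, "d": 61.0, "e": 95.0}

avg, sd = summarize(community)
print(f"mean={avg:.1f} sd={sd:.1f} flagged={flag_high_scorers(community)}")
```

A narrow standard deviation across a cross section of the community suggests that the average is a fair summary; a single very high scorer is the kind of case a bilingualism measure would then be used to examine.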
The SDs for inherent intelligibility in a community are narrow, less than 1 This should throw out the notion that females, the young or the old, the wealthy or the poor, will automatically give us false data on intelligibility. Campbell hints that intelligibility is poorly defined. However, SIL has listed a hierarchy of intelligibility. SIL says that intelligibility below 7 Campbell recommends throwing out all intelligibility testing with informants as inherently inaccurate and focusing instead on measures of language similarity. However, SIL notes that linguistic similarity is not an adequate single predictor of intelligibility. For instance, testing in the Philippines revealed pairs of lects with vocabulary similarity of 52, 66, 72 and 7 In testing of Polynesian, Siouan and Buang, it was found that the higher the level of lexical similarity up to a certain point, the lower the intelligibility scores were. This is counterintuitive, but it shows once again that lexical similarity is a poor measure. Morris Swadesh was the founder of lexicostatistics, the study of lexical similarity. Lexicostatistics has its uses, but distinguishing between closely related languages and dialects is apparently not one of them. This myth seems to be dying a hard death. Robert Longacre and Sarah Gudschinsky were involved in long debates with Swadesh about the validity of lexical similarity measures, and they seem to have been proven right. The latest findings calculate that any study that uses lexical similarity alone to determine intelligibility of lects has a 4.5-1 chance of failing to do so with any reliability. Word lists still have their uses. Where word lists show similarities between lects below 6 Vocabulary similarity below 6 Intelligibility is usually asymmetrical. In other words, Lect A can understand 8 Campbell also points out that it is not uncommon that people speaking the same language cannot always understand each other. 
He asks how often we have heard a fellow English speaker of the same dialect say something and we did not catch what they were saying for some reason or other. The implication is that we need to throw out all testing with informants due to this. SIL has actually examined this, and they often include a test called “home-town” in which people are presented with narratives within their own dialect and an intelligibility score is given for that. It is true that sometimes this is lower than 10 One thing to do is to throw out all sentences or questions that score less than 10 Campbell suggests that there are no tests available to use on human informants that pass the smell test of empiricism. This is not the case. One test, the Sentence Repetition Test (SRT), has been used for decades, subjected to many papers and studies, and criticized and modified in many ways. In the case of SRT, testing of group members individually has been shown to be superior to testing them in groups. The reason for this is that when you do intelligibility testing in a group of say eight people, you can run into a strong personality or high-ranking male in that group who might say he understands much more than he really does for some reason or another, possibly to show off. The other less dominant group members then follow his lead and give false high readings on the intelligibility test. Many linguists, led by SIL, have been leading the way in intelligibility testing for decades now. Some of the top figures in this subfield are the couple Joseph and Barbara Grimes of SIL. Joseph Grimes is a retired linguistics professor from Cornell. In addition, a number of computer programs have been created that help the researcher to test intelligibility. Another charge, that intelligibility testing lacks adequate controls, has been shown to be false. 
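The “home-town” control mentioned above, where questions that score poorly even among speakers of the same dialect are thrown out, can be sketched as follows. This is a rough illustration under stated assumptions: the function names, question labels and scores are hypothetical, and the cutoff is left as a parameter rather than fixed to any particular figure.

```python
def keep_reliable_questions(hometown_scores, cutoff):
    """Keep only questions whose home-town score meets the cutoff.

    hometown_scores maps a question id to the percent of home-town
    listeners who answered it correctly. The cutoff is a parameter;
    no specific value is assumed here.
    """
    return [q for q, s in hometown_scores.items() if s >= cutoff]

def adjusted_score(test_scores, kept_questions):
    """Average the test-lect scores over the retained questions only."""
    kept = [test_scores[q] for q in kept_questions]
    if not kept:
        raise ValueError("no questions survived the home-town filter")
    return sum(kept) / len(kept)

# Hypothetical data: percent correct per question.
hometown = {"q1": 100.0, "q2": 80.0, "q3": 100.0}
other_lect = {"q1": 60.0, "q2": 20.0, "q3": 80.0}

kept = keep_reliable_questions(hometown, cutoff=100.0)
print(kept, adjusted_score(other_lect, kept))
```

The point of the filter is that a question people miss in their own dialect says nothing about intelligibility between lects, so it is removed before scoring.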
Bias in both experimenter and subject has been shown to be a problem, as is the case in most or all science, and measures have been undertaken to deal with it. The notion that this subfield of Linguistics, intelligibility testing, is unscientific should be laid to rest. Ethnologue seems to place tremendous importance on mutual intelligibility, however defined. Mutually unintelligible lects are assumed to be separate languages by Ethnologue. Their criterion for splitting dialects off into languages seems to be 9 In conclusion, Mr. Campbell’s principal contentions in his critique are all incorrect. First, he suggests that the very concept of mutual intelligibility between lects is impossible to define or prove. SIL has shown that the concept can be defined and tested by reliable instruments. Second, he says that the use of human informants in mutual intelligibility testing is so prone to error that it cannot guarantee satisfactory results. This is not the case. SIL has proven, through decades of testing, that mutual intelligibility is best measured, or possibly can only be reliably measured, through intelligibility tests with human informants. Third, he throws up a number of red herrings that supposedly prove the inherent unreliability of human informants in intelligibility testing. All of these are shown to be the very red herrings that I claim they are. It is true that unrecognized bilingualism is a problem, but it can often be ferreted out. Fourth, he says that the only way to reliably test for intelligibility is to compare lects via tones, phonology, morphology, syntax and lexicon. This is an extremely complicated process utilizing math and computer programs and can only be undertaken by practiced linguists. In truth, such elaborate testing, while interesting, is entirely unnecessary. Fifth, he suggests that any Western reformulations of Chinese language classification need to first defer to the Chinese. 
The problem here is that the Chinese have completely fallen down on the job. We cannot defer to the Chinese without upsetting our entire system of language classification. The Chinese are entitled to their system, but it is at odds with that used by the rest of the world.

References

Casad, Eugene H. 1974. Dialect Intelligibility Testing. Summer Institute of Linguistics Publications in Linguistics and Related Fields, 38. Norman, OK: Summer Institute of Linguistics of the University of Oklahoma.
Casad, Eugene H. 1992. “State of the Art: Dialect Survey Fifteen Years Later.” In Eugene H. Casad (ed.), Windows on Bilingualism, 147-58. Summer Institute of Linguistics and the University of Texas at Arlington Publications in Linguistics, 110. Dallas, TX: Summer Institute of Linguistics and the University of Texas at Arlington.
Grimes, Barbara F. 1992. “Notes on Oral Proficiency Testing (SLOPE).” In Eugene H. Casad (ed.), Windows on Bilingualism, 53-60. Summer Institute of Linguistics and the University of Texas at Arlington Publications in Linguistics, 110. Dallas, TX: Summer Institute of Linguistics and the University of Texas at Arlington.
Grimes, Joseph E. 1992. “Calibrating Sentence Repetition Tests.” In Eugene H. Casad (ed.), Windows on Bilingualism, 73-85. Summer Institute of Linguistics and the University of Texas at Arlington Publications in Linguistics, 110. Dallas, TX: Summer Institute of Linguistics and the University of Texas at Arlington.
Grimes, Joseph E. 1992. “Correlations Between Vocabulary Similarity and Intelligibility.” In Eugene H. Casad (ed.), Windows on Bilingualism, 17-32. Summer Institute of Linguistics and the University of Texas at Arlington Publications in Linguistics, 110. Dallas, TX: Summer Institute of Linguistics and the University of Texas at Arlington.

The Place of Mandarin in Sinitic

In the comments, James Schipper suggests that Mandarin is to Sinitic what German and Russian are to Germanic and Slavic. He also offers that most Sinitic speakers also speak Mandarin and makes a comparison with Welsh and English and Frisian and Dutch, where every Welsh speaker speaks English and every Frisian speaker speaks Dutch, and each one would rather write in English or Dutch than in Welsh or Frisian. My comments: English and German have 6 It’s not uncommon for Chinese lects to have 5-3 So yes, your analogy with Russian and German as super-languages on top of their families is correct, but it is important to note the vast differences in the lects. It was said that no one could understand Chairman Mao’s dialect, Xiang Nan (Mandarin dialect). Apparently his secretary could understand him, but few others could. I’m not sure how he got his points across. Further, at this point probably most speakers of the Sinitic languages for sure speak Putonghua, which is the Standard Mandarin. It’s a standard the same way that High German is Standard German and Standard Italian is the standard for that language. However, overseas, many do not speak Putonghua, and in the Cantonese area, I believe many still do not speak Putonghua. English is a Germanic language. Look at the vocabulary – closest language is Frisian with 6 Besides Putonghua, you are correct that the vast majority of Sinitic speakers are native speakers of some kind of Mandarin. I believe that a lot of the older folks do not have very good Mandarin and may be monolinguals of their Sinitic tongue, but I’m not sure. The government has been pushing Putonghua very hard for the past decade or so, almost too hard. It’s been killing the smaller tongues. So it’s not quite the same way with Frisian and Welsh yet. I believe it’s pretty common in the South to find Cantonese speakers who don’t speak Mandarin, and it’s for sure the case overseas. As far as writing, I don’t believe it’s a problem. 
An ideographic system was perfect for Chinese as it was the one way that all of the speakers of the various Chinese lects could communicate. My father was in China in 1946 and he said that the rickshaw drivers often could not understand each other, but they could all write Chinese, so they would communicate by writing notes. All Chinese can write to each other, no matter what language they speak, assuming they are literate. A decade ago in a college in Henan, a professor said that the students would come to the college from all over the province and for the first month would communicate by writing notes to each other, so they all wrote a common language. In that province, every county has its own language, and there are even separate languages within counties. It took them about a month or so before they could start working out each other’s languages. Some comment that the Chinese languages are like a Cockney accent of English. On a website, a commenter said that that’s not true. He said he can understand Cockney, but they had a speaker of an Anhui Mandarin lect as a professor at the university and no one could understand what he was talking about. So it’s quite common for the various Chinese lects to be pretty much incomprehensible to each other. There are other comments around the Net that say that the Chinese lects are close enough to pick them up if you spend a bit of time there. That’s not really true. The Chinese lects are often as different as English and German. Now suppose you are an English speaker and you go to Germany. Are you going to “just pick up” German really fast? Forget it. I mean, if you stay there 3 years, maybe. Maybe! Someone else compared the differences between Chinese lects to the gulf between English and Irish. That may be too distant, but it may also be correct. Differences between the lects encompass tones, grammar and lexicon. All of them boil down to intelligibility. 
The major Chinese lects regularly score around 50-6 This is especially true in the center and south of the country. In Anhui, Fujian, Henan, Hunan, Jiangsu and Zhejiang there is an incredible diversity of tongues. It is said in Fujian that every 3 miles the culture changes and every 6 miles the language changes. In these parts of China, there are lots of mountains and it is very rural. Many people never left their home village to go over the mountain to talk to the people over there, so a multitude of tongues arose. I understand that in this part of China there are even incomprehensible tongues inside major cities where the downtowners can’t understand the suburbs.

A Reworking of Chinese Language Classification

Updated December 3, 2014. This post runs to 112 pages so far. On March 6, 2011, Sinologist Victor Mair took on the question of Mutual Intelligibility of Sinitic Languages. The Chinese languages have undergone a lot of reclassification lately (Mair 1991), from one Chinese language a couple of decades ago up to 14 Chinese languages today according to the latest Ethnologue. However, Jerry Norman, one of the world’s top experts on Chinese, says that based on mutual intelligibility, there are 350-400 separate languages within Chinese (Mair 1991). According to Gong Xun, a Sichuan Mandarin speaker in Deyang, China, by my criteria of distinguishing between language and dialect, there would be 300-400 separate languages in Fujian alone. So far, 2,500 dialects of the Chinese language have been identified, and a number of them are separate languages. I have been doing research on this issue recently. Based on the criteria of mutual intelligibility, I have expanded the 14 Chinese languages into 365 separate languages. There are different ways of doing mutual intelligibility. I decided to put it at 9 In the cases below where I had intelligibility data available, a number of Chinese languages had no more than 6 Intelligibility is hard to determine. I am not interested in typological studies of lects involving either lexicon, phonology or tones, unless this can be quantified in terms of intelligibility in a scientific way (see Cheng 1991). For the most part, what I am interested in is, “Can they understand each other?” Reasonable, fair-minded and professional comments, additions, criticisms, elaborations, presentations of evidence, etc. are highly encouraged, as long as politics and emotions are left out of it. The purpose of the classification below is more to stimulate academic interest and sprout new thinking and theory. It is not intended to be an end-all or be-all statement on the subject, in fact, it is quite the opposite. 
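The general idea of splitting and lumping lects by mutual intelligibility can be sketched in code. This is a toy illustration, not the method actually used for the classification: the lect names and scores are invented, the cutoff is left as a parameter, and two choices are assumptions made for the example. First, because intelligibility is often asymmetrical, the lower of the two directional scores is taken as the “mutual” figure. Second, lects are lumped transitively, so a chain of neighboring dialects ends up in one language.

```python
def lump_lects(lects, scores, threshold):
    """Group lects into languages by pairwise mutual intelligibility.

    scores maps (listener, speaker) pairs to percent comprehension.
    Missing pairs are treated as 0 (assumption: no data, no lumping).
    """
    parent = {lect: lect for lect in lects}

    def find(x):
        # Follow parent links to the group representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(lects):
        for b in lects[i + 1:]:
            # Use the weaker direction as the mutual score.
            mutual = min(scores.get((a, b), 0.0), scores.get((b, a), 0.0))
            if mutual >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for lect in lects:
        groups.setdefault(find(lect), []).append(lect)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical lects and scores, purely for illustration.
lects = ["A", "B", "C"]
scores = {
    ("A", "B"): 95.0, ("B", "A"): 92.0,
    ("A", "C"): 50.0, ("C", "A"): 40.0,
    ("B", "C"): 55.0, ("C", "B"): 45.0,
}
print(lump_lects(lects, scores, threshold=90.0))
```

Here A and B would be dialects of one language and C a separate language. A different choice, such as requiring every pair in a group to clear the cutoff, would split dialect chains differently.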
Interested scholars, observers or speakers of Chinese languages are encouraged to contribute any knowledge that they may have to add to or criticize this data below. So far as I know, this is the first real attempt to split Chinese beyond the 14 languages elucidated by Ethnologue. There are lapses in the data below. I mean to present this data in outline form to make it more readable. There are also problems with the data below. In many cases, “separate language” just means that the lect is not intelligible with Putonghua. Unfortunately, I currently lack intelligibility data within the major language groups such as Gan, Xiang, Wu and the branches of Mandarin. There is probably quite a bit of lumping still to be done below. Where lects are mutually intelligible below, I have tried to lump them into one language with various dialects. It is reasonable to ask what background and expertise I have to write such a post. I have a Masters Degree in Linguistics and have been employed as a salaried linguist for a US Indian tribe. I also sit on a peer review board for a linguistics journal and will soon publish my first work in book form via a book chapter in a book on Turkic languages that will come out soon. I assume it will be controversial. Keep in mind that this work is extremely tentative and should not be taken as the last word on the subject by a long shot. There are claims that this study claims to be “accurate and precise.” In truth, it claims nothing of the sort. Initial studies, which is what this is, are de facto never “accurate and precise,” and you can take an extreme argument from scientific philosophy that no science is really “accurate and precise” but is simply “correct for now” or “correct until proven otherwise.” Gan is a separate language, already identified as such. 
Nanchang and Anyi are apparently Jiangyu, spoken in Hubei, is very strange and Wanzai must surely be a separate language, as must Yichun, Ji’an, Wanan, Fuzhou, Yingtan, Leiyang, Huaining and Dongkou. Nanchang and Anyi are within the Changjing Group of Gan, which has 15 different lects. Yingtan and Leping are members of the Yingyi Group has 12 lects. Jiangyu and Huarong are members of the Datong Group of Gan, which has 13 lects. Yichun is a member of the Yiliu Group of Gan, which has 11 lects. Wanzai is a member of the Yiping Group of Gan, of which it is the only member. Leiyang is a member of the Leizi Group of Gan, which has 5 lects. Wanan is a member of the Jilian Group of Gan, of which it is the only member. Ji’an is a member of the Jicha Group of Gan, which has 15 lects. Huaining is a member of the Huaiyue Group of Gan, which has 9 lects. Fuzhou is a member of the Fuguang Group of Gan, which has 15 lects. Dongkou is a member of the Dongsui Group of Gan, which has 5 lects. Gan has 102 separate lects in it. There are 30 million speakers of the Gan languages. Within the Min group, Northern Min (Min Bei) and Central Min (Sanminghua) have already been identified as separate languages. There are 50 million speakers of all of the Min languages (Olson 1998). Northern Min has only 0-2 Central Min has three lects, Shaxian, Sanming and Yongan, but we don’t know if there are languages among them. Central Min has 3.5 million speakers. Northern Min is said to be a single language, although it has 9 separate lects. Most dialects are said to be mutually intelligible, but Jianyang and Jian’ou have only about 7 The standard dialect of Min Dong or Eastern Min is Fuzhou. Eastern Min has only 0-2 Chengguan, Yangzhong and Zhongxian are separate languages, all spoken in Youxi County (Zheng 2008). Beyond that, Eastern Min is reported to have several other mutually unintelligible languages. 
One of them is Fuqing, located near Fuzhou but not intelligible with it, according to Wikipedia, but others say the two are mutually intelligible, although speakers are divided on the question. It appears that possibly Fuzhou speakers can understand Fuqing speakers better than the other way around. Fuzhou and Fuqing are about 6 Ningde, Fuding and Nanping are probably other languages in this family (evidence). Of these three, Ningde is definitely a separate language. According to George Ngù, a passionate proponent of Fuzhou, “Fuzhou is not intelligible even within its many varieties.” It’s not clear if that applies to all of Eastern Min, but it appears that it does. Therefore, Changle, Gutian, Lianjiang, Luoyuan, Minhou, Minqing, Pingnan, Pingtan, Yongtai, Fuan, Fuding, Shouning, Xiapu, Zherong and Zhouning are all separate languages. There are two other lects lumped in with Eastern Min. Manjiang is spoken in the central part of Taishun County, and Manhua spoken in the eastern part of Cangnan County. Both of these names mean “barbarian speech.” Both are probably mixtures of Southern Wu (Wenzhou etc.), Eastern Min, Northern Min, and maybe even pre-Sinitic languages. Manhua and Manjiang are not intelligible with Fuzhou. However, Manjiang has affinity with Shouning in phonology, vocabulary and grammar. Whether or not it is intelligible with Shouning is not known. Min Nan speakers who have looked at Manjiang data say that it doesn’t even look like a Sinitic language. Manhua is best dealt with as a form of Wu. I discuss it further below under Wu. Fuding, Fuan, Shouning, Xiapu, Zherong and Zhouning are in the Funing Group of Eastern Min, which has 6 lects. Fuzhou, Fuqing, Chengguan, Yangzhong, Zhongxian, Ningde, Changle, Gutian, Lianjiang, Luoyuan, Minhou, Minqing, Pingnan, Pingtan, Yongtai and Nanping are in the Houguan Group of Eastern Min, which has 16 lects. Eastern Min contains 23 separate lects. Within Min Nan, Xiamen and Teochew are separate languages (evidence). 
There is even a proposal to split Xiamen, Qiongwen and Teochew into three separate languages before SIL. Amoy, Jinmen is apparently a separate language, as it has poor intelligibility with Taiwanese. A much better name for Xiamen according to the Chinese literature is Quanzhang (Campbell January 2009). Quanzhang is a combination of Quanzhou and Zhangzhou, two of the most important dialects in the language. Xiamen has only can no longer understand Taiwanese or Xiamen well, though they have partial understanding of them. They have only 30-4 The Yilan dialect on Taiwan is so different that it alone has posed serious problems for the task of standardizing Taiwanese Min Nan, yet it is intelligible with the rest of Taiwanese (Campbell January 2009). Lugang is also very different but is also intelligible with Taiwanese (Campbell 2009). There are some communication problems for Tainan speakers hearing Taipei, but it appears that they are still intelligible with each other (Campbell January 2009). JieyangRaoping, Chaoyang, Shantou (Swatow) and Hailok’hong (Haklau) are lects in the Teochew Group (evidence) of Teochew. Teochew (Chaozhou) is the prestige version of Teochew. Chaoyang speakers can understand Jieyang, Raoping (evidence) and Shantou, but intelligibility is difficult with Haifeng and Lufeng. Shantou, Raoping, and Jieyang are then dialects of Chaoyang. Zhangzhou and Quanzhou have marginal intelligibility with Teochew varieties. They are both spoken in Taipei, Taiwan. After all, Taiwanese itself is just a mixture between Zhangzhou and Quanzhou. The situation in Taipei was interesting. The dialects of the city were a mix of Zhangzhou and Quanzhou. The dialect of the center of the city was mixed between the two, with a slight Quanzhou lean to it. In Sulim (Shilin), people spoke with a dialect that heavily favored Zhangzhou. Other districts spoke a Tang’oann-type dialect, which is just Quanzhou mixed with a bit of Zhangzhou. 
All these conditions are more common with the older generation because the new generation either does not speak Teochew at all or they favor the mixed Zhangzhou-leaning “Southern” style favored in the media, or they just do not speak the language at all. Hailok’hong (Haklau) is spoken down the coast between the Teochew zone and the Hong Kong area. It has marginal intelligibility with other Teochew lects. Nevertheless, Taiwanese speakers can no longer understand the pure Quanzhou spoken in the Chinese city of that name. On the other hand, Chaoyang itself is unintelligible to some other Teochew lects. Shantou speakers cannot understand some of the other Teochew lects, and speakers of other lects often find Shantou hard to understand. intelligible with Zhangzhou and Quanzhou, but these claims appear to be incorrect (see above). That might make some sense, as Teochew are a group of Min speakers who broke off from Zhangzhou Min about 600-1,100 years ago. They moved down to northeast Guangdong; after hundreds of years, a heavy dose of Cantonese went in, producing modern Teochew. Teochew has only over 9 The Teochew spoken in Indochina – in particular, in Vietnam and Cambodia (many highly variant lects. Whether or not they are mutually intelligible with each other is not known. The variety spoken in Medan, Indonesia is particularly interesting. It has heavy Malay and Cantonese influence and cannot be understood by other Teochew speakers. Teochew has 10 million speakers. Zhangping, though close to Xiamen, is a separate language. Datian, in Fujian, is also a separate language. A version of Hokkien called Malay Hokkien is spoken in Malaysia and in Indonesia in Sumatra and Kalimantan. In Indonesia, it is spoken in the city of Medan, the state of Riau, the city of Bagansiapiapi on Sumatra and in a few places on Kalimantan, such as Kuching and especially in Brunei. Malay Hokkien is heavily laced with Teochew. 
Northern Malay Hokkien is spoken from Taiping along the coast formerly all the way to Phuket but now only to Pedang in Malaysia and in Indonesia in the city of Medan, the state of Riau, the city of Bagansiapiapi on Sumatra and in a few places on Kalimantan, such as Kuching and especially in Brunei. Speakers of Northern Malay Hokkien have a hard time understanding the Southern Malay Hokkien (see Singapore Hokkien below) spoken in Kelang, Malacca and Singapore. Northern Malay Hokkien is creolized, with Malay and Thai embedded deeply in the language. Southern Malay Hokkien is less creolized, if at all. Singapore Hokkien lies between Northern Malay Hokkien and Taiwanese on the continuum. A very pure variety of Hokkien is spoken in the Indonesian city of Bagansiapiapi. It has avoided the Mandarinization of Hokkien that is occurring elsewhere. They speak like the Hokkien speakers of Tang’oann (Tong’an), China. Kelantan Hokkien is spoken in the Malay state of Kelantan. It is wildly creolized with Malay and is probably not intelligible with any other form of Hokkien. The version of Hokkien spoken in the Philippines is often called Binamhue, Banlamhue or Minanhua (Philippines Hokkien) by speakers, derives from a dialect on the outskirts of Quanzhou, and it may have drifted into a separate language. At present, it is sometimes not intelligible with Quanzhou or Xiamen. That is, some Philippines Hokkien speakers claim that they can only understand about 7 The version of Min Nan, Singapore Hokkien (Southern Malay Hokkien), spoken in Singapore, Kelang and Malacca is similar to that spoken in Taiwan, but many Singapore Hokkien speakers have a hard time understanding Taiwanese Hokkien, while others can understand it just fine. Older Singapore Hokkien speakers can understand Taiwanese Hokkien better than younger ones. 
This is due to bilingual learning more than anything else because younger Singapore Hokkien speakers are no longer good at understanding other Min Nan dialects due to lack of exposure to them. The reason that Taiwanese speakers can seem to communicate well with Singapore Hokkien speakers is that they are using a simpler vocabulary. A Singapore Hokkien speaker, if immersed in Taiwan, could pick up Taiwanese fairly quickly, within say 3 months. An umbrella term covering Malay Hokkien, Singapore Hokkien and Philippines Hokkien may be Nusantaran Hokkien. Another language in the same group is best called Wan’an, comprising a number of dialects and possibly languages in Wan’an County of Fujian (Branner 2008). Zhaoan, Pinghe and Yunxiao, also of Fujian, are separate languages. Wan’an and Longyan are not mutually intelligible (Branner 2008). Longyan seems to have about 50 lects. Teochew, Shantou, Lufeng, Haifeng, Chaoyang, Jieyang, SE Asian Teochew and Malaysian Teochew are members of the Chaoshan Group of Min Nan, which has 12 lects. Datian is in its own group in Min Nan. Min Nan consists of 68 separate lects. Clearly, the dialectal relationships of Min Nan are confusing, as many of the lects are very closely related, if not fully intelligible. Intelligibility testing may be needed to sort out some of these issues. There are 30 million speakers of Southern Min. Zhenan Min, spoken in Zhejiang Province around Pingnang and Cangnan and in the Zhoushan Islands, is a separate language. Zhenan Min contains 4 lects, Pingyang, Cangnan, Dongtou and Yuhuan, which may or may not be languages. Zhenan Min has 574,000 speakers. Zhenan Min is influenced by Eastern and Northern Min. Qiongwen (Hainanese) is a separate language with 8 million speakers. It has the lowest intelligibility with the rest of Southern Min of any Min Nan lect. Qiongwen itself has 14 separate lects, all spoken on Hainan. Whether or not any of them are separate languages is not known. 
Longyan (Branner 2008) is a separate language, apart from Southern Min. It is spoken in Longyan City’s Xinluo District and Zhangping City and has 740,000 speakers. It has heavy Hakka influence due to the large number of Hakka speakers in the surrounding areas. Another split in Min is Leizhou. Leizhou Min is a separate language and is now recognized by some as a separate branch of Min altogether, along the lines of Southern and Northern Min. Leizhou consists of 7 different lects. Haikang appears to be a dialect of Leizhou. However, at least some of the other 6 Leizhou lects are very different in phonology and lexicon. Intelligibility data is not known, but they may be mutually intelligible. Leizhou Min, with 4 million speakers, has low intelligibility with Min Nan lects and has only 5 Shaojiang Min, or Min Gan, is said to be a completely separate high-level division of the Min language like Leizhou Min. It has four lects – Shaowu, Guangze, Jiangle and Shunchang – that are said to be mutually intelligible. There are subdialects within these larger lects. The substratum of Shaojiang is not Min, Gan or Hakka – instead, Puxian Min has already been identified as a separate language. Puxian has 3 separate lects. There are minor differences between these lects. However, there is a form of Puxian Min spoken in Singapore, Hinghwa, and presently it lacks full intelligibility with Puxian Min proper. Puxian speakers are a minority in Singapore, and their language has mixed a lot with Singapore Hokkien, Malay, English and other languages spoken in Singapore, resulting in a separate language. A Min language called Longdu, located in Guangdong, is not only a separate language (evidence here and here) but seems to be in another Min category from Southern Min. It is spoken in the southwest corner of Zhongshan City in Shaxi and Dayong. In Guangdong Province, there are other divergent lects of Min Nan. 
Two of these, Nanlang (also spoken in Zhongshan) and Sanxiang, are also separate languages. Nanlang is spoken 10 miles southeast of Zhongshan in Cuiheng. It is also spoken in Sources give Longdu and Nanlang 100,000 speakers and Sanxiang 30,000 speakers. 1 All of these seem to be in the same group, Zhongshan Min, and all are spoken in the Pearl River Delta near Hong Kong. Zhongshan Min has 150,000 speakers. This group is possibly a Northern or Eastern Min group stranded way down in Guangdong. They are sometimes referred to in old literature as “Northeastern Min”. That’s not really a category. It often means Northern Min, but sometimes it means Eastern Min. These languages have all borrowed extensively from the type of Cantonese spoken in the Pearl River Delta. Looking at the whole picture, it appears that various immigrants speaking Puxian Min, Northern Min and Southern Min all settled around Zhongshan. These various Min elements, along with a hefty dose of Cantonese, have gone into the creation of Zhongshan Min. Sanxiang, Nanlang and Longdu are apparently Hakka Proper (Meixia), Tingzhou is a separate language (evidence). Wuhua Hakka is intelligible with Meixian. Fangcheng and Dabu are close to Meixian, but intelligibility data is lacking. Fangcheng has five different lects within it, but intelligibility data is not known. Hong Kong Hakka is not intelligible with the Hakka spoken on Taiwan, nor with Dabu. Dongguan, spoken near Hong Kong, can understand Meixian, but Meixian cannot understand Dongguan. Taipu or Taipo is spoken in the village of the same name in Hong Kong and is not intelligible with Meixian, nor is Wakia, also spoken in Hong Kong. A variety of Hakka spoken in a part of Hong Kong called Shataukok is called variously Satdiugok, Sathewkok, Shataukok, Satdiukok or Satdiugok. It is said to be different from other Hakka, and evidence indicates that Shataukok may indeed be a separate language. 
Shataukok has dialects within it and they are different, but they are generally mutually intelligible. All three of these are dialects of a more or less intelligible language called Hong Kong Hakka. Located near Hong Kong, Shenzhen/Bao’an is a separate language. Haifeng and Lufeng, located near each other in Guangdong, appear to be dialects of a separate language called Hailufeng. Longchuan in northeastern Guangdong is a separate language (evidence), with poor intelligibility with other Hakka lects. Longchuan has Boluo and Heyuan are separate languages, not mutually intelligible. Longchuan, Boluo and Heyuan are quite distant from other Hakka. Heyuan is spoken in central Guangdong. Huizhou is mutually intelligible with Longchuan and also with Meixia and Dabu. Sanxiang, spoken in Zhongshan prefecture, is different from all other Hakka, but intelligibility data is lacking. It is possible that in northern Guangdong, there may be many different Hakka languages, since dialects tend to differ from village to village, and in many cases, communication is difficult. The Hakka spoken in Kunming, Sarawak, in Malaysia, known as Ho Po Hak, is a separate language. It is very different from the Hakka spoken in Sabah, Malaysia, and it is similar to Hopo, spoken in Hopo, near Meizhou. Hopo is not intelligible with Dabu, Hailu or Meixian. Hopo appears to be a dialect of Jiaoling. Hopo has deep influence from Teochew Min, because it is located right next to the Teochew area. The Gannan Group (or Ninglong Group) from Southern Jiangxi, Mingxi from Western Fujian, and the Yuemin Group from Southern Fujian and Southeastern Guangdong are separate languages. In the Gannan Group are multiple lects. One of them is Xingguo, spoken in Xingguo County in Ganzhuo Prefecture (evidence). The Gannan Group is extremely diverse compared to the Hakka of Guangdong and Fujian. Gannan lects differ even from village to village. 
With Gannan Hakka, we may be dealing with a situation of many different languages, as with Wu, Hui, Tuhua and Xiang. In fact, it is quite possible that with Jiangxi Hakka, we may be dealing with every Hakka lect being a separate language, but that remains to be proven. In Fujian Province, there is the wildly diverse Tingzhou Hakka Group mentioned above. Even within this group, there are separate languages, including Yongding, Liancheng, Changting, Xinquan, Qingliu, Mingxi, Ninghua and Shanghang (evidence). Gucheng is probably also a member of Tingzhou. Sources say that each Hakka village in Fujian speaks its own lect, and that the lects are far enough apart to make communication from village to village very difficult. Therefore, we conclude that in addition to the above, we will add Wuping, Longyan, Zhaoan, Yunxiao, Miaoli (Four Counties), Dongshi (Dapu) and Xinzhu (Hailu) lects are Bangka Island Indonesian Hakka, spoken on Bangka Island in Indonesia, has diverged so radically with its tones that it is now a separate language. That is, speakers of other Indonesian Hakka lects say that they cannot understand Bangka Island speakers. It’s actually said to be a Hakka creole more than anything else. In Indonesia, two other Hakka languages are spoken, Kun Dian Indonesian Hakka, spoken in Borneo, and Belitung (Ngion Voi) Indonesian Hakka. Kun Dian Hakka is the largest Hakka group in Indonesia. Most live at Pontianak and Singkawang, where they speak two different mutually intelligible lects, but they have spread all over Indonesia. Kun Dian Hakka is a dialect of Meixian. Belitung Hakka is spoken mostly on Sumatra and Borneo, and is characterized by a soft way of speaking. Belitung Hakka and Bangka Hakka say they cannot understand Kun Dian Hakka, but Kun Dian speakers say they can understand the other two for the most part. East Timor Hakka is a dialect of Meixian. Jiexi is spoken in southeast Guangdong. Dayu is spoken in southern Guangxi. 
Liannan is spoken in northwest Guangdong. Dongguan Qingxi is spoken in south-central Guangdong. Wengyuan is spoken in northern Guangdong. Ningdu is spoken in Jiangxi. Mengshan Xihe is spoken in eastern Guangxi. Hong Kong Hakka is spoken in Hong Kong. Zhaoan Xiuzhuan is spoken in southern Fujian. Shanghang Pengxin, Basel Mission and Shanghang Guanzhuang Shangzhuo are spoken in West Fujian (Branner 2000). Dayu, spoken in Jiangxi, is a separate language, not intelligible at least to Central, or Meixian, Hakka speakers. Meixian, Wuhua and Bao’an are members of the Yuetai Group of Hakka, which has 23 lects. Within Yuetai, Wuhua and Dabu are members of the Xinghua subgroup, which has 5 lects. Xinghua has 3.4 million speakers. Bao’an and Lufeng are in the Xinhui subgroup of Yuetai, which has 9 lects. Xinhui has 2.4 million speakers. Gaoxiong, Xinzhu, Dongshi and Miaoli are members of the Jiaying Group of Hakka, which has 7 lects. Tingzhou, Yongding, Liancheng, Changting, Xinquan, Shanghang, Basel Mission, Shanghang Pengxin, Wuping, Ninghua, Qingliu and Mingxi are all part of the diverse Tingzhou Group of Hakka. All told, Tingzhou has 12 lects, all of which are separate languages. Longchuan, Boluo and Heyuan are members of the Yuezhong Group of Hakka, which has 5 lects. Huizhou is in its own subgroup of Hakka. Xingguo and Ningdu are in the Ninglong Group of Hakka, which has 13 lects. This group is said to be very diverse, with lects differing from village to village. Liannan and Wengyuan are members of the Yuebei Group of Hakka, which has 11 lects and must surely be a separate language. Dayu is a member of the Yugui Group of Hakka, which has 43 lects. Ho Po Hak, Bangka Island, Nanjing Qujiang, Jiexi, Dayu, Hong Kong, Mengshan Xihe, Zhaoan Xiuzhuan, Fuan, Fuding and Haifeng are unclassified. 
There are Putonghua, the main branch, Jinan (New Jinan), Beijing and Tianjin (evidence and here) are not intelligible with Putonghua; however, Tianjin may be intelligible with Beijing. On the other hand, Tianjin is looking more and more like a separate language. For one thing, Tianjin’s tones are quite different from Putonghua’s, and its tone sandhi is much more complicated. It is also more closely related to lects 150-500 miles away, since Tianjin speakers originally came from Anhui (Lee 2002). Some reports say that Tianjin is intelligible with Putonghua, so intelligibility testing may be needed. Jinan is not intelligible with Putonghua, but may be learned over a period of weeks to possibly months, as it is close enough. Jinan merely lacks inherent intelligibility with Putonghua. Complaints about unintelligible taxi drivers in Beijing are legendary. At the very least, competing views of the intelligibility of Beijinghua and Putonghua deserve investigation. On the other hand, Beijinghua may be intelligible with Hebei and Nanjing City. I think that Hebei is clearly a dialect of Beijing. The lect of Beijing’s hutongs and taxi drivers is legendary for being hard to understand. It would be interesting to see whether Tianjin and Hebei speakers can understand it. Tianjin may be a separate language, since it is not intelligible with Beijinghua. What probably happened was that Beijinghua and Putonghua took separate trajectories. This has also occurred in Italian, where, though Standard Italian was based on Tuscan, Standard Italian and Tuscan have taken separate trajectories since. It is said that if you see old Tuscan men on TV in Italy, a speaker of Standard Italian from southern Italy would need subtitles to understand them, but one from northern Italy would not. Others say that Putonghua was based on the language of the Beijing suburbs, not the city itself. 
For whatever reason, Beijinghua often seems to have less than 9 I would describe the real, pure, Putonghua as “CCTV speech”, the lect you hear on Chinese state television. Evidence that Beijinghua lacks full intelligibility with Putonghua is here, here, here, here, here, here, here and here. The question of whether or not Beijinghua is a separate language from Putonghua is sure to be highly controversial. Perhaps intelligibility testing could settle the question. Beijing is in a group all of its own called the Beijing Group. It contains 43 separate lects, and may contain more than one language. We should also note here that even Putonghua, the language that was meant to tie the nation together, seems to be evolving into regional languages. Guangdong Putonghua is not fully intelligible to speakers of the Putonghuas of Northern China and hence is probably a separate language. There are also varieties of Putonghua that are spoken in Singapore and Taiwan. Taiwanese Mandarin is about 80-8 Shanghai Putonghua is often not intelligible with Putonghua from other regions. It has heavy interference from Shanghaihua, which seriously affects the Putonghua accent. Even after four years of exposure to it, Standard Putonghua speakers often have problems with it. In addition, Jianghuai Mandarin Putonghua and Zhengcao Mandarin Putonghua are not intelligible with Putonghua from other areas (Campbell April 2009). These varieties of Mandarin cause a particular interference with Putonghua Mandarin that results in a severe dialectal disturbance in their Putonghua. These Putonghuas are spoken in the regions native to the Jianghuai and Zhengcao dialects of Mandarin. Jianghuai is spoken in Anhui, Jiangsu, Hubei and to a much lesser extent Zhejiang Provinces. Zhengcao is spoken in Anhui, Henan, Shandong and Jiangsu, with one dialect spoken in Hebei. Although it is different, Singapore Putonghua is still intelligible with Putonghua. 
Malay Mandarin is said to be quite different but nevertheless intelligible. Nevertheless, Malay Mandarin speakers say they have to make speech adjustments with Chinese speakers; otherwise their speech is poorly intelligible. This implies that Malay Mandarin is indeed a separate language. Yunnan Putonghua is intelligible with Putonghua from other regions (Campbell January 2009). Cangzhou, spoken in southeastern Hebei, is a separate language. It is only partly intelligible with Putonghua. Renqiu, Huanghua, Hejian, Cangxian, Qingxian, Xianxian, Dongguang, Haixing, Yanshan, Suning, Nanpi, Wuqiao and Mengcun, all spoken in Cangzhou prefecture, are all dialects of Cangzhou. Cangzhou shares some similarities with Tianjin, but it is only partly intelligible with it. Jinan is a member of the Liaotai Group of the larger Jilu Group, which has 37 lects. The Baotang Group of Jilu has 52 lects. Tianjin forms its own subgroup within Baotang. Cangzhou, Renqiu, Huanghua, Hejian, Cangxian, Qingxian, Xianxian, Dongguang, Haixing, Yanshan, Suning, Nanpi, Wuqiao, and Mengcun are members of the Huangle subgroup of Baotang, which has 25 lects. Jilu itself consists of 170 lects. Taiwanese Mandarin, while different from Putonghua, is intelligible with it. Singapore Mandarin has fewer differences than Taiwanese. Both are dialects of Putonghua. Luoyang, Kaifeng, Changyuan and Zhengzhou, all in Henan Province, are not intelligible with Putonghua. However, all four are mutually intelligible with each other, so they are dialects of a single language, Henan Mandarin. Xinyang, also spoken in Henan, is a separate language and cannot be understood by Luoyang speakers. Nanyang has high but not complete intelligibility with Luoyang. After a few weeks of close contact, Luoyang speakers can understand Nanyang, but initially, comprehension is poor due to different tones. Nanyang has 15 million speakers. Luoyang and Gushi are unintelligible with Putonghua. 
In addition, Gushi is different from Nanyang and may not be intelligible with it. Intelligibility between Xinyang, Gushi and Nanyang is not known. In general, intelligibility between many lects in Henan is not good, but after a week or two of close contact, they can start to understand each other. In Shaanxi, Xining, spoken in Xinghai, seems to be very different from other Shaanxi lects, and is probably a separate language altogether (evidence here and here). In Gansu Province, Tongwei is not intelligible with Putonghua, and Gansu Mandarin seems to be very different from other forms of Mandarin. Gansu Mandarin appears to be a separate language. However, within Gansu, there are divergent lects, such as Sale, which are unintelligible with other Gansu lects. Bozhou (evidence), Yingshang (evidence), and Fuyang (evidence), spoken in Anhui, are at least unintelligible with Putonghua. Fuyang has poor intelligibility with Standard Putonghua due to its phonology. Therefore, it is a separate language. Xian, Huxian and Zhouzhi are members of the Guanzhong Group of Zhongyuan, which has 45 lects. Yanan, Hanzhong and Xining are members of the Qinlong Group of Zhongyuan, which has 67 lects. Luoyang is a member of the Luoxu Group of Zhongyuan, which has 28 lects. Kaifeng, Nanyang, Zhengzhou, Changyuan, and Bozhou are members of the Zhengcao Group of Zhongyuan. The Zhengcao Group has 93 lects. Xinyang and Gushi are in the Xinbeng subgroup of Zhongyuan, which has 20 lects. Tongwei and Sale are part of the Longzhong Group of Zhongyuan, which has 25 lects. Yingshang is a member of the Cailu Group of Zhongyuan, which has 30 lects. The Mandarin spoken in Qinghai is very different from that spoken in Gansu, but it’s not known if it is a separate language. Both are usually classified as types of Zhongyuan Mandarin. Zhongyuan has a shocking 388 lects. Zhongyuan Mandarin is not fully intelligible with Putonghua. Zhongyuan Mandarin has 130 million speakers (Olson 1998). 
Yichang (evidence), Longchang (evidence), Chengdu, Chongqing (evidence), Guilin and Nanping (spoken near Mt. Wuyi evidence), Longcheng (evidence), Luocheng (evidence), Luzhou (evidence here and here), Lingui (evidence), Jiuzhaigou (evidence) Xindu, Wenshan (not intelligible with general Southwest Mandarin speech. Wenshan at least is not intelligible with other Southwestern varieties (Johnson 2010). Chengdu is part of a Sichuan Mandarin koine that is spoken in many of the larger cities in Yunnan. It includes Kunming, Bazhong, Dazhou, Neijiang, Zigong, Yibin, Luzhou, Chengdu, Mianyang, Deyang and Guiyang and is broadly intelligible (Xun 2009). Ziyang is intelligible with the koine but has a heavy accent (Xun 2009). Leshan is unintelligible with the koine, but it can be learned in a few weeks of exposure (Xun 2009). Dali is also not intelligible with Putonghua, but that is because Tibetan Mandarin has heavy Tibetan admixture. Chongqing speakers cannot understand Chengdu or Luzhou speakers. The many small lects around Mt. Emei are not intelligible with Chengdu, appear to be very different, and may be one or more separate languages. Wuhan is not intelligible to speakers of Southwest Mandarin from other provinces, for instance, it is only separate language. Lanping may be a separate language. Kunming is not intelligible with Tuoyuan, so Tuoyuan may be a separate language also. The language spoken in Kunming is part of the Sichuan Mandarin koine that includes Kunming, Bazhong, Dazhou, Neijiang, Zigong, Yibin, Luzhou, Chengdu, Mianyang, Deyang and Guiyang. Yingshan is a separate language based on a Menghai (evidence) may well be a completely separate language. The mutual intelligibility of Menghai, Guiyang and Kunming is not known. Guiyang is at least not intelligible with Putonghua. Guiyang is evolving into the Sichuan Mandarin koine, which is broadly intelligible with Kunming, Bazhong, Dazhou, Neijiang, Zigong, Yibin, Luzhou, Chengdu, Mianyang and Deyang. 
Shaoshan, apparently Mao Zedong’s lect, spoken in Hunan Province, is a separate language. It was said that although Mao had a secretary who could understand him well, not many others could. Another language spoken in Hunan, in Zhangjiajie County, is called Zhangjiajie Maoxi. The Maoxi are a tribal group there that speak a strange variety of Mandarin. Tuoyuan in Hunan is not fully intelligible with other Southwest Mandarin lects, or at least not with Kunming. Junhua, or military language, is a language spoken by an ethnic group on Hainan in the city of Zonghe. It is said to be “Old Mandarin” and is probably not intelligible with other lects. It is a form of Southwest Mandarin known as the Junhua Group, which contains 4 lects. Guilin, Luocheng, Yangshuo, Liuzhou and Lingui are members of the Guiliu Group of Southwest Mandarin, which has 57 lects. Guiliu Southwest Mandarin is at least not comprehensible with Putonghua or Chengyu Southwest Mandarin. Leshan and Longchang are members of the Guanchi Group of Southwest Mandarin, which has 85 lects. Within Guanchi, Longchang is a member of the Renfu Group, which has 13 lects. Yichang, Chengdu, Chongqing and Yingshan are members of the Chengyu Group of Southwest Mandarin, which has 113 lects. Chengyu Southwest Mandarin is not comprehensible with Putonghua or Guiliu Southwest Mandarin. Menghai, Kunming, Wenshan and Guiyang are members of the Kungui Group of Southwest Mandarin. The Kungui Group itself has an incredible 95 lects. Lanping is in the Dianxi Group of Southwest Mandarin, which has 36 lects. Within Dianxi, it is a member of the Baolu subgroup, which has 21 lects. Taoyuan is in the Changhe Group of Southwest Mandarin, which has 14 lects. Wuhan is a member of the Wutian Group of Southwest Mandarin, which has 9 lects. Dali is a member of the Dianxi Group of Mandarin, which has 36 members. Within Dianxi, Dali is a member of the Yaoli Group, which has 15 members. 
Nanping, Chuanlan, Shaoshan, Jiuzhaigou, Zhangjiajie Maoxi and Dahua are unclassified. Southwest Mandarin itself has a stunning 519 lects and is not fully intelligible with Putonghua. There are 240 million speakers of Southwest Mandarin (Olson 1998). Jianghuai Mandarin is a separate language. Yangzhou is considered to be a separate language by a evidence and here) is also a separate language – now mostly spoken in the suburbs, as city speech is not a separate language anymore. The city language is said to be intelligible with the general northeastern China lect spoken in Beijing and Hebei. So I will call Nanjing Suburbs a separate language. Lianyungang is a separate language, as is Yancheng and Huaian (Nantong, a very strange variety of Mandarin on the border of Wu and Mandarin that shares many features with Wu languages, is a separate language, as is its sister language, Tongdong. Jinsha is a dialect of Nantong. Rugao, next to Nantong, is also a separate language. Also within Jianghuai, Hefei is considered to be a separate language not intelligible with Putonghua. Anqing, in Anhui Province, is also not intelligible with Putonghua. In 1933, there were three different languages spoken in Tongcheng, Anhui – East Tongcheng, West Tongcheng and Tongcheng Wenli. Tongcheng Wenli was the classical-based language spoken by the educated elite of the city. Whether these three languages still exist is not known, but surely some of the speakers in 1933 are still alive. Chuzhou, spoken in Anhui, is not intelligible with Putonghua, although it is said to be close to Nanjing. Dangtu, also spoken in Anhui, is not intelligible with Putonghua. Dongtai is a separate language (evidence). The lects spoken in Dafeng, Taizhou, Xingua and Haian are said to be similar to Dongtai, so for the time being, we will list them as dialects of Dongtai. Jiujiang, spoken in Jiangxi Province, is a separate language, as is Xingzi, located close by. 
Intelligibility between Rudong, Anqing, Chuzhou, Dafeng, Taizhou, Xinghua, Haian and Dangtu is not known. Yangzhou, Lianyungang, Yancheng, Huaian, Nanjing, Hefei, Anqing, the Tongchengs, Chuzhou, and Dangtu are in the Hongchao Group of Jianghuai, which has 82 lects. Dongtai, Dafeng, Taizhou, Haian, Xinghua, Jinsha, Nantong, Tongdong, Rudong, and Rugao are in the Tairu Group of Jianghuai. Tairu has 11 different lects. Jiujiang and probably Xingzi are members of the Huangxiao Group of Jianghuai, which has 20 lects. Jianghuai is composed of an incredible 120 lects and is not fully intelligible with Putonghua. Some suggest that Northeastern (Dongbei) Mandarin is a separate language. Within Northeast, Shenyang is a separate language Shenyang is a member of the Jishen Group of Northeastern Mandarin, which has 44 dialects. Within Jishen, Shenyang is a member of the Tongxi Group, which has 24 dialects. Harbin is a member of the Hafu Group of Northeastern Mandarin, which has 64 lects. Within Hafu, it is a member of the Zhaofu Group, which has 18 lects. Lanyin Mandarin in the far northwest is also a separate language (Campbell 2004). Though Lanyin is said to be intelligible with Putonghua, that does not appear to be the case. Minqin (evidence) and Lanzhou (evidence) in Gansu are not fully intelligible with Putonghua, nor is Yinchuan (evidence) in Ningxia. Intelligibility within Lanyin is not known, but Jiuquan at least appears to be a completely separate language inside Lanyin. Jiuquan is a member of the Hexi Group of Lanyin, which has 18 lects. Yinchuan is a member of the Yinwu Group of Lanyin, which has 12 lects. Lanzhou is a member of the Jincheng Group of Lanyin, which has 4 lects. Lanyin is composed of 57 separate lects. Lanyin Mandarin has 9 million speakers (Olson 1998). The Jiaoliao Mandarin spoken in Shandong contains lects such as Qingdao (evidence here and here) and Weihai (evidence) which are not fully intelligible with Putonghua. 
Dalian is quite different from Putonghua. Intelligibility between Qingdao, Weihai and Dalian is not known. Weihai and Dalian are members of the Denglian Group of Jiaoliao, which has 23 lects. Qingdao is a member of the Qingzhou Group of Jiaoliao, which has 16 lects. Jiaoliao is composed of 45 lects. Jiaoliao is not fully intelligible with Putonghua. Intelligibility inside of Jiaoliao is not known, but there may be multiple languages inside of it, because some Shandong Peninsula lects sound very strange even to speakers used to hearing Shandong Mandarin. Karamay is an unclassified Mandarin language spoken in Xinjiang. The Mandarin spoken around Tiantai in Zhejiang is not intelligible with Putonghua and may be a separate language. It is also unclassified. Mandarin has 873 million speakers. There are an incredible 1,526 lects of Mandarin. Although it is related to Mandarin, Jin is a completely separate language. Besides the Main Jin branch Baotou are apparently separate languages (evidence). As is possibly Taiyuan (evidence). Within Hohhot Jin, there are two separate languages. One is Hohhot Xincheng Jin, a combination of Hebei Jin, Northeastern Mandarin and the Manchu language. The other is Jiucheng Hohhot Jin, spoken by the Muslim Hui minority in the city. It is related to other forms of Jin in Shanxi Province. Yuci is a separate language from Taiyuan on a Fenyang, the language used in Chinese director Jia Zhangke’s movie Xiao Shan Going Home, is not intelligible with Putonghua. Jingbian, in Shanxi, is a separate language. Yulin is also a separate language. Hohhot is a member of the Zhanghu Group of Jin, which has 29 lects. Baotou and Yulin are members of the Dabao Group of Jin, which has 29 lects. Taiyuan and Yuci are members of the Bingzhou Group of Jin, which has 16 lects. Fenyang is a member of the Luliang Group of Jin, which has 17 lects. Jingbian is a member of the Wutai Group of Jin, which has 30 lects. 
Jin is composed of 171 lects, and some of them are separate languages. Jin has 48 million speakers (Olson 1998). Besides Xiang Proper, assuming there even is such a thing, Shuangfeng and Changsha are separate languages, having only in the city itself. We do not know how many there are, but we know that they exist. For the moment, we shall just add one lect to Changsha, and divide it into Changsha A and Changsha B, but there may be more. Furthermore, there are changes every 10 miles or so. Intelligibility data is lacking. Mao Zedong spoke Xiangtan, a notoriously difficult Xiang language in Hunan, about which it is said, “No one can understand it.” Xiangtan itself is internally diverse, with differences between the dialect of the city and rural areas, but intelligibility data is lacking. Hengyang is apparently a Liuyang is a separate language, actually a macrolanguage, spoken in Liuyang county-level city in Changsha prefecture in Hunan. Liuyang is split into 5 divisions – Liuyang North, Liuyang South, Liuyang West, Liuyang East and Liuyang City. Liuyang South and Liuyang East are separate languages, mutually unintelligible with the others. Liuyang City has recently arisen as a sort of a Liuyang “Putonghua” that is understandable to speakers of all Liuyang lects. So within Liuyang, we have three dialects – Liuyang City, Liuyang North and Liuyang West. Outside of Liuyang Proper, there are also two separate languages – Liuyang South and Liuyang East. None of the three Liuyang languages is intelligible with Changsha. Even within this classification, each of the 5 Liuyang lects has multiple dialects. Each village is said to have its own lect in Liuyang. Hengshan (evidence) is a separate language with vast dialectal divergence divided by Mount Hengshan. There are two Xiang Hengshan lects on either side of the mountain – Yiyang Changyi and Yiyang Luoshao. Baojing at least is not intelligible with Putonghua, yet it is said to be intelligible with Chengdu Southwest Mandarin. 
Lingshuijiang, also spoken in Hunan by Ningxiang is said to be good sources, there is a tremendous amount of lect diversity in Western Hunan, and most of it probably involves Xiang lects, while most or all of these lects are not mutually intelligible. But until we get more data, we cannot carve any languages out of this mess yet. Shuangfeng and Lingshuijiang are members of the Luoshao Group of Xiang, which has 21 lects. The Changshas, Hengyang, Xiangtan, Hengshan, Ningxiang and the Liuyangs are members of the Changyi Group of Xiang, which has 32 lects. Baojing, Jishou and Huayuan are members of the Jixu Group of Xiang, which has 8 lects. Xiang is composed of 74 lects. Many, or possibly all, of them are separate languages. The various languages of Xiang have 50 million speakers. Wu is a major group of diverse Chinese languages that is often divided into Northern Wu and Southern Wu. Northern Wu and Southern Wu are definitely mutually unintelligible languages. Southern Wu has 4-5 unintelligible dialects across a 12 mile area. In Zhejiang, the mountains go all the way down to the sea, so there are few flat areas where language can spread out and become comprehensible. Suzhou, Shanghaiese, Wuxi (evidence), Huzhou (evidence), Changzhou (evidence), Xiaoshan (evidence), Songjiang (evidence), Jiaxing, Hangzhou (evidence), Kunshan (evidence), Ningbo and Yixing (evidence) are separate languages. Tongxiang also appears to be a separate language, as do Yuyao (evidence) and Zhoushan. Qidong, spoken in the city of Qidong, is a separate language. Lvsi, Qisi or Tongdong, spoken in the nearby town of Qisi, is a separate language from Qidong. Qidong is said to be very close to Chongming, so for the time being, we will list Chongming as a dialect of Qidong. Haimen also appears to be a dialect of Qidong. However, there are Zhangjiagang, Changsha and Kunshan may be intelligible with Suzhou, but data is lacking. Suzhou has good intelligibility with Shanghaiese, but not vice versa. 
Reports vary on the intelligibility of Shanghaiese and Suzhou. Some say they understand each other well, but that is probably not the case at first due to serious differences in tones. Intelligibility testing is needed. Pudong, the older form of the Shanghai language, is still spoken in the Pudong District of the city, but it is dying out. There is a question of whether or not it is mutually intelligible with Shanghaiese, but Shanghaiese speakers seem to feel it is not mutually intelligible (Gilliland 2006). Several lects are spoken in the suburbs of Shanghai. Reports vary, but Shanghai residents generally report that these lects are not mutually intelligible with Shanghaiese (Gilliland 2006). They are Baoshan, Fengxian, Nanhui, Jiading, Jinshan, Pudong (or Chuansha) and Qingpu. Hangzhou is reportedly much different from the lects of Shanghaiese, Ningbo, etc. to the northeast, and is not intelligible with Shanghaiese, nor with Suzhou. Hangzhou has 1.2 million speakers. Changzhou and Wuxi are not intelligible with Shanghaiese or Suzhou. Changzhou and Wuxi have high, but not full, intelligibility. Changzhou and Wuxi are part of a dialect chain in which eastern Changzhou speakers can communicate with western Wuxi speakers, but as one moves further west into Wuxi or east into Changzhou, intelligibility drops off. Like Czech and Slovak, it is best then to split Wuxi and Changzhou into separate languages. Changzhou itself has considerable dialectal divergence, though apparently all dialects are intelligible. Changzhou has 3 million speakers. Yixing, near Changzhou, is not intelligible with Shanghaiese. Jiangyin is spoken in Jiangyin city. It is related to Changzhou and has high intelligibility with Changzhou and Wuxi. All of the above are in the Taihu Group. 
Taizhou, centered around the city of Tuzhou in Eastern Zhejiang, is composed of 11 separate lects, all of which are separate languages, Huangyan (evidence), Jiaojiang, Linhai, Sanmen, Tiantai (evidence), Wenling (evidence), Ninghai (evidence), Xianju, Leqing (evidence), Yubei and Yuhuan (evidence). (Evidence for all). A single subgroup of Wuzhou, Yiwu – contains 18 separate languages, all mutually unintelligible. We will call them Yiwu A, Yiwu B, Yiwu C, Yiwu D, Yiwu E, Yiwu F, Yiwu G, Yiwu H, Yiwu I, Yiwu J, Yiwu K, Yiwu L, Yiwu M, Yiwu N, Yiwu O, Yiwu P, Yiwu Q and Yiwu R for the time being. Pucheng is a Wenzhou (evidence) is a separate language. Ouhai, Yongjia and Ruian appear to be dialects of Wenzhou, but all of them are probably separate languages, since if you go 5 miles in any direction in Wenzhou, there’s a new dialect, and it’s hard to understand people. Wenzhou is Wencheng (evidence) appears to be a separate language. Wenxi is a separate language within Oujiang, not intelligible with Wenzhou. It is spoken in one town in Qingtian County. Jinxiang also has its own Wu lect, with Mandarin influences. This is a Taihu (Northern Wu) outlier. In addition, in Taishun County, there is also an aberrant Wu lect spoken in the town of Luoyang, influenced by both Manjiang and Oujiang Wu. There is another Wu lect similar to Manjiang Eastern Min spoken in the town of Hedi in Qingyuan County in Lishui. Manhua is quite different. There is a controversy over whether or not Manhua is Macro-Min or Macro-Wu. It is probably Macro-Wu based on phonology and it also shares some similar Min-like traits with other Wu lects such as those in the Chuqu group. Within Manhua, there is a northern group spoken in the town of Yishan and a southern group spoken in the towns of Qianku and Jinxiang. Qianku is the standard for Manhua. The northern/southern divide may impede intelligibility, but we have no information yet. Wuhu is a separate language, unintelligible with Shanghaihua. 
Nanjing Wu is a separate language Jiaxing, Shanghaiese, Suzhou, Wuxi, Songjiang, Tongxiang, Qidong, Lvsi, Yunhe and Kunshan are all in the Hujia Group of Taihu. The Hujia Group contains 32 lects. Changzhou, Yixing, Jiangyin and Haimen are in the Piling Group of Taihu. Piling has 12 lects. Piling has 8 million speakers. Wenzhou, Ouhai, Yongjia, Ruian and Wencheng are in the Oujiang Group of Taihu, which also contains 12 lects. Hangzhou has its own group, the Hangzhou Group of Taihu. Shaoxing, Fuyang, Xiaoshan, Linan, Yuyao and Zhuji are in the Linshao Group of Taihu which also contains 12 lects. Fenghua and Zhoushan are in the Yongjiang Group of Taihu. The Yongjiang Group contains 11 lects and has 4 million speakers. Changxing is in the Taioxi Group of Taihu, which has 5 lects. The Taihu Group is composed of 75 separate lects, many or all of which are separate languages. Taihu has 47 million speakers. Lishui, Qingyuan, Jingning, Jinyun and Taishun are in the Chuzhou group of Chuqu, which contains 9 lects. Chuzhou has 1.5 million speakers. Chuqu itself contains 35 separate lects. Pucheng, Shangrao County, Shangrao City, Jiangshan, Songyang, Guangfeng, Longquan, Kaihua, Changshan, Suichang, Longyou, Yushan and Quzhou are members of the Longqu Group of Chuqu, which has 14 lects and 5 million speakers (Olson 1998). The Yiwu languages, Dongyang, Jinhua, Jinhua Xiaohuang, Lanxi, Tangxi, Wuyi, Pan’an, Pujiang and Yongkang are all members of the Wuzhou Group, which contains 27 separate languages. Wuzhou has 4 million speakers. Nanjing Wu is unclassified. The various Wu languages have 85 million speakers. Within Hui, there are at least six separate languages (Hirata 1998). Actually, there are many more. Xidi, spoken in a village at the foot of Huangshan Mountain, is a separate language. Xidi is unintelligible even to villages a few miles away. Tunxi, Wuyuan and Xiuning are separate languages. The first two are spoken in Anhui, but Xiuning is spoken in Jiangxi Province. 
Within the Jingzhan Group of Hui, Jingde, Ningguo, Qimen, Chilingkou (spoken in Chiling, Qimen County), Meixi Xiang and Shitai are separate languages. Within Qimen County itself, Jixi, Dexing and Dongzhi are separate languages, the first spoken in Jiangxi and the second spoken in Anhui. In the Yanzhou Group of Hui, Jiande and which has 6 lects. Meixi, the Qimens, Chilingkou, Shitai, Ningguo and Jingde are members of the Jingzhan Group of Hui. Jingzhan has 12 lects, all of which are separate languages. Jixi, Hongmen and the Shexians are members of the Jishe Group of Hui. The Jishe Group has 6 lects. Dexing and Dongzhi are members of the Qide Group of Hui. The Qide Group has 5 lects. Xidi is unclassified. The various Hui languages have 3.2 million speakers. There are 34 different Hui lects, at least 24 of which are separate languages. There is a possibility that all Hui lects are separate languages, but that remains to be proven.

Cantonese is a major language spoken in the south of China. The Cantonese people are said to be a mix between the Yue people and the Han. They have great pride in their speech, which appears to be closer to ancient Chinese than Mandarin is. When Sun Yat-Sen was President of Republican China, a vote was held on which language to base Standard Chinese on. Cantonese lost to Mandarin by only one vote. Some Cantonese activists denounce Mandarin as a pidgin spoken by Manchu and Mongol invaders and glommed onto the Chinese of the people they conquered.

Attempts to determine intelligibility through the use of complex lexical, tonal, grammatical and phonological formulae produce intelligibility percentages that are excessively high. A better method is presented in Szeto 2000, in which sentences in other lects are played to speakers of Lect A, and the speakers of Lect A are asked to give the basic meaning of each sentence. A sentence is recorded as correct if the basic meaning was ascertained.
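The arithmetic behind this kind of sentence test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the post does not give the exact scoring procedure in Szeto 2000, so the sketch assumes the simplest reading, where each listener's score is the percentage of sentences judged correct and the lect's score is the average over listeners. The function names and the sample data are hypothetical.

```python
def intelligibility_score(judgments):
    """Percentage of test sentences whose basic meaning one listener got right.

    `judgments` is a list of booleans, one per sentence played to the listener.
    Assumption: every sentence counts equally and is either correct or not.
    """
    if not judgments:
        raise ValueError("need at least one judged sentence")
    correct = sum(1 for ok in judgments if ok)
    return 100.0 * correct / len(judgments)


def mean_score(listeners):
    """Average the per-listener percentages for one pair of lects."""
    scores = [intelligibility_score(j) for j in listeners]
    return sum(scores) / len(scores)


# Hypothetical data: two speakers of Lect A each hear four sentences of Lect B.
listener_results = [
    [True, False, False, True],   # listener 1 understood 2 of 4
    [False, False, True, False],  # listener 2 understood 1 of 4
]

print(mean_score(listener_results))
```

The point of the design is that the score comes straight from what informants actually understood, not from a formula over word lists or sound correspondences.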
By this better method, Standard Cantonese has only 31. In contrast, the more complex method not relying on actual informants gives false positives. By this method, Cantonese has 54.

Cantonese is traditionally said to have nine tones, but phonemically, there are only six tones, since the last three are just three of the first six with a voiceless stop consonant on the end. These are often called entering tones in traditional Chinese scholarship. Entering tones have disappeared from most Mandarin lects, probably about 800 years ago due to the influence of invading Mongols speaking Turkic languages, but are still present in Cantonese, Hakka and Min. The original entering tones of Middle Chinese have merged into one or another of Mandarin’s four tones. Traditional Chinese tones or contour tones end in a vowel or a nasal. However, in Cantonese, the entering tone has retained its original short and sharp character from Middle Chinese, so in a sense, it has a different sound quality.

Besides Standard Cantonese (the Guangzhou lect in the Yuehai Group), there is Siyi, or Sze Yup, a separate language. Siyi has 8 dialects; however, there are reports of intelligibility problems within the Siyi lects. In particular, Enping speakers cannot understand some other dialects. Therefore, Enping is a separate language. Kaiping, or Chikan, is not fully intelligible with Enping until speakers get used to each other’s sounds. Kaiping is so different from Taishan that it is hard to imagine how they can communicate well, though there is partial intelligibility. In Xinhui, there is a dialect called Hetang that is very divergent and has many strange features not found in other dialects. Doubtless it is less than fully intelligible with other Siyi lects. Actually, there seem to be many more than 8 dialects of Siyi. In Taishan County alone, there are 20 townships, and there may be a different lect in each one.
For certain, there are at least three distinct dialects of Taishan: Taishan A, Taishan B and Taishan C. Even the lects in Taishan County can be quite different. However, all lects in Taishan County appear to be mutually intelligible. Xinhui is somewhat different from Taishan, but appears to be intelligible. Heshan is said to be intelligible with Xinhui and Taishan. Nevertheless, there are calls from Taishan speakers to split their lect off from the rest of Siyi. If Taishanese were unintelligible with the rest of Siyi, this would make sense, but that does not appear to be the case.

150 years ago, there was less, but still significant, difference between Siyi and Sanyi (Standard Cantonese), but Siyi was disparaged as a “hill dialect” of poor farmers, while Sanyi was elevated as the prestige lect of the cultured and cosmopolitan. This is why Sanyi became the Standard Cantonese lect. The Siyi incorporated this negative view into their self-image, even to the point where they held their overseas meetings in Sanyi speech. There are 3.6 million speakers of Siyi.

Vietnamese Cantonese is quite different from Standard Cantonese, but it is nevertheless intelligible with it. Malay Cantonese is also quite different from Standard Cantonese. Intelligibility data between Malay Cantonese and Standard Cantonese is not known. Both are dialects of Cantonese.

Hong Kong is a dialect of Guangzhou. Foshan and Nanhai are Shunde and Zhongshan are intelligible with Standard Cantonese, but others disagree. This requires further study, as they are obviously close. However, both are said to be quite different from Standard Cantonese at the same time. Even within Yuehai, Panyu is said to be a separate language (Chan 1981). Namlong, a poorly understood lect from the Pearl River area, is also a separate language, or at least it was one in 1949. Whether it still exists is not certain, but speakers must still be alive. Yuehai itself has 31 separate lects.
Danija, the Cantonese lect of the Tanka fisherpeople who live on boats off the coast of Guangdong, Guangxi and Hainan, may well be a Gashiau, is spoken by a group of fisherpeople related to the Danija. This language is related to Danija but apparently not intelligible with it. Maihua, a Cantonese lect spoken on Hainan, may well be a Nanning is a dialect of Cantonese, easily understandable by a Standard Cantonese speaker. However, Lizhou is a separate language, with difficult intelligibility with Standard Cantonese. Dongguan and Zhanjiang (evidence) are separate languages. Shiqi, spoken in Guangxi, is a separate language. Speakers of Standard Cantonese Huazhou is a very divergent Cantonese lect that is very hard even for other Cantonese speakers to understand. It is surely a separate language (evidence here and here). Maoming is an extremely diverse Cantonese lect that must also be a separate language. Beihai and Hepu are reported to be very different, but intelligibility data is not known, nor is it known to what extent these two lects differ from other Cantonese. But the Quinlian Group of which they are members must surely be a separate language.

One division holds that the Standard Cantonese (Guangzhou), Siyi, Zhongshan, Gaoyang and Guangfu groups are mutually unintelligible groups. The Goulou Group of Cantonese appears to be a Taishanese (includes Taishan A, Taishan B and Taishan C), along with Heshan, Jiangmen, Siqian, Doumen, Xinhui, Enping and Kaiping. Nanning is in the Yongxun Group of Cantonese, which has 12 lects. Zhanjiang and Maoming are members of the Gaoyang Group of Cantonese, which has 10 lects. Gaoyang has 5.4 million speakers. Dongguan, Shunde, Foshan, Zhongshan, Nanhai, Panyu and Hong Kong are members of the Guangfu Group of Cantonese, which has 31 lects. Guangfu has 13 million speakers. Shiqi is a member of the Zhongshan Group of Cantonese, which contains at least 3 lects. Huazhou is a member of the Wuhua Group of Cantonese, which has 2 lects.
Beihai and Hepu are members of the Quinlian Group of Cantonese, which has 6 lects. Namlong is unclassified. There are 100 lects of Cantonese, and Cantonese has 64 million speakers.

Pinghua, now recognized as a major split off from Cantonese, is composed of Guinan and Guibei, which are separate languages. Guinan has 22 lects, and Guibei has 8 lects. There is one Pinghua lect that is unclassified. Pinghua has 31 separate lects. Ping has 26 separate lects.

In addition to Tuhua Proper, the best known of the Tuhua lects is Shaozhou, referred to here as Shaozhou Proper. Shaozhou is said to be very different from other Chinese lects. Shaozhou itself consists of many different lects which are often strikingly different from one another. Some say that Shaozhou is a branch of Min Nan, while others say it is related to Hakka. In Lechang prefecture, there are five separate languages, Lechang Tuhua 1, Lechang Tuhua 2, Lechang Tuhua 3, Lechang Tuhua 4 and Lechang Tuhua 5, which are not fully intelligible with each other. Additionally, many Tuhua lects have recently started to splinter as influences from Hakka, Cantonese and Southwest Mandarin affect the younger speakers, such that the language of the youngest speakers is quite a bit different from the language of the older speakers. One of the Shaozhou Tuhua lects, Xianghua, said to be an unclassified Chinese lect, is actually a branch of Tuhua that contains 6 lects of its own. Xianghua is a Jiahe Tuhua is a completely separate language, Jiangyong Tuhua is divided into two mutually unintelligible languages, North Jiangyong Tuhua and South Jiangyong Tuhua (Liming 2004). It is spoken in the basis for the famous nüshu, “women’s script”, a secret language of women, originating from the Shangjiangxu (Xiao River) region of northeastern Jiangyong County in Hunan Province, of which much has been written lately. Also in Hunan, in Guiyang County, another Tuhua language is spoken: Guiyang Tuhua.
This is apparently a separate language, and the northern and southern variants are so Danzhou is a separate language. Danzhou is spoken in the northwest of Hainan, and Hainanese speakers cannot understand it. It is either related to the language spoken by the Lingao or is the same language. Yet the Danzhou people speak 9 different lects, including lects described as Hakka, others described as Cantonese and others described as Mandarin.

Maojiahua is a form of Chinese spoken by 20,000 Hmong in the southwest of Hunan, in the northeast of Guangxi and in some areas of Hubei. It is a separate language already recognized by Ethnologue, but is incorrectly lumped in with the Hmong languages by them.

Linghua is an unclassified Chinese lect spoken in Yongzhou in Hunan. Linghua is a separate language. It is apparently the same as the Yongzhou Tuhua dialect. However, the Yongzhou Tuhua language has Kim Mun, incorrectly classed as an unclassified Chinese lect, is actually one of the Mien languages. It is not a Sinitic language.

Wutun, or Wutunhua, is a Chinese-Mongolian-Tibetan mixed language spoken by 2,000 Tu in Qinghai Province. Whether it is a form of Chinese is controversial. Until it is proven to be Sinitic, we will not list it here.

References

Ben Hamed, Mahé. 2005. Problems in Comparative Chinese Dialectology. The Classification of Miin and Hakka. Berlin: Walter de Gruyter.
Branner, David. 2008. Personal communication.
Campbell, Hilary. 2004. Chinese Grammar – Synchronic and Diachronic Perspectives. Oxford, UK: Oxford University Press.
Campbell, James Michael. Putonghua and Taiwanese Min Nan speaker. Taipei, Taiwan. January 2009. Personal communication.
Campbell, James Michael. Putonghua and Taiwanese Min Nan speaker. Taipei, Taiwan. April 2009. Personal communication.
曹志耘 (Cao, Zhiyun). 2002. 南部吴语语音研究 (Southern Wu Phonology Research). Beijing: Commercial Press (In Chinese).
Chan, Marjorie K.M., Lee, Douglas W. 1981. Chinatown Chinese: A Linguistic and Historical Re-evaluation. Amerasia Journal, Volume 8, Number 1.
Cheng, Chin-Chuan. 1997. Measuring Relationship Among Dialects: DOC and Related Resources. Computational Linguistics & Chinese Language Processing 2.1:41-72.
Cheng, Chin-Chuan. 1998. Language Attitudes and Ideologies In Shanghai, China. MA Thesis. Columbus, OH: Ohio State University.
Hirata, Shoji. 1998. Aspect: A General System and its Manifestation in Mandarin Chinese. Taipei: Student Book Company.
Johnson, Eric. 2010. SIL Electronic Survey Reports 2010-027. A Sociolinguistic Introduction to the Central Taic languages of Wenshan Prefecture, China. Dallas, Texas: SIL.
Lee, Kent A. 2002. Chinese Tone Sandhi and Prosody. MA Thesis. Urbana, IL: University of Illinois at Urbana-Champaign.
Lien, Chinfa. August 17-19, 1998. Denasalization, Vocalic Nasalization and Related Issues in Southern Min: A Dialectal and Comparative Perspective. International Symposium on Linguistic Change and the Chinese Dialects Dedicated to the Memory of the Late Professor Li Fang-kuei in Seattle, Washington.
Liming, Zhao. The Women’s Script of Jiangyong: An Invention of Chinese, Chapter 4. In Tao, Jie, Zheng, Bijun, Mow, Shirley L., editors. 2004. Holding Up Half the Sky: Chinese Women Past, Present, and Future. New York: Feminist Press at the City University of New York.
Mair, Victor H. 1991. What Is a Chinese ‘Dialect/Topolect’? Sino-Platonic Papers 29.
McKeown, Adam. 2001. Chinese Migrant Networks and Cultural Change: Peru, Chicago, Hawaii, 1900-1936. Chicago, IL: University of Chicago Press.
Ngù, George. Eastern Min speaker. 2009. Personal communication.
Olson, James Stuart. 1998. An Ethnohistorical Dictionary of China. Westport, CT: Greenwood Publishing Group.
Rickard, Kristine. 2006. A Linguistic-phonetic Description of Lanqi Citation Tones. Proceedings of the 11th Australian International Conference on Speech Science & Technology, pp. 349-353. Edited by Paul Warren & Catherine I. Watson. University of Auckland, New Zealand. December 6-8, 2006. Auckland, NZ: Australian Speech Science & Technology Association Inc.
Szeto, Cecilia. 2000. Testing intelligibility among Sinitic dialects. Proceedings of ALS2K, the 2000 Conference of the Australian Linguistic Society.
Thurgood, Graham. 2006. Sociolinguistics and Contact-induced Language Change: Hainan Cham, Anong, and Phan Rang Cham. Tenth International Conference on Austronesian Linguistics, 17-20 January 2006, Palawan, Philippines. Linguistic Society of the Philippines and SIL International.
Xun, Gong. Sichuan Mandarin and Putonghua speaker. Deyang, Sichuan, China. Personal communication. September 2009.
Zheng, Rongbin. 2008. The Zhongxian Min Dialect: A Preliminary Study of Language Contact and Stratum-Formation, pp. 517-526. Edited by Chan, Marjorie K.M. and Kang, Hana. Proceedings of the 20th North American Conference on Chinese Linguistics (NACCL-20). Volume 1. Columbus, Ohio: The Ohio State University.

This research takes a lot of time, and I do not get paid anything for it. If you think this website is valuable to you, please consider a contribution to support more of this valuable research.
