A Reworking of Chinese Language Classification

Updated December 3, 2014. This post runs to 112 pages so far. On March 6, 2011, Sinologist Victor Mair took on the question of Mutual Intelligibility of Sinitic Languages.
The Chinese languages have undergone a lot of reclassification lately (Mair 1991), from one Chinese language a couple of decades ago up to 14 Chinese languages today according to the latest Ethnologue.
However, Jerry Norman, one of the world’s top experts on Chinese, says that based on mutual intelligibility, there are 350-400 separate languages within Chinese (Mair 1991). According to Gong Xun, a Sichuan Mandarin speaker in Deyang, China, by my criteria of distinguishing between language and dialect, there would be 300-400 separate languages in Fujian alone.
So far, 2,500 dialects of the Chinese language have been identified, and a number of them are separate languages.
I have been doing research on this issue recently. Based on the criteria of mutual intelligibility, I have expanded the 14 Chinese languages into 365 separate languages.
There are different ways of measuring mutual intelligibility. I decided to put the cutoff at 90%, with >90% being a dialect and <90% being a separate language. This is based on what appears to be Ethnologue’s criterion for establishing the line between a dialect and a language.
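The 90% cutoff can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used for this post. The names `classify_pair` and `DIALECT_THRESHOLD` are made up here, and counting a score of exactly 90% as a dialect is an assumption, since the rule above only covers scores above and below 90%. The sample percentages are figures quoted elsewhere in this post.

```python
# Illustrative sketch of the 90% rule; all names are hypothetical.
DIALECT_THRESHOLD = 90.0  # percent, per the cutoff stated in the post


def classify_pair(intelligibility_percent: float) -> str:
    """Return 'dialect' or 'separate language' for one pair of lects.

    Assumption: a score of exactly 90% is treated as 'dialect'; the post
    only defines the cases above and below 90%.
    """
    if not 0.0 <= intelligibility_percent <= 100.0:
        raise ValueError("intelligibility must be between 0 and 100")
    if intelligibility_percent >= DIALECT_THRESHOLD:
        return "dialect"
    return "separate language"


# Figures quoted elsewhere in this post.
reported = {
    ("Xiamen", "Teochew"): 51,
    ("Fuzhou", "Fuqing"): 65,
    ("Jianyang", "Jian'ou"): 75,
    ("Longyan", "Taiwanese"): 85,
}

for (a, b), score in reported.items():
    print(f"{a} / {b}: {score}% -> {classify_pair(score)}")
```

Under this rule, every pair listed comes out as a separate language, since none of the quoted figures reaches 90%.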
In the cases below where I had intelligibility data available, a number of Chinese languages had no more than 65% intelligibility between them (Cheng 1991).
Intelligibility is hard to determine. I am not interested in typological studies of lects involving either lexicon, phonology or tones, unless this can be quantified in terms of intelligibility in a scientific way (see Cheng 1991). For the most part, what I am interested in is, “Can they understand each other?”
Reasonable, fair-minded and professional comments, additions, criticisms, elaborations, presentations of evidence, etc. are highly encouraged, as long as politics and emotions are left out of it. The purpose of the classification below is more to stimulate academic interest and sprout new thinking and theory. It is not intended to be an end-all or be-all statement on the subject. In fact, it is quite the opposite.
Interested scholars, observers or speakers of Chinese languages are encouraged to contribute any knowledge that they may have to add to or criticize this data below. So far as I know, this is the first real attempt to split Chinese beyond the 14 languages elucidated by Ethnologue.
There are lapses in the data below. I mean to present this data in outline form to make it more readable.
There are also problems with the data below. In many cases, “separate language” just means that the lect is not intelligible with Putonghua. Unfortunately, I currently lack intelligibility data within the major language groups such as Gan, Xiang, Wu and the branches of Mandarin. There is probably quite a bit of lumping still to be done below. Where lects are mutually intelligible below, I have tried to lump them into one language with various dialects.
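The lumping step can be pictured as grouping lects into clusters wherever a pair is reported as mutually intelligible. The sketch below is one possible way to do that, not the method actually followed here. It assumes that chains of intelligibility (A with B, B with C) pull all three lects into one language, which is a simplification and may over-lump in a dialect continuum. The function name `lump_lects` is hypothetical. The sample pairs are taken from reports later in this post (Taiwanese with Yilan and Lugang; Chaoyang with Jieyang, Raoping and Shantou).

```python
from collections import defaultdict


def lump_lects(lects, intelligible_pairs):
    """Group lects into languages using a simple union-find.

    Assumption: intelligibility is treated as transitive, so chains of
    intelligible pairs end up in the same group.
    """
    parent = {lect: lect for lect in lects}

    def find(x):
        # Walk up to the group representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in intelligible_pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for lect in lects:
        groups[find(lect)].append(lect)
    return sorted(sorted(members) for members in groups.values())


lects = ["Taiwanese", "Yilan", "Lugang", "Chaoyang", "Jieyang",
         "Raoping", "Shantou", "Jinmen"]
pairs = [("Taiwanese", "Yilan"), ("Taiwanese", "Lugang"),
         ("Chaoyang", "Jieyang"), ("Chaoyang", "Raoping"),
         ("Chaoyang", "Shantou")]

for group in lump_lects(lects, pairs):
    print(group)
```

Each printed group would correspond to one language with its dialects. A lect with no reported intelligible partner, such as Jinmen in this sample, stays in a group of its own.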
It is reasonable to ask what background and expertise I have to write such a post. I have a Master’s degree in Linguistics and have been employed as a salaried linguist for a US Indian tribe. I also sit on a peer review board for a linguistics journal and will soon publish my first work in book form, a chapter in a forthcoming book on Turkic languages.
I assume it will be controversial. Keep in mind that this work is extremely tentative and should not be taken as the last word on the subject by a long shot. Some have alleged that this study claims to be “accurate and precise.”
In truth, it claims nothing of the sort. Initial studies, which is what this is, are de facto never “accurate and precise,” and you can take an extreme argument from scientific philosophy that no science is really “accurate and precise” but is simply “correct for now” or “correct until proven otherwise.”
Gan is a separate language, already identified as such. Many individual Gan lects are unintelligible to other Gan lects. In fact, it is possible that all Gan lects are unintelligible with each other, but that remains to be proven.
Outside of Gan Proper, Leping, while very diverse, is nevertheless intelligible with nearby Gan lects and with Nanchang (Campbell January 2009).
Nanchang and Anyi are apparently separate languages within Gan based on a 200-word Swadesh test (Ben Hamed 2005). Nanchang has a great deal of dialectal diversity, with several dialects covering different cities and the rural areas. Intelligibility among them is not known.
Jiangyu, spoken in Hubei, is very strange and at least unintelligible to Putonghua speakers, as is Huarong (evidence). Huarong is surely a separate language.
Similarly, Wanzai must surely be a separate language, as must Yichun, Ji’an, Wanan, Fuzhou, Yingtan, Leiyang, Huaining and Dongkou.
Nanchang and Anyi are within the Changjing Group of Gan, which has 15 different lects. Yingtan and Leping are members of the Yingyi Group, which has 12 lects. Jiangyu and Huarong are members of the Datong Group of Gan, which has 13 lects. Yichun is a member of the Yiliu Group of Gan, which has 11 lects. Wanzai is a member of the Yiping Group of Gan, of which it is the only member.
Leiyang is a member of the Leizi Group of Gan, which has 5 lects. Wanan is a member of the Jilian Group of Gan, of which it is the only member. Ji’an is a member of the Jicha Group of Gan, which has 15 lects. Huaining is a member of the Huaiyue Group of Gan, which has 9 lects. Fuzhou is a member of the Fuguang Group of Gan, which has 15 lects. Dongkou is a member of the Dongsui Group of Gan, which has 5 lects.
Gan has 102 separate lects in it. There are 30 million speakers of the Gan languages.
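As a quick arithmetic check, the group sizes listed above do add up to the stated total of 102. The snippet below just tallies the per-group counts given in the two preceding paragraphs; the dictionary name is made up for illustration.

```python
# Lect counts per Gan group, as given in the text above.
gan_group_sizes = {
    "Changjing": 15,
    "Yingyi": 12,
    "Datong": 13,
    "Yiliu": 11,
    "Yiping": 1,
    "Leizi": 5,
    "Jilian": 1,
    "Jicha": 15,
    "Huaiyue": 9,
    "Fuguang": 15,
    "Dongsui": 5,
}

total = sum(gan_group_sizes.values())
print(total)  # 102
```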
Within the Min group, Northern Min (Min Bei) and Central Min (Sanminghua) have already been identified as separate languages. There are 50 million speakers of all of the Min languages (Olson 1998). Northern Min has only 0-20% intelligibility with Min Nan.
Central Min has three lects, Shaxian, Sanming and Yongan, but we don’t know if there are languages among them. Central Min has 3.5 million speakers.
Northern Min is said to be a single language, although it has 9 separate lects. Most dialects are said to be mutually intelligible, but Jianyang and Jian’ou have only about 75% intelligibility. Northern Min has 10 million speakers.
The standard dialect of Min Dong or Eastern Min is Fuzhou.
Eastern Min has only 0-20% intelligibility with Min Nan.
Chengguan, Yangzhong and Zhongxian are separate languages, all spoken in Youxi County (Zheng 2008).
Beyond that, Eastern Min is reported to have several other mutually unintelligible languages. One of them is Fuqing, located near Fuzhou. According to Wikipedia, Fuqing is not intelligible with Fuzhou, but others say the two are mutually intelligible, and speakers themselves are divided on the question.
It appears that Fuzhou speakers may understand Fuqing speakers better than the other way around. Fuzhou and Fuqing are about 65% intelligible in practice, and it is about the same with the rest of the Houguan Group (Ngù 2009).
Ningde, Fuding and Nanping are probably other languages in this family (evidence). Of these three, Ningde is definitely a separate language. According to George Ngù, a passionate proponent of Fuzhou, “Fuzhou is not intelligible even within its many varieties.”
It’s not clear if that applies to all of Eastern Min, but it appears that it does. Therefore, Changle, Gutian, Lianjiang, Luoyuan, Minhou, Minqing, Pingnan, Pingtan, Yongtai, Fuan, Fuding, Shouning, Xiapu, Zherong and Zhouning are all separate languages.
There are two other lects lumped in with Eastern Min. Manjiang is spoken in the central part of Taishun County, and Manhua is spoken in the eastern part of Cangnan County. Both of these names mean “barbarian speech.”
Both are probably mixtures of Southern Wu (Wenzhou etc.), Eastern Min, Northern Min, and maybe even pre-Sinitic languages. Manhua and Manjiang are not intelligible with Fuzhou. However, Manjiang has affinity with Shouning in phonology, vocabulary and grammar. Whether or not it is intelligible with Shouning is not known. Min Nan speakers who have looked at Manjiang data say that it doesn’t even look like a Sinitic language.
Manhua is best dealt with as a form of Wu. I discuss it further below under Wu.
Fuding, Fuan, Shouning, Xiapu, Zherong and Zhouning are in the Funing Group of Eastern Min, which has 6 lects.
Fuzhou, Fuqing, Chengguan, Yangzhong, Zhongxian, Ningde, Changle, Gutian, Lianjiang, Luoyuan, Minhou, Minqing, Pingnan, Pingtan, Yongtai and Nanping are in the Houguan Group of Eastern Min, which has 16 lects.
Eastern Min contains 23 separate lects.
Within Min Nan, Xiamen and Teochew are separate languages (evidence). There is even a proposal to split Xiamen, Qiongwen and Teochew into three separate languages before SIL.
Amoy, Taiwanese, Jinjiang, Zhangzhou, Tainan, Taibei, Yilan, Taichung, Quanzhou and Lufeng are part of the Xiamen group.
Jinmen is apparently a separate language, as it has poor intelligibility with Taiwanese.
A much better name for Xiamen according to the Chinese literature is Quanzhang (Campbell January 2009).
Quanzhang is a combination of Quanzhou and Zhangzhou, two of the most important dialects in the language. Xiamen has only 51% intelligibility with Teochew. Whether or not Zhangzhou and Quanzhou are intelligible in China itself is still somewhat of an open question.
Nevertheless, Quanzhou speakers in Singapore can no longer understand Taiwanese or Xiamen well, though they have partial understanding of them. They have only 30-40% intelligibility with Yilan. Nevertheless, they have good understanding of Zhangzhou. This implies that much of the understanding between at least some of the Xiamen lects was due to bilingual learning.
The Yilan dialect on Taiwan is so different that it alone has posed serious problems for the task of standardizing Taiwanese Min Nan, yet it is intelligible with the rest of Taiwanese (Campbell January 2009). Lugang is also very different but is also intelligible with Taiwanese (Campbell 2009).
There are some communication problems for Tainan speakers hearing Taipei, but it appears that they are still intelligible with each other (Campbell January 2009).
Jieyang, Raoping, Chaoyang, Shantou (Swatow) and Hailok’hong (Haklau) are lects in the Teochew Group (evidence) of Teochew. Teochew (Chaozhou) is the prestige version of Teochew. Chaoyang speakers can understand Jieyang, Raoping (evidence) and Shantou, but intelligibility is difficult with Haifeng and Lufeng. Shantou, Raoping, and Jieyang are then dialects of Chaoyang.
Zhangzhou and Quanzhou have marginal intelligibility with Teochew varieties. They are both spoken in Taipei, Taiwan. After all, Taiwanese itself is just a mixture between Zhangzhou and Quanzhou. The situation in Taipei was interesting. The dialects of the city were a mix of Zhangzhou and Quanzhou. The dialect of the center of the city was mixed between the two, with a slight Quanzhou lean to it. In Sulim (Shilin), people spoke with a dialect that heavily favored Zhangzhou. Other districts spoke a Tang’oann-type dialect, which is just Quanzhou mixed with a bit of Zhangzhou.
All these conditions are more common with the older generation because the new generation either does not speak Teochew at all or favors the mixed Zhangzhou-leaning “Southern” style favored in the media. Hailok’hong (Haklau) is spoken down the coast between the Teochew zone and the Hong Kong area. It has marginal intelligibility with other Teochew lects. Nevertheless, Taiwanese speakers can no longer understand the pure Quanzhou spoken in the Chinese city of that name.
On the other hand, Chaoyang itself is unintelligible to some other Teochew lects. Shantou speakers cannot understand some of the other Teochew lects, and speakers of other lects often find Shantou hard to understand.
Sources report that Teochew lects can vary greatly in the pronunciation of even single words, and the tones can be quite different too.
There are claims that Teochew is intelligible with Zhangzhou and Quanzhou, but these claims appear to be incorrect (see above). That might make some sense, as the Teochew are a group of Min speakers who broke off from Zhangzhou Min about 600-1,100 years ago. They moved down to northeast Guangdong, and after hundreds of years, a heavy dose of Cantonese went in, producing modern Teochew.
[Image: Chinese language map]
Teochew has only 51% intelligibility with Xiamen.
Haifeng and Shanwei are members of the Luhai Teochew subgroup of Teochew, which differs markedly from Teochew and may be a separate language. Luhai is said to be halfway between Teochew and Zhangzhou. Luhai probably represents a later move from Zhangzhou towards northeast Guangdong by the same group that formed Teochew. This move may have occurred around 400 years ago.
Lufeng is said to have over 90% intelligibility with Xiamen, but if it is really halfway between, it should have 75% intelligibility. Intelligibility testing may be needed.
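The 75% figure appears to come from simple interpolation: if Luhai sits halfway between Teochew (51% with Xiamen, as stated above) and a Zhangzhou-type lect, the expected score is roughly the midpoint. The sketch below spells that out. Treating Zhangzhou as 100% intelligible with Xiamen, and assuming that intelligibility scales linearly with position on the continuum, are both assumptions made here for illustration, not claims from the sources. The function name is hypothetical.

```python
def interpolate(low: float, high: float, fraction: float) -> float:
    """Linear interpolation between two intelligibility scores."""
    return low + (high - low) * fraction


teochew_with_xiamen = 51.0     # percent, figure quoted in this post
zhangzhou_with_xiamen = 100.0  # assumption: treated as fully intelligible

expected = interpolate(teochew_with_xiamen, zhangzhou_with_xiamen, 0.5)
print(expected)  # 75.5, i.e. roughly the 75% mentioned above
```

A reported figure of over 90% sits well above this rough estimate, which is why testing would help settle the question.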
The Teochew spoken in Indochina, in particular in Vietnam and Cambodia (Indochinese Teochew), may be a separate language. Some Indochinese Teochew speakers who have returned to their family villages say they could only understand 70% of the speech there.
Furthermore, intelligibility is difficult between Malay Teochew and other Teochew, such as SE Asian Teochew and Teochew on the mainland. Malay Teochew is spoken in Malaysia, Singapore and Indonesia.
The Teochew variant spoken in Malaysia is composed of many highly variant lects. Whether or not they are mutually intelligible with each other is not known. The variety spoken in Medan, Indonesia is particularly interesting. It has heavy Malay and Cantonese influence and cannot be understood by other Teochew speakers. Teochew has 10 million speakers.
Zhangping, though close to Xiamen, is a separate language according to a 200-word Swadesh test (Ben Hamed 2005).
Sanjiang appears to be a separate language.
Datian, in Fujian, is also a separate language.
A version of Hokkien called Malay Hokkien is spoken in Malaysia and in Indonesia on Sumatra and Borneo. In Indonesia, it is spoken in the city of Medan, the province of Riau and the city of Bagansiapiapi on Sumatra. It is also spoken in a few places on Borneo, such as Kuching and especially Brunei. Malay Hokkien is heavily laced with Teochew.
Northern Malay Hokkien is spoken from Taiping along the coast, formerly all the way to Phuket but now only as far as Penang in Malaysia. It is also spoken in Indonesia in the city of Medan, the province of Riau and the city of Bagansiapiapi on Sumatra, and in a few places on Borneo, such as Kuching and especially Brunei. Speakers of Northern Malay Hokkien have a hard time understanding the Southern Malay Hokkien (see Singapore Hokkien below) spoken in Kelang, Malacca and Singapore. Northern Malay Hokkien is creolized, with Malay and Thai embedded deeply in the language.
Southern Malay Hokkien is less creolized, if at all. Singapore Hokkien lies between Northern Malay Hokkien and Taiwanese on the continuum. A very pure variety of Hokkien is spoken in the Indonesian city of Bagansiapiapi. It has avoided the Mandarinization of Hokkien that is occurring elsewhere. They speak like the Hokkien speakers of Tang’oann (Tong’an), China.
Kelantan Hokkien is spoken in the Malay state of Kelantan. It is wildly creolized with Malay and is probably not intelligible with any other form of Hokkien.
The version of Hokkien spoken in the Philippines, often called Binamhue, Banlamhue or Minanhua (Philippines Hokkien) by speakers, derives from a dialect on the outskirts of Quanzhou, and it may have drifted into a separate language. At present, it is sometimes not intelligible with Quanzhou or Xiamen. For instance, some Philippines Hokkien speakers claim that they can only understand about 70% of Taiwanese television.
The version of Min Nan, Singapore Hokkien (Southern Malay Hokkien), spoken in Singapore, Kelang and Malacca is similar to that spoken in Taiwan, but many Singapore Hokkien speakers have a hard time understanding Taiwanese Hokkien, while others can understand it just fine. Older Singapore Hokkien speakers can understand Taiwanese Hokkien better than younger ones. This is due to bilingual learning more than anything else because younger Singapore Hokkien speakers are no longer good at understanding other Min Nan dialects due to lack of exposure to them.
The reason that Taiwanese speakers seem to communicate well with Singapore Hokkien speakers is that they are using a simpler vocabulary. A Singapore Hokkien speaker, if immersed in Taiwan, could pick up Taiwanese fairly quickly, within say 3 months.
An umbrella term covering Malay Hokkien, Singapore Hokkien and Philippines Hokkien may be Nusantaran Hokkien.
Another language in the same group is best called Wan’an, comprising a number of dialects and possibly languages in Wan’an County of Fujian (Branner 2008).
Zhaoan, Pinghe and Yunxiao, also of Fujian, are separate languages.
Wan’an and Longyan are not mutually intelligible (Branner 2008). Longyan seems to have about 85% intelligibility with Taiwanese. Koongfu and Shizhong are apparently dialects of Longyan Min and are probably intelligible with it. Koongfu is spoken in Kanshi Township in Yongding County. Shizhong is spoken in southern Longyan County.
There are many varieties of Southern Min spoken in Western Fujian that may or may not be independent languages.
Liancheng Gutyan Junbao, Longyan Wan’an Wuzhai, Longyan Wan’an Songyang, Longyan Wan’an Tutuan, Longyan Baisha Youshui, Shiahtsuen Buhyun Liling, Shanghang Buhyun Liling, Liancheng Xuanhe Shengxing, Shanghang Gutian Laifang, Liancheng Xinquan Linguo, Liancheng Xinquan Lelian, Liancheng Pengkou Wangcheng, Liancheng Miaoqian Zhixi, Liancheng Gechuan Zhuyu, Liancheng Miaoqian Jiangshe, Liancheng Sibao Shangjian Zhenbian, Liancheng Juxi Gaoding, Liancheng Tangqian Dikeng, Liancheng Wencheng Hengming, Liancheng Xinquan Dongnancun, Liancheng Quxi Puxi Dongxiduan, Liancheng Quxi Qiaotou and Liancheng Liwu Nanban Zhangwu are spoken in Western Fujian. Shiahtsuen is spoken in Laiyuan Township in southeastern Liancheng County (Branner 2000).
Whether these lects are dialects or separate languages is difficult to say. Speakers of many of these lects don’t understand each other at first, but after they talk to each other for a while, they start to figure out the other lect (Branner 2008). Intelligibility testing needs to be done for these lects.
Quanzhou, Zhangzhou, Singapore Hokkien, Philippines Hokkien, Xiamen, Amoy, Yilan, Tainan, Taipei, Taichung, Taiwanese, Jinjiang, Lufeng, Lugang, Jinmen, Zhangping, Koongfu, Shizhong, Nanjing, Zhaoan, Pinghe, Yunxiao, Longyan, Wan’an, Liancheng Gutyan Junbao, Longyan Wan’an Wuzhai, Longyan Wan’an Songyang, Longyan Wan’an Tutuan, Longyan Baisha Youshui, Shiahtsuen, Shanghang Buhyun Liling, Liancheng Xuanhe Shengxing, Shanghang Gutian Laifang, Liancheng Xinquan Linguo, Liancheng Xinquan Lelian, Liancheng Pengkou Wangcheng, Liancheng Miaoqian Zhixi, Liancheng Gechuan Zhuyu, Liancheng Miaoqian Jiangshe, Liancheng Sibao Shangjian Zhenbian, Liancheng Juxi Gaoding, Liancheng Tangqian Dikeng, Liancheng Wencheng Hengming, Liancheng Xinquan Dongnancun, Liancheng Quxi Puxi Dongxiduan, Liancheng Quxi Qiaotou and Liancheng Liwu Nanban Zhangwu are all members of the Quanzhang Group of Min Nan, which has 50 lects.
Teochew, Shantou, Lufeng, Haifeng, Chaoyang, Jieyang, SE Asian Teochew and Malaysian Teochew are members of the Chaoshan Group of Min Nan, which has 12 lects.
Datian is in its own group in Min Nan.
Min Nan consists of 68 separate lects. Clearly, the dialectal relationships of Min Nan are confusing, as many of the lects are very closely related, if not fully intelligible. Intelligibility testing may be needed to sort out some of these issues. There are 30 million speakers of Southern Min.
Zhenan Min, spoken in Zhejiang Province around Pingnang and Cangnan and in the Zhoushan Islands, is a separate language. Zhenan Min contains 4 lects, Pingyang, Cangnan, Dongtou and Yuhuan, which may or may not be languages. Zhenan Min has 574,000 speakers. Zhenan Min is influenced by Eastern and Northern Min.
Qiongwen (Hainanese) is a separate language with 8 million speakers. It has the lowest intelligibility with the rest of Southern Min of any Min Nan lect. Qiongwen itself has 14 separate lects, all spoken on Hainan. Whether or not any of them are separate languages is not known.
Longyan (Branner 2008) is a separate language, apart from Southern Min. It is spoken in Longyan City’s Xinluo District and Zhangping City and has 740,000 speakers. It has heavy Hakka influence due to the large number of Hakka speakers in the surrounding areas.
Another split in Min is Leizhou. Leizhou Min is a separate language and is now recognized by some as a separate branch of Min altogether, along the lines of Southern and Northern Min. Leizhou consists of 7 different lects. Haikang appears to be a dialect of Leizhou.
However, at least some of the other 6 Leizhou lects are very different in phonology and lexicon. Intelligibility data is not known, but they may be mutually intelligible. Leizhou Min, with 4 million speakers, has low intelligibility with Min Nan lects and has only 50% intelligibility with Hainanese.
Shaojiang Min, or Min Gan, is said to be a completely separate high-level division of the Min language like Leizhou Min. It has four lects – Shaowu, Guangze, Jiangle and Shunchang – that are said to be mutually intelligible. There are subdialects within these larger lects. The substratum of Shaojiang is not Min, Gan or Hakka – instead, it is the ancient Baiyue language.
Puxian Min has already been identified as a separate language. Puxian has 3 separate lects. There are minor differences between these lects.
However, there is a form of Puxian Min spoken in Singapore, Hinghwa, and presently it lacks full intelligibility with Puxian Min proper. Puxian speakers are a minority in Singapore, and their language has mixed a lot with Singapore Hokkien, Malay, English and other languages spoken in Singapore, resulting in a separate language.
A Min language called Longdu, located in Guangdong, is not only a separate language (evidence here and here) but seems to be in another Min category from Southern Min. It is spoken in the southwest corner of Zhongshan City in Shaxi and Dayong.
In Guangdong Province, there are other divergent lects of Min Nan. Two of these, Nanlang (also spoken in Zhongshan) and Sanxiang, are also separate languages. Nanlang is spoken 10 miles southeast of Zhongshan in Cuiheng. It is also spoken in Nanlang and Zhangjiabian. Sanxiang is spoken to the south of Zhongshan in the hilly rural areas.
In Chinese, Longdu, Nanlang and Sanxiang are referred to as All-Lung, South Gourd and Three Rural, respectively. Sources give Longdu and Nanlang 100,000 speakers and Sanxiang 30,000 speakers. 14% of the population of Zhongshan speaks Min. Nanlang now has mostly elderly speakers.
All of these seem to be in the same group, Zhongshan Min, and all are spoken in the Pearl River Delta near Hong Kong. Zhongshan Min has 150,000 speakers.
This group is possibly a Northern or Eastern Min group stranded way down in Guangdong. They are sometimes referred to in old literature as “Northeastern Min”. That’s not really a category. It often means Northern Min, but sometimes it means Eastern Min. These languages have all borrowed extensively from the type of Cantonese spoken in the Pearl River Delta.
Looking at the whole picture, it appears that various immigrants speaking Puxian Min, Northern Min and Southern Min all settled around Zhongshan. These various Min elements, along with a hefty dose of Cantonese, have gone into the creation of Zhongshan Min.
Sanxiang, Nanlang and Longdu are apparently not mutually intelligible, although Nanlang is close to Longdu. Sanxiang is more divergent. Further, there are more dialects within these three languages, and dialectal divergence is considerable, with possible communication difficulties among them.
Sanxiang has at least two dialects, Phao and Tiopou. Phao is fairly uniform across a number of villages, but Tiopou is quite different. Nevertheless, there is near-full intelligibility between Phao and Tiopou. For now, we will just list Sanxiang, Nanlang and Longdu as separate languages, with possible dialects Phao and Tiopou (Sanxiang); Nanlang A and Nanlang B; and Longdu A and Longdu B, among them.
A very strange lect is spoken by the She people in Zhejiang, Fujian and Guangdong. The She language was originally Hmong-Mien, then added a Cantonese layer, then a Hakka layer, next a Min layer, and in Zhejiang, a Wu layer. It is best described as a Hmong-Mien language that has been Sinicized. There are probably 200,000 speakers of this language.
There is also an original She language that is non-Sinitic (Hmong-Mien) and is spoken by only about 1,000 people in Guangdong.
In Eastern Guangdong, the She speak the Chaoshan She language. They live in the Phoenix Mountains in Chao’an County in Chaozhou prefecture. It has had heavy contact with Chaoshan (Teochew) Min group. This is probably a separate language, unintelligible with other She languages and also with Chaoshan Min.
Within Hakka, besides Hakka Proper (Meixian), Tingzhou is a separate language (evidence). Wuhua Hakka is intelligible with Meixian.
Fangcheng and Dabu are close to Meixian, but intelligibility data is lacking. Fangcheng has five different lects within it, but intelligibility data is not known. Hong Kong Hakka is not intelligible with the Hakka spoken on Taiwan, nor with Dabu.
Dongguan, spoken near Hong Kong, can understand Meixian, but Meixian cannot understand Dongguan.
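Cases like this one, where understanding runs in only one direction, suggest recording intelligibility per direction rather than per pair. A minimal sketch follows, assuming a pair only counts as mutually intelligible when both directions clear the 90% cutoff used in this post. The numeric scores are placeholders invented for illustration (no percentages are reported for Dongguan and Meixian), and the function name is hypothetical.

```python
THRESHOLD = 90.0  # percent, the cutoff used in this post

# (listener, speaker) -> how well the listener understands the speaker.
# Placeholder values, not measured data.
directed_scores = {
    ("Dongguan", "Meixian"): 95.0,
    ("Meixian", "Dongguan"): 40.0,
}


def mutually_intelligible(a: str, b: str, scores: dict) -> bool:
    """True only if each side understands the other at or above the cutoff."""
    return min(scores[(a, b)], scores[(b, a)]) >= THRESHOLD


print(mutually_intelligible("Dongguan", "Meixian", directed_scores))  # False
```

Taking the lower of the two directions is a conservative choice: one-way understanding, which may reflect exposure or bilingual learning, does not count toward lumping two lects together.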
Taipu or Taipo is spoken in the village of the same name in Hong Kong and is not intelligible with Meixian, nor is Wakia, also spoken in Hong Kong.
A variety of Hakka spoken in a part of Hong Kong called Shataukok is called variously Satdiugok, Sathewkok, Shataukok or Satdiukok. It is said to be different from other Hakka, and evidence indicates that Shataukok may indeed be a separate language. Shataukok has dialects within it, and they are different, but they are generally mutually intelligible.
All three of these are dialects of a more or less intelligible language called Hong Kong Hakka.
Located near Hong Kong, Shenzhen/Bao’an is a separate language.
Haifeng and Lufeng, located near each other in Guangdong, appear to be dialects of a separate language called Hailufeng.
Longchuan in northeastern Guangdong is a separate language (evidence), with poor intelligibility with other Hakka lects. Longchuan has four different dialects, Huangbu, Sidu, Chetian and Tuocheng. Sidu and Tuocheng are close and are probably dialects of Longchuan. Sidu Longchuan has 18,000 speakers.
Boluo and Heyuan are separate languages, not mutually intelligible.
Longchuan, Boluo and Heyuan are quite distant from other Hakka. Heyuan is spoken in central Guangdong.
Huizhou is mutually intelligible with Longchuan and also with Meixian and Dabu.
Sanxiang, spoken in Zhongshan prefecture, is different from all other Hakka, but intelligibility data is lacking.
It is possible that in northern Guangdong, there may be many different Hakka languages, since dialects tend to differ from village to village, and in many cases, communication is difficult.
The Hakka spoken in Kuching, Sarawak, in Malaysia, known as Ho Po Hak, is a separate language.
It is very different from the Hakka spoken in Sabah, Malaysia, and it is similar to Hopo, spoken in Hopo, near Meizhou. Hopo is not intelligible with Dabu, Hailu or Meixian. Hopo appears to be a dialect of Jiaoling. Hopo has deep influence from Teochew Min, because it is located right next to the Teochew area.
The Gannan Group (or Ninglong Group) from Southern Jiangxi, Mingxi from Western Fujian, and the Yuemin Group from Southern Fujian and Southeastern Guangdong are separate languages.
In the Gannan Group are multiple lects. One of them is Xingguo, spoken in Xingguo County in Ganzhou Prefecture (evidence).
The Gannan Group is extremely diverse compared to the Hakka of Guangdong and Fujian. Gannan lects differ even from village to village.
With Gannan Hakka, we may be dealing with a situation of many different languages, as with Wu, Hui, Tuhua and Xiang. In fact, it is quite possible that with Jiangxi Hakka, we may be dealing with every Hakka lect being a separate language, but that remains to be proven.
In Fujian Province, there is the wildly diverse Tingzhou Hakka Group mentioned above. Even within this group, there are separate languages, including Yongding, Liancheng, Changting, Xinquan, Qingliu, Mingxi, Ninghua and Shanghang (evidence). Gucheng is probably also a member of Tingzhou.
Sources say that each Hakka village in Fujian speaks its own lect, and that the lects are far enough apart to make communication from village to village very difficult.
Therefore, in addition to the above, we will add Wuping, Longyan, Zhaoan, Yunxiao, Shangsixiang, Fuding, Fuan, Gucheng and Nanjing Qujiang as separate languages.
Luoyuan She Hakka is spoken in Fujian. It is an extremely diverse form of Hakka that differs from all other Hakka. It must surely be a separate language.
Chengdu is spoken in Chengdu, Sichuan. It is quite different from other forms of Hakka and has poor intelligibility with other forms.
On Taiwan, the Miaoli (Four Counties), Dongshi (Dapu) and Xinzhu (Hailu) lects are not mutually intelligible, nor is the mixed Gaoxiong lect created in order that these three lects could communicate with each other.
Kunbei (Zhaoan) is very different and must be a separate language. Raoping may well be a separate language, but intelligibility data is lacking. In general, speakers of other kinds of Hakka find Taiwan Hakka to be hard to understand, possibly due to Southern Min influence.
Bangka Island Indonesian Hakka, spoken on Bangka Island in Indonesia, has diverged so radically with its tones that it is now a separate language. That is, speakers of other Indonesian Hakka lects say that they cannot understand Bangka Island speakers. It’s actually said to be a Hakka creole more than anything else.
In Indonesia, two other Hakka languages are spoken, Kun Dian Indonesian Hakka, spoken in Borneo, and Belitung (Ngion Voi) Indonesian Hakka. Kun Dian Hakka is the largest Hakka group in Indonesia. Most live at Pontianak and Singkawang, where they speak two different mutually intelligible lects, but they have spread all over Indonesia. Kun Dian Hakka is a dialect of Meixian.
Belitung Hakka is spoken mostly on Sumatra and Borneo, and is characterized by a soft way of speaking. Belitung Hakka and Bangka Hakka speakers say they cannot understand Kun Dian Hakka, but Kun Dian speakers say they can understand the other two for the most part. East Timor Hakka is a dialect of Meixian.
Jiexi is spoken in southeast Guangdong. Dayu is spoken in southern Guangxi. Liannan is spoken in northwest Guangdong. Dongguan Qingxi is spoken in south-central Guangdong. Wengyuan is spoken in northern Guangdong. Ningdu is spoken in Jiangxi. Mengshan Xihe is spoken in eastern Guangxi. Hong Kong Hakka is spoken in Hong Kong.
Zhaoan Xiuzhuan is spoken in southern Fujian.
Shanghang Pengxin, Basel Mission and Shanghang Guanzhuang Shangzhuo are spoken in West Fujian (Branner 2000).
Dayu, spoken in Jiangxi, is a separate language, not intelligible at least to Central, or Meixian, Hakka speakers.
Meixian, Wuhua and Bao’an are members of the Yuetai Group of Hakka, which has 23 lects. Within Yuetai, Wuhua and Dabu are members of the Xinghua subgroup, which has 5 lects. Xinghua has 3.4 million speakers. Bao’an and Lufeng are in the Xinhui subgroup of Yuetai, which has 9 lects. Xinhui has 2.4 million speakers.
Gaoxiong, Xinzhu, Dongshi and Miaoli are members of the Jiaying Group of Hakka, which has 7 lects.
Tingzhou, Yongding, Liancheng, Changting, Xinquan, Shanghang, Basel Mission, Shanghang Pengxin, Wuping, Ninghua, Qingliu and Mingxi are all part of the diverse Tingzhou Group of Hakka. All told, Tingzhou has 12 lects, all of which are separate languages.
Longchuan, Boluo and Heyuan are members of the Yuezhong Group of Hakka, which has 5 lects.
Huizhou is in its own subgroup of Hakka.
Xingguo and Ningdu are in the Ninglong Group of Hakka, which has 13 lects. This group is said to be very diverse, with lects differing from village to village.
Liannan and Wengyuan are members of the Yuebei Group of Hakka, which has 11 lects and must surely be a separate language.
Dayu is a member of the Yugui Group of Hakka, which has 43 lects.
Ho Po Hak, Bangka Island, Nanjing Qujiang, Jiexi, Dayu, Hong Kong, Mengshan Xihe, Zhaoan Xiuzhuan, Fuan, Fuding and Haifeng are unclassified.
There are 12 major Hakka lects and 210 Hakka lects altogether. Others claim that there are over 1000 Hakka lects spoken in China. There are 30 million speakers of the various Hakka languages. The dialect situation with Hakka, as with Min Nan, is quite confused and somewhat contradictory. Intelligibility testing could clear up some of the confusion. Some speakers report adequate intelligibility between lects, while others report difficulty.
Putonghua is Standard Mandarin, based on the Beijing dialect as of 1949, but it has since diverged wildly and many Putonghua speakers today cannot understand Beijing. Putonghua is being promoted as the national language of China. In addition to Putonghua, there are 1,500 other dialects of Mandarin spoken in China. In general, other Mandarin dialects are not intelligible to Putonghua speakers (Campbell April 2009).
However, the Northeastern dialects and the dialects around Beijing may be more intelligible than the Mandarin dialects in the rest of the country. The implication is that there may be as many as 1,500 Mandarin languages in China. However, many of these Mandarin dialects are intelligible with at least some other Mandarin dialects. Hence, despite the lack of intelligibility with Putonghua, there is a lot of potential lumping within Mandarin.
The degree to which Mandarin dialects are intelligible to each other is very much an open question and in general is poorly investigated.
Within Mandarin, besides Putonghua, the main branch, Jinan (New Jinan), Beijing and Tianjin (evidence and here) are not intelligible with Putonghua. Tianjin may be intelligible with Beijing, but it is looking more and more like a separate language.
For one thing, Tianjin’s tones are quite different from Putonghua’s, and its tone sandhi is much more complicated. It is also more closely related to lects 150-500 miles away, since Tianjin speakers originally came from Anhui (Lee 2002). Some reports say that Tianjin is intelligible with Putonghua, so intelligibility testing may be needed.
Jinan is not intelligible with Putonghua, but may be learned over a period of weeks to possibly months, as it is close enough. Jinan is only 65% intelligible with Beijing.
Since Beijing, Tianjin, Nanjing City, Hebei and all of NE Mandarin may be intelligible, I am just going to make a language called Northeast Mandarin and call Beijing, Tianjin, Hebei and Nanjing City dialects of NE Mandarin for now. Beijing has low intelligibility with other branches of Mandarin: 72% intelligible with Southwest Mandarin, 64% intelligible with Jilu Mandarin and Zhongyuan Mandarin and 55% intelligible with Jiaoliao Mandarin.
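To show how the 90% cutoff from the introduction applies to these figures, here is a minimal sketch. The function name is an illustrative assumption. The introduction defines >90% as dialect and <90% as a separate language, so the handling of a score of exactly 90 is also an assumption.

```python
THRESHOLD = 90  # percent; cutoff stated in the introduction of this post

# Beijing's intelligibility with other Mandarin branches, from the paragraph above.
beijing_scores = {
    "Southwest Mandarin": 72,
    "Jilu Mandarin": 64,
    "Zhongyuan Mandarin": 64,
    "Jiaoliao Mandarin": 55,
}

def classify(percent, threshold=THRESHOLD):
    """Label a pair 'dialect' above the cutoff, otherwise 'separate language'.

    Assumption: a score of exactly 90 is not covered by the text; this
    sketch treats it as 'separate language'.
    """
    return "dialect" if percent > threshold else "separate language"

labels = {branch: classify(score) for branch, score in beijing_scores.items()}
for branch, label in labels.items():
    print(f"Beijing vs {branch}: {beijing_scores[branch]}% -> {label}")
```

All four branches fall below the cutoff, so under this rule Beijing is a separate language from each of them.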
However, many Putonghua speakers claim that Beijinghua is not inherently intelligible with Putonghua. Complaints about unintelligible taxi drivers in Beijing are legendary. At the very least, competing views of the intelligibility of Beijinghua and Putonghua deserve investigation.
On the other hand, Beijinghua may be intelligible with Hebei and Nanjing City. I think that Hebei is clearly a dialect of Beijing. The lect of Beijing’s hutongs and taxi drivers is legendary for being hard to understand. It would be interesting to see whether Tianjin and Hebei speakers can understand it. Tianjin may be a separate language, since it is not intelligible with Beijinghua.
What probably happened was that Beijinghua and Putonghua have taken separate trajectories. This has also occurred in Italian, where, though Standard Italian was based on Tuscan, Standard Italian and Tuscan have taken separate trajectories since. It is said that if you see old Tuscan men on TV in Italy, a speaker of Standard Italian from southern Italy would need subtitles to understand them, but one from northern Italy would not.
Others say that Putonghua was based on the language of the Beijing suburbs, not the city itself.
For whatever reason, Beijinghua often seems to have less than 90% intelligibility with Putonghua, though the question needs further research. Beijinghua, in its pure and least mutually intelligible form, seems to be spoken mostly in the innermost hutongs and among taxi drivers and other low income and working class people. The lect of people with more education and money is probably a lot more comprehensible.
I would describe the real, pure, Putonghua as “CCTV speech”, the lect you hear on Chinese state television. Evidence that Beijinghua lacks full intelligibility with Putonghua is here, here, here, here, here, here, here and here.
The question of whether or not Beijinghua is a separate language from Putonghua is sure to be highly controversial. Perhaps intelligibility testing could settle the question.
Beijing is in a group all of its own called the Beijing Group. It contains 43 separate lects, and may contain more than one language.
We should also note here that even Putonghua, the language that was meant to tie the nation together, seems to be evolving into regional languages.
Guangdong Putonghua is not fully intelligible to speakers of the Putonghuas of Northern China and hence is probably a separate language.
There are also varieties of Putonghua that are spoken in Singapore and Taiwan. Taiwanese Mandarin is about 80-85% intelligible with Putonghua and is a separate language. Claims that Taiwan Mandarin is fully intelligible with Putonghua are incorrect.
Shanghai Putonghua is often not intelligible with Putonghua from other regions. It has heavy interference from Shanghaihua, which seriously affects the Putonghua accent. Even after four years of exposure to it, Standard Putonghua speakers often have problems with it.
In addition, Jianghuai Mandarin Putonghua and Zhengcao Mandarin Putonghua are not intelligible with Putonghua from other areas (Campbell April 2009). These varieties of Mandarin cause a particular interference with Putonghua Mandarin that results in a severe dialectal disturbance in their Putonghua.
These Putonghuas are spoken in the regions native to the Jianghuai and Zhengcao dialects of Mandarin. Jianghuai is spoken in Anhui, Jiangsu, Hubei and to a much lesser extent Zhejiang Provinces. Zhengcao is spoken in Anhui, Henan, Shandong and Jiangsu, with one dialect spoken in Hebei.
Although it is different, Singapore Putonghua is still intelligible with Putonghua. Malay Mandarin is said to be quite different but nevertheless intelligible. However, Malay Mandarin speakers say they have to adjust their speech with Chinese speakers; otherwise, their speech is poorly intelligible. This implies that Malay Mandarin is indeed a separate language.
Yunnan Putonghua is intelligible with Putonghua from other regions (Campbell January 2009).
Cangzhou, spoken in southeastern Hebei, is a separate language. It is only partly intelligible with Putonghua. Renqiu, Huanghua, Hejian, Cangxian, Qingxian, Xianxian, Dongguang, Haixing, Yanshan, Suning, Nanpi, Wuqiao and Mengcun, all spoken in Cangzhou prefecture, are all dialects of Cangzhou.
Cangzhou shares some similarities with Tianjin, but it is only partly intelligible with it.
Jinan is a member of the Liaotai Group of the larger Jilu Group, which has 37 lects.
The Baotang Group of Jilu has 52 lects. Tianjin forms its own subgroup within Baotang. Cangzhou, Renqiu, Huanghua, Hejian, Cangxian, Qingxian, Xianxian, Dongguang, Haixing, Yanshan, Suning, Nanpi, Wuqiao, and Mengcun are members of the Huangle subgroup of Baotang, which has 25 lects.
Jilu itself consists of 170 lects.
Taiwanese Mandarin, while different from Putonghua, is intelligible with it. Singapore Mandarin has fewer differences than Taiwanese. Both are dialects of Putonghua.
Luoyang, Kaifeng, Changyuan and Zhengzhou, all in Henan Province, are not intelligible with Putonghua. However, all four are mutually intelligible with each other, so they are dialects of a single language, Henan Mandarin.
Xinyang, also spoken in Henan, is a separate language and cannot be understood by Luoyang speakers.
Nanyang has high but not complete intelligibility with Luoyang. After a few weeks of close contact, Luoyang speakers can understand Nanyang, but initially, comprehension is poor due to different tones. Nanyang has 15 million speakers.
Luoyang and Gushi are unintelligible with Putonghua. In addition, Gushi is different from Nanyang and may not be intelligible with it. Intelligibility between Xinyang, Gushi and Nanyang is not known. In general, intelligibility between many lects in Henan is not good, but after a week or two of close contact, they can start to understand each other.
In Shaanxi, Yanan, Xian, Huxian (evidence), Zhouzhi (evidence), and Hanzhong are not intelligible with Putonghua. Let us call this language Shaanxi Mandarin. Xi’an, for instance, is about 65% intelligible with other Mandarin groups.
Xining, spoken in Qinghai, seems to be very different from other Shaanxi lects, and is probably a separate language altogether (evidence here and here).
In Gansu Province, Tongwei is not intelligible with Putonghua, and Gansu Mandarin seems to be very different from other forms of Mandarin. Gansu Mandarin appears to be a separate language.
However, within Gansu, there are divergent lects, such as Sale, which are unintelligible with other Gansu lects.
Bozhou (evidence), Yingshang (evidence), and Fuyang (evidence), spoken in Anhui, are at least unintelligible with Putonghua. Fuyang is very different. The lect spoken 300 km south of Jinan, around Mengcheng in rural Anhui, is said to be completely unintelligible with Putonghua, Tianjin and Beijinghua. For the time being, we will refer to this as one language, Anhui Mandarin. Intelligibility between lects of Anhui Mandarin is not known.
Anhui Mandarin Putonghua has poor intelligibility with Standard Putonghua due to its phonology. Therefore, it is a separate language.
Xian, Huxian and Zhouzhi are members of the Guanzhong Group of Zhongyuan, which has 45 lects.
Yanan, Hanzhong and Xining are members of the Qinlong Group of Zhongyuan, which has 67 lects.
Luoyang is a member of the Luoxu Group of Zhongyuan, which has 28 lects.
Kaifeng, Nanyang, Zhengzhou, Changyuan, and Bozhou are members of the Zhengcao Group of Zhongyuan. The Zhengcao Group has 93 lects.
Xinyang and Gushi are in the Xinbeng subgroup of Zhongyuan, which has 20 lects.
Tongwei and Sale are part of the Longzhong Group of Zhongyuan, which has 25 lects.
Yingshang is a member of the Cailu Group of Zhongyuan, which has 30 lects.
The Mandarin spoken in Qinghai is very different from that spoken in Gansu, but it’s not known if it is a separate language. Both are usually classified as types of Zhongyuan Mandarin.
Zhongyuan has a shocking 388 lects. Zhongyuan Mandarin is not fully intelligible with Putonghua. Zhongyuan Mandarin has 130 million speakers (Olson 1998).
Yichang (evidence), Longchang (evidence), Chengdu, Chongqing (evidence), Guilin, Nanping (spoken near Mt. Wuyi; evidence), Longcheng (evidence), Luocheng (evidence), Luzhou (evidence here and here), Lingui (evidence), Jiuzhaigou (evidence), Xindu, Wenshan (evidence), Mianzhu (evidence here and here), Yangshuo (evidence), Wuhan (evidence), and Leshan (evidence) are all unintelligible with Putonghua.
Guilin is not intelligible with general Southwest Mandarin speech. Wenshan at least is not intelligible with other Southwestern varieties (Johnson 2010).
Chengdu is part of a Sichuan Mandarin koine that is spoken in many of the larger cities in Yunnan. It includes Kunming, Bazhong, Dazhou, Neijiang, Zigong, Yibin, Luzhou, Chengdu, Mianyang, Deyang and Guiyang and is broadly intelligible (Xun 2009). Ziyang is intelligible with the koine but has a heavy accent (Xun 2009). Leshan is unintelligible with the koine, but it can be learned in a few weeks of exposure (Xun 2009).
Dali is also not intelligible with Putonghua, but that is because Tibetan Mandarin has heavy Tibetan admixture.
Chongqing speakers cannot understand Chengdu or Luzhou speakers. The many small lects around Mt. Emei are not intelligible with Chengdu, appear to be very different, and may be one or more separate languages.
Wuhan is not intelligible to speakers of Southwest Mandarin from other provinces, for instance, it is only 80% intelligible with Chengdu. The intelligibility of Wuhan and Yichang is not known.
Dahua, spoken in and around Dahua village on the Puduhe River near Dongchuan in Yunnan Province, is apparently a separate language.
Lanping may be a separate language. Kunming is not intelligible with Tuoyuan, so Tuoyuan may be a separate language also. The language spoken in Kunming is part of the Sichuan Mandarin koine that includes Kunming, Bazhong, Dazhou, Neijiang, Zigong, Yibin, Luzhou, Chengdu, Mianyang, Deyang and Guiyang.
Chuanlan is a little-known language spoken by the Tunbao people of Guizhou Province.
Yingshan is a separate language based on a 200 word Swadesh test (Ben Hamed 2005).
Menghai (evidence) may well be a completely separate language. The mutual intelligibility of Menghai, Guiyang and Kunming is not known. Guiyang is at least not intelligible with Putonghua. Guiyang is evolving into the Sichuan Mandarin koine, which is broadly intelligible with Kunming, Bazhong, Dazhou, Neijiang, Zigong, Yibin, Luzhou, Chengdu, Mianyang and Deyang.
Shaoshan, apparently Mao Zedong’s lect, spoken in Hunan Province, is a separate language. It was said that although Mao had a secretary who could understand him well, not many others could.
Another language spoken in Hunan, in Zhangjiajie County, is called Zhangjiajie Maoxi. The Maoxi are a tribal group there that speak a strange variety of Mandarin.
Tuoyuan in Hunan is not fully intelligible with other Southwest Mandarin lects, or at least not with Kunming.
Junhua, or military language, is a language spoken by an ethnic group on Hainan in the city of Zonghe. It is said to be “Old Mandarin” and is probably not intelligible with other lects. It is a form of Southwest Mandarin known as the Junhua Group, which contains 4 lects.
Guilin, Luocheng, Yangshuo, Liuzhou and Lingui are members of the Guiliu Group of Southwest Mandarin, which has 57 lects. Guiliu Southwest Mandarin is at least not comprehensible with Putonghua or Chengyu Southwest Mandarin.
Leshan and Longchang are members of the Guanchi Group of Southwest Mandarin, which has 85 lects. Within Guanchi, Longchang is a member of the Renfu Group, which has 13 lects.
Yichang, Chengdu, Chongqing and Yingshan are members of the Chengyu Group of Southwest Mandarin, which has 113 lects. Chengyu Southwest Mandarin is not comprehensible with Putonghua or Guiliu Southwest Mandarin.
Menghai, Kunming, Wenshan and Guiyang are members of the Kungui Group of Southwest Mandarin. The Kungui Group itself has an incredible 95 lects.
Lanping is in the Dianxi Group of Southwest Mandarin, which has 36 lects. Within Dianxi, it is a member of the Baolu subgroup, which has 21 lects.
Taoyuan is in the Changhe Group of Southwest Mandarin, which has 14 lects.
Wuhan is a member of the Wutian Group of Southwest Mandarin, which has 9 lects.
Dali is a member of the Dianxi Group of Mandarin, which has 36 members. Within Dianxi, Dali is a member of the Yaoli Group, which has 15 members.
Nanping, Chuanlan, Shaoshan, Jiuzhaigou, Zhangjiajie Maoxi and Dahua are unclassified.
Southwest Mandarin itself has a stunning 519 lects and is not fully intelligible with Putonghua. There are 240 million speakers of Southwest Mandarin (Olson 1998).
Jianghuai Mandarin is a separate language.
Yangzhou is considered to be a separate language by a 200 word Swadesh test (Ben Hamed 2005). Yangzhou has about 52% intelligibility with the other branches of Mandarin.
Nanjing (evidence and here) is also a separate language – now mostly spoken in the suburbs, as city speech is not a separate language anymore. The city language is said to be intelligible with the general northeastern China lect spoken in Beijing and Hebei.
So I will call Nanjing Suburbs a separate language.
Lianyungang is a separate language, as are Yancheng and Huaian (evidence for both).
Nantong, a very strange variety of Mandarin on the border of Wu and Mandarin that shares many features with Wu languages, is a separate language, as is its sister language, Tongdong. Jinsha is a dialect of Nantong.
Rugao, next to Nantong, is also a separate language.
Also within Jianghuai, Hefei is considered to be a separate language by a 200 word Swadesh list (Ben Hamed 2005).
Rudong is at least not intelligible with Putonghua.
Anqing, in Anhui Province, is also not intelligible with Putonghua.
In 1933, there were three different languages spoken in Tongcheng, Anhui – East Tongcheng, West Tongcheng and Tongcheng Wenli. Tongcheng Wenli was the classical-based language spoken by the educated elite of the city. Whether these three languages still exist is not known, but surely some of the speakers in 1933 are still alive.
Chuzhou, spoken in Anhui, is not intelligible with Putonghua, although it is said to be close to Nanjing. Dangtu, also spoken in Anhui, is not intelligible with Putonghua.
Dongtai is a separate language (evidence).
The lects spoken in Dafeng, Taizhou, Xinghua and Haian are said to be similar to Dongtai, so for the time being, we will list them as dialects of Dongtai.
Jiujiang, spoken in Jiangxi Province, is a separate language, as is Xingzi, located close by.
Intelligibility between Rudong, Anqing, Chuzhou, Dafeng, Taizhou, Xinghua, Haian and Dangtu is not known.
Yangzhou, Lianyungang, Yancheng, Huaian, Nanjing, Hefei, Anqing, the Tongchengs, Chuzhou, and Dangtu are in the Hongchao Group of Jianghuai, which has 82 lects.
Dongtai, Dafeng, Taizhou, Haian, Xinghua, Jinsha, Nantong, Tongdong, Rudong, and Rugao are in the Tairu Group of Jianghuai. Tairu has 11 different lects.
Jiujiang and probably Xingzi are members of the Huangxiao Group of Jianghuai, which has 20 lects.
Jianghuai is composed of an incredible 120 lects and is not fully intelligible with Putonghua. Some suggest that all of the lects of Jianghuai are mutually unintelligible, but that remains to be proven. Jianghuai Mandarin has 65 million speakers (Olson 1998).
Northeastern (Dongbei) Mandarin is a separate language. Within Northeast, Shenyang is a separate language according to a 200 word Swadesh list (Ben Hamed 2005). Harbin is often listed as intelligible with Putonghua, but some Putonghua speakers can barely understand a word of it. Harbin may be a separate language. That classification is sure to be controversial, so intelligibility testing may be required to sort it out.
Shenyang is a member of the Jishen Group of Northeastern Mandarin, which has 44 dialects. Within Jishen, Shenyang is a member of the Tongxi Group, which has 24 dialects.
Harbin is a member of the Hafu Group of Northeastern Mandarin, which has 64 lects. Within Hafu, it is a member of the Zhaofu Group, which has 18 lects.
Lanyin Mandarin in the far northwest is also a separate language (Campbell 2004). Though Lanyin is said to be intelligible with Putonghua, that does not appear to be the case. Minqin (evidence) and Lanzhou (evidence) in Gansu are not fully intelligible with Putonghua, nor is Yinchuan (evidence) in Ningxia.
Intelligibility within Lanyin is not known, but Jiuquan at least appears to be a completely separate language inside Lanyin.
Jiuquan is a member of the Hexi Group of Lanyin, which has 18 lects.
Yinchuan is a member of the Yinwu Group of Lanyin, which has 12 lects.
Lanzhou is a member of the Jincheng Group of Lanyin, which has 4 lects.
Lanyin is composed of 57 separate lects. Lanyin Mandarin has 9 million speakers (Olson 1998).
The Jiaoliao Mandarin spoken in Shandong contains lects such as Qingdao (evidence here and here) and Weihai (evidence) which are not fully intelligible with Putonghua. Dalian is quite different from Putonghua. Intelligibility between Qingdao, Weihai and Dalian is not known.
Weihai and Dalian are members of the Denglian Group of Jiaoliao, which has 23 lects.
Qingdao is a member of the Qingzhou Group of Jiaoliao, which has 16 lects.
Jiaoliao is composed of 45 lects. Jiaoliao is not fully intelligible with Putonghua. Intelligibility inside of Jiaoliao is not known, but there may be multiple languages inside of it, because some Shandong Peninsula lects sound very strange even to speakers used to hearing Shandong Mandarin.
Karamay is an unclassified Mandarin language spoken in Xinjiang.
The Mandarin spoken around Tiantai in Zhejiang is not intelligible with Putonghua and may be a separate language. It is also unclassified.
Mandarin has 873 million speakers. There are an incredible 1,526 lects of Mandarin.
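As a rough check on the 1,526 figure, the group totals stated earlier in this section can be summed. The sketch below assumes that the listed groups do not overlap and that the Beijing Group counts toward the Mandarin total. Both are assumptions rather than claims made in the text, and the variable names are illustrative.

```python
# Lect counts per Mandarin group, as stated earlier in this section.
group_lects = {
    "Beijing": 43,
    "Jilu": 170,
    "Zhongyuan": 388,
    "Southwest": 519,
    "Jianghuai": 120,
    "Lanyin": 57,
    "Jiaoliao": 45,
}
TOTAL_MANDARIN_LECTS = 1526  # overall figure given in the text

# Sum of the groups with a stated total, and what is left over.
listed = sum(group_lects.values())
remainder = TOTAL_MANDARIN_LECTS - listed

print(f"Listed groups: {listed} lects")
print(f"Not covered by a stated group total: {remainder} lects")
```

The remainder would have to be made up by Northeastern Mandarin, for which only the Jishen (44) and Hafu (64) subgroup counts are given, plus unclassified lects such as Karamay and Tiantai.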
Although it is related to Mandarin, Jin is a completely separate language. Besides the main Jin branch, Baotou is apparently a separate language (evidence), as possibly is Taiyuan (evidence).
Within Hohhot Jin, there are two separate languages.
One is Hohhot Xincheng Jin, a combination of Hebei Jin, Northeastern Mandarin and the Manchu language.
The other is Jiucheng Hohhot Jin, spoken by the Muslim Hui minority in the city. It is related to other forms of Jin in Shanxi Province.
Yuci is a separate language from Taiyuan on a 200 word Swadesh test (Ben Hamed 2005).
Fenyang, the language used in Chinese director Jia Zhangke’s movie Xiao Shan Going Home, is not intelligible with Putonghua.
Jingbian, in Shanxi, is a separate language.
Yulin is also a separate language.
Hohhot is a member of the Zhanghu Group of Jin, which has 29 lects.
Baotou and Yulin are members of the Dabao Group of Jin, which has 29 lects.
Taiyuan and Yuci are members of the Bingzhou Group of Jin, which has 16 lects.
Fenyang is a member of the Luliang Group of Jin, which has 17 lects.
Jingbian is a member of the Wutai Group of Jin, which has 30 lects.
Jin is composed of 171 lects, and some of them are separate languages. Jin has 48 million speakers (Olson 1998).
Besides Xiang Proper, assuming there even is such a thing, Shuangfeng and Changsha are separate languages, having only 47% intelligibility.
In fact, Changsha itself is divided into multiple languages in the city itself. We do not know how many there are, but we know that they exist. For the moment, we shall just add one lect to Changsha, and divide it into Changsha A and Changsha B, but there may be more. Furthermore, there are significant differences within the Changsha spoken in Changsha City and in the surrounding countryside.
Shuangfeng is also very different within itself, as the vocabulary changes every 10 miles or so. Intelligibility data is lacking.
Mao Zedong spoke Xiangtan, a notoriously difficult Xiang language in Hunan, about which it is said, “No one can understand it.” Xiangtan itself is internally diverse, with differences between the dialect of the city and rural areas, but intelligibility data is lacking.
Hengyang is apparently a separate language, as is Jishou (evidence). There is significant dialectal diversity in Hengyang, but intelligibility data is lacking.
Liuyang is a separate language, actually a macrolanguage, spoken in Liuyang county-level city in Changsha prefecture in Hunan. Liuyang is split into 5 divisions – Liuyang North, Liuyang South, Liuyang West, Liuyang East and Liuyang City.
Liuyang South and Liuyang East are separate languages, mutually unintelligible with the others. Liuyang City has recently arisen as a sort of a Liuyang “Putonghua” that is understandable to speakers of all Liuyang lects. So within Liuyang, we have three dialects – Liuyang City, Liuyang North and Liuyang West. Outside of Liuyang Proper, there are also two separate languages – Liuyang South and Liuyang East. None of the three Liuyang languages is intelligible with Changsha.
Even within this classification, each of the 5 Liuyang lects has multiple dialects. Each village is said to have its own lect in Liuyang.
Hengshan (evidence) is a separate language with vast dialectal divergence divided by Mount Hengshan.
There are two Xiang Hengshan lects on either side of the mountain – Qianshan and Houshan – that are very different and must be separate languages. Huayuan (evidence) is at least not intelligible with Putonghua.
In the city of Yiyang, Henan Province, 3 lects are spoken. One is a Yiyang Changyi Xiang lect, another is a Yiyang Luoshao Xiang lect, and a third is Luoyang Southwest Mandarin, a dialect of Henan Mandarin, described above. All appear to be separate languages.
We will call the two Xiang lects Yiyang Changyi and Yiyang Luoshao.
Baojing at least is not intelligible with Putonghua, yet it is said to be intelligible with Chengdu Southwest Mandarin.
Lingshuijiang, also spoken in Hunan by 300,000 people, may well be a separate language.
Ningxiang is said to be very different from Changsha. Given the dramatic divergence present even as background in Xiang, this must mean that Ningxiang is at least not intelligible with Changsha.
According to good sources, there is a tremendous amount of lect diversity in Western Hunan, and most of it probably involves Xiang lects, while most or all of these lects are not mutually intelligible. But until we get more data, we cannot carve any languages out of this mess yet.
Shuangfeng and Lingshuijiang are members of the Luoshao Group of Xiang, which has 21 lects.
The Changshas, Hengyang, Xiangtan, Hengshan, Ningxiang and the Liuyangs are members of the Changyi Group of Xiang, which has 32 lects.
Baojing, Jishou and Huayuan are members of the Jixu Group of Xiang, which has 8 lects.
Xiang is composed of 74 lects. Many, or possibly all of them are separate languages. The various languages of Xiang have 50 million speakers.
Wu is a major group of diverse Chinese languages that is often divided into Northern Wu and Southern Wu. Northern Wu and Southern Wu are definitely mutually unintelligible languages. Southern Wu has 18 million speakers. In general, the list below just lists Wu lects that are utterly unintelligible with Putonghua. My opinion is that in general, the Wu lects are mostly separate languages, however, some are merely dialects of other Wu lects.
A good general rule for Zhejiang lects is that people say they can sort of understand the next city over, but the lect two cities away is incomprehensible. For instance, in the Taizhou prefecture region, there are 4-5 unintelligible dialects across a 12 mile area. In Zhejiang, the mountains go all the way down to the sea, so there are few flat areas where language can spread out and become comprehensible.
Suzhou, Shanghaiese, Wuxi (evidence), Huzhou (evidence), Changzhou (evidence), Xiaoshan (evidence), Songjiang (evidence), Jiaxing, Hangzhou (evidence), Kunshan (evidence), Ningbo and Yixing (evidence) are separate languages.
Tongxiang also appears to be a separate language, as do Yuyao (evidence) and Zhoushan.
Qidong, spoken in the city of Qidong, is a separate language.
Lvsi, Qisi or Tongdong, spoken in the nearby town of Qisi, is a separate language from Qidong. Qidong is said to be very close to Chongming, so for the time being, we will list Chongming as a dialect of Qidong.
Haimen also appears to be a dialect of Qidong. However, there are 2 lects spoken in Haimen, and they are apparently not mutually intelligible. We will leave Haimen A as a dialect of Qidong, while we will set Haimen B as a separate language as it is not intelligible with Haimen A.
There are differences between Chongming and Haimen A, but the degree of them is not known. Changyinsha is very similar to Haimen, Chongming and Qidong, so it is probably a dialect of Qidong also. Another name for Qidong is Qihai, which refers to the speech of Qidong, Haimen and Tongzhou. For the time being, we will list Haimen A, Changyinsha and Chongming as dialects of Qidong. Chongming, and hence Qidong, is not intelligible with Shanghaiese.
Zhangjiagang, Changshu and Kunshan may be intelligible with Suzhou, but data is lacking. Suzhou is only 43% intelligible with Wenzhou. None of these lects is intelligible with Shanghaiese.
Ningbo has good intelligibility with Shanghaiese, but not vice versa.
Reports vary on the intelligibility of Shanghaiese and Suzhou. Some say they understand each other well, but that is probably not the case at first due to serious differences in tones. Intelligibility testing is needed.
Pudong, the older form of the Shanghai language, is still spoken in the Pudong District of the city, but it is dying out. There is a question of whether or not it is mutually intelligible with Shanghaiese, but Shanghaiese speakers seem to feel it is not mutually intelligible (Gilliland 2006).
Several lects are spoken in the suburbs of Shanghai. Reports vary, but Shanghai residents generally report that these lects are not mutually intelligible with Shanghaiese (Gilliland 2006).
They are Baoshan, Fengxian, Nanhui, Jiading, Jinshan, Pudong (or Chuansha) and Qingpu.
Hangzhou is reportedly much different from the lects of Shanghaiese, Ningbo, etc. to the northeast, and is not intelligible with Shanghaiese, nor with Suzhou. Hangzhou has 1.2 million speakers.
Changzhou and Wuxi are not intelligible with Shanghaiese or Suzhou. Changzhou and Wuxi have high, but not full, intelligibility. Changzhou and Wuxi are part of a dialect chain in which eastern Changzhou speakers can communicate with western Wuxi speakers, but as one moves further east into Wuxi or west into Changzhou, intelligibility drops off. As with Czech and Slovak, it is best to split Wuxi and Changzhou into separate languages.
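A dialect chain like this is why a cut has to be made somewhere: "intelligible with" does not carry over from neighbor to neighbor. The sketch below illustrates the logic only. The four chain points are ordered west to east, and the per-step loss is an invented number, since no such measurements are reported here.

```python
THRESHOLD = 90  # percent; cutoff stated in the introduction of this post

# Four points along the chain, ordered west to east.
chain = ["west Changzhou", "east Changzhou", "west Wuxi", "east Wuxi"]

# Assumed for illustration only: each step along the chain costs this many
# percentage points of intelligibility. This is NOT measured data.
HYPOTHETICAL_LOSS_PER_STEP = 6

def score(i, j):
    """Hypothetical intelligibility (percent) between chain points i and j."""
    return 100 - HYPOTHETICAL_LOSS_PER_STEP * abs(i - j)

def same_language(i, j):
    """Apply the >90% rule to the hypothetical score."""
    return score(i, j) > THRESHOLD

for i in range(len(chain)):
    for j in range(i + 1, len(chain)):
        verdict = "dialects" if same_language(i, j) else "separate"
        print(f"{chain[i]} / {chain[j]}: {score(i, j)}% -> {verdict}")
```

With these made-up numbers, every neighboring pair passes the cutoff while the two ends of the chain do not, so any grouping into languages requires choosing a boundary inside the chain.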
Changzhou itself has considerable dialectal divergence, though apparently all dialects are intelligible. Changzhou has 3 million speakers.
Yixing, near Changzhou, is not intelligible with Shanghaiese.
Jiangyin is spoken in Jiangyin city. It is related to Changzhou and has high intelligibility with Changzhou and Wuxi.
All of the above are in the Taihu Group.
Taizhou, centered around the city of Taizhou in Eastern Zhejiang, is composed of 11 separate lects, all of which are separate languages: Huangyan (evidence), Jiaojiang, Linhai, Sanmen, Tiantai (evidence), Wenling (evidence), Ninghai (evidence), Xianju, Leqing (evidence), Yubei and Yuhuan (evidence). (Evidence for all).
A single subgroup of Wuzhou, Yiwu, contains 18 separate languages, all mutually unintelligible. We will call them Yiwu A through Yiwu R for the time being.
Pucheng is a separate language. Pucheng has 2 dialects, Nampo and North Dabei. Intelligibility data is not known. Pucheng is so diverse that some say it is a language isolate and is not even a part of Wu (Norman 1988).
There are two groups of Southern Wu which are said to be both highly divergent and to have very low intelligibility internally. These groups are sometimes called Jinqu and Shangli.
Jinqu consists of at least 30 languages: Jinhua, Jinhua Xiaohuang, Tangxi, Lanxi, Pujiang, Yiwus A-R, Dongyang, Pan’an, Yongkang (evidence), Wuyi (evidence), Quzhou (evidence), Longyou and Jinyun. Lanxi has 660,000 speakers (Rickard 2006). Quzhou is apparently not intelligible with Wenzhou. Jinqu is roughly equivalent to the Wuzhou Group.
Shangli contains at least 18 languages: Shangrao City, Shangrao County, Guangfeng, Yushan, Kaihua, Changshan, Jiangshan, Lishui (evidence), Suichang, Songyang, Xuanping, Qingtian (evidence here and here), Yunhe, Jingning, Longquan, Qingyuan, Taishun and Pucheng.
This group is roughly equivalent to the Longqu and Chuzhou Groups of Chuqu. Some members of this group extend beyond Zhejiang and into northeastern Jiangxi and northern Fujian.
We are going to cautiously classify all of these lects as separate languages since they are said to be much more divergent and much less mutually intelligible than Taihu, and Taihu itself seems to have pretty low internal intelligibility.
Wenzhou (evidence) is a separate language.
Ouhai, Yongjia and Ruian appear to be dialects of Wenzhou, but all of them are probably separate languages, since if you go 5 miles in any direction in Wenzhou, there’s a new dialect, and it’s hard to understand people.
Wenzhou is 43% intelligible with Suzhou.
Wencheng (evidence) appears to be a separate language.
Wenxi is a separate language within Oujiang, not intelligible with Wenzhou. It is spoken in one town in Qingtian County.
Jinxiang also has its own Wu lect, with Mandarin influences. This is a Taihu (Northern Wu) outlier.
In addition, in Taishun County, there is also an aberrant Wu lect spoken in the town of Luoyang, influenced by both Manjiang and Oujiang Wu.
There is another Wu lect similar to Manjiang Eastern Min spoken in the town of Hedi in Qingyuan County in Lishui.
Manhua is quite different. There is a controversy over whether Manhua is Macro-Min or Macro-Wu. It is probably Macro-Wu based on phonology, and it also shares some Min-like traits with other Wu lects such as those in the Chuqu group.
Within Manhua, there is a northern group spoken in the town of Yishan and a southern group spoken in the towns of Qianku and Jinxiang. Qianku is the standard for Manhua. The northern/southern divide may impede intelligibility, but we have no information yet.
Wuhu is a separate language, unintelligible with Shanghaihua.
Nanjing Wu is a separate language.
Jiaxing, Shanghaiese, Suzhou, Wuxi, Songjiang, Tongxiang, Qidong, Lvsi, Yunhe and Kunshan are all in the Hujia Group of Taihu. The Hujia Group contains 32 lects.
Changzhou, Yixing, Jiangyin and Haimen are in the Piling Group of Taihu. Piling has 12 lects. Piling has 8 million speakers.
Wenzhou, Ouhai, Yongjia, Ruian and Wencheng are in the Oujiang Group of Taihu, which also contains 12 lects.
Hangzhou has its own group, the Hangzhou Group of Taihu.
Shaoxing, Fuyang, Xiaoshan, Linan, Yuyao and Zhuji are in the Linshao Group of Taihu which also contains 12 lects.
Fenghua and Zhoushan are in the Yongjiang Group of Taihu. The Yongjiang Group contains 11 lects and has 4 million speakers.
Changxing is in the Tiaoxi Group of Taihu, which has 5 lects.
The Taihu Group is composed of 75 separate lects, many or all of which are separate languages. Taihu has 47 million speakers.
Lishui, Qingyuan, Jingning, Jinyun and Taishun are in the Chuzhou group of Chuqu, which contains 9 lects. Chuzhou has 1.5 million speakers. Chuqu itself contains 35 separate lects.
Pucheng, Shangrao County, Shangrao City, Jiangshan, Songyang, Guangfeng, Longquan, Kaihua, Changshan, Suichang, Longyou, Yushan and Quzhou are members of the Longqu Group of Chuqu, which has 14 lects and 5 million speakers (Olson 1998).
The Yiwu languages, Dongyang, Jinhua, Jinhua Xiaohuang, Lanxi, Tangxi, Wuyi, Pan’an, Pujiang and Yongkang are all members of the Wuzhou Group, which contains 27 separate languages. Wuzhou has 4 million speakers.
Nanjing Wu is unclassified.
The various Wu languages have 85 million speakers.
Within Hui, there are at least six separate languages (Hirata 1998). Actually, there are many more.
Xidi, spoken in a village at the foot of Huangshan Mountain, is a separate language. Xidi is unintelligible even to villages a few miles away.
Tunxi, Wuyuan and Xiuning are separate languages. Tunxi and Xiuning are spoken in Anhui, but Wuyuan is spoken in Jiangxi Province.
Within the Jingzhan Group of Hui, Jingde, Ningguo, Qimen, Chilingkou (spoken in Chiling, Qimen County), Meixi Xiang and Shitai are separate languages.
Within Qimen County itself, there are 6 different Hui lects, with low intelligibility between them. It is quite possible that we are talking about 6 different languages here. One of them appears to be Chilingkou above. The others we will just call Qimen A, Qimen B, Qimen C, Qimen D and Qimen E. All except Meixi are spoken in Anhui Province. Meixi is spoken in Meixi, Jiangxi.
Jixi, Hongmen and Shexian are separate languages.
Within Shexian, there are two different languages that we will only call Shexian A and Shexian B for now. Jixi and the Shexian languages are spoken in Anhui.
Dexing and Dongzhi are separate languages, the first spoken in Jiangxi and the second spoken in Anhui.
In the Yanzhou Group of Hui, Jiande and Chunan are separate languages.
There are two other lects in the group, Suian and Shouchang. Chunan and Suian are very diverse and are in all probability separate languages. Shouchang is also extremely diverse, and Jiande has some differences with Shouchang.
The Yanzhou languages are interesting because there is controversy whether they are Wu or Hui languages. Careful examination reveals that they cannot be subsumed under Southern Wu due to their great divergence, despite having some similarities with Wu. Some authors feel that they are Hui-Wu merged lects, and their similarity with both is given as a reason for merging Wu and Hui into a supergroup.
While it is best to classify them as Hui, they are much different from most Hui lects. All are spoken in western Zhejiang. The Yanzhou Group has four languages. Discussion here.
Huangshan, Tunxi, Wuyuan and Xiuning are members of the Xiuyi Group of Hui, which has 6 lects.
Meixi, the Qimens, Chilingkou, Shitai, Ningguo and Jingde are members of the Jingzhan Group of Hui. Jingzhan has 12 lects, all of which are separate languages.
Jixi, Hongmen and the Shexians are members of the Jishe Group of Hui. The Jishe Group has 6 lects.
Dexing and Dongzhi are members of the Qide Group of Hui. The Qide Group has 5 lects.
Xidi is unclassified.
The various Hui languages have 3.2 million speakers. There are 34 different Hui lects, at least 24 of which are separate languages. There is a possibility that all Hui lects are separate languages, but that remains to be proven.
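As a quick arithmetic check, the group sizes given above can be tallied to confirm the total of 34 Hui lects. This is only a small Python sketch. The dictionary name is my own, and counting the unclassified Xidi as a group of one is an assumption made for the tally.

```python
# Lect counts per Hui group, as stated in the text above.
# Counting unclassified Xidi as its own group of one is an assumption.
hui_group_sizes = {
    "Xiuyi": 6,
    "Jingzhan": 12,
    "Jishe": 6,
    "Qide": 5,
    "Yanzhou": 4,
    "Unclassified (Xidi)": 1,
}

total_lects = sum(hui_group_sizes.values())
print(total_lects)  # 34
```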
Cantonese is a major language spoken in the south of China. Its speakers are said to be a mix between the Yue people and the Han. They have great pride in their speech, which appears to be closer to ancient Chinese than Mandarin is. When Sun Yat-Sen was President of Republican China, a vote was held on which language to base Standard Chinese on. Cantonese lost to Mandarin by only one vote.
Some Cantonese activists denounce Mandarin as a pidgin language spoken by Manchu and Mongol invaders, grafted onto the Chinese of the people they conquered.
Attempts to determine intelligibility through the use of complex lexical, tonal, grammatical and phonological formulae produce results that are excessively high in terms of percentage of intelligibility. A better method is presented in Szeto 2000, in which sentences in other lects are played to speakers of Lect A, and speakers of Lect A are asked to give the basic meaning of the sentences played to them. A sentence is recorded as correct if the basic meaning was ascertained.
By this better method, Standard Cantonese has only 31.3% intelligibility of Siyi, 7.2% of Hakka, 2.7% of Teochew and 2.5% of Xiamen. This paper also highlights the very important role morphological and syntactic differences play in intelligibility, even apart from phonology and other factors.
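The sentence-level test described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample judgments are hypothetical and are not Szeto's data. The 90% cutoff is the language/dialect line adopted in this post.

```python
def intelligibility_percent(judgments):
    """Percent of test sentences whose basic meaning the listener got right.

    `judgments` holds one boolean per sentence played to the listener.
    """
    if not judgments:
        raise ValueError("no sentences were scored")
    return 100.0 * sum(judgments) / len(judgments)


def classify(percent, threshold=90.0):
    """Apply this post's cutoff: above 90% is a dialect, otherwise a language."""
    return "dialect" if percent > threshold else "separate language"


# Hypothetical listener who caught the basic meaning of 5 of 16 sentences.
sample = [True] * 5 + [False] * 11
score = intelligibility_percent(sample)
print(score, classify(score))  # 31.25 separate language
```

Scoring only whether the basic meaning was recovered keeps the measure tied to the question asked throughout this post: can they understand each other?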
In contrast, the more complex method not relying on actual informants gives false positives. By this method, Cantonese has 54.7% intelligibility of Hakka, 47.4% of Xiamen and 43.5% of Teochew. This method overestimates the intelligibility of Hakka by a factor of 7.6, of Teochew by a factor of 16.1 and of Xiamen by a factor of 19.
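The overestimation factors follow from dividing each formula-based figure by the corresponding listener-based figure. A small Python check, using only the percentages given in the text (the variable names are my own):

```python
# Intelligibility of each lect to Standard Cantonese speakers, in percent.
listener_test = {"Hakka": 7.2, "Teochew": 2.7, "Xiamen": 2.5}
formula_based = {"Hakka": 54.7, "Teochew": 43.5, "Xiamen": 47.4}

# How many times higher the formula-based figure is than the listener test.
overestimate = {
    lect: round(formula_based[lect] / listener_test[lect], 1)
    for lect in listener_test
}
print(overestimate)  # {'Hakka': 7.6, 'Teochew': 16.1, 'Xiamen': 19.0}
```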
Cantonese is traditionally said to have nine tones, but phonemically, there are only six tones, since the last three are just three of the first six with a voiceless stop consonant on the end. These are often called entering tones in traditional Chinese scholarship.
Entering tones have disappeared from most Mandarin lects, probably about 800 years ago due to the influence of the invading Mongols, but are still present in Cantonese, Hakka and Min. The original entering tones of Middle Chinese have merged into one or another of Mandarin’s four tones.
Syllables carrying the traditional contour tones end in a vowel or a nasal. However, in Cantonese, the entering tone has retained its original short and sharp character from Middle Chinese, so in a sense, it has a different sound quality.
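The nine-versus-six tone count can be made concrete with a short Python sketch. The pairing of entering tones 7, 8 and 9 with tones 1, 3 and 6 is the commonly described correspondence and is not spelled out in the text above, so treat it as an assumption. The function name and the romanized sample syllables are also just for illustration.

```python
# Traditional entering tones 7-9 paired with the open-syllable tones whose
# pitch they share (commonly described pairing; an assumption here).
ENTERING_TO_BASE = {7: 1, 8: 3, 9: 6}
STOP_CODAS = ("p", "t", "k")  # voiceless stops that close entering-tone syllables


def phonemic_tone(syllable, traditional_tone):
    """Collapse the traditional nine-tone numbering to six phonemic tones."""
    if traditional_tone in ENTERING_TO_BASE:
        if not syllable.endswith(STOP_CODAS):
            raise ValueError("entering tones need a final p, t or k")
        return ENTERING_TO_BASE[traditional_tone]
    if 1 <= traditional_tone <= 6:
        return traditional_tone
    raise ValueError("tone must be between 1 and 9")


print(phonemic_tone("sik", 7))  # 1
print(phonemic_tone("si", 1))   # 1
```

On this view the final stop, not the pitch, is what sets the entering tones apart, which is why they add nothing to the phonemic count.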
Besides Standard Cantonese (the Guangzhou lect in the Yuehai Group), there is Siyi, or Sze Yup, a separate language. Siyi has 8 dialects; however, there are reports of intelligibility problems among the Siyi lects.
In particular, Enping speakers cannot understand some other dialects. Therefore, Enping is a separate language.
Kaiping, or Chikan, is not fully intelligible with Enping until speakers get used to each other’s sounds. Kaiping is so different from Taishan that it is hard to imagine how they can communicate well, though there is partial intelligibility.
In Xinhui, there is a dialect called Hetang that is very divergent and has many strange features not found in other dialects. Doubtless it is less than fully intelligible with other Siyi lects.
Actually, there seem to be many more than 8 dialects of Siyi. In Taishan County alone, there are 20 townships, and there may be a different lect in each one. For certain, there are at least three distinct dialects of Taishan: Taishan A, Taishan B and Taishan C. Even the lects in Taishan County can be quite different. However, all lects in Taishan County appear to be mutually intelligible.
Xinhui is somewhat different from Taishan, but appears to be intelligible. Heshan is said to be intelligible with Xinhui and Taishan.
Nevertheless, there are calls from Taishan speakers to split their lect off from the rest of Siyi. If Taishanese is unintelligible with the rest of Siyi, this would make sense, but that does not appear to be the case.
150 years ago, there was less, but still significant, difference between Siyi and Sanyi (Standard Cantonese), but Siyi was disparaged as a “hill dialect” of poor farmers, while Sanyi was elevated as the prestige lect of the cultured and cosmopolitan. This is why Sanyi became the Standard Cantonese lect. Siyi speakers incorporated this negative view into their self-image, even to the point where they held their overseas meetings in Sanyi speech.
There are 3.6 million speakers of Siyi.
Vietnamese Cantonese is quite different from Standard Cantonese, but it is nevertheless intelligible with it. Malay Cantonese is also quite different from Standard Cantonese. Intelligibility data between Malay Cantonese and Standard Cantonese is not known. Both are dialects of Cantonese.
Hong Kong is a dialect of Guangzhou. Foshan and Nanhai are close to Guangzhou and may be intelligible with it. Nanhai and Shunde are mutually intelligible.
Some say that Shunde and Zhongshan are intelligible with Standard Cantonese, but others disagree. This requires further study, as they are obviously close. However, both are also said to be quite different from Standard Cantonese.
Even within Yuehai, Panyu is said to be a separate language (Chan 1981).
Namlong, a poorly understood lect from the Pearl River area, is also a separate language, or at least it was one in 1949. Whether it still exists is not certain, but speakers must still be alive. Yuehai itself has 31 separate lects.
Danija, the Cantonese lect of the Tanka fisherpeople who live on boats off the coast of Guangdong, Guangxi and Hainan, may well be a separate language.
In Hong Kong, another Cantonese language, Gashiau, is spoken by a group of fisherpeople related to the Danija. This language is related to Danija but apparently not intelligible with it.
Maihua, a Cantonese lect spoken on Hainan, may well be a separate language also.
Nanning is a dialect of Cantonese, easily understandable by a Standard Cantonese speaker.
However, Lizhou is a separate language, with difficult intelligibility with Standard Cantonese.
Dongguan and Zhanjiang (evidence) are separate languages.
Shiqi, spoken in Guangdong, is a separate language. Speakers of Standard Cantonese cannot necessarily understand Shiqi, but Shiqi people can understand Guangzhou. Shiqi is spoken in the urban part of Zhongshan City.
Huazhou is a very divergent Cantonese lect that is very hard even for other Cantonese speakers to understand. It is surely a separate language (evidence here and here).
Maoming is an extremely diverse Cantonese lect that must also be a separate language.
Beihai and Hepu are reported to be very different, but intelligibility data is not known, nor is it known to what extent these two lects differ from other Cantonese.
But the Qinlian Group of which they are members must surely be a separate language.
One division holds that the Standard Cantonese (Guangzhou), Siyi, Zhongshan, Gaoyang and Guangfu groups are mutually unintelligible groups.
The Goulou Group of Cantonese appears to be a separate language from the rest of Cantonese. It probably belongs in a group of its own, apart from the rest of Cantonese and linked with Pinghua and Tuhua. Yulin is a representative lect in Goulou, and is said to be the present-day form of Chinese that is closest to Old Chinese.
Siyi has at least 11 dialects, including the famous Taishanese (includes Taishan A, Taishan B and Taishan C), along with Heshan, Jiangmen, Siqian, Doumen, Xinhui, Enping and Kaiping.
Nanning is in the Yongxun Group of Cantonese, which has 12 lects.
Zhanjiang and Maoming are members of the Gaoyang Group of Cantonese, which has 10 lects. Gaoyang has 5.4 million speakers.
Dongguan, Shunde, Foshan, Zhongshan, Nanhai, Panyu and Hong Kong are members of the Guangfu Group of Cantonese, which has 31 lects. Guangfu has 13 million speakers.
Shiqi is a member of the Zhongshan Group of Cantonese, which contains at least 3 lects.
Huazhou is a member of the Wuhua Group of Cantonese, which has 2 lects.
Beihai and Hepu are members of the Qinlian Group of Cantonese, which has 6 lects.
Namlong is unclassified.
There are 100 lects of Cantonese, and Cantonese has 64 million speakers.
Pinghua, now recognized as a major split off from Cantonese, is composed of Guinan and Guibei, which are separate languages. The Guibei lects are very different, but we don’t have any intelligibility data.
Guinan has 22 lects, and Guibei has 8 lects.
There is one Pinghua lect that is unclassified.
Pinghua has 31 separate lects. Ping has 2 million speakers.
Tuhua is a separate branch of Chinese spoken in Guangdong and Hunan Provinces. It has 26 separate lects.
In addition to Tuhua Proper, the best known of the Tuhua lects is Shaozhou, referred to here as Shaozhou Proper. Shaozhou is said to be very different from other Chinese lects. Shaozhou itself consists of many different lects which are often strikingly different from the others. Some say that Shaozhou is a branch of Min Nan, while others say it is related to Hakka.
In Lechang prefecture, there are five separate languages, Lechang Tuhua 1, Lechang Tuhua 2, Lechang Tuhua 3, Lechang Tuhua 4 and Lechang Tuhua 5, which are not fully intelligible with each other.
Additionally, many Tuhua lects have recently begun to splinter, as influences from Hakka, Cantonese and Southwest Mandarin affect the younger speakers, such that the language of the youngest speakers is quite a bit different from that of the older speakers.
One of the Shaozhou Tuhua lects, Longgui Tuhua, spoken in Qujiang County in Guangdong, is a separate language. Longgui Tuhua has 2,000 speakers.
Actually, Tuhua is not really a language group, but a wastebasket group for various lects derisively referred to as “tuhua” – or “farmer’s language.”
Xianghua, said to be an unclassified Chinese lect, is actually a branch of Tuhua that contains 6 lects of its own. Xianghua is a completely separate and highly diverse language that is spoken in Western Hunan.
Jiahe Tuhua is a completely separate language, unintelligible with other lects. Furthermore, there are huge dialectal differences within Jiahe Tuhua that may or may not constitute separate languages.
Jiangyong Tuhua is divided into two mutually unintelligible languages, North Jiangyong Tuhua and South Jiangyong Tuhua (Liming 2004). It is spoken in the rural areas of Jiangyong County in Hunan Province. There are multiple lects within these two languages, which have considerable distance between them.
A subdialect of North Jiangyong Tuhua, the suburban or “upper street language” dialect, was the basis for the famous nüshu, or “women’s script”, a secret language of women that originated in the Shangjiangxu (Xiao River) region of northeastern Jiangyong County in Hunan Province and about which much has been written lately.
Also in Hunan, in Guiyang County, another Tuhua language is spoken – Guiyang Tuhua. This is apparently a separate language, and the northern and southern variants are so divergent that they are separate languages also – Northern Guiyang Tuhua and Southern Guiyang Tuhua. In addition, there are a lot of diverse dialects within the two Guiyang Tuhua languages, but intelligibility data is lacking.
Yantang Tuhua, one of these dialects, may well be a separate language, as may Yangshi Tuhua. Jiangyong and Guiyang are in the Tuhua branch of Tuhua. Yantang and Yangshi are unclassified.
Furthermore, initial examination suggests a number of things.
First of all, the Tuhua lects, especially those of Southern Hunan, are very diverse, possibly as diverse as Wu, Xiang and Hui. Many or all of them may well be separate languages. Further, they are poorly studied. There are many dialects inside the known Tuhua lects, and these dialects are often very different, so there appear to be languages inside even the known Tuhua lects.
Further, there appear to be links among the Tuhua lects of Southern Hunan, the Tuhua lects of northern Guangdong and the Ping lects of northern Guangxi, which border each other. They all appear to be related, and to have descended from a common ancestor.
Danzhou is a separate language. Danzhou is spoken in the northwest of Hainan, and Hainanese speakers cannot understand it. It is either related to the language spoken by the Lingao or is the same language. Yet the Danzhou people speak 9 different lects, including lects described as Hakka, others described as Cantonese and others described as Mandarin.
Maojiahua is a form of Chinese spoken by 20,000 Hmong in the southwest of Hunan, in the northeast of Guangxi and in some areas of Hubei. It is a separate language already recognized by Ethnologue, which incorrectly lumps it in with the Hmong languages.
Linghua is an unclassified Chinese lect spoken in Yongzhou in Hunan. Linghua is a separate language. It is apparently the same as the Yongzhou Tuhua dialect.
However, the Yongzhou Tuhua language has 17 different dialects: Yongzhou Tuhua A, Yongzhou Tuhua B, Yongzhou Tuhua C, Yongzhou Tuhua D, Lanshan Tushi Tuhua, Lanjiaoshan Tuhua, Xintian Southern Rural Tuhua, Xintian Northern Rural Tuhua, Ningyuan Zhangjia Tuhua, Ningyuan Pinghua, Lanshan Shangdong Tuhua, Lanshan Taiping Tuhua, Daoxian Xianglinpu Tuhua, Daoxian Xiaojia Tuhua, Shuangpai Lijiaping Tuhua and Jianghua Baimangying Tuhua.
Of these, Lanshan Tushi Tuhua may well be a separate language.
Intelligibility between lects is not known, but dialectal divergence within Tuhua lects is typically great, and some or all of the above may be separate languages.
Pingde Yahua or Kim Mun, incorrectly classed as an unclassified Chinese lect, is actually one of the Mien languages. It is not a Sinitic language.
Wutun, or Wutunhua, is a Chinese-Mongolian-Tibetan mixed language spoken by 2,000 Tu in Qinghai Province. Whether it is a form of Chinese is controversial. Until it is proven to be Sinitic, we will not list it here.


Ben Hamed, Mahé. 2005. Neighbour-nets Portray the Chinese Dialect Continuum and the Linguistic Legacy of China’s Demic History. Proc. R. Soc. B 272:1015–1022.
Bodman, Nicholas C. 1988. Two Divergent Southern Min Dialects of the Sanxiang District, Zhongshan, Guangdong. BIHP 59 (2): 401-423.
Branner, David. 2000. Problems in Comparative Chinese Dialectology. The Classification of Miin and Hakka. Berlin: Walter de Gruyter.
Branner, David. 2008. Personal communication.
Campbell, Hilary. 2004. Chinese Grammar – Synchronic and Diachronic Perspectives. Oxford, UK: Oxford University Press.
Campbell, James Michael. Putonghua and Taiwanese Min Nan speaker. Taipei, Taiwan. January 2009. Personal communication.
Campbell, James Michael. Putonghua and Taiwanese Min Nan speaker. Taipei, Taiwan. April 2009. Personal communication.
曹志耘 (Cao, Zhiyun). 2002. 南部吴语语音研究 (Southern Wu Phonology Research). Beijing: Commercial Press (In Chinese).
Chan, Marjorie K.M., Lee, Douglas W. 1981. Chinatown Chinese: A Linguistic and Historical Re-evaluation. Amerasia Journal, Volume 8, Number 1.
Cheng, Chin-Chuan. 1997. Measuring Relationship Among Dialects: DOC and Related Resources. Computational Linguistics & Chinese Language Processing 2.1:41-72.
Cheng, Chin-Chuan. 1998. Extra-Linguistic Data for Understanding Dialect Mutual Intelligibility. Taipei, Taiwan: Paper delivered at the 1998 Annual Conference of the Pacific Neighborhood Consortium.
Gilliland, Joshua. 2006. Language Attitudes and Ideologies In Shanghai, China. MA Thesis. Columbus, OH: Ohio State University.
Hirata, Shoji. 1998. Aspect: A General System and its Manifestation in Mandarin Chinese. Taipei: Student Book Company.
Johnson, Eric. 2010. SIL Electronic Survey Reports 2010-027. A Sociolinguistic Introduction to the Central Taic languages of Wenshan Prefecture, China. Dallas, Texas: SIL.
Lee, Kent A. 2002. Chinese Tone Sandhi and Prosody. MA Thesis. Urbana, IL: University of Illinois at Urbana-Champaign.
Lien, Chinfa. August 17-19, 1998. Denasalization, Vocalic Nasalization and Related Issues in Southern Min: A Dialectal and Comparative Perspective. International Symposium on Linguistic Change and the Chinese Dialects Dedicated to the Memory of the Late Professor Li Fang-kuei in Seattle Washington.
Liming, Zhao. The Women’s Script of Jiangyong: An Invention of Chinese, Chapter 4. In Tao, Jie, Zheng, Bijun, Mow, Shirley L., editors. 2004. Holding Up Half the Sky: Chinese Women Past, Present, and Future. New York: Feminist Press at the City University of New York.
Mair, Victor H. 1991. What Is a Chinese ‘Dialect/Topolect’? Sino-Platonic Papers 29.
McKeown, Adam. 2001. Chinese Migrant Networks and Cultural Change: Peru, Chicago, Hawaii, 1900-1936. Chicago, IL: University of Chicago Press.
Ngù, George. Eastern Min speaker. 2009. Personal communication.
Olson, James Stuart. 1998. An Ethnohistorical Dictionary of China. Westport, CT: Greenwood Publishing Group.
Rickard, Kristine. 2006. A Linguistic-phonetic Description of Lanqi Citation Tones. Proceedings of the 11th Australian International Conference on Speech Science & Technology, pp. 349-353. Edited by Paul Warren & Catherine I. Watson. University of Auckland, New Zealand. December 6-8, 2006. Auckland, NZ: Australian Speech Science & Technology Association Inc.
Szeto, Cecilia. 2000. Testing intelligibility among Sinitic dialects. Proceedings of ALS2K, the 2000 Conference of the Australian Linguistic Society.
Thurgood, Graham. 2006. Sociolinguistics and Contact-induced Language Change: Hainan Cham, Anong, and Phan Rang Cham. Tenth International Conference on Austronesian Linguistics, 17-20 January 2006, Palawan, Philippines. Linguistic Society of the Philippines and SIL International.
Xun, Gong. Sichuan Mandarin and Putonghua speaker. Deyang, Sichuan, China. Personal communication. September 2009.
Zheng, Rongbin. 2008. The Zhongxian Min Dialect: A Preliminary Study of Language Contact and Stratum-Formation, pp. 517-526. Edited by Chan, Marjorie K.M. and Kang, Hana. Proceedings of the 20th North American Conference on Chinese Linguistics (NACCL-20). Volume 1. Columbus, Ohio: The Ohio State University.



98 thoughts on “A Reworking of Chinese Language Classification”

  1. Recently on Facebook a number of us discussed which Hokkien and Teochew dialects we understand best, next best, worst, next worst, etc. I recall that of the 3 or 4 of us who speak “General Taiwan” or “Haiteng” (a district of Ciangciu very close to Amoy), all of us agreed that the Coanciu City dialect was no easier to understand than Teochew, or at least Swatow-type Teochew.
    Regarding Manjiang, known natively as “Mango” or something like that — there is very little data on it floating around. I saw some data once and it doesn’t “look” like a Sinitic language, no more than Vietnamese does…

  2. I find this exercise intellectually stimulating, but otherwise not all that practical. The central criterion used, mutual intelligibility, is changing all the time. It is a fact that language standardization, displacement, and loss are ongoing, especially in China. Hundreds of years of illiterate, uneducated villagers living out their entire lives within 30 miles of their village have created tons of mutually unintelligible languages in China. But such isolation – the engine of new language creation – has come to an end. Today, mass communication, standardized education, and mobile populations have completely changed the landscape of mutual intelligibility within China. It won’t be long before Chinese speak just one of 7-8 major languages, and then just one of 2-3 major languages, and eventually just one language.
    It’s worthwhile to keep track of major languages, e.g. Cantonese, Hakka, etc., because they have a shot at surviving for a few generations, at the minimum, and therefore being relevant to the future. But the exercise of trying to track down whether a sub-dialect within a sub-dialect is, in fact, a language, is futile. In 1-2 generations, it’s going to be gone, before you even gather the money & resources needed to test its mutual intelligibility with other sub-dialects.

  3. True, as in most of the rest of the world. I would not say the loss is “ongoing” in China, though. It really only started about 30-35 years ago.
    Given the socio-political environment, we could reasonably expect all ethnic Chinese to be speaking Mandarin in a century’s time. However, the situation on the ground with Cantonese defies such expectations at this point. The youngest generation of Mandarin speakers in the Cantonese megalopolis is actually shifting TO Cantonese during their school years in spite of instruction being mostly in Mandarin. Some kind of “soft power play” is in effect. There’s an outside chance that Mandophone China will switch to English — say, in 150 years — while Cantophone China continues to speak Cantonese. I say from direct observation on the ground in the Mega Cantopolis.
