An autodidact named Mike Campbell has issued a long critique of my Chinese language classification.
There are problems with his analysis.
First of all, Campbell says we need to defer to the Chinese on what is a dialect and what is a language. But top Sinologists in the West are saying that Chinese linguists are falling down on the job and are not working according to the modern scientific definition of what is a language and what is a dialect.
Chinese linguists, like practitioners of Chinese medicine, operate within a completely different framework, one that is largely at odds with the one used in the West and in much of the rest of the world.
One element of this framework is the fangyan. The term has many meanings, but it is usually translated as “dialect” or, better yet, “topolect.” It also tends to mean the speech form of a given county. But the Chinese definition of “dialect” differs radically from the one used by linguists elsewhere in the world. For one thing, intelligibility with other lects plays no part in the definition of fangyan.
Chinese linguists also use the term hua, which means something like “speech.” It tends to be broader than fangyan, but it can also be applied at the level of a single dialect. Examples include Putonghua, Shanghaihua and Beijinghua, but also Pinghua and Tuhua. Hua is usually geographically based: it is the speech of a particular place, though that place may be very large or very small. Putonghua is the exception; the name simply means “common speech,” and it is spoken all over China.
The third category is yu. Yu is probably the category that Western linguists would most readily associate with “language” or even “language family.” Within Chinese, yu refers to the major divisions, such as Wuyu, Minyu and Huiyu. For languages outside Chinese, the word wen also tends to be used.
No one seems to know exactly what the Chinese classification is at any given time.
According to Campbell, we must not do anything until the Chinese act first, but they recognize a new language perhaps once every few years, and they are failing even at that.
Campbell states that Scots and Bavarian are dialects, not languages: Scots is a dialect of English and Bavarian is a dialect of German. However, Ethnologue lists both Scots and Bavarian as separate languages. Intelligibility between Bavarian and Standard German is only 40%. I lack figures for Scots, but its intelligibility with English is clearly lower than 90%.
Ethnologue is run by SIL, which has been given the task of assigning all new ISO 639-3 language codes. An ISO code means that a lect has been officially recognized by the world linguistic community as a separate language. So SIL are the linguistic scientists whom the world community has given the task of deciding what is a language and what is not. In effect, Campbell is saying that SIL does not know what it is talking about.
Campbell states that mutual intelligibility cannot be determined by talking to speakers and simply asking them whether or not they can understand “those people over there.”
He says that this method is inaccurate and that the only way to determine intelligibility is through scientific testing that measures percentage similarity in phonology, lexicon, morphology, syntax and so on. He also says that tonal differences are irrelevant in Chinese because they do not impede communication, but I would beg to differ on that. Chinese speakers have told me that closely related lects with very different tones can be very difficult to understand, at least at first.
Ethnologue’s Mexico page reports extensive testing of lects spoken in small villages to determine intelligibility between one lect and another. Intelligibility testing is commonly done by simply sitting a speaker of Lect A down in front of a recorded corpus of Lect B and seeing how much of it they can understand.
Campbell says that intelligibility testing on human informants is inherently flawed because as speakers of Lect A hear more and more of a closely related Lect B, they come to understand it better over time (the exposure factor). This is the problem of interdialectal learning.
Interdialectal learning is the tendency of speakers of closely related lects to hear each other’s speech, quickly learn to speak it and so muddy the waters of intelligibility. Campbell trumpets it as a reason that intelligibility testing cannot be done on human informants, but SIL regards it as distinct from inherent intelligibility. Inherent intelligibility is best regarded as a test of the ability to understand using the mother tongue alone.
In other words, when two lects are said to be “inherently unintelligible,” this appears to refer to “virgin” speakers who have not yet had the opportunity to learn each other’s lects.
Similarly, speakers of Lect A may simply be bilingual in Lect B, which also invalidates intelligibility testing. However, methods have already been developed to detect bilingualism and measure its degree. A favorite one is SLOPE; SRT is also used in bilingualism testing. Like other intelligibility testing instruments, these have been tested for reliability and validity over the years.
Further, testing has evolved to the point where we can begin to separate bilingualism from inherent intelligibility. Casad (1974) describes testing done on speakers of Mazatec, a Mexican Indian language.
Intelligibility testing was done to see how well they understood Huautla, a related language. Three female speakers scored in the 50-60% range, and three males scored in the 90-100% range. Huautla is a local market language that many non-Huautla speakers in the surrounding area learn as a second language. I would guess that the 50-60% scores represent true inherent intelligibility and the 90-100% scores represent practiced bilinguals.
At any rate, the survey averaged these figures together, so that Mazatec speakers were credited with 76% intelligibility of Huautla, and Mazatec and Huautla were classified as separate languages.
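Averaging a split group like this can hide exactly the pattern described above. A minimal Python sketch, using hypothetical scores picked only to fall inside the reported 50-60% and 90-100% ranges (they are not Casad’s actual data), shows the pooled mean landing between the two subgroups and matching neither:

```python
from statistics import mean

# HYPOTHETICAL scores (percent), chosen only to sit inside the reported ranges;
# they are not the actual figures from Casad (1974).
female_scores = [52.0, 55.0, 58.0]  # reported range: 50-60%
male_scores = [92.0, 95.0, 98.0]    # reported range: 90-100%


def summarize(groups: dict[str, list[float]]) -> dict[str, float]:
    """Return the mean score of each subgroup plus the pooled mean of all scores."""
    summary = {name: mean(scores) for name, scores in groups.items()}
    pooled = [score for scores in groups.values() for score in scores]
    summary["pooled"] = mean(pooled)
    return summary


result = summarize({"female": female_scores, "male": male_scores})
for name, value in result.items():
    print(f"{name}: {value:.1f}%")
# female: 55.0%
# male: 95.0%
# pooled: 75.0%
```

The pooled figure here is a made-up illustration; the point is only that one averaged number says nothing about whether it came from a uniform group or from two very different ones.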
Campbell also throws out a red herring: the notion that certain members of a group may simply refuse to hear the language of another group and insist that they do not understand it. Although this problem exists, it has little relevance to intelligibility testing, because SIL tests cross-sections of communities.
Furthermore, SIL notes that intelligibility is typically distributed evenly across a community with regard to sex, class and age.
The standard deviations (SDs) for inherent intelligibility within a community are narrow, less than 15%, whereas the SDs for bilingualism are much higher. This is because members of a community differ in their bilingualism: some feel a strong need to learn the other language, while others feel no need at all. Members also differ in their access to opportunities to learn the other language, even when they wish to learn it.
This should dispose of the notion that women, the young or the old, or the wealthy or the poor will automatically give us false data on intelligibility.
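The standard-deviation point lends itself to a quick screen. Here is a rough Python sketch that takes the 15% figure above as a cutoff; the function name, the use of a population SD and the hard threshold are my own assumptions, not an SIL procedure. A community whose scores spread more widely than that may contain bilinguals mixed in with “virgin” listeners, though a wide spread can have other causes too.

```python
from statistics import pstdev

# Cutoff taken from the text: SDs for inherent intelligibility are under 15.
# Treating it as a hard threshold is an assumption made for illustration.
INHERENT_SD_LIMIT = 15.0


def spread_suggests_bilingualism(scores: list[float], limit: float = INHERENT_SD_LIMIT) -> bool:
    """Return True if the population SD of the scores meets or exceeds the limit.

    A narrow spread fits inherent intelligibility; a wide one is a hint
    (not proof) that some listeners are bilingual in the test lect.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to measure spread")
    return pstdev(scores) >= limit


narrow = [58.0, 60.0, 62.0, 63.0, 65.0]       # hypothetical, tightly clustered
split = [52.0, 55.0, 58.0, 92.0, 95.0, 98.0]  # hypothetical, two clusters
print(spread_suggests_bilingualism(narrow))  # False (SD about 2.4)
print(spread_suggests_bilingualism(split))   # True (SD about 20.1)
```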
Campbell hints that intelligibility is poorly defined. However, SIL has set out a hierarchy of intelligibility. Intelligibility below 70% is “unintelligible,” and intelligibility over 90% is “adequately intelligible” (which usually matches our idea of a dialect). Intelligibility between those two figures is what SIL calls “marginally intelligible.” Lately, SIL has been classifying most lects with under 90% intelligibility as separate languages.
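The bands can be written as a small lookup. This Python sketch is a hypothetical helper, not an SIL tool. Since the text says only “below 70%” and “over 90%,” it assumes a score of exactly 70 counts as marginal and exactly 90 as adequate, and the 90% split mirrors the rule of thumb described here rather than any formal algorithm.

```python
def intelligibility_band(score: float) -> str:
    """Map a percent intelligibility score to the SIL bands described in the text.

    Boundary handling is an assumption: 70 is treated as marginal, 90 as adequate.
    """
    if not 0.0 <= score <= 100.0:
        raise ValueError("score must be a percentage from 0 to 100")
    if score < 70.0:
        return "unintelligible"
    if score < 90.0:
        return "marginally intelligible"
    return "adequately intelligible"


def likely_separate_language(score: float, cutoff: float = 90.0) -> bool:
    """Rule of thumb from the text: under 90% usually means a separate language."""
    return score < cutoff


# 40% is the Bavarian/German figure and 76% the Mazatec/Huautla average cited in this post.
for score in (40.0, 76.0, 95.0):
    print(score, intelligibility_band(score), likely_separate_language(score))
```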
Campbell recommends throwing out all intelligibility testing with informants as inherently inaccurate and focusing instead on measures of language similarity.
However, SIL notes that linguistic similarity is not an adequate predictor of intelligibility on its own. For instance, testing in the Philippines revealed pairs of lects with vocabulary similarity of 52, 66, 72 and 74% that had over 90% intelligibility (that is, they were inherently intelligible). Over 80% vocabulary similarity for lect pairs resulted in several cases of inherent intelligibility. So lexical similarity alone is not an adequate measure of intelligibility.
In testing of Polynesian, Siouan and Buang, it was found that, up to a certain point, the higher the lexical similarity, the lower the intelligibility scores. This is counterintuitive, but it shows once again that lexical similarity is a poor measure.
Morris Swadesh was the founder of lexicostatistics, the study of lexical similarity. Lexicostatistics has its uses, but distinguishing between closely related languages and dialects is apparently not one of them.
This myth seems to be dying a hard death. Robert Longacre and Sarah Gudschinsky had long debates with Swadesh about the validity of lexical similarity measures, and they seem to have been proven right. The latest findings put the odds at 4.5 to 1 that any study using lexical similarity alone to determine the intelligibility of lects will fail to do so reliably.
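Reading the 4.5-1 figure as odds in favor of failure (my interpretation of the phrasing), the implied probability that a lexical-similarity-only study fails is:

```latex
P(\text{fail}) = \frac{4.5}{4.5 + 1} = \frac{4.5}{5.5} \approx 0.82
```

In other words, roughly four such studies in five would fail.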
Word lists still have their uses. Where word lists show similarities between lects below 60%, odds are that we are dealing with two separate languages, and there is no need to do any further intelligibility testing. And they have obvious uses in historical linguistics and in determining genetic relationships between languages.
Vocabulary similarity below 67%, however, typically goes with intelligibility estimates below 60%, and intelligibility below 60% is inadequate for all but the very simplest communication. Before even slightly complex or revealing messages can be conveyed, intelligibility usually needs to be over 85%. Casad found that 90% intelligibility on a narrative test was necessary before one could move on to more complex kinds of communication. Here, once again, we are in the territory of dialects.
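The word-list rule of thumb can be sketched as a screening step. This Python version is a simplification under stated assumptions: each word-list item is reduced to a yes/no similarity judgment made beforehand, and the function names are hypothetical. Below the 60% floor the lects are treated as separate languages; at or above it, similarity cannot settle the question, so the pair goes on to intelligibility testing.

```python
def lexical_similarity(similar_items: list[bool]) -> float:
    """Percent of compared word-list items judged similar.

    Assumption: similarity judgments were made beforehand, one per item.
    """
    if not similar_items:
        raise ValueError("need at least one compared item")
    return 100.0 * sum(similar_items) / len(similar_items)


def screen_pair(similar_items: list[bool], floor: float = 60.0) -> str:
    """Apply the rule of thumb: under the floor, no further testing is needed."""
    similarity = lexical_similarity(similar_items)
    if similarity < floor:
        return f"{similarity:.0f}% similar: likely separate languages"
    return f"{similarity:.0f}% similar: run intelligibility tests"


# Hypothetical 100-item word lists.
print(screen_pair([True] * 55 + [False] * 45))  # 55% similar: likely separate languages
print(screen_pair([True] * 72 + [False] * 28))  # 72% similar: run intelligibility tests
```

The second branch is deliberately non-committal: as the Philippine results show, a 72% pair may still turn out to be inherently intelligible.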
Intelligibility is usually asymmetrical. In other words, speakers of Lect A may understand 80% of Lect B, while speakers of Lect B understand only 70% of Lect A. The reasons are debated, but one suggestion is that the higher figure results from some sort of bilingual learning.
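Because of this asymmetry, a single “mutual” figure can hide a lot. A minimal Python sketch, built around a hypothetical record type, keeps both directions and the gap between them, using the 80% and 70% example from the text:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairScores:
    """Intelligibility of a lect pair measured in both directions (percent)."""
    a_understands_b: float
    b_understands_a: float

    @property
    def gap(self) -> float:
        """Absolute difference between the two directions."""
        return abs(self.a_understands_b - self.b_understands_a)

    @property
    def higher_direction(self) -> str:
        """Which direction scored higher (possibly the one boosted by bilingual learning)."""
        if self.a_understands_b == self.b_understands_a:
            return "symmetrical"
        if self.a_understands_b > self.b_understands_a:
            return "A understands B better"
        return "B understands A better"


pair = PairScores(a_understands_b=80.0, b_understands_a=70.0)
print(pair.gap, pair.higher_direction)  # 10.0 A understands B better
```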
Campbell also points out that people speaking the same language often cannot understand each other. He asks how often we have heard a fellow English speaker of our own dialect say something and failed to catch it for one reason or another. The implication is that all testing with informants must be thrown out for this reason.
SIL has actually examined this. Surveys often include a “home-town” test, in which people hear a narrative in their own dialect and receive an intelligibility score for it. It is true that this score is sometimes lower than 100%, but it is typically not much lower. Using the home-town scores of Lects A and B as controls in the analysis then helps greatly when moving on to actual intelligibility between Lect A and Lect B.
One approach is to throw out all sentences or questions that score less than 100% on the home-town test. If speakers cannot understand these items well when their own people speak them, we cannot measure how well they understand them when speakers of other lects speak them.
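The home-town filter is simple to express. In this Python sketch the item IDs and scores are hypothetical, and each score is assumed to be the percentage of listeners who got that item right. Items that home-town listeners did not fully understand are dropped before the cross-lect score is computed.

```python
def usable_items(hometown_scores: dict[str, float]) -> set[str]:
    """Keep only items that home-town listeners understood 100% of the time."""
    return {item for item, score in hometown_scores.items() if score >= 100.0}


def cross_lect_score(cross_scores: dict[str, float], keep: set[str]) -> float:
    """Mean cross-lect score over the retained items only."""
    kept = [cross_scores[item] for item in keep if item in cross_scores]
    if not kept:
        raise ValueError("no usable items left after filtering")
    return sum(kept) / len(kept)


# Hypothetical data: item q3 was only 80% understood even by home-town listeners.
hometown = {"q1": 100.0, "q2": 100.0, "q3": 80.0, "q4": 100.0}
cross = {"q1": 70.0, "q2": 80.0, "q3": 20.0, "q4": 90.0}

keep = usable_items(hometown)
print(sorted(keep))                   # ['q1', 'q2', 'q4']
print(cross_lect_score(cross, keep))  # 80.0
```

Without the filter, the mean over all four items would be 65.0, dragged down by an item that even its own speakers found hard.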
Campbell suggests that there are no tests available to use on human informants that pass the smell test of empiricism. This is not the case.
One test, the Sentence Repetition Test (SRT), has been used for decades, has been the subject of many papers and studies, and has been criticized and modified in many ways.
In the case of the SRT, testing group members individually has been shown to be superior to testing them as a group. The reason is that in a group of, say, eight people, you can run into a strong personality or a high-ranking male who claims to understand much more than he really does, possibly to show off. The less dominant members then follow his lead and give falsely high readings on the intelligibility test.
Many linguists, led by SIL, have been at the forefront of intelligibility testing for decades now. Some of the top figures in this subfield are the married couple Joseph and Barbara Grimes of SIL. Joseph Grimes is a retired linguistics professor from Cornell.
In addition, a number of computer programs have been created that help the researcher to test intelligibility.
Another charge, that intelligibility testing lacks adequate controls, has also been shown to be false. Bias in both experimenter and subject is a known problem, as it is in most science, and measures have been developed to deal with it.
The notion that intelligibility testing, a subfield of linguistics, is unscientific should be laid to rest.
Ethnologue seems to place tremendous importance on mutual intelligibility, however defined, and assumes that mutually unintelligible lects are separate languages. Its criterion for splitting dialects off into separate languages seems to be 90%: below 90%, separate languages; above 90%, dialects of a single language.
In conclusion, Mr. Campbell’s principal contentions in his critique are all incorrect.
First, he suggests that the very concept of mutual intelligibility between lects is impossible to define or prove. SIL has shown that the concept can be defined and tested by reliable instruments.
Second, he says that the use of human informants in mutual intelligibility testing is so prone to error that it cannot give satisfactory results. This is not the case. Through decades of testing, SIL has shown that mutual intelligibility testing is best done, and possibly can only be reliably done, with human informants.
Third, he throws up a number of red herrings that supposedly prove the inherent unreliability of human informants in intelligibility testing. All of them turn out to be just that. It is true that unrecognized bilingualism is a problem, but it can often be ferreted out.
Fourth, he says that the only way to reliably test for intelligibility is to compare lects via tones, phonology, morphology, syntax and lexicon. This is an extremely complicated process utilizing math and computer programs and can only be undertaken by practiced linguists. In truth, such elaborate testing, while interesting, is entirely unnecessary.
Fifth, he suggests that any Western reformulations of Chinese language classification need to first defer to the Chinese. The problem here is that the Chinese have completely fallen down on the job. We cannot defer to the Chinese without upsetting our entire system of language classification. The Chinese are entitled to their system, but it is at odds with that used by the rest of the world.
Casad, Eugene H. 1974. Dialect Intelligibility Testing. Summer Institute of Linguistics Publications in Linguistics and Related Fields, 38. Norman, OK: Summer Institute of Linguistics of the University of Oklahoma.
Casad, Eugene H. 1992. “State of the Art: Dialect Survey Fifteen Years Later.” In Eugene H. Casad (ed.), Windows on Bilingualism, 147-58. Summer Institute of Linguistics and the University of Texas at Arlington Publications in Linguistics, 110. Dallas, TX: Summer Institute of Linguistics and the University of Texas at Arlington.
Grimes, Barbara F. 1992. “Notes on Oral Proficiency Testing (SLOPE).” In Eugene H. Casad (ed.), Windows on Bilingualism, 53-60. Summer Institute of Linguistics and the University of Texas at Arlington Publications in Linguistics, 110. Dallas, TX: Summer Institute of Linguistics and the University of Texas at Arlington.
Grimes, Joseph E. 1992. “Calibrating Sentence Repetition Tests.” In Eugene H. Casad (ed.), Windows on Bilingualism, 73-85. Summer Institute of Linguistics and the University of Texas at Arlington Publications in Linguistics, 110. Dallas, TX: Summer Institute of Linguistics and the University of Texas at Arlington.
Grimes, Joseph E. 1992. “Correlations Between Vocabulary Similarity and Intelligibility.” In Eugene H. Casad (ed.), Windows on Bilingualism, 17-32. Summer Institute of Linguistics and the University of Texas at Arlington Publications in Linguistics, 110. Dallas, TX: Summer Institute of Linguistics and the University of Texas at Arlington.