Problems of Onomatopoeia in Historical Linguistics

The forms in the first part of this paper are from a paper by Geoffrey Kimball referenced at the end. He examined the relationship between the Muskogean languages and the isolates Tunica, Natchez, Atakapa, and Chitimacha in a proposed Amerindian family of the Southeastern US called Gulf. He concluded that although the languages were probably related, there was no way to prove it was so.

Granted, there are problems in the relationship, that’s for sure. But I went through a paper by Pam Munro looking at the relationship between Yuki-Wappo and Gulf and found the relationship convincing. The Gulf languages seemed obviously related; either that or massive borrowing had occurred.

People who oppose long-range language proposals like the above like to harp on a few caveats that get in the way of showing language relationship. For instance, they throw out all onomatopoetic words – words that are based on the sound of something. In animal names, this often refers to the sound of an animal.

Buzz for insects, slam or punch for a fist pounding, chirp for bird noises, meow for cat sounds, on and on. All of these are supposed to be rejected because people make up words based on the sound something makes. This gets in the way of proving relationship because we can always argue that these words are not cognates; instead, they are just words based on the typical sounds an animal makes.

However, I would say, “Not so fast now!” Follow me.

The paper is a scan, so I can’t cut and paste from it, but if you look at pp. 33-34, you’ll see that the neighboring Siouan language Quapaw has shikkokkoke for “robin.”

Then in Gulf, we have


Tunica           wishkoku
Proto-Muskogean  tsiskoko/kwiskoko
Natchez          mishkokwa

Quapaw (Siouan)  shikkokkoke

Kimball says onomatopoeia, and then he says heavy borrowing. Also, as a birdwatcher, I am unconvinced that words for “robin” always end up looking like this due to the sound the bird makes. I don’t buy it. Our word is “robin.” That doesn’t look like any of the words above.

I’m curious why any of these tribes would have to borrow a word for “robin” which has been found in their area since the tribes were founded. Animal names are usually borrowed when a group moves into a new territory with new animals with no names for them.

Onomatopoeia – fine, but why the same phoneme sequence over and over? There must be 100 onomatopoeic ways to describe this bird.

Kimball says this term is obviously widely borrowed.

More likely: A Gulf term, widely disseminated in Gulf and even reconstructed all the way back to Proto-Muskogean. That’s an old word! Remnants remain in Tunica and Natchez Gulf languages. Quite possibly borrowed by the Siouan Quapaw, who were migrants to the area. In this case, it would have been borrowed into Quapaw from Tunica because they neighbored each other.

It’s found in a number of Gulf languages and in only one Siouan language. In theory, the Gulf languages could all have borrowed the Siouan word for “robin,” but that seems unlikely. The Siouans were migrants themselves, from Ohio. The Gulf languages probably came from Mexico, but robins winter there, so the bird would not have been new to them. Why would all of those Gulf languages independently borrow a Siouan word? Majority rules. If a form is widespread in one family and present in only one language of a neighboring family, the form was most likely borrowed by the single language from the family where it is widespread.
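To make the “majority rules” idea concrete, here is a minimal Python sketch. The function name likely_source_family and the data layout are my own inventions for illustration, and counting languages per family is only a crude stand-in for the real argument. The forms are the “robin” words listed above, with the first Proto-Muskogean variant chosen arbitrarily.

```python
from collections import Counter

def likely_source_family(attestations):
    """Return the family in which the form is attested in the most languages.

    `attestations` is a list of (family, language, form) tuples.
    Returns None for an empty list or when the top two families tie.
    This is a hypothetical helper, not an established method.
    """
    counts = Counter(family for family, _language, _form in attestations)
    ranked = counts.most_common()  # sorted by count, highest first
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # a tie: the heuristic has nothing to say
    return ranked[0][0]

# The "robin" forms from the list above.
robin = [
    ("Gulf", "Tunica", "wishkoku"),
    ("Gulf", "Proto-Muskogean", "tsiskoko"),
    ("Gulf", "Natchez", "mishkokwa"),
    ("Siouan", "Quapaw", "shikkokkoke"),
]

print(likely_source_family(robin))
```

With three Gulf attestations against one Siouan attestation, the sketch points to Gulf as the likely source and Quapaw as the borrower. It says nothing about whether the Gulf forms themselves are inherited or borrowed around inside Gulf.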

Also notice that this word in all of these languages has the same set of phonemes. Why would this set of phonemes be necessary to describe this bird? Our word robin doesn’t sound like any of those words. I am a birdwatcher but I’m not aware that that word represents the sound of a robin. The word has the same set of phonemes in different languages because it’s not onomatopoetic, that’s why! So it’s either genetic or widely borrowed.
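One rough way to put a number on “the same set of phonemes” is plain string similarity. The sketch below uses Python’s standard difflib module on the spellings given above. Treating spelling similarity as a proxy for phonemic similarity is my assumption, and a crude one: it cannot tell cognates from loans, and it knows nothing about sound correspondences.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Similarity ratio between two spellings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()

# The "robin" forms from the list above (first Proto-Muskogean variant).
forms = {
    "Tunica": "wishkoku",
    "Proto-Muskogean": "tsiskoko",
    "Natchez": "mishkokwa",
    "Quapaw": "shikkokkoke",
}

# Every pair of the four forms compared with each other.
within = [similarity(a, b) for a, b in combinations(forms.values(), 2)]

# Each form compared with the English word.
against_english = [similarity(form, "robin") for form in forms.values()]

print("lowest similarity among the four forms:", round(min(within), 2))
print("highest similarity to English 'robin':", round(max(against_english), 2))
```

Even the least similar pair among the four forms scores well above the best match with English “robin,” which is the point: nothing about the bird forces this particular shape on the word.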

Granted, maybe it was borrowed around in Gulf, but maybe not. I don’t know how to tease that apart.

Next we have two “woodpecker” words:

"Pileated woodpecker"

Tunica           pahpahka-na 
Proto-Muskogean  kwakakwa
Natchez          papaku-shil

"Redheaded woodpecker"

Tunica           chuchuhi-na
Proto-Muskogean  chaxchah-ka
Natchez          tsawtsa

Once again, oddly enough, these words are ancient, going all the way back to Proto-Muskogean. Once again, they show up in both Tunica and Natchez. How odd that these same three languages always get affected by these words.

Why would these three languages be more likely to borrow words from each other than any of the other languages?

Onomatopoeia is brought up again, but why would each of the two woodpeckers get its own word, with nearly the same phonemes in all three languages? Woodpeckers don’t sound all that different from each other. I’m a birdwatcher.

Tunica and Natchez were not adjacent, so it’s hard to see how there could be borrowing between them. Also, Proto-Muskogean was spoken in Mexico! The Proto-Muskogeans moved from Mexico to an area around Tennessee. We don’t know where the homeland of the Gulf languages was. It was possibly in Mexico too. Possibly Proto-Gulf was spoken in Mexico, and it migrated to the Southeastern US. Neither of these woodpeckers is found in Mexico, so both would have been new to the Gulf migrants.

If the entire Gulf family moved into Southeastern Louisiana, the two woodpeckers above might have been new to them. Proto-Gulf could very well have coined those two terms for the different woodpeckers because they had never seen them before. The words then filtered down through the years into the present-day languages.

The conclusion here is that the feint to onomatopoeia by anti-long rangers is a potential dodge, and just because a disseminated word is onomatopoetic, that doesn’t mean that all of those languages made it up based on the sound of the object. If the words for an animal always take the same phonemic shape, this tends to argue against onomatopoeia because you would think different groups would make up different words for objects that make the same sound. Why would they all make up the exact same word with the exact same phonemes? It doesn’t fly.

As an example, see this segment copied from an article by John Bengtson (an acquaintance) in Mother Tongue November-December 1989.


J.D. Bengtson

Soon after I began actively comparing the languages of the world some three decades ago I noticed a recurring phonetic pattern in words for ‘butterfly’ all around the world. They all involved syllables with a labial (usually /p/) followed by a vowel and a liquid resonant (usually /r/ or /l/). The syllables were often repeated or reduplicated with partial or full reduplication. The collection of these words grew until it made up what is now Table 1.

Table 1: Words for ‘butterfly’ containing labials and liquids

Indo-European: *pāpili- ~ *pīpili- >
Italic: Latin pāpiliō (pāpilion-) ‘butterfly’ > French papillon, papillot ‘butterfly, leaflet’, pavillon ‘tent, pavilion’ (> Engl. pavilion), Venetian paveğa, Tyrolian pavel, Friulian paveye, Provençal pabalho, Catalan papalhó, Calabrian parpaggyune, etc.
Germanic: Old High German fîfaltra (> German falter, [dial. fifalter, pfeipfalter, etc.] Yiddish flaterl); Old Saxon vîvoldara, Dutch vlinder; Old English fīfealde; Icelandic fifrildi ~ fiðrildi, Norwegian fivreld(e), Swedish fjäril (dial. fjörald, fervel, fjärafalla, etc.) ‘butterfly’, etc. [Italian farfalla < Germanic: cf. Swed. dial. fjärafalla]

Semitic: Hebrew parpār, Aramaic furfr- ‘butterfly’

Kartvelian: Georgian p’ep’el-, Mingrelian parpal(ia)-, papralia, Laz parpal-, Svan p’ärp’old, p’ärp’and  ‘butterfly’

Basque: pinpirin ~ pinpilin ‘butterfly’
Caucasian: Udi päpäläk ~ Udi (Nidzh) pampaluk, Andi pirinpa ‘butterfly’; Abkhaz a-parpal’ ‘moth’

Dravidian: Kui pipili ‘moth’, Kodagu pa:pÈli ‘butterfly, moth’; Kurukh paplā, Naiki pipuli, Parji pilpili, Gondi pīplī, pīprī, Kuwi pubuli ‘butterfly’

Austronesian: Tagalog papaló ~ paparó ~ parú-paró ‘butterfly’

Trans-New Guinea: Kare purupuru, Bunabun piropir ‘butterfly’
Andamanese: Önge bebele, Aka-Bale pomÃlÃ, Aka-Bea pQmilÃ-dÃ, Aka-Puchikwar and Aka-Bo bQmilÃ-dÃ, Aka-Kol bÃmilà ‘butterfly’

Hokan: Tequistlatec pápalo ‘butterfly’
Uto-Aztecan: Aztec: Zacapoaxtla paapaaloo-t, Tetelcingo pöpölu-tl; Hopi pó:voli ‘butterfly’
Andean: Quechuan *pimpilitu, *pil¨pintu ‘butterfly’

It should immediately be noted that the words in Table 1 are not necessarily all the words of this type in all the world’s languages, only those that have come to my attention since I began collecting them some thirty years ago. What are we to make of these very similar words for ‘butterfly’, found in diverse areas of the world? Most historical linguists would probably dismiss the similarities, attributing them to independent and recent origins.

For example, R.L. Trask (1997: 296), remarking on the Basque word pimpirina ‘butterfly’ and others: “This impressive collection of regional terms can hardly represent anything of any great antiquity; most of these terms appear to be strongly phonaesthetic in motivation.” (Italics added.) Trask (1997: 258) defines a phonaesthetic word as “one which has apparently been coined out of thin air purely because of its appealing sound.”

While such words as referred to by Trask may exist, “coining out of thin air” can hardly be applied to the words for ‘butterfly’ listed above. Why indeed would Europeans, Asians, Pacific peoples, and Native Americans independently arrive at almost the same shapes for these words? With further analysis, we find these words can be subdivided into the following types:

Table 2: Simple reduplication:

Hebrew  p  a  r     p  ā  r
Tagalog p  a  r  ú  p  a  r  ó
Kare    p  u  r  u  p  u  r  u
Bunabun p  i  r  o  p  i  r

Here the syllable type PVR(V) is simply reduplicated. This syllable closely resembles the form of a global etymology meaning ‘to fly’, which Merritt Ruhlen and I gave the approximate phonetic shape of PAR (Bengtson & Ruhlen 1994, pp. 317-318).
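(An aside that is not part of Bengtson’s text: the PVR(V) reduplication template can be written as a regular expression. The Python sketch below is only an illustration. It hardcodes P as p and R as r, which is narrower than the “labial plus liquid” description, and it works on simplified spellings with accents and hyphens stripped. The names PVR_REDUP, simplify and is_simple_reduplication are made up for this example.)

```python
import re
import unicodedata

# Two PVR(V) syllables in a row: p + vowel + r + optional vowel, twice.
PVR_REDUP = re.compile(r"^(p[aeiou]r[aeiou]?){2}$")

def simplify(form):
    """Lowercase the form and drop accents, hyphens and other non-letters."""
    decomposed = unicodedata.normalize("NFD", form.lower())
    return "".join(
        ch for ch in decomposed
        if ch.isalpha() and not unicodedata.combining(ch)
    )

def is_simple_reduplication(form):
    """True if the simplified form fits the PVR(V)PVR(V) template."""
    return PVR_REDUP.match(simplify(form)) is not None

# Table 2 forms (Hebrew given as plain "parpar"), plus two forms from
# the other tables for contrast.
for word in ["parpar", "parú-paró", "purupuru", "piropir", "pinpirin", "papaló"]:
    print(word, is_simple_reduplication(word))
```

The four Table 2 forms fit the template, while Basque pinpirin and Tagalog papaló do not, which matches the way the tables divide the data.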

Table 3: Reduplication with apophony or dissimilation:

Basque      p  i  n  p  i  r - in 
Udi (Nidzh) p  a  m  p  a  l - uk 
Abkhaz     -p  a  r  p  a  l - ‘ 
Quechua     p  i  m  p  i  l - itu ~ 
            p  i  l¨ p  i  n - tu

In these words simple reduplication (as in Table 2) has been altered either by apophony [1] (alternation of r ~ l ~ n) or by dissimilation (substitution of similar sounds, here other resonants r ~ l ~ n > m, as in English “pilgrim”, ultimately from Latin peregrinus). Note, for example, the similar results in widely separated Basque and Quechua.

Table 4: Partial reduplication (type PùPVLV):

Latin    p  ā  p  i  l  i - ion-
Georgian p’ e  p’ e  l-
Udi      p  ä  p  ä  l  ä - k
Kodagu   p  a: p  È  l  i
Tagalog  p  a  p  a  l  ó
Önge     b  e  b  e  l  e
Hopi     p  ó: v  o  l  i

[1] For more on apophony (consonantal ablaut) see Wescott (1974, 1998), Bengtson (1998).

In these examples, which I find the most interesting of all, we find the common elements of:

  • initial labial stop [p], voiced [b] in Andamanese;
  • first vowel, sometimes stressed and/or long [ā, a:, ó:];
  • medial labial stop [p] (the Hopi change of *p > v is parallel to the change from Latin pāpilion- > French pavillon); [b] in Andamanese;
  • a second vowel (with some variation [i ~ È ~ e ~ o]);
  • a third consonant – always lateral [l];
  • the original final (thematic) vowel, in three of the languages [i].

In the face of these closely parallel common elements, I find independent coinage extremely unlikely, borrowing between these diverse languages just as unlikely, so we are left with one viable explanation: a very ancient common origin of these words for ‘butterfly’.

But if so, how do we account for the amazing similarity after what could be 50,000 years or more? I propose that the reason they have been preserved almost intact in widely separated areas is due to the preservative effect of phonosymbolism. Phonosymbolism, which symbolizes an action or state of being, is not the same as onomatopoeia, which imitates it.

As explained by Frederic G. Cassidy (1985):

One may guess that to keep the ‘same’ bases from spreading apart phonologically (as speakers spread apart geographically) to the point where all plausible or obvious similarity is lost, there must be some restraining forces at work – and one of these would be phonosymbolism. … So, phonosymbolism would perhaps exert a centripetal force holding basic forms together despite their having lost geographic contact.

We actually have historic documentation of this preservative effect in the case of the French doublets papillon ‘butterfly’ and pavillon ‘tent’ (both from Latin pāpilion-). In the former case phonosymbolism (PVPVL symbolizing the silent flapping of wings) has acted to preserve the medial [p], contrary to regular sound change, while the latter word (pavillon), with the secondary meaning of tent, has evaded the preservative effect and changed [p] to [v] in the regular manner (cf. savon ‘soap’ < Lat. sapone-, etc.).

But phonosymbolism did not manage to keep all the eventual words similar: we saw in Hopi pó:voli that the regular sound change (intervocalic *p > v) was not impeded by phonosymbolism. And note that many other radical phonetic changes have taken place, especially in Germanic!

In conclusion, I propose that all or most of the words for ‘butterfly’ listed in Table 1 are extremely ancient, and most likely traceable to Proto-Human. Also present in Proto-Human were at least two mechanisms for the creation of phonosymbolic words:

(a) simple reduplication of the type PVR(V)PVR(V), as shown in Table 2, from which the variants in Table 3 can be derived; and

(b) partial reduplication of the type PùPVLV (with r ~ l apophony: see Bengtson 1998), as shown in Table 4.

Subsequent to the initial creation of these words, phonosymbolism continued to exert a centripetal or preservative force, keeping the words similar after geographic dispersal.


  1. Here are some Dutch verbs that refer to the noises that animals make. Let’s see how many you can guess.

    1 – balken
    2 – hinniken
    3 – loeien (oe is pronounced oo)
    4 – brullen
    5 – knorren
    6 – piepen
    7 – blaten
    8 – blaffen
    9 – kakelen
    10 – krijsen
    11 – grommen
    12 – zoemen
    13 – koeren
    14 – mekkeren
    15 – kraaien
    16 – keffen
    17 – janken
    18 – huilen
    19 – klokken
    20 – snateren

  2. A great word. Most people wouldn’t even know it means having the ability to mimic animal sounds. I sure as hell didn’t know it until my late 20s.

