What causes relative frequency of consonants?
So, can you point me to some research on what causes the relative frequency of consonants in various languages?
The fact that vowels are more common than consonants is obviously caused by phonotactics, but I don't see a simple explanation for the fact that some consonants appear to be far more frequent than others. For much of my research, I simply assumed that most of it is caused by syntax, but, evidently, syntax plays only a minor role. As I've explained on this web-page, it's relatively easy to measure the effect syntax has on the relative frequency of consonants.
To summarize the relevant part of the web-page: I wrote a simple program in C (source code is available on the web-page) that randomly picks two consonants from a text file a million times and counts how many times the two consonants are the same. If you run it on a long English text, it prints that the probability of picking the same consonant twice in a row is 1/11, and that the most common consonant is t (presumably because of words like "the" and "that"). However, if you run it on an English word-list for a spell-checker, that probability drops to 1/13, and the most common consonant is r (probably because of the common English prefix re- and the common English suffix -er). Similarly, for a long Croatian text the probability is 1/13, and for a Croatian word-list it is 1/14 (in both cases the most common consonant is n, probably because ne- and na- are very common prefixes in Croatian word formation). For a long German text the probability is 1/12, and for a German spell-checker word-list it is 1/15. In both cases, the most common consonant is n, and I can't really guess why.
So, as you can see from the above data, while syntax indeed plays some role in the relative frequency of consonants, that's not all there is to it. To what extent is the rest of the effect caused by phonology, and to what extent is it caused by morphology?
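The measurement described above can be sketched roughly as follows. This is not the program from the linked web-page, whose source is not reproduced here; it is a minimal illustration under stated assumptions. It treats every ASCII letter other than a, e, i, o, u as a consonant (so it counts letters, not sounds), draws the two consonants independently and with replacement, and uses a small xorshift generator so the result does not depend on the platform's `rand()`. All function names are hypothetical. Under these assumptions the sampled value should approach the sum of the squared relative frequencies, which the sketch also computes directly.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define LETTERS 26

/* Assumption: a "consonant" is an ASCII letter other than a, e, i, o, u.
   This looks at spelling only and ignores non-ASCII letters. */
static int is_consonant(int c) {
    c = tolower((unsigned char)c);
    return c >= 'a' && c <= 'z' && strchr("aeiou", c) == NULL;
}

/* Tally consonant letters; counts[i] holds the occurrences of 'a' + i.
   Returns the total number of consonants seen. */
size_t count_consonants(const char *text, size_t counts[LETTERS]) {
    assert(text != NULL && counts != NULL);
    size_t total = 0;
    memset(counts, 0, LETTERS * sizeof counts[0]);
    for (const char *p = text; *p != '\0'; ++p) {
        if (is_consonant((unsigned char)*p)) {
            counts[tolower((unsigned char)*p) - 'a']++;
            total++;
        }
    }
    return total;
}

/* Exact chance that two independent draws give the same consonant:
   the sum of the squared relative frequencies. */
double exact_match_probability(const size_t counts[LETTERS], size_t total) {
    if (total == 0) return 0.0;
    double sum = 0.0;
    for (int i = 0; i < LETTERS; ++i) {
        double p = (double)counts[i] / (double)total;
        sum += p * p;
    }
    return sum;
}

/* Most frequent consonant letter, or '\0' if there are none. */
char most_common_consonant(const size_t counts[LETTERS]) {
    int best = -1;
    for (int i = 0; i < LETTERS; ++i)
        if (counts[i] > 0 && (best < 0 || counts[i] > counts[best])) best = i;
    return best < 0 ? '\0' : (char)('a' + best);
}

/* Small xorshift64 generator; the state must be nonzero. */
static uint64_t next_random(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Monte Carlo estimate of the same quantity: draw two consonants at random
   (with replacement) `trials` times and report the share of matching pairs.
   Returns -1.0 if memory cannot be allocated. */
double sampled_match_probability(const char *text, size_t trials, uint64_t seed) {
    assert(text != NULL);
    size_t len = strlen(text), n = 0;
    char *pool = malloc(len + 1);
    if (pool == NULL) return -1.0;
    for (size_t i = 0; i < len; ++i)
        if (is_consonant((unsigned char)text[i]))
            pool[n++] = (char)tolower((unsigned char)text[i]);
    if (n == 0 || trials == 0) {
        free(pool);
        return 0.0;
    }
    uint64_t state = seed ? seed : 1;
    size_t matches = 0;
    for (size_t t = 0; t < trials; ++t) {
        char a = pool[(next_random(&state) >> 16) % n];
        char b = pool[(next_random(&state) >> 16) % n];
        if (a == b) matches++;
    }
    free(pool);
    return (double)matches / (double)trials;
}
```

To use it, read a corpus or word-list into a string, call `count_consonants` on it, and compare `exact_match_probability` and `most_common_consonant` between the running text and the word-list. The exact sum makes the random sampling optional; the sampler is included only because it mirrors the procedure described in the question.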
computational-linguistics linguistic-typology
2
For consonants, English spelling is different enough from English phonology that I think you won't get very accurate results by looking at letter frequencies. The word "the" doesn't contain the consonant sound /t/, but rather the consonant sound /ð/.
– sumelic
Apr 14 at 11:50
1
Unfortunately, I don't see anything about syntax on the page that you linked to, or on other posts by you that I looked at. Can you clarify what you mean by "it's relatively easy to measure the effect syntax has on relative frequency of consonants"?
– sumelic
Apr 14 at 12:00
I mean, it can be measured by comparing the relative frequencies of consonants in texts versus in word-lists. I thought I was clear enough.
– FlatAssembler
Apr 14 at 12:01
1
Oh, I see, it's the second-to-last paragraph.
– sumelic
Apr 14 at 12:03
2
Even completely ignoring possible biological (different simplicity in speaking or hearing different sounds) or linguistic (e.g., historical development) effects, we would probably expect something close to a Mandelbrot distribution (i.e., the nth most common consonant occurring with roughly 1/n the frequency of the most common, or: rank times frequency roughly constant) ...
– Hagen von Eitzen
Apr 14 at 18:55
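The "rank times frequency roughly constant" expectation in the comment above is easy to check against a table of counts. The following is a minimal sketch, assuming the consonant counts have already been collected into an array; the function name `rank_times_frequency` is hypothetical. It sorts the counts from most to least frequent and reports rank times relative frequency for each, so a distribution close to 1/n shows up as a column of nearly equal numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator: larger counts first. */
static int by_count_descending(const void *a, const void *b) {
    size_t x = *(const size_t *)a;
    size_t y = *(const size_t *)b;
    return (x < y) - (x > y);
}

/* Sorts `counts` in place from most to least frequent and writes
   rank * relative frequency into products[0..used-1], with ranks starting
   at 1. Entries with a zero count are skipped. Returns `used`. */
size_t rank_times_frequency(size_t *counts, size_t n, double *products) {
    assert(counts != NULL && products != NULL);
    size_t total = 0;
    for (size_t i = 0; i < n; ++i) total += counts[i];
    qsort(counts, n, sizeof counts[0], by_count_descending);
    size_t used = 0;
    for (size_t i = 0; i < n && counts[i] > 0; ++i) {
        double frequency = (double)counts[i] / (double)total;
        products[i] = (double)(i + 1) * frequency;
        used++;
    }
    return used;
}
```

For counts of 12, 6, 4 and 3 (an idealized 1/n series) every product comes out at about 0.48; how far real consonant counts depart from a flat column is what would need explaining.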
asked Apr 14 at 11:43
FlatAssembler
2 Answers
Frequency of a thing can be in terms of all languages or a single language; it can be in terms of yes/no existence or in terms of actual use; if the latter, it has to be relative to some defined corpus. As an example, [ʕ] is a zero-frequency consonant in English, and a low-frequency consonant in human language. I won't venture a guess about its frequency in Arabic, but it is not the least-frequent consonant of Arabic (Classical, at least). [t], on the other hand, is very high frequency in and across languages. There is a vague concept out there of "markedness" that is invoked to encapsulate differing frequency of attestations, whereby it is said that [ʕ] is "marked" relative to [t].
Two factors that have the greatest influence on frequency of attestation are (a) intrinsic phonetic properties and (b) historical precedent. Ejectives are extremely low frequency in Indo-European languages because the proto-language lacked ejectives (I ignore the claim to the contrary), and [p] is low frequency in modern Arabic dialects because Classical Arabic didn't have [p]. However, English has [f] while PIE did not, so languages do develop new sounds.
Factor (a), intrinsic properties, is hard to explain or even establish satisfactorily. A popular idea applied to crosslinguistically low-frequency consonants is that they are "hard to pronounce"; the problem is that this can't be directly measured in an objective way, and seems to reflect the struggle that people have when trying to pronounce a sound that is not in their own language (it's hard). Phonetically based dispreference is the result of aerodynamic, acoustic and articulatory factors. However, it is also hard to separate (a) from (b): I don't find [ʕ] hard in any sense, but it is not part of my native language and I often elide the consonant when pronouncing words of Arabic (names) in an English discourse. It's possible that, through massive social change, Arabic could influence English and we would nativize some words containing [ʕ], so that the consonant could properly become a part of the English consonant inventory where it was not one historically. This happened in the case of some Bantu languages of Southern Africa, which adopted click sounds from neighboring Khoisan languages, thus increasing the attestation frequency of clicks.
A final consideration for you is that spelling and pronunciation are different, so that the letter t is both [t] and a spelling component of [θ].
The question of possible influence of syntax, phonology or morphology on consonant frequency depends on what frequency you are speaking of. W.r.t. crosslinguistic frequency, the effect is zero. Token frequency within a language can be influenced by syntax, phonology or morphology, and yes/no frequency can be influenced by phonology (people often say that the lack of [ʕ] in English is a fact encoded in the phonological grammar of English). There is no general way to know in advance what the influence of syntax, phonology or morphology is on token frequency, because you don't know if a language is going to have rules deleting g in some context, or turning /k/ into [g] in some context, either of which will influence token frequency. Syntax and morphology can influence token frequency in case e.g. /k/ figures in widely-used affixes, but again not every language has a ubiquitous affix /s/ or /d/, which increases the token frequency of these sounds in English. Post hoc, you can compute the percentage of tokens that are attributable to some affix or syntagmeme, but there's no predictive power apart from general predictions about samples from a different corpus.
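The post hoc computation mentioned above can be approximated on spelling alone. The following is a rough sketch, not a morphological analysis: it assumes that any word ending in the given string carries the suffix (so English "her" would count as having -er), and it works on letters rather than sounds. The function name `suffix_share` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fraction of all occurrences of `letter` in `words` that fall inside a
   word-final `suffix`. Purely orthographic: a word "has" the suffix if it
   ends in that string. Returns 0.0 if the letter never occurs. */
double suffix_share(const char *const *words, size_t n_words,
                    const char *suffix, char letter) {
    assert(words != NULL && suffix != NULL);
    size_t suffix_len = strlen(suffix);
    size_t total = 0, in_suffix = 0;
    for (size_t w = 0; w < n_words; ++w) {
        const char *word = words[w];
        size_t len = strlen(word);
        int has_suffix = suffix_len > 0 && len >= suffix_len &&
                         strcmp(word + len - suffix_len, suffix) == 0;
        /* Index where the suffix begins, or len if the word lacks it. */
        size_t suffix_start = has_suffix ? len - suffix_len : len;
        for (size_t i = 0; i < len; ++i) {
            if (word[i] != letter) continue;
            total++;
            if (i >= suffix_start) in_suffix++;
        }
    }
    return total == 0 ? 0.0 : (double)in_suffix / (double)total;
}
```

For the toy list "reader", "river", "red", "tree" with suffix "er" and letter 'r', two of the six r's sit in the suffix, so the share is one third. As the answer says, such a figure describes the corpus it was computed on and does not predict what another language will do.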
I'm not quite clear on ejectives. Doesn't e.g. "appointment" eventually have an ejective p' as much as the Anlaut starts with a glottal stop? Any unvoiced p'losive has to close the vocal tract at some point, too, so it's partially ejective. You might say "... did/does not recognize" at any rate.
– vectory
Apr 14 at 19:55
Ejective doesn't just mean "close the vocal tract", it refers to oral and glottal complete closure and raising the larynx – which we don't do in English. In Navaho, Amharic, Sotho, Salishan, sure, not in English.
– user6726
Apr 14 at 20:16
Of course you have to, even if it's pulmonic, because if the larynx is closed, pressure from the lungs will push up the larynx. It's not obligatory to close the larynx between a and p in "appointment", and maybe I'm just imagining now that I would, but it stands to reason on grounds of efficiency, because the palate and larynx move together, reflexively, if the palate has to close the nasal air stream to save breath; otherwise p would be nasalized.
– vectory
Apr 14 at 20:22
Following your argument, I guess the frequency of n in German is pushed up by the indefinite articles "ein-", case inflections and regular verb inflections in "-(e)n", and the plural marker "-en". The common mnemonic for the most frequent letters in German is "ERNSTL", famous from the Wheel of Fortune game shows, ordered for ease of pronunciation of the mnemonic. One should wonder why words that are favored by the syntax contain those consonants.
Looking at the Wikipedia article for liquid consonants, we see it claimed that they are very frequent. Down the page we see that the term originally described "the sonorant consonants (/l, r, m, n/) of classical Greek"--three of those matching our "ERNSTL". Mind that, while German "R" is nominally a trill, it is often produced as a mere approximant, or plainly elided. There's a lot to say about that, and about relations between "d" and "n" ("d" is just a non-nasalized "n"), that escapes me at the moment.
We discriminate mainly two points of articulation, front and back, respectively the tip of the tongue and whatever your local accent prefers (uvular for me, rhotic for many Americans). "ng" is a velar nasal on the other hand, so somewhere in the middle, but we still hear most of it like a dental or alveolar; those in turn are even represented with the same IPA sign; many other IPA symbols are reminiscent of n, too.
The heart of the matter that I'm getting at is that those consonants are preferred where the least effort is expended in speech. Similarly, written speech optimizes for ease of writing and represents several sounds with the same symbol. Only if trying to be precise--talking clearly, or writing phonetically--will the difference be highlighted. However, we nevertheless hear or see the difference if we expect it, even if it's hardly even there or only hinted at by context.
I'm not sure what that implies for the development of a language. It obviously has not converged to just two different consonants.
Note that [m], another nasal, is one of the earliest learned sounds of a child (and one researcher figured that was helped by the most basic lip action a baby gets, sucking on the mammaries). Note variants like "nana", "anna", etc., whereas [p] is learned rather late. This alone implies levels of difficulty.
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "312"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2flinguistics.stackexchange.com%2fquestions%2f31164%2fwhat-causes-relative-frequency-of-consonants%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
Frequency of a thing can be in terms of all languages or a single language; it can be in terms of yes/no existence or in terms of actual use; if the latter it has to be relative to some defined corpus. As an example, [ʕ] is a zero-frequency consonant in English, and a low-frequency consonant in human language. I won't venture a guess about its frequency in Arabic, but it is not the least-frequent consonant of Arabic (Classical, at least). [t] on the other hard is very high frequency in and across languages. There is a vague concept out there of "markedness" that is invoked to encapsulate differing frequency of attestations, whereby it is said that [ʕ] is "marked" relative to [t].
Two factors that have the greatest influence on frequency of attestation are (a) intrinsic phonetic properties and (b) historical precedent. Ejectives are extremely low frequency in Indo-European languages because the proto-language lacked ejectives (I ignore the claim to the contrary), and [p] is low frequency in modern Arabic dialects because Classical Arabic didn't have [p]. However, English has [f] while PIE did not, so languages do develop new sounts.
Factor (a), intrinsic properties, is hard to explain or even establish satisfactorily. A popular idea applied to crosslinguistically low-frequency consonants is that they are "hard to pronounce"; the problem is that this can't be directly measured in an objective way, and seems to reflect the struggle that people have when trying to pronounce a sound that is not in their own language (it's hard). Phonetically based dispreference is the result of aerodynamic, acoustic and articulatory factors. However, it is also hard to separate (a) from (b), that is, I don't find [ʕ] hard in any sense, but it is not part of my native language and I often elide the consonant when pronouncing words of Arabic (names) in an English discourse. It's possible that through massive social change that Arabic could influence English and we would nativize some words containing [ʕ], thus that consonant could be properly a part of the English consonant inventory where it was not one historically. This happened in the case of some Bantu languages of Southern Africa, which adopted click sounds from neighboring Khoisan languages, thus increasing the attestation frequency of clicks.
A final consideration for you is that spelling and pronunciation are different, so that the letter t is both [t] and a spelling component of [θ].
The question of possible influence of syntax, phonology or morphology on consonant frequency depends on what frequency you are speaking of. W.r.t. crosslinguistic frequency, the effect is zero. Token frequency within a language can be influenced by syntax, phonology or morphology, and yes/no frequency can be influenced by phonology (people often say that the lack of [ʕ] in English is a fact encoded in the phonological grammar of English). There is no general way to know in advance what the influence of syntax, phonology or morphology is on token frequency, because you don't know if a language is going to have rules deleting g in some context, or turning /k/ into [g] in some context, either of which will influence token frequency. Syntax and morphology can influence token frequency in case e.g. /k/ is figures in widely-used affixes, but again not every language has a ubiquitous affix /s/ or /d/ which increases the token frequency of these sounds in English. Post hoc, you can compute the percentage of tokens that are attributable to some affix or syntagmeme, but there's no predictive power apart from general predictions about samples from a different corpus.
I'm not quite clear on ejectives. Doesn't e.g. "appointment" eventually have an ejective p' as much as the Anlaut starts with a glottal stop? Any unvoiced p'losive has to close the vocal tract at some point, too, so it's partially ejective. You might say "... did/does not recognize" at any rate.
– vectory
Apr 14 at 19:55
Ejective doesn't just mean "close the vocal tract", it refers to oral and glottal complete closure and raising the laryx – which we don't do in English. In Navaho, Amharic, Sotho, Salishan, sure, not in English.
– user6726
Apr 14 at 20:16
Of course you have do, even if it's pulmonic, because if the larynx is closed, pressure from the lungs will push up the larynx. It's not obligatory to close the larynx between a and p in "appointment", and maybe I'm just imagining now that I would, but it stands to reason on grounds of efficiency, because the palate and larynx move together, reflexively, if the palate has to close the nasal air stream to save breath; otherwise p would be nasalized.
– vectory
Apr 14 at 20:22
Frequency of a thing can be in terms of all languages or a single language; it can be in terms of yes/no existence or in terms of actual use; if the latter, it has to be relative to some defined corpus. As an example, [ʕ] is a zero-frequency consonant in English, and a low-frequency consonant in human language. I won't venture a guess about its frequency in Arabic, but it is not the least-frequent consonant of Arabic (Classical, at least). [t], on the other hand, is very high frequency in and across languages. There is a vague concept out there of "markedness" that is invoked to encapsulate differing frequency of attestations, whereby it is said that [ʕ] is "marked" relative to [t].
Two factors that have the greatest influence on frequency of attestation are (a) intrinsic phonetic properties and (b) historical precedent. Ejectives are extremely low frequency in Indo-European languages because the proto-language lacked ejectives (I ignore the claim to the contrary), and [p] is low frequency in modern Arabic dialects because Classical Arabic didn't have [p]. However, English has [f] while PIE did not, so languages do develop new sounds.
Factor (a), intrinsic properties, is hard to explain or even establish satisfactorily. A popular idea applied to crosslinguistically low-frequency consonants is that they are "hard to pronounce"; the problem is that this can't be directly measured in an objective way, and seems to reflect the struggle that people have when trying to pronounce a sound that is not in their own language (it's hard). Phonetically based dispreference is the result of aerodynamic, acoustic and articulatory factors. However, it is also hard to separate (a) from (b): I don't find [ʕ] hard in any sense, but it is not part of my native language and I often elide the consonant when pronouncing words of Arabic (names) in an English discourse. It's possible that, through massive social change, Arabic could influence English and we would nativize some words containing [ʕ], so that the consonant could properly become part of the English consonant inventory where it was not one historically. This happened in the case of some Bantu languages of Southern Africa, which adopted click sounds from neighboring Khoisan languages, thus increasing the attestation frequency of clicks.
A final consideration for you is that spelling and pronunciation are different, so that the letter t is both [t] and a spelling component of [θ].
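As a rough illustration of why letter counts and sound counts diverge, the sketch below counts both over the same short word sequence. The mini-lexicon and its broad transcriptions are assumptions made up for the example, not data from any real pronouncing dictionary.

```python
from collections import Counter

# Hypothetical mini-lexicon: spelling -> broad phonemic transcription,
# one list element per phoneme. Invented for illustration.
lexicon = {
    "the": ["ð", "ə"],
    "thin": ["θ", "ɪ", "n"],
    "tin": ["t", "ɪ", "n"],
    "that": ["ð", "æ", "t"],
}
text = ["the", "thin", "tin", "that", "the"]

# Count written letters and transcribed phonemes over the same tokens.
letter_counts = Counter(ch for word in text for ch in word)
phoneme_counts = Counter(ph for word in text for ph in lexicon[word])

print("letter t:", letter_counts["t"])
print("phoneme /t/:", phoneme_counts["t"])
print("phoneme /ð/:", phoneme_counts["ð"])
print("phoneme /θ/:", phoneme_counts["θ"])
```

In this toy sample the letter t turns up three times as often as the sound /t/, because most of its occurrences belong to the digraph "th".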
edited Apr 14 at 16:14
answered Apr 14 at 12:53
user6726
Following your argument, I guess the frequency of "n" in German is pushed up by indefinite articles "ein-", case inflections and regular verb inflections in "-(e)n", and the plural marker "-en". The common mnemonic for the most frequent letters in German is "ERNSTL", famously used in the Wheel of Fortune game shows and ordered for ease of pronunciation of the mnemonic. One should wonder why words that are favored by the syntax contain those consonants.
Looking at the Wikipedia article for liquid consonants, we see it claimed that they are very frequent. Down the page we see that the term originally described "the sonorant consonants (/l, r, m, n/) of classical Greek", three of those matching our "ERNSTL". Mind that, while German "R" is nominally a trill, it is often produced as a mere approximant, or plainly elided. There's a lot to say about that, and about relations between "d" and "n" ("d" is just a non-nasalized "n"), that escapes me at the moment.
We discriminate mainly two points of articulation, front and back, respectively the tip of the tongue and whatever your local accent prefers (uvular for me, rhotic for many Americans). "ng" is a velar nasal, on the other hand, so somewhere in the middle, but we still hear most of it like a dental or alveolar; those in turn are even represented with the same IPA sign, and many other IPA symbols are reminiscent of n, too.
The heart of the matter that I'm getting at is that those consonants are preferred where the least effort is expended in speech. Similarly, written speech optimizes for ease of writing and represents several sounds with the same symbol. Only if trying to be precise (talking clearly, or writing phonetically) will the difference be highlighted. However, we nevertheless hear or see the difference if we expect it, even if it's hardly there or only hinted at by context.
I'm not sure what that implies for the development of a language. It obviously has not converged to just two different consonants.
Note that [m], another nasal, is one of the earliest learned sounds of a child (and one researcher figured that was helped by the most basic lip action a baby gets, sucking on the mammaries). Note variants like "nana", "anna", etc., whereas [p] is learned rather late. This alone implies levels of difficulty.
answered Apr 14 at 20:53
vectory
2
For consonants, English spelling is different enough from English phonology that I think you won't get very accurate results by looking at letter frequencies. The word "the" doesn't contain the consonant sound /t/, but rather the consonant sound /ð/.
– sumelic
Apr 14 at 11:50
1
Unfortunately, I don't see anything about syntax on the page that you linked to, or on other posts by you that I looked at. Can you clarify what you mean by "it's relatively easy to measure the effect syntax has on relative frequency of consonants"?
– sumelic
Apr 14 at 12:00
I mean, it can be measured by comparing the relative frequencies of consonants in texts versus in word-lists. I thought I was clear enough.
– FlatAssembler
Apr 14 at 12:01
1
Oh, I see, it's the second-to-last paragraph.
– sumelic
Apr 14 at 12:03
2
Even completely ignoring possible biological (different simplicity in speaking or hearing different sounds) or linguistic (e.g., historical development) effects, we would probably expect something close to a Mandelbrot distribution (i.e., the nth most common consonant occurring with roughly 1/n the frequency of the most common, or: rank times frequency roughly constant) ...
– Hagen von Eitzen
Apr 14 at 18:55
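The "rank times frequency roughly constant" check from this comment can be sketched as follows. The counts are invented so that they follow 1/n exactly; real consonant counts would only approximate this, if at all, and `rank_times_frequency` is a hypothetical helper name.

```python
def rank_times_frequency(counts):
    """For each item, rank (1 = most common) times relative frequency."""
    ordered = sorted(counts.values(), reverse=True)
    total = sum(ordered)
    return [rank * n / total for rank, n in enumerate(ordered, start=1)]

# Idealised, invented counts that follow the 1/n pattern exactly.
ideal = {"n": 600, "t": 300, "s": 200, "r": 150, "l": 120}
products = rank_times_frequency(ideal)
print(products)
```

For these idealised counts every product is identical; with counts from an actual corpus, the spread of the products gives a crude picture of how far the data are from the 1/n pattern.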