Acoustic Theory of Speech Production

The acoustic theory of speech production describes how our vocal apparatus produces sound, enabling us to express thoughts, emotions, and ideas. This article examines the interplay between the human vocal tract, airflow, and articulatory gestures.

The Human Vocal Tract: A Dynamic Resonator


The human vocal tract serves as a dynamic resonator responsible for shaping sounds into recognizable speech. It comprises the pharyngeal, oral, and nasal cavities, each playing a vital role in sound production. As air from the lungs passes through the vocal tract, it interacts with various structures, leading to the formation of distinct speech sounds. 

Understanding Articulatory Gestures: The Building Blocks of Speech  

Articulatory gestures are precise movements of the speech organs that form speech sounds. The articulators, including the tongue, lips, and jaw, execute coordinated movements to produce different phonemes, the smallest units of sound that distinguish one word from another.

The Source-Filter Theory: Generating Speech Sounds 


Figure: Vocal tract function

The source-filter theory explains how speech sounds are generated. According to this theory, the vocal folds act as the sound source, producing a buzzing sound as air passes through them. This sound then travels through the vocal tract, which functions as a filter, modifying the source sound to create distinct speech sounds.

“The filter is defined by the resonances of the vocal tract.”


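In the usual formulation of the source-filter theory, the radiated speech spectrum is the product of three terms: P(f) = U(f) × T(f) × R(f), where U(f) is the source (glottal) spectrum, T(f) is the transfer function of the vocal tract (the filter), and R(f) is the radiation characteristic at the mouth. If each of these quantities is expressed in decibels (dB), the contributions are added instead of multiplied.

The source spectrum falls off at about 12 dB/octave, while the radiation characteristic at the mouth rises by approximately 6 dB/octave; hence, the net slope contributed by the first and third terms in the equation is a decrease of approximately 6 dB/octave.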
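The decibel bookkeeping can be illustrated with a minimal Python sketch. It assumes idealized, constant slopes of 12 dB/octave (source) and 6 dB/octave (radiation) measured from an arbitrary 100 Hz reference, and a flat filter by default; the function names and the reference frequency are illustrative choices, not part of the theory.

```python
import math

SOURCE_SLOPE_DB_PER_OCT = -12.0    # source spectrum falls about 12 dB/octave
RADIATION_SLOPE_DB_PER_OCT = 6.0   # mouth radiation rises about 6 dB/octave


def slope_contribution_db(f_hz, ref_hz, slope_db_per_oct):
    """Level change in dB between ref_hz and f_hz for a constant dB/octave slope."""
    return slope_db_per_oct * math.log2(f_hz / ref_hz)


def output_level_db(f_hz, filter_gain_db=0.0, ref_hz=100.0):
    """Add the source, filter, and radiation contributions in dB.

    ref_hz is an arbitrary reference frequency (an assumption of this sketch).
    """
    source_db = slope_contribution_db(f_hz, ref_hz, SOURCE_SLOPE_DB_PER_OCT)
    radiation_db = slope_contribution_db(f_hz, ref_hz, RADIATION_SLOPE_DB_PER_OCT)
    return source_db + filter_gain_db + radiation_db


def db_to_amplitude_ratio(level_db):
    """Convert a level in dB to a linear amplitude ratio."""
    return 10 ** (level_db / 20)


def amplitude_ratio_to_db(ratio):
    """Convert a linear amplitude ratio to a level in dB."""
    return 20 * math.log10(ratio)


if __name__ == "__main__":
    # Each doubling of frequency lowers the combined source + radiation level by 6 dB.
    for f in (100, 200, 400, 800):
        print(f, "Hz:", round(output_level_db(f), 1), "dB re 100 Hz")

    # Multiplying linear ratios gives the same result as adding their dB values.
    product = db_to_amplitude_ratio(-12.0) * db_to_amplitude_ratio(6.0)
    print("product of ratios in dB:", round(amplitude_ratio_to_db(product), 1))
```

Passing a non-zero `filter_gain_db` would add the vocal tract's contribution at that frequency on top of the combined 6 dB/octave downward slope.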


The Spectrum of Sound Production

Vowel Production: Shaping Resonant Sounds 

Vowels are essential components of speech, characterized by their resonance and the positioning of the tongue and lips. The varying shapes of the vocal tract during vowel production create a range of resonant sounds that differentiate one vowel from another.
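As a rough illustration of how resonances differentiate vowels, the sketch below labels a measured pair of resonance frequencies with the nearest of three reference vowels. The reference values are approximate textbook averages for adult male speakers of American English and are used here only as assumptions; real speakers vary widely, and the function and dictionary names are hypothetical.

```python
import math

# Approximate first and second resonance (formant) frequencies in Hz.
# Illustrative averages only; individual speakers differ considerably.
REFERENCE_VOWELS_HZ = {
    "ee (as in 'see')": (270.0, 2290.0),
    "ah (as in 'father')": (730.0, 1090.0),
    "oo (as in 'food')": (300.0, 870.0),
}


def nearest_vowel(f1_hz, f2_hz):
    """Return the reference vowel closest to (f1_hz, f2_hz) in Euclidean distance."""
    def distance(item):
        ref_f1, ref_f2 = item[1]
        return math.hypot(f1_hz - ref_f1, f2_hz - ref_f2)

    label, _ = min(REFERENCE_VOWELS_HZ.items(), key=distance)
    return label


if __name__ == "__main__":
    print(nearest_vowel(280.0, 2200.0))
    print(nearest_vowel(700.0, 1100.0))
    print(nearest_vowel(320.0, 900.0))
```

A plain distance in Hz is the simplest possible choice; it is not a perceptual model and is only meant to show that different vocal tract shapes produce separable resonance patterns.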

Consonant Production: Constraining Sound  

Unlike vowels, consonants involve constriction in the vocal tract. Consonant sounds are created by partially or completely blocking the airflow, resulting in distinct sounds such as “p,” “t,” and “s.” 

Coarticulation: The Seamless Blend of Sounds  

Coarticulation refers to the phenomenon where speech sounds overlap during articulation. It allows us to produce speech fluidly and efficiently, as one sound prepares the articulators for the next in a seamless blend. 

Prosody: The Rhythmic Pattern of Speech  

Prosody encompasses the rhythmic and melodic aspects of speech. It includes intonation, stress, and rhythm, which add expressiveness and convey emotions and emphasis. 

Speech Disorders and Their Acoustic Signatures  

Various speech disorders can arise from anomalies in speech production. Conditions like stuttering, apraxia, and dysarthria have unique acoustic signatures, allowing researchers to study and treat them effectively. 

The Role of Acoustic Phonetics in Speech Production Research 

Acoustic phonetics is a branch of linguistics that studies the physical properties of speech sounds. It plays a crucial role in speech production research, aiding in the development of speech recognition systems, language learning tools, and communication aids for individuals with speech impairments. 


The acoustic theory of speech production is a captivating exploration of the human capacity for communication. From the subtle movements of our articulators to the resonance of our vocal tracts, speech production involves an extraordinary interplay of biological and physical processes. Understanding these mechanisms not only deepens our appreciation for the beauty of language but also opens new doors for innovative technologies and therapies. 

Perturbation Theory  

Perturbation theory states that the standing wave associated with each resonance has points of minimal vibration (nodes) and points of maximal vibration (antinodes) along the vocal tract. These patterns describe the fluctuations in volume velocity.

  • For F1, the antinode is located at the open end (the lips) and the node at the closed end (the glottis).
  • There are two antinodes and two nodes for F2.
  • There are three antinodes and three nodes for F3.

If the cross-sectional area changes at some point (a perturbation), a formant frequency rises or falls depending on whether the change lies near a node or an antinode of that formant: a constriction near a volume-velocity antinode lowers the formant, whereas a constriction near a volume-velocity node raises it. A constriction at the lips, where every formant has an antinode, therefore lowers all formant frequencies, and a constriction at the larynx, where every formant has a node, raises them all.
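The node and antinode counts above can be reproduced with a small sketch. It assumes the standard idealization of the vocal tract as a uniform tube closed at the glottis and open at the lips, which the text does not spell out. Positions are fractions of tract length measured from the glottis, and the sign rule used in `constriction_effect` is the usual small-perturbation result for such a tube. All function names are hypothetical.

```python
import math
from fractions import Fraction


def velocity_nodes_and_antinodes(n):
    """Volume-velocity nodes and antinodes of the n-th resonance.

    Positions are fractions of the tube length, 0 = glottis, 1 = lips,
    for an idealized uniform tube closed at the glottis and open at the lips.
    """
    denom = 2 * n - 1
    nodes = [Fraction(2 * k, denom) for k in range(n)]
    antinodes = [Fraction(2 * k + 1, denom) for k in range(n)]
    return nodes, antinodes


def constriction_effect(n, position):
    """Direction in which formant n moves for a small constriction at `position`.

    The sensitivity is +1 at a volume-velocity node and -1 at an antinode.
    """
    sensitivity = math.cos((2 * n - 1) * math.pi * position)
    if abs(sensitivity) < 1e-9:
        return "unchanged"
    return "rises" if sensitivity > 0 else "falls"


if __name__ == "__main__":
    for n in (1, 2, 3):
        nodes, antinodes = velocity_nodes_and_antinodes(n)
        print(f"F{n} nodes:", [str(x) for x in nodes],
              "antinodes:", [str(x) for x in antinodes])
        print(f"  constriction at larynx: F{n}", constriction_effect(n, 0.0))
        print(f"  constriction at lips:   F{n}", constriction_effect(n, 1.0))
```

In this idealization every formant has a node at the glottis and an antinode at the lips, which is why laryngeal and lip constrictions move all formants in the same direction.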

Amplitude relationships 

Formant amplitudes are influenced by formant frequencies. When two formant frequencies move closer together, both spectral peaks increase in amplitude. If F1 is lowered, its amplitude A1 decreases; if F1 is raised, A1 increases.
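These amplitude relationships can be checked numerically with a minimal sketch. It assumes an all-pole vocal tract transfer function in which each formant is a conjugate pole pair normalized to unit gain at 0 Hz, and it ignores the source and radiation slopes. The formant frequencies and bandwidths are illustrative values, not measurements.

```python
import math


def vocal_tract_gain_db(f_hz, formants_hz, bandwidths_hz):
    """Gain in dB at f_hz of an all-pole transfer function built from formants."""
    s = complex(0.0, 2 * math.pi * f_hz)
    gain = 1.0
    for formant, bandwidth in zip(formants_hz, bandwidths_hz):
        pole = complex(-math.pi * bandwidth, 2 * math.pi * formant)
        # Conjugate pole pair, scaled so that its gain at 0 Hz equals 1.
        gain *= abs(pole) ** 2 / (abs(s - pole) * abs(s - pole.conjugate()))
    return 20 * math.log10(gain)


BANDWIDTHS_HZ = [60.0, 90.0, 120.0]  # assumed, held constant in every case

# Level of the F1 peak for three formant patterns.
a1_reference = vocal_tract_gain_db(500.0, [500.0, 1500.0, 2500.0], BANDWIDTHS_HZ)
a1_f2_closer = vocal_tract_gain_db(500.0, [500.0, 1000.0, 2500.0], BANDWIDTHS_HZ)
a1_f1_lower = vocal_tract_gain_db(300.0, [300.0, 1500.0, 2500.0], BANDWIDTHS_HZ)

if __name__ == "__main__":
    print("A1, reference pattern:", round(a1_reference, 1), "dB")
    print("A1, F2 moved toward F1:", round(a1_f2_closer, 1), "dB")
    print("A1, F1 lowered:", round(a1_f1_lower, 1), "dB")
```

Under these assumptions the F1 peak is higher when F2 moves toward F1 and lower when F1 itself is lowered, in line with the relationships stated above.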

The Connection Between Acoustic Theory of Speech Production and Perturbation Theory of Speech Production 

The acoustic theory of speech production and perturbation theory are closely connected. Perturbation theory works within the acoustic theory's description of the vocal tract as a resonator and helps us understand how small changes, or perturbations, in the sound-producing system affect the resulting sound. By applying it, we can gain insight into the behavior of sound waves in the vocal tract and the factors influencing their production.

The Role of Perturbation Theory of Speech Production 

The Perturbation Theory of Speech Production plays a significant role in elucidating the intricate processes involved in sound production. By examining the effects of perturbations on sound-producing systems, we can uncover the underlying mechanisms that shape the characteristics of the generated sound. Perturbation theory allows us to analyze phenomena such as changes in pitch, timbre, and intensity, providing a deeper understanding of the complexities of sound production. 

Exploring the Vocal Mechanism 

The human voice is a remarkable instrument capable of producing a wide range of sounds. Perturbation theory plays a vital role in understanding vocal production, including the control of pitch, resonance, and vowel formation. By analyzing the intricate interactions between the vocal folds, vocal tract, and airflow, we can gain insights into the complexities of the vocal mechanism and the role of perturbations in vocal sound production.

The Impact of Perturbation Theory of Speech Production 

Speech production involves a delicate interplay between the articulatory organs, such as the tongue and lips, and the vocal folds. Perturbation theory offers a framework for studying how perturbations influence speech sounds. It allows us to examine phenomena like coarticulation, where adjacent sounds influence each other, and the production of speech sounds at different speaking rates and in different contexts.

Pipe Model Theory of Speech Production


Figure: Model of Vowel Production

The pipe model describes how sound is produced by an air column enclosed in a tube. It is most familiar from wind instruments, such as flutes, clarinets, and organ pipes, where it explains the fundamental principles behind the generation of musical notes. The same tube physics underlies the treatment of the vocal tract as a resonating tube in speech production.

In the pipe model, the instrument is represented as an air column enclosed within a tube or pipe. When a musician blows air into the instrument, the air column inside the pipe starts vibrating, producing sound.

The key components of the pipe model include:

  • Air column: This represents the volume of air inside the instrument’s tube or pipe. When the musician blows air into the instrument, the air column begins to oscillate. 
  • Embouchure: The embouchure is the way the musician's lips meet the mouthpiece or blow hole at one end of the tube, where air enters the instrument. The shape and size of this opening affect the airflow and thus influence the pitch and tone of the sound produced.
  • Resonance: As the air column vibrates, it produces sound waves. The length of the air column determines the fundamental frequency, which corresponds to the lowest pitch the instrument can produce. Changing the length of the vibrating air column (by using different fingerings or stops in the case of organ pipes) allows the musician to produce different notes. 
  • Harmonics: Besides the fundamental frequency, the vibrating air column also produces higher-frequency harmonics. These are integer multiples of the fundamental frequency and contribute to the timbre or tone color of the sound. 
  • Register: Wind instruments often have multiple registers or octaves, which are produced by varying the airflow and the intensity of vibration of the air column. 

The pipe model helps explain the physics of how wind instruments produce sound. It is a valuable tool for musicians to understand how to control and manipulate their instruments effectively. It also provides insight into how different types of wind instruments produce unique sounds and why they can cover a wide range of pitches and timbres.
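The dependence of the resonance frequencies on the length of the air column can be sketched with the textbook formulas for ideal pipes. The sketch assumes a speed of sound of 35,000 cm/s and ignores end corrections and losses. The 17.5 cm tube closed at one end is a common idealization of an adult vocal tract and is included only as an illustration.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed value for warm, moist air


def open_open_resonances(length_cm, count=3):
    """Resonances of an ideal pipe open at both ends: f_n = n * c / (2 * L)."""
    return [n * SPEED_OF_SOUND_CM_PER_S / (2 * length_cm)
            for n in range(1, count + 1)]


def closed_open_resonances(length_cm, count=3):
    """Resonances of an ideal pipe closed at one end: f_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4 * length_cm)
            for n in range(1, count + 1)]


if __name__ == "__main__":
    # Halving the length doubles every resonance frequency.
    print("open-open, 60 cm:", open_open_resonances(60.0))
    print("open-open, 30 cm:", open_open_resonances(30.0))
    # A 17.5 cm tube closed at one end resonates near 500, 1500, and 2500 Hz.
    print("closed-open, 17.5 cm:", closed_open_resonances(17.5))
```

Note that a pipe closed at one end supports only odd multiples of its lowest resonance, whereas a pipe open at both ends supports all integer multiples.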

Component Tube Theory of Speech Production  

The component tube theory, proposed by Gunnar Fant in 1960, was highly influential in the understanding of speech production. It simplifies the complexities of speech into a model that treats the vocal tract as a series of connected tubes with adjustable lengths and cross-sectional areas. The theory highlights the role of the vocal tract's shape in generating various speech sounds.
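A series of tubes can be represented in code as a list of section lengths and cross-sectional areas. The sketch below is one possible illustration, not Fant's own formulation. It computes the reflection coefficient at each junction between adjacent sections using the common lossless-tube expression (A_next − A) / (A_next + A). Sign conventions differ between texts, and the area values are invented for the example.

```python
def reflection_coefficients(areas_cm2):
    """Reflection coefficient at each junction between adjacent tube sections.

    Sections are ordered from glottis to lips. A value of 0 means the two
    sections have equal area, so the junction does not reflect the wave.
    """
    return [(a_next - a) / (a_next + a)
            for a, a_next in zip(areas_cm2, areas_cm2[1:])]


def total_length_cm(section_lengths_cm):
    """Overall length of the tube series."""
    return sum(section_lengths_cm)


# Hypothetical two-tube shape: a narrow back cavity followed by a wide front cavity.
SECTION_LENGTHS_CM = [8.5, 9.0]
SECTION_AREAS_CM2 = [1.0, 4.0]

if __name__ == "__main__":
    print("total length:", total_length_cm(SECTION_LENGTHS_CM), "cm")
    print("junction reflection:", reflection_coefficients(SECTION_AREAS_CM2))
    print("uniform tube:", reflection_coefficients([3.0, 3.0, 3.0]))
```

The larger the change in area at a junction, the closer the coefficient is to ±1 and the more the two sections behave like separate resonators.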

The Respiratory System: The role of airflow


Figure: Speech Sub-system

The process of speech production begins with the respiratory system. The lungs provide the airflow necessary for speech, while the diaphragm and intercostal muscles regulate the inhalation and exhalation of air. 

Inhalation and Exhalation 

During speech, inhalation and exhalation play critical roles. Inhalation brings air into the lungs, while controlled exhalation facilitates speech production by regulating the outflow of air. 

Controlling Air Pressure 

The regulation of air pressure is essential for creating different speech sounds. Varying the air pressure within the vocal tract allows us to produce sounds with varying loudness and intensity.

The Phonatory System: Phonation and Voicing 

Located within the larynx, the phonatory system comprises the vocal folds. As air passes through the glottis, the vocal folds vibrate, producing voiced sounds. 

Vocal Fold Vibration 

Phonation occurs when the vocal folds vibrate, creating voiced sounds. The rate of vibration determines the pitch of the sound produced. 
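The link between vibration rate and pitch can be illustrated by estimating the fundamental frequency of a synthetic periodic signal. The sketch uses a simple autocorrelation search, a common signal-processing technique chosen here for illustration and not described in this article. The 100 Hz sine wave stands in for vocal fold vibration, and the sample rate and search range are assumptions.

```python
import math


def estimate_f0(samples, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak."""
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Similarity between the signal and a copy of itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # The best lag approximates one vibration period, in samples.
    return sample_rate / best_lag


SAMPLE_RATE = 8000   # samples per second (assumed)
TRUE_F0_HZ = 100.0   # vibration rate of the synthetic "voice"
signal = [math.sin(2 * math.pi * TRUE_F0_HZ * n / SAMPLE_RATE)
          for n in range(1600)]

if __name__ == "__main__":
    print("estimated f0:", round(estimate_f0(signal, SAMPLE_RATE), 1), "Hz")
```

A faster vibration rate shortens the period, moves the autocorrelation peak to a smaller lag, and yields a higher estimated frequency, which listeners hear as a higher pitch.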

Voiced and Voiceless Sounds 

Speech sounds are categorized into voiced and voiceless sounds. Voiced sounds involve the vibration of the vocal folds, while voiceless sounds are produced without vocal fold vibration.

The Articulatory System 

The articulatory system encompasses various speech organs, including the tongue, lips, teeth, and soft palate. The precise movements and configurations of these organs are crucial for producing distinct speech sounds. 


Tongue

The tongue is one of the most critical articulators. It can move forward, backward, up, and down within the mouth. The position and shape of the tongue create different speech sounds. For example, placing the tongue against different parts of the roof of the mouth produces consonants such as “t,” “d,” “n,” “l,” and many more.


Lips

The lips are highly versatile in speech production. They can be rounded or spread, and their position helps form various sounds, especially vowels. Vowel sounds like “oo” as in “food” (rounded lips) and “ee” as in “see” (spread lips) involve significant lip movement.


Jaw

The jaw plays a role in producing different vowels by helping to control the height of the tongue. Opening and closing the jaw modifies the resonance of the vocal tract, affecting vowel quality.


Teeth

The upper and lower teeth can come together or be apart to produce certain speech sounds. For example, the “th” sound in “think” and “this” is created by placing the tongue between the teeth.

Alveolar Ridge

The alveolar ridge is a small, bony ridge located just behind the upper front teeth. It is crucial for producing many common sounds, like “t,” “d,” “n,” and “s.” 

Hard Palate

The hard palate is the hard, bony part of the roof of the mouth. It is involved in producing certain consonant sounds, such as the “y” in “yes.” Sounds like “k” and “g,” by contrast, are made farther back, against the soft palate.

Soft Palate (Velum)

The soft palate is the soft, flexible part of the roof of the mouth towards the back. It can be raised to close off the nasal passage, producing oral sounds like “p” and “b,” or lowered to allow air to flow through the nose for nasal sounds like “m,” “n,” and “ng.”


Glottis

The glottis is the space between the vocal folds. It plays a role in voice production and can be controlled to produce different voice qualities and speech sounds, such as the distinction between voiced and voiceless consonants.





July 27, 2023
