How silence suppression saves WAN bandwidth

Silence suppression, also known as voice activity detection (VAD), saves WAN bandwidth consumed by VoIP traffic, while comfort noise generation (CNG) counteracts its negative effects on VoIP speech quality.

What is silence suppression?

To understand silence suppression technology, consider this example: Even people who talk all the time have to take a breath, and they sometimes listen quietly. On a phone call, it is quite common for one direction of the call to be silent while speech is carried in the other direction. The public switched telephone network (PSTN) does not take advantage of this condition. The PSTN opens a path in both directions between the speakers and allocates 100% of the path capacity to the call, even when there is silence.

IP networks operate differently. Packets are transmitted when there is data or control information to send. Packets are not sent when there is nothing to send. In essence, when there is data silence, no IP network bandwidth is consumed.

Silence suppression, also called voice activity detection (VAD), is the application of this IP network principle to VoIP calls. When there is silence, you don't send voice packets full of silence. Silence suppression can save bandwidth, especially on IP trunks. The savings can be 40% to 50%. The packet-sending phone or gateway implements the VAD function in the codec.
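
The principle can be illustrated with a minimal sketch. It assumes one voice packet per audio frame and a made-up talk pattern for one direction of a call; the function name and the pattern are hypothetical and are not taken from any product.

```python
def packets_sent(frames_are_speech, suppress_silence):
    """Count voice packets sent for a sequence of frames (one packet per frame)."""
    if suppress_silence:
        # With VAD, only frames classified as speech are packetized and sent.
        return sum(1 for is_speech in frames_are_speech if is_speech)
    # Without VAD, every frame is sent, silent or not.
    return len(frames_are_speech)


# Hypothetical one-way talk pattern: True = speech frame, False = silent frame.
pattern = [True, True, True, False, False, True, True, True, False, False]

without_vad = packets_sent(pattern, suppress_silence=False)
with_vad = packets_sent(pattern, suppress_silence=True)
savings = 1 - with_vad / without_vad
print(f"{without_vad} packets without VAD, {with_vad} with VAD, savings {savings:.0%}")
```

In this invented pattern, four of the ten frames are silent, so the savings work out to 40%, the low end of the range quoted above. Real savings depend on how the speakers actually behave.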

Comfort noise generation

Think of VAD as a speech door: it opens for speech and closes for silence. What happens when this is applied to VoIP? When the door closes, the listener hears pure silence, and pure silence suggests that the connection has been lost, because a live call always carries some low-level background noise. The developers of VAD therefore implement comfort noise generation (CNG) in the receiving phone or gateway so that the listener does not conclude that the call has been disconnected. CNG is locally generated at the receiving end of the call and does not consume bandwidth. The loudness of the comfort noise can be adjusted in most IP phones, and there is usually a default setting for the CNG.
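
As a rough illustration of the receiving side, the sketch below fills every gap in the received stream with low-level random noise. It is a simplified model, not how any particular phone does it: the frame length of 160 samples, the amplitude of 50 and the use of None to mark a suppressed frame are all assumptions made for the example.

```python
import random


def fill_with_comfort_noise(received_frames, frame_len=160, noise_amplitude=50, seed=None):
    """Return the frames with each missing frame (None) replaced by quiet noise."""
    rng = random.Random(seed)
    output = []
    for frame in received_frames:
        if frame is None:
            # Nothing arrived for this frame, so generate noise locally.
            # No bandwidth is used; loudness is set by noise_amplitude.
            frame = [rng.randint(-noise_amplitude, noise_amplitude)
                     for _ in range(frame_len)]
        output.append(frame)
    return output


# Hypothetical received stream: speech, a suppressed gap, then speech again.
stream = [[1000] * 160, None, [500] * 160]
played = fill_with_comfort_noise(stream, seed=1)
print(len(played), "frames played;", "peak level in the gap:", max(abs(s) for s in played[1]))
```

Raising or lowering noise_amplitude corresponds to the loudness adjustment mentioned above.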

Determining silence

The codec in the transmitting phone is the detector of silence. More precisely, the codec detects when the incoming sound drops below a threshold, because there is rarely pure silence; there is always some background noise. When the sound received by the codec exceeds the threshold, speech packets are transmitted. If the threshold is set too low, background noise will be transmitted, negating the bandwidth savings. If the threshold is set too high, the beginning of a word will be clipped off and speech comprehension will suffer. This is called front-end clipping (FEC). The FEC behavior can be adjusted in some products. Check the vendor's best practices guide.
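
A simple energy-based detector shows how the threshold drives the decision. This is only a sketch: production codecs use more elaborate detectors, and the 16-bit sample scale, the -40 dB default threshold and the sample values below are assumptions chosen for illustration.

```python
import math


def frame_level_db(samples, full_scale=32768.0):
    """RMS level of one frame in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20.0 * math.log10(rms / full_scale)


def is_speech(samples, threshold_db=-40.0):
    """Transmit the frame only if its level exceeds the threshold."""
    return frame_level_db(samples) > threshold_db


# Hypothetical frames: a loud talker and quiet room noise.
talker = [8000] * 160
room_noise = [100] * 160

print(is_speech(talker))                         # True: well above -40 dB
print(is_speech(room_noise))                     # False: below -40 dB, suppressed
print(is_speech(room_noise, threshold_db=-60.0)) # True: threshold too low, noise is sent
```

The last line shows the first failure mode described above: with the threshold set too low, room noise is treated as speech and the bandwidth savings disappear.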

Finding the end of speech

The codec is also responsible for detecting when the speech is finished. If the threshold is set too high, the end of the word will be clipped off. This is more common for words that end in low-loudness components, such as words ending in an "F" or "S." End-of-word clipping can be more annoying than beginning-of-word clipping. Some IP phones and gateways have an adjustable timeout for the end of the word, so the clipping is reduced or eliminated. Check the vendor's guides for the method of making this end-of-word adjustment. The trade-off is that extending the timeout saves less bandwidth but improves the speech.
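
The end-of-word timeout can be sketched as a counter that keeps the transmitter open for a few frames after the level last exceeded the threshold. The function name, the frame-based timeout and the flag sequence are hypothetical; vendors expose this setting in their own ways.

```python
def apply_hangover(speech_flags, hangover_frames):
    """Keep transmitting for hangover_frames frames after the last speech frame."""
    transmit = []
    remaining = 0
    for is_speech in speech_flags:
        if is_speech:
            remaining = hangover_frames  # restart the timeout on every speech frame
            transmit.append(True)
        elif remaining > 0:
            remaining -= 1               # below threshold, but still inside the timeout
            transmit.append(True)
        else:
            transmit.append(False)       # timeout expired, suppress the frame
    return transmit


# Hypothetical detector output: two speech frames, then a quiet word ending.
flags = [True, True, False, False, False, False]

print(sum(apply_hangover(flags, 0)))  # 2 frames sent, the quiet ending is clipped
print(sum(apply_hangover(flags, 2)))  # 4 frames sent, the ending is kept
```

The two results show the trade-off directly: the longer timeout protects the end of the word and sends twice as many frames.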

More speech quality issues

What if the speaker turns away from the phone? The sound level will probably fall below the threshold. The speech is then classified as noise and no packets are transmitted. The listener must ask the speaker to repeat, or may be unaware that words were lost.

What happens when the background noise exceeds the threshold? The codec then transmits packets of noise. There are no bandwidth savings during a loud background noise condition such as street noise, other conversations not on the call, or TV or radio sound.

VAD on the LAN

LAN bandwidth is not taxed by VoIP calls. In the previous tip, "VoIP bandwidth: Calculate consumption," the highest bit rate is less than 90 Kbps per call, an insignificant amount of bandwidth on a 10 or 100 Mbps connection. Even though many calls will be combined on the back of the LAN switch, the total bandwidth consumed is far less than 10 Mbps. Therefore, there is no real value to applying VAD to the LAN environment.
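
The arithmetic is easy to check. The sketch below uses the 90 Kbps upper bound quoted above; the figure of 50 simultaneous calls is a made-up example, not a number from the earlier tip.

```python
def link_utilization(calls, per_call_kbps, link_mbps):
    """Fraction of a link consumed by a number of calls in one direction."""
    return calls * per_call_kbps / (link_mbps * 1000.0)


PER_CALL_KBPS = 90  # upper bound per call quoted in the text

print(f"1 call on 100 Mbps:  {link_utilization(1, PER_CALL_KBPS, 100):.2%}")
print(f"1 call on 10 Mbps:   {link_utilization(1, PER_CALL_KBPS, 10):.2%}")
print(f"50 calls on 10 Mbps: {link_utilization(50, PER_CALL_KBPS, 10):.2%}")
```

Even the hypothetical 50 calls add up to 4.5 Mbps, less than half of a 10 Mbps link.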

VAD on the WAN

The WAN is the place where bandwidth is limited and the cost is high, so there is an incentive to apply VAD. The anticipated goal of saving as much as 50% of the WAN bandwidth, when using IP trunking, is very attractive. The first consideration is the number of simultaneous calls that will be carried over the IP trunk. Cisco recommends that the number of simultaneous calls be 24 or greater. The probability that all speakers will talk at the same time decreases as the number of speakers on an IP trunk increases, just as a highway with more lanes is more likely to have an empty one. The fewer the speakers, the more likely it is that many or all of them will speak at the same time and that there will not be enough bandwidth available. This will lead to extra delay, jitter and packet loss. All these impairments reduce the voice quality.
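
The effect of the call count can be illustrated with a simple binomial model. The model is an assumption made for this sketch, not Cisco's method: each talker is treated as independently active half the time, and the trunk is sized to carry 75% of the calls talking at once.

```python
from math import comb


def prob_overload(calls, capacity, p_talk=0.5):
    """Probability that more than `capacity` of `calls` talkers are active at once."""
    return sum(
        comb(calls, k) * p_talk**k * (1 - p_talk) ** (calls - k)
        for k in range(capacity + 1, calls + 1)
    )


# Trunk sized for 75% of the calls talking simultaneously (hypothetical sizing rule).
for calls in (4, 8, 24, 48):
    capacity = (calls * 3) // 4
    print(f"{calls:2d} calls, room for {capacity:2d} talkers: "
          f"overload probability {prob_overload(calls, capacity):.4f}")
```

Under these assumptions, the chance of running out of bandwidth with 4 calls is 1 in 16, and it falls sharply as the number of calls grows, which is the reasoning behind setting a minimum call count before relying on VAD savings.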

Assume that there are more than 24 calls on an IP trunk. Do not go for the 50% bandwidth savings. A more achievable goal is a bandwidth savings of 35%. Even this goal may not be achievable. First experiment with the VAD function turned off. Then turn it on and observe the voice quality degradation. The degradation may be more than is acceptable. When music-on-hold is in use, VAD is useless, because the music leaves no silence to suppress. The use of fax through a gateway also negates the use of VAD. I had one client that used music-on-hold frequently and for long durations (minutes at a time). This client actually had to buy more bandwidth than was originally expected, even with VAD.
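
A back-of-the-envelope model shows how music-on-hold and fax erode the target. It assumes, purely for illustration, that VAD saves nothing on the share of traffic that is music or fax and saves the full 35% on the rest; the function name and the 20% share are hypothetical.

```python
def effective_savings(vad_savings, unsuppressed_fraction):
    """Scale the VAD savings by the share of traffic VAD can actually act on."""
    if not (0.0 <= vad_savings <= 1.0 and 0.0 <= unsuppressed_fraction <= 1.0):
        raise ValueError("both arguments must be fractions between 0 and 1")
    return vad_savings * (1.0 - unsuppressed_fraction)


TARGET = 0.35  # the more achievable savings goal from the text

# Hypothetical trunk where 20% of the traffic is music-on-hold or fax.
print(f"{effective_savings(TARGET, 0.20):.0%}")
# A trunk carrying nothing but music-on-hold gets no benefit at all.
print(f"{effective_savings(TARGET, 1.00):.0%}")
```

Measuring the actual mix on the trunk before sizing it is safer than relying on any assumed share.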

The conclusion is that VAD is a great idea, but it will not work well on lower-bandwidth IP trunks and has no value on the LAN.

About the author:
Gary Audin has more than 40 years of computer, communications and security experience. He has planned, designed, specified, implemented and operated data, LAN and telephone networks. These have included local area, national and international networks, as well as VoIP and IP convergent networks, in the U.S., Canada, Europe, Australia and Asia.
