Trying to measure VoIP call quality

The quality of a voice-over-IP call can’t be monitored in conventional ways. Traditional network probes focus on isolated traffic statistics, not on the inherent quality or clarity of speech, or the end-user’s perception.

These traditional tools report discrete figures for link throughput and utilization, jitter, delay, errors and packet loss totals or rates. These statistics give little insight into call quality because they disregard burst loss, miss jitter buffer discards and do not incorporate the perceptual effects of network impairments.

Packet loss tends to be a major cause of lost voice signal. It arises primarily from network congestion. Codecs used to encode/decode digitally sampled audio signals try to mask packet loss by replaying the last packet, interpolating from previous packets or adding noise. These packet-loss-concealment techniques suffice when packets are lost individually or at random. They are fairly ineffective with burst loss, in which much more signal is lost.
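To illustrate why a loss total alone hides burstiness, here is a minimal sketch in Python. The two packet traces are invented for illustration, and the function names are hypothetical rather than part of any standard or product. Both traces lose the same share of packets, yet one loses them singly and the other in a single burst.

```python
from itertools import groupby


def loss_run_lengths(lost):
    """Return the length of each run of consecutive lost packets.

    `lost` is a sequence of booleans, True meaning the packet was lost.
    """
    return [sum(1 for _ in run) for is_lost, run in groupby(lost) if is_lost]


def summarize(lost):
    """Report the overall loss rate alongside simple burstiness figures."""
    runs = loss_run_lengths(lost)
    return {
        "loss_rate": sum(lost) / len(lost),
        "longest_run": max(runs, default=0),
        "mean_run": sum(runs) / len(runs) if runs else 0.0,
    }


# Two hypothetical 20-packet traces, each losing 4 packets.
scattered = [i % 5 == 0 for i in range(20)]  # four isolated losses
bursty = [8 <= i < 12 for i in range(20)]    # one burst of four packets

print(summarize(scattered))
print(summarize(bursty))
```

A probe that reports only the loss rate would score these two traces identically, although concealment copes far better with the first than with the second.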

Also, packet loss may occur at different rates over the course of a call, so call quality will vary. Perception matters too: because short-term auditory memory fades, the point in a call at which an impairment occurs affects how the listener rates it.

AT&T Corp. researchers found that moving a burst of noise from the start to the end of a call can greatly affect perceived quality. The effect can be modeled by measuring the time between the last significant burst of packet loss and the end of the call, and calculating how much of that burst the listener has forgotten.
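One way to picture this "forgetting" is as a decay over time. The sketch below assumes an exponential decay, and the 30-second time constant is an illustrative assumption rather than a figure taken from this article. The function name and the impairment units are likewise hypothetical.

```python
import math


def remembered_impairment(burst_impairment, seconds_since_burst,
                          time_constant_s=30.0):
    """Estimate how much of a burst's impairment the listener still recalls.

    Assumes exponential decay of memory; time_constant_s is an assumed,
    illustrative value and not a measured one.
    """
    return burst_impairment * math.exp(-seconds_since_burst / time_constant_s)


# The same burst weighs more if it fell just before the call ended.
late_burst = remembered_impairment(20.0, seconds_since_burst=5)
early_burst = remembered_impairment(20.0, seconds_since_burst=120)
print(late_burst > early_burst)
```

Under these assumptions, a burst that falls two minutes before the end of a call has almost no influence on the final score, while one that falls five seconds before the end keeps most of its weight.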

A new type of intelligent call-quality monitor based on the International Telecommunication Union’s E-Model for estimating voice quality is designed to accurately model packet-loss distribution and end-user perception, and correlate this with codec type and delay, to give a single score. This information is important for network managers monitoring a VoIP service-level agreement.

This monitoring approach is lightweight, standards-based and computationally efficient enough to allow real-time monitoring of calls. It is part of a new European Telecommunications Standards Institute standard on quality-of-service measurement published in November 2000. The standard is an extension of the ITU’s standard G.107 E-Model, a network transmission-planning tool used for estimating voice call quality.

This approach essentially adds packet and burst loss distribution and time models to the E-Model, which looks at factors such as delay and equipment impairments. The result is a more accurate assessment of call quality.
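For readers unfamiliar with the E-Model, the sketch below shows its general shape: impairments are subtracted from a base rating R, which is then mapped to an estimated mean opinion score using the conversion given in G.107. The reduced rating formula, the base value of 93.2 and the sample impairment figures are simplifying assumptions for illustration. The full model has many more terms, and nothing here reproduces the extended standard's burst or time models.

```python
def r_to_mos(r):
    """Map an E-Model R rating to an estimated MOS (G.107 conversion)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


def simplified_r(delay_impairment, equipment_impairment, base_r=93.2):
    """Reduced E-Model: base rating minus delay and equipment impairments.

    base_r=93.2 is assumed here as the rating with no added impairments;
    the inputs are hypothetical impairment values, not measurements.
    """
    return base_r - delay_impairment - equipment_impairment


# Hypothetical call: a small delay impairment, with and without a
# codec/packet-loss (equipment) impairment of 20.
clean = r_to_mos(simplified_r(delay_impairment=5, equipment_impairment=0))
lossy = r_to_mos(simplified_r(delay_impairment=5, equipment_impairment=20))
print(round(clean, 2), round(lossy, 2))
```

In this framing, a burst-aware monitor would feed a better-informed equipment impairment value into the same calculation rather than replace the calculation itself.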

The ideal location to monitor call packet-stream impairments is from inside a VoIP end system such as a VoIP gateway or IP phone. There, the technology can be logically positioned between the jitter buffer and codec. Most VoIP end systems employ adaptive jitter buffers that effectively remove jitter but increase delay.

Excessively delayed packets are received by the end station but discarded by the jitter buffer, effectively creating additional packet loss. Being placed between the jitter buffer and the codec, the model sees not only packets lost in the network but also packets discarded by the jitter buffer, which are not seen by typical “on the wire” devices.
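The difference between what a probe on the wire sees and what the codec receives can be sketched as follows. The example assumes a fixed-depth jitter buffer for simplicity, whereas most end systems use adaptive ones. The trace, the 60-millisecond buffer depth and the function name are all hypothetical.

```python
def effective_loss(arrival_delays_ms, buffer_ms):
    """Compare network loss with the loss actually presented to the codec.

    arrival_delays_ms holds, per packet, how late it arrived relative to
    its expected time, or None if it never arrived. A packet later than
    buffer_ms is assumed to be discarded by a fixed-depth jitter buffer.
    """
    total = len(arrival_delays_ms)
    network_lost = sum(1 for d in arrival_delays_ms if d is None)
    discarded = sum(
        1 for d in arrival_delays_ms if d is not None and d > buffer_ms
    )
    return {
        "network_loss_rate": network_lost / total,
        "discard_rate": discarded / total,
        "effective_loss_rate": (network_lost + discarded) / total,
    }


# Hypothetical 10-packet trace: two packets lost, two arriving too late.
trace = [5, 12, None, 70, 8, 95, 10, None, 20, 15]
print(effective_loss(trace, buffer_ms=60))
```

In this made-up trace, a device on the wire would count two lost packets out of ten, while the codec is actually missing four.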

Other traditional approaches to measuring call quality, such as Perceptual Speech Quality Measure, do not specifically consider packet loss burstiness or human memory. They are compute-intensive, generating a test signal and comparing the original transmitted signal with the received signal. Their complexity and computational cost make them unsuitable for real-time use or for embedding in VoIP end systems. Also, as active testers that generate additional traffic on the network, they are better suited to sampling than to continuous monitoring.

Massad is vice-president of marketing at Telchemy Inc. He can be reached at