Bandwidth is misleading
Grenville Armitage, 5/2/01

Ever notice how public train services differentiate themselves from cars and buses by pointing to the enormous power of their diesel-electric engines?

Or how the airlines encourage you to compare the enormous thrust of their jet engines with those of their competitors?

Not really? Me neither.

And yet that's precisely what we fall for as customers when we allow IP service providers to compete on bandwidth. Bandwidth - the number of bits per second taken from us or delivered to us - is like thrust, or torque, or some other measure of underlying 'system strength'. But it is misguided to express our needs, as customers, merely in terms of bandwidth. Whether I'm talking about train travel, air travel, or data networking, my needs really boil down to loss rates, latency, and jitter.

Consider air travel. As consumers, we're interested in the published trip times, percentage of on-time departures and arrivals, whether our luggage will be lost, and how the airline will treat us when things go wrong. I don't know about you, but the thrust of the jet engines doesn't generally factor into my choice of airlines. I leave it to the airlines to obtain aircraft with whatever thrust they need to carry enough passengers, in accordance with the published timetables, that they stay in business. The competitive market revolves around the reliability and timeliness with which the airline carries me and my luggage (with perks such as club lounges and priority check-in lines at either end to buffer the experience during periods of service congestion).

Likewise, bandwidth is a means to an end - a commodity we should expect our service provider to provision in sufficient quantity to meet our latency, loss, and jitter needs. But defining that 'sufficient quantity' should not be our concern. (It is similarly not our concern how the provider meets latency goals. Allowing for the intrinsic lower bound set by the speed of light and the distance between your application end points, latency can be affected by the average queue lengths in routers along your traffic's path, and the number of router/switch hops along your traffic's path. But that's not something you need worry about if your agreement is specified in terms of latency bounds.)
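To put rough numbers on that parenthetical, here is a minimal Python sketch of a one-way latency budget. It is an illustration only: the 2/3 factor for the speed of light in fibre, the hop count, and the per-hop queueing delay are assumptions, and the function names are made up for this example.

```python
# Rough one-way latency budget: propagation delay plus per-hop queueing delay.
# Every figure here is an illustrative assumption, not a measurement.

SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in vacuum
FIBRE_FACTOR = 2 / 3                # assumed: light in fibre travels at roughly 2/3 c


def propagation_delay_ms(distance_km: float) -> float:
    """Lower bound set by path distance and the speed of light in fibre."""
    return distance_km / (SPEED_OF_LIGHT_KM_S * FIBRE_FACTOR) * 1000.0


def one_way_latency_ms(distance_km: float, hops: int, avg_queue_delay_ms: float) -> float:
    """Propagation floor plus an assumed average queueing delay at each hop."""
    return propagation_delay_ms(distance_km) + hops * avg_queue_delay_ms


floor_ms = propagation_delay_ms(4000)
total_ms = one_way_latency_ms(4000, hops=12, avg_queue_delay_ms=0.5)
print(f"propagation floor: {floor_ms:.1f} ms, with queueing: {total_ms:.1f} ms")
```

The provider can shorten queues or remove hops, but nothing they do gets below the propagation floor. That is why a latency bound, not a bandwidth figure, is the useful thing to negotiate.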

"Ah!", I hear you exclaim, "My application needs a certain amount of bandwidth, and so it is reasonable to ask for at least that much!"  Well, sort of. However, move your perspective around a little. Yes, your application may have known upper bounds on the rate at which it may generate traffic. And yes, this becomes a boundary condition for the network service provider, representing how the application is likely to behave. However, it doesn't actually capture what the application needs from the network. An application's needs are better expressed as some combination of latency, loss, and jitter.

Consider an interactive multimedia application (voice, video, multi-player gaming, etc.). The overall latency requirements are bounded by the desire for tolerable interactivity (made up of latency in the end-to-end network path and latencies in the end hosts themselves). Humans can adapt to certain levels of latency, but will find it harder to adapt if the latency itself is varying unpredictably with time (jitter). Packet loss diminishes the user experience because it forces each end to guess (or totally miss) what was sent (audio and video codecs interpolate to compensate for missing samples, while interactive game environments may briefly pause game action for the affected participants). End-host tricks like playout buffers can smooth the effects of jitter, but at the cost of additional average latency.
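The playout buffer trade-off can be sketched in a few lines of Python. This is a toy model under stated assumptions: every packet is played at its send time plus a fixed playout delay, so a packet whose network delay exceeds that value arrives too late to be used. The delay samples are invented for illustration.

```python
def late_packets(network_delays_ms, playout_delay_ms):
    """Count packets that arrive after their scheduled playout time.

    Each packet is played at (send time + playout_delay_ms), so it is late
    whenever its one-way network delay exceeds the playout delay.
    """
    return sum(1 for delay in network_delays_ms if delay > playout_delay_ms)


# Invented one-way delays (ms) for ten consecutive packets on a jittery path.
delays = [40, 42, 55, 41, 70, 43, 48, 90, 44, 41]

for playout in (45, 60, 100):
    print(f"playout delay {playout} ms -> {late_packets(delays, playout)} late packets")
```

With these samples, a 45 ms playout delay discards four packets, 60 ms discards two, and 100 ms discards none. Each step up removes loss by adding latency to every packet.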

In all of this, where's the bandwidth requirement on the network provider? It is indirect. Clearly, if the application is transmitting data into the network at a higher rate than the network can handle, there'll be a backlog inside the network (queues will build at points of congestion). But such behaviour will reveal itself through an increase in end-to-end latency. If your service agreement specifies certain latency bounds, and you've appropriately characterized the peak rates at which your application will generate data, the network provider will simply do what they must to ensure those bounds are met. The bandwidth of the service provider's connection between your sites or application end hosts is no longer your problem to define.
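A simple fluid model shows how a shortage of capacity surfaces as latency. The sketch below is a simplification with assumed numbers: a single bottleneck with an unbounded queue, no packet drops, and an offered rate that is constant within each one-second interval.

```python
def queueing_delay_ms(offered_bps, capacity_bps, interval_s=1.0):
    """Fluid-model backlog at one bottleneck.

    Returns the queueing delay (ms) seen at the end of each interval, i.e.
    the time needed to drain the backlog at the link's capacity.
    """
    backlog_bits = 0.0
    delays = []
    for rate in offered_bps:
        # Backlog grows when the offered rate exceeds capacity, drains otherwise.
        backlog_bits = max(0.0, backlog_bits + (rate - capacity_bps) * interval_s)
        delays.append(backlog_bits * 1000.0 / capacity_bps)
    return delays


# Assumed 1 Mbit/s bottleneck; the source briefly offers 1.2 Mbit/s.
offered = [800_000, 1_200_000, 1_200_000, 800_000, 800_000]
print(queueing_delay_ms(offered, capacity_bps=1_000_000))
# -> [0.0, 200.0, 400.0, 200.0, 0.0]
```

The customer never needs to know the 1 Mbit/s figure. A latency bound in the agreement, say 100 ms, would already be violated in the second interval, and it is the provider's job to add whatever capacity avoids that.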

Although they are often more relaxed in terms of jitter and latency, non-interactive applications (for example, web surfing, file transfers, email, and streaming media) can be considered the same way. In the case of web surfing, page download speed is a combined function of web server load and TCP 'goodput' (the throughput actually achieved for each HTTP connection). File transfers and email also depend on TCP's achievable goodput. Yet TCP itself doesn't have a specific transmission rate you can specify to the network provider. TCP's typical behavior is to continuously probe for the maximum data rate the network can handle, retreating only when the network 'pushes back' by losing packets.
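That probing behavior is usually described as additive increase, multiplicative decrease. The Python sketch below is a heavily simplified model, not TCP itself: it ignores slow start, timeouts, and retransmission, and it assumes a loss occurs exactly when the window exceeds a fixed path capacity (the `capacity_pkts` parameter is a made-up stand-in for what the network can hold).

```python
def aimd_window_trace(capacity_pkts, rounds, start=1):
    """Trace a congestion window over a number of round trips.

    The window grows by one packet per round trip and is halved whenever it
    exceeds what the path can hold (treated here as a packet loss).
    """
    cwnd = start
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd > capacity_pkts:
            cwnd = max(1, cwnd // 2)   # loss: retreat
        else:
            cwnd += 1                  # no loss: keep probing
    return trace


print(aimd_window_trace(capacity_pkts=8, rounds=12, start=5))
# -> [5, 6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 4]
```

The sawtooth never settles on a rate of its own. It simply oscillates around whatever the network will tolerate, which is why there is no single "TCP rate" to hand to a provider.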

So the question becomes, what is the data rate below which the network must exhibit extremely low packet loss rates? This upper rate bound depends on the application's users - web surfers will probably take all the goodput they can get, while email can usually tolerate transport times measured in tens of seconds or minutes. Once you've decided this upper bound, ask the network provider to ensure packet loss rates are minimal whenever the application is generating traffic below your specified rate. Above that rate, the network can do whatever it likes. (You may also want to specify latency bounds, since TCP's performance can drop off in the presence of very high round trip times.) Notice that the actual bandwidth of the service provider's connection between your sites or application end hosts is, again, no longer your problem to define.
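An agreement of that shape is easy to check against measurements. The following Python sketch assumes you have samples of (offered rate, observed loss fraction) for your traffic. The function name, the 2 Mbit/s committed rate, and the 0.1% loss bound are hypothetical values chosen for illustration.

```python
def sla_violations(samples, committed_rate_bps, max_loss):
    """Return the samples that break the agreement.

    A sample counts as a violation only if the traffic stayed at or below
    the committed rate and the loss fraction still exceeded the bound.
    Loss above the committed rate is the customer's problem, not the provider's.
    """
    return [
        (rate, loss)
        for rate, loss in samples
        if rate <= committed_rate_bps and loss > max_loss
    ]


# Hypothetical measurements: (offered rate in bit/s, fraction of packets lost).
measurements = [
    (1_500_000, 0.0002),   # within rate, loss within bound
    (1_800_000, 0.0100),   # within rate, loss too high -> violation
    (2_500_000, 0.0500),   # above committed rate -> no guarantee applies
]

print(sla_violations(measurements, committed_rate_bps=2_000_000, max_loss=0.001))
# -> [(1800000, 0.01)]
```

Nothing in the check refers to the capacity of the provider's links. It refers only to how the traffic was treated while it stayed within the agreed rate.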

"But what about statistical multiplexing?", I hear you ask. Indeed. Statistical multiplexing doesn't create capacity from thin air. It allows multiple traffic sources to share a network resource statistically, trading network utilization efficiency against tighter bounds on latency, loss, and jitter. It is a tool for the network provider. If you specify your traffic solely in terms of an upper rate bound, the network provider may not have much wiggle room for stat muxing your traffic internally with other traffic they are carrying. However, if your traffic has statistical properties (for example, the offered data rate fluctuates widely over short time periods) there may be room for statistical multiplexing gains. So go learn and understand the temporal characteristics of your favorite application. Discuss it with your service providers. Let them squeeze out efficiencies while meeting your latency, loss, and jitter goals.....

....and remember, bandwidth is a means to an end. You're the customer. Ask about their commitment to their timetables, not the power of their engines.