This short report infers the percentage of Internet users who are behind NAT/NAPT boxes by studying the traffic into two public QuakeIII Arena game servers. The results are only approximate (guesstimates, really) and there are weaknesses in my methodology. But I think the analysis is still of some interest, if only as a starting point for other estimates of NAT/NAPT penetration in the consumer Internet space.
I ran two QuakeIII servers for over 2 weeks, and calculated the percentage of clients who connected to the servers from 'odd' UDP ports. I consider such clients to represent players sitting behind NAT/NAPT boxes (since QuakeIII clients by default transmit packets from a well-known UDP port). Given certain assumptions, approximately 18 to 19% of the sample community is using NAT boxes.
The approach I used is to look for packets whose TCP/UDP port numbers had been modified in transit (a typical NAPT 'fingerprint'), and deduce from that the likely existence of a NAT box somewhere along the packet's path. But how do you know if port numbers are being modified in transit? The trick is to monitor traffic in and out of the server of a client/server application whose clients originate their packets from a constant, well-known port number. In my case, I've sampled a subset of the Internet community who play the online interactive game "QuakeIII Arena".
Two QuakeIII servers ran pretty much 7x24 on a single machine (600MHz Celeron, FreeBSD 4.2, QuakeIII v1.17) hosted at a private residence on the US west coast at the end of February 2001. I was fortunate enough to find an ex-colleague in the Silicon Valley area with a T1 line into his house direct from a local, well-connected ISP. This made the servers quite attractive, as players frequently seek servers with low ping times. Server logs collected the IP address and client-side UDP port number of every player who joined (whether or not they actually stayed to play).
QuakeIII clients who joined from a UDP port outside the range 27960-27963 were considered likely NAT/NAPT candidates.
For my purposes, a client is a unique host attached to the Internet either directly or through a NAT box. The challenge is to estimate a reasonably accurate count of actual clients from the information (each player's ASCII playername and their IP address) logged by the QuakeIII server.
A simple definition would be to consider each distinct IP address as a separate client. However, this fails to account for people whose ISP allocates their addresses dynamically (either through DHCP or PPPoE) from a common address pool. A single 'client' player might appear across a number of IP addresses, or the same IP address might be re-assigned to different 'client' players over a period of time if they share a common ISP.
A number of IP address and playername permutations must be considered.
Finally, an IP address may appear multiple times in the logfile. It is considered to have an 'odd' UDP port number if at any time it appears in the logfile associated with a port number outside the range 27960 to 27963 (regardless of how many times it appears in the logfile with a port number inside this range).
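The 'odd port' rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not the script actually used for the analysis; the function names and the sample records are hypothetical, and the real QuakeIII log format is not assumed.

```python
# Default QuakeIII client port range quoted in the text (27960..27963 inclusive).
DEFAULT_PORTS = range(27960, 27964)


def is_odd_port(port: int) -> bool:
    """True if the client-side UDP port is outside the default range."""
    return port not in DEFAULT_PORTS


def odd_addresses(records):
    """Split logged (ip, port) pairs into all IPs seen and 'odd' IPs.

    An IP counts as odd if it EVER appears with an odd port, no matter
    how often it also appears with a port inside the default range.
    """
    all_ips, odd_ips = set(), set()
    for ip, port in records:
        all_ips.add(ip)
        if is_odd_port(port):
            odd_ips.add(ip)
    return all_ips, odd_ips


# Hypothetical log entries (documentation addresses, not real players).
sample = [
    ("192.0.2.10", 27960),
    ("192.0.2.10", 1025),    # same IP, later seen on an odd port
    ("198.51.100.7", 27961),
    ("203.0.113.5", 61000),
]
all_ips, odd_ips = odd_addresses(sample)
print(f"{len(odd_ips)} of {len(all_ips)} addresses flagged as likely NAT/NAPT")
```

Using a set for the odd addresses means a single odd-port sighting is enough to flag an IP, which matches the 'at any time' wording of the rule.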
Over a period of 17 days my QuakeIII servers saw 3068 unique IP addresses. Of these, 534 IP addresses belonged to players coming from unusual UDP port numbers (i.e. outside the range 27960 to 27963).
Based on this simplistic analysis, approximately 17.4% of the QuakeIII playing community sits behind NAT/NAPT boxes.
The first step is isolating the DHCP/PPPoE assigned addresses. 2659 unique playernames were seen during the trial period, with 266 (10%) of these playernames appearing multiple times across a mix of IP addresses. If the IP addresses for any one of these 266 playernames all fall under a common prefix, that prefix is assumed to represent a DHCP/PPPoE managed address pool. All IP addresses seen are then re-checked and marked as DHCP/PPPoE assigned addresses if they fall under one of the probable DHCP/PPPoE address pool prefixes.
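A rough sketch of this pool-detection step, using Python's standard `ipaddress` module, might look as follows. It is an assumption-laden illustration rather than the original tooling: the helper names (`prefix_of`, `find_pools`, `mark_dynamic`), the record layout, and the sample data are all made up, and the prefix length is left as a parameter.

```python
import ipaddress
from collections import defaultdict


def prefix_of(ip: str, bits: int) -> ipaddress.IPv4Network:
    """Network of the given prefix length that contains ip."""
    return ipaddress.ip_network(f"{ip}/{bits}", strict=False)


def find_pools(records, bits):
    """Return prefixes that look like DHCP/PPPoE pools.

    records: iterable of (playername, ip) pairs.
    A prefix is treated as a probable pool when one playername was seen
    on two or more IPs that all share that prefix.
    """
    ips_by_name = defaultdict(set)
    for name, ip in records:
        ips_by_name[name].add(ip)

    pools = set()
    for ips in ips_by_name.values():
        if len(ips) < 2:
            continue  # single-address playernames say nothing about pools
        prefixes = {prefix_of(ip, bits) for ip in ips}
        if len(prefixes) == 1:
            pools.add(prefixes.pop())
    return pools


def mark_dynamic(all_ips, pools):
    """Re-check every seen IP against the probable pool prefixes."""
    return {ip for ip in all_ips
            if any(ipaddress.ip_address(ip) in pool for pool in pools)}


# Hypothetical records (private addresses, invented playernames).
records = [
    ("alice", "10.1.2.3"), ("alice", "10.1.9.9"),          # same /20
    ("bob", "10.1.4.4"),                                    # one address only
    ("carol", "172.16.0.1"), ("carol", "192.168.5.5"),      # no common /20
]
pools = find_pools(records, bits=20)
dynamic = mark_dynamic({ip for _, ip in records}, pools)
print(sorted(map(str, pools)), sorted(dynamic))
```

Note that `bob` ends up marked as dynamically assigned even though he was seen on only one address, because that address falls inside a pool inferred from `alice`. That mirrors the re-check described above.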
The client count then proceeds through the following stages:
[Table of client-count stages and results per common prefix length: contents not recoverable.]
For example, of the total 266 playernames who had multiple IP addresses, 131 playernames (49.2%) had multiple IP addresses all falling under a common 20 bit prefix. This led to a calculation of 2919 unique 'clients' of which 541 clients were logged connecting in from an odd UDP port (and therefore represent probable instances of NAT/NAPT).
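As a quick check on the arithmetic, the percentages quoted in this report can be recomputed from the raw counts. The snippet below uses only numbers stated in the text; the `pct` helper and variable names are just for illustration.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


simple_nat = pct(534, 3068)      # odd-port IPs / unique IPs (simplistic analysis)
multi_ip_names = pct(266, 2659)  # playernames seen on multiple IPs / all playernames
common_20bit = pct(131, 266)     # multi-IP playernames under a common 20 bit prefix
nat_20bit = pct(541, 2919)       # odd-port clients / unique clients (20 bit prefix)

print(simple_nat, multi_ip_names, common_20bit, nat_20bit)
```

The 20 bit case works out to roughly 18.5%, which sits inside the 18 to 19% range quoted for the overall estimate.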
Naturally, the number of multi-address playernames with addresses that appear to be DHCP/PPPoE assigned rises as the target common prefix gets shorter, and drops as the prefix gets longer. Which prefix you choose to derive results from depends on your own particular belief about the typical size of ISP address pools. If Class-B sized address pools (of the form w.x.0.0/16) seem likely, then NAT boxes are used by approximately 19% of the QuakeIII playing community. Interestingly, this isn't far from the 17.4% calculated through the very simplistic analysis.
(If you're interested, click here for a table of all the playernames seen, but be aware this is a 250KB webpage.)
You could probably drive a truck (or at least a small scooter) through some of the assumptions underpinning this analysis. It is constrained by the inherent ambiguities in discerning which playername/IP address combinations represent unique clients, and by my desire to derive a result solely from the logged data set's internal characteristics. A couple of modifications could be made to this analysis:
Over a period of 17 days my QuakeIII servers saw 3068 unique IP addresses and 2659 unique playernames (of which 266 playernames were associated with two or more IP addresses). By observing the number of players connecting from non-QuakeIII client UDP port numbers (a NAPT fingerprint) I estimate that 18 to 19% of the QuakeIII playing community are behind NAT boxes.
How these results might generalize to the broader Internet community is currently unclear (to me, at least).