Estimating the penetration of NAT/NAPT in the online game community
Grenville Armitage, 3/4/01


This short report infers the percentage of Internet users who are behind NAT/NAPT boxes by studying the traffic into two public QuakeIII Arena game servers. The results are only approximate (guesstimates, really) and there are weaknesses in my methodology. But I think the analysis is still of some interest, if only as a starting point for other estimates of NAT/NAPT penetration in the consumer Internet space.

I ran two QuakeIII servers for over 2 weeks and calculated the percentage of clients who connected to the server from 'odd' UDP ports. I consider such clients to represent players sitting behind NAT/NAPT boxes (since QuakeIII clients by default transmit packets from a well-known UDP port). Given certain assumptions, approximately 18 to 19% of the sampled community are using NAT boxes.



Detecting NAT/NAPT

The approach I used was to look for packets whose TCP/UDP port numbers had been modified in transit (a typical NAPT 'fingerprint') and deduce from that the likely existence of a NAT box somewhere along the packet's path. But how do you know if port numbers are being modified in transit? The trick is to monitor traffic into and out of the server of a client/server application whose clients originate their packets from a constant, well-known port number: a client packet arriving from any other port has probably had its source port rewritten along the way. In my case, I've sampled a subset of the Internet community who play the online interactive game "QuakeIII Arena".

Two QuakeIII servers ran pretty much 7x24 on a single machine (600MHz Celeron, FreeBSD 4.2, QuakeIII v1.17) hosted at a private residence on the US west coast at the end of February 2001. I was fortunate enough to find an ex-colleague in the Silicon Valley area with a T1 line into his house direct from a local, well-connected ISP. This made the servers quite attractive, as players frequently seek servers with low ping times. Server logs collected the IP address and client-side UDP port number of every player who joined (whether or not they actually stayed to play).

QuakeIII clients who joined from a UDP port outside the range 27960-27963 were considered likely NAT/NAPT candidates.
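The classification rule amounts to a single range test on the logged client port. The sketch below is only an illustration of the rule as stated; the helper name `is_nat_candidate` is hypothetical, and this is not the script actually used for this study.

```python
# Default QuakeIII client UDP source ports, as stated in this report
# (27960-27963 inclusive). range() excludes its upper bound, hence 27964.
QUAKE3_CLIENT_PORTS = range(27960, 27964)


def is_nat_candidate(client_udp_port: int) -> bool:
    """Hypothetical helper: True if the logged client port falls outside the
    default QuakeIII range, which this report treats as a likely NAPT fingerprint."""
    return client_udp_port not in QUAKE3_CLIENT_PORTS


if __name__ == "__main__":
    for port in (27960, 27963, 27964, 51234):
        print(port, is_nat_candidate(port))
```

Ports 27960 through 27963 count as default client ports; anything else, such as 27964 or 51234, flags the client as a likely NAT/NAPT candidate.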



Defining a 'client'

For my purposes, a client is a unique host attached to the Internet either directly or through a NAT box. The challenge is to estimate a reasonably accurate count of actual clients from the information (each player's ASCII playername and their IP address) logged by the QuakeIII server.

A simple definition would be to consider each distinct IP address as a separate client. However, this fails to account for people whose ISP allocates their addresses dynamically (either through DHCP or PPPoE) from a common address pool. A single 'client' player might appear across a number of IP addresses, or the same IP address might be re-assigned to different 'client' players over a period of time if they share a common ISP.

A number of IP address and playername permutations must be considered.

If a playername appears multiple times in the logfiles with different IP addresses, and these IP addresses fall under a reasonably long (e.g. 16 bits or more) prefix, then I assume I've identified a DHCP/PPPoE address pool. (The evidence might also represent multiple clients using the same ISP, whose addresses are statically assigned from a closely related pool of addresses, and who just happen to use the same playername. However, since playernames seem to be highly diverse, I believe this case can be considered rare. Click here for a table of all the playernames seen, but be aware this is a 250KB webpage.)
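The common-prefix test can be sketched with Python's standard `ipaddress` module. The function name `share_common_prefix` is hypothetical, and the sample addresses are made-up values from the RFC 5737 documentation range rather than entries from the actual logs.

```python
from __future__ import annotations

import ipaddress


def share_common_prefix(addresses: list[str], prefix_len: int) -> bool:
    """True if every IPv4 address falls under one network of prefix_len bits.

    ip_network(..., strict=False) masks off the host bits, so for example
    "198.51.100.7/16" becomes the network 198.51.0.0/16.
    """
    networks = {
        ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        for addr in addresses
    }
    # Exactly one distinct network means all addresses share the prefix.
    return len(networks) == 1


if __name__ == "__main__":
    # Made-up addresses for one playername seen from two different IPs.
    seen = ["198.51.100.7", "198.51.100.200"]
    for bits in (16, 24, 28):
        print(bits, share_common_prefix(seen, bits))
```

With these two addresses the test passes at 16 and 24 bits but fails at 28 bits, which is why fewer playernames look like pool members as the comparison prefix gets longer.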

Finally, an IP address may appear multiple times in the logfile. It is considered to have an 'odd' UDP port number if at any time it appears in the logfile associated with a port number outside the range 27960 to 27963 (regardless of how many times it appears in the logfile with a port number inside this range).
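The "odd at least once" rule can be expressed as a single pass over the logged (IP address, port) pairs. This is a minimal sketch assuming the log has already been parsed into such pairs; the parsing step is not shown, and the name `odd_port_addresses` is hypothetical.

```python
from __future__ import annotations

# Default QuakeIII client port range (27960-27963 inclusive).
QUAKE3_CLIENT_PORTS = range(27960, 27964)


def odd_port_addresses(log_entries: list[tuple[str, int]]) -> set[str]:
    """Return the IP addresses logged at least once with a port outside the
    default range. Appearances inside the range do not clear the flag."""
    return {ip for ip, port in log_entries if port not in QUAKE3_CLIENT_PORTS}


if __name__ == "__main__":
    entries = [
        ("198.51.100.7", 27960),
        ("198.51.100.7", 40321),  # one odd appearance is enough
        ("203.0.113.5", 27961),
        ("203.0.113.5", 27960),
    ]
    print(sorted(odd_port_addresses(entries)))
```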



Simplistic Analysis

Over a period of 17 days my QuakeIII servers saw 3068 unique IP addresses. Of these, 534 IP addresses belonged to players coming from unusual UDP port numbers (i.e. outside the range 27960 to 27963).

Based on this simplistic analysis, approximately 17.4% of the QuakeIII playing community sits behind NAT/NAPT boxes.
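The simplistic estimate is just the ratio of odd-port addresses to all unique addresses. As a quick arithmetic check using the counts reported above (the function name is a hypothetical convenience):

```python
def nat_percentage(odd_clients: int, total_clients: int) -> float:
    """Percentage of clients seen with an odd (non-default) UDP port."""
    return 100.0 * odd_clients / total_clients


# Counts from the 17-day trial: 534 odd-port IPs among 3068 unique IPs.
print(f"{nat_percentage(534, 3068):.1f}%")  # -> 17.4%
```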



Account for DHCP/PPPoE address pools

The first step is isolating the DHCP/PPPoE assigned addresses. 2659 unique playernames were seen during the trial period, with 266 (10%) of these playernames appearing multiple times across a mix of IP addresses. If all the IP addresses for one of these 266 playernames fall under a common prefix, that prefix is assumed to represent a DHCP/PPPoE managed address pool. All seen IP addresses are then re-checked and marked as DHCP/PPPoE assigned addresses if they fall under any of the probable DHCP/PPPoE address pool prefixes.
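The two passes described here (find probable pool prefixes from multi-address playernames, then mark every logged address that falls under one of them) might look like the following sketch. It assumes the logs have already been reduced to a mapping from playername to the set of IP addresses seen; the function names, playernames and addresses are hypothetical, and the 24-bit prefix in the demo was chosen only so the RFC 5737 documentation addresses form a pool.

```python
from __future__ import annotations

import ipaddress


def find_pool_prefixes(ips_by_name: dict[str, set[str]],
                       prefix_len: int) -> set[ipaddress.IPv4Network]:
    """Pass 1: for each playername seen from two or more IPs, if all of its IPs
    fall under a single prefix_len-bit network, record that network as a
    probable DHCP/PPPoE pool."""
    pools = set()
    for ips in ips_by_name.values():
        if len(ips) < 2:
            continue  # single-address playernames say nothing about pools
        nets = {ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False) for ip in ips}
        if len(nets) == 1:
            pools |= nets
    return pools


def mark_pool_addresses(all_ips: set[str],
                        pools: set[ipaddress.IPv4Network]) -> set[str]:
    """Pass 2: re-check every logged IP and return those under any pool prefix."""
    return {ip for ip in all_ips
            if any(ipaddress.ip_address(ip) in net for net in pools)}


if __name__ == "__main__":
    ips_by_name = {
        "Fragger": {"198.51.100.7", "198.51.100.200"},  # same /24: probable pool
        "Nomad": {"198.51.100.9", "203.0.113.5"},       # unrelated prefixes
        "Lurker": {"192.0.2.33"},                       # single address: skipped
    }
    pools = find_pool_prefixes(ips_by_name, 24)
    all_ips = set().union(*ips_by_name.values())
    print(pools)
    print(sorted(mark_pool_addresses(all_ips, pools)))
```

Note that Nomad's address 198.51.100.9 is marked in the second pass even though Nomad's own addresses did not form a pool, which mirrors the re-checking step described above.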

The client count then proceeds through the following stages:

Naturally, detecting DHCP/PPPoE address pools is highly approximate. The results depend on my choice of prefix length under which I compare the IP addresses belonging to each playername. The table below summarizes my results for a range of theoretical prefix lengths from 8 to 30 bits.

Prefix   Possible DHCP/PPPoE pools     Unique    Clients with   Estimated
(bits)   (% of 266 multi-IP names)     clients   odd ports      NAT %
------   -------------------------     -------   ------------   ---------
  8      198 (74.4%)                   2934      590            20.1%
 10      190 (71.4%)                   2857      563            19.7%
 12      182 (68.4%)                   2855      562            19.7%
 14      165 (62.0%)                   2900      559            19.3%
 16      154 (57.9%)                   2901      545            18.8%
 18      147 (55.3%)                   2903      545            18.8%
 20      131 (49.2%)                   2919      541            18.5%
 22      110 (41.4%)                   2945      543            18.4%
 24       59 (22.2%)                   3000      539            18.0%
 26       22 (8.3%)                    3052      536            17.6%
 28       12 (4.5%)                    3061      535            17.5%
 30        3 (1.1%)                    3064      534            17.4%

For example, of the total 266 playernames that had multiple IP addresses, 131 playernames (49.2%) had multiple IP addresses all falling under a common 20-bit prefix. This led to a calculation of 2919 unique 'clients', of which 541 clients were logged connecting in from an odd UDP port (and therefore represent probable instances of NAT/NAPT).
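Each "Estimated NAT %" entry in the table is the odd-port client count divided by the unique client count, and each pool percentage is taken over the 266 multi-address playernames. A quick arithmetic check of three rows, with values copied from the table:

```python
MULTI_ADDRESS_NAMES = 266  # playernames seen with two or more IP addresses

# (prefix bits, probable pools, unique clients, clients with odd ports),
# copied from three rows of the results table.
ROWS = [
    (16, 154, 2901, 545),
    (20, 131, 2919, 541),
    (30, 3, 3064, 534),
]

for bits, pools, clients, odd in ROWS:
    pool_pct = 100.0 * pools / MULTI_ADDRESS_NAMES
    nat_pct = 100.0 * odd / clients
    print(f"/{bits}: pools {pool_pct:.1f}%, NAT {nat_pct:.1f}%")
```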

Naturally, the number of multi-address playernames whose addresses appear to be DHCP/PPPoE assigned rises as the target common prefix gets shorter, and drops as the prefix gets longer. Which prefix you choose to derive results from depends on your own particular belief about the typical size of ISP address pools. If Class-B sized address pools (of the form w.x.0.0/16) seem likely, then NAT boxes are used by approximately 19% (18.8%) of the QuakeIII playing community. Interestingly, this isn't far from the 17.4% calculated through the simplistic analysis.

(If you're interested, click here for a table of all the playernames seen, but be aware this is a 250KB webpage.)



Caveats

You could probably drive a truck (or at least a small scooter) through some of the assumptions underpinning this analysis. It is constrained by the inherent ambiguities in discerning which playername/IP address combinations represent unique clients, and by my desire to derive a result solely from the logged data set's internal characteristics. A couple of modifications could be made to this analysis:

Another issue is that, as the common prefix gets shorter, the number of clients suspected of using NAT rises above the 534 clients found by the simplistic analysis. My logfile analysis algorithm allows multiple playernames to map to the same IP address, and if that IP address is believed to be from a DHCP/PPPoE pool, then each distinct playername counts as a distinct client. If that IP address was ever seen with an 'odd' port number, each of the playernames mapping to it counts as an 'odd' client. As the prefix gets shorter, many more IP addresses are marked as 'probably from a DHCP/PPPoE pool'. I should improve the algorithm at some point, since it may be slightly overestimating the number of potential NAT boxes.
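A simplified reading of the counting rule in this paragraph shows how the overestimate can arise. This is an assumption-laden sketch, not the author's actual algorithm: it assumes clients at probable pool addresses are keyed by playername and all other clients by IP address, and the function `count_clients` and the sample entries are hypothetical.

```python
from __future__ import annotations

# Default QuakeIII client port range (27960-27963 inclusive).
QUAKE3_CLIENT_PORTS = range(27960, 27964)


def count_clients(entries: list[tuple[str, str, int]],
                  pool_ips: set[str]) -> tuple[int, int]:
    """entries are (playername, ip, port) log records. Returns
    (unique clients, clients with odd ports) under the ASSUMED rule:
    pool address -> one client per playername; other address -> one client per IP.
    A client is 'odd' if any of its IPs was ever seen with an odd port."""
    odd_ips = {ip for _, ip, port in entries if port not in QUAKE3_CLIENT_PORTS}
    client_ips: dict[tuple[str, str], set[str]] = {}
    for name, ip, _ in entries:
        key = ("name", name) if ip in pool_ips else ("ip", ip)
        client_ips.setdefault(key, set()).add(ip)
    odd_clients = sum(1 for ips in client_ips.values() if ips & odd_ips)
    return len(client_ips), odd_clients


if __name__ == "__main__":
    entries = [
        ("Fragger", "198.51.100.7", 27960),
        ("Fragger", "198.51.100.200", 27960),
        ("Sniper", "198.51.100.7", 40321),  # the only odd-port sighting
        ("Lurker", "203.0.113.5", 27960),
    ]
    print(count_clients(entries, {"198.51.100.7", "198.51.100.200"}))
```

In this sample a single odd-port sighting at 198.51.100.7 marks both Fragger and Sniper as odd clients, although Fragger only ever connected from the default port.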


Conclusion

Over a period of 17 days my QuakeIII servers saw 3068 unique IP addresses and 2659 unique playernames (of which 266 playernames were associated with two or more IP addresses). By observing the number of players connecting from non-QuakeIII client UDP port numbers (a NAPT fingerprint) I estimate that 18 to 19% of the QuakeIII playing community are behind NAT boxes.

How these results might generalize to the broader Internet community is currently unclear (to me, at least).


