Home use and NAT penetration in the QuakeIII Community
Grenville Armitage, 5/2/01
Updated 5/13/01


Real-time, interactive games are poised to become a key driver for adding QoS capabilities to last-mile/first-mile consumer IP access services (such as DSL, CableModem, and 56K dial-up). Unfortunately the ISP industry lacks much insight into the demographics of such game players, making it hard to quantify the importance of consumer gamers.  This short report provides a rough, lower bound on the number of QuakeIII players who typically connect in from home. This report also provides an estimate of current NAT/NAPT penetration across the QuakeIII playing community (worth understanding since NAT/NAPT-enabled routers can complicate IP QoS schemes).

I ran two public QuakeIII Arena game servers between mid-February 2001 and the end of April 2001. The servers ran 7x24 on a single machine (600MHz Celeron, FreeBSD4.2, QuakeIII v1.17) hosted at a private residence on the US west coast, connected via a dedicated T1 line to a local, well-connected ISP. Each server had its own separate four-map cycle, ran two 'bots' to attract players, and was limited to a maximum of 6 additional (remote) clients at a time. Server logs collected the player-assigned ASCII playernames, IP addresses, and client side UDP port numbers of every player who joined, and how long they stayed.

The 63 calendar days of server operation generated a table of 7847 distinct playernames and 9068 unique associated IP addresses. 7291 (80.4%) of the IP addresses could be resolved to domain names. I use domain names in conjunction with playernames to distinguish individual clients even when an ISP is assigning different IP addresses to the same client from a dynamic address pool. Where a playername has a set of IP addresses sharing a common domain suffix (differing only in the hostname part of the domain name) I consider this to be a single client. Across all the seen playernames, 8954 unique clients were detected.

Many playernames are just tourists, dropping in briefly but not really constituting serious representation of the QuakeIII gaming community. For example, based on playernames that saw at least 8 minutes of playing time over the 63 days, there were 5081 unique clients detected. 3229 of these clients had associated domain names under the non-regional, top level domains ".net", ".com", ".edu", and ".org". Of these, 2289 clients (approximately 71% of the non-regional clients, or 45.1% of all clients) appeared to be home users.

The penetration of NAT/NAPT functionality in the QuakeIII community was assessed by noting the percentage of clients who connected from 'odd' UDP ports. I considered such clients to represent players sitting behind NAT/NAPT boxes (since QuakeIII clients by default transmit packets from a well known UDP port).

The rest of this report describes how I define a 'client' and 'home user', the distribution of domain names of the clients seen using the server, and finally the number of clients who appear to be home users. My results aren't rigorously defended, and there are weaknesses in my methodology. Please feel free to provide suggestions that you think would improve the analysis.



Defining a client

For my purposes, a client is a game-playing host attached to the Internet. The first challenge is identifying actual clients based on the information logged by the QuakeIII server (each player's self-chosen ASCII playername, and their IP address).

Treating distinct IP addresses as separate clients fails to account for clients whose ISP allocates their addresses dynamically (either through DHCP or PPPoE) from a common address pool. A single 'client' player might appear across a number of IP addresses, or the same IP address might be re-assigned to different 'client' players over a period of time if they share a common ISP. In an attempt to account for such scenarios, I use both ASCII playernames and IP addresses to distinguish and count clients, and consider the following playername and IP address permutations:

The first case indicates a unique client. I assume the second case is a variation of the first. Because players tend to pick, and stick with, one expressive playername, the second case represents multiple clients where the IP address just happens to have been re-used by the ISP (one client per unique <playername,ipaddress> pair). The third case is more complex, and represents a mix of clients whose ISPs use dynamic IP address assignment and clients who just happen to use the same playername from different parts of the Internet.

I infer dynamic address pools from the domain names associated with each logged IP address. I noticed that ISPs using dynamic address assignments seem to group the domain names of such addresses under a common domain suffix (such as "XXX.dsl.snfc21.pacbell.net" or "XXX.dsl.mindspring.com", where XXX is a hostname component associated with the dynamically assigned IP address). Thus, if a playername is seen logging in from multiple IP addresses, and these addresses fall under a common domain suffix (differing only in the hostname part) I consider that <playername, common_suffix> tuple to represent a single client.

Reverse-lookups occur in the .in-addr.arpa meta-domain (using gethostbyaddr()), and failed for some of the logged IP addresses. For lack of a better solution, where a playername had one or more unresolved IP addresses associated with it I treated each <playername, unresolved_address> tuple as a distinct client.
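The client-counting rules above can be sketched in a few lines of Python. This is a minimal illustration, not the script used for this report: the function names, the tuple layout of the log entries, and the example playernames and addresses are all hypothetical, and it assumes the "hostname part" of a domain name is simply its leftmost label. Python's socket.gethostbyaddr() stands in for the C library call of the same name.

```python
import socket


def reverse_lookup(ip_address):
    """Return the domain name for ip_address, or None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip_address)
        return hostname
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def common_suffix(hostname):
    """Drop the leftmost label (assumed here to be the hostname part)."""
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[1:]) if len(labels) > 1 else labels[0]


def count_clients(log_entries, ignored_names=("UnnamedPlayer",)):
    """Return the set of distinct clients found in an iterable of
    (playername, ip_address, hostname_or_None) tuples."""
    clients = set()
    for playername, ip_address, hostname in log_entries:
        if playername in ignored_names:
            continue  # default name of unconfigured clients, ignored
        if hostname is None:
            # Unresolved address: one client per <playername, address>.
            clients.add((playername, "address", ip_address))
        else:
            # Resolved address: one client per <playername, common_suffix>.
            clients.add((playername, "suffix", common_suffix(hostname)))
    return clients


# Hypothetical log entries using documentation-only IP addresses.
entries = [
    ("Fragger", "192.0.2.10", "adsl-10.dsl.mindspring.com"),
    ("Fragger", "192.0.2.77", "adsl-77.dsl.mindspring.com"),
    ("Fragger", "198.51.100.5", None),
    ("Camper", "192.0.2.10", "adsl-10.dsl.mindspring.com"),
    ("UnnamedPlayer", "203.0.113.9", None),
]
print(len(count_clients(entries)))  # 3
```

The two resolved "Fragger" entries collapse into one client because they share a suffix, the unresolved "Fragger" address counts separately, and "Camper" is a distinct client even though the IP address was seen before.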

One notable exception to the above rules is the playername "UnnamedPlayer" - the default playername used by unconfigured QuakeIII clients. Because clients scattered all over the Internet were seen logging in briefly with "UnnamedPlayer" before changing their names to something unique, I chose to ignore statistics associated with the UnnamedPlayer. [Including the UnnamedPlayer in my results would have increased the number of seen IP addresses to 9493, with 7541 (79.4%) resolved to domain names.]

Finally, a client is only relevant if the associated playername has been seen playing for at least N minutes (summed over all the appearances this playername made during the 63 days of server operation). When N is 0 minutes, the analysis includes clients that simply logged in, looked around, and left ('tourists'). Setting N to some non-zero number of minutes weeds out the impact of tourists and increases the focus on clients who actually represent regular QuakeIII players.
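As a rough sketch of the tourist filter, assuming the server log can be reduced to (playername, minutes_played) pairs, one per appearance (the function name and the sample data are hypothetical):

```python
from collections import defaultdict


def playernames_over_threshold(sessions, min_minutes):
    """Return the playernames whose summed playing time is at least
    min_minutes. sessions is an iterable of (playername, minutes)."""
    totals = defaultdict(float)
    for playername, minutes in sessions:
        totals[playername] += minutes
    return {name for name, total in totals.items() if total >= min_minutes}


# Hypothetical appearances over the logging period.
sessions = [("Fragger", 3.0), ("Fragger", 6.5), ("Tourist", 0.5)]
print(sorted(playernames_over_threshold(sessions, 0)))  # ['Fragger', 'Tourist']
print(sorted(playernames_over_threshold(sessions, 8)))  # ['Fragger']
```

Time is summed per playername rather than per client, matching the definition above; with N set to 0 every playername, tourists included, passes the filter.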

I've placed the list of seen playernames on a separate page.



Defining 'home user'

I consider the difference between a 'work' and a 'home' machine as:

To differentiate between 'home' and 'work' machines, I analyse the domain name (or common suffix) associated with each client's IP address (or addresses).

The top level domain (TLD) alone doesn't directly help determine 'home' vs 'work'. For example, although ".com" nominally represents commercial (aka, 'work') addresses, some ISPs serve their consumer customers with IP addresses that reverse-map to ".com" names (the ".home.com" subdomain being an example in my data set). Similarly, country-specific TLDs (for example, ".au" or ".uk") only tell me the client's IP address is owned by an ISP who chooses to register their domain names under a regional naming registrar.

Fortunately, many ISPs name their internal sub-domains in such a way that we can sometimes infer whether an IP address originates in a home or business. (Lacking detailed information about every ISP's topology and customer base makes it hard to be precise, but it is better than nothing.) I compiled a list of domains that I believe represent 'home' accounts, and compared all the clients against this 'home-domains' list. Clients whose addresses fall under one of the home-domains are considered to be 'home users', and if not then they are 'work users'.
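The comparison against the home-domains list amounts to a suffix match on whole domain labels. The sketch below assumes exactly that; the function names are hypothetical, and the example list is illustrative only (".home.com" is mentioned above, while "dsl.mindspring.com" is assumed here purely for the example and is not necessarily on the real list).

```python
def is_under(domain, suffix):
    """True if domain equals suffix or is a sub-domain of it."""
    domain = domain.lower().rstrip(".")
    suffix = suffix.lower().strip(".")
    return domain == suffix or domain.endswith("." + suffix)


def classify(domain, home_domains):
    """Label a client's domain name (or common suffix) 'home' or 'work'."""
    if any(is_under(domain, suffix) for suffix in home_domains):
        return "home"
    return "work"


# Illustrative list only, not the list used for this report.
HOME_DOMAINS = ["home.com", "dsl.mindspring.com"]

print(classify("c123.sfba.home.com", HOME_DOMAINS))     # home
print(classify("gateway.example.com", HOME_DOMAINS))    # work
print(classify("www.notreallyhome.com", HOME_DOMAINS))  # work
```

Matching on a leading "." keeps a name like "notreallyhome.com" from being mistaken for a sub-domain of "home.com".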



Detecting NAT/NAPT

The existence of a NAT box somewhere along a packet's path can be deduced from the existence of packets whose TCP/UDP port numbers have been modified in transit (a typical NAPT 'fingerprint'). Fortunately, QuakeIII clients by default use one of four constant, well known UDP source port numbers. I modified the QuakeIII server slightly to log the UDP ports from which each player connected, and treated QuakeIII clients who joined from a UDP port outside the range 27960-27963 as likely NAT/NAPT candidates.

(An earlier report on the penetration of NAT boxes used a similar detection scheme, but a totally different method of classifying and counting clients.)
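The port test itself is a one-line range check. A minimal sketch, assuming the logged UDP source port is available as an integer (the constant and function names are hypothetical):

```python
# Default QuakeIII client source ports named above: 27960 to 27963 inclusive.
DEFAULT_CLIENT_PORTS = range(27960, 27964)


def looks_natted(udp_source_port):
    """True if the source port lies outside the default client range,
    which is treated here as a likely NAT/NAPT fingerprint."""
    return udp_source_port not in DEFAULT_CLIENT_PORTS


print(looks_natted(27960))  # False
print(looks_natted(1025))   # True
```

The check is only a heuristic: a NAT box that happens to preserve the source port goes undetected, and a player who manually reconfigures the client port is flagged without any NAT being present.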



How many, and where from?
Updated 5/13/01

Judging from all the resolved top level domains, my servers saw clients from many locations around the world. However, by far the majority fell under the ".com" and ".net" domains. Additionally, the distribution of clients in each domain shifts as we exclude players who were being 'tourists' (clients logged in for less than a few minutes).

The following table summarizes the total number of clients inferred from playernames who logged more than N minutes (for N equal to 0, 4, 16, and 40), the number of those clients inferred from resolved IP addresses, and the number of such clients whose IP addresses resolved to domain names under the non-regional domains of ".com", ".net", and ".org".

Playername logged   Total number   Clients from         Clients resolved to
time (min)          of clients     resolved addresses   non-regional domains
0                   8954           6774                 5335 (78.8%)
4                   6085           4656                 3858 (82.9%)
16                  3723           2769                 2366 (85.4%)
40                  2001           1414                 1242 (87.8%)

The relative contribution of clients from the ".net" domain increases as we exclude tourists, while the ".com" domain shifts only slightly (the table shows % relative to the total number of clients whose IP addresses could be resolved to domain names). The total contribution of ".com" and ".net" rises, suggesting that regional (most likely non-US) clients were more likely to leave without playing much (unsurprising - the latency would be poor for clients who aren't well connected to the US West Coast).

Playername logged   'Resolved' clients   'Resolved' clients   Both .com
time (min)          under .com           under .net           and .net
0                   2610 (38.5%)         2556 (37.7%)         5116 (75.5%)
4                   1886 (40.5%)         1860 (39.9%)         3746 (80.4%)
16                  1154 (41.7%)         1137 (41.1%)         2291 (82.7%)
40                  578 (40.9%)          627 (44.3%)          1205 (85.2%)

(If you'd like to look at a partial domain suffix tree, here's the domain name tree for IP addresses associated with playernames seen for at least 16 minutes.)
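The per-TLD percentages in the tables above are shares of the clients whose addresses resolved. A small sketch of that tally, assuming one resolved domain name per client (the function name and the sample names are hypothetical):

```python
from collections import Counter


def tld_shares(resolved_domains):
    """Map each top level domain to (count, percent of resolved clients)."""
    counts = Counter(
        name.lower().rstrip(".").rsplit(".", 1)[-1] for name in resolved_domains
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tld: (n, round(100.0 * n / total, 1)) for tld, n in counts.items()}


# Hypothetical resolved names, one per client.
names = [
    "c123.sfba.home.com",
    "adsl-10.dsl.mindspring.com",
    "adsl-7.dsl.snfc21.pacbell.net",
    "dialup-3.example.net.au",
]
print(tld_shares(names))
```

A regional name such as "example.net.au" is counted under ".au", not ".net", because only the final label is used.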



So how many clients are HOME users?

I do not trust my insights into the structure of regional ISPs, and they are only a small percentage of my players, so I prefer to estimate the percentage of home users relative to the number of clients seen under the non-regional ".com", ".net", ".edu", and ".org" domains. As mentioned earlier, I compiled a list of non-regional domains that I believe represent 'home' accounts, and compared all the clients against this 'home-domains' list. Clients whose addresses fall under one of the home-domains are considered to be 'home users'.

When calculating the percentage of clients who are home users, we should account for the clients whose IP addresses couldn't be resolved to domain names. We could just assume the home/work distribution will be the same as for clients whose domain names could be resolved. However, it is also plausible that unresolvable addresses are unresolvable precisely because they *are* work addresses, and therefore represent work clients (private companies being less likely to register .in-addr.arpa reverse mappings for all their externally visible IP addresses).

The following table shows the count of clients falling under US-centric 'home' domains. The percentage values reflect the number of home users relative to the number of clients with resolved addresses in the non-regional domains, and relative to the total number of clients (counting those with un-resolved address as non-home clients) respectively.

Playername logged   Clients resolved to    Non-regional clients resolved   Home clients as %
time (min)          non-regional domains   to 'Home' domains               of all clients
0                   5335                   3624 (67.9%)                    40.5%
4                   3858                   2684 (69.6%)                    44.1%
8                   3229                   2289 (70.9%)                    45.1%
16                  2366                   1707 (72.1%)                    45.9%
32                  1459                   1073 (73.5%)                    45.6%
40                  1242                   912 (73.4%)                     45.6%

If you believe the unresolved IP address client distribution is similar to the resolved address client distribution, then approximately 71% of my QuakeIII clients were home users.

If you believe the unresolved IP address clients represent non-home users, then at least 40% of my QuakeIII clients were home users.

(By their very nature we cannot infer whether clients with unresolvable IP addresses are regional or non-regional. As a consequence, the "% of all clients" column reflects the ratio of clients resolved under non-regional 'home' domains to all the clients seen from around the world by my QuakeIII server. I think it is fair to see these percentages as a lower bound on the % of home users.)
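The two percentages in each row of the table come from dividing the same home-client count by two different denominators. A minimal sketch of that arithmetic, using figures from the table and the total client counts reported elsewhere in this report (the function name is hypothetical):

```python
def home_user_bounds(home_clients, nonregional_resolved, all_clients):
    """Return (lower, upper) percentage estimates of home users.

    lower: unresolved and regional clients all counted as non-home.
    upper: unresolved clients assumed to follow the resolved distribution.
    """
    lower = round(100.0 * home_clients / all_clients, 1)
    upper = round(100.0 * home_clients / nonregional_resolved, 1)
    return lower, upper


# N = 0 minutes: 3624 home clients, 5335 non-regional, 8954 clients in total.
print(home_user_bounds(3624, 5335, 8954))  # (40.5, 67.9)
# N = 8 minutes: 2289 home clients, 3229 non-regional, 5081 clients in total.
print(home_user_bounds(2289, 3229, 5081))  # (45.1, 70.9)
```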



How many NAT/NAPT clients?

The percentage of NAT/NAPT clients across all clients is shown in the following table. It would appear that around 22-25% of the QuakeIII playing community are using (or being subject to the influence of) NAT boxes. It would also appear that dedicated players are slightly more likely to have a NAT box (judging from the increase in percentage as tourists are eliminated).

Playername logged   Total number   Total NAT/NAPT
time (min)          of clients     clients
0                   8954           1990 (22.2%)
4                   6085           1347 (22.1%)
8                   5081           1154 (22.7%)
16                  3723           874 (23.5%)
32                  2351           590 (25.1%)
40                  2001           512 (25.6%)

Because unresolved IP addresses add some uncertainty to the 'client' determination algorithm, the following table breaks down the NAT/NAPT clients into those identified from resolved IP addresses and those identified from unresolved IP addresses. The percentage values are relative to the total number of resolved and unresolved clients respectively.

Playername logged   'Resolved'   'Resolved'         'Unresolved'   'Unresolved'
time (min)          clients      NAT/NAPT clients   clients        NAT/NAPT clients
0                   6774         1324 (19.5%)       2180           666 (30.6%)
4                   4656         899 (19.3%)        1429           448 (31.4%)
8                   3859         754 (19.5%)        1222           400 (32.7%)
16                  2769         552 (19.9%)        954            322 (33.8%)
32                  1675         341 (20.4%)        676            249 (36.8%)
40                  1414         287 (20.3%)        587            225 (38.3%)

There's definitely a greater proportion of NAT boxes among the people with unresolvable IP addresses, but since we can't be entirely sure about why these addresses are unresolvable it is hard to make any strong hypotheses. What is nice to see is that there's no huge discrepancy between the percentages of NAT calculated for resolved-address and unresolved-address client pools.

The percentage NAT/NAPT penetration values obtained here for the resolved-address clients seem to match closely with results obtained in an earlier report on NAT penetration (which used a different algorithm to determine 'clients').
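The resolved and unresolved pools add up to the overall NAT/NAPT figures given earlier in this section, which can be checked with a few lines of arithmetic. A sketch using the N = 0 row of the tables above (the function name and the dictionary keys are hypothetical):

```python
def nat_percentages(resolved, resolved_nat, unresolved, unresolved_nat):
    """Return NAT/NAPT percentages for each pool and for both combined."""
    def pct(part, whole):
        return round(100.0 * part / whole, 1)

    return {
        "resolved": pct(resolved_nat, resolved),
        "unresolved": pct(unresolved_nat, unresolved),
        "overall": pct(resolved_nat + unresolved_nat, resolved + unresolved),
    }


# N = 0 minutes: 6774 resolved clients (1324 NAT), 2180 unresolved (666 NAT).
print(nat_percentages(6774, 1324, 2180, 666))
# {'resolved': 19.5, 'unresolved': 30.6, 'overall': 22.2}
```

Because the resolved pool is roughly three times the size of the unresolved pool, the overall figure sits much closer to the resolved percentage.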



Discussion and Caveats

This analysis depends significantly on drawing conclusions from inconclusive data. There are a number of considerations to keep in mind:



Conclusion

Using two public QuakeIII servers to attract 8954 game players over a period of 63 calendar days I was able to estimate that between 40% and 73% of the game playing community are logging in from home accounts. The lower bound is due to the uncertainty surrounding approximately 20% of the 9068 logged IP addresses that cannot be resolved back into domain names. The upper bound depends on the accuracy of my list of home-domains, and how aggressively we weed out 'tourist' players. NAT/NAPT penetration was estimated at between 22% and 25% of the playing population.

