Sunday, February 10, 2008

The Domain Name System

The whole network of programs and databases that cooperates to translate hostnames to IP addresses is called ‘DNS’ (Domain Name System). When you see references to a ‘DNS server’, that means what we just called a nameserver. Internet hostnames are composed of parts separated by dots. A domain is a collection of machines that share a common name suffix. Domains can live inside other domains. For example, the machine www.google.com lives in the .google.com subdomain of the .com domain.

Each domain is defined by an authoritative name server that knows the IP addresses of the other machines in the domain. The authoritative (or ‘primary') name server may have backups in case it goes down; if we see references to a secondary name server or (‘secondary DNS') it's talking about one of those. These secondaries typically refresh their information from their primaries every few hours, so a change made to the hostname-to-IP mapping on the primary will automatically be propagated.

Now here's the important part. The nameservers for a domain do not have to know the locations of all the machines in other domains (including their own subdomains); they only have to know the location of the nameservers. In our example, the authoritative name server for the .com domain knows the IP address of the nameserver for .google.com but not the address of all the other machines in .google.com.

The domains in the DNS system are arranged like a big inverted tree. At the top are the root servers. Everybody knows the IP addresses of the root servers; they're wired into your DNS software. The root servers know the IP addresses of the nameservers for the top-level domains like .com and .org, but not the addresses of machines inside those domains. Each top-level domain server knows where the nameservers for the domains directly beneath it are, and so forth.

DNS is carefully designed so that each machine can get away with the minimum amount of knowledge it needs to have about the shape of the tree, and local changes to subtrees can be made simply by changing one authoritative server's database of name-to-IP-address mappings.

When we query for the IP address of www.google.com, what actually happens is this: First, your nameserver asks a root server to tell it where it can find a nameserver for .com. Once it knows that, it then asks the .org server to tell it the IP address of a .google.com nameserver. Once it has that, it asks the .google.com nameserver to tell it the address of the host www.google.com.

Most of the time, the nameserver doesn't actually have to work that hard. Nameservers do a lot of cacheing; when yours resolves a hostname, it keeps the association with the resulting IP address around in memory for a while. This is why, when you surf to a new website, you'll usually only see a message from your browser about "Looking up" the host for the first page you fetch. Eventually the name-to-address mapping expires and DNS has to re-query — this is important so we don't have invalid information hanging around forever when a hostname changes addresses. Our cached IP address for a site is also thrown out if the host is unreachable.

No comments: