Friday, February 15, 2008

HTTP, an Application Protocol

Now let's get back to our example. Web browsers and servers speak an application protocol that runs on top of TCP/IP, using it simply as a way to pass strings of bytes back and forth. This protocol is called HTTP (Hyper-Text Transfer Protocol) and we've already seen one command in it — the GET command from our earlier example.

When the GET command goes to the web server at www.google.com with service number 80, it is dispatched to a server daemon listening on port 80. Most Internet services are implemented by server daemons that do nothing but wait on ports, watching for and executing incoming commands.

If the design of the Internet has one overall rule, it's that all the parts should be as simple and human-accessible as possible. HTTP and its relatives (like the Simple Mail Transfer Protocol, SMTP, which is used to move electronic mail between hosts) tend to use simple printable-text commands that end with a carriage-return/line-feed pair.

This is marginally inefficient; in some circumstances you could get more speed by using a tightly-coded binary protocol. But experience has shown that the benefits of having commands be easy for human beings to describe and understand outweigh any marginal gain in efficiency that you might get at the cost of making things tricky and opaque.

Therefore, what the server daemon ships back to you via TCP/IP is also text. The beginning of the response will look something like this (a few headers have been suppressed):

HTTP/1.1 200 OK
Date: Sat, 10 Oct 1998 18:43:35 GMT
Server: Apache/1.2.6 Red Hat
Last-Modified: Thu, 27 Aug 1998 17:55:15 GMT
Content-Length: 2982
Content-Type: text/html

These headers will be followed by a blank line and the text of the web page (after which the connection is dropped). Your browser just displays that page. The headers tell it how (in particular, the Content-Type header tells it the returned data is really HTML).
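
To make the exchange concrete, here is a minimal sketch in Python that speaks HTTP over a raw TCP socket, much as a browser does. It is only an illustration: the request mirrors our HTTP/1.0 example, and a real browser does considerably more (redirects, caching, rendering).

import socket

# Open a TCP connection to the web server's port 80.
sock = socket.create_connection(("www.google.com", 80))

# Send the GET command as plain printable text, ending with a blank line.
sock.sendall(b"GET /index.html HTTP/1.0\r\nHost: www.google.com\r\n\r\n")

# Read the reply until the server drops the connection.
response = b""
while True:
    chunk = sock.recv(4096)
    if not chunk:
        break
    response += chunk
sock.close()

# Headers come first, then a blank line, then the page itself.
headers, _, body = response.partition(b"\r\n\r\n")
print(headers.decode("latin-1"))

Running it prints a status line and headers much like those shown above; the HTML of the page ends up in body.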

Sunday, February 10, 2008

TCP and IP

To understand how multiple-packet transmissions are handled, we need to know that the Internet actually uses two protocols, stacked one on top of the other.

The lower level, IP (Internet Protocol), is responsible for labeling individual packets with the source address and destination address of the two computers exchanging information over a network. For example, when you access http://www.google.com, the packets you send will carry your computer's IP address, such as 192.168.1.101, and the IP address of the www.google.com computer, such as 64.233.167.99. These addresses work in much the same way that your home address works when someone sends you a letter. The post office can read the address and determine where you are and how best to route the letter to you, much like a router does for Internet traffic.

The upper level, TCP (Transmission Control Protocol), gives you reliability. When two machines negotiate a TCP connection (which they do using IP), the receiver knows to send acknowledgements of the packets it sees back to the sender. If the sender doesn't see an acknowledgement for a packet within some timeout period, it resends that packet. Furthermore, the sender gives each TCP packet a sequence number, which the receiver can use to reassemble packets in case they show up out of order. (This can easily happen if network links go up or down during a connection.)
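
Here is a toy Python sketch of the acknowledge-and-retransmit idea: a stop-and-wait sender built on UDP, which (unlike TCP) provides no reliability of its own, so we add sequence numbers, timeouts and resends by hand. The receiver address, chunk size and ACK format are all invented for this sketch, and real TCP is far more sophisticated (sliding windows, adaptive timeouts).

import socket

def send_reliably(data, addr=("127.0.0.1", 9999), timeout=1.0, retries=5):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    # Break the data into packets and tag each with a sequence number.
    chunks = [data[i:i + 512] for i in range(0, len(data), 512)]
    for seq, chunk in enumerate(chunks):
        packet = seq.to_bytes(4, "big") + chunk
        for attempt in range(retries):
            sock.sendto(packet, addr)
            try:
                ack, _ = sock.recvfrom(16)   # wait for an acknowledgement
            except socket.timeout:
                continue                     # no ACK in time: resend
            if int.from_bytes(ack, "big") == seq:
                break                        # acknowledged; next packet
        else:
            raise IOError("gave up after %d tries" % retries)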

TCP/IP packets also contain a checksum to enable detection of data corrupted by bad links. (The checksum is computed from the rest of the packet in such a way that if either the rest of the packet or the checksum is corrupted, redoing the computation and comparing is very likely to indicate an error.) So, from the point of view of anyone using TCP/IP and nameservers, it looks like a reliable way to pass streams of bytes between hostname/service-number pairs. People who write network protocols almost never have to think about all the packetizing, packet reassembly, error checking, checksumming, and retransmission that goes on below that level.
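
To make the checksum idea concrete, here is a small Python sketch of the ones'-complement checksum used in IP and TCP headers (RFC 1071). The sample data is invented; real packets checksum specific header fields rather than arbitrary bytes.

def internet_checksum(data):
    # Sum the data as 16-bit words, folding carry bits back in
    # (ones'-complement arithmetic, per RFC 1071).
    if len(data) % 2:
        data += b"\x00"              # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF           # final complement

packet = b"example packet contents"
print(hex(internet_checksum(packet)))

The receiver redoes the same computation over what arrived; if even one bit was flipped on a bad link, the two results will almost certainly disagree, and the corrupted packet can be dropped and retransmitted.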

Packets and Routers

What the browser wants to do is send a command to the Web server on www.google.com that looks like this:

GET /index.html HTTP/1.0

Here's how that happens. The command is made into a packet, a block of bits like a telegram that is wrapped with three important things: the source address (the IP address of your machine), the destination address (64.233.167.99), and a service number or port number (80, in this case) that indicates that it's a World Wide Web request.

Our machine then ships the packet down the wire (our connection to an ISP or local network) until it gets to a specialized machine called a router. The router has a map of the Internet in its memory — not always a complete one, but one that completely describes your network neighborhood and knows how to get to the routers for other neighborhoods on the Internet.

The packet may pass through several routers on the way to its destination. Routers are smart. They watch how long it takes for other routers to acknowledge having received a packet, and use that information to direct traffic over fast links. They also use it to notice when another router (or a cable) has dropped off the network, and compensate if possible by finding another route.

There's an urban legend that the Internet was designed to survive nuclear war. This is not true, but the Internet's design is extremely good at getting reliable performance out of flaky hardware in an uncertain world. This is directly due to the fact that its intelligence is distributed through thousands of routers rather than concentrated in a few massive and vulnerable switches (like the phone network). This means that failures tend to be well localized and the network can route around them.

Once the packet gets to its destination machine, that machine uses the service number to feed the packet to the web server. The web server can tell where to send its reply by looking at the command packet's source IP address. When the web server returns this document, it will be broken up into a number of packets. The size of the packets will vary according to the transmission media in the network and the type of service.
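
As a sketch of what listening on a service number looks like from the server side, here is a toy Python daemon. A real web server does the same bind-and-listen dance on port 80; we use port 8080 (and a made-up echo protocol) so the sketch can run without special privileges.

import socket

# Claim a service number and wait for incoming connections.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("", 8080))        # "" means every address this machine has
server.listen(5)

while True:
    conn, (source_ip, source_port) = server.accept()
    # TCP hands us the source address from the packets, which is how
    # the server knows where its reply should go.
    print("request from", source_ip, "port", source_port)
    data = conn.recv(1024)     # read the client's command
    conn.sendall(data)         # the reply flows back to the source
    conn.close()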

The Domain Name System

The whole network of programs and databases that cooperates to translate hostnames to IP addresses is called ‘DNS’ (Domain Name System). When you see references to a ‘DNS server’, that means what we just called a nameserver. Internet hostnames are composed of parts separated by dots. A domain is a collection of machines that share a common name suffix. Domains can live inside other domains. For example, the machine www.google.com lives in the .google.com subdomain of the .com domain.

Each domain is defined by an authoritative name server that knows the IP addresses of the other machines in the domain. The authoritative (or ‘primary’) name server may have backups in case it goes down; if you see references to a ‘secondary name server’ (or ‘secondary DNS’), that means one of those backups. Secondaries typically refresh their information from their primaries every few hours, so a change made to the hostname-to-IP mapping on the primary will automatically be propagated.

Now here's the important part. The nameservers for a domain do not have to know the locations of all the machines in other domains (including their own subdomains); they only have to know the location of the nameservers. In our example, the authoritative name server for the .com domain knows the IP address of the nameserver for .google.com, but not the addresses of all the other machines in .google.com.

The domains in the DNS system are arranged like a big inverted tree. At the top are the root servers. Everybody knows the IP addresses of the root servers; they're wired into your DNS software. The root servers know the IP addresses of the nameservers for the top-level domains like .com and .org, but not the addresses of machines inside those domains. Each top-level domain server knows where the nameservers for the domains directly beneath it are, and so forth.

DNS is carefully designed so that each machine can get away with the minimum amount of knowledge it needs to have about the shape of the tree, and local changes to subtrees can be made simply by changing one authoritative server's database of name-to-IP-address mappings.

When we query for the IP address of www.google.com, what actually happens is this: first, your nameserver asks a root server to tell it where it can find a nameserver for .com. Once it knows that, it then asks the .com server to tell it the IP address of a .google.com nameserver. Once it has that, it asks the .google.com nameserver to tell it the address of the host www.google.com.

Most of the time, the nameserver doesn't actually have to work that hard. Nameservers do a lot of caching; when yours resolves a hostname, it keeps the association with the resulting IP address around in memory for a while. This is why, when you surf to a new website, you'll usually only see a message from your browser about "Looking up" the host for the first page you fetch. Eventually the name-to-address mapping expires and DNS has to re-query — this is important so we don't have invalid information hanging around forever when a hostname changes addresses. A cached IP address for a site is also thrown out if the host is unreachable.
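
You can watch this resolution step from Python's standard library, which simply asks whatever nameserver your machine is configured to use. The address returned will, of course, vary with time and location.

import socket

# Resolve a hostname through the system's nameserver.  The first call
# may take a network round trip; repeating it is typically answered
# from a cache somewhere along the way.
address = socket.gethostbyname("www.google.com")
print(address)       # e.g. 64.233.167.99 at the time of writing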

Names and Locations

The first thing the browser has to do is to establish a network connection to the machine where the document lives. To do that, it first has to find the network location of the host www.google.com (‘host’ is short for ‘host machine’ or ‘network host’; www.google.com is a typical hostname). The corresponding location is actually a number called an IP address (we'll explain the ‘IP’ part of this term later).

To do this, the browser queries a program called a name server. The name server may live on your machine, but it's more likely to run on a service machine that yours talks to. When you sign up with an ISP, part of your setup procedure will almost certainly involve telling the Internet software the IP address of a nameserver on the ISP's network.

The name servers on different machines talk to each other, exchanging and keeping up to date all the information needed to resolve hostnames (map them to IP addresses). Your nameserver may query three or four different sites across the network in the process of resolving www.google.com, but this usually happens very quickly (as in less than a second). The nameserver will tell the browser that www.google.com's IP address is 64.233.167.99; knowing this, your machine will be able to exchange bits with www.google.com directly.

Wednesday, February 6, 2008

Disadvantages of Extranet

  1. Extranets can be expensive to implement and maintain within an organization (e.g.: hardware, software, employee training costs) — if hosted internally instead of via an ASP.
  2. Security of extranets can be a big concern when dealing with valuable information. System access needs to be carefully controlled to avoid sensitive information falling into the wrong hands.
  3. Extranets can reduce personal contact (face-to-face meetings) with customers and business partners. This could weaken the connections between people and the company, hurting the business when it comes to the loyalty of its business partners and customers.

Extranet: Industry Uses

During the late 1990s and early 2000s, several industries started to use the term "extranet" to describe central repositories of shared data made accessible via the web only to authorized members of particular work groups.

For example, in the construction industry, project teams could log in to and access a 'project extranet' to share drawings and documents, make comments, issue requests for information, etc. In 2003 in the United Kingdom, several of the leading vendors formed the Network of Construction Collaboration Technology Providers, or NCCTP, to promote the technologies and to establish data exchange standards between the different systems. The same type of construction-focused technology has also been developed in the United States, Australia, Scandinavia, Germany and Belgium, among others. Some applications are offered on a Software as a Service (SaaS) basis by vendors functioning as Application Service Providers (ASPs).

Specially secured extranets are used to provide virtual data room services to companies in several sectors (including law and accountancy).

There are a variety of commercial extranet applications, some of which are for pure file management, while others include broader collaboration and project management tools. A variety of open-source extranet applications and modules also exist, which can be integrated into other online collaborative applications such as content management systems. Companies can use an extranet to:

  • Exchange large volumes of data using Electronic Data Interchange (EDI)
  • Share product catalogs exclusively with wholesalers or those "in the trade"
  • Collaborate with other companies on joint development efforts
  • Jointly develop and use training programs with other companies
  • Provide or access services provided by one company to a group of other companies, such as an online banking application managed by one company on behalf of affiliated banks
  • Share news of common interest exclusively with partner companies

Extranet

An extranet is a private network that uses Internet protocols, network connectivity, and possibly the public telecommunication system to securely share part of an organization's information or operations with suppliers, vendors, partners, customers or other businesses. An extranet can be viewed as the part of a company's intranet that is extended to users outside the company, normally over the Internet. It has also been described as a "state of mind" in which the Internet is perceived as a way to do business with a preapproved set of other companies (business-to-business, B2B), in isolation from all other Internet users. In contrast, business-to-consumer (B2C) involves known server(s) of one or more companies, communicating with previously unknown consumer users.

Briefly, an extranet can be understood as a private intranet mapped onto the Internet or some other transmission system not accessible to the general public, but managed by more than one company's administrator(s). For example, military networks of different security levels may map onto a common military radio transmission system that never connects to the Internet. Any private network mapped onto a public one is a virtual private network (VPN). In contrast, an intranet is a VPN under the control of a single company's administrator(s).

An argument has been made that "extranet" is just a buzzword for describing what institutions have been doing for decades, that is, interconnecting to each other to create private networks for sharing information. One of the differences that characterizes an extranet, however, is that its interconnections are over a shared network rather than through dedicated physical lines. With respect to Internet Protocol networks, RFC 2547 states: "If all the sites in a VPN are owned by the same enterprise, the VPN is a corporate intranet. If the various sites in a VPN are owned by different enterprises, the VPN is an extranet. A site can be in more than one VPN; e.g., in an intranet and several extranets. We regard both intranets and extranets as VPNs. In general, when we use the term VPN we will not be distinguishing between intranets and extranets." Even if this argument is valid, the term "extranet" is still applied and can be used to eliminate the use of the above description.

Another very common use of the term "extranet" is to designate the "private part" of a website, where "registered users" can navigate, enabled by authentication mechanisms on a "login page".

An extranet requires security and privacy. Measures can include firewalls, server management, the issuance and use of digital certificates or similar means of user authentication, encryption of messages, and the use of virtual private networks (VPNs) that tunnel through the public network.

Many technical specifications describe methods of implementing extranets, but rarely define the term explicitly. RFC 3547 presents requirements for remote access to extranets. RFC 2709 discusses extranet implementation using IPsec and advanced network address translation (NAT).

Sunday, February 3, 2008

Today's Internet

Aside from the complex physical connections that make up its infrastructure, the Internet is facilitated by bi- or multi-lateral commercial contracts (e.g., peering agreements), and by technical specifications or protocols that describe how to exchange data over the network. Indeed, the Internet is essentially defined by its interconnections and routing policies.

As of September 30, 2007, 1.244 billion people use the Internet according to Internet World Stats, still less than a fifth of the world's population. Writing in the Harvard International Review, philosopher N.J. Slabbert, a writer on policy issues for the Washington DC-based Urban Land Institute, has asserted that the Internet is fast becoming a basic feature of global civilization, so that what has traditionally been called "civil society" is now becoming identical with information technology society as defined by Internet use.

Internet

The Internet is a worldwide, publicly accessible series of interconnected computer networks that transmit data by packet switching using the standard Internet Protocol (IP). It is a "network of networks" that consists of millions of smaller domestic, academic, business, and government networks, which together carry various information and services, such as electronic mail, online chat, file transfer, and the interlinked web pages and other resources of the World Wide Web (WWW).

History

The USSR's launch of Sputnik spurred the United States to create the Advanced Research Projects Agency, known as ARPA, in February 1958 to regain a technological lead. ARPA created the Information Processing Technology Office (IPTO) to further the research of the Semi Automatic Ground Environment (SAGE) program, which had networked country-wide radar systems together for the first time. J. C. R. Licklider was selected to head the IPTO, and saw universal networking as a potential unifying human revolution.

Licklider moved from the Psycho-Acoustic Laboratory at Harvard University to MIT in 1950, after becoming interested in information technology. At MIT, he served on a committee that established Lincoln Laboratory and worked on the SAGE project. In 1957 he became a Vice President at BBN, where he bought the first production PDP-1 computer and conducted the first public demonstration of time-sharing.

At the IPTO, Licklider recruited Lawrence Roberts to head a project to implement a network. Roberts based the technology on the work of Paul Baran, who had written an exhaustive study for the U.S. Air Force that recommended packet switching (as opposed to circuit switching) to make a network highly robust and survivable. After much work, the first two nodes of what would become the ARPANET were interconnected between UCLA and SRI International in Menlo Park, California, on October 29, 1969. The ARPANET was one of the "eve" networks of today's Internet.

Following the demonstration that packet switching worked on the ARPANET, the British Post Office, Telenet, DATAPAC and TRANSPAC collaborated to create the first international packet-switched network service, referred to in the UK as the International Packet Switched Service (IPSS), in 1978. The collection of X.25-based networks grew from Europe and the US to cover Canada, Hong Kong and Australia by 1981. The X.25 packet-switching standard was developed in the CCITT (now called ITU-T) around 1976. X.25 was independent of the TCP/IP protocols that arose from the experimental work of DARPA on the ARPANET, Packet Radio Net and Packet Satellite Net during the same time period.

Vinton Cerf and Robert Kahn developed the first description of the TCP protocols during 1973 and published a paper on the subject in May 1974. Use of the term "Internet" to describe a single global TCP/IP network originated in December 1974 with the publication of RFC 675, the first full specification of TCP, written by Vinton Cerf, Yogen Dalal and Carl Sunshine, then at Stanford University. During the next nine years, work proceeded to refine the protocols and to implement them on a wide range of operating systems.

The first TCP/IP-based wide-area network was operational by January 1, 1983, when all hosts on the ARPANET were switched over from the older NCP protocols to TCP/IP. In 1985, the United States' National Science Foundation (NSF) commissioned the construction of a university 56 kilobit/second network backbone using computers called "fuzzballs" by their inventor, David L. Mills. The following year, NSF sponsored the development of a higher-speed 1.5 megabit/second backbone that became the NSFNet. A key decision to use the DARPA TCP/IP protocols was made by Dennis Jennings, then in charge of the Supercomputer program at NSF.

The opening of the network to commercial interests began in 1988. The US Federal Networking Council approved the interconnection of the NSFNET to the commercial MCI Mail system in that year, and the link was made in the summer of 1989. Other commercial electronic mail services were soon connected, including OnTyme, Telemail and CompuServe. In that same year, three commercial Internet Service Providers were created: UUNET, PSINet and CERFnet.

Important, separate networks that offered gateways into, and later merged with, the Internet include Usenet and BITNET. Various other commercial and educational networks, such as Telenet, Tymnet, CompuServe and JANET, were interconnected with the growing Internet. Telenet (later called Sprintnet) was a large privately-funded national computer network with free dial-up access in cities throughout the U.S. that had been in operation since the 1970s. This network was eventually interconnected with the others in the 1980s as the TCP/IP protocol became increasingly popular. The ability of TCP/IP to work over virtually any pre-existing communication network allowed for great ease of growth, although the rapid growth of the Internet was due primarily to the availability of commercial routers from companies such as Cisco Systems, Proteon and Juniper, the availability of commercial Ethernet equipment for local-area networking, and the widespread implementation of TCP/IP on the UNIX operating system.

Although the basic applications and guidelines that make the Internet possible had existed for almost a decade, the network did not gain a public face until the 1990s. On August 6, 1991, CERN, which straddles the border between France and Switzerland, publicized the new World Wide Web project. The Web was invented by English scientist Tim Berners-Lee in 1989.

An early popular web browser was ViolaWWW, which was based upon HyperCard. It was eventually replaced in popularity by the Mosaic web browser. In 1993, the National Center for Supercomputing Applications at the University of Illinois released version 1.0 of Mosaic, and by late 1994 there was growing public interest in the previously academic/technical Internet. By 1996 usage of the word "Internet" had become commonplace, and consequently, so had its misuse as a reference to the World Wide Web.

Meanwhile, over the course of the decade, the Internet successfully accommodated the majority of previously existing public computer networks (although some networks, such as FidoNet, have remained separate). During the 1990s, it was estimated that the Internet grew by 100% per year, with a brief period of explosive growth in 1996 and 1997. This growth is often attributed to the lack of central administration, which allows organic growth of the network, as well as the non-proprietary open nature of the Internet protocols, which encourages vendor interoperability and prevents any one company from exerting too much control over the network.

New findings in the field of communications during the 1960s, 1970s and 1980s were quickly adopted by universities across the United States.

Examples of early university Internet communities are Cleveland FreeNet, Blacksburg Electronic Village and NSTN in Nova Scotia. Students took up the opportunity of free communications and saw this new phenomenon as a tool of liberation. Personal computers and the Internet would free them from corporations and governments (Nelson, Jennings, Stallman).

Graduate students played a huge part in the creation of the ARPANET. In the 1960s, the Network Working Group, which did most of the design for the ARPANET's protocols, was composed mainly of graduate students.