Copyright © 2000 by Greg Reddick
The Internet can be viewed as a collection of computers, connected by wire,
fiber-optic cable, and wireless links, that all understand a single protocol
for exchanging data called the Internet Protocol or IP. A protocol
is a way of arranging data so that it can be understood by anything else that
understands the same protocol. It is like a common language; anyone who speaks
Spanish can communicate with anyone else who also speaks Spanish.

Each machine on the Internet is given an address in the form of a four-byte
number called an IP address. The IP address, when written out, is generally
expressed as the decimal values of the bytes separated by periods, such as
207.46.230.219.
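
As a small illustration of the notation above, the following Python sketch
converts between the four raw bytes and the dotted form. The function names
are made up for this example.

```python
def bytes_to_dotted(raw: bytes) -> str:
    """Format a four-byte IP address as decimal values separated by periods."""
    if len(raw) != 4:
        raise ValueError("an IP address of this kind is exactly four bytes")
    return ".".join(str(b) for b in raw)


def dotted_to_bytes(text: str) -> bytes:
    """Parse dotted decimal text back into the four raw bytes."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("expected four numbers separated by periods")
    # bytes() raises ValueError for any value outside 0..255
    return bytes(int(p) for p in parts)


address = bytes([207, 46, 230, 219])
print(bytes_to_dotted(address))  # prints 207.46.230.219
```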

The IP address for a computer can either be permanently attached to the
particular computer, which is called a Static IP Address, or assigned to
the computer when it connects to the Internet, called a Dynamic IP Address.
Most dial-in services, such as America Online, use Dynamic IP
addresses, whereas machines such as web servers and email servers use Static IP
addresses.

A series of bytes is packaged up using the Internet Protocol into a packet and
moved from one machine to another over the Internet. The Internet Protocol adds
some bytes to the front of the packet, called a packet header. Among the
information in the packet header is the IP address of the machine that the
packet should eventually reach.

When a machine gets a packet, it looks at the destination IP address, looks it
up in its routing tables, and based on that information sends the packet on to
another computer. This continues until the packet either reaches its
destination computer or a timeout is reached (the packet header carries a Time
to Live value that runs down as the packet travels). If the timeout is reached,
the packet is simply discarded. [http://www.ietf.org/rfc/rfc0791.txt]
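
The table lookup and the timeout can be sketched roughly as follows. This is a
simplified model, not how any particular router is implemented: the routing
table entries and next-hop names are invented for the example, and the packet
is a plain dictionary rather than a real IP header. The timeout is modeled as
a counter that each machine decrements, which is how RFC 791 describes the
Time to Live field.

```python
import ipaddress

# Hypothetical routing table: (destination network, name of the next machine).
ROUTES = [
    (ipaddress.ip_network("207.46.0.0/16"), "router-b"),
    (ipaddress.ip_network("0.0.0.0/0"), "upstream-isp"),  # default route
]


def next_hop(destination: str) -> str:
    """Pick the most specific table entry that contains the destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    net, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop


def forward(packet: dict):
    """Return (next hop, updated packet), or None if the packet is dropped."""
    ttl = packet["ttl"] - 1
    if ttl <= 0:
        return None  # timeout reached: the packet is discarded
    return next_hop(packet["dst"]), {**packet, "ttl": ttl}


print(forward({"dst": "207.46.230.219", "ttl": 5}))
print(forward({"dst": "207.46.230.219", "ttl": 1}))  # prints None
```
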

Each machine connected to the Internet has an Internet Service Provider or ISP.
The Internet Service Provider provides the connection between the local machine
and the Internet as a whole. The ISP also generally provides other services
such as Domain Name Service and email. For this service, the user of the local
machine generally pays a monthly or annual fee.

Several ISPs connect to a higher-level ISP, which eventually connects to a sort
of super-ISP called a Backbone.

The backbone computers are connected to each other. So a local machine's packet
will be sent up the chain of ISPs until it gets to a machine that is the parent
of the machine the packet is meant for, whereupon it is sent down through the
ISPs until it reaches the destination IP address. If, instead, the packet
reaches a backbone computer that isn't a parent of the destination computer, it
is redirected to another backbone that is the parent of the destination
machine. All this exchange happens very quickly, so it generally takes a packet
less than a second to go from one computer to another, even if it is on the
other side of the planet.

Because IP addresses are hard to remember, a scheme was formulated to add a
friendlier name to some IP addresses, called a Domain Name (or sometimes simply
a Domain). An example of a Domain Name is MICROSOFT.COM, which gets turned into
the IP address 207.46.230.219. Generally only Static IP addresses get Domain
Names assigned to them.

First-level domains are .COM, .NET, .ORG, .EDU, .GOV, .MIL, and .INT. There are
also two-letter first-level domains for each country; for example, ES for
Spain. Second-level domains are names such as MICROSOFT.COM. Third-level
domains are names such as WWW.MICROSOFT.COM. Second-level domains are rented
from a Domain Name Registry, such as Network Solutions, Inc.

Each local computer attached to the Internet has a designated machine that it
uses for Domain Name resolution, called a Domain Name Service or DNS, usually
provided by the ISP. The Domain Name Service turns a Domain Name into an IP
Address when asked. When a DNS is asked to resolve a Domain Name, such as
WWW.MICROSOFT.COM, into an IP address, it talks to a machine called a Domain
Server. A Domain Server is a machine that keeps a table of Domain Names and
their corresponding IP Addresses. A Domain Server may either keep that
information directly or delegate the responsibility for keeping that
information to another Domain Server.

For example, a request for the IP Address of WWW.MICROSOFT.COM will be passed to
the DNS, which asks the COM Domain Server for the information. The COM Domain
Server has delegated that information to the MICROSOFT.COM Domain Server. This
Domain Server has, in turn, delegated that information to the
WWW.MICROSOFT.COM Domain Server. This last server keeps the information on the
IP Address for WWW.MICROSOFT.COM, 207.46.230.219, and returns that information
to the DNS, which forwards it on to the local computer. As this information
changes infrequently, if ever, the DNS also caches the information it
retrieved, so that it doesn't have to look it up again if another request for
WWW.MICROSOFT.COM is made.
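
The chain of delegation and the caching step can be modeled in a few lines of
Python. This is only a toy model of the walk-through above: the Domain Servers
are dictionaries in memory, no network traffic is involved, and the table
layout ("records" and "delegations") is an assumption made for the example
rather than anything defined by the DNS specifications.

```python
# Hypothetical Domain Servers, keyed by the domain each one is responsible for.
SERVERS = {
    "COM": {"records": {}, "delegations": {"MICROSOFT.COM": "MICROSOFT.COM"}},
    "MICROSOFT.COM": {
        "records": {},
        "delegations": {"WWW.MICROSOFT.COM": "WWW.MICROSOFT.COM"},
    },
    "WWW.MICROSOFT.COM": {
        "records": {"WWW.MICROSOFT.COM": "207.46.230.219"},
        "delegations": {},
    },
}

cache = {}  # what the DNS remembers from earlier lookups


def resolve(name: str) -> str:
    """Resolve a Domain Name by following delegations from the first level."""
    name = name.upper()
    if name in cache:
        return cache[name]
    server = name.rsplit(".", 1)[-1]  # start at the first-level Domain Server
    while True:
        table = SERVERS.get(server)
        if table is None:
            raise LookupError(f"no Domain Server for {server}")
        if name in table["records"]:
            cache[name] = table["records"][name]
            return cache[name]
        for suffix, delegate in table["delegations"].items():
            if name == suffix or name.endswith("." + suffix):
                server = delegate  # ask the server this one delegated to
                break
        else:
            raise LookupError(f"{name} not found")


print(resolve("www.microsoft.com"))  # prints 207.46.230.219
```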

When a new second-level domain is registered with the Domain Name Registry, the
registration must specify the Domain Server to which the first-level domain
will delegate the resolving of the second-level and higher Domain Names. The
ISP generally provides all second-level and higher Domain Servers, as well as
the DNS. [http://www.ietf.org/rfc/rfc1034.txt]

When the destination machine gets the packet, it must know what it should do
with it. So additional protocols are added to the packet. The most common
protocol is the Transmission Control Protocol or TCP. This is an additional
packet header added to the packet that tells the destination machine how to
take a series of packets and put them together to form a larger message.
[http://www.ietf.org/rfc/rfc0793.txt]

Another common protocol is the User Datagram Protocol or UDP, which is
used for single packet exchanges between machines. [http://www.ietf.org/rfc/rfc0768.txt]

The Internet is frequently thought to use a combination of TCP and IP, and is
commonly referred to as TCP/IP, although in reality it also uses UDP/IP and
other protocols as well.
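
To illustrate how a header can let the receiver put a series of packets back
together, here is a deliberately simplified Python sketch. It assumes each
packet carries a sequence number giving the position of its first byte in the
message, and it ignores everything else that real TCP does (acknowledgements,
retransmission, connection setup, and so on).

```python
def reassemble(segments):
    """Rebuild a message from (sequence_number, payload) pairs in any order."""
    message = b""
    for seq, payload in sorted(segments):
        if seq != len(message):
            raise ValueError("a packet is missing or overlaps another")
        message += payload
    return message


# Packets may arrive out of order; the sequence numbers restore the order.
arrived = [(5, b" world"), (0, b"hello")]
print(reassemble(arrived))  # prints b'hello world'
```
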

When a TCP or UDP packet is sent from one machine to another, it is
addressed to a number on the destination machine called a port.
A port is simply a number between 1 and 65535 on which a single program on the
machine may wait for packets to arrive. A packet is also designated as
coming from a port, and the receiving program may send information back to that
original port if it needs to.
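
The following sketch shows ports in action using Python's standard socket
module and UDP on the local machine only. One socket waits on a port, a second
socket sends it a packet, and the receiver replies to the port the packet came
from. Asking for port 0 so that the operating system picks a free port is a
convenience for the example, not something the text above requires.

```python
import socket

# The receiving program waits for packets on a port of the local machine.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0: let the operating system choose
receiver.settimeout(2.0)
port = receiver.getsockname()[1]

# The sending program addresses its packet to that port.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.settimeout(2.0)
sender.sendto(b"hello", ("127.0.0.1", port))

# The receiver learns which address and port the packet came from ...
data, (src_ip, src_port) = receiver.recvfrom(1024)
# ... and can send information back to that original port.
receiver.sendto(b"reply", (src_ip, src_port))
reply, _ = sender.recvfrom(1024)

sender.close()
receiver.close()
print(data, reply, src_port)
```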

Within a set of TCP packets, a third level of protocols is used to further
identify the information within. Two common protocols are HyperText Transfer
Protocol or HTTP, and File Transfer Protocol or FTP.
[http://www.ietf.org/rfc/rfc2616.txt] [http://www.ietf.org/rfc/rfc0959.txt]
FTP is the older of the two, and is used to move a file from one machine to
another. HTTP is similar, except that the content of the packets is considered
to contain a text file formatted using HyperText Markup Language or HTML.
Usually, a given protocol has a default port that a packet is addressed to. For
example, HTTP is by default addressed to port 80, and FTP is addressed to port
21. This default port can be overridden, and an HTTP packet could be addressed
to port 8080 instead, for example.

A Firewall is a program, machine, or device that prevents certain packets
passing through it from going to the intended destination. A firewall looks at
each packet, and based on specified criteria decides whether to forward the
packet or reject it. A firewall may reject a packet if it doesn't come from a
designated IP address, or if it is addressed to some unexpected port, for
example. A company commonly places a firewall between its internal network
(called an Intranet) and the Internet at large. A large organization may have
several layers of firewalls between various machines.
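
The forward-or-reject decision can be sketched as a simple rule check. The
Packet class, the allowed addresses, and the allowed ports below are all
invented for the illustration; real firewalls support far richer criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_ip: str    # IP address the packet came from
    dst_port: int  # port the packet is addressed to


# Hypothetical criteria: which senders and which ports are acceptable.
ALLOWED_SOURCES = {"207.46.230.219"}
ALLOWED_PORTS = {80, 21}


def should_forward(packet: Packet) -> bool:
    """Forward only packets from a designated address to an expected port."""
    return packet.src_ip in ALLOWED_SOURCES and packet.dst_port in ALLOWED_PORTS


print(should_forward(Packet("207.46.230.219", 80)))    # prints True
print(should_forward(Packet("207.46.230.219", 8080)))  # prints False
```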

A proxy is a machine that intercepts the normal communication between a local
machine and a
destination machine. The first time a request is made for some information, it
may cache the information on the proxy machine so that further requests for the
same information are retrieved from the cache instead of from the Internet.
This interception is done transparently to the local machine. Large
organizations use proxies to reduce the amount of traffic across their
connection to the Internet. If 300 people on the Intranet all request
http://www.dilbert.com, only one copy of the web site actually gets downloaded
to the proxy. Further requests for that URL are retrieved from the proxy
instead of from the destination machine until the proxy decides that the
information might be out of date and refreshes its content. Frequently the
services of a proxy and a firewall are combined in some way.
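
A minimal sketch of the caching behavior described above, in Python. The class
name, the five-minute maximum age, and the fake fetch function are assumptions
made for the example; how a real proxy decides that content is out of date is
more involved.

```python
import time


class CachingProxy:
    """Answer repeated requests from a cache until the copy is too old."""

    def __init__(self, fetch, max_age_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch            # function that retrieves a URL
        self._max_age = max_age_seconds
        self._clock = clock
        self._cache = {}               # url -> (time fetched, content)

    def get(self, url):
        now = self._clock()
        entry = self._cache.get(url)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]            # served from the cache
        content = self._fetch(url)     # first request, or the copy is stale
        self._cache[url] = (now, content)
        return content


fetch_count = 0


def fake_fetch(url):
    """Stand-in for a real download; counts how often it is called."""
    global fetch_count
    fetch_count += 1
    return f"content of {url}"


proxy = CachingProxy(fake_fetch)
for _ in range(300):
    page = proxy.get("http://www.dilbert.com")
print(fetch_count)  # prints 1
```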

One arrangement of cheap hardware that is capable of exchanging packets from
place to place is called an Ethernet, although there are many others. The
hardware involved in exchanging packets is irrelevant to the Internet Protocol.
An Ethernet adds some addressing information of its own to the packets. Another
device that may be used in the transfer of packets is a Modem, which turns an
electrical representation of sound into bytes, and vice versa.

So a packet might contain:
- Ethernet addressing information
- IP packet header
- TCP packet header
- HTTP packet header
- Data
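
The nesting in the list above can be shown with a short Python sketch in which
each layer simply puts its header in front of whatever the layer above handed
it. The bracketed placeholder headers are purely illustrative and are not the
real header formats.

```python
def wrap(header: bytes, payload: bytes) -> bytes:
    """Add one layer's header to the front of the payload."""
    return header + payload


data = b"Hello"
http_message = wrap(b"[HTTP]", data)       # HTTP packet header + data
tcp_packet = wrap(b"[TCP]", http_message)  # TCP packet header added
ip_packet = wrap(b"[IP]", tcp_packet)      # IP packet header added
frame = wrap(b"[ETH]", ip_packet)          # Ethernet addressing added

print(frame)  # prints b'[ETH][IP][TCP][HTTP]Hello'
```
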

When a local machine needs a Dynamic IP Address, the address must be assigned
to it somehow. This allocation is done by a Dynamic Host Configuration Protocol
Server, or DHCP Server. A DHCP Server assigns IP addresses to local machines as
necessary. The ISP generally provides the DHCP service as well.
[http://www.ietf.org/rfc/rfc2131.txt]
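
The idea of handing out addresses as needed can be sketched as a pool from
which addresses are leased and later returned. This models only the
bookkeeping, not the actual DHCP message exchange; the class, the client names,
and the addresses (taken from a range reserved for documentation) are
assumptions for the example.

```python
class AddressPool:
    """Hand out addresses from a fixed pool, one per client."""

    def __init__(self, addresses):
        self._free = list(addresses)
        self._leases = {}  # client identifier -> assigned address

    def request(self, client_id: str) -> str:
        if client_id in self._leases:
            return self._leases[client_id]  # client keeps its address
        if not self._free:
            raise RuntimeError("no addresses left in the pool")
        address = self._free.pop(0)
        self._leases[client_id] = address
        return address

    def release(self, client_id: str) -> None:
        address = self._leases.pop(client_id, None)
        if address is not None:
            self._free.append(address)  # available to the next client


pool = AddressPool(["192.0.2.10", "192.0.2.11"])
print(pool.request("laptop"))   # prints 192.0.2.10
print(pool.request("desktop"))  # prints 192.0.2.11
pool.release("laptop")
print(pool.request("printer"))  # prints 192.0.2.10
```
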

A Uniform Resource Locator, or URL, is an identifier of a particular resource
on the Internet. It is broken into several parts:
- An identifier of the protocol used to transfer the information, followed by a
  colon. For example, http:
- The domain name or IP address of the machine that the resource is found on.
  For example, www.microsoft.com.
- An optional port number to communicate on, preceded by a colon. For example,
  :8080.
- An optional identifier of a resource on the destination machine. Typically,
  there is a hierarchy of information separated by slashes. For example,
  /ie/default.asp.
- An optional query string to pass information to the resource, preceded by a
  question mark. For example, ?query=select+*+from+table.

Put together (leaving out the optional port), the example URL would be
http://www.microsoft.com/ie/default.asp?query=select+*+from+table. [http://www.ietf.org/rfc/rfc2396.txt]
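
As an illustration, Python's standard urllib.parse module can split a URL into
the parts listed above. The URL used here is assembled from the example parts,
including the optional port :8080; it is not meant to be a working address.

```python
from urllib.parse import parse_qs, urlsplit

url = "http://www.microsoft.com:8080/ie/default.asp?query=select+*+from+table"
parts = urlsplit(url)

print(parts.scheme)    # prints http
print(parts.hostname)  # prints www.microsoft.com
print(parts.port)      # prints 8080
print(parts.path)      # prints /ie/default.asp
print(parts.query)     # prints query=select+*+from+table

# In a query string, a plus sign stands for a space.
print(parse_qs(parts.query))  # prints {'query': ['select * from table']}
```
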

With the exception of HTTP, all of the technologies in the previous discussion
have been available for many years. What changed the Internet dramatically and
turned it into what it is today is the invention of HTTP, HTML, and the Web
Browser.
Continue to Part II: Web Basics