The Uniform Resource Identifier (URI) is the basic addressing mechanism of the World Wide Web. This is the address information that always appears in the little box at the top of the browser. It tells the browser uniquely how to find any file on the Web, no matter how obscure its location. Often we discuss URLs (Uniform Resource Locators) instead of URIs. There are subtle distinctions between the meanings of the terms, but they are not worth mentioning here, so we will use them interchangeably.
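A URI can take various forms, but they are all roughly like
scheme://machinename:port/directory/path/filename#location
scheme://machinename:port/directory/path/filename?input
or some combination of these.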
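As a rough illustration of how a URI breaks into the parts discussed in the rest of this section, the following sketch uses Python's standard urllib.parse module. The URI itself is hypothetical and made up for the example; it does not come from the text.

```python
from urllib.parse import urlsplit

# Hypothetical URI, invented for illustration only.
uri = "http://www.example.com:8080/docs/index.html?lang=en#intro"

parts = urlsplit(uri)

print(parts.scheme)    # the scheme name (protocol)
print(parts.hostname)  # the machine name
print(parts.port)      # the TCP port as an int, or None if omitted
print(parts.path)      # the path to the file under the document root
print(parts.query)     # the text after "?"
print(parts.fragment)  # the text after "#"
```

Each attribute corresponds to one of the pieces described below: scheme, machine, port, path, and the details at the tail end.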
The scheme name (sometimes called the protocol) specifies the "language" with which the browser will communicate with the server. The choice of scheme tells the browser how to contact the server, and how to communicate with it after contact is established. There are several choices here. The most common is HTTP, the HyperText Transfer Protocol. This is the protocol that allows you to transfer entire documents comprising text, images, movies, and even applications. The File Transfer Protocol (FTP) is designed specifically for raw file transfer, without any formatting or display concerns. The telnet protocol is an antiquated, insecure way to log on to remote machines. Most browsers still understand the antiquated gopher protocol for file transfer as well. There are also encrypted versions of HTTP and FTP available: https and ftps.
Sometimes a scheme is not a protocol. In particular, the file scheme allows us to look at files on our own computer. The format here is file://machinename/directory/path.
The machinename is almost always "localhost", so we usually type file:///directory/path.
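The two spellings of a file URI can be compared with a short sketch, again assuming Python's urllib.parse. It only shows that the long and short forms differ in the machine-name slot and agree on the path.

```python
from urllib.parse import urlsplit

# Long form with an explicit machine name, and the usual short form.
long_form = "file://localhost/directory/path"
short_form = "file:///directory/path"

long_parts = urlsplit(long_form)
short_parts = urlsplit(short_form)

# The machine-name slot is "localhost" in one case and empty in the other.
print(repr(long_parts.netloc), repr(short_parts.netloc))

# Both forms name the same path on the local computer.
print(long_parts.path == short_parts.path)
```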
Following the scheme comes the identification of the computer on which the document you want resides. There are countless variations on this part of the URI. It might be as simple as the IP address of the machine you want to contact, or it might be the actual name of the machine, or one of several aliases the computer goes by. It might even be an alias that does not refer directly to the machine, but rather to a file system on the remote computer. The only thing of which you can be certain is that any name you put in this spot is actually registered on a DNS server somewhere.
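A minimal sketch of the name-versus-address distinction, assuming Python's standard ipaddress module: the hypothetical helper describe_host (not part of any library) reports whether the machine part of a URI is a literal IP address or a name that would still have to be looked up through DNS. The address 192.0.2.10 is a documentation-only example.

```python
import ipaddress

def describe_host(host: str) -> str:
    """Return "IP address" if host is a literal address, else "name"."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not a literal address, so it must be a name or alias
        # that a DNS server translates into an address.
        return "name"
    return "IP address"

print(describe_host("192.0.2.10"))
print(describe_host("www.math.wsu.edu"))
```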
An optional number after the computer name gives the TCP port through which you want to contact the server. Ordinarily each protocol has a standard port on which its servers listen, and the browser knows those ports. However, sometimes there is a reason to use a nonstandard port. When that happens, you put that port number here, separated from the computer name by a colon. For example, student and faculty web sites at the WSU server usually are accessed through port 8080, so their URIs contain :8080 immediately after the computer name.
For the record, the standard port for HTTP is 80, and the standard port for FTP is 21. Thus, an alternate URI for the server of this page is obtained by writing :80 explicitly after the computer name.
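The port rule can be sketched as follows, assuming Python's urllib.parse. The helper effective_port and the table DEFAULT_PORTS are hypothetical names chosen for this example, and the host names are made up. The table holds the two standard ports mentioned above, plus 443 for HTTPS.

```python
from urllib.parse import urlsplit

# Standard ports: 80 and 21 are from the text; 443 (HTTPS) is added here.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "https": 443}

def effective_port(uri: str):
    """Return the explicit port if given, else the scheme's standard port."""
    parts = urlsplit(uri)
    if parts.port is not None:
        return parts.port
    # Returns None for schemes that are not in the table.
    return DEFAULT_PORTS.get(parts.scheme)

# Omitting the port and writing the standard one give the same result.
print(effective_port("http://www.example.com/"))
print(effective_port("http://www.example.com:80/"))
print(effective_port("http://www.example.com:8080/"))
```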
After the computer address and port, we must give directions for finding the file on the remote computer. These come in the form of a path to the file. In other words, we specify where the file is in the directory structure of the server machine, separating directories using the forward slash "/". The first slash denotes the document root directory, and after that, every subdirectory is separated from its parent by another slash. Thus, a URI whose path is /students/voles/300/content.html
says that there is a subdirectory of the document root called students, and there is a subdirectory of that called voles, and another one called 300, and inside that directory is a file called content.html.
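The same reading of the path can be sketched in Python. The machine name www.example.com is a stand-in invented for the example; only the path comes from the text.

```python
from urllib.parse import urlsplit

# The host is hypothetical; the path is the one discussed above.
uri = "http://www.example.com/students/voles/300/content.html"

path = urlsplit(uri).path

# Drop the leading slash (the document root), then split on "/".
*directories, filename = path.lstrip("/").split("/")

print(directories)  # the chain of subdirectories below the document root
print(filename)     # the file inside the last directory
```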
The document root directory could lie anywhere in the file system for the server computer. The only requirement is for the server software for the machine to keep track of the document root. Thus, the directories you can see through HTTP or FTP on a computer are only a small subset of all of the filesystems for that machine.
There are games that Webmasters play here also. It is not difficult to make a file appear to be in a different place in the directory structure than one might guess from its URI. This is done in particular for so-called CGI script files. In that way it becomes possible to run the CGI script, but not possible to look at its contents.
At the tail end of the URI are details pertaining to the file itself. One possibility is to see the file name followed by a pound sign (#) and more text. The text following the pound sign is a name for a location inside the file. For example, a URI of the form http://www.math.wsu.edu/faculty/faculty.html#W
says that on a machine called www.math.wsu.edu there is a subdirectory of the document root called "faculty", and in that directory there is a file called "faculty.html", and in that file, there is a place named "W".
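A short sketch of separating the named location from the rest of the URI, assuming Python's urllib.parse.urldefrag. The http scheme in the example URI is an assumption; the machine, path, and location name are the ones described above.

```python
from urllib.parse import urldefrag

# The scheme is assumed; the rest follows the description in the text.
uri = "http://www.math.wsu.edu/faculty/faculty.html#W"

# urldefrag splits off everything after the pound sign.
document, location = urldefrag(uri)

print(document)  # the part that identifies the file
print(location)  # the named place inside the file
```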
Another symbol that might appear after the file name is a question mark (?). This should only occur when the file specified handles form input - e.g. it is a CGI (Common Gateway Interface) program, or a PHP page. These are applications that run on the server, but whose results you can see on your browser. We won't go into details concerning how to specify input to these programs, except to say that the input to the program always follows a question mark in the URI.
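Without going into how such programs use their input, the following sketch shows only how the text after the question mark can be picked out, assuming Python's urllib.parse. The URI, the script name, and the field names are all hypothetical, and the name=value pairs joined by "&" are simply the most common convention for form input.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URI for a server-side program that handles form input.
uri = "http://www.example.com/cgi-bin/search.php?term=voles&page=2"

# Everything after "?" (and before any "#") is the query string.
query = urlsplit(uri).query

# parse_qs maps each field name to the list of values given for it.
fields = parse_qs(query)

print(query)
print(fields)
```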