Spinning the Web: Setting Up World Wide Web Servers

September 1994

DISCLAIMER -- All of the technologies described in this document are changing rapidly, so please consult appropriate software/hardware vendors and applicable USENET groups and LISTSERV lists.


James Powell
Scholarly Communications Project, University Libraries
Virginia Polytechnic Institute and State University
jpowell@scholar.lib.vt.edu

HTML is but one component of the World Wide Web. You are probably most familiar with a second: client applications such as Mosaic. But if you are just starting a web project, you are probably new to the concept of servers. The term server refers both to the computer to which users connect for data and to the software used to distribute the data. Setting up a server requires some planning. You must select hardware and server software that suit your current needs but will also support some growth.

Hardware

Before you can consider which World Wide Web server is best for you, you must evaluate the equipment you have available and the type of network connection. An ethernet connection with TCP/IP software configured so that the server workstation is a node on the Internet is essential. In other words, the machine must have an Internet address that is reachable, or at least accessible to your intended audience. That could be a department on a closed network, but for the remainder of this talk, let's assume your audience is fairly widely dispersed on the Internet, a worst case scenario if you will. To test the connection, try using telnet or FTP from the intended server to reach another Internet host, or ask a networking guru for help.
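The telnet test described above can also be scripted. The following is a minimal sketch, not a full diagnostic: it only checks whether a TCP connection to a given host and port can be opened. The function name and the default timeout are assumptions made for this example.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and host name lookup failures.
        return False
```

Run from the intended server, a call such as can_connect("some.remote.host", 23) plays the role of the telnet test, where the host name is a placeholder for a machine you know is online.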

Assuming you pass the network requirements, look now at your available platforms. Almost any recent computing platform, and recent means 1988 onwards, will be acceptable as a light traffic server. By that, I mean perhaps eight or ten simultaneous users. Fewer machines qualify as suitable for high traffic, and you are unlikely to just have a powerful server lying around unused. But a cluster of machines can function just as well, if document placement is planned carefully. For example, say your home page is extremely popular, and so are one or two documents you publish. Simply place the popular documents on a different server than the home page, and configure appropriate links in the home page. Suddenly you have a more powerful and more responsive distributed server. Not only that, but you can also duplicate the contents of both servers on both machines, and reconfigure the links in the event of a system failure. Then, unless the building burns down, you can stay online through the worst hard drive crash or power supply failure.
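The document placement idea can be sketched in a few lines. This is only an illustration under stated assumptions: the host names and paths are hypothetical, and both machines are assumed to hold a copy of every document, as suggested above.

```python
# Hypothetical host names and paths, used only for illustration.
# Each document has a preferred host; both hosts hold a copy of everything.
PREFERRED_HOST = {
    "/index.html": "www1.example.edu",
    "/popular/report.html": "www2.example.edu",
}
MIRROR_OF = {
    "www1.example.edu": "www2.example.edu",
    "www2.example.edu": "www1.example.edu",
}


def link_for(path, failed_hosts=frozenset()):
    """Return the URL to publish for a document, avoiding any failed host."""
    host = PREFERRED_HOST[path]
    if host in failed_hosts:
        host = MIRROR_OF[host]  # fall back to the machine holding the duplicate
    return f"http://{host}{path}"
```

In normal operation the home page would link to the popular report on the second machine. If that machine fails, regenerating the links with its name in failed_hosts points readers at the surviving copy.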

But back to platforms. The Scholarly Communications Project, for which I provide technical support, publishes a modestly popular collection of documents to the Internet using a single system no more powerful than an Intel 486 clone. This system is a Nextstation, a semi-antique in the computing world. It has never yet experienced a major hardware failure in three years of almost continuous operation. And it has survived ice storms, bad cables, power fluctuations, and several operating system upgrades with less than 24 hours of down time per incident. What this means to me is that reliability is more important than horsepower. Any 486 system that has a good service record will make an excellent server. However, Nextstations have fairly speedy networking built into the motherboard, so to match that, a 486 with a VESA, EISA or PCI ethernet adaptor is recommended. These acronyms all stand for speed. Ordinary PCs have a low speed ISA data transfer bus, which supports what are called 16-bit ethernet adaptors. In my opinion, these are too slow for a server system. Network speed will probably be your biggest problem, and you may as well do the best you can at your end to keep things moving fast.

Before we move on to Macintosh and other hardware platforms, let me talk a bit about operating systems. Many people do not believe it matters what operating system you use on a PC when that machine is used as a server. I believe it does, and I think there are some good choices and a few bad ones. As a rule, I believe single user operating systems are simply not suitable for permanent information servers. One General Protection Fault under Windows can take a server down for ten minutes or more. If you must use Microsoft Windows as a server environment, give up using the machine for any other purpose. Don't let anyone install software on it, or use it even for a few minutes here and there. Windows was not designed to support network server activity; it sometimes has trouble functioning even as a good Internet client environment. If you are serious about running a server on an Intel 486 platform, there are many excellent alternatives to DOS/Windows. These environments are designed for multiple users and perform true multitasking. This means system resources are well distributed among users and applications. Each user and process receives its own segment of memory, which is protected from other users, thus greatly reducing the likelihood of a system-wide crash. For familiarity, try Windows NT. Like UNIX, this environment protects applications from one another, and keeps them from crashing the system, by providing each package with its own portion of memory. IBM's OS/2 environment is equally stable, but I would go OS/2 native rather than install OS/2 for Windows. Finally, there are several excellent UNIX environments available. For the beginner, nothing is easier than Nextstep. But if you cannot get the education pricing, it is rather expensive. LINUX is a freely distributed UNIX for PCs that is popular and widely supported. It isn't nearly as easy as Nextstep, but many people on the Internet use it for anything you can imagine and will be willing to help if you post a question to a LINUX Usenet news group. I've also heard good things about Unixware and Interactive Unix, but I believe Sun Microsystems' Solaris is probably a bad choice, and I wouldn't recommend it, given the many problems reported by people using the Sun computer version. Nextstep and LINUX are probably the two most widely supported PC Unix environments for information servers such as World Wide Web, and are worth looking into. UNIX, NT and OS/2 make efficient use of memory and manage system resources well for multiuser access.

The Macintosh hardware has recently evolved into a class of capable servers, although one drawback is the Macintosh environment. Like Microsoft Windows, it is a single user environment. While there is little doubt it is a better graphical environment than Windows, more mature and well thought out, it is not my first choice as a server environment. One problem is the poor memory management. Memory usage requires a great deal of hand tuning. But on the bright side, Apple makes some very fast computers these days. While a Mac Quadra with a 68040 microprocessor will usually match a similar PC configuration, a Power Mac can pass most PC clones easily. Networking is usually built in, for further performance gains over pieced together PC clones. And if the Macintosh web server software is a native PowerPC application, rather than an older Macintosh program that must be run in emulation mode on the Power Mac, then you have a potentially powerful server. But you must once again swear off using the machine for any other purpose. When that little bomb pops up on your screen, the other users simply get nothing, except perhaps angry that your server goes up and down so often.

Even for light traffic servers, PC or Mac, memory is an important consideration. I would recommend that a server have at least 16 MB of RAM. Some of the PC operating systems I recommend require at least this much. And of course, there is no such thing as too much of a good thing with memory. Get 32 MB if you can at all afford it. But memory is not cheap, so if 16 MB is all you can afford, make sure you have a fast hard drive and adapter card. All modern operating systems can use hard drive space as additional memory - this is virtual memory. It can save you money and let your machine support more users.

Now, if you are planning to put up a server for a library or university, I encourage you to investigate purchasing one or more Unix workstations to function as your server. Consider systems from Sun Microsystems, Hewlett Packard, IBM and Digital Equipment Corporation. Sun workstations are fairly well supported, although Sun essentially switched operating systems in the last two years, leaving some unhappy customers. But they are rebounding and have some good prices on Sun Classic models. If you get a Classic, buy two; they're small. Hewlett Packard recently released a new low cost line of workstations called the Gecko. They are definitely worth a look and have the added bonus of Nextstep as an alternative to HP's own Unix. As I said before, Nextstep is the easiest UNIX there is and excellent for beginning systems administrators. IBM sells many types of its workhorse RS6000, including a new PowerPC version, but I've heard many stories about difficulties with their operating system, AIX. But if you can get a web server up on it easily, and you probably can, then it is a good choice. Finally, DEC offers excellent deals to higher education and offers some of the fastest and most affordable servers on the market in its Alpha line of workstations. Here again, the stumbling block will be their Unix, OSF/1, but on the bright side, there is a version of Windows NT for Alpha workstations that might be worth considering. When selecting a server in this class, 32 MB of RAM is the minimum, and I recommend that you not purchase a system with less than 64 MB of RAM for this purpose. But memory is expensive, and you can probably get by with 32 MB until you have some access logs to justify the extra expense.

We've covered memory and networking capability, but what about storage? I recommend that you start a light traffic server with 1 GB of online storage. Hard drives are cheap and you will fill it up faster than you think. A UNIX workstation should have at least 4 GB of storage. There are new 9 GB drives for what you would have paid for a 4 GB drive last year, so why not consider one of these? There is also RAID technology, but I believe that is overkill for a server like this, where data is not being constantly updated. But that does lead me to an important issue - backing up. Be absolutely sure to consider data backup and recovery before you establish a permanent server. Put together a workable plan and purchase what you need to perform frequent backups. Weekly backups are essential, so buy a tape drive that can store all the data on your initial configuration on one tape, so you have no excuse not to perform frequent backups. DAT drives and 8mm Exabyte drives are relatively inexpensive. Buy one and protect your data from hardware failures and security violations.
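A backup plan is easier to follow when the backup itself is a single command. As a rough sketch only, the function below packs a document directory into one dated archive file. The function name and archive naming scheme are assumptions made for this example, and it writes to an ordinary directory, so copying the archive to tape is left to whatever tool your drive uses.

```python
import tarfile
from datetime import date
from pathlib import Path


def make_backup(doc_root, dest_dir, stamp=None):
    """Pack doc_root into dest_dir/<name>-<YYYY-MM-DD>.tar.gz and return its path."""
    stamp = stamp or date.today()
    root = Path(doc_root)
    archive = Path(dest_dir) / f"{root.name}-{stamp.isoformat()}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # Store paths relative to the document directory's own name.
        tar.add(str(root), arcname=root.name)
    return archive
```

Scheduling this once a week, and checking that the archive still fits on one tape, covers the minimum recommended above.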

And remember, you can skimp on things like monitors and keyboards since you won't be using the machine as anything but a server.

Web Servers - How they work

A World Wide Web server is simply a program that answers requests for documents from World Wide Web clients over the Internet. All World Wide Web servers communicate with web clients using a language, or protocol, called the HyperText Transfer Protocol. This is where the http in a web URL comes from. All types of data can be exchanged using this protocol, including HTML, graphics, sound and video. Data types are identified by the server and preceded by a MIME header (MIME stands for Multipurpose Internet Mail Extensions). Web clients convert open URL commands into HTTP GET requests. So if you type http://scholar.lib.vt.edu/jpowell.html, Mosaic converts this to a GET /jpowell.html command, connects to the web server running on scholar.lib.vt.edu, issues the command and waits for a response. The response can be the requested document or an error message. You can actually simulate a web client by using telnet to connect to a web server, specifying port 80 (the web server port) along with the Internet address, and then typing GET (in all upper case) followed by the path and name of a file that exists on the server. After the document or error is returned, the connection is closed. HTTP is a stateless protocol, which means there is no continuous connection between the client and server as there is with, for example, telnet. You may be starting to realize that a web client does a lot of work. It receives only raw HTML or other data and has to perform formatting or launch a helper application such as a sound player after determining what type of data it has received. The server only sends the data and goes away. Web clients are responsible for interacting directly with non-web servers such as gopher or ftp, and they create a virtual HTML document while doing so.
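The telnet experiment can be reproduced in a short program. The sketch below sends the simple one-line GET request described above and reads until the server closes the connection. The function names are made up for this example, and the path is sent with a leading slash, which is the form servers expect. Some servers may insist on a fuller request with a protocol version and headers, so treat this as an illustration of the exchange rather than a general-purpose client.

```python
import socket


def build_request(path):
    """Build the one-line request: GET, a space, the document path, end of line."""
    if not path.startswith("/"):
        path = "/" + path
    return f"GET {path}\r\n".encode("ascii")


def fetch(host, path, port=80, timeout=10.0):
    """Connect, send the request, and read until the server closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_request(path))
        while True:
            data = conn.recv(4096)
            if not data:  # an empty read means the server has closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Note that the function returns as soon as the server hangs up. That is the stateless behavior in practice: one request, one response, and the connection is gone.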

So what else does a web server do? It can log activity. Servers can record the Internet address, time and request made for each connection. Servers can also protect certain files from non-authenticated users. Finally, servers can forward requests for data that neither the client nor the server can access directly to applications called gateways. With gateways, the web can support datatypes and resources not even conceived of when it was invented. Gateways can allow web clients and servers to function as Z39.50 or relational database clients. Data is gathered by the web client, usually using an HTML form, and sent to the server along with the name of a gateway program to be run. Then the gateway reformats the data and sends it to an information server, receives a response, and reformats that response as an HTML document, which is delivered to the web client. Gateway support, logging and user authentication are important features to look for when selecting a specific web server, with logging an absolute necessity, both for usage statistics and security.
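The gateway steps above (take form data, query an information server, return HTML) can be sketched as follows. This is only an illustration: the form field name "query" and the small in-memory catalog are hypothetical stand-ins, and a real gateway would talk to an actual database or Z39.50 server instead.

```python
from html import escape
from urllib.parse import parse_qs

# Hypothetical stand-in for the information server behind the gateway.
CATALOG = {
    "journals": ["Example Journal A", "Example Journal B"],
}


def gateway(form_data):
    """Turn encoded form data into a query, then wrap the answer in HTML."""
    fields = parse_qs(form_data)
    term = fields.get("query", [""])[0].strip().lower()
    hits = CATALOG.get(term, [])  # the "information server" lookup
    title = f"Results for {escape(term)}"
    if hits:
        body = "<ul>" + "".join(f"<li>{escape(hit)}</li>" for hit in hits) + "</ul>"
    else:
        body = "<p>No matches.</p>"
    return (f"<html><head><title>{title}</title></head>"
            f"<body><h1>{title}</h1>{body}</body></html>")
```

Escaping the search term before echoing it back matters, since the gateway is building a document out of text supplied by a stranger.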

So what server software is available on the Internet?

One of the easiest server applications to install and use is MacHTTP for Macintosh systems. It is installed like any Mac application and launched just like any other program, by double clicking. It comes preconfigured, with its help files available as HTML documents once the server is running. Simply launch a web client on any networked computer and open the URL "http://your.server.name/" to browse the documentation. Then customize the server software as needed and place it in your startup folder. Configurable data is stored in MacHTTP.config. The number of simultaneous users can be set from 3 to 1000, and access can be restricted by IP address. Most other settings are probably fine at the defaults. This is the only Macintosh web server I know of, although there may be others. One recently added feature is native PowerPC support. This means the server will run at workstation speeds on a Power Macintosh.
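Restricting access by IP address amounts to comparing each client's address against a list of permitted networks. The sketch below shows that check in general terms. It does not reproduce MacHTTP's configuration syntax, and the network shown is a reserved documentation range used as a placeholder.

```python
import ipaddress

# Placeholder network (a reserved documentation range), not a real campus network.
ALLOWED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_allowed(client_ip, allowed=ALLOWED_NETWORKS):
    """Return True if the client's address falls inside any permitted network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in allowed)
```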

Since Microsoft Windows is such a popular end-user environment, it is no big surprise that several web servers exist for it. The best by far is NCSA's HTTPD for Windows. While this is a large and complex package, installation is unbelievably simple. After retrieving the software, you simply unzip it into an HTTPD directory off the root directory, add a SET command to your Autoexec, and reboot. Once Windows is up again, simply double click on the server icon. Once the server is launched, up to eight simultaneous users can access the documents stored in the server's root directory. One thing I discovered right away is that this server does not function as well with the Windows 32 bit extensions as without. So if you are installing it on a machine with a recent 2.0 version of Mosaic, which requires these extensions, you may wish to reinstall Windows and do without Mosaic, for optimum performance. NCSA's server supports all the functionality of UNIX servers. It also supports accessing data from other Windows applications, which opens up possibilities perhaps not available with a UNIX server, such as retrieving data from a spreadsheet or word processor. It relies on Visual Basic 3.0 for scripting, so be sure to get a copy of this if you are interested in forms support. Our Computing Center recommends SerWeb or Web4Ham, available from popular FTP sites in the Winsock directory. Both run with Trumpet Winsock, and each has an initialization, or INI, file which has to be edited to customize server functions. These initialization files are short and mostly self-explanatory. I would recommend NCSA's HTTPD for Windows over either of these servers. It is a much more serious attempt at bringing the full functionality of Unix web servers to Windows. It is so similar to the UNIX server that experience with it would not be wasted if you later chose to advance to a UNIX-based server. Even for testing, this is preferable to the minimal functionality of SerWeb and Web4Ham. Windows web servers are best suited to a group authoring web documents, or to sharing documents among a small group of people.

Windows NT and OS/2 are each supported directly by one native server. HTTPS is a full fledged web server for Windows NT that matches all the features in current UNIX servers. OS/2 can act as a web server using Web2.

Finally, UNIX has the most web servers available. The first web server, from CERN, was created for UNIX, and new versions are frequently posted at their ftp site. This server and NCSA's httpd for UNIX support basically the same features: the HTTP protocol, authentication, gateways, activity logging and image maps. The CERN server has, in my experience, been the easier of the two to set up, but gateway support was late coming to it, so we ultimately switched to NCSA's server. With UNIX servers, the source code is sometimes all that is distributed. This can be a collection of C programming language files that need to be compiled on the computer before the server can be used. However, both NCSA and CERN supply precompiled versions for some computing platforms. Retrieve one of these if available, or get help if you've never compiled anything before or are wondering what it means to compile something. Another interesting and easy to use UNIX server is the GN server. It has the unique ability to function as both a gopher and a World Wide Web server. It is easy to configure, and I recommend it if you are just starting out. It has the advantage of being able to support two access methods to one collection of data. It has a simple gopher/web item search feature, and can also search archives of documents up to about 100 files.

Searching is another consideration when running a web server. With few exceptions, web servers do not provide any built in method of searching collections of HTML documents, nor do they provide direct support for searching WAIS or relational databases. There are a few options, depending on the platform you are using. For UNIX, GN has limited built in searching that may be sufficient for your needs. Then there is freeWAIS, which provides relevance feedback and boolean full text searching. WAIS stands for Wide Area Information Server and is a widely used full text indexing, search and retrieval engine available on the Internet. There is also a commercial version of WAIS that does all freeWAIS can do and more, and comes with technical support. There are SQL gateways for UNIX that allow you to create HTML forms for searching relational databases such as SYBASE or ORACLE.
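To give a feel for what full text indexing involves, here is a toy sketch of a word index over a handful of HTML documents, with a search that requires every query word to be present (a boolean AND). It is not how WAIS or GN work internally. It has no relevance ranking, and the function names are made up for this example.

```python
import re
from collections import defaultdict

TAG = re.compile(r"<[^>]+>")     # crude HTML tag stripper, good enough for a sketch
WORD = re.compile(r"[a-z0-9]+")


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, html in documents.items():
        text = TAG.sub(" ", html).lower()
        for word in WORD.findall(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of documents containing every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))
```

The index is built once, when documents are added, so that each search is only a few set lookups. The real engines make the same trade on a much larger scale.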

The public domain WAIS software is also available for Windows NT. And on the Macintosh, AppleSearch can be integrated with MacHTTP to provide full text searching.

So develop a plan detailing roughly the number of documents you wish to publish, determine the size of your potential audience, and determine how much you can spend on your server project. Use this plan to select a server and software, and a volunteer to run it. And be sure to keep track of usage so it will be easier to justify funding for expansion in the future. World Wide Web is already the most popular client-server information tool on the Internet, and may become as common as electronic mail soon. So think big.

Finally, don't forget to register your new web server for inclusion in What's New listings, regional server lists, and topical server lists such as CERN's Virtual Library. Usually, a note to the World Wide Web listserv and to the Usenet group comp.infosystems.www.misc will be sufficient. And be sure to read comp.infosystems.www.providers for new server and HTML developments.

