Chapter One. The Global Recipe

© 1997 by Prentice Hall PTR

The order in which the different courses of a meal are served is not something you find in a traditional cookbook. Eating a sweet dessert after a savory main course seems obvious since it is what we have learned from childhood. The order in which to perform tasks when establishing your information server is far from obvious. You need a global recipe that orders the recipes for each course.

This chapter introduces the steps you need to follow to establish a fully functional Internet information server.

Reading the global recipe from top to bottom gives the order of the steps to follow in developing the Internet information server. Each box in the illustration of the global recipe is a specific step in the process. The numbers on the left identify the chapter in which each step is covered. The types of information we will encounter are shown on the left of the boxes and the types of software tools on the right.

This book considers the three most common types of Internet information service in use today: anonymous ftp, Listservers, and Web-based services (i.e., those using the http protocol). Anonymous ftp is efficient at file transfer and available to a wide audience. A Listserver is a convenient way to broadcast a single mail message to a list of subscribers who may not have access to any other service. A Web server delivers text, graphics, and other forms of multimedia, and supports more complex forms of interaction between a user and an information server, for example, having the server perform a complex calculation. Gopher is considered to have been superseded by Web-based services and is not discussed. To reach the widest possible audience you should support all three types of service.

To be successful in installing and maintaining one or more of these services you need to follow the steps in the global recipe in the order given. Let us look at the definition of each of these steps.

Planning - First you have to decide who your audience is, and then what information you are going to provide to this intended audience. Basic ftp requires no additional software beyond UNIX and about half an hour to set up correctly. Supporting a Listserver requires that you install and configure one piece of software and takes a couple of hours. Installing a Web-based server can take anything from a couple of hours to months, depending on the complexity of the information being served.

In supporting ftp you have to decide whether to support the basic stock-standard ftp daemon or install a security-enhanced ftp server. In supporting a Listserver you have to decide which one to use and which types of list you will permit. In supporting a Web-based service you have to decide which type of Web server to use, and the complexity of the information to be supported.

A good server with lots of information provides multiple ways (I call these "entry points") for finding information. I shall explore ways of providing useful entry points. The most direct way is providing the equivalent of an index. You will need to decide (i) whether to provide an index and (ii) what tools to use to generate, maintain, and browse that index. If you are going to provide information and graphics in the form of hypertext, you need to provide two kinds of tools. The first kind prepares information in this form: for example, HTML and graphics editors, and format converters that turn existing documents in formats such as Rich Text Format (RTF) and LaTeX into HTML. The second kind views HTML-based information: Web browsers such as Netscape, Lynx, and HotJava. If you are going to provide sound and/or video you will need tools for producing, editing, and browsing these formats. Support for sound and/or video will certainly place resource demands on the server, which takes us to infrastructure.
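To make the idea of an index as an entry point concrete, here is a minimal sketch of a keyword index, assuming the documents are short plain-text strings held in memory. The names build_index and lookup, and the sample document names, are hypothetical; this illustrates the general technique only and is not how any particular indexing package works.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the sorted names of documents containing it.

    `documents` is a dict of {document name: document text} (an assumption
    made for this sketch; a real server would read files from disk).
    """
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return {word: sorted(names) for word, names in index.items()}


def lookup(index, query):
    """Return the names of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    results = set(index.get(words[0], []))
    for word in words[1:]:
        results &= set(index.get(word, []))
    return sorted(results)


# Hypothetical sample content, for illustration only.
SAMPLE_DOCS = {
    "ftp.txt": "Setting up anonymous ftp",
    "web.txt": "Setting up a Web server",
}
SAMPLE_INDEX = build_index(SAMPLE_DOCS)
print(lookup(SAMPLE_INDEX, "web server"))
```

Even at this scale the two decisions named above are visible: generating the index (build_index) is separate from browsing it (lookup), and the index must be regenerated whenever the documents change.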

Prerequisite Infrastructure - The foremost requirement in providing an Internet server is Internet access! I shall consider briefly how to obtain Internet access if you do not already have it, and how to manage it on your UNIX machine once you are connected. Supporting an information server places demands on the host computer that should be considered ahead of time. The most immediate concern is likely to be sufficient disk space to store all the information. If the server is successful, you need to ensure that memory, disk I/O, and network bandwidth, and to a lesser extent CPU cycles, don't become rate-limiting. We shall consider what these demands are likely to be and when in the development cycle they are likely to occur. The infrastructure extends beyond the hardware: changes may also be required in the configuration of the UNIX operating system, particularly relating to security and to the X/Motif environment. The latter usually involve X resources and the availability of fonts.

Information Layout - How you organize the various types of information on your server becomes important as the amount of information grows. The tricks are to: (i) avoid duplication; (ii) provide easy file retrieval; and (iii) simplify evolution, that is, simplify movement of the information hierarchy to another server or collection of file systems.
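One way to check the first of these points, avoiding duplication, is to look for files with identical contents anywhere in the information hierarchy. The following is a minimal sketch, assuming the hierarchy is an ordinary directory tree readable by the script. The function names are hypothetical, and comparing SHA-256 digests is one choice among several.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return groups of paths under `root` whose contents are identical.

    Symbolic links are skipped, since a link to a file is a way of
    avoiding duplication rather than an instance of it.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_digest[file_digest(path)].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Calling find_duplicates on the top directory of the archive returns one list per set of identical files; each set is a candidate for replacement by a single copy plus links.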

Install Client - With the planning done and suitable hardware and software selected you are ready to get started on building your information server. First, you need to install the client components of all the software for which you will provide the server components. You will be the first to use your server to determine whether the server functions as expected and the information is presented the way you want. I briefly discuss installing client software components that you are likely to need. As is true of all discussions in this book, I will make good use of reference material available on the Internet and point you to a myriad of books on Internet clients.

Install Server - Once you can access ftp archives, Listservers, and Web pages on remote servers, it is time to download and configure your own information server components. For ftp this requires that you choose between using the ftp program that comes standard with the UNIX operating system and installing an enhanced ftp server that provides greater security. I will show you how to install and configure wu-ftpd, an enhanced ftp program from Washington University. While there are several Listservers to choose from, I will cover Majordomo, the most popular Listserver. I will install and configure two of the most popular Web servers: the National Center for Supercomputing Applications (NCSA) httpd, and wn, developed by John Franks. NCSA's server is simpler to install and maintain, but does not have all the security features of wn. Finally, you may also need to provide an index to the information on your server. I will show you how to install and configure the Harvest text indexing system, which includes a gatherer, indexer, and search engine.

Populate Server - This is perhaps the most critical part of the whole book! How you organize the information so it is accessible, readable, useful, and easily maintainable is paramount. Anyone can provide a server with little effort (even if it does not appear so from the previous discussion), but making the server usable and valuable to its intended audience requires work. Obviously many of us cannot make the development and maintenance of the server our life's work. I shall dissect information organization on a couple of real and imaginary servers to see what makes them tick. From this you will be able to determine the level of effort required to maintain information servers of different complexity. Servers range from a simple one that supports a text and graphics browser with limited links, to a large one that couples together a hybrid of databases and files containing information in many formats. Users can query such an environment through a WWW/forms interface and refine their queries. I will cover the latest in Web server technology, including clickable maps, server side includes, Java applets, and HTML frames.

Maintain Server - Once the server is running there is still more to do. Access statistics need to be gathered and future needs anticipated. Error logs should be monitored. Finally, usage of system resources needs to be monitored and, of course, new information added. I will show you how to download and configure the most popular tools for these tasks.
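As a minimal sketch of gathering access statistics, the following tallies requests per document, error responses, and bytes sent. It assumes the Web server writes its access log in the Common Log Format; if your server uses a different format the pattern must be changed. The function name summarize and the sample log lines are hypothetical and stand in for the dedicated tools mentioned above.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)$'
)


def summarize(lines):
    """Tally hits per path, 4xx/5xx status codes, and total bytes sent."""
    hits = Counter()
    errors = Counter()
    bytes_sent = 0
    skipped = 0
    for line in lines:
        match = CLF_PATTERN.match(line.strip())
        if match is None:
            skipped += 1  # line is not in the assumed format
            continue
        parts = match.group("request").split()
        path = parts[1] if len(parts) >= 2 else "(malformed request)"
        hits[path] += 1
        status = match.group("status")
        if status[0] in "45":
            errors[status] += 1
        if match.group("size") != "-":
            bytes_sent += int(match.group("size"))
    return {"hits": hits, "errors": errors, "bytes": bytes_sent, "skipped": skipped}


# Hypothetical log lines, for illustration only.
SAMPLE_LOG = [
    'client.example.org - - [10/Oct/1996:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    'client.example.org - - [10/Oct/1996:13:55:40 -0700] "GET /missing.html HTTP/1.0" 404 -',
    'other.example.org - - [10/Oct/1996:14:01:02 -0700] "GET /index.html HTTP/1.0" 200 2326',
]
report = summarize(SAMPLE_LOG)
print(report["hits"].most_common(5))
print(report["errors"])
```

The hit counts show which documents are in demand, the error counts point to broken links worth fixing, and the byte total gives a rough measure of the network load the server generates.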

Epilogue - Internet information servers involve rapidly changing technology and you need to keep abreast of developments that may affect your server. I shall point you to some interesting sources of information on these developments, and, of course, give you some thoughts of my own.