WebRelay: A Multithreaded HTTP Relay Server

The requirements for authentication and authorization services have been changing rapidly as we make the transition from computing that is primarily focused within an organization to a more fully realized networked information environment.

In particular, many libraries, including the Library of the University of Calgary, have entered the age of the "electronic library", providing their customers with on-line access to electronic journals and databases that are often run by commercial vendors. Since almost all of these services are provided through the Web, we must make sure that the contractual terms with each vendor are honoured while all of our legitimate users still have access to the services.

In this project, I have designed and developed a relay server that provides "session control" on top of the essentially "stateless" HTTP protocol and establishes connections with a designated remote database Web server on behalf of our patrons.

This relay server is able to authenticate users. Once a user is authenticated, it starts a new session. The relay server maintains the session in memory and checks the validity of the session whenever the client wants to access any reference on the remote database server. If a session has timed out or is bogus, the relay server shuts down the connection, and the user has to start a new session.
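The session check described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the webrelay source; the class name `SessionTable`, the 1800-second timeout, and the choice to refresh a session on every valid access are all assumptions made for the example.

```python
import secrets
import threading
import time

SESSION_TIMEOUT = 1800  # seconds; hypothetical value, not taken from webrelay


class SessionTable:
    """In-memory session store shared by all worker threads (illustrative)."""

    def __init__(self, timeout=SESSION_TIMEOUT, clock=time.monotonic):
        self._timeout = timeout
        self._clock = clock
        self._lock = threading.Lock()
        self._sessions = {}  # session id -> time of last access

    def start(self):
        """Create a session for a user who has just been authenticated."""
        sid = secrets.token_hex(16)
        with self._lock:
            self._sessions[sid] = self._clock()
        return sid

    def is_valid(self, sid):
        """Return True, and refresh the session, if it exists and is fresh."""
        now = self._clock()
        with self._lock:
            last = self._sessions.get(sid)
            if last is None:
                return False  # bogus session id
            if now - last > self._timeout:
                del self._sessions[sid]  # timed out: forget it
                return False
            self._sessions[sid] = now  # assumption: idle timeout, reset on use
            return True
```

A request handler would call `is_valid()` before relaying anything, and close the connection when it returns False.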

To make sure that timed-out sessions are deleted from memory, the relay server spawns a garbage collection thread that periodically scans the session control data and deletes any dead sessions.
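A periodic sweep of this kind might look like the following Python sketch. The function names `sweep` and `start_collector`, the dictionary layout (session id mapped to last access time), and the interval are hypothetical; the real program uses POSIX threads and its own data structures.

```python
import threading
import time


def sweep(sessions, lock, timeout, now):
    """Delete every session idle for longer than timeout; return the count."""
    with lock:
        dead = [sid for sid, last in sessions.items() if now - last > timeout]
        for sid in dead:
            del sessions[sid]
    return len(dead)


def start_collector(sessions, lock, timeout, interval, stop):
    """Run sweep() every `interval` seconds until the `stop` event is set."""
    def loop():
        # Event.wait() returns False on timeout and True once stop is set.
        while not stop.wait(interval):
            sweep(sessions, lock, timeout, time.monotonic())

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

The same lock must guard the session data in the request-handling threads, otherwise the collector could delete a session while another thread is reading it.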

Because a remote database web server can only see the relay server, not the real client, all communication between the client and the remote server must go through the relay server. Most notably, the relay server remaps the references (hyperlinks) in every web page sent back from the remote database server, and then sends the modified web page to the client. This ensures that any further request from the client is directed to the relay server. Once the relay server gets a request from the client, it works out the "real" request for the "real" target database web server before sending that request to the remote database web server on behalf of the client.
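The remapping in both directions can be sketched as follows. This is only an illustration in Python: the relay address, the virtual directory `db1`, the remote host, and the regular expression (which handles only double-quoted `href`, `src`, and `action` attributes) are assumptions, not details of webrelay.

```python
import re
from urllib.parse import urljoin

RELAY = "http://relay.example.edu"          # hypothetical relay address
REMOTES = {"db1": "http://db.example.com"}  # hypothetical virtual dir -> remote

LINK = re.compile(
    r'(?P<attr>\b(?:href|src|action)\s*=\s*")(?P<url>[^"]*)"', re.IGNORECASE
)


def to_relay(url, page_url, vdir):
    """Map a link found on a remote page to the matching relay URL."""
    absolute = urljoin(page_url, url)  # resolve relative links first
    base = REMOTES[vdir]
    if absolute == base or absolute.startswith(base + "/"):
        return RELAY + "/" + vdir + absolute[len(base):]
    return url  # link to some other site: leave it untouched


def remap_page(html, page_url, vdir):
    """Rewrite the links in html so that they point at the relay."""
    def fix(match):
        return match.group("attr") + to_relay(match.group("url"), page_url, vdir) + '"'
    return LINK.sub(fix, html)


def to_remote(path):
    """Recover the real remote URL from a path requested on the relay."""
    vdir, _, rest = path.lstrip("/").partition("/")
    return REMOTES[vdir] + "/" + rest
```

For example, `to_remote("/db1/search?q=1")` yields `http://db.example.com/search?q=1`, which is the request the relay would then send on the client's behalf.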

The relay server must be able to store and manage any cookies set by the remote database web server. I have handled cookies as a member of the session control data structure in memory. It is easy to insert a cookie into or retrieve a cookie from the session control data.
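As a rough illustration of keeping cookies inside the session record, consider the Python sketch below. The `Session` class and its method names are hypothetical, and the sketch deliberately ignores cookie attributes such as path, domain, and expiry, which a real relay would have to respect.

```python
from http.cookies import SimpleCookie


class Session:
    """Per-user session record; only the cookie member is sketched here."""

    def __init__(self):
        self.cookies = {}  # cookie name -> value, as set by the remote server

    def store(self, set_cookie_header):
        """Record the cookie carried by one Set-Cookie header value."""
        parsed = SimpleCookie()
        parsed.load(set_cookie_header)
        for name, morsel in parsed.items():
            self.cookies[name] = morsel.value

    def cookie_header(self):
        """Build the Cookie header value for the next remote request."""
        return "; ".join(name + "=" + value for name, value in self.cookies.items())
```

Because the cookies live in the relay's session data, they never need to reach the user's browser; the relay attaches them when it forwards a request to the remote server.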

Like any web server in today's world, the relay server anticipates very high hit rates. The relay server uses POSIX threads instead of child processes to ensure high performance, scalability, and efficient handling of session control, since all threads share the same in-memory session data.
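The threaded design can be illustrated with a small accept loop. The sketch below is Python rather than the POSIX threads code of webrelay, and it assumes one thread per connection, which the description above does not confirm; the `handle` function simply returns a fixed response where a real relay would parse and forward the request.

```python
import socket
import threading


def handle(conn):
    """Serve one client connection, then close it."""
    with conn:
        conn.recv(4096)  # read the request; a real relay would parse it
        conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok")


def serve(listener, stop):
    """Accept connections until stop is set, one thread per connection."""
    listener.settimeout(0.1)  # wake up regularly to check the stop flag
    while not stop.is_set():
        try:
            conn, _addr = listener.accept()
        except socket.timeout:
            continue
        threading.Thread(target=handle, args=(conn,), daemon=True).start()
```

Threads created this way see the same session data in memory, whereas child processes would need some form of shared memory or inter-process communication to do so.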

The webrelay software is written for AIX 4.1, 4.2, and 4.3 systems.

Save the webrelay.tar.gz file into a directory where you have enough space. Uncompress the file with the gunzip command and then untar it. That is

	# gunzip webrelay.tar.gz
	# tar xvf webrelay.tar

An executable for AIX 4.3 systems is included; its filename is webrelay. Copy the executable to /usr/bin/webrelay as

	# cp webrelay /usr/bin/webrelay

Create a directory webrelay under /usr/local/, that is

	# cd /usr/local
	# mkdir webrelay

Edit the sample configuration file, dburls.conf, to suit your own needs, then put that file into the /usr/local/webrelay/ directory.

	# cp dburls.conf /usr/local/webrelay/dburls.conf

Another configuration file, siteauth.conf, is needed for any remote site that requires a site-wide username and password in addition to IP checking, where the remote web server uses the Basic Authentication method of HTTP. If you do not need this kind of site authentication, create an empty file in its place.

	# cp siteauth.conf /usr/local/webrelay/siteauth.conf
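For reference, HTTP Basic Authentication sends the username and password, joined by a colon and base64-encoded, in an Authorization header. The Python sketch below shows how such a header value is built; the function name is hypothetical, and how webrelay reads the credentials from siteauth.conf is not shown here because the file format is not described in this document.

```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header value for HTTP Basic Authentication."""
    token = base64.b64encode((username + ":" + password).encode("utf-8"))
    return "Basic " + token.decode("ascii")
```

With the well-known example credentials `Aladdin` and `open sesame`, the function returns `Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==`. Note that base64 is an encoding, not encryption, so the credentials are only as safe as the connection that carries them.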

Add lines into the /etc/rc.tcpip file like

	# Start up webrelay

so that the program will be automatically brought up when the machine is rebooted.

The webrelay logs every user access to the /var/spool/local/log/log.ar_acclog file, and logs any errors to the /var/spool/local/log/log.ar_errlog file. These files have a format similar to that of normal web server log files, so they can be used for statistical analysis.

Many of the default values can be overridden by command line options. See the README file in the package for details.

To actually make use of the webrelay server, you need to set up a "gateway" homepage that presents your users with references to the "virtual" directories corresponding to each of the remote web servers. This is usually the page where you would advertise your library's services to your users. Users click on hyperlinks on that gateway homepage to access the services provided through the webrelay.

Many things are for the moment hardcoded in the program. For example, your site's IP network address and netmask are set in the declarations of the string constants UCIP and UCMASK. Some of the special HTML pages are hardcoded too, with settings designed for the University of Calgary Library. You will probably want to change these hardcoded values and recompile the program before you install it.
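One plausible use of a network address and netmask pair is to decide whether a client connects from inside the site. The Python sketch below illustrates that kind of check; it is an assumption about how UCIP and UCMASK are used, the values shown are documentation placeholders rather than the real ones, and `is_on_site` is a hypothetical name.

```python
import ipaddress

UCIP = "192.0.2.0"        # placeholder network address, not the real value
UCMASK = "255.255.255.0"  # placeholder netmask, not the real value

SITE = ipaddress.ip_network(UCIP + "/" + UCMASK)


def is_on_site(client_ip):
    """Return True if client_ip belongs to the site network."""
    return ipaddress.ip_address(client_ip) in SITE
```

Here `is_on_site("192.0.2.17")` is True and `is_on_site("198.51.100.5")` is False.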

Report any problems to

Peter C.Y. Zhang (zhangc@ucalgary.ca)

Last updated Apr 5, 2000