Next: 37. crond and atd Up: rute Previous: 35. The LINUX File   Contents
In this chapter, we will show how to set up a web server running virtual domains and dynamic CGI web pages. HTML is not covered, and you are expected to have some understanding of what HTML is, or at least where to find documentation about it.
In Section 26.2 we showed a simple HTTP session with the telnet command. A web server is really nothing more than a program that reads a file from the hard disk whenever a GET /<filename>.html HTTP/1.0 request comes in on port 80. Here, we will show a simple web server written in shell script. [Not by me. The author did not put his name in the source, so if you are out there, please drop me an email.] You will need to add the line
to your /etc/inetd.conf file. If you are running xinetd, then you will need to add a file containing
to your /etc/xinetd.d/ directory. Then, you must stop any already running web servers and restart inetd (or xinetd).
You will also have to create a log file ( /usr/local/var/log/sh-httpd.log) and at least one web page ( /usr/local/var/sh-www/index.html) for your server to serve. It can contain, say:
Note that the server runs as nobody, so the log file must be writable by the nobody user, and the index.html file must be readable. Also note the use of the getpeername command, which can be changed to PEER="" if you do not have the netpipes package installed. [I am not completely sure if other commands used here are unavailable on other UNIX systems.]
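The two files described above can be created along the following lines. This is only a sketch: the PREFIX variable and the contents of the page are assumptions made for illustration, and PREFIX defaults to a scratch directory so that nothing under /usr/local is touched unless you ask for it.

```shell
#!/bin/sh
# PREFIX is an assumed variable; set PREFIX=/usr/local to use the paths in the text.
PREFIX="${PREFIX:-$(mktemp -d)}"

mkdir -p "$PREFIX/var/log" "$PREFIX/var/sh-www"

# Empty log file. World-writable is the crude way to let nobody append to it;
# as root, "chown nobody" on the file would be tighter.
: > "$PREFIX/var/log/sh-httpd.log"
chmod a+w "$PREFIX/var/log/sh-httpd.log"

# Placeholder page (the contents are an assumption, not the original example).
cat > "$PREFIX/var/sh-www/index.html" <<'EOF'
<HTML>
<HEAD><TITLE>Test page</TITLE></HEAD>
<BODY><P>Hello from sh-httpd.</P></BODY>
</HTML>
EOF
chmod a+r "$PREFIX/var/sh-www/index.html"

ls -l "$PREFIX/var/log/sh-httpd.log" "$PREFIX/var/sh-www/index.html"
```

The directories leading to these files must also be searchable by the nobody user.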
Now run telnet localhost 80, as in Section 26.2. If that works and your log files are being properly appended (use tail -f ...), you can try to connect to http://localhost/ with a web browser like Netscape.
Notice also that the command getsockname (which tells you which of your own IP addresses the remote client connected to) could allow the script to serve pages from a different directory for each IP address. This is virtual domains in a nutshell. [Groovy, baby, I'm in a giant nutshell.... how do I get out?]
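To make the idea of "a program that reads a file whenever a GET request comes in" concrete, here is a minimal sketch of the core of such a server. It is not the script referred to above: the function name serve_request and the DOCROOT variable are hypothetical, logging is left out, and only GET requests for existing files are answered.

```shell
#!/bin/sh
# DOCROOT is an assumed variable name; the path is the one used in the text.
DOCROOT="${DOCROOT:-/usr/local/var/sh-www}"

# Read one request line such as "GET /index.html HTTP/1.0" from stdin
# and write the response to stdout.
serve_request() {
    read -r method path rest || return 1
    # read keeps the carriage return that HTTP clients send; strip it
    path=$(printf '%s' "$path" | tr -d '\r')
    case "$path" in
        *..*|"") path="" ;;                  # refuse to climb out of DOCROOT
        */)      path="${path}index.html" ;; # a directory means its index page
    esac
    if [ "$method" = GET ] && [ -n "$path" ] && [ -f "$DOCROOT$path" ]; then
        printf 'HTTP/1.0 200 OK\r\nContent-type: text/html\r\n\r\n'
        cat "$DOCROOT$path"
    else
        printf 'HTTP/1.0 404 Not Found\r\nContent-type: text/html\r\n\r\n'
        printf '<HTML><BODY>404 Not Found</BODY></HTML>\n'
    fi
}
```

Because inetd connects the socket to the standard input and output of the program it starts, a script run from inetd would simply end with a call to serve_request. Choosing DOCROOT according to the output of getsockname is all it would take to serve a different directory for each IP address.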
Because all distributions package Apache in a different way, here I assume Apache to have been installed from its source tree, rather than from a .deb or .rpm package. You can refer to Section 24.1 on how to install Apache from its source .tar.gz file like any other GNU package. (You can even install it under Windows, Windows NT, or OS/2.) The source tree is, of course, available from The Apache Home Page <http://www.apache.org>. Here I assume you have installed it in --prefix=/opt/apache/. In the process, Apache will have dumped a huge reference manual into /opt/apache/htdocs/manual/.
Apache has several legacy configuration files: access.conf and srm.conf are two of them. These files are now deprecated and should be left empty. A single configuration file /opt/apache/conf/httpd.conf may contain at minimum:
With the config file ready, you can move the index.html file above to /opt/apache/htdocs/. You will notice the complete Apache manual and a demo page already installed there; you can move them to another directory for the time being. Now run
and then point your web browser to http://localhost/ as before.
Here is a description of the options. Each option is called a directive in Apache terminology. A complete list of basic directives is in the file /opt/apache/htdocs/manual/mod/core.html.
The above is merely the general configuration of Apache. To actually serve pages, you need to define directories, each with a particular purpose, containing particular HTML or graphic files. The Apache configuration file is very much like an HTML document. Sections are started with <section parameter> and ended with </section>. The most common directive of this sort is <Directory /directory>, which defines such a directory. Before defining any directories, we need to limit access to the root directory. This control is critical for security.
This configuration tells Apache about the root directory, giving clients very restrictive access to it. The directives are [Some of these are extracted from the Apache manual.]:
You can see that we give very restrictive Options to the root directory, as well as very restrictive access. The only server feature we allow is FollowSymLinks, then we Deny any access, and then we remove the possibility that a .htaccess file could override our restrictions.
The <Files ...> directive sets restrictions on all files matching a particular regular expression. As a security measure, we use it to prevent access to all .htaccess files as follows:
We are now finally ready to add actual web page directories. These take a less restrictive set of access controls:
Our users may require that Apache know about their private web page directories ~/www/. This is easy to support with the special UserDir directive:
For this feature to work, you must symlink /opt/apache/htdocs/home to /home, and create a directory www/ under each user's home directory. Hitting the URL http://localhost/~jack/index.html will then retrieve the file /opt/apache/htdocs/home/jack/www/index.html. You will find that Apache gives a Forbidden error message when you try to do this. This is probably because the permissions on jack's home directory are too restrictive. You can either make jack's home directory less restricted or increase the privileges of Apache. Running Apache under the www group by using Group www, and then running
is a reasonable compromise.
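One way to open a home directory to the www group without opening it to everyone is sketched below. The helper name open_home_to_group is hypothetical, and the sketch assumes that the www group already exists and that you are allowed to assign it (as root, or as a member of the group).

```shell
#!/bin/sh
# Hypothetical helper: let members of a group (but not other users) enter
# a home directory and its www/ subdirectory.
# Usage: open_home_to_group /home/jack www
open_home_to_group() {
    home_dir=$1
    group=$2
    mkdir -p "$home_dir/www"
    chgrp "$group" "$home_dir" "$home_dir/www" || return 1
    # rwx for the owner, r-x for the group, nothing for anyone else
    chmod 0750 "$home_dir" "$home_dir/www"
}
```

The files under www/ must themselves be readable by the group for Apache to serve them.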
Sometimes, HTML documents will want to refer to a file or graphic by using a simple prefix, rather than a long directory name. Other times, you want two different references to source the same file. The Alias directive creates virtual links between directories. For example, adding the following line means that a URL /icons/bomb.gif will serve the file /opt/apache/icons/bomb.gif:
We do, of course, need to tell Apache about this directory:
You will find the directory lists generated by the preceding configuration rather bland. The directive
causes nice descriptive icons to be printed to the left of the file name. Which icons match which file types is a tricky issue. You can start with:
This requires the Alias directive above to be present. The default Apache configuration contains a far more extensive map of file types.
You can get Apache to serve gzipped files with this:
Now if a client requests a file index.html, but only a file index.html.gz exists, Apache serves the compressed file and labels it with a gzip content encoding, so that the browser decompresses it on the fly. Note that you must have the MultiViews option enabled.
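To try this out you need a page that exists only in compressed form. The following sketch produces one; the WEBDIR variable is an assumption and defaults to a scratch directory, so point it at your document directory to test against Apache.

```shell
#!/bin/sh
# WEBDIR is an assumed variable; it defaults to a scratch directory.
WEBDIR="${WEBDIR:-$(mktemp -d)}"

# Create a small page if there is none to compress.
[ -f "$WEBDIR/index.html" ] ||
    echo '<HTML><BODY>compressed page</BODY></HTML>' > "$WEBDIR/index.html"

gzip -9 -f "$WEBDIR/index.html"      # replaces index.html with index.html.gz
gzip -dc "$WEBDIR/index.html.gz"     # show that the contents are intact
```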
The next options cause Apache to serve index.html.language-code when index.html is requested, filling in the preferred language code sent by the web browser. Adding these directives causes your Apache manual to display correctly and will properly show documents that have non-English translations. Here also, the MultiViews option must be enabled.
The LanguagePriority directive indicates the preferred language if the browser did not specify any.
Some files might contain a .koi8-r extension, indicating a Russian character set encoding for this file. Many languages have such custom character sets. Russian files are named webpage.html.ru.koi8-r. Apache must tell the web browser about the encoding type, based on the extension. The directives for Japanese, Russian, and UTF-8 [UTF-8 is a Unicode character set encoding useful for any language.] are as follows:
Once again, the default Apache configuration contains a far more extensive map of languages and character sets.
Apache actually has a built-in programming language that interprets .shtml files as scripts. The output of such a script is returned to the client. Most of a typical .shtml file will be ordinary HTML, which will be served unmodified. However, lines like
will be interpreted, and their output included into the HTML--hence the name server-side includes. Server-side includes are ideal for HTML pages that contain mostly static HTML with small bits of dynamic content. To demonstrate, add the following to your httpd.conf:
Create a directory /opt/apache/htdocs/ssi with the index file index.shtml:
and then a file footer.html containing anything you like. It is obvious how useful this procedure is for creating many documents with the same banner by means of a #include statement. If you are wondering what other variables you can print besides DATE_LOCAL, try the following:
You can also go to http://localhost/manual/howto/ssi.html to see some other examples.
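A pair of files for experimenting with server-side includes can be generated as follows. This is a sketch, not the original example: the page contents are placeholders, the SSI_DIR variable is an assumption (the text uses /opt/apache/htdocs/ssi), and the directory is assumed to have been configured for includes as described above. The echo, printenv, and include elements are standard server-side include commands.

```shell
#!/bin/sh
# SSI_DIR is an assumed variable; it defaults to a scratch directory.
SSI_DIR="${SSI_DIR:-$(mktemp -d)}"
mkdir -p "$SSI_DIR"

# The quoted 'EOF' keeps the shell from touching the page text.
cat > "$SSI_DIR/index.shtml" <<'EOF'
<HTML>
<BODY>
<P>The date today is <!--#echo var="DATE_LOCAL" -->.</P>
<PRE>
<!--#printenv -->
</PRE>
<!--#include virtual="footer.html" -->
</BODY>
</HTML>
EOF

echo '<HR><P>Common footer for every page.</P>' > "$SSI_DIR/footer.html"
```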
(I have actually never managed to figure out why CGI is called CGI.) CGI is where a URL points to a script. What comes up in your browser is the output of the script (were it to be executed) instead of the contents of the script itself. To try this, create a file /opt/apache/htdocs/test.cgi:
Make this script executable with chmod a+x test.cgi and test the output by running it on the command-line. Add the line
to your httpd.conf file. Next, modify your Options for the directory /opt/apache/htdocs to include ExecCGI, like this:
After restarting Apache you should be able to visit the URL http://localhost/test.cgi. If you run into problems, don't forget to run tail /opt/apache/logs/error_log to get a full report.
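For reference, a CGI script of the kind described here can be as small as the following sketch. The page it prints is a placeholder rather than the original example, and the CGI_DIR variable is an assumption that defaults to a scratch directory (the text uses /opt/apache/htdocs).

```shell
#!/bin/sh
# CGI_DIR is an assumed variable; it defaults to a scratch directory.
CGI_DIR="${CGI_DIR:-$(mktemp -d)}"

cat > "$CGI_DIR/test.cgi" <<'EOF'
#!/bin/sh
# A CGI script writes a header, an empty line, and then the document.
echo 'Content-type: text/html'
echo
echo '<HTML><BODY>'
echo '<H1>Hello from a CGI script</H1>'
echo "<P>Generated at $(date)</P>"
echo '</BODY></HTML>'
EOF

chmod a+x "$CGI_DIR/test.cgi"

# Test on the command-line before trying the URL. Running it through sh
# also works where the scratch directory does not allow execution.
sh "$CGI_DIR/test.cgi"
```

The empty line after the header matters: without it, the server cannot tell where the headers end and the document begins, and the request fails with an error.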
To get a full list of environment variables available to your CGI program, try the following script:
The script will show ordinary bash environment variables as well as more interesting variables like QUERY_STRING. Change your script to
and then go to the URL http://localhost/test/test.cgi?xxx=2&yyy=3. It is easy to see how variables can be passed to the shell script.
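Inside the script, QUERY_STRING arrives as the single string xxx=2&yyy=3. One way to pick it apart is sketched below; the function names are hypothetical, and the sketch does not decode URL escapes such as %20 or +.

```shell
#!/bin/sh
# Print each name=value pair of a query string on its own line.
# The values are only printed, never evaluated by the shell.
print_query_pairs() {
    printf '%s\n' "$1" | tr '&' '\n'
}

# Print the value of one named variable, or nothing if it is absent.
# Usage: query_value "$QUERY_STRING" yyy
query_value() {
    print_query_pairs "$1" | sed -n "s/^$2=//p" | sed -e q
}
```

For the URL above, query_value "$QUERY_STRING" yyy prints 3.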
The preceding example is not very interesting. However, it gets useful when scripts have complex logic or can access information that Apache can't access on its own. In Chapter 38 we see how to deploy an SQL database. When you have covered SQL, you can come back here and replace your CGI script with:
This script will dump the table list of the template1 database if it exists. Apache will have to run as a user that can access this database, which means changing User nobody to User postgres. [Note that for security you should really limit who can connect to the postgres database. See Section 38.4.]
To create a functional form, use the HTML <FORM> tag as follows. A file /opt/apache/htdocs/test/form.html could contain:
Note how this form calls our existing test.cgi script. Here is a script that adds the entered data to a postgres SQL table:
Note how the first lines of script remove all unwanted characters from QUERY_STRING. Such processing is imperative for security because shell scripts can easily execute commands should characters like $ and ` be present in a string.
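The kind of filtering meant here can be sketched with tr, which deletes every character that is not on a list of permitted ones. The function name sanitize and the particular list are assumptions for illustration; in practice you would shrink the list to what your script really needs.

```shell
#!/bin/sh
# Hypothetical helper: delete every character outside a conservative list.
# The trailing "-" in the list is a literal hyphen, not a range.
sanitize() {
    printf '%s' "$1" | LC_ALL=C tr -cd 'A-Za-z0-9 %&+,./:=@_~-'
}

# Only ever use the filtered copy, never QUERY_STRING itself.
opts=$(sanitize "$QUERY_STRING")
```

A request such as name=jack&x=$(rm -rf) comes out as name=jack&x=rm -rf: the text survives, but the characters that would make the shell act on it are gone.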
To use the alternative ``POST'' method, change your FORM tag to
The POST method sends the query text through stdin of the CGI script. Hence, you need to also change your opts= line to
Running Apache as a privileged user has security implications. Another way to get this script to execute as user postgres is to create a setuid binary. To do this, create a file test.cgi by compiling the following C program similar to that in Section 33.2.
Then run chown postgres:www test.cgi and chmod a-w,o-rx,u+s test.cgi (or chmod 4550 test.cgi). Recreate your shell script as test.sh and go to the URL again. Apache runs test.cgi, which becomes user postgres, and then executes the script as the postgres user. Even with Apache as User nobody your script will still work. Note how your setuid program is insecure: it takes no arguments and performs only a single function, but it takes environment variables (or input from stdin) that could influence its functionality. If a login user could execute the script, that user could send data via these variables that could cause the script to behave in an unforeseen way. An alternative is:
This script nullifies the environment before starting the CGI, thus forcing you to use the POST method only. Because the only information that can be passed to the script is a single line of text (through the -e q option to sed) and because that line of text is carefully stripped of unwanted characters, we can be much more certain of security.
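The two ideas in this paragraph, an emptied environment and a single filtered line of input, can be tried out in isolation. The following is a sketch in shell only and not the wrapper discussed above; the function name read_post_line and the list of permitted characters are assumptions.

```shell
#!/bin/sh
# Read at most one line of POST data from stdin and keep only the
# characters on a conservative list.
read_post_line() {
    sed -e q | LC_ALL=C tr -cd 'A-Za-z0-9 %&+,./:=@_~-'
}

# env -i starts a command with an empty environment, so a QUERY_STRING
# set by the caller never reaches it. This prints QUERY_STRING=[].
QUERY_STRING='should not be seen' \
    env -i /bin/sh -c 'echo "QUERY_STRING=[${QUERY_STRING}]"'
```

With the environment gone, stdin is the only channel left for passing data to the script, which is why only the POST method still works.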
CGI execution is extremely slow if Apache has to invoke a shell script for each hit. Apache has a number of facilities for built-in interpreters that will parse script files with high efficiency. A well-known programming language developed specifically for the Web is PHP. PHP can be downloaded as source from The PHP Home Page <http://www.php.net> and contains the usual GNU installation instructions.
Apache has the facility for adding functionality at runtime using what it calls DSO (Dynamic Shared Object) files. This feature is for distribution vendors who want to ship split installs of Apache that enable users to install only the parts of Apache they like. This is conceptually the same as what we saw in Section 23.1: To give your program some extra feature provided by some library, you can either statically link the library to your program or compile the library as a shared .so file to be linked at run time. The difference here is that the library files are (usually) called mod_name and are stored in /opt/apache/libexec/. They are also only loaded if a LoadModule name_module appears in httpd.conf. To enable DSO support, rebuild and reinstall Apache starting with:
Any source package that creates an Apache module can now use the Apache utility /opt/apache/bin/apxs to tell it about the current Apache installation, so you should make sure this executable is in your PATH.
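Before building any modules, it is worth checking that the rebuilt server really has DSO support. Running httpd -l lists the modules compiled into the server, and mod_so.c is the one that implements LoadModule. The helper below is a sketch with a hypothetical name; it reads such a list from stdin.

```shell
#!/bin/sh
# Succeeds if the module list on stdin includes mod_so.c.
has_dso_support() {
    grep -q 'mod_so\.c'
}

# Typical use, assuming the install prefix from the text:
#   /opt/apache/bin/httpd -l | has_dso_support && echo "DSO support present"
```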
You can now follow the instructions for installing PHP, possibly beginning with ./configure --prefix=/opt/php --with-apxs=/opt/apache/bin/apxs --with-pgsql=/usr. (This assumes that you want to enable support for the postgres SQL database and have postgres previously installed as a package under /usr.) Finally, check that a file libphp4.so eventually ends up in /opt/apache/libexec/.
Your httpd.conf then needs to know about PHP scripts. Add the following lines
and then create a file /opt/apache/htdocs/hello.php containing
and test by visiting the URL http://localhost/hello.php.
Programming in the PHP language is beyond the scope of this book.
Virtual hosting is the use of a single web server to serve the web pages of multiple domains. Although the web browser seems to be connecting to a web site that is an isolated entity, that web site may in fact be hosted alongside many others on the same machine.
Virtual hosting is rather trivial to configure. Let us say that we have three domains: www.domain1.com, www.domain2.com, and www.domain3.com. We want domains www.domain1.com and www.domain2.com to share IP address 126.96.36.199, while www.domain3.com has its own IP address of 188.8.131.52. The sharing of a single IP address is called name-based virtual hosting, and the use of a different IP address for each domain is called IP-based virtual hosting.
If our machine has one IP address, 126.96.36.199, we may need to configure a separate IP address on the same network card as follows (see Section 25.9):
For each domain, we now create a top-level directory /opt/apache/htdocs/www.domain?.com/. We need to tell Apache that we intend to use the IP address 126.96.36.199 for several hosts. We do that with the NameVirtualHost directive. Then for each host, we must specify a top-level directory as follows:
All that remains is to configure a correct DNS zone for each domain so that lookups of www.domain1.com and www.domain2.com return 126.96.36.199 while lookups of www.domain3.com return 188.8.131.52.
You can then add index.html files to each directory.
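The whole setup for the three domains can be sketched as a short script that creates the directories and writes out matching configuration sections. The variable and function names are assumptions, the addresses are the ones given when the domains were introduced, and both output locations default to scratch files so that the sketch can be run without touching a real installation.

```shell
#!/bin/sh
# Assumed variables: HTDOCS would be /opt/apache/htdocs on the real server.
HTDOCS="${HTDOCS:-$(mktemp -d)}"
VHOST_CONF="${VHOST_CONF:-$(mktemp)}"
SHARED_IP=126.96.36.199      # www.domain1.com and www.domain2.com
OWN_IP=188.8.131.52          # www.domain3.com

# Create the top-level directory for one host and print its section.
# Usage: vhost ADDRESS HOSTNAME
vhost() {
    mkdir -p "$HTDOCS/$2"
    [ -f "$HTDOCS/$2/index.html" ] ||
        echo "<HTML><BODY>$2</BODY></HTML>" > "$HTDOCS/$2/index.html"
    printf '<VirtualHost %s>\n    ServerName %s\n    DocumentRoot %s/%s\n</VirtualHost>\n\n' \
        "$1" "$2" "$HTDOCS" "$2"
}

{
    echo "NameVirtualHost $SHARED_IP"
    echo
    vhost "$SHARED_IP" www.domain1.com
    vhost "$SHARED_IP" www.domain2.com
    vhost "$OWN_IP"    www.domain3.com
} > "$VHOST_CONF"

cat "$VHOST_CONF"
```

The generated text can be pasted into httpd.conf. Only the shared address needs a NameVirtualHost line, because Apache has to inspect the host name in the request only where several hosts answer on the same address.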
This is a mirror of the RUTE project by Paul Sheer (RUTE = Rute User's Tutorial and Exposition). The official project home page can be found on the web at www.icon.co.za/~psheer/rute-home.html. This mirror was last updated to version 1.0.0 on Saturday, 28 January 2006 22:06 +0100. The RUTE tutorial can also be downloaded in various file formats for offline reading.