CGI: Getting information about the user

Last updated: 08.05.2002


This note describes how to get information about a remote user from within a CGI-bin script, and lists the information available. The CallerID script is an example of this.


Environment variables

The primary source of information about a user comes from the environment. The HTTP server sets up certain environment variables to contain information about the remote user. In Perl, these variables are accessible through the %ENV associative array. In C, you will need to use the getenv call to access the values of the variables. The relevant variables are:

Variable           Description
REMOTE_USER        Name of the remote user
REMOTE_IDENT       Name of the remote user (as returned by identd)
REMOTE_HOST        Fully-qualified domain name of the remote host
REMOTE_ADDR        IP number of the remote host
HTTP_USER_AGENT    Name of the software used (i.e. the browser)
HTTP_REFERER       URL of the page from which the script was accessed

In most cases, you can count on REMOTE_ADDR and HTTP_USER_AGENT being set up. HTTP_REFERER will only be set up if the user actually came to the script from a page that refers to it (as opposed to simply typing in the URL of the script). REMOTE_HOST is often, but not always, set up. In particular, it is unlikely to be set up for remote users with dynamically-allocated IP numbers (i.e. most dialup users). The administrator of your server may also have switched off reverse DNS lookups - which improves the efficiency of the server noticeably - in which case REMOTE_HOST will be unavailable for all hosts.

The REMOTE_USER variable is almost never set up correctly, unless the user has been forced to 'log on' to a page by an access control mechanism such as htaccess. Instead, it may be possible to use the value of REMOTE_IDENT, which is returned from any machine running an identd daemon or similar. Generally, only UNIX machines run identd, and not all such machines will answer an ident query. If your server administrator has switched off ident lookups on your server, REMOTE_IDENT will never be set up for any user.
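
Since any of these variables may be missing, a script should check each one before using it. The following is a minimal sketch of that, not taken from the CallerID script; the function name remote_user_info and the '(not set)' placeholder are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the user-related CGI variables from a hash reference (normally
# \%ENV), substituting a placeholder for any the server did not set.
# The name remote_user_info and the placeholder text are illustrative only.
sub remote_user_info {
    my ($env) = @_;
    my %info;
    foreach my $name (qw(REMOTE_USER REMOTE_IDENT REMOTE_HOST
                         REMOTE_ADDR HTTP_USER_AGENT HTTP_REFERER)) {
        my $value = $env->{$name};
        $info{$name} = (defined $value && length $value) ? $value : '(not set)';
    }
    return %info;
}

# Print whatever the server supplied as a plain-text CGI response.
my %info = remote_user_info(\%ENV);
print "Content-type: text/plain\n\n";
foreach my $name (sort keys %info) {
    print "$name: $info{$name}\n";
}
```

Passing the environment in as a hash reference, rather than reading %ENV inside the function, makes it easy to try the function with sample values outside a web server.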

Getting more information

Given an IP number, you may be able to use the gethostbyaddr function to find out the name of the host. Here's a Perl snippet that does that:

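    use Socket;    # provides the AF_INET constant

    # Pack the dotted-quad address into the four-byte form gethostbyaddr expects
    $packed_address = pack("C4", split(/\./, $ENV{"REMOTE_ADDR"}));
    ($host, $aliases, $type, $length, @addrs)
        = gethostbyaddr($packed_address, AF_INET);
    # Convert each returned address back to dotted-quad notation
    $addresses = "";
    foreach (@addrs) {
        $addresses .= join(".", unpack("C4", $_)) . " ";
    }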

This snippet packs the string containing the remote host's IP number into a four-byte (32-bit) value, and passes that to gethostbyaddr. The second argument to gethostbyaddr is the address family. For IPv4 addresses this is AF_INET, whose value is 2 on most systems; the Socket module exports the constant AF_INET, which is safer than hard-coding the number. gethostbyaddr looks up the IP number and returns the name of the host associated with that number, along with some related information. If the server has successfully filled in the REMOTE_HOST variable, this is more or less redundant: you're unlikely to learn much more of any value by calling gethostbyaddr. If, on the other hand, DNS lookups have been switched off for your server, then this technique may well be useful for recovering hostnames.
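
The fallback described above (use REMOTE_HOST if the server set it, otherwise look the address up yourself) might be sketched as follows. The function name remote_hostname is made up for illustration. The sketch assumes IPv4 addresses only and uses the Socket module's inet_aton in place of the hand-written pack call.

```perl
use strict;
use warnings;
use Socket qw(inet_aton AF_INET);

# Return the best available name for the remote host: REMOTE_HOST if the
# server set it, otherwise a reverse lookup of REMOTE_ADDR, otherwise the
# bare address. Returns undef if there is no usable IPv4 address at all.
# The name remote_hostname is illustrative only.
sub remote_hostname {
    my ($env) = @_;

    my $host = $env->{REMOTE_HOST};
    return $host if defined $host && length $host;

    my $addr = $env->{REMOTE_ADDR};
    return undef unless defined $addr && $addr =~ /^\d{1,3}(?:\.\d{1,3}){3}\z/;

    my $packed = inet_aton($addr);      # dotted quad -> four packed bytes
    return $addr unless defined $packed;

    # In scalar context gethostbyaddr returns just the host name,
    # or undef if the lookup fails.
    my $name = gethostbyaddr($packed, AF_INET);
    return defined $name ? $name : $addr;
}

my $remote = remote_hostname(\%ENV);
print "Remote host: ", (defined $remote ? $remote : "unknown"), "\n";
```

Bear in mind that a reverse lookup can take several seconds if the remote name server is slow, which is the usual reason server administrators switch lookups off in the first place.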

Using finger

You can use the finger command to try to find out more. If you have a valid user and hostname, you can build it into an address, open a pipe to the finger command and display or process the result. This is what the CallerID script does.

Note that you must be very careful when executing shell commands that contain user input. If the remote user is able to convince your server that their name is, for example, "foo; rm -r *;", then using this name as an argument to finger could have disastrous consequences. You should take care to 'untaint' the arguments passed before use. Here is another example snippet:

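    $tainted = "$user\@$host";    # escape the @ so Perl does not interpolate an array
    if ($tainted =~ /^([\@\-\w.]+)$/) {
        $fingerstring = $1;       # only set after a successful match
    } else {
        die "Unsafe characters in finger address\n";
    }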

This will only accept strings that consist entirely of word characters (letters, digits and underscores), dots, dashes and '@'; anything else, including spaces, semicolons and other shell punctuation, causes the match to fail. Note that on some machines even this may not be secure; a well-known method of attack is to pass such a long string to finger that it overflows the buffer allocated to hold command-line arguments and writes into the area containing the next instruction to execute. Most UNIX systems have patched this loophole now, but you may wish to take only a sub-string of the argument passed anyway.
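
As a further precaution, the shell can be avoided altogether. The sketch below is one possible arrangement, not the method used by the CallerID script: it validates the address, limits its length, and then uses the list form of open, which hands the arguments straight to the program. The function names, the 64-character limit and the stricter pattern (which also refuses a leading dash, so the argument cannot be mistaken for a command-line option) are all assumptions made for illustration. Over-long input is rejected here rather than cut down to a sub-string. The list form of a piped open assumes a reasonably recent Perl on a UNIX-like system.

```perl
use strict;
use warnings;

# Validate a finger address and return an untainted copy, or undef if it
# is too long or contains anything outside the permitted characters.
# The default limit of 64 characters is an arbitrary illustrative choice.
sub untaint_finger_target {
    my ($tainted, $max_length) = @_;
    $max_length = 64 unless defined $max_length;
    return undef unless defined $tainted;
    return undef if length($tainted) > $max_length;

    # Optional user name, then '@', then a host name. Each part must
    # begin with a word character, so a leading '-' is refused.
    if ($tainted =~ /^((?:\w[\w.\-]*)?\@\w[\w.\-]*)\z/) {
        return $1;
    }
    return undef;
}

# Run finger on an already-validated address and return its output lines.
# With the list form of open no shell is involved, so characters such as
# ';' or '*' would have no special meaning even if they got this far.
sub finger {
    my ($target) = @_;
    open(my $pipe, '-|', 'finger', $target) or return;
    my @lines = <$pipe>;
    close($pipe);
    return @lines;
}

# Example use (hypothetical values for $user and $host):
#   my $target = untaint_finger_target("$user\@$host");
#   print finger($target) if defined $target;
```

If the script runs in taint mode (perl -T), Perl will also insist that $ENV{PATH} is set to a known safe value before it will run an external program.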

If you don't have a valid username, you may still be able to instruct finger to list the users currently on the remote host, e.g. finger @foo.com to see who's currently logged on.

Typically, only UNIX hosts will respond to a finger request, although Interarchy provides the same capability for Apple Macintosh systems. You cannot generally make any assumptions about the format of the information returned by finger or, indeed, about whether any information will be returned at all.
