Perl CGI debugging

Tips on debugging simple Perl CGI scripts

Overview

This note describes possible causes for errors in CGI scripts, particularly those that lead to a 'Server 500 Error' and a terse and unhelpful message advising you to contact the server administrator. It also contains some suggestions for writing Perl scripts in a way that facilitates debugging. The note mostly deals with Perl scripts and UNIX platforms, but some of the suggestions can apply to CGI programs written in other languages or running on other platforms as well. You can treat this note as a kind of checklist - when you get an error, read through it and check each possible cause in turn.

This note was written many years ago, before alternatives to Perl (such as PHP) were widely available. As a result, it's somewhat dated. Moreover, some tips such as use of eval() are now simply part of any Perl programmer's toolkit of best practices. However, I have kept this here in the hope that it might still be occasionally useful to someone.

Error 500

Messages reporting 'Error 500' or 'internal server error' can be caused by a number of possible problems. This section briefly lists and explains some of the commonest causes of such errors.

Execute permissions not set correctly

For a script to run, its permission bits must be set to allow it to be executed. If your server runs as the user nobody, you'll need to make your script world-executable with:

chmod a+x MyScript.pl

If the server runs scripts with your own UID, you can get away with

chmod u+x MyScript.pl
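
If you're not sure what the permission bits currently are, a short Perl check along the following lines may help. This is only a sketch: the exec_bits sub is a name made up for this note, and it looks only at the mode bits of the file itself, not at whether the server's user can reach the directory that holds it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report which execute bits are set on a file.
# Returns a hash ref, or nothing if the file can't be stat'ed.
sub exec_bits {
    my ($path) = @_;
    my @st = stat($path) or return;
    my $mode = $st[2] & 07777;            # keep only the permission bits
    return {
        mode  => sprintf("%04o", $mode),
        user  => ($mode & 0100) ? 1 : 0,
        group => ($mode & 0010) ? 1 : 0,
        other => ($mode & 0001) ? 1 : 0,  # needed if the server runs as nobody
    };
}

# Check every file named on the command line.
for my $script (@ARGV) {
    my $bits = exec_bits($script);
    if ($bits) {
        print "$script: mode $bits->{mode}, user-x=$bits->{user} ",
              "group-x=$bits->{group} other-x=$bits->{other}\n";
    } else {
        print "$script: cannot stat: $!\n";
    }
}
```

Saved under a name of your choosing, it would be run at the shell prompt with the script to inspect as its argument, e.g. MyScript.pl.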

Bad interpreter line

A Perl script or other interpreted CGI script should begin with a line that identifies the interpreter on the local system, e.g.

#!/usr/bin/perl

If the line doesn't give the path of a valid interpreter, an error will be generated. On UNIX systems, you can typically find the path to the Perl interpreter by using:

which perl

The interpreter line must be the first line of the script, and must begin with '#!'. The same principles apply when writing CGI scripts in other interpreted languages such as Python, sh, csh etc.
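
The check can also be automated. The following is a minimal sketch, assuming a UNIX system; interpreter_of is a hypothetical helper written for this note. It reads the first line of a script and reports whether the interpreter named there exists and is executable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the interpreter path named on the first line
# of a script, or nothing if that line doesn't start with '#!'.
sub interpreter_of {
    my ($script) = @_;
    open(my $fh, '<', $script) or die "Can't open $script: $!";
    my $first = <$fh>;
    close($fh);
    return unless defined $first && $first =~ m{^#!\s*(\S+)};
    return $1;
}

# Check every script named on the command line.
for my $script (@ARGV) {
    my $interp = interpreter_of($script);
    if (!defined $interp) {
        print "$script: line 1 is not a '#!' line\n";
    } elsif (-x $interp) {
        print "$script: interpreter $interp looks usable\n";
    } else {
        print "$script: interpreter $interp is missing or not executable\n";
    }
}
```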

Syntax error in script

Any syntax errors in your script will cause a 500 error. Perl allows you to check the script at the command line with:

perl -c MyScript.pl

Another useful option is:

perl -w MyScript.pl

which issues compiler warnings that may highlight possible runtime problems with the script. You may also find the error message generated by Perl in your system's error log.

Missing required file

If a library requested with require or use can't be found, a 500 error will occur. Normally syntax-checking the script should detect this error, but in some cases the environment in which scripts are executed by the server may be different from the environment set up when the script is run at the command-line. Perl might find the library file correctly in one case, but not in another.
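
One way to compare the two environments is to print Perl's library search path (@INC) and see where, if anywhere, a given module is found. The sketch below is one possible approach; locate_module is a hypothetical helper, and it only handles ordinary directory entries in @INC.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first file under @INC that would satisfy
# "use $module", or nothing if no directory contains it.
sub locate_module {
    my ($module) = @_;
    (my $rel = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    for my $dir (@INC) {
        next if ref $dir;                    # skip code-ref hooks in @INC
        my $path = File::Spec->catfile($dir, $rel);
        return $path if -f $path;
    }
    return;
}

print "Library search path:\n";
print "  $_\n" for @INC;

# Look up every module named on the command line.
for my $module (@ARGV) {
    my $found = locate_module($module);
    print "$module: ", (defined $found ? $found : "not found"), "\n";
}
```

Run once at the shell prompt and once as a CGI script (with a Content-type header printed first), it should show whether the server sees a different search path.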

Runtime error in script

If the script runs into an execution error - for instance, a file not found, division by zero etc. - then a 500 error will occur. See below for a tip on how to trap runtime errors for easier debugging. Runtime errors can cause intermittent failure of the script, where the script generates an error only with certain input values. Runtime errors, particularly of the 'file not found' variety, are often caused by differences between the environment the script sees when executed under the server (working directory, user, environment variables) and the environment when executed from the command line. Trapping the error and printing the full path to the file that the script thinks it wants can often be very helpful.
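
As an illustration of printing the full path, here is a minimal sketch. The sub name open_or_explain and the file name used are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;

# Hypothetical helper: open a file for reading; on failure, die with the
# absolute path that was actually tried and the current working directory.
sub open_or_explain {
    my ($path) = @_;
    my $full = File::Spec->rel2abs($path);
    open(my $fh, '<', $path)
        or die "Can't open '$full' (cwd is " . getcwd() . "): $!\n";
    return $fh;
}

# Example: the relative path below is made up, so the open fails and the
# trapped message shows where the script really looked.
my $fh = eval { open_or_explain('data/no-such-file.txt') };
print "Trapped: $@" unless $fh;
```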

Invalid HTTP header sent

The first output sent by a script must always be a valid HTTP header, typically:

Content-type: text/html

The header line must end with two linefeeds (in other words, it must be followed by a blank line) before the actual content begins. Remember to check that print statements writing to other streams really do write to those other streams and not to standard output. Also beware of debugging comments written to standard error: some servers treat whatever arrives on either STDOUT or STDERR as the script's output, and because STDERR is unbuffered (text written to STDERR is sent out immediately) while STDOUT typically defaults to block-buffering when it isn't connected to a terminal (text is written only when enough of it has been printed to fill a buffer), STDERR may 'get there first'. The following script would cause errors on some systems:

#!/usr/local/bin/perl

print "Content-type: text/plain\n\n";
print STDERR "OK so far\n";
print "Succeeded.";

because the server would see the 'OK so far' first, and fail to make sense of it. You can make Perl flush STDOUT after every print, so that output is sent in the order it was written, with:

$|=1;

placed early on in your script, which is probably a good thing to do anyway.
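
Putting the pieces together, a corrected version of the script above might look like this. It's only a sketch: the http_header sub is a made-up convenience, not part of any library.

```perl
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;    # flush STDOUT after every print, so the header goes out first

# Hypothetical helper: build the header, ending with the required blank line.
sub http_header {
    my ($type) = @_;
    return "Content-type: $type\n\n";
}

print http_header('text/plain');
print STDERR "OK so far\n";    # now leaves the script after the header
print "Succeeded.\n";
```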

Wrong interpreter version

Make sure that you're not trying to run a Perl 5 script on a Perl 4 (or earlier!) interpreter. For Perl on UNIX, try:

perl -e 'print "$]\n"'

or

perl -v

at the UNIX prompt to see which version of Perl you're running. If it turns out to be Perl 4, there may be a Perl 5 interpreter installed in the same directory, but with the name perl5. Note also that Perl 4 scripts run on Perl 5 interpreters may give problems; in particular, double-quoted strings that contain '@' may cause errors, because Perl 5 tries to interpolate an array there. The '@' now needs to be escaped by inserting a backslash before it wherever it occurs in such a string.
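
Both points can be seen in a few lines. This is an illustrative sketch: the minimum version 5.006 and the example.com address are assumptions chosen for the example.

```perl
#!/usr/bin/perl
require 5.006;    # assumed minimum; fails with a clear message on older Perls
use strict;
use warnings;

# $] holds the interpreter version as a number, e.g. 5.036000 for Perl 5.36.0
print "Running under Perl $]\n";

# In a double-quoted string '@' must be escaped, or Perl 5 will try to
# interpolate an array called @example.
my $escaped = "webmaster\@example.com";

# Single-quoted strings never interpolate, so no backslash is needed.
my $single = 'webmaster@example.com';

print "Both strings hold the same address\n" if $escaped eq $single;
```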

Server configuration errors

The bottom line, of course, is that the server may be configured wrongly. Some servers, for example, will throw a server error if you use POST to invoke a script stored in a directory that the server doesn't think should hold scripts. If using POST causes an error, and using GET (on the same script) returns the text of the script, it's quite likely that the directory isn't recognised as a valid directory for CGI scripts. At which point there's nothing for it but to descend into the configuration file with pick and shovel. But remember, if you're not sure what you're doing, "Meddle not in the affairs of servers, for they are subtle, and quick to anger."

Error 404

404 errors are 'Object not found'. Just as you can get 'not found' errors with HTML documents, so you can get them with CGI scripts, and for the same reason. The solution is to check your URL, and the path to your file, and ensure that the URL really does point to the place that you think it does. The relationship between a pathname on the server and the correct URL for referencing it can sometimes be non-obvious, particularly if server scripts are run in a chrooted environment (where, for security reasons, scripts run by the server are set up to see only a part of the filesystem), or the server configuration files specify a complex mapping from a URL path to an actual path in the filesystem. Remember also that some servers may not allow subdirectories in a designated cgi-bin directory, thus:

https://www.foo.com/cgi-bin/MyScript.pl

might be valid, while:

https://www.foo.com/cgi-bin/stuff/MyScript.pl

would fail.

"Document contains no data"

Sometimes, rather than getting an error message from the server, you may simply see a 'Document contains no data' message from your browser. This typically suggests that your script sent the Content-type header successfully - in other words, that it managed to execute at least in part - but failed to send any further information. Among the possible explanations are:

Script error

If your script fails after it has successfully sent a valid content header, you may not see any error reports from the server. Any of the usual runtime errors could cause this (such as files that couldn't be opened, 'die()' statements, division by zero etc.). Use the eval() technique to trap this kind of error.

Server timeout

If your script takes a very long time to run, the connection may time out before it sends any more data. Either the server or the browser might give up in disgust if it looks like the script isn't ever going to finish. You could possibly adjust the server's timeout period, but in general a user's attention span is likely to be shorter than the server timeout, so if your script doesn't finish in time, you've probably lost them anyway.

If your script really does take a long time to run, your best solution is to make the results available as they come in; rather than buffering them up, try to trickle them down towards the user as they become available. If this isn't practical (your script has to do a lengthy computation before it can deliver any data, but there are no intermediate results that could usefully be sent) then you may have to adopt another strategy. One is simply to mail the results to the user when they're ready. The other is to use server push techniques to feed progress reports to the user intermittently.

However long your computation takes, it will seem longer if the user has no feedback to suggest that something's actually happening. A visible progress indicator and an estimate of time remaining might just persuade them to stay online rather than clicking on to the next item.
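
Trickling results to the user as they become available might be sketched as follows. The do_step and progress_line subs are hypothetical stand-ins for real work, and some servers or proxies may still buffer the output whatever the script does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;    # send each chunk as soon as it is printed

# Hypothetical stand-in for one slice of a long computation.
sub do_step {
    my ($n) = @_;
    return $n * $n;
}

# Format one progress line; kept separate so it is easy to check.
sub progress_line {
    my ($done, $total, $result) = @_;
    return sprintf("Step %d of %d done (%d%%): result %s\n",
                   $done, $total, int(100 * $done / $total), $result);
}

print "Content-type: text/plain\n\n";

my $total = 5;
for my $step (1 .. $total) {
    my $result = do_step($step);
    print progress_line($step, $total, $result);   # goes out immediately
}
print "Finished.\n";
```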

No output

For some reason, your script genuinely isn't sending any more data. It executes successfully, but success doesn't include writing anything to STDOUT. Check the logic of your script to be sure that all output statements are actually reached, and check how your streams are set up to be sure that you're really writing to STDOUT. Running the script at the shell prompt may help spot this kind of error.

Transient network or server failures

More rarely, errors can result from broken connections or from server failure (e.g. running out of memory, or being unable to fork any more processes). If your script runs intermittently, or runs one day and not the next, you might be suffering from this kind of problem.

Other problems

Some other possible problems include:

Script text displayed instead of executing

When you try to run the script, the script doesn't execute; instead, the browser simply displays the text of the script. The most likely reason for this is that you've installed the script in the wrong place, or given it the wrong name. Most servers are set up to execute only scripts placed in certain directories, or, more rarely, with certain suffixes (e.g. '.cgi'). If you put a script in the wrong directory, it won't be executed, and the text of the script will be returned as an ordinary text file.

Browser prompts you to save file when running a script

Instead of the output page you expect to see, you may receive a message from your browser asking you to save a file of type 'unknown' (or 'application/x-www-form-urlencoded' etc.). This is typically caused by a failure to send a proper Content-type: header. See the note about invalid HTTP headers above.

'Here-document' problems

Among the hardest syntax errors to debug in Perl are those associated with 'here-documents'. These are expressions of the form:

    print << "EndOfText";
Here's some text
EndOfText

The quoted 'EndOfText' token and the token that terminates the text to be printed must match exactly, or you'll see an error (when you check syntax) along the lines of:

Can't find string terminator "EndOfText" anywhere before EOF

Whitespace characters before or after the terminating token can cause the match to fail, so the 'EndOfText' token (or whatever string you're using) must be flush against the left margin, and there must be no trailing space after it. A particularly nasty 'gotcha' occurs when moving text files from DOS/Windows to UNIX. Because DOS/Windows uses two characters - <CR> and <LF> - to delimit lines, where UNIX only uses one - <LF> - you could end up with invisible <CR> characters between your token and the following linefeed. You can't see these, but Perl can, and for Perl 'EndOfText<CR>' isn't the same as 'EndOfText'. The compilation will fail mysteriously.
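
A quick way to find out whether a script has picked up carriage returns is to count them. The following is a minimal sketch; count_crlf and strip_cr are hypothetical helpers, and the loop only reports on files without modifying them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count DOS-style line endings (CR LF) in a string.
sub count_crlf {
    my ($text) = @_;
    my $count = () = $text =~ /\r\n/g;
    return $count;
}

# Hypothetical helper: return a copy of the text with CR LF turned into LF.
sub strip_cr {
    my ($text) = @_;
    $text =~ s/\r\n/\n/g;
    return $text;
}

# Report on every file named on the command line.
for my $file (@ARGV) {
    open(my $in, '<', $file) or die "Can't read $file: $!";
    binmode($in);                        # make sure CRs aren't hidden from us
    my $text = do { local $/; <$in> };   # slurp the whole file
    close($in);
    $text = '' unless defined $text;
    printf "%s: %d line(s) end in CR LF\n", $file, count_crlf($text);
}
```

If you use something like strip_cr to write out a cleaned copy, remember that the new file will need its execute permissions set again.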

Debugging Tips

Some simple debugging techniques can make it easier to track down the source of problems with your script.

Using eval()

One of the best techniques for simplifying debugging in Perl is to use eval to trap runtime errors. A good general model for scripts might be:

# ... initialization code ...

eval { main() };    # trap any runtime error raised inside main
if ($@) {
    print "Content-type: text/plain\n\n",
          "The script failed because the error\n$@\noccurred.";
}

sub main { ... }

If you get a 500 error, then you know that it's caused by something outside the main subroutine, typically one of the other error causes discussed above. If a runtime error occurs inside the main subroutine, you get useful debugging output to help you identify it.
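
To see the technique in action, here is a complete script that fails on purpose. It is a sketch rather than a recommended template: the trap sub is a hypothetical wrapper around the block form of eval, and main divides by zero deliberately.

```perl
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;

# Hypothetical helper: run a code reference under eval and return the
# error text, or '' if it ran to completion.
sub trap {
    my ($code) = @_;
    my $ok = eval { $code->(); 1 };
    return $ok ? '' : ($@ || "unknown error\n");
}

sub main {
    my $divisor = 0;               # deliberately wrong, to trigger an error
    my $answer  = 1 / $divisor;    # dies with "Illegal division by zero"
    print "Content-type: text/plain\n\nThe answer is $answer\n";
}

my $error = trap(\&main);
if ($error) {
    print "Content-type: text/plain\n\n",
          "The script failed because the error\n$error\noccurred.\n";
}
```

Because main dies before it prints anything, the error report is the first output and so carries its own Content-type header.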

Using the error log

Errors that occur during execution of scripts should normally be logged to your server's error log. Unlike the default message that appears in the browser window, which simply notes that an error occurred and gives the class of the error, the error log will usually contain the actual error message issued by the Perl interpreter. Depending on where and how your site is hosted, you may or may not have access to the error log, but if you do, you should use it. Note that the location of the log will vary from system to system, depending on how your server is set up. The error log will typically be stored at the same location as the main access log.
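
Your own diagnostics are easier to pick out of a busy log if they carry a timestamp and the script's name. A minimal sketch follows, assuming your server copies a script's STDERR into its error log (many do, but not all); log_line is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: prefix a message with a timestamp and the script
# name so that it is easy to find in the error log.
sub log_line {
    my ($message) = @_;
    $message =~ s/\s+\z//;    # drop trailing newline and spaces
    return sprintf("[%s] %s: %s\n",
                   strftime("%Y-%m-%d %H:%M:%S", localtime), $0, $message);
}

# Route every warn() through the formatter and on to STDERR.
$SIG{__WARN__} = sub { print STDERR log_line($_[0]) };

warn "about to open the data file\n";
```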

Other resources

This list of reasons why things go wrong is probably not exhaustive. If it doesn't cover your problem, try the excellent Idiot's Guide to Solving Perl CGI Problems, which covers these points and more in a useful question-and-answer format. More general Perl information is available from perl.com. If you have Usenet access, the comp.lang.perl newsgroup is also a good place to ask for help and advice.

If you have other debugging tips which you think are worth mentioning here, or there's something here that you disagree with or don't understand, please contact me.