Why is thttpd Secure?

Detailed Code Walkthrough

Here's where the weak of heart may want to stop reading. The remainder of this paper consists of a detailed code walkthrough that examines every line of the code (for the W3 daemon) as it operates to show that the daemon does not allow information to flow where it does not belong, that allocation errors cannot result from attacker-provided information, and other such things.

The Global Variables and Macros

We begin this code walkthrough with the definition of the global variables so that the rest of the code can be understood when it references them.

int CHECKUSER=1;int DOCHROOT=1;
#define BUFSIZE 4096
#define MAXSIZE 2048
char	line[BUFSIZE],name[BUFSIZE],bs1[BUFSIZE],bs2[BUFSIZE],bs3[BUFSIZE], timestamp[64],
	logline[BUFSIZE], remotehost[BUFSIZE], remoteuser[BUFSIZE];
struct stat buf;

Note that all arrays are of fixed length and predefined. This is done so that no space allocation is required by the program through its variables once the daemon is operating. Although these operations can be done reliably, it is desirable for simplicity that we limit the operations to those absolutely required for the program's function. Notice also that DOCHROOT=1 defines CHROOT as desired. This can be altered at compile time to prevent the CHROOT function if so desired.

Next come the defined macros and constants, again so we can understand the context.

#define ERRORLINE "The requested document has moved to here.\n"
#define REDIRECT "Location: http://all.net/\n"
#define WWWUID 101
#define WWWDIR "/u/www/htdocs"
#define WWWDefaultFile "/testserver.html"
#define WWWlog "/log"
#define LOG2(x,y) {FILE *F;F=fopen(WWWlog,"a+");if (F != NULL) {logfile(F);fprintf(F,x,y);} fclose(F);}
#define LOG3(x,y,z) {FILE *F;F=fopen(WWWlog,"a+");if (F != NULL) {logfile(F);fprintf(F,x,y,z);} fclose(F);}
#define LOG4(x,y,z,w) {FILE *F;F=fopen(WWWlog,"a+");if (F != NULL) {logfile(F);fprintf(F,x,y,z,w);} fclose(F);}

The Main Program

Now we will demonstrate operation by starting at the main program, where the program starts when it is run, and showing what it does, why it does it that way, and hopefully, why the operation is safe. We have added line numbers for reference purposes.

01 main(argc,argv,envp)
02 int argc; char *argv[],*envp[];
03 {if (0 != chdir(WWWDIR))error("Cannot change to WWW directory");
04 if (DOCHROOT == 1) if (chroot(".") != 0)  error("Cannot change root directory to .");
05 if (0 != setuid(WWWUID)) error("setUID failed");	/* become user www or die */
06 if (argc>1) strncpy(remotehost,argv[1],MAXSIZE); else strcpy(remotehost,"nowhere");
07 if (argc>2) strncpy(remoteuser,argv[2],MAXSIZE); else strcpy(remoteuser,"nobody");
08 remotehost[MAXSIZE]='\0';remoteuser[MAXSIZE]='\0';
09 read(0,line,MAXSIZE);line[MAXSIZE]='\0'; sscanf(line, "%s %s %s", bs1, name, bs2);	/* get request */
10 if ((name[0] != '\0') && (name[strlen(name)-1] == '\r')) name[strlen(name)-1]='\0';
11 if ((name[0]=='/') && ((name[1]=='\0') || (name[1]==' '))) strcpy(name,WWWDefaultFile);
12 if (DOCHROOT!=1) {strcpy(bs3,WWWDIR);strcat(bs3,name);strcpy(name,bs3);}
13 if (strncasecmp(bs1,"get",5) == 0) fetch();	/* get */
14 error("Unknown request");	/* all other requests fail */ }

Lines 01 and 02 are standard C program startup information defining the main program and the three normal arguments. Notice that there are no local variables in the main program.

Line 03 attempts to change directories to the area being used for services. If this cannot be done, the program immediately fails, never having examined any outside information. The error() routine will be detailed later. Line 03 is required in order to implement the Chroot function that limits access to a subset of the file system.

Line 04 changes the "root" directory to the current directory. Note that since no outside input has been brought into the program yet, the DOCHROOT=1 setting cannot have been altered by any outside program input. If the Chroot function fails, the program immediately fails, never having examined any outside information. This line is required in order to operate in a Chroot environment.

Line 05 changes the effective user ID of this program to the special (www) user ID provided for the server function. Note that in order for the Chroot function above to be performed, it is necessary to begin operation as the superuser, and that after the necessary and sufficient operations are performed with privileges, this function immediately removes those privileges in favor of the lower privileges of the www user. If the SetUID function fails, the program immediately fails, never having examined any outside information. This line is required in order to operate in a less privileged environment after doing a Chroot.

Line 06 first verifies that there is a first command line argument, and if one is present, copies its value into the remotehost array. If it is not present, it copies the fixed string "nowhere" into the remotehost array. The command line arguments (if any) are provided by the TCP wrapper program used to invoke the daemon or by the inetd daemon used to invoke the daemon. The array is (#define BUFSIZE 4096) 4096 bytes long, which is larger than the system-defined limit on the size of a command-line argument and larger than the largest possible argument generated by the TCP wrappers program. In addition, strncpy limits the number of bytes copied to MAXSIZE and line 08 enforces this boundary with a terminating '\0' byte. Thus, the input (provided indirectly from an outside Domain Name Server) is confined to the array "remotehost". Line 07 does the same operation for the second command-line argument and stores the result in remoteuser, with line 08 again enforcing this boundary condition.
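The bounded copy described above can be sketched as a small helper. This is only an illustration: the name bounded_copy is hypothetical and does not appear in the daemon, which performs the same two steps inline on its global arrays.

```c
#include <string.h>

#define BUFSIZE 4096
#define MAXSIZE 2048

/* Hypothetical helper, not part of the daemon: copy at most MAXSIZE
   bytes of src into dst, which must hold BUFSIZE bytes, and always
   leave dst terminated. */
static void bounded_copy(char dst[BUFSIZE], const char *src)
{
    strncpy(dst, src, MAXSIZE);  /* writes only dst[0] .. dst[MAXSIZE-1] */
    dst[MAXSIZE] = '\0';         /* MAXSIZE < BUFSIZE, so this is in bounds */
}
```

Because MAXSIZE is half of BUFSIZE, the terminating byte always lands inside the array, whatever the length of the argument.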

Line 09 reads the only input provided by the user through the normal TCP channel into the array "line", limiting the number of bytes read to MAXSIZE. It then uses the "sscanf" function to split the line into Bs1, Name, and Bs2, each of which has the size BUFSIZE. Since no field can be longer than the MAXSIZE bytes it was split from, this again enforces confinement of the input to Bs1, Name, and Bs2.
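As a sketch of the same split, the following hypothetical routine (split_request is not a name from the daemon) adds explicit field widths to the sscanf format. The widths are an assumption added here for illustration: they make the bound visible in the format string itself rather than relying on the length of the input line.

```c
#include <stdio.h>

#define BUFSIZE 4096
#define MAXSIZE 2048

/* Hypothetical helper: split a request line into three whitespace
   separated fields. Each destination must hold BUFSIZE bytes.
   Returns sscanf's result: the number of fields converted, or EOF
   if the line holds no field at all. The width 2048 equals MAXSIZE. */
static int split_request(const char *line, char method[BUFSIZE],
                         char name[BUFSIZE], char version[BUFSIZE])
{
    method[0] = name[0] = version[0] = '\0';
    return sscanf(line, "%2048s %2048s %2048s", method, name, version);
}
```

For the request line "GET /index.html HTTP/1.0" this yields the three fields "GET", "/index.html", and "HTTP/1.0", each at most MAXSIZE bytes plus the terminator.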

Line 10 removes a trailing carriage return ('\r') that some browsers leave at the end of the name. This is not required for the confinement or security of the daemon, but addresses incompatibilities between browsers and leaves the data confined to Name. Line 11 checks whether Name is the bare path "/" (that is, no file was named), in which case the protocol requires the use of WWWDefaultFile as the name of the file to be retrieved. In this case, the lack of input results in WWWDefaultFile being confined to Name.

Line 12 is for installations not desiring to use the Chroot environment. In this insecure use, the program prepends the pathname of WWWDIR to the filename so that operation is transparent. NOTE: If DOCHROOT is not 1 the program is not operating in a secure mode! Also note that if DOCHROOT is 1 at program initialization, it remains 1 at this point because it has not been explicitly changed and all input has been confined so as to not affect this variable.
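The prefixing step for non-Chroot installations stays within bounds because WWWDIR is 13 bytes long and Name holds at most MAXSIZE (2048) bytes, well under BUFSIZE (4096). A minimal sketch of the same step with the bound checked explicitly follows; prefix_path is a hypothetical name, and snprintf is used here in place of the daemon's strcpy and strcat.

```c
#include <stdio.h>

#define BUFSIZE 4096
#define WWWDIR "/u/www/htdocs"

/* Hypothetical helper: build WWWDIR followed by name in out, which
   must hold BUFSIZE bytes. Returns 0 on success, or -1 if the result
   would not fit (out is then truncated but still terminated). */
static int prefix_path(char out[BUFSIZE], const char *name)
{
    int n = snprintf(out, BUFSIZE, "%s%s", WWWDIR, name);
    return (n >= 0 && n < BUFSIZE) ? 0 : -1;
}
```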

Line 13 determines if the request was one that this daemon handles (GET) by looking at the first 5 bytes of Bs1 (case-insensitively). Note that the only possible effect of this examination is that the routine "fetch" is called. At most the first 5 bytes of Bs1 are examined, and the only side effect is the calling of fetch. Line 14 causes an error result if the request was not valid.

It has been commented that: 'Since "get" is only 3 letters long no more than 4 (3 + null) characters of bs1 will be looked at.' You may call it paranoia, but in case the implementation copies both strings before checking lengths and has too-small fixed-length or erroneously generated storage, the limitation of 5 forces additional constraints. It was correctly commented that it might be a better idea to write a little routine to do this check, or to simply use an if with a proper conditional. I agree, but in an earlier version, another reader commented on the lack of elegance in using the if statement, so I guess you can never win.
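For comparison, the "little routine" suggested by the commenter might look like the following sketch. The name is_get is hypothetical and this is not the code the daemon uses; it only shows that the check can be written so that no library string routine is involved and at most four bytes are ever read.

```c
#include <ctype.h>

/* Hypothetical helper: return 1 if s is exactly "get" in any letter
   case, else 0. The && operators stop at the first mismatch, so the
   terminating '\0' is never read past. */
static int is_get(const char *s)
{
    return tolower((unsigned char)s[0]) == 'g'
        && tolower((unsigned char)s[1]) == 'e'
        && tolower((unsigned char)s[2]) == 't'
        && s[3] == '\0';
}
```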

At this point in the program's execution, only two possibilities exist for program flow. Either error is called with the fixed string argument "Unknown request", or fetch is called with no arguments. In either case, all results of input are stored in the first MAXSIZE bytes of arrays Remotehost, Remoteuser, Bs1, Name, and Bs2 (capitals used for emphasis).

It has been commented that: 'If the request and headers don't fit in MAXSIZE bytes, real clients will hang waiting for the server to read the rest of the request.' and also that: 'On slow networks, read may return less than a line of input (it will return what's available) so valid requests will fail.' Both are probably true. However, almost all current requests are shorter than the specified length, and the insecure versions of httpd also use fixed string lengths, but without proper bounds checking. In the case of longer requests, we will have already returned an answer before the request is completed, and all W3 clients we have tested respond properly. In the case of the too-short request, we believe that trying to wait for a request of that sort would open denial-of-service possibilities, in that an attacker could launch a series of incomplete http requests designed to overrun the number of available processes in the process table, and thus cause denial of service. By treating all requests as "one-shot", we avoid the classical "allocation problem", which is unsolvable. In practice, this condition has never been detected, and if it were to happen, it would only deny services to select clients on rare occasions while greatly reducing the complexity and thus increasing the security of the server.

Error Handling and Logging

The "error" routine takes the fixed length compile-time argument and produces two results. It returns an error to the requesting client, and it appends error information to the daemon's log file.

01 error(s)			/* simulate a 302 - document moved */
02 char	*s;
03 {printf("HTTP/1.0 302 Found\n");printf("Server: ManAl/0.1\n");
04 printf("MIME-version: 1.0\n");printf(REDIRECT);printf("Content-type: text/html\n");
05 printf("Document moved\n");
06 printf("Document moved\n");printf(ERRORLINE);printf("(%s) \n",s);
07 LOG4("Error:%s - %s %s\n",s,bs1,name);exit();}

Lines 01 and 02 define the routine and its character string input. Lines 03, 04, 05, and 06 print fixed length strings. None of these can be affected by the inputs at this point in the program because the inputs are still confined as described above. Line 07 executes the macro LOG4 and then exits the program. LOG4 (from above) allocates a file pointer, opens the fixed-name file specified for the logfile at compile time, executes the "logfile" function (described next), and using the fixed format specified in line 07, prints internal variables and external values stored in bs1 and name into that logfile. It then closes the logfile. Hence bs1 and name are confined to the output file and cannot otherwise affect the daemon. Only the "logfile" function remains to be examined to assure that the "error" execution path retains the confinement properties that prevent inputs from adversely affecting the daemon.

01 void	logfile(F)
02 FILE	*F;
03 {time_t *tloc;time_t	t;
04 t=time(NULL);strftime(timestamp, 20, "%Y/%m/%d %T", localtime(&t));
05 fprintf(F,"%s %s %s ",remotehost,remoteuser,timestamp);}

Lines 01 and 02 of "logfile" define the function as having a single argument, that being the file pointer to the logfile. Lines 03 and 04 prepare a time stamp by calling system time functions and storing the results in the first 20 bytes of the "timestamp" array (which is far longer than 20 bytes). Line 05 prints the external information in confined variables remotehost and remoteuser to the log file. Hence, these values are confined to the logfile only, and the whole error routine maintains the confinement principles that protect the daemon from external attack.
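The 20-byte limit passed to strftime is exactly what the format needs: "YYYY/MM/DD HH:MM:SS" is 19 characters plus the terminating byte. The sketch below illustrates this with a hypothetical helper, format_stamp, which is not part of the daemon. It writes %H:%M:%S where the daemon writes the equivalent %T, and it checks the result of localtime, which the daemon does not.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format t as "YYYY/MM/DD HH:MM:SS" into out.
   Returns the number of characters written (19 on success), or 0 if
   the time cannot be converted or does not fit in outsize bytes. */
static size_t format_stamp(char *out, size_t outsize, time_t t)
{
    struct tm *tm = localtime(&t);
    if (tm == NULL)
        return 0;
    return strftime(out, outsize, "%Y/%m/%d %H:%M:%S", tm);
}
```

With an output size smaller than 20, strftime reports failure by returning 0 rather than writing past the limit.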

As a side note, the time printed by the daemon is affected by the Chroot environment. In the worst case, this causes all times to be reported in GMT. If properly installed per the installation instructions, the daemon reports local times properly.

The Get Command

The only remaining execution path is through the "fetch" routine which we will now examine.

01 fetch()		/* if www owns it, it can be put - else, forget it */
02 {int	staterr;
03 staterr=stat(name,&(buf));
04 if (staterr != 0) error("Can't stat file");		/* can't stat the file - die */
05 if (0 == S_ISREG(buf.st_mode)) error("Can't fetch directories");
06 if (CHECKUSER==1) if (buf.st_uid != geteuid()) error("Not owner of file"); /* don't own it - die */
07 if (0 != (S_IROTH & buf.st_mode)) {cat(name); LOG2("cat %s\n",name);exit();}	/* Send it*/
08 error("Access Denied");}

Lines 01 and 02 define the routine and its one internal variable "staterr". Line 03 uses the confined input name as the name of the file to get the status of. This can only affect the results stored in "staterr" and the repository for results "buf". If there is no file of the specified name or it cannot be detected by the stat routine for any other reason, "staterr" will hold a non-zero value, and line 04 will call the safe error routine described above. If the name specified is the name of a legitimate file, then the result placed in "buf" contains specific information relating to that file.

Line 05 prevents fetching directories (and anything else that is not a regular file). Although this does not present a security risk, it does not fit within the html protocol suite, so it is eliminated to prevent browser errors. Line 05 retains the confinement properties or calls "error" which retains them.

Line 06 verifies that the named file is owned by the current (www) user. If it is not owned by that user, "error" is called. Line 06 retains the confinement properties or calls "error" which retains them.

Line 07 verifies that the file is readable by "world" and if so, sends the requested file to the requesting user via the "cat" function (described below), and uses the logfile procedure described earlier to report results to the log file. Thus, line 07 retains the confinement properties described earlier.

Line 08 exits the program with an error if no other exit has been exercised, indicating an access denial because the requested file is protected against world access. This retains the confinement properties described earlier and leaves us only with the routine "cat" to verify.

void	cat(s)
char	s[];
{int	i,n;FILE *F;
i=open(s,0); while ((n=read(i,bs2,MAXSIZE)) > 0) write(1,bs2,n);close(i);}

The first three lines of the "cat" function are purely definitional. In the last line, the file named by the outsider is "open"ed for reading only. If the file doesn't exist (even though it was previously confirmed as existing) or cannot be opened, the read in the while loop returns a non-positive result, no write is done, and the file descriptor is released via the close operation. If the file exists and can be read, the contents of the file are sent to the standard output of the daemon, or in other words, to the client who made the request. Again, the name of the file stored in "s" is confined so as to have no effect on the daemon or on any other part of the server other than the desired shipment of the result to the requesting client.
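A variant of the same copy loop with the failure cases made explicit might look like the sketch below. The name copy_file is hypothetical. Unlike the daemon's "cat", this version checks the result of open and of each write, and it uses a fixed-size local buffer where the daemon reuses its global array bs2.

```c
#include <fcntl.h>
#include <unistd.h>

#define MAXSIZE 2048

/* Hypothetical helper: copy the file at path to descriptor out.
   Returns the number of bytes copied, or -1 if the file cannot be
   opened or a read or write fails. */
static long copy_file(const char *path, int out)
{
    char buf[MAXSIZE];
    long total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {              /* write may accept fewer bytes */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w <= 0) {
                close(fd);
                return -1;
            }
            off += w;
        }
        total += n;
    }
    close(fd);
    return n < 0 ? -1 : total;
}
```

As in the original, at most MAXSIZE bytes are held at any time, so the size of the file being sent has no effect on the storage the program uses.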