Server-side Statistics Scripting in PHP
Main Article Content
Abstract
On the UCLA Statistics WWW server there are a large number of demos and calculators that can be used in statistics teaching and research. Some of these demos require substantial amounts of computation, others mainly use graphics. These calculators and demos are implemented in various different ways, reflecting developments in WWW based computing.
As usual, one of the main choices is between doing the work on the client-side (i.e. in the browser) or on the server-side (i.e. on our WWW server). Obviously, client-side computation puts fewer demands on the server. On the other hand, it requires that the client downloads Java applets, or installs plugins and/or helpers. If JavaScript is used, client-side computations will generally be slow. We also have to assume that the client is installed properly, and has the required capabilities. Requiring too much on the client-side has caused browsing machines such as Netscape Communicator to grow beyond all reasonable bounds, both in size and RAM requirements. Moreover requiring Java and JavaScript rules out such excellent browsers as Lynx or Emacs W3.
For server-side computing, we can configure the server and its resources ourselves, and we need not worry about browser capabilities and configuration. Nothing needs to be downloaded, except the usual HTML pages and graphics. In the same way as on the client side, there is a scripting solution, where code is interpreted, or a ob ject-code solution using compiled code. For the server-side scripting, we use embedded languages, such as PHP/FI. The scripts in the HTML pages are interpreted by a CGI program, and the output of the CGI program is send to the clients. Of course the CGI program is compiled, but the statistics procedures will usually be interpreted, because PHP/FI does not have the appropriate functions in its scripting language. This will tend to be slow, because embedded languages do not deal efficiently with loops and similar
constructs.
Thus a first step towards greater efficiency is to compile the necessary primitives into the PHP/FI executable. This is easy to do, because the API is quite simple. In the extensions below, we have added the complete ranlib and dcdflib to PHP, plus some additional useful functions. The source code for these extensions, plus Solaris binaries for libranlib.a and libdcdf.a can be obtained from our server.
Interpreting a PHP script, even with our new primitives, still requires starting up a CGI process for each page that is read. Again, this can be improved upon. We could use FastCGI to keep the CGI process around on a permanent basis. Instead, we have chosen a more direct method. PHP can be compiled as an Apache module, i.e. it can be compiled into the Apache HTTPD server binary. This means that PHP scripts are interpreted by the WWW server, which is always around, and which will fork additional children if necessary. No CGI processes need to be started. The PHP install process creates a libphp.a and mod_php.c in the Apache source directories, which can be used to build an enhanced server. This has the additional advantage of security, because all security features of the server can be used, and none of the pitfalls of using CGI or Java apply.
Using PHP, in combination with the WWW server, also has some disadvantages. Although we can make simple static plots, using the gd library, we cannot use any dynamics, and interaction between the user and the page is somewhat limited. Java, or scripts using a client-side Xlisp-Stat as a helper, are more flexible in this respect. As a consequence, the UCLA Statistics pages still use a combined approach, with server-side PHP and CGI and client-side Xlisp-Stat and Java/JavaScript. Sometime this year, server-side Java scripting will become available, and then it seems advisable to switch as much of the code as possible to the server-side.
As usual, one of the main choices is between doing the work on the client-side (i.e. in the browser) or on the server-side (i.e. on our WWW server). Obviously, client-side computation puts fewer demands on the server. On the other hand, it requires that the client downloads Java applets, or installs plugins and/or helpers. If JavaScript is used, client-side computations will generally be slow. We also have to assume that the client is installed properly, and has the required capabilities. Requiring too much on the client-side has caused browsing machines such as Netscape Communicator to grow beyond all reasonable bounds, both in size and RAM requirements. Moreover requiring Java and JavaScript rules out such excellent browsers as Lynx or Emacs W3.
For server-side computing, we can configure the server and its resources ourselves, and we need not worry about browser capabilities and configuration. Nothing needs to be downloaded, except the usual HTML pages and graphics. In the same way as on the client side, there is a scripting solution, where code is interpreted, or a ob ject-code solution using compiled code. For the server-side scripting, we use embedded languages, such as PHP/FI. The scripts in the HTML pages are interpreted by a CGI program, and the output of the CGI program is send to the clients. Of course the CGI program is compiled, but the statistics procedures will usually be interpreted, because PHP/FI does not have the appropriate functions in its scripting language. This will tend to be slow, because embedded languages do not deal efficiently with loops and similar
constructs.
Thus a first step towards greater efficiency is to compile the necessary primitives into the PHP/FI executable. This is easy to do, because the API is quite simple. In the extensions below, we have added the complete ranlib and dcdflib to PHP, plus some additional useful functions. The source code for these extensions, plus Solaris binaries for libranlib.a and libdcdf.a can be obtained from our server.
Interpreting a PHP script, even with our new primitives, still requires starting up a CGI process for each page that is read. Again, this can be improved upon. We could use FastCGI to keep the CGI process around on a permanent basis. Instead, we have chosen a more direct method. PHP can be compiled as an Apache module, i.e. it can be compiled into the Apache HTTPD server binary. This means that PHP scripts are interpreted by the WWW server, which is always around, and which will fork additional children if necessary. No CGI processes need to be started. The PHP install process creates a libphp.a and mod_php.c in the Apache source directories, which can be used to build an enhanced server. This has the additional advantage of security, because all security features of the server can be used, and none of the pitfalls of using CGI or Java apply.
Using PHP, in combination with the WWW server, also has some disadvantages. Although we can make simple static plots, using the gd library, we cannot use any dynamics, and interaction between the user and the page is somewhat limited. Java, or scripts using a client-side Xlisp-Stat as a helper, are more flexible in this respect. As a consequence, the UCLA Statistics pages still use a combined approach, with server-side PHP and CGI and client-side Xlisp-Stat and Java/JavaScript. Sometime this year, server-side Java scripting will become available, and then it seems advisable to switch as much of the code as possible to the server-side.