apache to nginx migration

Apache is the webserver of note. Although it peaked in the early 2000s, when it served nearly 80% of websites worldwide, Apache has recently made a comeback, taking market share from IIS in the process.

nginx is a web server and reverse proxy for HTTP and mail, written by Igor Sysoev. It has shown some very tempting benchmark results against Apache, and by August 2008 it had gained enough popularity to become the 5th largest web server vendor reported by Netcraft. Originally written for and used mostly by the Russian community, it has since found a place in the English-speaking community, and English-language docs are now available for it.

why?

If Apache is used by 60% of the websites out there, aren't all software stacks tuned to use it? Shouldn't it be the best option even if it is not the best webserver?

Webservers have to support and handle many different tasks. Some of them are especially efficient at certain tasks (like serving static media) while being incapable of performing others. Because nginx is an extremely fast load balancer, it lets you take a "best of breed" approach: in theory, it can delegate each subtask to the application or application stack that excels at handling it.

basic nginx architecture possibilities

nginx is a "pure" asynchronous server, which means each process handles many requests asynchronously in a single event loop rather than dedicating a thread to each request. This is where much of its memory and performance advantage comes from; there are no threads and little locking in pure asynchronous architectures.
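The single-loop idea can be illustrated with a small Python sketch. This is only an analogy built on Python's asyncio, not nginx's actual implementation; the `handle` and `serve_all` names and the delays are made up for illustration.

```python
import asyncio


async def handle(request_id, delay, log):
    """Pretend to serve one request; the sleep stands in for waiting on a socket."""
    log.append(f"start {request_id}")
    # Awaiting hands control back to the loop so other requests can progress.
    await asyncio.sleep(delay)
    log.append(f"done {request_id}")
    return request_id


def serve_all():
    """Run two hypothetical requests on one thread, in one event loop."""
    log = []

    async def main():
        # Both handlers are in flight at once, yet no threads are created.
        return await asyncio.gather(
            handle("a", 0.1, log),
            handle("b", 0.01, log),
        )

    results = asyncio.run(main())
    return results, log


if __name__ == "__main__":
    results, log = serve_all()
    print(results)
    print(log)
```

Request "b" finishes before request "a" even though it started later, because the loop switches between them whenever one is waiting.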

As it is both a load balancer and an HTTP server, it can change the way you develop your application structure. Classic 3-tier architecture is much more attractive when each tier will be lightweight memory-wise and blazing fast. Most DIY Apache configurations end up being 2-tier, with the webserver & appserver combined and memory-resident "for speed". This is the way that Apache/mod_php and Apache/mod_python work.

With Apache, 3-tier setups are mostly built on FastCGI, though mod_wsgi is starting to make them popular in the Python world. In this architecture, the webserver & appserver are separated. This usually calls for more configuration, but leads to a lot of flexibility. Although both FastCGI & mod_wsgi are usually run as Apache modules, the actual processes that take care of the requests run separately from the server.

FastCGI

FastCGI is what its name suggests, provided you know what CGI is: a faster protocol for the gateway interface between a webserver and an appserver. Whereas CGI requires the webserver to start an entirely new process (fed with information from the webserver) for each request, FastCGI lets persistent appserver processes handle a set (or unlimited) number of requests over a long period of time. The problem with traditional CGI is that the process startup cost is paid on every request, which becomes a big deal when each incoming request fires up an entire instance of the Perl or Python interpreter. FastCGI fixes this, while still letting you expire workers after a certain number of requests to mitigate the effects of long-term memory leaks.
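The worker lifecycle described above can be sketched as a toy simulation in Python. It does not speak the FastCGI protocol; `Worker`, `RecyclingPool`, and `max_requests` are hypothetical names chosen only to show a persistent worker being reused and then replaced after a fixed number of requests.

```python
class Worker:
    """Stands in for one long-lived appserver process."""

    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.handled = 0

    def handle(self, request):
        self.handled += 1
        return f"worker {self.worker_id} served {request}"


class RecyclingPool:
    """Reuses a single worker and replaces it after max_requests requests."""

    def __init__(self, max_requests):
        self.max_requests = max_requests
        self.spawned = 0
        self.worker = self._spawn()

    def _spawn(self):
        # In classic CGI this startup cost would be paid on every request.
        self.spawned += 1
        return Worker(self.spawned)

    def dispatch(self, request):
        if self.worker.handled >= self.max_requests:
            # Expire the old worker to limit the damage of slow memory leaks.
            self.worker = self._spawn()
        return self.worker.handle(request)


if __name__ == "__main__":
    pool = RecyclingPool(max_requests=3)
    for i in range(7):
        print(pool.dispatch(f"/page{i}"))
    print("processes started:", pool.spawned)
```

With a limit of three requests per worker, seven requests start three workers, where a CGI-style setup would start seven processes.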

nginx comes with FastCGI support built in, and the configuration is very simple:

server {
    listen example.com:80;
    server_name www.example.com;
    # set indexes to index.php or index.html
    index index.php index.html;
    root /var/www/example.com;

    location ~ \.php$ {
        # 'pass' the request to FastCGI running @ localhost:9000
        fastcgi_pass 127.0.0.1:9000;
        fastcgi_index index.php;
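        # load the standard FastCGI parameters (assumes the stock
        # fastcgi_params file shipped with nginx is in the conf directory)
        include fastcgi_params;
        # set the parameter 'SCRIPT_FILENAME' to be the docroot + script name
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;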
    }
}

mod_wsgi

WSGI is the Web Server Gateway Interface, designed for use with Python appservers & applications. mod_wsgi (henceforth called apache-mod_wsgi for reasons that will become apparent) is a project by Graham Dumpleton to embed a very small and simple WSGI delegation layer in Apache. apache-mod_wsgi has two modes of operation: embedded mode and daemon mode. Embedded mode works similarly to mod_python, in that code is executed (via a WSGI interface, not mod_python handlers) within the context of the Apache process and inhabits the same memory as the webserver and any other modules loaded into it. Daemon mode works much like FastCGI: worker processes (that are WSGI handlers) are spawned off, and apache-mod_wsgi proxies requests to them. Daemon mode has many of the same benefits over embedded mode as FastCGI has over traditional 2-tier infrastructure, at some cost in speed. However, unlike FastCGI, apache-mod_wsgi does not communicate with its worker daemons over TCP, so they cannot be on a separate server, which limits how far the webserver & appserver tiers can be distributed.
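For readers who have not seen one, a WSGI application is just a Python callable that takes the request environment and a `start_response` function. The following is a minimal sketch using only the standard library; the greeting text and the `call_app` helper are made up for illustration, and `call_app` plays the role a server such as mod_wsgi would normally play.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A minimal WSGI callable; 'application' is the name mod_wsgi looks for by default."""
    body = f"Hello from {environ['PATH_INFO']}\n".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(path):
    """Hypothetical helper: invoke the app directly, as a WSGI server would."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys WSGI requires
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(application(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(call_app("/wiki"))
```

The same callable can run under embedded mode, daemon mode, or any other WSGI server without changes, which is the point of having a common interface.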

There is a separate (and confusingly named) project for nginx also called mod_wsgi, henceforth known as nginx-mod_wsgi. This project's aim is similar to apache-mod_wsgi in that it turns nginx into a WSGI server. However, unlike Apache, nginx is purely asynchronous, which means that application code executing under nginx-mod_wsgi blocks the webserver, stopping it from answering other requests. Because of this, it is not recommended that you use nginx-mod_wsgi to serve non-WSGI applications; rather, set up a vanilla version of nginx as a proxy in front of nginx-mod_wsgi. Since nginx's primary goal is being a super-fast proxy, this actually works quite well in practice. In addition to this limitation/structural consideration, nginx is not designed with modularity in mind (it doesn't fit with its architecture), so nginx-mod_wsgi exists as a source code patch to nginx.
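Why blocking code is a problem inside an asynchronous server can be shown with a small Python sketch. It uses asyncio as a stand-in for nginx's event loop, so it is an analogy under that assumption rather than a description of nginx-mod_wsgi internals; `ticker` represents other requests the server would like to answer, and all function names are hypothetical.

```python
import asyncio
import time


async def ticker(log, ticks):
    """Stands in for other requests the loop should keep answering."""
    for i in range(ticks):
        log.append(f"tick {i}")
        await asyncio.sleep(0.01)


async def blocking_handler(log):
    """Application code that blocks: nothing else runs until it returns."""
    log.append("handler start")
    time.sleep(0.1)  # a blocking call freezes the whole event loop
    log.append("handler done")


async def cooperative_handler(log):
    """The same work, but yielding to the loop while it waits."""
    log.append("handler start")
    await asyncio.sleep(0.1)
    log.append("handler done")


def run(handler):
    log = []

    async def main():
        await asyncio.gather(ticker(log, 3), handler(log))

    asyncio.run(main())
    return log


if __name__ == "__main__":
    print("blocking:   ", run(blocking_handler))
    print("cooperative:", run(cooperative_handler))
```

In the blocking run, no tick is recorded between "handler start" and "handler done"; in the cooperative run the ticks continue while the handler waits.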

serving PHP content

PHP is servable through FastCGI. There is really only one "approach" to serving PHP content, although there are a few PHP FastCGI daemons that people have reported varying degrees of success with:

  • PHP's built-in php-fcgi FastCGI handler

  • lighttpd's spawn-fcgi

  • php-fpm, a patch to PHP with an improved FastCGI handler

Ideally one would do some benchmarking to determine which one to use, but I've seen lots of reports that php-fpm is the best choice of the three. It unfortunately requires compiling PHP from source (as php-fpm is a source patch to PHP), but it is not difficult to set up, has a simple configuration, and its binary acts much like an init script.

serving python content

Python is natively servable through FastCGI as well as WSGI. The common FastCGI serving method for Python is to use flup. Flup's FastCGI implementation is written entirely in Python, and might be quite slow. It is also, as of late 2009, unmaintained and no longer recommended. WSGI has become the dominant method of serving Python applications, and Apache's mod_wsgi module has become the dominant WSGI application server.

Although it remains to be seen if nginx-mod_wsgi has speed comparable to apache-mod_wsgi, Apache's module appears far more robust and has more development going on. There are known configurations for various important pieces of software (like MoinMoin and Trac) which actually involved tweaking at the mod_wsgi layer. These pieces of software might be difficult to set up on the WSGI implementation for nginx.

For now, the best solution appears to be to have nginx act as a proxy to a background appserver. The best available Python appserver is an Apache instance with the worker MPM and apache-mod_wsgi in daemon mode. This essentially gives us a fairly heavyweight WSGI application server, but it also minimizes the Apache overhead.
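The proxy-in-front-of-appserver layout can be sketched end to end with the Python standard library. This is a toy stand-in, assuming a wsgiref server in place of Apache with apache-mod_wsgi and a small `http.server` handler in place of nginx (a real setup would use nginx's `proxy_pass` directive instead); every name, port, and response body here is hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from wsgiref.simple_server import WSGIRequestHandler, make_server


def application(environ, start_response):
    """The background appserver's WSGI app."""
    body = b"rendered by the appserver\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


class QuietWSGIHandler(WSGIRequestHandler):
    def log_message(self, *args):  # silence per-request logging
        pass


def make_proxy_handler(backend_port):
    class ProxyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Forward the request to the appserver and relay its answer.
            conn = http.client.HTTPConnection("127.0.0.1", backend_port, timeout=5)
            conn.request("GET", self.path)
            upstream = conn.getresponse()
            body = upstream.read()
            self.send_response(upstream.status)
            self.send_header("Content-Type",
                             upstream.getheader("Content-Type", "text/plain"))
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
            conn.close()

        def log_message(self, *args):
            pass

    return ProxyHandler


def start(server):
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def demo(path="/wiki"):
    # Port 0 asks the OS for any free port.
    backend = start(make_server("127.0.0.1", 0, application,
                                handler_class=QuietWSGIHandler))
    proxy = start(HTTPServer(("127.0.0.1", 0),
                             make_proxy_handler(backend.server_port)))
    try:
        client = http.client.HTTPConnection("127.0.0.1", proxy.server_port, timeout=5)
        client.request("GET", path)
        response = client.getresponse()
        result = (response.status, response.read())
        client.close()
        return result
    finally:
        for server in (proxy, backend):
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    print(demo())
```

The client only ever talks to the front server, so the appserver behind it can be swapped or moved without the client noticing.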

MigratingFromApacheToNginx (last edited 2009-10-16 00:55:44 by jmoiron)
