Prerequisites
You should be comfortable with C. Not just "C-syntax"; you should know your way around a struct and not be scared off by pointers and function references, and be cognizant of the preprocessor. If you need to brush up, nothing beats K&R.
Basic understanding of HTTP is useful. You'll be working on a web server, after all.
You should also be familiar with Nginx's configuration file. If you're not, here's the gist of it: there are four contexts (called main, server, upstream, and location) which can contain directives with one or more arguments. Directives in the main context apply to everything; directives in the server context apply to a particular host/port; directives in the upstream context refer to a set of backend servers; and directives in a location context apply only to matching web locations (e.g., "/", "/images", etc.). A location context inherits from the surrounding server context, and a server context inherits from the main context. The upstream context neither inherits nor imparts its properties; it has its own special directives that don't really apply elsewhere. I'll refer to these four contexts quite a bit, so... don't forget them.
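To make the inheritance rule concrete, here is a minimal sketch in plain C of how a value set in an outer context might flow down to an inner one. Everything in it (the `toy_conf_t` struct, the `CONF_UNSET` sentinel, the `gzip` and `buffer_size` fields, the defaults) is made up for illustration and is not Nginx's actual API; it only shows the idea that a child context keeps what it sets explicitly and inherits the rest.

```c
/* Hypothetical sentinel meaning "this directive was not set in this context". */
#define CONF_UNSET (-1)

/* Toy per-context configuration; a real module defines its own struct. */
typedef struct {
    int gzip;         /* 1 = on, 0 = off, CONF_UNSET = not set here */
    int buffer_size;  /* bytes, or CONF_UNSET */
} toy_conf_t;

/* Prefer the child's own value, then the parent's, then a default. */
static int merge_value(int child, int parent, int fallback)
{
    if (child != CONF_UNSET) {
        return child;
    }
    if (parent != CONF_UNSET) {
        return parent;
    }
    return fallback;
}

/* Fill every unset field of the child from its parent context. */
static void toy_merge_conf(const toy_conf_t *parent, toy_conf_t *child)
{
    child->gzip = merge_value(child->gzip, parent->gzip, 0);
    child->buffer_size = merge_value(child->buffer_size,
                                     parent->buffer_size, 4096);
}
```

Merging main into server, and then server into location, gives the location its own explicit settings plus whatever it did not override from the contexts around it.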
Let's get started.
High-Level Overview of Nginx's Module Delegation
Nginx modules have three roles we'll cover:
* handlers process a request and produce output
* filters manipulate the output produced by a handler
* load-balancers choose a backend server to send a request to, when more than one backend server is eligible
Modules do all of the "real work" that you might associate with a web server: whenever Nginx serves a file or proxies a request to another server, there's a handler module doing the work; when Nginx gzips the output or executes a server-side include, it's using filter modules. The "core" of Nginx simply takes care of all the network and application protocols and sets up the sequence of modules that are eligible to process a request. The de-centralized architecture makes it possible for *you* to make a nice self-contained unit that does something you want.
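Note: Unlike modules in Apache, Nginx modules have traditionally not been dynamically linked. (In other words, they're compiled right into the Nginx binary.) Newer releases, starting with Nginx 1.9.11, can also load modules built as shared objects.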
How does a module get invoked? Typically, at server startup, each handler gets a chance to attach itself to particular locations defined in the configuration; if more than one handler attaches to a particular location, only one will "win" (but a good config writer won't let a conflict happen). A handler can return in three ways: all is good, there was an error, or it declines to process the request and defers to the default handler (typically something that serves static files).
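The three outcomes can be sketched in a few lines of plain C. This is only a toy model: the `toy_rc_t` codes, the `toy_request_t` struct, and both handlers are hypothetical stand-ins, not Nginx's real types or constants. The point is the control flow: a handler either succeeds, fails, or declines, and a decline falls through to a default handler.

```c
#include <string.h>

/* Stand-in result codes for this sketch (not Nginx's real constants). */
typedef enum { TOY_OK, TOY_ERROR, TOY_DECLINED } toy_rc_t;

/* A drastically simplified request. */
typedef struct {
    const char *method;
    const char *uri;
} toy_request_t;

/* Hypothetical handler: serves only GET requests under "/hello". */
static toy_rc_t toy_hello_handler(const toy_request_t *r)
{
    if (r == NULL || r->method == NULL || r->uri == NULL) {
        return TOY_ERROR;        /* malformed request */
    }
    if (strncmp(r->uri, "/hello", 6) != 0) {
        return TOY_DECLINED;     /* not ours; let someone else try */
    }
    if (strcmp(r->method, "GET") != 0) {
        return TOY_ERROR;        /* ours, but we can't serve it */
    }
    return TOY_OK;               /* all is good */
}

/* Hypothetical default handler, standing in for static file serving. */
static toy_rc_t toy_static_handler(const toy_request_t *r)
{
    (void) r;
    return TOY_OK;
}

/* Try the attached handler; fall back to the default if it declines. */
static toy_rc_t toy_dispatch(const toy_request_t *r)
{
    toy_rc_t rc = toy_hello_handler(r);

    if (rc == TOY_DECLINED) {
        return toy_static_handler(r);
    }
    return rc;
}
```

In this sketch a GET for "/images/x.png" is declined by the hello handler and ends up with the default handler, while a POST to "/hello" is reported as an error without any fallback.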
If the handler happens to be a reverse proxy to some set of backend servers, there is room for another type of module: the load-balancer. A load-balancer takes a request and a set of backend servers and decides which server will get the request. Nginx ships with two load-balancing modules: round-robin, which deals out requests like cards at the start of a poker game, and the "IP hash" method, which ensures that a particular client will hit the same backend server across multiple requests.
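Here is a rough sketch of the two strategies in plain C. The function names, the state struct, and in particular the hash function are assumptions made for illustration; Nginx's own modules are more involved (they deal with weights, failed backends, and so on) and do not use this hash. Both functions assume at least one backend.

```c
#include <stddef.h>
#include <stdint.h>

/* Round-robin state: the index of the backend to hand out next. */
typedef struct {
    size_t next;
} toy_rr_state_t;

/* Deal out backends in turn: 0, 1, 2, ..., then wrap around.
 * n_backends must be greater than zero. */
static size_t toy_round_robin(toy_rr_state_t *s, size_t n_backends)
{
    size_t pick = s->next % n_backends;

    s->next = (pick + 1) % n_backends;
    return pick;
}

/* Map a client IPv4 address to a backend index. The same address always
 * yields the same index as long as n_backends does not change.
 * The bit mixing here is illustrative only. */
static size_t toy_ip_hash(uint32_t client_ipv4, size_t n_backends)
{
    uint32_t h = client_ipv4;

    h ^= h >> 16;
    h *= 0x45d9f3bU;
    h ^= h >> 16;
    return (size_t) (h % n_backends);
}
```

The design difference is where the state lives: round-robin keeps a counter that advances on every request, while the hash variant is stateless and derives its choice from the client address alone.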
If the handler does not produce an error, the filters are called. Multiple filters can hook into each location, so that (for example) a response can be compressed and then chunked. The order of their execution is determined at compile-time. Filters follow the classic "Chain of Responsibility" design pattern: one filter is called, does its work, and then calls the next filter, until the final filter is called, and Nginx finishes up the response.
The really cool part about the filter chain is that each filter doesn't wait for the previous filter to finish; it can process the previous filter's output as it's being produced, sort of like the Unix pipeline. Filters operate on buffers, which are usually the size of a page (4K), although you can change this in your nginx.conf. This means, for example, a module can start compressing the response from a backend server and stream it to the client before the module has received the entire response from the backend. Nice!
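The chain and its buffer-at-a-time behavior can be modeled in plain C. This is a toy, not Nginx's filter API: `toy_filter_t`, `toy_sink_t`, and the two filters are hypothetical, and the "client" is just a fixed-size array. Each filter does its work on one buffer and immediately hands that buffer to the next filter, so nothing waits for the whole response.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef struct toy_filter_s toy_filter_t;

struct toy_filter_s {
    /* Process one buffer, then pass it along the chain. */
    int (*body)(toy_filter_t *self, char *buf, size_t len);
    toy_filter_t *next;   /* NULL marks the end of the chain */
    void *ctx;            /* per-filter state, if any */
};

/* Call the next filter, or report success at the end of the chain. */
static int toy_next(toy_filter_t *self, char *buf, size_t len)
{
    return self->next ? self->next->body(self->next, buf, len) : 0;
}

/* First filter: upper-case the buffer in place. */
static int toy_upper_filter(toy_filter_t *self, char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] = (char) toupper((unsigned char) buf[i]);
    }
    return toy_next(self, buf, len);
}

/* Stand-in for the client connection: a small in-memory sink. */
typedef struct {
    char data[64];
    size_t used;
} toy_sink_t;

/* Last filter: append the buffer to the sink, keeping it NUL-terminated. */
static int toy_write_filter(toy_filter_t *self, char *buf, size_t len)
{
    toy_sink_t *out = self->ctx;

    if (out->used + len >= sizeof(out->data)) {
        return -1;   /* sink is full */
    }
    memcpy(out->data + out->used, buf, len);
    out->used += len;
    out->data[out->used] = '\0';
    return toy_next(self, buf, len);
}
```

If a chain is built as upper → write and fed the buffers "hello, " and "world" one after the other, the sink already holds "HELLO, " before the second buffer even enters the chain.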
So to wrap up the conceptual overview, the typical processing cycle goes:
Client sends HTTP request → Nginx chooses the appropriate handler based on the location config → (if applicable) load-balancer picks a backend server → Handler does its thing and passes each output buffer to the first filter → First filter passes the output to the second filter → second to third → third to fourth → etc. → Final response sent to client
I say "typically" because Nginx's module invocation is extremely customizable. It places a big burden on module writers to define exactly how and when the module should run (I happen to think too big a burden). Invocation is actually performed through a series of callbacks, and there are a lot of them. Namely, you can provide a function to be executed:
* Just before the server reads the config file
* For every configuration directive, in each location and server in which it appears
* When Nginx initializes the main configuration
* When Nginx initializes the server (i.e., host/port) configuration
* When Nginx merges the server configuration with the main configuration
* When Nginx initializes the location configuration
* When Nginx merges the location configuration with its parent server configuration
* When Nginx's master process starts
* When a new worker process starts
* When a worker process exits
* When the master exits
* When handling a request
* When filtering response headers
* When filtering the response body
* When picking a backend server
* When initiating a request to a backend server
* When re-initiating a request to a backend server
* When processing the response from a backend server
* When finishing an interaction with a backend server
Holy mackerel! It's a bit overwhelming. You've got a lot of power at your disposal, but you can still do something useful using only a couple of these hooks and a couple of corresponding functions. Time to dive into some modules.
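As a rough picture of what "only a couple of these hooks" means, a module can be thought of as a table of optional function pointers where most slots stay empty. The sketch below is plain C with hypothetical names (`toy_hooks_t` and its fields are not Nginx's actual structs, which are covered later); it only illustrates that the core can skip any callback a module does not provide.

```c
#include <stddef.h>

/* Hypothetical hook table; every slot is optional. */
typedef struct {
    int  (*before_config)(void);       /* before the config file is read */
    int  (*init_worker)(void);         /* when a worker process starts */
    void (*exit_worker)(void);         /* when a worker process exits */
    int  (*handler)(const char *uri);  /* when handling a request */
} toy_hooks_t;

/* Counts how many requests the toy handler has seen. */
static int hello_calls;

static int toy_hello(const char *uri)
{
    (void) uri;
    hello_calls++;
    return 0;
}

/* This module fills in a single hook; the rest default to NULL. */
static const toy_hooks_t toy_hello_hooks = {
    .handler = toy_hello
};

/* The "core" runs a hook only if the module supplied one. */
static int toy_run_init_worker(const toy_hooks_t *h)
{
    return h->init_worker ? h->init_worker() : 0;
}

static int toy_run_handler(const toy_hooks_t *h, const char *uri)
{
    return h->handler ? h->handler(uri) : -1;
}
```

Here the missing worker-start hook is simply skipped, while the one hook the module cares about gets called for each request.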