# Chapter 1: Introduction to Apache Architecture

## What is Apache HTTP Server?

Apache HTTP Server (commonly called "Apache" or "httpd") is one of the world's most widely used web servers. Unlike simpler web servers, Apache is designed as a highly modular, extensible system that can be customized for virtually any use case.

For a C/Linux developer approaching Apache for the first time, think of it as a **framework** rather than a monolithic application. The core is relatively small - most functionality lives in modules that plug into a well-defined architecture.

## High-Level Architecture Overview

```mermaid
%%{init: {"gantt": {"displayMode": "compact", "barHeight": 30, "leftPadding": 85}}}%%
gantt
    title Apache HTTP Server Architecture
    tickInterval 10day
    dateFormat YYYY-MM-DD
    axisFormat " "
    section HTTP Core
    Request parsing, response generation, protocol logic : 2024-01-01, 5d
    section Modules
    mod_ssl : 2024-01-01, 1d
    mod_proxy : 2024-01-02, 1d
    mod_cgi : 2024-01-03, 1d
    mod_rewrite : 2024-01-04, 1d
    mod_... : 2024-01-05, 1d
    section Hook System
    Modules register callbacks at processing phases : 2024-01-01, 5d
    section Filter Chain
    Bucket Brigades - I/O abstraction layer : 2024-01-01, 5d
    section MPM
    Multi-Processing Module (prefork / worker / event) : 2024-01-01, 5d
    section APR
    Apache Portable Runtime (memory, I/O, threads, strings) : 2024-01-01, 5d
    section OS
    Operating System (Linux, Windows, BSD) : 2024-01-01, 5d
```

## The Key Abstractions

Apache's architecture is built on several key abstractions. Understanding these is crucial before diving into the code:

### 1. APR (Apache Portable Runtime)

The foundation layer. APR provides cross-platform APIs for:

- Memory management (pools)
- File I/O
- Network sockets
- Threading and process management
- Hash tables, arrays, strings

**Why it matters**: You'll rarely see raw `malloc()` or `socket()` calls in Apache code. Almost everything goes through APR.

### 2. Pools (Memory Management)

Apache uses a hierarchical pool-based memory allocator. Instead of manually tracking every allocation, you allocate from a pool, and when the pool is destroyed, everything allocated from it (and from its child pools) is freed automatically.

```c
// Instead of:
char *buf = malloc(1024);
// ... use buf ...
free(buf);  // Easy to forget!

// Apache uses (pool is an apr_pool_t *):
char *buf = apr_palloc(pool, 1024);
// ... use buf ...
// Automatically freed when pool is destroyed
```

### 3. Modules

Nearly everything in Apache is a module. Even core functionality like HTTP protocol handling is implemented as modules. A module is a struct that declares:

- What hooks it wants to register callbacks for
- What configuration directives it provides
- What filters it implements

### 4. Hooks

Hooks are the extension points in Apache's request processing. At each phase, Apache calls the callbacks that modules registered for that hook, in order. Depending on the hook, it either runs every callback or stops at the first one that does not decline. For example:

- {httpd}`ap_hook_handler` - Called to generate response content
- {httpd}`ap_hook_access_checker` - Called to check access permissions
- {httpd}`ap_hook_translate_name` - Called to map URL to filesystem

### 5. Filters and Bucket Brigades

Request and response data in Apache flows through filters arranged in chains. Data is passed between filters as "bucket brigades" - linked lists of data chunks. This allows:

- mod_ssl.c to transparently encrypt/decrypt
- mod_deflate.c to compress responses
- Custom modules to transform content

### 6. MPM (Multi-Processing Module)

The MPM controls how Apache handles concurrency:

- **prefork**: Single-threaded child processes, each handling one connection at a time (safe, but heavy)
- **worker**: Multiple threads per process, one connection per thread
- **event**: Like worker, but a listener thread takes over idle keep-alive connections so worker threads are not tied up (usually the most efficient)

Only one MPM is active at a time.
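To make the pool abstraction from section 2 more concrete, here is a minimal toy model of a hierarchical pool in plain C. It is a sketch for building intuition, not how APR implements pools: every `toy_` name is hypothetical, and it calls `malloc()` once per allocation where a real pool allocator would carve allocations out of larger blocks. The real entry points it loosely mirrors are `apr_pool_create()`, `apr_palloc()`, and `apr_pool_destroy()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model: every toy_ name is hypothetical, not an APR API. */
typedef union toy_block {
    union toy_block *next;   /* next allocation owned by the same pool */
    max_align_t align;       /* keeps the payload suitably aligned */
} toy_block;

typedef struct toy_pool {
    struct toy_pool *parent;
    struct toy_pool *first_child;
    struct toy_pool *next_sibling;
    toy_block *blocks;       /* singly linked list of allocations */
} toy_pool;

/* Number of allocations not yet freed, across all pools (for inspection). */
size_t toy_live_blocks = 0;

/* Create a pool; a non-NULL parent will destroy it when the parent dies. */
toy_pool *toy_pool_create(toy_pool *parent)
{
    toy_pool *p = calloc(1, sizeof(*p));
    if (p == NULL)
        return NULL;
    p->parent = parent;
    if (parent != NULL) {
        p->next_sibling = parent->first_child;
        parent->first_child = p;
    }
    return p;
}

/* Allocate size bytes owned by the pool; there is no matching free call. */
void *toy_palloc(toy_pool *pool, size_t size)
{
    assert(pool != NULL);
    toy_block *b = malloc(sizeof(*b) + size);
    if (b == NULL)
        return NULL;
    b->next = pool->blocks;
    pool->blocks = b;
    toy_live_blocks++;
    return (void *)(b + 1);  /* payload starts right after the header */
}

/* Destroy a pool: children first, then its own blocks, then the pool. */
void toy_pool_destroy(toy_pool *pool)
{
    if (pool == NULL)
        return;
    while (pool->first_child != NULL)
        toy_pool_destroy(pool->first_child);  /* child unlinks itself */
    while (pool->blocks != NULL) {
        toy_block *next = pool->blocks->next;
        free(pool->blocks);
        pool->blocks = next;
        toy_live_blocks--;
    }
    if (pool->parent != NULL) {
        toy_pool **link = &pool->parent->first_child;
        while (*link != pool) {
            assert(*link != NULL);  /* pool must be in the parent's list */
            link = &(*link)->next_sibling;
        }
        *link = pool->next_sibling;
    }
    free(pool);
}
```

In this model, a "connection" pool with a "request" child pool behaves the way the chapter describes: destroying the request pool releases only the request's allocations, while destroying the connection pool releases everything beneath it, including any request pools still alive.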
## Source Code Organization

When you look at the Apache source tree, here's what you'll find:

```
httpd-2.4.x/
├── server/                 # Core server code
│   ├── main.c              # Entry point
│   ├── config.c            # Configuration parsing
│   ├── core.c              # Core module
│   ├── request.c           # Request processing
│   ├── protocol.c          # HTTP protocol handling
│   └── ...
├── modules/                # All modules organized by category
│   ├── aaa/                # Authentication/Authorization
│   ├── filters/            # Content filters
│   ├── generators/         # Content generators (CGI, etc.)
│   ├── http/               # HTTP protocol modules
│   ├── loggers/            # Logging modules
│   ├── mappers/            # URL mapping modules
│   ├── proxy/              # Proxy functionality
│   ├── ssl/                # SSL/TLS support
│   └── ...
├── include/                # Public headers
│   ├── httpd.h             # Main definitions
│   ├── http_config.h       # Configuration API
│   ├── http_core.h         # Core module API
│   ├── http_protocol.h
│   ├── http_request.h
│   ├── ap_*.h              # Various APIs
│   └── ...
├── srclib/                 # Library sources (unpack APR here for --with-included-apr)
│   ├── apr/                # Apache Portable Runtime
│   └── apr-util/           # APR utilities
├── os/                     # OS-specific code
└── support/                # Helper utilities
```

## Key Data Structures

Before reading Apache code, familiarize yourself with these fundamental structures. The excerpts below are simplified: they show only a few fields, and the full definitions live in `include/httpd.h`.

### {httpd}`server_rec` - Server Configuration

Represents a virtual host. Contains all configuration for a server context.

```c
struct server_rec {
    const char *defn_name;        // Config file where defined
    char *server_hostname;        // ServerName
    apr_port_t port;              // Port number
    /* ... many more fields ... */
};
```

### {httpd}`conn_rec` - Connection

Represents a client connection. Lives for the duration of a TCP connection (may serve multiple requests with keep-alive).

```c
struct conn_rec {
    apr_pool_t *pool;                     // Connection pool
    server_rec *base_server;              // Virtual host
    apr_sockaddr_t *client_addr;          // Client address
    char *client_ip;                      // Client IP address
    struct ap_conf_vector_t *conn_config; // Per-connection module configs
    /* The socket itself is not a field of conn_rec; the core module
       keeps it and exposes it through ap_get_conn_socket(). */
    /* ... */
};
```

### {httpd}`request_rec` - HTTP Request

The central structure.
Contains everything about a single HTTP request/response.

```c
struct request_rec {
    apr_pool_t *pool;             // Request pool (freed after response)
    conn_rec *connection;         // Parent connection
    server_rec *server;           // Server handling this request

    // Request info
    char *the_request;            // First line of request
    const char *method;           // GET, POST, etc.
    char *uri;                    // Request URI
    char *filename;               // Translated to filesystem path

    // Headers
    apr_table_t *headers_in;      // Request headers
    apr_table_t *headers_out;     // Response headers

    // Response info
    int status;                   // HTTP status code
    const char *content_type;     // Response Content-Type

    // Module configurations
    struct ap_conf_vector_t *per_dir_config;  // Per-directory config vector
    struct ap_conf_vector_t *request_config;  // Per-request module data

    /* ... many more fields ... */
};
```

## The Request Lifecycle (Preview)

When a request arrives, Apache processes it through distinct phases:

1. **Connection accepted** - MPM accepts TCP connection
2. **pre_connection hooks** - Modules can set up connection-level state
3. **Read request** - HTTP request line and headers parsed
4. **post_read_request hooks** - First chance to examine request
5. **URI translation** - Map URI to handler/filename
6. **Access checking** - Checks that do not depend on the user, such as IP-based access control
7. **Authentication** - Who is the user?
8. **Authorization** - Is user allowed?
9. **MIME type checking** - Determine content type
10. **Fixups** - Last chance to modify before handling
11. **Handler** - Generate response content
12. **Logging** - Record what happened
13. **Cleanup** - Free request resources (the request pool is destroyed)

Each phase has associated hooks where modules can participate.
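The phase-and-hook idea can be sketched with a small self-contained toy in C. This is an illustration under stated assumptions, not httpd's hook machinery: every `toy_`/`TOY_` name is hypothetical, only three of the phases are modeled, and the mapping of an unhandled request to 500 is a choice made for the demo. What it does borrow from httpd is the convention that a callback returns `OK` (0), `DECLINED` (-1), or an HTTP status code, and that some hooks stop at the first callback that does not decline while others run every callback unless one reports an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model: every toy_/TOY_ name is hypothetical, not an httpd API. */
#define TOY_OK        0     /* callback handled the phase */
#define TOY_DECLINED (-1)   /* callback is not interested */
#define TOY_MAX_HOOKS 8

enum toy_phase { TOY_TRANSLATE_NAME, TOY_ACCESS_CHECKER, TOY_HANDLER,
                 TOY_PHASE_COUNT };

typedef struct toy_request {
    const char *uri;         /* input */
    const char *filename;    /* set during translation */
    const char *handled_by;  /* set by whichever handler ran */
    int status;              /* final status code */
} toy_request;

typedef int (*toy_hook_fn)(toy_request *r);

static toy_hook_fn toy_hooks[TOY_PHASE_COUNT][TOY_MAX_HOOKS];
static size_t toy_hook_count[TOY_PHASE_COUNT];

/* Append a callback to a phase; callbacks run in registration order. */
void toy_hook_register(enum toy_phase phase, toy_hook_fn fn)
{
    assert(toy_hook_count[phase] < TOY_MAX_HOOKS);
    toy_hooks[phase][toy_hook_count[phase]++] = fn;
}

/* Run-first: stop at the first callback that does not decline. */
int toy_run_first(enum toy_phase phase, toy_request *r)
{
    for (size_t i = 0; i < toy_hook_count[phase]; i++) {
        int rv = toy_hooks[phase][i](r);
        if (rv != TOY_DECLINED)
            return rv;
    }
    return TOY_DECLINED;
}

/* Run-all: call every callback unless one returns an error code. */
int toy_run_all(enum toy_phase phase, toy_request *r)
{
    for (size_t i = 0; i < toy_hook_count[phase]; i++) {
        int rv = toy_hooks[phase][i](r);
        if (rv != TOY_OK && rv != TOY_DECLINED)
            return rv;
    }
    return TOY_OK;
}

/* Four demo "modules", one callback each. */
static int toy_translate(toy_request *r)
{
    r->filename = "/srv/www/index.html";  /* fixed mapping for the demo */
    return TOY_OK;
}

static int toy_deny_private(toy_request *r)
{
    return strncmp(r->uri, "/private", 8) == 0 ? 403 : TOY_DECLINED;
}

static int toy_status_handler(toy_request *r)
{
    if (strcmp(r->uri, "/status") != 0)
        return TOY_DECLINED;
    r->handled_by = "status";
    return TOY_OK;
}

static int toy_default_handler(toy_request *r)
{
    r->handled_by = "default";
    return TOY_OK;
}

void toy_register_demo_modules(void)
{
    toy_hook_register(TOY_TRANSLATE_NAME, toy_translate);
    toy_hook_register(TOY_ACCESS_CHECKER, toy_deny_private);
    toy_hook_register(TOY_HANDLER, toy_status_handler);
    toy_hook_register(TOY_HANDLER, toy_default_handler);
}

/* Drive one request through the phases; returns the final status. */
int toy_process_request(toy_request *r)
{
    int rv = toy_run_first(TOY_TRANSLATE_NAME, r);
    if (rv == TOY_OK)
        rv = toy_run_all(TOY_ACCESS_CHECKER, r);
    if (rv == TOY_OK)
        rv = toy_run_first(TOY_HANDLER, r);
    /* Toy choice: a request nobody handled becomes 500. */
    r->status = (rv == TOY_OK) ? 200 : (rv == TOY_DECLINED) ? 500 : rv;
    return r->status;
}
```

After calling `toy_register_demo_modules()` once, a request for `/status` is answered by the first handler, any other URI falls through to the default handler, and anything under `/private` is rejected with 403 before a handler runs. The same shape applies to a fuzzing harness later in the book: what matters is which callbacks are registered for each phase and in what order they run.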
## Building Apache from Source

For development and fuzzing, you'll want to build Apache from source:

```bash
# In the httpd source directory
./configure --prefix=/path/to/install \
    --enable-mods-static=most \
    --enable-static-support \
    --with-included-apr
make
make install
```

Key configure options:

- `--enable-mods-static=most` - Build most modules and link them statically into the `httpd` binary (easier for fuzzing)
- `--enable-static-support` - Build statically linked versions of the support binaries
- `--with-included-apr` - Use the APR sources unpacked under `srclib/` instead of the system APR

## What's Next

In the following chapters, we'll dive deep into each component:

- **Chapter 2**: APR - The foundation library
- **Chapter 3**: Memory pools - Apache's memory management
- **Chapter 4**: Configuration system
- **Chapter 5**: MPM - Process/thread models
- **Chapter 6**: Hook system - Extending Apache
- **Chapter 7**: Filters and bucket brigades
- **Chapter 8**: Request processing pipeline
- **Chapter 9**: Module anatomy - Writing your own
- **Chapter 10**: Building and linking
- **Chapter 11**: Fuzzing Apache

Each chapter builds on the previous one. By the end, you'll understand Apache well enough to build a fuzzing harness that exercises the entire request processing pipeline.