Chapter 1: Introduction to Apache Architecture

What is Apache HTTP Server?

Apache HTTP Server (commonly called “Apache” or “httpd”) is one of the world’s most widely deployed web servers. Unlike simpler web servers, Apache is designed as a highly modular, extensible system that can be customized for a wide range of use cases.

For a C/Linux developer approaching Apache for the first time, think of it as a framework rather than a monolithic application. The core is relatively small - most functionality lives in modules that plug into a well-defined architecture.

High-Level Architecture Overview

Apache HTTP Server Architecture (layers, top to bottom):

HTTP Core: Request parsing, response generation, protocol logic
Modules: mod_ssl, mod_proxy, mod_cgi, mod_rewrite, mod_...
Hook System: Modules register callbacks at processing phases
Filter Chain: Bucket Brigades - I/O abstraction layer
MPM: Multi-Processing Module (prefork / worker / event)
APR: Apache Portable Runtime (memory, I/O, threads, strings)
OS: Operating System (Linux, Windows, BSD)

The Key Abstractions

Apache’s architecture is built on several key abstractions. Understanding these is crucial before diving into the code:

1. APR (Apache Portable Runtime)

The foundation layer. APR provides cross-platform APIs for:

Memory management (pools)
File I/O
Network sockets
Threading and process management
Hash tables, arrays, strings

Why it matters: You’ll rarely see raw malloc() or socket() calls in Apache code. Almost everything goes through APR.

2. Pools (Memory Management)

Apache uses a hierarchical pool-based memory allocator. Instead of manually tracking every allocation, you allocate from a pool, and when the pool is destroyed, everything allocated from it is freed automatically.

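// Instead of:
char *heap_buf = malloc(1024);
// ... use heap_buf ...
free(heap_buf);  // Easy to forget!

// Apache uses (pool is an apr_pool_t *, e.g. a request's pool):
char *pool_buf = apr_palloc(pool, 1024);
// ... use pool_buf ...
// No free(): the memory is released when the pool is cleared or destroyed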
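
To make the "hierarchical" part concrete, here is a minimal sketch of a parent/child pool in plain C. It is a toy model, not APR's implementation: every `toy_*` name is hypothetical, and it calls malloc() once per allocation, whereas APR carves allocations out of larger blocks. What it shows is the lifetime rule: destroying a pool releases its own allocations and those of every child pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: all toy_* names are hypothetical, not part of APR. */

/* Header placed in front of each allocation so the pool can find it later.
 * The max_align_t member keeps the user memory suitably aligned. */
typedef union toy_block {
    union toy_block *next;
    max_align_t align;
} toy_block;

typedef struct toy_pool {
    struct toy_pool *parent;
    struct toy_pool *first_child;
    struct toy_pool *next_sibling;
    toy_block *blocks;              /* every allocation made from this pool */
} toy_pool;

/* Number of allocations not yet released; lets callers observe cleanup. */
long toy_live_allocations = 0;

toy_pool *toy_pool_create(toy_pool *parent)
{
    toy_pool *p = calloc(1, sizeof(*p));
    if (p == NULL)
        return NULL;
    p->parent = parent;
    if (parent != NULL) {           /* link into the parent's child list */
        p->next_sibling = parent->first_child;
        parent->first_child = p;
    }
    return p;
}

void *toy_palloc(toy_pool *p, size_t size)
{
    assert(p != NULL);
    toy_block *b = malloc(sizeof(*b) + size);
    if (b == NULL)
        return NULL;
    b->next = p->blocks;            /* remember the block for later cleanup */
    p->blocks = b;
    toy_live_allocations++;
    return b + 1;                   /* user memory starts after the header */
}

void toy_pool_destroy(toy_pool *p)
{
    /* Children first: each child unlinks itself, so the list shrinks. */
    while (p->first_child != NULL)
        toy_pool_destroy(p->first_child);

    /* Release everything that was allocated from this pool. */
    while (p->blocks != NULL) {
        toy_block *b = p->blocks;
        p->blocks = b->next;
        free(b);
        toy_live_allocations--;
    }

    /* Unlink from the parent's child list, if there is a parent. */
    if (p->parent != NULL) {
        toy_pool **link = &p->parent->first_child;
        while (*link != p)
            link = &(*link)->next_sibling;
        *link = p->next_sibling;
    }
    free(p);
}
```

A typical use mirrors how a connection outlives its requests: create a connection-level pool, create one child pool per request, destroy the child after each response, and destroy the parent when the connection closes. Anything still allocated from a child at that point is released along with the parent.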

3. Modules

Nearly everything in Apache is a module. Even core functionality like HTTP protocol handling is implemented as modules. A module is a struct that declares:

A function that registers its hook callbacks (and any filters it implements)
What configuration directives it provides
How its per-server and per-directory configuration is created and merged

4. Hooks

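Hooks are the extension points in Apache’s request processing. At each phase, Apache runs the callbacks that modules registered for that hook; depending on the hook, it either runs all of them or stops at the first one that does not decline. Each hook has a registration function, for example:

ap_hook_handler() - Registers a callback that generates response content
ap_hook_access_checker() - Registers a callback that checks access permissions
ap_hook_translate_name() - Registers a callback that maps the URL to the filesystem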
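
The dispatch behind a hook can be sketched in a few lines of plain C. This is a simplified model, not httpd's hook machinery: the `toy_*` names are hypothetical, callbacks run in registration order (httpd also sorts them by a requested ordering), and the two return codes only mirror httpd's OK and DECLINED. It shows the two common dispatch styles: "run first" (used for hooks such as the handler, where one module takes the request) and "run all" (used for hooks such as the access checker, where every module gets a say).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: all toy_* names are hypothetical, not httpd APIs. */
#define TOY_OK        0     /* mirrors httpd's OK */
#define TOY_DECLINED (-1)   /* mirrors httpd's DECLINED */
#define TOY_MAX_HOOKS 8

typedef struct toy_request {
    const char *uri;
    int handled_by;         /* which handler produced the response */
    int checks_run;         /* how many access checkers were called */
} toy_request;

typedef int (*toy_hook_fn)(toy_request *r);

typedef struct toy_hook_list {
    toy_hook_fn fns[TOY_MAX_HOOKS];
    size_t count;
} toy_hook_list;

/* Register a callback; returns 0 on success, -1 if the list is full. */
int toy_hook_register(toy_hook_list *h, toy_hook_fn fn)
{
    if (h->count >= TOY_MAX_HOOKS)
        return -1;
    h->fns[h->count++] = fn;
    return 0;
}

/* "Run first": stop at the first callback that does not decline. */
int toy_run_first(const toy_hook_list *h, toy_request *r)
{
    for (size_t i = 0; i < h->count; i++) {
        int rv = h->fns[i](r);
        if (rv != TOY_DECLINED)
            return rv;
    }
    return TOY_DECLINED;
}

/* "Run all": call every callback unless one reports an error status. */
int toy_run_all(const toy_hook_list *h, toy_request *r)
{
    for (size_t i = 0; i < h->count; i++) {
        int rv = h->fns[i](r);
        if (rv != TOY_OK && rv != TOY_DECLINED)
            return rv;
    }
    return TOY_OK;
}

/* Example handlers: the first only takes /static/ URLs. */
int toy_static_handler(toy_request *r)
{
    if (strncmp(r->uri, "/static/", 8) != 0)
        return TOY_DECLINED;
    r->handled_by = 1;
    return TOY_OK;
}

int toy_default_handler(toy_request *r)
{
    r->handled_by = 2;
    return TOY_OK;
}

/* Example access checkers: one denies /admin, one always allows. */
int toy_deny_admin(toy_request *r)
{
    r->checks_run++;
    return strncmp(r->uri, "/admin", 6) == 0 ? 403 : TOY_DECLINED;
}

int toy_allow_all(toy_request *r)
{
    r->checks_run++;
    return TOY_OK;
}
```

With both handlers registered, a request for /static/logo.png is answered by the first handler and the second is never called, while /index.html falls through to the default handler. With both checkers registered, a request under /admin stops at the first checker with status 403.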

5. Filters and Bucket Brigades

Request and response data in Apache flows through filters arranged in chains. Data is passed between filters as “bucket brigades” - linked lists of data chunks (buckets). This allows:

mod_ssl to transparently encrypt/decrypt
mod_deflate to compress responses
Custom modules to transform content
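
The idea of a filter chain operating on a brigade can be sketched without any httpd headers. The model below is deliberately simplified and every `toy_*` name is hypothetical: a brigade is a singly linked list of heap buckets, and a filter is a function that transforms the brigade and passes it to the next filter. Real brigades have several bucket types (heap, file, end-of-stream, and so on) and real filters pass data on with ap_pass_brigade(); none of that is modeled here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: all toy_* names are hypothetical, not httpd/APR APIs. */

typedef struct toy_bucket {
    struct toy_bucket *next;
    size_t len;
    char data[];                    /* the chunk of bytes this bucket holds */
} toy_bucket;

typedef struct toy_brigade {
    toy_bucket *head;
    toy_bucket *tail;
} toy_brigade;

typedef struct toy_filter toy_filter;
struct toy_filter {
    void (*fn)(toy_filter *f, toy_brigade *bb);
    toy_filter *next;               /* next filter in the chain, or NULL */
    void *ctx;                      /* per-filter state */
};

/* Append a copy of data as a new bucket; returns 0 on success. */
int toy_brigade_append(toy_brigade *bb, const char *data, size_t len)
{
    toy_bucket *b = malloc(sizeof(*b) + len);
    if (b == NULL)
        return -1;
    b->next = NULL;
    b->len = len;
    memcpy(b->data, data, len);
    if (bb->tail != NULL)
        bb->tail->next = b;
    else
        bb->head = b;
    bb->tail = b;
    return 0;
}

void toy_brigade_destroy(toy_brigade *bb)
{
    while (bb->head != NULL) {
        toy_bucket *b = bb->head;
        bb->head = b->next;
        free(b);
    }
    bb->tail = NULL;
}

/* Hand the brigade to a filter; a NULL filter ends the chain. */
void toy_pass_brigade(toy_filter *f, toy_brigade *bb)
{
    if (f != NULL)
        f->fn(f, bb);
}

/* Content filter: upper-cases every bucket in place, then passes it on. */
void toy_upper_filter(toy_filter *f, toy_brigade *bb)
{
    for (toy_bucket *b = bb->head; b != NULL; b = b->next)
        for (size_t i = 0; i < b->len; i++)
            b->data[i] = (char)toupper((unsigned char)b->data[i]);
    toy_pass_brigade(f->next, bb);
}

/* Stand-in for the network end of the chain: collects bytes in a buffer. */
typedef struct toy_sink {
    char buf[128];
    size_t len;
} toy_sink;

void toy_sink_filter(toy_filter *f, toy_brigade *bb)
{
    toy_sink *out = f->ctx;
    for (toy_bucket *b = bb->head; b != NULL; b = b->next) {
        size_t room = sizeof(out->buf) - 1 - out->len;
        size_t n = b->len < room ? b->len : room;   /* truncate if full */
        memcpy(out->buf + out->len, b->data, n);
        out->len += n;
    }
    out->buf[out->len] = '\0';
}
```

Chaining the upper-case filter in front of the sink and passing in a brigade with the buckets "hello, " and "world" leaves "HELLO, WORLD" in the sink's buffer. The content generator never needs to know which filters sit between it and the client, which is what lets encryption or compression be added transparently.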

6. MPM (Multi-Processing Module)

The MPM controls how Apache handles concurrency:

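prefork: Single-threaded processes, each handling one connection at a time (safe for non-thread-safe libraries, but heavy)
worker: Multiple processes, each running multiple threads; one thread per connection
event: Like worker, but a listener thread takes over idle keep-alive connections so worker threads stay free (usually the most efficient)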

Only one MPM is active at a time.

Source Code Organization

When you look at the Apache source tree, here’s what you’ll find:

httpd-2.4.x/
├── server/           # Core server code
│   ├── main.c        # Entry point
│   ├── config.c      # Configuration parsing
│   ├── core.c        # Core module
│   ├── request.c     # Request processing
│   ├── protocol.c    # HTTP protocol handling
│   └── ...
├── modules/          # All modules organized by category
│   ├── aaa/          # Authentication/Authorization
│   ├── filters/      # Content filters
│   ├── generators/   # Content generators (CGI, etc.)
│   ├── http/         # HTTP protocol modules
│   ├── loggers/      # Logging modules
│   ├── mappers/      # URL mapping modules
│   ├── proxy/        # Proxy functionality
│   ├── ssl/          # SSL/TLS support
│   └── ...
├── include/          # Public headers
│   ├── httpd.h       # Main definitions
│   ├── http_config.h # Configuration API
│   ├── http_core.h   # Core module API
│   ├── http_protocol.h
│   ├── http_request.h
│   ├── ap_*.h        # Various APIs
│   └── ...
├── srclib/           # In-tree library sources (for --with-included-apr)
│   ├── apr/          # Apache Portable Runtime
│   └── apr-util/     # APR utilities
├── os/               # OS-specific code
└── support/          # Helper utilities

Key Data Structures

Before reading Apache code, familiarize yourself with these fundamental structures:

server_rec - Server Configuration

Represents a virtual host. Contains all configuration for a server context.

struct server_rec {
    const char *defn_name;      // Config file where defined
    const char *server_hostname; // ServerName
    apr_port_t port;            // Port number
    /* ... many more fields ... */
};

conn_rec - Connection

Represents a client connection. Lives for the duration of a TCP connection (may serve multiple requests with keep-alive).

struct conn_rec {
    apr_pool_t *pool;           // Connection pool
    server_rec *base_server;    // Virtual host
    void *conn_config;          // Per-connection module configs
    apr_sockaddr_t *client_addr; // Client address (the socket is not a field; see ap_get_conn_socket())
    const char *client_ip;      // Client IP address
    /* ... */
};

request_rec - HTTP Request

The central structure. Contains everything about a single HTTP request/response.

struct request_rec {
    apr_pool_t *pool;           // Request pool (freed after response)
    conn_rec *connection;       // Parent connection
    server_rec *server;         // Server handling this request

    // Request info
    const char *the_request;    // First line of request
    char *method;               // GET, POST, etc.
    char *uri;                  // Request URI
    char *filename;             // Translated to filesystem path

    // Headers
    apr_table_t *headers_in;    // Request headers
    apr_table_t *headers_out;   // Response headers

    // Response info
    int status;                 // HTTP status code
    const char *content_type;   // Response Content-Type

    // Module configurations
    void *per_dir_config;       // Per-directory config vector
    void *request_config;       // Per-request module data

    /* ... many more fields ... */
};

The Request Lifecycle (Preview)

When a request arrives, Apache processes it through distinct phases:

Connection accepted - MPM accepts TCP connection
pre_connection hooks - Modules can set up connection-level state
Read request - HTTP request line and headers parsed
Post-read-request hooks - First chance to examine request
URI translation - Map URI to handler/filename
Access checking - IP-based access control
Authentication - Who is the user?
Authorization - Is user allowed?
MIME type checking - Determine content type
Fixups - Last chance to modify before handling
Handler - Generate response content
Logging - Record what happened
Cleanup - Free request resources

Each phase has associated hooks where modules can participate.
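
The control flow of this list can be sketched as a loop over phases. The sketch is a toy under stated assumptions: the `toy_*` names and the blocked address are hypothetical, only three phases are modeled, and each phase has a single callback instead of a hook list. It illustrates two points: a non-OK status from an early phase short-circuits the remaining phases, and logging still runs for the failed request, which matches how httpd logs error responses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: all toy_* names are hypothetical, not httpd APIs. */
#define TOY_OK 0

typedef struct toy_req {
    const char *uri;
    const char *client_ip;
    int status;             /* final HTTP status */
    char trace[128];        /* space-separated names of the phases that ran */
} toy_req;

typedef int (*toy_phase_fn)(toy_req *r);

typedef struct toy_phase {
    const char *name;
    toy_phase_fn fn;
} toy_phase;

/* Record that a phase ran, without overflowing the trace buffer. */
static void toy_trace(toy_req *r, const char *phase)
{
    size_t used = strlen(r->trace);
    snprintf(r->trace + used, sizeof(r->trace) - used,
             "%s%s", used ? " " : "", phase);
}

/* URI translation: only absolute paths are accepted in this toy. */
static int toy_translate(toy_req *r)
{
    return r->uri[0] == '/' ? TOY_OK : 400;
}

/* Access checking: one hard-coded blocked address (an assumption). */
static int toy_access(toy_req *r)
{
    return strcmp(r->client_ip, "10.0.0.66") == 0 ? 403 : TOY_OK;
}

/* Handler: would generate the response body; here it only sets the status. */
static int toy_handler(toy_req *r)
{
    r->status = 200;
    return TOY_OK;
}

int toy_process_request(toy_req *r)
{
    static const toy_phase phases[] = {
        { "translate_name", toy_translate },
        { "access_checker", toy_access },
        { "handler",        toy_handler },
    };

    r->trace[0] = '\0';
    r->status = 200;
    for (size_t i = 0; i < sizeof(phases) / sizeof(phases[0]); i++) {
        toy_trace(r, phases[i].name);
        int rv = phases[i].fn(r);
        if (rv != TOY_OK) {         /* error: skip the remaining phases */
            r->status = rv;
            break;
        }
    }
    toy_trace(r, "log");            /* logging runs for failures too */
    return r->status;
}
```

A request from an allowed address runs translate_name, access_checker, handler, and log, and ends with status 200. A request from the blocked address stops after access_checker with status 403, skips the handler, and is still logged.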

Building Apache from Source

For development and fuzzing, you’ll want to build Apache from source:

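# In the httpd source directory
# (--with-included-apr expects APR and APR-util sources in srclib/apr and srclib/apr-util)
./configure --prefix=/path/to/install \
            --enable-modules=most \
            --enable-mods-static=most \
            --with-included-apr

make
make install

Key configure options:

--enable-modules=most - Build most modules
--enable-mods-static=most - Link those modules statically into httpd (easier for fuzzing); --enable-static-support only affects the support binaries, not modules
--with-included-apr - Use the APR sources under srclib/ instead of the system APR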

What’s Next

In the following chapters, we’ll dive deep into each component:

Chapter 2: APR - The foundation library
Chapter 3: Memory pools - Apache’s memory management
Chapter 4: Configuration system
Chapter 5: MPM - Process/thread models
Chapter 6: Hook system - Extending Apache
Chapter 7: Filters and bucket brigades
Chapter 8: Request processing pipeline
Chapter 9: Module anatomy - Writing your own
Chapter 10: Building and linking
Chapter 11: Fuzzing Apache

Each chapter builds on the previous, and by the end, you’ll understand Apache well enough to build a fuzzing harness that exercises the entire request processing pipeline.