# Harness Design
This document explains how the fuzzing harness works - the design decisions behind replacing Apache's network I/O layer with in-memory filters. For Apache internals background, see the [Apache Internals](../apache-internals/README.md) guide. For the fuzzing engine and protobuf integration, see [Fuzzing Engines](fuzzing-engines.md).
## Goal
Feed arbitrary bytes to Apache's full request processing pipeline - HTTP parsing, hook phases, module handlers, filter chains - without network I/O. The harness needs to be fast enough for coverage-guided fuzzing (thousands of executions per second) while running the exact same code paths as a production Apache server.
## Where the Harness Fits in Apache's Filter Stack
Apache processes all I/O through **filters** - composable functions arranged in a chain, each assigned a **type** that determines its position. Types range from {httpd}`AP_FTYPE_RESOURCE` (content generation, at the top) down to {httpd}`AP_FTYPE_NETWORK` (raw socket I/O, at the bottom). Data flows down through the chain for responses and up for requests. For a full explanation of filter types, bucket brigades, and how to write filters, see [Chapter 7: Filters and Bucket Brigades](../apache-internals/07-filters-buckets.md).
We only replace the bottom of the stack - where Apache would normally read from and write to a socket. Everything above that (SSL, HTTP protocol framing, content compression, module-specific processing) runs exactly as it would in production:
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 5, 'rankSpacing': 3, 'padding': 1, 'diagramPadding': 5}}}%%
flowchart TB
subgraph out["Output Filter Chain (response)"]
direction TB
_s(( )):::hidden ~~~ H
H["Handler output"] --> CS["mod_deflate, mod_include, ...
(AP_FTYPE_CONTENT_SET)"]
CS --> PR["HTTP headers, chunked encoding
(AP_FTYPE_PROTOCOL)"]
PR --> CN["mod_ssl
(AP_FTYPE_CONNECTION)"]
CN --> NW["FUZZ_OUTPUT
(AP_FTYPE_NETWORK - 1)"]
end
classDef hidden fill:none,stroke:none,color:none
style NW fill:#f96,stroke:#333,color:#333
```
`FUZZ_INPUT` is registered at {httpd}`AP_FTYPE_NETWORK` and `FUZZ_OUTPUT` at `AP_FTYPE_NETWORK - 1`. From the perspective of every other filter in the chain, the data source and sink look identical to a real socket.
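To make the ordering concrete, here is a small standalone model of how a filter's type decides its place in the output chain. It is a sketch, not harness code: the numeric values mirror httpd's `ap_filter_type` enum, but every `model_` name and the sample filter names in the usage note are hypothetical, and a plain sort stands in for httpd's own chain insertion logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Numeric values mirror httpd's ap_filter_type enum (util_filter.h). */
enum model_ftype {
    MODEL_FTYPE_RESOURCE    = 10,
    MODEL_FTYPE_CONTENT_SET = 20,
    MODEL_FTYPE_PROTOCOL    = 30,
    MODEL_FTYPE_CONNECTION  = 50,
    MODEL_FTYPE_NETWORK     = 60
};

/* Hypothetical stand-in for a registered filter: just a name and a type. */
struct model_filter {
    const char *name;
    int ftype;
};

static int model_cmp_ftype(const void *a, const void *b)
{
    const struct model_filter *fa = a;
    const struct model_filter *fb = b;
    return (fa->ftype > fb->ftype) - (fa->ftype < fb->ftype);
}

/* Sort into output-chain order: the lowest type sees the response first. */
static void model_order_output_chain(struct model_filter *f, size_t n)
{
    assert(f != NULL || n == 0);
    if (n > 0)
        qsort(f, n, sizeof(*f), model_cmp_ftype);
}

/* Position of `name` in the chain, or -1 if it is not present. */
static int model_position(const struct model_filter *f, size_t n,
                          const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(f[i].name, name) == 0)
            return (int)i;
    return -1;
}
```

Sorting a sample chain that holds a content filter, a protocol filter, a connection filter, and `FUZZ_OUTPUT` at `MODEL_FTYPE_NETWORK - 1` puts `FUZZ_OUTPUT` last, which is the position the diagram above shows.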
## Architecture Overview
The following diagram shows how fuzz input flows from libFuzzer through the proto converter, into Apache's request pipeline, and back:
```{image} /_static/images/fuzzer-architecture.drawio.svg
```
The harness does a few things that are non-obvious:
1. **Replaces network I/O** with custom filters and a bucket injection hook
2. **Discriminates connections** - fuzz client connections get in-memory I/O, proxy backend connections use real sockets
3. **Provides a fake MPM** so Apache thinks it's running inside the event MPM
4. **Manages the connection lifecycle** - creates a `conn_rec`, runs the pipeline, destroys the pool, repeat
All of this lives in `fuzz_common.c`. The proto harnesses (`.cc` files) and converters sit on top and just call `fuzz_one_input()` with raw HTTP bytes.
## Key Design Decisions
### 1. Filter Registration
During `fuzz_init()`, we register two filters and two hooks:
```c
fuzz_input_filter_handle =
    ap_register_input_filter("FUZZ_INPUT", fuzz_input_filter, NULL, AP_FTYPE_NETWORK);
fuzz_output_filter_handle =
    ap_register_output_filter("FUZZ_OUTPUT", fuzz_output_filter, NULL, AP_FTYPE_NETWORK - 1);

ap_hook_pre_connection(fuzz_pre_connection, NULL, NULL, APR_HOOK_REALLY_FIRST);
ap_hook_insert_network_bucket(fuzz_insert_network_bucket, NULL, NULL, APR_HOOK_FIRST);
```
Registration only makes the filters known to Apache; it does not attach them to anything. We store the returned handles and add them to each fuzz connection in the `pre_connection` hook. The `insert_network_bucket` hook is how we inject fuzz data into the bucket brigade instead of reading from a socket.
### 2. Connection Discrimination
Not all connections in the harness are fuzz targets. When fuzzing proxy modules (`mod_proxy_uwsgi`, etc.), Apache creates backend connections to the upstream server. We need those to use real socket I/O (or our backend mock), not the fuzz input buffer.
The solution: we tag fuzz client connections with a note in `fuzz_one_input()`:
```c
apr_table_setn(c->notes, "fuzz_client", "1");
```
Then both `fuzz_pre_connection` and `fuzz_insert_network_bucket` check for this note. If it's missing, they return `DECLINED` and let the normal socket path handle it.
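The check itself is tiny. The sketch below models it with stand-in types so it compiles on its own: `struct model_conn` replaces `conn_rec`, a single key/value pair replaces the `apr_table_t` notes table, and the `MODEL_` constants reuse the numeric values of httpd's `OK` and `DECLINED`. All of these names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_OK        0
#define MODEL_DECLINED (-1)

/* Stand-in for conn_rec: one optional note instead of an apr_table_t. */
struct model_conn {
    const char *note_key;
    const char *note_val;
    int uses_memory_io;   /* set once the in-memory filters are attached */
};

/* Look up a note; returns NULL when the key is absent. */
static const char *model_note_get(const struct model_conn *c, const char *key)
{
    if (c->note_key != NULL && strcmp(c->note_key, key) == 0)
        return c->note_val;
    return NULL;
}

/* Only tagged connections get in-memory I/O; everything else is declined
 * so the regular socket path (or a backend mock) can handle it. */
static int model_pre_connection(struct model_conn *c)
{
    assert(c != NULL);
    if (model_note_get(c, "fuzz_client") == NULL)
        return MODEL_DECLINED;
    c->uses_memory_io = 1;
    return MODEL_OK;
}
```

A connection carrying the `fuzz_client` note comes back with `MODEL_OK` and in-memory I/O enabled; a connection without it is declined and left untouched.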
### 3. Pre-Connection Hook
`fuzz_pre_connection` runs at `APR_HOOK_REALLY_FIRST` and does the actual I/O replacement for tagged connections:
```c
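static int fuzz_pre_connection(conn_rec *c, void *csd)
{
    // Untagged connections (e.g. proxy backends) keep the normal socket path
    if (!apr_table_get(c->notes, "fuzz_client"))
        return DECLINED;

    // Per-connection state for the input filter
    fuzz_net_rec *net = apr_pcalloc(c->pool, sizeof(*net));
    net->c = c;
    net->bb = apr_brigade_create(c->pool, c->bucket_alloc);

    // Hand core the shared dummy socket instead of a real client socket
    ap_set_core_module_config(c->conn_config, g_dummy_socket);

    // Attach the in-memory filters registered in fuzz_init()
    ap_add_input_filter_handle(fuzz_input_filter_handle, net, NULL, c);
    ap_add_output_filter_handle(fuzz_output_filter_handle, NULL, NULL, c);

    // Mark as a secondary connection so core skips its socket filters
    c->master = c;

    // OK (not DONE) so later pre_connection hooks still run
    return OK;
}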
```
A few things to note:
- **`g_dummy_socket`** is created once in `fuzz_init()` and reused for every connection. `core_pre_connection` calls `apr_socket_opt_set(csd, ...)` which would crash on NULL, so we need a real (but unconnected) socket.
- **`c->master = c`** makes `core_pre_connection` think this is a secondary connection (like an HTTP/2 stream) and skip its own socket filter registration. Without this, core would try to read socket metadata from our dummy socket and fail. This trick is a cursed CTF tactic, but it works! :D
- **Returns `OK`**, not `DONE`. This is important - returning `DONE` would stop the hook chain and prevent other modules (mod_remoteip, mod_logio, etc.) from running their `pre_connection` hooks.
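The difference between `OK` and `DONE` is easier to see in a model of the hook runner. The sketch below is a simplification of the run-all loop that the hook macros generate, written with hypothetical `model_` names; the three constants reuse the numeric values of httpd's `OK`, `DECLINED`, and `DONE`.

```c
#include <assert.h>
#include <stddef.h>

/* Same numeric values as httpd's OK / DECLINED / DONE. */
#define MODEL_OK        0
#define MODEL_DECLINED (-1)
#define MODEL_DONE     (-2)

/* A hook bumps the call counter and reports a status. */
typedef int (*model_hook_fn)(int *calls);

static int model_hook_ok(int *calls)       { (*calls)++; return MODEL_OK; }
static int model_hook_declined(int *calls) { (*calls)++; return MODEL_DECLINED; }
static int model_hook_done(int *calls)     { (*calls)++; return MODEL_DONE; }

/* Run every hook in order; any status other than OK or DECLINED
 * ends the chain immediately and is returned to the caller. */
static int model_run_all(const model_hook_fn *hooks, size_t n, int *calls)
{
    assert(hooks != NULL && calls != NULL);
    for (size_t i = 0; i < n; i++) {
        int rv = hooks[i](calls);
        if (rv != MODEL_OK && rv != MODEL_DECLINED)
            return rv;
    }
    return MODEL_OK;
}
```

With a first hook that returns `MODEL_OK`, all later hooks run. Swap it for one that returns `MODEL_DONE` and the loop exits after a single call, which is exactly what would starve the other modules' `pre_connection` hooks.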
### 4. The `fuzz_one_input()` Lifecycle
This is the function that proto converters call after building the raw HTTP bytes. Each call creates a fresh connection, runs it through Apache, and tears it down:
```c
int fuzz_one_input(const char *data, size_t size)
{
    apr_pool_t *ptrans;
    conn_rec *c;

    // Set global input buffer (read by the input filter)
    g_input_data = (char *)data;
    g_input_size = size;

    // Create a transaction pool (destroyed after this request)
    apr_pool_create(&ptrans, g_pconf);

    // Build a conn_rec with fake loopback addresses
    // (abridged: only the fields used by the hooks in this document are shown)
    c = apr_pcalloc(ptrans, sizeof(*c));
    c->pool = ptrans;
    c->bucket_alloc = apr_bucket_alloc_create(ptrans);
    c->conn_config = ap_create_conn_config(ptrans);
    c->notes = apr_table_make(ptrans, 5);
    c->local_addr = create_fake_sockaddr(ptrans, "127.0.0.1", 80);
    c->client_addr = create_fake_sockaddr(ptrans, "127.0.0.1", 12345);

    // Tag as fuzz client so our hooks intercept it
    apr_table_setn(c->notes, "fuzz_client", "1");

    // Run the full Apache pipeline
    ap_process_connection(c, g_dummy_socket);

    // Cleanup
    apr_pool_destroy(ptrans);
    g_input_data = NULL;
    g_input_size = 0;
    return 0;
}
```
The transaction pool (`ptrans`) is key to performance - destroying it frees all memory allocated during the request in one shot, including bucket allocators, filter contexts, and request data. No individual `free()` calls needed.
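For readers who have not worked with pools before, the toy allocator below shows the idea of freeing a whole request's memory in one call. It is only an illustration under simplifying assumptions: real APR pools also reuse blocks, run registered cleanups, and support subpools, none of which is modeled here, and every `model_` name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every allocation; the union keeps the payload
 * that follows it suitably aligned for any object type. */
union model_block {
    union model_block *next;
    max_align_t align;
};

struct model_pool {
    union model_block *head;   /* singly linked list of live allocations */
};

/* Allocate `size` bytes that live until the pool is destroyed. */
static void *model_palloc(struct model_pool *p, size_t size)
{
    assert(p != NULL);
    union model_block *b = malloc(sizeof(*b) + size);
    if (b == NULL)
        return NULL;
    b->next = p->head;
    p->head = b;
    return b + 1;              /* payload starts right after the header */
}

/* Free everything the pool handed out; returns the number of blocks freed. */
static size_t model_pool_destroy(struct model_pool *p)
{
    assert(p != NULL);
    size_t freed = 0;
    while (p->head != NULL) {
        union model_block *next = p->head->next;
        free(p->head);
        p->head = next;
        freed++;
    }
    return freed;
}
```

Callers never free individual allocations. They allocate as often as they like during a request and call `model_pool_destroy()` once at the end, the same shape as the `apr_pool_create()` / `apr_pool_destroy()` pair in `fuzz_one_input()`.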
(harness-input-filter)=
### 5. Input Filter: Handling Apache's Read Modes
The input filter is the most complex part because Apache's HTTP parser uses multiple read modes. The parser calls {httpd}`ap_get_brigade` with different {httpd}`ap_input_mode_t` flags depending on what it's reading:
```mermaid
flowchart TD
Parser["Apache HTTP Parser"] -->|"AP_MODE_GETLINE
(read headers line-by-line)"| GetLine
Parser -->|"AP_MODE_READBYTES
(read N bytes of body)"| ReadBytes
Parser -->|"AP_MODE_SPECULATIVE
(peek without consuming)"| Speculative
Parser -->|"AP_MODE_EXHAUSTIVE
(drain everything)"| Exhaustive
GetLine["apr_brigade_split_line()
split at newline boundaries"]
ReadBytes["apr_brigade_partition()
move exactly N bytes"]
Speculative["Copy buckets without
removing from internal brigade"]
Exhaustive["Concat entire brigade"]
```
`AP_MODE_GETLINE` is the tricky one - Apache reads headers one line at a time by requesting data up to the next `\n`. If the input filter returns the entire buffer at once, the parser fails with "Invalid whitespace in request" errors. We use `apr_brigade_split_line` to split correctly at line boundaries.
On first read, the filter populates its internal brigade from the global `g_input_data` buffer and appends an EOS bucket. Subsequent reads consume from this internal brigade until it's empty.
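The four modes can be modeled over a flat buffer with a read cursor. The sketch below is not the harness's filter, which works on bucket brigades through `apr_brigade_split_line` and `apr_brigade_partition`; it only illustrates how much data each mode hands back and whether the data is consumed. The `model_` types and function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum model_mode {
    MODEL_GETLINE,       /* up to and including the next '\n' */
    MODEL_READBYTES,     /* at most `want` bytes, consumed */
    MODEL_SPECULATIVE,   /* at most `want` bytes, NOT consumed */
    MODEL_EXHAUSTIVE     /* everything that is left */
};

struct model_input {
    const char *data;
    size_t size;
    size_t pos;          /* read cursor into data */
};

/* Point *out at the next chunk and return its length (0 at end of input). */
static size_t model_read(struct model_input *in, enum model_mode mode,
                         size_t want, const char **out)
{
    assert(in != NULL && out != NULL && in->pos <= in->size);
    size_t avail = in->size - in->pos;
    const char *p = in->data + in->pos;
    size_t n = 0;

    *out = p;
    if (avail == 0)
        return 0;

    switch (mode) {
    case MODEL_GETLINE: {
        const char *nl = memchr(p, '\n', avail);
        n = (nl != NULL) ? (size_t)(nl - p) + 1 : avail;
        break;
    }
    case MODEL_READBYTES:
    case MODEL_SPECULATIVE:
        n = (want < avail) ? want : avail;
        break;
    case MODEL_EXHAUSTIVE:
        n = avail;
        break;
    }

    if (mode != MODEL_SPECULATIVE)
        in->pos += n;    /* a speculative read leaves the cursor alone */
    return n;
}
```

Calling `model_read()` with `MODEL_GETLINE` in a loop returns one header line per call, which is the behavior the parser depends on. Returning the whole buffer for a `GETLINE` request is the mistake described above.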
(harness-output-filter)=
### 6. Output Filter
The output filter iterates over the response bucket brigade. When the harness is built with `LIBFUZZER` defined, output is discarded (we're looking for crashes, not checking responses). In all other builds, the filter writes the data to stdout for debugging:
```c
rv = apr_bucket_read(b, &data, &len, APR_BLOCK_READ);
#if !defined(LIBFUZZER)
if (rv == APR_SUCCESS && len > 0)
    fwrite(data, 1, len, stdout);
#endif
```
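The surrounding loop is not shown in the excerpt, so the following is only a guess at its general shape, written against a stand-in linked list instead of a real `apr_bucket_brigade`. `struct model_bucket` and `model_drain()` are hypothetical; the sketch assumes that zero-length buckets represent metadata and carry nothing to write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for an APR bucket: a data pointer, a length, and a next link. */
struct model_bucket {
    const char *data;
    size_t len;
    const struct model_bucket *next;
};

/* Walk the chain and return the number of payload bytes seen.
 * Bytes are written to `sink` only in non-LIBFUZZER builds. */
static size_t model_drain(const struct model_bucket *b, FILE *sink)
{
    size_t total = 0;
    for (; b != NULL; b = b->next) {
        assert(b->len == 0 || b->data != NULL);
        if (b->len == 0)
            continue;               /* metadata bucket: nothing to write */
        total += b->len;
#if !defined(LIBFUZZER)
        if (sink != NULL)
            fwrite(b->data, 1, b->len, sink);
#else
        (void)sink;                 /* fuzzing build: discard the output */
#endif
    }
    return total;
}
```

Passing `stdout` as the sink reproduces the debugging behavior; passing `NULL` (or building with `LIBFUZZER` defined) walks the buckets without writing anything.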
### 7. Backend Mocking (fuzz_backend.c)
For harnesses that fuzz proxy modules (like `mod_fuzzy_proto_uwsgi`), we need to mock the backend server response. `fuzz_backend.c` provides this - it hooks `pre_connection` for non-fuzz-client connections (the proxy backend side) and serves a pre-prepared response buffer instead of connecting to a real upstream.
The harness enables this by setting `fuzz_extra_hooks` to register the backend hooks:
```c
fuzz_extra_hooks = apatchy_register_backend_hooks;
```
### 8. Coverage-Safe Exit
`fuzz_exit()` handles a subtle problem: LLVM coverage data (`.profraw` files) is normally written via `atexit` handlers, but we use `_exit()` instead of `exit()` to avoid deadlocking on `mod_watchdog` threads that Apache spawns. So we manually call `__llvm_profile_write_file()` before `_exit()`:
```c
void fuzz_exit(int status)
{
    fflush(stdout);
    if (__llvm_profile_write_file)
        __llvm_profile_write_file();
    _exit(status);
}
```
The `__llvm_profile_write_file` symbol is a weak reference - it resolves to the real function in coverage builds and stays NULL otherwise.
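The weak-reference pattern can be tried in isolation. The sketch below declares a hypothetical function, `model_profile_write_file`, as weak and never defines it, so its address resolves to NULL at link time. It relies on the GCC/Clang `__attribute__((weak))` extension and assumes an ELF platform such as Linux; other toolchains may treat undefined weak symbols differently.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical hook, declared weak and deliberately never defined here,
 * so the linker leaves its address as NULL. */
extern int model_profile_write_file(void) __attribute__((weak));

/* Flush `out`, then call the hook only if it was actually linked in.
 * Returns 1 if the hook ran, 0 if the symbol resolved to NULL. */
static int model_flush_before_exit(FILE *out)
{
    assert(out != NULL);
    fflush(out);
    if (model_profile_write_file) {
        (void)model_profile_write_file();
        return 1;
    }
    return 0;
}
```

In a build that links a definition of the hook, the same call would run it and return 1. This mirrors how `fuzz_exit()` behaves in coverage builds versus plain builds.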