In-memory fuzzing with DynamoRIO
I’m currently playing around with the DynamoRIO binary instrumentation framework. It is similar in concept to Valgrind but with a few crucial differences. First of all, it doesn’t really use an intermediate representation like VEX/Ucode. Instead you work on the actual decoded instructions, with a whole array of convenience functions to access opcodes, operands, jump targets etc. The downside is that you lose the abstractions provided by Valgrind: instead of just dealing with Put/Get, you have to consider every instruction that would be abstracted to these two VEX operations. The upside is that you don’t suffer the incredible slowdown that Valgrind inevitably brings, and it is cross-platform. Also, convenience functions like instr_is_call(), instr_is_mov() etc. mean your analysis code isn’t that much more verbose than the Valgrind equivalent.
(You can get DynamoRIO from here)
This post is primarily going to be about using the syscall hooks provided by DynamoRIO. The first idea to pop into my head when I saw these was ‘fuzzer’, so I proceeded to build one. It turns out to be incredibly simple and the final code can be found in the verification wiki.
The first action we have to take is to register our handler functions. DynamoRIO provides hooks for a number of events including handling syscalls, basic blocks, signals and thread creation/destruction. The following code registers our syscall handlers:
dr_register_filter_syscall_event(event_filter_syscall);
dr_register_pre_syscall_event(event_pre_syscall);
dr_register_post_syscall_event(event_post_syscall);
The signatures of these handlers are:
static bool event_filter_syscall(void *drcontext, int sysnum);
static bool event_pre_syscall(void *drcontext, int sysnum);
static void event_post_syscall(void *drcontext, int sysnum);
The event_filter_syscall hook is used to filter out syscalls we’re not interested in. If it returns true then the syscall is passed on to event_pre_syscall. You’ll notice that in event_pre_syscall there is also a check on the syscall number. This is because the filter is not guaranteed to run for every syscall. It may not be possible for DynamoRIO to determine in advance what the syscall number is, and in that case the filter is skipped and the syscall reaches the pre-syscall handler anyway. The interesting part of event_pre_syscall is the following:
read_fd = dr_syscall_get_param(drcontext, 0);
read_buf = (void*)dr_syscall_get_param(drcontext, 1);
read_count = dr_syscall_get_param(drcontext, 2);
DynamoRIO provides a number of functions for working with syscalls. These can be found in dr_tools.h. We can get and set syscall parameters, get and set the result of a syscall, and also invoke other syscalls. The comments in the code are fairly detailed so I won’t explain any further. DynamoRIO has no reliable way of telling how many parameters are passed to each syscall, so you’ll have to check the relevant man pages for that information. The final point of interest in the pre-syscall handler is the return statement. Returning false at this point means the syscall isn’t actually executed. Using this you could emulate failing system calls and other possible sources of error/interest. In this case we return true, having harvested the information we need (read_buf etc).
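To make the filter/pre-syscall split concrete, here is a minimal standalone sketch of the same logic with DynamoRIO taken out so it compiles on its own. Everything in it is an assumption made for illustration: wants_syscall and on_pre_syscall are hypothetical stand-ins for event_filter_syscall and event_pre_syscall, the params array plays the role of dr_syscall_get_param(drcontext, i), and struct read_args is just one way to carry read_fd, read_buf and read_count over to the post-syscall handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h> /* SYS_read, SYS_write */

/* Arguments of the pending read(), saved before the syscall runs so the
 * post-syscall handler knows which buffer it may mutate. */
struct read_args {
    bool   valid;  /* false when the current syscall is not a read */
    int    fd;
    void  *buf;
    size_t count;
};

/* Hypothetical stand-in for event_filter_syscall: only read() interests us. */
static bool wants_syscall(int sysnum)
{
    return sysnum == SYS_read;
}

/* Hypothetical stand-in for event_pre_syscall. params[i] plays the role of
 * dr_syscall_get_param(drcontext, i). Returns true to let the syscall run;
 * returning false would mean "skip the syscall". */
static bool on_pre_syscall(int sysnum, const uintptr_t params[3],
                           struct read_args *out)
{
    assert(params != NULL && out != NULL);

    out->valid = false;
    /* Re-check the number here: the filter may have been bypassed. */
    if (!wants_syscall(sysnum))
        return true;

    out->fd    = (int)params[0];
    out->buf   = (void *)params[1];
    out->count = (size_t)params[2];
    out->valid = true;
    return true;
}
```

In a real client the saved arguments would more likely live in globals or per-thread storage, since the hook signatures leave no room for an extra out-parameter.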
The actual fuzzing takes place in the event_post_syscall function. Here we get the result of the syscall with dr_syscall_get_result, which tells us how much data has been read. The fuzzing code itself is fairly uninteresting as it simply flips bytes using random data from /dev/urandom. In the midst of this code we can see the use of dr_syscall_set_result, which is called once we have decided to change the number of bytes in the buffer pointed to by read_buf.
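For illustration, the byte-flipping step could look something like the sketch below. This is not the code from the wiki: fuzz_buffer, its one_in parameter and the XOR-based mutation are assumptions made for the example, and it only mutates the buffer in place without modelling the change to the syscall result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Mutate roughly one byte in `one_in` of buf[0..len), drawing randomness from
 * `rnd` (for example a FILE* opened on /dev/urandom). Returns the number of
 * bytes changed. The choice is made from a single random byte, so values of
 * one_in above 256 behave like 256. */
static size_t fuzz_buffer(unsigned char *buf, size_t len, FILE *rnd,
                          unsigned one_in)
{
    assert(buf != NULL && rnd != NULL && one_in > 0);

    size_t changed = 0;
    for (size_t i = 0; i < len; i++) {
        int pick = fgetc(rnd);
        if (pick == EOF)
            break;              /* random source exhausted: stop early */
        if ((unsigned)pick % one_in != 0)
            continue;           /* leave this byte alone */

        int mask = fgetc(rnd);
        if (mask == EOF)
            break;
        /* OR-ing in bit 0 keeps the mask non-zero, so the XOR always
         * changes the byte. */
        buf[i] ^= (unsigned char)(mask | 1);
        changed++;
    }
    return changed;
}
```

In the post-syscall handler this would be called with the saved read_buf and the byte count obtained from dr_syscall_get_result, and only when that count is positive, since a failed or empty read leaves nothing to mutate.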
As mentioned in the code comments, turning this into a useful fuzzer would require some extra work. Primarily it would be necessary to record the fuzz data used. The easiest way to do this would probably be to record the fuzz buffer on every syscall and then use the signal handler event_signal to write out the buffer to a file after a crash. Another consideration is that we really only want to fuzz certain read syscalls, because only some file descriptors will be under our control and thus exploitable. To do this we would need a map of file descriptors to file names. This is relatively simple as we can also instrument the open/close syscalls and build our map that way.
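The descriptor-to-filename map described above can be sketched as a small fixed-size table. This is a standalone illustration under stated assumptions rather than part of the original fuzzer: the function names, the MAX_FDS and MAX_PATH_LEN limits, and the exact-match policy in should_fuzz are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_FDS      1024  /* assumed upper bound on tracked descriptors */
#define MAX_PATH_LEN 256   /* longer paths are truncated */

/* fd_paths[fd] holds the path opened on fd; "" means "not tracked". */
static char fd_paths[MAX_FDS][MAX_PATH_LEN];

/* Call after a successful open(): remember which file fd refers to. */
static void fd_track_open(int fd, const char *path)
{
    assert(path != NULL);
    if (fd < 0 || fd >= MAX_FDS)
        return;
    snprintf(fd_paths[fd], MAX_PATH_LEN, "%s", path);
}

/* Call after a successful close(): the number may be reused later. */
static void fd_track_close(int fd)
{
    if (fd >= 0 && fd < MAX_FDS)
        fd_paths[fd][0] = '\0';
}

/* Returns the recorded path, or NULL if fd is not tracked. */
static const char *fd_lookup(int fd)
{
    if (fd < 0 || fd >= MAX_FDS || fd_paths[fd][0] == '\0')
        return NULL;
    return fd_paths[fd];
}

/* Fuzz only reads that come from the file we control. */
static int should_fuzz(int fd, const char *target_path)
{
    const char *path = fd_lookup(fd);
    return path != NULL && strcmp(path, target_path) == 0;
}
```

The pre-syscall handler for read would then consult should_fuzz(read_fd, ...) before recording the buffer. The sketch ignores dup/dup2, sockets and multiple threads, all of which a real tool would need to consider.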