In-memory fuzzing with DynamoRIO

I’m currently playing around with the DynamoRIO binary instrumentation framework. It is similar in concept to Valgrind but with a few crucial differences. First of all, it doesn’t really use an intermediate representation like VEX/UCode. Instead you work on the actual decoded instructions, with a whole array of convenience functions to access opcodes, operands, jump targets etc. The downside is that you lose the abstractions provided by Valgrind, so instead of just having to deal with Put/Get you have to consider all the instructions that can be abstracted to these two VEX operators. The upside is that you don’t suffer the incredible slowdown that Valgrind inevitably brings, and it is cross platform. Also, convenience functions like instr_is_call(), instr_is_mov() etc. mean your analysis code isn’t that much more verbose than the Valgrind equivalent.

(You can get DynamoRIO from here)

This post is primarily going to be about using the syscall hooks provided by DynamoRIO. The first idea to pop into my head when I saw these was ‘fuzzer’, so I proceeded to build one. It turns out to be incredibly simple and the final code can be found in the verification wiki.

The first action we have to take is to register our handler functions. DynamoRIO provides hooks for a number of events including handling syscalls, basic blocks, signals and thread creation/destruction. The following code registers our syscall handlers:

dr_register_filter_syscall_event(event_filter_syscall);
dr_register_pre_syscall_event(event_pre_syscall);
dr_register_post_syscall_event(event_post_syscall);

the function signatures of which are:

static bool event_filter_syscall(void *drcontext, int sysnum);
static bool event_pre_syscall(void *drcontext, int sysnum);
static void event_post_syscall(void *drcontext, int sysnum);

The event_filter_syscall hook is used to filter out syscalls we’re not interested in. If it returns true then the syscall is passed on to event_pre_syscall. You’ll notice that event_pre_syscall also checks the syscall number. This is because the filter is not guaranteed to run for every syscall: it may not be possible for DynamoRIO to determine in advance what the syscall number is, and in that case the filter is skipped and the pre-syscall handler is called anyway. The interesting part of event_pre_syscall is the following:

read_fd = dr_syscall_get_param(drcontext, 0);
read_buf = (void*)dr_syscall_get_param(drcontext, 1);
read_count = dr_syscall_get_param(drcontext, 2);

DynamoRIO provides a number of functions for working with syscalls. These can be found in dr_tools.h. We can get and set syscall parameters, get/set the result of syscalls and also invoke other syscalls. The comments are fairly detailed in the code so I won’t explain any further. DynamoRIO has no reliable way of telling how many parameters are passed to each syscall, so you’ll have to check the relevant man pages for that information. The final point of interest in the pre-syscall handler is the return statement. Returning false at this point would mean the syscall isn’t actually called. Using this you could emulate failing system calls and other possible sources of error/interest. In this case we return true, having harvested the information we need (read_buf etc.).

The actual fuzzing takes place in the event_post_syscall function. Here we get the result of the syscall with dr_syscall_get_result, which tells us how much data has been read. The fuzzing code itself is fairly uninteresting as it simply flips bytes using /dev/urandom. In the midst of this code we can see the use of dr_syscall_set_result, which is called after we have decided to change the number of bytes in the buffer pointed to by read_buf.
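As a rough illustration of the byte-flipping step, here is a minimal standalone sketch. It is not the code from the wiki: fuzz_buffer, xorshift32 and the rate parameter are hypothetical names, and a seeded PRNG is swapped in for /dev/urandom so that a run can be replayed from its seed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a tiny xorshift32 PRNG so a run can be replayed
 * from its seed. The code described in the post reads /dev/urandom instead. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Corrupt, on average, one byte in every `rate` of the `len` bytes that the
 * read returned. Returns the number of bytes that were changed. */
static size_t fuzz_buffer(unsigned char *buf, size_t len,
                          uint32_t *seed, uint32_t rate)
{
    size_t changed = 0;

    assert(buf != NULL && seed != NULL && *seed != 0 && rate != 0);
    for (size_t i = 0; i < len; i++) {
        if (xorshift32(seed) % rate == 0) {
            /* XOR with a non-zero mask so the byte always changes. */
            unsigned char mask = (unsigned char)(xorshift32(seed) % 255 + 1);
            buf[i] ^= mask;
            changed++;
        }
    }
    return changed;
}
```

In a post-syscall handler this would be called with read_buf and the byte count obtained from dr_syscall_get_result. Logging the seed is then enough to reproduce the mutation later.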

As mentioned in the code comments, modifying this to a useful fuzzer would require some extra work. Primarily it would be necessary to record the fuzz data used. The easiest way to do this would probably be to record the fuzz buffer on every syscall and then use the signal handler event_signal to write out the buffer to a file after a crash. Another consideration is that we really only want to fuzz certain read syscalls because only some file descriptors will be under our control and thus exploitable. To do this we would need a map of file descriptors to file names. This is relatively simple as we can also instrument the open/close syscalls and build our map that way.
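The descriptor map could look something like the sketch below. This is only an assumption about one way to do it: track_open, track_close, should_fuzz and the fixed-size table are hypothetical, and the sketch ignores threads and calls such as dup() that also create descriptors.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FDS 1024
#define MAX_PATH_LEN 256

/* fd_paths[fd] holds the path passed to open(); an empty string means
 * the descriptor is not currently tracked. */
static char fd_paths[MAX_FDS][MAX_PATH_LEN];

/* Call after a successful open(): remember which file the fd refers to. */
static void track_open(int fd, const char *path)
{
    assert(path != NULL);
    if (fd < 0 || fd >= MAX_FDS)
        return;
    strncpy(fd_paths[fd], path, MAX_PATH_LEN - 1);
    fd_paths[fd][MAX_PATH_LEN - 1] = '\0';
}

/* Call after close(): the fd number may later be reused for another file. */
static void track_close(int fd)
{
    if (fd >= 0 && fd < MAX_FDS)
        fd_paths[fd][0] = '\0';
}

/* Returns 1 only for reads from the file we control, 0 otherwise. */
static int should_fuzz(int fd, const char *target_path)
{
    assert(target_path != NULL);
    if (fd < 0 || fd >= MAX_FDS)
        return 0;
    return fd_paths[fd][0] != '\0' && strcmp(fd_paths[fd], target_path) == 0;
}
```

The open and close handlers would call track_open and track_close, and the read handler would check should_fuzz before touching read_buf.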

MS releases tool for determining exploitability, Slashdot gets it completely wrong

MS released an open source tool called !exploitable at CanSecWest. You can get the tool from here and the conference slides from here.

The point of the tool is that given a bundle of program crashes it will group them into those that result from unique vulnerabilities and it will also rate them in terms of how ‘exploitable’ they are. It is not a tool for finding exploits, as this Slashdot article title might have you believe.

Confusion aside this has the potential to be an incredibly useful tool for helping people maximise the output of bug hunting sessions.

Meet the Practical Software Verification wiki

There seems to be quite a lot of interest in the topic of software verification tools these days in the hacking/security industries so I’ve created a wiki that will hopefully become a repository for all the useful research on the topic out there. The aim is to collect material that can support the building of tools that find potentially dangerous bugs.

Building tools based on verification theory can be a bit more difficult than your average fuzzer but you gain a number of benefits including guarantees on the completeness of your test, truly directed searching and potentially much higher code coverage. Some of the best tools actually combine static analysis and fuzzing to get the best of both worlds by using static analysis to direct the fuzzer and then using the fuzzer to find the bugs. The techniques used can also be extended to deal with a variety of other problems including exploit generation, intrusion detection and…erm… solving sudoku puzzles.

Anyways, the wiki is a little content bare at the moment so I encourage people to contribute as much as possible. There’s also an IRC channel over on irc.smashthestack.org #formal.

Check it out! http://www.unprotectedhex.com/psv

(Thanks to adc for repeatedly reminding me to set up the wiki and for a number of content contributions)

OSCON talk accepted

I heard back today from OSCON who informed me that my talk has been accepted. (So did a lot of other people apparently). I’m speaking on the 23rd of July at 13:45 and the title is “If you don’t own it, we will: Find vulnerabilities in your code with fuzzing”.

I’m hoping to give a pretty general overview of automated vulnerability detection techniques, with a focus on fuzzing, followed by an extensive tutorial type talk on finding vulnerabilities in real world programs. My aim is to give attendees enough background information and practical demonstrations that they can start effectively testing their code for security vulnerabilities. The talk won’t contain any new security research but the aim is to distill all the information on fuzzing from the security world into a practical, hands-on talk that will be useful for a developer audience.

Should be fun!

DEFCON 16 video

In the time since I posted the links to the audio and slides, a version of the video recording synced with the slides has gone up. Bathe in the reflected glow of my pasty white features.

When exploit generation becomes vulnerability detection

As I begin to think more about the topic of automatically generating exploits, the problem seems to fall into neat and separate sub-problems. The main dividing line between my thoughts at the moment splits the problem into two distinct camps. I will get to these but first I need to give some background on how my thought process developed to this point.

Consider an input to a program that results in a crash. There are three concepts in play here. The first is the concept of a vulnerability: some design flaw or programmer error that can be triggered by a certain input. To be able to talk about distinct vulnerabilities I will use the second concept, that of a vulnerability point. This vulnerability point is unique per vulnerability and is the point where execution is hijacked, e.g. a ret instruction that loads an attacker-controlled value into the EIP register. If two different instructions in a program can result in an execution hijack then I consider them to be separate vulnerabilities. The third concept is that of a vulnerability trace. This is a trace through the program (it could be a sequence of instructions or basic blocks) that leads to the vulnerability point. Each vulnerability point could be reached via a potentially infinite (in the presence of loops) number of state sequences, and therefore there may be many vulnerability traces associated with a single vulnerability point.
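To make the relationship between the last two concepts concrete, here is a small sketch. The struct layout, the names and the fixed trace length are all hypothetical. The only point it illustrates is that many traces can share one vulnerability point, and that distinct vulnerabilities are counted by point and not by trace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACE_LEN 16

/* One crashing run: the basic-block addresses executed (the trace) and the
 * address of the instruction where execution was hijacked (the point). */
struct vuln_trace {
    uintptr_t blocks[MAX_TRACE_LEN];
    size_t    num_blocks;
    uintptr_t vuln_point;
};

/* Count distinct vulnerabilities: traces that share a vulnerability point
 * count once, however different the paths leading to it are. */
static size_t count_vulnerabilities(const struct vuln_trace *traces, size_t n)
{
    size_t distinct = 0;

    assert(traces != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (traces[j].vuln_point == traces[i].vuln_point) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            distinct++;
    }
    return distinct;
}
```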

All this is necessary when you begin to consider what the possible outcomes could be when you attempt to produce an input that exploits a particular vulnerability.

I will assume that we have discovered a vulnerability and that we have followed a specific path through the program to the vulnerability point i.e. we have a valid trace through the program. I will also assume that we have some input buffer to the program that resulted in this trace. It may be the output of a fuzzing session or something generated by a symbolic analysis tool.

There are three possibilities now:

- The vulnerability may not be exploitable (e.g. many null pointer dereferences)
- The vulnerability may be exploitable via the trace we have discovered (e.g. a nice stack smash where we can craft our input data to do everything required to avoid ASLR and co. and execute our shellcode and still take the same path through the program)
- The vulnerability may be exploitable but some deviation is required from the trace we have discovered that eventually comes back to the vulnerability point (e.g. in the case of a single byte overwrite it may be possible to take a different path and overwrite more data)

It becomes clear that the latter two problems are really entirely separate issues. In the second case, we can use a variety of techniques from software verification (that I’ll go into in a later post) to gather constraints and solve them for a satisfying input that reaches the vulnerability point and then hijacks execution. There are some details I’m glossing over here, such as how to describe the extra constraints required for control hijacking, but we can ignore that for now. This case is what is handled by existing commercial automatic exploit generation systems. They proceed by identifying which bytes from the input end up overwriting the control structures and then replacing them with their desired redirect address. This is very much hit-and-hope and highly error prone.

Even in this situation we can do far better than such tools. Firstly, if an output is generated for an exploit it is ‘guaranteed’ to work (assuming there is no non-determinism in the code or bugs in the exploit generation software, both of which will occur but not often enough to cause concern) as by definition it meets all constraints along that trace as well as the constraints required to hijack execution. Secondly, any possible transformations performed on the input data are accounted for. So, if your shellcode has a sequence of bit transformations done to it then a constraint based tool can output the required input to give your shellcode *after* these transformations which is basically impossible for any tool not tracking the effects of every instruction. Thirdly, we can solve for multiple return addresses and shellcodes (and any other data restrictions you care to specify) all at the one time.
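The second point can be illustrated with a toy example. Everything in it is hypothetical: target_transform stands in for whatever the program does to each input byte, and solve_input is a brute-force search standing in for a real constraint solver. It only works here because the toy transformation treats every byte independently, whereas a real tool would gather constraints along the whole trace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transformation the target applies to each input byte
 * before it lands in the buffer we want to control. */
static uint8_t target_transform(uint8_t in)
{
    uint8_t x = (uint8_t)(in ^ 0x5a);
    x = (uint8_t)((x << 3) | (x >> 5));   /* rotate left by 3 bits */
    return (uint8_t)(x + 0x11);
}

/* Toy "solver": for each desired output byte, search the 256 possible
 * inputs for one satisfying target_transform(in) == desired.
 * Returns 1 on success, 0 if some byte has no solution. */
static int solve_input(const uint8_t *desired, uint8_t *input, size_t len)
{
    assert(desired != NULL && input != NULL);
    for (size_t i = 0; i < len; i++) {
        int found = 0;
        for (unsigned candidate = 0; candidate < 256; candidate++) {
            if (target_transform((uint8_t)candidate) == desired[i]) {
                input[i] = (uint8_t)candidate;
                found = 1;
                break;
            }
        }
        if (!found)
            return 0;
    }
    return 1;
}
```

Given the shellcode bytes as desired, solve_input fills input with the bytes to send so that the program itself produces the shellcode after its transformations.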

This brings us to the third case, for which two possible solutions occur to me. The first is the easiest: we simply ignore the vulnerability and return no exploit. Arguably, creating an exploit in this case is outside the bounds of the current problem domain. The tool is designed to generate exploits in the case where an exploit is possible for the combined trace and vulnerability point. If this is the definition we use then that combination of vulnerability point and trace is not in fact exploitable and it is correct not to return an exploit.

A practical motivation for this attitude is that an exploit creation tool should not also be a vulnerability detection tool. Bundling the two together is mixing goals and will probably result in compromises in both tools. If I trust my vulnerability detection tool then throwing out this vulnerability point/trace is perfectly acceptable: if an exploitable trace for this vulnerability point does indeed exist, the vulnerability detection tool will eventually provide it and thus no exploits will be missed. The problem is that, for the most part, you can’t trust vulnerability detection tools. Take fuzzing for example: you might hit the same vulnerability point again under different conditions but then again you might not, and you have no meaningful way to offer any sort of guarantee. Unless you’re using a vulnerability detection tool that offers some form of soundness/completeness guarantees, ignoring cases that aren’t directly exploitable could mean false negatives and missed opportunities.

Basically, this comes down to a decision where you need to weigh how complete a solution you want versus how much time you have. The ideal solution to this problem would be a tool that creates an exploit for a vulnerability point regardless of the initial trace it is given. To achieve this though we would need to basically implement a fully functional vulnerability detection toolkit to supplement the exploit generation toolkit when it fails. The other end of the scale is described in the previous paragraphs, where we are happy to be guided by our vulnerability detection engine. Given the current time constraints I’m working under I intend to aim for somewhere in between. While I won’t have the time to build a separate verification based vulnerability detection engine I do intend to use a couple of different tricks to maximise the chance of generating an exploit when using the initial trace fails.

I’ll go over these tricks in a later post because I think this one has dragged on long enough by now ;)

The Irish accent does not lend itself to being taken seriously….

The audio of all the talks from DEFCON 16 is now online. Amid the gravitas of the well spoken British and the laid back drawl of the Americans is a speech on VoIP security that sounds like it should be delivered over a pint and a packet of crisps in a smoky pub. That speech is ‘VoIPER: Smashing the VoIP stack while you sleep’ and the presenter is yours truly.

Audio
Slides

I’ve uploaded the slides because the version on the DEFCON website is different to the actual slides I used during the talk (despite my requests for them to be updated :P )