Menu

Using Symbolic Execution To Improve Modern Fuzzing

published 2020-04-28, written by Christian Hartlage

How Symbolic Execution Complements Modern Fuzzing

What is symbolic execution? 

Symbolic execution is a software testing technique that substitutes the normal inputs into a program (e.g. numbers)  through symbolic values (formulae) during the program execution. When program execution branches based on a symbolic value, the system follows both branches (paths) and maintains a path condition for each. Once a path terminates due to a bug, a test case can be generated by solving the current path condition for concrete values.

 

Diagram 1: Symbolic Execution Tree and Input Objects (Source: researchgate.net)

 

Symbolic Execution vs Fuzzing

As opposed to traditional (“dumb”) fuzzers, which generate inputs without taking code structure into account, symbolic execution tools precisely capture the computation of each value. They use solvers at each branch to generate new inputs and thus to provide the precisely calculated input to cover all parts of code.

Though symbolic execution, in theory, can find inputs for any feasible path, it is still rather slow compared to fuzzing and requires a lot of work to set up. There have been attempts to combine fuzzing and symbolic execution, for example, in a tool called Driller. 

Symbolic execution can be visually represented in the form of a flow graph that identifies the decision points associated with each flow.

 

Symbolic Execution Tools

KLEE

KLEE is an open-source code testing instrument that runs on LLVM bitcode, a representation of the program created by the clang compiler. KLEE explores the program and generates test cases to reproduce any crashes it finds. Among the  advantages of KLEE are:

  • Achieving a high code coverage, especially compared to the manual testing effort (on average, more than 90% of code lines were covered during KLEE execution - Source: Stanford.edu).
  • Ability to work directly on the compiled version of the code (mostly no further changes on the code were required to conduct testing).
  • High effectiveness in finding errors and inconsistencies even in heavily-tested code.

However, in practice, this tool also has some limitations:

  • High computational power requirements (common for all symbolic execution tools)
  • Need to include all source code (white-box-testing)
  • Need to use a particular clang version for compatibility reasons
  • Some instructions may be unsupported (e.g. the system may not know the formulae for all computations)

 

Driller

Driller is a concolic execution tool. Concolic execution is a software testing technique that performs symbolic execution (using symbolic input values with sets of expressions, one expression per output variable) with concrete execution (testing on particular inputs) path. The advantage of this approach is that it can achieve high code coverage even in case of complex source code but still maintain a high degree of scalability and speed.

Driller uses selective concolic execution to explore only the paths that are found interesting by the fuzzer and to generate inputs for conditions (branches) that a fuzzer cannot satisfy. In other words, it leverages concolic execution to reach deeper program code but uses a feedback-driven/guided fuzzer to alleviate path explosion, which greatly increases the speed of the testing process. 

Diagram 2: Use Case for Symbolic Execution?

Outlook 

Although Driller marked significant research advances in the field of symbolic execution, it is still a highly specialized tool that requires expert knowledge to set up and run and at the same time uses up a lot of computational resources. So, how can industry profit from the latest research?

Code Intelligence GmbH has recently developed a software solution (CI Fuzz) that combines modern fuzzing technology with concolic execution. The solution does not just generate random inputs but uses machine learning to provide inputs that reach deeper into the code structure. Concolic execution is applied additionally in cases when the fuzzing algorithm reaches its limits.

This allows both to speed up the search for vulnerabilities in the source code and at the same time achieve high code coverage (i.e the share of the source code executed during tests). The biggest advantage of CI Fuzz is the easy and intuitive set-up of the testing software. Security tests based on fuzzing and concolic execution can thus be managed by developers without previous fuzzing or concolic execution experience.

 

What do you expect from the rise of symbolic code execution? Drop a comment!