Tuesday, July 23, 2013





Loop Detection
Peter Silberman
 


Chapter 4
A Different Approach to Loop Detection
The reader has seen how to detect dominators within a CFG and how to use that as a component to find natural loops. The previous chapter described why natural loop detection was flawed when trying to detect irreducible loops. For binary auditing, the tool will need to be able to pick up all loops and then let the user deduce whether or not the loops are interesting. This chapter will introduce the loop algorithm used in the IDA plug-in to detect loops.

To come up with an algorithm robust enough to detect loops in both the irreducible and reducible categories, the author decided to modify the previous definition of a natural loop. The new definition reads "a loop can have multiple entry points and at least one link that creates a cycle." This definition avoids the use of dominators to detect loops in the CFG.
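To make the irreducible case concrete, consider a contrived, hypothetical C++ function (not from the plug-in) whose control flow graph contains a loop with two entry points: the `while` test and a `goto` into the body. No single header node dominates the cycle, which is exactly the case the dominator-based definition misses:

```cpp
// A contrived function whose CFG contains a two-entry loop:
// the loop body can be entered either from the top (the while
// test) or from the side (the goto), so no single header node
// dominates it and natural-loop detection misses the cycle.
int sum_from(int start, bool skip_init) {
    int i = 0, total = 0;
    if (skip_init) {
        i = start;
        goto body;            // second entry into the loop
    }
    while (i < 5) {
body:
        total += i;
        ++i;
    }
    return total;
}
```

The new definition still catches this cycle, because it only asks for a link that closes the loop, not for a dominating header.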

This alternative algorithm works by first making a call to the is_path_to helper's companion, is_reference_to(ea_t to, ea_t ref). The is_reference_to function determines whether there is a reference from the ea_t specified by ref to the parameter to. Within the loop detection algorithm, this check determines whether there is a backedge or link that would complete a loop. This check is done first for speed.

If there is no reference that would complete a loop, there is no reason to call is_path_to, which avoids unnecessary computation. However, if there is a link or backedge, a call to the overloaded function is_path_to(ea_t from, ea_t to) determines whether the nodes being examined can even reach each other. The is_path_to function simulates all possible code execution conditions by following every possible edge to determine whether the flow of execution could ever reach the parameter to when starting at the parameter from. It returns one (true) if there is indeed a path from from to to. When both functions return one, it can be deduced that the nodes are involved in a loop.

4.1 Problems with the new approach
Every algorithm can harbor small problems that keep it far from optimal, and the new approach presented above is no exception.
The algorithm presented above has not been optimized for performance. It runs in O(N^2) time, which carries quite a load once the CFG contains more than 600 or so nodes.
The algorithm is so time consuming because the is_path_to function, which computes all possible paths to and from a given node, was implemented with a Depth First Search (DFS) rather than a Breadth First Search (BFS). Used this way, the Depth First Search is much more expensive than a Breadth First Search, and because of that the algorithm may suffer in some rare cases.
If the reader is interested in how to implement a more efficient algorithm for finding dominators, the reader should check out Advanced Compiler Design and Implementation by Steven S. Muchnick.
It should be noted that future versions of this plug-in will include optimizations to the code, specifically replacing the Depth First Search with a Breadth First Search, along with other small optimizations.
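As a sketch of what the planned BFS replacement might look like: a queue-based reachability check visits each node and edge at most once, so a single query costs O(V + E). The is_path_to_bfs name and the toy Cfg adjacency-list type are assumptions for illustration, not the plug-in's actual code:

```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <set>
#include <vector>

typedef unsigned int ea_t;                       // stand-in for IDA's ea_t
typedef std::map<ea_t, std::vector<ea_t> > Cfg;  // node -> successor edges

// BFS reachability: explores the CFG level by level from `from`,
// stopping as soon as `to` is found. The visited set guarantees
// each node is expanded once, so no path is ever re-enumerated.
bool is_path_to_bfs(const Cfg &cfg, ea_t from, ea_t to) {
    std::set<ea_t> seen;
    std::deque<ea_t> queue(1, from);
    while (!queue.empty()) {
        ea_t cur = queue.front();
        queue.pop_front();
        if (cur == to) return true;
        if (!seen.insert(cur).second) continue;   // already expanded
        Cfg::const_iterator it = cfg.find(cur);
        if (it == cfg.end()) continue;
        for (std::size_t i = 0; i < it->second.size(); ++i)
            queue.push_back(it->second[i]);
    }
    return false;
}
```

Because the visited set is what bounds the work, the same guard could equally be added to a DFS; the queue mainly changes the order of exploration, which helps most when the target tends to sit close to the starting node.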
