Loop Detection
Peter Silberman
Chapter 4
A Different Approach to Loop Detection
The reader has seen how to detect dominators within a CFG and how to use them as a component in finding natural loops. The previous chapter described why natural loop detection is flawed when trying to detect irreducible loops. For binary auditing, the tool needs to pick up all loops and then let the user decide which of them are interesting. This chapter introduces the loop detection algorithm used by the IDA plug-in.
To come up with an algorithm robust enough to detect loops in both the reducible and irreducible categories, the author modified the previous definition of a natural loop. The new definition reads: "a loop can have multiple entry points and at least one link that creates a cycle." This definition avoids the use of dominators to detect loops in the CFG.
This alternative algorithm works by first calling the function is_reference_to(ea_t to, ea_t ref). is_reference_to determines whether there is a reference from the ea_t specified by ref to the parameter to. Within the loop detection algorithm, this check determines whether there is a backedge or link that would complete a loop. This check is done first for speed: if there is no reference that would complete a loop, there is no reason to call is_path_to, which avoids unnecessary computation. However, if there is such a link or backedge, the overloaded function is_path_to(ea_t from, ea_t to) is called to determine whether the nodes being examined can even reach each other. The is_path_to function simulates all possible code execution conditions by following every possible edge to determine whether the flow of execution could ever reach parameter to when starting at parameter from. It returns one (true) if there is indeed a path from from to to. When both functions return true, it can be deduced that these nodes are involved in a loop.
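The two-step check described above can be sketched as follows. The graph representation, the reachability search, and the helper names other than is_reference_to and is_path_to are illustrative assumptions, not the plug-in's actual data structures:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-ins for IDA's ea_t and the plug-in's CFG;
// a real plug-in would walk IDA's cross-references instead.
using ea_t = uint64_t;
using Cfg = std::map<ea_t, std::vector<ea_t>>; // node -> successors

// Does block `ref` contain a direct reference (edge) to block `to`?
bool is_reference_to(const Cfg& cfg, ea_t to, ea_t ref) {
    auto it = cfg.find(ref);
    if (it == cfg.end()) return false;
    for (ea_t succ : it->second)
        if (succ == to) return true;
    return false;
}

// Depth-first search: can execution flow from `from` ever reach `to`?
bool is_path_to(const Cfg& cfg, ea_t from, ea_t to) {
    std::set<ea_t> seen;
    std::vector<ea_t> stack{from};
    while (!stack.empty()) {
        ea_t cur = stack.back();
        stack.pop_back();
        if (cur == to) return true;          // reached the target
        if (!seen.insert(cur).second) continue; // already visited
        auto it = cfg.find(cur);
        if (it == cfg.end()) continue;
        for (ea_t succ : it->second)
            stack.push_back(succ);
    }
    return false;
}

// The cheap reference check runs first; the expensive path search
// runs only when an edge b -> a could actually close a cycle.
bool in_loop(const Cfg& cfg, ea_t a, ea_t b) {
    return is_reference_to(cfg, a, b) && is_path_to(cfg, a, b);
}
```

Because the definition never consults dominators, a node with several entry edges is handled the same way as a single-entry loop header.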
4.1 Problems with the New Approach
Every algorithm can have small problems that keep it from being optimal, and the new approach presented above is no exception.
The algorithm presented above has not been optimized for performance. It runs in O(N^2) time, which becomes a real burden once a function has more than roughly 600 nodes.
The algorithm is so time consuming because the is_path_to function, which computes all possible paths to and from a given node, was implemented with a Depth First Search (DFS) rather than a Breadth First Search (BFS). Depth First Search is much more expensive than Breadth First Search, and because of that the algorithm may suffer in some rare cases.
Readers interested in implementing a more efficient algorithm for finding dominators should consult Advanced Compiler Design and Implementation by Steven S. Muchnick.
It should be noted that future versions of this plug-in will include optimizations to the code, specifically a Breadth First Search implementation in place of the Depth First Search, along with other small optimizations.
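The planned BFS replacement could look like the following sketch. The types and the function name are illustrative assumptions mirroring the DFS version, not code from the plug-in:

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <set>
#include <vector>

// Illustrative types, not the plug-in's actual definitions.
using ea_t = uint64_t;
using Cfg = std::map<ea_t, std::vector<ea_t>>; // node -> successors

// Breadth-first variant of is_path_to: explores the CFG level by
// level with a FIFO queue, visiting each node at most once.
bool is_path_to_bfs(const Cfg& cfg, ea_t from, ea_t to) {
    std::set<ea_t> seen{from};
    std::deque<ea_t> queue{from};
    while (!queue.empty()) {
        ea_t cur = queue.front();
        queue.pop_front();
        if (cur == to) return true;   // target reached
        auto it = cfg.find(cur);
        if (it == cfg.end()) continue;
        for (ea_t succ : it->second)
            if (seen.insert(succ).second) // enqueue unvisited only
                queue.push_back(succ);
    }
    return false;
}
```

Since backedge targets tend to sit close to the nodes that branch to them, a level-by-level search can find the short path and stop early, rather than first descending deep down an unrelated branch of the CFG.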