Loop Detection
Peter Silberman
Chapter 3
Algorithms
Used to Detect Loops
A lot of research has been done on the subject of loop detection.
The research, however, was not done for the purpose of finding and exploiting
vulnerabilities that exist inside of loops. Most research has been done with an
interest in recognizing and optimizing loops (A
good article about loop optimization and compiler optimization is http://www.cs.princeton.edu/courses/archive/spring03/cs320/notes/loops.pdf.
Research on the optimization of loops has led scientists to
classify various types of loops. There are two distinct categories to which any
loop will belong. Either the loop will be an irreducible loop
(Irreducible loops are defined as ”loops with multiple entry
[points]” (http://portal.acm.org/citation.cfm?id=236114.236115)
or a reducible loop (3Reducible
loops are defined as ”loops with one entry [point]” (http://portal.acm.org/citation.cfm?id=236114.236115). Given that there are two different distinct categories, it
stands to reason that the two types of loops are detected in different
fashions. Two popular papers on loop detection are Interval Finding Algorithm[1] and Identifying Loops Using DJ Graphs[2]. This document will
cover the most widely accepted theory on loop detection.
3.1 Natural Loop Detection
One of the most well known algorithms for loop detection is
demonstrated in the book Compilers Principles, Techniques, and
Tools by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman. In this
algorithm, the authors use a technique that consists of two components
to find natural loops (A natural loop ”Has a single entry
point. The header dominates all nodes in the loop.” (http://www-2.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15745-s03/public/lectures/L7
handouts.pdf all loops are not natural loops).
The first
component of natural loop detection is to build a dominator tree out of the
control flow graph (CFG). A dominator can be found when all paths to a given
node have to go through another node. A control flow graph is essentially a map
of code execution with directional information. The algorithm in the
book
calls for the finding of all the dominators in a CFG. Let’s look at the actual
algorithm.
Starting
from the entry node, the algorithm needs to check if there is a path to the
slave from the entry node. This path has to avoid the master node.
If it is
possible to get to the slave node without touching the master node, it can be
determined that the master node does not dominate the slave node.
If it is
not possible to get to the slave node, it is determined that the master node
does dominate the slave. To implement this routine the user would call the is path
to(ea t from, ea t to, ea t avoid) function included in loop
detection.cpp. This function will essentially check to see if there is a path from
the parameter from that can get to the parameter to, and
will avoid the node specified in avoid.
As the
reader can see from Figure 1, there is a loop in this CFG. Let B to C to D be
the path of nodes that create a loop, it will be represented as B->C->D. There
is also another loop from nodes B->D. Using
the algorithm described above it is possible to verify which of these nodes is
involved in the natural loop. The first question to ask is if the flow of the
program can get from A to D while avoidingB. As the reader can see, it is
impossible in this case to get to D avoiding B.
As such,
a call to the is path to function will tell the user that
B Dominates D. This can be represented as B Dom D, and B Dom C. This is
due to the fact that there is no way to reach C or D without going through B.
One question that might be asked is how exactly does this demonstrate a loop?
The answer is that, in fact, it doesn’t. The second component of the natural
loop detection checks to see if there is a link, or backedge, from D to B that
would allow the flow of the program to return to node B to complete the loop.
In the case of B->D there exists a backedge that does
complete the loop.
3.2 Problems with
Natural Loop Detection
There is
a very big problem with natural loops. The problem is with the natural loop
definition which is “a single entry point whose header dominates all the nodes
in the loop”. Natural loop detection does not deal with irreducible loops, as
defined previously.
As the
reader can see both B and D are entry points into C. Also neither D nor B dominates
C. This throws a huge wrench into the algorithm and makes it only able to pick
up loops that fall under the specification of a natural loop or reducible loop (5It is
important to note that it is next that it is next to impossible to reproduce ??F:fig2) without
using nested for loops and goto’s statements. For this reason it is rare that
the reader will see an example of this in a binary. However, it is possible
therefore the author thought it important to mention).
Tidak ada komentar:
Posting Komentar