Minggu, 08 September 2013

Loop Detection - C5-6 Loop Detection Using IDA Plug-ins




Loop Detection
Peter Silberman
 


Chapter 5
Loop Detection Using IDA Plug-ins
In every algorithm and theory there exists small problems. It is important to understand the algorithm presented The plug-in described in this document uses the Function Analyzer Class (function analyzer) that was developed by Pedram Amini (http://labs.idefense.com) as the base class. The Loop Detection (loop detection) class uses inheritance to glean its attributes from Function Analyzer. The reason inheritance is used is primarily for ease of development. Inheritance is also used so that instead of having to re-add functions to a new version of Function Analyzer, the user only has to replace the old file. The final reason inheritance is used is for code conformity, which is accomplished by creating virtual functions.

These virtual functions allow the user to override methods that are implemented in the Function Analyzer. This means that if a user understands the structure of function analyzer, they should not have a hard time understanding loop detections structure.

5.1 Plug-in Usage
To best utilize this plug-in the user needs to understand its features and capabilities.
When a user runs the plug-in they will be prompted with a window that is shown in figure 5.1. Each of the options shown in figure 5.1 are described individually.

1. Graph Loop
This feature will visualize the loops, marking the entry of a loop with green border, the exit of a loop with a red border and a loop node with a yellow border.

2. Highlight Function Calls
This option allows the user to highlight the background of any function call made within the loop. The highlighting is done within IDA View.

3. Output Stack Information
This is a feature that is only enabled with the graph loop option. When this option is enabled the graph will contain information about the stack of the function including the variables name, whether or not it is an argument, and the size of the variable. This option is a great feature for static auditing.

4. Highlight Code
This option is very similar to Highlight Function except instead of just highlighting function calls within loops it will highlight all the code that is executed within the loops. This makes it easier to read the loops in IDA View

5. Verbose Output
This feature allows the user to see how the program is working and will give more information about what the plug-in is doing.

6. Auto Commenting
This option adds comments to loops nodes, such as where the loop begins, where it exits, and other useful information so that the user doesn’t have to continually look at the graph.

7. All Loops Highlighting of Functions
This feature will find every loop within the IDA database. It will then highlight any call to any function within a loop. The highlighting is done within the IDA View making navigation of code easier.

8. All Loops Highlighting of Code
This option will find every loop within the database. It will then highlight all segments of code involved in a loop. The highlighting of code will allow for easier navigation of code within the IDA View.

9. Natural Loops
This detection feature allows the user to only see natural loops. It may not pick up all loops but is an educational implementation of the previously discussed algorithm.

10. Recursive Function Calls
This detection option will allow the user to see where recursive function calls are located.

5.2 Known Issues
There a couple of known issues with this plug-in. It does not deal with rep* instructions, nor does it deal with mov** instructions that might result in copied buffers. Future versions will deal with these instructions, but since it is opensourced the user can make changes as they see fit. Another issue is that of “no-interest”. By this the author means detecting loops that aren’t of interest or don’t pose a security risk. These loops, for example, may be just counting loops that don’t write memory. Halvar Flake describes this topic in his talk that was given at Blackhat Windows 2004[3]. Feel free to read his paper and make changes accordingly. The author will also update the plug-in with these options at a later date.

5.3 Case Study: Zone Alarm
For a case study the author chose Zone Alarm’s vsdatant.sys driver. This driver does a lot of the dirty work for Zone Alarm such as packet filtering, application monitoring, and other kernel level duties. Some may wonder why it would be worthwhile to find loops in a driver. In Zone Alarm’s case, the user can hope to find miscalculations in lengths where they didn’t convert a signed to unsigned value properly and therefore may cause an overflow when looping. Anytime an application takes data in remotely that may be type-casted at some point, there is always a great chance for loops that overflow their bounds.

When analyzing the Zone Alarm driver the user needs to select certain options to get a better idea of what is going on with loops. First, the user should select verbose output and All Loops Highlighting of Functions to see if there are any dangerous function calls within the loop.

After running through the loop detection phase, some interesting results are found that are shown in figure 5.3.

Visiting the address 0x00011a21 in IDA shows the loop. To begin, the reader will need to find the loop’s entry point, which is at:

.text:00011A1E jz short loc_11A27

At the loop’s entry point, the reader will notice:


.text:00011A27 push 206B6444h ; Tag
.text:00011A2C push edi ; NumberOfBytes
.text:00011A2D push 1 ; PoolType
.text:00011A2F call ebp ;ExAllocatePoolWithTag

At this point, the reader can see that every time the loop passes through its entry point it will allocate memory. To determine if the attacker can cause a double free error, further investigation is needed.

.text:00011A31 mov esi, eax
.text:00011A33 test esi, esi
.text:00011A35 jz short loc_11A8F

If the memory allocation within the loop fails, the loop terminates correctly. The next call in the loop is to ZwQuerySystemInformation which tries to acquire the SystemProcessAndThreadsInformation.

.text:00011A46 mov eax, [esp+14h+var_4]
.text:00011A4A add edi, edi
.text:00011A4C inc eax
.text:00011A4D cmp eax, 0Fh
.text:00011A50 mov [esp+14h+var_4], eax
.text:00011A54 jl short loc_11A1C

This part of the loop is quite un-interesting. In this segment the code increments a counter in eax until eax is greater than 15. It is obvious that it is not possible to cause a double free error in this case because the user has no control over the loop condition or data within the loop. This ends the investigation into a possible double free error.

Above is a good example of how to analyze loops that may be of interest. With all binary analysis it is important to not only identify dangerous function calls but to also identify if the attacker can control data that might be manipulated or referenced within a loop.


Chapter 6

Conclusion
During the course of this paper, the reader has had a chance to learn about the different types of loops and some of the method of detecting them. The reader has also gotten an in-depth view of the new IDA plug-in released with this article. Hopefully now when the reader sees a loop, whether in code or binary, the reader can explore the loop and determine if it is a security risk or not.


Bibliography

[1] Tarjan, R. E. 1974. Testing flow graph reducibility. J Comput. Syst. Sci. 9,355-365.
[2] Sreedhar, Vugranam, Guang Gao, & Yong-Fong Lee. Identifying loops using DJ graphs. http://portal.acm.org/citation.cfm?id=236114.236115
[3] Flake, Halvar. Automated Reverse Engineering. http://www.blackhat.com/presentations/win-usa-04/bh-win-04-flake.pdf

Selasa, 23 Juli 2013

Loop Detection - Chapter 4 A Different Approach to Loop Detection




Loop Detection
Peter Silberman
 


Chapter 4
A Different Approach to Loop Detection
The reader has seen how to detect dominators within a CFG and how to use that as a component to find natural loops. The previous chapter described why natural loop detection was flawed when trying to detect irreducible loops. For binary auditing, the tool will need to be able to pick up all loops and then let the user deduce whether or not the loops are interesting. This chapter will introduce the loop algorithm used in the IDA plug-in to detect loops.

To come up with an algorithm that was robust enough to detect both loops in the irreducible and reducible loop categories, the author decided to modify the previous definition of a natural loop. The new definition reads ”a loop can have multiple entry points and at least one link that creates a cycle.” This definition avoids the use of dominators to detect loops in the CFG.

The way this alternative algorithm works is by first making a call to the is reference to(ea t to, ea t ref) function. The function is reference to will determine if there is a reference from the ea t specified by ref to the parameter to. This check within the loop detection algorithm determines if there is a backedge or link
that would complete a loop. The reason this check is done first is for speed.

If there is no reference that would complete a loop then there is no reason to call is path to, thus preventing unnecessary calculations. However, if there is a link or backedge, a call to the overloaded function is path to(ea t from, ea t to) is used to determine if the nodes that are being examined can even
reach each other. The is path to function simulates all possible code execution conditions by following all possible edges to determine if the flow of execution could ever reach parameter to when starting at parameter from. The function is path to(ea t from, ea t to) returns one (true) if there is indeed a path going from from to to. With both of these functions returning one, it can be deduced that these nodes are involved in the loop.

4.1 Problems with new approach
In every algorithm there can exists small problems, that make the algorithm far from optimal. This problem applies to the new approach presented above.
The algorithm presented above has not been optimized for performance. The algorithm runs in a time of O(Nˆ2), which carries quite a load if there are more than 600 or so nodes.
The reason that the algorithm is so time consuming is that instead of implementing a Breadth First Search (BFS), a Depth First Search (DFS) was implemented, in the is path to function which computes all possible paths to and from a given node. Depth First Search is much more expensive than Breadth First Search, and because of that the algorithm may in some rare cases suffer.
If the reader is interested in how to implement a more efficient algorithm for finding the dominators, the reader should check out Compiler Design & Implementation by Steven S. Muchnick.
It should be noted that in future of this plug-in there will be optimizations made to the code. The optimizations will specifically deal new implementations of a Breadth First Search instead of the Depth First Search, as well as other small optimizations.