The Definition of Reverse Engineering, JVMs, and the Java programming language.

By 01

Sunday, October 12, 2008
I recently had a conversation with a colleague that reminded me of another discussion, a couple of years ago, with my Master's project advisor when I was exploring research topics.  The topic under discussion was Reverse Engineering--in particular, Software Reverse Engineering.  It sounds really cool, but what is it?  My professor made the point that software reverse engineering wasn't something that had really been achieved, because the goal (in the purest sense of the definition) is to reconstitute the original source code.  We moved on from that line of reasoning quickly, but the comment stayed in the back of my mind.

There are plenty of books in the popular computing literature that purport to be about Reverse Engineering.  These titles certainly claim to be doing Software Reverse Engineering.  So, how do I reconcile the comment from my advisor with what I find on the local bookstore's computer shelf?

We need a precise definition of Software Reverse Engineering.  In the strictest sense, it is the recreation of the original source code from a compiled binary (or other translated form).  This can be very challenging, if not impossible, in the presence of register-based Instruction Set Architectures and optimizing compilers--the norm in today's industry--because compilation discards names and comments and can rearrange the program's structure.  I believe that is the basis of my advisor's original comment.  A slightly more liberal definition of the term might be the reconstitution of something that is logically equivalent to a program's original source code, but not necessarily the same, line-for-line.  Can this still be considered Software Reverse Engineering?

I propose that it can.  In fact, it's a great definition because it is achievable and has real-world application.  A more conservative approach would be to acknowledge this as the definition of a form of Software Reverse Engineering.  In my experience, the jad Java decompiler is the only thing that has prevented people from being fired when Java source code was lost--I've worked at companies where this has happened more than once.  This may be an example of my getting away from my academic, purist roots.

I posed the following question to my professor: If one's goal were to reconstitute something that was logically equivalent to a program's original source code, but not necessarily the same, line-for-line, can one still call it reverse engineering?

He agreed that it represented a form of Reverse Engineering.  It must be noted, however, that it does not produce the original source code, which is the end goal of Reverse Engineering in its purest sense.

So, I'm going to use "the reconstitution of something that is logically equivalent to a program's original source code" as a definition for a form of Software Reverse Engineering.  Furthermore, I'm going to claim that Software Reverse Engineering has enjoyed success in the Java community primarily because of the stack-based architecture that the JVM's ISA implements.
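
To make "logically equivalent, but not necessarily the same, line-for-line" concrete, here is a small Java sketch.  The class and method names are hypothetical, made up for illustration.  The two methods are written differently but compute the same result, and javac typically emits very similar stack-based bytecode for both (you can compare them with javap -c).  A decompiler looking only at the class file usually has no way to tell which form the programmer originally wrote, and unless the class was compiled with debug information (javac -g), the local variable names are not available to it either.

```java
// Hypothetical example: two source forms that are logically equivalent.
class EquivalentLoops {

    // One way the original author might have written it: a for loop.
    static int sumWithFor(int n) {
        int total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // What a decompiler might plausibly hand back instead: a while loop
    // with generic variable names. Same behavior, different text.
    static int sumWithWhile(int n) {
        int var1 = 0;
        int var2 = 1;
        while (var2 <= n) {
            var1 += var2;
            var2++;
        }
        return var1;
    }

    public static void main(String[] args) {
        // Both calls compute 1 + 2 + 3 + 4 + 5.
        System.out.println(sumWithFor(5));
        System.out.println(sumWithWhile(5));
    }
}
```

Under the stricter definition, recovering sumWithWhile when the author wrote sumWithFor would not count as Reverse Engineering; under the definition adopted here, it would.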


©2008 www.thinkmiddleware.com

Last Modified: Sunday, 09-Nov-2008 10:48:38 MST