How to open any software and look at its code?

In summary, the thread discusses whether it is possible to open a given program and understand its code. Most software is compiled into machine code, which is difficult for humans to read. Tools such as machine code readers, hex editors, disassemblers and decompilers are suggested to assist in the process. However, the feasibility and difficulty of the task depend on the specific program and the language it was written in.
  • #1
pairofstrings
I want to open any given program and look at its code. Can anyone tell me how it can be done?
 
  • #2
Yes. Get a machine code reader / bit parser.

And learn machine code.

Seriously though, depending on what kind of program it is, most software has been compiled to a form that is efficient for the machine to read: zeros and ones. It is not interpreted, human-readable code like HTML and JavaScript, for example.
 
  • #3
If this were possible, a lot of companies would be out of business.
 
  • #4
There are reverse compilers around.
Google and thou shalt find! :wink:
I have some experience with some of those.
You won't get very far on a large and complex program though.
 
  • #5
How about a hex editor?
 
  • #6
uart said:
How about a hex editor?

Yeah. Same thing. You're just looking at the same bytes written as hex digits, typically 16 to a row.

But it is no closer to illuminating the workings of the app.
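To make that concrete, here is a minimal sketch of what a hex editor displays. The `hex_dump` helper and the sample bytes are made up for illustration; a real editor would read the bytes from a file.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes the way a typical hex editor shows them."""
    pad = width * 3 - 1  # width of the hex column for a full row
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)


# Hypothetical sample: the "MZ" bytes that start a DOS/Windows executable,
# followed by a short string, standing in for a real file's contents.
sample = b"MZ\x90\x00\x03\x00\x00\x00Hello, world!\x00"
print(hex_dump(sample))
```

Embedded strings show up in the right-hand column, but the instructions themselves are still just numbers.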
 
  • #7
The most common tool used for this is a disassembler, but this would be very time-consuming on a large program. It's a form of reverse engineering, and the legality of this depends on the circumstances. Wiki article:

http://en.wikipedia.org/wiki/Reverse_engineering
 
  • #8
There are disassemblers and decompilers for popular high-level platforms such as Java, VB and .NET, which can yield very readable code and give substantial insight into how the app operates.

Some languages are more vulnerable than others. In my experience, C and C++ programs are much more difficult to reverse engineer but can still be modified depending on how the code is structured - this is how hackers bypass software protection for example.
 
  • #9
-Job- said:
In my experience, C and C++ programs are much more difficult to reverse engineer

That's because, in some cases, even the language itself is little more than memory allocations and pointers and incomprehensible at the best of times. :biggrin::biggrin::biggrin:
 
  • #10
DaveC426913 said:
That's because, in some cases, even the language itself is little more than memory allocations and pointers and incomprehensible at the best of times. :biggrin::biggrin::biggrin:

Ha, that's true. The reality is that .NET languages compile into an intermediate representation, aptly called "Intermediate Language" (IL).

.NET ships with Ildasm.exe, a disassembler that outputs the IL for a given .NET program. If you take a look you'll see that the IL is quite readable, so .NET decompilers have a lot more information to work with. Java also retains symbol information when compiled into class files.

C and C++ compile to lower-level machine code, which makes it much more difficult to piece back together, but you can always stare at and step through the assembly code. Visual Studio has dumpbin for this, and there are also more sophisticated tools like IDA.
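As a loose analogy only (this is Python, not .NET or Java), Python also compiles source to a bytecode that keeps names around, and its standard `dis` module will print it. The `area` function is a made-up example.

```python
import dis


def area(width, height):
    # A tiny function whose compiled form we will inspect.
    return width * height


# Print the bytecode; the exact opcodes vary between Python versions.
dis.dis(area)

# Local variable names survive compilation, which is a large part of why
# bytecode formats like this are comparatively easy to read back.
print(area.__code__.co_varnames)
```

Native code produced by a C or C++ compiler normally keeps none of these names unless debug symbols are shipped with it.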
 
  • #11
So:

What the OP wants to do is pick an app (or apps) he wants to examine, determine what language they are written in, and then determine the feasibility of decompiling them.

Feasibility and choice of solution will be on a case-by-case basis.
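A first step in that case-by-case assessment can be sketched by looking at the leading "magic" bytes of the file. This is a minimal illustration: the table is deliberately incomplete and the function names are made up.

```python
# Leading "magic" bytes of a few common formats (a small, incomplete table).
MAGIC = [
    (b"MZ", "DOS/Windows executable (native code or a .NET assembly)"),
    (b"\x7fELF", "ELF executable (Linux/Unix native code)"),
    (b"\xca\xfe\xba\xbe", "Java class file (also used by Mach-O fat binaries)"),
    (b"PK\x03\x04", "ZIP archive (for example a Java .jar)"),
]


def guess_format(header: bytes) -> str:
    """Guess a file format from the first few bytes of a file."""
    for magic, description in MAGIC:
        if header.startswith(magic):
            return description
    return "unknown"


def guess_file_format(path: str) -> str:
    """Read the start of a file on disk and guess its format."""
    with open(path, "rb") as f:
        return guess_format(f.read(8))


print(guess_format(b"MZ\x90\x00"))
print(guess_format(b"\xca\xfe\xba\xbe\x00\x00\x00\x34"))
```

This only tells you the container format. Working out the source language and which decompiler applies still takes further digging.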
 
  • #12
A 1 million byte .exe file debugged would print out on paper - approximately 100000 pages give or take.

The OP has to know internals of the operating system, DLL's, threads, assembly and machine code of the CPU, drivers, ...

I have tried disassembly of .com programs on 8086's and that is just about impossible. Com programs were limited to 64k data segment and 64k program segment but you still had to worry about the operating system access from the program. As the system was DOS, there was no oddball threading and other stuff that Windows programs do nowadays.
 
  • #13
There are cases where a person or a group has reverse engineered an old PC game to enhance it and/or to make it work with current versions of Windows. This involved disassembling and understanding key aspects of the program, and figuring out key data files for games. One example of this is the racing game Grand Prix Legends (originally made in 1997), where a group of people modified the game to include shifter support and race cars from other years, and a large number of tracks were made as add-ons.
 
  • #14
256bits said:
A 1 million byte .exe file debugged would print out on paper - approximately 100000 pages give or take.

The OP has to know internals of the operating system, DLL's, threads, assembly and machine code of the CPU, drivers, ...

I have tried disassembly of .com programs on 8086's and that is just about impossible. Com programs were limited to 64k data segment and 64k program segment but you still had to worry about the operating system access from the program. As the system was DOS, there was no oddball threading and other stuff that Windows programs do nowadays.

I guess it depends on what you want to do.

If all you want to do is crack the software protection, then it's going to be pretty quick with something like SoftICE to at least get a basic idea of the regions of the program that are involved with this feature. If the protection is not really advanced (say, a simple function), then writing a few NOP codes in the right places (maybe with some extra modification) is pretty easy. If it's a more complex protection, then the structure of the protection is still there, but it's usually more complicated and more integrated into the code base than the simpler mechanisms.

But yeah if you want to actually understand the whole executable, then for most cases good luck.
 
  • #15
chiro said:
I guess it depends on what you want to do.

I thought the OP's question was fairly clear on that.
 
  • #16
256bits said:
A 1 million byte .exe file debugged would print out on paper - approximately 100000 pages give or take.
That's assuming that you can fit only 10 bytes on a page.
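A rough check of the arithmetic. The per-page figures in the second half are assumptions for illustration, not numbers from this thread.

```python
FILE_SIZE_BYTES = 1_000_000
CLAIMED_PAGES = 100_000

# Bytes per page implied by the 100,000-page figure.
implied_bytes_per_page = FILE_SIZE_BYTES / CLAIMED_PAGES
print(implied_bytes_per_page)

# Assumptions (not from the thread): one instruction per printed line,
# about 3 bytes per instruction on average, and 60 lines per page.
AVG_INSTRUCTION_BYTES = 3
LINES_PER_PAGE = 60
estimated_pages = FILE_SIZE_BYTES / AVG_INSTRUCTION_BYTES / LINES_PER_PAGE
print(round(estimated_pages))
```

Under those assumptions the listing runs to several thousand pages, which is still far too much to read end to end.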
256bits said:
The OP has to know internals of the operating system, DLL's, threads, assembly and machine code of the CPU, drivers, ...

I have tried disassembly of .com programs on 8086's and that is just about impossible.
Not necessarily. I had a disassembler about 20 years ago, and I was able to disassemble and modify several .com and .exe files.

One of the .com files was a DOS utility that would display a text file. By default, the utility displayed white letters against a blue background. I found the code that was setting the text and background colors, and changed it so that it displayed black letters on a gray background.

Another .com file was called bubble.com, IIRC. When it was running it seemed to make the text on the screen drop off the bottom, one character at a time. If it ran long enough, the screen would eventually become blank. I disassembled that code and changed it so that the letters rose up instead of falling down.

These were relatively small files, so disassembling them wasn't that difficult. One that took more time was an .exe that was part of Norton Utilities. One of the utilities they offered was the ability to change the name of a directory. The standard way of doing this back then (early 90s) was to create a directory with the name you wanted, move all the files from the old directory, and then delete the old directory. I couldn't imagine that Norton was doing all of this just to rename a directory.

I disassembled the file, and looked through all of the assembly code (about 35K bytes) for all of the int 21H instructions, the interrupts that go into the DOS functions. For each of the int 21H instructions, I wrote down how the AX and other registers were set, so as to determine the particular DOS function being used. I then looked up each function call in a reference I have, and found that one of them was using the DOS Rename File function, which is what Norton was using to rename a directory. The only difference between a directory and a file is that the directory has the directory attribute set.
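The search described above can be sketched in a few lines. This is a naive illustration, not the tool that was actually used: it looks for the bytes `CD 21` (INT 21h) and then looks backwards for the nearest `B4 xx` (MOV AH, imm8). The function table is a small excerpt, the sample bytes are hypothetical, and a byte-pattern scan like this can be fooled by data or by operands that happen to contain the same values.

```python
# A few DOS INT 21h function numbers (the value in AH); an incomplete table.
DOS_FUNCTIONS = {
    0x09: "Print string",
    0x3D: "Open file",
    0x4C: "Terminate program",
    0x56: "Rename file",
}


def find_dos_calls(code: bytes, lookback: int = 8):
    """Yield (offset, ah) for each INT 21h; ah is None if no MOV AH is nearby."""
    pos = code.find(b"\xcd\x21")
    while pos != -1:
        ah = None
        window_start = max(0, pos - lookback)
        # Search backwards for the closest "MOV AH, imm8" (opcode B4 ib).
        for i in range(pos - 2, window_start - 1, -1):
            if code[i] == 0xB4:
                ah = code[i + 1]
                break
        yield pos, ah
        pos = code.find(b"\xcd\x21", pos + 2)


# Hypothetical fragment: MOV AH,56h / MOV DX,0200h / MOV DI,0210h / INT 21h
# followed by MOV AH,4Ch / INT 21h.
sample = bytes.fromhex("b456ba0002bf1002cd21b44ccd21")
for offset, ah in find_dos_calls(sample):
    name = DOS_FUNCTIONS.get(ah, "unknown") if ah is not None else "AH not found"
    print(f"{offset:04X}: INT 21h, {name}")
```

A real disassembler tracks register values along the actual instruction stream instead of guessing from nearby bytes.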
256bits said:
Com programs were limited to 64k data segment and 64k program segment but you still had to worry about the operating system access from the program. As the system was DOS, there was no oddball threading and other stuff that Windows programs do nowadays.
 
  • #17
Mark44 said:
I had a disassembler about 20 years ago, and I was able to disassemble and modify several .com and .exe files.

Can you name the disassembler? Is it powerful enough to handle complex software? How does it work?
 
  • #18
pairofstrings said:
Can you name the disassembler? Is it powerful enough to handle complex software? How does it work?

Not sure if a C=64 disassembler could handle a Java app unless it'll fit on a 5 1/4" floppy... :biggrin:
 
  • #19
pairofstrings said:
Can you name the disassembler? Is it powerful enough to handle complex software? How does it work?
IIRC, the name was MD86, but I don't think they're in business any longer.

A disassembler works by attempting to translate binary machine code back into a particular assembly language. I say "attempt" because a program will typically have blocks of code, which are relatively easy to translate, as well as blocks of data, which the disassembler will sometimes try to turn into code instructions. Part of being able to disassemble an executable is being able to recognize blocks of data for what they are: just numbers or characters.

Another thing that makes a disassembled executable or COM file difficult to understand is that named labels and named variables in the original assembly code show up in the disassembly code as just labels with uninformative names such as B200 and the like. The disassembler I had let you replace these label names with ones of your choice.
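The code-versus-data problem can be shown with a toy linear-sweep disassembler. This is a sketch under heavy assumptions: the opcode table covers only a handful of real 8086 encodings, the sample program is made up, and it is not how MD86 or any other real disassembler is implemented.

```python
# Toy opcode table: opcode -> (mnemonic template, operand size in bytes).
# Only a handful of real 8086 encodings; everything else is emitted as DB.
OPCODES = {
    0x48: ("DEC AX", 0),
    0x49: ("DEC CX", 0),
    0x90: ("NOP", 0),
    0xC3: ("RET", 0),
    0xB4: ("MOV AH, {:02X}h", 1),
    0xCD: ("INT {:02X}h", 1),
    0xBA: ("MOV DX, {:04X}h", 2),
}


def disassemble(code: bytes, origin: int = 0x100):
    """Linear sweep: decode every byte in order as if it were an instruction."""
    listing = []
    pos = 0
    while pos < len(code):
        template, size = OPCODES.get(code[pos], (None, 0))
        operand = code[pos + 1:pos + 1 + size]
        if template is None or len(operand) < size:
            # Unknown opcode or truncated operand: fall back to a raw data byte.
            text, length = f"DB {code[pos]:02X}h", 1
        else:
            value = int.from_bytes(operand, "little")
            text, length = template.format(value), 1 + size
        listing.append((origin + pos, text))
        pos += length
    return listing


# Hypothetical .COM program (loaded at offset 100h): print a string and
# return, followed by the string data "HI$".
program = bytes.fromhex("b409ba0801cd21c3") + b"HI$"
for address, text in disassemble(program):
    print(f"{address:04X}  {text}")
```

The first four lines of the listing are the real instructions. After the RET, the bytes of the string "HI$" are decoded as DEC AX and DEC CX, which is exactly the kind of data block a person has to recognize and mark by hand.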
 

Related to How to open any software and look at its code?

1. What is the purpose of opening and looking at software code?

Opening and examining software code allows you to understand how a program works, identify any potential bugs or security vulnerabilities, and make modifications or improvements if needed.

2. How do I open software code?

If you have the source code, a text editor or integrated development environment (IDE) that supports the programming language in which the software was written is enough to open it. Most installed programs, however, ship only as compiled binaries, so the installation folder usually contains no source code. Examining those requires tools such as a hex editor, a disassembler, or a decompiler.

3. Can I modify the code of any software?

No, in most cases, modifying the code of a software program is not allowed without the permission of the software's creator or company. It is important to respect copyright and licensing laws when it comes to making changes to software code.

4. What should I look for when examining software code?

When looking at software code, you should pay attention to the structure and organization of the code, any comments or documentation that may be included, and any potential security vulnerabilities or bugs. It is also important to have a basic understanding of the programming language in order to make sense of the code.

5. Is it necessary to have programming knowledge to examine software code?

While having a basic understanding of programming can be helpful, it is not always necessary to have extensive knowledge in order to examine software code. With some research and patience, anyone can learn to read and understand the code of a software program.
