Thursday, August 25, 2011

Manipulating Java Class Files with BCEL - Part Three: More About BCEL

This is the third article in the BCEL series. You can read all of them here. Since I have covered the basics, I will now gather up the remaining points: local variables, fields, methods and jump instructions.

Data Types: The JVM handles data types in two ways: by having different opcodes for different data types, and by using descriptors (as explained in Manipulating Java Class Files with BCEL - Part One : Hello World!). In Java, not all types support all operations. For example, there are no opcodes for adding two byte values or two short values. This is understandable once you notice that all values are promoted to at least an int before any operation. A byte simply cannot exist on the operand stack. This saves the JVM from having to track how many bytes to pop from the operand stack for each operation. Longs (and doubles), on the other hand, have their own opcodes so that the JVM knows it has to pop 8 bytes instead of 4 when one is used.
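To make the two points above concrete, here is a small sketch in plain Java (no BCEL needed). The class and method names are made up for illustration. It shows that adding two bytes yields an int, and uses the standard java.lang.invoke.MethodType class to print the descriptor of a method signature.

```java
import java.lang.invoke.MethodType;

final class DataTypeDemo {
    /** Adds two bytes. Both operands are widened to int first, so the result is an int. */
    static int addBytes(byte a, byte b) {
        return a + b; // compiled to iadd; there is no byte-specific add opcode
    }

    /** Returns the JVM descriptor for a method with the given return and parameter types. */
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(addBytes((byte) 100, (byte) 100));              // 200, too large for a byte
        System.out.println(descriptor(void.class, String[].class));        // ([Ljava/lang/String;)V
        System.out.println(descriptor(long.class, int.class, byte.class)); // (IB)J
    }
}
```

Note that writing byte c = a + b; without a cast does not compile, which is the source-level face of the same promotion rule.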

Where to Find Instructions: Details of every instruction can be found here, in alphabetical order.


Speaking of the operand stack, there is one per thread per method invocation. What does that mean? Each thread has a stack called a method invocation stack (or you may call it by any other sweeter name). Each element of this stack is a frame that stores all local variables, including the method parameters. When a method is invoked from another, a new frame is pushed; whenever a method returns or throws an exception that it does not catch, the corresponding frame is popped and discarded. Each frame also has its own operand stack (and instruction pointer etc.).

Compound Instructions: They do not really have that official name, but I prefer to call them that. JVM opcodes are designed for compactness. Take for example iload 0. This pushes the value of the int local variable at index 0 onto the stack. However, the JVM has another opcode, iload_0, which does the same thing but has no operand, saving code space. Both are correct. With BCEL, however, you do not have to worry about this, as BCEL automatically converts an iload 0 to iload_0. BCEL does not even have a class called ILOAD_0.
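The space saving can be sketched as follows. This is only an illustration of the encoding rule from the JVM instruction set, not BCEL's internal code; the class and method names are hypothetical. The short forms iload_0 to iload_3 take one byte, iload with a one-byte index takes two, and larger indices need the wide prefix.

```java
final class LoadEncoding {
    /** Mnemonic of the most compact instruction that loads the int local variable at `index`. */
    static String mnemonic(int index) {
        checkRange(index);
        if (index <= 3) return "iload_" + index;   // opcode only, no operand
        if (index <= 0xFF) return "iload " + index; // opcode + 1 index byte
        return "wide iload " + index;               // wide + opcode + 2 index bytes
    }

    /** Number of bytes that instruction occupies in the method's code array. */
    static int encodedLength(int index) {
        checkRange(index);
        if (index <= 3) return 1;
        if (index <= 0xFF) return 2;
        return 4;
    }

    private static void checkRange(int index) {
        if (index < 0 || index > 0xFFFF) {
            throw new IllegalArgumentException("local variable index out of range: " + index);
        }
    }

    public static void main(String[] args) {
        System.out.println(mnemonic(0) + " uses " + encodedLength(0) + " byte(s)");     // iload_0 uses 1 byte(s)
        System.out.println(mnemonic(7) + " uses " + encodedLength(7) + " byte(s)");     // iload 7 uses 2 byte(s)
        System.out.println(mnemonic(300) + " uses " + encodedLength(300) + " byte(s)"); // wide iload 300 uses 4 byte(s)
    }
}
```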

Local Variables: Local variables are referred to in the JVM by index numbers instead of their names. The method parameters are assigned indices first, in the order they appear in the parameter list. In a static method they start at 0; in an instance method, index 0 holds this and the parameters start at 1. A long or double occupies two consecutive indices. More variables can be created by simply storing a value into a new index with istore, astore etc., depending on the data type of the variable.
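The index assignment can be worked out by hand, but a small helper makes the rule explicit. The sketch below is plain Java with made-up names. It assumes the standard JVM layout in which an instance method keeps this at index 0 and a long or double takes two consecutive indices.

```java
import java.util.Arrays;

final class LocalSlots {
    /** Number of local variable indices a value of the given type occupies. */
    static int width(Class<?> type) {
        return (type == long.class || type == double.class) ? 2 : 1;
    }

    /** Index of each parameter, in declaration order. */
    static int[] parameterSlots(boolean isStatic, Class<?>... parameterTypes) {
        int[] slots = new int[parameterTypes.length];
        int next = isStatic ? 0 : 1; // index 0 holds `this` in instance methods
        for (int i = 0; i < parameterTypes.length; i++) {
            slots[i] = next;
            next += width(parameterTypes[i]);
        }
        return slots;
    }

    /** First index that is free for a new local variable created with istore, astore etc. */
    static int firstFreeSlot(boolean isStatic, Class<?>... parameterTypes) {
        int next = isStatic ? 0 : 1;
        for (Class<?> type : parameterTypes) {
            next += width(type);
        }
        return next;
    }

    public static void main(String[] args) {
        // static void main(String[] args): args is at index 0
        System.out.println(Arrays.toString(parameterSlots(true, String[].class)));                    // [0]
        // instance method m(int, long, String): this=0, int=1, long=2..3, String=4
        System.out.println(Arrays.toString(parameterSlots(false, int.class, long.class, String.class))); // [1, 2, 4]
        System.out.println(firstFreeSlot(false, int.class, long.class, String.class));                // 5
    }
}
```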

Method Invocation: There are four kinds of method invocations: InvokeVirtual, InvokeStatic, InvokeInterface and InvokeSpecial. InvokeVirtual is used to call a normal instance method; the JVM handles overriding automatically. Note, however, that overloading is resolved during compilation. A sibling of this is InvokeInterface, which does the same thing for interface methods. InvokeStatic is used for static methods, and lastly, InvokeSpecial is used where no virtual dispatch is wanted: constructors, calls through super and (as compiled by javac at the time of writing) private methods.
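The difference between overriding (resolved by the JVM at run time) and overloading (resolved by the compiler) is easy to see from ordinary Java source. The following sketch uses hypothetical classes; the comments name the invocation instruction javac would typically emit for each call.

```java
final class DispatchDemo {
    static class Animal {
        String name() { return "animal"; }
    }

    static class Dog extends Animal {
        @Override
        String name() { return "dog"; }
    }

    // Two overloads: the compiler picks one from the static type of the argument.
    static String describe(Animal a) { return "describe(Animal) -> " + a.name(); } // a.name(): invokevirtual
    static String describe(Dog d)    { return "describe(Dog) -> " + d.name(); }

    static String demo() {
        Animal a = new Dog(); // constructor call: invokespecial on <init>
        return describe(a);   // invokestatic; overload chosen at compile time from the type Animal
    }

    public static void main(String[] args) {
        System.out.println(demo()); // describe(Animal) -> dog
    }
}
```

The overload describe(Animal) is selected because the variable is declared as Animal, while name() still returns "dog" because invokevirtual looks at the run-time class of the object.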

Jumps: Jumps are used for flow control like conditional execution (if-else) and loops (while, for, do-while etc.). Jumps can be either unconditional (goto) or conditional (if).

The goto instruction simply jumps to a given instruction. Conditional jumps are mainly of two types: those that compare one value with another, and those that compare a value with 0. For example, the if_acmp<cond> instructions compare two object reference values, and the if_icmp<cond> instructions compare two integer values. The if<cond> instructions compare an integer value with 0.

If-Else: If-else can be implemented using conditional jumps. The following program generates a class that takes a value from its command line arguments and compares it with 5.



Notice that the jump instructions take null as a parameter. This is because the instruction to jump to is not yet known at that point. To handle this, BCEL jump instructions are subclasses of BranchInstruction, and appending one to an InstructionList returns a BranchHandle. We can later set the target of this handle to any other InstructionHandle, which is what appending any other instruction to an InstructionList returns.

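In the generated bytecode, the else part comes before the if part. This is because the jump is taken when the condition is true, so the if part is the jump target while the else part is the fall-through code placed right after the comparison. (A compiler such as javac usually inverts the condition instead, so that the if body comes first.)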
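The "append now, set the target later" idea can be shown with a toy model. To be clear, the sketch below does not use BCEL: Handle and Listing are hypothetical stand-ins for BCEL's handle and instruction list classes, and the print lines are placeholders for the real instruction sequences. It only demonstrates how a branch is appended with no target and patched once the target exists, and why the else part ends up before the if part.

```java
import java.util.ArrayList;
import java.util.List;

final class BranchPatching {
    /** Hypothetical stand-in for an instruction handle: one entry in the list. */
    static class Handle {
        final String text;
        Handle target; // only used by branches; stays null until patched

        Handle(String text) { this.text = text; }
    }

    /** Hypothetical stand-in for an instruction list. */
    static class Listing {
        private final List<Handle> items = new ArrayList<>();

        Handle append(String text) {
            Handle h = new Handle(text);
            items.add(h);
            return h;
        }

        String render() {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < items.size(); i++) {
                Handle h = items.get(i);
                sb.append(i).append(": ").append(h.text);
                if (h.target != null) sb.append(" -> ").append(items.indexOf(h.target));
                sb.append('\n');
            }
            return sb.toString();
        }
    }

    static String ifElseListing() {
        Listing il = new Listing();
        il.append("iload_1");
        il.append("iconst_5");
        Handle ifLess = il.append("if_icmplt");                   // target not known yet
        il.append("print \"not less than 5\"");                   // else part: fall-through
        Handle skipIf = il.append("goto");                        // target not known yet
        Handle ifPart = il.append("print \"less than 5\"");       // if part: the jump target
        Handle end = il.append("return");
        ifLess.target = ifPart; // patch the targets now that they exist
        skipIf.target = end;
        return il.render();
    }

    public static void main(String[] args) {
        System.out.print(ifElseListing());
    }
}
```

In the rendered listing, instruction 2 (if_icmplt) points at 5 and instruction 4 (goto) points at 6, so the else text sits between the comparison and the if text.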


Loop: A loop is very similar to an if-statement; the only difference is that it jumps back to an instruction that the JVM has already executed. The following example creates a program that takes an integer (let's say n) as an argument and prints 0 to (n-1).
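As a rough illustration of the backward jump (this is a hand-written simulation in plain Java, not the BCEL program referred to above), the loop can be modelled with an explicit program counter. The instruction numbering and the mnemonics in the comments are assumptions made for this sketch.

```java
final class BackwardJumpLoop {
    /** Simulates a bytecode-style loop that collects 0 to n-1, one number per line. */
    static String run(int n) {
        StringBuilder out = new StringBuilder();
        int i = 0;  // the loop counter, a local variable in the real bytecode
        int pc = 0; // index of the next "instruction" to execute
        while (true) {
            switch (pc) {
                case 0: // iconst_0, istore: i = 0
                    i = 0;
                    pc = 1;
                    break;
                case 1: // if_icmpge: forward jump out of the loop when i >= n
                    pc = (i >= n) ? 4 : 2;
                    break;
                case 2: // loop body: print i
                    out.append(i).append('\n');
                    pc = 3;
                    break;
                case 3: // iinc, then goto 1: the backward jump that makes this a loop
                    i++;
                    pc = 1;
                    break;
                case 4: // return
                    return out.toString();
                default:
                    throw new IllegalStateException("bad pc: " + pc);
            }
        }
    }

    public static void main(String[] args) {
        System.out.print(run(Integer.parseInt(args[0])));
    }
}
```

Apart from the goto at step 3 pointing at a lower instruction number, the structure is the same as an if: a conditional jump decides whether the body runs.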



Switch Statements: Switch instructions are of two kinds: Table Switch and Lookup Switch. The difference is that a table switch uses a range for valid values, whereas a lookup switch uses a lookup table of valid values. Both use a table of jump targets for the matching values. Since a table switch uses a range, a jump target must be specified for every value in that range, which is not the case for a lookup switch. Note that a compiler is free to choose a table switch even when the Java source code uses sparse values. In that case, for any value in the range that does not appear in the source, the compiler sets the jump target to the default target. It is therefore all about optimization. In general, table switches are faster (the value is used directly as an index) but can take up a lot of space in the code when the values are sparse, whereas lookup switches are slower (the matching key has to be searched for) but then take up much less space.
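The space trade-off can be estimated from the instruction layouts in the JVM specification: a tableswitch stores a default target, a low and a high bound (4 bytes each) plus one 4-byte target per value in the range, while a lookupswitch stores a default target and a pair count (4 bytes each) plus 8 bytes per key/target pair. The sketch below is plain Java with hypothetical names; it ignores the opcode byte and the 0 to 3 alignment padding bytes, and the key sets in main are made-up examples.

```java
import java.util.Arrays;

final class SwitchLayout {
    private static int[] sorted(int... keys) {
        if (keys.length == 0) throw new IllegalArgumentException("at least one case is required");
        int[] s = keys.clone();
        Arrays.sort(s);
        return s;
    }

    /** Operand bytes of a tableswitch covering the range from the lowest to the highest key. */
    static long tableSwitchBytes(int... keys) {
        int[] s = sorted(keys);
        long range = (long) s[s.length - 1] - s[0] + 1;
        return 12 + 4 * range; // default + low + high, then one target per value in the range
    }

    /** Operand bytes of a lookupswitch with one key/target pair per case. */
    static long lookupSwitchBytes(int... keys) {
        sorted(keys); // only validates that there is at least one key
        return 8 + 8L * keys.length; // default + npairs, then (key, target) pairs
    }

    /** Values inside the range that have no case; a tableswitch must send these to the default target. */
    static int[] gaps(int... keys) {
        int[] s = sorted(keys); // keys are assumed distinct, as in a real switch
        int[] out = new int[s[s.length - 1] - s[0] + 1 - s.length];
        int count = 0;
        for (int v = s[0]; v < s[s.length - 1]; v++) {
            if (Arrays.binarySearch(s, v) < 0) out[count++] = v;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tableSwitchBytes(1, 2, 4));         // 28
        System.out.println(lookupSwitchBytes(1, 2, 4));        // 32
        System.out.println(Arrays.toString(gaps(1, 2, 4)));    // [3]
        System.out.println(tableSwitchBytes(1, 100, 1000));    // 4012
        System.out.println(lookupSwitchBytes(1, 100, 1000));   // 32
    }
}
```

For nearly dense keys the table form is both smaller and faster, which is why a compiler may prefer it and simply fill the gaps with the default target; for widely spread keys the lookup form wins on size by a large margin.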

The following example shows two ways to compile the program SwitchSource.java. (The BCEL program output uses a different class name, SyntheticClass, for consistency.) The tableswitch version is SimpleTableSwitch.java and the lookupswitch version is SimpleLookupSwitch.java. Note that for a BCEL programmer, the only difference is that with lookupswitch, the jump location for value 3 need not be provided.



Now you should be able to run it with java com.geekyarticles.bcel.SyntheticClass 4, and it should output Four.
