Geeky Articles: Manipulating Java Class Files with BCEL

This is the second article in the series of articles on Apache BCEL. If you have not read part 1, read it here.

Expression Processing: Expressions are key part of a language. In this article, I will discuss how expressions are compiled into java byte code. I will also cover a little bit about compilation process. At the end, I will go through the steps and create a compiler for numerical expression.

How Java Codes Expressions: As I discussed in part 1, java works with an operand stack. A simple expression 2+3 will turn into the following (assuming 2 is stored in constant number 3 and 3 is stored in constant number 3).

ldc #3
ldc #6
iadd

The iadd instruction will pop its two operands, add them and them push them back into the stack. The following program does exactly this (but also contains code to do the output).

SampleExpression

package com.geekyarticles.bcel;

import java.io.IOException;

import org.apache.bcel.Constants;
import org.apache.bcel.classfile.JavaClass;
import org.apache.bcel.generic.ArrayType;
import org.apache.bcel.generic.ClassGen;
import org.apache.bcel.generic.ConstantPoolGen;
import org.apache.bcel.generic.GETSTATIC;
import org.apache.bcel.generic.IADD;
import org.apache.bcel.generic.INVOKEVIRTUAL;
import org.apache.bcel.generic.InstructionList;
import org.apache.bcel.generic.LDC;
import org.apache.bcel.generic.MethodGen;
import org.apache.bcel.generic.RETURN;
import org.apache.bcel.generic.Type;

public class SampleExpression {
    public static void main(String[] args) {
        System.out.println("Generating Class");

        //Create a ClassGen for our brand new class.
        ClassGen classGen=new ClassGen("com.geekyarticles.bcel.SyntheticClass", "java.lang.Object", "SyntheticClass.java", Constants.ACC_PUBLIC, null);

        //Get a reference to the constant pool of the class. This will be modified as we add methods, fields etc. Note that it already constains
        //a few constants.
        ConstantPoolGen constantPoolGen=classGen.getConstantPool();

        //The list of instructions for a method.
        InstructionList instructionList=new InstructionList();

        //Add the appropriate instructions.

        //Get the reference to static field out in class java.lang.System.
        instructionList.append(new GETSTATIC(constantPoolGen.addFieldref("java.lang.System", "out", "Ljava/io/PrintStream;")));

        //Load the constant
        instructionList.append(new LDC(constantPoolGen.addInteger(2)));
        //Load the constant
        instructionList.append(new LDC(constantPoolGen.addInteger(3)));
        //Add
        instructionList.append(new IADD());

        //Invoke the method.
        instructionList.append(new INVOKEVIRTUAL(constantPoolGen.addMethodref("java.io.PrintStream", "println", "(I)V")));

        //Return from the method
        instructionList.append(new RETURN());

        MethodGen methodGen=new MethodGen(Constants.ACC_PUBLIC|Constants.ACC_STATIC, Type.VOID, new Type[]{new ArrayType(Type.STRING, 1)}, new String[]{"args"}, "main", "com.geekyarticles.bcel.SyntheticClass", instructionList, constantPoolGen);

        methodGen.setMaxLocals();//Calculate the maximum number of local variables.
        methodGen.setMaxStack();//Very important: must calculate the maximum size of the stack.

        classGen.addMethod(methodGen.getMethod()); //Add the method to the class

        //Print a few things.
        System.out.println("********Constant Pool**********");
        System.out.println(constantPoolGen.getFinalConstantPool());
        System.out.println("********Method**********");
        System.out.println(methodGen);
        System.out.println("********Instruction List**********");
        System.out.println(instructionList);


        //Now generate the class
        JavaClass javaClass=classGen.getJavaClass();
        try {
            //Write the class byte code into a file
            javaClass.dump("com/geekyarticles/bcel/SyntheticClass.class");

        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

        //That's it.

    }
}

If you run the generated class now, it should output 5. More complex expressions are processed through properly manipulating the stack operations.

Prefix, Infix, Postfix Operators: Infix operators are the ones we normally use in normal calculation. For example 2+3. A prefix operator comes before the operands, like this: + 2 3. Similarly, a postfix operator comes after the operands, 2 3 +. The advantage of using prefix or postfix operators is that there is no need for parentheses. For example, the expression 2+3*8+(9-2) becomes

2 3 8 * + 9 2 - +

with postfix operators. Note that even operator precedence is not required. You may be wondering why I am talking about all these here. The simple reason is that if you write your expression in postfix notation, it will represent the JVM instructions needed to process the expression. The following shows the instructions need to do the above calculation. The parentheses represents a constant pool index that contains the value written inside.

ldc (2)
ldc (3)
ldc (8)
imul
iadd
ldc (9)
ldc (2)
isub
iadd

This exactly represents the postfix operator expression shown above. But postfix operator notations are difficult to understand for humans. So, how do we convert infix operator notation to prefix operator notation? For that we need to learn a little about how a compiler works.

Little Bit on Compilation: Although, steps may vary from compiler to compiler, a compilation process involve pretty much the following steps.

Lexing
Parsing
Semantic Analysis
Intermediate Code Generation
Optimization
Code Generation

This is not an article on compilation, so I will keep it as simple as possible.

Lexing: Lexing is the process of dividing the expression in a number of tokens. tokens are like words in a sentence. For example, 12+3*114 will be divided into tokens 12,+,3,*,114. All lexers are invariably deterministic finite state atomata or DFA. A DFA is a simple program (or any other thing) that has a set of posible states, and a set of posible transitions from one state to another triggered by an input symbol. A DFA thus processes a stream of input symbols, hopping from one state to another in the process. If a symbol is encountered for which no transitions are defined, its an error.

A state can be either an accepting state or a non-accepting state (but not a rejecting state). Whenever the DFA hits an accepting state, a token is created from the symbols already read. If at the end of the symbol stream, a non-accepting state is reached, it is an error.

The above picture shows the DFA for our lexer. The states Begin and Space are accepting states shown by double elipse. It might look a little too complicated at the begining, but if seen carefully, its pretty simple. Alphabets and $,_ start a variable; aplhanumeric and $,_ continue it. 0-9, . start and continue creating a number. Everything else is even simpler. The lex() function in the program given below implements this logic.

Parsing: Parsing is the process of arranging the tokens in something called a parse tree. The parse tree represents the order of evaluation. The leaves are evaluated first, and then the branches. Hence, the parse tree is later post-order traversed to create the instruction set. This is same as using the postfix operators.

A program's structure can be defined using a context free grammer (CFG). A CFG is a set of terminals, non-terminals and productions. Terminals are the symbols having one-to-one correspondence with the tokens generated in the lexer. Note here, the symbols in the parser are the tokens in the lexer. Non-terminals are the symbols created to facilitate creation of the grammer. Productions are rules for combining terminals and non-terminals to create non-terminals. The following set of productions represent our grammer.

The non-terminal at left hand side of the first production will be used as the root of the parse tree. For example, let us consider the following set of production.

E->E+E
E->num

Here num is a terminal symbol and E is non-terminal. Let us assume, the num symbol represents all integers. Then the symbol 2+2 can easily be formed as follows:

                                         E
                                 ________|_________
                                 |       |        |
                                 E       +        E
                                 |                |
                                 2                2

However, in this grammer, the symbols 2+3+4 can be written in two different ways, as shown below.

                                         E
                                 ________|_________
                                 |       |        |
                                 E       +        E
                                 |          ______|______
                                 2          |     |     |
                                            E     +     E
                                            |           |
                                            3           4



                                         E
                                 ________|_________
                                 |       |        |
                                 E       +        E
                           ______|______          |
                           |     |     |          4
                           E     +     E
                           |           |
                           2           3

Hence, this grammer is called ambiguous. The grammer can be made unambiguous by making the following change. However, notice that the language generated by both the grammers are same.

E->E+num
E->num

Making an unambiguous grammer is very important for parser generators. The following show the grammer used by us. However we will make the parser by hand coding.

E->E1
E->E+E1
E->E-E1
E1->E2
E1->E1*E2
E1->E1/E2
E2->(E)
E2->num
E2->var
E2->+E2
E2->+E2
E2->-E2
E2->-E2

LALR(1) Parsing: LALR(k) or Look Ahead Left Right parsing is the most commonly used parsing technique. It is left-right, because it reads the symbols from left to right, and then analyzes them from right to left. It means that the right most symbols are matched for any possible non-terminal. If one exists, replacement is done. In practice, a stack can be maintained to contain the symbols, and then the non-terminal can be formed using the symbols present at the top of the stack. Notice above, that while resolving amguity, we have conciously put the leaf symbols on the right side to support this parsing technique. The following shows the parsing sequence for 2+3+4.

2: E2 <- E1
+: E1 + <- E +
3: E + 3 <- E + E2 <- E + E1 <- E
+: E +
4: E + 4 <- E + E2 < - E + E1 <- E

However, this technique will fail if we have 2+3*4. At step 4, we will end up having E * , which would be a dead end. Hence, before we apply the addition, we need to see wheather the next symbol is a * or /. Since we look one symbol ahead, its called LALR(1) parsing.

Note that in a real LALR parser, the symbols are processed with a DFA, and it makes sense to store the DFA states in the stack rather than the symbols themselves to save time. However, for simplicity, we will logically try to find a suitable non-terminal every time.

In the example below, the parse method implements this parsing technique. (Again a code tells a thousand words). However, I would point out that here the non-terminals would actually represent operators.

Semantic Analysis: Semantic Analysis is the part of compilation in which language sematics are processed, including variable scoping, class, method resolution etc. In our case, the only analysis would be to find the variables. But it would be easier for us to do that after intermediate code generation, so we will do that then.

Intermediate Code Generation: As I pointed out earlier, java bytecode is equivalent to postfix operators, hence intermediate code generation would simply be the post order traversal of the parse tree.

Optimization: We would remove all unary + and all couples of unary -.

Code Generation: Now it is very simple, as the symbols are already in postfix form, it will be simply taking one by one and converting to code.

The following is the code for the compiler for expression. It creates a class file that has a main method. If variables are created, their value must be provide during invocation of the generated class in the order of their declaration (variables are declared when they are used for the first time).

How to check errors:  In case you make a mistake, it is very difficult to know the problem from the class verification error shown by the JVM. BCEL comes with a built-in class verifier called org.apache.bcel.verifier.Verifier. Below is a sample output.




bash-4.1$ java -cp /home/debasish/Download/bcel-5.2/bcel-5.2.jar:. org.apache.bcel.verifier.Verifier com.geekyarticles.bcel.ExpressionTest
JustIce by Enver Haase, (C) 2001-2002.
<http://bcel.sourceforge.net>
<http://jakarta.apache.org/bcel>

Now verifying: com.geekyarticles.bcel.ExpressionTest

Pass 1:
VERIFIED_OK
Passed verification.

Pass 2:
VERIFIED_OK
Passed verification.

Pass 3a, method number 0 ['public static void main(String[] args)']:
VERIFIED_OK
Passed verification.

Pass 3b, method number 0 ['public static void main(String[] args)']:
VERIFIED_OK
Passed verification.

Warnings:
Pass 2: Attribute '<LocalVariableTable: LocalVariable(start_pc = 0, length = 51, index = 0:String[] args)>' as an attribute of Code attribute '<CODE>' (method 'public static void main(String[] args)') will effectively be ignored and is only useful for debuggers and such.
Pass 2: SourceFile attribute 'SourceFile(ExpressionTest.exp)' has a funny name: remember not to confuse certain parsers working on javap's output. Also, this name ('ExpressionTest.exp') is considered an unqualified (simple) file name only.

ExpressionCompiler.java

package com.geekyarticles.bcel;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Stack;

import org.apache.bcel.Constants;
import org.apache.bcel.classfile.JavaClass;
import org.apache.bcel.generic.AALOAD;
import org.apache.bcel.generic.ALOAD;
import org.apache.bcel.generic.ArrayType;
import org.apache.bcel.generic.ClassGen;
import org.apache.bcel.generic.ConstantPoolGen;
import org.apache.bcel.generic.DADD;
import org.apache.bcel.generic.DDIV;
import org.apache.bcel.generic.DMUL;
import org.apache.bcel.generic.DNEG;
import org.apache.bcel.generic.DSUB;
import org.apache.bcel.generic.GETSTATIC;
import org.apache.bcel.generic.INVOKESTATIC;
import org.apache.bcel.generic.INVOKEVIRTUAL;
import org.apache.bcel.generic.InstructionList;
import org.apache.bcel.generic.LDC;
import org.apache.bcel.generic.LDC2_W;
import org.apache.bcel.generic.MethodGen;
import org.apache.bcel.generic.RETURN;
import org.apache.bcel.generic.Type;

public class ExpressionCompiler {

    /**
     * @param args
     * @throws ExpressionException
     */
    public static void main(String[] args) throws ExpressionException {
        if(args.length!=1){
            System.out.println("Usage: java ExpressionCompiler <Expression>");
            return;
        }

        String expr=args[0];
        //Generate tokens
        List<Token> tokens=lex(expr);
        System.out.println(tokens);

        //Generate parse tree
        Symbol symbol=parse(tokens, expr);
        System.out.println(symbol);

        //Generate intermdiate code
        List<Symbol> flattened=flatten(symbol);
        printSymbols(flattened);

        //Create variable list
        List<String> varList=varList(flattened);
        System.out.println(varList);

        //Optimize
        List<Symbol> optimized=optimize(flattened);
        printSymbols(optimized);

        //Generate code
        generateCode("com.geekyarticles.bcel.ExpressionTest",varList,optimized);



    }

    public static void printSymbols(List<Symbol> flattened) {
        for(Symbol s:flattened){
            if(s instanceof Terminal){
                System.out.print(((Terminal) s).terminalToken.value+" ");
            }else if(s instanceof BinaryNonTerminal){
                System.out.print(((BinaryNonTerminal) s).operator+" ");
            }else if(s instanceof UnaryNonTerminal){
                System.out.print(((UnaryNonTerminal) s).operator+":U ");
            }
        }
        System.out.println();
    }

    public static class ExpressionException extends Exception{
        public String expression;
        public String message;
        public int position;
        public ExpressionException(String expr, int pos, String msg){
            expression=expr;
            position=pos;
            message=msg;
        }
        public String toString(){
            StringBuilder errMsg=new StringBuilder();
            errMsg.append(message).append("\n");
            errMsg.append(expression).append("\n");
            for(int i=0;i<expression.length();i++){
                if(i==position){
                    errMsg.append('^');
                }else{
                    errMsg.append('_');
                }
            }
            return errMsg.toString();
        }
    }

    /*Lexer starts here*/

    public static enum TokenType{
        NUMERIC,VARIABLE, OPERATOR, PARENTHESES
    }

    public static enum LexerStates{
        NUMERIC, SPACE, VARIABLE, BEGIN
    }
    public static class Token{
        public String value;
        public int position;
        public TokenType type;
        public String toString(){
            return type+":"+value;
        }
    }

    public static void error(String expr, int pos, String message) throws ExpressionException{
        throw new ExpressionException(expr, pos, message);
    }

    public static List<Token> lex(String expr) throws ExpressionException{
        List<Token> tokenList=new ArrayList<ExpressionCompiler.Token>();
        LexerStates state=LexerStates.BEGIN;
        char [] exprChars = expr.toCharArray();
        StringBuilder value=null;
        for(int i=0;i<exprChars.length;i++){
            char c=exprChars[i];

            switch(state){
            case BEGIN:
                if(c=='+'||c=='-'){
                    Token token=new Token();
                    token.position=i;
                    token.value=Character.valueOf(c).toString();
                    token.type=TokenType.OPERATOR;
                    tokenList.add(token);
                }else if(c>='0'&&c<='9'||c=='.'){
                    value=new StringBuilder();
                    value.append(c);
                    state=LexerStates.NUMERIC;
                }else if(c=='('){
                    Token token=new Token();
                    token.position=i;
                    token.value=Character.valueOf(c).toString();
                    token.type=TokenType.PARENTHESES;
                    tokenList.add(token);
                }else if((c>='A'&&c<='Z')||(c>='a'&&c<='z')||c=='$'||c=='_'){
                    value=new StringBuilder();
                    value.append(c);
                    state=LexerStates.VARIABLE;
                }else if(c==' '){
                    //do not do anything for white space
                }else{
                    error(expr, i, "Invalid character: "+c);
                }
                break;
            case NUMERIC:
                if(c=='+'||c=='-'||c=='*'||c=='/'){
                    Token token=new Token();
                    token.position=i;
                    token.value=value.toString();
                    token.type=TokenType.NUMERIC;
                    tokenList.add(token);

                    token=new Token();
                    token.position=i;
                    token.value=Character.valueOf(c).toString();
                    token.type=TokenType.OPERATOR;
                    tokenList.add(token);

                    state=LexerStates.BEGIN;
                }else if(c>='0'&&c<='9'||c=='.'){
                    if(c=='.'&&value.indexOf(".")>=0){
                        error(expr, i, "Invalid character: "+c);
                    }
                    value.append(c);
                }else if(c==')'){
                    Token token=new Token();
                    token.position=i;
                    token.value=value.toString();
                    token.type=TokenType.NUMERIC;
                    tokenList.add(token);

                    token=new Token();
                    token.position=i;
                    token.value=Character.valueOf(c).toString();
                    token.type=TokenType.PARENTHESES;
                    tokenList.add(token);
                    state=LexerStates.SPACE;
                }else if(c==' '){
                    Token token=new Token();
                    token.position=i;
                    token.value=value.toString();
                    token.type=TokenType.NUMERIC;
                    tokenList.add(token);
                    state=LexerStates.SPACE;
                }else{
                    error(expr, i, "Invalid character: "+c);
                }
                break;
            case SPACE:
                if(c=='+'||c=='-'||c=='*'||c=='/'){
                    Token token=new Token();
                    token=new Token();
                    token.position=i;
                    token.value=Character.valueOf(c).toString();
                    token.type=TokenType.OPERATOR;
                    tokenList.add(token);

                    state=LexerStates.BEGIN;
                }else if(c>='0'&&c<='9'||c=='.'){
                    value=new StringBuilder();
                    value.append(c);
                    state=LexerStates.NUMERIC;
                }else if(c=='('){
                    Token token=new Token();
                    token.position=i;
                    token.value=Character.valueOf(c).toString();
                    token.type=TokenType.PARENTHESES;
                    tokenList.add(token);
                    state=LexerStates.BEGIN;
                }else if(c==')'){
                    Token token=new Token();
                    token.position=i;
                    token.value=Character.valueOf(c).toString();
                    token.type=TokenType.PARENTHESES;
                    tokenList.add(token);
                    state=LexerStates.SPACE;
                }else if((c>='A'&&c<='Z')||(c>='a'&&c<='z')||c=='$'||c=='_'){
                    value=new StringBuilder();
                    value.append(c);
                    state=LexerStates.VARIABLE;
                }else if(c==' '){
                    //do not do anything for white space
                }else{
                    error(expr, i, "Invalid character: "+c);
                }
                break;
            case VARIABLE:
                if((c>='A'&&c<='Z')||(c>='a'&&c<='z')||c=='$'||c=='_'||c>='0'&&c<='9'){
                    value.append(c);
                }else if(c=='+'||c=='-'||c=='*'||c=='/'){
                    Token token=new Token();
                    token.position=i;
                    token.value=value.toString();
                    token.type=TokenType.VARIABLE;
                    tokenList.add(token);

                    token=new Token();
                    token.position=i;
                    token.value=Character.valueOf(c).toString();
                    token.type=TokenType.OPERATOR;
                    tokenList.add(token);

                    state=LexerStates.BEGIN;
                }else if(c==' '){
                    Token token=new Token();
                    token.position=i;
                    token.value=value.toString();
                    token.type=TokenType.OPERATOR;
                    tokenList.add(token);
                    state=LexerStates.SPACE;
                }else if(c==')'){
                    Token token=new Token();
                    token.position=i;
                    token.value=value.toString();
                    token.type=TokenType.VARIABLE;
                    tokenList.add(token);

                    token=new Token();
                    token.position=i;
                    token.value=Character.valueOf(c).toString();
                    token.type=TokenType.PARENTHESES;
                    tokenList.add(token);
                    state=LexerStates.SPACE;
                }else{
                    error(expr, i, "Invalid character: "+c);
                }
                break;
            }
        }
        //Now need to add the remaining upto the end

        switch(state){
        case NUMERIC:
            Token token=new Token();
            token.position=exprChars.length-1;
            token.value=value.toString();
            token.type=TokenType.NUMERIC;
            tokenList.add(token);
            break;
        case VARIABLE:
            token=new Token();
            token.position=exprChars.length-1;
            token.value=value.toString();
            token.type=TokenType.VARIABLE;
            tokenList.add(token);
            break;
        }

        return tokenList;
    }

    /*Parser starts here*/
    public static interface Symbol{
        public int getPosition();
    }

    public static interface NonTerminal extends Symbol{

    }

    public static class UnaryNonTerminal implements NonTerminal{
        public String operator;
        public Symbol operand;
        @Override
        public int getPosition() {
            // TODO Auto-generated method stub
            return operand.getPosition();
        }

        public String toString(){
            return operand.toString()+operator+":U ";
        }
    }

    public static class BinaryNonTerminal implements NonTerminal{
        public String operator;
        public Symbol operandLeft;
        public Symbol operandRight;
        @Override
        public int getPosition() {
            // TODO Auto-generated method stub
            return operandRight.getPosition();
        }

        public String toString(){
            return operandLeft.toString()+operandRight.toString()+operator+" ";
        }
    }

    public static class Terminal implements Symbol{
        public Token terminalToken;

        @Override
        public int getPosition() {
            // TODO Auto-generated method stub
            return terminalToken.position;
        }

        public String toString(){
            return terminalToken.value+" ";
        }
    }

    public static Symbol parse(List<Token> tokens, String expr) throws ExpressionException{
        Stack<Symbol> symbolStack=new Stack<Symbol>();
        WhetherLookAheadFailed failed=new WhetherLookAheadFailed();
        for(int i=0;i<tokens.size();i++){
            Token token=tokens.get(i);
            Terminal nextSymbol=null;
            if(i<tokens.size()-1){
                nextSymbol=new Terminal();
                nextSymbol.terminalToken=tokens.get(i+1);
            }
            Terminal terminal=new Terminal();
            terminal.terminalToken=token;

            symbolStack.push(terminal);
            while(doCheckBinary(symbolStack, nextSymbol, failed)
                    ||(!failed.lookAheadFail
                            &&(doCheckUnary(symbolStack)
                                    ||doCheckParen(symbolStack, expr))));

        }

        if(symbolStack.size()==1){
            return symbolStack.pop();
        }else if(symbolStack.size()>=1){
            Symbol prob=symbolStack.get(symbolStack.size()-1);
            error(expr, prob.getPosition(),"Parsing error");
        }
        return null;
    }

    public static class WhetherLookAheadFailed{
        public boolean lookAheadFail=false;
    }

    //Check wheather the top of the stack can be reduced to a binary non-terminal. If possible, do it
    public static boolean doCheckBinary(Stack<Symbol> symbolStack, Symbol nextSymbol, WhetherLookAheadFailed failed){
        failed.lookAheadFail=false;
        if(symbolStack.size()<3){
            return false;
        }
        Symbol rightSymbol=symbolStack.get(symbolStack.size()-1);
        Symbol leftSymbol=symbolStack.get(symbolStack.size()-3);
        Symbol midSymbol=symbolStack.get(symbolStack.size()-2);
        if(midSymbol instanceof Terminal){
            Terminal mid=(Terminal) midSymbol;
            if(mid.terminalToken.type==TokenType.OPERATOR){

                if((rightSymbol instanceof NonTerminal
                        ||((Terminal)rightSymbol).terminalToken.type==TokenType.NUMERIC
                        ||((Terminal)rightSymbol).terminalToken.type==TokenType.VARIABLE)
                        &&(leftSymbol instanceof NonTerminal
                                ||((Terminal)leftSymbol).terminalToken.type==TokenType.NUMERIC
                                ||((Terminal)leftSymbol).terminalToken.type==TokenType.VARIABLE)){

                    if("+".equals(mid.terminalToken.value)||"-".equals(mid.terminalToken.value)){
                        if(nextSymbol!=null&&nextSymbol instanceof Terminal){
                            if("*".equals(((Terminal)nextSymbol).terminalToken.value)
                                    ||"/".equals(((Terminal)nextSymbol).terminalToken.value)){

                                //Did a look ahead, no go here
                                failed.lookAheadFail=true;
                                return false;
                            }
                        }
                    }

                    symbolStack.pop();
                    symbolStack.pop();
                    symbolStack.pop();

                    BinaryNonTerminal binaryNonTerminal=new BinaryNonTerminal();
                    binaryNonTerminal.operandLeft=leftSymbol;
                    binaryNonTerminal.operandRight=rightSymbol;
                    binaryNonTerminal.operator=mid.terminalToken.value;
                    symbolStack.push(binaryNonTerminal);
                    return true;
                }
            }
        }
        return false;
    }

    //Check wheather the top of the stack can be reduced to a unary non-terminal
    public static boolean doCheckUnary(Stack<Symbol> symbolStack){
        if(symbolStack.size()<2){
            return false;
        }
        Symbol rightSymbol=symbolStack.get(symbolStack.size()-1);
        Symbol midSymbol=symbolStack.get(symbolStack.size()-2);

        if(midSymbol instanceof Terminal){
            Terminal mid=(Terminal) midSymbol;
            if(mid.terminalToken.type==TokenType.OPERATOR&&("+".equals(mid.terminalToken.value)||"-".equals(mid.terminalToken.value))){
                if((rightSymbol instanceof NonTerminal
                        ||((Terminal)rightSymbol).terminalToken.type==TokenType.NUMERIC
                        ||((Terminal)rightSymbol).terminalToken.type==TokenType.VARIABLE)){

                    symbolStack.pop();
                    symbolStack.pop();

                    UnaryNonTerminal unaryNonTerminal=new UnaryNonTerminal();
                    unaryNonTerminal.operator=mid.terminalToken.value;
                    unaryNonTerminal.operand=rightSymbol;
                    symbolStack.push(unaryNonTerminal);
                    return true;
                }
            }
        }

        return false;
    }

    //Check wheather the top of the stack can be reduced to a parentheses non-terminal
    public static boolean doCheckParen(Stack<Symbol> symbolStack, String expr) throws ExpressionException{
        if(symbolStack.size()<3){
            return false;
        }
        Symbol rightSymbol=symbolStack.get(symbolStack.size()-1);
        Symbol leftSymbol=symbolStack.get(symbolStack.size()-3);
        Symbol midSymbol=symbolStack.get(symbolStack.size()-2);
        if(rightSymbol instanceof Terminal && leftSymbol instanceof Terminal){
            if(")".equals(((Terminal)rightSymbol).terminalToken.value) && "(".equals(((Terminal)leftSymbol).terminalToken.value)){
                if((midSymbol instanceof NonTerminal
                        ||((Terminal)midSymbol).terminalToken.type==TokenType.NUMERIC
                        ||((Terminal)midSymbol).terminalToken.type==TokenType.VARIABLE)){
                    symbolStack.pop();
                    symbolStack.pop();
                    symbolStack.pop();
                    symbolStack.push(midSymbol);
                    return true;
                }else{
                    error(expr, ((Terminal)midSymbol).terminalToken.position, "Invalid Symbol");
                }
            }
        }
        return false;

    }

    /*Intermediate code generation. Simple post order traversal, no rocket science.*/
    public static List<Symbol> flatten(Symbol top){
        List<Symbol> symbols=new ArrayList<Symbol>();
        Stack<Symbol> depthStack=new Stack<Symbol>();
        depthStack.push(top);

        while(depthStack.size()>0){
            Symbol current=depthStack.pop();
            if(current instanceof Terminal){
                symbols.add(current);
            }else if(current instanceof BinaryNonTerminal){
                BinaryNonTerminal binaryNonTerminal=(BinaryNonTerminal) current;
                if(binaryNonTerminal.operandLeft!=null){
                    depthStack.push(binaryNonTerminal);
                    depthStack.push(binaryNonTerminal.operandRight);
                    depthStack.push(binaryNonTerminal.operandLeft);
                    binaryNonTerminal.operandRight=null;
                    binaryNonTerminal.operandLeft=null;
                }else{
                    symbols.add(current);
                }
            }else if(current instanceof UnaryNonTerminal){
                UnaryNonTerminal unaryNonTerminal=(UnaryNonTerminal) current;
                if(unaryNonTerminal.operand!=null){
                    depthStack.push(unaryNonTerminal);
                    depthStack.push(unaryNonTerminal.operand);
                    unaryNonTerminal.operand=null;
                }else{
                    symbols.add(current);
                }
            }
        }

        return symbols;
    }

    public static List<String> varList(List<Symbol>  symbols){
        List<String> varList=new ArrayList<String>();
        for(Symbol s:symbols){
            if(s instanceof Terminal && ((Terminal)s).terminalToken.type==TokenType.VARIABLE){
                if(!varList.contains( ((Terminal)s).terminalToken.value)){
                    varList.add(((Terminal)s).terminalToken.value);
                }
            }
        }
        return varList;
    }

    /*Optimization. Will eliminate all unary + and consecutive unary -*/
    public static List<Symbol> optimize(List<Symbol>  symbols){
        List<Symbol> returnList=new ArrayList<Symbol>();
        boolean gotUMinus=false;
        Symbol lastMinus=null;
        for(Symbol s:symbols){
            if(s instanceof UnaryNonTerminal){
                UnaryNonTerminal unaryNonTerminal=(UnaryNonTerminal) s;
                if("+".equals(unaryNonTerminal.operator)){
                    continue;//Why on earth do you write an unary plus anyway?
                }else{
                    lastMinus=s;
                    if(gotUMinus){
                        gotUMinus=false;
                        continue;
                    }else{
                        gotUMinus=true;
                    }
                }
            }else{
                if(gotUMinus){
                    returnList.add(lastMinus);
                    gotUMinus=false;
                }
                returnList.add(s);
            }
        }
        return returnList;
    }

    public static void generateCode(String className, List<String> varList, List<Symbol> source){

        //Create a ClassGen for our brand new class.
        ClassGen classGen=new ClassGen(className, "java.lang.Object",className.substring(className.lastIndexOf('.')+1)+".exp", Constants.ACC_PUBLIC, null);

        //Get a reference to the constant pool of the class. This will be modified as we add methods, fields etc. Note that it already constains
        //a few constants.
        ConstantPoolGen constantPoolGen=classGen.getConstantPool();

        //The list of instructions for a method.
        InstructionList instructionList=new InstructionList();

        instructionList.append(new GETSTATIC(constantPoolGen.addFieldref("java.lang.System", "out", "Ljava/io/PrintStream;")));

        for(Symbol s:source){
            if(s instanceof Terminal){
                Terminal terminal=(Terminal) s;
                if(terminal.terminalToken.type==TokenType.NUMERIC){
                    instructionList.append(new LDC2_W(constantPoolGen.addDouble(Double.parseDouble(terminal.terminalToken.value))));
                }else if(terminal.terminalToken.type==TokenType.VARIABLE){
                    int indexOfVariable=varList.indexOf(terminal.terminalToken.value);
                    instructionList.append(new ALOAD(0));//the argument of the main function
                    instructionList.append(new LDC(constantPoolGen.addInteger(indexOfVariable)));
                    instructionList.append(new AALOAD());//Got value of the variable as a String;
                    instructionList.append(new INVOKESTATIC(constantPoolGen.addMethodref("java.lang.Double", "parseDouble", "(Ljava/lang/String;)D")));//Now we got the value as double
                }
            }else if(s instanceof UnaryNonTerminal){
                UnaryNonTerminal nonTerminal=(UnaryNonTerminal) s;
                if(nonTerminal.operator.equals("-")){//This is supposed to be the only unary operator avaiable
                    instructionList.append(new DNEG());
                }
            }else{
                BinaryNonTerminal nonTerminal=(BinaryNonTerminal) s;
                if(nonTerminal.operator.equals("+")){
                    instructionList.append(new DADD());
                }else if(nonTerminal.operator.equals("-")){
                    instructionList.append(new DSUB());
                }else if(nonTerminal.operator.equals("*")){
                    instructionList.append(new DMUL());
                }else if(nonTerminal.operator.equals("/")){
                    instructionList.append(new DDIV());
                }
            }
        }
        //Invoke println. we already have the object ref in the stack
        instructionList.append(new INVOKEVIRTUAL(constantPoolGen.addMethodref("java.io.PrintStream", "println", "(D)V")));

        //Return from the method
        instructionList.append(new RETURN());


        MethodGen methodGen=new MethodGen(Constants.ACC_PUBLIC|Constants.ACC_STATIC, Type.VOID, new Type[]{new ArrayType(Type.STRING, 1)}, new String[]{"args"}, "main", className, instructionList, constantPoolGen);

        methodGen.setMaxLocals();//Calculate the maximum number of local variables.
        methodGen.setMaxStack();//Very important: must calculate the maximum size of the stack.

        classGen.addMethod(methodGen.getMethod()); //Add the method to the class

        //Print a few things.
        System.out.println("********Constant Pool**********");
        System.out.println(constantPoolGen.getFinalConstantPool());
        System.out.println("********Method**********");
        System.out.println(methodGen);
        System.out.println("********Instruction List**********");
        System.out.println(instructionList);


        //Now generate the class
        JavaClass javaClass=classGen.getJavaClass();
        try {
            //Write the class byte code into a file
            javaClass.dump(className.replace('.', '/')+".class");

        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

        //That's it.
    }


}

To run this program, run java ExpressionCompiler "2+3.65*8-+adc_v* (-(+1))*(adf+$v3)+2.35 ". It will create a class com.geekyarticles.bcel.ExpressionTest. Now you can run java com.geekyarticles.bcel.ExpressionTest 1 2 3 to see the result.

Thursday, August 18, 2011

Manipulating Java Class Files with BCEL - Part Two: Expressions

No comments:

Post a Comment