BYACC/JJava extension v 1.15 |
BYACC/J is an extension of the Berkeley v 1.8 YACC-compatible parser generator. Standard YACC takes a YACC source file, and generates one or more C files from it, which if compiled properly, will produce a LALR-grammar parser. This is useful for expression parsing, interactive command parsing, and file reading. Many megabytes of YACC code have been written over the years.
This is the standard YACC tool that is in use every day to produce C/C++ parsers. I have added a "-J" flag which will cause BYACC to generate Java source code, instead. So there finally is a YACC for Java now!
Of course, the original YACC design is about twenty years old now,
and newer and better technologies are currently available. I think
Jacc is great, and so is Java Cup. Both of these
provide more thorough parsing of LALR and LL grammars than the
venerable YACC. Yet the idea of a YACC for Java is, in my opinion,
extremely valuable.
Several benefits are derived from a Java
parser-generator of this sort:
BYACC/J can be executed from existing Makefiles and IDE's.
BYACC/J is coded in C, so the generation of Java code is extremely fast.
The resulting byte code is small -- starting at about 11 kbytes.
Only one or two classfiles are included. If you need only a single type or an Object class, then one class file is generated. If you need a simple generic type, a simple data class is generated for you, making another small file.
No additional runtime libraries are required. The generated source code is the entire parser.
It can parse existing YACC grammars, enabling the 'Javanizing' ;-) of a large installed base of YACC source code (of course, your 'actions' need to be in Java).
Many developers are already very familiar with the workings of YACC.
It is absolutely free; no license, no royalties, free!
First of all, read a YACC book. Since this is actual Berkeley YACC, all of the usual procedures using YACC apply here. Some good descriptions and tutorials of YACC grammar and procedures are available in book form and on the Net. Here is the standard YACC manual page. Since Java's syntax is different from C, certain format differences must be followed. A typical YACC source file consists of the following:
The first part of the file is the DECLARATIONS area, where you define tokens, precedences, etc.
The second part is the ACTIONS area, where the grammar and the user's C actions are parsed.
The third part is the CODE area, where user functions are added.
They are separated by '%%' at the start of a line. Visually:
DECLARATIONS
%%
ACTIONS
%%
CODE
Portions of the file can be set off and ignored by YACC by
surrounding them with %{ and %} . This ability is typically used in
YACC C files to insert definitions and #include statements.
BYACC/J uses this area for the Java package and import
statements. Everything after this will be wrapped in a Java class
called Parser. Thus all functions you write will become
methods belonging to Parser.
All of the user actions you
write will be Java code inserted into a method called yyparse().
This means that only curly braces '{,}' are allowed. No
classes or methods can be defined in the ACTIONS area. Of course,
your code can instantiate classes and call methods.
Here is our example Java implementation of the classic YACC calculator demo:
%{ import java.lang.Math; import java.io.*; import java.util.StringTokenizer; %} /* YACC Declarations */ %token NUM %left '-' '+' %left '*' '/' %left NEG /* negation--unary minus */ %right '^' /* exponentiation */ /* Grammar follows */ %% input: /* empty string */ | input line ; line: '\n' | exp '\n' { System.out.println(" " + $1.dval + " "); } ; exp: NUM { $$ = $1; } | exp '+' exp { $$ = new ParserVal($1.dval + $3.dval); } | exp '-' exp { $$ = new ParserVal($1.dval - $3.dval); } | exp '*' exp { $$ = new ParserVal($1.dval * $3.dval); } | exp '/' exp { $$ = new ParserVal($1.dval / $3.dval); } | '-' exp %prec NEG { $$ = new ParserVal(-$2.dval); } | exp '^' exp { $$ = new ParserVal(Math.pow($1.dval, $3.dval)); } | '(' exp ')' { $$ = $2; } ; %% String ins; StringTokenizer st; void yyerror(String s) { System.out.println("par:"+s); } boolean newline; int yylex() { String s; int tok; Double d; //System.out.print("yylex "); if (!st.hasMoreTokens()) if (!newline) { newline=true; return '\n'; //So we look like classic YACC example } else return 0; s = st.nextToken(); //System.out.println("tok:"+s); try { d = Double.valueOf(s);/*this may fail*/ yylval = new ParserVal(d.doubleValue()); //SEE BELOW tok = NUM; } catch (Exception e) { tok = s.charAt(0);/*if not float, return char*/ } return tok; } void dotest() { BufferedReader in = new BufferedReader(new InputStreamReader(System.in)); System.out.println("BYACC/J Calculator Demo"); System.out.println("Note: Since this example uses the StringTokenizer"); System.out.println("for simplicity, you will need to separate the items"); System.out.println("with spaces, i.e.: '( 3 + 5 ) * 2'"); while (true) { System.out.print("expression:"); try { ins = in.readLine(); } catch (Exception e) { } st = new StringTokenizer(ins); newline=false; yyparse(); } } public static void main(String args[]) { Parser par = new Parser(false); par.dotest(); } |
If this code were in a file called calc.y, you would
yacc-process it with the command:
yacc -J calc.y
This
will generate the file Parser.java, which can then be compiled
by:
javac Parser.java
to create the file Parser.class
which can be run with:
java Parser
The same file, but using the
semantic_type option, as in
yacc -Jsemantic=double tf.y
is available here.
In addition to the normal yacc command line switches, I have
supplied these:
-J Switches from C/C++ to Java output. Not necessary if other -J flags are used.
-Jclass=<classname> Changes the name of the Java class (and .java file) to classname.
-Jpackage=<packagename> Changes the package in which the parser resides from the default <nothing> to packagename.
-Jextends=<extendname> Changes the class the parser extends from the default <nothing> to extendname.
-Jimplements=<implement_name> Changes the interface the parser implements from the default <nothing> to implement_name.
-Jsemantic=<semantic_type> Changes the semantic (value of the rules' variables) type to semantic_type. No extra class will be generated.
-Jnorun Informs Byacc to not generate a run() method. Useful when working with threads.
-Jnoconstruct Informs Byacc to not generate constructors. Useful when extending classes.
-Jstack=<NNN> Changes the stack size from default 500 to NNN.
-Jnodebug Informs Byacc to omit debugging code for further better performance.
-Jfinal This option makes generated class final.
-Jthrows=<excaption_list> Informs Byacc to declare thrown exceptions for yyparse() method.
In order for javac to compile the code properly, the user must supply two methods in the YACC source:
void yyerror(String msg) -- This method is expected by BYACC/J, and is used to provide error messages to be directed to the channels the user desires.
int yylex() -- This method is the one where BYACC/J expects to obtain its input tokens. Wrap any file/string scanning code you have in this function. This method should return <0 if there is an error, and 0 when it encounters the end of input. See the examples to clarify what we mean.
I suggest heartily that the user peruse the file Parser.java to see how YACC's parsing algorithm works. I have done an immense amount of analysis and reverse engineering of the original BYACC sources. The Java code that is generated is heavily commented, and is amenable to debugging, and can provide a nice education in the workings of a YACC parser.
Optionally, the class generated is made an extension of Thread, as a convience, so that parsing may be performed as a background thread, allowing the current execution to continue unimpeded. A run() method and a constructor are also inserted into the code.
However, it may occur that the programmer needs to extend a different class. In this case, the -x<classname> option is provided, which will create an alternate extension. Since it is impossible to predict the needs of the other class, the run() and constructor will be omitted.
Previously, BYACC/J gave the programmer a choice of either double or int semantic (the value of a number or string) values. This worked for very simple parsing, but was extremely limiting. It would have been very difficult to mix value types within a file, thus making things like interpreters and compilers impossible.
The semantic
value is stored in a public class called ParserVal, which is
defined thusly:
public class ParserVal
|
So now a semantic
value can be an int, a double, a String, or an
Object. In your scanner (or something that yylex()
calls), you may use this like:
yylval = new ParserVal(doubleval);
|
And on the Left
Hand Side (the YACC side) you can use the values of the $ and
the $$ just as easily:
$$.ival = $1.ival + $2.ival;
|
A side effect of using this inner class is that the default parser no longer fits into one .class file, however, the resulting ParserVal.class is extremely small.
As time goes on, we will provide some examples and templates to speed you on your way.
Here is an example of what a 3d object file might look like. A corresponding bare-bones YACC parser is implemented in Java! This is also a good example of how to read a file from a URL.
Because someone said YACC couldn't be done in Java. Silly person!
Of course, thanks go to Tom Corbett for BYACC, a fine implementation of YACC. And thanks to his altruistic nature for putting it in the Public Domain. I just added the Java switch. Check the ACKNOWLEDGEMENTS file for more contributors.
Changes
Changes from 0.92 to 1.0a |
This version has one minor change from 0.91. |
New in 1.0a |
We have performed considerable debug, reorganization, and output cleaning. The output code is now Javadoc capable. |
New in 1.1 |
Added the option -Jstack=NNN to allow users to set the semantic stack size. |
New in 1.11 |
-d option is now supported with -J. It creates interface with token constants. |
New in 1.12 |
bug #1406129 [ -Jclass=X12345678 -> Segmentation fault ] fixed |
New in 1.13 |
bug #1506924 - ArrayIndexOutOfBoundException in yyparse() fixed |
New in 1.14 |
Generics can be now used in semantic type. |
New in 1.15 |
bug #1638577 - stack corruption when generating a java parser fixed bug #1630851 - Targets with empty right-hand side problematic in Java mode fixed |
The latest modified/cleaned up/updated Berkeley Yacc source files, GNU
makefile, and Borland C++ 5 Makefile can be obtained bellow, in
addition to several binary builds. These are native console
applications, so they do not require any class libraries to work. Of
course, you will need a Java development environment to process the
generated source files.
And remember, this version of YACC
also parses "standard" C/C++ YACC source files!
Binaries for Solaris/Sparc, Win32, Mac OS X, and Linux/Intel. Source files, GNU and Borland Makefiles in a GZIP TAR file. (WinZip can unpack these!) |
Older releases are available here.
YACC has already been described many times, and in great detail, so I would appreciate that BYACC/J users' questions about YACC and LALR parsers be directed to the many good sources available on the Net and in print. In other words, I will not do your homework for you! ;-) However, I would be happy to help with the Java file generation, as that is the portion that I have implemented.
For more information, please write to Tomas Hurka
Last updated: Nov 28 2008