B A S I L


Basil is a backtracking LR(1) parser generator.  Basil reads a grammar
file and generates data that can be used to initialize an LR(1) parser
with backtracking abilities.  Basil also generates the node classes
that are used by the parser to construct the abstract syntax tree.

Semantic actions are integrated into the parser by using navigators.
Navigators are objects that can traverse the abstract syntax tree,
either bottom-up as the tree is being created, or top-down.  A
separate utility is provided to help write navigators.

A parser built with Basil is event-driven.  Tokens can be passed to
the parser as they are formed.  This model makes it very easy to
string several parsers together.

CONFIGURATION

Basil accepts only the name of the grammar file on the command line.
Options are set in a configuration file.  The name of the
configuration file is controlled by the BASIL_CONFIG_FILE environment
variable.  The default value is "./basil.cfg".

Options are set in the configuration file using keyword-value
notation, one setting per line.  Whitespace must be used to separate
the keyword from the value.  The hash character ('#') starts a
comment.  All remaining characters on the line are ignored.

The following table lists the keywords and their default values.

  ----------------------------------------------------------------------
  Keyword                                Default Value           

  base-node-type                         basl_Node
  base-navigator-type                    basl_Navigator
  base-token-type                        basl_Token
  header-ext                             h
  source-ext                             cpp
  use-underscores-in-names               no
  get-location                           no
  location-type                          basl_Loc
  get-node-ptr                           no
  inline-node-ctor                       no
  inline-node-methods                    no
  node-include-filename                  cc_nodeheaders.h
  parse-tables-filename                  cc_tables.inc
  token-enum-filename                    cc_tokens.inc
  states-filename                        cc_states.txt
  lr-node-type                           basl_LRNode
  lr-node-refptr-type                    basl_LRNodeRefPtr
  lr-node-refptr-header                  basl_LRNodeRefPtr.h
  lr-node-refptr-vector-type             basl_LRNodeRefPtrVector
  lr-node-refptr-vector-const-iter-type  basl_LRNodeRefPtrVectorConstIter
  lr-node-refptr-vector-header           basl_LRNodeRefPtrVector.h
  ----------------------------------------------------------------------

GRAMMAR FILE

The syntax of the grammar file can best be described with its own
grammar.  Below is the complete grammar.  The rule with the left-hand
side "start" is the start rule.  Upper case symbols are tokens.
Comments begin with the hash ('#') character.  This is the same syntax
accepted by Basil.

  # start rule
  start -> file

  # file
  file -> decl-seq

  # declaration sequence
  decl-seq -> decl
  decl-seq -> decl-seq decl

  # declaration
  decl -> rule-decl
  decl -> token-decl

  # rule declaration
  rule-decl -> rule-name-opt rule-symbol ARROW rule-symbol-seq-opt
               follow-seq-opt

  # rule symbol seqence opt
  rule-symbol-seq-opt -> rule-symbol-seq
  rule-symbol-seq-opt -> 

  # rule symbol sequence
  rule-symbol-seq -> rule-symbol
  rule-symbol-seq -> rule-symbol-seq rule-symbol

  # rule symbol
  rule-symbol -> symbol attrib-seq-opt

  # symbol
  symbol -> IDENT

  # attribute sequence opt
  attrib-seq-opt -> attrib-seq
  attrib-seq-opt ->

  # attribute sequence
  attrib-seq -> attrib 
  attrib-seq -> attrib-seq attrib

  # attributes
  attrib -> GT
  attrib -> PLUS
  attrib -> GT_BANG
  attrib -> PLUS_BANG
  attrib -> STAR
  attrib -> LT
  attrib -> NUMBER

  # rule name opt
  rule-name-opt -> rule-name
  rule-name-opt ->

  # rule name
  rule-name -> LBRACK IDENT base-name-opt attrib-name-seq-opt RBRACK
  rule-name -> LBRACK LPAREN IDENT RPAREN RBRACK
  rule-name -> LBRACK RBRACK

  # base name opt
  base-name-opt -> base-name
  base-name-opt -> 

  # base name
  base-name -> COLON IDENT

  # attrib name seq
  attrib-name-seq-opt -> attrib-name-seq 
  attrib-name-seq-opt ->

  # attrib name seq
  attrib-name-seq -> attrib-name
  attrib-name-seq -> attrib-name-seq attrib-name

  # attrib name
  attrib-name -> IDENT

  # follow sequence opt
  follow-seq-opt -> follow-seq
  follow-seq-opt ->

  # follow seq
  follow-seq -> LBRACE rule-symbol-seq RBRACE

  # token declaration
  token-decl -> LBRACK IDENT RBRACK token-name-seq

  # token name seq
  token-name-seq -> token-name
  token-name-seq -> token-name-seq token-name

  # token name
  token-name -> IDENT

The tokens used in the grammar are listed in the table below.

  ------------------------------------------------------------
  ARROW        ->
  IDENT        identifier (may contain '-')
  GT           >
  GT_BANG      >!
  PLUS         +
  PLUS_BANG    +!
  LT           <          
  STAR         *
  NUMBER       unsigned integer
  LBRACK       [
  RBRACK       ]
  LPAREN       (
  RPAREN       )
  LBRACE       {
  RBRACE       }
  COLON        :
  ------------------------------------------------------------

The grammar file must contain at least one start rule.  As in the
grammar above, the rule with the left-hand side "start" indicates a
start rule.  The newline character is used to terminate comments.  In
all other contexts the new newline character is treated as whitespace.
Tokens must be in upper case.

RULE NAME

A rule can optionally have a name.  There are three forms of a rule
name:

  rule-name -> LBRACK IDENT base-name-opt attrib-name-seq-opt RBRACK
  rule-name -> LBRACK LPAREN IDENT RPAREN RBRACK
  rule-name -> LBRACK RBRACK

The first form defines a node class. The identifier names the class.
The parser will construct a new instance of the class when the rule is
reduced.  The instance is used to represent the rule in the abstract
syntax tree.

By default, the class is derived from the base node class.  The value
associated with the configuration keyword base-node-type names the
base node class.  All nodes must derive, directly or indirectly, from
this class, including tokens.  However, instead of deriving directly
from the base node class, you can specify another class by giving a
base-name.  Also, you can specify additional classes to derive from by
specifying an attrib-name-seq.  Typically, these classes contain
attributes.

The second form of a rule name can be used to associate a number of
rules with the same node.  This can prevent writing duplicate code.
The node must be defined once, as above.  The other rules can refer to
the node by using this form of a rule name.

The last form of a rule name only has effect on rules of the form "a
-> b".  The rule name forces a reduction on the rule, just as if the
rule had an associated node.  This is needed because Basil always
encodes the follow symbol as part of a reduction, so reductions of
this form are normally bypassed.  When a reduction is forced, the
child node is not lost, it remains on the stack.

If a rule has more than one symbol on its right-hand side and it does
not have an associated node, the child nodes will be lost when the
rule is reduced.  This is how the tree can be pruned.  Nodes are
reference counted so memory leaks will not occur.

EXPLICIT FOLLOW SEQUENCE

You can give a rule an explicit follow sequence using the syntax:

  follow-seq -> LBRACE rule-symbol-seq RBRACE
  
Any rule can be given a follow sequence but it is normally only used
on start rules to specify a different end token (or tokens).  By
default, EOT is the end token.  Both tokens and nonterminals can be
given.  If a nonterminal is given, all the tokens in its first set are
included.  If any nonterminal symbol in the sequence can go to null,
the computed follow set is also included.  Symbols in this set can
have attributes.

SYMBOL ATTRIBUTES

Basil has a number of symbol attributes, such as GT, PLUS, GT_BANG,
and PLUS_BANG, that are used to resolve conflicts.  All conflicts must
be resolved.  That is, you must indicate an order for the parser to
try the actions on a conflict.  This is done by specifying priorities.

On a conflict, the action that leads to parsing the token with the
highest priority is tried first.  If the action results later in a
syntax error (possibly due to a semantic check), the parser will
bactrack and try the action that leads to parsing the token with the
second highest priority, and so on.  For example, consider the
shift/reduce conflict in the rules below.  As described in more detail
below, the GT attribute gives X in the first rule a priority of one.
Because it has a higher priority than the X in the second rule, the
parser will reduce first.

  1) a -> b X >
  2) b -> X
  3) b ->


If one of the tokens involved in a conflict has exclusive priority,
only the action that leads to parsing it will be tried.  In other
words, the parser will not guess.  If more than one token involved in
a conflict has exclusive priority, the one with the highest priority
is picked.

SHIFT PRIORITY

The attribute GT gives a symbol shift priority.  One or more GT
attributes can be given to a symbol.  Basically, it means you want to
shift the symbol.  If given to a token, the token is simply given
priority.  However, if given to a nonterminal, all tokens derived from
it are given priority.

To see how this works, consider the following partially parsed rule in
some state.  The symbol Y here can either be a token or a nonterminal.

  X -> ...  .Y  ...

Let "..." mean a sequence of zero or more symbols.  The dot in this
rule is just before the symbol Y.  Suppose we define the following
variables.

  o Let A be the shift priority of the rule.  A is computed as
    described here.

  o Let B be the shift priority of X.

  o Let C be the shift priority of Y.

  o Let T = A + B + C.

If Y is a token, then Y in this context is given a priority of T.  If
Y is a nonterminal, then the shift priority of every nonkernel rule in
the state that has the left-hand side Y is set to T, if T is greater.
This process continues over all rules in the state until the shift
priorities are distributed.  The shift priority of the rule remains
with the rule as it is parsed (as the dot moves to the right).

So, for example, the token C below has a total priority of three
in the initial state.

  0) start -> a
  1) a -> b >
  2) b -> c >
  3) c > -> C

If the grammar is circular, the shift priorities are correctly
distributed.

EXCLUSIVE SHIFT PRIORITY

The attribute GT_BANG gives a symbol exclusive shift priority.
Exclusive shift priority is a flag.  Otherwise, it behaves like the GT
attribute except it gives exclusive priority to a token.

REDUCE PRIORITY

The attribute PLUS gives a symbol reduce priority.  One or more PLUS
attributes can be given to a symbol.  It has effect only on
nonterminals.  Basically, it means you want to reduce the symbol.  It
gives priority to all tokens following it.

To see how this works, consider the the following partially parsed
rule in some state.  The symbol Y here is a nonterminal.

  X -> ...  .Y  Z1  Z2  ...  ZN

Again, the dot in this rule is just before the symbol Y.  N symbols
follow Y: Z1 through ZN.  Suppose we define the following variables.

  o Let A be the shift priority of the rule.

  o Let B be the shift priority of X.

  o Let C be the reduce priority of Y.

  o Let T = A + B + C.

  o Let D be the reduce priority of X.
  
  o Let S = C + D.

  o Let FIRST(Z) be the set of tokens that begin the strings derived
    from symbol Z.  Each token in this set has an associated priority.

  o Let FOLLOW(X) be the set of tokens that can appear to the right of
    the rule in this context.  Each token in this set has an
    associated priority and a nonterminal symbol called the shortcut.
    FOLLOW(X) is computed as described here.

  o Let FOLLOW(Y) be the set of tokens that can appear to the right of
    Y in this context.  Each token in this set has an associated
    priority and a nonterminal symbol called the shortcut.  FOLLOW(Y)
    is computed as described here.
 
So then, FOLLOW(Y) is computed as follows:

  For i = 1 to N:
    For each token in FIRST(Zi):
      Let U be the associated priority of the token.
      Let V = U + T.
      Add token with reduce priority V and shortcut Y to FOLLOW(Y).
    Continue only if Zi can go to null.

  If N is 0, or if all Z1 through ZN can go to null:
    If the rule is "X -> Y" and it does not have a rule name:
       Add FOLLOW(X) to FOLLOW(Y), adding S to each priority in
       FOLLOW(X).
    Otherwise:
       Add FOLLOW(X) to FOLLOW(Y), adding S to each priority in
       FOLLOW(X), and setting Y as the shortcut.

FOLLOW(Y) is then merged with the FOLLOW set of every nonkernel rule
that has the left-hand side Y.  This procedure continues over all
rules in the state until nothing can be added to any FOLLOW set.  The
FOLLOW set remains with the rule as it is parsed (as the dot moves to
the right).

The reduce actions in a state are formed from the FOLLOW sets of the
rules with the dot after the last symbol on the right-hand side.  For
each token in such a FOLLOW set, a reduce action on the token is
formed.  The shortcut of the token becomes the follow symbol of the
reduce action.  The priority of the reduce action is the priority of
the token plus the reduce priority of the left-hand side of the rule.

So, here is an example:

  0) start -> a + d >>
  1) a -> X >>
  2) a ->
  3) d -> X

In the initial state, X will have a shift priority of two and
a reduce priority of three, so the parser will reduce first.

EXCLUSIVE REDUCE PRIORITY

The attribute PLUS_BANG gives a symbol exclusive reduce priority.
Exclusive reduce priority is a flag.  Otherwise, it behaves like the
PLUS attribute except it gives exclusive priority to a token.

ACCEPT

The attribute STAR is used to indicate an accepting symbol.  It has
effect only on nonterminals.  When the parser follows an accepting
symbol all pending parses from the start of the symbol are canceled.

STICKY FOLLOW

The attribute LT is used to keep a follow set on a reduction.  It has
effect only on nonterminal symbols.  Like Yacc, the most common
reduction in a state is performed by default.  This technique greatly
reduces the size of the parse tables.  The LT attribute overrides this
behavior.  When the attribute LT is used on a symbol, the symbol will
be followed only if a valid token is next, never by default.

To see how this works, take a look how rule-decl is actually written.

  # rule declaration
  rule-decl <* -> rule-name-opt rule-symbol ARROW rule-symbol-seq-opt >
                  follow-seq-opt

rule-symbol-seq-opt is given shift priority so that the parser first
tries any rule-symbols after the ARROW as the right-hand side of the
rule.  This is necessary because here rule-decl is an accepting
symbol, so when followed, or when the rule is reduced, all pending
parses from the start of the symbol will be canceled.  rule-decl is
given the LT attribute as well so that the rule will be reduced only
when a valid token is next.  This will prevent the parser from
including in the right-hand rule-symbol set the left-hand rule-symbol
of the following rule.

LEXICAL STATE

The final attribute that can be given to a symbol is a number.  This
number represents the lexical state of the symbol.  It is used to
provide context dependent lexing.  The lexer can query the parser for
the current lexical state before it lexes the next token.  Every
parsing state has a corresponding lexical state.  By default, the
lexical state is 0.  Below is an example.

  1) start -> a b
  2) a -> X
  3) b -> Y 1
  4) b -> Z
   
Here, after the parser shifts the X, the lexical state will be 1.

Basil will produce an error if there is a lexical state conflict.  For
example, lexical states 1 and 2 conflict in the grammar below.

  1) start -> a b
  2) a -> X
  3) b -> Y 1
  4) b -> Z 2

A lexical state can be given to nonterminals and tokens in an explicit
follow set.

TOKEN DECLARATION

The configuration keyword base-token-type must name the base type of
your tokens.  By default, Basil will assume your tokens are of this
type.  However, a token declaration can be used to specify the exact
type of one or more tokens.  The tokens types are needed in order to
generate the node classes.  Below is an example token declaration.

  [XToken] X Y Z

Here, tokens X, Y and Z are given the type XToken.

A token declaration can also be used to introduce additional tokens.

NODES

Basil generates a node class for every node that is defined in the
grammar file.  One header file and one source file is generated per
class.  Consider the following grammar that accepts qualified type
names.

  # start rule
  start -> type-name

  # type name
  [basl_TypeNameNode] type-name -> nested-name-spec-opt IDENT

  # nested name specifier opt
  nested-name-spec-opt ->
  nested-name-spec-opt -> nested-name-spec

  # nested name specifier
  [basl_NestedNameSpec1Node : basl_NestedNameSpecNode]
      nested-name-spec -> DCOLON
   
  [basl_NestedNameSpec2Node : basl_NestedNameSpecNode]
      nested-name-spec -> nested-name-spec-opt IDENT DCOLON

Basil will generate four node classes from this grammar.  The files
it will generate are:

  basl_Node.h
  basl_Node.cpp
  basl_TypeNameNode.h
  basl_TypeNameNode.cpp
  basl_NestedNameSpec1Node.h
  basl_NestedNameSpec1Node.cpp
  basl_NestedNameSpec2Node.h
  basl_NestedNameSpec2Node.cpp

Notice that the base node class is also generated.  Here, it is named
basl_Node, the default name.  The header and source file extensions
can be changed in the configuration file.  The class
basl_NestedNameSpecNode is not generated because Basil simply does not
know anything about it.  Most likely, basl_NestedNameSpecNode would
contain one or more attributes or derive from one or more attribute
classes.

The generated file basl_TypeNameNode.h is shown below.

  ------------------------------------------------------------
  // basl_TypeNameNode.h
  //
  // type-name -> nested-name-spec-opt IDENT
  //
  
  #ifndef basl_TypeNameNode_h
  #define basl_TypeNameNode_h
  
  #include "basl_LRNodeRefPtr.h"
  #include "basl_LRNodeRefPtrVector.h"
  #include "basl_NestedNameSpecNode.h"
  #include "basl_Node.h"
  #include "basl_Token.h"
  
  //
  // basl_TypeNameNode
  //
  
  class basl_TypeNameNode : public basl_Node
  {
    // methods
  public:
    // constructor
    basl_TypeNameNode (basl_LRNodeRefPtrVectorConstIter children);
  
    // destructor
    ~ basl_TypeNameNode ();
  
    // true if nested-name-spec-opt is not null
    bool hasNestedNameSpecOpt () const;
  
    // get nested-name-spec-opt
    basl_NestedNameSpecNode & getNestedNameSpecOpt () const;
  
    // get IDENT
    basl_Token & getIDENT () const;
  
    // children
  private:
    // nested-name-spec-opt
    basl_LRNodeRefPtr m_nestedNameSpecOpt;
  
    // IDENT
    basl_LRNodeRefPtr m_IDENT;
  
    // methods
  public:
    // navigate
    void navigate (basl_Navigator const & navigator);
  };
  
  #endif
  ------------------------------------------------------------

This file includes the base node header file, the child node header
files, and two additional files containing typedefs.  The child node
header files are included only if Basil can determine the actual node
types of its children.  Here, the first child node,
nested-name-spec-opt, if not null, could be one of two types.
However, both types derive from a common base.  The two additional
files are:

  o  basl_LRNodeRefPtr.h.  Must contain the typedef:

       basl_LRNodeRefPtr, a reference counted pointer to an
       basl_LRNode.  basl_LRNode is the real base class of all nodes.

  o  basl_LRNodeRefPtrVector.h.  Must contain the two typedefs:

       basl_LRNodeRefPtrVector, a vector of basl_LRNodeRefPtr.

       basl_LRNodeRefPtrVectorConstIter, the const iterator of the
       vector.
       
The filenames and typedef names can be changed in the configuration
file.

The rest of the header file is self explanatory.  The last function,
navigate, is part of the navigator mechanism.  Navigators are
described next.

NAVIGATORS

A navigator traverses the abstract syntax tree using the double
dispatch mechanism.  Navigators all derive from a base navigator
class.  The base navigator class is named basl_Navigator by default.
Basil generates this class.  The name can be changed in the
configuration file.

In the base node class, a virtual function called navigate initiates
the double dispatch mechanism.  navigate takes one argument, the
navigator.  Every generated leaf node class overrides navigate.  In
the body of navigate, each class calls an overloaded virtual function
of the navigator passing itself (*this) as the argument.  Tokens
must not override navigate.

Since writing a navigator by hand would be onerous, a separate utility
called Navgen can be used to generate the navigator class from a
specification language.  Navgen generates a static interface function
of the navigator called submit that creates an instance of the
navigator then passes it to the navigate function of a specified node.
submit can take additional arguments and return a value.  Navgen is
described in a separate document.

A navigator can optionally be slippery.  Top-down navigators tend to
be slippery, while bottom-up navigators do not.  By default, if a
navigator is slippery, it will slip down certain kinds of nodes in the
abstract syntax tree.  Nodes like the following:

  a -> b
  a -> a SOME_TOKEN b
  a -> a b

Typically, on nodes like these, the only action would be to send the
navigator down to the child (nonterminal) nodes, from left to right.
A slippery navigator does this by default.  The default action is
implemented in the base navigator class.  The slippery flag is passed
to the constructor of the base navigator.

PARSER DATA FILES

Basil generates the following data files.  The filenames can be
changed in the configuration file.

  o  cc_nodeheaders.h

     This file contains include preprocessor directives for all the
     generated node header files.  It is included in cc_tables.inc and
     in the base navigator source file.

  o  cc_tables.inc

     This file contains the parser tables.  This file must be included
     in your source file that creates the one instance of your
     basl_LRParserData object (see below).

  o  cc_tokens.inc

     This file contains an enum enumerating all token kinds.
     Typically, this file is included inside your base token class, or
     inside a token kind struct.

  o  cc_states.txt

     This file is a human-readable listing of the backtracking LR(1)
     parse states.

CONSTRUCTING A PARSER

Basil generates only the data for your parser.  You must build the
actual parser.  You can create either an LR(1) parser or an LR(k)
parser, depending upon your grammar.  If your grammar is LR(1), an
LR(1) parser will be sufficient.  Whatever kind of parser you write,
you will need to initialize your parser with a pointer to a
basl_LRParserData object.  It is best to write a function that returns
a pointer to this object, since you might have several parsers working
from the same data.  Your parsers then can simply call this function
in their initialization.  Typically, you would write two files:
"basl_GetParserData.h" and "basl_GetParserData.cpp".  Your header file
might look like this:

  ----------------------------------------------------------------------
  // basl_GetParserData.h
  //

  #ifndef basl_GetParserData_h
  #define basl_GetParserData_h

  // return pointer to parser data
  extern class basl_LRParserData * getParserData ();

  // parser start states
  #define MAIN_START_STATE 0
  // other start states
  // ...

  #endif
  ----------------------------------------------------------------------

The header file is also a good place to list your start states.  State
0 corresponds to your first start rule, state 1 corresponds to your
second start rule, and so on.

Your source file might look like this:
  
  ----------------------------------------------------------------------
  // GetParserData.cpp
  //

  // includes
  #include "basl_GetParserData.h"
  // header file from Basil installation
  #include "basl_LRParserData.h"

  // basil generated parser tables (must be included after
  // basl_LRParserData.h)
  #include "cc_tables.inc"

  //
  // get parser data
  //
  basl_LRParserData * getParserData ()
  {
    // initialize the one and only basl_LRParserData object
    static LRParserData data (state_table, goto_table, lex_state_table,
        rule_size_table, rule_table, token_table, get_node);

    // return pointer to object
    return & data;
  }
  ----------------------------------------------------------------------
  
Notice that data is a static object of getParserData.  This ensures
that there is only one object, and calling the function is the only
way to get a pointer it.  The objects used to initialize data are
defined in cc_tables.inc.  The objects are all static const data.

Since an LR(1) parser is easer to write, it is described first.  Both
kinds of parsers will be written as classes.  As classes, you can have
any number of parsers active at once.

LR(1) PARSER

An LR(1) parser is built from a basl_CoreLR1Parser.  Its interface is
described here.

  o basl_CoreLR1Parser::basl_CoreLR1Parser (basl_LRParserData * data,
        int start_state)

    Constructor.  A basl_CoreLR1Parser must be initialized with a
    pointer to the basl_LRParserData object and a start state.

  o int basl_CoreLR1Parser::lookupNextCmd (int kind)

    This function returns the next parse command given the kind of the
    next token.  The command returned will be one of the commands
    defined in the file basl_LRCommand.h.  The commands are:

       LR_REDUCE
       LR_SHIFT
       LR_FAIL
       LR_DONE

  o basl_LRNodeRefPtr basl_CoreLR1Parser::reduce ()

    This function instructs the core parser to reduce.  It returns a
    reference counted pointer to the node created.  If no node is
    created, a null pointer will be returned.  This function should be
    called only if lookupNextCmd returns LR_REDUCE.  After this
    function is called, lookupNextCmd should be called again.
    Eventually, lookupNextCmd will return either LR_SHIFT, LR_DONE or
    LR_FAIL.

  o void basl_CoreLR1Parser::shift (int kind,
        basl_LRNodeRefPtr const & token)

    This function instructs the core parser to shift the specified
    token.  The token kind must also be specified.  This function
    should be called only if lookupNextCmd returns LR_SHIFT.

  o basl_LRNodeRefPtr basl_CoreLR1Parser::done ()

    This function returns a reference counted pointer to the root node
    of the abstract syntax tree.  If the entire tree has been pruned,
    a null pointer will be returned.  This function should be called
    only if lookupNextCmd returns LR_DONE.

  o int basl_CoreLR1Parser::getLexState () const

    This function returns the current lexical state.  The result will
    always be an integer greater than or equal to zero.  Zero is the
    default lexical state.

  o void basl_CoreLR1Parser::restart (int start_state)

    This function restarts the parser at the specified start state.

Suppose basl_LR1Parser is the name of your parser.  Your header file
might look something like this:

  ----------------------------------------------------------------------
  // basl_LR1Parser.h
  //

  #ifndef basl_LR1Parser_h
  #define basl_LR1Parser_h

  // includes
  #include "basl_CoreLR1Parser.h"

  //
  // basl_LR1Parser
  //

  class basl_LR1Parser
  {
    // methods
  public:
    // constructor
    basl_LR1Parser ();

    // parse token
    void parse (basl_LRNodeRefPtr const & token);

    // get lex state
    int getLexState () const;

    // data
  private:
    // core LR(1) parser
    basl_CoreLR1Parser m_core_parser;
  };        

  #endif
  ----------------------------------------------------------------------

The function parse parses the token passed to it.  It then returns.
basl_LRNodeRefPtr is a reference counted pointer to a basl_LRNode.
The core parser makes no assumptions about your nodes or tokens,
except of course that they are basl_LRNodes.  The function getLexState
returns the current lexical state.  You only need this
function if your lexer is context dependent.

Your source file might look something like this:

  ----------------------------------------------------------------------
  // basl_LR1Parser.cpp
  //

  // header file
  #include "basl_LR1Parser.h"
  // this file declares the getParserData function and start states
  #include "basl_GetParserData.h"
  // this file declares the LR parse commands
  #include "basl_LRCommand.h"

  //
  // constructor
  //
  basl_LR1Parser::basl_LR1Parser ()
    : m_core_parser (getParserData (), MAIN_START_STATE)
  {
  }

  //
  // parse
  //
  void basl_LR1Parser::parse (basl_LRNodeRefPtr const & token)
  {
    // token kind, assume token_cast downcasts to token, and
    // member function getKind returns kind of token
    int kind = token_cast (token).getKind ();
    // reduction loop
    int code;
    while ((code = m_core_parser.lookupNextCmd (kind)) == LR_REDUCE)
    {
      // get node on reduction
      bals_LRNodeRefPtr node = m_core_parser.reduce ();
      // perform semantic actions on node using navigator, navigator returns
      // false if semantic actions should cause syntax error
      if (node.isSet () && ! BottomUpNavigator::submit (static_cast <basl_Node &> (* node)))
      {
        // fail parser
        code = LR_FAIL;
        break;
      }
    }
    // check if failed
    if (code == LR_FAIL)
    {
      // syntax error, signal error and throw exception or return after
      // setting error code
      // ...
    }
    // check shift
    if (code == LR_SHIFT)
    {
      // shift token
      m_core_parser.shift (kind, token);
    }
    // otherwise done
    else
    {
      // get root node of abstract syntax tree
      basl_LRNodeRefPtr node = m_core_parser.done ();
      // perform semantic actions top-down on tree using navigator
      if (node.isSet ())
      {
        // use top down navigator
        TopDownNavigator::submit (static_cast <basl_Node &> (* node));
      }
    } 
  }

  //
  // get lex state
  //
  int basl_LR1Parser::getLexState () const
  {
    // return lex state from core parser
    return m_core_parser.getLexState ();
  }
  ----------------------------------------------------------------------

As you can see, the core parser does the real work.  Your parser is
just a wrapper around it.  Navigators are used to perform semantic
actions.  In this example, semantic actions are performed bottom-up as
the abstract syntax tree is being created, as well as top-down from
the root of the tree.  Only one strategy is usually needed.  Note also
that your parser must signal syntax errors.  The core parser does not
signal any errors.

LR(K) PARSER

An LR(k) parser is built from a basl_CoreLRkParser.  The core parser
does not perform the backtracking, instead your parser must do it.
This makes the parser slightly more compilicated.  The interface to
basl_CoreLRkParser is described here.

  o basl_CoreLRkParser::basl_CoreLRkParser (LRParserData * data,
        int start_state)

    Constructor.  A basl_CoreLRkParser must be initialized with a
    pointer to the basl_LRParserData object and a start state.

  o void basl_CoreLRkParser::lookupCmdList (int kind)

    This function looks up the list of commands that should be
    performed for the next token.  The token kind must be passed to
    the function.  Afterwards, the function getNextCmd should be
    called to get the next command.

  o int basl_CoreLRkParser::getNextCmd (bool & guess)

    This function returns the next parse command.  The command will be
    one of the following:

      LR_REDUCE
      LR_ACCEPT
      LR_DONE         
      LR_SHIFT
      LR_FAIL

    These commands are defined in the file basl_LRCommand.h.  If there
    are remaining commands, guess will be set to true.

  o basl_LRNodeRefPtr basl_CoreLRkParser::reduce (int & num_cancel)

    This function instructs the core parser to reduce.  It returns a
    reference counted pointer to the node created.  If no node is
    created, a null pointer will be returned.  This function should
    be called only if getNextCmd returns LR_REDUCE or LR_ACCEPT.

    If the command is LR_ACCEPT, num_cancel will be the number
    of pending trial parsers to cancel.

  o void basl_CoreLRkParser::shift (int kind, basl_LRNodeRefPtr const & token)

    This function instructs the core parser to shift the specified
    token.  The token kind must also be specified.  This function
    should be called only if getNextCmd returns LR_SHIFT.
     
  o basl_LRNodeRefPtr basl_CoreLRkParser::done ()

    This function returns a reference counted pointer to the root node
    of the abstract syntax tree.  If the entire tree has been pruned,
    a null pointer will be returned.  This function should be called
    only if getNextCmd returns LR_DONE.

  o int basl_CoreLRkParser::getLexState () const

    This function returns the current lexical state.  The result will
    always be an integer greater than or equal to zero.  Zero is the
    default lexical state.

    In a backtracking parser, the current lexical state is the lexical
    state of the current trial parser.

  o void basl_CoreLRkParser::restart (int start_state)

    This function restarts the parser at the specified start state.

Suppose basl_LRkParser is the name of your parser.  Your header file
might look something like this:

  ----------------------------------------------------------------------
  // LRkParser.h
  //

  #ifndef basl_LRkParser_h
  #define basl_LRkParser_h

  // includes
  #include "basl_CoreLRkParser.h"
  // this file defines a platform independent Deque macro (deque or std::deque)
  #include "basl_Deque.h"
  // this file declares the typedef basl_LRNodeRefPtrDeque (a deque of ref counted node ptrs)
  #include "basl_LRNodeRefPtrDeque.h"

  //
  // basl_LRkParser
  //
  class basl_LRkParser
  {
    // methods
  public:
    // constructor
    basl_LRkParser ();

    // parse token
    void parse (basl_LRNodeRefPtr const & token);

    // get lex state
    int getLexState () const;

    // private types
  private:
    // trial parser
    struct TrialParser
    {
      // core LR(k) parser
      basl_CoreLRkParser core_parser;

      // current token
      basl_LRNodeRefPtr token;

      // index in queue of the next token 
      int index;

      // constructor
      TrialParser ();
    };
    // trial parser deque
    typedef basl_Deque <TrialParser> TrialParserDeque;

    // data
  private:
    // token queue
    basl_LRNodeRefPtrDeque m_token_queue;

    // current trial parser
    TrialParser m_parser;

    // pending trial parsers
    TrialParserDeque m_pending_parser_set;
  };        

  #endif
  ----------------------------------------------------------------------

As you can see, an LR(k) must manage a set of pending trial parsers,
as well as the current trial parser.  Each trial parser can be at a
different location in the source, which is why a token queue is
needed.  basl_CoreLRkParser is a member of TrialParser, as it must be
copied as part of a trial parser.  TrialParser contains other data
that is specific to the trial parse, like the current token and the
index in the token queue of the next token.  Any semantic data that is
associated with a trial parse should also be included in this struct.
The semantic data can be passed to your navigators.  For example, in a
C++ parser, you would most likely include a scope stack (because in
trial mode the scope stack needs to be modified).

The source file of your LR(k) parser might look like this:
 
  ----------------------------------------------------------------------
  // basl_LRkParser.cpp
  //

  // includes
  #include "basl_LRkParser.h"
  // this file declares the getParserData function and defines the start states
  #include "basl_GetParserData.h"
  // this file defines the LR parse commands
  #include "basl_LRCommand.h"

  //
  // trial parser constructor
  //
  basl_LRkParser::TrialParser::TrialParser ()
    : core_parser (getParserData (), MAIN_START_STATE), index (0)
  {
  }

  //
  // LR(k) parser constructor
  //
  basl_LRkParser::basl_LRkParser ()
  {
  }

  //
  // parse token
  //
  void parse (basl_LRNodeRefPtr const & token)
  {
    // push token on queue
    m_token_queue.push_back (token);
    // loop until the current trial parser needs the next token
    while (m_parser.index < m_token_queue.size ())
    {
      // get next token 
      m_parser.token = m_token_queue [m_parser.index ++];
      // command code
      int code;
      // reduce loop
      do
      {
        // lookup command list for token, assume token_cast downcasts to token and
        // getKind returns token kind
        m_parser.core_parser.lookupCmdList (token_cast (m_parser.token).getKind ());
        // backtracking loop
        for (;;)
        {
          // get next command
          bool guess;
          code = m_parser.core_parser.getNextCmd (guess);
          if (code != LR_FAIL)
          {
            // if guess is true, copy to pending parser set
            if (guess)
            {
              // add to front of queue
              m_pending_parser_set.push_front (m_parser);
            }
            // check if reduction
            if (code == LR_REDUCE || code == LR_ACCEPT)
            {
              // get node and number of trial parsers to cancel
              int num_cancel;
              basl_LRNodeRefPtr node = m_parser.core_parser.reduce (num_cancel);
              // perform semantic actions on node, assume navigator returns
              // false if semantic actions should cause parser to fail.  don't
              // fail if no node created
              if (! node.isSet () || BottomUpNavigator::submit (static_cast <basl_Node &> (* node)))
              {
                // cancel pending trial parsers if accept
                if (code == LR_ACCEPT)
                {
                  // erase from front of queue
                  m_pending_parser_set.erase (m_pending_parser_set.begin (),
                      m_pending_parser_set.begin () + num_cancel);
                  // if no pending parsers, token queue can be erased upto index
                  if (m_pending_parser_set.empty ())
                  {
                    // erase from front
                    m_token_queue.erase (m_token_queue.begin (),
                        m_token_queue.begin () + m_parser.index);
                    // next token is at the beginning of the queue
                    m_parser.index = 0; 
                  }
                }
                // break out of backtracking loop
                break;
              }
              // semantic actions caused parser to backtrack
            }
            // otherwise break out of backtracking loop
            else
            {
              break;
            }
          }
          // backtrack if possible
          if (m_pending_parser_set.empty ())
          {
            // syntax error, signal error and throw exception or return after
            // setting error code
            // ...
          }
          // copy trial parser from front of queue
          m_parser = m_pending_parser_set.front ();
          m_pending_parser_set.pop_front ();
          // end of backtracking loop
        }
        // end of reduce loop
      }
      while (code == LR_REDUCE || code == LR_ACCEPT);
      // check if shift command
      if (code == LR_SHIFT)
      {
        // shift token
        m_parser.core_parser.shift (token_cast (m_parser.token).getKind (), m_parser.token);
      }
      // otherwise parser is done
      else
      {
        // get root node
        basl_LRNodeRefPtr node = m_parser.core_parser.done ();
        // perform semantic actions on tree
        if (node.isSet ())
        {
          // use top down navigator
          TopDownNavigator::submit (static_cast <basl_Node &> (* node));
        }
      }
    }
  }

  //
  // get lex state
  //
  int basl_LRkParser::getLexState () const
  {
    // return lex state from current trial parser (tokens are only lexed once)
    return m_parser.core_parser::getLexState ();
  }
  ----------------------------------------------------------------------

Well, that's it.  See the examples for more detail.  To see the rules
as they are reduced, define PARSER_DEBUG when you compile the core
parser.