org.antlr.tool
Class Grammar

java.lang.Object
  extended by org.antlr.tool.Grammar

public class Grammar
extends java.lang.Object

Represents a grammar in memory.


Nested Class Summary
static class Grammar.Decision
           
 class Grammar.LabelElementPair
           
 
Field Summary
protected  java.util.Map actions
          Map a scope to a map of name:action pairs.
protected  boolean allDecisionDFACreated
           
static java.lang.String[] ANTLRLiteralCharValueEscape
          Given a char, we need to be able to show it as an ANTLR literal.
static int[] ANTLRLiteralEscapedCharValue
          When converting ANTLR char and string literals, here is the value set of escape chars.
static java.lang.String ARTIFICIAL_TOKENS_RULENAME
           
 java.util.Set<GrammarAST> blocksWithSemPreds
          Track decisions with sem preds specified for reporting.
 java.util.Set<GrammarAST> blocksWithSynPreds
          Track decisions with syn preds specified for reporting.
protected  boolean builtFromString
          We need a way to detect when a lexer grammar is autogenerated from another grammar or we are just sending in a string representing a grammar.
static int CHAR_LABEL
           
protected  IntSet charVocabulary
          TODO: hook this to the charVocabulary option
static int COMBINED
           
protected  int decisionNumber
          Be able to assign a number to every decision in the grammar; decisions are numbered 1..n.
 java.util.Set decisionsWhoseDFAsUsesSemPreds
          Track decisions that actually use the sem preds in the DFA.
 java.util.Set<DFA> decisionsWhoseDFAsUsesSynPreds
          Track decisions that actually use the syn preds in the DFA.
static java.util.Map defaultOptions
           
 long DFACreationWallClockTimeInMS
          How long in ms did it take to build DFAs for this grammar? If this grammar is a combined grammar, it only records time for the parser grammar component.
static java.util.Set doNotCopyOptionsToLexer
           
protected  boolean externalAnalysisAbort
          An external tool requests that DFA analysis abort prematurely.
protected  java.lang.String fileName
          What file name holds this grammar?
static java.lang.String FRAGMENT_RULE_MODIFIER
           
protected  CodeGenerator generator
          If non-null, this is the code generator we will use to generate recognizers in the target language.
protected  int global_k
          Is there a global fixed lookahead set for this grammar? If 0, nothing specified.
static java.lang.String GRAMMAR_FILE_EXTENSION
           
protected  GrammarAST grammarTree
          An AST that records entire input grammar with all rules.
static java.lang.String[] grammarTypeToFileNameSuffix
           
static java.lang.String[] grammarTypeToString
           
static java.lang.String IGNORE_STRING_IN_GRAMMAR_FILE_NAME
           
protected  Grammar importTokenVocabularyFromGrammar
          For interpreting and testing, you sometimes want to import token definitions from another grammar (instead of reading token defs from a file).
protected  java.util.Vector indexToDecision
          Each subrule/rule is a decision point and we must track them so we can go back later and build DFA predictors for them.
static int INITIAL_DECISION_LIST_SIZE
           
static int INVALID_RULE_INDEX
           
static java.lang.String[] LabelTypeToString
           
protected  java.util.Set leftRecursiveRules
          A list of all rules that are in any left-recursive cycle.
static java.util.Set legalOptions
           
static int LEXER
           
static java.lang.String LEXER_GRAMMAR_FILE_EXTENSION
          Used for generating lexer temp files.
protected  StringTemplate lexerGrammarST
          For merged lexer/parsers, we must construct a separate lexer spec.
protected  java.util.Set<java.lang.String> lexerRules
          If combined or lexer grammar, track the lexer rule names.
protected  java.util.Set lookBusy
          Used during LOOK to detect computation cycles
protected  int maxTokenType
          Token names and literal tokens like "void" are uniquely indexed.
 java.lang.String name
          What name did the user provide for this grammar?
protected  java.util.LinkedHashMap nameToRuleMap
          Map a rule name to its Rule object.
protected  java.util.LinkedHashMap nameToSynpredASTMap
          When we read in a grammar, we track the list of syntactic predicates and build faux rules for them later.
protected  NFA nfa
          The NFA that represents the grammar with edges labelled with tokens or epsilon.
 int numberOfManualLookaheadOptions
           
 int numberOfSemanticPredicates
           
protected  java.util.Map options
          A list of options specified at the grammar level such as language=Java.
static int PARSER
           
static int RULE_LABEL
           
static int RULE_LIST_LABEL
           
protected  int ruleIndex
          Rules are uniquely labeled from 1..n
protected  java.util.Vector ruleIndexToRuleList
          Map a rule index to its name; use a Vector on purpose as new collections stuff won't let me setSize and make it grow.
protected  java.util.Set<antlr.Token> ruleRefs
          The unique set of all rule references in any rule; set of Token objects so two refs to same rule can exist but at different line/position.
protected  java.util.Map scopes
          Track the scopes defined outside of rules and the scopes associated with all rules (even if empty).
 java.util.Set setOfDFAWhoseConversionTerminatedEarly
           
 java.util.Set setOfNondeterministicDecisionNumbers
           
 java.util.Set setOfNondeterministicDecisionNumbersResolvedWithPredicates
           
protected  java.util.Map stringLiteralToTypeMap
          Map a token literal like "while" to its token type.
static java.lang.String SYNPRED_RULE_PREFIX
           
static java.lang.String SYNPREDGATE_ACTION_NAME
           
 java.util.Set<java.lang.String> synPredNamesUsedInDFA
          Track names of preds so we can avoid generating preds that aren't used; computed during NFA to DFA conversion.
static int TOKEN_LABEL
           
static int TOKEN_LIST_LABEL
           
protected  antlr.TokenStreamRewriteEngine tokenBuffer
          This is the buffer of *all* tokens found in the grammar file including whitespace tokens etc...
protected  java.util.Set<antlr.Token> tokenIDRefs
          The unique set of all token ID references in any rule
protected  java.util.Map tokenIDToTypeMap
          Map a token name like ID (but not a literal like "while") to its token type
 Tool tool
           
static int TREE_PARSER
           
 int type
          What type of grammar is this: lexer, parser, tree walker
protected  java.util.Vector typeToTokenList
          Map a token type to its token name.
protected  java.util.Set visitedDuringRecursionCheck
          The checkForLeftRecursion method needs to track what rules it has visited to track infinite recursion.
protected  boolean watchNFAConversion
           
 
Constructor Summary
Grammar()
           
Grammar(java.lang.String grammarString)
           
Grammar(java.lang.String fileName, java.lang.String grammarString)
           
Grammar(Tool tool, java.lang.String fileName, java.io.Reader r)
          Create a grammar from a Reader.
 
Method Summary
protected  LookaheadSet _LOOK(NFAState s)
           
 GrammarAST addArtificialMatchTokensRule(GrammarAST grammarAST, java.util.List ruleNames, boolean filterMode)
          Parse a rule we add artificially that is a list of the other lexer rules like this: "Tokens : ID | INT | SEMI ;" nextToken() will invoke this to set the current token.
 boolean allDecisionDFAHaveBeenCreated()
           
 void altReferencesRule(java.lang.String ruleName, GrammarAST refAST, int outerAltNum)
          Track a rule reference within an outermost alt of a rule.
 void altReferencesTokenID(java.lang.String ruleName, GrammarAST refAST, int outerAltNum)
          Track a token reference within an outermost alt of a rule.
 int assignDecisionNumber(NFAState state)
           
 boolean buildAST()
           
 boolean buildTemplate()
           
 java.util.List checkAllRulesForLeftRecursion()
           
 void checkAllRulesForUselessLabels()
          Remove all labels on rule refs whose target rules have no return value.
 void checkRuleReference(GrammarAST refAST, GrammarAST argsAST, java.lang.String currentRuleName)
           
 IntSet complement(int atom)
           
 IntSet complement(IntSet set)
          For lexer grammars, return everything in unicode not in set.
 java.lang.String computeTokenNameFromLiteral(int tokenType, java.lang.String literal)
          Given a token type and the text of the literal, come up with a decent token type label.
protected  Grammar.Decision createDecision(int decision)
           
 void createLookaheadDFA(int decision)
           
 void createLookaheadDFAs()
          For each decision in this grammar, compute a single DFA using the NFA states associated with the decision.
 void createNFAs()
          Walk the list of options, altering this Grammar object according to any I recognize.
 AttributeScope createParameterScope(java.lang.String ruleName, antlr.Token argAction)
           
 AttributeScope createReturnScope(java.lang.String ruleName, antlr.Token retAction)
           
 AttributeScope createRuleScope(java.lang.String ruleName, antlr.Token scopeAction)
           
 AttributeScope defineGlobalScope(java.lang.String name, antlr.Token scopeAction)
           
protected  void defineLabel(Rule r, antlr.Token label, GrammarAST element, int type)
          Define a label defined in a rule r; check the validity then ask the Rule object to actually define it.
 void defineLexerRuleForAliasedStringLiteral(java.lang.String tokenID, java.lang.String literal, int tokenType)
          If someone does PLUS='+' in the parser, must make sure we get "PLUS : '+' ;" in lexer not "T73 : '+';"
 void defineLexerRuleForStringLiteral(java.lang.String literal, int tokenType)
           
 void defineLexerRuleFoundInParser(antlr.Token ruleToken, GrammarAST ruleAST)
           
 void defineNamedAction(GrammarAST ampersandAST, java.lang.String scope, GrammarAST nameAST, GrammarAST actionAST)
          Given @scope::name {action} define it for this grammar.
 void defineRule(antlr.Token ruleToken, java.lang.String modifier, java.util.Map options, GrammarAST tree, GrammarAST argActionAST, int numAlts)
          Define a new rule.
 void defineRuleListLabel(java.lang.String ruleName, antlr.Token label, GrammarAST element)
           
 void defineRuleRefLabel(java.lang.String ruleName, antlr.Token label, GrammarAST ruleRef)
           
 java.lang.String defineSyntacticPredicate(GrammarAST blockAST, java.lang.String currentRuleName)
          Define a new predicate and get back its name for use in building a semantic predicate reference to the syn pred.
 void defineToken(java.lang.String text, int tokenType)
          Define a token at a particular token type value.
 void defineTokenListLabel(java.lang.String ruleName, antlr.Token label, GrammarAST element)
           
 void defineTokenRefLabel(java.lang.String ruleName, antlr.Token label, GrammarAST tokenRef)
           
protected  void examineAllExecutableActions()
          Before generating code, we examine all actions that can have $x.y and $y stuff in them because some code generation depends on Rule.referencedPredefinedRuleAttributes.
 void externallyAbortNFAToDFAConversion()
          Terminate DFA creation (grammar analysis).
 java.util.Map getActions()
           
 IntSet getAllCharValues()
          If there is a char vocabulary, use it; else return min to max char as defined by the target.
static java.lang.String getANTLRCharLiteralForChar(int c)
          Return a string representing the escaped char for code c.
protected  java.util.List getArtificialRulesForSyntacticPredicates(ANTLRParser parser, java.util.LinkedHashMap nameToSynpredASTMap)
          For any syntactic predicates, we need to define rules for them; they will get defined automatically like any other rule.
static int getCharValueFromGrammarCharLiteral(java.lang.String literal)
          Given a char literal such as 'a' (the 3-character sequence including the single quotes), return the int value of 'a'.
 CodeGenerator getCodeGenerator()
           
protected  Grammar.Decision getDecision(int decision)
           
 GrammarAST getDecisionBlockAST(int decision)
           
 NFAState getDecisionNFAStartState(int decision)
           
 java.util.List getDecisionNFAStartStateList()
           
 java.lang.String getDefaultActionScope(int grammarType)
          Given a grammar type, what should be the default action scope? If I say @members in a COMBINED grammar, for example, the default scope should be "parser".
 java.lang.String getFileName()
           
 AttributeScope getGlobalScope(java.lang.String name)
          Get a global scope
 java.util.Map getGlobalScopes()
           
 int getGrammarMaxLookahead()
           
 GrammarAST getGrammarTree()
           
 java.lang.String getImplicitlyGeneratedLexerFileName()
           
 java.io.File getImportedVocabFileName(java.lang.String vocabName)
           
 java.util.Set<java.lang.String> getLabels(java.util.Set<GrammarAST> rewriteElements, int labelType)
          Given a set of all rewrite elements on right of ->, filter for label types such as Grammar.TOKEN_LABEL, Grammar.TOKEN_LIST_LABEL, ...
 java.util.Set getLeftRecursiveRules()
          Return a list of left-recursive rules; no analysis can be done successfully on these.
 java.lang.String getLexerGrammar()
          If the grammar is a merged grammar, return the text of the implicit lexer grammar.
 java.util.Map getLineColumnToLookaheadDFAMap()
           
 DFA getLookaheadDFA(int decision)
           
 java.util.List getLookaheadDFAColumnsForLineInFile(int line)
          Returns a list of column numbers for all decisions on a particular line so ANTLRWorks can choose the decision depending on the location of the cursor (otherwise, ANTLRWorks has to give the *exact* location, which is not easy from the user's point of view).
 DFA getLookaheadDFAFromPositionInFile(int line, int col)
          Useful for ANTLRWorks to map position in file to the DFA for display
 int getMaxCharValue()
          What is the max char value possible for this grammar's target? Use unicode max if no target defined.
 int getMaxTokenType()
          How many token types have been allocated so far?
 int getNewTokenType()
          Return a new unique integer in the token type space
 NFAState getNFAStateForAltOfDecision(NFAState decisionState, int alt)
          Get the ith alternative (1..n) from a decision; return null when an invalid alt is requested.
 int getNumberOfAltsForDecisionNFA(NFAState decisionState)
          Decisions are linked together with transition(1).
 int getNumberOfCyclicDecisions()
           
 int getNumberOfDecisions()
           
 java.lang.Object getOption(java.lang.String key)
           
 Rule getRule(java.lang.String ruleName)
           
 int getRuleIndex(java.lang.String ruleName)
           
 java.lang.String getRuleModifier(java.lang.String ruleName)
           
 java.lang.String getRuleName(int ruleIndex)
           
 java.util.Collection getRules()
           
 NFAState getRuleStartState(java.lang.String ruleName)
           
 NFAState getRuleStopState(java.lang.String ruleName)
           
 IntSet getSetFromRule(TreeToNFAConverter nfabuilder, java.lang.String ruleName)
          Get the set equivalent (if any) of the indicated rule from this grammar.
 java.util.Set getStringLiterals()
          Get the list of ANTLR String literals
 GrammarAST getSyntacticPredicate(java.lang.String name)
           
 java.util.LinkedHashMap getSyntacticPredicates()
           
 java.lang.String getTokenDisplayName(int ttype)
          Given a token type, get a meaningful name for it such as the ID or string literal.
 java.util.Set getTokenDisplayNames()
          Get a list of all token IDs and literals that have an associated token type.
 java.util.Set getTokenIDs()
          Get the list of tokens that are IDs like BLOCK and LPAREN
 int getTokenType(java.lang.String tokenName)
           
 IntSet getTokenTypes()
          Return a set of all possible token or char types for this grammar
 java.util.Collection getTokenTypesWithoutID()
          Return an ordered integer list of token types that have no corresponding token ID like INT or KEYWORD_BEGIN; for stuff like 'begin'.
 Tool getTool()
           
static java.lang.StringBuffer getUnescapedStringFromGrammarStringLiteral(java.lang.String literal)
          ANTLR does not convert escape sequences during the parse phase because it could not know how to print String/char literals back out when printing grammars etc...
 boolean getWatchNFAConversion()
           
 java.lang.String grammarTreeToString(GrammarAST t)
           
 java.lang.String grammarTreeToString(GrammarAST t, boolean showActions)
           
 int importTokenVocabulary(Grammar importFromGr)
          Pull your token definitions from an existing grammar in memory.
 int importTokenVocabulary(java.lang.String vocabName)
          Load a vocab file (<vocabName>.tokens) and return the max token type found.
protected  void initTokenSymbolTables()
           
 boolean isBuiltFromString()
           
 boolean isEmptyRule(GrammarAST block)
          Rules like "a : ;" and "a : {...} ;" should not generate try/catch blocks for RecognitionException.
 boolean isValidSet(TreeToNFAConverter nfabuilder, GrammarAST t)
          Given a set tree like ( SET A B ) in the lexer, check that A and B are both valid sets themselves; else we must treat it like a BLOCK.
 LookaheadSet LOOK(NFAState s)
          From an NFA state, s, find the set of all labels reachable from s.
 boolean NFAToDFAConversionExternallyAborted()
           
 boolean optionIsValid(java.lang.String key, java.lang.Object value)
           
 void printGrammar(java.io.PrintStream output)
           
 void referenceRuleLabelPredefinedAttribute(java.lang.String ruleName)
          To yield smaller, more readable code, track which rules have their predefined attributes accessed.
protected  void removeUselessLabels(java.util.Map ruleToElementLabelPairMap)
          A label on a rule is useless if the rule has no return value, no tree or template output, and it is not referenced in an action.
 void setCodeGenerator(CodeGenerator generator)
           
 void setDecisionBlockAST(int decision, GrammarAST blockAST)
           
 void setDecisionNFA(int decision, NFAState state)
           
 void setFileName(java.lang.String fileName)
           
 void setGrammarContent(java.io.Reader r)
           
 void setGrammarContent(java.lang.String grammarString)
           
 void setLookaheadDFA(int decision, DFA lookaheadDFA)
          Set the lookahead DFA for a particular decision.
 void setName(java.lang.String name)
           
 java.lang.String setOption(java.lang.String key, java.lang.Object value, antlr.Token optionsStartToken)
          Save the option key/value pair and process it; return the key or null if invalid option.
 void setOptions(java.util.Map options, antlr.Token optionsStartToken)
           
 void setRuleAST(java.lang.String ruleName, GrammarAST t)
           
 void setRuleStartState(java.lang.String ruleName, NFAState startState)
           
 void setRuleStopState(java.lang.String ruleName, NFAState stopState)
           
 void setTool(Tool tool)
           
 void setWatchNFAConversion(boolean watchNFAConversion)
           
 void synPredUsedInDFA(DFA dfa, SemanticContext semCtx)
           
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

SYNPRED_RULE_PREFIX

public static final java.lang.String SYNPRED_RULE_PREFIX
See Also:
Constant Field Values

GRAMMAR_FILE_EXTENSION

public static final java.lang.String GRAMMAR_FILE_EXTENSION
See Also:
Constant Field Values

LEXER_GRAMMAR_FILE_EXTENSION

public static final java.lang.String LEXER_GRAMMAR_FILE_EXTENSION
Used for generating lexer temp files.

See Also:
Constant Field Values

INITIAL_DECISION_LIST_SIZE

public static final int INITIAL_DECISION_LIST_SIZE
See Also:
Constant Field Values

INVALID_RULE_INDEX

public static final int INVALID_RULE_INDEX
See Also:
Constant Field Values

RULE_LABEL

public static final int RULE_LABEL
See Also:
Constant Field Values

TOKEN_LABEL

public static final int TOKEN_LABEL
See Also:
Constant Field Values

RULE_LIST_LABEL

public static final int RULE_LIST_LABEL
See Also:
Constant Field Values

TOKEN_LIST_LABEL

public static final int TOKEN_LIST_LABEL
See Also:
Constant Field Values

CHAR_LABEL

public static final int CHAR_LABEL
See Also:
Constant Field Values

LabelTypeToString

public static java.lang.String[] LabelTypeToString

ARTIFICIAL_TOKENS_RULENAME

public static final java.lang.String ARTIFICIAL_TOKENS_RULENAME
See Also:
Constant Field Values

FRAGMENT_RULE_MODIFIER

public static final java.lang.String FRAGMENT_RULE_MODIFIER
See Also:
Constant Field Values

SYNPREDGATE_ACTION_NAME

public static final java.lang.String SYNPREDGATE_ACTION_NAME
See Also:
Constant Field Values

ANTLRLiteralEscapedCharValue

public static int[] ANTLRLiteralEscapedCharValue
When converting ANTLR char and string literals, here is the value set of escape chars.


ANTLRLiteralCharValueEscape

public static java.lang.String[] ANTLRLiteralCharValueEscape
Given a char, we need to be able to show it as an ANTLR literal.


LEXER

public static final int LEXER
See Also:
Constant Field Values

PARSER

public static final int PARSER
See Also:
Constant Field Values

TREE_PARSER

public static final int TREE_PARSER
See Also:
Constant Field Values

COMBINED

public static final int COMBINED
See Also:
Constant Field Values

grammarTypeToString

public static final java.lang.String[] grammarTypeToString

grammarTypeToFileNameSuffix

public static final java.lang.String[] grammarTypeToFileNameSuffix

tokenBuffer

protected antlr.TokenStreamRewriteEngine tokenBuffer
This is the buffer of *all* tokens found in the grammar file including whitespace tokens etc... I use this to extract lexer rules from combined grammars.


IGNORE_STRING_IN_GRAMMAR_FILE_NAME

public static final java.lang.String IGNORE_STRING_IN_GRAMMAR_FILE_NAME
See Also:
Constant Field Values

name

public java.lang.String name
What name did the user provide for this grammar?


type

public int type
What type of grammar is this: lexer, parser, tree walker


options

protected java.util.Map options
A list of options specified at the grammar level, such as language=Java. The value can be an AST for complicated values such as character sets. There may be code-generator-specific options in here. I do no interpretation of the key/value pairs...they are simply available for whoever wants them.


legalOptions

public static final java.util.Set legalOptions

doNotCopyOptionsToLexer

public static final java.util.Set doNotCopyOptionsToLexer

defaultOptions

public static final java.util.Map defaultOptions

global_k

protected int global_k
Is there a global fixed lookahead set for this grammar? If 0, nothing specified. -1 implies we have not looked at the options table yet to set k.


actions

protected java.util.Map actions
Map a scope to a map of name:action pairs. The code generator will use this to fill holes in the output files. I track the AST node for the action in case I need the line number for errors.


nfa

protected NFA nfa
The NFA that represents the grammar with edges labelled with tokens or epsilon. It is more suitable to analysis than an AST representation.


maxTokenType

protected int maxTokenType
Token names and literal tokens like "void" are uniquely indexed, with -1 implying EOF. Characters are different; they go from -1 (EOF) to \uFFFE. For example, 0 could be a binary byte you want to lex. Labels of DFA/NFA transitions can be both tokens and characters. I use negative numbers for bookkeeping labels like EPSILON. Char/String literals and token types overlap in the same space, however.


charVocabulary

protected IntSet charVocabulary
TODO: hook this to the charVocabulary option


tokenIDToTypeMap

protected java.util.Map tokenIDToTypeMap
Map a token name like ID (but not a literal like "while") to its token type


stringLiteralToTypeMap

protected java.util.Map stringLiteralToTypeMap
Map a token literal like "while" to its token type. It may be that WHILE="while"=35, in which case both tokenIDToTypeMap and this field will have entries mapped to 35.


typeToTokenList

protected java.util.Vector typeToTokenList
Map a token type to its token name. Must subtract MIN_TOKEN_TYPE from index.


importTokenVocabularyFromGrammar

protected Grammar importTokenVocabularyFromGrammar
For interpreting and testing, you sometimes want to import token definitions from another grammar (instead of reading token defs from a file).


tool

public Tool tool

ruleRefs

protected java.util.Set<antlr.Token> ruleRefs
The unique set of all rule references in any rule; set of Token objects so two refs to same rule can exist but at different line/position.


tokenIDRefs

protected java.util.Set<antlr.Token> tokenIDRefs
The unique set of all token ID references in any rule


lexerRules

protected java.util.Set<java.lang.String> lexerRules
If combined or lexer grammar, track the lexer rule names so we can warn about undefined tokens.


decisionNumber

protected int decisionNumber
Be able to assign a number to every decision in the grammar; decisions are numbered 1..n.


ruleIndex

protected int ruleIndex
Rules are uniquely labeled from 1..n


leftRecursiveRules

protected java.util.Set leftRecursiveRules
A list of all rules that are in any left-recursive cycle. There could be multiple cycles, but this is a flat list of all problematic rules.


externalAnalysisAbort

protected boolean externalAnalysisAbort
An external tool requests that DFA analysis abort prematurely. Analysis stops at DFA granularity; DFAs are also limited in size and computation time as a failsafe.


nameToSynpredASTMap

protected java.util.LinkedHashMap nameToSynpredASTMap
When we read in a grammar, we track the list of syntactic predicates and build faux rules for them later. See my blog entry Dec 2, 2005: http://www.antlr.org/blog/antlr3/lookahead.tml This maps the name (we make up) for a pred to the AST grammar fragment.


nameToRuleMap

protected java.util.LinkedHashMap nameToRuleMap
Map a rule name to its Rule object.


scopes

protected java.util.Map scopes
Track the scopes defined outside of rules and the scopes associated with all rules (even if empty).


ruleIndexToRuleList

protected java.util.Vector ruleIndexToRuleList
Map a rule index to its name; use a Vector on purpose as new collections stuff won't let me setSize and make it grow. :( I need a specific guaranteed index, which the Collections stuff won't let me have.


grammarTree

protected GrammarAST grammarTree
An AST that records entire input grammar with all rules. A simple grammar with one rule, "grammar t; a : A | B ;", looks like: ( grammar t ( rule a ( BLOCK ( ALT A ) ( ALT B ) ) ) )


indexToDecision

protected java.util.Vector indexToDecision
Each subrule/rule is a decision point and we must track them so we can go back later and build DFA predictors for them. This includes all the rules, subrules, optional blocks, ()+, ()* etc... The elements in this list are NFAState objects.


generator

protected CodeGenerator generator
If non-null, this is the code generator we will use to generate recognizers in the target language.


lookBusy

protected java.util.Set lookBusy
Used during LOOK to detect computation cycles


visitedDuringRecursionCheck

protected java.util.Set visitedDuringRecursionCheck
The checkForLeftRecursion method needs to track what rules it has visited to track infinite recursion.


watchNFAConversion

protected boolean watchNFAConversion

lexerGrammarST

protected StringTemplate lexerGrammarST
For merged lexer/parsers, we must construct a separate lexer spec. This is the template for the lexer; put the literals first, then the regular rules. We don't need to specify a token vocab import as I make the new grammar import from the old all in memory; we don't want to force it to read from the disk. The lexer grammar will have the same name as the original grammar but will be in a different file. Foo.g with a combined grammar will have FooParser.java generated plus Foo__.g with, again, Foo inside; that grammar will, however, generate FooLexer.java as it's a lexer grammar. A bit odd, but autogenerated. Can tweak later if we want.


fileName

protected java.lang.String fileName
What file name holds this grammar?


DFACreationWallClockTimeInMS

public long DFACreationWallClockTimeInMS
How long in ms did it take to build DFAs for this grammar? If this grammar is a combined grammar, it only records time for the parser grammar component. This only records the time to do the LL(*) work; NFA->DFA conversion.


numberOfSemanticPredicates

public int numberOfSemanticPredicates

numberOfManualLookaheadOptions

public int numberOfManualLookaheadOptions

setOfNondeterministicDecisionNumbers

public java.util.Set setOfNondeterministicDecisionNumbers

setOfNondeterministicDecisionNumbersResolvedWithPredicates

public java.util.Set setOfNondeterministicDecisionNumbersResolvedWithPredicates

setOfDFAWhoseConversionTerminatedEarly

public java.util.Set setOfDFAWhoseConversionTerminatedEarly

blocksWithSynPreds

public java.util.Set<GrammarAST> blocksWithSynPreds
Track decisions with syn preds specified for reporting. This is a set of BLOCK-type AST nodes.


decisionsWhoseDFAsUsesSynPreds

public java.util.Set<DFA> decisionsWhoseDFAsUsesSynPreds
Track decisions that actually use the syn preds in the DFA. Computed during NFA to DFA conversion.


synPredNamesUsedInDFA

public java.util.Set<java.lang.String> synPredNamesUsedInDFA
Track names of preds so we can avoid generating preds that aren't used; computed during NFA to DFA conversion. Just walk accept states and look for synpreds because that is the only state target whose incident edges can have synpreds. The same is true for decisionsWhoseDFAsUsesSynPreds.


blocksWithSemPreds

public java.util.Set<GrammarAST> blocksWithSemPreds
Track decisions with sem preds specified for reporting. This is a set of BLOCK-type AST nodes.


decisionsWhoseDFAsUsesSemPreds

public java.util.Set decisionsWhoseDFAsUsesSemPreds
Track decisions that actually use the sem preds in the DFA.


allDecisionDFACreated

protected boolean allDecisionDFACreated

builtFromString

protected boolean builtFromString
We need a way to detect when a lexer grammar is autogenerated from another grammar or we are just sending in a string representing a grammar. We don't want to generate a .tokens file, for example, in such cases.

Constructor Detail

Grammar

public Grammar()

Grammar

public Grammar(java.lang.String grammarString)
        throws antlr.RecognitionException,
               antlr.TokenStreamException
Throws:
antlr.RecognitionException
antlr.TokenStreamException

Grammar

public Grammar(java.lang.String fileName,
               java.lang.String grammarString)
        throws antlr.RecognitionException,
               antlr.TokenStreamException
Throws:
antlr.RecognitionException
antlr.TokenStreamException

Grammar

public Grammar(Tool tool,
               java.lang.String fileName,
               java.io.Reader r)
        throws antlr.RecognitionException,
               antlr.TokenStreamException
Create a grammar from a Reader. Parse the grammar, building a tree and loading a symbol table of sorts here in Grammar. Then create an NFA and associated factory. Walk the AST representing the grammar, building the state clusters of the NFA.
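
A minimal usage sketch (illustrative only; it assumes org.antlr.Tool offers a no-argument constructor and that a grammar file T.g exists on disk, and it omits exception handling):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.Reader;
    import org.antlr.Tool;
    import org.antlr.tool.Grammar;

    // Parse T.g into a Grammar; per the description above, this builds the
    // grammar AST, a symbol table of sorts, and the NFA state clusters.
    Tool antlr = new Tool();                      // assumption: no-arg Tool constructor
    Reader r = new BufferedReader(new FileReader("T.g"));
    Grammar g = new Grammar(antlr, "T.g", r);
    r.close();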

Throws:
antlr.RecognitionException
antlr.TokenStreamException
Method Detail

setFileName

public void setFileName(java.lang.String fileName)

getFileName

public java.lang.String getFileName()

setName

public void setName(java.lang.String name)

setGrammarContent

public void setGrammarContent(java.lang.String grammarString)
                       throws antlr.RecognitionException,
                              antlr.TokenStreamException
Throws:
antlr.RecognitionException
antlr.TokenStreamException

setGrammarContent

public void setGrammarContent(java.io.Reader r)
                       throws antlr.RecognitionException,
                              antlr.TokenStreamException
Throws:
antlr.RecognitionException
antlr.TokenStreamException

getLexerGrammar

public java.lang.String getLexerGrammar()
If the grammar is a merged grammar, return the text of the implicit lexer grammar.


getImplicitlyGeneratedLexerFileName

public java.lang.String getImplicitlyGeneratedLexerFileName()

getImportedVocabFileName

public java.io.File getImportedVocabFileName(java.lang.String vocabName)

addArtificialMatchTokensRule

public GrammarAST addArtificialMatchTokensRule(GrammarAST grammarAST,
                                               java.util.List ruleNames,
                                               boolean filterMode)
Parse a rule we add artificially that is a list of the other lexer rules, like this: "Tokens : ID | INT | SEMI ;". nextToken() will invoke this to set the current token. Add char literals before the rule references. If in filter mode, we want every alt to backtrack and we need to do k=1 to force the "first token def wins" rule. Otherwise, the longest-match rule comes into play with LL(*). The ANTLRParser antlr.g file now invokes this when parsing a lexer grammar, which I think is proper even though it peeks at the info that later phases will compute. It gets a list of lexer rules and builds a string representing the rule; then it creates a parser and adds the resulting tree to the grammar's tree.


getArtificialRulesForSyntacticPredicates

protected java.util.List getArtificialRulesForSyntacticPredicates(ANTLRParser parser,
                                                                  java.util.LinkedHashMap nameToSynpredASTMap)
For any syntactic predicates, we need to define rules for them; they will get defined automatically like any other rule. :)


initTokenSymbolTables

protected void initTokenSymbolTables()

createNFAs

public void createNFAs()
Walk the list of options, altering this Grammar object according to any I recognize.

    protected void processOptions() {
        Iterator optionNames = options.keySet().iterator();
        while (optionNames.hasNext()) {
            String optionName = (String) optionNames.next();
            Object value = options.get(optionName);
            if ( optionName.equals("tokenVocab") ) {
            }
        }
    }


createLookaheadDFAs

public void createLookaheadDFAs()
For each decision in this grammar, compute a single DFA using the NFA states associated with the decision. The DFA construction determines whether or not the alternatives in the decision are separable using a regular lookahead language. Store the lookahead DFAs in the AST created from the user's grammar so the code generator or whoever can easily access it. This is a separate method because you might want to create a Grammar without doing the expensive analysis.
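
A sketch of the intended call sequence (assuming a Grammar g built as in the constructor example above; the NFAs are built first since the DFAs are computed from the decision NFA states):

    g.createNFAs();               // build NFA state clusters from the grammar AST
    g.createLookaheadDFAs();      // compute one lookahead DFA per decision
    for (int d = 1; d <= g.getNumberOfDecisions(); d++) {
        DFA dfa = g.getLookaheadDFA(d);   // decisions are numbered 1..n
        // inspect dfa, e.g. for display in a tool such as ANTLRWorks
    }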


createLookaheadDFA

public void createLookaheadDFA(int decision)

externallyAbortNFAToDFAConversion

public void externallyAbortNFAToDFAConversion()
Terminate DFA creation (grammar analysis).


NFAToDFAConversionExternallyAborted

public boolean NFAToDFAConversionExternallyAborted()

getNewTokenType

public int getNewTokenType()
Return a new unique integer in the token type space


defineToken

public void defineToken(java.lang.String text,
                        int tokenType)
Define a token at a particular token type value. Blast an old value with a new one. This is called directly during import vocab operation to set up tokens with specific values.
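
For example, an import-vocab style setup might pin an explicit value (a sketch; the name and value are illustrative, cf. WHILE="while"=35 above):

    g.defineToken("WHILE", 35);   // blast in a specific token type for this token ID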


defineRule

public void defineRule(antlr.Token ruleToken,
                       java.lang.String modifier,
                       java.util.Map options,
                       GrammarAST tree,
                       GrammarAST argActionAST,
                       int numAlts)
Define a new rule. A new rule index is created by incrementing ruleIndex.


defineSyntacticPredicate

public java.lang.String defineSyntacticPredicate(GrammarAST blockAST,
                                                 java.lang.String currentRuleName)
Define a new predicate and get back its name for use in building a semantic predicate reference to the syn pred.


getSyntacticPredicates

public java.util.LinkedHashMap getSyntacticPredicates()

getSyntacticPredicate

public GrammarAST getSyntacticPredicate(java.lang.String name)

synPredUsedInDFA

public void synPredUsedInDFA(DFA dfa,
                             SemanticContext semCtx)

defineNamedAction

public void defineNamedAction(GrammarAST ampersandAST,
                              java.lang.String scope,
                              GrammarAST nameAST,
                              GrammarAST actionAST)
Given @scope::name {action} define it for this grammar. Later, the code generator will ask for the actions table.


getActions

public java.util.Map getActions()

getDefaultActionScope

public java.lang.String getDefaultActionScope(int grammarType)
Given a grammar type, what should be the default action scope? If I say @members in a COMBINED grammar, for example, the default scope should be "parser".
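
For example (a sketch, per the note above):

    String scope = g.getDefaultActionScope(Grammar.COMBINED);   // expected to be "parser"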


defineLexerRuleFoundInParser

public void defineLexerRuleFoundInParser(antlr.Token ruleToken,
                                         GrammarAST ruleAST)

defineLexerRuleForAliasedStringLiteral

public void defineLexerRuleForAliasedStringLiteral(java.lang.String tokenID,
                                                   java.lang.String literal,
                                                   int tokenType)
If someone does PLUS='+' in the parser, we must make sure we get "PLUS : '+' ;" in the lexer, not "T73 : '+' ;".


defineLexerRuleForStringLiteral

public void defineLexerRuleForStringLiteral(java.lang.String literal,
                                            int tokenType)

getRule

public Rule getRule(java.lang.String ruleName)

getRuleIndex

public int getRuleIndex(java.lang.String ruleName)

getRuleName

public java.lang.String getRuleName(int ruleIndex)

defineGlobalScope

public AttributeScope defineGlobalScope(java.lang.String name,
                                        antlr.Token scopeAction)

createReturnScope

public AttributeScope createReturnScope(java.lang.String ruleName,
                                        antlr.Token retAction)

createRuleScope

public AttributeScope createRuleScope(java.lang.String ruleName,
                                      antlr.Token scopeAction)

createParameterScope

public AttributeScope createParameterScope(java.lang.String ruleName,
                                           antlr.Token argAction)

getGlobalScope

public AttributeScope getGlobalScope(java.lang.String name)
Get a global scope


getGlobalScopes

public java.util.Map getGlobalScopes()

defineLabel

protected void defineLabel(Rule r,
                           antlr.Token label,
                           GrammarAST element,
                           int type)
Define a label defined in a rule r; check the validity then ask the Rule object to actually define it.


defineTokenRefLabel

public void defineTokenRefLabel(java.lang.String ruleName,
                                antlr.Token label,
                                GrammarAST tokenRef)

defineRuleRefLabel

public void defineRuleRefLabel(java.lang.String ruleName,
                               antlr.Token label,
                               GrammarAST ruleRef)

defineTokenListLabel

public void defineTokenListLabel(java.lang.String ruleName,
                                 antlr.Token label,
                                 GrammarAST element)

defineRuleListLabel

public void defineRuleListLabel(java.lang.String ruleName,
                                antlr.Token label,
                                GrammarAST element)

getLabels

public java.util.Set<java.lang.String> getLabels(java.util.Set<GrammarAST> rewriteElements,
                                                 int labelType)
Given a set of all rewrite elements on right of ->, filter for label types such as Grammar.TOKEN_LABEL, Grammar.TOKEN_LIST_LABEL, ... Return a displayable token type name computed from the GrammarAST.


examineAllExecutableActions

protected void examineAllExecutableActions()
Before generating code, we examine all actions that can have $x.y and $y stuff in them because some code generation depends on Rule.referencedPredefinedRuleAttributes. I need to remove unused rule labels for example.


checkAllRulesForUselessLabels

public void checkAllRulesForUselessLabels()
Remove all labels on rule refs whose target rules have no return value. Do this for all rules in grammar.


removeUselessLabels

protected void removeUselessLabels(java.util.Map ruleToElementLabelPairMap)
A label on a rule is useless if the rule has no return value, no tree or template output, and it is not referenced in an action.


altReferencesRule

public void altReferencesRule(java.lang.String ruleName,
                              GrammarAST refAST,
                              int outerAltNum)
Track a rule reference within an outermost alt of a rule. Used at the moment to decide if $ruleref refers to a unique rule ref in the alt. Rewrite rules force tracking of all rule AST results. This data is also used to verify that all rules have been defined.


altReferencesTokenID

public void altReferencesTokenID(java.lang.String ruleName,
                                 GrammarAST refAST,
                                 int outerAltNum)
Track a token reference within an outermost alt of a rule. Used to decide if $tokenref refers to a unique token ref in the alt. Does not track literals! Rewrite rules force tracking of all tokens.


referenceRuleLabelPredefinedAttribute

public void referenceRuleLabelPredefinedAttribute(java.lang.String ruleName)
To yield smaller, more readable code, track which rules have their predefined attributes accessed. If the rule has no user-defined return values, then don't generate the return value scope classes etc... Make the rule have void return value. Don't track for lexer rules.


checkAllRulesForLeftRecursion

public java.util.List checkAllRulesForLeftRecursion()

getLeftRecursiveRules

public java.util.Set getLeftRecursiveRules()
Return a list of left-recursive rules; no analysis can be done successfully on these. Useful to skip these rules then and also for ANTLRWorks to highlight them.


checkRuleReference

public void checkRuleReference(GrammarAST refAST,
                               GrammarAST argsAST,
                               java.lang.String currentRuleName)

isEmptyRule

public boolean isEmptyRule(GrammarAST block)
Rules like "a : ;" and "a : {...} ;" should not generate try/catch blocks for RecognitionException. To detect this it's probably ok to just look for any reference to an atom that can match some input. W/o that, the rule is unlikey to have any else.


getTokenType

public int getTokenType(java.lang.String tokenName)

getTokenIDs

public java.util.Set getTokenIDs()
Get the list of tokens that are IDs like BLOCK and LPAREN


getTokenTypesWithoutID

public java.util.Collection getTokenTypesWithoutID()
Return an ordered integer list of token types that have no corresponding token ID like INT or KEYWORD_BEGIN; for stuff like 'begin'.


getTokenDisplayNames

public java.util.Set getTokenDisplayNames()
Get a list of all token IDs and literals that have an associated token type.


getCharValueFromGrammarCharLiteral

public static int getCharValueFromGrammarCharLiteral(java.lang.String literal)
Given a char literal such as 'a' (the 3-character sequence including the single quotes), return the int value of 'a'. Convert escape sequences here also. ANTLR's antlr.g parser does not convert escape sequences. 11/26/2005: I changed literals to always be '...' even for strings. This routine still works though.
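
For example (note that the literal text includes the surrounding single quotes):

    int a  = Grammar.getCharValueFromGrammarCharLiteral("'a'");    // 97
    int nl = Grammar.getCharValueFromGrammarCharLiteral("'\\n'");  // escape sequences are converted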


getUnescapedStringFromGrammarStringLiteral

public static java.lang.StringBuffer getUnescapedStringFromGrammarStringLiteral(java.lang.String literal)
ANTLR does not convert escape sequences during the parse phase because it could not know how to print String/char literals back out when printing grammars etc... Someone in China might use the real unicode char in a literal as it will display on their screen; when printing back out, I could not know whether to display or use a unicode escape. This routine converts a string literal with possible escape sequences into a pure string of 16-bit char values. Escapes and unicode specs are converted to pure chars. Return the result in a buffer; people may want to walk/manipulate it further. The NFA construction routine must know the actual char values.


importTokenVocabulary

public int importTokenVocabulary(Grammar importFromGr)
Pull your token definitions from an existing grammar in memory. You must use Grammar() ctor then this method then setGrammarContent() to make this work. This is useful primarily for testing and interpreting grammars. Return the max token type found.
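
A sketch of that sequence (both the String constructor and setGrammarContent() can throw antlr.RecognitionException and antlr.TokenStreamException; the grammar text variables are illustrative):

    // lexerGrammarText / parserGrammarText: illustrative String variables holding grammar text
    Grammar lexerG = new Grammar(lexerGrammarText);   // an existing grammar already in memory
    Grammar g = new Grammar();                        // no-arg ctor first, as noted above
    int max = g.importTokenVocabulary(lexerG);        // pull token definitions; returns max token type
    g.setGrammarContent(parserGrammarText);           // then load this grammar's text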


importTokenVocabulary

public int importTokenVocabulary(java.lang.String vocabName)
Load a vocab file (<vocabName>.tokens) and return the max token type found.


getTokenDisplayName

public java.lang.String getTokenDisplayName(int ttype)
Given a token type, get a meaningful name for it such as the ID or string literal. If this is a lexer and the ttype is in the char vocabulary, compute an ANTLR-valid (possibly escaped) char literal.


getStringLiterals

public java.util.Set getStringLiterals()
Get the list of ANTLR String literals


getGrammarMaxLookahead

public int getGrammarMaxLookahead()

setOption

public java.lang.String setOption(java.lang.String key,
                                  java.lang.Object value,
                                  antlr.Token optionsStartToken)
Save the option key/value pair and process it; return the key or null if invalid option.


setOptions

public void setOptions(java.util.Map options,
                       antlr.Token optionsStartToken)

getOption

public java.lang.Object getOption(java.lang.String key)

optionIsValid

public boolean optionIsValid(java.lang.String key,
                             java.lang.Object value)

buildAST

public boolean buildAST()

isBuiltFromString

public boolean isBuiltFromString()

buildTemplate

public boolean buildTemplate()

getRules

public java.util.Collection getRules()

setRuleAST

public void setRuleAST(java.lang.String ruleName,
                       GrammarAST t)

setRuleStartState

public void setRuleStartState(java.lang.String ruleName,
                              NFAState startState)

setRuleStopState

public void setRuleStopState(java.lang.String ruleName,
                             NFAState stopState)

getRuleStartState

public NFAState getRuleStartState(java.lang.String ruleName)

getRuleModifier

public java.lang.String getRuleModifier(java.lang.String ruleName)

getRuleStopState

public NFAState getRuleStopState(java.lang.String ruleName)

assignDecisionNumber

public int assignDecisionNumber(NFAState state)

getDecision

protected Grammar.Decision getDecision(int decision)

createDecision

protected Grammar.Decision createDecision(int decision)

getDecisionNFAStartStateList

public java.util.List getDecisionNFAStartStateList()

getDecisionNFAStartState

public NFAState getDecisionNFAStartState(int decision)

getLookaheadDFA

public DFA getLookaheadDFA(int decision)

getDecisionBlockAST

public GrammarAST getDecisionBlockAST(int decision)

getLookaheadDFAColumnsForLineInFile

public java.util.List getLookaheadDFAColumnsForLineInFile(int line)
Returns a list of column numbers for all decisions on a particular line so ANTLRWorks can choose the decision depending on the location of the cursor (otherwise, ANTLRWorks has to give the *exact* location, which is not easy from the user's point of view). This is not particularly fast as it walks the entire line:col->DFA map looking for a prefix of "line:".


getLookaheadDFAFromPositionInFile

public DFA getLookaheadDFAFromPositionInFile(int line,
                                             int col)
Useful for ANTLRWorks to map position in file to the DFA for display


getLineColumnToLookaheadDFAMap

public java.util.Map getLineColumnToLookaheadDFAMap()

getNumberOfDecisions

public int getNumberOfDecisions()

getNumberOfCyclicDecisions

public int getNumberOfCyclicDecisions()

setLookaheadDFA

public void setLookaheadDFA(int decision,
                            DFA lookaheadDFA)
Set the lookahead DFA for a particular decision. This means that the appropriate AST node must be updated to have the new lookahead DFA. This method could be used to properly set the DFAs without using the createLookaheadDFAs() method. You could do this:

    Grammar g = new Grammar("...");
    g.setLookaheadDFA(1, dfa1);
    g.setLookaheadDFA(2, dfa2);
    ...


setDecisionNFA

public void setDecisionNFA(int decision,
                           NFAState state)

setDecisionBlockAST

public void setDecisionBlockAST(int decision,
                                GrammarAST blockAST)

allDecisionDFAHaveBeenCreated

public boolean allDecisionDFAHaveBeenCreated()

getMaxTokenType

public int getMaxTokenType()
How many token types have been allocated so far?


getMaxCharValue

public int getMaxCharValue()
What is the max char value possible for this grammar's target? Use unicode max if no target defined.


getTokenTypes

public IntSet getTokenTypes()
Return a set of all possible token or char types for this grammar


getAllCharValues

public IntSet getAllCharValues()
If there is a char vocabulary, use it; else return min to max char as defined by the target. If no target, use max unicode char value.


getANTLRCharLiteralForChar

public static java.lang.String getANTLRCharLiteralForChar(int c)
Return a string representing the escaped char for code c. E.g., if c has value 0x100, you will get "\u0100". ASCII gets the usual char (non-hex) representation. Control characters are spit out as unicode. While this is specially set up for returning Java strings, it can be used by any language target that has the same syntax. :) 11/26/2005: I changed this to use double quotes, consistent with antlr.g 12/09/2005: I changed so everything is single quotes
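
For example (output shapes only, per the description above):

    String ascii   = Grammar.getANTLRCharLiteralForChar('a');     // usual (non-hex) char representation
    String escaped = Grammar.getANTLRCharLiteralForChar(0x100);   // unicode escape form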


complement

public IntSet complement(IntSet set)
For lexer grammars, return everything in unicode not in set. For parser and tree grammars, return everything in token space from MIN_TOKEN_TYPE to last valid token type or char value.


complement

public IntSet complement(int atom)

isValidSet

public boolean isValidSet(TreeToNFAConverter nfabuilder,
                          GrammarAST t)
Given a set tree like ( SET A B ) in the lexer, check that A and B are both valid sets themselves; else we must treat it like a BLOCK.


getSetFromRule

public IntSet getSetFromRule(TreeToNFAConverter nfabuilder,
                             java.lang.String ruleName)
                      throws antlr.RecognitionException
Get the set equivalent (if any) of the indicated rule from this grammar. Mostly used in the lexer to do ~T for some fragment rule T. If the rule AST has a SET use that. If the rule is a single char convert it to a set and return. If rule is not a simple set (w/o actions) then return null. Rules have AST form: ^( RULE ID modifier ARG RET SCOPE block EOR )

Throws:
antlr.RecognitionException

getNumberOfAltsForDecisionNFA

public int getNumberOfAltsForDecisionNFA(NFAState decisionState)
Decisions are linked together with transition(1). Count how many there are. This is here rather than in NFAState because a grammar decides how NFAs are put together to form a decision.


getNFAStateForAltOfDecision

public NFAState getNFAStateForAltOfDecision(NFAState decisionState,
                                            int alt)
Get the ith alternative (1..n) from a decision; return null when an invalid alt is requested. I must count in to find the right alternative number. For (A|B), you get an NFA structure (roughly) like:

    o->o-A->o
    |
    o->o-B->o

This routine returns the leftmost state for each alt. So alt=1 returns the upper-left-most state in this structure.


LOOK

public LookaheadSet LOOK(NFAState s)
From an NFA state, s, find the set of all labels reachable from s. This computes FIRST, FOLLOW and any other lookahead computation depending on where s is. Record, with EOR_TOKEN_TYPE, if you hit the end of a rule so we can know at runtime (when these sets are used) to start walking up the follow chain to compute the real, correct follow set. This routine will only be used on parser and tree parser grammars. TODO: it does properly handle a : b A ; where b is nullable Actually it stops at end of rules, returning EOR. Hmm... should check for that and keep going.


_LOOK

protected LookaheadSet _LOOK(NFAState s)

setCodeGenerator

public void setCodeGenerator(CodeGenerator generator)

getCodeGenerator

public CodeGenerator getCodeGenerator()

getGrammarTree

public GrammarAST getGrammarTree()

getTool

public Tool getTool()

setTool

public void setTool(Tool tool)

computeTokenNameFromLiteral

public java.lang.String computeTokenNameFromLiteral(int tokenType,
                                                    java.lang.String literal)
Given a token type and the text of the literal, come up with a decent token type label. For now it's just T followed by the token type (e.g., T73). Actually, if there is an aliased name from tokens like PLUS='+', use it.


toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

grammarTreeToString

public java.lang.String grammarTreeToString(GrammarAST t)

grammarTreeToString

public java.lang.String grammarTreeToString(GrammarAST t,
                                            boolean showActions)

setWatchNFAConversion

public void setWatchNFAConversion(boolean watchNFAConversion)

getWatchNFAConversion

public boolean getWatchNFAConversion()

printGrammar

public void printGrammar(java.io.PrintStream output)