If we detect recursion on more than one alt, decision is non-LL(*),
but try to isolate it to only those states whose closure operations
detect recursion.
If a closure operation finds that we tried to invoke the same
rule too many times (stack would grow beyond a threshold), it
marks the state has aborted and notifies the DecisionProbe.
The NFA->DFA algorithm may terminate leaving some states
without a path to an accept state, implying that upon certain
input, the decision is not deterministic--no decision about
predicting a unique alternative can be made.
ACTION -
Static variable in class org.antlr.analysis.Label
Parse a rule we add artificially that is a list of the other lexer
rules like this: "Tokens : ID | INT | SEMI ;" nextToken() will invoke
this to set the current token.
The basic output templates without AST or templates stuff; this will be
the templates loaded for the language such as Java.stg *and* the Dbg
stuff if turned on.
From (A)+ build
|---| (Transition 2 from A.right points at alt 1)
v | (follow of loop is Transition 1)
o->o-A-o->o
Meaning that the last NFAState in A points back to A's left Transition NFAState
and we add a new begin/end NFAState.
From (A)* build
|---|
v |
o->o-A-o--o (Transition 2 from block end points at alt 1; follow is Transition 1)
| ^
o---------| (optional branch is 2nd alt of optional block containing A+)
Meaning that the last (end) NFAState in A points back to A's
left side NFAState and we add 3 new NFAStates (the
optional branch is built just like an optional subrule).
For reference to rule r, build
o-e->(r) o
where (r) is the start of rule r and the trailing o is not linked
to from rule ref state directly (it's done thru the transition(0)
RuleClosureTransition.
You can generate a switch rather than if-then-else for a DFA state
if there are no semantic predicates and the number of edge label
values is small enough; e.g., don't generate a switch for a state
containing an edge label such as 20..52330 (the resulting byte codes
would overflow the method 65k limit probably).
For all NFA states (configurations) merged in d,
compute the epsilon closure; that is, find all NFA states reachable
from the NFA states in d via purely epsilon transitions.
Is there a non-syn-pred predicate visible from s that is not in
the rule enclosing s? This accounts for most predicate situations
and lets ANTLR do a simple LL(1)+pred computation.
How long in ms did it take to build DFAs for this grammar?
If this grammar is a combined grammar, it only records time for
the parser grammar component.
Return the interval with elements from this not in other;
other must not be totally enclosed (properly contained)
within this, which would result in two disjoint intervals
instead of the single one returned by this method.
figure out if this state eventually reaches an accept state and
modify the instance variable 'reduced' to indicate if we find
at least one state that cannot reach an accept state.
The unique edge transition class number; every time we see a new
set of edges emanating from a state, we number it so we can reuse
if it's every seen again for another state.
Map an edge transition table to a unique set number; ordered so
we can push into the output template as an ordered list of sets
and then ref them from within the transition[][] table.
Rules in tree grammar that use -> rewrites and are spitting out
templates via output=template and then use rewrite=true must only
use -> on alts that are simple nodes or trees or single rule refs
that match either nodes or trees.
Are two IntervalSets equal? Because all intervals are sorted
and disjoint, equals is a simple linear walk over both lists
to make sure they are the same.
Before generating code, we examine all actions that can have
$x.y and $y stuff in them because some code generation depends on
Rule.referencedPredefinedRuleAttributes.
From this node, add a d--a-->t transition for all
labels 'a' where t is a DFA node created
from the set of NFA states reachable from any NFA
state in DFA state d.
Rule ref nodes, token refs, set, and NOT set refs need to track their
location in the generated NFA so that local FOLLOW sets can be
computed during code gen for automatic error recovery.
Walk each NFA configuration in this DFA state looking for a conflict
where (s|i|ctx) and (s|j|ctx) exist, indicating that state s with
context conflicting ctx predicts alts i and j.
What GrammarAST node (derived from the grammar) is this DFA
associated with? It will point to the start of a block or
the loop back of a (...)+ block etc...
Given a grammar type, what should be the default action scope?
If I say @members in a COMBINED grammar, for example, the
default scope should be "parser".
Get the set of Rules that need to have manual delegations
like "void rule() { importedGrammar.rule(); }"
If this grammar is master, get list of all rule definitions from all
delegate grammars.
Return a list of File objects that name files ANTLR will read
to process T.g; for now, this can only be .tokens files and only
if they use the tokenVocab option.
For gated productions, we need an OR'd list of all predicates for the
target of an edge so we can gate the edge based upon the predicates
associated with taking that path (if any).
If a rule has no user-defined return values and nobody references
it's start/stop (predefined attributes), then there is no need to
define a struct; otherwise for now we assume a struct.
Interpret any ANTLR grammar:
java Interp file.g tokens-to-ignore start-rule input-file
java Interp C.g 'WS COMMENT' program t.c
where the WS and COMMENT are the names of tokens you want to have
the parser ignore.
A set of integers that relies on ranges being common to do
"run-length-encoded" like compression (if you view an IntSet like
a BitSet with runs of 0s and 1s).
A generic set of ints that has an efficient implementation, BitSet,
which is a compressed bitset and is useful for ints that
are small, for example less than 500 or so, and w/o many ranges.
INVALID -
Static variable in class org.antlr.analysis.Label
Is the DFA reduced? I.e., does every state have a path to an accept
state? If not, don't delete as we need to generate an error indicating
which paths are "dead ends".
Similar to LeftRecursionMessage except this is used for announcing
cycles found by walking rules without decisions; the other msg is
invoked when a decision DFA construction finds a problem in closure.
Anything at this value or larger can be considered a simple atom int
for easy comparison during analysis only; faux labels are not used
during parse time for real token types or char values.
We have labels like EPSILON that are below 0; it's hard to
store them in an array with negative index so use this
constant as an index shift when accessing arrays based upon
token type.
This probe tells you a lot about a decision and is useful even
when there is no error such as when a syntactic nondeterminism
is solved via semantic predicates.
As this state is constructed (i.e., as NFA states are added), we
can easily check for non-epsilon transitions because the only
transition that could be a valid label is transition(0).
Given an NFA state number, how many times has the NFA-to-DFA
conversion pushed that state on the stack? In other words,
the NFA state must be a rule invocation state and this method
tells you how many times you've been to this state.
Report the list of predicates found for each alternative; copy
the list because this set gets altered later by the method
tryToResolveWithSemanticPredicates() while flagging NFA configurations
in d as resolved.
Currently the analysis reports issues between token definitions, but
we don't print out warnings in favor of just picking the first token
definition found in the grammar ala lex/flex.
If > 1 NFA configurations within this DFA state have identical
NFA state and context, but differ in their predicted
TODO update for new context suffix stuff 3-9-2005
alternative then a single input sequence predicts multiple alts.
Rather than add a new instance variable to NFA and DFA just for
serializing machines, map old state numbers to new state numbers
by a State object -> Integer new state number HashMap.
Was a syntactic ambiguity resolved with predicates? Any DFA
state that predicts more than one alternative, must be resolved
with predicates or it should be reported to the user.
The code generator for ANTLR can usually be retargeted just by providing
a new X.stg file for language X, however, sometimes the files that must
be generated vary enough that some X-specific functionality is required.
Target() -
Constructor for class org.antlr.codegen.Target
$x.start refs are checked during translation not before so ANTLR misses
the fact that rule r has refs to predefined attributes if the ref is after
the def of the method or self-referential.
A generic message from the tool such as "file not found" type errors; there
is no reason to create a special object for each error unlike the grammar
errors, which may be rather complex.
Predicates are lists of AST nodes from the NFA created from the
grammar, but the same predicate could be cut/paste into multiple
places in the grammar.