Binary pattern combinator.
A reusable part of a lexer rule that doesn't match (a token) on its own.
Grammar named name.
Match count instances of type sub.
Match (greedily) one or more instances of type sub.
Match (greedily) zero or more instances of type sub.
Match (greedily) zero or one instances of type sub.
Gx file parser.
Gx parser with range interface over all statements.
Import of modules.
Lexer grammar named name.
N-ary expression.
Match (non-greedily) one or more instances of type sub.
Match (non-greedily) zero or more instances of type sub.
Match (non-greedily) zero or one instances of type sub.
Don't match an instance of type sub.
Parser grammar named name.
Match value range between limits[0] and limits[1].
Rule.
Sequence.
Unary match combinator.
Format when printing AST (nodes).
Node.
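The greedy and non-greedy quantifier matchers described above follow the usual regular-expression distinction. As a rough illustration only, the sketch below uses Python's standard re module rather than the Gx matcher classes (the project itself is written in D); the variable names are made up for the example.

```python
import re

text = "aaa"

# Greedy variants consume as much input as possible.
greedy_one_or_more = re.match(r"a+", text).group()
greedy_zero_or_more = re.match(r"a*", text).group()
greedy_zero_or_one = re.match(r"a?", text).group()

# Non-greedy variants consume as little input as possible.
lazy_one_or_more = re.match(r"a+?", text).group()
lazy_zero_or_more = re.match(r"a*?", text).group()
lazy_zero_or_one = re.match(r"a??", text).group()

print(repr(greedy_one_or_more), repr(lazy_one_or_more))
```

On the input "aaa" the greedy one-or-more form takes all three characters, while the non-greedy form stops after the first; the zero-or-more and zero-or-one non-greedy forms match the empty string.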
Token kind. TODO: make this a string type like with std.experimental.lexer
Build the D source files parserPaths.
Put x indented at indentDepth.
Indentation size in number of spaces.
Use fixed-size (statically allocated) sequence and alternative buffers.
Build Context
Lower and upper limit of dchar count.
Gx lexer for all versions of ANTLR grammars (.g, .g2, .g4).
Gx rule.
https://theantlrguy.atlassian.net/wiki/spaces/ANTLR3/pages/2687036/ANTLR+Cheat+Sheet
https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form
https://github.com/antlr/grammars-v4
https://github.com/antlr/grammars-v4/blob/master/bnf/bnf.g4
https://stackoverflow.com/questions/53245751/convert-a-form-of-bnf-grammar-to-g4-grammar
https://bnfc.digitalgrammars.com/
https://forum.dlang.org/post/rsmlqfwowpnggwyuibok@forum.dlang.org
https://www.regular-expressions.info/unicode.html
https://stackoverflow.com/questions/64654430/meaning-of-plu-in-antlr-grammar/64658336#64658336
https://stackoverflow.com/questions/28829049/antlr4-any-difference-between-import-and-tokenvocab
https://github.com/antlr/antlr4/blob/master/doc/grammars.md
https://github.com/antlr/antlr4/tree/master/doc
https://slebok.github.io/zoo/index.html
TODO:
- Use import std.algorithm.searching : commonPrefix; in alternatives and call it commonPrefixLiteral
- Add Syntax Tree Nodes as structs with members being sub-nodes. Composition over inheritance. If we use structs over classes more languages, such as Vox, can be supported in the code generation phase. Optionally use extern(C++) classes. Sub-node pointers should be defined as unique pointers with deterministic destruction.
- Should be allowed instead of warning: grammars-v4/lua/Lua.g4(329,5): Warning: missing left-hand side, token (leftParen) at offset 5967
- Parallelize grammar parsing and generation of parser files using https://dlang.org/phobos/std_parallelism.html#.parallel (std.parallelism.parallel). After that, compilation of parser files should be grouped into CPU-count number of groups.
- Use: https://forum.dlang.org/post/zcvjwdetohmklaxriswk@forum.dlang.org
- Rewrite (X+)? as X* in ANTLR grammars and commit the result to grammars-v4. See https://stackoverflow.com/questions/64706408/rewriting-x-as-x-in-antlr-grammars
- Add errors for missing symbols during code generation
- Literal indexing:
  - Add map from string literal to fixed-length (typically lexer) rule
  - Warn about string literals, such as str(...), that are equal to tokens such as ELLIPSIS in Python3.g4.
- Make Rule.root be of type Matcher, and make dcharCountSpan and toMatchInSource members of Matcher.
  - Remove Symbol.toMatchInSource
- Support tokens { INDENT_WS, DEDENT_WS, LINE_BREAK_WS } to get Python3.g4 with TOK.whitespaceIndent, whitespaceDedent, whitespaceLineBreak useWhitespaceClassesFlag. See: https://stackoverflow.com/questions/8642154/antlr-what-is-simpliest-way-to-realize-python-like-indent-depending-grammar
- Unicode regular expressions. Use https://www.regular-expressions.info/unicode.html Use https://forum.dlang.org/post/rsmlqfwowpnggwyuibok@forum.dlang.org
- Use to detect conflicting rules with import and tokenVocab
- Use a region allocator on top of the GC to pre-allocate the nodes. Either copied from std.allocator or Vox. Maybe one region for each file. Calculate the region size from lexer statistics (number of operators, symbols and literals).
- not(...)'s implementation needs to be adjusted. Often used in conjunction with altN?
- Handle all TODOs in makeRule
- Move parserSourceBegin to gxbnf_rdbase.d
- Use TOK.tokenSpecOptions in parsing. Ignored for now.
- Essentially, Packrat parsing just means caching whether sub-expressions match at the current position in the string when they are tested. If the current attempt to fit the string into an expression fails, attempts to fit other possible expressions can then reuse the known pass/fail results of sub-expressions at the positions in the string where they have already been tested.
- Deal with differences between import and tokenVocab. See: https://stackoverflow.com/questions/28829049/antlr4-any-difference-between-import-and-tokenvocab
- Add Rule in generated code that defines opApply for matching that overrides
- Detect indirect mutual left-recursion by checking whether Rule.lastOffset (in generated code) is the same as the current parser offset. The simple way in generated parsers: a rule is entered again without an offset change. This requires storing the last offset for each non-literal rule, for example as a member size_t lastOffset = size_t.max; holding the last offset during parsing. It is used to detect infinite recursion; size_t.max indicates that no last offset is yet defined for this rule.
- Warn about options{greedy=false;}: and advise replacing it with non-greedy variants
- Warn about options{greedy=true;}: being deprecated
- Display column range for tokens in messages. Use head.input.length. Requires updating FlyCheck. See: -fdiagnostics-print-source-range-info at https://clang.llvm.org/docs/UsersManual.html. See: https://clang.llvm.org/diagnostics.html Use GNU-style formatting such as: fix-it:"test.c":{45:3-45:21}:"gtk_widget_show_all".
- Use: nxt.git to scan parsing examples in grammars-v4
- If performance is needed:
  - Avoid casts and instead compare against head.tok for isA!NodeType
  - Use RuleAltN(uint n) in makeAlt
  - Use SeqN(uint n) in makeSeq
- Support reading parsers from Grammar Zoo.
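The Packrat item in the TODO list above describes caching sub-expression results per input position. The following is a minimal sketch of that idea in Python, not the generated D parser: make_packrat_parser, digit_run, literal and expr are hypothetical names, and the toy grammar (digits '+' digits / digits '-' digits / digits) is an assumption chosen only to make backtracking visible.

```python
def make_packrat_parser(text):
    """Build a tiny backtracking parser over `text` with a packrat memo table."""
    memo = {}                    # (rule name, position) -> end position or None
    calls = {"digit_run": 0}     # counts real (non-cached) evaluations

    def digit_run(pos):
        key = ("digit_run", pos)
        if key in memo:          # result already known at this position
            return memo[key]
        calls["digit_run"] += 1
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        result = end if end > pos else None
        memo[key] = result       # cache both successes and failures
        return result

    def literal(pos, s):
        return pos + len(s) if text.startswith(s, pos) else None

    def expr(pos):
        # expr <- digits '+' digits / digits '-' digits / digits
        for op in ("+", "-"):
            left = digit_run(pos)
            if left is None:
                continue
            after_op = literal(left, op)
            if after_op is None:
                continue         # alternative failed, try the next one
            right = digit_run(after_op)
            if right is not None:
                return right
        return digit_run(pos)

    return expr, calls


expr, calls = make_packrat_parser("12-34")
end = expr(0)
print(end, calls["digit_run"])
```

When the '+' alternative fails on "12-34", the '-' alternative reuses the cached result for the leading digits instead of scanning them again, so digit_run is only evaluated once per distinct start position.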
Lexer/Parser Generator for ANTLR (G, G2, G4) and (E)BNF grammars.
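The left-recursion check mentioned in the TODO list (a rule being entered again without an offset change) could look roughly like the sketch below. It is written in Python for illustration, not as the generated D code. The Rule class, LeftRecursionError, NO_OFFSET (standing in for size_t.max) and the toy grammar are all assumptions, as is restoring the previous offset on exit so that ordinary backtracking is not reported.

```python
NO_OFFSET = -1  # stands in for size_t.max: no last offset defined yet


class LeftRecursionError(Exception):
    pass


class Rule:
    def __init__(self, name, body):
        self.name = name
        self.body = body              # callable: (text, offset) -> new offset or None
        self.last_offset = NO_OFFSET  # offset at which this rule is currently active

    def match(self, text, offset):
        # Re-entering the same rule at an unchanged offset cannot make progress.
        if self.last_offset == offset:
            raise LeftRecursionError(
                f"rule {self.name} re-entered at offset {offset}")
        saved = self.last_offset
        self.last_offset = offset
        try:
            return self.body(text, offset)
        finally:
            self.last_offset = saved  # assumption: restore on exit


def literal(text, offset, s):
    return offset + len(s) if text.startswith(s, offset) else None


rules = {}

# Toy grammar with indirect left recursion: a <- b 'x' ; b <- a 'y'
def a_body(text, offset):
    end = rules["b"].match(text, offset)
    return None if end is None else literal(text, end, "x")

def b_body(text, offset):
    end = rules["a"].match(text, offset)
    return None if end is None else literal(text, end, "y")

rules["a"] = Rule("a", a_body)
rules["b"] = Rule("b", b_body)

# A rule without left recursion: c <- 'z' c / 'z'
def c_body(text, offset):
    end = literal(text, offset, "z")
    if end is None:
        return None
    rest = rules["c"].match(text, end)
    return end if rest is None else rest

rules["c"] = Rule("c", c_body)
```

Matching rule a at offset 0 enters b and then a again at the same offset, which raises LeftRecursionError, whereas rule c recurses only after consuming input and is never flagged.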