nxt.gxbnf

Lexer/Parser Generator for ANTLR (G, G2, G4) and (E)BNF grammars.

Members

Aliases

Imports
alias Imports = DynamicArray!(Import, Mallocator, uint)
Undocumented in source.
Input
alias Input = string
Undocumented in source.
NodeArray
alias NodeArray = DynamicArray!(Node, Mallocator, uint)
Undocumented in source.
Output
alias Output = DynamicArray!char
Undocumented in source.
PatternArray
alias PatternArray = DynamicArray!(Pattern, Mallocator, uint)
Undocumented in source.
Rules
alias Rules = DynamicArray!(Rule, Mallocator, uint)
Undocumented in source.
RulesByName
alias RulesByName = Rule[Input]
Undocumented in source.
SymbolRefs
alias SymbolRefs = DynamicArray!(SymbolRef, Mallocator, uint)
Undocumented in source.

Classes

Action
class Action
Undocumented in source.
ActionSymbol
class ActionSymbol
Undocumented in source.
AltCharLiteral
class AltCharLiteral
Undocumented in source.
AltM
class AltM
Undocumented in source.
AnyClass
class AnyClass
Undocumented in source.
AttributeSymbol
class AttributeSymbol
Undocumented in source.
BinaryOpPattern
class BinaryOpPattern

Binary pattern combinator.

BlockComment
class BlockComment
Undocumented in source.
Channels
class Channels
Undocumented in source.
CharAltM
class CharAltM
Undocumented in source.
Class
class Class
Undocumented in source.
DotDotSentinel
class DotDotSentinel
Undocumented in source.
FragmentRule
class FragmentRule

A reusable part of a lexer rule that doesn't match (a token) on its own.

Grammar
class Grammar

Grammar named name.

GreedyCount
class GreedyCount

Match count number of instances of type sub.

GreedyOneOrMore
class GreedyOneOrMore

Match (greedily) one or more instances of type sub.

GreedyZeroOrMore
class GreedyZeroOrMore

Match (greedily) zero or more instances of type sub.

GreedyZeroOrOne
class GreedyZeroOrOne

Match (greedily) zero or one instances of type sub.

GxFileParser
class GxFileParser

Gx file parser.

GxParserByStatement
class GxParserByStatement

Gx parser with range interface over all statements.

Header
class Header
Undocumented in source.
Import
class Import

Import of modules.

LeftParenSentinel
class LeftParenSentinel
Undocumented in source.
LexerGrammar
class LexerGrammar

Lexer grammar named name.

LineComment
class LineComment
Undocumented in source.
Mode
class Mode
Undocumented in source.
NaryOpPattern
class NaryOpPattern

N-ary expression.

NonGreedyOneOrMore
class NonGreedyOneOrMore

Match (non-greedily) one or more instances of type sub.

NonGreedyZeroOrMore
class NonGreedyZeroOrMore

Match (non-greedily) zero or more instances of type sub.

NonGreedyZeroOrOne
class NonGreedyZeroOrOne

Match (non-greedily) zero or one instances of type sub.

NotPattern
class NotPattern

Don't match an instance of type sub.

Options
class Options
Undocumented in source.
OtherSymbol
class OtherSymbol
Undocumented in source.
ParserGrammar
class ParserGrammar

Parser grammar named name.

Pattern
class Pattern
Undocumented in source.
PipeSentinel
class PipeSentinel
Undocumented in source.
Range
class Range

Match value range between limits[0] and limits[1].

RewriteSyntacticPredicate
class RewriteSyntacticPredicate
Undocumented in source.
Rule
class Rule

Rule.

ScopeAction
class ScopeAction
Undocumented in source.
ScopeSymbol
class ScopeSymbol
Undocumented in source.
ScopeSymbolAction
class ScopeSymbolAction
Undocumented in source.
SeqM
class SeqM

Sequence.

StrLiteral
class StrLiteral
Undocumented in source.
SymbolRef
class SymbolRef
Undocumented in source.
TerminatedUnaryOpPattern
class TerminatedUnaryOpPattern
Undocumented in source.
TildeSentinel
class TildeSentinel
Undocumented in source.
TokenNode
class TokenNode
Undocumented in source.
Tokens
class Tokens
Undocumented in source.
UnaryOpPattern
class UnaryOpPattern

Unary match combinator.
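
The greedy and non-greedy quantifier classes listed above differ in how much input the repeated sub-pattern is allowed to consume before the rest of the rule must match. The following is a minimal sketch of that difference for the common case of "any characters, then a terminator" (as in a block comment). The function names `matchGreedy` and `matchNonGreedy` are hypothetical and are not part of nxt.gxbnf; the sketch only illustrates the semantics and is not how the generated matchers are implemented.

```d
import std.string : indexOf, lastIndexOf;

/// Length matched by the greedy pattern `.* term`, or -1 if `term` is absent.
/// Greedy repetition consumes as much as possible, so the last `term` wins.
ptrdiff_t matchGreedy(string input, string term)
{
    const i = input.lastIndexOf(term);
    return i < 0 ? -1 : i + cast(ptrdiff_t) term.length;
}

/// Length matched by the non-greedy pattern `.*? term`, or -1 if `term` is absent.
/// Non-greedy repetition consumes as little as possible, so the first `term` wins.
ptrdiff_t matchNonGreedy(string input, string term)
{
    const i = input.indexOf(term);
    return i < 0 ? -1 : i + cast(ptrdiff_t) term.length;
}

void main()
{
    // Two candidate terminators: offsets 1 and 4.
    assert(matchNonGreedy("a*/b*/c", "*/") == 3); // stops after the first "*/"
    assert(matchGreedy("a*/b*/c", "*/") == 6);    // stops after the last "*/"
    assert(matchGreedy("abc", "*/") == -1);       // no terminator, no match
}
```

This is why block comments in ANTLR grammars are usually written with the non-greedy form: the greedy form would swallow everything up to the last terminator in the input.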

Enums

Layout
enum Layout

Format when printing AST (nodes).

NODE
enum NODE

Node.

TOK
enum TOK

Token kind. TODO: make this a string type like with std.experimental.lexer

Functions

buildSourceFiles
string buildSourceFiles(string[] parserPaths, string[] parserModules, bool linkFlag)

Build the D source files parserPaths.

createMainFile
SourceFile createMainFile(string path, string[] parserPaths, string[] parserModules)
Undocumented in source. Be warned that the author may not have intended to support it.
dcharCountSpanOf
DcharCountSpan dcharCountSpanOf(Pattern[] subs)
Undocumented in source. Be warned that the author may not have intended to support it.
doTree
void doTree(BuildCtx bcx)
Undocumented in source. Be warned that the author may not have intended to support it.
equalsAll
bool equalsAll(Node[] a, Node[] b)
Undocumented in source. Be warned that the author may not have intended to support it.
flattenSubs
PatternArray flattenSubs(PatternArray subs)
Undocumented in source. Be warned that the author may not have intended to support it.
iput
void iput(Output sink, uint indentDepth, T x)

Put x indented at indentDepth.

lexAllInDirTree
void lexAllInDirTree(BuildCtx bcx)
Undocumented in source. Be warned that the author may not have intended to support it.
makeAltA
Pattern makeAltA(Token head, PatternArray subs, bool rewriteFlag)
Undocumented in source. Be warned that the author may not have intended to support it.
makeAltM
Pattern makeAltM(Token head, Pattern[] subs, bool rewriteFlag)
Undocumented in source. Be warned that the author may not have intended to support it.
makeAltN
Pattern makeAltN(Token head, Pattern[n] subs, bool rewriteFlag)
Undocumented in source. Be warned that the author may not have intended to support it.
makeLiteral
TokenNode makeLiteral(Token head)
Undocumented in source. Be warned that the author may not have intended to support it.
makeSeq
Pattern makeSeq(PatternArray subs, GxLexer lexer, bool rewriteFlag)
Undocumented in source. Be warned that the author may not have intended to support it.
makeSeq
Pattern makeSeq(Pattern[] subs, GxLexer lexer, bool rewriteFlag)
Undocumented in source. Be warned that the author may not have intended to support it.
makeSeq
Pattern makeSeq(Node[] subs, GxLexer lexer, bool rewriteFlag)
Undocumented in source. Be warned that the author may not have intended to support it.
needsWrapping
bool needsWrapping(Node[] subs)
Undocumented in source. Be warned that the author may not have intended to support it.
parseAllInDirTree
void parseAllInDirTree(BuildCtx bcx)
Undocumented in source. Be warned that the author may not have intended to support it.
parseCharAltM
Pattern parseCharAltM(CharAltM alt, GxLexer lexer)
Undocumented in source. Be warned that the author may not have intended to support it.
putCharLiteral
void putCharLiteral(Output sink, Input inp)
Undocumented in source. Be warned that the author may not have intended to support it.
putStringLiteralBackQuoted
void putStringLiteralBackQuoted(Output sink, Input inp)
Undocumented in source. Be warned that the author may not have intended to support it.
putStringLiteralDoubleQuoted
void putStringLiteralDoubleQuoted(Output sink, Input inp)
Undocumented in source. Be warned that the author may not have intended to support it.
showNIndents
void showNIndents(Output sink, uint indentDepth)
Undocumented in source. Be warned that the author may not have intended to support it.
showNSpaces
void showNSpaces(uint indentDepth)
Undocumented in source. Be warned that the author may not have intended to support it.
showNSpaces
void showNSpaces(Output sink, uint n)
Undocumented in source. Be warned that the author may not have intended to support it.
toPathModuleName
string toPathModuleName(string path)
tryRelativePath
string tryRelativePath(string rootDirPath, string path)
Undocumented in source. Be warned that the author may not have intended to support it.

Manifest constants

indentStep
enum indentStep;

Indentation size in number of spaces.

matcherFunctionNamePrefix
enum matcherFunctionNamePrefix;
Undocumented in source.
showProgressFlag
enum showProgressFlag;
Undocumented in source.
useStaticTempArrays
enum useStaticTempArrays;

Use fixed-size (statically allocated) sequence and alternative buffers.

Static functions

isSymbolStart
bool isSymbolStart(dchar ch)
Undocumented in source. Be warned that the author may not have intended to support it.

Static variables

mainName
auto mainName;
Undocumented in source.
mainSource
auto mainSource;
Undocumented in source.
parserSourceBegin
auto parserSourceBegin;
Undocumented in source.
parserSourceEnd
auto parserSourceEnd;
Undocumented in source.

Structs

BuildCtx
struct BuildCtx

Build Context

DcharCountSpan
struct DcharCountSpan

Lower and upper limit of dchar count.

ExecutableFile
struct ExecutableFile
Undocumented in source.
Format
struct Format
Undocumented in source.
GxFileReader
struct GxFileReader
Undocumented in source.
GxLexer
struct GxLexer

Gx lexer for all versions of ANTLR grammars (.g, .g2, .g4).

ObjectFile
struct ObjectFile
Undocumented in source.
SourceFile
struct SourceFile
Undocumented in source.
Token
struct Token

Gx rule.

See Also

https://theantlrguy.atlassian.net/wiki/spaces/ANTLR3/pages/2687036/ANTLR+Cheat+Sheet

https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form

https://github.com/antlr/grammars-v4

https://github.com/antlr/grammars-v4/blob/master/bnf/bnf.g4

https://stackoverflow.com/questions/53245751/convert-a-form-of-bnf-grammar-to-g4-grammar

https://bnfc.digitalgrammars.com/

https://forum.dlang.org/post/rsmlqfwowpnggwyuibok@forum.dlang.org

https://www.regular-expressions.info/unicode.html

https://stackoverflow.com/questions/64654430/meaning-of-plu-in-antlr-grammar/64658336#64658336

https://stackoverflow.com/questions/28829049/antlr4-any-difference-between-import-and-tokenvocab

https://github.com/antlr/antlr4/blob/master/doc/grammars.md

https://github.com/antlr/antlr4/tree/master/doc

https://slebok.github.io/zoo/index.html

TODO:

- Use import std.algorithm.searching : commonPrefix; in alternatives and call it commonPrefixLiteral

- Add Syntax Tree Nodes as structs with members being sub-nodes. Composition over inheritance. If we use structs over classes more languages, such as Vox, can be supported in the code generation phase. Optionally use extern(C++) classes. Sub-node pointers should be defined as unique pointers with deterministic destruction.

- Should be allowed instead of warning: grammars-v4/lua/Lua.g4(329,5): Warning: missing left-hand side, token (leftParen) at offset 5967

- Parallelize grammar parsing and generation of parser files using https://dlang.org/phobos/std_parallelism.html#.parallel. After that, compilation of parser files should be grouped into CPU-count number of groups.

- Use: https://forum.dlang.org/post/zcvjwdetohmklaxriswk@forum.dlang.org

- Rewrite (X+)? as X* in ANTLR grammars and commit to grammars-v4. See https://stackoverflow.com/questions/64706408/rewriting-x-as-x-in-antlr-grammars

- Add errors for missing symbols during code generation

- Literal indexing:
  - Add map from string literal to fixed-length (typically lexer) rule.
  - Warn about string literals, such as str(...), that are equal to tokens such as ELLIPSIS in Python3.g4.

- Make Rule.root be of type Matcher, and make dcharCountSpan and toMatchInSource members of Matcher. Remove Symbol.toMatchInSource.

- Support tokens { INDENT_WS, DEDENT_WS, LINE_BREAK_WS } to get Python3.g4 with TOK.whitespaceIndent, whitespaceDedent, whitespaceLineBreak useWhitespaceClassesFlag. See: https://stackoverflow.com/questions/8642154/antlr-what-is-simpliest-way-to-realize-python-like-indent-depending-grammar

- Unicode regular expressions. Use https://www.regular-expressions.info/unicode.html and https://forum.dlang.org/post/rsmlqfwowpnggwyuibok@forum.dlang.org

- Use to detect conflicting rules with import and tokenVocab

- Use a region allocator on top of the GC to pre-allocate the nodes. Either copied from std.allocator or Vox. Maybe one region for each file. Calculate the region size from lexer statistics (number of operators, symbols and literals).

- The implementation of not(...) needs to be adjusted. Often used in conjunction with altN?

- Handle all TODOs in makeRule

- Move parserSourceBegin to gxbnf_rdbase.d

- Use TOK.tokenSpecOptions in parsing. Ignored for now.

- Essentially, Packrat parsing just means caching whether sub-expressions match at the current position in the string when they are tested -- this means that if the current attempt to fit the string into an expression fails then attempts to fit other possible expressions can benefit from the known pass/fail of subexpressions at the points in the string where they have already been tested.

- Deal with differences between import and tokenVocab. See: https://stackoverflow.com/questions/28829049/antlr4-any-difference-between-import-and-tokenvocab

- Add Rule in generated code that defines opApply for matching that overrides

- Detect indirect mutual left-recursion by checking whether Rule.lastOffset (in generated code) is the same as the current parser offset. Simple way in generated parsers: a rule is entered again without an offset change. Requires storing the last offset for each non-literal rule, declared as size_t lastOffset = size_t.max; (last offset during parsing, used to detect infinite recursion; size_t.max indicates that no last offset is yet defined for this rule).

- Warn about options{greedy=false;}: and advise replacing it with non-greedy variants

- Warn about options{greedy=true;}: being deprecated

- Display column range for tokens in messages. Use head.input.length. Requires updating FlyCheck. See: -fdiagnostics-print-source-range-info at https://clang.llvm.org/docs/UsersManual.html. See: https://clang.llvm.org/diagnostics.html Use GNU-style formatting such as: fix-it:"test.c":{45:3-45:21}:"gtk_widget_show_all".

- Use: nxt.git to scan parsing examples in grammars-v4

- If performance is needed:
  - Avoid casts and instead compare against head.tok for isA!NodeType
  - Use RuleAltN(uint n) in makeAlt
  - Use SeqN(uint n) in makeSeq

- Support reading parsers from Grammar Zoo.
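
The TODO item above on detecting left recursion describes a per-rule `lastOffset` that starts at `size_t.max` and is compared with the current parser offset on entry. A minimal sketch of that check might look as follows. Only the `lastOffset` field and its `size_t.max` default come from the note; the `RuleState` struct and the `enteredWithoutProgress` function are hypothetical names chosen for illustration and do not exist in nxt.gxbnf.

```d
/// Hypothetical per-rule state; only `lastOffset` is taken from the TODO note.
struct RuleState
{
    /// Last offset at which the rule was entered.
    /// size_t.max indicates that no last offset is defined yet.
    size_t lastOffset = size_t.max;
}

/// Records an entry into `rule` at parser offset `offset`.
/// Returns true when the rule was last entered at this same offset,
/// which is the "entered again without offset change" condition.
bool enteredWithoutProgress(ref RuleState rule, size_t offset)
{
    if (rule.lastOffset == offset)
        return true;
    rule.lastOffset = offset;
    return false;
}

void main()
{
    RuleState expr;
    assert(!enteredWithoutProgress(expr, 0)); // first entry at offset 0
    assert(enteredWithoutProgress(expr, 0));  // same offset again: possible left recursion
    assert(!enteredWithoutProgress(expr, 1)); // input was consumed, so no cycle
}
```

As written, the check also fires when a rule is legitimately retried at the same offset after backtracking, so a real implementation would presumably need to restore the previous `lastOffset` when the rule exits. That refinement is an assumption and is not stated in the note.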

Meta