Another tangentially related tool that I recently learned about is "BNFC":
Given a Labelled BNF grammar, the tool produces:
- an abstract syntax implementation in the target language,
- a case skeleton for the abstract syntax in the target language,
- a pretty-printer in the target language,
- an Alex, JLex, or Flex lexer generator file,
- a Happy, CUP, or Bison parser generator file, and
- a LaTeX file containing a readable specification of the language.
It targets Haskell, Agda, C, C++, Java, and OCaml.
Might be fun to expand on this to generate tree-sitter, highlight.js, or a vscode extension.
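To make the "abstract syntax" and "pretty-printer" bullets concrete, here is a hand-written Python sketch of what a small set of labelled rules conceptually maps to: one node type per label, and a printer with one branch per label (roughly the shape of the "case skeleton" too). This is not BNFC output; the class names and `show` are made up for illustration, and real generated code for the supported targets differs in detail.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical LBNF input this sketch mirrors (not run through BNFC):
#   EAdd. Exp  ::= Exp "+" Exp1 ;
#   EMul. Exp1 ::= Exp1 "*" Exp2 ;
#   EInt. Exp2 ::= Integer ;
# Each label becomes one node type in the abstract syntax.


@dataclass
class EAdd:
    left: "Exp"
    right: "Exp"


@dataclass
class EMul:
    left: "Exp"
    right: "Exp"


@dataclass
class EInt:
    value: int


Exp = Union[EAdd, EMul, EInt]


def show(e: Exp, prec: int = 0) -> str:
    """Pretty-print, adding parentheses only where the grammar levels require them."""
    if isinstance(e, EInt):
        return str(e.value)
    if isinstance(e, EAdd):  # level 0: left operand at level 0, right at level 1
        text = f"{show(e.left, 0)} + {show(e.right, 1)}"
        return f"({text})" if prec > 0 else text
    if isinstance(e, EMul):  # level 1: left operand at level 1, right at level 2
        text = f"{show(e.left, 1)} * {show(e.right, 2)}"
        return f"({text})" if prec > 1 else text
    raise TypeError(f"unknown node: {e!r}")
```

For example, `show(EMul(EAdd(EInt(1), EInt(2)), EInt(3)))` needs parentheses around the addition, while `show(EAdd(EInt(1), EMul(EInt(2), EInt(3))))` does not.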
Tangentially related, here's a tool I've wanted (someone else) to build:
Grammars are written in many notations, usually variants of Backus–Naur form, and I often find that a grammar spec is published in a different variant from the one expected by the parser tool I want to use (e.g., a grammar-based fuzzer).
You could use lemon to parse that custom BNF grammar to turn it into a grammar that lemon likes.
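A minimal sketch of that kind of conversion, written as a small hand-rolled Python translator rather than a Lemon-generated parser. It assumes a simple input dialect (`lhs ::= alt | alt ;` with quoted literals and bare identifiers) and emits Lemon's one-production-per-line form, where each rule ends with a period and nonterminals must start lowercase while terminals start uppercase. The `literal_names` mapping (e.g. `"+"` to `PLUS`) is a hypothetical parameter the caller supplies, since Lemon terminals have to be identifiers; converting the whole name's case is a simplification.

```python
import re

# Tokens of the assumed input dialect: '::=', '|', ';', "quoted" literals, identifiers.
TOKEN = re.compile(r'\s*(::=|\||;|"[^"]*"|[A-Za-z_][A-Za-z0-9_]*)')


def tokenize(text: str) -> list[str]:
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def split_rules(tokens: list[str]) -> list[tuple[str, list[list[str]]]]:
    """Group tokens into (lhs, [alternative, ...]); every rule must end with ';'."""
    rules, i = [], 0
    while i < len(tokens):
        if i + 1 >= len(tokens) or tokens[i + 1] != "::=":
            raise SyntaxError(f"expected '::=' after {tokens[i]!r}")
        lhs, alts, alt = tokens[i], [], []
        i += 2
        while i < len(tokens) and tokens[i] != ";":
            if tokens[i] == "::=":
                raise SyntaxError(f"missing ';' before rule near {tokens[i - 1]!r}")
            if tokens[i] == "|":
                alts.append(alt)
                alt = []
            else:
                alt.append(tokens[i])
            i += 1
        if i == len(tokens):
            raise SyntaxError(f"rule {lhs!r} is missing its ';'")
        alts.append(alt)
        rules.append((lhs, alts))
        i += 1  # skip the ';'
    return rules


def to_lemon(text: str, literal_names: dict[str, str]) -> str:
    """Emit one Lemon-style production per alternative, each terminated by '.'."""
    rules = split_rules(tokenize(text))
    nonterminals = {lhs for lhs, _ in rules}

    def rename(sym: str) -> str:
        if sym.startswith('"'):
            return literal_names[sym[1:-1]]  # e.g. "+" -> PLUS, supplied by the caller
        # Lemon: nonterminals start lowercase, terminals start uppercase.
        return sym.lower() if sym in nonterminals else sym.upper()

    lines = []
    for lhs, alts in rules:
        for alt in alts:
            rhs = " ".join(rename(s) for s in alt)
            lines.append(f"{rename(lhs)} ::= {rhs}." if rhs else f"{rename(lhs)} ::= .")
    return "\n".join(lines)
```

Feeding it `expr ::= expr "+" term | term ;` with `{"+": "PLUS"}` yields `expr ::= expr PLUS term.` and `expr ::= term.`. This only produces the bare grammar; the labels and actions still have to be added by hand.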
That being said, from my experiments with lemon, the grammar is only a small part of the work; getting the actions right takes up the bulk of the time. You have to insert a bunch of attribute indicators (for lack of a better term) that get passed into the actions anyway, so you can’t just take a perfectly formed grammar and use it without a lot of work.
Plus, some parser generators prefer left recursion and some prefer right recursion; some have ‘sugar’ for repetition and/or optionality, while others don’t, so you have to spell it out manually using empty rules. There are lots of different ways to do the same thing.
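The "sugar versus empty rules" difference can be sketched as a small desugaring pass. This Python version assumes an EBNF-ish notation where a symbol may carry a postfix `*`, `+` or `?` (a convention picked for the example), invents helper nonterminal names like `expr_star`, and lets you choose left or right recursion for the repetition rules.

```python
# Postfix operators in the assumed EBNF-ish input and the helper-name suffix each gets.
SUFFIX = {"*": "star", "+": "plus", "?": "opt"}


def desugar(rules, left_recursive=True):
    """Rewrite X*, X+ and X? into plain BNF using helper rules and empty productions.

    rules is a list of (lhs, [symbol, ...]); the result has the same shape.
    """
    main, helpers, made = [], [], set()

    def expand(sym):
        if len(sym) < 2 or sym[-1] not in SUFFIX:
            return sym  # ordinary symbol, nothing to do
        base, op = sym[:-1], sym[-1]
        name = f"{base.lower()}_{SUFFIX[op]}"  # hypothetical naming scheme
        if name not in made:
            made.add(name)
            if op == "?":
                helpers.append((name, []))      # empty rule: X is absent
                helpers.append((name, [base]))  # X is present
            else:
                # X* bottoms out at the empty rule, X+ at a single X.
                helpers.append((name, [] if op == "*" else [base]))
                step = [name, base] if left_recursive else [base, name]
                helpers.append((name, step))
        return name

    for lhs, rhs in rules:
        main.append((lhs, [expand(s) for s in rhs]))
    return main + helpers


def show(rules):
    """Print the rules in a Lemon-like 'lhs ::= rhs.' form."""
    return "\n".join(f"{lhs} ::= {' '.join(rhs)}." if rhs else f"{lhs} ::= ." for lhs, rhs in rules)
```

Left recursion is what LALR(1) generators like Bison and Lemon handle best (the Bison manual recommends it because right recursion grows the parser stack with the length of the list), while LL and PEG tools need the right-recursive form, which is why the flag exists.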
I did something like this at https://github.com/mingodad/lalr-parser-test, where I extended byacc/bison/lemon to generate a naked grammar and an EBNF grammar (understood by https://www.bottlecaps.de/rr/ui to generate railroad diagrams), and to interchange grammars between them (byacc/bison to lemon and the other way around, taking into account the difference in how they interpret rule precedence).
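One concrete place where the precedence interpretation differs, as I read the two manuals: Bison gives a rule the precedence of its last terminal, while Lemon uses the left-most terminal that has a declared precedence. The Python sketch below models only those two defaults; it assumes terminals are the symbols whose names start with an uppercase letter, and the precedence table is a made-up example.

```python
from typing import Optional


def rule_precedence(rhs: list[str], levels: dict[str, int], style: str) -> Optional[int]:
    """Default precedence of a rule under Bison's or Lemon's convention.

    levels maps terminals to the level from their %left/%right/%nonassoc line
    (higher binds tighter). Terminals are assumed to be the symbols whose names
    start with an uppercase letter. None means "no precedence".
    """
    terminals = [s for s in rhs if s[:1].isupper()]
    if style == "bison":
        # Last terminal in the rule, whether or not it has a declared precedence.
        return levels.get(terminals[-1]) if terminals else None
    if style == "lemon":
        # Left-most terminal that does have a declared precedence.
        return next((levels[t] for t in terminals if t in levels), None)
    raise ValueError(f"unknown style: {style!r}")


# Hypothetical table: QUESTION is declared, COLON is not.
LEVELS = {"QUESTION": 1, "PLUS": 2, "TIMES": 3}
TERNARY = ["expr", "QUESTION", "expr", "COLON", "expr"]
```

For `TERNARY` the two conventions disagree (no precedence under Bison, level 1 under Lemon), so a converter has to pin such a rule explicitly, with `%prec QUESTION` on the Bison side or a `[QUESTION]` annotation after the rule on the Lemon side.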
Why not just use the parser generator to generate a parser that will standardize the grammar and then use the parser generator again on the output of that parser to generate a parser for the desired grammar?
> Lemon uses a different grammar syntax which is designed to reduce the number of coding errors.
Why does it seem as if almost every parser generator defines its own quirky grammar syntax? What's wrong with, or so difficult about, just accepting W3C EBNF? Who thinks it's a good idea to force grammars to be rewritten in the first place?
Does nobody complain about vendor lock-in due to the quirky grammar syntax they were forced to use?
Where are the automatic conversion utilities for these parser generators? Something that takes, say, W3C EBNF and spits out the quirky parser generator grammar language? Shouldn't that be simple?
The extra syntax is there so you don't have to count symbols to get the value you want, as in ‘$$=foo($7, $3, $4);’. If your count is off, you end up sending a bad bug report to one of the industry leaders in computer graphics, which is kind of embarrassing.
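A sketch of what a mechanical translation from Bison's positional references to Lemon-style labels could look like, in Python. It assumes a plain action (no `$<type>N` or `@N`), labels the result `R` and each referenced right-hand-side symbol with a letter, and only labels symbols the action actually uses, since Lemon complains about labels that are never used. The label scheme is a hypothetical choice.

```python
import re
import string

# Letters for right-hand-side labels; "R" is reserved for the rule's result.
RHS_LABELS = string.ascii_uppercase.replace("R", "")


def to_named(lhs: str, rhs: list[str], action: str) -> str:
    """Turn a Bison-style rule using $$/$N into a Lemon-style rule with labels."""
    used = {int(n) for n in re.findall(r"\$(\d+)", action)}
    for n in used:
        if not 1 <= n <= len(rhs):
            raise ValueError(f"${n} is out of range for a rule with {len(rhs)} symbols")
    if len(rhs) > len(RHS_LABELS):
        raise ValueError("too many symbols for single-letter labels")
    labels = {n: RHS_LABELS[n - 1] for n in used}  # $1 -> A, $2 -> B, ...
    body = re.sub(r"\$(\d+)", lambda m: labels[int(m.group(1))], action)
    head = lhs
    if "$$" in body:
        body = body.replace("$$", "R")
        head = f"{lhs}(R)"
    parts = [f"{sym}({labels[i]})" if i in labels else sym for i, sym in enumerate(rhs, start=1)]
    return f"{head} ::= {' '.join(parts)}. {{ {body} }}"
```

With it, `$$ = $1 + $3;` on `expr PLUS expr` becomes `expr(R) ::= expr(A) PLUS expr(C). { R = A + C; }`, and an off-by-one like `$8` on a seven-symbol rule is rejected instead of silently reading the wrong value.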
I have used this parser generator and it works like a charm.
The only thing I would change is the way it is distributed: the links at the end point to a sort of amalgamation of the actual sources that make up the parser generator (in the same style as the SQLite source amalgamation), but I think it would be more useful to have access to the separate files that make up this amalgamation (that way you could also hide some of the implementation details as static variables/functions).
Nowadays, you code your grammar directly in the source language, and your parser library generates a parser at compile time as part of the normal build cycle.
Of course this works best in a language that supports operator overloading and compile-time operations.
In the old days, in C++, this would have been done with template metaprogramming, which came with various annoyances. Now no such workarounds are needed.
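Python has operator overloading but no compile-time step of this kind, so the sketch below only shows half of the idea: the grammar is written as ordinary expressions (`+` for sequence, `|` for ordered choice, `>>` for a semantic action), and the parser is assembled when the module runs rather than at compile time. It is a PEG-style recursive-descent combinator, not an LALR generator like Lemon, and all names are made up for illustration.

```python
import operator
import re


class Parser:
    """A parser is a function (text, pos) -> (value, new_pos), or None on failure."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, text, pos=0):
        return self.fn(text, pos)

    def __add__(self, other):  # sequence: a + b yields (a_value, b_value)
        def seq(text, pos):
            first = self(text, pos)
            if first is None:
                return None
            second = other(text, first[1])
            if second is None:
                return None
            return (first[0], second[0]), second[1]
        return Parser(seq)

    def __or__(self, other):  # ordered choice: try a, then b
        def alt(text, pos):
            result = self(text, pos)
            return result if result is not None else other(text, pos)
        return Parser(alt)

    def __rshift__(self, action):  # semantic action: a >> f
        def act(text, pos):
            result = self(text, pos)
            return None if result is None else (action(result[0]), result[1])
        return Parser(act)


class Forward(Parser):
    """Placeholder for a rule defined later (needed for recursion)."""

    def __init__(self):
        super().__init__(lambda text, pos: self.target(text, pos))
        self.target = None


def token(pattern):  # match a regex after optional whitespace
    regex = re.compile(r"\s*(" + pattern + r")")

    def run(text, pos):
        m = regex.match(text, pos)
        return (m.group(1), m.end()) if m else None
    return Parser(run)


def many(p):  # zero or more repetitions, collected into a list
    def run(text, pos):
        items = []
        while (result := p(text, pos)) is not None and result[1] > pos:
            items.append(result[0])
            pos = result[1]
        return items, pos
    return Parser(run)


OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.floordiv}


def fold(pair):  # (first, [(op, rhs), ...]) -> left-associative result
    value, rest = pair
    for op, rhs in rest:
        value = OPS[op](value, rhs)
    return value


# The grammar itself, written as ordinary Python expressions.
expr = Forward()
number = token(r"\d+") >> int
atom = number | ((token(r"\(") + expr + token(r"\)")) >> (lambda t: t[0][1]))
term = (atom + many(token(r"[*/]") + atom)) >> fold
expr.target = (term + many(token(r"[+-]") + term)) >> fold


def evaluate(text):
    result = expr(text)
    if result is None or text[result[1]:].strip():
        raise SyntaxError(f"could not parse {text!r}")
    return result[0]
```

Because this is recursive descent, a left-recursive rule like `expr ::= expr + term` has to be expressed as repetition followed by a left fold, which is the same left-versus-right recursion issue raised earlier in the thread.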
BNFC: http://bnfc.digitalgrammars.com/