Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
The Lemon Parser Generator (sqlite.org)
86 points by begoon on Aug 25, 2022 | hide | past | favorite | 20 comments


Another tangentially related tool that I recently learned about is "BNFC":

    Given a Labelled BNF grammar the tool produces:
 
    - an abstract syntax implementation in the target language,
    - a case skeleton for the abstract syntax in the target language,
    - a pretty-printer in the target language,
    - an Alex, JLex, or Flex lexer generator file ,
    - a Happy, CUP, or Bison parser generator file, and
    - a LaTeX file containing a readable specification of the language.
Targeting C, Haskell, Agda, C, C++, Java, or OCaml.

Might be fun to expand on this to generate tree-sitter, highlight.js, or a vscode extension.

http://bnfc.digitalgrammars.com/


I'm using lemon to parse a programming language.

Lemon produces this report for my grammar:

    Parser statistics:
      terminal symbols...................    19
      non-terminal symbols...............    42
      total symbols......................    61
      rules..............................    75
      states.............................    56
      conflicts..........................     0
      action table entries...............   377
      lookahead table entries............   379
      total table size (bytes)...........  1433
It generates a parser that seems to be a reasonable size:

    $ wc src/parser.c
    2200 10090 78282 src/parser.c
I've tried a few other parser generators, lemon was my favourite.

re2c for tokenizing, and lemon for parsing =


Tangentially related, here's a tool I've wanted (someone else) to build:

There are many variations of how grammars are written, usually variants of Backus–Naur form, and often I find a grammar spec is published using a different variant than what is expected by the parser tool I want to use (e.g., a grammar-based fuzzer).

For example, Python's grammar (https://docs.python.org/3/reference/grammar.html) has custom syntax with a whole PEP (https://peps.python.org/pep-0617/) describing the grammar of the grammar.

It would be nice to have a tool that can at least help with the mechanical transformation between these grammar syntaxes.


You could use lemon to parse that custom BNF grammar to turn it into a grammar that lemon likes.

That being said, based on my playing around with lemon, the grammar is only a small part of the work with getting the actions right taking up the bulk of the time. You have to insert a bunch of attribute indicators (for lack of a better term) that get passed into the actions anyway so you can’t just take a perfectly formed grammar and use it without a bunch of work.

Plus some parser generators like left-recursive, some like right-recursive, some have ‘sugar’ to indicate repetition and/or optionality, some don’t so you have to specify it manually using empty rules, lots of different ways to do the same thing.


I did something like this here https://github.com/mingodad/lalr-parser-test where I expanded byacc/bison/lemon to generate a naked grammar, ebnf grammar (understood by https://www.bottlecaps.de/rr/ui to generate railroad diagrams) and interchange the grammar between then (byacc/bison to lemon and the other way around too, taking in account the difference in how they interpret rules precedence).


FWIW tree-sitter has done a lot of the ground work for this sort of thing, if you need to support a lot of different languages.


Why not just use the parser generator to generate a parser that will standardize the grammar and then use the parser generator again on the output of that parser to generate a parser for the desired grammar?

Its all parser generator all the way down.


> Lemon uses a different grammar syntax which is designed to reduce the number of coding errors.

Why does it seem as if almost every parser generator defines its own quirky grammar syntax? What's wrong or so difficult with just accepting W3C EBNF? Who thinks it's a good idea to force grammars to be re-written in the first place?

Does nobody complain about vendor lock-in due to the quirky grammar syntax they were forced to use?

Where are the automatic conversion utilities for these parser generators? Something that takes, say W3C EBNF and spits out the quirky parser generator grammar language? Shouldn't that be simple?

I really don't get it.


The extra syntax is to fix having to count the number of items to get the value you want like ‘$$=foo($7, $3, $4);’ because if your count is off you end up sending in a bad bug report to one of the industry leaders in computer graphics which is kind of embarrassing.


In addition to the other replies, Lemon predates W3C, possibly by a decade.


> What's wrong or so difficult with just accepting W3C EBNF?

Why should we do what the W3C want?

Anyway - not all grammars are expressible using EBNF. For example prioritised choice.


I have used this parser generator and it works like a charm.

The only thing I would change is the way it is distributed: the links at the end point to a sort of amalgamation of the actual sources that make up the parser generator (in the same style that is used for the SQLite source amalgamation), but I think it would be more beneficial to have access to the separate files that actually make up this amalgamation (and you could hide as static variables / functions some of the implementation details this way).

Anyway, excellent tool!


Lemon is maintained as part of SQLite, so to get what you want, you need to clone the SQLite fossil repo.

There is one sentence on Lemon's homepage which points that direction, which is very easy to miss.

All of the hwaci tools are maintained in Fossil repos and distributed as amalgamations, so that's consistent at least.


The generated code is derived from a template file for exactly this reason.


> The code comes with no warranty. If it breaks, you get to keep both pieces.

That reads much better than legalese.


Here’s my port to Go: https://github.com/gopikchr/golemon

It’s a little ways behind the canonical implementation, because I haven’t touched it for a while, but lemon changes very slowly if at all


Related:

The Lemon Parser Generator - https://news.ycombinator.com/item?id=10295087 - Sept 2015 (10 comments)

The Lemon Parser Generator - https://news.ycombinator.com/item?id=4473854 - Sept 2012 (2 comments)


Nowadays, you code your grammar directly in the source language, and your parser library generates a parser at compile time as part of the normal build cycle.

Of course this works best in a language that supports operator overloading and compile-time operations.

In the old days, in C++, this would have been done with template metaprogramming, which cost various annoyances. Now no such workarounds are needed.


Quite the Baader-Meinhof effect!

This morning I added a commit to a fossil repo, and that commit was on a .y lemon file.

That's not a typical thing I do in a day.


I was literally reading a section about `yacc` in the compiler dragon book when I saw this link. Definitely quite a convenient link to see today.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: