The Lemon Parser Generator

dunham · on Aug 25, 2022

Another tangentially related tool that I recently learned about is "BNFC":

    Given a Labelled BNF grammar the tool produces:
 
    - an abstract syntax implementation in the target language,
    - a case skeleton for the abstract syntax in the target language,
    - a pretty-printer in the target language,
    - an Alex, JLex, or Flex lexer generator file ,
    - a Happy, CUP, or Bison parser generator file, and
    - a LaTeX file containing a readable specification of the language.

Targeting C, Haskell, Agda, C, C++, Java, or OCaml.

Might be fun to expand on this to generate tree-sitter, highlight.js, or a vscode extension.

http://bnfc.digitalgrammars.com/

414owen · on Aug 25, 2022

I'm using lemon to parse a programming language.

Lemon produces this report for my grammar:

    Parser statistics:
      terminal symbols...................    19
      non-terminal symbols...............    42
      total symbols......................    61
      rules..............................    75
      states.............................    56
      conflicts..........................     0
      action table entries...............   377
      lookahead table entries............   379
      total table size (bytes)...........  1433

It generates a parser that seems to be a reasonable size:

    $ wc src/parser.c
    2200 10090 78282 src/parser.c

I've tried a few other parser generators, lemon was my favourite.

re2c for tokenizing, and lemon for parsing =

rgovostes · on Aug 25, 2022

Tangentially related, here's a tool I've wanted (someone else) to build:

There are many variations of how grammars are written, usually variants of Backus–Naur form, and often I find a grammar spec is published using a different variant than what is expected by the parser tool I want to use (e.g., a grammar-based fuzzer).

For example, Python's grammar (https://docs.python.org/3/reference/grammar.html) has custom syntax with a whole PEP (https://peps.python.org/pep-0617/) describing the grammar of the grammar.

It would be nice to have a tool that can at least help with the mechanical transformation between these grammar syntaxes.

UncleEntity · on Aug 25, 2022

You could use lemon to parse that custom BNF grammar to turn it into a grammar that lemon likes.

That being said, based on my playing around with lemon, the grammar is only a small part of the work with getting the actions right taking up the bulk of the time. You have to insert a bunch of attribute indicators (for lack of a better term) that get passed into the actions anyway so you can’t just take a perfectly formed grammar and use it without a bunch of work.

Plus some parser generators like left-recursive, some like right-recursive, some have ‘sugar’ to indicate repetition and/or optionality, some don’t so you have to specify it manually using empty rules, lots of different ways to do the same thing.

mingodad · on Aug 25, 2022

I did something like this here https://github.com/mingodad/lalr-parser-test where I expanded byacc/bison/lemon to generate a naked grammar, ebnf grammar (understood by https://www.bottlecaps.de/rr/ui to generate railroad diagrams) and interchange the grammar between then (byacc/bison to lemon and the other way around too, taking in account the difference in how they interpret rules precedence).

junon · on Aug 25, 2022

FWIW tree-sitter has done a lot of the ground work for this sort of thing, if you need to support a lot of different languages.

IIAOPSW · on Aug 25, 2022

Why not just use the parser generator to generate a parser that will standardize the grammar and then use the parser generator again on the output of that parser to generate a parser for the desired grammar?

Its all parser generator all the way down.

yuan43 · on Aug 25, 2022

> Lemon uses a different grammar syntax which is designed to reduce the number of coding errors.

Why does it seem as if almost every parser generator defines its own quirky grammar syntax? What's wrong or so difficult with just accepting W3C EBNF? Who thinks it's a good idea to force grammars to be re-written in the first place?

Does nobody complain about vendor lock-in due to the quirky grammar syntax they were forced to use?

Where are the automatic conversion utilities for these parser generators? Something that takes, say W3C EBNF and spits out the quirky parser generator grammar language? Shouldn't that be simple?

I really don't get it.

UncleEntity · on Aug 25, 2022

The extra syntax is to fix having to count the number of items to get the value you want like ‘$$=foo($7, $3, $4);’ because if your count is off you end up sending in a bad bug report to one of the industry leaders in computer graphics which is kind of embarrassing.

rhmw2b · on Aug 25, 2022

In addition to the other replies, Lemon predates W3C, possibly by a decade.

chrisseaton · on Aug 25, 2022

> What's wrong or so difficult with just accepting W3C EBNF?

Why should we do what the W3C want?

Anyway - not all grammars are expressible using EBNF. For example prioritised choice.

gonzus · on Aug 25, 2022

I have used this parser generator and it works like a charm.

The only thing I would change is the way it is distributed: the links at the end point to a sort of amalgamation of the actual sources that make up the parser generator (in the same style that is used for the SQLite source amalgamation), but I think it would be more beneficial to have access to the separate files that actually make up this amalgamation (and you could hide as static variables / functions some of the implementation details this way).

Anyway, excellent tool!

samatman · on Aug 25, 2022

Lemon is maintained as part of SQLite, so to get what you want, you need to clone the SQLite fossil repo.

There is one sentence on Lemon's homepage which points that direction, which is very easy to miss.

All of the hwaci tools are maintained in Fossil repos and distributed as amalgamations, so that's consistent at least.

UncleEntity · on Aug 25, 2022

The generated code is derived from a template file for exactly this reason.

j0e1 · on Aug 25, 2022

> The code comes with no warranty. If it breaks, you get to keep both pieces.

That reads much better than legalese.

zellyn · on Aug 25, 2022

Here’s my port to Go: https://github.com/gopikchr/golemon

It’s a little ways behind the canonical implementation, because I haven’t touched it for a while, but lemon changes very slowly if at all

dang · on Aug 25, 2022

The Lemon Parser Generator - https://news.ycombinator.com/item?id=4473854 - Sept 2012 (2 comments)

ncmncm · on Aug 25, 2022

Nowadays, you code your grammar directly in the source language, and your parser library generates a parser at compile time as part of the normal build cycle.

Of course this works best in a language that supports operator overloading and compile-time operations.

In the old days, in C++, this would have been done with template metaprogramming, which cost various annoyances. Now no such workarounds are needed.

samatman · on Aug 25, 2022

Quite the Baader-Meinhof effect!

This morning I added a commit to a fossil repo, and that commit was on a .y lemon file.

That's not a typical thing I do in a day.

JLCarveth · on Aug 25, 2022

I was literally reading a section about `yacc` in the compiler dragon book when I saw this link. Definitely quite a convenient link to see today.