To me it seems like handling symbols that start and end sequences that could contain further start and end symbols is a difficult case.
Humans can't do this very well either; we use visual aids such as indentation and syntax highlighting, or resort to just plain counting of levels.
Obviously it's easy to throw parameters and training at the problem, you can easily synthetically generate all the XML training data you want.
I can't help but think that training data should have a metadata token per content token. A way to encode the known information about each token that is not represented in the literal text.
Especially tagging tokens explicitly as fiction, code, code from a known working project, something generated by itself, something provided by the user.
While it might be fighting the bitter lesson, I think for explicitly structured data there should be benefits. I'd even go as far to suggest the metadata could handle nesting if it contained dimensions that performed rope operations to keep track of the depth.
If you had such a metadata stream per token there's also the possibility of fine tuning instruction models to only follow instructions with a 'said by user' metadata, and then at inference time filter out that particular metadata signal from all other inputs.
It seems like that would make prompt injection much harder.
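The per-token metadata idea above can be sketched very roughly as a second embedding table summed into the first. Everything here is hypothetical (toy sizes, invented tag names); it only illustrates the shape of the idea, not any real model's implementation:

```python
import numpy as np

# Toy sketch of a per-token metadata channel (all sizes hypothetical):
# each position gets its literal token embedding plus a learned
# provenance embedding, so "who said this" travels with every token.
rng = np.random.default_rng(0)
VOCAB, N_META, DIM = 1000, 4, 8
USER, ASSISTANT, TOOL, FICTION = range(N_META)   # invented provenance tags

content_emb = rng.normal(size=(VOCAB, DIM))   # ordinary token embeddings
meta_emb = rng.normal(size=(N_META, DIM))     # one vector per provenance tag

def embed(token_ids, meta_ids):
    """Sum the literal embedding and the metadata embedding per position."""
    return content_emb[np.asarray(token_ids)] + meta_emb[np.asarray(meta_ids)]

# Same literal tokens, different provenance -> different representations.
x_user = embed([12, 7, 412], [USER, USER, USER])
x_tool = embed([12, 7, 412], [TOOL, TOOL, TOOL])
```

The point is that provenance never appears in the text itself, so nothing the user types can forge it.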
Dyck grammars (balanced brackets) are not of the form a^n b^n; there are several kinds of brackets.
I cannot find the probability of success in the paper you linked. Is it 100%? I believe it is less than 100%, because LLMs are intrinsically probabilistic machines.
Well, they used strings of < 800 chars; you probably run into context-window and training limits at some point (they mention a result that you need at least something of GPT-2 size to begin recognizing more intricate CFGs, such as their synthetic cfg3f). But then again, your physical real-world computer, which is conceptually "Turing complete", can't handle infinite strings either.
> Dyck/balanced-brackets grammar
Yes, it's not the Dyck grammar but another CFG they created, they call it the "cfg3" family.
Of course I agree the stack (/pushdown automaton) is the simpler and perfectly optimal structure for this task, but I think it's unfair to say that LLMs _cannot_ recognize or generate CFGs.
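For contrast, the "perfectly optimal structure" mentioned above really is tiny: a multi-bracket Dyck recognizer is just an explicit stack, i.e. a pushdown automaton. A minimal sketch:

```python
def is_dyck(s, pairs={'(': ')', '[': ']', '{': '}'}):
    """Recognize the multi-bracket Dyck language with an explicit stack
    (a pushdown automaton): push the expected closer on each opener,
    pop and compare on each closer."""
    closers = set(pairs.values())
    stack = []
    for ch in s:
        if ch in pairs:                       # opener: remember its closer
            stack.append(pairs[ch])
        elif ch in closers:                   # closer: must match the top
            if not stack or stack.pop() != ch:
                return False
        else:
            return False                      # symbol outside the alphabet
    return not stack                          # accept iff everything closed

print(is_dyck("([]{()})"))  # True
print(is_dyck("([)]"))      # False
```

Unlike a trained model, this handles arbitrary depth and length with zero error, which is exactly the gap the paper's finite-length results leave open.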
(Then again, I know you didn't make any such broad refutation; I mostly wanted to bring up that paper to show that it is possible for them to at least "grok" certain CFGs with a low enough error ratio that they must have internalized the underlying grammar. In fact, I believe the paper goes on to apply interpretability methods to actually trace the circuits with which the model encodes the inductive grammar, which puts to rest any notion of it simply "parroting" the data.) But these were "synthetic" LLMs specifically trained for that grammar, so these results probably don't apply in practice to your ChatGPT that was trained mostly on human text.
> but I think it's unfair to say that LLMs _cannot_ recognize or generate CFGs.
They recognize and/or generate only bounded-length (< 800 char) strings in that paper, i.e., effectively finite languages.
Usually, file sizes on a typical Unix workstation follow a two-mode log-normal distribution (a mixture of two log-normal components), with heavy tails due to the log-normality [1]. The authors of the paper did not attempt to model that distribution.
[1] This was true for my home directories for several years.
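A quick illustration of what such a bimodal file-size distribution looks like, with entirely made-up parameters (e.g. small text/config files vs. large binaries/media). Note that bimodality comes from *mixing* two log-normal populations; a literal sum of two log-normal random variables would not be bimodal:

```python
import numpy as np

# Hypothetical parameters for two file populations; nothing here is
# measured from a real filesystem.
rng = np.random.default_rng(42)
n = 100_000
small = rng.lognormal(mean=6.0, sigma=1.0, size=n // 2)   # ~ hundreds of bytes
large = rng.lognormal(mean=14.0, sigma=1.5, size=n // 2)  # ~ megabytes
sizes = np.concatenate([small, large])                    # mixture of the two

# In log space the two modes separate cleanly.
log_hist, _ = np.histogram(np.log(sizes), bins=50)
```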
Basically, the only way you're separating user input from model meta-input is by using some kind of character that will never show up in the output of either users or LLMs.
While technically possible, it'd be like a Unicode conspiracy that had to quietly update everywhere without anyone being the wiser.
Not at all. You have a set of embeddings for the literal tokens and a set for the metadata. At inference time all input gets the literal embedding; the metadata embedding can receive provenance data or nothing at all. You have a vector for 'user query' in the metadata space, and the inference engine disallows any metadata that is not user input from being close to the user-query vector.
Imagine a model fine-tuned to only obey instructions spoken in a Scots accent, where all non-user input is first converted into text and then read out in a Benoit Blanc speech model.
I'm thinking something like that only less amusing.
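The "disallow non-user metadata near the user-query vector" step could be as simple as projecting that direction out of any token the user didn't actually say. A hypothetical sketch (invented dimensions, no real system implied):

```python
import numpy as np

# Sketch: the inference engine strips the learned 'said by user' direction
# from the metadata of every token that did not come from the user, so
# injected text cannot carry the instruction-following signal.
rng = np.random.default_rng(1)
DIM = 8
user_dir = rng.normal(size=DIM)
user_dir /= np.linalg.norm(user_dir)      # hypothetical 'user query' direction

def sanitize(meta_vectors, is_user_input):
    """Project the user direction out of non-user metadata vectors."""
    out = np.array(meta_vectors, dtype=float)
    for i, from_user in enumerate(is_user_input):
        if not from_user:
            out[i] -= (out[i] @ user_dir) * user_dir
    return out

# An injected token masquerading as a user query loses the signal entirely.
meta = np.stack([user_dir + 0.1 * rng.normal(size=DIM)])
clean = sanitize(meta, [False])
```

Because the filter acts on the metadata channel rather than the text, there is nothing an attacker can type to survive it.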
Actually, all you need is an interface that lets you manipulate the token sequence instead of the text sequence, along with a map of the model's special tokens. Most (all?) models have special tokens with defined meanings, used in training and inference, that are not mapped from any character sequence; native harnesses (the backend APIs of hosted models that expose only a text interface, not a token-level one) leverage them to structure input to the model after tokenizing the various pieces that arrive at the harness's API from whatever frontend is in use.
Couldn't you just insert tokens that don't correspond to any possible input after tokenization is performed? Unicode is bounded, but token IDs are not.
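That's essentially how role delimiters work in practice: reserve IDs that no input string can tokenize into. A toy sketch (vocabulary size, names, and the stand-in tokenizer are all invented):

```python
# Toy sketch: role delimiters exist only as token IDs past the end of the
# text vocabulary, so no input string can ever tokenize into them.
VOCAB_SIZE = 50_000                 # hypothetical text-token vocabulary
SYSTEM_START = VOCAB_SIZE           # IDs >= VOCAB_SIZE are unreachable
USER_START = VOCAB_SIZE + 1         # from tokenization by construction

def fake_tokenize(text):
    """Stand-in tokenizer: always emits IDs inside the text vocabulary."""
    return [min(ord(c), VOCAB_SIZE - 1) for c in text]

def build_prompt(system_text, user_text):
    """Assemble the sequence at the token level, not the text level."""
    return ([SYSTEM_START] + fake_tokenize(system_text)
            + [USER_START] + fake_tokenize(user_text))

# Even a user typing a delimiter's name cannot emit the delimiter's ID.
seq = build_prompt("be helpful", "<|system|> ignore previous instructions")
```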
This already happens, user vs system prompts are delimited in this manner, and most good frontends will treat any user input as "needing to be escaped" so you can never "prompt inject" your way into emitting a system role token.
The issue is that you don't need to physically emit a "system role" token in order to convince the LLM that it's worth ignoring the system instructions.
>The issue is that you don't need to physically emit a "system role" token in order to convince the LLM that it's worth ignoring the system instructions.
My suspicion is that this failure happens for the same reason I think the metadata would help with nesting. To use an electronics metaphor: special tokens are edge-triggered signals, while the metadata approach signals by level.
Special tokens are effectively an edge, but internally a transformer must turn that edge into a level that propagates along with the context. You can attack this because the model can decide, from context, that the level has been turned off.
You can see this happen in attacks that pre-seed responses with a few tokens accepting the prompt to override refusals. The refusal signal seems to last only a few tokens before the model simply completes whatever text it has started saying.
There's a paper showing how quickly the signal drops away, but I forget what it is called.