It is a stateless text / pixel auto-complete; it has no reference to self, stop ...

doph · 2026-04-30T04:00:42 1777521642

is a kv cache not a kind of state? what does statefulness have to do with selfhood? how does a system prompt work at all if these things have no reference to themselves?

danpalmer · 2026-04-30T04:16:20 1777522580

The kv cache is not persistent. It's a hyper-short-term memory.

in-silico · 2026-04-30T06:02:39 1777528959

Modern KV caches can hold up to 1 million tokens (~3000 pages of text). That's not that short; it's like 48 straight hours of reading.
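As a rough sanity check on those figures, here is a back-of-the-envelope sketch. The conversion factors (0.75 words per token, 250 words per page, 250 words per minute) are assumed rules of thumb, not numbers from this thread, and the function name is made up for illustration.

```python
# Back-of-the-envelope conversion of a context size into pages and reading time.
# All three conversion factors are assumptions, not measured values.
WORDS_PER_TOKEN = 0.75    # assumed average for English text
WORDS_PER_PAGE = 250      # assumed page density
WORDS_PER_MINUTE = 250    # assumed adult reading speed


def context_in_human_terms(tokens):
    """Return (words, pages, hours of reading) for a given token count."""
    words = tokens * WORDS_PER_TOKEN
    pages = words / WORDS_PER_PAGE
    hours = words / WORDS_PER_MINUTE / 60
    return words, pages, hours


words, pages, hours = context_in_human_terms(1_000_000)
print(f"{words:,.0f} words, {pages:,.0f} pages, {hours:.0f} hours of reading")
```

Under these assumptions 1 million tokens works out to 3,000 pages and 50 hours, the same ballpark as the numbers quoted above. Different tokenizers and reading speeds will shift the result.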

danpalmer · 2026-04-30T12:17:08 1777551428

Yes and no. It's not just text, it's images, video, etc., and it's not just the pages of content, it's all the "thinking" as well. Plus, the models tend to work better earlier on in the context.

I regularly get close to filling up context windows and have to compact the context. I can do this several times in one human session of me working on a problem, which you could argue is roughly my own context window.

My point, though, was that almost none of the model's knowledge is in the context; it's all in the training. We have no functional long-term memory for LLMs beyond training.

cyanydeez · 2026-04-30T15:53:40 1777564420

The KV cache isn't memory; it's the saved state of the computation so far, kept so that inference can resume from the point where the last generated output is concatenated with the next input. It's entirely about saving compute and has nothing to do with memory.

This confusion really obscures how stupid LLMs are: they're just text logs as output and text logs as input; hence the goblins are just tokens that happen, problematically, to be more probable in the output.

But the KV cache exists to keep a session from having to run through the entire inference again. The only thing you could call "memory" is that there are no random perturbations in the KV cache, while there may be when re-running the chat, which ends up being non-deterministic. You can think of it as a deterministic seed that keeps a conversation from its normal non-deterministic output.
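A minimal sketch of the compute-saving idea, assuming a toy single-head attention in pure Python. The `project` function is a hypothetical stand-in for learned projections, and the call counter is only a crude proxy for compute; none of this is taken from a real inference stack.

```python
import math

CALLS = {"project": 0}  # counts toy projection calls as a crude proxy for compute


def project(x, shift):
    """Hypothetical stand-in for a learned query/key/value projection."""
    CALLS["project"] += 1
    return [math.sin(v + shift) for v in x]


def attend(q, keys, values):
    """Softmax attention of one query over every key/value seen so far."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]


def step_without_cache(prefix):
    """Recompute keys and values for the whole prefix on every step."""
    keys = [project(t, 1.0) for t in prefix]
    values = [project(t, 2.0) for t in prefix]
    return attend(project(prefix[-1], 0.0), keys, values)


def step_with_cache(cache, token):
    """Compute key and value for the new token only; reuse the cached rest."""
    cache["keys"].append(project(token, 1.0))
    cache["values"].append(project(token, 2.0))
    return attend(project(token, 0.0), cache["keys"], cache["values"])


def compare(tokens):
    """Run both strategies over the same tokens and report outputs and call counts."""
    CALLS["project"] = 0
    uncached = [step_without_cache(tokens[:i + 1]) for i in range(len(tokens))]
    uncached_calls = CALLS["project"]
    CALLS["project"] = 0
    cache = {"keys": [], "values": []}
    cached = [step_with_cache(cache, t) for t in tokens]
    return uncached, cached, uncached_calls, CALLS["project"]


TOKENS = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4], [-0.2, 0.7]]  # toy embeddings
uncached, cached, uncached_calls, cached_calls = compare(TOKENS)
print(uncached_calls, cached_calls)
```

For these four toy tokens the uncached path makes 24 projection calls and the cached path makes 12, and the two produce matching attention outputs. In this sketch the cache changes only how much work is redone, not what is computed.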

mediaman · 2026-04-30T04:40:13 1777524013

It has trained on vast amounts of content that contains the concept of self; of course the idea of self is emergent.

And autoregressive LLMs are not stateless.

dakolli · 2026-04-30T09:54:16 1777542856

"of course the idea of self is emergent"

You sound really sure of yourself. Thousands of ML researchers would disagree with you that self-awareness is emergent, or at all apparent, in large language models. You're literally psychotic if you think this is the case and you need to go touch grass.

NonHyloMorph · 2026-04-30T17:37:03 1777570623

There is a difference between the emergence of self-awareness and the emergence of its idea. Probably.

mediaman · 2026-04-30T21:20:15 1777584015

I wrote that carefully; I'd recommend you re-read it more slowly instead of calling me "psychotic" and telling me that I need to "touch grass."

Of course they have the idea of self! The millions of pages of human text contain it; it is baked into the weights, just like the knowledge of the taste of Cheetos is baked in despite their lack of any taste buds. Their knowledge of it does not mean the neural net is actually a conscious creature or truly has a self.

I assume you do not call people psychotic to their face in real life, because it's mean. Please next time take a pause and consider if there is literally any other way you could communicate your potential disagreement.

andai · 2026-04-30T04:21:42 1777522902

Ask Claude about Claude.

yard2010 · 2026-04-30T06:49:57 1777531797

Imagine if people just tapped words on iOS autocomplete and mistook this for intelligence:

"I think the problem is that when you don't have to be perfect for me that's why I'm asking you to do it but I would love to see you guys too busy to get the kids to the park and the trekkers the same time as the terrorists."

How do you like this theory?