Hacker News

Totally tangential, totally not related to the post (unless you squint your eyes and really blur things) ...

I was thinking about the old canard of the sufficiently smart compiler. It made me think about LLM output and how, in some sense, the output of an LLM could be bytecode just as easily as it could be English. You have a tokenized input and a translated output. You have a massive and easily generatable training set. I wonder if, one day, our compilers will be LLMs.
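The "easily generatable training set" part is real enough to sketch: Python's own compiler mechanically turns source snippets into opcode sequences, producing (source, bytecode) pairs of the kind an LLM-as-compiler would train on. The snippets below are arbitrary examples.

```python
import dis

# Generate (source, bytecode-token) training pairs mechanically
# using CPython's own compiler and disassembler.
snippets = ["a + b", "a * 2 + b", "len(xs) - 1"]

pairs = []
for src in snippets:
    code = compile(src, "<snippet>", "eval")
    ops = [ins.opname for ins in dis.get_instructions(code)]
    pairs.append((src, ops))

for src, ops in pairs:
    print(src, "->", " ".join(ops))
```

The exact opcode names vary by Python version, but every pair is deterministic and free to produce, which is what makes the parallel-corpus framing tempting.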



A function that implements natural language -> bytecode is IMO way more likely to be under the hood an LLM operating a compiler (or maybe a compiler operating LLMs) rather than a "bare" LLM. From an end user's perspective maybe it won't matter but I think it's an important technical point. IMO there's no evidence that an LLM will ever be the best way to execute general purpose computations.


Why would you tolerate an unreliable compiler with no assured relationship between its inputs and its outputs? Have people just got too comfortable with the C++ model of "UB means I can insert a security bug for you"?


In a hypothetical future where the reliability of LLMs improves, I can imagine the model being able to craft optimizations that a traditional compiler cannot.

Like there are already cases where hand-rolling assembly can eke out performance gains, but few do that because it’s so arduous. If the LLM could do it reliably it’d be a huge win.

It’s a big if, but not outside the realm of possibility.


I agree it is currently a pipe dream. But if I were looking for a doctoral research idea, it might be fun to work on something like that.

Lots of potential avenues to explore, e.g. going from a high-level language to some IR, from some IR to bytecode, or straight from high-level to machine code.

I mean, -O3 is already so much of a black box that I can't understand it. And the tedium of hand-optimizing massive chunks of code is why we automate it at all. Boredom is something we don't expect LLMs to suffer, so having one pore over some kind of representation and apply optimizations seems totally reasonable. And if it had some kind of "emergent behavior" based on intelligence that let it beat the suite of algorithmic optimizations we program into compilers, it could actually be a benefit.
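For a sense of what "pore over a representation and apply optimizations" means mechanically, here is a toy peephole pass over an invented three-address IR, doing two rewrites (constant folding and strength reduction) that real compilers perform at vastly larger scale. All names and the IR format are made up for illustration.

```python
# Toy peephole optimizer over a tiny invented IR.
# Each instruction is (op, dst, operand_a, operand_b).
def peephole(instrs):
    out = []
    for op, dst, a, b in instrs:
        # Constant folding: add of two literals becomes a constant.
        if op == "add" and isinstance(a, int) and isinstance(b, int):
            out.append(("const", dst, a + b, None))
        # Strength reduction: multiply by a power of two becomes a shift.
        elif op == "mul" and isinstance(b, int) and b > 0 and b & (b - 1) == 0:
            out.append(("shl", dst, a, b.bit_length() - 1))
        else:
            out.append((op, dst, a, b))
    return out

prog = [("add", "t0", 2, 3), ("mul", "t1", "x", 8)]
print(peephole(prog))  # [('const', 't0', 5, None), ('shl', 't1', 'x', 3)]
```

-O3 is essentially hundreds of passes like this, plus analyses to prove each rewrite safe; the open question in the thread is whether a model could discover rewrites this pipeline misses.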


You definitely could, not far removed from text to image or text to audio generators.


Compilers require strict semantics and deterministic output. It’s the exact opposite of AI.

I could see AI being used (in a deterministic way) to make decisions about what optimizations to apply, to improve error messages, or make languages easier to use/reason about, but not for the frontend/backend/optimizations themselves.
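One way to square "AI decides, compiler stays deterministic" is to let a model only rank candidate passes, while each pass is a deterministic rewrite and the result is verified against the original semantics before acceptance. A minimal sketch, with the model stubbed out as a fixed scorer and a tiny invented expression form:

```python
# Deterministic rewrite passes over a tiny expression tuple form,
# e.g. ("mul", "x", 2). The "model" only orders the passes; an
# equivalence check on sample inputs gates every rewrite.
def mul2_to_add(e):
    op, a, b = e
    return ("add", a, a) if op == "mul" and b == 2 else e

def add0_elim(e):
    op, a, b = e
    return a if op == "add" and b == 0 else e

PASSES = [mul2_to_add, add0_elim]

def evaluate(e, x):
    if not isinstance(e, tuple):
        return x if e == "x" else e
    op, a, b = e
    av, bv = evaluate(a, x), evaluate(b, x)
    return av + bv if op == "add" else av * bv

def model_score(e, p):
    # Stand-in for a learned model; any scoring is safe because
    # the verifier below rejects unsound rewrites.
    return PASSES.index(p)

def optimize(e, samples=(0, 1, 7)):
    # Try passes in model-preferred order; keep the first rewrite
    # that changes the expression and preserves its behavior.
    for p in sorted(PASSES, key=lambda p: -model_score(e, p)):
        candidate = p(e)
        if candidate != e and all(
            evaluate(candidate, x) == evaluate(e, x) for x in samples
        ):
            return candidate
    return e

print(optimize(("mul", "x", 2)))  # ('add', 'x', 'x')
```

The output stays deterministic because the rewrites and the acceptance check are; the model only influences which sound rewrite gets tried first.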


I guess an actual compiler would be cheaper and more reliable.

In theory we could do the same with mathematical computations, 2+2=4 and the like; but computing the result seems easier.



