Pix2tex: Using a ViT to convert images of equations into LaTeX code (github.com/lukas-blecher)
183 points by Tomte on Nov 3, 2023 | 64 comments


Took a peek at the models they use. It seems to be a vision transformer encoder-decoder architecture with a ResNet backbone. Looks really good. I had a similar idea of training a model and making a desktop application, but haven't had the opportunity. I wonder how much compute it took to train the model.

I think this paper was the first one to do OCR on LaTeX: http://cs231n.stanford.edu/reports/2017/pdfs/815.pdf The paper describes an encoder-decoder architecture with a CNN encoder and an LSTM-based decoder.


Want to give proper credit to my former student for starting this: Yuntian Deng et al., 2016 (https://arxiv.org/abs/1609.04938). I believe this repo uses the dataset from that paper.

Some recent cool work he's been doing: https://www.youtube.com/watch?v=lx1XcTdhalU.


For the morbidly curious, that nightmare math is someone's quantum field theory notes which they typeset in TeX:

https://rohankulkarni.me/files/notes/heidelberg_qft/12_2.pdf ("12.2 Diagrammatic expansion of partition function for Yukawa theory")


I was looking for "nightmare math" in the README and was confused because I didn't find any. I guess that's what a theoretical physics degree does to you: that formula looks very harmless to me.


Yeah, I was also let down when I found the "nightmare math" was simply integral of a generic Lagrangian density...


you're a simple integral of generic density


wow, please folks, keep it civilized! :D


Looks like a horror show to me. Makes me feel embarrassed at leaving my math bs behind and going into cs. I have an insane retirement idea of retiring to some fun mountain town and going to grad school in physics. Where's the best place to go skiing with a college that takes old washed up programmers as students?



Good suggestion, but maybe shooting too high. Checks off the "in the mountains" part of my fantasy life. But a place that has seminars for grad students and working physicists and has many Nobel laureates who attended as students may be above my intellectual grade.


Close by is Grenoble. You can take the bus from the main station to arrive at the nearest ski stations within 45mins. :)


U of NM maybe? You can go skiing just outside of Albuquerque and play golf on the same day.


Well now you have to show us something you would consider nightmare math


https://petapixel.com/2019/07/05/goodbye-aberration-physicis...

Another example is the Standard Model of particle physics. There's a way to write down the Lagrangian of the Standard Model very compactly: https://visit.cern/content/standard_model_formula_t_shirt

But if you expanded all the implied sums and terms, it probably would be monstrous. After all it is supposed to contain all the terms that this figure contains for example: https://commons.wikimedia.org/wiki/File:Standard_Model_of_El...
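For reference, the compact t-shirt version is roughly the following (quoted from memory, so take the exact signs and conventions with a grain of salt):

```latex
\mathcal{L} = -\tfrac{1}{4} F_{\mu\nu} F^{\mu\nu}
            + i \bar{\psi} \slashed{D} \psi
            + \bar{\psi}_i \, y_{ij} \, \psi_j \, \phi + \text{h.c.}
            + \left| D_\mu \phi \right|^2 - V(\phi)
```

Each term hides enormous structure: the field-strength, covariant-derivative, and Yukawa terms all carry implicit sums over gauge groups, generations, and indices, which is exactly what balloons when expanded.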

Also, ask string theorists to show you the calculations for which they need to find the largest paper available. I've heard of people doing calculations where a single line is the width of an A1 sheet.


Surely he could have defined some sensible quantities and notations, or exploited some symmetry, to make that monstrosity a bit more compact, no? I mean, the Einstein field equations in their simplest form are something like G = k*T for suitable definitions of G and T. But if you wrote each component of G in terms of the metric tensor, it would become huge. This is just one component of the Riemann tensor: https://i.stack.imgur.com/xkrq9.png
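For the curious, the "G = k*T" form spelled out one level (standard definitions, before expanding anything in terms of the metric):

```latex
G_{\mu\nu} \;\equiv\; R_{\mu\nu} - \tfrac{1}{2} R \, g_{\mu\nu}
          \;=\; \frac{8\pi G}{c^4} \, T_{\mu\nu}
```

The ugliness only appears once you write the Ricci tensor in terms of Christoffel symbols and those in terms of metric derivatives.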


You are not wrong. E.g. the original Maxwell equations were so ugly (I think there were 20 of them), but the ones that are taught nowadays look so elegant.

The language is important. The more you "understand" something, the simpler it is (an oversimplification, of course). E.g. I would not consider the Einstein equation expanded out to be natural after understanding it. (Maxwell's equations can be summarized in one single elegant equation as well.)

But the reason I'd consider that optics formula to be monstrous is that no matter how you group it into smaller pieces (i.e. refactor it), there's no way to hide the fact that it is not elegant at all. There's no "understanding" there; it just so happens that the exact solution looks like that.

To put it another way: fundamental "master equations" are often simple in some sense, but exact solutions to particular manifestations of a master equation are often quite ugly and monstrous. In that sense, rather than quoting the Einstein equation, I'd quote one of its solutions, e.g. one with mass, spin, and electric charge.

P.S. The solution of the quartic equation is also a good example of this category.


There is a ton of repetition, so you could probably replace it with like five relatively small equations. Still...glad I didn't get into optics lol.


I'm pretty sure I used to get nightmares from Christoffel symbols and Riemannian geometry: https://i.stack.imgur.com/Njf17.png


Yeah, that didn't look much nastier than most problem sets


three dimensions over time, with vector components?


I recall dimly a period of months during engineering school when I would have been able to parse those symbols and perhaps make a joke about something in the lunch room. Those days are long behind me.


I would argue that it is a nightmare in general, a viewpoint of someone who actually shared this pain before.


Quantum is a trip. Pages of math to describe... 4 straight lines.


Quite the opposite: the 4 straight lines represent all that math. Notation is very powerful and gets you quite far.


Understood, and that was exactly my (poorly made) point. Crazy how powerful quantum notation is!


Slightly related to the task, I want to plug my utility app for finding LaTeX commands for characters, DeTeXt: https://venkatasg.net/apps/detext

I've gotten a lot of requests to do whole equations, but I feel that would massively increase the complication of the app for not that much benefit? How often do people want to convert a whole bunch of equations into LaTeX? My use case is usually writing my own equations and forgetting the command for a specific symbol, or looking for a symbol that looks something like X.


Stealing formulas. Sometimes one just wants to be able to copy and paste equations into their own notes.


I was thinking about how terrible of an idea it was for an ol fortune 500 company of mine to put all their information and lessons learned into some proprietary company infrastructure. This might have been alright, but at the end of the day people were uploading powerpoints. Heck it might even have been reasonable at the time, but with LLMs, it seems like storing everything in text/csv files would have been a much better idea.

The longer I live, the more I'm interested in saving all of my data into text files that I can parse later without vendor lock-in concern. Maybe other open formats as well, best tool for the job, ya know.


pptx isn't bad. Indeed, you have more structure available than just text dumps.

It's just a zip file containing a bunch of XML. And the slide XML isn't beautiful, but it's not super ugly either. Naively processing it is lossy, but not as lossy as converting it to text.

And most images end up as png in them. The most annoying thing is images with data (like equations).
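To illustrate the "zip of XML" point, here's a minimal stdlib-only sketch that pulls the visible text runs out of each slide (the function name and the bare-bones namespace handling are mine, not from any pptx library; a real extractor would also walk shapes, tables, and notes):

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used for text runs (<a:t>) inside slide XML.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"

def extract_slide_text(pptx_bytes: bytes) -> list[str]:
    """Collect the text runs from every ppt/slides/slideN.xml in a .pptx."""
    texts = []
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as zf:
        for name in sorted(zf.namelist()):
            if name.startswith("ppt/slides/slide") and name.endswith(".xml"):
                root = ET.fromstring(zf.read(name))
                for t in root.iter(f"{A_NS}t"):  # <a:t> holds the actual text
                    if t.text:
                        texts.append(t.text)
    return texts
```

Because the runs keep their position in the shape tree, you can preserve a lot more structure (titles vs. body, table cells) than a flat text dump would.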


If you're looking for more e2e math / latex aware OCR checkout https://github.com/facebookresearch/nougat



A commercial product that does the same thing and has worked very well in my experience is https://mathpix.com/. The free tier has met my needs to date.


I use the paid version and I find it well worth the money to be able to quickly compile multiple math sources into one LaTeX document for reference. It's a huge time saver and works surprisingly well, even on my handwritten notes.


I've been using mathpix for several years, and it works really well.


Nice idea. This is one of those dream problems where you can just synthesize a ton of data and solve the inverse problem. As a student this is a great way to go for a project, but can be hard to think up.


You can generate images from LaTeX code easily, but generating LaTeX code for realistic formulas seems trickier. It would be easy to end up with a ton of formulas unlike real-world ones.
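One common workaround is sampling from a small probabilistic grammar. A toy sketch of that approach (the grammar, atoms, and probabilities here are invented for illustration; real datasets like im2latex-100k instead harvest formulas from arXiv sources, which is why they look realistic):

```python
import random

ATOMS = ["x", "y", "n", "k", r"\alpha", r"\pi", "2"]
BINOPS = ["+", "-", r"\cdot"]

def random_formula(depth: int, rng: random.Random) -> str:
    """Recursively sample a LaTeX formula from a tiny grammar."""
    if depth <= 0 or rng.random() < 0.3:
        return rng.choice(ATOMS)  # leaf: a bare symbol
    form = rng.choice(["binop", "frac", "pow", "sum"])
    a = random_formula(depth - 1, rng)
    b = random_formula(depth - 1, rng)
    if form == "binop":
        return f"{a} {rng.choice(BINOPS)} {b}"
    if form == "frac":
        return rf"\frac{{{a}}}{{{b}}}"
    if form == "pow":
        return f"{a}^{{{b}}}"
    return rf"\sum_{{k=1}}^{{n}} {a}"

print(random_formula(3, random.Random(0)))
```

Rendering each sampled string with a LaTeX toolchain then gives (image, code) training pairs, but the grammar's biases directly become the model's biases, which is the trickiness the comment describes.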


Indeed, synthesizing the data well is an equally or maybe more important part of this kind of project than the particular choice of neural network architecture.


From "STEM formulas" https://news.ycombinator.com/item?id=36839748 :

> latex2sympy parses LaTeX and generates SymPy symbolic CAS Python code (w/ ANTLR) and is now merged into SymPy core, but you must install ANTLR first because it's an optional dependency. Then, sympy.lambdify will compile a symbolic expression for use with e.g. JAX, TensorFlow, or PyTorch.

  mamba install -c conda-forge sympy antlr  # pytorch tensorflow jax  # jupyterlab jupyter_console

https://news.ycombinator.com/item?id=36159017 : sympy.utilities.lambdify.lambdify(), sympytorch, sympy2jax
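Concretely, the lambdify step looks something like this (assuming SymPy is installed; `sympy.parsing.latex.parse_latex(r"\frac{x^2}{2}")` would produce the same expression from a LaTeX string, but needs the optional ANTLR dependency, so this sketch builds it directly):

```python
import sympy as sp

x = sp.symbols("x")
expr = x**2 / 2  # the expression parse_latex would return

# Compile the symbolic expression into a plain Python callable.
# Passing modules="numpy" (or a JAX/PyTorch shim) swaps the backend.
f = sp.lambdify(x, expr, modules="math")

print(f(4.0))  # 8.0
```

The same pattern, with a different `modules` argument, is what sympytorch and sympy2jax build on.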


But then add tests! Tests for LaTeX equations that had never been executable as code.

There are a number of ways to generate tests for functions and methods with and without parameter and return types.

Property-based testing is one way to auto-generate test cases.

Property testing: https://en.wikipedia.org/wiki/Property_testing

awesome-python-testing#property-based-testing: https://github.com/cleder/awesome-python-testing#property-ba...

https://github.com/HypothesisWorks/hypothesis :

> Hypothesis is a family of testing libraries which let you write tests parametrized by a source of examples. A Hypothesis implementation then generates simple and comprehensible examples that make your tests fail. This simplifies writing your tests and makes them more powerful at the same time, by letting software automate the boring bits and do them to a higher standard than a human would, freeing you to focus on the higher level test logic.

> This sort of testing is often called "property-based testing", and the most widely known implementation of the concept is the Haskell library QuickCheck, but Hypothesis differs significantly from QuickCheck and is designed to fit idiomatically and easily into existing styles of testing that you are used to, with absolutely no familiarity with Haskell or functional programming needed.
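For readers without Hypothesis installed, the core round-trip-property idea can be sketched with nothing but the stdlib (Hypothesis adds smarter input generation and automatic shrinking of failing examples on top of this; the run-length codec here is just an invented example target):

```python
import random

def run_length_encode(s: str) -> list[tuple[str, int]]:
    """'aab' -> [('a', 2), ('b', 1)]"""
    out: list[tuple[str, int]] = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + 1)
        else:
            out.append((ch, 1))
    return out

def run_length_decode(pairs: list[tuple[str, int]]) -> str:
    return "".join(ch * n for ch, n in pairs)

# Property: decoding an encoding returns the original string,
# checked over many randomly generated inputs instead of a few
# hand-picked examples.
rng = random.Random(42)
for _ in range(1000):
    s = "".join(rng.choice("ab") for _ in range(rng.randrange(20)))
    assert run_length_decode(run_length_encode(s)) == s
```

The same shape applies to the LaTeX case: render a generated equation, OCR it back, and assert the round trip preserves the parse.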

Fuzzing is another way to auto-generate tests and test cases; by testing combinations of function parameters as a traversal through a combinatorial graph.

Fuzzing: https://en.wikipedia.org/wiki/Fuzzing

Google/atheris is based on libFuzzer: https://github.com/google/atheris

Clusterfuzz supports libFuzzer and AFL: https://google.github.io/clusterfuzz/setting-up-fuzzing/libf...
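As a dependency-free illustration of the idea (real fuzzers like libFuzzer and Atheris are coverage-guided rather than purely random, and instrument the target; the toy parser and its planted bug here are invented):

```python
import random

def parse_version(s: str) -> tuple[int, int]:
    """Toy target: parse 'MAJOR.MINOR'. Deliberate bug: it indexes
    parts[1] without checking that a '.' was actually present."""
    parts = s.split(".")
    return int(parts[0]), int(parts[1])

def fuzz(target, n_cases: int = 2000, seed: int = 0) -> list[str]:
    """Throw random short strings at `target`, collecting inputs that
    crash it. ValueError counts as an expected rejection of malformed
    input; any other exception (e.g. IndexError) is a real bug."""
    rng = random.Random(seed)
    alphabet = "0123456789.x"
    crashes = []
    for _ in range(n_cases):
        s = "".join(rng.choice(alphabet) for _ in range(rng.randrange(1, 6)))
        try:
            target(s)
        except ValueError:
            pass
        except Exception:
            crashes.append(s)
    return crashes

print(fuzz(parse_version)[:3])
```

Even this blind version finds the missing-dot crash quickly; coverage guidance is what lets production fuzzers reach bugs buried behind many branches.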


I often teach online using a wacom+tablet + handwriting app. I write a lot of equations. The slides are shared with students.

What would be really nice is if I could feed slides.pdf to something like this, and it did OCR on all the handwritten content (English text or equations) and put the output as an invisible layer under the writing. That would make the slides searchable.

I understand, though, that OCR on handwritten equations is a very difficult problem.


I use OneNote for writing notes (and keep it in handwritten form). Surprisingly, it is searchable! Doesn’t work for equations though.


I don't know anything about the workflow of scientists or mathematicians. But I was wondering if equation recognition was something that could help them? Like, is there utility in seeing an equation on a whiteboard, importing it, and hooking up the right inputs and outputs from that equation?

This is very much idle daydreaming. When you write out a big equation on the wall, what happens then? Does it need to be validated? Or does it go directly into a paper and no computation is performed upon/with it?


For me at least, it is useful for different things, but they are mainly about writing. It is much easier to copy a couple of equations from your references and use something like image-to-LaTeX to get the source without having to write them yourself, especially for complicated equations. It makes it much faster to have discussions with other people online. Writing notes and copying equations from textbooks etc. was always stupidly time-consuming if you ended up typing them in LaTeX yourself.

Also, a lot of people love thinking on a blackboard, where you don't have to worry about anything else. It is easy: take a photo of what you did, erase the board, write new things, and take another photo. And when you are done and want to preserve this in notes or copy it to a paper, just use an equation recognition tool and your life is much easier.

It is a productivity tool that saves time and effort; it's not going to make you a super researcher/scientist.


Big equations don't come out of the ether. Either they are derived from some simpler set of underlying equations based on assumptions, or they are taken from a paper/book that did that derivation.

Usually, whoever does the derivation, or someone who wants to understand things properly, will do computations on multiple steps of the derivation from the start to the finish. A lot of these computations can be done by hand - you don't need a computer. A lot of computations should be done by hand - even if they could be done by a computer - because you only get a feel for the equations if you play with them with your hands. To quote Dirac, 'I consider that I understand an equation when I can predict the properties of its solutions, without actually solving it.' That comes from solving a lot of them by hand.

Yes, oftentimes doing numerical or symbolic computation with a computer helps. But is the pain point of that having to type the equation into the computer? Hardly. It would be nice, but nothing groundbreaking.


Now do it with hand-written formulas!


"Show HN: BetterOCR combines and corrects multiple OCR engines with an LLM" https://news.ycombinator.com/context?id=38056243


This makes me wonder how well GPT-4V performs on this task (I don't have access to it).



Shouldn't the sum be done first and then the multiplication? I think GPT-4V forgot to put the brackets around the sum.


Huh, yeah, it looks like it. I checked the Python code when I was using this, and that is correctly parenthesized.


Makes me wonder what the SOTA is for open source efforts along these lines.

I have heard about "mixture of experts" as being a potentially important advance, and also of course about multimodality. So I found this: https://github.com/YeonwooSung/LIMoE-pytorch


It is curious that the rendered equation (under "Sure, I can help with that") appears to be incorrect due to some missing parens, but the Python implementation itself does appear to be correct.

Now show us a version that takes into account actual representations and errors, produces an optimal implementation of the calculation for accuracy, and explains why it is. :)


Repo has been deleted? I get a 404. I did see it earlier on.

I fed the equation image (a screenshot at the right frame from their gif, then cropped) into ChatGPT (GPT4-V), and it correctly deciphered the equation and gave the correct LaTeX code.

Why was the repo removed?


All of GitHub is down.


Nice, but what are the minimum requirements to run it locally?


ViTs are usually pretty cheap to run.


Now we just need a way to convert a LaTeX equation to SciPy or NumPy or something like that.


There was something like that, but it's been abandoned for a while now... https://github.com/augustt198/latex2sympy


even a rough go would be nice; I'll go back and check variables and matrix operations, etc., but that step of going from a derivation to engineering code is a slow step in my workflow


Probably gpt4


gpt4 vision does this out of the box: https://pasteboard.co/3VSk5HfTeMmY.jpg

It might be more error prone though, didn’t test it extensively


FYI, the fraction, alignment, and exponent syntax is not correct on this output and it won’t render the same.


Keep up the good work


Sheldon? Sheldon is that you??




