Pix2tex: Using a ViT to convert images of equations into LaTeX code (github.com/lukas-blecher)
183 points by Tomte on Nov 3, 2023 | 64 comments


Took a peek at the models they use. It seems to be a vision transformer encoder-decoder architecture with a ResNet backbone. Looks really good. I had a similar idea of training a model and making a desktop application, but haven't had the opportunity. I wonder how much compute it took to train the model.

I think this paper was the first one to do OCR on LaTeX: http://cs231n.stanford.edu/reports/2017/pdfs/815.pdf The paper describes an encoder-decoder architecture with a CNN encoder and an LSTM-based decoder.


Want to give proper credit to my former student for starting this: Yuntian Deng et al., 2016 (https://arxiv.org/abs/1609.04938). I believe this repo uses the dataset from that paper.

Some recent cool work he's been doing: https://www.youtube.com/watch?v=lx1XcTdhalU.


For the morbidly curious, that nightmare math is someone's quantum field theory notes which they typeset in TeX:

https://rohankulkarni.me/files/notes/heidelberg_qft/12_2.pdf ("12.2 Diagrammatic expansion of partition function for Yukawa theory")


I was looking for "nightmare math" in the README and was confused because I didn't find any. I guess that's what a theoretical physics degree does to you: that formula looks very harmless to me.


Yeah, I was also let down when I found the "nightmare math" was simply integral of a generic Lagrangian density...


you're a simple integral of generic density


wow, please folks, keep it civilized! :D


Looks like a horror show to me. Makes me feel embarrassed at leaving my math bs behind and going into cs. I have an insane retirement idea of retiring to some fun mountain town and going to grad school in physics. Where's the best place to go skiing with a college that takes old washed up programmers as students?



Good suggestion, but maybe shooting too high. Checks off the "in the mountains" part of my fantasy life. But a place that has seminars for grad students and working physicists and has many Nobel laureates who attended as students may be above my intellectual grade.


Close by is Grenoble. You can take the bus from the main station to arrive at the nearest ski stations within 45mins. :)


U of NM maybe? You can go skiing just outside of Albuquerque and play golf on the same day.


Well now you have to show us something you would consider nightmare math


https://petapixel.com/2019/07/05/goodbye-aberration-physicis...

Another example is the Standard Model of particle physics. There's a way to write down the Lagrangian of the Standard Model very compactly: https://visit.cern/content/standard_model_formula_t_shirt

But if you expanded all the implied sums and terms, it probably would be monstrous. After all it is supposed to contain all the terms that this figure contains for example: https://commons.wikimedia.org/wiki/File:Standard_Model_of_El...
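For reference, the compact t-shirt version is roughly the following (quoted from memory, so take the exact signs and conventions with a grain of salt):

```latex
\mathcal{L} = -\tfrac{1}{4} F_{\mu\nu} F^{\mu\nu}
            + i \bar{\psi} \slashed{D} \psi
            + \bar{\psi}_i \, y_{ij} \, \psi_j \, \phi + \text{h.c.}
            + \left| D_\mu \phi \right|^2 - V(\phi)
```

Each term hides enormous structure: the field-strength, covariant-derivative, and Yukawa terms all carry implicit sums over gauge groups, generations, and indices, which is exactly what balloons when expanded.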

Also, ask string theorists to show you the calculations for which they need to find the largest paper available. I've heard of people doing calculations where a single line is the width of an A1 sheet.


Surely he could have defined some sensible quantities and notations, or exploited some symmetry, to make that monstrosity a bit more compact, no? I mean, the Einstein field equations in their simplest form are something like G = k*T for suitable definitions of G and T. But if you wrote each component of G in terms of the metric tensor, it would become huge. This is just one component of the Riemann tensor: https://i.stack.imgur.com/xkrq9.png
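For the curious, the "G = k*T" form spelled out one level (standard definitions, before expanding anything in terms of the metric):

```latex
G_{\mu\nu} \;\equiv\; R_{\mu\nu} - \tfrac{1}{2} R \, g_{\mu\nu}
          \;=\; \frac{8\pi G}{c^4} \, T_{\mu\nu}
```

The ugliness only appears once you write the Ricci tensor in terms of Christoffel symbols and those in terms of metric derivatives.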


You are not wrong. E.g. the original Maxwell equations were so ugly (I think there were 20 of them), but the ones that are taught nowadays look so elegant.

The language is important. The more you "understand" something, the simpler it is (an oversimplification, of course). E.g. I would not consider the Einstein equation expanded out to be natural after understanding it. (Maxwell's equations can be summarized in one single elegant equation as well.)

But the reason I'd consider that optics formula to be monstrous is that no matter how you group it into smaller pieces (i.e. refactor it), there's no way to hide the fact that it is not elegant at all. There's no "understanding" there; it just so happens that the exact solution looks like that.

To put it another way: fundamental "master equations" are often simple in some sense, but exact solutions to particular manifestations of a master equation are often quite ugly and monstrous. In that sense, rather than quoting the Einstein equation, I'd quote one of its solutions, e.g. one with mass, spin, and electric charge.

P.S. The solution of the quartic equation is also a good example of this category.


There is a ton of repetition, so you could probably replace it with like five relatively small equations. Still...glad I didn't get into optics lol.


I'm pretty sure I used to get nightmares from Christoffel symbols and Riemannian geometry: https://i.stack.imgur.com/Njf17.png


Yeah, that didn't look much nastier than most problem sets


three dimensions over time, with vector components?


I recall dimly a period of months during engineering school when I would have been able to parse those symbols and perhaps make a joke about something in the lunch room. Those days are long behind me.


I would argue that it is a nightmare in general, a viewpoint of someone who actually shared this pain before.


Quantum is a trip. Pages of math to describe... 4 straight lines.


Quite the opposite: the 4 straight lines represent all that math. Notation is very powerful and gets you quite far.


Understood, and that was exactly my (poorly made) point. Crazy how powerful quantum notation is!


Slightly related to the task, I want to plug my utility app for finding LaTeX commands for characters, DeTeXt: https://venkatasg.net/apps/detext

I've gotten a lot of requests to do whole equations, but I feel that would massively increase the complication of the app for not that much benefit? How often do people want to convert a whole bunch of equations into LaTeX? My use case is usually writing my own equations and forgetting the command for a specific symbol, or looking for a symbol that looks something like X.


Stealing formulas. Sometimes one just wants to be able to copy and paste equations into their own notes.


I was thinking about how terrible of an idea it was for an ol fortune 500 company of mine to put all their information and lessons learned into some proprietary company infrastructure. This might have been alright, but at the end of the day people were uploading powerpoints. Heck it might even have been reasonable at the time, but with LLMs, it seems like storing everything in text/csv files would have been a much better idea.

The longer I live, the more I'm interested in saving all of my data into text files that I can parse later without vendor lock-in concern. Maybe other open formats as well, best tool for the job, ya know.


pptx isn't bad. Indeed, you have more structure available than just text dumps.

It's just a zip file containing a bunch of XML. And the slide XML isn't beautiful, but it's not super ugly either. Naively processing it is lossy, but not as lossy as converting it to text.

And most images end up as png in them. The most annoying thing is images with data (like equations).
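To illustrate the "zip of XML" point, here's a minimal stdlib-only sketch that pulls the visible text runs out of each slide (the function name and the bare-bones namespace handling are mine, not from any pptx library; a real extractor would also walk shapes, tables, and notes):

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used for text runs (<a:t>) inside slide XML.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"

def extract_slide_text(pptx_bytes: bytes) -> list[str]:
    """Collect the text runs from every ppt/slides/slideN.xml in a .pptx."""
    texts = []
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as zf:
        for name in sorted(zf.namelist()):
            if name.startswith("ppt/slides/slide") and name.endswith(".xml"):
                root = ET.fromstring(zf.read(name))
                for t in root.iter(f"{A_NS}t"):  # <a:t> holds the actual text
                    if t.text:
                        texts.append(t.text)
    return texts
```

Because the runs keep their position in the shape tree, you can preserve a lot more structure (titles vs. body, table cells) than a flat text dump would.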


If you're looking for more e2e math / latex aware OCR checkout https://github.com/facebookresearch/nougat



A commercial product that does the same thing and has worked very well in my experience is https://mathpix.com/. The free tier has met my needs to date.


I use the paid version and I find it well worth the money to be able to quickly compile multiple math sources into one LaTeX document for reference. It's a huge time saver and works surprisingly well, even on my handwritten notes.


I've been using mathpix for several years, and it works really well.


Nice idea. This is one of those dream problems where you can just synthesize a ton of data and solve the inverse problem. As a student this is a great way to go for a project, but can be hard to think up.


You can generate images from LaTeX code easily, but generating LaTeX code for realistic formulas seems trickier. It would be easy to end up with a ton of formulas unlike real-world ones.
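One common workaround is sampling from a small probabilistic grammar. A toy sketch of that approach (the grammar, atoms, and probabilities here are invented for illustration; real datasets like im2latex-100k instead harvest formulas from arXiv sources, which is why they look realistic):

```python
import random

ATOMS = ["x", "y", "n", "k", r"\alpha", r"\pi", "2"]
BINOPS = ["+", "-", r"\cdot"]

def random_formula(depth: int, rng: random.Random) -> str:
    """Recursively sample a LaTeX formula from a tiny grammar."""
    if depth <= 0 or rng.random() < 0.3:
        return rng.choice(ATOMS)  # leaf: a bare symbol
    form = rng.choice(["binop", "frac", "pow", "sum"])
    a = random_formula(depth - 1, rng)
    b = random_formula(depth - 1, rng)
    if form == "binop":
        return f"{a} {rng.choice(BINOPS)} {b}"
    if form == "frac":
        return rf"\frac{{{a}}}{{{b}}}"
    if form == "pow":
        return f"{a}^{{{b}}}"
    return rf"\sum_{{k=1}}^{{n}} {a}"

print(random_formula(3, random.Random(0)))
```

Rendering each sampled string with a LaTeX toolchain then gives (image, code) training pairs, but the grammar's biases directly become the model's biases, which is the trickiness the comment describes.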


Indeed, synthesizing the data well is an equally or maybe more important part of this kind of project than the particular choice of neural network architecture.


From "STEM formulas" https://news.ycombinator.com/item?id=36839748 :

> latex2sympy parses LaTeX and generates SymPy symbolic CAS Python code (w/ ANTLR) and is now merged into SymPy core, but you must install ANTLR first because it's an optional dependency. Then, sympy.lambdify will compile a symbolic expression for use with e.g. JAX, TensorFlow, or PyTorch.

  mamba install -c conda-forge sympy antlr  # pytorch tensorflow jax  # jupyterlab jupyter_console

https://news.ycombinator.com/item?id=36159017 : sympy.utilities.lambdify.lambdify(), sympytorch, sympy2jax
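Concretely, the lambdify step looks something like this (assuming SymPy is installed; `sympy.parsing.latex.parse_latex(r"\frac{x^2}{2}")` would produce the same expression from a LaTeX string, but needs the optional ANTLR dependency, so this sketch builds it directly):

```python
import sympy as sp

x = sp.symbols("x")
expr = x**2 / 2  # the expression parse_latex would return

# Compile the symbolic expression into a plain Python callable.
# Passing modules="numpy" (or a JAX/PyTorch shim) swaps the backend.
f = sp.lambdify(x, expr, modules="math")

print(f(4.0))  # 8.0
```

The same pattern, with a different `modules` argument, is what sympytorch and sympy2jax build on.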


But then add tests! Tests for LaTeX equations that had never been executable as code.

There are a number of ways to generate tests for functions and methods with and without parameter and return types.

Property-based testing is one way to auto-generate test cases.

Property testing: https://en.wikipedia.org/wiki/Property_testing

awesome-python-testing#property-based-testing: https://github.com/cleder/awesome-python-testing#property-ba...

https://github.com/HypothesisWorks/hypothesis :

> Hypothesis is a family of testing libraries which let you write tests parametrized by a source of examples. A Hypothesis implementation then generates simple and comprehensible examples that make your tests fail. This simplifies writing your tests and makes them more powerful at the same time, by letting software automate the boring bits and do them to a higher standard than a human would, freeing you to focus on the higher level test logic.

> This sort of testing is often called "property-based testing", and the most widely known implementation of the concept is the Haskell library QuickCheck, but Hypothesis differs significantly from QuickCheck and is designed to fit idiomatically and easily into existing styles of testing that you are used to, with absolutely no familiarity with Haskell or functional programming needed.
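For readers without Hypothesis installed, the core round-trip-property idea can be sketched with nothing but the stdlib (Hypothesis adds smarter input generation and automatic shrinking of failing examples on top of this; the run-length codec here is just an invented example target):

```python
import random

def run_length_encode(s: str) -> list[tuple[str, int]]:
    """'aab' -> [('a', 2), ('b', 1)]"""
    out: list[tuple[str, int]] = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + 1)
        else:
            out.append((ch, 1))
    return out

def run_length_decode(pairs: list[tuple[str, int]]) -> str:
    return "".join(ch * n for ch, n in pairs)

# Property: decoding an encoding returns the original string,
# checked over many randomly generated inputs instead of a few
# hand-picked examples.
rng = random.Random(42)
for _ in range(1000):
    s = "".join(rng.choice("ab") for _ in range(rng.randrange(20)))
    assert run_length_decode(run_length_encode(s)) == s
```

The same shape applies to the LaTeX case: render a generated equation, OCR it back, and assert the round trip preserves the parse.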

Fuzzing is another way to auto-generate tests and test cases; by testing combinations of function parameters as a traversal through a combinatorial graph.

Fuzzing: https://en.wikipedia.org/wiki/Fuzzing

Google/atheris is based on libFuzzer: https://github.com/google/atheris

Clusterfuzz supports libFuzzer and AFL: https://google.github.io/clusterfuzz/setting-up-fuzzing/libf...
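As a dependency-free illustration of the idea (real fuzzers like libFuzzer and Atheris are coverage-guided rather than purely random, and instrument the target; the toy parser and its planted bug here are invented):

```python
import random

def parse_version(s: str) -> tuple[int, int]:
    """Toy target: parse 'MAJOR.MINOR'. Deliberate bug: it indexes
    parts[1] without checking that a '.' was actually present."""
    parts = s.split(".")
    return int(parts[0]), int(parts[1])

def fuzz(target, n_cases: int = 2000, seed: int = 0) -> list[str]:
    """Throw random short strings at `target`, collecting inputs that
    crash it. ValueError counts as an expected rejection of malformed
    input; any other exception (e.g. IndexError) is a real bug."""
    rng = random.Random(seed)
    alphabet = "0123456789.x"
    crashes = []
    for _ in range(n_cases):
        s = "".join(rng.choice(alphabet) for _ in range(rng.randrange(1, 6)))
        try:
            target(s)
        except ValueError:
            pass
        except Exception:
            crashes.append(s)
    return crashes

print(fuzz(parse_version)[:3])
```

Even this blind version finds the missing-dot crash quickly; coverage guidance is what lets production fuzzers reach bugs buried behind many branches.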


I often teach online using a wacom+tablet + handwriting app. I write a lot of equations. The slides are shared with students.

What would be really nice is if I could feed slides.pdf to something like this, and it did OCR on all the handwritten content (English text or equations) and put the output as an invisible layer under the writing. That would make the slides searchable.

I understand, though, that OCR on handwritten equations is a very difficult problem.


I use OneNote for writing notes (and keep it in handwritten form). Surprisingly, it is searchable! Doesn’t work for equations though.


I don't know anything about the workflow of scientists or mathematicians. But I was wondering if equation recognition was something that could help them? Like, is there utility in seeing an equation on a whiteboard, importing it, and hooking up the right inputs and outputs from that equation?

This is very much idle daydreaming. When you write out a big equation on the wall, what happens then? Does it need to be validated? Or does it go directly into a paper and no computation is performed upon/with it?


For me at least, it is useful for different things, but they are mainly about writing. It is much easier to copy a couple of equations from your references and use something like image-to-LaTeX to get the source without having to write them yourself, especially for complicated equations. It makes it much faster to have discussions with other people online. Writing notes and copying equations from textbooks etc. was always stupidly time-consuming if you ended up typing them in LaTeX yourself.

Also, a lot of people love thinking on a blackboard, where you don't have to worry about anything else. It is easy: take a photo of what you did, erase the board, write new things, and take another photo. And when you are done and want to preserve this in notes or copy it to a paper, just use an equation recognition tool and your life is much easier.

It is a productivity tool that saves time and effort; it's not going to make you a super researcher/scientist.


Big equations don't come out of the ether. Either they are derived from some simpler set of underlying equations based on assumptions, or they are taken from a paper/book that did that derivation.

Usually, whoever does the derivation, or someone who wants to understand things properly, will do computations on multiple steps of the derivation from the start to the finish. A lot of these computations can be done by hand - you don't need a computer. A lot of computations should be done by hand - even if they could be done by a computer - because you only get a feel for the equations if you play with them with your hands. To quote Dirac, 'I consider that I understand an equation when I can predict the properties of its solutions, without actually solving it.' That comes from solving a lot of them by hand.

Yes, oftentimes doing numerical or symbolic computation with a computer helps. But is the pain point of that having to type the equation into the computer? Hardly. It would be nice, but nothing groundbreaking.


Now do it with hand-written formulas!


"Show HN: BetterOCR combines and corrects multiple OCR engines with an LLM" https://news.ycombinator.com/context?id=38056243


This makes me wonder how well GPT-4V performs on this task (I don't have access to it).



Shouldn't the sum be done first and then the multiplication? I think GPT-4V forgot to put the brackets around the sum.


Huh, yeah, it looks like it. I checked the Python code when I was using this, and that is correctly parenthesized.


Makes me wonder what the SOTA is for open source efforts along these lines.

I have heard about "mixture of experts" as being a potentially important advance, and also of course about multimodality. So I found this: https://github.com/YeonwooSung/LIMoE-pytorch


It is curious that the rendered equation (under "Sure, I can help with that") appears to be incorrect due to some missing parens, but the Python implementation itself does appear to be correct.

Now show us a version that takes into account actual representations and errors, produces an optimal implementation of the calculation for accuracy, and explains why it is. :)


Repo has been deleted? I get a 404. I did see it earlier on.

I fed the equation image (a screenshot at the right frame from their gif, then cropped) into ChatGPT (GPT4-V), and it correctly deciphered the equation and gave the correct LaTeX code.

Why was the repo removed?


All of GitHub is down.


Nice, but what are the minimum requirements to run it locally?


ViTs are usually pretty cheap to run.


Now we just need a way to convert a LaTeX equation to SciPy or NumPy or something like that.


There was something like that, but it's been abandoned for a while now... https://github.com/augustt198/latex2sympy


even a rough go would be nice; I'll go back and check variables and matrix operations, etc., but that step of going from a derivation to engineering code is a slow step in my workflow


Probably gpt4


gpt4 vision does this out of the box: https://pasteboard.co/3VSk5HfTeMmY.jpg

It might be more error prone though, didn’t test it extensively


FYI, the fraction, alignment, and exponent syntax is not correct on this output and it won’t render the same.


Keep up the good work


Sheldon? Sheldon is that you??




