Took a peek at the models they use. It seems to be a vision transformer encoder-decoder architecture with a ResNet backbone. Looks really good. I had a similar idea of training a model and making a desktop application, but haven't had the opportunity. I wonder how much compute it took to train the model.
Want to give proper credit to my former student for starting this: Yuntian Deng et al., 2016 (https://arxiv.org/abs/1609.04938). I believe this repo uses the dataset from that paper.
I was looking for "nightmare math" in the README and was confused because I didn't find any. I guess that's what a theoretical physics degree does to you: that formula looks very harmless to me.
Looks like a horror show to me. Makes me feel embarrassed about leaving my math BS behind and going into CS. I have a half-baked retirement fantasy of moving to some fun mountain town and going to grad school in physics. Where's the best place to go skiing that has a college willing to take old, washed-up programmers as students?
Good suggestion, but maybe shooting too high. Checks off the "in the mountains" part of my fantasy life. But a place that has seminars for grad students and working physicists and has many Nobel laureates who attended as students may be above my intellectual grade.
Also, ask string theorists to show you the calculations for which they need to find the largest sheet of paper available. I've heard of people doing calculations where a single line spans the width of an A1 sheet.
Surely he could have defined some sensible quantities and notation, or exploited some symmetry, to make that monstrosity a bit more compact, no? I mean, the Einstein field equations in their simplest form are something like G = k*T for suitable definitions of G and T. But if you wrote each component of G in terms of the metric tensor, it would become huge. This is just one component of the Riemann tensor: https://i.stack.imgur.com/xkrq9.png
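For reference, the compact form alluded to above, with all the complexity hidden inside the definitions:

```latex
% Einstein field equations, compact form:
G_{\mu\nu} + \Lambda g_{\mu\nu} = \frac{8\pi G}{c^{4}}\, T_{\mu\nu},
\qquad
G_{\mu\nu} \equiv R_{\mu\nu} - \tfrac{1}{2} R\, g_{\mu\nu}
% R_{\mu\nu} packs second derivatives of the metric g_{\mu\nu};
% expanding it component by component in terms of g_{\mu\nu} is
% exactly what produces the huge expressions linked above.
```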
You are not wrong. E.g. the original Maxwell equations were quite ugly (I think there were 20 of them), but the ones taught nowadays look so elegant.
The language is important. The more you "understand" something, the simpler it becomes (an oversimplification, of course). E.g. I would not consider the Einstein equation expanded out to be natural after understanding it. (And Maxwell's equations can be summarized in a single elegant equation as well.)
But the reason I'd consider that optics formula to be monstrous is that no matter how you group it into smaller pieces (i.e. refactor it), there's no way to hide the fact that it is not elegant at all. There's no "understanding" there; it just so happens that the exact solution looks like that.
To put it another way: fundamental "master equations" are often simple in some sense, but exact solutions to particular manifestations of a master equation are often quite ugly and monstrous. In that sense, rather than quoting the Einstein equation I'd quote one of its solutions, e.g. the one with mass, spin, and electric charge (the Kerr–Newman metric).
P.S. The solution of the general quartic equation is also a good example of this category.
I recall dimly a period of months during engineering school when I would have been able to parse those symbols and perhaps make a joke about something in the lunch room. Those days are long behind me.
Slightly related to the task: I want to plug my utility app for finding LaTeX commands for characters, DeTeXt: https://venkatasg.net/apps/detext
I've gotten a lot of requests to handle whole equations, but I feel that would massively increase the complexity of the app for not much benefit. How often do people want to convert a whole bunch of equations into LaTeX? My use case is usually writing my own equations and forgetting the command for a specific symbol, or looking for a symbol that looks something like X.
I was thinking about how terrible an idea it was for an old Fortune 500 employer of mine to put all their information and lessons learned into some proprietary company infrastructure. This might have been alright, but at the end of the day people were uploading PowerPoints. Heck, it might even have been reasonable at the time, but with LLMs, it seems like storing everything in text/CSV files would have been a much better idea.
The longer I live, the more I'm interested in saving all of my data into text files that I can parse later without vendor lock-in concern. Maybe other open formats as well, best tool for the job, ya know.
pptx isn't bad. Indeed, you have more structure available than just text dumps.
It's just a zip file containing a bunch of XML. And the slide XML isn't beautiful, but it isn't super ugly either. Naively processing it is lossy, but not as lossy as converting it to text.
And most images end up as png in them. The most annoying thing is images with data (like equations).
A commercial product that does the same thing and has worked very well in my experience is https://mathpix.com/. The free tier has met my needs to date.
I use the paid version and I find it well worth the money to be able to quickly compile multiple math sources into one LaTeX document for reference. It's a huge time saver and works surprisingly well, even on my handwritten notes.
Nice idea. This is one of those dream problems where you can just synthesize a ton of data and solve the inverse problem. For a student this is a great way to go for a project, but it can be hard to think up.
You can generate images from Latex code easily, but generating Latex code for realistic formulas seems trickier. It would be easy to end up with a ton of formulas unlike real-world ones.
Indeed, synthesizing the data well is an equally or maybe more important part of this kind of project than the particular choice of neural network architecture.
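One common trick for making synthetic formulas look less random is sampling from a small recursive grammar rather than uniformly over symbols. A toy sketch (the grammar and probabilities here are illustrative assumptions, not anyone's actual pipeline):

```python
import random

def random_formula(depth=0):
    """Recursively sample a small LaTeX expression from a toy grammar.

    Biasing the templates and atoms toward real-world usage (fractions,
    powers, Greek letters) makes the synthetic data less artificial.
    """
    atoms = ["x", "y", "n", "2", r"\alpha", r"\beta"]
    # Stop recursing past a small depth, or randomly, to keep sizes sane.
    if depth > 2 or random.random() < 0.4:
        return random.choice(atoms)
    a, b = random_formula(depth + 1), random_formula(depth + 1)
    template = random.choice([
        r"\frac{%s}{%s}",
        "%s + %s",
        "%s^{%s}",
        r"\sqrt{%s + %s}",
    ])
    return template % (a, b)
```

Each sampled string can then be rendered to an image with any LaTeX toolchain (pdflatex, or matplotlib's mathtext) to produce (image, source) training pairs.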
> latex2sympy parses LaTeX into SymPy symbolic CAS expressions (via ANTLR) and is now merged into SymPy core, but you must install the ANTLR runtime first because it's an optional dependency. Then sympy.lambdify will compile a symbolic expression for use with JAX, TensorFlow, PyTorch, etc.
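A minimal sketch of that LaTeX-to-callable pipeline. `parse_latex` needs the optional `antlr4-python3-runtime` package, so this falls back to `sympify` on a hand-written equivalent when it's missing (the helper name and fallback are my own, not SymPy API):

```python
from sympy import lambdify, symbols, sympify

def latex_to_callable(latex_src, fallback_src, var="x"):
    """Parse LaTeX into a SymPy expression, then compile it to a fast
    plain-Python function via lambdify."""
    try:
        from sympy.parsing.latex import parse_latex  # needs ANTLR runtime
        expr = parse_latex(latex_src)
    except ImportError:
        expr = sympify(fallback_src)  # same formula in Python syntax
    return lambdify(symbols(var), expr, modules="math")

f = latex_to_callable(r"\frac{x^2 + 1}{2}", "(x**2 + 1)/2")
```

Swapping `modules="math"` for `"numpy"` (or a JAX/TensorFlow backend) gives a vectorized version of the same compiled function.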
> Hypothesis is a family of testing libraries which let you write tests parametrized by a source of examples. A Hypothesis implementation then generates simple and comprehensible examples that make your tests fail. This simplifies writing your tests and makes them more powerful at the same time, by letting software automate the boring bits and do them to a higher standard than a human would, freeing you to focus on the higher level test logic.
> This sort of testing is often called "property-based testing", and the most widely known implementation of the concept is the Haskell library QuickCheck, but Hypothesis differs significantly from QuickCheck and is designed to fit idiomatically and easily into existing styles of testing that you are used to, with absolutely no familiarity with Haskell or functional programming needed.
Fuzzing is another way to auto-generate tests and test cases, by treating combinations of function parameters as a traversal through a combinatorial graph.
I often teach online using a wacom+tablet + handwriting app. I write a lot of equations. The slides are shared with students.
What would be really nice is if I could feed slides.pdf to something like this, and it did OCR on all the handwritten content (English or equations) and put the output as an invisible text layer under the page. That would make the slides searchable.
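For ordinary text, at least, tools like ocrmypdf already do exactly this: they add an invisible, searchable OCR text layer under each page image (handwritten equations remain the hard part). A sketch, assuming ocrmypdf and its Tesseract dependency are installed:

```shell
# Adds an invisible, searchable text layer under each page.
# --force-ocr rasterizes and re-OCRs even pages that already have text.
ocrmypdf --force-ocr slides.pdf slides_searchable.pdf
```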
I understand, though, that OCR on handwritten equations is a very difficult problem.
I don't know anything about the workflow of scientists or mathematicians. But I was wondering if equation recognition was something that could help them? Like, is there utility in seeing an equation on a whiteboard, importing it, and hooking up the right inputs and outputs from that equation?
This is very much idle daydreaming. When you write out a big equation on the wall, what happens then? Does it need to be validated? Or does it go directly into a paper and no computation is performed upon/with it?
For me at least, it is useful for different things, but mainly for writing. It is much easier to copy a couple of equations from your references and use something like image-to-LaTeX to get the source without having to type them yourself, especially for complicated equations. It makes it much faster to have discussions with other people online. Writing notes, copying equations from textbooks, etc. was always stupidly time-consuming if you ended up retyping them in LaTeX.
Also, a lot of people love to think on a blackboard, where you don't have to worry about anything else. It is easy: take a photo of what you did, erase the board, write new things, and take another photo. And when you are done and want to preserve it in some notes or copy it into a paper, just use an equation recognition tool and your life is much easier.
It is a productivity tool that saves time and effort; it will not make you a super researcher/scientist.
Big equations don't come out of the ether. Either they are derived from some simpler set of underlying equations based on assumptions, or they are taken from a paper/book that did that derivation.
Usually, whoever does the derivation, or someone who wants to understand things properly, will do computations on multiple steps of the derivation from the start to the finish. A lot of these computations can be done by hand - you don't need a computer. A lot of computations should be done by hand - even if they could be done by a computer - because you only get a feel for the equations if you play with them with your hands. To quote Dirac, 'I consider that I understand an equation when I can predict the properties of its solutions, without actually solving it.' That comes from solving a lot of them by hand.
Yes, oftentimes doing numerical or symbolic computation with a computer helps. But is the pain point there having to type the equation into the computer? Hardly. It would be nice, but nothing groundbreaking.
Makes me wonder what the SOTA is for open source efforts along these lines.
I have heard about "mixture of experts" as being a potentially important advance, and also of course about multimodality. So I found this: https://github.com/YeonwooSung/LIMoE-pytorch
It is curious that the rendered equation (under "Sure, I can help with that") appears to be incorrect due to some missing parens, but the Python implementation itself does appear to be correct.
Now show us a version that takes into account actual representations and errors, produces an optimal implementation of the calculation for accuracy, and explains why it is. :)
Repo has been deleted? I get a 404. I did see it earlier on.
I fed the equation image (a screenshot of the right frame from their GIF, then cropped) into ChatGPT (GPT-4V), and it correctly deciphered the equation and gave the correct LaTeX code.
Even a rough go would be nice; I'll go back and check variables, matrix operations, etc. myself. But that step of going from a derivation to engineering code is a slow one in my workflow.
I think this paper was one of the first to do OCR to LaTeX: http://cs231n.stanford.edu/reports/2017/pdfs/815.pdf The paper describes an encoder-decoder architecture with a CNN encoder and an LSTM-based decoder.
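The CNN-encoder / LSTM-decoder shape described there can be sketched in a few lines of PyTorch. This is a structural toy, not the paper's model: the layer sizes are illustrative, and real systems condition the decoder with attention over the feature grid rather than the crude pooling used here:

```python
import torch
import torch.nn as nn

class Im2LatexSketch(nn.Module):
    """Toy image-to-LaTeX model: CNN encodes the formula image,
    an LSTM decodes LaTeX tokens conditioned on the image features."""

    def __init__(self, vocab_size=500, hidden=256):
        super().__init__()
        # CNN encoder: grayscale image -> grid of feature vectors.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, hidden, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, images, tokens):
        feats = self.encoder(images)           # (B, hidden, h', w')
        feats = feats.flatten(2).mean(-1)      # crude global pooling: (B, hidden)
        # Initialize the decoder state from the pooled image features.
        h0 = feats.unsqueeze(0)
        c0 = torch.zeros_like(h0)
        emb = self.embed(tokens)               # (B, T, hidden)
        dec, _ = self.decoder(emb, (h0, c0))
        return self.out(dec)                   # (B, T, vocab) token logits
```

Training would minimize cross-entropy between these logits and the ground-truth LaTeX token sequence; attention over `feats` before pooling is what the paper (and later ViT-based systems) add on top.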