Facebook AI has built the first AI system that can solve advanced mathematics equations using symbolic reasoning. By developing a new way to represent complex mathematical expressions as a kind of language and then treating solutions as a translation problem for sequence-to-sequence neural networks, we built a system that outperforms traditional computation systems at solving integration problems and both first- and second-order differential equations.
Previously, these kinds of problems were considered out of the reach of deep learning models, because solving complex equations requires precision rather than approximation. Neural networks excel at learning to succeed through approximation, such as recognizing that a particular pattern of pixels is likely to be an image of a dog or that features of a sentence in one language match those in another. Solving complex equations also requires the ability to work with symbolic data, such as the letters in the formula b – 4ac = 7. Such variables can’t be directly added, multiplied, or divided, and using only traditional pattern matching or statistical analysis, neural networks were limited to extremely simple mathematical problems.
Our solution was an entirely new approach that treats complex equations like sentences in a language. This allowed us to leverage proven techniques in neural machine translation (NMT), training models to essentially translate problems into solutions. Implementing this approach required developing a method for breaking existing mathematical expressions into a language-like syntax, as well as generating a large-scale training data set of more than 100M paired equations and solutions.
When presented with thousands of unseen expressions — equations that weren’t part of its training data — our model performed with significantly more speed and accuracy than traditional, algebra-based equation-solving software, such as Maple, Mathematica, and Matlab. This work not only demonstrates that deep learning can be used for symbolic reasoning but also suggests that neural networks have the potential to tackle a wider variety of tasks, including those not typically associated with pattern recognition. We’re sharing details about our approach as well as methods to help others generate similar training sets.
A new way to apply NMT
Humans who are particularly good at symbolic math often rely on a kind of intuition. They have a sense of what the solution to a given problem should look like — such as observing that if there is a cosine in the function we want to integrate, then there may be a sine in its integral — and then do the necessary work to prove it. This is different from the direct calculation required for algebra. By training a model to detect patterns in symbolic equations, we believed that a neural network could piece together the clues that led to their solutions, roughly similar to a human’s intuition-based approach to complex problems. So we began exploring symbolic reasoning as an NMT problem, in which a model could predict possible solutions based on examples of problems and their matching solutions.
An example of how our approach expands an existing equation (on the left) into an expression tree that can serve as input for a translation model. For this equation, the preorder sequence input into our model would be: (plus, times, 3, power, x, 2, minus, cosine, times, 2, x, 1).
To implement this application with neural networks, we needed a novel way of representing mathematical expressions. NMT systems are typically sequence-to-sequence (seq2seq) models, using sequences of words as input, and outputting new sequences, allowing them to translate complete sentences rather than individual words. We used a two-step approach to apply this method to symbolic equations. First, we developed a process that effectively unpacks equations, laying them out in a branching, treelike structure that can then be expanded into sequences that are compatible with seq2seq models. Constants and variables act as leaves, while operators (such as plus and minus) and functions are the internal nodes that connect the branches of the tree.
Though it might not look like a traditional language, organizing expressions in this way provides a language-like syntax for equations — numbers and variables are nouns, while operators act as verbs. Our approach enables an NMT model to learn to align the patterns of a given tree-structured problem with its matching solution (also expressed as a tree), similar to matching a sentence in one language with its confirmed translation. This method lets us leverage powerful, out-of-the-box seq2seq NMT models, swapping out sequences of words for sequences of symbols.
Building a new data set for training
Though our expression-tree syntax made it theoretically possible for an NMT model to effectively translate complex math problems into solutions, training such a model would require a large set of examples. And because in the two classes of problems we focused on — integration and differential equations — a randomly generated problem does not always have a solution, we couldn’t simply collect equations and feed them into the system. We needed to generate an entirely novel training set consisting of examples of solved equations restructured as model-readable expression trees. This resulted in problem-solution pairs, similar to a corpus of sentences translated between languages. Our set would also have to be significantly larger than the training data used in previous research in this area, which has attempted to train systems on thousands of examples. Since neural networks generally perform better when they have more training data, we created a set with millions of examples.
Building this data set required us to incorporate a range of data cleaning and generation techniques. For our symbolic integration equations, for example, we flipped the translation approach around: Instead of generating problems and finding their solutions, we generated solutions and found their problem (their derivative), which is a much easier task. This approach of generating problems from their solutions — what engineers sometimes refer to as trapdoor problems — made it feasible to create millions of integration examples. Our resulting translation-inspired data set consists of roughly 100M paired examples, with subsets of integration problems as well as first- and second-order differential equations.
We used this data set to train a seq2seq transformer model with eight attention heads and six layers. Transformers are commonly used for translation tasks, and our network was built to predict the solutions for different kinds of equations, such as determining a primitive for a given function. To gauge our model’s performance, we presented it with 5,000 unseen expressions, forcing the system to recognize patterns within equations that didn’t appear in its training. Our model demonstrated 99.7 percent accuracy when solving integration problems, and 94 percent and 81.2 percent accuracy, respectively, for first- and second-order differential equations. Those results exceeded those of all three of the traditional equation solvers we tested against. Mathematica achieved the next best results, with 84 percent accuracy on the same integration problems and 77.2 percent and 61.6 percent for differential equation results. Our model also returned most predictions in less than 0.5 second, while the other systems took several minutes to find a solution and sometimes timed out entirely.
Our model took the equations on the left as input — equations that both Mathematica and Matlab were unable to solve — and was able to find correct solutions (shown on the right) in less than one second.
Comparing generated solutions to reference solutions allowed us to easily and precisely validate the results. But our model is also able to produce multiple solutions for a given equation. This is similar to what happens in machine translation, where there are many ways to translate an input sentence.
What’s next for equation-solving AI
Our model currently works on problems with a single variable, and we plan to expand it to multiple-variable equations. This approach could also be applied to other mathematics- and logic-based fields, such as physics, potentially leading to software that assists scientists in a broad range of work.
But our system has broader implications for the study and use of neural networks. By discovering a way to use deep learning where it was previously seen as unfeasible, this work suggests that other tasks could benefit from AI. Whether through the further application of NLP techniques to domains that haven’t traditionally been associated with languages, or through even more open-ended explorations of pattern recognition in new or seemingly unrelated tasks, the perceived limitations of neural networks may be limitations of imagination, not technology.