Researchers from MIT and several other institutions have introduced an innovative technique that enhances the problem-solving capabilities of large language models by integrating programming and natural language. This new method, termed natural language embedded programs (NLEPs), significantly improves the accuracy and transparency of AI in tasks requiring numerical or symbolic reasoning.
Traditionally, large language models like those behind ChatGPT have excelled in tasks such as drafting documents, analysing sentiment, or translating languages. However, these models often struggle with tasks that demand numerical or symbolic reasoning. For instance, while a model might recite a list of U.S. presidents and their birthdays, it might falter when asked to identify which presidents elected after 1950 were born on a Wednesday. The solution to such problems lies beyond mere language processing.
MIT researchers propose a groundbreaking approach where the language model generates and executes a Python program to solve complex queries. NLEPs work by prompting the model to create a detailed program that processes the necessary data and then presents the solution in natural language. This method enhances the model's ability to perform a wide range of reasoning tasks with higher accuracy.
How NLEPs Work
NLEPs follow a structured four-step process. First, the model identifies and calls the necessary functions to tackle the task. Next, it imports relevant natural language data required for the task, such as a list of presidents and their birthdays. In the third step, the model writes a function to calculate the answer. Finally, it outputs the result in natural language, potentially accompanied by data visualisations.
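To make the four steps concrete, here is a minimal sketch of the kind of program an NLEP prompt might lead a model to generate for the presidents question. The data table is an illustrative subset embedded by hand (step two), and the function and its name are assumptions for this example, not code from the MIT paper:

```python
from datetime import date

# Step 2: embed the natural language facts as structured data.
# (Illustrative subset: name, birth date, year first elected president.)
PRESIDENTS = [
    ("John F. Kennedy", date(1917, 5, 29), 1960),
    ("Jimmy Carter", date(1924, 10, 1), 1976),
    ("Ronald Reagan", date(1911, 2, 6), 1980),
    ("Bill Clinton", date(1946, 8, 19), 1992),
    ("Barack Obama", date(1961, 8, 4), 2008),
    ("Joe Biden", date(1942, 11, 20), 2020),
]

# Step 3: write a function that calculates the answer.
def born_on_weekday_elected_after(weekday: str, year: int) -> list[str]:
    return [name for name, born, elected in PRESIDENTS
            if elected > year and born.strftime("%A") == weekday]

# Step 4: present the result in natural language.
matches = born_on_weekday_elected_after("Wednesday", 1950)
print("Presidents elected after 1950 who were born on a Wednesday: "
      + (", ".join(matches) or "none found"))
```

Because the program is ordinary Python, a user can inspect the data table and the filter logic directly, and correct either one without re-prompting the model.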
This structured approach allows users to understand and verify the program's logic, increasing transparency and trust in the AI's reasoning. Errors in the code can be directly addressed, avoiding the need to rerun the entire model, thus improving efficiency.
One significant advantage of NLEPs is their generalisability. A single NLEP prompt can handle various tasks, reducing the need for multiple task-specific prompts. This makes the approach not only more efficient but also more versatile.
The researchers demonstrated that NLEPs could achieve over 90 percent accuracy across a range of symbolic reasoning tasks, outperforming traditional task-specific prompting methods by 30 percent. The approach also improved the performance of open-source language models.
NLEPs offer an additional benefit of improved data privacy. Since the programs run locally, sensitive user data does not need to be sent to external servers for processing. This approach also allows smaller language models to perform better without expensive retraining.
Despite these advantages, NLEPs rely on the model's program generation capabilities, meaning they may not work as well with smaller models trained on limited datasets. Future research aims to enhance the effectiveness of NLEPs in smaller models and explore how different prompts can further improve the robustness of the reasoning processes.
The introduction of natural language embedded programs marks a significant step forward in combining the strengths of programming and natural language processing in AI. This innovative approach not only enhances the accuracy and transparency of language models but also opens new possibilities for their application in complex problem-solving tasks. As researchers continue to refine this technique, NLEPs could become a cornerstone in the development of trustworthy and efficient AI systems.