Meta is focusing on small language models (SLMs) for mobile devices, arguing that they can be as effective as larger models despite having far fewer parameters.
Research Highlights Potential of Sub-Billion Parameter Models
Meta’s latest research indicates that language models with fewer than a billion parameters can perform on par with their larger counterparts on certain tasks. The finding could reshape on-device AI applications by cutting the energy consumed during inference.
Comparison with Large Language Models
Large language models (LLMs), such as Mistral-22B with 22 billion parameters and GPT-4 with an estimated 1.76 trillion parameters, require extensive computing resources. In contrast, SLMs, such as Microsoft’s Phi-3 family, start at 3.8 billion parameters and demand far less infrastructure.
Enabling Widespread Adoption on Mobile Devices
Meta researchers argue that effective SLMs with fewer than a billion parameters could enable widespread adoption of generative AI on mobile devices, which have limited computational power compared to servers.
Experimental Findings
In their experiments, the researchers tested models with 125 million and 350 million parameters. They found that prioritizing depth over width, that is, stacking more, narrower transformer layers rather than fewer, wider ones, significantly improved performance at a fixed parameter budget.
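To make that depth-versus-width trade-off concrete, the sketch below compares rough parameter counts for two hypothetical decoder-only configurations with a similar budget. The ~12·d² per-layer estimate, the vocabulary size, and the specific layer counts and hidden sizes are illustrative assumptions, not the published MobileLLM configurations.

```python
# A minimal sketch, assuming a standard decoder-only transformer and the common
# back-of-the-envelope estimate of ~12 * d^2 weights per layer (4*d^2 attention
# + 8*d^2 feed-forward). Layer counts and hidden sizes below are illustrative,
# not the published MobileLLM configurations.

def transformer_params(num_layers: int, hidden_dim: int, vocab_size: int = 32_000) -> int:
    """Rough parameter count for a decoder-only transformer."""
    per_layer = 12 * hidden_dim ** 2       # attention + feed-forward weights
    embeddings = vocab_size * hidden_dim   # token embedding table
    return num_layers * per_layer + embeddings

# Two configurations with a similar budget of roughly 110M parameters:
wide_shallow = transformer_params(num_layers=12, hidden_dim=768)  # fewer, wider layers
deep_thin = transformer_params(num_layers=30, hidden_dim=512)     # more, narrower layers

print(f"wide/shallow: {wide_shallow / 1e6:.0f}M parameters")
print(f"deep/thin:    {deep_thin / 1e6:.0f}M parameters")
```

At roughly the same size, the deep-and-thin configuration spends its budget on more layers, which is the direction the researchers found to help at sub-billion scale.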
Introducing MobileLLM
“Contrary to the belief that data and parameter quantity are crucial for model quality, our study highlights the importance of architecture for sub-billion scale LLMs,” the researchers noted. They introduced a baseline network called MobileLLM that combines a deep-and-thin architecture, embedding sharing, and grouped-query attention. MobileLLM achieved accuracy gains of 2.7% and 4.3% over the previous state-of-the-art 125M and 350M models, respectively.
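As a rough illustration of two of those techniques, the following PyTorch sketch shows embedding sharing (the output projection reuses the token-embedding weights) and grouped-query attention, in which several query heads share one key/value head. All dimensions are assumed for illustration, and feed-forward and normalization blocks are omitted for brevity; this is not Meta’s released MobileLLM code.

```python
# A minimal sketch of embedding sharing and grouped-query attention, with
# illustrative dimensions; not Meta's released MobileLLM implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    def __init__(self, hidden_dim: int, num_q_heads: int, num_kv_heads: int):
        super().__init__()
        assert num_q_heads % num_kv_heads == 0
        self.head_dim = hidden_dim // num_q_heads
        self.num_q_heads = num_q_heads
        self.num_kv_heads = num_kv_heads
        self.q_proj = nn.Linear(hidden_dim, num_q_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(hidden_dim, num_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(hidden_dim, num_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(num_q_heads * self.head_dim, hidden_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.num_q_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.num_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.num_kv_heads, self.head_dim).transpose(1, 2)
        # Repeat each key/value head so it serves a whole group of query heads.
        group = self.num_q_heads // self.num_kv_heads
        k = k.repeat_interleave(group, dim=1)
        v = v.repeat_interleave(group, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))

class TinyDecoder(nn.Module):
    """Deep-and-thin decoder stub: attention-only layers, FFN/norms omitted."""
    def __init__(self, vocab_size: int = 32_000, hidden_dim: int = 512, num_layers: int = 30):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.layers = nn.ModuleList(
            [GroupedQueryAttention(hidden_dim, num_q_heads=8, num_kv_heads=2)
             for _ in range(num_layers)]
        )
        self.lm_head = nn.Linear(hidden_dim, vocab_size, bias=False)
        # Embedding sharing: output projection reuses the input embedding table.
        self.lm_head.weight = self.embed.weight

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h = self.embed(token_ids)
        for layer in self.layers:
            h = h + layer(h)  # residual connection; FFN and norms omitted
        return self.lm_head(h)
```

Sharing the embedding table with the output projection is particularly attractive at this scale because, for a small hidden size, the embedding matrix accounts for a large fraction of the total parameter budget.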
Demonstrated Effectiveness and Availability
The 125M and 350M MobileLLM models performed comparably to larger models such as Llama 2 on chat and API-calling tasks, underscoring the potential of small models for on-device use cases. Although MobileLLM is not yet available in Meta’s products, the researchers have released the code and data from their experiments.