Artificial Intelligence has been the buzzword of 2023, with Google, Meta, and Microsoft showcasing their product lines and their grand visions for harnessing AI. Amid all the pandemonium surrounding AI, Apple has been conspicuously quiet about its own AI prowess, which is perhaps why many have been asking what the company is doing to keep up in the AI arms race. The answer is simple: Apple has been working with AI in different capacities for years. What users have not had so far is something like ChatGPT running natively on their iPhones.
However, things are about to change. In a new research paper, Apple has demonstrated a breakthrough technique that can help run AI on iPhones by making bulky large language models (LLMs) workable through flash storage optimisation. If Apple integrates advanced AI into the iPhone, it will be another significant turn of events. The Cupertino-based tech giant has announced significant developments in AI through two new research papers that it showcased this month. The papers revealed new techniques for 3D avatars and for efficient language model inference.
The new research, ‘LLM in a Flash: Efficient Large Language Model Inference with Limited Memory’, published on December 12, has the potential to transform the iPhone experience, as it could let users run complex AI systems directly on iPhones and iPads, while the companion paper on 3D avatars points to more immersive visual experiences. The paper essentially focuses on running large language models efficiently on devices with limited DRAM capacity. DRAM, or dynamic random-access memory, is the fast working memory used in phones and PCs; it is much quicker to access than flash storage but far smaller in capacity, which is why a large model may not fit in it.
Here are some takeaways from the research that could put Apple ahead of its peers:
The paper addresses the challenge of running LLMs that exceed the available DRAM by storing model parameters in flash memory and loading them into DRAM on demand. It describes an inference cost model, built around the characteristics of flash and DRAM, that is used to optimise data transfers from flash memory.
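To see why the shape of flash reads matters, consider a deliberately simplified cost estimate. This is an illustrative sketch and not the paper's actual cost model: the function name, the fixed per-read latency, and the bandwidth figure are all assumed values chosen only to show the trend that fewer, larger reads cost less than many small ones.

```python
# Simplified, hypothetical model of flash read time (not the paper's model).
# Assumption: every read pays a fixed latency, then streams at full bandwidth.

def flash_read_time(num_bytes, num_reads, latency_s=1e-4, bandwidth_bps=1e9):
    """Estimated seconds to fetch num_bytes split across num_reads reads."""
    return num_reads * latency_s + num_bytes / bandwidth_bps


total_bytes = 64 * 1024 * 1024  # 64 MiB of weights (illustrative size)

# Same amount of data, fetched as many small chunks or as fewer large ones.
many_small = flash_read_time(total_bytes, num_reads=total_bytes // (4 * 1024))
few_large = flash_read_time(total_bytes, num_reads=total_bytes // (128 * 1024))

print(f"4 KiB chunks:   {many_small:.3f} s")
print(f"128 KiB chunks: {few_large:.3f} s")
```

Under these assumed numbers the per-read latency dominates when chunks are small, which is the intuition behind reading less data and reading it in bigger pieces.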
The two main techniques discussed in the paper are Windowing, which reduces data transfer by reusing previously activated neurons, and Row-Column Bundling, which increases the size of data chunks so that reads from flash memory are more efficient.
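The windowing idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Apple's implementation: the class name `NeuronWindow`, the use of integer neuron IDs, and the window size are all hypothetical. It keeps the neurons used by the last few tokens in DRAM and reports which ones would have to be read from flash for the next token.

```python
from collections import deque


class NeuronWindow:
    """Tracks which neurons stay in DRAM across a sliding window of tokens."""

    def __init__(self, window_size):
        # Active-neuron sets for the most recent tokens; the oldest set is
        # dropped automatically once the window is full.
        self.window = deque(maxlen=window_size)
        self.resident = set()  # neurons currently held in DRAM

    def step(self, active):
        """Process one token; return (neurons to load, neurons to evict)."""
        active = set(active)
        to_load = active - self.resident       # only these need a flash read
        self.window.append(active)
        needed = set().union(*self.window)     # everything the window uses
        to_evict = self.resident - needed      # no longer used by any token
        self.resident = needed
        return to_load, to_evict


window = NeuronWindow(window_size=2)
for token_neurons in ([1, 2, 3], [2, 3, 4], [4, 5]):
    loaded, evicted = window.step(token_neurons)
    print("load:", sorted(loaded), "evict:", sorted(evicted))
```

Because consecutive tokens tend to activate overlapping neurons, most of what each token needs is already resident, and only the difference is fetched.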
The paper also highlights Sparsity Exploitation, which uses the sparsity in feed-forward network (FFN) layers to load only the parameters that are needed. Another key aspect is memory management: the paper proposes strategies for managing the data already loaded in DRAM so that overhead is kept to a minimum.
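Selective loading and bundling fit together naturally, as the following sketch suggests. It is an illustration only: the function names `bundle` and `load_active`, the tiny matrices, and the use of plain Python lists in place of real flash storage are all assumptions, and the list of active neurons is taken as given rather than predicted. Each FFN neuron's column of the up-projection and row of the down-projection are stored as one record, so one read fetches both, and only the records for active neurons are loaded.

```python
# Hypothetical sketch: bundle each FFN neuron's weights into one record,
# then load only the records for neurons expected to be active.

def bundle(up_proj, down_proj):
    """up_proj is d_model x d_ff, down_proj is d_ff x d_model (lists of rows).

    Returns one record per neuron i: column i of up_proj followed by
    row i of down_proj, so a single read retrieves both.
    """
    d_ff = len(down_proj)
    return [
        [row[i] for row in up_proj] + list(down_proj[i])
        for i in range(d_ff)
    ]


def load_active(bundles, active_ids):
    """Fetch only the bundled records for the active neurons."""
    return {i: bundles[i] for i in active_ids}


up_proj = [[1, 2, 3],
           [4, 5, 6]]            # 2 x 3
down_proj = [[7, 8],
             [9, 10],
             [11, 12]]           # 3 x 2

bundles = bundle(up_proj, down_proj)
in_dram = load_active(bundles, active_ids=[0, 2])  # neuron 1 is skipped
print(in_dram)
```

In this toy case one of the three neurons never leaves storage, and each loaded neuron costs one larger read instead of two smaller ones.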
The researchers used models such as OPT 6.7B and Falcon 7B to demonstrate their approach. As per the paper, the models achieved a 4-5x increase in inference speed on CPU and a 20-25x increase on GPU compared with naive loading approaches.
In practical terms, both models ran markedly faster in resource-limited environments, the kind of setting that a phone or tablet presents.
Apple’s new research shows an innovative approach to running LLMs efficiently in hardware-constrained environments. It opens a new direction for future research in on-device AI and next-generation user experiences.
What does it mean for iPhone users?
From a user perspective, the findings on efficient LLM inference with limited memory could greatly benefit both Apple and iPhone users. With powerful LLMs running efficiently on devices with limited DRAM, such as iPhones and iPads, users would have enhanced AI capabilities at their fingertips. These include improved language processing, more sophisticated voice assistants, enhanced privacy, potentially reduced internet bandwidth usage and, most importantly, advanced AI that is accessible and responsive for all iPhone users.
However much these future capabilities may show Apple working towards a leading position in AI research and applications, experts remain cautious. Some suggest that the tech giant will need to exercise great caution and responsibility when incorporating the research findings into real-world use cases. Others have highlighted the need to consider privacy protection, ways to mitigate potential misuse, and the overall impact.