Accelerating innovation with Lund University and the Codasip University Program
In recent years, Artificial Intelligence (AI) has advanced rapidly and is increasingly deployed at the edge, on the devices where data is produced. As AI models such as ChatGPT become more prevalent and more accurate, the computational requirements for inference rise as well. Meeting those requirements calls for architectural innovations that reduce both power consumption and latency.
The need for edge computing
As machine learning permeates more aspects of our daily lives, the amount of data processed by our devices is skyrocketing. To handle the heavy computational load, many devices send their data over the network and offload the processing to cloud servers. However, this approach isn’t without its drawbacks.
Transmitting data over the internet introduces latency, which can be problematic for real-time applications such as self-driving cars. Edge computing, where computation is performed locally on the device, addresses these latency issues. However, it introduces new challenges, particularly in performance and power efficiency.
Processing large volumes of data on battery-powered devices is far more challenging than on servers with a virtually unlimited power supply. Recent AI models, particularly large language models like ChatGPT, consist of billions of parameters that must be shuttled between memory and the processing units. This data movement is a bottleneck for efficient AI inference. Moving towards a more data-centric architecture, such as Near Memory Computing (NMC), can mitigate the problem by performing data-intensive calculations closer to the memory where the data resides, thereby reducing latency and power consumption.
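To make the data-movement bottleneck concrete, here is a back-of-the-envelope sketch in Python. It assumes a hypothetical 7-billion-parameter model stored in 16-bit floats, generating one token at a time (batch size 1), with every weight read from off-chip memory once per token and nothing cached on chip. The 50 GB/s memory bandwidth figure is likewise a hypothetical value chosen for illustration, not a measurement of any particular device.

```python
def weight_bytes_per_token(num_params: int, bytes_per_param: int) -> int:
    """Bytes of weights read from memory to generate one token.

    Assumption: batch size 1, each weight read once per token, no on-chip reuse.
    """
    return num_params * bytes_per_param


def arithmetic_intensity(num_params: int, bytes_per_param: int) -> float:
    """FLOPs performed per byte of weights moved.

    Each weight takes part in one multiply-accumulate (2 FLOPs) per token.
    """
    flops = 2 * num_params
    return flops / weight_bytes_per_token(num_params, bytes_per_param)


def max_tokens_per_second(num_params: int, bytes_per_param: int,
                          bandwidth_bytes_per_s: float) -> float:
    """Upper bound on token rate if memory bandwidth is the only limit."""
    return bandwidth_bytes_per_s / weight_bytes_per_token(num_params, bytes_per_param)


PARAMS = 7_000_000_000   # hypothetical 7B-parameter model
FP16 = 2                 # bytes per parameter in 16-bit floating point
BANDWIDTH = 50e9         # hypothetical 50 GB/s memory bandwidth

print(weight_bytes_per_token(PARAMS, FP16) / 1e9, "GB per token")  # 14.0 GB per token
print(arithmetic_intensity(PARAMS, FP16), "FLOP per byte")         # 1.0 FLOP per byte
print(round(max_tokens_per_second(PARAMS, FP16, BANDWIDTH), 2), "tokens/s")  # 3.57 tokens/s
```

Under these assumptions, roughly one floating-point operation is performed for every byte fetched, so the processor spends most of its time waiting on memory rather than computing. Architectures such as NMC aim to shrink exactly this cost by performing the computation near the memory instead of moving every weight across the memory bus.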