AxLaM: energy-efficient accelerator design for language models for edge computing

dc.contributor.author Issac, Tom Glint
dc.contributor.author Mittal, Bhumika
dc.contributor.author Sharma, Santripta
dc.contributor.author Ronak, Abdul
dc.contributor.author Goud, Abhinav
dc.contributor.author Kasture, Neerja
dc.contributor.author Momin, Zaqi
dc.contributor.author Krishna, Aravind
dc.contributor.author Mekie, Joycee
dc.coverage.spatial United Kingdom
dc.date.accessioned 2025-01-24T15:05:29Z
dc.date.available 2025-01-24T15:05:29Z
dc.date.issued 2025-01
dc.identifier.citation Issac, Tom Glint; Mittal, Bhumika; Sharma, Santripta; Ronak, Abdul; Goud, Abhinav; Kasture, Neerja; Momin, Zaqi; Krishna, Aravind and Mekie, Joycee, "AxLaM: energy-efficient accelerator design for language models for edge computing", Philosophical Transactions of the Royal Society A, DOI: 10.1098/rsta.2023.0395, vol. 383, no. 2288, Jan. 2025.
dc.identifier.issn 1364-503X
dc.identifier.issn 1471-2962
dc.identifier.uri https://doi.org/10.1098/rsta.2023.0395
dc.identifier.uri https://repository.iitgn.ac.in/handle/123456789/10961
dc.description.abstract Modern language models such as bidirectional encoder representations from transformers (BERT) have revolutionized natural language processing (NLP) tasks but are computationally intensive, limiting their deployment on edge devices. This paper presents an energy-efficient accelerator design tailored for encoder-based language models, enabling their integration into mobile and edge computing environments. The data-flow-aware hardware accelerator design, inspired by Simba, uses approximate fixed-point POSIT-based multipliers and high-bandwidth memory (HBM) to achieve significant improvements in computational efficiency, power consumption, area and latency over the hardware-realized scalable accelerator Simba. Compared to Simba, AxLaM achieves a ninefold energy reduction, a 58% area reduction and a 1.2-fold latency improvement, making it suitable for deployment in edge devices. The energy efficiency of AxLaM is 1.8 TOPS/W, 65% higher than that of FACT, which requires pre-processing of the language model before implementing it on the hardware.
dc.description.statementofresponsibility by Tom Glint Issac, Bhumika Mittal, Santripta Sharma, Abdul Ronak, Abhinav Goud, Neerja Kasture, Zaqi Momin, Aravind Krishna and Joycee Mekie
dc.format.extent vol. 383, no. 2288
dc.language.iso en_US
dc.publisher The Royal Society
dc.subject Transformer accelerator
dc.subject Language model BERT
dc.subject Hardware accelerator
dc.title AxLaM: energy-efficient accelerator design for language models for edge computing
dc.type Article
dc.relation.journal Philosophical Transactions of the Royal Society A
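
The abstract above refers to approximate fixed-point POSIT-based multipliers. As a rough illustration of the underlying arithmetic only, and not of the AxLaM multiplier itself (the paper's design details are behind the DOI above), the Python sketch below decodes a posit bit pattern into a real value and approximates a product with Mitchell's logarithmic method, a classic approximate-multiplication technique. The posit width n=8, the exponent field width es=1, the function names and the example operands are all assumptions made for this illustration.

    # Illustrative sketch only: decode a posit bit pattern and multiply two
    # decoded values with Mitchell's logarithmic approximation. The width
    # n=8 and es=1 are assumptions for the example, not taken from the paper.
    import math

    def decode_posit(bits, n=8, es=1):
        """Decode an n-bit posit (exponent width es) to a Python float."""
        if bits == 0:
            return 0.0
        if bits == 1 << (n - 1):
            return float("nan")              # NaR (not a real)
        sign = -1.0 if bits >> (n - 1) else 1.0
        if sign < 0:
            bits = (-bits) & ((1 << n) - 1)  # two's complement for negatives
        body = bits & ((1 << (n - 1)) - 1)   # drop the sign bit
        i = n - 2                            # MSB index of the body
        first = (body >> i) & 1
        k = 0
        while i >= 0 and ((body >> i) & 1) == first:  # regime: run of equal bits
            k += 1
            i -= 1
        regime = (k - 1) if first else -k
        if i >= 0:
            i -= 1                           # consume the regime terminator bit
        bits_left = i + 1
        exp_bits = min(es, bits_left)
        exponent = (body >> (bits_left - exp_bits)) & ((1 << exp_bits) - 1) if exp_bits else 0
        exponent <<= es - exp_bits           # pad truncated exponent bits with zeros
        bits_left -= exp_bits
        frac = body & ((1 << bits_left) - 1)
        fraction = 1.0 + frac / (1 << bits_left) if bits_left else 1.0
        return sign * fraction * 2.0 ** (regime * (1 << es) + exponent)

    def mitchell_mul(a, b):
        """Approximate a*b via log2(1+f) ~= f (max error about 11%)."""
        if a == 0.0 or b == 0.0:
            return 0.0
        sign = -1.0 if (a < 0.0) != (b < 0.0) else 1.0
        s = _approx_log2(abs(a)) + _approx_log2(abs(b))
        k = math.floor(s)
        return sign * 2.0 ** k * (1.0 + (s - k))

    def _approx_log2(x):
        m, e = math.frexp(x)                 # x = m * 2**e with m in [0.5, 1)
        return (e - 1) + (2.0 * m - 1.0)     # exponent plus linearized mantissa

    a, b = decode_posit(0x48), decode_posit(0x50)   # 1.5 and 2.0 under n=8, es=1
    print(a * b, mitchell_mul(a, b))                # exact 3.0 vs approximation

Because both posits and Mitchell's method work on a sign, a power-of-two scale and a short fraction, an approximate multiplier of this general shape avoids a full fraction multiplier in hardware; how AxLaM actually realizes its multipliers is specified in the paper itself.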

