Abstract:
DeepFrack is a novel framework for improving energy efficiency and reducing latency in deep learning workloads executed on hardware accelerators. By optimally fusing layers and applying an asymmetric tiling strategy, DeepFrack addresses the limitations of traditional layer-by-layer scheduling. The efficiency of our method is demonstrated by significant performance improvements across a range of deep neural network architectures, including AlexNet, VGG, and ResNets, when run on the Eyeriss and Simba accelerators. The reductions in latency (30% to 40%) and energy consumption (30% to 50%) are reinforced by more efficient use of the on-chip buffer and a reduced external memory bandwidth bottleneck. This work contributes to the ongoing effort to design more efficient hardware accelerators for machine learning workloads.