Abstract:
In-memory computing (IMC) has emerged as a promising approach to addressing the von Neumann bottleneck in deep learning applications. This work proposes FP-ATM, an all-digital 6T SRAM-based macro for multiply-accumulate (MAC) operations featuring a flexible NOR Adder Tree for in-memory computing. The macro is data-aware and supports INT8 and BF16 input activations and weights for convolutional neural networks. Multiple macros can be combined in different configurations to support networks with different topologies. The macro is based on bit-serial multiplication and parallel adder trees, an architecture that enables massively parallel MAC operations with high energy efficiency and throughput. It achieves a peak energy efficiency of 267.7 TFLOPS/W at 0.65 V, 8.5 times that of the state-of-the-art work. At 0.9 V, it reaches a maximum frequency of 1.67 GHz and a throughput of 2.67 GFLOPS/Kb.
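The idea of combining bit-serial multiplication with a parallel adder tree can be illustrated with a small functional model. This is a minimal sketch under stated assumptions, not the FP-ATM circuit. It assumes two's-complement INT8 operands. One activation bit is streamed per "cycle" while all weight bits are read in parallel. The model forms each 1-bit product with a NOR gate on complemented inputs, using NOR(¬a, ¬w) = a ∧ w. It then reduces each set of products with a pairwise adder tree and accumulates the results with their binary place values. All helper names (`bit_serial_dot`, `adder_tree`, `and_via_nor`, and so on) are hypothetical, and BF16 handling is not modeled.

```python
"""Hypothetical functional model of a bit-serial INT8 dot product
reduced by an adder tree. Illustrative only; not the FP-ATM circuit."""
from typing import List


def nor(x: int, y: int) -> int:
    # 1-bit NOR gate on inputs in {0, 1}.
    return 1 - (x | y)


def and_via_nor(a: int, w: int) -> int:
    # Assumed 1-bit multiply: by De Morgan, NOR(not a, not w) == a AND w.
    return nor(1 - a, 1 - w)


def bit(value: int, k: int) -> int:
    # k-th bit of the 8-bit two's-complement encoding of value.
    return ((value & 0xFF) >> k) & 1


def place_value(k: int) -> int:
    # Two's complement: bit 7 carries weight -2**7, other bits +2**k.
    return -(1 << 7) if k == 7 else (1 << k)


def adder_tree(values: List[int]) -> int:
    # Pairwise reduction; each loop iteration is one tree level.
    level = list(values)
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad odd-length levels with a zero
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0] if level else 0


def bit_serial_dot(acts: List[int], weights: List[int]) -> int:
    if len(acts) != len(weights):
        raise ValueError("acts and weights must have the same length")
    for v in list(acts) + list(weights):
        if not -128 <= v <= 127:
            raise ValueError("operands must be INT8")
    acc = 0
    for b in range(8):        # one activation bit per cycle (bit-serial)
        for k in range(8):    # weight bits read in parallel from memory columns
            products = [and_via_nor(bit(a, b), bit(w, k))
                        for a, w in zip(acts, weights)]
            # Sum the 1-bit products in the tree, then scale by place values.
            acc += place_value(b) * place_value(k) * adder_tree(products)
    return acc
```

As a quick check, `bit_serial_dot([3, -2, 127, -128], [5, 7, -1, 2])` should equal the ordinary dot product, -382. The sketch decomposes both operands into bits only to make the signed place-value bookkeeping explicit. A hardware design would typically keep weight bits stored in the array and fold the per-bit scaling into shift-and-add logic.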