Recently, the Design, Automation and Test in Europe (DATE) Conference was held in Antwerp, Belgium. Wang Zongwu, a doctoral student co-advised by Associate Professor Jiang Li and Assistant Professor He Zhezhi of the Advanced Computer Architecture Laboratory, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, won the Best Paper Award in the Test and Dependability field.
The paper is titled "Self-Terminated Write of Multi-Level Cell ReRAM for Efficient Neuromorphic Computing". The work addresses the high-resistance instability in existing memristor-based processing-in-memory (PIM) systems and proposes a circuit-level solution: adaptive programming for multi-bit cells. The proposed design is tightly integrated into existing storage and computing systems, improving both storage density and computing power. In addition, neural network deployment speed and energy efficiency improve by 4.7 times and 2 times, respectively, significantly reducing the overhead of neural network (NN) model deployment.
Background
A ReRAM-based PIM unit can significantly reduce the computational complexity of large-scale vector-matrix multiplication (VMM) operations in neural network inference, which greatly reduces the time and energy consumed by large-scale data processing. It appears to be the most promising technology for breaking the von Neumann bottleneck of traditional architectures. However, because manufacturing technology and material research are still immature, ReRAM shows many non-ideal characteristics in practical testing. The most influential are the resistance instability of a ReRAM cell across programming cycles and the variation between devices within the same programming cycle. Both seriously limit the resistance accuracy of ReRAM, so it cannot yet be applied well to high-density storage or energy-efficient in-memory accelerators. So far, the only way to achieve high accuracy is the write & verify scheme widely used in traditional non-volatile memory (NVM). However, this scheme requires dozens of programming pulses to program each memory cell, and mapping a VGG-16 network with 128M weights takes 46 minutes, which is unrealistic for deploying large-scale neural network weights. An efficient, high-precision ReRAM-based weight mapping scheme is therefore needed, and this is a key problem that every memristor-based PIM system must solve.
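To make the cost of write & verify concrete, the following is a minimal Python sketch, not the scheme evaluated in the paper. The device model is hypothetical: each pulse is assumed to move the cell a random fraction of the way toward the target, standing in for cycle-to-cycle variation. The per-weight time is derived only from the 46-minute figure above, and reading "128M" as 128 x 10^6 weights is an assumption.

```python
import random


def write_and_verify(target, tolerance, rng, max_pulses=100):
    """Toy write & verify loop: pulse, read back, repeat until in tolerance.

    Hypothetical device model: every pulse closes a random 10-50% of the
    remaining gap to the target conductance (normalized units).
    Returns the number of pulses that were applied.
    """
    g = 0.0
    for pulses in range(1, max_pulses + 1):
        # Programming pulse with a stochastic step size.
        g += (target - g) * rng.uniform(0.1, 0.5)
        # Verify read: stop once the cell is close enough to the target.
        if abs(target - g) <= tolerance:
            break
    return pulses


def per_weight_time_us(minutes, weights):
    """Average mapping time per weight, in microseconds."""
    return minutes * 60.0 / weights * 1e6


if __name__ == "__main__":
    rng = random.Random(0)
    counts = [write_and_verify(1.0, 0.02, rng) for _ in range(1000)]
    print("mean pulses per cell:", sum(counts) / len(counts))
    # Assumes 128M means 128e6 weights.
    print("time per weight (us):", per_weight_time_us(46, 128e6))
```

The pulse count varies from cell to cell because the loop cannot know in advance how far each pulse will move the device, which is why every pulse must be followed by a read.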
Research results
In 2012, Jingang Wu's team at Fudan University proposed the Self-Adaptive Write Mode (SAWM) programming scheme at VLSI, which achieved single-bit precision programming, and several research groups have continued to improve the scheme since then. However, because the resistance change of ReRAM is random and its switching delay is low, an adaptive programming circuit must offer both high precision and high feedback speed, which is quite challenging. As a result, prior works mainly focus on low-power adaptive programming circuits, even at the cost of lower accuracy. With the growing demand for storage density in the era of big data, high-precision storage has become imperative, and storage and computing systems based on the principle of current summation further amplify the impact of resistance variation. This research starts from the SAWM programming circuit proposed by Fudan University and integrates it tightly with existing ReRAM-based PIM systems. By maximizing the reuse of existing peripheral circuits, it proposes an extremely compact adaptive programming circuit: only 7 additional transistors are required per programming channel to realize adaptive cut-off for both SET and RESET. The circuit design also keeps the feedback circuit fast and achieves more precise resistance control.
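One way to see why current summation is sensitive to resistance variation is the short Python sketch below. It is an illustration under stated assumptions, not the paper's analysis: the normalized voltages and conductances, the Gaussian noise model, and the function names are all hypothetical. Each bit-line current is the sum of voltage times conductance over the rows, so independent per-cell errors accumulate in the sum.

```python
import random


def bitline_current(voltages, conductances):
    """Current summation on one bit line: I = sum(V_i * G_i)."""
    return sum(v * g for v, g in zip(voltages, conductances))


def current_error_std(rows, sigma, trials, rng):
    """Standard deviation of the bit-line current error.

    Assumes every cell has nominal conductance 1.0, every row is driven
    at 1.0 (normalized units), and each cell deviates by independent
    Gaussian noise with standard deviation sigma.
    """
    voltages = [1.0] * rows
    nominal = [1.0] * rows
    ideal = bitline_current(voltages, nominal)
    errors = []
    for _ in range(trials):
        noisy = [g + rng.gauss(0.0, sigma) for g in nominal]
        errors.append(bitline_current(voltages, noisy) - ideal)
    mean = sum(errors) / trials
    return (sum((e - mean) ** 2 for e in errors) / trials) ** 0.5


if __name__ == "__main__":
    rng = random.Random(1)
    for rows in (1, 16, 64):
        print(rows, "rows:", current_error_std(rows, 0.05, 2000, rng))
```

Under these assumptions the error on the summed current grows with the number of active rows, while the smallest current step to be resolved stays fixed at one conductance level, so a per-cell variation that is tolerable for storage can still corrupt a computed sum.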
Fig. 1 Left: schematic diagram of adaptive circuit implementation; Right: Simulation waveform
Figure 1 shows the schematic diagram of the designed circuit and the SPICE simulation results. The SET and RESET programming operations are each cut off adaptively, and the leakage current is lower than 1 nA. In the SPICE simulation, the ReRAM model is the Stanford model and the MOSFET models come from a commercial 65 nm PDK. Taking the variation of both ReRAM and MOSFET into account, a 10,000-run Monte Carlo simulation shows that the circuit achieves 2-bit precision programming for both storage and PIM systems, as shown in Figure 2.
Fig. 2 Monte Carlo simulation results of the ReRAM resistance distribution
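The idea behind checking 2-bit precision with Monte Carlo runs can be sketched in a few lines of Python. This is not the SPICE setup used in the paper: the four resistance levels, the Gaussian spread, and the midpoint thresholds are hypothetical values chosen only to show the method. A cell counts as misread when its sampled resistance falls on the wrong side of a threshold between adjacent levels.

```python
import random


def misread_rate(levels, rel_sigma, runs, rng):
    """Fraction of Monte Carlo samples decoded to the wrong level.

    levels must be in ascending order. Each sample picks a level at
    random and draws a resistance from a Gaussian centered on it, with
    standard deviation rel_sigma times the nominal value (an assumption).
    """
    # Decision thresholds: midpoints between adjacent nominal levels.
    thresholds = [(a + b) / 2 for a, b in zip(levels, levels[1:])]
    errors = 0
    for _ in range(runs):
        idx = rng.randrange(len(levels))
        r = rng.gauss(levels[idx], rel_sigma * levels[idx])
        # The decoded level is the number of thresholds the sample exceeds.
        decoded = sum(r > t for t in thresholds)
        errors += decoded != idx
    return errors / runs


if __name__ == "__main__":
    rng = random.Random(2)
    levels_ohm = [10e3, 20e3, 40e3, 80e3]  # hypothetical 2-bit levels
    for rel_sigma in (0.05, 0.2, 0.5):
        print(rel_sigma, misread_rate(levels_ohm, rel_sigma, 10000, rng))
```

Four levels can be told apart only if the spread of each one is small compared with the gap to its neighbors, which is what tighter resistance control during programming is meant to provide.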
Design, Automation and Test in Europe
DATE is an annual conference on electronic design automation held in Europe. Since its establishment 30 years ago, DATE has become a meeting point for leading scholars and industry experts in electronic design and test from around the world. The conference covers all aspects of research on electronic and embedded systems engineering, including design, test, and tools for automating electronic product design, from integrated circuits to large-scale distributed systems. Its scope also includes design requirements and new architectures for challenging application areas such as telecommunications, wireless communications, multimedia, healthcare, and automotive systems.
Paper link: https://github.com/ZongwuWang/DATE_2022