“Autonomous Driving with Spiking Neural Networks” by Ph.D. Candidate Rui-Jie Zhu Accepted at NeurIPS 2024

Autonomous driving demands an integrated approach that encompasses perception, prediction, and planning, all while operating under strict energy constraints to enhance scalability and environmental sustainability. We present Spiking Autonomous Driving (SAD), the first unified Spiking Neural Network (SNN) to address the energy challenges faced by autonomous driving systems through its event-driven and energy-efficient nature. SAD is trained end-to-end and consists of three main modules: perception, which processes inputs from multi-view cameras to construct a spatiotemporal bird’s eye view; prediction, which utilizes a novel dual-pathway with spiking neurons to forecast future states; and planning, which generates safe trajectories considering predicted occupancy, traffic rules, and ride comfort. Evaluated on the nuScenes dataset, SAD achieves competitive performance in perception, prediction, and planning tasks, while drawing upon the energy efficiency of SNNs. This work highlights the potential of neuromorphic computing to be applied to energy-efficient autonomous driving, a critical step toward sustainable and safety-critical automotive technology. Our code is available at https://github.com/ridgerchu/SAD.
Link: https://arxiv.org/abs/2405.19687
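
For a rough sense of how the three modules compose, here is a minimal structural sketch of the perception → prediction → planning pipeline described in the abstract. The module internals, tensor shapes, and class names are illustrative assumptions only; the released code at the GitHub link above is the authoritative implementation and uses spiking neurons throughout.

```python
# Illustrative sketch of the perception -> prediction -> planning composition.
# All module bodies are placeholders, not the actual SAD architecture.
import torch
import torch.nn as nn

class Perception(nn.Module):
    """Fuses multi-view camera inputs into a bird's-eye-view (BEV) feature grid."""
    def forward(self, multi_view_images):            # (T, N_cam, C, H, W)
        T = multi_view_images.shape[0]
        return torch.zeros(T, 64, 200, 200)           # placeholder BEV features

class Prediction(nn.Module):
    """Forecasts future BEV occupancy (dual-pathway in the real model)."""
    def forward(self, bev_features):
        return torch.zeros(4, 64, 200, 200)            # placeholder future states

class Planning(nn.Module):
    """Selects a trajectory given predicted occupancy and rule/comfort costs."""
    def forward(self, future_states):
        return torch.zeros(6, 2)                       # placeholder (x, y) waypoints

class SADPipeline(nn.Module):
    def __init__(self):
        super().__init__()
        self.perception, self.prediction, self.planning = Perception(), Prediction(), Planning()

    def forward(self, multi_view_images):
        bev = self.perception(multi_view_images)
        future = self.prediction(bev)
        return self.planning(future)

trajectory = SADPipeline()(torch.zeros(3, 6, 3, 224, 480))   # 3 frames, 6 cameras
print(trajectory.shape)                                       # torch.Size([6, 2])
```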

“Reducing Data Bottlenecks in Distributed, Heterogeneous Neural Networks” by Undergraduate Researcher Ruhai Lin Accepted at IEEE MCSoC-2024

The rapid advancement of embedded multicore and many-core systems has revolutionized computing, enabling the development of high-performance, energy-efficient solutions for a wide range of applications. As models scale up in size, data movement increasingly becomes the bottleneck to performance. This data movement can occur between processor and memory, or between cores and chips. This paper investigates the impact of bottleneck size, in terms of inter-chip data traffic, on the performance of deep learning models in embedded multicore and many-core systems. We conduct a systematic analysis of the relationship between bottleneck size, computational resource utilization, and model accuracy. We apply a hardware-software co-design methodology in which data bottlenecks are replaced with extremely narrow layers to reduce the amount of data traffic. In effect, time-multiplexing of signals is replaced by learnable embeddings that reduce the demands on chip IOs. Our experiments on the CIFAR100 dataset demonstrate that classification accuracy generally decreases as the bottleneck ratio increases, with shallower models experiencing a more significant drop than deeper models. Hardware-side evaluation reveals that higher bottleneck ratios lead to substantial reductions in data transfer volume across the layers of the neural network. Through this research, we can determine the trade-off between data transfer volume and model performance, enabling the identification of a balanced point that achieves good performance while minimizing data transfer volume. This characteristic allows for the development of efficient models …
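
As a rough illustration of the co-design idea, the sketch below places a narrow, learnable layer at the point where activations would cross a chip boundary, so far fewer values need to be transferred. The layer widths and the `bottleneck_ratio` parameter are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: replace a wide inter-chip activation transfer with a narrow,
# learnable bottleneck so fewer values cross the chip boundary.
import torch
import torch.nn as nn

def split_model(width=512, bottleneck_ratio=8):
    narrow = width // bottleneck_ratio             # values actually sent off-chip
    chip_a = nn.Sequential(                        # runs on the first chip
        nn.Linear(3 * 32 * 32, width), nn.ReLU(),
        nn.Linear(width, narrow),                  # learnable embedding at the IO boundary
    )
    chip_b = nn.Sequential(                        # runs on the second chip
        nn.Linear(narrow, width), nn.ReLU(),
        nn.Linear(width, 100),                     # CIFAR100 classes
    )
    return chip_a, chip_b

chip_a, chip_b = split_model(bottleneck_ratio=8)
x = torch.randn(1, 3 * 32 * 32)                    # a flattened CIFAR100 image
off_chip = chip_a(x)                               # only 64 values cross the boundary
logits = chip_b(off_chip)
print(off_chip.shape, logits.shape)
```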

“Evaluation and mitigation of cognitive biases in medical language models” published in npj Digital Medicine

Increasing interest in applying large language models (LLMs) to medicine is due in part to their impressive performance on medical exam questions. However, these exams do not capture the complexity of real patient–doctor interactions because of factors like patient compliance, experience, and cognitive bias. We hypothesized that LLMs would produce less accurate responses when faced with clinically biased questions as compared to unbiased ones. To test this, we developed the BiasMedQA dataset, which consists of 1273 USMLE questions modified to replicate common clinically relevant cognitive biases. We assessed six LLMs on BiasMedQA and found that GPT-4 stood out for its resilience to bias, in contrast to Llama 2 70B-chat and PMC Llama 13B, which showed large drops in performance. Additionally, we introduced three bias mitigation strategies, which improved but did not fully restore accuracy. Our findings highlight the need to improve LLMs’ robustness to cognitive biases, in order to achieve more reliable applications of LLMs in healthcare.

Link: https://www.nature.com/articles/s41746-024-01283-6

“Neuromorphic intermediate representation: a unified instruction set for interoperable brain-inspired computing” Published in Nature Communications

Spiking neural networks and neuromorphic hardware platforms that simulate neuronal dynamics are attracting wide attention and are being applied to many relevant machine learning problems. Despite a well-established mathematical foundation for neural dynamics, there exist numerous software and hardware solutions and stacks whose variability makes it difficult to reproduce findings. Here, we establish a common reference frame for computations in digital neuromorphic systems, titled Neuromorphic Intermediate Representation (NIR). NIR defines a set of computational and composable model primitives as hybrid systems combining continuous-time dynamics and discrete events. By abstracting away assumptions around discretization and hardware constraints, NIR faithfully captures the computational model, while bridging differences between the evaluated implementation and the underlying mathematical formalism. NIR supports an unprecedented number of neuromorphic systems, which we demonstrate by reproducing three spiking neural network models of different complexity across 7 neuromorphic simulators and 4 digital hardware platforms. NIR decouples the development of neuromorphic hardware and software, enabling interoperability between platforms and improving accessibility to multiple neuromorphic technologies. We believe that NIR is a key next step in brain-inspired hardware-software co-evolution, enabling research towards the implementation of energy-efficient computational principles of nervous systems. NIR is available at neuroir.org.

Link: https://www.nature.com/articles/s41467-024-52259-9
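
To make the "hybrid system" idea concrete, the sketch below integrates the standard leaky integrate-and-fire (LIF) equation in continuous time and emits discrete spike events at threshold crossings. It illustrates the kind of primitive NIR describes; it is not the NIR package API, and the parameter values are arbitrary.

```python
# Illustrative LIF primitive: continuous-time membrane dynamics
#   tau * dv/dt = (v_leak - v) + R * I(t)
# plus a discrete event (spike + reset) whenever v crosses v_threshold.
import numpy as np

def lif(I, dt=1e-4, tau=1e-2, R=1.0, v_leak=0.0, v_threshold=1.0):
    v, spikes = v_leak, []
    for t, i_t in enumerate(I):
        v += dt / tau * ((v_leak - v) + R * i_t)   # forward-Euler step of the ODE
        if v >= v_threshold:                       # discrete event: spike and reset
            spikes.append(t * dt)
            v = v_leak
    return spikes

spike_times = lif(I=np.full(10_000, 1.5))          # constant input current
print(len(spike_times), "spikes in 1 s of simulated time")
```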

“SpikeGPT: Generative pre-trained language model with spiking neural networks” by Ph.D. Candidate Rui-Jie Zhu Published in Transactions on Machine Learning Research

As the size of large language models continues to scale, so do the computational resources required to run them. Spiking Neural Networks (SNNs) have emerged as an energy-efficient approach to deep learning that leverages sparse and event-driven activations to reduce the computational overhead associated with model inference. While they have become competitive with non-spiking models on many computer vision tasks, SNNs have also proven more challenging to train. As a result, their performance lags behind modern deep learning, and we have yet to see the effectiveness of SNNs in language generation. In this paper, inspired by the Receptance Weighted Key Value (RWKV) language model, we successfully implement `SpikeGPT’, a generative language model with binary, event-driven spiking activation units. We train two variants of the proposed model, with 45M and 216M parameters. To the best of our knowledge, SpikeGPT is the largest backpropagation-trained SNN model to date, rendering it suitable for both the generation and comprehension of natural language. We achieve this by modifying the transformer block, replacing multi-head self-attention so as to reduce quadratic computational complexity O(N^2) to linear complexity O(N) with increasing sequence length. Input tokens are instead streamed in sequentially to our attention mechanism (as with typical SNNs). Our preliminary experiments show that SpikeGPT remains competitive with non-spiking models on tested benchmarks, while using 20x fewer operations when processed on neuromorphic hardware that can leverage sparse, event-driven activations.
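
A much-simplified picture of the linear-complexity token mixing, where each token triggers one constant-cost state update and the outputs are binarized into spikes, is sketched below. This is an illustration of the general RWKV-style recurrence, not the exact SpikeGPT formulation; the gating, state layout, and threshold are assumptions made for clarity.

```python
# Sketch: tokens are streamed one at a time, the attention-like mixing is a
# running (numerator, denominator) state, and outputs are binarized into spikes,
# so cost grows linearly with sequence length instead of quadratically.
import numpy as np

def recurrent_spiking_mix(r, k, v, threshold=0.5):
    d = v.shape[1]
    num, den = np.zeros(d), 1e-9
    spikes = []
    for r_t, k_t, v_t in zip(r, k, v):                        # one O(d) update per token
        num = num + np.exp(k_t) * v_t
        den = den + np.exp(k_t)
        out = 1.0 / (1.0 + np.exp(-r_t)) * (num / den)        # receptance-gated read-out
        spikes.append((out > threshold).astype(np.float32))   # binary, event-driven activation
    return np.stack(spikes)

T, d = 8, 4                                                   # toy sequence length and width
rng = np.random.default_rng(0)
print(recurrent_spiking_mix(rng.normal(size=(T, d)),
                            rng.normal(size=(T, d)),
                            rng.normal(size=(T, d))).shape)
```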

New Preprint: “Scalable MatMul-free Language Modeling” by Ph.D. Candidate Ruijie Zhu

The cost of processing language models is insane. The computational demands of ChatGPT are estimated at more than $100,000 per day to serve the billions of requests it receives.

Led by Rui-Jie Zhu, we have developed the first MatMul-free language model (VMM/MMM-free) to scale beyond a billion parameters. Our previous work with SpikeGPT tapped out at about 216M parameters, but our latest model has been able to go up to 2.7B parameters (only limited by compute). We’re pretty certain it can keep going.

We provide a GPU-optimized implementation that uses 61% less VRAM than an unoptimized implementation during training.

However, there are several operations in this model that GPUs aren’t yet fully optimized for, such as ternary operations. So Ethan Sifferman, Tyler Sheaves and Dustin R. built a custom FPGA implementation to really milk it, and we can reach human-reading throughput at 13 W, a little less than the power consumed by the human brain.
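
To give a flavour of why ternary weights remove the matrix multiplies, the sketch below quantizes a weight matrix to {-1, 0, +1} so each output element becomes a signed accumulation of inputs rather than a multiply-accumulate. The quantization rule and threshold here are illustrative assumptions, not the paper's exact scheme.

```python
# Sketch: with weights constrained to {-1, 0, +1}, a "matrix multiply" reduces
# to adding, subtracting, or skipping input elements -- no multiplications needed.
import numpy as np

def ternarize(W, threshold=0.05):
    """Quantize a real-valued weight matrix to {-1, 0, +1}."""
    return np.sign(W) * (np.abs(W) > threshold)

def matmul_free_linear(x, W_ternary):
    """y[j] = sum of x[i] where W[i, j] == +1, minus sum where W[i, j] == -1."""
    y = np.zeros(W_ternary.shape[1])
    for j in range(W_ternary.shape[1]):
        y[j] = x[W_ternary[:, j] == 1].sum() - x[W_ternary[:, j] == -1].sum()
    return y

rng = np.random.default_rng(0)
x, W = rng.normal(size=16), rng.normal(size=(16, 4)) * 0.1
Wt = ternarize(W)
print(np.allclose(matmul_free_linear(x, Wt), x @ Wt))   # True: additions reproduce the matmul
```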

Preprint: https://lnkd.in/gaWbg7ss

GitHub training code: https://lnkd.in/gKFzQs_z

Pre-trained models on HuggingFace: https://lnkd.in/gDXFjPdm


New Preprint: “Autonomous Driving with Spiking Neural Networks” by Ph.D. Candidate Ruijie Zhu

Spiking Autonomous Driving

From the guy who built the first spiking language generation model, Rui-Jie Zhu has found a way to make spiking neural networks (SNNs) perform end-to-end autonomous vehicle control. This model takes 6-camera input and integrates perception, prediction and planning into a single model with approximately 75x fewer operations than ST-P3 at comparable performance.

Pushing SNNs beyond toy datasets has been tough, but we’ve put a lot of effort into showing how they can scale to challenging, real-world problems. The next step for this model is to push it into a closed-loop system. Deploying models like this on low-latency neuromorphic hardware can enable fast response times from sensor to control. This is necessary if we want to bridge the sim2real gap: by the time you take an action, you don’t want the world to have changed by too much.

Rather than forcing “spiking” into applications for the sake of it, it’s important to take it to domains where there is a computational benefit – and I think this is one of them.

Preprint: https://arxiv.org/abs/2405.19687

Code: https://github.com/ridgerchu/SAD

“Knowledge Distillation Through Time for Future Event Prediction” Presented at ICLR by Undergraduate Researcher Skye Gunasekaran

Abstract: Is it possible to learn from the future? Here, we introduce knowledge distillation through time (KDTT). In traditional knowledge distillation (KD), a reliable teacher model is used to train an error-prone student model. The difference between the teacher and student is typically model capacity: the teacher has a larger architecture. In KDTT, the teacher and student models instead differ in their assigned tasks. The teacher model is tasked with detecting events in sequential data, a simple task compared to that of the student model, which is challenged with forecasting those events in the future. Through KDTT, the student can use the ’future’ logits from a teacher model to extract temporal uncertainty. We show the efficacy of KDTT on seizure prediction, where the student forecaster achieves a 20.0% average increase in the area under the receiver operating characteristic curve (AUC-ROC).
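
One way to picture the KDTT objective: the teacher classifies an event at a future time step, and the student, which only sees data up to the present, is trained against those "future" teacher logits in addition to the hard labels. The sketch below is a simplified rendering of that idea; the loss weighting, temperature, and toy tensors are arbitrary assumptions rather than the paper's exact training setup.

```python
# Sketch: distillation through time. The teacher detects an event in the window
# ending at t + horizon; the student forecasts it from the window ending at t.
import torch
import torch.nn.functional as F

def kdtt_loss(student_logits, teacher_future_logits, labels, alpha=0.5, T=2.0):
    """Blend hard-label loss with KL divergence to the teacher's future logits."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_future_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    return alpha * hard + (1 - alpha) * soft

# Toy usage with random logits standing in for the detection/forecasting models.
student_logits = torch.randn(8, 2, requires_grad=True)
teacher_future_logits = torch.randn(8, 2)
labels = torch.randint(0, 2, (8,))
print(kdtt_loss(student_logits, teacher_future_logits, labels))
```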

New Paper: “Optically Tunable Electrical Oscillations in Oxide-Based Memristors for Neuromorphic Computing” led by Collaborator Dr. Shimul K. Nath

optical memristor
The application of hardware-based neural networks can be enhanced by integrating sensory neurons and synapses that enable direct input from external stimuli. Here, we report direct optical control of an oscillatory neuron based on volatile threshold switching in V3O5. The devices exhibit electroforming-free operation with switching parameters that can be tuned by optical illumination. Using temperature-dependent electrical measurements, conductive atomic force microscopy (C-AFM), in-situ thermal imaging, and lumped-element modelling, we show that the changes in switching parameters, including threshold and hold voltages, arise from an overall conductivity increase of the oxide film due to the contribution of both the photo-conductive and bolometric characteristics of V3O5, which eventually affects the oscillation dynamics. Furthermore, our investigation reveals V3O5 as a new bolometric material with a remarkable temperature coefficient of resistivity (TCR) as high as −4.6% K⁻¹ at 423 K. We show the utility of the optically tuneable device response and spiking frequency by demonstrating in-sensor reservoir computing with reduced computational effort, and an optical encoding layer for a spiking neural network, respectively, using a simulated array of devices.

New Preprint: “Addressing cognitive bias in medical language models” led by Ph.D. Candidate Samuel Schmidgall

Preprint link here.

Abstract: The integration of large language models (LLMs) into the medical field has gained significant attention due to their promising accuracy in simulated clinical decision-making settings. However, clinical decision-making is more complex than simulations because physicians’ decisions are shaped by many factors, including the presence of cognitive bias. Yet the degree to which LLMs are susceptible to the same cognitive biases that affect human clinicians remains unexplored. Our hypothesis posits that when LLMs are confronted with clinical questions containing cognitive biases, they will yield significantly less accurate responses compared to the same questions presented without such biases. In this study, we developed BiasMedQA, a novel benchmark for evaluating cognitive biases in LLMs applied to medical tasks. Using BiasMedQA we evaluated six LLMs, namely GPT-4, Mixtral-8x7B, GPT-3.5, PaLM-2, Llama 2 70B-chat, and the medically specialized PMC Llama 13B. We tested these models on 1,273 questions from the US Medical Licensing Exam (USMLE) Steps 1, 2, and 3, modified to replicate common clinically relevant cognitive biases. Our analysis revealed varying effects of bias on these LLMs, with GPT-4 standing out for its resilience to bias, in contrast to Llama 2 70B-chat and PMC Llama 13B, which were disproportionately affected by cognitive bias. Our findings highlight the critical need for bias mitigation in the development of medical LLMs, pointing towards safer and more reliable applications in healthcare.

New Paper: “To spike or not to spike: A digital hardware perspective on deep learning acceleration” led by Dr. Fabrizio Ottati in IEEE JETCAS

Find the paper on IEEE Xplore here.

Abstract:

As deep learning models scale, they become increasingly competitive across domains spanning computer vision to natural language processing; however, this comes at the expense of efficiency, since they require increasingly more memory and computing power. The power efficiency of the biological brain outperforms any large-scale deep learning (DL) model; thus, neuromorphic computing tries to mimic brain operations, such as spike-based information processing, to improve the efficiency of DL models. Despite the benefits of the brain, such as efficient information transmission, dense neuronal interconnects, and the co-location of computation and memory, the available biological substrate has severely constrained the evolution of biological brains. Electronic hardware does not have the same constraints; therefore, while modeling spiking neural networks (SNNs) might uncover one piece of the puzzle, the design of efficient hardware backends for SNNs needs further investigation, potentially taking inspiration from the work already done on the artificial neural networks (ANNs) side. As such, when is it wise to look at the brain while designing new hardware, and when should it be ignored? To answer this question, we quantitatively compare the digital hardware acceleration techniques and platforms of ANNs and SNNs. We provide the following insights: (i) ANNs currently process static data more efficiently; (ii) applications targeting data produced by neuromorphic sensors, such as event-based cameras and silicon cochleas, need more investigation, since the behavior of these sensors may naturally fit the SNN paradigm; and (iii) hybrid approaches combining SNNs and ANNs might lead to the best solutions and should be investigated further at the hardware level, accounting for both efficiency and loss optimization.

New Paper: “Capturing the Pulse: A State-of-the-Art Review on Camera-Based Jugular Vein Assessment” led by Ph.D. Candidate Coen Arrow in Biomedical Optics Express

See the full paper here.

Abstract

Heart failure is associated with a rehospitalisation rate of up to 50% within six months. Elevated central venous pressure may serve as an early warning sign. While invasive procedures are used to measure central venous pressure for guiding treatment in hospital, this becomes impractical upon discharge. A non-invasive estimation technique exists, where the clinician visually inspects the pulsation of the jugular veins in the neck, but it is less reliable due to human limitations. Video and signal processing technologies may offer a high-fidelity alternative. This state-of-the-art review analyses existing literature on camera-based methods for jugular vein assessment. We summarize key design considerations and suggest avenues for future research. Our review highlights the neck as a rich imaging target beyond the jugular veins, capturing comprehensive cardiac signals, and outlines factors affecting signal quality and measurement accuracy. Addressing an often quoted limitation in the field, we also propose minimum reporting standards for future studies.

Brain-Inspired Machine Learning at UCSC: Class Tape-out Success

This quarter, I introduced Brain-Inspired Machine Learning as a course at the University of California, Santa Cruz. And while machine learning is cool and all, it’s only as good as the hardware it runs on.

31 students, all first-time chip designers, took the lead on building DRC/LVS-clean neuromorphic circuits. Students came from grad & undergrad backgrounds across various corners of the university: ECE, CSE, Math, Computational Media, Bioengineering, Psychology, etc. Many had never even taken an ECE 101 class, and started learning from scratch two weeks ago.

Their designs are now all being manufactured together in the Sky130 Process. Each design is compiled onto the same piece of silicon with TinyTapeout, thanks to Matt Venn and Uri Shaked.

We spent Friday night grinding in my lab while blaring metalcore tunes. All students managed to clear all checks. The final designs do a heap of cool things, from accelerating sparse matrix multiplies and denoising events to simulating reservoir networks. I naturally had to squeeze in a Hodgkin-Huxley neuron in the 6 hours before the deadline (pictured).

Not sure if it’s the cost of living, or the mountain lions on campus, but damn. UCSC students have some serious grit.

Hodgkin-Huxley Neuron Model GDS Art
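
For the curious, the Hodgkin-Huxley model behind the pictured design is the classic four-variable system. Below is a minimal software rendering using forward-Euler integration and the standard textbook constants; it is purely illustrative and bears no relation to the actual silicon implementation.

```python
# Forward-Euler integration of the standard Hodgkin-Huxley equations:
#   C dV/dt = I - gNa*m^3*h*(V - ENa) - gK*n^4*(V - EK) - gL*(V - EL)
# with the usual voltage-dependent gating variables m, h, n.
import numpy as np

def hodgkin_huxley(I=10.0, dt=0.01, steps=5000):        # dt in ms, I in uA/cm^2
    gNa, gK, gL = 120.0, 36.0, 0.3                      # mS/cm^2
    ENa, EK, EL, C = 50.0, -77.0, -54.4, 1.0            # mV, uF/cm^2
    V, m, h, n = -65.0, 0.05, 0.6, 0.32                 # resting state
    trace = np.empty(steps)
    for t in range(steps):
        am = 0.1 * (V + 40) / (1 - np.exp(-(V + 40) / 10))
        bm = 4.0 * np.exp(-(V + 65) / 18)
        ah = 0.07 * np.exp(-(V + 65) / 20)
        bh = 1.0 / (1 + np.exp(-(V + 35) / 10))
        an = 0.01 * (V + 55) / (1 - np.exp(-(V + 55) / 10))
        bn = 0.125 * np.exp(-(V + 65) / 80)
        m += dt * (am * (1 - m) - bm * m)
        h += dt * (ah * (1 - h) - bh * h)
        n += dt * (an * (1 - n) - bn * n)
        V += dt / C * (I - gNa * m**3 * h * (V - ENa)
                         - gK * n**4 * (V - EK) - gL * (V - EL))
        trace[t] = V
    return trace

print(hodgkin_huxley().max())   # peaks near +40 mV: the neuron is spiking
```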

Telluride Workshop: Open Source Neuromorphic Hardware, Software and Wetware

Prof. Jason Eshraghian & Dr. Peng Zhou were topic area leaders at the Telluride Neuromorphic Engineering & Cognition Workshop. Tasks addressed included:

A project highlight was the development of the Neuromorphic Intermediate Representation (NIR), an intermediate representation for translating various neuromorphic and physics-driven models based on continuous-time ODEs into different formats. This makes it much easier to take a model trained in one library and map it to a large variety of backends.

Ruijie Zhu and Prof. Jason Eshraghian Present Invited Talk “Scaling up SNNs with SpikeGPT” at the Intel Neuromorphic Research Centre

spikegpt-architecture

Abstract: If we had a dollar for every time we heard “It will never scale!”, then neuromorphic engineers would be billionaires. This presentation will be centered on SpikeGPT, the first large-scale language model (LLM) using spiking neural nets (SNNs), and possibly the largest SNN that has been trained using error backpropagation.

The need for lightweight language models is more pressing than ever, especially now that we are becoming increasingly reliant on them for everything from word processors and search engines to code troubleshooting and academic grant writing. Our dependence on a single LLM means that every user is potentially pooling sensitive data into a single database, which poses significant security risks if breached.

SpikeGPT was built to move toward addressing the privacy and energy-consumption challenges we presently run into when using Transformer blocks. Our approach decomposes self-attention into a recurrent form that is compatible with spiking neurons, along with dynamical weight matrices where the dynamics, rather than the parameters as in conventional deep learning, are learnable.

We will provide an overview of what SpikeGPT does, how it works, and what it took to train it successfully. We will also provide a demo on how users can download pre-trained models available on HuggingFace so that listeners are able to experiment with them.

Link to the talk can be found here.

New Preprint: Brain-inspired learning in artificial neural networks: A Review led by Ph.D. Candidate Samuel Schmidgall

Abstract: Artificial neural networks (ANNs) have emerged as an essential tool in machine learning, achieving remarkable success across diverse domains, including image and speech generation, game playing, and robotics. However, there exist fundamental differences between ANNs’ operating mechanisms and those of the biological brain, particularly concerning learning processes. This paper presents a comprehensive review of current brain-inspired learning representations in artificial neural networks. We investigate the integration of more biologically plausible mechanisms, such as synaptic plasticity, to enhance these networks’ capabilities. Moreover, we delve into the potential advantages and challenges accompanying this approach. Ultimately, we pinpoint promising avenues for future research in this rapidly advancing field, which could bring us closer to understanding the essence of intelligence.

Link to the preprint here.
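
As a concrete example of one of the simplest biologically plausible mechanisms the review covers, here is a minimal Hebbian weight update ("neurons that fire together wire together"). The learning rate, decay term, and toy activity pattern are arbitrary illustrative choices, not drawn from the paper.

```python
# Minimal Hebbian plasticity: the weight between two units grows with the
# correlation of their activities, with a decay term to keep weights bounded.
import numpy as np

def hebbian_update(W, pre, post, lr=0.01, decay=0.001):
    """W[i, j] connects presynaptic unit i to postsynaptic unit j."""
    return W + lr * np.outer(pre, post) - decay * W

rng = np.random.default_rng(0)
W = np.zeros((5, 3))
for _ in range(100):                      # correlated pre/post activity strengthens weights
    pre = rng.random(5)
    post = pre[:3]                        # toy correlation between the two populations
    W = hebbian_update(W, pre, post)
print(W.round(2))
```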

SNN Overview