Prof. Jason Eshraghian Delivering Plenary Talk at IEEE MCSoC: “Large-Scale Neuromorphic Computing on Heterogeneous Systems”

In the realm of large-scale model training, the efficiency bottleneck often stems from the intensive data communication required between GPUs. Drawing inspiration from the brain’s remarkable efficiency, this talk explores neuromorphic computing’s potential to mitigate this bottleneck. As chip designers increasingly turn to advanced packaging technologies and chiplets, the models running on these heterogeneous platforms must evolve accordingly. Spiking neural networks, inspired by the brain’s method of encoding information over time and its use of fine-grained sparsity for information transfer, are well positioned to exploit the benefits of, and work within the limitations imposed by, heterogeneous hardware systems. This talk will delve into strategies for integrating spiking neural networks into large-scale models and how neuromorphic computing, alongside the utilization of chiplets, can surpass the current capabilities of GPUs, paving the way for the next generation of AI systems.

“Autonomous Driving with Spiking Neural Networks” by Ph.D. Candidate Rui-Jie Zhu Accepted at NeurIPS 2024

Autonomous driving demands an integrated approach that encompasses perception, prediction, and planning, all while operating under strict energy constraints to enhance scalability and environmental sustainability. We present Spiking Autonomous Driving (SAD), the first unified Spiking Neural Network (SNN) to address the energy challenges faced by autonomous driving systems through its event-driven and energy-efficient nature. SAD is trained end-to-end and consists of three main modules: perception, which processes inputs from multi-view cameras to construct a spatiotemporal bird’s eye view; prediction, which uses a novel dual-pathway architecture with spiking neurons to forecast future states; and planning, which generates safe trajectories considering predicted occupancy, traffic rules, and ride comfort. Evaluated on the nuScenes dataset, SAD achieves competitive performance in perception, prediction, and planning tasks, while drawing upon the energy efficiency of SNNs. This work highlights the potential of neuromorphic computing to be applied to energy-efficient autonomous driving, a critical step toward sustainable and safety-critical automotive technology. Our code is available at https://github.com/ridgerchu/SAD.
Link: https://arxiv.org/abs/2405.19687

“Reducing Data Bottlenecks in Distributed, Heterogeneous Neural Networks” by Undergraduate Researcher Ruhai Lin Accepted at IEEE MCSoC-2024

The rapid advancement of embedded multicore and many-core systems has revolutionized computing, enabling the development of high-performance, energy-efficient solutions for a wide range of applications. As models scale up in size, data movement is increasingly the bottleneck to performance. This data movement occurs between processor and memory, and between cores and chips. This paper investigates the impact of bottleneck size, in terms of inter-chip data traffic, on the performance of deep learning models in embedded multicore and many-core systems. We conduct a systematic analysis of the relationship between bottleneck size, computational resource utilization, and model accuracy. We apply a hardware-software co-design methodology where data bottlenecks are replaced with extremely narrow layers to reduce the amount of data traffic. In effect, time-multiplexing of signals is replaced by learnable embeddings that reduce the demands on chip IOs. Our experiments on the CIFAR-100 dataset demonstrate that classification accuracy generally decreases as the bottleneck ratio increases, with shallower models experiencing a more significant drop than deeper models. Hardware-side evaluation reveals that higher bottleneck ratios lead to substantial reductions in data transfer volume across the layers of the neural network. This research characterizes the trade-off between data transfer volume and model performance, enabling the identification of a balanced point that achieves good accuracy while minimizing data transfer volume. This characteristic allows for the development of efficient models …
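The co-design idea can be sketched in a few lines of PyTorch (a minimal illustration with placeholder dimensions, not the paper’s code): a narrow learnable projection sits where activations would otherwise cross the chip boundary, so only the compressed embedding travels over the IOs.

```python
import torch
import torch.nn as nn

class ChipBoundaryBottleneck(nn.Module):
    """Hypothetical sketch: a narrow learnable projection standing in for a
    wide chip-to-chip link. Only width // bottleneck_ratio values cross the
    boundary instead of width, cutting inter-chip traffic proportionally."""
    def __init__(self, width: int = 512, bottleneck_ratio: int = 8):
        super().__init__()
        narrow = width // bottleneck_ratio
        self.encode = nn.Linear(width, narrow)   # resides on chip A
        self.decode = nn.Linear(narrow, width)   # resides on chip B

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encode(x)      # the learnable embedding: all that crosses the IOs
        return self.decode(z)   # re-expanded activation on the far side

x = torch.randn(32, 512)
y = ChipBoundaryBottleneck()(x)
print(y.shape)  # torch.Size([32, 512])
```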

Prof. Jason Eshraghian Delivers Invited Talk at Memrisys 2024: “A Pathway to Large-Scale Neuromorphic Memristive Systems”

Abstract:

Memristors and neuromorphic computing go together like spaghetti and meatballs. Their promise of reaching brain-scale computational efficiency has significant implications for accelerating cognitive workloads, so why haven’t we yet toppled NVIDIA from their throne? Consultants might say it’s because of market inertia, while engineers might tell you there are still technical hurdles to overcome. This talk will focus on the technical challenges faced by circuit designers using memristors, specifically in the context of accelerating large-scale deep learning workloads. These challenges are well established and are treated as design constraints in the memristive circuits that exist today, but overcoming those barriers remains an open question. This talk provides a guide on how we might overcome these challenges using systems-level approaches, and how spike-based computing could be the right workload for memristive computing, ultimately pushing past what have historically been perceived as limitations.

“Evaluation and mitigation of cognitive biases in medical language models” published in npj Digital Medicine

Increasing interest in applying large language models (LLMs) to medicine is due in part to their impressive performance on medical exam questions. However, these exams do not capture the complexity of real patient–doctor interactions because of factors like patient compliance, experience, and cognitive bias. We hypothesized that LLMs would produce less accurate responses when faced with clinically biased questions as compared to unbiased ones. To test this, we developed the BiasMedQA dataset, which consists of 1273 USMLE questions modified to replicate common clinically relevant cognitive biases. We assessed six LLMs on BiasMedQA and found that GPT-4 stood out for its resilience to bias, in contrast to Llama 2 70B-chat and PMC Llama 13B, which showed large drops in performance. Additionally, we introduced three bias mitigation strategies, which improved but did not fully restore accuracy. Our findings highlight the need to improve LLMs’ robustness to cognitive biases, in order to achieve more reliable applications of LLMs in healthcare.

Link: https://www.nature.com/articles/s41746-024-01283-6

Prof. Jason Eshraghian Delivering an Educational Class at 2024 Embedded Systems Week

What do Transformers have to learn from Biological Spiking Neural Networks?

The brain is the perfect place to look for inspiration when developing more efficient neural networks. One of the main differences from modern deep learning is that the brain encodes and processes information as spikes rather than continuous, high-precision activations. This presentation will dive into how the open-source ecosystem has been used to develop brain-inspired neuromorphic accelerators, starting from our development of snnTorch, a Python training library for spiking neural networks with more than 100,000 downloads. We will explore how this is linked to our MatMul-free Language Model, providing insight into the next generation of large-scale, billion-parameter models.
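For a taste of the spike-based encoding described above, here is a minimal snnTorch example (sizes and constants chosen arbitrarily for illustration):

```python
import torch
import snntorch as snn

# A single leaky integrate-and-fire (LIF) neuron: the membrane potential
# decays by beta each step, integrates input current, and emits a binary
# spike whenever it crosses the threshold (then resets).
lif = snn.Leaky(beta=0.9, threshold=1.0)
mem = lif.init_leaky()

spikes = []
for cur in torch.rand(100) * 0.4:   # 100 timesteps of input current
    spk, mem = lif(cur, mem)        # spk is 0 or 1; mem carries state over time
    spikes.append(spk)

print(f"spikes emitted: {int(torch.stack(spikes).sum())} / 100 timesteps")
```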

“Neuromorphic intermediate representation: a unified instruction set for interoperable brain-inspired computing” Published in Nature Communications

Spiking neural networks and neuromorphic hardware platforms that simulate neuronal dynamics are attracting wide attention and are being applied to many machine learning problems. Despite a well-established mathematical foundation for neural dynamics, there exist numerous software and hardware stacks whose variability makes it difficult to reproduce findings. Here, we establish a common reference frame for computations in digital neuromorphic systems, titled Neuromorphic Intermediate Representation (NIR). NIR defines a set of computational and composable model primitives as hybrid systems combining continuous-time dynamics and discrete events. By abstracting away assumptions around discretization and hardware constraints, NIR faithfully captures the computational model, while bridging differences between the evaluated implementation and the underlying mathematical formalism. NIR supports an unprecedented number of neuromorphic systems, which we demonstrate by reproducing three spiking neural network models of different complexity across 7 neuromorphic simulators and 4 digital hardware platforms. NIR decouples the development of neuromorphic hardware and software, enabling interoperability between platforms and improving accessibility to multiple neuromorphic technologies. We believe that NIR is a key next step in brain-inspired hardware-software co-evolution, enabling research towards the implementation of energy-efficient computational principles of nervous systems. NIR is available at neuroir.org.

Link: https://www.nature.com/articles/s41467-024-52259-9
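For a flavor of what this looks like in practice, below is a hedged sketch using the `nir` Python package; the primitives (Affine, LIF) come from the paper, but treat the exact constructor signatures as illustrative rather than definitive:

```python
import numpy as np
import nir

# A two-node graph: an affine layer feeding a leaky integrate-and-fire neuron.
# Each node stores the parameters of its continuous-time equations;
# discretization is left to whichever backend eventually runs the graph.
graph = nir.NIRGraph(
    nodes={
        "affine": nir.Affine(weight=np.array([[1.0]]), bias=np.array([0.0])),
        "lif": nir.LIF(
            tau=np.array([0.01]),         # membrane time constant
            r=np.array([1.0]),            # resistance
            v_leak=np.array([0.0]),       # leak (resting) potential
            v_threshold=np.array([1.0]),  # firing threshold
        ),
    },
    edges=[("affine", "lif")],
)

nir.write("lif_graph.nir", graph)  # portable, backend-agnostic serialization
restored = nir.read("lif_graph.nir")
```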

“SpikeGPT: Generative pre-trained language model with spiking neural networks” by Ph.D. Candidate Rui-Jie Zhu Published in Transactions on Machine Learning Research

As large language models continue to scale in size, so do the computational resources required to run them. Spiking Neural Networks (SNNs) have emerged as an energy-efficient approach to deep learning that leverages sparse and event-driven activations to reduce the computational overhead associated with model inference. While they have become competitive with non-spiking models on many computer vision tasks, SNNs have also proven more challenging to train. As a result, their performance lags behind modern deep learning, and we have yet to see the effectiveness of SNNs in language generation. In this paper, inspired by the Receptance Weighted Key Value (RWKV) language model, we successfully implement SpikeGPT, a generative language model with binary, event-driven spiking activation units. We train two variants of the proposed model, with 45M and 216M parameters. To the best of our knowledge, SpikeGPT is the largest backpropagation-trained SNN model to date, rendering it suitable for both the generation and comprehension of natural language. We achieve this by modifying the transformer block to replace multi-head self-attention, reducing quadratic computational complexity O(N^2) to linear complexity O(N) in sequence length. Input tokens are instead streamed in sequentially to our attention mechanism (as with typical SNNs). Our preliminary experiments show that SpikeGPT remains competitive with non-spiking models on tested benchmarks, while using 20x fewer operations when processed on neuromorphic hardware that can leverage sparse, event-driven activations.
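To see why the recurrent formulation is linear rather than quadratic in sequence length, here is a simplified sketch of an RWKV-style token-by-token update (an illustration of the mechanism only, not SpikeGPT’s actual code; SpikeGPT additionally passes activations through binary spiking units):

```python
import torch

def rwkv_style_mixing(k, v, w, u):
    """Each token does constant work against a running state, so the total
    cost is O(N) in sequence length, versus O(N^2) for full self-attention.
    k, v: (N, d) key/value sequences; w, u: (d,) learned decay and bonus."""
    N, d = k.shape
    num = torch.zeros(d)                 # running sum of exp(k_i) * v_i
    den = torch.zeros(d)                 # running sum of exp(k_i)
    decay = torch.exp(-torch.exp(w))     # per-channel exponential decay
    out = []
    for t in range(N):
        bonus = torch.exp(u + k[t])      # extra weight for the current token
        out.append((num + bonus * v[t]) / (den + bonus))
        num = decay * num + torch.exp(k[t]) * v[t]   # absorb token into state
        den = decay * den + torch.exp(k[t])
    return torch.stack(out)

y = rwkv_style_mixing(torch.randn(10, 8), torch.randn(10, 8),
                      torch.zeros(8), torch.zeros(8))
print(y.shape)  # torch.Size([10, 8])
```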

New Preprint: “Scalable MatMul-free Language Modeling” by Ph.D. Candidate Rui-Jie Zhu

The cost of processing language models is insane. ChatGPT’s compute demands are estimated at more than $100,000 per day to serve the billions of requests it receives.

Led by Rui-Jie Zhu, we have developed the first MatMul-free language model (VMM/MMM-free) to scale beyond a billion parameters. Our previous work with SpikeGPT tapped out at about 216M parameters, but our latest model has been able to go up to 2.7B parameters (limited only by compute). We’re pretty certain it can keep going.

We provide a GPU-optimized implementation that uses 61% less VRAM than an unoptimized implementation during training.

However, there are several operations in this model that GPUs aren’t yet fully optimized for, such as ternary operations. So Ethan Sifferman, Tyler Sheaves and Dustin R. built a custom FPGA implementation to really milk it, and we can reach human-reading throughput at 13 W: a little less than the power consumed by the human brain.
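For intuition on what “ternary operations” means here, the sketch below shows BitNet-style absmean ternarization (a generic illustration, not necessarily the paper’s exact scheme): with weights in {-1, 0, +1}, a matrix multiply degenerates into additions, subtractions and skips.

```python
import torch

def ternarize(w: torch.Tensor):
    """Quantize weights to {-1, 0, +1} with one per-tensor scale."""
    scale = w.abs().mean()
    w_t = torch.clamp(torch.round(w / (scale + 1e-8)), -1, 1)
    return w_t, scale

def ternary_matmul(x, w_t, scale):
    # On a GPU this still executes as a dense matmul; on an FPGA each ternary
    # weight selects add (+1), subtract (-1) or skip (0): no multipliers.
    return (x @ w_t) * scale

w_t, s = ternarize(torch.randn(64, 64))
y = ternary_matmul(torch.randn(8, 64), w_t, s)
print(y.shape, w_t.unique())  # torch.Size([8, 64]) tensor([-1., 0., 1.])
```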

Preprint: https://lnkd.in/gaWbg7ss

GitHub training code: https://lnkd.in/gKFzQs_z

Pre-trained models on HuggingFace: https://lnkd.in/gDXFjPdm


New Preprint: “Autonomous Driving with Spiking Neural Networks” by Ph.D. Candidate Rui-Jie Zhu

Spiking Autonomous Driving

From the guy who built the first spiking language generation model, Rui-Jie Zhu has found a way to make spiking neural networks (SNNs) perform end-to-end autonomous vehicle control. This model takes 6-camera input and integrates perception, prediction and planning into a single model with approximately 75x fewer operations than ST-P3 at comparable performance.

Pushing SNNs beyond toy datasets has been tough, but we’ve put a lot of effort into showing how to scale to challenging, real-world problems. The next step for this model is to push it into a closed-loop system. Deploying models like this on low-latency neuromorphic hardware can enable fast response times from sensor to control. This is necessary if we want to bridge the sim2real gap: by the time you take action, you don’t want the world to have changed by too much.

Rather than forcing “spiking” into applications for the sake of it, it’s important to take it to domains where there is a computational benefit – and I think this is one of them.

Preprint: https://arxiv.org/abs/2405.19687

Code: https://github.com/ridgerchu/SAD

“Knowledge Distillation Through Time for Future Event Prediction” Presented at ICLR by Undergraduate Researcher Skye Gunasekaran

Abstract: Is it possible to learn from the future? Here, we introduce knowledge distillation through time (KDTT). In traditional knowledge distillation (KD), a reliable teacher model is used to train an error-prone student model, and the two typically differ in model capacity: the teacher has the larger architecture. In KDTT, the teacher and student models differ in their assigned tasks. The teacher model detects events in sequential data, a simple task compared to that of the student model, which is challenged with forecasting those events in the future. Through KDTT, the student can use the ‘future’ logits from the teacher model to extract temporal uncertainty. We show the efficacy of KDTT on seizure prediction, where the student forecaster achieves a 20.0% average increase in the area under the receiver operating characteristic curve (AUC-ROC).
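A minimal sketch of how such an objective can be set up (the shift-and-match formulation below is our illustrative assumption, not the paper’s exact loss): the student’s forecast at time t is distilled against the detection teacher’s logits at time t + shift.

```python
import torch
import torch.nn.functional as F

def kdtt_loss(student_logits, teacher_logits, shift, temperature=2.0):
    """Illustrative knowledge-distillation-through-time loss.
    student_logits, teacher_logits: (T, num_classes) over T timesteps.
    The student forecasts `shift` steps ahead, so its logits at time t are
    matched to the teacher's 'future' logits at time t + shift."""
    s = student_logits[:-shift]           # forecasts made at t = 0..T-shift-1
    t = teacher_logits[shift:].detach()   # the teacher's future detections
    return F.kl_div(
        F.log_softmax(s / temperature, dim=-1),
        F.softmax(t / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

loss = kdtt_loss(torch.randn(100, 2), torch.randn(100, 2), shift=10)
print(loss.item())
```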

New Paper: “Optically Tunable Electrical Oscillations in Oxide-Based Memristors for Neuromorphic Computing” led by Collaborator Dr. Shimul K. Nath

The application of hardware-based neural networks can be enhanced by integrating sensory neurons and synapses that enable direct input from external stimuli. Here, we report direct optical control of an oscillatory neuron based on volatile threshold switching in V3O5. The devices exhibit electroforming-free operation with switching parameters that can be tuned by optical illumination. Using temperature-dependent electrical measurements, conductive atomic force microscopy (C-AFM), in-situ thermal imaging, and lumped-element modelling, we show that the changes in switching parameters, including threshold and hold voltages, arise from an overall conductivity increase of the oxide film due to the contribution of both the photo-conductive and bolometric characteristics of V3O5, which ultimately affects the oscillation dynamics. Furthermore, our investigation reveals V3O5 as a new bolometric material with a remarkable temperature coefficient of resistivity (TCR) as high as -4.6% K^-1 at 423 K. We show the utility of the optically tunable device response and spiking frequency by demonstrating in-sensor reservoir computing with reduced computational effort and an optical encoding layer for a spiking neural network, respectively, using a simulated array of devices.

New snnTorch Tutorial: Spiking-Tactile MNIST by Undergraduate Students Dylan Louie, Hannah Cohen-Sandler, and Shatoparba Banerjee

See the tutorial here.

The next tutorial from UCSC’s Brain-Inspired Machine Learning class is by Dylan J. Louie, Hannah Cohen Sandler and Shatoparba Banerjee.

They show how to train an SNN for tactile sensing using the Spiking-Tactile MNIST Neuromorphic Dataset. This dataset was developed in Benjamin C.K. Tee’s lab at NUS. It consists of handwritten digits obtained by human participants writing on a neuromorphic tactile sensor array.

For more information about the dataset, see the preprint by Hian Hian See et al. here.
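For readers who want the shape of the training loop before opening the tutorial, here is a condensed sketch in snnTorch (layer sizes, the fake batch, and the taxel count are placeholders, not the tutorial’s actual values):

```python
import torch
import torch.nn as nn
import snntorch as snn
import snntorch.functional as SF

net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(100, 128),                        # placeholder taxel count -> hidden
    snn.Leaky(beta=0.9, init_hidden=True),      # hidden spiking layer
    nn.Linear(128, 10),                         # 10 handwritten-digit classes
    snn.Leaky(beta=0.9, init_hidden=True, output=True),
)

loss_fn = SF.ce_rate_loss()                     # rate-coded cross-entropy over time
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

data = torch.rand(50, 16, 100)                  # fake batch: (timesteps, batch, taxels)
targets = torch.randint(0, 10, (16,))

spk_rec = []
for step in range(data.shape[0]):               # unroll the SNN over the spike train
    spk, mem = net(data[step])
    spk_rec.append(spk)

loss = loss_fn(torch.stack(spk_rec), targets)
optimizer.zero_grad()
loss.backward()                                  # BPTT through the spiking layers
optimizer.step()
```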


Prof. Jason Eshraghian and Dr. Fabrizio Ottati Present Tutorial at ISFPGA (Monterey, CA)

Fabrizio Ottati and I will be running a tutorial tomorrow (Sunday, 3 March) at the International Symposium on Field-Programmable Gate Arrays (ISFPGA) in Monterey, CA titled: “Who needs neuromorphic hardware? Deploying SNNs to FPGAs via HLS”.


We’ll go through software and hardware: training SNNs with quantization-aware techniques applied to both weights and neuron states, and then show how to go from an snnTorch model straight into AMD/Xilinx FPGAs for low-power, flexible deployment.
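For intuition, here is the core trick behind quantization-aware training in a few lines (a generic straight-through fake-quantization sketch, not the tutorial’s actual code, which also covers quantizing the stateful membrane potentials):

```python
import torch

class FakeQuantSTE(torch.autograd.Function):
    """Round weights to an int8 grid in the forward pass while the
    straight-through estimator passes gradients through unchanged, so the
    network learns weights that survive quantization on the FPGA."""
    @staticmethod
    def forward(ctx, w, num_bits=8):
        qmax = 2 ** (num_bits - 1) - 1
        scale = w.abs().max() / qmax
        return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None   # ignore the rounding step in the backward pass

w = torch.randn(32, 32, requires_grad=True)
wq = FakeQuantSTE.apply(w)         # quantized weights used in the forward pass
(wq ** 2).sum().backward()         # gradients still reach the full-precision w
print(w.grad.shape)                # torch.Size([32, 32])
```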

GitHub repo: https://github.com/open-neuromorphic/fpga-snntorch

Tutorial summary: https://www.isfpga.org/workshops-tutorials/#t2

New Preprint: “Addressing cognitive bias in medical language models” led by Ph.D. Candidate Samuel Schmidgall

Preprint link here.

Abstract: The integration of large language models (LLMs) into the medical field has gained significant attention due to their promising accuracy in simulated clinical decision-making settings. However, clinical decision-making is more complex than simulations because physicians’ decisions are shaped by many factors, including the presence of cognitive bias. Yet the degree to which LLMs are susceptible to the same cognitive biases that affect human clinicians remains unexplored. We hypothesized that when LLMs are confronted with clinical questions containing cognitive biases, they would yield significantly less accurate responses than the same questions presented without such biases. In this study, we developed BiasMedQA, a novel benchmark for evaluating cognitive biases in LLMs applied to medical tasks. Using BiasMedQA we evaluated six LLMs: GPT-4, Mixtral-8x7B, GPT-3.5, PaLM-2, Llama 2 70B-chat, and the medically specialized PMC Llama 13B. We tested these models on 1,273 questions from the US Medical Licensing Exam (USMLE) Steps 1, 2, and 3, modified to replicate common clinically relevant cognitive biases. Our analysis revealed varying effects of bias on these LLMs, with GPT-4 standing out for its resilience, in contrast to Llama 2 70B-chat and PMC Llama 13B, which were disproportionately affected by cognitive bias. Our findings highlight the critical need for bias mitigation in the development of medical LLMs, pointing towards safer and more reliable applications in healthcare.
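The evaluation boils down to comparing accuracy on matched question pairs, with and without an injected bias prompt. A toy sketch of that protocol (function names and data format are our assumptions, not the BiasMedQA release):

```python
# Compare a model's accuracy on the same questions with and without an
# added bias-inducing sentence; the gap measures susceptibility to bias.
def accuracy(answer_fn, questions):
    correct = sum(answer_fn(q["prompt"]) == q["answer"] for q in questions)
    return correct / len(questions)

def bias_sensitivity(answer_fn, unbiased, biased):
    return accuracy(answer_fn, unbiased) - accuracy(answer_fn, biased)

questions = [{"prompt": "Q1 ...", "answer": "C"},
             {"prompt": "Q2 ...", "answer": "B"}]
biased = [{"prompt": q["prompt"] + " A respected colleague insists the answer is A.",
           "answer": q["answer"]} for q in questions]

# A stand-in "model" that always answers "C" shows zero sensitivity:
print(bias_sensitivity(lambda prompt: "C", questions, biased))  # 0.0
```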

New Paper: “Surgical Gym: A high-performance GPU-based platform for reinforcement learning with surgical robots” led by Ph.D. Candidate Samuel Schmidgall accepted at the 2024 IEEE Intl. Conf. on Robotics and Automation (ICRA 2024)

Preprint link here.

Abstract: Recent advances in robot-assisted surgery have resulted in progressively more precise, efficient, and minimally invasive procedures, sparking a new era of robotic surgical intervention. This enables doctors, in collaborative interaction with robots, to perform traditional or minimally invasive surgeries with improved outcomes through smaller incisions. Recent efforts are working toward making robotic surgery more autonomous, which has the potential to reduce variability of surgical outcomes and reduce complication rates. Deep reinforcement learning methodologies offer scalable solutions for surgical automation, but their effectiveness relies on extensive data acquisition due to the absence of prior knowledge in successfully accomplishing tasks. Due to the intensive nature of simulated data collection, previous works have focused on making existing algorithms more efficient. In this work, we focus on making the simulator more efficient, making training data much more accessible than previously possible. We introduce Surgical Gym, an open-source, high-performance platform for surgical robot learning where both the physics simulation and reinforcement learning occur directly on the GPU. We demonstrate between 100-5000x faster training times compared with previous surgical learning platforms. The code is available at: https://github.com/SamuelSchmidgall/SurgicalGym.
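The speedup comes from keeping both the simulation and the learner resident on the GPU, so thousands of environments advance in lockstep with no host-device copies. A toy stand-in for that pattern (an illustration, not Surgical Gym’s API):

```python
import torch

class BatchedPointMassEnv:
    """Minimal batched environment: every operation is a tensor op over all
    environments at once, the same pattern GPU-based simulators exploit."""
    def __init__(self, num_envs=4096,
                 device="cuda" if torch.cuda.is_available() else "cpu"):
        self.device = device
        self.pos = torch.zeros(num_envs, 3, device=device)   # e.g. tool-tip position
        self.goal = torch.rand(num_envs, 3, device=device)   # per-env target

    def step(self, action):
        self.pos = self.pos + 0.05 * action.clamp(-1, 1)     # batched dynamics
        reward = -torch.norm(self.pos - self.goal, dim=-1)   # batched reward
        return self.pos.clone(), reward

env = BatchedPointMassEnv()
actions = torch.randn(4096, 3, device=env.device)
obs, reward = env.step(actions)    # one call advances all 4096 environments
print(obs.shape, reward.shape)     # torch.Size([4096, 3]) torch.Size([4096])
```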