New Preprint: “Scalable MatMul-free Language Modeling” by Ph.D. Candidate Ruijie Zhu

The cost of running language models is staggering. ChatGPT is estimated to cost more than $100,000 per day in compute to serve the billions of requests it receives.

Led by Rui-Jie Zhu, we have developed the first MatMul-free language model (VMM/MMM-free) to scale beyond a billion parameters. Our previous work, SpikeGPT, tapped out at about 216M parameters, but our latest model reaches 2.7B parameters (limited only by our compute), and we’re pretty certain it can keep going.

We provide a GPU-optimized implementation that uses 61% less VRAM than an unoptimized implementation during training.

However, there are several operations in this model that GPUs aren’t yet fully optimized for, such as ternary operations. So Ethan Sifferman, Tyler Sheaves, and Dustin R. built a custom FPGA implementation to squeeze the most out of it, reaching human-reading throughput at 13 W, a little less than the power consumed by the human brain.
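To give a flavour of what “MatMul-free” means in practice, here is a minimal, illustrative sketch (not the paper’s implementation) of a ternary dense layer: once weights are constrained to {-1, 0, +1}, the usual multiply-accumulate collapses into additions and subtractions. The class name and threshold below are placeholders of my own.

```python
import torch
import torch.nn as nn

class TernaryLinear(nn.Module):
    """Illustrative dense layer with weights constrained to {-1, 0, +1}.

    With ternary weights, y = x @ W.T reduces to sums and differences of
    input elements, so no true multiplications are needed at inference.
    (A straight-through estimator would be used to train the full-precision
    weights in practice.)
    """
    def __init__(self, in_features, out_features, threshold=0.05):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.threshold = threshold  # values within +/- threshold quantize to 0

    def ternarize(self):
        w = self.weight
        return torch.sign(w) * (w.abs() > self.threshold).float()

    def forward(self, x):
        w_t = self.ternarize()            # {-1, 0, +1}
        # Equivalent to x @ w_t.T, but conceptually just adds/subtracts inputs.
        pos = x @ (w_t == 1).float().T    # sum of inputs with weight +1
        neg = x @ (w_t == -1).float().T   # sum of inputs with weight -1
        return pos - neg

x = torch.randn(4, 16)
layer = TernaryLinear(16, 8)
print(layer(x).shape)  # torch.Size([4, 8])
```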

Preprint: https://lnkd.in/gaWbg7ss

GitHub training code: https://lnkd.in/gKFzQs_z

Pre-trained models on HuggingFace: https://lnkd.in/gDXFjPdm


New Preprint: “Autonomous Driving with Spiking Neural Networks” by Ph.D. Candidate Ruijie Zhu

Spiking Autonomous Driving

From the guy who built the first spiking language generation model: Rui-Jie Zhu has found a way to make spiking neural networks (SNNs) perform end-to-end autonomous vehicle control. This model takes a 6-camera input and integrates perception, prediction, and planning into a single model with approximately 75x fewer operations than ST-P3 at comparable performance.

Pushing SNNs beyond toy datasets has been tough, but we’ve put a lot of effort into showing how they can scale to challenging, real-world problems. The next step for this model is to push it into a closed-loop system. Deploying models like this on low-latency neuromorphic hardware can enable fast response times from sensor to control, which is necessary if we want to bridge the sim2real gap. I.e., by the time you take action, you don’t want the world to have changed by too much.

Rather than forcing “spiking” into applications for the sake of it, it’s important to take it to domains where there is a computational benefit – and I think this is one of them.

Preprint: https://arxiv.org/abs/2405.19687

Code: https://github.com/ridgerchu/SAD

New snnTorch Tutorial: Spiking-Tactile MNIST by Undergraduate Students Dylan Louie, Hannah Cohen-Sandler, and Shatoparba Banerjee

See the tutorial here.

The next tutorial from UCSC’s Brain-Inspired Machine Learning class is by Dylan J. Louie, Hannah Cohen Sandler, and Shatoparba Banerjee.

They show how to train an SNN for tactile sensing using the Spiking-Tactile MNIST Neuromorphic Dataset. This dataset was developed in Benjamin C.K. Tee‘s lab at NUS. It consists of handwritten digits recorded from human participants writing on a neuromorphic tactile sensor array.

For more information about the dataset, see the preprint by Hian Hian See et al. here.
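To give a rough idea of what the tutorial covers, below is a minimal sketch of training a spiking classifier on tactile spike data with snnTorch. The tensor shapes, network sizes, and the placeholder input are my own assumptions; the actual data loading and architecture are in the tutorial itself.

```python
import torch
import torch.nn as nn
import snntorch as snn
from snntorch import surrogate, utils

# Hypothetical shapes: tactile events binned into [num_steps, batch, num_inputs];
# the real loader and sensor dimensions are shown in the tutorial.
num_steps, batch_size, num_inputs, num_classes = 50, 32, 100, 10
beta = 0.9
spike_grad = surrogate.fast_sigmoid()

net = nn.Sequential(
    nn.Linear(num_inputs, 128),
    snn.Leaky(beta=beta, spike_grad=spike_grad, init_hidden=True),
    nn.Linear(128, num_classes),
    snn.Leaky(beta=beta, spike_grad=spike_grad, init_hidden=True, output=True),
)

def forward_pass(net, data):
    """Run the network over time and record output spikes per class."""
    spk_rec = []
    utils.reset(net)                # clear hidden states between samples
    for step in range(data.size(0)):
        spk_out, _ = net(data[step])
        spk_rec.append(spk_out)
    return torch.stack(spk_rec)     # [num_steps, batch, num_classes]

data = torch.rand(num_steps, batch_size, num_inputs)   # placeholder tactile spikes
targets = torch.randint(0, num_classes, (batch_size,))
loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(forward_pass(net, data).sum(0), targets)  # spike-count readout
loss.backward()
```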

 

New Preprint: “SpikeGPT: Generative Pre-Trained Language Model with Spiking Neural Networks” led by incoming Ph.D. candidate Ruijie Zhu

We have released SpikeGPT led by Ruijie Zhu and Qihang Zhao, the largest-scale SNN training via backprop to date, and (to the best of our knowledge) the first generative language spike-based model.

spikegpt-architecture

Abstract: As the size of large language models continues to scale, so do the computational resources required to run them. Spiking neural networks (SNNs) have emerged as an energy-efficient approach to deep learning that leverages sparse and event-driven activations to reduce the computational overhead associated with model inference. While they have become competitive with non-spiking models on many computer vision tasks, SNNs have also proven to be more challenging to train. As a result, their performance lags behind modern deep learning, and we are yet to see the effectiveness of SNNs in language generation. In this paper, we successfully implement ‘SpikeGPT’, a generative language model with pure binary, event-driven spiking activation units. We train the proposed model on three model variants: 45M, 125M and 260M parameters. To the best of our knowledge, this is 4x larger than any functional backprop-trained SNN to date. We achieve this by modifying the transformer block to replace multi-head self-attention, reducing quadratic computational complexity to linear with increasing sequence length. Input tokens are instead streamed in sequentially to our attention mechanism (as with typical SNNs). Our preliminary experiments show that SpikeGPT remains competitive with non-spiking models on tested benchmarks, while maintaining 5x less energy consumption when processed on neuromorphic hardware that can leverage sparse, event-driven activations. Our code implementation is available at https://github.com/ridgerchu/SpikeGPT.
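The key architectural idea in the abstract, replacing quadratic self-attention with a mechanism that streams tokens in sequentially, can be illustrated with a toy recurrence. The sketch below is a generic linear-complexity token mixer of my own, not SpikeGPT’s actual block, which is considerably more elaborate.

```python
import torch

def streamed_linear_mixing(x, decay=0.9):
    """Toy linear-complexity alternative to self-attention.

    Tokens arrive one at a time and are folded into a single running state,
    so the cost per token is constant and the total cost is O(T) rather than
    the O(T^2) of full self-attention. Purely an illustration of the idea.
    """
    T, d = x.shape
    state = torch.zeros(d)
    outputs = []
    for t in range(T):
        state = decay * state + x[t]   # exponentially weighted history
        outputs.append(state.clone())
    return torch.stack(outputs)        # [T, d]

x = torch.randn(12, 64)                 # 12 tokens, 64-dim embeddings
print(streamed_linear_mixing(x).shape)  # torch.Size([12, 64])
```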

Preprint: https://arxiv.org/abs/2302.13939

Code: https://github.com/ridgerchu/SpikeGPT

New Paper: “BPLC + NOSO: Backpropagation of errors based on latency code with neurons that only spike once at most” led by Ph.D. candidate Seong Min Jin published in Complex & Intelligent Systems

BPLC + NOSO has been published, led by Seong Min Jin and Doo Seok Jeong (Jeong Lab), along with collaborators Dohun Kim and Dong Hyung Yoo.

Abstract: For mathematical completeness, we propose an error-backpropagation algorithm based on latency code (BPLC) with spiking neurons conforming to the spike–response model but allowed to spike once at most (NOSOs). BPLC is based on gradients derived without approximation unlike previous temporal code-based error-backpropagation algorithms. The latency code uses the spiking latency (period from the first input spike to spiking) as a measure of neuronal activity. To support the latency code, we introduce a minimum-latency pooling layer that passes the spike of the minimum latency only for a given patch. We also introduce a symmetric dual threshold for spiking (i) to avoid the dead neuron issue and (ii) to confine a potential distribution to the range between the symmetric thresholds. Given that the number of spikes (rather than timesteps) is the major cause of inference delay for digital neuromorphic hardware, NOSONets trained using BPLC likely reduce inference delay significantly. To identify the feasibility of BPLC + NOSO, we trained CNN-based NOSONets on Fashion-MNIST and CIFAR-10. The classification accuracy on CIFAR-10 exceeds the state-of-the-art result from an SNN of the same depth and width by approximately 2%. Additionally, the number of spikes for inference is significantly reduced (by approximately one order of magnitude), highlighting a significant reduction in inference delay.
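To make the minimum-latency pooling idea concrete, here is a small illustrative sketch (not the authors’ code): if each neuron is described by its first-spike latency, passing only the earliest spike in a patch is a min-pooling, which for latencies is simply max-pooling on the negated values.

```python
import torch
import torch.nn.functional as F

def min_latency_pool(latencies, kernel_size=2):
    """Keep only the earliest (minimum-latency) spike in each pooling patch.

    `latencies` holds per-neuron first-spike latencies of shape [N, C, H, W],
    with non-spiking neurons set to +inf. Min-pooling is max-pooling on the
    negated latencies. Illustrative sketch only.
    """
    return -F.max_pool2d(-latencies, kernel_size)

lat = torch.full((1, 1, 4, 4), float("inf"))
lat[0, 0, 0, 1] = 3.0   # a spike at latency 3 in the top-left 2x2 patch
lat[0, 0, 2, 2] = 7.0   # a spike at latency 7 in the bottom-right patch
print(min_latency_pool(lat))  # each 2x2 patch reduced to its earliest latency
```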

Paper: https://link.springer.com/article/10.1007/s40747-023-00983-y

Code: https://github.com/dooseokjeong/BPLC-NOSO

New Preprint: “OpenSpike: An OpenRAM SNN Accelerator” led by Undergraduate Researcher Farhad Modaresi Accepted for ISCAS 2023 in Monterey, CA

Farhad Modaresi has led the design and tape-out of a fully open-sourced spiking neural network accelerator in the Skywater 130 process. The design is based on memory macros generated using OpenRAM.

Many of the advances in deep learning this past decade can be attributed to the open-source movement, where researchers have been able to reproduce and iterate upon open code bases. With the advent of open PDKs (SkyWater), EDA toolchains, and memory compilers (OpenRAM, by co-author Matthew Guthaus), we hope to bring that same rapid pace of development to neuromorphic hardware.

Check out the preprint here: https://arxiv.org/abs/2302.01015

GitHub repo with RTL you are welcome to steal: https://github.com/sfmth/OpenSpike

OpenSpike Schematic and Layout

New snnTorch Tutorial: Nonlinear Regression with SNNs (Part I)

We have a new snnTorch tutorial/notebook on nonlinear regression with SNNs.

This first part will show you how to train a spiking neuron’s membrane potential to follow a given trajectory over time. Future tutorials will introduce spiking LSTM models, recurrent leaky integrator neurons, and tackle some fancier regression tasks.
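As a taste of what the notebook does, the following is a minimal sketch (with placeholder hyperparameters and a toy sine-wave target of my own) of fitting a leaky integrate-and-fire neuron’s membrane potential to a target trajectory with an MSE loss in snnTorch.

```python
import torch
import torch.nn as nn
import snntorch as snn

num_steps = 100
target = torch.sin(torch.linspace(0, 2 * torch.pi, num_steps))  # trajectory to follow

net = nn.Sequential(nn.Linear(1, 1))          # maps a constant drive to input current
lif = snn.Leaky(beta=0.9, learn_beta=True)    # membrane decay rate is learnable
optimizer = torch.optim.Adam(list(net.parameters()) + list(lif.parameters()), lr=1e-2)
loss_fn = nn.MSELoss()

inputs = torch.ones(num_steps, 1)             # constant input drive
for epoch in range(200):
    mem = lif.init_leaky()
    mem_rec = []
    for step in range(num_steps):
        cur = net(inputs[step])
        spk, mem = lif(cur, mem)
        mem_rec.append(mem)
    # Penalize the gap between the recorded membrane potential and the target.
    loss = loss_fn(torch.stack(mem_rec).squeeze(), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```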

Link to the tutorial here.

Nonlinear Regression with SNNs

New Preprint & Code on Quantization-Aware Training with SNNs

We have a new preprint along with code that simplifies the process of training quantized spiking neural networks in snnTorch.

qsnn

Quantization-Aware training of SNNs with periodic schedules

We propose several techniques to smooth the training of QSNNs. One of them combines cosine annealing of the learning rate with periodic boosting, which gives the network repeated opportunities to keep searching for better solutions.
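As an illustration of the scheduling idea (not the exact settings from the preprint), PyTorch’s built-in cosine annealing with warm restarts produces the same periodic decay-then-boost pattern:

```python
import torch

# Illustrative only: cosine-annealed learning rate with periodic restarts.
# The model, optimizer, and hyperparameters below are placeholders.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=50, T_mult=1, eta_min=1e-5
)

for epoch in range(200):
    # ... run one training epoch of the quantized SNN here ...
    scheduler.step()  # LR decays toward eta_min, then "boosts" back up every T_0 epochs
```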

GitHub link to the code here.

Preprint link here.

New snnTorch Tutorial: Population Codes with SNNs

We have a new snnTorch tutorial/notebook on population coding.

Biologically, average neuronal firing rates are roughly 0.1-1 Hz, far too slow to account for the reaction times of animals and humans.

But if we pool together multiple neurons and count their spikes together, then it becomes possible to measure a firing rate for a population of neurons in a very short window of time.

As it turns out, population codes are also a handy trick for deep learning. Having more output neurons provides far more ‘pathways’ for errors to backprop through.

It also lets us take a more ‘parallel’ approach to training SNNs by swapping sequential steps through time for matrix-vector mults.

Here, we run through using a large pool of output neurons (instead of the usual ~10 output neurons) to obtain decent results in a single simulated time step.
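Here is a minimal sketch of the readout trick, with illustrative sizes of my own: allocate a population of neurons per class and sum spikes within each population before taking the argmax.

```python
import torch

# Population-coded readout: instead of 10 output neurons for 10 classes,
# use pop_size neurons per class and sum spikes within each population.
num_classes, pop_size, batch_size, num_steps = 10, 50, 32, 1

# Placeholder output spikes from an SNN with num_classes * pop_size output
# neurons, recorded over num_steps time steps.
spk_out = (torch.rand(num_steps, batch_size, num_classes * pop_size) > 0.5).float()

# Sum spikes over time, then over each class's population of neurons.
counts = spk_out.sum(0).view(batch_size, num_classes, pop_size).sum(-1)
prediction = counts.argmax(-1)   # class with the most population spikes wins
print(prediction.shape)          # torch.Size([32])
```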

Link to the tutorial here.

pop-coding

Population Codes in SNNs

 

New Preprint: Training Spiking Neural Networks Using Lessons From Deep Learning

How can we train biologically plausible Spiking Neural Networks with the Deep Learning hammer? Our perspective/tutorial/review aims to tackle this question. We explore how the neural code affects learning via gradient descent, the interplay between the STDP learning rule and the backpropagation-through-time algorithm, and step through using online learning with SNNs.
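One of the central tricks the paper walks through is the surrogate gradient: the spike is a step function whose true derivative is zero almost everywhere, so the backward pass substitutes a smooth approximation. Below is a minimal sketch using a fast-sigmoid surrogate in the style of snnTorch’s surrogate.fast_sigmoid; the threshold and slope constants are illustrative.

```python
import torch

THRESHOLD, SLOPE = 1.0, 25.0   # illustrative constants

class SpikeFn(torch.autograd.Function):
    """Heaviside spike in the forward pass, fast-sigmoid surrogate backward."""
    @staticmethod
    def forward(ctx, mem):
        ctx.save_for_backward(mem)
        return (mem >= THRESHOLD).float()

    @staticmethod
    def backward(ctx, grad_output):
        (mem,) = ctx.saved_tensors
        shifted = mem - THRESHOLD
        # Derivative of a fast sigmoid, used in place of the true (delta) derivative.
        return grad_output / (SLOPE * shifted.abs() + 1.0) ** 2

mem = torch.randn(5, requires_grad=True)
spk = SpikeFn.apply(mem)
spk.sum().backward()
print(mem.grad)   # nonzero gradients flow through the step-function forward pass
```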

comp_graph

Computational graph of a spiking neuron

This preprint goes hand-in-hand with our recently updated snnTorch interactive tutorial series that goes from designing simple spiking neurons to training large-scale networks.

Link to the preprint here.

Link to the SNN tutorial series here.

snnTorch GitHub link here.

Best Live Demonstration Award at the IEEE International Conference on Electronics, Circuits and Systems 2020

Our retina-controlled upper-limb prosthesis system led by Coen Arrow won the Best Live Demonstration Award at the IEEE International Conference on Electronics, Circuits and Systems (ICECS) 2020.

Using sparse spiking electrical signals generated by retinal photoreceptor cells in real time could potentially assist with rehabilitation and complement EMG signals to achieve high-precision feedback on a constrained power supply.

We somehow did this whole thing remotely across three continents, with Coen Arrow at the University of Western Australia, Hancong Wu and Kia Nazarpour at the University of Edinburgh, and the University of Michigan.

Code for the retina simulator can be found here.