In large-scale model training, the efficiency bottleneck often stems from the intensive data communication required between GPUs. Drawing inspiration from the brain’s remarkable efficiency, this talk explores the potential of neuromorphic computing to mitigate this bottleneck. As chip designers increasingly turn to advanced packaging technologies and chiplets, the models running on these heterogeneous platforms must evolve accordingly. Spiking neural networks, which are inspired by the way the brain encodes information over time and uses fine-grained sparsity to transfer it, are well positioned to exploit the benefits of heterogeneous hardware systems while working within the limitations they impose. This talk will delve into strategies for integrating spiking neural networks into large-scale models, and into how neuromorphic computing, combined with chiplets, can surpass the current capabilities of GPUs, paving the way for the next generation of AI systems.