
- 11-04-2025
- deep learning
New theory shows how neural networks align internal representations, improving efficiency and interpretability and explaining key deep learning phenomena.
A new framework sheds light on how neural networks learn and organize information internally, offering a unified theory for several core deep learning behaviors. Known as the Canonical Representation Hypothesis (CRH), it proposes that during training, each layer's internal components (representations, weights, and gradients) naturally align with one another. This alignment yields compact, efficient internal structures, improving both the interpretability and performance of models. When the alignment is broken, a companion Polynomial Alignment Hypothesis (PAH) takes over: the same quantities instead settle into polynomial relationships with one another, and these relationships mark distinct learning phases within the network.
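To make the alignment claim concrete, here is a minimal sketch, not the paper's exact procedure: it trains a small PyTorch network on toy regression data and tracks the cosine similarity between the Gram matrices of one layer's representations, weights, and gradients. The choice of metric, the post-activation representation, and all sizes and hyperparameters are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's exact formulation): track how
# aligned a layer's representations, weights, and gradients become during
# training, measured as cosine similarity between their Gram matrices.
import torch
import torch.nn as nn

torch.manual_seed(0)

def gram_cosine(a: torch.Tensor, b: torch.Tensor) -> float:
    """Cosine similarity between two matrices, flattened to vectors."""
    a, b = a.flatten(), b.flatten()
    return float(a @ b / (a.norm() * b.norm() + 1e-12))

# Toy setup: a two-layer network on random regression data.
x = torch.randn(256, 32)
y = torch.randn(256, 1)
layer = nn.Linear(32, 16)
head = nn.Linear(16, 1)
opt = torch.optim.SGD(list(layer.parameters()) + list(head.parameters()), lr=1e-2)

for step in range(2001):
    opt.zero_grad()
    h = torch.relu(layer(x))                     # the layer's representation
    loss = nn.functional.mse_loss(head(h), y)
    loss.backward()
    opt.step()

    if step % 500 == 0:
        H = h.detach()                           # (256, 16) representations
        W = layer.weight.detach()                # (16, 32) weights
        G = layer.weight.grad.detach()           # (16, 32) weight gradients
        # All three Gram matrices below are 16 x 16 and directly comparable.
        rep_gram, w_gram, g_gram = H.T @ H, W @ W.T, G @ G.T
        print(step,
              round(gram_cosine(rep_gram, w_gram), 3),   # representation-weight
              round(gram_cosine(rep_gram, g_gram), 3),   # representation-gradient
              round(gram_cosine(w_gram, g_gram), 3))     # weight-gradient
```

Under the CRH, one would expect these similarities to rise as training proceeds; the toy setup here is only meant to show what "alignment of representations, weights, and gradients within a layer" can mean operationally.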
This dual-hypothesis approach provides a potential explanation for previously observed phenomena in deep learning, such as neural collapse and feature alignment patterns. Importantly, it also opens new avenues for optimizing model behavior by intentionally injecting noise into gradients to shape the network's representations. These findings not only enhance our understanding of deep learning mechanisms but also hint at parallels with how biological brains process information, making it a promising step toward more interpretable and biologically inspired AI systems.
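As a rough illustration of that intervention, the sketch below perturbs one layer's weight gradient with isotropic Gaussian noise before each optimizer step. The noise model, its scale, and the toy architecture are assumptions for illustration, not the paper's prescription.

```python
# Minimal sketch of shaping a layer via gradient noise injection; the isotropic
# Gaussian noise and its scale are illustrative assumptions, not a prescribed recipe.
import torch
import torch.nn as nn

torch.manual_seed(0)

x = torch.randn(256, 32)
y = torch.randn(256, 1)
layer = nn.Linear(32, 16)
head = nn.Linear(16, 1)
opt = torch.optim.SGD(list(layer.parameters()) + list(head.parameters()), lr=1e-2)
noise_scale = 0.05  # hypothetical knob controlling how strongly the gradient is perturbed

for step in range(2000):
    opt.zero_grad()
    h = torch.relu(layer(x))
    loss = nn.functional.mse_loss(head(h), y)
    loss.backward()
    # Perturb the layer's weight gradient before the update. On the CRH view,
    # gradient structure is imprinted on the learned representation, so noise
    # added here is one way to influence the representation's geometry.
    with torch.no_grad():
        layer.weight.grad += noise_scale * torch.randn_like(layer.weight.grad)
    opt.step()
```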