I was just writing up a little 'for fun' paper for the end of a Stanford continuing education class on LLMs, with this intro. I was trying to keep it a bit more generic to help capture the imagination, but the metaphorical similarities are striking here... something about how we encode high complexity in a process feels shared.
"Probabilities, Interference, Resonance, Uncertainty: AI Complexity Metaphors a Century after Quantum Mechanics
If the rise of compelling but probabilistic language models, built on the complex resonance (attention) effects across a vast corpus of text, is surprising, you may want to examine the scientific toolbox behind the last century's most revolutionary breakthrough: Quantum Mechanics.
Imagine a planet inexplicably leaving our Solar System; that was the feeling of watching α-decay under classical mechanics. Spontaneous radiation could only be explained with probabilistic physics. Traditional equations didn’t allow a bundle of protons and neutrons to escape the nuclear force, yet in the new radium experiments, it did. In a breakthrough paradigm shift, scientists modeled the complex and counterintuitive behavior of α-decay with the probability amplitudes of Quantum Mechanics. The new approach used a toolkit of probabilistic amplitude wave equations that allowed a bundle of particles to ‘tunnel’ through a barrier when the interference patterns of those waves were just right, expelling it from the nucleus at random.
A century later, large language models demonstrated another paradigm-changing approach, producing Turing-Test-compatible results and standardized-test-passing responses across much of written human understanding. In a metaphorically similar fashion, they achieved these results not by layering a rules-based approach on explicit first principles and definitions, but by building probability curves from the deeper interference, both constructive and destructive, that words (tokens) exert on each other, even at significant distance in a text.
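To make that long-range effect concrete, here is a minimal toy sketch in Python with made-up numbers (nothing here comes from a real trained model) of the scaled dot-product attention referenced above: each token's representation is re-weighted by every other token's, however far apart they sit in the text.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy embeddings for a 5-token context (rows = tokens, cols = features);
# the values are random placeholders, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))

# Single-head scaled dot-product attention: every token's new representation
# becomes a weighted blend of all tokens in the context, regardless of distance.
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv
weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # (5, 5) mixing matrix
mixed = weights @ V                                # context-blended token vectors

print(weights.round(2))  # how strongly each token attends to every other token

The softmax row-normalizes the mixing weights, which is also where the non-linearity discussed next comes in.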
The comparison here is conceptual rather than mathematically isomorphic. The universal function approximation property of neural networks depends on non-linearities (ReLU, softmax) introduced in the hidden and output layers, unlike the linear nature of the solutions to the Schrödinger equation for wave functions. But there are still interesting metaphorical similarities to explore. When a photon travels through a double slit, its probability of being detected at a point on the wall must account for the constructive and destructive interference of both paths. Language and meaning also change as we add terms that may constructively or destructively interfere. When we write the word “dog,” a language model must calculate new probabilities based on the prior word; imagine “hot,” “brown,” or “missing” as examples."
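To keep the metaphor honest, the two textbook formulas being gestured at can be written side by side, in generic notation (nothing here is specific to a particular experiment or model):

% Double slit: amplitudes add before squaring; the cross term is the interference.
P(x) \;=\; \lvert \psi_1(x) + \psi_2(x) \rvert^2
      \;=\; \lvert \psi_1(x) \rvert^2 + \lvert \psi_2(x) \rvert^2
      \;+\; 2\,\mathrm{Re}\!\left[ \psi_1(x)\,\overline{\psi_2(x)} \right]

% Language model: the next-token distribution is recomputed from the whole prior
% context, with the softmax non-linearity applied to context-dependent logits z_t.
p(x_t \mid x_{<t}) \;=\; \mathrm{softmax}\!\big( z_t(x_{<t}) \big)

The cross term is where "constructive or destructive" lives in the physics; the context-dependence of the logits is where it lives, loosely, for the language model, with the caveat that the first expression sums complex amplitudes while the second is an ordinary normalized probability.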
....I was googling for examples of this when I found your piece here. Thanks for capturing this as well! You have taken the comparison a layer deeper than I risked going in an AI class discussion.
One of my primary interests is the recognition and study of patterns, including the type you're describing here. Yes, I believe your inquiry is worthwhile--it's likely you're "on to something." Please continue to pursue these ideas.
Appreciate it, Brian. I get the strong sense that there is some deep link between information/computation and fundamental aspects of physics, and I don’t think it’s a coincidence; it feels more fractal. There’s a lot of hand-waving, of course, but before something becomes a science it often starts as philosophy, or as aliasing some shadow of a pattern.