How and how well do organisms predict environmental futures with limited resources?
A particular infinite-order Markov process, the Random Insertion Process, has a series of phase transitions associated with trying to predict its future subject to limited memory. As the inverse "temperature" increases, causal states are added to memory at certain critical temperatures. The colors show how the addition of causal states and the growth of memory (y-axis) change when you're only allowed access to finite-length pasts. The implied representations of the Random Insertion Process along these curves can be used to build coarse-grained models of the process, and the technology can be applied to far more complex stochastic processes.
The Random Insertion Process, an infinite-order Markov process, exhibits a tradeoff between memory (x-axis) and prediction (y-axis). For a given amount of memory, prediction is bounded by the blue curve. The other curves show finite-length-past approximations to that ground truth. The exact answer, the blue curve, can only be obtained by an algorithm that uses computational mechanics, my graduate school obsession. These curves are used to benchmark how well an organism predicts the future of the Random Insertion Process: the closer the organism sits to the blue curve, the more efficiently it predicts.
Organisms must predict the environmental future with limited resources in order to survive: individual bacteria, bacterial populations, salamanders, humans, and even brain organoids and the better artificial neural networks all do it. Why predict? What we see are partial and noisy observations of a complicated world, so to succeed we must understand the so-called belief state of the environment: the "predictive features", or causal states. At the same time, resource constraints on memory, time, energy, and materials limit the quality of our predictions. Together, these two observations mean that organisms, both biological and artificial, should be "resource-rational predictors", or "efficient predictors". Ultimately, this stems from the fact that organisms must make decisions under resource constraints.
In short, I'm interested in how well, and how, organisms efficiently predict. We currently have very little evidence that organisms are efficient predictors of their environment, and even less understanding of how organisms would accomplish such a feat, with some exceptions among the examples listed above.
So what? What's the societal relevance?
What does it buy us if we prove that organisms are efficient predictors, and if we understand how they predict? It's hard to predict the societal ramifications of basic research. After all, general relativity led to GPS. But it's possible that we could use the amazing prediction algorithms used by organisms to build an energy-efficient ChatGPT, saving the environment from the huge data centers AI companies are building nowadays. (This makes more sense than you might think, given that ChatGPT is inferring causal states.)
You may be skeptical, but in the static case, in which we try to retain accurate perceptions as we compress, our work has already led to new insights into biosensors (here and here) that resulted in a patent! I am proudly an advisor for the corresponding startup, Awecom, Inc.
How do we pursue answers to these questions?
As this is quantitative biology, with both theory and experiments, the Devil is in the details. The math behind it all is quite beautiful.
Even the more conventional part of what we do is still a little new. We can benchmark how efficiently organisms predict using a variant of rate-distortion theory, which reduces to the information bottleneck method in a special case; and we can infer how organisms predict using maximum-likelihood tricks and knock-out experiments.
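If you want a feel for what that benchmarking computation looks like, here is a minimal sketch of the standard information bottleneck iterations applied to a toy joint distribution of pasts and futures. The toy distribution, the number of clusters, and the beta sweep are all my illustrative choices, not the setup from any particular paper.

```python
# A minimal, illustrative sketch of predictive rate-distortion via the
# information bottleneck (IB): compress pasts X into clusters T while
# retaining information about futures Y. Toy numbers only.
import numpy as np

rng = np.random.default_rng(0)

# Toy joint distribution p(x, y) over pasts X (rows) and futures Y (columns).
p_xy = np.array([[0.40, 0.10],
                 [0.10, 0.40]])
p_x = p_xy.sum(axis=1)
p_y_given_x = p_xy / p_x[:, None]

def mutual_information(p_ab):
    """Mutual information in bits of a joint distribution given as a matrix."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / (p_a * p_b)[mask])))

def information_bottleneck(beta, n_clusters=2, n_iters=300, eps=1e-12):
    """Standard IB iterations; returns (memory I(T;X), prediction I(T;Y))."""
    q_t_given_x = rng.random((len(p_x), n_clusters))
    q_t_given_x /= q_t_given_x.sum(axis=1, keepdims=True)
    for _ in range(n_iters):
        q_t = p_x @ q_t_given_x
        q_y_given_t = (q_t_given_x * p_x[:, None]).T @ p_y_given_x + eps
        q_y_given_t /= q_y_given_t.sum(axis=1, keepdims=True)
        # KL divergence D[p(y|x) || q(y|t)] for every (x, t) pair, in bits.
        kl = np.array([[np.sum(p_y_given_x[x] *
                               np.log2(p_y_given_x[x] / q_y_given_t[t]))
                        for t in range(n_clusters)]
                       for x in range(len(p_x))])
        q_t_given_x = q_t * np.exp2(-beta * kl) + eps
        q_t_given_x /= q_t_given_x.sum(axis=1, keepdims=True)
    joint_xt = q_t_given_x * p_x[:, None]   # p(x, t)
    joint_ty = joint_xt.T @ p_y_given_x     # p(t, y)
    return mutual_information(joint_xt), mutual_information(joint_ty)

# Sweep the tradeoff parameter: low beta favors small memory,
# high beta favors accurate prediction.
for beta in (0.5, 1.0, 2.0, 5.0, 20.0):
    memory, prediction = information_bottleneck(beta)
    print(f"beta={beta:5.1f}  I(T;X)={memory:.3f} bits  I(T;Y)={prediction:.3f} bits")
```

Sweeping beta traces out a memory-prediction curve of exactly the kind shown in the figures above, just for a much simpler toy process.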
But the real secret mathematical sauce of what we do and have done in the Marzen lab is the aforementioned causal states. Oftentimes we're dealing with very long pasts and very long futures, which makes our problems very high-dimensional. (The number of relevant variables grows exponentially with the length of pasts and futures.)
So how do we attack these questions quantitatively, given this curse of dimensionality? If you have causal states, you have everything you need to predict the future of the environment as well as possible, and nothing more. If you use causal states wisely, you get a huge dimensionality reduction in what you need to keep track of in order to benchmark memory and prediction in organisms.
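To make that concrete, here is a small illustrative sketch (my toy choices, not the machinery from our papers) using the Even Process, a standard two-state example: of the 2^L binary pasts of length L, the ones that actually occur collapse onto just a few conditional next-symbol distributions, two of which are the causal states.

```python
# Group length-L pasts of the Even Process by their conditional next-symbol
# distribution. Illustrative only: the Even Process stands in for a real
# environment, and L is kept small enough to enumerate.
import itertools
from collections import defaultdict

# Epsilon-machine of the Even Process.
# transitions[state][symbol] = (probability, next_state); missing symbol = forbidden.
transitions = {
    "A": {0: (0.5, "A"), 1: (0.5, "B")},
    "B": {1: (1.0, "A")},
}
stationary = {"A": 2.0 / 3.0, "B": 1.0 / 3.0}

def filter_word(word):
    """Return {final_state: probability} of emitting `word` from the stationary machine."""
    weights = dict(stationary)
    for symbol in word:
        new_weights = defaultdict(float)
        for state, w in weights.items():
            if symbol in transitions[state]:
                p, nxt = transitions[state][symbol]
                new_weights[nxt] += w * p
        weights = dict(new_weights)
    return weights

L = 8
classes = defaultdict(list)   # conditional next-symbol distribution -> pasts
for past in itertools.product([0, 1], repeat=L):
    weights = filter_word(past)
    total = sum(weights.values())
    if total == 0:
        continue  # this past never occurs under the Even Process
    # P(next symbol = 1 | past), from the posterior over machine states.
    p_one = sum(w / total * transitions[s].get(1, (0.0, None))[0]
                for s, w in weights.items())
    classes[round(p_one, 10)].append(past)

print(f"{2**L} candidate pasts, {sum(len(v) for v in classes.values())} allowed pasts,")
print(f"but only {len(classes)} distinct predictive classes:")
for p_one, pasts in sorted(classes.items()):
    print(f"  P(next=1 | past) = {p_one:.3f}  ({len(pasts)} pasts)")
```

Every past containing at least one 0 synchronizes to one of the Even Process's two causal states; only the all-ones past remains unsynchronized. That collapse from exponentially many pasts to a handful of predictive classes is the dimensionality reduction at work.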
In fact, you can guarantee that this dimensionality reduction is easy to perform by controlling the complexity of the artificial stimuli that the organisms are shown: you simply generate their environment from a special, simpler type of hidden Markov model called an epsilon-Machine. (Unfortunately, that means the environments in our experiments are more artificial than naturalistic, but you can still test hypotheses about whether or not organisms are efficient predictors even when they are not in naturalistic settings.)
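Here is a minimal sketch of what "generating the environment from an epsilon-Machine" can look like, again with the Even Process standing in for the actual stimulus generator used in any given experiment; the point is that every emitted symbol comes with a ground-truth causal-state label for free.

```python
# Sample a stimulus sequence from a unifilar hidden Markov model
# (an epsilon-machine), recording the causal state at every step.
import numpy as np

rng = np.random.default_rng(1)

# Epsilon-machine of the Even Process: unifilar, so each (state, symbol) pair
# determines the next state, and the hidden state *is* the causal state.
transitions = {
    "A": [(0, 0.5, "A"), (1, 0.5, "B")],   # (symbol, probability, next state)
    "B": [(1, 1.0, "A")],
}

def generate(n_steps, start_state="A"):
    """Sample a symbol sequence and its ground-truth causal-state labels."""
    state = start_state
    symbols, states = [], []
    for _ in range(n_steps):
        options = transitions[state]
        probs = [p for _, p, _ in options]
        symbol, _, next_state = options[rng.choice(len(options), p=probs)]
        symbols.append(symbol)
        states.append(state)   # the causal state just before emitting the symbol
        state = next_state
    return symbols, states

symbols, states = generate(20)
print("symbols:", "".join(map(str, symbols)))
print("states: ", "".join(states))
```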
A lot of people are interested in predictive inference in organisms, but almost none of them use causal states to benchmark how well organisms memorize environmental pasts and predict environmental futures. I do! Bear with me here: computational mechanics, the study of causal states, is the mathematical framework underlying recurrent neural networks, the algorithmic machinery of efficient prediction, which in turn is the sensory part of resource-rational decision making. It sounds convoluted, but basically, all notions of memory and prediction, which arise from considering decision making under resource constraints, are essentially equivalent to asking how the cognitive system relates to causal states. There is even evidence that working memory in humans is a mutual information between the human brain and the causal state. I have therefore focused on connecting computational mechanics to the predictive capabilities of biological and artificial organisms.
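For concreteness, here is one hedged sketch of what such a comparison could look like in practice: a plug-in estimate of the mutual information between a discretized internal "response" and the ground-truth causal-state labels. The response below is synthetic noise of my own invention; in a real experiment it would be a neural or behavioral recording.

```python
# Plug-in estimate of I(response; causal state) from paired label sequences.
import numpy as np

rng = np.random.default_rng(2)

def plugin_mutual_information(labels_a, labels_b):
    """Empirical mutual information in bits between two label sequences."""
    a_vals, a_idx = np.unique(labels_a, return_inverse=True)
    b_vals, b_idx = np.unique(labels_b, return_inverse=True)
    joint = np.zeros((len(a_vals), len(b_vals)))
    for i, j in zip(a_idx, b_idx):
        joint[i, j] += 1
    joint /= joint.sum()
    p_a = joint.sum(axis=1, keepdims=True)
    p_b = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (p_a * p_b)[mask])))

# Synthetic example: causal-state labels and a noisy internal "response"
# that reports the state correctly 90% of the time. Purely illustrative.
states = rng.choice(["A", "B"], size=5000, p=[2 / 3, 1 / 3])
response = np.where(rng.random(5000) < 0.9, states,
                    np.where(states == "A", "B", "A"))
print(f"I(response; causal state) = {plugin_mutual_information(response, states):.3f} bits")
```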
To do all this, we had to find the causal structure of generic continuous-time, discrete-event processes and the operators that propagate information along that causal structure. The epsilon-Machine isn't just a boring hidden Markov model anymore; it's the beautiful set of coupled conveyor belts shown to my right.
Tell me more about the math!
In graduate school, inspired by this paper and this paper, I happily realized that the lossless predictive features studied by Jim Crutchfield could be used not only to infer predictive features and build models, but also to benchmark how well an agent has inferred predictive features. Sometimes the agent is better off building order-R Markov models; sometimes the agent is better off uncovering hidden states, thereby building infinite-order Markov models. And one has to be careful, because sometimes memorizing the past provides no guide to the future.
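To make the order-R versus hidden-state distinction concrete, here is a toy sketch (my choice of process and metric, not a result from those papers) that fits order-R Markov models to data from the Even Process and watches the conditional entropy creep down toward the true entropy rate as R grows.

```python
# Compare order-R Markov approximations of the Even Process against the
# exact entropy rate obtainable from its two-state epsilon-machine.
from collections import defaultdict
import numpy as np

rng = np.random.default_rng(3)

def generate_even_process(n_steps):
    """Sample a binary sequence from the Even Process epsilon-machine."""
    state, out = "A", []
    for _ in range(n_steps):
        if state == "A":
            symbol = int(rng.random() < 0.5)      # 0 or 1, each with prob 1/2
            state = "B" if symbol == 1 else "A"
        else:                                     # state B must emit 1
            symbol, state = 1, "A"
        out.append(symbol)
    return out

def conditional_entropy_order_R(sequence, R):
    """Empirical H[X_t | X_{t-R}, ..., X_{t-1}] in bits for an order-R Markov model."""
    counts = defaultdict(lambda: np.zeros(2))
    for t in range(R, len(sequence)):
        counts[tuple(sequence[t - R:t])][sequence[t]] += 1
    total = sum(c.sum() for c in counts.values())
    h = 0.0
    for c in counts.values():
        p_ctx = c.sum() / total
        p_sym = c / c.sum()
        h -= p_ctx * sum(p * np.log2(p) for p in p_sym if p > 0)
    return h

sequence = generate_even_process(200_000)
true_entropy_rate = 2.0 / 3.0   # bits/symbol: (2/3)*H(1/2) + (1/3)*0 from the machine
for R in range(0, 7):
    h_R = conditional_entropy_order_R(sequence, R)
    print(f"order R={R}: H[X_t | last {R} symbols] = {h_R:.4f} bits "
          f"(true entropy rate = {true_entropy_rate:.4f})")
```

Here the finite-order models converge quickly because the Even Process synchronizes fast; for other processes, no finite R closes the gap, and uncovering hidden states is the better strategy.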
We have used this technology to benchmark recurrent neural networks, humans, and neurons; the long and short of it is that the best recurrent neural networks and biological neural networks prove to be near-optimal at compression and prediction. There's a neat trick in these papers: at first, you might think that you would have to infer causal states in order to benchmark at all. Not the case! If you use an epsilon-Machine of your choice to generate the input for the prediction engines being benchmarked, you know exactly what the causal states of that input are, both forward-time and reverse-time, thanks to an unpublished algorithm of Jim Crutchfield's. It's the mathematical equivalent of showing a cat a synthetic video whose generative model you fully understand, and recording from the cat's brain to figure out how, and how well, the cat is processing information.
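Here is a hedged sketch of the flavor of that benchmarking trick, with the Even Process once more standing in for the chosen epsilon-Machine: because the generator is known, the optimal causal-state predictor's log-loss is known exactly, and any predictor you care about can be compared against it.

```python
# Benchmark a predictor against the optimal causal-state predictor, which is
# available exactly because we generated the data ourselves. Illustrative only.
import numpy as np

rng = np.random.default_rng(4)

def generate_with_states(n_steps):
    """Sample the Even Process, keeping the forward-time causal state at each step."""
    state, symbols, states = "A", [], []
    for _ in range(n_steps):
        states.append(state)
        if state == "A":
            symbol = int(rng.random() < 0.5)
            state = "B" if symbol == 1 else "A"
        else:
            symbol, state = 1, "A"
        symbols.append(symbol)
    return np.array(symbols), states

P_ONE_GIVEN_STATE = {"A": 0.5, "B": 1.0}   # optimal predictive distribution per causal state

symbols, states = generate_with_states(100_000)

# Optimal benchmark: average log-loss of the causal-state predictor.
optimal_probs = np.array([P_ONE_GIVEN_STATE[s] if x == 1 else 1 - P_ONE_GIVEN_STATE[s]
                          for s, x in zip(states, symbols)])
optimal_log_loss = -np.mean(np.log2(optimal_probs))

# A deliberately naive competitor: always predict the marginal frequency of 1s.
p1 = symbols.mean()
naive_log_loss = -np.mean(np.where(symbols == 1, np.log2(p1), np.log2(1 - p1)))

print(f"causal-state predictor: {optimal_log_loss:.4f} bits/symbol (the benchmark)")
print(f"marginal predictor:     {naive_log_loss:.4f} bits/symbol")
```

Any prediction engine, recurrent network or neuron, slots in where the naive predictor sits; its gap to the causal-state benchmark measures how efficiently it predicts.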
This result might lead you to wonder: is it easy to predict and compress? Not at all! Large random channels fail at this in both discrete and continuous time, and surprisingly, even the best artificial networks cannot predict simple hidden Markov model output well when the model has a large number of states.
Therefore, something interesting is going on with the best artificial agents and biological organisms that we have yet to uncover, exploit, and improve.
Another thing I did along the way: Inferring predictive features
The actual first task I faced as a graduate student was this: here's some data; how should we build a model? At first, I was drawn to Maximum Entropy methods, but when I learned about hidden Markov models, I realized I had happened upon something more powerful. However, the number of hidden Markov model topologies grows super-exponentially with the number of states, making a brute-force search untenable for real-world data. There are various ways around this, but I decided to turn the brute-force search through all topologies into a brute-force search through topologies that incorporate "expert knowledge" about the data. In a series of papers, I enumerated the kinds of topologies I expected to see for my favorite discrete-event, continuous-time data. Here's an early paper on the discrete-time case, a paper that morphs discrete into continuous time, and a paper on the full discrete-event, continuous-time case. This is the algorithm that resulted, and we just used it to build a model of sperm whale speech and quantitatively evaluate its statistical complexity, predictability (excess entropy), and randomness (entropy rate).
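Here is a toy sketch of the brute-force flavor of that idea (not the actual algorithm from those papers): enumerate a few hand-picked unifilar hidden Markov model topologies, fit their emission probabilities by counting along the deterministic state path, and compare log-likelihoods on the data.

```python
# Score a handful of candidate unifilar (deterministic) HMM topologies on a
# binary data sequence. The candidate list is hand-picked and illustrative,
# standing in for a search over topologies built from expert knowledge.
from collections import defaultdict
import numpy as np

# topology[state][symbol] = next state; a missing symbol is forbidden in that state.
CANDIDATES = {
    "iid (1 state)": {"A": {0: "A", 1: "A"}},
    "golden mean":   {"A": {0: "A", 1: "B"}, "B": {0: "A"}},
    "even process":  {"A": {0: "A", 1: "B"}, "B": {1: "A"}},
}

def fit_and_score(topology, data, start_state="A"):
    """Estimate per-state symbol probabilities by counting, return log-likelihood in bits."""
    # Pass 1: count which symbol is emitted from each state along the unifilar path.
    counts = defaultdict(lambda: defaultdict(int))
    state = start_state
    for x in data:
        if x not in topology[state]:
            return -np.inf                       # data forbidden under this topology
        counts[state][x] += 1
        state = topology[state][x]
    # Pass 2: log-likelihood with the maximum-likelihood emission probabilities.
    log_like, state = 0.0, start_state
    for x in data:
        total = sum(counts[state].values())
        log_like += np.log2(counts[state][x] / total)
        state = topology[state][x]
    return log_like

# Example data, sampled from the Even Process so we know which candidate should win.
rng = np.random.default_rng(5)
data, state = [], "A"
for _ in range(10_000):
    if state == "A":
        x = int(rng.random() < 0.5)
        state = "B" if x == 1 else "A"
    else:
        x, state = 1, "A"
    data.append(x)

for name, topology in CANDIDATES.items():
    print(f"{name:15s} log-likelihood = {fit_and_score(topology, data):10.1f} bits")
```

In practice one would also penalize model complexity (with BIC or a Bayesian posterior over topologies, say) rather than compare raw likelihoods, and the candidate set would encode real expert knowledge instead of three textbook examples.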