How and how well do organisms predict environmental futures with limited resources?
A particular infinite-order Markov process, the Random Insertion Process, has a series of phase transitions associated with trying to predict its future subject to limited memory. As the inverse "temperature" increases, causal states are added to memory at certain critical temperatures. The colors show how the addition of causal states and the growth of memory (y-axis) change when you're only allowed access to finite-length pasts. The implied representations of the Random Insertion Process along these curves can be used to build coarse-grained models of the process, and the technology can be applied to far more complex stochastic processes.
The Random Insertion Process, an infinite-order Markov process, exhibits a tradeoff between memory (x-axis) and prediction (y-axis). For a given amount of memory, prediction is bounded by the blue curve. The other curves show finite-length-past approximations to that ground truth. The exact answer, the blue curve, can only be obtained by an algorithm that uses computational mechanics, my graduate school obsession. These curves are used to benchmark how well an organism predicts the future of the Random Insertion Process: the closer the organism sits to the blue curve, the more efficiently it predicts.
Organisms must predict the environmental future with limited resources in order to survive: individual bacteria, bacterial populations, salamanders, humans, and even brain organoids and the better artificial neural networks all do it. Why predict? What we see are partial and noisy observations of a complicated world, so to succeed we must understand the so-called belief state of the environment: the "predictive features", or causal states. At the same time, resource constraints on memory, time, energy, and materials limit the quality of our predictions. Together, these two observations mean that organisms, both biological and artificial, should be "resource-rational predictors", or "efficient predictors". Ultimately, this stems from the fact that organisms must make decisions under resource constraints.
In short, I'm interested in how well, and how, organisms efficiently predict. We currently have very little evidence that organisms are efficient predictors of their environment, and even less understanding of how organisms would accomplish such a feat, with some exceptions among the examples listed above.
So what? What's the societal relevance?
What does it buy us if we prove that organisms are efficient predictors, and if we understand how they predict? It's hard to predict the societal ramifications of basic research. After all, general relativity led to GPS. But it's possible that we could use the amazing prediction algorithms used by organisms to build an energy-efficient ChatGPT, saving the environment from the huge data centers AI companies are building nowadays. (This makes more sense than you might think, given that ChatGPT is inferring causal states.)
You may be skeptical, but in the static case, in which we try to retain accurate perceptions as we compress, our work has already led to new insights into biosensors (here and here) that resulted in a patent! I am proudly an advisor for the corresponding startup, Awecom, Inc.
How do we pursue answers to these questions?
As this is quantitative biology, with both theory and experiments, the Devil is in the details. The math behind it all is quite beautiful.
Even the more conventional part of what we do is still a little new. We can benchmark how efficiently organisms predict using a variant of rate-distortion theory, which reduces to the information bottleneck method in a special case; and we can infer how organisms predict using maximum-likelihood tricks and knock-out experiments.
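If you want a feel for what that benchmarking computation looks like, here is a minimal sketch of the standard information bottleneck iterations applied to a toy joint distribution of pasts and futures. The toy distribution, the number of clusters, and the beta sweep are all my illustrative choices, not the setup from any particular paper.

```python
# A minimal, illustrative sketch of predictive rate-distortion via the
# information bottleneck (IB): compress pasts X into clusters T while
# retaining information about futures Y. Toy numbers only.
import numpy as np

rng = np.random.default_rng(0)

# Toy joint distribution p(x, y) over pasts X (rows) and futures Y (columns).
p_xy = np.array([[0.40, 0.10],
                 [0.10, 0.40]])
p_x = p_xy.sum(axis=1)
p_y_given_x = p_xy / p_x[:, None]

def mutual_information(p_ab):
    """Mutual information in bits of a joint distribution given as a matrix."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / (p_a * p_b)[mask])))

def information_bottleneck(beta, n_clusters=2, n_iters=300, eps=1e-12):
    """Standard IB iterations; returns (memory I(T;X), prediction I(T;Y))."""
    q_t_given_x = rng.random((len(p_x), n_clusters))
    q_t_given_x /= q_t_given_x.sum(axis=1, keepdims=True)
    for _ in range(n_iters):
        q_t = p_x @ q_t_given_x
        q_y_given_t = (q_t_given_x * p_x[:, None]).T @ p_y_given_x + eps
        q_y_given_t /= q_y_given_t.sum(axis=1, keepdims=True)
        # KL divergence D[p(y|x) || q(y|t)] for every (x, t) pair, in bits.
        kl = np.array([[np.sum(p_y_given_x[x] *
                               np.log2(p_y_given_x[x] / q_y_given_t[t]))
                        for t in range(n_clusters)]
                       for x in range(len(p_x))])
        q_t_given_x = q_t * np.exp2(-beta * kl) + eps
        q_t_given_x /= q_t_given_x.sum(axis=1, keepdims=True)
    joint_xt = q_t_given_x * p_x[:, None]   # p(x, t)
    joint_ty = joint_xt.T @ p_y_given_x     # p(t, y)
    return mutual_information(joint_xt), mutual_information(joint_ty)

# Sweep the tradeoff parameter: low beta favors small memory,
# high beta favors accurate prediction.
for beta in (0.5, 1.0, 2.0, 5.0, 20.0):
    memory, prediction = information_bottleneck(beta)
    print(f"beta={beta:5.1f}  I(T;X)={memory:.3f} bits  I(T;Y)={prediction:.3f} bits")
```

Sweeping beta traces out a memory-prediction curve of exactly the kind shown in the figures above, just for a much simpler toy process.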
But the real secret mathematical sauce of what we do and have done in the Marzen lab is the aforementioned causal states. Oftentimes we're dealing with very long pasts and very long futures, which makes our problems very high-dimensional. (The number of relevant variables grows exponentially with the length of pasts and futures.)
So how do we attack these questions quantitatively, given this curse of dimensionality? If you have causal states, you have everything you need to predict the future of the environment as well as possible, and nothing more. If you use causal states wisely, you get a huge dimensionality reduction in what you need to keep track of in order to benchmark memory and prediction in organisms.
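To make that concrete, here is a small illustrative sketch (my toy choices, not the machinery from our papers) using the Even Process, a standard two-state example: of the 2^L binary pasts of length L, the ones that actually occur collapse onto just a few conditional next-symbol distributions, two of which are the causal states.

```python
# Group length-L pasts of the Even Process by their conditional next-symbol
# distribution. Illustrative only: the Even Process stands in for a real
# environment, and L is kept small enough to enumerate.
import itertools
from collections import defaultdict

# Epsilon-machine of the Even Process.
# transitions[state][symbol] = (probability, next_state); missing symbol = forbidden.
transitions = {
    "A": {0: (0.5, "A"), 1: (0.5, "B")},
    "B": {1: (1.0, "A")},
}
stationary = {"A": 2.0 / 3.0, "B": 1.0 / 3.0}

def filter_word(word):
    """Return {final_state: probability} of emitting `word` from the stationary machine."""
    weights = dict(stationary)
    for symbol in word:
        new_weights = defaultdict(float)
        for state, w in weights.items():
            if symbol in transitions[state]:
                p, nxt = transitions[state][symbol]
                new_weights[nxt] += w * p
        weights = dict(new_weights)
    return weights

L = 8
classes = defaultdict(list)   # conditional next-symbol distribution -> pasts
for past in itertools.product([0, 1], repeat=L):
    weights = filter_word(past)
    total = sum(weights.values())
    if total == 0:
        continue  # this past never occurs under the Even Process
    # P(next symbol = 1 | past), from the posterior over machine states.
    p_one = sum(w / total * transitions[s].get(1, (0.0, None))[0]
                for s, w in weights.items())
    classes[round(p_one, 10)].append(past)

print(f"{2**L} candidate pasts, {sum(len(v) for v in classes.values())} allowed pasts,")
print(f"but only {len(classes)} distinct predictive classes:")
for p_one, pasts in sorted(classes.items()):
    print(f"  P(next=1 | past) = {p_one:.3f}  ({len(pasts)} pasts)")
```

Every past containing at least one 0 synchronizes to one of the Even Process's two causal states; only the all-ones past remains unsynchronized. That collapse from exponentially many pasts to a handful of predictive classes is the dimensionality reduction at work.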
In fact, you can guarantee that this dimensionality reduction is easy to perform by controlling the complexity of the artificial stimuli that the organisms are shown: you simply generate their environment from a special, simpler type of hidden Markov model called an epsilon-Machine. (Unfortunately, that means the environments in our experiments are more artificial than naturalistic, but you can still test hypotheses about whether or not organisms are efficient predictors even when they are not in naturalistic settings.)
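Here is a minimal sketch of what "generating the environment from an epsilon-Machine" can look like, again with the Even Process standing in for the actual stimulus generator used in any given experiment; the point is that every emitted symbol comes with a ground-truth causal-state label for free.

```python
# Sample a stimulus sequence from a unifilar hidden Markov model
# (an epsilon-machine), recording the causal state at every step.
import numpy as np

rng = np.random.default_rng(1)

# Epsilon-machine of the Even Process: unifilar, so each (state, symbol) pair
# determines the next state, and the hidden state *is* the causal state.
transitions = {
    "A": [(0, 0.5, "A"), (1, 0.5, "B")],   # (symbol, probability, next state)
    "B": [(1, 1.0, "A")],
}

def generate(n_steps, start_state="A"):
    """Sample a symbol sequence and its ground-truth causal-state labels."""
    state = start_state
    symbols, states = [], []
    for _ in range(n_steps):
        options = transitions[state]
        probs = [p for _, p, _ in options]
        symbol, _, next_state = options[rng.choice(len(options), p=probs)]
        symbols.append(symbol)
        states.append(state)   # the causal state just before emitting the symbol
        state = next_state
    return symbols, states

symbols, states = generate(20)
print("symbols:", "".join(map(str, symbols)))
print("states: ", "".join(states))
```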
A lot of people are interested in predictive inference in organisms, but almost none of them use causal states to benchmark how well organisms memorize environmental pasts and predict environmental futures. I do! Bear with me here: computational mechanics, the study of causal states, is the mathematical framework underlying recurrent neural networks, the algorithmic machinery of efficient prediction, which in turn is the sensory part of resource-rational decision making. It sounds convoluted, but basically, all notions of memory and prediction, which arise from considering decision making under resource constraints, are essentially equivalent to asking how the cognitive system relates to causal states. There is even evidence that working memory in humans is a mutual information between the human brain and the causal state. I have therefore focused on connecting computational mechanics to the predictive capabilities of biological and artificial organisms.
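For concreteness, here is one hedged sketch of what such a comparison could look like in practice: a plug-in estimate of the mutual information between a discretized internal "response" and the ground-truth causal-state labels. The response below is synthetic noise of my own invention; in a real experiment it would be a neural or behavioral recording.

```python
# Plug-in estimate of I(response; causal state) from paired label sequences.
import numpy as np

rng = np.random.default_rng(2)

def plugin_mutual_information(labels_a, labels_b):
    """Empirical mutual information in bits between two label sequences."""
    a_vals, a_idx = np.unique(labels_a, return_inverse=True)
    b_vals, b_idx = np.unique(labels_b, return_inverse=True)
    joint = np.zeros((len(a_vals), len(b_vals)))
    for i, j in zip(a_idx, b_idx):
        joint[i, j] += 1
    joint /= joint.sum()
    p_a = joint.sum(axis=1, keepdims=True)
    p_b = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (p_a * p_b)[mask])))

# Synthetic example: causal-state labels and a noisy internal "response"
# that reports the state correctly 90% of the time. Purely illustrative.
states = rng.choice(["A", "B"], size=5000, p=[2 / 3, 1 / 3])
response = np.where(rng.random(5000) < 0.9, states,
                    np.where(states == "A", "B", "A"))
print(f"I(response; causal state) = {plugin_mutual_information(response, states):.3f} bits")
```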
To do all this, we had to find the causal structure of generic continuous-time, discrete-event processes and the operators that propagate information along that causal structure. The epsilon-Machine isn't just a boring hidden Markov model anymore; it's the beautiful set of coupled conveyor belts shown to my right.
Tell me more about the math!
In graduate school, inspired by this paper and this paper, I happily realized that the lossless predictive features studied by Jim Crutchfield could be used not only to infer predictive features and build models, but also to benchmark how well an agent has inferred predictive features. Sometimes the agent is better off building order-R Markov models; sometimes the agent is better off uncovering hidden states, thereby building infinite-order Markov models. And one has to be careful, because sometimes memorizing the past provides no guide to the future.
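To make the order-R versus hidden-state distinction concrete, here is a toy sketch (my choice of process and metric, not a result from those papers) that fits order-R Markov models to data from the Even Process and watches the conditional entropy creep down toward the true entropy rate as R grows.

```python
# Compare order-R Markov approximations of the Even Process against the
# exact entropy rate obtainable from its two-state epsilon-machine.
from collections import defaultdict
import numpy as np

rng = np.random.default_rng(3)

def generate_even_process(n_steps):
    """Sample a binary sequence from the Even Process epsilon-machine."""
    state, out = "A", []
    for _ in range(n_steps):
        if state == "A":
            symbol = int(rng.random() < 0.5)      # 0 or 1, each with prob 1/2
            state = "B" if symbol == 1 else "A"
        else:                                     # state B must emit 1
            symbol, state = 1, "A"
        out.append(symbol)
    return out

def conditional_entropy_order_R(sequence, R):
    """Empirical H[X_t | X_{t-R}, ..., X_{t-1}] in bits for an order-R Markov model."""
    counts = defaultdict(lambda: np.zeros(2))
    for t in range(R, len(sequence)):
        counts[tuple(sequence[t - R:t])][sequence[t]] += 1
    total = sum(c.sum() for c in counts.values())
    h = 0.0
    for c in counts.values():
        p_ctx = c.sum() / total
        p_sym = c / c.sum()
        h -= p_ctx * sum(p * np.log2(p) for p in p_sym if p > 0)
    return h

sequence = generate_even_process(200_000)
true_entropy_rate = 2.0 / 3.0   # bits/symbol: (2/3)*H(1/2) + (1/3)*0 from the machine
for R in range(0, 7):
    h_R = conditional_entropy_order_R(sequence, R)
    print(f"order R={R}: H[X_t | last {R} symbols] = {h_R:.4f} bits "
          f"(true entropy rate = {true_entropy_rate:.4f})")
```

Here the finite-order models converge quickly because the Even Process synchronizes fast; for other processes, no finite R closes the gap, and uncovering hidden states is the better strategy.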
We have used this technology to benchmark recurrent neural networks, humans, and neurons; the long and short of it is that the best recurrent neural networks and biological neural networks prove to be near-optimal at compression and prediction. There's a neat trick in these papers: at first, you might think that you would have to infer causal states in order to benchmark at all. Not the case! If you use an epsilon-Machine of your choice to generate the input for the prediction engines being benchmarked, you know exactly what the causal states of that input are, both forward-time and reverse-time, thanks to an unpublished algorithm of Jim Crutchfield's. It's the mathematical equivalent of showing a cat a synthetic video whose generative model you fully understand, and recording from the cat's brain to figure out how, and how well, the cat is processing information.
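Here is a hedged sketch of the flavor of that benchmarking trick, with the Even Process once more standing in for the chosen epsilon-Machine: because the generator is known, the optimal causal-state predictor's log-loss is known exactly, and any predictor you care about can be compared against it.

```python
# Benchmark a predictor against the optimal causal-state predictor, which is
# available exactly because we generated the data ourselves. Illustrative only.
import numpy as np

rng = np.random.default_rng(4)

def generate_with_states(n_steps):
    """Sample the Even Process, keeping the forward-time causal state at each step."""
    state, symbols, states = "A", [], []
    for _ in range(n_steps):
        states.append(state)
        if state == "A":
            symbol = int(rng.random() < 0.5)
            state = "B" if symbol == 1 else "A"
        else:
            symbol, state = 1, "A"
        symbols.append(symbol)
    return np.array(symbols), states

P_ONE_GIVEN_STATE = {"A": 0.5, "B": 1.0}   # optimal predictive distribution per causal state

symbols, states = generate_with_states(100_000)

# Optimal benchmark: average log-loss of the causal-state predictor.
optimal_probs = np.array([P_ONE_GIVEN_STATE[s] if x == 1 else 1 - P_ONE_GIVEN_STATE[s]
                          for s, x in zip(states, symbols)])
optimal_log_loss = -np.mean(np.log2(optimal_probs))

# A deliberately naive competitor: always predict the marginal frequency of 1s.
p1 = symbols.mean()
naive_log_loss = -np.mean(np.where(symbols == 1, np.log2(p1), np.log2(1 - p1)))

print(f"causal-state predictor: {optimal_log_loss:.4f} bits/symbol (the benchmark)")
print(f"marginal predictor:     {naive_log_loss:.4f} bits/symbol")
```

Any prediction engine, recurrent network or neuron, slots in where the naive predictor sits; its gap to the causal-state benchmark measures how efficiently it predicts.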
This result might lead you to wonder: is it easy to predict and compress? Not at all! Large random channels fail at this in both discrete and continuous time, and surprisingly, even the best artificial networks cannot predict simple hidden Markov model output well when the model has a large number of states.
Therefore, something interesting is going on with the best artificial agents and biological organisms that we have yet to uncover, exploit, and improve.
Another thing I did along the way: Inferring predictive features
The actual first task I faced as a graduate student was this: here's some data; how should we build a model? At first, I was drawn to Maximum Entropy methods, but when I learned about hidden Markov models, I realized I had happened upon something more powerful. However, the number of hidden Markov model topologies grows super-exponentially with the number of states, making a brute-force search untenable for real-world data. There are various ways around this, but I decided to turn the brute-force search through all topologies into a brute-force search through topologies that incorporate "expert knowledge" about the data. In a series of papers, I enumerated the kinds of topologies I expected to see for my favorite discrete-event, continuous-time data. Here's an early paper on the discrete-time case, a paper that morphs discrete into continuous time, and a paper on the full discrete-event, continuous-time case. This is the algorithm that resulted, and we just used it to build a model of sperm whale speech and quantitatively evaluate its statistical complexity, predictability (excess entropy), and randomness (entropy rate).
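Here is a toy sketch of the brute-force flavor of that idea (not the actual algorithm from those papers): enumerate a few hand-picked unifilar hidden Markov model topologies, fit their emission probabilities by counting along the deterministic state path, and compare log-likelihoods on the data.

```python
# Score a handful of candidate unifilar (deterministic) HMM topologies on a
# binary data sequence. The candidate list is hand-picked and illustrative,
# standing in for a search over topologies built from expert knowledge.
from collections import defaultdict
import numpy as np

# topology[state][symbol] = next state; a missing symbol is forbidden in that state.
CANDIDATES = {
    "iid (1 state)": {"A": {0: "A", 1: "A"}},
    "golden mean":   {"A": {0: "A", 1: "B"}, "B": {0: "A"}},
    "even process":  {"A": {0: "A", 1: "B"}, "B": {1: "A"}},
}

def fit_and_score(topology, data, start_state="A"):
    """Estimate per-state symbol probabilities by counting, return log-likelihood in bits."""
    # Pass 1: count which symbol is emitted from each state along the unifilar path.
    counts = defaultdict(lambda: defaultdict(int))
    state = start_state
    for x in data:
        if x not in topology[state]:
            return -np.inf                       # data forbidden under this topology
        counts[state][x] += 1
        state = topology[state][x]
    # Pass 2: log-likelihood with the maximum-likelihood emission probabilities.
    log_like, state = 0.0, start_state
    for x in data:
        total = sum(counts[state].values())
        log_like += np.log2(counts[state][x] / total)
        state = topology[state][x]
    return log_like

# Example data, sampled from the Even Process so we know which candidate should win.
rng = np.random.default_rng(5)
data, state = [], "A"
for _ in range(10_000):
    if state == "A":
        x = int(rng.random() < 0.5)
        state = "B" if x == 1 else "A"
    else:
        x, state = 1, "A"
    data.append(x)

for name, topology in CANDIDATES.items():
    print(f"{name:15s} log-likelihood = {fit_and_score(topology, data):10.1f} bits")
```

In practice one would also penalize model complexity (with BIC or a Bayesian posterior over topologies, say) rather than compare raw likelihoods, and the candidate set would encode real expert knowledge instead of three textbook examples.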