Intuitively, if you want to be energy efficient, you should die. (I credit Tony Bell with this analysis.) But now, it seems like there is buzz around the idea that energy efficiency leads to prediction of input.
In my opinion, this is almost true.
I can't find a way in which Tony Bell's argument doesn't hold up, unless you add so many constraints to your system that the death solution is impossible. For example, in this very interesting recent preprint, the death solution seems to be ruled out by construction. If you add a scalar v in front of the input in their RNN, the death solution becomes possible; I hypothesize that training for energy efficiency would then set v to 0 and kill the activations of the network entirely, which experiments could check. But instead, their system is constrained so that energy efficiency demands that "p" be the negative of the input, and so predictive coding results.
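Here is a minimal toy sketch of that hypothesis. It is not the preprint's architecture: I am assuming a single linear unit with activation a_t = v * x_t + p_t, where p_t = w * x_{t-1} is a one-step "prediction", the input x is an AR(1) process, and energy is the mean squared activation. The names `make_ar1`, `energy`, `train`, and the weight `w` are all made up for illustration.

```python
import numpy as np

def make_ar1(n=5000, rho=0.8, seed=0):
    """Toy input (an assumption): x_t = rho * x_{t-1} + unit Gaussian noise."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + noise[t]
    return x

def energy(v, w, x):
    """Mean squared activation of a_t = v * x_t + w * x_{t-1}."""
    a = v * x[1:] + w * x[:-1]
    return float(np.mean(a ** 2))

def train(x, learn_v, steps=2000, lr=0.01):
    """Gradient descent on the energy; v is only updated if learn_v is True."""
    v, w = 1.0, 0.0
    cur, prev = x[1:], x[:-1]
    for _ in range(steps):
        a = v * cur + w * prev
        w -= lr * 2.0 * np.mean(a * prev)      # d(energy)/dw
        if learn_v:
            v -= lr * 2.0 * np.mean(a * cur)   # d(energy)/dv
    return v, w

x = make_ar1()
v_free, w_free = train(x, learn_v=True)    # gain on the input is trainable
v_fix, w_fix = train(x, learn_v=False)     # gain on the input is pinned at 1

print("trainable v:", v_free, w_free, energy(v_free, w_free, x))
print("fixed v = 1:", v_fix, w_fix, energy(v_fix, w_fix, x))
```

In this toy, letting v train drives both v and w toward zero, so the unit goes silent (the death solution). Pinning v at 1 instead pushes w toward the negative of the input's lag-one correlation, so p_t approximately cancels the predictable part of x_t, which is the predictive-coding-like outcome.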
A more general take on energy efficiency and prediction is the thermodynamics of prediction. Continuous-time versions are in this paper. I find this bound to be quite clever in that it ties prediction inefficiency, rather than prediction wholesale, to energy inefficiency. Prediction inefficiency can actually be zero when there is no prediction at all (e.g., this paper).
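To make "prediction inefficiency" concrete, here is a small sketch under my own assumptions: I take it to be the nonpredictive information I_mem - I_pred, with I_mem = I(s_t; x_t) and I_pred = I(s_t; x_{t+1}), for a stationary Markov input x and a memory s that depends only on the current input. The two-state input, the two encoders, and the function names are hypothetical choices for illustration, not taken from the papers above.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information (in nats) of a 2-D joint probability table."""
    joint = np.asarray(joint, dtype=float)
    pa = joint.sum(axis=1, keepdims=True)   # marginal over rows
    pb = joint.sum(axis=0, keepdims=True)   # marginal over columns
    mask = joint > 0                        # skip zero-probability cells
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pa @ pb)[mask])))

def nonpredictive_information(T, encoder):
    """Return (I_mem, I_pred, I_mem - I_pred).

    T[i, j]       = P(x_next = j | x = i), input transition matrix
    encoder[i, k] = P(s = k | x = i), memory given the current input
    """
    T = np.asarray(T, dtype=float)
    encoder = np.asarray(encoder, dtype=float)
    # Stationary distribution: eigenvector of T transposed with eigenvalue 1.
    evals, evecs = np.linalg.eig(T.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi = pi / pi.sum()
    joint_s_x = (pi[:, None] * encoder).T   # P(s, x_t)
    joint_s_xnext = joint_s_x @ T           # P(s, x_{t+1})
    i_mem = mutual_information(joint_s_x)
    i_pred = mutual_information(joint_s_xnext)
    return i_mem, i_pred, i_mem - i_pred

T = [[0.9, 0.1], [0.1, 0.9]]    # sticky two-state input (assumed)
dead = [[1.0, 0.0], [1.0, 0.0]] # memory ignores the input entirely
copy = [[1.0, 0.0], [0.0, 1.0]] # memory copies the current input

print("dead:", nonpredictive_information(T, dead))
print("copy:", nonpredictive_information(T, copy))
```

The dead memory stores nothing and predicts nothing, so its nonpredictive information is zero. The copying memory does predict, but it also keeps information about x_t that says nothing about x_{t+1}, so its nonpredictive information is strictly positive.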
It is not yet clear to me if this bound is tight, though, for optimized systems. Based on the overly simple examples in the aforementioned paper, I'd say no, but we'll have to see.