Sarah Marzen

Random musings

Stray thoughts on my research, related research, education research, and sweeping commentaries on entire fields

Why should I care about predictive information curves?

5/2/2017

Or, let's start simpler: why should I care about entropy rate?

A lot of machine learning research nowadays is focused on finding minimal sufficient statistics of prediction (a.k.a. "causal states"), or just sufficient statistics, of some time series, whether it be a time series of Wikipedia edits or of Amazon purchases.  Most of my research assumes that we know these causal states, and then tries to use that knowledge to calculate a range of quantities (including entropy rate and predictive information curves) more accurately than if you were to do it directly from the time series/data.
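To make the entropy-rate half of that claim concrete: when the causal states of a process form a known finite Markov chain, the entropy rate follows in closed form from the transition probabilities, with no block-entropy estimation from data at all. A minimal sketch, with a made-up two-state chain standing in for a known causal-state model:

```python
import numpy as np

# Hypothetical two-state causal-state model (a Markov chain);
# the transition probabilities are made up for illustration.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Stationary distribution: the eigenvector of P^T with eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

# Entropy rate in bits per symbol: h = -sum_i pi_i sum_j P_ij log2 P_ij
h = -np.sum(pi[:, None] * P * np.log2(P))
print(h)
```

The same quantity estimated directly from a finite sample of the time series suffers from undersampled long blocks, which is exactly the advantage of starting from the causal states.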

This leads to the question... why?  Why care about these quantities?  Entropy rate enjoys a privileged status, due to Shannon's first theorem, so let's focus on predictive information curves for just a second.

For the initiated, the predictive information bottleneck is an application of the information bottleneck method to time series prediction, in which we compress the past as efficiently as possible to understand the future to some desired extent.  For the uninitiated, predictive information curves tell us the tradeoff between resources required to predict the future and predictive power.  In one of the first papers on the subject, Still et al. identified causal states as one limiting case of the predictive information bottleneck.  With that theorem in mind, one might reasonably ask the following question: why study causal states?  Just study the predictive information bottleneck, and causal states pop out as a special case.
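To make the tradeoff concrete, here is a rough sketch of the information bottleneck's self-consistent equations run on a small, made-up joint distribution of past and future symbols. Each value of the tradeoff parameter beta yields one point of the predictive information curve: bits of memory about the past versus bits of predictive power about the future. The distribution, cluster count, and iteration budget are all illustrative assumptions, not from any of the papers mentioned:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint p(past, future) over four past and four future
# symbols; the numbers are made up for illustration.
pxy = np.array([[0.20, 0.02, 0.02, 0.01],
                [0.02, 0.20, 0.01, 0.02],
                [0.02, 0.01, 0.20, 0.02],
                [0.01, 0.02, 0.02, 0.20]])
pxy /= pxy.sum()
px = pxy.sum(axis=1)
py_x = pxy / px[:, None]

def mi(pab):
    """Mutual information (bits) of a joint distribution."""
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    mask = pab > 0
    return float(np.sum(pab[mask] * np.log2(pab[mask] / (pa @ pb)[mask])))

def ib_point(beta, n_clusters=4, iters=300):
    """One point on the curve via the IB self-consistent updates."""
    pt_x = rng.dirichlet(np.ones(n_clusters), size=len(px))  # init p(t|x)
    for _ in range(iters):
        pt = pt_x.T @ px                               # marginal p(t)
        px_t = (pt_x * px[:, None] / pt[None, :]).T    # p(x|t)
        py_t = px_t @ py_x                             # p(y|t)
        # KL( p(y|x) || p(y|t) ) in bits, for every (x, t) pair
        kl = np.array([[np.sum(py_x[i] * np.log2(py_x[i] / py_t[t]))
                        for t in range(n_clusters)] for i in range(len(px))])
        # p(t|x) proportional to p(t) * exp(-beta * KL); softmax per row
        logits = np.log(pt[None, :]) - beta * np.log(2) * kl
        logits -= logits.max(axis=1, keepdims=True)
        pt_x = np.exp(logits)
        pt_x /= pt_x.sum(axis=1, keepdims=True)
    # I(Past; R) = memory cost, I(R; Future) = predictive power
    return mi(pt_x * px[:, None]), mi(pt_x.T @ pxy)

for beta in (0.5, 2.0, 10.0):
    cost, power = ib_point(beta)
    print(f"beta={beta:5.1f}  I(Past;R)={cost:.3f}  I(R;Future)={power:.3f}")
```

As beta grows, the representation keeps more about the past and its predictive power climbs toward the ceiling I(Past; Future); in the lossless limit, per the theorem above, the optimal features are the causal states themselves.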

Surprisingly, or maybe not so surprisingly, it turns out that calculating predictive information curves and lossy predictive features is much easier when you have the lossless predictive features, a.k.a. the causal states.  For instance, check out some of the examples in this paper.  So, we end up in sort of a Catch-22 situation: to get accurate lossy predictive features, we need accurate lossless predictive features.

The jaded among us might finally wearily ask the following: now what?  We set out to find causal states (lossless predictive features).  Some smart people promised us that we could calculate these using the predictive information bottleneck, but now someone else has told us that those calculations are likely to be crappy unless we already have access to causal states.

At this point, I "pivoted", provoked by the following question: how can we tell if a sensor is excellent at extracting lossy predictive features?  One way to find out is to send input with known causal states to the sensor, and then calculate how well the sensor performs relative to the corresponding predictive information curve, as was done in this inspiring paper.  If we know the input's causal states, then we can calculate its predictive information curve rather accurately, and therefore can be confident in our assessment of the sensor's predictive capabilities.
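As a toy version of that assessment: given a made-up joint distribution over past and future and a hypothetical noisy sensor that only sees the past, one can compute the bits of memory the sensor keeps, the bits of predictive power it extracts, and the ceiling set by the past-future mutual information. All of the numbers below are illustrative assumptions:

```python
import numpy as np

def mi(pab):
    """Mutual information (bits) of a joint distribution."""
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    mask = pab > 0
    return float(np.sum(pab[mask] * np.log2(pab[mask] / (pa @ pb)[mask])))

# Hypothetical joint p(past, future) and a noisy sensor channel p(s|past).
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
ps_x = np.array([[0.9, 0.1],      # sensor mostly copies the past symbol
                 [0.2, 0.8]])

pxs = ps_x * pxy.sum(axis=1)[:, None]   # joint p(past, sensor)
psy = ps_x.T @ pxy                      # joint p(sensor, future): S - X - Y

memory_cost = mi(pxs)        # I(Sensor; Past): bits the sensor stores
predictive_power = mi(psy)   # I(Sensor; Future): bits it captures
bound = mi(pxy)              # I(Past; Future): the most any sensor can get
print(memory_cost, predictive_power, bound)
```

A sensor sitting near the predictive information curve at its memory cost is, in this sense, a near-optimal predictor; the gap between its predictive power and the curve is the assessment.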

Professor Crutchfield then pointed something else out: a coarse-grained dynamical model might be desired if the original model is too complicated to be understood.  Imagine generating a very principled low-dimensional dynamical description of complicated genetic or neural circuits.  It's not yet clear that the predictive information bottleneck provides the best way of doing so, but it's at least a start.

These two applications are summed up by the following paragraph: "At second glance, these results may also seem rather useless. Why would one want lossy predictive features when lossless predictive features are available? Accurate estimation of lossy predictive features could and have been used to further test whether or not biological organisms are near-optimal predictors of their environment. Perhaps more importantly, lossless models can sometimes be rather large and hard to interpret, and a lossy model might be desired even when a lossless model is known."

Check out this paper for an example of what I mean.