Calculating the entropy rate (the conditional entropy of the present symbol given all past symbols) or the excess entropy (the mutual information between all past symbols and all future symbols) is not as easy as it may seem. Why? Because there are infinities: an infinite number of past symbols and/or an infinite number of future symbols.
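In symbols, the two definitions and the limits they reduce to for a stationary process look like this. The notation is my choice here and conventions vary: X_{:0} is the infinite past, X_{0:} is the infinite future, and H(L) is the entropy of a length-L block. The infinities show up as the limits on the right.

```latex
h_\mu = H[X_0 \mid X_{:0}] = \lim_{L \to \infty} \frac{H(L)}{L},
\qquad
\mathbf{E} = I[X_{:0} ; X_{0:}] = \lim_{L \to \infty} \left[ H(L) - L\, h_\mu \right],
\qquad
H(L) = H[X_{0:L}]
```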
You can certainly make a lot of progress by tackling this problem head-on, looking at longer and longer pasts and/or longer and longer futures.
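Here's a minimal sketch of the head-on approach. It assumes the process is given by a hidden Markov model with labeled transition matrices T[x], where T[x][i, j] is the probability of emitting x and moving from state i to state j. That representation is my choice for illustration, and the Golden Mean process (no two consecutive 0s) is just a hypothetical test case. The code enumerates every length-L word, so the cost grows exponentially in L. It uses h(L) = H(L) - H(L-1) as the entropy-rate estimate and H(L) - L h(L) as the excess-entropy estimate. The latter converges to E when E is finite.

```python
import itertools
import numpy as np

# Hypothetical example: the Golden Mean process (no two consecutive 0s).
# T[x][i, j] = P(emit x, move to state j | in state i); states A=0, B=1.
T = {
    0: np.array([[0.0, 0.5], [0.0, 0.0]]),
    1: np.array([[0.5, 0.0], [1.0, 0.0]]),
}

def stationary(T):
    """Stationary distribution of the internal-state Markov chain."""
    M = sum(T.values())
    vals, vecs = np.linalg.eig(M.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

def block_entropy(T, L):
    """H(L): Shannon entropy (bits) of length-L words, by brute-force enumeration."""
    pi = stationary(T)
    H = 0.0
    for word in itertools.product(T.keys(), repeat=L):
        v = pi
        for x in word:
            v = v @ T[x]  # propagate the state distribution along the word
        p = v.sum()       # probability of this word
        if p > 0:
            H -= p * np.log2(p)
    return H

for L in range(1, 8):
    h_L = block_entropy(T, L) - block_entropy(T, L - 1)  # entropy-rate estimate
    E_L = block_entropy(T, L) - L * h_L                  # excess-entropy estimate
    print(L, round(h_L, 4), round(E_L, 4))
```

The Golden Mean process is Markov of order 1, so from L = 2 on, h(L) sits at 2/3 bit and the excess-entropy estimate sits at about 0.2516 bits. For infinite-order processes the estimates only creep toward their limits.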
I'm pretty lazy, so I usually look for shortcuts. Here's my favorite: identify the minimal sufficient statistics of prediction and/or retrodiction, also known as the forward- and reverse-time "causal states". You can then rewrite most of your favorite quantities that have the "right" kind of infinities in terms of these minimal sufficient statistics. If you're lucky, the joint probability distribution of the forward- and reverse-time causal states is tractable to manipulate.
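Here's a minimal sketch of the shortcut for a hypothetical example, the Golden Mean process. Its labeled transition matrices, with T[x][i, j] = P(emit x, move to causal state j | in causal state i), are read as its forward-time ε-machine. With causal states in hand, the infinite-past quantities collapse to finite sums. Because the ε-machine is unifilar, h_μ = Σ_σ π(σ) H[X_0 | S_0 = σ], and the excess entropy is E = I[S^+; S^-]. Building the joint distribution of forward and reverse causal states is the hard part in general. Here I cheat by using a property of this particular order-1 Markov process: the forward causal state is fixed by the last past symbol, and the reverse causal state by the next future symbol.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array (zeros ignored)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Hypothetical example: epsilon-machine of the Golden Mean process.
# T[x][i, j] = P(emit x, move to causal state j | in causal state i); states A=0, B=1.
T = {
    0: np.array([[0.0, 0.5], [0.0, 0.0]]),
    1: np.array([[0.5, 0.0], [1.0, 0.0]]),
}

# Stationary distribution over forward causal states.
M = sum(T.values())
vals, vecs = np.linalg.eig(M.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi = pi / pi.sum()

# Entropy rate: for a unifilar machine, h_mu = sum_s pi(s) H[X_0 | S_0 = s].
emit = np.array([[T[x][s].sum() for x in T] for s in range(len(pi))])
h_mu = sum(pi[s] * entropy(emit[s]) for s in range(len(pi)))

# Excess entropy E = I[S+; S-]. Assumption specific to this process: the forward
# causal state is set by x_{-1} and the reverse causal state by x_0, so the
# joint P(S+, S-) is just the two-symbol distribution P(x_{-1}, x_0).
joint = np.array([[(pi @ T[a] @ T[b]).sum() for b in T] for a in T])
E = entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint)
print(round(h_mu, 4), round(E, 4))
```

No limits are taken: h_μ comes out as 2/3 bit and E as about 0.2516 bits, straight from finite sums over causal states.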
My favorite paper illustrating this point is "Exact complexity", but for the more adventurous, I self-aggrandizingly recommend four of my own papers: "Predictive rate-distortion of infinite-order Markov processes", "Signatures of Infinity", "Statistical Signatures of Structural Organization", and the hopefully-soon-to-be-published "Structure and Randomness of Continuous-Time Discrete-Event Processes".
And finally, here's a copy of the talk I was scheduled to give at APS (I missed it due to sickness); it covers the corollary in "Predictive rate-distortion of infinite-order Markov processes".
Finding these causal states can be difficult, but this seems to be the best algorithm out there.