I am reasonably sure that the sequential estimator replaces the parameter, so x1 is used in every estimator, x2 in all but the first one, etc.

Are you sure? As I understand it, it is more HMM-like (an mth-order HMM on a sequence of length n, where n >> m), so on a long sequence this "x1 is used n times, x2 n-1 times, and so on" is not really true. Right?
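
To make the two readings concrete, here is a rough sketch that counts how many estimators each observation enters. It assumes that estimator t is fitted on the prefix x1..xt in the first reading, and on only the last m observations in the HMM-like reading; the function names and the windowing rule are my own illustration, not something taken from a reference.

```python
def usage_counts_prefix(n):
    """Reading 1 (assumed): estimator t is fitted on x_1..x_t for t = 1..n,
    so x_i enters estimators i..n, that is n - i + 1 of them."""
    return [n - i + 1 for i in range(1, n + 1)]


def usage_counts_window(n, m):
    """Reading 2 (assumed): estimator t only sees the last m observations,
    x_{max(1, t-m+1)}..x_t, as in an order-m model."""
    counts = [0] * n
    for t in range(1, n + 1):
        for i in range(max(1, t - m + 1), t + 1):
            counts[i - 1] += 1
    return counts


if __name__ == "__main__":
    print(usage_counts_prefix(5))     # [5, 4, 3, 2, 1]
    print(usage_counts_window(5, 2))  # [2, 2, 2, 2, 1]
```

Under the windowed reading no observation is used more than m times, however long the sequence is, which is the point of the objection above.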

According to information theory and NML theory, when one has only a single observation, that observation (that is, the message) should not be compressed and should be sent in the clear (in other words, no statistics or information theory is really needed in this case)!
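
As a small check of this in the simplest case I can think of, the sketch below computes the NML code length for a Bernoulli model. The choice of the Bernoulli model and the function name are assumptions made only for illustration. With a single binary observation the code length comes out as exactly 1 bit, the same as sending the bit uncompressed.

```python
from math import comb, log2


def bernoulli_nml_codelength(k, n):
    """NML code length in bits for a binary sequence of length n with k ones,
    under a Bernoulli model (assumed here purely for illustration)."""

    def max_likelihood(j):
        # Likelihood of a sequence with j ones at its own ML estimate j/n.
        if j == 0 or j == n:
            return 1.0
        p = j / n
        return p**j * (1 - p) ** (n - j)

    # Normalizer: sum of maximized likelihoods over all 2^n sequences,
    # grouped by their number of ones.
    normalizer = sum(comb(n, j) * max_likelihood(j) for j in range(n + 1))
    return -log2(max_likelihood(k) / normalizer)


if __name__ == "__main__":
    print(bernoulli_nml_codelength(0, 1))  # 1.0
    print(bernoulli_nml_codelength(1, 1))  # 1.0
```

For n = 1 both outcomes have maximized likelihood 1, so the normalizer is 2 and each outcome gets probability 1/2.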

