Finite-state dimension quantifies the asymptotic rate of information in an infinite sequence as perceived by finite automata. For a fixed alphabet, the infinite sequences of maximal finite-state dimension are exactly those that are Borel normal, i.e., those in which all words of any given length appear with the same asymptotic frequency. A theorem of Schnorr and Stimm (1972) shows that a real number is Borel normal if and only if, for every irreducible finite-state Markov chain with fair transitions, when the chain is simulated using the binary expansion of the number, the empirical distribution of the visited states converges to the chain's stationary distribution. In this paper we extend this correspondence beyond normal numbers. We show that the finite-state dimension of a sequence can be characterized in terms of the conditional Kullback-Leibler divergence between the limiting distributions arising from the simulation of Markov chains using the given sequence and their stationary distributions. This provides a new information-theoretic characterization of finite-state dimension which generalizes the Schnorr-Stimm result. As an application, we prove a generalization of Agafonov's theorem for normal numbers. Agafonov's theorem states that a sequence is normal if and only if every subsequence selected by a finite automaton is also normal. We extend this to arbitrary sequences by establishing a tight quantitative relationship between the finite-state dimension of a sequence and the finite-state dimensions of its automatic subsequences.
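To illustrate the simulation described above, here is a minimal sketch (not from the paper) of driving a small Markov chain with the digits of a concrete normal sequence. It uses the binary Champernowne sequence (known to be normal in base 2) and a 3-state chain in which state s moves to (2s + bit) mod 3, so each state reads 0 or 1 with "fair" probability 1/2; this chain is irreducible and doubly stochastic, hence its stationary distribution is uniform. The function names and the particular chain are illustrative choices, not constructions from the paper.

```python
from collections import Counter

def champernowne_bits(n_ints):
    """Binary Champernowne sequence: binary expansions of 1, 2, ..., n_ints concatenated."""
    return "".join(format(k, "b") for k in range(1, n_ints + 1))

def simulate(bits):
    """Drive the 3-state chain s -> (2s + bit) mod 3 with the given bit string.

    Each state has two outgoing 'fair' transitions (one per symbol), and the
    resulting transition matrix is doubly stochastic and irreducible, so the
    stationary distribution is uniform: (1/3, 1/3, 1/3).
    Returns the empirical distribution of visited states.
    """
    state, counts = 0, Counter()
    for b in bits:
        state = (2 * state + int(b)) % 3
        counts[state] += 1
    n = len(bits)
    return {s: counts[s] / n for s in range(3)}

# For a normal sequence, Schnorr-Stimm predicts convergence to the stationary
# distribution; on a long Champernowne prefix the empirical frequencies are
# close to 1/3 each.
empirical = simulate(champernowne_bits(50000))
```

For a non-normal input (e.g. the all-zeros sequence, which pins the chain at state 0) the empirical distribution stays far from uniform, which is the phenomenon the paper's KL-divergence characterization quantifies.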