We present a novel method for averaging a sequence of histogram states visited by a Metropolis-Hastings Markov chain whose stationary distribution is the posterior distribution over a dense space of tree-based histograms. The computational efficiency of our posterior mean histogram estimate relies on a statistical data-structure that is sufficient for non-parametric density estimation of massive, multi-dimensional metric data. This data-structure is formalized as statistical regular paving (SRP). A regular paving (RP) is a binary tree obtained by selectively bisecting boxes along their first widest side. SRP augments RP by mutably caching the recursively computable sufficient statistics of the data. The base Markov chain used to propose moves for the Metropolis-Hastings chain is a random walk that data-adaptively prunes and grows the SRP histogram tree. We use a prior distribution based on Catalan numbers and detect convergence heuristically. The L1-consistency of the the initializing strategy over SRP histograms using a data-driven randomized priority queue based on a generalized statistically equivalent blocks principle is proved by bounding the Vapnik-Chervonenkis shatter coefficients of the class of SRP histogram partitions. The performance of our posterior mean SRP histogram is empirically assessed for large sample sizes simulated from several multivariate distributions that belong to the space of SRP histograms.
The paper actually appeared in the special issue of TOMACS Arnaud Doucet and I edited last year. It is coauthored by Dominic Lee, Jennifer Harlow and Gloria Teng. Unfortunately, Raazesh could not connect to our video-projector. Or fortunately as he gave a blackboard talk that turned to be fairly intuitive and interactive.