This is a very good point (and not only for citing Yves, whom I came to know when we were together at CREST). There are many analogues to unbalanced survey sampling in simulation, from tempering to importance sampling, SMC, and Wang-Landau for MCMC.

These of course are extensions of the work in model-assisted survey sampling done elsewhere, but Professor Tillé has often expressed definite ideas on things, which I greatly respect and try to understand as best I can. In *Sampling Algorithms* he states, for instance:

One often says that a sample is representative if it is a reduced model of the population. Representativeness is then adduced as an argument of validity: a good sample must resemble the population of interest in such a way that some categories appear with the same proportions in the sample as in the population. This theory, currently spread by the media, is, however, erroneous. It is often more desirable to overrepresent some categories of the population or even to select units with unequal probabilities. The sample must not be a reduced model of the population; it is only a tool used to provide estimates.

Professor Tillé then goes on (in the opening pages of *Sampling Algorithms*) to illustrate with the task of estimating iron production in a country where production is dominated by two large companies. He argues that the two companies are what matter, and that the remainder of the industry should be included, but according to a sampling design.
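To make the iron example concrete, here is a minimal sketch in Python of one way such a design could look. It is my own illustration, not code or a design taken from the book: the largest units are taken with certainty, the rest are drawn by simple random sampling without replacement, and each sampled small unit is weighted by the inverse of its inclusion probability (a Horvitz-Thompson-style estimate of the total). The function name `estimate_total`, its parameters, and the made-up production figures are all hypothetical, and ranking firms by their true production is a simplification; in practice one would rank on an auxiliary size measure known in advance.

```python
import random


def estimate_total(production, n_certain=2, n_sample=10, seed=0):
    """Estimate a population total with a take-all stratum plus a sample.

    Hypothetical helper for illustration only. The n_certain largest units
    are included with probability 1; the remaining units are sampled by
    simple random sampling without replacement and weighted by 1 / pi.
    """
    rng = random.Random(seed)
    # Assumption: sizes are known well enough to identify the dominant units.
    ranked = sorted(production, reverse=True)
    take_all, rest = ranked[:n_certain], ranked[n_certain:]

    n = min(n_sample, len(rest))
    if n == 0:
        return float(sum(take_all))

    sampled = rng.sample(rest, n)
    pi = n / len(rest)  # inclusion probability of each small unit
    return sum(take_all) + sum(y / pi for y in sampled)


if __name__ == "__main__":
    # Made-up figures: two dominant firms and fifty small ones.
    firms = [5000, 4000] + list(range(1, 51))
    print("true total:", sum(firms))
    print("estimate  :", estimate_total(firms))
```

The sample here deliberately does not resemble the population: the two large firms are always in it, and the small firms are underrepresented and reweighted. All of the sampling error comes from the small firms, which contribute little to the total.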

The relevance, as I see it, to MCMC and the like is that sampling ought to serve the estimator being sought. While this breaks with the standardization that makes MCMC attractive, I can see that if the estimator is a lens through which to view the population, it may imply things about how the sampling should be done.
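A simulation-side illustration of the same idea, again only a sketch under my own assumptions: suppose the quantity of interest is the tail probability P(X > 4) for a standard normal X. Drawing from the target itself is "representative" but almost never lands in the tail, whereas importance sampling from a proposal centred on the threshold oversamples the region the estimator cares about and corrects with weights. The function name `tail_prob_importance`, the choice of a N(threshold, 1) proposal, and the sample size are hypothetical choices made for this example.

```python
import math
import random


def tail_prob_importance(threshold=4.0, n=20000, seed=0):
    """Importance-sampling estimate of P(X > threshold) for X ~ N(0, 1).

    Hypothetical helper for illustration only. Draws come from the proposal
    N(threshold, 1); each draw in the tail is weighted by the density ratio
    target / proposal = exp(-threshold * x + threshold**2 / 2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)  # proposal centred on the tail
        if x > threshold:
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n


if __name__ == "__main__":
    exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))
    print("exact     :", exact)
    print("importance:", tail_prob_importance())
```

The proposal is a poor picture of the standard normal as a whole, which is the point: it is tuned to one estimator, and would serve badly for estimating, say, the mean.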

Of course, practical matters intervene. If generating the MCMC output took days and a scholar isn’t sure that this particular estimator is the only one of interest, it may be better to choose the sample so that it supports a broader class of estimators and serves the wider field.
