Monte Carlo simulation and resampling methods for social science [book review]

Monte Carlo simulation and resampling methods for social science is a short paperback by Thomas Carsey and Jeffrey Harden on the use of Monte Carlo simulation to evaluate the adequacy of a model and the impact of the assumptions behind this model. I picked it up at the library the other day and browsed through the chapters during one of my métro rides. Definitely not an in-depth reading, so be warned before reading this [telegraphic] review!

Overall, I think the book does a good job of advocating the use of simulation to evaluate the pros and cons of a given model (rephrased as a data generating process) when faced with data. And of doing it in R. After some rudiments of probability theory and R programming, it briefly explains how to use the built-in random generators, though not how to handle new distributions, and then spends a large part of the book on simulation around generalised and regular linear models. For instance, in the linear model, the authors test the impact of heteroscedasticity, multicollinearity, measurement error, omitted variable(s), serial correlation, clustered data, and heavy-tailed errors. While this is a perfect way of exploring those semi-hidden hypotheses behind the linear model, I wonder about the impact of this exploration on students. On the one hand, they will perceive the importance of those assumptions and hopefully remember them. On the other hand, and this is a very recurrent criticism of mine, this implies a lot of maturity from the students, i.e., they have to distinguish the data, the model [maybe] behind the data, the finite if large number of hypotheses one can test, and the interpretation of the outcome of a simulation test… Given that they were introduced to basic probability just a few chapters earlier, this expectation [from the students] may prove unrealistic. (And a similar criticism applies to the following chapters, from GLM to jackknife and bootstrap.)
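To make this kind of exercise concrete, here is a minimal sketch of one such experiment, written in plain Python rather than the book's R and not taken from the book. It simulates a simple linear regression many times and records how often the classical 95% confidence interval for the slope covers the true value, once with constant error variance and once with an error standard deviation equal to |x|. The function names, sample sizes, the choice of heteroscedasticity, and the normal critical value 1.96 are all my own assumptions for illustration.

```python
import math
import random


def ols_slope_and_se(x, y):
    """Return the OLS slope and its classical (constant-variance) standard error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance estimate with n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)
    return slope, math.sqrt(sigma2_hat / sxx)


def coverage(heteroscedastic, n_sims=1000, n=100, true_slope=2.0, seed=1):
    """Share of simulated datasets whose nominal 95% interval covers the true slope.

    Hypothetical data generating process: x ~ N(0, 1), y = 1 + true_slope * x + e,
    with sd(e) = 1 (homoscedastic) or sd(e) = |x| (heteroscedastic).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = []
        for xi in x:
            sd = abs(xi) if heteroscedastic else 1.0
            y.append(1.0 + true_slope * xi + rng.gauss(0.0, sd))
        slope, se = ols_slope_and_se(x, y)
        # Normal approximation to the interval; an assumption made for simplicity.
        if abs(slope - true_slope) <= 1.96 * se:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    print("homoscedastic coverage:  ", coverage(False))
    print("heteroscedastic coverage:", coverage(True))
```

Under this particular design, the classical standard error understates the variability of the slope when the error variance grows with |x|, so the second coverage figure should fall noticeably below the nominal 95%. Reading that gap correctly already requires keeping apart the data, the assumed model, and the simulated "truth", which is precisely the maturity issue raised above.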

At the end of the book, the authors ask how a reader could use the information in this book in their own work. They draft a generic protocol for this reader, who is supposed to consider “alterations to the data generating process” (p.272) and to “identify a possible problem or assumption violation” (p.271). This requires a readership “who has some training in quantitative methods” (p.1). And then some more. But I definitely sympathise with the goal of confronting models and theory with the harsh reality of simulation output!

2 Responses to “Monte Carlo simulation and resampling methods for social science [book review]”

  1. I have lost my copy of this book, now requested by the library! If anyone has my copy, thanks for returning it.

  2. > very recurrent criticism of mine, this implies a lot of maturity from the students

    I agree and have speculated that they have to realise that they need to abstractly represent the empirical question and work with that abstraction (in an essentially error-free way), without being able to anticipate how what is implied by working with that abstraction does not quite line up with what happened/happens empirically – there are always unexpected and delayed surprises (which never end).

    Maybe the essentially error-free ways of working with abstractions fool them into thinking nothing should go wrong (the infamous “I did the stats correctly so the effect must be there”).

    How do folks learn that? Where does that emerge from training in quantitative methods?

    A contrast I ran into was people with quantitative training getting the animation below and those with little training being completely baffled as to why there were two models. The explanation that Nature’s machine represents the _reality_ which only _God_ gets to see, and the Analyst’s machine represents our best representation/guess of that, was not helpful.

    Click to access animation3.pdf

    (Might need to download and open in Adobe to see it properly)
