## cut, baby, cut!

Posted in Books, Kids, Mountains, R, Statistics, University life on January 29, 2014 by xi'an

At MCMSki IV, I attended (and chaired) a session where Martyn Plummer presented some developments on cut models. As I was not sure I had gotten the idea [although this happened to be one of those few sessions where the flu had not yet completely taken over!] and as I wanted to check a potential explanation for the lack of convergence discussed by Martyn during his talk, I decided to (re)present the talk at our “MCMSki decompression” seminar at CREST. Martyn sent me his slides and also kindly pointed me to the relevant section of the BUGS book, reproduced above. (Disclaimer: do not get me wrong here, the title is a pun on the infamous “drill, baby, drill!” and is not connected in any way to Martyn’s talk or work!)

I cannot say the idea gets any clearer from this short explanation in the BUGS book, although it does give a literal meaning to the word “cut”. From this description I only understand that a cut is the removal of an edge in a probabilistic graph; there must then be some arbitrariness in how the resulting (wrong) conditional distribution is built. In the Poisson-binomial case treated in Martyn’s talk, I interpret the cut as simulating from

$\pi(\phi|z)\pi(\theta|\phi,y)=\dfrac{\pi(\phi)f(z|\phi)}{m(z)}\dfrac{\pi(\theta|\phi)f(y|\theta,\phi)}{m(y|\phi)}$

instead of the full posterior

$\pi(\phi|z,y)\pi(\theta|\phi,y)\propto\pi(\phi)f(z|\phi)\pi(\theta|\phi)f(y|\theta,\phi)$

hence losing some of the information about φ… Now, this cut version is a function of φ and θ that can be fed to a Metropolis-Hastings algorithm, assuming we can handle the posterior on φ and the conditional on θ given φ. If we build a Gibbs sampler instead, we face a difficulty with the normalising constant m(y|φ), which depends on φ: said Gibbs sampler thus does not generate from the “cut” target. Maybe an alternative could be found by borrowing from the rather large if disparate missing-constant toolbox. (In any case, we do not simulate from the original joint distribution.) The natural solution would then be to make an independent proposal on φ with target the posterior given z, followed by any scheme that preserves the conditional of θ given φ and y; “any” is rather wishful thinking at this stage, since the only practical solution I see is to run a Metropolis-Hastings sampler long enough to “reach” stationarity… I am also left with a lingering, although not life-threatening, question of whether or not BUGS code using cut distributions provides the “right” answer. Here are my five slides used during the seminar (with a random walk implementation that did not diverge from the true target…):
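To make the two-stage interpretation above concrete, here is a minimal sketch in Python of sampling from the cut target π(φ|z)π(θ|φ,y) in a toy conjugate model of my own choosing (it is not Martyn’s actual example): φ has a Gamma prior with z|φ Poisson, and θ|φ is Beta(φ,1) with y|θ Binomial. With conjugacy, each stage is a direct draw, which is exactly why the “cut” is so tempting in BUGS-style software.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate model (my own assumption, not the talk's example):
#   phi ~ Gamma(a0, b0),        z | phi   ~ Poisson(phi)
#   theta | phi ~ Beta(phi, 1), y | theta ~ Binomial(n, theta)
a0, b0 = 2.0, 1.0
z, y, n = 7, 3, 10

def cut_sampler(n_draws):
    # Stage 1: phi from pi(phi | z) only -- the cut removes the edge
    # from y back to phi, so y carries no information about phi here.
    phi = rng.gamma(shape=a0 + z, scale=1.0 / (b0 + 1.0), size=n_draws)
    # Stage 2: theta from the conditional pi(theta | phi, y),
    # which is Beta(phi + y, 1 + n - y) by conjugacy.
    theta = rng.beta(phi + y, 1.0 + n - y)
    return phi, theta

phi, theta = cut_sampler(50_000)
```

Note the φ draws have posterior mean (a0+z)/(b0+1) = 4.5 regardless of y; a sampler for the full joint posterior would shift this, which is precisely the information the cut discards.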

## MCMSki 4, 5… [rejuvenating suggestion]

Posted in Kids, Mountains, pictures, Statistics, Travel, University life on January 16, 2014 by xi'an

Another thing I should have included in the program, or in the organising committee: a link with the Young Bayesians (j-ISBA) section… As pointed out to me by Kerrie Mengersen, ISBA meetings are obvious opportunities for young researchers to interact and network, as well as to seek a job. Thus, there should be time slots dedicated to them in every ISBA sponsored meeting, from a mixer on the very first day to a job market coffee break the next day (and to any other social activity bound to increase interactivity. Like a ski race.). So I would suggest every ISBA sponsored event (and not only the Bayesian Young Statisticians Meetings!) should include a j-ISBA representative in its committee(s) to enforce this policy… (Kerrie also suggested random permutations during the banquet, which is a neat idea provided the restaurant structure allows for it. It would have been total chaos in La Calèche last week!)

## MCMSki IV, Jan. 6-8, 2014, Chamonix (news #18)

Posted in Mountains, R, Statistics, University life on January 6, 2014 by xi'an

MCMSki IV is about to start! While further participants may still register (registration remains open!), we currently have 223 registered participants, not counting accompanying persons. I do hope most of them managed to reach the town of Chamonix-Mont-Blanc despite the foul weather on the East Coast. Unfortunately, four speakers (so far) cannot make it: Yuguo Chen (Urbana-Champaign), David Hunter (Penn State), Georgios Karagiannis (Toronto), and Liam Paninski (New York). Nial Friel will replace David Hunter and give a talk on noisy MCMC.

First, the posters for tonight’s session (authors A to K) should be posted today (before dinner) on the boards at the end of the main lecture theatre, and removed tonight as well. Check my wordpress blog for the abstracts. (When I mentioned there was no deadline for sending abstracts, I did not expect to receive one last Friday!)

Second, I remind potential skiers that the most manageable option is to ski the Brévent domain, uphill from the conference centre. There is even a small rental place facing the cable-car station (make sure to phone +33450535264 to check they still have skis available) that also rents storage closets…

## MCMSki IV, Jan. 6-8, 2014, Chamonix (news #17)

Posted in Mountains, R, Statistics, University life on January 3, 2014 by xi'an

We are a few days from the start, here are the latest items of information for the participants:

The shuttle transfer on January 5th from Geneva Airport to Chamonix takes 1 hour 30 minutes. On arrival at the airport, follow the “Swiss Exit”. After customs, the bus driver (holding a sign “MCMC’Ski Chamonix”) will be waiting for you at the Meeting Point in the Arrival Hall. The driver will arrive 10 minutes before the meeting time and will check each participant off his or her list. There may be delays in case of poor weather. The bus will drop you in front of or close to your hotel. If you miss the bus you initially booked, you can take the next one. If you miss the last transfer, taking a taxi will be the only solution (warning: about 250 Euros!!!)

Registration will start on Monday January 6th at 8am; the conference will start at 8.45am. The conference will take place at the Majestic Congress Center, located at 241 Allée du Majestic, in downtown Chamonix. There are signs all over town pointing to the Majestic Congrès. (No skiing equipment, i.e., skis, boots, or boards, is allowed inside the building.) Speakers are advised to check with their session chair in advance about downloading their talk.

The Richard Tweedie ski race should take place on Wednesday at 1pm, weather and snow permitting. There will be a registration line at the registration desk. (The cost is 10€ per person and does not include lift passes or equipment.) Thanks to Antonietta Mira, there will be two pairs of skis to be won!

## parallel MCMC via Weierstrass sampler (a reply by Xiangyu Wang)

Posted in Books, Statistics, University life on January 3, 2014 by xi'an

Almost immediately after I published my comments on his paper with David Dunson, Xiangyu Wang sent a long comment that I think is worth a post of its own (especially given that I am now busy skiing and enjoying Chamonix!). So here it is:

Thanks for the thoughtful comments. I did not realize that Neiswanger et al. had also proposed a trick similar to ours to avoid the combinatorial problem in the rejection sampler. Thank you for pointing that out.

Regarding criticism 3 on the tail degeneration, we did not mean to fire at non-parametric estimation issues, but rather at the problem caused by using the product equation. When two densities are multiplied together, the accuracy of the product mainly depends on the tails of the two densities (the overlapping area); when there are more than two densities, the impact is even more significant. As a result, it may be unwise to use the product equation directly, as the most distant sub-posteriors could potentially be very far away from each other, with most of the sub-posterior draws falling outside the overlapping area. (The full Gibbs sampler formulated in our paper does not have this issue: as shown in equation 5, there is a common part multiplying each sub-posterior, which brings them close together.)
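The overlap issue above can be illustrated with a minimal Gaussian sketch (an illustrative stand-in of my own, not the paper’s model): the normalised product of two equal-variance Gaussian sub-posteriors is again Gaussian, centred between them with halved variance, so when the sub-posterior means are far apart almost none of the sub-posterior draws land where the product puts its mass.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two Gaussian "sub-posteriors" N(m1, s^2) and N(m2, s^2), chosen
# deliberately far apart relative to their common scale s.
m1, m2, s = -3.0, 3.0, 1.0

# The normalised product of the two densities is Gaussian with
# precision-weighted mean and halved variance.
prod_mean = (m1 + m2) / 2.0
prod_sd = s / np.sqrt(2.0)

# Fraction of draws from the first sub-posterior that fall within
# two product-standard-deviations of the product mean: tiny, because
# the product lives in the (almost empty) overlap of the two tails.
draws1 = rng.normal(m1, s, 100_000)
frac_overlap = np.mean(np.abs(draws1 - prod_mean) < 2 * prod_sd)
```

Here `frac_overlap` is only a few percent: a product-based combination must be reconstructed almost entirely from rare tail draws, which is the degeneration being described.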

Point 4 stated the problem caused by averaging. The approximate density following Neiswanger et al. (2013) is a mixture of Gaussians whose component means are averages of the sub-posterior draws. Therefore, if the sub-posteriors stick to different modes (assuming the true posterior is multimodal), the approximate density is likely to mix up the modes and produce spurious ones (e.g., averages of the true modes; we provide an example in Simulation 3).

Sorry for the vague description of the refining method (4.2). The idea is rather plain: we start from an initial approximation to θ, then perform a one-step Gibbs update to obtain a new θ. We call this procedure ‘refining’, as we believe it brings the original approximation closer to the true posterior distribution.

The first (4.1) and second (4.2) algorithms may indeed seem odd to call ‘parallel’, since they are both modified from the Gibbs sampler described in (4) and (5). We proposed these two algorithms to overcome two problems: first, the curse of dimensionality, and second, the situation where the subset inferences are not very accurate (small subset effective sample size), which might be a common scenario for logistic regression (with many parameters) even on a huge data set. Algorithms (4.1) and (4.2) both start from an initial approximation and attempt to improve on it, thus avoiding the dimensionality issue. Moreover, in our Simulation 1, we deliberately degrade the performance of simple averaging by worsening the sub-posterior performance (allocating less data to each subset); the non-parametric method then also fails to approximate the combined density, whereas algorithms 4.1 and 4.2 still work in this case.

I have some problems with the logistic regression example provided in Neiswanger et al. (2013). As shown in the paper, under the authors’ setting (not fully specified in the paper), though the non-parametric method is better than simple averaging, the approximation error of simple averaging is small enough for practical use (I also have some problems with their error evaluation method), so why should one bother with a much more complicated method?

Actually, I am adding a new algorithm to the Weierstrass rejection sampler, which will render it thoroughly free from the curse of dimensionality in p. The new scheme is applicable to the non-parametric method of Neiswanger et al. (2013) as well. It should appear soon in the second version of the draft.