Archive for Mondrian forests

AISTATS 2016 [#2]

Posted in Kids, pictures, Running, Statistics, Travel, University life, Wines on May 13, 2016 by xi'an

The second and third days of AISTATS 2016 passed like a blur, with not even the opportunity to write my impressions in real time! Maybe long tapas breaks are mostly to blame for this… In any case, we had two further exciting plenary talks, one about privacy-preserving data analysis by Kamalika Chaudhuri and one about crowdsourcing and machine learning by Adam Tauman Kalai. Kamalika’s talk covered recent results by her and coauthors on optimal privacy preservation in classification and a generalisation to correlated data, with the neat notion of a Markov Quilt. Other talks that same day also dwelt on this privacy issue, but I could not be . The talk by Adam was full of fun illustrations on humans training learning systems (with the unsolved difficulty of those humans deliberately mis-training the system, as exhibited recently by the short-lived Microsoft Tay experiment).

Both poster sessions were equally exciting, with the addition of MLSS student posters on the final day. Among many, I particularly enjoyed Iain Murray’s pseudo-marginal slice sampling, David Duvenaud’s fairly intriguing use of early stopping for non-parametric inference, Garrett Bernstein’s work on aggregated Markov chains, Ye Wang’s scalable geometric density estimation [with a special bonus for his typo on the University of Turing, instead of Torino], Gemma Moran’s and Chengtao Li’s posters on determinantal processes, and Matej Balog’s Mondrian forests with a Laplace kernel [envisioning potential applications for ABC]. Again, just to mention a few…

The participants [incl. myself] also took one evening off to visit a sherry winery in Jerez, with a well-practiced spiel on the story of the company, with some buildings designed by Gustave Eiffel, and with a wine-tasting session. As I personally find this type of brandy too strong in alcohol, I am not a big fan of sherry but it was nonetheless an amusing trip! With no visible after-effects the next morning, since the audience was as large as usual for Adam’s talk [although I did not cross a machine-learning soul on my 6am run…]

In short, I very much enjoyed AISTATS 2016 and remain deeply impressed by the efficiency of the selection process and by the involvement of the actors of this selection, as mentioned earlier on the ‘Og. Kudos!

forest fires

Posted in Mountains, pictures, Travel on August 26, 2015 by xi'an

Wildfires are raging through the US West, with 33 currently burning in the Pacific Northwest, 29 in Northern California, and 18 in the northern Rockies, and with more surface burned so far this year than in any of the past ten years. Drought, hot weather, high lightning frequency, and a shortage of firefighters across the US are all contributing factors…

Washington State is particularly stricken and when we drove to the North Cascades from Mt. Rainier, we came across at least two fires, one near Twisp and the other one around Chelan… The visibility was quite poor, due to the amount of smoke, and, while the road was open, we saw many burned areas with residual fumaroles and even a minor bush fire that was apparently left to die out by itself. The numerous orchards around had been spared, presumably thanks to their irrigation system.

The owner of a small café and fruit stand on Highway 20 told us about her employee, who had taken the day off to protect her home, near Chelan, which had already burned down last year. Among 300 or so houses. Later on our drive north, the air cleared up, but we saw many instances of past fires, like the one near Hart’s Pass, which occurred in 2003 and where the forest has not yet regenerated. Wildfires have always been a reality in this area, witness the first US smokejumpers being based (in 1939) at Winthrop, in the Methow valley, but this does not make them less of an objective danger. (Which made me somewhat worried as we were staying in a remote wooded area with no Internet or phone coverage to hear about evacuation orders. And a single evacuation route through a forest…)

Even when crossing the fabulous North Cascades Highway to the West and Seattle-Tacoma airport, we saw further smoke clouds, like one near Goodall, after Lake Ross, with closed side roads and campgrounds.

And, when flying back on Wednesday, along the Canadian border, more fire fronts and smoke clouds were visible from the plane.

Little did we know then that the town of Winthrop, near which we stayed, was being evacuated at the time, that the North Cascades Highway was about to be closed, and that three firefighters had died in nearby Twisp… Kudos to all firefighters involved in those wildfires! (And a close call for us, as we would otherwise still be “stuck” there!)

parallelizing MCMC with random partition trees

Posted in Books, pictures, Statistics, University life on July 7, 2015 by xi'an

Another arXived paper in the recent series about big or tall data and how to deal with it by MCMC. Which pertains to the embarrassingly parallel category. As in the previously discussed paper, the authors (Xiangyu Wang, Fangjian Guo, Katherine Heller, and David Dunson) chose to break the prior itself into m bits… (An additional point from last week’s criticism is that, were an unbiased estimator of each term in the product available in an independent manner, the product of the estimators would be an unbiased estimator of the product.) In this approach, the kernel estimator of Neiswanger et al. is replaced with a random partition tree histogram. Which uses the same block partition across all terms in the product representation of the posterior. And hence ends up with a smaller number of terms in the approximation, since it does not explode with m. (They could have used Mondrian forests as well! However, I think their quantification of the regular kernel method cost as an O(Tm) approach does not account for Neiswanger et al.’s trick in exploiting the product of kernels…) The so-called tree estimate can be turned into a random forest by repeating the procedure several times and averaging. The simulation comparison runs in favour of the current method when compared with other consensus or non-parametric methods. Except in the final graph (Figure 5) which shows several methods achieving the same prediction accuracy against running time.
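To make the shared-partition idea concrete, here is a minimal one-dimensional sketch in Python. It is not the authors' algorithm: the cut points are plain uniform draws rather than a data-driven partition tree, and the function names (`random_partition`, `block_log_density`, `combine_once`, `forest_mean`), the additive smoothing, and the toy Gaussian sub-posteriors are all assumptions made for illustration. What it does show is why a common partition keeps the cost down: the product of m histograms defined on the same blocks is again a histogram on those blocks, so the number of terms does not grow with m. Averaging over several random partitions gives the "forest" version.

```python
import numpy as np


def random_partition(lo, hi, n_blocks, rng):
    """Split [lo, hi] into n_blocks intervals using sorted uniform cut points."""
    cuts = np.sort(rng.uniform(lo, hi, size=n_blocks - 1))
    return np.concatenate(([lo], cuts, [hi]))


def block_log_density(samples, edges, smooth=1.0):
    """Log of a histogram density on the given blocks.

    Additive smoothing (an assumption of this sketch) keeps empty blocks
    from producing log(0).
    """
    counts, _ = np.histogram(samples, bins=edges)
    probs = (counts + smooth) / (counts.sum() + smooth * len(counts))
    return np.log(probs) - np.log(np.diff(edges))


def combine_once(subsamples, edges):
    """Multiply the sub-posterior histograms block by block.

    Returns the normalised probability mass of each block under the product.
    """
    log_dens = sum(block_log_density(s, edges) for s in subsamples)
    log_mass = log_dens + np.log(np.diff(edges))
    log_mass -= log_mass.max()  # stabilise before exponentiating
    mass = np.exp(log_mass)
    return mass / mass.sum()


def forest_mean(subsamples, n_trees=50, n_blocks=30, seed=0):
    """Posterior-mean estimate averaged over several random partitions."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate(subsamples)
    lo, hi = pooled.min(), pooled.max()
    means = []
    for _ in range(n_trees):
        edges = random_partition(lo, hi, n_blocks, rng)
        mass = combine_once(subsamples, edges)
        mids = 0.5 * (edges[:-1] + edges[1:])
        means.append(np.sum(mass * mids))
    return float(np.mean(means))


if __name__ == "__main__":
    # Toy check: m = 4 Gaussian sub-posteriors with unit variance, whose
    # product is Gaussian with mean equal to the average of the four means.
    rng = np.random.default_rng(1)
    subs = [rng.normal(mu, 1.0, size=5000) for mu in (-0.5, 0.0, 0.5, 1.0)]
    print(forest_mean(subs))
```

Since the mean is linear in the density, averaging the per-partition means is the same as taking the mean under the averaged density estimate. In more than one dimension the blocks would be hyper-rectangles, which is where a tree (or a Mondrian process) comes in to build them.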