
ABC by classification

Posted in pictures, Statistics, Travel, University life on December 21, 2021 by xi'an

As a(nother) coincidence, we had a reading group discussion of the paper at Paris Dauphine yesterday, a few days after Veronika Rockova presented it in person in Oaxaca. The idea in ABC by classification, which she co-authored with Yuexi Wang and Tetsuya Kaji, is to use the empirical Kullback-Leibler divergence as a substitute for the intractable likelihood at the parameter value θ, in the generalised Bayes setting of Bissiri et al. Since this quantity is not available either, it is estimated as well, by a classification method that somehow relates to Geyer’s 1994 inverse logistic proposal, using the (ABC) pseudo-data generated from the model associated with θ. The convergence of the algorithm obviously depends on the choice of the discriminator used in practice. The paper also makes a connection with GANs as a potential alternative for the generalised Bayes representation. It mostly focuses on the frequentist validation of the ABC posterior, in the sense of exhibiting a posterior concentration rate in n, the sample size, while requiring performances of the discriminators that may prove hard to check in practice. This expands our 2018 result to this setting, with the tolerance decreasing more slowly than the Kullback-Leibler estimation error.
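
To make the classification step concrete, here is a minimal sketch of a Kullback-Leibler estimate obtained from a discriminator. It is not the authors' implementation: the Normal toy model, the quadratic feature map, the hand-rolled logistic regression, the sample size and the tolerance are all assumptions chosen for illustration. With equal sample sizes, the fitted log-odds of "observed versus pseudo-data" approximate the log density ratio, and their average over the observed sample estimates the divergence.

```python
import numpy as np

def sigmoid(z):
    # Clip the linear predictor to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def features(x):
    # Hypothetical feature map for a scalar sample: intercept, x, x^2.
    return np.column_stack([np.ones_like(x), x, x ** 2])

def fit_logistic(design, labels, n_iter=25, ridge=1e-6):
    """Newton iterations for a (very lightly ridge-penalised) logistic regression."""
    w = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = sigmoid(design @ w)
        grad = design.T @ (p - labels) + ridge * w
        hess = (design * (p * (1.0 - p))[:, None]).T @ design + ridge * np.eye(len(w))
        w = w - np.linalg.solve(hess, grad)
    return w

def kl_by_classification(observed, pseudo):
    """Average fitted log-odds over the observed sample (label 1 = observed)."""
    design = features(np.concatenate([observed, pseudo]))
    labels = np.concatenate([np.ones(len(observed)), np.zeros(len(pseudo))])
    w = fit_logistic(design, labels)
    return float(np.mean(features(observed) @ w))

rng = np.random.default_rng(0)
n = 2000
observed = rng.normal(0.0, 1.0, size=n)  # toy "data" from N(0, 1)

def estimated_kl(theta):
    # Pseudo-data simulated from the toy model N(theta, 1).
    pseudo = rng.normal(theta, 1.0, size=n)
    return kl_by_classification(observed, pseudo)

kl_same = estimated_kl(0.0)  # true divergence is 0
kl_far = estimated_kl(2.0)   # true divergence is theta^2 / 2 = 2

# Using the estimate as an ABC distance with a (hypothetical) tolerance.
tolerance = 0.1
accept_same = kl_same <= tolerance
accept_far = kl_far <= tolerance
```

In this sketch the discriminator is refitted for every θ, since `estimated_kl` simulates fresh pseudo-data and retrains each time.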

Besides the shared appreciation that working with the Kullback-Leibler divergence was a nice and under-appreciated direction, one point that came out of our discussion is that using the (estimated) Kullback-Leibler divergence as a form of distance (paired with a tolerance) is less prone to variability (or more robust) than using the estimate directly (and without tolerance) as a substitute for the intractable likelihood, if we interpreted the discrepancy in Figure 3 properly. Another item was about the discriminator function itself: while a machine learning methodology such as neural networks could be used, albeit with unclear theoretical guarantees, it was unclear to us whether or not a new discriminator needed to be constructed for each value of the parameter θ, even when the simulations are run by a deterministic transform.

how a hiring quota failed [or not]

Posted in Books, Statistics, University life on February 26, 2019 by xi'an

This week, Nature has a “career news” section dedicated to how hiring quotas [may have] failed for French university hiring, based solely on a technical report by a Sciences Po Paris researcher. The hiring quota means that every hiring committee for a French public university must be made of at least 40% members of each gender (plus at least 50% of external members), a threshold that has been reduced to 30% in some severely imbalanced fields like mathematics. The main conclusion of the report is that the reform has had a negative impact on the hiring imbalance between men and women in French universities, with “the higher the share of women in a committee, the lower women are ranked” (p.2). As head of the hiring board in maths at Dauphine, which officiates as a secretarial committee for assembling all hiring committees, I was interested in the reasons for this perceived impact, as I had not observed it at my [first order remote] level. As a warning, the discussion that follows makes little sense without a prior glance at the paper.

“Deschamps estimated that without the reform, 21 men and 12 women would have been hired in the field of mathematics. But with the reform, committees whose membership met the quota hired 30 men and 3 women” Nature

Skipping the non-quantitative and somewhat ideological part of the report, as well as descriptive statistics, I looked mostly at the modelling behind the conclusions, as reported for instance in the above definite statement in Nature. This modelling starts with a collection of assumptions and simplifications. A first dubious assumption is that fields, and even less universities, where the 40% threshold was already met before the 2015 reform could be used as “control groups”, given the huge potential for confounders, especially the huge imbalance in female-to-male ratios across fields. Second, the data only covers hiring histories for three French universities (out of 63 total) over the years 2009-2018 and furthermore merges assistant professors (Maîtres de Conférences) and full professors, even though hiring for the latter is de facto much more involved, with often one candidate being contacted [prior to the official advertising of the position] by the department as an expression of interest (or the reverse). Third, the remark that

“there are no significant differences between the percentage of women who apply and those who are hired” (p.9)

seems to make the whole discussion moot… and to contradict both the conclusion and the above assertion! Fourth, the candidate’s qualification (or quality) is equated with the h-index, which is highly reductive and, once again, open to considerable biases in terms of seniority and of field, depending on the publication lag, on the percentage of publications in English versus the vernacular in the given field, and on the type of publications (from an average of 2.94 in business to 9.96 in physics). Fifth, the report equates academic connections [that may bias the ranking] with having the supervisor present in the hiring committee [which sounds like a clear conflict of interest] or the candidate applying in the [same] university that delivered his or her PhD. This misses a myriad of other connections that often lead committee members to influence the ranking by reporting facts from outside the application form.

“…controlling for field fixed effects and connections make the coefficient [of the percentage of women in the committee] statistically insignificant, though the point estimate remains high.” (p.17)

The models used by Pierre Deschamps are multivariate logit and probit regressions, where each jury attaches a utility to each of its candidates, made of a qualification term [for the position] and of a gender bias term that, most surprisingly, multiplies candidate gender and jury gender dummies. The qualification term is expressed as a [jury free] linear regression on covariates plus a jury fixed effect, plus an error distributed as a Gumbel extreme value variate, which leads to a closed-form likelihood [and this seems to be the only reason for picking this highly skewed distribution]. The probit model is used to model the probability that one candidate has a better utility than another. The main issue with this modelling is the agglomeration of independence assumptions, as (i) candidates (and those hired) are not independent, from being evaluated over several positions all at once, with earlier selections and rankings all public, to having to rank themselves all the positions where they are eligible, to possibly being co-authors of other candidates; (ii) juries are not independent either, as the limited pool of external members, esp. in gender-imbalanced fields, means that the same faculty often ends up in several juries at once and hence evaluates the same candidates as a result, plus decides on local ranking in connection with earlier rankings; (iii) several juries of the same university are not independent when this university may try to impose a certain, if unofficial, gender quota, a covariate obviously impossible to fill in. Plus, again, a single model across disciplines. A side but not solely technical remark is that among the covariates used to predict ranking or first position for a female candidate, the percentage of female candidates appears, while being exogenous. Again, using a univariate probit to predict the probability that a candidate is ranked first ignores the comparison between a dozen candidates, both male and female, operated by the jury.

Overall, I find little reason to give (significant) weight to the indicator that the president is a woman in the logistic regression and even less to believe that a better gender balance in the juries has led to a worse gender balance in the hirings. From one model to the next the coefficients change from being significant to non-significant and, again, I find the definition of the control group fairly crude and unsatisfactory, if only because juries move from one session to the next (and there is little reason to believe one field more gender biased than another, with everything else accounted for). For another, my own experience within hiring committees in Dauphine or elsewhere has never been one where the president strongly impacts the decision. If anything, the president is often more neutral (and never ever, in my own experience, makes use of the additional vote to break ties!)…
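
As a side illustration of why Gumbel errors buy a closed-form likelihood in the utility model described above, here is a minimal numerical check. The utilities, the number of candidates and the function names are hypothetical, and the ranking likelihood below is the textbook rank-ordered logit form, which I assume (without having checked) matches the report's specification. With i.i.d. standard Gumbel errors, the probability that a candidate comes first is a softmax of the deterministic utilities, and a full ranking has probability equal to a product of successive softmaxes.

```python
import itertools
import numpy as np

def first_rank_probabilities(v):
    """P(candidate j has the highest utility) under i.i.d. standard Gumbel errors."""
    w = np.exp(v - np.max(v))
    return w / w.sum()

def ranking_log_likelihood(v, ranking):
    """Log-probability of a full ranking (best to worst) as successive softmaxes."""
    ordered = np.asarray(v, dtype=float)[list(ranking)]
    total = 0.0
    for pos in range(len(ordered) - 1):
        rest = ordered[pos:]  # candidates not yet ranked
        m = rest.max()
        total += ordered[pos] - (m + np.log(np.exp(rest - m).sum()))
    return total

# Hypothetical deterministic utilities for four candidates.
v = np.array([1.0, 0.5, 0.0, -0.5])

# Monte Carlo check: add Gumbel noise and record who comes first.
rng = np.random.default_rng(1)
noise = rng.gumbel(loc=0.0, scale=1.0, size=(200_000, v.size))
winners = np.argmax(v + noise, axis=1)
simulated = np.bincount(winners, minlength=v.size) / len(winners)
closed_form = first_rank_probabilities(v)

# Sanity check: the ranking probabilities sum to one over all orderings.
total_prob = sum(
    np.exp(ranking_log_likelihood(v, perm))
    for perm in itertools.permutations(range(v.size))
)
```

Comparing `simulated` with `closed_form` shows the agreement up to Monte Carlo error. Note that this construction treats the errors as independent across candidates within a jury, which is precisely the kind of assumption questioned above.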
