Debiasing Importance Sampling

Removing the bias of self-normalized importance sampling
Published July 30, 2022

Update: Interested readers may want to check out reference [1] below.

The Sampling Officials, painted by Rembrandt in 1662.

This post is about (self-normalized) importance sampling, a method to approximate a probability distribution \(\pi\) by generating samples from another probability distribution \(q\) and weighting them. We assume that both distributions have probability density functions that can be evaluated pointwise up to multiplicative constants. Importance sampling goes as follows: sample \(X_1,\ldots,X_N\) independently from \(q\), compute weights \(w_n \propto \pi(X_n)/q(X_n)\), normalize them into \(W_n = w_n / \sum_{m=1}^N w_m\), and return the weighted approximation \(\hat{\pi} = \sum_{n=1}^N W_n \delta_{X_n}\), so that \(\sum_{n=1}^N W_n h(X_n)\) estimates the expectation of a test function \(h\) under \(\pi\).

Because the importance sampling approximation \(\hat{\pi}\) involves a ratio, with weights appearing in both the numerator and the denominator, multiplying the weights by any constant leaves it unchanged, hence the “self-normalized” qualifier.
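As a concrete illustration (a minimal sketch of my own, not the notebook's code; the target \(\pi = \mathcal{N}(1,1)\), known only up to a constant, and the proposal \(q = \mathcal{N}(0,4)\) are toy choices made here), self-normalized importance sampling is a few lines of NumPy, and rescaling the weights by any constant leaves the estimate unchanged:

```python
import numpy as np

def snis_estimate(x, log_w):
    """Self-normalized importance sampling estimate of E_pi[X]."""
    w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
    return np.sum(w * x) / np.sum(w)

rng = np.random.default_rng(0)
N = 10_000
# Proposal q = N(0, 2^2); target pi = N(1, 1), known only up to a constant.
x = rng.normal(0.0, 2.0, size=N)
# Log unnormalized weight: log pi(x) - log q(x), dropping additive constants.
log_w = -0.5 * (x - 1.0) ** 2 + 0.5 * (x / 2.0) ** 2

est = snis_estimate(x, log_w)
# Multiplying all weights by a constant (here 7.3) does not change the result:
est_rescaled = snis_estimate(x, log_w + np.log(7.3))
```

With \(N = 10{,}000\) samples, `est` lands close to \(\mathbb{E}_\pi[X] = 1\), and `est_rescaled` matches it, since the constant cancels in the ratio.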

By the law of large numbers, the central limit theorem and the delta method, one can show that importance sampling yields consistent estimates, converging at the usual \(\sqrt{N}\) rate; see Art Owen’s chapter on importance sampling. On the other hand, it is well known that importance sampling yields biased estimators for any finite \(N\), simply because the expectation of a ratio is not the ratio of the expectations.
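That bias is easy to exhibit numerically. In the sketch below (my own toy setup, not taken from the post: target \(\pi = \mathcal{N}(1,1)\), proposal \(q = \mathcal{N}(0,4)\), estimating \(\mathbb{E}_\pi[X] = 1\)), averaging many independent replications isolates the systematic error, which is visible at small \(N\) and shrinks as \(N\) grows:

```python
import numpy as np

def snis_batch(rng, n_reps, N):
    """n_reps independent SNIS estimates of E_pi[X], each using N samples."""
    x = rng.normal(0.0, 2.0, size=(n_reps, N))            # proposal q = N(0, 4)
    log_w = -0.5 * (x - 1.0) ** 2 + 0.5 * (x / 2.0) ** 2  # target pi = N(1, 1)
    w = np.exp(log_w - log_w.max(axis=1, keepdims=True))  # row-wise stabilization
    return np.sum(w * x, axis=1) / np.sum(w, axis=1)

rng = np.random.default_rng(1)
# Averaging over replications estimates the bias E[estimate] - 1.
bias_small = np.mean(snis_batch(rng, 200_000, 2)) - 1.0   # N = 2: systematic error
bias_large = np.mean(snis_batch(rng, 10_000, 200)) - 1.0  # N = 200: much smaller
```

The Monte Carlo error of these bias estimates is of order \(10^{-3}\), small enough to separate the two regimes.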

It turns out that there are generic ways to remove that bias. The idea is to embed importance sampling in particle MCMC [2], and then to apply the debiasing technique pioneered in [3]. The combination yields a simple algorithm, applicable whenever importance sampling is, that returns unbiased estimators of expectations with respect to \(\pi\) for any finite \(N\). This was noted in [4]. That paper deals with more general “particle filters”, but the method applies just as well to plain importance sampling, or to variants such as annealed importance sampling. More on this, with self-contained experiments, in the notebook linked below: in particular, we observe that as \(N\to \infty\), the bias of importance sampling can be removed at the cost of a vanishing increase in variance.

You can find experiments in this notebook: https://github.com/pierrejacob/blog-code/blob/main/debiasingis.ipynb
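To make the recipe concrete, here is a minimal sketch (my own illustration, not the notebook's code) of the combination: importance sampling wrapped in particle independent Metropolis–Hastings [2,4], debiased with a lag-one coupling in the spirit of Glynn and Rhee [3]. Each PIMH proposal is a fresh batch of \(N\) weighted samples, accepted with probability \(\min(1, \hat{Z}'/\hat{Z})\) where \(\hat{Z}\) is the average weight. Two chains share proposals and uniforms, so they meet as soon as both accept the same proposal, after which the telescoping corrections vanish. The target \(\pi = \mathcal{N}(1,1)\) and proposal \(q = \mathcal{N}(0,4)\) are toy choices made here, with \(\mathbb{E}_\pi[X] = 1\).

```python
import numpy as np

def is_step(rng, N):
    """One importance sampling run: SNIS estimate of E_pi[X] and average weight.
    Target pi = N(1, 1) up to a constant, proposal q = N(0, 2^2)."""
    x = rng.normal(0.0, 2.0, size=N)
    w = np.exp(-0.5 * (x - 1.0) ** 2 + 0.5 * (x / 2.0) ** 2)
    return np.sum(w * x) / np.sum(w), np.mean(w)

def unbiased_estimate(rng, N, max_iter=10_000):
    """Rhee-Glynn estimator built from two lag-one coupled PIMH chains."""
    hX, zX = is_step(rng, N)   # chain X at time 0
    hY, zY = is_step(rng, N)   # chain Y at time 0, independent of X
    est = hX                   # start from h(X_0)
    # Advance X one step so the two chains are lagged by one.
    hP, zP = is_step(rng, N)
    if rng.uniform() < zP / zX:
        hX, zX = hP, zP
    for _ in range(max_iter):
        if hX == hY and zX == zY:   # chains have met: later corrections are zero
            break
        est += hX - hY              # telescoping bias correction h(X_t) - h(Y_{t-1})
        # Coupled PIMH step: common proposal and common uniform for both chains.
        hP, zP = is_step(rng, N)
        u = rng.uniform()
        if u < zP / zX:
            hX, zX = hP, zP
        if u < zP / zY:
            hY, zY = hP, zP
    return est

rng = np.random.default_rng(42)
estimates = [unbiased_estimate(rng, N=100) for _ in range(1000)]
```

Averaging the replications recovers \(\mathbb{E}_\pi[X] = 1\) without the finite-\(N\) bias; since the weights are bounded in this example, the IMH chain is uniformly ergodic and the meeting time has light tails, so the estimator's variance stays under control.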

[1]
Deligiannidis, G., Jacob, P. E., Khribch, E. M. and Wang, G. (2025). On importance sampling and independent Metropolis-Hastings with an unbounded weight function. arXiv preprint arXiv:2411.09514v2.
[2]
Andrieu, C., Doucet, A. and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society Series B: Statistical Methodology 72 269–342.
[3]
Glynn, P. W. and Rhee, C.-H. (2014). Exact estimation for Markov chain equilibrium expectations. Journal of Applied Probability 51 377–89.
[4]
Middleton, L., Deligiannidis, G., Doucet, A. and Jacob, P. E. (2019). Unbiased Smoothing using Particle Independent Metropolis-Hastings. Proceedings of Machine Learning Research, PMLR 89 2378–87.