In statistics, Markov chain Monte Carlo (MCMC) methods comprise a class of algorithms for sampling from a probability distribution. By constructing a Markov chain that has the desired distribution as its equilibrium distribution, one can obtain a sample of the desired distribution by recording states from the chain after a number of steps. The more steps there are, the more closely the distribution of the sample matches the actual desired distribution.
Application domains
Markov chain Monte Carlo methods are primarily used for calculating numerical approximations of multidimensional integrals, for example in Bayesian statistics, computational physics, computational biology,^{[1]} and computational linguistics.^{[2]}^{[3]}
In Bayesian statistics, the recent development of Markov chain Monte Carlo methods has been a key step in making it possible to compute large hierarchical models that require integrations over hundreds or even thousands of unknown parameters.^{[4]}
In rare event sampling, they are also used for generating samples that gradually populate the rare failure region.
General explanation
Markov chain Monte Carlo methods create samples from a possibly multidimensional continuous random variable whose probability density is proportional to a known function. These samples can be used to evaluate an integral over that variable, such as its expected value or variance.
In practice, an ensemble of chains is generally developed, starting from a set of arbitrarily chosen points that are sufficiently distant from each other. These chains are stochastic processes of "walkers" that move around randomly according to an algorithm that looks for places with a reasonably high contribution to the integral to move into next, assigning those moves higher probabilities.
Random walk Monte Carlo methods are a kind of random simulation or Monte Carlo method. However, whereas the random samples of the integrand used in a conventional Monte Carlo integration are statistically independent, those used in Markov chain Monte Carlo methods are autocorrelated.
These algorithms create Markov chains whose equilibrium distribution is proportional to the given function.
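As a minimal illustration of how such samples are used (a sketch, not taken from this article's references; the AR(1) recursion below merely stands in for the output of any MCMC run), the sample mean and variance estimate the corresponding expectations, and because successive draws are autocorrelated the naive standard error must be inflated by the integrated autocorrelation time:

    import numpy as np

    def integrated_autocorr_time(x, max_lag=200):
        """Rough estimate of the integrated autocorrelation time of a 1-D chain."""
        x = np.asarray(x, dtype=float)
        x = x - x.mean()
        n = len(x)
        denom = np.dot(x, x)
        acf = np.array([np.dot(x[:n - k], x[k:]) / denom for k in range(max_lag)])
        # Truncate the sum at the first negative autocorrelation (a common heuristic).
        cut = np.argmax(acf < 0) if np.any(acf < 0) else max_lag
        return 1.0 + 2.0 * acf[1:cut].sum()

    # Stand-in for MCMC output: an AR(1) chain with known autocorrelation.
    rng = np.random.default_rng(1)
    samples = np.empty(50_000)
    samples[0] = 0.0
    for t in range(1, len(samples)):
        samples[t] = 0.9 * samples[t - 1] + rng.normal()

    mean = samples.mean()                    # estimates the expected value
    var = samples.var()                      # estimates the variance
    tau = integrated_autocorr_time(samples)  # about 19 for this chain
    ess = len(samples) / tau                 # effective sample size, far below 50,000
    print(f"mean={mean:.3f} var={var:.3f} tau={tau:.1f} "
          f"ESS={ess:.0f} std.err={np.sqrt(var / ess):.3f}")

The effective sample size, rather than the raw number of steps, governs the accuracy of the estimates, which is why the autocorrelation discussed below matters.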
Reducing correlation
While MCMC methods were created to address multidimensional problems better than simple Monte Carlo algorithms, when the number of dimensions rises they too tend to suffer the curse of dimensionality: the regions of higher probability tend to stretch and get lost in an increasing volume of space that contributes little to the desired integral. One way to address this problem could be shortening the steps of the walker, so that it does not continuously try to exit the highest-probability region, though this way the process would be highly autocorrelated and quite ineffective (i.e., many steps would be required for an accurate result). More sophisticated methods, such as those listed under Examples below, use various ways of reducing the autocorrelation while managing to keep the process in the regions that give a higher contribution to the integral. These algorithms usually rely on a more complicated theory and may be harder to implement, but they usually exhibit faster convergence (fewer steps are required).
Examples
Examples of random walk Monte Carlo methods include the following:
 Metropolis–Hastings algorithm: This method generates a Markov chain using a proposal density for new steps and a method for rejecting some of the proposed moves. It is actually a general framework that includes as special cases the very first and simplest MCMC method (the Metropolis algorithm) and many more recent alternatives listed below. (The first code sketch after this list illustrates it.)
 Gibbs sampling: This method requires all the conditional distributions of the target distribution to be sampled exactly. When drawing from the full-conditional distributions is not straightforward, other samplers-within-Gibbs are used (e.g., see ^{[5]}^{[6]}^{[7]}). Gibbs sampling is popular partly because it does not require any 'tuning'. (The second code sketch after this list illustrates it.)
 Metropolis-adjusted Langevin algorithm and other methods that rely on the gradient (and possibly second derivative) of the log target density to propose steps that are more likely to be in the direction of higher probability density.^{[8]}
 Slice sampling: This method depends on the principle that one can sample from a distribution by sampling uniformly from the region under the plot of its density function. It alternates uniform sampling in the vertical direction with uniform sampling from the horizontal 'slice' defined by the current vertical position. (The third code sketch after this list illustrates it.)
 Multiple-try Metropolis: This method is a variation of the Metropolis–Hastings algorithm that allows multiple trials at each point. By making it possible to take larger steps at each iteration, it helps address the curse of dimensionality.^{[9]}^{[10]}
 Reversible-jump: This method is a variant of the Metropolis–Hastings algorithm that allows proposals that change the dimensionality of the space.^{[11]} Markov chain Monte Carlo methods that change dimensionality have long been used in statistical physics applications, where for some problems a distribution that is a grand canonical ensemble is used (e.g., when the number of molecules in a box is variable). But the reversible-jump variant is useful when doing Markov chain Monte Carlo or Gibbs sampling over nonparametric Bayesian models such as those involving the Dirichlet process or Chinese restaurant process, where the number of mixing components/clusters/etc. is automatically inferred from the data.
 Hamiltonian (or Hybrid) Monte Carlo (HMC): Tries to avoid random walk behaviour by introducing an auxiliary momentum vector and implementing Hamiltonian dynamics, with the potential energy function given by the negative logarithm of the target density. The momentum samples are discarded after sampling. The end result of Hybrid Monte Carlo is that proposals move across the sample space in larger steps; they are therefore less correlated and converge to the target distribution more rapidly.
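As a concrete illustration of the Metropolis–Hastings entry above, here is a minimal random-walk Metropolis sampler for a one-dimensional target known only up to a normalizing constant; the bimodal target and the step size are arbitrary choices for the example:

    import numpy as np

    rng = np.random.default_rng(0)

    def log_target(x):
        # Unnormalized log-density; any function proportional to the target works.
        # Here: an equal-weight mixture of two Gaussians, chosen only for illustration.
        return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

    def metropolis(n_steps, x0=0.0, step=1.0):
        x = x0
        chain = np.empty(n_steps)
        for i in range(n_steps):
            proposal = x + rng.normal(scale=step)   # symmetric proposal density
            # Accept with probability min(1, pi(proposal) / pi(x)).
            if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
                x = proposal
            chain[i] = x
        return chain

    chain = metropolis(100_000)
    burned = chain[10_000:]   # discard burn-in before the chain nears equilibrium
    print(burned.mean(), burned.var())   # roughly 0 and 5 for this symmetric mixture

Because the Gaussian proposal is symmetric, the Hastings correction q(x|x′)/q(x′|x) cancels from the acceptance ratio; an asymmetric proposal would require including it.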
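Similarly, a minimal Gibbs sampler for a bivariate normal target with correlation ρ, a standard textbook case in which both full-conditional distributions are univariate normals that can be sampled exactly (the value ρ = 0.8 is an illustrative assumption):

    import numpy as np

    rng = np.random.default_rng(0)
    rho = 0.8                      # correlation of the bivariate normal target
    sd = np.sqrt(1 - rho ** 2)     # standard deviation of each full conditional

    n = 50_000
    xs, ys = np.empty(n), np.empty(n)
    x, y = 0.0, 0.0
    for i in range(n):
        # Each coordinate is drawn exactly from its full-conditional distribution.
        x = rng.normal(rho * y, sd)   # X | Y=y ~ N(rho*y, 1 - rho^2)
        y = rng.normal(rho * x, sd)   # Y | X=x ~ N(rho*x, 1 - rho^2)
        xs[i], ys[i] = x, y

    print(np.corrcoef(xs, ys)[0, 1])  # should be close to rho = 0.8

Note that no tuning parameter appears anywhere: the proposal is fully determined by the conditionals, which is the property mentioned above.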
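And a minimal one-dimensional slice sampler using Neal's "stepping-out" and "shrinkage" procedures; the standard-normal-shaped target and the initial interval width w are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)

    def target(x):
        # Unnormalized density (illustrative choice): standard normal shape.
        return np.exp(-0.5 * x * x)

    def slice_sample(n_steps, x0=0.0, w=2.0):
        x = x0
        out = np.empty(n_steps)
        for i in range(n_steps):
            y = rng.uniform(0.0, target(x))   # vertical level under the density
            # Step out: grow an interval around x until both ends leave the slice.
            left = x - w * rng.uniform()
            right = left + w
            while target(left) > y:
                left -= w
            while target(right) > y:
                right += w
            # Shrinkage: sample uniformly, shrinking the interval on each rejection.
            while True:
                x_new = rng.uniform(left, right)
                if target(x_new) > y:
                    x = x_new
                    break
                if x_new < x:
                    left = x_new
                else:
                    right = x_new
            out[i] = x
        return out

    samples = slice_sample(20_000)
    print(samples.mean(), samples.var())   # approx. 0 and 1 for this target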
Training-based Markov chain Monte Carlo
Unlike most current Markov chain Monte Carlo methods, which ignore the previous trials, a training-based Markov chain Monte Carlo algorithm uses the previous steps of the chain to generate the next candidate. Such a training-based algorithm is able to speed up the Markov chain Monte Carlo sampler by an order of magnitude.^{[12]}
Interacting Markov chain Monte Carlo methodologies are a class of mean-field particle methods for obtaining random samples from a sequence of probability distributions with an increasing level of sampling complexity.^{[13]} These probabilistic models include path-space state models with increasing time horizon, posterior distributions with respect to a sequence of partial observations, increasing constraint level sets for conditional distributions, decreasing temperature schedules associated with some Boltzmann–Gibbs distributions, and many others. In principle, any Markov chain Monte Carlo sampler can be turned into an interacting Markov chain Monte Carlo sampler. These interacting Markov chain Monte Carlo samplers can be interpreted as a way to run a sequence of Markov chain Monte Carlo samplers in parallel. For instance, interacting simulated annealing algorithms are based on independent Metropolis–Hastings moves interacting sequentially with a selection-resampling type mechanism. In contrast to traditional Markov chain Monte Carlo methods, the precision parameter of this class of interacting Markov chain Monte Carlo samplers is only related to the number of interacting Markov chain Monte Carlo samplers. These advanced particle methodologies belong to the class of Feynman–Kac particle models,^{[14]}^{[15]} also called Sequential Monte Carlo or particle filter methods in Bayesian inference and signal processing communities.^{[16]} Interacting Markov chain Monte Carlo methods can also be interpreted as a mutation-selection genetic particle algorithm with Markov chain Monte Carlo mutations.
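A rough sketch of one such interacting scheme, under several simplifying assumptions (a hand-picked tempering schedule, a toy bimodal target, and a random-walk rather than independent Metropolis mutation, chosen for brevity): particles are reweighted by the ratio of successive annealed densities, resampled (the step where the chains interact), and then mutated by a Metropolis move targeting the current annealed density:

    import numpy as np

    rng = np.random.default_rng(0)

    def log_prior(x):    # easy-to-sample reference distribution: N(0, 10^2)
        return -0.5 * (x / 10.0) ** 2

    def log_target(x):   # unnormalized bimodal target, for illustration only
        return np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)

    betas = np.linspace(0.0, 1.0, 21)   # annealing schedule from prior to target
    N = 2_000                           # number of interacting samplers ("particles")
    x = rng.normal(0.0, 10.0, size=N)   # initialize from the prior

    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Reweight by the ratio of successive annealed densities pi_b / pi_{b_prev},
        # where pi_b is proportional to prior^(1-b) * target^b.
        logw = (b - b_prev) * (log_target(x) - log_prior(x))
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Selection-resampling step: this is where the chains interact.
        x = x[rng.choice(N, size=N, p=w)]
        # Metropolis mutation targeting the current annealed density.
        log_pi = lambda y: (1.0 - b) * log_prior(y) + b * log_target(y)
        prop = x + rng.normal(scale=1.0, size=N)
        accept = np.log(rng.uniform(size=N)) < log_pi(prop) - log_pi(x)
        x = np.where(accept, prop, x)

    print(x.mean(), x.var())   # roughly 0 and 10 for this symmetric bimodal target

Consistent with the description above, the precision of this sketch is governed by the number of particles N rather than by a burn-in length.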
Markov chain quasi-Monte Carlo (MCQMC)^{[17]}^{[18]}
The advantage of low-discrepancy sequences in lieu of random numbers for simple independent Monte Carlo sampling is well known.^{[19]} This procedure, known as the quasi-Monte Carlo method (QMC),^{[20]} yields an integration error that decays at a superior rate to that obtained by IID sampling, by the Koksma–Hlawka inequality. Empirically it allows both the estimation error and the convergence time to be reduced by an order of magnitude. The Array-RQMC method^{[21]} combines randomized quasi-Monte Carlo and Markov chain simulation by simulating n chains simultaneously in a way that the empirical distribution of the n states at any given step is a better approximation of the true distribution of the chain than with ordinary MCMC. In empirical experiments, the variance of the average of a function of the state sometimes converges at rate O(n^{-2}) or even faster, instead of the Monte Carlo rate O(n^{-1}).^{[22]}
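The advantage of a low-discrepancy sequence over IID sampling can be seen in a small self-contained comparison; the Halton sequence and the integrand x·y on [0,1]^2 (whose exact integral is 1/4) are illustrative choices:

    import numpy as np

    def radical_inverse(i, base):
        """Van der Corput radical inverse of the integer i in the given base."""
        f, r = 1.0, 0.0
        while i > 0:
            f /= base
            r += f * (i % base)
            i //= base
        return r

    def halton(n, bases=(2, 3)):
        """First n points of the 2-D Halton low-discrepancy sequence."""
        return np.array([[radical_inverse(i, b) for b in bases]
                         for i in range(1, n + 1)])

    # Integrand on [0,1]^2 with known integral 1/4.
    f = lambda p: p[:, 0] * p[:, 1]

    n = 4096
    rng = np.random.default_rng(0)
    mc = f(rng.uniform(size=(n, 2))).mean()   # plain Monte Carlo estimate
    qmc = f(halton(n)).mean()                 # quasi-Monte Carlo estimate
    print(f"exact=0.25  MC error={abs(mc - 0.25):.2e}  QMC error={abs(qmc - 0.25):.2e}")

In an Array-RQMC setting, randomized low-discrepancy points would drive the chain transitions in place of IID uniforms; the deterministic Halton sequence above only illustrates the underlying discrepancy advantage.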
Convergence
Usually it is not hard to construct a Markov chain with the desired properties. The more difficult problem is to determine how many steps are needed to converge to the stationary distribution within an acceptable error.^{[23]} A good chain will have rapid mixing: the stationary distribution is reached quickly starting from an arbitrary position. A standard empirical method to assess convergence is to run several independent simulated Markov chains and check that the ratio of inter-chain to intra-chain variances for all the parameters sampled is close to 1.^{[23]}^{[24]}
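The inter-chain/intra-chain variance comparison mentioned above underlies the Gelman–Rubin diagnostic;^{[23]} a minimal single-parameter version of the statistic might look like the following sketch (the function name and test data are illustrative):

    import numpy as np

    def gelman_rubin(chains):
        """Potential scale reduction factor R-hat for m chains of length n.

        Values close to 1 suggest the chains have reached the same distribution.
        """
        chains = np.asarray(chains, dtype=float)   # shape (m, n), one parameter
        m, n = chains.shape
        chain_means = chains.mean(axis=1)
        B = n * chain_means.var(ddof=1)            # between-chain variance
        W = chains.var(axis=1, ddof=1).mean()      # within-chain variance
        var_hat = (n - 1) / n * W + B / n          # pooled variance estimate
        return np.sqrt(var_hat / W)

    # Example: two well-mixed "chains" of independent draws, so R-hat should be ~1.
    rng = np.random.default_rng(0)
    print(gelman_rubin(rng.normal(size=(2, 5_000))))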
Typically, Markov chain Monte Carlo sampling can only approximate the target distribution, as there is always some residual effect of the starting position. More sophisticated Markov chain Monte Carlo-based algorithms such as coupling from the past can produce exact samples, at the cost of additional computation and an unbounded (though finite in expectation) running time.
Many random walk Monte Carlo methods move around the equilibrium distribution in relatively small steps, with no tendency for the steps to proceed in the same direction. These methods are easy to implement and analyze, but unfortunately it can take a long time for the walker to explore all of the space. The walker will often double back and cover ground already covered.
See also Markov chain central limit theorem.
Software
Several software programs provide MCMC sampling capabilities, for example:
 Packages that use dialects of the BUGS model language:
 greta, a Bayesian statistical modeling language / R package which uses TensorFlow behind the scenes,^{[25]} similar to PyMC3's use of Theano as the computational backend
 MCSim
 PyMC3
 pymcmcstat
 R (programming language) with the packages adaptMCMC, atmcmc, BRugs, mcmc, MCMCpack, ramcmc, rjags, rstan, etc.
 Stan
 TensorFlow Probability (probabilistic programming library built on TensorFlow)
 MCL (a cluster algorithm for graphs)^{[26]} and HipMCL (a parallelized version)^{[27]}
 emcee (MIT-licensed pure-Python implementation of Goodman & Weare's Affine Invariant Markov chain Monte Carlo Ensemble sampler)
 MacMCMC (stand-alone, full-featured MCMC application for macOS)
See also
References
Citations
 ^ Gupta, Ankur; Rawlings, James B. (April 2014). "Comparison of Parameter Estimation Methods in Stochastic Chemical Kinetic Models: Examples in Systems Biology". AIChE Journal. 60 (4): 1253–1268. doi:10.1002/aic.14409. PMC 4946376. PMID 27429455.
 ^ See Gill 2008.
 ^ See Robert & Casella 2004.
 ^ Banerjee, Sudipto; Carlin, Bradley P.; Gelfand, Alan P. (2014-09-12). Hierarchical Modeling and Analysis for Spatial Data (Second ed.). CRC Press. p. xix. ISBN 9781439819173.
 ^ Gilks, W. R.; Wild, P. (1992-01-01). "Adaptive Rejection Sampling for Gibbs Sampling". Journal of the Royal Statistical Society. Series C (Applied Statistics). 41 (2): 337–348. doi:10.2307/2347565. JSTOR 2347565.
 ^ Gilks, W. R.; Best, N. G.; Tan, K. K. C. (1995-01-01). "Adaptive Rejection Metropolis Sampling within Gibbs Sampling". Journal of the Royal Statistical Society. Series C (Applied Statistics). 44 (4): 455–472. doi:10.2307/2986138. JSTOR 2986138.
 ^ Martino, L.; Read, J.; Luengo, D. (2015-06-01). "Independent Doubly Adaptive Rejection Metropolis Sampling Within Gibbs Sampling". IEEE Transactions on Signal Processing. 63 (12): 3123–3138. arXiv:1205.5494. Bibcode:2015ITSP...63.3123M. doi:10.1109/TSP.2015.2420537. ISSN 1053-587X.
 ^ See Stramer 1999.
 ^ Liu, Jun S.; Liang, Faming; Wong, Wing Hung (2000-03-01). "The Multiple-Try Method and Local Optimization in Metropolis Sampling". Journal of the American Statistical Association. 95 (449): 121–134. doi:10.1080/01621459.2000.10473908. ISSN 0162-1459.
 ^ Martino, Luca; Read, Jesse (2013-07-11). "On the flexibility of the design of multiple try Metropolis schemes". Computational Statistics. 28 (6): 2797–2823. arXiv:1201.0646. doi:10.1007/s00180-013-0429-2. ISSN 0943-4062.
 ^ See Green 1995.
 ^ Tahmasebi, Pejman; Javadpour, Farzam; Sahimi, Muhammad (August 2016). "Stochastic shale permeability matching: Threedimensional characterization and modeling". International Journal of Coal Geology. 165: 231–242. doi:10.1016/j.coal.2016.08.024.
 ^ Del Moral, Pierre (2013). Mean field simulation for Monte Carlo integration. Monographs on Statistics & Applied Probability. Chapman & Hall/CRC Press. p. 626.
 ^ Del Moral, Pierre (2004). Feynman–Kac formulae. Genealogical and interacting particle approximations. Probability and Applications. Springer. p. 575.
 ^ Del Moral, Pierre; Miclo, Laurent (2000). Branching and Interacting Particle Systems Approximations of Feynman–Kac Formulae with Applications to Non-Linear Filtering (PDF). Lecture Notes in Mathematics. 1729. pp. 1–145. doi:10.1007/bfb0103798. ISBN 9783540673149.
 ^ Del Moral, Pierre; Doucet, Arnaud; Jasra, Ajay (2006). "Sequential Monte Carlo samplers". Journal of the Royal Statistical Society. Series B (Statistical Methodology). 68 (3): 411–436. arXiv:cond-mat/0212648. doi:10.1111/j.1467-9868.2006.00553.x.
 ^ Chen, S., Josef Dick, and Art B. Owen. "Consistency of Markov chain quasi-Monte Carlo on continuous state spaces." The Annals of Statistics 39.2 (2011): 673–701.
 ^ Tribble, Seth D. Markov chain Monte Carlo algorithms using completely uniformly distributed driving sequences. Diss. Stanford University, 2007.
 ^ Papageorgiou, Anargyros, and J. F. Traub. "Beating Monte Carlo." Risk 9.6 (1996): 63–65.
 ^ Sobol, Ilya M (1998). "On quasi-Monte Carlo integrations". Mathematics and Computers in Simulation. 47 (2): 103–112. doi:10.1016/s0378-4754(98)00096-2.
 ^ L'Ecuyer, P., C. Lécot, and B. Tuffin. "A Randomized Quasi-Monte Carlo Simulation Method for Markov Chains." Operations Research 56, 4 (2008): 958–975.
 ^ L'Ecuyer, P., D. Munger, C. Lécot, and B. Tuffin. "Sorting Methods and Convergence Rates for Array-RQMC: Some Empirical Comparisons." Mathematics and Computers in Simulation 143 (2018), 191–201.
 ^ ^{a} ^{b} Gelman, A.; Rubin, D.B. (1992). "Inference from iterative simulation using multiple sequences (with discussion)". Statistical Science. 7 (4): 457–511. Bibcode:1992StaSc...7..457G. doi:10.1214/ss/1177011136.
 ^ Cowles, M.K.; Carlin, B.P. (1996). "Markov chain Monte Carlo convergence diagnostics: a comparative review". Journal of the American Statistical Association. 91 (434): 883–904. CiteSeerX 10.1.1.53.3445. doi:10.1080/01621459.1996.10476956.
 ^ "greta's software dependencies and inspirations". gretadev.github.io. Retrieved 20181002.
 ^ Enright, AJ; Van Dongen, S; Ouzounis, CA (1 April 2002). "An efficient algorithm for large-scale detection of protein families". Nucleic Acids Research. 30 (7): 1575–84. doi:10.1093/nar/30.7.1575. PMC 101833. PMID 11917018.
 ^ Azad, A; Pavlopoulos, GA; Ouzounis, CA; Kyrpides, NC; Buluç, A (6 April 2018). "HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks". Nucleic Acids Research. 46 (6): e33. doi:10.1093/nar/gkx1313. PMC 5888241. PMID 29315405.
Sources
 Christophe Andrieu, Nando De Freitas, Arnaud Doucet and Michael I. Jordan (2003). "An Introduction to MCMC for Machine Learning".
 Asmussen, Søren; Glynn, Peter W. (2007). Stochastic Simulation: Algorithms and Analysis. Stochastic Modelling and Applied Probability. 57. Springer.
 Atzberger, P. "An Introduction to MonteCarlo Methods" (PDF).
 Berg, Bernd A. (2004). Markov Chain Monte Carlo Simulations and Their Statistical Analysis. World Scientific.
 Bolstad, William M. (2010). Understanding Computational Bayesian Statistics. Wiley. ISBN 9780470046098.
 Casella, George; George, Edward I. (1992). "Explaining the Gibbs sampler". The American Statistician. 46 (3): 167–174. CiteSeerX 10.1.1.554.3993. doi:10.2307/2685208. JSTOR 2685208.
 Gelfand, A.E.; Smith, A.F.M. (1990). "SamplingBased Approaches to Calculating Marginal Densities". Journal of the American Statistical Association. 85 (410): 398–409. CiteSeerX 10.1.1.512.2330. doi:10.1080/01621459.1990.10476213.
 Gelman, Andrew; Carlin, John B.; Stern, Hal S.; Rubin, Donald B. (1995). Bayesian Data Analysis (1st ed.). Chapman and Hall. (See Chapter 11.)
 Geman, S.; Geman, D. (1984). "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images". IEEE Transactions on Pattern Analysis and Machine Intelligence. 6 (6): 721–741. doi:10.1109/TPAMI.1984.4767596.
 Gilks, W.R.; Richardson, S.; Spiegelhalter, D.J. (1996). Markov Chain Monte Carlo in Practice. Chapman and Hall/CRC.
 Gill, Jeff (2008). Bayesian methods: a social and behavioral sciences approach (2nd ed.). Chapman and Hall/CRC. ISBN 9781584885627.
 Green, P.J. (1995). "Reversiblejump Markov chain Monte Carlo computation and Bayesian model determination". Biometrika. 82 (4): 711–732. CiteSeerX 10.1.1.407.8942. doi:10.1093/biomet/82.4.711.
 Neal, Radford M. (2003). "Slice Sampling". Annals of Statistics. 31 (3): 705–767. doi:10.1214/aos/1056562461. JSTOR 3448413.
 Neal, Radford M. (1993). "Probabilistic Inference Using Markov Chain Monte Carlo Methods".
 Robert, Christian P.; Casella, G. (2004). Monte Carlo Statistical Methods (2nd ed.). Springer. ISBN 9780387212395.
 Rubinstein, R.Y.; Kroese, D.P. (2007). Simulation and the Monte Carlo Method (2nd ed.). Wiley. ISBN 9780470177945.
 Smith, R.L. (1984). "Efficient Monte Carlo Procedures for Generating Points Uniformly Distributed Over Bounded Regions". Operations Research. 32 (6): 1296–1308. doi:10.1287/opre.32.6.1296.
 Spall, J.C. (April 2003). "Estimation via Markov Chain Monte Carlo". IEEE Control Systems Magazine. 23 (2): 34–45. doi:10.1109/mcs.2003.1188770.
 Stramer, O.; Tweedie, R. (1999). "LangevinType Models II: SelfTargeting Candidates for MCMC Algorithms". Methodology and Computing in Applied Probability. 1 (3): 307–328. doi:10.1023/A:1010090512027.
Further reading
 Diaconis, Persi (April 2009). "The Markov chain Monte Carlo revolution" (PDF). Bull. Amer. Math. Soc. 46 (2): 179–205. doi:10.1090/s0273-0979-08-01238-x.
 Press, W.H.; Teukolsky, S.A.; Vetterling, W.T.; Flannery, B.P. (2007), "Section 15.8. Markov Chain Monte Carlo", Numerical Recipes: The Art of Scientific Computing (3rd ed.), Cambridge University Press, ISBN 9780521880688
 Richey, Matthew (May 2010). "The Evolution of Markov Chain Monte Carlo Methods" (PDF). The American Mathematical Monthly. 117 (5): 383–413. CiteSeerX 10.1.1.295.4478. doi:10.4169/000298910x485923.
External links
 MCMC sampling and other methods in a basic overview, by Alexander Mantzaris (original link, now broken)
 PyMC  Python module implementing Bayesian statistical models and fitting algorithms, including Markov chain Monte Carlo.
 IA2RMS is a MATLAB code of the "Independent Doubly Adaptive Rejection Metropolis Sampling" method, Martino, Read & Luengo (2015), for drawing from the full-conditional densities within a Gibbs sampler.