Introduction to Discrete MCMC (with Scrabble)

This page organizes the software resources that I wrote to accompany my notes on MCMC. These resources are still a work in progress but I hope they will be useful to people encountering this material for the first time. The underlying code for all of these tools is available on GitHub.

## Introduction

As MCMC sampling has become an increasingly popular tool for evaluating districting plans, people from a diverse set of backgrounds are encountering these methods for the first time. These notes and Sage Widgets represent my attempt at explaining the underlying ideas in a concrete and friendly fashion, with lots of opportunity for interaction and exploration. Throughout the .pdf document there are many links to individual interactive tools for exploring the ideas further. These tools and brief descriptions of their purposes are linked below. The organizational structure of this page follows that of the accompanying text.

If you are interested in building your own Sage @interact modules, I have prepared some notes and examples at the bottom of this page. I frequently use these tools for both teaching and research and highly recommend them.

#### Scrabble

I have always thought that Scrabble tiles are a great tool for teaching Markov chains and MCMC. They are familiar to many people, the tiles themselves come in a non-uniform distribution, and each tile has an associated point value that leads to an entirely separate distribution. The table below shows the frequency and point distributions for the American English tileset.

 Letter A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Frequency 9 2 2 4 12 2 3 2 9 1 1 4 2 6 8 2 1 6 4 6 4 2 2 1 2 1 2 Score 1 3 3 2 1 4 2 4 1 8 5 1 3 1 1 3 10 1 1 1 1 4 4 8 4 10 0

Throughout these interactive programs, we make use of a few specific Markov chains and score functions that are worth highlighting in advance (full descriptions and plots are also in the .pdf):

• Markov Chains
• Uniform: Each state is drawn from the uniform distribution over the alphabet.
• Scrabble Count: Each state is drawn proportionally to the number of occurences in the Scrabble tileset (with replacement).
• Letter Path: A random walk on the path graph with each letter connected to the adjacenct letters in the alphabet with [space] after z (Example Figure).
• Letter Cycle: A random walk on the cycle graph with each letter connected to the adjacenct letters in the alphabet with [space] after z and [space] connected to a (Example Figure).
• Keyboard: A random walk on adjacent keys of a standard QWERTY keyboard (Example Figure).
• Score Function
• Uniform: Each letter has a score of 1.
• Number of Vowels: Consonants have score 1, vowels have score 100, and y has score 50.
• Scrabble Points: The point value of the corresponding Scrabble tile.
• Scrabble Count: The number of times the Scrabble tile appears in the full tileset.
• Alphabetical: a=1, b=2, ... ,z=26, [space]=27.

## Probability Background

This section introduces the ideas of distributions, random variables, and expected values using examples like rolling a die or drawing a Scrabble tile.
• Die Rolling
• This tool simulates rolling a die repeatedly as an example of draws from a probability distribution. You can adjust the number of faces and labels on the die as well as the number of times it is rolled. The output is a plot showing the theoretical distribution as well as one showing the distribution of actual rolls. The individual rolls themselves are also reported.
• Dice Expectations
• This tool attempts to estimate the expected value of a die roll by averaging the values of many rolls. As above, you can change the properties of the die in each experiment. The output shows the convergence of the expected value as well as the final estimate, distribution, and error.
• Scrabble Expectations
• This tool attempts to estimate the expected point value of a Scrabble tile drawn from a full bag (with replacement).The output shows the convergence of the expected value as well as the final estimate, distribution, and error.

## Monte Carlo Sampling

This section introduces the idea of Monte Carlo sampling as an efficient way to estimate numerical values when we can evaluate random samples easily.
• Deterministic Solitaire
• This tool uses Monte Carlo to evaluate the likelihood of winning a simple, deterministic solitaire game. You can vary the parameters of the game to see how the trace and win percentage evolve.
• Deterministic War
• This is a simulation of multiplayer deterministic game whose success rate can be evaluated with Monte Carlo sampling. Game rules are provided in these slides.
• Pan Galactic Solitaire
• This isn't mentioned in the text but my first experiences with Monte Carlo sampling occurred while trying to analyze the game of Pan Galactic Solitaire with Peter Doyle. The Python source and a Windows version for my implementation of the game are freely available. More details on GitHub.
• Distances in a cube
• This tool uses Monte Carlo to estimate the expected distance between two points drawn uniformly in a cube.
• Simple Pi Estimator
• This tool estimate pi as the area under a circular arc using Monte Carlo sampling of points in the unit square.

## Markov Chains

This section introduces Markov chains and the related concept of walks on graphs.
• Ant-on-a-keyboard
• This is not interactive, just a .gif showing the ant walking on a keyboard Markov chain.
• Walks on Graphs
• This tool simulates many walks on a given Markov chain in order to approximate the steady state distribution empirically. You select the chain, number of steps, and number of trials and then the output compares the sampled values to the true steady state distribution.
• Markov Distributions
• This visualization shows how the distribution over states evolves as a Markov chain progresses. You select the chain and an initial state and the output shows a heatmap of the probabilities of being in a particular state over time.
• Markov Chain Expected Values
• This tool calculates uses samples drawn from a Markov chain to estimate the expected value of a specified score distribution over the steady state distribution. You select an initial state, the Markov chain, and a score function and the output shows the convergence and error of the estimated expected value to the true answer.

## Markov Chain Monte Carlo

Finally, we reach the main topic of this discussion, actual MCMC sampling. This section introduces the Metropolis--Hastings variant of MCMC and gives several examples, making use of the previously introduced Markov chains and score functions. There is only one widget in this section but it incorporates many of the previous tools.
• MCMC Sampling
• This widget allows you to select an initial ``word'', proposal distribution, score distribution, and chain length and then performs MCMC with the chosen parameters. The outputs show the theoretical proposal and score steady state distributions as well as the distributions of states that were actually proposed and accepted in the run. The total variation distance between the empirical and theoretical distributions is also plotted along with the convergence of the expected value under the score distribution. Finally, the traces and accepted states are reported.

## Mixing Times

Although much of the literature on mixing times requires mathematical tools that are beyond the scope of this introduction, this section provides some intuition for how the properties of the chain and Metropolis-Hastings waiting process can impact the convergence rate.
• Mixing Comparison
• This visualization shows how the distribution over states evolves as various MCMC chains progress. You select the proposal distribution and MCMC score function as well as an initial distribution over the states. The initial distribution can either be uniform over the letters, a random probability vector, or a pure state with all of the mass concentrated on a single letter. The output shows a heatmap of the probabilities of being in a particular state over time and compares the total variation distance between the target distribution and the distribution after k steps from the given starting point.

## Lifted Markov Chains

One approach for forming faster mixing Markov chains is the idea of a ``lifted'' walk, where we make use of an auxiliary graph that has a known rapidly mixing chain. The key idea is that in well-behaved graphs (e.g. those with lots of symmetry) we can construct faster mixing chains by lifting to a larger graph with even nicer properties. Although this is unintuitive at first glance, as we are moving to a setting with more nodes to try to get more rapid mixing, this example demonstrates the main properties that make this procedure work.
• Lifted Walks
• This interactive explores Example 1.1 presented in the paper Lifting Markov Chains to Speed up Mixing by Chen, Lovasz, and Pak. The original walk here is on the path graph with n vertices, which is then lifted to the cycle graph on 2n-2 vertices. You can select the size of the path and how many steps to evaluate and then the simple walk is compared to the lifted one. More theoretical details are provided on the linked page.

## MCMC for Redistricting

The last section of the notes focuses on applying MCMC methods to the problem of sampling legislative districting plans. This requires both an additional layer of abstraction, as the states in our Markov chain are now partitions of graphs instead of individual nodes, as well as more concrete engagement with the real-world data and laws that govern the process. We see how all of the concepts introduced in the previous sections - defining the state space, picking a proposal method, computing the acceptance function, etc. - have to be modified to work in this applied setting. The section concludes with a discussion of annealing and partitions of grids.
• GerryChain
• The ideas presented in this section are implemented in the section are implemented in the GerryChain softare package, developed by VRDI and MGGG.
• This guide covers the inner workings of the software and supplements the material presented on this page with a computational perspective.
• Computational Approaches for Political Redistricting
• In January 2019 I developed an IAP course on computational redistricting at MIT. The course webpage has many more links to software tools and guides for exploring this exciting project.