Dot Plots in Python

A simple tool for AP Statistics teachers

Dan Hales
Python in Plain English


AP Statistics seems to love dot plots. They’re easy to make by hand, they quickly give you an idea of what your distribution looks like, and they don’t require any real planning or number-crunching before diving in. Compare this to histograms, which require you to decide how many bins to use and to know how tall each bar will be before you draw it. There’s a simple one-to-one correspondence between observations and dots on the plot, so dot plots are easy to understand and easy to produce by hand in a testing environment.

When I was teaching the early units of AP Statistics, I got sick of making dot plots by hand, and I wound up turning to a variety of online tools that simply weren’t giving me what I wanted, which was a no-frills, minimal, straightforward plot like this one:

An example of a simple dot plot

Although there are plenty of online tools out there for making dot plots for tests, quizzes, or lecture slides, I was generally put off by what they required of me. They frequently added in chartjunk like multiple colors or awkward label sizes, or they required me to jump through a ton of hoops in order to get what I was going for.

To address this issue, here is a simple script for generating dot plots. All it does is provide a thin wrapper around matplotlib.pyplot.scatter, along with a handful of functions for computing the coordinates of the dots. Because I was also teaching AP Computer Science (both A and Principles) alongside AP Statistics, I was always on the lookout for activities that could blur the lines between courses, injecting some computer science into statistics and vice versa. If you are an AP Statistics or AP Computer Science teacher, feel free to adapt this code into a classroom activity or simply use it as a tool. The code is available on my GitHub, and I have also created a repl where you can run it online.

I aimed to keep the Python as accessible to students and beginners as possible, so these are the concepts needed to prep the data for plotting:

  • basic list operations, including indexing or using max, min, or len
  • list comprehensions
  • dictionary operations

Admittedly, the code for calling plt.scatter is more complex and requires more knowledge of matplotlib’s architecture, but that comes down to my aiming for a specific aesthetic rather than to any algorithmic necessity.

Problem Statement: We have a one-dimensional list of numerical observations and a known function, matplotlib.pyplot.scatter, and we want to create a new function, dotplot, which will use scatter to draw a dot plot.

Approach: We need to convert each observation into a pair of coordinates. The x-coordinate will be a stack_key, meaning a real number on the number line where we will stack our points. The y-coordinate must be computed: it is the dot’s position within its stack (1 for the bottom dot, 2 for the one above it, and so on), so the height of each stack equals the number of observations assigned to it.
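In the simplest case, where every observation already equals one of the stack values (for example, integer data with one stack per integer), no binning is needed and the conversion only requires a dictionary that tracks how many dots each stack already holds. The following is a minimal sketch under that assumption; the function name to_coordinates is hypothetical and not part of the author’s code.

```python
def to_coordinates(observations):
    """Hypothetical helper: turn observations into (x, y) dot-plot points.

    Assumes each observation is already a stack value, so no binning is needed.
    """
    heights = {}  # stack value -> number of dots stacked there so far
    points = []
    for value in observations:
        heights[value] = heights.get(value, 0) + 1
        # y is this dot's position in its stack: 1 for the bottom dot, 2 above it, ...
        points.append((value, heights[value]))
    return points


points = to_coordinates([2, 3, 3, 5, 3])
print(points)  # [(2, 1), (3, 1), (3, 2), (5, 1), (3, 3)]

# plt.scatter expects x and y as separate sequences
xs = [x for x, _ in points]
ys = [y for _, y in points]
print(xs, ys)  # [2, 3, 3, 5, 3] [1, 1, 2, 1, 3]
```

The last dot placed on each stack has a y-value equal to that stack’s total count, which is why the tallest stack in the plot marks the most frequent value.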

The Algorithm

This is the general algorithm for accomplishing this task, followed by the code required to actually produce the dotplot.

  1. Specify the values on the number line where we want to put the stacks of dots. This can be done either by calling get_stack_keys, which creates a specified number (passed to the num_stacks parameter) of evenly spaced values between the minimum and maximum observed values, or by passing a list of stack_keys directly. Passing the keys directly generally results in a cleaner plot, because you choose the values instead of having them computed.
  2. Assign observations to the corresponding key. An observation is assigned to a stack_key if it is greater than or equal to that stack_key but less than the next stack_key; in other words, we bin the values. We do this by calling get_stack_dict, which creates a dictionary whose keys are the elements of the stack_key list and whose values are lists of the observations that fall into each bin.
  3. Compute the coordinates for the points on our scatterplot. If a given stack_key, say 5, has three observations in its list, then we want to compute the list of points [(5, 1), (5, 2), (5, 3)]. Doing this for each stack_key produces the list of all the points we need to put on our scatterplot.
  4. Use these coordinates to plot the points. Note that plt.scatter takes the x and y coordinates as separate lists, so we’ll need to split the points accordingly.
  5. Customize the scatterplot by adjusting the window size, the font size of the x-tick labels, and the marker size (the size of the dots), and by hiding the box that is drawn around an Axes object by default.
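Steps 1 through 3 can be sketched in plain Python using only the concepts listed earlier. The names get_stack_keys and get_stack_dict come from the article, but the signatures and bodies below are assumptions (the author’s implementation is in the notebook on GitHub), and get_coordinates is a hypothetical name for step 3. The description also leaves open what happens to observations at or above the last key; this sketch assumes they go on the last stack, so the maximum value lands on the final key.

```python
def get_stack_keys(observations, num_stacks):
    """Step 1 (assumed signature): num_stacks evenly spaced keys from min to max."""
    lo, hi = min(observations), max(observations)
    if num_stacks < 2 or lo == hi:
        return [lo]
    step = (hi - lo) / (num_stacks - 1)
    # set the last key to hi exactly, so float rounding can't push it past the maximum
    return [lo + i * step for i in range(num_stacks - 1)] + [hi]


def get_stack_dict(observations, stack_keys):
    """Step 2 (assumed signature): map each key to the observations in its bin.

    An observation belongs to key k if k <= obs < next key. Values at or above
    the last key go on the last stack (an assumption); values below the first
    key are ignored in this sketch.
    """
    keys = sorted(stack_keys)
    stack_dict = {key: [] for key in keys}
    for obs in observations:
        for i, key in enumerate(keys):
            is_last = i == len(keys) - 1
            if key <= obs and (is_last or obs < keys[i + 1]):
                stack_dict[key].append(obs)
                break
    return stack_dict


def get_coordinates(stack_dict):
    """Step 3 (hypothetical name): one (key, height) point per observation."""
    return [(key, height)
            for key, members in stack_dict.items()
            for height in range(1, len(members) + 1)]


data = [1, 3, 6, 7, 7, 2, 4.5]
stacks = get_stack_dict(data, [1, 3, 5, 7])
print(stacks)  # {1: [1, 2], 3: [3, 4.5], 5: [6], 7: [7, 7]}
points = get_coordinates(stacks)
print(points)  # [(1, 1), (1, 2), (3, 1), (3, 2), (5, 1), (7, 1), (7, 2)]
print(get_stack_keys([1, 9, 5], 5))  # [1.0, 3.0, 5.0, 7.0, 9]
```

The inner loop scans the keys linearly, which keeps the code readable for beginners; with many stacks, bisect.bisect_right(keys, obs) - 1 would find the same bin index faster.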

All of this is handled by the dotplot function, as can be seen below:

```python
import ap_stat  # the author's dot plot module

# some arbitrary data
data = [1, 3, 6, 7, 7, 7, 7, 3, 8, 2, 4, 6, 8, 2, 3, 4, 5, 6, 9, 1, 4, 4, 8]
# we can see the values are between 1 and 9,
# so we'll specify keys explicitly
keys = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ap_stat.dotplot(data=data, keys=keys)
```
Output from the code above

The figure has a fixed size of (16, 7), and the spacing and font size on the axis are determined programmatically. If you’re interested in the details of how I implemented this (and what other options you can use), feel free to check out the notebook on my GitHub.
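Since the author’s plotting code lives in the notebook, what follows is only a minimal sketch of steps 4 and 5, assuming a list of (x, y) points like the ones described in step 3. It calls Axes.scatter, the method that plt.scatter invokes on the current Axes. The function name plot_points, the marker size, and the tick font size are illustrative assumptions; the (16, 7) size is passed as matplotlib’s figsize, which is measured in inches. Keeping the bottom spine as a number line is a design choice of this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt


def plot_points(points, stack_keys, marker_size=200, tick_font_size=16):
    """Hypothetical helper: draw (x, y) dot-plot points in a minimal style."""
    fig, ax = plt.subplots(figsize=(16, 7))
    # scatter wants separate x and y sequences
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    ax.scatter(xs, ys, s=marker_size, color="black")
    # one tick per stack on the number line
    ax.set_xticks(stack_keys)
    ax.tick_params(axis="x", labelsize=tick_font_size)
    ax.set_yticks([])  # dot heights are counts, so no y-axis labels are needed
    # hide the box around the Axes, keeping only the bottom line as a number line
    for side in ("top", "right", "left"):
        ax.spines[side].set_visible(False)
    # half a dot of padding above the tallest stack and below the bottom row
    ax.set_ylim(0.5, max(ys, default=1) + 0.5)
    return fig, ax


points = [(1, 1), (1, 2), (3, 1), (3, 2), (5, 1), (7, 1), (7, 2)]
fig, ax = plot_points(points, stack_keys=[1, 3, 5, 7])
```

Calling fig.savefig("dotplot.png") afterwards would write the image to disk for use in slides or assessments.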

Hopefully this code is more straightforward than some of the other options! Although I don’t use dot plots at all in my day-to-day data science work (I prefer histograms), they’re great tools for exploring the distributions of small data sets in the classroom, so hopefully this code can lighten your prep work a little as you put together slides, activities, or assessments.
