Lesson 5c: Plotting with Bokeh#

In the previous two lessons you learned how to use pandas’ higher-level plotting API for quick and simple visualizations and Matplotlib for lower-level, detailed plotting. In this lesson you’re going to learn about Bokeh, a Python library for creating interactive visualizations for modern web browsers.

Bokeh helps you build beautiful graphics, ranging from simple plots to complex dashboards with streaming datasets. With Bokeh, you can create JavaScript-powered visualizations without writing any JavaScript yourself. This can be extremely useful both for exploratory data analysis and for producing refined visualizations and dashboards for stakeholders.

Although Bokeh is considered a lower level visualization API, generating plots with Bokeh is still fairly straightforward and intuitive. Bokeh makes it easy to create plots but also allows you a lot of flexibility to make your plots very complex, refined, and interactive.

In this lesson I’ll teach the basics of Bokeh and point you to resources where you can dig into more advanced Bokeh capabilities.

Note

Work through this lesson to create your first Bokeh plot. At the end of the lesson, a longer video tutorial will expose you to many other types of Bokeh plots that you can create.

Prerequisites#

Most of the functionality of Bokeh is accessed through submodules such as bokeh.plotting and bokeh.models. Also, when using Bokeh in a notebook we need to run bokeh.io.output_notebook() to make our plots viewable and interactive.

import pandas as pd

# Our main plotting package (must have explicit import of submodules)
import bokeh.io
import bokeh.models
import bokeh.plotting
import bokeh.transform

# Enable viewing Bokeh plots in the notebook
bokeh.io.output_notebook()
Loading BokehJS ...

We’ll use a cleaned up version of the Ames, IA housing data for illustration purposes:

df = pd.read_csv('../data/ames_clean.csv')
df.head()
   Id  MSSubClass  MSZoning  LotFrontage  LotArea  Street  Alley  LotShape  LandContour  Utilities  ...  PoolArea  PoolQC  Fence  MiscFeature  MiscVal  MoSold  YrSold  SaleType  SaleCondition  SalePrice
0   1          60        RL         65.0     8450    Pave    NaN       Reg          Lvl     AllPub  ...         0     NaN    NaN          NaN        0       2    2008        WD         Normal     208500
1   2          20        RL         80.0     9600    Pave    NaN       Reg          Lvl     AllPub  ...         0     NaN    NaN          NaN        0       5    2007        WD         Normal     181500
2   3          60        RL         68.0    11250    Pave    NaN       IR1          Lvl     AllPub  ...         0     NaN    NaN          NaN        0       9    2008        WD         Normal     223500
3   4          70        RL         60.0     9550    Pave    NaN       IR1          Lvl     AllPub  ...         0     NaN    NaN          NaN        0       2    2006        WD        Abnorml     140000
4   5          60        RL         84.0    14260    Pave    NaN       IR1          Lvl     AllPub  ...         0     NaN    NaN          NaN        0      12    2008        WD         Normal     250000

5 rows × 81 columns

Bokeh’s grammar and our first plot with Bokeh#

Constructing a plot with Bokeh consists of four main steps.

  1. Creating a figure on which to populate glyphs (symbols that represent data, e.g., dots for a scatter plot). Think of this figure as a “canvas” which sets the space on which you will “paint” your glyphs.

  2. Defining a data source that is the reference used to place the glyphs.

  3. Choosing the kind of glyph you would like.

  4. Refining the plot by adding titles, formatted axis labels, or even interactive components.

After completing these steps, you need to render the graphic.

Let’s go through these steps to generate an interactive scatter plot of home sales price and total living area. So you have the concrete example in mind, the final graphic will look like this:

# Create the figure, stored in variable `p`
p = bokeh.plotting.figure(
    frame_width=700,
    frame_height=350,
    title='Relationship between home sale price and living area \nAmes, Iowa (2006-2010)',
    x_axis_label='Living Area (Square feet)',
    y_axis_label='Sale Price'
)

source = bokeh.models.ColumnDataSource(df)

p.scatter(
    source=source,
    x='GrLivArea',
    y='SalePrice',
    alpha=0.25
)

p.yaxis.formatter = bokeh.models.NumeralTickFormatter(format="$,")
p.xaxis.formatter = bokeh.models.NumeralTickFormatter(format=",")

tooltips = [("Sale Price","@SalePrice"),("SqFt","@GrLivArea")]
hover = bokeh.models.HoverTool(tooltips=tooltips, mode='mouse')
p.add_tools(hover)

bokeh.io.show(p)

1. Our first step is creating a figure, our “canvas.” In creating the figure, we are implicitly thinking about what kind of representation for our data we want. That is, we have to specify axes and their labels. We might also want to specify the title of the figure, whether or not to have grid lines, and all sorts of other customizations. Naturally, we also want to specify the size of the figure.

(Almost) all of this is accomplished in Bokeh by making a call to bokeh.plotting.figure() with the appropriate keyword arguments.

# Create the figure, stored in variable `p`
p = bokeh.plotting.figure(
    frame_width=700,
    frame_height=350,
    title='Relationship between home sale price and living area \nAmes, Iowa (2006-2010)',
    x_axis_label='Living Area (Square feet)',
    y_axis_label='Sale Price'
)

There are many more keyword arguments you can pass, including all of those listed in the Bokeh Plot class and the additional ones listed in the Bokeh Figure class.

2. Now that we have set up our canvas, we can decide on the data source. It is convenient to create a ColumnDataSource, a special Bokeh object that holds data to be displayed in a plot. (We will later see that we can change the data in a ColumnDataSource and the plot will automatically update!) Conveniently, we can instantiate a ColumnDataSource directly from a Pandas data frame.

source = bokeh.models.ColumnDataSource(df)

Note

We could also instantiate a data source using a dictionary of arrays, like

source = bokeh.models.ColumnDataSource(dict(x=[1, 2, 3, 4], y=[1, 4, 9, 16]))
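To make the idea of a data source more concrete, here is a minimal sketch that builds a ColumnDataSource from a dictionary, inspects its columns, and then replaces its data. The values are made up for illustration. How a changed source reaches an already displayed plot depends on your setup (for example, a Bokeh server application keeps Python and the browser in sync), so treat the last step as an illustration of the Python side only.

```python
import bokeh.models

# Build a data source from a dictionary of equal-length lists (made-up values)
source = bokeh.models.ColumnDataSource(dict(x=[1, 2, 3, 4], y=[1, 4, 9, 16]))

# Each dictionary key becomes a named column that glyphs can refer to
print(sorted(source.column_names))
print(list(source.data['y']))

# Assigning a new dictionary to .data swaps in new values for every column.
# All columns should be replaced together so their lengths stay equal.
source.data = dict(x=[1, 2, 3], y=[1, 8, 27])
print(list(source.data['y']))
```

Glyphs refer to columns by name, so as long as the new dictionary keeps the same keys, the glyph specification does not need to change.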

3. Since we are creating a scatter plot we will choose scatter as our glyph. This kind of glyph requires that we specify which column of the data source will serve to place the glyphs along the \(x\)-axis and which will serve to place the glyphs along the \(y\)-axis. We choose the 'GrLivArea' column to specify the \(x\)-coordinate of the glyph and the 'SalePrice' column to specify the \(y\)-coordinate. Since there are a lot of observations clustered together we can control overplotting by adjusting the transparency with alpha.

We accomplish step 3 by calling one of the glyph methods of the Bokeh Figure instance, p. Since we are choosing a scatter plot, the appropriate method is p.scatter(), and we use the source, x, and y kwargs to specify the positions of the glyphs.

p.scatter(
    source=source,
    x='GrLivArea',
    y='SalePrice',
    alpha=0.25
);

4. Lastly, we can refine the plot in various ways. In this example we format the x-axis tick labels with comma separators and the y-axis tick labels as dollar amounts. We can also add interactive components to our visuals. Here, I add a hover tool so that sale price and total living area are displayed when my mouse hovers over a point.

Tip

We can specify these features (axis configuration and tooltips) when we instantiate the figure or afterwards by assigning attribute values to an already instantiated figure.

The syntax for a tooltip is a list of 2-tuples, where each tuple represents the tooltip you want. The first entry in the tuple is the label and the second is the column from the data source that has the values. The second entry must be preceded by an @ symbol, signifying that it is a field in the data source and not a field intrinsic to the plot, which is preceded by a $ sign. If there are spaces in the column heading, enclose the column name in braces (e.g., @{name with spaces}). (See the documentation for tooltip specification for more information.)

p.yaxis.formatter = bokeh.models.NumeralTickFormatter(format="$,")
p.xaxis.formatter = bokeh.models.NumeralTickFormatter(format=",")

tooltips = [("Sale Price","@SalePrice"),("SqFt","@GrLivArea")]
hover = bokeh.models.HoverTool(tooltips=tooltips, mode='mouse')
p.add_tools(hover)
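The tooltip rules described above can be sketched on a small, made-up data source. The column name 'Living Area' is hypothetical and contains a space on purpose (the Ames data uses GrLivArea). The sketch also shows a number format appended in braces and passes the tooltips directly to figure(), which is one way to specify them at instantiation.

```python
import bokeh.models
import bokeh.plotting

# Made-up data; 'Living Area' is a hypothetical column name with a space
source = bokeh.models.ColumnDataSource(
    {
        'Living Area': [1710, 1262, 1786],
        'SalePrice': [208500, 181500, 223500],
    }
)

tooltips = [
    ('Sale Price', '@SalePrice{$0,0}'),  # @ field plus a number format
    ('SqFt', '@{Living Area}{0,0}'),     # braces around a name with spaces
    ('Row', '$index'),                   # $ fields are intrinsic to the plot
]

# Passing tooltips to figure() adds a hover tool when the figure is created
p = bokeh.plotting.figure(
    frame_width=300,
    frame_height=200,
    tooltips=tooltips
)

p.scatter(
    source=source,
    x='Living Area',
    y='SalePrice'
)

# Collect the hover tools attached to the figure to confirm one was added
hover_tools = [t for t in p.tools if isinstance(t, bokeh.models.HoverTool)]
print(len(hover_tools))
```

Formatting the values inside the tooltip keeps the hover text consistent with the formatted axis tick labels.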

Now that we have built the plot, we can render it in the notebook using bokeh.io.show().

bokeh.io.show(p)

Looking at the plot, notice the toolbar to the right of the plot that enables you to zoom and pan within the plot.

The importance of tidy data frames#

It might be clear to you now that building a plot in this way requires that the data frame you use be tidy: each row is an observation and each column is a variable. That well-specified organization is what lets us map columns directly to glyph properties, and it is also what enables high-level plotting functionality.
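As a small illustration of what tidy means in practice, the sketch below reshapes a made-up "wide" table into a tidy one with pandas. The neighborhood names, column names, and prices are hypothetical and are not taken from the Ames data.

```python
import pandas as pd

# Made-up "wide" table: one column per year, so the year is hidden in the headers
wide = pd.DataFrame(
    {
        'Neighborhood': ['A', 'B'],
        '2006': [150000, 210000],
        '2007': [155000, 205000],
    }
)

# Tidy version: one row per observation, one column per variable
tidy = wide.melt(
    id_vars='Neighborhood',
    var_name='YrSold',
    value_name='MedianPrice'
)

print(tidy)
```

In the tidy version, a column such as YrSold or MedianPrice can be handed to a glyph by name, which is not possible when the year is spread across column headers.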

Code style in plot specifications#

Specifications of plots often involve calls to functions with lots of keyword arguments, and this can get unwieldy without a clear style. You can develop your own style, maybe reading Trey Hunner’s blog post again. I like to do the following.

  1. Put the function call, like p.scatter( or p = bokeh.plotting.figure( on the first line.

  2. The closing parenthesis for the function call is on its own line, unindented.

  3. Any arguments are given as kwargs (even if they can also be specified as positional arguments) at one level of indentation.

Note that you cannot use method chaining when instantiating figures or populating glyphs.

If you adhere to a style (which is roughly the style imposed by Black), it makes your code cleaner and easier to read.

Coloring with other dimensions#

Let’s say we wanted to make the same plot, but we wanted to color the points based on another feature such as whether the home has central air or not (CentralAir). To do this, we take advantage of two features of Bokeh.

  1. We create a color mapping using factor_cmap() that assigns colors to the discrete levels of a given factor (CentralAir in this example). Here, we simply assign red and blue colors; however, Bokeh has many color palettes to choose from.

  2. We can then use the scatter method to assign the glyph of choice and pass the color_mapper object to fill_color and/or line_color. I also add the legend field so it shows up in the plot and we can format our legend as necessary (e.g., add a title or change the font).

# Create the figure, stored in variable `p`
p = bokeh.plotting.figure(
    frame_width=700,
    frame_height=350,
    title='Relationship between home sale price and living area \nAmes, Iowa (2006-2010)',
    x_axis_label='Living Area (Square feet)',
    y_axis_label='Sale Price'
)

source = bokeh.models.ColumnDataSource(df)

# create color mapper
color_mapper = bokeh.transform.factor_cmap(
    'CentralAir',
    palette=['red', 'blue'],
    factors=df['CentralAir'].unique()
)

p.scatter(
    source=source,
    x='GrLivArea',
    y='SalePrice',
    marker='circle',
    alpha=0.25,
    fill_color=color_mapper,
    line_color=color_mapper,
    legend_field='CentralAir'
)

p.legend.title = "Has central air"

p.yaxis.formatter = bokeh.models.NumeralTickFormatter(format="$,")
p.xaxis.formatter = bokeh.models.NumeralTickFormatter(format=",")

tooltips = [("Sale Price","@SalePrice"),("SqFt","@GrLivArea")]
hover = bokeh.models.HoverTool(tooltips=tooltips, mode='mouse')
p.add_tools(hover)

bokeh.io.show(p)
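If you would rather use one of Bokeh’s built-in palettes than hand-picked color names, a sketch along the following lines may help. The levels list is a hypothetical stand-in for df['CentralAir'].unique(), and sorting the factors is my own choice here (not something the example above does) so that the color assigned to each level does not depend on the order in which levels first appear in the data.

```python
import bokeh.palettes
import bokeh.transform

# Hypothetical stand-in for df['CentralAir'].unique()
levels = ['Y', 'N']

# Sorting fixes the factor order, so colors do not depend on row order
factors = sorted(levels)

# Take the 3-color variant of Category10 and keep one color per factor
palette = bokeh.palettes.Category10[3][:len(factors)]

color_mapper = bokeh.transform.factor_cmap(
    'CentralAir',
    palette=palette,
    factors=factors
)

print(factors)
print(palette)
```

The resulting color_mapper can be passed to fill_color and line_color in the same way as in the example above.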

Saving Bokeh plots#

After you create your plot, you can save it in a variety of formats. Most commonly you would save it as PNG (for presentations), SVG (for publications in the paper of the past), or HTML (for the paper of the future or sharing with colleagues).

To save as a PNG for quick use, you can click the disk icon in the toolbar.

To save to SVG, you first change the output backend to 'svg' and then you can click the disk icon again, and you will get an SVG rendering of the plot. After saving the SVG, you should change the output backend back to 'canvas' because it has much better in-browser performance.

p.output_backend = 'svg'

bokeh.io.show(p)

Now, click the disk icon in the plot above to save it.

After saving, we should switch back to canvas.

p.output_backend = 'canvas'

You can also save the figure programmatically using the bokeh.io.export_svgs() function. This requires additional installations, so we will not run it here, but the code to do it is shown below. Again, this will only work if the output backend is 'svg'.

p.output_backend = 'svg'
bokeh.io.export_svgs(p, filename='ames_sale_price_vs_living_area.svg')
p.output_backend = 'canvas'

Finally, to save as HTML, you can use the bokeh.io.save() function. This saves your plot as a standalone HTML page. Note that the title kwarg is not the title of the plot, but the title of the web page that will appear on your browser tab.

bokeh.io.save(
    p,
    filename='ames_sale_price_vs_living_area.html',
    title='Bokeh plot'
);
/var/folders/8f/c06lv6q17tjbyjv2nkt0_s4s1sh0tg/T/ipykernel_27999/3821509765.py:1: UserWarning: save() called but no resources were supplied and output_file(...) was never called, defaulting to resources.CDN
  bokeh.io.save(

Note

You can ignore the warning. The resulting HTML page has all of the interactivity of the plot and you can, for example, email it to your collaborators for them to explore.
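If you prefer not to see the warning at all, one option is to tell bokeh.io.save() explicitly which resources to use. The sketch below uses a small made-up figure and writes to a temporary directory (both are assumptions to keep the example self-contained). It passes bokeh.resources.INLINE, which embeds the BokehJS code in the file; bokeh.resources.CDN is the alternative that the warning mentions.

```python
import os
import tempfile

import bokeh.io
import bokeh.plotting
import bokeh.resources

# Small made-up figure so the example is self-contained
p = bokeh.plotting.figure(frame_width=300, frame_height=200)
p.scatter(x=[1, 2, 3], y=[1, 4, 9])

# Temporary output location (an assumption, to avoid cluttering your project)
out_path = os.path.join(tempfile.mkdtemp(), 'example_plot.html')

# INLINE embeds BokehJS in the HTML file, so the page also works offline.
# CDN produces a smaller file that loads BokehJS from the internet instead.
saved_file = bokeh.io.save(
    p,
    filename=out_path,
    title='Bokeh plot',
    resources=bokeh.resources.INLINE
)

print(saved_file)
```

Inline resources make the file noticeably larger, so CDN may be the better choice when you know your collaborators will be online.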

Video Tutorial#

Video 🎥:

The following video provides an overview of Bokeh and will also expose you to other types of plots you can create (e.g., line charts, histograms, area plots).

Exercises#

Questions:

  1. Spend some time going through the Bokeh documentation and tutorials.

  2. Pick a feature from the Ames Housing data and create a bar chart. Can you make a bar chart similar to the one we made in the Matplotlib tutorial?

  3. Pick two continuous features from the Ames Housing data and create a scatter plot. Can you make a scatter plot similar to the one we made in the Matplotlib tutorial, but with interactive components?

  4. Now identify a categorical feature that you can color the above scatter plot by (e.g., CentralAir).

  5. Using the hover tooltips, are you able to identify outliers in your plot(s)?

Computing environment#

%load_ext watermark
%watermark -v -p pandas,bokeh,jupyterlab
Python implementation: CPython
Python version       : 3.12.4
IPython version      : 8.26.0

pandas    : 2.2.2
bokeh     : 3.4.2
jupyterlab: 4.2.3