Plotly graphics#
One of the major players in interactive graphs is Plotly.
Some alternatives are Bokeh and Altair.
The Python interface comes in two main flavours:
plotly.graph_objects: low-level graphics handling
plotly.express: high-level graphics handling
In addition, Plotly is integrated into Dash, Plotly's framework for interactive web applications, which has its own dialect.
Figures are nested, dictionary-like structures, which we will leverage.
# The following renders plotly graphs in Jupyter Notebook, Jupyter Lab and VS Code formats
import plotly.io as pio
pio.renderers.default = "notebook+plotly_mimetype"
Plotting with AI assistance#
Many plot commands can be obtained by describing the desired plot to an AI assistant.
AI assistants can also translate code from one plotting framework to another.
Sketching a set of plots and adding sufficient descriptions may result in usable code.
Basic plotting#
# Gapminder dataset of health and wealth stats for different countries
import plotly.express as px
df = px.data.gapminder()
df.head()
 | country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num
---|---|---|---|---|---|---|---|---
0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 | AFG | 4
1 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.853030 | AFG | 4
2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.100710 | AFG | 4
3 | Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.197138 | AFG | 4
4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981106 | AFG | 4
Line plot#
# Create a line plot of life expectancy over time for Norway.
# Let the figure be 400 pixels high and 700 pixels wide.
# Set the title to 'Life Expectancy in Norway'.
# Set the x-axis label to 'Year'.
# Set the y-axis label to 'Life Expectancy (years)'.
fig = px.line(df[df['country'] == 'Norway'], x='year', y='lifeExp', title='Life Expectancy in Norway', width=700, height=400)
fig.update_xaxes(title='Year')
fig.update_yaxes(title='Life Expectancy (years)')
fig
# Create a plot with one line for Norway and one line for Sweden in the same style as the plot above.
# Let the legend title be 'Country'.
fig = px.line(df[df['country'].isin(['Norway', 'Sweden'])], x='year', y='lifeExp', color='country', width=700, height=400)
fig.update_xaxes(title='Year')
fig.update_yaxes(title='Life Expectancy (years)')
fig.update_layout(legend_title_text='Country')
fig
# The dictionary defining the figure
print(fig)
Figure({
'data': [{'hovertemplate': 'country=Norway<br>year=%{x}<br>lifeExp=%{y}<extra></extra>',
'legendgroup': 'Norway',
'line': {'color': '#636efa', 'dash': 'solid'},
'marker': {'symbol': 'circle'},
'mode': 'lines',
'name': 'Norway',
'orientation': 'v',
'showlegend': True,
'type': 'scatter',
'x': array([1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, 2002, 2007]),
'xaxis': 'x',
'y': array([72.67 , 73.44 , 73.47 , 74.08 , 74.34 , 75.37 , 75.97 , 75.89 , 77.32 ,
78.32 , 79.05 , 80.196]),
'yaxis': 'y'},
{'hovertemplate': 'country=Sweden<br>year=%{x}<br>lifeExp=%{y}<extra></extra>',
'legendgroup': 'Sweden',
'line': {'color': '#EF553B', 'dash': 'solid'},
'marker': {'symbol': 'circle'},
'mode': 'lines',
'name': 'Sweden',
'orientation': 'v',
'showlegend': True,
'type': 'scatter',
'x': array([1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, 2002, 2007]),
'xaxis': 'x',
'y': array([71.86 , 72.49 , 73.37 , 74.16 , 74.72 , 75.44 , 76.42 , 77.19 , 78.16 ,
79.39 , 80.04 , 80.884]),
'yaxis': 'y'}],
'layout': {'height': 400,
'legend': {'title': {'text': 'Country'}, 'tracegroupgap': 0},
'margin': {'t': 60},
'template': '...',
'width': 700,
'xaxis': {'anchor': 'y', 'domain': [0.0, 1.0], 'title': {'text': 'Year'}},
'yaxis': {'anchor': 'x', 'domain': [0.0, 1.0], 'title': {'text': 'Life Expectancy (years)'}}}
})
print(fig['data'][0]['line']['color'])
#636efa
Directly editing the dictionary#
fig['data'][0]['line']['color'] = "#000000"
fig
Shaded areas#
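The fill parameter of scatter traces can fill to the neighbouring trace drawn before it ('tonexty'), to zero ('tozeroy') or, with 'toself', the region enclosed by the trace itself when the series reverses direction.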
The latter is most convenient for single colour background shading.
# Plot the mean life expectancy in Europe over time. Shade the area between minimum and maximum life expectancy in Europe over time.
# Overlay Norway's life expectancy over the plot.
# https://plotly.com/python/continuous-error-bars/
dfE = df[df['continent'] == 'Europe'][['year', 'lifeExp']].groupby('year')
dfEmean = dfE.mean().reset_index()
dfEmin = dfE.min().reset_index()
dfEmax = dfE.max().reset_index()
dfEmean['Legend'] = 'Average' # Hack to include line in legend, see color below.
fig = px.line(dfEmean, x='year', y='lifeExp', title='Life Expectancy in Europe', color='Legend', width=700, height=400)
fig.update_xaxes(title='Year')
fig.update_yaxes(title='Life Expectancy (years)')
fig.update_layout(legend_title_text='Country')
# fill='tonexty' fills down to the previous trace, so the Min trace gets no fill
# and the Max trace is filled down to Min, shading the range between them
fig.add_scatter(x=dfEmin['year'], y=dfEmin['lifeExp'], name='Min')
fig.add_scatter(x=dfEmax['year'], y=dfEmax['lifeExp'], name='Max', fill='tonexty')
fig.add_scatter(x=df[df['country'] == 'Norway']['year'], y=df[df['country'] == 'Norway']['lifeExp'], name='Norway')
fig
Note
Look at the way .reset_index() is used to promote years back to a variable again.
Bar plot#
# Make a barplot of the life expectancy in Norway over time.
fig = px.bar(df[df['country'] == 'Norway'], x='year', y='lifeExp', title='Life Expectancy in Norway', width=700, height=400)
fig.update_xaxes(title='Year')
fig.update_yaxes(title='Life Expectancy (years)')
fig
# Make a barplot with both Norway and Sweden in the same plot. Let the countries be side by side for each year.
fig = px.bar(df[df['country'].isin(['Norway', 'Sweden'])], x='year', y='lifeExp', color='country', barmode='group', width=700, height=400)
fig.update_xaxes(title='Year')
fig.update_yaxes(title='Life Expectancy (years)')
fig.update_layout(legend_title_text='Country')
fig
Note
Remove barmode='group' to stack the bars instead; px.bar stacks them by default (barmode='relative').
# Create a barplot with maximum life expectancy in Europe for each year.
# Overlay the life expectancy in Bulgaria over the plot with narrower bars using barmode='overlay'.
dfE = df[df['continent'] == 'Europe'][['year', 'lifeExp']].groupby('year')
dfEmax = dfE.max().reset_index()
dfEmax['Bulgaria'] = df[df['country'] == 'Bulgaria']['lifeExp'].reset_index()['lifeExp']
dfEmax.columns = ['year', 'Europe max', 'Bulgaria']
fig = px.bar(dfEmax, x='year', y=['Europe max', 'Bulgaria'], title='Life Expectancy in Europe', barmode='overlay', width=700, height=400)
fig.update_xaxes(title='Year')
fig.update_yaxes(title='Life Expectancy (years)')
fig.update_layout(legend_title_text='Country')
fig
# Inspect the figure
print(fig)
Figure({
'data': [{'alignmentgroup': 'True',
'hovertemplate': 'variable=Europe max<br>year=%{x}<br>value=%{y}<extra></extra>',
'legendgroup': 'Europe max',
'marker': {'color': '#636efa', 'opacity': 0.5, 'pattern': {'shape': ''}},
'name': 'Europe max',
'offsetgroup': 'Europe max',
'orientation': 'v',
'showlegend': True,
'textposition': 'auto',
'type': 'bar',
'x': array([1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, 2002, 2007]),
'xaxis': 'x',
'y': array([72.67 , 73.47 , 73.68 , 74.16 , 74.72 , 76.11 , 76.99 , 77.41 , 78.77 ,
79.39 , 80.62 , 81.757]),
'yaxis': 'y'},
{'alignmentgroup': 'True',
'hovertemplate': 'variable=Bulgaria<br>year=%{x}<br>value=%{y}<extra></extra>',
'legendgroup': 'Bulgaria',
'marker': {'color': '#EF553B', 'opacity': 0.5, 'pattern': {'shape': ''}},
'name': 'Bulgaria',
'offsetgroup': 'Bulgaria',
'orientation': 'v',
'showlegend': True,
'textposition': 'auto',
'type': 'bar',
'x': array([1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, 2002, 2007]),
'xaxis': 'x',
'y': array([59.6 , 66.61 , 69.51 , 70.42 , 70.9 , 70.81 , 71.08 , 71.34 , 71.19 ,
70.32 , 72.14 , 73.005]),
'yaxis': 'y'}],
'layout': {'barmode': 'overlay',
'height': 400,
'legend': {'title': {'text': 'Country'}, 'tracegroupgap': 0},
'template': '...',
'title': {'text': 'Life Expectancy in Europe'},
'width': 700,
'xaxis': {'anchor': 'y', 'domain': [0.0, 1.0], 'title': {'text': 'Year'}},
'yaxis': {'anchor': 'x', 'domain': [0.0, 1.0], 'title': {'text': 'Life Expectancy (years)'}}}
})
# Adjust the width of the Bulgaria bars to 1.5 (bar widths are in x-axis units, here years).
fig['data'][1]['width'] = 1.5
fig
Polar barplots#
The x-axis of a bar plot does not have to be straight; polar bar plots place the bars around a circle.
angles = (dfEmax['year']-1952)/55*360*11/12
width = [360/12-5]*12
r = dfEmax['Europe max']
import plotly.graph_objects as go
fig = go.Figure(go.Barpolar(
r=r,
theta=angles,
width=width,
marker_color=dfEmax['Europe max'],
marker_line_color="black",
marker_line_width=2,
opacity=0.8
))
fig.update_layout(
template=None,
polar = dict(
radialaxis = dict(range=[0, 100], showticklabels=False, ticks=''),
angularaxis = dict(showticklabels=False, ticks='')
)
)
fig
# Change me to plotly express, please!
Scatter plot#
# Create a Plotly express scatter plot of the iris data
df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length", color='species')
fig.update_xaxes(title='Sepal width')
fig.update_yaxes(title='Sepal length')
fig
# Inspect the scatter plot.
# Note three legendgroups and the markers. Many more options are available.
print(fig)
Figure({
'data': [{'hovertemplate': 'species=setosa<br>sepal_width=%{x}<br>sepal_length=%{y}<extra></extra>',
'legendgroup': 'setosa',
'marker': {'color': '#636efa', 'symbol': 'circle'},
'mode': 'markers',
'name': 'setosa',
'orientation': 'v',
'showlegend': True,
'type': 'scatter',
'x': array([3.5, 3. , 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.4, 3. , 3. ,
4. , 4.4, 3.9, 3.5, 3.8, 3.8, 3.4, 3.7, 3.6, 3.3, 3.4, 3. , 3.4, 3.5,
3.4, 3.2, 3.1, 3.4, 4.1, 4.2, 3.1, 3.2, 3.5, 3.1, 3. , 3.4, 3.5, 2.3,
3.2, 3.5, 3.8, 3. , 3.8, 3.2, 3.7, 3.3]),
'xaxis': 'x',
'y': array([5.1, 4.9, 4.7, 4.6, 5. , 5.4, 4.6, 5. , 4.4, 4.9, 5.4, 4.8, 4.8, 4.3,
5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 5.1, 4.6, 5.1, 4.8, 5. , 5. , 5.2,
5.2, 4.7, 4.8, 5.4, 5.2, 5.5, 4.9, 5. , 5.5, 4.9, 4.4, 5.1, 5. , 4.5,
4.4, 5. , 5.1, 4.8, 5.1, 4.6, 5.3, 5. ]),
'yaxis': 'y'},
{'hovertemplate': 'species=versicolor<br>sepal_width=%{x}<br>sepal_length=%{y}<extra></extra>',
'legendgroup': 'versicolor',
'marker': {'color': '#EF553B', 'symbol': 'circle'},
'mode': 'markers',
'name': 'versicolor',
'orientation': 'v',
'showlegend': True,
'type': 'scatter',
'x': array([3.2, 3.2, 3.1, 2.3, 2.8, 2.8, 3.3, 2.4, 2.9, 2.7, 2. , 3. , 2.2, 2.9,
2.9, 3.1, 3. , 2.7, 2.2, 2.5, 3.2, 2.8, 2.5, 2.8, 2.9, 3. , 2.8, 3. ,
2.9, 2.6, 2.4, 2.4, 2.7, 2.7, 3. , 3.4, 3.1, 2.3, 3. , 2.5, 2.6, 3. ,
2.6, 2.3, 2.7, 3. , 2.9, 2.9, 2.5, 2.8]),
'xaxis': 'x',
'y': array([7. , 6.4, 6.9, 5.5, 6.5, 5.7, 6.3, 4.9, 6.6, 5.2, 5. , 5.9, 6. , 6.1,
5.6, 6.7, 5.6, 5.8, 6.2, 5.6, 5.9, 6.1, 6.3, 6.1, 6.4, 6.6, 6.8, 6.7,
6. , 5.7, 5.5, 5.5, 5.8, 6. , 5.4, 6. , 6.7, 6.3, 5.6, 5.5, 5.5, 6.1,
5.8, 5. , 5.6, 5.7, 5.7, 6.2, 5.1, 5.7]),
'yaxis': 'y'},
{'hovertemplate': 'species=virginica<br>sepal_width=%{x}<br>sepal_length=%{y}<extra></extra>',
'legendgroup': 'virginica',
'marker': {'color': '#00cc96', 'symbol': 'circle'},
'mode': 'markers',
'name': 'virginica',
'orientation': 'v',
'showlegend': True,
'type': 'scatter',
'x': array([3.3, 2.7, 3. , 2.9, 3. , 3. , 2.5, 2.9, 2.5, 3.6, 3.2, 2.7, 3. , 2.5,
2.8, 3.2, 3. , 3.8, 2.6, 2.2, 3.2, 2.8, 2.8, 2.7, 3.3, 3.2, 2.8, 3. ,
2.8, 3. , 2.8, 3.8, 2.8, 2.8, 2.6, 3. , 3.4, 3.1, 3. , 3.1, 3.1, 3.1,
2.7, 3.2, 3.3, 3. , 2.5, 3. , 3.4, 3. ]),
'xaxis': 'x',
'y': array([6.3, 5.8, 7.1, 6.3, 6.5, 7.6, 4.9, 7.3, 6.7, 7.2, 6.5, 6.4, 6.8, 5.7,
5.8, 6.4, 6.5, 7.7, 7.7, 6. , 6.9, 5.6, 7.7, 6.3, 6.7, 7.2, 6.2, 6.1,
6.4, 7.2, 7.4, 7.9, 6.4, 6.3, 6.1, 7.7, 6.3, 6.4, 6. , 6.9, 6.7, 6.9,
5.8, 6.8, 6.7, 6.7, 6.3, 6.5, 6.2, 5.9]),
'yaxis': 'y'}],
'layout': {'legend': {'title': {'text': 'species'}, 'tracegroupgap': 0},
'margin': {'t': 60},
'template': '...',
'xaxis': {'anchor': 'y', 'domain': [0.0, 1.0], 'title': {'text': 'Sepal width'}},
'yaxis': {'anchor': 'x', 'domain': [0.0, 1.0], 'title': {'text': 'Sepal length'}}}
})
# Scale the marker size by petal width
fig = px.scatter(df, x="sepal_width", y="sepal_length",
color='species', size="petal_width")
fig.update_xaxes(title='Sepal width')
fig.update_yaxes(title='Sepal length')
fig
Boxplots and violin plots#
# Make a boxplot of the life expectancy per country in Europe
df = px.data.gapminder()
dfE = df[df['continent'] == 'Europe']
fig = px.box(dfE, x='country', y='lifeExp', title='Life Expectancy in Europe', width=800, height=500)
fig.update_xaxes(title='Country')
fig.update_yaxes(title='Life Expectancy (years)')
fig
# Make a violinplot of the life expectancy per country in Europe
# with the same style as the boxplot above.
fig = px.violin(dfE, x='country', y='lifeExp', title='Life Expectancy in Europe', width=800, height=500)
fig.update_xaxes(title='Country')
fig.update_yaxes(title='Life Expectancy (years)')
fig
Marginal plots#
Scatter plots support simple marginal plots along either axis through marginal_x and marginal_y: 'histogram', 'rug', 'box' or 'violin'.
# Add a marginal box plot to the scatter plot.
df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length",
color='species', size="petal_width", marginal_y='box')
fig.update_xaxes(title='Sepal width')
fig.update_yaxes(title='Sepal length')
fig
Exercise#
Test other marginal plot types and locations.
Heatmap#
# Make a correlation heatmap of the iris data
df = px.data.iris()
fig = px.imshow(df.corr(numeric_only=True))
fig
Tables#
One can plot tables with styling.
# Make a Plotly table view of the iris data (tables are only available through graph_objects)
# https://plotly.com/python/table
import plotly.graph_objects as go
df = px.data.iris()
fig = go.Figure(data=[go.Table(
header=dict(values=list(df.columns),
fill_color='paleturquoise',
align='left'),
cells=dict(values=[df.sepal_length, df.sepal_width, df.petal_length, df.petal_width, df.species, df.species_id],
fill_color='lavender',
align='left'))
])
fig
Layouts#
Plotly express has no direct option for combining several plots in one figure, except for facets (see below).
Instead, one needs to use the low-level graph objects together with make_subplots.
# Make a two by two plotly express plot with two scatter plots and two pie charts, all four with random data
# https://plotly.com/python/subplots/
import plotly.graph_objects as go
import numpy as np
from plotly.subplots import make_subplots
np.random.seed(1)
# Initialize figure with subplots with type of plot in each cell
fig = make_subplots(rows=2, cols=2,
specs=[[{"type": "xy"}, {"type": "xy"}],
[{"type": "domain"}, {"type": "domain"}]])
fig.add_trace(go.Scatter(x=np.random.rand(100), y=np.random.rand(100), mode='markers'), row=1, col=1)
fig.add_trace(go.Scatter(x=np.random.rand(100), y=np.random.rand(100), mode='markers'), row=1, col=2)
fig.add_trace(go.Pie(values=np.random.rand(3)), row=2, col=1)
fig.add_trace(go.Pie(values=np.random.rand(3)), row=2, col=2)
fig.update_layout(height=600, width=800, title_text="Two by two subplots")
fig
Note
The plot type must be specified for each subplot, e.g., “xy” for Cartesian plots such as scatter plots and “domain” for pie charts.
Facet plots#
Facet plots are sets of plots sharing the same properties except for one categorical difference.
Examples are scatter plots, line plots, histograms, etc., split by one distinguishing feature.
Parameters such as facet_col, facet_row and facet_col_wrap control the layout.
# Tip dataset from Plotly
df = px.data.tips()
df.head()
 | total_bill | tip | sex | smoker | day | time | size
---|---|---|---|---|---|---|---
0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2
1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3
2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3
3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2
4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4
# Scatter plot with color and facet
# https://plotly.com/python/facet-plots/
fig = px.scatter(df, x="total_bill", y="tip", color='sex', facet_col="day")
fig.update_xaxes(matches=None)  # Let each facet use its own x-axis range
fig
Sunburst plot#
Hierarchical data, e.g., pivoted data, can be displayed as sunbursts.
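These are pie charts with concentric rings marking hierarchical relationships: the inner ring holds the top level, and each outer ring subdivides the sector of its parent.
Interactivity is particularly useful here: clicking a sector zooms in on that part of the hierarchy.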
Note
As for ordinary pie charts, it is very hard to judge the relative sizes of sectors in sunburst plots.
# Sunburst plot
df = px.data.tips()
fig = px.sunburst(df, path=['day', 'time', 'sex'], values='total_bill')
fig
# Read the athlete_events.csv file
import pandas as pd
athletes = pd.read_csv('../../data/athlete_events.csv')
winter = athletes.loc[athletes['Season'] == 'Winter',:]
winter2000 = winter.loc[winter['Year'] >= 2000,:]
# Pivoting step on the winter2000 data: count entries with a recorded Height per sport and year
w2sy = winter2000.pivot_table(index='Sport', columns='Year', values='Height', aggfunc='count')
# Remove rows that only contain NaN values
w2sy = w2sy.dropna(how='all')
w2syu = w2sy.unstack().reset_index()
w2syu.columns = ['Year', 'Sport', 'Athletes']
w2syu.head()
 | Year | Sport | Athletes
---|---|---|---
0 | 2002 | Alpine Skiing | 551
1 | 2002 | Biathlon | 564
2 | 2002 | Bobsleigh | 238
3 | 2002 | Cross Country Skiing | 766
4 | 2002 | Curling | 96
fig = px.sunburst(w2syu, path=['Year', 'Sport'], values='Athletes')
# Add title: "Athletes per sport in winter olympics"
fig.update_layout(title_text='Athletes per sport in winter olympics')
fig
Parallel coordinates#
Parallel coordinates show multiple features side by side, each on its own vertical axis.
Each sample is a line connecting its values on the feature axes.
Lines are coloured by class or by a continuous feature.
Interactivity includes selecting a range on an axis to highlight lines and dragging axes to rearrange them.
# Use Plotly parallel coordinates to visualize the Iris data
# https://plot.ly/python/parallel-coordinates-plot/
df = px.data.iris()
fig = px.parallel_coordinates(df, color="species_id", labels={"species_id": "Species",
"sepal_width": "Sepal Width", "sepal_length": "Sepal Length",
"petal_width": "Petal Width", "petal_length": "Petal Length", },
color_continuous_scale=px.colors.diverging.Tealrose, color_continuous_midpoint=2)
fig
Exercise#
Adjust the above code to include a slider for opacity.
# Dummy cell to ensure Plotly graphics are shown
import plotly.graph_objects as go
f = go.FigureWidget([go.Scatter(x=[1,1], y=[1,1], mode='markers')])