Plotly Fundamentals - Getting Started
In this tutorial we will explore what the charting library Plotly has to offer the aspiring data scientist. Unlike many other plotting libraries, Plotly uses the built-in rendering functionality of your web browser and provides convenient wrappers that produce truly stunning graphs with very little code. While many IDEs offer plug-ins that let you work with Plotly graphs directly, I strongly recommend using Jupyter Notebook or JupyterLab to execute the snippets below.
As usual, all good things start with importing some packages. In our case we start with every data nerd’s most trusted companion, pandas. Plotly provides two entry points to its framework. As a first step we will focus on Plotly Express, which provides pre-built wrappers for the most common chart types.
import pandas as pd
import plotly.express as px
Next we need some data to display. Fortunately, Plotly Express already ships with a number of simple data sets in its px.data module to feed our first graph. I went with a classic stock market data set that many folks are likely familiar with. It contains weekly price information for some of the largest US tech stocks in normalised form, so every series starts at 1.0.
stocks = px.data.stocks()
stocks.head()
 | date | GOOG | AAPL | AMZN | FB | NFLX | MSFT |
---|---|---|---|---|---|---|---|
0 | 2018-01-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
1 | 2018-01-08 | 1.018172 | 1.011943 | 1.061881 | 0.959968 | 1.053526 | 1.015988 |
2 | 2018-01-15 | 1.032008 | 1.019771 | 1.053240 | 0.970243 | 1.049860 | 1.020524 |
3 | 2018-01-22 | 1.066783 | 0.980057 | 1.140676 | 1.016858 | 1.307681 | 1.066561 |
4 | 2018-01-29 | 1.008773 | 0.917143 | 1.163374 | 1.018357 | 1.273537 | 1.040708 |
Let’s pick one column of the above data frame and produce a line chart that offers some additional functions such as zoom, a selector tool, export functionality and more. Sounds complicated, right? With Plotly Express you can achieve this with a single function call, which produces a figure object that you can display using its show() method.
fig1line = px.line(stocks, x = 'date', y = 'GOOG')
fig1line.show()
In case you want to draw a different chart type, you can simply swap px.line for another chart function such as px.scatter. There is an extensive list of other chart types that all fundamentally work the same way and take mostly the same parameters.
fig1dotted = px.scatter(stocks, x = 'date', y = 'GOOG')
fig1dotted.show()
Functions that produce plots are quite forgiving with regard to the input format. Instead of referring to the data with a single column name string, you can also pass a list (or pandas Index) of column names, which draws one line per column.
fig1multiplelines = px.line(stocks, x = 'date', y = stocks.columns[1:])
fig1multiplelines.show()
Interactive features aside, a simple line chart is likely not going to blow anyone away. Let’s see what we can add to make things a bit more spicy. There is a large variety of additional attributes to tweak. One of the most useful and stylish, in my opinion, is the marginal attribute, which you can set for each axis separately via marginal_x and marginal_y. With this you get additional subplots showing a histogram of the respective dimension of your data set. The best thing is that all interactive features carry over to these subplots, which gives you a pretty nifty cross-filtering toolbox. The example below also adds an ordinary least squares trendline with trendline = 'ols', which requires the statsmodels package to be installed.
fig1marginals = px.scatter(stocks, x = 'GOOG', y = 'FB', width = 750, height = 750, marginal_x = 'histogram', marginal_y = 'histogram', trendline = 'ols')
fig1marginals.show()
In case you are working with a bigger dataset and want to kick off a cool data science project with some exploratory analysis across various slices of your data, you can use the scatter_matrix function and simply dump a larger slice of your data into the function call. This gives you an N x N matrix of subplots, one scatter plot for every pair of columns, which lets you compare the relationships in your data side by side and pinpoint interesting features to exploit in your data wrangling endeavours. And the best part? You guessed it, it’s all interactive.
fig1scattermatrix = px.scatter_matrix(stocks.iloc[:,3:], width = 750, height = 750)
fig1scattermatrix.show()
While Plotly Express is very useful for quick results, it is fairly limited in how much control it gives you, and it gets rather cumbersome when you want to combine several traces in the same plot. In the following section we will have a look at Plotly’s graph objects and how you can use them to make your chart look truly stunning and unique. Let’s start by importing the graph_objects module.
import plotly.graph_objects as go
To illustrate how graph_objects work, we will first produce a simple bar chart. In contrast to Plotly Express, you need to provide the data you want to display and the layout you want to use separately. To show how this works, we pass lists of values for our x- and y-axis directly rather than using a predefined data set. The values are wrapped in a go.Bar object, which can be swapped for other trace types to produce scatter plots, line charts and many more. Along the same lines, we create a simple layout with go.Layout that keeps the standard setup and assigns a custom chart title. Subsequently, you can use the update_layout method to change the layout of the existing chart “fig”; here it sets the font family.
fig = go.Figure(
    data=[go.Bar(x=[1, 2, 3], y=[1, 3, 2])],
    layout=go.Layout(
        title=go.layout.Title(text="A boring bar chart")
    )
)
fig.update_layout(font_family="Courier New")
fig.show()
Fundamentally, a Plotly figure is described by a JSON document that stores information about the chart in much the same way a Python dictionary would store data. To see what that means, let’s use the bar chart from above and the to_json method provided by Plotly. We will also use Python’s json package, which provides useful functions for working with JSON in your scripts. To cut some of the junk from the output, we first remove the default template that Plotly applies by setting fig.layout.template to None.
import json
fig.layout.template = None
bar_json = json.loads(fig.to_json())
print(json.dumps(bar_json, indent=4, sort_keys=True))
{
"data": [
{
"type": "bar",
"x": [
1,
2,
3
],
"y": [
1,
3,
2
]
}
],
"layout": {
"font": {
"family": "Courier New"
},
"title": {
"text": "A boring bar chart"
}
}
}
As you can see, all the information about your chart is stored in a tree-like data structure of attributes and values, such as our x and y data, the chart title and the changed font family. By manipulating this data you could in principle build any Plotly chart from scratch. In fact, let’s pass the parsed JSON, which is now a Python dictionary, directly to the Figure constructor and see what happens.
fig = go.Figure(bar_json)
fig.show()
Unsurprisingly, the same chart as before is displayed. You can imagine that building up a chart from scratch this way for large datasets and complicated layouts is not convenient if you need quick results. Fortunately, Plotly Express and Plotly Graph Objects provide all the functions you need to avoid that hassle.
To close the chapter, let’s revisit an example that we have already seen in the Plotly Express section of the tutorial. We create an empty figure and then add one line trace per stock to display the evolution of stock prices over time. Please note that graph_objects has no dedicated “Line” trace type; you need to use go.Scatter for your line charts.
fig = go.Figure()
for stock in stocks.columns[1:]:
    fig.add_trace(go.Scatter(x=stocks['date'], y=stocks[stock], name=stock))
fig.show()
This closes the first chapter of our journey towards mastering the Plotly charting library. In the subsequent chapters we will look into further chart types and additional features provided by Plotly, as well as some more advanced topics such as controls and animations. Thanks for reading :).