Interactive Data Visualization: Plotly Tutorial

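Summary

Plotly is an interactive, publication-quality online graphing library. Figures drawn with Plotly are delivered as HTML pages, and the interactivity is implemented in JavaScript.

The examples below cover line charts, scatter plots, bar charts, pie charts, bubble charts, histograms, box plots, and more.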

1. Data Preparation

Data source: https://www.kaggle.com/mylesoneill/world-university-rankings

Load the data:

import pandas as pd

timesData = pd.read_csv("./input/timesData.csv")
print(timesData.info())

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2603 entries, 0 to 2602
Data columns (total 14 columns):
world_rank                2603 non-null object
university_name           2603 non-null object
country                   2603 non-null object
teaching                  2603 non-null float64
international             2603 non-null object
research                  2603 non-null float64
citations                 2603 non-null float64
income                    2603 non-null object
total_score               2603 non-null object
num_students              2544 non-null object
student_staff_ratio       2544 non-null float64
international_students    2536 non-null object
female_male_ratio         2370 non-null object
year                      2603 non-null int64
dtypes: float64(4), int64(1), object(9)
memory usage: 284.8+ KB
None
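
The summary lists several score columns (international, income, total_score) as object rather than float64, which means pandas read them as text. Below is a minimal sketch of one way to convert such values, assuming the column marks missing scores with a placeholder such as "-". The helper name to_float and the sample values are illustrative only.

```python
def to_float(value):
    """Return value as a float, or None if it is not a number (e.g. '-')."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


# Hypothetical sample of a score column that was read as text
income_raw = ["34.5", "83.7", "-", "64.3"]
income = [to_float(v) for v in income_raw]
print(income)  # [34.5, 83.7, None, 64.3]
```

With pandas, pd.to_numeric(timesData.income, errors="coerce") performs a comparable conversion on a whole column and yields NaN instead of None.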

Inspect the data:

print(timesData.head(10))

Output:

   world_rank  university_name                        country                   teaching  international  research  citations  income  total_score  num_students  student_staff_ratio  international_students  female_male_ratio  year
0  1           Harvard University                     United States of America  99.7      72.4           98.7      98.8       34.5    96.1         20,152        8.9                  25%                     -                  2011
1  2           California Institute of Technology     United States of America  97.7      54.6           98        99.9       83.7    96           2,243         6.9                  27%                     33 : 67            2011
2  3           Massachusetts Institute of Technology  United States of America  97.8      82.3           91.4      99.9       87.5    95.6         11,074        9                    33%                     37 : 63            2011
3  4           Stanford University                    United States of America  98.3      29.5           98.1      99.2       64.3    94.3         15,596        7.8                  22%                     42 : 58            2011
4  5           Princeton University                   United States of America  90.9      70.3           95.4      99.9       -       94.2         7,929         8.4                  27%                     45 : 55            2011
5  6           University of Cambridge                United Kingdom            90.5      77.7           94.1      94         57      91.2         18,812        11.8                 34%                     46 : 54            2011
6  6           University of Oxford                   United Kingdom            88.2      77.2           93.9      95.1       73.5    91.2         19,919        11.6                 34%                     46 : 54            2011
7  8           University of California, Berkeley     United States of America  84.2      39.6           99.3      97.8       -       91.1         36,186        16.4                 15%                     50 : 50            2011
8  9           Imperial College London                United Kingdom            89.2      90             94.5      88.3       92.9    90.6         15,060        11.7                 51%                     37 : 63            2011
9  10          Yale University                        United States of America  92.1      59.2           89.7      91.5       -       89.5         11,751        4.4                  20%                     50 : 50            2011
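
The sample rows suggest why some columns are stored as text: num_students uses a thousands separator, international_students is a percentage, and female_male_ratio is a pair of numbers. Here is a hedged sketch of parsers for these formats. The function names are hypothetical, and only the formats visible in the sample rows are handled.

```python
def parse_count(text):
    """'20,152' -> 20152 (drop the thousands separator)."""
    return int(text.replace(",", ""))


def parse_percent(text):
    """'25%' -> 0.25."""
    return float(text.rstrip("%")) / 100


def parse_ratio(text):
    """'33 : 67' -> (33, 67)."""
    left, right = text.split(":")
    return int(left), int(right)


print(parse_count("20,152"))   # 20152
print(parse_percent("25%"))    # 0.25
print(parse_ratio("33 : 67"))  # (33, 67)
```

Placeholder values such as "-" would need to be filtered out before these helpers are applied.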

2. Line Charts

2.1. Procedure

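  • Import the data
  • Create a trace
    • x = x-axis data
    • y = y-axis data
    • mode = how the trace is drawn (lines, markers, or both)
    • name = legend label
    • marker = marker style
      • color = line color, given as an RGB(A) string
    • text = hover text for each point (here, the university name)
  • data = a list of the traces to draw
  • layout = a dictionary of layout settings
    • title = chart title
    • xaxis = style settings for the x axis
      • title = x-axis label
      • ticklen = length of the tick marks on the x axis
      • zeroline = boolean; whether to draw the line at x = 0, that is, the y axis
  • fig = a dictionary holding the data and the layout
  • plot = draw the figure; the local offline mode is used here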
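
The steps above can be sketched with plain Python dictionaries, since a figure is just data plus layout. The values below are made up for illustration. No Plotly import is needed to build the structure, and the rendering call is left as a comment because it assumes Plotly is installed.

```python
import json

# A trace: what to draw and how (illustrative values, not the real dataset)
trace = dict(
    type="scatter",
    x=[1, 2, 3],
    y=[98.8, 99.9, 99.9],
    mode="lines+markers",
    name="citations",
    marker=dict(color="rgba(16, 112, 2, 0.8)"),
    text=["University A", "University B", "University C"],
)

# The layout: title and axis styling
layout = dict(
    title="Citations vs World Rank",
    xaxis=dict(title="World Rank", ticklen=5, zeroline=False),
)

# The figure: a dictionary holding both parts
fig = dict(data=[trace], layout=layout)
print(sorted(fig))  # ['data', 'layout']

# The structure is plain data and survives a JSON round trip
assert json.loads(json.dumps(fig)) == fig

# To render it (assuming Plotly is installed):
# import plotly.offline as pltof
# pltof.plot(fig)
```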

2.2. Example

import plotly.graph_objs as go
import plotly.offline as pltof

# Use the first 100 rows of the dataset
df = timesData.iloc[:100, :]

# Trace 1: citations score, drawn as a plain line
trace1 = go.Scatter(
    x=df.world_rank,
    y=df.citations,
    mode="lines",
    name="citations",
    marker=dict(color='rgba(16, 112, 2, 0.8)'),
    text=df.university_name)

# Trace 2: teaching score, drawn as a line with markers
trace2 = go.Scatter(
    x=df.world_rank,
    y=df.teaching,
    mode="lines+markers",
    name="teaching",
    marker=dict(color='rgba(80, 26, 80, 0.8)'),
    text=df.university_name)

data = [trace1, trace2]
layout = dict(
    title='Citation and Teaching vs World Rank of Top 100 Universities',
    xaxis=dict(title='World Rank', ticklen=5, zeroline=False)
)
fig = dict(data=data, layout=layout)

# Writes an HTML file and opens it in the browser
pltof.plot(fig)

After the script runs successfully, a page opens in the browser:

GRAPH

Moving the mouse over the figure shows the data for the point under the cursor in real time:

GRAPH

3. Scatter Plots

import plotly.graph_objs as go
import plotly.offline as pltof

# Top 100 universities for each of the three years
df2014 = timesData[timesData.year == 2014].iloc[:100, :]
df2015 = timesData[timesData.year == 2015].iloc[:100, :]
df2016 = timesData[timesData.year == 2016].iloc[:100, :]

# One marker-only trace per year, each with its own color
trace1 = go.Scatter(
    x=df2014.world_rank,
    y=df2014.citations,
    mode="markers",
    name="2014",
    marker=dict(color='rgba(255, 128, 255, 0.8)'),
    text=df2014.university_name)

trace2 = go.Scatter(
    x=df2015.world_rank,
    y=df2015.citations,
    mode="markers",
    name="2015",
    marker=dict(color='rgba(255, 128, 2, 0.8)'),
    text=df2015.university_name)

trace3 = go.Scatter(
    x=df2016.world_rank,
    y=df2016.citations,
    mode="markers",
    name="2016",
    marker=dict(color='rgba(0, 255, 200, 0.8)'),
    text=df2016.university_name)

data = [trace1, trace2, trace3]
layout = dict(
    title='Citations vs World Rank of the Top 100 Universities in 2014, 2015 and 2016',
    xaxis=dict(title='World Rank', ticklen=5, zeroline=False),
    yaxis=dict(title='Citation', ticklen=5, zeroline=False)
)
fig = dict(data=data, layout=layout)

pltof.plot(fig)

Result:

GRAPH
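
The three traces differ only in the year and the color, so they can also be generated in a loop. The sketch below builds the trace dictionaries from a small hypothetical list of rows rather than from timesData. The helper name make_trace and the sample rows are assumptions for illustration.

```python
# Hypothetical rows: (year, world_rank, citations, university_name)
rows = [
    (2014, "1", 99.8, "University A"),
    (2014, "2", 98.5, "University B"),
    (2015, "1", 99.7, "University A"),
    (2016, "1", 99.8, "University A"),
]

# One color per year, taken from the example above
colors = {
    2014: "rgba(255, 128, 255, 0.8)",
    2015: "rgba(255, 128, 2, 0.8)",
    2016: "rgba(0, 255, 200, 0.8)",
}


def make_trace(year):
    """Build one marker-only scatter trace for the given year."""
    selected = [r for r in rows if r[0] == year]
    return dict(
        type="scatter",
        x=[r[1] for r in selected],
        y=[r[2] for r in selected],
        mode="markers",
        name=str(year),
        marker=dict(color=colors[year]),
        text=[r[3] for r in selected],
    )


data = [make_trace(year) for year in sorted(colors)]
print([t["name"] for t in data])  # ['2014', '2015', '2016']
```

Adding another year then only requires a new entry in colors.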

4. Bar Charts

import plotly.graph_objs as go
import plotly.offline as pltof

# Top 3 universities of 2014
df2014 = timesData[timesData.year == 2014].iloc[:3, :]

trace1 = go.Bar(
    x=df2014.university_name,
    y=df2014.citations,
    name="citations",
    marker=dict(color='rgba(255, 174, 255, 0.5)',
                line=dict(color='rgb(0,0,0)', width=1.5)),
    text=df2014.country)

trace2 = go.Bar(
    x=df2014.university_name,
    y=df2014.teaching,
    name="teaching",
    marker=dict(color='rgba(255, 255, 128, 0.5)',
                line=dict(color='rgb(0,0,0)', width=1.5)),
    text=df2014.country)

data = [trace1, trace2]
# barmode="group" places the bars of the two traces side by side
layout = go.Layout(barmode="group")
fig = go.Figure(data=data, layout=layout)

pltof.plot(fig)

Result:

GRAPH

5. Stacked Bar Charts

import plotly.graph_objs as go
import plotly.offline as pltof

# Top 3 universities of 2014
df2014 = timesData[timesData.year == 2014].iloc[:3, :]

x = df2014.university_name

# Traces can also be written as plain dictionaries with a 'type' key
trace1 = {
    'x': x,
    'y': df2014.citations,
    'name': 'citation',
    'type': 'bar'
}
trace2 = {
    'x': x,
    'y': df2014.teaching,
    'name': 'teaching',
    'type': 'bar'
}
data = [trace1, trace2]
layout = {
    'xaxis': {'title': 'Top 3 universities'},
    # 'relative' stacks the bars on top of one another
    'barmode': 'relative',
    'title': 'citations and teaching of top 3 universities in 2014'
}
fig = go.Figure(data=data, layout=layout)

pltof.plot(fig)

Result:

GRAPH
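
In a stacked bar chart each bar starts where the previous trace's bar ended. Below is a small sketch of that arithmetic, using made-up scores for three universities. It illustrates the idea only and is not a description of how Plotly is implemented internally.

```python
citations = [99.0, 98.0, 97.0]  # hypothetical values
teaching = [95.0, 90.0, 92.0]   # hypothetical values

# The first trace sits on the axis; the second starts on top of it
base_teaching = citations
top_teaching = [c + t for c, t in zip(citations, teaching)]

for name, base, top in zip(["A", "B", "C"], base_teaching, top_teaching):
    print(f"University {name}: teaching bar spans {base} to {top}")
```

The total bar height is therefore the sum of both scores, which is worth keeping in mind when reading values off the y axis.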

6. Bar Chart Combined with a Line Chart

import numpy as np
import plotly.graph_objs as go
from plotly import tools
import plotly.offline as pltof

# Top 7 universities of 2016
df2016 = timesData[timesData.year == 2016].iloc[:7, :]

y_saving = [each for each in df2016.research]
y_net_worth = [float(each) for each in df2016.income]  # income is stored as text
x_saving = [each for each in df2016.university_name]
x_net_worth = [each for each in df2016.university_name]

# Left panel: horizontal bars for the research score
trace0 = go.Bar(
    x=y_saving,
    y=x_saving,
    marker=dict(color='rgba(171, 50, 96, 0.6)',
                line=dict(color='rgba(171, 50, 96, 1.0)', width=1)),
    name='research',
    orientation='h',
)
# Right panel: line with markers for the income score
trace1 = go.Scatter(
    x=y_net_worth,
    y=x_net_worth,
    mode='lines+markers',
    line=dict(color='rgb(63, 72, 204)'),
    name='income',
)
layout = dict(
    title='Research and income',
    yaxis=dict(showticklabels=True, domain=[0, 0.85]),
    yaxis2=dict(showline=True, showticklabels=False,
                linecolor='rgba(102, 102, 102, 0.8)', linewidth=2, domain=[0, 0.85]),
    xaxis=dict(zeroline=False, showline=False, showticklabels=True,
               showgrid=True, domain=[0, 0.42]),
    xaxis2=dict(zeroline=False, showline=False, showticklabels=True,
                showgrid=True, domain=[0.47, 1], side='top', dtick=25),
    legend=dict(x=0.029, y=1.038, font=dict(size=10)),
    margin=dict(l=200, r=20, t=70, b=70),
    paper_bgcolor='rgb(248, 248, 255)',
    plot_bgcolor='rgb(248, 248, 255)',
)

# Label every bar and every point with its value
annotations = []
y_s = np.round(y_saving, decimals=2)
y_nw = np.rint(y_net_worth)

for ydn, yd, xd in zip(y_nw, y_s, x_saving):
    annotations.append(dict(xref='x2', yref='y2', y=xd, x=ydn - 4,
                            text='{:,}'.format(ydn),
                            font=dict(family='Arial', size=12, color='rgb(63, 72, 204)'),
                            showarrow=False))
    annotations.append(dict(xref='x1', yref='y1', y=xd, x=yd + 3,
                            text=str(yd),
                            font=dict(family='Arial', size=12, color='rgb(171, 50, 96)'),
                            showarrow=False))

layout['annotations'] = annotations

# One row, two columns: bars on the left, line on the right
fig = tools.make_subplots(rows=1, cols=2, specs=[[{}, {}]], shared_xaxes=True,
                          shared_yaxes=False, vertical_spacing=0.001)

fig.append_trace(trace0, 1, 1)
fig.append_trace(trace1, 1, 2)

fig['layout'].update(layout)

pltof.plot(fig)

Result:

GRAPH

7. Pie Charts

import plotly.offline as pltof

# Top 7 universities of 2016
df2016 = timesData[timesData.year == 2016].iloc[:7, :]
# num_students is text such as "20,152"; swapping the comma for a dot
# gives the count in thousands, which leaves the proportions unchanged
pie1_list = [float(each.replace(',', '.')) for each in df2016.num_students]
labels = df2016.university_name

fig = {
    "data": [
        {
            "values": pie1_list,
            "labels": labels,
            "domain": {"x": [0, .5]},
            "name": "Number Of Students Rates",
            "hoverinfo": "label+percent+name",
            "hole": .3,  # a hole in the middle turns the pie into a donut
            "type": "pie"
        },
    ],
    "layout": {
        "title": "Universities Number of Students rates",
        "annotations": [
            {
                "font": {"size": 20},
                "showarrow": False,
                "text": "Number of Students",
                "x": 0.20,
                "y": 1
            },
        ]
    }
}

pltof.plot(fig)

Result:

GRAPH
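
A pie chart shows each value as a share of the total, which is why replacing the comma with a dot (and so dividing every count by 1,000) does not change the picture. Below is a sketch of the computation, using the num_students values of the first three rows printed in section 1. The pie in this section uses 2016 data, which is not reproduced here.

```python
num_students = ["20,152", "2,243", "11,074"]

# Two ways to turn the text into numbers
exact = [float(s.replace(",", "")) for s in num_students]    # full counts
scaled = [float(s.replace(",", ".")) for s in num_students]  # counts in thousands


def shares(values):
    """Return each value as a fraction of the total."""
    total = sum(values)
    return [v / total for v in values]


# Both conversions give the same shares
for a, b in zip(shares(exact), shares(scaled)):
    print(f"{a:.4f} {b:.4f}")
```

The dot replacement only works when every value contains a thousands separator. A count below 1,000, such as "462", would be read as 462 rather than 0.462 and distort the chart.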

8. Bubble Charts

import plotly.offline as pltof

# Top 20 universities of 2016
df2016 = timesData[timesData.year == 2016].iloc[:20, :]
# Bubble size: number of students in thousands ("20,152" -> 20.152)
num_students_size = [float(each.replace(',', '.')) for each in df2016.num_students]
# Bubble color: international score, which is stored as text in the CSV
international_color = [float(each) for each in df2016.international]
data = [{
    'y': df2016.teaching,
    'x': df2016.world_rank,
    'mode': 'markers',
    'marker': {
        'color': international_color,
        'size': num_students_size,
        'showscale': True  # show the color bar
    },
    "text": df2016.university_name
}]

pltof.plot(data)

Result:

GRAPH
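
Marker sizes are given in pixels, so raw student counts would produce enormous bubbles. The code above sidesteps this by turning "20,152" into 20.152. An alternative sketch keeps the real counts and scales them through the marker's sizeref and sizemode attributes, following the scaling rule suggested in the Plotly bubble chart documentation. The maximum of 40 pixels is an arbitrary choice, and the sample values are the first three rows printed in section 1.

```python
num_students = ["20,152", "2,243", "11,074"]
sizes = [int(s.replace(",", "")) for s in num_students]

desired_max_px = 40  # arbitrary choice for the largest bubble
sizeref = 2.0 * max(sizes) / (desired_max_px ** 2)

marker = dict(
    size=sizes,
    sizemode="area",  # bubble area, not diameter, scales with the value
    sizeref=sizeref,
)
print(round(sizeref, 2))  # 25.19
```

The resulting marker dictionary could replace the 'marker' entry of the trace, with 'color' and 'showscale' added back as needed.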

9. Histograms

import plotly.graph_objs as go
import plotly.offline as pltof

# Student-staff ratios of all universities ranked in 2011 and in 2012
x2011 = timesData.student_staff_ratio[timesData.year == 2011]
x2012 = timesData.student_staff_ratio[timesData.year == 2012]

trace1 = go.Histogram(
    x=x2011,
    opacity=0.75,
    name="2011",
    marker=dict(color='rgba(171, 50, 96, 0.6)'))
trace2 = go.Histogram(
    x=x2012,
    opacity=0.75,
    name="2012",
    marker=dict(color='rgba(12, 50, 196, 0.6)'))

data = [trace1, trace2]
# barmode='overlay' draws both histograms on top of each other
layout = go.Layout(barmode='overlay',
                   title='Student-staff ratio in 2011 and 2012',
                   xaxis=dict(title='Student-staff ratio'),
                   yaxis=dict(title='Count'),
                   )
fig = go.Figure(data=data, layout=layout)
# Plot fig, not data, so that the layout is applied
pltof.plot(fig)

Result:

GRAPH
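
A histogram groups the values into bins and counts how many fall into each one. The sketch below does this by hand for the ten student_staff_ratio values printed in section 1, with a bin width of 5 chosen for illustration. Plotly picks its own bin size automatically unless told otherwise.

```python
from collections import Counter

ratios = [8.9, 6.9, 9.0, 7.8, 8.4, 11.8, 11.6, 16.4, 11.7, 4.4]
bin_width = 5

# Map each value to the lower edge of its bin, then count per bin
counts = Counter(int(r // bin_width) * bin_width for r in ratios)

for edge in sorted(counts):
    print(f"[{edge}, {edge + bin_width}): {counts[edge]}")
# [0, 5): 1
# [5, 10): 5
# [10, 15): 3
# [15, 20): 1
```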

10. Word Cloud

import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud

timesData = pd.read_csv("./input/timesData.csv")

# Countries of all universities ranked in 2011
x2011 = timesData.country[timesData.year == 2011]

# The word cloud is rendered with matplotlib rather than Plotly
plt.subplots(figsize=(8, 8))
wordcloud = WordCloud(
    background_color='white',
    width=512,
    height=384
).generate(" ".join(x2011))
plt.imshow(wordcloud)
plt.axis('off')

plt.show()

Result:

GRAPH
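
A word cloud sizes each word by how often it occurs in the joined text. Joining the country names with spaces also splits multi-word names into separate words. Below is a sketch of the underlying counts, using the country column of the ten rows printed in section 1.

```python
from collections import Counter

countries = ["United States of America"] * 7 + ["United Kingdom"] * 3

# Frequency per country
print(Counter(countries).most_common())

# Frequency per word, which is closer to what " ".join(...) feeds the word cloud
words = " ".join(countries).split()
print(Counter(words).most_common(3))
```

The wordcloud package applies its own processing on top of this, such as dropping common stopwords, so the sizes in the rendered image may not match these raw counts exactly.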

11. Box Plots

import pandas as pd
import plotly.graph_objs as go
import plotly.offline as pltof

timesData = pd.read_csv("./input/timesData.csv")
x2015 = timesData[timesData.year == 2015]

trace0 = go.Box(
    # total_score is stored as text (see the info() output), so convert it;
    # entries that are not numbers become NaN
    y=pd.to_numeric(x2015.total_score, errors='coerce'),
    name='total score of universities in 2015',
    marker=dict(
        color='rgb(12, 12, 140)',
    )
)
trace1 = go.Box(
    y=x2015.research,
    name='research of universities in 2015',
    marker=dict(
        color='rgb(12, 128, 128)',
    )
)
data = [trace0, trace1]

pltof.plot(data)

Result:

GRAPH
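
A box plot summarizes a column by its minimum, lower quartile, median, upper quartile, and maximum. Below is a sketch of that summary with the standard library, applied to the ten teaching scores printed in section 1. The choice of the "inclusive" quantile method is an assumption, and Plotly's own quartile computation may give slightly different values.

```python
import statistics

teaching = [99.7, 97.7, 97.8, 98.3, 90.9, 90.5, 88.2, 84.2, 89.2, 92.1]

# Quartiles split the sorted data into four equal parts
q1, median, q3 = statistics.quantiles(teaching, n=4, method="inclusive")

print("min:", min(teaching))
print("q1:", q1, "median:", median, "q3:", q3)
print("max:", max(teaching))
```

The box spans q1 to q3, the line inside it marks the median, and the whiskers extend toward the minimum and maximum.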

12. Scatter Plot Matrix

import numpy as np
import pandas as pd
import plotly.figure_factory as ff
import plotly.offline as pltof

timesData = pd.read_csv("./input/timesData.csv")

# Keep three score columns of the 2015 ranking
dataframe = timesData[timesData.year == 2015]
data2015 = dataframe.loc[:, ["research", "international", "total_score"]]
# Add a running number; it is passed below as the index column of the matrix
data2015["index"] = np.arange(1, len(data2015) + 1)

# Scatter plots off the diagonal, box plots on the diagonal
fig = ff.create_scatterplotmatrix(data2015, diag='box', index='index', colormap='Portland',
                                  colormap_type='cat',
                                  height=1100, width=1100)

pltof.plot(fig)

Result:

GRAPH

13. 3D Scatter Plots

import pandas as pd
import plotly.graph_objs as go
import plotly.offline as pltof

timesData = pd.read_csv("./input/timesData.csv")
dataframe = timesData[timesData.year == 2015]

# One point per university: rank, research score and citations score
trace1 = go.Scatter3d(
    x=dataframe.world_rank,
    y=dataframe.research,
    z=dataframe.citations,
    mode='markers',
    marker=dict(
        size=10,
        color='rgb(255,0,0)'
    )
)

data = [trace1]
# Remove the margins so the 3D scene fills the page
layout = go.Layout(
    margin=dict(
        l=0,
        r=0,
        b=0,
        t=0
    )
)
fig = go.Figure(data=data, layout=layout)
pltof.plot(fig)

Result:

GRAPH

Inspecting a single point:

GRAPH