Analyzing CSV data with pandas and matplotlib

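The data analyzed here is a partly modified version of an exercise from 統計学基礎 (Statistics Basics), the Japan Statistical Society's officially certified textbook for the Statistics Certification (統計検定) Grade 2.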

day,Temperature,humidity,sunlighth,Wind
1,6.6,33.0,7.9,NorthWest
2,7.0,41.0,8.4,NorthNorthWest
3,5.9,48.0,5.2,NorthNorthWest
4,6.3,40.0,8.4,NorthWest
5,7.3,39.0,7.4,SouthWest
6,6.5,34.0,6.7,NorthWest
7,4.0,25.0,9.2,NorthWest
8,5.9,33.0,9.2,NorthNorthWest
9,6.1,46.0,9.1,EastNorthEast
10,3.4,27.0,9.2,NorthNorthWest
11,3.8,31.0,6.7,NorthNorthWest
12,5.1,37.0,8.4,NorthNorthWest
13,4.4,28.0,8.6,NorthNorthWest
14,3.8,36.0,9.1,WestNorthWest
15,4.0,52.0,1.1,EastNorthEast
16,2.2,39.0,8.2,NorthNorthWest
17,5.0,26.0,8.7,NorthNorthWest
18,5.5,36.0,9.4,NorthWest
19,6.3,41.0,9.3,NorthNorthWest
20,5.4,31.0,9.4,NorthNorthWest
21,5.0,28.0,9.3,NorthWest
22,6.0,34.0,9.4,NorthNorthWest
23,5.7,42.0,6.3,SouthEast
24,5.1,65.0,3.5,NorthWest
25,5.9,34.0,8.6,NorthWest
26,5.3,37.0,7.3,NorthWest
27,5.5,28.0,8.0,NorthNorthWest
28,3.7,30.0,9.3,NorthNorthWest
29,4.2,40.0,7.8,NorthNorthWest
30,2.9,34.0,5.1,NorthWest
31,2.9,28.0,9.7,WestNorthWest

Bar charts of the frequency and relative frequency of wind direction (Wind)

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("/Users/technote/Documents//weather.csv", encoding='Shift_JIS')

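n = float(data["day"].count())  # number of days; float avoids integer division on Python 2
# Count the days per wind direction; to_frame() names the column shown in the legend
frequency = data.groupby('Wind')['day'].count().to_frame('frequency')
# Divide each count by n to get that direction's share of all days
relative_frequency = (frequency['frequency'] / n).to_frame('relative frequency')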

frequency.plot(kind='bar')
relative_frequency.plot(kind='bar')

plt.show()
  • Frequency: [figure: bar chart of the number of days per wind direction]

  • Relative frequency: [figure: bar chart of the share of days per wind direction]
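
The same two tables can also be obtained without groupby. The following is a minimal sketch, assuming only the Wind column matters; it embeds that column from the table above so it runs without the CSV file, and the names wind and table are chosen here for illustration.

```python
import pandas as pd

# Wind column of the 31-day table above, in day order
wind = pd.Series("""
NorthWest NorthNorthWest NorthNorthWest NorthWest SouthWest NorthWest NorthWest
NorthNorthWest EastNorthEast NorthNorthWest NorthNorthWest NorthNorthWest
NorthNorthWest WestNorthWest EastNorthEast NorthNorthWest NorthNorthWest
NorthWest NorthNorthWest NorthNorthWest NorthWest NorthNorthWest SouthEast
NorthWest NorthWest NorthWest NorthNorthWest NorthNorthWest NorthNorthWest
NorthWest WestNorthWest
""".split(), name="Wind")

# value_counts() counts each distinct value, most frequent first
frequency = wind.value_counts()
# normalize=True divides every count by the total number of observations
relative_frequency = wind.value_counts(normalize=True)

# Put both columns side by side; the two Series are aligned on wind direction
table = pd.DataFrame({
    "frequency": frequency,
    "relative frequency": relative_frequency,
})
print(table)
```

Either column of table can then be drawn with .plot(kind='bar'), just like the DataFrames in the code above.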

Frequency distribution table and bar chart of mean temperature

  • The class width is set to 1 degree.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv("/Users/workspace/PSCS/weather.csv", encoding='Shift_JIS')

# Count the days observed at each distinct temperature value
frequency = data.groupby('Temperature')['day'].count().to_frame('frequency')

# Assign each temperature value to a 1-degree class: (1, 2], (2, 3], ..., (8, 9]
c = pd.cut(frequency.index, np.arange(1, 10, 1))

# Sum the per-value counts within each class
classified_frequency = frequency.groupby(c).sum()

print(classified_frequency)

classified_frequency.plot(kind='bar')

plt.show()
  • Frequency distribution table (NaN marks a class with no observations)
       frequency
(1, 2]        NaN
(2, 3]          3
(3, 4]          6
(4, 5]          4
(5, 6]         11
(6, 7]          6
(7, 8]          1
(8, 9]        NaN
  • Bar chart

[figure: bar chart of frequency per 1-degree temperature class]
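
The code above counts each distinct temperature first and then groups those counts into classes. A shorter route is to bin the raw column directly. This is a sketch under that assumption, with the Temperature column embedded from the table above so it runs on its own; the names temperature, classes and table are illustrative.

```python
import numpy as np
import pandas as pd

# Temperature column of the 31-day table above, in day order
temperature = pd.Series([
    6.6, 7.0, 5.9, 6.3, 7.3, 6.5, 4.0, 5.9, 6.1, 3.4, 3.8,
    5.1, 4.4, 3.8, 4.0, 2.2, 5.0, 5.5, 6.3, 5.4, 5.0, 6.0,
    5.7, 5.1, 5.9, 5.3, 5.5, 3.7, 4.2, 2.9, 2.9,
], name="Temperature")

# Right-closed 1-degree classes: (1, 2], (2, 3], ..., (8, 9]
bins = np.arange(1, 10, 1)
classes = pd.cut(temperature, bins)

# sort=False keeps the classes in ascending order instead of ordering by count
table = classes.value_counts(sort=False).to_frame("frequency")
print(table)
```

With this approach the empty classes (1, 2] and (8, 9] are reported with a count of 0 rather than NaN, and table.plot(kind='bar') draws the same bar chart.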

Box plot (box-and-whisker plot) of mean temperature

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("/Users//PSCS/weather.csv", encoding='Shift_JIS')

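# Select the column as a one-column DataFrame so the box is labeled 'Temperature'
temp = data[['Temperature']]

# return_type='axes' makes boxplot return the matplotlib Axes it drew on
temp.boxplot(return_type='axes')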

plt.show()

[figure: box plot of mean temperature]
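
To read the box plot numerically, the values it draws can be computed by hand. The sketch below assumes matplotlib's default whisker rule (whiskers reach the most extreme data points within 1.5 times the interquartile range of the box) and embeds the Temperature column from the table above; the variable names are illustrative.

```python
import pandas as pd

# Temperature column of the 31-day table above, in day order
temperature = pd.Series([
    6.6, 7.0, 5.9, 6.3, 7.3, 6.5, 4.0, 5.9, 6.1, 3.4, 3.8,
    5.1, 4.4, 3.8, 4.0, 2.2, 5.0, 5.5, 6.3, 5.4, 5.0, 6.0,
    5.7, 5.1, 5.9, 5.3, 5.5, 3.7, 4.2, 2.9, 2.9,
], name="Temperature")

# Box edges and center line: first quartile, median, third quartile
q1, median, q3 = temperature.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Assumption: the default whisker factor of 1.5 is used
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme observations inside the fences;
# anything outside the fences would be drawn as an individual outlier point
inside = temperature[(temperature >= lower_fence) & (temperature <= upper_fence)]
outliers = temperature[(temperature < lower_fence) | (temperature > upper_fence)]

print("Q1=%.2f median=%.2f Q3=%.2f IQR=%.2f" % (q1, median, q3, iqr))
print("whiskers: %.1f to %.1f" % (inside.min(), inside.max()))
print("outliers:", outliers.tolist())
```

For this data no day falls outside the fences, so the whiskers simply span the minimum and maximum temperatures.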

Correlation between temperature and humidity

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv("/Users/tokutake/Documents/workspace/PSCS/weather.csv", encoding='Shift_JIS')

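# Pearson correlation coefficient: the off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(data['Temperature'], data['humidity'])[0, 1]

plt.scatter(data['Temperature'], data['humidity'])

# Show r in the title, rounded to two decimals
plt.title('r=%.2f' % r, size=16)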
plt.xlabel('Temperature', size=14)
plt.ylabel('humidity(%)', size=14)

plt.show()

[figure: scatter plot of humidity (%) against temperature, with r in the title]
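
The value returned by np.corrcoef can be cross-checked against the definition of the Pearson correlation coefficient, the sum of products of deviations divided by the square root of the product of the sums of squared deviations. This is a minimal sketch with the Temperature and humidity columns embedded from the table above; dx, dy and r_numpy are names introduced here for illustration.

```python
import numpy as np

# Temperature and humidity columns of the 31-day table above, in day order
temperature = np.array([
    6.6, 7.0, 5.9, 6.3, 7.3, 6.5, 4.0, 5.9, 6.1, 3.4, 3.8,
    5.1, 4.4, 3.8, 4.0, 2.2, 5.0, 5.5, 6.3, 5.4, 5.0, 6.0,
    5.7, 5.1, 5.9, 5.3, 5.5, 3.7, 4.2, 2.9, 2.9,
])
humidity = np.array([
    33.0, 41.0, 48.0, 40.0, 39.0, 34.0, 25.0, 33.0, 46.0, 27.0, 31.0,
    37.0, 28.0, 36.0, 52.0, 39.0, 26.0, 36.0, 41.0, 31.0, 28.0, 34.0,
    42.0, 65.0, 34.0, 37.0, 28.0, 30.0, 40.0, 34.0, 28.0,
])

# Deviations of each observation from its column mean
dx = temperature - temperature.mean()
dy = humidity - humidity.mean()

# Pearson r: covariance term divided by the product of the spread terms
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Cross-check against NumPy's correlation matrix
r_numpy = np.corrcoef(temperature, humidity)[0, 1]

print("r (by definition) = %.4f" % r)
print("r (np.corrcoef)   = %.4f" % r_numpy)
```

The two printed values should agree up to floating-point rounding.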