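Analyzing CSV data with pandas and matplotlib
The data analyzed here is a partly modified version of a practice problem from 日本統計学会公式認定 統計検定2級対応 統計学基礎 (Statistics Basics, the Japan Statistical Society's officially certified textbook for the Statistics Certification, Grade 2).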
day,Temperature,humidity,sunlighth,Wind
1,6.6,33.0,7.9,NorthWest
2,7.0,41.0,8.4,NorthNorthWest
3,5.9,48.0,5.2,NorthNorthWest
4,6.3,40.0,8.4,NorthWest
5,7.3,39.0,7.4,SouthWest
6,6.5,34.0,6.7,NorthWest
7,4.0,25.0,9.2,NorthWest
8,5.9,33.0,9.2,NorthNorthWest
9,6.1,46.0,9.1,EastNorthEast
10,3.4,27.0,9.2,NorthNorthWest
11,3.8,31.0,6.7,NorthNorthWest
12,5.1,37.0,8.4,NorthNorthWest
13,4.4,28.0,8.6,NorthNorthWest
14,3.8,36.0,9.1,WestNorthWest
15,4.0,52.0,1.1,EastNorthEast
16,2.2,39.0,8.2,NorthNorthWest
17,5.0,26.0,8.7,NorthNorthWest
18,5.5,36.0,9.4,NorthWest
19,6.3,41.0,9.3,NorthNorthWest
20,5.4,31.0,9.4,NorthNorthWest
21,5.0,28.0,9.3,NorthWest
22,6.0,34.0,9.4,NorthNorthWest
23,5.7,42.0,6.3,SouthEast
24,5.1,65.0,3.5,NorthWest
25,5.9,34.0,8.6,NorthWest
26,5.3,37.0,7.3,NorthWest
27,5.5,28.0,8.0,NorthNorthWest
28,3.7,30.0,9.3,NorthNorthWest
29,4.2,40.0,7.8,NorthNorthWest
30,2.9,34.0,5.1,NorthWest
31,2.9,28.0,9.7,WestNorthWest
Bar charts of the frequency and relative frequency of wind direction (Wind)
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("/Users/technote/Documents//weather.csv", encoding='Shift_JIS')
n = float(data["day"].count())  # total number of observed days

# Frequency: number of days per wind direction, as a one-column DataFrame
frequency = data.groupby('Wind')['day'].count().to_frame('frequency')
# Relative frequency: each count divided by the total number of days
relative_frequency = (data.groupby('Wind')['day'].count() / n).to_frame('relative frequency')

# Each plot() call draws its own figure
frequency.plot(kind='bar')
relative_frequency.plot(kind='bar')
plt.show()
Frequency
Relative frequency
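As a compact alternative, both columns can be built in a single table with `value_counts`. This is a minimal sketch: the five-row `wind` series is a hypothetical stand-in for the Wind column, and `frequency_table` is a helper name introduced here for illustration, not part of pandas.

```python
import pandas as pd

# Hypothetical stand-in for the first few rows of the Wind column
wind = pd.Series(
    ["NorthWest", "NorthNorthWest", "NorthNorthWest", "NorthWest", "SouthWest"],
    name="Wind",
)


def frequency_table(values):
    """Return counts and relative frequencies per category, sorted by category name."""
    counts = values.value_counts().sort_index()
    return pd.DataFrame({
        "frequency": counts,
        # Dividing by the total makes the column sum to 1
        "relative frequency": counts / counts.sum(),
    })


table = frequency_table(wind)
print(table)
```

Calling `table.plot(kind='bar', subplots=True)` would then draw the two bar charts from the same table.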
Frequency distribution table and bar chart of mean temperature
- Set the class width to 1 degree.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv("/Users/workspace/PSCS/weather.csv", encoding='Shift_JIS')

# Count the days observed at each distinct temperature
frequency = data.groupby('Temperature')['day'].count().to_frame('frequency')

# Assign each temperature to a 1-degree class: (1, 2], (2, 3], ..., (8, 9]
c = pd.cut(frequency.index, np.arange(1, 10, 1))
# observed=False keeps empty classes in the table on recent pandas versions
# (they are shown as 0 there, and as NaN on older versions)
classified_frequency = frequency.groupby(c, observed=False).sum()
print(classified_frequency)

classified_frequency.plot(kind='bar')
plt.show()
- Frequency distribution table
        frequency
(1, 2]        NaN
(2, 3]          3
(3, 4]          6
(4, 5]          4
(5, 6]         11
(6, 7]          6
(7, 8]          1
(8, 9]        NaN
- Bar chart
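The same table can also be produced by binning the raw temperatures directly with `pd.cut`, which makes it easy to add relative and cumulative frequencies. The sketch below copies the Temperature column from the CSV above so that it runs without the file. The extra column names are choices made here for illustration, and empty classes show up as 0 in this version.

```python
import numpy as np
import pandas as pd

# Temperature column copied from the CSV above (31 days)
temperature = pd.Series([
    6.6, 7.0, 5.9, 6.3, 7.3, 6.5, 4.0, 5.9, 6.1, 3.4, 3.8, 5.1, 4.4, 3.8, 4.0, 2.2,
    5.0, 5.5, 6.3, 5.4, 5.0, 6.0, 5.7, 5.1, 5.9, 5.3, 5.5, 3.7, 4.2, 2.9, 2.9,
])

# Right-closed 1-degree classes: (1, 2], (2, 3], ..., (8, 9]
classes = pd.cut(temperature, np.arange(1, 10, 1))

# Count per class, ordered from the lowest class to the highest
counts = classes.value_counts().sort_index()

table = pd.DataFrame({
    "frequency": counts,
    "relative frequency": counts / counts.sum(),
    "cumulative frequency": counts.cumsum(),
})
print(table)
```

The last value in the cumulative column should equal the number of days, which is a quick check that no observation fell outside the classes.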
Box plot of mean temperature
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("/Users//PSCS/weather.csv", encoding='Shift_JIS')

# Select the temperature column as a DataFrame and draw its box plot
temp = data[['Temperature']]
temp.boxplot(return_type='axes')
plt.show()
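To read the box plot numerically, it can help to compute the values it is drawn from. The following sketch assumes matplotlib's default convention of placing whisker limits at 1.5 times the interquartile range beyond the quartiles. The data is the Temperature column copied from the CSV above, and names such as `lower_fence` are introduced here for illustration.

```python
import pandas as pd

# Temperature column copied from the CSV above (31 days)
temperature = pd.Series([
    6.6, 7.0, 5.9, 6.3, 7.3, 6.5, 4.0, 5.9, 6.1, 3.4, 3.8, 5.1, 4.4, 3.8, 4.0, 2.2,
    5.0, 5.5, 6.3, 5.4, 5.0, 6.0, 5.7, 5.1, 5.9, 5.3, 5.5, 3.7, 4.2, 2.9, 2.9,
])

# Box edges and center line: first quartile, median, third quartile
q1, median, q3 = temperature.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Points beyond these limits would be drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = temperature[(temperature < lower_fence) | (temperature > upper_fence)]

print("Q1:", q1, "median:", median, "Q3:", q3)
print("number of outliers:", len(outliers))
```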
Correlation between temperature and humidity
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv("/Users/tokutake/Documents/workspace/PSCS/weather.csv", encoding='Shift_JIS')

# Correlation coefficient: off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(data['Temperature'], data['humidity'])[0, 1]

# Scatter plot with the coefficient shown in the title
plt.scatter(data['Temperature'], data['humidity'])
plt.title('r=%.2f' % r, size=16)
plt.xlabel('Temperature', size=14)
plt.ylabel('humidity(%)', size=14)
plt.show()
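For reference, the value returned by `np.corrcoef` is the Pearson correlation coefficient, which can be written out directly from its definition. This is a minimal sketch using the Temperature and humidity columns copied from the CSV above; `pearson_r` is a helper name introduced here for illustration.

```python
import numpy as np

# Temperature and humidity columns copied from the CSV above (31 days)
temperature = np.array([
    6.6, 7.0, 5.9, 6.3, 7.3, 6.5, 4.0, 5.9, 6.1, 3.4, 3.8, 5.1, 4.4, 3.8, 4.0, 2.2,
    5.0, 5.5, 6.3, 5.4, 5.0, 6.0, 5.7, 5.1, 5.9, 5.3, 5.5, 3.7, 4.2, 2.9, 2.9,
])
humidity = np.array([
    33, 41, 48, 40, 39, 34, 25, 33, 46, 27, 31, 37, 28, 36, 52, 39,
    26, 36, 41, 31, 28, 34, 42, 65, 34, 37, 28, 30, 40, 34, 28,
], dtype=float)


def pearson_r(x, y):
    """Sum of products of deviations, divided by the root of the product of squared deviations."""
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())


r_manual = pearson_r(temperature, humidity)
r_numpy = np.corrcoef(temperature, humidity)[0, 1]
print("manual: %.4f, numpy: %.4f" % (r_manual, r_numpy))
```

The two values should agree up to floating-point rounding.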