Friday, June 7, 2019

[Python] Principal Component Analysis

These are notes on principal component analysis (PCA).

To run PCA, use the scikit-learn package and create an instance of PCA from sklearn.decomposition.
The example below runs PCA on the Davis data.

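The Davis data (Davis.csv) is assumed to be saved in the same directory as the Jupyter Notebook.
The data is read with pd.read_csv from the pandas package.
Each row of columns 1 and 2 of the data array (0-based; column 0 is sex) corresponds to a data point $\mathbf{x}_i = (w_i, h_i)$, where $w_i$ is the weight [kg] and $h_i$ is the height [cm] of the $i$-th person.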

Load the packages.
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> import pandas as pd

Use PCA from sklearn.
>>> from sklearn.decomposition import PCA

Read the data with pandas. The Davis.csv file* is assumed to be in the directory where the REPL is running, so change the path if necessary.
>>> dat = pd.read_csv('Davis.csv').values
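As a possible alternative to positional indexing, the two columns can be selected by name. The following is a minimal sketch, not the code used in this post: it reads the first four rows of Davis.csv from an in-memory string (csv_text and wh are names introduced here for illustration) so that it runs without the file.

```python
import io
import pandas as pd

# First few rows of Davis.csv, copied from the listing at the end of this post,
# held in memory so the sketch runs without the file on disk.
csv_text = """sex,weight,height,repwt,repht
M,77,182,77,180
F,58,161,51,159
F,53,161,54,158
M,68,177,70,175
"""

df = pd.read_csv(io.StringIO(csv_text))

# Select the measured weight and height by column name instead of by position.
wh = df[["weight", "height"]].to_numpy(dtype=float)
print(wh.shape)  # (4, 2)
print(wh[0])     # [ 77. 182.]
```

With the real file, `pd.read_csv('Davis.csv')` would take the place of the in-memory read.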

Convert the height to metres [m] and take the natural logarithm of both variables.
>>> logdat = np.log(np.c_[dat[:,1],dat[:,2]/100].astype('float'))
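One point worth noting about this step: on the log scale, changing the unit from cm to m only subtracts the constant log(100) from the height column, and PCA centres the data before fitting, so the unit change by itself should not affect the principal axes. The sketch below checks this with NumPy on three (weight, height) pairs taken from the first rows of Davis.csv; the variable names are chosen here for illustration.

```python
import numpy as np

# Three (weight [kg], height [cm]) pairs from the first rows of Davis.csv.
wh = np.array([[77.0, 182.0], [58.0, 161.0], [53.0, 161.0]])

log_m = np.log(np.c_[wh[:, 0], wh[:, 1] / 100])  # height in metres, as in the post
log_cm = np.log(wh)                               # height left in centimetres

# The two versions differ only by the constant log(100) in the height column.
shift = log_cm[:, 1] - log_m[:, 1]
print(np.allclose(shift, np.log(100)))  # True

# After centring (subtracting the column means) the two versions coincide.
print(np.allclose(log_m - log_m.mean(axis=0),
                  log_cm - log_cm.mean(axis=0)))  # True
```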
Plot the data.

>>> plt.plot(logdat[:,0], logdat[:,1], '.'); plt.show()
[<matplotlib.lines.Line2D object at 0x11df9dac8>]
Run PCA on the loaded data.
>>> pca = PCA()
>>> pca.fit(logdat)
PCA(copy=True, iterated_power='auto', n_components=None, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False)
>>> pca.components_
array([[ 0.99672116,  0.08091309],
       [ 0.08091309, -0.99672116]])
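In the code above, pca.components_ holds the principal components: each row is one principal axis, and the rows are ordered by decreasing explained variance.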
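To make the meaning of pca.components_ more concrete, the sketch below compares it with the eigenvectors of the sample covariance matrix, which should agree up to sign. It is only an illustration on the first eight rows of Davis.csv, not a description of how scikit-learn computes the result internally (scikit-learn uses an SVD).

```python
import numpy as np
from sklearn.decomposition import PCA

# (weight [kg], height [cm]) for the first eight rows of Davis.csv.
wh = np.array([[77, 182], [58, 161], [53, 161], [68, 177],
               [59, 157], [76, 170], [76, 167], [69, 186]], dtype=float)
X = np.log(np.c_[wh[:, 0], wh[:, 1] / 100])

pca = PCA().fit(X)

# Eigen-decomposition of the sample covariance matrix.
# eigh returns eigenvalues in ascending order, so reorder to descending.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Rows of components_ match the eigenvectors (columns of eigvecs) up to sign.
print(np.allclose(np.abs(pca.components_), np.abs(eigvecs.T)))  # True
# explained_variance_ holds the corresponding eigenvalues.
print(np.allclose(pca.explained_variance_, eigvals))            # True
```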

The data point at index 11 (weight 166, height 57 in Davis.csv) is treated as an outlier and removed.
>>> clean_logdat = np.delete(logdat, 11, axis=0)
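The row at index 11 reads F,166,57,56,163 in Davis.csv, which suggests that the measured weight and height may have been swapped. One way such a row could be located programmatically is sketched below; the rule "weight in kg larger than height in cm" is a heuristic assumed here for illustration, not the method used in this post. The sketch uses the first 14 rows of the file.

```python
import numpy as np

# (weight [kg], height [cm]) for rows 0..13 of Davis.csv.
wh = np.array([[77, 182], [58, 161], [53, 161], [68, 177], [59, 157],
               [76, 170], [76, 167], [69, 186], [71, 178], [65, 171],
               [70, 175], [166, 57], [51, 161], [64, 168]], dtype=float)

# Heuristic (an assumption): a weight in kg that exceeds the height in cm
# is implausible for this data set, so flag such rows as suspect.
suspect = np.flatnonzero(wh[:, 0] > wh[:, 1])
print(suspect)      # [11]

# Drop the flagged rows, analogous to np.delete(logdat, 11, axis=0).
clean = np.delete(wh, suspect, axis=0)
print(clean.shape)  # (13, 2)
```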

Run PCA on the data with the outlier (index 11) removed.
>>> pca = PCA()
>>> pca.fit(clean_logdat)
PCA(copy=True, iterated_power='auto', n_components=None, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False)
>>> pca.components_
array([[ 0.97754866,  0.21070979],
       [-0.21070979,  0.97754866]])
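To see how much the single outlier changes the result, the two sets of principal axes printed in this post can be compared directly. The sketch below copies both pca.components_ outputs and computes the angle between the first principal axis and the log-weight axis; first_axis_angle_deg is a helper introduced here for illustration.

```python
import numpy as np

# Principal axes copied from the two pca.components_ outputs in this post.
with_outlier = np.array([[0.99672116, 0.08091309],
                         [0.08091309, -0.99672116]])
without_outlier = np.array([[0.97754866, 0.21070979],
                            [-0.21070979, 0.97754866]])


def first_axis_angle_deg(components):
    """Angle between the first principal axis and the log-weight axis, in degrees."""
    x, y = components[0]
    return np.degrees(np.arctan2(y, x))


for name, comp in [("with outlier", with_outlier),
                   ("without outlier", without_outlier)]:
    # The rows form an orthonormal basis, up to rounding of the printed values.
    assert np.allclose(comp @ comp.T, np.eye(2), atol=1e-6)
    print(name, round(first_axis_angle_deg(comp), 2))
```

The angle changes from roughly 4.6 degrees to roughly 12.2 degrees, so removing the one point noticeably rotates the first principal axis towards the log-height direction.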

----------
* Contents of the .csv file (Davis.csv) read by the code above
sex,weight,height,repwt,repht
M,77,182,77,180
F,58,161,51,159
F,53,161,54,158
M,68,177,70,175
F,59,157,59,155
M,76,170,76,165
M,76,167,77,165
M,69,186,73,180
M,71,178,71,175
M,65,171,64,170
M,70,175,75,174
F,166,57,56,163
F,51,161,52,158
F,64,168,64,165
F,52,163,57,160
F,65,166,66,165
M,92,187,101,185
F,62,168,62,165
M,76,197,75,200
F,61,175,61,171
M,119,180,124,178
F,61,170,61,170
M,65,175,66,173
M,66,173,70,170
F,54,171,59,168
F,50,166,50,165
F,63,169,61,168
F,58,166,60,160
F,39,157,41,153
M,101,183,100,180
F,71,166,71,165
M,75,178,73,175
M,79,173,76,173
F,52,164,52,161
F,68,169,63,170
M,64,176,65,175
F,56,166,54,165
M,69,174,69,171
M,88,178,86,175
M,65,187,67,188
F,54,164,53,160
M,80,178,80,178
F,63,163,59,159
M,78,183,80,180
M,85,179,82,175
F,54,160,55,158
M,73,180,NA,NA
F,49,161,NA,NA
F,54,174,56,173
F,75,162,75,158
M,82,182,85,183
F,56,165,57,163
M,74,169,73,170
M,102,185,107,185
M,64,177,NA,NA
M,65,176,64,172
F,66,170,65,NA
M,73,183,74,180
M,75,172,70,169
M,57,173,58,170
M,68,165,69,165
M,71,177,71,170
M,71,180,76,175
F,78,173,75,169
M,97,189,98,185
F,60,162,59,160
F,64,165,63,163
F,64,164,62,161
F,52,158,51,155
M,80,178,76,175
F,62,175,61,171
M,66,173,66,175
F,55,165,54,163
F,56,163,57,159
F,50,166,50,161
F,50,171,NA,NA
F,50,160,55,150
F,63,160,64,158
M,69,182,70,180
M,69,183,70,183
F,61,165,60,163
M,55,168,56,170
F,53,169,52,175
F,60,167,55,163
F,56,170,56,170
M,59,182,61,183
M,62,178,66,175
F,53,165,53,165
F,57,163,59,160
F,57,162,56,160
M,70,173,68,170
F,56,161,56,161
M,84,184,86,183
M,69,180,71,180
M,88,189,87,185
F,56,165,57,160
M,103,185,101,182
F,50,169,50,165
F,52,159,52,153
F,55,155,NA,154
F,55,164,55,163
M,63,178,63,175
F,47,163,47,160
F,45,163,45,160
F,62,175,63,173
F,53,164,51,160
F,52,152,51,150
F,57,167,55,164
F,64,166,64,165
F,59,166,55,163
M,84,183,90,183
M,79,179,79,171
F,55,174,57,171
M,67,179,67,179
F,76,167,77,165
F,62,168,62,163
M,83,184,83,181
M,96,184,94,183
M,75,169,76,165
M,65,178,66,178
M,78,178,77,175
M,69,167,73,165
F,68,178,68,175
F,55,165,55,163
M,67,179,NA,NA
F,52,169,56,NA
F,47,153,NA,154
F,45,157,45,153
F,68,171,68,169
F,44,157,44,155
F,62,166,61,163
M,87,185,89,185
F,56,160,53,158
F,50,148,47,148
M,83,177,84,175
F,53,162,53,160
F,64,172,62,168
F,62,167,NA,NA
M,90,188,91,185
M,85,191,83,188
M,66,175,68,175
F,52,163,53,160
F,53,165,55,163
F,54,176,55,176
F,64,171,66,171
F,55,160,55,155
F,55,165,55,165
F,59,157,55,158
F,70,173,67,170
M,88,184,86,183
F,57,168,58,165
F,47,162,47,160
F,47,150,45,152
F,55,162,NA,NA
F,48,163,44,160
M,54,169,58,165
M,69,172,68,174
F,59,170,NA,NA
F,58,169,NA,NA
F,57,167,56,165
F,51,163,50,160
F,54,161,54,160
F,53,162,52,158
F,59,172,58,171
M,56,163,58,161
F,59,159,59,155
F,63,170,62,168
F,66,166,66,165
M,96,191,95,188
F,53,158,50,155
M,76,169,75,165
F,54,163,NA,NA
M,61,170,61,170
M,82,176,NA,NA
M,62,168,64,168
M,71,178,68,178
F,60,174,NA,NA
M,66,170,67,165
M,81,178,82,175
M,68,174,68,173
M,80,176,78,175
F,43,154,NA,NA
M,82,181,NA,NA
F,63,165,59,160
M,70,173,70,173
F,56,162,56,160
F,60,172,55,168
F,58,169,54,166
M,76,183,75,180
F,50,158,49,155
M,88,185,93,188
M,89,173,86,173
F,59,164,59,165
F,51,156,51,158
F,62,164,61,161
M,74,175,71,175
M,83,180,80,180
M,81,175,NA,NA
M,90,181,91,178
M,79,177,81,178
