# スタンフォード/CS131/宿題6-3 主成分分析(固有分解・特異値分解)

スポンサーリンク

## Part 3: PCA¶

Principal Component Analysis (PCA) is a simple yet popular and useful linear transformation technique that is used in numerous applications, such as stock market predictions, the analysis of gene expression data, and many more. In this tutorial, we will see that PCA is not just a “black box”, and we are going to unravel its internals in 3 basic steps.

### Introduction¶

The sheer size of data in the modern age is not only a challenge for computer hardware but also a main bottleneck for the performance of many machine learning algorithms. The main goal of a PCA analysis is to identify patterns in data; PCA aims to detect the correlation between variables. If a strong correlation between variables exists, the attempt to reduce the dimensionality only makes sense. In a nutshell, this is what PCA is all about: Finding the directions of maximum variance in high-dimensional data and project it onto a smaller dimensional subspace while retaining most of the information.

### A Summary of the PCA Approach¶

• Standardize the data.
データを標準化する。
• Obtain the Eigenvectors and Eigenvalues from the covariance matrix or correlation matrix, or perform Singular Vector Decomposition.
共分散行列、または、相関行列から固有ベクトルと固有値を得るか、特異ベクトル分解を実行する。
• Sort eigenvalues in descending order and choose the $k$ eigenvectors that correspond to the $k$ largest eigenvalues where $k$ is the number of dimensions of the new feature subspace ($k \leq d$).
降順に固有値をソートし、$k$が新しい特徴部分空間($k \leq d$)の次元数であるところの$k$最大固有値に対応する$k$固有値を選ぶ。
• Construct the projection matrix $\mathbf{W}$ from the selected $k$ eigenvectors.
選択された$k$固有ベクトルから射影行列$\mathbf{W}$を構築する。
• Transform the original dataset $\mathbf{X}$ via $\mathbf{W}$ to obtain a $k$-dimensional feature subspace Y.
$\mathbf{W}$によって元データセット$\mathbf{X}$を変換して$k$次元特徴部分空間Yを取得する。
import numpy as np
from skimage import io
import scipy
import scipy.linalg

class PCA(object):
"""Class implementing Principal component analysis (PCA).
Steps to perform PCA on a matrix of features X:
1. Fit the training data using method fit (either with eigen decomposition of SVD)
2. Project X into a lower dimensional space using method transform
3. Optionally reconstruct the original X (as best as possible) using method reconstruct
"""
def __init__(self):
self.W_pca = None
self.mean = None

def fit(self, X, method='svd'):
"""Fit the training data X using the chosen method.
Will store the projection matrix in self.W_pca and the mean of the data in self.mean
Args:
X: numpy array of shape (N, D). Each of the N rows represent a data point.
Each data point contains D features.
method: Method to solve PCA. Must be one of 'svd' or 'eigen'.
"""
_, D = X.shape
self.mean = None   # empirical mean, has shape (D,)
X_centered = None  # zero-centered data
# 1. Compute the mean and store it in self.mean
# 2. Apply either method to X_centered
self.mean = np.mean(X, axis=0)
X_centered = X - self.mean
if method == 'svd':
self.W_pca = self._svd(X_centered)[0]
else:
self.W_pca = self._eigen_decomp(X_centered)[0]
# Make sure that X_centered has mean zero
assert np.allclose(X_centered.mean(), 0.0)
# Make sure that self.mean is set and has the right shape
assert self.mean is not None and self.mean.shape == (D,)
# Make sure that self.W_pca is set and has the right shape
assert self.W_pca is not None and self.W_pca.shape == (D, D)
# Each column of self.W_pca should have norm 1 (each one is an eigenvector)
for i in range(D):
assert np.allclose(np.linalg.norm(self.W_pca[:, i]), 1.0)

def _eigen_decomp(self, X):
"""Performs eigendecompostion of feature covariance matrix.
Args:
X: Zero-centered data array, each ROW containing a data point.
Numpy array of shape (N, D).
Returns:
e_vecs: Eigenvectors of covariance matrix of X. Eigenvectors are
sorted in descending order of corresponding eigenvalues. Each
column contains an eigenvector. Numpy array of shape (D, D).
e_vals: Eigenvalues of covariance matrix of X. Eigenvalues are
sorted in descending order. Numpy array of shape (D,).
"""
N, D = X.shape
e_vecs = None
e_vals = None
# Steps:
#     1. compute the covariance matrix of X, of shape (D, D)
#     2. compute the eigenvalues and eigenvectors of the covariance matrix
#     3. Sort both of them in decreasing order (ex: 1.0 > 0.5 > 0.0 > -0.2 > -1.2)
covariance = np.dot(X.T, X) / (N - 1)
e_vals, e_vecs = np.linalg.eig(covariance)
idx = np.argsort(-e_vals)
e_vals = e_vals[idx]
e_vecs = e_vecs[:, idx]
# Check the output shapes
assert e_vals.shape == (D,)
assert e_vecs.shape == (D, D)
return e_vecs, e_vals

def _svd(self, X):
"""Performs Singular Value Decomposition (SVD) of X.
Args:
X: Zero-centered data array, each ROW containing a data point.
Numpy array of shape (N, D).
Returns:
vecs: right singular vectors. Numpy array of shape (D, D)
vals: singular values. Numpy array of shape (K,) where K = min(N, D)
"""
vecs = None  # shape (D, D)
N, D = X.shape
vals = None  # shape (K,)
# Here, compute the SVD of X
# Make sure to return vecs as the matrix of vectors where each column is a singular vector
_, vals, vecs = np.linalg.svd(X)
assert vecs.shape == (D, D)
K = min(N, D)
assert vals.shape == (K,)
return vecs.T, vals

pca = PCA()


### Eigendecomposition¶

The eigenvectors and eigenvalues of a covariance (or correlation) matrix represent the “core” of a PCA: The eigenvectors (principal components) determine the directions of the new feature space, and the eigenvalues determine their magnitude. In other words, the eigenvalues explain the variance of the data along the new feature axes.

Implement _eigen_decomp in pca.py.
pca.pyに_eigen_decompを実装せよ。

X_train, y_train, classes_train = load_dataset('faces', train=True, as_gray=True)
X_test, y_test, classes_test = load_dataset('faces', train=False, as_gray=True)
assert classes_train == classes_test
classes = classes_train

# Flatten the image data into rows
# we now have one 4096 dimensional featue vector for each example
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))

# Perform eigenvalue decomposition on the covariance matrix of training data.
e_vecs, e_vals = pca._eigen_decomp(X_train - X_train.mean(axis=0))

print(e_vals.shape)
print(e_vecs.shape)

(4096,)
(4096, 4096)


### Singular Value Decomposition¶

Doing an eigendecomposition of the covariance matrix is very expensive, especially when the number of features (D = 4096 here) gets very high.

To obtain the same eigenvalues and eigenvectors in a more efficient way, we can use Singular Value Decomposition (SVD). If we perform SVD on matrix $X$, we obtain $U$, $S$ and $V$ such that:
もっと高効率に同じ特徴ベクトルと特徴値を得るために、特異値分解を使用できる。仮に行列$X$に対してSVDを実行すると、以下のような$U$、$S$、$V$を得る。
$$X = U S V^T$$

• the columns of $U$ are the eigenvectors of $X X^T$
$U$の列は$X X^T$の固有ベクトル
• the columns of $V^T$ are the eigenvectors of $X^T X$
$V^T$の列は$X^T X$の固有ベクトル
• the values of $S$ are the square roots of the eigenvalues of $X^T X$ (or $X X^T$)
$S$の値は$X^T X$(か$X X^T$)の固有値の平方根

Therefore, we can find out the top k eigenvectors of the covariance matrix $X^T X$ using SVD.

Implement _svd in pca.py.
pca.pyに_svdを実装する。

# Perform SVD on directly on the training data.
u, s = pca._svd(X_train - X_train.mean(axis=0))

print(s.shape)
print(u.shape)

(800,)
(4096, 4096)

# Test whether the square of singular values and eigenvalues are the same.
# We also observe that e_vecs and u are the same (only the sign of each column can differ).
N = X_train.shape[0]
assert np.allclose((s ** 2) / (N - 1), e_vals[:len(s)])

for i in range(len(s) - 1):
assert np.allclose(e_vecs[:, i], u[:, i]) or np.allclose(e_vecs[:, i], -u[:, i])
# (the last eigenvector for i = len(s) - 1 is very noisy because the eigenvalue is almost 0,
#  so imprecisions in the computation build up)


スポンサーリンク
スポンサーリンク

フォローする