前回のStanford University/CS131/宿題6-2 k-Nearest Neighbor(k近傍法)とCross-Validation(交差検証)の続きをやる。今回の宿題は、PCA(Principal Component Analysis：主成分分析)のEigendecomposition(固有分解)とSingular Value Decomposition(特異値分解)をカバーする。

Part 3: PCA¶

Principal Component Analysis (PCA) is a simple yet popular and useful linear transformation technique that is used in numerous applications, such as stock market predictions, the analysis of gene expression data, and many more. In this tutorial, we will see that PCA is not just a “black box”, and we are going to unravel its internals in 3 basic steps.
主成分分析(PCA)は、株式市場予測、遺伝子発現データ分析、その他多くの用途で使用されている単純かつ好評な実用的線形変換テクニックである。このチュートリアルでは、PCAがただのブラックボックスではないことを理解し、その内部構造を3つの基本的ステップに分解していく。

Introduction¶

The sheer size of data in the modern age is not only a challenge for computer hardware but also a main bottleneck for the performance of many machine learning algorithms. The main goal of a PCA analysis is to identify patterns in data; PCA aims to detect the correlation between variables. If a strong correlation between variables exists, the attempt to reduce the dimensionality only makes sense. In a nutshell, this is what PCA is all about: Finding the directions of maximum variance in high-dimensional data and project it onto a smaller dimensional subspace while retaining most of the information.
現在の巨大なデータはコンピュータハードウェアにとって難題なだけでなく、多くの機械学習アルゴリズムにとっても主要な障壁となっている。PCAの主目標は、データのパターンを識別することにある。PCAは変数間の相関関係を検出することを目標としている。変数間に強い相関関係が存在する場合、次元削減だけを試みることは理にかなっている。平たく言えば、これがPCAの全てである。つまり、高次元における最大分散の方向を見つけ出し、情報の大部分を保持しながら、それをより低次元の部分空間に投影する。

A Summary of the PCA Approach¶

Standardize the data.
データを標準化する。
Obtain the Eigenvectors and Eigenvalues from the covariance matrix or correlation matrix, or perform Singular Vector Decomposition.
共分散行列、または、相関行列から固有ベクトルと固有値を得るか、特異ベクトル分解を実行する。
Sort eigenvalues in descending order and choose the $k$ eigenvectors that correspond to the $k$ largest eigenvalues where $k$ is the number of dimensions of the new feature subspace ($k \leq d$).
降順に固有値をソートし、$k$が新しい特徴部分空間($k \leq d$)の次元数であるところの$k$最大固有値に対応する$k$固有値を選ぶ。
Construct the projection matrix $\mathbf{W}$ from the selected $k$ eigenvectors.
選択された$k$固有ベクトルから射影行列$\mathbf{W}$を構築する。
Transform the original dataset $\mathbf{X}$ via $\mathbf{W}$ to obtain a $k$-dimensional feature subspace Y.
$\mathbf{W}$によって元データセット$\mathbf{X}$を変換して$k$次元特徴部分空間Yを取得する。

import numpy as np
from skimage import io
from utils import load_dataset
import scipy
import scipy.linalg

class PCA(object):
    """Class implementing Principal component analysis (PCA).
    Steps to perform PCA on a matrix of features X:
        1. Fit the training data using method `fit` (either with eigen decomposition of SVD)
        2. Project X into a lower dimensional space using method `transform`
        3. Optionally reconstruct the original X (as best as possible) using method `reconstruct`
    """
    def __init__(self):
        self.W_pca = None
        self.mean = None

    def fit(self, X, method='svd'):
        """Fit the training data X using the chosen method.
        Will store the projection matrix in self.W_pca and the mean of the data in self.mean
        Args:
            X: numpy array of shape (N, D). Each of the N rows represent a data point.
               Each data point contains D features.
            method: Method to solve PCA. Must be one of 'svd' or 'eigen'.
        """
        _, D = X.shape
        self.mean = None   # empirical mean, has shape (D,)
        X_centered = None  # zero-centered data
        # YOUR CODE HERE
        # 1. Compute the mean and store it in self.mean
        # 2. Apply either method to `X_centered`
        self.mean = np.mean(X, axis=0)
        X_centered = X - self.mean
        if method == 'svd':
            self.W_pca = self._svd(X_centered)[0]
        else:
            self.W_pca = self._eigen_decomp(X_centered)[0]
        # END YOUR CODE
        # Make sure that X_centered has mean zero
        assert np.allclose(X_centered.mean(), 0.0)
        # Make sure that self.mean is set and has the right shape
        assert self.mean is not None and self.mean.shape == (D,)
        # Make sure that self.W_pca is set and has the right shape
        assert self.W_pca is not None and self.W_pca.shape == (D, D)
        # Each column of `self.W_pca` should have norm 1 (each one is an eigenvector)
        for i in range(D):
            assert np.allclose(np.linalg.norm(self.W_pca[:, i]), 1.0)

    def _eigen_decomp(self, X):
        """Performs eigendecompostion of feature covariance matrix.
        Args:
            X: Zero-centered data array, each ROW containing a data point.
               Numpy array of shape (N, D).
        Returns:
            e_vecs: Eigenvectors of covariance matrix of X. Eigenvectors are
                    sorted in descending order of corresponding eigenvalues. Each
                    column contains an eigenvector. Numpy array of shape (D, D).
            e_vals: Eigenvalues of covariance matrix of X. Eigenvalues are
                    sorted in descending order. Numpy array of shape (D,).
        """
        N, D = X.shape
        e_vecs = None
        e_vals = None
        # YOUR CODE HERE
        # Steps:
        #     1. compute the covariance matrix of X, of shape (D, D)
        #     2. compute the eigenvalues and eigenvectors of the covariance matrix
        #     3. Sort both of them in decreasing order (ex: 1.0 > 0.5 > 0.0 > -0.2 > -1.2)
        covariance = np.dot(X.T, X) / (N - 1)
        e_vals, e_vecs = np.linalg.eig(covariance)
        idx = np.argsort(-e_vals)
        e_vals = e_vals[idx]
        e_vecs = e_vecs[:, idx]
        # END YOUR CODE
        # Check the output shapes
        assert e_vals.shape == (D,)
        assert e_vecs.shape == (D, D)
        return e_vecs, e_vals

    def _svd(self, X):
        """Performs Singular Value Decomposition (SVD) of X.
        Args:
            X: Zero-centered data array, each ROW containing a data point.
                Numpy array of shape (N, D).
        Returns:
            vecs: right singular vectors. Numpy array of shape (D, D)
            vals: singular values. Numpy array of shape (K,) where K = min(N, D)
        """
        vecs = None  # shape (D, D)
        N, D = X.shape
        vals = None  # shape (K,)
        # YOUR CODE HERE
        # Here, compute the SVD of X
        # Make sure to return vecs as the matrix of vectors where each column is a singular vector
        _, vals, vecs = np.linalg.svd(X)
        # END YOUR CODE
        assert vecs.shape == (D, D)
        K = min(N, D)
        assert vals.shape == (K,)
        return vecs.T, vals

pca = PCA()

Eigendecomposition¶

The eigenvectors and eigenvalues of a covariance (or correlation) matrix represent the “core” of a PCA: The eigenvectors (principal components) determine the directions of the new feature space, and the eigenvalues determine their magnitude. In other words, the eigenvalues explain the variance of the data along the new feature axes.
共分散行列(もしくは相関行列)の固有ベクトルと固有値はPCAのコアを形成している。即ち、固有ベクトル(主成分)が新しい特徴空間の方向を決定し、固有値がそれらの大きさを決定する。換言すれば、固有値が新しい特徴軸に沿ってデータ分散を説明する。

Implement _eigen_decomp in pca.py.
pca.pyに_eigen_decompを実装せよ。

X_train, y_train, classes_train = load_dataset('faces', train=True, as_gray=True)
X_test, y_test, classes_test = load_dataset('faces', train=False, as_gray=True)
assert classes_train == classes_test
classes = classes_train

# Flatten the image data into rows
# we now have one 4096 dimensional featue vector for each example
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))

# Perform eigenvalue decomposition on the covariance matrix of training data.
e_vecs, e_vals = pca._eigen_decomp(X_train - X_train.mean(axis=0))

print(e_vals.shape)
print(e_vecs.shape)

(4096,)
(4096, 4096)

Singular Value Decomposition¶

Doing an eigendecomposition of the covariance matrix is very expensive, especially when the number of features (D = 4096 here) gets very high.
共分散行列の固有分解を行うと、特に、特徴数(ここではD = 4096)が非常に大きくなると高く付く(電気代や時間の浪費につながる)。

To obtain the same eigenvalues and eigenvectors in a more efficient way, we can use Singular Value Decomposition (SVD). If we perform SVD on matrix $X$, we obtain $U$, $S$ and $V$ such that:
もっと高効率に同じ特徴ベクトルと特徴値を得るために、特異値分解を使用できる。仮に行列$X$に対してSVDを実行すると、以下のような$U$、$S$、$V$を得る。
$$
X = U S V^T
$$

the columns of $U$ are the eigenvectors of $X X^T$
$U$の列は$X X^T$の固有ベクトル
the columns of $V^T$ are the eigenvectors of $X^T X$
$V^T$の列は$X^T X$の固有ベクトル
the values of $S$ are the square roots of the eigenvalues of $X^T X$ (or $X X^T$)
$S$の値は$X^T X$(か$X X^T$)の固有値の平方根

Therefore, we can find out the top k eigenvectors of the covariance matrix $X^T X$ using SVD.
従って、SVDを使うことで共分散行列$X^T X$の上位k固有ベクトルを割り出せる。

Implement _svd in pca.py.
pca.pyに_svdを実装する。

# Perform SVD on directly on the training data.
u, s = pca._svd(X_train - X_train.mean(axis=0))

print(s.shape)
print(u.shape)

(800,)
(4096, 4096)

# Test whether the square of singular values and eigenvalues are the same.
# We also observe that `e_vecs` and `u` are the same (only the sign of each column can differ).
N = X_train.shape[0]
assert np.allclose((s ** 2) / (N - 1), e_vals[:len(s)])

for i in range(len(s) - 1):
    assert np.allclose(e_vecs[:, i], u[:, i]) or np.allclose(e_vecs[:, i], -u[:, i])
    # (the last eigenvector for i = len(s) - 1 is very noisy because the eigenvalue is almost 0,
    #  so imprecisions in the computation build up)

参考サイトhttps://github.com/