Stanford/CS131/Homework 6-1: Image Compression

Working through Stanford University/CS131/Homework 6. This homework covers the following:

• Image compression using SVD (Singular Value Decomposition)
• kNN (k-Nearest Neighbors) methods for image recognition
• PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis) to improve kNN
Image Compression

Image compression is used to reduce the cost of storage and transmission of images (or videos).
One lossy compression method is to apply Singular Value Decomposition (SVD) to an image, and only keep the top n singular values.
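
To make the idea concrete, the rank-n truncation can be written out as below. The notation is introduced here for illustration and is not taken from the assignment: $A$ is the $H \times W$ grayscale image, $\sigma_i$ are its singular values in descending order, and $u_i$, $v_i$ are the matching left and right singular vectors.

```latex
A = U \Sigma V^{\top} = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{\top},
\qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0

A_n = \sum_{i=1}^{n} \sigma_i \, u_i v_i^{\top},
\qquad
\lVert A - A_n \rVert_F = \sqrt{\sum_{i=n+1}^{r} \sigma_i^{2}}
```

Storing $A_n$ in factored form takes $n(H + W + 1)$ numbers (n columns of $U$, n singular values, n rows of $V^{\top}$) instead of $HW$, and the discarded singular values determine how much is lost.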

from time import time
from collections import defaultdict
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rc
from skimage import io

%matplotlib inline
plt.rcParams['figure.figsize'] = (15.0, 10.0) # set default size of plots
plt.rcParams["font.size"] = "17"
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

image = io.imread('pitbull.jpg', as_gray=True)
plt.imshow(image)
plt.axis('off')
plt.show()


Let’s implement image compression using SVD.

We first compute the SVD of the image, and as seen in class we keep the n largest singular values and singular vectors to reconstruct the image.

Implement the function compress_image in compression.py.

def compress_image(image, num_values):
    """Compress an image using SVD and keeping the top num_values singular values.

    Args:
        image: numpy array of shape (H, W)
        num_values: number of singular values to keep
    Returns:
        compressed_image: numpy array of shape (H, W) containing the compressed image
        compressed_size: size of the compressed image
    """
    compressed_image = None
    compressed_size = 0
    # YOUR CODE HERE
    # Steps:
    #     1. Get SVD of the image
    #     2. Only keep the top num_values singular values, and compute compressed_image
    #     3. Compute the compressed size
    # np.linalg.svd returns the singular values in descending order and V already transposed
    u, s, v = np.linalg.svd(image, full_matrices=False)
    u = u[:, :num_values]         # (H, num_values)
    s = np.diag(s[:num_values])   # (num_values, num_values)
    v = v[:num_values, :]         # (num_values, W)
    compressed_image = u.dot(s.dot(v))
    # Stored numbers: kept columns of u, kept singular values, kept rows of v
    compressed_size = u.size + num_values + v.size
    # Equivalent: num_values * u.shape[0] + num_values + num_values * v.shape[1]
    # END YOUR CODE
    assert compressed_image.shape == image.shape, \
        "Compressed image and original image don't have the same shape"
    assert compressed_size > 0, "Don't forget to compute compressed_size"
    return compressed_image, compressed_size

compressed_image, compressed_size = compress_image(image, 100)
compression_ratio = float(compressed_size) / image.size
print('Original image shape:', image.shape)
print('Compressed size: %d' % compressed_size)
print('Compression ratio: %.3f' % compression_ratio)

assert compressed_size == 298500

Original image shape: (1704, 1280)
Compressed size: 298500
Compression ratio: 0.137
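
The compressed size above can be checked by hand. The sketch below only redoes the arithmetic; `svd_storage_size` is a hypothetical helper name introduced here, not part of the assignment code.

```python
def svd_storage_size(height, width, num_values):
    """Count the numbers stored by a rank-num_values SVD of a (height, width) image."""
    u_entries = height * num_values   # first num_values columns of U
    s_entries = num_values            # the kept singular values
    v_entries = num_values * width    # first num_values rows of V^T
    return u_entries + s_entries + v_entries


H, W = 1704, 1280  # image shape printed above
size = svd_storage_size(H, W, 100)
ratio = size / (H * W)
print(size)             # 298500
print(round(ratio, 3))  # 0.137
```

In other words, the size grows linearly in the number of kept singular values: each one costs H + W + 1 = 2985 numbers for this image.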

# Number of singular values to keep
n_values = [10, 50, 100]

for n in n_values:
    # Compress the image using n singular values
    compressed_image, compressed_size = compress_image(image, n)
    compression_ratio = float(compressed_size) / image.size

    print("Data size (original): %d" % (image.size))
    print("Data size (compressed): %d" % compressed_size)
    print("Compression ratio: %f" % (compression_ratio))

    plt.imshow(compressed_image, cmap='gray')
    title = "n = %s" % n
    plt.title(title)
    plt.axis('off')
    plt.show()

Data size (original): 2181120
Data size (compressed): 29850
Compression ratio: 0.013686

Data size (original): 2181120
Data size (compressed): 149250
Compression ratio: 0.068428

Data size (original): 2181120
Data size (compressed): 298500
Compression ratio: 0.136856
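
The compression ratio says nothing about how much image quality is lost. One possible way to quantify that is the relative Frobenius error of the truncation, which can be read directly off the singular values. This is a minimal sketch on a synthetic random matrix standing in for the image; `relative_error` and `demo_image` are names made up for this example.

```python
import numpy as np


def relative_error(singular_values, num_values):
    """Relative Frobenius error of the rank-num_values truncation."""
    discarded = np.sum(singular_values[num_values:] ** 2)
    total = np.sum(singular_values ** 2)
    return float(np.sqrt(discarded / total))


# Synthetic stand-in for the image (an assumption; the post uses pitbull.jpg).
rng = np.random.default_rng(0)
demo_image = rng.random((60, 40))

# Only the singular values are needed to compute the error.
s = np.linalg.svd(demo_image, compute_uv=False)

# The error shrinks as more singular values are kept, reaching 0 at full rank.
errors = [relative_error(s, n) for n in (1, 10, 40)]
print(errors)
```

On a real photograph the singular values typically decay much faster than on random noise, so the error would be expected to drop more quickly with n than it does here.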

Face Dataset

We will use a dataset of faces of celebrities. Download the dataset using the following command:

sh get_dataset.sh

This is the face dataset for the CS131 assignment.
The directory containing the dataset has the following structure:

faces/
    train/
        angelina jolie/
        anne hathaway/
        ...
    test/
        angelina jolie/
        anne hathaway/
        ...

Each class has 50 training images and 10 testing images.
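
As a quick sanity check after downloading, the layout above can be inspected with the standard library alone. This is a sketch under the assumption that each class folder contains only image files; `list_classes` and `count_images` are hypothetical helpers and are unrelated to the course's `load_dataset`.

```python
import os


def list_classes(root, split):
    """Return the sorted class-folder names under root/split (e.g. faces/train)."""
    split_dir = os.path.join(root, split)
    return sorted(
        name for name in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, name))
    )


def count_images(root, split):
    """Map each class name to the number of files in its folder."""
    return {
        cls: len(os.listdir(os.path.join(root, split, cls)))
        for cls in list_classes(root, split)
    }
```

If the download completed, `count_images('faces', 'train')` should report 50 for every class and `count_images('faces', 'test')` should report 10.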

from utils import load_dataset

X_train, y_train, classes_train = load_dataset('faces', train=True, as_gray=True)
X_test, y_test, classes_test = load_dataset('faces', train=False, as_gray=True)

assert classes_train == classes_test
classes = classes_train

print('Class names:', classes)
print('Training data shape:', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape:', X_test.shape)
print('Test labels shape: ', y_test.shape)

Class names: ['angelina jolie', 'anne hathaway', 'barack obama', 'brad pitt', 'cristiano ronaldo', 'emma watson', 'george clooney', 'hillary clinton', 'jennifer aniston', 'johnny depp', 'justin timberlake', 'leonardo dicaprio', 'natalie portman', 'nicole kidman', 'scarlett johansson', 'tom cruise']
Training data shape: (800, 64, 64)
Training labels shape:  (800,)
Test data shape: (160, 64, 64)
Test labels shape:  (160,)

# Visualize some examples from the dataset.
# We show a few examples of training images from each class.
num_classes = len(classes)
samples_per_class = 10
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx])
        plt.axis('off')
        if i == 0:
            plt.title(y)
plt.show()

# Flatten the image data into rows
# We now have one 4096-dimensional feature vector for each example
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
print("Training data shape:", X_train.shape)
print("Test data shape:", X_test.shape)

Training data shape: (800, 4096)
Test data shape: (160, 4096)
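
The flattening step loses nothing: each row is just a 64x64 image read out row by row, and it can be reshaped back at any time (for example, to display a sample). A minimal sketch on a synthetic array, where `images` is a stand-in for X_train:

```python
import numpy as np

# Synthetic stand-in with the same layout as X_train: (N, H, W).
rng = np.random.default_rng(0)
images = rng.random((5, 64, 64))

flat = images.reshape(images.shape[0], -1)  # one 4096-dimensional row per image
restored = flat.reshape(-1, 64, 64)         # back to the original (N, H, W) layout

print(flat.shape)      # (5, 4096)
print(restored.shape)  # (5, 64, 64)
```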


