Continuing with Stanford University/CS131/Homework-1.

Part 1: Convolutions

# Setup
# __future__ imports must come before every other statement in the cell
from __future__ import print_function

import numpy as np
import matplotlib.pyplot as plt
from time import time
from skimage import io

%matplotlib inline
plt.rcParams['figure.figsize'] = 15.0, 10.0 # set default size of plots
plt.rcParams["font.size"] = "18"
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Open image as grayscale
img = io.imread('dog.jpg', as_gray=True)

Function zero_pad

Let us implement a more efficient version of convolution using NumPy array operations. As shown in the lecture, a convolution can be viewed as a sliding window that computes the sum of the pixel values weighted by the flipped kernel. The faster version will i) zero-pad the image, ii) flip the kernel horizontally and vertically, and iii) compute the weighted sum of the neighborhood at each pixel.
First, implement the function zero_pad in filters.py.

def zero_pad(image, pad_height, pad_width):
    """ Zero-pad an image.
    Ex: a 1x1 image [1] with pad_height = 1, pad_width = 2 becomes:
    
        [[0, 0, 0, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 0]]         of shape (3, 5)

    Args:
        image: numpy array of shape (H, W)
        pad_height: height of the zero padding (top and bottom padding)
        pad_width: width of the zero padding (left and right padding)
    Returns:
        out: numpy array of shape (H+2*pad_height, W+2*pad_width)
    """
    H, W = image.shape
    out = None
    ### YOUR CODE HERE
    out = np.zeros((H+2*pad_height, W+2*pad_width))
    out[pad_height:H+pad_height,pad_width:W+pad_width]=image
    ### END YOUR CODE
    return out
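
As a cross-check on the function above, the same padding can be produced with NumPy's built-in np.pad. This is only a sketch and not part of the assignment's filters.py: the name zero_pad_np is made up for illustration, and a 2-D grayscale image is assumed.

```python
import numpy as np

def zero_pad_np(image, pad_height, pad_width):
    """Hypothetical alternative to zero_pad built on np.pad (2-D input assumed)."""
    # Padding amounts are given as ((top, bottom), (left, right))
    return np.pad(image,
                  ((pad_height, pad_height), (pad_width, pad_width)),
                  mode='constant', constant_values=0)

# The 1x1 example from the docstring: pad_height = 1, pad_width = 2
example = zero_pad_np(np.array([[1]]), 1, 2)
print(example.shape)  # (3, 5)
```

One difference to keep in mind: np.pad keeps the dtype of its input, whereas the np.zeros-based zero_pad always returns a float64 array.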

pad_width = 20 # width of the padding on the left and right
pad_height = 40 # height of the padding on the top and bottom

padded_img = zero_pad(img, pad_height, pad_width)

# Plot your padded dog
plt.subplot(1,2,1)
plt.imshow(padded_img)
plt.title('Padded dog')
plt.axis('off')
# Plot what you should get
solution_img = io.imread('padded_dog.jpg', as_gray=True)
plt.subplot(1,2,2)
plt.imshow(solution_img)
plt.title('What you should get')
plt.axis('off')
plt.show()

Function conv_fast

Next, complete the function conv_fast in filters.py using zero_pad. Run the code below to compare the outputs of the two implementations. conv_fast should run significantly faster than conv_nested: depending on your implementation and computer, conv_nested should take a few seconds and conv_fast should be around 5 times faster.

def conv_fast(image, kernel):
    """ An efficient implementation of convolution filter.
    This function uses element-wise multiplication and np.sum()
    to efficiently compute weighted sum of neighborhood at each
    pixel.
    Hints:
        - Use the zero_pad function you implemented above
        - There should be two nested for-loops
        - You may find np.flip() and np.sum() useful
    Args:
        image: numpy array of shape (Hi, Wi)
        kernel: numpy array of shape (Hk, Wk)
    Returns:
        out: numpy array of shape (Hi, Wi)
    """
    Hi, Wi = image.shape
    Hk, Wk = kernel.shape
    out = np.zeros((Hi, Wi))
    ### YOUR CODE HERE
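    # Pad by half the kernel size so each window is centered on its pixel
    # (exact for odd kernel dimensions)
    image = zero_pad(image, Hk//2, Wk//2)
    # Flip the kernel vertically, then horizontally
    kernel = np.flip(np.flip(kernel, 0), 1)
    for m in range(Hi):
        for n in range(Wi):
            # Weighted sum of the Hk x Wk window whose top-left corner is (m, n)
            out[m,n]=np.sum(image[m:m+Hk,n:n+Wk]*kernel)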
    ### END YOUR CODE
    return out
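
To see why the kernel is flipped, it can help to convolve an image that contains a single bright pixel. The sketch below uses a compact stand-in for conv_fast so that it runs on its own; the name conv_same is hypothetical, np.pad replaces zero_pad, and odd kernel dimensions are assumed.

```python
import numpy as np

def conv_same(image, kernel):
    """Compact stand-in for conv_fast (hypothetical name; odd kernel sizes assumed)."""
    Hi, Wi = image.shape
    Hk, Wk = kernel.shape
    padded = np.pad(image, ((Hk // 2, Hk // 2), (Wk // 2, Wk // 2)), mode='constant')
    flipped = kernel[::-1, ::-1]  # flip vertically and horizontally
    out = np.zeros((Hi, Wi))
    for m in range(Hi):
        for n in range(Wi):
            out[m, n] = np.sum(padded[m:m + Hk, n:n + Wk] * flipped)
    return out

# A single bright pixel in the centre of a 5x5 image
impulse = np.zeros((5, 5))
impulse[2, 2] = 1.0

# An asymmetric kernel, so that a flipped copy would be easy to spot
kernel = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]], dtype=float)

response = conv_same(impulse, kernel)
# The 3x3 centre of the response is the kernel itself, not its flipped copy
print(response[1:4, 1:4])
```

Without the flip, the same loop would compute a cross-correlation, and the bright pixel would be surrounded by a kernel rotated by 180 degrees.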

from filters import conv_nested
# Simple convolution kernel.
# Feel free to change the kernel and to see different outputs.
kernel = np.array(
[
    [1,0,-1],
    [2,0,-2],
    [1,0,-1]
])

t0 = time()
out_fast = conv_fast(img, kernel)
t1 = time()
out_nested = conv_nested(img, kernel)
t2 = time()
# Compare the running time of the two implementations
print("conv_nested: took %f seconds." % (t2 - t1))
print("conv_fast: took %f seconds." % (t1 - t0))

# Plot conv_nested output
plt.subplot(1,2,1)
plt.imshow(out_nested)
plt.title('conv_nested')
plt.axis('off')
# Plot conv_fast output
plt.subplot(1,2,2)
plt.imshow(out_fast)
plt.title('conv_fast')
plt.axis('off')
image = io.imread('dog.jpg', as_gray=True)
solution = io.imread('convoluted_dog.jpg', as_gray=True)
# Make sure that the two outputs are the same
# (compare absolute differences so that negative errors are caught too)
if not (np.max(np.abs(out_fast - out_nested)) < 1e-10):
    print("Different outputs! Check your implementation.")

conv_nested: took 2.702991 seconds.
conv_fast: took 0.551556 seconds.
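
Single-shot timings like the ones above fluctuate from run to run. A possible refinement, sketched here under the assumption of Python 3, is to repeat each call a few times and keep the best time. The helper name best_of is made up for illustration, and np.sum is only a stand-in workload so that the sketch runs on its own.

```python
import numpy as np
from time import perf_counter

def best_of(func, *args, repeats=3):
    """Hypothetical helper: call func(*args) `repeats` times.

    Returns (best elapsed seconds, result of the last call).
    """
    best = float('inf')
    result = None
    for _ in range(repeats):
        start = perf_counter()
        result = func(*args)
        best = min(best, perf_counter() - start)
    return best, result

# Stand-in workload; in the notebook this could be
# best_of(conv_fast, img, kernel) or best_of(conv_nested, img, kernel).
seconds, total = best_of(np.sum, np.ones((200, 200)))
print("np.sum: best of 3 took %f seconds." % seconds)
```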

Function conv_faster

Devise a faster version of convolution and implement conv_faster in filters.py. You will earn extra credit only if conv_faster runs faster (by a fair margin) than conv_fast and outputs the same result.

def conv_faster(image, kernel):
    """
    Args:
        image: numpy array of shape (Hi, Wi)
        kernel: numpy array of shape (Hk, Wk)
    Returns:
        out: numpy array of shape (Hi, Wi)
    """
    Hi, Wi = image.shape
    Hk, Wk = kernel.shape
    out = np.zeros((Hi, Wi))
    ### YOUR CODE HERE
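    image = zero_pad(image, Hk//2, Wk//2)
    kernel = np.flip(np.flip(kernel, 0), 1)
    # Unroll every Hk x Wk neighborhood into one row of a big matrix
    out1 = np.zeros((Hi*Wi, Hk*Wk))
    for i in range(Hi):
        for j in range(Wi):
            out1[i*Wi+j,:] = image[i:i+Hk, j:j+Wk].reshape(1, Hk*Wk)
    # A single matrix-vector product then yields all weighted sums at once
    out = out1.dot(kernel.reshape(Hk*Wk, 1)).reshape(Hi, Wi)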
    ### END YOUR CODE
    return out

t0 = time()
out_fast = conv_fast(img, kernel)
t1 = time()
out_faster = conv_faster(img, kernel)
t2 = time()
# Compare the running time of the two implementations
print("conv_fast: took %f seconds." % (t1 - t0))
print("conv_faster: took %f seconds." % (t2 - t1))
# Plot conv_fast output
plt.subplot(1,2,1)
plt.imshow(out_fast)
plt.title('conv_fast')
plt.axis('off')
# Plot conv_faster output
plt.subplot(1,2,2)
plt.imshow(out_faster)
plt.title('conv_faster')
plt.axis('off')
# Make sure that the two outputs are the same
# (compare absolute differences so that negative errors are caught too)
if not (np.max(np.abs(out_fast - out_faster)) < 1e-10):
    print("Different outputs! Check your implementation.")

conv_fast: took 0.557135 seconds.
conv_faster: took 0.127873 seconds.
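
The implementation above still loops over every pixel to build the unrolled matrix. Another option, which is not the approach used above and is shown only as a sketch, is to loop over the Hk*Wk kernel entries instead and add a shifted, weighted copy of the whole padded image on each pass. The names conv_shifted and conv_reference are hypothetical, np.pad stands in for zero_pad, and odd kernel dimensions are assumed.

```python
import numpy as np

def conv_shifted(image, kernel):
    """Hypothetical alternative: loop over kernel entries, not pixels."""
    Hi, Wi = image.shape
    Hk, Wk = kernel.shape
    padded = np.pad(image, ((Hk // 2, Hk // 2), (Wk // 2, Wk // 2)), mode='constant')
    flipped = kernel[::-1, ::-1]  # flip vertically and horizontally
    out = np.zeros((Hi, Wi))
    for a in range(Hk):
        for b in range(Wk):
            # Add the image shifted by (a, b), weighted by one kernel entry
            out += flipped[a, b] * padded[a:a + Hi, b:b + Wi]
    return out

def conv_reference(image, kernel):
    """Per-pixel reference in the style of conv_fast, used only for checking."""
    Hi, Wi = image.shape
    Hk, Wk = kernel.shape
    padded = np.pad(image, ((Hk // 2, Hk // 2), (Wk // 2, Wk // 2)), mode='constant')
    flipped = kernel[::-1, ::-1]
    out = np.zeros((Hi, Wi))
    for m in range(Hi):
        for n in range(Wi):
            out[m, n] = np.sum(padded[m:m + Hk, n:n + Wk] * flipped)
    return out

# Small random test image and the Sobel-like kernel from the notebook
test_img = np.random.RandomState(0).rand(6, 7)
test_kernel = np.array([[1, 0, -1],
                        [2, 0, -2],
                        [1, 0, -1]], dtype=float)

out_shifted = conv_shifted(test_img, test_kernel)
out_reference = conv_reference(test_img, test_kernel)
print(np.allclose(out_shifted, out_reference))  # True
```

For a 3x3 kernel this needs only 9 Python-level iterations regardless of image size, so the per-pixel interpreter overhead disappears. Whether it actually beats conv_faster on a given machine would have to be measured.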

As you would expect from an extra-credit problem, this one is quite hard.