Starting today, I'm working through Stanford University CS131, Homework 2. This homework covers the Canny edge detector and the Hough transform.
from __future__ import print_function

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from time import time
from skimage import io

%matplotlib inline
plt.rcParams['figure.figsize'] = 18.0, 11.0  # set default size of plots
plt.rcParams["font.size"] = "18"
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
Part 1: Canny Edge Detector
In this part, you are going to implement the Canny edge detector. The Canny edge detection algorithm can be broken down into five steps:
- Smoothing
- Finding gradients
- Non-maximum suppression
- Double thresholding
- Edge tracking by hysteresis
Smoothing
Implementation
We first smooth the input image by convolving it with a Gaussian kernel. The equation for a Gaussian kernel of size $(2k+1)\times(2k+1)$ is given by:
$$h_{ij}=\frac{1}{2\pi\sigma^2}\exp{\Bigl(-\frac{(i-k)^2+(j-k)^2}{2\sigma^2}\Bigr)}, 0\leq i,j < 2k+1$$
Implement conv and gaussian_kernel in edge.py and run the code below.
def conv(image, kernel):
    """ An implementation of convolution filter.
    This function uses element-wise multiplication and np.sum()
    to efficiently compute weighted sum of neighborhood at each
    pixel.
    Args:
        image: numpy array of shape (Hi, Wi)
        kernel: numpy array of shape (Hk, Wk)
    Returns:
        out: numpy array of shape (Hi, Wi)
    """
    Hi, Wi = image.shape
    Hk, Wk = kernel.shape
    out = np.zeros((Hi, Wi))

    # For this assignment, we will use edge values to pad the images.
    # Zero padding will make derivatives at the image boundary very big,
    # whereas we want to ignore the edges at the boundary.
    pad_width0 = Hk // 2
    pad_width1 = Wk // 2
    pad_width = ((pad_width0, pad_width0), (pad_width1, pad_width1))
    padded = np.pad(image, pad_width, mode='edge')

    ### YOUR CODE HERE
    # Flip the kernel so the loop below performs true convolution
    kernel = np.flip(np.flip(kernel, 0), 1)
    for i in range(Hi):
        for j in range(Wi):
            out[i, j] = np.sum(padded[i:i+Hk, j:j+Wk] * kernel)
    ### END YOUR CODE

    return out
def gaussian_kernel(size, sigma):
    """ Implementation of Gaussian Kernel.
    This function follows the gaussian kernel formula,
    and creates a kernel matrix.
    Hints:
    - Use np.pi and np.exp to compute pi and exp
    Args:
        size: int of the size of output matrix
        sigma: float of sigma to calculate kernel
    Returns:
        kernel: numpy array of shape (size, size)
    """
    kernel = np.zeros((size, size))

    ### YOUR CODE HERE
    k = size // 2
    for i in range(size):
        for j in range(size):
            kernel[i, j] = (1 / (2 * np.pi * sigma**2)) * \
                np.exp(-((i-k)**2 + (j-k)**2) / (2 * sigma**2))
    ### END YOUR CODE

    return kernel
# Define 3x3 Gaussian kernel with std = 1
kernel = gaussian_kernel(3, 1)
kernel_test = np.array(
[[ 0.05854983, 0.09653235, 0.05854983],
[ 0.09653235, 0.15915494, 0.09653235],
[ 0.05854983, 0.09653235, 0.05854983]]
)
# Test Gaussian kernel
if not np.allclose(kernel, kernel_test):
    print('Incorrect values! Please check your implementation.')
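As a quick cross-check of the formula above (not part of the required loop implementation), the same kernel can be built in a vectorized way. This is only a sketch; the helper name gaussian_kernel_vec is my own, and it should reproduce kernel_test for size=3, sigma=1.
def gaussian_kernel_vec(size, sigma):
    # Vectorized form of the same formula, evaluated on a grid of (i, j) indices
    k = size // 2
    i, j = np.mgrid[0:size, 0:size]
    return np.exp(-((i - k)**2 + (j - k)**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

print(np.allclose(gaussian_kernel_vec(3, 1), kernel_test))  # expected: True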
# Test with different kernel_size and sigma
kernel_size = 5
sigma = 1.4
# Load image
img = io.imread('iguana.png', as_gray=True)
# Define 5x5 Gaussian kernel with std = sigma
kernel = gaussian_kernel(kernel_size, sigma)
# Convolve image with kernel to achieve smoothed effect
smoothed = conv(img, kernel)
plt.subplot(1,2,1)
plt.imshow(img)
plt.title('Original image')
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(smoothed)
plt.title('Smoothed image')
plt.axis('off')
plt.show()
What is the effect of the kernel_size and sigma?
The larger the kernel size and sigma, the blurrier the smoothed image becomes: a larger sigma averages intensities over a wider neighbourhood, and the kernel size has to grow with it so that the kernel is not truncated.
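To see this directly, the functions defined above can be reused on the already-loaded iguana image. A minimal sketch (the particular size/sigma pairs are arbitrary choices for illustration):
# Blur the same image with a small and a large Gaussian and compare side by side.
for idx, (size, sig) in enumerate([(3, 1.0), (9, 3.0)], start=1):
    plt.subplot(1, 2, idx)
    plt.imshow(conv(img, gaussian_kernel(size, sig)))
    plt.title('size=%d, sigma=%.1f' % (size, sig))
    plt.axis('off')
plt.show()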
Finding gradients
The gradient of a 2D scalar function $I:\mathbb{R}^2\rightarrow{\mathbb{R}}$ in Cartesian coordinates is defined by:
$$\nabla{I(x,y)}=\bigl[\frac{\partial{I}}{\partial{x}},\frac{\partial{I}}{\partial{y}}\bigr],$$
where
$$
\frac{\partial{I(x,y)}}{\partial{x}}=\lim_{\Delta{x}\to{0}}\frac{I(x+\Delta{x},y)-I(x,y)}{\Delta{x}} \\
\frac{\partial{I(x,y)}}{\partial{y}}=\lim_{\Delta{y}\to{0}}\frac{I(x,y+\Delta{y})-I(x,y)}{\Delta{y}}.
$$
In the case of images, we can approximate the partial derivatives by taking differences over one-pixel intervals:
$$
\frac{\partial{I(x,y)}}{\partial{x}}\approx{\frac{I(x+1,y)-I(x-1,y)}{2}} \\
\frac{\partial{I(x,y)}}{\partial{y}}\approx{\frac{I(x,y+1)-I(x,y-1)}{2}}
$$
Note that the partial derivatives can be computed by convolving the image $I$ with some appropriate kernels $D_x$ and $D_y$:
$$
\frac{\partial{I}}{\partial{x}}\approx{I*D_x}=G_x \\
\frac{\partial{I}}{\partial{y}}\approx{I*D_y}=G_y
$$
Implementation
Find the kernels $D_x$ and $D_y$ and implement partial_x and partial_y using conv defined in edge.py.
- Hint: Remember that convolution flips the kernel (see the small check below).
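A small check of what the hint means, using the conv function above on a simple ramp image (the ramp array is just an illustration):
# conv() implements true convolution, which flips the kernel before sliding it.
# To get (I(x+1, y) - I(x-1, y)) / 2 we therefore pass [[0.5, 0, -0.5]],
# not [[-0.5, 0, 0.5]].
ramp = np.tile(np.arange(5.0), (3, 1))               # intensity increases to the right
print(conv(ramp, np.array([[0.5, 0.0, -0.5]]))[1])   # interior values are +1.0
print(conv(ramp, np.array([[-0.5, 0.0, 0.5]]))[1])   # interior values are -1.0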
def partial_x(img):
    """ Computes partial x-derivative of input img.
    Hints:
        - You may use the conv function defined in this file.
    Args:
        img: numpy array of shape (H, W)
    Returns:
        out: x-derivative image
    """
    out = None

    ### YOUR CODE HERE
    # D_x written for true convolution (conv flips it):
    # computes (I(x+1, y) - I(x-1, y)) / 2.
    kernel = np.array([[.5, 0., -.5]])
    out = conv(img, kernel)
    ### END YOUR CODE

    return out
def partial_y(img):
    """ Computes partial y-derivative of input img.
    Hints:
        - You may use the conv function defined in this file.
    Args:
        img: numpy array of shape (H, W)
    Returns:
        out: y-derivative image
    """
    out = None

    ### YOUR CODE HERE
    # Column version of the central difference kernel:
    # computes (I(x, y+1) - I(x, y-1)) / 2.
    kernel = np.array([[.5], [0.], [-.5]])
    out = conv(img, kernel)
    ### END YOUR CODE

    return out
# Test input
I = np.array(
[[0, 0, 0],
[0, 1, 0],
[0, 0, 0]]
)
# Expected outputs
I_x_test = np.array(
[[ 0, 0, 0],
[ 0.5, 0, -0.5],
[ 0, 0, 0]]
)
I_y_test = np.array(
[[ 0, 0.5, 0],
[ 0, 0, 0],
[ 0, -0.5, 0]]
)
# Compute partial derivatives
I_x = partial_x(I)
I_y = partial_y(I)
# Test correctness of partial_x and partial_y
if not np.all(I_x == I_x_test):
    print('partial_x incorrect')
if not np.all(I_y == I_y_test):
    print('partial_y incorrect')
# Compute partial derivatives of smoothed image
Gx = partial_x(smoothed)
Gy = partial_y(smoothed)
plt.subplot(1,2,1)
plt.imshow(Gx)
plt.title('Derivative in x direction')
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(Gy)
plt.title('Derivative in y direction')
plt.axis('off')
plt.show()
What is the reason for performing smoothing prior to computing the gradients?
Derivative filters amplify high-frequency noise: without smoothing, noise in the image produces many strong spurious gradient responses that would later be mistaken for edges. Smoothing first suppresses this noise so that the computed gradients reflect the actual image structure.
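A quick way to see this with the functions already defined (a sketch; it simply compares the x-derivative of the raw and the smoothed iguana image):
plt.subplot(1, 2, 1)
plt.imshow(np.abs(partial_x(img)))
plt.title('|dI/dx| of raw image')
plt.axis('off')
plt.subplot(1, 2, 2)
plt.imshow(np.abs(partial_x(smoothed)))
plt.title('|dI/dx| of smoothed image')
plt.axis('off')
plt.show()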
Implementation
Now, we can compute the magnitude and direction of gradient with the two partial derivatives:
$$
G = \sqrt{G_{x}^{2}+G_{y}^{2}} \\
\Theta = \arctan\bigl(\frac{G_{y}}{G_{x}}\bigr)
$$
Implement gradient in edge.py which takes in an image and outputs $G$ and $\Theta$.
- Hint: Use np.arctan2 to compute $\Theta$.
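A tiny illustration of why np.arctan2 is preferred over np.arctan here (the four sample gradients are arbitrary): it uses the signs of both arguments to resolve the quadrant, and a 180-degree shift maps its (-180, 180] output into the required [0, 360) range.
for gy, gx in [(1, 1), (1, -1), (-1, -1), (-1, 1)]:
    deg = np.rad2deg(np.arctan2(gy, gx))
    print(gy, gx, deg, (deg + 180) % 360)   # quadrant-aware angle, then mapped to [0, 360)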
def gradient(img):
    """ Returns gradient magnitude and direction of input img.
    Args:
        img: Grayscale image. Numpy array of shape (H, W)
    Returns:
        G: Magnitude of gradient at each pixel in img.
            Numpy array of shape (H, W)
        theta: Direction(in degrees, 0 <= theta < 360) of gradient
            at each pixel in img. Numpy array of shape (H, W)
    """
    G = np.zeros(img.shape)
    theta = np.zeros(img.shape)

    ### YOUR CODE HERE
    Gx = partial_x(img)
    Gy = partial_y(img)
    G = np.sqrt(Gx**2 + Gy**2)
    # arctan2 resolves the quadrant from the signs of Gy and Gx; convert to
    # degrees first, then shift by 180 to map (-180, 180] into [0, 360).
    theta = np.arctan2(Gy, Gx)
    theta = (np.rad2deg(theta) + 180) % 360
    ### END YOUR CODE

    return G, theta
G, theta = gradient(smoothed)
if not np.all(G >= 0):
    print('Magnitude of gradients should be non-negative.')
if not np.all((theta >= 0) * (theta < 360)):
    print('Direction of gradients should be in range 0 <= theta < 360')
plt.imshow(G)
plt.title('Gradient magnitude')
plt.axis('off')
plt.show()
Non-maximum suppression
You should notice that the edges extracted from the gradient of the smoothed image are quite thick and blurry. The purpose of this step is to convert these "blurred" edges into "sharp" edges. Basically, this is done by preserving all local maxima in the gradient image and discarding everything else. The algorithm is, for each pixel (x,y) in the gradient image:
- Round the gradient direction $\Theta[y,x]$ to the nearest 45 degrees, corresponding to the use of an 8-connected neighbourhood.
- Compare the edge strength of the current pixel with the edge strength of the pixels in the positive and negative gradient direction (a small sketch after this list shows how each rounded direction maps to this pair of neighbours). For example, if the gradient direction is south (theta=90), compare with the pixels to the north and south.
- If the edge strength of the current pixel is the largest, preserve the value of the edge strength; if not, suppress (i.e. remove) it.
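A small sketch of how each rounded direction maps to a pair of neighbour offsets; this mirrors the sin/cos bookkeeping used in the implementation below, and the loop itself is only illustrative:
for angle in range(0, 360, 45):
    rad = np.deg2rad(angle)
    dy = int(np.round(np.sin(rad)))    # row offset
    dx = int(np.round(np.cos(rad)))    # column offset
    print('theta = %3d -> compare with (i%+d, j%+d) and (i%+d, j%+d)'
          % (angle, dy, dx, -dy, -dx))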
Implement non_maximum_suppression in edge.py.
def non_maximum_suppression(G, theta):
    """ Performs non-maximum suppression
    This function performs non-maximum suppression along the direction
    of gradient (theta) on the gradient magnitude image (G).
    Args:
        G: gradient magnitude image with shape of (H, W)
        theta: direction of gradients with shape of (H, W)
    Returns:
        out: non-maxima suppressed image
    """
    H, W = G.shape
    out = np.zeros((H, W))

    # Round the gradient direction to the nearest 45 degrees
    theta = np.floor((theta + 22.5) / 45) * 45

    ### BEGIN YOUR CODE
    for i in range(1, H-1):
        for j in range(1, W-1):
            # Offsets of the two neighbours along the (rounded) gradient direction
            rad = np.deg2rad(theta[i, j])
            m = np.int64(np.around(np.sin(rad)))
            n = np.int64(np.around(np.cos(rad)))
            p1 = G[i-m, j-n]
            p2 = G[i+m, j+n]
            # Keep the pixel only if it is at least as strong as both neighbours
            if not (G[i, j] >= p1 and G[i, j] >= p2):
                out[i, j] = 0
            else:
                out[i, j] = G[i, j]
    ### END YOUR CODE

    return out
# Test input
g = np.array(
[[0.4, 0.5, 0.6],
[0.3, 0.5, 0.7],
[0.4, 0.5, 0.6]]
)
# Print out non-maximum suppressed output
# varying theta
for angle in range(0, 180, 45):
    print('Thetas:', angle)
    t = np.ones((3, 3)) * angle  # Initialize theta
    print(non_maximum_suppression(g, t))
nms = non_maximum_suppression(G, theta)
plt.imshow(nms)
plt.title('Non-maximum suppressed')
plt.axis('off')
plt.show()