スタンフォード/CS131/宿題3-4 RANSACとHOG

前回のStanford University/CS131/宿題3-3 Transformation Estimation(変換推定)の続きをやる。今回の宿題は、RANSACとHOGをカバーしている。

スポンサーリンク

Part 4 RANSAC

Rather than directly feeding all our keypoint matches into fit_affine_matrix function, we can instead use RANSAC (“RANdom SAmple Consensus”) to select only “inliers” to use to compute the transformation matrix.
fit_affine_matrix関数に全てのキーポイントマッチを直接フィードする代わりに、変換行列の算出に使うインライアだけを選択するのにRANSAC(ランダムサンプルコンセンサス)を利用できる。

The steps of RANSAC are(RANSACステップは):

1. Select random set of matches
ランダムなマッチした組を選択する。
2. Compute affine transformation matrix
アフィン変換行列を求める。
3. Find inliers using the given threshold
所与の閾値を用いてインライアを検出する。
4. Repeat and keep the largest set of inliers
繰り返して最大のインライアの組を保存する。
5. Re-compute least-squares estimate on all of the inliers
全てのインライアに対して最少二乗推定を再算出する。

Implement ransac in panorama.py, run through the following code to get a panorama. You can see the difference from the result we get without RANSAC.
panorama.pyにransacを実装して、下のコードを走らせてパノラマ画像を得る。RANSACなし画像との違いを見ることができる。

def ransac(keypoints1, keypoints2, matches, n_iters=200, threshold=20):
    """
    Use RANSAC to find a robust affine transformation
        1. Select random set of matches
        2. Compute affine transformation matrix
        3. Compute inliers
        4. Keep the largest set of inliers
        5. Re-compute least-squares estimate on all of the inliers
    Args:
        keypoints1: M1 x 2 matrix, each row is a point
        keypoints2: M2 x 2 matrix, each row is a point
        matches: N x 2 matrix, each row represents a match
            [index of keypoint1, index of keypoint 2]
        n_iters: the number of iterations RANSAC will run
        threshold: the number of threshold to find inliers
    Returns:
        H: a robust estimation of affine transformation from keypoints2 to
        keypoints 1
    """
    N = matches.shape[0]
    n_samples = int(N * 0.2)
    matched1 = pad(keypoints1[matches[:,0]])
    matched2 = pad(keypoints2[matches[:,1]])
    max_inliers = np.zeros(N)
    n_inliers = 0
    # RANSAC iteration start
    ### YOUR CODE HERE
    idx = np.random.choice(N, n_samples, replace=False)
    for i in range(n_iters):
        H = np.linalg.lstsq(matched2[idx],matched1[idx], rcond=None)[0]
        H[:, 2] = np.array([0, 0, 1])
        c = np.linalg.norm(np.dot(matched2,H)-matched1,axis=-1)**2<threshold
        n_inliers = np.sum(c)
    max_inliers = idx
    ### END YOUR CODE

    return H, matches[max_inliers]
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (15.0, 12.0) # set default size of plots
plt.rcParams["font.size"] = "17"
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
from utils import get_output_space,warp_image,pad,plot_matches
from panorama import harris_corners,describe_keypoints
from skimage.feature import corner_peaks
from panorama import simple_descriptor,match_descriptors,fit_affine_matrix
from skimage.io import imread

img1 = imread('uttower1.jpg', as_gray=True)
img2 = imread('uttower2.jpg', as_gray=True)

# Detect keypoints in two images
keypoints1 = corner_peaks(harris_corners(img1, window_size=3),
                          threshold_rel=0.05,
                          exclude_border=8)
keypoints2 = corner_peaks(harris_corners(img2, window_size=3),
                          threshold_rel=0.05,
                          exclude_border=8)

patch_size = 5
# Extract features from the corners
desc1 = describe_keypoints(img1, keypoints1,
                           desc_func=simple_descriptor,
                           patch_size=patch_size)
desc2 = describe_keypoints(img2, keypoints2,
                           desc_func=simple_descriptor,
                           patch_size=patch_size)

# Match descriptors in image1 to those in image2
matches = match_descriptors(desc1, desc2, 0.7)

# Extract matched keypoints
p1 = keypoints1[matches[:,0]]
p2 = keypoints2[matches[:,1]]

# Find affine transformation matrix H that maps p2 to p1
H = fit_affine_matrix(p1, p2)

H, robust_matches = ransac(keypoints1, keypoints2, matches, threshold=1)

# Visualize robust matches
fig, ax = plt.subplots(1, 1, figsize=(18, 14))
plot_matches(ax, img1, img2, keypoints1, keypoints2, robust_matches)
plt.axis('off')
plt.show()

plt.imshow(imread('solution_ransac.png'))
plt.axis('off')
plt.title('RANSAC Solution')
plt.show()
output_shape, offset = get_output_space(img1, [img2], [H])

# Warp images into output sapce
img1_warped = warp_image(img1, np.eye(3), output_shape, offset)
img1_mask = (img1_warped != -1) # Mask == 1 inside the image
img1_warped[~img1_mask] = 0     # Return background values to 0

img2_warped = warp_image(img2, H, output_shape, offset)
img2_mask = (img2_warped != -1) # Mask == 1 inside the image
img2_warped[~img2_mask] = 0     # Return background values to 0

# Plot warped images
plt.subplot(1,2,1)
plt.imshow(img1_warped)
plt.title('Image 1 warped')
plt.axis('off')

plt.subplot(1,2,2)
plt.imshow(img2_warped)
plt.title('Image 2 warped')
plt.axis('off')
plt.show()
merged = img1_warped + img2_warped

# Track the overlap by adding the masks together
overlap = (img1_mask * 1.0 +  # Multiply by 1.0 for bool -> float conversion
           img2_mask)

# Normalize through division by `overlap` - but ensure the minimum is 1
normalized = merged / np.maximum(overlap, 1)
plt.imshow(normalized)
plt.axis('off')
plt.show()

plt.imshow(imread('solution_ransac_panorama.png'))
plt.axis('off')
plt.title('RANSAC Panorama Solution')
plt.show()

Part 5 Histogram of Oriented Gradients (HOG)

In the above code, you are using the simple_descriptor, and in this section, you are going to implement a simplified version of HOG descriptor.
上のコードではsimple_descriptorを使用し、このセクションではHOG記述子の簡潔版を実装する。
HOG stands for Histogram of Oriented Gradients. In HOG descriptor, the distribution ( histograms ) of directions of gradients ( oriented gradients ) are used as features. Gradients ( x and y derivatives ) of an image are useful because the magnitude of gradients is large around edges and corners ( regions of abrupt intensity changes ) and we know that edges and corners pack in a lot more information about object shape than flat regions.
HOGは方向勾配ヒストグラムを意味する。HOG記述子では、勾配の方向(方向勾配)の分布(ヒストグラム)は特徴として使われる。画像の勾配(x・y微分係数)は、勾配の大きさがエッジとコーナー周辺で大きく(急激に強度が変わる)、エッジとコーナーが平坦領域よりも物体の形状に関する情報を多く詰め込んでいるので有益である。

The steps of HOG are(以下がHOGのステップ):

1. compute the gradient image in x and y directions
Use the sobel filter provided by skimage.filters
xとy方向の勾配画像を求める。
skimage.filtersのsobel filterを利用する。
2. compute gradient histograms
Divide image into cells, and calculate histogram of gradients in each cell.
勾配ヒストグラムを求める。
画像をセルに分割し、各セルの勾配ヒストグラムを計算する。
3. Flatten block of histograms into feature vector
  ヒストグラムのブロックを特徴ベクトルに平坦化する。 
4. Normalize flattened block
平坦化したブロックを正規化する。

Implement hog_descriptor in panorama.py, and run through the following code to get a panorama image.
panorama.pyにhog_descriptorを実装し、以下のコードを全て実行してパノラマ画像を獲得する。

def hog_descriptor(patch, pixels_per_cell=(8,8)):
    """
    Generating hog descriptor by the following steps:
    1. compute the gradient image in x and y (already done for you)
    2. compute gradient histograms
    3. normalize across block 
    4. flattening block into a feature vector
    Args:
        patch: grayscale image patch of shape (h, w)
        pixels_per_cell: size of a cell with shape (m, n)
    Returns:
        block: 1D array of shape ((h*w*n_bins)/(m*n))
    """
    assert (patch.shape[0] % pixels_per_cell[0] == 0),\
                'Heights of patch and cell do not match'
    assert (patch.shape[1] % pixels_per_cell[1] == 0),\
                'Widths of patch and cell do not match'
    n_bins = 9
    degrees_per_bin = 180 // n_bins
    Gx = filters.sobel_v(patch)
    Gy = filters.sobel_h(patch)   
    # Unsigned gradients
    G = np.sqrt(Gx**2 + Gy**2)
    theta = (np.arctan2(Gy, Gx) * 180 / np.pi) % 180
    G_cells = view_as_blocks(G, block_shape=pixels_per_cell)
    theta_cells = view_as_blocks(theta, block_shape=pixels_per_cell)
    rows = G_cells.shape[0]
    cols = G_cells.shape[1]
    cells = np.zeros((rows, cols, n_bins))
    # Compute histogram per cell
    ### YOUR CODE HERE
    for i in range(rows):
        for j in range(cols):
            hist,_ = np.histogram(theta_cells[i,j], \
             bins=n_bins,range=(0,180),weights=G_cells[i,j])
            cells[i, j] = hist
    cells = (cells - np.mean(cells))/np.std(cells)
    block = cells.ravel()
    ### YOUR CODE HERE    
    return block
img1 = imread('uttower1.jpg', as_gray=True)
img2 = imread('uttower2.jpg', as_gray=True)

# Detect keypoints in both images
keypoints1 = corner_peaks(harris_corners(img1, window_size=3),
                          threshold_rel=0.05,
                          exclude_border=8)
keypoints2 = corner_peaks(harris_corners(img2, window_size=3),
                          threshold_rel=0.05,
                          exclude_border=8)
from skimage import filters
from skimage.util.shape import view_as_blocks
# Extract features from the corners

desc1 = describe_keypoints(img1, keypoints1,
                           desc_func=hog_descriptor,
                           patch_size=16)
desc2 = describe_keypoints(img2, keypoints2,
                           desc_func=hog_descriptor,
                           patch_size=16)
# Match descriptors in image1 to those in image2
matches = match_descriptors(desc1, desc2, 0.7)

# Plot matches
fig, ax = plt.subplots(1, 1, figsize=(15, 12))
ax.axis('off')
plot_matches(ax, img1, img2, keypoints1, keypoints2, matches)
plt.show()
plt.imshow(imread('solution_hog.png'))
plt.axis('off')
plt.title('HOG descriptor Solution')
plt.show()
H, robust_matches = ransac(keypoints1, keypoints2, matches, threshold=1)

# Plot matches
fig, ax = plt.subplots(1, 1, figsize=(15, 12))
plot_matches(ax, img1, img2, keypoints1, keypoints2, robust_matches)
plt.axis('off')
plt.show()

plt.imshow(imread('solution_hog_ransac.png'))
plt.axis('off')
plt.title('HOG descriptor Solution')
plt.show()
output_shape, offset = get_output_space(img1, [img2], [H])
print(output_shape, offset)

# Warp images into output sapce
img1_warped = warp_image(img1, np.eye(3), output_shape, offset)
img1_mask = (img1_warped != -1) # Mask == 1 inside the image
img1_warped[~img1_mask] = 0     # Return background values to 0

img2_warped = warp_image(img2, H, output_shape, offset)
img2_mask = (img2_warped != -1) # Mask == 1 inside the image
img2_warped[~img2_mask] = 0     # Return background values to 0

# Plot warped images
plt.subplot(1,2,1)
plt.imshow(img1_warped)
plt.title('Image 1 warped')
plt.axis('off')

plt.subplot(1,2,2)
plt.imshow(img2_warped)
plt.title('Image 2 warped')
plt.axis('off')
plt.show()
[438 913] [0. 0.]
merged = img1_warped + img2_warped

# Track the overlap by adding the masks together
overlap = (img1_mask * 1.0 +  # Multiply by 1.0 for bool -> float conversion
           img2_mask)

# Normalize through division by `overlap` - but ensure the minimum is 1
normalized = merged / np.maximum(overlap, 1)
plt.imshow(normalized)
plt.axis('off')
plt.show()

plt.imshow(imread('solution_hog_panorama.png'))
plt.axis('off')
plt.title('HOG Descriptor Panorama Solution')
plt.show()

Extra Credit: Better Image Merging

You will notice the blurry region and unpleasant lines in the middle of the final panoramic image. In the cell below, come up with a better merging scheme to make the panorama look more natural. Be creative!
完成したパノラマ画像の真ん中に、ぼやけた領域と目障りな線に気付くだろう。下のセルに、パノラマ画像がもっと自然に見えるようにより良い結合スキームを考え出せ。クリエイティブであれ!

# Modify the code below
# np.set_printoptions(threshold='nan')
### YOUR CODE HERE
merged = img1_warped+img2_warped
overlap = (img1_mask*1.0+img2_mask)
output = merged/np.maximum(overlap,1)
plt.imshow(output)
plt.axis('off')
plt.title('method1')
plt.show()

merged = img1_warped + img2_warped
for i in range(img2_warped.shape[1]):
    if not np.all(img2_mask[:,i]==False):
        break
        
RL = i
RR = img1.shape[1]
RW = RR-RL+1

for i in range(RL,RR+1):
    for j in range(img2_warped.shape[0]):
        a = (i-RL)/(RW)
        a = 1-a
        if img1_mask[j,i] & img2_mask[j,i]:
            merged[j,i]=a*img1_warped[j,i] \
            + (1-a)*img2_warped[j,i]

output = merged
### END YOUR CODE
plt.imshow(output)
plt.axis('off')
plt.title('method2')
plt.show()

上の解はこのサイトを参照させてもらった。色々とコードをいじってみたがよー分からんかった。