This post works through Stanford University CS231n, Assignment 3: Network Visualization (TensorFlow).

Network Visualization (TensorFlow)¶

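In this notebook we use image gradients to generate new images. When training a model, we define a loss function that measures how unhappy we are with the model's performance. We then use backpropagation to compute the gradient of the loss with respect to the model parameters, and perform gradient descent on the parameters to minimize the loss.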

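Here we do something slightly different. We start from a convolutional neural network that has been pretrained to perform image classification on the ImageNet dataset. We use this model to define a loss function that quantifies our current unhappiness with the image, then use backpropagation to compute the gradient of this loss with respect to the pixels of the image. Finally, keeping the model fixed, we perform gradient descent on the image itself to synthesize a new image that minimizes the loss.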
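To make the idea of optimizing the input while the model stays fixed concrete, here is a minimal NumPy sketch. It is not the notebook's code: the linear "model" `W`, the 8-element "image" `x`, the step size and the iteration count are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                 # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))         # fixed toy "model": 3 classes, 8 "pixels"
x = rng.normal(size=8)              # the "image" we are allowed to change
target = 2                          # class we want the model to predict

def loss_and_grad(x):
    p = softmax(W @ x)
    loss = -np.log(p[target])       # cross-entropy for the target class
    onehot = np.eye(3)[target]
    grad_x = W.T @ (p - onehot)     # d loss / d x, with W held fixed
    return loss, grad_x

initial_loss, _ = loss_and_grad(x)
for _ in range(200):
    _, g = loss_and_grad(x)
    x -= 0.05 * g                   # gradient descent on the input, not on W
final_loss, _ = loss_and_grad(x)
```

Only `x` is updated in the loop; `W` never changes, which mirrors the fixed pretrained network in the notebook.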

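In this notebook we explore three techniques for image generation:

Saliency Maps: A saliency map is a quick way to tell which parts of an image influenced the classification decision made by the network.
Fooling Images: We can perturb an input image so that it looks the same to a human but is misclassified by the pretrained network.
Class Visualization: We can synthesize an image that maximizes the classification score of a particular class. This can give some sense of what the network is looking for when it classifies images of that class.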

# As usual, a bit of setup
#from __future__ import print_function
import time, os, json
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from cs231n.classifiers.squeezenet import SqueezeNet
from cs231n.data_utils import load_tiny_imagenet
from cs231n.image_utils import preprocess_image, deprocess_image
from cs231n.image_utils import SQUEEZENET_MEAN, SQUEEZENET_STD

plt.rcParams['figure.figsize'] = 15, 10 # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

def get_session():
    """Create a session that dynamically allocates memory."""
    # See: https://www.tensorflow.org/tutorials/using_gpu#allowing_gpu_memory_growth
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    session = tf.Session(config=config)
    return session

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
%matplotlib inline

Pretrained Model¶

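All of the image generation experiments start from a convolutional neural network pretrained for image classification on ImageNet. Any model would work here, but for this assignment we use SqueezeNet¹, which achieves accuracy comparable to AlexNet with a significantly smaller parameter count and lower computational complexity.

Using SqueezeNet rather than AlexNet, VGG, or ResNet means the image generation experiments can easily be run on a CPU. The PyTorch SqueezeNet model has been ported to TensorFlow; see cs231n/classifiers/squeezenet.py for the model architecture.

To use SqueezeNet, you first need to download the weights by running squeezenet_tf.sh from the cs231n/datasets directory. If you have already run get_assignment3_data.sh, SqueezeNet has already been downloaded. Once the SqueezeNet model is downloaded, it can be loaded into a new TensorFlow session.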

tf.reset_default_graph()
sess = get_session()

SAVE_PATH = 'cs231n/datasets/squeezenet.ckpt'
#if not os.path.exists(SAVE_PATH):
#    raise ValueError("You need to download SqueezeNet!")
model = SqueezeNet(save_path=SAVE_PATH, sess=sess)

INFO:tensorflow:Restoring parameters from cs231n/datasets/squeezenet.ckpt

Load some ImageNet images¶

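A few example images from the validation set of the ImageNet ILSVRC 2012 Classification dataset are provided. To download them, run get_imagenet_val.sh from the cs231n/datasets directory.

Because they come from the validation set, the pretrained model never saw these images during training. Run the cell below to display some of the images along with their ground-truth labels.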

from cs231n.data_utils import load_imagenet_val
X_raw, y, class_names = load_imagenet_val(num=5)

plt.figure(figsize=(25, 10))
plt.rcParams["font.size"] = "15"
for i in range(5):
    plt.subplot(1, 5, i + 1)
    plt.imshow(X_raw[i])
    plt.title(class_names[y[i]])
    plt.axis('off')
plt.gcf().tight_layout()

Preprocess images¶

The pretrained model assumes that its input is normalized, so we first preprocess the images by subtracting the pixelwise mean and dividing by the pixelwise standard deviation.

X = np.array([preprocess_image(img) for img in X_raw])
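As a rough illustration of this normalization and its inverse, the sketch below uses hypothetical helpers `preprocess_sketch` and `deprocess_sketch`. The per-channel statistics are the commonly used ImageNet values and are an assumption here; the notebook's actual constants and helpers live in cs231n/image_utils.py and may differ in detail.

```python
import numpy as np

# Assumed per-channel statistics (commonly used ImageNet values); the notebook
# imports its own SQUEEZENET_MEAN / SQUEEZENET_STD from cs231n.image_utils.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess_sketch(img_uint8):
    """Map a uint8 image (H, W, 3) in [0, 255] to a normalized float32 array."""
    return (img_uint8.astype(np.float32) / 255.0 - MEAN) / STD

def deprocess_sketch(x):
    """Approximately invert preprocess_sketch and return a uint8 image."""
    img = x * STD + MEAN
    return np.clip(255.0 * img, 0, 255).round().astype(np.uint8)

img = np.random.default_rng(0).integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
x = preprocess_sketch(img)
restored = deprocess_sketch(x)
```

Under this convention, a pixel value of 0 maps to -MEAN/STD and a pixel value of 1 (that is, 255 before scaling) maps to (1 - MEAN)/STD, which matches the bounds passed to np.clip in the class visualization code later in this post.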

Saliency Maps¶

Using this pretrained model, we compute class saliency maps as described in Section 3.1 of ².

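A saliency map tells us the degree to which each pixel in the image affects the classification score for that image. To compute it, we take the gradient of the unnormalized score of the correct class (a scalar) with respect to the pixels of the image. If the image has shape (H, W, 3), this gradient also has shape (H, W, 3); for each pixel, it tells us how much the classification score changes when that pixel changes by a small amount. To obtain the saliency map, we take the absolute value of this gradient and then the maximum over the 3 input channels. The final saliency map therefore has shape (H, W), and all of its entries are non-negative.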
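The reduction from an (H, W, 3) gradient to an (H, W) saliency map can be sketched in NumPy as follows. The random `grads` array is a hypothetical stand-in for the gradient that the notebook obtains from tf.gradients.

```python
import numpy as np

# Hypothetical gradient of the correct-class score with respect to a batch of
# images, shape (N, H, W, 3); in the notebook this comes from tf.gradients.
rng = np.random.default_rng(0)
grads = rng.normal(size=(2, 4, 4, 3))

# Absolute value first, then the maximum over the channel axis -> (N, H, W).
saliency = np.abs(grads).max(axis=3)
```

Taking the absolute value before the maximum means that a strongly negative gradient in one channel counts as much as a strongly positive one.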

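You will need to use the model.classifier Tensor, which holds the scores for each input (the code in this post accesses the scores as model.scores), and you will need to feed values for the model.image and model.labels placeholders when evaluating the gradient.

Open cs231n/classifiers/squeezenet.py and read the documentation to make sure you understand how to use the model. The loss attribute shows an example of how the model is used.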

def compute_saliency_maps(X, y, model):
    """
    Compute a class saliency map using the model for images X and labels y.

    Input:
    - X: Input images, numpy array of shape (N, H, W, 3)
    - y: Labels for X, numpy of shape (N,)
    - model: A SqueezeNet model that will be used to compute the saliency map.

    Returns:
    - saliency: A numpy array of shape (N, H, W) giving the saliency maps for the
    input images.
    """
    saliency = None
    # Compute the score of the correct class for each example.
    # This gives a Tensor with shape [N], the number of examples.
    #
    # Note: this is equivalent to scores[np.arange(N), y] we used in NumPy
    # for computing vectorized losses.
    correct_scores = tf.gather_nd(model.scores,
                                  tf.stack((tf.range(X.shape[0]), model.labels), axis=1))
    ###############################################################################
    # TODO: Implement this function. You should use the correct_scores to compute #
    # the loss, and tf.gradients to compute the gradient of the loss with respect #
    # to the input image stored in model.image.                                   #
    # Use the global sess variable to finally run the computation.                #
    # Note: model.image and model.labels are placeholders and must be fed values  #
    # when you call sess.run().                                                   #
    ###############################################################################
    # The saliency map is defined on the unnormalized correct-class score itself,
    # so differentiate correct_scores directly (no softmax cross-entropy needed).
    # tf.gradients returns a list; [0] keeps the batch axis even when N == 1.
    dx = tf.gradients(correct_scores, model.image)[0]
    abs_dx = tf.abs(dx)
    max_dx = tf.reduce_max(abs_dx, axis=3)
    saliency = sess.run(max_dx, feed_dict={model.image: X, model.labels: y})
    ##############################################################################
    #                             END OF YOUR CODE                               #
    ##############################################################################
    return saliency

Once the implementation in the cell above is complete, run the following to visualize class saliency maps for some example images from the ImageNet validation set.

def show_saliency_maps(X, y, mask):
    mask = np.asarray(mask)
    Xm = X[mask]
    ym = y[mask]

    saliency = compute_saliency_maps(Xm, ym, model)

    for i in range(mask.size):
        plt.subplot(2, mask.size, i + 1)
        plt.imshow(deprocess_image(Xm[i]))
        plt.axis('off')
        plt.title(class_names[ym[i]])
        plt.subplot(2, mask.size, mask.size + i + 1)
        plt.title(mask[i])
        plt.imshow(saliency[i], cmap=plt.cm.hot)
        plt.axis('off')
        plt.gcf().set_size_inches(21, 8)
    plt.show()

mask = np.arange(5)
show_saliency_maps(X, y, mask)

Fooling Images¶

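As discussed in ³, image gradients can also be used to generate “fooling images”. Given an image and a target class, we can perform gradient ascent on the image to maximize the target class score, stopping as soon as the network classifies the image as the target class. Implement the following function to generate fooling images.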
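The stopping rule and the normalized update can be illustrated on a toy problem before touching the real network. In this sketch the linear scorer `W`, the 16-element "image" and the target index are hypothetical; the rows of `W` are orthonormalized purely so that the toy behaves predictably, which is not true of a real CNN.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy linear scorer with orthonormal rows: 5 classes, 16 "pixels".
Q, _ = np.linalg.qr(rng.normal(size=(16, 5)))
W = Q.T
x_orig = rng.normal(size=16)        # stand-in for the original image
target = 3                          # class we want the scorer to predict
learning_rate = 1.0

x_fool = x_orig.copy()
num_steps = 0
for _ in range(100):                # bounded loop, as in the assignment hint
    if np.argmax(W @ x_fool) == target:
        break                       # stop as soon as the scorer is fooled
    g = W[target]                   # gradient of scores[target] w.r.t. x
    x_fool += learning_rate * g / np.linalg.norm(g)   # normalized ascent step
    num_steps += 1

fooled = np.argmax(W @ x_fool) == target
```

Because each step has length `learning_rate`, the total perturbation grows linearly with the number of steps, so stopping early keeps the fooling image close to the original.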

def make_fooling_image(X, target_y, model):
    """
    Generate a fooling image that is close to X, but that the model classifies
    as target_y.

    Inputs:
    - X: Input image, of shape (1, 224, 224, 3)
    - target_y: An integer in the range [0, 1000)
    - model: Pretrained SqueezeNet model

    Returns:
    - X_fooling: An image that is close to X, but that is classified as target_y
    by the model.
    """
    X_fooling = X.copy()
    learning_rate = 1
    ##############################################################################
    # TODO: Generate a fooling image X_fooling that the model will classify as   #
    # the class target_y. Use gradient ascent on the target class score, using   #
    # the model.classifier Tensor to get the class scores for the model.image.   #
    # When computing an update step, first normalize the gradient:               #
    #   dX = learning_rate * g / ||g||_2                                         #
    #                                                                            #
    # You should write a training loop                                           #
    #                                                                            #  
    # HINT: For most examples, you should be able to generate a fooling image    #
    # in fewer than 100 iterations of gradient ascent.                           #
    # You can print your progress over iterations to check your algorithm.       #
    ##############################################################################
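    # Gradient of the target-class score with respect to the input image.
    # tf.gradients returns a list; [0] keeps the (1, H, W, 3) shape of model.image.
    g = tf.gradients(model.scores[0, target_y], model.image)[0]
    # Normalized ascent step: dX = learning_rate * g / ||g||_2
    norm_g = tf.sqrt(tf.reduce_sum(tf.square(g)))
    dx = learning_rate * g / norm_g
    # Cap the loop at 100 iterations (see the HINT above) rather than looping forever.
    for _ in range(100):
        step = sess.run(dx, feed_dict={model.image: X_fooling})
        X_fooling += step
        preds = sess.run(model.scores, feed_dict={model.image: X_fooling})
        if np.argmax(preds[0]) == target_y:
            break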
    ##############################################################################
    #                             END OF YOUR CODE                               #
    ##############################################################################
    return X_fooling

Run the following to generate a fooling image. Feel free to change the idx variable to explore other images.

idx = 0
Xi = X[idx][None]
target_y = 6
X_fooling = make_fooling_image(Xi, target_y, model)

# Make sure that X_fooling is classified as target_y
scores = sess.run(model.scores, {model.image: X_fooling})
assert scores[0].argmax() == target_y, 'The network is not fooled!'

# Show original image, fooling image, and difference
orig_img = deprocess_image(Xi[0])
fool_img = deprocess_image(X_fooling[0])
# Rescale 
plt.figure(figsize=(20, 10))
plt.rcParams["font.size"] = "18"
plt.subplot(1, 4, 1)
plt.imshow(orig_img)
plt.axis('off')
plt.title(class_names[y[idx]])
plt.subplot(1, 4, 2)
plt.imshow(fool_img)
plt.title(class_names[target_y])
plt.axis('off')
plt.subplot(1, 4, 3)
plt.title('Difference')
plt.imshow(deprocess_image((Xi-X_fooling)[0]))
plt.axis('off')
plt.subplot(1, 4, 4)
plt.title('Magnified difference (10x)')
plt.imshow(deprocess_image(10 * (Xi-X_fooling)[0]))
plt.axis('off')
plt.gcf().tight_layout()

Class visualization¶

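Starting from a random noise image and performing gradient ascent on a target class, we can generate an image that the network will recognize as the target class. This idea was first presented in ⁴; ⁵ extended it by suggesting several regularization techniques that can improve the quality of the generated image.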

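Concretely, let $I$ be an image and let $y$ be a target class. Let $s_y(I)$ be the score that the convolutional network assigns to the image $I$ for class $y$; note that these are raw unnormalized scores, not class probabilities. We wish to generate an image $I^*$ that achieves a high score for the class $y$ by solving the problem $$I^* = \arg\max_I s_y(I) - R(I)$$ where $R$ is a (possibly implicit) regularizer. Note the sign of $R(I)$ in the argmax: we want to minimize this regularization term. This optimization problem can be solved with gradient ascent, computing gradients with respect to the generated image. We use (explicit) L2 regularization of the form $$R(I) = \lambda \|I\|_2^2$$ together with implicit regularization, as suggested by ⁶, by periodically blurring the generated image.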
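Written out for the explicit L2 term, one gradient ascent step on this objective looks as follows. The step size $\eta$ is a symbol introduced here for illustration; it plays the role of the learning_rate argument in the code.

```latex
\nabla_I \left( s_y(I) - \lambda \|I\|_2^2 \right) = \nabla_I s_y(I) - 2 \lambda I,
\qquad
I \leftarrow I + \eta \left( \nabla_I s_y(I) - 2 \lambda I \right)
```

The $-2\lambda I$ term pulls every pixel toward zero at each step, which is why the regularizer appears with a minus sign in the gradient.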

Complete the implementation of the create_class_visualization function in the cell below.

from scipy.ndimage import gaussian_filter1d  # scipy.ndimage.filters is a deprecated namespace
def blur_image(X, sigma=1):
    X = gaussian_filter1d(X, sigma, axis=1)
    X = gaussian_filter1d(X, sigma, axis=2)
    return X

def create_class_visualization(target_y, model, **kwargs):
    """
    Generate an image to maximize the score of target_y under a pretrained model.
    
    Inputs:
    - target_y: Integer in the range [0, 1000) giving the index of the class
    - model: A pretrained CNN that will be used to generate the image
    
    Keyword arguments:
    - l2_reg: Strength of L2 regularization on the image
    - learning_rate: How big of a step to take
    - num_iterations: How many iterations to use
    - blur_every: How often to blur the image as an implicit regularizer
    - max_jitter: How much to jitter the image as an implicit regularizer
    - show_every: How often to show the intermediate result
    """
    l2_reg = kwargs.pop('l2_reg', 1e-5)
    learning_rate = kwargs.pop('learning_rate', 25)
    num_iterations = kwargs.pop('num_iterations', 100)
    blur_every = kwargs.pop('blur_every', 10)
    max_jitter = kwargs.pop('max_jitter', 16)
    show_every = kwargs.pop('show_every', 25)

    X = 255 * np.random.rand(224, 224, 3)
    X = preprocess_image(X)[None]
    
    ########################################################################
    # TODO: Compute the loss and the gradient of the loss with respect to  #
    # the input image, model.image. We compute these outside the loop so   #
    # that we don't have to recompute the gradient graph at each iteration #
    #                                                                      #
    # Note: loss and grad should be TensorFlow Tensors, not numpy arrays!  #
    #                                                                      #
    # The loss is the score for the target label, target_y. You should     #
    # use model.classifier to get the scores, and tf.gradients to compute  #
    # gradients. Don't forget the (subtracted) L2 regularization term!     #
    ########################################################################
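    # Score of the target class for the (single) image being optimized.
    target_scores = tf.gather_nd(model.scores, tf.stack((tf.range(X.shape[0]),
                                 model.labels), axis=1))
    # Objective: s_y(I) - l2_reg * ||I||_2^2. Its gradient with respect to the
    # image is d s_y / dI - 2 * l2_reg * I, with the same shape as model.image.
    loss = target_scores - l2_reg * tf.reduce_sum(tf.square(model.image))
    grad = tf.gradients(loss, model.image)[0]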
    ############################################################################
    #                             END OF YOUR CODE                             #
    ############################################################################

    
    for t in range(num_iterations):
        # Randomly jitter the image a bit; this gives slightly nicer results
        ox, oy = np.random.randint(-max_jitter, max_jitter+1, 2)
        Xi = X.copy()
        X = np.roll(np.roll(X, ox, 1), oy, 2)
        
        ########################################################################
        # TODO: Use sess to compute the value of the gradient of the score for #
        # class target_y with respect to the pixels of the image, and make a   #
        # gradient step on the image using the learning rate. You should use   #
        # the grad variable you defined above.                                 #
        #                                                                      #
        # Be very careful about the signs of elements in your code.            #
        ########################################################################
        dx = sess.run(grad, feed_dict={model.image:X, model.labels:np.array([target_y])})
        X += learning_rate*dx
        ############################################################################
        #                             END OF YOUR CODE                             #
        ############################################################################

        # Undo the jitter
        X = np.roll(np.roll(X, -ox, 1), -oy, 2)

        # As a regularizer, clip and periodically blur
        X = np.clip(X, -SQUEEZENET_MEAN/SQUEEZENET_STD, (1.0 - SQUEEZENET_MEAN)/SQUEEZENET_STD)
        if t % blur_every == 0:
            X = blur_image(X, sigma=0.5)

        # Periodically show the image
        if t == 0 or (t + 1) % show_every == 0 or t == num_iterations - 1:
            plt.imshow(deprocess_image(X[0]))
            class_name = class_names[target_y]
            plt.title('%s\nIteration %d / %d' % (class_name, t + 1, num_iterations))
            plt.gcf().set_size_inches(8, 8)
            plt.axis('off')
            plt.show()
    return X
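The jitter used inside the loop is a circular shift that is undone exactly after the gradient step. A minimal NumPy sketch, using a small hypothetical image and a hypothetical max_jitter value:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1, 8, 8, 3))   # stand-in for a (1, H, W, 3) image batch
max_jitter = 3                      # hypothetical value for this small image

# Random offsets along the height (axis 1) and width (axis 2) axes.
ox, oy = rng.integers(-max_jitter, max_jitter + 1, size=2)
jittered = np.roll(np.roll(X, ox, axis=1), oy, axis=2)            # apply shift
restored = np.roll(np.roll(jittered, -ox, axis=1), -oy, axis=2)   # undo shift
```

Because np.roll wraps pixels around the border instead of discarding them, rolling back by the negated offsets recovers the original array exactly.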

Once the implementation in the cell above is complete, run the cell below to generate an image of a tarantula.

target_y = 76 # Tarantula
out = create_class_visualization(target_y, model)

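Try the class visualization on other classes as well. Feel free to play with the hyperparameters to improve the quality of the generated images, but this is entirely optional.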

# target_y = 78 # Tick
# target_y = 187 # Yorkshire Terrier
# target_y = 683 # Oboe
# target_y = 366 # Gorilla
# target_y = 604 # Hourglass
target_y = 604
print(class_names[target_y])
X = create_class_visualization(target_y, model)

hourglass

Reference site: https://github.com/

¹ Iandola et al., “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and < 0.5MB model size”, arXiv 2016.
² Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman, “Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps”, ICLR Workshop 2014.
³ Szegedy et al., “Intriguing properties of neural networks”, ICLR 2014.
⁴ Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman, “Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps”, ICLR Workshop 2014.
⁵ Yosinski et al., “Understanding Neural Networks Through Deep Visualization”, ICML 2015 Deep Learning Workshop.
⁶ Yosinski et al., “Understanding Neural Networks Through Deep Visualization”, ICML 2015 Deep Learning Workshop.