This post continues the NumPy tutorial from the previous post, using this site as a reference.
NDArray
import os
import sys
import glob
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
%precision 4
plt.style.use('ggplot')
x = np.array([1,2,3,4,5,6])
print (x)
print ('dtype', x.dtype)
print ('shape', x.shape)
print ('strides', x.strides)
x.shape = (2,3)
print (x)
print ('dtype', x.dtype)
print ('shape', x.shape)
print ('strides', x.strides)
x = x.astype('complex')
print (x)
print ('dtype', x.dtype)
print ('shape', x.shape)
print ('strides', x.strides)
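The strides printed above are byte offsets, and for a C-contiguous array they follow directly from the shape and the item size. A minimal sketch of that relationship, assuming an explicit `int64` dtype (the default integer type can differ by platform, so it is pinned here for predictable numbers):

```python
import numpy as np

# dtype is pinned to int64 (an assumption for this sketch) so item size is 8 bytes
x = np.arange(1, 7, dtype=np.int64)
print(x.strides)                 # one step along the only axis = one item

m = x.reshape(2, 3)              # same buffer, viewed as 2 rows of 3 items
print(m.strides)                 # row step = 3 items, column step = 1 item

c = m.astype(np.complex128)      # astype copies into a new buffer with 16-byte items
print(c.strides)

# For a C-contiguous 2-D array: strides == (ncols * itemsize, itemsize)
expected = (m.shape[1] * m.itemsize, m.itemsize)
print(m.strides == expected)
print(np.shares_memory(x, m), np.shares_memory(m, c))
```

Reshaping only changes the stride bookkeeping, while `astype` allocates new memory, which is why the strides double when converting to complex.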
Creating arrays
# from lists
x_list = [(i,j) for i in range(2) for j in range(3)]
print (x_list, '\n')
x_array = np.array(x_list)
print (x_array)
# Using convenience functions
print (np.ones((3,2)), '\n')
print (np.zeros((3,2)), '\n')
print (np.eye(3), '\n')
print (np.diag([1,2,3]), '\n')
print (np.fromfunction(lambda i, j: (i-2)**2+(j-2)**2, (5,5)))
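`np.fromfunction` does not call the lambda once per element; it calls it once with whole index grids. A small sketch showing the same result built by hand with `np.indices` (the names `f`, `a` and `b` are just for this illustration):

```python
import numpy as np

shape = (5, 5)
f = lambda i, j: (i - 2) ** 2 + (j - 2) ** 2

a = np.fromfunction(f, shape)    # f receives two (5, 5) coordinate arrays
i, j = np.indices(shape)         # integer row and column index grids
b = f(i, j)                      # same computation, done explicitly

print(a.dtype, b.dtype)          # fromfunction passes float coordinates by default
print(np.array_equal(a, b))      # values agree even though dtypes differ
```

Because the function sees arrays, it has to be written with vectorized operations; a lambda using Python `if` on `i` or `j` would not work here.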
Array indexing
Create a 10×6 array from normal deviates and convert it to ints (integers).
# Create a 10 by 6 array from normal deviates and convert to ints
n, nrows, ncols = 100, 10, 6
xs = np.random.normal(n, 15, size=(nrows, ncols)).astype('int')
xs
Use slice notation.
# Use slice notation
print(xs[0,0])
print(xs[-1,-1])
print(xs[3,:])
print(xs[:,0])
print(xs[::2,::2])
print(xs[2:5,2:5])
Index with a list of integers.
# Indexing with list of integers
print(xs[0, [1,2,4,5]])
# Boolean indexing
print(xs[xs % 2 == 0])
xs[xs % 2 == 0] = 0 # set even entries to zero
print(xs)
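One thing worth checking here is the difference between views and copies. As far as I understand it, basic slices return views, boolean (and integer-list) indexing returns a copy, but assigning through a mask still modifies the original. A minimal sketch on a small throwaway array `a` (not the `xs` above):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

view = a[:, 0]               # basic slice: a view onto a's buffer
view[:] = -1                 # writes through to a
print(a[:, 0])

picked = a[a % 2 == 0]       # boolean indexing: a new 1-D array (a copy)
picked[:] = 99               # changes only the copy, a is untouched
print(np.shares_memory(a, view), np.shares_memory(a, picked))

a[a % 2 == 0] = 0            # assignment through a mask modifies a in place
print(a)
```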
# Extracting lower triangular, diagonal and upper triangular matrices
a = np.arange(16).reshape(4,4)
print (a, '\n')
print (np.tril(a, -1), '\n')
print (np.diag(np.diag(a)), '\n')
print (np.triu(a, 1))
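A quick sanity check on the three pieces: the strictly lower triangle, the diagonal and the strictly upper triangle should add back up to the original matrix. A small sketch (the names `lower`, `diag` and `upper` are mine):

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

lower = np.tril(a, -1)           # k=-1: entries strictly below the main diagonal
diag = np.diag(np.diag(a))       # inner diag extracts the diagonal, outer diag rebuilds a matrix
upper = np.triu(a, 1)            # k=1: entries strictly above the main diagonal

print(np.array_equal(lower + diag + upper, a))
```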
Broadcasting, row, column and matrix operations
Operations across rows, columns, or the entire matrix.
# operations across rows, cols or entire matrix
print(xs.max())
print(xs.max(axis=0)) # max of each col
print(xs.max(axis=1)) # max of each row
# A functional rather than object-oriented approach also works
print(np.max(xs, axis=0))
print(np.max(xs, axis=1))
# broadcasting
xs = np.arange(12).reshape(2,6)
print(xs, '\n')
print(xs * 10, '\n')
# broadcasting just works when doing column-wise operations
col_means = xs.mean(axis=0)
print(col_means, '\n')
print(xs + col_means, '\n')
# but needs a little more work for row-wise operations
row_means = xs.mean(axis=1)[:, np.newaxis]
print(row_means)
print(xs + row_means)
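The reason row-wise operations need extra work is that broadcasting aligns shapes from the trailing axis: `(2, 6)` against `(6,)` matches, but `(2, 6)` against `(2,)` does not. A sketch of that rule, assuming NumPy 1.20 or later for `np.broadcast_shapes`, with `keepdims=True` shown as an alternative to `[:, np.newaxis]`:

```python
import numpy as np

xs = np.arange(12).reshape(2, 6)

col_means = xs.mean(axis=0)                      # shape (6,)
row_means = xs.mean(axis=1)                      # shape (2,)
row_means_kd = xs.mean(axis=1, keepdims=True)    # shape (2, 1), reduced axis is kept

print(np.broadcast_shapes(xs.shape, col_means.shape))

try:
    np.broadcast_shapes(xs.shape, row_means.shape)
    row_ok = True
except ValueError as e:
    row_ok = False
    print("not broadcastable:", e)

print(np.broadcast_shapes(xs.shape, row_means_kd.shape))
```

`keepdims=True` gives the same array as indexing with `np.newaxis`, so either style can be used for the row-wise case.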
# convert matrix to have zero mean and unit standard deviation
# using col summary statistics
print((xs - xs.mean(axis=0))/xs.std(axis=0))
# convert matrix to have zero mean and unit standard deviation
# using row summary statistics
print((xs - xs.mean(axis=1)[:, np.newaxis])/xs.std(axis=1)[:, np.newaxis])
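To confirm that the two expressions above really produce zero mean and unit standard deviation, the result can be checked along the same axis. A minimal sketch (note that `std` uses the population formula, `ddof=0`, by default; `z_cols` and `z_rows` are names I made up):

```python
import numpy as np

xs = np.arange(12).reshape(2, 6)

# standardize using column statistics, then using row statistics
z_cols = (xs - xs.mean(axis=0)) / xs.std(axis=0)
z_rows = (xs - xs.mean(axis=1, keepdims=True)) / xs.std(axis=1, keepdims=True)

print(np.allclose(z_cols.mean(axis=0), 0), np.allclose(z_cols.std(axis=0), 1))
print(np.allclose(z_rows.mean(axis=1), 0), np.allclose(z_rows.std(axis=1), 1))
```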
# broadcasting for outer product
# e.g. create the 12x12 multiplication table
u = np.arange(1, 13)
u[:,None] * u[None,:]
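The broadcasting trick works because a `(12, 1)` column times a `(1, 12)` row stretches to `(12, 12)`. As a cross-check, the result should match `np.outer`; a small sketch:

```python
import numpy as np

u = np.arange(1, 13)
table = u[:, None] * u[None, :]      # (12, 1) * (1, 12) broadcasts to (12, 12)

print(table.shape)
print(np.array_equal(table, np.outer(u, u)))
print(table[11, 11])                 # last entry is 12 * 12
```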
Compute the pairwise distance matrix between the following points.
- (0,0)
- (4,0)
- (4,3)
- (0,3)
def distance_matrix_py(pts):
    """Returns matrix of pairwise Euclidean distances. Pure Python version."""
    n = len(pts)
    p = len(pts[0])
    m = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            s = 0
            for k in range(p):
                s += (pts[i,k] - pts[j,k])**2
            m[i, j] = s**0.5
    return m
def distance_matrix_np(pts):
    """Returns matrix of pairwise Euclidean distances. Vectorized numpy version."""
    return np.sum((pts[None,:] - pts[:, None])**2, -1)**0.5
pts = np.array([(0,0), (4,0), (4,3), (0,3)])
distance_matrix_py(pts)
distance_matrix_np(pts)
# Broadcasting and vectorization are faster than looping
%timeit distance_matrix_py(pts)
%timeit distance_matrix_np(pts)
NumPy vectorization is considerably faster than the Python loop.
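Outside a notebook the `%timeit` magic is not available, so here is a rough sketch of the same check using the standard-library `timeit` module. The four points form a 4-by-3 rectangle, so the distances should be 3, 4 and 5 (the diagonal). The function below restates the vectorized version with the intermediate `diff` array spelled out; the call count of 1000 is an arbitrary choice.

```python
import timeit
import numpy as np

def distance_matrix_np(pts):
    """Pairwise Euclidean distances via broadcasting."""
    diff = pts[:, None, :] - pts[None, :, :]   # shape (n, n, p): all coordinate differences
    return np.sqrt((diff ** 2).sum(axis=-1))   # sum over coordinates, then square root

pts = np.array([(0, 0), (4, 0), (4, 3), (0, 3)])
d = distance_matrix_np(pts)
print(d)

# timeit.timeit plays the role of %timeit when running as a plain script
n_calls = 1000
t = timeit.timeit(lambda: distance_matrix_np(pts), number=n_calls)
print(f"{t / n_calls * 1e6:.1f} microseconds per call")
```

With only four points the gap between the loop and the vectorized version is modest; it should widen as the number of points grows, since the Python loop does n² × p iterations.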