Python: A Very Basic ray Tutorial

This time I work through the ray tutorial from this site. ray is apparently a module for running code in parallel, and using it seems to speed up processing.

Environment setup

First, create a folder for the tutorial and move into it.

!mkdir ray
cd ray
/home/workspace/ray

Clone the tutorial repository with git.

!git clone https://github.com/ray-project/tutorial.git
Cloning into 'tutorial'...
remote: Counting objects: 436, done.
remote: Compressing objects: 100% (59/59), done.
remote: Total 436 (delta 25), reused 83 (delta 24), pack-reused 351
Receiving objects: 100% (436/436), 238.39 KiB | 0 bytes/s, done.
Resolving deltas: 100% (226/226), done.
Checking connectivity... done.

Move into the tutorial folder.

cd tutorial
/home/workspace/ray/tutorial

Exercise 1 – Simple Data Parallel Example

First, import the modules the exercise needs.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import ray
import time
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-18-c439dcfb2e83> in <module>()
      3 from __future__ import print_function
      4 
----> 5 import ray
      6 import time

ModuleNotFoundError: No module named 'ray'

ray itself, the one module that matters here, is not installed, so install it.

!pip3 install ray
Collecting ray
  Downloading https://files.pythonhosted.org/packages/48/c4/6faa8b99a4ce110b99c441a5c9ffcafd9ed51ddaa0bd603a55dc519abcc8/ray-0.5.2-cp36-cp36m-manylinux1_x86_64.whl (57.8MB)
    100% |################################| 57.8MB 1.4MB/s
Collecting colorama (from ray)
  Downloading https://files.pythonhosted.org/packages/db/c8/7dcf9dbcb22429512708fe3a547f8b6101c0d02137acbd892505aee57adf/colorama-0.3.9-py2.py3-none-any.whl
Collecting redis (from ray)
  Downloading https://files.pythonhosted.org/packages/3b/f6/7a76333cf0b9251ecf49efff635015171843d9b977e4ffcf59f9c4428052/redis-2.10.6-py2.py3-none-any.whl (64kB)
    100% |################################| 71kB 34.7MB/s
Collecting funcsigs (from ray)
  Cache entry deserialization failed, entry ignored
  Using cached https://files.pythonhosted.org/packages/69/cb/f5be453359271714c01b9bd06126eaf2e368f1fddfff30818754b5ac2328/funcsigs-1.0.2-py2.py3-none-any.whl
Requirement already satisfied: six>=1.0.0 in /root/.pyenv/versions/3.6.5/envs/py365/lib/python3.6/site-packages (from ray) (1.11.0)
Requirement already satisfied: pytest in /root/.pyenv/versions/3.6.5/envs/py365/lib/python3.6/site-packages (from ray) (3.6.3)
Requirement already satisfied: pyyaml in /root/.pyenv/versions/3.6.5/envs/py365/lib/python3.6/site-packages (from ray) (3.12)
Collecting psutil (from ray)
  Downloading https://files.pythonhosted.org/packages/7d/9a/1e93d41708f8ed2b564395edfa3389f0fd6d567597401c2e5e2775118d8b/psutil-5.4.7.tar.gz (420kB)
    100% |################################| 430kB 3.1MB/s eta 0:00:01
Collecting flatbuffers (from ray)
  Downloading https://files.pythonhosted.org/packages/71/7b/ac7d5e6f9ad084b6ae9d2db845eee58bf9ee95b3d898ccdde35deb022f84/flatbuffers-2015.12.22.1-py2.py3-none-any.whl
Requirement already satisfied: click in /root/.pyenv/versions/3.6.5/envs/py365/lib/python3.6/site-packages (from ray) (6.7)
Requirement already satisfied: numpy in /root/.pyenv/versions/3.6.5/envs/py365/lib/python3.6/site-packages (from ray) (1.14.4)
Requirement already satisfied: atomicwrites>=1.0 in /root/.pyenv/versions/3.6.5/envs/py365/lib/python3.6/site-packages (from pytest->ray) (1.1.5)
Requirement already satisfied: setuptools in /root/.pyenv/versions/3.6.5/envs/py365/lib/python3.6/site-packages (from pytest->ray) (39.0.1)
Requirement already satisfied: pluggy<0.7,>=0.5 in /root/.pyenv/versions/3.6.5/envs/py365/lib/python3.6/site-packages (from pytest->ray) (0.6.0)
Requirement already satisfied: attrs>=17.4.0 in /root/.pyenv/versions/3.6.5/envs/py365/lib/python3.6/site-packages (from pytest->ray) (18.1.0)
Requirement already satisfied: py>=1.5.0 in /root/.pyenv/versions/3.6.5/envs/py365/lib/python3.6/site-packages (from pytest->ray) (1.5.4)
Requirement already satisfied: more-itertools>=4.0.0 in /root/.pyenv/versions/3.6.5/envs/py365/lib/python3.6/site-packages (from pytest->ray) (4.2.0)
Building wheels for collected packages: psutil
  Running setup.py bdist_wheel for psutil ... done
  Stored in directory: /root/.cache/pip/wheels/e2/9d/ea/1913d16f19bb927c32197308dec69cd8d10b61be8f7e265524
Successfully built psutil
Installing collected packages: colorama, redis, funcsigs, psutil, flatbuffers, ray
Successfully installed colorama-0.3.9 flatbuffers-2015.12.22.1 funcsigs-1.0.2 psutil-5.4.7 ray-0.5.2 redis-2.10.6
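
As a side note, whether a module can be imported can be checked up front instead of waiting for a ModuleNotFoundError. The following is a minimal sketch using the standard library's importlib.util.find_spec. The helper name is_installed is hypothetical and is not part of ray or the tutorial.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be found in this environment.

    Hypothetical helper for illustration; it only looks the module up
    and does not import it (for a top-level name such as "ray").
    """
    return importlib.util.find_spec(module_name) is not None


for name in ("time", "ray"):
    if is_installed(name):
        print("{}: available".format(name))
    else:
        print("{}: missing (try: pip3 install {})".format(name, name))
```

If ray is reported as missing, installing it with pip3 as above and re-running the check should show it as available.
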

With ray installed, run the imports again. This time they succeed.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import ray
import time

Initialize ray.

ray.init(num_cpus=12, ignore_reinit_error=True)
Process STDOUT and STDERR is being redirected to /tmp/raylogs/.
Waiting for redis server at 127.0.0.1:54009 to respond...
Waiting for redis server at 127.0.0.1:35671 to respond...
Starting local scheduler with the following resources: {'CPU': 12, 'GPU': 1}.

======================================================================
View the web UI at http://localhost:8889/notebooks/ray_ui42527.ipynb?token=0dd1665c2947364adbea56de5f963f965adaf061914f0629
======================================================================

{'node_ip_address': '172.17.0.2',
 'redis_address': '172.17.0.2:54009',
 'object_store_addresses': [ObjectStoreAddress(name='/tmp/plasma_store87072214', manager_name='/tmp/plasma_manager12344662', manager_port=20952)],
 'local_scheduler_socket_names': ['/tmp/scheduler68709243'],
 'raylet_socket_names': [],
 'webui_url': 'http://localhost:8889/notebooks/ray_ui42527.ipynb?token=0dd1665c2947364adbea56de5f963f965adaf061914f0629'}

Use ray to speed up the slow code below.

# This function is a proxy for a more interesting and computationally
# intensive function.
def slow_function(i):
    time.sleep(1)
    return i
# Sleep a little to improve the accuracy of the timing measurements below.
# We do this because workers may still be starting up in the background.
time.sleep(2.0)
start_time = time.time()

results = [slow_function(i) for i in range(4)]

end_time = time.time()
duration = end_time - start_time

print('The results are {}. This took {} seconds. Run the next cell to see '
      'if the exercise was done correctly.'.format(results, duration))
The results are [0, 1, 2, 3]. This took 4.004942417144775 seconds. Run the next cell to see if the exercise was done correctly.

Run serially, the four calls take about 4 seconds. Next, turn the function into a ray remote function: decorate it with @ray.remote, launch each call with f.remote(i), and collect the results with ray.get.

@ray.remote
def f(i):
    time.sleep(1)
    return i
# Sleep a little to improve the accuracy of the timing measurements below.
# We do this because workers may still be starting up in the background.
time.sleep(2.0)
start_time = time.time()

results = ray.get([f.remote(i) for i in range(4)])

end_time = time.time()
duration = end_time - start_time

print('The results are {}. This took {} seconds. Run the next cell to see '
      'if the exercise was done correctly.'.format(results, duration))
The results are [0, 1, 2, 3]. This took 1.0047783851623535 seconds. Run the next cell to see if the exercise was done correctly.
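
To see why the run drops from about 4 seconds to about 1 second, here is a rough analogy built only on the standard library. It is not ray: it swaps in concurrent.futures.ThreadPoolExecutor, where submit() stands in for f.remote() and result() stands in for ray.get(). The 0.2-second sleep and the pool size of 4 are my assumptions, chosen so the demo finishes quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_function(i):
    # Shorter than the 1 s sleep in the post so the demo finishes quickly.
    time.sleep(0.2)
    return i


start_time = time.time()
with ThreadPoolExecutor(max_workers=4) as pool:
    # submit() returns a Future right away, so all four calls start
    # before any of them has finished.
    futures = [pool.submit(slow_function, i) for i in range(4)]
    # result() blocks until each value is ready; the waits overlap.
    results = [future.result() for future in futures]
duration = time.time() - start_time

print('The results are {}. This took {:.2f} seconds.'.format(results, duration))
```

The four sleeps overlap, so the total stays close to a single sleep rather than four. Threads only help here because time.sleep releases the interpreter lock. ray instead runs tasks in separate worker processes, so it can also speed up CPU-heavy functions.
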

The tutorial's check cell then verifies the results and the timing.

assert results == [0, 1, 2, 3], 'Did you remember to call ray.get?'
assert duration < 1.1, ('The loop took {} seconds. This is too slow.'
                        .format(duration))
assert duration > 1, ('The loop took {} seconds. This is too fast.'
                      .format(duration))

print('Success! The example took {} seconds.'.format(duration))
Success! The example took 1.0047783851623535 seconds.
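
The timings above match a simple back-of-the-envelope model: if every task takes the same time, n tasks on w workers run in ceil(n / w) waves. The sketch below works through that arithmetic. The function name ideal_duration is hypothetical, and the model ignores scheduling overhead, which is why the measured values come out slightly above 4 and 1 seconds.

```python
import math


def ideal_duration(n_tasks, task_seconds, n_workers):
    """Idealized wall-clock time, ignoring all scheduling overhead.

    Tasks are assumed to run in ceil(n_tasks / n_workers) waves,
    each wave lasting `task_seconds`.
    """
    waves = math.ceil(n_tasks / n_workers)
    return waves * task_seconds


# Four 1-second tasks, as in the exercise, on different worker counts.
for workers in (1, 2, 12):
    print('{} worker(s): {:.1f} s'.format(
        workers, ideal_duration(4, 1.0, workers)))
```

This prints 4.0 s for 1 worker, 2.0 s for 2 workers, and 1.0 s for 12 workers. The assertion window of 1 to 1.1 seconds in the check cell corresponds to the 12-worker case plus a small allowance for overhead.
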

For now, it seems best to work through the exercises using this site as a reference.