What to do when you get TypeError: must be str, not bytes

While working through the tutorial on this site, I ran into the following error.

mscoco = json.load(open('annotations/captions_train2014.json'))
captionStrings = ['[START] ' + entry['caption'].encode('ascii') for entry in mscoco['annotations']]

print('Number of sentences', len(captionStrings))
print('First sentence in the list', captionStrings[0])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-20f87e91bb2c> in <module>()
      1 mscoco = json.load(open('annotations/captions_train2014.json'))
----> 2 captionStrings = ['[START] ' + entry['caption'].encode('ascii') for entry in mscoco['annotations']]
      3 
      4 print('Number of sentences', len(captionStrings))
      5 print('First sentence in the list', captionStrings[0])

<ipython-input-1-20f87e91bb2c> in <listcomp>(.0)
      1 mscoco = json.load(open('annotations/captions_train2014.json'))
----> 2 captionStrings = ['[START] ' + entry['caption'].encode('ascii') for entry in mscoco['annotations']]
      3 
      4 print('Number of sentences', len(captionStrings))
      5 print('First sentence in the list', captionStrings[0])

TypeError: must be str, not bytes

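In other words, str and bytes cannot be concatenated, so the bytes value has to be converted to str. Printing one element shows that .encode('ascii') produces bytes: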

a = [entry['caption'].encode('ascii') for entry in mscoco['annotations']]
print(a[0])
b'A very clean and well decorated empty bathroom'
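
If the bytes value comes from somewhere you cannot change, decoding it is another way to get a str back. The following is only a minimal sketch: it uses a hard-coded caption in place of the MSCOCO data, and the variable names are made up for illustration.

```python
# Stand-in for entry['caption']; the real value would come from the JSON file.
caption = 'A very clean and well decorated empty bathroom'

encoded = caption.encode('ascii')   # bytes in Python 3
decoded = encoded.decode('ascii')   # back to str

# str + str works, so the prefix can be attached again.
sentence = '[START] ' + decoded
print(type(encoded).__name__, type(decoded).__name__)
print(sentence)
```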

In this case, simply removing .encode('ascii') gives you a str. Or rather, the value never gets converted to bytes in the first place.

a = [entry['caption'] for entry in mscoco['annotations']]
print(a[0])
A very clean and well decorated empty bathroom

import json

mscoco = json.load(open('annotations/captions_train2014.json'))
captionStrings = ['[START] ' + entry['caption'] for entry in mscoco['annotations']]

print('Number of sentences', len(captionStrings))
print('First sentence in the list', captionStrings[0])
Number of sentences 414113
First sentence in the list [START] A very clean and well decorated empty bathroom

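This is probably a difference between Python 2 and Python 3. In Python 2, .encode('ascii') returns a str (Python 2's byte string type), so the concatenation works; in Python 3 it returns bytes, which cannot be added to a str.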
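
One quick way to check this on Python 3 is to look at the types involved and catch the exception. This is just an illustrative sketch with a made-up caption. The exact wording of the error message depends on the Python 3 version, so it is printed here rather than assumed.

```python
caption = 'A very clean and well decorated empty bathroom'
encoded = caption.encode('ascii')

# In Python 3, str and bytes are distinct types.
print(type(caption).__name__, type(encoded).__name__)

try:
    '[START] ' + encoded            # str + bytes
    concatenated = True
except TypeError as err:
    concatenated = False
    print('TypeError:', err)
```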
