Can Artificial Intelligence Understand Humor (Sarcasm)?

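For artificial intelligence to become human-like, it has to be able to imitate human emotions. I don't think there is any need to copy the imperfect human brain perfectly, but a sense of humor might be worth having. An AI comedian would be fun, and having an AI come up with gags has a charm of its own. In that sense AI probably should be human-like to some degree, but it would be a problem if it also copied our habit of making mistakes all the time.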

Computers That Get Sarcasm

How Vector Space Mathematics Helps Machines Spot Sarcasm

Back in 1970, the social activist Irina Dunn scribbled a slogan on the back of a toilet cubicle door at the University of Sydney. It said: “A woman needs a man like a fish needs a bicycle.” The phrase went viral and eventually became a famous refrain for the growing feminist movement of the time.

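Gloss: In 1970 the social activist Irina Dunn scribbled a slogan on the back of a toilet cubicle door at the University of Sydney: a woman needs a man the way a fish needs a bicycle. The phrase spread quickly and ended up as a well-known catchphrase of the feminist movement that was gaining momentum at the time.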

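It is impressive that a piece of toilet graffiti spread around the world in an era without the internet. It is also ironic that today there is a certain message board that people describe as toilet graffiti.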

The phrase is also an example of sarcasm. The humor comes from the fact that a fish doesn’t need a bicycle. Most humans have little trouble spotting this. But while various advanced machine learning techniques have helped computers spot other forms of humor, sarcasm still largely eludes them.

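Gloss: The phrase is also an example of sarcasm. The humor lies in the fact that a fish has no need for a bicycle. Most people spot this easily, but although advanced machine learning techniques have helped computers detect other kinds of humor, sarcasm still mostly escapes them.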

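Spotting sarcasm is fairly hard even for people, so getting a computer to understand it may be extremely difficult. Without a sense of humor you cannot take sarcasm as sarcasm in the first place, so I am curious how it can be learned mechanically.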

These other forms of humor can be spotted by looking for, say, positive verbs associated with negative or undesirable situations. And some researchers have used this approach to look for sarcasm.

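Gloss: Those other kinds of humor can be found by looking for, say, positive verbs paired with negative or undesirable situations. Some researchers have applied the same approach to sarcasm.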

Presumably this means humor along the lines of "it's filthy but delicious" or "I hate it but I'll do my best."

But sarcasm is often devoid of sentiment. The phrase above is a good example—it contains no sentiment-bearing words. So a new strategy is clearly needed if computers are ever to spot this kind of joke.

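Gloss: But sarcasm often carries no sentiment at all. The phrase above is a good example: it contains no sentiment-bearing words. So a new approach is clearly needed if computers are ever to catch this kind of joke.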

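A fish needing a bicycle makes no sense whatsoever. Few fish have legs to begin with, and there is also the question of whether a fish could ride a bicycle at all. A computer that tried to take the meaning literally would be left wondering what on earth this is about, and the analysis would break down.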

Today, Aditya Joshi at the Indian Institute of Technology Bombay in India, and a few pals, say they’ve hit on just such a strategy. They say their new approach dramatically improves the ability of computers to spot sarcasm.

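Gloss: Aditya Joshi of the Indian Institute of Technology Bombay and a few colleagues say they have hit on exactly such a strategy, and that their new approach dramatically improves a computer's ability to spot sarcasm.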

Indian programmers are seriously skilled.

Word2Vec

Their method is relatively straightforward. Instead of analyzing the sentiment in a sentence, Joshi and co analyze the similarity of the words. They do this by studying the way words relate to each other in a vast database of Google News stories containing some three million words. This is known as the Word2Vec database.

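Gloss: Their method is relatively simple. Instead of analyzing the sentiment in a sentence, Joshi and colleagues analyze how similar its words are. They do this by studying how words relate to each other in a huge database of Google News stories containing some three million words, known as the Word2Vec database.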

Word2Vec seems to be quite popular, and a web search turns up all sorts of material. Google is clearly very serious about AI.

This database has been analyzed extensively to determine how often words appear next to each other. This allows them to be represented as vectors in a high dimensional space. It turns out that similar words can be represented by similar vectors and that vector space mathematics can capture simple relationships between them. For example, “king – man + woman = queen.”

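Gloss: The database has been analyzed extensively to work out how often words appear next to each other, which lets each word be represented as a vector in a high-dimensional space. Similar words end up with similar vectors, and vector arithmetic can capture simple relationships between them, for example king - man + woman = queen.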

Take the "man" attribute away from king, add the "woman" attribute, and you get queen. Subtract man from prince and add woman, and you get princess.
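The arithmetic can be tried out on a toy scale. The sketch below is only an illustration: the two-dimensional vectors are made up by hand (one axis standing for "royalty", one for "gender"), whereas real Word2Vec vectors are learned from text and have hundreds of dimensions. The names `TOY_VECTORS`, `cosine` and `analogy` are hypothetical helpers, not part of any library.

```python
import math

# Hand-made toy vectors (an assumption for illustration, not real Word2Vec data).
# Axis 0 ~ "royalty", axis 1 ~ "gender" (+1 male, -1 female).
TOY_VECTORS = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "prince": [0.8, 1.0],
    "princess": [0.8, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def analogy(a, b, c, vectors=TOY_VECTORS):
    """Return the word whose vector is closest to a - b + c.

    The three input words are excluded from the candidates, which is the
    usual convention for this kind of query.
    """
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("king", "man", "woman"))
print(analogy("prince", "man", "woman"))
```

With these toy numbers the two queries come out as queen and princess. With real embeddings the nearest neighbour is found the same way, but the result depends on the training data.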

Although there are clear differences between the words “man” and “woman,” they occupy similar parts of the vector space. However, the words bicycle and fish occupy entirely different parts of the space and so are thought of as very different.

According to Joshi and co, sentences that contrast similar concepts with dissimilar ones are more likely to be sarcastic.

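Gloss: The words "man" and "woman" clearly differ, yet they sit in similar parts of the vector space. "Bicycle" and "fish", by contrast, sit in entirely different parts of the space and so are treated as very different. According to Joshi and colleagues, a sentence that contrasts similar concepts with dissimilar ones is more likely to be sarcastic.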

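Man and woman are similar concepts while fish and bicycle are dissimilar ones, so the two pairs form a contrast. In other words, a sentence containing this kind of combination is apparently more likely to be sarcastic.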
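One way to picture the contrast is to score every word pair in the slogan and look at the two extremes. This is a minimal sketch under stated assumptions: the three-dimensional vectors are invented for the example, and `similarity_extremes` is a hypothetical helper, not the feature set the researchers actually used.

```python
import math
from itertools import combinations

# Invented toy vectors (assumption): axes loosely mean [person, animal, vehicle].
TOY_VECTORS = {
    "man": [1.0, 0.1, 0.1],
    "woman": [0.9, 0.1, 0.1],
    "fish": [0.0, 1.0, 0.0],
    "bicycle": [0.0, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def similarity_extremes(words, vectors=TOY_VECTORS):
    """Return ((score, w1, w2) of the most similar pair, same for the least similar).

    Words without a vector (such as "a" or "needs" here) are skipped.
    """
    known = sorted({w for w in words if w in vectors})
    scored = [
        (cosine(vectors[a], vectors[b]), a, b) for a, b in combinations(known, 2)
    ]
    return max(scored), min(scored)


slogan = "a woman needs a man like a fish needs a bicycle".split()
most_similar, least_similar = similarity_extremes(slogan)
print("most similar pair: ", most_similar[1:], round(most_similar[0], 3))
print("least similar pair:", least_similar[1:], round(least_similar[0], 3))
```

In this toy setup the most similar pair is man/woman and the least similar pair is bicycle/fish, which is the kind of gap the article describes.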

To test this idea, they study the similarity between words in a database of quotes on the Goodreads website. The team chose only quotes that have been tagged “sarcastic” by readers and, as a control, also include quotes tagged as “philosophy.” This results in a database of 3,629 quotes, of which 759 are sarcastic. The team then compared the word vectors in each quote looking for similarities and differences.

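Gloss: To test the idea, they studied the similarity between words in a database of quotes from the Goodreads website. The team picked only quotes that readers had tagged "sarcastic" and, as a control, added quotes tagged "philosophy". The result is a database of 3,629 quotes, 759 of them sarcastic. The team then compared the word vectors within each quote, looking for similarities and differences.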

I had never heard of Goodreads, but I take my hat off to the Indian researchers' knack for using resources like this to get research done cheaply.
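The two figures quoted for the dataset, 3,629 quotes with 759 tagged sarcastic, already say something about how results on it should be read. The short calculation below uses only those two numbers; the "always answer not sarcastic" baseline is my own illustration and does not come from the article.

```python
# Figures taken from the article.
TOTAL_QUOTES = 3629
SARCASTIC_QUOTES = 759

# Everything that is not tagged sarcastic is the "philosophy" control set.
non_sarcastic = TOTAL_QUOTES - SARCASTIC_QUOTES
sarcastic_share = SARCASTIC_QUOTES / TOTAL_QUOTES

# Hypothetical baseline (my assumption): a detector that never says "sarcastic".
always_no_accuracy = non_sarcastic / TOTAL_QUOTES

print(f"non-sarcastic quotes: {non_sarcastic}")
print(f"sarcastic share:      {sarcastic_share:.1%}")
print(f"'always no' accuracy: {always_no_accuracy:.1%}")
```

Only about one quote in five is sarcastic, so a detector that never flags anything would still be right roughly four times out of five. An accuracy figure on its own would therefore say little about how well sarcasm is actually being caught.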

The results make for interesting reading. Joshi and co say this word embedding approach is significantly better than other techniques at spotting sarcasm. “We observe an improvement in sarcasm detection,” they say.

The new approach isn’t perfect, of course. And the errors it makes are instructive. For example, it did not spot the sarcasm in the following quote: “Great. Relationship advice from one of America’s most wanted.”

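Gloss: The results make for interesting reading. Joshi and colleagues say the word embedding approach is significantly better at spotting sarcasm than other techniques, and they report observing "an improvement in sarcasm detection". The new approach is not perfect, of course, and its errors are instructive. For example, it missed the sarcasm in this quote: "Great. Relationship advice from one of America's most wanted."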

It is not every day that you get relationship advice from a most-wanted fugitive. How lucky. The wisdom of one of life's great success stories is a gift.

The Sarcasm Detection Algorithm

That’s probably because many of these words have multiple meanings that the Word2Vec embedding does not capture.

Another sarcastic sentence it fails to spot is: “Oh, and I suppose the apple ate the cheese.” In this case, apple and cheese have a high similarity score and none of the word pairs shows a meaningful difference. So this example does not follow the rule that the algorithm is designed to search for.

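Gloss: This is probably because many of these words have multiple meanings that the Word2Vec embedding does not capture. Another sarcastic sentence the method misses is "Oh, and I suppose the apple ate the cheese." Here apple and cheese have a high similarity score and no word pair shows a meaningful difference, so the example does not fit the rule the algorithm is designed to look for.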

Word2Vec embedding = Word2Vecの埋め込み, that is, the vector representation Word2Vec assigns to each word

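With no contrasting word pair, there is nothing the method can do. It needs both a high similarity score and a low one. Here the sentence has few words to work with, and an apple eating cheese is nonsense anyway, so it is too much to ask of a machine.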
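The miss on the apple sentence can be reproduced with a deliberately crude rule: flag a sentence when the gap between its most similar and least similar word pair is large. Everything here is an assumption made for illustration. The vectors are invented, `contrast` and `looks_sarcastic` are hypothetical helpers, and the 0.5 threshold is arbitrary; the article does not say how the similarity scores are turned into a decision.

```python
import math
import re
from itertools import combinations

# Invented toy vectors (assumption): axes loosely mean [person, food/animal, vehicle].
TOY_VECTORS = {
    "man": [1.0, 0.1, 0.1],
    "woman": [0.9, 0.1, 0.1],
    "fish": [0.0, 1.0, 0.0],
    "bicycle": [0.0, 0.0, 1.0],
    "apple": [0.0, 1.0, 0.1],
    "cheese": [0.0, 0.9, 0.2],
    "ate": [0.1, 0.8, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def contrast(sentence, vectors=TOY_VECTORS):
    """Gap between the highest and lowest pairwise similarity in a sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    known = sorted({w for w in words if w in vectors})
    sims = [cosine(vectors[a], vectors[b]) for a, b in combinations(known, 2)]
    if not sims:  # fewer than two known words: nothing to compare
        return 0.0
    return max(sims) - min(sims)


def looks_sarcastic(sentence, threshold=0.5):
    """Stand-in decision rule: a large similarity gap counts as sarcasm."""
    return contrast(sentence) > threshold


for quote in (
    "A woman needs a man like a fish needs a bicycle.",
    "Oh, and I suppose the apple ate the cheese.",
):
    print(f"{looks_sarcastic(quote)!s:5}  gap={contrast(quote):.3f}  {quote}")
```

The fish slogan has a wide gap and is flagged, while every pair in the apple sentence is about equally similar, so the gap stays tiny and the sentence slips through. That matches the failure described above: the joke is in the meaning, not in any contrast between word pairs.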

The algorithm also incorrectly identifies some sentences as sarcastic. Joshi and co point to this one, for example: “Oh my love, I like to vanish in you like a ripple vanishes in an ocean—slowly, silently and endlessly.”

Humans had not tagged this as sarcastic. However, it is not hard to imagine this sentence being used sarcastically.

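Gloss: The algorithm also wrongly flags some sentences as sarcastic. Joshi and colleagues point to this one: "Oh my love, I like to vanish in you like a ripple vanishes in an ocean, slowly, silently and endlessly." Readers had not tagged it as sarcastic, yet it is not hard to imagine the sentence being used sarcastically.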

not hard to imagine = 想像に難くない

To be fair, it does sound more sarcastic than poetic. Call it smarmy or a bit stalker-like; either way you could say it is a creepy sentence.

Overall, this is interesting work which raises some directions for future research. In particular, it would be fascinating to use this kind of algorithm to create sarcastic sentences and perhaps use human judges to decide whether or not they work in this sense.

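Gloss: Overall this is interesting work that points to several directions for future research. In particular, it would be fascinating to use this kind of algorithm to generate sarcastic sentences and then have human judges decide whether they actually work as sarcasm.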

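Sarcasm written by a computer sounds fun. Then again, someone might remark sarcastically that people who have machines write sarcasm and then sit in judgment on it must have a lot of time on their hands.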

Beyond that is the task of computational humor itself. That’s an ambitious goal but perhaps one that is not entirely out of reach. Much humor is formulaic so an algorithm ought to be able to apply such a formula with ease. Yeah, right!

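Gloss: Beyond that lies the task of computational humor itself. It is an ambitious goal, but perhaps not entirely out of reach. Much humor is formulaic, so an algorithm ought to be able to apply such a formula with ease. Yeah, right!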

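The ultimate form would be a computer stand-up comedian, but that is out of reach at this stage. A computer satirical cartoonist within five years, on the other hand, looks to me like a safe bet. Either way, it is clear that machine learning is a fascinating and rewarding field. I look forward to seeing more companies raise an AI like Watson, and training a machine of my own might be fun too.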