The usual module imports.

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from numpy.random import multivariate_normal, permutation
import pandas as pd
from pandas import DataFrame, Series

np.random.seed(20170426)

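Draw 100 normally distributed random numbers with mean -1.5 and standard deviation 1.3 as the blue data set. Its train_t is [1,0,0]. (The second argument of np.random.normal is the standard deviation, not the variance.)

Draw 100 normally distributed random numbers with mean 0.5 and standard deviation 0.8 as the red data set. Its train_t is [0,1,0].

Draw 100 normally distributed random numbers with mean 3.5 and standard deviation 1.2 as the green data set. Its train_t is [0,0,1].

permutation is used to shuffle the row order.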

# Blue data set
num_train0=100
x0=np.random.normal(-1.5,1.3,num_train0)
t0=np.zeros((num_train0,3))
df0=DataFrame(np.c_[x0,t0],columns=['x','blue','red','green'])
df0['blue']=1

# Red data set
num_train1=100
x1=np.random.normal(0.5,0.8,num_train1)
t1=np.zeros((num_train1,3))
df1=DataFrame(np.c_[x1,t1],columns=['x','blue','red','green'])
df1['red']=1

# Green data set
num_train2=100
x2=np.random.normal(3.5,1.2,num_train2)
t2=np.zeros((num_train2,3))
df2=DataFrame(np.c_[x2,t2],columns=['x','blue','red','green'])
df2['green']=1
# Concatenate into a single DataFrame
df=pd.concat([df0,df1,df2],ignore_index=True)
df=df.reindex(permutation(df.index)).reset_index(drop=True)
# Convert the DataFrame to matrices. train_t is one-hot encoded.
# (.values replaces .as_matrix(), which was removed in pandas 1.0)
train_x=df['x'].values.reshape([len(df),1])
train_t=df[['blue','red','green']].values.reshape([len(df),3])
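The same kind of data set can also be built without pandas. The following is only a minimal sketch, not the code used in this post: make_dataset, demo_x and demo_t are hypothetical names introduced here, and the one-hot labels come from indexing an identity matrix.

```python
import numpy as np

def make_dataset(seed=20170426, n_per_class=100):
    """Build shuffled 1-D samples with one-hot labels (blue, red, green)."""
    rng = np.random.RandomState(seed)
    # (mean, standard deviation) per class, same values as in the text above
    params = [(-1.5, 1.3), (0.5, 0.8), (3.5, 1.2)]
    xs, labels = [], []
    for class_id, (mean, std) in enumerate(params):
        xs.append(rng.normal(mean, std, n_per_class))
        labels.append(np.full(n_per_class, class_id))
    x = np.concatenate(xs)
    labels = np.concatenate(labels)
    # Shuffle samples and labels with the same permutation so they stay aligned
    order = rng.permutation(len(x))
    x, labels = x[order], labels[order]
    # Row i of the identity matrix is the one-hot vector for class i
    t = np.eye(len(params))[labels]
    return x.reshape(-1, 1), t

demo_x, demo_t = make_dataset()
```

demo_x has shape (300, 1) and demo_t has shape (300, 3), which matches the layout of train_x and train_t.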

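This is redundant, but here is a scatter plot. The vertical axis has no meaning; the three colors are only offset slightly so that they do not overlap.

From small values to large, the order is blue → red → green, but the classes overlap and the boundaries between them are blurred.

The goal is to determine these boundaries with the softmax function.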

fig = plt.figure(figsize=(3,3))
subplot = fig.add_subplot(1,1,1)
subplot.set_xlim([-5,5])
subplot.set_ylim([-1,1.5])
red=df[df.red==1]
subplot.scatter(red.x,-0.5*red.red+0.05, color='r', marker='o')
blue=df[df.blue==1]
subplot.scatter(blue.x,-0.5*blue.blue-0.05, color='b', marker='o')
green=df[df.green==1]
subplot.scatter(green.x,-0.5*green.green, color='g', marker='o')


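x is a 300×1 matrix, where 300 is the number of samples in the data set.

Because we classify into three classes (blue, red, green), w and w0 each have three columns. These are the unknowns.

Assuming some values for w and w0, we compute the linear function f=x*w+w0.

Applying the softmax function to f gives the probabilities p that correspond to the assumed w and w0.

train_t will be fed into the placeholder t.

For example, if a sample is actually blue, t=[1 0 0].

Computing t*log(p) therefore keeps only log(p[0]), the log of the probability that the assumed w and w0 assign to "blue"; the other terms are multiplied by zero.

Training maximizes the sum (reduce_sum) of these values over all samples. The code does this by minimizing its negative, which is defined as loss.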

x=tf.placeholder(tf.float32,[None,1])
w=tf.Variable(tf.zeros([1,3]))
w0=tf.Variable(tf.zeros([3]))
f=tf.matmul(x,w)+w0
p=tf.nn.softmax(f)

t=tf.placeholder(tf.float32,[None,3])
loss=-tf.reduce_sum(t*tf.log(p))
train_step=tf.train.AdamOptimizer().minimize(loss)

sess=tf.Session()
# tf.initialize_all_variables() is deprecated in favor of this call
sess.run(tf.global_variables_initializer())
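To make the softmax and t*log(p) steps concrete, here is a small NumPy sketch of the same arithmetic outside TensorFlow. The names softmax, cross_entropy and the *_demo arrays are made up for this illustration, and the three inputs are arbitrary.

```python
import numpy as np

def softmax(f):
    # Subtract the row-wise maximum for numerical stability; the result is unchanged
    e = np.exp(f - f.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(t, p):
    # t is one-hot, so each row contributes only -log(p) of its true class
    return -np.sum(t * np.log(p))

x_demo = np.array([[-2.0], [0.5], [3.0]])   # three arbitrary sample inputs
w_demo = np.zeros((1, 3))                   # all-zero initial weights, as in the model
w0_demo = np.zeros(3)
t_demo = np.eye(3)                          # true classes: blue, red, green

p_demo = softmax(x_demo.dot(w_demo) + w0_demo)
loss_demo = cross_entropy(t_demo, p_demo)
```

With all-zero weights every class gets probability 1/3, so the loss for these three samples is 3*log(3), about 3.296. Training lowers this value by moving w and w0 away from zero.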

Train for 10000 steps.

As test data, create test_x: 50 evenly spaced points from -5 to 5.

Compute the blue, red, and green probabilities on this test data and plot them.

# Train for 10000 steps.
for _ in range(10000):
    sess.run(train_step,feed_dict={x:train_x, t:train_t})
    
# As test data, create test_x: 50 evenly spaced points from -5 to 5.
# Compute the blue, red, and green probabilities on this test data and plot them.
test_x=np.linspace(-5,5,50).reshape([50,1])
p_val=sess.run(p,feed_dict={x:test_x})
line=DataFrame(np.c_[test_x,p_val],columns=['x','blue','red','green'])

fig = plt.figure(figsize=(3,3))
subplot = fig.add_subplot(1,1,1)
subplot.set_xlim([-5,5])
subplot.set_ylim([-1,1.5])
subplot.plot(line.x,line.blue,color='b')
subplot.plot(line.x,line.red,color='r')
subplot.plot(line.x,line.green,color='g')

red=df[df.red==1]
subplot.scatter(red.x,-0.5*red.red+0.05, color='r', marker='o')
blue=df[df.blue==1]
subplot.scatter(blue.x,-0.5*blue.blue-0.05, color='b', marker='o')
green=df[df.green==1]
subplot.scatter(green.x,-0.5*green.green, color='g', marker='o')


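For small values, for example from -5 to about -3, blue is almost 100% and red and green are nearly 0%.

Indeed, there are no red or green points at such values.

The red probability starts to rise at around -2, slightly below where red points begin to appear.

At the same time the blue probability falls, and the two curves cross at roughly -0.2.

Some blue points have large values and some red points have small values, but the model puts the boundary between the two classes at about -0.2.

Likewise, it puts the boundary between red and green at about +1.8.

In this way, applying the softmax function makes it possible to determine boundaries in overlapping data.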
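The crossing points can also be computed from the weights instead of being read off the plot. Two softmax probabilities are equal exactly where their linear scores are equal. The sketch below assumes this setup; boundary is a hypothetical helper, and the demo weights are invented so that the crossings land at the values quoted above. They are not the weights learned in this post.

```python
import numpy as np

def boundary(w, w0, i, j):
    """x at which classes i and j get equal softmax probability."""
    # p_i = p_j  <=>  w[i]*x + w0[i] = w[j]*x + w0[j]
    return (w0[j] - w0[i]) / (w[i] - w[j])

def softmax(f):
    e = np.exp(f - np.max(f))
    return e / e.sum()

# Hypothetical weights for illustration only; NOT the values learned above
w_demo = np.array([-2.0, 0.0, 1.5])
w0_demo = np.array([-0.4, 0.0, -2.7])

x_br = boundary(w_demo, w0_demo, 0, 1)   # blue/red crossing
x_rg = boundary(w_demo, w0_demo, 1, 2)   # red/green crossing
p_at_br = softmax(w_demo * x_br + w0_demo)
```

With the trained session, sess.run(w)[0] and sess.run(w0) could be passed in place of the demo arrays.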

Import the modules.

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

Define the Placeholder x.

x = tf.placeholder(tf.float32, [None, 3])

Define the Variable w.

w = tf.Variable(tf.zeros([3, 1]))

Define the expression y.

y = tf.matmul(x, w)

Define the Placeholder t.

t = tf.placeholder(tf.float32, [None, 1])

Define the error function loss (the sum of squared errors).

loss = tf.reduce_sum(tf.square(y-t))

Define the training algorithm train_step.

train_step = tf.train.AdamOptimizer().minimize(loss)

Create a session and initialize the Variables.

sess = tf.Session()
# tf.initialize_all_variables() is deprecated in favor of this call
sess.run(tf.global_variables_initializer())

Prepare the training set data (measured values).

train_t = np.array([5.2, 5.7, 8.6, 14.9, 18.2, 20.4,
                    25.5, 26.4, 22.8, 17.5, 11.1, 6.6])
train_t = train_t.reshape([12,1])

Prepare the training set data (model inputs).

# Model: temperature = A*sin(30 deg * month + phase) + constant,
# expanded into constant, sin, and cos terms so that it is linear in w
train_x = np.zeros([12, 3])
for month in range(1, 13):
    train_x[month-1][0]=1 # constant term
    train_x[month-1][1]=np.sin(30*month*3.141592/180) # sine with a one-year period
    train_x[month-1][2]=np.cos(30*month*3.141592/180) # cosine with a one-year period
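Because the model is linear in w, the same fit can be obtained in closed form, which gives a way to cross-check the trained weights. This is a side check under that assumption, not part of the original procedure; t_obs, X, w_ls and sse are names introduced here.

```python
import numpy as np

# Measured monthly values, copied from the training data above
t_obs = np.array([5.2, 5.7, 8.6, 14.9, 18.2, 20.4,
                  25.5, 26.4, 22.8, 17.5, 11.1, 6.6])

months = np.arange(1, 13)
angle = np.deg2rad(30 * months)
# Same three columns as train_x: constant, sin, cos
X = np.column_stack([np.ones(12), np.sin(angle), np.cos(angle)])

# Least-squares solution of X w = t; rcond=None selects NumPy's current default cutoff
w_ls, _, rank, _ = np.linalg.lstsq(X, t_obs, rcond=None)

# Sum of squared errors, the same quantity as loss in the TensorFlow model
sse = float(np.sum((X.dot(w_ls) - t_obs) ** 2))
```

The sine and cosine columns sum to zero over the twelve months, so the constant term w_ls[0] is simply the mean of the measured values.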

Repeat the gradient-based parameter optimization (here with AdamOptimizer) 25000 times.

sess.run(tf.global_variables_initializer())
for i in range(1, 25001):
    sess.run(train_step, feed_dict={x:train_x, t:train_t})
    # Report the loss every 1000 steps
    if i % 1000 == 0:
        loss_val = sess.run(loss, feed_dict={x:train_x, t:train_t})
        print('Step: %d, Loss: %f' % (i, loss_val))
print(sess.run(w))

Step: 1000, Loss: 2935.489990
Step: 2000, Loss: 2488.358643
Step: 3000, Loss: 2093.184082
Step: 4000, Loss: 1744.432251
Step: 5000, Loss: 1438.418945
Step: 6000, Loss: 1172.378418
Step: 7000, Loss: 943.706543
Step: 8000, Loss: 749.378540
Step: 9000, Loss: 585.566284
Step: 10000, Loss: 447.873596
Step: 11000, Loss: 332.510803
Step: 12000, Loss: 237.278015
Step: 13000, Loss: 161.087250
Step: 14000, Loss: 102.957718
Step: 15000, Loss: 61.575798
Step: 16000, Loss: 35.087509
Step: 17000, Loss: 20.843044
Step: 18000, Loss: 15.194585
Step: 19000, Loss: 13.906918
Step: 20000, Loss: 13.797448
Step: 21000, Loss: 13.795701
Step: 22000, Loss: 13.795698
Step: 23000, Loss: 13.795697
Step: 24000, Loss: 13.795696
Step: 25000, Loss: 13.795695
[[ 15.24164963]
 [ -6.8297267 ]
 [ -7.76318884]]
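The sin and cos coefficients can be folded back into the single-sine form with an amplitude and a phase. The sketch below applies the standard identity to the weights printed above; amplitude, phase and the other variable names are introduced only for this illustration.

```python
import numpy as np

# Weights printed by the training run above: constant, sin and cos coefficients
w_const, w_sin, w_cos = 15.24164963, -6.8297267, -7.76318884

# w_sin*sin(a) + w_cos*cos(a) == amplitude*sin(a + phase)
# because w_sin = amplitude*cos(phase) and w_cos = amplitude*sin(phase)
amplitude = np.hypot(w_sin, w_cos)
phase = np.arctan2(w_cos, w_sin)          # in radians

a = np.deg2rad(30 * np.arange(1, 13))
two_term = w_const + w_sin * np.sin(a) + w_cos * np.cos(a)
one_term = w_const + amplitude * np.sin(a + phase)
```

Both forms give the same twelve monthly values. The amplitude comes out at about 10.3, so the fitted curve swings roughly 10.3 above and below the constant term of about 15.2.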

Define a function that computes the predicted temperature from the trained parameters.

def predict(x):
    # w is a tf.Variable, so the result is a tensor that must be evaluated with sess.run
    result = w[0] + w[1]*np.sin(30*x*3.141592/180) \
    + w[2]*np.cos(30*x*3.141592/180)
    return result

Plot the predicted temperature.

fig = plt.figure()
subplot = fig.add_subplot(1,1,1)
subplot.set_xlim(1,12)
subplot.scatter(range(1,13), train_t)
linex = np.linspace(1,12,100)
liney = sess.run(predict(linex))
subplot.plot(linex, liney)

