Saturday, May 20, 2017

Deep Learning (4) -- Implementing LeNet-5-style Handwritten Digit Recognition with Tensorflow


    This article walks through a complete handwritten digit recognition example, using Tensorflow to implement an architecture similar to LeNet-5. Besides training and testing on the MNIST dataset, we save the trained model, draw a few digits of our own in Microsoft Paint to run real predictions on them, and finally predict on the handwritten digit data from the Kaggle website, uploading the results to Kaggle for scoring.

   This article focuses on the implementation and will not revisit much theory; the relevant background can be found in the earlier articles and references listed at the bottom. The program was built with Python 3.5 + Tensorflow 1.10 on Win10 with GPU-accelerated computation; it should also run on Linux or with CPU-only computation without major problems.


First, a review of the classic LeNet-5 architecture, shown in <Figure 1> below.
The LeNet-5 model was proposed by Professor Yann LeCun in the 1998 paper "Gradient-based learning applied to document recognition"; it was the first convolutional neural network successfully applied to digit recognition. On the MNIST dataset, the LeNet-5 model reaches roughly 99.2% accuracy. The model has seven layers in total.

For an introduction to convolutional neural networks, see the earlier article:
Deep Learning (2) -- Implementing a Convolutional Neural Network (CNN) with Tensorflow

<Figure 1> The LeNet-5 architecture


Layer 1: Convolution
    The input to this layer is the raw image; LeNet-5 accepts a 32x32x1 input. The first convolutional layer uses 5x5 filters with depth 6, no zero padding, and stride 1, so the output size is 32-5+1=28 with depth 6. There are 5x5x1x6+6=156 parameters in total, of which the 6 are biases. The next layer's node matrix has 28x28x6=4704 nodes, and each node connects to 5x5=25 input nodes, so this convolutional layer has 4704x(25+1)=122304 connections in total.

Layer 2: Pooling
    The input to this layer is the 28x28x6 output of layer 1. This layer uses a 2x2 filter with stride 2, so the output matrix is 14x14x6.

Layer 3: Convolution
    The input matrix is 14x14x6. This layer uses 5x5 filters with depth 16, no zero padding, and stride 1, so the output matrix is 10x10x16. It therefore has 5x5x6x16+16=2416 parameters and
10x10x16x(25+1)=41600 connections.

Layer 4: Pooling
    The input matrix is 10x10x16; with a 2x2 filter and stride 2, the output matrix is 5x5x16.

Layer 5: Convolution
    The input matrix is 5x5x16. The LeNet-5 paper calls this a convolutional layer, but since the filter size is exactly 5x5 it is indistinguishable from a fully connected layer: flattening the 5x5x16 nodes into a vector makes this layer identical to a fully connected one. The layer outputs 120 nodes and has 5x5x16x120+120=48120 parameters.

Layer 6: Fully connected
    This layer has 120 input nodes and 84 output nodes, for 120x84+84=10164 parameters.

Layer 7: Fully connected
    This layer has 84 input nodes and 10 output nodes, for 84x10+10=850 parameters.
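The layer-by-layer parameter arithmetic above can be double-checked with a few lines of Python (a quick sketch of my own, not part of the example code):

```python
# Double-check the LeNet-5 parameter counts quoted above.
# A conv layer has k*k*in_depth*out_depth weights plus out_depth biases;
# a fully connected layer has n_in*n_out weights plus n_out biases.

def conv_params(k, in_depth, out_depth):
    return k * k * in_depth * out_depth + out_depth

def fc_params(n_in, n_out):
    return n_in * n_out + n_out

print(conv_params(5, 1, 6))        # layer 1: 156
print(conv_params(5, 6, 16))       # layer 3: 2416
print(fc_params(5 * 5 * 16, 120))  # layer 5: 48120
print(fc_params(120, 84))          # layer 6: 10164
print(fc_params(84, 10))           # layer 7: 850
```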

Below, Tensorflow is used to implement the LeNet-5-like architecture shown in <Figure 2>.
Because the convolutional layers use zero padding, the image size does not shrink;
in the code this is declared with padding='SAME'.
conv1 = tf.nn.conv2d(input_tensor, conv1_weights, strides=[1, 1, 1, 1], padding='SAME')
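Under TensorFlow's rules, 'SAME' padding gives a spatial output size of ceil(in/stride), while 'VALID' gives ceil((in-k+1)/stride). The helper below (my own sketch, not from the article's code) reproduces the sizes discussed above:

```python
import math

def conv_out_size(in_size, k, stride, padding):
    """Spatial output size of a conv/pool layer under TensorFlow's padding rules."""
    if padding == 'SAME':
        return math.ceil(in_size / stride)            # zero padding added as needed
    return math.ceil((in_size - k + 1) / stride)      # 'VALID': no padding

print(conv_out_size(32, 5, 1, 'VALID'))  # classic LeNet-5 layer 1: 32-5+1 = 28
print(conv_out_size(28, 5, 1, 'SAME'))   # this article's conv1: stays 28
print(conv_out_size(28, 2, 2, 'SAME'))   # the 2x2 / stride-2 pooling: 14
```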

<Figure 2>



First we define the forward-pass computation of this architecture with Tensorflow; see the forward-pass program <lenet5_infernece.py>.
The complete example code can be downloaded from Github:

The forward pass, the training procedure, and the tests are kept in separate source files, and the trained model and parameters are saved so they can be reused: predictions can be made directly without having to retrain every time.

Below is the forward-pass program <lenet5_infernece.py>; the definition of each layer of the Figure 2 architecture can be read directly from it:

import tensorflow as tf

INPUT_NODE = 784
OUTPUT_NODE = 10

IMAGE_SIZE = 28
NUM_CHANNELS = 1
NUM_LABELS = 10

CONV1_DEEP = 32
CONV1_SIZE = 5

CONV2_DEEP = 64
CONV2_SIZE = 5

FC_SIZE = 512

def inference(input_tensor, train, regularizer):
 with tf.variable_scope('layer1-conv1'):
  conv1_weights = tf.get_variable(
   "weight", [CONV1_SIZE, CONV1_SIZE, NUM_CHANNELS, CONV1_DEEP],
   initializer=tf.truncated_normal_initializer(stddev=0.1))
  conv1_biases = tf.get_variable("bias", [CONV1_DEEP], initializer=tf.constant_initializer(0.0))
  conv1 = tf.nn.conv2d(input_tensor, conv1_weights, strides=[1, 1, 1, 1], padding='SAME')
  relu1 = tf.nn.relu(tf.nn.bias_add(conv1, conv1_biases))

 with tf.name_scope("layer2-pool1"):
  pool1 = tf.nn.max_pool(relu1, ksize = [1,2,2,1],strides=[1,2,2,1],padding="SAME")

 with tf.variable_scope("layer3-conv2"):
  conv2_weights = tf.get_variable(
   "weight", [CONV2_SIZE, CONV2_SIZE, CONV1_DEEP, CONV2_DEEP],
   initializer=tf.truncated_normal_initializer(stddev=0.1))
  conv2_biases = tf.get_variable("bias", [CONV2_DEEP], initializer=tf.constant_initializer(0.0))
  conv2 = tf.nn.conv2d(pool1, conv2_weights, strides=[1, 1, 1, 1], padding='SAME')
  relu2 = tf.nn.relu(tf.nn.bias_add(conv2, conv2_biases))

 with tf.name_scope("layer4-pool2"):
  pool2 = tf.nn.max_pool(relu2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
  pool_shape = pool2.get_shape().as_list()
  nodes = pool_shape[1] * pool_shape[2] * pool_shape[3]
  #reshaped = tf.reshape(pool2, [pool_shape[0], nodes])
  reshaped = tf.reshape(pool2, [-1, nodes])

 with tf.variable_scope('layer5-fc1'):
  fc1_weights = tf.get_variable("weight", [nodes, FC_SIZE],
           initializer=tf.truncated_normal_initializer(stddev=0.1))
  if regularizer != None: tf.add_to_collection('losses', regularizer(fc1_weights))
  fc1_biases = tf.get_variable("bias", [FC_SIZE], initializer=tf.constant_initializer(0.1))

  fc1 = tf.nn.relu(tf.matmul(reshaped, fc1_weights) + fc1_biases)
  if train: fc1 = tf.nn.dropout(fc1, 0.7)

 with tf.variable_scope('layer6-fc2'):
  fc2_weights = tf.get_variable("weight", [FC_SIZE, NUM_LABELS],
           initializer=tf.truncated_normal_initializer(stddev=0.1))
  if regularizer != None: tf.add_to_collection('losses', regularizer(fc2_weights))
  fc2_biases = tf.get_variable("bias", [NUM_LABELS], initializer=tf.constant_initializer(0.1))
  logit = tf.matmul(fc1, fc2_weights) + fc2_biases

 return logit


底下列出訓練演算演算程式<lenet5_train.py>

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import os
import struct
import numpy as np
from matplotlib import pyplot as plt
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
import cv2,csv
import lenet5_infernece

   
def encode_labels( y, k):
 """Encode labels into one-hot representation
 """
 onehot = np.zeros((y.shape[0],k ))
 for idx, val in enumerate(y):
  onehot[idx,val] = 1.0  ## for sample idx, set column val (its digit) to 1.0
 return onehot

def load_mnist(path, kind='train'):
 """Load MNIST data from `path`"""
 if kind=='train':
  labels_path=os.path.abspath('D:\\pycode35\\AI\\mnist\\train-labels.idx1-ubyte')  
  images_path=os.path.abspath('D:\\pycode35\\AI\\mnist\\train-images.idx3-ubyte')
 else:
  labels_path=os.path.abspath('D:\\pycode35\\AI\\mnist\\t10k-labels.idx1-ubyte')  
  images_path=os.path.abspath('D:\\pycode35\\AI\\mnist\\t10k-images.idx3-ubyte')
 
 with open(labels_path, 'rb') as lbpath:
  magic, n = struct.unpack('>II',
         lbpath.read(8))
  labels = np.fromfile(lbpath,
        dtype=np.uint8)

 with open(images_path, 'rb') as imgpath:
  magic, num, rows, cols = struct.unpack(">IIII",
              imgpath.read(16))
  images = np.fromfile(imgpath,
        dtype=np.uint8).reshape(len(labels), 784)

 return images, labels

BATCH_SIZE = 100 
LEARNING_RATE_BASE = 0.01
LEARNING_RATE_DECAY = 0.99
REGULARIZATION_RATE = 0.0001
TRAINING_STEPS = 20000
MOVING_AVERAGE_DECAY = 0.99 
MODEL_SAVE_PATH = "./lenet5/"
MODEL_NAME = "lenet5_model"
INPUT_NODE = 784
OUTPUT_NODE = 10
IMAGE_SIZE = 28
NUM_CHANNELS = 1
NUM_LABELS = 10
display_step = 100
learning_rate_flag=True


def train(X_train,y_train_lable,X_test,y_test_lable):
 shuffle=True
 batch_idx=0
 
 batch_len =int( X_train.shape[0]/BATCH_SIZE)
 test_batch_len =int( X_test.shape[0]/BATCH_SIZE)
 test_acc=[]
 train_acc=[]
 train_idx=np.random.permutation(batch_len) # shuffle the order of the batch_len batch groups
 # placeholder for the flattened input; reshaped into a 4-D matrix below
 x_ = tf.placeholder(tf.float32, [None, INPUT_NODE],name='x-input') 
 x = tf.reshape(x_, shape=[-1, 28, 28, 1])
 
 y_ = tf.placeholder(tf.float32, [None, OUTPUT_NODE], name='y-input')
 
 regularizer = tf.contrib.layers.l2_regularizer(REGULARIZATION_RATE)
 y = lenet5_infernece.inference(x,True,regularizer)
 global_step = tf.Variable(0, trainable=False)

 # Evaluate model
 pred_max=tf.argmax(y,1)
 y_max=tf.argmax(y_,1)
 correct_pred = tf.equal(pred_max,y_max)
 accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
 
 # define the loss function, learning rate, and training step.

 cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
 cross_entropy_mean = tf.reduce_mean(cross_entropy)
 loss = cross_entropy_mean + tf.add_n(tf.get_collection('losses'))
 if learning_rate_flag==True:
  learning_rate = tf.train.exponential_decay(
   LEARNING_RATE_BASE,
   global_step,
   X_train.shape[0] / BATCH_SIZE, LEARNING_RATE_DECAY,
   staircase=True)
 else: 
  learning_rate = 0.001 #Ashing test
 train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

 # initialize the TensorFlow persistence class (Saver).
 saver = tf.train.Saver()
 with tf.Session() as sess:
  tf.global_variables_initializer().run()
  
  step = 1
  print ("Start  training!")
  while step < TRAINING_STEPS:
   #batch_xs, batch_ys = mnist.train.next_batch(BATCH_SIZE)
   if shuffle==True:
    batch_shuffle_idx=train_idx[batch_idx]
    batch_xs=X_train[batch_shuffle_idx*BATCH_SIZE:batch_shuffle_idx*BATCH_SIZE+BATCH_SIZE]
    batch_ys=y_train_lable[batch_shuffle_idx*BATCH_SIZE:batch_shuffle_idx*BATCH_SIZE+BATCH_SIZE] 
   else:
    batch_xs=X_train[batch_idx*BATCH_SIZE:batch_idx*BATCH_SIZE+BATCH_SIZE]
    batch_ys=y_train_lable[batch_idx*BATCH_SIZE:batch_idx*BATCH_SIZE+BATCH_SIZE]
  
    if batch_idx < batch_len-1:
     batch_idx += 1
    else:
     batch_idx = 0

    # --- The rest of the training loop was truncated in the original post;
    # --- the lines below are a reconstruction consistent with the snippets
    # --- quoted later and the per-batch loss/accuracy output of Figure 4.
    _, loss_value, acc = sess.run([train_step, loss, accuracy],
                                  feed_dict={x_: batch_xs, y_: batch_ys})
    train_acc.append(acc)
    if step % display_step == 0:
     print("Step:", step, "loss=", loss_value, "Training Accuracy:", acc)
    step += 1

   print("Average Training Accuracy=",
         sess.run(tf.reduce_mean(tf.cast(train_acc, tf.float32))))
   # save the trained model (this produces the files shown in Figure 3)
   saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME))

#mnist = input_data.read_data_sets("./mnist", one_hot=True)
The usual way to read the MNIST dataset in Tensorflow is the commented-out line above; data loaded that way already comes scaled and one-hot encoded. But what if we have our own data that needs the same treatment?
That is why the following code replaces the built-in loader.

Next, read the MNIST data; download the MNIST files first and place them at the appropriate relative or absolute path (note that load_mnist() above hardcodes absolute paths under D:\\pycode35\\AI\\mnist, so adjust them to your own layout).

X_train, y_train = load_mnist('..\mnist', kind='train')
print('X_train Rows: %d, columns: %d' % (X_train.shape[0], X_train.shape[1])) 

X_test, y_test = load_mnist('mnist', kind='t10k')
print('X_test Rows: %d, columns: %d' % (X_test.shape[0], X_test.shape[1]))

Rescale the features; here sklearn's MinMaxScaler() is used, which performs min-max scaling into [0, 1]:

mms=MinMaxScaler()
X_train=mms.fit_transform(X_train)
X_test=mms.transform(X_test)

Alternatively the data can be standardized (zero mean, unit variance) with sklearn's StandardScaler():
#stdsc=StandardScaler()
#X_train=stdsc.fit_transform(X_train)
#X_test=stdsc.transform(X_test)
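A tiny illustration of the difference between the two scalers, on made-up pixel values:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Three toy pixel values in the raw MNIST range 0-255.
X = np.array([[0.], [51.], [255.]])

X_mm = MinMaxScaler().fit_transform(X)     # min-max scaling into [0, 1]
X_std = StandardScaler().fit_transform(X)  # standardization: zero mean, unit variance

print(X_mm.ravel())               # [0.  0.2 1. ]
print(X_std.mean(), X_std.std())  # ~0.0 and ~1.0
```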

Convert the labels to one-hot form; sklearn's built-in OneHotEncoder() could also be used for this:
y_train_lable = encode_labels(y_train,10)
y_test_lable = encode_labels(y_test,10)
print("y_train_lable.shape=",y_train_lable.shape)
print("y_test_lable.shape=",y_test_lable.shape)
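For comparison, the equivalent one-hot matrix via sklearn's OneHotEncoder; this sketch assumes a scikit-learn recent enough to have the `categories` argument (0.20+):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

y = np.array([3, 0, 7])

# Fix the category list to 0..9 so every digit gets a column, exactly like
# encode_labels(y, 10) in lenet5_train.py.
enc = OneHotEncoder(categories=[list(range(10))])
onehot = enc.fit_transform(y.reshape(-1, 1)).toarray()
print(onehot[0])  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```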

Below, either a fixed learning rate or an exponentially decaying one is selected. Experimentally, the exponentially decaying learning rate trains better; it shrinks as the training epochs progress.
if learning_rate_flag==True:
 learning_rate = tf.train.exponential_decay(
  LEARNING_RATE_BASE,
  global_step,
  X_train.shape[0] / BATCH_SIZE, LEARNING_RATE_DECAY,
  staircase=True)
else:
 learning_rate = 0.001 #Ashing test
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
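Numerically, with staircase=True the decayed rate is LEARNING_RATE_BASE * LEARNING_RATE_DECAY ** (global_step // decay_steps). A quick sketch of my own with the constants above, assuming the 60000-sample MNIST training set:

```python
LEARNING_RATE_BASE = 0.01
LEARNING_RATE_DECAY = 0.99
decay_steps = 60000 // 100       # X_train.shape[0] / BATCH_SIZE = 600

def decayed_lr(global_step):
    # staircase=True: the exponent only increases once per full epoch
    return LEARNING_RATE_BASE * LEARNING_RATE_DECAY ** (global_step // decay_steps)

print(decayed_lr(0))      # 0.01
print(decayed_lr(599))    # still 0.01: same epoch
print(decayed_lr(600))    # ~0.0099
print(decayed_lr(6000))   # 10 epochs in: 0.01 * 0.99**10, ~0.00904
```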

Below, when shuffle is enabled, a randomly ordered batch group is used for each training step; otherwise the batches are taken in order. Since the data is now read in our own way, the batch selection also has to change accordingly.

if shuffle==True:
 batch_shuffle_idx=train_idx[batch_idx]
 batch_xs=X_train[batch_shuffle_idx*BATCH_SIZE:batch_shuffle_idx*BATCH_SIZE+BATCH_SIZE]
 batch_ys=y_train_lable[batch_shuffle_idx*BATCH_SIZE:batch_shuffle_idx*BATCH_SIZE+BATCH_SIZE]
else:
 batch_xs=X_train[batch_idx*BATCH_SIZE:batch_idx*BATCH_SIZE+BATCH_SIZE]
 batch_ys=y_train_lable[batch_idx*BATCH_SIZE:batch_idx*BATCH_SIZE+BATCH_SIZE]
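On toy data, the shuffle branch behaves like this: it permutes the order of the fixed batch groups, but does not re-shuffle samples within a group. A minimal sketch:

```python
import numpy as np

BATCH_SIZE = 3
X_train = np.arange(12)                       # 12 toy samples -> 4 batch groups
batch_len = X_train.shape[0] // BATCH_SIZE

np.random.seed(0)
train_idx = np.random.permutation(batch_len)  # shuffled group order, e.g. [2 0 1 3]

seen = []
for batch_idx in range(batch_len):
    b = train_idx[batch_idx]
    batch_xs = X_train[b * BATCH_SIZE: b * BATCH_SIZE + BATCH_SIZE]
    seen.extend(batch_xs.tolist())

print(sorted(seen))  # every sample still appears exactly once per epoch
```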

The code below saves the trained graph, nodes, and variable data so they can be reused directly; it produces the files shown in <Figure 3>.

# initialize the TensorFlow persistence class (Saver).
saver = tf.train.Saver()
#saver.save(sess, "./lenet5/lenet5_model")
saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME))


<Figure 3>


The training process and results are shown in <Figure 4>: the loss and accuracy are computed for every batch, and the average accuracy is computed at the end.
<Figure 4>

    Below is the evaluation program <lenet5_eval.py>: it restores the model and parameters trained above and uses them directly for prediction. The prediction step simply calls the forward-pass graph in <lenet5_infernece.py> and computes the loss and the accuracy (Acc).

# 'Saver' op to save and restore all the variables
saver = tf.train.Saver()
with tf.Session() as sess:
 saver.restore(sess,"./lenet5/lenet5_model")

     Because I run both training and prediction on the GPU, GPU memory is a limiting factor: with, say, only 2GB of GPU memory, the 10000 test samples cannot all be pushed through prediction in a single pass.
    So here I again loop over batches of the same size as in training, run prediction batch by batch, and then average the per-batch results into an overall accuracy; the Kaggle test data below is handled in a similar way. With a card like a 1080 with 8GB or more of memory, the 10000 samples could probably be predicted in one pass without batching.

for i in range(test_batch_len):
 temp_acc= sess.run(accuracy, feed_dict={x: test_xs[batchsize*i:batchsize*i+batchsize],
                                         y_: y_test_lable[batchsize*i:batchsize*i+batchsize]})
 test_acc.append(temp_acc)
 print ("Test  batch ",i,":Testing Accuracy:",temp_acc)

t_acc=tf.reduce_mean(tf.cast(test_acc, tf.float32))
print("Average Testing Accuracy=",sess.run(t_acc))
return
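Averaging the per-batch accuracies is exact here because every batch holds the same number of samples (BATCH_SIZE); a plain-numpy check with made-up batch accuracies:

```python
import numpy as np

BATCH = 100
batch_acc = np.array([0.99, 0.97, 1.00, 0.98])  # four batches of 100 samples
overall = (batch_acc * BATCH).sum() / (BATCH * len(batch_acc))
print(overall, batch_acc.mean())  # identical, because the batches are equal-sized
```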


import time
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import lenet5_infernece
import lenet5_train
import os
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def evaluate(X_test,y_test_lable):
 with tf.Graph().as_default() as g:
 
   # placeholder for the flattened input; reshaped into a 4-D matrix below
  x_ = tf.placeholder(tf.float32, [None, lenet5_train.INPUT_NODE],name='x-input') 
  x = tf.reshape(x_, shape=[-1, 28, 28, 1])
 
  y_ = tf.placeholder(tf.float32, [None, lenet5_train.OUTPUT_NODE], name='y-input')
 
  regularizer = tf.contrib.layers.l2_regularizer(lenet5_train.REGULARIZATION_RATE)
  y = lenet5_infernece.inference(x,False,regularizer)
  global_step = tf.Variable(0, trainable=False)

  # Evaluate model
  pred_max=tf.argmax(y,1)
  y_max=tf.argmax(y_,1)
  correct_pred = tf.equal(pred_max,y_max)
  accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
 
  test_batch_len =int( X_test.shape[0]/lenet5_train.BATCH_SIZE)
  test_acc=[]
  

  test_xs = np.reshape(X_test, (
     X_test.shape[0],
     lenet5_train.IMAGE_SIZE,
     lenet5_train.IMAGE_SIZE,
     lenet5_train.NUM_CHANNELS))
  
  batchsize=lenet5_train.BATCH_SIZE
 
  # 'Saver' op to save and restore all the variables
  saver = tf.train.Saver()
  with tf.Session() as sess:
   
   saver.restore(sess,"./lenet5/lenet5_model")

   for i in range(test_batch_len):
    temp_acc= sess.run(accuracy, feed_dict={x: test_xs[batchsize*i:batchsize*i+batchsize], y_: y_test_lable[batchsize*i:batchsize*i+batchsize]})
    test_acc.append(temp_acc)
    print ("Test  batch ",i,":Testing Accuracy:",temp_acc) 

   t_acc=tf.reduce_mean(tf.cast(test_acc, tf.float32)) 
   print("Average Testing Accuracy=",sess.run(t_acc))
   return

def main(argv=None):
 #### Loading the data
 X_train, y_train = lenet5_train.load_mnist('..\mnist', kind='train')
 print('X_train Rows: %d, columns: %d' % (X_train.shape[0], X_train.shape[1])) #X_train=60000x784
 X_test, y_test = lenet5_train.load_mnist('mnist', kind='t10k')      #X_test=10000x784
 print('X_test Rows: %d, columns: %d' % (X_test.shape[0], X_test.shape[1]))
 mms=MinMaxScaler()
 X_train=mms.fit_transform(X_train)
 X_test=mms.transform(X_test)

 y_train_lable = lenet5_train.encode_labels(y_train,10)
 y_test_lable = lenet5_train.encode_labels(y_test,10)
 ##============================
 
 evaluate(X_test,y_test_lable)

if __name__ == '__main__':
 main()




Figure 5 below shows the evaluation result on the MNIST test set after training; the accuracy reaches 98.99%.

 <Figure 5>
 



    Next, I drew 20 handwritten 28x28 digits in Microsoft Paint, in the same format as the MNIST data, preprocessed them with OpenCV, and fed them in as input for prediction; the results are shown in <Figure 6>.
    The black-background, white-digit images in the upper rows are the original hand-drawn files, and below them are the prediction results: green digits are correct predictions. Only one, the red 7, is mispredicted, so the computed accuracy is 95%.


 <Figure 6>

The program <lenet5_test.py> is listed below:
import time
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import lenet5_infernece
import lenet5_train
import os
import numpy as np
from sklearn.preprocessing import MinMaxScaler
import cv2
from matplotlib import pyplot as plt

img_num=[0]*20

def evaluate(X_test,y_test_lable,My_Yd):
 with tf.Graph().as_default() as g:
   # placeholder for the flattened input; reshaped into a 4-D matrix below
  x_ = tf.placeholder(tf.float32, [None, lenet5_train.INPUT_NODE],name='x-input') 
  x = tf.reshape(x_, shape=[-1, 28, 28, 1])
 
  y_ = tf.placeholder(tf.float32, [None, lenet5_train.OUTPUT_NODE], name='y-input')
 
  regularizer = tf.contrib.layers.l2_regularizer(lenet5_train.REGULARIZATION_RATE)
  y = lenet5_infernece.inference(x,False,regularizer)
  global_step = tf.Variable(0, trainable=False)

  # Evaluate model
  pred_max=tf.argmax(y,1)
  y_max=tf.argmax(y_,1)
  correct_pred = tf.equal(pred_max,y_max)
  accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
  batchsize=20
  test_batch_len =int( X_test.shape[0]/batchsize)
  test_acc=[]
  
  test_xs = np.reshape(X_test, (
     X_test.shape[0],
     lenet5_train.IMAGE_SIZE,
     lenet5_train.IMAGE_SIZE,
     lenet5_train.NUM_CHANNELS))
  
  # 'Saver' op to save and restore all the variables
  saver = tf.train.Saver()
  #saver = tf.train.import_meta_graph("./mnist/mnist_model.meta")
  with tf.Session() as sess:
   
   saver.restore(sess,"./lenet5/lenet5_model")

   My_test_pred=sess.run(pred_max, feed_dict={x: test_xs[:20]})
    print("Expected :",My_Yd)
    print("Predicted:",My_test_pred)
   My_acc = sess.run(accuracy, feed_dict={x: test_xs, y_: y_test_lable})
   print('Test accuracy: %.2f%%' % (My_acc * 100))
   display_result(My_test_pred)  
   return
   
def display_result(my_prediction): 
 img_res=[0]*20
 font = cv2.FONT_HERSHEY_SIMPLEX
 for i in range(20):  
  img_res[i] = np.zeros((64,64,3), np.uint8)
  img_res[i][:,:]=[255,255,255]
  if (my_prediction[i]%10)==(i%10):
   cv2.putText(img_res[i],str(my_prediction[i]),(15,52), font, 2,(0,255,0),3,cv2.LINE_AA)
  else:
   cv2.putText(img_res[i],str(my_prediction[i]),(15,52), font, 2,(255,0,0),3,cv2.LINE_AA)

 Input_Numer_name = ['Input 0', 'Input 1','Input 2', 'Input 3','Input 4',\
     'Input 5','Input 6', 'Input 7','Input8', 'Input9',\
     'Input 0', 'Input 1','Input 2', 'Input 3','Input 4',\
     'Input 5','Input 6', 'Input 7','Input8', 'Input9',
     ]
     
 predict_Numer_name =['predict 0', 'predict 1','predict 2', 'predict 3','predict 4', \
     'predict 5','predict6 ', 'predict 7','predict 8', 'predict 9',\
     'predict 0', 'predict 1','predict 2', 'predict 3','predict 4', \
     'predict 5','predict6 ', 'predict 7','predict 8', 'predict 9',
     ]
    
 # --- Everything from here to the end of the file was garbled in the original
 # --- post by HTML extraction; the lines below are a best-effort reconstruction
 # --- from the surviving fragments and the snippets quoted later in the article.
 for i in range(20):              # rows 1 and 3: the 20 input pictures
  if i<10:
   plt.subplot(4,10,i+1)
  else:
   plt.subplot(4,10,i+11)
  plt.imshow(img_num[i], cmap='gray')
  plt.title(Input_Numer_name[i])
  plt.xticks([])
  plt.yticks([])
 for i in range(20):              # rows 2 and 4: the predicted digits
  if i<10:
   plt.subplot(4,10,i+11)
  else:
   plt.subplot(4,10,i+21)
  plt.imshow(img_res[i])
  plt.title(predict_Numer_name[i])
  plt.xticks([])
  plt.yticks([])
 plt.show()

def main(argv=None):
 #### Loading the data: 20 digit pictures drawn in MS Paint, 28x28 = 784 pixels
 # The original list of .jpg file names did not survive extraction; these are
 # placeholders -- point them at your own 20 images.
 Input_Numer = ['digit_%d.jpg' % i for i in range(20)]
 My_X = np.zeros((20, 784))
 for i in range(20): #read 20 digits picture
  img = cv2.imread(Input_Numer[i],0)  #Gray
  img_num[i]=img.copy()
  img=img.reshape(My_X.shape[1])
  My_X[i] =img.copy()

 My_Yd=np.array([0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9], dtype=int)
 My_label_ohe=lenet5_train.encode_labels(My_Yd,10)
 mms=MinMaxScaler()
 My_test=mms.fit_transform(My_X)
 evaluate(My_test,My_label_ohe,My_Yd)

if __name__ == '__main__':
 main()

    Here OpenCV reads the image files produced in Paint and converts them to grayscale as preprocessing: the MNIST image data is 28x28x1, so the 28x28x3 three-channel RGB files produced by Paint must be reduced to single-channel 28x28x1 grayscale images.

for i in range(20): #read 20 digits picture
 img = cv2.imread(Input_Numer[i],0)  #Gray
 img_num[i]=img.copy()
 img=img.reshape(My_X.shape[1])
 My_X[i] =img.copy()
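cv2.imread(path, 0) performs the three-channel-to-grayscale reduction at load time; the numpy-only sketch below (my own, using a random stand-in image) shows the same luminance conversion — OpenCV's standard 0.299/0.587/0.114 weighting — and the flattening to a 784-element MNIST-style row:

```python
import numpy as np

# Stand-in for a 28x28 picture loaded with 3 BGR channels, i.e. what
# cv2.imread(path) returns without the grayscale flag.
img_bgr = np.random.randint(0, 256, size=(28, 28, 3), dtype=np.uint8)

# Luminance conversion: the same weighting cv2.cvtColor(..., BGR2GRAY) applies.
b, g, r = img_bgr[..., 0], img_bgr[..., 1], img_bgr[..., 2]
gray = (0.114 * b + 0.587 * g + 0.299 * r).astype(np.uint8)

flat = gray.reshape(784)       # 28*28 = 784, one MNIST-style row
print(gray.shape, flat.shape)  # (28, 28) (784,)
```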

 OpenCV's cv2.putText drawing function is also used below to render the predicted digits, giving a visual display of the results.
for i in range(20):
 img_res[i] = np.zeros((64,64,3), np.uint8)
 img_res[i][:,:]=[255,255,255]
 if (my_prediction[i]%10)==(i%10):
  cv2.putText(img_res[i],str(my_prediction[i]),(15,52), font, 2,(0,255,0),3,cv2.LINE_AA)
 else:
  cv2.putText(img_res[i],str(my_prediction[i]),(15,52), font, 2,(255,0,0),3,cv2.LINE_AA)


   Reading the files and converting to grayscale does not have to be done with OpenCV. I chose OpenCV because it will be convenient later for reading live camera images and doing heavier image preprocessing, and because it is the tool I am most familiar with.


Finally, predictions were run on Kaggle's handwritten digit test set and the results uploaded to the Kaggle site, which scored them at 0.99214 accuracy: a large improvement over the earlier MLP-based prediction, with the ranking also climbing to 235th <Figure 7>.

For an introduction to the Kaggle handwritten digit competition, see the earlier article:


<Figure 7>


The program <lenet5_kaggle.py> is as follows:
import time
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import lenet5_infernece
import lenet5_train
import os,csv
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def nomalizing(array): 
 m,n=np.shape(array)  
 for i in range(m):  
  for j in range(n):  
   if array[i,j]!=0:  
    array[i,j]=1  
 return array  
 
def toInt(array):  
 array=np.mat(array)  
 m,n=np.shape(array)  
 newArray=np.zeros((m,n))  
 for i in range(m):  
  for j in range(n):  
    newArray[i,j]=int(array[i,j])  
 return newArray
def loadTrainData():  
 l=[]  
 with open('train.csv') as file:  
   lines=csv.reader(file)  
   for line in lines:  
    l.append(line) #42001*785 
 l.remove(l[0]) 
 l=np.array(l) 
 label=l[:,0]  
 data=l[:,1:]  
 return toInt(data),toInt(label) 
 #return nomalizing(toInt(data)),toInt(label)  
 
def loadTestData():  
 l=[]  
 with open('test.csv') as file: 
   lines=csv.reader(file)  
   for line in lines:  
    l.append(line) #28001*784  
 l.remove(l[0]) 
 data=np.array(l)  
 return toInt(data)
 #return nomalizing(toInt(data))   
 
def loadTestResult():  
 l=[]  
 with open('knn_benchmark.csv') as file:  
   lines=csv.reader(file)  
   for line in lines:  
    l.append(line)  
  #28001*2  
 l.remove(l[0]) 
 label=np.array(l) 
 return toInt(label[:,1]) 

def saveResult(result):
 with open ('result.csv', mode='w',newline="\n") as write_file:
  writer = csv.writer(write_file)
  writer.writerow(["ImageId","Label"])
  for i in range(len(result)):
   writer.writerow([i+1,result[i]])
  
def saveweight(w1,w2):
 with open ('weight1.csv', mode='w',newline="\n") as write_file:
  writer = csv.writer(write_file)
  for i in range(len(w1)):
   writer.writerow([w1[i]])
 with open ('weight2.csv', mode='w',newline="\n") as write_file2:
  writer = csv.writer(write_file2)
  for i in range(len(w2)):
   writer.writerow([w2[i]])  
   

def evaluate(X_test):
 with tf.Graph().as_default() as g:
 
   # placeholder for the flattened input; reshaped into a 4-D matrix below
  x_ = tf.placeholder(tf.float32, [None, lenet5_train.INPUT_NODE],name='x-input') 
  x = tf.reshape(x_, shape=[-1, 28, 28, 1])
 
  y_ = tf.placeholder(tf.float32, [None, lenet5_train.OUTPUT_NODE], name='y-input')
 
  regularizer = tf.contrib.layers.l2_regularizer(lenet5_train.REGULARIZATION_RATE)
  y = lenet5_infernece.inference(x,False,regularizer)
  global_step = tf.Variable(0, trainable=False)

  # Evaluate model
  pred_max=tf.argmax(y,1)
  y_max=tf.argmax(y_,1)
  correct_pred = tf.equal(pred_max,y_max)
  accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
 
  test_batch_len =int( X_test.shape[0]/lenet5_train.BATCH_SIZE)
  test_acc=[]
  
  kaggle_pred=np.array([])
  
  test_xs = np.reshape(X_test, (
     X_test.shape[0],
     lenet5_train.IMAGE_SIZE,
     lenet5_train.IMAGE_SIZE,
     lenet5_train.NUM_CHANNELS))
  
  batchsize=lenet5_train.BATCH_SIZE
 
  # 'Saver' op to save and restore all the variables
  saver = tf.train.Saver()

  with tf.Session() as sess:
   saver.restore(sess,"./lenet5/lenet5_model")
   
   for i in range(test_batch_len):
    pred_result=sess.run(pred_max, feed_dict={x: test_xs[batchsize*i:batchsize*i+batchsize]})
    kaggle_pred=np.append(kaggle_pred,pred_result)
    kaggle_pred=kaggle_pred.astype(int)
    kaggle_pred=kaggle_pred.tolist()
   
   print("pred_result.length:",len(kaggle_pred))
   #print("pred_result=",kaggle_pred)
   print("Save prediction result...")
   saveResult(kaggle_pred) 
   return

def main(argv=None):
##load kaggle data+ 
 print("Load kaggle Mnist data...")
 X_test=loadTestData() 
 print("test_data.shape=",X_test.shape)
 
##load data-
 ##============================
 evaluate(X_test)

if __name__ == '__main__':
 main()


This program is largely the same as the evaluation program <lenet5_eval.py> above; the parts worth noting are how the Kaggle data file is read in:
print("Load kaggle Mnist data...")
X_test=loadTestData()
print("test_data.shape=",X_test.shape)

and how the predictions are saved. The saved prediction file result.csv must be uploaded to Kaggle to receive an accuracy score; no label data is fed in during this prediction step, and only the predicted results are produced.


for i in range(test_batch_len):
 pred_result=sess.run(pred_max, feed_dict={x: test_xs[batchsize*i:batchsize*i+batchsize]})
 kaggle_pred=np.append(kaggle_pred,pred_result)
 kaggle_pred=kaggle_pred.astype(int)
 kaggle_pred=kaggle_pred.tolist()

print("pred_result.length:",len(kaggle_pred))
#print("pred_result=",kaggle_pred)
print("Save prediction result...")
saveResult(kaggle_pred)
return
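Incidentally, the per-cell Python loops in toInt() and the csv readers above are slow on 28000x784 cells; numpy can read the same csv layout in a single call. A sketch (the function name load_kaggle_csv is my own, not from the article's code):

```python
import numpy as np

def load_kaggle_csv(path, has_label):
    """Read a Kaggle MNIST csv: one header row, then one sample per row."""
    data = np.genfromtxt(path, delimiter=',', skip_header=1, dtype=int)
    if has_label:
        return data[:, 1:], data[:, 0]  # pixel columns, label column
    return data
```

For example, `X_test = load_kaggle_csv('test.csv', False)` would stand in for loadTestData().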




Summary:
    The above is a complete CNN handwritten digit recognition example modeled on LeNet-5 and implemented with Tensorflow. The model is persisted so it can be reused repeatedly without retraining. Besides the usual training and test-set prediction to measure accuracy, we also generated some handwritten image files of our own in Paint and predicted them, with good results.
    Finally, Kaggle's test set was used to practice the handwritten digit competition, scoring 99.214% accuracy, a large improvement over the same prediction done earlier with an MLP.
    A further step would be to read live images from a camera through OpenCV and preprocess them (cropping, resizing, grayscale conversion); substituting those frames for the Paint-generated files would give real-time handwritten digit recognition.



Follow the Facebook page 阿布拉機的3D列印與機器人:
https://www.facebook.com/arbu00/



<Other related articles>
Artificial Neural Networks (1) -- Implementing a perceptron in Python
Artificial Neural Networks (2) -- Implementing the backpropagation neural network algorithm in Python
Deep Learning (1) -- Installing Theano + Keras + Tensorflow on Windows and training neural networks with GPU acceleration
Deep Learning (2) -- Implementing a Convolutional Neural Network (CNN) with Tensorflow
Deep Learning (3) -- Recurrent Neural Networks (RNN)
Machine Learning (1) -- Handwriting recognition with OPENCV KNN
Machine Learning (2) -- Handwriting recognition with OPENCV SVM
Algorithms (1) -- Monte Carlo estimation of pi and ellipse area
Machine Learning (3) -- Adaptive linear neurons (Adaline) and gradient descent
Machine Learning (4) -- Data standardization and stochastic gradient descent
Machine Learning (5) -- Logistic regression, overfitting and regularization
Machine Learning (6) -- Principal Component Analysis (PCA)
Machine Learning (7) -- Handling nonlinear mappings with Kernel PCA
Machine Learning (8) -- Implementing a Multilayer Perceptron (MLP) for handwritten digit recognition
Machine Learning (9) -- Getting started with Kaggle's handwritten digit competition