
Added to report, references and wrote a function for test_nn.py

This commit is contained in:
Silver-T 2018-05-25 13:43:35 +10:00
parent d75c878afa
commit a9d1a73bc1
4 changed files with 97 additions and 31 deletions

BIN
mini_proj/Waldo.h5 Normal file

Binary file not shown.

View File

@ -56,3 +56,25 @@ url = {http://books.google.com/books?hl=en{\&}lr={\&}id=vLiTXDHr{\_}sYC{\&}oi=fn
volume = {31},
year = {2007}
}
@incollection{NIPS2012_4824,
title = {ImageNet Classification with Deep Convolutional Neural Networks},
author = {Krizhevsky, Alex and Sutskever, Ilya and Hinton, Geoffrey E},
booktitle = {Advances in Neural Information Processing Systems 25},
editor = {F. Pereira and C. J. C. Burges and L. Bottou and K. Q. Weinberger},
pages = {1097--1105},
year = {2012},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf}
}
@article{726791,
author={Y. LeCun and L. Bottou and Y. Bengio and P. Haffner},
journal={Proceedings of the IEEE},
title={Gradient-based learning applied to document recognition},
year={1998},
volume={86},
number={11},
pages={2278-2324},
keywords={backpropagation;convolution;multilayer perceptrons;optical character recognition;2D shape variability;GTN;back-propagation;cheque reading;complex decision surface synthesis;convolutional neural network character recognizers;document recognition;document recognition systems;field extraction;gradient based learning technique;gradient-based learning;graph transformer networks;handwritten character recognition;handwritten digit recognition task;high-dimensional patterns;language modeling;multilayer neural networks;multimodule systems;performance measure minimization;segmentation recognition;Character recognition;Feature extraction;Hidden Markov models;Machine learning;Multi-layer neural network;Neural networks;Optical character recognition software;Optical computing;Pattern recognition;Principal component analysis},
doi={10.1109/5.726791},
ISSN={0018-9219},
month={Nov},}

View File

@ -127,20 +127,16 @@
\cite{randomforest}
\subsection{Neural Network Architectures}
\todo{Did we only do the three in the end? (Alexnet?)}
Yeah, we implemented the LeNet architecture, then improved on it for a fairly standard convolutional neural network (CNN) that was deeper, extracted more features, and condensed that image information more. Then we implemented a more fully convolutional network (FCN), which contained only one dense layer for the final binary classification step. The FCN added an extra convolutional layer, meaning that before classifying each image, the network abstracted the data more than the other two.
\begin{itemize}
\item LeNet
\item CNN
\item FCN
\end{itemize}
\paragraph{Convolutional Neural Networks}
\paragraph{LeNet}
\paragraph{Fully Convolutional Neural Networks}
\tab There are many well-established architectures for Neural Networks, depending on the task being performed.
In this paper, the focus is placed on convolutional neural networks, which have been proven to effectively classify images \cite{NIPS2012_4824}.
One of the pioneering works in the field, the LeNet \cite{726791} architecture, will be implemented to compare against two rudimentary networks with more depth.
These networks have been constructed to improve on the LeNet architecture by extracting more features, condensing image information, and allowing for more parameters in the network.
The difference between the two networks lies in their use of convolutional and dense layers.
The convolutional neural network contains dense layers in the final stages of the network.
The Fully Convolutional Network (FCN) contains only one dense layer for the final binary classification step.
The FCN instead consists of an extra convolutional layer, resulting in an increased ability for the network to abstract the input data relative to the other two configurations.
\\
\textbf{Insert image of LeNet from slides}
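For concreteness, a minimal sketch of a LeNet-style Keras model of the kind described above is given below. The 6/16/120/84 layer sizes follow the classic LeNet-5 design \cite{726791}; the 3$\times$64$\times$64 channels-first input shape (matching the preprocessing in test_nn.py), the ReLU activations, the optimiser, and the two-way softmax output are illustrative assumptions rather than the report's exact configuration.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

def build_lenet_like(input_shape=(3, 64, 64)):
    # Two convolution/pooling stages followed by dense layers, ending in a
    # two-way softmax for the Waldo / not-Waldo decision.
    model = Sequential()
    model.add(Conv2D(6, (5, 5), activation='relu',
                     input_shape=input_shape, data_format='channels_first'))
    model.add(MaxPooling2D((2, 2), data_format='channels_first'))
    model.add(Conv2D(16, (5, 5), activation='relu', data_format='channels_first'))
    model.add(MaxPooling2D((2, 2), data_format='channels_first'))
    model.add(Flatten())
    model.add(Dense(120, activation='relu'))
    model.add(Dense(84, activation='relu'))
    model.add(Dense(2, activation='softmax'))
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model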
\section{Method} \label{sec:method}
\tab
@ -149,7 +145,7 @@
The distinction between images containing Waldo, and those that do not, was provided by the separation of the images into different sub-directories.
It was therefore necessary to preprocess these images before they could be utilised by the proposed machine learning algorithms.
\subsection{Image Processing}
\subsection{Image Processing} \label{imageProcessing}
\tab
The Waldo image database consists of images of size 64$\times$64, 128$\times$128, and 256$\times$256 pixels obtained by dividing complete Where's Waldo? puzzles.
Within each set of images, those containing Waldo are located in a folder called `waldo', and those not containing Waldo, in a folder called `not\_waldo'.
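A sketch of how this directory layout could be turned into labelled arrays is given below; the location of the 64$\times$64 set and the assignment of label 1 to Waldo images are assumptions for illustration, not the project's actual loading code.

import os
import cv2
import numpy as np

def load_image_set(set_dir):
    # Builds (images, labels) arrays from the 'waldo' / 'not_waldo' sub-folders.
    # Label 1 marks an image containing Waldo, 0 an image without him (assumed encoding).
    images, labels = [], []
    for label, folder in ((1, "waldo"), (0, "not_waldo")):
        folder_path = os.path.join(set_dir, folder)
        for name in os.listdir(folder_path):
            img = cv2.imread(os.path.join(folder_path, name))
            if img is not None:  # Skips non-image files
                images.append(img)
                labels.append(label)
    return np.array(images), np.array(labels)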
@ -171,9 +167,21 @@
\\
\par
Despite the additional data, there were still over ten times as many non-Waldo images as Waldo images.
Therefore, it was necessary to cull the non-Waldo data, so that there was an even split of Waldo and non-Waldo images, improving the representation of true positives in the image data set.
Therefore, it was necessary to cull the non-Waldo data, so that there was an even split of Waldo and non-Waldo images, improving the representation of true positives in the image data set. Following preprocessing, the images (and associated labels) were divided into a training set and a test set with a 3:1 split.
\\
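A sketch of the culling and 3:1 split described above; it assumes (images, labels) arrays like those in the previous sketch, and the use of scikit-learn's train_test_split with a fixed random seed is an illustrative choice, not necessarily the report's implementation.

import numpy as np
from sklearn.model_selection import train_test_split

def balance_and_split(images, labels, seed=0):
    # Culls the non-Waldo images down to the number of Waldo images,
    # then splits the balanced data 3:1 into training and test sets.
    rng = np.random.RandomState(seed)
    waldo_idx = np.where(labels == 1)[0]
    not_waldo_idx = rng.choice(np.where(labels == 0)[0],
                               size=len(waldo_idx), replace=False)
    keep = np.concatenate([waldo_idx, not_waldo_idx])
    return train_test_split(images[keep], labels[keep],
                            test_size=0.25, random_state=seed)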
\subsection{Neural Network Training}\label{nnTraining}
\tab The neural networks used to classify the images were supervised learning models, requiring training on a labelled dataset of typical images.
Each network was trained using the preprocessed training dataset and labels, for 25 epochs (one forward and backward pass of all data) in batches of 150.
The number of epochs was chosen to maximise the amount of training while preventing overfitting\footnote{Overfitting occurs when a model learns the training data too specifically and loses its ability to generalise its predictions to new data, resulting in a loss of prediction accuracy.} of the training data, given the current model parameters.
The batch size is the number of images sent through the network in each pass. Using the entire dataset as a single batch would train the network quickly, but decrease its ability to learn unique features from the data.
Passing one image at a time may allow the model to learn more about each image; however, it would also increase the training time and the risk of overfitting the data.
Therefore the batch size was chosen to maintain training accuracy while minimising training time.
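The configuration described here amounts to a single Keras fit call along the following lines; model is assumed to be a compiled network (for example the LeNet-style sketch earlier), and train_x, train_y the preprocessed images and labels, with the labels one-hot encoded as in test_nn.py.

from keras.utils import to_categorical

def train_network(model, train_x, train_y):
    # 25 epochs over the full training set, in batches of 150 images, as described above.
    model.fit(train_x, to_categorical(train_y), epochs=25, batch_size=150)
    return model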
\subsection{Neural Network Testing}\label{nnTesting}
\tab After training each network, a separate test set of images (and labels) was used to evaluate the models.
The result of this testing was expressed primarily in the form of a percentage accuracy.
These results, as well as those of the other methods presented in this paper, are given in Figure \textbf{[insert ref to results here]} of the Results section.
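As a sketch, the percentage accuracy could be obtained directly from Keras' evaluate call (assuming the model was compiled with accuracy as its only metric), and saving the rounded predictions to predicted_results.npy would allow them to be re-checked with man_result_check in test_nn.py; the variable names here are placeholders.

import numpy as np

def test_network(model, test_x, test_y_categorical):
    # Reports the percentage accuracy on the held-out test set.
    loss, accuracy = model.evaluate(test_x, test_y_categorical, batch_size=150)
    print("Test accuracy: {:.1f}%".format(accuracy * 100))
    # Saves rounded predictions so man_result_check() can inspect them later.
    np.save("predicted_results.npy", np.round(model.predict(test_x)))
    return accuracy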
\textbf{***********}
% Kelvin Start
\subsection{Benchmarking}\label{benchmarking}
@ -206,7 +214,7 @@
false negatives.
\end{itemize}
Accuracy is a common performance metric used in Machine Learning,
\emph{Accuracy} is a common performance metric used in Machine Learning,
however in classification problems where the training data is heavily
biased toward one category, sometimes a model will learn to optimize its
accuracy by classifying all instances as one category. That is, the
@ -214,7 +222,8 @@
containing Waldo, but will also classify all images containing Waldo as
not containing Waldo. Thus, we use other metrics to measure performance
as well.
\\
\par
\emph{Precision} returns the percentage of classifications of Waldo that
are actually Waldo. \emph{Recall} returns the percentage of Waldos that
were actually predicted as Waldo. In the case of a classifier that

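The metrics defined in this subsection could be computed from saved one-hot predictions roughly as follows; the argmax decoding is an illustrative choice, and treating class index 1 as the Waldo class is an assumption consistent with the label handling in test_nn.py.

import numpy as np

def classification_metrics(pred_y, test_y):
    # Decodes one-hot rows to class indices (index 1 taken as the Waldo class).
    pred = np.argmax(pred_y, axis=1)
    true = np.argmax(test_y, axis=1)
    tp = np.sum((pred == 1) & (true == 1))  # true positives
    fp = np.sum((pred == 1) & (true == 0))  # false positives
    fn = np.sum((pred == 0) & (true == 1))  # false negatives
    accuracy = np.mean(pred == true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of Waldo predictions that are Waldo
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of actual Waldos that were found
    return accuracy, precision, recall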
View File

@ -1,19 +1,54 @@
import numpy as np
from keras.models import Model, load_model
from keras.utils import to_categorical
import cv2
from skimage import color, exposure


# Checks the saved network predictions against the test labels, writing each
# (prediction, label) pair to a file and printing the overall accuracy.
def man_result_check():
    pred_y = np.load("predicted_results.npy")
    test_y = np.load("Waldo_test_lbl.npy")

    test_y = to_categorical(test_y)

    f = open("test_output.txt", 'w')
    z = 0
    for i in range(0, len(test_y)):
        print(pred_y[i], test_y[i], file=f)
        # Calculates correct predictions
        if pred_y[i][0] == test_y[i][0]:
            z += 1

    print("Accuracy: {}%".format(z / len(test_y) * 100))
    f.close()
def is_Wally(trained_model_path, image, from_file=False):
    '''
    Purpose: Loads a trained neural network model (using Keras) to classify an image
    Input:   path/to/trained_model
             image [or] path/to/image [if from_file=True]
    Returns: Boolean variable
    '''
    if from_file:
        img = cv2.imread(image)  # Opens the image (in BGR format)
        # Histogram normalization in v channel
        hsv = color.rgb2hsv(img)
        hsv[:, :, 2] = exposure.equalize_hist(hsv[:, :, 2])
        img = color.hsv2rgb(hsv)
        image = np.rollaxis(img, -1)  # Rolls the colour axis to the front
    trained_model = load_model(trained_model_path)
    image = np.expand_dims(image, axis=0)  # predict() expects a leading batch dimension
    prediction = trained_model.predict(image, verbose=1, batch_size=1)[0]
    # Takes the most probable class instead of comparing the raw output to exactly 1;
    # as in the original comparison, output index 0 being 'on' is treated as not-Waldo.
    if np.argmax(prediction) == 0:
        return False
    return True


# Planned pipeline (not yet implemented):
# Load full puzzle image
# Split image into array of images
# Use is_Wally(img) to classify each sub-image
# Mark the Wally image somehow (colour the border)
# Stitch the original image back together

# Example call (requires image to be a preprocessed, channel-first array):
# is_Wally("Waldo.h5", image)
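A rough sketch of how the commented-out pipeline above might look; it is not part of this commit, and it assumes non-overlapping 64x64 tiles and the same per-tile preprocessing that is_Wally applies when reading from file. Loading the model once rather than on every is_Wally call would be an obvious refinement.

def find_wally_tiles(puzzle_path, model_path="Waldo.h5", tile=64):
    # Splits a full puzzle into non-overlapping tile x tile patches and returns
    # the (row, col) pixel offsets of the patches classified as containing Wally.
    puzzle = cv2.imread(puzzle_path)
    hits = []
    for r in range(0, puzzle.shape[0] - tile + 1, tile):
        for c in range(0, puzzle.shape[1] - tile + 1, tile):
            patch = puzzle[r:r + tile, c:c + tile]
            # Same preprocessing as the from_file branch of is_Wally
            hsv = color.rgb2hsv(patch)
            hsv[:, :, 2] = exposure.equalize_hist(hsv[:, :, 2])
            patch = np.rollaxis(color.hsv2rgb(hsv), -1)
            if is_Wally(model_path, patch):
                hits.append((r, c))
    return hits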