
A lot of cleanup + spell check

This commit is contained in:
Jip J. Dekker 2018-05-25 14:53:03 +10:00
parent 2dbb5f9e76
commit bf5b88c46d
2 changed files with 111 additions and 63 deletions


@@ -126,3 +126,14 @@ month={Nov},}
year={2006},
publisher={Trelgol Publishing USA}
}
@article{scikit-learn,
title={Scikit-learn: Machine Learning in {P}ython},
author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
journal={Journal of Machine Learning Research},
volume={12},
pages={2825--2830},
year={2011}
}


@@ -28,7 +28,7 @@
\maketitle
\begin{abstract}
%
The famous brand of picture puzzles ``Where's Waldo?'' relates well to many
unsolved image classification problems. This offers us the opportunity to
test different image classification methods on a data set that is both small
@@ -40,7 +40,7 @@
\todo{I don't like this big summation but I think it is the important
information}
Our comparison shows that \todo{...}
%
\end{abstract}
\section{Introduction}
@@ -98,15 +98,7 @@
their basic versions. In contrast, we will use different neural network
architectures, as this method is currently the most widely used for image
classification.
\todo{
\\A couple of papers that may be useful (if needed):
- LeNet: \cite{lenet}
- AlexNet: \cite{alexnet}
- General comparison of LeNet and AlexNet: \cite{lenetVSalexnet}
- Deep NN Architecture: \cite{deepNN}
}
\subsection{Classical Machine Learning Methods}
The following paragraphs will give only brief descriptions of the different
@@ -162,67 +154,112 @@
trees is used which avoids this problem.
\subsection{Neural Network Architectures}
There are many well-established architectures for Neural Networks depending
on the task being performed. In this paper, the focus is placed on
convolutional neural networks, which have been proven to effectively classify
images~\cite{NIPS2012_4824}. One of the pioneering works in the field, the
LeNet architecture~\cite{726791}, will be implemented to compare against two
rudimentary networks with more depth. These networks have been constructed
to improve on the LeNet architecture by extracting more features, condensing
image information, and allowing for more parameters in the network. The
difference between the two networks lies in their use of convolutional and
dense layers. The convolutional neural network contains dense layers in the
final stages of the network. The Fully Convolutional Network (FCN) contains
only one dense layer, for the final binary classification step; it instead
includes an extra convolutional layer, resulting in an increased ability for
the network to abstract the input data relative to the other two
configurations.
\todo{Insert image of LeNet from slides}
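As an illustration of this distinction, a minimal sketch of a LeNet-style
network and an FCN-style variant is given below, assuming a Keras
implementation; the layer counts, filter sizes, and activations are
illustrative assumptions rather than the exact configurations used here.
\begin{verbatim}
# Illustrative Keras sketch only; layer sizes and activations are assumed,
# not the exact configurations used in this paper.
from tensorflow.keras import layers, models

def lenet(input_shape=(64, 64, 3)):
    # LeNet-style: convolution/pooling stages followed by several dense layers.
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(6, (5, 5), activation='tanh'),
        layers.AveragePooling2D((2, 2)),
        layers.Conv2D(16, (5, 5), activation='tanh'),
        layers.AveragePooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(120, activation='tanh'),
        layers.Dense(84, activation='tanh'),
        layers.Dense(1, activation='sigmoid'),  # binary: Waldo / not Waldo
    ])

def fcn(input_shape=(64, 64, 3)):
    # FCN-style: an extra convolutional layer and only one final dense layer.
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.GlobalAveragePooling2D(),
        layers.Dense(1, activation='sigmoid'),
    ])
\end{verbatim}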
\section{Method} \label{sec:method}
In order to effectively utilize the aforementioned modeling and
classification techniques, a key consideration is the data they are acting
on. A dataset containing Waldo and non-Waldo images was obtained from an
Open Database\footnote{``The Open Database License (ODbL) is a license
agreement intended to allow users to freely share, modify, and use [a]
Database while maintaining [the] same freedom for
others''~\cite{openData}} hosted on the predictive modeling and analytics
competition framework, Kaggle. The distinction between images containing
Waldo, and those that do not, was provided by the separation of the images
in different sub-directories. It was therefore necessary to preprocess these
images before they could be utilized by the proposed machine learning
algorithms.
\subsection{Image Processing} \label{imageProcessing}
The Waldo image database consists of images of size 64$\times$64,
128$\times$128, and 256$\times$256 pixels obtained by dividing complete
Where's Waldo? puzzles. Within each set of images, those containing Waldo
are located in a folder called `waldo', and those not containing Waldo, in a
folder called `not\_waldo'. Since Where's Waldo? puzzles are usually densely
populated and contain fine details, the 64$\times$64 pixel set of images
was selected to train and evaluate the machine learning models. These
images provide the added benefit of containing the most individual images of
the three size groups. \\
Each of the 64$\times$64 pixel images was inserted into a
Numpy~\cite{numpy} array of images, and a binary value was inserted into a
separate list at the same index. These binary values form the labels for
each image (waldo or not waldo). Colour normalisation was performed on each
image so that artefacts in an image's colour profile correspond to meaningful
features of the image (rather than to the photographic method).\\
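A minimal sketch of this loading and normalisation step is given below; the
use of Pillow for reading the images and per-channel standardisation as the
colour normalisation are assumptions, as the exact procedure is not specified
here, and the function name is hypothetical.
\begin{verbatim}
# Sketch of the loading step: read each 64x64 image into a Numpy array,
# record a binary label, and normalise the colour channels. Pillow and
# per-channel standardisation are assumptions.
import os
import numpy as np
from PIL import Image

def load_images(root):
    images, labels = [], []
    for label, folder in [(1, 'waldo'), (0, 'not_waldo')]:
        for name in sorted(os.listdir(os.path.join(root, folder))):
            img = np.asarray(Image.open(os.path.join(root, folder, name)),
                             dtype=np.float64)
            # Colour normalisation: zero mean, unit variance per channel.
            img = (img - img.mean(axis=(0, 1))) / (img.std(axis=(0, 1)) + 1e-8)
            images.append(img)
            labels.append(label)
    return np.array(images), np.array(labels)
\end{verbatim}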
Each original puzzle is broken down into many images but contains only one
Waldo. Even though Waldo might span multiple 64$\times$64 pixel squares, this
means that the non-Waldo data far outnumbers the Waldo data. To combat the
bias introduced by the skewed data, all Waldo images were artificially
augmented by performing random rotations and reflections, and by introducing
random noise into the image, to produce new images. In this way, each original
Waldo image was used to produce an additional 10 variations, which were
inserted into the image array. This provided more variation in the true
positives of the data set and assisted in the development of more robust
methods by exposing each technique to variations of the image during the
training phase. \\
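The augmentation described above could be sketched as follows; the specific
choices (90-degree rotations, horizontal reflections, the Gaussian noise
level) are assumptions, as the exact parameters are not stated here.
\begin{verbatim}
# Sketch of the augmentation step: each Waldo image yields 10 extra variants
# built from random rotations, reflections and additive noise. The specific
# parameter choices are assumptions.
import numpy as np

def augment(image, n_variants=10, seed=0):
    rng = np.random.default_rng(seed)
    variants = []
    for _ in range(n_variants):
        img = np.rot90(image, k=rng.integers(0, 4))    # random rotation
        if rng.random() < 0.5:                         # random reflection
            img = np.flip(img, axis=1)
        img = img + rng.normal(0.0, 0.05, img.shape)   # random noise
        variants.append(img)
    return variants
\end{verbatim}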
Despite the additional data, there were still over ten times as many
non-Waldo images as Waldo images. Therefore, it was necessary to cull the
non-Waldo data so that there was an even split of Waldo and non-Waldo
images, improving the representation of true positives in the image data
set. Following preprocessing, the images (and associated labels) were
divided into a training and a test set with a 3:1 split. \\
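A sketch of this balancing and splitting step is shown below, assuming the
arrays produced by the hypothetical loading and augmentation sketches above
and Scikit-Learn's \texttt{train\_test\_split}.
\begin{verbatim}
# Sketch of the balancing and splitting step: subsample the non-Waldo images
# to match the number of Waldo images, then make a 3:1 train/test split.
import numpy as np
from sklearn.model_selection import train_test_split

def balance_and_split(images, labels, seed=0):
    rng = np.random.default_rng(seed)
    waldo = np.where(labels == 1)[0]
    not_waldo = rng.choice(np.where(labels == 0)[0], size=len(waldo),
                           replace=False)
    keep = np.concatenate([waldo, not_waldo])
    return train_test_split(images[keep], labels[keep],
                            test_size=0.25, random_state=seed)

x_train, x_test, y_train, y_test = balance_and_split(images, labels)
\end{verbatim}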
\subsection{Neural Network Training}\label{nnTraining}
The neural networks used to classify the images were supervised learning
models, requiring training on a labelled dataset of typical images. Each
network was trained using the preprocessed training dataset and labels for 25
epochs (an epoch being one forward and backward pass of all of the data) in
batches of 150. The number of epochs was chosen to balance training time
against the risk of overfitting\footnote{Overfitting occurs when a model
learns from the data too specifically, and loses its ability to generalise
its predictions for new data (resulting in a loss of prediction accuracy).}
the training data, given the current model parameters. The batch size is the
number of images sent through each pass of the network. Using the entire
dataset in a single batch would train the network quickly, but decrease the
network's ability to learn unique features from the data. Passing one image
at a time may allow the model to learn more about each image; however, it
would also increase the training time and the risk of overfitting the data.
Therefore, the batch size was chosen to maintain training accuracy while
minimising training time.
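As a sketch of this training setup, assuming a Keras model such as those in
the earlier architecture sketch and the hypothetical \texttt{x\_train} and
\texttt{y\_train} arrays from the preprocessing sketches (the optimizer and
loss function are also assumptions):
\begin{verbatim}
# Sketch of the training setup: 25 epochs in batches of 150 images.
# Model and data come from the earlier (hypothetical) sketches.
model = lenet()  # or fcn()
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=25, batch_size=150)
\end{verbatim}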
\subsection{Neural Network Testing}\label{nnTesting}
% Kelvin Start
After training each network, a separate test set of images (and labels) was
used to evaluate the models. The result of this testing was expressed
primarily in the form of an accuracy (percentage). These results, as well as
those of the other methods presented in this paper, are given in
Figure~\todo{insert ref to results here} of the Results section.
\todo{***********}
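In Keras terms, this evaluation step might look as follows (a sketch, with
names carried over from the earlier hypothetical sketches):
\begin{verbatim}
# Sketch of the evaluation step: the held-out test set yields a single
# accuracy figure per network.
test_loss, test_accuracy = model.evaluate(x_test, y_test)
print('Test accuracy: {:.1%}'.format(test_accuracy))
\end{verbatim}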
\subsection{Benchmarking}\label{benchmarking}
In order to benchmark the Neural Networks, the performance of these
algorithms is evaluated against that of other Machine Learning algorithms. We use
Support Vector Machines, K-Nearest Neighbors (\(K=5\)), Naive Bayes and
Random Forest classifiers, as provided in Scikit-Learn~\cite{scikit-learn}.
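A sketch of these benchmark classifiers, as provided by Scikit-Learn, is
shown below; flattening each image into a feature vector and the use of the
Gaussian variant of Naive Bayes are assumptions.
\begin{verbatim}
# Sketch of the benchmark classifiers. Flattening the images into feature
# vectors and the use of GaussianNB are assumptions; data arrays come from
# the earlier (hypothetical) sketches.
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier

benchmarks = {
    'SVM': SVC(),
    'K-Nearest Neighbors': KNeighborsClassifier(n_neighbors=5),
    'Naive Bayes': GaussianNB(),
    'Random Forest': RandomForestClassifier(),
}
flat_train = x_train.reshape(len(x_train), -1)
flat_test = x_test.reshape(len(x_test), -1)
for name, clf in benchmarks.items():
    clf.fit(flat_train, y_train)
    print(name, clf.score(flat_test, y_test))
\end{verbatim}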
\subsection{Performance Metrics}\label{performance-metrics}
@@ -262,7 +299,7 @@
are actually Waldo. \emph{Recall} returns the percentage of Waldos that
were actually predicted as Waldo. In the case of a classifier that
classifies all things as Waldo, the recall would be 1, even though such a
classifier is of little practical use. \emph{F1-Measure}
returns a combination of precision and recall that heavily penalizes
classifiers that perform poorly in either precision or recall.
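For reference, with $TP$, $FP$, and $FN$ denoting true positives, false
positives, and false negatives respectively, these metrics take their
standard forms:
\[
\mathit{Precision} = \frac{TP}{TP + FP}, \qquad
\mathit{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathit{Precision} \cdot \mathit{Recall}}
           {\mathit{Precision} + \mathit{Recall}}.
\]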
% Kelvin End