author={Bilogur, Aleksey},
year={2017},
month={Oct}
}

@misc{kaggle,
title = {Kaggle: The Home of Data Science \& Machine Learning},
howpublished = {\url{https://www.kaggle.com/}},
note = {Accessed: 2018-05-25}
}
Almost every child around the world knows about ``Where's Waldo?'', also
known as ``Where's Wally?'' in some countries. This famous puzzle book has
spread its way across the world and is published in more than 25 different
languages. The idea behind the books is to find the character Waldo,
shown in \Cref{fig:waldo}, in the different pictures in the book. This is,
however, not as easy as it sounds. Every picture in the book is full of tiny
details and Waldo is only one out of many. The puzzle is made even harder by
\includegraphics[scale=0.35]{waldo.png}
\centering
\caption{
A headshot of the character Waldo, or Wally. Pictures of Waldo are
copyrighted by Martin Handford and are used under the fair-use policy.
}
\label{fig:waldo}
setting that is humanly tangible. In this report we will try to identify
Waldo in the puzzle images using different classification methods. Every
image will be split into different segments and every segment will have to
be classified as either being Waldo or not Waldo. We will compare
various classification methods, from more classical machine
learning, like naive Bayes classifiers, to the current state of the art,
Neural Networks. In \Cref{sec:background} we will introduce the different
of randomness and the mean of these trees is used, which avoids this problem.

\subsection{Neural Network Architectures}

There are many well-established architectures for Neural Networks depending
on the task being performed. In this paper, the focus is placed on
convolutional neural networks, which have been proven to effectively
classify images \cite{NIPS2012_4824}. One of the pioneering works in the
field, the LeNet \cite{726791} architecture, will be implemented to compare
against two rudimentary networks with more depth. These networks have been
constructed to improve on the LeNet architecture by extracting more
features, condensing image information, and allowing for more parameters in
the network. The difference between the two networks lies in their use of
convolutional and dense layers. The convolutional neural network contains
dense layers in the final stages of the network. The Fully Convolutional
Network (FCN) contains only one dense layer, for the final binary
classification step. The FCN instead contains an extra convolutional layer,
giving it a greater ability to abstract the input data relative to the
other two configurations. \\

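The precise layer dimensions of the three networks are not given in this excerpt. As an illustrative sketch only (assuming LeNet-style 5$\times$5 valid convolutions alternating with 2$\times$2 pooling, applied to the 64$\times$64 input tiles used in this report), the standard output-size formula shows how each stage condenses the image information:

```python
def out_size(size, kernel, stride=1, padding=0):
    # Standard output-size formula for a convolution or pooling layer.
    return (size + 2 * padding - kernel) // stride + 1

# Hypothetical LeNet-style stack on 64x64 input tiles:
# 5x5 'valid' convolutions alternating with 2x2 pooling.
size = 64
size = out_size(size, kernel=5)            # conv1 -> 60x60
size = out_size(size, kernel=2, stride=2)  # pool1 -> 30x30
size = out_size(size, kernel=5)            # conv2 -> 26x26
size = out_size(size, kernel=2, stride=2)  # pool2 -> 13x13
print(size)  # 13
```

The kernel sizes here are assumptions for illustration; the report's actual networks may differ.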
\begin{figure}[H]
\includegraphics[scale=0.50]{LeNet}
\centering
agreement intended to allow users to freely share, modify, and use [a]
Database while maintaining [the] same freedom for
others''\cite{openData}} hosted on the predictive modeling and analytics
competition framework, Kaggle~\cite{kaggle}. The distinction between images
containing Waldo, and those that do not, was provided by the separation of
the images into different sub-directories. It was therefore necessary to
preprocess these images before they could be utilized by the proposed
machine learning algorithms.

\subsection{Image Processing} \label{imageProcessing}

containing the most individual images of the three size groups. \\

Each of the 64$\times$64 pixel images was inserted into a
NumPy~\cite{numpy} array of images, and a binary value was inserted into a
separate list at the same index. These binary values form the labels for
each image (Waldo or not Waldo). Color normalization was performed on each
image so that artifacts in an image's color profile correspond to
meaningful features of the image (rather than to the photographic method).\\
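A minimal sketch of this labelling and normalization step, using NumPy. The tile lists and the per-channel zero-mean, unit-variance normalization are illustrative assumptions, not the report's exact procedure:

```python
import numpy as np

def normalize_colors(img):
    # Rescale each color channel to zero mean and unit variance so that
    # remaining color variation reflects image content rather than the
    # photographic method. (Assumed normalization scheme.)
    img = img.astype(np.float64)
    mean = img.mean(axis=(0, 1), keepdims=True)
    std = img.std(axis=(0, 1), keepdims=True)
    return (img - mean) / np.where(std == 0, 1, std)

# `waldo_tiles` / `other_tiles` stand in for the 64x64 tiles read from
# the two sub-directories of the data set.
waldo_tiles = [np.random.rand(64, 64, 3)]
other_tiles = [np.random.rand(64, 64, 3)] * 2

images, labels = [], []
for tile in waldo_tiles:
    images.append(normalize_colors(tile))
    labels.append(1)  # "Waldo"
for tile in other_tiles:
    images.append(normalize_colors(tile))
    labels.append(0)  # "not Waldo"

images = np.stack(images)  # shape: (n, 64, 64, 3)
print(images.shape, labels)
```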

Each original puzzle is broken down into many images but contains only one
Waldo. Although Waldo might span multiple 64$\times$64 pixel squares, this
means that the non-Waldo data far outnumbers the Waldo data. To
combat the bias introduced by the skewed data, all Waldo images were
artificially augmented by performing random rotations, reflections, and
introducing random noise in the image to produce new images. In this way,
robust methods by exposing each technique to variations of the image during
the training phase. \\
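The augmentation described above can be sketched as follows; the restriction to 90-degree rotations, the flip probability, and the noise scale are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, rng):
    # Random rotation (a multiple of 90 degrees here), random horizontal
    # reflection, and additive Gaussian noise, as described above.
    img = np.rot90(img, k=int(rng.integers(4)))
    if rng.random() < 0.5:
        img = np.fliplr(img)
    return img + rng.normal(scale=0.05, size=img.shape)

tile = rng.random((64, 64, 3))          # stand-in for one Waldo tile
augmented = [augment(tile, rng) for _ in range(10)]
print(len(augmented), augmented[0].shape)
```

Because the tiles are square, 90-degree rotations preserve the 64$\times$64 shape, so the augmented images can share one array.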

Despite the additional data, there were still ten times more non-Waldo
images than Waldo images. Therefore, it was necessary to cull the
non-Waldo data, so that there was an even split of Waldo and
non-Waldo images, improving the representation of true positives in the
image data set. Following preprocessing, the images (and associated labels)
were divided into a training set and a test set with a 3:1 split. \\
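A sketch of the culling and the 3:1 split under the stated ten-to-one imbalance; random index selection is one possible implementation, not necessarily the one used in the report:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder labelled data: ten times more non-Waldo than Waldo tiles.
images = np.arange(110)                    # stand-ins for image tiles
labels = np.array([1] * 10 + [0] * 100)    # 1 = Waldo, 0 = non-Waldo

# Cull the non-Waldo data to an even split of the two classes.
waldo_idx = np.flatnonzero(labels == 1)
other_idx = rng.choice(np.flatnonzero(labels == 0),
                       size=len(waldo_idx), replace=False)
keep = rng.permutation(np.concatenate([waldo_idx, other_idx]))

# Divide the remaining data into training and test sets with a 3:1 split.
split = (3 * len(keep)) // 4
train_idx, test_idx = keep[:split], keep[split:]
print(len(train_idx), len(test_idx))  # 15 5
```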

To evaluate the performance of the models, we record the time taken by
each model to train on the training data, and the accuracy with which
the model makes predictions. We calculate accuracy as
\[a = \frac{|correct\ predictions|}{|predictions|} = \frac{tp + tn}{tp + tn + fp + fn}\]
where \(tp\) is the number of true positives, \(tn\) is the number of true
negatives, \(fp\) is the number of false positives, and \(fn\) is the number
of false negatives.
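The accuracy formula translates directly into code; the counts below are made-up numbers for illustration only:

```python
def accuracy(tp, tn, fp, fn):
    # a = (tp + tn) / (tp + tn + fp + fn)
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical confusion counts for one classifier.
print(accuracy(tp=40, tn=45, fp=5, fn=10))  # 0.85
```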
\label{tab:results}
\end{table}

We can see from these results that Deep Neural Networks outperform our benchmark
classification models, although the time required to train these networks is
significantly greater.

% models that learn relationships between pixels outperform those that don't
Of the benchmark classifiers, we see the best performance with Random
Forests and the worst performance with K Nearest Neighbours. As supported
by the rest of the results, this comes down to a model's ability to learn
the hidden relationships between the pixels. This is made more apparent by
the performance of the Neural Networks.

\section{Conclusion} \label{sec:conclusion}

Images from the ``Where's Waldo?'' puzzle books are ideal images to test
image classification techniques. Their tendency for hidden objects and ``red