Silver-T 2018-05-25 18:48:54 +10:00
commit 9ab43a7866
2 changed files with 50 additions and 29 deletions


@@ -145,4 +145,9 @@ publisher={Aleksey Bilogur},
author={Bilogur, Aleksey}, author={Bilogur, Aleksey},
year={2017}, year={2017},
month={Oct} month={Oct}
} }
@misc{kaggle,
title = {Kaggle: The Home of Data Science \& Machine Learning},
howpublished = {\url{https://www.kaggle.com/}},
note = {Accessed: 2018-05-25}
}


@@ -50,7 +50,7 @@
Almost every child around the world knows about ``Where's Waldo?'', also Almost every child around the world knows about ``Where's Waldo?'', also
known as ``Where's Wally?'' in some countries. This famous puzzle book has known as ``Where's Wally?'' in some countries. This famous puzzle book has
spread its way across the world and is published in more than 25 different spread its way across the world and is published in more than 25 different
languages. The idea behind the books is to find the character ``Waldo'', languages. The idea behind the books is to find the character Waldo,
shown in \Cref{fig:waldo}, in the different pictures in the book. This is, shown in \Cref{fig:waldo}, in the different pictures in the book. This is,
however, not as easy as it sounds. Every picture in the book is full of tiny however, not as easy as it sounds. Every picture in the book is full of tiny
details and Waldo is only one out of many. The puzzle is made even harder by details and Waldo is only one out of many. The puzzle is made even harder by
@@ -64,7 +64,7 @@
\includegraphics[scale=0.35]{waldo.png} \includegraphics[scale=0.35]{waldo.png}
\centering \centering
\caption{ \caption{
A headshot of the character ``Waldo'', or ``Wally''. Pictures of Waldo A headshot of the character Waldo, or Wally. Pictures of Waldo
are copyrighted by Martin Handford and are used under the fair-use policy. are copyrighted by Martin Handford and are used under the fair-use policy.
} }
\label{fig:waldo} \label{fig:waldo}
@@ -82,7 +82,7 @@
setting that is humanly tangible. In this report we will try to identify setting that is humanly tangible. In this report we will try to identify
Waldo in the puzzle images using different classification methods. Every Waldo in the puzzle images using different classification methods. Every
image will be split into different segments and every segment will have to image will be split into different segments and every segment will have to
be classified as either being ``Waldo'' or ``not Waldo''. We will compare be classified as either being Waldo or not Waldo. We will compare
various different classification methods from more classical machine various different classification methods from more classical machine
learning, like naive Bayes classifiers, to the current state of the art, learning, like naive Bayes classifiers, to the current state of the art,
Neural Networks. In \Cref{sec:background} we will introduce the different Neural Networks. In \Cref{sec:background} we will introduce the different
@@ -158,15 +158,23 @@
of randomness and the mean of these trees is used which avoids this problem. of randomness and the mean of these trees is used which avoids this problem.
\subsection{Neural Network Architectures} \subsection{Neural Network Architectures}
\tab There are many well established architectures for Neural Networks depending on the task being performed.
In this paper, the focus is placed on convolutional neural networks, which have been proven to effectively classify images \cite{NIPS2012_4824}. There are many well established architectures for Neural Networks depending
One of the pioneering works in the field, the LeNet \cite{726791} architecture, will be implemented to compare against two rudimentary networks with more depth. on the task being performed. In this paper, the focus is placed on
These networks have been constructed to improve on the LeNet architecture by extracting more features, condensing image information, and allowing for more parameters in the network. convolutional neural networks, which have been proven to effectively classify
The difference between the two networks is their use of convolutional and dense layers. images \cite{NIPS2012_4824}. One of the pioneering works in the field, the
The convolutional neural network contains dense layers in the final stages of the network. LeNet \cite{726791} architecture, will be implemented to compare against two
The Fully Convolutional Network (FCN) contains only one dense layer for the final binary classification step. rudimentary networks with more depth. These networks have been constructed
The FCN instead consists of an extra convolutional layer, resulting in an increased ability for the network to abstract the input data relative to the other two configurations. to improve on the LeNet architecture by extracting more features, condensing
\\ image information, and allowing for more parameters in the network. The
difference between the two networks is their use of convolutional and dense layers.
The convolutional neural network contains dense layers in the final stages
of the network. The Fully Convolutional Network (FCN) contains only one
dense layer for the final binary classification step. The FCN instead
consists of an extra convolutional layer, resulting in an increased ability
for the network to abstract the input data relative to the other two
configurations. \\
\begin{figure}[H] \begin{figure}[H]
\includegraphics[scale=0.50]{LeNet} \includegraphics[scale=0.50]{LeNet}
\centering \centering
@@ -184,11 +192,11 @@
agreement intended to allow users to freely share, modify, and use [a] agreement intended to allow users to freely share, modify, and use [a]
Database while maintaining [the] same freedom for Database while maintaining [the] same freedom for
others"\cite{openData}} hosted on the predictive modeling and analytics others"\cite{openData}} hosted on the predictive modeling and analytics
competition framework, Kaggle. The distinction between images containing competition framework, Kaggle~\cite{kaggle}. The distinction between images
Waldo, and those that do not, was provided by the separation of the images containing Waldo, and those that do not, was provided by the separation of
in different sub-directories. It was therefore necessary to preprocess these the images in different sub-directories. It was therefore necessary to
images before they could be utilized by the proposed machine learning preprocess these images before they could be utilized by the proposed
algorithms. machine learning algorithms.
\subsection{Image Processing} \label{imageProcessing} \subsection{Image Processing} \label{imageProcessing}
@@ -203,15 +211,15 @@
containing the most individual images of the three size groups. \\ containing the most individual images of the three size groups. \\
Each of the 64$\times$64 pixel images was inserted into a Each of the 64$\times$64 pixel images was inserted into a
Numpy~\cite{numpy} array of images, and a binary value was inserted into a NumPy~\cite{numpy} array of images, and a binary value was inserted into a
separate list at the same index. These binary values form the labels for separate list at the same index. These binary values form the labels for
each image (``Waldo'' or ``not Waldo''). Color normalization was performed each image (Waldo or not Waldo). Color normalization was performed
on each so that artifacts in an image's color profile correspond to on each so that artifacts in an image's color profile correspond to
meaningful features of the image (rather than photographic method).\\ meaningful features of the image (rather than photographic method).\\
Each original puzzle is broken down into many images, and only contains one Each original puzzle is broken down into many images, and only contains one
Waldo. Although Waldo might span multiple 64$\times$64 pixel squares, this Waldo. Although Waldo might span multiple 64$\times$64 pixel squares, this
means that the ``non-Waldo'' data far outnumbers the ``Waldo'' data. To means that the non-Waldo data far outnumbers the Waldo data. To
combat the bias introduced by the skewed data, all Waldo images were combat the bias introduced by the skewed data, all Waldo images were
artificially augmented by performing random rotations, reflections, and artificially augmented by performing random rotations, reflections, and
introducing random noise in the image to produce new images. In this way, introducing random noise in the image to produce new images. In this way,
@@ -221,10 +229,10 @@
robust methods by exposing each technique to variations of the image during robust methods by exposing each technique to variations of the image during
the training phase. \\ the training phase. \\
Despite the additional data, there were still ten times more ``non-Waldo'' Despite the additional data, there were still ten times more non-Waldo
images than Waldo images. Therefore, it was necessary to cull the images than Waldo images. Therefore, it was necessary to cull the
``non-Waldo'' data, so that there was an even split of ``Waldo'' and non-Waldo data, so that there was an even split of Waldo and
``non-Waldo'' images, improving the representation of true positives in the non-Waldo images, improving the representation of true positives in the
image data set. Following preprocessing, the images (and associated labels) image data set. Following preprocessing, the images (and associated labels)
were divided into a training and a test set with a 3:1 split. \\ were divided into a training and a test set with a 3:1 split. \\
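A minimal sketch of the balancing and 3:1 split described in the hunk above, written in plain Python for illustration only. The function name `balanced_split`, the fixed seed, and the choice to work on label indices rather than on the image array itself are assumptions and are not taken from the report's code.

```python
import random


def balanced_split(labels, test_fraction=0.25, seed=0):
    """Undersample the majority class to an even mix, then split 3:1.

    `labels` is a sequence of 0/1 values (1 = Waldo, 0 = not Waldo).
    Returns two lists of indices into `labels`: (train, test).
    """
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability

    waldo = [i for i, y in enumerate(labels) if y == 1]
    not_waldo = [i for i, y in enumerate(labels) if y == 0]

    # Cull the larger class so both classes contribute the same number of items.
    n = min(len(waldo), len(not_waldo))
    kept = rng.sample(waldo, n) + rng.sample(not_waldo, n)

    # Shuffle before splitting so the test set is not all one class.
    rng.shuffle(kept)

    n_test = round(len(kept) * test_fraction)
    return kept[n_test:], kept[:n_test]


if __name__ == "__main__":
    # Toy data: ten times more non-Waldo than Waldo labels.
    toy_labels = [1] * 4 + [0] * 40
    train_idx, test_idx = balanced_split(toy_labels)
    print(len(train_idx), len(test_idx))
```

Because the shuffle is not stratified, the two classes are balanced only across the kept data as a whole; the training and test sets individually may be slightly uneven.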
@@ -260,7 +268,7 @@
To evaluate the performance of the models, we record the time taken by To evaluate the performance of the models, we record the time taken by
each model to train, based on the training data and the accuracy with which each model to train, based on the training data and the accuracy with which
the model makes predictions. We calculate accuracy as the model makes predictions. We calculate accuracy as
\(a = \frac{|correct\ predictions|}{|predictions|} = \frac{tp + tn}{tp + tn + fp + fn}\) \[a = \frac{|correct\ predictions|}{|predictions|} = \frac{tp + tn}{tp + tn + fp + fn}\]
where \(tp\) is the number of true positives, \(tn\) is the number of true where \(tp\) is the number of true positives, \(tn\) is the number of true
negatives, \(fp\) is the number of false positives, and \(fn\) is the number negatives, \(fp\) is the number of false positives, and \(fn\) is the number
of false negatives. of false negatives.
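The accuracy definition in this hunk can be sketched directly from the confusion counts. The helper below is illustrative only: the name `accuracy` and the convention that label 1 means Waldo and 0 means not Waldo are assumptions, not part of the report.

```python
def accuracy(y_true, y_pred):
    """Return (tp + tn) / (tp + tn + fp + fn) for binary 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions to score")
    return (tp + tn) / total


if __name__ == "__main__":
    # Three of the four predictions match the true labels.
    print(accuracy([1, 0, 1, 0], [1, 0, 0, 0]))
```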
@@ -299,12 +307,20 @@
network and traditional machine learning technique} network and traditional machine learning technique}
\label{tab:results} \label{tab:results}
\end{table} \end{table}
We can see by in these results that Deep Neural Networks outperform our benchmark We can see by the results that Deep Neural Networks outperform our benchmark
classification models, although the time required to train these networks is classification models, although the time required to train these networks is
significantly greater. significantly greater.
\section{Conclusion} \label{sec:conclusion} % models that learn relationships between pixels outperform those that don't
Of the benchmark classifiers we see the best performance with Random
Forests and the worst performance with K Nearest Neighbours. As supported
by the rest of the results, this comes down to a model's ability to learn
the hidden relationships between the pixels. This is made more apparent by
the performance of the Neural Networks.
\section{Conclusion} \label{sec:conclusion}
Images from the ``Where's Waldo?'' puzzle books are ideal images to test Images from the ``Where's Waldo?'' puzzle books are ideal images to test
image classification techniques. Their tendency for hidden objects and ``red image classification techniques. Their tendency for hidden objects and ``red