publisher={Aleksey Bilogur},
author={Bilogur, Aleksey},
year={2017},
month={Oct}
}

@misc{kaggle,
title = {Kaggle: The Home of Data Science \& Machine Learning},
howpublished = {\url{https://www.kaggle.com/}},
note = {Accessed: 2018-05-25}
}
Almost every child around the world knows about ``Where's Waldo?'', also
known as ``Where's Wally?'' in some countries. This famous puzzle book has
spread its way across the world and is published in more than 25 different
languages. The idea behind the books is to find the character Waldo,
shown in \Cref{fig:waldo}, in the different pictures in the book. This is,
however, not as easy as it sounds. Every picture in the book is full of tiny
details and Waldo is only one out of many. The puzzle is made even harder by
\includegraphics[scale=0.35]{waldo.png}
\centering
\caption{
A headshot of the character Waldo, or Wally. Pictures of Waldo are
copyrighted by Martin Handford and are used under the fair-use policy.
}
\label{fig:waldo}
setting that is humanly tangible. In this report we will try to identify
Waldo in the puzzle images using different classification methods. Every
image will be split into different segments and every segment will have to
be classified as either being Waldo or not Waldo. We will compare
various classification methods, from more classical machine learning,
like naive Bayes classifiers, to the current state of the art,
Neural Networks. In \Cref{sec:background} we will introduce the different
of randomness and the mean of these trees is used, which avoids this problem.

\subsection{Neural Network Architectures}
There are many well established architectures for Neural Networks depending
on the task being performed. In this paper, the focus is placed on
convolutional neural networks, which have been proven to effectively classify
images \cite{NIPS2012_4824}. One of the pioneering works in the field, the
LeNet \cite{726791} architecture, will be implemented to compare against two
rudimentary networks with more depth. These networks have been constructed
to improve on the LeNet architecture by extracting more features, condensing
image information, and allowing for more parameters in the network. The
difference between the two networks lies in their use of convolutional and
dense layers. The convolutional neural network contains dense layers in the
final stages of the network. The Fully Convolutional Network (FCN) contains
only one dense layer, for the final binary classification step; it instead
has an extra convolutional layer, giving the network an increased ability
to abstract the input data relative to the other two configurations. \\
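As a rough illustration of how stacked convolutions and poolings condense image information, the sketch below traces the spatial size of a 64$\times$64 input through a LeNet-style stack. The kernel sizes here are illustrative assumptions, not the exact configuration implemented in this report.

```python
def conv_out(size, kernel, stride=1, pad=0):
    # Standard output-size formula for a convolution or pooling window.
    return (size + 2 * pad - kernel) // stride + 1

# Hypothetical LeNet-style stack for a 64x64 input; the kernel and
# pooling sizes are assumptions for illustration only.
size = 64
size = conv_out(size, kernel=5)            # 5x5 convolution -> 60
size = conv_out(size, kernel=2, stride=2)  # 2x2 pooling     -> 30
size = conv_out(size, kernel=5)            # 5x5 convolution -> 26
size = conv_out(size, kernel=2, stride=2)  # 2x2 pooling     -> 13
print(size)  # 13: this condensed map is flattened into the final dense layer(s)
```

Replacing the trailing dense layers with a further convolution, as in the FCN variant, keeps abstracting the map spatially before the single classification layer.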
\begin{figure}[H]
\includegraphics[scale=0.50]{LeNet}
\centering
agreement intended to allow users to freely share, modify, and use [a]
Database while maintaining [the] same freedom for
others''\cite{openData}} hosted on the predictive modeling and analytics
competition framework, Kaggle~\cite{kaggle}. The distinction between images
containing Waldo, and those that do not, was provided by the separation of
the images in different sub-directories. It was therefore necessary to
preprocess these images before they could be utilized by the proposed
machine learning algorithms.
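Since the class of each image is encoded only by its sub-directory, the labels can be recovered with a short walk over the dataset folder. A minimal sketch, assuming hypothetical sub-directory names (`waldo`, `notwaldo`) and JPEG files; the actual names in the Kaggle dataset may differ:

```python
from pathlib import Path

def collect_labelled_paths(root):
    """Pair each image path with a binary label derived from its
    sub-directory: 1 for Waldo, 0 for not-Waldo.
    Sub-directory names here are assumptions for illustration."""
    labelled = []
    for label, sub in ((1, "waldo"), (0, "notwaldo")):
        for path in sorted(Path(root, sub).glob("*.jpg")):
            labelled.append((path, label))
    return labelled
```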

\subsection{Image Processing} \label{imageProcessing}

containing the most individual images of the three size groups. \\

Each of the 64$\times$64 pixel images was inserted into a
NumPy~\cite{numpy} array of images, and a binary value was inserted into a
separate list at the same index. These binary values form the labels for
each image (Waldo or not Waldo). Color normalization was performed
on each image so that artifacts in an image's color profile correspond to
meaningful features of the image (rather than the photographic method).\\

Each original puzzle is broken down into many images, and only contains one
Waldo. Although Waldo might span multiple 64$\times$64 pixel squares, this
means that the non-Waldo data far outnumbers the Waldo data. To
combat the bias introduced by the skewed data, all Waldo images were
artificially augmented by performing random rotations, reflections, and
introducing random noise in the image to produce new images. In this way,
robust methods by exposing each technique to variations of the image during
the training phase. \\

Despite the additional data, there were still ten times more non-Waldo
images than Waldo images. Therefore, it was necessary to cull the
non-Waldo data, so that there was an even split of Waldo and
non-Waldo images, improving the representation of true positives in the
image data set. Following preprocessing, the images (and associated labels)
were divided into a training and a test set with a 3:1 split. \\
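The augmentation, culling, and splitting steps above can be sketched as follows. The noise level, augmentation factor, and array sizes are illustrative assumptions, not the exact values used in this report:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, rng):
    """Make a new example from a Waldo image: a random quarter-turn
    rotation, a random horizontal flip, and a little additive noise."""
    out = np.rot90(image, k=int(rng.integers(4)))
    if rng.integers(2):
        out = out[:, ::-1]
    out = out + rng.normal(0.0, 0.05, out.shape)  # noise level is an assumption
    return np.clip(out, 0.0, 1.0)

# Stand-in arrays; the real inputs are 64x64 RGB crops of the puzzle pages.
waldo = rng.random((10, 64, 64, 3))
not_waldo = rng.random((100, 64, 64, 3))

# Augment the under-represented Waldo class, then cull the non-Waldo
# class so the two classes are evenly split.
waldo_all = np.concatenate([waldo, np.stack([augment(im, rng) for im in waldo])])
keep = rng.choice(len(not_waldo), size=len(waldo_all), replace=False)
images = np.concatenate([waldo_all, not_waldo[keep]])
labels = np.array([1] * len(waldo_all) + [0] * len(keep))

# Shuffle, then divide into training and test sets with a 3:1 split.
order = rng.permutation(len(images))
images, labels = images[order], labels[order]
cut = 3 * len(images) // 4
x_train, x_test = images[:cut], images[cut:]
y_train, y_test = labels[:cut], labels[cut:]
```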
To evaluate the performance of the models, we record the time taken by
each model to train on the training data, and the accuracy with which
the model makes predictions. We calculate accuracy as
\[a = \frac{|correct\ predictions|}{|predictions|} = \frac{tp + tn}{tp + tn + fp + fn}\]
where \(tp\) is the number of true positives, \(tn\) is the number of true
negatives, \(fp\) is the number of false positives, and \(fn\) is the number
of false negatives.
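The accuracy measure is a direct count over the confusion-matrix cells; a minimal sketch for binary labels:

```python
def accuracy(y_true, y_pred):
    # a = (tp + tn) / (tp + tn + fp + fn): the fraction of correct predictions.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return (tp + tn) / len(y_true)

print(accuracy([1, 0, 1, 0], [1, 1, 1, 0]))  # 0.75
```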
network and traditional machine learning technique}
\label{tab:results}
\end{table}

We can see by the results that Deep Neural Networks outperform our benchmark
classification models, although the time required to train these networks is
significantly greater.

% models that learn relationships between pixels outperform those that don't
Of the benchmark classifiers we see the best performance with Random
Forests and the worst performance with K Nearest Neighbours. As supported
by the rest of the results, this comes down to a model's ability to learn
the hidden relationships between the pixels. This is made more apparent by
the performance of the Neural Networks.

\section{Conclusion} \label{sec:conclusion}

Images from the ``Where's Waldo?'' puzzle books are ideal images to test
image classification techniques. Their tendency for hidden objects and ``red