\documentclass[a4paper]{article}
% To compile PDF run: latexmk -pdf {filename}.tex
\usepackage{graphicx} % Used to insert images into the paper
\usepackage{float}
\usepackage[justification=centering]{caption} % Used for captions
\captionsetup[figure]{font=small} % Makes captions small
\newcommand\tab[1][0.5cm]{\hspace*{#1}} % Defines a new command to use 'tab' in text
\usepackage[comma, numbers]{natbib} % Used for the bibliography
\usepackage{amsmath} % Math package
% Enable that parameters of \cref{}, \ref{}, \cite{}, ... are linked so that a reader can click on the number and jump to the target in the document
\usepackage{hyperref}
%enable \cref{...} and \Cref{...} instead of \ref: Type of reference included in the link
\usepackage[capitalise,nameinlink]{cleveref}
% UTF-8 encoding
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc} %support umlauts in the input
% Easier compilation
\usepackage{bookmark}
\usepackage{xcolor}
\newcommand{\todo}[1]{\marginpar{{\textsf{TODO}}}{\textbf{\color{red}[#1]}}}
\begin{document}
\title{What is Waldo?}
\author{Kelvin Davis \and Jip J. Dekker \and Anthony Silvestere}
\maketitle
\begin{abstract}
%
The famous brand of picture puzzles ``Where's Waldo?'' relates well to many
unsolved image classification problems. This offers us the opportunity to
test different image classification methods on a data set that is both small
enough to compute in a reasonable time span and easy for humans to
understand. In this report we compare the well-known machine learning
methods Naive Bayes, Support Vector Machines, $k$-Nearest Neighbors, and
Random Forest against the neural network architectures LeNet, a deeper
Convolutional Neural Network, and a Fully Convolutional Neural Network.
\todo{I don't like this big summation but I think it is the important
information}
Our comparison shows that \todo{...}
%
\end{abstract}
\section{Introduction}
Almost every child around the world knows about ``Where's Waldo?'', also
known as ``Where's Wally?'' in some countries. This famous puzzle book has
spread its way across the world and is published in more than 25 different
languages. The idea behind the books is to find the character ``Waldo'',
shown in \Cref{fig:waldo}, in the different pictures in the book. This is,
however, not as easy as it sounds. Every picture in the book is full of tiny
details, and Waldo is only one of them. The puzzle is made even harder by
the fact that Waldo is not always fully depicted; sometimes only his head or
torso pops out from behind something else. Finally, even adults will have
trouble spotting Waldo because the pictures are full of ``red herrings'':
things that look like (or are colored like) Waldo, but are not actually
Waldo.
\begin{figure}[ht]
\includegraphics[scale=0.35]{waldo}
\centering
\caption{
A headshot of the character ``Waldo'', or ``Wally''. Pictures of Waldo
are copyrighted by Martin Handford and are used here under fair use.
}
\label{fig:waldo}
\end{figure}
The task of finding Waldo relates to many real-life image recognition
tasks. Fields like mining, astronomy, surveillance, radiology, and
microbiology often have to analyse images (or scans) to find the tiniest
details, sometimes undetectable by the human eye. These tasks are
especially hard when the thing(s) you are looking for are similar to the
rest of the image, and they are thus generally performed using computers
to identify possible matches.
``Where's Waldo?'' offers us a great tool to study this kind of problem in a
setting that is tangible to humans. In this report we will try to identify
Waldo in the puzzle images using different classification methods. Every
image will be split into segments and every segment will have to be
classified as either ``Waldo'' or ``not Waldo''. We will compare various
classification methods, from classical machine learning, like naive Bayes
classifiers, to the current state of the art, neural networks. In
\Cref{sec:background} we will introduce the different classification
methods, in \Cref{sec:method} we will explain how these methods are trained
and evaluated, in \Cref{sec:results} we will discuss the results, and in
\Cref{sec:conclusion} we will offer our final conclusions.
\section{Background} \label{sec:background}
The classification methods used can be separated into two groups:
classical machine learning methods and neural network architectures. Many of
the classical machine learning algorithms have variations and improvements
for various purposes; however, for this report we will use only their basic
versions. In contrast, we will compare several different neural network
architectures, as neural networks are currently the most widely used method
for image classification.
\todo{
\\A couple of papers that may be useful (if needed):
- LeNet: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
- AlexNet: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks
- General comparison of LeNet and AlexNet:
"On the Performance of GoogLeNet and AlexNet Applied to Sketches", Pedro Ballester and Ricardo Matsumura Araujo
- Deep NN Architecture:
https://www-sciencedirect-com.ezproxy.lib.monash.edu.au/science/article/pii/S0925231216315533
}
\subsection{Classical Machine Learning Methods}
The following paragraphs give brief descriptions of the different
classical machine learning methods used in this report. For further detail
we recommend ``Supervised machine learning: A review of classification
techniques'' \cite{Kotsiantis2007}.
\paragraph{Naive Bayes Classifier}
\cite{naivebayes} is a classification method based on Bayes' theorem,
shown in \Cref{eq:bayes}. Bayes' theorem allows us to calculate the
probability of an event taking into account prior knowledge of the
conditions of the event in question. In classification, this allows us to
calculate the probability that a new instance has a certain class based on
its features. We then assign the class that has the highest probability.
\begin{equation}
\label{eq:bayes}
P(A\mid B)=\frac {P(B\mid A)\,P(A)}{P(B)}
\end{equation}
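As a small worked illustration, the following Python snippet applies
\Cref{eq:bayes} to compute the posterior probability that an image segment
is Waldo given a single binary feature; all of the numbers are made up for
the example and are not values from our data set.
\begin{verbatim}
# Worked example of Bayes' theorem with hypothetical numbers.
# A = "segment is Waldo", B = "segment contains red-white stripes".
p_waldo = 0.5                # P(A): prior over a balanced data set
p_stripes_given_waldo = 0.8  # P(B|A), assumed for illustration
p_stripes_given_other = 0.1  # P(B|not A), assumed for illustration

# P(B) by the law of total probability
p_stripes = (p_stripes_given_waldo * p_waldo
             + p_stripes_given_other * (1 - p_waldo))

# P(A|B) = P(B|A) * P(A) / P(B)
p_waldo_given_stripes = p_stripes_given_waldo * p_waldo / p_stripes
print(p_waldo_given_stripes)  # ~0.89, so we would assign "Waldo"
\end{verbatim}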
\paragraph{$k$-Nearest Neighbors}
($k$-NN) \cite{knn} is one of the simplest machine learning algorithms. It
classifies a new instance based on its ``distance'' to the known instances:
it finds the $k$ closest known instances and assigns the new instance the
class held by the majority of those $k$ instances. The method has to be
configured in several ways: the value of $k$, the distance measure, and
(depending on $k$) a tie-breaking rule all have to be chosen.
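To indicate how these choices appear in practice, the following is a
minimal sketch using the Scikit-Learn implementation; the flattened-pixel
feature matrices \texttt{X\_train}, \texttt{y\_train}, and \texttt{X\_test}
are assumed to exist.
\begin{verbatim}
from sklearn.neighbors import KNeighborsClassifier

# Minimal k-NN sketch; the configuration choices are explicit.
knn = KNeighborsClassifier(
    n_neighbors=5,       # the value of k
    metric="euclidean",  # the distance measure
    weights="uniform",   # unweighted majority vote over k neighbours
)
knn.fit(X_train, y_train)
predictions = knn.predict(X_test)
\end{verbatim}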
\paragraph{Support Vector Machine}
(SVM) \cite{svm} classifiers construct the hyperplane that separates the two
classes with the largest possible margin; a new instance is classified
according to the side of the hyperplane on which it falls.
\paragraph{Random Forest}
\cite{randomforest} classifiers train an ensemble of decision trees, each on
a random subset of the training data and features, and classify a new
instance by a majority vote over the trees.
\subsection{Neural Network Architectures}
We use three neural network architectures: the classic LeNet, a deeper,
fairly standard convolutional neural network (CNN), and a more fully
convolutional network (FCN).
\paragraph{Convolutional Neural Networks}
(CNNs) pass the image through layers of learned convolutional filters
interleaved with pooling, extracting local visual features and gradually
condensing the image information before a final dense classification stage.
Our CNN improves on LeNet by being deeper, extracting more features, and
condensing the image information further.
\paragraph{LeNet}
is one of the earliest convolutional network architectures and serves as
our baseline.
\paragraph{Fully Convolutional Neural Networks}
contain only one dense layer, used for the final binary classification
step. Our FCN adds an extra convolutional layer, meaning that it abstracts
the data further than the other two networks before classifying each image.
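As an indication of scale only, the following is a minimal Keras-style
sketch of a LeNet-like network for our 64$\times$64 binary classification
task; the framework choice and the exact layer sizes are assumptions for
illustration, not the parameters of the models evaluated here.
\begin{verbatim}
from tensorflow.keras import layers, models

# LeNet-style sketch: two convolution/pooling stages, then dense
# layers, ending in one sigmoid unit for Waldo / not-Waldo.
model = models.Sequential([
    layers.Conv2D(6, (5, 5), activation="relu",
                  input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(16, (5, 5), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(120, activation="relu"),
    layers.Dense(84, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
\end{verbatim}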
\section{Method} \label{sec:method}
\tab
In order to effectively utilise the aforementioned modelling and classification techniques, a key consideration is the data they act on.
A dataset containing Waldo and non-Waldo images was obtained from an Open Database\footnote{``The Open Database License (ODbL) is a license agreement intended to allow users to freely share, modify, and use [a] Database while maintaining [the] same freedom for others'' \cite{openData}} hosted on Kaggle, a predictive modelling and analytics competition platform.
The distinction between images containing Waldo and those that do not was provided by the separation of the images into different sub-directories.
It was therefore necessary to preprocess these images before they could be utilised by the proposed machine learning algorithms.
\subsection{Image Processing}
\tab
The Waldo image database consists of images of size 64$\times$64, 128$\times$128, and 256$\times$256 pixels, obtained by dividing complete ``Where's Waldo?'' puzzles.
Within each set of images, those containing Waldo are located in a folder called `waldo', and those not containing Waldo in a folder called `not\_waldo'.
Since ``Where's Waldo?'' puzzles are usually densely populated and contain fine details, the 64$\times$64 pixel set of images was selected to train and evaluate the machine learning models.
These images provide the added benefit of being the most numerous of the three size groups.
\par
Each of the 64$\times$64 pixel images was inserted into a Numpy
\footnote{Numpy is a popular Python programming library for scientific computing.}
array of images, and a binary value was inserted into a separate list at the same index.
These binary values form the labels for each image (Waldo or not Waldo).
Colour normalisation was performed on each image so that artefacts in an image's colour profile correspond to meaningful features of the image (rather than to the photographic method used).
\par
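A minimal sketch of this loading step is shown below; the folder names
follow the data set layout described above, while the use of Pillow and the
simple rescaling (standing in for the full colour normalisation) are
assumptions for illustration.
\begin{verbatim}
import os
import numpy as np
from PIL import Image  # Pillow; one of several ways to read images

def load_directory(path, label, images, labels):
    """Read every 64x64 image under path and record its label."""
    for name in sorted(os.listdir(path)):
        img = np.asarray(Image.open(os.path.join(path, name)),
                         dtype=float)
        images.append(img / 255.0)  # simple colour rescaling
        labels.append(label)

images, labels = [], []
load_directory("waldo", 1, images, labels)      # Waldo images
load_directory("not_waldo", 0, images, labels)  # non-Waldo images
X = np.stack(images)   # shape: (n_images, 64, 64, 3)
y = np.array(labels)   # binary labels at matching indices
\end{verbatim}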
Each original puzzle is broken down into many images but contains only one Waldo. Although Waldo may span multiple 64$\times$64 pixel squares, this means that the non-Waldo data far outnumbers the Waldo data.
To combat the bias introduced by the skewed data, all Waldo images were artificially augmented by performing random rotations and reflections and by introducing random noise, producing new images.
In this way, each original Waldo image was used to produce an additional 10 variations, which were inserted into the image array.
This provides more variation in the true positives of the data set and assists in the development of more robust methods by exposing each technique to variations of the image during the training phase.
\par
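The augmentation can be sketched as follows; restricting rotations to right
angles and the choice of noise level are our assumptions for the example.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def augment(image, n_variations=10, noise_scale=0.05):
    """Make random rotated, reflected, noisy copies of one image."""
    variations = []
    for _ in range(n_variations):
        out = np.rot90(image, k=rng.integers(0, 4))  # rotation
        if rng.random() < 0.5:
            out = np.fliplr(out)                     # reflection
        out = out + rng.normal(0.0, noise_scale, out.shape)  # noise
        variations.append(np.clip(out, 0.0, 1.0))
    return variations
\end{verbatim}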
Despite the additional data, there were still over ten times as many non-Waldo images as Waldo images.
It was therefore necessary to cull the non-Waldo data so that there was an even split of Waldo and non-Waldo images, improving the representation of true positives in the image data set.
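A simple way to perform this culling is to undersample the non-Waldo images
at random, as in the following sketch (using the \texttt{X} and \texttt{y}
arrays from the loading step above).
\begin{verbatim}
import numpy as np

# Undersample non-Waldo images to match the number of Waldo images.
waldo_idx = np.flatnonzero(y == 1)
other_idx = np.flatnonzero(y == 0)
keep = np.random.choice(other_idx, size=len(waldo_idx),
                        replace=False)
balanced = np.concatenate([waldo_idx, keep])
np.random.shuffle(balanced)
X_balanced, y_balanced = X[balanced], y[balanced]
\end{verbatim}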
% Kelvin Start
\subsection{Benchmarking}\label{benchmarking}
In order to benchmark the neural networks, their performance is
evaluated against other machine learning algorithms. We use Support
Vector Machine, $k$-Nearest Neighbours (\(k=5\)), Gaussian Naive Bayes,
and Random Forest classifiers, as provided in Scikit-Learn.
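The baselines can be set up as follows; apart from \(k=5\), all
hyperparameters are left at the Scikit-Learn defaults, which is an
assumption on our part.
\begin{verbatim}
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier

baselines = {
    "SVM": SVC(),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(),
}
# Classical models expect flat feature vectors, so flatten images.
X_tr = X_train.reshape(len(X_train), -1)
X_te = X_test.reshape(len(X_test), -1)
for name, clf in baselines.items():
    clf.fit(X_tr, y_train)
    print(name, clf.score(X_te, y_test))  # mean accuracy
\end{verbatim}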
\subsection{Performance Metrics}\label{performance-metrics}
To evaluate the performance of the models, we record the time taken by
each model to train on the training data, as well as statistics about the
predictions the models make on the test data. These prediction
statistics include:
\begin{itemize}
\item
\textbf{Accuracy:}
\[a = \dfrac{|correct\ predictions|}{|predictions|} = \dfrac{tp + tn}{tp + tn + fp + fn}\]
\item
\textbf{Precision:}
\[p = \dfrac{|Waldo\ predicted\ as\ Waldo|}{|predicted\ as\ Waldo|} = \dfrac{tp}{tp + fp}\]
\item
\textbf{Recall:}
\[r = \dfrac{|Waldo\ predicted\ as\ Waldo|}{|actually\ Waldo|} = \dfrac{tp}{tp + fn}\]
\item
\textbf{F1 Measure:} \[f1 = \dfrac{2pr}{p + r}\] where \(tp\) is the
number of true positives, \(tn\) is the number of true negatives,
\(fp\) is the number of false positives, and \(fn\) is the number of
false negatives.
\end{itemize}
Accuracy is a common performance metric in machine learning; however,
in classification problems where the training data is heavily biased
toward one category, a model can sometimes optimise its accuracy by
classifying all instances as that category. That is, the classifier will
classify all images that do not contain Waldo as not containing Waldo,
but will also classify all images containing Waldo as not containing
Waldo. Thus we use other metrics to measure performance as well.
\emph{Precision} gives the percentage of instances classified as Waldo that
are actually Waldo. \emph{Recall} gives the percentage of actual Waldos
that are predicted as Waldo; in the case of a classifier that classifies
nothing as Waldo, the recall would be 0. The \emph{F1-Measure} combines
precision and recall in a way that heavily penalises classifiers that
perform poorly in either precision or recall.
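All four metrics can be computed directly from the test labels and
predictions, for example with Scikit-Learn (\texttt{y\_test} and
\texttt{predictions} are assumed to exist).
\begin{verbatim}
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

print("accuracy: ", accuracy_score(y_test, predictions))
print("precision:", precision_score(y_test, predictions))
print("recall:   ", recall_score(y_test, predictions))
print("f1:       ", f1_score(y_test, predictions))
\end{verbatim}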
% Kelvin End
\section{Results} \label{sec:results}
\section{Conclusion} \label{sec:conclusion}
\clearpage % Ensures that the references are on a separate page
\bibliographystyle{alpha}
\bibliography{references}
\end{document}