\documentclass[a4paper]{article}

% To compile PDF run: latexmk -pdf (unknown).tex

\usepackage{graphicx} % Used to insert images into the paper

\usepackage{float}

\usepackage[justification=centering]{caption} % Used for captions
\captionsetup[figure]{font=small} % Makes figure captions small

\newcommand\tab[1][0.5cm]{\hspace*{#1}} % Defines a new command to use 'tab' in text

% Math package
\usepackage{amsmath}

% Make the targets of \cref{}, \ref{}, \cite{}, ... clickable so that a reader can click on the number and jump to the target in the document
\usepackage{hyperref}

% Enable \cref{...} and \Cref{...} instead of \ref: the type of reference is included in the link
\usepackage[capitalise,nameinlink]{cleveref}

% UTF-8 encoding
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc} % Support umlauts in the input

% Easier compilation
\usepackage{bookmark}

\usepackage{natbib}

\usepackage{xcolor}
\newcommand{\todo}[1]{\marginpar{{\textsf{TODO}}}{\textbf{\color{red}[#1]}}}
\begin{document}

\title{What is Waldo?}
\author{Kelvin Davis \and Jip J. Dekker \and Anthony Silvestere}
\maketitle
\begin{abstract}
	%
	The famous brand of picture puzzles ``Where's Waldo?'' relates well to many
	unsolved image classification problems. This offers us the opportunity to
	test different image classification methods on a data set that is both
	small enough to compute in a reasonable time span and easy for humans to
	understand. In this report we compare the well known machine learning
	methods Naive Bayes, Support Vector Machines, $k$-Nearest Neighbors, and
	Random Forest against the neural network architectures Convolutional
	Neural Networks, LeNet, and Fully Convolutional Neural Networks.
	\todo{I don't like this big summation but I think it is the important
	information}
	Our comparison shows that \todo{...}
	%
\end{abstract}
\section{Introduction}

Almost every child around the world knows about ``Where's Waldo?'', also
known as ``Where's Wally?'' in some countries. This famous puzzle book has
spread across the world and is published in more than 25 different
languages. The idea behind the books is to find the character ``Waldo'',
shown in \Cref{fig:waldo}, in the different pictures in the book. This is,
however, not as easy as it sounds. Every picture in the book is full of tiny
details and Waldo is only one of many. The puzzle is made even harder by the
fact that Waldo is not always fully depicted; sometimes it is just his head
or his torso popping out from behind something else. Lastly, the reason that
even adults have trouble spotting Waldo is that the pictures are full of
``red herrings'': things that look like (or are colored like) Waldo, but are
not actually Waldo.
\begin{figure}[ht]
	\centering
	\includegraphics[scale=0.35]{waldo}
	\caption{
		A headshot of the character ``Waldo'', or ``Wally''. Pictures of Waldo
		are copyrighted by Martin Handford and are used under the fair-use
		policy.
	}
	\label{fig:waldo}
\end{figure}
The task of finding Waldo relates to many real-life image recognition tasks.
Fields like mining, astronomy, surveillance, radiology, and microbiology
often have to analyse images (or scans) to find the tiniest details,
sometimes undetectable by the human eye. These tasks are especially hard
when the things you are looking for are similar to the rest of the image.
Such tasks are thus generally performed using computers to identify possible
matches.
``Where's Waldo?'' offers us a great tool to study this kind of problem in a
setting that is humanly tangible. In this report we will try to identify
Waldo in the puzzle images using different classification methods. Every
image will be split into different segments and every segment will have to
be classified as either being ``Waldo'' or ``not Waldo''. We will compare
various classification methods, from more classical machine learning, like
naive Bayes classifiers, to the current state of the art, neural networks.
In \Cref{sec:background} we introduce the different classification methods,
\Cref{sec:method} explains how these methods are trained and how they are
evaluated, \Cref{sec:results} discusses the results, and
\Cref{sec:conclusion} offers our final conclusions.
\section{Background} \label{sec:background}

The classification methods used can be separated into two groups: classical
machine learning methods and neural network architectures. Many of the
classical machine learning algorithms have variations and improvements for
various purposes; however, for this report we will be using only their basic
versions. In contrast, we will use several different neural network
architectures, as neural networks are currently the most widely used method
for image classification.
\subsection{Classical Machine Learning Methods}

The following paragraphs give only brief descriptions of the different
classical machine learning methods used in this report. For further reading
we recommend ``Supervised machine learning: A review of classification
techniques'' \cite{Kotsiantis2007}.
\paragraph{Naive Bayes Classifier}

The naive Bayes classifier \cite{naivebayes} applies Bayes' theorem under
the ``naive'' assumption that all features are independent of each other
given the class. It assigns a new instance the class with the highest
resulting probability.
\paragraph{$k$-Nearest Neighbors}

$k$-Nearest Neighbors ($k$-NN) \cite{knn} is one of the simplest machine
learning algorithms. It classifies a new instance based on its ``distance''
to the known instances. It finds the $k$ closest instances to the new
instance and assigns the new instance the class that the majority of those
$k$ instances have. The method has to be configured in several ways: the
value of $k$, the distance measure, and (depending on $k$) a tie-breaking
measure all have to be chosen.
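To make the procedure concrete, a minimal $k$-NN classifier with Euclidean
distance can be sketched in a few lines of Python. This is an illustrative
sketch with made-up toy data, not the Scikit-Learn implementation used later
in this report:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=5):
    """Classify x by majority vote among its k nearest training instances."""
    # Compute the Euclidean distance from x to every known instance.
    dists = [(math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y)]
    # Take the labels of the k closest instances.
    nearest = [label for _, label in sorted(dists)[:k]]
    # Majority vote over those k labels.
    return Counter(nearest).most_common(1)[0][0]

# Toy example: two well-separated 2-D classes.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["not Waldo", "not Waldo", "not Waldo", "Waldo", "Waldo", "Waldo"]
print(knn_predict(X, y, (5.5, 5.5), k=3))  # → Waldo
```

Note that with an even $k$ and two classes, ties are possible; here
`Counter.most_common` simply returns one of the tied labels, which is why a
deliberate tie-breaking measure matters in practice.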
\paragraph{Support Vector Machine}

A Support Vector Machine \cite{svm} finds the hyperplane that separates the
two classes with the largest possible margin. Data that is not linearly
separable can be handled by mapping it into a higher-dimensional space using
a kernel function.
\paragraph{Random Forest}

A Random Forest \cite{randomforest} is an ensemble method that trains a
collection of decision trees, each on a random subset of the data and
features, and classifies a new instance by a majority vote over the trees.
\subsection{Neural Network Architectures}
\todo{Did we only do the three in the end? (Alexnet?)}
\paragraph{Convolutional Neural Networks}

Convolutional Neural Networks (CNNs) apply learned convolutional filters
over an image, so that the same local pattern can be detected anywhere in
the image while keeping the number of parameters small.

\paragraph{LeNet}

LeNet is one of the earliest convolutional network architectures, originally
developed for handwritten digit recognition. It stacks a small number of
convolutional and pooling layers followed by fully connected layers.

\paragraph{Fully Convolutional Neural Networks}

Fully Convolutional Neural Networks replace the fully connected layers at
the end of a conventional CNN with further convolutional layers, allowing
the network to operate on inputs of varying size.
\section{Method} \label{sec:method}

% Kelvin Start
\subsection{Benchmarking}\label{benchmarking}

In order to benchmark the neural networks, the performance of these
algorithms is evaluated against other machine learning algorithms. We use
Support Vector Machine, $k$-Nearest Neighbors ($k=5$), Gaussian Naive Bayes,
and Random Forest classifiers, as provided in Scikit-Learn.
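The benchmarking loop could be sketched as follows. The random data below is
only a placeholder for the actual image segments; the classifier settings
($k=5$ for $k$-NN, Scikit-Learn defaults elsewhere) follow the text:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder data: 200 "segments" of 16 features, labels 1 = Waldo, 0 = not Waldo.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = (X[:, 0] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The four classical baselines used for benchmarking.
models = {
    "SVM": SVC(),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)                   # train on the training split
    print(name, model.score(X_test, y_test))      # accuracy on the test split
```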
\subsection{Performance Metrics}\label{performance-metrics}

To evaluate the performance of the models, we record the time taken by each
model to train on the training data, as well as statistics about the
predictions the models make on the test data. These prediction statistics
include:
\begin{itemize}
	\item
		\textbf{Accuracy:}
		\[a = \dfrac{|\text{correct predictions}|}{|\text{predictions}|} = \dfrac{tp + tn}{tp + tn + fp + fn}\]
	\item
		\textbf{Precision:}
		\[p = \dfrac{|\text{Waldo predicted as Waldo}|}{|\text{predicted as Waldo}|} = \dfrac{tp}{tp + fp}\]
	\item
		\textbf{Recall:}
		\[r = \dfrac{|\text{Waldo predicted as Waldo}|}{|\text{actually Waldo}|} = \dfrac{tp}{tp + fn}\]
	\item
		\textbf{F1 Measure:} \[f1 = \dfrac{2pr}{p + r}\]
\end{itemize}

Here \(tp\) is the number of true positives, \(tn\) is the number of true
negatives, \(fp\) is the number of false positives, and \(fn\) is the number
of false negatives.
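All four metrics follow directly from the confusion-matrix counts; a minimal
Python sketch (the counts below are made up for illustration):

```python
def metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)        # of everything predicted Waldo, how much was Waldo
    recall = tp / (tp + fn)           # of everything actually Waldo, how much was found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Example: a biased test set with few Waldo instances.
a, p, r, f1 = metrics(tp=5, tn=90, fp=2, fn=3)
print(f"accuracy={a:.2f} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# → accuracy=0.95 precision=0.71 recall=0.62 f1=0.67
```

The example illustrates the bias problem discussed below: accuracy looks
excellent (0.95) even though more than a third of the actual Waldos were
missed.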
Accuracy is a common performance metric used in machine learning; however,
in classification problems where the training data is heavily biased toward
one category, a model may learn to optimize its accuracy by classifying all
instances as that category. For example, such a classifier will correctly
classify all images that do not contain Waldo as not containing Waldo, but
will also classify all images containing Waldo as not containing Waldo. Thus
we use other metrics to measure performance as well.
\emph{Precision} is the percentage of instances classified as Waldo that are
actually Waldo. \emph{Recall} is the percentage of actual Waldos that are
predicted as Waldo. In the case of a classifier that classifies everything
as not Waldo, the recall would be 0. The \emph{F1 measure} combines
precision and recall in a way that heavily penalises classifiers that
perform poorly in either precision or recall.
% Kelvin End
\section{Results} \label{sec:results}

\section{Conclusion} \label{sec:conclusion}

\bibliographystyle{alpha}
\bibliography{references}

\end{document}