\documentclass[a4paper]{article}
% To compile PDF run: latexmk -pdf (unknown).tex

\usepackage{graphicx} % Used to insert images into the paper
\usepackage{float}
\usepackage[justification=centering]{caption} % Used for captions
\captionsetup[figure]{font=small} % Makes captions small
\newcommand\tab[1][0.5cm]{\hspace*{#1}} % Defines a new command to use 'tab' in text

% Math package
\usepackage{amsmath}
% Enable that parameters of \cref{}, \ref{}, \cite{}, ... are linked so that a reader can click on the number and jump to the target in the document
\usepackage{hyperref}
% Enable \cref{...} and \Cref{...} instead of \ref: type of reference included in the link
\usepackage[capitalise,nameinlink]{cleveref}
% UTF-8 encoding
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc} % Support umlauts in the input
% Easier compilation
\usepackage{bookmark}
\usepackage{natbib}
\usepackage{xcolor}
\newcommand{\todo}[1]{\marginpar{{\textsf{TODO}}}{\textbf{\color{red}[#1]}}}

\begin{document}

\title{What is Waldo?}
\author{Kelvin Davis \and Jip J. Dekker \and Anthony Silvestere}
\maketitle

\begin{abstract}
  %
  The famous brand of picture puzzles ``Where's Waldo?'' relates well to many unsolved image classification problems. This offers us the opportunity to test different image classification methods on a data set that is both small enough to compute in a reasonable time span and easy for humans to understand. In this report we compare the well-known machine learning methods Naive Bayes, Support Vector Machines, $k$-Nearest Neighbors, and Random Forest against the neural network architectures LeNet, Convolutional Neural Networks, and Fully Convolutional Neural Networks. \todo{I don't like this big summation but I think it is the important information} Our comparison shows that \todo{...}
  %
\end{abstract}

\section{Introduction}

Almost every child around the world knows about ``Where's Waldo?'', also known as ``Where's Wally?'' in some countries.
This famous puzzle book has spread across the world and is published in more than 25 different languages. The idea behind the books is to find the character ``Waldo'', shown in \Cref{fig:waldo}, in the different pictures in the book. This is, however, not as easy as it sounds. Every picture in the book is full of tiny details and Waldo is only one of many. The puzzle is made even harder by the fact that Waldo is not always fully depicted; sometimes only his head or his torso pops out from behind something else. Lastly, the reason that even adults have trouble spotting Waldo is that the pictures are full of ``red herrings'': things that look like (or are colored like) Waldo, but are not actually Waldo.

\begin{figure}[ht]
  \includegraphics[scale=0.35]{waldo}
  \centering
  \caption{
    A headshot of the character ``Waldo'', or ``Wally''. Pictures of Waldo are copyrighted by Martin Handford and are used under the fair-use policy.
  }
  \label{fig:waldo}
\end{figure}

The task of finding Waldo relates to many real-life image recognition tasks. Fields like mining, astronomy, surveillance, radiology, and microbiology often have to analyse images (or scans) to find the tiniest details, sometimes undetectable by the human eye. These tasks are especially hard when the things you are looking for are similar to the rest of the image. Such tasks are thus generally performed using computers to identify possible matches. ``Where's Waldo?'' offers us a great tool to study this kind of problem in a setting that is humanly tangible. In this report we will try to identify Waldo in the puzzle images using different classification methods. Every image will be split into segments and every segment will have to be classified as either ``Waldo'' or ``not Waldo''.
We will compare various classification methods, from more classical machine learning, like naive Bayes classifiers, to the current state of the art, neural networks. \Cref{sec:background} introduces the different classification methods, \Cref{sec:method} explains how these methods are trained and how they are evaluated, \Cref{sec:results} discusses the results, and \Cref{sec:conclusion} offers our final conclusions.

\section{Background} \label{sec:background}

The classification methods used can be separated into two groups: classical machine learning methods and neural network architectures. Many of the classical machine learning algorithms have variations and improvements for various purposes; however, for this report we will use only their basic versions. In contrast, we will use several different neural network architectures, as neural networks are currently the most widely used method for image classification.

\subsection{Classical Machine Learning Methods}

The following paragraphs give only brief descriptions of the different classical machine learning methods used in this report. For further reading we recommend ``Supervised machine learning: A review of classification techniques'' \cite{Kotsiantis2007}.

\paragraph{Naive Bayes Classifier} \cite{naivebayes}

\paragraph{$k$-Nearest Neighbors} ($k$-NN) \cite{knn} is one of the simplest machine learning algorithms. It classifies a new instance based on its ``distance'' to the known instances. It finds the $k$ closest known instances and assigns the new instance the class that the majority of those $k$ instances have. The method has to be configured in several ways: the value of $k$, the distance measure, and (depending on $k$) a tie-breaking measure all have to be chosen.
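The idea can be illustrated with a minimal pure-Python sketch on toy two-dimensional data (the data and helper below are purely illustrative; our experiments use the Scikit-Learn implementation):

```python
# Minimal k-NN sketch: majority vote among the k nearest training points.
from collections import Counter
import math


def knn_predict(train, new_point, k=5):
    """Classify new_point by majority vote among its k nearest neighbours.

    train: list of (features, label) pairs; features are tuples of floats.
    Ties in the vote are broken by insertion order, i.e. nearest first.
    """
    # Sort training instances by Euclidean distance to the new instance.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], new_point))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy data: one cluster near (0, 0) labelled "not Waldo",
# one cluster near (4, 4) labelled "Waldo".
train = [
    ((0.1, 0.2), "not Waldo"), ((0.3, -0.1), "not Waldo"),
    ((-0.2, 0.4), "not Waldo"), ((0.0, 0.1), "not Waldo"),
    ((0.2, 0.2), "not Waldo"),
    ((4.1, 3.9), "Waldo"), ((3.8, 4.2), "Waldo"),
    ((4.0, 4.1), "Waldo"), ((4.2, 3.8), "Waldo"), ((3.9, 4.0), "Waldo"),
]

print(knn_predict(train, (0.1, 0.0)))  # -> not Waldo
print(knn_predict(train, (4.0, 4.0)))  # -> Waldo
```

With $k = 5$ and these well-separated clusters, all five nearest neighbours of each query point share one label, so no tie breaking is needed.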
\paragraph{Support Vector Machine} \cite{svm}

\paragraph{Random Forest} \cite{randomforest}

\subsection{Neural Network Architectures} \todo{Did we only do the three in the end? (Alexnet?)}

\paragraph{Convolutional Neural Networks}

\paragraph{LeNet}

\paragraph{Fully Convolutional Neural Networks}

\section{Method} \label{sec:method}
% Kelvin Start

\subsection{Benchmarking}\label{benchmarking}

In order to benchmark the neural networks, their performance is evaluated against other machine learning algorithms. We use Support Vector Machine, $k$-Nearest Neighbours (\(k = 5\)), Gaussian Naive Bayes, and Random Forest classifiers, as provided in Scikit-Learn.

\subsection{Performance Metrics}\label{performance-metrics}

To evaluate the performance of the models, we record the time taken by each model to train on the training data, as well as statistics about the predictions the models make on the test data. These prediction statistics include:

\begin{itemize}
\item \textbf{Accuracy:} \[a = \dfrac{|\text{correct predictions}|}{|\text{predictions}|} = \dfrac{tp + tn}{tp + tn + fp + fn}\]
\item \textbf{Precision:} \[p = \dfrac{|\text{Waldo predicted as Waldo}|}{|\text{predicted as Waldo}|} = \dfrac{tp}{tp + fp}\]
\item \textbf{Recall:} \[r = \dfrac{|\text{Waldo predicted as Waldo}|}{|\text{actually Waldo}|} = \dfrac{tp}{tp + fn}\]
\item \textbf{F1 Measure:} \[f1 = \dfrac{2pr}{p + r}\] where \(tp\) is the number of true positives, \(tn\) is the number of true negatives, \(fp\) is the number of false positives, and \(fn\) is the number of false negatives.
\end{itemize}

Accuracy is a common performance metric in machine learning; however, in classification problems where the training data is heavily biased toward one category, a model can learn to optimize its accuracy simply by classifying all instances as the majority category. That is, the classifier will correctly classify all images that do not contain Waldo as not containing Waldo, but will also classify all images that do contain Waldo as not containing Waldo.
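This failure mode is easy to demonstrate numerically. The sketch below computes the four metrics for a hypothetical test set of 100 segments, of which only 5 contain Waldo, classified by a degenerate model that always predicts ``not Waldo'' (the counts are illustrative, not our actual results):

```python
# Degenerate "always not Waldo" classifier on an imbalanced test set:
# 95 true negatives, 5 false negatives, no positive predictions at all.
tp, tn, fp, fn = 0, 95, 0, 5

accuracy = (tp + tn) / (tp + tn + fp + fn)
# Precision and recall are undefined when their denominator is 0;
# we report 0.0 by convention in that case.
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(accuracy, precision, recall, f1)  # -> 0.95 0.0 0.0 0.0
```

Despite finding no Waldo at all, the model scores 95\% accuracy, while recall and F1 are both 0.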
Thus, we also use other metrics to measure performance. \emph{Precision} is the percentage of instances classified as Waldo that are actually Waldo. \emph{Recall} is the percentage of actual Waldos that were predicted as Waldo. In the case of a classifier that classifies everything as ``not Waldo'', the recall would be 0. The \emph{F1 measure} combines precision and recall in a way that heavily penalises classifiers that perform poorly in either precision or recall.
% Kelvin End

\section{Results} \label{sec:results}

\section{Conclusion} \label{sec:conclusion}

\bibliographystyle{alpha}
\bibliography{references}

\end{document}