\documentclass[a4paper]{article}
% To compile PDF run: latexmk -pdf (unknown).tex
\usepackage{graphicx} % Used to insert images into the paper
\usepackage{float}
\usepackage[justification=centering]{caption} % Used for captions
\captionsetup[figure]{font=small} % Makes figure captions small
\newcommand\tab[1][0.5cm]{\hspace*{#1}} % Defines a new command to use 'tab' in text

% Math package
\usepackage{amsmath}
% Make the parameters of \cref{}, \ref{}, \cite{}, ... clickable links, so that a reader can click on the number and jump to the target in the document
\usepackage{hyperref}
% Enable \cref{...} and \Cref{...} instead of \ref: the type of reference is included in the link
\usepackage[capitalise,nameinlink]{cleveref}
% UTF-8 encoding
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc} % Support umlauts in the input
% Easier compilation
\usepackage{bookmark}
\usepackage{natbib}

\begin{document}

\title{Waldo Discovery Using Neural Networks}
\author{Kelvin Davis \and Jip J. Dekker \and Anthony Silvestere}
\maketitle

\begin{abstract}
\end{abstract}

\section{Introduction}

\section{Background}
This work builds on standard supervised machine learning classification techniques \cite{Kotsiantis2007}.

\section{Methods}

% Kelvin Start
\subsection{Benchmarking}\label{benchmarking}

To benchmark the neural networks, their performance is evaluated against other machine learning algorithms. We use Support Vector Machine, K-Nearest Neighbours (\(K = 5\)), Gaussian Naive Bayes, and Random Forest classifiers, as provided by Scikit-Learn.

\subsection{Performance Metrics}\label{performance-metrics}

To evaluate the performance of the models, we record the time taken by each model to train on the training data, together with statistics about the predictions the models make on the test data.
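The metrics described below can be sketched in a few lines of Python. The function and variable names here (\texttt{evaluate}, \texttt{y\_true}, \texttt{y\_pred}) are illustrative rather than part of our implementation; in practice Scikit-Learn's built-in scorers serve the same purpose, but the explicit version makes the definitions concrete:

```python
def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from two label sequences.

    The formulas mirror the definitions in the text: tp, tn, fp and fn
    are the entries of the binary confusion matrix for the positive class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# A degenerate classifier that predicts "not Waldo" everywhere scores well on
# accuracy for imbalanced data, yet its recall (and hence F1) is zero.
y_true = [1, 0, 0, 0, 0, 0, 0, 0]  # 1 = contains Waldo; heavily imbalanced
y_pred = [0] * 8                   # always predicts "not Waldo"
print(evaluate(y_true, y_pred))    # accuracy 0.875, recall 0.0, f1 0.0
```

This illustrates exactly why accuracy alone is insufficient for our imbalanced data: the do-nothing classifier reaches 87.5\% accuracy on this toy sample while never finding Waldo.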
These prediction statistics include:
\begin{itemize}
\tightlist
\item \textbf{Accuracy:}
  \[a = \dfrac{|\text{correct predictions}|}{|\text{predictions}|} = \dfrac{tp + tn}{tp + tn + fp + fn}\]
\item \textbf{Precision:}
  \[p = \dfrac{|\text{Waldo predicted as Waldo}|}{|\text{predicted as Waldo}|} = \dfrac{tp}{tp + fp}\]
\item \textbf{Recall:}
  \[r = \dfrac{|\text{Waldo predicted as Waldo}|}{|\text{actually Waldo}|} = \dfrac{tp}{tp + fn}\]
\item \textbf{F1 Measure:}
  \[f_1 = \dfrac{2pr}{p + r}\]
  where \(tp\) is the number of true positives, \(tn\) the number of true negatives, \(fp\) the number of false positives, and \(fn\) the number of false negatives.
\end{itemize}

Accuracy is a common performance metric in machine learning; however, in classification problems where the training data is heavily biased toward one category, a model may learn to optimise its accuracy by assigning every instance to that category. That is, the classifier will correctly classify all images that do not contain Waldo as not containing Waldo, but will also classify all images that do contain Waldo as not containing Waldo. Thus we use other metrics to measure performance as well. \emph{Precision} is the percentage of instances classified as Waldo that actually are Waldo. \emph{Recall} is the percentage of actual Waldos that are predicted as Waldo; for a classifier that never predicts Waldo, the recall would be 0. The \emph{F1 measure} combines precision and recall in a way that heavily penalises classifiers that perform poorly in either one.
% Kelvin End

\section{Results}

\section{Discussion and Conclusion}

\bibliographystyle{humannat}
\bibliography{references}

\end{document}