Cut other stats

Kelvin Davis 2018-05-25 17:42:03 +10:00
parent d57c73be04
commit ad057de020

@@ -252,41 +252,12 @@
 \subsection{Performance Metrics}\label{performance-metrics}
 To evaluate the performance of the models, we record the time taken by
-each model to train, based on the training data, and statistics about the
-predictions the models make on the test data. These prediction
-statistics include:
-
-\begin{itemize}
-\item
-  \textbf{Accuracy:}
-  \[a = \dfrac{|correct\ predictions|}{|predictions|} = \dfrac{tp + tn}{tp + tn + fp + fn}\]
-\item
-  \textbf{Precision:}
-  \[p = \dfrac{|Waldo\ predicted\ as\ Waldo|}{|predicted\ as\ Waldo|} = \dfrac{tp}{tp + fp}\]
-\item
-  \textbf{Recall:}
-  \[r = \dfrac{|Waldo\ predicted\ as\ Waldo|}{|actually\ Waldo|} = \dfrac{tp}{tp + fn}\]
-\item
-  \textbf{F1 Measure:}
-  \[f1 = \dfrac{2pr}{p + r}\]
-  where \(tp\) is the number of true positives, \(tn\) is the number of
-  true negatives, \(fp\) is the number of false positives, and \(fn\) is
-  the number of false negatives.
-\end{itemize}
-\emph{Accuracy} is a common performance metric in Machine Learning;
-however, in classification problems where the training data is heavily
-biased toward one category, a model will sometimes learn to optimize its
-accuracy by classifying all instances as that category: the classifier
-will classify all images that do not contain Waldo as not containing
-Waldo, but will also classify all images containing Waldo as not
-containing Waldo. Thus we use other metrics to measure performance as
-well. \\
-\emph{Precision} returns the percentage of instances classified as Waldo
-that are actually Waldo. \emph{Recall} returns the percentage of actual
-Waldos that were predicted as Waldo. For a classifier that classifies all
-images as not containing Waldo, the recall would be 0. The \emph{F1
-Measure} combines precision and recall in a way that heavily penalizes
-classifiers that perform poorly in either.
+each model to train, based on the training data, and the accuracy with
+which the model makes predictions. We calculate accuracy as
+\(a = \frac{|correct\ predictions|}{|predictions|} = \frac{tp + tn}{tp + tn + fp + fn}\),
+where \(tp\) is the number of true positives, \(tn\) is the number of
+true negatives, \(fp\) is the number of false positives, and \(fn\) is
+the number of false negatives.
 \section{Results} \label{sec:results}
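
For concreteness, all four metrics in the removed block can be computed from the confusion-matrix counts alone. The Python sketch below is illustrative only (it is not part of the commit; the function name and example counts are assumptions), and it reproduces the degenerate case described above, where a classifier that labels every image as not containing Waldo scores high accuracy but zero recall.

def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from binary confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1

# A classifier that labels all 1000 test images "not Waldo" on a test set
# containing 10 Waldos: accuracy looks strong, recall and F1 expose the failure.
print(metrics(tp=0, tn=990, fp=0, fn=10))  # (0.99, 0.0, 0.0, 0.0)
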
@@ -323,6 +294,10 @@
   \label{tab:results}
 \end{table}
+We can see from the results that Deep Neural Networks outperform our
+benchmark classification models, although the time required to train
+these networks is significantly greater.
+
 \section{Conclusion} \label{sec:conclusion}
 Images from the ``Where's Waldo?'' puzzle books are ideal images to test