Cut other stats
parent d57c73be04
commit ad057de020
@@ -252,41 +252,12 @@
 \subsection{Performance Metrics}\label{performance-metrics}
 
 To evaluate the performance of the models, we record the time taken by
-each model to train, based on the training data, and statistics about the
-predictions the models make on the test data. These prediction
-statistics include:
-
-\begin{itemize}
-\item
-\textbf{Accuracy:}
-\[a = \dfrac{|correct\ predictions|}{|predictions|} = \dfrac{tp + tn}{tp + tn + fp + fn}\]
-\item
-\textbf{Precision:}
-\[p = \dfrac{|Waldo\ predicted\ as\ Waldo|}{|predicted\ as\ Waldo|} = \dfrac{tp}{tp + fp}\]
-\item
-\textbf{Recall:}
-\[r = \dfrac{|Waldo\ predicted\ as\ Waldo|}{|actually\ Waldo|} = \dfrac{tp}{tp + fn}\]
-\item
-\textbf{F1 Measure:}
-\[f1 = \dfrac{2pr}{p + r}\]
-where \(tp\) is the number of true positives, \(tn\) is the number of
-true negatives, \(fp\) is the number of false positives, and \(fn\) is
-the number of false negatives.
-\end{itemize}
-
-\emph{Accuracy} is a common performance metric used in Machine Learning;
-however, in classification problems where the training data is heavily biased
-toward one category, a model will sometimes learn to optimize its accuracy
-by classifying all instances as that category. That is, the classifier will
-classify all images that do not contain Waldo as not containing Waldo, but
-will also classify all images containing Waldo as not containing Waldo. Thus
-we also use other metrics to measure performance. \\
-
-\emph{Precision} returns the percentage of images classified as Waldo that
-actually contain Waldo. \emph{Recall} returns the percentage of Waldos that
-were actually predicted as Waldo. In the case of a classifier that classifies
-all images as not containing Waldo, the recall would be 0. The
-\emph{F1-Measure} combines precision and recall in a way that heavily
-penalizes classifiers that perform poorly in either precision or recall.
+each model to train, based on the training data, and the accuracy with which
+the model makes predictions. We calculate accuracy as
+\(a = \frac{|correct\ predictions|}{|predictions|} = \frac{tp + tn}{tp + tn + fp + fn}\),
+where \(tp\) is the number of true positives, \(tn\) is the number of true
+negatives, \(fp\) is the number of false positives, and \(fn\) is the number
+of false negatives.
 
 \section{Results} \label{sec:results}
 
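As a reading aid for the metrics in the hunk above, here is a minimal Python sketch of how accuracy, precision, recall, and F1 could be computed from confusion-matrix counts. It is not part of the paper or of this commit; the function names and the toy counts are assumptions made for illustration. The small demo at the end reproduces the point the removed paragraphs make: on heavily imbalanced data, a classifier that predicts "not Waldo" for every image still scores high accuracy, while precision, recall, and F1 drop to zero.

# Illustrative sketch only (not from the paper's code base).

def accuracy(tp, tn, fp, fn):
    # a = (tp + tn) / (tp + tn + fp + fn)
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    # p = tp / (tp + fp): fraction of images predicted as Waldo that are actually Waldo
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    # r = tp / (tp + fn): fraction of actual Waldos predicted as Waldo
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1(p, r):
    # f1 = 2pr / (p + r): harmonic mean of precision and recall
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Degenerate classifier on imbalanced data: 990 non-Waldo images, 10 Waldo
# images, and every image predicted as "not Waldo" (toy counts, assumed).
tp, tn, fp, fn = 0, 990, 0, 10
p, r = precision(tp, fp), recall(tp, fn)
print(accuracy(tp, tn, fp, fn))  # 0.99 -- high accuracy despite learning nothing
print(p, r, f1(p, r))            # 0.0 0.0 0.0 -- precision, recall, and F1 expose it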
@@ -323,6 +294,10 @@
 \label{tab:results}
 \end{table}
 
+We can see from the results that Deep Neural Networks outperform our benchmark
+classification models, although the time required to train these networks is
+significantly greater.
+
 \section{Conclusion} \label{sec:conclusion}
 
 Images from the ``Where's Waldo?'' puzzle books are ideal images to test