
Merge remote-tracking branch 'origin/master'

Jip J. Dekker 2018-05-25 12:14:13 +10:00
commit 7d635447d1
3 changed files with 175 additions and 49 deletions


@@ -1,3 +1,10 @@
@misc{openData,
title={Open Database License (ODbL) v1.0},
url={https://opendatacommons.org/licenses/odbl/1.0/},
journal={Open Data Commons},
year={2018},
month={Feb}
}
@techreport{knn,
title={Discriminatory analysis-nonparametric discrimination: consistency properties},
author={Fix, Evelyn and Hodges Jr, Joseph L},


@@ -6,8 +6,8 @@
\usepackage[justification=centering]{caption} % Used for captions
\captionsetup[figure]{font=small} % Makes captions small
\newcommand\tab[1][0.5cm]{\hspace*{#1}} % Defines a new command to use 'tab' in text
-% Math package
-\usepackage{amsmath}
\usepackage[comma, numbers]{natbib} % Used for the bibliography
\usepackage{amsmath} % Math package
% Make the arguments of \cref{}, \ref{}, \cite{}, ... clickable links, so that a reader can click on the number and jump to the target in the document
\usepackage{hyperref}
% Enable \cref{...} and \Cref{...} instead of \ref: the type of reference is included in the link
@@ -99,6 +99,16 @@
architectures, as this method is currently the most widely used for image
classification.
\textbf{
\\A couple of papers that may be useful (if needed):
- LeNet: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
- AlexNet: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks
- General comparison of LeNet and AlexNet:
"On the Performance of GoogLeNet and AlexNet Applied to Sketches", Pedro Ballester and Ricardo Matsumura Araujo
- Deep NN Architecture:
https://www-sciencedirect-com.ezproxy.lib.monash.edu.au/science/article/pii/S0925231216315533
}
\subsection{Classical Machine Learning Methods}
The following paragraphs will give only brief descriptions of the different
@@ -130,6 +140,12 @@
\subsection{Neural Network Architectures}
\todo{Did we only do the three in the end? (Alexnet?)}
We first implemented the LeNet architecture, then improved on it with a fairly standard convolutional neural network (CNN) that is deeper, extracts more features, and condenses the image information further. We then implemented a more fully convolutional network (FCN), which contains only one dense layer, used for the final binary classification step. The FCN adds an extra convolutional layer, so that before classifying each image it abstracts the data more than the other two networks.
\begin{itemize}
\item LeNet
\item CNN
\item FCN
\end{itemize}
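To make "condensing the image information" concrete, the short calculation below traces the spatial size of a 64$\times$64 input through the LeNet and FCN layer stacks defined in this commit's Python changes. It is a sketch that assumes stride-1 convolutions, 2$\times$2 max pooling, and that Keras treats the (3, 64, 64) input as channels-first so each feature map starts at 64$\times$64; the helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel, padding):
    """Spatial size after a stride-1 convolution: 'same' keeps it, 'valid' shrinks it."""
    return size if padding == 'same' else size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (remainders are dropped)."""
    return size // pool


# LeNet: 5x5 'valid' conv -> 2x2 pool -> 5x5 'valid' conv -> 2x2 pool -> flatten
s = 64
s = pool_out(conv_out(s, 5, 'valid'))  # 64 -> 60 -> 30
s = pool_out(conv_out(s, 5, 'valid'))  # 30 -> 26 -> 13
lenet_flat = 16 * s * s                # 16 filters in the second conv layer

# FCN: three (3x3 'same' conv -> 2x2 pool) stages, then a 2x2 'same' conv -> flatten
s = 64
for _ in range(3):
    s = pool_out(conv_out(s, 3, 'same'))  # 64 -> 32 -> 16 -> 8
s = conv_out(s, 2, 'same')                # stays 8
fcn_flat = 64 * s * s                     # 64 filters in conv4

print(lenet_flat, fcn_flat)  # 2704 4096
```

Under these assumptions LeNet hands 2,704 values to its two hidden dense layers, while the FCN flattens 4,096 values from its last 8$\times$8$\times$64 convolutional output straight into the final classification layer.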
\paragraph{Convolutional Neural Networks}
@@ -139,6 +155,36 @@
\section{Method} \label{sec:method}
\tab
In order to effectively utilise the aforementioned modelling and classification techniques, a key consideration is the data they are acting on.
A dataset containing Waldo and non-Waldo images was obtained from an open database\footnote{``The Open Database License (ODbL) is a license agreement intended to allow users to freely share, modify, and use [a] Database while maintaining [the] same freedom for others''\cite{openData}} hosted on Kaggle, a predictive modelling and analytics competition platform.
The distinction between images that contain Waldo and those that do not was given only by their placement in separate sub-directories.
It was therefore necessary to preprocess these images before they could be utilised by the proposed machine learning algorithms.
\subsection{Image Processing}
\tab
The Waldo image database consists of images of size 64$\times$64, 128$\times$128, and 256$\times$256 pixels, obtained by dividing complete Where's Waldo? puzzles.
Within each set of images, those containing Waldo are located in a folder called `waldo', and those not containing Waldo in a folder called `not\_waldo'.
Since Where's Waldo? puzzles are usually densely populated and contain fine details, the 64$\times$64 pixel set of images was selected to train and evaluate the machine learning models.
This set has the added benefit of containing the largest number of individual images of the three size groups.
\\
\par
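The folder layout described above is enough to derive the labels. A minimal Python sketch, assuming the 64$\times$64 images sit directly inside `waldo` and `not_waldo` sub-folders of a single directory; `list_labelled_files` is a hypothetical helper, not the project's own loader.

```python
import tempfile
from pathlib import Path


def list_labelled_files(root):
    """Collect (path, label) pairs: 1 for files in waldo/, 0 for files in not_waldo/."""
    root = Path(root)
    pairs = []
    for folder, label in (('waldo', 1), ('not_waldo', 0)):
        for path in sorted((root / folder).iterdir()):
            if path.is_file():
                pairs.append((path, label))
    return pairs


# Usage: build a tiny throwaway directory tree with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ('waldo/a.jpg', 'not_waldo/b.jpg', 'not_waldo/c.jpg'):
        p = Path(tmp, name)
        p.parent.mkdir(parents=True, exist_ok=True)
        p.touch()
    pairs = list_labelled_files(tmp)
    print([(p.name, label) for p, label in pairs])  # [('a.jpg', 1), ('b.jpg', 0), ('c.jpg', 0)]
```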
Each of the 64$\times$64 pixel images was inserted into a NumPy\footnote{NumPy is a popular Python library for scientific computing.}
array of images, and a binary value was inserted into a separate list at the same index.
These binary values form the labels for each image (Waldo or not Waldo).
Colour normalisation was performed on each image so that variations in an image's colour profile reflect meaningful features of the image rather than the photographic method.
\\
\par
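One way to build the image array, the parallel label list, and a colour normalisation step is sketched below with NumPy. The report does not say which normalisation was used, so per-image, per-channel standardisation (zero mean, unit variance) is an assumption, and `build_dataset` and `normalise_colour` are hypothetical names; the images are assumed to be already decoded into uint8 arrays of shape (64, 64, 3).

```python
import numpy as np


def build_dataset(waldo_imgs, not_waldo_imgs):
    """Stack images into one float array with a parallel array of 0/1 labels (1 = Waldo)."""
    waldo_imgs, not_waldo_imgs = list(waldo_imgs), list(not_waldo_imgs)
    images = np.stack(waldo_imgs + not_waldo_imgs).astype(np.float32)
    labels = np.array([1] * len(waldo_imgs) + [0] * len(not_waldo_imgs))
    return images, labels


def normalise_colour(images, eps=1e-8):
    """Standardise each colour channel of each image to zero mean and unit variance."""
    mean = images.mean(axis=(1, 2), keepdims=True)  # shape (N, 1, 1, 3)
    std = images.std(axis=(1, 2), keepdims=True)
    return (images - mean) / (std + eps)            # eps avoids division by zero


# Usage with random stand-in images
rng = np.random.default_rng(0)
waldo = [rng.integers(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(2)]
not_waldo = [rng.integers(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(5)]
X, y = build_dataset(waldo, not_waldo)
X = normalise_colour(X)
print(X.shape, y.tolist())  # (7, 64, 64, 3) [1, 1, 0, 0, 0, 0, 0]
```

Because the Keras models take inputs of shape (3, 64, 64), the array would also need its channel axis moved to the front, for example with `np.transpose(X, (0, 3, 1, 2))`.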
Each original puzzle is broken down into many images but contains only one Waldo. Although Waldo may span several 64$\times$64 pixel squares, the non-Waldo images therefore far outnumber the Waldo images.
To counter the bias introduced by this skewed data, every Waldo image was artificially augmented with random rotations, reflections, and random noise to produce new images.
In this way, each original Waldo image was used to produce 10 additional variations, which were inserted into the image array.
This provides more variation in the true positives of the data set and helps develop more robust methods by exposing each technique to variations of the image during training.
\\
\par
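The augmentation step could look like the following NumPy sketch. The report does not give the exact transforms, so rotations by multiples of 90 degrees, a random left-right flip, and Gaussian noise with a standard deviation of 8 grey levels are assumptions, as is the function name `augment`.

```python
import numpy as np


def augment(image, n_variants=10, noise_std=8.0, rng=None):
    """Return n_variants randomly rotated, reflected and noisy copies of one uint8 image."""
    rng = np.random.default_rng() if rng is None else rng
    variants = []
    for _ in range(n_variants):
        out = np.rot90(image, k=int(rng.integers(4)), axes=(0, 1))  # rotate by 0, 90, 180 or 270 degrees
        if rng.random() < 0.5:
            out = np.fliplr(out)  # reflect left-right half of the time
        noisy = out.astype(np.float32) + rng.normal(0.0, noise_std, size=out.shape)  # additive Gaussian noise
        variants.append(np.clip(noisy, 0, 255).astype(np.uint8))  # back to the valid pixel range
    return np.stack(variants)


rng = np.random.default_rng(42)
waldo_img = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)  # stand-in for a real Waldo crop
extra = augment(waldo_img, rng=rng)
print(extra.shape, extra.dtype)  # (10, 64, 64, 3) uint8
```

Restricting rotations to multiples of 90 degrees keeps every variant a full 64$\times$64 square without interpolation or empty corners.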
Despite the additional data, there were still over ten times as many non-Waldo images as Waldo images.
Therefore, it was necessary to cull the non-Waldo data so that there was an even split of Waldo and non-Waldo images, improving the representation of true positives in the image data set.
\\
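Culling the non-Waldo images to an even split amounts to random undersampling of the majority class. A sketch, assuming labels of 1 for Waldo and 0 for non-Waldo and that the kept non-Waldo images are chosen uniformly at random (the report does not say how they were selected); `balance_by_undersampling` is a hypothetical name.

```python
import numpy as np


def balance_by_undersampling(images, labels, rng=None):
    """Randomly drop label-0 samples so both classes end up the same size."""
    rng = np.random.default_rng() if rng is None else rng
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    # Assumes the non-Waldo class is the larger one, as in the report
    keep_neg = rng.choice(neg, size=len(pos), replace=False)
    keep = rng.permutation(np.concatenate([pos, keep_neg]))  # shuffle so the classes are interleaved
    return images[keep], labels[keep]


labels = np.array([1] * 3 + [0] * 40)
images = np.arange(len(labels))  # stand-in: one "image" per index
X_bal, y_bal = balance_by_undersampling(images, labels, np.random.default_rng(1))
print(np.bincount(y_bal))  # [3 3]
```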
% Kelvin Start
\subsection{Benchmarking}\label{benchmarking}
@@ -193,7 +239,11 @@
\section{Conclusion} \label{sec:conclusion}
\clearpage % Ensures that the references are on a separate page
\pagebreak
% References
\section{References}
\renewcommand{\refname}{}
\bibliographystyle{alpha}
\bibliography{references}
\end{document}


@@ -25,7 +25,7 @@ from keras.utils import to_categorical
'''
Model definition for a standard convolutional neural network (CNN) structure
'''
-def FCN():
def CNN():
    ## List of model layers
    inputs = Input((3, 64, 64))
@@ -33,7 +33,6 @@ def FCN():
    m_pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Conv2D(32, (3, 3), activation='relu', padding='same')(m_pool1)
-    #drop1 = Dropout(0.2)(conv2) # Drop some portion of features to prevent overfitting
    m_pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
    conv3 = Conv2D(32, (3, 3), activation='relu', padding='same')(m_pool2)
@@ -47,13 +46,81 @@ def FCN():
    drop3 = Dropout(0.2)(dense)
    classif = Dense(2, activation='sigmoid')(drop3) # Final layer to classify
-    ## Define the model structure
    ## Define the model start and end
    model = Model(inputs=inputs, outputs=classif)
    # Adam optimiser with its default parameters
    model.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy', f1])
    return model
'''
Model definition for a fully convolutional network (FCN) structure: only the final classification layer is dense
'''
def FCN():
    ## List of model layers
    inputs = Input((3, 64, 64))
    conv1 = Conv2D(16, (3, 3), activation='relu', padding='same', input_shape=(64, 64, 3))(inputs)
    m_pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Conv2D(32, (3, 3), activation='relu', padding='same')(m_pool1)
    m_pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
    conv3 = Conv2D(32, (3, 3), activation='relu', padding='same')(m_pool2)
    drop2 = Dropout(0.2)(conv3) # Drop some portion of features to prevent overfitting
    m_pool3 = MaxPooling2D(pool_size=(2, 2))(drop2)
    conv4 = Conv2D(64, (2, 2), activation='relu', padding='same')(m_pool3)
    flat = Flatten()(conv4) # Makes data 1D
    drop3 = Dropout(0.2)(flat)
    classif = Dense(2, activation='sigmoid')(drop3) # Final layer to classify
    ## Define the model start and end
    model = Model(inputs=inputs, outputs=classif)
    # Adam optimiser with its default parameters
    model.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy', f1])
    return model
'''
Model definition for the network structure of LeNet
Note: LeNet was designed to classify into 10 classes, but we are only performing binary classification
'''
def LeNet():
    ## List of model layers
    inputs = Input((3, 64, 64))
    conv1 = Conv2D(6, (5, 5), activation='relu', padding='valid', input_shape=(64, 64, 3))(inputs)
    m_pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Conv2D(16, (5, 5), activation='relu', padding='valid')(m_pool1)
    m_pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
    flat = Flatten()(m_pool2) # Makes data 1D
    dense1 = Dense(120, activation='relu')(flat) # Fully connected layer
    dense2 = Dense(84, activation='relu')(dense1) # Fully connected layer
    drop3 = Dropout(0.2)(dense2)
    classif = Dense(2, activation='sigmoid')(drop3) # Final layer to classify
    ## Define the model start and end
    model = Model(inputs=inputs, outputs=classif)
    model.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy', f1])
    return model
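'''
AlexNet architecture (not yet implemented)
'''
def AlexNet():
    inputs = Input(shape=(3, 64, 64))
    # No layers are defined yet, so there is no model to return; fail explicitly instead of with a NameError
    raise NotImplementedError('AlexNet has not been implemented yet')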
def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        """Recall metric.
@@ -110,14 +177,16 @@ lbl_train = to_categorical(lbl_train) # One hot encoding the labels
lbl_test = to_categorical(lbl_test)
## Define model
#model = CNN()
model = FCN()
#model = LeNet()
# svm_iclf = ImageClassifier(svm.SVC)
# tree_iclf = ImageClassifier(tree.DecisionTreeClassifier)
# naive_bayes_iclf = ImageClassifier(naive_bayes.GaussianNB)
# ensemble_iclf = ImageClassifier(ensemble.RandomForestClassifier)
## Define training parameters
-epochs = 10 # an epoch is one forward pass and back propagation of all training data
epochs = 25 # an epoch is one forward pass and back propagation of all training data
batch_size = 150 # batch size - number of training examples used in one forward/backward pass
# (higher batch size uses more memory, smaller batch size takes more time)
#lrate = 0.01 # Learning rate of the model - controls magnitude of weight changes in training the NN
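The `f1` metric passed to `model.compile` is only partly visible in this diff (its nested `recall` helper). For reference, F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R); the NumPy sketch below computes it for binary 0/1 labels. It is not the project's Keras-backend implementation, and the small `eps` guarding against division by zero is an assumption.

```python
import numpy as np


def f1_score(y_true, y_pred, eps=1e-7):
    """F1 = 2PR / (P + R) for binary 0/1 labels; eps avoids division by zero."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))    # true positives
    precision = tp / (np.sum(y_pred == 1) + eps)  # share of predicted Waldos that are real
    recall = tp / (np.sum(y_true == 1) + eps)     # share of real Waldos that were found
    return 2 * precision * recall / (precision + recall + eps)


print(round(float(f1_score([1, 1, 0, 0], [1, 0, 1, 0])), 3))  # 0.5
```

On a balanced data set accuracy alone can look reasonable while one class is missed, so tracking F1 alongside accuracy gives a more direct view of how well Waldo images are detected.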