
\documentclass[a4paper]{article}
% To compile PDF run: latexmk -pdf {filename}.tex

\usepackage{graphicx}				% Used to insert images into the paper
\graphicspath{ {} }
\usepackage{float}
\usepackage[justification=centering]{caption}	% Used for captions
\captionsetup[figure]{font=small}	% Makes captions small
\newcommand\tab[1][0.5cm]{\hspace*{#1}}		% Defines a new command to use 'tab' in text
% Math package
\usepackage{amsmath}
% Enable that parameters of \cref{}, \ref{}, \cite{}, ... are linked so that a reader can click on the number and jump to the target in the document
\usepackage{hyperref}
% Enable \cref{...} and \Cref{...} instead of \ref: type of reference included in the link (cleveref must be loaded after hyperref)
\usepackage[capitalise,nameinlink]{cleveref}
% UTF-8 encoding
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc} %support umlauts in the input
% Easier compilation
\usepackage{bookmark}
\usepackage{natbib}
% \usepackage{graphicx}

\begin{document}
	\title{Week 10 - Comparing Algorithms}
	\author{Kelvin Davis \and Jip J. Dekker\and Anthony Silvestere}
	\maketitle

	\section{Introduction}
	Much research involves comparing different algorithms. Why is one algorithm
	better than another? Programming courses generally teach that an algorithm is
	better if it executes faster, but this is not always true. The behaviour of
	each algorithm must be studied in relation to its input, and the comparison
	becomes more complicated when random values are involved. In this assignment
	we compare two algorithms for the ``Dawkins' weasel'' problem. Both
	algorithms are based on randomisation: the first is a simple hill climbing
	algorithm and the second is a genetic algorithm.

	\section{Hill Climbing and Genetic Algorithms}
	% Describe methods
	\tab
	The experiment compared the capability of two algorithms to generate a given target word from scratch. The first algorithm, the hill climbing approach, randomly ``guesses'' each character of the required word and fixes the correctly guessed characters in their respective places. The second approach uses a genetic algorithm, which generates words by ``breeding'' the most correct candidate words at each iteration.
	\\
	\par
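	As a rough illustration of the two approaches described above, the following Python sketch gives one possible reading of each. It is not the code used in the experiment: the 27-symbol alphabet, the function names, the population size, the number of parents, the mutation rate, and the choice to count one generation as one time step are all assumptions made for this example.

\begin{verbatim}
```python
import random
import string

# Assumed alphabet: 26 capital letters plus the space character.
ALPHABET = string.ascii_uppercase + " "


def check_target(target):
    """Reject targets that could never be matched with ALPHABET."""
    if any(ch not in ALPHABET for ch in target):
        raise ValueError("target contains symbols outside ALPHABET")


def hill_climb(target, rng):
    """Guess every unfixed position each step; keep correct guesses."""
    check_target(target)
    fixed = [False] * len(target)
    steps = 0
    while not all(fixed):
        steps += 1
        for i, wanted in enumerate(target):
            if not fixed[i] and rng.choice(ALPHABET) == wanted:
                fixed[i] = True
    return steps


def matches(word, target):
    """Number of positions at which word agrees with target."""
    return sum(a == b for a, b in zip(word, target))


def genetic(target, rng, pop_size=20, n_parents=5, mutation_rate=0.05):
    """Breed the best words each generation until target appears."""
    check_target(target)
    population = ["".join(rng.choice(ALPHABET) for _ in target)
                  for _ in range(pop_size)]
    generations = 0
    while target not in population:
        generations += 1
        # Selection: keep the words with the most correct letters.
        ranked = sorted(population, key=lambda w: matches(w, target),
                        reverse=True)
        parents = ranked[:n_parents]
        population = []
        for _ in range(pop_size):
            mother, father = rng.choice(parents), rng.choice(parents)
            # Uniform crossover followed by per-letter mutation.
            child = [rng.choice((a, b)) for a, b in zip(mother, father)]
            child = [rng.choice(ALPHABET) if rng.random() < mutation_rate
                     else c for c in child]
            population.append("".join(child))
    return generations
```
\end{verbatim}

	Passing a seeded generator, for instance \texttt{hill\_climb("WEASEL", random.Random(1))}, makes a run repeatable, which helps when the two functions are compared on the same target word.
	\par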
	The capability of each algorithm was assessed by measuring the number of time steps it required to produce words of varying fixed length. Because both algorithms have a stochastic component, each measurement was repeated to improve the precision of the results. Each algorithm was tasked with generating words of length 1, 2, 4, 8, and 16 letters, and each configuration was run ten times. The averages for both methods were used to construct the plot in Figure \ref{fig:plot1}.
	\\
	\begin{figure}[H]
		\includegraphics[scale=0.55]{chart}
		\centering
		\captionsetup{width=0.80\textwidth}
		\caption{Average number of time steps required by the hill climbing algorithm (red) and the genetic algorithm (blue) against the word length}
		\label{fig:plot1}
	\end{figure}
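	The measurement procedure behind Figure \ref{fig:plot1} (five word lengths, ten repetitions each, then averaging) can be sketched in Python as follows. The names \texttt{summarise\_runs} and \texttt{placeholder\_algorithm} are hypothetical, and the placeholder is only a stand-in that returns an arbitrary step count; it is not either of the algorithms studied here. The sketch also records the minimum and maximum, on the assumption that the spread of the runs is of interest alongside the mean.

\begin{verbatim}
```python
import random
import statistics


def summarise_runs(algorithm, lengths=(1, 2, 4, 8, 16), repeats=10, seed=0):
    """Run algorithm(length, rng) repeatedly and summarise the step counts."""
    rng = random.Random(seed)
    summary = {}
    for length in lengths:
        steps = [algorithm(length, rng) for _ in range(repeats)]
        summary[length] = {
            "mean": statistics.mean(steps),
            "min": min(steps),
            "max": max(steps),
        }
    return summary


def placeholder_algorithm(length, rng):
    """Hypothetical stand-in, NOT one of the two algorithms studied here."""
    return length + rng.randint(0, 3)


if __name__ == "__main__":
    for length, stats in summarise_runs(placeholder_algorithm).items():
        print(length, stats)
```
\end{verbatim}

	Any function that takes a word length and a random generator and returns a number of time steps could be passed in place of the placeholder.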
	\par
	Both algorithms appear to scale linearly with the word length (for words longer than 2 letters), and the genetic algorithm consistently requires many more time steps (approximately an order of magnitude) than the hill climbing algorithm to find the words. The spike in the genetic algorithm for words of length 2, as well as its overall relative performance, may stem from one of the central tenets of the genetic algorithm: replicating and propagating the correct or desirable features of the solutions at each step into the solutions of subsequent steps. For each iteration, this means favouring the recurrence of correct letters in the following iteration's solutions. This becomes problematic for generating words, as they are typically short (compared to the alphabet size) and do not often contain many repeated letters.
	\\
	\textbf{*** How great is the range of variation in the time taken to reach a perfect match? ***}
	\par
	In order to assess the \textit{rate} at which each method finds a word, the fitness (percentage of correct letters) of the hill climbing algorithm (Figure \ref{fig:fitness1}) and the genetic algorithm (Figure \ref{fig:fitness2}) was recorded at every iteration for a four-letter word. These plots indicate that, for a four-letter word, the fitness increases much faster (takes fewer iterations) for the hill climbing algorithm than for the genetic algorithm. Figure \ref{fig:fitness2} exhibits linear growth, while Figure \ref{fig:fitness1} presents much steeper, near-superlinear growth before finding the correct word. This may also be due to the aforementioned property of the genetic algorithm, the encouragement of repeated patterns and letters, which does not suit the structure of many words.
	\\
	\par
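	The fitness measure described above (the percentage of correct letters) can be written down directly. The following is a minimal Python sketch; the function name \texttt{fitness} and the example words are illustrative assumptions, not taken from the experiment's code.

\begin{verbatim}
```python
def fitness(candidate, target):
    """Percentage of positions at which candidate matches target."""
    if len(candidate) != len(target):
        raise ValueError("candidate and target must have the same length")
    if not target:
        raise ValueError("target must not be empty")
    correct = sum(c == t for c, t in zip(candidate, target))
    return 100.0 * correct / len(target)


print(fitness("WEAK", "WEAL"))  # 3 of 4 letters match: 75.0
```
\end{verbatim}

	For a four-letter word this measure can only take the values 0, 25, 50, 75, and 100, so a single run's fitness trace moves in discrete steps.
	\par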
	\textbf{*** Matching a given string is an artificial problem (we already know the answer). Based on your tests,
	what can you say about the ability of the two approaches for solving real problems? ***}
	% explores a larger solution space
	% answer is not already known in real world mostly
	% there are repeating patterns in nature
	% different alphabet
	\\
	\begin{figure}[H]
		\includegraphics[scale=0.55]{chart-1}
		\centering
		\captionsetup{width=0.80\textwidth}
		\caption{Five repeated measurements of the fitness of the hill climbing algorithm (as a percentage of the word) against the number of iterations taken}
		\label{fig:fitness1}
	\end{figure}

	\begin{figure}[H]
		\includegraphics[scale=0.55]{chart-2}
		\centering
		\captionsetup{width=0.80\textwidth}
		\caption{Five repeated measurements of the fitness of the genetic algorithm (as a percentage of the word) against the number of iterations taken}
		\label{fig:fitness2}
	\end{figure}

	\section{Conclusion}
	\textbf{*** Which algorithm performs better on this task? What is your evidence? What do you think makes
	 it perform better? ***}
	% Make sure Qs  are answered
\end{document}