
\documentclass[a4paper]{article}
% To compile PDF run: latexmk -pdf {filename}.tex

\usepackage{graphicx}				% Used to insert images into the paper
\graphicspath{ {} }
\usepackage{float}
\usepackage[justification=centering]{caption}	% Used for captions
\captionsetup[figure]{font=small}	% Makes captions small
\newcommand\tab[1][0.5cm]{\hspace*{#1}}		% Defines a new command to use 'tab' in text
% Math package
\usepackage{amsmath}
% Enable that parameters of \cref{}, \ref{}, \cite{}, ... are linked so that a reader can click on the number and jump to the target in the document
\usepackage{hyperref}
% Enable \cref{...} and \Cref{...} instead of \ref: type of reference included in the link (cleveref must be loaded after hyperref)
\usepackage[capitalise,nameinlink]{cleveref}
% UTF-8 encoding
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc} %support umlauts in the input
% Easier compilation
\usepackage{bookmark}
\usepackage{natbib}
% \usepackage{graphicx}

\begin{document}
	\title{Week 10 - Comparing Algorithms}
	\author{Kelvin Davis \and Jip J. Dekker\and Anthony Silvestere}
	\maketitle

	\section{Introduction}
	\tab
	Much research involves comparing algorithms. Why is one algorithm better than another? Programming courses generally teach that an algorithm is better if it executes faster, but this is not always true. The behaviour of each algorithm must be studied in relation to its input, and the comparison becomes more complicated still when random values are used. In this assignment we compare two algorithms for the ``Dawkins' weasel'' problem. Both algorithms are based on randomisation: the first is a simple hill climbing algorithm and the second is a genetic algorithm.
	\\
	\par
	In the hill climbing algorithm, the letters in a fixed-length string of characters are randomly generated, and any letters in the correct position are fixed in place for subsequent iterations (until the complete word is found). The genetic algorithm finds a character string by treating the alphabet of characters as the population, and choosing which parts of the string to propagate based on the fitness of the ``parent'' strings.
	\\
	\section{Hill Climbing and Genetic Algorithms}
	\tab
	The experiment compared the ability of two algorithms to generate words from scratch. The first algorithm, the hill climbing approach, randomly ``guesses'' each character of the required word and fixes the correctly guessed characters in place. The second approach, however, uses a genetic algorithm to generate the words by ``breeding'' the most correct words at each iteration.
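	\par
	As a rough illustration of the hill climbing procedure described above, suppose (as an assumption, not a detail stated in this report) that every unfixed position is redrawn independently and uniformly from an alphabet of $k$ characters at each iteration. A given position has then been fixed after $t$ draws with probability
	\begin{equation*}
		p(t) = 1 - \left(1 - \frac{1}{k}\right)^{t},
	\end{equation*}
	and a word of $n$ letters is complete after $t$ draws with probability $p(t)^{n}$. With the illustrative value $k = 26$, for instance, $p(10) = 1 - (25/26)^{10} \approx 0.32$. The symbols $k$, $n$, $t$, and $p$ are introduced here for illustration only.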
	\\
	\par
	Each algorithm was assessed by collecting repeated measurements of the number of time steps it required to produce words of varying fixed length. Because both algorithms have a stochastic component, repeated measurements were taken to improve the precision of the results. Each algorithm was tasked with generating words of length 1, 2, 4, 8, and 16 letters, and each measurement was repeated ten times. The average of the results for both methods was used to construct the plot given in Figure~\ref{fig:plot1}.
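	\par
	The averaging step can be written out as follows, where the notation is introduced here for illustration: $T_i(n)$ denotes the number of time steps in the $i$-th of the ten runs for a word of length $n$. The plotted value is the sample mean
	\begin{equation*}
		\bar{T}(n) = \frac{1}{10} \sum_{i=1}^{10} T_i(n),
	\end{equation*}
	and, assuming the runs are independent, its standard error is $s(n)/\sqrt{10}$, where $s(n)$ is the sample standard deviation of the ten runs. This is the sense in which repetition improves precision: ten runs reduce the uncertainty in the mean by a factor of $\sqrt{10} \approx 3.2$ relative to a single run.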
	\\
	\begin{figure}[H]
		\includegraphics[scale=0.55]{chart}
		\centering
		\captionsetup{width=0.80\textwidth}
		\caption{Average number of time steps required by the hill climbing (red) and genetic (blue) algorithms against the word length}
		\label{fig:plot1}
	\end{figure}
	\par
	Both algorithms appear to scale linearly with the word length (beyond words of length 2), with the genetic algorithm consistently requiring many more time steps than the hill climbing algorithm to generate correct words. After fitting each set of data to a linear model, the hill climbing algorithm exhibits a scaling factor (gradient) of 5.57 time steps per additional letter, and the genetic algorithm a scaling factor of 31.6. The genetic algorithm therefore appears to scale at approximately 5.7 times the rate of the hill climbing algorithm.
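	\par
	Written out, the fitted models take the form below, where $n$ is the word length and $T(n)$ the average number of time steps; the intercepts $b_{\mathrm{HC}}$ and $b_{\mathrm{GA}}$ are placeholders, since their fitted values are not reported here:
	\begin{align*}
		T_{\mathrm{HC}}(n) &\approx 5.57\,n + b_{\mathrm{HC}}, \\
		T_{\mathrm{GA}}(n) &\approx 31.6\,n + b_{\mathrm{GA}}.
	\end{align*}
	The ratio of the two gradients is $31.6 / 5.57 \approx 5.7$.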
	\\
	\par
	The spike in the genetic algorithm for words of length 2, as well as its overall relative performance, may stem from one of the central tenets of the genetic algorithm: replicating and propagating the correct or desirable features of solutions at each step into the solutions of subsequent steps. For each iteration, this means favouring the recurrence of correct letters in the following iteration's solutions. This becomes problematic for generating words, as they are typically short (compared to the alphabet size) and do not often contain many repeated letters.
	\\
	\par
	To further explore the \textit{rate} at which each method finds a word, the fitness (percentage of correct letters) of the hill climbing algorithm (Figure~\ref{fig:fitness1}) and the genetic algorithm (Figure~\ref{fig:fitness2}) was recorded at every iteration for a four-letter word. These plots indicate that, for a four-letter word, the fitness increases much faster (takes fewer iterations) for the hill climbing algorithm than for the genetic algorithm. Figure~\ref{fig:fitness2} exhibits linear growth, while Figure~\ref{fig:fitness1} shows much steeper (near superlinear) growth before the correct word is found. This may also be due to the aforementioned property of the genetic algorithm, the encouragement of repeated patterns and letters, which is at odds with the structure of many words.
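	\par
	One way to make the fitness measure precise, using notation introduced here for illustration, is to write $s$ for a candidate string and $w$ for the target word, both of length $n$:
	\begin{equation*}
		f(s) = \frac{100}{n} \sum_{i=1}^{n} [\, s_i = w_i \,],
	\end{equation*}
	where the bracket equals $1$ if the $i$-th letters agree and $0$ otherwise. For a four-letter word, a candidate with three correct letters has $f(s) = 75$, and the word is found when $f(s) = 100$.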
	\\
	\begin{figure}[H]
		\includegraphics[scale=0.55]{chart-1}
		\centering
		\captionsetup{width=0.80\textwidth}
		\caption{Five repeated measurements of the fitness of the hill climbing algorithm (as a percentage of the word) against the number of iterations taken}
		\label{fig:fitness1}
	\end{figure}

	\begin{figure}[H]
		\includegraphics[scale=0.55]{chart-2}
		\centering
		\captionsetup{width=0.80\textwidth}
		\caption{Five repeated measurements of the fitness of the genetic algorithm (as a percentage of the word) against the number of iterations taken}
		\label{fig:fitness2}
	\end{figure}

	\section{Conclusion}
	This report compares the hill climbing algorithm and the genetic algorithm for the task of generating words from scratch. The number of time steps required to generate words of varying length was repeatedly measured (to gain a more precise result) and graphed. The hill climbing approach was found to be better suited to the task: the number of time steps it required to match a string scaled linearly at a much lower rate than that of the genetic algorithm.
	\\
	\par
	The encouragement of letter repetition seemed to be a limiting factor for the genetic algorithm. However, this can be interpreted as a result of the artificial conditions imposed during this investigation: that each letter in the constructed strings must be drawn from the complete English alphabet, and that the correct string is known. In the more general case of string finding problems, neither condition need hold. For example, in DNA sequencing for biological problems, the alphabet that comprises a DNA sequence consists of only four letters (G, T, C, and A), and letters (as well as sequences of letters) often repeat. For this purpose, the genetic algorithm may outperform the hill climbing approach.
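	\par
	The effect of alphabet size can be made concrete with a short calculation, in which $k$ (the alphabet size) and $n$ (the string length) are symbols introduced here for illustration. The number of distinct strings is $k^{n}$, so for $n = 4$:
	\begin{equation*}
		26^{4} = 456\,976 \qquad \text{compared with} \qquad 4^{4} = 256.
	\end{equation*}
	A four-letter alphabet therefore gives a far smaller space of candidate strings, in which repeated letters are correspondingly more common.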
	\\
\end{document}