\documentclass[a4paper]{article}
% To compile PDF run: latexmk -pdf {filename}.tex

% Math package
\usepackage{amsmath}
% Enable that parameters of \cref{}, \ref{}, \cite{}, ... are linked so that a reader can click on the number and jump to the target in the document
\usepackage{hyperref}
% Enable \cref{...} and \Cref{...} instead of \ref: type of reference included in the link (must be loaded after hyperref)
\usepackage[capitalise,nameinlink]{cleveref}
% UTF-8 encoding
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc} %support umlauts in the input
% Easier compilation
\usepackage{bookmark}
\usepackage{graphicx}

\begin{document}
	\title{Week 9 - Correlation and Regression}
	\author{
		Jai Bheeman \and Kelvin Davis \and Jip J. Dekker \and Nelson Frew \and Tony
		Silvestere
	}
	\maketitle

	\section{Introduction} \label{sec:introduction}
	We present a report on the relationship between the heights and weights of the
	top tennis players, as catalogued in the provided data. We use statistical
	analysis techniques to describe the characteristics of the data numerically
	and to see what trends the data set exhibits. We conclude the report with a
	brief discussion of the implications of the analysis and of the correlations
	that may exist.

	\section{Method} \label{sec:method}
	Provided with a set of 132 unique records of the top 200 male tennis players,
	we sought to investigate the relationship between the heights of individual
	players and their respective weights. To achieve this, we conducted basic
	statistical correlation analyses of the two variables using both Pearson's and
	Spearman's correlation coefficients. Further, to understand the correlations
	more deeply, we carried out these correlation tests on the full population of
	cleaned data (duplicates removed, etc.), alongside several random samples and
	samples of ranking ranges within the top 200. To this end, we made use of
	Microsoft Excel tools and functions of the Python library SciPy.

	We used these separate statistical analysis tools specifically to sanity-check
	our findings: we simply replicated the correlation tests in each software
	environment.
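
	For reference, the two coefficients can be written out explicitly. The
	following is a sketch of the standard textbook definitions, which we assume
	are the ones implemented by the tools above; the symbols $x_i$, $y_i$, $n$ and
	$R(\cdot)$ are introduced here for illustration only. Writing $x_i$ and $y_i$
	for the height and weight of player $i$ in a test set of $n$ players, with
	sample means $\bar{x}$ and $\bar{y}$, Pearson's coefficient is
	\begin{equation*}
		r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
		{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}.
	\end{equation*}
	Spearman's coefficient is Pearson's coefficient applied to the ranks $R(x_i)$
	and $R(y_i)$ rather than to the raw values, with tied values given the average
	of their ranks. Only when there are no ties does it reduce to the familiar
	shortcut
	\begin{equation*}
		r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}, \qquad d_i = R(x_i) - R(y_i).
	\end{equation*}
	Because Spearman's coefficient depends only on ranks, it measures how
	monotonic the relationship is, whereas Pearson's coefficient measures how
	linear it is.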

	\section{Results} \label{sec:results}
	We performed separate statistical analyses on 10 different samples of the
	population, as well as the population itself, giving 11 test sets in total.
	The 10 samples were the following subsets of the rankings:
	\begin{itemize}
		\item The top 20 entries
		\item The middle 20 entries
		\item The bottom 20 entries
		\item The top 50 entries
		\item The bottom 50 entries
		\item 5 randomly chosen sets of 20 entries
	\end{itemize}
\vspace{1em}
	Table~\ref{tab:excel_results} shows the results of the conducted tests.

	\begin{table}[ht]
		\centering
		\begin{tabular}{|l|r|r|}
		\hline
		\textbf{Test Set}       & \textbf{Pearson's Coefficient} & \textbf{Spearman's Coefficient} \\
		\hline
		\textbf{Full Population}     & 0.77953                        & 0.73925                         \\
		\textbf{Top 20}         & 0.80743                        & 0.80345                         \\
		\textbf{Middle 20}      & 0.54134                        & 0.36565                         \\
		\textbf{Bottom 20}      & 0.84046                        & 0.88172                         \\
		\textbf{Top 50}         & 0.80072                        & 0.78979                         \\
		\textbf{Bottom 50}      & 0.84237                        & 0.81355                         \\
		\textbf{Random Set \#1} & 0.84243                        & 0.80237                         \\
		\textbf{Random Set \#2} & 0.56564                        & 0.58714                         \\
		\textbf{Random Set \#3} & 0.59223                        & 0.63662                         \\
		\textbf{Random Set \#4} & 0.65091                        & 0.58471                         \\
		\textbf{Random Set \#5} & 0.86203                        & 0.77832                         \\
		\hline
		\end{tabular}
		\caption{Correlation coefficients between height and weight for the
		different test sets. All values are rounded to 5 decimal places.}
		\label{tab:excel_results}
	\end{table}

	\begin{figure}[ht]
		\centering
		\includegraphics[width=0.6\textwidth]{pearson.png}
		\includegraphics[width=0.6\textwidth]{spearman.png}
		\caption{The Pearson (top) and Spearman (bottom) correlation coefficients
		of the data set as computed by the Pandas Python library}
		\label{fig:scipy}
	\end{figure}

	\section{Discussion} \label{sec:discussion}
	The results generally indicate that there is a fairly strong positive
	correlation between the height and weight of an individual tennis player
	within the top 200 male players. The population shows a strong positive
	correlation under both Pearson's and Spearman's correlation coefficients,
	indicating that a relationship may exist. Our population samples are largely
	consistent with this, with 6 separate samples having values above 0.6 under
	both techniques. The sample taken from the middle 20 players, however, shows a
	noticeably weaker correlation than the top 20 and bottom 20, which provides
	some insight into where the most strongly correlated heights and weights fall
	within the rankings. All five random samples of 20 taken from the population
	nevertheless show a positive correlation, with coefficients between roughly
	0.57 and 0.86, which indicates a consistent trend through the population, in
	line with the coefficients for the full population.


\end{document}