\documentclass[a4paper]{article}
% To compile PDF run: latexmk -pdf {filename}.tex

% Math package
\usepackage{amsmath}
% UTF-8 encoding
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc} % support umlauts in the input
% Link the arguments of \cref{}, \ref{}, \cite{}, ... so that a reader can click on the number and jump to the target in the document
\usepackage{hyperref}
% Easier compilation
\usepackage{bookmark}
% Enable \cref{...} and \Cref{...} instead of \ref: the type of reference is included in the link
% (cleveref must be loaded after hyperref)
\usepackage[capitalise,nameinlink]{cleveref}

\begin{document}
\title{Week 9 - Correlation and Regression}
\author{
  Jai Bheeman \and Kelvin Davis \and Jip J. Dekker \and Nelson Frew \and Tony
  Silvestere
}
\maketitle

\section{Introduction} \label{sec:introduction}
\section{Method} \label{sec:method}
Provided with a set of 132 unique records of the top 200 male tennis players,
we sought to investigate the relationship between the players' heights and
their weights. To this end, we carried out basic correlation analyses of the
two variables using both Pearson's and Spearman's correlation coefficients.
Further, to understand the correlations more deeply, we carried out these
tests on the full population of cleaned data (e.g.\ with duplicates removed),
alongside several random samples and samples of ranking ranges within the top
200. For the computations we made use of Microsoft Excel tools and functions
of the Python library SciPy.
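
For reference, the two coefficients can be written as follows. This is a
sketch of the standard textbook definitions, not a record of the exact
computation performed by the tools; the symbols $x_i$ (height of player $i$),
$y_i$ (weight of player $i$), $\bar{x}$ and $\bar{y}$ (sample means), and $n$
(sample size) are notation assumed here for illustration. Pearson's
coefficient measures the strength of the linear relationship:
\begin{equation*}
  r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
           {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
            \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\end{equation*}
Spearman's coefficient $r_s$ is Pearson's coefficient applied to the ranks of
the observations instead of their raw values, with tied values sharing their
average rank. It therefore measures the strength of a monotonic relationship.
In the special case without ties it reduces to
\begin{equation*}
  r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)},
\end{equation*}
where $d_i$ is the difference between the rank of $x_i$ and the rank of $y_i$.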
\section{Results} \label{sec:results}
We performed separate statistical analyses on 10 different samples of the
population, as well as the population itself. These comprised 5 subsets
of the rankings (top 20 and 50, middle 20, bottom 20 and 50) and 5
randomly chosen samples of 20 players.

\Cref{tab:excel-results} shows the results of the conducted tests.

\begin{table}[ht]
  \centering
  \begin{tabular}{|l|r|r|}
    \hline
    \textbf{Test Set} & \textbf{Pearson's Coefficient} & \textbf{Spearman's Coefficient} \\
    \hline
    \textbf{Population} & 0.77953 & 0.73925 \\
    \textbf{Top 20} & 0.80743 & 0.80345 \\
    \textbf{Middle 20} & 0.54134 & 0.36565 \\
    \textbf{Bottom 20} & 0.84046 & 0.88172 \\
    \textbf{Top 50} & 0.80072 & 0.78979 \\
    \textbf{Bottom 50} & 0.84237 & 0.81355 \\
    \textbf{Random Set \#1} & 0.84243 & 0.80237 \\
    \textbf{Random Set \#2} & 0.56564 & 0.58714 \\
    \textbf{Random Set \#3} & 0.59223 & 0.63662 \\
    \textbf{Random Set \#4} & 0.65091 & 0.58471 \\
    \textbf{Random Set \#5} & 0.86203 & 0.77832 \\
    \hline
  \end{tabular}
  \caption{Pearson's and Spearman's correlation coefficients between height
  and weight for the population and for each sample. All values are rounded
  to 5 decimal places.}
  \label{tab:excel-results}
\end{table}
\section{Discussion} \label{sec:discussion}
The results generally indicate that there is a fairly strong positive
correlation between the height and weight of an individual tennis player
within the top 200 male players. The population shows a strong positive
correlation with both Pearson's and Spearman's correlation coefficients,
indicating that a relationship may exist. Our population samples are largely
consistent with this, with 6 separate samples having values above 0.6 with
both techniques. The sample taken from the middle 20 players, however, shows
a noticeably weaker correlation than the top 20 and bottom 20, which suggests
that the strength of the relationship may vary across the rankings. All five
random samples of 20 taken from the population nevertheless show a positive
correlation, with coefficients between roughly 0.57 and 0.86, which is
consistent in direction with the coefficients for the full population.

\end{document}