\documentclass[a4paper]{article}
% To compile PDF run: latexmk -pdf {filename}.tex

% Math package
\usepackage{amsmath}
% UTF-8 encoding
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc} % support umlauts in the input
% Make the arguments of \cref{}, \ref{}, \cite{}, ... clickable, so that a reader can click on the number and jump to the target in the document
\usepackage{hyperref}
% Easier compilation
\usepackage{bookmark}
\usepackage{graphicx}
% Enable \cref{...} and \Cref{...} instead of \ref: the type of reference is included in the link (cleveref has to be loaded after hyperref)
\usepackage[capitalise,nameinlink]{cleveref}

\begin{document}

\title{Week 7 - Evidence and experiments}
\author{
  Jai Bheeman \and Kelvin Davis \and Jip J. Dekker \and Nelson Frew \and Tony Silvestere
}
\maketitle

\section{Introduction} \label{sec:introduction}

In this report we document a series of hypothesis tests on the provided data about high-ranking tennis players. The hypotheses concern a player's height and handedness in relation to their overall ranking. We first give an overview of how we address these questions, with visualisations and a description of our overall methodology. We then briefly discuss what we can infer from our statistical analysis.

\section{Method} \label{sec:method}

We test two hypotheses. The first is that tall players have an advantage over shorter players. The second is that left-handed players have an advantage over right-handed players. To build an intuition for how the data behaves with respect to these hypotheses, we created visual representations using the Matplotlib and Seaborn libraries. We then performed statistical tests to measure the effects.

\subsection{Visualisation} \label{subsec:visualisation}

\subsubsection{Effect of Height} \label{subsubsec:vheight}

We started with a scatter plot of the points earned by players against their heights, and were surprised to find a player recorded as approximately 18\,m tall. We found this somewhat contradictory to the currently held record of 2.72\,m. With this outlier removed, we can see a sufficient spread in height, points and ranking. We can also see a slight discrepancy in height between males and females. Because of this, we perform separate statistical tests on males and females so as to remove the effect of gender. We plot both points against height and height against ranking. The plot of height against ranking does not show an explicit relationship between the two variables; we nevertheless test this relation in \cref{sec:results}.

\subsubsection{Effect of Handedness} \label{subsubsec:vhand}

We use distribution plots from Seaborn to visualise the distributions of points earned by left-handed and right-handed players, overlaid on the same plot. The visualisation uses a kernel density estimate of the probability density function, derived from the provided sample. We also plot separate distributions for male and female players in case there are any noticeable differences between genders.
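
For reference, the kernel density estimate underlying such distribution plots has the following standard form. This is a sketch of the general technique, not of the exact Seaborn settings used for our figures: the symbols $x_1, \dots, x_n$ (the observed points), $K$ (the kernel) and $h$ (the bandwidth) are notation introduced here, and the Gaussian kernel is an assumption.
\begin{equation*}
  \hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),
  \qquad
  K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2} \quad \text{(assuming a Gaussian kernel)}.
\end{equation*}
A larger bandwidth $h$ gives a smoother curve, so small bumps in the plotted densities should not be over-interpreted.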

\begin{figure}[ht]
  \centering
  \includegraphics[width=\textwidth]{correlation.png}
  \caption{Correlation matrix of the numerical values in the dataset}
  \label{fig:correlation}
\end{figure}

\begin{figure}[ht]
  \centering
  \includegraphics[width=\textwidth]{outlier.png}
  \caption{Scatter plot of points against height with an outlier}
  \label{fig:outlier}
\end{figure}

\begin{figure}[ht]
  \centering
  \includegraphics[width=\textwidth]{pointheight.png}
  \caption{Scatter plot of points against height with the outlier removed}
  \label{fig:pointheight}
\end{figure}

\begin{figure}[ht]
  \centering
  \includegraphics[width=\textwidth]{heightrank.png}
  \caption{Scatter plot of height against rank}
  \label{fig:heightrank}
\end{figure}

\begin{figure}[ht]
  \centering
  \includegraphics[width=\textwidth]{handdistr.png}
  \caption{Distribution plots of points separated by handedness}
  \label{fig:distr}
\end{figure}

\begin{figure}[ht]
  \centering
  \includegraphics[width=\textwidth]{handdistr_gender.png}
  \caption{Distribution plots of points separated by handedness, shown separately for males and for females}
  \label{fig:distr_gender}
\end{figure}

\subsection{Statistical Tests} \label{subsec:stattests}

To test the first hypothesis, we perform T-tests to analyse the effect of height on the points earned by players. Two T-tests are performed, one for each gender. The players of each gender are separated into two groups, those who scored above the mean number of points and those who scored below it, and these groups are compared in the T-test. We then perform a $\chi^2$ test on the groups together.

To test the second hypothesis, we use a T-test to measure the effect of handedness, and a $\chi^2$ test to measure the difference between the expected and the observed values. The $\chi^2$ distribution then gives the probability of observing a difference at least this large if the null hypothesis holds.
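
The two tests can be summarised by their test statistics. The formulas below are the standard textbook definitions, given as a sketch under two assumptions: that Pearson's $\chi^2$ statistic is used, and that the T-test is the unequal-variance (Welch) two-sample variant, which the description above does not fix. The symbols $O_{ij}$, $E_{ij}$, $R_i$, $C_j$, $N$, $\bar{x}_k$, $s_k$ and $n_k$ are notation introduced here.
\begin{align*}
  E_{ij} &= \frac{R_i C_j}{N}, &
  \chi^2 &= \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, &
  \mathrm{df} &= (r - 1)(c - 1), \\
  t &= \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}}.
\end{align*}
Here $O_{ij}$ and $E_{ij}$ are the observed and expected counts in row $i$ and column $j$ of an $r \times c$ contingency table, $R_i$ and $C_j$ are the row and column totals, and $N$ is the total number of players. In the T-statistic, $\bar{x}_k$, $s_k^2$ and $n_k$ are the mean, variance and size of group $k$. A table with 5 rank groups and 4 height groups has $(5 - 1)(4 - 1) = 12$ degrees of freedom, and a table with 2 hands and 5 rank groups has $(2 - 1)(5 - 1) = 4$. As a worked example, assuming the handedness counts reported in the Results section cover the whole sample, there are 76 left-handed players among 699 in total and 196 players ranked 1 to 99, so the expected number of left-handed players in that rank group is $76 \cdot 196 / 699 \approx 21.3$.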

\section{Results} \label{sec:results}

We investigate both the advantage of height and the advantage of being left-handed using a $\chi^2$-test and a T-test. For every test we state the exact hypothesis and the null hypothesis.

\subsection{The advantage of height}

\textbf{$\chi^2$-test:} To test whether there is an advantage to being tall, we ran a $\chi^2$-test with the following hypotheses:\\
$H$: Players that are taller have a higher rank \\
$H_0$: The rank of a player is independent of their height

To perform the test, the players are divided into groups depending on their rank and on whether they are taller than the mean height for their gender. The expected counts are computed from the proportion of players taller than the mean and the proportion of players in each rank group. The data used is given in \cref{tab:chiheight}.

\begin{table}[ht]
  \centering
  \begin{tabular}{|l|r|r|r|r|}
    \hline
    & \textbf{M: 168 - 188} & \textbf{M: 189 - 210} & \textbf{F: 155 - 171} & \textbf{F: 172 - 189} \\
    \hline
    \textbf{1 - 99} & 67 / 73 & 32 / 26 & 38 / 42 & 60 / 55 \\
    \textbf{100 - 199} & 69 / 72 & 30 / 26 & 31 / 27 & 32 / 36 \\
    \textbf{200 - 299} & 75 / 68 & 17 / 25 & 18 / 17 & 22 / 23 \\
    \textbf{300 - 399} & 61 / 60 & 21 / 23 & 11 / 12 & 17 / 16 \\
    \textbf{400 - 499} & 59 / 60 & 22 / 22 & 7 / 6 & 7 / 8 \\
    \hline
  \end{tabular}
  \caption{Observed / Expected values used for the $\chi^2$-test. The groups are divided by their rank (vertical) and, per gender (M: male, F: female), their height in cm (horizontal).}
  \label{tab:chiheight}
\end{table}

The $\chi^2$ value found is approximately $7.698$.
With 12 degrees of freedom, this gives a $p$-value of approximately $0.8083$.

\textbf{T-test:} A slightly different hypothesis can be tested using a T-test: \\
$H$: Players that are taller have significantly more points \\
$H_0$: The points a player has are independent of their height

We ran this T-test twice, once for the women and once for the men, splitting the players into two groups: those taller than the mean height and those shorter than the mean height. The T-test for the men gave a T-value of $1.711723$, which corresponds to a $p$-value of $0.043815$. For the women the T-value was $1.860241$, which corresponds to a $p$-value of $0.032030$.

\subsection{The advantage of left-handedness}

\textbf{$\chi^2$-test:} To test whether there is an advantage to being left-handed, we ran a $\chi^2$-test with the following hypotheses:\\
$H$: Players that are left-handed have a higher rank \\
$H_0$: The rank of a player is independent of their preferred hand

To perform the test, the players are divided into groups depending on their rank and on whether they play with their left hand. The expected counts are computed from the proportion of left-handed players. The data used is given in \cref{tab:chihand}.

\begin{table}[ht]
  \centering
  \begin{tabular}{|l|l|l|l|l|l|}
    \hline
    & \textbf{1 - 99} & \textbf{100 - 199} & \textbf{200 - 299} & \textbf{300 - 399} & \textbf{400 - 499} \\
    \hline
    \textbf{L} & 22 / 21 & 23 / 18 & 17 / 15 & 6 / 12 & 8 / 10 \\
    \textbf{R} & 174 / 177 & 139 / 144 & 117 / 119 & 105 / 98 & 88 / 86 \\
    \hline
  \end{tabular}
  \caption{Observed / Expected values used for the $\chi^2$-test. The groups are divided by which hand they use (vertical; L: left, R: right) and their rank (horizontal).}
  \label{tab:chihand}
\end{table}

The $\chi^2$ value found is approximately $6.467$.
With 4 degrees of freedom, this gives a $p$-value of approximately $0.1669$.

\textbf{T-test:} A slightly different hypothesis can be tested using a T-test: \\
$H$: Players that are left-handed have significantly more points \\
$H_0$: The points a player has are independent of their preferred hand

We ran this T-test by splitting the players into two groups depending on their preferred hand. The T-test gave a T-value of $0.451694$, which corresponds to a $p$-value of $0.325815$.

\section{Discussion} \label{sec:discussion}

In our investigation we did not find any strong correlation between the ranking of a player (or their number of points) and either the hand they play with or how tall they are. Most tests failed to reach the required significance level of $p < 0.05$. The only tests that gave positive results are the T-tests on the relation between height and the number of points. However, without the $\chi^2$-test confirming this correlation, its existence is questionable.

These results might not be so surprising when the visual exploration is taken into account. Only slight deviations are visible in our graphs, so the tests mainly confirmed our suspicion that no definitive correlation exists between the different attributes.

\end{document}