Initial tex version

2018-05-04 16:00:42 +10:00 · 2018-05-04 16:00:42 +10:00 · 03beb98766
commit 03beb98766
parent 2200f0ed71
1 changed files with 155 additions and 0 deletions
--- a/wk8/week8.tex
+++ b/wk8/week8.tex
@ -12,6 +12,8 @@
 \usepackage[utf8]{inputenc} %support umlauts in the input
 % Easier compilation
 \usepackage{bookmark}
+\usepackage{natbib}
+\usepackage{graphicx}

 \begin{document}
 	\title{Week 8 - Quantitative data analysis}
@ -25,8 +27,161 @@

 	\section{Method} \label{sec:method}

+	The purpose of this report is to re-analyse the data presented in the paper by
+	\cite{dong2018methods}, which investigates the effect that protests (as an
+	example of disruptive social behaviours in general) have on consumer
+	behaviours. \cite{dong2018methods} hypothesise that protests decrease
+	consumer spending in the area surrounding the event, and suggest that
+	consumer spending could be used as an additional non-traditional economic
+	indicator and as a gauge of consumer sentiment. Consumer spending was analysed
+	using credit card transaction data from a metropolitan area within a member
+	country of the Organisation for Economic Co-operation and Development
+	(OECD). Although \cite{dong2018methods} investigate temporal and spatial
+	effects on consumer spending, for the purposes of this analysis, only the
+	spatial effect of variables (with relation to the geographical distance from
+	the event) is considered. The dataset consists of variables measured as a
+	function of the distance from the event (in km), including: the number of
+	customers, the median spending amount, the number of transactions, and the
+	total sales amount.
+
+	The re-analysis is conducted on the data provided by
+	\cite{dong2018methods}, using Python with the pandas, NumPy, Matplotlib and
+	seaborn packages to process and visualise the data. As noted above, only the
+	spatial data and the variables listed above are considered, for the reference
+	days and for the change occurring on Day 62 (the day of the first socially
+	disruptive event). The distribution of the difference between the reference
+	period and Day 62 is visualised by plotting a histogram for each variable.
+	Since the decrease in each variable from the reference period to Day 62 is
+	provided, the mean and the median of these distributions can be used to
+	perform a one-sample hypothesis test (one-sample because we are given the
+	differences directly) to assess whether the protests on Day 62 had a
+	discernible effect.
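+
+	One way to formalise this test, assuming a one-sample $t$-test (the specific
+	test is not named here, so both the choice of test and the symbols $d_i$,
+	$\bar{d}$, $s_d$ and $n$ are illustrative assumptions), is to treat the
+	decreases $d_1, \dots, d_n$ reported for one variable as a sample and to test
+	the null hypothesis $H_0\colon \mu_d = 0$ against the alternative
+	$H_1\colon \mu_d > 0$ using
+
+	\[
+		t = \frac{\bar{d}}{s_d / \sqrt{n}},
+		\qquad
+		\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i,
+		\qquad
+		s_d^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( d_i - \bar{d} \right)^2 .
+	\]
+
+	Under $H_0$, and if the decreases are approximately normally distributed, $t$
+	follows a Student $t$ distribution with $n - 1$ degrees of freedom, so a large
+	positive $t$ would indicate a decrease on Day 62.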
+
+	Assuming the mean of each variable over the reference period is the midpoint
+	between their respective maximum and minimum values, we can reconstruct
+	approximate actual values for Day 62 (given the decrease in value on Day 62
+	from the reference period). By comparing these values to the range over the
+	reference period, another assessment can be made to determine whether the data
+	show a discernible effect on consumer spending as a result of social
+	disruption, and whether that effect scales with distance.
+
+	Although time series data were not explicitly provided, by reading values off
+	a graph in \cite{dong2018methods} we can quantify the decrease in the number
+	of customers and in median spending on Day 62 relative to the reference days
+	(Days 43 to 61). After collecting the values for each of the reference days,
+	the mean and standard deviation of this sample can be calculated. Assuming
+	the data are normally distributed, we can calculate a z-score for each
+	observation on Day 62, and use this to assess the original hypothesis.
+
+	By performing each of the above tests, we re-analyse the hypothesis of
+	\cite{dong2018methods} that consumer spending decreases as a result of social
+	events such as protests. In the Results section, we perform the statistical
+	analyses described above. The results of these tests are then explored in the
+	Discussion section, along with the assumptions and limitations of the tests
+	and what can be concluded from them.
+
 	\section{Results} \label{sec:results}

+	For each of the variables in the given data (number of customers, median
+	spending amount, number of transactions, and sales totals) we construct a
+	histogram of the decrease of each (on Day 62). We then compute the mean and
+	median of the data so we can proceed to perform a one-sample hypothesis test.
+
+	\begin{figure}[ht]
+		\centering
+		\includegraphics[width=\textwidth]{distr.png}
+		\caption{Distribution of each of the variables recorded in the data, as a function of the distance from an event}
+		\label{fig:distr}
+	\end{figure}
+
+	Using an estimate of the mean/median of the reference period, obtained by taking the midpoint of the minimum and maximum values for each distance measure, a value can be reconstructed for the measurement on Day 62 (for each location) using:
+
+	\begin{equation}
+		\textrm{value} = \frac{\textrm{min} + \textrm{max}}{2} - \textrm{decrease}.
+		\tag{1}
+	\end{equation}
+\\
+	We can then plot the maximum and minimum values for the reference period, together with the reconstructed Day 62 values, to observe the behaviour of consumer spending after the event.
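+
+	As a sketch of what this comparison tests, write $d$ for the reported
+	decrease (the symbol $d$ is introduced here only for illustration). Under the
+	midpoint assumption of Equation 1, the reconstructed value falls below the
+	reference-period minimum exactly when
+
+	\[
+		\frac{\textrm{min} + \textrm{max}}{2} - d < \textrm{min}
+		\quad \Longleftrightarrow \quad
+		d > \frac{\textrm{max} - \textrm{min}}{2},
+	\]
+
+	that is, when the decrease exceeds half the width of the reference range.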
+
+	\begin{figure}[ht]
+		\centering
+		\includegraphics[width=\textwidth]{effect.png}
+		\caption{The reconstructed values for Day 62 of each variable plotted against their respective minimums and maximums over the reference period}
+		\label{fig:effect}
+	\end{figure}
+
+	Using the data recorded for each of the three distance bands, the mean and standard deviation of the reference period can be calculated. The z-score for each observed value on Day 62 can then be computed using:
+
+	\begin{equation}
+	\textrm{Z} = \frac{\textrm{X} - \mu}{\sigma},
+	\tag{2}
+	\end{equation}
+\\
+	where X is the observed value, and $\mu$ and $\sigma$ are the mean and standard deviation, respectively, of the reference period.
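+
+	For reference, a sketch of the quantities involved, assuming the sample
+	standard deviation with the $n - 1$ denominator is used (the estimator
+	actually applied is not stated) and writing $x_t$ for the value read from the
+	graph on reference day $t$ (a symbol introduced here for illustration):
+
+	\[
+		\mu = \frac{1}{19} \sum_{t=43}^{61} x_t,
+		\qquad
+		\sigma = \sqrt{\frac{1}{18} \sum_{t=43}^{61} \left( x_t - \mu \right)^2} .
+	\]
+
+	Under the normality assumption, $|\textrm{Z}| > 1.96$ corresponds to a
+	two-sided significance level of 5\%.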
+
+	\begin{table}[ht]
+		\centering
+		\label{my-label}
+		\begin{tabular}{|l|l|r|r|}
+		\hline
+		\textbf{Variable}        & \textbf{Distance} & \textbf{X} & \textbf{Z} \\
+		\hline
+		\textbf{Customers}       & \textless 2km     & -0.600     & -6.87798   \\
+		\textbf{Customers}       & 2km - 4km         & -0.200     & -3.33253   \\
+		\textbf{Customers}       & \textgreater 4km  & -0.100     & -3.70740   \\
+		\textbf{Median Spending} & \textless 2km     & -0.200     & -3.05849   \\
+		\textbf{Median Spending} & 2km - 4km         & -0.100     & -1.46508   \\
+		\textbf{Median Spending} & \textgreater 4km  & -0.035     & -1.99199   \\
+		\hline
+		\end{tabular}
+		\caption{The z-scores (Z) for the Day 62 observations (X), computed using Equation 2 and the time series data}
+	\end{table}
+
 	\section{Discussion} \label{sec:discussion}

+	As shown in each of the subplots of Figure 1, the mean and median values of
+	the decrease in each of the distributions are greater than zero (note: higher
+	values of the decrease variable indicate a larger decrease/negative change).
+	These mean and median values can be used to perform a one-sample hypothesis
+	test. Since each of the mean and median values is greater than zero, we infer
+	that the event had a net decreasing effect on the number of customers, median
+	spending amount, number of transactions, and total sales amount.
+
+	In Figure \ref{fig:effect} values were approximated for each variable on Day
+	62, using Equation 1, and plotted against the minimum and maximum values of
+	the respective variables. This allows us to visually assess whether the
+	reconstructed value for Day 62 lies outside the range of recorded values for
+	the reference period, and therefore represents uncharacteristic behaviour. A
+	decrease is evident in each of the variables after the event has occurred (on
+	Day 62) within a distance of approximately 2 km, and appears to stabilise
+	thereafter. This supports the hypothesis of \cite{dong2018methods} that
+	consumer spending is affected by socially disruptive events, and also provides
+	evidence for the spatial scaling of this effect (based on the event location).
+	It is important to note that the approximation used in this technique is
+	subject to error, because the mean/median of the reference data is idealised
+	as the midpoint between the minimum and maximum values provided.
+
+	Reading values off a graph in \cite{dong2018methods} provided time series
+	data (divided into three distance bands) to analyse. These data were collected
+	by visually estimating the values from the graph, which inherently introduces
+	a source of error. Nevertheless, by computing the z-score as described in
+	Equation 2, Table 1 was constructed. Each of the z-score values in the table
+	is negative, indicating a decrease in both the number of customers and median
+	spending on Day 62. The much larger magnitude of the z-scores for the
+	\textless{}2km distance ring for both variables agrees with the earlier
+	discussion, strengthening the hypothesis that the effect on consumer spending
+	is spatially correlated with the event.
+
+	All of the above tests agree on the spatial and temporal correlation between
+	consumer spending and socially disruptive events. With the limited data
+	available, we therefore concur with the hypothesis of Dong et al. that
+	consumer spending decreases in the area around disruptive social behaviour,
+	having found both the temporal effect on Day 62 and a spatial effect that
+	weakens with distance from the event.
+
+	\bibliographystyle{humannat}
+	\bibliography{references}
+
 \end{document}