\documentclass[a4paper]{article}
% To compile PDF run: latexmk -pdf {filename}.tex

% Math package
\usepackage{amsmath}
% UTF-8 encoding
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc} % support umlauts in the input
\usepackage{natbib}
\usepackage{graphicx}
% Link the arguments of \cref{}, \ref{}, \cite{}, ... so that a reader can click on the number and jump to the target in the document
\usepackage{hyperref}
% Easier compilation
\usepackage{bookmark}
% Enable \cref{...} and \Cref{...} instead of \ref: the type of reference is included in the link (cleveref must be loaded after hyperref)
\usepackage[capitalise,nameinlink]{cleveref}

\begin{document}
\title{Week 8 - Quantitative data analysis}
\author{
  Jai Bheeman \and Kelvin Davis \and Jip J. Dekker \and Nelson Frew \and Tony
  Silvestere
}
\maketitle

\section{Introduction} \label{sec:introduction}

\section{Method} \label{sec:method}

The purpose of this report is to re-analyse the data presented by
\cite{dong2018methods}, who investigate the effect that protests (as an
example of disruptive social behaviour in general) have on consumer
behaviour. \cite{dong2018methods} hypothesise that protests decrease
consumer spending in the area surrounding the event, and suggest that
consumer spending could be used as an additional non-traditional economic
indicator and as a gauge of consumer sentiment. Consumer spending was analysed
using credit card transaction data from a metropolitan area within a member
country of the Organisation for Economic Co-operation and Development
(OECD). Although \cite{dong2018methods} investigate temporal and spatial
effects on consumer spending, for the purposes of this analysis only the
spatial effect of variables (with respect to the geographical distance from
the event) is considered. The dataset consists of variables measured as a
function of the distance from the event (in km): the number of
customers, the median spending amount, the number of transactions, and the
total sales amount.

The re-analysis is conducted on the data provided by
\cite{dong2018methods}, using Python in conjunction with packages such as
pandas, matplotlib, numpy and seaborn to process and visualise the data. As
mentioned above, only the spatial data for the variables listed are
considered, for the reference days and the change occurring on Day 62 (the day
of the first socially disruptive event). The distribution of the difference
between the reference period and Day 62 is visualised by plotting a histogram
for each variable. Since the decrease of each of the variables from the
reference period to Day 62 is provided, the mean and the median of these
distributions can be used to perform a one-sample hypothesis test (one-sample
because we are given the differences directly) to assess whether the protests
on Day 62 had a discernible effect.

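As an illustration, one common form of such a one-sample test is the $t$-test
on the differences. The choice of test statistic is not fixed by the
description above, so the following is a sketch under that assumption. Writing
$d_1, \dots, d_n$ for the decreases recorded for a variable (notation
introduced here, with positive $d_i$ denoting a decrease), the null hypothesis
of no effect is $H_0\colon \mu_d = 0$ and the statistic is
\[
  t = \frac{\bar{d}}{s_d / \sqrt{n}},
  \qquad
  \bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i,
  \qquad
  s_d = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left(d_i - \bar{d}\right)^2}.
\]
Under $H_0$, and assuming the differences are approximately normally
distributed, $t$ follows a Student $t$ distribution with $n - 1$ degrees of
freedom, so a large positive value of $t$ would indicate a decrease on Day 62.
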
Assuming the mean of each variable over the reference period is the midpoint
between its maximum and minimum values, we can reconstruct
approximate actual values for Day 62 (given the decrease in value on Day 62
from the reference period). By comparing these values to the range over the
reference period, a further assessment can be made of whether the data
show a discernible effect on consumer spending as a result of social
disruption, scaling with distance.

Although time series data were not explicitly provided, by reading values off
a graph in \cite{dong2018methods} we can quantify the decrease
in the number of customers and median spending on Day 62 using information
about the reference days (Days 43 to 61). After collecting the values for each
of the reference days, the mean and standard deviation of this sample can be
calculated. Assuming the data are normally distributed, we can calculate a
z-score for each observation on Day 62, and use this to assess the original
hypothesis.

By performing each of the above tests, we re-analyse the hypothesis of
\cite{dong2018methods} that consumer spending decreases
as a result of social events such as protests. In the Results section, we
perform the statistical analyses described above. The results of these tests
are then explored in the Discussion section, along with the assumptions and
limitations of the tests and what can be concluded from them.

\section{Results} \label{sec:results}

For each of the variables in the given data (number of customers, median
spending amount, number of transactions, and total sales amount) we construct
a histogram of its decrease on Day 62. We then compute the mean and
median of each distribution so that we can perform a one-sample hypothesis test.

\begin{figure}[ht]
  \centering
  \includegraphics[width=\textwidth]{distr.png}
  \caption{Distribution of each of the variables recorded in the data, as a function of the distance from an event}
  \label{fig:distr}
\end{figure}

Using an estimate of the mean/median of the reference period, obtained by taking the midpoint of the minimum and maximum values for each distance measure, a value can be reconstructed for the measurement on Day 62 (for each location) using
\begin{equation}
  \text{value} = \frac{\text{min} + \text{max}}{2} - \text{decrease}.
  \tag{1}
\end{equation}
We can then plot the maximum and minimum values for the reference period, together with the reconstructed Day 62 values, to observe the behaviour of consumer spending after the event.

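As a purely illustrative example with hypothetical numbers (not taken from the
data): if a variable ranges between $\text{min} = 0.8$ and $\text{max} = 1.2$
over the reference period and the recorded decrease on Day 62 is $0.3$, then
\[
  \text{value} = \frac{0.8 + 1.2}{2} - 0.3 = 1.0 - 0.3 = 0.7,
\]
which lies below the reference minimum of $0.8$. More generally, the
reconstructed value falls below the reference range exactly when the decrease
exceeds half the width of that range, $(\text{max} - \text{min})/2$.
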
\begin{figure}[ht]
  \centering
  \includegraphics[width=\textwidth]{effect.png}
  \caption{The reconstructed values for Day 62 of each variable plotted against their respective minimums and maximums over the reference period}
  \label{fig:effect}
\end{figure}

Using the data recorded for each of the three distances, the mean and standard deviation of the reference period can be calculated. The z-score for each observed value on Day 62 can then be computed using
\begin{equation}
  Z = \frac{X - \mu}{\sigma},
  \tag{2}
\end{equation}
where $X$ is the observed value, and $\mu$ and $\sigma$ are the mean and standard deviation, respectively, of the reference period.

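For concreteness, the reference statistics can be written out as follows. This
is a sketch that assumes the usual sample estimators; whether the standard
deviation uses $n$ or $n - 1$ in the denominator is not stated, and $n - 1$ is
assumed here. With $x_t$ denoting the value read from the graph on reference
day $t$ (notation introduced here), the $n = 19$ reference days 43 to 61 give
\[
  \mu = \frac{1}{19} \sum_{t=43}^{61} x_t,
  \qquad
  \sigma = \sqrt{\frac{1}{18} \sum_{t=43}^{61} \left(x_t - \mu\right)^2}.
\]
Under the normality assumption, $|Z| > 1.96$ corresponds to a two-sided
significance level of $5\%$, and $Z < -1.645$ to a one-sided level of $5\%$
for a decrease.
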
\begin{table}[ht]
  \centering
  \begin{tabular}{|l|l|r|r|}
    \hline
    \textbf{Variable} & \textbf{Distance} & \textbf{$X$} & \textbf{$Z$} \\
    \hline
    \textbf{Customers} & $< 2$\,km & $-0.600$ & $-6.87798$ \\
    \textbf{Customers} & 2\,km--4\,km & $-0.200$ & $-3.33253$ \\
    \textbf{Customers} & $> 4$\,km & $-0.100$ & $-3.70740$ \\
    \textbf{Median Spending} & $< 2$\,km & $-0.200$ & $-3.05849$ \\
    \textbf{Median Spending} & 2\,km--4\,km & $-0.100$ & $-1.46508$ \\
    \textbf{Median Spending} & $> 4$\,km & $-0.035$ & $-1.99199$ \\
    \hline
  \end{tabular}
  \caption{The $Z$ score computed using Equation 2 and the temporal data}
  \label{my-label}
\end{table}

\section{Discussion} \label{sec:discussion}

As shown in each of the subplots of Figure 1, the mean and median values of
the decrease in each of the distributions are greater than zero (note: higher
values of the decrease variable indicate a larger decrease, i.e. a more
negative change). These mean and median values can be used to perform a
one-sample hypothesis test. Since each of the mean and median values is
greater than zero, we infer that the event had a net decreasing effect on the
number of customers, median spending amount, number of transactions, and total
sales amount.

In Figure \ref{fig:effect}, values were approximated for each variable on Day
62 using Equation 1 and plotted against the minimum and maximum values of
the respective variables. This allows us to visually assess whether the
reconstructed value for Day 62 lies outside the range of recorded values for
the reference period and therefore represents uncharacteristic behaviour. A
decrease is evident in each of the variables on Day 62 within a distance of
approximately 2\,km of the event, and the values appear to stabilise beyond
that distance. This supports the hypothesis of \cite{dong2018methods} that
consumer spending is affected by socially disruptive events, and also provides
evidence for the spatial scaling of this effect (based on the event location).
It is important to note that the approximation used in this technique is
subject to error, because it idealises the mean/median of the reference data
as the midpoint between the minimum and maximum values provided.

Reading values off a graph in \cite{dong2018methods} provided time series
data (divided into three radii) to analyse. These data were collected by
visually estimating the values from the graph, which inherently introduces
a source of error. Nevertheless, computing the z-score as described in
Equation 2 yields the values in Table 1. Each of the z-scores
in the table is negative, indicating a decrease in both the number of
customers and median spending on Day 62. The much larger magnitude of the
z-scores for the $< 2$\,km distance ring for both variables is in agreement
with the earlier discussion, strengthening the hypothesis of a spatial
correlation of consumer spending.

Each of the above tests agrees on the spatial and temporal correlation between
consumer spending and socially disruptive events. With the limited data
available, we can therefore concur with the hypothesis of
\cite{dong2018methods} that consumer spending decreases in the area around
disruptive social behaviour, having found both the temporal effect on Day 62
and the weakening of that effect with distance from the event.

\bibliographystyle{humannat}
\bibliography{references}

\end{document}