\usepackage[utf8]{inputenc} %support umlauts in the input
% Easier compilation
\usepackage{bookmark}
\usepackage{natbib}
\usepackage{graphicx}

\begin{document}
\title{Week 8 - Quantitative data analysis}

\section{Method} \label{sec:method}

The purpose of this report is to re-analyse the data presented by
\cite{dong2018methods}, who investigate the effect that protests (as an
example of disruptive social behaviour in general) have on consumer
behaviour. \cite{dong2018methods} hypothesise that protests decrease
consumer spending in the area surrounding the event, and suggest that
consumer spending could serve as an additional, non-traditional economic
indicator and as a gauge of consumer sentiment. Consumer spending was analysed
using credit card transaction data from a metropolitan area in a member
country of the Organisation for Economic Co-operation and Development
(OECD). Although \cite{dong2018methods} investigate both temporal and spatial
effects on consumer spending, only the spatial effect (how each variable
changes with geographical distance from the event) is considered in this
analysis. The dataset consists of variables measured as a function of
distance from the event (in km), including the number of customers, the
median spending amount, the number of transactions, and the total sales
amount.

The re-analysis is conducted on the data provided in \cite{dong2018methods},
using Python with packages such as pandas, matplotlib, numpy and seaborn to
process and visualise the data. As noted above, only the spatial data and the
variables listed above are considered, for the reference days and for the
change occurring on Day 62 (the day of the first socially disruptive event).
The distribution of the difference between the reference period and Day 62 is
visualised by plotting a histogram for each variable. Since the decrease in
each variable from the reference period to Day 62 is provided directly, the
mean and median of these distributions can be used in a one-sample hypothesis
test (one-sample because only the differences are given) to assess whether
the protests on Day 62 had a discernible effect.

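As a minimal sketch of the mean-based version of this test, the snippet below computes the one-sample $t$ statistic for the null hypothesis that the mean decrease is zero. It assumes the per-distance decreases for one variable are available as a plain Python list; the function name \texttt{one\_sample\_t} and the toy values are hypothetical, not taken from \cite{dong2018methods}. Only the standard library is used, so the p-value is left to a Student-$t$ table or to a library routine such as \texttt{scipy.stats.ttest\_1samp}.

```python
import math
import statistics


def one_sample_t(decreases, mu0=0.0):
    """Return (t, degrees of freedom) for H0: mean decrease == mu0.

    Hypothetical helper: `decreases` holds one Day 62 decrease per distance
    bin for a single variable (positive values mean a drop).
    """
    n = len(decreases)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(decreases)
    sd = statistics.stdev(decreases)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("zero variance: t statistic is undefined")
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical decreases for one variable (NOT values from the paper).
toy = [0.4, 0.1, 0.3, 0.2, 0.5]
t, df = one_sample_t(toy)
print(f"t = {t:.3f} on {df} degrees of freedom")
```

The sketch treats the distance bins as independent observations; neighbouring bins may well be correlated, so this is an assumption of the sketch rather than a property of the data.
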
Assuming that the mean of each variable over the reference period is the
midpoint between its maximum and minimum values, we can reconstruct
approximate actual values for Day 62 from the reported decrease relative to
the reference period. Comparing these values with the range observed over the
reference period gives a second assessment of whether the data show a
discernible effect of social disruption on consumer spending, and whether that
effect scales with distance.

Although time series data were not provided explicitly, values read off a
graph in \cite{dong2018methods} allow us to quantify the decrease in the
number of customers and in median spending on Day 62 relative to the reference
days (Days 43 to 61). From the values collected for each reference day, the
mean and standard deviation of this sample can be calculated. Assuming the
reference-day values are normally distributed, we can calculate a $z$-score
for each observation on Day 62 and use it to assess the original hypothesis.

By performing each of the above tests, we re-analyse the hypothesis of
\cite{dong2018methods} that consumer spending decreases as a result of social
events such as protests. The Results section (Section~\ref{sec:results})
carries out the statistical analyses described above; the Discussion section
(Section~\ref{sec:discussion}) then explores their outcomes, together with the
assumptions and limitations of the tests and what can be concluded from them.

\section{Results} \label{sec:results}

For each of the variables in the given data (number of customers, median
spending amount, number of transactions, and total sales) we construct a
histogram of its decrease on Day 62. We then compute the mean and median of
each distribution so that we can perform a one-sample hypothesis test.

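For the median, one option (our choice here; the text does not name a specific test) is an exact one-sided sign test, which counts how many decreases are positive and compares that count with a Binomial$(n, 0.5)$ distribution under the null hypothesis of a zero median. The function \texttt{sign\_test\_greater} and the toy values are hypothetical illustrations, not the analysis behind the figures.

```python
from math import comb


def sign_test_greater(decreases):
    """Exact one-sided sign test of H0: median decrease == 0 vs H1: median > 0.

    Hypothetical helper. Zero differences carry no sign information and are
    dropped, as is usual for the sign test.
    """
    nonzero = [d for d in decreases if d != 0]
    n = len(nonzero)
    k = sum(1 for d in nonzero if d > 0)  # number of positive decreases
    # One-sided p-value: P(K >= k) for K ~ Binomial(n, 0.5)
    p = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k, n, p


# Hypothetical decreases for one variable (NOT values from the paper).
toy = [0.4, 0.1, -0.05, 0.3, 0.2, 0.5, 0.0, 0.25]
k, n, p = sign_test_greater(toy)
print(f"{k} of {n} non-zero decreases are positive; one-sided p = {p:.4f}")
```

Unlike the $t$ statistic, the sign test makes no normality assumption, which suits a median computed from a small number of distance bins.
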
\begin{figure}[ht]
\centering
\includegraphics[width=\textwidth]{distr.png}
\caption{Distribution of the Day 62 decrease in each recorded variable, measured as a function of distance from the event}
\label{fig:distr}
\end{figure}

Approximating the reference-period mean (or median) of each variable by the midpoint of its minimum and maximum values at each distance, a value for Day 62 can be reconstructed at each location using
\begin{equation}
\textrm{value} = \frac{\textrm{min} + \textrm{max}}{2} - \textrm{decrease}.
\tag{1}
\end{equation}

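Equation~(1) and the subsequent range comparison can be sketched in a few lines of Python. The helper names and the \texttt{rows} dictionary of (minimum, maximum, decrease) triples keyed by distance are hypothetical; the numbers are toy values chosen only to show one reconstructed value falling outside the reference range and one inside it.

```python
def reconstruct_day62(ref_min, ref_max, decrease):
    """Equation (1): approximate the reference mean/median by the midpoint of
    the reference range, then subtract the reported Day 62 decrease."""
    return (ref_min + ref_max) / 2 - decrease


def outside_reference_range(value, ref_min, ref_max):
    """True if the reconstructed value falls outside [ref_min, ref_max]."""
    return value < ref_min or value > ref_max


# Hypothetical (min, max, decrease) per distance in km (NOT paper data).
rows = {
    1.0: (90.0, 110.0, 25.0),
    3.0: (95.0, 105.0, 4.0),
}
for distance_km, (ref_lo, ref_hi, dec) in rows.items():
    v = reconstruct_day62(ref_lo, ref_hi, dec)
    flag = outside_reference_range(v, ref_lo, ref_hi)
    print(f"{distance_km} km: Day 62 ~ {v:.1f}, outside reference range: {flag}")
```

Because the midpoint only stands in for the true reference mean, a value just outside the range should be read as weak evidence; a value far outside it is the more informative case.
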
We can then plot the minimum and maximum values over the reference period, together with the reconstructed Day 62 values, to observe how consumer spending behaves after the event (Figure~\ref{fig:effect}).

\begin{figure}[ht]
\centering
\includegraphics[width=\textwidth]{effect.png}
\caption{The reconstructed Day 62 value of each variable, plotted against the minimum and maximum of that variable over the reference period}
\label{fig:effect}
\end{figure}

Using the extracted time series, the mean and standard deviation of the reference period (Days 43--61) can be calculated for each of the three recorded distance rings. The $z$-score of each observed value on Day 62 is then
\begin{equation}
Z = \frac{X - \mu}{\sigma},
\tag{2}
\end{equation}
where $X$ is the observed value, and $\mu$ and $\sigma$ are the mean and standard deviation, respectively, of the reference period.

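A minimal sketch of Equation~(2), assuming the values read off the graph for one variable and one distance ring are stored per reference day (Days 43 to 61). The function \texttt{day62\_z\_score} and the toy series are hypothetical; the standard deviation is the sample version ($n-1$ denominator), which is one reasonable choice but not necessarily the one used to build the table.

```python
import statistics


def day62_z_score(reference_values, observed):
    """Equation (2): Z = (X - mu) / sigma, with mu and sigma estimated from
    the reference-day values for one variable and one distance ring."""
    mu = statistics.mean(reference_values)
    sigma = statistics.stdev(reference_values)  # sample SD (n - 1 denominator)
    return (observed - mu) / sigma


# Hypothetical values read off a plot for one ring (NOT paper data).
reference_days = range(43, 62)  # Days 43..61 inclusive: 19 days
toy_values = [0.05, -0.05] * 9 + [0.0]
reference = dict(zip(reference_days, toy_values))

z = day62_z_score(list(reference.values()), observed=-0.6)
print(f"Z = {z:.2f}")
```

Keeping the values keyed by day makes it easy to check that exactly the 19 reference days were read off the graph before the statistics are computed.
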
\begin{table}[ht]
\centering
\begin{tabular}{|l|l|r|r|}
\hline
\textbf{Variable} & \textbf{Distance from event} & \textbf{$X$} & \textbf{$Z$} \\
\hline
\textbf{Customers} & \textless 2\,km & $-0.600$ & $-6.87798$ \\
\textbf{Customers} & 2--4\,km & $-0.200$ & $-3.33253$ \\
\textbf{Customers} & \textgreater 4\,km & $-0.100$ & $-3.70740$ \\
\textbf{Median Spending} & \textless 2\,km & $-0.200$ & $-3.05849$ \\
\textbf{Median Spending} & 2--4\,km & $-0.100$ & $-1.46508$ \\
\textbf{Median Spending} & \textgreater 4\,km & $-0.035$ & $-1.99199$ \\
\hline
\end{tabular}
\caption{Observed Day 62 values ($X$) and the corresponding $Z$ scores, computed using Equation~(2) from the reference-period time series}
\label{my-label}
\end{table}

\section{Discussion} \label{sec:discussion}

As shown in each of the subplots of Figure~1, the mean and median of the
decrease in each distribution are greater than zero (note: higher values of
the decrease variable indicate a larger decrease, i.e.\ a more negative
change). Testing these means and medians against a null value of zero with
one-sample hypothesis tests therefore points to a net decreasing effect of the
event on the number of customers, the median spending amount, the number of
transactions, and the total sales amount.

In Figure~\ref{fig:effect}, values were approximated for each variable on
Day~62 using Equation~(1) and plotted against the minimum and maximum values
of the respective variables. This allows us to assess visually whether the
reconstructed Day 62 value lies outside the range recorded over the reference
period, and therefore represents uncharacteristic behaviour. A decrease is
evident in each variable after the event (on Day 62) within a distance of
approximately 2\,km, and the values appear to stabilise beyond that distance.
This supports the hypothesis of \cite{dong2018methods} that consumer spending
is affected by socially disruptive events, and also provides evidence for a
spatial scaling of this effect with distance from the event location. Note
that this approximation is subject to error, because it idealises the
mean/median of the reference data as the midpoint between the minimum and
maximum values provided.

Reading values off a graph in \cite{dong2018methods} provided time series
data, divided into three distance rings, for analysis. Because these values
were estimated visually from the graph, they inherently carry measurement
error. Computing the $z$-score of each Day 62 observation with Equation~(2)
gives Table~1. Each $z$-score in the table is negative, indicating a decrease
in both the number of customers and median spending on Day~62. The much larger
magnitude of the $z$-scores in the $<2$\,km ring for both variables agrees
with the earlier discussion, strengthening the hypothesis of a spatial
correlation between consumer spending and distance from the event.

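To turn the $z$-scores into p-values under the normality assumption stated in the Method section, the standard normal CDF can be used. The sketch below relies on \texttt{statistics.NormalDist} from the Python standard library; the helper \texttt{z\_to\_p} is hypothetical, and the $Z$ values are copied from the median-spending rows of Table~1.

```python
from statistics import NormalDist


def z_to_p(z, alternative="two-sided"):
    """P-value for a z-score, assuming a standard normal reference distribution."""
    cdf = NormalDist().cdf(z)
    if alternative == "less":  # H1: Day 62 value lies below the reference mean
        return cdf
    if alternative == "two-sided":
        return 2 * min(cdf, 1 - cdf)
    raise ValueError("alternative must be 'less' or 'two-sided'")


# Z values from the Median Spending rows of the table
for label, z in [("< 2 km", -3.05849), ("2-4 km", -1.46508), ("> 4 km", -1.99199)]:
    print(label,
          f"one-sided p = {z_to_p(z, 'less'):.4f}",
          f"two-sided p = {z_to_p(z):.4f}")
```

With a two-sided test at the 5\% level, an observation counts as unusual when $|Z| > 1.96$; by that criterion, the 2--4\,km median-spending value ($Z = -1.46508$) would not be flagged.
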
Each of the above tests agrees on the spatial and temporal correlation between
consumer spending and socially disruptive events. Within the limits of the
available data, we therefore concur with the hypothesis of
\cite{dong2018methods} that consumer spending decreases in the area around
disruptive social behaviour, given the decrease observed on Day~62 and the
weakening of this effect with increasing distance from the event.

\bibliographystyle{humannat}
\bibliography{references}

\end{document}