ANLY482 AY2017-18T2 Group06 Analysis Findings

From Analytics Practicum
Revision as of 17:11, 1 March 2018 by Yuxuan.tee.2014 (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search
Logo.PNG

 

HOME

ABOUT US

PROJECT OVERVIEW

ANALYSIS & FINDINGS

PROJECT MANAGEMENT

DOCUMENTATION

MAIN PAGE


 

DATA CLEANING AND PREPARATION

For our data cleaning and preparation, we used the following software to both visualize and ETL the data into other forms:
1. JNP Pro
2. Tableau
3. SQL Server Data Tools 2015 (MSSQL)
4. Microsoft Excel

Through the visualization seen earlier in the report, we realized that there is a need to perform data transformation to visualize all the data. Therefore, the data was prepared with MSSQL instead to produce a ‘day aggregated’ data-set for analysis on the day-time period basis.

Our initial methodology was to use the clustering method to identify clusters which could be treated as baskets for investment. As the currency values of USDJPY and the rest are vastly different, there was a need to transform into percentage change and standard deviation for clustering.

However, our client does not have 15-20 or more currency pairs in their database. Hence, we would be focusing on forecasting with these 5 currency pairs. Our team used the ARIMA forecasting method and thus the data transformation method would not be required as the ARIMA model uses its own unique method to transform data.

The image below shows the result of our first data transformation.

Data6.PNG

We performed data transformation to allows us to visualize the data differently and derive new insights on the data:

Data7.PNG
Data8.PNG

As seen in the visualization of the data of the same currency pair and time period, we can see the trends and price movements for the entire time period of 2 years for USDJPY data.

This provided additional data discoveries which we observe significant shifts in the price movements and their variations throughout the time period. This allows us to visually compare across multiple currency pairs to spot any prominent similarities and trends between them. Although nothing of significance was identified through the visualization charts as shown below, we could identify periods of time which could increase the granularity of the data points to allow deeper analysis for our forecasting.

DATA VISUALISATION

Below would be initial visualizations 2 sets of the data by Tableau, using the original data without any transformation:

Data3.PNG
Data4.PNG

Initial observations of the data revealed incomplete dataset, which was revealed to be time periods of the weekends when the market is closed. Future analysis of the data will take this information into consideration.

Attempting to visualize minute tick data is restricted to a maximum of 1-month time periods due to the volume of data. The result of this visualization is as shown below:

Data5.PNG

EXPLORATORY DATA ANALYSIS

Fundamentals

The Japanese yen is one of the “safe haven” currency, together with the swiss franc, it is a go to for investors whenever there is major uncertainty in the market. The Japanese yen has a 13.6% weight of the US Dollar index, 2nd behind the EUR (ICE Futures U.S., N.D.). It is also the world’s 3rd biggest economy behind the united states and china, with 5.9% of the world’s GDP.

Data9.PNG

In the initial data exploration, it is observed that the peak of the USDJPY currency pair has an overall 2-year high point of around 125 and a 2 year low of around 100. With a range of 25 approximately (25%), it implies that it is a very volatile pair. From the graph, the currency pair has very fast movements, with huge price changes in a short-time periods generally.

In the graph below, we have identified 4 very high movement periods. In August 2015 and February 2016 there are huge drops in the USD currency value relative to the JPY. In July 2016 there is a huge rise followed by a drop in the USDJPY value. Lastly, in November till December, there is a huge rise in the USDJPY value which is attributed to trump’s election campaign win.

Data10.PNG
Data11.PNG

The chart above consists of 2 other lines of moving averages which include the monthly moving average (20 days) and the quarterly moving average of 70 days.

The quarterly moving average 2-year trend is as follows:

Data18.PNG

Monthly moving average 2-year trend:

Data13.PNG

R Library Forecast: Graph of Plots

Data14.PNG

This graph of plots is generated by the decomposition function of the R library forecast with the STL function. The decomposition function does the following steps, by decomposing the existing plot into 3 different types of data, the seasonal component, the trend component and the cycle component.

From the seasonal component, it is evident that there are 21 cycles of seasonal components, in these2 years, which equates to approximately one season a month.

From the trend aspect, there is no certain indicator of increment or decrement in this time period of 2 years. Overall, it reduced slightly in the 2 years after a decrement followed by an increment. A more detailed breakdown is as shown in the graph above.

From the remainder, it can be seen that the remainder is relatively near to the average, with only a few points of outliers and the outliers are gradual spikes and drops.

Autocorrelation Function

Data15.PNG

The autocorrelation function (ACF) does not give a good gauge of the lag as all the lag values are all above. Hence, there is a need to use the partial ACF.

Data16.PNG

The partial ACF shows a lag order of 1 and 2 to be considered. However, with the high amount of spike in order 1, it is evident that there is a need to differentiate the data by the first order.

4 Augmented Dickey-Fuller Test using R

The following table is our R output:

Data17.PNG

Using the augmented dickey fuller test on the differenced first order data, the non-stationarity is rejected. Hence the hypothesis is stationary as a requirement for using the ARIMA model. Using the auto.arima function, the value of ARIMA(3,1,2) is reflected. This implies that P, D and Q are 3,1 and 2 respectively. This model has an AR(3), I(1) and MA(2).