ANLY482 AY2017-18T2 Group06 Analysis Finding Finals


DATA PREPARATION

We configured a total of four functions to access our client’s brokerage platform, OANDA. These functions retrieve data from, and send data to, the brokerage on demand over a live connection. The considerations behind these functions include:

• Retrieval of historical data for a chosen time period, with the latest possible date being ‘now’.
• Ability to choose the granularity of the data retrieved, ranging from minute to daily to weekly.
• The specific financial instrument (currency pair) for which we want to retrieve data.
• Submission and modification of orders to the brokerage once the code decides on a buy/sell trading action.

The resulting functions that address all of the above considerations are listed below; a hypothetical usage sketch follows the list:

1. ActualPriceV20: Returns the current bid/ask price of the chosen currency pair
2. AccountInfoV20: Returns account information (balance, profit & loss)
3. AccountPositionsV20: Returns all open trade positions in chosen trading account
4. HisPricesV20: Returns the currency pair’s information according to parameters
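
The sketch below illustrates how these functions fit together. It is a hypothetical usage example: the argument names and order are assumptions for illustration only, not the exact signatures defined in our project code.

```r
# Hypothetical usage sketch: argument names are illustrative assumptions,
# not the exact signatures of the project wrappers.
price     <- ActualPriceV20(accountID, token, instrument = "USD_JPY")   # current bid/ask
account   <- AccountInfoV20(accountID, token)                           # balance, profit & loss
positions <- AccountPositionsV20(accountID, token)                      # open trade positions

# Historical candles at a chosen granularity ("M1" = 1 minute ... "W" = weekly),
# from a chosen start date up to "now"
history <- HisPricesV20(granularity = "M1", instrument = "USD_JPY",
                        start = "2017-01-03", end = "2017-01-04",
                        accountID = accountID, token = token)
```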

TECHNICAL ANALYSIS: BOLLINGER BANDS & RELATIVE STRENGTH INDEX

For the USD/JPY currency pair, we use Bollinger Bands (BBs) and the Relative Strength Index (RSI) to predict price movements and deliver a trading action based on their analysis.

The BBs consist of two bands placed two standard deviations above and below a center line, which is a 20-day simple moving average of the prices. The widening and contraction of the bands correspond directly to increases and decreases in volatility, since standard deviation is a measure of volatility. One pattern watched by many traders, including our client, is prices moving significantly closer to the bands, indicating an overbought or oversold market. In such situations, traders might consider selling when the market is overbought and buying when it is oversold.

Bbformula.png
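
For reference, the Bollinger Band formula in the figure follows the standard definition, written here with n periods and k standard deviations (the text above uses n = 20 and k = 2):

\[
\text{Middle}_t = \mathrm{SMA}_n(P_t), \qquad
\text{Upper}_t = \mathrm{SMA}_n(P_t) + k\,\sigma_n(P_t), \qquad
\text{Lower}_t = \mathrm{SMA}_n(P_t) - k\,\sigma_n(P_t)
\]

where \(\sigma_n(P_t)\) is the n-period standard deviation of the price \(P_t\).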

The Relative Strength Index (RSI) is a momentum indicator which compares the magnitude of recent gains and losses over a specified period of time to measure the speed and change of price movements. The primary use of the RSI is to identify an overbought or oversold market.

RSIformula.png
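
For reference, the RSI formula in the figure follows the standard definition (n is the lookback period, conventionally 14):

\[
\mathrm{RS} = \frac{\text{average gain over the last } n \text{ periods}}{\text{average loss over the last } n \text{ periods}},
\qquad
\mathrm{RSI} = 100 - \frac{100}{1 + \mathrm{RS}}
\]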


The BBands() and RSI() functions from the TTR package were used with minute data from 3rd January 2017 to 4th January 2017. The day is split into four quarters to better visualize the Bollinger Bands, with basic descriptive analysis for each quarter. Thereafter, the Bollinger Band width is plotted to analyze the day’s volatility, followed by the Relative Strength Index section. A code sketch of this workflow follows the package list below.

BBs R Package Library

xts: eXtensible time series, to prepare the time series data
TTR: to use the BBands() function to create the Bollinger Bands data
ggplot2: to create data visualizations for the Bollinger Band charts
HisPricesV20: function created for this project to retrieve historical data from the OANDA brokerage servers
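
A minimal sketch of this workflow is shown below, assuming (as an illustration) that HisPricesV20() returns a data frame with Time and Close columns; the actual argument and column names in our implementation may differ.

```r
library(xts)      # eXtensible time series
library(TTR)      # BBands()
library(ggplot2)  # charting

# Assumed output of our project wrapper: a data frame with Time and Close columns
raw    <- HisPricesV20(granularity = "M1", instrument = "USD_JPY",
                       start = "2017-01-03", end = "2017-01-04")
prices <- xts(raw$Close, order.by = as.POSIXct(raw$Time))

# 20-period simple moving average with bands at 2 standard deviations
bb <- BBands(prices, n = 20, sd = 2)   # columns: dn, mavg, up, pctB

# Combine for plotting; the band width can also be derived from the bands
df <- data.frame(time  = index(prices),
                 price = as.numeric(prices),
                 coredata(bb))
df$width <- (df$up - df$dn) / df$mavg   # one common definition of BB width

ggplot(df, aes(x = time)) +
  geom_line(aes(y = price)) +
  geom_line(aes(y = up), linetype = "dashed") +
  geom_line(aes(y = dn), linetype = "dashed") +
  labs(y = "USD/JPY", title = "Bollinger Bands on 1-minute data")
```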

BBs Minute Data Analysis

BBQ1.png

From 0800 to 0930 hours, the market is more volatile owing to the GMT+8 market opening. Forex traders tend to be active during market opening hours, causing more volatility, as they trade on information from the US market overnight, which affects currency prices. Hence, traders could take advantage of this situation. Volatility decreases past 1030 hours once the market stabilizes.

BBQ2.png

At 1500 hours, a very strong uptrend caused the price to spike from approximately 117.3 to approximately 118.2 by 1800 hours. Whenever such a strong uptrend or downtrend occurs, the price touches the upper or lower band of the BBs respectively. It can also be noted that the bands expanded as the strong uptrend occurred. Nearing 1800 hours, at an approximate price of 118.1, the trend reflects less volatility and the bandwidth decreases, with the price eventually climbing to 118.3 at 2000 hours.

BBQ3.png

The US market opens at 2000 hours, a time usually marked by the highest trading volumes. The bands have a larger bandwidth in this period than in the previous two quarters, as seen from the difference in the Y-axis scale across the charts. 2300 hours marks the sharpest bullish trend of the day, reflecting an approximate 0.6 increase in price in less than an hour. After 2300 hours, the bandwidth expanded to almost 1.0 within the next few minutes, signifying a period of extreme volatility. In the hour after the peak there was a reversal in the direction of the price, followed by a drop to 117.6 at 0200 hours the next day.

BBQ4.png

At 0200 hours, there was a minor price drop from 117.5 to 117.3, followed by a reversal and an increase to approximately 117.6 for the rest of the day. The main takeaway, however, is the difficulty of analysis with this chart alone. Hence, the BB width is plotted to analyze volatility across the day.

BBWIDTH.png

From the chart, the BB width from 0800 to 1300 hours is relatively low, below 0.002. From 1300 hours onwards, there are four occurrences of the width exceeding 0.002. At 2300 hours, there is one large spike exceeding a width of 0.006, and at 0000 hours another large spike is observed, followed by a decrease in volatility as time passes.

RSI R Package Library

xts: eXtensible time series, to prepare the time series data
TTR: to use the RSI() function to prepare the Relative Strength Index data
ggplot2: to create data visualizations for the RSI charts
HisPricesV20: function created for this project to retrieve historical data from the OANDA brokerage servers
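
A minimal sketch of the RSI preparation, continuing from the hypothetical prices series built in the Bollinger Bands sketch above (the 14-period lookback is the conventional default and an assumption here):

```r
library(xts)
library(TTR)
library(ggplot2)

# prices: xts of minute closes from the earlier sketch
rsi <- RSI(prices, n = 14)

# Flag overbought / oversold minutes using the 80/20 thresholds
signals <- data.frame(time = index(prices), rsi = as.numeric(rsi))
signals$state <- ifelse(signals$rsi >= 80, "overbought",
                 ifelse(signals$rsi <= 20, "oversold", "neutral"))

ggplot(signals, aes(x = time, y = rsi)) +
  geom_line() +
  geom_hline(yintercept = c(20, 80), colour = "red") +
  geom_hline(yintercept = c(30, 70), colour = "blue") +
  labs(y = "RSI", title = "RSI on 1-minute data")
```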

RSI Minute Data Analysis

RSI min.png

This plot shows the timestamps where the RSI is deemed overbought or oversold. The red lines indicate an RSI upper limit of 80 and a lower limit of 20, while the blue lines indicate RSI levels of 70 and 30 respectively. Our focus is on the timings of overbought or oversold conditions as potential entries. The first point to note is that there are plenty of such occurrences in minute tick data, as the RSI is subject to large price spikes that lead to false signals. Hence, we use the 80/20 thresholds.

BB & RSI ANALYSIS

BB&RSI.PNG

The plot above shows the recommended buy or sell actions based on the two technical indicators. A red circle on the chart represents an overbought situation, where the recommended action is to sell at the indicated price. A green circle represents an oversold situation, where the recommended action is to buy at the indicated price. The table shows the time at which the action should be taken, the condition identified by both technical indicators, and the recommended buy or sell trading action.
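
A minimal sketch of how the two indicators might be combined into a single recommendation is shown below. The exact combination rule is an assumption consistent with the description above: sell when the price closes above the upper band while RSI >= 80, buy when it closes below the lower band while RSI <= 20.

```r
library(xts)
library(TTR)

# prices: xts of minute closes from the earlier sketches
bb  <- BBands(prices, n = 20, sd = 2)
rsi <- RSI(prices, n = 14)

p  <- as.numeric(prices)
up <- as.numeric(bb[, "up"])
dn <- as.numeric(bb[, "dn"])
r  <- as.numeric(rsi)

# Assumed combination rule: both indicators must agree
action <- ifelse(p > up & r >= 80, "SELL",
          ifelse(p < dn & r <= 20, "BUY", NA))

# Table of recommended trading actions with timestamps and prices
actions <- data.frame(time = index(prices), price = p, rsi = r, action = action)
actions <- actions[!is.na(actions$action), ]
head(actions)
```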

TIME SERIES FORECASTING-ARIMA

ARIMA analyses the autocorrelation within the price data to identify patterns, with additional consideration for seasonality and for making the data stationary by differencing. We chose ARIMA because it showed better forecasting ability in our literature review, while our technical analysis with BBs and the Relative Strength Index (RSI) performs better over short time periods such as minute tick data. ARIMA relies on past values and is expected to work better than BBs and RSI on daily and weekly data. We will be performing ARIMA on three different sets of data: minute, daily and weekly.

ARIMA MODEL WITH MINUTES DATA

The data is from 3rd January 2017 to 4th January 2017 (the first half of the day), extracted from the OANDA brokerage.

ARIMA R Package Library

xts: eXtensible time series, to prepare the time series data
forecast: the main package used to prepare the ARIMA data, including the creation of ACF and PACF plots, seasonal adjustment with seasadj(), and auto.arima() and Arima() for modeling and forecasting
tseries: to use adf.test() for the Augmented Dickey-Fuller test
HisPricesV20: function created for this project to retrieve historical data from the OANDA brokerage servers

Decomposition Chart

Decompmin.png

Based on the decomposition, there is frequent seasonality: 48 seasonal cycles in the day of data from 2nd to 3rd January 2017. This works out to one season every 30 minutes for the USD/JPY currency pair, likely due to the popularity of the 30-minute chart in intraday trading. With such strong seasonality involved, we remove the seasonal component using the seasadj() function in R before proceeding to the next step.

Decompmin2.png

The null hypothesis is that a unit root is present in the time series; the alternative hypothesis is that the series is stationary. The test statistic is -1.2466, which does not pass the threshold of -2.87 at the 5% significance level. Hence the null hypothesis of a unit root being present is not rejected. The functions used above are: seasadj() from the forecast package, which removes the seasonal component of a dataset (input: the decomposed data), and adf.test() from the tseries package, which performs the Augmented Dickey-Fuller test for the null hypothesis of a unit root in a time series object.
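
A minimal sketch of this preparation step, assuming the minute closes from the earlier sketches and a seasonal frequency of 30 (one season per 30 minutes, as identified in the decomposition):

```r
library(forecast)  # seasadj()
library(tseries)   # adf.test()

# Minute closes as a seasonal time series; frequency 30 matches the
# one-season-per-30-minutes pattern seen in the decomposition
price_ts <- ts(as.numeric(prices), frequency = 30)

# Decompose and remove the seasonal component
decomp     <- stl(price_ts, s.window = "periodic")
deseasonal <- seasadj(decomp)

# Augmented Dickey-Fuller test: H0 = unit root (non-stationary)
adf.test(deseasonal, alternative = "stationary")
```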

The Autocorrelation Function Chart

ACFmin.png

The ACF is used to find the similarity between observations as a function of the time lag between them. The values at lag 2 and above are likely due to the propagation of lag 1, meaning that the partial autocorrelation function should be used to confirm the lags. The function above is from the forecast package: Acf(), which calculates the autocorrelation function values of a dataset (input: the moving average data).

The Partial Autocorrelation Function Chart

PACFmin.png

The PACF is a conditional correlation. It differs from the ACF in that the linear dependence on shorter lags is removed. Based on the PACF, the only significant value is at lag order 1, confirming the propagation of lag 1. The function above is from the forecast package: Pacf(), which calculates the partial autocorrelation function values of a dataset (input: the moving average data).

Lag Order 1 Deseasonal Chart

LOmin.png

The chart shows differencing of lag order 1, and stationarity can be observed. The function above is from the base package: diff(), which returns the differenced dataset (input: the dataset and the order of differencing).
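
A minimal sketch of the differencing and correlogram checks, continuing from the deseasonalized series above:

```r
library(forecast)
library(tseries)

# First-order differencing of the deseasonalized series
d1 <- diff(deseasonal, differences = 1)

# Re-check stationarity and inspect the correlograms of the differenced series
adf.test(d1, alternative = "stationary")
Acf(d1, main = "ACF of differenced series")
Pacf(d1, main = "PACF of differenced series")
```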

ACF Differenced Series

Pp111.PNG

The null hypothesis is that a unit root is present in the time series; the alternative hypothesis is that the series is stationary. In this case, the test statistic is -7.1651. Hence, the null hypothesis is rejected and the alternative hypothesis of stationarity is accepted.

Capture.PNG

The graph has many values that pass the boundary. Values at lag 2 and above are likely due to the propagation of lag 1, so we rely on the partial ACF to determine the lags.

PACF Differenced Series

Pp3.png

Model Residuals (Auto ARIMA)

Pp4.png

The function is used with the raw data. auto.arima() suggests an ARIMA(1,2,0)(1,0,0)[30] model, selecting the best model from the following four starting points: ARIMA(2,d,2), ARIMA(0,d,0), ARIMA(1,d,0) and ARIMA(0,d,1). However, this does not mean it is the ideal choice. Based on the PACF suggesting an AR order of 1, we try a variety of models: ARIMA(1,1,0), ARIMA(1,1,1), ARIMA(1,1,2), ARIMA(1,1,3), ARIMA(1,2,1), ARIMA(1,2,2) and ARIMA(1,2,3). The seasonal component suggests an order of (1,0,0), which we keep while trying out the other ARIMA variants in the following segments. The ACF and PACF of the residuals show good results, with no significant ACF or PACF values at any lag order.
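
A minimal sketch of fitting and comparing these candidate models by AICc, continuing from the price_ts series above (the seasonal order (1,0,0) with period 30 follows the auto.arima() suggestion):

```r
library(forecast)

# Baseline suggestion from auto.arima() on the minute series
auto_fit <- auto.arima(price_ts, seasonal = TRUE)

# Candidate non-seasonal orders suggested by the PACF (AR = 1)
orders <- list(c(1, 1, 0), c(1, 1, 1), c(1, 1, 2), c(1, 1, 3),
               c(1, 2, 1), c(1, 2, 2), c(1, 2, 3))

fits <- lapply(orders, function(o) {
  # Some orders fail to converge under maximum likelihood (see below),
  # so failures are caught and skipped
  tryCatch(Arima(price_ts, order = o, seasonal = c(1, 0, 0)),
           error = function(e) NULL)
})

# Compare AICc across the models that fitted successfully
data.frame(order = sapply(orders, paste, collapse = ","),
           AICc  = sapply(fits, function(f) if (is.null(f)) NA else f$aicc))
```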

Model Errors for other ARIMA models

However, the default maximum likelihood estimation method of ARIMA is unable to compute the models of order (1,2,1)(1,0,0) and (1,2,3)(1,0,0). Alternatively, we could use conditional sum of squares, but it does not generate AICc, AIC and BIC values.

Pp5.png

Based on the figure above, ARIMA(1,1,3)(1,0,0) has the smallest AICc value at -16518.32, so we select this model, contrary to the auto.arima() suggestion. The second smallest AICc value belongs to ARIMA(1,1,1)(1,0,0), which is close at -16516.35.

Testing and Cross Validation

To determine the performance of the model, we did a holdout and compared the forecasts to the actual values. This is done for both the selected ARIMA(1,1,3)(1,0,0) and the auto.arima() choice of ARIMA(1,2,0)(1,0,0).

Pp6.png

Mean squared error (MSE) measures the difference between the actual values and the forecast values. A smaller mean squared error indicates a better model fit.

From the chart, the forecast deviates slightly from the actual values. The calculated mean squared error between the forecast and the actual values is 0.009882969. The function used above is from the forecast package: forecast(), which produces forecasts from a fitted time series model (input: the model).
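
A minimal sketch of the holdout comparison for the selected model, assuming a 10% holdout at the end of the series (the same proportion used for the hourly splits below):

```r
library(forecast)

# Hold out the final 10% of observations for validation
n      <- length(price_ts)
h      <- ceiling(0.10 * n)
train  <- window(price_ts, end = time(price_ts)[n - h])
actual <- as.numeric(price_ts)[(n - h + 1):n]

# Fit the selected model on the training portion and forecast the holdout
fit <- Arima(train, order = c(1, 1, 3), seasonal = c(1, 0, 0))
fc  <- forecast(fit, h = h)

# Mean squared error between forecast and actual values
mean((actual - as.numeric(fc$mean))^2)
```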

Pp7.png

The forecast deviates slightly from the actual values, as seen above. The calculated mean squared error between the forecast and the actual values is 0.08584668. Comparing the two errors, ARIMA(1,1,3)(1,0,0) is selected as its mean squared error is lower. However, this was done on a whole-day dataset and is not granular enough. Hence, we split the data into one-hour sets to observe their individual holdouts.

We split the data into one-hour sets to prepare holdout periods for individual hours. A holdout of 10% is used for all the hourly data. The plots and the mean squared errors are shown below.

Pp8.PNG
PP999.PNG

The mean squared errors for each individual hour of the day are sorted in ascending order above. The 7pm forecast has the lowest mean squared error while the 11pm forecast has the highest.

FORECAST OF ARIMA

Pp10.png

The chart shows the forecast with ARIMA(1,1,3)(1,0,0). The forecast has a very small confidence interval range over the 10 predictions: 117.5935 to 117.6301 for the 80% confidence interval and 117.5838 to 117.6398 for the 95% confidence interval. In this case, the prediction is that the price will move upwards over the next 10 minutes.

ARIMA MODEL WITH DAILY DATA

We used USD/JPY prices from 1st January 2016 to 31st December 2017, extracted from the OANDA brokerage.

PACF

Dd1.png

This pattern indicates a high-order autoregressive term in the data, and the PACF should be used to determine the order of the autoregressive term.

Seasonally Adjusted Model

Dd2.png

Differencing needs to be done: we can see that the data is non-stationary, as the series wanders up and down for long periods of time.

ACF for Differenced Series

Dd3.png

Model Residuals for ARIMA (1,1,1)

Dd4.png

The auto.arima() function suggests an ARIMA(1,1,1) model, which we wish to compare against other orders for the most optimal choice. Based on the PACF suggesting an AR order of 1, we try a variety of models: ARIMA(1,1,0), ARIMA(1,1,1), ARIMA(1,1,2), ARIMA(1,1,3), ARIMA(1,2,0), ARIMA(1,2,1) and ARIMA(1,2,3).

Model Residuals for other ARIMA models

Dd5.png

The default maximum likelihood estimation method is unable to compute the models of order (1,2,1)(1,0,0) and (1,1,0)(1,0,0). Based on the table above, ARIMA(1,1,1) has the smallest AICc value at -1996.738.

Testing and Cross Validation

Dd6.PNG
Dd7.png

The above shows the mean squared errors recorded for each 50-day block of daily data, sorted in ascending order. The forecast for days 400-450 had the lowest mean squared error while days 50-100 had the highest.

Forecast of ARIMA

Dd8.png
Dd9.png

We ran the forecast with ARIMA(1,1,1). From the chart, the calculated errors are very small. Hence this is the recommended model for the daily USD/JPY data.

ARIMA (Daily Dataset) Analysis

The results for ARIMA on the daily dataset show that ARIMA’s forecasting is feasible, with 5 of the 10 validation holdouts returning a low MSE, although its performance pales in comparison to its use on the minute dataset.

ARIMA MODEL WITH WEEKLY DATA

The weekly dataset runs from 1st January 2014 to 31st December 2017, four years’ worth of data.

PACF

Ww1.png

The significant PACF value is at lag order 1, since it has a much higher significance than the other three correlations being considered.

Seasonally Adjusted Model

Ww2.png

The non-stationary data above will be differenced to make it stationary.

ACF Differenced

Ww3.png

The chart above shows a wave pattern that alternates between positive and negative correlations, so the PACF should be used to determine the autoregressive term.

PACF Differenced

Ww4.png

The differenced PACF continues to show that lag order 1 has much more significance than any other lag order.

Model Residuals (1,2,0)

Ww5.png
Ww6.png

The auto.arima() function suggests an ARIMA(1,2,0) model. As before, we make multiple model comparisons. Based on the PACF suggesting an AR order of 1, we try a variety of models: ARIMA(1,1,0), ARIMA(1,1,1), ARIMA(1,1,2), ARIMA(1,1,3), ARIMA(1,2,0), ARIMA(1,2,1) and ARIMA(1,2,3).

Model Residuals for other ARIMA Models

Ww7.png

The default maximum likelihood estimation method is unable to compute the models of order (1,2,0)(1,0,0), (1,2,1)(1,0,0) and (1,2,3)(1,0,0). Among the remaining models, ARIMA(1,1,1) has the smallest AICc value at -350.7329, making it the best ARIMA model from the comparison.
Similarly, we did a holdout and comparison against the actual values to see how well the model performs. This is done for the selected ARIMA(1,1,1)(1,0,0).

Testing and Cross Validation

Ww8.png

From the chart above, the forecast deviates slightly from the actual values. The model has a mean squared error of 1.02538875768055 between the forecast and the actual values, a very high value. In the next segment, we split the data into 50-week sets to see their individual holdouts. A holdout of 10% is done for each 50-week set.

Ww9.PNG
Ww10.png

Forecast from ARIMA (1,1,0)

The mean squared errors recorded for each 50-week block of weekly data above are sorted in ascending order. The results for weekly data are all rather high, indicating that ARIMA does not forecast weekly price data well.

ARIMA (Weekly Dataset) Analysis (1,1,0)

The results for ARIMA on the weekly dataset show that ARIMA’s forecasting of weekly data is not very convincing. The mean squared errors are all high and the forecasts tend to deviate from the actual values. The study results do not support the use of ARIMA for forecasting weekly USD/JPY data.

RECOMMENDATIONS

R1.png

From the table above, we can see that ARIMA works best on minute data, with significantly lower MSE compared to daily or weekly data. The MSE values for daily data are still reasonably good and indicate that ARIMA remains a possible model for USD/JPY forecasting. However, the MSE results for weekly data are significantly poorer, and we would not suggest using ARIMA on weekly currency pair data. The ARIMA results show how the model deteriorates as the number of periods being forecast increases. Thus, ARIMA is more suitable for short-term forecasting, performing well on minute data, satisfactorily on daily data, and poorly on weekly data.

Both technical indicators work well together on minute data to identify overbought and oversold signals. Deploying both technical indicators in R with a direct connection to near-live data from the brokerage opens the opportunity for our client to build further towards automated trading. Hence, we recommend using minute and daily data for ARIMA, and minute data for BB and RSI, to obtain a predicted trading action.

CONCLUSIONS

We have covered three methods in this study: Bollinger Bands, RSI and ARIMA. BBs and RSI are paired together because both technical indicators can identify overbought and oversold conditions. Using BBs and RSI together analyses both the volatility of the market alongside price movements and the momentum of the price movement itself, strengthening the overbought and oversold signals generated.

To overcome the limitation that our current analysis focuses only on the USD/JPY currency pair, future work could develop an R Shiny interactive platform where our client can change data parameters easily through an interface, such as choosing between short/medium/long-term moving averages, and receive new charts and forecasts based on their inputs. This would allow them to analyze currency pairs other than USD/JPY.

Future work can also explore ways to build upon ARIMA. One limitation is that ARIMA cannot match seasonality and patterns to the actual events that caused the price movements. With access to financial news data related to the currency pair under study, further research could match news to price movements and improve forecasting accuracy and the insights available.