Difference between revisions of "ANLY482 AY2017-18T2 Group06 Project Overview"

From Analytics Practicum
 
(15 intermediate revisions by the same user not shown)
Line 31: Line 31:
 
 
 
 
 
<br>  
 
<br>  
Proprietary trading has long relied on computers to help automate and execute trades. Data scientists, more commonly known on Wall Street as Quants, have developed huge statistical models for this automation. These models, though complex, are somewhat static, and as the market changes, which is common in financial markets, they do not work as well as they did in the past.
+
The ability to understand and visualize price movements of foreign exchange rates plays an important role in discovering insights for trading companies. Price movements based on market fundamentals are not sufficient to understand the irrational market behaviors in short time frames such as seconds or minutes movements. To address this problem, better models are required to give more insights to learn about the price patterns. This allows us to better understand the currency pair, US Dollar to Japanese Yen movement and to discover actionable insights based on the two techniques used in our paper: Technical Analysis and Time Series Forecasting.
 
   
 
   
<br>
+
Using the existing market data that pH7 has collected, this research study aims to share with you our journey through this research process to understand the currency price movements. The research study starts with an overview of the business and research motivations to understand the trends within the dollar yen in different time frames and time periods. Through our consolidated findings and the ARIMA model to forecast price movements for the USD/JPY, we hope to be able to find actionable insights to advise our client on a better approach to tackle this currency pair.
As technology advances, we are entering an era of Artificial Intelligence and Machine Learning. Systems can analyse large amounts of data at enormous speed and improve themselves in the process. Evolutionary computation and deep learning are seen as able to automatically recognize changes in the market and adapt in ways the previous statistical models cannot.
+
 
  
 
<br>
 
<br>
Line 41: Line 41:
 
&nbsp;
 
&nbsp;
  
==<div style="background: #708090; line-height: 0.5em; font-family:'Century Gothic';  border-left: #2E5593 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF">MOTIVATION</font></div></div>==
+
==<div style="background: #708090; line-height: 0.5em; font-family:'Century Gothic';  border-left: #2E5593 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF">MOTIVATION & OBJECTIVES</font></div></div>==
 
 
The team’s motivation for doing this project is primarily an interest in undertaking a challenging project in an area of research that has been a hot topic in the finance industry: algorithmic-centric funding. The opportunity to learn and put into practice a new area of machine learning not covered in our academics was appealing. Algorithmic-centric funding is expected to play a huge role in automated system trades, causing a notable shift in the trading markets. The opportunity pH7 Global has given us allows us to tap into their expertise in trading financial instruments and to utilise the past market data they have collected. This gives us a whole new experience of applying analytics in the financial markets.
 
 
 
&nbsp;
 
 
 
==<div style="background: #708090; line-height: 0.5em; font-family:'Century Gothic';  border-left: #2E5593 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF">OBJECTIVES</font></div></div>==
 
  
 +
The price movements of foreign exchange rate currency pairs have always been an instrument of focus by financial institutions and investors.
  
Utilising the minute tick data from our sponsor, we would like to discover useful and practical insights which will allow traders to make more informed trading decisions. We will develop a predictive model for the currency pair.
+
Currently, pH7 views technical analysis models through their brokerage provided dashboards which do not deliver any combined analysis across more than one technical analysis model or provide any form of suggested trading action they should take. They expressed an interest in using Bollinger Bands together with Relative Strength Index (RSI) to better understand the price movement patterns.
  
The team and our sponsor pH7 Global have identified 2 areas of focus for this project:
+
Therefore, we intend to use technical analysis-Bollinger Bands, RSI and Time Series Forecasting- ARIMA method to analyze price movements and provide a form of trading action which they could adopt. Our objective is to develop a simple and yet useful R-Markdown file that our sponsor would be able to edit and deploy to generate insights for his future trade executions.
  
1. Preliminary Data Analysis and Information Research
+
With our methodologies used to deduce these insights, this would allow them to forecast future trends and behaviors in the financial markets.
<br>
 
2. Predictive Algorithm Modeling and Strategy Testing
 
  
At the end of the project, the team aims to design a unique predictive model from the data insights discovered during the analysis.
 
 
&nbsp;
 
&nbsp;
  
 
==<div style="background: #708090; line-height: 0.5em; font-family:'Century Gothic';  border-left: #2E5593 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF">METHODOLOGY</font></div></div>==
 
==<div style="background: #708090; line-height: 0.5em; font-family:'Century Gothic';  border-left: #2E5593 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF">METHODOLOGY</font></div></div>==
Our methodology will be a 5 step approach to data prediction, explanation modelling for USD/JPY 1 minute chart.
+
Our methodology will be a 5-step approach for the analysis on the time series data for foreign exchange currency pairs.
 
<br>
 
<br>
 
===<div style="font-family:'Century Gothic';">Exploratory Segment</div>===
 
===<div style="font-family:'Century Gothic';">Exploratory Segment</div>===
 
 
<b>1. Data Collection</b> <br>
 
<b>1. Data Collection</b> <br>
 
At the initial phases of data collection, we must ensure that we have the sufficient fields that are needed for modelling in the later stage.  <br>
 
At the initial phases of data collection, we must ensure that we have the sufficient fields that are needed for modelling in the later stage.  <br>
 
<br>
 
<br>
 
<b>2. Data Cleaning + Transformation</b> <br>
 
<b>2. Data Cleaning + Transformation</b> <br>
In the data cleaning and transformation phase, the data would be tweaked into necessary statistical and analytics parameters necessary for prediction later. <br>
+
In the data cleaning and transformation phase, the data would be tweaked into necessary statistical and analytics parameters necessary for running analysis models later. <br>
 
<br>
 
<br>
 
<b>3. Initial Data Exploration</b> <br>
 
<b>3. Initial Data Exploration</b> <br>
In this area, the data would be initially explored and we would determine the approach of modelling based on the nature of the dataset. Necessary preparations such as checking for multicollinearity of the variables would be taken into consideration before modelling of the variables would be done. Due to the nature of our dataset, careful data exploration must be done.
+
In this area, the data would be initially explored, and we would determine the approach of analysis model based on the nature of the dataset. The nature of our dataset focuses on time series and price related movements, careful data exploration must be done to understand the best tools to use.
</p>
 
  
 
===<div style="font-family:'Century Gothic';">Iterative Segment</div>===
 
===<div style="font-family:'Century Gothic';">Iterative Segment</div>===
<b>4. Model Building</b> <br>
+
<b>4. Selecting and Deploying the Analysis Model</b> <br>
Creating model, determining predictor and target variables. In this area, we would be experimenting with multiple different approaches based on our initial understanding of the dataset after the exploration. It could range from visualizations to machine learning algorithms to achieve the objectives of our client. <br>
+
In this area, we would be experimenting with multiple different analysis approaches based on our initial understanding of the dataset after the exploration. It could range from forecasting to technical analysis, discovering seasonal trends and visualizations to uncover time series patterns to achieve the objectives of our client. <br>
 
<br>
 
<br>
 
<b>5. Model Validation</b> <br>
 
<b>5. Model Validation</b> <br>
We would be proposing a multi-variate methodology of sampling data in order to validate our model. In this aspect, we would be using the 3 way of approach of model validation called “train, test and validate”. Due to the nature of the project, we would like to avoid overfitting and bias in our models. Hence, we will be aiming for a more rigorous testing process with a larger sample data size to avoid such issues. <br>
+
We would be proposing a multi-variate methodology of sampling data to validate our analysis model. In this aspect, we would be using the 2-way of approach of model validation called “train and test”. <br>
 
<br>
 
<br>
We would also be using benchmark metrics to test our predictive modelling to ensure that it is satisfactory. Should it not be satisfactory, we would go back to phase 4 of model building or phase 2 to rebuild the model till the results is satisfactory. <br>
+
We would also be using benchmark metrics to test our analysis models to ensure that it is satisfactory. Should it not be satisfactory, we would go back to phase 4 of model building or phase 2 to rebuild the model till the results is satisfactory. <br>
  
 
+
==<div style="background: #708090; line-height: 0.5em; font-family:'Century Gothic';  border-left: #2E5593 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF">REFERENCES</font></div></div>==
==<div style="background: #708090; line-height: 0.5em; font-family:'Century Gothic';  border-left: #2E5593 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF">DATA</font></div></div>==
+
AS, B., & SK, R. (2015). Exchange Rate Forecasting using ARIMA, Neural Network and Fuzzy Neuron. Retrieved from https://pdfs.semanticscholar.org/c229/b2436364db18b9fb51cd2974b1b4d6766f02.pdf.<BR>
 
 
===<div style="font-family:'Century Gothic';">Data Source</div>===
 
<p style="padding-left: 1cm;">
 
 
 
The dataset given to us includes multiple timeframes of time series data covering the same 2-year period: 1st July 2015 to 30th June 2017.<BR>
 
 
<BR>
 
<BR>
The data fields include:<BR>
+
B. (2017). Monetary Policy. Retrieved from https://www.boj.or.jp/en/mopo/mpmdeci/mpr_2017/index.htm/<BR>
- Timestamp (timestamp of the data)
 
 
<BR>
 
<BR>
- High (High point of the currency pair for the minute)
+
BAASHER, A. A., & FAKHR, M. W. (n.d.). FOREX Trend Classification using Machine Learning Techniques. Retrieved from https://pdfs.semanticscholar.org/3c2f/cbcb9bdc0205e924c0f2518d01864da8979a.pdf <BR>
 
<BR>
 
<BR>
- Low (Low point of the currency pair for the minute)
+
Balsara, N. J., Chen, G., & Zheng, L. (2007). The Chinese stock market: An examination of the random walk model and technical trading rules. Quarterly Journal of Business & Economics, 46(2), 43–63. <BR>
<BR>
 
- Open (open price of the currency pair for the minute)
 
<BR>
 
- Close (closing price of the currency pair for the minute)
 
<BR>
 
[[Image:Data1.PNG|centre|600px|]]
 
 
 
To access our client’s database, we used R code in RStudio to directly access the AWS servers and retrieve the data as needed for our analysis. This gave us the flexibility of choosing the time periods we want to work with for our analysis.
 
 
 
Retrieving 2 years' worth of minute tick data for 1 currency pair yields close to 750,000 rows.
 
 
 
[[Image:Data2.PNG|centre|600px|]]
 
 
 
Below are initial Tableau visualizations of 2 sets of the data, using the original data without any transformation:
 
[[Image:Data3.PNG|centre|600px|]]
 
[[Image:Data4.PNG|centre|600px|]]
 
 
 
Initial observations of the data revealed an apparently incomplete dataset; the gaps turned out to be weekends, when the market is closed. Future analysis of the data will take this into consideration.
 
 
 
Attempting to visualize minute tick data is restricted to a maximum of 1-month time periods due to the volume of data. The result of this visualization is as shown below:
 
[[Image:Data5.PNG|centre|600px|]]
 
 
 
===<div style="font-family:'Century Gothic';">Data Cleaning and Preparation</div>===
 
 
 
For our data cleaning and preparation, we used the following software to both visualize and ETL the data into other forms:
 
1. JMP Pro
 
2. Tableau
 
3. SQL Server Data Tools 2015 (MSSQL)
 
4. Microsoft Excel
 
 
 
Through the visualization seen earlier in the report, we realized that there is a need to perform data transformation to visualize all the data. Therefore, the data was prepared with MSSQL instead to produce a ‘day aggregated’ data-set for analysis on the day-time period basis.
 
 
 
Our initial methodology was to use the clustering method to identify clusters which could be treated as baskets for investment. As the currency values of USDJPY and the rest are vastly different, there was a need to transform into percentage change and standard deviation for clustering.
 
 
 
However, our client does not have 15-20 or more currency pairs in their database. Hence, we would be focusing on forecasting with these 5 currency pairs. Our team used the ARIMA forecasting method and thus the data transformation method would not be required as the ARIMA model uses its own unique method to transform data.
 
 
 
The image below shows the result of our first data transformation.
 
[[Image:Data5.PNG|centre|600px|]]
 
 
 
We performed data transformation to allow us to visualize the data differently and derive new insights from it:
 
[[Image:Data5.PNG|centre|600px|]]
 
[[Image:Data5.PNG|centre|600px|]]
 
 
 
As seen in the visualization of the data of the same currency pair and time period, we can see the trends and price movements for the entire time period of 2 years for USDJPY data.
 
 
 
This led to additional discoveries: we observed significant shifts in the price movements and their variations throughout the time period. It also allows us to visually compare across multiple currency pairs to spot any prominent similarities and trends between them.
 
Although nothing of significance was identified through the visualization charts as shown below, we could identify periods of time which could increase the granularity of the data points to allow deeper analysis for our forecasting.  
 
 
 
 
 
&nbsp;
 
 
 
 
 
==<div style="background: #708090; line-height: 0.5em; font-family:'Century Gothic';  border-left: #2E5593 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF">SCOPE OF WORK</font></div></div>==
 
We intend to adopt the following steps in our analysis:<BR>
 
 
<BR>
 
<BR>
• Discover insights within the provided data
+
Brewer, M. J., Butler, A., & Cooksley, S. L. (n.d.). The Relative Performance of AIC, AICC and BIC in the Presence of Unobserved Heterogeneity. Retrieved from https://besjournals.onlinelibrary.wiley.com/doi/abs/10.1111/2041-210X.12541.<BR>
 
<BR>
 
<BR>
• To collect and ensure the data of currency pair is relevant in modelling
+
Butler, M., & Kazakov, D. (2010). Particle swarm optimization of Bollinger Bands. In Swarm Intelligence (pp. 504–511), Springer, Berlin.<BR>
 
<BR>
 
<BR>
• Ensure accuracy of data by checking for multicollinearity during data exploration stage 
+
Jebb, A. T., Tay, L., Wang, W., & Huang, Q. (2015). Time series analysis for psychological research: Examining and forecasting change. <BR>
 
<BR>
 
<BR>
• Identification of approaches that range from visualization to machine learning algorithms to determine predictor and target variables
+
J Hyndman, R. (n.d.). ARIMA modelling in R. Retrieved from https://www.otexts.org/fpp/8/7 <BR>
 
<BR>
 
<BR>
• Validate model through “train, test and validate”
+
Kamruzzamana, J. and Sarkerb, R. A. (2003). Comparing ANN Based Models with ARIMA for Prediction of Forex Rates . Retrieved from https://pdfs.semanticscholar.org/959e/dc19a0dfdc94464ac7d6d1f0e2927000d565.pdf <BR>
 
<BR>
 
<BR>
• Use a large sample data to prevent overfitting and bias in our model
+
Kiiski, J. (2009). PERFORMANCE OF RSI INVESTMENT STRATEGY ON FOREIGN EXCHANGE MARKETS. Retrieved from https://besjournals.onlinelibrary.wiley.com/doi/abs/10.1111/2041-210X.12541. <BR>
 +
<BR>
 +
Kuepper, J. (n.d.). Technical Analysis: Indicators And Oscillators. Retrieved from https://www.investopedia.com/university/technical/techanalysis10.asp#ixzz5B2AU2GDa <BR>
 
<BR>
 
<BR>
• Design a unique predictive model
+
Nau, R. (2017, December 14). Identifying the numbers of AR or MA terms in an ARIMA model. Retrieved from https://people.duke.edu/~rnau/411home.htm <BR>
 
<BR>
 
<BR>
• Utilisation of benchmark metrics to test the success rate of the predictive model
+
Petrusheva, N., & Jordanoski, I. (2016). COMPARATIVE ANALYSIS BETWEEN THE FUNDAMENTAL AND TECHNICAL ANALYSIS OF STOCKS. Retrieved from http://scindeks-clanci.ceon.rs/data/pdf/2334-735X/2016/2334-735X1602026P.pdf <BR>
 
<BR>
 
<BR>
• It is important to note that the scope of the project is versatile and can be furthered to address additional questions pH7 might have on the dataset
+
S. (2017). April 2017 Current Events: U.S. News. Retrieved from https://www.infoplease.com/world/2017-current-events/april-2017-current-events-us-news <BR>
 
<BR>
 
<BR>
 
+
Williams, O. D. (2006). Empirical Optimization of BBsFor Profitability. 1-72. Retrieved from file:///C:/Users/User/Downloads/etd2519 (1).pdf. <BR>
==<div style="background: #708090; line-height: 0.5em; font-family:'Century Gothic';  border-left: #2E5593 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF">REFERENCES</font></div></div>==
 
Rise of the billionaire robots: how algorithms have redefined hedge funds. (2016, May 15). Retrieved from https://www.theguardian.com/business/us-money-blog/2016/may/15/hedge-fund-managers-algorithms-robots-investment-tips<BR>
 
 
<BR>
 
<BR>
Satariano, A., & Kumar, N. (2017, September 27). The Massive Hedge Fund Betting on AI. Retrieved from https://www.bloomberg.com/news/features/2017-09-27/the-massive-hedge-fund-betting-on-ai
 
 
 
<!--Body End-->
 
<!--Body End-->

Latest revision as of 02:28, 14 April 2018



 
The ability to understand and visualize price movements of foreign exchange rates plays an important role in discovering insights for trading companies. Market fundamentals alone are not sufficient to explain irrational market behavior over short time frames, such as movements within seconds or minutes. To address this problem, better models are required to give more insight into price patterns. Such models allow us to better understand the movement of the US Dollar to Japanese Yen (USD/JPY) currency pair and to discover actionable insights based on the two techniques used in our paper: Technical Analysis and Time Series Forecasting.

Using the existing market data that pH7 has collected, this research study shares our journey through the research process of understanding currency price movements. The study starts with an overview of the business and research motivations for understanding trends in the dollar-yen pair across different time frames and time periods. Through our consolidated findings and an ARIMA model that forecasts price movements for USD/JPY, we hope to find actionable insights with which to advise our client on a better approach to this currency pair.




 

MOTIVATION & OBJECTIVES

The price movements of foreign exchange currency pairs have always been a focus of financial institutions and investors.

Currently, pH7 views technical analysis models through brokerage-provided dashboards, which neither combine analysis across more than one technical analysis model nor suggest any trading action to take. They expressed an interest in using Bollinger Bands together with the Relative Strength Index (RSI) to better understand price movement patterns.

Therefore, we intend to use technical analysis (Bollinger Bands and RSI) and time series forecasting (the ARIMA method) to analyze price movements and suggest a trading action that pH7 could adopt. Our objective is to develop a simple yet useful R Markdown file that our sponsor can edit and deploy to generate insights for future trade executions.
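As a rough illustration of combining the two indicators into one suggested action, the sketch below computes Bollinger Bands and RSI from a list of closing prices and flags a signal only when both agree. It is written in plain Python for illustration and is not the team's R Markdown deliverable. The 20-period window, 2-standard-deviation bands, 14-period RSI, the 30/70 thresholds, the simple-average form of RSI (rather than Wilder's smoothed form), and all function names are assumptions, not values taken from the project.

```python
from statistics import mean, pstdev


def bollinger_bands(closes, window=20, k=2.0):
    """Return (middle, upper, lower) bands from the last `window` closes."""
    if len(closes) < window:
        raise ValueError("not enough prices for the chosen window")
    recent = closes[-window:]
    middle = mean(recent)
    spread = k * pstdev(recent)  # k standard deviations around the moving average
    return middle, middle + spread, middle - spread


def rsi(closes, period=14):
    """Simple-average RSI (0 to 100) over the last `period` price changes."""
    if len(closes) < period + 1:
        raise ValueError("not enough prices for the chosen period")
    recent = closes[-(period + 1):]
    changes = [b - a for a, b in zip(recent, recent[1:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        # No losses in the window: fully overbought, or neutral if prices were flat.
        return 100.0 if avg_gain > 0 else 50.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def combined_signal(closes, window=20, k=2.0, period=14,
                    oversold=30.0, overbought=70.0):
    """Suggest 'buy', 'sell' or 'hold' only when both indicators agree."""
    _, upper, lower = bollinger_bands(closes, window, k)
    strength = rsi(closes, period)
    last = closes[-1]
    if last < lower and strength < oversold:
        return "buy"    # price below the lower band and RSI oversold
    if last > upper and strength > overbought:
        return "sell"   # price above the upper band and RSI overbought
    return "hold"
```

Requiring agreement between the band breakout and the RSI threshold is one possible way to reduce false signals from either indicator alone; whether it suits USD/JPY minute data would need to be tested.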

The methodologies used to derive these insights would in turn allow the sponsor to forecast future trends and behaviors in the financial markets.

 

METHODOLOGY

Our methodology will be a 5-step approach to analyzing time series data for foreign exchange currency pairs.

Exploratory Segment

1. Data Collection
In the initial phase of data collection, we must ensure that we have all the fields needed for modelling at a later stage.

2. Data Cleaning + Transformation
In the data cleaning and transformation phase, the data would be transformed into the statistical and analytical parameters needed for running analysis models later.
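One transformation of this kind, described in an earlier revision of this page, is aggregating minute data into a 'day aggregated' dataset. The sketch below shows the idea in plain Python, assuming each row carries the fields listed for the dataset (timestamp, open, high, low, close) and that rows are sorted by timestamp. The team reported doing this step in MSSQL; the function name, the dictionary layout, and the use of the timestamp's calendar date as the day boundary are assumptions made for illustration.

```python
from datetime import datetime


def aggregate_daily(rows):
    """Collapse minute bars into one open/high/low/close bar per calendar day.

    `rows` is an iterable of dicts with keys 'timestamp' (a datetime),
    'open', 'high', 'low' and 'close', assumed sorted by timestamp.
    Days with no rows (for example weekends) simply do not appear.
    """
    daily = {}
    for row in rows:
        day = row["timestamp"].date()
        bar = daily.get(day)
        if bar is None:
            # First minute of the day sets the open and seeds the other fields.
            daily[day] = {
                "open": row["open"],
                "high": row["high"],
                "low": row["low"],
                "close": row["close"],
            }
        else:
            bar["high"] = max(bar["high"], row["high"])
            bar["low"] = min(bar["low"], row["low"])
            bar["close"] = row["close"]  # last minute seen so far sets the close
    return daily
```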

3. Initial Data Exploration
In this phase, the data would be explored initially, and we would choose the analysis model based on the nature of the dataset. Because our dataset consists of time series of price movements, careful data exploration must be done to identify the best tools to use.

Iterative Segment

4. Selecting and Deploying the Analysis Model
In this phase, we would experiment with multiple analysis approaches based on our initial understanding of the dataset after exploration. These could range from forecasting and technical analysis to discovering seasonal trends and building visualizations that uncover time series patterns, in order to achieve the objectives of our client.
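To make the forecasting option concrete, here is a minimal sketch of the simplest non-trivial ARIMA variant, ARIMA(1,1,0) without a constant: the series is differenced once, and each difference is modelled as a multiple `phi` of the previous difference. The coefficient is estimated by least squares on the differenced series. This is a hand-rolled Python illustration under those assumptions, not the model the team fitted; order selection, diagnostics, and the function names are outside what the source describes.

```python
def fit_arima_110(series):
    """Estimate phi in d[t] = phi * d[t-1] + e[t], where d is the first difference."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    diffs = [b - a for a, b in zip(series, series[1:])]
    # Least-squares slope of each difference on the one before it (no intercept).
    num = sum(cur * prev for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    return num / den if den else 0.0


def forecast_arima_110(series, phi, steps=1):
    """Iterate the fitted recursion forward and undo the differencing."""
    last = series[-1]
    last_diff = series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff   # predicted next change
        last = last + last_diff       # add it back to the price level
        forecasts.append(last)
    return forecasts
```

With `phi` equal to zero the forecast reduces to a random walk (the last observed price), which makes this form a convenient baseline before trying higher orders.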

5. Model Validation
We propose a multivariate methodology of sampling data to validate our analysis model. Specifically, we would use the two-way approach to model validation known as “train and test”.

We would also use benchmark metrics to test our analysis models and ensure that they are satisfactory. Should they not be satisfactory, we would return to phase 4 (model selection) or phase 2 (data cleaning and transformation) and rebuild the model until the results are satisfactory.
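A sketch of how a “train and test” validation with a benchmark metric could look for time series data is given below. The split is chronological rather than random, so the test set always lies after the training set. The 80/20 split, root mean squared error (RMSE) as the metric, and the 'no change' naive forecast as the benchmark are all assumptions for illustration; the source does not name the metrics the team used.

```python
from math import sqrt


def train_test_split(series, train_fraction=0.8):
    """Chronological split: the earliest observations train, the latest test."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def naive_forecasts(train, test):
    """One-step-ahead 'no change' benchmark: predict the previous actual value."""
    return [train[-1]] + list(test[:-1])
```

A candidate model would be considered satisfactory in this setup only if its RMSE on the test set is lower than that of `naive_forecasts`; otherwise the process returns to the earlier phases.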
