ANLY482 AY2017-18T2 Group06 Project Overview

Proprietary trading has long relied on computers to automate and execute trades. Data scientists, more commonly known on Wall Street as Quants, have developed large statistical models for this purpose. These models, though complex, are largely static, and as the market changes, which is common in financial markets, they no longer work as well as they did in the past.

As technology advances, we are entering an era of Artificial Intelligence and Machine Learning. Systems can now analyse large amounts of data at enormous speed and improve themselves in the process. Techniques such as evolutionary computation and deep learning are seen as able to recognise changes in the market automatically and adapt in ways that the earlier statistical models cannot.

MOTIVATION

The team’s motivation for this project is primarily an interest in undertaking a challenging project in an area of research that has been a hot topic in the finance industry: Algorithmic-centric Funding. The opportunity to learn and put into practice an area of machine learning not covered in our coursework was appealing. Algorithmic-centric funding is expected to play a large role in automated trading, causing a notable shift in the trading markets. The opportunity pH7 Global has given us allows us to tap on their expertise in trading financial instruments and on the historical market data they have collected. This gives us a new experience of applying analytics in the financial markets.

OBJECTIVES

Using the minute tick data from our sponsor, we would like to discover useful and practical insights that will allow traders to make more informed trading decisions. We will develop a predictive model for a currency pair.

The team and our sponsor pH7 Global have identified 2 areas of focus for this project:

1. Preliminary Data Analysis and Information Research
2. Predictive Algorithm Modeling and Strategy Testing

At the end of the project, the team aims to design a unique predictive model from the insights discovered during the analysis.

METHODOLOGY

Our methodology is a 5-step approach to prediction and explanatory modelling for the USD/JPY 1-minute chart.

Exploratory Segment

1. Data Collection
In the initial phase of data collection, we must ensure that we have the fields needed for modelling at the later stage.

2. Data Cleaning + Transformation
In the data cleaning and transformation phase, the data will be transformed into the statistical and analytical parameters needed for prediction later.
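
As one illustration of the kind of transformation this phase could involve, the sketch below derives a per-minute log return and a high-low range from consecutive bars. It is written in Python with the standard library only, not the R code the team used. The function name `derive_features`, the choice of these two features, and the sample prices are all assumptions made for illustration.

```python
import math
from typing import Dict, List


def derive_features(bars: List[Dict]) -> List[Dict]:
    """Return one feature row per bar, starting from the second bar."""
    rows = []
    for prev, cur in zip(bars, bars[1:]):
        rows.append({
            "timestamp": cur["timestamp"],
            # Log return of this close relative to the previous close.
            "log_return": math.log(cur["close"] / prev["close"]),
            # Intrabar range, used here as a simple volatility proxy.
            "range": cur["high"] - cur["low"],
        })
    return rows


# Made-up sample bars, for illustration only (not sponsor data).
sample_bars = [
    {"timestamp": "2015-07-01 00:00", "high": 122.45, "low": 122.38,
     "open": 122.40, "close": 122.42},
    {"timestamp": "2015-07-01 00:01", "high": 122.50, "low": 122.41,
     "open": 122.42, "close": 122.48},
    {"timestamp": "2015-07-01 00:02", "high": 122.49, "low": 122.40,
     "open": 122.48, "close": 122.43},
]
features = derive_features(sample_bars)
```

The first bar produces no row because a return needs a previous close, so three bars yield two feature rows.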

3. Initial Data Exploration
In this phase, we will explore the data and determine the modelling approach based on the nature of the dataset. Necessary preparations, such as checking the variables for multicollinearity, will be carried out before any modelling is done. Due to the nature of our dataset, careful data exploration is required.
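
A simple way to screen for multicollinearity is to look at pairwise Pearson correlations between candidate variables and flag pairs that move almost in lockstep. The Python sketch below does this under stated assumptions: the 0.9 threshold, the function names, and the sample columns are hypothetical, and a fuller check (for example variance inflation factors) may be more appropriate in practice.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    # A constant series has no defined correlation; treat it as 0 here.
    return cov / denom if denom else 0.0


def collinear_pairs(columns: Dict[str, List[float]],
                    threshold: float = 0.9) -> List[Tuple[str, str, float]]:
    """List variable pairs whose absolute correlation meets the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Made-up columns, for illustration only.
columns = {
    "close": [122.40, 122.50, 122.60, 122.70, 122.80],
    "high": [122.45, 122.56, 122.64, 122.76, 122.85],
    "range": [0.09, 0.05, 0.08, 0.05, 0.09],
}
flagged = collinear_pairs(columns)
```

In this made-up sample only the close and high columns are flagged, which reflects why raw price levels from the same bar are usually poor joint predictors.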

Iterative Segment

4. Model Building
In this phase, we will create the model and determine the predictor and target variables. We will experiment with multiple approaches based on our initial understanding of the dataset after the exploration. These could range from visualizations to machine learning algorithms, in order to achieve the objectives of our client.

5. Model Validation
We propose a multi-variate methodology of sampling data in order to validate our model. Here we will use the 3-way approach to model validation called “train, test and validate”. Due to the nature of the project, we want to avoid overfitting and bias in our models. Hence, we will aim for a more rigorous testing process with a larger sample size to avoid such issues.
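
For time series data, one common way to realise a 3-way split is to cut the data chronologically, so that the validation and test sets always lie after the training set in time. The Python sketch below shows this under assumptions: the 60/20/20 proportions and the function name are hypothetical, and the project's actual sampling scheme may differ.

```python
from typing import List, Sequence, Tuple


def chronological_split(rows: Sequence, train_pct: int = 60,
                        validate_pct: int = 20) -> Tuple[List, List, List]:
    """Split time-ordered rows into train, validate and test, in that order.

    Percentages are integers so the cut points use exact integer arithmetic.
    The test set receives whatever remains after train and validate.
    """
    if train_pct <= 0 or validate_pct <= 0 or train_pct + validate_pct >= 100:
        raise ValueError("percentages must be positive and leave room for a test set")
    n = len(rows)
    train_end = n * train_pct // 100
    validate_end = n * (train_pct + validate_pct) // 100
    return (list(rows[:train_end]),
            list(rows[train_end:validate_end]),
            list(rows[validate_end:]))


# Ten time-ordered placeholder rows, for illustration only.
train, validate, test = chronological_split(list(range(10)))
```

Rows are never shuffled here: shuffling minute bars before splitting would let the model see prices from after the period it is evaluated on.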

We will also use benchmark metrics to test our predictive model and ensure that it is satisfactory. Should it not be satisfactory, we will go back to phase 4 (model building) or phase 2 to rebuild the model until the results are satisfactory.
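
The benchmark metrics are not specified here. As a hedged illustration, the Python sketch below compares a model against a naive "no change" baseline using two commonly used measures: root mean squared error and directional accuracy. The metric choice, the function names, and all prices and predictions are assumptions made for illustration.

```python
import math
from typing import List


def rmse(actual: List[float], predicted: List[float]) -> float:
    """Root mean squared error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def directional_accuracy(previous: List[float], actual: List[float],
                         predicted: List[float]) -> float:
    """Share of steps where the predicted move has the same sign as the actual move."""
    hits = 0
    for prev, act, pred in zip(previous, actual, predicted):
        if (act - prev) * (pred - prev) > 0:
            hits += 1
    return hits / len(actual)


# Made-up closing prices and model output, for illustration only.
closes = [122.40, 122.50, 122.45, 122.60, 122.55]
previous = closes[:-1]
actual = closes[1:]
naive_predictions = previous  # baseline: next close equals current close
model_predictions = [122.48, 122.47, 122.55, 122.58]

model_rmse = rmse(actual, model_predictions)
naive_rmse = rmse(actual, naive_predictions)
model_hit_rate = directional_accuracy(previous, actual, model_predictions)

# One possible acceptance rule: the model must beat the naive baseline.
satisfactory = model_rmse < naive_rmse
```

If `satisfactory` is false under a rule like this, the workflow returns to an earlier phase as described above.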

DATA

Data Source

The dataset given to us includes multiple timeframes of the same time series data for a 2-year period, from 1st July 2015 to 30th June 2017.

The data fields include:
- Timestamp (timestamp of the data)
- High (high point of the currency pair for the minute)
- Low (low point of the currency pair for the minute)
- Open (opening price of the currency pair for the minute)
- Close (closing price of the currency pair for the minute)
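
The fields listed above map naturally onto a small record type. The Python sketch below is one possible representation with a basic consistency check; the class name `MinuteBar`, the method name, and the sample values are hypothetical and do not describe the sponsor's actual schema.

```python
from datetime import datetime
from typing import NamedTuple


class MinuteBar(NamedTuple):
    """One 1-minute record with the five fields listed above."""
    timestamp: datetime
    high: float
    low: float
    open: float
    close: float

    def is_consistent(self) -> bool:
        """True when open and close both lie within the low-high range."""
        return (self.low <= min(self.open, self.close)
                and max(self.open, self.close) <= self.high)


# Made-up values, for illustration only.
good_bar = MinuteBar(datetime(2015, 7, 1, 0, 0), 122.45, 122.38, 122.40, 122.42)
bad_bar = MinuteBar(datetime(2015, 7, 1, 0, 1), 122.45, 122.38, 122.40, 122.60)
```

A check like this can be run at collection time so that malformed rows are caught before modelling.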

To access our client’s database, we used R code in RStudio to connect directly to the AWS servers and retrieve the data needed for our analysis. This gave us the flexibility to choose the time periods we want to work with.

The resulting retrieval of 2 years’ worth of minute tick data for 1 currency pair comes close to 750,000 rows.
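
The row count can be sanity-checked with simple arithmetic. The Python sketch below assumes one row per minute, 24 hours a day, on every weekday in the period. This simplification ignores market holidays and the exact weekly open and close times, so it gives only an approximate upper bound.

```python
from datetime import date, timedelta

# Period stated for the dataset.
start = date(2015, 7, 1)
end = date(2017, 6, 30)


def count_weekdays(first: date, last: date) -> int:
    """Count Monday-to-Friday dates from first to last, inclusive."""
    total_days = (last - first).days + 1
    return sum(
        1 for offset in range(total_days)
        if (first + timedelta(days=offset)).weekday() < 5
    )


weekdays = count_weekdays(start, end)
# Assumption: one bar per minute around the clock on each weekday.
expected_rows = weekdays * 24 * 60
```

Under these assumptions the period contains 523 weekdays, which gives 753,120 minutes and is in line with the roughly 750,000 rows retrieved.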

SCOPE OF WORK

We intend to adopt the following steps in our analysis:

• Discover insights within the provided data
• Collect the currency pair data and ensure it is relevant for modelling
• Ensure accuracy of the data by checking for multicollinearity during the data exploration stage
• Identify approaches, ranging from visualization to machine learning algorithms, to determine predictor and target variables
• Validate the model through “train, test and validate”
• Use a large data sample to prevent overfitting and bias in our model
• Design a unique predictive model
• Use benchmark metrics to test the success rate of the predictive model
• Note that the scope of the project is flexible and can be extended to address additional questions pH7 might have on the dataset

REFERENCES

B. (n.d.). Dynamic Bayesian Networks. Retrieved November 3, 2010, from https://bi.snu.ac.kr/Courses/g-ai10f/Ch9_DBN.pdf

Gray, A. (2017, March 9). The world’s 10 biggest economies in 2017. Retrieved from https://www.weforum.org/agenda/2017/03/worlds-biggest-economies-in-2017/

Petrică, A., Stancu, S., & Tindeche, A. (2016). Limitation of ARIMA models in financial and monetary economics. Retrieved from http://store.ectap.ro/articole/1222.pdf

Rise of the billionaire robots: how algorithms have redefined hedge funds. (2016, May 15). Retrieved from https://www.theguardian.com/business/us-money-blog/2016/may/15/hedge-fund-managers-algorithms-robots-investment-tips

Satariano, A., & Kumar, N. (2017, September 27). The Massive Hedge Fund Betting on AI. Retrieved from https://www.bloomberg.com/news/features/2017-09-27/the-massive-hedge-fund-betting-on-ai

US, I. F. (2011). U.S. Dollar Index. Retrieved from https://www.theice.com/publicdocs/ICE_USDX_Brochure.pdf.
